‍Signed, Sealed, Delivered: Low-Latency, Reliable Messaging with Postgres

Many production systems are a collection of loosely coupled services that communicate with each other using messages or events. The distributed nature of these systems makes reliability and correctness challenging. Lost messages or duplicate processing can cause bugs ranging from minor annoyances to disastrous outages.

Postgres is an elegant solution for implementing a messaging service, especially for applications that already use it to store data. The strong transactional guarantees of Postgres allow you to both send and consume messages exactly-once, making it easier to build highly reliable systems. While there exist dedicated notification services like AWS SNS, these do not come with strong transactional guarantees out of the box and add architectural complexity or vendor lock-in.

In this post, we’ll develop this idea: how Postgres simplifies the deployment of reliable messaging. We will describe how to implement notifications as a Postgres table, how to perform atomic messaging, how to use LISTEN/NOTIFY to improve performance, and finally how to ensure applications never miss a single notification.

Storing notifications in Postgres

To build notifications in Postgres, you can make each message a row in a Postgres table. Here’s what such a table would look like:

Each message, uniquely identified by an ID, is produced for the consumption of a unique recipient, also identified by an ID. Topics are a nice-to-have metadata column that helps categorize communications. To send a message, write a row, to consume, read the row and flip the “consumed” column.

The reason this simple abstraction is so powerful is that it allows you to provide exactly-once semantics for sending and consuming methods, helping solve difficult production challenges like atomic workflows and duplicate messages.

Atomic messaging

Messages are often part of multi-step workflows that must run atomically. For example, an application handling payments might update inventory, process payments, then use messaging to trigger a shipment. These workflows must be atomic, that is, either all the steps complete, or none do. This prevents inconsistencies between systems, e.g., not shipping an item that was paid for. With notifications stored in a Postgres table, workflows can write messages exactly-once, as part of a larger transaction.

On the receiving side, we can also guarantee that the message is consumed atomically with other business logic, for example, scheduling a shipment:

If the consumer process crashes after consuming the message but before scheduling the shipment, the message will remain marked as “unconsumed,” and a recovery worker will be able to pick up the work.

Duplicate messages

Duplicate messaging is a common problem in distributed architectures, for example when multiple workers can generate notifications about the same event. Duplicate messaging can lead to duplicate work, which at best wastes computing resources and at worst creates inconsistencies and bugs.

Storing notifications in a Postgres table allows us to handle concurrent execution out of the box. We leverage the unique constraint on the message ID as an idempotency key. When multiple workers attempt to insert the same message in the table, only one will be inserted into the database.

This extends to duplicate consumption. In a distributed setup, even with unique destination IDs, who knows if two processes won't run with the same ID and attempt to consume a message? By consuming messages exactly-once, we ensure that just one of these processes will read the message and perform the work associated with it.

Efficient and low-latency messaging

One challenge with storing notifications in Postgres is that consumers don’t know ahead of time when a new notification arrives. The obvious solution is to periodically poll the database to check for new messages, however this scales poorly as the number of consumers increase. Increasing the polling interval trades database load for poor application responsiveness.

We can address this challenge by leveraging Postgres’ LISTEN/NOTIFY. This feature allows Postgres clients to subscribe to notifications, which are emitted when a new message is written to the notifications table. That way, consumers do not need to poll the database, but are instead awakened when a new message is available.

To use LISTEN/NOTIFY, we first install a trigger function that fires every time a new message is inserted in the notifications table:

This trigger executes in the same transaction that inserts the notification. The trigger function calls pg_notify with a payload embedding both the destination ID and a topic.

On the application layer, we need to open a connection that listens for notifications (executing LISTEN notifications_channel on a dedicated connection). Here is a simple example of a loop waiting for notifications and parsing their payload:

How to not miss any notifications

However, there is one crucial challenge for making this system reliable: there is a possibility that notifications are lost when the consumer’s connection is severed. Network hiccups are common and often last just long enough to get you disconnected. Consider the following sequence of events:

  1. Insertion to the notifications table triggers pg_notify
  2. Consumer’s subscription connection to Postgres is severed, after pg_notify is called but before the signal has made it out of the Postgres server
  3. Consumers detect the connection issue, and re-creates a connection.

At the end of this sequence, the consumer will have lost the notification.

The solution is to explicitly catch connection errors and, in addition to creating a new LISTEN connection, query the notifications table, once, whenever we detect that the connection is severed. Here is an example of what it looks like at the application level:

Marking a pause before attempting to re-acquire the connection is important. It can otherwise lead to cascading failures if the database is overwhelmed by new connection attempts. A simple policy is to retry with exponential backoff, that is, slow down retries (up to a limit) when multiple attempts to re-acquire the connection fail.

Further, adding a random time interval (“jitter”) to the delay between retries is critical for stability in a distributed setup: a large enough number of consumers, all attempting to reconnect at the same time (regardless of the delay between attempts) can cause a “thundering herd” that will overwhelm the database.

Learn more

This is how we built a reliable and efficient notification system using Postgres. If you appreciate reliable systems, we’d love to hear from you. At DBOS, our goal is to make durable workflows as easy to work with as possible. Check it out:

Insights

Recent articles

The latest in durable execution, AI workflows & more.

How To
Apr 29, 2026

‍Goto Considered Harmful (2026 version)

Postgres creator and DBOS co-founder Dr. Mike Stonebraker explains why workflow architectures are preferred over event-driven and why AI frameworks are converging on a workflow-style model and supporting durable execution.
Mike Stonebraker
Benchmarks
Apr 23, 2026

Does Postgres Scale?

Benchmarking the workflow execution and workflow queueing scalability of a single Postgres server.
Peter Kraft
How To
Apr 21, 2026

Making AI Agents Stateful and Fault-tolerant on Google Cloud Run

Using open source DBOS durable execution libraries to make Google Cloud Run a safe place to run failure-proof, stateful agentic AI workflows.
Henri Maxime Demoulin