Saqueib Ansari

Posted on Jun 15 • Originally published at qcode.in

Where PgDog Fits in Real Postgres App Architectures

#postgres #database #laravel #node

Postgres routing becomes an application problem much earlier than most teams admit. The database is still the database, but the moment you add replicas, pooling, failover, or even the possibility of sharding, your app stops talking to a single server and starts depending on traffic policy. That policy decides where reads go, what happens inside transactions, how degraded mode works, and whether queue workers behave the same way as HTTP requests.

That is the real case for PgDog. It is not just another Postgres tool with a nicer landing page. It is a sign that some teams have outgrown the fiction that routing can stay tucked inside ORM config forever.

My view is straightforward: PgDog is interesting when you want routing policy to become infrastructure instead of framework behavior. If you mostly need connection pooling, stick with PgBouncer. If reads, replicas, failover, and consistency rules are already leaking into Laravel services, Node repositories, workers, and scripts, a smarter routing layer starts to make architectural sense.

Postgres Routing Is Usually Hidden Technical Debt

A lot of systems reach for database scaling through the wrong mental model. Teams think they are solving a throughput problem in Postgres, when they are actually creating coordination problems in the app.

The first symptom is easy to recognize: too many connections. PHP-FPM workers spike, queue consumers scale out, background jobs open short-lived sessions, and Postgres starts wasting resources on connection churn instead of query execution. That is the point where people add pooling and feel smart for a week.

The second symptom is more dangerous: now there is a primary, one or more replicas, and some rough rule like "reads go here, writes go there." The app starts carrying that rule around. Sometimes it lives in a database abstraction layer. Sometimes it leaks into custom repository code. Sometimes it becomes tribal knowledge like "this endpoint must force primary because the replica lags after checkout."

That is where architecture drifts. Your HTTP app might know how to route a safe read. Your queue worker may not. Your reporting job may bypass the app container entirely and connect directly. Your migration scripts may ignore the same rules. The result is not one database architecture. It is several inconsistent ones wearing the same brand name.

This is why routing matters more than it sounds. Once the application knows too much about topology, every runtime becomes a potential split-brain of behavior.

A cleaner mental model is this:

the app should express business intent
the infrastructure should own topology intent
the boundary between the two should stay explicit

That boundary is exactly what a proxy or routing layer tries to formalize.

What Native Postgres Clients Already Give You, and What They Do Not

Before adding any proxy, it is worth being honest about what the official Postgres stack can already do.

The libpq connection layer supports multiple hosts, target_session_attrs, and (from PostgreSQL 16 onward) load_balance_hosts. The PostgreSQL docs on libpq connection parameters describe how clients can select a primary, prefer a standby, or randomize host choice during connection establishment. That is useful, and many teams underuse it.

If all you need is resilient connection establishment, you can often get farther than expected with plain client features:

host=db-primary,db-replica-1,db-replica-2
port=5432,5432,5432
target_session_attrs=read-write
load_balance_hosts=random
connect_timeout=2

That kind of setup helps with a few important things:

choosing a writable node when several hosts are listed
preferring standby servers for specific read-only clients
failing over without hardcoding a single IP everywhere
reducing dependency on DNS tricks that age badly during incidents
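
As a rough sketch, the same parameters can also be assembled into a libpq-style connection URI. The helper name buildLibpqUri, the database name app, and the host names are illustrative assumptions, not part of any library. This only helps clients built on libpq (psql, for example); other drivers may not understand multi-host URIs, and load_balance_hosts needs libpq from PostgreSQL 16 or later.

```javascript
// Build a libpq-style multi-host connection URI.
// Host names and the database name are illustrative placeholders.
function buildLibpqUri({ hosts, port = 5432, database, params = {} }) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('at least one host is required');
  }

  // libpq accepts a comma-separated host:port list in the authority part.
  const authority = hosts.map((host) => `${host}:${port}`).join(',');

  // Connection parameters travel as ordinary query-string pairs.
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');

  const base = `postgresql://${authority}/${encodeURIComponent(database)}`;
  return query ? `${base}?${query}` : base;
}

// A writer that must land on a read-write node.
const writerUri = buildLibpqUri({
  hosts: ['db-primary', 'db-replica-1', 'db-replica-2'],
  database: 'app',
  params: { target_session_attrs: 'read-write', connect_timeout: 2 },
});

// A read-only reporting client that prefers a standby and spreads connections.
const reportingUri = buildLibpqUri({
  hosts: ['db-primary', 'db-replica-1', 'db-replica-2'],
  database: 'app',
  params: { target_session_attrs: 'prefer-standby', load_balance_hosts: 'random' },
});

console.log(writerUri);
console.log(reportingUri);
```

Note that both URIs still describe a choice made once per connection. Nothing here inspects individual queries.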

But there is a hard limit here: native client host selection happens at connection time, not at query time. Once a client connects, every subsequent query stays on that backend until the connection is dropped or recycled.

That means client-native features do not solve the actual routing problems most growing apps run into:

splitting reads and writes behind a single endpoint
ensuring workers, web requests, and scripts follow the same rules
handling query-level edge cases like locking reads or write CTEs
centralizing failover and replica traffic policy
creating a path toward sharding without rewriting application connection logic

So the question is not whether Postgres clients can already do failover-aware connections. They can. The question is whether your team needs query-aware routing policy rather than better connection strings.

If the answer is no, do not add a proxy. If the answer is yes, pretending the ORM can absorb all of that cleanly is where bad architecture starts.

Where PgDog Actually Fits

PgDog positions itself as a Postgres connection pooler, load balancer, and sharding proxy. The interesting part is not the label. The interesting part is that its load balancer works by understanding Postgres queries, not just by choosing a backend once and tunneling blindly.

According to the official load-balancer docs, PgDog can inspect incoming SQL, send plain SELECT traffic to replicas, route writes to the primary, and handle important edge cases like SELECT FOR UPDATE and write-producing CTEs: PgDog load balancer overview.

That sounds like a convenience feature until you compare it with what application-managed splitting usually looks like in practice.

In many apps, read/write splitting starts with code like this:

<?php

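use Illuminate\Support\Facades\DB;

// Anti-pattern (hypothetical helper): every caller must know which connection is safe.
function latestOrderForUser(int $userId, bool $isWrite): ?object
{
    $connection = $isWrite ? 'pgsql-primary' : 'pgsql-replica';

    return DB::connection($connection)
        ->table('orders')
        ->where('user_id', $userId)
        ->latest()
        ->first();
}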

This looks harmless. It is also the beginning of a long maintenance bill.

The bill arrives in stages:

Stage 1: You duplicate routing decisions

A Laravel app grows one set of routing helpers. A Node service grows another. Queue workers do something slightly different. CLI tasks bypass both. Nobody notices because the happy path still works.

Stage 2: Consistency bugs become context-dependent

A user updates billing details and gets redirected to a page that reads from a lagging replica. Support cannot reproduce it reliably because it only happens after writes and only on one request path.

Stage 3: Incident behavior becomes arbitrary

A replica slows down or disappears. One service spills to primary. Another keeps timing out. A reporting worker hammers the only healthy node because its routing logic was copied from an old script.

That is the point where a routing layer becomes less about convenience and more about removing policy from application code.

The strongest case for PgDog is not "our database is huge." It is "our topology policy must be enforced consistently across many clients, and we no longer trust each app runtime to get it right on its own."

PgBouncer Versus PgDog Is a Scope Decision, Not a Hype Decision

Most teams evaluating PgDog should first ask whether they really just need PgBouncer.

PgBouncer is still the cleaner answer for a huge class of systems. If the main pain is connection churn, backend pressure from too many app processes, or the need for lightweight pooling, PgBouncer remains hard to beat. It is focused, proven, and easier to reason about operationally.

Its limits are also well documented. The official PgBouncer feature matrix makes it clear that transaction pooling changes client expectations in important ways. Session-level features like SET/RESET, LISTEN, session advisory locks, and some temp-table behavior do not behave the same way there: PgBouncer features.

That tradeoff is often acceptable. Many web apps can live with it. But it is solving a different class of problem.

Here is the practical comparison.

PgBouncer is the right answer when

your primary pain is too many connections
your topology is still simple
read routing is limited or already explicit in the app
you want the fewest moving parts between app and database
you are not ready to own query-aware proxy behavior

PgDog becomes more compelling when

you want one Postgres endpoint that hides topology from apps
you need read/write routing across more than one runtime
your worker processes and CLI jobs must behave like the web app
failover behavior should be policy-driven, not framework-driven
sharding is realistic enough to influence today’s connection strategy

That last point matters. Even if you are not sharding now, teams often make a mess by baking one-host assumptions deep into every application process. A proxy can buy optionality. The mistake is paying that complexity tax before the problem is real.

My bias is conservative here. Do not add PgDog because it sounds more future-proof. Add it because your current routing behavior is already fragmented enough to justify centralization.

The Real Design Work Is in Failure Modes

Any routing layer looks good when replicas are healthy, lag is low, and traffic is predictable. The decision gets real when the system is under stress.

The PgDog documentation is useful precisely because it exposes some of the awkward reality instead of pretending all SELECT queries are equal. A query may begin with SELECT and still be operationally part of a write path.

Example 1: Locking reads are not replica reads

A common queue pattern looks like this:

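BEGIN;

-- Claim one pending job. The row lock means this SELECT belongs on the primary.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- $1 is the id returned by the SELECT above.
UPDATE jobs
SET status = 'running', started_at = NOW()
WHERE id = $1;

COMMIT;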

If your routing model is based on string matching or ORM-level intuition, this is where it breaks. The first statement reads rows, but it also acquires locks and participates directly in a write workflow. PgDog explicitly documents SELECT FOR UPDATE handling because a serious routing layer has to understand that this belongs on primary.

If your app owns this logic in scattered code, you must trust every implementation path to remember the same nuance.

Example 2: A `SELECT` can still write

-- Reads like a report, but the CTE inserts a row, so it belongs on the primary.
WITH created AS (
  INSERT INTO audit_log (user_id, action)
  VALUES ($1, 'email_change')
  RETURNING id, user_id
)
SELECT u.id, u.email, c.id AS audit_id
FROM users u
JOIN created c ON c.user_id = u.id;

This is exactly the kind of query that defeats simplistic read/write splitting. It reads like a report, but the CTE mutates state. PgDog’s docs call out write CTE inspection because real SQL is more subtle than "verb at the start of the statement."
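
To make the gap concrete, here is a minimal sketch that compares a first-keyword check with a slightly more careful heuristic. Both functions (naiveRoute, carefulRoute) are hypothetical and regex-based. This is not how PgDog is implemented, and a real routing layer needs a proper SQL parser. The careful version still misses cases such as functions with side effects.

```javascript
// Naive rule: anything that starts like a read goes to a replica.
function naiveRoute(sql) {
  return /^\s*(select|with)\b/i.test(sql) ? 'replica' : 'primary';
}

// More careful heuristic: when in doubt, send the statement to the primary.
function carefulRoute(sql) {
  // Roughly drop comments and string literals so keywords inside them are ignored.
  const text = sql
    .replace(/--[^\n]*/g, ' ')
    .replace(/\/\*[\s\S]*?\*\//g, ' ')
    .replace(/'(?:[^']|'')*'/g, "''")
    .toLowerCase();

  const startsAsRead = /^\s*(select|with)\b/.test(text);
  const writesData = /\b(insert|update|delete|merge)\b/.test(text);
  const takesRowLocks = /\bfor\s+(no\s+key\s+update|update|key\s+share|share)\b/.test(text);

  return startsAsRead && !writesData && !takesRowLocks ? 'replica' : 'primary';
}

const lockingRead =
  "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED";
const writeCte =
  "WITH created AS (INSERT INTO audit_log (user_id, action) VALUES ($1, 'email_change') RETURNING id) SELECT id FROM created";
const plainRead = 'select id, email from accounts where id = $1';

for (const sql of [lockingRead, writeCte, plainRead]) {
  console.log(naiveRoute(sql), carefulRoute(sql), sql.slice(0, 40));
}
```

The naive rule sends both the locking read and the write CTE to a replica. The careful one keeps them on the primary and only offloads the plain read. Every app runtime that hand-rolls this logic has to rediscover the same distinction.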

Replica lag is not a footnote

Even a clever proxy cannot remove asynchronous replication lag. If your system writes on primary and immediately reads dependent state from a replica, you have an application-level consistency decision to make.

That decision should be explicit. Some flows must read from primary after a write:

checkout and payment confirmation
authentication or role changes
inventory and stock reservation
any flow where stale reads create user-visible contradiction

Other flows can often tolerate replica reads:

dashboards
reporting
search result decoration
analytics and internal back-office views

The failure is not using replicas. The failure is pretending every read has the same correctness requirements.
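
One way to make that decision explicit is a small read-after-write tracker. This is a sketch under stated assumptions: the class name ReadAfterWriteTracker, the two-second window, and the per-user keying are all hypothetical, and the in-memory map only works inside a single process. How the resulting 'primary' or 'replica' decision is enforced (a separate pool, a proxy-specific hint, or something else) is left open here.

```javascript
// After a write, keep that user's reads on the primary for a short window.
// The window should exceed the replica lag you actually observe.
class ReadAfterWriteTracker {
  constructor({ stickyMs = 2000, now = () => Date.now() } = {}) {
    this.stickyMs = stickyMs;
    this.now = now;
    this.lastWriteAt = new Map(); // userId -> timestamp of the latest write
  }

  recordWrite(userId) {
    this.lastWriteAt.set(userId, this.now());
  }

  // flowRequiresPrimary marks flows (checkout, role changes) that never tolerate staleness.
  targetFor(userId, { flowRequiresPrimary = false } = {}) {
    if (flowRequiresPrimary) return 'primary';
    const last = this.lastWriteAt.get(userId);
    if (last !== undefined && this.now() - last < this.stickyMs) return 'primary';
    return 'replica';
  }
}

// Usage with a fake clock so the behavior is deterministic.
let clock = 0;
const tracker = new ReadAfterWriteTracker({ stickyMs: 2000, now: () => clock });

tracker.recordWrite('user-1');
clock = 500;
console.log(tracker.targetFor('user-1')); // still inside the sticky window
clock = 2500;
console.log(tracker.targetFor('user-1')); // window has passed
console.log(tracker.targetFor('user-1', { flowRequiresPrimary: true }));
```

With several app processes or queue workers, the last-write marker would need to live somewhere shared, which is exactly the kind of cross-runtime coordination this article is about.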

Degraded mode must be designed before the outage

PgDog documents options around whether the primary also serves reads and whether it can temporarily absorb read traffic when replicas are unavailable. That is operationally useful. It also forces a real decision: when replicas fail, do you want stale partial service, slower correct service, or hard failure for some classes of traffic?

There is no universal right answer.

A product team usually needs to decide among patterns like these:

Fail open to primary. Read traffic spills to primary so the app stays functional, but the write node risks overload.
Fail closed for replica-only classes. Some reporting or low-priority endpoints degrade or disable cleanly.
Mixed policy. Critical user flows fall back to primary; nonessential workloads shed load or queue.

That is not just a database choice. It is product behavior under stress. A proxy can enforce the policy, but it cannot invent it for you.
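
Writing the policy down as data is one way to force the conversation. The sketch below assumes three hypothetical traffic classes and a mixed policy. The names, the classes, and the primaryOverloaded signal are illustrative, not something PgDog defines.

```javascript
// Declared behavior for read traffic when no replica is healthy.
const DEGRADED_POLICY = new Map([
  ['critical', 'primary'],                   // fail open: stay correct, accept primary load
  ['standard', 'primary-unless-overloaded'], // spill over only while the primary can take it
  ['reporting', 'reject'],                   // fail closed: shed load cleanly
]);

function chooseReadTarget(trafficClass, { healthyReplicas, primaryOverloaded = false }) {
  const policy = DEGRADED_POLICY.get(trafficClass);
  if (policy === undefined) {
    throw new Error(`unknown traffic class: ${trafficClass}`);
  }

  // Normal mode: replicas are available, so reads go there.
  if (healthyReplicas > 0) return 'replica';

  // Degraded mode: apply the declared policy instead of improvising.
  if (policy === 'primary') return 'primary';
  if (policy === 'primary-unless-overloaded') return primaryOverloaded ? 'reject' : 'primary';
  return 'reject';
}

console.log(chooseReadTarget('critical', { healthyReplicas: 0 }));
console.log(chooseReadTarget('standard', { healthyReplicas: 0, primaryOverloaded: true }));
console.log(chooseReadTarget('reporting', { healthyReplicas: 2 }));
```

The table is the product decision. The function is just the part a proxy or shared library can enforce once someone has made it.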

What Full Stack Teams Should Decide Before Inserting a Routing Layer

Adding PgDog without making these decisions first is how teams create a more sophisticated mess.

1. Define which flows require read-your-write guarantees

Do not wave this away with "we use replicas for reads." Map the actual product surfaces. If a user changes something and expects the next screen to reflect it immediately, that path probably cannot rely on asynchronous replicas.

This is especially relevant in Laravel and Node stacks that mix synchronous user requests with async jobs. A queue worker may update state that a web request reads a second later. If that request is replica-routed by default, you have just built inconsistency into the UX.

2. Standardize the endpoint story

If you adopt PgDog, the win is consistency. The app should stop encoding topology in ten places.

A Node service should be able to treat the database as a stable logical endpoint:

import { Pool } from 'pg';

export const pool = new Pool({
  host: process.env.PGDOG_HOST,
  port: 6432,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,
  connectionTimeoutMillis: 2000,
  idleTimeoutMillis: 30000,
});

export async function loadAccountSummary(accountId) {
  const { rows } = await pool.query(
    `
      select id, email, plan, created_at
      from accounts
      where id = $1
    `,
    [accountId]
  );

  return rows[0] ?? null;
}

That is a better abstraction boundary than wiring separate read and write DSNs into every service and hoping developers always choose correctly.

3. Decide how transactions interact with routing

Manually started transactions should usually stay boring. Once a unit of work spans several statements, the cost of clever routing often outweighs the benefit. Most teams are better off treating explicit transactions as primary-bound unless they have a very specific reason not to.

That is one reason proxy-level routing can be safer than app-level heuristics. The routing rules can remain conservative by default.
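
On the application side, keeping transactions boring can be as simple as one helper that owns BEGIN, COMMIT, and ROLLBACK on a single connection. The sketch below assumes a pool that looks like a node-postgres Pool (connect, query, release). The helper name withTransaction and the recording stub are hypothetical, and it assumes the routing layer pins explicit transactions to the primary, as recommended above.

```javascript
// Run a unit of work inside one explicit transaction on one connection.
async function withTransaction(pool, work) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await work(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    // Undo partial work, then let the caller see the original error.
    await client.query('ROLLBACK');
    throw err;
  } finally {
    // Always return the connection to the pool.
    client.release();
  }
}

// Stub pool used only to show the statement order without a database.
function createRecordingPool(log) {
  const client = {
    query: async (sql) => {
      log.push(sql);
      return { rows: [] };
    },
    release: () => log.push('release'),
  };
  return { connect: async () => client };
}

const statements = [];
withTransaction(createRecordingPool(statements), async (client) => {
  await client.query("UPDATE jobs SET status = 'running' WHERE id = 1");
  return 'done';
}).then((result) => console.log(result, statements));
```

Because every statement in the callback uses the same client, there is nothing for a router to get clever about: the whole unit of work travels together.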

4. Be honest about observability

A routing layer with poor observability is a blame machine. If latency jumps, you need to know whether the problem is the proxy, a replica, the primary, a specific query class, or the app pool behavior.

At minimum, teams should expect visibility into:

backend health and ban state
route-level latency and error rate
replica saturation versus primary saturation
spillover behavior during degraded mode
connection pool pressure and queueing

If you cannot observe those, adding a proxy may make incident response worse before it makes architecture better.
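
Even before a proxy exposes its own metrics, the app can record which backend class served a query and summarize it. This is a minimal sketch with a hypothetical sample shape ({ backend, ms, ok }). It uses a nearest-rank 95th percentile and is meant to illustrate the kind of breakdown worth having, not a replacement for real metrics tooling.

```javascript
// Summarize query samples per backend: volume, error rate, and p95 latency.
function summarizeByBackend(samples) {
  const groups = new Map();
  for (const { backend, ms, ok } of samples) {
    if (!groups.has(backend)) groups.set(backend, { durations: [], errors: 0 });
    const group = groups.get(backend);
    group.durations.push(ms);
    if (!ok) group.errors += 1;
  }

  const summary = {};
  for (const [backend, { durations, errors }] of groups) {
    const sorted = [...durations].sort((a, b) => a - b);
    // Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    const rank = Math.max(1, Math.ceil(0.95 * sorted.length));
    summary[backend] = {
      count: sorted.length,
      errorRate: errors / sorted.length,
      p95Ms: sorted[rank - 1],
    };
  }
  return summary;
}

const report = summarizeByBackend([
  { backend: 'replica', ms: 10, ok: true },
  { backend: 'replica', ms: 12, ok: true },
  { backend: 'replica', ms: 11, ok: true },
  { backend: 'replica', ms: 200, ok: false },
  { backend: 'primary', ms: 5, ok: true },
  { backend: 'primary', ms: 7, ok: true },
]);

console.log(report);
```

A split like this is what lets you say "the replica tier is slow" instead of "the database is slow" during an incident.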

5. Treat sharding as a roadmap signal, not a vanity signal

PgDog’s broader promise includes sharding. That matters only if your domain model, tenancy boundaries, or growth path make it plausible. Do not let hypothetical future sharding justify present-day complexity on its own.

But if your team already expects tenant partitioning, regional data placement, or hot-spot isolation to matter later, it is reasonable to prefer an access layer that does not force every client to relearn topology from scratch.

A Sensible Adoption Path

The best way to adopt a smarter routing layer is not with a big-bang cutover.

Start simple:

stabilize connection behavior first
document consistency-sensitive flows
measure read volume that is actually safe to offload
introduce the proxy for one service class or environment
validate degraded-mode behavior before calling it production-ready

A configuration skeleton might look like this:

[general]
load_balancing_strategy = "round_robin"
read_write_split = "exclude_primary"

[[databases]]
name = "app"
role = "primary"
host = "10.0.0.10"
port = 5432

[[databases]]
name = "app"
role = "replica"
host = "10.0.0.11"
port = 5432

[[databases]]
name = "app"
role = "replica"
host = "10.0.0.12"
port = 5432

That is not the hard part. The hard part is validating whether your app semantics match the routing policy you just declared.

A team that has not mapped stale-read tolerance, transaction expectations, and fallback behavior is not deploying a smart proxy. It is outsourcing confusion to infrastructure.

The Practical Decision Rule

PgDog makes sense when Postgres routing has already escaped the database team and started shaping application design. If replicas, failover, and traffic policy are leaking into repositories, workers, and framework config, centralizing those rules is a good architectural move.

If your problem is still mostly connection count, choose PgBouncer and keep the stack smaller. If your problem is now policy consistency across many clients, PgDog is the more serious tool.

The rule of thumb is simple: add a routing layer when topology has become part of product behavior, not just infrastructure trivia. At that point, hiding the problem inside app code is usually the more expensive choice.

Read the full post on QCode: https://qcode.in/pgdog-smarter-postgres-routing-apps/
