The PostgreSQL Query Optimization Guide We Wish We Had at the Start

A practical, no-fluff guide to optimizing slow PostgreSQL queries. Covers EXPLAIN ANALYZE, index strategies, N+1 detection, and the 80/20 fixes that actually move the needle.

Cristian Curteanu · April 10, 2024 · 3 min read

Why PostgreSQL Performance Is Deceptively Hard

PostgreSQL is remarkably good at hiding problems until they explode in production. A query that runs in 4ms under 10 concurrent users will run in 4,000ms under 500. The database hasn’t changed — the load pattern has.

This guide covers the diagnostic and fix patterns we use on every engagement where PostgreSQL is in the stack.

Start with EXPLAIN ANALYZE

Every optimization starts with measurement. EXPLAIN ANALYZE runs the query and shows you the actual execution plan with real timing.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 100;

The key numbers to look for:

  • Seq Scan vs Index Scan — A sequential scan on a large table is almost always a problem when the query needs only a small fraction of its rows
  • Rows Removed by Filter — High numbers mean an index would help enormously
  • Actual vs Estimated Rows — Large discrepancies mean stale statistics; run ANALYZE
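Checking these three signals by eye gets tedious across many plans. Below is a minimal sketch (not part of the original guide) that scans the text-format plan for them; the regexes assume the standard `rows=... width=...) (actual time=... rows=...` line shape that `FORMAT TEXT` emits, and the sample plan string is illustrative.

```python
import re

# Illustrative text-format plan fragment, shaped like real EXPLAIN ANALYZE output
plan = """Seq Scan on users u  (cost=0.00..458.00 rows=120 width=40) (actual time=0.02..9.61 rows=11873 loops=1)
  Filter: (created_at > (now() - '30 days'::interval))
  Rows Removed by Filter: 8127"""

def plan_warnings(plan_text):
    """Flag the three signals: sequential scans, heavy filtering, stale statistics."""
    warnings = []
    if "Seq Scan" in plan_text:
        warnings.append("sequential scan")
    removed = re.search(r"Rows Removed by Filter: (\d+)", plan_text)
    if removed and int(removed.group(1)) > 1000:
        warnings.append("index candidate (heavy filter)")
    est_act = re.search(
        r"rows=(\d+) width=\d+\) \(actual time=[\d.]+\.\.[\d.]+ rows=(\d+)", plan_text
    )
    if est_act:
        est, act = int(est_act.group(1)), int(est_act.group(2))
        # A 10x gap between the planner's estimate and reality suggests running ANALYZE
        if max(est, act) > 10 * max(min(est, act), 1):
            warnings.append("stale statistics")
    return warnings

print(plan_warnings(plan))
# ['sequential scan', 'index candidate (heavy filter)', 'stale statistics']
```

The thresholds (1,000 filtered rows, 10x estimate gap) are starting points, not rules; tune them to your tables.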

The Index Strategies That Actually Matter

Covering Indexes

A covering index carries every column the query needs, so Postgres can answer it with an index-only scan instead of visiting the heap:

-- Instead of:
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Use a covering index for common query patterns:
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at)
  INCLUDE (status, total_amount);
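You can watch the heap-avoidance effect locally without a Postgres instance. SQLite has no `INCLUDE` clause, but a multi-column index that holds every referenced column behaves the same way, and its planner explicitly reports "COVERING INDEX" when the table itself is never read. This is an analogy sketch, not Postgres behavior; the table and index names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT, status TEXT)"
)
# Multi-column index that carries every column the query touches
con.execute("CREATE INDEX idx_orders_covering ON orders(user_id, created_at, status)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT created_at, status FROM orders WHERE user_id = 42"
).fetchone()[3]
print(plan)  # mentions COVERING INDEX: the base table is never touched
```

In Postgres, the equivalent confirmation is an "Index Only Scan" node in EXPLAIN output (assuming the visibility map is reasonably current).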

Partial Indexes

Index only the rows you actually query:

-- If 95% of your queries filter on active users:
CREATE INDEX idx_users_active_email ON users(email)
  WHERE deleted_at IS NULL;

This index is dramatically smaller and faster than a full index.
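The catch with partial indexes is that they only apply when the query's WHERE clause implies the index predicate. SQLite supports the same feature, so here is a quick local sketch (illustrative names, not Postgres itself) showing the planner using the partial index when the predicate matches and falling back to a scan when it does not:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
# Partial index: only rows where deleted_at IS NULL are indexed
con.execute("CREATE INDEX idx_users_active_email ON users(email) WHERE deleted_at IS NULL")

# Query repeats the index predicate -> the partial index is eligible
hit = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users "
    "WHERE email = 'a@b.c' AND deleted_at IS NULL"
).fetchone()[3]
# Query omits the predicate -> the partial index cannot answer it
miss = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = 'a@b.c'"
).fetchone()[3]
print(hit)   # uses idx_users_active_email
print(miss)  # full scan
```

The same rule holds in Postgres: if a query might need soft-deleted rows, the partial index silently stops applying, so keep the predicate in every query that should use it.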

Expression Indexes

When your WHERE clause uses a function:

-- Slow: can't use index on email
WHERE LOWER(email) = 'user@example.com'

-- Create the expression index:
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
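SQLite supports expression indexes as well, which makes the behavior easy to verify locally (again an analogy, not Postgres; names are illustrative). Note that the query must use the same expression the index was built on, in both engines:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Without this, WHERE LOWER(email) = ... would force a full table scan
con.execute("CREATE INDEX idx_users_lower_email ON users(LOWER(email))")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE LOWER(email) = 'user@example.com'"
).fetchone()[3]
print(plan)  # the expression index satisfies the lookup
```

A common alternative worth weighing: store the email pre-normalized (lowercased at write time) and index the plain column, which keeps queries simpler at the cost of a write-path rule.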

Killing the N+1 Query

The N+1 problem is responsible for more performance emergencies than anything else we’ve seen. It looks harmless in code but becomes catastrophic at scale.

# This runs 1 + N queries:
users = User.objects.filter(active=True)
for user in users:
    print(user.orders.count())  # 1 query per user!

The fix is always to push the work into the database:

# This runs 1 query:
from django.db.models import Count
users = User.objects.filter(active=True).annotate(
    order_count=Count('orders')
)
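The pattern is framework-agnostic. Here is a minimal sqlite3 sketch (standing in for any ORM and database) that counts queries both ways, so you can see the 1 + N cost directly; the schema and query counter are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (1), (2);
""")

queries = 0
def run(sql, args=()):
    """Execute a statement and count round-trips to the database."""
    global queries
    queries += 1
    return con.execute(sql, args).fetchall()

# N+1 pattern: one query for the users, then one more per user
queries = 0
for (uid,) in run("SELECT id FROM users"):
    run("SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,))
n_plus_one = queries

# Aggregated: the database does the per-user counting in a single pass
queries = 0
counts = run("""
    SELECT u.id, COUNT(o.id)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""")
print(n_plus_one, queries, counts)  # 4 1 [(1, 2), (2, 1), (3, 0)]
```

With 3 users the lazy version costs 4 round-trips; with 10,000 users it costs 10,001, which is where the production emergencies come from.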

Connection Pool Tuning

Every connection to Postgres consumes roughly 5-10MB of RAM and spawns its own backend process. Most applications open far more connections than they need.

Rule of thumb for the pool size (active connections, not max_connections): connections = (num_cpu_cores * 2) + effective_spindle_count

For a 4-core server: set max_connections = 100 and use PgBouncer in transaction mode.

# pgbouncer.ini
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

This lets 1000 application threads share 25 database connections — a 40x reduction in connection overhead.
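What transaction-mode pooling buys can be sketched with a semaphore-bounded pool: many client threads, each borrowing a "connection" only for the duration of its transaction. The numbers mirror the config above; the pool itself is a toy stand-in for PgBouncer, not its implementation.

```python
import threading

POOL_SIZE, CLIENTS = 25, 1000
pool = threading.Semaphore(POOL_SIZE)   # at most 25 connections checked out at once
in_use, peak = 0, 0
lock = threading.Lock()

def transaction():
    """One client transaction: borrow a connection, work, return it."""
    global in_use, peak
    with pool:
        with lock:
            in_use += 1
            peak = max(peak, in_use)
        # ... run the transaction on the borrowed connection ...
        with lock:
            in_use -= 1

threads = [threading.Thread(target=transaction) for _ in range(CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never exceeds POOL_SIZE
```

The database only ever sees at most 25 backends, no matter how many application threads queue up, which is exactly the property transaction mode gives you.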

The Checklist

Before calling any query “optimized,” verify:

  • EXPLAIN ANALYZE shows Index Scan (not Seq Scan) on tables > 10K rows
  • Statistics are fresh (SELECT last_analyze, last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'your_table')
  • Connection pool is in transaction mode
  • No unbounded queries (always use LIMIT)
  • work_mem is set appropriately for sort/hash operations (start with 64MB, monitor)
  • autovacuum is running regularly on high-write tables

What We’ve Seen in the Wild

On a recent engagement, a client’s API p99 latency was 1,847ms. After two days of query analysis and index work:

  • Added 3 covering indexes on the main query path
  • Switched from ORM lazy loading to a single aggregation query
  • Deployed PgBouncer in transaction mode

Result: p99 latency dropped to 190ms. Beyond one rewritten ORM query, no application code changed and no schema migrations shipped. Just indexes and pool configuration.

The database was never the bottleneck — the access pattern was.