Your query runs fine in development with 500 rows. It takes 14 seconds in production with 8 million. You add an index — it doesn't help. You add another — still nothing. You've been in this loop before.
The problem is that most engineers treat EXPLAIN ANALYZE output as a wall of noise and either ignore it or cargo-cult "add an index on that column." This guide will make the plan readable, actionable, and honest about when adding an index is the wrong answer.
What the Planner Actually Does
Before reading output, understand what you're reading. PostgreSQL's query planner is a cost-based optimizer. It has statistics about your data (maintained by ANALYZE) and it estimates the cheapest execution plan based on those statistics. "Cheapest" means lowest combined cost of I/O and CPU, not wall clock time — which is why the planner can be wrong.
For each join, the planner chooses among three strategies: Nested Loop, Hash Join, and Merge Join. For each table access, it chooses among Sequential Scan, Index Scan, Bitmap Heap Scan, and Index Only Scan. A wrong choice among these options is behind most slow queries.
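You can inspect the statistics the planner is working from directly. A quick sketch, using the standard `pg_stat_user_tables` and `pg_stats` views (the table and column names here are illustrative):

```sql
-- How fresh are the statistics the planner is using?
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- What does the planner believe about a column's value distribution?
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'user_id';
```

Stale `last_analyze` timestamps or a wildly wrong `n_distinct` are an early warning that the plans you are about to read are built on bad inputs.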
Reading Your First Plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email;
Real output on a production table (anonymized, ~2.1M users, ~18M orders):
HashAggregate (cost=487234.12..487891.34 rows=65722 width=52)
(actual time=8432.112..8501.334 rows=61847 loops=1)
Group Key: u.id, u.email
Batches: 5 Memory Usage: 4145kB Disk Usage: 23872kB
-> Hash Left Join (cost=18432.10..485244.88 rows=265498 width=44)
(actual time=312.445..7891.224 rows=1247832 loops=1)
Hash Cond: (o.user_id = u.id)
Buffers: shared hit=43221 read=189443
-> Seq Scan on orders o (cost=0.00..398221.34 rows=18441234 width=16)
(actual time=0.031..4221.334 rows=18441234 loops=1)
Buffers: shared hit=12334 read=176234
-> Hash (cost=17809.44..17809.44 rows=65013 width=36)
(actual time=311.223..311.224 rows=61847 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 4321kB
-> Index Scan using idx_users_created_at on users u
(cost=0.56..17809.44 rows=65013 width=36)
(actual time=0.123..287.334 rows=61847 loops=1)
Index Cond: (created_at > (now() - '30 days'::interval))
Buffers: shared hit=30887 read=13209
Planning Time: 2.334 ms
Execution Time: 8521.445 ms
Let me annotate what this tells us:
The HashAggregate node shows Batches: 5 and Disk Usage: 23872kB. The aggregation spilled to disk because work_mem wasn't enough to hold all 61,847 groups in memory. Running SET work_mem = '64MB' before this query (or raising it globally if appropriate) can eliminate that spill and cut this node from ~500ms to ~80ms.
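Concretely, the session-level version of that fix is a small sketch like this (the 64MB figure is sized from the ~24MB disk spill above, with headroom):

```sql
-- Session-local bump: affects only this connection, and is granted
-- per sort/hash node, so size it conservatively
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- re-run; Batches should drop to 1
RESET work_mem;
```

Prefer the session-local form over changing `postgresql.conf`: `work_mem` is per operation, not per connection, so a global bump multiplies across every concurrent sort and hash in the system.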
The Hash Left Join shows actual rows=1247832 being processed. That's every order from a 30-day user cohort — 1.2M rows flowing through this join. The actual time is 7.8 seconds, which is almost all of our total time.
The Seq Scan on orders — this is the critical node. We're reading all 18.4 million order rows. The Buffers: read=176234 means 176K pages read from disk (each 8KB = ~1.4GB of I/O). This is why it's slow. No index on orders is being used here because the planner correctly determined that filtering orders by user_id after joining is cheaper than doing 61,847 separate index lookups.
The fix: The right approach here isn't an index on orders.user_id in isolation. It's flipping the query:
-- Instead of joining all orders then filtering,
-- filter users first, then aggregate orders per user
SELECT u.id, u.email, COALESCE(agg.order_count, 0) AS order_count
FROM (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) agg
RIGHT JOIN (
SELECT id, email FROM users
WHERE created_at > NOW() - INTERVAL '30 days'
) u ON u.id = agg.user_id;
Or use a lateral join backed by an index on orders(user_id). The key lesson: the planner chose a seq scan on orders because the query touches the orders of 61,847 users — a meaningful fraction of the table. An index scan would mean 61,847 rounds of random I/O, which costs more than one sequential read.
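The lateral variant mentioned above could look like the following sketch. It assumes a plain index on `orders(user_id)`, and it only wins when the user cohort is small relative to the orders table — exactly the opposite of the situation in the plan above, which is why you measure rather than guess:

```sql
-- Hypothetical rewrite: one indexed aggregate per qualifying user.
-- Assumes CREATE INDEX ON orders (user_id) exists.
SELECT u.id, u.email, o.order_count
FROM users u
CROSS JOIN LATERAL (
  SELECT COUNT(*) AS order_count
  FROM orders
  WHERE orders.user_id = u.id
) o
WHERE u.created_at > NOW() - INTERVAL '30 days';
```

Run both shapes under EXPLAIN (ANALYZE, BUFFERS) and compare the Buffers lines — the winner is whichever does less total I/O for your actual cohort size.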
The Row Estimate Disaster
This is the most common source of bad plans. When the planner's estimated rows differ dramatically from actual rows, every downstream decision (join strategy, memory allocation, parallelism) is based on garbage input.
EXPLAIN ANALYZE
SELECT * FROM events
WHERE payload->>'type' = 'checkout'
AND created_at > NOW() - INTERVAL '7 days';
Seq Scan on events (cost=0.00..234221.44 rows=12 width=842)
(actual time=0.234..18221.334 rows=847234 loops=1)
Filter: ((payload->>'type' = 'checkout') AND (created_at > ...))
Rows Removed by Filter: 9152766
Estimated 12 rows. Actual 847,234 rows. That's a 70,000x estimation error. The planner has no statistics on payload->>'type' because it's a JSONB expression — it defaults to a generic 0.5% selectivity guess.
This matters because every downstream decision inherits the error. If this result feeds a nested loop join the planner chose expecting 12 rows, the inner side executes 847K times instead of 12, and execution time explodes.
The fix: Create a statistics object on the expression:
-- PostgreSQL 14+: expression statistics
CREATE STATISTICS events_payload_type ON (payload->>'type') FROM events;
ANALYZE events;
-- Or better: create an expression index, which also gives the planner statistics
CREATE INDEX idx_events_payload_type ON events ((payload->>'type'));
ANALYZE events;
After ANALYZE, re-run your query. The estimate will be based on actual histogram data.
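To confirm the fix landed, you can check the estimate without executing the query, and (on PostgreSQL 14+) inspect the collected expression statistics via the `pg_stats_ext_exprs` view:

```sql
-- Plain EXPLAIN (no execution): the estimate should now be in the
-- hundreds of thousands, not 12
EXPLAIN
SELECT * FROM events
WHERE payload->>'type' = 'checkout'
  AND created_at > NOW() - INTERVAL '7 days';

-- Inspect the per-expression histogram the planner now has
SELECT expr, n_distinct, most_common_vals
FROM pg_stats_ext_exprs
WHERE statistics_name = 'events_payload_type';
```

If the estimate is still off by orders of magnitude, ANALYZE may not have run yet, or autovacuum's sampling may need a higher statistics target for this table.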
Bitmap Scan: When and Why
EXPLAIN ANALYZE
SELECT * FROM products
WHERE category_id IN (3, 7, 12, 45, 99)
AND price BETWEEN 100 AND 500
AND in_stock = true;
Bitmap Heap Scan on products (cost=234.44..8921.33 rows=892 width=234)
(actual time=3.445..45.223 rows=847 loops=1)
Recheck Cond: ((category_id = ANY ('{3,7,12,45,99}'::integer[]))
AND (price >= 100) AND (price <= 500))
Heap Blocks: exact=712
-> BitmapAnd (cost=234.44..234.44 rows=892 width=0)
(actual time=3.112..3.113 rows=0 loops=1)
-> Bitmap Index Scan on idx_products_category_id
(cost=0.00..98.22 rows=8921 width=0)
(actual time=1.334..1.334 rows=8921 loops=1)
Index Cond: (category_id = ANY ('{3,7,12,45,99}'::integer[]))
-> Bitmap Index Scan on idx_products_price
(cost=0.00..134.22 rows=45221 width=0)
(actual time=1.667..1.668 rows=45221 loops=1)
Index Cond: ((price >= 100) AND (price <= 500))
A bitmap scan builds an in-memory bitmap of matching page locations, then fetches only those pages. BitmapAnd intersects the bitmaps from two separate indexes. This is PostgreSQL combining two single-column indexes at query time — which often makes a dedicated composite index unnecessary.
The Recheck Cond line means the bitmap had lossy pages (too many matches to track exactly, so it flagged entire pages and rechecks the condition at the heap). If you're seeing lots of rechecks with large Heap Blocks: lossy=N counts, work_mem is too low for bitmap operations.
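A quick diagnostic sketch for the lossy case: bump `work_mem` for the session and re-run, watching the Heap Blocks line (the query below is the one from this section):

```sql
-- If the plan shows "Heap Blocks: exact=... lossy=...", try a larger
-- work_mem and compare: lossy should drop to 0 and rechecks get cheaper
SET work_mem = '32MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE category_id IN (3, 7, 12, 45, 99)
  AND price BETWEEN 100 AND 500
  AND in_stock = true;
RESET work_mem;
```

In the plan above, Heap Blocks: exact=712 with no lossy count means the bitmap fit in memory — nothing to fix there.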
Nested Loop vs Hash Join vs Merge Join
These aren't interchangeable. The planner chooses based on input sizes and available indexes.
| Strategy | When optimal | When terrible |
|---|---|---|
| Nested Loop | Small outer, indexed inner | Large outer × large inner |
| Hash Join | Medium/large unsorted inputs | Low work_mem (spills to disk) |
| Merge Join | Both inputs pre-sorted | Sorting cost exceeds join cost |
You can force the planner to avoid a strategy (useful for testing):
SET enable_nestloop = off;
SET enable_hashjoin = off;
EXPLAIN ANALYZE SELECT ...; -- forces merge join
RESET enable_nestloop;
RESET enable_hashjoin;
Never leave these set in production. This is a diagnostic tool. If disabling nested loops makes your query 10x faster, the real fix is correcting the statistics or adding a covering index on the inner table.
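A covering index, for reference, is a sketch like this — it lets the nested loop's inner side become an Index Only Scan that never touches the heap (the `total` column in the INCLUDE list is illustrative; PostgreSQL 11+):

```sql
-- Covering index: user_id drives the lookup, INCLUDE carries the
-- columns the query reads, so the heap is never visited
CREATE INDEX idx_orders_user_id_covering
  ON orders (user_id) INCLUDE (id, total);

-- Verify: the inner node should now read "Index Only Scan using
-- idx_orders_user_id_covering" with "Heap Fetches: 0" (or close to it)
```

High Heap Fetches on an Index Only Scan means the visibility map is stale — a VACUUM on the table usually brings it down.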
Spotting the ORM N+1 in EXPLAIN
ORMs generate N+1 patterns that don't look like N+1s from the application side. With Prisma or TypeORM, what ships as a single query can still run a nested loop with loops=N — the N+1 has simply been pushed into the database:
Nested Loop (cost=0.56..48221.34 rows=8921 width=234)
(actual time=0.234..12334.445 rows=8921 loops=1)
-> Seq Scan on users (cost=0.00..892.34 rows=8921 width=44)
(actual time=0.021..12.334 rows=8921 loops=1)
-> Index Scan using idx_orders_user_id on orders
(cost=0.56..5.34 rows=1 width=190)
(actual time=1.334..1.345 rows=1 loops=8921) -- <-- 8921 executions!
Index Cond: (user_id = users.id)
See loops=8921 on the inner index scan? That's your N+1, inside the database. Each user triggers a separate index lookup on orders. The fix here depends on whether you actually need all orders per user or just a count — but the answer is always a join or a subquery aggregation, not a loop.
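If only a count per user is needed, the set-based rewrite is a sketch like this — one aggregation pass over orders instead of 8,921 index probes:

```sql
-- Aggregate once, join once. Users with no orders are kept via
-- LEFT JOIN and COALESCE.
SELECT u.id, u.email, COALESCE(agg.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
) agg ON agg.user_id = u.id;
```

Whether this beats the nested loop depends on how many users qualify: for a handful of users the indexed loop is fine; for thousands, the single aggregation pass wins.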
The Fix Workflow
- Run `EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)` — always with `BUFFERS` to see I/O
- Find the node with the highest `actual time` — that's your bottleneck
- Check the row estimate vs actual on that node's input
- If estimates are wildly off: run `ANALYZE tablename` and check for missing statistics on expressions/JSONB
- If seq scan on a large table: check if a selective index would help, or if the query can be restructured
- If a hash or sort is spilling to disk (`Batches > 1`, `Disk Usage > 0`): increase `work_mem` for that session
- If a nested loop has a high `loops` count: consider whether a hash join would be better; check the statistics
One more thing: for DML, always run EXPLAIN ANALYZE inside a transaction you roll back — it actually executes the statement. And always test against production-scale data, not dev, because row counts and data distribution are what drive the planner's decisions.
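The rollback pattern is a short sketch (the `status` and `created_at` columns here are illustrative):

```sql
-- EXPLAIN ANALYZE executes the statement for real, so wrap DML in a
-- transaction and throw the changes away
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders
SET status = 'archived'
WHERE created_at < NOW() - INTERVAL '2 years';
ROLLBACK;  -- the UPDATE's effects are discarded, the plan is kept
```

Note that even rolled back, the statement takes locks and generates WAL while it runs — schedule this carefully on busy production tables.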
Building data-intensive applications where query performance isn't an afterthought? The team at Aunimeda builds production-grade systems with PostgreSQL, proper indexing strategies, and performance budgets baked in from day one. See our custom software development services or get in touch to talk through your architecture.