Introduction to Analyzing Queries with PostgreSQL EXPLAIN

In the world of relational databases, performance is paramount. Slow queries can cripple application responsiveness, frustrate users, and consume excessive server resources. PostgreSQL, a powerful and popular open-source object-relational database system, provides a critical tool for understanding and optimizing query performance: the EXPLAIN command. Mastering EXPLAIN is an essential skill for any developer or database administrator working with PostgreSQL, allowing you to peer under the hood and see exactly how the database intends to execute your SQL queries.

This article provides a comprehensive introduction to using EXPLAIN in PostgreSQL. We will cover its basic usage, delve into interpreting the query plan output, explore advanced options like ANALYZE and BUFFERS, examine common plan nodes, discuss how to identify performance bottlenecks, and touch upon strategies for improving query performance based on EXPLAIN analysis. By the end, you’ll have a solid foundation for using this indispensable tool to make your PostgreSQL queries faster and more efficient.

Why Query Analysis Matters

Before diving into EXPLAIN, let’s briefly reiterate why understanding query execution is so important:

  1. Performance Bottlenecks: Inefficient queries are often the primary source of performance issues in database-driven applications. Identifying these queries is the first step towards optimization.
  2. Resource Consumption: Slow queries can consume significant CPU, memory (RAM), and I/O resources, potentially impacting other operations on the database server.
  3. Scalability: As data volumes grow, poorly written queries that performed adequately on small datasets can become prohibitively slow. Understanding execution plans helps write queries that scale better.
  4. Informed Optimization: Without knowing how a query is executed, attempts at optimization (like adding indexes) are often guesswork. EXPLAIN provides the data needed for targeted improvements.
  5. Understanding the Planner: It helps you understand the decisions made by the PostgreSQL query planner, leading to better intuition about how different SQL constructs and schema designs influence performance.

The Basics: EXPLAIN Your Query

The fundamental purpose of the EXPLAIN command is to show the execution plan that the PostgreSQL planner generates for a given SQL statement. It does not execute the query (unless you use the ANALYZE option, discussed later); it only reveals the planner’s strategy.

The basic syntax is straightforward:

```sql
EXPLAIN <your_sql_query>;
```

For example, let’s consider a simple table and query:

```sql
-- Sample table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login TIMESTAMP,
    status VARCHAR(10) DEFAULT 'active'
);

-- Add some indexes
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_status ON users (status);

-- Insert some sample data (simplified)
INSERT INTO users (username, email, status) VALUES
    ('alice', 'alice@example.com', 'active'),
    ('bob', 'bob@example.com', 'inactive'),
    ('charlie', 'charlie@example.com', 'active'),
    ('david', 'david@example.com', 'active');

-- Now, let's EXPLAIN a simple query
EXPLAIN SELECT id, username FROM users WHERE status = 'active';
```

Running this EXPLAIN command might produce output similar to this (the exact costs and rows will vary based on your PostgreSQL version, configuration, and the actual data/statistics):

```text
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=3 width=13)
   Filter: (status = 'active'::text)
(2 rows)
```

Interpreting the Basic Output

Even this simple output tells us several things:

  1. QUERY PLAN: This indicates the start of the execution plan.
  2. Seq Scan on users: This is the core operation, called a plan node. It means PostgreSQL plans to read the users table sequentially (from beginning to end).
  3. cost=0.00..1.05: This provides the planner’s estimated cost for this operation.
    • 0.00 (Startup Cost): The estimated cost to retrieve the first row.
    • 1.05 (Total Cost): The estimated cost to retrieve all rows for this node.
    • Important: Cost units are arbitrary and relative. They don’t directly map to seconds or milliseconds. They represent units of work, roughly calibrated to sequential page fetches. Lower cost is generally better. The startup cost is crucial for queries with LIMIT.
  4. rows=3: The estimated number of rows that this node will output.
  5. width=13: The estimated average width (in bytes) of the rows output by this node.
  6. Filter: (status = 'active'::text): This shows a condition applied after fetching rows during the sequential scan. Rows are read, and then checked against this filter.

In this case, the planner chose a Sequential Scan because the table is very small, and it estimated that reading the whole table and filtering would be cheaper than using the idx_users_status index.
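
If you are curious what the index-based plan would look like, one common diagnostic trick (a sketch only; not something to leave enabled in production) is to make sequential scans look prohibitively expensive for the current session:

```sql
-- Discourage sequential scans for this session only, then compare plans
SET enable_seqscan = off;
EXPLAIN SELECT id, username FROM users WHERE status = 'active';
RESET enable_seqscan;
```

On a table this small, the index plan will typically show a higher estimated cost, which is exactly why the planner ignored idx_users_status in the first place.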

Understanding the Query Plan Structure

Query plans are typically represented as a tree of nodes. Execution generally starts from the bottom (leaf nodes) and flows upwards towards the root node. Each node performs a specific operation (like scanning a table, joining tables, sorting results) and passes its output rows to its parent node.

Consider a slightly more complex query:

```sql
EXPLAIN SELECT u.username, o.order_date
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.amount > 100
ORDER BY o.order_date DESC;
```

(Assuming an orders table exists with id, user_id, order_date, amount columns and appropriate indexes)
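
If you want to follow along, a minimal sketch of such an orders table might look like this (column types and the index name are illustrative, not taken from the article's schema):

```sql
-- Minimal orders table matching the columns assumed above (illustrative)
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id),
    order_date DATE NOT NULL,
    amount NUMERIC(10, 2) NOT NULL
);

-- Index on the join column, referenced by later examples
CREATE INDEX idx_orders_user_id ON orders (user_id);
```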

The EXPLAIN output might look something like this (simplified):

```text
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=150.25..151.00 rows=300 width=25)
   Sort Key: o.order_date DESC
   ->  Hash Join  (cost=55.50..130.75 rows=300 width=25)
         Hash Cond: (o.user_id = u.id)
         ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=16)
               Filter: (amount > 100)
         ->  Hash  (cost=45.00..45.00 rows=800 width=17)
               ->  Seq Scan on users u  (cost=0.00..45.00 rows=800 width=17)
                     Filter: (status = 'active'::text)
(9 rows)
```

Let’s break down this tree structure:

  1. Root Node (Sort): The final operation is sorting the results by order_date in descending order. It receives rows from its child node.
  2. Child of Sort (Hash Join): This node joins rows from its two children based on the user_id = u.id condition using a Hash Join algorithm.
  3. Children of Hash Join:
    • Seq Scan on orders o: Reads the orders table sequentially, filtering for amount > 100. The results of this scan are used to probe the hash table built by the other child.
    • Hash: This node takes the output of its child (Seq Scan on users u) and builds an in-memory hash table keyed on u.id (the join key taken from the Hash Cond of the parent Hash Join).
    • Child of Hash (Seq Scan on users u): Reads the users table sequentially, filtering for status = 'active'. Its output is fed into the Hash node.

Execution flow (simplified):
* Scan users, filter by status, and build an in-memory hash table keyed on users.id.
* Scan orders, filter by amount.
* For each row from the orders scan, probe the hash table using user_id to find matching users rows.
* Combine matching rows (join).
* Sort the resulting joined rows by order_date.
* Return the final sorted result.

The indentation helps visualize the parent-child relationships in the plan tree.

EXPLAIN ANALYZE: Getting Real Performance Data

While EXPLAIN shows the planned execution and estimated costs/rows, these are based on table statistics and cost models, which might not always be perfectly accurate. To see what actually happens when the query runs, including actual timings and row counts, you use EXPLAIN ANALYZE:

```sql
EXPLAIN ANALYZE <your_sql_query>;
```

Crucial Warning: EXPLAIN ANALYZE executes the query!
* For SELECT queries, this is usually safe, although it still consumes resources.
* For INSERT, UPDATE, DELETE, or DDL statements, it will perform the data modification. If you want to analyze a modification query without changing data, wrap it in a transaction and roll it back:

```sql
BEGIN;
EXPLAIN ANALYZE UPDATE users SET last_login = NOW() WHERE id = 1;
ROLLBACK; -- Reverts the UPDATE
```

Let’s re-run our first simple example with ANALYZE:

```sql
EXPLAIN ANALYZE SELECT id, username FROM users WHERE status = 'active';
```

The output will now include actual performance metrics:

```text
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=3 width=13) (actual time=0.015..0.018 rows=3 loops=1)
   Filter: (status = 'active'::text)
   Rows Removed by Filter: 1
 Planning Time: 0.080 ms
 Execution Time: 0.035 ms
(5 rows)
```

New Metrics Introduced by ANALYZE

  1. actual time=0.015..0.018: The actual time taken, in milliseconds.
    • 0.015 (Actual Startup Time): Time taken to get the first row from this node.
    • 0.018 (Actual Total Time): Time taken for this node to produce all its output rows. Crucially, this total time includes the time spent in all its child nodes. Therefore, the top-level node’s total time approximates the total execution time for that part of the plan.
  2. rows=3: The actual number of rows output by this node during execution. Compare this to the estimated rows from the plain EXPLAIN. Large discrepancies often indicate outdated or inaccurate table statistics.
  3. loops=1: The number of times this specific node was executed. For simple plans, it’s usually 1. In nested loops or subplans, a node might be executed multiple times. The actual time reported is the average time per loop, and the total time for the node across all loops is actual total time * loops.
  4. Rows Removed by Filter: (Optional, shown when filtering occurs) Indicates how many rows were read but discarded by the Filter condition.
  5. Planning Time: The time taken by the PostgreSQL planner to generate the query plan itself.
  6. Execution Time: The total time taken to execute the query plan, from the start of the first node to the completion of the root node.

Comparing estimated rows (rows=3) with actual rows (rows=3) here shows the planner’s estimate was accurate for this simple case. The query executed very quickly (0.035 ms).

Common Query Plan Nodes Explained

Understanding the different types of nodes that can appear in a query plan is key to interpreting EXPLAIN output. Here are some of the most common ones:

1. Scan Nodes (Reading data from tables)

  • Seq Scan (Sequential Scan):

    • What: Reads the entire table block by block from beginning to end.
    • When Chosen: For small tables; when no suitable index exists for the WHERE clause; when the query needs to retrieve a large percentage of the table’s rows (making index lookups less efficient).
    • Cost Factors: Primarily depends on the table size (number of blocks/pages). Filtering adds CPU cost.
    • Optimization: If a Seq Scan on a large table is slow and the WHERE clause is selective (returns a small fraction of rows), consider adding an index on the relevant column(s). Ensure statistics are up-to-date (ANALYZE <table_name>).
    • Example Output: Seq Scan on users (cost=0.00..45.00 rows=800 width=17)
  • Index Scan:

    • What: Uses an index to find the locations (TIDs – tuple identifiers) of relevant rows and then fetches those rows from the table’s main data storage (the “heap”).
    • When Chosen: When a WHERE clause condition matches an available index and is selective enough that the planner estimates fewer page accesses compared to a Seq Scan. Often used for equality (=), range (<, >, BETWEEN), or IN conditions on indexed columns.
    • Cost Factors: Cost of traversing the index B-tree + cost of fetching each qualifying row from the table heap. Can become expensive if many rows match (many random heap accesses).
    • Optimization: Ensure the index is selective. Sometimes a multi-column index is needed. Can potentially be improved to an Index Only Scan.
    • Example Output: Index Scan using idx_users_email on users (cost=0.28..8.29 rows=1 width=50) (assuming a query like WHERE email = '...')
      • Often includes Index Cond: showing the condition used for the index lookup.
  • Index Only Scan:

    • What: Similar to an Index Scan, but retrieves all required data directly from the index itself, without needing to access the table heap. This is much faster if applicable.
    • When Chosen: When all columns needed by the query (in SELECT, WHERE, ORDER BY, etc.) are available within the index definition, AND the visibility map indicates that all relevant rows are known to be visible to all transactions (avoiding the need to check the heap for visibility info).
    • Cost Factors: Cost of traversing the index. Significantly cheaper than an Index Scan as it avoids random heap I/O.
    • Optimization: Ensure indexes cover all needed columns for specific queries. Run VACUUM regularly to keep the visibility map up-to-date, making Index Only Scans more likely (a covering-index sketch follows this list of scan nodes).
    • Example Output: Index Only Scan using users_pkey on users (cost=0.28..4.30 rows=1 width=4) (assuming SELECT id FROM users WHERE id = 1)
  • Bitmap Heap Scan (combined with Bitmap Index Scan):

    • What: A two-phase approach. First, one or more Bitmap Index Scan nodes scan index(es) to create an in-memory bitmap of potential row locations. Then, the Bitmap Heap Scan node fetches the actual rows from the heap in physical location order (more efficient than random access for multiple rows).
    • When Chosen: Often used when combining conditions on multiple indexes using OR or AND; when an index is moderately selective (returning too many rows for an Index Scan to be efficient but fewer than a Seq Scan); sometimes for IN clauses with many values.
    • Cost Factors: Cost of index scan(s) + cost of building bitmap + cost of fetching heap pages (optimized for locality).
    • Optimization: Analyze the underlying Bitmap Index Scan(s). Ensure appropriate indexes exist. Sometimes adjusting work_mem can influence bitmap creation.
    • Example Output:
      ```text
      Bitmap Heap Scan on users  (cost=12.50..34.75 rows=100 width=50)
        Recheck Cond: ((status = 'active') OR (email LIKE '%@domain.com'))
        ->  BitmapOr  (cost=12.50..12.50 rows=100 width=0)
              ->  Bitmap Index Scan on idx_users_status  (cost=0.00..4.25 rows=50 width=0)
                    Index Cond: (status = 'active')
              ->  Bitmap Index Scan on idx_users_email  (cost=0.00..8.00 rows=50 width=0)
                    Index Cond: (email LIKE '%@domain.com')  -- Note: LIKE often requires specific index types like trigram
      ```
  • TID Scan (Tuple ID Scan):

    • What: Fetches rows directly based on their physical location (TID).
    • When Chosen: Typically used for conditions like WHERE ctid = '...'. Not common in general application queries but can be seen internally or in specific use cases.
    • Cost Factors: Very fast for fetching a known, small set of rows by physical address.
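
To make the Index Only Scan advice above concrete, here is a sketch of a covering index for a query that only needs username for a given status. The index name is illustrative, and the INCLUDE clause requires PostgreSQL 11 or later:

```sql
-- Covering index: status is the search key, username is carried in the index leaf pages
CREATE INDEX idx_users_status_inc_username ON users (status) INCLUDE (username);

-- Keep the visibility map current so heap visibility checks can be skipped
VACUUM ANALYZE users;

-- With enough data, this query can be answered entirely from the index
EXPLAIN SELECT username FROM users WHERE status = 'active';
```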

2. Join Nodes (Combining data from multiple tables)

  • Nested Loop Join:

    • What: The simplest join algorithm. It iterates through rows from the outer relation (the first child node). For each outer row, it scans the inner relation (the second child node) to find matching rows based on the join condition.
    • When Chosen: Typically when one of the relations (usually the inner one) is very small. Often used when the inner side can efficiently use an index lookup for each outer row (e.g., Index Scan as the inner plan).
    • Cost Factors: Cost is roughly (Cost of Outer Scan) + (Outer Rows * Cost of Inner Scan per row). Can be very slow if both relations are large and the inner scan is inefficient (e.g., Seq Scan).
    • Optimization: Ensure an index exists on the join key column in the inner table.
    • Example Output:
      ```text
      Nested Loop  (cost=0.28..20.35 rows=10 width=60)
        ->  Seq Scan on users u  (cost=0.00..1.05 rows=5 width=50)  -- Outer
              Filter: (status = 'active')
        ->  Index Scan using idx_orders_user_id on orders o  (cost=0.28..3.85 rows=2 width=16)  -- Inner (per outer row)
              Index Cond: (user_id = u.id)
      ```
    • Variations: Sometimes includes Materialize on the inner side if it’s scanned repeatedly without an index.
  • Hash Join:

    • What: A two-phase join.
      1. Build Phase: Scans the inner relation (usually the smaller one after filtering) and builds an in-memory hash table using the join key(s).
      2. Probe Phase: Scans the outer relation and, for each row, probes the hash table using the join key(s) to find matches.
    • When Chosen: Generally efficient for joining large tables when an equijoin condition (=) is used. Requires sufficient memory (work_mem) to hold the hash table of the inner relation.
    • Cost Factors: Cost of scanning both tables + cost of building the hash table + cost of probing. Can spill to disk if the hash table exceeds work_mem, significantly increasing cost.
    • Optimization: Increase work_mem if hash joins are spilling (check with EXPLAIN (ANALYZE, BUFFERS)). Ensure statistics are accurate so the planner chooses the smaller relation for the build phase.
    • Example Output: (Seen earlier)
      ```text
      Hash Join  (cost=55.50..130.75 rows=300 width=25)
        Hash Cond: (o.user_id = u.id)
        ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=16)  -- Probe side
        ->  Hash  (cost=45.00..45.00 rows=800 width=17)  -- Build side input
              ->  Seq Scan on users u  (cost=0.00..45.00 rows=800 width=17)
      ```
  • Merge Join (Merge Sort Join):

    • What: Requires both input relations to be sorted on the join key(s). It then reads through both sorted streams concurrently, matching rows like merging two sorted lists.
    • When Chosen: When both inputs are already sorted (e.g., from an index scan or a previous sort operation), or when the planner estimates that sorting both inputs and then merging is cheaper than a Hash Join (often for very large tables where hash tables might spill, or for non-equijoin conditions like >).
    • Cost Factors: Cost of scanning both tables + cost of sorting if inputs are not already sorted. The merge phase itself is typically very cheap.
    • Optimization: If Merge Join is chosen and requires explicit Sort nodes underneath, ensure work_mem is sufficient for sorting. Pre-sorting via indexes (on join keys) can make Merge Join very efficient.
    • Example Output:
      ```text
      Merge Join  (cost=100.50..150.75 rows=500 width=40)
        Merge Cond: (u.id = o.user_id)
        ->  Index Scan using users_pkey on users u  (cost=0.28..60.30 rows=1000 width=17)  -- Already sorted by ID
        ->  Sort  (cost=100.22..102.72 rows=1000 width=23)  -- Explicit sort needed
              Sort Key: o.user_id
              ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=23)
      ```
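
All of these join strategies are cost-based choices, so it can be instructive to see what the planner would do if its preferred method were unavailable. A diagnostic sketch using the planner's enable_* toggles (session-local settings; not meant for production use):

```sql
-- Baseline plan (often a Hash Join for an equijoin over larger tables)
EXPLAIN SELECT u.username, o.order_date
FROM users u JOIN orders o ON u.id = o.user_id;

-- Temporarily rule out hash joins and compare the estimated costs
SET enable_hashjoin = off;
EXPLAIN SELECT u.username, o.order_date
FROM users u JOIN orders o ON u.id = o.user_id;
RESET enable_hashjoin;
```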

3. Other Important Nodes

  • Sort:

    • What: Sorts the input rows based on specified key(s).
    • When Chosen: Required for ORDER BY clauses (unless an index provides the required order), Merge Joins, certain GROUP BY strategies, DISTINCT operations.
    • Cost Factors: CPU and memory usage. Can be very expensive, especially if the data to be sorted exceeds work_mem and needs to spill to disk (look for "Sort Method: external merge  Disk: ..." in EXPLAIN ANALYZE output).
    • Optimization: Increase work_mem if sorts spill to disk (a session-level sketch follows this section). See if an index can satisfy the ORDER BY clause to avoid the sort entirely. Limit the number of rows being sorted if possible.
    • Example Output: Sort (cost=150.25..151.00 rows=300 width=25)
  • Aggregate / HashAggregate / GroupAggregate:

    • What: Performs aggregation operations (like COUNT, SUM, AVG, MAX, MIN) usually associated with a GROUP BY clause or aggregate functions without GROUP BY. HashAggregate uses an in-memory hash table (similar to Hash Join) which is often faster for many groups. GroupAggregate is often used when the input is pre-sorted.
    • When Chosen: Whenever aggregate functions or GROUP BY are used.
    • Cost Factors: Depends on the number of input rows and the number of distinct groups. HashAggregate can spill to disk if the hash table exceeds work_mem.
    • Optimization: Increase work_mem if spilling occurs. Ensure indexes support any filtering before aggregation. Sometimes rewriting the query or using window functions can be more efficient.
    • Example Output: HashAggregate (cost=180.00..182.50 rows=200 width=32)
  • Limit:

    • What: Stops processing after retrieving the specified number of rows.
    • When Chosen: Directly corresponds to the LIMIT clause in SQL.
    • Cost Factors: Often very cheap, especially the startup cost. Significantly impacts total cost if it allows the planner to avoid processing large parts of the underlying plan (e.g., stopping an Index Scan early).
    • Optimization: Combining ORDER BY with LIMIT is often very efficient if an index matches the ORDER BY clause.
    • Example Output: Limit (cost=0.28..4.58 rows=10 width=50) (for LIMIT 10)
  • Materialize:

    • What: Caches the output of its child node in memory. Subsequent requests for rows from the parent node read from this cache instead of re-executing the child.
    • When Chosen: Often used in Nested Loop joins when the inner side is scanned multiple times and isn’t using an efficient index lookup, or in CTEs (WITH clauses) that are referenced multiple times.
    • Cost Factors: Adds a small overhead for the initial population but can save significant cost if the child subplan is expensive and executed many times. Uses memory.
    • Example Output: -> Materialize (cost=10.00..15.00 rows=50 width=20)
  • Gather / Gather Merge:

    • What: Nodes related to parallel query execution. Worker processes execute parts of the plan (the subplan under Gather or Gather Merge), and this node collects the results from the workers. Gather Merge preserves sort order from parallel workers.
    • When Chosen: The planner decides based on cost estimates whether parallelizing parts of the query (like Seq Scans, Hash Joins) would be faster using multiple CPU cores. Controlled by settings like max_parallel_workers_per_gather.
    • Cost Factors: Involves coordination overhead but can significantly speed up execution on multi-core systems for CPU-bound or I/O-bound tasks amenable to parallelization.
    • Example Output: Gather (cost=1000.00..8000.00 rows=10000 width=50)
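
Several of the nodes above (Sort, Hash Join, HashAggregate) can spill to disk when their working set exceeds work_mem. A quick, session-local way to test whether more memory would help, sketched here with an illustrative value:

```sql
-- Check whether the sort spills: look for "Sort Method: external merge  Disk: ..."
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY order_date DESC;

-- Try a larger work_mem for this session only (value is illustrative), then re-check
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY order_date DESC;
RESET work_mem;
```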

Advanced EXPLAIN Options

Beyond the basic EXPLAIN and EXPLAIN ANALYZE, several options can provide even more insight:

```sql
EXPLAIN (option [, ...]) <your_sql_query>;
```

Common options include:

  • ANALYZE [boolean]: Executes the query and shows actual run times and row counts. Defaults to false. We’ve covered this extensively.
  • VERBOSE [boolean]: Displays additional information, such as schema-qualified table and function names, output column lists for each node, and labels for variables. Can make complex plans easier to read. Defaults to false.
  • COSTS [boolean]: Includes the estimated startup and total costs for each node. Defaults to true. Setting it to false can slightly simplify the output if you only care about node types and row counts.
  • SETTINGS [boolean]: Includes information on configuration parameters that affect query planning and were changed from the built-in defaults (e.g., work_mem, enable_seqscan). Useful for understanding why the planner made certain choices. Defaults to false.
  • BUFFERS [boolean]: Requires ANALYZE. Shows information about buffer usage (memory and disk I/O). This is extremely valuable for identifying I/O bottlenecks. Defaults to false.
  • WAL [boolean]: Requires ANALYZE. Includes information about Write-Ahead Log (WAL) record generation. Useful for analyzing the impact of DML statements (INSERT, UPDATE, DELETE). Defaults to false.
  • TIMING [boolean]: Requires ANALYZE. Includes the actual startup and total time spent in each node. Defaults to true when ANALYZE is used. Turning it off can slightly reduce profiling overhead if you only need buffer or row count info from ANALYZE.
  • SUMMARY [boolean]: Includes a summary line at the end (total planning and execution time). Defaults to true when ANALYZE is used, false otherwise.
  • FORMAT { TEXT | XML | JSON | YAML }: Specifies the output format. Defaults to TEXT. Structured formats like JSON are excellent for programmatic analysis or feeding into visualization tools.
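
Options can be combined freely. For example, a fairly verbose invocation (remember that ANALYZE actually executes the query; the SETTINGS option requires PostgreSQL 12 or later) might look like this:

```sql
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, FORMAT TEXT)
SELECT id, username FROM users WHERE status = 'active';
```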

Deep Dive: EXPLAIN (ANALYZE, BUFFERS)

The BUFFERS option is arguably one of the most useful additions to ANALYZE. It reveals how different parts of the plan interact with PostgreSQL’s buffer cache (shared_buffers) and the disk.

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE id = 500;
```

Output might look like:

```text
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.29..8.30 rows=1 width=75) (actual time=0.035..0.036 rows=1 loops=1)
   Index Cond: (id = 500)
   Buffers: shared hit=4
 Planning Time: 0.110 ms
 Execution Time: 0.055 ms
(5 rows)
```

```sql
-- Another example, perhaps involving disk reads
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM large_table;
```

```text
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=19005.00..19005.01 rows=1 width=8) (actual time=155.230..155.231 rows=1 loops=1)
   Buffers: shared hit=5 read=10000
   ->  Gather  (cost=19004.00..19004.01 rows=2 width=8) (actual time=155.100..155.220 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=5 read=10000
         ->  Partial Aggregate  (cost=18004.00..18004.01 rows=1 width=8) (actual time=148.500..148.501 rows=1 loops=3)
               Buffers: shared hit=5 read=10000
               ->  Parallel Seq Scan on large_table  (cost=0.00..17500.00 rows=201600 width=0) (actual time=0.015..135.120 rows=166667 loops=3)
                     Buffers: shared hit=5 read=10000
 Planning Time: 0.150 ms
 Execution Time: 155.300 ms
(11 rows)
```

Interpreting Buffers output:

  • shared hit=N: Number of buffer hits in PostgreSQL’s shared buffer cache. This means the required data block was already found in RAM. Hits are fast.
  • shared read=N: Number of buffer reads. The data block was not in the shared buffer cache and had to be read from the operating system / disk. Reads are slow (especially from spinning disks). High read counts often indicate I/O bottlenecks.
  • shared dirtied=N: Number of buffers that were modified by this query within the shared buffer cache (relevant for UPDATE/INSERT).
  • shared written=N: Number of previously dirtied buffers that were written out to disk by this backend during this query’s execution.
  • local hit/read/dirtied/written: Buffer activity for temporary tables, which use backend-local buffers rather than the shared buffer cache.
  • temp read/written: Number of blocks read from / written to temporary files (e.g., for sorts or hash joins exceeding work_mem). These indicate disk spilling and are usually performance killers.

Analyzing BUFFERS helps pinpoint whether a query is slow due to CPU work or because it’s waiting for data from disk (high read or temp read/written counts).
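
One related, optional addition: if the track_io_timing parameter is enabled (it is off by default, and changing it at the session level requires sufficient privileges), EXPLAIN (ANALYZE, BUFFERS) also reports the time spent waiting on block reads and writes, separating cheap cache misses from genuinely slow disk I/O. A sketch:

```sql
-- Requires privileges to change track_io_timing (off by default)
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM large_table;
-- Look for "I/O Timings: read=... write=..." lines in the output
RESET track_io_timing;
```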

Using FORMAT JSON

The JSON format is incredibly useful for tools:

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT id FROM users WHERE status = 'active';
```

This produces a detailed JSON structure representing the plan tree, including all the metrics from ANALYZE and BUFFERS, suitable for parsing and visualization.

```json
[
  {
    "Plan": {
      "Node Type": "Seq Scan",
      "Parallel Aware": false,
      "Relation Name": "users",
      "Alias": "users",
      "Startup Cost": 0.00,
      "Total Cost": 1.05,
      "Plan Rows": 3,
      "Plan Width": 4,
      "Actual Startup Time": 0.012,
      "Actual Total Time": 0.015,
      "Actual Rows": 3,
      "Actual Loops": 1,
      "Filter": "(status = 'active'::text)",
      "Rows Removed by Filter": 1,
      "Shared Hit Blocks": 1,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0
    },
    "Planning Time": 0.065,
    "Triggers": [
    ],
    "Execution Time": 0.038
  }
]
```

Interpreting the Plan: A Practical Approach

Analyzing an EXPLAIN ANALYZE output involves several steps:

  1. Check Planning vs. Execution Time: Is a significant amount of time spent planning the query? This is rare but can happen with extremely complex queries or specific planner bugs. Usually, execution time dominates.
  2. Look at the Top Node’s Actual Total Time: This is your baseline total execution time.
  3. Compare Estimated vs. Actual Rows: Start from the leaf nodes and work up. Are there major discrepancies between rows= (estimated) and actual rows=?
    • Huge Discrepancy (Orders of Magnitude): This is a red flag! It usually means the planner is working with inaccurate statistics about the data distribution in your tables. The most common fix is to run ANALYZE <table_name>; (or VACUUM ANALYZE <table_name>;) on the relevant table(s) to update the statistics. Poor estimates can lead the planner to choose suboptimal join methods (e.g., Nested Loop instead of Hash Join) or scan types.
  4. Identify High-Cost / High-Time Nodes: Scan the actual total time for each node. Which nodes contribute most to the overall execution time? Remember that a parent node’s time includes its children’s time. Focus on nodes with high self time (total time minus children’s total time) or leaf nodes with high time.
  5. Look for Expensive Operations:
    • Seq Scan on Large Tables: Is the table large (check Buffers: shared read=...)? Is the filter selective (check Rows Removed by Filter)? If yes to both, an index might be missing or unusable.
    • Index Scan Fetching Many Rows: If an index scan returns a large percentage of the table (actual rows= is high), a Seq Scan might have been faster. Maybe the index isn’t selective enough, or statistics are off.
    • Bitmap Heap Scan with High Heap Fetches: Check the underlying Bitmap Index Scan(s). Are they efficient? Is the number of rows fetched high?
    • Inefficient Joins (Nested Loop on Large Inner Relation): Usually indicates a missing index on the join key of the inner table.
    • Disk Spilling (Hash Join, Sort): Check EXPLAIN (ANALYZE, BUFFERS) for temp read/written blocks. If present, consider increasing work_mem.
    • High Buffer Reads: Use EXPLAIN (ANALYZE, BUFFERS) to see which nodes are causing disk I/O (shared read). Can caching be improved (larger shared_buffers)? Can the query be rewritten to access data more efficiently (e.g., via an Index Only Scan)?
  6. Relate Plan Nodes to SQL: Understand which part of your SQL query corresponds to which plan node(s). WHERE clauses usually relate to Scan nodes or Filters; JOIN clauses to Join nodes; ORDER BY to Sort nodes; GROUP BY / aggregates to Aggregate nodes.
  7. Check VERBOSE and SETTINGS: Use VERBOSE for clarity on column names and operations. Use SETTINGS if you suspect a non-default configuration parameter is influencing the plan.
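
Step 3 above deserves special emphasis: when estimated and actual row counts diverge wildly, refreshing statistics is usually the cheapest fix to try first. A minimal sketch (the column and statistics target value are illustrative):

```sql
-- Refresh the planner's statistics for the tables involved
ANALYZE users;
ANALYZE orders;

-- For a column with a skewed value distribution, collecting more detailed
-- statistics can further improve estimates (the default target is 100)
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE orders;
```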

Common Performance Problems & How EXPLAIN Helps

  • Problem: Slow query fetching specific rows from a large table.

    • EXPLAIN Shows: Seq Scan with a highly selective Filter (many Rows Removed by Filter). High actual time and potentially high shared read buffers.
    • Solution: Add an index on the column(s) used in the WHERE clause. Run ANALYZE <table_name>.
  • Problem: Query with ORDER BY is slow.

    • EXPLAIN Shows: A costly Sort node, possibly with Sort Method: external merge  Disk: ... in the EXPLAIN ANALYZE output (disk spills also show up as temp blocks when BUFFERS is used).
    • Solution: Add an index matching the ORDER BY columns and direction. Increase work_mem if disk spill occurs. See if LIMIT can be applied before sorting.
  • Problem: Joining two large tables is slow.

    • EXPLAIN Shows: Nested Loop Join where the inner side is a Seq Scan on a large table, OR a Hash Join that spills to disk (temp read/written buffers > 0).
    • Solution: For Nested Loop, add an index on the join key of the inner table. For Hash Join spill, increase work_mem. Ensure statistics are accurate (ANALYZE) so the best join method is chosen.
  • Problem: Query performs poorly after data volume increased significantly.

    • EXPLAIN ANALYZE Shows: Estimated row counts (rows=) differ drastically from actual row counts (actual rows=). Planner chose a bad plan based on old statistics.
    • Solution: Run ANALYZE <table_name>; on all relevant tables. Consider adjusting autovacuum settings for more frequent analysis.
  • Problem: Query using multiple OR conditions is slow.

    • EXPLAIN Shows: Seq Scan or maybe multiple Index Scans combined inefficiently.
    • Solution: Consider using Bitmap Index Scans (may require separate indexes on the OR'd columns) or rewriting the query, perhaps using UNION ALL if appropriate.
  • Problem: Query needing columns a, b, c is using an Index Scan on (a) and fetching rows from heap.

    • EXPLAIN Shows: Index Scan followed by heap fetches (no Index Only Scan). High actual time relative to an index-only operation.
    • Solution: Create or modify an index to include all needed columns: CREATE INDEX ... ON table (a, b, c); or CREATE INDEX ... ON table (a) INCLUDE (b, c);. Run VACUUM <table_name> to update the visibility map.

Improving Queries Based on EXPLAIN

Based on your analysis, common optimization strategies include:

  1. Adding Indexes: The most frequent optimization. Add B-tree indexes for equality/range predicates (WHERE, JOIN, ORDER BY). Consider specialized index types (GIN, GiST, BRIN, Hash) for specific data types or query patterns (full-text search, geometric data, arrays, etc.). Remember to index foreign key columns.
  2. Modifying Indexes: Create covering indexes (using INCLUDE or multi-column indexes) to enable Index Only Scans. Ensure multi-column index order matches query conditions. Drop unused indexes.
  3. Updating Statistics: Run ANALYZE <table_name> or VACUUM ANALYZE <table_name> after significant data changes or if estimates are poor. Tune autovacuum/autoanalyze parameters.
  4. Rewriting Queries: Sometimes, changing the SQL structure can lead to a better plan. Examples:
    • Replacing complex subqueries with CTEs (Common Table Expressions) or temporary tables (though CTEs can sometimes act as optimization fences).
    • Using UNION ALL instead of OR in some cases.
    • Using EXISTS or NOT EXISTS instead of IN or NOT IN with subqueries.
    • Breaking down very complex queries into simpler steps.
  5. Adjusting PostgreSQL Configuration:
    • work_mem: Increase if sorts, hash joins, or hash aggregations spill to disk. Be cautious, as it’s allocated per operation per backend.
    • shared_buffers: Increase (within reason, often ~25% of system RAM) to allow more data/indexes to be cached in memory, reducing disk I/O. Requires a server restart.
    • effective_cache_size: Inform the planner about how much memory is available for disk caching by the OS and PostgreSQL combined. Helps it estimate the cost of index vs. sequential scans.
    • Caution: Tune configuration parameters carefully, monitoring overall system performance.
  6. Schema Changes: Sometimes, denormalization or changing data types might be necessary for performance, but this should be considered carefully.
  7. Using Materialized Views: For complex, slow aggregations or joins that are queried frequently but don’t need real-time data, pre-calculating results into a materialized view can provide significant speedups.
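
As a sketch of that last point (all object names here are illustrative), a materialized view can cache an expensive aggregation and be refreshed on whatever schedule the data allows:

```sql
-- Precompute per-user order totals once instead of on every query
CREATE MATERIALIZED VIEW user_order_totals AS
SELECT u.id AS user_id,
       count(o.id)   AS order_count,
       sum(o.amount) AS total_amount
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id;

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON user_order_totals (user_id);

-- Rebuild the contents later without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_totals;
```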

Tools for Visualizing EXPLAIN Output

While the default TEXT format is informative, the tree structure can be hard to grasp for complex queries. Visualization tools can help immensely:

  1. explain.depesz.com (Online): A popular web-based tool. Paste your EXPLAIN (ANALYZE, BUFFERS) output (TEXT format), and it generates an annotated, color-coded HTML representation highlighting expensive nodes and potential issues.
  2. PEV (Postgres EXPLAIN Visualizer) (Online/Offline): Another excellent tool (pev.dalibo.com or available as source). Takes EXPLAIN (FORMAT JSON) output and creates interactive, graphical plan diagrams. Allows easy navigation and identification of bottlenecks.
  3. pgAdmin 4 (Desktop Client): Includes a graphical EXPLAIN feature. When you run EXPLAIN/EXPLAIN ANALYZE in the query tool, an "Explain" tab appears below the results, showing a graphical plan tree with costs and timings.
  4. Other IDEs/Tools: Many database IDEs (like DataGrip) have built-in graphical explain plan viewers.

These tools use the same underlying data but present it in a way that makes it easier to see the flow, relative costs/times, and problem areas at a glance. Using EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) and feeding it into PEV is a particularly powerful combination.

Conclusion

The PostgreSQL EXPLAIN command, especially when combined with the ANALYZE and BUFFERS options, is an indispensable diagnostic tool. It provides a window into the query planner’s decisions and the actual execution process, revealing how and why a query performs the way it does.

By learning to read and interpret query plans, you can move beyond guesswork and implement targeted optimizations. Understanding common plan nodes (Scans, Joins, Sorts, Aggregates), identifying discrepancies between estimated and actual values, and pinpointing resource-intensive operations (high time, disk I/O, memory spills) are crucial steps. Remember to leverage table statistics (ANALYZE), appropriate indexing, query rewriting, and occasionally configuration tuning to address the bottlenecks uncovered by EXPLAIN.

Query optimization is often an iterative process: EXPLAIN ANALYZE, identify issues, make a change (add index, rewrite query), and then EXPLAIN ANALYZE again to verify the impact. While mastering EXPLAIN takes practice, the ability to dissect query performance is fundamental to building fast, scalable, and efficient applications on PostgreSQL. Start using it today on your slowest queries – you might be surprised by what you find.
