Introduction to Analyzing Queries with PostgreSQL EXPLAIN
In the world of relational databases, performance is paramount. Slow queries can cripple application responsiveness, frustrate users, and consume excessive server resources. PostgreSQL, a powerful and popular open-source object-relational database system, provides a critical tool for understanding and optimizing query performance: the `EXPLAIN` command. Mastering `EXPLAIN` is an essential skill for any developer or database administrator working with PostgreSQL, allowing you to peer under the hood and see exactly how the database intends to execute your SQL queries.
This article provides a comprehensive introduction to using `EXPLAIN` in PostgreSQL. We will cover its basic usage, delve into interpreting the query plan output, explore advanced options like `ANALYZE` and `BUFFERS`, examine common plan nodes, discuss how to identify performance bottlenecks, and touch upon strategies for improving query performance based on `EXPLAIN` analysis. By the end, you'll have a solid foundation for using this indispensable tool to make your PostgreSQL queries faster and more efficient.
Why Query Analysis Matters
Before diving into `EXPLAIN`, let's briefly reiterate why understanding query execution is so important:
- Performance Bottlenecks: Inefficient queries are often the primary source of performance issues in database-driven applications. Identifying these queries is the first step towards optimization.
- Resource Consumption: Slow queries can consume significant CPU, memory (RAM), and I/O resources, potentially impacting other operations on the database server.
- Scalability: As data volumes grow, poorly written queries that performed adequately on small datasets can become prohibitively slow. Understanding execution plans helps write queries that scale better.
- Informed Optimization: Without knowing how a query is executed, attempts at optimization (like adding indexes) are often guesswork. `EXPLAIN` provides the data needed for targeted improvements.
- Understanding the Planner: It helps you understand the decisions made by the PostgreSQL query planner, leading to better intuition about how different SQL constructs and schema designs influence performance.
The Basics: EXPLAIN Your Query
The fundamental purpose of the `EXPLAIN` command is to show the execution plan that the PostgreSQL planner generates for a given SQL statement. It does not execute the query (unless you use the `ANALYZE` option, discussed later); it only reveals the planner's strategy.
The basic syntax is straightforward:
```sql
EXPLAIN <your_sql_query>;
```
For example, let’s consider a simple table and query:
```sql
-- Sample table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_login TIMESTAMP,
    status VARCHAR(10) DEFAULT 'active'
);

-- Add some indexes
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_status ON users (status);

-- Insert some sample data (simplified)
INSERT INTO users (username, email, status) VALUES
    ('alice', 'alice@example.com', 'active'),
    ('bob', 'bob@example.com', 'inactive'),
    ('charlie', 'charlie@example.com', 'active'),
    ('david', 'david@example.com', 'active');

-- Now, let's EXPLAIN a simple query
EXPLAIN SELECT id, username FROM users WHERE status = 'active';
```
Running this `EXPLAIN` command might produce output similar to this (the exact costs and rows will vary based on your PostgreSQL version, configuration, and the actual data/statistics):
```text
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=3 width=13)
   Filter: (status = 'active'::text)
(2 rows)
```
Interpreting the Basic Output
Even this simple output tells us several things:
- `QUERY PLAN`: This indicates the start of the execution plan.
- `Seq Scan on users`: This is the core operation, called a plan node. It means PostgreSQL plans to read the `users` table sequentially (from beginning to end).
- `cost=0.00..1.05`: This provides the planner's estimated cost for this operation.
  - `0.00` (Startup Cost): The estimated cost to retrieve the first row.
  - `1.05` (Total Cost): The estimated cost to retrieve all rows for this node.
  - Important: Cost units are arbitrary and relative. They don't directly map to seconds or milliseconds. They represent units of work, roughly calibrated to sequential page fetches. Lower cost is generally better. The startup cost is crucial for queries with `LIMIT`.
- `rows=3`: The estimated number of rows that this node will output.
- `width=13`: The estimated average width (in bytes) of the rows output by this node.
- `Filter: (status = 'active'::text)`: This shows a condition applied after fetching rows during the sequential scan. Rows are read, and then checked against this filter.
In this case, the planner chose a Sequential Scan because the table is very small, and it estimated that reading the whole table and filtering would be cheaper than using the `idx_users_status` index.
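If you are curious what the alternative plan would look like, you can temporarily discourage sequential scans for your session. This is a standard planner-debugging technique (the `enable_seqscan` setting also appears later under the `SETTINGS` option), not something to leave enabled in production. A minimal sketch:

```sql
-- For experimentation only: make sequential scans look prohibitively
-- expensive so the planner prefers an index if one is usable.
SET enable_seqscan = off;

EXPLAIN SELECT id, username FROM users WHERE status = 'active';
-- Expect something like: Index Scan using idx_users_status on users ...

-- Restore the default behavior for the session.
RESET enable_seqscan;
```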
Understanding the Query Plan Structure
Query plans are typically represented as a tree of nodes. Execution generally starts from the bottom (leaf nodes) and flows upwards towards the root node. Each node performs a specific operation (like scanning a table, joining tables, sorting results) and passes its output rows to its parent node.
Consider a slightly more complex query:
```sql
EXPLAIN SELECT u.username, o.order_date
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.amount > 100
ORDER BY o.order_date DESC;
```
(Assuming an `orders` table exists with `id`, `user_id`, `order_date`, and `amount` columns and appropriate indexes.)
The `EXPLAIN` output might look something like this (simplified):
```text
                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=150.25..151.00 rows=300 width=25)
   Sort Key: o.order_date DESC
   ->  Hash Join  (cost=55.50..130.75 rows=300 width=25)
         Hash Cond: (o.user_id = u.id)
         ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=16)
               Filter: (amount > 100)
         ->  Hash  (cost=45.00..45.00 rows=800 width=17)
               ->  Seq Scan on users u  (cost=0.00..45.00 rows=800 width=17)
                     Filter: (status = 'active'::text)
(9 rows)
```
Let’s break down this tree structure:
- Root Node (`Sort`): The final operation is sorting the results by `order_date` in descending order. It receives rows from its child node.
- Child of `Sort` (`Hash Join`): This node joins rows from its two children based on the `o.user_id = u.id` condition using a Hash Join algorithm.
- Children of `Hash Join`:
  - `Seq Scan on orders o`: Reads the `orders` table sequentially, filtering for `amount > 100`. The results of this scan are used to probe the hash table built by the other child.
  - `Hash`: This node takes the output of its child (`Seq Scan on users u`) and builds an in-memory hash table keyed on `u.id` (the join key taken from the `Hash Cond` of the parent Hash Join).
    - Child of `Hash` (`Seq Scan on users u`): Reads the `users` table sequentially, filtering for `status = 'active'`. Its output is fed into the `Hash` node.
Execution flow (simplified):
1. Scan `users`, filter by `status`, and build an in-memory hash table keyed on `id`.
2. Scan `orders`, filter by `amount`.
3. For each row from the `orders` scan, probe the hash table using `user_id` to find matching `users` rows.
4. Combine matching rows (join).
5. Sort the resulting joined rows by `order_date`.
6. Return the final sorted result.
The indentation helps visualize the parent-child relationships in the plan tree.
EXPLAIN ANALYZE: Getting Real Performance Data
While `EXPLAIN` shows the planned execution and estimated costs/rows, these are based on table statistics and cost models, which might not always be perfectly accurate. To see what actually happens when the query runs, including actual timings and row counts, use `EXPLAIN ANALYZE`:
```sql
EXPLAIN ANALYZE <your_sql_query>;
```
Crucial Warning: `EXPLAIN ANALYZE` executes the query!
- For `SELECT` queries, this is usually safe, although it still consumes resources.
- For `INSERT`, `UPDATE`, `DELETE`, or DDL statements, it will perform the data modification. If you want to analyze a modification query without changing data, wrap it in a transaction and roll it back:
```sql
BEGIN;
EXPLAIN ANALYZE UPDATE users SET last_login = NOW() WHERE id = 1;
ROLLBACK; -- Reverts the UPDATE
```
Let's re-run our first simple example with `ANALYZE`:
```sql
EXPLAIN ANALYZE SELECT id, username FROM users WHERE status = 'active';
```
The output will now include actual performance metrics:
```text
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.05 rows=3 width=13) (actual time=0.015..0.018 rows=3 loops=1)
   Filter: (status = 'active'::text)
   Rows Removed by Filter: 1
 Planning Time: 0.080 ms
 Execution Time: 0.035 ms
(5 rows)
```
New Metrics Introduced by ANALYZE
- `actual time=0.015..0.018`: The actual time taken, in milliseconds.
  - `0.015` (Actual Startup Time): Time taken to get the first row from this node.
  - `0.018` (Actual Total Time): Time taken for this node to produce all its output rows. Crucially, this total time includes the time spent in all its child nodes. Therefore, the top-level node's total time approximates the total execution time for that part of the plan.
- `rows=3`: The actual number of rows output by this node during execution. Compare this to the estimated `rows` from the plain `EXPLAIN`. Large discrepancies often indicate outdated or inaccurate table statistics.
- `loops=1`: The number of times this specific node was executed. For simple plans, it's usually 1. In nested loops or subplans, a node might be executed multiple times. The `actual time` reported is the average time per loop, so the total time for the node across all loops is `actual total time * loops` (see the worked example below).
- `Rows Removed by Filter`: (Optional, shown when filtering occurs) Indicates how many rows were read but discarded by the `Filter` condition.
- `Planning Time`: The time taken by the PostgreSQL planner to generate the query plan itself.
- `Execution Time`: The total time taken to execute the query plan, from the start of the first node to the completion of the root node.
Comparing the estimated rows (`rows=3`) with the actual rows (`rows=3`) here shows the planner's estimate was accurate for this simple case. The query executed very quickly (0.035 ms).
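To make the `loops` arithmetic concrete, here is a hypothetical inner node from a Nested Loop join; the numbers are invented purely for illustration:

```text
->  Index Scan using idx_orders_user_id on orders o (actual time=0.010..0.020 rows=2 loops=100)
```

This scan was executed 100 times, once per outer row. The reported values are per-loop averages, so the node's total contribution is roughly 0.020 ms * 100 = 2 ms, and it produced about 2 * 100 = 200 rows overall.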
Common Query Plan Nodes Explained
Understanding the different types of nodes that can appear in a query plan is key to interpreting `EXPLAIN` output. Here are some of the most common ones:
1. Scan Nodes (Reading data from tables)
- `Seq Scan` (Sequential Scan):
  - What: Reads the entire table block by block from beginning to end.
  - When Chosen: For small tables; when no suitable index exists for the `WHERE` clause; when the query needs to retrieve a large percentage of the table's rows (making index lookups less efficient).
  - Cost Factors: Primarily depends on the table size (number of blocks/pages). Filtering adds CPU cost.
  - Optimization: If a Seq Scan on a large table is slow and the `WHERE` clause is selective (returns a small fraction of rows), consider adding an index on the relevant column(s). Ensure statistics are up-to-date (`ANALYZE <table_name>`).
  - Example Output: `Seq Scan on users  (cost=0.00..45.00 rows=800 width=17)`
- `Index Scan`:
  - What: Uses an index to find the locations (TIDs, tuple identifiers) of relevant rows and then fetches those rows from the table's main data storage (the "heap").
  - When Chosen: When a `WHERE` clause condition matches an available index and is selective enough that the planner estimates fewer page accesses compared to a Seq Scan. Often used for equality (`=`), range (`<`, `>`, `BETWEEN`), or `IN` conditions on indexed columns.
  - Cost Factors: Cost of traversing the index B-tree + cost of fetching each qualifying row from the table heap. Can become expensive if many rows match (many random heap accesses).
  - Optimization: Ensure the index is selective. Sometimes a multi-column index is needed. Can potentially be improved to an `Index Only Scan`.
  - Example Output: `Index Scan using idx_users_email on users  (cost=0.28..8.29 rows=1 width=50)` (assuming a query like `WHERE email = '...'`). Often includes `Index Cond:` showing the condition used for the index lookup.
- `Index Only Scan`:
  - What: Similar to an Index Scan, but retrieves all required data directly from the index itself, without needing to access the table heap. This is much faster if applicable.
  - When Chosen: When all columns needed by the query (in `SELECT`, `WHERE`, `ORDER BY`, etc.) are available within the index definition, AND the visibility map indicates that all relevant rows are known to be visible to all transactions (avoiding the need to check the heap for visibility info).
  - Cost Factors: Cost of traversing the index. Significantly cheaper than an Index Scan as it avoids random heap I/O.
  - Optimization: Ensure indexes cover all needed columns for specific queries. Run `VACUUM` regularly to keep the visibility map up-to-date, making Index Only Scans more likely (see the sketch after this list).
  - Example Output: `Index Only Scan using users_pkey on users  (cost=0.28..4.30 rows=1 width=4)` (assuming `SELECT id FROM users WHERE id = 1`)
- `Bitmap Heap Scan` (combined with `Bitmap Index Scan`):
  - What: A two-phase approach. First, one or more `Bitmap Index Scan` nodes scan index(es) to create an in-memory bitmap of potential row locations. Then, the `Bitmap Heap Scan` node fetches the actual rows from the heap in physical location order (more efficient than random access for multiple rows).
  - When Chosen: Often used when combining conditions on multiple indexes using `OR` or `AND`; when an index is moderately selective (returning too many rows for an Index Scan to be efficient but fewer than a Seq Scan); sometimes for `IN` clauses with many values.
  - Cost Factors: Cost of index scan(s) + cost of building the bitmap + cost of fetching heap pages (optimized for locality).
  - Optimization: Analyze the underlying `Bitmap Index Scan`(s). Ensure appropriate indexes exist. Sometimes adjusting `work_mem` can influence bitmap creation.
  - Example Output:

    ```text
    Bitmap Heap Scan on users  (cost=12.50..34.75 rows=100 width=50)
      Recheck Cond: ((status = 'active') OR (email LIKE '%@domain.com'))
      ->  BitmapOr  (cost=12.50..12.50 rows=100 width=0)
            ->  Bitmap Index Scan on idx_users_status  (cost=0.00..4.25 rows=50 width=0)
                  Index Cond: (status = 'active')
            ->  Bitmap Index Scan on idx_users_email  (cost=0.00..8.00 rows=50 width=0)
                  Index Cond: (email LIKE '%@domain.com')  -- note: a leading-wildcard LIKE usually requires a specific index type, such as trigram
    ```
- `TID Scan` (Tuple ID Scan):
  - What: Fetches rows directly based on their physical location (TID).
  - When Chosen: Typically used for conditions like `WHERE ctid = '...'`. Not common in general application queries but can be seen internally or in specific use cases.
  - Cost Factors: Very fast for fetching a known, small set of rows by physical address.
2. Join Nodes (Combining data from multiple tables)
- `Nested Loop Join`:
  - What: The simplest join algorithm. It iterates through rows from the outer relation (the first child node). For each outer row, it scans the inner relation (the second child node) to find matching rows based on the join condition.
  - When Chosen: Typically when one of the relations (usually the inner one) is very small. Often used when the inner side can efficiently use an index lookup for each outer row (e.g., an `Index Scan` as the inner plan).
  - Cost Factors: Cost is roughly (Cost of Outer Scan) + (Outer Rows * Cost of Inner Scan per row). Can be very slow if both relations are large and the inner scan is inefficient (e.g., a Seq Scan).
  - Optimization: Ensure an index exists on the join key column of the inner table.
  - Example Output:

    ```text
    Nested Loop  (cost=0.28..20.35 rows=10 width=60)
      ->  Seq Scan on users u  (cost=0.00..1.05 rows=5 width=50)  -- outer
            Filter: (status = 'active')
      ->  Index Scan using idx_orders_user_id on orders o  (cost=0.28..3.85 rows=2 width=16)  -- inner (per outer row)
            Index Cond: (user_id = u.id)
    ```
  - Variations: Sometimes includes `Materialize` on the inner side if it is scanned repeatedly without an index.
- `Hash Join`:
  - What: A two-phase join.
    - Build Phase: Scans the inner relation (usually the smaller one after filtering) and builds an in-memory hash table using the join key(s).
    - Probe Phase: Scans the outer relation and, for each row, probes the hash table using the join key(s) to find matches.
  - When Chosen: Generally efficient for joining large tables when an equijoin condition (`=`) is used. Requires sufficient memory (`work_mem`) to hold the hash table of the inner relation.
  - Cost Factors: Cost of scanning both tables + cost of building the hash table + cost of probing. Can spill to disk if the hash table exceeds `work_mem`, significantly increasing cost.
  - Optimization: Increase `work_mem` if hash joins are spilling (check with `EXPLAIN (ANALYZE, BUFFERS)`). Ensure statistics are accurate so the planner chooses the smaller relation for the build phase (see the sketch after this list for a way to compare join methods).
  - Example Output: (seen earlier)

    ```text
    Hash Join  (cost=55.50..130.75 rows=300 width=25)
      Hash Cond: (o.user_id = u.id)
      ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=16)  -- probe side
      ->  Hash  (cost=45.00..45.00 rows=800 width=17)  -- build side input
            ->  Seq Scan on users u  (cost=0.00..45.00 rows=800 width=17)
    ```
- `Merge Join` (Merge Sort Join):
  - What: Requires both input relations to be sorted on the join key(s). It then reads through both sorted streams concurrently, matching rows like merging two sorted lists.
  - When Chosen: When both inputs are already sorted (e.g., from an index scan or a previous sort operation), or when the planner estimates that sorting both inputs and then merging is cheaper than a Hash Join (often for very large tables where hash tables might spill, or for non-equijoin conditions like `>`).
  - Cost Factors: Cost of scanning both tables + cost of sorting if inputs are not already sorted. The merge phase itself is typically very cheap.
  - Optimization: If a Merge Join is chosen and requires explicit `Sort` nodes underneath, ensure `work_mem` is sufficient for sorting. Pre-sorting via indexes (on join keys) can make Merge Join very efficient.
  - Example Output:

    ```text
    Merge Join  (cost=100.50..150.75 rows=500 width=40)
      Merge Cond: (u.id = o.user_id)
      ->  Index Scan using users_pkey on users u  (cost=0.28..60.30 rows=1000 width=17)  -- already sorted by id
      ->  Sort  (cost=100.22..102.72 rows=1000 width=23)  -- explicit sort needed
            Sort Key: o.user_id
            ->  Seq Scan on orders o  (cost=0.00..65.00 rows=1000 width=23)
    ```
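When you suspect the planner picked the wrong join method, a common diagnostic (referenced in the Hash Join item above) is to temporarily disable a method for your session and compare the resulting plans and timings. The `enable_*` parameters are standard planner settings, but treat this strictly as an investigation tool, never a production fix. A minimal sketch:

```sql
-- Baseline: the planner's free choice of join method.
EXPLAIN ANALYZE
SELECT u.username, o.order_date
FROM users u JOIN orders o ON u.id = o.user_id;

-- Discourage hash joins for this session only, then compare the new plan
-- (likely a Merge Join or Nested Loop) and its actual times to the baseline.
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT u.username, o.order_date
FROM users u JOIN orders o ON u.id = o.user_id;

RESET enable_hashjoin;
```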
3. Other Important Nodes
- `Sort`:
  - What: Sorts the input rows based on specified key(s).
  - When Chosen: Required for `ORDER BY` clauses (unless an index provides the required order), Merge Joins, certain `GROUP BY` strategies, and `DISTINCT` operations.
  - Cost Factors: CPU and memory usage. Can be very expensive, especially if the data to be sorted exceeds `work_mem` and needs to spill to disk (look for "Sort Method: external merge  Disk: ..." in `EXPLAIN ANALYZE` output).
  - Optimization: Increase `work_mem` if sorts spill to disk. See if an index can satisfy the `ORDER BY` clause to avoid the sort entirely. Limit the number of rows being sorted if possible.
  - Example Output: `Sort  (cost=150.25..151.00 rows=300 width=25)`
- `Aggregate` / `HashAggregate` / `GroupAggregate`:
  - What: Performs aggregation operations (like `COUNT`, `SUM`, `AVG`, `MAX`, `MIN`), usually associated with a `GROUP BY` clause or aggregate functions without `GROUP BY`. `HashAggregate` uses an in-memory hash table (similar to a Hash Join), which is often faster for many groups. `GroupAggregate` is often used when the input is pre-sorted.
  - When Chosen: Whenever aggregate functions or `GROUP BY` are used.
  - Cost Factors: Depends on the number of input rows and the number of distinct groups. `HashAggregate` can spill to disk if the hash table exceeds `work_mem`.
  - Optimization: Increase `work_mem` if spilling occurs. Ensure indexes support any filtering before aggregation. Sometimes rewriting the query or using window functions can be more efficient.
  - Example Output: `HashAggregate  (cost=180.00..182.50 rows=200 width=32)`
- `Limit`:
  - What: Stops processing after retrieving the specified number of rows.
  - When Chosen: Directly corresponds to the `LIMIT` clause in SQL.
  - Cost Factors: Often very cheap, especially the startup cost. Significantly reduces total cost if it allows the planner to avoid processing large parts of the underlying plan (e.g., stopping an Index Scan early).
  - Optimization: Combining `ORDER BY` with `LIMIT` is often very efficient if an index matches the `ORDER BY` clause (see the sketch after this list).
  - Example Output: `Limit  (cost=0.28..4.58 rows=10 width=50)` (for `LIMIT 10`)
- `Materialize`:
  - What: Caches the output of its child node in memory. Subsequent requests for rows from the parent node read from this cache instead of re-executing the child.
  - When Chosen: Often used in Nested Loop joins when the inner side is scanned multiple times and isn't using an efficient index lookup, or for CTEs (`WITH` clauses) that are referenced multiple times.
  - Cost Factors: Adds a small overhead for the initial population but can save significant cost if the child subplan is expensive and executed many times. Uses memory.
  - Example Output: `->  Materialize  (cost=10.00..15.00 rows=50 width=20)`
- `Gather` / `Gather Merge`:
  - What: Nodes related to parallel query execution. Worker processes execute parts of the plan (the subplan under `Gather` or `Gather Merge`), and this node collects the results from the workers. `Gather Merge` preserves the sort order produced by the parallel workers.
  - When Chosen: The planner decides based on cost estimates whether parallelizing parts of the query (like Seq Scans or Hash Joins) would be faster using multiple CPU cores. Controlled by settings like `max_parallel_workers_per_gather`.
  - Cost Factors: Involves coordination overhead but can significantly speed up execution on multi-core systems for CPU-bound or I/O-bound tasks amenable to parallelization.
  - Example Output: `Gather  (cost=1000.00..8000.00 rows=10000 width=50)`
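As referenced in the `Limit` item above, here is a minimal sketch of the `ORDER BY ... LIMIT` pattern backed by an index. It reuses the `orders` table assumed earlier; `idx_orders_order_date` is a hypothetical index added for this example:

```sql
-- Hypothetical index matching the ORDER BY column and direction.
CREATE INDEX idx_orders_order_date ON orders (order_date DESC);

-- The planner can now walk the index in order and stop after 10 rows
-- instead of sorting the whole table; expect a plan like
-- Limit -> Index Scan using idx_orders_order_date on orders.
EXPLAIN ANALYZE
SELECT id, order_date
FROM orders
ORDER BY order_date DESC
LIMIT 10;
```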
Advanced EXPLAIN Options
Beyond the basic `EXPLAIN` and `EXPLAIN ANALYZE`, several options can provide even more insight:
```sql
EXPLAIN (option [, ...]) <your_sql_query>;
```
Common options include:
- `ANALYZE [boolean]`: Executes the query and shows actual run times and row counts. Defaults to `false`. We've covered this extensively.
- `VERBOSE [boolean]`: Displays additional information, such as schema-qualified table and function names, output column lists for each node, and labels for variables. Can make complex plans easier to read. Defaults to `false`.
- `COSTS [boolean]`: Includes the estimated startup and total costs for each node. Defaults to `true`. Setting it to `false` can slightly simplify the output if you only care about node types and row counts.
- `SETTINGS [boolean]`: Includes configuration parameters that affect query planning and were changed from the built-in defaults (e.g., `work_mem`, `enable_seqscan`). Useful for understanding why the planner made certain choices. Defaults to `false`.
- `BUFFERS [boolean]`: Requires `ANALYZE`. Shows information about buffer usage (memory and disk I/O). This is extremely valuable for identifying I/O bottlenecks. Defaults to `false`.
- `WAL [boolean]`: Requires `ANALYZE`. Includes information about Write-Ahead Log (WAL) record generation. Useful for analyzing the impact of DML statements (`INSERT`, `UPDATE`, `DELETE`). Defaults to `false`.
- `TIMING [boolean]`: Requires `ANALYZE`. Includes the actual startup and total time spent in each node. Defaults to `true` when `ANALYZE` is used. Turning it off can slightly reduce profiling overhead if you only need buffer or row-count information from `ANALYZE`.
- `SUMMARY [boolean]`: Includes a summary at the end (total planning and execution time). Defaults to `true` when `ANALYZE` is used, `false` otherwise.
- `FORMAT { TEXT | XML | JSON | YAML }`: Specifies the output format. Defaults to `TEXT`. Structured formats like `JSON` are excellent for programmatic analysis or feeding into visualization tools.
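Options can be combined freely in a single parenthesized list. As a sketch, a typical combination for investigating a slow query (reusing the earlier join query, and noting that `SETTINGS` requires PostgreSQL 12 or later):

```sql
-- Execute the query, collect buffer I/O statistics, print the verbose plan,
-- and list any non-default settings that influenced the planner.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT u.username, o.order_date
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';
```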
Deep Dive: EXPLAIN (ANALYZE, BUFFERS)
The `BUFFERS` option is arguably one of the most useful additions to `ANALYZE`. It reveals how different parts of the plan interact with PostgreSQL's buffer cache (`shared_buffers`) and the disk.
```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE id = 500;
```
Output might look like:
```text
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using users_pkey on users  (cost=0.29..8.30 rows=1 width=75) (actual time=0.035..0.036 rows=1 loops=1)
   Index Cond: (id = 500)
   Buffers: shared hit=4
 Planning Time: 0.110 ms
 Execution Time: 0.055 ms
(5 rows)
```

Another example, perhaps involving disk reads:

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM large_table;
```
```text
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=19005.00..19005.01 rows=1 width=8) (actual time=155.230..155.231 rows=1 loops=1)
   Buffers: shared hit=5 read=10000
   ->  Gather  (cost=19004.00..19004.01 rows=2 width=8) (actual time=155.100..155.220 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=5 read=10000
         ->  Partial Aggregate  (cost=18004.00..18004.01 rows=1 width=8) (actual time=148.500..148.501 rows=1 loops=3)
               Buffers: shared hit=5 read=10000
               ->  Parallel Seq Scan on large_table  (cost=0.00..17500.00 rows=201600 width=0) (actual time=0.015..135.120 rows=166667 loops=3)
                     Buffers: shared hit=5 read=10000
 Planning Time: 0.150 ms
 Execution Time: 155.300 ms
(11 rows)
```
Interpreting the `Buffers` output:
- `shared hit=N`: Number of buffer hits in PostgreSQL's shared buffer cache, meaning the required data block was already in RAM. Hits are fast.
- `shared read=N`: Number of buffer reads. The data block was not in the shared buffer cache and had to be read from the operating system / disk. Reads are slow (especially from spinning disks), and high read counts often indicate I/O bottlenecks.
- `shared dirtied=N`: Number of buffers modified by this query within the shared buffer cache (relevant for `UPDATE`/`INSERT`).
- `shared written=N`: Number of previously dirtied buffers written out to disk by this backend during this query's execution.
- `local hit/read/dirtied/written`: The same counters for local buffers, which back temporary tables.
- `temp read/written`: Number of blocks read from / written to temporary files (e.g., for sorts or hash joins exceeding `work_mem`). These indicate disk spilling and are usually performance killers.
Analyzing `BUFFERS` helps pinpoint whether a query is slow due to CPU work or because it is waiting for data from disk (high `read` or `temp read/written` counts).
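To see spilling in action, you can artificially shrink `work_mem` for a session and watch temp blocks appear under a `Sort` node. A minimal sketch against the `orders` table assumed earlier (64kB is the smallest allowed value):

```sql
-- Force an undersized sort memory budget for this session.
SET work_mem = '64kB';

-- The Sort node should now report something like
--   Sort Method: external merge  Disk: ...kB
-- and, with BUFFERS, temp read=... written=... counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY order_date;

RESET work_mem;
```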
Using FORMAT JSON
The `JSON` format is incredibly useful for tools:
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT id FROM users WHERE status = 'active';
```
This produces a detailed JSON structure representing the plan tree, including all the metrics from `ANALYZE` and `BUFFERS`, suitable for parsing and visualization.
```json
[
{
"Plan": {
"Node Type": "Seq Scan",
"Parallel Aware": false,
"Relation Name": "users",
"Alias": "users",
"Startup Cost": 0.00,
"Total Cost": 1.05,
"Plan Rows": 3,
"Plan Width": 4,
"Actual Startup Time": 0.012,
"Actual Total Time": 0.015,
"Actual Rows": 3,
"Actual Loops": 1,
"Filter": "(status = 'active'::text)",
"Rows Removed by Filter": 1,
"Shared Hit Blocks": 1,
"Shared Read Blocks": 0,
"Shared Dirtied Blocks": 0,
"Shared Written Blocks": 0,
"Local Hit Blocks": 0,
"Local Read Blocks": 0,
"Local Dirtied Blocks": 0,
"Local Written Blocks": 0,
"Temp Read Blocks": 0,
"Temp Written Blocks": 0
},
"Planning Time": 0.065,
"Triggers": [
],
"Execution Time": 0.038
}
]
```
Interpreting the Plan: A Practical Approach
Analyzing an `EXPLAIN ANALYZE` output involves several steps:
1. Check Planning vs. Execution Time: Is a significant amount of time spent planning the query? This is rare but can happen with extremely complex queries or specific planner bugs. Usually, execution time dominates.
2. Look at the Top Node's Actual Total Time: This is your baseline total execution time.
3. Compare Estimated vs. Actual Rows: Start from the leaf nodes and work up. Are there major discrepancies between the estimated `rows=` and the actual `rows=`? A discrepancy of orders of magnitude is a red flag: it usually means the planner is working with inaccurate statistics about the data distribution in your tables. The most common fix is to run `ANALYZE <table_name>;` (or `VACUUM ANALYZE <table_name>;`) on the relevant table(s) to update the statistics (see the sketch after this list for checking statistics freshness). Poor estimates can lead the planner to choose suboptimal join methods (e.g., Nested Loop instead of Hash Join) or scan types.
4. Identify High-Cost / High-Time Nodes: Scan the `actual total time` for each node. Which nodes contribute most to the overall execution time? Remember that a parent node's time includes its children's time. Focus on nodes with high self time (total time minus children's total time) or leaf nodes with high time.
5. Look for Expensive Operations:
   - `Seq Scan` on large tables: Is the table large (check `Buffers: shared read=...`)? Is the filter selective (check `Rows Removed by Filter`)? If yes to both, an index might be missing or unusable.
   - `Index Scan` fetching many rows: If an index scan returns a large percentage of the table (the actual `rows=` is high), a `Seq Scan` might have been faster. Maybe the index isn't selective enough, or statistics are off.
   - `Bitmap Heap Scan` with high heap fetches: Check the underlying `Bitmap Index Scan`(s). Are they efficient? Is the number of rows fetched high?
   - Inefficient joins (`Nested Loop` on a large inner relation): Usually indicates a missing index on the join key of the inner table.
   - Disk spilling (`Hash Join`, `Sort`): Check `EXPLAIN (ANALYZE, BUFFERS)` for `temp read/written` blocks. If present, consider increasing `work_mem`.
   - High buffer reads: Use `EXPLAIN (ANALYZE, BUFFERS)` to see which nodes are causing disk I/O (`shared read`). Can caching be improved (larger `shared_buffers`)? Can the query be rewritten to access data more efficiently (e.g., via an Index Only Scan)?
6. Relate Plan Nodes to SQL: Understand which part of your SQL query corresponds to which plan node(s). `WHERE` clauses usually relate to Scan nodes or Filters; `JOIN` clauses to Join nodes; `ORDER BY` to Sort nodes; `GROUP BY` / aggregates to Aggregate nodes.
7. Check `VERBOSE` and `SETTINGS`: Use `VERBOSE` for clarity on column names and operations. Use `SETTINGS` if you suspect a non-default configuration parameter is influencing the plan.
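As referenced in step 3, when estimates look stale you can check when PostgreSQL last gathered statistics for a table. A minimal sketch using the standard `pg_stat_user_tables` view:

```sql
-- When were statistics last refreshed, manually or by autovacuum?
SELECT relname,
       last_analyze,      -- last manual ANALYZE
       last_autoanalyze   -- last automatic analyze by the autovacuum daemon
FROM pg_stat_user_tables
WHERE relname IN ('users', 'orders');

-- If the timestamps are old (or NULL), refresh statistics directly:
ANALYZE users;
ANALYZE orders;
```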
Common Performance Problems & How EXPLAIN Helps
- Problem: Slow query fetching specific rows from a large table.
  - `EXPLAIN` shows: A `Seq Scan` with a highly selective `Filter` (many `Rows Removed by Filter`), high `actual time`, and potentially high `shared read` buffer counts.
  - Solution: Add an index on the column(s) used in the `WHERE` clause, then run `ANALYZE <table_name>` (see the worked fix after this list).
- Problem: Query with `ORDER BY` is slow.
  - `EXPLAIN` shows: A costly `Sort` node, possibly with `Sort Method: external merge  Disk: ...` (and temp blocks when `BUFFERS` is used).
  - Solution: Add an index matching the `ORDER BY` columns and direction. Increase `work_mem` if disk spill occurs. See if `LIMIT` can be applied before sorting.
- Problem: Joining two large tables is slow.
  - `EXPLAIN` shows: A `Nested Loop Join` whose inner side is a `Seq Scan` on a large table, or a `Hash Join` that spills to disk (`temp read/written` buffers > 0).
  - Solution: For the Nested Loop, add an index on the join key of the inner table. For a Hash Join spill, increase `work_mem`. Ensure statistics are accurate (`ANALYZE`) so the best join method is chosen.
- Problem: Query performs poorly after data volume increased significantly.
  - `EXPLAIN ANALYZE` shows: Estimated row counts (`rows=`) differ drastically from actual row counts; the planner chose a bad plan based on old statistics.
  - Solution: Run `ANALYZE <table_name>;` on all relevant tables. Consider adjusting autovacuum settings for more frequent analysis.
- Problem: Query using multiple `OR` conditions is slow.
  - `EXPLAIN` shows: A `Seq Scan`, or multiple `Index Scan`s combined inefficiently.
  - Solution: Consider enabling `Bitmap Index Scan`s (this may require separate indexes on the `OR`'d columns) or rewriting the query, perhaps using `UNION ALL` if appropriate.
- Problem: A query needing columns `a`, `b`, and `c` uses an `Index Scan` on `(a)` and fetches rows from the heap.
  - `EXPLAIN` shows: An `Index Scan` with heap fetches (no `Index Only Scan`), and high `actual time` relative to an index-only operation.
  - Solution: Create or modify an index to include all needed columns: `CREATE INDEX ... ON table (a, b, c);` or `CREATE INDEX ... ON table (a) INCLUDE (b, c);`. Run `VACUUM <table_name>` to update the visibility map.
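As a worked version of the first problem above, here is the typical fix-and-verify loop. The column and index are illustrative, reusing the `users` table from earlier:

```sql
-- Before: suppose EXPLAIN ANALYZE showed a Seq Scan with many
-- "Rows Removed by Filter" for this query.
EXPLAIN ANALYZE SELECT id, username FROM users WHERE email = 'alice@example.com';

-- Fix: index the filtered column and refresh statistics.
CREATE INDEX IF NOT EXISTS idx_users_email ON users (email);
ANALYZE users;

-- Verify: re-run and confirm the plan switched to an Index Scan
-- and that Execution Time dropped.
EXPLAIN ANALYZE SELECT id, username FROM users WHERE email = 'alice@example.com';
```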
Improving Queries Based on EXPLAIN
Based on your analysis, common optimization strategies include:
- Adding Indexes: The most frequent optimization. Add B-tree indexes for equality/range predicates (`WHERE`, `JOIN`, `ORDER BY`). Consider specialized index types (GIN, GiST, BRIN, Hash) for specific data types or query patterns (full-text search, geometric data, arrays, etc.). Remember to index foreign key columns.
- Modifying Indexes: Create covering indexes (using `INCLUDE` or multi-column indexes) to enable Index Only Scans. Ensure multi-column index order matches query conditions. Drop unused indexes.
- Updating Statistics: Run `ANALYZE <table_name>` or `VACUUM ANALYZE <table_name>` after significant data changes or if estimates are poor. Tune autovacuum/autoanalyze parameters.
- Rewriting Queries: Sometimes, changing the SQL structure can lead to a better plan. Examples:
  - Replacing complex subqueries with CTEs (Common Table Expressions) or temporary tables (though CTEs can sometimes act as optimization fences).
  - Using `UNION ALL` instead of `OR` in some cases.
  - Using `EXISTS` or `NOT EXISTS` instead of `IN` or `NOT IN` with subqueries.
  - Breaking down very complex queries into simpler steps.
- Adjusting PostgreSQL Configuration:
  - `work_mem`: Increase if sorts, hash joins, or hash aggregations spill to disk. Be cautious, as it is allocated per operation per backend.
  - `shared_buffers`: Increase (within reason, often ~25% of system RAM) to allow more data/indexes to be cached in memory, reducing disk I/O. Requires a server restart.
  - `effective_cache_size`: Informs the planner how much memory is available for disk caching by the OS and PostgreSQL combined, helping it estimate the relative cost of index vs. sequential scans.
  - Caution: Tune configuration parameters carefully, monitoring overall system performance.
- Schema Changes: Sometimes, denormalization or changing data types might be necessary for performance, but this should be considered carefully.
- Using Materialized Views: For complex, slow aggregations or joins that are queried frequently but don't need real-time data, pre-calculating results into a materialized view can provide significant speedups (a minimal sketch follows this list).
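A minimal sketch of the materialized view approach; the view name and aggregation are illustrative, built on the `orders` table assumed earlier:

```sql
-- Pre-compute a slow aggregation once instead of running it on every query.
CREATE MATERIALIZED VIEW order_totals_by_user AS
SELECT user_id, count(*) AS order_count, sum(amount) AS total_amount
FROM orders
GROUP BY user_id;

-- Reads now hit the pre-computed rows (the view can be indexed like a table).
SELECT * FROM order_totals_by_user WHERE total_amount > 1000;

-- The data is frozen at creation/refresh time; refresh on your own schedule.
REFRESH MATERIALIZED VIEW order_totals_by_user;
```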
Tools for Visualizing EXPLAIN Output
While the default `TEXT` format is informative, the tree structure can be hard to grasp for complex queries. Visualization tools can help immensely:
- explain.depesz.com (online): A popular web-based tool. Paste your `EXPLAIN (ANALYZE, BUFFERS)` output (TEXT format), and it generates an annotated, color-coded HTML representation highlighting expensive nodes and potential issues.
- PEV (Postgres EXPLAIN Visualizer) (online/offline): Another excellent tool (pev.dalibo.com, also available as source). Takes `EXPLAIN (FORMAT JSON)` output and creates interactive, graphical plan diagrams, allowing easy navigation and identification of bottlenecks.
- pgAdmin 4 (desktop client): Includes a graphical `EXPLAIN` feature. When you run EXPLAIN/EXPLAIN ANALYZE in the query tool, an "Explain" tab appears below the results, showing a graphical plan tree with costs and timings.
- Other IDEs/Tools: Many database IDEs (like DataGrip) have built-in graphical explain-plan viewers.
These tools use the same underlying data but present it in a way that makes it easier to see the flow, relative costs/times, and problem areas at a glance. Using `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)` and feeding the result into PEV is a particularly powerful combination.
Conclusion
The PostgreSQL `EXPLAIN` command, especially when combined with the `ANALYZE` and `BUFFERS` options, is an indispensable diagnostic tool. It provides a window into the query planner's decisions and the actual execution process, revealing how and why a query performs the way it does.
By learning to read and interpret query plans, you can move beyond guesswork and implement targeted optimizations. Understanding common plan nodes (Scans, Joins, Sorts, Aggregates), identifying discrepancies between estimated and actual values, and pinpointing resource-intensive operations (high time, disk I/O, memory spills) are crucial steps. Remember to leverage table statistics (`ANALYZE`), appropriate indexing, query rewriting, and occasionally configuration tuning to address the bottlenecks uncovered by `EXPLAIN`.
Query optimization is often an iterative process: run `EXPLAIN ANALYZE`, identify issues, make a change (add an index, rewrite the query), and then run `EXPLAIN ANALYZE` again to verify the impact. While mastering `EXPLAIN` takes practice, the ability to dissect query performance is fundamental to building fast, scalable, and efficient applications on PostgreSQL. Start using it today on your slowest queries; you might be surprised by what you find.