PostgreSQL Indexing Guide for Faster Queries (2026)
Most developers add an index when a query gets slow, see it drop from 3 seconds to 200ms, and call it done. That works until you have 20 indexes on a table, inserts start crawling, and your storage bill doubles. This guide covers the full picture: choosing the right index type, designing for your workload, diagnosing slow queries in production, and keeping indexes healthy over time.
Understanding Index Fundamentals & Performance Impact

How PostgreSQL Indexes Work (B-tree, Hash, and Beyond)
Without an index, PostgreSQL executes a sequential scan: it reads every row in the table and checks each one against your WHERE clause. On a 10-million-row table, that's 10 million row evaluations for a query returning one record. An index is a separate data structure that maps column values to physical row locations (tuple IDs, visible as the ctid system column), letting PostgreSQL jump directly to matching rows.
B-tree is the default for a reason. It's self-balancing, can be traversed forward or backward, and handles equality and range comparisons (=, <, <=, >, >=, BETWEEN, IN), IS NULL / IS NOT NULL tests, and ORDER BY. It can also serve left-anchored patterns like LIKE 'foo%', but only with the C collation or a pattern operator class such as text_pattern_ops. In practice, the vast majority of your indexes will be B-tree.
```sql
-- Basic B-tree index
CREATE INDEX idx_users_email ON users(email);

-- EXPLAIN output before the index (sequential scan):
-- Seq Scan on users  (cost=0.00..58340.00 rows=1 width=200)
--   Filter: (email = 'alice@example.com'::text)

-- EXPLAIN output after the index:
-- Index Scan using idx_users_email on users  (cost=0.43..8.45 rows=1 width=200)
--   Index Cond: (email = 'alice@example.com'::text)
```
Hash indexes only support equality comparisons (=). They can be marginally faster than B-tree for pure equality lookups on high-cardinality columns, but they can't handle range queries or sorting. In most cases they're not worth the trade-off in flexibility.
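To see the equality-only restriction in practice, here's a minimal sketch on a throwaway table (demo_sessions and its columns are hypothetical). The hash index is eligible for the = lookup, but the planner has no way to use it for a range predicate, so that query falls back to a sequential scan.

```sql
-- Hypothetical demo table: 100k rows with unique text tokens
CREATE TABLE demo_sessions (
    id    bigint PRIMARY KEY,
    token text NOT NULL
);
INSERT INTO demo_sessions (id, token)
SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;

-- Hash index: stores a hash code per token, so it can only answer =
CREATE INDEX demo_sessions_token_hash ON demo_sessions USING hash (token);
ANALYZE demo_sessions;

-- Equality: eligible for the hash index
EXPLAIN SELECT * FROM demo_sessions WHERE token = md5('42');

-- Range: a hash index cannot answer this, so expect a Seq Scan
EXPLAIN SELECT * FROM demo_sessions WHERE token > md5('42');
```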
| Index Type | Best For | Avoid When |
|---|---|---|
| B-tree | General queries, ranges, sorting | Almost never — it's the safe default |
| Hash | Equality-only on high-cardinality columns | You need range queries or ORDER BY |
| GIN | Full-text search, arrays, JSONB containment | Simple scalar columns |
| GiST | Geometric/spatial data, range types | Standard relational data |
| SP-GiST | Space-partitioned, unbalanced structures (quadtrees, k-d trees, radix trees), e.g. IP addresses or text prefixes | General-purpose queries |
| BRIN | Huge tables with naturally sorted data (timestamps, IDs) | Randomly ordered data or small tables |
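BRIN's appeal is size: it stores a min/max summary per range of table blocks (128 pages by default) instead of one entry per row, so on physically ordered data it is a small fraction of a B-tree's size. A minimal sketch comparing the two on the same column of a hypothetical append-only demo_readings table:

```sql
-- Hypothetical append-only table: rows arrive in created_at order,
-- so physical order correlates with the timestamp (BRIN's best case)
CREATE TABLE demo_readings (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    value      double precision
);
INSERT INTO demo_readings (created_at, value)
SELECT timestamptz '2025-01-01' + g * interval '1 second', random()
FROM generate_series(1, 200000) AS g;

CREATE INDEX demo_readings_created_brin  ON demo_readings USING brin (created_at);
CREATE INDEX demo_readings_created_btree ON demo_readings (created_at);

-- Compare on-disk sizes of the two indexes
SELECT
    pg_size_pretty(pg_relation_size('demo_readings_created_brin'))  AS brin_size,
    pg_size_pretty(pg_relation_size('demo_readings_created_btree')) AS btree_size;
```

The trade-off is precision: a BRIN scan returns whole block ranges that must be rechecked, which is why it only pays off when the column's values track the physical row order.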
When Indexes Actually Help (and When They Hurt)
The key concept is selectivity. An index pays off when it eliminates a large percentage of rows. A query filtering on status = 'active' where 80% of rows are active? PostgreSQL will likely ignore your index and do a sequential scan anyway — that's the right call. A query filtering on user_id = 42 in a 10-million-row table? The index saves you from reading 9,999,999 rows you don't need.
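You don't have to guess at selectivity: after ANALYZE, the planner's own estimates are visible in the pg_stats view. A minimal sketch, assuming a hypothetical demo_accounts table where 80% of rows are active, mirroring the example above:

```sql
-- Hypothetical table: exactly 80% of rows are 'active'
CREATE TABLE demo_accounts (
    id     bigint PRIMARY KEY,
    status text NOT NULL
);
INSERT INTO demo_accounts (id, status)
SELECT g, CASE WHEN g % 5 = 0 THEN 'inactive' ELSE 'active' END
FROM generate_series(1, 100000) AS g;
ANALYZE demo_accounts;

-- Most common values and their estimated frequencies.
-- A value matching ~80% of rows is a poor index candidate;
-- n_distinct = -1 means every value is unique (highly selective).
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'demo_accounts'
  AND attname IN ('id', 'status');
```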
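Here's the real-world cost. On a table with 10 million orders, adding an index on customer_id dropped a customer lookup from 1.2 seconds to 15ms. That's the good news. The hidden cost: every INSERT now has to add an entry to that index too, and so does every UPDATE that can't be done as a heap-only tuple (HOT) update, which includes any UPDATE that changes an indexed column. As a rough rule of thumb, each index adds something like 10-20% to write cost, though the real figure depends on row width, index type, and update patterns. Ten indexes on an INSERT-heavy table compound that into a serious bottleneck.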
```sql
-- Wrong: indexing every column on a high-write events table
CREATE INDEX idx_events_type ON events(event_type);
CREATE INDEX idx_events_user ON events(user_id);
CREATE INDEX idx_events_ts ON events(created_at);
CREATE INDEX idx_events_session ON events(session_id);
CREATE INDEX idx_events_ip ON events(ip_address);
-- 5 indexes = every INSERT maintains 5 separate data structures

-- Right: analyze which queries actually run, then index selectively
-- If 90% of queries filter by user_id + created_at, one composite index covers it
CREATE INDEX idx_events_user_ts ON events(user_id, created_at);
```
The rule: index columns that appear in WHERE clauses of frequent, selective queries. Don't pre-index "just in case."
Designing & Creating the Right Indexes for Your Workload
Index Selection Strategy: OLTP vs. OLAP Patterns
OLTP workloads (transactional systems such as e-commerce checkouts, user logins, and order updates) run thousands of small queries per second, each touching a few rows. These benefit heavily from targeted indexes, because what matters is the latency of each individual query.
OLAP workloads (analytics — monthly revenue reports, cohort analysis, aggregations) scan large chunks of the table. Indexes rarely help here because PostgreSQL will often just do the sequential scan regardless. Aggressive indexing on an analytics table wastes storage and slows ETL loads.
```sql
-- E-commerce schema: OLTP index strategy

-- Fast user lookups by email (login flow)
CREATE INDEX idx_users_email ON users(email);

-- Fast order lookup by customer (order history page)
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Fast product search by category + availability
CREATE INDEX idx_products_category_active ON products(category_id, is_available);

-- Revenue-by-month report (OLAP): no index needed.
-- It aggregates every order in the table, so PostgreSQL reads the
-- whole table either way, and a seq scan is the right plan here:
-- SELECT DATE_TRUNC('month', created_at) AS month, SUM(total)
-- FROM orders
-- GROUP BY 1
-- ORDER BY 1;
```
Hybrid workloads are the hard case. If the same table handles both real-time OLTP reads and nightly OLAP aggregations, partial indexes are your escape hatch — index only the active subset used by OLTP queries, leave the historical data unindexed for batch processing.
Multi-Column Indexes & Covering Indexes for Query Optimization
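Column order in a composite index is not arbitrary. A composite B-tree index is most effective when your query constrains its leading columns. An index on (org_id, is_active, email) helps queries filtering on org_id alone, org_id + is_active, or all three. It does little for a query filtering only on email: with no condition on org_id, the planner would have to walk the whole index (PostgreSQL 18's B-tree skip scan narrows this gap only when the leading columns have few distinct values).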
```sql
-- Wrong order: email is rarely queried alone, org_id is the common filter
CREATE INDEX idx_users_bad ON users(email, org_id, is_active);

-- Right order: leading column is the most commonly filtered
CREATE INDEX idx_users_org_active ON users(org_id, is_active, email);

-- This query uses the index efficiently:
SELECT * FROM users WHERE org_id = 5 AND is_active = true;

-- This query can't use idx_users_org_active efficiently (no condition on org_id):
SELECT * FROM users WHERE email = 'alice@example.com';
-- (needs its own index)
```
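Covering indexes take this further. A normal Index Scan finds the entry in the index and then visits the table (the heap) for the rest of the row data. When you use INCLUDE (PostgreSQL 11+) to store the extra columns a query needs alongside the index key, PostgreSQL can answer the query from the index alone with an Index Only Scan, provided the table's visibility map marks the relevant pages as all-visible (VACUUM maintains it).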
```sql
-- Without covering index: Index Scan reads the index, then the heap for total/status
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- With covering index: Index Only Scan, no heap access
CREATE INDEX idx_orders_customer_covering ON orders(customer_id)
INCLUDE (total, status);

-- A query the covering index can answer on its own:
SELECT total, status FROM orders WHERE customer_id = 42;

-- EXPLAIN ANALYZE output (abridged) showing the difference:
-- Index Only Scan using idx_orders_customer_covering on orders
--   (cost=0.43..4.45 rows=1 width=16)
--   Index Cond: (customer_id = 42)
--   Heap Fetches: 0   ← this is what you want
```
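Here is the full sequence as a minimal sketch on a hypothetical demo_orders_cov table: load data, build the covering index, VACUUM so the visibility map is current, then check the plan. On a freshly loaded or heavily updated table that hasn't been vacuumed, expect Heap Fetches to be non-zero.

```sql
-- Hypothetical demo table: 1,000 customers with 100 orders each
CREATE TABLE demo_orders_cov (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    total       numeric(10, 2) NOT NULL,
    status      text NOT NULL
);
INSERT INTO demo_orders_cov (id, customer_id, total, status)
SELECT g, g % 1000, (g % 500) + 0.99, 'shipped'
FROM generate_series(1, 100000) AS g;

-- INCLUDE columns live in the leaf entries but are not search keys
CREATE INDEX demo_orders_cov_idx ON demo_orders_cov (customer_id) INCLUDE (total, status);

-- VACUUM sets the visibility-map bits; ANALYZE refreshes planner statistics
VACUUM ANALYZE demo_orders_cov;

-- Expect: Index Only Scan using demo_orders_cov_idx ... Heap Fetches: 0
EXPLAIN (ANALYZE, COSTS OFF)
SELECT total, status FROM demo_orders_cov WHERE customer_id = 42;
```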
Partial Indexes & Expression Indexes for Advanced Scenarios
Partial indexes are one of the most underused features in PostgreSQL. Instead of indexing an entire column across all rows, you index only the rows matching a condition. The result: a smaller index that's faster to scan and cheaper to maintain.
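```sql
-- Soft delete pattern: 99% of queries only touch non-deleted users
-- Wrong: a full index on email also carries every soft-deleted row
CREATE INDEX idx_users_email_all ON users(email);
-- Right: partial index only covers active rows
CREATE INDEX idx_active_users_email ON users(email) WHERE deleted_at IS NULL;
-- Queries must repeat the predicate for the planner to use it:
-- SELECT * FROM users WHERE email = $1 AND deleted_at IS NULL;

-- Expression index: queries using LOWER(email) need this to use the index
-- Wrong: index on raw email won't help case-insensitive searches
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com'; -- seq scan!
-- Right: index the expression itself
CREATE INDEX idx_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com'; -- index scan!

-- Expression index for date-based filtering.
-- Only works if created_at is timestamp WITHOUT time zone: for timestamptz,
-- EXTRACT depends on the TimeZone setting, isn't IMMUTABLE, and CREATE INDEX fails.
CREATE INDEX idx_year_orders ON orders((EXTRACT(YEAR FROM created_at)));
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2025; -- uses index
-- Often simpler: a plain index on created_at plus a range predicate:
-- WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
```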
Expression indexes come with a gotcha: your query must use the same expression as the index definition for PostgreSQL to recognize it. LOWER(email) and lower(email) are the same expression (unquoted function names are case-insensitive), but email ILIKE 'alice@example.com' or UPPER(email) = 'ALICE@EXAMPLE.COM' won't use an index on LOWER(email), even though they express the same search.
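A quick way to check whether a query's expression matches an index is to EXPLAIN both forms. A minimal sketch with a hypothetical demo_members table: the lower() query can use the expression index, while the upper() query on the same column cannot.

```sql
-- Hypothetical demo table with mixed-case emails
CREATE TABLE demo_members (
    id    bigint PRIMARY KEY,
    email text NOT NULL
);
INSERT INTO demo_members (id, email)
SELECT g, 'User' || g || '@Example.com' FROM generate_series(1, 100000) AS g;

CREATE INDEX demo_members_email_lower ON demo_members (lower(email));
-- ANALYZE also gathers statistics on the indexed expression itself
ANALYZE demo_members;

-- Same expression as the index definition: eligible for the index
EXPLAIN SELECT id FROM demo_members WHERE lower(email) = 'user42@example.com';

-- Different expression: no matching index, so expect a Seq Scan
EXPLAIN SELECT id FROM demo_members WHERE upper(email) = 'USER42@EXAMPLE.COM';
```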
Diagnosing Slow Queries & Identifying Missing Indexes
Using EXPLAIN & EXPLAIN ANALYZE to Find Bottlenecks
EXPLAIN shows the query plan with estimated costs. EXPLAIN ANALYZE actually executes the statement, side effects included, and reports real timings and row counts. For production diagnostics, use EXPLAIN (ANALYZE, BUFFERS): the BUFFERS option shows how many pages were found in shared buffers versus read from outside them, which is often where the real problem lies.
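To diagnose a data-changing statement without keeping its effects, a common safeguard is to wrap it in a transaction and roll back: you still get real timings and buffer counts. A minimal sketch on a hypothetical demo_invoices table:

```sql
-- Hypothetical demo table
CREATE TABLE demo_invoices (
    id     bigint PRIMARY KEY,
    status text NOT NULL
);
INSERT INTO demo_invoices (id, status)
SELECT g, 'pending' FROM generate_series(1, 1000) AS g;

BEGIN;
-- Runs the UPDATE for real, so timings and buffer counts are accurate...
EXPLAIN (ANALYZE, BUFFERS)
UPDATE demo_invoices SET status = 'paid' WHERE id <= 10;
-- ...then discards its effects
ROLLBACK;
```

The rolled-back UPDATE still leaves dead row versions for VACUUM to clean up, so avoid running this in a tight loop against a hot table.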
```sql
-- Use this form for production debugging
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

-- Example annotated output:
-- Index Scan using idx_orders_customer on orders
--   (cost=0.43..1200.50 rows=847 width=156)
--   (actual time=0.089..12.445 rows=834 loops=1)
--   Index Cond: (customer_id = 42)
--   Filter: (status = 'pending'::text)
--   Rows Removed by Filter: 1243   ← RED FLAG: fetched via the index, then discarded
--   Buffers: shared hit=156 read=89   ← 89 blocks not in shared buffers; consider a composite index
-- Planning Time: 0.312 ms
-- Execution Time: 13.201 ms
```
"Rows Removed by Filter" is a red flag. It means PostgreSQL used the index to find rows matching the first condition, then discarded many of them because of a second filter. The fix is usually a composite index that includes the filtering column. See the composite index deep dive for more on this pattern.
Sequential scans aren't always bad. On a 1,000-row table, a seq scan is often faster than an index lookup, because the whole table fits in a handful of pages. The alarm bells ring when you see Seq Scan on a table with millions of rows and a cost number in the tens of thousands.
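To see the composite-index fix for Rows Removed by Filter in action, here's a minimal sketch on a hypothetical demo_orders_flt table. With an index on (customer_id, status), both predicates become index conditions and the plan has no separate Filter step.

```sql
-- Hypothetical demo table: 100 orders per customer, 1 in 4 pending
CREATE TABLE demo_orders_flt (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    status      text NOT NULL
);
INSERT INTO demo_orders_flt (id, customer_id, status)
SELECT g, g % 1000,
       CASE WHEN (g / 1000) % 4 = 0 THEN 'pending' ELSE 'shipped' END
FROM generate_series(1, 100000) AS g;

-- Both filtered columns are index keys, so both go into the Index Cond
CREATE INDEX demo_orders_flt_cust_status ON demo_orders_flt (customer_id, status);
ANALYZE demo_orders_flt;

-- Expect an Index Cond on customer_id AND status, and no "Rows Removed by Filter"
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM demo_orders_flt WHERE customer_id = 42 AND status = 'pending';
```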
Query Monitoring & Index Gap Detection in Production
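Enable pg_stat_statements: it's the single most useful tool for finding slow queries at scale. It ships with PostgreSQL as an extension; add it to shared_preload_libraries, restart the server, then run CREATE EXTENSION pg_stat_statements; in the database you'll query it from. It tracks call counts, total time, and mean execution time across all queries your database runs.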
```sql
-- Find your worst offenders: slow queries running frequently
-- (PostgreSQL 13+ column names; older versions use mean_time, total_time, stddev_time)
SELECT
    query,
    calls,
    ROUND(mean_exec_time::numeric, 2) AS mean_ms,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    ROUND(stddev_exec_time::numeric, 2) AS stddev_ms
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find indexes that nobody uses (candidates for removal).
-- Unique and primary-key indexes are skipped: they enforce constraints even if never scanned.
SELECT
    s.schemaname,
    s.relname AS table_name,
    s.indexrelname AS index_name,
    s.idx_scan,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;

-- Find tables doing heavy sequential scans (missing index candidates)
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND n_live_tup > 10000
ORDER BY seq_tup_read DESC;
```
For slow query logging, set log_min_duration_statement = 1000 in your postgresql.conf to capture any query taking over 1 second. That's your first diagnostic pass. Tighten it to 100ms once you've cleared the obvious problems. Check out the PostgreSQL performance tuning checklist for a full configuration walkthrough.
Anti-Patterns That Kill Performance
The most common anti-pattern: redundant indexes. If you have an index on (customer_id) and another on (customer_id, status), the second index makes the first obsolete for most queries — but PostgreSQL maintains both on every write.
```sql
-- Wrong: idx_orders_cust duplicates the leading column of both composites,
-- and all three are maintained on every INSERT (and every non-HOT UPDATE)
CREATE INDEX idx_orders_cust ON orders(customer_id);
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);
CREATE INDEX idx_orders_cust_date ON orders(customer_id, created_at);

-- Right: analyze your actual query patterns first.
-- If queries filter on either (customer_id) or (customer_id + status),
-- one composite handles both:
CREATE INDEX idx_orders_cust_status ON orders(customer_id, status);
-- The leading column (customer_id) covers single-column filter queries too
```
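To find prefix-redundant indexes across a database instead of by eye, you can compare each index's column list with the others on the same table in pg_index. A minimal sketch, assuming plain B-tree indexes only: it skips unique, partial, expression, and INCLUDE indexes on the redundant side, and it doesn't compare operator classes or collations, so treat its output as candidates to review rather than a drop list. The demo_orders_dup table and the redundant_index_candidates view are hypothetical names.

```sql
-- Hypothetical demo: demo_orders_dup_cust is a leading prefix of demo_orders_dup_cust_status
CREATE TABLE demo_orders_dup (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    status      text NOT NULL
);
CREATE INDEX demo_orders_dup_cust ON demo_orders_dup (customer_id);
CREATE INDEX demo_orders_dup_cust_status ON demo_orders_dup (customer_id, status);

-- Plain B-tree indexes (non-unique, non-partial, no expressions, no INCLUDE)
-- whose column list is a leading prefix of another B-tree index on the same table
CREATE VIEW redundant_index_candidates AS
SELECT
    a.indrelid::regclass   AS table_name,
    a.indexrelid::regclass AS redundant_index,
    b.indexrelid::regclass AS covered_by
FROM pg_index AS a
JOIN pg_index AS b
  ON b.indrelid = a.indrelid
 AND b.indexrelid <> a.indexrelid
JOIN pg_class AS ca ON ca.oid = a.indexrelid
JOIN pg_class AS cb ON cb.oid = b.indexrelid
JOIN pg_am AS am ON am.oid = ca.relam
WHERE am.amname = 'btree'
  AND cb.relam = ca.relam
  AND NOT a.indisunique
  AND a.indpred IS NULL AND b.indpred IS NULL
  AND a.indexprs IS NULL AND b.indexprs IS NULL
  AND a.indnatts = a.indnkeyatts AND b.indnatts = b.indnkeyatts
  -- indkey is an int2vector whose text form lists column numbers, e.g. '2' and '2 3'
  AND (b.indkey::text || ' ') LIKE (a.indkey::text || ' %');

SELECT * FROM redundant_index_candidates;
```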
Low-cardinality columns are another trap. Indexing a boolean column like is_processed where 95% of rows are true gives you almost nothing: PostgreSQL's query planner will likely ignore the index and seq scan. Use a partial index instead, keyed on the column your queries filter or sort by and restricted with WHERE is_processed = false, so it holds only the 5% of rows you actually look up.
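A minimal sketch of that partial index on a hypothetical demo_jobs queue table, where about 5% of rows are unprocessed. The index holds only those rows, and a polling query that repeats the same predicate can use it to fetch the oldest pending jobs without sorting.

```sql
-- Hypothetical queue table: 1 in 20 rows still needs processing
CREATE TABLE demo_jobs (
    id           bigint PRIMARY KEY,
    created_at   timestamptz NOT NULL,
    is_processed boolean NOT NULL
);
INSERT INTO demo_jobs (id, created_at, is_processed)
SELECT g, now() - g * interval '1 minute', g % 20 <> 0
FROM generate_series(1, 100000) AS g;

-- Index only the small unprocessed slice, ordered for "oldest first" polling
CREATE INDEX demo_jobs_pending ON demo_jobs (created_at)
WHERE is_processed = false;
ANALYZE demo_jobs;

-- The query's WHERE clause must imply the index predicate
EXPLAIN
SELECT id FROM demo_jobs
WHERE is_processed = false
ORDER BY created_at
LIMIT 10;
```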
Maintaining Indexes & Preventing Performance Degradation
Index Bloat: Detection, Impact, and Remediation
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means UPDATE and DELETE operations don't immediately remove old row versions; they leave dead tuples behind. VACUUM removes those dead tuples from the table and their entries from every index, but it doesn't compact the index: a B-tree page that loses most of its entries stays allocated, and only pages that become completely empty are recycled. Over time, this creates bloat: the index grows larger than necessary, scans take longer, and you're doing more I/O for the same logical data.
```sql
-- Quick proxy: dead-tuple ratio per table (a table-level signal, not a direct index measure)
SELECT
    schemaname,
    relname AS table_name,
    n_live_tup,
    n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_autovacuum,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE n_live_tup > 1000
ORDER BY dead_pct DESC;

-- Page-level B-tree analysis with pgstatindex() from the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
    index_size,
    leaf_fragmentation,
    avg_leaf_density
FROM pgstatindex('idx_orders_customer_id');
-- Rough heuristics, not hard limits:
-- leaf_fragmentation > 30% = consider rebuilding
-- avg_leaf_density < 50% = significant bloat (a freshly built B-tree sits near 90%)
```
For remediation, you have three options. VACUUM removes dead index entries but doesn't repack or shrink the index. REINDEX rebuilds the index from scratch: effective, but it blocks writes to the table and takes an ACCESS EXCLUSIVE lock on the index, so any query that would use that index waits too. In production, use REINDEX CONCURRENTLY (PostgreSQL 12+), which rebuilds without blocking reads or writes at the cost of taking longer and doing more total work.
```sql
-- Blocking rebuild (only for maintenance windows or non-critical tables)
REINDEX INDEX idx_orders_customer_id;

-- Non-blocking rebuild (safe for production; PostgreSQL 12+,
-- cannot run inside a transaction block)
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

-- Rebuild all indexes on a table without blocking reads or writes
REINDEX TABLE CONCURRENTLY orders;
```
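To watch bloat appear and disappear, here's a minimal sketch using pgstatindex() on a hypothetical demo_bloat table (pgstattuple functions require superuser or membership in pg_stat_scan_tables). It deletes 9 of every 10 rows in a scattered pattern, so VACUUM removes the dead index entries but no leaf page empties completely; the pages stay allocated and sparse until REINDEX packs them.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Hypothetical demo table; the primary key index is named demo_bloat_pkey
CREATE TABLE demo_bloat (
    id      bigint PRIMARY KEY,
    payload text
);
INSERT INTO demo_bloat (id, payload)
SELECT g, 'row ' || g FROM generate_series(1, 200000) AS g;

-- Delete 9 of every 10 rows, spread across every leaf page
DELETE FROM demo_bloat WHERE id % 10 <> 0;
-- VACUUM removes the dead index entries but doesn't merge sparse pages
VACUUM demo_bloat;

-- Record leaf density and page count before and after the rebuild
CREATE TABLE demo_bloat_log AS
SELECT 'before' AS stage, avg_leaf_density, leaf_pages
FROM pgstatindex('demo_bloat_pkey');

REINDEX INDEX demo_bloat_pkey;

INSERT INTO demo_bloat_log
SELECT 'after', avg_leaf_density, leaf_pages
FROM pgstatindex('demo_bloat_pkey');

SELECT * FROM demo_bloat_log;
```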
Index Maintenance Strategy & Automation
Autovacuum handles most routine maintenance if it's configured correctly. The default settings are conservative: they work fine for small databases but fall behind on high-write production systems. The key parameters to tune are autovacuum_vacuum_scale_factor (default 0.2, meaning a vacuum triggers once dead tuples exceed 50 plus 20% of the table's rows) and autovacuum_analyze_scale_factor (default 0.1).
```sql
-- Override autovacuum settings per-table for high-write tables
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- vacuum at ~1% dead rows (not 20%)
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 2         -- ms; lighter throttling than the old 20ms default
                                             -- (2ms is already the default on PostgreSQL 12+)
);

-- Check if autovacuum is keeping up
SELECT
    relname,
    last_autovacuum,
    last_autoanalyze,
    autovacuum_count,
    n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
```
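Because the trigger point mixes server-wide settings, per-table overrides, and the table's estimated row count, it's hard to eyeball. A minimal sketch of a view that computes it for each ordinary table, following the documented formula (threshold + scale factor × reltuples) and reading overrides from pg_class.reloptions; the demo_events_av table and autovacuum_trigger_points view are hypothetical names.

```sql
-- Hypothetical high-write table with a per-table override
CREATE TABLE demo_events_av (
    id      bigint PRIMARY KEY,
    payload text
);
INSERT INTO demo_events_av (id, payload)
SELECT g, 'event ' || g FROM generate_series(1, 100000) AS g;
ANALYZE demo_events_av;  -- sets pg_class.reltuples
ALTER TABLE demo_events_av SET (autovacuum_vacuum_scale_factor = 0.01);

-- Dead tuples needed before autovacuum vacuums each table:
--   threshold + scale_factor * reltuples
-- Per-table reloptions win; otherwise the server-wide settings apply.
CREATE VIEW autovacuum_trigger_points AS
SELECT
    c.oid::regclass AS table_name,
    GREATEST(c.reltuples, 0)::bigint AS est_rows,
    COALESCE(o.scale, current_setting('autovacuum_vacuum_scale_factor')::float8) AS scale_factor,
    COALESCE(o.threshold, current_setting('autovacuum_vacuum_threshold')::float8) AS base_threshold,
    COALESCE(o.threshold, current_setting('autovacuum_vacuum_threshold')::float8)
      + COALESCE(o.scale, current_setting('autovacuum_vacuum_scale_factor')::float8)
        * GREATEST(c.reltuples, 0) AS dead_tuples_to_trigger
FROM pg_class AS c
CROSS JOIN LATERAL (
    SELECT
        (max(option_value) FILTER (WHERE option_name = 'autovacuum_vacuum_scale_factor'))::float8 AS scale,
        (max(option_value) FILTER (WHERE option_name = 'autovacuum_vacuum_threshold'))::float8 AS threshold
    FROM pg_options_to_table(c.reloptions)
) AS o
WHERE c.relkind = 'r'
  AND c.relnamespace NOT IN ('pg_catalog'::regnamespace, 'information_schema'::regnamespace);

SELECT * FROM autovacuum_trigger_points
WHERE table_name = 'demo_events_av'::regclass;
```

With the default threshold of 50, the override above means demo_events_av is vacuumed after roughly 1,050 dead tuples instead of about 20,050.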
Set up a monthly audit using the unused index query from the previous section. Dropping unused indexes is free performance: every index you remove speeds up writes and reduces storage. Don't be sentimental about indexes. If idx_scan = 0 over a 30-day window on a busy table, it's not helping anyone, but check your read replicas too: each server keeps its own counters, so an index that looks idle on the primary may be serving replica queries. For deeper reference, see the official PostgreSQL indexing documentation.
Also monitor index size trends over time. A well-maintained index on a 10M-row table should be stable in size. If it's growing 20% month-over-month with no corresponding data growth, you have a bloat problem that REINDEX CONCURRENTLY needs to address. The pganalyze tool automates most of this monitoring if you want a managed solution.
Frequently Asked Questions

Q: How many indexes are too many on a single table?
A: There's no fixed number, but once you're past 5-6 indexes on a high-write table, question each one. Run the unused index query against pg_stat_user_indexes after 30 days of production traffic and drop anything with zero scans. The real limit is write throughput degradation — benchmark INSERT performance with your actual workload.
Q: Why is PostgreSQL ignoring my index and doing a sequential scan?
A: Either the query planner thinks the index won't help (low selectivity, small table, or stale statistics), or the index genuinely won't help. Run EXPLAIN (ANALYZE, BUFFERS) to confirm. If statistics are stale, run ANALYZE tablename. To push the planner toward the index temporarily for debugging, run SET enable_seqscan = off in your session; it doesn't forbid sequential scans outright, it just makes them look very expensive. Never leave it set in production.
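A safer way to run that experiment is SET LOCAL inside a transaction: the setting reverts automatically at COMMIT or ROLLBACK, so it can't leak into the rest of the session or into a pooled connection. A minimal sketch with a hypothetical demo_tiny table:

```sql
-- Hypothetical small table where the planner rightly prefers a Seq Scan
CREATE TABLE demo_tiny (
    id    bigint PRIMARY KEY,
    label text
);
INSERT INTO demo_tiny (id, label)
SELECT g, 'label ' || g FROM generate_series(1, 100) AS g;
ANALYZE demo_tiny;

BEGIN;
SET LOCAL enable_seqscan = off;  -- applies only until this transaction ends
-- Compare this plan's cost with the default plan to see what the planner avoided
EXPLAIN SELECT * FROM demo_tiny WHERE id > 10;
ROLLBACK;

SHOW enable_seqscan;  -- back to the session's previous value
```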
Q: Should I use REINDEX or DROP/CREATE INDEX to fix bloat?
A: Use REINDEX CONCURRENTLY — it's equivalent to dropping and recreating the index but won't lock your table. DROP + CREATE requires you to manage the concurrency yourself and you risk a gap where the index doesn't exist. The only reason to use DROP + CREATE manually is if you need to change the index definition at the same time.
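When you do need a different definition, you can still avoid blocking writes: build the replacement with CREATE INDEX CONCURRENTLY, confirm it is valid, then remove the old one with DROP INDEX CONCURRENTLY. A minimal sketch on a hypothetical demo_orders_swap table; neither concurrent command can run inside a transaction block, and a failed concurrent build leaves an INVALID index behind that you have to drop yourself.

```sql
-- Hypothetical table with an index whose definition we want to change
CREATE TABLE demo_orders_swap (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    status      text NOT NULL
);
INSERT INTO demo_orders_swap (id, customer_id, status)
SELECT g, g % 1000, 'shipped' FROM generate_series(1, 10000) AS g;
CREATE INDEX demo_orders_swap_cust ON demo_orders_swap (customer_id);

-- 1. Build the new definition without blocking writes
CREATE INDEX CONCURRENTLY demo_orders_swap_cust_status
    ON demo_orders_swap (customer_id, status);

-- 2. Confirm the build succeeded (indisvalid = false means drop it and retry)
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indrelid = 'demo_orders_swap'::regclass;

-- 3. Retire the old index without blocking reads or writes on the table
DROP INDEX CONCURRENTLY demo_orders_swap_cust;
```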
Wrap-up
Good indexing isn't a one-time task — it's a diagnostic loop: find slow queries with pg_stat_statements, explain the plan, add the right index (composite, partial, or covering depending on the pattern), then audit unused indexes and rebuild bloated ones on a schedule. The biggest gains come from the composite and covering index patterns, and the biggest wins on maintenance come from tuning autovacuum and running monthly unused-index audits.
Start today: run the pg_stat_statements query, EXPLAIN the slowest entries, and look for sequential scans on tables over 100k rows. There's almost always a quick win waiting there.