10 SQL Query Optimization Techniques (Indexes, EXPLAIN, Joins)
Speed up slow database queries with 10 proven SQL optimization techniques. Covers EXPLAIN ANALYZE, index types, N+1 in SQL, slow query log setup, and real before/after examples.
Get more content like this on Telegram!
Daily AI tips, notes & resources — free
Database performance problems have a specific feeling. The app works fine in development with 100 rows, then you deploy to production with 500,000 rows and suddenly every page takes 8 seconds to load. The query didn't change. The data did.
I've spent more time than I'd like staring at EXPLAIN ANALYZE output, and most slow queries have the same handful of root causes: missing indexes, unnecessary full table scans, N+1 query patterns, or poorly structured joins. These 10 techniques address those causes directly, with real before/after examples.
How to Read EXPLAIN ANALYZE
Before optimizing anything, you need to be able to read what the database is telling you. EXPLAIN ANALYZE is your primary diagnostic tool.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT p.id, p.title, u.name as author_name
FROM posts p
JOIN users u ON p.author_id = u.id
WHERE p.published = true
ORDER BY p.created_at DESC
LIMIT 20;
Sample output:
Limit (cost=1523.45..1523.50 rows=20 width=156) (actual time=284.123..284.135 rows=20 loops=1)
-> Sort (cost=1523.45..1548.45 rows=10000 width=156) (actual time=284.121..284.126 rows=20 loops=1)
Sort Key: p.created_at DESC
Sort Method: top-N heapsort Memory: 26kB
-> Hash Join (cost=85.00..1273.45 rows=10000 width=156) (actual time=1.234..279.456 rows=9876 loops=1)
Hash Cond: (p.author_id = u.id)
-> Seq Scan on posts p (cost=0.00..1154.00 rows=10000 width=124) (actual time=0.021..240.123 rows=9876 loops=1)
Filter: (published = true)
Rows Removed by Filter: 124
-> Hash (cost=55.00..55.00 rows=2400 width=36) (actual time=1.201..1.201 rows=2400 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 172kB
-> Seq Scan on users u (cost=0.00..55.00 rows=2400 width=36) (actual time=0.012..0.891 rows=2400 loops=1)
Planning Time: 0.542 ms
Execution Time: 284.267 ms
Key things to read:
Seq Scan— scanning the entire table. Almost always the thing to fix with an index.cost=X..Y— estimated cost (arbitrary units). X is startup cost, Y is total cost.actual time=X..Y— real milliseconds. This is what matters.rows=N(estimated) vsrows=N(actual) — big gaps here mean the planner has stale statistics. RunANALYZE tablenameto update.loops=N— how many times this node executed. A nested loop with 10,000 loops is a red flag.
That 284ms query has a Seq Scan on posts scanning 10,000 rows. Let's fix it.
Technique 1: Add the Right Index
The most common fix. The query above scans every post row looking for published = true. An index on published helps, but a composite index on (published, created_at) is better — it covers both the filter and the sort:
-- Basic index on filter column
CREATE INDEX idx_posts_published ON posts(published);
-- Better: composite index covering filter + sort
CREATE INDEX idx_posts_published_created ON posts(published, created_at DESC);
After adding the composite index, the same query:
Limit (cost=0.43..1.23 rows=20 width=156) (actual time=0.089..0.134 rows=20 loops=1)
-> Index Scan using idx_posts_published_created on posts p
Index Cond: (published = true)
Execution Time: 0.201 ms
From 284ms to 0.2ms. The sort is now free because the index already stores data in the right order.
A few index creation tips:
-- Create index without locking the table (important on production)
CREATE INDEX CONCURRENTLY idx_posts_published_created
ON posts(published, created_at DESC);
-- Partial index for common filter (smaller, faster)
CREATE INDEX idx_posts_active ON posts(created_at DESC)
WHERE published = true;
-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE tablename = 'posts'
ORDER BY idx_scan DESC;
A partial index (the third example) only indexes rows where published = true. If 90% of your queries filter for published posts, this gives you a tiny index that fits in memory and queries even faster.
Technique 2: Index Types — Choosing the Right Tool
PostgreSQL supports several index types, and using the wrong one is like using a hammer when you need a scalpel:
| Index Type | Algorithm | Best For | Not For |
|---|---|---|---|
| B-tree (default) | Balanced tree | Equality, range, ORDER BY, LIKE 'prefix%' | LIKE '%suffix', full-text search |
| Hash | Hash table | Equality only (=) | Range queries (<, >, BETWEEN) |
| GIN | Inverted index | Arrays, JSONB, full-text search, tsvector | Simple scalar columns |
| GiST | Generalized search | Geometric types, full-text (tsvector), IP ranges | Simple scalar equality |
| BRIN | Block range | Very large tables with sequential data (time series, IoT) | Random data distribution |
| SP-GiST | Space-partitioned tree | Geometric, IP, phone-tree structures | General purpose |
-- GIN index for JSONB column queries
CREATE INDEX idx_posts_metadata ON posts USING GIN(metadata);
-- Now this query uses the index instead of scanning
SELECT * FROM posts WHERE metadata @> '{"featured": true}';
-- GIN index for full-text search
ALTER TABLE posts ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))
) STORED;
CREATE INDEX idx_posts_search ON posts USING GIN(search_vector);
-- Fast full-text search
SELECT id, title FROM posts
WHERE search_vector @@ plainto_tsquery('english', 'database optimization');
For array columns, GIN indexes make operations like @> (contains) and && (overlaps) fast:
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];
Technique 3: Cover Your Queries with Covering Indexes
A covering index includes all columns a query needs, so PostgreSQL can answer the query entirely from the index without touching the main table (a "heap fetch").
-- Query: get post titles and excerpts for published posts
SELECT id, title, excerpt FROM posts WHERE published = true ORDER BY created_at DESC;
-- Regular index: still needs heap fetch for title, excerpt
CREATE INDEX idx_posts_published ON posts(published, created_at DESC);
-- Covering index: includes all needed columns
CREATE INDEX idx_posts_published_cover
ON posts(published, created_at DESC)
INCLUDE (id, title, excerpt);
The EXPLAIN output for the covering index shows Index Only Scan — no heap access, just reading the index structure. This is especially valuable for frequently-run queries on large tables.
Technique 4: Rewrite N+1 Queries as Proper JOINs
N+1 in SQL looks like this in application code:
// N+1 pattern — runs 1 query for posts, then 1 per post for author
const posts = await db.query('SELECT * FROM posts LIMIT 50');
for (const post of posts) {
post.author = await db.query('SELECT * FROM users WHERE id = $1', [post.author_id]);
}
// Total: 51 queries
The SQL equivalent — what this generates — is:
SELECT * FROM posts LIMIT 50;
-- Then 50 times:
SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
-- ...
Rewrite as a single JOIN:
-- Single query, single round trip
SELECT
p.id,
p.title,
p.created_at,
u.id as author_id,
u.name as author_name,
u.avatar_url
FROM posts p
JOIN users u ON p.author_id = u.id
ORDER BY p.created_at DESC
LIMIT 50;
For fetching related data in bulk without a JOIN (useful when the relationship is one-to-many and JOIN would multiply rows):
-- Fetch authors for a known set of posts — one query instead of N
SELECT * FROM users
WHERE id = ANY(ARRAY[1, 2, 3, 4, 5]::int[]);
This is the SQL equivalent of DataLoader batching. See the pattern mirrored in GraphQL resolver best practices for the application-layer version.
Technique 5: SELECT Only What You Need
-- BAD: fetches all columns, including large text fields
SELECT * FROM posts WHERE published = true;
-- GOOD: fetch only what you display
SELECT id, title, excerpt, created_at, author_id
FROM posts
WHERE published = true;
SELECT * on a table with a content column containing 10KB of text per row means you're moving 100MB of text across the wire for 10,000 rows when you only needed titles. This is especially costly when the "wide" columns are JSONB or TEXT types.
Technique 6: Use Query Planner Statistics
PostgreSQL's query planner makes decisions based on table statistics. If stats are stale, the planner makes bad choices.
-- Update statistics for a specific table
ANALYZE posts;
-- Update all tables
ANALYZE;
-- Check when stats were last collected
SELECT schemaname, tablename, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE tablename = 'posts';
-- If the planner underestimates row counts, increase statistics target
ALTER TABLE posts ALTER COLUMN tags SET STATISTICS 500; -- default is 100
ANALYZE posts;
After a large bulk import or DELETE, always run ANALYZE on affected tables before relying on query plan decisions.
Technique 7: Pagination Done Right
This is a common one that looks innocent but destroys performance at scale:
-- SLOW: offset increases with page number — scans and discards all previous rows
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- PostgreSQL must read 10,020 rows and discard the first 10,000
For large datasets, use keyset pagination (cursor-based):
-- FAST: uses the index to start at the right place
-- First page
SELECT id, title, created_at FROM posts
WHERE published = true
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Subsequent pages: pass last row's values as cursor
SELECT id, title, created_at FROM posts
WHERE published = true
AND (created_at, id) < ('2026-05-15T10:30:00Z', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
The second query uses the composite index directly and reads exactly 20 rows regardless of which "page" you're on. OFFSET 10000 reads 10,020 rows and discards 10,000. At page 500, that's reading 10,000+ rows for every request.
Technique 8: Optimize JOINs
JOIN order and type matter:
-- Ensure the smaller table drives the join when possible
-- PostgreSQL's planner usually figures this out, but sometimes needs help
-- Use EXISTS instead of IN for large subqueries
-- SLOW with large subquery result
SELECT * FROM posts WHERE author_id IN (
SELECT id FROM users WHERE country = 'US'
);
-- FASTER: EXISTS stops at first match
SELECT p.* FROM posts p
WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = p.author_id AND u.country = 'US'
);
-- Or use a JOIN (often cleaner and just as fast)
SELECT DISTINCT p.* FROM posts p
JOIN users u ON p.author_id = u.id
WHERE u.country = 'US';
Technique 9: Enable and Use the Slow Query Log
In PostgreSQL:
-- In postgresql.conf (or via ALTER SYSTEM)
ALTER SYSTEM SET log_min_duration_statement = '1000'; -- log queries > 1 second
ALTER SYSTEM SET log_statement = 'none'; -- don't log everything, just slow ones
SELECT pg_reload_conf();
-- Better: use pg_stat_statements extension for cumulative stats
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries by total time
SELECT
round(total_exec_time::numeric, 2) as total_ms,
calls,
round(mean_exec_time::numeric, 2) as mean_ms,
round(stddev_exec_time::numeric, 2) as stddev_ms,
left(query, 100) as query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
In MySQL:
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
-- Or use Performance Schema
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
Technique 10: VACUUM and Table Maintenance
PostgreSQL's MVCC (multi-version concurrency control) means deleted and updated rows leave "dead tuples" behind. These bloat tables and slow queries.
-- Check for bloat
SELECT
tablename,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 1) as dead_pct,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Manual VACUUM (doesn't lock, safe for production)
VACUUM posts;
-- VACUUM ANALYZE: clean dead tuples AND update statistics
VACUUM ANALYZE posts;
-- VACUUM FULL: reclaims disk space but locks the table — use during maintenance windows
VACUUM FULL posts;
Autovacuum handles this automatically in normal operation, but high-churn tables (lots of UPDATEs/DELETEs) sometimes need manual intervention. If dead_pct is consistently above 20%, consider tuning autovacuum parameters for that table.
Before and After: A Real Example
-- BEFORE: 1.2 seconds, sequential scan
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id AND p.published = true
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
-- Execution Time: 1234.567 ms (Seq Scan on posts, 500k rows)
-- Fix: composite index covering the join condition
CREATE INDEX CONCURRENTLY idx_posts_author_published
ON posts(author_id, published)
WHERE published = true;
-- Also index users.created_at for the WHERE filter
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at);
-- AFTER: 8ms, index scans
-- Execution Time: 7.891 ms
For an ORM-based workflow where you want SQL query optimization alongside your schema management, the patterns in Prisma ORM PostgreSQL and PostgreSQL vs MySQL complement what's here. For the API layer on top, SQL basics cheatsheet and SQL joins and subqueries cover the query fundamentals these techniques build on.
Conclusion
Query optimization isn't magic — it's reading what the database tells you and responding appropriately. Start by enabling pg_stat_statements or the slow query log to find which queries actually need attention. Then EXPLAIN ANALYZE each one to see whether you need an index, a query rewrite, or just fresh statistics.
The techniques here — indexing strategically, fixing N+1 joins, choosing the right index type, paginating with cursors, keeping tables vacuumed — address 95% of the slow query problems you'll encounter in a typical web application. Pick the one that matches your current bottleneck and apply it. Then measure again.
Don't add indexes speculatively. Add them because EXPLAIN ANALYZE showed you a sequential scan that's causing pain.
Frequently Asked Questions
How do I find which queries are slowing down my database?
Enable the slow query log. In PostgreSQL, set log_min_duration_statement = 1000 in postgresql.conf (logs queries over 1 second) or use pg_stat_statements extension which tracks cumulative stats for every query. In MySQL, set slow_query_log = ON and long_query_time = 1. Once you've identified slow queries, run EXPLAIN ANALYZE on each one to see exactly where the time is being spent — index scans, sequential scans, sort operations, etc.
When should I add an index and when should I avoid it?
Add an index when: a column appears frequently in WHERE, JOIN, or ORDER BY clauses; the column has high cardinality (many unique values); the table is read-heavy. Avoid indexes when: the table is very small (sequential scan is faster); the column has low cardinality (e.g., a boolean column — an index rarely helps); the table is write-heavy (every INSERT/UPDATE/DELETE must update all indexes). A good starting point: foreign key columns almost always benefit from indexes, and columns in your most common WHERE clauses usually do too.
What is the difference between EXPLAIN and EXPLAIN ANALYZE in PostgreSQL?
EXPLAIN shows the query plan PostgreSQL intends to use — estimated rows, costs, and which indexes it plans to use — without actually executing the query. EXPLAIN ANALYZE actually runs the query and compares the estimated plan against actual execution stats (actual rows, actual time, loops). Use EXPLAIN for a quick check without running expensive queries; use EXPLAIN ANALYZE when you need to see if the planner's estimates are accurate and where actual time is being spent. Always run EXPLAIN ANALYZE with BUFFERS to see cache hit/miss info too.
Frequently Asked Questions
AiTechWorlds Team
✓ Verified WriterThe AiTechWorlds team is passionate about AI, technology, and education. We create high-quality, research-backed content to help you learn, grow, and succeed in the modern digital world.
Related Articles
How to Use Docker Compose for Local Dev (Node.js + PostgreSQL)
Set up a full local dev environment with Docker Compose, Node.js, PostgreSQL, and pgAdmin. Includes .env config, named volumes, healthchecks, and common error fixes.
Build a Data Analysis Agent with AutoGPT (CSV, SQL, Plots)
Build a data analysis agent using AutoGPT that reads CSVs, queries SQL databases, and generates plots automatically. Full code with pandas and matplotlib.
Docker for Backend Developers: Containerize Your API (2026)
A practical Docker tutorial for backend developers — Dockerfile, docker-compose with a database, multi-stage builds, and when to use Docker vs bare metal vs Kubernetes.
10 LangChain Toolkits: SQL, Playwright, GitHub, Slack (2026)
Explore 10 LangChain toolkits including SQLDatabaseToolkit, PlaywrightBrowserToolkit, GitHubToolkit, and SlackToolkit with full Python code and comparison table.