AiTechWorlds
AiTechWorlds
Two chefs are making the same pasta dish. Chef A starts with the most expensive, time-consuming ingredient — slow-simmered tomatoes — then adds everything else, checking all 200 ingredients one by one regardless of relevance. Chef B reads the recipe first, identifies the fastest preparations, filters out what she doesn't need early, and builds the dish in the most efficient sequence.
Same dish. Same ingredients. Chef A takes two hours. Chef B takes twenty minutes.
This is query optimization. The database query optimizer is Chef B — it reads your SQL, considers dozens of execution strategies, and picks the one that accomplishes your request with the least work.
Your SQL is a declaration — you describe what you want, not how to get it. The optimizer decides the how.
SQL Query (what you want)
│
▼
┌─────────────┐
│ Parser │ → Syntax check, build parse tree
└─────────────┘
│
▼
┌─────────────┐
│ Rewriter │ → Expand views, apply rules
└─────────────┘
│
▼
┌─────────────────────┐
│ Planner/Optimizer │ → Generate logical plan
│ │ Estimate costs for each plan
│ │ Choose cheapest physical plan
└─────────────────────┘
│
▼
┌─────────────┐
│ Executor │ → Run the plan, return results
└─────────────┘
The optimizer uses table statistics (row counts, column distributions, histogram data) to estimate how expensive each plan will be. Run ANALYZE in PostgreSQL or ANALYZE TABLE in MySQL to keep these statistics fresh.
EXPLAIN shows the execution plan. EXPLAIN ANALYZE runs the query and shows actual times.
-- PostgreSQL
EXPLAIN ANALYZE
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending' AND o.total > 1000;
Sample output:
Hash Join (cost=245.00..892.00 rows=150 width=48)
(actual time=3.2..8.7 rows=143 loops=1)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..620.00 rows=180 width=32)
(actual time=0.1..5.2 rows=180 loops=1)
Filter: ((status = 'pending') AND (total > 1000))
Rows Removed by Filter: 9820
-> Hash (cost=120.00..120.00 rows=1000 width=24)
(actual time=1.8..1.8 rows=1000 loops=1)
-> Seq Scan on customers c (cost=0.00..120.00 rows=1000 width=24)
Planning Time: 0.9 ms
Execution Time: 9.1 ms
What to look for:
Seq Scan on large tables = potential missing indexcost=X..Y = estimated startup cost .. total costrows= estimate vs actual rows= — large differences mean stale statisticsloops= > 1 in nested loops = inner side runs many times (can be slow)Every column you select requires reading data from disk and transmitting it over the network.
-- Slow: fetches all columns including large TEXT/BLOB fields
SELECT * FROM products WHERE category = 'electronics';
-- Fast: fetches only needed columns
SELECT id, name, price FROM products WHERE category = 'electronics';
SELECT * also breaks covering indexes — the optimizer cannot use an index-only scan if you demand columns not in the index.
Push filters as close to the data source as possible. Eliminate rows before joining, not after.
-- Slow: joins first (potentially millions of rows), then filters
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > 5000;
-- Same query — optimizer usually handles this, but be explicit in subqueries:
SELECT c.name, o.total
FROM customers c
JOIN (
SELECT customer_id, total FROM orders WHERE total > 5000
) o ON c.id = o.customer_id;
An index exists — but will the optimizer use it? These patterns bypass the index:
-- Index on order_date is IGNORED — function applied to column
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- Index USED — rewrite the filter as a range
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
-- Index on price is IGNORED — implicit type conversion
SELECT * FROM products WHERE price = '99.99'; -- price is DECIMAL, not VARCHAR
-- Index USED — correct type
SELECT * FROM products WHERE price = 99.99;
Rule: Never apply functions or implicit conversions to an indexed column in a WHERE clause. Transform the value side instead.
The database typically joins the smallest result set to the largest. Help the optimizer by filtering aggressively before joining.
-- Let the optimizer see your intent: filter before the join
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.hire_date > '2023-01-01' -- Dramatically reduces employees rows
AND d.location = 'New York'; -- Reduces departments rows
One of the most devastating performance anti-patterns in application code:
-- N+1: one query to get 100 orders, then 100 separate queries for each customer
SELECT * FROM orders; -- returns 100 rows
-- Then in a loop for each order:
SELECT * FROM customers WHERE id = ?; -- runs 100 times!
-- FIXED: one JOIN instead
SELECT o.*, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- 1 query. Same result.
The problem query — reporting screen taking 18 seconds:
-- BEFORE (slow)
SELECT * FROM orders
WHERE MONTH(created_at) = 3
AND YEAR(created_at) = 2024
AND status != 'cancelled';
EXPLAIN output:
Seq Scan on orders (cost=0.00..245000.00 rows=12400 width=256)
Filter: ((MONTH(created_at) = 3) AND (YEAR(created_at) = 2024) ...)
Rows Removed by Filter: 987600
Execution Time: 18,200 ms
The optimized query:
-- AFTER (fast)
CREATE INDEX idx_orders_created_status ON orders (created_at, status);
SELECT id, customer_id, total, created_at FROM orders
WHERE created_at >= '2024-03-01'
AND created_at < '2024-04-01'
AND status != 'cancelled';
EXPLAIN output:
Index Scan using idx_orders_created_status on orders
(cost=0.56..3240.00 rows=12200 width=48)
Index Cond: ((created_at >= '2024-03-01') AND (created_at < '2024-04-01'))
Filter: (status <> 'cancelled')
Execution Time: 42 ms
18,200 ms → 42 ms. That is a 433× speedup from three changes:
Don't guess which queries to optimize. Let the database tell you.
-- MySQL: enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
-- PostgreSQL: in postgresql.conf
-- log_min_duration_statement = 1000 (milliseconds)
-- Then query pg_stat_statements:
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
| Technique | Impact | Effort |
|---|---|---|
| Add missing index | Very High | Low |
Remove SELECT * | Medium | Low |
| Avoid functions on indexed columns | High | Low |
| Fix N+1 queries | Very High | Medium |
| Rewrite subqueries as JOINs | Medium | Medium |
| Use covering index | High | Medium |
| Partition large tables | High | High |
Update table statistics (ANALYZE) | Medium | Low |
EXPLAIN ANALYZE is your primary debugging tool — always check the plan before assuming a query is optimal.pg_stat_statements to find the queries that actually need optimization.The fastest query is the one that reads the fewest rows. Every optimization technique ultimately serves this single goal.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises