Two chefs are making the same pasta dish. Chef A starts with the most expensive, time-consuming ingredient — slow-simmered tomatoes — then adds everything else, checking all 200 ingredients one by one regardless of relevance. Chef B reads the recipe first, identifies the fastest preparations, filters out what she doesn't need early, and builds the dish in the most efficient sequence.

Same dish. Same ingredients. Chef A takes two hours. Chef B takes twenty minutes.

This is query optimization. The database query optimizer is Chef B — it reads your SQL, considers dozens of execution strategies, and picks the one that accomplishes your request with the least work.

How the Query Optimizer Works

Your SQL is a declaration — you describe what you want, not how to get it. The optimizer decides the how.

SQL Query (what you want)
        │
        ▼
  ┌─────────────┐
  │   Parser    │  → Syntax check, build parse tree
  └─────────────┘
        │
        ▼
  ┌─────────────┐
  │  Rewriter   │  → Expand views, apply rules
  └─────────────┘
        │
        ▼
  ┌─────────────────────┐
  │  Planner/Optimizer  │  → Generate logical plan
  │                     │    Estimate costs for each plan
  │                     │    Choose cheapest physical plan
  └─────────────────────┘
        │
        ▼
  ┌─────────────┐
  │  Executor   │  → Run the plan, return results
  └─────────────┘

The optimizer uses table statistics (row counts, column distributions, histogram data) to estimate how expensive each plan will be. Run ANALYZE in PostgreSQL or ANALYZE TABLE in MySQL to keep these statistics fresh.

Reading EXPLAIN Output

EXPLAIN shows the execution plan. EXPLAIN ANALYZE runs the query and shows actual times.

-- PostgreSQL
EXPLAIN ANALYZE
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending' AND o.total > 1000;

Sample output:

Hash Join  (cost=245.00..892.00 rows=150 width=48)
           (actual time=3.2..8.7 rows=143 loops=1)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders o  (cost=0.00..620.00 rows=180 width=32)
                             (actual time=0.1..5.2 rows=180 loops=1)
        Filter: ((status = 'pending') AND (total > 1000))
        Rows Removed by Filter: 9820
  ->  Hash  (cost=120.00..120.00 rows=1000 width=24)
             (actual time=1.8..1.8 rows=1000 loops=1)
        ->  Seq Scan on customers c  (cost=0.00..120.00 rows=1000 width=24)
Planning Time: 0.9 ms
Execution Time: 9.1 ms

What to look for:

Seq Scan on large tables = potential missing index
cost=X..Y = estimated startup cost .. total cost
rows= estimate vs actual rows= — large differences mean stale statistics
loops= > 1 in nested loops = inner side runs many times (can be slow)

Optimization Techniques

1. SELECT Only What You Need

Every column you select requires reading data from disk and transmitting it over the network.

-- Slow: fetches all columns including large TEXT/BLOB fields
SELECT * FROM products WHERE category = 'electronics';

-- Fast: fetches only needed columns
SELECT id, name, price FROM products WHERE category = 'electronics';

SELECT * also breaks covering indexes — the optimizer cannot use an index-only scan if you demand columns not in the index.

2. Filter Early with WHERE

Push filters as close to the data source as possible. Eliminate rows before joining, not after.

-- Slow: joins first (potentially millions of rows), then filters
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.total > 5000;

-- Same query — optimizer usually handles this, but be explicit in subqueries:
SELECT c.name, o.total
FROM customers c
JOIN (
    SELECT customer_id, total FROM orders WHERE total > 5000
) o ON c.id = o.customer_id;

3. Use Indexes Correctly

An index exists — but will the optimizer use it? These patterns bypass the index:

-- Index on order_date is IGNORED — function applied to column
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Index USED — rewrite the filter as a range
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Index on price is IGNORED — implicit type conversion
SELECT * FROM products WHERE price = '99.99';  -- price is DECIMAL, not VARCHAR

-- Index USED — correct type
SELECT * FROM products WHERE price = 99.99;

Rule: Never apply functions or implicit conversions to an indexed column in a WHERE clause. Transform the value side instead.

4. JOIN Order Matters

The database typically joins the smallest result set to the largest. Help the optimizer by filtering aggressively before joining.

-- Let the optimizer see your intent: filter before the join
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE e.hire_date > '2023-01-01'   -- Dramatically reduces employees rows
  AND d.location = 'New York';     -- Reduces departments rows

5. The N+1 Problem

One of the most devastating performance anti-patterns in application code:

-- N+1: one query to get 100 orders, then 100 separate queries for each customer
SELECT * FROM orders;  -- returns 100 rows

-- Then in a loop for each order:
SELECT * FROM customers WHERE id = ?;  -- runs 100 times!

-- FIXED: one JOIN instead
SELECT o.*, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- 1 query. Same result.

Before and After: Real Optimization Example

The problem query — reporting screen taking 18 seconds:

-- BEFORE (slow)
SELECT * FROM orders
WHERE MONTH(created_at) = 3
  AND YEAR(created_at) = 2024
  AND status != 'cancelled';

EXPLAIN output:
Seq Scan on orders  (cost=0.00..245000.00 rows=12400 width=256)
  Filter: ((MONTH(created_at) = 3) AND (YEAR(created_at) = 2024) ...)
  Rows Removed by Filter: 987600
Execution Time: 18,200 ms

The optimized query:

-- AFTER (fast)
CREATE INDEX idx_orders_created_status ON orders (created_at, status);

SELECT id, customer_id, total, created_at FROM orders
WHERE created_at >= '2024-03-01'
  AND created_at <  '2024-04-01'
  AND status != 'cancelled';

EXPLAIN output:
Index Scan using idx_orders_created_status on orders
  (cost=0.56..3240.00 rows=12200 width=48)
  Index Cond: ((created_at >= '2024-03-01') AND (created_at < '2024-04-01'))
  Filter: (status <> 'cancelled')
Execution Time: 42 ms

18,200 ms → 42 ms. That is a 433× speedup from three changes:

Removed function from indexed column
Rewrote filter as a range scan
Selected only needed columns

Finding Slow Queries: The Slow Query Log

Don't guess which queries to optimize. Let the database tell you.

-- MySQL: enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- PostgreSQL: in postgresql.conf
-- log_min_duration_statement = 1000  (milliseconds)
-- Then query pg_stat_statements:
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Optimization Checklist

Technique	Impact	Effort
Add missing index	Very High	Low
Remove `SELECT *`	Medium	Low
Avoid functions on indexed columns	High	Low
Fix N+1 queries	Very High	Medium
Rewrite subqueries as JOINs	Medium	Medium
Use covering index	High	Medium
Partition large tables	High	High
Update table statistics (`ANALYZE`)	Medium	Low

Key Takeaways

The query optimizer converts your SQL into an execution plan using cost estimates based on table statistics.
EXPLAIN ANALYZE is your primary debugging tool — always check the plan before assuming a query is optimal.
Never apply functions to indexed columns in WHERE clauses — rewrite the comparison instead.
The N+1 problem is the most common application-level performance killer — always use JOINs instead of loops.
Use the slow query log and pg_stat_statements to find the queries that actually need optimization.
Optimization is data-driven: measure first, optimize second, measure again.

The fastest query is the one that reads the fewest rows. Every optimization technique ultimately serves this single goal.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

30 minLesson 14 of 16

Course Contents(16 lessons)

▾

Chapter 1: Database Fundamentals

What Is a Database? DBMS vs File Systems20 min

ER Diagrams: Designing Databases Visually35 min

The Relational Model: Tables, Keys, and Constraints28 min

Chapter 2: SQL

SQL DDL: Creating Tables, Constraints, and Schemas28 min

SQL DML: SELECT, INSERT, UPDATE, DELETE35 min

SQL JOINs: Combining Tables38 min

SQL Advanced: Subqueries, Aggregates, Window Functions40 min

Chapter 3: Normalization

Functional Dependencies, 1NF, and 2NF32 min

3NF, BCNF: Removing All Redundancy30 min

Chapter 4: Transactions and Concurrency

Transactions and ACID Properties32 min

Concurrency Control: Locks, Isolation Levels30 min

Deadlocks: Detection, Prevention, and Recovery28 min

Chapter 5: Indexing and Optimization

Indexes and B-Trees: Making Queries Fast32 min

Query Optimization and Execution Plans30 min

Chapter 6: NoSQL and Final Project

NoSQL Databases: Types, Use Cases, Tradeoffs30 min

Final Project: Design a Complete Database System45 min