AiTechWorlds
AiTechWorlds
Imagine you just received a 900-page computer networking textbook and your professor asks you to find every page that mentions "TCP/IP." Without an index, you start at page 1 and read every single line until you reach page 900. You might find 37 relevant pages — after reading all 900.
Now imagine the same textbook with a proper index in the back: "TCP/IP — see pages 47, 112, 234, 398, 512, 701." You jump directly to those six pages. Same book, same information. Completely different experience.
Database indexes work exactly this way. They are separate data structures that let the database jump directly to relevant rows instead of scanning the entire table.
A table without an index requires a full table scan — examining every row to find matches.
-- Table: users (10 million rows)
SELECT * FROM users WHERE email = 'alice@example.com';
| Scenario | Method | Time Complexity | Rows Examined |
|---|---|---|---|
| No index | Full table scan | O(n) | 10,000,000 |
| With index | B-Tree lookup | O(log n) | ~24 |
For 10 million rows, a B-Tree lookup requires roughly log₂(10,000,000) ≈ 23 comparisons. That is the difference between milliseconds and minutes.
A B-Tree (Balanced Tree) keeps data sorted and balanced, so every search, insert, and delete takes O(log n) time.
[ 30 | 70 ]
/ \ \
[10|20] [40|50|60] [80|90]
/ | \ / | | \ | \
[5] [15] [25] [35][45][55][65] [75] [85|95]
Properties:
Most databases (MySQL InnoDB, PostgreSQL, SQLite) use B+ Trees, a variant with two key differences:
B+ Tree Structure:
Internal nodes (keys only, no data):
[ 30 | 70 ]
/ | \
[10|20] [40|50|60] [80|90]
| | |
v v v
Leaf nodes (keys + data pointers, linked):
[5→][10→][15→][20→] ↔ [25→][30→][35→] ↔ [40→] ... ↔ [85→][90→][95→]
Why this matters for range queries:
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
The database finds the first matching leaf node, then follows the linked list forward — no need to traverse the tree again.
Automatically created on the primary key. In InnoDB (MySQL), the table data is stored in the B+ Tree itself (called a clustered index).
CREATE TABLE users (
id INT PRIMARY KEY, -- Primary index created automatically
email VARCHAR(255),
name VARCHAR(100)
);
An additional index on any non-primary column.
CREATE INDEX idx_users_email ON users (email);
An index on multiple columns — extremely powerful when used correctly.
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
An index that contains all columns needed by a query. The database never touches the main table.
-- Query needs: customer_id, order_date, total
CREATE INDEX idx_covering ON orders (customer_id, order_date, total);
-- Database resolves the query entirely from the index
An index on a subset of rows. Smaller, faster.
CREATE INDEX idx_active_users ON users (email) WHERE active = true;
-- Only indexes active users — ignores millions of deleted/inactive accounts
For searching within text content.
-- MySQL
CREATE FULLTEXT INDEX idx_content ON articles (title, body);
SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('database optimization');
The columns in a composite index must match your query's filter order to be useful.
-- Index: (last_name, first_name, birth_year)
CREATE INDEX idx_name ON employees (last_name, first_name, birth_year);
-- Uses index fully:
SELECT * FROM employees WHERE last_name = 'Smith' AND first_name = 'John';
-- Uses index partially (only last_name):
SELECT * FROM employees WHERE last_name = 'Smith';
-- Does NOT use index (skips the leading column):
SELECT * FROM employees WHERE first_name = 'John';
Rule: A composite index can only be used left-to-right. Always put the most selective, most-queried column first.
Selectivity = the ratio of distinct values to total rows. High selectivity = useful index.
-- High selectivity (good for indexing)
CREATE INDEX idx_email ON users (email); -- Almost every value is unique
-- Low selectivity (poor index — often ignored by optimizer)
CREATE INDEX idx_gender ON users (gender); -- Only 2-3 distinct values
-- For 1 million users: "male" = 490,000 rows. The index saves nothing.
-- A full scan is faster than 490,000 B-Tree lookups.
A rule of thumb: if a column has fewer than 10–20 distinct values relative to table size, an index on it alone is rarely useful.
Every index is a separate data structure that must be kept in sync with the table.
| Operation | Without Index | With 3 Indexes |
|---|---|---|
| SELECT | Slow (scan) | Fast (lookup) |
| INSERT | Fast | 4× maintenance overhead |
| UPDATE | Fast | Index must be updated |
| DELETE | Fast | Index entries must be removed |
| Storage | Low | Each index uses additional disk space |
Practical guideline: Only create indexes that serve real, frequent queries. Unused indexes are pure overhead.
EXPLAIN shows you how the database will execute a query — and whether it uses an index.
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
Output — No Index (Bad):
Seq Scan on users (cost=0.00..234891.00 rows=1 width=128)
(actual time=1823.412..1823.413 rows=1 loops=1)
Filter: (email = 'alice@example.com')
Rows Removed by Filter: 9999999
Planning Time: 0.5 ms
Execution Time: 1823.8 ms
Output — With Index (Good):
Index Scan using idx_users_email on users (cost=0.56..8.58 rows=1 width=128)
(actual time=0.043..0.044 rows=1 loops=1)
Index Cond: (email = 'alice@example.com')
Planning Time: 0.3 ms
Execution Time: 0.1 ms
Key terms to watch:
Seq Scan = full table scan (often bad on large tables)Index Scan = using a B+ Tree indexIndex Only Scan = covering index, never touches main tableBitmap Heap Scan = index used, then data fetched in batchesEXPLAIN ANALYZE to verify that your indexes are actually being used.An index is a promise to the database: "I will read this column frequently enough to justify the overhead." Make that promise only when you mean it.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises