Imagine you just received a 900-page computer networking textbook and your professor asks you to find every page that mentions "TCP/IP." Without an index, you start at page 1 and read every single line until you reach page 900. You might find 37 relevant pages — after reading all 900.

Now imagine the same textbook with a proper index in the back: "TCP/IP — see pages 47, 112, 234, 398, 512, 701." You jump directly to those six pages. Same book, same information. Completely different experience.

Database indexes work exactly this way. They are separate data structures that let the database jump directly to relevant rows instead of scanning the entire table.

Why Indexes Matter: The Performance Gap

A table without an index requires a full table scan — examining every row to find matches.

-- Table: users (10 million rows)
SELECT * FROM users WHERE email = 'alice@example.com';

Scenario	Method	Time Complexity	Rows Examined
No index	Full table scan	O(n)	10,000,000
With index	B-Tree lookup	O(log n)	~24

For 10 million rows, a B-Tree lookup requires roughly log₂(10,000,000) ≈ 23 comparisons. That is the difference between milliseconds and minutes.

The B-Tree: Default Database Index

A B-Tree (Balanced Tree) keeps data sorted and balanced, so every search, insert, and delete takes O(log n) time.

                    [  30  |  70  ]
                   /         \         \
          [10|20]         [40|50|60]    [80|90]
          /  |  \          /  |  |  \    |   \
        [5] [15] [25]  [35][45][55][65] [75] [85|95]

Properties:

All leaf nodes are at the same depth (balanced)
Each internal node can hold multiple keys
A node with k keys has k+1 child pointers
Lookups, inserts, and deletes are all O(log n)

B+ Tree: What Databases Actually Use

Most databases (MySQL InnoDB, PostgreSQL, SQLite) use B+ Trees, a variant with two key differences:

All actual data (or row pointers) live only in leaf nodes. Internal nodes store only keys for routing.
Leaf nodes are linked in a doubly-linked list. This makes range scans blazing fast.

B+ Tree Structure:

Internal nodes (keys only, no data):
                    [  30  |  70  ]
                   /         |         \
          [10|20]         [40|50|60]    [80|90]
              |                |             |
              v                v             v
Leaf nodes (keys + data pointers, linked):
[5→][10→][15→][20→] ↔ [25→][30→][35→] ↔ [40→] ... ↔ [85→][90→][95→]

Why this matters for range queries:

SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';

The database finds the first matching leaf node, then follows the linked list forward — no need to traverse the tree again.

Types of Indexes

Primary Index

Automatically created on the primary key. In InnoDB (MySQL), the table data is stored in the B+ Tree itself (called a clustered index).

CREATE TABLE users (
    id INT PRIMARY KEY,  -- Primary index created automatically
    email VARCHAR(255),
    name VARCHAR(100)
);

Secondary Index

An additional index on any non-primary column.

CREATE INDEX idx_users_email ON users (email);

Composite Index

An index on multiple columns — extremely powerful when used correctly.

CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

Covering Index

An index that contains all columns needed by a query. The database never touches the main table.

-- Query needs: customer_id, order_date, total
CREATE INDEX idx_covering ON orders (customer_id, order_date, total);
-- Database resolves the query entirely from the index

Partial Index (PostgreSQL)

An index on a subset of rows. Smaller, faster.

CREATE INDEX idx_active_users ON users (email) WHERE active = true;
-- Only indexes active users — ignores millions of deleted/inactive accounts

Full-Text Index

For searching within text content.

-- MySQL
CREATE FULLTEXT INDEX idx_content ON articles (title, body);
SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('database optimization');

Composite Index: Order Matters

The columns in a composite index must match your query's filter order to be useful.

-- Index: (last_name, first_name, birth_year)
CREATE INDEX idx_name ON employees (last_name, first_name, birth_year);

-- Uses index fully:
SELECT * FROM employees WHERE last_name = 'Smith' AND first_name = 'John';

-- Uses index partially (only last_name):
SELECT * FROM employees WHERE last_name = 'Smith';

-- Does NOT use index (skips the leading column):
SELECT * FROM employees WHERE first_name = 'John';

Rule: A composite index can only be used left-to-right. Always put the most selective, most-queried column first.

Index Selectivity: When Indexes Are Useless

Selectivity = the ratio of distinct values to total rows. High selectivity = useful index.

-- High selectivity (good for indexing)
CREATE INDEX idx_email ON users (email);  -- Almost every value is unique

-- Low selectivity (poor index — often ignored by optimizer)
CREATE INDEX idx_gender ON users (gender);  -- Only 2-3 distinct values
-- For 1 million users: "male" = 490,000 rows. The index saves nothing.
-- A full scan is faster than 490,000 B-Tree lookups.

A rule of thumb: if a column has fewer than 10–20 distinct values relative to table size, an index on it alone is rarely useful.

The Tradeoff: Indexes Slow Down Writes

Every index is a separate data structure that must be kept in sync with the table.

Operation	Without Index	With 3 Indexes
SELECT	Slow (scan)	Fast (lookup)
INSERT	Fast	4× maintenance overhead
UPDATE	Fast	Index must be updated
DELETE	Fast	Index entries must be removed
Storage	Low	Each index uses additional disk space

Practical guideline: Only create indexes that serve real, frequent queries. Unused indexes are pure overhead.

Reading EXPLAIN: The Query Execution Plan

EXPLAIN shows you how the database will execute a query — and whether it uses an index.

-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Output — No Index (Bad):

Seq Scan on users  (cost=0.00..234891.00 rows=1 width=128)
                    (actual time=1823.412..1823.413 rows=1 loops=1)
  Filter: (email = 'alice@example.com')
  Rows Removed by Filter: 9999999
Planning Time: 0.5 ms
Execution Time: 1823.8 ms

Output — With Index (Good):

Index Scan using idx_users_email on users  (cost=0.56..8.58 rows=1 width=128)
                                            (actual time=0.043..0.044 rows=1 loops=1)
  Index Cond: (email = 'alice@example.com')
Planning Time: 0.3 ms
Execution Time: 0.1 ms

Key terms to watch:

Seq Scan = full table scan (often bad on large tables)
Index Scan = using a B+ Tree index
Index Only Scan = covering index, never touches main table
Bitmap Heap Scan = index used, then data fetched in batches

Key Takeaways

Without an index, every query is a full table scan: O(n). With a B-Tree index: O(log n).
B+ Trees are the standard: internal nodes hold routing keys, leaf nodes hold data and are linked for efficient range scans.
Index selectivity determines usefulness — low-selectivity columns (booleans, status flags) rarely benefit from simple indexes.
Composite indexes are powerful but order-sensitive — always lead with the most selective, most-filtered column.
Indexes cost write performance and storage — only create indexes that serve real query patterns.
Always use EXPLAIN ANALYZE to verify that your indexes are actually being used.

An index is a promise to the database: "I will read this column frequently enough to justify the overhead." Make that promise only when you mean it.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

32 minLesson 13 of 16

Course Contents(16 lessons)

▾

Chapter 1: Database Fundamentals

What Is a Database? DBMS vs File Systems20 min

ER Diagrams: Designing Databases Visually35 min

The Relational Model: Tables, Keys, and Constraints28 min

Chapter 2: SQL

SQL DDL: Creating Tables, Constraints, and Schemas28 min

SQL DML: SELECT, INSERT, UPDATE, DELETE35 min

SQL JOINs: Combining Tables38 min

SQL Advanced: Subqueries, Aggregates, Window Functions40 min

Chapter 3: Normalization

Functional Dependencies, 1NF, and 2NF32 min

3NF, BCNF: Removing All Redundancy30 min

Chapter 4: Transactions and Concurrency

Transactions and ACID Properties32 min

Concurrency Control: Locks, Isolation Levels30 min

Deadlocks: Detection, Prevention, and Recovery28 min

Chapter 5: Indexing and Optimization

Indexes and B-Trees: Making Queries Fast32 min

Query Optimization and Execution Plans30 min

Chapter 6: NoSQL and Final Project

NoSQL Databases: Types, Use Cases, Tradeoffs30 min

Final Project: Design a Complete Database System45 min