A junior analyst asks: "Show me all the sales records." A senior analyst asks: "Which salesperson generated the most revenue last quarter, excluding returns, and how does that compare to their average over the past two years?"

Basic SELECT answers what. Advanced SQL answers how many, what average, which group ranks highest, and what changed between periods. These questions drive actual business decisions.

This lesson covers the tools that transform SQL from a data retrieval language into a data analysis language.

Setup: The Sales Data

orders

order_id	customer_id	salesperson	product	category	amount	order_date
1	101	Alice	SQL Book	Books	29.99	2024-01-15
2	102	Bob	Notebook	Stationery	9.99	2024-01-20
3	101	Alice	Pen Set	Stationery	14.99	2024-02-01
4	103	Bob	SQL Book	Books	29.99	2024-02-10
5	104	Alice	Dune Novel	Books	19.99	2024-02-15
6	102	Carol	Notebook	Stationery	9.99	2024-03-01
7	105	Carol	SQL Book	Books	29.99	2024-03-10

Aggregate Functions

Aggregate functions collapse multiple rows into a single value.

Function	Purpose	Ignores NULLs?
`COUNT(*)`	Count all rows	No
`COUNT(col)`	Count non-null values in column	Yes
`SUM(col)`	Total of all values	Yes
`AVG(col)`	Arithmetic mean	Yes
`MIN(col)`	Smallest value	Yes
`MAX(col)`	Largest value	Yes

SELECT
    COUNT(*)            AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(amount)         AS total_revenue,
    AVG(amount)         AS avg_order_value,
    MIN(amount)         AS smallest_order,
    MAX(amount)         AS largest_order
FROM orders;

Result:

total_orders	unique_customers	total_revenue	avg_order_value	smallest_order	largest_order
7	5	144.93	20.70	9.99	29.99

GROUP BY: Aggregating by Category

GROUP BY splits the table into groups and runs the aggregate function on each group.

SELECT
    salesperson,
    COUNT(*)    AS orders_count,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY salesperson
ORDER BY total_revenue DESC;

Result:

salesperson	orders_count	total_revenue
Alice	3	64.97
Bob	2	39.98
Carol	2	39.98

Rule: Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. This is enforced by the SQL standard.

GROUP BY with multiple columns

SELECT
    salesperson,
    category,
    SUM(amount) AS revenue
FROM orders
GROUP BY salesperson, category
ORDER BY salesperson, revenue DESC;

Result:

salesperson	category	revenue
Alice	Books	49.98
Alice	Stationery	14.99
Bob	Books	29.99
Bob	Stationery	9.99
Carol	Books	29.99
Carol	Stationery	9.99

HAVING: Filtering After Aggregation

WHERE filters individual rows before aggregation. HAVING filters groups after aggregation.

Execution order:  FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

-- Find salespersons with more than 2 orders
SELECT
    salesperson,
    COUNT(*) AS orders_count
FROM orders
GROUP BY salesperson
HAVING COUNT(*) > 2;

Result:

salesperson	orders_count
Alice	3

You cannot write WHERE COUNT(*) > 2 — aggregate functions are not allowed in WHERE. Use HAVING for post-aggregation filters.

Combined WHERE and HAVING:

-- Among Books orders only, show salespersons with revenue > 25
SELECT salesperson, SUM(amount) AS book_revenue
FROM orders
WHERE category = 'Books'          -- filter rows first
GROUP BY salesperson
HAVING SUM(amount) > 25;          -- then filter groups

Subqueries

A subquery is a query nested inside another query. It runs first and passes its result to the outer query.

Non-correlated subquery (runs once)

-- Find all orders with amount above average
SELECT order_id, salesperson, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);

The inner query calculates the average (20.70), then the outer query filters with that value.

Result:

order_id	salesperson	amount
1	Alice	29.99
4	Bob	29.99
5	Alice	19.99
7	Carol	29.99

Correlated subquery (runs once per outer row)

-- Find customers who spent more than their personal average
SELECT order_id, customer_id, amount
FROM orders o1
WHERE amount > (
    SELECT AVG(amount)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id   -- references outer query
);

The inner query re-runs for each row of the outer query. Powerful but potentially slow on large datasets.

Common Table Expressions (CTEs)

A CTE (using the WITH keyword) is a named, temporary result set defined at the top of a query. It makes complex queries readable by breaking them into named steps.

The same problem: subquery vs CTE

Subquery version (hard to read):

SELECT salesperson, total_revenue
FROM (
    SELECT salesperson, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY salesperson
) AS revenue_summary
WHERE total_revenue > 40;

CTE version (reads like sentences):

WITH revenue_summary AS (
    SELECT salesperson, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY salesperson
)
SELECT salesperson, total_revenue
FROM revenue_summary
WHERE total_revenue > 40;

Result (both versions):

salesperson	total_revenue
Alice	64.97

CTEs shine when you need multiple intermediate steps:

WITH
monthly_totals AS (
    SELECT
        salesperson,
        DATE_FORMAT(order_date, '%Y-%m') AS month,
        SUM(amount) AS monthly_revenue
    FROM orders
    GROUP BY salesperson, DATE_FORMAT(order_date, '%Y-%m')
),
best_month AS (
    SELECT salesperson, MAX(monthly_revenue) AS peak_revenue
    FROM monthly_totals
    GROUP BY salesperson
)
SELECT * FROM best_month
ORDER BY peak_revenue DESC;

Window functions compute values across a set of related rows without collapsing them into one row (unlike GROUP BY). They're the most powerful feature in modern SQL.

ROW_NUMBER(): Assign sequential numbers

SELECT
    salesperson,
    amount,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY order_date) AS order_sequence
FROM orders;

Result:

salesperson	amount	order_date	order_sequence
Alice	29.99	2024-01-15	1
Alice	14.99	2024-02-01	2
Alice	19.99	2024-02-15	3
Bob	9.99	2024-01-20	1
Bob	29.99	2024-02-10	2
Carol	9.99	2024-03-01	1
Carol	29.99	2024-03-10	2

PARTITION BY salesperson restarts the count for each salesperson.

RANK(): Rank with gaps for ties

SELECT
    salesperson,
    SUM(amount) AS total,
    RANK() OVER (ORDER BY SUM(amount) DESC) AS revenue_rank
FROM orders
GROUP BY salesperson;

Result:

salesperson	total	revenue_rank
Alice	64.97	1
Bob	39.98	2
Carol	39.98	2

Bob and Carol tie at rank 2. The next rank would be 4 (rank 3 is skipped). Use DENSE_RANK() to avoid gaps.

LAG() and LEAD(): Compare to previous or next row

SELECT
    order_date,
    amount,
    LAG(amount) OVER (ORDER BY order_date)  AS prev_order_amount,
    amount - LAG(amount) OVER (ORDER BY order_date) AS change
FROM orders
WHERE salesperson = 'Alice';

Result:

order_date	amount	prev_order_amount	change
2024-01-15	29.99	NULL	NULL
2024-02-01	14.99	29.99	-15.00
2024-02-15	19.99	14.99	5.00

LAG() looks backward; LEAD() looks forward. Essential for trend analysis, running totals, and period-over-period comparisons.

Execution Order Reference

1. FROM           -- Which tables?
2. JOIN           -- Combine tables
3. WHERE          -- Filter rows
4. GROUP BY       -- Form groups
5. HAVING         -- Filter groups
6. SELECT         -- Compute output columns
7. DISTINCT       -- Remove duplicate rows
8. ORDER BY       -- Sort result
9. LIMIT/OFFSET   -- Trim to page size

Understanding this order explains why you can't use a SELECT alias in a WHERE clause (WHERE runs before SELECT) but can use it in ORDER BY (ORDER BY runs after SELECT).

Next lesson: Normalization — why you structure tables the way you do, and how to do it correctly.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

40 minLesson 7 of 16

Course Contents(16 lessons)

▾

Chapter 1: Database Fundamentals

What Is a Database? DBMS vs File Systems20 min

ER Diagrams: Designing Databases Visually35 min

The Relational Model: Tables, Keys, and Constraints28 min

Chapter 2: SQL

SQL DDL: Creating Tables, Constraints, and Schemas28 min

SQL DML: SELECT, INSERT, UPDATE, DELETE35 min

SQL JOINs: Combining Tables38 min

SQL Advanced: Subqueries, Aggregates, Window Functions40 min

Chapter 3: Normalization

Functional Dependencies, 1NF, and 2NF32 min

3NF, BCNF: Removing All Redundancy30 min

Chapter 4: Transactions and Concurrency

Transactions and ACID Properties32 min

Concurrency Control: Locks, Isolation Levels30 min

Deadlocks: Detection, Prevention, and Recovery28 min

Chapter 5: Indexing and Optimization

Indexes and B-Trees: Making Queries Fast32 min

Query Optimization and Execution Plans30 min

Chapter 6: NoSQL and Final Project

NoSQL Databases: Types, Use Cases, Tradeoffs30 min

Final Project: Design a Complete Database System45 min

Chapter 2: SQL

SQL Advanced: Subqueries, Aggregates, Window Functions

SQL Advanced Queries

Beyond "Show Me the Data"

Basic SELECT answers what. Advanced SQL answers how many, what average, which group ranks highest, and what changed between periods. These questions drive actual business decisions.

This lesson covers the tools that transform SQL from a data retrieval language into a data analysis language.

Setup: The Sales Data

orders

order_id	customer_id	salesperson	product	category	amount	order_date
1	101	Alice	SQL Book	Books	29.99	2024-01-15
2	102	Bob	Notebook	Stationery	9.99	2024-01-20
3	101	Alice	Pen Set	Stationery	14.99	2024-02-01
4	103	Bob	SQL Book	Books	29.99	2024-02-10
5	104	Alice	Dune Novel	Books	19.99	2024-02-15
6	102	Carol	Notebook	Stationery	9.99	2024-03-01
7	105	Carol	SQL Book	Books	29.99	2024-03-10

Aggregate Functions

Aggregate functions collapse multiple rows into a single value.

Function	Purpose	Ignores NULLs?
`COUNT(*)`	Count all rows	No
`COUNT(col)`	Count non-null values in column	Yes
`SUM(col)`	Total of all values	Yes
`AVG(col)`	Arithmetic mean	Yes
`MIN(col)`	Smallest value	Yes
`MAX(col)`	Largest value	Yes

SELECT
    COUNT(*)            AS total_orders,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(amount)         AS total_revenue,
    AVG(amount)         AS avg_order_value,
    MIN(amount)         AS smallest_order,
    MAX(amount)         AS largest_order
FROM orders;

Result:

total_orders	unique_customers	total_revenue	avg_order_value	smallest_order	largest_order
7	5	144.93	20.70	9.99	29.99

GROUP BY: Aggregating by Category

GROUP BY splits the table into groups and runs the aggregate function on each group.

SELECT
    salesperson,
    COUNT(*)    AS orders_count,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY salesperson
ORDER BY total_revenue DESC;

Result:

salesperson	orders_count	total_revenue
Alice	3	64.97
Bob	2	39.98
Carol	2	39.98

Rule: Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. This is enforced by the SQL standard.

GROUP BY with multiple columns

SELECT
    salesperson,
    category,
    SUM(amount) AS revenue
FROM orders
GROUP BY salesperson, category
ORDER BY salesperson, revenue DESC;

Result:

salesperson	category	revenue
Alice	Books	49.98
Alice	Stationery	14.99
Bob	Books	29.99
Bob	Stationery	9.99
Carol	Books	29.99
Carol	Stationery	9.99

HAVING: Filtering After Aggregation

WHERE filters individual rows before aggregation. HAVING filters groups after aggregation.

Execution order:  FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

-- Find salespersons with more than 2 orders
SELECT
    salesperson,
    COUNT(*) AS orders_count
FROM orders
GROUP BY salesperson
HAVING COUNT(*) > 2;

Result:

salesperson	orders_count
Alice	3

You cannot write WHERE COUNT(*) > 2 — aggregate functions are not allowed in WHERE. Use HAVING for post-aggregation filters.

Combined WHERE and HAVING:

-- Among Books orders only, show salespersons with revenue > 25
SELECT salesperson, SUM(amount) AS book_revenue
FROM orders
WHERE category = 'Books'          -- filter rows first
GROUP BY salesperson
HAVING SUM(amount) > 25;          -- then filter groups

Subqueries

A subquery is a query nested inside another query. It runs first and passes its result to the outer query.

Non-correlated subquery (runs once)

-- Find all orders with amount above average
SELECT order_id, salesperson, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);

The inner query calculates the average (20.70), then the outer query filters with that value.

Result:

order_id	salesperson	amount
1	Alice	29.99
4	Bob	29.99
5	Alice	19.99
7	Carol	29.99

Correlated subquery (runs once per outer row)

-- Find customers who spent more than their personal average
SELECT order_id, customer_id, amount
FROM orders o1
WHERE amount > (
    SELECT AVG(amount)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id   -- references outer query
);

The inner query re-runs for each row of the outer query. Powerful but potentially slow on large datasets.

Common Table Expressions (CTEs)

A CTE (using the WITH keyword) is a named, temporary result set defined at the top of a query. It makes complex queries readable by breaking them into named steps.

The same problem: subquery vs CTE

Subquery version (hard to read):

SELECT salesperson, total_revenue
FROM (
    SELECT salesperson, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY salesperson
) AS revenue_summary
WHERE total_revenue > 40;

CTE version (reads like sentences):

WITH revenue_summary AS (
    SELECT salesperson, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY salesperson
)
SELECT salesperson, total_revenue
FROM revenue_summary
WHERE total_revenue > 40;

Result (both versions):

salesperson	total_revenue
Alice	64.97

CTEs shine when you need multiple intermediate steps:

WITH
monthly_totals AS (
    SELECT
        salesperson,
        DATE_FORMAT(order_date, '%Y-%m') AS month,
        SUM(amount) AS monthly_revenue
    FROM orders
    GROUP BY salesperson, DATE_FORMAT(order_date, '%Y-%m')
),
best_month AS (
    SELECT salesperson, MAX(monthly_revenue) AS peak_revenue
    FROM monthly_totals
    GROUP BY salesperson
)
SELECT * FROM best_month
ORDER BY peak_revenue DESC;

Window functions compute values across a set of related rows without collapsing them into one row (unlike GROUP BY). They're the most powerful feature in modern SQL.

ROW_NUMBER(): Assign sequential numbers

SELECT
    salesperson,
    amount,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY order_date) AS order_sequence
FROM orders;

Result:

salesperson	amount	order_date	order_sequence
Alice	29.99	2024-01-15	1
Alice	14.99	2024-02-01	2
Alice	19.99	2024-02-15	3
Bob	9.99	2024-01-20	1
Bob	29.99	2024-02-10	2
Carol	9.99	2024-03-01	1
Carol	29.99	2024-03-10	2

PARTITION BY salesperson restarts the count for each salesperson.

RANK(): Rank with gaps for ties

SELECT
    salesperson,
    SUM(amount) AS total,
    RANK() OVER (ORDER BY SUM(amount) DESC) AS revenue_rank
FROM orders
GROUP BY salesperson;

Result:

salesperson	total	revenue_rank
Alice	64.97	1
Bob	39.98	2
Carol	39.98	2

Bob and Carol tie at rank 2. The next rank would be 4 (rank 3 is skipped). Use DENSE_RANK() to avoid gaps.

LAG() and LEAD(): Compare to previous or next row

SELECT
    order_date,
    amount,
    LAG(amount) OVER (ORDER BY order_date)  AS prev_order_amount,
    amount - LAG(amount) OVER (ORDER BY order_date) AS change
FROM orders
WHERE salesperson = 'Alice';

Result:

order_date	amount	prev_order_amount	change
2024-01-15	29.99	NULL	NULL
2024-02-01	14.99	29.99	-15.00
2024-02-15	19.99	14.99	5.00

LAG() looks backward; LEAD() looks forward. Essential for trend analysis, running totals, and period-over-period comparisons.

Execution Order Reference

1. FROM           -- Which tables?
2. JOIN           -- Combine tables
3. WHERE          -- Filter rows
4. GROUP BY       -- Form groups
5. HAVING         -- Filter groups
6. SELECT         -- Compute output columns
7. DISTINCT       -- Remove duplicate rows
8. ORDER BY       -- Sort result
9. LIMIT/OFFSET   -- Trim to page size

Understanding this order explains why you can't use a SELECT alias in a WHERE clause (WHERE runs before SELECT) but can use it in ORDER BY (ORDER BY runs after SELECT).

Next lesson: Normalization — why you structure tables the way you do, and how to do it correctly.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

SQL Advanced: Subqueries, Aggregates, Window Functions

SQL Advanced Queries

Beyond "Show Me the Data"

Setup: The Sales Data

Aggregate Functions

GROUP BY: Aggregating by Category

GROUP BY with multiple columns

HAVING: Filtering After Aggregation

Subqueries

Non-correlated subquery (runs once)

Correlated subquery (runs once per outer row)

Common Table Expressions (CTEs)

The same problem: subquery vs CTE

Window Functions: Calculations Across Related Rows

ROW_NUMBER(): Assign sequential numbers

RANK(): Rank with gaps for ties

LAG() and LEAD(): Compare to previous or next row

Execution Order Reference

💬 DiscussionPowered by GitHub Discussions

SQL Advanced: Subqueries, Aggregates, Window Functions

SQL Advanced Queries

Beyond "Show Me the Data"

Setup: The Sales Data

Aggregate Functions

GROUP BY: Aggregating by Category

GROUP BY with multiple columns

HAVING: Filtering After Aggregation

Subqueries

Non-correlated subquery (runs once)

Correlated subquery (runs once per outer row)

Common Table Expressions (CTEs)

The same problem: subquery vs CTE

Window Functions: Calculations Across Related Rows

ROW_NUMBER(): Assign sequential numbers

RANK(): Rank with gaps for ties

LAG() and LEAD(): Compare to previous or next row

Execution Order Reference

💬 DiscussionPowered by GitHub Discussions