AiTechWorlds
AiTechWorlds
A junior analyst asks: "Show me all the sales records." A senior analyst asks: "Which salesperson generated the most revenue last quarter, excluding returns, and how does that compare to their average over the past two years?"
Basic SELECT answers what. Advanced SQL answers how many, what average, which group ranks highest, and what changed between periods. These questions drive actual business decisions.
This lesson covers the tools that transform SQL from a data retrieval language into a data analysis language.
orders
| order_id | customer_id | salesperson | product | category | amount | order_date |
|---|---|---|---|---|---|---|
| 1 | 101 | Alice | SQL Book | Books | 29.99 | 2024-01-15 |
| 2 | 102 | Bob | Notebook | Stationery | 9.99 | 2024-01-20 |
| 3 | 101 | Alice | Pen Set | Stationery | 14.99 | 2024-02-01 |
| 4 | 103 | Bob | SQL Book | Books | 29.99 | 2024-02-10 |
| 5 | 104 | Alice | Dune Novel | Books | 19.99 | 2024-02-15 |
| 6 | 102 | Carol | Notebook | Stationery | 9.99 | 2024-03-01 |
| 7 | 105 | Carol | SQL Book | Books | 29.99 | 2024-03-10 |
Aggregate functions collapse multiple rows into a single value.
| Function | Purpose | Ignores NULLs? |
|---|---|---|
COUNT(*) | Count all rows | No |
COUNT(col) | Count non-null values in column | Yes |
SUM(col) | Total of all values | Yes |
AVG(col) | Arithmetic mean | Yes |
MIN(col) | Smallest value | Yes |
MAX(col) | Largest value | Yes |
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
Result:
| total_orders | unique_customers | total_revenue | avg_order_value | smallest_order | largest_order |
|---|---|---|---|---|---|
| 7 | 5 | 144.93 | 20.70 | 9.99 | 29.99 |
GROUP BY splits the table into groups and runs the aggregate function on each group.
SELECT
salesperson,
COUNT(*) AS orders_count,
SUM(amount) AS total_revenue
FROM orders
GROUP BY salesperson
ORDER BY total_revenue DESC;
Result:
| salesperson | orders_count | total_revenue |
|---|---|---|
| Alice | 3 | 64.97 |
| Bob | 2 | 39.98 |
| Carol | 2 | 39.98 |
Rule: Every column in SELECT must either be in GROUP BY or wrapped in an aggregate function. This is enforced by the SQL standard.
SELECT
salesperson,
category,
SUM(amount) AS revenue
FROM orders
GROUP BY salesperson, category
ORDER BY salesperson, revenue DESC;
Result:
| salesperson | category | revenue |
|---|---|---|
| Alice | Books | 49.98 |
| Alice | Stationery | 14.99 |
| Bob | Books | 29.99 |
| Bob | Stationery | 9.99 |
| Carol | Books | 29.99 |
| Carol | Stationery | 9.99 |
WHERE filters individual rows before aggregation. HAVING filters groups after aggregation.
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
-- Find salespersons with more than 2 orders
SELECT
salesperson,
COUNT(*) AS orders_count
FROM orders
GROUP BY salesperson
HAVING COUNT(*) > 2;
Result:
| salesperson | orders_count |
|---|---|
| Alice | 3 |
You cannot write WHERE COUNT(*) > 2 — aggregate functions are not allowed in WHERE. Use HAVING for post-aggregation filters.
Combined WHERE and HAVING:
-- Among Books orders only, show salespersons with revenue > 25
SELECT salesperson, SUM(amount) AS book_revenue
FROM orders
WHERE category = 'Books' -- filter rows first
GROUP BY salesperson
HAVING SUM(amount) > 25; -- then filter groups
A subquery is a query nested inside another query. It runs first and passes its result to the outer query.
-- Find all orders with amount above average
SELECT order_id, salesperson, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);
The inner query calculates the average (20.70), then the outer query filters with that value.
Result:
| order_id | salesperson | amount |
|---|---|---|
| 1 | Alice | 29.99 |
| 4 | Bob | 29.99 |
| 5 | Alice | 19.99 |
| 7 | Carol | 29.99 |
-- Find customers who spent more than their personal average
SELECT order_id, customer_id, amount
FROM orders o1
WHERE amount > (
SELECT AVG(amount)
FROM orders o2
WHERE o2.customer_id = o1.customer_id -- references outer query
);
The inner query re-runs for each row of the outer query. Powerful but potentially slow on large datasets.
A CTE (using the WITH keyword) is a named, temporary result set defined at the top of a query. It makes complex queries readable by breaking them into named steps.
Subquery version (hard to read):
SELECT salesperson, total_revenue
FROM (
SELECT salesperson, SUM(amount) AS total_revenue
FROM orders
GROUP BY salesperson
) AS revenue_summary
WHERE total_revenue > 40;
CTE version (reads like sentences):
WITH revenue_summary AS (
SELECT salesperson, SUM(amount) AS total_revenue
FROM orders
GROUP BY salesperson
)
SELECT salesperson, total_revenue
FROM revenue_summary
WHERE total_revenue > 40;
Result (both versions):
| salesperson | total_revenue |
|---|---|
| Alice | 64.97 |
CTEs shine when you need multiple intermediate steps:
WITH
monthly_totals AS (
SELECT
salesperson,
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS monthly_revenue
FROM orders
GROUP BY salesperson, DATE_FORMAT(order_date, '%Y-%m')
),
best_month AS (
SELECT salesperson, MAX(monthly_revenue) AS peak_revenue
FROM monthly_totals
GROUP BY salesperson
)
SELECT * FROM best_month
ORDER BY peak_revenue DESC;
Window functions compute values across a set of related rows without collapsing them into one row (unlike GROUP BY). They're the most powerful feature in modern SQL.
SELECT
salesperson,
amount,
order_date,
ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY order_date) AS order_sequence
FROM orders;
Result:
| salesperson | amount | order_date | order_sequence |
|---|---|---|---|
| Alice | 29.99 | 2024-01-15 | 1 |
| Alice | 14.99 | 2024-02-01 | 2 |
| Alice | 19.99 | 2024-02-15 | 3 |
| Bob | 9.99 | 2024-01-20 | 1 |
| Bob | 29.99 | 2024-02-10 | 2 |
| Carol | 9.99 | 2024-03-01 | 1 |
| Carol | 29.99 | 2024-03-10 | 2 |
PARTITION BY salesperson restarts the count for each salesperson.
SELECT
salesperson,
SUM(amount) AS total,
RANK() OVER (ORDER BY SUM(amount) DESC) AS revenue_rank
FROM orders
GROUP BY salesperson;
Result:
| salesperson | total | revenue_rank |
|---|---|---|
| Alice | 64.97 | 1 |
| Bob | 39.98 | 2 |
| Carol | 39.98 | 2 |
Bob and Carol tie at rank 2. The next rank would be 4 (rank 3 is skipped). Use DENSE_RANK() to avoid gaps.
SELECT
order_date,
amount,
LAG(amount) OVER (ORDER BY order_date) AS prev_order_amount,
amount - LAG(amount) OVER (ORDER BY order_date) AS change
FROM orders
WHERE salesperson = 'Alice';
Result:
| order_date | amount | prev_order_amount | change |
|---|---|---|---|
| 2024-01-15 | 29.99 | NULL | NULL |
| 2024-02-01 | 14.99 | 29.99 | -15.00 |
| 2024-02-15 | 19.99 | 14.99 | 5.00 |
LAG() looks backward; LEAD() looks forward. Essential for trend analysis, running totals, and period-over-period comparisons.
1. FROM -- Which tables?
2. JOIN -- Combine tables
3. WHERE -- Filter rows
4. GROUP BY -- Form groups
5. HAVING -- Filter groups
6. SELECT -- Compute output columns
7. DISTINCT -- Remove duplicate rows
8. ORDER BY -- Sort result
9. LIMIT/OFFSET -- Trim to page size
Understanding this order explains why you can't use a SELECT alias in a WHERE clause (WHERE runs before SELECT) but can use it in ORDER BY (ORDER BY runs after SELECT).
Next lesson: Normalization — why you structure tables the way you do, and how to do it correctly.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises