AiTechWorlds
AiTechWorlds
A university registrar puts together a quick spreadsheet to track student enrollments. It seems efficient at first: one row per student, with all their courses in one cell.
| StudentID | Name | Courses |
|---|---|---|
| 1 | Alice | Math, Physics, Chemistry |
| 2 | Ben | Physics, Biology |
| 3 | Clara | Math, Chemistry |
Now try to answer: "Which students are taking Physics?"
You cannot query it with a simple WHERE Courses = 'Physics'. You'd have to search inside the text of every cell. And what if a course is renamed from "Physics" to "Physics I"? You'd have to edit every row that contains it.
This table has a structural problem. Normalization is the process of restructuring tables to eliminate these problems.
Normalization is the systematic process of organizing a relational database to:
Normalization is a series of rules, each called a Normal Form (NF). Each form fixes a specific class of structural problem. You achieve them in order: 1NF first, then 2NF, then 3NF.
Before understanding what normalization fixes, understand what goes wrong without it.
Consider this single unnormalized table for a course registration system:
RegistrationRaw (before normalization)
| StudentID | StudentName | Department | DeptBuilding | CourseID | CourseName | Grade |
|---|---|---|---|---|---|---|
| 1 | Alice | Computer Science | Hall A | CS101 | Intro to CS | A |
| 1 | Alice | Computer Science | Hall A | CS201 | Data Structures | B |
| 2 | Ben | Mathematics | Hall B | MA101 | Calculus I | A |
| 2 | Ben | Mathematics | Hall B | CS101 | Intro to CS | B |
| 3 | Clara | Computer Science | Hall A | CS101 | Intro to CS | A |
Update Anomaly: The Computer Science department moves from Hall A to Hall C. You must update every row for every Computer Science student. Miss one row and the database becomes inconsistent — some rows say Hall A, some say Hall C.
Insert Anomaly: You want to add a new department, "Physics," in Hall D — but there are no students yet. You can't insert a row because the table requires a StudentID. You cannot record the department until a student enrolls in it.
Delete Anomaly: Clara (student 3) withdraws from the university. You delete her row. Now the fact that "CS101 is named Intro to CS" exists only because Clara was in it — and it's gone too.
These anomalies happen because one table is trying to represent multiple independent facts: student information, department information, course information, and enrollment information. Normalization separates them.
Before applying normal forms, you must understand functional dependency.
Attribute B is functionally dependent on attribute A (written A → B) if knowing the value of A tells you exactly one value of B.
StudentID → StudentName — knowing the ID tells you the name (one student has one name)CourseID → CourseName — knowing the course ID tells you the course nameStudentID, CourseID → Grade — knowing both the student AND the course tells you the gradeFunctional dependencies reveal the real-world rules your data must respect. Normalization ensures those rules are enforced by the table structure.
Rule: Every column must contain atomic values (indivisible), and there must be no repeating groups.
A table violates 1NF if:
"Math, Physics, Chemistry")Course1, Course2, Course3)Before 1NF:
| StudentID | Name | Courses |
|---|---|---|
| 1 | Alice | Math, Physics, Chemistry |
| 2 | Ben | Physics, Biology |
After 1NF — one row per student-course combination:
| StudentID | Name | CourseID |
|---|---|---|
| 1 | Alice | Math |
| 1 | Alice | Physics |
| 1 | Alice | Chemistry |
| 2 | Ben | Physics |
| 2 | Ben | Biology |
Now every cell holds one value. WHERE CourseID = 'Physics' works perfectly.
The primary key becomes the composite (StudentID, CourseID) — neither column alone uniquely identifies a row.
Dependency diagram after 1NF:
Rule: The table must be in 1NF, AND every non-key attribute must depend on the entire primary key — not just part of it.
This rule only applies when the primary key is composite. If the PK is a single column, 1NF → 2NF is automatic.
A partial dependency exists when a non-key attribute depends on only part of the composite key.
Identifying violations in the registration table:
The PK is (StudentID, CourseID).
Grade depends on (StudentID, CourseID) — ✓ full dependencyStudentName depends on StudentID alone — ✗ partial dependencyDepartment depends on StudentID alone — ✗ partial dependencyDeptBuilding depends on StudentID alone — ✗ partial dependencyCourseName depends on CourseID alone — ✗ partial dependencyDependency diagram showing violations:
Fixing 2NF: Separate each partial dependency into its own table.
Students table:
| StudentID | StudentName | Department | DeptBuilding |
|---|---|---|---|
| 1 | Alice | Computer Science | Hall A |
| 2 | Ben | Mathematics | Hall B |
| 3 | Clara | Computer Science | Hall A |
Courses table:
| CourseID | CourseName |
|---|---|
| CS101 | Intro to CS |
| CS201 | Data Structures |
| MA101 | Calculus I |
Enrollments table (junction):
| StudentID | CourseID | Grade |
|---|---|---|
| 1 | CS101 | A |
| 1 | CS201 | B |
| 2 | MA101 | A |
| 2 | CS101 | B |
| 3 | CS101 | A |
After 2NF:
Before normalization:
RegistrationRaw
┌───────────┬─────────────┬──────────────┬──────────────┬──────────┬────────────┬───────┐
│ StudentID │ StudentName │ Department │ DeptBuilding │ CourseID │ CourseName │ Grade │
└───────────┴─────────────┴──────────────┴──────────────┴──────────┴────────────┴───────┘
Everything in one table — redundant, anomaly-prone
After 1NF + 2NF:
Students Courses Enrollments
┌───────────┬─────────────┐ ┌──────────┬────────────┐ ┌───────────┬──────────┬───────┐
│ StudentID │ StudentName │ │ CourseID │ CourseName │ │ StudentID │ CourseID │ Grade │
│ Department│ DeptBuilding│ └──────────┴────────────┘ └───────────┴──────────┴───────┘
└───────────┴─────────────┘
Each table represents one concept. Each fact is stored once.
| Normal Form | What It Fixes | Rule |
|---|---|---|
| Unnormalized | — | Cells can have lists, no key required |
| 1NF | Multi-value cells, repeating columns | Atomic values, defined primary key |
| 2NF | Partial dependencies | Non-key attributes depend on full PK |
But our Students table still has a problem: DeptBuilding depends on Department, not on StudentID. That's a different kind of redundancy — a transitive dependency. It's what 3NF fixes.
Next lesson: 3NF and BCNF — removing transitive dependencies and reaching a fully clean schema.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises