AiTechWorlds
AiTechWorlds
Imagine a filing cabinet where a folder labeled "Student ID" contains the student's city, and inside that folder, based on the city, you can find the zip code. Nobody labeled the folder "Zip Code" — yet the zip code is hiding inside, reachable through a chain: Student ID → City → Zip Code.
You normalized to 2NF, removed partial dependencies, and thought you were done. But this chain — where a non-key column determines another non-key column — is a transitive dependency, and it quietly corrupts your database design.
3NF is about cutting those chains.
Given a table with primary key K, a transitive dependency exists when:
K → A → B
Column A is determined by K (fine), but column B is determined by A — not by K directly. B is "transitively" dependent on K through A.
| StudentID | StudentName | CourseID | CourseName | InstructorID | InstructorDept |
|---|---|---|---|---|---|
| S1 | Alice | C101 | Databases | I5 | CS |
| S1 | Alice | C202 | Networks | I7 | CS |
| S2 | Bob | C101 | Databases | I5 | CS |
Primary key: (StudentID, CourseID)
Problems:
CourseName depends only on CourseID — partial dependency (2NF violation)InstructorDept depends on InstructorID, which depends on CourseID — transitive dependency (3NF violation)Step 1 — Fix 2NF first (remove partial dependencies):
Split into:
Step 2 — Fix 3NF (remove transitive dependencies):
In the Course table: CourseID → InstructorID → InstructorDept
InstructorDept is not determined by the primary key CourseID directly — it is determined by InstructorID.
Split into:
Final 3NF Schema:
| Table | Columns |
|---|---|
| Student | StudentID (PK), StudentName |
| Course | CourseID (PK), CourseName, InstructorID (FK) |
| Instructor | InstructorID (PK), InstructorDept |
| Enrollment | StudentID (FK), CourseID (FK) |
Every non-key attribute now depends on the key, the whole key, and nothing but the key.
Boyce-Codd Normal Form (BCNF) says: for every functional dependency X → Y, X must be a candidate key (or superkey). No exceptions.
3NF allows one loophole: a non-key attribute can determine a prime attribute (one that belongs to a candidate key) if it satisfies certain conditions. BCNF closes that loophole entirely.
3NF rule: Non-key attributes must depend on candidate keys — but prime attributes get a pass. BCNF rule: Every determinant must be a candidate key. Full stop.
Consider a university scheduling rule: each teacher teaches exactly one subject, but a subject can be taught by multiple teachers, and each teacher is assigned a specific room.
| Teacher | Subject | Room |
|---|---|---|
| Prof. Chen | Databases | Lab A |
| Prof. Smith | Networks | Room 3 |
| Prof. Lee | Databases | Lab B |
Candidate keys: (Teacher, Subject) and (Teacher, Room)
But notice: Teacher → Room (each teacher has one assigned room). The determinant Teacher is not a candidate key — it is only part of one. This violates BCNF.
Fix: Split into:
| Scenario | Use 3NF | Use BCNF |
|---|---|---|
| Multiple overlapping candidate keys | Prefer 3NF (BCNF may lose FDs) | Use carefully |
| Single candidate key | Both give same result | BCNF preferred |
| Lossless decomposition needed | 3NF always achieves this | BCNF also does |
| Dependency preservation required | 3NF guarantees it | BCNF may not |
Key insight: BCNF decompositions are not always dependency-preserving. If enforcing all original functional dependencies matters more than eliminating every anomaly, 3NF is the practical choice.
| Normal Form | Condition | Eliminates | When Sufficient |
|---|---|---|---|
| 1NF | Atomic values, no repeating groups | Multi-valued cells | Rarely — only as a baseline |
| 2NF | No partial dependencies on composite PK | Update anomalies from partial keys | Simple tables with single-column PK |
| 3NF | No transitive dependencies | Chains: non-key → non-key | Most real-world OLTP systems |
| BCNF | Every determinant is a candidate key | Subtle anomalies 3NF misses | When overlapping keys cause issues |
| 4NF | No multi-valued dependencies | Independent multi-value facts in one table | Complex many-to-many relationships |
| 5NF | No join dependencies | Decomposition anomalies | Academic/theoretical contexts |
Normalization eliminates redundancy but introduces joins. Every join has a cost: CPU time, disk reads, index lookups. At scale, this cost becomes measurable.
Denormalization is the deliberate introduction of redundancy to improve read performance.
order_total instead of summing line items on every readDenormalization is a measured trade-off, not an excuse for lazy design. Normalize first, then denormalize where profiling proves it is necessary.
This schema represents a fully normalized (3NF) design for a library system:
| Table | Primary Key | Foreign Keys | Notable Columns |
|---|---|---|---|
| Member | MemberID | — | Name, Email, JoinDate |
| Book | BookID | PublisherID | ISBN, Title, PublicationYear |
| Author | AuthorID | — | FirstName, LastName |
| BookAuthor | BookID, AuthorID | BookID, AuthorID | Role (primary/co-author) |
| Publisher | PublisherID | — | Name, Country |
| Copy | CopyID | BookID, BranchID | Condition, AcquisitionDate |
| Branch | BranchID | — | Name, Address |
| Loan | LoanID | CopyID, MemberID | IssueDate, DueDate, ReturnDate |
| Fine | FineID | LoanID | Amount, PaidDate |
Every non-key column depends on its table's primary key directly — no partial dependencies, no transitive dependencies. The BookAuthor bridge table resolves the many-to-many relationship between books and authors cleanly.
Before declaring your schema normalized:
If you answer yes to all four, your schema is in BCNF and ready for production.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises