3NF, BCNF, and Denormalization

The Chain Problem

Imagine a filing cabinet where a folder labeled "Student ID" contains the student's zip code, and the zip code in turn tells you the city. Nobody labeled the folder "City", yet the city is hiding inside, reachable through a chain: Student ID → Zip Code → City.

You normalized to 2NF, removed partial dependencies, and thought you were done. But this chain, where a non-key column determines another non-key column, is a transitive dependency. It stores the same fact in many rows, which invites update, insertion, and deletion anomalies.

3NF is about cutting those chains.

What Is a Transitive Dependency?

Given a table with primary key K, a transitive dependency exists when:

K → A → B

Column A is determined by K (fine), but column B is determined by A, not by K directly. A is a non-key column and does not itself determine K, so B is only "transitively" dependent on K through A.

Example: Course Registration Table

StudentID	StudentName	CourseID	CourseName	InstructorID	InstructorDept
S1	Alice	C101	Databases	I5	CS
S1	Alice	C202	Networks	I7	CS
S2	Bob	C101	Databases	I5	CS

Primary key: (StudentID, CourseID)

Problems:

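StudentName depends only on StudentID, and CourseName depends only on CourseID: partial dependencies (2NF violations)
InstructorDept depends on InstructorID, which depends on CourseID (assuming each course has exactly one instructor): transitive dependency (3NF violation)
If instructor I5 moves departments, every row that mentions I5 (here, every enrollment in C101) must be updated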
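
The dependencies claimed above can be checked against the sample rows. The sketch below is a minimal illustration in Python; the helper name `holds` and the tuple layout are assumptions made for this example. Keep in mind that sample data can only refute a dependency; it cannot prove that one holds for all future rows.

```python
# Sample rows from the Course Registration table above.
COLUMNS = ("StudentID", "StudentName", "CourseID",
           "CourseName", "InstructorID", "InstructorDept")
ROWS = [
    ("S1", "Alice", "C101", "Databases", "I5", "CS"),
    ("S1", "Alice", "C202", "Networks", "I7", "CS"),
    ("S2", "Bob", "C101", "Databases", "I5", "CS"),
]


def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return True if no two rows agree on lhs but differ on rhs."""
    idx = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[idx[c]] for c in lhs)
        value = tuple(row[idx[c]] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Partial dependency: CourseName is fixed by CourseID alone.
print(holds(["CourseID"], ["CourseName"]))          # True
# The two links of the transitive chain.
print(holds(["CourseID"], ["InstructorID"]))        # True
print(holds(["InstructorID"], ["InstructorDept"]))  # True
# StudentID alone does not determine CourseID (S1 takes two courses).
print(holds(["StudentID"], ["CourseID"]))           # False

# Rows that must change if instructor I5 moves departments.
rows_to_update = [r for r in ROWS if r[4] == "I5"]
print(len(rows_to_update))                          # 2
```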

Step-by-Step Normalization to 3NF

Step 1 — Fix 2NF first (remove partial dependencies):

Split into:

Enrollment (StudentID, CourseID)
Course (CourseID, CourseName, InstructorID, InstructorDept)
Student (StudentID, StudentName)

Step 2 — Fix 3NF (remove transitive dependencies):

In the Course table: CourseID → InstructorID → InstructorDept

InstructorDept is not determined by the primary key CourseID directly — it is determined by InstructorID.

Split into:

Course (CourseID, CourseName, InstructorID)
Instructor (InstructorID, InstructorDept)

Final 3NF Schema:

Table	Columns
Student	StudentID (PK), StudentName
Course	CourseID (PK), CourseName, InstructorID (FK)
Instructor	InstructorID (PK), InstructorDept
Enrollment	StudentID (PK, FK), CourseID (PK, FK)

Every non-key attribute now depends on the key, the whole key, and nothing but the key.
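
One way to convince yourself that the split loses nothing is to project the sample rows onto the four tables and join them back. The sketch below does this with plain Python dictionaries and sets; `project`, `natural_join`, and `as_set` are illustrative helpers written for this example, not a library API.

```python
COLUMNS = ("StudentID", "StudentName", "CourseID",
           "CourseName", "InstructorID", "InstructorDept")
ROWS = [
    ("S1", "Alice", "C101", "Databases", "I5", "CS"),
    ("S1", "Alice", "C202", "Networks", "I7", "CS"),
    ("S2", "Bob", "C101", "Databases", "I5", "CS"),
]
original = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, columns):
    """Project rows onto columns, dropping duplicate tuples."""
    unique = {tuple(r[c] for c in columns) for r in rows}
    return [dict(zip(columns, t)) for t in sorted(unique)]


def natural_join(left, right):
    """Join two lists of dict rows on every column name they share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in shared)]


def as_set(rows):
    """Order-independent view of rows, for comparison."""
    return {tuple(r[c] for c in COLUMNS) for r in rows}


student = project(original, ("StudentID", "StudentName"))
course = project(original, ("CourseID", "CourseName", "InstructorID"))
instructor = project(original, ("InstructorID", "InstructorDept"))
enrollment = project(original, ("StudentID", "CourseID"))

# Each fact is now stored once: 2 students, 2 courses, 2 instructors.
print(len(student), len(course), len(instructor), len(enrollment))  # 2 2 2 3

# Joining the four tables reproduces exactly the original rows.
rebuilt = natural_join(
    natural_join(natural_join(enrollment, student), course), instructor)
print(as_set(rebuilt) == as_set(original))  # True
```

A join that returned extra rows here would indicate a lossy decomposition; getting back exactly the original rows is what "lossless" means for this sample.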

BCNF: The Stricter Standard

Boyce-Codd Normal Form (BCNF) says: for every non-trivial functional dependency X → Y, X must be a superkey (a candidate key or a superset of one). No exceptions.

3NF allows one loophole: a dependency X → Y whose determinant X is not a superkey is still permitted when every attribute of Y is prime (belongs to some candidate key). BCNF closes that loophole entirely.

3NF rule: Every determinant must be a superkey, unless the dependent attributes are all prime. BCNF rule: Every determinant must be a superkey. Full stop.
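
The difference between the two rules is easiest to see by running them. The following sketch computes attribute closures and candidate keys by brute force, then tests both normal forms. The relation R(A, B, C) with dependencies AB → C and C → B is a common textbook illustration, used here as an assumption; all function names are hypothetical helpers.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes reachable from attrs by repeatedly applying the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(candidate, fds) == relation:
                keys.append(candidate)
    return keys


def is_bcnf(relation, fds):
    """Every non-trivial FD must have a superkey as its determinant."""
    return all(rhs <= lhs or closure(lhs, fds) == relation
               for lhs, rhs in fds)


def is_3nf(relation, fds):
    """Like BCNF, but an FD also passes if its new attributes are all prime."""
    prime = set().union(*candidate_keys(relation, fds))
    return all(rhs <= lhs or closure(lhs, fds) == relation
               or (rhs - lhs) <= prime
               for lhs, rhs in fds)


R = {"A", "B", "C"}
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

print(sorted(sorted(k) for k in candidate_keys(R, FDS)))  # [['A', 'B'], ['A', 'C']]
print(is_3nf(R, FDS))   # True: B is prime, so C → B is tolerated
print(is_bcnf(R, FDS))  # False: C is not a superkey
```

The brute-force key search is exponential in the number of attributes, which is fine for classroom-sized relations but not a design meant for large schemas.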

Classic BCNF Violation: The Scheduling Example

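Consider a university scheduling rule: each teacher teaches exactly one subject, a subject can be taught by multiple teachers, and within any one room a given subject is taught by only one teacher.

Teacher	Subject	Room
Prof. Chen	Databases	Lab A
Prof. Smith	Networks	Room 3
Prof. Lee	Databases	Lab B

Functional dependencies: Teacher → Subject and (Subject, Room) → Teacher

Candidate keys: (Subject, Room) and (Teacher, Room)

The table is in 3NF, because Subject is a prime attribute. But notice: in Teacher → Subject the determinant Teacher is not a superkey; it is only part of the candidate key (Teacher, Room). This violates BCNF. If Prof. Chen also taught in a second room, the fact that Chen teaches Databases would be stored twice.

Fix: Split into:

TeacherRoom (Teacher, Room)
SubjectOffering (Teacher, Subject)

The split is lossless, but the dependency (Subject, Room) → Teacher now spans two tables and can no longer be enforced by a key constraint on either one.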

3NF vs BCNF: When Each Is Sufficient

Scenario	Use 3NF	Use BCNF
Multiple overlapping candidate keys	Prefer 3NF (BCNF may lose FDs)	Use carefully
Single candidate key	Both give same result	BCNF preferred
Lossless decomposition needed	3NF always achieves this	BCNF also does
Dependency preservation required	3NF guarantees it	BCNF may not

Key insight: BCNF decompositions are not always dependency-preserving. If enforcing all original functional dependencies matters more than eliminating every anomaly, 3NF is the practical choice.
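
Dependency preservation can be tested mechanically. The sketch below follows the usual textbook procedure: grow the determinant's closure using only what each decomposed table can see, then check whether the dependent attributes were reached. The relation R(A, B, C) with AB → C and C → B, and its BCNF split into (A, C) and (B, C), are assumptions picked for illustration; `is_preserved` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attributes reachable from attrs by repeatedly applying the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, fragments, fds):
    """True if fd can be enforced using only the decomposed tables."""
    lhs, rhs = fd
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for fragment in fragments:
            # What this fragment alone lets us derive from what we have.
            gained = closure(reached & fragment, fds) & fragment
            if not gained <= reached:
                reached |= gained
                changed = True
    return rhs <= reached


FDS = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
BCNF_SPLIT = [{"A", "C"}, {"B", "C"}]

for fd in FDS:
    print(sorted(fd[0]), "->", sorted(fd[1]), is_preserved(fd, BCNF_SPLIT, FDS))
# ['A', 'B'] -> ['C'] False
# ['C'] -> ['B'] True
```

Here the split removes the BCNF violation caused by C → B, at the price of no longer being able to check AB → C without joining the two tables.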

Normal Form Reference Table

Normal Form	Condition	Eliminates	When Sufficient
1NF	Atomic values, no repeating groups	Multi-valued cells	Rarely — only as a baseline
2NF	No non-key attribute depends on only part of a candidate key	Update anomalies from partial keys	Tables whose non-key columns do not determine one another
3NF	No transitive dependencies	Chains: non-key → non-key	Most real-world OLTP systems
BCNF	Every determinant is a superkey	Subtle anomalies 3NF misses	When overlapping keys cause issues
4NF	No non-trivial multi-valued dependencies except on a superkey	Independent multi-value facts in one table	Complex many-to-many relationships
5NF	Every non-trivial join dependency is implied by the candidate keys	Decomposition anomalies	Academic/theoretical contexts

Denormalization: Breaking the Rules on Purpose

Normalization eliminates redundancy but introduces joins. Every join has a cost: CPU time, disk reads, index lookups. At scale, this cost becomes measurable.

Denormalization is the deliberate introduction of redundancy to improve read performance.

When to Denormalize

Read-heavy workloads: reporting, analytics, dashboards
Frequently joined tables: if two tables are almost always queried together, merging them can reduce read latency
Aggregated values: storing a pre-computed order_total instead of summing line items on every read
Data warehousing: star schemas deliberately denormalize their dimension tables for analytical query speed (snowflake schemas normalize those dimensions again)

Risks of Denormalization

Update anomalies return — the same data exists in multiple places
Application complexity — code must update all copies consistently
Storage cost — redundant data occupies more space

Denormalization is a measured trade-off, not an excuse for lazy design. Normalize first, then denormalize where profiling proves it is necessary.
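
The aggregated-value case shows both sides of the trade-off. The sketch below uses Python's built-in sqlite3 module with a hypothetical orders / order_items schema; every table and column name other than order_total is an assumption. It stores a pre-computed order_total, then shows how the stored copy drifts when a write path forgets to maintain it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        order_total INTEGER NOT NULL DEFAULT 0  -- denormalized copy, in cents
    );
    CREATE TABLE order_items (
        item_id   INTEGER PRIMARY KEY,
        order_id  INTEGER NOT NULL REFERENCES orders(order_id),
        price     INTEGER NOT NULL              -- in cents
    );
""")


def add_item(order_id, price, maintain_total=True):
    """Insert a line item; optionally keep the redundant total in sync."""
    with conn:  # one transaction: both writes commit or neither does
        conn.execute("INSERT INTO order_items (order_id, price) VALUES (?, ?)",
                     (order_id, price))
        if maintain_total:
            conn.execute("UPDATE orders SET order_total = order_total + ? "
                         "WHERE order_id = ?", (price, order_id))


def stored_total(order_id):
    """Fast read: the denormalized column, no aggregation needed."""
    return conn.execute("SELECT order_total FROM orders WHERE order_id = ?",
                        (order_id,)).fetchone()[0]


def computed_total(order_id):
    """Normalized read: sum the line items every time."""
    return conn.execute("SELECT COALESCE(SUM(price), 0) FROM order_items "
                        "WHERE order_id = ?", (order_id,)).fetchone()[0]


with conn:
    conn.execute("INSERT INTO orders (order_id) VALUES (1)")
add_item(1, 1500)
add_item(1, 250)
print(stored_total(1), computed_total(1))  # 1750 1750

# A code path that forgets the redundant column reintroduces the anomaly.
add_item(1, 100, maintain_total=False)
print(stored_total(1), computed_total(1))  # 1750 1850
```

Wrapping both writes in one transaction is one possible way to keep the copies consistent; a database trigger is another, and which fits better depends on the system.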

Complete Normalized Library Database Schema

This schema represents a fully normalized (3NF) design for a library system:

Table	Primary Key	Foreign Keys	Notable Columns
Member	MemberID	—	Name, Email, JoinDate
Book	BookID	PublisherID	ISBN, Title, PublicationYear
Author	AuthorID	—	FirstName, LastName
BookAuthor	BookID, AuthorID	BookID, AuthorID	Role (primary/co-author)
Publisher	PublisherID	—	Name, Country
Copy	CopyID	BookID, BranchID	Condition, AcquisitionDate
Branch	BranchID	—	Name, Address
Loan	LoanID	CopyID, MemberID	IssueDate, DueDate, ReturnDate
Fine	FineID	LoanID	Amount, PaidDate

Every non-key column depends on its table's primary key directly — no partial dependencies, no transitive dependencies. The BookAuthor bridge table resolves the many-to-many relationship between books and authors cleanly.
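
As a rough sketch, part of this schema can be written as DDL and exercised with Python's built-in sqlite3 module. The column types, the UNIQUE constraint on ISBN, the CHECK on Role, and the placeholder sample values are assumptions added for illustration; only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Publisher (
        PublisherID INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Country     TEXT
    );
    CREATE TABLE Book (
        BookID          INTEGER PRIMARY KEY,
        PublisherID     INTEGER NOT NULL REFERENCES Publisher(PublisherID),
        ISBN            TEXT NOT NULL UNIQUE,   -- a second candidate key
        Title           TEXT NOT NULL,
        PublicationYear INTEGER
    );
    CREATE TABLE Author (
        AuthorID  INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL,
        LastName  TEXT NOT NULL
    );
    CREATE TABLE BookAuthor (
        BookID   INTEGER NOT NULL REFERENCES Book(BookID),
        AuthorID INTEGER NOT NULL REFERENCES Author(AuthorID),
        Role     TEXT NOT NULL CHECK (Role IN ('primary', 'co-author')),
        PRIMARY KEY (BookID, AuthorID)
    );
""")

# Placeholder rows, invented purely to exercise the constraints.
with conn:
    conn.execute("INSERT INTO Publisher VALUES (1, 'Example Press', 'Nowhere')")
    conn.execute("INSERT INTO Book VALUES "
                 "(1, 1, 'ISBN-PLACEHOLDER-1', 'Sample Title', 2000)")
    conn.executemany("INSERT INTO Author VALUES (?, ?, ?)",
                     [(1, "First", "Author"), (2, "Second", "Author")])
    conn.executemany("INSERT INTO BookAuthor VALUES (?, ?, ?)",
                     [(1, 1, "primary"), (1, 2, "co-author")])

# The bridge table answers "who wrote this book?" without repeating book data.
authors = conn.execute("""
    SELECT a.FirstName, ba.Role
    FROM BookAuthor AS ba
    JOIN Author AS a ON a.AuthorID = ba.AuthorID
    WHERE ba.BookID = 1
    ORDER BY a.AuthorID
""").fetchall()
print(authors)  # [('First', 'primary'), ('Second', 'co-author')]

# A BookAuthor row pointing at a missing book is rejected by the foreign key.
try:
    with conn:
        conn.execute("INSERT INTO BookAuthor VALUES (99, 1, 'primary')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)  # True
```

Because ISBN is declared UNIQUE, it acts as a second candidate key for Book, so a dependency such as ISBN → Title does not break 3NF or BCNF.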

Quick Checklist

Before declaring your schema normalized:

1NF: Are all columns atomic? No arrays or comma-separated lists in a cell?
2NF: Does every non-key column depend on the whole of every candidate key, not just part of one?
3NF: Does every non-key column depend directly on a candidate key, not through another non-key column?
BCNF: Is every determinant in every non-trivial functional dependency a superkey?

If you answer yes to all four, your schema is in BCNF. Denormalize from there only where profiling justifies it.
