Here is the problem: an NFA can be in multiple states simultaneously. A DFA must be in exactly one state. These two models seem fundamentally incompatible — until you realize that a single DFA state can represent a set of NFA states.

The moment this clicks, the entire conversion algorithm is obvious. The DFA does not track which NFA state the machine is in. It tracks which set of NFA states the machine might be in. Each such set is a single DFA state.

If an NFA has 3 states — call them {A, B, C} — then the DFA's states are subsets of {A, B, C}: there are at most 2³ = 8 possible subsets, and therefore at most 8 DFA states. Many of these subsets will typically be unreachable from the start state, so the practical DFA is usually much smaller.

This algorithm was published in 1959 by Michael Rabin and Dana Scott in a landmark paper titled "Finite Automata and Their Decision Problems." It earned them the ACM Turing Award in 1976. Their key contribution was proving that nondeterminism adds no computational power to finite automata — only design convenience.

Why Conversion Matters

Designing an automaton for a complex pattern is dramatically easier with an NFA. ε-transitions let you compose sub-machines. Multiple transitions let you "guess" without committing.

Executing an automaton efficiently requires a DFA. A DFA needs only a single table lookup per input character — O(1) — giving O(|w|) total time to process string w.

The standard engineering solution:

Design the NFA (by hand, or via Thompson's construction from a regex)
Convert the NFA to a DFA using subset construction
Minimize the DFA using Hopcroft's algorithm
Use the minimized DFA for all subsequent matching

This pipeline is exactly what lexer generators like flex, re2c, and the Go regexp package implement.

The Subset Construction Algorithm

Input: NFA N = (Q_N, Σ, δ_N, q₀, F_N) Output: DFA D = (Q_D, Σ, δ_D, q_D₀, F_D) with L(D) = L(N)

Step 1 — Start state of DFA:

q_D₀ = ε-closure({q₀}) The DFA start state is the ε-closure of the NFA start state.

Step 2 — Transitions: For each DFA state S (a set of NFA states) and each input symbol a ∈ Σ:

δ_D(S, a) = ε-closure( ∪_{q ∈ S} δ_N(q, a) ) Follow all NFA transitions on a from every state in S, then take the ε-closure.

Step 3 — Accept states:

A DFA state S is an accept state if S ∩ F_N ≠ ∅ Any DFA state containing at least one NFA accept state is a DFA accept state.

Step 4 — Repeat Step 2 for every newly discovered DFA state until no new states appear.

Complete Worked Example

NFA: Accepts binary strings ending in "01".

States: Q_N = {q₀, q₁, q₂}, Σ = {0, 1}, start = q₀, accept = {q₂}

NFA transitions (no ε-transitions in this NFA):

NFA State	On '0'	On '1'
q₀	{q₀, q₁}	{q₀}
q₁	∅	{q₂}
q₂	∅	∅

Building the DFA Step by Step

Since there are no ε-transitions, ε-closure(S) = S for all S.

DFA Start State: ε-closure({q₀}) = {q₀}

Name this DFA state A = {q₀}.

Process A = {q₀}:

On '0': δ_N(q₀, 0) = {q₀, q₁} → ε-closure = {q₀, q₁} — new state B On '1': δ_N(q₀, 1) = {q₀} → ε-closure = {q₀} = A (already known)

Process B = {q₀, q₁}:

On '0': δ_N(q₀, 0) = {q₀, q₁} δ_N(q₁, 0) = ∅ Union = {q₀, q₁} → ε-closure = {q₀, q₁} = B (already known)

On '1': δ_N(q₀, 1) = {q₀} δ_N(q₁, 1) = {q₂} Union = {q₀, q₂} → ε-closure = {q₀, q₂} — new state C

Process C = {q₀, q₂}:

On '0': δ_N(q₀, 0) = {q₀, q₁} δ_N(q₂, 0) = ∅ Union = {q₀, q₁} → ε-closure = {q₀, q₁} = B (already known)

On '1': δ_N(q₀, 1) = {q₀} δ_N(q₂, 1) = ∅ Union = {q₀} → ε-closure = {q₀} = A (already known)

No new states discovered. Construction complete.

Complete Subset Construction Table

NFA States	DFA State	On '0'	On '1'	Accept?
{q₀}	A	B	A	No
{q₀, q₁}	B	B	C	No
{q₀, q₂}	C	B	A	Yes (contains q₂)

The Resulting DFA

This is the same DFA we would have designed directly in Lesson 3 — obtained automatically by the subset construction from the NFA.

Verification: Trace "0101" Through the DFA

Start: A
  Read '0' → B
  Read '1' → C  ← accept state
  Read '0' → B
  Read '1' → C  ← accept state
End: C ∈ F_D → ACCEPT ✓

State Explosion: The Worst Case

The theoretical maximum: an NFA with n states can produce a DFA with up to 2ⁿ states.

For n = 5 → up to 32 DFA states. For n = 10 → up to 1,024 DFA states. For n = 20 → up to 1,048,576 DFA states.

The worst-case construction (due to Meyer and Fischer, 1971): an NFA over Σ = {0, 1} where the language is "strings whose n-th character from the end is 1." This NFA needs n+1 states, but the equivalent DFA requires 2ⁿ states — and all of them are reachable.

In practice, most NFAs produced from real-world regular expressions do not trigger this worst case. Lazy construction — only computing DFA states that are actually reachable from the start state — avoids building unreachable states entirely.

DFA Minimization: After Conversion

Once you have a DFA, it may have redundant states. Two states are equivalent if they behave identically on every future input string — they both accept or both reject the same continuations.

Hopcroft's Algorithm (1971):

Start with two partitions: accept states and non-accept states
Repeatedly split partitions: if two states in the same partition behave differently on some input (going to different partitions), separate them
Continue until no more splits are needed
Each remaining partition is one minimized DFA state

Hopcroft's algorithm runs in O(n log n) time, where n is the number of DFA states. The result is the unique minimum-state DFA for the language.

Real-World Connection: Lexer Generators

A lexer (lexical analyzer) is the first stage of a compiler. It reads source code and produces tokens — keywords, identifiers, numbers, operators.

Token recognition rules are written as regular expressions. The lexer generator:

Takes each regex rule and converts it to an NFA (Thompson's construction)
Combines all NFAs in parallel with ε-transitions from a new start state
Converts the combined NFA to a DFA (subset construction)
Minimizes the DFA (Hopcroft's algorithm)
Outputs the DFA as a C/C++ lookup table

Tools that implement this pipeline: flex (the GNU lexer generator), re2c (used in PHP, Nginx, SQLite parsers), and ANTLR (for Java). The DFA lookup tables they generate are the inner loop of every compiler's tokenizer.

Key Takeaways

The subset construction proves, constructively, that NFA and DFA recognize exactly the same class of languages — the regular languages. The proof is the construction: given any NFA, you can build a DFA. Given any DFA, it is already an NFA (with only one-element result sets and no ε-transitions).

The cost is state count: up to exponential blowup in theory, manageable in practice. The benefit is design flexibility: build your automaton as an NFA (using Thompson's construction from a regex), then automatically convert it to an efficient DFA for execution.

This is one of the cleanest examples in all of computer science of the separation between what a system can do and how efficiently it does it.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

30 minLesson 5 of 16

Course Contents(16 lessons)

▾

Chapter 1: Mathematical Foundations

What Is Theory of Computation? Why It Matters20 min

Mathematical Foundations: Sets, Functions, Proofs28 min

Chapter 2: Finite Automata

Deterministic Finite Automata (DFA): States and Transitions35 min

Non-Deterministic Finite Automata (NFA)32 min

NFA to DFA: Subset Construction Algorithm30 min

Chapter 3: Regular Languages

Regular Expressions: Pattern Matching Formalized32 min

Pumping Lemma: Proving Languages Are Not Regular28 min

Closure Properties of Regular Languages25 min

Chapter 4: Context-Free Languages

Context-Free Grammars: Defining Programming Languages35 min

Pushdown Automata: Adding a Stack to Finite Automata32 min

CNF and the CYK Parsing Algorithm30 min

Pumping Lemma for Context-Free Languages25 min

Chapter 5: Computability

Turing Machines: The Universal Computer38 min

Decidability, Undecidability, and the Halting Problem32 min

Chapter 6: Computational Complexity

P vs NP: The Million-Dollar Question35 min

NP-Completeness and Polynomial Reductions38 min