AiTechWorlds
AiTechWorlds
Here is the problem: an NFA can be in multiple states simultaneously. A DFA must be in exactly one state. These two models seem fundamentally incompatible — until you realize that a single DFA state can represent a set of NFA states.
The moment this clicks, the entire conversion algorithm is obvious. The DFA does not track which NFA state the machine is in. It tracks which set of NFA states the machine might be in. Each such set is a single DFA state.
If an NFA has 3 states — call them {A, B, C} — then the DFA's states are subsets of {A, B, C}: there are at most 2³ = 8 possible subsets, and therefore at most 8 DFA states. Many of these subsets will typically be unreachable from the start state, so the practical DFA is usually much smaller.
This algorithm was published in 1959 by Michael Rabin and Dana Scott in a landmark paper titled "Finite Automata and Their Decision Problems." It earned them the ACM Turing Award in 1976. Their key contribution was proving that nondeterminism adds no computational power to finite automata — only design convenience.
Designing an automaton for a complex pattern is dramatically easier with an NFA. ε-transitions let you compose sub-machines. Multiple transitions let you "guess" without committing.
Executing an automaton efficiently requires a DFA. A DFA needs only a single table lookup per input character — O(1) — giving O(|w|) total time to process string w.
The standard engineering solution:
This pipeline is exactly what lexer generators like flex, re2c, and the Go regexp package implement.
Input: NFA N = (Q_N, Σ, δ_N, q₀, F_N) Output: DFA D = (Q_D, Σ, δ_D, q_D₀, F_D) with L(D) = L(N)
Step 1 — Start state of DFA:
q_D₀ = ε-closure({q₀})The DFA start state is the ε-closure of the NFA start state.
Step 2 — Transitions: For each DFA state S (a set of NFA states) and each input symbol a ∈ Σ:
δ_D(S, a) = ε-closure( ∪_{q ∈ S} δ_N(q, a) )Follow all NFA transitions on a from every state in S, then take the ε-closure.
Step 3 — Accept states:
A DFA state S is an accept state if
S ∩ F_N ≠ ∅Any DFA state containing at least one NFA accept state is a DFA accept state.
Step 4 — Repeat Step 2 for every newly discovered DFA state until no new states appear.
NFA: Accepts binary strings ending in "01".
States: Q_N = {q₀, q₁, q₂}, Σ = {0, 1}, start = q₀, accept = {q₂}
NFA transitions (no ε-transitions in this NFA):
| NFA State | On '0' | On '1' |
|---|---|---|
| q₀ | {q₀, q₁} | {q₀} |
| q₁ | ∅ | {q₂} |
| q₂ | ∅ | ∅ |
Since there are no ε-transitions, ε-closure(S) = S for all S.
DFA Start State: ε-closure({q₀}) = {q₀}
Name this DFA state A = {q₀}.
Process A = {q₀}:
On '0': δ_N(q₀, 0) = {q₀, q₁} → ε-closure = {q₀, q₁} — new state B On '1': δ_N(q₀, 1) = {q₀} → ε-closure = {q₀} = A (already known)
Process B = {q₀, q₁}:
On '0': δ_N(q₀, 0) = {q₀, q₁} δ_N(q₁, 0) = ∅ Union = {q₀, q₁} → ε-closure = {q₀, q₁} = B (already known)
On '1': δ_N(q₀, 1) = {q₀} δ_N(q₁, 1) = {q₂} Union = {q₀, q₂} → ε-closure = {q₀, q₂} — new state C
Process C = {q₀, q₂}:
On '0': δ_N(q₀, 0) = {q₀, q₁} δ_N(q₂, 0) = ∅ Union = {q₀, q₁} → ε-closure = {q₀, q₁} = B (already known)
On '1': δ_N(q₀, 1) = {q₀} δ_N(q₂, 1) = ∅ Union = {q₀} → ε-closure = {q₀} = A (already known)
No new states discovered. Construction complete.
| NFA States | DFA State | On '0' | On '1' | Accept? |
|---|---|---|---|---|
| {q₀} | A | B | A | No |
| {q₀, q₁} | B | B | C | No |
| {q₀, q₂} | C | B | A | Yes (contains q₂) |
This is the same DFA we would have designed directly in Lesson 3 — obtained automatically by the subset construction from the NFA.
Start: A
Read '0' → B
Read '1' → C ← accept state
Read '0' → B
Read '1' → C ← accept state
End: C ∈ F_D → ACCEPT ✓
The theoretical maximum: an NFA with n states can produce a DFA with up to 2ⁿ states.
For n = 5 → up to 32 DFA states. For n = 10 → up to 1,024 DFA states. For n = 20 → up to 1,048,576 DFA states.
The worst-case construction (due to Meyer and Fischer, 1971): an NFA over Σ = {0, 1} where the language is "strings whose n-th character from the end is 1." This NFA needs n+1 states, but the equivalent DFA requires 2ⁿ states — and all of them are reachable.
In practice, most NFAs produced from real-world regular expressions do not trigger this worst case. Lazy construction — only computing DFA states that are actually reachable from the start state — avoids building unreachable states entirely.
Once you have a DFA, it may have redundant states. Two states are equivalent if they behave identically on every future input string — they both accept or both reject the same continuations.
Hopcroft's Algorithm (1971):
Hopcroft's algorithm runs in O(n log n) time, where n is the number of DFA states. The result is the unique minimum-state DFA for the language.
A lexer (lexical analyzer) is the first stage of a compiler. It reads source code and produces tokens — keywords, identifiers, numbers, operators.
Token recognition rules are written as regular expressions. The lexer generator:
Tools that implement this pipeline: flex (the GNU lexer generator), re2c (used in PHP, Nginx, SQLite parsers), and ANTLR (for Java). The DFA lookup tables they generate are the inner loop of every compiler's tokenizer.
The subset construction proves, constructively, that NFA and DFA recognize exactly the same class of languages — the regular languages. The proof is the construction: given any NFA, you can build a DFA. Given any DFA, it is already an NFA (with only one-element result sets and no ε-transitions).
The cost is state count: up to exponential blowup in theory, manageable in practice. The benefit is design flexibility: build your automaton as an NFA (using Thompson's construction from a regex), then automatically convert it to an efficient DFA for execution.
This is one of the cleanest examples in all of computer science of the separation between what a system can do and how efficiently it does it.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises