AiTechWorlds
AiTechWorlds
Imagine a lock with a combination. A DFA is like trying each combination one at a time — methodical, deterministic, patient. An NFA is like having a thousand copies of yourself, each trying a different combination simultaneously. The lock opens if any copy finds the right sequence.
This parallel-exploration idea is called nondeterminism, and it is one of the most mind-bending concepts in computer science. The machine does not "guess" — it explores all possibilities at once. If any computational path leads to an accept state, the NFA accepts.
The surprising result — proved by Michael Rabin and Dana Scott in their 1959 paper that won the Turing Award — is that this apparent superpower changes nothing about which languages can be recognized. Every NFA can be converted to an equivalent DFA. The expressive power is identical.
But what NFAs do change is the ease of design. An NFA for a complex pattern might need 5 states. The equivalent DFA might need 32. The NFA is the designer's tool; the DFA is the implementer's tool.
Three key differences separate an NFA from a DFA:
| Feature | DFA | NFA | Impact on Design |
|---|---|---|---|
| Next states per input | Exactly 1 | 0, 1, or many | NFA can branch and explore |
| ε-transitions | Not allowed | Allowed | NFA can chain sub-machines |
| Acceptance | Must end in accept state | Any path to accept state | NFA is more forgiving to design |
| Implementation | Direct (table lookup) | Requires subset tracking | DFA is easier to implement |
| State count | Often more states | Often fewer states | NFA is more compact |
| Recognized languages | Regular languages | Regular languages | Same expressive power |
An NFA is also a 5-tuple:
N = (Q, Σ, δ, q₀, F)
The components are the same as a DFA, but the transition function δ is fundamentally different:
δ: Q × Σ_ε → P(Q)
Where:
Σ_ε = Σ ∪ {ε} — the alphabet extended with the empty stringP(Q) — the power set of Q (all subsets of states, including ∅)In a DFA: δ(q₀, 1) = q₁ (one next state)
In an NFA: δ(q₀, 1) = {q₁, q₂} (two next states — the machine "forks")
The ε-closure of a state q, written ε-closure(q), is the set of all states reachable from q using only ε-transitions (zero or more).
For a set of states S: ε-closure(S) = ∪_{q ∈ S} ε-closure(q)
Why this matters: Before processing each input symbol, and after taking any ε-transitions, we must compute the ε-closure to find all states the NFA could currently be in.
Example: If ε-closure(q₀) = {q₀, q₁, q₃}, the NFA starts the computation already considering itself to be in states q₀, q₁, and q₃ simultaneously — before reading a single character.
The same language as DFA Example 2 in Lesson 3 — but now designed as an NFA.
Idea: Most of the time we don't know if we are "starting the end." The NFA can simply guess at each 0 whether this 0 begins the final "01" suffix.
q₀ — general state (haven't started the end)q₁ — we just read a 0 that (we're guessing) begins the final "01"q₂ — we just read "01" at the end (accept state)Transitions:
δ(q₀, 0) = {q₀, q₁} — stay in q₀ (this might not be the last 0), or fork to q₁ (guessing this is the last 0)δ(q₀, 1) = {q₀} — reading a 1 keeps us in q₀δ(q₁, 1) = {q₂} — after the guessed 0, reading a 1 completes "01"δ(q₁, 0) = ∅ — if we see another 0, this path dies (our guess was wrong)δ(q₂, 0) = δ(q₂, 1) = ∅ — no transitions out of q₂ (we want exactly "...01" at the end)Trace on "0101":
At each step, track the set of active states:
Start: {q₀} (ε-closure)
Read '0':
From q₀ on '0': δ(q₀, 0) = {q₀, q₁}
ε-closure({q₀, q₁}) = {q₀, q₁}
Active: {q₀, q₁}
Read '1':
From q₀ on '1': δ(q₀, 1) = {q₀}
From q₁ on '1': δ(q₁, 1) = {q₂}
ε-closure({q₀, q₂}) = {q₀, q₂}
Active: {q₀, q₂}
Read '0':
From q₀ on '0': δ(q₀, 0) = {q₀, q₁}
From q₂ on '0': δ(q₂, 0) = ∅
ε-closure({q₀, q₁}) = {q₀, q₁}
Active: {q₀, q₁}
Read '1':
From q₀ on '1': δ(q₀, 1) = {q₀}
From q₁ on '1': δ(q₁, 1) = {q₂}
ε-closure({q₀, q₂}) = {q₀, q₂}
Active: {q₀, q₂}
Final active states: {q₀, q₂}
q₂ ∈ F → ACCEPT ✓
Notice: this NFA needed only 3 states for the same language that required 3 states in the DFA — but the design reasoning was far more natural. For more complex patterns, the state savings are dramatic.
ε-transitions are the glue that lets you build large NFAs from small components. Consider building an NFA for the language L₁ · L₂ (strings from L₁ followed by strings from L₂):
Add an ε-transition from every accept state of the L₁ NFA to the start state of the L₂ NFA. This composes the two machines without needing to redesign anything. This modular composition is exactly what Thompson's construction uses to convert a regex into an NFA.
In 1968, Ken Thompson — co-creator of Unix and the C programming language — published an algorithm to convert any regular expression into an NFA. The key insight: every regex operation corresponds to a simple NFA wiring pattern.
a: Two states connected by transition on aR₁ | R₂: New start state with ε-transitions to both NFA₁ and NFA₂; ε-transitions from both accept states to new single accept stateR₁R₂: ε-transition from accept states of NFA₁ to start of NFA₂R*: ε-transition from accept state back to start; new start and accept with ε-connections to allow zero repetitionsThe resulting NFA for pattern ab*c looks like:
Thompson's construction produces an NFA with at most 2n states for a regex with n symbols. This is the algorithm used internally by grep, awk, and most Unix text-processing tools.
If NFAs are easier to design, why not implement them directly? The answer is efficiency.
Simulating an NFA requires tracking a set of active states at each step. For an NFA with n states, there can be up to 2ⁿ different subsets — which means simulation can be expensive. In the worst case, O(n²) per input character when tracking sets.
By contrast, a DFA requires only a single state lookup per input character — O(1) per character, O(|w|) total.
The solution: Convert the NFA to a DFA once (the subset construction, covered in Lesson 5), then use the DFA for all subsequent matching. This is exactly what grep -E, RE2, and Rust's regex crate do: compile the pattern once, then match in linear time.
But: For interactive use — IDE highlighting, live search — it is sometimes faster to simulate the NFA directly rather than pay the upfront conversion cost. Both approaches have their place.
The conceptually cleanest way to think about NFA execution is as parallel computation:
{q₀} (just the start state)δ(q, a)
b. Union all these result sets together
c. Take the ε-closure of the unionThis view makes the equivalence with DFA obvious: the "set of currently active NFA states" is exactly one DFA state in the subset construction. We are running a DFA whose states happen to be subsets of NFA states.
NFAs do not compute anything that DFAs cannot compute. The Church-Turing Thesis for finite automata says: DFAs, NFAs, and regular expressions all define exactly the class of regular languages.
But NFAs are the right tool for design, and DFAs are the right tool for execution. Understanding both — and knowing how to convert between them — is the foundation of lexer generators, regex engines, and the entire field of language processing.
Lesson 5 walks through the conversion algorithm step by step, complete with a worked example showing exactly how a subset of NFA states becomes a single DFA state.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises