Imagine a lock with a combination. A DFA is like trying each combination one at a time — methodical, deterministic, patient. An NFA is like having a thousand copies of yourself, each trying a different combination simultaneously. The lock opens if any copy finds the right sequence.

This parallel-exploration idea is called nondeterminism, and it is one of the most mind-bending concepts in computer science. The machine does not "guess" — it explores all possibilities at once. If any computational path leads to an accept state, the NFA accepts.

The surprising result — proved by Michael Rabin and Dana Scott in their 1959 paper that won the Turing Award — is that this apparent superpower changes nothing about which languages can be recognized. Every NFA can be converted to an equivalent DFA. The expressive power is identical.

But what NFAs do change is the ease of design. An NFA for a complex pattern might need 5 states. The equivalent DFA might need 32. The NFA is the designer's tool; the DFA is the implementer's tool.

What Makes an NFA Different from a DFA

Three key differences separate an NFA from a DFA:

Multiple transitions: From a single state, reading a single symbol, an NFA may transition to zero, one, or multiple states simultaneously.
ε-transitions: An NFA may transition to another state without reading any input symbol (using the empty string ε).
Acceptance condition: An NFA accepts a string if at least one computational path ends in an accept state. It rejects only if all paths fail.

Feature	DFA	NFA	Impact on Design
Next states per input	Exactly 1	0, 1, or many	NFA can branch and explore
ε-transitions	Not allowed	Allowed	NFA can chain sub-machines
Acceptance	Must end in accept state	Any path to accept state	NFA is more forgiving to design
Implementation	Direct (table lookup)	Requires subset tracking	DFA is easier to implement
State count	Often more states	Often fewer states	NFA is more compact
Recognized languages	Regular languages	Regular languages	Same expressive power

Formal Definition

An NFA is also a 5-tuple:

N = (Q, Σ, δ, q₀, F)

The components are the same as a DFA, but the transition function δ is fundamentally different:

δ: Q × Σ_ε → P(Q)

Where:

Σ_ε = Σ ∪ {ε} — the alphabet extended with the empty string
P(Q) — the power set of Q (all subsets of states, including ∅)
δ(q, a) returns a set of next states (possibly empty — meaning the path dies)

In a DFA: δ(q₀, 1) = q₁ (one next state) In an NFA: δ(q₀, 1) = {q₁, q₂} (two next states — the machine "forks")

The ε-Closure

The ε-closure of a state q, written ε-closure(q), is the set of all states reachable from q using only ε-transitions (zero or more).

For a set of states S: ε-closure(S) = ∪_{q ∈ S} ε-closure(q)

Why this matters: Before processing each input symbol, and after taking any ε-transitions, we must compute the ε-closure to find all states the NFA could currently be in.

Example: If ε-closure(q₀) = {q₀, q₁, q₃}, the NFA starts the computation already considering itself to be in states q₀, q₁, and q₃ simultaneously — before reading a single character.

Example: NFA for Strings Ending in "01"

The same language as DFA Example 2 in Lesson 3 — but now designed as an NFA.

Idea: Most of the time we don't know if we are "starting the end." The NFA can simply guess at each 0 whether this 0 begins the final "01" suffix.

q₀ — general state (haven't started the end)
q₁ — we just read a 0 that (we're guessing) begins the final "01"
q₂ — we just read "01" at the end (accept state)

Transitions:

δ(q₀, 0) = {q₀, q₁} — stay in q₀ (this might not be the last 0), or fork to q₁ (guessing this is the last 0)
δ(q₀, 1) = {q₀} — reading a 1 keeps us in q₀
δ(q₁, 1) = {q₂} — after the guessed 0, reading a 1 completes "01"
δ(q₁, 0) = ∅ — if we see another 0, this path dies (our guess was wrong)
δ(q₂, 0) = δ(q₂, 1) = ∅ — no transitions out of q₂ (we want exactly "...01" at the end)

Trace on "0101":

At each step, track the set of active states:

Start: {q₀} (ε-closure)

Read '0':
  From q₀ on '0': δ(q₀, 0) = {q₀, q₁}
  ε-closure({q₀, q₁}) = {q₀, q₁}
  Active: {q₀, q₁}

Read '1':
  From q₀ on '1': δ(q₀, 1) = {q₀}
  From q₁ on '1': δ(q₁, 1) = {q₂}
  ε-closure({q₀, q₂}) = {q₀, q₂}
  Active: {q₀, q₂}

Read '0':
  From q₀ on '0': δ(q₀, 0) = {q₀, q₁}
  From q₂ on '0': δ(q₂, 0) = ∅
  ε-closure({q₀, q₁}) = {q₀, q₁}
  Active: {q₀, q₁}

Read '1':
  From q₀ on '1': δ(q₀, 1) = {q₀}
  From q₁ on '1': δ(q₁, 1) = {q₂}
  ε-closure({q₀, q₂}) = {q₀, q₂}
  Active: {q₀, q₂}

Final active states: {q₀, q₂}
q₂ ∈ F → ACCEPT ✓

Notice: this NFA needed only 3 states for the same language that required 3 states in the DFA — but the design reasoning was far more natural. For more complex patterns, the state savings are dramatic.

ε-Transitions: Chaining Sub-Machines

ε-transitions are the glue that lets you build large NFAs from small components. Consider building an NFA for the language L₁ · L₂ (strings from L₁ followed by strings from L₂):

Add an ε-transition from every accept state of the L₁ NFA to the start state of the L₂ NFA. This composes the two machines without needing to redesign anything. This modular composition is exactly what Thompson's construction uses to convert a regex into an NFA.

Thompson's Construction: From Regex to NFA

In 1968, Ken Thompson — co-creator of Unix and the C programming language — published an algorithm to convert any regular expression into an NFA. The key insight: every regex operation corresponds to a simple NFA wiring pattern.

Single character a: Two states connected by transition on a
Union R₁ | R₂: New start state with ε-transitions to both NFA₁ and NFA₂; ε-transitions from both accept states to new single accept state
Concatenation R₁R₂: ε-transition from accept states of NFA₁ to start of NFA₂
Kleene star R*: ε-transition from accept state back to start; new start and accept with ε-connections to allow zero repetitions

The resulting NFA for pattern ab*c looks like:

Thompson's construction produces an NFA with at most 2n states for a regex with n symbols. This is the algorithm used internally by grep, awk, and most Unix text-processing tools.

Why Not Just Use NFAs Everywhere?

If NFAs are easier to design, why not implement them directly? The answer is efficiency.

Simulating an NFA requires tracking a set of active states at each step. For an NFA with n states, there can be up to 2ⁿ different subsets — which means simulation can be expensive. In the worst case, O(n²) per input character when tracking sets.

By contrast, a DFA requires only a single state lookup per input character — O(1) per character, O(|w|) total.

The solution: Convert the NFA to a DFA once (the subset construction, covered in Lesson 5), then use the DFA for all subsequent matching. This is exactly what grep -E, RE2, and Rust's regex crate do: compile the pattern once, then match in linear time.

But: For interactive use — IDE highlighting, live search — it is sometimes faster to simulate the NFA directly rather than pay the upfront conversion cost. Both approaches have their place.

Processing an NFA: The Parallel View

The conceptually cleanest way to think about NFA execution is as parallel computation:

Start with the set {q₀} (just the start state)
Take the ε-closure of this set
For each input character a: a. For each currently active state q, compute δ(q, a) b. Union all these result sets together c. Take the ε-closure of the union
After the last character: if any active state is in F, accept; else reject

This view makes the equivalence with DFA obvious: the "set of currently active NFA states" is exactly one DFA state in the subset construction. We are running a DFA whose states happen to be subsets of NFA states.

Key Insight

NFAs do not compute anything that DFAs cannot compute. The Church-Turing Thesis for finite automata says: DFAs, NFAs, and regular expressions all define exactly the class of regular languages.

But NFAs are the right tool for design, and DFAs are the right tool for execution. Understanding both — and knowing how to convert between them — is the foundation of lexer generators, regex engines, and the entire field of language processing.

Lesson 5 walks through the conversion algorithm step by step, complete with a worked example showing exactly how a subset of NFA states becomes a single DFA state.

💬 DiscussionPowered by GitHub Discussions

📱

Get this course's notes on Telegram!

Free cheat sheets, summaries & practice exercises

Get Notes Free →

32 minLesson 4 of 16

Course Contents(16 lessons)

▾

Chapter 1: Mathematical Foundations

What Is Theory of Computation? Why It Matters20 min

Mathematical Foundations: Sets, Functions, Proofs28 min

Chapter 2: Finite Automata

Deterministic Finite Automata (DFA): States and Transitions35 min

Non-Deterministic Finite Automata (NFA)32 min

NFA to DFA: Subset Construction Algorithm30 min

Chapter 3: Regular Languages

Regular Expressions: Pattern Matching Formalized32 min

Pumping Lemma: Proving Languages Are Not Regular28 min

Closure Properties of Regular Languages25 min

Chapter 4: Context-Free Languages

Context-Free Grammars: Defining Programming Languages35 min

Pushdown Automata: Adding a Stack to Finite Automata32 min

CNF and the CYK Parsing Algorithm30 min

Pumping Lemma for Context-Free Languages25 min

Chapter 5: Computability

Turing Machines: The Universal Computer38 min

Decidability, Undecidability, and the Halting Problem32 min

Chapter 6: Computational Complexity

P vs NP: The Million-Dollar Question35 min

NP-Completeness and Polynomial Reductions38 min