We already know that some languages, such as {0ⁿ1ⁿ}, lie beyond the reach of regular expressions but are well within the power of context-free grammars. A natural question follows: are ALL non-regular languages context-free? Can a pushdown automaton recognise every language that a finite automaton cannot? The answer is decisively no. The language {aⁿbⁿcⁿ | n ≥ 0}, with equal numbers of three different symbols in order, is non-regular, yet no context-free grammar or pushdown automaton can recognise it either. Proving this requires the Pumping Lemma for Context-Free Languages, a tool that exposes a fundamental limitation of stack-based computation.

Motivation: Why We Need a New Tool

To prove a language L is not regular, we used the Pumping Lemma for regular languages: every sufficiently long string in L contains a non-empty substring that can be "pumped" (repeated any number of times, or removed) with the result remaining in L. If some sufficiently long string in L admits no such substring, L is not regular.

For context-free languages, the analogous lemma has the same shape but is more nuanced. The stack in a PDA creates a nested, paired structure. When a derivation tree becomes deep enough, some variable must repeat on a single root-to-leaf path, and that repetition creates two substrings that get pumped simultaneously, one on each side of a centre piece.

The Pumping Lemma for CFLs

Theorem (Sipser, 2013, Theorem 2.34): If L is a context-free language, then there exists a constant p ≥ 1 (the pumping length) such that for every string w ∈ L with |w| ≥ p, w can be written as:

w = u v x y z

where all three conditions hold:

|vy| ≥ 1 — at least one of v or y is non-empty (something is actually pumped).
|vxy| ≤ p — the "pumped middle" is not too long.
For all k ≥ 0: uv^k xy^k z ∈ L — pumping v and y simultaneously (same number of times) keeps the string in L.

The critical structural difference from the regular Pumping Lemma: two substrings are pumped, not one. When k = 0, both v and y are deleted; when k = 2, both are doubled; and so on — always in lockstep.
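As a minimal sketch of the lockstep condition, the Python snippet below pumps one hand-picked split of 0011, a string in the context-free language {0ⁿ1ⁿ}. The helper names `pump`, `valid_split` and `in_0n1n`, the chosen split, and the value p = 4 are illustrative assumptions, not part of the lemma's statement.

```python
def pump(u, v, x, y, z, k):
    """Return u v^k x y^k z: v and y are repeated the same number of times."""
    return u + v * k + x + y * k + z


def valid_split(v, x, y, p):
    """Check the two length conditions: |vy| >= 1 and |vxy| <= p."""
    return len(v + y) >= 1 and len(v + x + y) <= p


def in_0n1n(s):
    """Membership test for {0^n 1^n | n >= 0}."""
    n = len(s) // 2
    return s == "0" * n + "1" * n


# Hypothetical split of w = 0011 with an assumed pumping length p = 4.
u, v, x, y, z = "0", "0", "", "1", "1"
assert valid_split(v, x, y, p=4)

for k in range(4):
    pumped = pump(u, v, x, y, z, k)
    print(k, pumped, in_0n1n(pumped))
```

Repeating only v (for example `u + v * 2 + x + y + z`) gives 00011, which is not in the language; that is why the two pieces must move together.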

Why Two Substrings? The Parse Tree Argument

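Consider a parse tree for a long string w using a CNF grammar with |V| variables. Every leaf is a terminal and every internal node is a variable with either two variable children or a single terminal child, so a tree whose longest root-to-leaf path has h edges yields a string of length at most 2^(h-1). If the string is long enough (length at least 2^|V| suffices), the parse tree must therefore have a root-to-leaf path with more than |V| edges, and that path passes through at least |V| + 1 variable nodes. By the Pigeonhole Principle, some variable R must appear twice on that path.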

Because R appears twice, you can:

Pump up (k ≥ 2): replace the lower copy of R's subtree with the upper copy's subtree — duplicating both v and y.
Pump down (k = 0): replace the upper copy of R's subtree with the lower copy — removing both v and y.

Write the yield of the upper R's subtree as vxy, where x is the yield of the lower R's subtree. The substrings v and y are then the "flanking" pieces: the parts of the upper subtree's yield that lie to the left and to the right of x. The piece u is everything in w to the left of the upper R's subtree, and z is everything to its right.
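The decomposition can be read off a concrete parse tree. The sketch below assumes a small CNF grammar for {0ⁿ1ⁿ | n ≥ 1} (S → AT | AB, T → SB, A → 0, B → 1) and a nested-tuple tree encoding; the grammar, the encoding, and the function names are all choices made for this illustration.

```python
# A leaf node is (variable, terminal); an internal node is (variable, left, right).
# Parse tree for 0011 under the assumed grammar S -> AT | AB, T -> SB, A -> 0, B -> 1.
TREE = ("S", ("A", "0"), ("T", ("S", ("A", "0"), ("B", "1")), ("B", "1")))


def tree_yield(t):
    """Concatenate the terminals at the leaves, left to right."""
    return t[1] if len(t) == 2 else tree_yield(t[1]) + tree_yield(t[2])


def height(t):
    """Number of variable nodes on the longest path down from t."""
    return 1 if len(t) == 2 else 1 + max(height(t[1]), height(t[2]))


def longest_path(t, start=0):
    """Return [(variable, start, end), ...] along a longest root-to-leaf path.

    start and end delimit the slice of the full string that the node yields.
    """
    node = (t[0], start, start + len(tree_yield(t)))
    if len(t) == 2:
        return [node]
    left, right = t[1], t[2]
    if height(left) >= height(right):
        return [node] + longest_path(left, start)
    return [node] + longest_path(right, start + len(tree_yield(left)))


def split_at_repeat(t):
    """Find a repeated variable on the path and return (u, v, x, y, z), or None."""
    w = tree_yield(t)
    seen = {}
    # Scan from the leaf upwards so the repeated pair found is the lowest one.
    for var, s, e in reversed(longest_path(t)):
        if var in seen:
            s2, e2 = seen[var]  # span of the lower occurrence, which yields x
            return w[:s], w[s:s2], w[s2:e2], w[e2:e], w[e:]
        seen[var] = (s, e)
    return None


print(split_at_repeat(TREE))  # ('', '0', '01', '1', '')
```

Here the repeated variable is S: the upper S yields 0011, the lower S yields x = 01, and the flanking pieces are v = 0 and y = 1.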

[Figure: Chomsky hierarchy with examples]

Proof: {aⁿbⁿcⁿ} Is Not Context-Free

Claim: L = {aⁿbⁿcⁿ | n ≥ 0} is not a CFL.

Proof by contradiction: Assume L is context-free. Let p be the pumping length from the Pumping Lemma.

Choose w = a^p b^p c^p. Then w ∈ L and |w| = 3p ≥ p.

By the Pumping Lemma, w = uvxyz where |vy| ≥ 1 and |vxy| ≤ p.

Case analysis: To touch all three symbol types, vxy would have to contain at least one a, all p b's, and at least one c, giving |vxy| ≥ p + 2. Since |vxy| ≤ p, this is impossible, so vxy lies entirely within one block or across two adjacent blocks:

Case 1: vxy lies within the a-block only. Then v and y contain only a's. Pumping (k = 2) increases the a-count but leaves b-count and c-count unchanged. The resulting string has more a's than b's or c's. Not in L. Contradiction.
Case 2: vxy spans a's and b's (but no c's). Then v and y together contain only a's and b's. Pumping (k = 2) increases the count of a's and/or b's but leaves c-count fixed. The resulting string has too many a's or b's relative to c's. Not in L. Contradiction.
Case 3: vxy lies within the b-block only. Same argument as Case 1: the b-count increases while the a-count and c-count are unchanged. Contradiction.
Case 4: vxy spans b's and c's. Same argument as Case 2 — a's are unchanged. Contradiction.
Case 5: vxy lies within the c-block only. Pumping (k = 2) increases c-count but leaves a's and b's unchanged. Not in L. Contradiction.

In every case, pumping produces a string outside L. Since the Pumping Lemma says pumping should keep strings in L, our assumption that L is context-free must be false. Therefore L is not context-free. ∎
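The case analysis can be sanity-checked by brute force for small values of p. The sketch below enumerates every split allowed by the lemma and tests k = 0 and k = 2 only; the function names and the range of p are assumptions made for illustration, and a finite check of this kind supports the proof but does not replace it.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n | n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0}, used as a context-free contrast."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def splits(w, p):
    """Yield every (u, v, x, y, z) with w = uvxyz, |vy| >= 1 and |vxy| <= p."""
    n = len(w)
    for i in range(n + 1):                  # v starts at index i
        hi = min(n, i + p)                  # vxy must end by index i + p
        for j in range(i, hi + 1):          # v = w[i:j]
            for k in range(j, hi + 1):      # x = w[j:k]
                for m in range(k, hi + 1):  # y = w[k:m]
                    if (j - i) + (m - k) >= 1:
                        yield w[:i], w[i:j], w[j:k], w[k:m], w[m:]


def survives(parts, member, ks=(0, 2)):
    """True if pumping with every k in ks keeps the string in the language."""
    u, v, x, y, z = parts
    return all(member(u + v * k + x + y * k + z) for k in ks)


def some_split_survives(w, p, member):
    return any(survives(parts, member) for parts in splits(w, p))


for p in range(1, 5):
    w = "a" * p + "b" * p + "c" * p
    print(p, some_split_survives(w, p, in_anbncn))  # False for each p tried

# Contrast: in the context-free language {a^n b^n}, a surviving split exists.
print(some_split_survives("aabb", 2, in_anbn))      # True
```

For a^p b^p c^p no split survives even the two values k = 0 and k = 2, whereas aabb has the split u = a, v = a, x = empty, y = b, z = b that stays inside {aⁿbⁿ}.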

Proof: {ww | w ∈ {0,1}*} Is Not Context-Free

The language of doubled strings (a string concatenated with itself) is also not context-free. Intuitively, recognising ww requires remembering all of w to compare it against the second copy, but a stack releases symbols in LIFO order, so it can check the second copy only in reverse, not position by position. The formal proof uses the same pumping argument: choose the string s = 0^p 1^p 0^p 1^p (named s to avoid clashing with the w in ww) and show that every permitted way of pumping the two substrings destroys the doubling structure.
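The choice of string matters in this proof. The sketch below, with an assumed pumping length p = 3 and hand-picked splits, shows that the simpler candidate 0^p 1 0^p 1 has a split that pumps without ever leaving the language, while a split of 0^p 1^p 0^p 1^p that straddles the midpoint fails as soon as it is pumped down. Only one split of the second string is shown, so this illustrates a single case of the proof rather than the whole argument.

```python
def is_doubled(s):
    """Membership test for {ww | w in {0,1}*}."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]


def pump(parts, k):
    """Return u v^k x y^k z for parts = (u, v, x, y, z)."""
    u, v, x, y, z = parts
    return u + v * k + x + y * k + z


p = 3  # assumed pumping length, for illustration only

# Candidate 0^p 1 0^p 1: v and y sit on either side of the first 1.
weak = ("0" * (p - 1), "0", "1", "0", "0" * (p - 1) + "1")

# Candidate 0^p 1^p 0^p 1^p: v is the last 1 of the first half,
# y is the first 0 of the second half, so vxy straddles the midpoint.
strong = ("0" * p + "1" * (p - 1), "1", "", "0", "0" * (p - 1) + "1" * p)

print([is_doubled(pump(weak, k)) for k in range(4)])    # [True, True, True, True]
print(pump(strong, 0), is_doubled(pump(strong, 0)))     # 0001100111 False
```

The first split keeps producing strings of the form (0^m 1)(0^m 1), so 0^p 1 0^p 1 cannot be used to derive a contradiction.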

The Landscape of Languages

Language	Regular?	CFL?	Proof Tool
`(ab)*`	Yes	Yes	DFA construction
`{0ⁿ1ⁿ}`	No	Yes	Pumping (Regular) + PDA construction
`{ww^R}` (even-length palindromes)	No	Yes	Pumping (Regular) + PDA construction
`{aⁿbⁿcⁿ}`	No	No	Pumping (CFL)
`{ww}`	No	No	Pumping (CFL)
Arithmetic expressions	No	Yes	CFG construction
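To make the "PDA construction" entries concrete, here is a minimal sketch of a stack-based recogniser for {0ⁿ1ⁿ | n ≥ 0}. It simulates one deterministic pushdown strategy by hand; the function name and the two-phase "push"/"pop" encoding are assumptions of this example, not a general PDA simulator.

```python
def accepts_0n1n(s):
    """Accept strings of the form 0^n 1^n using a single stack."""
    stack = []
    state = "push"              # "push" while reading 0s, "pop" while reading 1s
    for ch in s:
        if state == "push" and ch == "0":
            stack.append("0")   # remember one 0
        elif ch == "1" and stack:
            state = "pop"       # after the first 1, no more 0s are allowed
            stack.pop()         # match this 1 against a stored 0
        else:
            return False        # a 0 after a 1, an unmatched 1, or a stray symbol
    return not stack            # every 0 must have been matched


for s in ["", "01", "0011", "001", "011", "0101"]:
    print(repr(s), accepts_0n1n(s))
```

Once the 1s have consumed the stack there is nothing left to compare a third block against, which matches the table: {aⁿbⁿcⁿ} needs more than one stack's worth of memory.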

Programming Language Implication

{aⁿbⁿcⁿ} is more than a theoretical exercise. It captures a pattern that arises naturally in programming: a language might require that the number of function parameters at declaration, the number at call-site, and the number in the return type all match. That three-way matching has the same shape as {aⁿbⁿcⁿ}, and it is not context-free.

This is why type checking cannot be done by a CFG alone. A CFG can verify that a function call is syntactically well formed (balanced parentheses, commas in the right places), but checking that the number and types of arguments match the declaration requires semantic analysis, a later compiler phase that uses symbol tables rather than grammar rules. The parser, which plays the role of the PDA, runs first to verify structure; a separate phase then handles the constraints that are beyond context-free power.
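The two phases can be sketched in a few lines. The example below is a toy: the symbol table `SYMBOLS`, the function names, and the flat `name(arg, ...)` call syntax are all hypothetical, and the regular-expression matcher merely stands in for a real grammar-driven parser (it does not handle nested calls).

```python
import re

# Hypothetical symbol table: function name -> number of declared parameters.
SYMBOLS = {"area": 2, "now": 0}

CALL = re.compile(r"([A-Za-z_]\w*)\((.*)\)")   # name followed by (...)
ARG = re.compile(r"[A-Za-z_]\w*|\d+")          # an identifier or an integer


def parse_call(src):
    """Syntactic phase: accept `name(arg, ...)` and return (name, args)."""
    m = CALL.fullmatch(src.replace(" ", ""))
    if m is None:
        raise ValueError(f"malformed call: {src!r}")
    name, inner = m.groups()
    args = inner.split(",") if inner else []
    if not all(ARG.fullmatch(a) for a in args):
        raise ValueError(f"malformed argument list: {src!r}")
    return name, args


def check_arity(name, args, symbols):
    """Semantic phase: compare the call against the declaration."""
    if name not in symbols:
        return f"unknown function {name}"
    if len(args) != symbols[name]:
        return f"{name} expects {symbols[name]} argument(s), got {len(args)}"
    return "ok"


for src in ["area(3, 4)", "area(3)", "now()"]:
    name, args = parse_call(src)
    print(src, "->", check_arity(name, args, SYMBOLS))
```

Both `area(3, 4)` and `area(3)` pass the syntactic phase; only the lookup in the symbol table tells them apart.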

Sources: Sipser, M. (2013). Introduction to the Theory of Computation (3rd ed.). Cengage. | Bar-Hillel, Y., Perles, M., & Shamir, E. (1961). On formal properties of simple phrase structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14(2), 143–172. | Hopcroft, J. E., Motwani, R., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Pearson.
