AiTechWorlds
AiTechWorlds
We already know that some languages — like {0ⁿ1ⁿ} — lie beyond the reach of regular expressions but are well within the power of context-free grammars. A natural question follows: are ALL non-regular languages context-free? Can a pushdown automaton recognise anything a finite automaton cannot? The answer is decisively no. The language {aⁿbⁿcⁿ | n ≥ 0} — equal numbers of three different symbols — is non-regular, yet no context-free grammar or pushdown automaton can recognise it either. Proving this requires the Pumping Lemma for Context-Free Languages, a tool that exposes the fundamental limitation of stack-based computation.
To prove a language L is not regular, we used the Pumping Lemma for regular languages: any sufficiently long string in L can have one middle substring "pumped" (repeated or removed) and remain in L. If pumping breaks membership, L is not regular.
For context-free languages, the analogous lemma is more powerful — but also more nuanced. The stack in a PDA creates a fundamentally nested, paired structure. When a derivation tree becomes deep enough, some variable must repeat, and that repetition creates two substrings that get pumped simultaneously — one on each side of a centre piece.
Theorem (Sipser, 2013, Theorem 2.34): If L is a context-free language, then there exists a constant p ≥ 1 (the pumping length) such that for every string w ∈ L with |w| ≥ p, w can be written as:
w = u v x y z
where all five conditions hold:
The critical structural difference from the regular Pumping Lemma: two substrings are pumped, not one. When k = 0, both v and y are deleted; when k = 2, both are doubled; and so on — always in lockstep.
Consider a parse tree for a long string w using a CNF grammar with |V| variables. By the tree's structure, every leaf is a terminal and every internal node is a variable. If the string is long enough (specifically, longer than 2^|V|), the parse tree must have a path from root to leaf of length greater than |V|. By the Pigeonhole Principle, some variable R must appear twice on that path.
Because R appears twice, you can:
The substrings v and y are the "flanking" pieces generated by the two different subtrees rooted at R. The piece x is the portion derived from the innermost R. The piece u is generated before the first R, and z is generated after.
Claim: L = {aⁿbⁿcⁿ | n ≥ 0} is not a CFL.
Proof by contradiction: Assume L is context-free. Let p be the pumping length from the Pumping Lemma.
Choose w = a^p b^p c^p. Then w ∈ L and |w| = 3p ≥ p.
By the Pumping Lemma, w = uvxyz where |vy| ≥ 1 and |vxy| ≤ p.
Case analysis: Since |vxy| ≤ p, the combined string vxy cannot span all three symbol types without exceeding p characters. So vxy lies entirely within one or two adjacent symbol blocks:
In every case, pumping produces a string outside L. Since the Pumping Lemma says pumping should keep strings in L, our assumption that L is context-free must be false. Therefore L is not context-free. ∎
The language of doubled strings (a string concatenated with itself) is also non-context-free. Intuitively, recognising ww requires remembering all of w to compare it against the second copy — but a stack can only compare things in a LIFO order, not by arbitrary position. The formal proof uses the same pumping argument: choose w = 0^p 1^p 0^p 1^p and show that pumping the two simultaneous substrings always destroys the doubling structure.
| Language | Regular? | CFL? | Proof Tool |
|---|---|---|---|
(ab)* | Yes | Yes | DFA construction |
{0ⁿ1ⁿ} | No | Yes | Pumping (Regular) + PDA construction |
{ww^R} (palindromes) | No | Yes | Pumping (Regular) + PDA construction |
{aⁿbⁿcⁿ} | No | No | Pumping (CFL) |
{ww} | No | No | Pumping (CFL) |
| Arithmetic expressions | No | Yes | CFG construction |
{aⁿbⁿcⁿ} is more than a theoretical exercise. It captures a pattern that arises naturally in programming: a language might require that the number of function parameters at declaration, the number at call-site, and the number in the return type all match. That three-way matching is precisely the {aⁿbⁿcⁿ} structure — and it is not context-free.
This is why type checking cannot be done by a CFG alone. A CFG can verify that a function call is syntactically structured correctly (correct parentheses, commas in the right places), but checking that the number and types of arguments match the declaration requires semantic analysis — a later compiler phase that uses symbol tables, not grammar rules. The PDA runs first to verify structure; then a separate system handles the constraints that are inherently beyond context-free power.
Sources: Sipser, M. (2013). Introduction to the Theory of Computation (3rd ed.). Cengage. | Bar-Hillel, Y., Perles, M., & Shamir, E. (1961). On formal properties of simple phrase structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 14(2), 143–172. | Hopcroft, J. E., Motwani, R., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Pearson.
Get this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises