Pumping lemma for context-free languages
In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma,[1] is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.
The pumping lemma can be used to construct a proof by contradiction that a specific language is not context-free. Conversely, the pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.
Formal statement
If a language [math]\displaystyle{ L }[/math] is context-free, then there exists some integer [math]\displaystyle{ p \geq 1 }[/math] (called a "pumping length")[2] such that every string [math]\displaystyle{ s }[/math] in [math]\displaystyle{ L }[/math] that has a length of [math]\displaystyle{ p }[/math] or more symbols (i.e. with [math]\displaystyle{ |s| \geq p }[/math]) can be written as
- [math]\displaystyle{ s = uvwxy }[/math]
with substrings [math]\displaystyle{ u, v, w, x }[/math] and [math]\displaystyle{ y }[/math], such that
- 1. [math]\displaystyle{ |vx| \geq 1 }[/math],
- 2. [math]\displaystyle{ |vwx| \leq p }[/math], and
- 3. [math]\displaystyle{ uv^n wx^n y \in L }[/math] for all [math]\displaystyle{ n \geq 0 }[/math].
Below is a formal expression of the Pumping Lemma.
[math]\displaystyle{ \begin{array}{l} (\forall L\subseteq \Sigma^*) \\ \quad (\mbox{context free}(L) \Rightarrow \\ \quad ((\exists p\geq 1) ((\forall s\in L) ((|s|\geq p) \Rightarrow \\ \quad ((\exists u,v,w,x,y \in \Sigma^*) (s=uvwxy \land |vx|\geq 1 \land |vwx|\leq p \land (\forall n \geq 0) (uv^nwx^ny\in L))))))) \end{array} }[/math]
Informal statement and explanation
The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.
The property is a property of all strings in the language that are of length at least [math]\displaystyle{ p }[/math], where [math]\displaystyle{ p }[/math] is a constant—called the pumping length—that varies between context-free languages.
Say [math]\displaystyle{ s }[/math] is a string of length at least [math]\displaystyle{ p }[/math] that is in the language.
The pumping lemma states that [math]\displaystyle{ s }[/math] can be split into five substrings, [math]\displaystyle{ s = uvwxy }[/math], where [math]\displaystyle{ vx }[/math] is non-empty and the length of [math]\displaystyle{ vwx }[/math] is at most [math]\displaystyle{ p }[/math], such that repeating [math]\displaystyle{ v }[/math] and [math]\displaystyle{ x }[/math] the same number of times ([math]\displaystyle{ n }[/math]) in [math]\displaystyle{ s }[/math] produces a string that is still in the language. It is often useful to repeat zero times, which removes [math]\displaystyle{ v }[/math] and [math]\displaystyle{ x }[/math] from the string. This process of "pumping up" [math]\displaystyle{ s }[/math] with additional copies of [math]\displaystyle{ v }[/math] and [math]\displaystyle{ x }[/math] is what gives the pumping lemma its name.
Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having [math]\displaystyle{ p }[/math] equal to the maximum string length in [math]\displaystyle{ L }[/math] plus one. As there are no strings of this length the pumping lemma is not violated.
Usage of the lemma
The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.
For example, if [math]\displaystyle{ S\subset \N }[/math] is infinite but does not contain an (infinite) arithmetic progression, then [math]\displaystyle{ L = \{a^{n} : n\in S\} }[/math] is not context-free. In particular, neither the prime numbers nor the square numbers are context-free.
For example, the language [math]\displaystyle{ L = \{ a^n b^n c^n | n \gt 0 \} }[/math] can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string [math]\displaystyle{ s = a^p b^p c^p }[/math] in L. The pumping lemma tells us that s can be written in the form [math]\displaystyle{ s = uvwxy }[/math], where u, v, w, x, and y are substrings, such that [math]\displaystyle{ |vx| \geq 1 }[/math], [math]\displaystyle{ |vwx| \leq p }[/math], and [math]\displaystyle{ uv^i wx^i y \in L }[/math] for every integer [math]\displaystyle{ i \geq 0 }[/math]. By the choice of s and the fact that [math]\displaystyle{ |vwx| \leq p }[/math], it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:
- [math]\displaystyle{ vwx = a^j }[/math] for some [math]\displaystyle{ j \leq p }[/math].
- [math]\displaystyle{ vwx = a^j b^k }[/math] for some j and k with [math]\displaystyle{ j+k \leq p }[/math]
- [math]\displaystyle{ vwx = b^j }[/math] for some [math]\displaystyle{ j \leq p }[/math].
- [math]\displaystyle{ vwx = b^j c^k }[/math] for some j and k with [math]\displaystyle{ j+k \leq p }[/math].
- [math]\displaystyle{ vwx = c^j }[/math] for some [math]\displaystyle{ j \leq p }[/math].
For each case, it is easily verified that [math]\displaystyle{ uv^i wx^i y }[/math] does not contain equal numbers of each letter for any [math]\displaystyle{ i \neq 1 }[/math]. Thus, [math]\displaystyle{ uv^2 wx^2 y }[/math] does not have the form [math]\displaystyle{ a^i b^i c^i }[/math]. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.
While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example
- [math]\displaystyle{ L = \{ b^j c^k d^l | j, k, l \in \mathbb{N} \} \cup \{ a^i b^j c^j d^j | i, j \in \mathbb{N}, i \ge 1\} }[/math]
for s=bjckdl with e.g. j≥1 choose vwx to consist only of b's, for s=aibjcjdj choose vwx to consist only of a's; in both cases all pumped strings are still in L.[3]
A precursor of the pumping lemma was used in 1960 by Scheinberg to prove that [math]\displaystyle{ L = \{ a^n b^n a^n | n \gt 0 \} }[/math] is not context-free.[4]
References
- ↑ Kreowski, Hans-Jörg (1979). "A pumping lemma for context-free graph languages". in Claus, Volker; Ehrig, Hartmut; Rozenberg, Grzegorz (in en). Graph-Grammars and Their Application to Computer Science and Biology. Lecture Notes in Computer Science. 73. Berlin, Heidelberg: Springer. pp. 270–283. doi:10.1007/BFb0025726. ISBN 978-3-540-35091-0. https://link.springer.com/chapter/10.1007%2FBFb0025726.
- ↑ Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words. CRM Monograph Series. 27. Providence, RI: American Mathematical Society. p. 90. ISBN 978-0-8218-4480-9. http://webpages.math.luc.edu/~lauve/papers/wordsbook.pdf. (Also see [www-igm.univ-mlv.fr/~berstel/ Aaron Berstel's website)
- ↑ John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. https://archive.org/details/introductiontoau00hopc. Here: sect.6.1, p.129
- ↑ Stephen Scheinberg (1960). "Note on the Boolean Properties of Context Free Languages". Information and Control 3 (4): 372–375. doi:10.1016/s0019-9958(60)90965-7. https://core.ac.uk/download/pdf/82210847.pdf. Here: Lemma 3, and its use on p.374-375.
- Bar-Hillel, Y.; Micha Perles; Eli Shamir (1961). "On formal properties of simple phrase-structure grammars". Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung 14 (2): 143–172. — Reprinted in: Y. Bar-Hillel (1964). Language and Information: Selected Essays on their Theory and Application. Addison-Wesley series in logic. Addison-Wesley. pp. 116–150. ISBN 0201003732. OCLC 783543642.
- Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. https://archive.org/details/introductiontoth00sips. Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.
Original source: https://en.wikipedia.org/wiki/Pumping lemma for context-free languages.
Read more |