Unrestricted grammar

In automata theory, the class of unrestricted grammars (also called semi-Thue, type-0 or phrase structure grammars) is the most general class of grammars in the Chomsky hierarchy. No restrictions are made on the productions of an unrestricted grammar, other than each of their left-hand sides being non-empty.[1]:220 This grammar class can generate arbitrary recursively enumerable languages.

Formal definition

An unrestricted grammar is a formal grammar [math]\displaystyle{ G = (N, T, P, S) }[/math], where

  • [math]\displaystyle{ N }[/math] is a finite set of nonterminal symbols,
  • [math]\displaystyle{ T }[/math] is a finite set of terminal symbols with [math]\displaystyle{ N }[/math] and [math]\displaystyle{ T }[/math] disjoint,[note 1]
  • [math]\displaystyle{ P }[/math] is a finite set of production rules of the form [math]\displaystyle{ \alpha \to \beta , }[/math] where [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ \beta }[/math] are strings of symbols in [math]\displaystyle{ N \cup T }[/math] and [math]\displaystyle{ \alpha }[/math] is not the empty string, and
  • [math]\displaystyle{ S \in N }[/math] is a specially designated start symbol.[1]:220

As the name implies, there are no real restrictions on the types of production rules that unrestricted grammars can have.[note 2]
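The definition can be made concrete with a small sketch. It assumes, purely for illustration, that every symbol is a single character, so that sentential forms are ordinary strings; the names `Grammar`, `step` and `ABC` are hypothetical, and the example grammar for the language of strings a^n b^n c^n (n ≥ 1) is a common textbook choice rather than something taken from the cited source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    productions: tuple        # P, as (alpha, beta) pairs of strings
    start: str                # S

    def __post_init__(self):
        # Check the conditions of the formal definition.
        symbols = self.nonterminals | self.terminals
        assert not (self.nonterminals & self.terminals), "N and T must be disjoint"
        assert self.start in self.nonterminals, "S must be a nonterminal"
        for alpha, beta in self.productions:
            assert alpha, "left-hand sides must be non-empty"
            assert set(alpha + beta) <= symbols, "rules use only symbols from N and T"


def step(grammar, form):
    """Return every string obtained from `form` by applying one production once."""
    successors = set()
    for alpha, beta in grammar.productions:
        i = form.find(alpha)
        while i != -1:  # try every occurrence of alpha, including overlapping ones
            successors.add(form[:i] + beta + form[i + len(alpha):])
            i = form.find(alpha, i + 1)
    return successors


# Illustrative grammar (an assumption of this sketch) for a^n b^n c^n, n >= 1.
ABC = Grammar(
    nonterminals=frozenset("SBC"),
    terminals=frozenset("abc"),
    productions=(
        ("S", "aBC"), ("S", "aSBC"), ("CB", "BC"),
        ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
    ),
    start="S",
)
```

With this grammar, `step(ABC, "S")` gives the two forms `aBC` and `aSBC`. Note that left-hand sides such as `aB` mix terminals and nonterminals, which the definition permits.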

Equivalence to Turing machines

The unrestricted grammars characterize the recursively enumerable languages. This is the same as saying that for every unrestricted grammar [math]\displaystyle{ G }[/math] there exists some Turing machine capable of recognizing [math]\displaystyle{ L(G) }[/math] and vice versa. Given an unrestricted grammar, such a Turing machine is simple enough to construct, as a two-tape nondeterministic Turing machine.[1]:221 The first tape contains the input word [math]\displaystyle{ w }[/math] to be tested, and the second tape, which initially holds the start symbol [math]\displaystyle{ S }[/math], is used by the machine to generate sentential forms from [math]\displaystyle{ G }[/math]. The Turing machine then does the following:

  1. Start at the left end of the second tape and nondeterministically either move right or select the current position, repeating until a position has been selected.
  2. Nondeterministically choose a production [math]\displaystyle{ \beta \to \gamma }[/math] from the productions in [math]\displaystyle{ G }[/math].
  3. If [math]\displaystyle{ \beta }[/math] appears at the selected position on the second tape, replace [math]\displaystyle{ \beta }[/math] by [math]\displaystyle{ \gamma }[/math] at that point, shifting the symbols on the tape left or right as required by the relative lengths of [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \gamma }[/math] (e.g. if [math]\displaystyle{ \beta }[/math] is longer than [math]\displaystyle{ \gamma }[/math], shift the tape symbols left).
  4. Compare the resulting sentential form on tape 2 to the word on tape 1. If they match, the Turing machine accepts the word. If they do not, the Turing machine goes back to step 1.

Across its nondeterministic branches, this Turing machine generates all and only the sentential forms of [math]\displaystyle{ G }[/math] on its second tape as the steps above are repeated, and it accepts exactly when one of those forms equals [math]\displaystyle{ w }[/math]. Hence the language [math]\displaystyle{ L(G) }[/math] must be recursively enumerable.
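The same idea can be sketched without nondeterminism. The following is not the two-tape machine itself but a deterministic breadth-first search over sentential forms, which explores every choice the machine could make. The function name `derives`, the `max_forms` budget and the example grammar are assumptions of this sketch, and symbols are taken to be single characters. Because the search may run forever on words outside the language, the budget turns "no answer yet" into an explicit `None`.

```python
from collections import deque


def derives(productions, start, word, max_forms=10_000):
    """Semi-decide whether `word` is derivable from `start`.

    Returns True if a derivation is found, False if the set of reachable
    sentential forms is finite and was searched completely, and None if
    the budget ran out first (the general, undecided case).
    """
    seen = {start}
    queue = deque([start])
    explored = 0
    while queue:
        if explored >= max_forms:
            return None
        form = queue.popleft()
        explored += 1
        if form == word:
            return True
        for alpha, beta in productions:
            i = form.find(alpha)
            while i != -1:  # every position where alpha occurs
                successor = form[:i] + beta + form[i + len(alpha):]
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
                i = form.find(alpha, i + 1)
    return False


# Illustrative productions (an assumption) for a^n b^n c^n, n >= 1.
ABC_RULES = [
    ("S", "aBC"), ("S", "aSBC"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]
```

For example, `derives(ABC_RULES, "S", "aabbcc")` finds a derivation, whereas `derives(ABC_RULES, "S", "abcc", max_forms=2000)` exhausts its budget and returns `None`, since the search alone cannot confirm non-membership here.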

The reverse construction is also possible. Given some Turing machine, it is possible to create an equivalent unrestricted grammar[1]:222 that uses only productions with one or more non-terminal symbols on their left-hand sides. Therefore, an arbitrary unrestricted grammar can always be converted to an equivalent grammar of this form, by converting it to a Turing machine and back again. Some authors[citation needed] use this form as the definition of an unrestricted grammar.
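A restricted left-hand-side form can also be reached more directly than by the round trip through a Turing machine. The sketch below uses a different, standard technique: every terminal a is replaced throughout the rules by a fresh nonterminal standing in for it, and one rule rewriting that stand-in to a is added. The result has no terminals at all on its left-hand sides. The function name `lift_terminals` and the `<a>` naming scheme are assumptions of this sketch. Symbols here are arbitrary strings and each side of a production is a tuple of symbols.

```python
def lift_terminals(nonterminals, terminals, productions, start):
    """Return an equivalent grammar whose left-hand sides contain no terminals."""
    # Hypothetical naming scheme for the fresh stand-in nonterminals.
    proxy = {a: f"<{a}>" for a in terminals}
    fresh = set(proxy.values())
    assert not (fresh & (nonterminals | terminals)), "stand-in names must be fresh"

    def lift(side):
        # Replace each terminal by its stand-in; leave nonterminals alone.
        return tuple(proxy.get(symbol, symbol) for symbol in side)

    new_productions = [(lift(alpha), lift(beta)) for alpha, beta in productions]
    # One extra rule per terminal turns a stand-in back into the terminal.
    new_productions += [((proxy[a],), (a,)) for a in sorted(terminals)]
    return nonterminals | fresh, terminals, new_productions, start


# Illustrative input (an assumption): the second rule has a terminal on its left.
EXAMPLE = (
    {"S"},
    {"a", "b"},
    [(("S",), ("a", "S")), (("a", "S"), ("b",))],
    "S",
)
LIFTED = lift_terminals(*EXAMPLE)
```

Erasing the distinction between each stand-in and its terminal maps every derivation of the new grammar onto a derivation of the old one, which is why the generated language is unchanged.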

Computational properties

The decision problem of whether a given string [math]\displaystyle{ s }[/math] can be generated by a given unrestricted grammar is equivalent to the problem of whether it can be accepted by the Turing machine equivalent to the grammar. The latter is the acceptance (membership) problem for Turing machines, which is undecidable, as is the closely related halting problem.

Recursively enumerable languages are closed under Kleene star, concatenation, union, and intersection, but not under set difference; see Recursively enumerable language.
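Closure under union, for instance, has a short grammar-level construction: add a fresh start symbol with one rule leading to each of the original start symbols. The sketch below assumes that the two grammars share no nonterminals and that every left-hand side contains at least one nonterminal. Without the second assumption, a rule such as a → b in one grammar could rewrite sentential forms belonging to the other. The function name `union_grammar`, the default start symbol `Z` and the two toy grammars are hypothetical, and symbols are single characters.

```python
def union_grammar(g1, g2, new_start="Z"):
    """Combine two grammars (N, T, P, S) into one generating L(g1) | L(g2)."""
    n1, t1, p1, s1 = g1
    n2, t2, p2, s2 = g2
    nonterminals, terminals = n1 | n2, t1 | t2
    # Assumptions of this sketch, checked rather than repaired:
    assert not (n1 & n2), "rename the nonterminals apart first"
    assert not (nonterminals & terminals), "no symbol may play both roles"
    assert new_start not in nonterminals | terminals, "start symbol must be fresh"
    for alpha, _ in p1 + p2:
        assert any(symbol in nonterminals for symbol in alpha), \
            "each left-hand side needs a nonterminal"
    # The fresh start symbol commits to one grammar; the rules of the
    # other can never fire because their nonterminals never appear.
    productions = [(new_start, s1), (new_start, s2)] + p1 + p2
    return nonterminals | {new_start}, terminals, productions, new_start


# Two toy grammars (assumptions) generating {a} and {b} respectively.
ONLY_A = ({"S"}, {"a"}, [("S", "a")], "S")
ONLY_B = ({"R"}, {"b"}, [("R", "b")], "R")
UNION = union_grammar(ONLY_A, ONLY_B)
```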

The equivalence of unrestricted grammars to Turing machines implies the existence of a universal unrestricted grammar, a grammar capable of generating any other unrestricted grammar's language given a description of the language. For this reason, it is theoretically possible to build a programming language based on unrestricted grammars (e.g. Thue).

Notes

  1. Actually, [math]\displaystyle{ T\cap N=\emptyset }[/math] is not strictly necessary since unrestricted grammars make no real distinction between the two. The designation exists purely so that one knows when to stop generating sentential forms of the grammar; more precisely, the language [math]\displaystyle{ L(G) }[/math] generated by [math]\displaystyle{ G }[/math] is restricted to strings of terminal symbols.
  2. While Hopcroft and Ullman (1979) do not mention the cardinalities of [math]\displaystyle{ N }[/math], [math]\displaystyle{ T }[/math], [math]\displaystyle{ P }[/math] explicitly, the proof of their Theorem 9.3 (construction of an equivalent Turing machine from a given unrestricted grammar, p. 221; cf. the section Equivalence to Turing machines) tacitly requires finiteness of [math]\displaystyle{ P }[/math] and finite lengths of all strings in rules of [math]\displaystyle{ P }[/math]. Any member of [math]\displaystyle{ N }[/math] or [math]\displaystyle{ T }[/math] that does not occur in [math]\displaystyle{ P }[/math] can be omitted without affecting the generated language.

References

  1. Hopcroft, John; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation (1st ed.). Addison-Wesley. ISBN 0-201-44124-1.