Global index grammar

From HandWiki

Global index grammars (GIGs) are a class of grammars introduced in Castaño (2004)[1] to model a number of phenomena, including natural language grammar and genome grammar. GIGs are most easily described by comparison with indexed grammars. In an indexed grammar, a stack of indices is associated with each nonterminal symbol, and these stacks can differ from one nonterminal to another depending on the course of the derivation. In a GIG, by contrast, there is a single global index stack that is manipulated in the course of the derivation, and the derivation must be strictly leftmost for any rewrite operation that pushes a symbol onto the stack. Because the stack is global, a GIG derivation is considered complete only when no nonterminal symbols remain to be rewritten and the stack is empty (that is, it contains only the bottom marker #, described below).
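As a minimal sketch of the difference, the Python below models a GIG derivation state as one sentential form paired with one global stack, rather than one stack per nonterminal. The encoding is an illustrative assumption, not taken from Castaño (2004): uppercase letters are nonterminals, lowercase letters are terminals, the stack is a string whose rightmost character is its top, and # is the bottom-of-stack marker described under Rule Description. The hypothetical `Config.is_complete` method encodes the completion condition stated above.

```python
from dataclasses import dataclass

BOTTOM = "#"  # bottom-of-stack marker; a stack holding only "#" counts as empty


@dataclass(frozen=True)
class Config:
    """One GIG derivation state: a sentential form plus a single global stack."""
    form: str              # assumption: uppercase = nonterminal, lowercase = terminal
    stack: str = BOTTOM    # the rightmost character is the top of the stack

    def is_complete(self) -> bool:
        # Complete only if no nonterminals remain AND the global stack is empty.
        no_nonterminals = not any(ch.isupper() for ch in self.form)
        return no_nonterminals and self.stack == BOTTOM


print(Config("ababab", "#").is_complete())   # True
print(Config("ababab", "#f").is_complete())  # False: index f is still on the stack
print(Config("abRC", "#").is_complete())     # False: nonterminals R and C remain
```

In an indexed grammar the corresponding state would instead attach a separate stack to every nonterminal occurrence in the form.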

Rule Description

GIG rules come in essentially four forms: rules that rewrite a nonterminal unconditionally, rules that rewrite it conditioned on the topmost symbol of the stack, rules that push onto the stack, and rules that pop from the stack. We can notate these in turn as:

[math]\displaystyle{ A \to \alpha }[/math] (unconditionally rewrite A as α, doing nothing to the stack)
[math]\displaystyle{ A \xrightarrow[f]{} \alpha }[/math] (rewrite A as α if f is the topmost stack symbol, doing nothing to the stack)
[math]\displaystyle{ A \xrightarrow[+f]{} x \alpha }[/math] (unconditionally rewrite A as xα and push f onto the stack)
[math]\displaystyle{ A \xrightarrow[-f]{} \alpha }[/math] (rewrite A as α if f is the topmost stack symbol, then pop f from the stack)

where f is any index symbol, α is any string of terminal and/or nonterminal symbols, and x is a terminal symbol. Because a rewrite rule occasionally needs to be conditioned on the stack being, in some sense, empty, the symbol # is used as the bottom-most stack symbol; an "empty" stack therefore contains exactly one symbol, #.
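The four rule forms above can be sketched as a single rewrite step. This is a hedged illustration under assumed conventions, not Castaño's formalization: the `Rule` class, its `op` tags (`None`, `"top"`, `"push"`, `"pop"`), and the `apply` function are hypothetical names; uppercase letters are nonterminals, and the stack is a string with its top at the right and # at the bottom. A conditional rule such as R →[#] ε is encoded as a `"top"` rule whose index is #, and push rules are refused unless they rewrite the leftmost nonterminal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BOTTOM = "#"  # bottom-most stack symbol; a stack of just "#" is "empty"


@dataclass(frozen=True)
class Rule:
    lhs: str                   # the nonterminal being rewritten, e.g. "A"
    rhs: str                   # replacement string; "" encodes epsilon
    op: Optional[str] = None   # hypothetical tags: None, "top", "push", "pop"
    index: str = ""            # the index symbol f (or "#") the rule mentions


def apply(rule: Rule, form: str, stack: str, pos: int) -> Optional[Tuple[str, str]]:
    """Rewrite form[pos] with rule; return the new (form, stack), or None if illegal."""
    if form[pos] != rule.lhs:
        return None
    top = stack[-1]  # the rightmost character is the top of the stack
    if rule.op in ("top", "pop") and top != rule.index:
        return None  # conditional and pop rules need f on top
    if rule.op == "push":
        if any(ch.isupper() for ch in form[:pos]):
            return None  # pushes may only rewrite the leftmost nonterminal
        new_stack = stack + rule.index
    elif rule.op == "pop":
        new_stack = stack[:-1]
    else:
        new_stack = stack  # unconditional and conditional rules leave the stack alone
    return form[:pos] + rule.rhs + form[pos + 1:], new_stack


print(apply(Rule("A", "a", "push", "f"), "AS", BOTTOM, 0))       # ('aS', '#f')
print(apply(Rule("R", "RB", "pop", "g"), "abRC", "#fg", 2))      # ('abRBC', '#f')
print(apply(Rule("R", "", "top", BOTTOM), "abRABC", BOTTOM, 2))  # ('abABC', '#')
```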

The third rule form, the push rule, deserves attention: unlike the pop rule, it requires every push operation to introduce at least one new terminal symbol into the derivation string. Without this constraint, the class of grammars would be Type-0 and thus Turing-complete.
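The push-rule constraint is easy to check mechanically. The sketch below assumes a hypothetical `Rule` tuple (lowercase letters are terminals, uppercase are nonterminals) and tests that every push rule has the shape A →[+f] xα with x a terminal. The rule `A →[+f] SA` in the `bad` list is invented purely to show a violation.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    lhs: str
    rhs: str                   # "" encodes epsilon
    op: Optional[str] = None   # hypothetical tags: None, "top", "push", "pop"
    index: str = ""


def is_terminal(sym: str) -> bool:
    return sym.islower()  # assumption: lowercase letters are terminals


def push_rules_ok(rules: List[Rule]) -> bool:
    """True if every push rule has the shape A -> x alpha with x a terminal."""
    return all(bool(r.rhs) and is_terminal(r.rhs[0]) for r in rules if r.op == "push")


good = [Rule("A", "a", "push", "f"), Rule("B", "b", "push", "g"), Rule("S", "AS")]
bad = good + [Rule("A", "SA", "push", "f")]  # hypothetical rule: pushes without a terminal
print(push_rules_ok(good))  # True
print(push_rules_ok(bad))   # False
```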

Example

For this example, we denote each step of the derivation by placing the derivation string over the stack, with the top of the stack written rightmost, as in [math]\displaystyle{ \frac{abXd}{[\#ffg]} }[/math].

GIGs (but not trGIGs, described below) can generate the non-indexed language [math]\displaystyle{ \{ ww^{+} : w \in \{a,b\}^{*} \} }[/math] using the following grammar:

[math]\displaystyle{ S \to AS ~|~ BS ~|~ C ~|~ \epsilon }[/math]
[math]\displaystyle{ C \to RC ~|~ L }[/math]
[math]\displaystyle{ R \xrightarrow[-f]{} RA }[/math]
[math]\displaystyle{ R \xrightarrow[-g]{} RB }[/math]
[math]\displaystyle{ R \xrightarrow[\#]{} \epsilon }[/math]
[math]\displaystyle{ A \xrightarrow[+f]{} a }[/math]
[math]\displaystyle{ B \xrightarrow[+g]{} b }[/math]
[math]\displaystyle{ L \xrightarrow[-f]{} La ~|~ a }[/math]
[math]\displaystyle{ L \xrightarrow[-g]{} Lb ~|~ b }[/math]

A derivation for the string ababab is as follows:

[math]\displaystyle{ \frac{S}{[\#]} \to \frac{AS}{[\#]} \to \frac{aS}{[\#f]} \to \frac{aBS}{[\#f]} \to \frac{abS}{[\#fg]} \to \frac{abC}{[\#fg]} \to \frac{abRC}{[\#fg]} \to \frac{abRBC}{[\#f]} \to }[/math]
[math]\displaystyle{ \frac{abRABC}{[\#]} \to \frac{abABC}{[\#]} \to \frac{abaBC}{[\#f]} \to \frac{ababC}{[\#fg]} \to \frac{ababL}{[\#fg]} \to \frac{ababLb}{[\#f]} \to \frac{ababab}{[\#]} }[/math]
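To confirm that each arrow in the derivation above is a single legal rule application, the derivation can be replayed mechanically. This is a sketch under assumed encodings (uppercase nonterminals, stack string with its top at the right, hypothetical names `Rule`, `successors`, and `is_valid_derivation`); it allows non-push rules at any position and push rules only at the leftmost nonterminal.

```python
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str
    rhs: str                   # "" encodes epsilon
    op: Optional[str] = None   # hypothetical tags: None, "top", "push", "pop"
    index: str = ""


# The ww+ grammar above (uppercase = nonterminal, lowercase = terminal).
GRAMMAR = [
    Rule("S", "AS"), Rule("S", "BS"), Rule("S", "C"), Rule("S", ""),
    Rule("C", "RC"), Rule("C", "L"),
    Rule("R", "RA", "pop", "f"), Rule("R", "RB", "pop", "g"), Rule("R", "", "top", "#"),
    Rule("A", "a", "push", "f"), Rule("B", "b", "push", "g"),
    Rule("L", "La", "pop", "f"), Rule("L", "a", "pop", "f"),
    Rule("L", "Lb", "pop", "g"), Rule("L", "b", "pop", "g"),
]

State = Tuple[str, str]  # (sentential form, stack with its top at the right)


def successors(form: str, stack: str) -> Iterator[State]:
    """Yield every state reachable by applying one rule at one position."""
    for pos, sym in enumerate(form):
        if not sym.isupper():
            continue
        leftmost = not any(c.isupper() for c in form[:pos])
        for r in GRAMMAR:
            if r.lhs != sym:
                continue
            if r.op in ("top", "pop") and stack[-1] != r.index:
                continue
            if r.op == "push":
                if not leftmost:
                    continue  # pushes must rewrite the leftmost nonterminal
                new_stack = stack + r.index
            elif r.op == "pop":
                new_stack = stack[:-1]
            else:
                new_stack = stack
            yield form[:pos] + r.rhs + form[pos + 1:], new_stack


def is_valid_derivation(steps: List[State]) -> bool:
    """Check the start state, every single-rule step, and the completion condition."""
    if steps[0] != ("S", "#"):
        return False
    for cur, nxt in zip(steps, steps[1:]):
        if nxt not in set(successors(*cur)):
            return False
    form, stack = steps[-1]
    return not any(c.isupper() for c in form) and stack == "#"


ABABAB = [
    ("S", "#"), ("AS", "#"), ("aS", "#f"), ("aBS", "#f"), ("abS", "#fg"),
    ("abC", "#fg"), ("abRC", "#fg"), ("abRBC", "#f"), ("abRABC", "#"),
    ("abABC", "#"), ("abaBC", "#f"), ("ababC", "#fg"), ("ababL", "#fg"),
    ("ababLb", "#f"), ("ababab", "#"),
]
print(is_valid_derivation(ABABAB))  # True
```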

Similar derivations exist for abbabbabb, aaabaaabaaabaaab, and other strings in the language.
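Finding such derivations can be automated with a bounded search. The sketch below is a simplification, not a general GIG parser: it explores only leftmost derivations (stricter than the GIG definition, which requires leftmost rewriting only for push rules), prunes forms whose settled prefix disagrees with the target, and caps form length with a heuristic `max_len`. The names `leftmost_successors` and `recognizes` are hypothetical.

```python
from collections import deque
from typing import Iterator, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    lhs: str
    rhs: str                   # "" encodes epsilon
    op: Optional[str] = None   # hypothetical tags: None, "top", "push", "pop"
    index: str = ""


GRAMMAR = [  # the ww+ grammar above
    Rule("S", "AS"), Rule("S", "BS"), Rule("S", "C"), Rule("S", ""),
    Rule("C", "RC"), Rule("C", "L"),
    Rule("R", "RA", "pop", "f"), Rule("R", "RB", "pop", "g"), Rule("R", "", "top", "#"),
    Rule("A", "a", "push", "f"), Rule("B", "b", "push", "g"),
    Rule("L", "La", "pop", "f"), Rule("L", "a", "pop", "f"),
    Rule("L", "Lb", "pop", "g"), Rule("L", "b", "pop", "g"),
]


def leftmost_successors(form: str, stack: str, rules: List[Rule]) -> Iterator[Tuple[str, str]]:
    """Yield the states reachable by rewriting the leftmost nonterminal once."""
    pos = next((i for i, c in enumerate(form) if c.isupper()), None)
    if pos is None:
        return
    for r in rules:
        if r.lhs != form[pos]:
            continue
        if r.op in ("top", "pop") and stack[-1] != r.index:
            continue
        if r.op == "push":
            new_stack = stack + r.index
        elif r.op == "pop":
            new_stack = stack[:-1]
        else:
            new_stack = stack
        yield form[:pos] + r.rhs + form[pos + 1:], new_stack


def recognizes(target: str, rules: List[Rule], start: str = "S") -> bool:
    """Breadth-first search over leftmost derivations, pruned by the target string."""
    max_len = 2 * len(target) + 2  # heuristic cap on sentential-form length
    seen = set()
    queue = deque([(start, "#")])
    while queue:
        state = queue.popleft()
        if state in seen:
            continue
        seen.add(state)
        form, stack = state
        if form == target and stack == "#":
            return True
        for new_form, new_stack in leftmost_successors(form, stack, rules):
            # Terminals left of the leftmost nonterminal can never change again.
            first_nt = next((i for i, c in enumerate(new_form) if c.isupper()), len(new_form))
            n_terminals = sum(1 for c in new_form if not c.isupper())
            if (len(new_form) <= max_len and n_terminals <= len(target)
                    and target.startswith(new_form[:first_nt])):
                queue.append((new_form, new_stack))
    return False


# Prints True for the first three strings and False for "aba" and "abba".
for s in ["ababab", "abbabbabb", "aaabaaabaaabaaab", "aba", "abba"]:
    print(s, recognizes(s, GRAMMAR))
```

The `seen` set matters here: from the start state, S → C → RC → C cycles through R →[#] ε without producing any terminal.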

Computational Power

The global index languages are a subset of the context-sensitive languages (CSLs) and a superset of the context-free languages. GIGs are known to generate the MIX (or Bach) language [math]\displaystyle{ \{ p(a^n b^n c^n) : n \geq 1 \} }[/math], where p(u) ranges over all permutations of the string u, so that MIX consists of the non-empty strings with equal numbers of a's, b's, and c's. MIX is conjectured, but not proven, not to be an indexed language. It is not known whether every indexed language is also a global index language, so it is entirely possible that the two classes are merely overlapping subsets of the CSLs.
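The permutation-based definition of MIX reduces to a counting test, sketched below with a hypothetical function `in_mix`. This only decides membership directly; it is not a GIG for MIX, which this article does not give.

```python
from collections import Counter


def in_mix(s: str) -> bool:
    """True if s is some permutation of a^n b^n c^n for an n >= 1."""
    counts = Counter(s)  # missing letters count as 0
    n = counts["a"]
    return (n >= 1 and set(s) <= {"a", "b", "c"}
            and counts["b"] == n and counts["c"] == n)


print(in_mix("cab"), in_mix("bcacba"), in_mix("aabc"), in_mix(""))  # True True False False
```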

trGIGs

A subclass of GIGs is the class of trGIGs, which make the pop and push rules uniform by requiring that pop rules, like push rules, introduce at least one terminal symbol into the derivation.
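The trGIG restriction is a syntactic property of the rules, so it can be checked directly. In this sketch, the hypothetical function `has_tr_shape` requires push rules to start with a terminal and pop rules to contain at least one terminal anywhere on the right-hand side. The position of that terminal is an assumption, since the text above only says "at least one". The check confirms that the ww+ grammar is not in trGIG form, because R →[−f] RA pops without adding a terminal. It does not prove the stronger claim that no trGIG generates that language.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    lhs: str
    rhs: str                   # "" encodes epsilon
    op: Optional[str] = None   # hypothetical tags: None, "top", "push", "pop"
    index: str = ""


def has_tr_shape(rules: List[Rule]) -> bool:
    """Push rules must start with a terminal; pop rules must contain a terminal."""
    for r in rules:
        if r.op == "push" and not (r.rhs and r.rhs[0].islower()):
            return False
        if r.op == "pop" and not any(c.islower() for c in r.rhs):
            return False
    return True


WW_PLUS = [  # the ww+ grammar from the first example
    Rule("S", "AS"), Rule("S", "BS"), Rule("S", "C"), Rule("S", ""),
    Rule("C", "RC"), Rule("C", "L"),
    Rule("R", "RA", "pop", "f"), Rule("R", "RB", "pop", "g"), Rule("R", "", "top", "#"),
    Rule("A", "a", "push", "f"), Rule("B", "b", "push", "g"),
    Rule("L", "La", "pop", "f"), Rule("L", "a", "pop", "f"),
    Rule("L", "Lb", "pop", "g"), Rule("L", "b", "pop", "g"),
]
ABCD = [  # the a^m b^n c^m d^n grammar from the example below
    Rule("S", "AD"), Rule("A", "aAc"), Rule("A", "aBc"),
    Rule("B", "bB", "push", "f"), Rule("B", "b", "push", "f"),
    Rule("D", "dD", "pop", "f"), Rule("D", "d", "pop", "f"),
]
print(has_tr_shape(WW_PLUS))  # False: R -> RA pops without adding a terminal
print(has_tr_shape(ABCD))     # True
```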

Example

An example of such a grammar, characterizing the language [math]\displaystyle{ \{a^m b^n c^m d^n : m, n \geq 1 \} }[/math], is:

[math]\displaystyle{ \begin{array}{l} S \to AD \\ A \to aAc ~|~ aBc \\ B \xrightarrow[+f]{} bB ~|~ b \\ D \xrightarrow[-f]{} dD ~|~ d \end{array} }[/math]

The derivation for the string aabbbccddd is then:

[math]\displaystyle{ \begin{align} \frac{S}{[\#]} & \to \frac{AD}{[\#]} \to \frac{aAcD}{[\#]} \to \frac{aaBccD}{[\#]} \to \frac{aabBccD}{[\#f]} \to \frac{aabbBccD}{[\#ff]} \\ & \to \frac{aabbbccD}{[\#fff]} \to \frac{aabbbccdD}{[\#ff]} \to \frac{aabbbccddD}{[\#f]} \to \frac{aabbbccddd}{[\#]} \\ \end{align} }[/math]
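The derivation above follows a fixed pattern for any m, n ≥ 1, which the hypothetical function `derive` below replays, recording each (form, stack) frame. It is a sketch that assumes each nonterminal occurs at most once in every sentential form, which holds for this grammar, so replacing the first occurrence is the intended rewrite. The stack is what ties the d's to the b's: each b pushes one f, each d pops one, and the derivation can only finish once the stack is back to #.

```python
from typing import List, Tuple


def derive(m: int, n: int) -> List[Tuple[str, str]]:
    """Replay the derivation pattern shown above for a^m b^n c^m d^n."""
    if m < 1 or n < 1:
        raise ValueError("the language requires m, n >= 1")
    form, stack = "S", "#"
    frames = [(form, stack)]

    def step(lhs: str, rhs: str, op: str = "") -> None:
        nonlocal form, stack
        form = form.replace(lhs, rhs, 1)  # each nonterminal occurs at most once here
        if op == "push":
            stack += "f"
        elif op == "pop":
            if stack[-1] != "f":
                raise RuntimeError("pop rule needs f on top of the stack")
            stack = stack[:-1]
        frames.append((form, stack))

    step("S", "AD")
    for _ in range(m - 1):
        step("A", "aAc")         # A -> aAc: pairs each extra a with a c
    step("A", "aBc")
    for _ in range(n - 1):
        step("B", "bB", "push")  # B -[+f]-> bB: one f per b
    step("B", "b", "push")
    for _ in range(n - 1):
        step("D", "dD", "pop")   # D -[-f]-> dD: one d per popped f
    step("D", "d", "pop")
    return frames


for form, stack in derive(2, 3):  # prints the ten frames of the derivation above
    print(f"{form:<12}[{stack}]")
```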

References

  1. Castaño, José M. 2004. Global Index Languages. Dissertation, Brandeis University.