Generalized context-free grammar

From HandWiki
Revision as of 03:03, 11 May 2022 by imported>AIposter (simplify)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules.[1] Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

Description

A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form [math]\displaystyle{ f(\langle x_1, ..., x_m \rangle, \langle y_1, ..., y_n \rangle, ...) = \gamma }[/math], where [math]\displaystyle{ \gamma }[/math] is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like [math]\displaystyle{ X \to f(Y, Z, ...) }[/math], where [math]\displaystyle{ Y }[/math], [math]\displaystyle{ Z }[/math], ... are string tuples or non-terminal symbols.

The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.

Example

A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1), which generates the palindrome language [math]\displaystyle{ \{ ww^R : w \in \{a, b\}^{*} \} }[/math], where [math]\displaystyle{ w^R }[/math] is the string reverse of [math]\displaystyle{ w }[/math], we can define the composition function conc as in (2a) and the rewrite rules as in (2b).

[math]\displaystyle{ S \to \epsilon ~|~ aSa ~|~ bSb }[/math]

 

 

 

 

(1)

[math]\displaystyle{ conc(\langle x \rangle, \langle y \rangle, \langle z \rangle) = \langle xyz \rangle }[/math]

 

 

 

 

(2a)

[math]\displaystyle{ S \to conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle) ~|~ conc(\langle a \rangle, S, \langle a \rangle) ~|~ conc(\langle b \rangle, S, \langle b \rangle) }[/math]

 

 

 

 

(2b)

The CF production of abbbba is




and the corresponding GCFG production is

[math]\displaystyle{ S \to conc(\langle a \rangle, S, \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, S, \langle b \rangle), \langle b \rangle), \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, conc(\langle \epsilon \rangle, \langle \epsilon \rangle, \langle \epsilon \rangle), \langle b \rangle), \langle b \rangle), \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, conc(\langle b \rangle, conc(\langle b \rangle, \langle \epsilon \rangle, \langle b \rangle), \langle b \rangle), \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, conc(\langle b \rangle, \langle bb \rangle, \langle b \rangle), \langle a \rangle) }[/math]
[math]\displaystyle{ conc(\langle a \rangle, \langle bbbb \rangle, \langle a \rangle) }[/math]
[math]\displaystyle{ \langle abbbba \rangle }[/math]

Linear Context-free Rewriting Systems (LCFRSs)

Weir (1988)[1] describes two properties of composition functions, linearity and regularity. A function defined as [math]\displaystyle{ f(x_1, ..., x_n) = ... }[/math] is linear if and only if each variable appears at most once on either side of the =, making [math]\displaystyle{ f(x) = g(x, y) }[/math] linear but not [math]\displaystyle{ f(x) = g(x, x) }[/math]. A function defined as [math]\displaystyle{ f(x_1, ..., x_n) = ... }[/math] is regular if the left hand side and right hand side have exactly the same variables, making [math]\displaystyle{ f(x, y) = g(y, x) }[/math] regular but not [math]\displaystyle{ f(x) = g(x, y) }[/math] or [math]\displaystyle{ f(x, y) = g(x) }[/math].

A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a proper subclass of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.

On the other hand, LCFRSs are strictly more expressive than linear-indexed grammars and their weakly equivalent variant tree adjoining grammars (TAGs).[2] Head grammar is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.

LCFRS are weakly equivalent to (set-local) multicomponent TAGs (MCTAGs) and also with multiple context-free grammar (MCFGs [1]).[3] and minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in polynomial time.[4]

See also

References

  1. 1.0 1.1 Weir, David Jeremy (Sep 1988). Characterizing mildly context-sensitive grammar formalisms (PDF) (Ph.D.). Paper. AAI8908403. University of Pennsylvania Ann Arbor.
  2. Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 33. ISBN 978-3-642-14846-0. https://books.google.com/books?id=F5wC0dko1L4C&pg=PA33. 
  3. Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 35-36. ISBN 978-3-642-14846-0. https://books.google.com/books?id=F5wC0dko1L4C&pg=PA35. 
  4. Johan F.A.K. van Benthem; Alice ter Meulen (2010). Handbook of Logic and Language (2nd ed.). Elsevier. p. 404. ISBN 978-0-444-53727-0. https://books.google.com/books?id=K7yJLmZCbFUC&pg=PA404.