Inside–outside algorithm

In computer science, the inside–outside algorithm is a way of re-estimating the production probabilities of a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization to stochastic context-free grammars of the forward–backward algorithm for parameter estimation in hidden Markov models. It is used to compute expectations, for example as the expectation step of the expectation–maximization algorithm (an unsupervised learning algorithm).

Inside and outside probabilities

The inside probability [math]\displaystyle{ \beta_j(p,q) }[/math] is the total probability of generating words [math]\displaystyle{ w_p \cdots w_q }[/math], given the root nonterminal [math]\displaystyle{ N^j }[/math] and a grammar [math]\displaystyle{ G }[/math]:[1]

[math]\displaystyle{ \beta_j(p,q) = P(w_{pq}|N^j_{pq}, G) }[/math]

The outside probability [math]\displaystyle{ \alpha_j(p,q) }[/math] is the total probability of beginning with the start symbol [math]\displaystyle{ N^1 }[/math] and generating the nonterminal [math]\displaystyle{ N^j_{pq} }[/math] (that is, [math]\displaystyle{ N^j }[/math] spanning positions [math]\displaystyle{ p }[/math] through [math]\displaystyle{ q }[/math]) together with all the words outside [math]\displaystyle{ w_p \cdots w_q }[/math], given a grammar [math]\displaystyle{ G }[/math]:[1]

[math]\displaystyle{ \alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)n}|G) }[/math]

where [math]\displaystyle{ n }[/math] is the length of the sentence [math]\displaystyle{ w_1 \cdots w_n }[/math].
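
The code sketches in the sections below use a small worked example. Here is a toy PCFG in Chomsky normal form written as plain Python dictionaries; the symbols, rules, probabilities, and sentence are illustrative assumptions, not taken from the reference:

```python
# Toy PCFG in Chomsky normal form (all symbols and probabilities here are
# illustrative assumptions, not from the article).
binary_rules = {                 # (parent, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
lexical_rules = {                # (parent, word) -> probability
    ("NP", "astronomers"): 0.4,
    ("NP", "stars"): 0.6,
    ("V", "saw"): 1.0,
}
sentence = ["astronomers", "saw", "stars"]
start_symbol = "S"               # plays the role of the start symbol N^1
```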

Computing inside probabilities

Base case:

[math]\displaystyle{ \beta_j(p,p) = P(w_{p}|N^j, G) }[/math]

General case:

Suppose there is a rule [math]\displaystyle{ N^j \rightarrow N^r N^s }[/math] in the grammar. Then the probability of generating [math]\displaystyle{ w_p \cdots w_q }[/math] from a subtree rooted at [math]\displaystyle{ N^j }[/math] whose topmost rule is this one is:

[math]\displaystyle{ \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\beta_r(p,k) \beta_s(k+1,q) }[/math]

The inside probability [math]\displaystyle{ \beta_j(p,q) }[/math] is the sum of this quantity over all such rules:

[math]\displaystyle{ \beta_j(p,q) = \sum_{N^r,N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\beta_r(p,k) \beta_s(k+1,q) }[/math]
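
A minimal sketch of this computation as a bottom-up dynamic program over span widths, continuing the toy grammar above; the function name and table layout are assumptions of this sketch:

```python
from collections import defaultdict

def inside_probabilities(sentence, binary_rules, lexical_rules):
    """beta[(j, p, q)] = P(w_p ... w_q | N^j spans positions p..q), 1-indexed."""
    n = len(sentence)
    beta = defaultdict(float)
    # Base case: beta_j(p, p) = P(N^j -> w_p).
    for p in range(1, n + 1):
        for (j, word), prob in lexical_rules.items():
            if word == sentence[p - 1]:
                beta[(j, p, p)] += prob
    # General case: grow spans bottom-up, summing over rules and split points k.
    for width in range(2, n + 1):
        for p in range(1, n - width + 2):
            q = p + width - 1
            for (j, r, s), prob in binary_rules.items():
                for k in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, k)] * beta[(s, k + 1, q)]
    return beta
```

On the toy grammar, beta[("S", 1, 3)] comes out to 0.24 (= 1.0 · 0.4 · 1.0 · 1.0 · 0.6), the total probability of the sentence.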

Computing outside probabilities

Base case:

[math]\displaystyle{ \alpha_j(1,n) = \begin{cases} 1 & \mbox{if } j=1 \\ 0 & \mbox{otherwise} \end{cases} }[/math]

Here the start symbol is [math]\displaystyle{ N^1 }[/math].

General case:

Suppose there is a rule [math]\displaystyle{ N^r \rightarrow N^j N^s }[/math] in the grammar, with [math]\displaystyle{ N^j }[/math] as the left child. Then the left contribution of that rule to the outside probability [math]\displaystyle{ \alpha_j(p,q) }[/math] is:

[math]\displaystyle{ \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\alpha_r(p,k) \beta_s(q+1,k) }[/math]

Now suppose there is a rule [math]\displaystyle{ N^r \rightarrow N^s N^j }[/math] in the grammar, with [math]\displaystyle{ N^j }[/math] as the right child. Then the right contribution of that rule to the outside probability [math]\displaystyle{ \alpha_j(p,q) }[/math] is:

[math]\displaystyle{ \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\alpha_r(k,q) \beta_s(k,p-1) }[/math]

The outside probability [math]\displaystyle{ \alpha_j(p,q) }[/math] is the sum of the left and right contributions over all such rules:

[math]\displaystyle{ \alpha_j(p,q) = \sum_{N^r,N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\alpha_r(p,k) \beta_s(q+1,k) + \sum_{N^r,N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\alpha_r(k,q) \beta_s(k,p-1) }[/math]
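
A matching sketch for the outside pass, again under the toy-grammar assumptions above. It consumes the inside table and works from the widest spans down, since each [math]\displaystyle{ \alpha }[/math] entry depends only on [math]\displaystyle{ \alpha }[/math] entries for strictly wider spans:

```python
from collections import defaultdict

def outside_probabilities(sentence, binary_rules, start_symbol, beta):
    """alpha[(j, p, q)], computed top-down from the inside table beta."""
    n = len(sentence)
    alpha = defaultdict(float)
    alpha[(start_symbol, 1, n)] = 1.0  # base case: only N^1 covers the whole sentence
    for width in range(n - 1, 0, -1):
        for p in range(1, n - width + 2):
            q = p + width - 1
            for (r, a, b), prob in binary_rules.items():
                # Left contribution: N^r -> N^a N^b with N^a spanning p..q.
                for k in range(q + 1, n + 1):
                    alpha[(a, p, q)] += prob * alpha[(r, p, k)] * beta.get((b, q + 1, k), 0.0)
                # Right contribution: N^r -> N^a N^b with N^b spanning p..q.
                for k in range(1, p):
                    alpha[(b, p, q)] += prob * alpha[(r, k, q)] * beta.get((a, k, p - 1), 0.0)
    return alpha
```

For any span, the product [math]\displaystyle{ \alpha_j(p,q)\,\beta_j(p,q) }[/math] is the joint probability of generating the whole sentence with [math]\displaystyle{ N^j }[/math] spanning [math]\displaystyle{ w_p \cdots w_q }[/math]; these products supply the expected rule counts used in the expectation–maximization re-estimation step.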

References

  1. Manning, Christopher D.; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 388–402. ISBN 0-262-13360-1. https://archive.org/details/foundationsstati00mann_118
