Variable-order Markov model

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well known Markov chain models. In contrast to the Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.

The sequence of observed states that conditions the next symbol is called the context; VOM models are therefore also called context trees.[1] They are commonly represented as probabilistic suffix trees (PSTs).[2] The flexibility in the number of conditioning random variables turns out to be a real advantage in many applications, such as statistical analysis, classification and prediction.[3][4][5]

Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet {a, b, c}. Specifically, consider the string constructed from infinite concatenations of the sub-string aaabc: aaabcaaabcaaabcaaabc…aaabc.

The VOM model of maximal order 2 can approximate the above string using only the following five conditional probability components: Pr(a | aa) = 0.5, Pr(b | aa) = 0.5, Pr(c | b) = 1.0, Pr(a | c) = 1.0, Pr(a | ca) = 1.0.

In this example, Pr(c | ab) = Pr(c | b) = 1.0; therefore, the shorter context b is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
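The longest-suffix lookup described above can be sketched in a few lines of code. The dictionary below holds the five order-≤2 components from the example; the function `next_dist` (a hypothetical name, not from the cited papers) returns the distribution for the longest context suffix present in the model:

```python
# The five variable-order components for the repeated string aaabc,
# with maximal order 2. Context -> distribution over the next symbol.
vom = {
    "aa": {"a": 0.5, "b": 0.5},
    "b":  {"c": 1.0},
    "c":  {"a": 1.0},
    "ca": {"a": 1.0},
}

def next_dist(history):
    """Return the distribution for the longest matching context suffix."""
    for k in range(min(2, len(history)), 0, -1):  # try the longest context first
        ctx = history[-k:]
        if ctx in vom:
            return vom[ctx]
    return None

# Every position of the repeated string is covered by one of the five contexts:
s = "aaabc" * 4
for i in range(2, len(s)):
    d = next_dist(s[:i])
    assert d is not None and s[i] in d  # observed symbol has positive probability
```

Note that after the ambiguous single-symbol context a, the model falls back on the longer contexts aa and ca to disambiguate, which is exactly why those two entries are needed.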

To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: Pr(a | a), Pr(a | b), Pr(a | c), Pr(b | a), Pr(b | b), Pr(b | c), Pr(c | a), Pr(c | b), Pr(c | c). To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: Pr(a | aa), Pr(a | ab), …, Pr(c | cc). And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: Pr(a | aaa), Pr(a | aab), …, Pr(c | ccc).
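The counts 9, 27 and 81 follow from a simple formula: a fixed-order model over an alphabet A needs |A|^D contexts, each with |A| next-symbol probabilities, i.e. |A|^(D+1) components in total. A one-line check:

```python
# Number of conditional probability components for a fixed-order Markov
# chain over a ternary alphabet: |A|**D contexts times |A| next symbols.
A = 3  # size of the alphabet {a, b, c}
counts = [A ** (D + 1) for D in (1, 2, 3)]
print(counts)  # [9, 27, 81], matching the enumeration above
```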

In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.

The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states; accordingly, "a great reduction in the number of model parameters can be achieved."[1]

Definition

Let A be a state space (finite alphabet) of size [math]\displaystyle{ |A| }[/math].

Consider a sequence with the Markov property [math]\displaystyle{ x_1^{n}=x_1x_2\dots x_n }[/math] of n realizations of random variables, where [math]\displaystyle{ x_i\in A }[/math] is the state (symbol) at position i [math]\displaystyle{ \scriptstyle (1 \le i \le n) }[/math], and the concatenation of states [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ x_{i+1} }[/math] is denoted by [math]\displaystyle{ x_ix_{i+1} }[/math].

Given a training set of observed states, [math]\displaystyle{ x_1^{n} }[/math], the construction algorithm of the VOM models[3][4][5] learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states.

Specifically, the learner generates a conditional probability distribution [math]\displaystyle{ P(x_i\mid s) }[/math] for a symbol [math]\displaystyle{ x_i \in A }[/math] given a context [math]\displaystyle{ s\in A^* }[/math], where the * sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form [math]\displaystyle{ P(x_i\mid s) }[/math] where the context length [math]\displaystyle{ |s| \le D }[/math] varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length [math]\displaystyle{ |s| = D }[/math] and, hence, can be considered special cases of the VOM models.
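A minimal sketch of this estimation scheme, by plain context counting (illustrative only; the construction algorithms in the cited papers additionally prune contexts that add no predictive value). For every context s with |s| ≤ D occurring in the training sequence, the counts of the symbols that follow it are normalized into a conditional distribution P(x | s); prediction then uses the longest stored suffix of the history. The function names `train_vom` and `predict` are hypothetical:

```python
from collections import defaultdict

def train_vom(seq, D):
    """Estimate P(x | s) for every context s with |s| <= D seen in seq."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for k in range(0, min(D, i) + 1):
            counts[seq[i - k:i]][seq[i]] += 1  # context of length k, then seq[i]
    # Normalize counts into conditional distributions.
    return {s: {x: c / sum(d.values()) for x, c in d.items()}
            for s, d in counts.items()}

def predict(model, history, D):
    """Return P(. | s) for the longest suffix of the history stored in the model."""
    for k in range(min(D, len(history)), -1, -1):
        s = history[-k:] if k else ""  # k = 0 falls back on the empty context
        if s in model:
            return model[s]
    return {}

model = train_vom("aaabc" * 10, D=2)
print(predict(model, "aaabc", D=2))  # distribution after the context "bc"
```

On the example string, the learned distributions reproduce the five components given above, e.g. P(a | aa) = 0.5 and P(a | bc) = 1.0, while the empty-context fallback guarantees a prediction even for histories never seen in training.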

Effectively, for a given training sequence, VOM models obtain a more parsimonious parameterization than fixed-order Markov models, which leads to a better bias-variance tradeoff in the learned models.[3][4][5]

Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model.[4]

VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression,[1] document compression,[4] classification and identification of DNA and protein sequences,[6][1][3] statistical process control,[5] spam filtering,[7] haplotyping,[8] speech recognition,[9] sequence analysis in social sciences,[2] and others.

References

  1. 1.0 1.1 1.2 Rissanen, J. (Sep 1983). "A Universal Data Compression System". IEEE Transactions on Information Theory 29 (5): 656–664. doi:10.1109/TIT.1983.1056741. 
  2. 2.0 2.1 Gabadinho, Alexis; Ritschard, Gilbert (2016). "Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package" (in en). Journal of Statistical Software 72 (3). doi:10.18637/jss.v072.i03. ISSN 1548-7660. http://www.jstatsoft.org/v72/i03/. 
  3. 3.0 3.1 3.2 3.3 Shmilovici, A.; Ben-Gal, I. (2007). "Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences". Computational Statistics 22 (1): 49–69. doi:10.1007/s00180-007-0021-8. 
  4. 4.0 4.1 4.2 4.3 4.4 Begleiter, R.; El-Yaniv, R.; Yona, G. (2004). "On Prediction Using Variable Order Markov models". Journal of Artificial Intelligence Research 22: 385–421. doi:10.1613/jair.1491. 
  5. 5.0 5.1 5.2 5.3 Ben-Gal, I.; Morag, G.; Shmilovici, A. (2003). "Context-Based Statistical Process Control: A Monitoring Procedure for State-Dependent Processes". Technometrics 45 (4): 293–311. doi:10.1198/004017003000000122. ISSN 0040-1706. http://www.eng.tau.ac.il/~bengal/Technometrics_final.pdf. 
  6. Grau, J.; Ben-Gal, I.; Posch, S.; Grosse, I. (2006). "VOMBAT: Prediction of Transcription Factor Binding Sites using Variable Order Bayesian Trees". Nucleic Acids Research 34 (Web Server issue): W529–W533. doi:10.1093/nar/gkl212. PMID 16845064. PMC 1538886. http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf. 
  7. Bratko, A.; Cormack, G. V.; Filipic, B.; Lynam, T.; Zupan, B. (2006). "Spam Filtering Using Statistical Data Compression Models". Journal of Machine Learning Research 7: 2673–2698. http://www.jmlr.org/papers/volume7/bratko06a/bratko06a.pdf. 
  8. Browning, Sharon R. (2006). "Multilocus Association Mapping Using Variable-Length Markov Chains". The American Journal of Human Genetics 78 (6): 903–913.
  9. Smith, A.; Denenberg, J.; Slack, T.; Tan, C.; Wohlford, R. (1985). "Application of a sequential pattern learning system to connected speech recognition". ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing. 10. Tampa, FL, USA: Institute of Electrical and Electronics Engineers. pp. 1201–1204. doi:10.1109/ICASSP.1985.1168282. https://ieeexplore.ieee.org/document/1168282.