Margin-infused relaxed algorithm


Margin-infused relaxed algorithm (MIRA)[1] is an online machine learning algorithm for multiclass classification problems. It learns a set of parameters (a vector or matrix) by processing the training examples one at a time and updating the parameters after each example, so that the current example is classified correctly with a margin over each incorrect classification that is at least as large as that classification's loss.[2] Among all parameter settings that satisfy these constraints, the update chooses the one closest to the current parameters; that is, the change to the parameters is kept as small as possible.

A two-class version called binary MIRA[1] simplifies the algorithm by not requiring the solution of a quadratic programming problem (see below). When used in a one-vs-all configuration, binary MIRA can be extended to a multiclass learner that approximates full MIRA, but may be faster to train.
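For intuition, the binary update can be written in closed form. The following is a minimal sketch in Python, assuming a target margin of 1 and a clip constant C in the style of Crammer and Singer's clipped update (both constants are illustrative assumptions, not necessarily the exact variant in [1]):

import numpy as np

def binary_mira_update(w, x, y, C=1.0):
    # One binary MIRA step for a label y in {-1, +1}:
    # minimize ||w' - w|| subject to y * (w' . x) >= 1,
    # with the closed-form step size clipped to [0, C].
    margin = y * np.dot(w, x)
    if margin >= 1.0:
        return w                  # constraint already satisfied, no change
    # closed-form solution of the single-constraint problem
    tau = min(C, (1.0 - margin) / np.dot(x, x))   # assumes x is not all-zero
    return w + tau * y * x

Because each update needs only a dot product and a division, an entire pass over the data avoids the quadratic program required by the multiclass version.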

The overall flow of the algorithm[3][4] is as follows:

Algorithm MIRA
  Input: Training examples T = {(x_i, y_i)}
  Output: Set of parameters w
  i ← 0, w^(0) ← 0
  for n ← 1 to N
    for t ← 1 to |T|
      w^(i+1) ← update w^(i) according to (x_t, y_t)
      i ← i + 1
    end for
  end for
  return ( Σ_{j=1}^{N×|T|} w^(j) ) / ( N × |T| )
  • "←" denotes assignment. For instance, "largest ← item" means that the value of largest changes to the value of item.
  • "return" terminates the algorithm and outputs the following value.

The update step is then formalized as a quadratic programming[2] problem: find the w^(i+1) that minimizes ‖w^(i+1) − w^(i)‖ subject to score(x_t, y_t) − score(x_t, y′) ≥ L(y_t, y′) for all y′; i.e., the score of the correct training label y_t must exceed the score of every other possible label y′ by at least the loss (number of errors) of that y′ relative to y_t.
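In structured prediction, the constraint set is often restricted to the single highest-scoring incorrect output, which gives the quadratic program a closed-form solution; with several constraints (e.g. the k-best outputs) a small QP solver such as Hildreth's procedure is used instead.[2] A minimal sketch of the single-constraint step in Python, assuming a joint feature function phi(x, y) so that score(x, y) = w · phi(x, y) (the names phi, labels, and loss are illustrative assumptions, in the spirit of McDonald et al.[2] rather than the exact published procedure):

import numpy as np

def mira_update(w, phi, x, y_true, labels, loss):
    # One MIRA step restricted to the single highest-scoring
    # incorrect output (a common approximation of the full QP).
    y_hat = max((y for y in labels if y != y_true),
                key=lambda y: np.dot(w, phi(x, y)))
    margin = np.dot(w, phi(x, y_true)) - np.dot(w, phi(x, y_hat))
    if margin >= loss(y_true, y_hat):
        return w                  # margin constraint already satisfied
    diff = phi(x, y_true) - phi(x, y_hat)
    # closed-form minimizer of ||w' - w|| under the single constraint
    tau = (loss(y_true, y_hat) - margin) / np.dot(diff, diff)
    return w + tau * diff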

References

  1. Crammer, Koby; Singer, Yoram (2003). "Ultraconservative Online Algorithms for Multiclass Problems". Journal of Machine Learning Research 3: 951–991. http://jmlr.csail.mit.edu/papers/v3/crammer03a.html.
  2. McDonald, Ryan; Crammer, Koby; Pereira, Fernando (2005). "Online Large-Margin Training of Dependency Parsers". Proceedings of ACL. Association for Computational Linguistics. pp. 91–98. http://aclweb.org/anthology-new/P/P05/P05-1012.pdf.
  3. Watanabe, T. et al. (2007). "Online Large Margin Training for Statistical Machine Translation". Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pp. 764–773.
  4. Bohnet, B. (2009). "Efficient Parsing of Syntactic and Semantic Dependency Structures". Proceedings of the Conference on Computational Natural Language Learning (CoNLL), Boulder. pp. 67–72.
