Leave-one-out error

From HandWiki
Illustration of leave-one-out cross-validation (LOOCV) when n = 8 observations. A total of 8 models will be trained and tested.

For mathematical analysis and statistics, Leave-one-out error can refer to the following:

  • Leave-one-out cross-validation Stability (CVloo, for stability of Cross Validation with leave one out): An algorithm f has CVloo stability β with respect to the loss function V if the following holds:

[math]\displaystyle{ \forall i\in\{1,...,m\}, \mathbb{P}_S\{\sup_{z\in Z}|V(f_S,z_i)-V(f_{S^{|i}},z_i)|\leq\beta_{CV}\}\geq1-\delta_{CV} }[/math]

  • Expected-to-leave-one-out error Stability ([math]\displaystyle{ Eloo_{err} }[/math], for Expected error from leaving one out): An algorithm f has [math]\displaystyle{ Eloo_{err} }[/math] stability if for each n there exists a[math]\displaystyle{ \beta_{EL}^m }[/math] and a [math]\displaystyle{ \delta_{EL}^m }[/math] such that:

[math]\displaystyle{ \forall i\in\{1,...,m\}, \mathbb{P}_S\{|I[f_S]-\frac{1}{m}\sum_{i=1}^m V(f_{S^{|i}},z_i)|\leq\beta_{EL}^m\}\geq1-\delta_{EL}^m }[/math], with [math]\displaystyle{ \beta_{EL}^m }[/math]and [math]\displaystyle{ \delta_{EL}^m }[/math] going to zero for [math]\displaystyle{ n\rightarrow\inf }[/math]

Preliminary notations

With X and Y being a subset of the real numbers R, or X and Y ⊂ R, being respectively an input space X and an output space Y, we consider a training set:

[math]\displaystyle{ S = \{z_1 = (x_1,\ y_1)\ ,..,\ z_m = (x_m,\ y_m)\} }[/math] of size m in [math]\displaystyle{ Z = X \times Y }[/math] drawn independently and identically distributed (i.i.d.) from an unknown distribution, here called "D". Then a learning algorithm is a function [math]\displaystyle{ f }[/math] from [math]\displaystyle{ Z_m }[/math] into [math]\displaystyle{ F \subset YX }[/math] which maps a learning set S onto a function [math]\displaystyle{ f_S }[/math] from the input space X to the output space Y. To avoid complex notation, we consider only deterministic algorithms. It is also assumed that the algorithm [math]\displaystyle{ f }[/math] is symmetric with respect to S, i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable which does not limit the interest of the results presented here.

The loss of an hypothesis f with respect to an example [math]\displaystyle{ z = (x,y) }[/math] is then defined as [math]\displaystyle{ V(f,z) = V(f(x),y) }[/math]. The empirical error of f can then be written as [math]\displaystyle{ I_S[f] = \frac{1}{n}\sum V(f,z_i) }[/math].

The true error of f is [math]\displaystyle{ I[f] = \mathbb{E}_z V(f,z) }[/math]

Given a training set S of size m, we will build, for all i = 1....,m, modified training sets as follows:

  • By removing the i-th element

[math]\displaystyle{ S^{|i} = \{z_1 ,...,\ z_{i-1},\ z_{i+1},...,\ z_m\} }[/math]

  • and/or by replacing the i-th element

[math]\displaystyle{ S^i = \{z_1 ,...,\ z_{i-1},\ z_i',\ z_{i+1},...,\ z_m\} }[/math]

See also

References

  • S. Mukherjee, P. Niyogi, T. Poggio, and R. M. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math., 25(1-3):161–193, 2006