Sanov's theorem

In mathematics and information theory, Sanov's theorem (named after Ivan N. Sanov) gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector [math]\displaystyle{ x^n = x_1, x_2, \ldots, x_n }[/math]. Then, we have the following bound on the probability that the empirical measure [math]\displaystyle{ \hat{p}_{x^n} }[/math] of the samples falls within the set A:

[math]\displaystyle{ q^n(\hat{p}_{x^n}\in A) \le (n+1)^{|X|} 2^{-nD_{\mathrm{KL}}(p^*||q)} }[/math],

where

  • [math]\displaystyle{ q^n }[/math] is the joint probability distribution on [math]\displaystyle{ X^n }[/math], and
  • [math]\displaystyle{ p^* }[/math] is the information projection of q onto A, i.e. the element of A closest to q in KL divergence: [math]\displaystyle{ p^* = \arg\min_{p\in A} D_{\mathrm{KL}}(p||q) }[/math].

In words, the probability of drawing an atypical empirical distribution is exponentially small in n, with the exponent governed by the KL divergence between the atypical distribution and the true one; when A contains many candidate atypical distributions, the bound is dominated by the single element of A closest to q in KL divergence, namely the information projection.
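
As a concrete illustration, consider a hypothetical binary example (our own choice of numbers, not part of the theorem's statement): let X = {0, 1}, let q be the fair-coin distribution, and let A = {p : p(1) ≥ 0.75}. Since D(p||q) increases as p(1) moves away from 1/2, the information projection is p* = (0.25, 0.75). A minimal sketch evaluating the bound, assuming this setup:

<syntaxhighlight lang="python">
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative setup (our own choice, not from the theorem):
# alphabet X = {0, 1}, true distribution q = fair coin,
# atypical set A = {p : p(1) >= 0.75}.
q = (0.5, 0.5)
p_star = (0.25, 0.75)   # information projection of q onto A
rate = kl(p_star, q)    # D(p*||q) ~ 0.1887 bits

for n in (10, 100, 1000):
    # Sanov bound: (n+1)^{|X|} * 2^{-n D(p*||q)}, with |X| = 2 here
    bound = (n + 1) ** 2 * 2 ** (-n * rate)
    print(f"n = {n:4d}:  Sanov bound = {bound:.3e}")
</syntaxhighlight>

Note that for small n the polynomial prefactor (n+1)^{|X|} makes the bound exceed 1 and thus carry no information; the content of the theorem lies in the exponential decay rate as n grows.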

Furthermore, if A is the closure of its interior (a regularity condition under which the upper and lower large-deviation bounds match), then

[math]\displaystyle{ \lim_{n\to\infty}\frac{1}{n}\log_2 q^n(\hat{p}_{x^n}\in A) = -D_{\mathrm{KL}}(p^*||q). }[/math]
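
In the binary example above, the event that the empirical measure lands in A is simply a binomial tail event, so q^n(\hat{p}_{x^n} ∈ A) can be computed exactly and its decay rate compared against D(p*||q). A minimal check, assuming the same illustrative setup:

<syntaxhighlight lang="python">
import math

def binom_tail(n, k0, q1):
    """Exact P(Binomial(n, q1) >= k0); keep n moderate to avoid float underflow."""
    return sum(math.comb(n, k) * q1 ** k * (1 - q1) ** (n - k)
               for k in range(k0, n + 1))

# Same illustrative setup as above: q = fair coin, A = {p : p(1) >= 0.75},
# so the rate is D(p*||q) with p* = (0.25, 0.75).
rate = 0.25 * math.log2(0.25 / 0.5) + 0.75 * math.log2(0.75 / 0.5)

for n in (100, 400, 1000):
    k0 = math.ceil(0.75 * n)      # {empirical p(1) >= 0.75} = {at least k0 ones}
    prob = binom_tail(n, k0, 0.5)
    print(f"n = {n:4d}:  -(1/n) log2 q^n(A) = {-math.log2(prob) / n:.4f}"
          f"   (limiting rate D(p*||q) = {rate:.4f})")
</syntaxhighlight>

The printed rates decrease toward D(p*||q) ≈ 0.1887 as n grows; the gap at finite n comes from the sub-exponential (polynomial) factors that the limit washes out.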

References

  • Sanov, I. N. (1957). "On the probability of large deviations of random variables" [О вероятности больших отклонений случайных величин]. Matematicheskii Sbornik 42(84), No. 1, 11–44 (in Russian).