Stein's lemma

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference — in particular, to James–Stein estimation and empirical Bayes methods — and its applications to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence. This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.

Statement of the lemma

Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

[math]\displaystyle{ E\bigl(g(X)(X-\mu)\bigr)=\sigma^2 E\bigl(g'(X)\bigr). }[/math]
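
The identity can be checked numerically. The following Monte Carlo sketch is purely illustrative: the choice g(x) = tanh(x) and the parameter values are arbitrary and not part of the statement.

  import numpy as np

  rng = np.random.default_rng(0)
  mu, sigma = 1.5, 2.0                  # arbitrary illustrative parameters
  x = rng.normal(mu, sigma, size=10**6)

  g  = np.tanh(x)                       # g(X)
  gp = 1.0 - np.tanh(x)**2              # g'(X) = 1 - tanh(X)^2

  lhs = np.mean(g * (x - mu))           # estimate of E[g(X)(X - mu)]
  rhs = sigma**2 * np.mean(gp)          # estimate of sigma^2 E[g'(X)]
  print(lhs, rhs)                       # should agree up to Monte Carlo error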

In general, suppose X and Y are jointly normally distributed. Then

[math]\displaystyle{ \operatorname{Cov}(g(X),Y)= \operatorname{Cov}(X,Y)E(g'(X)). }[/math]
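
The bivariate form can be checked the same way; in this sketch the joint covariance matrix of (X, Y) is again an arbitrary illustrative choice.

  import numpy as np

  rng = np.random.default_rng(1)
  cov = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # arbitrary covariance of (X, Y)
  x, y = rng.multivariate_normal([0.0, 0.0], cov, size=10**6).T

  g, gp = np.tanh(x), 1.0 - np.tanh(x)**2
  lhs = np.cov(g, y)[0, 1]              # sample Cov(g(X), Y)
  rhs = cov[0, 1] * np.mean(gp)         # Cov(X, Y) E[g'(X)]
  print(lhs, rhs)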

For a general multivariate Gaussian random vector [math]\displaystyle{ X = (X_1, \ldots, X_n) \sim N(\mu, \Sigma) }[/math] it follows that

[math]\displaystyle{ E\bigl(g(X)(X-\mu)\bigr)=\Sigma\cdot E\bigl(\nabla g(X)\bigr). }[/math]
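
A corresponding vector-valued sketch, with a randomly generated positive-definite [math]\displaystyle{ \Sigma }[/math] and the arbitrary choice [math]\displaystyle{ g(x) = \sin(w^\top x) }[/math] (so that [math]\displaystyle{ \nabla g(x) = \cos(w^\top x)\,w }[/math]), looks like this:

  import numpy as np

  rng = np.random.default_rng(2)
  mu = np.array([0.5, -1.0, 2.0])       # arbitrary mean vector
  A = rng.normal(size=(3, 3))
  Sigma = A @ A.T + np.eye(3)           # arbitrary positive-definite covariance
  X = rng.multivariate_normal(mu, Sigma, size=10**6)

  w = np.array([1.0, 2.0, -1.0])        # g(x) = sin(w . x), grad g(x) = cos(w . x) * w
  s = X @ w
  lhs = np.mean(np.sin(s)[:, None] * (X - mu), axis=0)   # E[g(X)(X - mu)]
  rhs = Sigma @ (np.mean(np.cos(s)) * w)                 # Sigma E[grad g(X)]
  print(lhs)
  print(rhs)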

Proof

The probability density function of the univariate normal distribution with expectation 0 and variance 1 is

[math]\displaystyle{ \varphi(x)={1 \over \sqrt{2\pi}}e^{-x^2/2} }[/math]

Since [math]\displaystyle{ \int x \exp(-x^2/2)\,dx = -\exp(-x^2/2) }[/math], and since the boundary term [math]\displaystyle{ g(x)e^{-x^2/2} }[/math] vanishes as [math]\displaystyle{ x\rightarrow\pm\infty }[/math] (a consequence of the finiteness of [math]\displaystyle{ E|g'(X)| }[/math]), integration by parts gives:

[math]\displaystyle{ E[g(X)X] = \frac{1}{\sqrt{2\pi}}\int g(x) x \exp(-x^2/2)\,dx = \frac{1}{\sqrt{2\pi}}\int g'(x) \exp(-x^2/2)\,dx = E[g'(X)] }[/math].

The case of a general expectation [math]\displaystyle{ \mu }[/math] and variance [math]\displaystyle{ \sigma^2 }[/math] follows by substitution.
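
Explicitly, write [math]\displaystyle{ X = \mu + \sigma Z }[/math] with [math]\displaystyle{ Z }[/math] standard normal and apply the unit-variance identity to the function [math]\displaystyle{ z \mapsto g(\mu + \sigma z) }[/math], whose derivative is [math]\displaystyle{ \sigma g'(\mu + \sigma z) }[/math]:

[math]\displaystyle{ E\bigl(g(X)(X-\mu)\bigr)=\sigma E\bigl(g(\mu+\sigma Z)Z\bigr)=\sigma E\bigl(\sigma g'(\mu+\sigma Z)\bigr)=\sigma^2 E\bigl(g'(X)\bigr). }[/math]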

More general statement

Isserlis' theorem is equivalently stated as

[math]\displaystyle{ \operatorname{E}(X_1 f(X_1,\ldots,X_n))=\sum_{i=1}^{n} \operatorname{Cov}(X_1,X_i)\operatorname{E}(\partial_{X_i}f(X_1,\ldots,X_n)), }[/math]

where [math]\displaystyle{ (X_1,\dots,X_n) }[/math] is a zero-mean multivariate normal random vector.
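
As a quick illustrative check (not part of the statement above), take [math]\displaystyle{ n=2 }[/math] and [math]\displaystyle{ f(x_1,x_2)=x_1x_2^2 }[/math]. Isserlis' theorem evaluates the left-hand side directly, and expanding the right-hand side gives the same quantity:

[math]\displaystyle{ \operatorname{E}(X_1^2X_2^2)=\operatorname{Var}(X_1)\operatorname{Var}(X_2)+2\operatorname{Cov}(X_1,X_2)^2=\operatorname{Cov}(X_1,X_1)\operatorname{E}(X_2^2)+\operatorname{Cov}(X_1,X_2)\operatorname{E}(2X_1X_2). }[/math]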

Suppose X is in an exponential family, that is, X has the density

[math]\displaystyle{ f_\eta(x)=\exp(\eta'T(x) - \Psi(\eta))h(x). }[/math]

Suppose this density has support [math]\displaystyle{ (a,b) }[/math], where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] may be [math]\displaystyle{ -\infty }[/math] and [math]\displaystyle{ \infty }[/math]. Let [math]\displaystyle{ g }[/math] be any differentiable function with [math]\displaystyle{ E|g'(X)|\lt \infty }[/math], and suppose that [math]\displaystyle{ \exp (\eta'T(x))h(x) g(x) \rightarrow 0 }[/math] as [math]\displaystyle{ x\rightarrow a }[/math] or [math]\displaystyle{ x\rightarrow b }[/math] (when [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are finite, it suffices that [math]\displaystyle{ \exp (\eta'T(x))h(x) \rightarrow 0 }[/math]). Then

[math]\displaystyle{ E\left[\left(\frac{h'(X)}{h(X)} + \sum \eta_i T_i'(X)\right)\cdot g(X)\right] = -E[g'(X)]. }[/math]

The derivation is the same as in the special case, namely integration by parts.
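
As an illustrative special case (a sketch under stated assumptions, not an example from the text above), consider the exponential distribution with rate [math]\displaystyle{ \lambda }[/math] on [math]\displaystyle{ (0,\infty) }[/math], an exponential family with [math]\displaystyle{ T(x)=x }[/math], [math]\displaystyle{ \eta=-\lambda }[/math] and [math]\displaystyle{ h(x)=1 }[/math]. For [math]\displaystyle{ g(x)=\sin(x) }[/math] the boundary conditions hold and the identity reduces to [math]\displaystyle{ E[g'(X)]=\lambda E[g(X)] }[/math], which the following Monte Carlo sketch estimates on both sides:

  import numpy as np

  rng = np.random.default_rng(3)
  lam = 0.7                             # arbitrary rate parameter
  x = rng.exponential(scale=1.0 / lam, size=10**6)

  # With T(x) = x, eta = -lam, h(x) = 1, the identity becomes E[g'(X)] = lam * E[g(X)].
  lhs = np.mean(np.cos(x))              # E[g'(X)] for g(x) = sin(x)
  rhs = lam * np.mean(np.sin(x))        # lam * E[g(X)]
  print(lhs, rhs)                       # both approach lam^2 / (lam^2 + 1)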

If we only know that [math]\displaystyle{ X }[/math] has support [math]\displaystyle{ \mathbb{R} }[/math], then it could be the case that [math]\displaystyle{ E|g(X)| \lt \infty \text{ and } E|g'(X)| \lt \infty }[/math] but [math]\displaystyle{ \lim_{x\rightarrow \infty} f_\eta(x) g(x) \not= 0 }[/math]. To see this, take [math]\displaystyle{ g(x)=1 }[/math] and let [math]\displaystyle{ f_\eta(x) }[/math] have infinitely many spikes towards infinity while remaining integrable. One such example can be adapted from [math]\displaystyle{ f(x) = \begin{cases} 1 & x \in [n, n + 2^{-n}) \\ 0 & \text{otherwise} \end{cases} }[/math], smoothed so that [math]\displaystyle{ f }[/math] is smooth.

Extensions to elliptically-contoured distributions also exist.[4][5][6]

References

  1. Ingersoll, J. (1987). Theory of Financial Decision Making. Rowman & Littlefield. pp. 13–14.
  2. Csiszár, Imre; Körner, János (2011). Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press. p. 14. ISBN 9781139499989. https://books.google.com/books?id=2gsLkQlb8JAC&pg=PA14. 
  3. Thomas M. Cover, Joy A. Thomas (2006). Elements of Information Theory. John Wiley & Sons, New York. ISBN 9781118585771. https://books.google.com/books?id=VWq5GG6ycxMC. 
  4. Cellier, Dominique; Fourdrinier, Dominique; Robert, Christian (1989). "Robust shrinkage estimators of the location parameter for elliptically symmetric distributions". Journal of Multivariate Analysis 29 (1): 39–52. doi:10.1016/0047-259X(89)90075-4. 
  5. Hamada, Mahmoud; Valdez, Emiliano A. (2008). "CAPM and option pricing with elliptically contoured distributions". The Journal of Risk & Insurance 75 (2): 387–409. doi:10.1111/j.1539-6975.2008.00265.x. 
  6. Landsman, Zinoviy (2008). "Stein's Lemma for elliptical random vectors". Journal of Multivariate Analysis 99 (5): 912–927. doi:10.1016/j.jmva.2007.05.006.