Proof of Stein's example


Stein's example is an important result in decision theory which can be stated as

The ordinary decision rule for estimating the mean of a multivariate Gaussian distribution is inadmissible under mean squared error risk in dimension at least 3.

The following is an outline of its proof.[1] The reader is referred to the main article for more information.

Sketched proof

The risk function of the decision rule [math]\displaystyle{ d(\mathbf{x}) = \mathbf{x} }[/math] is

[math]\displaystyle{ R(\theta,d) = \operatorname{E}_\theta[ |\mathbf{\theta - X}|^2] }[/math]
[math]\displaystyle{ =\int (\mathbf{\theta - x})^T(\mathbf{\theta - x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{(-1/2) (\mathbf{\theta - x})^T (\mathbf{\theta - x}) } m(dx) }[/math]
[math]\displaystyle{ = n. }[/math]
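As a numerical sanity check, this risk can be estimated by simulation. Below is a minimal sketch; the dimension, mean vector, and sample size are arbitrary illustrative choices, not part of the proof.

```python
import numpy as np

# Monte Carlo check that the risk of d(x) = x is n.
# n, theta, and the replication count are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n = 5
theta = rng.normal(size=n)                     # any fixed mean vector
X = rng.normal(loc=theta, size=(100_000, n))   # rows are draws of X ~ N(theta, I_n)
risk = np.mean(np.sum((X - theta) ** 2, axis=1))
print(risk)  # approximately n = 5
```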

Now consider the decision rule

[math]\displaystyle{ d'(\mathbf{x}) = \mathbf{x} - \frac{\alpha}{|\mathbf{x}|^2}\mathbf{x} }[/math]

where [math]\displaystyle{ \alpha }[/math] is a positive constant to be chosen below (the optimal choice will turn out to be [math]\displaystyle{ \alpha = n-2 }[/math]). We will show that [math]\displaystyle{ d' }[/math] is a better decision rule than [math]\displaystyle{ d }[/math]. The risk function is

[math]\displaystyle{ R(\theta,d') = \operatorname{E}_\theta\left[ \left|\mathbf{\theta - X} + \frac{\alpha}{|\mathbf{X}|^2}\mathbf{X}\right|^2\right] }[/math]
[math]\displaystyle{ = \operatorname{E}_\theta\left[ |\mathbf{\theta - X}|^2 + 2(\mathbf{\theta - X})^T\frac{\alpha}{|\mathbf{X}|^2}\mathbf{X} + \frac{\alpha^2}{|\mathbf{X}|^4}|\mathbf{X}|^2 \right] }[/math]
[math]\displaystyle{ = \operatorname{E}_\theta\left[ |\mathbf{\theta - X}|^2 \right] + 2\alpha\operatorname{E}_\theta\left[\frac{\mathbf{(\theta-X)^T X}}{|\mathbf{X}|^2}\right] + \alpha^2\operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right] }[/math]

— a quadratic in [math]\displaystyle{ \alpha }[/math]. We may simplify the middle term using integration by parts. For [math]\displaystyle{ 1\leq i \leq n }[/math] and any "well-behaved" function [math]\displaystyle{ h:\mathbf{x} \mapsto h(\mathbf{x}) \in \mathbb{R} }[/math], meaning continuously differentiable and growing slowly enough for large [math]\displaystyle{ x_i }[/math] that the boundary term below vanishes, we have, conditionally on [math]\displaystyle{ X_j=x_j }[/math] for [math]\displaystyle{ j\neq i }[/math]:

[math]\displaystyle{ \operatorname{E}_\theta [ (\theta_i - X_i) h(\mathbf{X}) \mid X_j=x_j\ (j\neq i) ]= \int (\theta_i - x_i) h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{1/2} e^{ -(x_i-\theta_i)^2/2 } m(dx_i) }[/math]
[math]\displaystyle{ = \left[ h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{1/2} e^{-(x_i-\theta_i)^2/2 } \right]^{x_i=\infty}_{x_i=-\infty} - \int \frac{\partial h}{\partial x_i}(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{1/2} e^{-(x_i-\theta_i)^2/2 } m(dx_i) }[/math]
[math]\displaystyle{ = - \operatorname{E}_\theta \left[ \frac{\partial h}{\partial x_i}(\mathbf{X}) \,\Big|\, X_j=x_j\ (j\neq i) \right], }[/math]

where the integration by parts uses [math]\displaystyle{ \tfrac{d}{dx_i} e^{-(x_i-\theta_i)^2/2} = (\theta_i - x_i)\, e^{-(x_i-\theta_i)^2/2} }[/math], and the boundary term vanishes under the growth condition on [math]\displaystyle{ h }[/math].

Therefore, taking expectations over the remaining coordinates (the tower property of conditional expectation),

[math]\displaystyle{ \operatorname{E}_\theta [ (\theta_i - X_i) h(\mathbf{X})]= - \operatorname{E}_\theta \left[ \frac{\partial h}{\partial x_i}(\mathbf{X}) \right]. }[/math]

(This result is known as Stein's lemma.)
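Stein's lemma is easy to verify numerically for a concrete well-behaved function. The sketch below uses the arbitrary choice [math]\displaystyle{ h(\mathbf{x}) = \sin(x_1) }[/math] (so that [math]\displaystyle{ \partial h/\partial x_1 = \cos(x_1) }[/math]); the mean vector and sample size are likewise illustrative.

```python
import numpy as np

# Monte Carlo check of Stein's lemma, E[(theta_i - X_i) h(X)] = -E[dh/dx_i (X)],
# for the arbitrary well-behaved choice h(x) = sin(x_1), i = 1.
rng = np.random.default_rng(1)
theta = np.array([0.5, -1.0, 2.0])              # arbitrary mean vector, n = 3
X = rng.normal(loc=theta, size=(1_000_000, 3))
lhs = np.mean((theta[0] - X[:, 0]) * np.sin(X[:, 0]))
rhs = -np.mean(np.cos(X[:, 0]))
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```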

Now, we choose

[math]\displaystyle{ h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}. }[/math]

If [math]\displaystyle{ h }[/math] met the "well-behaved" condition (it does not, because of the singularity at the origin, but this can be remedied; see below), we would have

[math]\displaystyle{ \frac{\partial h}{\partial x_i} = \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4} }[/math]

and so

[math]\displaystyle{ \operatorname{E}_\theta\left[\frac{\mathbf{(\theta-X)^T X}}{|\mathbf{X}|^2}\right] = \sum_{i=1}^n \operatorname{E}_\theta \left[ (\theta_i - X_i) \frac{X_i}{|\mathbf{X}|^2} \right] }[/math]
[math]\displaystyle{ = - \sum_{i=1}^n \operatorname{E}_\theta \left[ \frac{1}{|\mathbf{X}|^2} - \frac{2 X_i^2}{|\mathbf{X}|^4} \right] }[/math]
[math]\displaystyle{ = -(n-2)\operatorname{E}_\theta \left[\frac{1}{|\mathbf{X}|^2}\right]. }[/math]
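Both the partial derivative and the resulting [math]\displaystyle{ n-2 }[/math] factor can be checked symbolically for a concrete dimension. The sketch below (with the arbitrary choice [math]\displaystyle{ n = 4 }[/math]) verifies the divergence identity [math]\displaystyle{ \sum_i \partial(x_i/|\mathbf{x}|^2)/\partial x_i = (n-2)/|\mathbf{x}|^2 }[/math] behind the last step.

```python
import sympy as sp

# Symbolic check of the two displayed identities for a concrete dimension:
# d/dx_i (x_i/|x|^2) = 1/|x|^2 - 2 x_i^2/|x|^4, and the sum over i is (n-2)/|x|^2.
n = 4  # arbitrary illustrative dimension
x = sp.symbols(f"x1:{n + 1}", real=True)
r2 = sum(xi ** 2 for xi in x)                  # |x|^2
partial = sp.diff(x[0] / r2, x[0])
print(sp.simplify(partial - (1 / r2 - 2 * x[0] ** 2 / r2 ** 2)))  # 0
divergence = sum(sp.diff(xi / r2, xi) for xi in x)
print(sp.simplify(divergence - (n - 2) / r2))                     # 0
```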

Then returning to the risk function of [math]\displaystyle{ d' }[/math]:

[math]\displaystyle{ R(\theta,d') = n - 2\alpha(n-2)\operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2}\right] + \alpha^2\operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right]. }[/math]

This quadratic in [math]\displaystyle{ \alpha }[/math] is minimized at

[math]\displaystyle{ \alpha = n-2, }[/math]

giving

[math]\displaystyle{ R(\theta,d') = R(\theta,d) - (n-2)^2\operatorname{E}_\theta\left[\frac{1}{|\mathbf{X}|^2} \right] }[/math]

Since [math]\displaystyle{ 0 \lt \operatorname{E}_\theta\left[ 1/|\mathbf{X}|^2 \right] \lt \infty }[/math] whenever [math]\displaystyle{ n \geq 3 }[/math] (for [math]\displaystyle{ n \leq 2 }[/math] this expectation is infinite, which is why the result requires dimension at least 3), this satisfies

[math]\displaystyle{ R(\theta,d') \lt R(\theta,d) }[/math]

for every [math]\displaystyle{ \theta }[/math], making [math]\displaystyle{ d }[/math] an inadmissible decision rule.
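The dominance can also be seen empirically. The following sketch compares Monte Carlo estimates of the two risks; [math]\displaystyle{ \theta }[/math], [math]\displaystyle{ n }[/math], and the sample size are arbitrary illustrative choices, and the observed gap approximates [math]\displaystyle{ (n-2)^2\operatorname{E}_\theta[1/|\mathbf{X}|^2] }[/math].

```python
import numpy as np

# Monte Carlo comparison of the risks of d(x) = x and the shrinkage rule d'.
# theta and the replication count are arbitrary; the improvement holds for
# every theta once n >= 3.
rng = np.random.default_rng(2)
n = 5
theta = np.full(n, 1.0)
X = rng.normal(loc=theta, size=(200_000, n))
risk_d = np.mean(np.sum((X - theta) ** 2, axis=1))        # ~ n
shrink = 1.0 - (n - 2) / np.sum(X ** 2, axis=1)           # 1 - alpha/|x|^2
risk_dp = np.mean(np.sum((shrink[:, None] * X - theta) ** 2, axis=1))
print(risk_d, risk_dp)  # risk_dp falls strictly below risk_d
```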

It remains to justify the use of

[math]\displaystyle{ h(\mathbf{x})= \frac{x_i}{|\mathbf{x}|^2}. }[/math]

This function is not continuously differentiable, since it is singular at [math]\displaystyle{ \mathbf{x}=0 }[/math]. However, the function

[math]\displaystyle{ h(\mathbf{x}) = \frac{x_i}{\varepsilon + |\mathbf{x}|^2} }[/math]

is continuously differentiable, and after following the algebra through and letting [math]\displaystyle{ \varepsilon \to 0 }[/math], one obtains the same result.
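For a concrete dimension, this limiting argument can be checked symbolically. The sketch below (with [math]\displaystyle{ n = 3 }[/math], an arbitrary choice) differentiates the regularized function and lets [math]\displaystyle{ \varepsilon \to 0 }[/math].

```python
import sympy as sp

# Symbolic check of the regularized function: its partial derivative is
# 1/(eps + |x|^2) - 2 x_i^2/(eps + |x|^2)^2, which recovers the earlier
# expression as eps -> 0 (away from x = 0). n = 3 is an arbitrary choice.
eps = sp.symbols("epsilon", positive=True)
x1, x2, x3 = sp.symbols("x1 x2 x3", real=True)
r2 = x1 ** 2 + x2 ** 2 + x3 ** 2
d = sp.diff(x1 / (eps + r2), x1)
print(sp.simplify(d - (1 / (eps + r2) - 2 * x1 ** 2 / (eps + r2) ** 2)))  # 0
print(sp.limit(d, eps, 0))  # 1/|x|^2 - 2*x1^2/|x|^4
```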


References

  1. Samworth, Richard (December 2012). "Stein's Paradox". Eureka 62: 38–41. http://www.statslab.cam.ac.uk/~rjs57/SteinParadox.pdf.