Stein's unbiased risk estimate
In statistics, Stein's unbiased risk estimate (SURE) is an unbiased estimator of the mean-squared error of "a nearly arbitrary, nonlinear biased estimator."[1] In other words, it provides an indication of the accuracy of a given estimator. This is important since the true mean-squared error of an estimator is a function of the unknown parameter to be estimated, and thus cannot be determined exactly. The technique is named after its discoverer, Charles Stein.[2]
Formal statement
Let [math]\displaystyle{ \mu \in {\mathbb R}^d }[/math] be an unknown parameter and let [math]\displaystyle{ x \in {\mathbb R}^d }[/math] be a measurement vector whose components are independent and normally distributed with means [math]\displaystyle{ \mu_i, i=1,...,d, }[/math] and common variance [math]\displaystyle{ \sigma^2 }[/math]. Suppose [math]\displaystyle{ h(x) }[/math] is an estimator of [math]\displaystyle{ \mu }[/math] from [math]\displaystyle{ x }[/math] that can be written as [math]\displaystyle{ h(x) = x + g(x) }[/math], where [math]\displaystyle{ g }[/math] is weakly differentiable. Then Stein's unbiased risk estimate is given by[3]
- [math]\displaystyle{ \operatorname{SURE}(h) = d\sigma^2 + \|g(x)\|^2 + 2 \sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} g_i(x) = -d\sigma^2 + \|g(x)\|^2 + 2 \sigma^2 \sum_{i=1}^d \frac{\partial}{\partial x_i} h_i(x), }[/math]
where [math]\displaystyle{ g_i(x) }[/math] is the [math]\displaystyle{ i }[/math]th component of the function [math]\displaystyle{ g(x) }[/math], and [math]\displaystyle{ \|\cdot\| }[/math] is the Euclidean norm. The two expressions agree because [math]\displaystyle{ h_i(x) = x_i + g_i(x) }[/math], so that [math]\displaystyle{ \sum_{i=1}^d \partial h_i/\partial x_i = d + \sum_{i=1}^d \partial g_i/\partial x_i }[/math].
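As a concrete illustration (the linear form below is a choice made here for the example, not part of the general statement), take the shrinkage estimator [math]\displaystyle{ h(x) = a x }[/math] for a fixed [math]\displaystyle{ a \in {\mathbb R} }[/math]. Then [math]\displaystyle{ g(x) = (a-1)x }[/math] and [math]\displaystyle{ \sum_{i=1}^d \partial g_i/\partial x_i = d(a-1) }[/math], so
- [math]\displaystyle{ \operatorname{SURE}(h) = d\sigma^2 + (a-1)^2 \|x\|^2 + 2 d \sigma^2 (a-1), }[/math]
which can be evaluated, and minimized over [math]\displaystyle{ a }[/math], using the data alone.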
The importance of SURE is that it is an unbiased estimate of the mean-squared error (or squared error risk) of [math]\displaystyle{ h(x) }[/math], i.e.
- [math]\displaystyle{ \operatorname E_\mu \{ \operatorname{SURE}(h) \} = \operatorname{MSE}(h),\,\! }[/math]
with
- [math]\displaystyle{ \operatorname{MSE}(h) = \operatorname E_\mu \|h(x)-\mu\|^2. }[/math]
Thus, minimizing SURE can act as a surrogate for minimizing the MSE. Note that there is no dependence on the unknown parameter [math]\displaystyle{ \mu }[/math] in the expression for SURE above. Thus, it can be manipulated (e.g., to determine optimal estimation settings) without knowledge of [math]\displaystyle{ \mu }[/math].
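A minimal numerical sketch of this property is given below (Python with NumPy; the dimension, noise level, shrinkage factor, and number of trials are arbitrary choices made for the illustration). It compares the average of SURE over repeated measurements with the average squared error of the linear shrinkage estimator [math]\displaystyle{ h(x) = a x }[/math] from the example above; only the squared-error computation uses the true [math]\displaystyle{ \mu }[/math].

```python
import numpy as np

# Monte Carlo check that SURE is unbiased for the MSE, using the linear
# shrinkage estimator h(x) = a*x, so g(x) = (a-1)*x and sum_i dg_i/dx_i = d*(a-1).
# All numerical settings below are illustrative choices.
rng = np.random.default_rng(0)
d, sigma, a = 50, 1.0, 0.7
mu = rng.normal(size=d)            # "unknown" mean, used only to simulate data and the true error
n_trials = 20000

sure_vals = np.empty(n_trials)
err_vals = np.empty(n_trials)
for t in range(n_trials):
    x = mu + sigma * rng.normal(size=d)   # x ~ N(mu, sigma^2 I)
    h = a * x                             # the estimator
    g = h - x                             # g(x) = h(x) - x
    div_g = d * (a - 1.0)                 # sum_i dg_i/dx_i, known in closed form here
    sure_vals[t] = d * sigma**2 + np.sum(g**2) + 2 * sigma**2 * div_g
    err_vals[t] = np.sum((h - mu)**2)     # squared error; this is the only place mu is needed

print("average SURE        :", sure_vals.mean())  # the two averages should agree
print("average squared err :", err_vals.mean())   # up to Monte Carlo noise
```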
Proof
We wish to show that
- [math]\displaystyle{ \operatorname E_\mu \|h(x)-\mu\|^2 = \operatorname E_\mu \{ \operatorname{SURE}(h) \}. }[/math]
We start by expanding the MSE as
- [math]\displaystyle{ \begin{align} \operatorname E_\mu \| h(x) - \mu\|^2 & = \operatorname E_\mu \|g(x) + x - \mu\|^2 \\ & = \operatorname E_\mu \|g(x)\|^2 + \operatorname E_\mu \|x - \mu\|^2 + 2 \operatorname E_\mu g(x)^T (x - \mu) \\ & = \operatorname E_\mu \|g(x)\|^2 + d \sigma^2 + 2 \operatorname E_\mu g(x)^T(x - \mu). \end{align} }[/math]
Now we use integration by parts to rewrite the last term. Since [math]\displaystyle{ (x_i - \mu_i) \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2}\right) = -\sigma^2 \frac{\partial}{\partial x_i} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2}\right) }[/math], integrating by parts in each [math]\displaystyle{ x_i }[/math] moves the derivative onto [math]\displaystyle{ g_i }[/math]:
- [math]\displaystyle{ \begin{align} \operatorname E_\mu g(x)^T(x - \mu) & = \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \sum_{i=1}^d g_i(x) (x_i - \mu_i) \, d^d x \\ & = \sigma^2 \sum_{i=1}^d \int_{{\mathbb R}^d} \frac{1}{(2 \pi \sigma^2)^{d/2}} \exp\left(-\frac{\|x - \mu\|^2}{2 \sigma^2} \right) \frac{\partial g_i}{\partial x_i} \, d^d x \\ & = \sigma^2 \sum_{i=1}^d \operatorname E_\mu \frac{\partial g_i}{\partial x_i}. \end{align} }[/math]
Substituting this into the expression for the MSE, we arrive at
- [math]\displaystyle{ \operatorname E_\mu \|h(x) - \mu\|^2 = \operatorname E_\mu \left( d\sigma^2 + \|g(x)\|^2 + 2\sigma^2 \sum_{i=1}^d \frac{\partial g_i}{\partial x_i}\right) = \operatorname E_\mu \{ \operatorname{SURE}(h) \}, }[/math]
which completes the proof.
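The key step above, [math]\displaystyle{ \operatorname E_\mu\, g(x)^T(x-\mu) = \sigma^2 \sum_{i=1}^d \operatorname E_\mu\, \partial g_i/\partial x_i }[/math] (a form of Stein's lemma), can also be checked numerically. The sketch below does so for an arbitrarily chosen smooth [math]\displaystyle{ g_i(x) = -\tanh(x_i) }[/math], whose derivative is [math]\displaystyle{ -(1-\tanh^2(x_i)) }[/math]; all numerical settings are illustrative.

```python
import numpy as np

# Numerical check of the integration-by-parts identity
#   E[ g(x)^T (x - mu) ] = sigma^2 * sum_i E[ dg_i/dx_i ]
# for the (arbitrary) smooth choice g_i(x) = -tanh(x_i).
rng = np.random.default_rng(1)
d, sigma = 10, 0.5
mu = rng.normal(size=d)
x = mu + sigma * rng.normal(size=(200_000, d))       # many draws of x ~ N(mu, sigma^2 I)

g = -np.tanh(x)                                      # g applied componentwise
dg = -(1.0 - np.tanh(x) ** 2)                        # dg_i/dx_i, componentwise
lhs = np.mean(np.sum(g * (x - mu), axis=1))          # E[ g(x)^T (x - mu) ]
rhs = sigma**2 * np.mean(np.sum(dg, axis=1))         # sigma^2 * sum_i E[ dg_i/dx_i ]
print(lhs, rhs)                                      # should agree up to Monte Carlo error
```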
Applications
A standard application of SURE is to choose a parametric form for an estimator, and then optimize the values of the parameters to minimize the risk estimate. This technique has been applied in several settings. For example, a variant of the James–Stein estimator can be derived by finding the optimal shrinkage estimator.[2] The technique has also been used by Donoho and Johnstone to determine the optimal shrinkage factor in a wavelet denoising setting.[1]
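To make the parameter-selection idea concrete, the sketch below applies SURE to soft thresholding, [math]\displaystyle{ h_i(x) = \operatorname{sign}(x_i)\,(|x_i| - t)_+ }[/math], for which [math]\displaystyle{ g_i(x) = -\operatorname{sign}(x_i)\min(|x_i|, t) }[/math] and [math]\displaystyle{ \partial g_i/\partial x_i = -\mathbf{1}\{|x_i|\le t\} }[/math] almost everywhere, so SURE has a closed form in [math]\displaystyle{ t }[/math]. This is only a sketch in the spirit of SURE-based threshold selection, with an artificial sparse signal and arbitrary settings, not the full procedure of Donoho and Johnstone.

```python
import numpy as np

# SURE for soft thresholding h_i(x) = sign(x_i) * max(|x_i| - t, 0):
# here ||g(x)||^2 = sum_i min(|x_i|, t)^2 and sum_i dg_i/dx_i = -#{i : |x_i| <= t},
# so  SURE(t) = d*sigma^2 + sum_i min(|x_i|, t)^2 - 2*sigma^2 * #{i : |x_i| <= t}.
def sure_soft_threshold(x, sigma, t):
    d = x.size
    return (d * sigma**2
            + np.sum(np.minimum(np.abs(x), t)**2)
            - 2 * sigma**2 * np.sum(np.abs(x) <= t))

# Illustrative sparse signal observed in Gaussian noise.
rng = np.random.default_rng(2)
d, sigma = 1000, 1.0
mu = np.zeros(d)
mu[:20] = 5.0                                   # a few large coefficients
x = mu + sigma * rng.normal(size=d)

# Choose the threshold by minimizing SURE over a grid; no knowledge of mu is needed.
grid = np.linspace(0.0, sigma * np.sqrt(2 * np.log(d)), 200)
sure_vals = np.array([sure_soft_threshold(x, sigma, t) for t in grid])
t_star = grid[np.argmin(sure_vals)]

h = np.sign(x) * np.maximum(np.abs(x) - t_star, 0.0)   # denoised estimate
print("SURE-selected threshold:", t_star)
print("estimated risk (SURE)  :", sure_vals.min())
print("actual squared error   :", np.sum((h - mu)**2))
```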
References
- ↑ 1.0 1.1 Donoho, David L.; Iain M. Johnstone (December 1995). "Adapting to Unknown Smoothness via Wavelet Shrinkage". Journal of the American Statistical Association 90 (432): 1200–1244. doi:10.2307/2291512.
- ↑ 2.0 2.1 Stein, Charles M. (November 1981). "Estimation of the Mean of a Multivariate Normal Distribution". The Annals of Statistics 9 (6): 1135–1151. doi:10.1214/aos/1176345632.
- ↑ Wasserman, Larry (2005). All of Nonparametric Statistics.