Rao–Blackwell theorem

In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result that characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria.

The Rao–Blackwell theorem states that if $δ (X)$ is any kind of estimator of a parameter $θ$ , then the conditional expectation of $δ (X)$ given $T (X)$ , where $T$ is a sufficient statistic, is typically a better estimator of $θ$ , and is never worse. Sometimes one can very easily construct a very crude estimator $δ (X)$ , and then evaluate that conditional expected value to get an estimator that is in various senses optimal.

The theorem is named after C.R. Rao and David Blackwell. The process of transforming an estimator using the Rao–Blackwell theorem can be referred to as Rao–Blackwellization. The transformed estimator is called the Rao–Blackwell estimator.^[1]^[2]^[3]

Definitions

An estimator $δ (X)$ is an observable random variable (i.e. a statistic) used for estimating some unobservable quantity. For example, one may be unable to observe the average height of all male students at some university, but one may observe the heights of a random sample of 40 of them. The average height of those 40—the "sample average"—may be used as an estimator of the unobservable "population average".
A sufficient statistic $T (X)$ for a parameter $θ$ is a statistic calculated from data $X$ such that no other statistic that can be calculated from $X$ provides any additional information about $θ$ . It is defined as an observable random variable such that the conditional probability distribution of all observable data $X$ given $T (X)$ does not depend on the unobservable parameter $θ$ , such as the mean or standard deviation of the whole population from which the data $X$ were taken. In the most frequently cited examples, the "unobservable" quantities are parameters that parametrize a known family of probability distributions according to which the data are distributed.

In other words, a sufficient statistic $T (X)$ for a parameter $θ$ is a statistic such that the conditional probability of the data $X$ , given $T (X)$ , does not depend on the parameter $θ$ .
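
As a concrete illustration (an assumed setting chosen for this sketch, not part of the definition itself), suppose $X_{1}, \dots, X_{n}$ are i.i.d. Poisson with rate $λ$ and take $S = X_{1} + \dots + X_{n}$ , which is itself Poisson with rate $n λ$ . For any values with $x_{1} + \dots + x_{n} = s$ , the conditional probability of the data given the sum reduces to a multinomial probability in which $λ$ has cancelled:

```latex
P (X_{1} = x_{1}, \dots, X_{n} = x_{n} ∣ S = s)
  = \frac{\prod_{i = 1}^{n} e^{- λ} λ^{x_{i}} / x_{i}!}{e^{- n λ} (n λ)^{s} / s!}
  = \frac{s!}{x_{1}! \cdots x_{n}!} {(\frac{1}{n})}^{s} .
```

Because the right-hand side contains no $λ$ , the sum is sufficient in the sense just described.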

A Rao–Blackwell estimator $δ_{1} (X)$ of an unobservable quantity $θ$ is the conditional expected value $E [δ (X) ∣ T (X)]$ of some estimator $δ (X)$ given a sufficient statistic $T (X)$ . Call $δ (X)$ the "original estimator" and $δ_{1} (X)$ the "improved estimator". It is important that the improved estimator be observable, i.e. that it does not depend on $θ$ . Generally, the conditional expected value of one function of these data given another function of these data does depend on $θ$ , but the very definition of sufficiency given above entails that this one does not.

The mean squared error of an estimator is the expected value of the square of its deviation from the unobservable quantity $θ$ being estimated.
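
In symbols, the mean squared error can be written as below. The second equality is the standard split into variance plus squared bias; it is a well-known identity added here for orientation, and the label $\operatorname{MSE}$ is notation introduced only for this sketch.

```latex
\operatorname{MSE} (δ) = E [(δ (X) - θ)^{2}] = Var [δ (X)] + {(E [δ (X)] - θ)}^{2}
```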

The theorem

Mean-squared-error version

One case of the Rao–Blackwell theorem states:

The mean squared error of the Rao–Blackwell estimator does not exceed that of the original estimator.

In other words,

E [(δ_{1} (X) - θ)^{2}] \leq E [(δ (X) - θ)^{2}] .

The essential tools of the proof besides the definition above are the law of total expectation and the fact that for any random variable $Y$ , $E [Y^{2}]$ cannot be less than ${(E [Y])}^{2}$ . That inequality is a case of Jensen's inequality, although it may also be shown to follow instantly from the frequently mentioned fact that

0 \leq Var [Y] = E [(Y - E [Y])^{2}] = E [Y^{2}] - {(E [Y])}^{2} .

More precisely, the mean squared error of the Rao–Blackwell estimator has the following decomposition:^[4]

E [(δ_{1} (X) - θ)^{2}] = E [(δ (X) - θ)^{2}] - E [Var [δ (X) ∣ T (X)]]

Since $E [Var [δ (X) ∣ T (X)]] \geq 0$ , the Rao-Blackwell theorem immediately follows.
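
One way to see where the decomposition comes from is sketched below, using only the two tools named above. The first line applies the identity $E [Y^{2}] = Var [Y] + {(E [Y])}^{2}$ conditionally on $T (X)$ to $Y = δ (X) - θ$ (with $θ$ treated as a constant); the last line takes expectations of both sides via the law of total expectation.

```latex
\begin{aligned}
E [(δ (X) - θ)^{2} ∣ T (X)] & = Var [δ (X) ∣ T (X)] + {(E [δ (X) ∣ T (X)] - θ)}^{2} \\
& = Var [δ (X) ∣ T (X)] + (δ_{1} (X) - θ)^{2} , \\
E [(δ (X) - θ)^{2}] & = E [Var [δ (X) ∣ T (X)]] + E [(δ_{1} (X) - θ)^{2}] .
\end{aligned}
```

Rearranging the last line gives the decomposition stated above.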

Convex loss generalization

The more general version of the Rao–Blackwell theorem speaks of the "expected loss" or risk function:

E [L (δ_{1} (X))] \leq E [L (δ (X))]

where the "loss function" $L$ may be any convex function. If the loss function is twice-differentiable, as is the case for the mean squared error, then we have the sharper inequality^[4]

E [L (δ (X))] - E [L (δ_{1} (X))] \geq \frac{1}{2} E_{T} [\inf_{x} L^{″} (x) Var [δ (X) ∣ T]] .
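
As a quick consistency check, consider the squared-error loss $L (x) = (x - θ)^{2}$ , for which $L^{″} (x) = 2$ for every $x$ . The right-hand side of the sharper inequality then becomes:

```latex
\frac{1}{2} E_{T} [\inf_{x} L^{″} (x) Var [δ (X) ∣ T]]
  = \frac{1}{2} E_{T} [2 Var [δ (X) ∣ T]]
  = E [Var [δ (X) ∣ T (X)]] .
```

This is exactly the gap that appears in the mean-squared-error decomposition of the previous subsection, so for squared-error loss the bound holds with equality.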

Properties

The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used.
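
The unbiasedness claim can be written out in one line. By the law of total expectation, the two estimators have the same expected value, and therefore the same bias:

```latex
E [δ_{1} (X)] = E [E [δ (X) ∣ T (X)]] = E [δ (X)]
  \quad \Longrightarrow \quad
  E [δ_{1} (X)] - θ = E [δ (X)] - θ .
```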

The theorem seems very weak: it says only that the Rao–Blackwell estimator is no worse than the original estimator. In practice, however, the improvement is often enormous.^[5]

Examples

A Poisson-distribution example

Phone calls arrive at a switchboard according to a Poisson process at an average rate of $λ$ per minute. This rate is not observable, but the numbers $X_{1}, \dots, X_{n}$ of phone calls that arrived during $n$ successive one-minute periods are observed. It is desired to estimate the probability $e^{- λ}$ that the next one-minute period passes with no phone calls.

An extremely crude estimator of the desired probability is

δ_{0} = \begin{cases} 1 & \text{if } X_{1} = 0, \\ 0 & \text{otherwise,} \end{cases}

i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise. Despite the apparent limitations of this estimator, the result given by its Rao–Blackwellization is a very good estimator.

The sum

S_{n} = \sum_{i = 1}^{n} X_{i} = X_{1} + \dots + X_{n}

can be readily shown to be a sufficient statistic for $λ$ , i.e., the conditional distribution of the data $X_{1}, \dots, X_{n}$ given this sum does not depend on $λ$ . Therefore, we find the Rao–Blackwell estimator

δ_{1} = E [δ_{0} ∣ S_{n} = s_{n}] .

After doing some algebra we have

\begin{aligned}
δ_{1} & = E [𝟏_{\{X_{1} = 0\}} ∣ \sum_{i = 1}^{n} X_{i} = s_{n}] \\
& = P [X_{1} = 0 ∣ \sum_{i = 1}^{n} X_{i} = s_{n}] \\
& = P [X_{1} = 0, \sum_{i = 2}^{n} X_{i} = s_{n}] \times P {[\sum_{i = 1}^{n} X_{i} = s_{n}]}^{- 1} \\
& = e^{- λ} \frac{{((n - 1) λ)}^{s_{n}} e^{- (n - 1) λ}}{s_{n}!} \times {(\frac{(n λ)^{s_{n}} e^{- n λ}}{s_{n}!})}^{- 1} \\
& = \frac{{((n - 1) λ)}^{s_{n}} e^{- n λ}}{s_{n}!} \times \frac{s_{n}!}{(n λ)^{s_{n}} e^{- n λ}} \\
& = {(1 - \frac{1}{n})}^{s_{n}}
\end{aligned}

Since the expected total number of calls arriving during the first $n$ minutes is $n λ$ , and since for large $n$ the sample average $S_{n} / n$ converges in probability to $λ$ by the weak law of large numbers, it is not surprising that this estimator has a fairly high probability of being close to

{(1 - \frac{1}{n})}^{n λ} \approx e^{- λ} .

So $δ_{1}$ is clearly a very much improved estimator of that last quantity. In fact, since $S_{n}$ is complete and $δ_{0}$ is unbiased, $δ_{1}$ is the unique minimum variance unbiased estimator by the Lehmann–Scheffé theorem.
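
The size of the improvement can be checked numerically. The following is a minimal Monte Carlo sketch, assuming NumPy is available; the function name `simulate_poisson_rb` and the values $λ = 2$ , $n = 10$ and 100,000 replications are arbitrary choices made for illustration, not taken from the text.

```python
import numpy as np


def simulate_poisson_rb(lam=2.0, n=10, reps=100_000, seed=0):
    """Compare the crude estimator 1{X_1 = 0} with (1 - 1/n)**S_n by simulation.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample X_1, ..., X_n of one-minute call counts.
    x = rng.poisson(lam, size=(reps, n))
    target = np.exp(-lam)                   # probability of a minute with no calls
    delta0 = (x[:, 0] == 0).astype(float)   # crude estimator: indicator of X_1 = 0
    s = x.sum(axis=1)                       # sufficient statistic S_n
    delta1 = (1.0 - 1.0 / n) ** s           # Rao-Blackwell estimator
    return {
        "target": target,
        "mean_delta0": delta0.mean(),
        "mean_delta1": delta1.mean(),
        "mse_delta0": np.mean((delta0 - target) ** 2),
        "mse_delta1": np.mean((delta1 - target) ** 2),
    }


result = simulate_poisson_rb()
for key, value in result.items():
    print(f"{key}: {value:.5f}")
```

Both sample means should land near $e^{- λ}$ , reflecting unbiasedness, while the estimated mean squared error of $δ_{1}$ should come out far smaller than that of $δ_{0}$ .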

A uniform-distribution example

Suppose n independent positive samples are taken from a uniform distribution with an unknown upper bound $θ$ , that is, $X_{1}, \dots, X_{n} \sim U (0, θ) i.i.d.,$ and the aim is to estimate this parameter $θ$ . This is a continuous (and thus simpler) version of the German tank problem.

In this setting, the maximal value statistic $M ≜ T (X) ≜ \max_{i} X_{i}$ is sufficient, as given $M$ , one (random) sample would equal it, and the others would be uniformly and independently distributed between 0 and $M$ ; this statistic is also complete.

There are two natural estimators for the upper bound $θ$ , both of which improve as $n$ grows. The first is twice the sample mean. This is unbiased (as $E [X_{i}] = θ / 2$ for each $i$ ), but it can be incompatible with the data: it may even be lower than the observed maximum $M$ , and then it is certainly also lower than $θ$ . Thus it cannot be optimal (i.e., of lowest variance).

The second natural estimator is $M$ itself, but it is obviously too low and thus biased; intuitively, it should be "inflated". The factor by which it may be inflated in order to become unbiased can be found using the beta distribution of the uniform order statistics, but just doing that is unmotivated and has no further guarantees.

We can then apply the Rao-Blackwell procedure using the first of the suggestions above as the crude estimator. Actually, as the procedure is linear, we can even just use $δ (X) ≜ 2 X_{1}$ - it would be a bad estimator by itself (not using the whole information, its error does not improve with n), but that is perfectly fine for Rao-Blackwellization. Moving from the mean of the samples to just picking one of them keeps the logic the same, only casting it in the language of probabilities rather than expectations.

So, what is the conditional expectation of $X_{1}$ given $M$ ? It has a probability of 1/n to be the maximal observed value $M$ itself, otherwise it is uniformly distributed in $[0, M]$ and thus has an expectation of $M / 2$ . Hence, finally, the Rao-Blackwell estimator is

δ_{1} (X) = E [δ (X) ∣ T (X)] = E [2 X_{1} ∣ M] = 2 (\frac{1}{n} \cdot M + \frac{n - 1}{n} \cdot \frac{M}{2}) = \frac{n + 1}{n} M .

This is indeed the correctly-inflated version of $M$ we were hoping for (inflating less and less as the number of samples n grows). It is unbiased, and again, by the Lehmann–Scheffé theorem, it is the (unique) minimum-variance such estimator.
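
The three estimators discussed in this example can be compared by simulation. The sketch below assumes NumPy; the function name `simulate_uniform_rb` and the values $θ = 5$ , $n = 10$ and 100,000 replications are hypothetical choices for illustration only.

```python
import numpy as np


def simulate_uniform_rb(theta=5.0, n=10, reps=100_000, seed=0):
    """Estimate the mean and variance of three estimators of theta for U(0, theta).

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample X_1, ..., X_n.
    x = rng.uniform(0.0, theta, size=(reps, n))
    estimators = {
        "2*X_1": 2.0 * x[:, 0],                    # crude estimator delta(X)
        "2*mean": 2.0 * x.mean(axis=1),            # twice the sample mean
        "(n+1)/n*M": (n + 1) / n * x.max(axis=1),  # Rao-Blackwell estimator
    }
    # Return (sample mean, sample variance) for each estimator.
    return {name: (est.mean(), est.var()) for name, est in estimators.items()}


summary = simulate_uniform_rb()
for name, (mean, var) in summary.items():
    print(f"{name:>10}: mean = {mean:.4f}, variance = {var:.4f}")
```

All three sample means should be close to $θ$ , while the sample variance should drop from $2 X_{1}$ to twice the sample mean and again to the inflated maximum.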

Idempotence

Rao–Blackwellization is an idempotent operation. Using it to improve the already improved estimator does not obtain a further improvement, but merely returns as its output the same improved estimator.
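
The reason can be sketched in one line. The improved estimator is, by construction, a function of the sufficient statistic alone; writing it as $δ_{1} (X) = g (T (X))$ (the symbol $g$ is introduced here only for illustration), conditioning on $T (X)$ a second time leaves it unchanged:

```latex
E [δ_{1} (X) ∣ T (X)] = E [g (T (X)) ∣ T (X)] = g (T (X)) = δ_{1} (X) .
```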

Completeness and Lehmann–Scheffé minimum variance

If the conditioning statistic is both complete and sufficient, and the starting estimator is unbiased, then the Rao–Blackwell estimator is the unique "best unbiased estimator": see the Lehmann–Scheffé theorem.

An example of an improvable Rao–Blackwell improvement, when using a minimal sufficient statistic that is not complete, was provided by Galili and Meilijson in 2016.^[6] Let $X_{1}, \dots, X_{n}$ be a random sample from a scale-uniform distribution $X \sim U ((1 - k) θ, (1 + k) θ),$ with unknown mean $E [X] = θ$ and known design parameter $k \in (0, 1)$ . In the search for "best" possible unbiased estimators for $θ,$ it is natural to consider $X_{1}$ as an initial (crude) unbiased estimator for $θ$ and then try to improve it. Since $X_{1}$ is not a function of $T = (X_{(1)}, X_{(n)})$ , the minimal sufficient statistic for $θ$ (where $X_{(1)} = \min (X_{i})$ and $X_{(n)} = \max (X_{i})$ ), it may be improved using the Rao–Blackwell theorem as follows:

{\hat{θ}}_{R B} = E_{θ} [X_{1} | X_{(1)}, X_{(n)}] = \frac{X_{(1)} + X_{(n)}}{2} .

However, the following unbiased estimator can be shown to have lower variance:

{\hat{θ}}_{L V} = \frac{1}{2 (k^{2} \frac{n - 1}{n + 1} + 1)} [(1 - k) X_{(1)} + (1 + k) X_{(n)}] .

And in fact, it could be even further improved when using the following estimator:

{\hat{θ}}_{B A Y E S} = \frac{n + 1}{n} [1 - \frac{\frac{(\frac{X_{(1)}}{1 - k})}{(\frac{X_{(n)}}{1 + k})} - 1}{{[\frac{(\frac{X_{(1)}}{1 - k})}{(\frac{X_{(n)}}{1 + k})}]}^{n + 1} - 1}] \frac{X_{(n)}}{1 + k}

The model is a scale model. Optimal equivariant estimators can then be derived for loss functions that are invariant.^[7]
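
The variance comparison between ${\hat{θ}}_{R B}$ and ${\hat{θ}}_{L V}$ can be explored numerically. The following is a rough Monte Carlo sketch assuming NumPy; the function name `simulate_scale_uniform` and the values $θ = 1$ , $k = 0.9$ , $n = 10$ and 100,000 replications are arbitrary assumptions, and the Bayes-type estimator is left out to keep the example short.

```python
import numpy as np


def simulate_scale_uniform(theta=1.0, k=0.9, n=10, reps=100_000, seed=0):
    """Compare the Rao-Blackwell midrange with the lower-variance estimator.

    Samples come from U((1 - k) * theta, (1 + k) * theta); all parameter
    values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform((1 - k) * theta, (1 + k) * theta, size=(reps, n))
    x_min = x.min(axis=1)   # X_(1)
    x_max = x.max(axis=1)   # X_(n)
    # Rao-Blackwell improvement of X_1: the midrange.
    theta_rb = 0.5 * (x_min + x_max)
    # Unbiased estimator with lower variance, as given in the text.
    scale = 1.0 / (2.0 * (k**2 * (n - 1) / (n + 1) + 1.0))
    theta_lv = scale * ((1 - k) * x_min + (1 + k) * x_max)
    return {
        "mean_rb": theta_rb.mean(),
        "var_rb": theta_rb.var(),
        "mean_lv": theta_lv.mean(),
        "var_lv": theta_lv.var(),
    }


stats = simulate_scale_uniform()
for key, value in stats.items():
    print(f"{key}: {value:.5f}")
```

Both sample means should sit near $θ$ , and the sample variance of ${\hat{θ}}_{L V}$ should come out below that of ${\hat{θ}}_{R B}$ , in line with the claim that the Rao–Blackwell improvement is itself improvable when the sufficient statistic is not complete.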

References

[1] "Conditional expectation and unbiased sequential estimation". Annals of Mathematical Statistics 18 (1): 105–110. 1947. doi:10.1214/aoms/1177730497.
[2] "Unbiased estimates". Izvestiya Akad. Nauk SSSR. Ser. Mat. 14: 303–326. 1950.
[3] Rao, C. Radhakrishna (1945). "Information and accuracy attainable in the estimation of statistical parameters". Bulletin of the Calcutta Mathematical Society 37 (3): 81–91.
[4] J. G. Liao; A. Berg (22 June 2018). "Sharpening Jensen's Inequality". The American Statistician 73 (3): 278–281. doi:10.1080/00031305.2017.1419145.
[5] Carpenter, Bob (January 20, 2020). "Rao-Blackwellization and discrete parameters in Stan". https://statmodeling.stat.columbia.edu/2020/01/29/rao-blackwellization-and-discrete-parameters-in-stan/. "The Rao-Blackwell theorem states that the marginalization approach has variance less than or equal to the direct approach. In practice, this difference can be enormous."
[6] Tal Galili; Isaac Meilijson (31 Mar 2016). "An Example of an Improvable Rao–Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator". The American Statistician 70 (1): 108–113. doi:10.1080/00031305.2015.1100683. PMID 27499547.
[7] Taraldsen, Gunnar (2020). "Micha Mandel (2020), "The Scaled Uniform Model Revisited," The American Statistician, 74:1, 98–100: Comment". The American Statistician 74 (3): 315. doi:10.1080/00031305.2020.1769727. ISSN 0003-1305. https://doi.org/10.1080/00031305.2020.1769727.

External links

Hazewinkel, Michiel, ed. (2001), "Rao–Blackwell–Kolmogorov theorem", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 978-1-55608-010-4, https://www.encyclopediaofmath.org/index.php?title=R/r077550


Original source: https://en.wikipedia.org/wiki/Rao–Blackwell theorem.
