Chapman–Robbins bound

In statistics, the Chapman–Robbins bound or Hammersley–Chapman–Robbins bound is a lower bound on the variance of estimators of a deterministic parameter. It is a generalization of the Cramér–Rao bound; compared to the Cramér–Rao bound, it is both tighter and applicable to a wider range of problems. However, it is usually more difficult to compute. The bound was independently discovered by John Hammersley in 1950,[1] and by Douglas Chapman and Herbert Robbins in 1951.[2]

Statement

Let [math]\displaystyle{ \Theta }[/math] be the set of parameters for a family of probability distributions [math]\displaystyle{ \{\mu_\theta : \theta\in\Theta\} }[/math] on [math]\displaystyle{ \Omega }[/math].

For any two [math]\displaystyle{ \theta, \theta' \in \Theta }[/math], let [math]\displaystyle{ \chi^2(\mu_{\theta'}; \mu_{\theta}) }[/math] be the [math]\displaystyle{ \chi^2 }[/math]-divergence from [math]\displaystyle{ \mu_{\theta} }[/math] to [math]\displaystyle{ \mu_{\theta'} }[/math]. Then:

Theorem — Given any scalar random variable [math]\displaystyle{ \hat g: \Omega \to \R }[/math] and any [math]\displaystyle{ \theta \in\Theta }[/math], we have [math]\displaystyle{ \operatorname{Var}_\theta[\hat g] \geq \sup_{\theta' \in \Theta,\ \theta'\neq \theta}\frac{(E_{\theta'}[\hat g] - E_{\theta}[\hat g])^2}{\chi^2(\mu_{\theta'} ; \mu_\theta)} }[/math].
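As a sanity check of the scalar statement, the following minimal sketch evaluates both sides for a Bernoulli family on a two-point sample space. The helper names (`chi2_divergence`, `chapman_robbins_ratio`, `variance`) and the choice of estimator [math]\displaystyle{ \hat g(x) = x }[/math] are illustrative assumptions, not part of the source. In this particular family the ratio happens to equal the variance for every [math]\displaystyle{ \theta' }[/math], so the bound is attained.

```python
def chi2_divergence(p, q):
    """chi^2(P; Q) = sum_x (p(x) - q(x))^2 / q(x) for finite distributions."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def mean(g, p):
    """Expectation of g under the finite distribution p."""
    return sum(gi * pi for gi, pi in zip(g, p))


def variance(g, p):
    """Variance of g under the finite distribution p."""
    m = mean(g, p)
    return sum(pi * (gi - m) ** 2 for gi, pi in zip(g, p))


def chapman_robbins_ratio(g, p_alt, p_ref):
    """The expression inside the supremum, for one alternative theta'."""
    return (mean(g, p_alt) - mean(g, p_ref)) ** 2 / chi2_divergence(p_alt, p_ref)


# Bernoulli(theta) on {0, 1}, written as [P(X=0), P(X=1)]; illustrative values.
theta = 0.3
g = [0.0, 1.0]                      # hypothetical estimator g(x) = x
p_ref = [1 - theta, theta]

var_ref = variance(g, p_ref)        # theta * (1 - theta)
ratios = [chapman_robbins_ratio(g, [1 - t, t], p_ref) for t in (0.1, 0.5, 0.9)]

for r in ratios:
    print(f"ratio = {r:.6f}  <=  Var = {var_ref:.6f}")
```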

A generalization to the multivariable case is:[3]

Theorem — Given any multivariate random variable [math]\displaystyle{ \hat g: \Omega \to \R^m }[/math], and any [math]\displaystyle{ \theta, \theta' \in\Theta }[/math], [math]\displaystyle{ \chi^2(\mu_{\theta'} ; \mu_\theta) \geq (E_{\theta'}[\hat g] - E_{\theta}[\hat g])^T \operatorname{Cov}_\theta[\hat g]^{-1} (E_{\theta'}[\hat g] - E_{\theta}[\hat g]) }[/math]

Proof

By the variational representation of the chi-squared divergence,[3] [math]\displaystyle{ \chi^2(P; Q) = \sup_g \frac{(E_P[g]-E_Q[g])^2}{\operatorname{Var}_Q[g]} }[/math]. Plugging in [math]\displaystyle{ g = \hat g, P = \mu_{\theta'}, Q = \mu_\theta }[/math] gives [math]\displaystyle{ \chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{(E_{\theta'}[\hat g]-E_\theta[\hat g])^2}{\operatorname{Var}_\theta[\hat g]} }[/math]. Exchanging the denominator with the left-hand side and taking the supremum over [math]\displaystyle{ \theta' \neq \theta }[/math] yields the univariate case.

For the multivariate case, define [math]\displaystyle{ h = \sum_{i=1}^m v_i \hat g_i }[/math] for any [math]\displaystyle{ v\neq 0 \in \R^m }[/math]. Plugging [math]\displaystyle{ g = h }[/math] into the variational representation gives [math]\displaystyle{ \chi^2(\mu_{\theta'}; \mu_\theta) \geq \frac{(E_{\theta'}[h]-E_\theta[h])^2}{\operatorname{Var}_\theta[h]} = \frac{\langle v, E_{\theta'}[\hat g]-E_\theta[\hat g]\rangle^2}{v^T \operatorname{Cov}_\theta[\hat g] v} }[/math]. Taking the supremum over [math]\displaystyle{ v\neq 0 \in\R^m }[/math] and using the linear algebra fact that [math]\displaystyle{ \sup_{v\neq 0} \frac{v^T ww^T v}{v^T M v} = w^T M^{-1}w }[/math] for positive definite [math]\displaystyle{ M }[/math], with [math]\displaystyle{ w = E_{\theta'}[\hat g]-E_\theta[\hat g] }[/math] and [math]\displaystyle{ M = \operatorname{Cov}_\theta[\hat g] }[/math], yields the multivariate case.
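The linear algebra fact used in the last step can be checked numerically. The sketch below is an illustration under assumptions: the matrix `M`, the vector `w`, and the dimension are randomly generated stand-ins, and `M` is constructed to be symmetric positive definite. It compares the closed form [math]\displaystyle{ w^T M^{-1} w }[/math] with the quotient evaluated at [math]\displaystyle{ v = M^{-1} w }[/math] (where the supremum is attained, by the Cauchy–Schwarz inequality) and with a crude random search.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4

# Arbitrary symmetric positive definite M and arbitrary w (illustrative only).
A = rng.standard_normal((m, m))
M = A @ A.T + m * np.eye(m)
w = rng.standard_normal(m)


def quotient(v):
    """(v^T w)^2 / (v^T M v), i.e. v^T w w^T v / v^T M v."""
    return (v @ w) ** 2 / (v @ M @ v)


closed_form = w @ np.linalg.solve(M, w)   # w^T M^{-1} w
v_star = np.linalg.solve(M, w)            # candidate maximiser M^{-1} w
at_maximiser = quotient(v_star)

# Random directions should never exceed the closed form.
random_best = max(quotient(rng.standard_normal(m)) for _ in range(10_000))

print(closed_form, at_maximiser, random_best)
```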

Relation to Cramér–Rao bound

Usually, [math]\displaystyle{ \Omega = \mathcal X^n }[/math] is the sample space of [math]\displaystyle{ n }[/math] independent draws of a [math]\displaystyle{ \mathcal X }[/math]-valued random variable [math]\displaystyle{ X }[/math] whose distribution [math]\displaystyle{ \lambda_\theta }[/math] belongs to a family of probability distributions parameterized by [math]\displaystyle{ \theta \in \Theta \subseteq \mathbb R^m }[/math]; [math]\displaystyle{ \mu_\theta = \lambda_\theta^{\otimes n} }[/math] is the [math]\displaystyle{ n }[/math]-fold product measure, and [math]\displaystyle{ \hat g : \mathcal X^n \to \Theta }[/math] is an estimator of [math]\displaystyle{ \theta }[/math]. Then, for [math]\displaystyle{ m=1 }[/math], the expression inside the supremum in the Chapman–Robbins bound converges to the Cramér–Rao bound of [math]\displaystyle{ \hat g }[/math] as [math]\displaystyle{ \theta' \to \theta }[/math], assuming the regularity conditions of the Cramér–Rao bound hold. This implies that, when both bounds exist, the Chapman–Robbins bound is always at least as tight as the Cramér–Rao bound; in many cases, it is substantially tighter.
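The limit [math]\displaystyle{ \theta' \to \theta }[/math] can be illustrated with a Gaussian location family, which is an example chosen here and not taken from the source. Assume [math]\displaystyle{ \lambda_\theta = N(\theta, \sigma^2) }[/math] and an unbiased estimator, so that [math]\displaystyle{ E_{\theta'}[\hat g] - E_\theta[\hat g] = \theta' - \theta = \delta }[/math]. For this family the divergence between the product measures has the closed form [math]\displaystyle{ \chi^2(\mu_{\theta'}; \mu_\theta) = e^{n\delta^2/\sigma^2} - 1 }[/math], and the Cramér–Rao bound is [math]\displaystyle{ \sigma^2/n }[/math]. The function name and parameter values below are hypothetical.

```python
import math


def cr_ratio_gaussian(delta, sigma, n):
    """Chapman-Robbins ratio delta^2 / chi^2 for n i.i.d. N(theta, sigma^2)
    samples and an unbiased estimator, using the closed-form divergence
    chi^2 = exp(n * delta^2 / sigma^2) - 1 (assumed family, see lead-in)."""
    return delta ** 2 / math.expm1(n * delta ** 2 / sigma ** 2)


sigma, n = 2.0, 10                  # illustrative values
cramer_rao = sigma ** 2 / n         # inverse Fisher information for this family

deltas = (1.0, 0.1, 0.01, 0.001)
ratios = [cr_ratio_gaussian(d, sigma, n) for d in deltas]

for d, r in zip(deltas, ratios):
    print(f"delta = {d:<6} ratio = {r:.8f}  (Cramer-Rao = {cramer_rao})")
```

In this family the ratio increases toward the Cramér–Rao value as [math]\displaystyle{ \delta }[/math] shrinks, so the supremum is reached only in the limit and the two bounds coincide.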

The Chapman–Robbins bound also holds under much weaker regularity conditions. For example, no assumption is made regarding differentiability of the probability density function [math]\displaystyle{ p(x; \theta) }[/math] of [math]\displaystyle{ \lambda_\theta }[/math]. When [math]\displaystyle{ p(x; \theta) }[/math] is non-differentiable, the Fisher information is not defined, and hence the Cramér–Rao bound does not exist.
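A standard illustration of a non-regular family, chosen here as an assumption and not drawn from the source, is the uniform distribution on [math]\displaystyle{ [0, \theta] }[/math], whose support depends on [math]\displaystyle{ \theta }[/math]. For [math]\displaystyle{ \theta' \lt \theta }[/math] and [math]\displaystyle{ n }[/math] independent draws, a direct computation gives [math]\displaystyle{ \chi^2(\mu_{\theta'}; \mu_\theta) = (\theta/\theta')^n - 1 }[/math]; for [math]\displaystyle{ \theta' \gt \theta }[/math] the divergence is infinite and contributes nothing to the supremum. The sketch below approximates the bound for an unbiased estimator by a grid search; the function name and grid size are hypothetical.

```python
import math


def cr_bound_uniform(theta, n, grid=10_000):
    """Grid-search approximation of the Chapman-Robbins bound for an unbiased
    estimator of theta from n i.i.d. Uniform(0, theta) samples, using
    chi^2 = (theta / theta')^n - 1 for theta' < theta (assumed family)."""
    best = 0.0
    for k in range(1, grid):
        t = theta * k / grid                    # candidate theta' in (0, theta)
        log_ratio = n * math.log(theta / t)
        if log_ratio > 700.0:                   # exp would overflow; ratio is ~0
            continue
        chi2 = math.expm1(log_ratio)
        best = max(best, (theta - t) ** 2 / chi2)
    return best


bound_n1 = cr_bound_uniform(theta=1.0, n=1)     # analytically max of t(1-t)
bound_n5 = cr_bound_uniform(theta=1.0, n=5)

# Variance of the unbiased estimator (n+1)/n * max(X_i) is theta^2 / (n(n+2)).
var_n1 = 1.0 / (1 * 3)
var_n5 = 1.0 / (5 * 7)

print(bound_n1, var_n1)
print(bound_n5, var_n5)
```

For [math]\displaystyle{ n = 1 }[/math] and [math]\displaystyle{ \theta = 1 }[/math] the ratio simplifies to [math]\displaystyle{ \theta'(1-\theta') }[/math], so the bound is [math]\displaystyle{ 1/4 }[/math], attained at [math]\displaystyle{ \theta' = 1/2 }[/math] and not in the limit [math]\displaystyle{ \theta' \to \theta }[/math].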

References

  1. Hammersley, J. M. (1950), "On estimating restricted parameters", Journal of the Royal Statistical Society, Series B 12 (2): 192–240
  2. Chapman, D. G.; Robbins, H. (1951), "Minimum variance estimation without regularity assumptions", Annals of Mathematical Statistics 22 (4): 581–586, doi:10.1214/aoms/1177729548
  3. Polyanskiy, Yury (2017). "Lecture notes on information theory, chapter 29, ECE563 (UIUC)". https://people.lids.mit.edu/yp/homepage/data/LN_stats.pdf.

Further reading

  • Lehmann, E. L.; Casella, G. (1998), Theory of Point Estimation (2nd ed.), Springer, pp. 113–114, ISBN 0-387-98502-6