Hájek projection

In statistics, the Hájek projection of a random variable [math]\displaystyle{ T }[/math] on a set of independent random vectors [math]\displaystyle{ X_1,\dots,X_n }[/math] is a particular measurable function of [math]\displaystyle{ X_1,\dots,X_n }[/math] that, loosely speaking, captures the variation of [math]\displaystyle{ T }[/math] in an optimal way. It is named after the Czech statistician Jaroslav Hájek.

Definition

Given a random variable [math]\displaystyle{ T }[/math] and a set of independent random vectors [math]\displaystyle{ X_1,\dots,X_n }[/math], the Hájek projection [math]\displaystyle{ \hat{T} }[/math] of [math]\displaystyle{ T }[/math] onto [math]\displaystyle{ \{X_1,\dots,X_n\} }[/math] is given by[1]

[math]\displaystyle{ \hat{T} = \operatorname{E}(T) + \sum_{i=1}^n \left[ \operatorname{E}(T\mid X_i) - \operatorname{E}(T)\right] = \sum_{i=1}^n \operatorname{E}(T\mid X_i) - (n-1)\operatorname{E}(T) }[/math]
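
For example, if [math]\displaystyle{ X_1,\dots,X_n }[/math] are i.i.d. and [math]\displaystyle{ T_n=\binom{n}{2}^{-1}\sum_{1\le i\lt j\le n}h(X_i,X_j) }[/math] is a U-statistic with symmetric kernel [math]\displaystyle{ h }[/math], write [math]\displaystyle{ \theta=\operatorname{E}h(X_1,X_2) }[/math] and [math]\displaystyle{ h_1(x)=\operatorname{E}h(x,X_2) }[/math]. Then [math]\displaystyle{ \operatorname{E}(T_n\mid X_i)=\tfrac{2}{n}h_1(X_i)+\tfrac{n-2}{n}\theta }[/math], and the formula above yields

[math]\displaystyle{ \hat{T}_n = \theta + \frac{2}{n}\sum_{i=1}^n \left[ h_1(X_i)-\theta \right], }[/math]

which is the classical linear approximation used in the asymptotic theory of U-statistics.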

Properties

  • The Hájek projection [math]\displaystyle{ \hat{T} }[/math] is the [math]\displaystyle{ L^2 }[/math] projection of [math]\displaystyle{ T }[/math] onto the linear subspace of all random variables of the form [math]\displaystyle{ \sum_{i=1}^n g_i(X_i) }[/math], where [math]\displaystyle{ g_i:\mathbb{R}^d \to \mathbb{R} }[/math] are arbitrary measurable functions such that [math]\displaystyle{ \operatorname{E}(g_i^2(X_i))\lt \infty }[/math] for all [math]\displaystyle{ i=1,\dots,n }[/math].
  • [math]\displaystyle{ \operatorname{E} (\hat{T}\mid X_i)=\operatorname{E}(T\mid X_i) }[/math] and hence [math]\displaystyle{ \operatorname{E}(\hat{T})=\operatorname{E}(T) }[/math]
  • Under some conditions, the asymptotic distributions of the sequence of statistics [math]\displaystyle{ T_n=T_n(X_1,\dots,X_n) }[/math] and of the sequence of their Hájek projections [math]\displaystyle{ \hat{T}_n = \hat{T}_n(X_1,\dots,X_n) }[/math] coincide: if [math]\displaystyle{ \operatorname{Var}(T_n)/\operatorname{Var}(\hat{T}_n) \to 1 }[/math], then [math]\displaystyle{ \frac{T_n-\operatorname{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}} - \frac{\hat{T}_n-\operatorname{E}(\hat{T}_n)}{\sqrt{\operatorname{Var}(\hat{T}_n)}} }[/math] converges to zero in probability (a numerical sketch follows below).
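
As a minimal numerical sketch of the last property (the distribution and statistic are chosen here purely for illustration): let [math]\displaystyle{ X_1,\dots,X_n }[/math] be i.i.d. standard normal and let [math]\displaystyle{ T_n }[/math] be the unbiased sample variance, which is the U-statistic with kernel [math]\displaystyle{ h(x,y)=(x-y)^2/2 }[/math]; its Hájek projection then has the closed form [math]\displaystyle{ \hat{T}_n = 1 + \tfrac{1}{n}\sum_{i=1}^n (X_i^2-1) }[/math]. The following Python snippet (NumPy only) checks numerically that the variance ratio and the correlation between the two statistics are close to one:

 # Illustrative sketch: Hájek projection of the sample variance for i.i.d. N(0,1) data.
 import numpy as np
 rng = np.random.default_rng(0)
 n, reps = 50, 20_000
 X = rng.standard_normal((reps, n))
 T = X.var(axis=1, ddof=1)                  # U-statistic with kernel (x - y)^2 / 2
 T_hat = 1.0 + (X**2 - 1.0).mean(axis=1)    # Hájek projection, using the true mean 0 and variance 1
 print("Var(T)/Var(T_hat):", T.var() / T_hat.var())        # close to 1
 print("corr(T, T_hat):   ", np.corrcoef(T, T_hat)[0, 1])  # close to 1
 # The standardized difference from the property above is small:
 diff = (T - T.mean()) / T.std() - (T_hat - T_hat.mean()) / T_hat.std()
 print("mean |difference|:", np.abs(diff).mean())

For [math]\displaystyle{ n=50 }[/math] the exact variance ratio is [math]\displaystyle{ n/(n-1)\approx 1.02 }[/math], so up to simulation noise both printed diagnostics are close to one, in line with the property above.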

References

  1. van der Vaart, Aad W. (2012). Asymptotic Statistics. Cambridge University Press. ISBN 9780511802256. OCLC 928629884. http://worldcat.org/oclc/928629884.