# Ancillary statistic

An **ancillary statistic** is a measure of a sample whose distribution (or whose pmf or pdf) does not depend on the parameters of the model.^{[1]}^{[2]}^{[3]} An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. They are also used in connection with Basu's theorem to prove independence between statistics.^{[4]}
This concept was first introduced by Ronald Fisher in the 1920s,^{[5]} but its formal definition was only provided in 1964 by Debabrata Basu.^{[6]}^{[7]}

## Examples

Suppose *X*_{1}, ..., *X*_{n} are independent and identically distributed, and are normally distributed with unknown expected value *μ* and known variance 1. Let

- [math]\displaystyle{ \overline{X}_n = \frac{X_1+\,\cdots\,+X_n}{n} }[/math]

be the sample mean.

The following statistical measures of dispersion of the sample

- Range: max(*X*_{1}, ..., *X*_{n}) − min(*X*_{1}, ..., *X*_{n})
- Interquartile range: *Q*_{3} − *Q*_{1}
- Sample variance: [math]\displaystyle{ \hat{\sigma}^2:=\,\frac{\sum_{i=1}^{n} \left(X_i-\overline{X}_n\right)^2}{n} }[/math]

are all *ancillary statistics*, because their sampling distributions do not change as *μ* changes. Computationally, this is because the *μ* terms cancel in the formulas: adding a constant to every observation shifts the sample maximum and minimum by the same amount, so it does not change their difference, and likewise for the others. These measures of dispersion do not depend on location.
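The cancellation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `dispersion_stats`, the seed, and the sample size are illustrative choices and not part of the article's sources. It shifts one fixed standard normal sample by several values of *μ* and shows that the three dispersion measures are unchanged while the sample mean moves.

```python
import numpy as np

def dispersion_stats(x):
    """Return (range, interquartile range, sample variance with divisor n)."""
    q1, q3 = np.percentile(x, [25, 75])
    # np.var uses divisor n by default (ddof=0), matching the formula above.
    return np.array([x.max() - x.min(), q3 - q1, x.var()])

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
z = rng.standard_normal(20)      # one N(0, 1) sample of size n = 20

# Adding mu to every observation turns the sample into a N(mu, 1) sample.
for mu in (0.0, 3.0, -7.5):
    x = z + mu
    print(mu, dispersion_stats(x), x.mean())
```

Because each shifted sample is a deterministic translation of the same draws, the printed dispersion values agree up to floating-point rounding, whereas the last column (the sample mean) changes with *μ*.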

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance *σ*^{2}, the sample mean [math]\displaystyle{ \overline{X} }[/math] is *not* an ancillary statistic of the variance, as the sampling distribution of the sample mean is *N*(1, *σ*^{2}/*n*), which does depend on *σ*^{2} – this measure of location (specifically, its standard error) depends on dispersion.^{[8]}
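A small simulation illustrates this dependence. This is a sketch under stated assumptions: the function name `sample_mean_sd`, the sample size *n* = 25, the number of replications, and the seed are all illustrative choices. It estimates the standard deviation of the sample mean for several values of *σ* and prints it next to the theoretical value *σ*/√*n*.

```python
import numpy as np

def sample_mean_sd(sigma, n=25, reps=20000, seed=0):
    """Monte Carlo estimate of the standard deviation of the sample mean
    of n i.i.d. N(1, sigma^2) observations."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=1.0, scale=sigma, size=(reps, n))
    return samples.mean(axis=1).std()

for sigma in (0.5, 1.0, 2.0):
    # Second column: simulated value; third column: sigma / sqrt(n).
    print(sigma, sample_mean_sd(sigma), sigma / np.sqrt(25))
```

The simulated spread of the sample mean grows in proportion to *σ*, so its distribution cannot be free of the unknown parameter.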

### In location-scale families

In a location family of distributions, [math]\displaystyle{ (X_1 - X_n, X_2 - X_n, \dots, X_{n-1} - X_n) }[/math] is an ancillary statistic.

In a scale family of distributions, [math]\displaystyle{ (\frac{X_1}{X_n}, \frac{X_2}{X_n}, \dots, \frac{X_{n-1}}{X_n}) }[/math] is an ancillary statistic.

In a location-scale family of distributions, [math]\displaystyle{ (\frac{X_1 - X_n}{S}, \frac{X_2 - X_n}{S}, \dots, \frac{X_{n - 1} - X_n}{S}) }[/math], where [math]\displaystyle{ S^2 }[/math] is the sample variance, is an ancillary statistic.^{[3]}^{[9]}
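The invariance behind the location-scale case can be sketched as follows. The assumptions here are illustrative: the helper name `configuration`, the exponential base distribution, the constants `a` and `b`, and the use of the divisor *n* − 1 for *S*² (the choice of divisor does not affect the invariance). Replacing each observation *x* by *a* + *bx* with *b* > 0 leaves the statistic unchanged, which is why its distribution cannot depend on the location or scale parameter.

```python
import numpy as np

def configuration(x):
    """Return ((X_1 - X_n)/S, ..., (X_{n-1} - X_n)/S) for a 1-D sample x."""
    s = x.std(ddof=1)            # S, here computed with divisor n - 1 (an assumption)
    return (x[:-1] - x[-1]) / s

rng = np.random.default_rng(1)   # arbitrary seed
z = rng.standard_exponential(6)  # draws from a base distribution (location 0, scale 1)

a, b = 4.0, 2.5                  # hypothetical location and scale parameters
print(configuration(z))
print(configuration(a + b * z))  # same values up to floating-point rounding
```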

## In recovery of information

It turns out that, if [math]\displaystyle{ T_1 }[/math] is a non-sufficient statistic and [math]\displaystyle{ T_2 }[/math] is ancillary, one can sometimes recover all the information about the unknown parameter contained in the entire data by reporting [math]\displaystyle{ T_1 }[/math] while conditioning on the observed value of [math]\displaystyle{ T_2 }[/math]. This is known as *conditional inference*.^{[3]}

For example, suppose that [math]\displaystyle{ X_1, X_2 }[/math] are independent and each follow the [math]\displaystyle{ N(\theta, 1) }[/math] distribution, where [math]\displaystyle{ \theta }[/math] is unknown. Even though [math]\displaystyle{ X_1 }[/math] is not sufficient for [math]\displaystyle{ \theta }[/math] (its Fisher information is 1, whereas the Fisher information of the complete sufficient statistic [math]\displaystyle{ \overline{X} }[/math] is 2), additionally reporting the ancillary statistic [math]\displaystyle{ X_1 - X_2 }[/math] yields a joint distribution with Fisher information 2.^{[3]}
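A short derivation makes the recovered information explicit. It assumes that [math]\displaystyle{ X_1 }[/math] and [math]\displaystyle{ X_2 }[/math] are independent and writes [math]\displaystyle{ D = X_1 - X_2 }[/math], a symbol introduced here only for illustration. Since [math]\displaystyle{ D \sim N(0, 2) }[/math] whatever the value of [math]\displaystyle{ \theta }[/math], it is ancillary, and [math]\displaystyle{ \operatorname{Cov}(X_1, D) = \operatorname{Var}(X_1) = 1 }[/math]. The standard formula for conditioning in a bivariate normal distribution then gives

- [math]\displaystyle{ X_1 \mid D = d \;\sim\; N\!\left(\theta + \frac{d}{2},\ 1 - \frac{1^2}{2}\right) = N\!\left(\theta + \frac{d}{2},\ \frac{1}{2}\right) }[/math]

so the Fisher information about [math]\displaystyle{ \theta }[/math] in the conditional distribution is [math]\displaystyle{ 1 / \tfrac{1}{2} = 2 }[/math]. This matches the information in [math]\displaystyle{ \overline{X} \sim N(\theta, \tfrac{1}{2}) }[/math], as expected because [math]\displaystyle{ X_1 - D/2 = \overline{X} }[/math].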

## Ancillary complement

Given a statistic *T* that is not sufficient, an **ancillary complement** is a statistic *U* that is ancillary and such that (*T*, *U*) is sufficient.^{[2]} Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

This notion is particularly useful if one takes *T* to be a maximum likelihood estimator, which in general will not be sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of *T* should be computed not from the marginal distribution of *T*, but from the conditional distribution of *T* given *U*: how much information does *T* *add*? This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor does a maximum ancillary complement exist.

### Example

In baseball, suppose a scout observes a batter in *N* at-bats. Suppose (unrealistically) that the number *N* is chosen by some random process that is independent of the batter's ability – say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number *N* of at-bats and the number *X* of hits: the data (*X*, *N*) are a sufficient statistic. The observed batting average *X*/*N* fails to convey all of the information available in the data because it fails to report the number *N* of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number *N* of at-bats is an ancillary statistic because

- It is a part of the observable data (it is a *statistic*), and
- Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an **ancillary complement** to the observed batting average *X*/*N*, i.e., the batting average *X*/*N* is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with *N*, it becomes sufficient.
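The scouting example can be simulated as a rough sketch. Everything named here is an assumption made for illustration: the functions `simulate_at_bats` and `batting_average_se`, the probability `p_stay` that the coin tells the scout to stay, the binomial model for hits, and the normal-approximation standard error. The simulation draws *N* from the coin-toss process first and only then draws *X*, so the distribution of *N* is the same for every batting ability, while the precision of *X*/*N* depends strongly on *N*.

```python
import numpy as np

def simulate_at_bats(p_hit, reps, p_stay=0.5, seed=0):
    """Simulate `reps` scouting sessions; return arrays (hits X, at-bats N)."""
    rng = np.random.default_rng(seed)
    # N = number of at-bats watched until the coin first says "leave" (N >= 1).
    n = rng.geometric(1.0 - p_stay, size=reps)
    # Given N, hits are modelled as Binomial(N, p_hit) (an assumption).
    x = rng.binomial(n, p_hit)
    return x, n

def batting_average_se(avg, n):
    """Approximate standard error of a batting average based on n at-bats."""
    return np.sqrt(avg * (1.0 - avg) / n)

for p_hit in (0.2, 0.4):
    x, n = simulate_at_bats(p_hit, reps=100_000)
    print(p_hit, n.mean(), (x / n).mean())   # mean of N is the same for both abilities

# The same 0.400 average is far less precise with 5 at-bats than with 100.
print(batting_average_se(0.4, 5), batting_average_se(0.4, 100))
```

Reporting *N* alongside *X*/*N* is what lets a reader attach the right precision to the average, which is the role of the ancillary complement in this example.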

## Notes

- ↑ Lehmann, E. L.; Scholz, F. W. (1992). "Ancillarity". *Lecture Notes-Monograph Series*. Institute of Mathematical Statistics Lecture Notes - Monograph Series **17**: 32–51. doi:10.1214/lnms/1215458837. ISBN 0-940600-24-2. ISSN 0749-2170. https://projecteuclid.org/ebooks/institute-of-mathematical-statistics-lecture-notes-monograph-series/Current-issues-in-statistical-inference--Essays-in-honor-of/chapter/Ancillarity/10.1214/lnms/1215458837.pdf.
- ↑ Ghosh, M.; Reid, N.; Fraser, D. A. S. (2010). "Ancillary statistics: A review". *Statistica Sinica* **20** (4): 1309–1332. ISSN 1017-0405. https://www.jstor.org/stable/24309506.
- ↑ Mukhopadhyay, Nitis (2000). *Probability and Statistical Inference*. United States of America: Marcel Dekker, Inc. pp. 309–318. ISBN 0-8247-0379-0.
- ↑ Dawid, Philip (2011), DasGupta, Anirban, ed., "Basu on Ancillarity" (in en), *Selected Works of Debabrata Basu* (New York, NY: Springer): pp. 5–8, doi:10.1007/978-1-4419-5825-9_2, ISBN 978-1-4419-5825-9
- ↑ Fisher, R. A. (1925). "Theory of Statistical Estimation" (in en). *Mathematical Proceedings of the Cambridge Philosophical Society* **22** (5): 700–725. doi:10.1017/S0305004100009580. ISSN 0305-0041. Bibcode: 1925PCPS...22..700F. https://www.cambridge.org/core/product/identifier/S0305004100009580/type/journal_article.
- ↑ Basu, D. (1964). "Recovery of Ancillary Information". *Sankhyā: The Indian Journal of Statistics, Series A (1961-2002)* **26** (1): 3–16. ISSN 0581-572X. https://www.jstor.org/stable/25049300.
- ↑ Stigler, Stephen M. (2001) (in en), *Ancillary history*, Institute of Mathematical Statistics Lecture Notes - Monograph Series, Beachwood, OH: Institute of Mathematical Statistics, pp. 555–567, doi:10.1214/lnms/1215090089, ISBN 978-0-940600-50-8, http://projecteuclid.org/euclid.lnms/1215090089, retrieved 2023-04-24
- ↑ Buehler, Robert J. (1982). "Some Ancillary Statistics and Their Properties". *Journal of the American Statistical Association* **77** (379): 581–589. doi:10.1080/01621459.1982.10477850. ISSN 0162-1459. https://www.tandfonline.com/doi/abs/10.1080/01621459.1982.10477850.
- ↑ "Ancillary statistics". https://ani.stat.fsu.edu/~debdeep/ancillary.pdf.

Original source: https://en.wikipedia.org/wiki/Ancillary statistic.