# Ancillary statistic

An ancillary statistic is a measure of a sample whose distribution (or whose pmf or pdf) does not depend on the parameters of the model.[1][2][3] An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. They are also used in connection with Basu's theorem to prove independence between statistics.[4] This concept was first introduced by Ronald Fisher in the 1920s,[5] but its formal definition was only provided in 1964 by Debabrata Basu.[6][7]

## Examples

Suppose $\displaystyle{ X_1, \dots, X_n }$ are independent and identically distributed, and are normally distributed with unknown expected value μ and known variance 1. Let

$\displaystyle{ \overline{X}_n = \frac{X_1+\,\cdots\,+X_n}{n} }$

be the sample mean.

Statistical measures of dispersion of the sample, such as the range $\displaystyle{ \max(X_1, \dots, X_n) - \min(X_1, \dots, X_n) }$ and the sample variance

$\displaystyle{ \hat{\sigma}^2:=\,\frac{\sum \left(X_i-\overline{X}_n\right)^2}{n} }$

are all ancillary statistics, because their sampling distributions do not change as μ changes. Computationally, this is because the μ terms cancel in the formulas: adding a constant to a distribution (and hence to every sample value) shifts the sample maximum and minimum by the same amount, so their difference is unchanged, and likewise for the other measures. These measures of dispersion do not depend on location.
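The claim can be checked numerically. The following is a minimal simulation sketch, not part of the cited sources: the sample size, number of replications, seed, the two values of μ, and the helper name `simulate` are all illustrative assumptions. It draws many samples from N(μ, 1) for two very different values of μ and compares the simulated averages of the range and of the divisor-n variance.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n, reps = 10, 20000              # illustrative sample size and replication count

def simulate(mu):
    """Draw `reps` samples of size n from N(mu, 1); return per-sample range and variance."""
    draws = rng.normal(loc=mu, scale=1.0, size=(reps, n))
    ranges = draws.max(axis=1) - draws.min(axis=1)
    variances = draws.var(axis=1)  # np.var defaults to ddof=0, i.e. divisor n
    return ranges, variances

ranges_a, vars_a = simulate(0.0)
ranges_b, vars_b = simulate(50.0)

# If the statistics are ancillary for mu, these summaries should agree up to simulation noise.
print("mean range:   ", ranges_a.mean(), ranges_b.mean())
print("mean variance:", vars_a.mean(), vars_b.mean())
```

Agreement of two averages is only a spot check of the distributions, not a proof; comparing quantiles or histograms of the two runs would give a fuller picture.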

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance $\displaystyle{ \sigma^2 }$, the sample mean $\displaystyle{ \overline{X} }$ is not an ancillary statistic of the variance, as the sampling distribution of the sample mean is $\displaystyle{ N(1, \sigma^2/n) }$, which does depend on $\displaystyle{ \sigma^2 }$: this measure of location (specifically, its standard error) depends on dispersion.[8]
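As a rough illustration of this dependence, the sketch below simulates the sample mean for two values of σ with the mean fixed at 1. The sample size, seed, values of σ, and the helper name `sample_mean_spread` are assumptions made for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed, chosen arbitrarily
n, reps = 25, 20000              # illustrative sample size and replication count

def sample_mean_spread(sigma):
    """Empirical standard deviation of the sample mean of n draws from N(1, sigma^2)."""
    draws = rng.normal(loc=1.0, scale=sigma, size=(reps, n))
    return draws.mean(axis=1).std()

spread_small = sample_mean_spread(1.0)   # theoretical standard error: 1 / sqrt(25) = 0.2
spread_large = sample_mean_spread(3.0)   # theoretical standard error: 3 / sqrt(25) = 0.6
print(spread_small, spread_large)
```

The spread of the sample mean changes with σ, so its distribution is not free of the unknown parameter.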

### In location-scale families

In a location family of distributions, $\displaystyle{ (X_1 - X_n, X_2 - X_n, \dots, X_{n-1} - X_n) }$ is an ancillary statistic.

In a scale family of distributions, $\displaystyle{ (\frac{X_1}{X_n}, \frac{X_2}{X_n}, \dots, \frac{X_{n-1}}{X_n}) }$ is an ancillary statistic.

In a location-scale family of distributions, $\displaystyle{ (\frac{X_1 - X_n}{S}, \frac{X_2 - X_n}{S}, \dots, \frac{X_{n - 1} - X_n}{S}) }$, where $\displaystyle{ S^2 }$ is the sample variance, is an ancillary statistic.[3][9]
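The mechanism behind all three statements is that each statistic is unchanged when the data are shifted, rescaled, or both. A minimal sketch of that invariance follows; the function names, the seed, the constants 3.7 and 2.5, and the use of the divisor n − 1 for the sample variance are assumptions made for illustration.

```python
import numpy as np

def location_ancillary(x):
    """(X_1 - X_n, ..., X_{n-1} - X_n): unchanged by adding a constant to every X_i."""
    return x[:-1] - x[-1]

def scale_ancillary(x):
    """(X_1 / X_n, ..., X_{n-1} / X_n): unchanged by multiplying every X_i by c > 0."""
    return x[:-1] / x[-1]

def location_scale_ancillary(x):
    """((X_i - X_n) / S): unchanged by x -> a + c * x with c > 0."""
    s = x.std(ddof=1)  # assumption: sample standard deviation with divisor n - 1
    return (x[:-1] - x[-1]) / s

rng = np.random.default_rng(1)
z = rng.standard_normal(6)   # a draw from the "base" member of the family
a, c = 3.7, 2.5              # hypothetical location and scale parameters

shift_ok = np.allclose(location_ancillary(z), location_ancillary(a + z))
scale_ok = np.allclose(scale_ancillary(z), scale_ancillary(c * z))
both_ok = np.allclose(location_scale_ancillary(z), location_scale_ancillary(a + c * z))
print(shift_ok, scale_ok, both_ok)
```

Because each statistic takes the same value on the transformed data as on the base draw, its distribution is that of the statistic computed from the base distribution, which involves no unknown parameter.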

## In recovery of information

It turns out that, if $\displaystyle{ T_1 }$ is a non-sufficient statistic and $\displaystyle{ T_2 }$ is ancillary, one can sometimes recover all the information about the unknown parameter contained in the entire data by reporting $\displaystyle{ T_1 }$ while conditioning on the observed value of $\displaystyle{ T_2 }$. This is known as conditional inference.[3]

For example, suppose that $\displaystyle{ X_1, X_2 }$ are independent and follow the $\displaystyle{ N(\theta, 1) }$ distribution, where $\displaystyle{ \theta }$ is unknown. The statistic $\displaystyle{ X_1 }$ alone is not sufficient for $\displaystyle{ \theta }$: its Fisher information is 1, whereas the Fisher information of the complete sufficient statistic $\displaystyle{ \overline{X} }$ is 2. By additionally reporting the ancillary statistic $\displaystyle{ X_1 - X_2 }$, however, one obtains a joint distribution with Fisher information 2.[3]
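The information count can be verified by a short calculation. This is a sketch assuming $\displaystyle{ X_1 }$ and $\displaystyle{ X_2 }$ are independent; the symbols $\displaystyle{ D }$, $\displaystyle{ a }$ and $\displaystyle{ I }$ are notation introduced here. Writing $\displaystyle{ D = X_1 - X_2 }$, the pair $\displaystyle{ (X_1, D) }$ is jointly normal with $\displaystyle{ \operatorname{Cov}(X_1, D) = 1 }$, so

$\displaystyle{ D \sim N(0, 2), \qquad X_1 \mid (D = a) \sim N\left(\theta + \tfrac{a}{2},\, \tfrac{1}{2}\right) }$

The distribution of $\displaystyle{ D }$ does not involve $\displaystyle{ \theta }$, so $\displaystyle{ D }$ contributes no Fisher information on its own. A normal distribution with mean $\displaystyle{ \theta + a/2 }$ and known variance $\displaystyle{ 1/2 }$ has Fisher information equal to the reciprocal of its variance, hence

$\displaystyle{ I_{(X_1, D)}(\theta) = I_{D}(\theta) + I_{X_1 \mid D}(\theta) = 0 + \frac{1}{1/2} = 2 }$

which matches the Fisher information of $\displaystyle{ \overline{X} \sim N(\theta, 1/2) }$.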

## Ancillary complement

Given a statistic T that is not sufficient, an ancillary complement is a statistic U that is ancillary and such that (T, U) is sufficient.[2] Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

An ancillary complement is particularly useful when T is a maximum likelihood estimator, which in general is not sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine information content: the Fisher information content of T should be computed not from the marginal distribution of T but from the conditional distribution of T given U, which answers the question of how much information T adds. This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor does a maximum ancillary complement exist.

### Example

In baseball, suppose a scout observes a batter in N at-bats. Suppose (unrealistically) that the number N is chosen by some random process that is independent of the batter's ability: say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number N of at-bats and the number X of hits: the data (X, N) are a sufficient statistic. The observed batting average X/N fails to convey all of the information available in the data because it fails to report the number N of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number N of at-bats is an ancillary statistic because

• It is a part of the observable data (it is a statistic), and
• Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an ancillary complement to the observed batting average X/N: the batting average X/N alone is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but together with N it becomes sufficient.
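The scouting scenario can be simulated as a rough check. The sketch below is illustrative only: it assumes a fair coin (so N is geometric with success probability 1/2 and at least one at-bat is observed), treats at-bats as independent trials with a fixed hit probability, and uses two hypothetical abilities, 0.200 and 0.400. The helper names `simulate_scouting` and `binomial_standard_error` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)   # fixed seed, chosen arbitrarily
reps = 50000                     # number of simulated scouting visits

def simulate_scouting(ability):
    """Return arrays (N, X): at-bats observed and hits, for `reps` independent visits."""
    # Assumption: after each at-bat a fair coin decides whether the scout leaves,
    # so N >= 1 is geometric(1/2) and is generated without reference to `ability`.
    at_bats = rng.geometric(0.5, size=reps)
    hits = rng.binomial(at_bats, ability)   # X given N = n is Binomial(n, ability)
    return at_bats, hits

def binomial_standard_error(average, at_bats):
    """Plug-in standard error of a batting average based on `at_bats` trials."""
    return np.sqrt(average * (1.0 - average) / at_bats)

n_low, x_low = simulate_scouting(0.200)
n_high, x_high = simulate_scouting(0.400)

# N behaves the same for both batters, while X clearly does not.
print("mean N:", n_low.mean(), n_high.mean())
print("mean X:", x_low.mean(), x_high.mean())

# The same 0.400 average is far less precise with 5 at-bats than with 100.
print(binomial_standard_error(0.4, 5), binomial_standard_error(0.4, 100))
```

The first comparison reflects the ancillarity of N; the last line reflects why N must be reported alongside X/N, since the precision of the average depends on it.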