Rule of three (statistics)

From HandWiki
Comparison of the rule of three to the exact binomial one-sided confidence interval with no positive samples

In statistical analysis, the rule of three states that if a certain event did not occur in a sample of n subjects, the interval from 0 to 3/n is a 95% confidence interval for the rate of occurrence in the population. When n is greater than 30, this is a good approximation of results from more exact methods, such as the exact binomial one-sided confidence interval. For example, suppose a pain-relief drug is tested on 1500 human subjects and no adverse event is recorded. From the rule of three, it can be concluded with 95% confidence that fewer than 1 person in 500 (3/1500) will experience an adverse event. By symmetry, if only successes are observed, the 95% confidence interval for the success rate is [1−3/n, 1].

The rule is useful in the interpretation of clinical trials generally, particularly in phase II and phase III where often there are limitations in duration or statistical power. The rule of three applies well beyond medical research, to any trial done n times. If 300 parachutes are randomly tested and all open successfully, then it is concluded with 95% confidence that fewer than 1 in 100 parachutes with the same characteristics (3/300) will fail.[1]
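As a rough numerical illustration of the two examples above, the following Python sketch compares the 3/n bound with the exact one-sided bound obtained by solving (1−p)^n = 0.05 for p. The function names are illustrative choices, not part of any library.

```python
import math


def rule_of_three_upper(n: int) -> float:
    """Approximate 95% upper bound on the event rate after 0 events in n trials."""
    return 3.0 / n


def exact_upper(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided bound: the p that solves (1 - p)**n = alpha."""
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    # 1500 and 300 are the drug-trial and parachute examples from the text;
    # 10 and 30 show how the approximation behaves for small samples.
    for n in (10, 30, 300, 1500):
        approx = rule_of_three_upper(n)
        exact = exact_upper(n)
        print(f"n={n:5d}  3/n={approx:.5f}  exact={exact:.5f}")
```

In this sketch the 3/n value is always slightly larger than the exact bound, so the rule errs on the conservative side, and the gap shrinks as n grows.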

Derivation

A 95% confidence interval is sought for the probability p of an event occurring for any randomly selected single individual in a population, given that it has not been observed to occur in n Bernoulli trials. Denoting the number of events by X, we therefore wish to find the values of the parameter p of a binomial distribution for which Pr(X = 0) ≥ 0.05, that is, the values of p under which observing no events is not too improbable. The rule can then be derived[2] either from the Poisson approximation to the binomial distribution, or from the formula (1−p)^n for the probability of zero events in the binomial distribution. In the latter case, the edge of the confidence interval is given by Pr(X = 0) = 0.05 and hence (1−p)^n = 0.05, so n ln(1−p) = ln 0.05 ≈ −2.996. Rounding the latter to −3 and using the approximation, for p close to 0, that ln(1−p) ≈ −p (Taylor's formula), we obtain the interval's boundary 3/n.
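The two routes named in this derivation can be written out step by step. The following is a sketch only; the Poisson route assumes that X is approximated by a Poisson distribution with mean np, which is the usual form of that approximation.

```latex
% Binomial route: probability of zero events in n Bernoulli trials
\begin{align*}
\Pr(X = 0) = (1-p)^n &= 0.05 \\
n \ln(1-p) &= \ln 0.05 \approx -2.996 \\
-np &\approx -3 \qquad \text{using } \ln(1-p) \approx -p \text{ for small } p \\
p &\approx \frac{3}{n}
\end{align*}

% Poisson route: X approximately Poisson with mean np
\begin{align*}
\Pr(X = 0) = e^{-np} &= 0.05 \\
np &= -\ln 0.05 \approx 2.996 \\
p &\approx \frac{3}{n}
\end{align*}
```

Both routes reach the same boundary. The Poisson form arrives there without a separate Taylor step, because the approximation is already built into the Poisson model.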

By a similar argument, the numerator values of 3.51, 4.61, and 5.3 may be used for the 97%, 99%, and 99.5% confidence intervals, respectively, and in general the upper end of the confidence interval can be given as −ln(α)/n, where 1−α is the desired confidence level.
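The general formula can be checked numerically. The Python sketch below computes the numerator −ln(α) for the confidence levels quoted above, together with the resulting upper bound for a given n. The function names are hypothetical and chosen for illustration.

```python
import math


def numerator(confidence: float) -> float:
    """Numerator -ln(alpha) of the bound, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return -math.log(alpha)


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Approximate upper end of the interval after 0 events in n trials."""
    return numerator(confidence) / n


if __name__ == "__main__":
    for level in (0.95, 0.97, 0.99, 0.995):
        print(f"{level:.1%}: numerator = {numerator(level):.2f}")
```

At the 95% level the computed numerator is about 2.996, which the rule rounds up to 3.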

Extension

The Vysochanskij–Petunin inequality shows that the rule of three holds for unimodal distributions with finite variance beyond just the binomial distribution, and gives a way to change the factor 3 if a different confidence is desired. Chebyshev's inequality removes the assumption of unimodality at the price of a higher multiplier (about 4.5 for 95% confidence). Cantelli's inequality is the one-tailed version of Chebyshev's inequality.
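The multipliers mentioned here can be recovered by solving each inequality's tail bound for the number of standard deviations k. The Python sketch below assumes the standard forms of the bounds: P(|X−μ| ≥ kσ) ≤ 1/k² for Chebyshev, P(|X−μ| ≥ kσ) ≤ 4/(9k²) for Vysochanskij–Petunin (valid for k > √(8/3)), and P(X−μ ≥ kσ) ≤ 1/(1+k²) for Cantelli. The function names are illustrative.

```python
import math


def chebyshev_multiplier(alpha: float) -> float:
    """Solve 1 / k**2 = alpha for k (two-sided, any finite-variance distribution)."""
    return math.sqrt(1.0 / alpha)


def vysochanskij_petunin_multiplier(alpha: float) -> float:
    """Solve 4 / (9 * k**2) = alpha for k (two-sided, unimodal distributions)."""
    k = math.sqrt(4.0 / (9.0 * alpha))
    # The bound in this form only applies when k exceeds sqrt(8/3).
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("alpha too large for this form of the inequality")
    return k


def cantelli_multiplier(alpha: float) -> float:
    """Solve 1 / (1 + k**2) = alpha for k (one-sided)."""
    return math.sqrt(1.0 / alpha - 1.0)


if __name__ == "__main__":
    alpha = 0.05  # corresponds to 95% confidence
    print(f"Vysochanskij-Petunin: {vysochanskij_petunin_multiplier(alpha):.3f}")
    print(f"Chebyshev:            {chebyshev_multiplier(alpha):.3f}")
    print(f"Cantelli:             {cantelli_multiplier(alpha):.3f}")
```

Under these assumptions the unimodal bound gives a multiplier just under 3 at 95% confidence, while Chebyshev's bound gives roughly 4.5, in line with the figures quoted above.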

Notes

  1. There are other meanings of the term "rule of three" in mathematics, and a further distinct meaning within statistics:

    A century and a half ago Charles Darwin said he had "no Faith in anything short of actual measurement and the Rule of Three," by which he appeared to mean the peak of arithmetical accomplishment in a nineteenth-century gentleman, solving for x in "6 is to 3 as 9 is to x." Some decades later, in the early 1900s, Karl Pearson shifted the meaning of the rule of three – "take 3σ [three standard deviations] as definitely significant" – and claimed it for his new journal of significance testing, Biometrika. Even Darwin late in life seems to have fallen into the confusion. (Ziliak and McCloskey, 2008, p. 26; parenthetic gloss in original)

  2. "Professor Mean" (2010) "Confidence interval with zero events", The Children's Mercy Hospital. Retrieved 2013-01-01.

References

  • Ziliak, S. T.; D. N. McCloskey (2008). The cult of statistical significance: How the standard error costs us jobs, justice, and lives. University of Michigan Press. ISBN 0472050079