Grubbs's test for outliers

In statistics, Grubbs's test or the Grubbs test (named after Frank E. Grubbs, who published the test in 1950^[1]), also known as the maximum normalized residual test or extreme studentized deviate test, is a test used to detect outliers in a univariate data set assumed to come from a normally distributed population.

Definition

Grubbs's test is based on the assumption of normality. That is, one should first verify that the data can be reasonably approximated by a normal distribution before applying the Grubbs test.^[2]

Grubbs's test detects one outlier at a time. The detected outlier is removed from the data set and the test is repeated until no further outliers are detected. However, repeated application changes the probabilities of detection, and the test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers.^[citation needed]
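The iterative procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `iterative_grubbs` and the caller-supplied `critical_value(n)` callable are hypothetical, the two-sided statistic G defined in this section is used, and the loop stops once six or fewer points remain, following the caution above. The sketch does not correct for the change in detection probabilities caused by repeated testing.

```python
import statistics


def iterative_grubbs(values, critical_value):
    """Remove the most extreme point repeatedly while G exceeds the threshold.

    `critical_value` is a hypothetical callable mapping the current sample
    size n to the rejection threshold for G at the chosen significance level.
    Returns (remaining_values, removed_values).
    """
    remaining = list(values)
    removed = []
    # Stop at six or fewer points, where the test is not recommended.
    while len(remaining) > 6:
        mean = statistics.fmean(remaining)
        s = statistics.stdev(remaining)  # sample standard deviation (n - 1)
        if s == 0:
            break  # all points identical: G is undefined, nothing to remove
        # Index of the observation farthest from the sample mean.
        idx = max(range(len(remaining)), key=lambda i: abs(remaining[i] - mean))
        g = abs(remaining[idx] - mean) / s
        if g <= critical_value(len(remaining)):
            break  # no outlier detected at this step
        removed.append(remaining.pop(idx))
    return remaining, removed
```

Passing the threshold as a function of n reflects that the critical value depends on the sample size, which shrinks by one after every removal.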

Grubbs's test is defined for the following hypotheses:

H₀: There are no outliers in the data set

H_a: There is exactly one outlier in the data set

The Grubbs test statistic is defined as:

[math]\displaystyle{ G = \frac{\displaystyle\max_{i=1,\ldots, N}\left \vert Y_i - \bar{Y}\right\vert}{s} }[/math]

with [math]\displaystyle{ \bar{Y} }[/math] and [math]\displaystyle{ s }[/math] denoting the sample mean and sample standard deviation, respectively. The Grubbs test statistic is thus the largest absolute deviation from the sample mean, expressed in units of the sample standard deviation.
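As a minimal sketch, the two-sided statistic can be computed with Python's standard library. The function name `grubbs_statistic` is hypothetical, and the requirement of at least three observations is an assumption made here so that the N − 2 degrees of freedom used by the test are positive.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the two-sided Grubbs statistic.

    G is the largest absolute deviation from the sample mean divided by the
    sample standard deviation; index locates the most extreme observation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all observations are identical; G is undefined")
    index = max(range(n), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / s, index
```

For the sample 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sample variance is 32/7, so the most extreme point is 9 and G = 4/√(32/7), approximately 1.87.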

This is the two-sided version of the test. The Grubbs test can also be defined as a one-sided test. To test whether the minimum value is an outlier, the test statistic is

[math]\displaystyle{ G = \frac{\bar{Y}-Y_\min}{s} }[/math]

with Y_min denoting the minimum value. To test whether the maximum value is an outlier, the test statistic is

[math]\displaystyle{ G = \frac{Y_\max - \bar{Y}}{s} }[/math]

with Y_max denoting the maximum value.
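The one-sided statistics differ from the two-sided one only in which deviation is taken. A short sketch follows; the function name `grubbs_one_sided` and its `tail` parameter are hypothetical.

```python
import statistics


def grubbs_one_sided(values, tail):
    """Return the one-sided Grubbs statistic for the 'min' or 'max' tail."""
    if tail not in ("min", "max"):
        raise ValueError("tail must be 'min' or 'max'")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("all observations are identical; G is undefined")
    if tail == "min":
        return (mean - min(values)) / s  # tests the minimum value
    return (max(values) - mean) / s      # tests the maximum value
```

The two-sided statistic equals the larger of the two one-sided statistics computed on the same sample.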

For the two-sided test, the hypothesis of no outliers is rejected at significance level α if

[math]\displaystyle{ G \gt \frac{N-1}{\sqrt{N}} \sqrt{\frac{t_{\alpha/(2N),N-2}^2}{N - 2 + t_{\alpha/(2N),N-2}^2}} }[/math]

with t_{α/(2N),N−2} denoting the upper critical value of the t-distribution with N − 2 degrees of freedom and a significance level of α/(2N). For the one-sided tests, replace α/(2N) with α/N.
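The rejection rule can be sketched as below. Because Python's standard library has no t-distribution quantile function, this sketch assumes the caller supplies the upper critical value `t_crit` (from a table or a statistics package); the function names are hypothetical. With SciPy installed, the two-sided value could be obtained as `scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2)`, and the one-sided value by replacing `2 * n` with `n`.

```python
import math


def grubbs_critical_value(n, t_crit):
    """Critical value for G given sample size n and a t critical value.

    t_crit is the upper critical value of the t-distribution with n - 2
    degrees of freedom at level alpha/(2n) (two-sided) or alpha/n (one-sided).
    """
    if n < 3:
        raise ValueError("need at least three observations")
    t2 = t_crit ** 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t2 / (n - 2 + t2))


def reject_no_outliers(g, n, t_crit):
    """Return True if the hypothesis of no outliers is rejected."""
    return g > grubbs_critical_value(n, t_crit)
```

Since t²/(N − 2 + t²) is always less than 1, the critical value stays below (N − 1)/√N, which is also the largest value G can take for a sample of size N.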

Related techniques

Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. A normal probability plot may also be useful.

References

[1] Grubbs, Frank E. (1950). "Sample criteria for testing outlying observations". Annals of Mathematical Statistics 21 (1): 27–58. doi:10.1214/aoms/1177729885.
[2] Quoted from the Engineering and Statistics Handbook, paragraph 1.3.5.17, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

Grubbs, Frank (February 1969). "Procedures for Detecting Outlying Observations in Samples". Technometrics 11 (1): 1–21. doi:10.2307/1266761.
Stefansky, W. (1972). "Rejecting Outliers in Factorial Designs". Technometrics 14 (2): 469–479. doi:10.2307/1267436.

This article incorporates public domain material from the National Institute of Standards and Technology website https://www.nist.gov.


