ANOVA on ranks

From HandWiki
Revision as of 14:50, 6 February 2024 by Ohm (talk | contribs) (change)

In statistics, one purpose for the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality. ANOVA on ranks is a statistic designed for situations when the normality assumption has been violated.

Logic of the F test on means

The F statistic is a ratio of a numerator to a denominator. Consider randomly selected subjects that are subsequently randomly assigned to groups A, B, and C. If the null hypothesis is true, the variability (or sum of squares) of scores on some dependent variable will be the same within each group. Pooling these within-group sums of squares and dividing by the degrees of freedom (based on the number of subjects per group) gives the denominator of the F ratio.

Treat the mean for each group as a score, and compute the variability (again, the sum of squares) of those three scores. Dividing by its degrees of freedom (based on the number of groups) gives the numerator of the F ratio.

If the null hypothesis is true, the sampling distribution of the F ratio depends only on the degrees of freedom for the numerator and the denominator.
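The construction of the F ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name f_ratio and the three small samples are made up for the example, and the between-groups sum of squares uses the conventional weighting of each squared mean deviation by its group size.

```python
import numpy as np

def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way layout.

    `groups` is a list of 1-D samples; the name and signature are
    hypothetical, chosen only for this illustration.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of subjects
    grand_mean = np.concatenate(groups).mean()

    # Numerator: variability of the group means around the grand mean,
    # each squared deviation weighted by the group size.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Denominator: pooled variability of scores within each group.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

# Illustrative data only.
a = [4.0, 5.0, 6.0]
b = [5.0, 6.0, 7.0]
c = [6.0, 7.0, 8.0]
F, df1, df2 = f_ratio([a, b, c])
print(F, df1, df2)
```

The pair (df1, df2) selects the reference F distribution against which the observed ratio is compared.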

Model a treatment applied to group A by increasing every score by X. (This model maintains the underlying assumption of homogeneous variances. In practice it is rare, if not impossible, for an increase of X in a group mean to occur via an increase of each member's score by X.) This shifts the distribution of group A by X units in the positive direction but has no impact on the variability within the group. However, the variability between the three groups' mean scores will now increase. If the resulting F ratio is so large that the probability of observing it under the null hypothesis falls below the threshold for a rare event (the alpha level), the ANOVA F test is said to reject the null hypothesis of equal means between the three groups, in favor of the alternative hypothesis that at least one of the groups has a different mean (in this example, group A, whose mean is larger).
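The shift model can be illustrated with a short sketch using scipy.stats.f_oneway. The data, the shift X = 4, and the alpha level of 0.05 are assumptions chosen only for the illustration.

```python
import numpy as np
from scipy import stats

# Illustrative scores for three groups (made-up values).
a = np.array([4.0, 5.0, 6.0])
b = np.array([5.0, 6.0, 7.0])
c = np.array([4.0, 6.0, 5.0])

X = 4.0            # hypothetical treatment effect added to every score in A
a_treated = a + X

# Adding a constant leaves the within-group variability unchanged.
same_spread = np.isclose(np.var(a), np.var(a_treated))

F_before, p_before = stats.f_oneway(a, b, c)
F_after, p_after = stats.f_oneway(a_treated, b, c)

alpha = 0.05       # assumed threshold for a "rare event"
print(same_spread, F_before, F_after, p_after < alpha)
```

Only the between-groups part of the ratio changes, so the F statistic grows while the denominator stays the same.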


Handling violation of population normality

Ranking is one of many procedures used to transform data that do not meet the assumptions of normality. Conover and Iman provided a review of the four main types of rank transformations (RT).[1] One method replaces each original data value by its rank (from 1 for the smallest to N for the largest). This rank-based procedure has been recommended as being robust to non-normal errors, resistant to outliers, and highly efficient for many distributions. It may result in a known statistic (e.g., in the two independent samples layout, ranking results in the Wilcoxon rank-sum / Mann–Whitney U test), and it provides the robustness and increased statistical power that are sought. For example, Monte Carlo studies have shown that the rank transformation in the two independent samples t-test layout can be successfully extended to the one-way independent samples ANOVA, as well as to the two independent samples multivariate Hotelling's T² layout.[2] Commercial statistical software packages (e.g., SAS) followed with recommendations to data analysts to run their data sets through a ranking procedure (e.g., PROC RANK) prior to conducting standard analyses using parametric procedures.[3][4][5]
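A minimal sketch of the rank transformation in the one-way layout follows: the pooled scores are replaced by their ranks (1 to N), and the usual F test is then computed on the ranks. It uses scipy.stats.rankdata and scipy.stats.f_oneway; the data, including the single extreme value in group C, are made up to illustrate resistance to outliers.

```python
import numpy as np
from scipy import stats

# Illustrative data: group C contains one extreme outlier (250.0).
a = np.array([1.1, 2.3, 2.9, 3.8])
b = np.array([4.2, 5.1, 5.7, 6.4])
c = np.array([7.3, 8.0, 8.8, 250.0])

# Rank the pooled sample from 1 (smallest) to N (largest),
# then split the ranks back into the original groups.
pooled = np.concatenate([a, b, c])
ranks = stats.rankdata(pooled)
ra, rb, rc = np.split(ranks, [len(a), len(a) + len(b)])

F_raw, p_raw = stats.f_oneway(a, b, c)       # ANOVA on original scores
F_rank, p_rank = stats.f_oneway(ra, rb, rc)  # ANOVA on ranks

print(F_raw, p_raw)
print(F_rank, p_rank)
```

In this constructed example the outlier inflates the within-group variance of the original scores, whereas on the rank scale it is simply the largest of N ranks.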

Failure of ranking in the factorial ANOVA and other complex layouts

ANOVA on ranks means that a standard analysis of variance is calculated on the rank-transformed data. Conducting factorial ANOVA on the ranks of original scores has also been suggested.[6][7][8] However, Monte Carlo studies[9][10][11][12] and subsequent asymptotic studies[13][14] found that the rank transformation is inappropriate for testing interaction effects in a 4x3 and a 2x2x2 factorial design. As the number of non-null effects (i.e., main, interaction) increases, and as the magnitude of the non-null effects increases, there is an increase in Type I error, resulting in a complete failure of the statistic with as high as a 100% probability of making a false positive decision. Similarly, it was found that the rank transformation increasingly fails in the two dependent samples layout as the correlation between pretest and posttest scores increases.[15] It was also discovered that the Type I error rate problem was exacerbated in the context of analysis of covariance, particularly as the correlation between the covariate and the dependent variable increased.[16]

Transforming ranks

A variant of rank-transformation is 'quantile normalization' in which a further transformation is applied to the ranks such that the resulting values have some defined distribution (often a normal distribution with a specified mean and variance). Further analyses of quantile-normalized data may then assume that distribution to compute significance values. However, two specific types of secondary transformations, the random normal scores and expected normal scores transformation, have been shown to greatly inflate Type I errors and severely reduce statistical power.[17]
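A secondary transformation of ranks to normal scores can be sketched as follows. The example uses Blom's approximation to the expected normal order statistics, Φ⁻¹((r − 3/8) / (N + 1/4)), which is an assumption made for this illustration; the cited studies may define their scores differently. The data are made up.

```python
import numpy as np
from scipy import stats

# Illustrative data with one extreme value.
x = np.array([2.1, 150.0, 3.3, 0.4, 7.8])
n = x.size

# Step 1: replace each value by its rank (1 = smallest, N = largest).
ranks = stats.rankdata(x)

# Step 2: map ranks to approximate expected normal scores
# (Blom's offsets 3/8 and 1/4 are an assumed choice).
scores = stats.norm.ppf((ranks - 0.375) / (n + 0.25))

print(ranks)
print(scores)
```

The scores preserve the ordering of the original values but force them onto a standard normal scale; as noted above, this does not guarantee valid Type I error rates in every layout.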

Violating homoscedasticity

The ANOVA on ranks has never been recommended when the underlying assumption of homogeneous variances has been violated, either by itself or in conjunction with a violation of the assumption of population normality.[citation needed] In general, rank-based statistics become nonrobust with respect to Type I errors for departures from homoscedasticity even more quickly than parametric counterparts that share the same assumption.[citation needed]

Further information

Kepner and Wackerly summarized the literature in noting "by the late 1980s, the volume of literature on RT methods was rapidly expanding as new insights, both positive and negative, were gained regarding the utility of the method. Concerned that RT methods would be misused, Sawilowsky et al. (1989, p. 255) cautioned practitioners to avoid the use of these tests 'except in those specific situations where the characteristics of the tests are well understood'."[18] According to Hettmansperger and McKean,[19] "Sawilowsky (1990)[20] provides an excellent review of nonparametric approaches to testing for interaction" in ANOVA.

Notes

  1. Conover, W. J.; Iman, R. L. (1981). "Rank transformations as a bridge between parametric and nonparametric statistics". American Statistician 35 (3): 124–129. doi:10.2307/2683975. http://is.ba.ttu.edu/conover/Dr.Conover.htm. 
  2. Nanna, M. J. (2002). "Hoteling's T2 vs. the rank transformation with real Likert data". Journal of Modern Applied Statistical Methods 1: 83–99. doi:10.22237/jmasm/1020255180. 
  3. SAS Institute. (1985). SAS/stat guide for personal computers (5th ed.). Cary, NC: Author.
  4. SAS Institute. (1987). SAS/stat guide for personal computers (6th ed.). Cary, NC: Author.
  5. SAS Institute. (2008). SAS/STAT 9.2 User's guide: Introduction to Nonparametric Analysis. Cary, NC: Author.
  6. Conover, W. J.; Iman, R. L. (1976). "On some alternative procedures using ranks for the analysis of experimental designs". Communications in Statistics - Theory and Methods A5 (14): 1349–1368. doi:10.1080/03610927608827447. 
  7. Iman, R. L. (1974). "A power study of a rank transform for the two-way classification model when interactions may be present". Canadian Journal of Statistics 2 (2): 227–239. doi:10.2307/3314695. 
  8. Iman, R. L., & Conover, W. J. (1976). A comparison of several rank tests for the two-way layout (SAND76-0631). Albuquerque, NM: Sandia Laboratories.
  9. Sawilowsky, S. (1985). Robust and power analysis of the 2x2x2 ANOVA, rank transformation, random normal scores, and expected normal scores transformation tests. Unpublished doctoral dissertation, University of South Florida.
  10. Sawilowsky, S.; Blair, R. C.; Higgins, J. J. (1989). "An investigation of the type I error and power properties of the rank transform procedure in factorial ANOVA". Journal of Educational Statistics 14 (3): 255–267. doi:10.2307/1165018. 
  11. Blair, R. C.; Sawilowsky, S. S.; Higgins, J. J. (1987). "Limitations of the rank transform statistic in tests for interactions". Communications in Statistics - Simulation and Computation B16 (4): 1133–1145. doi:10.1080/03610918708812642. 
  12. Sawilowsky, S. (1990). "Nonparametric tests of interaction in experimental design". Review of Educational Research 60 (1): 91–126. doi:10.3102/00346543060001091. 
  13. Thompson, G. L. (1991). "A note on the rank transform for interactions". Biometrika 78 (3): 697–701. doi:10.1093/biomet/78.3.697. 
  14. Thompson, G. L.; Ammann, L. P. (1989). "Efficiencies of the rank-transform in two-way models with no interaction". Journal of the American Statistical Association 84 (405): 325–330. doi:10.1080/01621459.1989.10478773. 
  15. Blair, R. C.; Higgins, J. J. (1985). "A Comparison of the Power of the Paired Samples Rank Transform Statistic to that of Wilcoxon's Signed Ranks Statistic". Journal of Educational and Behavioral Statistics 10 (4): 368–383. doi:10.3102/10769986010004368. 
  16. Headrick, T. C. (1997). Type I error and power of the rank transform analysis of covariance (ANCOVA) in a 3 x 4 factorial layout. Unpublished doctoral dissertation, University of South Florida.
  17. Sawilowsky, S. (1985). "A comparison of random normal scores test under the F and Chi-square distributions to the 2x2x2 ANOVA test". Florida Journal of Educational Research 27: 83–97. https://feraonline.org/article/4-a-comparison-of-random-normal-scores-test-under-the-f-and-chi-square-distributions-to-the-2x2x2-anova-test/. 
  18. Kepner, James L.; Wackerly, Dennis D. (1996). "On Rank Transformation Techniques for Balanced Incomplete Repeated-Measures Designs". Journal of the American Statistical Association 91 (436): 1619–1625. doi:10.1080/01621459.1996.10476730. 
  19. Hettmansperger, T. P.; McKean, J. W. (1998). Robust nonparametric statistical methods. Kendall's Library of Statistics. 5 (First ed.). London: Edward Arnold. pp. xiv+467. ISBN 0-340-54937-8. 
  20. Sawilowsky, S. (1990). "Nonparametric tests of interaction in experimental design". Review of Educational Research 60: 91–126. doi:10.3102/00346543060001091.