Shapiro–Francia test

From HandWiki
Revision as of 22:17, 6 March 2023 by CodeMe (talk | contribs) (link)

The Shapiro–Francia test is a statistical test for the normality of a population, based on sample data. It was introduced by S. S. Shapiro and R. S. Francia in 1972 as a simplification of the Shapiro–Wilk test.[1]

Theory

Let [math]\displaystyle{ x_{(i)} }[/math] be the [math]\displaystyle{ i }[/math]-th ordered value from a sample of size [math]\displaystyle{ n }[/math]. For example, if the sample consists of the values [math]\displaystyle{ \left\{ 5.6, -1.2, 7.8, 3.4 \right\} }[/math], then [math]\displaystyle{ x_{(2)} = 3.4 }[/math], because that is the second-lowest value. Let [math]\displaystyle{ m_{i:n} }[/math] be the mean of the [math]\displaystyle{ i }[/math]-th order statistic when making [math]\displaystyle{ n }[/math] independent draws from a standard normal distribution. For example, [math]\displaystyle{ m_{2:4} \approx -0.297 }[/math], meaning that the second-lowest value in a sample of four draws from a normal distribution is typically about 0.297 standard deviations below the mean.[2] Form the Pearson correlation coefficient between the ordered values [math]\displaystyle{ x_{(i)} }[/math] and the expected values [math]\displaystyle{ m_{i:n} }[/math] (abbreviated [math]\displaystyle{ m_i }[/math] in the formula):
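The quoted value of [math]\displaystyle{ m_{2:4} }[/math] can be checked numerically. The following is a minimal sketch, assuming the standard formula for the density of the [math]\displaystyle{ i }[/math]-th order statistic of [math]\displaystyle{ n }[/math] independent standard normal draws. The function name `expected_normal_order_stat`, the integration range, and the step count are illustrative choices, not taken from the cited sources.

```python
from math import comb
from statistics import NormalDist


def expected_normal_order_stat(i, n, half_width=10.0, steps=20000):
    """Approximate m_{i:n}, the mean of the i-th smallest of n standard normal draws.

    Integrates x * f(x) with the composite trapezoid rule, where
    f(x) = i * C(n, i) * Phi(x)**(i - 1) * (1 - Phi(x))**(n - i) * phi(x)
    is the density of the i-th order statistic.
    """
    std = NormalDist()               # standard normal: mean 0, sd 1
    coeff = i * comb(n, i)           # equals n! / ((i - 1)! * (n - i)!)
    h = 2.0 * half_width / steps     # grid spacing on [-half_width, half_width]
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        cdf = std.cdf(x)
        density = coeff * cdf ** (i - 1) * (1.0 - cdf) ** (n - i) * std.pdf(x)
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * x * density
    return total * h


m_2_4 = expected_normal_order_stat(2, 4)
print(round(m_2_4, 3))
```

For [math]\displaystyle{ i = 2 }[/math] and [math]\displaystyle{ n = 4 }[/math] the result agrees with the value of about -0.297 quoted above. By symmetry, [math]\displaystyle{ m_{3:4} }[/math] has the same magnitude and the opposite sign.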

[math]\displaystyle{ W' = \frac{\operatorname{cov}(x, m)}{\sigma_x \sigma_m} = \frac{\sum_{i=1}^n (x_{(i)} - \bar{x}) (m_i - \bar{m})}{\sqrt{\left( \sum_{i=1}^n (x_{(i)} - \bar{x})^2 \right) \left( \sum_{i=1}^n (m_i - \bar{m})^2 \right)}} }[/math]
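As a worked illustration of this formula, the sketch below evaluates it for the four-value sample used earlier. The function names are hypothetical, and the expected order statistics for [math]\displaystyle{ n = 4 }[/math] are entered as tabulated values rounded to three decimals (only -0.297 appears in the text; the outer values of about ±1.029 are an added assumption).

```python
from math import sqrt


def pearson_correlation(x, m):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    m_bar = sum(m) / n
    numerator = sum((a - x_bar) * (b - m_bar) for a, b in zip(x, m))
    denominator = sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - m_bar) ** 2 for b in m)
    )
    return numerator / denominator


def correlation_statistic(sample, expected_order_stats):
    """Correlate the sorted sample with the expected normal order statistics."""
    return pearson_correlation(sorted(sample), expected_order_stats)


sample = [5.6, -1.2, 7.8, 3.4]
m_4 = [-1.029, -0.297, 0.297, 1.029]   # rounded tabulated values for n = 4
w_prime = correlation_statistic(sample, m_4)
print(round(w_prime, 3))
```

For this sample the correlation comes out near 0.98. Note that the original paper and some software, such as `sf.test` in the R package nortest, work with the square of this correlation; the sketch follows the formula as written here.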

Under the null hypothesis that the data are drawn from a normal distribution, this correlation will be strong, so [math]\displaystyle{ W' }[/math] values will cluster just under 1, with the peak becoming narrower and closer to 1 as [math]\displaystyle{ n }[/math] increases. If the data deviate strongly from a normal distribution, [math]\displaystyle{ W' }[/math] will be smaller.[1]
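This clustering can be illustrated with a small simulation. The sketch below draws repeated normal samples, computes the correlation for each, and summarizes the results for several sample sizes. It is only an illustration under stated assumptions: the expected order statistics are replaced by a common approximation (Blom-type normal scores), and the seed, trial count, and function names are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_scores(n):
    """Approximate m_{i:n} by Blom-type scores (an assumption, not exact values)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def correlation_statistic(sample, scores):
    """Pearson correlation between the sorted sample and the normal scores."""
    x = sorted(sample)
    n = len(x)
    x_bar = sum(x) / n
    s_bar = sum(scores) / n
    num = sum((a - x_bar) * (b - s_bar) for a, b in zip(x, scores))
    den = sqrt(
        sum((a - x_bar) ** 2 for a in x) * sum((b - s_bar) ** 2 for b in scores)
    )
    return num / den


def simulate_null(n, trials, rng):
    """Statistic values for `trials` independent normal samples of size n."""
    scores = normal_scores(n)
    return [
        correlation_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], scores)
        for _ in range(trials)
    ]


rng = random.Random(12345)        # fixed seed so the run is repeatable
summary = {}
for n in (10, 50, 200):
    values = simulate_null(n, 2000, rng)
    summary[n] = (mean(values), stdev(values))
    print(n, round(summary[n][0], 4), round(summary[n][1], 4))
```

In such a run the mean of the simulated values moves toward 1 and their spread shrinks as [math]\displaystyle{ n }[/math] grows, in line with the description above.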

This test is a formalization of the older practice of forming a Q–Q plot to compare two distributions, with the [math]\displaystyle{ x }[/math] playing the role of the quantile points of the sample distribution and the [math]\displaystyle{ m }[/math] playing the role of the corresponding quantile points of a normal distribution.

Compared to the Shapiro–Wilk test statistic [math]\displaystyle{ W }[/math], the Shapiro–Francia test statistic [math]\displaystyle{ W' }[/math] is easier to compute, because it does not require that we form and invert the matrix of covariances between order statistics.

Practice

There is no known closed-form analytic expression for the values of [math]\displaystyle{ m_{i:n} }[/math] required by the test. There are, however, several approximations that are adequate for most practical purposes.[2]
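One widely used approximation, usually attributed to Blom, evaluates the standard normal quantile function at [math]\displaystyle{ (i - \alpha)/(n - 2\alpha + 1) }[/math] with [math]\displaystyle{ \alpha = 0.375 }[/math]. The text above does not say which approximations are meant, so the choice of this one is an assumption made for illustration; the function name `blom_scores` is likewise hypothetical.

```python
from statistics import NormalDist


def blom_scores(n, alpha=0.375):
    """Blom-type approximation to the expected normal order statistics m_{i:n}."""
    std = NormalDist()
    return [
        std.inv_cdf((i - alpha) / (n - 2.0 * alpha + 1.0))
        for i in range(1, n + 1)
    ]


approx = blom_scores(4)
exact = [-1.029, -0.297, 0.297, 1.029]   # tabulated values, rounded to 3 decimals
for a, e in zip(approx, exact):
    print(f"approx {a:+.3f}   exact {e:+.3f}")
```

For [math]\displaystyle{ n = 4 }[/math] the approximate scores differ from the tabulated values by a few hundredths at most, with the largest discrepancy at the extremes.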

The exact form of the null distribution of [math]\displaystyle{ W' }[/math] is known only for [math]\displaystyle{ n=3 }[/math].[1] Monte Carlo simulations have shown that the transformed statistic [math]\displaystyle{ \ln(1-W') }[/math] is nearly normally distributed, with a mean and standard deviation that vary slowly with [math]\displaystyle{ n }[/math] in an easily parameterized form.[3]
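A p-value can be obtained from this transformation by standardizing [math]\displaystyle{ \ln(1-W') }[/math] and taking the upper tail of the normal distribution. The sketch below uses the parameterization found in the `sf.test` function of the R package nortest, where the statistic is the squared correlation and the sample size is restricted to between 5 and 5000. The coefficients are not given in the text above, so they are an assumption here and should be checked against Royston's paper before use.

```python
from math import log
from statistics import NormalDist


def royston_p_value(w, n):
    """Approximate p-value from the normalizing transformation of ln(1 - w).

    `w` is assumed to be the squared correlation between the ordered sample
    and the normal scores; the coefficients are assumed, not from the text.
    """
    if not 5 <= n <= 5000:
        raise ValueError("sample size must be between 5 and 5000")
    if not 0.0 < w < 1.0:
        raise ValueError("statistic must lie strictly between 0 and 1")
    u = log(n)
    v = log(u)
    mu = -1.2725 + 1.0521 * (v - u)            # approximate mean of ln(1 - w)
    sigma = 1.0308 - 0.26758 * (v + 2.0 / u)   # approximate standard deviation
    z = (log(1.0 - w) - mu) / sigma
    return 1.0 - NormalDist().cdf(z)           # small w gives large z, small p


p_high = royston_p_value(0.98, 20)   # statistic close to 1
p_low = royston_p_value(0.85, 20)    # statistic well below 1
print(round(p_high, 3), round(p_low, 4))
```

Values of the statistic close to 1 give large p-values, while markedly smaller values give small p-values and lead to rejection of normality.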

Power

Comparison studies have concluded that order statistic correlation tests such as Shapiro–Francia and Shapiro–Wilk are among the most powerful of the established statistical tests for normality.[4] One might assume that the covariance-adjusted weighting of different order statistics used by the Shapiro–Wilk test should make it slightly better, but in practice the Shapiro–Wilk and Shapiro–Francia variants are about equally good. In fact, the Shapiro–Francia variant exhibits more power against some alternative hypotheses.[5]

References

  1. Shapiro, S. S.; Francia, R. S. (1972-03-01). "An Approximate Analysis of Variance Test for Normality". Journal of the American Statistical Association (American Statistical Association) 67 (337): 215–216. doi:10.2307/2284728. ISSN 1537-274X. OCLC 1480864.
  2. Arnold, Barry C.; Balakrishnan, Narayanaswamy; Nagaraja, Haikady N. (2008). A First Course in Order Statistics. Classics in Applied Mathematics. 54. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 978-0-89871-648-1.
  3. Royston, Patrick (1993). "A Toolkit for Testing for Non-Normality in Complete and Censored Samples". The Statistician (Royal Statistical Society) 42 (1): 37–43. doi:10.2307/2348109. 
  4. Razali, Nornadiah Mohd; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling Tests". Journal of Statistical Modeling and Analytics (Kuala Lumpur: Institut Statistik Malaysia) 2 (1): 21–33. ISBN 978-967-363-157-5. https://www.researchgate.net/publication/267205556. 
  5. Ahmad, Fiaz; Khan, Rehan Ahmad (2015). "A power comparison of various normality tests". Pakistan Journal of Statistics and Operation Research (Lahore, Pakistan: College of Statistical and Actuarial Sciences, University of the Punjab) 11 (3): 331–345. doi:10.18187/pjsor.v11i3.845. ISSN 2220-5810. https://pjsor.com/pjsor/article/download/845/437.