Goodness of fit
The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.
Fit of distributions
In assessing whether a given distribution is suited to a dataset, the following tests and their underlying measures of fit can be used:
 Bayesian information criterion
 Kolmogorov–Smirnov test
 Cramér–von Mises criterion
 Anderson–Darling test
 Berk–Jones tests^{[1]}^{[2]}
 Shapiro–Wilk test
 Chi-squared test
 Akaike information criterion
 Hosmer–Lemeshow test
 Kuiper's test
 Kernelized Stein discrepancy^{[3]}^{[4]}
 Zhang's Z_{K}, Z_{C} and Z_{A} tests^{[5]}
 Moran test
 Density-based empirical likelihood ratio tests^{[6]}
Regression analysis
In regression analysis, more specifically regression validation, the following topics relate to goodness of fit:
 Coefficient of determination (the R-squared measure of goodness of fit)
 Lack-of-fit sum of squares
 Mallows's C_{p} criterion
 Prediction error
 Reduced chi-square
Categorical data
The following are examples that arise in the context of categorical data.
Pearson's chi-square test
Pearson's chi-square test uses a measure of goodness of fit that is the sum, over all bins, of the squared difference between the observed and expected outcome frequencies (that is, counts of observations), each divided by the expectation:
[math]\displaystyle{ \chi^2 = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i} }[/math] where:
 O_{i} = an observed count for bin i
 E_{i} = an expected count for bin i, asserted by the null hypothesis.
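As a minimal sketch, the statistic defined above can be computed directly from paired lists of observed and expected counts. The function name `pearson_chi2` and the three-bin counts are illustrative assumptions, not taken from any library or dataset.

```python
def pearson_chi2(observed, expected):
    """Return the sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three bins, for illustration only.
example_statistic = pearson_chi2([18, 22, 20], [20, 20, 20])
print(example_statistic)
```

Bins with a large gap between observed and expected counts, relative to the expected count, contribute most to the total.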
The expected frequency is calculated by: [math]\displaystyle{ E_i \, = \, \bigg( F(Y_u) \, - \, F(Y_l) \bigg) \, N }[/math] where:
 F = the cumulative distribution function for the probability distribution being tested.
 Y_{u} = the upper limit for class i,
 Y_{l} = the lower limit for class i, and
 N = the sample size
The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (k − c) degrees of freedom, where k is the number of non-empty cells and c is the number of estimated parameters (including location and scale parameters and shape parameters) for the distribution plus one. For example, for a three-parameter Weibull distribution, c = 4.
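The expected-frequency formula and the degrees-of-freedom rule can be sketched as follows. This is an illustration under stated assumptions: the tested distribution is a fully specified standard normal (so no parameters are estimated and c = 1), and the bin edges, sample size, and helper names `normal_cdf` and `expected_counts` are hypothetical choices.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function F of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_counts(cdf, edges, n):
    """E_i = (F(Y_u) - F(Y_l)) * N for each pair of consecutive bin edges."""
    return [(cdf(upper) - cdf(lower)) * n
            for lower, upper in zip(edges, edges[1:])]


# Assumed class limits covering the whole real line, and an assumed N of 100.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
expected = expected_counts(normal_cdf, edges, n=100)

k = len(expected)          # number of cells (all non-empty in this sketch)
c = 0 + 1                  # zero estimated parameters, plus one
degrees_of_freedom = k - c

print([round(e, 2) for e in expected], degrees_of_freedom)
```

Because the bins cover the full support of the distribution, the expected counts sum to N.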
Example: equal frequencies of men and women
For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then
[math]\displaystyle{ \chi^2 = {(44 - 50)^2 \over 50} + {(56 - 50)^2 \over 50} = 1.44 }[/math]
If the null hypothesis is true (i.e., men and women are chosen with equal probability in the sample), the test statistic will be drawn from a chi-square distribution with one degree of freedom. Though one might expect two degrees of freedom (one each for the men and women), we must take into account that the total number of men and women is constrained (100), and thus there is only one degree of freedom (2 − 1). In other words, if the male count is known the female count is determined, and vice versa.
Consulting the chi-square distribution with 1 degree of freedom shows that the probability of observing a statistic at least as large as [math]\displaystyle{ \chi^2=1.44 }[/math] when men and women are equally numerous in the population is approximately 0.23. This probability is higher than the conventional thresholds for statistical significance (probabilities of 0.001 to 0.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e., we would consider our sample to be within the range expected for a 50/50 male/female ratio).
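The arithmetic of this example can be checked with a short sketch that uses only the Python standard library. It relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability at x equals erfc(sqrt(x / 2)); the variable names are illustrative.

```python
import math

observed = [44, 56]   # men, women in the sample
expected = [50, 50]   # counts under the 50/50 null hypothesis

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability for 1 degree of freedom:
# P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(round(chi2, 2), round(p_value, 2))  # prints: 1.44 0.23
```

This shortcut applies only to one degree of freedom; for other degrees of freedom a general chi-square survival function would be needed.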
Note the assumption that the mechanism that has generated the sample is random, in the sense of independent random selection with the same probability, here 0.5 for both males and females. If, for example, each of the 44 males selected brought a male buddy, and each of the 56 females brought a female buddy, each [math]\displaystyle{ {(O_i - E_i)}^2 }[/math] will increase by a factor of 4, while each [math]\displaystyle{ E_i }[/math] will increase by a factor of 2. The value of the statistic will double to 2.88. Knowing this underlying mechanism, we should of course be counting pairs. In general, the mechanism, if not defensibly random, will not be known. The distribution to which the test statistic should be referred may, accordingly, be very different from chi-square.^{[7]}
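The doubling described here can be reproduced numerically. In this sketch the helper name `pearson_chi2` is an illustrative choice; the counts are those of the example, once for individuals and once with every person counted together with a same-sex buddy.

```python
def pearson_chi2(observed, expected):
    """Sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Original sample of 100 independent individuals.
single = pearson_chi2([44, 56], [50, 50])

# Same sample with each person bringing a same-sex buddy: all counts double,
# but the number of independent selections is still 100.
paired = pearson_chi2([88, 112], [100, 100])

print(round(single, 2), round(paired, 2))  # prints: 1.44 2.88
```

The statistic grows even though no new independent information has been collected, which is why referring it to a chi-square distribution would be misleading in this situation.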
Binomial case
A binomial experiment is a sequence of independent trials in which each trial results in one of two outcomes, success or failure. There are n trials, each with probability of success denoted by p. More generally, with k cells, writing N_{i} for the observed count in cell i and p_{i} for its probability, and provided that np_{i} ≫ 1 for every i (where i = 1, 2, ..., k), then
[math]\displaystyle{ \chi^2 = \sum_{i=1}^{k} {\frac{(N_i - np_i)^2}{np_i}} = \sum_{\mathrm{all\ cells}}^{} {\frac{(\mathrm{O} - \mathrm{E})^2}{\mathrm{E}}}. }[/math]
This has approximately a chi-square distribution with k − 1 degrees of freedom. The k − 1 degrees of freedom are a consequence of the restriction [math]\displaystyle{ \sum N_i=n }[/math]. There are k observed cell counts, but once any k − 1 of them are known, the remaining one is uniquely determined. In other words, only k − 1 cell counts are freely determined, hence k − 1 degrees of freedom.
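One informal way to see the k − 1 degrees of freedom is by simulation: a chi-square distribution with k − 1 degrees of freedom has mean k − 1, so the average of the statistic over many simulated samples should be close to that value. The sketch below is a rough check under assumed settings (the cell probabilities, n, the number of replications, and the seed are all arbitrary illustrative choices).

```python
import random


def pearson_chi2(counts, probs, n):
    """Sum over cells of (N_i - n p_i)^2 / (n p_i)."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def simulate_mean_statistic(probs, n, replications, seed=0):
    """Average the statistic over repeated samples of n independent trials."""
    rng = random.Random(seed)
    k = len(probs)
    total = 0.0
    for _ in range(replications):
        counts = [0] * k
        for cell in rng.choices(range(k), weights=probs, k=n):
            counts[cell] += 1
        total += pearson_chi2(counts, probs, n)
    return total / replications


probs = [0.1, 0.2, 0.3, 0.4]   # assumed cell probabilities, k = 4
mean_stat = simulate_mean_statistic(probs, n=200, replications=2000)
print(round(mean_stat, 2))      # should be close to k - 1 = 3
```

Here the smallest expected count is n p = 20, so the condition np_{i} ≫ 1 is reasonably satisfied for every cell.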
G-test
G-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended.^{[8]}
The general formula for G is
 [math]\displaystyle{ G = 2\sum_{i} {O_{i} \cdot \ln\left(\frac{O_i}{E_i}\right)}, }[/math]
where [math]\displaystyle{ O_i }[/math] and [math]\displaystyle{ E_i }[/math] are the same as for the chi-square test, [math]\displaystyle{ \ln }[/math] denotes the natural logarithm, and the sum is taken over all non-empty cells. Furthermore, the total observed count should be equal to the total expected count: [math]\displaystyle{ \sum_i O_i = \sum_i E_i = N }[/math] where [math]\displaystyle{ N }[/math] is the total number of observations.
G-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.^{[9]}
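As a minimal sketch of the formula for G, the function below sums over non-empty cells and checks that the observed and expected totals agree. The name `g_statistic` is an illustrative choice, and the counts reuse the 44 men and 56 women from the equal-frequency example earlier in the article so that G can be compared with the Pearson value of 1.44.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum of O_i * ln(O_i / E_i) over non-empty cells."""
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("observed and expected totals must be equal")
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected)
                     if o > 0)  # empty cells contribute nothing to the sum


g = g_statistic([44, 56], [50, 50])
print(round(g, 3))  # about 1.443, close to the Pearson value of 1.44
```

For these counts the two statistics nearly coincide, so both lead to the same conclusion when referred to a chi-square distribution with one degree of freedom.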
See also
 All models are wrong
 Deviance (statistics) (related to GLM)
 Overfitting
 Statistical model validation
 Theil–Sen estimator
References
 [1] Berk, Robert H.; Jones, Douglas H. (1979). "Goodness-of-fit test statistics that dominate the Kolmogorov statistics". Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 47 (1): 47–59. doi:10.1007/BF00533250.
 [2] Moscovich, Amit; Nadler, Boaz; Spiegelman, Clifford (2016). "On the exact Berk–Jones statistics and their p-value calculation". Electronic Journal of Statistics 10 (2). doi:10.1214/16-EJS1172.
 [3] Liu, Qiang; Lee, Jason; Jordan, Michael (20 June 2016). "A Kernelized Stein Discrepancy for Goodness-of-fit Tests". The 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 276–284. http://proceedings.mlr.press/v48/liub16.html.
 [4] Chwialkowski, Kacper; Strathmann, Heiko; Gretton, Arthur (20 June 2016). "A Kernel Test of Goodness of Fit". The 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 2606–2615. http://proceedings.mlr.press/v48/chwialkowski16.html.
 [5] Zhang, Jin (2002). "Powerful goodness-of-fit tests based on the likelihood ratio". J. R. Stat. Soc. B 64 (2): 281–294. doi:10.1111/1467-9868.00337. http://anakena.dcc.uchile.cl/~mnmonsal/eso.pdf. Retrieved 5 November 2018.
 [6] Vexler, Albert; Gurevich, Gregory (2010). "Empirical Likelihood Ratios Applied to Goodness-of-Fit Tests Based on Sample Entropy". Computational Statistics and Data Analysis 54 (2): 531–545. doi:10.1016/j.csda.2009.09.025.
 [7] Maindonald, J. H.; Braun, W. J. (2010). Data Analysis and Graphics Using R. An Example-Based Approach. (Third ed.). New York: Cambridge University Press. pp. 116–118. ISBN 9780521762939. https://archive.org/details/dataanalysisgrap00main_071.
 [8] McDonald, J.H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58. http://www.biostathandbook.com/gtestgof.html.
 [9] Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). W. H. Freeman. ISBN 0716724111. https://archive.org/details/biometryprincipl00soka_0.
Further reading
 Huber-Carol, C.; Balakrishnan, N.; Nikulin, M. S. et al., eds. (2002), Goodness-of-Fit Tests and Model Validity, Springer
 Ingster, Yu. I.; Suslina, I. A. (2003), Nonparametric Goodness-of-Fit Testing Under Gaussian Models, Springer
 Rayner, J. C. W.; Thas, O.; Best, D. J. (2009), Smooth Tests of Goodness of Fit (2nd ed.), Wiley
 "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy", Computational Statistics & Data Analysis 54 (2): 531–545, 2010, doi:10.1016/j.csda.2009.09.025
Original source: https://en.wikipedia.org/wiki/Goodness of fit.