G-test


In statistics, G-tests are likelihood-ratio or maximum likelihood statistical significance tests that are increasingly being used in situations where chi-squared tests were previously recommended.^[1]

Formulation

The general formula for the G-test statistic is

G = 2 \sum_{i} O_{i} \cdot \ln (\frac{O_{i}}{E_{i}}),

where $O_{i} \geq 0$ is the observed count in a cell, $E_{i} > 0$ is the expected count under the null hypothesis, $\ln$ denotes the natural logarithm, and the sum is taken over all non-empty cells. The resulting $G$ is asymptotically chi-squared distributed as the total number of observations tends to infinity (convergence in distribution^[2]).

Furthermore, the total observed count must be equal to the total expected count:

\sum_{i} O_{i} = \sum_{i} E_{i} = N,

where $N$ is the total number of observations.
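The formula and its two side conditions can be illustrated with a minimal Python sketch. The function name `g_statistic` and the sample counts are hypothetical, chosen only for illustration; the code assumes plain lists of counts and uses only the standard library.

```python
import math

def g_statistic(observed, expected):
    """Return G = 2 * sum(O_i * ln(O_i / E_i)) over cells with O_i > 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be strictly positive")
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("observed and expected totals must match")
    # Empty cells (O_i = 0) contribute nothing to the sum and are skipped.
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical data: 100 observations tested against a 1:1 null ratio.
observed = [70, 30]
expected = [50, 50]
g = g_statistic(observed, expected)
print(f"G = {g:.4f}")
```

The check on the totals mirrors the requirement that the observed and expected counts both sum to $N$.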

Both the G-test statistic $G$ and the chi-squared test statistic $χ^{2}$ are special cases of the general family of power divergence statistics introduced by Cressie and Read.^[2] For $λ \notin \{0, - 1\}$, set

{CR}_{λ} = \frac{2}{λ (λ + 1)} \sum_{i} O_{i} ({(\frac{O_{i}}{E_{i}})}^{λ} - 1) .

Then,

G = \lim_{λ \to 0} {CR}_{λ}, χ^{2} = {CR}_{1} .
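The two special cases can be checked numerically. The following sketch evaluates ${CR}_{λ}$ at $λ = 1$ and at a small $λ$ close to zero; the function names and the counts are hypothetical, and all observed counts are assumed to be positive so that negative powers are not an issue.

```python
import math

def cressie_read(observed, expected, lam):
    """Power divergence statistic CR_lambda for lambda not in {0, -1}."""
    if lam in (0, -1):
        raise ValueError("lambda = 0 and lambda = -1 are defined only as limits")
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(observed, expected))
    return 2.0 / (lam * (lam + 1.0)) * total

def g_statistic(observed, expected):
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts with equal totals (N = 100).
observed = [18, 55, 27]
expected = [25, 50, 25]

cr_one = cressie_read(observed, expected, 1.0)      # should match Pearson's statistic
cr_near_zero = cressie_read(observed, expected, 1e-6)  # should be close to G
print(cr_one, pearson_chi2(observed, expected))
print(cr_near_zero, g_statistic(observed, expected))
```

The agreement at $λ = 1$ relies on the totals being equal, since ${CR}_{1} = \sum_{i} O_{i}^{2} / E_{i} - N$.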

Derivation

We can derive the value of the G-test from the log-likelihood ratio test where the underlying model is a multinomial model.

Suppose we had a sample $O = (O_{1}, \dots, O_{m})$ where each $O_{i}$ is the number of times that an object of type $i$ was observed. Furthermore, let $N = \sum_{i = 1}^{m} O_{i}$ be the total number of observations. If we assume that the underlying model is multinomial, then the test statistic is defined by

\ln (\frac{L (\tilde{p} | O)}{L (\hat{p} | O)}) = \ln (\frac{\prod_{i = 1}^{m} {\tilde{p}}_{i}^{O_{i}}}{\prod_{i = 1}^{m} {\hat{p}}_{i}^{O_{i}}}),

where $\tilde{p} = ({\tilde{p}}_{1}, \dots, {\tilde{p}}_{m})$ is the parameter vector specified by the null hypothesis and $\hat{p} = ({\hat{p}}_{1}, \dots, {\hat{p}}_{m})$ is the maximum likelihood estimate (MLE) of the parameters given the data. Recall that for the multinomial model, the MLE ${\hat{p}}_{i}$ is given by

{\hat{p}}_{i} = \frac{O_{i}}{N} .

Furthermore, we may represent each null hypothesis parameter ${\tilde{p}}_{i}$ as

{\tilde{p}}_{i} = \frac{E_{i}}{N},

where $E_{i}$ is the expected count of objects of type $i$ under the null hypothesis. Thus, by substituting the representations of ${\tilde{p}}_{i}$ and ${\hat{p}}_{i}$ in the log-likelihood ratio, the equation simplifies to

\ln (\frac{L (\tilde{p} | O)}{L (\hat{p} | O)}) = \ln (\prod_{i = 1}^{m} {(\frac{E_{i}}{O_{i}})}^{O_{i}}) = \sum_{i = 1}^{m} O_{i} \ln (\frac{E_{i}}{O_{i}})

Finally, multiply by a factor of $- 2$ (which makes the G-test statistic asymptotically equivalent to Pearson's chi-squared test statistic) to obtain the form

G = - 2 \sum_{i = 1}^{m} O_{i} \ln (\frac{E_{i}}{O_{i}}) = 2 \sum_{i = 1}^{m} O_{i} \ln (\frac{O_{i}}{E_{i}})

Heuristically, one can imagine $O_{i}$ as continuous and approaching zero, in which case $O_{i} \ln O_{i} \to 0$, so terms with zero observations can simply be dropped. However, the expected count must be strictly greater than zero in every cell ($E_{i} > 0$ for all $i$) for the method to apply.
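The derivation can be verified numerically by evaluating the two multinomial log-likelihoods directly. The sketch below is illustrative only: the helper name, the counts, and the null probabilities are hypothetical. It includes the multinomial coefficient, which cancels in the ratio, and one empty cell to show the zero-count convention.

```python
import math

def multinomial_log_likelihood(probs, counts):
    """ln L(p | O) for a multinomial model, including the multinomial coefficient."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Cells with a zero count are skipped, following the 0 * ln 0 = 0 convention.
    return log_coeff + sum(c * math.log(p)
                           for p, c in zip(probs, counts) if c > 0)

counts = [12, 0, 30, 18]                # hypothetical observations, one empty cell
null_probs = [0.25, 0.05, 0.40, 0.30]   # hypothetical null hypothesis (p-tilde)

n = sum(counts)
mle_probs = [c / n for c in counts]      # p-hat_i = O_i / N
expected = [n * p for p in null_probs]   # E_i = N * p-tilde_i

log_ratio = (multinomial_log_likelihood(null_probs, counts)
             - multinomial_log_likelihood(mle_probs, counts))
g_from_ratio = -2.0 * log_ratio
g_direct = 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(counts, expected) if o > 0)
print(g_from_ratio, g_direct)
```

Both values agree up to floating-point rounding, and the statistic is non-negative because the MLE maximizes the likelihood.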

Distribution and use

Given the null hypothesis that the observed frequencies result from random sampling from a distribution with the given expected frequencies, the distribution of the test statistic $G$ is approximately a chi-squared distribution, with the same number of degrees of freedom as in the corresponding chi-squared test.
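As an illustration of how a p-value follows from this approximation, the sketch below computes the upper tail of a chi-squared distribution with integer degrees of freedom using only the Python standard library. The helper `chi2_sf` is a hypothetical name; it relies on the recurrence for the regularized upper incomplete gamma function. The example assumes a goodness-of-fit test with a fully specified null hypothesis, for which the degrees of freedom equal the number of cells minus one. In practice a library routine would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed forms Q(1/2, h) = erfc(sqrt(h)) and Q(1, h) = exp(-h), followed by
    # the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical goodness-of-fit data: two cells, null ratio 1:1.
observed = [70, 30]
expected = [50, 50]
g = 2.0 * sum(o * math.log(o / e)
              for o, e in zip(observed, expected) if o > 0)
df = len(observed) - 1
p_value = chi2_sf(g, df)
print(f"G = {g:.3f}, df = {df}, p = {p_value:.2e}")
```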

For very small samples, the multinomial test for goodness of fit, Fisher's exact test for contingency tables, or even Bayesian hypothesis selection is preferable to the G-test.^[3] McDonald recommends always using an exact test (exact test of goodness-of-fit, Fisher's exact test) if the total sample size is less than 1000.

"There is nothing magical about a sample size of 1000, it's just a nice round number that is well within the range where an exact test, chi-square test, and G–test will give almost identical p values. Spreadsheets, web-page calculators, and SAS shouldn't have any problem doing an exact test on a sample size of 1000."

— John H. McDonald (2014)^[3]

G-tests have been recommended at least since the 1981 edition of Biometry, a statistics textbook by Robert R. Sokal and F. James Rohlf.^[4]

Relation to other metrics

Relation to the chi-squared test

The commonly used chi-squared tests for goodness of fit to a distribution and for independence in contingency tables are in fact approximations of the log-likelihood ratio on which the G-tests are based.^[5]

The general formula for Pearson's chi-squared test statistic is

χ^{2} = \sum_{i} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} .

The approximation of the G-test statistic by the chi-squared test statistic is obtained by a second-order Taylor expansion of the natural logarithm around 1 (see the derivation below). We have $G \approx χ^{2}$ when the observed counts $O_{i}$ are close to the expected counts $E_{i}$. When the difference is large, however, the approximation by the chi-squared test statistic begins to break down. Here, the effects of outliers in the data will be more pronounced, which explains why chi-squared tests fail in situations with little data.

For samples of a reasonable size, the G-test and the chi-squared test will lead to the same conclusions. However, the approximation to the theoretical chi-squared distribution is better for the G-test than for Pearson's chi-squared test.^[6] In cases where $O_{i} > 2 \cdot E_{i}$ for some cell, the G-test is always better than the chi-squared test. For testing goodness-of-fit, the G-test is infinitely more efficient than the chi-squared test in the sense of Bahadur, but the two tests are equally efficient in the sense of Pitman or in the sense of Hodges and Lehmann.^[7]^[8]

Derivation (chi-squared)

Consider

G = 2 \sum_{i} O_{i} \ln (\frac{O_{i}}{E_{i}}),

and let $O_{i} = E_{i} + δ_{i}$ with $\sum_{i} δ_{i} = 0$, so that the total number of counts remains the same. Assume that $δ_{i} = O_{i} - E_{i}$ is small in comparison to $E_{i}$ for all $i$. To be more precise, write $n$ for the total number of observations and notice that $E_{i} = Θ (n)$ using big Θ notation. If $O_{i} = E_{i} + 𝒪 (n^{1 / 2})$ using big O notation for large $n$, which should be true under the null hypothesis because of the central limit theorem, then $δ_{i} = 𝒪 (n^{1 / 2})$ and

\frac{δ_{i}^{3}}{E_{i}^{2}} = 𝒪 (\frac{n^{3 / 2}}{n^{2}}) = 𝒪 (n^{- 1 / 2})

follow, which will be used later.

Upon substitution we find,

G = 2 \sum_{i} (E_{i} + δ_{i}) \ln (1 + \frac{δ_{i}}{E_{i}}) .

Using the Taylor expansion $\ln (1 + x) = x - \frac{1}{2} x^{2} + 𝒪 (x^{3})$ yields

G = 2 \sum_{i} (E_{i} + δ_{i}) (\frac{δ_{i}}{E_{i}} - \frac{1}{2} \frac{δ_{i}^{2}}{E_{i}^{2}} + 𝒪 (\frac{δ_{i}^{3}}{E_{i}^{3}})),

and distributing terms we find,

G = 2 \sum_{i} (δ_{i} + \frac{1}{2} \frac{δ_{i}^{2}}{E_{i}} + 𝒪 (\frac{δ_{i}^{3}}{E_{i}^{2}})) .

Now, using $\sum_{i} δ_{i} = 0$ and $δ_{i} = O_{i} - E_{i}$ and $𝒪 (δ_{i}^{3} / E_{i}^{2}) = 𝒪 (n^{- 1 / 2})$ for large $n$ , we can write the result,

G \approx \sum_{i} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}} .
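The quality of this approximation can be illustrated numerically. In the sketch below the counts are hypothetical: one data set deviates from the expected counts by about 1%, the other by 30% or more. The leading error term of the expansion, $- \frac{1}{3} \sum_{i} δ_{i}^{3} / E_{i}^{2}$, is negligible in the first case and substantial in the second.

```python
import math

def g_statistic(observed, expected):
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [1000.0, 1000.0, 1000.0]
close = [1010.0, 995.0, 995.0]   # small deviations, delta_i / E_i around 1%
far = [1800.0, 700.0, 500.0]     # large deviations, delta_i / E_i of 30% or more

for label, observed in (("close", close), ("far", far)):
    g = g_statistic(observed, expected)
    chi2 = pearson_chi2(observed, expected)
    print(f"{label}: G = {g:.5f}, chi2 = {chi2:.5f}, difference = {g - chi2:.5f}")

gap_close = abs(g_statistic(close, expected) - pearson_chi2(close, expected))
gap_far = abs(g_statistic(far, expected) - pearson_chi2(far, expected))
```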

Relation to Kullback–Leibler divergence

The G-test statistic is proportional to the Kullback–Leibler divergence of the theoretical distribution $\tilde{p} = ({\tilde{p}}_{1}, \dots, {\tilde{p}}_{m})$ of the null hypothesis from the empirical distribution $\hat{p} = ({\hat{p}}_{1}, \dots, {\hat{p}}_{m})$ of the observed data:

\begin{aligned} G & = 2 \sum_{i} O_{i} \cdot \ln (\frac{O_{i}}{E_{i}}) = 2 N \sum_{i} {\hat{p}}_{i} \cdot \ln (\frac{{\hat{p}}_{i}}{{\tilde{p}}_{i}}) \\ & = 2 N D_{\mathrm{KL}} (\hat{p} \| \tilde{p}), \end{aligned}

where $N$ is the total number of observations and ${\tilde{p}}_{i} = \frac{E_{i}}{N}$ and ${\hat{p}}_{i} = \frac{O_{i}}{N}$ are the theoretical and empirical probabilities of objects of type $i$ , respectively.
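The identity can be checked with a short sketch. The function name `kl_divergence` and the counts are hypothetical; the divergence is computed in nats (natural logarithm), matching the definition of $G$.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical observed and expected counts with equal totals (N = 100).
observed = [43, 31, 26]
expected = [40.0, 40.0, 20.0]

n = sum(observed)
p_hat = [o / n for o in observed]     # empirical probabilities
p_tilde = [e / n for e in expected]   # theoretical probabilities

g_direct = 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
g_from_kl = 2.0 * n * kl_divergence(p_hat, p_tilde)
print(g_direct, g_from_kl)
```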

Relation to mutual information

For the analysis of contingency tables, the value of the G-test statistic can also be expressed in terms of mutual information.

In this case objects with two-dimensional types $(i, j)$ are considered. Let $O_{i j}$ be the count of objects of type $(i, j)$ , i.e., $O_{i j}$ is the entry in the contingency table in row $i$ and column $j$ . Set

N = \sum_{i j} O_{i j}, {\hat{p}}_{i j} = \frac{O_{i j}}{N}, {\hat{p}}_{i ∙} = \frac{\sum_{j} O_{i j}}{N}, {\hat{p}}_{∙ j} = \frac{\sum_{i} O_{i j}}{N} .

Then the estimated expected count of objects of type $(i, j)$ assuming independence is given by

E_{i j} = N {\hat{p}}_{i ∙} {\hat{p}}_{∙ j} .

Finally, the G-test statistic in this case is given by

G = 2 \sum_{i j} O_{i j} \ln (\frac{O_{i j}}{E_{i j}})

Let $X, Y$ be random variables with joint distribution given by the empirical distribution ${\hat{p}}_{i j}$ of the contingency table, i.e.,

P (X = i, Y = j) = {\hat{p}}_{i j}, P (X = i) = {\hat{p}}_{i ∙}, P (Y = j) = {\hat{p}}_{∙ j} .

Then the G-test statistic can be expressed in several alternative forms:

\begin{aligned} G & = 2 N \cdot \sum_{i j} {\hat{p}}_{i j} (\ln ({\hat{p}}_{i j}) - \ln ({\hat{p}}_{i ∙}) - \ln ({\hat{p}}_{∙ j})) \\ & = 2 N \cdot (H (X) + H (Y) - H (X, Y)) \\ & = 2 N \cdot MI (X, Y), \end{aligned}

where the entropies $H (X)$ and $H (Y)$ are given by

H (X) = - \sum_{i} {\hat{p}}_{i ∙} \ln ({\hat{p}}_{i ∙}), H (Y) = - \sum_{j} {\hat{p}}_{∙ j} \ln ({\hat{p}}_{∙ j})

and the joint entropy $H (X, Y)$ is given by

H (X, Y) = - \sum_{i j} {\hat{p}}_{i j} \ln ({\hat{p}}_{i j})

and the mutual information of $X$ and $Y$ is

MI (X, Y) = H (X) + H (Y) - H (X, Y) .
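The equivalence between the two expressions can be illustrated on a small table. In the sketch below, the function names and the 2×2 counts are hypothetical; the table is assumed to be a list of rows of non-negative counts with no empty row or column. The first function follows the definition via $E_{i j}$, the second uses the entropies of the empirical distribution.

```python
import math

def g_independence(table):
    """G statistic for independence, computed from estimated expected counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                e = row_totals[i] * col_totals[j] / n  # E_ij = N * p_i. * p_.j
                total += o * math.log(o / e)
    return 2.0 * total

def entropy(probs):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(table):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for the empirical distribution."""
    n = sum(sum(row) for row in table)
    joint = [o / n for row in table for o in row]
    rows = [sum(row) / n for row in table]
    cols = [sum(col) / n for col in zip(*table)]
    return entropy(rows) + entropy(cols) - entropy(joint)

# Hypothetical 2x2 contingency table with N = 100.
table = [[30, 10],
         [20, 40]]
n = sum(sum(row) for row in table)
g = g_independence(table)
g_from_mi = 2.0 * n * mutual_information(table)
print(g, g_from_mi)
```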

It can also be shown that the inverse document frequency weighting commonly used for text retrieval is an approximation of $G$ applicable when the row sum for the query is much smaller than the row sum for the remainder of the corpus. Similarly, the result of Bayesian inference applied to a choice of a single multinomial distribution for all rows of the contingency table taken together versus the more general alternative of a separate multinomial per row produces results very similar to the G-test statistic.[citation needed]

Application

The McDonald–Kreitman test in statistical genetics is an application of the G-test.
Dunning^[9] introduced the test to the computational linguistics community where it is now widely used.
The R-scape program (used by Rfam) uses G-test to detect co-variation between RNA sequence alignment positions.^[10]

Statistical software

In R, fast implementations can be found in the AMR and Rfast packages. For the AMR package, the command is g.test, which works exactly like chisq.test from base R. R also has the likelihood.test function in the Deducer package. Note: Fisher's G-test in the GeneCycle package of the R programming language (fisher.g.test) does not implement the G-test as described in this article, but rather Fisher's exact test of Gaussian white noise in a time series.^[11]
Another R implementation that computes the G-test statistic and corresponding p-values is provided by the R package entropy. The commands are Gstat for the standard G statistic and the associated p-value, and Gstatindep for the G statistic applied to comparing joint and product distributions to test independence.
In SAS, one can conduct a G-test by applying the /chisq option after proc freq.^[12]
In Stata, one can conduct a G-test by applying the lr option after the tabulate command.
In Java, use org.apache.commons.math3.stat.inference.GTest.^[13]
In Python, use scipy.stats.power_divergence with lambda_=0.^[14]

References

[1] McDonald, J.H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58. http://www.biostathandbook.com/gtestgof.html.
[2] Cressie, Noel; Read, Timothy R. C. (1984). "Multinomial goodness-of-fit tests". Journal of the Royal Statistical Society. Series B. Methodological 46 (3): 440–464. https://www.jstor.org/stable/2345686. Retrieved 14 January 2026.
[3] McDonald, John H. (2014). "Small numbers in chi-square and G–tests". Handbook of Biological Statistics (3rd ed.). Baltimore, MD: Sparky House Publishing. pp. 86–89. http://www.biostathandbook.com/small.html.
[4] Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). New York: Freeman. ISBN 978-0-7167-2411-7. https://archive.org/details/biometryprincipl00soka_0.
[5] Hoey, J. (2012). "The Two-Way Likelihood Ratio (G) Test and Comparison to Two-Way Chi-Squared Test". arXiv:1206.4881 [stat.ME].
[6] Harremoës, P.; Tusnády, G. (2012). "Information divergence is more chi squared distributed than the chi squared statistic". Proceedings ISIT 2012. pp. 538–543. Bibcode: 2012arXiv1202.1125H.
[7] Quine, M. P.; Robinson, J. (1985). "Efficiencies of chi-square and likelihood ratio goodness-of-fit tests". Annals of Statistics 13 (2): 727–742. doi:10.1214/aos/1176349550.
[8] Harremoës, P.; Vajda, I. (2008). "On the Bahadur-efficient testing of uniformity by means of the entropy". IEEE Transactions on Information Theory 54 (1): 321–331. doi:10.1109/tit.2007.911155. Bibcode: 2008ITIT...54..321H.
[9] Dunning, Ted (1993). "Accurate Methods for the Statistics of Surprise and Coincidence". Computational Linguistics 19 (1) (March 1993).
[10] Rivas, Elena (30 October 2020). "RNA structure prediction using positive and negative evolutionary information". PLOS Computational Biology 16 (10). doi:10.1371/journal.pcbi.1008387. PMID 33125376. Bibcode: 2020PLSCB..16E8387R.
[11] Fisher, R. A. (1929). "Tests of significance in harmonic analysis". Proceedings of the Royal Society of London A 125 (796): 54–59. doi:10.1098/rspa.1929.0151. Bibcode: 1929RSPSA.125...54F.
[12] G-test of independence, G-test for goodness-of-fit in Handbook of Biological Statistics, University of Delaware. (pp. 46–51, 64–69 in: McDonald, J. H. (2009) Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.)
[13] "org.apache.commons.math3.stat.inference.GTest". https://commons.apache.org/proper/commons-math/javadocs/api-3.3/org/apache/commons/math3/stat/inference/GTest.html.
[14] "Scipy.stats.power_divergence — SciPy v1.7.1 Manual". https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.power_divergence.html#scipy.stats.power_divergence.

External links

G²/Log-likelihood calculator

Original source: https://en.wikipedia.org/wiki/G-test.
