Cramér's V

Short description: Statistical measure of association

In statistics, Cramér's V (sometimes referred to as Cramér's phi and denoted as φ_c) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946.^[1]

Usage and interpretation

φ_c is the intercorrelation of two discrete variables^[2] and may be used with variables having two or more levels. φ_c is a symmetrical measure: it does not matter which variable is placed in the columns and which in the rows. The order of the rows and columns does not matter either, so φ_c may be used with nominal data or with higher levels of measurement (notably ordinal or numerical data), although any ordering information in such data is ignored.

Cramér's V varies from 0 (corresponding to no association between the variables) to 1 (complete association) and can reach 1 only when each variable is completely determined by the other. It may be viewed as the association between two variables as a percentage of their maximum possible variation.

φ_c² is the mean square canonical correlation between the variables.^{[citation needed]}

In the case of a 2 × 2 contingency table, Cramér's V is equal to the absolute value of the phi coefficient.

Calculation

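Let a sample of size [math]\displaystyle{ n }[/math] of the jointly distributed variables [math]\displaystyle{ A }[/math] and [math]\displaystyle{ B }[/math], taking the values [math]\displaystyle{ A_i }[/math] and [math]\displaystyle{ B_j }[/math] for [math]\displaystyle{ i=1,\ldots,r; j=1,\ldots,k }[/math], be given by the frequencies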

[math]\displaystyle{ n_{ij}= }[/math] number of times the values [math]\displaystyle{ (A_i,B_j) }[/math] were observed.

The chi-squared statistic then is:

[math]\displaystyle{ \chi^2=\sum_{i,j}\frac{(n_{ij}-\frac{n_{i.}n_{.j}}{n})^2}{\frac{n_{i.}n_{.j}}{n}}\;, }[/math]

where [math]\displaystyle{ n_{i.}=\sum_jn_{ij} }[/math] is the number of times the value [math]\displaystyle{ A_i }[/math] is observed and [math]\displaystyle{ n_{.j}=\sum_in_{ij} }[/math] is the number of times the value [math]\displaystyle{ B_j }[/math] is observed.
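The statistic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name chi_squared and the example table are hypothetical, the table is assumed to be a list of rows of counts, and every row and column total is assumed to be positive (otherwise an expected frequency is zero and the division fails).

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for an r x k table of counts."""
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: n_i. * n_.j / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical 2 x 3 table of counts, for illustration only.
example = [[10, 20, 30],
           [20, 20, 20]]
print(chi_squared(example))
```

For a table whose rows are exactly proportional to each other, such as [[10, 20], [20, 40]], every observed count equals its expected count and the statistic is 0.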

Cramér's V is computed by dividing the chi-squared statistic by the sample size and by the smaller of the number of rows and the number of columns, minus 1, and then taking the square root:

[math]\displaystyle{ V = \sqrt{\frac{\varphi^2}{\min(k - 1,r-1)}} = \sqrt{ \frac{\chi^2/n}{\min(k - 1,r-1)}}\;, }[/math]

where:

[math]\displaystyle{ \varphi }[/math] is the phi coefficient, with [math]\displaystyle{ \varphi^2 = \chi^2/n }[/math],
[math]\displaystyle{ \chi^2 }[/math] is the statistic from Pearson's chi-squared test,
[math]\displaystyle{ n }[/math] is the grand total of observations,
[math]\displaystyle{ k }[/math] is the number of columns, and
[math]\displaystyle{ r }[/math] is the number of rows.
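As an illustration of the formula, the following Python sketch computes V directly from a table of counts. The function name cramers_v and the example tables are hypothetical; the sketch assumes a table with at least two rows and two columns and with positive row and column totals.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x k table of counts (list of rows)."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # V = sqrt( (chi^2 / n) / min(k - 1, r - 1) )
    return math.sqrt((chi2 / n) / min(k - 1, r - 1))


# Hypothetical tables, for illustration only.
table = [[10, 20, 30],
         [20, 20, 20]]
transposed = [list(col) for col in zip(*table)]

print(cramers_v(table))                # moderate association
print(cramers_v(transposed))           # same value: V is symmetric
print(cramers_v([[10, 0], [0, 10]]))   # complete association
```

Transposing the table leaves the result unchanged, and the diagonal 2 × 2 table gives the maximum value of 1. For any 2 × 2 table the result coincides with the absolute value of the phi coefficient, since min(k − 1, r − 1) = 1.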

The p-value for the significance of V is the same one that is calculated using Pearson's chi-squared test.^{[citation needed]}
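A rough sketch of that p-value, using only the Python standard library, is given below. It assumes the usual form of the test, in which the chi-squared statistic is referred to a chi-squared distribution with (r − 1)(k − 1) degrees of freedom and no continuity correction is applied. The helper names chi2_sf and pearson_p_value are hypothetical, and the series used for the tail probability is adequate for moderate values of the statistic but is not tuned for extreme tails.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared variable.

    Uses the power series of the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_p_value(table):
    """p-value of Pearson's chi-squared test for an r x k table of counts."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(r)
        for j in range(k)
    )
    return chi2_sf(chi2, (r - 1) * (k - 1))


# Hypothetical 2 x 3 table of counts, for illustration only.
print(pearson_p_value([[10, 20, 30], [20, 20, 20]]))
```

In practice a statistics library would normally be used for the tail probability; the hand-written series is included only to keep the sketch self-contained.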

The formula for the variance of V=φ_c is known.^[3]

In R, the function cramerV() from the package rcompanion^[4] calculates V using the chisq.test function from the stats package. In contrast to the function cramersV() from the lsr^[5] package, cramerV() also offers an option to correct for bias. It applies the correction described in the following section.

Bias correction

Cramér's V can be a heavily biased estimator of its population counterpart and will tend to overestimate the strength of association. A bias correction, using the above notation, is given by^[6]

[math]\displaystyle{ \tilde V = \sqrt{\frac{\tilde\varphi^2}{\min(\tilde k - 1,\tilde r - 1)}} }[/math]

where

[math]\displaystyle{ \tilde\varphi^2 = \max\left(0,\varphi^2 - \frac{(k-1)(r-1)}{n-1}\right) }[/math]

and

[math]\displaystyle{ \tilde k = k - \frac{(k-1)^2}{n-1} }[/math]

[math]\displaystyle{ \tilde r = r - \frac{(r-1)^2}{n-1} }[/math]

Then [math]\displaystyle{ \tilde V }[/math] estimates the same population quantity as Cramér's V but with typically much smaller mean squared error. The rationale for the correction is that under independence, [math]\displaystyle{ E[\varphi^2]=\frac{(k-1)(r-1)}{n-1} }[/math].^[7]
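The correction can be sketched in Python as follows. The function name cramers_v_corrected and the example tables are hypothetical; the sketch assumes positive row and column totals and a sample large enough that the corrected dimensions stay above 1, so that the denominator is positive.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts."""
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # grand total
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    # Subtract the expectation of phi^2 under independence, floored at 0.
    phi2_tilde = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Corrected numbers of columns and rows.
    k_tilde = k - (k - 1) ** 2 / (n - 1)
    r_tilde = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_tilde / min(k_tilde - 1, r_tilde - 1))


# Hypothetical tables, for illustration only.
print(cramers_v_corrected([[10, 20, 30], [20, 20, 20]]))
print(cramers_v_corrected([[10, 20], [20, 40]]))  # independent table
```

For the first table the corrected value is smaller than the uncorrected V computed from the same counts, and for the exactly independent table the floor at 0 makes the corrected value 0.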

References

[1] Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press. p. 282 (Chapter 21, "The two-dimensional case"). ISBN 0-691-08004-6.
[2] Sheskin, David J. (1997). Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL: CRC Press.
[3] Liebetrau, Albert M. (1983). Measures of Association. Newbury Park, CA: Sage Publications. Quantitative Applications in the Social Sciences Series No. 32. pp. 15–16.
[4] "Rcompanion: Functions to Support Extension Education Program Evaluation". 2019-01-03. https://CRAN.R-project.org/package=rcompanion.
[5] "Lsr: Companion to 'Learning Statistics with R'". 2015-03-02. https://CRAN.R-project.org/package=lsr.
[6] Bergsma, Wicher (2013). "A bias correction for Cramér's V and Tschuprow's T". Journal of the Korean Statistical Society 42 (3): 323–328. doi:10.1016/j.jkss.2012.10.002.
[7] Bartlett, Maurice S. (1937). "Properties of Sufficiency and Statistical Tests". Proceedings of the Royal Society of London. Series A 160 (901): 268–282. doi:10.1098/rspa.1937.0109. Bibcode: 1937RSPSA.160..268B.
[8] Tyler, Scott R.; Bunyavanich, Supinda; Schadt, Eric E. (2021-11-19). "PMD Uncovers Widespread Cell-State Erasure by scRNAseq Batch Correction Methods". bioRxiv: 2021.11.15.468733. doi:10.1101/2021.11.15.468733. https://www.biorxiv.org/content/10.1101/2021.11.15.468733v1.

External links

A Measure of Association for Nonparametric Statistics (Alan C. Acock and Gordon R. Stavig Page 1381 of 1381–1386)
Nominal Association: Phi and Cramer's Vl from the homepage of Pat Dattalo.

Original source: https://en.wikipedia.org/wiki/Cramér's V.
