Additive disequilibrium and z statistic
Additive disequilibrium (D) is a coefficient that measures the difference between observed genotypic frequencies and the genotypic frequencies that would be expected under Hardy–Weinberg equilibrium. At a biallelic locus with alleles 1 and 2, the additive disequilibrium coefficient is defined by the equations[1]
- [math]\displaystyle{ \begin{align} f_{11} & = p_1^2 + D \\[5pt] f_{12} & = 2p_1 (1-p_1) - 2D \\[5pt] f_{22} & = (1-p_1)^2 + D \end{align} }[/math]
where [math]\displaystyle{ f_{ij} }[/math] is the frequency of genotype ij in the population, [math]\displaystyle{ p_1 }[/math] is the frequency of allele 1 in the population, and D is the additive disequilibrium coefficient.[1]
A value of D > 0 indicates an excess of homozygotes (a deficiency of heterozygotes) in the population, whereas D < 0 indicates an excess of heterozygotes (a deficiency of homozygotes). When D = 0, the genotypes are in Hardy–Weinberg equilibrium. In practice, the additive disequilibrium estimated from a sample, [math]\displaystyle{ \widehat{D} }[/math], will rarely be exactly 0, but it may be small enough to conclude that it is not significantly different from 0. Estimating the additive disequilibrium coefficient therefore provides an alternative way to test whether a set of genotypic frequencies is consistent with Hardy–Weinberg equilibrium.[1]
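The defining equations can be illustrated with a minimal Python sketch. The function name `genotype_frequencies` and the example values (allele frequency 0.6, D = 0.04) are hypothetical and chosen only for illustration.

```python
def genotype_frequencies(p1, d):
    """Return (f11, f12, f22) for allele-1 frequency p1 and disequilibrium d.

    Hypothetical helper that evaluates the three defining equations directly.
    """
    p2 = 1.0 - p1
    f11 = p1 ** 2 + d
    f12 = 2.0 * p1 * p2 - 2.0 * d
    f22 = p2 ** 2 + d
    return f11, f12, f22


# Illustrative values: D > 0 moves frequency from heterozygotes to homozygotes.
f11, f12, f22 = genotype_frequencies(0.6, 0.04)

# Hardy-Weinberg proportions for the same allele frequency (D = 0).
hw11, hw12, hw22 = genotype_frequencies(0.6, 0.0)

print(f11, f12, f22)
print(hw11, hw12, hw22)
```

Whatever the value of D, the three frequencies sum to 1, because the +D terms on the homozygotes cancel the −2D term on the heterozygotes.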
Because the genotype and allele frequencies must lie in the interval [0, 1], the range of possible values for D is constrained as follows (with [math]\displaystyle{ p_2 = 1 - p_1 }[/math]):
- [math]\displaystyle{ \max_{u\,\in\, \{1,2\}} \left(-p_u^2\right) \le D \le p_1 (1-p_1) }[/math]
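The bounds follow from requiring each genotype frequency to be non-negative. A short sketch, with a hypothetical function name `d_bounds`, computes them for a given allele frequency:

```python
def d_bounds(p1):
    """Return (lower, upper) bounds on D for allele-1 frequency p1.

    lower: both homozygote frequencies must be >= 0, so D >= -p1^2 and D >= -p2^2.
    upper: the heterozygote frequency must be >= 0, so D <= p1 * p2.
    """
    p2 = 1.0 - p1
    lower = max(-(p1 ** 2), -(p2 ** 2))
    upper = p1 * p2
    return lower, upper


lower, upper = d_bounds(0.6)
print(lower, upper)
```

The permissible range is widest when both alleles are equally frequent and shrinks to the single point D = 0 as either allele approaches fixation.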
To estimate D from a sample, use the formula:
- [math]\displaystyle{ \widehat{D} = \widehat{f}_{11} - \widehat{p}_1^2 = \frac{n_{11}}{n} - \left( \frac{2n_{11}+n_{12}}{2n} \right)^2 }[/math]
where [math]\displaystyle{ n_{11} }[/math] and [math]\displaystyle{ n_{12} }[/math] are the numbers of individuals in the sample with genotypes 11 and 12, respectively, and n is the total number of individuals in the sample. Note that [math]\displaystyle{ \widehat{f}_{11} }[/math] and [math]\displaystyle{ \widehat{p}_1 }[/math] are sample estimates of the population genotype and allele frequencies.
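As a sketch of this estimator, the following Python function takes the three genotype counts and returns the estimate together with the sample allele frequency and sample size. The name `estimate_d` and the counts (30, 40, 30) are made up for illustration.

```python
def estimate_d(n11, n12, n22):
    """Estimate D from genotype counts; returns (d_hat, p1_hat, n)."""
    n = n11 + n12 + n22
    if n == 0:
        raise ValueError("sample must contain at least one individual")
    f11_hat = n11 / n                      # sample frequency of genotype 11
    p1_hat = (2 * n11 + n12) / (2 * n)     # sample frequency of allele 1
    d_hat = f11_hat - p1_hat ** 2
    return d_hat, p1_hat, n


# Hypothetical sample with fewer heterozygotes than Hardy-Weinberg predicts.
d_hat, p1_hat, n = estimate_d(30, 40, 30)
print(d_hat, p1_hat, n)
```

With these counts the allele frequency is 0.5, the homozygote frequency is 0.30 against an expected 0.25, and the estimate is therefore about 0.05.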
The approximate sampling variance of [math]\displaystyle{ \widehat{D} }[/math] under the null hypothesis of Hardy–Weinberg equilibrium (D = 0), denoted [math]\displaystyle{ \operatorname{var}(\widehat{D}) }[/math], is:
- [math]\displaystyle{ \operatorname{var}(\widehat D) = \frac{\widehat p_1^2 (1 - \widehat p_1)^2}{n} }[/math] [2]
From this, an approximate 95% confidence interval can be calculated:
- [math]\displaystyle{ \widehat{D} \pm 1.96 \sqrt{\operatorname{var}(\widehat{D})} }[/math]
Note: [math]\displaystyle{ \sqrt{\operatorname{var}(\widehat{D})} }[/math] is the estimated standard deviation (standard error) of [math]\displaystyle{ \widehat{D} }[/math].
If the confidence interval for D does not include zero, we can reject the null hypothesis of Hardy–Weinberg equilibrium.
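The interval can be sketched in Python as follows. This assumes the null-hypothesis standard deviation p̂₁(1 − p̂₁)/√n, which is the quantity that appears in the denominator of the z statistic; the function name `d_confidence_interval` and the input values are hypothetical.

```python
import math


def d_confidence_interval(d_hat, p1_hat, n, z_crit=1.96):
    """Approximate confidence interval for D around the estimate d_hat.

    Assumes the standard error p1_hat * (1 - p1_hat) / sqrt(n), i.e. the
    variance evaluated under Hardy-Weinberg equilibrium. z_crit = 1.96
    corresponds to a 95% interval.
    """
    se = p1_hat * (1.0 - p1_hat) / math.sqrt(n)
    half_width = z_crit * se
    return d_hat - half_width, d_hat + half_width


# Illustrative values: d_hat = 0.05, p1_hat = 0.5, n = 100.
ci_low, ci_high = d_confidence_interval(0.05, 0.5, 100)
print(ci_low, ci_high)
```

Here the standard error is 0.025 and the half-width is 0.049, so the interval runs from about 0.001 to about 0.099 and narrowly excludes zero.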
Similarly, we can test for Hardy–Weinberg equilibrium using the z statistic, which uses the estimate of additive disequilibrium to determine significance. The goal is to transform the estimate so that, asymptotically, it has a standard normal distribution under the null hypothesis. To do this, divide [math]\displaystyle{ \widehat{D} }[/math] by its standard deviation, which gives the simplified equation:[1]
- [math]\displaystyle{ z = \frac{\widehat{D}\sqrt n}{\widehat{p}_1 (1-\widehat{p}_1)} }[/math]
When the absolute value of z is large, [math]\displaystyle{ \widehat{D} }[/math] is large relative to its standard deviation, and so is the departure from Hardy–Weinberg equilibrium. If the absolute value of z is sufficiently large, it is unlikely that the deviation occurred by chance, and the hypothesis of Hardy–Weinberg equilibrium can be rejected.[1]
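A minimal Python sketch of the z statistic, computed directly from genotype counts, is given below. The function name `hwe_z_statistic` and the example counts are hypothetical.

```python
import math


def hwe_z_statistic(n11, n12, n22):
    """z statistic for Hardy-Weinberg equilibrium at a biallelic locus."""
    n = n11 + n12 + n22
    if n == 0:
        raise ValueError("sample must contain at least one individual")
    p1_hat = (2 * n11 + n12) / (2 * n)
    d_hat = n11 / n - p1_hat ** 2
    denom = p1_hat * (1.0 - p1_hat)
    if denom == 0:
        # Only one allele observed: the statistic is undefined.
        raise ValueError("sample is monomorphic; z is undefined")
    return d_hat * math.sqrt(n) / denom


z_excess_hom = hwe_z_statistic(30, 40, 30)   # heterozygote deficiency, z > 0
z_excess_het = hwe_z_statistic(20, 60, 20)   # heterozygote excess, z < 0
z_hw = hwe_z_statistic(25, 50, 25)           # exact Hardy-Weinberg proportions
print(z_excess_hom, z_excess_het, z_hw)
```

The sign of z follows the sign of the estimated disequilibrium, so the first sample gives a value near 2, the second a value near −2, and the third exactly 0.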
To determine whether z is significantly larger or smaller than expected under Hardy–Weinberg equilibrium, find the probability, under the null hypothesis, of observing a value as extreme as or more extreme than the observed z. When z is positive, the upper tail probability is [math]\displaystyle{ \mathbb{P} }[/math](y > z) = 1 − [math]\displaystyle{ \mathbb{P} }[/math](y ≤ z), where y is a standard normal random variable. Because the standard normal distribution is symmetric, the upper and lower tail probabilities are equal, so the upper tail probability can be multiplied by 2 to obtain the combined two-tailed probability.
If z is negative, find the lower tail probability, [math]\displaystyle{ \mathbb{P} }[/math](y ≤ z), and multiply by 2 to obtain the combined probability in both tails.
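Both cases reduce to twice the tail probability beyond |z|. The sketch below computes it with the complementary error function from Python's standard library, using the identity 2·ℙ(y > |z|) = erfc(|z|/√2) for a standard normal y. The function name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-tailed probability of a standard normal value at least as extreme as z."""
    # For standard normal y: P(y > a) = 0.5 * erfc(a / sqrt(2)) when a >= 0,
    # so doubling the tail beyond |z| gives erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


p_pos = two_sided_p_value(2.0)
p_neg = two_sided_p_value(-2.0)
print(p_pos, p_neg)
```

Taking the absolute value handles positive and negative z in the same way, which is valid only because the standard normal distribution is symmetric about zero.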
The resulting probability (the p-value) is compared with a pre-specified significance level α. When the p-value is less than or equal to α, we reject the null hypothesis of Hardy–Weinberg equilibrium. If the p-value is greater than α, we fail to reject the null hypothesis. Commonly used values of α are 0.05, 0.01, and 0.001.[3]
At a significance level of α = 0.05, we can reject the hypothesis of Hardy–Weinberg equilibrium if the absolute value of z is greater than or equal to the critical value 1.96 for the two-sided test.[1][4]
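The critical value 1.96 is the 97.5th percentile of the standard normal distribution. As a sketch, it can be recovered for any α with `statistics.NormalDist` from Python's standard library (Python 3.8 or later); the helper names `critical_value` and `reject_hwe` are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided critical value: the (1 - alpha/2) quantile of the standard normal."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_hwe(z, alpha=0.05):
    """Return True if |z| reaches the two-sided critical value for alpha."""
    return abs(z) >= critical_value(alpha)


print(round(critical_value(0.05), 2))
print(reject_hwe(2.0), reject_hwe(-2.0), reject_hwe(1.5))
print(reject_hwe(2.0, alpha=0.01))
```

Comparing |z| with the critical value is equivalent to comparing the two-sided p-value with α, so a z of 2.0 leads to rejection at α = 0.05 but not at α = 0.01.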
References
- [1] Weir, Bruce (1996). Genetic data analysis: methods for discrete population genetic data (2nd ed., rev. and expanded). Sunderland, Mass.: Sinauer. pp. 94–96. ISBN 0-87893-902-4.
- [2] Chen, J. J.; Duan, T.; Single, R.; Mather, K.; Thomson, G. (23 May 2005). "Hardy-Weinberg Testing of a Single Homozygous Genotype". Genetics 170 (3): 1439–1442. doi:10.1534/genetics.105.043190. PMID 15911570.
- [3] "7.1.3.1. Critical values and p values". NIST SEMATECH. http://www.itl.nist.gov/div898/handbook/prc/section1/prc131.htm. Retrieved 4 December 2017.
- [4] "Tests of Significance". http://www.stat.yale.edu/Courses/1997-98/101/sigtest.htm.
Original source: https://en.wikipedia.org/wiki/Additive disequilibrium and z statistic.