Correction for attenuation


Correction for attenuation is a statistical procedure developed by Charles Spearman in 1904 that is used to "rid a correlation coefficient from the weakening effect of measurement error" (Jensen, 1998), a phenomenon known as regression dilution. In measurement and statistics, the correction is also called disattenuation. The correction ensures that the correlation across data units (for example, people) between two sets of variables is estimated in a manner that accounts for the error contained in the measurement of those variables.[1]


Estimates of correlations between variables are diluted (weakened) by measurement error. Disattenuation provides for a more accurate estimate of the correlation by accounting for this effect.


Let [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \theta }[/math] be the true values of two attributes of some person or statistical unit. These values are variables by virtue of the assumption that they differ for different statistical units in the population. Let [math]\displaystyle{ \hat{\beta} }[/math] and [math]\displaystyle{ \hat{\theta} }[/math] be estimates of [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \theta }[/math] derived either directly by observation-with-error or from application of a measurement model, such as the Rasch model. Also, let

[math]\displaystyle{ \hat{\beta} = \beta + \epsilon_{\beta} , \quad\quad \hat{\theta} = \theta + \epsilon_\theta, }[/math]

where [math]\displaystyle{ \epsilon_{\beta} }[/math] and [math]\displaystyle{ \epsilon_\theta }[/math] are the measurement errors associated with the estimates [math]\displaystyle{ \hat{\beta} }[/math] and [math]\displaystyle{ \hat{\theta} }[/math].

The estimated correlation between two sets of estimates is

[math]\displaystyle{ \operatorname{corr}(\hat{\beta},\hat{\theta})= \frac{\operatorname{cov}(\hat{\beta},\hat{\theta})}{\sqrt{\operatorname{var}[\hat{\beta}]\operatorname{var}[\hat{\theta}]}} }[/math]
[math]\displaystyle{ =\frac{\operatorname{cov}(\beta+\epsilon_{\beta}, \theta+\epsilon_\theta)}{\sqrt{\operatorname{var}[\beta+\epsilon_{\beta}]\operatorname{var}[\theta+\epsilon_\theta]}}, }[/math]

which, assuming the errors are uncorrelated with each other and with the true attribute values, gives

[math]\displaystyle{ \operatorname{corr}(\hat{\beta},\hat{\theta})= \frac{\operatorname{cov}(\beta,\theta)}{\sqrt{(\operatorname{var}[\beta]+\operatorname{var}[\epsilon_\beta])(\operatorname{var}[\theta]+\operatorname{var}[\epsilon_\theta])}} }[/math]
[math]\displaystyle{ =\frac{\operatorname{cov}(\beta,\theta)}{\sqrt{\operatorname{var}[\beta]\operatorname{var}[\theta]}}\cdot\frac{\sqrt{\operatorname{var}[\beta]\operatorname{var}[\theta]}}{\sqrt{(\operatorname{var}[\beta]+\operatorname{var}[\epsilon_\beta])(\operatorname{var}[\theta]+\operatorname{var}[\epsilon_\theta])}} }[/math]
[math]\displaystyle{ =\rho \sqrt{R_\beta R_\theta}, }[/math]

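where [math]\displaystyle{ \rho }[/math] is the correlation between the true values [math]\displaystyle{ \beta }[/math] and [math]\displaystyle{ \theta }[/math], and [math]\displaystyle{ R_\beta }[/math] is the separation index of the set of estimates of [math]\displaystyle{ \beta }[/math] ([math]\displaystyle{ R_\theta }[/math] is defined in the same way for [math]\displaystyle{ \theta }[/math]). The separation index is analogous to Cronbach's alpha; that is, in terms of classical test theory, [math]\displaystyle{ R_\beta }[/math] is analogous to a reliability coefficient. Specifically, the separation index is given as follows: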

[math]\displaystyle{ R_\beta=\frac{\operatorname{var}[\beta]}{\operatorname{var}[\beta]+\operatorname{var}[\epsilon_\beta]}=\frac{\operatorname{var}[\hat{\beta}]-\operatorname{var}[\epsilon_\beta]}{\operatorname{var}[\hat{\beta}]}, }[/math]

where the mean of the squared standard errors of the person estimates gives an estimate of the variance of the errors, [math]\displaystyle{ \epsilon_\beta }[/math]. The standard errors are normally produced as a by-product of the estimation process (see Rasch model estimation).
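As a rough illustration of the separation index, the following Python sketch computes it from a list of estimates and their standard errors. The function name `separation_index` and the example numbers are hypothetical, and the use of the unbiased (n - 1) sample variance for the observed variance is an assumption made here, not something the formula above prescribes.

```python
def separation_index(estimates, standard_errors):
    """Separation index: (var[estimates] - var[errors]) / var[estimates].

    Assumption: the error variance is approximated by the mean of the
    squared standard errors, and the observed variance by the unbiased
    sample variance of the estimates.
    """
    n = len(estimates)
    if n < 2 or n != len(standard_errors):
        raise ValueError("need at least two estimates, one standard error each")
    mean = sum(estimates) / n
    observed_var = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    if observed_var == 0.0:
        raise ValueError("estimates have zero variance")
    error_var = sum(se ** 2 for se in standard_errors) / n
    return (observed_var - error_var) / observed_var


# Hypothetical person estimates (for example, in logits) and standard errors.
example_estimates = [-1.2, -0.4, 0.1, 0.6, 1.5]
example_errors = [0.5, 0.4, 0.4, 0.4, 0.5]
example_r = separation_index(example_estimates, example_errors)
print(round(example_r, 3))
```

For these made-up numbers the index comes out at roughly 0.81, meaning that about four fifths of the observed variance would be attributed to true differences between units under these assumptions.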

The disattenuated estimate of the correlation between the two sets of parameter estimates is therefore

[math]\displaystyle{ \rho = \frac{\operatorname{corr}(\hat{\beta},\hat{\theta})}{\sqrt{R_\beta R_\theta}}. }[/math]

That is, the disattenuated correlation estimate is obtained by dividing the correlation between the estimates by the geometric mean of the separation indices of the two sets of estimates. Expressed in terms of classical test theory, the correlation is divided by the geometric mean of the reliability coefficients of two tests.
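The derivation can be checked numerically. The sketch below is a small simulation under stated assumptions: true values with unit variance and correlation `rho`, and independent normally distributed errors with known variances. All names and parameter values (`simulate_attenuation`, `err_var_beta`, `err_var_theta`, the seed) are hypothetical choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_attenuation(rho=0.8, err_var_beta=0.5, err_var_theta=1.0,
                         n=50_000, seed=0):
    """Return (observed, predicted, corrected) correlations.

    Assumes unit-variance true values and errors that are independent
    of each other and of the true values.
    """
    rng = random.Random(seed)
    beta_hat, theta_hat = [], []
    for _ in range(n):
        # True values with correlation rho.
        b = rng.gauss(0.0, 1.0)
        t = rho * b + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Estimates = true value + independent measurement error.
        beta_hat.append(b + rng.gauss(0.0, math.sqrt(err_var_beta)))
        theta_hat.append(t + rng.gauss(0.0, math.sqrt(err_var_theta)))
    # Separation indices: var[true] / (var[true] + var[error]).
    r_beta = 1.0 / (1.0 + err_var_beta)
    r_theta = 1.0 / (1.0 + err_var_theta)
    observed = pearson(beta_hat, theta_hat)
    predicted = rho * math.sqrt(r_beta * r_theta)
    corrected = observed / math.sqrt(r_beta * r_theta)
    return observed, predicted, corrected


observed, predicted, corrected = simulate_attenuation()
print(observed, predicted, corrected)
```

With these settings the predicted attenuated correlation is 0.8 times the square root of 1/3, about 0.46. The observed correlation should fall close to that value, and the corrected value close to 0.8, up to sampling error.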

Given two random variables [math]\displaystyle{ X^\prime }[/math] and [math]\displaystyle{ Y^\prime }[/math] measured as [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] with measured correlation [math]\displaystyle{ r_{xy} }[/math] and a known reliability for each variable, [math]\displaystyle{ r_{xx} }[/math] and [math]\displaystyle{ r_{yy} }[/math], the estimated correlation between [math]\displaystyle{ X^\prime }[/math] and [math]\displaystyle{ Y^\prime }[/math] corrected for attenuation is

[math]\displaystyle{ r_{x'y'} = \frac{r_{xy}}{\sqrt{r_{xx}r_{yy}}} }[/math].

The reliability with which the variables are measured affects the observed correlation between X and Y. The correction for attenuation estimates what the correlation would be if X′ and Y′ could be measured with perfect reliability.

Thus if [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are taken to be imperfect measurements of underlying variables [math]\displaystyle{ X' }[/math] and [math]\displaystyle{ Y' }[/math] with independent errors, then [math]\displaystyle{ r_{x'y'} }[/math] estimates the true correlation between [math]\displaystyle{ X' }[/math] and [math]\displaystyle{ Y' }[/math].
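The classical test theory form of the correction is a one-line calculation. The following Python sketch is a minimal illustration; the function name `disattenuate` and the input values are hypothetical, and the range check on the reliabilities is an added assumption rather than part of the formula.

```python
import math


def disattenuate(r_xy, r_xx, r_yy):
    """Correlation corrected for attenuation: r_xy / sqrt(r_xx * r_yy).

    r_xy is the observed correlation; r_xx and r_yy are the
    reliabilities of the two measures, assumed to lie in (0, 1].
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    # The result is deliberately not clipped to [-1, 1].
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = 0.36, reliabilities 0.81 and 0.64.
corrected_r = disattenuate(0.36, 0.81, 0.64)
print(round(corrected_r, 3))
```

Here the geometric mean of the reliabilities is 0.72, so the corrected correlation is 0.36 / 0.72 = 0.5. With sample-based inputs the corrected value can exceed 1 (for example, an observed 0.6 with reliabilities 0.5 and 0.6), which is usually read as a sign of sampling error or underestimated reliabilities; the sketch leaves such values unclipped so they stay visible.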

References


  • Jensen, A.R. (1998). The g Factor: The Science of Mental Ability. Praeger, Connecticut, US. ISBN 0-275-96103-6.
  • Spearman, C. (1904). "The Proof and Measurement of Association between Two Things". The American Journal of Psychology, 15 (1), 72–101. JSTOR 1412159.
  1. Franks, Alexander; Airoldi, Edoardo; Slavov, Nikolai (2017-05-08). "Post-transcriptional regulation across human tissues". PLOS Computational Biology 13 (5): e1005535. doi:10.1371/journal.pcbi.1005535. ISSN 1553-7358. PMID 28481885. PMC 5440056. 
