Sheppard's correction

From HandWiki

In statistics, Sheppard's corrections are approximate corrections to estimates of moments computed from binned data. The concept is named after William Fleetwood Sheppard. Let [math]\displaystyle{ m_k }[/math] be the measured kth moment, that is, the moment computed by treating every observation as lying at the midpoint of its bin. Let [math]\displaystyle{ \hat{\mu}_k }[/math] be the corresponding corrected moment, and let [math]\displaystyle{ c }[/math] be the breadth of the class interval (i.e., the bin width). No correction is necessary for the mean (the first moment about zero). The first few measured and corrected moments about the mean are then related as follows:

[math]\displaystyle{ \begin{align} \hat{\mu}_2 &= m_2 - \frac{1}{12} c^2 \\ \hat{\mu}_3 &= m_3 \\ \hat{\mu}_4 &= m_4 - \frac{1}{2} m_2c^2 + \frac{7}{240} c^4. \end{align} }[/math]
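The formulas above can be applied directly to a frequency table. The following is a minimal sketch. It assumes the binned data are given as equally spaced bin midpoints with their counts. The function name `sheppard_corrected_moments` is hypothetical, and the moments use the divisor n, with no small-sample adjustment.

```python
from math import fsum


def sheppard_corrected_moments(midpoints, counts, c):
    """Return (mean, mu2, mu3, mu4): the mean and Sheppard-corrected moments about the mean.

    midpoints: bin midpoints (assumed equally spaced, width c)
    counts:    number of observations in each bin
    c:         bin width (breadth of the class interval)
    """
    n = fsum(counts)
    # The mean needs no correction.
    mean = fsum(f * x for x, f in zip(midpoints, counts)) / n

    def central(k):
        # k-th moment about the mean, treating each observation as its bin midpoint
        return fsum(f * (x - mean) ** k for x, f in zip(midpoints, counts)) / n

    m2, m3, m4 = central(2), central(3), central(4)
    mu2 = m2 - c**2 / 12
    mu3 = m3  # third moment is left unchanged
    mu4 = m4 - 0.5 * m2 * c**2 + 7 / 240 * c**4
    return mean, mu2, mu3, mu4


# Three unit-width bins centred at 1, 2, 3 holding 1, 2 and 1 observations;
# here m2 = 0.5, so the corrected variance is 0.5 - 1/12.
mean, mu2, mu3, mu4 = sheppard_corrected_moments([1, 2, 3], [1, 2, 1], c=1)
print(mean, mu2, mu3, mu4)
```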

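When the data come from a normally distributed population, binning and using the midpoint of each bin as the observed value results in an overestimate of the variance, which is why the correction to the variance is negative. Write the true value as [math]\displaystyle{ X = Y + D }[/math], where [math]\displaystyle{ Y }[/math] is the observed midpoint and [math]\displaystyle{ D }[/math] is the error. Then [math]\displaystyle{ \operatorname{Var}(X) = \operatorname{Var}(Y) + \operatorname{Var}(D) + 2\operatorname{Cov}(Y, D) }[/math], with [math]\displaystyle{ \operatorname{Var}(D) \approx c^2/12 }[/math]. The uncorrected estimate of the variance is an overestimate because the error is negatively correlated with the observation. For a normal population, [math]\displaystyle{ \operatorname{Cov}(Y, D) \approx -c^2/12 }[/math], so [math]\displaystyle{ \operatorname{Var}(X) \approx \operatorname{Var}(Y) - c^2/12 }[/math]. For the uniform distribution (with bin boundaries aligned to the ends of its support), the error is uncorrelated with the observation. The correction should therefore be [math]\displaystyle{ +c^2/12 }[/math], the variance of the error itself, rather than [math]\displaystyle{ -c^2/12 }[/math]. Thus Sheppard's correction is biased in favor of population distributions in which the error is negatively correlated with the observation.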
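The normal and uniform cases can be contrasted numerically. The sketch below computes the exact variance of the binned values in two settings. The first is a standard normal population with c = 1 and bins centred on the integers. The second is a uniform population on (0, 10) with unit bins aligned to its endpoints. These populations, the bin placements and the helper `grouped_variance` are illustrative assumptions, not part of the original text.

```python
from statistics import NormalDist


def grouped_variance(midpoints, probs):
    """Variance of a variable that takes each bin midpoint with the given probability."""
    mean = sum(p * x for x, p in zip(midpoints, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(midpoints, probs))


c = 1.0  # bin width

# Normal case: N(0, 1) grouped into bins [k - c/2, k + c/2) with midpoints k.
z = NormalDist()
ks = list(range(-12, 13))  # bins far enough out that the tail mass is negligible
normal_probs = [z.cdf(k + c / 2) - z.cdf(k - c / 2) for k in ks]
normal_grouped = grouped_variance(ks, normal_probs)

# Uniform case: U(0, 10) grouped into bins [k, k + 1) with midpoints k + 0.5.
uniform_mids = [k + 0.5 for k in range(10)]
uniform_grouped = grouped_variance(uniform_mids, [0.1] * 10)
uniform_true = 10**2 / 12

print(normal_grouped - c**2 / 12)   # Sheppard-corrected: very close to the true variance 1
print(uniform_grouped - c**2 / 12)  # Sheppard-corrected: moves further from 100/12
print(uniform_grouped + c**2 / 12)  # adding c^2/12 instead recovers 100/12
```

In the normal case the binned variance exceeds 1 by almost exactly c²/12. In the aligned uniform case it falls short of the true variance by the same amount.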

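Under the approximation that the grouped variable is the sum of the underlying variable and an independent error uniformly distributed on (−0.5c, 0.5c), each cumulant of the grouped variable is the sum of the corresponding cumulants of the underlying variable and of the uniform error, because cumulants of independent variables add. Since the odd cumulants of this uniform distribution are zero, only the even cumulants are affected. In particular, the mean and the third moment about the mean (which equals the third cumulant) need no correction.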

The second and fourth cumulants of the uniform distribution on (−0.5c, 0.5c) are, respectively, [math]\displaystyle{ c^2/12 }[/math] and [math]\displaystyle{ -c^4/120 }[/math].
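These values follow from the central moments of the uniform distribution. For a zero-mean variable, the second cumulant is the variance, the third cumulant is the third moment, and the fourth cumulant is μ₄ − 3μ₂². The short exact-arithmetic check below assumes c is an integer so that `fractions.Fraction` can represent it. The helper name `uniform_central_moment` is hypothetical.

```python
from fractions import Fraction


def uniform_central_moment(n, c):
    """E[U**n] for U uniform on (-c/2, c/2): zero for odd n, (c/2)**n / (n + 1) for even n."""
    if n % 2:
        return Fraction(0)
    return Fraction(c, 2) ** n / (n + 1)


c = 1
mu2 = uniform_central_moment(2, c)
mu3 = uniform_central_moment(3, c)
mu4 = uniform_central_moment(4, c)

kappa2 = mu2               # second cumulant = variance
kappa3 = mu3               # third cumulant = third moment (zero by symmetry)
kappa4 = mu4 - 3 * mu2**2  # fourth cumulant of a zero-mean variable
print(kappa2, kappa3, kappa4)  # 1/12 0 -1/120
```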

The corrections to the moments can then be derived from the relations between cumulants and moments.
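As a worked example for the fourth moment, use the identity [math]\displaystyle{ \mu_4 = \kappa_4 + 3\kappa_2^2 }[/math], which links the fourth moment about the mean to the cumulants. Write [math]\displaystyle{ \hat{\kappa}_k }[/math] for the corrected cumulants (notation introduced here for the derivation). The grouped variable has second cumulant [math]\displaystyle{ m_2 }[/math] and fourth cumulant [math]\displaystyle{ m_4 - 3m_2^2 }[/math]. Removing the uniform cumulants [math]\displaystyle{ c^2/12 }[/math] and [math]\displaystyle{ -c^4/120 }[/math] from these gives:

[math]\displaystyle{ \begin{align} \hat{\kappa}_2 &= m_2 - \frac{1}{12} c^2, \\ \hat{\kappa}_4 &= \left(m_4 - 3 m_2^2\right) + \frac{1}{120} c^4, \\ \hat{\mu}_4 &= \hat{\kappa}_4 + 3 \hat{\kappa}_2^2 = m_4 - 3 m_2^2 + \frac{1}{120} c^4 + 3 m_2^2 - \frac{1}{2} m_2 c^2 + \frac{1}{48} c^4 \\ &= m_4 - \frac{1}{2} m_2 c^2 + \frac{7}{240} c^4. \end{align} }[/math]

This reproduces the fourth-moment correction given earlier, since [math]\displaystyle{ \tfrac{1}{120} + \tfrac{1}{48} = \tfrac{7}{240} }[/math]. The second-moment correction is simply [math]\displaystyle{ \hat{\mu}_2 = \hat{\kappa}_2 }[/math].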
