Binning

From HandWiki
Revision as of 19:26, 4 August 2021 by imported>PolicyEnforcerIA (attribution)

Binning is the process of grouping measured data into data classes or histogram bins. Discretization, quantization, and digitizing are closely related concepts. After binning, the fine-grained information in the original measured values is lost, and only the bin contents (the number of entries in each bin) are used. The information lost in this way is negligible if the bin widths are small compared with the experimental resolution.
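As an illustration of what binning keeps and discards, here is a minimal sketch of equal-width binning in plain Python. The function name `bin_counts`, the range `[0, 2)`, and the sample values are assumptions made up for this example; they do not come from any particular library or data set.

```python
def bin_counts(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [lo, hi).

    Values outside [lo, hi) are ignored in this sketch.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical measurements, for illustration only
measurements = [0.12, 0.48, 0.51, 0.95, 1.30, 1.32, 1.90]
counts = bin_counts(measurements, 0.0, 2.0, 4)
print(counts)  # [2, 2, 2, 1]
```

After this step, 0.12 and 0.48 can no longer be told apart: both are simply entries in the first bin. With a bin width of 0.5, that loss only matters if the measurements are resolved much more finely than 0.5.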

Many statistical methods, notably those based on the chi-square distribution, require that the data be binned and that the bins satisfy certain constraints. In particular, the expected number of events in each bin must not fall below a certain minimum, so that the distribution of the observed count in each bin is approximately Gaussian. Opinions differ on the minimum number of events required, but it is usually taken to be between five and ten, provided that only a few bins are at this minimum. There is no reason for bins to be of equal width, except for computational convenience (e.g. in image processing), and many studies indicate that the statistically optimal binning is the one that gives equally probable bins.
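The following sketch shows one way to build equally probable bins and to enforce a minimum expected count before computing a Pearson chi-square statistic. It assumes a fully specified hypothesis (here a standard normal, chosen only for illustration) and uses `statistics.NormalDist` from the Python standard library; the function names, the threshold of five, and the simulated sample are assumptions of this example.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


def equiprobable_edges(dist, n_bins):
    """Interior bin edges so that each bin has probability 1/n_bins under dist."""
    return [dist.inv_cdf(i / n_bins) for i in range(1, n_bins)]


def chi_square_equiprobable(values, dist, n_bins, min_expected=5.0):
    """Pearson chi-square statistic using equally probable bins under dist.

    Returns (chi2, observed_counts). Raises ValueError if the expected
    count per bin is below min_expected.
    """
    expected = len(values) / n_bins  # identical for every bin by construction
    if expected < min_expected:
        raise ValueError("expected count per bin too small; use fewer bins")
    edges = equiprobable_edges(dist, n_bins)
    observed = [0] * n_bins
    for v in values:
        # bisect_right returns the index of the bin containing v
        observed[bisect_right(edges, v)] += 1
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, observed


# Simulated sample (illustrative only): 200 draws from a standard normal
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
chi2, observed = chi_square_equiprobable(sample, NormalDist(0.0, 1.0), n_bins=8)
print(observed, chi2)
```

Because every bin has the same expected count, the minimum-count check reduces to a single comparison. With no parameters fitted from the data, the statistic would be compared with a chi-square distribution with `n_bins - 1` degrees of freedom.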

Where the data are so sparse that wide bins would be necessary, it is preferable to avoid binning altogether by using other methods where possible. For example, use a maximum likelihood fit instead of a least squares fit, and use the Kolmogorov test or the Cramér-von Mises test (also known as the Cramér-Smirnov-von Mises test) rather than the one-dimensional chi-square test.
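To show what "avoiding binning" means in practice, the sketch below computes the Kolmogorov and Cramér-von Mises test statistics directly from the sorted sample, using the standard textbook formulas. It computes the statistics only, not p-values; the function names, the uniform hypothesis, and the three sample values are assumptions chosen to keep the example small.

```python
def kolmogorov_statistic(values, cdf):
    """Largest distance between the empirical CDF and the hypothesised cdf."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def cramer_von_mises_statistic(values, cdf):
    """W^2 = 1/(12n) + sum_i (F(x_i) - (2i - 1)/(2n))^2 over the sorted sample."""
    xs = sorted(values)
    n = len(xs)
    total = sum((cdf(x) - (2 * i - 1) / (2 * n)) ** 2
                for i, x in enumerate(xs, start=1))
    return 1.0 / (12 * n) + total


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (the hypothesis tested here)."""
    return min(max(x, 0.0), 1.0)


# Hypothetical small sample, too small for a sensible chi-square test
small_sample = [0.1, 0.4, 0.7]
d_stat = kolmogorov_statistic(small_sample, uniform_cdf)
w2_stat = cramer_von_mises_statistic(small_sample, uniform_cdf)
print(round(d_stat, 6), round(w2_stat, 6))  # 0.3 0.06
```

Both statistics use every measured value individually, so no choice of bin width or bin boundaries enters the result.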