Balanced clustering

From HandWiki

Balanced clustering is a special case of clustering where, in the strictest sense, cluster sizes are constrained to [math]\displaystyle{ \lfloor {n\over k}\rfloor }[/math] or [math]\displaystyle{ \lceil{n \over k}\rceil }[/math], where [math]\displaystyle{ n }[/math] is the number of points and [math]\displaystyle{ k }[/math] is the number of clusters.[1] A typical algorithm is balanced k-means, which minimizes mean square error (MSE). Another type of balanced clustering called balance-driven clustering has a two-objective cost function that minimizes both the imbalance and the MSE. Typical cost functions are ratio cut[2] and Ncut.[3] Balanced clustering can be used for example in scenarios where freight has to be delivered to [math]\displaystyle{ n }[/math] locations with [math]\displaystyle{ k }[/math] cars. It is then preferred that each car delivers to an equal number of locations.

Software

There exists implementations for balanced k-means[4] and Ncut[5]

References

  1. M. I. Malinen and P. Fränti (August 2014). "Balanced K-Means for Clustering". Structural, Syntactic, and Statistical Pattern Recognition. Lecture Notes in Computer Science. 8621. pp. 32–41. doi:10.1007/978-3-662-44415-3_4. ISBN 978-3-662-44414-6. 
  2. L. Hagen and A. B. Kahng (1992). "New spectral methods for ratio cut partitioning and clustering". IEEE Transactions on Computer-Aided Design 11 (9): 1074–1085. doi:10.1109/43.159993. 
  3. J. Shi and J. Malik (2000). "Normalized cuts and image segmentation". IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8): 888–905. doi:10.1109/34.868688. https://repository.upenn.edu/cis_papers/107. 
  4. M. I. Malinen and P. Fränti. "Balanced k-Means implementation". University of Eastern Finland. http://cs.uef.fi/sipu/soft/Balanced.zip. 
  5. T. Cour, S. Yu and J. Shi. "Ncut implementation". University of Pennsylvania. http://www.cis.upenn.edu/~jshi/software/. 

Levin, M. Sh. (2017). "On Balanced Clustering (Indices, Models, Examples)". Journal of Communications Technology and Electronics 62 (12): 1506–1515. doi:10.1134/S1064226917120105.