Behzad Kamgar-Parsi
Paper download is intended for registered attendees only, and is
subjected to the IEEE Copyright Policy. Any other use is strongly forbidden.
Papers from this author
Penalized K-Means Algorithms for Finding the Number of Clusters
Behzad Kamgar-Parsi, Behrooz Kamgar-Parsi
Auto-TLDR; Exploring the coefficient of additive penalty in k-means for ideal clusters
Abstract Slides Poster Similar
In many applications we want to find the number of clusters in a dataset. A common approach is to use a penalized k-means algorithm with an additive penalty term linear in the number of clusters. Obviously, the number of discovered clusters depends critically on the value of the coefficient of the penalty term, and an open problem is estimating the value of the coefficient in a principled manner. In this paper (a) We derive rigorous bounds for the coefficient of additive penalty in k-means for ideal clusters. Although in practice clusters typically deviate from the ideal assumption, the ideal case serves as a useful guideline. (b) We propose an alternative approach to additive penalty, namely multiplicative penalty, which appears to produce a more reliable signature for the correct number of clusters in most cases. We also empirically investigate certain types of deviations from ideal cluster assumption and show, in particular, that the best way to resolve ambiguous solutions is to combine additive and multiplicative penalties.