Date of this Version
2005 IEEE International Conference on Electro Information Technology, Lincoln, NE, USA, 22-25 May 2005. DOI: 10.1109/EIT.2005.1626978
Clustering is a practical data mining approach to pattern detection. Because k-means clustering is sensitive to its initial conditions, it often suffers from low clustering performance. We present a procedure that refines the initial conditions of k-means clustering by analyzing the density distribution of a data set and then estimating both the number of clusters k the data set requires and the positions of the initial cluster centroids. We demonstrate that this approach improves the accuracy and performance of k-means clustering, as measured by the average intra- to inter-cluster error ratio. The method is applied in the virtual ecology project to design a virtual blue jay system.
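The general idea of seeding k-means from a density estimate can be illustrated with a minimal sketch. This is not the procedure from the paper, whose details are not given in this abstract. It assumes a coarse grid histogram as the density estimate, treats each locally densest grid cell as one cluster, and uses the mean of the points in that cell as an initial centroid. The function names (`density_seeds`, `kmeans`, `intra_inter_ratio`), the `cell` and `min_count` parameters, and the specific ratio used here (mean point-to-centroid distance divided by mean centroid-to-centroid distance) are all hypothetical choices made for illustration.

```python
import itertools
import math
import random
from collections import defaultdict


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(pts) for xs in zip(*pts))


def density_seeds(points, cell=1.0, min_count=3):
    """Guess initial centroids (and hence k) from peaks of a grid histogram."""
    bins = defaultdict(list)
    for p in points:
        bins[tuple(math.floor(x / cell) for x in p)].append(p)
    dim = len(points[0])
    offsets = [o for o in itertools.product((-1, 0, 1), repeat=dim) if any(o)]
    seeds = []
    for c, members in bins.items():
        if len(members) < min_count:
            continue  # too sparse to count as a density peak
        key = (len(members), c)
        neighbours = (tuple(a + b for a, b in zip(c, o)) for o in offsets)
        # A peak is denser than every adjacent cell; ties are broken by cell index.
        if all(key > (len(bins.get(n, ())), n) for n in neighbours):
            seeds.append(mean(members))
    return seeds


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the supplied centroids (needs at least one)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # An empty cluster keeps its previous centroid.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


def intra_inter_ratio(centroids, clusters):
    """Assumed quality measure: mean intra-cluster / mean inter-centroid distance (k >= 2)."""
    intra = [math.dist(p, c) for c, pts in zip(centroids, clusters) for p in pts]
    inter = [math.dist(a, b) for a, b in itertools.combinations(centroids, 2)]
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Synthetic demo data: three tight, well-separated 2-D blobs.
rng = random.Random(0)
centres = [(0.5, 0.5), (5.5, 5.5), (10.5, 0.5)]
data = [(cx + rng.uniform(-0.3, 0.3), cy + rng.uniform(-0.3, 0.3))
        for cx, cy in centres for _ in range(50)]

seeds = density_seeds(data)                # k is simply len(seeds)
centroids, clusters = kmeans(data, seeds)
print(len(seeds), intra_inter_ratio(centroids, clusters))
```

A lower ratio indicates tighter, better-separated clusters. The grid resolution `cell` strongly affects how many peaks are found, so in practice it would need to be tuned to the data or replaced by a smoother density estimate.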