Statistics, Department of

 

The R Journal

Date of this Version

8-2016

Document Type

Article

Citation

The R Journal (August 2016) 8(1); Editor: Michael Lawrence

Comments

Copyright 2016, The R Foundation. Open access material. License: CC BY 3.0 Unported

Abstract

The general clustering algorithms do not guarantee optimality because of the hardness of the problem. Polynomial-time methods can find the clustering corresponding to the exact optimum only in special cases. For example, the dynamic programming algorithm can solve the one-dimensional clustering problem, i.e., when the items to be clustered can be characterised by only one scalar number. Optimal one-dimensional clustering is provided by package Ckmeans.1d.dp in R. The paper shows a possible generalisation of the method implemented in this package to multidimensional data: the dynamic programming method can be applied to find the optimum clustering of vectors when only subsequent items may form a cluster. Sequential data are common in various fields including telecommunication, bioinformatics, marketing, transportation etc. The proposed algorithm can determine the optima for a range of cluster numbers in order to support the case when the number of clusters is not known in advance.

Share

COinS