Automatic Cluster Selection Using Index Driven Search Strategy
FERRARETTI, Denis;GAMBERONI, Giacomo;LAMMA, Evelina
2009
Abstract
Clustering is the task of grouping objects into classes in an unsupervised way. Hierarchical clustering algorithms are usually very effective at detecting the underlying structure of a dataset. However, they do not produce clusters directly; they compute only a hierarchical representation of the dataset. It is therefore desirable to turn them into an automatic pre-processing step for algorithms that operate on the selected clusters. To this end, we present an algorithm that finds the best clustering partition according to clustering validity indexes. In particular, our automatic approach performs a validity-index-driven search through a clustering tree. The best partition is then selected by cutting the tree in a non-horizontal way. The algorithm was implemented in a software tool and tested on different datasets. The overall system thus makes hierarchical clustering an automatic step, in which no user interaction is needed to select clusters from a hierarchical cluster representation.
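To make the idea of an index-driven, non-horizontal cut concrete, the following is a minimal sketch and not the algorithm of the paper, whose details the abstract does not give. It assumes average-linkage agglomerative clustering, the mean silhouette width as the validity index, and a greedy top-down search that splits a tree node only when the split improves the index. All names (`Node`, `build_tree`, `silhouette`, `best_cut`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


class Node:
    """Dendrogram node: `members` are point indices; leaves have no children."""
    def __init__(self, members, left=None, right=None):
        self.members, self.left, self.right = members, left, right


def build_tree(points):
    """Naive average-linkage agglomerative clustering; returns the root node."""
    def linkage(a, b):
        total = sum(dist(points[i], points[j]) for i in a.members for j in b.members)
        return total / (len(a.members) * len(b.members))

    nodes = [Node([i]) for i in range(len(points))]
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda pair: linkage(*pair))
        nodes = [n for n in nodes if n is not a and n is not b]
        nodes.append(Node(a.members + b.members, a, b))
    return nodes[0]


def silhouette(points, clusters):
    """Mean silhouette width (assumed validity index); singletons score 0."""
    total = 0.0
    for c in clusters:
        if len(c) == 1:
            continue
        for i in c:
            a = sum(dist(points[i], points[j]) for j in c if j != i) / (len(c) - 1)
            b = min(sum(dist(points[i], points[j]) for j in o) / len(o)
                    for o in clusters if o is not c)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def best_cut(points, root):
    """Greedy search: split one frontier node at a time while the index improves.

    Nodes at different tree heights can end up in the final frontier,
    which is what makes the cut non-horizontal. Needs at least two points.
    """
    frontier = [root.left, root.right]
    score = silhouette(points, [n.members for n in frontier])
    while True:
        best = None
        for n in frontier:
            if n.left is None:  # leaf: cannot be split further
                continue
            cand = [m for m in frontier if m is not n] + [n.left, n.right]
            s = silhouette(points, [m.members for m in cand])
            if s > score:
                best, score = cand, s
        if best is None:
            return [sorted(n.members) for n in frontier], score
        frontier = best


# Toy data: two tight groups close together and one loose group far away.
points = [(0, 0), (0, 0.5), (0.5, 0),
          (6, 0), (6, 0.5), (6.5, 0),
          (0, 30), (8, 30), (4, 36)]
clusters, score = best_cut(points, build_tree(points))
print(sorted(clusters), round(score, 3))
```

On this toy data the two tight groups merge with each other (at an average distance of about 6) before the loose group finishes merging internally (at about 7.6), so any horizontal cut that separates the tight groups also breaks the loose group into single points. The greedy search instead splits only the branch where the index improves and returns the three groups. A greedy search can stop at a local optimum; other indexes or search strategies may behave differently.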