A New MDL-based Clustering Algorithm

Authors

  • Zdravko Markov, Central Connecticut State University

DOI:

https://doi.org/10.32473/flairs.38.1.138866

Abstract

The Minimum Description Length (MDL) principle is commonly used to evaluate machine learning models. In clustering, it can be used either to assess the overall quality of a clustering or, inside a clustering algorithm, to guide the search over possible clusterings. In this paper, we explore the latter approach. We describe a new clustering algorithm that uses an MDL-based measure while the clusters are being created. The algorithm processes the data instances one at a time, and each instance either starts a new cluster or is assigned to an existing one. Instead of relying on distances, as other popular algorithms do, this process is controlled by evaluating the overall clustering quality at each step with the MDL measure. The main advantages of the algorithm are its simplicity, its lack of adjustable parameters, and its low computational complexity. The MDL score of the current clustering, which is computed for each instance, does not depend on the number of instances in the clusters and is linear in the number of attributes. The performance of the algorithm is evaluated on benchmark datasets.
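
The incremental procedure outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the MDL measure itself, so the two-part score below is a stand-in chosen only for illustration, and it assumes nominal attributes; the names `Cluster`, `cost_with`, `pair_bits`, and `mdl_cluster` are hypothetical and are not taken from the paper. Each cluster keeps only an instance count and the set of distinct values seen per attribute, so scoring one candidate assignment takes time linear in the number of attributes, regardless of how many instances the cluster holds.

```python
import math
from typing import Hashable, List, Sequence, Set


class Cluster:
    """Summary of one cluster: instance count and distinct values per attribute."""

    def __init__(self, n_attrs: int) -> None:
        self.n = 0
        self.values: List[Set[Hashable]] = [set() for _ in range(n_attrs)]

    def cost(self, pair_bits: float) -> float:
        # Stand-in description length (an assumption, not the paper's measure):
        # model part: bits to list the cluster's distinct attribute-value pairs;
        # data part: bits for each instance to pick one listed value per attribute.
        model = pair_bits * sum(len(v) for v in self.values)
        data = self.n * sum(math.log2(len(v)) for v in self.values if v)
        return model + data

    def cost_with(self, x: Sequence[Hashable], pair_bits: float) -> float:
        # Cost if x were added; one pass over the attributes, no pass over instances.
        model = 0.0
        bits_per_instance = 0.0
        for seen, value in zip(self.values, x):
            size = len(seen) + (value not in seen)
            model += pair_bits * size
            bits_per_instance += math.log2(size)
        return model + (self.n + 1) * bits_per_instance

    def add(self, x: Sequence[Hashable]) -> None:
        self.n += 1
        for seen, value in zip(self.values, x):
            seen.add(value)


def mdl_cluster(data: Sequence[Sequence[Hashable]]) -> List[int]:
    """Assign each instance, in order, to the cluster that adds the fewest bits."""
    n_attrs = len(data[0])
    # Bits per attribute-value pair, derived from the data so nothing is hand-tuned.
    vocab = {(a, v) for row in data for a, v in enumerate(row)}
    pair_bits = max(1.0, math.log2(len(vocab)))

    clusters: List[Cluster] = []
    labels: List[int] = []
    for x in data:
        # Opening a new cluster costs n_attrs pairs and no data bits.
        best_idx, best_delta = len(clusters), pair_bits * n_attrs
        for i, c in enumerate(clusters):
            # Only the affected cluster's cost changes, so the smallest increase
            # also gives the smallest total description length.
            delta = c.cost_with(x, pair_bits) - c.cost(pair_bits)
            if delta < best_delta:
                best_idx, best_delta = i, delta
        if best_idx == len(clusters):
            clusters.append(Cluster(n_attrs))
        clusters[best_idx].add(x)
        labels.append(best_idx)
    return labels


toy = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"),
       ("b", "y", "s"), ("a", "x", "p")]
toy_labels = mdl_cluster(toy)
```

On the toy data, the first, second, and fifth rows end up in one cluster and the third and fourth in another. The sketch takes one preliminary pass over the data to size the value vocabulary; whether the paper's algorithm needs such a pass is not stated in the abstract.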

Published

14-05-2025

How to Cite

Markov, Z. (2025). A New MDL-based Clustering Algorithm. The International FLAIRS Conference Proceedings, 38(1). https://doi.org/10.32473/flairs.38.1.138866