Genetic Elitist Approach and Density Peaks to Improve K-Means Clustering

Libero Nigro; Franco Cicirelli; Francesco Pupo

doi:10.3390/a19020131

¹ DIMES, University of Calabria, 87036 Rende, Italy

² Institute for High Performance Computing and Networking (ICAR), CNR - National Research Council of Italy, 87036 Rende, Italy

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(2), 131; https://doi.org/10.3390/a19020131

Abstract

K-Means is a well-known algorithm for unsupervised clustering, widely used for its simplicity and efficiency. Its long-standing, widespread use has stimulated researchers to investigate its properties further. A critical property is K-Means's strong dependence on the seeding method used to initialize the centroids: poor initialization causes K-Means to get stuck in a sub-optimal local solution. This paper proposes DPCC (Density Peaks of Candidate Centroids), a novel seeding method for K-Means. DPCC relies on genetic concepts and density peaks to define an initial solution close to the optimal one. First, a population of J elitist candidate solutions, that is, solutions capable of yielding a reduced clustering cost, is built. Although none of these individual solutions may be near the optimal one, candidate centroids, as experimentally confirmed, tend to concentrate around the ground-truth centroids. Therefore, subsequent generations of the population are created by repeating the k-nearest neighbors (kNN) procedure for different values of the k parameter and estimating density through the reverse nearest neighbors (RNN) relationship of each centroid. Centroid density peaks are then exploited to rearrange the population solutions so as to extract a candidate solution, which is finally optimized by K-Means. The paper describes the design and operation of DPCC, which is currently implemented in parallel Java. The clustering effectiveness of DPCC is demonstrated on both benchmark and real-world datasets, and the results are compared with those of competing algorithms.

Keywords: unsupervised clustering; K-Means; seeding methods; genetic clustering; density peaks; k-nearest neighbors; reverse nearest neighbors; benchmark datasets; real-world datasets; Java
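The abstract mentions estimating centroid density through the reverse nearest neighbors (RNN) relationship but does not give the procedure. The following minimal Java sketch illustrates only the general idea of an RNN count: a point's density is taken as the number of other points that list it among their k nearest neighbors. It is not the authors' DPCC implementation. The class name `RnnDensitySketch`, the methods `sqDist`, `kNearest` and `rnnCounts`, the use of squared Euclidean distance, and the toy data are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of RNN-based density; not the DPCC code from the paper.
public class RnnDensitySketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Indices of the k nearest neighbors of points[i], excluding i itself.
    static int[] kNearest(double[][] points, int i, int k) {
        Integer[] idx = new Integer[points.length - 1];
        int p = 0;
        for (int j = 0; j < points.length; j++) {
            if (j != i) idx[p++] = j;
        }
        // Sort the other points by distance from points[i] (stable sort).
        Arrays.sort(idx, Comparator.comparingDouble((Integer j) -> sqDist(points[i], points[j])));
        int[] out = new int[Math.min(k, idx.length)];
        for (int t = 0; t < out.length; t++) out[t] = idx[t];
        return out;
    }

    // counts[c] = number of points that have c among their k nearest neighbors.
    static int[] rnnCounts(double[][] points, int k) {
        int[] counts = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            for (int nb : kNearest(points, i, k)) counts[nb]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy candidate centroids (assumed data): four close points and one outlier.
        double[][] candidates = {
            {0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}, {10.0, 9.0}
        };
        System.out.println(Arrays.toString(rnnCounts(candidates, 2)));
    }
}
```

With k = 2 on this toy data, the program prints `[2, 3, 2, 3, 0]`: the outlier is no one's near neighbor and receives a count of zero, while the grouped points receive higher counts. Repeating the count for several values of k, as the abstract describes, would amount to calling `rnnCounts` in a loop over k. How DPCC combines those results is not specified here.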
