Algorithms
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

5 February 2026

Genetic Elitist Approach and Density Peaks to Improve K-Means Clustering

1 DIMES, University of Calabria, 87036 Rende, Italy
2 Institute for High Performance Computing and Networking (ICAR), CNR - National Research Council of Italy, 87036 Rende, Italy
* Author to whom correspondence should be addressed.

Abstract

K-Means is a well-known algorithm for unsupervised clustering, widely used for its simplicity and efficiency. Its long-standing popularity has prompted researchers to investigate its properties further. A critical property is K-Means's strong dependence on the seeding method used to initialize centroids: poor initialization causes K-Means to get stuck in a sub-optimal local solution. This paper proposes DPCC (Density Peaks of Candidate Centroids), a novel seeding method for K-Means. DPCC relies on genetic concepts and density peaks to define an initialization solution close to the optimal one. First, a population of J elitist candidate solutions, that is, solutions that yield a reduced clustering cost, is built. Although none of these individual solutions may be near the optimal one, candidate centroids, as experimentally confirmed, tend to concentrate around ground truth centroids. Therefore, subsequent generations of the population are created by repeating the k-nearest neighbors (kNN) procedure for different values of the k parameter and estimating density through the reverse nearest neighbors (RNN) relationship of each centroid. Centroid density peaks are then exploited to rearrange the population solutions so as to extract a candidate solution, which is finally optimized by K-Means. The paper describes the design and operation of DPCC, which is currently implemented in parallel Java. The clustering effectiveness of DPCC is demonstrated by applications to both benchmark and real-world datasets. Results are compared with those of other competing algorithms.
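
The reverse nearest neighbors density estimate mentioned in the abstract can be illustrated with a minimal Java sketch. This is not the authors' implementation: the class name `RnnDensity`, the method `rnnCounts`, the brute-force neighbor search, and the index-order tie-breaking are all assumptions made for illustration. The idea shown is the generic one: a point's density is taken to be the number of other points that list it among their own k nearest neighbors, so points in crowded regions receive high counts.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not taken from the paper: brute-force reverse
// nearest neighbor (RNN) counts used as a simple density estimate.
final class RnnDensity {

    // Squared Euclidean distance between two points of equal dimension.
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // counts[j] = number of points i != j that have j among their k nearest
    // neighbors. Assumption: ties are broken by index order, because
    // Arrays.sort on object arrays is stable.
    static int[] rnnCounts(double[][] points, int k) {
        int n = points.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must satisfy 1 <= k < n");
        }
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            final double[] query = points[i];
            // Collect the indices of all points other than i.
            Integer[] others = new Integer[n - 1];
            int idx = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    others[idx++] = j;
                }
            }
            // Sort the other points by increasing distance from point i.
            Arrays.sort(others,
                    Comparator.comparingDouble((Integer j) -> sqDist(query, points[j])));
            // Point i "votes" for each of its k nearest neighbors.
            for (int r = 0; r < k; r++) {
                counts[others[r]]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy data: three points close together and one far away.
        double[][] candidates = { {0, 0}, {1, 0}, {3, 0}, {10, 0} };
        System.out.println(Arrays.toString(rnnCounts(candidates, 2)));
    }
}
```

With the toy data above and k = 2, the counts are [2, 3, 3, 0]: the isolated point at (10, 0) is nobody's near neighbor, while the points in the crowded region collect the votes. The brute-force search costs on the order of n² log n, which is acceptable for a small pool of candidate centroids; a spatial index or a parallel loop would be a natural refinement for larger pools.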
