Article

K-Means Community Detection Algorithm Based on Density Peaks

by Hongyan Gao 1, Jing Han 2, Yue Liu 1, Peng Zhang 1, Bo Yang 1, Yanqing Zu 1, Fei Liu 1,* and Yu Qian 1,*
1 School of Physics and Opto-Electronic Technology, Baoji University of Arts and Sciences, Baoji 721016, China
2 Physics Teaching and Research Section, Fugu Middle School of Shaanxi Province, Yulin 719499, China
* Authors to whom correspondence should be addressed.
Entropy 2026, 28(2), 152; https://doi.org/10.3390/e28020152
Submission received: 10 December 2025 / Revised: 13 January 2026 / Accepted: 28 January 2026 / Published: 29 January 2026

Abstract

The identification of community structure is pivotal for understanding the functional characteristics of complex networks. To address the limitations of most existing community detection algorithms, which often require predefining the number of communities and lack robustness, this paper proposes a novel community detection algorithm named D-means (K-means community detection algorithm based on density peaks). The algorithm integrates the concept of density peak clustering with K-means spectral clustering, employing Chebyshev's inequality to automatically determine the number of community centers and thereby enabling unsupervised identification of the number of communities. Using a multi-dimensional evaluation framework, comparative experiments were conducted on LFR (Lancichinetti–Fortunato–Radicchi) benchmark networks and real-world social network datasets. The results demonstrate that the D-means algorithm outperforms traditional algorithms in terms of ACC (accuracy), ARI (Adjusted Rand Index), and NMI (Normalized Mutual Information), while also improving runtime efficiency and exhibiting strong robustness. Finally, the D-means algorithm was applied to the public transportation network of Urumqi. Empirical analysis identified 12 functionally significant transportation communities, providing theoretical support for urban rail transit optimization and commercial facility layout planning.

1. Introduction

Complex systems theory, at the forefront of systems science, profoundly reveals the universal patterns inherent in natural and social systems. It has become a focal point of interdisciplinary research across fields such as physics, mathematics, computer science, and transportation engineering [1,2,3]. As abstract models for describing complex systems, complex networks distill the commonalities of diverse systems into nodes and edges, thereby serving as powerful tools for characterizing and understanding real-world large-scale complex systems [4]. Within this framework, community structure—defined as subsets of nodes that are densely interconnected internally but sparsely connected externally—provides researchers with a mesoscale perspective situated between the macroscopic whole and microscopic individuals. This perspective is key to uncovering the functional modules and hierarchical organization inherent in complex networks [5]. For instance, analyzing the community structure of urban transportation networks can identify highly synergistic traffic subzones, offering crucial theoretical insights for optimizing traffic flow, planning infrastructure, and deploying commercial services [6].
In response to the highly heterogeneous structural characteristics of real-world networks, community detection methods have evolved into two main paradigms: global and local approaches. Global methods (e.g., the Louvain algorithm, spectral clustering) require complete network topological information and achieve network partitioning by optimizing global objective functions such as modularity [7]. These methods can generate globally consistent structural partitions but suffer from resolution limits and high computational complexity, making them difficult to apply directly to ultra-large-scale networks or dynamically evolving scenarios. The traditional K-means clustering algorithm, which optimizes by minimizing the sum of squared distances from nodes to cluster centers, is also widely used in community detection [8]. To enhance its adaptability to network data, researchers have proposed various improvements: for instance, Hajij et al. identified initial cluster centers using local PageRank values to improve stability [9]; Sommer et al. developed an autonomous kernel K-means framework that deeply integrates kernel techniques with modularity criteria [10]; Cai et al. introduced a Density-Degree (DD) index that combines node density and degree centrality to construct an improved K-means community detection framework [11]. However, most of these improved methods still rely on multiple iterative evaluations to determine the optimal cluster centers, a process that incurs high computational costs and significantly limits efficiency in large-scale network scenarios.
Concurrently, significant progress has been made in deep learning-based community detection methods, with graph neural network technology enabling effective joint modeling of network topology and node attributes. For instance, Kojaku et al. proposed a community detection method based on neural network embedding [12]; He et al. designed a semi-supervised overlapping community detection model named SSGCAE using graph neural networks [13]; and Guo et al. introduced a community detection algorithm based on adaptive graph contrastive learning [14]. However, such methods typically rely on large amounts of annotated data for training and suffer from a lack of interpretability in their decision-making processes, which to some extent restricts their broader application in the analysis of practical complex systems.
Local community detection methods, such as the LFM (local fitness maximization) algorithm and seed expansion algorithms, rely solely on local network information. They start from seed nodes and gradually expand to form communities. These methods offer advantages in computational efficiency and scalability, particularly excelling at identifying small-scale community structures. In recent years, local methods have continued to evolve. For example, Baltsou proposed a Hint Enhancement Framework that strengthens the importance of specific seed nodes through edge re-weighting or network rewiring strategies, followed by performing local community detection on the optimized network [15]. Meanwhile, Ni et al. developed the SLSS algorithm, a semi-supervised approach based on structural similarity measures, which requires only partial known community information to complete the detection task [16].
However, local methods still face significant limitations: their results are highly sensitive to the selection of seed nodes and often lack structural consistency from a global perspective. A systematic comparison reveals fundamental differences between global and local methods in terms of detection mechanisms and application scenarios—global methods tend to produce partitions with better structural consistency, while local methods hold a clear advantage in efficiency for large-scale network analysis.
An important trend in current research lies in integrating the strengths of both approaches. By combining global structural information with local computational efficiency, this fusion paradigm aims to enhance the quality of community partitions while maintaining scalability. Such integrated approaches are becoming crucial for addressing the challenges of complex network analysis.
Against this background, the Density Peaks Clustering (DPC) algorithm has garnered widespread attention due to its unique ability to automatically identify cluster centers [17]. By calculating the local density of data points and their distance to points of higher density, the algorithm enables intuitive selection of cluster centers through a two-dimensional decision graph. Since its introduction in 2014 [18], researchers have proposed various improvements to address its limitations in handling high-dimensional data, parameter sensitivity, and density heterogeneity [19,20,21].
Although the algorithm has achieved significant success in traditional data clustering, its application in complex network community detection remains exploratory. It faces two core challenges: first, how to directly process graph-structured data; and second, how to appropriately define “density” and “distance” metrics within a network environment. Currently, only a few studies have attempted to adapt this algorithm for network analysis, such as graph-based label propagation methods, which assign labels to remaining nodes to form final clusters [22].
In summary, two notable issues emerge from current research: On the one hand, there is an urgent need in the field of community detection for novel methods that can overcome the sensitivity of traditional K-means algorithms to initial centers and their reliance on multiple evaluations. On the other hand, although the DPC algorithm demonstrates significant advantages in automatically identifying centers, its application in community partitioning for complex networks remains very limited, particularly lacking in-depth analysis of multimodal transportation networks from the mesoscale perspective of community structure. It is worth noting that existing research still requires further exploration in the following areas: (1) how to effectively integrate the stability of global methods with the efficiency of local methods; (2) how to adapt advanced algorithms from the data clustering domain to network analysis scenarios; and (3) how to apply community detection methods to practical complex systems such as multimodal transportation networks.
Based on this, this study proposes a novel community detection algorithm that integrates Density Peaks Clustering with K-means. The core contributions of this research are: (1) leveraging the automatic center identification mechanism of the DPC algorithm to fundamentally address the initial center selection problem in K-means; (2) designing density and distance metrics suitable for complex network graph data, establishing an effective pathway for migrating data clustering algorithms to the network analysis domain; (3) applying the proposed algorithm to multimodal transportation networks to reveal their organizational patterns and operational mechanisms from the novel mesoscale perspective of community structure, providing scientific decision support for transportation system planning and management; and (4) conducting systematic experimental validation across multiple scenarios—including synthetic networks, social networks, and transportation networks—comprehensively comparing the proposed algorithm with traditional methods, local methods, and the latest hybrid methods to fully demonstrate its effectiveness and superiority.

2. Density Peak Clustering Algorithm

The DPC algorithm is a classical density-based clustering approach that detects density peak points as cluster centers by computing the local density and relative distance of each data point, then assigns the remaining points to their corresponding clusters. As an innovative clustering algorithm, DPC is capable of identifying non-convex clusters and relies on two crucial parameters: the distance parameter and the density threshold. The core idea of this algorithm is based on the following two fundamental assumptions:
Assumption 1.
Cluster centers are characterized by higher local density compared to surrounding points.
Assumption 2.
Distances between the centers of different clusters are relatively large.
To mathematically formalize these two assumptions, consider a clustering dataset $X = \{x_1, x_2, \ldots, x_n\}$. For each data point $x_i$, two key metrics are introduced: the local density $\rho_i$ and the relative distance $\delta_i$, as described by the following definitions.
Definition 1.
The Euclidean distance is employed to measure the distance between any two data points in the dataset, calculated as follows:
$$\mathrm{dist}(x_i, x_j) = \sqrt{\sum_{l=1}^{m} (x_{il} - x_{jl})^2},$$
where $\mathrm{dist}(\cdot,\cdot)$ denotes the Euclidean distance, $m$ is the dimension of the data sample, and $x_{il}$ is the value of $x_i$ in the $l$-th dimension.
Definition 2.
When applying the truncated density estimation method, the local density of a point i is determined by counting the number of neighboring samples that fall within its cutoff distance (also called the truncation radius). The mathematical expression is as follows:
$$\rho_i = \sum_{x_j \in X} \chi\left(\mathrm{dist}(x_i, x_j) - d_c\right), \qquad \chi(x) = \begin{cases} 0, & x \ge 0, \\ 1, & x < 0, \end{cases}$$
where $\rho_i$ is the local density, $\mathrm{dist}(x_i, x_j)$ is the Euclidean distance between samples $x_i$ and $x_j$, and $d_c$ is the truncation distance of the data points.
Definition 3.
Using Gaussian density estimation, a point's local density is calculated by summing Gaussian-weighted contributions from all other points, formulated as:
$$\rho_i = \sum_{x_j \in X} \exp\left(-\left(\frac{\mathrm{dist}(x_i, x_j)}{d_c}\right)^2\right),$$
where $d_c$ and $\mathrm{dist}(\cdot,\cdot)$ are consistent with the definitions used in the truncated density estimation method.
Definition 4.
Relative distance $\delta_i$ refers to the minimum distance between sample $i$ and other points with higher density values, written as:
$$\delta_i = \begin{cases} \min_{j:\, \rho_i < \rho_j} \mathrm{dist}(x_i, x_j), & \text{if } \exists j \ \text{s.t.}\ \rho_i < \rho_j, \\ \max_{j} \mathrm{dist}(x_i, x_j), & \text{otherwise.} \end{cases}$$
The clustering performance varies with dataset size: the truncated kernel method demonstrates superior efficiency for large-scale datasets, while the Gaussian kernel approach yields more precise clustering results for smaller datasets.
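To make Definitions 2 and 4 concrete, the following sketch computes the truncated-kernel local density and the relative distance on a toy two-group dataset (all coordinates and the cutoff value are illustrative, not from the paper):

```python
from math import dist  # Euclidean distance (Definition 1), Python 3.8+

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group
          (5.0, 5.0), (5.1, 5.0)]                           # sparse group
d_c = 0.5  # truncation distance, chosen empirically as the text notes

# Definition 2 (truncated kernel): count neighbours strictly within d_c.
rho = [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) < d_c)
       for i, p in enumerate(points)]

# Definition 4: minimum distance to any strictly denser point; points of
# maximal density fall back to the maximum distance instead.
delta = []
for i, p in enumerate(points):
    higher = [dist(p, q) for j, q in enumerate(points) if rho[j] > rho[i]]
    delta.append(min(higher) if higher
                 else max(dist(p, q) for j, q in enumerate(points) if j != i))
```

Here the sparse-group points end up with low `rho` but a large `delta` to the dense group, mirroring how noise and centers separate on the decision graph.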
The DPC algorithm operates without requiring pre-defined cluster centers. Instead, it automatically identifies optimal cluster centers through two key metrics: local density ( ρ ) and relative distance ( δ ). The fundamental principle involves detecting density peaks—data points that simultaneously exhibit both high local density and significant separation distance from any points of greater density. The implementation process involves: (1) computing ρ and δ values for all data points, (2) generating a decision graph with ρ on the horizontal axis and δ on the vertical axis, and (3) visually selecting cluster centers as distinct points occupying the upper-right region of the graph where both ρ and δ values are maximized. The complete algorithmic workflow is illustrated in Figure 1, with detailed execution steps outlined below:
Input: dataset $X = \{x_1, x_2, \ldots, x_n\}$, truncation distance parameter $d_c$.
Output: clustering results $C = \{c_1, c_2, \ldots, c_m\}$.
Step 1: Calculate the pairwise distances between data points in $X$ to obtain the distance matrix $D_{n \times n}$;
Step 2: Calculate the local density $\rho_i$ and relative distance $\delta_i$ of each point $x_i$ according to the definitions above;
Step 3: Draw the $(\rho_i, \delta_i)$ decision graph and visually select points with both high local density and large relative distance as cluster centers;
Step 4: Traverse the remaining points and assign each non-center point to the cluster of its nearest higher-density neighbor; finally, output the clustering result.
To illustrate the cluster center selection process using a decision graph, we present a demonstrative example in Figure 2, in which red and blue dots represent two different clusters, and black dots represent noise points. Figure 2a shows the distribution of 28 data points, and Figure 2b displays the corresponding decision graph (local density on the horizontal axis and relative distance on the vertical axis).
Cluster centers are selected from the decision graph by identifying points that simultaneously exhibit both high local density ( ρ ) and large relative distance ( δ ), as exemplified by points 1 and 10 in Figure 2b. These points appear in the upper-right region of the decision graph, representing optimal cluster center candidates. Conversely, points 26–28 demonstrate an inverse pattern with relatively large δ values but negligible ρ values, which characterizes them as noise points rather than valid cluster centers.
While the decision graph provides an intuitive approach for cluster center selection, this manual method suffers from subjectivity and inefficiency, particularly when processing complex datasets where visual identification becomes both time-consuming and error-prone. To address these limitations, we propose an automated selection criterion based on the product of local density and relative distance, $\gamma_i = \rho_i \delta_i$. By sorting all data points in descending order of their $\gamma$ values, the algorithm can automatically identify potential cluster centers as the points with the highest $\gamma$ scores. Following center selection, the remaining points are efficiently assigned to their nearest higher-density neighbor in a single pass. This optimized version of the DPC algorithm achieves clustering in $O(n^2)$ time (where $n$ is the number of data points) without iterative computations, offering significant advantages in both automation and computational efficiency.
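The $\gamma$-ranking and the single-pass assignment can be sketched as follows; the distance matrix, density, and relative-distance values are illustrative toy data (kept mutually consistent), and the number of centers is fixed to 2 for the example:

```python
# Toy symmetric distance matrix for 6 points: {0, 1, 2, 4} form one group,
# {3, 5} another. rho/delta are consistent with D (delta = distance to the
# nearest strictly denser point; the densest point gets the max distance).
D = [[0.0, 0.3, 0.4, 4.0, 0.5, 4.2],
     [0.3, 0.0, 0.2, 4.1, 0.6, 4.3],
     [0.4, 0.2, 0.0, 4.2, 0.7, 4.4],
     [4.0, 4.1, 4.2, 0.0, 4.5, 0.4],
     [0.5, 0.6, 0.7, 4.5, 0.0, 4.6],
     [4.2, 4.3, 4.4, 0.4, 4.6, 0.0]]
rho   = [6, 5, 4, 3, 7, 5]
delta = [0.5, 0.3, 0.2, 0.4, 4.6, 4.2]

# Rank by the product index and keep the top-k points as centers.
gamma = [r * d for r, d in zip(rho, delta)]
k = 2
centers = sorted(range(6), key=lambda i: gamma[i], reverse=True)[:k]

# Single pass in decreasing-density order: every remaining point inherits
# the label of its nearest neighbour of strictly higher density, which is
# guaranteed to have been labelled already.
labels = {c: c for c in centers}
for i in sorted(range(6), key=lambda i: rho[i], reverse=True):
    if i not in labels:
        nn = min((j for j in range(6) if rho[j] > rho[i]),
                 key=lambda j: D[i][j])
        labels[i] = labels[nn]
```

Processing points in decreasing-density order is what makes a single pass sufficient: a point's nearest denser neighbour is always labelled before the point itself.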
The DPC algorithm overcomes several limitations of traditional clustering by leveraging local density and relative distance metrics, yet it faces critical challenges. Its performance is highly sensitive to the truncation distance parameter $d_c$, whose selection currently relies on empirical estimation rather than automated methods, adding operational complexity. Its robustness is compromised by noise and outliers, since noise points can exhibit density-distance characteristics similar to those of genuine cluster centers. It also suffers from the curse of dimensionality in high-dimensional spaces, where conventional distance metrics become ineffective. To enhance the algorithm's reliability and applicability, researchers have proposed various improvements, including automated parameter selection techniques, advanced density measurement methods, and optimized allocation strategies.

3. K-Means Community Detection Algorithm Based on Density Peak

Current research on similarity-based merging algorithms faces two primary challenges: determining the initial number of communities and optimizing the search strategy. The K-means algorithm, for instance, is highly sensitive to the initial number of clusters, which is often difficult to specify for real-world networks and significantly impacts the final clustering quality. Accurately and efficiently estimating the number of communities is therefore crucial for performance.
The Density Peak Clustering algorithm addresses this by automatically determining the number of clusters based on data distribution, without requiring pre-specification. Inspired by this advantage, we propose a community detection algorithm that utilizes center and neighbor node information, adopting the core idea of automatically identifying community centers from density peaks.
However, the traditional density peak algorithm relies on coordinate information to compute local density and relative distance, making it unsuitable for network data. Thus, adaptations are necessary to better suit community detection in complex networks.

3.1. Density and Distance Attributes of Nodes

In complex network analysis, node density is a comprehensive metric. Its definition incorporates both the number of directly connected neighbors and the tightness of connections among those neighbors. This tightness is quantified by the clustering coefficient, which measures the proportion of possible edges among a node's neighbors that actually exist. A higher node degree generally implies greater influence in the network, while a higher clustering coefficient reflects a stronger ability to aggregate surrounding nodes. Therefore, by integrating the leadership and cohesion of nodes, this paper proposes a local density index. A higher node density value indicates that the node is not only a key member within its community but is also surrounded by other significant nodes.
Definition 5.
The density attribute of a node is defined as the sum of its degree and its clustering coefficient. For a node $i$, it is calculated as:
$$\rho_i = K_i + C_i,$$
where $K_i$ denotes the normalized degree of node $i$, and $C_i$ represents its normalized clustering coefficient. Because the raw scales differ (node degree can be much larger than 1 while the clustering coefficient is confined to $[0, 1]$), min-max normalization is employed to scale both quantities into the $[0, 1]$ interval. This metric relies solely on the network's topological structure, thus effectively circumventing the high-dimensional challenges often associated with traditional density peak clustering algorithms.
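A minimal sketch of Definition 5 on a toy undirected graph, with the clustering coefficient and the min-max normalization computed from scratch (the adjacency structure is illustrative):

```python
# Small undirected graph as an adjacency dict (symmetric by construction).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}

def clustering(v):
    """Fraction of possible edges among the neighbours of v that exist."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def minmax(xs):
    """Min-max normalization into [0, 1]; constant lists map to 0."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

nodes = sorted(adj)
K = minmax([len(adj[v]) for v in nodes])      # normalized degree
C = minmax([clustering(v) for v in nodes])    # normalized clustering coeff.
rho = [k + c for k, c in zip(K, C)]           # density attribute per node
```

In this toy graph, nodes 1 and 2 score highest: their degree is moderate, but their neighbourhoods are fully interconnected.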
The distance attribute of a node is another crucial metric for identifying central nodes, in addition to its density attribute. In the proposed community detection algorithm, this attribute is derived by comparing density attributes between node pairs: a node $j$ is considered denser than node $i$ if
$$\rho_j > \rho_i, \quad \text{or} \quad \rho_j = \rho_i \ \text{and} \ C_j > C_i.$$
Definition 6.
The distance attribute of a node $i$ is defined as the minimum value among the shortest-path distances from this node to all others that possess a higher density attribute. This is formally expressed as:
$$\delta_i = \min_j (d_{ij}),$$
where $\delta_i$ represents the distance attribute of node $i$, and $d_{ij}$ denotes the shortest-path distance between nodes $i$ and $j$.
The calculation of a node’s distance attribute follows a hierarchical proximity-based search:
1. If any direct neighbor of node $i$ possesses a higher density attribute, the distance attribute $\delta_i$ is assigned a value of 1.
2. If no higher-density node exists among direct neighbors, the search extends to second-degree neighbors (neighbors of neighbors). Should any second-degree neighbor have a higher density attribute, $\delta_i$ is set to 2.
3. If no node with higher density is found among either direct or second-degree neighbors, $\delta_i$ is set to 3.
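The three-tier search above can be sketched as follows; the adjacency dict and density values are illustrative toy data:

```python
# Toy graph and precomputed density attributes (illustrative values).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
rho = {0: 2.0, 1: 1.2, 2: 1.5, 3: 0.8, 4: 0.3}

def distance_attr(i):
    # Tier 1: any direct neighbour with higher density?
    if any(rho[j] > rho[i] for j in adj[i]):
        return 1
    # Tier 2: any second-degree neighbour (neighbour of a neighbour)?
    second = set().union(*(adj[j] for j in adj[i])) - adj[i] - {i}
    if any(rho[j] > rho[i] for j in second):
        return 2
    # Tier 3: no higher-density node found within two hops.
    return 3

delta = {i: distance_attr(i) for i in adj}
```

Node 0 is the densest node in this toy graph, so it falls through to the third tier; every other node finds a denser direct neighbour.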
Real-world networks typically exhibit small-world characteristics, characterized by short average path lengths and high clustering coefficients. As a result, distance attribute values δ i in such networks are generally much smaller than density attribute values ρ i . To ensure that both attributes contribute equally in the selection of community centers, Min-Max normalization is applied to scale each attribute to the interval [0, 1]. The normalization formula is as follows:
$$\rho_i^{nor} = \frac{\rho_i - \min(\rho)}{\max(\rho) - \min(\rho)}, \qquad \delta_i^{nor} = \frac{\delta_i - \min(\delta)}{\max(\delta) - \min(\delta)},$$
where $\rho_i^{nor}$ and $\delta_i^{nor}$ denote the normalized density attribute and normalized distance attribute of node $i$, respectively.

3.2. Selection of Community Centers

Nodes with higher centrality values are more likely to be community centers. To systematically select these central nodes, this paper employs the product index γ from the density peak clustering algorithm, which is computed as:
$$\gamma_i = \rho_i^{nor} \times \delta_i^{nor}.$$
Following the computation of the product index $\gamma$ for all nodes, the values are sorted in descending order, and an upper limit is established using Chebyshev's inequality. As a statistical theorem that does not assume any specific probability distribution, Chebyshev's inequality provides a conservative upper bound on the probability that a random variable deviates from its expected value. The inequality is stated as follows:
Theorem 1.
Let the random variable $X$ have mathematical expectation $\mu$ and variance $\sigma^2$. Then, for any $\varepsilon > 0$:
$$P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}.$$
Therefore, given a known mean $\mu$ and variance $\sigma^2$, a conservative, distribution-free upper bound on the deviation of the random variable $X$ can be derived, controlled solely by the parameter $\varepsilon$. The process of selecting cluster centers from the decision graph reveals that the local density and relative distance of centers exhibit a significant jump, whereas ordinary data points display low values in a clustered distribution. This observed pattern aligns with Chebyshev's inequality: for the majority of nodes, the product index satisfies
$$|\gamma_i - \mu| \le \varepsilon \sigma.$$
Based on the equation above, it can be inferred that if the deviation of a node i ’s product index γ i from the mean exceeds this threshold value, the node is classified as a low-probability data point and identified as a central node. This condition satisfies the following expression:
$$\gamma_i > \mu + \varepsilon \times \sigma,$$
where μ and σ represent the mean and standard deviation of the product index across all nodes, respectively.
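A minimal sketch of this selection rule, with illustrative $\gamma$ values and an assumed $\varepsilon$ of 1.5:

```python
from statistics import mean, pstdev

# Illustrative gamma values: two nodes stand out far above the rest.
gamma = [0.02, 0.03, 0.05, 0.04, 0.90, 0.03, 0.85, 0.04]
epsilon = 1.5  # deviation multiplier (illustrative choice)

# Nodes whose gamma exceeds mu + epsilon * sigma are taken as centers.
mu, sigma = mean(gamma), pstdev(gamma)
centers = [i for i, g in enumerate(gamma) if g > mu + epsilon * sigma]
```

Because the threshold adapts to the mean and spread of $\gamma$, the number of centers (and hence communities) follows from the data rather than from a preset $k$.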
As mentioned earlier, a major challenge for the traditional K-means algorithm in community detection is its inability to automatically determine the number of communities, even though this choice directly affects the quality of the partition. This paper uses the density peak mechanism to select an appropriate number of density peak points as community centers; once $\varepsilon$ is fixed, the community centers, and with them the number of communities, are determined automatically.
The D-means algorithm excels in achieving a balance between parametric simplicity and robustness of results. Unlike the K-means algorithm, which requires a predefined number of communities, or the Label Propagation Algorithm (LPA), which is sensitive to random initialization, the core hyperparameter of D-means is typically the neighborhood radius ε. This parameter intuitively defines the physical scale of node neighborhoods in the feature space.
Through systematic parameter sweep experiments, we observe that the performance of the D-means algorithm exhibits a typical three-phase pattern as ε varies:
Under-partitioning region (ε > ε_high): When ε is excessively large, the algorithm tends to merge multiple natural communities, resulting in a significantly lower number of communities than the ground truth. Metrics such as modularity (Q) and normalized mutual information (NMI) decline sharply.
Stable plateau region (ε_low ≤ ε ≤ ε_high): A broad interval exists for ε values. Within this range, the core community structure identified by the algorithm remains stable, with key performance indicators (e.g., ACC, NMI) fluctuating by less than 5%. This interval can typically be estimated efficiently using statistical measures of pairwise node distances, such as the 15th to 60th percentiles of the distance distribution, significantly reducing the difficulty of parameter tuning.
Over-partitioning region (ε < ε_low): When ε is too small, node neighborhoods become overly constrained, leading the algorithm to identify each node or its small clusters as independent communities. This results in a large number of trivial communities, rendering the partitioning results meaningless at a macroscopic level, while modularity also performs poorly. We set ε within the range of the 15th to 60th percentiles of the distance distribution.
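Assuming the percentile heuristic above, the stable interval for ε could be estimated as in this sketch (nearest-rank percentiles over an illustrative list of pairwise distances):

```python
def percentile(xs, p):
    """Nearest-rank percentile of xs (p in [0, 100])."""
    s = sorted(xs)
    idx = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[idx]

# Illustrative pairwise-distance distribution (e.g., shortest-path lengths).
pair_dists = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8]
eps_low  = percentile(pair_dists, 15)   # lower end of the stable plateau
eps_high = percentile(pair_dists, 60)   # upper end of the stable plateau
```

Any ε chosen inside `[eps_low, eps_high]` would fall in the stable plateau region described above, so a mid-range value is a reasonable default.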

3.3. Specific Execution Steps of D-Means Algorithm

This paper proposes a novel algorithm named the D-means algorithm. Based on the topological connections within the network structure, the algorithm assigns density and distance attributes to each node. Candidate central nodes are identified as those exhibiting higher density and greater distance values. The Chebyshev inequality is then employed to automatically select potential central nodes from the candidate set.
Subsequently, the eigenvectors of the normalized Laplacian matrix are used as the distance metric to compute the similarity between each node and the central nodes. Here, the number of eigenvectors is set equal to the number of community centers, which was automatically determined in the preceding step. Each node is then assigned to the closest community until all nodes have been classified. The overall workflow of the D-means algorithm is illustrated in Figure 3.
The detailed procedure is described as follows:
Input: Complex network G = ( V , E ) .
Output: Results of community division.
Step 1: Calculate the density attribute $\rho_i$ and distance attribute $\delta_i$ of each node, and normalize these attributes;
Step 2: Calculate the product index $\gamma_i = \rho_i^{nor} \times \delta_i^{nor}$ of each node and set an upper limit using Chebyshev's inequality to automatically select the community centers;
Step 3: Use the eigenvectors of the normalized Laplacian matrix $L$ as the distance metric;
Step 4: Calculate the distance between each node and each community center, and assign the node to the center with the closest distance;
Step 5: Terminate once all nodes have been assigned, and output the community division.
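Steps 3 and 4 can be sketched on a toy graph of two triangles joined by a bridge, assuming the two centers were already identified in Step 2 (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

# Two triangles joined by one bridge edge; the centers are assumed to have
# been identified (hypothetically nodes 0 and 3) by the Chebyshev step.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n, centers = 6, [0, 3]

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))   # normalized Laplacian

# Take as many eigenvectors as there are centers (smallest eigenvalues);
# np.linalg.eigh returns eigenvalues in ascending order.
vals, vecs = np.linalg.eigh(L)
embed = vecs[:, :len(centers)]                # n x k spectral embedding

# Step 4: assign each node to the center whose embedding row is closest.
labels = [min(centers, key=lambda c: np.linalg.norm(embed[i] - embed[c]))
          for i in range(n)]
```

The spectral embedding places each triangle's nodes close together, so the nearest-center rule recovers the two natural communities in one pass.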

3.4. Analysis of Algorithm Complexity and Scalability

Assuming a network with $N$ nodes and $M$ edges, the time complexity of computing node density attributes, which incorporate both node degree and clustering coefficient, is $O(N + M)$. The computation of node distance attributes requires $O(N^2)$. Normalization processes each element, with a complexity of $O(N)$, and identifying community centers via Chebyshev's inequality likewise takes $O(N)$. The eigenvalue and eigenvector computation of the Laplacian matrix dominates, with a time complexity of $O(N^3)$, so the overall time complexity of the proposed D-means algorithm is $O(N^3)$. Because precise eigenvector computation is expensive, the algorithm's capability to process ultra-large-scale networks is constrained. D-means is therefore currently better suited to small- and medium-scale network analysis scenarios that place a high demand on automatic center identification.

4. Algorithm Validation and Analysis

To evaluate the performance of the proposed algorithm, this paper conducts comparative experiments between the D-means algorithm and six classical community detection methods: GN (Girvan–Newman algorithm), FN (fast community detection algorithm), Louvain (the Louvain method for community detection in large networks), Walktrap (community detection using short random walks), K-means (K-means-based community detection), and LPA (Label Propagation Algorithm). The test datasets include LFR benchmark networks and four widely used real-world networks. Multiple evaluation metrics, namely ACC, NMI, Q, ARI, and F1 (F1-score), are employed to assess the accuracy of the community partitions. Each algorithm was independently executed 10 times on each dataset, and the final results are reported as the averages of each metric to ensure stability and reliability.

4.1. Datasets and Parameter Settings

All experiments were conducted in a Python 2021 environment, with results visualized using Origin 2022 software. The evaluation utilized both real-world networks and LFR benchmark datasets. For the LFR networks, algorithm accuracy was assessed by varying key parameters, with initial configurations set as follows: $K = 10$, $maxK = 60$, $mu = 0.1$, $t_1 = 2$, $t_2 = 1$, $minc = 200$, $maxc = 400$. Table 1 summarizes the specifications of the real and synthetic networks used in this study, where $C$ denotes the number of ground-truth communities, $N$ the number of nodes, $M$ the number of edges, $K$ the average degree, and $mu$ the mixing parameter. In the table, “-” indicates that the corresponding parameter does not need to be set.

4.2. Evaluation Metrics

In order to comprehensively describe the reliability and effectiveness of the algorithm, we used five evaluation metrics, including Q, ACC, ARI, NMI, and F1. Their meanings and specific calculation formulas are as follows:
Q denotes the modularity. Assuming the network is partitioned into k communities, Q is defined as:
Q = \sum_{i=1}^{k}\left(\frac{e_{ii}}{m}-\left(\frac{a_i}{2m}\right)^{2}\right)
In the formula, e i i represents the number of internal edges within community i , a i represents the sum of the degrees of all nodes in community i , and m represents the total number of edges in the entire network. The value of modularity ranges between −1 and 1. A value closer to 1 indicates denser connections within communities, implying a more pronounced community structure.
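As a concrete check of this formula, the toy example below evaluates Q directly and compares it with networkx's built-in implementation. The bridged-triangles graph is an illustrative assumption, not a dataset from the paper:

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Two triangles joined by a single bridge edge: an obvious 2-community graph.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
parts = [{0, 1, 2}, {3, 4, 5}]

# Direct evaluation of Q = sum_i ( e_ii/m - (a_i/(2m))^2 ).
m = G.number_of_edges()  # 7 edges in total
q_manual = sum(
    G.subgraph(c).number_of_edges() / m          # e_ii / m
    - (sum(d for _, d in G.degree(c)) / (2 * m)) ** 2  # (a_i / 2m)^2
    for c in parts
)

# networkx's built-in modularity agrees with the hand computation.
q_nx = modularity(G, parts)
print(round(q_manual, 4), round(q_nx, 4))  # 0.3571 0.3571
```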
ACC represents the proportion of nodes correctly partitioned by the algorithm. Assume the true community structure of the network is A = {A_1, A_2, ..., A_k}, and the community structure computed by the algorithm is B = {B_1, B_2, ..., B_k}, where k represents the number of communities. The formula for ACC is:
\mathrm{ACC} = \frac{N_{\mathrm{correct}}}{N}
In the formula, N_correct is the number of nodes whose algorithm-assigned community exactly matches the true community structure, and N is the total number of nodes. ACC ranges from 0 to 1; a larger value indicates a more accurate partition.
ARI denotes the adjusted Rand index. The formula for ARI is:
\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{\max(\mathrm{RI}) - E[\mathrm{RI}]}
In the formula, RI is the Rand Index, E [ RI ] is its expected value, and max ( RI ) denotes the maximum possible value of RI . The value of ARI ranges from −1 to 1. A value closer to 1 indicates a higher consistency in the community partition, while a value closer to 0 or −1 indicates lower consistency.
NMI denotes Normalized Mutual Information. Assume A and B represent the true community structure and the algorithmically detected community structure, respectively. The formula for calculating NMI is:
\mathrm{NMI}(A, B) = \frac{2 I(A, B)}{H(A) + H(B)}
In the formula, I ( A , B ) represents the mutual information between A and B , while H ( A ) and H ( B ) denote the entropy of A and B , respectively.
The value of NMI ranges from 0 to 1. A larger value indicates that the community structure obtained by the algorithm is closer to the ground-truth structure. When NMI = 1, the algorithm's result is completely consistent with the true community partition; when NMI = 0, the detected partition is statistically independent of the true one.
F1 denotes the F1-score. The F1-score for community detection is calculated by treating the problem as the classification of node pairs. First, we count the number of node pairs that are correctly grouped together in both the ground-truth and detected partitions (true positives, TP), as well as the misclassified pairs (false positives, FP, and false negatives, FN). Then, precision ( P = TP/(TP + FP) ) and recall ( R = TP/(TP + FN) ) are computed. Finally, the F1-score is their harmonic mean: F_1 = \frac{2PR}{P + R}.
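These external metrics are available off the shelf. The sketch below computes ARI and NMI with scikit-learn and implements the pair-counting F1 described above; the toy labels are illustrative:

```python
from itertools import combinations
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Same partition with permuted labels: external metrics should all be 1.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 1, 0, 0, 0]

ari = adjusted_rand_score(true_labels, pred_labels)           # 1.0
nmi = normalized_mutual_info_score(true_labels, pred_labels)  # 1.0

def pair_f1(true, pred):
    """Pair-counting F1: classify node pairs as same/different community."""
    def same_pairs(labels):
        return {p for p in combinations(range(len(labels)), 2)
                if labels[p[0]] == labels[p[1]]}
    t, p = same_pairs(true), same_pairs(pred)
    tp, fp, fn = len(t & p), len(p - t), len(t - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

f1 = pair_f1(true_labels, pred_labels)  # 1.0
```

Note that ARI, NMI, and pair-counting F1 are invariant to label permutation, whereas the simple node-matching ACC above requires aligning detected labels with true labels first.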

4.3. Automatic Selection of Central Nodes

Figure 4 illustrates the method for determining community center nodes using Chebyshev’s inequality. Each subplot displays the product index γ of different network nodes, with the horizontal axis representing the node index and the vertical axis representing the value of γ . The red line indicates a preset upper threshold. Any node whose γ value exceeds this threshold is identified as a community center node. As shown in the subplots, the γ values of the identified center nodes are significantly higher than those of ordinary nodes, demonstrating a clear separation between center nodes and others. In the Karate network, for example, the threshold is set to γ = 0.414 , and Node 1 and Node 34 are automatically selected as center nodes. For comparison, the traditional density peak clustering algorithm—which relies on visual inspection—also identifies the same two center nodes. This indicates that the centers selected by the Chebyshev-based method are highly consistent with those chosen manually. This approach reasonably sets the threshold and effectively identifies center nodes without increasing computational complexity, thereby overcoming the subjectivity and high computational cost of traditional methods.
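A minimal sketch of such a Chebyshev-style threshold is given below, assuming the rule flags nodes whose γ exceeds the mean by k standard deviations (Chebyshev's inequality guarantees that at most 1/k² of values lie that far out). The multiplier k and the toy γ values are assumptions; the Karate threshold of 0.414 quoted above is not reproduced here.

```python
import numpy as np

def select_centers(gamma, k=3.0):
    """Flag nodes whose product index gamma is a k-sigma outlier.

    By Chebyshev's inequality, at most 1/k**2 of any distribution lies
    more than k standard deviations above the mean, so nodes above the
    threshold are rare "peaks" regardless of how gamma is distributed.
    """
    mu, sigma = gamma.mean(), gamma.std()
    threshold = mu + k * sigma
    return np.flatnonzero(gamma > threshold), threshold

# Toy gamma values: 30 ordinary nodes plus two clear density peaks.
gamma = np.array([0.1] * 30 + [5.0, 6.0])
centers, thr = select_centers(gamma)
print(centers)  # → [30 31]
```

The number of flagged nodes directly gives the number of communities, which is how the center count is determined without user input.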

4.4. Comparative Analysis of Algorithms

(1) Experiments on real datasets
This section evaluates the D-means algorithm on the real-world network datasets listed in Table 1 above, comparing the evaluation metrics across datasets. Owing to differences in algorithm applicability, the symbol "-" in the table indicates values that are extremely small (approximately zero) and therefore not meaningful for comparison.
Table 2 compares the evaluation metrics of the D-means algorithm against six other methods on real-world datasets. The results demonstrate that D-means achieves better overall performance in terms of ACC, ARI, and NMI. However, it does not excel in modularity, as the algorithm does not rely on modularity optimization for network partitioning.
On the Karate dataset, the ACC, ARI, and NMI of D-means are significantly higher than those of other algorithms and closely align with the results of K-means, whereas the performance of other methods is considerably lower. Figure 5 visualizes the partitioning result of D-means on the Karate network. Only one node (Node 3) is misclassified, which can be attributed to its position at the community boundary and its strong connections to nodes in both communities, leading to ambiguity in community assignment.
For the Dolphins dataset, the metrics of D-means are slightly lower than those of K-means but remain superior to other algorithms. The ground-truth network contains two communities, while D-means identifies three, suggesting that the algorithm captures a more fine-grained community structure. The corresponding partition result is shown in Figure 6.
Figure 7 shows the partitioning results of the D-means algorithm on the Polbooks dataset. The ground-truth Polbooks network contains three communities, but D-means identifies only two community centers. Further analysis of the internal community structure reveals that the unrecognized community contains only 13 nodes, whose distance attributes are small, preventing its identification as an independent community. Despite this error, the overall performance of the algorithm remains superior to that of the other methods. Figure 8 shows the performance of the D-means algorithm on the Football dataset. Here, K-means performs best, while the partitioning produced by D-means is less satisfactory: only 7 of the actual 12 communities are identified. The cause is that some potential community centers are directly connected to nodes in other communities that have higher density attributes, so these centers are treated as ordinary nodes, which degrades the partition.
(2) Experiments on artificial benchmark networks
The GN algorithm suffers from high computational complexity, and its runtime degrades sharply on large-scale networks. Consequently, it was excluded from subsequent tests on synthetic benchmark networks. Table 3 compares the performance of the D-means algorithm against five other methods on artificial benchmark networks. Owing to differences in algorithm applicability, the symbol "-" in the table indicates values that are extremely small (approximately zero) and therefore not meaningful for comparison. Experimental results demonstrate that D-means achieves superior community detection performance, confirming its effectiveness.
Among the other algorithms, K-means delivers the strongest overall results but relies on a predefined number of communities, limiting its practical applicability. The D-means algorithm performs closely behind K-means across all metrics, with only marginal differences, yet does not require prior knowledge of the community count—a significant advantage. The Louvain algorithm executes rapidly and scales well to large networks, though it is outperformed by D-means on both the LFR1 and LFR2 datasets. The Walktrap algorithm, based on random walks, is sensitive to the starting node and random seeds, and its performance depends heavily on parameters such as walk length. This necessitates extensive tuning, increasing the practical cost of application. LPA performs poorly on large-scale networks or complex structures, exhibiting strong randomness and poor robustness. In contrast, the FN algorithm performs the weakest, as it incorporates numerous redundant node queries during computation, leading to significantly lower evaluation scores and the poorest partitioning accuracy among all compared methods.
Table 4 compares the execution time of the D-means algorithm against five other methods on synthetic benchmark networks. Experimental results indicate that D-means and K-means are the most time-efficient methods; K-means is marginally quicker on the smaller networks, but its runtime grows rapidly with node count, so the advantage of D-means becomes more pronounced as network size increases. While Louvain performs well in detection accuracy, its execution time exceeds that of D-means by more than threefold. LPA exhibits low time complexity, with runtime growing only gently as the network scales. The Walktrap and FN algorithms show the highest computational cost, with runtimes over 200 times greater than that of the D-means method. When the D-means algorithm is executed on a benchmark network with 10,000 nodes, its runtime increases significantly, indicating that the algorithm is currently unsuitable for large-scale network computation tasks.
In summary, the D-means algorithm requires no parameter tuning across different networks, demonstrating strong stability and high efficiency in most scenarios, which significantly enhances its flexibility and practical applicability. By integrating the advantages of both DPC and K-means algorithms, it provides an automated and robust community detection solution for small- to medium-scale networks.

5. Application of D-Means Based Community Detection Algorithm in Public Transportation

As a core component of urban infrastructure, the public transportation system exhibits complex structural characteristics. Traditional planning methods are often limited to localized and static analyses, making it difficult to reveal the intrinsic patterns of such systems from a mesoscopic perspective. In this study, we employ the D-means algorithm to investigate the community structure of public transportation networks from a complex network perspective, with the aim of uncovering latent community patterns and functional modules. This approach provides a scientific basis for optimizing urban transportation systems and enhancing public mobility.
We take the Urumqi Metro-Bus Multimodal Transportation Network (MT Network) as the research object. Applying the D-means algorithm for community detection, we obtain a modularity value of 0.555 and identify 12 communities in the MT network, as visualized in Figure 9. The largest community comprises 434 nodes, while the smallest consists of only 7 nodes.
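This workflow can be reproduced on any stop-level edge list. The sketch below uses a handful of hypothetical stops and, since the D-means implementation itself is not listed here, substitutes networkx's greedy modularity detector purely to illustrate the pipeline of building the network and evaluating Q for the resulting partition:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Hypothetical consecutive-stop pairs from two lines joined by one transfer edge.
edges = [("A", "B"), ("B", "C"), ("C", "A"),   # line/cluster 1
         ("D", "E"), ("E", "F"), ("F", "D"),   # line/cluster 2
         ("C", "D")]                           # transfer link bridging them
G = nx.Graph(edges)

# Stand-in detector; in the paper this step is performed by D-means.
communities = greedy_modularity_communities(G)
q = modularity(G, communities)
print([sorted(c) for c in communities], round(q, 3))
```

For the real MT network, the nodes would be the metro and bus stations and the edges their adjacency or transfer links, yielding the 12 communities and Q = 0.555 reported above.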
The city is surrounded by mountains to the east, west, and south, while the northern area consists of a flat alluvial plain formed by the Toutun River. This topography gives the city a corridor-like layout, elongated in the north–south direction and relatively narrow from east to west. Analysis of geographical distribution reveals that the public transportation network exhibits a radial pattern, with density gradually decreasing from the urban center toward the periphery. Tianshan District features the highest concentration of stations, while Urumqi County shows a notably sparser distribution. This spatial disparity reflects significant differences in the functional importance and travel demand across zones within the transportation network. The distribution of stations is influenced by multiple factors, including terrain, economic development, and population density. For instance, economically developed and densely populated areas require a higher density of transit stations to meet residents' mobility needs, whereas regions with complex terrain or lower population density naturally exhibit a more dispersed station layout.
Larger communities consist of densely connected nodes with frequent interactions, forming tightly knit structural units. The most extensive community, for instance, radiates from key hubs such as Hongqi Road West, Zhongquan Square, Heping North Road, and Qizhong. Centered around commercial districts, it encompasses multiple administrative centers including the Market Supervision Bureau, Tax Bureau, Regional Government, and the Xinjiang Branch of the National Financial Regulatory Administration, collectively forming a closely interconnected functional zone focused on commerce and administration. The strong linkages among these nodes facilitate not only commercial exchanges but also enhance social interactions within the region. Another community, centered on Henan East Road in the new urban area, spans multiple residential complexes such as the 23rd Street Residential Community, Chenxin Residential Community, and Zhongtian Hanlin Finished Homes, while also incorporating major scientific and educational institutions including Xinjiang University of Finance and Economics, Xinjiang Vocational University, Xinjiang Police College, and the Chinese Academy of Sciences (Xinjiang Branch). This integration reflects a distinct atmosphere characterized by academic activity and residential life. The compact spatial structure of these communities is often situated in areas with high transportation density in Urumqi. Due to topographical constraints such as rivers and southern mountains, industrial emissions tend to accumulate in the northern plains, leading to the concentration of heavy industrial zones in the northern part of the city. This is exemplified by the deep orange community, which centers around the Shayibak District Corps Industrial Park. Smaller communities, though comprising fewer nodes, maintain strong internal connections and often represent relatively independent or peripheral units within the network. 
For instance, the light yellow community is located in the industrial park of the new district, clustering enterprises such as China National Pharmaceutical Group Xinjiang Special Pharmaceutical Co., Ltd., Laiwo Technology, and Xinjiang Zhongtai Special Power Equipment Co., Ltd., indicating a clear correlation between community formation and industrial-economic activities. Similarly, the blue-green community in eastern Shuimogou District contains multiple large recreational venues. Despite scattered transportation routes, internal connectivity remains strong. Community division within the urban public transportation network is shaped by a combination of topographic, economic, and social factors. This is reflected in the spatial distribution of communities such as the pink, black, and brown ones, located, respectively, in residential areas of Urumqi County, Shuimogou District, and Toutunhe District, highlighting the role of community structure in serving daily living and residential needs.
The analysis reveals that stations within the same community are geographically adjacent and strongly interconnected. The urban network exhibits clear functional zoning: Tianshan District is dominated by commercial and administrative areas, the new urban area by research-oriented residential clusters, the northern new district by industrial parks, Shuimogou District by large entertainment venues, and Toutunhe and Midong Districts by industrial development. The public transportation layout is shaped by topography, urban functional zoning, economic development, and population distribution. To enhance regional economic vitality, urban planning should prioritize the placement of major facilities such as shopping malls, hospitals, and schools near central nodes. Bridging nodes, which connect multiple communities, are critical to network resilience; their failure or congestion could disrupt inter-community traffic flow. The D-means algorithm effectively uncovers the community structure of the city’s public transportation network, reflecting its structural and functional characteristics, and provides robust support for urban planning and traffic management.

6. Conclusions

In this study, the density peak clustering algorithm is applied to community structure detection, addressing the limitations of the traditional K-means algorithm. A density peak-based community detection algorithm, termed D-means, is proposed by integrating and refining the K-means framework. This algorithm redefines node density and distance attributes to enable the automatic identification of central nodes and determination of the number of communities. To evaluate its performance, systematic comparative experiments were conducted on benchmark networks and real-world datasets. The results demonstrate that the D-means algorithm exhibits stable and satisfactory performance, with significantly reduced running time compared to existing methods. Furthermore, the algorithm was applied to partition the urban transportation network of Urumqi. The analysis revealed that stations within the same community are geographically proximate and highly interconnected, while central nodes exert a considerable influence on both traffic flow and economic dynamics within the community. Influenced by factors such as terrain, economic activity, and industrial layout, the network communities display distinct regional and functional characteristics. These findings indicate that the D-means algorithm can effectively identify community structures and functional zones in urban transportation networks. With technological advancements, real-world networks have grown increasingly large in scale and often exhibit overlapping community structures. However, due to its reliance on eigendecomposition, this algorithm suffers from high computational complexity, making it unsuitable for large-scale network detection and incapable of identifying overlapping communities. To enhance the algorithm's scalability, we plan to adopt approximation techniques such as the Lanczos algorithm and iterative methods in subsequent work to reduce computational complexity.
Furthermore, after spectral embedding, we will replace K-means with fuzzy C-means clustering for soft assignment, thereby extending the algorithm to handle overlapping communities.
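As a sketch of the planned direction, SciPy's ARPACK wrapper already exposes a Lanczos-type iteration that returns only a few extremal eigenpairs of a sparse Laplacian, avoiding the full O(N^3) decomposition. The toy graph below is an illustrative assumption:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Sparse Laplacian L = D - A of a small toy graph (two bridged triangles).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = sp.lil_matrix((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
A = A.tocsr()
deg = np.asarray(A.sum(axis=1)).ravel()
L = sp.diags(deg) - A

# Lanczos-based iteration (ARPACK) for the k smallest eigenpairs only;
# 'SM' requests the smallest-magnitude eigenvalues.
vals, vecs = eigsh(L, k=2, which="SM")
vals = np.sort(vals)
print(np.round(vals, 6))  # first eigenvalue is ~0 (connected graph)
```

The returned eigenvectors provide the spectral embedding on which the clustering step operates, at a cost that scales with the number of edges rather than N^3.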

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/e28020152/s1.

Author Contributions

Conceptualization, H.G. and J.H.; methodology, J.H.; software, J.H.; validation, Y.L., B.Y. and P.Z.; writing—original draft preparation, H.G.; writing—review and editing, Y.Q. and Y.Z.; project administration, F.L.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant numbers 12375033 and 62402010); the Key R&D Program of the Shaanxi Provincial Department of Science and Technology (grant number 2025CY-JJQ-65); the Natural Science Basic Research Plan in Shaanxi Province of China (grant number 2024SF-YBXM-134); the Shaanxi Fundamental Science Research Project for Mathematics and Physics (grant numbers 22JSY021 and 23JSQ051); and the Baoji University of Arts and Sciences Innovative Research Project of Postgraduates (grant number YJSCX25YB45).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data used in this study are sourced as follows: the four network datasets (Karate, Dolphins, Football, Polbooks) are publicly available with the kind permission of Professor Mark Newman from the Department of Physics and the Center for the Study of Complex Systems at the University of Michigan, and are stored at https://public.websites.umich.edu/~mejn/netdata (accessed on 5 June 2025); the LFR benchmark dataset is generated following Lancichinetti, A., Fortunato, S., & Radicchi, F. (2008). Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4), 046110 [23]; the Urumqi Metro-Bus Multimodal Transportation Network dataset is provided as Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cheung, K.K.; Ozturk, U.; Malik, N.; Agarwal, A.; Krishnan, R.; Rajagopalan, B. A review of synchronization of extreme precipitation events in monsoons from complex network perspective. J. Hydrol. 2025, 651, 132604. [Google Scholar] [CrossRef]
  2. Shin, S.; Kim, J. Review of complex network analysis for MEG. Korean J. Appl. Stat. 2023, 36, 361–380. [Google Scholar] [CrossRef]
  3. Yang, R.; Wang, Y.; Wang, X.; Lei, Z.; Zheng, Z.; Qian, Y. The extended master stability function approach to alternating synchronization modes on networked oscillator systems. New J. Phys. 2026, 28, 013901. [Google Scholar] [CrossRef]
  4. Glover, C.; Barabási, A.-L. Measuring Entanglement in Physical Networks. Phys. Rev. Lett. 2024, 133, 077401. [Google Scholar] [CrossRef] [PubMed]
  5. Saoud, B. Community structure detection in networks based on Tabu search. J. Control Decis. 2024, 11, 222–232. [Google Scholar] [CrossRef]
  6. Rashid, N.U.; Chan, K.T. A Review on Transportation Network based on Complex Network Approach. arXiv 2023, arXiv:2308.04636. [Google Scholar] [CrossRef]
  7. Sahu, S.; Kothapalli, K.; Banerjee, D.S. Shared-Memory Parallel Dynamic Louvain Algorithm for Community Detection. In IEEE International Parallel and Distributed Processing Symposium Workshops; IEEE: New York, NY, USA, 2024; pp. 1204–1205. [Google Scholar]
  8. Wang, Y. An Improved Complex Network Community Detection Algorithm Based on K-Means. Adv. Intell. Soft Comput. 2012, 160, 243–248. [Google Scholar]
  9. Hajij, M.; Said, E.; Todd, R. PageRank and The K-Means Clustering Algorithm. arXiv 2020, arXiv:2005.04774. [Google Scholar]
  10. Sommer, F.; Fouss, F.; Saerens, M. Modularity-Driven Kernel k-means for Community Detection. In Artificial Neural Networks and Machine Learning; Springer: Cham, Switzerland, 2017; Volume 10614, pp. 423–433. [Google Scholar]
  11. Cai, B.; Zeng, L.; Wang, Y.; Li, H.; Hu, Y. Community Detection Method Based on Node Density, Degree Centrality, and K-Means Clustering in Complex Network. Entropy 2019, 21, 1145. [Google Scholar] [CrossRef]
  12. Kojaku, S.; Radicchi, F.; Ahn, Y.Y.; Fortunato, S. Network community detection via neural embeddings. Nat. Commun. 2024, 15, 9446. [Google Scholar] [CrossRef]
  13. He, C.; Zheng, Y.; Cheng, J.; Tang, Y.; Chen, G.; Liu, H. Semi-supervised overlapping community detection in attributed graph with graph convolutional autoencoder. Inf. Sci. 2022, 608, 1464–1479. [Google Scholar] [CrossRef]
  14. Guo, K.; Lin, J.; Zhuang, Q.; Zeng, R.; Wang, J. Adaptive graph contrastive learning for community detection. Appl. Intell. 2023, 53, 28768–28786. [Google Scholar] [CrossRef]
  15. Baltsou, G.; Tsichlas, K.; Vakali, A. Local community detection with hints. Appl. Intell. 2022, 52, 9599–9620. [Google Scholar] [CrossRef]
  16. Ni, L.; Ge, J.; Zhang, Y.; Luo, W.; Sheng, S.V. Semi-Supervised Local Community Detection. IEEE Trans. Knowl. Data Eng. 2024, 36, 823–839. [Google Scholar] [CrossRef]
  17. Wei, X.; Peng, M.; Huang, H.; Zhou, Y. An overview on density peaks clustering. Neurocomputing 2023, 14, 126633. [Google Scholar] [CrossRef]
  18. Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496. [Google Scholar] [CrossRef] [PubMed]
  19. Zang, W.; Liu, X.; Ma, L.; Sun, M.; Che, J.; Zhao, Y.; Wang, Y.; Wang, D.; Liu, X. DPC-MFP: An adaptive density peaks clustering algorithm with multiple feature points. Neurocomputing 2025, 618, 129060. [Google Scholar] [CrossRef]
  20. Xie, J.; Yan, H.; Wang, M.; Grant, P.W.; Pedrycz, W. WANN-DPC: Density peaks finding clustering based on Weighted Adaptive Nearest Neighbors. Pattern Recognit. 2026, 170, 111953. [Google Scholar] [CrossRef]
  21. Bian, Z.; Chung, F.; Wang, S. Fuzzy Density Peaks Clustering. IEEE Trans. Fuzzy Syst. 2021, 29, 1725–1738. [Google Scholar] [CrossRef]
  22. Seyedi, S.A.; Lotfi, A.; Moradi, P.; Qader, N.N. Dynamic graph-based label propagation for density peaks clustering. Expert Syst. Appl. 2019, 115, 314–328. [Google Scholar] [CrossRef]
  23. Lancichinetti, A.; Fortunato, S.; Radicchi, F. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 2008, 78, 046110. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Flow Chart of Density Peak Clustering Algorithm.
Figure 2. Dataset distribution and decision diagram: (a) Data distribution; (b) Decision diagram (Serial numbers represent node identifiers).
Figure 3. Flowchart of D-means algorithm.
Figure 4. Automated identification of community center nodes on four real-world networks: (a) Karate; (b) Dolphins; (c) Polbooks; (d) Football.
Figure 5. Community structure identified by the D-means algorithm in the Karate network (Numbers represent different nodes in the network, and different colors represent different communities).
Figure 6. Community structure identified by the D-means algorithm in the Dolphins network (Numbers represent different nodes in the network, and different colors represent different communities).
Figure 7. Community structure identified by the D-means algorithm in the Polbooks network (Numbers represent different nodes in the network, and different colors represent different communities).
Figure 8. Community structure identified by the D-means algorithm in the Football network (Numbers represent different nodes in the network, and different colors represent different communities).
Figure 9. The Community Division Results of MT Network (Different colors represent different communities).
Table 1. Real Network Dataset and Synthetic Network Parameters.
Name       C    N      M       K      mu
Karate     2    34     78      4.59   -
Dolphins   2    62     159     5.13   -
Football   12   115    613     10.66  -
Polbooks   3    105    441     8.4    -
LFR1       3    1000   5336    10     0.1
LFR2       6    2000   10,646  10     0.1
LFR3       10   3000   15,822  10     0.1
LFR4       14   4000   21,289  10     0.1
The “-” in the table indicates that this parameter is not applicable to the network.
Table 2. Comparison of D-means algorithm with six other algorithms on real datasets.
Karate
            Q      ACC    ARI     NMI    F1
GN          0.401  0.441  0.469   0.580  0.588
FN          0.381  0.176  0.271   0.243  0.530
Louvain     0.389  0.647  0.645   0.678  0.753
Walktrap    0.400  0.765  0.229   0.189  0.506
K-means     0.360  0.971  0.882   0.836  0.883
LPA         0.372  0.971  0.882   0.837  0.971
D-means     0.360  0.971  0.882   0.836  0.562

Dolphins
            Q      ACC    ARI     NMI    F1
GN          0.519  0.113  0.395   0.554  0.363
FN          0.492  0.355  -       0.004  0.405
Louvain     0.510  0.097  0.318   0.507  0.325
Walktrap    0.501  0.694  -       0.017  0.258
K-means     0.379  0.984  0.935   0.889  0.538
LPA         0.384  0.403  −0.012  0.022  0.273
D-means     0.444  0.887  0.567   0.522  0.651

Polbooks
            Q      ACC    ARI     NMI    F1
GN          0.517  0.810  0.682   0.558  0.659
FN          0.502  0.067  0.638   0.531  0.736
Louvain     0.518  0.076  0.631   0.555  0.641
Walktrap    0.526  0.848  0.665   0.554  0.653
K-means     0.499  0.838  0.675   0.574  0.716
LPA         0.477  0.829  0.613   0.537  0.689
D-means     0.512  0.848  0.546   0.500  0.558

Football
            Q      ACC    ARI     NMI    F1
GN          0.600  0.296  0.778   0.879  0.138
FN          0.568  0.078  0.002   0.141  0.918
Louvain     0.602  -      0.616   0.813  0.160
Walktrap    0.603  0.226  -       0.213  0.567
K-means     0.601  0.913  0.897   0.924  0.108
LPA         0.579  0.174  0.789   0.889  0.195
D-means     0.583  0.635  0.565   0.771  0.475
The “-” in the table indicates that the algorithm is not applicable, leading to poor partitioning results with values close to zero, which are meaningless.
Table 3. Comparison of D-means algorithm with five other algorithms on artificial benchmark networks.
LFR1
            Q      ACC    ARI    NMI     F1
FN          0.564  0.328  -      0.002   0.334
Louvain     0.554  0.972  0.916  0.875   0.730
Walktrap    0.565  0.356  -      0.0015  0.730
K-means     0.566  1      1      1       1
LPA         0.556  0.347  1.0    1.0     0.333
D-means     0.561  0.999  0.978  0.971   0.985

LFR2
            Q      ACC    ARI    NMI     F1
FN          0.722  0.167  -      0.003   0.170
Louvain     0.730  -      0.998  0.997   0.997
Walktrap    0.730  0.215  -      0.003   0.170
K-means     0.730  1      1      1       1
LPA         0.730  0.686  0.997  0.997   0.666
D-means     0.692  0.987  0.914  0.922   0.927

LFR3
            Q      ACC    ARI    NMI     F1
FN          0.798  0.093  -      0.006   0.101
Louvain     0.799  -      1      1       1
Walktrap    0.799  0.135  -      0.0579  0.101
K-means     0.799  1      1      1       1
LPA         0.799  0.223  1.0    1.0     0.2
D-means     0.775  0.998  0.970  0.997   0.972

LFR4
            Q      ACC    ARI    NMI     F1
FN          0.821  0.115  -      0.008   0.075
Louvain     0.825  -      1      1       1
Walktrap    0.822  0.116  0.001  0.008   0.075
K-means     0.825  1      1      1       1
LPA         0.824  0.105  0.998  0.998   0.071
D-means     0.778  0.996  0.924  0.958   0.930
The “-” in the table indicates that the algorithm is not applicable, leading to poor partitioning results with values close to zero, which are meaningless.
Table 4. Running times of the D-means algorithm and five other algorithms on artificial benchmark networks.
Time/s     LFR1     LFR2      LFR3       LFR4
FN         53.18    402.48    1403.76    3480.95
Louvain    3.445    10.14     11.27      35.38
Walktrap   203.02   2558.94   34,189.70  38,734.87
K-means    0.828    2.836     7.279      49.19
LPA        9.003    9.350     9.519      10.089
D-means    0.984    3.025     7.231      14.12
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

