1. Introduction
Social networks, as mappings of social relationships, have been widely studied within the framework of complex network theory. They capture interpersonal relationships and information exchange, span the social and computer sciences, and have far-reaching implications for business, politics, healthcare, and other fields [1,2]. In complex networks, individuals are abstracted as nodes, and the relationships between them are abstracted as edges [3]. Social networks exhibit many features of complex networks, including the small-world phenomenon [4], the scale-free property [5], and community structure [6]. Recent research has shown that an in-depth exploration of community structure helps to reveal the intrinsic organization and patterns of social networks [7], and thus provides theoretical support for a more comprehensive understanding of their structure and function.
Social network structure optimization methods are commonly classified into global and local optimization algorithms, according to the scope of their search space and the nature of their objectives. In 1970, Kernighan et al. [8] proposed the Kernighan–Lin algorithm, which partitions the network into a predefined number of parts of predefined sizes; it defines a gain function for a partition and divides communities based on the change in gain. A large variety of global optimization algorithms evolved from this idea. Newman [9] defined edge betweenness as an index of edge importance and sequentially removed non-core (inter-community) edges from the network to obtain the community partition; in contrast, Shen et al. [10] proposed a merging clustering algorithm that adds edges to the network sequentially based on maximal clusters. Newman et al. [9] first proposed a modularity function to measure the quality of a community division in 2004, and many optimization algorithms based on modularity have since been proposed and improved. Boettcher et al. [11] optimized the modularity function Q using the fitness value of genetic algorithms; Li et al. [12] presented an extremal optimization semi-supervised algorithm based on pairwise constraint structure enhancement, which solved the pseudo-connection problem. However, modularity maximization is NP-hard, so no polynomial-time algorithm for finding the optimal solution is known. Donath et al. [13] first introduced the concept of spectral clustering, and in the same year, Fiedler et al. [14] extended spectral clustering to community detection by using the Laplacian matrix. Shi et al. [15] and Ng et al. [16] used the normalized Laplacian matrix to define spectral clustering algorithms that can process large datasets under sparse similarity graph constraints. However, each of these global optimization algorithms has obvious limitations in application, because it requires information about the topology of the entire network as well as the number and size of the communities in advance. Later, Clauset et al. [17] proposed a local optimization algorithm, which led to the emergence of various local optimization algorithms such as the clique percolation method [18], the label propagation algorithm [19], and the local edge clustering optimization algorithm [20].
The traditional label propagation algorithm (LPA) [19] is an efficient local optimization algorithm that does not require complete network structure information and has near-linear time complexity. However, the algorithm is highly random, weakly robust, and prone to falling into local optima, which makes it difficult to obtain effective community detection results. Zhang et al. [21] presented a node importance label optimization algorithm based on Bayesian networks, which orders node updates by importance to reduce the randomness of the traditional LPA, but it requires a large amount of prior knowledge to define node importance. Kouni et al. [22] used the LPA strategy while allowing a node to hold multiple labels; they defined a label confidence function and updated labels according to the confidence coefficients of the different labels. Yu et al. [23] proposed a DeepWalk label optimization algorithm for overlapping community detection. This method uses the DeepWalk model to learn the network topology and obtain low-dimensional vector representations of nodes. It constructs a weight matrix from vector dot products and detects overlapping communities based on information exchange between nodes. Hosseini et al. [24] used a similarity index to assign weights to edges and recast label propagation as an ant colony traversal following the ant colony optimization (ACO) algorithm, which turns the problem of label influence into one of transition probability. To improve the robustness of the LPA, Wang et al. [25] proposed an importance metric that combines network locality with the node's global position in the network, but at the cost of higher algorithmic complexity, sacrificing the LPA's fast running speed and low time overhead. Similarly, Liu et al. [26] added multi-step greedy fusion [27] to the LPA, which improves robustness but also increases complexity. Laassem et al. [28] extended Coulomb's law of electrostatic attraction to social networks, proposing a novel similarity matrix that quantifies node importance through pairwise attraction forces. However, this Coulomb matrix-based approach incurs significant overhead in both time and space. To address the instability and low quality of attributed graphs, Berahmand et al. [29] generated a weighted graph that combines node attributes and topological structure from the attributed graph, so that the detected communities are characterized by both structural cohesion and attribute homogeneity; this maintains the original efficiency of the LPA and reduces the number of iterations. However, most label optimization algorithms consider only local node information and ignore global importance. Moreover, although some optimization algorithms improve community division results, they still suffer from instability.
Meanwhile, with the rapid development of deep learning techniques, community detection methods based on graph neural networks (GNNs), such as GCN [30] and GAT [31], have gradually emerged as a research hotspot. These methods overcome the limitations of traditional approaches that rely on handcrafted features or optimization functions by learning low-dimensional node representations in an end-to-end manner, automatically capturing both global topological and local neighborhood features. CPGC [32] detects overlapping and non-overlapping communities by integrating representation learning with clustering, improving graph convolution operations, and introducing community perspective similarity, thereby leveraging both attribute and structural information. However, challenges remain in computational efficiency and scalability. GEAM [33], a graph-enhanced attention model designed for multiplex networks, improves community detection accuracy by integrating cross-layer semantic information via inter-layer contrastive learning, a self-attention adaptive fusion mechanism, and an edge density-driven module. Additionally, for large-scale networks, inductive learning methods such as GraphSAGE [34] achieve efficient training by sampling neighboring nodes.
Although deep learning methods have demonstrated excellent performance in community detection, their effectiveness relies heavily on high-quality training data. Moreover, their black-box nature makes community partitioning results difficult to interpret, and they often generalize poorly to sparse networks. To address these shortcomings of existing algorithms, we propose a new label propagation algorithm based on network node importance, called the DegreeRank Label Propagation (DRLP) algorithm. By explicitly defining node importance metrics and label update rules, DRLP maintains algorithmic efficiency while keeping the community partitioning logic interpretable. Specifically, by defining a new update rule and fixing the order in which node labels are updated, DRLP improves both the robustness of the traditional LPA and community detection accuracy. First, DRLP computes a metric that reflects the importance of each node in the whole network topology, based on the node's local and global structural characteristics. Second, DRLP repeatedly relabels each node according to the new label update rule until the community structure of the whole network is detected. Throughout the algorithm, nodes are updated in descending order of importance to reduce randomness; in the label update rule, the new label of a node is determined by both the authority of its neighboring nodes (node importance) and the closeness between the node and those neighbors. Finally, DRLP assigns all nodes with the same label to one community. The main contributions of this paper are summarized as follows:
- We propose a new method for measuring the correlation between network nodes that can efficiently find the shortest path and reduce the time complexity. It captures the correlation between nodes more accurately when analyzing node influence, while preserving network topology information.
- We introduce a damping factor that reflects the affinity between nodes and can be adjusted to model changes in affinity caused by unexpected events, which better matches interpersonal interactions in real social networks.
- We present a new node importance metric that addresses the randomness of existing similar algorithms by globalizing the local characteristics of nodes. This metric provides a more accurate assessment of node importance, which can improve the accuracy of community partitioning.
- We propose a modified label propagation strategy that emphasizes the influence of neighboring nodes on the target node when selecting a label, so that influence propagation within each community is maximized during community partitioning. This strategy removes the random selection of nodes, which enhances the efficiency and feasibility of the algorithm.
- We perform simulations on real network datasets and synthetic networks to verify the stability and superiority of the DRLP algorithm. The results show higher accuracy and better performance in terms of NMI and modularity than other methods.
The rest of this paper is organized as follows. Section 2 summarizes related work. In Section 3, we introduce the main ideas and detailed process of our algorithm and analyze its complexity. In Section 4, we introduce the relevant parameter settings of our algorithm and discuss the experimental results on real-world and synthetic networks. Finally, Section 5 provides conclusions and perspectives.
3. Solution for DRLP
In this section, we introduce the main ideas of DRLP and show how it is carried out step by step. First, we extract the sub-graph at each stage of the experiment; the process is shown in Figure 1.
DRLP is designed in four phases. In phase 1, a similarity adjacency matrix is constructed according to the node correlation coefficients. In phase 2, a unique label is initially assigned to each node, the node importance (NI) index is calculated, and a sorted node order (SNO) list is generated based on node importance. In phase 3, a novel label update strategy is applied, which uses a preference selection strategy for label updating. Finally, in phase 4, the community division is finished when the label propagation process stops. As Figure 1 shows, the community partition produced by our algorithm is clearly visible.
The details of the DRLP model are elaborated below. For ease of discussion, the notations used in this paper are listed in the Abbreviations section.
3.1. Construction of Similarity Adjacency Matrix
In this section, we present a weighted adjacency matrix that can reflect the correlation between network nodes.
Many network analysis measures, such as Shortest Path Distance [36], Average Path Length [4], Diameter [4], Clustering Coefficient [4], and Jaccard Similarity [37], can be used to measure the distance or similarity between nodes in a graph or complex network. Among these, the Shortest Path Distance and Diameter require computing the shortest path lengths between nodes, which incurs high time complexity and performs poorly on large-scale networks. The Clustering Coefficient describes the density of connections among a node's neighbors, but it does not directly reflect the distance or similarity between individual nodes. Jaccard Similarity captures the overlap between the neighbor sets of two nodes, but it is symmetric, whereas in real-world social networks the similarity or distance between two nodes is not always symmetric. For example, if node A has only node B as its neighbor while node B has other neighbors besides A, the influence of B on A is much greater than that of A on B. Therefore, we propose a new metric to measure the distance or similarity between nodes.
Definition 2 (Similarity adjacency matrix, S).
The model of a social network is an undirected complex graph G = (V, E), where each node u in V is a user or entity in the social network, and each edge (u, v) in E is the social relationship between entities u and v. For any two nodes u and v, the similarity index of node u for node v in graph G is defined as follows:
The method for calculating the similarity adjacency matrix S is given in Algorithm 1.
Algorithm 1 Constructing the similarity adjacency matrix
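To make the asymmetry argument of this section concrete, the following minimal Python sketch contrasts Jaccard similarity with a direction-dependent overlap ratio. The ratio used here (the share of u's closed neighborhood that also lies in v's closed neighborhood) is an assumption chosen for illustration and is not the formula of Definition 2; the function names `jaccard`, `directed_similarity`, and `similarity_matrix` are likewise hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def jaccard(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Symmetric Jaccard similarity of the closed neighborhoods of u and v."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def directed_similarity(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Assumed asymmetric index: share of u's closed neighborhood found in v's."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu)


def similarity_matrix(graph: Graph) -> Dict[Tuple[Hashable, Hashable], float]:
    """Sparse similarity 'matrix': one entry per ordered pair of adjacent nodes."""
    return {(u, v): directed_similarity(graph, u, v) for u in graph for v in graph[u]}


# Node A has only B as a neighbor, while B also has neighbors C and D.
graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
S = similarity_matrix(graph)

print(jaccard(graph, "A", "B"), jaccard(graph, "B", "A"))  # same value in both directions
print(S[("A", "B")], S[("B", "A")])                        # differs by direction
```

In this toy graph the Jaccard value is identical in both directions, whereas the assumed ratio gives a larger value for A with respect to B than for B with respect to A, which matches the intuition that B influences A more than A influences B. Storing entries only for adjacent pairs keeps the structure proportional to the number of edges.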
3.2. Node Importance Index
Complex networks consist of numerous nodes [38] with diverse and intricate topological configurations. Most existing methods for measuring node importance have drawbacks, such as relying on a single metric or ignoring the global or local role of nodes in the network topology. To address this problem, we define a new node importance metric that combines the characteristics of PageRank and the degree centrality index.
Definition 3 (Node importance, NI).
For any node u in V, the node importance index of node u in graph G is defined as follows: where is the node importance index of node u, is the degree of node v, and α is the damping factor. The importance of each node is first initialized from its degree centrality and neighborhood relevance, and is then iteratively updated following the PageRank strategy, with the number of iterations denoted as β. The method for calculating the node importance (NI) is given in Algorithm 2.
Algorithm 2 Solving node importance index
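The two-step idea of this subsection (initialize from degree centrality, then refine with PageRank-style updates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the formula of Definition 3: the initialization uses each node's share of the total degree, the update is the standard PageRank recurrence on an undirected graph, and the function name `node_importance` and the default values of `alpha` and `beta` are hypothetical.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def node_importance(graph: Graph, alpha: float = 0.85, beta: int = 20) -> Dict[Hashable, float]:
    """Degree-initialized, PageRank-style importance (illustrative assumption).

    Assumes the graph has at least one edge and no isolated nodes.
    """
    n = len(graph)
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    # Initialization: each node's share of the total degree (normalized degree centrality).
    ni = {u: len(graph[u]) / total_degree for u in graph}
    for _ in range(beta):
        # PageRank-style update: every neighbor v passes on ni[v] / deg(v),
        # damped by alpha, plus a uniform term (1 - alpha) / n.
        ni = {
            u: (1 - alpha) / n + alpha * sum(ni[v] / len(graph[v]) for v in graph[u])
            for u in graph
        }
    return ni


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
ni = node_importance(graph)

# Sorted node order: descending importance, node id as a deterministic tie-break.
order = sorted(graph, key=lambda u: (-ni[u], u))
print(order)
```

With this update the scores keep summing to one, and the two bridge nodes rank ahead of the other nodes in the toy graph. Sorting with an explicit tie-break gives a reproducible update order, which is the property the sorted node order list is meant to provide.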
3.3. Modified Label Selection Process
In this section, we propose a modified label update rule and introduce a preference label selection strategy.
Unlike existing algorithms, we consider not only the frequency of labels among a node's neighbors, but also the intimacy between the node and its neighbors and the influence of those neighbors. Under our strategy, the larger the share of the target node's neighbors that are shared with a neighboring node, the greater that neighbor's influence over the target node. Similarly, a more important node has a greater influence on the target node. Therefore, we define label influence as the impact that the group of neighbors carrying a given label has on the node.
Definition 4 (Label influence).
For any node u in V and v in , the influence of the label on node u is defined as follows: where is the influence of label on node u, and is the set of neighbors of node u with label . The process of improved label selection is given in Algorithm 3.
Algorithm 3 The modified label selection process
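The difference between frequency-based voting and influence-based selection can be illustrated with a short Python sketch. It assumes that the influence of a label on node u is the sum, over the neighbors carrying that label, of neighbor importance multiplied by the similarity of u with respect to that neighbor; this product form is an assumption made for illustration rather than the expression of Definition 4, and `select_label` and all numeric values are hypothetical.

```python
from collections import defaultdict


def select_label(u, graph, labels, importance, similarity):
    """Return the neighbor label with the largest accumulated influence on u."""
    if not graph[u]:
        return labels[u]  # an isolated node keeps its own label
    influence = defaultdict(float)
    for v in graph[u]:
        # Assumed contribution of neighbor v: its importance weighted by closeness to u.
        influence[labels[v]] += importance[v] * similarity[(u, v)]
    # Deterministic tie-break: among equal scores the smallest label wins
    # (labels are assumed to be mutually orderable).
    return max(sorted(influence), key=influence.get)


# Node "x" has one neighbor with label 1 and two neighbors with label 2.
graph = {"x": {"a", "b", "c"}}
labels = {"x": 0, "a": 1, "b": 2, "c": 2}
importance = {"a": 0.40, "b": 0.15, "c": 0.10}                       # hypothetical NI values
similarity = {("x", "a"): 0.5, ("x", "b"): 1.0, ("x", "c"): 0.2}    # hypothetical similarities

chosen = select_label("x", graph, labels, importance, similarity)
print(chosen)
```

In this example label 2 is the more frequent label among the neighbors, yet label 1 is selected because its single carrier is important enough to outweigh the two weaker neighbors. Replacing the random tie-break of the traditional LPA with a fixed rule removes one source of run-to-run variation.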
3.4. DegreeRank Label Propagation (DRLP) Algorithm
After giving some core definitions of the algorithm, this section demonstrates the detailed process of the DegreeRank Label Propagation Algorithm, as shown in Algorithm 4.
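The overall flow described in this section (unique initial labels, a fixed update order by descending importance, repeated label updates until nothing changes, and grouping nodes by label) can be sketched as follows. This is a simplified illustration under stated assumptions and not a transcription of Algorithm 4: the label score is assumed to be importance multiplied by similarity, the importance values in the example are hypothetical, and `propagate_labels` is an illustrative name.

```python
from collections import defaultdict


def propagate_labels(graph, importance, similarity, max_iter=100):
    """Deterministic label propagation with a fixed, importance-based update order.

    Nodes double as initial labels, so they are assumed to be mutually orderable.
    """
    labels = {u: u for u in graph}  # every node starts with a unique label
    order = sorted(graph, key=lambda u: (-importance[u], u))  # fixed update order
    for _ in range(max_iter):
        changed = False
        for u in order:
            score = defaultdict(float)
            for v in graph[u]:
                score[labels[v]] += importance[v] * similarity[(u, v)]
            if not score:
                continue  # isolated node: nothing to adopt
            best = max(sorted(score), key=score.get)
            # Keep the current label unless another label scores strictly higher.
            if score.get(labels[u], 0.0) < score[best]:
                labels[u] = best
                changed = True
        if not changed:
            break  # no label changed during a full pass
    communities = defaultdict(set)
    for u, label in labels.items():
        communities[label].add(u)
    return list(communities.values())


# Two triangles joined by the bridge edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
# Hypothetical importance values; the bridge nodes 2 and 3 score highest.
importance = {0: 0.146, 1: 0.146, 2: 0.208, 3: 0.208, 4: 0.146, 5: 0.146}
# Assumed asymmetric similarity: overlap of closed neighborhoods relative to u's own.
closed = {u: graph[u] | {u} for u in graph}
similarity = {(u, v): len(closed[u] & closed[v]) / len(closed[u])
              for u in graph for v in graph[u]}

communities = propagate_labels(graph, importance, similarity)
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2], [3, 4, 5]]
```

On this toy graph the procedure stabilizes after two passes and recovers the two triangles. Because neither the update order nor the tie-break involves random choices, repeated runs on the same input return the same partition.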
3.5. Complexity Analysis
Given an undirected unweighted graph G, where is the number of nodes, is the total number of edges, and d is the average degree of nodes, the time complexity of each step of the DRLP is calculated as follows:
- Step 1:
Initialize each node with a unique label, with time complexity ;
- Step 2:
Construct the similarity adjacency matrix, with time complexity ;
- Step 3:
Calculate the degree centrality (DC) value of each node, with time complexity ;
- Step 4:
Calculate the list of NI values, with time complexity ;
- Step 5:
Sort the nodes based on their importance, with time complexity ;
- Step 6:
Update the labels of each node over t iterations, with time complexity .
Therefore, we obtain the time complexity of the DRLP algorithm, which is
.
Algorithm 4 DRLP algorithm
5. Conclusions
To address complex and diverse community detection tasks, this paper proposes the DegreeRank Label Propagation (DRLP) algorithm, which is based on degree centrality and PageRank. First, we calculate the weight matrix for each node, then compute the node importance based on the weight matrix, and finally sort the nodes in descending order of importance. During label propagation, a novel neighbor closeness strategy is adopted, which avoids the random selection of the traditional LPA and improves the accuracy of community detection. DRLP was compared with several improved label propagation algorithms using different evaluation indicators. Simulation results on real and synthetic network datasets show that our algorithm achieves better modular density while maintaining high modularity, and that it is highly accurate on both small-scale and medium-to-large-scale synthetic networks. However, the algorithm's complexity might limit its scalability on very large networks (e.g., ), necessitating additional optimization.
Future work could therefore explore optimization strategies such as parallel computing to reduce the algorithm's time cost. Furthermore, based on the existing framework, we plan to extend DRLP to more complex network scenarios, including weighted networks, dynamic networks, and optimized implementations for overlapping community detection. Additionally, we will explore potential synergies between DRLP and GNN architectures to further enhance its performance. These extensions are intended to expand the algorithm's practical utility while preserving its core advantage in community detection accuracy.