Unsupervised Community Detection Algorithm with Stochastic Competitive Learning Incorporating Local Node Similarity

Abstract: Community detection is an important task in the analysis of complex networks and is significant for mining and analyzing the organization and function of networks. As an unsupervised learning algorithm based on the particle competition mechanism, stochastic competitive learning has been applied to community detection in complex networks, but it still has several limitations. To improve the stability and accuracy of stochastic competitive learning and solve the community detection problem, we propose an unsupervised community detection algorithm, LNSSCL (Local Node Similarity-Integrated Stochastic Competitive Learning). The algorithm calculates node degree as well as Salton similarity metrics to determine the starting positions of particle walks; local node similarity is incorporated into the particle preferential walk rule; the particle control capability increment is dynamically adjusted according to the control range; a depleted particle selects the node with the strongest control capability within its control range for resurrection; and a node affiliation selection step is introduced to adjust node community labels. Experimental comparisons with 12 representative community detection algorithms on real network datasets and synthetic networks show that the LNSSCL algorithm is overall better than the compared algorithms in terms of normalized mutual information (NMI) and modularity (Q). The improvement over the stochastic competitive learning algorithm is evident, and LNSSCL can effectively accomplish the community detection task in complex networks.


Introduction
With the advancement of information technology, many complex systems in real life can be described and represented in the form of complex networks, such as social networks, citation networks, scientist collaboration networks, and protein interaction networks. The majority of these real-life networks exhibit distinct community structures: a network consists of multiple communities, where connections between nodes within a community are dense, while connections between nodes of different communities are relatively sparse [1]. Community detection is a fundamental task in the analysis of complex networks, aiming to partition the entire network into several communities. This process holds significant importance for studying and analyzing the organizational structure and functionality of networks, as well as for uncovering latent patterns within them.
There has been a great deal of research on detecting and evaluating community structures in complex networks. In pursuit of this fundamental task, researchers have proposed numerous community detection algorithms based on various methods such as graph partitioning, statistical inference, clustering, modularity optimization, dynamics, and deep learning. More detailed reviews of community detection are available in several extensive review articles [2][3][4][5][6]. In the proposed LNSSCL algorithm, the particle control capability increment is adjusted according to the particle's current control range. When a particle runs out of energy, it is assigned a unique resurrection position. After the walk is completed, the community labels are adjusted through a node affiliation selection step to obtain the final community detection result. The main objectives and contributions of this study are as follows: (1) determining the initial positions of particles based on node degrees and the Salton similarity index, ensuring fixed and dispersed particle placements to mitigate intense early-stage competition and accelerate convergence; (2) incorporating the proposed node similarity measure to enhance the determinism and directionality of the particle preferential walk rule, refining the rules for selecting particle resurrection positions, and introducing a node affiliation selection step to refine the final community detection results and enhance algorithm stability; (3) dynamically adapting the increment of particle control ability according to the particle's current control range, thereby improving the effectiveness of detecting communities of varying sizes within the network; (4) experimentally comparing the LNSSCL algorithm with 12 representative algorithms on real network datasets and synthetic networks. The results demonstrate that the proposed algorithm enhances the community detection performance of stochastic competitive learning and, overall, outperforms the other algorithms.
The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 introduces related preliminary knowledge. Section 4 describes the details and main framework of the proposed LNSSCL algorithm. The experiments are shown and discussed in Section 5. Section 6 gives the conclusions of this paper.

Related Work
Complex networks can model various relationships and internal operating mechanisms between entities and objects in the real world. Detecting the community structure present in a network can help further reveal aspects of the real world. Therefore, community detection in complex networks has received attention from many fields and is rapidly evolving. Among the many methods proposed, dynamics-based methods utilize the dynamic properties of complex networks. Most of these algorithms have linear time complexity and scale well to large-scale networks. Since it is impractical to review all previously proposed community detection algorithms, this section mentions only some representative algorithms from recent years.
Roghani et al. [21] introduced a community detection algorithm based on local balance label diffusion. They assigned importance scores to each node using a novel local similarity measure, selected initial core nodes, and expanded communities by balancing the diffusion of labels from core to boundary nodes, achieving rapid convergence in large-scale networks with stable and accurate results. Toth et al. [22] proposed the Synwalk algorithm, which incorporates the concept of random blocks into random-walk-based community detection algorithms, combining the strengths of representative algorithms like Walktrap [23] and Infomap [24] and yielding promising results. Yang et al. [25] introduced a method of enhancing Markov similarity, which utilizes the steady-state Markov transition of the initial network to derive an enhanced Markov similarity matrix. By partitioning the network into initial community structures based on the Markov similarity index and subsequently merging small communities, tightly connected communities are obtained. Jokar et al. [26] proposed a community detection algorithm based on the synergy of label propagation and simulated annealing, which achieved good results. You et al. [27] proposed a three-stage community detection algorithm, TS, which obtained good results through central node identification, label propagation, and community combination. Fahimeh et al. [28] proposed a community detection algorithm that utilizes both local and global network information; the algorithm consists of four components: preprocessing, master community composition, community merging, and optimal community structure selection. Zhang et al. [29] proposed a graph-layout-based label propagation algorithm that reveals communities in a network, using multiple graph layout information to detect accurate communities and improve stability.
Chin et al. [30] proposed the semi-synchronization constrained label propagation algorithm SSCLPA, which implements various constraints to improve the stability of LPA. Fei et al. [31] proposed a novel network core structure extraction algorithm for community detection (CSEA) that uses variational autoencoders to discover community structures more accurately. Li et al. [32] developed a new community detection method, proposing a new relaxation formulation with a low-rank doubly stochastic matrix factorization and a corresponding multiplicative optimization-minimization algorithm for efficient optimization.

Background
In this section, we introduce related preliminary knowledge, including basic definitions, local node similarity, and the theory of stochastic competitive learning.

Basic Definition
Let $G = (V, E)$ be a network, where $V$ is the set of nodes and $E$ is the set of edges; the number of nodes is $n$ and the number of edges is $m$. Unless otherwise specified, this paper solely focuses on the analysis of undirected simple graphs. The neighborhood of node $v_i$ is defined as $N(v_i) = \{ v_j \in V \mid a_{ij} = 1 \}$. Let $A = (a_{ij})$ be the adjacency matrix of network $G$, an $n$-order matrix, defined as follows: $a_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected by an edge, and $a_{ij} = 0$ otherwise.

Local Node Similarity
In the analysis of complex networks, node similarity metrics are commonly employed to assess the degree of similarity between nodes. Generalizing from the classical triadic closure principle in social network analysis, in a given complex network, the greater the number of common neighbors between two nodes, the more similar the nodes are. The set of common neighbors of nodes $v_i$ and $v_j$ is defined as $CN(v_i, v_j) = N(v_i) \cap N(v_j)$. Based on local structure, node similarity metrics derived from the concept of common neighbors encompass various indices such as the Salton index, Jaccard index, Sorenson index, Hub Promoted Index, Hub Depressed Index, Leicht-Holme-Newman Index, Preferential Attachment Index, Adamic-Adar Index, and Resource Allocation Index [33]. The higher the local similarity between two nodes, the higher the probability that they belong to the same community, and vice versa. Node similarity metrics based on local structure also offer the advantage of lower computational complexity and have been introduced into the task of complex network community detection.
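As a concrete illustration, the common-neighbor indices above can be computed directly from an adjacency structure. The toy graph below is invented for illustration; the Salton and Jaccard formulas are the standard ones.

```python
# Common-neighbor-based similarity on a small undirected toy graph,
# stored as an adjacency dict of neighbor sets.
import math

adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}

def common_neighbors(i, j):
    # CN(v_i, v_j) = N(v_i) ∩ N(v_j)
    return adj[i] & adj[j]

def salton(i, j):
    # |CN| / sqrt(k_i * k_j)
    cn = len(common_neighbors(i, j))
    return cn / math.sqrt(len(adj[i]) * len(adj[j]))

def jaccard(i, j):
    # |CN| / |N(v_i) ∪ N(v_j)|
    cn = len(common_neighbors(i, j))
    return cn / len(adj[i] | adj[j])

print(round(salton(2, 3), 3))  # nodes 2 and 3 share neighbors {1, 4} -> 0.667
```

Both indices are normalized counts of shared neighbors; they differ only in the denominator used to correct for node degree.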

Stochastic Competitive Learning
Stochastic Competitive Learning, as a classical particle competition model, constitutes a competitive dynamical system composed of multiple particles, achieving community detection through unsupervised learning [14].
In Stochastic Competitive Learning, multiple particles are randomly placed within the nodes of the network. Each particle serves as a community indicator, while the nodes in the network are treated as territories to be contended for. The primary objective of each particle is to expand its territory by continually traversing the network and gaining control over new nodes, while simultaneously strengthening its control over already dominated nodes. Because the number of nodes in the network is finite, natural competition arises among particles. When a particle visits a node, it enhances its control over that node, consequently weakening the control of competing particles over it. Ultimately, each particle's control range tends to stabilize, leading to convergence. By analyzing the control ranges of the particles after convergence, the underlying community structure of the complex network is unveiled [13].
As a stochastic nonlinear dynamical system, Stochastic Competitive Learning describes the state of the entire dynamic system through the vectors p(t), E(t), S(t) and the matrix Nu(t). The vector p(t) represents the current position of each particle within the network; the vector E(t) signifies the energy possessed by each particle. When a particle visits a node under its control, its energy increases by ∆, whereas visiting a node controlled by a competing particle reduces its energy by ∆. This mechanism limits each particle's roaming range, minimizing remote and redundant network accesses. The vector S(t) denotes the dynamic state of each particle: particles with energy are in an active state and continuously traverse the network; when its energy is depleted, a particle enters a dormant state and randomly jumps to one of the nodes under its control for revival. The matrix Nu(t) records the visit counts of each particle for all nodes in the network. The more a particle visits a particular node, the greater its control over that node; the particle with the highest visit count for a node attains control over it.
In Stochastic Competitive Learning, particles in an active state navigate through the network following a convex combination of random and preferential walk rules [34]. The particle walk rule, denoted $P^{(k)}_{transition}(i, j, t)$, is defined as follows:

$P^{(k)}_{transition}(i, j, t) = \lambda P^{(k)}_{pref}(i, j, t) + (1 - \lambda) P^{(k)}_{rand}(i, j)$

where $i$ denotes node $v_i$, $j$ denotes node $v_j$, $t$ represents the moment, and $k$ indicates the particle. $P^{(k)}_{pref}(i, j, t)$ denotes the particle preferential walk rule, and $P^{(k)}_{rand}(i, j)$ denotes the particle random walk rule. $\lambda \in [0, 1]$ represents the probability of a particle performing a preferential walk, regulating the balance between random and preferential walking. When $\lambda = 1$, the particle exclusively follows preferential walking; when $\lambda \in (0, 1)$, the particle performs a combination of both; and when $\lambda = 0$, the particle solely performs random walking. The random walking mode guides the particle's exploratory behavior: the particle randomly visits neighboring nodes without considering its control capacity over them. This mode reflects the particle's randomness, and the equation for random walking is defined as follows:

$P^{(k)}_{rand}(i, j) = \dfrac{a_{ij}}{\sum_{l=1}^{n} a_{il}}$

The preferential walking mode guides the particle's defensive behavior: the particle prioritizes visiting nodes it already controls over nodes not yet under its control. This mode reflects the particle's determinism, and the equation for preferential walking is defined as follows:

$P^{(k)}_{pref}(i, j, t) = \dfrac{a_{ij}\, \bar{N}u^{(k)}_{j}(t)}{\sum_{l=1}^{n} a_{il}\, \bar{N}u^{(k)}_{l}(t)}$

where $\bar{N}u^{(k)}_{j}(t)$ represents the current control capacity of particle $k$ over node $v_j$, determined by the proportion of visits that the particle makes to the node:

$\bar{N}u^{(k)}_{j}(t) = \dfrac{Nu^{(k)}_{j}(t)}{\sum_{s=1}^{K} Nu^{(s)}_{j}(t)}$

When the entire system reaches the convergence criterion, particles cease their wandering. The convergence criterion is that the change in the particles' control capacities between consecutive steps falls below a threshold:

$\left\| \bar{N}u(t) - \bar{N}u(t-1) \right\| < \varepsilon$

where the convergence factor $\varepsilon$ typically takes the value 0.05. Finally, the community structure is revealed based on the control ranges of the individual particles after convergence.
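The convex combination of random and preferential walks can be sketched in a few lines. The neighbor lists, visit counts `Nu`, and the value of λ below are illustrative placeholders, not output of the actual dynamics.

```python
# Transition probabilities for one particle: a convex combination of a
# uniform random walk and a control-proportional preferential walk.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
Nu = {0: 1.0, 1: 3.0, 2: 5.0, 3: 1.0}  # this particle's visit counts per node
lam = 0.6  # probability of preferential walking (lambda)

def transition_probs(i):
    nbrs = neighbors[i]
    p_rand = {j: 1.0 / len(nbrs) for j in nbrs}       # uniform over neighbors
    total = sum(Nu[j] for j in nbrs)
    p_pref = {j: Nu[j] / total for j in nbrs}         # proportional to control
    return {j: lam * p_pref[j] + (1 - lam) * p_rand[j] for j in nbrs}

probs = transition_probs(0)
assert abs(sum(probs.values()) - 1.0) < 1e-9  # a proper distribution
```

With λ = 0.6 the walk leans toward already-controlled territory while still exploring: from node 0, the heavily visited neighbor 2 receives a larger transition probability than neighbor 1.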
To determine the number of particles, stochastic competitive learning employs the particle average maximum control capability metric to establish a reasonable number of particles to place in the network, thereby determining a suitable community count [35]. The metric is defined as follows:

$R(t) = \dfrac{1}{n} \sum_{u=1}^{n} \max_{s \in K} \bar{N}u^{(s)}_{u}(t)$

where $\max_{s \in K} \bar{N}u^{(s)}_{u}(t)$ represents the maximum control capability exerted by any particle $s$ on node $u$. For a network with $K$ communities, if the number of particles placed is exactly $K$, each particle will dominate one community without excessively interfering with the control regions of the other particles, and $R(t)$ takes its maximum value. When the particle count is less than the actual number of communities, each particle competes with others to control larger communities, attenuating its control over nodes and decreasing $R(t)$. When the particle count exceeds the actual community count, particles unavoidably compete fiercely for control over the same group of nodes, again decreasing $R(t)$. In conclusion, the particle count that maximizes $R(t)$ is the optimal number of particles to place. The specific method is as follows: gradually increase the number of placed particles from 2 to $K + 1$, recording the $R(t)$ value at convergence for each count; the optimal number of particles is the count that yields the maximum $R(t)$.
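A minimal sketch of the average-maximum-control metric: for each node, take the best control fraction over all particles, then average over nodes. The control snapshot below is made up for illustration, not produced by the real dynamics.

```python
# R(t): average over nodes of the maximum control fraction any particle holds.
control = {  # control fraction per particle, indexed by node 0..3
    "A": [0.9, 0.8, 0.1, 0.2],
    "B": [0.1, 0.2, 0.9, 0.8],
}

def R(control):
    n = len(next(iter(control.values())))
    return sum(max(control[s][u] for s in control) for u in range(n)) / n

print(R(control))  # each particle cleanly dominates two nodes here
```

Sweeping the particle count and keeping the count with the largest converged `R` value reproduces the selection rule described above.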

LNSSCL Algorithm
To enhance the stability and accuracy of community detection results, improvements are made in several aspects: particle initialization positions, the particle preferential walk rule, particle control capacity increments, particle resurrection position selection, and the introduction of node affiliation selection. With these enhancements, we propose the Unsupervised Community Detection Algorithm with Stochastic Competitive Learning Incorporating Local Node Similarity (LNSSCL), which integrates local node similarity into the stochastic competitive learning framework.

Determining Particle Initialization Positions
The Stochastic Competitive Learning algorithm stipulates that each particle randomly selects a different node in the network as the starting position for its walk. The random uncertainty in particle initialization can lead to unstable community detection outcomes. Additionally, this initialization approach might cluster the particles' starting positions within a single community, intensifying the competitive relationships among particles during their walks and requiring considerable time for convergence. To address these concerns, random placement is abandoned; instead, each particle's initial position is determined based on node degree and the Salton similarity index between nodes. This approach aims to distribute particles across different communities as much as possible, accelerating the convergence of particle walks and enhancing the stability of community detection outcomes.
Node degree is commonly used to measure the importance of a node within the entire network, while the Salton similarity index is often employed to gauge the similarity between a node and its neighboring nodes. It is defined as follows:

$Salton(v_i, v_j) = \dfrac{|N(v_i) \cap N(v_j)|}{\sqrt{k_i k_j}}$

where $k_i$ and $k_j$ are the degrees of nodes $v_i$ and $v_j$. Combining these two metrics, the rules for determining particle initialization positions are as follows. First, arrange all nodes in the network in descending order of degree and select the node with the highest degree as the starting position of the first particle. Next, calculate the average Salton similarity index between the nodes already chosen as starting positions and all other nodes in the network, and choose the node with the smallest average Salton similarity index as the starting position of the next particle. Repeat this process to progressively determine the starting positions of the remaining particles; once every particle has a starting position, the particle initialization process is complete. Figure 1 depicts the particle initialization positions under the condition of three particles: after the position initialization step, particles 1, 2, and 3 are placed dispersedly in the network, where $p^{(k)}(0)$ denotes the starting position of particle $k$.
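The initialization rule above can be sketched as follows. The toy graph and the helper names (`init_positions`, `salton`) are illustrative assumptions; the Salton formula is the standard one.

```python
# Particle initialization: first particle at the highest-degree node, each
# subsequent particle at the node with the smallest average Salton
# similarity to the already-chosen start nodes.
import math

adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4}, 4: {3, 5}, 5: {4},
}

def salton(i, j):
    cn = len(adj[i] & adj[j])
    return cn / math.sqrt(len(adj[i]) * len(adj[j]))

def init_positions(k):
    starts = [max(adj, key=lambda v: len(adj[v]))]  # highest-degree node first
    while len(starts) < k:
        rest = [v for v in adj if v not in starts]
        # node least similar, on average, to the chosen start nodes
        nxt = min(rest, key=lambda v: sum(salton(v, s) for s in starts) / len(starts))
        starts.append(nxt)
    return starts

print(init_positions(3))  # start nodes spread away from the hub
```

Because the start nodes are chosen deterministically, repeated runs place particles identically, which is the source of the stability gain claimed above.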

Incorporating Node Local Similarity into Particle Preferential Movement Rule
The stochastic competitive learning algorithm stipulates that particles navigate through the network based on a convex combination of random walk and preferential walk rules. The preferential walk rule ensures that particles preferentially visit nodes under their control, reflecting the deterministic nature of particle traversal, and is numerically equal to the particle's control capacity ratio. However, this rule focuses solely on the particle's control capacity over nodes, without considering the influence of node local similarity indicators. This may lead to a relatively high degree of randomness and a weak inclination in the initial direction of the preferential walk, thereby affecting the stability and accuracy of community detection results. To address these issues, the particle preferential walk rule is enhanced by incorporating node similarity, so that nodes with greater similarity are more likely to be visited by particles. This modification enhances the directionality and determinism of particle traversal during the walk.
The node similarity used in the enhanced particle preferential walk rule not only considers cases where nodes share common neighbors, but also accounts for situations where nodes lack common neighbors. When nodes share neighbors, a similarity index considering both common neighbors and the degree difference is employed to measure the degree of similarity between two nodes, as defined below. For nodes $v_i$ and $v_j$, $N_h$ is set to $N(v_j)$. When nodes share no neighbors, nodes with a degree value of 1 require further consideration. For a node without shared neighbors and with a degree value of 1, since its behavior is solely related to its unique first-order neighbor, its similarity value is set to 1. For nodes without shared neighbors and with degree values greater than 1, similarity is associated with node degree: following the Hub Depressed Index, in which similarity is negatively correlated with node degree, the degree value is inversely related to similarity. The equation for calculating the similarity between nodes without shared neighbors is defined as follows: Building upon this, the equation for the particle's preferential walk rule incorporating node similarity is provided: where $Sim(i, j)$ represents the similarity index between nodes $v_i$ and $v_j$, and $Nu^{(k)}_{j}(t)$ represents the control capacity of particle $k$ over node $v_j$. The improved preferential walk rule treats the particle's control capacity over nodes and the similarity index between nodes as equally significant factors. This avoids the randomness in the preferential walk direction that arises after particle initialization, thereby enhancing the inclination and certainty of particle movement throughout the preferential walk process.
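A hedged sketch of a similarity-weighted preferential walk: the paper combines the control capacity and the similarity index, but the exact combination used below (a normalized product) is an assumption for illustration only, and the `Nu`/`Sim` values are invented.

```python
# Preferential walk weighted by both control capacity Nu and node
# similarity Sim(i, j); weights are renormalized into probabilities.
Nu = {1: 2.0, 2: 6.0, 3: 2.0}     # particle's visit counts on neighbors of i
Sim = {1: 0.9, 2: 0.1, 3: 0.5}    # Sim(i, j) for the same neighbors

def pref_probs(nbrs):
    w = {j: Nu[j] * Sim[j] for j in nbrs}   # assumption: product of the two factors
    total = sum(w.values())
    return {j: w[j] / total for j in nbrs}

probs = pref_probs([1, 2, 3])
assert abs(sum(probs.values()) - 1.0) < 1e-9
```

Note how high similarity can outweigh raw control: node 1 (small `Nu`, high `Sim`) ends up more probable than node 2 (large `Nu`, low `Sim`), which is the directional bias the enhanced rule aims for.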

Dynamically Adjusting Particle Control Capacity Increment
In the Stochastic Competitive Learning algorithm, the control capacity of a particle is quantified as the proportion of its node visits, so the number of times a particle visits a node determines its control capacity over that node. When particle $k$ visits node $v_i$, the change in the particle's visit count for that node is given by:

$Nu^{(k)}_{i}(t+1) = Nu^{(k)}_{i}(t) + 1$

According to Equation (14), the increment of particle control capacity remains constant at 1. This gives all particles the same competitive increment during the walking process, which could result in similar community sizes controlled by different particles. Consequently, representative particles of smaller communities may erroneously compete for nodes at the boundaries of larger communities. However, many real-life complex systems, represented as complex networks, encompass communities of varying sizes, so a constant increment in particle control capacity can yield suboptimal final community detection results. Figure 2 depicts the possible encroachment of a small community into the nodes of a large community when the particle control capacity increment is constant. The size of community 1 in the figure is actually larger than that of community 2. However, because the particle control capacity increment is constant, the ranges of the communities controlled by the particles can converge to the same size. This causes nodes that should belong to community 1 to be misclassified into community 2.
Addressing the aforementioned issues, the particle control capacity increment is improved in order to enhance the effectiveness of discovering communities of varying sizes within the network. The enhanced particle control capacity increment is dynamically adjusted based on the current control range of the particle, and the specific equation is provided below: where $C^{(k)}(t)$ represents the current control range of particle $k$, i.e., the number of nodes currently under the control of particle $k$. From Equation (15), it can be observed that the particle's control capacity increment is positively correlated with its current control range. This relationship can unveil community structures of different sizes within the network and prevent particles from erroneously encroaching upon nodes at the boundaries of communities.
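A hedged sketch of a control-range-dependent increment. The exact form of Equation (15) is not reproduced here; the simple proportional rule below merely stands in for the idea that a larger current control range $C^{(k)}(t)$ yields a larger increment.

```python
# Assumption: the increment grows with the fraction of nodes the particle
# currently controls. This is an illustrative stand-in, not Eq. (15).
def increment(control_range, n_nodes):
    return 1.0 + control_range / n_nodes

Nu_big = 10.0
Nu_big += increment(control_range=30, n_nodes=100)   # particle of a large community
Nu_small = 10.0
Nu_small += increment(control_range=5, n_nodes=100)  # particle of a small community
assert Nu_big > Nu_small  # the large community defends its boundary faster
```

Under such a rule, a particle dominating a large community reinforces its boundary nodes faster than a small-community particle can encroach on them, which is the qualitative behavior the paper targets.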

Determining Particle Resurrection Locations and Node Affiliation Selection
In stochastic competitive learning, when a particle visits a node under its control, its energy increases; when it visits a node controlled by a competing particle, its energy decreases. This energy mechanism constrains the particle's walking range, reducing long-range and redundant accesses in the network. If a particle frequently visits nodes controlled by competing particles, its energy continuously decreases until it is exhausted and the particle enters a dormant state. The particle then randomly jumps to a node within its control range to revive and recharge.
Clearly, the choice of the particle's revival location has a high degree of randomness, which can lead to unstable community detection results. To address this issue, based on the particle's control over nodes, we select the node with the highest control capability among the nodes it already controls as the unique revival location, eliminating the uncertainty in location selection. If a particle currently controls no nodes, it randomly chooses any node in the network for revival. The improved particle revival location selection is shown in Figure 3, where the energy of particle 1 is depleted because it traversed into the control region of particle 3. After the improvement, particle 1 no longer randomly jumps to any controlled node within the dashed box, but instead jumps to the node indicated by the dashed arrow (assuming that node has the highest control capability value for particle 1). Once the algorithm reaches the convergence criterion, particles cease their walks. Based on the control capabilities of each particle over the nodes, all nodes in the network are assigned to the communities represented by the corresponding particles. However, because the steps executed by particles just before stopping may introduce randomness and misclassifications, a node affiliation selection step is introduced. By considering the frequency of community labels among neighboring nodes, this step ensures that each node is assigned to its appropriate community, further optimizing the community detection outcome. Specifically, for each node, the occurrence frequency of community labels among its neighboring nodes is observed. If the most frequent neighboring community label is unique, it is selected as the community label for that node. If the most frequent neighboring community label is not unique, an influence score is computed for each candidate community, and the community label with the highest influence score is selected. The influence score $Effect_{C_k}$ for a community is calculated as shown in the equation below: where $C_k$ is one of the most frequent communities, and $N(v_i)$ represents the set of neighboring nodes of node $v_i$.
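The node affiliation selection step can be sketched as follows. The labels, control values, and the simplified influence score used for the tie-break (summed neighbor control values per community) are illustrative assumptions, not the paper's exact equation.

```python
# Node affiliation selection: take the most frequent community label among
# a node's neighbors; break ties with a per-community influence score.
from collections import Counter

labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "A"}
control = {1: 0.9, 2: 0.4, 3: 0.8, 4: 0.7, 5: 0.5}  # control value on each node

def relabel(node, neighbors):
    counts = Counter(labels[j] for j in neighbors[node])
    top = max(counts.values())
    candidates = [c for c, v in counts.items() if v == top]
    if len(candidates) == 1:
        return candidates[0]  # unique most frequent neighboring label
    # tie: pick the candidate community with the highest summed influence
    def influence(c):
        return sum(control[j] for j in neighbors[node] if labels[j] == c)
    return max(candidates, key=influence)

neighbors = {6: [1, 3, 4, 5]}
print(relabel(6, neighbors))  # "A" and "B" tie on frequency; influence decides
```

Running this pass once over every node smooths out boundary nodes that were mislabeled during the final, possibly noisy, steps of the particle walk.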

Algorithm Description
Algorithm 1 describes the method of the LNSSCL algorithm; the pseudocode is shown below.

Algorithm 1 LNSSCL algorithm
Input: Network G = (V, E), the probability of preferential walk for particles λ, particle energy increment ∆, convergence factor ε
Output: The number of communities K, the set of communities C
1: t = 1
2: K = 2
3: repeat
4:    for each particle k do
5:        calculate the initial position of the particle p^(k)(0) using Equation (10)
6:    end for
7:    repeat
8:        for k = 1 to K do
9:            calculate the particle's random walk probability P^(k)_rand(i, j) using Equation (4)
10:           calculate the particle's preferential walk probability P^(k)_pref(i, j, t) using Equations (11), (12), and (13)
11:           calculate the particle's walk probability P^(k)_transition(i, j, t) using Equation (3)
12:           particles walk based on the walk probability and dynamically adjust the particle's control increment using Equation (15)
13:           if E^(k)(t) ≤ 0 then
14:               the particle performs the revival step by jumping to the node within its control range that possesses the highest control capability, and is re-energized
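The competition mechanism in steps 7–14 can be made concrete with a deliberately simplified, self-contained toy in Python. This is not the paper's method: the control, energy, and revival updates below are illustrative stand-ins for Equations (3), (4), and (10)–(15), and the graph is assumed connected:

```python
import random

def particle_competition(adj, n_particles, lam=0.6, delta=0.3, steps=2000, seed=0):
    """Toy particle-competition walk; returns a dict node -> particle index."""
    rng = random.Random(seed)
    nodes = list(adj)
    # control[v][k]: particle k's control level over node v (uniform at start)
    control = {v: [1.0 / n_particles] * n_particles for v in nodes}
    pos = rng.sample(nodes, n_particles)   # random starting nodes
    energy = [1.0] * n_particles
    for _ in range(steps):
        for k in range(n_particles):
            nbrs = list(adj[pos[k]])
            if rng.random() < lam:         # preferential term: weight by own control
                weights = [control[u][k] for u in nbrs]
                r = rng.random() * sum(weights)
                nxt = nbrs[-1]
                for u, w in zip(nbrs, weights):
                    r -= w
                    if r <= 0:
                        nxt = u
                        break
            else:                          # random-walk term: uniform neighbor
                nxt = rng.choice(nbrs)
            c = control[nxt]
            owned = max(range(n_particles), key=c.__getitem__) == k
            c[k] = min(1.0, c[k] + delta)  # strengthen own control...
            for j in range(n_particles):   # ...and weaken competitors
                if j != k:
                    c[j] = max(0.0, c[j] - delta / max(1, n_particles - 1))
            # energy rises on an owned node, falls on a rival's node
            energy[k] = min(1.0, energy[k] + delta) if owned else energy[k] - delta
            if energy[k] <= 0:             # revival: jump to own strongest node
                mine = [v for v in nodes
                        if max(range(n_particles), key=control[v].__getitem__) == k]
                pos[k] = max(mine, key=lambda v: control[v][k]) if mine else rng.choice(nodes)
                energy[k] = 1.0
            else:
                pos[k] = nxt
    return {v: max(range(n_particles), key=control[v].__getitem__) for v in nodes}
```

Note how the revival branch mirrors the improved rule above: the depleted particle jumps to the node it dominates with the highest control value, rather than to a random controlled node.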

Time Complexity Analysis
For a complex network G = (V, E), assume the average degree of nodes is d, the number of nodes is n, the number of edges is m, and the average number of common neighbors between two nodes is c. The determination of particle starting positions involves calculating node degrees and the Salton similarity index between nodes, with a time complexity of O(m).
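The Salton index used here is the standard cosine form: the number of common neighbors of two nodes divided by the geometric mean of their degrees. A minimal sketch, with the adjacency represented as a dict of neighbor sets (names are ours):

```python
from math import sqrt

def salton(adj, i, j):
    """Salton (cosine) similarity: |N(i) ∩ N(j)| / sqrt(k_i * k_j)."""
    ki, kj = len(adj[i]), len(adj[j])
    if ki == 0 or kj == 0:
        return 0.0
    common = len(adj[i] & adj[j])
    return common / sqrt(ki * kj)

# Example: path 0-1-2-3 plus the chord 0-2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(salton(adj, 0, 1))  # → 0.5 (one common neighbor, both degrees 2)
```

Computing this for the endpoints of every edge touches each adjacency list a bounded number of times, which is where the O(m) bound above comes from.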
Active particles wandering in the network require calculating the probability for each particle to move from its current node to neighboring nodes. The random walk probability for each particle only requires computing node degrees, while the preferential walk probability needs the similarity index between the current node and its neighbors, with a time complexity of O(Kcd). When a particle's energy is exhausted, the revival step maintains a hash table storing each particle's controlled nodes and their control capability values, so finding the node with the maximum control capability for the jump has a time complexity of O(1). Updating the particle control matrix has a time complexity of O(K²). Since each node in the network is visited at least once by a particle, the total time complexity of particle wandering is O(Kcdn + K²n). To determine the optimal number of particles, the algorithm gradually increases the particle count from 2 to K, where K is a constant slightly larger than the actual number of communities in the network. Therefore, the entire particle wandering process of the LNSSCL algorithm has a time complexity of O(K²cdn + K³n). After the particle wandering process concludes, assigning community labels to all nodes based on their control capability values requires O(n) time. The node membership selection step, in which each node selects its community label based on the frequency of community labels among its neighboring nodes, has a time complexity of O((d + cd²)n). Since complex networks are usually sparse, d ≪ n. In summary, the time complexity of the LNSSCL algorithm is O(M(n + m)), where M is a constant. The time complexity of the LNSSCL algorithm is thus linearly related to the sum of the numbers of nodes and edges in the network, making the algorithm highly scalable on large-scale networks.

Experiments and Discussions
To test the effectiveness of the LNSSCL algorithm, experiments were conducted on real network datasets and synthetic networks, comparing the proposed algorithm with 12 representative community detection algorithms. The selected benchmark algorithms include community detection algorithms based on random walks, Walktrap [23] and Infomap [24]; modularity-based algorithms CNM [36], Louvain [37], and Leiden [38]; label propagation-based algorithms LPA [39], TS [27], GLLPA [29], and SSCLPA [30]; the hierarchical clustering algorithm Paris [40]; the Markov chain-based community detection algorithm MSC [25]; and the stochastic competitive learning algorithm based on the particle competition mechanism, SCL [14].

Experimental Environment and Initial Parameters
The algorithm was implemented using NetworkX and scikit-learn. The specific experimental environment is shown in Table 1. For the setting of the initial parameters, we refer to the value ranges in the literature [35]. The value range of the particle preferential wandering probability parameter λ is [0.2, 0.8], and we set its initial value to 0.6; the value range of the particle energy update value ∆ is [0.1, 0.4], and we set its initial value to 0.3; the convergence factor ε is set to 0.05; the minimum value of the particle energy E_min is set to 0; and the maximum value of the particle energy E_max is set to 1.

Datasets
The experiment utilized 11 real network datasets, including four labeled real network datasets and seven unlabeled real network datasets. The labeled real network datasets consist of the Karate network [41], Dolphins network [42], Polbooks network [43], and Football network [1]. The unlabeled real network datasets include the Lesmis network [44], Jazz network [45], Email network [46], Netscience network [47], Power Grid network [48], Facebook network [49], and PGP network [50]. The basic information is presented in Table 2. Likewise, to expand the experiments, the LFR benchmark is used to generate synthetic networks [51]. We generate networks of different scales on the LFR test network model for the experiments. The specific parameter settings of the LFR networks are shown in Table 3, where d is the average degree, d_max denotes the maximum degree, minc represents the minimum community size, maxc is the maximum community size, tau1 and tau2 are the exponents of the power-law distributions, and µ is the mixing parameter. The larger the mixing parameter, the more difficult the community division.

Evaluation Index
In this paper, we use two widely adopted evaluation metrics for community detection algorithms to assess the quality of each algorithm's community detection results. For labeled real network datasets, we utilize Normalized Mutual Information (NMI) [52] and modularity [36] to evaluate the community detection results of each algorithm. For unlabeled real-world network datasets, since the ground-truth community structures of these networks are unknown, we assess the quality of the detected results in terms of modularity only. For the LFR networks, we use NMI to evaluate the community detection results of each algorithm.
The NMI score is defined as shown in Equation (17), where X represents the ground-truth community structure and Y represents the community detection results of the algorithm. C_X denotes the number of true communities in the ground truth, and C_Y represents the number of communities detected by the algorithm. C represents the confusion matrix, where rows correspond to the ground-truth community structure and columns correspond to the algorithm's community detection results. C_ij represents the number of common nodes between the true community i in X and the community j detected in Y. C_i represents the sum of row i in matrix C, and C_j represents the sum of column j in matrix C. N stands for the total number of nodes in the network. NMI ∈ [0, 1]. A higher NMI value indicates better agreement between the algorithm's community detection results and the true community structure, and thus better community detection performance. The modularity (Q) is defined as shown in Equation (18), where m denotes the number of edges in the network, a_ij denotes the connection status between node v_i and node v_j, c_i denotes the community label of node v_i, and c_j denotes the community label of node v_j; δ(c_i, c_j) denotes the Kronecker function, which takes the value 1 if c_i and c_j are the same and the value 0 otherwise. Generally, a larger modularity value implies a more distinct community structure.
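Both metrics can be computed directly from the definitions just given. The following plain-Python sketch follows the standard confusion-matrix form of NMI and the standard form of Q (variable names are ours, not the paper's; the log base cancels in the NMI ratio):

```python
from math import log

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two partitions (equal-length lists)."""
    n = len(labels_true)
    xs, ys = sorted(set(labels_true)), sorted(set(labels_pred))
    conf = {}                                  # confusion matrix C_ij as a dict
    for t, p in zip(labels_true, labels_pred):
        conf[(t, p)] = conf.get((t, p), 0) + 1
    row = {x: sum(conf.get((x, y), 0) for y in ys) for x in xs}   # row sums
    col = {y: sum(conf.get((x, y), 0) for x in xs) for y in ys}   # column sums
    mi = sum(c * log(c * n / (row[x] * col[y])) for (x, y), c in conf.items())
    hx = -sum(r * log(r / n) for r in row.values() if r)          # entropy of X
    hy = -sum(c * log(c / n) for c in col.values() if c)          # entropy of Y
    if hx == 0 and hy == 0:
        return 1.0                             # both partitions trivial
    return 2 * mi / (hx + hy)

def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (a_ij - k_i*k_j/2m) * delta(c_i, c_j)."""
    m2 = sum(len(nbrs) for nbrs in adj.values())   # 2m for an undirected graph
    q = 0.0
    for i in adj:
        for j in adj:
            if labels[i] == labels[j]:
                a = 1.0 if j in adj[i] else 0.0
                q += a - len(adj[i]) * len(adj[j]) / m2
    return q / m2
```

As a sanity check, a partition identical to the ground truth (up to label renaming) gives NMI = 1, and two triangles joined by a single edge, split into their natural communities, give Q = 5/14 ≈ 0.357.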

Experimental Results and Analysis on Labeled Real Network Datasets
On the four labeled real network datasets, namely Karate, Dolphins, Polbooks, and Football, we conducted comparative experiments between the proposed LNSSCL algorithm and 12 other representative community detection algorithms. We evaluated the community detection results of each algorithm using the NMI score and modularity Q. The experimental results are shown in Tables 4 and 5. As can be seen from Tables 4 and 5, the LNSSCL algorithm achieves the highest NMI values on all labeled real network datasets, with some degree of improvement over the other algorithms. For the comparison of modularity Q, however, the LNSSCL algorithm does not take the optimal value on all the datasets: although the proposed algorithm took the highest value only on the Dolphins and Football datasets, its modularity scores on the Karate and Polbooks datasets were close to the highest level.
The NMI measures the similarity between the algorithm's output community partition and the real community structure of the network: the larger the NMI, the higher that similarity. It can be seen that the community detection results of the LNSSCL algorithm on the four labeled real network datasets Karate, Dolphins, Polbooks, and Football are closest to the real community structures of these networks. In addition, compared to the SCL algorithm, the LNSSCL algorithm improves the NMI values by 3.3%, 10.2%, 10.5%, and 8.8% on the Karate, Dolphins, Polbooks, and Football datasets, respectively. This demonstrates the effectiveness of the series of improvements the LNSSCL algorithm makes to the SCL algorithm, raising its stability and accuracy.
Among the other algorithms, the CNM, Louvain, and Leiden algorithms seek to maximize the modularity of the whole network, which yields good modularity scores but fails to discover the real community structure well. The LPA, TS, GLLPA, and SSCLPA algorithms have a certain degree of randomness in the node order of label updating and in the label propagation process, making the final community detection results unstable. The computational complexity of Walktrap and Infomap, the two random-walk-based algorithms, is relatively high, and both are sensitive to their own parameter settings. Due to the small size of the datasets used in the experiments and the existence of small communities, although these algorithms achieve higher modularity scores, they fail to achieve higher NMI scores, and their consistency with the real community structure is insufficient.
Overall, the LNSSCL algorithm works best for community detection on the Karate, Dolphins, Polbooks, and Football datasets.

Experimental Results and Analysis on Unlabeled Real Network Datasets
On the seven unlabeled real network datasets, namely Lesmis, Jazz, Email, Netscience, Power Grid, Facebook, and PGP, the proposed LNSSCL algorithm was compared against the 12 representative community detection algorithms through experimental evaluation. The community detection outcomes of the various algorithms were assessed using modularity Q, and the experimental results are shown in Table 6. Since the unlabeled network datasets lack a real community structure, community delineation is accomplished by identifying structural features with strong internal connections and sparse external connections. Modularity measures the strength of community structure in the network and evaluates the results of algorithmic community delineation when the dataset has no ground-truth community structure: the larger the modularity, the better the quality of community detection and the stronger the connections within communities.
As can be seen from Table 6, the LNSSCL algorithm achieves the maximum modularity on all seven unlabeled network datasets used in the experiments, outperforming the other comparison algorithms. In addition, these network datasets are of different types, sizes, and sparsities, and the high modularity performance of the LNSSCL algorithm reflects the algorithm's good generalization and universality. In particular, the performance on the two larger datasets, Facebook and PGP, shows that the algorithm has a degree of scalability. Among the other algorithms, those oriented toward maximizing the modularity, such as CNM and Louvain, achieved high modularity on the Netscience, Power Grid, Facebook, and PGP datasets, with Leiden being the most prominent, but they did not show the best performance on the smaller datasets. The LPA, TS, GLLPA, and SSCLPA algorithms are not as effective as the stable LNSSCL algorithm in the experiments, because the randomness in the node order of label updating and in the label propagation process prevents stable and accurate performance. The MSC algorithm constructs a steady-state Markov similarity augmented matrix, which enables stable and efficient community delineation, and achieves high modularity in the experiments. Furthermore, compared to the SCL algorithm, the LNSSCL algorithm improves the modularity values by 6.7%, 12.7%, 6.4%, 4.2%, 10.3%, 5.3%, and 9.9% on the Lesmis, Jazz, Email, Netscience, Power Grid, Facebook, and PGP datasets, respectively. This shows that the LNSSCL algorithm improves the stability and accuracy of the original SCL algorithm for community detection.

Experimental Results and Analysis on Synthetic Networks
In order to better measure the performance of the algorithm, we generate networks of different sizes for experiments on the LFR test network model. The number of nodes in the LFR1 network is 1000, and the number of nodes in the LFR2 network is 4000. Since the real community structure of the LFR networks is known, the performance of the algorithms is measured using NMI. Among the parameters used to create the LFR networks, the mixing parameter µ represents the complexity of the community structure and determines its clarity. As the mixing parameter µ increases, the community structure becomes more complex and communities become harder to recognize. The experimental results on the LFR networks with different mixing parameters µ are shown in Figure 4, where the horizontal coordinates represent the values of the mixing parameter µ and the vertical coordinates represent the NMI.
When the mixing parameter µ is less than 0.4, the algorithms have higher NMI values, except for the CNM and Paris algorithms. When µ is greater than 0.4, the NMI values all show a decrease. Among them, label propagation-based algorithms such as LPA decline very noticeably; in particular, the LPA algorithm has an NMI value of 0 in both networks when µ is not less than 0.6. The reason is that label propagation-based algorithms involve a certain degree of randomness, which tends to cause low performance and instability in community detection when the community structure is not clear enough. Comparing the synthetic networks LFR1 and LFR2, the NMI of most of the algorithms increased with the increase in the number of nodes. However, the performance of the CNM algorithm decreases, which is attributed to the resolution limit the algorithm may suffer when dealing with networks containing larger communities, leaving it unable to decompose large communities into smaller sub-communities. From Figure 4, it can be seen that the NMI value of LNSSCL tends to 1 when µ = 0.1 and is the maximum when µ = 0.8, indicating that the performance of LNSSCL decays the slowest, i.e., its performance is the most stable. Overall, as the mixing parameter µ increases, the performance of each algorithm decreases to varying degrees in both sets of networks, but the community detection performance of the LNSSCL algorithm still outperforms most of the compared algorithms.

Parameter Sensitivity Analysis
The parameter λ balances the particle's random and preferential walks. When λ is large, the particle is inclined to walk preferentially, frequently revisiting the nodes it has already controlled and consolidating the community scope, but it easily falls into a localized area and cannot visit new nodes. From Figure 5, it can be seen that the best community detection performance is achieved by appropriately increasing the value of λ beyond the point that balances random and preferential walking (λ = 0.5). For the Polbooks, Football, and Jazz datasets, the best performance is achieved when the value of λ is 0.6, while for the Email dataset the best value of λ is 0.5.
The parameter ∆ is responsible for updating the particle's energy value. When ∆ is very small, a particle is not penalized enough for visiting nodes that are not under its control, so its energy will not be depleted during its wanderings. The particle will frequently visit nodes that should belong to competing particles and enter the core of other communities. As a result, all nodes in the network will be in constant competition, community boundaries cannot be established and consolidated, and the final community detection will be less effective. When ∆ is very large, a particle runs out of energy as soon as it visits a node controlled by a competing particle. The particle will frequently enter the resurrection phase and cannot be expected to move away from its initial position to take control of other nodes. As can be seen from Figure 6, a parameter value ∆ = 0.3 achieves the best performance on all four datasets used in the experiment, and the different values of ∆ affect the performance on the different datasets with roughly the same trend.

According to Figures 5 and 6, the algorithm is somewhat sensitive to the values of λ and ∆, although numerically there is little fluctuation within a small interval around the chosen values. Based on the results of the parameter sensitivity analysis experiments, we set λ to 0.6 and ∆ to 0.3 for the experiments on the other datasets.

Conclusions and Future Work
This paper introduces a novel unsupervised community detection algorithm named LNSSCL, which incorporates local node similarity into the process of stochastic competitive learning. Firstly, the algorithm determines the starting positions of particles' walks by calculating the degree values of nodes as well as the Salton similarity index. At the same time, the fusion of node similarity optimizes the particle preferential walk rule. During the particle wandering process, the particle control capability increment is dynamically adjusted according to the control range of each particle. When a particle runs out of energy, it selects the node with the largest control capability within its control range for resurrection. After the particles stop wandering, the nodes in the network undergo affiliation selection based on the frequency of occurrence of the community labels of neighboring nodes and the influence scores of the neighboring communities, and the community detection results are finally obtained. Comparative experiments on real network datasets and synthetic networks show that the LNSSCL algorithm effectively improves on the SCL algorithm. Compared with other representative algorithms, the LNSSCL algorithm achieves better community detection quality and reveals a more reasonable community structure.
Nevertheless, the LNSSCL algorithm also has some shortcomings. Compared with the SCL algorithm, it performs multiple local node similarity calculations during the community detection process, which adds computational cost and complexity and may incur a larger time overhead on ultra-large networks. For some hyperparameters of the algorithm, no dedicated parameter tuning method is used; the parameter values are set empirically based on the literature and the sensitivity experiments.

Figure 2 .
Figure 2. Schematic of error community detection when particle control capacity increment is constant.

Figure 3 .
Figure 3. Schematic of particle resurrection location selection.

Figure 4 .
Figure 4. Experimental results of NMI under synthetic networks: (a) experimental results of LFR1 network where n = 1000; (b) experimental results of LFR2 network where n = 4000.

Figure 5 .
Figure 5. Parameter sensitivity analysis for λ: (a) experimental results of Polbooks; (b) experimental results of Football; (c) experimental results of Jazz; (d) experimental results of Email.

Figure 6 .
Figure 6. Parameter sensitivity analysis for ∆: (a) experimental results of Polbooks; (b) experimental results of Football; (c) experimental results of Jazz; (d) experimental results of Email.

Table 2 .
Basic information of real network datasets.

Table 3 .
The parameters for LFR network construction.

Table 4 .
Comparison of the NMI of each algorithm on labeled real network datasets. The largest NMI values are in bold.

Table 5 .
Comparison of modularity Q of each algorithm on labeled real network datasets. The largest Q values are in bold.

Table 6 .
Comparison of modularity Q of each algorithm on unlabeled real network datasets. The largest Q values are in bold.