An Information-Theoretic Approach for Detecting Community Structure Based on Network Representation



Introduction
A network topology diagram (or network diagram for short) is an abstract representation of systems and structures that commonly exist in the real world. In a network diagram, a node represents an individual or element in the system, and an edge represents a relationship between individuals or elements. It has been shown that some nodes in a network are more closely connected to each other than to the rest of the network, and these nodes can therefore be regarded as an independent part, called a community [1]. Community structures reflect local topological features and the relationships between elements. Detecting communities is thus an indispensable step in understanding how a network is maintained and how it evolves, and it can be applied to the analysis and prediction of real-world network systems. Examples include building recommendation systems on the basis of community detection in friendship networks, and identifying advertising-marketing behavior and telecom fraud by detecting abnormal links between communities in a telecommunication network [2].
However, accurately finding the community structure of a network has proven to be a difficult task, mainly for two reasons: the criteria for judging whether a community is good or bad are ambiguously defined, and the number of possible community partitions is exponentially large [1]. Regarding the former, although researchers generally agree that a community is a cluster of nodes whose internal connections are denser than its external connections, different researchers use different indexes, including the well-known modularity, to define and measure this closeness. However, none of these indexes is accurate in all cases. For example, modularity has a resolution limit problem, […]
• An average mutual-information-based (AMI-based) community evaluation index is proposed that is applicable to both top-down and bottom-up community evolution processes. By calculating the average mutual information between adjacent states and the information entropy of each state, the AMI-based index measures the stability of each state of a community evolution process.
• An information-theoretic approach based on network representation, named AMI-NRL, is proposed. It combines network representation learning with the AMI-based community evaluation index to achieve stable and accurate community detection.
• Experiments on real-world and synthetic datasets verify the accuracy and stability of the approach in comparison with typical community detection algorithms.
The rest of this paper is organized as follows: Section 2 summarizes current research on community detection. Section 3 introduces and explains the AMI-NRL approach in detail. Section 4 presents the experiments conducted to verify the effectiveness of the algorithm. Section 5 concludes the article and outlines directions for future research.

Related Work
Current research on community detection mainly focuses on finding community structures in various types of networks. By the type of community structure, methods can be divided into overlapping and non-overlapping community detection; by the type of network, into algorithms for static/dynamic, directed/undirected, and weighted/unweighted networks, etc. [1]. A community detection method consists of two main parts: an index that evaluates a given community partition, and a specific method for dividing the network into communities. The former evaluates the strengths and weaknesses of a partition, and the latter, guided by the former, searches for the best partition of a network through certain steps or processes.
Among community evaluation indexes, the most widely used is Girvan and Newman's modularity [3], which is applied in their GN algorithm and in the fast Newman algorithm. Modularity brought unprecedented progress to static non-overlapping community detection, but its resolution limit [7] prevents traditional modularity-based approaches from obtaining ideal community partitions in many cases and sometimes even leads to obviously unreasonable results. Besides modularity, a variety of other community evaluation indexes have been proposed for overlapping or non-overlapping community detection. The two-layer coding method of Rosvall et al. [8] transforms the community partitioning problem into an information compression problem: it uses the average code length of a random walk on the network to measure the quality of a partition. The accuracy of this method depends on running time, with more iterations giving more accurate results. By evaluating the stability of random walks in the network, the stability index [9] proposed by Lambiotte et al. can also be used to evaluate the quality of a community partition. Its hyperparameter t determines the time scale on which the index measures the random walk, but it is difficult to choose an appropriate value of t for different networks. Unlike the above research, we draw on information theory and system stability to introduce average mutual information (AMI) for measuring the stability of communities during their evolution, and we propose an AMI-based community evaluation index aimed at obtaining accurate, stable, and unique results.
As for methods of community division, many types have been proposed and combined with community evaluation indexes to detect communities in networks. Top-down splitting methods, such as the GN algorithm [3], repeatedly delete edges until none remain and use the community evaluation index to select the optimal state as the final result. The GN algorithm is inefficient because computing edge betweenness is time-consuming on large networks. Bottom-up aggregation methods start with each node as a separate community and repeatedly merge pairs of communities until the entire network forms a single community; the community evaluation index is then used to select the optimal state in this process as the result. For example, the FN algorithm [10], the CNM algorithm [11], the fast-unfolding algorithm (also known as the Louvain algorithm) [12], and the Infomap algorithm [8] generate community partitions using this bottom-up aggregation idea. Such methods are faster and more efficient, but their accuracy is lower because deviations tend to accumulate from one state to the next during aggregation. Label propagation methods detect communities by first labeling some nodes and then propagating labels between nodes according to their similarity, as in the COPRA algorithm [13] and the CLPA-GNR algorithm [14]. These algorithms are highly efficient but less stable. Based on local optimization, local expansion methods start from multiple seed nodes and greedily expand into their neighborhoods until specified community boundary conditions are reached, thereby obtaining overlapping or non-overlapping community partitions. Examples are LFM [4] and GCE [15].
This type of method can be applied effectively to detect overlapping communities, but it tends to fall into local optima because global information is difficult to exploit during expansion. Heuristic methods are also used for community detection. Infomap [16], proposed by Rosvall et al., encodes and compresses the network information and obtains the optimal community partition by simulated annealing. Clonal selection algorithms such as CSA-Net [17] can detect community structures in complex networks at multiple resolutions. However, such algorithms often have multiple adjustable parameters whose appropriate values are difficult to determine, and they are prone to becoming trapped in local optima. Random-walk methods obtain structural information, such as the closeness of connections between nodes (node correlation), through random walks in the network and then detect communities on the basis of this information (or of a preset community evaluation index). An example is the community detection algorithm based on positive and negative links [18] proposed by Su et al. Such methods have difficulty distinguishing between two partitions with similar community structures, which can prevent them from finding the best partition. Recently, some motif-based methods have been proposed that focus on higher-order structural characteristics of the network, e.g., EdMot [19] proposed by Pei-Zhen Li et al. Such methods are highly efficient, especially on large neural networks and collaboration networks, but are less accurate than other methods on unweighted, undirected networks.
In recent years, researchers have devoted more attention to extracting structural information from networks by means of network representation learning, for example, ComE [6], which builds a closed-loop "community detection-community representation-node representation" framework, and MemeRep [20], which uses network representation to optimize modularity density. Although obtaining better representations of a network has been studied extensively, how to obtain accurate partitions from these representations has received little attention: community partitions are usually obtained directly by clustering the representation vectors with methods such as K-means or DBSCAN, which can produce different results depending on parameters and initialization. Our research focuses on how to use the learned network representation to obtain a community partition with higher stability and accuracy. In our approach, a bottom-up community evolution process is generated from the representation of the target network, and the AMI-based community evaluation index is used to find the most stable state of this process as the optimal partition.

Community Evaluation Index Based on Average Mutual Information
The community evaluation index measures the quality of a partition of a specific network. Our approach uses a community evaluation index based on average mutual information (AMI), which measures the stability of each state of a bottom-up cohesion or top-down division community evolution process by calculating the AMI between adjacent states and the information entropy of each state; the most stable state is then taken as the optimal community partition of the process.

Definition 1. Community evolution process of top-down division.
In a community evolution process $P = \{P_0, P_1, P_2, \ldots, P_N\}$, each state is a community partition. If every pair of adjacent partitions $P_n$ and $P_{n+1}$ satisfies
$$\forall\, Y_j \in P_{n+1},\ \exists\, X_i \in P_n:\ Y_j \subseteq X_i,$$
where $X_i$ and $Y_j$ denote communities in a partition and $n$ is an integer with $n < N$, then $P$ is called a top-down division community evolution process. In other words, partition $P_{n+1}$ is obtained from $P_n$ by splitting one or several of its communities into more communities.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 5 of 16
Figure 1 shows an example of a top-down division community evolution process.

Definition 2. Community evolution process of bottom-up cohesion.
Correspondingly, if every pair of adjacent partitions $P_n$ and $P_{n+1}$ satisfies
$$\forall\, X_i \in P_n,\ \exists\, Y_j \in P_{n+1}:\ X_i \subseteq Y_j,$$
then $P$ is called a bottom-up cohesion community evolution process. In other words, partition $P_{n+1}$ is obtained from $P_n$ by merging some of its communities into one.
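As a minimal sketch (not the authors' code), the two definitions can be checked mechanically for a given sequence of partitions. Here a partition is assumed to be a list of `frozenset`s of node IDs, and the helper names `is_top_down_step`, `is_bottom_up_step`, and `process_type` are hypothetical.

```python
from typing import FrozenSet, List

Partition = List[FrozenSet[int]]


def is_top_down_step(p_n: Partition, p_next: Partition) -> bool:
    """Definition 1: every community of P_{n+1} lies inside some community of P_n."""
    return all(any(y <= x for x in p_n) for y in p_next)


def is_bottom_up_step(p_n: Partition, p_next: Partition) -> bool:
    """Definition 2: every community of P_n lies inside some community of P_{n+1}."""
    return all(any(x <= y for y in p_next) for x in p_n)


def process_type(process: List[Partition]) -> str:
    """Classify a whole sequence of partitions P_0, ..., P_N."""
    steps = list(zip(process, process[1:]))
    if all(is_top_down_step(a, b) for a, b in steps):
        return "top-down division"
    if all(is_bottom_up_step(a, b) for a, b in steps):
        return "bottom-up cohesion"
    return "neither"
```

Reversing a top-down division process yields a bottom-up cohesion process, which is the property used later to reduce one case to the other. A process whose consecutive states are all identical satisfies both definitions; this sketch then reports it as top-down division.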
For a partition $P_n$ in a bottom-up cohesion or top-down division community evolution process, the AMI-based community evaluation index $Q_{P_n}$ is computed from three quantities: $I_{P_{n-1},P_n}$ and $I_{P_n,P_{n+1}}$, the AMI values between the current partition and the previous and next partitions, respectively; $H_{P_n}$, the entropy of $P_n$; and a scale factor $k$. In our experiments, better results were achieved by setting $k$ to the total number of nodes $N$ of the target network. The index $Q_{P_n}$ measures the stability of each state: a state that has a high AMI value with its adjacent states and a low entropy is considered instantaneously stable in the information-theoretic sense.
Since a community is essentially a node set and a partition is a set of communities, the AMI value $I_{P_a,P_b}$ between two partitions $P_a$ and $P_b$ ($a \neq b$) can be calculated using the AMI formula between sets:
$$I_{P_a,P_b} = \sum_{i}\sum_{j} \omega_{ij}\, I(X_i;Y_j),$$
where $X_i \in P_a$ denotes the $i$th community in partition $P_a$, $Y_j \in P_b$ denotes the $j$th community in partition $P_b$, $I(X_i;Y_j)$ denotes the mutual information between communities $X_i$ and $Y_j$, and $\omega_{ij}$ denotes the relevancy degree between $X_i$ and $Y_j$, whose value depends on the conditional probability $P(Y_j \mid X_i)$ evaluated below for each case. Consider a top-down division community evolution process $P = \{P_0, P_1, P_2, \ldots, P_N\}$. Since the states of such a process are partitions, there are only three possible cases for any pair of communities $X_i \in P_a$, $Y_j \in P_b$ with $a < b$:
1. $X_i = Y_j$: communities $X_i$ and $Y_j$ have exactly the same member nodes in partitions $P_a$ and $P_b$; they are the same community, unchanged between the two states.

2. $Y_j \subsetneq X_i$: community $Y_j$ in partition $P_b$ results from the splitting of community $X_i$ in partition $P_a$.
3. $X_i \cap Y_j = \emptyset$: the member nodes of $X_i$ and $Y_j$ are completely different; in other words, the two communities are unrelated on the timeline.
For Case 1, $P(Y_j \mid X_i) = 1$. For Case 2, $P(Y_j \mid X_i) = n_{Y_j}/n_{X_i}$, where $n_{X_i}$ and $n_{Y_j}$ are, respectively, the numbers of member nodes of communities $X_i$ and $Y_j$. For Case 3, $P(Y_j \mid X_i) = 0$. From Equations (4) and (5), it further follows that $\omega_{ij} \cdot I(X_i;Y_j)$ is 0 in Case 3, whereas in Cases 1 and 2 the value of $I(X_i;Y_j)$ still needs to be calculated to obtain $I_{P_a,P_b}$. For this, we use the following formula:
$$I(X_i;Y_j) = \sum_{a \in \{0,1\}} \sum_{b \in \{0,1\}} P(X_i = a)\, P(Y_j = b \mid X_i = a) \log \frac{P(Y_j = b \mid X_i = a)}{P(Y_j = b)},$$
where terms with $P(Y_j = b \mid X_i = a) = 0$ contribute 0. $P(X_i = 1)$ denotes the probability that a randomly chosen node belongs to community $X_i$; thus $P(X_i = 1) = n_{X_i}/n$, where $n_{X_i}$ is the number of member nodes of $X_i$ and $n$ is the total number of nodes in the network.
$P(X_i = 0)$ denotes the probability that a randomly chosen node does not belong to $X_i$; thus $P(X_i = 0) = (n - n_{X_i})/n$. $P(Y_j = 1)$ and $P(Y_j = 0)$ are calculated in a similar way. $P(Y_j = 1 \mid X_i = 1)$ denotes the probability that a randomly chosen member node of $X_i$ is also a member node of $Y_j$. Because $Y_j \subseteq X_i$ in Cases 1 and 2,
$$P(Y_j = 1 \mid X_i = 1) = \frac{n_{Y_j}}{n_{X_i}}, \qquad P(Y_j = 0 \mid X_i = 1) = 1 - \frac{n_{Y_j}}{n_{X_i}}.$$
In Cases 1 and 2, a node that does not belong to $X_i$ cannot belong to $Y_j$. Therefore,
$$P(Y_j = 1 \mid X_i = 0) = 0, \qquad P(Y_j = 0 \mid X_i = 0) = 1.$$
A bottom-up cohesion community evolution process can be transformed into a top-down division community evolution process by reversing the order of its states. For instance, if $P_{\text{bottom-up}} = \{P_0, P_1, P_2, \ldots, P_N\}$ is a bottom-up cohesion process, then $P_{\text{top-down}} = \{P_N, P_{N-1}, \ldots, P_1, P_0\}$ is the corresponding top-down division process. The AMI value of any pair of its adjacent states can therefore be calculated by the method above; we do not go into further detail here.
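The sketch below computes $I(X_i;Y_j)$ from the binary membership indicators described above, using the joint-probability form of mutual information (equivalent to the conditional form), and sums the weighted terms into $I_{P_a,P_b}$. It assumes the natural logarithm and, because the exact formula for the relevancy degree is not reproduced here, assumes $\omega_{ij} = P(Y_j = 1 \mid X_i = 1)$; this choice is consistent with the three cases but may not match the paper's definition. The function names are hypothetical.

```python
import math
from typing import FrozenSet, List

Community = FrozenSet[int]


def community_mi(x: Community, y: Community, n: int) -> float:
    """Mutual information I(X_i; Y_j) of the binary membership indicators.

    X = 1 means a randomly chosen node belongs to x; Y = 1 means it belongs to y.
    Natural log is assumed; zero-probability cells contribute 0.
    """
    p_x = {1: len(x) / n, 0: (n - len(x)) / n}
    p_y = {1: len(y) / n, 0: (n - len(y)) / n}
    both = len(x & y)
    # Joint distribution P(X = a, Y = b) from the 2x2 contingency counts.
    joint = {
        (1, 1): both / n,
        (1, 0): (len(x) - both) / n,
        (0, 1): (len(y) - both) / n,
        (0, 0): (n - len(x) - len(y) + both) / n,
    }
    mi = 0.0
    for (a, b), p_ab in joint.items():
        if p_ab > 0:  # p_ab > 0 implies both marginals are > 0
            mi += p_ab * math.log(p_ab / (p_x[a] * p_y[b]))
    return mi


def partition_ami(p_a: List[Community], p_b: List[Community], n: int) -> float:
    """I_{P_a,P_b} = sum_i sum_j w_ij * I(X_i; Y_j).

    ASSUMPTION: w_ij = P(Y_j = 1 | X_i = 1) = |X_i & Y_j| / |X_i|.
    """
    total = 0.0
    for x in p_a:
        for y in p_b:
            w = len(x & y) / len(x)
            if w > 0:  # disjoint pairs (Case 3) contribute nothing
                total += w * community_mi(x, y, n)
    return total
```

Note that two disjoint, complementary communities have nonzero mutual information on their own; under the assumed weighting it is the zero weight in Case 3 that removes such pairs from the sum.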
After the AMI values of adjacent partitions are obtained, the entropy of each partition also needs to be calculated.
For partition $P_a$, the entropy $H_{P_a}$ is defined over the communities of $P_a$, where $X_i$ denotes the $i$th community of $P_a$. Finally, the community evaluation index $Q_{P_n}$ of each state in the community evolution process $P$ can be calculated, and the state with the largest $Q_{P_n}$ is taken as the best partition in the process.
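Because the entropy formula is not reproduced here, the following sketch assumes the common Shannon entropy of the community-size distribution, $H_{P_a} = -\sum_i (n_{X_i}/n)\log(n_{X_i}/n)$, which may differ from the paper's definition. It also shows the final selection step; since the formula for $Q_{P_n}$ is not reconstructed, the selection takes precomputed $Q_{P_n}$ values as input. Both function names are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence


def partition_entropy(partition: List[FrozenSet[int]], n: int) -> float:
    """ASSUMPTION: Shannon entropy (natural log) of the community sizes n_{X_i}/n."""
    h = 0.0
    for community in partition:
        p = len(community) / n
        if p > 0:
            h -= p * math.log(p)
    return h


def best_state(q_values: Sequence[float]) -> int:
    """Index n of the state with the largest Q_{P_n} (the first one on ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Under this assumption, a partition with a single community has entropy 0 and a partition into singletons has the maximum entropy, $\log n$.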

Basic Process of the Approach
The basic idea of AMI-NRL is to form a community evolution process from the node representations and then select the most stable partition in the process as the final result. The approach consists of the following steps:

1. Perform random walks throughout the network, recording the visited nodes to obtain node sequences.
2. Obtain the vector representation of each node by inputting these sequences into the Word2vec model.
3. Cluster these vectors with agglomerative hierarchical clustering, regarding the clusters as communities, to form a bottom-up cohesion community evolution process $P = \{P_0, P_1, P_2, \ldots, P_N\}$.
4. Calculate the AMI value $I_{P_n,P_{n+1}}$ of each pair of adjacent partitions, as well as the entropy $H_{P_n}$ of each partition.
5. Calculate the $Q_{P_n}$ value of each partition and output the partition $P_a$ with the largest $Q_{P_n}$ as the result.
Steps 1 and 2 obtain the representation of each node of the network using DeepWalk [21]. We experimented with a number of network representation learning methods, including DeepWalk [21], Node2Vec [22], Walklets [23], NMFADMM [24], NetMF [25], GLEE [26], RandNE [27], BoostNE [28], and GraRep [29]. After comparing the results of applying each of them in our approach, we selected DeepWalk because its clustering results showed clearer and more stable boundaries on multiple real-world datasets.
In Step 1, the random walk procedure is repeated for R iterations; in our experiments, better results were achieved with R between 10 and 25. In each iteration, every node of the network is used once as the starting point of a random walker, which then moves to one of its neighboring nodes with equal probability at each step. After L − 1 moves, each walk yields a node sequence of length L, so each iteration produces N sequences. We found that larger networks required larger values of L.
Step 1 thus generates R · N node sequences of length L.
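A minimal sketch of Step 1, assuming the network is given as an adjacency list (a `dict` from node to neighbor list) and that a walk stops early if it reaches an isolated node. `generate_walks` and its parameters are hypothetical names mirroring R and L above; each walk moves to a uniformly chosen neighbor, so R iterations over N nodes yield R · N sequences.

```python
import random
from typing import Dict, Hashable, List


def generate_walks(adj: Dict[Hashable, List[Hashable]], r: int, length: int,
                   seed: int = 0) -> List[List[Hashable]]:
    """Start one walk of `length` nodes from every node, repeated r times."""
    rng = random.Random(seed)
    walks = []
    for _ in range(r):                      # R iterations
        for start in adj:                   # every node is a starting point once
            walk = [start]
            while len(walk) < length:       # L - 1 moves
                neighbors = adj[walk[-1]]
                if not neighbors:           # assumption: stop at an isolated node
                    break
                walk.append(rng.choice(neighbors))  # equiprobable neighbor choice
            walks.append(walk)
    return walks
```

Fixing the seed makes the sequences reproducible, which is convenient when checking how stable the final partition is across runs.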
In Step 2, the sequences are input into the Skip-gram model, a two-layer neural network used in Word2vec. In the training phase, one node is input at a time, and the model is trained to predict the nodes within its window, that is, the context of the input node. After multiple iterations, the weight matrix from the input layer to the hidden layer is learned. Each row of this matrix contains the weights connecting one input node to the hidden-layer neurons and serves as the vector representation of that node. Good results were obtained by setting the context window size to 7 and the number of hidden-layer neurons to 50.
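Word2vec libraries build the Skip-gram training pairs internally; the sketch below only illustrates how (input node, context node) pairs are formed from one walk. It assumes `window` is the maximum distance on each side of the input node, since the text does not spell out how the window size of 7 is counted; `skipgram_pairs` is a hypothetical name. With gensim 4.x, the training itself would typically be a call such as `Word2Vec(walks_as_strings, vector_size=50, window=7, sg=1, min_count=1)`, where `sg=1` selects Skip-gram, `min_count=1` keeps every node, and `walks_as_strings` is a hypothetical list of walks with node IDs converted to strings.

```python
from typing import List, Tuple


def skipgram_pairs(walk: List[int], window: int) -> List[Tuple[int, int]]:
    """(center, context) pairs for every position of one random-walk sequence."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)              # left edge of the context window
        hi = min(len(walk), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                       # a node is not its own context
                pairs.append((center, walk[j]))
    return pairs
```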
After the node representations are obtained, Step 3 performs agglomerative hierarchical clustering on them to construct a bottom-up cohesion community evolution process. During clustering, Euclidean distance and Ward's method [30] are used to calculate the distance between vectors and between clusters, respectively. The hierarchical clustering produces a dendrogram, each layer of which naturally corresponds to a community partition. Between adjacent layers, exactly one pair of communities is merged into one. A bottom-up cohesion community evolution process is generated in this way.
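Step 3 can be sketched by converting a SciPy-style linkage matrix into the sequence of partitions, from all singletons to a single community. In practice the matrix `z` could come from `scipy.cluster.hierarchy.linkage(vectors, method="ward")`, whose default metric is Euclidean, matching the text; in SciPy's format, row i merges clusters `z[i][0]` and `z[i][1]` into a new cluster with index n + i. The conversion function `linkage_to_process` is a hypothetical helper written in plain Python.

```python
from typing import Dict, FrozenSet, List, Sequence


def linkage_to_process(z: Sequence[Sequence[float]], n: int) -> List[List[FrozenSet[int]]]:
    """Turn an (n - 1) x 4 linkage matrix into partitions P_0 (singletons) ... P_{n-1}."""
    clusters: Dict[int, FrozenSet[int]] = {i: frozenset([i]) for i in range(n)}
    process = [sorted(clusters.values(), key=min)]
    for step, row in enumerate(z):
        a, b = int(row[0]), int(row[1])
        # exactly one pair of communities is merged between adjacent states
        clusters[n + step] = clusters.pop(a) | clusters.pop(b)
        process.append(sorted(clusters.values(), key=min))
    return process
```

Sorting each partition by its smallest node ID only makes the output deterministic; it has no effect on the AMI or entropy computations.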
Finally, in Steps 4 and 5, the AMI-based community evaluation index is used to calculate the $Q_{P_n}$ value of each partition in the process, and the partition $P_a$ with the largest $Q_{P_n}$ value is selected as the community detection result.
The pseudo code of the implemented approach is shown in Algorithm 1:

Experiments
This section describes the implementation of the approach and experimentally verifies its accuracy. Three real-world networks of different sizes and LFR synthetic networks were selected for the experiments; these datasets and the corresponding parameters are described in Table 1. Since these four datasets have ground-truth labels, we used the number of communities (CN) and normalized mutual information (NMI) as evaluation indicators and compared our results with those of other community detection algorithms to evaluate the accuracy of the approach.

Datasets and Experimental Parameters
The datasets used in the experiments are Karate Club [31], Dolphins [32], Polbooks, Polblogs [33], and the LFR benchmark [34]. The first four are real-world datasets; the last generates synthetic networks with given parameters. Table 1 lists the main properties of the datasets, as well as the experimental parameters, including random-walk length, number of iterations, and vector dimensions.

Benchmarks
To verify the accuracy of the algorithm, we considered both the number of communities (CN) and normalized mutual information (NMI). NMI, introduced in [35], measures the difference between two partitions on the basis of information theory and can therefore serve as an index of community detection accuracy.
NMI is calculated as follows:
$$\mathrm{NMI}(A,B) = \frac{-2\sum_{i=1}^{C_A}\sum_{j=1}^{C_B} C_{ij}\log\left(\frac{C_{ij}\,N}{C_{i\cdot}\,C_{\cdot j}}\right)}{\sum_{i=1}^{C_A} C_{i\cdot}\log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} C_{\cdot j}\log\left(\frac{C_{\cdot j}}{N}\right)},$$
where $C_A$ and $C_B$ are the numbers of communities in partitions A and B, respectively, and $N$ is the total number of nodes. $C$ is a confusion matrix whose entry $C_{ij}$ is the number of nodes that belong simultaneously to community $i$ in partition A and community $j$ in partition B; $C_{i\cdot}$ and $C_{\cdot j}$ denote the sum of row $i$ and of column $j$ of $C$, respectively. NMI measures the similarity of two community partitions: it equals 1 if the two partitions are identical and is close to 0 if they are very dissimilar.
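The NMI formula can be implemented directly from the confusion matrix. The sketch below takes two partitions as per-node label lists (an assumed input format) and uses the natural logarithm, which does not affect the ratio; `nmi` is a hypothetical name. When both partitions consist of a single community the formula is 0/0, and the sketch returns 1 by convention, since the partitions are then identical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """NMI of two partitions given as community labels of the same N nodes."""
    n = len(labels_a)
    c = Counter(zip(labels_a, labels_b))  # confusion matrix entries C_ij (nonzero only)
    row = Counter(labels_a)               # row sums C_i.
    col = Counter(labels_b)               # column sums C_.j
    num = -2.0 * sum(cij * math.log(cij * n / (row[i] * col[j]))
                     for (i, j), cij in c.items())
    den = (sum(ci * math.log(ci / n) for ci in row.values())
           + sum(cj * math.log(cj / n) for cj in col.values()))
    if den == 0:  # both partitions are a single community
        return 1.0
    return num / den
```

Only the nonzero entries of the confusion matrix are stored, so the $0 \log 0$ terms never arise.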

Karate Club
AMI-NRL outputs stable community detection results on the Karate Club dataset, as shown in Figure 2 and Table 2. The result was exactly the same as the labeled partition.

Dolphins
The results on the Dolphins dataset, shown in Figure 3 and Table 3, differed from the labeled partition by 0 to 1 node over multiple runs. The differing node was Node 39, which has two edges, one to each of the two communities; we therefore consider either assignment of this node to be reasonable.

Polbooks
Table 4 and Figure 4 show the results on the Polbooks dataset. The result differs from the labeled partition at 16 nodes: 0, 4, 6, 18, 28, 46, 48, 52, 58, 64, 65, 67, 68, 76, 77, and 85. Table 5 shows a link analysis of these nodes, counting the edges inside and between communities in both the detected partition and the labeled partition. As shown in Table 5, nodes 0, 4, 6, 18, 28, 46, 48, 58, 64, 65, 76, and 77 have more internal edges in the detected partition than in the labeled partition, and nodes 52, 67, and 68 have the same numbers of internal and external edges in both. Only node 85 had one more internal edge than the labeled partition. Therefore, we consider the partition found by AMI-NRL on Polbooks to be more reasonable than the labeled partition.
Table 5. Link analysis of the result and the labeled community partition on Polbooks.
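The link analysis in Table 5 amounts to counting, for each node, how many of its edges stay inside its own community and how many leave it, under each partition. A minimal sketch, assuming an undirected edge list and a node-to-community label map (hypothetical input formats; `link_counts` is a hypothetical name):

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def link_counts(edges: Iterable[Tuple[int, int]],
                labels: Dict[int, Hashable]) -> Dict[int, Tuple[int, int]]:
    """Map each node to (internal edges, external edges) under a partition."""
    counts: Dict[int, List[int]] = {v: [0, 0] for v in labels}
    for u, v in edges:
        k = 0 if labels[u] == labels[v] else 1  # 0 = internal, 1 = external
        counts[u][k] += 1                        # undirected: count for both ends
        counts[v][k] += 1
    return {v: (c[0], c[1]) for v, c in counts.items()}
```

Comparing `link_counts(edges, detected)` with `link_counts(edges, labeled)` for the nodes whose assignments differ gives the kind of per-node comparison reported in Table 5.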

LFR Benchmark
The LFR benchmark is a type of artificially generated network proposed by Andrea Lancichinetti et al. in [34]. Compared with GN benchmark networks, LFR networks simulate real-world networks more accurately, because both their degree distribution and their community sizes follow power laws (scale-free distributions). By altering multiple parameters, users can control characteristics of the network such as network size, community size, average node degree, and community mixing degree. Table 7 shows the parameters used in the experiments.

The key parameter of the LFR benchmark is the mixing parameter µ (mu), which controls how much each community mixes with the others. The higher µ is set, the more edges are generated between different communities, which increases the fraction of each node's edges that leave its community and makes it harder for a community detection algorithm to distinguish the community structure. The results of running AMI-NRL on LFR networks and the comparison with other community detection algorithms are shown in Figure 6 and Tables 8 and 9.

Conclusions
In this paper, we propose detecting communities in a network on the basis of stability. Following this idea, a community evaluation index based on average mutual information is used to find the most stable state in a community evolution process, where the stability of each state is measured by the average mutual information between adjacent states and the information entropy of the state. On the basis of this index, we proposed AMI-NRL. In this approach, the network is transformed into vectors through network representation learning. Agglomerative hierarchical clustering is then performed on these vectors to simulate a real-world evolution process of communities. Finally, the optimal community partition is found at the peak of the community evaluation index over this process. Experiments on real-world and synthetic networks show that the approach detects communities accurately and stably.
In experiments, we also found that the ground truth of some datasets was not necessarily the most reasonable community partition. By comparing the labeled partition of the Polbooks dataset and the partition obtained by AMI-NRL, we found that the community structure obtained by the latter was more closely connected within the community, while the connection between communities was sparser. In other words, its community structure was more explicit. In future research, we will analyze this issue in depth and try to propose a more accurate and reasonable community evaluation model.
Due to the definition of AMI and the limitations of agglomerative hierarchical clustering, the current approach is limited to non-overlapping community detection in undirected and unweighted networks. We will continue to expand our research to more types of networks, as well as the detection of overlapping communities.