Identifying Communities in Dynamic Networks Using Information Dynamics

Identifying communities in dynamic networks is essential for exploring the latent network structures, understanding network functions, predicting network evolution, and discovering abnormal network events. Many dynamic community detection methods have been proposed from different viewpoints. However, identifying the community structure in dynamic networks is very challenging due to the difficulty of parameter tuning, high time complexity and detection accuracy decreasing as time slices increase. In this paper, we present a dynamic community detection framework based on information dynamics and develop a dynamic community detection algorithm called DCDID (dynamic community detection based on information dynamics), which uses a batch processing technique to incrementally uncover communities in dynamic networks. DCDID employs the information dynamics model to simulate the exchange of information among nodes and aims to improve the efficiency of community detection by filtering out the unchanged subgraph. To illustrate the effectiveness of DCDID, we extensively test it on synthetic and real-world dynamic networks, and the results demonstrate that the DCDID algorithm is superior to the representative methods in relation to the quality of dynamic community detection.


Introduction
Systems in the real world can be abstracted into complex networks, and a large number of algorithms for complex network mining have been proposed [1][2][3][4][5], most of which focus on static networks. However, most networks in the real world are evolving over time. For example, news and public opinions are constantly being produced, spread and abandoned; infectious diseases are constantly appearing, spreading and dying; urban road networks are constantly being built, expanded and demolished; organizations or groups in social networks are constantly forming, growing and dissolving, and so on. Studying the structural characteristics of dynamic networks is beneficial to analyze the formation mechanism of networks, explore the implicit structure, and predict the evolution of the network structure. These have important theoretical and practical value for the development of various fields and disciplines in real society. For decades, researchers have conducted extensive research on community structures in complex networks and have proposed many well-known static community discovery methods, such as GN [6], CPM [7], LPA [8], Louvain [9], Infomap [10], MCL [11], etc. However, the network structure in the real world is not static. In contrast, the community structure of most networks is constantly evolving over time. The existing static community discovery methods divide the community based on the static topology of the network and have not considered the relationship between the network structures under multiple snapshots. Therefore, most static community detection algorithms are not suitable for dynamic networks.
Recently, community detection in dynamic networks has attracted increasing attention and become a popular research topic because of their high potential for apprehending social phenomena over time. Many methods have been designed for identifying the communities in dynamic networks from different perspectives, which include dynamic community detection based on evolutionary clustering [12] and incremental [13] dynamic community detection. The evolutionary clustering-based method defines a general clustering framework, so that some static clustering algorithms can also be extended to community detection in dynamic networks. Evolutionary clustering includes two evaluation indicators; one is the clustering quality of the current snapshot, and the other is the difference of the clustering result between the current and the previous snapshots. The methods based on evolutionary clustering consider both that the evolution of the network structure is smooth and the relationship between snapshots, but such methods still have a high time complexity. Similar to evolutionary clustering, the incremental dynamic community detection methods still assume that the network evolution is smooth. This type of algorithm first performs global community detection on the initial network, and then incrementally detects the changed subgraphs in the snapshot network. Therefore, incremental community detection methods often have faster detection speeds. Such methods have also attracted the attention of researchers and several incremental algorithms have been proposed [13][14][15]. However, most of the existing incremental dynamic community detection methods have gradually reduced their performance as the number of snapshots increases. In the process of incremental community identification, only the local structure of the network is detected; therefore, some changes in the associated structure may be neglected. In addition, when processing the addition and deletion of nodes and edges incrementally, the processing order may have an impact on the detection results and efficiency [16]. Therefore, accurately and efficiently identifying the community structure in a dynamic network is still a very interesting and challenging task.
In this paper, we propose a dynamic community detection framework based on information dynamics. Based on this framework, we design a dynamic community detection algorithm DCDID. We regard a network as a closed dynamic system with the information exchanges in the network based on the topology. Through the exchange of information between nodes, nodes with the same characteristics will self-organize, and finally, the community structure in the network will be naturally exposed. This new viewpoint provides a fresh approach to dynamic community detection and it has some attractive features. We will introduce DCDID in Section 3, but first, let us describe the basic idea of the proposed method.

Basic Idea
In social networks, a person usually prefers to communicate with people who share their values. This feature promotes the formation and development of groups, and groups also facilitate the communication of information between people. Therefore, the exchange of information among people plays a very important role in the formation and growth of communities. The structure of the network affects information propagation; in turn, the spread of information also reflects the community structure of the network. Based on these characteristics, we can construct models to automatically reveal the community structure by imitating the communication of information between people. Thus, we designed a new approach based on information dynamics to acquire insights into the division of communities in dynamic networks, where the basic idea is to consider an accommodative dynamical system and investigate its information dynamics over time. In particular, in an interpersonal network, people with similar interests or features are more likely to interact with others, and the propagation of information between them tends to be more frequent. With the diffusion and interaction of information, people in the same community have almost the equivalent amount of information, whereas those in diverse communities have different amounts of information. Over time, the information dynamics on the network reaches the steady state, and the communities can be naturally uncovered by counting the amount of information of nodes in the network.
To better illustrate the basic idea, we employ a toy network as an example. Figure 1 describes the process of information dynamics in a dynamic social network, which consists of many users denoted as cartoon persons with different colors. In this dynamic artificial network, we take a company as an example to present the process of dynamic community detection based on information dynamics, which comprises the following stages. First, everyone possesses his/her own knowledge as initial information because of distinct occupations (i.e., v = 0.66, u 1 = 0.5), as shown in Figure 1a (see Definition 3). Then, the information spreads through the topological structure of the network (see Equation (4)). For example, user v shares its information with the connected neighbors u 1 − u 4 , as shown in Figure 1b. Over time, the amount of information exchanged between people tends to zero and the information dynamics reaches the steady state. Next, the communities are naturally revealed by computing the different information in the network. Figure 1b shows the community structure detected by the information dynamics model in the time slice T 0 . Building upon the information of communities detected at the time slice T 0 , an incremental community discovery framework is adopted for the subsequent snapshot networks (see Figure 2), which includes adding nodes, deleting nodes, adding edges, and deleting edges events. Figure 1c-f demonstrates that the addition and deletion of nodes and edges may lead to changes in the network structure. For example, the addition of user u 5 causes the two communities to merge into one community, as shown in Figure 1c. Similarly, Figure 1d,e present that the deletion of users u 6 and u 7 results in a split in the community.

Contributions
By imitating the information dynamics, the proposed dynamic community detection algorithm has several attractive benefits, the most important of which are listed as follows: • Effective Dynamic Community Detection: We propose an information dynamics-based framework, which employs the batch processing technique to incrementally uncover community structures in dynamic networks. In addition, we develop a dynamic community detection method DCDID for revealing the communities by simulating the exchange of information between nodes in each time slice of dynamic networks. The DCDID method provides a natural manner to uncover community structure and obtains high-quality communities in each time slice (cf. Figures 7-12). • Parameter-free: The DCDID approach does not require parameter settings and prior knowledge, and it automatically detects the communities through information dynamics driven by the local topological structure of the network. • Scalability: Because of using the batch processing and incremental technology, DCDID only needs to update those communities where the subgraph changed and keep the rest of the subgraph unchanged in each time slice. Thus, DCDID has a low time complexity and can be applied to large-scale dynamic networks. (cf. Section 3.5).
The rest of this paper is organized as follows. In the second section, we introduce the related work. Section 3 gives the information dynamic model and presents the dynamic community detection framework and algorithm in detail. Section 4 describes the experimental results of DCDID on synthetic and real-world networks. Finally, our conclusions are presented in Section 5.

Related Work
In recent years, many methods have been proposed for community detection in dynamic networks. Here, we only summarize some contributions in the most closely related areas from the literature. For more detailed research work of dynamic community detection, please refer to the review literature in [17][18][19].
Incremental-based Methods. The topic of Incremental community detection in dynamic networks has attracted significant attention in recent years. Shang et al. [20] proposed a real-time detection algorithm in dynamic networks, which detects the community structure of the initial network based on the Louvain algorithm, and then employs the strategy of modularity optimization to update the community structure according to the type of the changed edge. Chong et al. [16] presented an incremental batch processing technique, which uses the Louvain method for batch processing of the changed structure. In addition, it pointed out some problems existing in the real-time processing algorithms based on event processing of nodes and edges, such as the impact of the processing sequence on detection results and efficiency. LBTR is an incremental community detection method based on machine learning, which was proposed by Shang et al. [14]. This method classifies and predicts nodes that need to be processed by machine learning methods, and filters unchanged nodes to improve the efficiency of community detection. Takaffoli et al. [21] introduced an Incremental L-metric community detection method, which employs the current and temporal data to identify the communities in each snapshot network.
Evolution-based Methods. Chakrabarti et al. [12] provided a general framework, which simultaneously optimizes two evaluation indicators: the clustering quality of each snapshot network and the difference of the clustering result from one snapshot to the next. Chi et al. [22] proposed an evolutionary spectral clustering that is an extension of evolutionary clustering. FacetNet [23] is a classic and widely used approach for analyzing communities and their evolutions in dynamic networks. It uses the nonnegative matrix factorization to combine communities and their evolutions in a unified way, which is different from traditional methods in that two-stage techniques are handled separately. However, the FacetNet method requires prior knowledge and must specify the number of communities. Kim and Han [24] introduced an evolutionary clustering method based on particles and density (PDEM). This method treats the network as a collection of tiny particles and divides high-quality communities through cost-embedding technology and modularity optimization. Folino and Pizzuti [25] presented a multiobjective genetic algorithm (DYNMOGA) based on an evolutionary clustering framework to identify the community structure in dynamic networks. DYNMOGA maximizes the cost of the current snapshot by optimizing the modularity to achieve good community partitioning and minimizes timing costs by optimizing the normalized mutual information (NMI). DYNMOGA can achieve a better quality of community discovery, but one of its disadvantages is high time complexity.
Optimization-based Methods. Optimization-based approaches handle the community detection by transforming it into an optimization problem, and the optimal solution is obtained by optimizing a predefined objective function. However, this is often an NP-hard problem. Tantipathananandh et al. [26] proposed a framework for community detection in dynamic social networks. In each time step, the social interaction of individuals or organizations is observed in the form of subgraphs. It uses combinatorial optimization algorithms based on dynamic programming, exhaustive search, maximum matching and greedy heuristics to approximate the community structure in the network. One disadvantage of this algorithm is the high time complexity. Niu et al. provided L-DMGA [27], which is a multiobjective optimization genetic algorithm and is based on label propagation dynamics. L-DMGA regards community detection in dynamic networks as a multiobjective optimization problem. Quick Community Adaptation (QCA) [28] is a modularity-based optimization method, and it is also an incremental dynamic community discovery method. QCA uses the Louvain method to perform initial community partitioning and then processes the changed nodes and edges using the objective function in the subsequent snapshot networks. One advantage of the QCA approach is the low time complexity due to the incremental technology it employs. Agarwal et al. [29] introduced a community detection method (DyPerm) based on maximizing permanence in dynamic networks. One disadvantage of the DyPerm method is that it needs to have the initial community structure of the network, but this is unknown in many real networks.
Other Methods Xie et al. [15] developed an incremental community detection method (LabelRankT) based on label propagation in dynamic networks. An advantage of this method is its low time complexity (O(m)), which is linear with the edges m. However, the LabelRankT algorithm has several parameters that are difficult to set [30]. Xin et al. [31] presented an adaptive random walk sampling method (ARWS), which only updates the adjacent nodes of nodes affected by dynamic events. ARWS is used to detect overlapping community structures in dynamic networks. Quiles et al. [32] introduced a dynamic community detection method based on particle dynamics, which regards the nodes of the network as particles, and employs the diffusive equations of motion and the topological properties of the network to construct a dynamic model of the interaction between particles for clustering. This method needs to set the relative strength parameters of attraction between particles.
In summary, many types of dynamic community detection algorithms have been proposed from different perspectives. Each algorithm has its own advantages and disadvantages, such as parameter setting, high time complexity, and detection accuracy decreasing with time slice. Therefore, efficient community detection in dynamic networks is still a challenging task.

Preliminaries
Dynamic community detection is the process of partitioning communities in each snapshot of a continuously changing network. We model a dynamic network as an undirected graph G t = (V t , E t ), where t represents the time step, V t is the set of nodes, and E t is the set of edges. The definition of community detection in dynamic networks is depicted as follows: Let DG = {G 1 , G 2 , ..., G T } be a given dynamic network, where T denotes time steps (1 ≤ t ≤ T). Let P t = {C 1 , C 2 , ..., C k } represent the result of community partition in the snapshot network at time step t. The purpose of dynamic community detection is to find an optimal partition DP = {P 1 , P 2 , ..., P T }. Before elaborating the proposed framework for dynamic community detection, we introduce some basic definitions, which will be used in the following sections. Definition 1. (Jaccard similarity coefficient [33]) Let G t = (V t , E t ) be an undirected network at time steps t. The Jaccard similarity coefficient of two nodes v and u is defined as follows: is the set of adjacent nodes of node u .
In real life, social networks often include strong and weak relationships, which play a significant role in information propagation and community formation. To describe this relationship, we use contact strength to represent the degree of tightness between nodes in a given network. Here, we use triangles to formalize the definition of the contact strength because the triangle structure can better characterize the tightness of the vertices.
be an undirected network at time step t, and the contact strength of vertex v on vertex u is defined as follows: where T u denotes the number of triangles for vertex u, and the intersection between N(u) and N(v) represents the amount of triangles shared by two nodes u and v.
Here, CS uv is an asymmetric function, in other words, the values of CS uv and CS vu may not be equal. For example, everyone has their own circle of friends, and the contact strength between two people is likely to be unequal.
In the real world, the more friends a person has, the more resources he has, so he may obtain more information. To describe the information of nodes in the network, we use the degree of nodes to characterize the initial information of the nodes. Definition 3. (Information) Let G t = (V t , E t ) be an undirected network at time step t, and the information of vertex u is defined as follows: where D u represents the degree of vertex u, and D max denotes the maximum degree of the network G t .

Information Dynamic Model
To reveal the communities in the dynamic networks, we begin to construct the information dynamic model, which includes three parts: information propagation volume, information loss, and propagation model. For a more detailed introduction, please refer to our work in another literature [34].
Propagation Volume. According to the research on the patterns of information diffusion among people in the real world, we assume that everyone can acquire information from their neighbors and propagate information to them. The diffusion of information is greatly dependent on the local topology of a network, such as the degree of a node, similarities and connection strengths of nodes. For example, people prefer to choose to communicate with people with whom they have closer links or common interests. To describe the propagation of information in a more realistic way, we employ the node similarity, connection strength, and information difference to model the amount of information diffusion. Formally, let I u→v represent the information that a node v obtains from its neighbor u, which is defined as follows: where JS uv denotes the Jaccard similarity coefficient between node u and node v, and CS uv represents the contact strength of node v on node u. The coupling function f (·) denotes the information that can be disseminated from the u to v, which is defined as follows: We can see that the nodes with a large information volume are more likely to diffuse and affect the nodes with a small information volume. When the information of I u and I v are close to equal, the amount of information passed between them tends to zero. The loss of information may occur during the information propagation process in the real world. For example, if the information disseminated is familiar or attractive to us, we can understand and spread it more easily. By contrast, owing to environmental factors, people may misunderstand, ignore, or even lose information. To describe the loss of information in a more realistic and accurate manner, we employ the topological features and information volume for its characterization. Formally, we define the loss of information as follows: Information Loss. In the real world, because of the influence of complex environments, loss may occur during information dissemination. As an example, if people are familiar or interested in the information diffused, they may understand and propagate it more easily. Conversely, we may ignore, misunderstand or even lose information. To reflect the loss of information in a more realistic and accurate way, we use the volume of information and the topological features for its characterization. Let I (u→v)_cost denote the loss of information, which is defined as follows: where the first item of the formula characterizes the local topological feature, which consists of the local average similarity and local average degree. I (u→v)_cost is positively correlated with coupling function f (·) and negatively correlated with JS uv . Therefore, the larger the information volume to be diffused, the greater the information loss is, and the more similar the communication objects are, the smaller the information loss is. Information Propagation. Finally, by considering the information volume and the information loss described above, the information dynamics of a node v over time is provided by the following: where I v(t) represents the information of node v at time step t, and the second term of this formula denotes the information that is acquired from its neighbors. As we can see, the information of node v at time step t + 1 includes the information at time step t and the information obtained from its neighbors at time step t + 1. With the evolution of time, the information propagated tends to zero. Eventually, the information in the network will reach equilibrium, and the communities can be uncovered naturally.

Dynamic Community Detection Framework
Building upon the proposed information dynamics model, we construct a dynamic community detection framework. Figure 2 illustrates the rationale of dynamic community detection based on information dynamics, which consists of two parts: initial community detection and incremental community detection. First, the initial communities C 0 are obtained by using the information dynamics model at time slice T 0 , and then the dynamic community detection at time slice T t is performed. Next, we introduce the specific process of dynamic community detection based on information dynamics.

Initial Community Detection.
We employ the information dynamics model to detect the community structure of the whole network at time slice T 0 .
Incremental Community Detection. After the initial community detection using the information dynamics model, an incremental community discovery method is adopted for the subsequent snapshot networks. The process of incremental community detection mainly involves the following steps: (1) Extract Changed Subgraph ∆G t . Unlike static networks, the structure of dynamic networks is evolving over time. Nodes and edges in a dynamic network may occur or disappear over time, which may lead to constant changes in the community structure of the network. Next, we will analyze network events that may cause changes in community structure, including adding nodes, deleting nodes, adding edges, and deleting edges.
(a) Add Nodes. Adding nodes refers to the new nodes added to the current time slice network G t compared with previous time slice network G t−1 . Let AN denotes the set of added nodes, which is defined as follows: where V t and V t−1 represent the set of nodes in the networks G t and G t−1 , respectively. The added nodes can be calculated by solving the difference set of these two sets. What is the impact of adding a new node to the community structure in the network? When the added node is within one community, the original community structure will not change [35,36]. As shown in Figure 3b, adding a node inside one community increases the connection density of the community, and the number of communities has not changed. Therefore, it is only necessary to divide the added nodes into the current community.
Conversely, when the added node is not inside the community, it may cause a change in the community structure, as shown in Figure 3c. In this case, it is necessary to record the added nodes and the connected communities and add them to the subgraph ∆G t . (b) Delete Nodes. The deleted node refers to a node that is removed in the current time slice network G t compared with the previous time slice network G t−1 . Let AN represent the set of deleted nodes, which is given by the following: The deleted nodes can be computed by solving the difference set between sets V t and V t−1 . Figure 4 shows one node deletions within the community and between communities, respectively. We can observe that the deletion of a node within one community or between the communities has caused changes in the structure of the community. Therefore, when deleting a node, we need to add the deleted node and the connected communities to the subgraph ∆G t . (c) Add Edges. Similarly, the added edges correspond to the new edges in the current time slice network G t compared to the previous time slice network G t−1 . Formally, we define the added edges as follows: where E t and E t−1 represent the set of edges in the networks G t and G t−1 , respectively. Figure 5b shows that adding an edge within the community enhances the density of the edges inside it and does not change the structure of the community. Therefore, it is not necessary to deal with the new edges added. However, the addition of edges between communities may lead to changes in community structure. Figure 5c illustrates that the added edges between communities have led to the merger of this two communities into one community. (d) Delete Edges. The deleted edge refers to the edge removed in the current time slice network G t compared to the previous time slice network G t−1 . Let DE denote the set of deleted edges, which is defined as follows: Deleting edges within one community may lead to changes in the community structure. As shown in Figure 6b, deleting the inner edge causes the community to split. Therefore, we need to add the current edge and the community involved to the subgraph ∆G t . In contrast, deleting the links between the communities weakens the connection between them, which does not cause changes of the original communities. Figure 6c shows an edge deleted between communities, and the communities have not changed. Therefore, there is no need to process the deleted edges and related communities. (2) Calculate Changed Communities ∆C t . After obtaining the subgraph ∆G t that may change, we need to redetect the communities in the subgraph. Here, we employ the information dynamics to discover the subgraph ∆G t incrementally and obtain the corresponding community structure ∆C t (cf. Algorithm 3) (3) Compute Unchanged Communities C t−1 . Based on the acquired networks G t−1 and ∆G t , we can calculate all the communities and the communities that may change at the t − 1 time slice. Therefore, the unchanged communities can be obtained by calculating the difference set of the two sets. (4) Compute Communities C t . Communities in the network at time slice t are composed of the unchanged communities at the previous time slice t − 1 and the changed communities at the time slice t. Let C t denote the communities of network G t at time slice t, which is given as follows: where C t−1 represents the unchanged communities at the previous time slice t − 1, and ∆C t denotes the changed communities at the time slice t.
Unlike the general real-time incremental algorithm, the incremental method based on information dynamics proposed in this paper adopts batch processing instead of single event processing. The advantage of this method is that it can improve the detection efficiency of the community. In contrast, when the events are handled one by one, the difference in the order of event processing may result in different detection results, and the detection efficiency of the community is also affected [16].

Dynamic Community Detection Algorithm
In this section, we introduce the dynamic community detection algorithm based on information dynamics (DCDID).
(1) Community Detection based on Information Dynamics. Based on the information dynamics models, we identify the community structure by simulating the interaction of information on the network, which mainly involves several steps. In the beginning, each node is provided initial information in light of the local topology features (cf. Equation 3). Then, the information diffuses in the network and every node is constantly interacting with neighbor nodes. The exchange of information between nodes in the same community is more frequent than that in different communities. At each step, every node updates its information based on the information dynamics models (cf. Equation 7). As time evolves, the exchange of information between nodes tends to zero, and the information dynamics of each node in the network reaches the convergent state. Finally, the amount of information for each node in the same community is basically the same, and the information on each node in different communities is different. Therefore, we can naturally uncover the communities by considering the amount of information for each node. (2) Dynamic Community Detection. DCDID mainly consists of the three steps: initial community structure detection, calculation of subgraphs that have changed, and incremental community identification.
(a) Initial Community Structure Detection. The initial community structure is the community partition of the network at time slice T 0 . There is no prior information about community structure in the initial time slice, so it is necessary to perform community detection on the entire network. We use the community detection based on information dynamics (CDID) to identify the community structure of the initial network at time slice T 0 . The CDID algorithm is given in the appendices (Algorithm 2 in Appendix A). (b) Changed Subgraphs. Considering the operations that may cause changes in the community structure, we divide the events that change the network into four categories: adding nodes, deleting nodes, adding edges, and deleting edges. Algorithm 3-6 in Appendix A show the specific process, and each type of event returns a subgraph that may change. (c) Incremental Community Identification. At present, most incremental dynamic community detection methods adopt the fine-grained processing method, which processes an event when an event is generated. For example, when a node is added to the network, the node is detected. The advantage of this design is that the processing of events is takes place in real time, but the disadvantage is that it increases the computational complexity. Here, we employ a batch-based incremental community detection method. Based on the obtained subgraphs that may change, we employ the information dynamics model to incrementally detect the communities. The DCDID algorithm is given in Algorithm 1.

Complexity Analysis
The time complexity of the DCDID algorithm is mainly composed of two parts: one is the time complexity of the initial community partition, and the other is the time complexity of the incremental community detection.
(1) Initial Community Partition. In the beginning, community detection is required for the entire network, so the time complexity is the time taken by the CDID algorithm. The CDID algorithm consists of three steps: information initialization, information dynamic interaction, and community partition. In the first step, CDID needs to compute the initial information, Jaccard similarity coefficient and contact strength. Thus, the time complexity of information initialization is O(k · n), where k denotes the average degree, and n represents the number of nodes in the network. In the second step, the time complexity of information interaction is O(L · n · k) due to the local interaction strategy, where L denotes the number of iterations. It is typically between 20 and 100. In the third step, the time complexity is O(k · n) because of two loops for finding communities. Thus, the time complexity for the initial community partition is O(L · n · k). (2) Incremental Community Detection. The incremental community detection mainly includes the calculation of the changed subgraph and the incremental partition communities. The calculation of the subgraph consists of adding nodes, deleting nodes, adding edges, and deleting edges. The time complexity of adding nodes is O(|∆V t | · k t ), where ∆V t is the nodes set added and k t is the average degree at time slice t. Similarly, the time complexity of deleting nodes is O(|∆V t | · k t ).
The time complexity of adding edges and deleting edges is O(|∆E t |) because of only one loop. The next step is to detect the communities of the subgraph ∆G that have changed, and the time complexity is O(L · |∆V t | · k t ). Thus, the time complexity for incremental community detection is O(|∆E t | + L · |∆V t | · k t ).
In summary, the time complexity of the DCDID algorithm is divided into the complexity O(L · n · k) at the time slice T 0 and the time complexity O(|∆E t | + L · |∆V t | · k t ) at the time slice T t . In general, |∆V t |,|∆E t | and k t are very small, so the time complexity at the time slice T t is relatively low. Therefore, the DCDID method can be applied to dynamic community detection of large-scale networks.

Experiments
In this section, we evaluate our dynamic community detection method DCDID on real-world and synthetic networks to demonstrate its benefits. To extensively research the performance of DCDID, we compare it with several representative dynamic community detection algorithms. Before the empirical comparison, we briefly introduce the comparison algorithms.
QCA [28] is a modularity optimization algorithm based on Louvain [9]. It adaptively updates and detects new community structures according to the changes of network structure and network information in the previous time slice.
FacetNet [23] is a well-known dynamic community detection algorithm, which uses nonnegative matrix decomposition to analyze the community structure and evolution in dynamic networks and optimizes the quality of detected communities through a loss function. FacetNet requires parameter settings, such as the number of communities.
DYNMOGA [25] is a multiobjective optimization genetic algorithm based on evolutionary clustering. It detects the community structure in dynamic networks by optimizing Modularity and NMI. The DYNMOGA algorithm also requires parameter settings.
DyPerm [29] is an optimization method based on permanence, which is also an incremental dynamic community detection method. DyPerm needs to specify the actual communities at the beginning.
InBatch [16] is an incremental dynamic network community detection algorithm based on the Louvain method. It processes the changed network structure in batches instead of an event-by-event approach.
LBTR [14] is an incremental dynamic community detection method based on machine learning. It also uses the Louvain method to obtain the initial community structure, and then implements the machine learning method for classification, prediction and revision.

Data Description
Synthetic Networks.Although there are many dynamic networks in the real world, we rarely know the ground truth of communities. Therefore, we constructed many synthetic networks with ground truth communities to evaluate algorithms. To obtain a synthetic network that is more similar to a real-world network, we employ the extended LFR [37] model to generate dynamic networks. The extended LFR benchmark can be easily controlled by several parameters, such as the number of time slices, average degree, community size, mixing parameter, and several events that cause the structure of the network to change. Table 1 shows the detailed descriptions of parameters for this benchmark model.

Real-world Data Sets.
To evaluate the effect of the dynamic community detection algorithms more comprehensively, we select several different scales of real-world networks with ground truth. Table 2 shows the statistical properties of each dynamic network, where N denotes the average number of nodes, M represents the average number of edges, k is the average degree, CC is the average clustering coefficient, and S denotes the number of time slices. All these dynamic networks are available at the website of social patterns (http://www.sociopatterns.org/datasets/) and the data repository of citation networks (https://www.aminer.cn/citation). Next, we briefly describe these real-world networks.

High school dynamic contact networks (HSD11):
This data set contains the time series network of contacts between three classes in a high school in Marseille, France, in December 2011. HSD11 includes 7 time slices, and the network of each time slice contains three communities. In HSD11 dynamic network, one node represents a student, and one edge indicates that there is a connection between the students.
2012 High school dynamic contact networks (HSD12): Similar to HSD11, this data set is also a time series network for connections between high school students in Marseille, France. This network was collected in November 2012 and consists of five classes. The other information is consistent with HSD11.
Primary school contact networks (PS): This data set contains a time series network of contacts between children and teachers. Each child or teacher represents a node, and the contact between them represents an edge.

Contact network in a workplace (CW):
This data set is a time series network of contacts between people in an office building in France. The CW dynamic network consists of five departments as real communities and records the contact between people at intervals of 20 seconds.
Cumulative coauthorship network (CC): This data set is a collaboration network derived from the citation network. The data set used in this paper has been collated and modified by reference [38]. In this dynamic network, one node represents an author of one paper, and an edge denotes the relationship between the coauthors of a paper.

Noncumulative coauthorship network (NCC):
This data set is also a collaboration network similar to the CC dynamic network. The CC dynamic network accumulates the changes of each node and edge. In contrast, the NCC dynamic network does not accumulate, i.e., when two authors copublish papers many times, there is still only one link between them.

Evaluation Metrics
At present, the evaluation of the performance of the dynamic community detection is mainly to quantify the goodness of community detection in each time slice, and the evaluation criteria adopted are consistent with community detection in a static network. Here, we employ N MI [39] and ARI [40] metrics to evaluate the performance of the comparison algorithms. N MI and ARI are widely used for quantifying the quality of communities detected by algorithms. Before presenting the experiment, let us briefly describe the evaluation metrics.
Normalized mutual information (N MI) is a well-known metric, which originates from information theory. N MI is widely used to evaluate the result of disjoint community detection, which posits that if two partitions of communities are similar, then little additional information is needed to deduce one division from the other. Formally, the definition of N MI is given as follows: where A and B are the partitions of communities, I(A; B) denotes the mutual information of random variables A and B, and H(A) represent the entropy of A. The value of N MI ranges from 0 to 1, where 0 represents that the detected communities are completely independent of the real communities, whereas 1 denotes a perfect match with the ground truth. Adjusted rand index (ARI) is another measure to evaluate the similarity between two communities, which is defined as follows: where RI denotes the similarity of two partitions, which includes all pairs of samples. Next, it calculates the numbers of pairs that are assigned to the same or different partitions in the predicted and true partitions [41]. More specifically, where a i , b j and n ij are values from the contingency table. For details, please refer to [42].

Performance Evaluation
In these experiments, we employed the parameters of comparison algorithms to the default values suggested by the authors. The DyPerm is available at GitHub (https://github.com/ayush14029/ Dyperm-Code). The codes of LBTR and InBatch are provided by Jiaxing Shang [14]. QCA, FacetNet, DYNMOGA, and DCDID are available at GitHub (https://github.com/sunwww168/DCDID). For each dynamic network, the average result of each algorithm was obtained by averaging 20 independent runs. All experiments were performed on a desktop computer with a 3.3 GHz CPU of an Intel Core i5 and 16.0 GB RAM.
Evaluation on Synthetic Networks. The changes of the community structure in dynamic networks mainly include: the nodes' switching community membership in each time slice, the birth and death of communities, the growth and contraction of communities, and the merger and split of communities. To evaluate the performance of the proposed dynamic community detection algorithm, we employ the dynamic LFR benchmark model to generate multiple synthetic networks with different characteristics. To cover several states of dynamic network community structure changes, we evaluate the comparison algorithms from four aspects: node switch, community birth and death, community expansion and contraction, and community merger and split. Without loss of generality, the common parameters of these states are set as follows: the time slices s = 20, the number of nodes in each time slices n = 1000, the average degree k = [5 − 25], and the max degree maxk = [20 − 50]. The DyPerm algorithm does not take part in the comparison of the initial time slice because of it uses the ground truth as the initial community information.
(1) Node Switch. Node switch refers to the transition of a node from one community to another in different time slices in a dynamic network. The parameter p represents the probability that a node switches community membership in different time slices. We varied the value of p from 0.1 to 0.8 and fixed the parameters k = 10, maxk = 20, and µ = 0.1. Figure 7 shows the performance of comparison algorithms with different p on N MI and ARI metrics. Because of the limitation of space, we only display the results of community detection when p is 0.1, 0.4 and 0.8. In terms of the N MI metric, DCDID and DYNMOGA methods acquired the best effects, and the values of N MI obtained on each time slice were approximately 0.95. It demonstrates that the performances of these two methods are relatively stable over time. FacetNet also performed well, and the N MI values achieved reached to 0.9. However, the FacetNet algorithm needs to specify the number of communities in the network, which is often unknown in the real world. DyPerm also had a stable performance, and the values of N MI were basically maintained at approximately 0.8. QCA, InBatch, and LBTR methods initially implemented very high N MI values because these algorithms use the Louvain method to detect the community structure of the initial time slice. However, the performances of these algorithms gradually declined over time. In particular, the N MI values obtained by the InBatch algorithm were close to 0.2 when p was larger than 0.4. In terms of the ARI metric, DCDID obtained the best effect, which achieved the highest ARI value among these algorithms. DYNMOGA and FacetNet methods acquired better results than the other comparison algorithms. Although DyPerm attained good N MI values, its ARI values were very low. Because the granularity on the community partition of the DyPerm algorithm became increasingly fine with the increase of time slices, the number of communities it partitions was usually several dozen times the number of real communities. The ARI values of QCA, InBatch and LBTR algorithms also decreased with the increase of time slices. In particular, the values of ARI were close to zero when p was larger than 0.4 and the time slice was greater than 10. (2) Community Birth and Death. To investigate the effects of comparison algorithms on the birth and death event of communities, we fixed parameters k = 10, maxk = 20, µ = 0.1, and p = 0.1, and varied the number of birth and death communities to generate dynamic networks. Because the dynamic LFR model cannot generate a dynamic network when the number of birth and death communities both reach 16, we changed the number of birth communities from 2 to 16 and varied the number of death communities from 2 to 8. Figure 8 reveals the performance of each algorithm on N MI and ARI metrics with different numbers of birth and death communities. In terms of the NMI metric, DCDID and DYNMOGA were very stable, and they obtained higher values than the other algorithms. FacetNet cannot run on this dynamic network because the number of communities in each time slice is constantly changing over time. DyPerm also performed well with the N MI values stable at approximately 0.8, but the ARI values were very low. In particular, its values were close to zero when the number of birth and death communities were greater than 8 and the time slice was greater than 12. QCA and LBTR achieved acceptable results, and they also obtained an N MI value of 0.5 when the number of birth communities reached 16. By contrast, InBatch did not perform well in this group of experiments and obtained the lowest values of N MI and ARI.   (4) Community Merger and Split. Figure 10 describes the effectiveness of comparison algorithms when the number of merger and split communities varied from 5 to 40. We can observe that DCDID performed best in these comparison algorithms, and the N MI value of each time slice was stable at approximately 0.96 when the number of merger and split communities was less than 20. DYNMOGA also performed well, which achieved better results of community detection than other comparison algorithms. As shown in Figure 10c, the results of DCDID, DYNMOGA, FacetNet and DyPerm algorithms fluctuated greatly when the number of communities merged and split in dynamic networks reaches 40, i.e., almost all communities in the network have changed. We can see that the N MI and ARI values obtained were lower in the seventh time slice, but the quality of the community detection of these four algorithms was still better than other comparison algorithms. The performance of QCA, InBatch and LBTR algorithms also decreased gradually over time. Interestingly, the N MI values of InBatch had a higher improvement when the number of communities merged and split increased to 40 in the 13th time slice (Figure 10c). We analyzed the network structure of the current time slice and found that the community structure under this time slice was relatively clear, and there were fewer links between the communities. This may also be the reason the N MI values of the DCDID, DYNMOGA and FacetNet algorithms had a peak in this time slice. Real-world Networks. To further compare DCDID and the other algorithms, we tested their performances on several real-world dynamic networks with ground truth. The N MI and ARI indexes are employed to evaluate the efficiency of these algorithms. Figures 11 and 12 show the evaluation results of comparison algorithms on real-world dynamic networks. In the CC and NCC dynamic networks, DCDID and DyPerm achieved the best quality of community detection and obtained N MI values of approximately 0.5. In terms of the ARI metric, most of the comparison algorithms did not perform ideally on these two networks, which may be related to their low-average degree. As shown in Table 2, the average degree of each time slice is 4.3. However, the DCDID method acquired the highest value compared with the other algorithms. It indicates that DCDID can still achieve better quality of community detection in the dynamic networks with a low-average degree. DYNMOGA and FacetNet methods cannot run on these two dynamic networks because the number of nodes in them have reached 100,000. QCA obtained acceptable performance on these two real networks, and the results of community detection are better than that of the InBatch and LBTR methods. On CW, HSD11 and HSD12 dynamic networks, the InBatch algorithm performed very well, where it obtained the highest N MI and ARI values. The DCDID algorithm also achieved good quality of community detection on these three networks, and the results are better than those of the DYNMOGA, QCA and DyPerm algorithms. In the HSD11 and HSD12 dynamic networks, LBTR yielded stable results, similar to DCDID. On the PS dynamic network, DCDID and LBTR obtained the best quality of community detection, and the N MI and ARI values obtained were approximately 0.9 and 0.8, respectively. The QCA method also produced acceptable results on this network. The DyPerm algorithm performance was not ideal, and it failed to identify the communities in CW, HSD11, HSD12 and PS networks. In summary, many experiments on synthetic dynamic networks demonstrated that the DCDID method achieved good quality of community detection under different network events. The evaluation on the real-world dynamic network shows that DCDID cannot only achieve good community recognition on low-average degree networks but also achieve good community detection performance on other dynamic networks with different characteristics. The DyPerm algorithm performed well on most synthetic networks, but its community divisions were too fine-grained, which resulted in too low ARI values. In addition, this algorithm uses ground truth community information as the initial information; however, this is unknown in most real dynamic networks. DYNMOGA and FacetNet methods performed well on generation networks, but their spatial complexity is high, and they cannot be run on real networks CC and NCC. In addition, the FacetNet method requires prior knowledge and must set the number of communities. QCA, InBatch and LBTR had significant cumulative errors on the synthetic dynamic networks, and their performance degraded rapidly over time. However, InBatch and LBTR methods have achieved good quality of community detection in real-world dynamic networks. In particular, the InBatch method achieved the best quality of community detection on CW, HSD11 and HSD12 networks.

Runtime
To evaluate the scalability of DCDID at different network scales, we employed the dynamic LFR model to generate dynamic networks with different sizes. We varied the number of node ranges from 1K to 100M and fixed the parameters at k = [10 − 20], maxk = [20 − 50], p = 0.1, µ = 0.1, and s = 5. Figure 13 reveals the overall running time of the comparison algorithms in five time slices of the dynamic networks. We observed that the DyPerm algorithm used the highest runtime, and the runtime was more than ten days when the number of nodes reaches 5M. The DCDID algorithm was faster than the DyPerm, FaceNet and DYNMOGA methods, and this advantage was obvious as the size of the network increased. This difference was mainly due to the low time complexity of O(|∆E t | + L · |∆V t | · k t ), where |∆V t |,|∆E t | and k t are very small. Therefore, the DCDID algorithm can handle large-scale dynamic networks. When the number of nodes reaches 5M, FaceNet and DYNMOGA methods indicated that there was insufficient memory space to run. Although DCDID was slower than QCA, InBatch and LBTR, its quality of community detection was higher than these three algorithms.

Conclusions
In this paper, we proposed an information dynamics-based dynamic community detection framework, which uses a batch processing technique to incrementally uncover community structures in dynamic networks. In addition, we designed a new algorithm called DCDID and used the information dynamics model to simulate the exchange of information between nodes in each time slice of dynamic networks. DCDID provides an intuitive and topologically driven manner for community detection in dynamic networks. To validate the performance of DCDID, we compared it with six representative dynamic community detection algorithms on synthetic and real-world datasets.
Experiments demonstrate that DCDID performed well at dynamic community detection and obtained better results than the representative methods compared in the evaluation.

Conflicts of Interest:
The authors declare no conflict of interest.