Identifying and Ranking Influential Nodes in Complex Networks Based on Dynamic Node Strength

Abstract: Identifying and ranking node influence in complex networks is an important problem. It helps in understanding the dynamics of spreading processes and in designing efficient strategies to hinder or accelerate information spreading. Ranking node influence by decomposing the network is widely adopted because of its low computational complexity. In methods of this type, decomposition is a dynamic process, and each iteration can be regarded as an inverse process of spreading. In this paper, we propose a new ranking method based on network decomposition, Dynamic Node Strength Decomposition. Spreading paths are distinguished by weighting the edges according to the nodes at both ends, and the change of local structure during decomposition is taken into account. Our experimental results on four real networks of different sizes show that the proposed method generates a more monotonic ranking list and identifies node influence more effectively.


Introduction
Spreading phenomena occur in various fields, including physics [1], ecology [2] and biology [3], and are appropriately studied in the framework of complex networks. Identifying and ranking the influential spreaders in a complex network is significant for understanding the spreading mechanism. Many works focus on how to measure the influence of nodes in a sufficiently large and complex network [4][5][6][7][8][9][10][11]. There are numerous classic topology metrics, including degree centrality [12], betweenness centrality [13], closeness centrality [14] and Katz centrality [15], which can be adopted to distinguish which nodes are more important. Nevertheless, methods for ranking influential nodes continue to be improved.
In recent years, various efficient ranking methods have been proposed to identify influential nodes in complex networks, such as LeaderRank [16], PageRank [17], semi-local centrality [18], TOPSIS [19,20] and HybridRank [21]. Among these methods, the k-shell method proposed by Kitsak et al. [22] is the most well-known. They found that the location of a spreader has a great impact on spreading, and that the most influential spreaders are often located in the core of the network. Based on this feature, they proposed the k-shell method, which decomposes the network according to degree. In this method, each node is assigned an index k_s, and nodes with large k_s are considered influential. Decomposing networks with the classical k-shell method has an obvious drawback: too many nodes are assigned to the same rank, which leads to a coarse ranking. This is caused by several factors. First, the only criterion for removing a node is its degree, which does not take the local influence of its neighbors into account [23]. Second, the process of removing nodes is recursive, and the order in which nodes are removed affects the ranking result. Third, the global characteristics of the network are insufficiently considered [24].
Ranking influential spreaders by decomposing the network is desirable because this kind of method generally has low computational complexity. The process of decomposing a network can be regarded, in some sense, as an inverse process of spreading [25]. Joonhyun et al. [26] argued that the locations of a node's neighbors should not be ignored and proposed a novel decomposition-based method, neighborhood coreness centrality (C_nc), which evaluates the spreading capability of a node by considering the coreness of its neighbors. Zeng et al. [25] observed that only the residual edges are considered when decomposing the network, whereas the information contained in the exhausted nodes and edges is ignored, and proposed the mixed degree decomposition (MDD), which considers both the residual degree and the exhausted degree. Liu et al. [27] proposed a method that generates a more distinguishable ranking list based on the distance from the target node to the core of the network. Wang et al. [28] used neighborhood attributes and the entropy method to weight the node position based on iteration information in the k-shell decomposition. Overall, decomposing networks is an advisable way to identify node influence.
Based on decomposing networks dynamically, we propose a method named Dynamic Node Strength Decomposition (DNSD) to identify and rank node influence, in which both the differences between edges and the influence of the decomposition order on the node ranking are taken into account. To evaluate the effectiveness of the proposed method, we apply the Susceptible-Infected-Recovered (SIR) model to simulate the spreading process in four real networks and measure Kendall's τ between the ranking and the spreading capacity of nodes. Experimental results show that our method has a clear advantage in resolution ratio and is more effective in identifying influential spreaders than other methods.

Methods
Decomposing a network to identify node influence is known as the k-shell method [22], which partitions a network into hierarchical sub-structures related to centrality. After decomposition, each node is assigned an index, k_s, which represents the location of the node in the network. Nodes with high values of k_s are located at the center of the network, and nodes with low values of k_s at the periphery. The innermost nodes constitute the core of the network, while the other nodes constitute external layers (k-shells). An example network is given to illustrate how the k-shell method works (see Figure 1). First, the nodes with degree k = 1 are removed. This process is repeated recursively until the degree of every remaining node is greater than 1, and the removed nodes are assigned k_s = 1. Then, nodes with k = 2 are removed recursively until the degree of every node remaining in the network is greater than 2. The nodes removed in this process are assigned k_s = 2. This routine runs until every node of the network has been assigned a k_s. Finally, nodes with the same k_s are assigned to the corresponding k-shell.
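As an illustration of the procedure just described, the following is a minimal Python sketch of k-shell decomposition. It assumes an undirected, unweighted network supplied as a list of edges; the function name `k_shell` and the input format are choices made for this sketch and are not taken from [22].

```python
from collections import defaultdict


def k_shell(edges):
    """Return {node: k_s} for an undirected simple graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    k_s = {}
    k = 1
    while remaining:
        # Recursively strip every node whose residual degree is at most k.
        peel = [n for n in remaining if degree[n] <= k]
        while peel:
            for n in peel:
                remaining.discard(n)
                k_s[n] = k
                for m in adj[n]:
                    if m in remaining:
                        degree[m] -= 1
            peel = [n for n in remaining if degree[n] <= k]
        k += 1
    return k_s
```

For a triangle with one pendant node, `k_shell([(1, 2), (2, 3), (1, 3), (1, 4)])` places the pendant node 4 in the 1-shell and the three triangle nodes in the 2-shell.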
The Mixed Degree Decomposition (MDD) method notes that the dynamic process of decomposing the network influences the final result [25]. At each decomposition step, some nodes are removed together with the edges linked to them, which changes the degree of the remaining nodes. MDD defines an index named the mixed degree, which is the weighted sum of the residual degree (number of edges linked to the remaining nodes) and the exhausted degree (number of edges linked to the removed nodes). Nodes are removed according to their mixed degree in each step of the MDD procedure. The result of applying the MDD method to the sample network in Figure 1 is shown in Table 1. The idea of considering the influence of removed edges is also adopted in our method; in addition, our method weights the edges based on the nodes at both ends.
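The mixed-degree idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the mixed degree is taken to be the residual degree plus a coefficient times the exhausted degree, the coefficient is given the hypothetical name `lam`, and nodes are peeled shell by shell in the same way as in k-shell. The details of [25] may differ.

```python
from collections import defaultdict


def mixed_degree_decomposition(edges, lam=0.7):
    """Return {node: shell value} using mixed degree = residual + lam * exhausted.

    `lam` is an illustrative name and default, not a value taken from the text.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    shell = {}

    def mixed_degree(n):
        residual = sum(1 for m in adj[n] if m in remaining)
        exhausted = len(adj[n]) - residual
        return residual + lam * exhausted

    eps = 1e-9  # tolerance for floating-point comparisons
    while remaining:
        level = min(mixed_degree(n) for n in remaining)
        while True:
            # Mixed degrees are recomputed after every removal round.
            peel = [n for n in remaining if mixed_degree(n) <= level + eps]
            if not peel:
                break
            for n in peel:
                shell[n] = level
            remaining.difference_update(peel)
    return shell
```

With `lam=0` the removed edges are ignored and the sketch reduces to k-shell; with `lam=1` the mixed degree never changes, so each node's shell value equals its degree.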
The neighborhood coreness centrality method (C_nc) considers that the k_s of each node generated by the k-shell method reflects location information, and that a spreader with more neighbors located in the core of the network is more influential [26]. C_nc first applies the k-shell method to obtain the k_s of each node. The neighborhood coreness centrality of the target node is then the sum of its neighbors' k_s, and the ranking list of node influence is generated from C_nc. An improved variant, the extended neighborhood coreness C_nc+, uses the k_s of neighbors and two-step neighbors as the metric of node influence.
Table 1. The ranking lists of the sample network in Figure 1 measured by different methods. α is set to 0.5 and β is set to 0 for simplicity. Ranks greater than 12 are not shown.
Inspired by these methods, we propose our method, Dynamic Node Strength Decomposition (DNSD). This method also applies the idea of decomposing the network dynamically, but it is based on an alternative criterion for removing nodes. In many works, the edges in unweighted networks are treated equally. This means that different nodes differ only in their number of edges, not in the features of those edges. If there is only one edge between two nodes that each have a large number of edges, this edge is obviously vital, and in this case it is unreasonable to treat it the same as other edges. Edges are spreading paths, and they play different roles in the spreading process: information can spread more widely through some specific paths. Thus, the importance of different edges should be distinguished, and it depends on the nodes connected by the edges. To reflect this potential difference, we define the edge weight in Equation (1).

According to Equation (1), if an edge connects two nodes with large degree, it is assigned a large weight.
In the process of decomposing the network, the local characteristics of the network change at each step as nodes are removed [29,30]. This means that the order of node removal affects the final result. The DNSD method takes this effect into account. We define the edges linked to nodes that have been removed as vanishing edges and the edges linked to existing nodes as existing edges. The dynamic weight of node i in the decomposition process, w^d_ij, is composed of the existing weight, w^e_ij, and the vanishing weight, w^v_ij, combined as in Equation (2), where α is a tunable parameter in the range 0 to 1. If α is set to 1, the dynamic weight is equal to the normal weight. If α is set to 0, the vanishing weight is not considered. We then formulate the decomposition criterion by taking the edge weights and two-step weights into account. Each node is assigned a node strength, S_i, by Equation (3), where γ(i) and γ(j) are the sets of neighbors of node i and node j, respectively, and β is the attenuation coefficient in the range between 0 and 1. The farther an edge is from the target node, the smaller the coefficient applied to it, because the effect of information spreading in the network is gradually attenuated. The measure of node strength considers not only the weights of the node's own edges but also those of its neighbors. The wider the neighborhood considered, the more accurately the spreading outcome of a node is predicted. However, the computational cost increases as more neighbors are considered [23,31], so there is a trade-off between efficiency and accuracy. The detailed decomposition according to DNSD proceeds as follows:
Step 1. Initially, calculate the S_i of each node in the network. At this moment, w^d_ij of each node is equal to w^e_ij because no node has been removed yet.
Step 2. Remove the nodes with the lowest S_i (denoted as S), and assign these nodes to the S-shell. Update k^d of each remaining node by k^d_i = k^e_i + α · k^v_i. Then, calculate the S_i of each node in the network again. Remove all the nodes with S_i lower than or equal to S and assign them to the S-shell too. Repeat this routine until the S_i of every remaining node in the network is greater than S.
Step 3. Execute Step 2 recursively until each node has been placed into one S-shell. The nodes removed in the final decomposition are placed in the S-shell according to their S_i in the last decomposition.
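The degree update in Step 2 can be illustrated with a short Python sketch. It covers only the update k^d_i = k^e_i + α · k^v_i, not the full DNSD procedure, and it assumes that k^e_i counts the edges of node i to nodes still present while k^v_i counts its edges to nodes already removed. The function name `dynamic_degree` and the adjacency-dictionary input are hypothetical choices for this sketch.

```python
def dynamic_degree(adj, removed, alpha=0.5):
    """Compute k^d_i = k^e_i + alpha * k^v_i for every node not yet removed.

    adj     : dict mapping each node to the set of its neighbors (full network)
    removed : set of nodes already removed by the decomposition
    alpha   : weight of the vanishing degree, between 0 and 1
    """
    k_d = {}
    for node, neighbors in adj.items():
        if node in removed:
            continue
        k_v = sum(1 for m in neighbors if m in removed)  # vanishing edges
        k_e = len(neighbors) - k_v                       # existing edges
        k_d[node] = k_e + alpha * k_v
    return k_d
```

For example, with `adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}`, `removed = {4}` and `alpha = 0.5`, node 1 keeps two existing edges and has one vanishing edge, so its dynamic degree is 2.5, while nodes 2 and 3 keep a dynamic degree of 2.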
Through the above procedure, the DNSD method ranks nodes according to S and assigns them to the corresponding S-shell. The most influential nodes are those assigned the largest S. To illustrate the procedure of DNSD in detail, an example is shown in Figure 2, in which α in Equation (2) is set to 0.5 and β in Equation (3) is set to 0 for simplicity. To further improve the accuracy of DNSD, the S of the target node and of its neighbors are also considered. This is based on the idea that crucial nodes are not only located at crucial positions in the network but also have more edges to other nodes located at crucial positions [24,32]. Accordingly, an improved node strength is defined from the S of node i and the S of the nodes in γ(i), the set of neighbors of node i. The improved method is called DNSD+.
We briefly compare our methods with several other methods on the sample network of Figure 1. The results are shown in Table 1. Many nodes share the same rank under the first three methods, among which the k-shell method divides the nodes into only three ranks. DNSD and DNSD+ distinguish nodes more precisely, even though several ranks greater than 12 are not displayed.

Experimental Results
In this section, we compare the performance of our methods with state-of-the-art methods on four real networks of different sizes: Zachary's Karate Club [33], Email-Eucore [34], Co-authors [35] and US Power Grid [36]. The Susceptible-Infected-Recovered (SIR) model [37][38][39][40] is applied to simulate the spreading influence of nodes. The resolution ratio and effectiveness of all methods are computed.

Evaluation Methodologies
First, we apply the monotonicity index, M(R), from [26] to evaluate the resolution ratio of different ranking methods, as follows:
$$M(R) = \left[ 1 - \frac{\sum_{r \in R} n_r (n_r - 1)}{n (n - 1)} \right]^2$$
where R is the ranking vector produced by a ranking method, n is the size of R and n_r is the number of nodes tied at the same rank r. This index quantifies the proportion of ties in R. M(R) is 0 if all nodes are assigned to one rank, and it is 1 if each node is assigned to a different rank. The closer M(R) is to 1, the better the resolution ratio of the ranking method.
We remark that there are various spreading processes on networks, such as epidemic, rumor and information spreading. Previous works have shown that network decomposition is suitable for the epidemic spreading process [31,40]. In this article, we apply the SIR model to simulate the spreading capacity of nodes. In the SIR model, there are three kinds of individuals: (S) susceptible individuals, who can be infected; (I) infected individuals, who can infect susceptible individuals; and (R) recovered individuals, who are immune to the disease and cannot infect others. Infected individuals recover with probability η. Initially, the spreading process starts with only one infected seed individual, and all other individuals are susceptible. At each time step, each infected individual attempts to infect each of its neighbors with probability θ and then recovers with probability η. The epidemic spreading process terminates when there is no infected individual left in the network and the disease cannot spread any further. In this paper, the recovery probability η is set to 1. We take the average size of the recovered population, µ(i), over a sufficiently large number of simulations, set to 1000, as the indicator of the influence of node i.
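The SIR simulation described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the experiments: the function names `sir_outbreak_size` and `spreading_capacity`, the adjacency-dictionary input and the fixed random seed are assumptions made for this sketch.

```python
import random


def sir_outbreak_size(adj, seed, theta, eta=1.0, rng=None):
    """Run one SIR outbreak from `seed` and return the number of recovered nodes.

    adj   : dict mapping each node to an iterable of its neighbors
    theta : infection probability per infected-susceptible contact
    eta   : recovery probability per time step (must be > 0 to terminate)
    """
    rng = rng or random.Random()
    infected, recovered = {seed}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < theta:
                    newly_infected.add(v)
        # After its infection attempts, each infected node recovers with probability eta.
        just_recovered = {u for u in infected if rng.random() < eta}
        recovered |= just_recovered
        infected = (infected - just_recovered) | newly_infected
    return len(recovered)


def spreading_capacity(adj, seed, theta, runs=1000, rng_seed=0):
    """Estimate mu(seed) as the mean recovered population over `runs` outbreaks."""
    rng = random.Random(rng_seed)
    total = sum(sir_outbreak_size(adj, seed, theta, rng=rng) for _ in range(runs))
    return total / runs
```

Calling `spreading_capacity(adj, i, theta)` for every node i gives the values µ(i) against which a ranking can be compared. With `theta = 0` only the seed is ever infected, and with `theta = 1` the whole connected component of the seed is reached.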
To evaluate the effectiveness of the ranking methods, the rankings they produce are compared with the spreading capacity simulated by the SIR model. Kendall's τ [41] is adopted to measure the correlation between the two ranking lists. The spreading capacity ranking list, R_1, simulated by the SIR model is denoted as X, and the ranking list, R_2, produced by a ranking method is denoted as Y. Both lists have n elements. Let (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n) be the sequence of paired values. For any pair (x_i, y_i) and (x_j, y_j), if x_i > x_j and y_i > y_j, or if x_i < x_j and y_i < y_j, the pair is said to be concordant; if x_i > x_j and y_i < y_j, or if x_i < x_j and y_i > y_j, the pair is said to be discordant; and if x_i = x_j or y_i = y_j, the pair is neither concordant nor discordant. The Kendall's τ of R_1 and R_2 is defined as:
$$\tau = \frac{n_c - n_d}{0.5\, n (n - 1)}$$
where n_c and n_d are the numbers of concordant and discordant pairs, respectively. The higher τ is, the more accurate the method is.
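The pair-counting definition above translates directly into a short Python sketch. It assumes the usual normalization by the total number of pairs, 0.5 n (n - 1), with tied pairs counted as neither concordant nor discordant; the function name `kendall_tau` is chosen for this sketch.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau from concordant and discordant pair counts.

    x, y : equal-length sequences of numeric scores or ranks for the same nodes.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two elements are required")

    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            n_c += 1      # concordant pair
        elif sign < 0:
            n_d += 1      # discordant pair
        # sign == 0: tie in x or y, counted as neither
    return (n_c - n_d) / (0.5 * n * (n - 1))
```

This direct version takes O(n^2) time, which is adequate for a sketch; library implementations with different tie handling may return slightly different values when many ties are present.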

Applications to the Real Networks
The monotonicity M of the different methods is summarized in Table 2, where the basic characteristics of the real networks, namely the numbers of nodes and edges, are shown in the first two columns. The value of α is set to 0.5, because our methods perform best when α is between 0.45 and 0.6, and the gain from a better choice within this range is only about 1%. Meanwhile, β is set to 0; its effect is examined in subsequent experiments, although this setting is already enough to show the superiority of our methods in monotonicity. Note that the monotonicity indexes of DNSD and DNSD+ exceed 0.9 in all four networks and are higher than those of the other four methods. C_nc is also competitive, but there is still a gap compared with our methods. The performance of degree centrality (DC) is not much different from that of MDD, because the two are equivalent when the coefficient of the exhausted degree in MDD equals 1. The performance of k-shell is the worst in every case, especially in Power Grid and Karate Club. This result shows that the ranking lists generated by our methods have strong monotonicity.
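Monotonicity values of this kind can be computed with a few lines of Python. The sketch below assumes the index of [26] in its usual form, M(R) = [1 - Σ_r n_r (n_r - 1) / (n (n - 1))]^2, where n_r is the number of nodes tied at rank r; the function name `monotonicity` is chosen for this sketch.

```python
from collections import Counter


def monotonicity(ranks):
    """Monotonicity index M(R) of a ranking vector.

    ranks : sequence with one rank (or score) per node; equal entries are ties.
    """
    n = len(ranks)
    if n < 2:
        raise ValueError("at least two nodes are required")
    # Each rank r held by n_r nodes contributes n_r * (n_r - 1) tied ordered pairs.
    tied_pairs = sum(c * (c - 1) for c in Counter(ranks).values())
    return (1 - tied_pairs / (n * (n - 1))) ** 2
```

A ranking in which every node has a distinct rank gives 1, a ranking in which all nodes share one rank gives 0, and `monotonicity([1, 1, 2, 3])` gives (5/6)^2, roughly 0.694.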
To show the advantage of our methods in resolution ratio more clearly, we plot the complementary cumulative distribution functions (CCDF) of the rankings, as shown in Figure 3. Note that the curves of DNSD and DNSD+ always descend slowly, which means that our methods distribute nodes over more ranks than the other methods. Specifically, DNSD generates 27 ranks in Karate Club, 136 ranks in Email, 141 ranks in Co-authors and 102 ranks in Power Grid, while DNSD+ generates 27 ranks in Karate Club, 153 ranks in Email, 362 ranks in Co-authors and 391 ranks in Power Grid. DNSD+ distinguishes nodes much better than DNSD because it considers the location of neighbor nodes. This reflects that influential nodes tend to have neighbors close to the core of the network. The curve of k-shell descends quickly because the number of ranks obtained by the k-shell method is limited while the number of nodes in each rank is quite large, which indicates that node influence is not well identified by the k-shell method. The performance of MDD in Email, Co-authors and Power Grid is much better than that of the other methods, except ours.
Next, we investigate the effectiveness of the different ranking methods by comparing the ranking results with the spreading capacity µ(i) obtained from the SIR simulation, as shown in Figure 4. θ is set to 1/<k>, where <k> denotes the average degree of the network. For each network, two comparison methods and the proposed methods are shown. The number of nodes in Karate Club is small, and the difference among methods is not obvious. In Email, µ(i) fluctuates greatly at each rank for C_nc and k-shell. The correlation between µ(i) and the ranking measured by DNSD is much higher, and DNSD+ is even better. In Co-authors, the points concentrate at low ranks for the methods shown. Under DC and k-shell, some nodes with low ranks have high µ(i), and a few nodes with high ranks have below-average µ(i).
Note that the distribution of the points is very diffuse for DC and MDD in Power Grid. In contrast, for DNSD and DNSD+ the points gather along a trend that slants upward to the right. There is a long-tail phenomenon in panel (p), which shows that DNSD+ does not perform very well at high ranks in Power Grid. The reason may be that the structure of Power Grid is simple. Overall, our methods outperform the other methods: the rankings by DNSD and DNSD+ are highly correlated with the spreading capacity µ(i) obtained by SIR simulation in most cases.
Based on the above observations, we measure the Kendall's τ of these methods, as shown in Figure 5, where θ is between 0.06 and 0.2 with an interval of 0.1. If the infection probability is large enough, the infection will cover almost the whole network, and the spreading origin is then no longer important. In Figure 5a-d, the α in Equation (2) is set to 0.5. The results of DNSD and DNSD+ for α between 0.3 and 0.7 with an interval of 0.1 are shown in Figure 5e-l. Note that τ is very volatile in Karate Club, because homogeneous nodes can receive very different ranks when many measurement factors are considered. When α is set to 0.5 or 0.6, DNSD and DNSD+ perform better than with other values of α in the corresponding network. We compare DNSD and DNSD+ with the other methods in each network when α is 0.5. One can observe that the superiority of our methods is not obvious when the infection probability is lower than 0.1, because the infection can hardly spread far when the infection probability is low. However, the τ of our methods exceeds 0.8 in most cases as the infection probability increases. Although the k-shell method is considered able to identify influential nodes, our experimental results show that there is only a weak correlation between the k_s measured by k-shell and the influence of spreaders. These results show that our methods are more effective for identifying node influence.
In the above experiments, β is set to 0 for simplicity, which means that the influence of two-step neighbors is not considered. Here, we vary β from 0 to 1 in steps of 0.1 and again measure the Kendall's τ between the rankings obtained by our methods and the node influence simulated by the SIR model. The values of τ under different infection probabilities and values of β are shown in Figure 6 in the form of heat maps, where Figure 6a-d shows the results for DNSD and Figure 6e-h those for DNSD+. Note that the influence of β in Karate Club is not obvious (there is no obvious concentration of red spots) because of the small network size. In Email, DNSD performs better when β is close to 0.6 (the areas in the middle right part of the heat map are almost red), while DNSD+ performs better with β between 0.3 and 0.6. In Co-authors, τ is highest when β is set between 0.4 and 0.6 for our methods. In Power Grid, DNSD is not sensitive to β and improves slightly when β is close to 0.3. For DNSD+, β close to 0.7 outperforms other values. One can observe that τ becomes higher (the color of the areas changes from cyan to orange) as β increases in most cases. However, a larger β does not always give a higher τ. The network structure and the network scale clearly affect the role of neighbors. When the network is small, such as Karate Club, giving too much consideration to neighbors may cause nodes with similar spreading capability to be assigned very different ranks. When the network structure is simple, such as Power Grid, there are many nodes with similar local structure, and it is not enough to take only the close neighbors into account. Meanwhile, considering too many neighbors requires a lot of computation, so there is a trade-off between efficiency and accuracy. Nevertheless, neighbors deserve consideration in our methods, because the methods are based on decomposing the network, and in methods of this type each iteration of the dynamic decomposition process can be regarded as an inverse process of spreading. In theory, the more neighbors are considered, the more accurately the spreading can be predicted.

Conclusions
In this paper, we propose an efficient ranking method based on decomposing networks, Dynamic Node Strength Decomposition, to identify and rank node influence in complex networks. The effect of the decomposition process on the ranking result is reflected by the dynamic degree, which combines the degree that has vanished during decomposition and the existing degree [25]. The spreading paths are distinguished by the dynamic degree of the nodes at both ends of an edge, and the dynamic node strength is composed of the edge weights of a node and of its neighbors. Our methods have better monotonicity than degree centrality, k-shell, MDD and C_nc according to the monotonicity index. We evaluated the effectiveness of the ranking methods by the Kendall's τ between the ranking obtained by each method and the infected scale in the SIR simulation. The experimental results show that the ranking results of our method are highly correlated with the spreading capacity simulated by SIR. We can draw the conclusion that the proposed methods are better than the degree centrality, k-shell, MDD and C_nc methods in specific types of networks, such as email, co-author and power grid networks [31], when SIR simulation is applied.
Our methods still have room for improvement. Some works show that the influence of nodes could be affected by the size of their community [42,43]. As long as the neighborhood considered is wide enough, there is not much deviation between the rank measured by our method and the real influence of a node. However, this is not advisable because of the large amount of computation. Moreover, our method cannot deal with finding a set of influential spreaders in complex networks [44,45], where the node influence, the distance between nodes in the influential set and the size of the influential set need to be considered. Therefore, more effective and efficient improvements will be sought in further investigation.