Identifying Node Importance in a Complex Network Based on Node Bridging Feature

Abstract: Identifying node importance in complex networks is of great significance for improving network damage resistance and robustness. In the era of big data, networks are huge and their structure tends to change dynamically over time. Because of their high complexity, algorithms based on global network information are not suitable for analyzing large-scale networks. Taking into account the bridging feature of nodes in the local network, this paper proposes a simple and efficient ranking algorithm to identify node importance in complex networks. In the algorithm, the more node pairs whose shortest paths pass through the target node and the fewer shortest paths in its neighborhood, the more obvious the bridging function of the node between its neighborhood nodes, and the higher its ranking score. The algorithm uses only local information about the target node, which greatly improves its efficiency. Experiments on real and synthetic networks show that the proposed algorithm is more effective than benchmark algorithms on the evaluation criteria of the maximum connectivity coefficient and the decline rate of network efficiency, in both static and dynamic attack modes. The advantage is most obvious in the initial stage of attack, which makes the proposed algorithm applicable when the network attack cost is limited.


Introduction
In recent years, network science research has attracted a great amount of attention from researchers in different fields including physics, mathematics, chemistry, medical science, biology, computer science, sociology and so on [1][2][3][4][5][6][7][8][9]. In particular, the vulnerability of complex networks is one of the most important research directions because of its considerable effect on network cascade failure caused by random failure and deliberate attack [10][11][12][13][14][15][16][17][18]. Random failure can be regarded as a simple abstraction of successive errors in complex networks, in which nodes or edges are destroyed with uniform probability. Deliberate attack means that the network nodes or edges are attacked in descending order of their importance, under the premise that the attacker has global information about the network [19][20][21][22][23]. On 14 August 2003, a large-scale power cascade failure in the northeastern United States and eastern Canada caused global concern. Similarly, in early 2008, severe ice sheet disasters in southern China damaged major transmission lines and key towers and caused large-scale blackouts. These examples indicate that the failure of important nodes may result in severe damage to the whole network. One of the crucial questions in protecting networks from cascading failures is how to design an efficient method to identify important nodes and adopt protective strategies. The important nodes in the network

• The paper presents a node bridging feature in complex networks: the greater the number of node pairs whose shortest paths pass through the target node and the smaller the number of shortest paths in its neighborhood, the more significant the bridging function and structural importance of the node;
• We propose a novel node importance identification algorithm based on the node bridging feature, which needs only local information about the target nodes, making the algorithm applicable in large-scale networks;
• Through comprehensive experiments on real and synthetic datasets, the proposed algorithm is shown to outperform five benchmark algorithms on the evaluation criteria of the maximum connectivity coefficient and the decline rate of network efficiency, under both static and dynamic attack strategies;
• The advantage of the proposed algorithm is more obvious when a small number of important nodes are removed, which makes the algorithm applicable when the network attack cost is limited.
The outline of this paper is as follows. In Section 2, we present our method to identify node importance and introduce the benchmark algorithms, evaluation criteria and datasets. In Section 3, we present simulation analyses on real and synthetic networks under static and dynamic attacks. Finally, a summary and some conclusions are given in Section 4.

The Proposed Method
Consider an undirected and unweighted network G(N, E) composed of N nodes and E edges. The network is represented by the adjacency matrix $A = (a_{ij})_{N \times N}$, where $a_{ij} = 1$ if node i and node j are connected and $a_{ij} = 0$ otherwise.
From the perspective of network robustness and invulnerability, when a node fails because of a deliberate attack and the node lies on the shortest path between a pair of its neighbor nodes, the connection of that neighbor node pair is affected. As shown in Figure 1a, node c builds a bridge between node a and node b. Therefore, removing node c will disconnect nodes a and b. In Figure 1b, nodes a, b and c are connected to each other, so node c no longer acts as an intermediary. In this situation, even if node c is removed, nodes a and b can still maintain an effective connection. In addition, as shown in Figure 1c, nodes a and b can communicate with each other through c, but other short communication paths such as a ↔ d ↔ b also exist in the neighbourhood. In this case, the bridging function of node c is weakened. Furthermore, we study the situation in which there is a connection between the node's one-hop neighbors and two-hop neighbors. Figure 1d-f show several ways of direct or indirect contact between nodes a and b. In Figure 1d, when there is only one shortest path between nodes a and b, the bridging feature of node c is obvious. In Figure 1e and Figure 1f, when the shortest path length between nodes a and b is less than 3, nodes a and b still keep effective contact even if node c is removed, because in this case there is at least one other path between them. Similar to Figure 1c, Figure 1f shows that when there exist two shortest paths of length 3, a ↔ c ↔ d ↔ b and a ↔ e ↔ f ↔ b, the bridging function of node c is also weakened. In this paper, we take into account only the target node's neighbors within two hops. Based on the above observations, we propose a novel method that uses the bridging feature of nodes to identify node importance in complex networks.
For a node in the network, the greater the number of node pairs whose shortest paths pass through the target node, and the smaller the number of shortest paths between its neighbor node pairs, the more significant the bridging function and structural importance of the node. The importance score of node i computed by the proposed algorithm, NBF(i), is the sum of two parts. The first part is built from $P^{(2)}_{mn}$, the number of shortest paths of length two hops between nodes m and n, and the second from $P^{(3)}_{xy}$, the number of shortest paths of length three hops between nodes x and y; $\Gamma_1(i)$ and $\Gamma_2(i)$ denote the sets of neighbors one hop and two hops away from node i, respectively. If the two-hop shortest paths of a neighbor node pair (such as m and n) do not pass through the target node i, the contribution of that pair to the first part is defined as 0. Similarly, if the three-hop shortest paths of a neighbor node pair do not pass through the target node i, the second part of the formula is also 0. Therefore, when any two nodes in the neighborhood are connected, the NBF value of the target node i is 0.
Taking node c in Figure 1g as an example, we show the calculation process of the algorithm. The number of two-hop shortest paths between the node pair a and d is 1, and the first part of the NBF score evaluates to 0.5. For the two-hop neighbor nodes e and b of node c, we count the three-hop shortest paths between them and nodes a and d, respectively. Only the node pair a and b and the node pair d and e contribute to the ranking score of node c, and the second part of the NBF score evaluates to 0.333. Therefore, the importance of node c in Figure 1g is NBF(c) = 0.5 + 0.333 = 0.833.
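The neighborhood sets $\Gamma_1(i)$ and $\Gamma_2(i)$ used by the method can be obtained with a breadth-first search truncated at depth two. The following is a minimal sketch under stated assumptions: the function name `neighborhoods`, the dictionary-of-sets graph representation and the toy graph are illustrative choices, not taken from the paper, and the sketch does not compute the NBF score itself.

```python
from collections import deque


def neighborhoods(adj, i):
    """Return (gamma1, gamma2): nodes exactly one hop and two hops from i.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == 2:
            continue  # only local information is needed: stop at two hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    gamma1 = {v for v, d in dist.items() if d == 1}
    gamma2 = {v for v, d in dist.items() if d == 2}
    return gamma1, gamma2


# Hypothetical toy graph (not Figure 1g): a - c, b - c, c - d, d - e.
toy = {
    "a": {"c"},
    "b": {"c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
g1, g2 = neighborhoods(toy, "c")
```

Because the search never expands nodes at distance two, the cost for one target node depends only on the size of its two-hop neighborhood and not on the size of the whole network.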

Benchmark Methods
We here introduce the benchmark algorithms compared in this paper, including degree centrality, k-shell algorithm, WL algorithm, ego betweenness centrality and LLS algorithm.

1. Degree centrality
Degree centrality is a very simple ranking algorithm. The node degree $k_i$ represents the number of neighbours of node i, namely
$$k_i = \sum_{j} a_{ij}.$$

2. K-shell algorithm
The k-shell decomposition method is implemented as follows. First, repeatedly remove the nodes with degree one until all remaining nodes have degree larger than one. All of the removed nodes are assigned to the 1-shell. Then, repeatedly remove the remaining nodes with degree two or less until all remaining nodes have degree larger than two, and assign the removed nodes to the 2-shell. Repeat this procedure with increasing degree thresholds until all nodes have been assigned to one of the shells.

3. WL algorithm
The WL algorithm holds that the importance of a node is closely related to the importance of the edges connected to it. The weight of edge ij is defined from node degrees, where $k_i$ is the degree of node i. The weight of a node is then defined over $\Gamma_i$, the set of neighbors of node i, and the importance of the node follows from this weight.

4. Ego betweenness (abbreviated as EgoBet)
The ego network consists of a target node, the nodes (1-hop neighbors) connected to the target node, and all the edges between those nodes. The standard measure of betweenness considers all the shortest paths of node pairs across the target node, while ego betweenness takes into account only the shortest paths of node pairs within the ego network. Ego betweenness can be expressed as
$$\mathrm{EgoBet}(i) = \sum_{s,t \in \Gamma_1(i),\, s \neq t} \frac{\sigma(s,t|i)}{\sigma(s,t)},$$
where $\sigma(s,t|i)$ is the number of shortest paths between node s and node t that pass through node i, $\sigma(s,t)$ is the total number of shortest paths between node s and node t, and $\Gamma_1(i)$ represents the set of 1-hop neighbors of node i.

5. LLS algorithm
The LLS algorithm evaluates the importance of nodes based on the similarity of node neighbors. The similarity of two neighbors b and c is given by the Jaccard index when nodes b and c are not connected and is 1 when they are connected; here b(i) denotes the set of one-hop and two-hop neighbors of node i. The node importance is then derived from these similarities.
Of the five benchmark algorithms, the k-shell algorithm uses information about the whole network, while the other four use only local information.
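The k-shell decomposition described among the benchmarks above can be sketched as iterative pruning. This is a minimal illustration, not the implementation used in the paper: the function name `k_shell` and the dictionary-of-sets graph format are assumptions, and isolated nodes (degree zero) are simply placed in the 1-shell here.

```python
def k_shell(adj):
    """Return {node: shell index} by repeatedly pruning low-degree nodes.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {u: set(vs) for u, vs in adj.items()}
    shell = {}
    k = 1
    while remaining:
        while True:
            # Nodes whose current degree is at most k fall into the k-shell.
            peel = [u for u, vs in remaining.items() if len(vs) <= k]
            if not peel:
                break
            for u in peel:
                shell[u] = k
                for v in remaining.pop(u):
                    if v in remaining:
                        remaining[v].discard(u)
        k += 1
    return shell


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 1.
toy = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
shells = k_shell(toy)
```

In the toy graph the pendant node lands in the 1-shell and the three triangle nodes share the 2-shell, which illustrates why nodes within one shell cannot be ranked against each other by this method.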

Evaluation Criterion of Algorithms
We adopt two criteria, the maximum connectivity coefficient [40] and the decline rate of network efficiency [41,42], to measure the connectivity of the network after attack and thereby evaluate the effectiveness of the node importance identification algorithms. After important nodes are attacked, the connectivity of the network deteriorates. The more important the removed nodes, the worse the connectivity of the remaining network.

Maximum Connectivity Coefficient
The maximum connectivity coefficient G can be calculated as follows:
$$G = \frac{R}{N},$$
where R represents the number of nodes in the maximum connected component after attack and N represents the total number of nodes in the network. The faster the decrease in G, the more efficient the attack strategy.
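As an illustration, G can be computed with a breadth-first search over the nodes that survive the attack. The sketch below assumes that G is the ratio of R to N, as the definitions of R and N suggest; the function name `max_connectivity_coefficient` and the graph format are illustrative choices.

```python
from collections import deque


def max_connectivity_coefficient(adj, removed=()):
    """Return R / N, where R is the size of the largest connected component
    after deleting `removed` and N is the node count of the original network.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    seen = set(removed)  # removed nodes are never visited
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        largest = max(largest, size)
    return largest / len(adj)


# Hypothetical toy graph: the path 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
g_before = max_connectivity_coefficient(path)
g_after = max_connectivity_coefficient(path, removed=[2])
```

Removing the middle node of the path splits it into two components of two nodes each, so G drops from 1 to 2/5.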

Decline Rate of Network Efficiency
The most direct effect of node removal is that the shortest distance between nodes becomes longer or that nodes become unreachable. The network efficiency represents the strength of network connectivity after removing some nodes and can be described as
$$\eta = \frac{1}{N(N-1)} \sum_{i \neq j} \eta_{ij},$$
where N is the total number of network nodes and $\eta_{ij}$ is the network efficiency between node pair i and j, $\eta_{ij} = 1/d_{ij}$, with $d_{ij}$ the shortest path length between nodes i and j. When nodes i and j are not connected, $\eta_{ij} = 0$. In order to analyze the effect of removing nodes on network efficiency more directly, the decline rate of network efficiency µ is adopted, which is defined as
$$\mu = 1 - \frac{\eta}{\eta_0},$$
where $\eta_0$ is the efficiency of the original network and $\eta$ is the efficiency of the network after removing nodes. The higher the µ value, the more significant the effect of removing nodes on network efficiency and the more important the removed nodes.
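The two quantities can be sketched with one breadth-first search per surviving node. This is a hedged illustration that assumes the usual form of network efficiency (the average of $1/d_{ij}$ over ordered node pairs, with N kept at the size of the original network) and a decline rate of one minus the ratio of the efficiency after removal to the original efficiency; the function names are illustrative, and the graph is assumed to have at least two nodes and one edge.

```python
from collections import deque


def efficiency(adj, removed=()):
    """Average of 1/d_ij over ordered pairs; unreachable pairs contribute 0.

    adj maps each node to the set of its neighbors (undirected graph).
    N is taken as the node count of the original network (an assumption).
    """
    removed = set(removed)
    n = len(adj)
    total = 0.0
    for s in adj:
        if s in removed:
            continue
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


def decline_rate(adj, removed):
    """Relative drop in efficiency caused by deleting `removed`."""
    return 1.0 - efficiency(adj, removed) / efficiency(adj)


# Hypothetical toy graph: a star with center 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
mu_center = decline_rate(star, [0])
mu_leaf = decline_rate(star, [1])
```

In the star, deleting the center leaves no connected pair, so the decline rate reaches its maximum of 1, while deleting a single leaf yields a much smaller value.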

Data Description
To evaluate the performance of the proposed method, we apply it to real and synthetic networks. The real networks include: USAir (American aviation network) [43,44], Netscience (Scientist cooperation network) [45,46], Infectious (People infection network) [47,48], USAirport (American airport network) [49,50], Yeast (Protein interaction network) [51,52] and Power (Power grid of the western United States) [2,39]. The statistical properties of the six networks are presented in Table 1. It can be seen from the table that the six real data sets used in this paper have their own characteristics and are representative, which can well verify the effectiveness of the algorithm. In addition, one synthetic small-world network is used, which was generated by a Watts-Strogatz model [2] with the parameters N = 6000, K = 6 and p = 0.1. The synthetic network is denoted as WS in this paper.

Results and Analysis
On the real and synthetic networks, we take degree centrality, the k-shell algorithm, the WL algorithm, ego betweenness centrality, the LLS algorithm and the proposed NBF algorithm as attack strategies and rank the nodes in the network with each of the six algorithms. We then remove a fraction of the top-ranked nodes according to the ranking result in a static and in a dynamic manner, analyze the changes in the maximum connectivity coefficient and the decline rate of network efficiency as nodes are removed, and thereby verify the effectiveness of the proposed algorithm. In a static attack, nodes are removed in descending order of the importance calculated on the initial network, regardless of the structural changes caused by node removal. In a dynamic attack, by contrast, only the most important node is removed in each round, and the importance of all nodes in the remaining network is recalculated each time. Figure 2 compares the maximum connectivity coefficient G under different static attack strategies on real and synthetic networks. As can be seen from Figure 2, in the static attack mode the NBF attack strategy produces the fastest decline of G; that is, the proposed algorithm performs best at identifying node importance. The advantage of the NBF algorithm is most obvious when the fraction of removed nodes is small. For example, in the USAirport network in Figure 2a, when the fraction of removed nodes is less than 15%, G declines much faster. In real applications, 15% is a very large attack rate once the cost of attack is taken into account. Since the NBF attack strategy fragments the network the most when a small fraction of nodes is removed, and also achieves a strong effect when a large fraction is removed, the proposed NBF algorithm performs best at ranking node importance and has the highest application value.
In addition, the ego betweenness centrality attack strategy has the second best attack effect in four networks, the LLS attack strategy has the second best attack effect in one network and the WL attack strategy has the second best attack effect in one network. To further verify the effectiveness of the NBF method, we analyze the changes in the maximum connectivity coefficient in the dynamic attack mode, as shown in Figure 3. The results again show that the maximum connectivity coefficient decreases faster under the NBF dynamic attack strategy than under the other strategies, which indicates that the NBF algorithm ranks node importance more accurately than the other algorithms. The design of the NBF algorithm considers the bridging feature of nodes, so the structural importance of the network nodes can be sorted more effectively, and the experimental results support this point. It can also be observed that the network is fragmented the least when nodes are removed by the k-shell method. This is because the k-shell method cannot distinguish the importance of nodes in the same shell. Comparing the static and dynamic attack results of each algorithm on the same data set in Figures 2 and 3, the dynamic attack is consistently more effective than the static attack. This is because, in the dynamic attack mode, the importance of nodes is recalculated after each removal, ensuring that each attacked node is the most important node in the current network. In the static attack mode, however, the network structure changes as nodes are removed, and the importance of the remaining nodes in the initial ranking may drop sharply because of these structural changes.
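The difference between the two attack modes can be made concrete with a short sketch. The helper names below (`remove_node`, `static_attack`, `dynamic_attack`, `degree_score`) are illustrative assumptions, and degree is used as the scoring function only because its definition is unambiguous; any ranking function that maps a graph to per-node scores could be plugged in.

```python
def remove_node(adj, u):
    """Return a copy of adj with node u and its incident edges deleted."""
    return {v: nbrs - {u} for v, nbrs in adj.items() if v != u}


def degree_score(adj):
    """Score each node by its degree (stand-in for any ranking algorithm)."""
    return {u: len(nbrs) for u, nbrs in adj.items()}


def static_attack(adj, score, budget):
    """Rank once on the initial network and take the top `budget` nodes."""
    scores = score(adj)
    return sorted(scores, key=scores.get, reverse=True)[:budget]


def dynamic_attack(adj, score, budget):
    """Remove the top node, then re-rank the remaining network each round."""
    order = []
    current = adj
    for _ in range(min(budget, len(adj))):
        scores = score(current)
        target = max(scores, key=scores.get)
        order.append(target)
        current = remove_node(current, target)
    return order


# Hypothetical toy graph: a triangle 0-1-2 joined to the chain 0 - 3 - 4 - 5.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
static_order = static_attack(toy, degree_score, 2)
dynamic_order = dynamic_attack(toy, degree_score, 2)
```

On this toy graph both modes remove node 0 first, but the second target differs: the static ranking still favors node 1, whose degree was computed before any removal, whereas the dynamic ranking picks node 4, the only node that keeps degree two after node 0 is gone. Ties are broken by insertion order in this sketch.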

Experimental Results on the Maximum Connectivity Coefficient
In addition, we find that the decline speed of G is related to the network structure, especially the average degree k of the network. The Power network and the Netscience network have the lowest and second lowest average degree, respectively, and both networks have almost completely broken down after the top 10% of important nodes are removed under most attack strategies. The reason is that a lower average degree means fewer edges between node pairs, so the connectivity of the network deteriorates very quickly after a small fraction of important nodes is removed. Thus, for these two networks we show the removal process only for the top 30% of important nodes instead of all nodes. Figure 4 shows the changes in the decline rate of network efficiency µ when nodes are removed in descending order of the importance ranking given by the different algorithms. The larger the value of µ, the more significant the decrease in network efficiency and the more accurate the node importance identification algorithm. It can be observed that, in the static attack mode, the value of µ under the NBF attack strategy is larger than under the other strategies, which shows that the NBF algorithm is more accurate than the other algorithms. As with the maximum connectivity coefficient, the advantage of the NBF algorithm is more obvious when the proportion of removed nodes is small, so the proposed algorithm has the best application value. Furthermore, we investigate the decline rate of network efficiency µ under different dynamic attack strategies, and the result is shown in Figure 5. The result is consistent with that of the static attack strategies. It can be seen from the figure that the NBF algorithm has the greatest impact on network fragmentation compared with the other five algorithms. The effectiveness of the proposed algorithm is thus further verified under a different evaluation criterion.

Experimental Results on the Decline Rate of Network Efficiency
In summary, the comparison experiments on the maximum connectivity coefficient and the decline rate of network efficiency show that the proposed algorithm performs better than degree centrality, k-shell algorithm, WL algorithm, LLS algorithm and ego betweenness centrality in static attack strategies and in dynamic attack strategies. The advantage of the proposed algorithm is more obvious in the initial stage of attack. In addition, the dynamic attack effect is better than the static attack for the same importance ranking algorithm.

Complexity Analysis
When computing the shortest paths of node pairs within 2-hops neighbors for the target node in NBF algorithm and ego betweenness, we just compute the square and cube of the adjacency matrix of the network and check the elements in the new matrix instead of using a Dijkstra algorithm, which makes it much faster. The method is also used in [30].
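The matrix-power idea can be illustrated as follows. For two distinct, non-adjacent nodes, every walk of length two is a shortest path, so the entry of $A^2$ gives the number of two-hop shortest paths; if that entry is zero and the entry of $A^3$ is positive, the nodes are at distance three and the entry of $A^3$ counts the three-hop shortest paths. The sketch below is an assumption-laden illustration in pure Python on the full dense matrix; the function names are hypothetical, and a practical implementation would restrict the computation to each node's neighborhood or use sparse matrices.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def short_path_counts(a):
    """Return (p2, p3): numbers of shortest paths of length exactly 2 and 3.

    a is the 0/1 adjacency matrix of an undirected graph without self-loops.
    """
    n = len(a)
    a2 = mat_mul(a, a)
    a3 = mat_mul(a2, a)
    p2 = [[0] * n for _ in range(n)]
    p3 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or a[i][j]:
                continue  # distance 0 or 1: no two- or three-hop shortest path
            if a2[i][j]:
                p2[i][j] = a2[i][j]  # distance 2
            elif a3[i][j]:
                p3[i][j] = a3[i][j]  # distance 3
    return p2, p3


# Hypothetical toy graph with edges 0-1, 1-2, 2-3, 0-4, 4-2.
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 2)]
n = 5
a = [[0] * n for _ in range(n)]
for u, v in edges:
    a[u][v] = a[v][u] = 1
p2, p3 = short_path_counts(a)
```

In the toy graph, nodes 0 and 2 are joined by two two-hop shortest paths (through 1 and through 4), and nodes 0 and 3 by two three-hop shortest paths, which the matrices `p2` and `p3` report without running a shortest-path search.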
The computational complexity of the six methods is shown in Table 2, where n is the total number of nodes in the network, m is the number of edges and k is the average degree of the network. From the table, we can see that the computational complexity of k-shell is O(m), which is the lowest, but, from the experimental results, one can see that it performs the worst. WL algorithm and degree centrality have the second lowest computational complexity, but the attack effect is also not good.
The computational complexity of the NBF algorithm is $O(nk^{2})$, which equals that of ego betweenness and the LLS algorithm. Although the NBF algorithm and ego betweenness have the same computational complexity, ego betweenness considers only neighbor nodes within 1 hop, while the NBF algorithm takes into account neighbor nodes within 2 hops. The actual computing time of ego betweenness is therefore less than that of the NBF algorithm. Both the NBF algorithm and the LLS algorithm consider neighbor nodes within 2 hops, and the computational cost of the two algorithms is almost the same. One can see from the experimental results that the NBF algorithm, ego betweenness and the LLS algorithm outperform the other algorithms in most cases, and the NBF algorithm performs the best in almost all cases. In summary, NBF achieves the best attack effect in reasonable time when compared with other outstanding algorithms, which makes it applicable in large-scale networks.

Table 2. The computational complexity of six methods.

Method    Information    Computational Complexity

Conclusions
The identification of node importance in complex networks is of theoretical and practical significance for improving network robustness and invulnerability. By analyzing the neighborhood structure of the target node, we propose a node importance identification algorithm based on a node bridging feature. The algorithm just needs neighborhood information within two hops of the node for computing instead of global information, which makes it applicable in a large-scale network. The robustness simulation experiments on real networks and synthetic networks show that the proposed algorithm performs better than degree centrality, k-shell algorithm, WL algorithm, LLS algorithm and ego betweenness centrality under two network connectivity evaluation criteria whether in static or dynamic attack strategies. Especially in the real applications where the cost of network attacks is limited, the advantage of the proposed algorithm is more obvious.
We also find that the dynamic attack effect is better than the static attack for the same node importance identification algorithm. This is due to the fact that, in the dynamic attack mode, the importance of nodes is recalculated when removing a node, ensuring that each attacked node is the most important node in the current network. Therefore, both proposed algorithms and attack strategy construction are key factors for the invulnerability of the complex network, which guides the way to construct and maintain more robust networks.
In addition, we find that the attack effect is related to the network structure, especially the average degree of the network. This is because a lower average degree corresponds to fewer edges between node pairs; thus, after the important nodes are removed, the connectivity of the network deteriorates very quickly. The Power network and the Netscience network have the lowest and second lowest average degree, respectively, and both networks have almost completely broken down after the top 10% of important nodes are removed under most attack strategies.
The algorithm designed in this paper is for the undirected and unweighted networks. In real networks, the connections between nodes usually have directions, and each connection has different weights. It is easy to know from the expression of the NBF algorithm that the algorithm can be extended to directed weighted networks, which is the focus of future research. In addition, an algorithm that is optimal for one network may get sub-optimal results in a different network, and it is almost impossible to design a universal ranking algorithm which outperforms the best in all networks. In order to get more universal conclusions, we will test our algorithm on more real and synthetic networks in the future.