Critical Nodes Identification in Complex Networks

Abstract: Critical node identification in complex networks is significant for studying the survivability and robustness of networks. Previous studies on structural hole theory showed that structural holes are gaps between groups of indirectly connected nodes, and that intermediaries fill these holes and serve as brokers for information exchange. In this paper, we leverage the property of structural holes to design a heuristic algorithm, based on local information of the network topology, to identify node importance in undirected and unweighted networks, whose adjacency matrices are symmetric. In the algorithm, a node with a larger degree and a greater number of associated structural holes achieves a higher importance ranking. Six real networks are used as test data. The experimental results show that the proposed method not only has low computational complexity but also outperforms degree centrality, the k-shell method, mapping entropy centrality, the collective influence algorithm, the DDN algorithm (which is based on the degrees of a node and its neighbors), and the random ranking method in identifying node importance for network connectivity in complex networks.


Introduction
Networks have become an attractive subject in complex system research because they can depict a wide range of systems in nature and society [1][2][3]. A complex system composed of a set of components with connections between them can be represented by a network graph, where nodes represent individual components and links stand for their interactions.
Networks are everywhere [4,5], ranging from natural and biological systems to society, such as cellular networks [6], traffic networks [7][8][9], social networks [10], power grids [11], and many others. Evaluating node importance for particular structural or functional objectives in complex networks is a challenge of both theoretical and practical significance. In real life, complex networks or systems are not always secure. Threats to complex networks come from two sources: random failures and targeted destruction [12,13]. Previous studies show that a large number of real systems, such as the World Wide Web, the Internet, power grids, and airline networks, exhibit notable heterogeneity [14], which means that the roles of nodes in network structure and function vary greatly. These networks have a high tolerance to random failures but are sensitive to intentional attacks.
Critical nodes are those special nodes that can affect the structure and function of the network to a greater extent than others. Protecting the key nodes of a network is of great significance for improving its robustness and survivability [15]. The design of effective algorithms to evaluate node importance has attracted a large number of scholars in recent years [16]. Many methods have been proposed from different perspectives to identify critical nodes. Degree centrality (Deg) [17], mapping entropy centrality (ME) [18], the collective influence algorithm (CI) [19], semi-local centrality [20], coreness [21], and the H-index [22] are based on a node's local neighborhood information. Eccentricity [23], information indices [24], Katz centrality [25], closeness centrality, betweenness centrality [26], and subgraph centrality [27] are designed from the perspective of the number of paths available for communication. In addition to these methods, there are many iterative refinement algorithms, which consider not only the number of a node's neighbors but also the importance of those neighbors. Representative algorithms include the well-known PageRank [28], HITS [29], and eigenvector centrality [30], as well as more recently proposed algorithms such as LeaderRank [31] and TwitterRank [32].
Due to the rapid development of information technology, together with the constantly expanding scale and increasing complexity of networks, designing algorithms that evaluate node importance in very large-scale networks has become a significant challenge. Importance evaluation is computationally expensive if it is based on global information of the network. Therefore, an evaluation algorithm should be not only effective but also efficient, and a ranking algorithm based on local network information is a better fit for this requirement. In this paper, we present a new approach called DSHC (based on local degree information and structural hole count) to evaluate node importance. We summarize the three contributions of the paper as follows: (1) We draw inspiration from the theory of structural holes [33] and consider only a node's local neighborhood information to evaluate its importance. This makes the algorithm computationally attractive for large-scale networks; (2) The proposed algorithm can effectively identify hubs with numerous structural holes, which play an important role in bridging different clusters of the network; (3) Empirical analyses on real and synthetic networks demonstrate that the proposed method can outperform Deg, k-shell, ME, CI, DDN [34], and the random ranking method (Rand).
The remainder of this paper is organized as follows. In Section 2, we introduce the theory of structural holes for the study of competitive relationships in social network analysis, devise the evaluation algorithm to identify node importance, and introduce the benchmark algorithms. In Section 3, we introduce the indicators used to evaluate the accuracy of the importance ranking. In Section 4, we present the real network datasets used in the experiments. In Section 5, we compare the proposed algorithm with existing methods on both synthetic and real networks. Finally, we provide a summary in Section 6.

Materials and Methods
An unweighted and undirected network G(V, E) with N = |V| nodes and M = |E| edges is considered in this paper. The network can be described by an adjacency matrix $A = (a_{ij})$, where $a_{ij} = 1$ if there is a link between node i and node j, and $a_{ij} = 0$ otherwise. The degree of node i is denoted by $k_i$.

Measurement of Node Importance Based on Degree and Structural Hole Count
We begin our analysis by introducing a theory for the study of competitive relationships in social networks, based on so-called structural holes. From a sociological perspective, structural holes are gaps between groups of indirectly connected nodes, and individuals that act as structural hole spanners to fill these holes gain more network benefits than their neighbors. Take Figure 1 as an example: there are three structural holes, marked by dashed arrows, associated with the intermediate node Ego. Ego gains more network benefits than its neighbors A, B, C, and D, because there is no alternative communication channel between them. The Ego node therefore plays a significant role in maintaining network connectivity, and we can infer that the larger the degree of a node and the greater the number of structural holes associated with it, the more significant the node is. In view of this analysis, we leverage the property of structural holes to design an intuitive algorithm for quantifying node importance in maintaining network connectivity, based on degree and structural hole count, expressed by

$$\mathrm{DSHC}_i = \sum_{j \in \Gamma_i} \left( \left( \frac{1}{k_i} + \frac{1}{k_j} \right) \frac{1}{1 + \Delta_{ij}} \right)^2 \quad (1)$$

where $\Gamma_i$ represents the neighbor set of node i and $\Delta_{ij}$ represents the number of structural holes formed between node i and node j with node i as the intermediary. According to Equation (1), the larger the degrees of a node and its neighbors, and the greater the number of structural holes between the node and its neighbors, the smaller the value of DSHC; a smaller DSHC therefore means that the node is harder to replace in the structure. As an example based on Figure 1, we calculate $\mathrm{DSHC}_{Ego\_A}$ between node Ego and node A. There are two structural holes between node Ego and node A, {A-Ego-C, A-Ego-D}. Therefore, $\mathrm{DSHC}_{Ego\_A} = ((1/4 + 1/2) \times (1/(1 + 2)))^2 = 1/16$. Furthermore, we calculate $\mathrm{DSHC}_{Ego\_B}$, $\mathrm{DSHC}_{Ego\_C}$, and $\mathrm{DSHC}_{Ego\_D}$, and sum them to get $\mathrm{DSHC}_{Ego}$.
The algorithm comprehensively considers the degree of the node and its neighbors and the information about the topological overlapping of its neighbors. A large DSHC value indicates that the neighborhood of the node is closely interconnected and forms a dense local cluster. Removal of such nodes usually does not significantly affect the network connectivity.
The DSHC algorithm is described as follows:

Algorithm 1 the DSHC Method
Input: Network adjacency matrix $A = (a_{ij})$, degree of network nodes k, the size of network N
Output: The DSHC value of each node
1: for i = 1 to N
2:
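The listing above is incomplete in this copy, so the following is only a minimal Python sketch of how a DSHC score could be computed, not the authors' implementation. It assumes the per-pair term takes the form used in the worked example, $((1/k_i + 1/k_j) \times 1/(1 + \Delta_{ij}))^2$, summed over the neighbors of node i, and it assumes that $\Delta_{ij}$ counts the neighbors of i (other than j) that are not linked to j. The names `build_graph`, `structural_holes`, `dshc_pair`, and `dshc`, and the toy graph, are hypothetical; the toy graph is not Figure 1, but it is chosen so that the Ego-A term equals 1/16.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected, unweighted adjacency-set graph from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_holes(adj: Graph, i: Hashable, j: Hashable) -> int:
    """Assumed reading of Delta_ij: neighbors of i (except j) not linked to j."""
    return sum(1 for l in adj[i] if l != j and l not in adj[j])


def dshc_pair(adj: Graph, i: Hashable, j: Hashable) -> float:
    """Per-neighbor term, following the form of the worked example."""
    k_i, k_j = len(adj[i]), len(adj[j])
    delta = structural_holes(adj, i, j)
    return ((1 / k_i + 1 / k_j) * (1 / (1 + delta))) ** 2


def dshc(adj: Graph) -> Dict[Hashable, float]:
    """Sum the per-neighbor terms over each node's neighbor set."""
    return {i: sum(dshc_pair(adj, i, j) for j in adj[i]) for i in adj}


# Hypothetical toy graph: Ego has four neighbors and only A and B are linked.
toy = build_graph(
    [("Ego", "A"), ("Ego", "B"), ("Ego", "C"), ("Ego", "D"), ("A", "B")]
)
scores = dshc(toy)
# A smaller DSHC value is read as higher importance, so sort ascending.
ranking = sorted(scores, key=scores.get)
```

Because only the neighbor sets of a node and of its neighbors are inspected, the sketch relies on local information only, which is the property the text emphasizes.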

Benchmark Methods
We use several popular heuristic methods that are also based solely on the network topology to investigate the performance of the proposed method. These measures are described below.

Degree centrality
Degree centrality is a basic ranking algorithm for evaluating the importance of nodes. The degree of node i can be defined as

$$k_i = \sum_{j=1}^{N} a_{ij}$$

The k-shell decomposition method
The k-shell decomposition algorithm categorizes the nodes into core nodes and fringe nodes. The algorithm proceeds as follows. First, delete all nodes that have only one connection. This may produce new nodes with k = 1, so the pruning is repeated until no nodes with k = 1 remain. All nodes removed in this stage are assigned to the 1-shell. Next, the process continues in the same way for nodes with degree k = 2, which yields the 2-shell of the network. The pruning is repeated with increasing k until every node of the network has been assigned to one of the shells.
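The pruning procedure for the k-shell decomposition can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the experiments; the function name `k_shell` and the toy graph are hypothetical, and isolated nodes (which the description above does not mention) are placed in a 0-shell here by assumption.

```python
from typing import Dict, Hashable, Set


def k_shell(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Assign every node to a shell by repeatedly pruning low-degree nodes."""
    # Work on a copy so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    shell: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        peel = [v for v, nbrs in remaining.items() if len(nbrs) <= k]
        if not peel:
            k += 1  # nothing left to prune at this level, move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already pruned via another neighbor
            shell[v] = k
            for u in remaining.pop(v):
                remaining[u].discard(v)
                # Pruning v may drop u to degree <= k, so u joins the same shell.
                if len(remaining[u]) <= k:
                    peel.append(u)
    return shell


# Hypothetical toy graph: a triangle a-b-c with a tail a-d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
shells = k_shell(toy)
```

In the toy graph, removing e leaves d with a single link, so d is pruned in the same stage, which is the cascading behavior the description refers to.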

The DDN method
The DDN method assumes that the importance of a node is determined by the degree of the node and the degrees of its neighbors; its definition uses the weight $w_{ij} = k_i \times k_j$.

The ME centrality
The ME centrality is based on local neighborhood information; in its definition, P denotes the number of neighbors of node i.

The CI algorithm
The CI algorithm can be used to identify a set of key nodes whose removal is most efficient in destroying the network connectivity. It is defined as

$$\mathrm{CI}_{\ell}(i) = (k_i - 1) \sum_{j \in \partial \mathrm{ball}(i, \ell)} (k_j - 1)$$

where $\partial \mathrm{ball}(i, \ell)$ is the boundary of the sphere centered on node i with radius $\ell$, and is composed of all nodes whose distance from node i does not exceed $\ell$. The CI radius in this article is set to $\ell = 2$.
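As an illustration, the collective influence score can be computed with one breadth-first search per node. The sketch below assumes the commonly stated form, the product of $(k_i - 1)$ and the sum of $(k_j - 1)$ over the boundary of the ball, and it takes the boundary to be the nodes at shortest-path distance exactly equal to the radius, as in the original CI formulation; that reading of the boundary is an assumption here. The function name `collective_influence` and the toy path graph are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set


def collective_influence(
    adj: Dict[Hashable, Set[Hashable]], radius: int = 2
) -> Dict[Hashable, int]:
    """CI score per node, using nodes at distance exactly `radius` as frontier."""
    ci: Dict[Hashable, int] = {}
    for i in adj:
        dist = {i: 0}
        queue = deque([i])
        frontier = []
        while queue:
            u = queue.popleft()
            if dist[u] == radius:
                frontier.append(u)
                continue  # do not expand beyond the chosen radius
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ci[i] = (len(adj[i]) - 1) * sum(len(adj[j]) - 1 for j in frontier)
    return ci


# Hypothetical toy graph: the path 1-2-3-4-5.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
ci_scores = collective_influence(path, radius=2)
```

The search stops expanding at the chosen radius, so the cost per node depends only on the size of its local neighborhood.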

Random ranking method
The random ranking method ranks the network nodes in a random order.

How to Evaluate the Performance
Three kinds of metrics are often used to evaluate the performance of a ranking algorithm, namely the susceptibility S [35], the maximum connectivity coefficient G [36], and the network efficiency η [37,38]. The change in network connectivity caused by removing a node is taken as a measure of the importance of that node.

The maximum connectivity coefficient
The maximum connectivity coefficient reflects the size of the largest connected component of the network after it has been attacked by removing nodes. It is given by

$$G = \frac{R}{N}$$

where R is the size of the giant component after node removal and N is the number of nodes in the network. The smaller G is, the better the attack strategy.
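A minimal Python sketch of this measurement is given below: nodes are removed one at a time in a given ranking order, and after each removal the size of the largest remaining component is divided by the original number of nodes. The function names `largest_component_size` and `connectivity_curve`, and the toy star graph, are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Set


def largest_component_size(
    adj: Dict[Hashable, Set[Hashable]], removed: Set[Hashable]
) -> int:
    """Size R of the largest connected component once `removed` is deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def connectivity_curve(
    adj: Dict[Hashable, Set[Hashable]], ranking: Sequence[Hashable]
) -> List[float]:
    """G = R / N after each successive removal, following `ranking`."""
    n = len(adj)
    removed: Set[Hashable] = set()
    curve = []
    for node in ranking:
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / n)
    return curve


# Hypothetical toy graph: a star with hub 0 and leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
hub_first = connectivity_curve(star, [0, 1])
leaf_first = connectivity_curve(star, [1, 0])
```

Removing the hub first collapses the star immediately, whereas removing a leaf first leaves a component of four nodes, so a ranking that places the hub first yields the lower curve.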

Susceptibility
Susceptibility quantifies the change in network connectivity in response to the removal of network nodes. The gradual removal of nodes breaks the network into many disconnected parts. In this process, the susceptibility S usually has a peak at a specific proportion of removed nodes, $p_c$, at which the network collapses. If the network breaks up several times during the gradual removal of nodes, there are multiple peaks. In the definition of S, $n_s$ denotes the number of components of size s and N is the number of nodes in the network. The smaller the value of $p_c$, the better the ranking algorithm.
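The defining formula is not reproduced in this copy. A form commonly used in the literature, and assumed in the sketch below, sums $n_s s^2$ over all components except the largest one and divides by N. The Python sketch should be read under that assumption; the function names `component_sizes` and `susceptibility`, and the toy path graph, are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def component_sizes(
    adj: Dict[Hashable, Set[Hashable]], removed: Iterable[Hashable] = ()
) -> List[int]:
    """Sizes of all connected components after deleting `removed`."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def susceptibility(
    adj: Dict[Hashable, Set[Hashable]], removed: Iterable[Hashable] = ()
) -> float:
    """Assumed form: sum of s^2 over all components but the largest, over N."""
    if not adj:
        return 0.0
    sizes = sorted(component_sizes(adj, removed))
    # Dropping the last entry excludes one largest component from the sum.
    return sum(s * s for s in sizes[:-1]) / len(adj)


# Hypothetical toy graph: the path 1-2-3-4-5-6.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
s_intact = susceptibility(path)
s_cut = susceptibility(path, removed={3})
```

While the network is intact the sum is empty and S is zero; S grows as removals split off pieces that are large but not the largest, which is why a peak marks the point of collapse.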

The Decline Rate of Network Efficiency
Network efficiency reflects the quality of the network connectivity. It is defined as

$$\eta = \frac{1}{N(N-1)} \sum_{i \neq j} \eta_{ij}$$

where $\eta_{ij}$ is the efficiency between node i and node j, $\eta_{ij} = 1/d_{ij}$, and $d_{ij}$ denotes the shortest path length between node i and node j. As network nodes are gradually removed, the average shortest path between nodes becomes longer, which worsens the connectivity of the network. We adopt the decline rate of the network efficiency to analyze the disintegration effect of node removal, which is defined as

$$1 - \frac{\eta}{\eta_0}$$

where $\eta$ represents the efficiency of the network after the attack and $\eta_0$ denotes the efficiency of the original network.
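The two quantities can be illustrated with a short Python sketch based on breadth-first search. It assumes that efficiency is the average of $1/d_{ij}$ over ordered node pairs, that unreachable or removed pairs contribute zero, that the normalization keeps the original N after an attack, and that the decline rate is one minus the ratio of attacked to original efficiency. The function names `efficiency` and `decline_rate`, and the toy triangle, are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def efficiency(
    adj: Dict[Hashable, Set[Hashable]], removed: Iterable[Hashable] = ()
) -> float:
    """Average of 1/d_ij over ordered pairs, normalized by the original N."""
    gone = set(removed)
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        if source in gone:
            continue
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in gone and v not in dist:
                    dist[v] = dist[u] + 1
                    total += 1.0 / dist[v]  # unreachable pairs add nothing
                    queue.append(v)
    return total / (n * (n - 1))


def decline_rate(
    adj: Dict[Hashable, Set[Hashable]], removed: Iterable[Hashable]
) -> float:
    """Relative loss of efficiency caused by removing `removed`."""
    eta_0 = efficiency(adj)
    if eta_0 == 0.0:
        return 0.0  # a network without links has no efficiency to lose
    return 1.0 - efficiency(adj, removed) / eta_0


# Hypothetical toy graph: a triangle.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
eta_before = efficiency(triangle)
loss = decline_rate(triangle, removed={"a"})
```

A larger decline rate means the removed nodes mattered more, which is why the higher curve indicates the stronger attack strategy.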

Data Description
To validate the effectiveness of the DSHC method, six real complex networks including Facebook (Slavo Zitnik's social network on Facebook) [39], Erdos (scientific collaboration network) [40], USAir (American aviation network) [41,42], USAirport (USA airport network) [43], Yeast (yeast protein-protein binding network) [44,45], and Power (connections between US west power stations) [46] are used for a comparison experiment. The basic statistics of six real networks are shown in Table 1.

Results and Analysis
To verify the effectiveness of the DSHC indicator in measuring the importance of nodes, we compare the proposed method with k-shell, degree, ME, CI, DDN, and Rand by simulating intentional attacks on real and synthetic networks, in which nodes are removed in descending order of importance. The simulations were run in MATLAB.

Experiments in Real Networks
The experimental results in Figure 2 show the performance of the different attack strategies on the six real networks in terms of the maximum connectivity coefficient, which is estimated by numerical simulation of the change in the relative size of the giant component, as described previously. In all six networks, removing nodes according to the DSHC index shrinks the giant component fastest. This indicates that the proposed DSHC index identifies the most important nodes more accurately than k-shell, degree, ME, CI, DDN, and Rand. For example, in the Yeast network in Figure 2, when p = 0.2, the G values of k-shell, degree, ME, CI, DDN, DSHC, and Rand are 0.7154, 0.5314, 0.5903, 0.5781, 0.6143, 0.2080, and 0.7229, respectively, so the network connectivity is worst when the top 20% of nodes are removed according to DSHC. Relative to k-shell and Rand, DSHC lowers G by 70.92% and 71.23%, respectively. Similarly, in the Power network, when p = 0.1, the G values of k-shell, degree, ME, CI, DDN, DSHC, and Rand are 0.7891, 0.1358, 0.2374, 0.3111, 0.3036, 0.0498, and 0.8197, respectively. Thus, when the top 10% of nodes are removed according to the DSHC index, the network connectivity is again worse than when they are removed according to the k-shell, degree, ME, CI, DDN, and Rand algorithms.
Figure 3 shows the susceptibility of the resulting networks as a function of the proportion p of nodes removed under the different attack strategies discussed above. The DSHC method outperforms the other methods in that the network collapses into many disconnected pieces at the smallest p. More specifically, in the Power network (Figure 3), the susceptibility reaches its maximum after removing about 3% of the nodes with the proposed method. In contrast, the values of p corresponding to the maximum susceptibility for the CI and degree indices are 3.56% and 4.09%, respectively. Overall, the random scheme performs worst, degree centrality can outperform the CI index, and our approach always outperforms the competing methods in these real networks.
Symmetry 2020, 12, x FOR PEER REVIEW 8 of 14
We further compare the changes in the decline rate of network efficiency when nodes are removed according to the importance sequences calculated by k-shell, degree, ME, CI, DDN, DSHC, and Rand. Node removal according to the proposed method results in the largest decrease in network efficiency. Taking the Facebook network in Figure 4 as an example, when p changes from 0.1 to 0.5, the curve corresponding to the DSHC algorithm is always higher than the curves of the other algorithms. In the other networks, the DSHC algorithm also achieves better results over a wide range of p.

Experiments in Synthetic Networks
In addition to the real networks, we also verify our approach on synthetic small-world networks. We generate three small-world networks with N = 1000 by a configuration model [47], where the number of neighbors of each node in the nearest-neighbor coupled network is α = 8. We adjust the network structure through the random reconnection probability µ, and generate three network datasets with µ = 0.06, 0.08, and 0.1, respectively. From the results in Figure 5, we find that our approach generally outperforms the five existing methods. This is especially true when the networks have a low level of clustering, as shown in Figure 5b,c. In a network with a low level of clustering, the structural hole characteristics of the nodes are more pronounced, which gives our algorithm an advantage. In a highly clustered network, removing a node whose neighbors are densely interconnected hardly blocks communication between those neighbors, because alternative communication channels between them still exist. Similar results can also be found in Figures 6 and 7. These results imply that nodes with a higher structural hole count and a larger degree are crucial for maintaining network connectivity.
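The generation procedure is only summarized in the text. The description (a nearest-neighbor coupled ring with α neighbors per node, plus random reconnection with probability µ) resembles Watts-Strogatz-style rewiring, and the Python sketch below follows that reading as an assumption; it is not claimed to be the generator used in the experiments. The function name `small_world` and the `seed` parameter are hypothetical.

```python
import random
from typing import Dict, Optional, Set


def small_world(
    n: int, alpha: int, mu: float, seed: Optional[int] = None
) -> Dict[int, Set[int]]:
    """Ring lattice with alpha neighbors per node, each link rewired w.p. mu."""
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    half = alpha // 2
    # Nearest-neighbor coupled network: link each node to `half` nodes per side.
    for v in range(n):
        for d in range(1, half + 1):
            u = (v + d) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire the far end of each lattice link with probability mu.
    for d in range(1, half + 1):
        for v in range(n):
            u = (v + d) % n
            if u in adj[v] and rng.random() < mu:
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if not candidates:
                    continue
                w = rng.choice(candidates)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


# Parameters taken from the text: N = 1000, alpha = 8, mu = 0.1.
net = small_world(1000, 8, 0.1, seed=1)
edge_count = sum(len(nbrs) for nbrs in net.values()) // 2
```

Rewiring in this sketch keeps the number of links fixed at N·α/2 and mainly lowers the clustering of the lattice, which matches the observation above that a larger µ gives less clustered networks.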

Conclusions
Ranking the importance of nodes in complex networks is a challenge of both theoretical and practical significance. In this paper, we leverage the property of structural holes to design a local algorithm to evaluate node importance in maintaining network connectivity. To compare the performance of the proposed measure with other ranking methods, we investigate node importance for network connectivity on real and synthetic networks. The experimental results show that our method can outperform the k-shell, degree, ME, CI, DDN, and random ranking algorithms. In addition, compared with methods that use global information, the proposed method depends merely on local connectivity patterns, which makes it much more suitable for large-scale networks.
The algorithm proposed in this paper is intended mainly for undirected and unweighted networks. Once the local information of the nodes is obtained, the structural importance of nodes in the network can be calculated. In the real world, however, there is a cost limit to launching attacks on a network. Therefore, in future work, we should consider the cost of removing nodes in the design of the node importance ranking algorithm.