Identification of Key Nodes in a Power Grid Based on Modified PageRank Algorithm

To avoid large-scale blackouts caused by disconnected nodes in the power grid, a modified PageRank algorithm is proposed that identifies key nodes by integrating topological information and node type. The node betweenness index is first introduced based on complex network theory and then modified to reflect the topological information of nodes in the power grid. Then, according to the characteristics of the different node types in the power grid, a modified PageRank algorithm is proposed to rapidly identify key nodes, taking generator nodes, load nodes, and contact nodes into account. The IEEE 39-Bus system and the IEEE 118-Bus system are used for the simulations. Simulation results show that, after the identified key nodes are removed, the network transmission efficiency of the power grid falls to 5.62% and 5.12% in the two test systems with the proposed method, compared with 64.23% and 45.4% with other methods. The proposed algorithm thus improves identification accuracy. A provincial power grid simulation system in China is used to verify its feasibility and validity: removing the nodes identified by their importance index values splits the power grid. The proposed method helps prevent cascading failures in the power system, and it can also be applied to power systems with renewable energy sources and to AC/DC hybrid power grids.


Introduction
In a large-scale power grid, natural disasters, deliberate attacks, element failures, and other faults may cause large-scale blackouts [1]. Blackouts are high-impact events that may cause large load shedding and even serious social consequences [2,3]. Usually, a blackout develops from the failure of one or several power grid elements, which are referred to as key elements, such as key generators, transmission lines, transformers, or power load nodes. Generators and power load nodes are at the core of power production and consumption in the power grid, and their failure has a serious impact on the operation of the power grid. The authors of [4] proposed a new load distribution law to emulate the power grid, in which the initial generation of generators and the initial loads of substations are calculated according to the path efficiency and the load of the consumers. The importance of generator nodes and power load nodes is also discussed there. To avoid large-scale blackouts, it is important to identify key nodes in the power grid. In the literature, methods for identifying key nodes in the power grid fall into two categories: dynamic analysis methods and static analysis methods.
In dynamic analysis methods, transmission line faults and load changes are commonly used to identify key nodes. In [5], a cascading failure model is proposed based on complex network theory by combining node overload failures and hidden failures of transmission lines in blackouts. In terms of voltage stability analysis, the authors of [6] proposed a new index for identifying vulnerable nodes to improve voltage stability based on reactive compensation. In [7], the influence of different faults is analyzed, and a quantitative coupling degree method is proposed to identify key nodes of the regional power grid in a transient process. In [8], starting from the characteristics of the network load, a cascading failure model is established based on a network importance assessment index that considers the load oscillation degree of the attacked nodes. To avoid having only power sources re-energize the critical loads, the authors of [9] proposed a new look-ahead restoration strategy for re-energizing the critical loads. The aforementioned methods identify key nodes from the viewpoint of the operating characteristics of the power grid, but they ignore the topological structure. A key node identification method that integrates grid topology and operating parameters can reflect the importance of nodes in the power grid more fully.
In the past decade, static analysis methods that identify key nodes in the power grid have gradually grown, for example, complex network centrality in [10], topological and controllability features in [11], and electrical betweenness combined with generation rated capacity and load change in [12,13]. Expanded betweenness has been proposed, which considers transmission distribution factors and transmission ultimate capacity [14,15], the network response structural characteristic indexes have been formulated in terms of the Kirchhoff matrix [16], and the bus dependency matrix is established by the maximum power flow of the shortest path and node [17]. Additionally, an improved structure holes theory has been defined by the relative importance between the node and its adjacent node [18]. Because the above indexes and methods only partly consider factors of power grid nodes, the identification results are inaccurate. Based on the topological structure and electrical characteristics, the multi-index evaluation algorithm has been proposed by a different comprehensive method [19]. Authors in [20] proposed a ranking process method, which includes both static (via optimal power flow) and dynamic (via transient stability) performance analyses, to assess deterministic indices. Authors in [21] proposed a Coupling Strength Matrix (CSM) method, which is based on Network Structural Characteristics Theory and the Relative Electrical Distance (RED) between nodes in the network. The basic idea is to establish a power grid model that can depict the actual grid characteristics based on graph theory or complex network theory, and then establish indexes to identify important nodes. The operating characteristics and node types of the power system need to be considered.
Recently, owing to its speed and accuracy in identifying important nodes in a directed network [22], the well-known PageRank algorithm has received much attention in many fields. For the power grid, a modified PageRank algorithm has been presented to assess node importance, considering nodal load properties, transmission ultimate capacity, and model structure [23]. In [24], a simplified connection diagram is constructed to reveal the cascading failure characteristics with hidden failures, and a modified PageRank algorithm is designed for ultra-fast assessment of vulnerable transmission lines in large-scale power grids. In [25], a related ranking algorithm, hypertext induced topic selection (HITS), is modified based on power flow, load capacity, and power source to identify key nodes in the power grid. In [26], the important nodes of the distribution network are determined according to an improved PageRank algorithm and an optimization coefficient of each node. These modified PageRank algorithms can assess the importance of nodes and identify key nodes by iteration. However, node importance varies with node type in the power grid, so it is necessary to assess node importance according to the characteristics of the different node types. Meanwhile, in a radial distribution network or an AC/DC hybrid power system, the modified transfer matrix H should be defined according to the topology and operating characteristics. Therefore, based on the topological information, the characteristics of node type are introduced into the PageRank algorithm to identify key nodes of the power grid.
The structure of the remainder of this paper is as follows. Section 2 states the preliminary complex network theory and power grid model. The modified PageRank algorithm is proposed for identification of key nodes by considering node type and state information transfer in Section 3. In Section 4, based on IEEE 39-Bus system, IEEE 118-Bus system, and a provincial power grid simulation system, simulation results are compared and discussed with other approaches in detail. Finally, discussions and conclusions are given in Section 5.

Complex Network Theory
Based on complex network theory, a real-world network can be abstracted into a graph composed of nodes and edges. Let $G = (V, E)$ be a graph, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of $n$ nodes and $E = \{e_1, e_2, \ldots, e_m\}$ is the set of $m$ edges. The $n \times n$ adjacency matrix $B = (\beta_{ij})$ describes the connections between nodes in $G$: when node $i$ and node $j$ are connected, $\beta_{ij} = 1$; otherwise, $\beta_{ij} = 0$. Several topological properties have been proposed to capture the structural characteristics of various networks, as listed below [27].
(1) Degree. Degree is the fundamental characteristic of a node. It describes how many other nodes the node is linked to. The degree of node $i$ in graph $G$ is defined as
$$d_i = \sum_{j=1}^{n} \beta_{ij}, \quad (1)$$
where $\beta_{ij}$ is the element of the adjacency matrix $B$, and $n$ is the number of nodes in graph $G$.
(2) Betweenness. Betweenness describes the interaction within a complex network and is divided into node betweenness and edge betweenness [28]. Node betweenness is defined as the proportion of shortest paths between any pair of nodes that travel through the node, which can be denoted by
$$B(p) = \sum_{i, j \in V,\; i \neq j \neq p} \frac{\mu_{ij}(p)}{\mu_{ij}}, \quad (2)$$
where $V$ is the set of nodes in the network, and $i$ and $j$ are nodes in graph $G$. $\mu_{ij}$ is the total number of shortest paths between nodes $i$ and $j$, and $\mu_{ij}(p)$ is the number of those shortest paths that pass through node $p$.
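The two indexes above can be computed directly from the adjacency matrix. The following is a minimal sketch, not the authors' implementation: it assumes an undirected, unweighted graph, counts each unordered pair {i, j} once, excludes the endpoints i and j themselves, and applies no normalization. The five-node matrix `adj` is a hypothetical example.

```python
from collections import deque

def degrees(adj):
    """Degree of each node: d_i = sum_j beta_ij for a 0/1 adjacency matrix."""
    return [sum(row) for row in adj]

def shortest_path_counts(adj, source):
    """BFS from source. Returns (dist, count): hop distance to every node
    (-1 if unreachable) and the number of shortest paths reaching it."""
    n = len(adj)
    dist = [-1] * n
    count = [0] * n
    dist[source], count[source] = 0, 1
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not adj[u][v]:
                continue
            if dist[v] < 0:             # first time v is reached
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                count[v] += count[u]
    return dist, count

def node_betweenness(adj):
    """B(p) = sum over unordered pairs {i, j}, both different from p,
    of mu_ij(p) / mu_ij (assumed convention, see the lead-in)."""
    n = len(adj)
    info = [shortest_path_counts(adj, s) for s in range(n)]
    b = [0.0] * n
    for i in range(n):
        dist_i, count_i = info[i]
        for j in range(i + 1, n):
            if dist_i[j] < 0:
                continue  # i and j are not connected
            for p in range(n):
                if p == i or p == j:
                    continue
                dist_p, count_p = info[p]
                # p lies on a shortest i-j path iff d(i,p) + d(p,j) = d(i,j)
                if (dist_i[p] >= 0 and dist_p[j] >= 0
                        and dist_i[p] + dist_p[j] == dist_i[j]):
                    b[p] += count_i[p] * count_p[j] / count_i[j]
    return b

# Hypothetical 5-node graph with edges 0-1, 0-2, 1-2, 1-3, 2-4
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(degrees(adj))
print(node_betweenness(adj))
```

The pairwise loop costs O(n^3) after the n breadth-first searches, which is adequate for test systems of the size considered here; for much larger grids a Brandes-style accumulation would be the usual replacement.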

Power Grid Model
Based on the complex network theory and graph theory, the actual power grid can be considered a large complex network with nodes and edges. In the power grid, the buses can be simplified as the nodes, and the transmission lines and transformer branch can be considered as the edges. Supposing the network is represented as an undirected and unweighted graph G = (V, E), with n vertices in set V and m edges in set E, the adjacency matrix B G can be used to define the connectivity of the graph. The elements of the matrix B G , i.e., β ij , denote the weight for the edge connecting the two vertices. β ij = 1 means that there is a connection between node i and node j; β ij = 0 denotes there is no connection between i and j.
The unweighted and undirected graph G ignores the direction of power transmission in the actual operating power grid. Therefore, according to graph G and the basic data of the power grid, the direction of each edge can be determined. Graph G can then be further abstracted as a directed weighted network, denoted by D = (V, E, W), where W is the weight vector consisting of the line reactances. The adjacency matrix B D is then used to describe the direction of the edge linking two nodes.
An actual network structure of the IEEE 5-Bus system is shown in Figure 1a. The graph G and network D obtained by the above method are shown in Figure 1b,c. From Figure 1b, the degree of the nodes is [2, 3, 3, 1, 1]^T, the edge betweenness is [3, 3, 1.5, 1.5, 1]^T, and nodes 2 and 3 and lines 1 and 2 are more important than the other nodes and lines. From Figure 1c, the out-degree of the nodes is [0, 2, 1, 1, 1]^T, the edge betweenness is [2, 3, 0.5, 1.5, 1.5]^T, and node 2 and line 2 are more important.
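As an illustration of the two models, the sketch below builds the undirected adjacency matrix of G and the directed adjacency and weight matrices of D from a line list. The line list is hypothetical: it is chosen only so that the resulting degree and out-degree vectors equal the ones quoted above, and it is not claimed to be the topology of Figure 1. The reactance values are placeholders, and the names `lines` and `build_matrices` are introduced here for illustration.

```python
# Hypothetical line data: (from_bus, to_bus, reactance in p.u.).
# The order (from_bus, to_bus) encodes the assumed power-flow direction.
lines = [
    (2, 1, 0.06),
    (3, 1, 0.24),
    (2, 3, 0.18),
    (4, 2, 0.12),
    (5, 3, 0.03),
]

def build_matrices(n, lines):
    """Return (b_g, b_d, w): undirected 0/1 adjacency of G, directed 0/1
    adjacency of D, and the reactance weight matrix of D (buses are 1-based)."""
    b_g = [[0] * n for _ in range(n)]
    b_d = [[0] * n for _ in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for from_bus, to_bus, reactance in lines:
        i, j = from_bus - 1, to_bus - 1
        b_g[i][j] = b_g[j][i] = 1   # G ignores direction
        b_d[i][j] = 1               # D keeps the power-flow direction
        w[i][j] = reactance         # edge weight = line reactance
    return b_g, b_d, w

b_g, b_d, w = build_matrices(5, lines)
degree = [sum(row) for row in b_g]       # row sums of the undirected matrix
out_degree = [sum(row) for row in b_d]   # row sums of the directed matrix
print(degree)
print(out_degree)
```

Keeping the 0/1 matrix and the weight matrix separate mirrors the distinction between connectivity (B) and connection strength (W) made in the text.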

PageRank Algorithm
The PageRank algorithm was first proposed by the founders of Google in 1998. It ranks the importance of a webpage using the hyperlink structure of the web system [22]. At the heart of Google's search engine, the PageRank algorithm assigns a webpage high importance when it is pointed to by other important pages with high PageRank values. In addition, based on complex network theory, the web system can be abstracted as a directed graph, in which nodes correspond to webpages and edges correspond to the hyperlinks between two webpages. The PageRank value (denoted by PR) of the $i$-th webpage $P_i$ is $PR(P_i)$, and it is calculated as follows:
$$PR(P_i) = \sum_{P_j \in D(P_i)} \frac{PR(P_j)}{N(P_j)}, \quad (3)$$
where $D(P_i)$ is the set of pages pointing to $P_i$, and $N(P_j)$ is the number of out-links from page $P_j$. The values $PR(P_j)$ in (3) are unknown in advance. Thus, an iterative process is introduced to solve the problem. The iterative formula is defined as
$$PR_{k+1}(P_i) = \sum_{P_j \in D(P_i)} \frac{PR_k(P_j)}{N(P_j)}. \quad (4)$$
For a web system with $n$ pages, the PR of each page can be calculated by iterating (4). To view the PR values of all pages before and after an iteration, we introduce an $n \times n$ matrix $T$ and an $n \times 1$ vector $R$. Then, (4) is written compactly as
$$R_{k+1}^{T} = R_k^{T} T, \quad (5)$$
where $R_k$ is the PR value vector of all pages after the $k$-th iteration, and $T$ is the row-normalized hyperlink matrix: if there is a link from page $i$ to page $j$, $T_{ij} = 1/N(P_i)$; otherwise, $T_{ij} = 0$.
T is a sparse matrix, and it is intended to act as a stochastic transition probability matrix. However, the matrix can have zero rows, which means that the corresponding nodes have no out-links. In other words, the web system has dangling nodes. To handle the dangling nodes in the web system and ensure convergence, a new matrix $G$ is introduced, and (5) can be written as
$$R_{k+1}^{T} = R_k^{T} G, \quad (6)$$
where
$$G = \alpha H + (1 - \alpha)\,\frac{1}{n}\, e e^{T}, \quad (7)$$
$$H = T + \frac{1}{n}\, a e^{T}. \quad (8)$$
In (6)-(8), the $n \times n$ matrix $G$ is referred to as the Google matrix. $\alpha \in [0, 1]$ is a parameter that controls the proportion of time in which the hyperlinks are followed rather than a new page being entered at random; it is generally set to 0.85. $e$ is a column vector of all ones. The binary vector $a$ is the dangling node vector: if webpage $i$ is a dangling node, $a_i = 1$; otherwise, $a_i = 0$. $H$ is a transfer matrix that describes the relationship and information transmission between a webpage and its linked webpages. Matrix $H$ is the combination of matrix $T$ and $a e^{T}/n$. Thus, the zero rows of $T$ are replaced with $e^{T}/n$, indicating that each dangling node is connected to all nodes equally.
To ensure that the iteration process in (6) can still converge to a unique positive vector when the initial value is arbitrarily chosen, the Google matrix G is verified to satisfy the properties of being stochastic, irreducible and aperiodic.
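The iteration with the Google matrix can be sketched without forming G explicitly. The code below is a minimal illustration of the standard PageRank power iteration under stated assumptions: the graph is given as a hypothetical list `out_links`, the start vector is uniform with entries 1/n so that the PR values sum to 1, and convergence is tested with an L1 tolerance chosen here for illustration.

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration R_{k+1}^T = R_k^T G with
    G = alpha * H + (1 - alpha) * e e^T / n and H = T + a e^T / n.
    out_links[i] lists the pages that page i links to."""
    n = len(out_links)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new = [0.0] * n
        dangling_mass = 0.0
        for i, targets in enumerate(out_links):
            if targets:
                share = r[i] / len(targets)   # T_ij = 1 / N(P_i)
                for j in targets:
                    new[j] += share
            else:
                dangling_mass += r[i]         # row i of T is zero, a_i = 1
        # Dangling rows are spread uniformly (the a e^T / n term), then the
        # result is damped and the uniform jump (1 - alpha) / n is added.
        # The jump term uses sum(r) == 1, which the iteration preserves.
        new = [alpha * (x + dangling_mass / n) + (1.0 - alpha) / n for x in new]
        if sum(abs(x - y) for x, y in zip(new, r)) < tol:
            return new
        r = new
    return r

# Hypothetical examples: a 3-page cycle, and a 2-page web with one dangling page
cycle_ranks = pagerank([[1], [2], [0]])
dangling_ranks = pagerank([[1], []])
print(cycle_ranks)
print(dangling_ranks)
```

For the two-page example, solving the fixed-point equations by hand gives a PR of 1/(2 + α) for the first page, which is a convenient check on the dangling-node handling.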

PageRank Algorithm Applied to the Power Grid
The PageRank algorithm was originally used to rank internet pages. According to the simplification principle of complex network theory, both the internet and the power grid can be simplified to a directed-weighted network model. The nodes correspond to buses, the edges correspond to the transmission lines, and the connection strength between nodes is represented by the line reactance. The comparison of the directed-weighted network model, the internet, and the power grid topology is shown in Table 1. Based on this comparison, the power grid can be simplified to the directed-weighted network model and satisfies the application conditions of the PageRank algorithm. The PageRank algorithm can therefore be applied to rank the importance of nodes in the power grid, but it has two shortcomings: (1) the electrical characteristics between nodes are not considered; (2) the transmission power between nodes is not distributed in equal proportion, whereas the basic algorithm distributes rank equally over out-links. To overcome these disadvantages, the following modified PageRank algorithm is proposed, which considers node type and transmission characteristics to assess node importance.

Modified Transfer Matrix
A power grid can be described as a directed-weighted network model $D = (V, E, W)$ with the $n \times n$ adjacency matrix $B = (\beta_{ij})$: when node $i$ and node $j$ are connected, $\beta_{ij} = 1$; otherwise, $\beta_{ij} = 0$. To avoid assuming that power is distributed equally among the transmission lines of the power grid, the equivalent impedance is used to describe the connection weight between nodes [8], and it is calculated by
$$Z_{ij}^{\mathrm{equ}} = (Z_{ii} - Z_{ij}) - (Z_{ij} - Z_{jj}) = Z_{ii} + Z_{jj} - 2 Z_{ij},$$
where $Z_{ij}$ is the element in the $i$-th row and $j$-th column of the impedance matrix $Z$. Then, in order to describe the node influence from the viewpoint of topology, the node betweenness index is modified by combining the node degree and the transmission characteristic, which gives the weighted betweenness $B_w(p)$ of node $p$. In the weighted betweenness, $\delta_{ij}$ is the total number of all possible paths between generator $i$ and load $j$, $\delta_{ij}(p)$ is the number of those paths that pass through node $p$, and $d_p$ is the degree of node $p$. Because different node types assume different responsibilities in the power grid, the node set $V$ can be divided into three sets: the generator node set GN, the load node set LN, and the contact node set CN. The corresponding modified transfer matrix of the PageRank algorithm is proposed in the following.
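The two ingredients of this subsection can be sketched as follows. This is an illustration under explicit assumptions, not the authors' formula: the equivalent impedance is taken in the commonly used form Z_ii + Z_jj - 2 Z_ij, "all possible paths" is interpreted as all simple paths (no repeated node) in the directed network, and pairs in which p is itself the generator or the load are skipped. How the path share is combined with the degree d_p and the impedance weights to form the weighted betweenness is not reproduced here. The function names and the example data are hypothetical.

```python
def equivalent_impedance(z, i, j):
    """Equivalent impedance between nodes i and j from the impedance
    matrix z (assumed form: Z_ii + Z_jj - 2 * Z_ij)."""
    return z[i][i] + z[j][j] - 2 * z[i][j]

def simple_paths(adj, source, target):
    """Yield every simple path from source to target in the directed graph."""
    stack = [(source, [source])]
    while stack:
        u, path = stack.pop()
        if u == target:
            yield path
            continue
        for v, linked in enumerate(adj[u]):
            if linked and v not in path:
                stack.append((v, path + [v]))

def path_share(adj, generators, loads, p):
    """Sum over generator-load pairs (i, j) of delta_ij(p) / delta_ij."""
    total = 0.0
    for g in generators:
        for load in loads:
            if p == g or p == load:
                continue  # assumed convention: endpoints are not counted
            paths = list(simple_paths(adj, g, load))
            if paths:
                through_p = sum(1 for path in paths if p in path)
                total += through_p / len(paths)
    return total

# Hypothetical directed network with edges 1->0, 2->0, 1->2, 3->1, 4->2;
# nodes 3 and 4 play the role of generators and node 0 the role of a load.
adj_d = [
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(path_share(adj_d, [3, 4], [0], 1))
print(path_share(adj_d, [3, 4], [0], 2))
```

Enumerating simple paths grows exponentially with network size, so for large grids a depth limit or a flow-based surrogate would be needed; the sketch is only meant to make the definition of the path counts concrete.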

Generator Nodes
Generator nodes mainly undertake the task of transmitting the power to other adjacent nodes. It is the first node of the power transmission path and guarantees uninterrupted power supply in the power grid. According to the actual power grid topology, the generator node has at least one outlet, and the node voltage and rated active power are known. In other words, the generator node is the PV node type.
To assess the importance of generator nodes, there are two aspects that should be considered: (1) how much output power the generator node transfers to other adjacent nodes; (2) the power weight value of the generator node itself. Therefore, considering the topological importance and power weight value of generator nodes, the modified transfer matrix H is defined by mapping the relationship of the adjacency matrix, shown as follows.
Here, GN is the set of generator nodes; $\beta_{ij}$ is the element of the adjacency matrix $B$; $D_i$ is the set of out-links of generator node $i$; $P_{Gi}$ is the rated power of generator node $i$; $B_w(i)$ is the weighted betweenness of generator node $i$; and $\max(P_G)$ is the maximum generator capacity in the power grid.

Load Nodes
Load nodes mainly consume the power supplied by generators. The load power, including active power and reactive power, is known, so the load node can be regarded as a PQ node. Based on the power grid structure and the power flow direction, there are two types of load nodes: (1) nodes at which all of the transmitted power is absorbed; (2) nodes at which only part of the transmitted power is absorbed and the remaining part is transmitted to other load nodes.
For the first type, we know that those nodes only have inflow power to supply the load. Thus, only the load capacity needs to be considered. The element of the modified transfer matrix H is defined by mapping with the adjacency matrix, shown as follows.
Here, DN is the set of load nodes, and $S_{Di}$ is the capacity of load node $i$. For the second type, the load nodes not only consume power but also transmit power. Therefore, considering the topological importance and the load power, the element of the modified transfer matrix H is defined by mapping with the adjacency matrix, as given below.
Here, $\omega = \left| \dfrac{U_{\max} - U_i}{U_i - U_{\min}} \right|$ is the voltage deviation, $U_i$ is the voltage of node $i$, and $U_{\max}$ and $U_{\min}$ are the upper and lower voltage limits of the nodes.

Contact Nodes
Contact nodes mainly undertake the task of power transmission and make the connection between generator and load. The total power of these nodes is always 0, and there is no power consumption. Therefore, introducing the topological information and voltage of contact nodes into the modified transfer matrix H, the elements are defined as follows.
Therefore, the modified transfer matrix H is defined by (3)-(6). Characteristics such as the topological importance, the active power of generators, the power load, and the node voltage are all considered in the modified transfer matrix H. Thus, H represents the importance of nodes in the actual power grid more comprehensively.

Convergence of Google Matrix G m
It can be seen that the modified transfer matrix H is not stochastic. That is, the sum of rows is not equal to 1. For this reason, a background node is added to make the sum of rows equal to 1 in the modified transfer matrix H [29]. The modified matrix M is shown as follows.
where b is the n × 1-dimension vector. Based on the modified matrix M, for considering the convergence of Google matrix G m , the two vectors E and V are defined as follows.
Therefore, the modified Google matrix G m can be calculated by Because the nodes consist of two parts, i.e., power grid node and background node, the PR value of all nodes can be calculated by where PR gk is the PR value of power grid node after k iterations. PR bk is the PR value of background node after k iterations. G m is the (n + 1) × (n + 1)-dimension modified Google matrix.
To guarantee the convergence of the modified PageRank algorithm by the modified Google matrix G m , G m should satisfy the properties of being stochastic, irreducible and aperiodic.
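The role of the background node can be illustrated with a small sketch. Everything structural here is an assumption made for illustration, because the exact forms of M, E, and V are given by the paper's equations: M is taken as H extended by one column b with b_i = 1 - (sum of row i of H) and by one absorbing row for the background node, which requires non-negative H with row sums not exceeding 1, and a standard uniform Google-type damping stands in for the construction with E and V.

```python
def add_background_node(h):
    """Extend a non-stochastic n x n matrix h to an (n+1) x (n+1)
    row-stochastic matrix by adding a background node (assumed form)."""
    n = len(h)
    m = []
    for row in h:
        row_sum = sum(row)
        if min(row) < 0 or row_sum > 1.0 + 1e-12:
            raise ValueError("sketch assumes 0 <= H_ij and row sums <= 1")
        m.append(list(row) + [max(0.0, 1.0 - row_sum)])  # b_i fills the row up to 1
    m.append([0.0] * n + [1.0])                         # background node row
    return m

def damped_iteration(m, alpha=0.85, tol=1e-12, max_iter=10000):
    """Iterate PR^T <- PR^T (alpha * M + (1 - alpha) * e e^T / size).
    Returns (pr_grid, pr_background)."""
    size = len(m)
    pr = [1.0 / size] * size
    for _ in range(max_iter):
        new = [(1.0 - alpha) / size] * size
        for i in range(size):
            for j in range(size):
                new[j] += alpha * pr[i] * m[i][j]
        converged = sum(abs(x - y) for x, y in zip(new, pr)) < tol
        pr = new
        if converged:
            break
    return pr[:-1], pr[-1]

# Hypothetical 2-node transfer matrix whose rows sum to less than 1
h_example = [[0.0, 0.6],
             [0.3, 0.0]]
m_example = add_background_node(h_example)
pr_grid, pr_background = damped_iteration(m_example)
print(m_example)
print(pr_grid, pr_background)
```

Only the grid part `pr_grid` would be used to rank buses; the background node merely absorbs the probability mass that H does not distribute.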

Property 1. G m is stochastic.
Proof. A stochastic matrix is non-negative, and each of its rows sums to 1 [22]. The modified Google matrix G m in (9) clearly satisfies the condition G m > 0. Each row sums to 1 because the background node is added to the matrix H to guarantee that the modified matrix M is stochastic, and the two vectors E and V are also stochastic. Therefore, G m is stochastic.

Property 2. G m is irreducible.
Proof. Based on the conclusion in [22], a matrix is irreducible when its directed graph is strongly connected. Based on the definitions in (7) and (8), it can be proven that each element of the modified Google matrix G m satisfies the condition (G m) ij > 0. Therefore, the corresponding network of G m is strongly connected, which shows its irreducibility.

Property 3. G m is aperiodic.
Proof. According to [22], an irreducible matrix that has at least one positive diagonal element is primitive. Meanwhile, a primitive matrix must be aperiodic [22]. Based on the basic PageRank definition, the (n + 1)-th diagonal element of G m is 1, and the other diagonal elements of G m also satisfy the condition (G m) ii > 0. Thus, G m is primitive, implying its aperiodicity.
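The three properties can also be checked numerically for any candidate matrix. The sketch below is a generic checker, independent of how G m is constructed: irreducibility is tested as strong connectivity of the directed graph that has an edge i -> j whenever the entry (i, j) is positive, and primitivity is tested only through the sufficient condition used in the proof (irreducible with at least one positive diagonal element). The function names and example matrices are hypothetical.

```python
def is_stochastic(g, tol=1e-9):
    """Non-negative entries and every row summing to 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in g)

def is_irreducible(g):
    """True if the directed graph of positive entries is strongly connected."""
    n = len(g)

    def reachable(start, forward):
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                weight = g[u][v] if forward else g[v][u]
                if weight > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    # Strongly connected iff node 0 reaches every node and every node reaches node 0
    return len(reachable(0, True)) == n and len(reachable(0, False)) == n

def passes_primitivity_test(g):
    """Sufficient condition only: irreducible with a positive diagonal entry."""
    return is_irreducible(g) and any(g[i][i] > 0 for i in range(len(g)))

positive = [[0.5, 0.5], [0.5, 0.5]]    # all entries positive
two_cycle = [[0.0, 1.0], [1.0, 0.0]]   # irreducible but periodic
absorbing = [[1.0, 0.0], [0.5, 0.5]]   # node 0 cannot reach node 1
for name, mat in [("positive", positive), ("two_cycle", two_cycle), ("absorbing", absorbing)]:
    print(name, is_stochastic(mat), is_irreducible(mat), passes_primitivity_test(mat))
```

A matrix can be primitive without a positive diagonal entry, so a negative result from `passes_primitivity_test` is inconclusive for irreducible matrices in general; for the two-cycle example it happens to coincide with true periodicity.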

Verification Indexes
When transmission nodes are disconnected from the power grid, the power transmission path is damaged, and the power transmission capacity is affected. Network efficiency $E$ is used to assess the network information transmission from the perspective of the entire network. It is defined as follows [30]:
$$E = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_{ij}},$$
where $n$ represents the number of power grid nodes, and $d_{ij}$ represents the shortest path distance between node $i$ and node $j$. When a power grid node is disturbed, the power transmission path and transmission capacity are affected. To verify the change of transmission capacity in the power grid, the network transmission efficiency of the power grid is defined as
$$\eta_k = \frac{E_k}{E_0} \times 100\%,$$
where $E_0$ is the network efficiency under the normal running state, and $E_k$ is the network efficiency after the $k$-th fault of the power grid. Based on the modified PageRank algorithm in Equations (11)-(18), the convergence of the algorithm is also verified. According to the process of the modified PageRank algorithm, the process of identification of key nodes in the power grid is shown in Figure 2. Here, the initial PR value of each node is 1.
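The verification index can be sketched as follows. The code is an illustration under assumptions that the text does not fix: distances are hop counts on an undirected, unweighted graph, unreachable pairs contribute 0, removed nodes are excluded from all paths, and the normalization keeps the original number of nodes n after removals. The helper `attack_curve` and the three-node example are hypothetical.

```python
from collections import deque

def hop_distances(adj, source, removed):
    """BFS hop distances from source, never entering removed nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(adj[u]):
            if linked and v not in removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_efficiency(adj, removed=frozenset()):
    """E = 1 / (n (n - 1)) * sum over ordered pairs i != j of 1 / d_ij,
    with n the original node count (assumed convention)."""
    n = len(adj)
    total = 0.0
    for s in range(n):
        if s in removed:
            continue
        dist = hop_distances(adj, s, removed)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))

def attack_curve(adj, ranking):
    """Remove nodes in ranked order; return E_k / E_0 in percent after each removal."""
    e0 = network_efficiency(adj)
    removed = set()
    curve = []
    for node in ranking:
        removed.add(node)
        curve.append(100.0 * network_efficiency(adj, removed) / e0)
    return curve

# Hypothetical 3-node path graph 0 - 1 - 2
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(network_efficiency(path_graph))
print(attack_curve(path_graph, [2, 1]))
```

For the path graph, the six ordered pairs contribute 1 + 1/2 + 1 + 1 + 1/2 + 1 = 5, so E_0 = 5/6; removing the end node 2 leaves 2/6, a transmission efficiency of 40%, and removing node 1 as well leaves 0%.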

IEEE 39-Bus System
In this part, the IEEE 39-Bus system is used as the simulation analysis system, which has 39 nodes, including 10 generator nodes and 19 load nodes. According to the capacity of the generators and power loads, the 39 nodes can be divided into 9 generator nodes, 20 load nodes, and 10 contact nodes. The corresponding directed-weighted network model is shown in Figure 3. To verify the accuracy of the modified PageRank algorithm, the PR values are calculated, and the key nodes are identified in the simulation test system. According to the sorting of PageRank values, the first 10 key nodes identified by the proposed modified PageRank algorithm (PMPR) are compared with those identified by the basic PageRank algorithm (BPR), the combination of random matrix and entropy theory (CRMET), and the node importance index method (NII), as shown in Table 2. To verify the validity of the modified PageRank algorithm for key node identification, an intentional attack is simulated on the test system in normal operation. The first 10 key nodes identified by PMPR, BPR, CRMET, and NII are removed in turn. The network transmission efficiency values are shown in Figure 4. According to the simulation results, when all 10 key nodes are removed, the network transmission efficiency of the power grid is lower with PMPR than with the CRMET and NII methods: it is 64.23% with CRMET and 37.06% with NII. Comparing BPR with PMPR, the network transmission efficiency with BPR is lower than that with PMPR after the first two nodes are removed. BPR removes nodes 4 and 8, which both have three incoming lines and carry loads of 500 MW and 520 MW, respectively; from the network transmission efficiency perspective, removing them may influence the power grid more than removing nodes 6 and 29, the first two nodes selected by PMPR.
However, nodes 6 and 29 are connected to generators 31 and 38 and serve as the key nodes through which these generators transmit power. From the perspective of node types, nodes 6 and 29 are therefore more important than nodes 4 and 8. After all 10 key nodes identified by BPR were removed, the network transmission efficiency was 37.64%, whereas after all 10 key nodes identified by PMPR were removed, it was as low as 5.62%. The results show that the 10 key nodes identified by PMPR have more influence on the transmission capacity than those identified by the other three methods.

IEEE 118-Bus System
In the static deliberate attack mode, the IEEE 118-Bus system is also used to assess node importance. The identified key nodes are removed in turn according to three methods: the proposed modified PageRank algorithm (PMPR); the modified PageRank algorithm (MPR), which takes the importance of nodal load, nodal load capacity, and network topology into account [23] but does not consider the node types; and the model based on co-citation (MBCC)-hypertext induced topic selection (HITS) algorithm (MBCC-HITS) [25]. The corresponding transmission efficiency values are shown in Figure 5. As shown in Figure 5, the transmission efficiency with PMPR is larger than that with MBCC-HITS only when the first 3 or the first 9 nodes are removed. The reason is that the 3rd node in the MBCC-HITS ranking is node 49, which is connected to the outgoing line of node 66, itself connected to a load and a generator, and the 9th node is node 92, which is connected to a generator node. Removing them may have more influence on the power grid from the network transmission efficiency perspective. However, the network transmission efficiency of the grid was reduced to 45.4%, 29.84%, and 5.12% after the first 20 identified key nodes were removed by MBCC-HITS, MPR, and PMPR, respectively. The reduction in network transmission efficiency with PMPR was greater than with the MPR and MBCC-HITS methods. In other words, the key nodes identified by PMPR had the greatest impact on the power grid. Therefore, compared with the other two methods, the PMPR proposed in this paper takes topological information, node type, and operating characteristics into account comprehensively, and it is more accurate for identifying key nodes in the power grid.

A Provincial Power Grid Test System
To verify the practicability of the proposed modified PageRank algorithm, key nodes were identified in an actual operating power grid. The directed-weighted model of a provincial power grid simulation system in China was established, where the corresponding electrical parameters were known. The node importance index values of the provincial power grid were obtained as shown in Figure 6. In descending order of importance, the first 15 non-generator nodes were selected as the key nodes of the system, as shown in Table 3.

Figure 6. Node importance index value in a provincial power grid test system.

Table 3. Key nodes in the provincial power grid test system.

Ranking   Node   Substation   Voltage/kV
1         98     YX1          330
2         103    MY1          330
3         64     NL1          330
4         79     LF1          330
5         95     YD1          330
6         63     CY1          330
7         42     YH1          750
8         68     XY1          750
9         99     ZY1          330
10        96     GY1          330
11        102    XS1          330
12        90     XX1          330
13        54     HL1          330
14        106    TY1          330
15        97     HZ1          330

The top 15 key nodes shown in Table 3 include two 750 kV substations. These substations mainly undertake power transmission and conversion, and they play a pivotal role in the provincial power grid. Removing them will split the power grid, so that the new energy output power in a certain area cannot be fully consumed, which will reduce the utilization rate of new energy. The remaining key nodes are 330 kV substations; among them, nodes 103, 64, 79, 63, 54, and 106 are all connection nodes for the power output of thermal power plants. Removing these nodes will separate the thermal power plants from the main grid and may even result in power imbalance in the grid. In conclusion, the simulation results verify that the key nodes identified by the proposed modified PageRank algorithm are reasonable and practical.

Discussion and Conclusions
By analyzing the complex network theory and the famous PageRank algorithm, a modified PageRank algorithm is presented to assess the node importance of a power grid. The proposed method comprehensively takes the following factors into account: network topology, nodal type, and state transmission information. Theoretical analysis and case studies show the necessity of considering these factors to identify the key power grid nodes and the validity of the modified PageRank algorithm. However, it is not comprehensive to construct the modified transfer matrix by the proposed three indexes only. It needs to be analyzed by combining with the operation characteristics of the actual power system. In addition, it is necessary to study the distributed parallel computation to improve the fast solution of the modified PageRank algorithm from a technical point of view. Finally, the evolution process of the key nodes in power system needs further analysis to reveal its mechanism. Nevertheless, the following conclusions are still drawn from our study.
Simulation results indicate that the proposed method, by considering the node type and topological information, is more feasible and unbiased than the PageRank algorithm that does not consider electrical characteristics. In the IEEE 39-Bus system, the network transmission efficiency of the grid is as low as 5.62% with the proposed algorithm (PMPR), compared with 64.23%, 37.06%, and 37.64% for CRMET, NII, and BPR, respectively. In the IEEE 118-Bus system, the network transmission efficiency of the grid is reduced to 45.4%, 29.84%, and 5.12% by MBCC-HITS, MPR, and PMPR, respectively.
The validity of the key power nodes is verified. The proposed modified algorithm is more accurate and comprehensive, and it can be easily applied to the practical power grid based on the practical application of a provincial power grid test system. Based on the above research and analysis, the proposed method in this paper is helpful to prevent the occurrence of cascading failure in a power system and provide actual reference for power system operators. At the same time, the research can also be used to identify the key nodes of a power system with renewable energy sources and AC/DC hybrid power grids in future research.