Optimal Transmission Switching for Short-Circuit Current Limitation Based on Deep Reinforcement Learning

Abstract: The gradual expansion of power transmission networks leads to an increase in short-circuit current (SCC), which threatens the secure operation of transmission networks when the SCC exceeds the interrupting capacity of the circuit breakers. In this regard, optimal transmission switching (OTS) is proposed to reduce the short-circuit current while maximizing the loadability with respect to voltage stability. However, the OTS model is a complex combinatorial optimization problem with binary decision variables. To address this problem, this paper employs a deep Q-network (DQN)-based reinforcement learning (RL) algorithm to solve the OTS problem. Case studies on the IEEE 30-bus system and 118-bus system are presented to demonstrate the effectiveness of the proposed method. The numerical results show that the DQN-based agent can select effective branches at each step and that the SCC is reduced after implementing the OTS strategies.


Motivations
With the gradual expansion of power transmission networks, the electrical distance between substations has become shorter, which has, in turn, led to an increase in short-circuit current (SCC). When the SCC magnitude exceeds the interrupting capacity of circuit breakers (CBs), the circuit breakers may not be able to interrupt the electric arc. In this case, the branch cannot be opened and the short-circuit fault is not isolated, which damages the CBs and, more importantly, endangers the security of the power system. To address this problem, the replacement of CBs with ones of higher interrupting capacity and the installation of fault current limiters [1][2][3] have been proposed. However, both countermeasures require investment in equipment. In contrast, network reconfiguration can reduce the SCC economically, as it does not require investment in equipment.
However, transmission network reconfiguration is a complex combinatorial optimization problem, which is difficult to compute using conventional mathematical programming algorithms. Inspired by the success of reinforcement learning (RL) in solving combinatorial optimization problems, this paper employs the deep Q-network (DQN)-based RL algorithm to solve the OTS problem with the purpose of reducing the short-circuit current while maintaining the maximum loadability of the transmission network.

Optimal Transmission Switching
Transmission network reconfiguration is also called optimal transmission switching (OTS) [4]. It has been reported that OTS can be used to reduce transmission loss [5,6], relieve overloads and voltage violations [7,8], and reduce operating costs [4,9]. Given these benefits, OTS has been incorporated into unit commitment (UC) [10,11] and transmission expansion planning (TEP) [12] in order to enhance the flexibility of transmission system operation and planning. In [13,14], OTS and UC are coordinated to reduce short-circuit current. In the literature, OTS is usually modeled as a mixed-integer programming (MIP) problem with a large number of binary variables, one for each branch in the power network. Therefore, OTS is a complex combinatorial optimization problem. To enhance the computation efficiency, some efforts focus on computational strategies including solution space reduction [15] and sensitivity analysis [16].

Application of Reinforcement Learning in Power Engineering
In recent years, reinforcement learning has gained more attention as an alternative method for solving combinatorial optimization problems [17]. In the field of power engineering, RL-based methods have been proposed for operation planning [18], voltage control [19], wide-area damping control [20], and so on. In [21], proximal policy optimization (PPO) is used to learn the control strategy for power systems' dynamic security. In [22], the multi-agent deep deterministic policy gradient (MADDPG) algorithm is used to regulate the static var compensators (SVCs) in order to enhance the voltage stability of urban power grids.

Organization of This Paper
The rest of this paper is organized as follows. The problem's description and formulation are discussed in Section 2. The deep reinforcement learning-based optimal transmission-switching strategy is proposed in Section 3. In Section 4, case studies on two benchmark systems are presented. Finally, the study's conclusions are presented in Section 5.

Computation of Short-Circuit Current
In high-voltage power networks, the short-circuit current of a three-phase short-circuit fault is usually higher than that of other types of short-circuit faults. Therefore, the three-phase short-circuit current is computed to determine whether the maximum short-circuit current has exceeded the interrupting capacity of the circuit breakers.
In addition, as the resistance is significantly smaller than the reactance for high-voltage transmission lines and transformers, the resistances of all the devices are neglected in practical applications [23]. Under this assumption, the nodal admittance matrix $Y_{scc}$ for short-circuit current computation is different from the one for power flow computation. The elements of the nodal admittance matrix $Y_{scc}$ can be computed by (1) and (2):

$$Y_{ii} = \sum_{k \in L_i} \frac{\pi_k}{j x_k} + \sum_{g \in G_i} \frac{1}{j x''_{dg}} + j b_{Ci} \tag{1}$$

$$Y_{ij} = -\sum_{k \in L_{ij}} \frac{\pi_k}{j x_k} \tag{2}$$

where $Y_{ii}$ and $Y_{ij}$ are the diagonal and the off-diagonal elements of $Y_{scc}$. $L$ is the set of branches, including the transmission lines and the transformers; $L_i$ denotes the branches in $L$ connected to node $i$, and $L_{ij}$ denotes the branches in $L$ between node $i$ and node $j$. $x_k$ is the reactance of the $k$th branch, while $\pi_k$ denotes the operating status of the $k$th branch: $\pi_k = 1$ indicates that the $k$th branch is closed, and $\pi_k = 0$ indicates that it is opened. $G$ is the set of generators, with $G_i$ denoting the generators connected to node $i$, while $x''_{dg}$ is the d-axis sub-transient reactance of the $g$th generator. $b_{Ci}$ is the susceptance of the shunt capacitor at node $i$. After forming the nodal admittance matrix $Y_{scc}$, the nodal impedance matrix $Z_{scc}$ can be computed by the inversion of $Y_{scc}$:

$$Z_{scc} = Y_{scc}^{-1} \tag{3}$$

As we are focusing on the three-phase short-circuit current, the SCC of node $i$ can be computed by (4):

$$I^{*}_{scc,i} = \frac{V^{0}_{i}}{|Z_{ii}|} \tag{4}$$

where $I^{*}_{scc,i}$ is the per-unit value of the SCC, $V^{0}_{i}$ is the voltage magnitude under the normal operating condition, which can be approximated by 1.0 p.u., and $Z_{ii}$ is the $i$th diagonal element of the nodal impedance matrix $Z_{scc}$.
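The computation in (1)-(4) can be illustrated with a minimal NumPy sketch. The function name, the 0-based bus indexing, the data layout of `branches` and `generators`, and the 2-bus toy system are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def short_circuit_currents(n_bus, branches, status, generators, shunt_b=None, v0=1.0):
    """Per-unit three-phase SCC at every bus, following (1)-(4).

    branches   : list of (from_bus, to_bus, x_k), 0-based bus indices (assumed layout)
    status     : list of pi_k values (1 = closed, 0 = opened)
    generators : list of (bus, x_dg) with d-axis sub-transient reactances
    shunt_b    : optional sequence of shunt capacitor susceptances b_Ci
    """
    y = np.zeros((n_bus, n_bus), dtype=complex)
    for (i, j, x_k), pi_k in zip(branches, status):
        y_k = pi_k / (1j * x_k)  # an opened branch (pi_k = 0) contributes nothing
        y[i, i] += y_k
        y[j, j] += y_k
        y[i, j] -= y_k
        y[j, i] -= y_k
    for bus, x_dg in generators:
        y[bus, bus] += 1.0 / (1j * x_dg)
    if shunt_b is not None:
        y[np.diag_indices(n_bus)] += 1j * np.asarray(shunt_b, dtype=float)
    z = np.linalg.inv(y)            # nodal impedance matrix, as in (3)
    return v0 / np.abs(np.diag(z))  # SCC magnitude at each bus, as in (4)


# Hypothetical 2-bus system: one generator at bus 0, two parallel lines to bus 1.
branches = [(0, 1, 0.1), (0, 1, 0.1)]
generators = [(0, 0.1)]
scc_closed = short_circuit_currents(2, branches, [1, 1], generators)    # about [10.0, 6.67] p.u.
scc_switched = short_circuit_currents(2, branches, [1, 0], generators)  # about [10.0, 5.0] p.u.
```

In this toy case, opening one of the two parallel lines increases the impedance seen from bus 1 and lowers its SCC, which is the effect that transmission switching exploits. The sketch requires every bus to remain connected to at least one generator; otherwise the admittance matrix is singular and cannot be inverted.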

Formulation of Optimal Transmission Switching for Short-Circuit Current Limitation
In this paper, the optimal transmission-switching strategy is studied from the perspective of transmission network development. During the long-term development of transmission networks, there may be a period in which the network is confronted with a short-circuit current problem. Instead of minimizing the operating cost by combining OTS with unit commitment, the proposed OTS model attempts to reduce the short-circuit current while maximizing the loadability of the transmission network. The objective of the proposed OTS model is three-fold, as given in (5)-(8). Here, $I^{limit}_{scc,i}$ is the maximum limit of the short-circuit current at node $i$ and $I_{scc,i}$ is the actual value of the short-circuit current. $\lambda_0$ and $\lambda_{OTS}$ are the maximum loadability coefficients before and after transmission switching, respectively, computed by the continuation power flow (CPF) [24]. $N_L$ is the number of branches in the power network.
It is clear that the objective $f_1$ minimizes the over-current of SCC, while the objective $f_2$ attempts to maintain the loadability of the power network after transmission switching. Furthermore, the objective $f_3$ is set to reduce the number of branches that need to be switched off. The constraints are listed as follows:
(1) The network connectivity constraint. In other words, the transmission-switching strategy should not cause network splitting.
(2) The power flow constraint:

$$P_{G,i} - P_{D,i} = V_i \sum_{j} V_j \left( G_{ij} \cos\delta_{ij} + B_{ij} \sin\delta_{ij} \right) \tag{9}$$

$$Q_{G,i} - Q_{D,i} = V_i \sum_{j} V_j \left( G_{ij} \sin\delta_{ij} - B_{ij} \cos\delta_{ij} \right) \tag{10}$$

(3) The branch power flow security constraint:

$$|S_{ij}| \le S^{max}_{ij} \tag{11}$$

(4) The bus voltage magnitude security constraint:

$$V^{min}_{i} \le V_i \le V^{max}_{i} \tag{12}$$

where $P_{G,i}$ and $P_{D,i}$ are the active power generation and the active power load at node $i$, while $Q_{G,i}$ and $Q_{D,i}$ are the reactive power generation and the reactive power load. $V_i$ and $V_j$ are the bus voltage magnitudes. $G_{ij}$ and $B_{ij}$ are the real part and the imaginary part of the corresponding element in the nodal admittance matrix for power flow computation. $\delta_{ij}$ denotes the phase angle difference between node $i$ and node $j$. $S_{ij}$ is the power flow from node $i$ to node $j$ and $S^{max}_{ij}$ is its maximum limit. $V^{min}_{i}$ and $V^{max}_{i}$ are the security limits of the bus voltage magnitude.
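The network connectivity constraint (1) can be checked with a simple graph traversal. The sketch below is one possible way to do this, assuming 0-based bus indices and a branch list of `(from_bus, to_bus)` pairs; the function names are hypothetical and the paper does not state how connectivity is tested.

```python
from collections import deque


def is_connected(n_bus, branches, status):
    """Return True if all buses are reachable from bus 0 over closed branches."""
    adjacency = [[] for _ in range(n_bus)]
    for (i, j), pi_k in zip(branches, status):
        if pi_k == 1:
            adjacency[i].append(j)
            adjacency[j].append(i)
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n_bus


def switchable_branches(n_bus, branches, status):
    """Indices of closed branches that can be opened without splitting the network."""
    candidates = []
    for k, pi_k in enumerate(status):
        if pi_k == 0:
            continue
        trial = list(status)
        trial[k] = 0
        if is_connected(n_bus, branches, trial):
            candidates.append(k)
    return candidates


# Hypothetical 3-bus ring: any single branch can be opened, but not two.
ring = [(0, 1), (1, 2), (0, 2)]
print(switchable_branches(3, ring, [1, 1, 1]))  # [0, 1, 2]
print(switchable_branches(3, ring, [0, 1, 1]))  # []
```

A check of this kind can also be used to restrict the set of candidate branches before any switching decision is made.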

Brief Introduction to Deep Q-Learning
In the general framework of reinforcement learning, an agent interacts with the environment $E$ and, more importantly, learns to select the actions $a$ based on the rewards $r$ provided by the environment. Intuitively, the environment $E$ represents the problem to be solved. At each step $t$, the agent generates an action $a_t$ according to the partial or complete observation of the current state $s_t$ of the environment $E$ based on its policy $\pi(a_t | s_t)$. After implementing the action $a_t$, the environment $E$ returns a reward $r_{t+1}$ and the new state $s_{t+1}$ to the agent. During the procedure of RL, the agent learns to improve the policy $\pi(a_t | s_t)$ in order to maximize the aggregated rewards.
The conventional algorithm for RL is the Q-learning algorithm. The optimal Q-function $Q^{*}(s, a)$ can be defined as the maximum return that can be obtained starting from the current observation $s$ by taking the action $a$ and following the optimal policy thereafter. The optimal Q-function obeys the Bellman optimality equation shown in (13):

$$Q^{*}(s, a) = E_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right] \tag{13}$$

where $E[\cdot]$ denotes the expectation of the immediate reward $r$ plus the discounted maximum future return, $\gamma$ is the discount coefficient, and $s'$ and $a'$ are the possible next states and the corresponding actions. The basic idea behind many reinforcement learning algorithms is to estimate the Q-function by using the Bellman equation as an iterative update, as shown in (14):

$$Q_{i+1}(s, a) = E_{s'}\left[ r + \gamma \max_{a'} Q_{i}(s', a') \,\middle|\, s, a \right] \tag{14}$$

When the action space grows, it is impractical to use a Q-table to form the optimal policy. To address this problem, the deep Q-network-based RL algorithm [25] was proposed by Google DeepMind. In DQN, a neural network with parameters $\theta$ is used to approximate the Q-function, as shown in (15):

$$Q(s, a; \theta) \approx Q^{*}(s, a) \tag{15}$$

Then, the Q-network can be trained by minimizing a sequence of loss functions:

$$L_i(\theta_i) = E_{s, a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right] \tag{16}$$

$$y_i = E_{s'}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right] \tag{17}$$

where $y_i$ is the target for iteration $i$ and $\rho(\cdot)$ is a probability distribution over sequences and actions that is referred to as the behavior distribution. The parameters from the previous iteration $\theta_{i-1}$ remain fixed when optimizing the loss function $L_i(\theta_i)$.
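As a small numerical illustration of the target in (17) and the loss in (16), the sketch below evaluates both on a minibatch of already-computed Q-values. It is only a sketch under assumptions: the function names are hypothetical, the Q-values are plain arrays rather than network outputs, and the `done` mask that drops the bootstrap term at terminal states is a common practical addition that does not appear in (17).

```python
import numpy as np


def td_targets(rewards, next_q_values, done, gamma=0.99):
    """Sampled version of (17): y = r + gamma * max_a' Q(s', a'; theta_prev).

    next_q_values has shape (batch, n_actions) and is computed with the
    fixed parameters of the previous iteration. `done` (assumption) is 1
    for terminal transitions, where no future return is added.
    """
    return rewards + gamma * (1.0 - done) * next_q_values.max(axis=1)


def dqn_loss(q_values, actions, targets):
    """Sampled version of (16): mean squared error between y and Q(s, a; theta)."""
    chosen = q_values[np.arange(len(actions)), actions]
    return np.mean((targets - chosen) ** 2)


# Hypothetical minibatch of two transitions with two actions each.
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.0, 2.0], [3.0, 1.0]])
done = np.array([0.0, 1.0])
targets = td_targets(rewards, next_q, done, gamma=0.5)  # [2.0, 0.0]

q_now = np.array([[1.0, 2.0], [0.5, 0.0]])
actions = np.array([1, 0])
loss = dqn_loss(q_now, actions, targets)                # 0.125
```

In a full DQN implementation, the gradient of this loss with respect to the Q-network parameters would be computed by an automatic differentiation framework.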

The Proposed Methodology
We consider the procedure of optimal transmission switching for short-circuit current limitation as a Markov decision process (MDP). The settings of the MDP for optimal transmission switching are as follows.
(1) The environment. The targeted transmission network is considered as the interactive environment for the DRL agent. The computation of the power flow, short-circuit current, and maximum load can be used to compute the rewards.
(2) The state. The state of the environment is set as the network structure, which is represented by the operating state of the branches. In this regard, the state $s$ can be formulated as (18):

$$s = [\pi_1, \pi_2, \ldots, \pi_{N_L}] \tag{18}$$

(3) The action. The DRL agent chooses a branch to be switched off at each step.
(4) The reward. The reward is an important component of reinforcement learning, as the agent tunes the parameters of the Q-network according to the reward. Based on the OTS model described in Section 2, the reward function can be defined by (19), where $c_{sc} = \sum_{i \in B} \frac{I^{limit}_{scc,i} - I_{scc,i}}{I^{limit}_{scc,i}}$.
(5) The training procedure. During the training procedure, the DRL agent interacts with the environment and thus learns to maximize the reward by selecting the most promising action. As the action is generated by the Q-network, which is a deep neural network that takes the state as the input and outputs the Q-values for all the potential actions, the training procedure can be viewed as the process of tuning the Q-network. Firstly, the Q-network is initialized with random weights. At the start of each episode, the state of the environment is reset, which means that all the branches are closed and the initial network structure is restored. Then, at each step, a random number is drawn. If it is lower than the exploration threshold ε (usually 0.1), a random branch is selected; otherwise, the state is fed into the Q-network and the branch with the highest Q-value is selected. The selected branch is switched off, and the state and the corresponding network structure are updated. The reward in (19) is computed from the power flow, short-circuit current, and continuation power flow computations. The record $(s_t, a_t, r_t, s_{t+1})$ is stored in the replay memory $D$. If the SCC at all the nodes is lower than the limit, the episode is terminated. The episodes repeat until the maximum number of episodes is reached. In addition, during the interactive training procedure, when the size of the replay memory $D$ is larger than the pre-set capacity $N_D$, the recorded instances in the replay memory $D$ are used to learn the weights of the Q-network, with gradients computed by back propagation and applied by an optimizer such as Adam.
The pseudo-code of the training procedure is given in Algorithm 1.

Algorithm 1 Training Procedure of OTS Agent
(1) Input: the network structure of the power system
(2) Output: the well-trained Q-network
(3) Initialize the Q-network and the replay memory $D$ with capacity $N_D$
(4) for episode = 1 to M, do:
(5)   Reset the state
(6)   for t = 1 to T, do:
(7)     With probability ε, select a random action; otherwise, generate the action via the Q-network
(8)     Update the state
(9)     Perform power flow computation, short-circuit computation, and continuation power flow computation according to the changed network structure under the current state
(10)    Compute the reward by (19)
(11)    Store $(s_t, a_t, r_t, s_{t+1})$ in the replay memory $D$
(12)    if the SCC at all the nodes is lower than the limit, do:
(13)      End the loop of t
(14)    end if
(15)  end for
(16)  if the size of $D$ is larger than $N_D$, do:
(17)    Sample a minibatch of S samples from $D$
(18)    Update the parameters of the Q-network by a gradient descent step on (16)
(19)  end if
(20) end for

(6) Decision making for OTS-based short-circuit current limitation. With the well-trained Q-network, the Markov decision process for optimal transmission switching starts with the initial network structure. At each step, the Q-network generates the action, that is, the branch to be switched off, that is expected to obtain the highest reward. The action is implemented, and the short-circuit current is then computed under the changed network structure. If there is any node at which the short-circuit current exceeds the interrupting capacity of the circuit breaker, branch switching continues. Otherwise, if no node suffers from a short-circuit current problem, the MDP for OTS ends, and the final network structure is used as the optimal solution.
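The mechanics of the MDP described in this section, namely ε-greedy action selection, the replay memory, and the greedy decision-making loop, can be sketched as follows. This is a simplified illustration under several assumptions: all names are hypothetical, the Q-network is replaced by an arbitrary callable `q_function` that returns one Q-value per branch, the short-circuit computation is replaced by a callable `scc_function`, and the replay memory is modeled as a fixed-size first-in-first-out buffer, which the paper does not specify.

```python
import random
from collections import deque

import numpy as np


def select_action(q_values, allowed, epsilon, rng):
    """Epsilon-greedy choice among the branches that may still be opened."""
    if rng.random() < epsilon:
        return rng.choice(allowed)                 # explore: random branch
    return max(allowed, key=lambda k: q_values[k])  # exploit: highest Q-value


class ReplayMemory:
    """Stores (s_t, a_t, r_t, s_{t+1}) records; the oldest are dropped when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state.copy(), action, reward, next_state.copy()))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def greedy_switching(q_function, scc_function, limits, n_branches, max_steps):
    """Decision phase: open the highest-Q branch until every SCC is within its limit."""
    rng = random.Random(0)
    state = np.ones(n_branches, dtype=int)  # all branches closed, as in (18)
    opened = []
    for _ in range(max_steps):
        if np.all(scc_function(state) <= limits):
            break                           # no bus violates its limit
        allowed = [k for k in range(n_branches) if state[k] == 1]
        if not allowed:
            break
        action = select_action(q_function(state), allowed, epsilon=0.0, rng=rng)
        state[action] = 0                   # switch the selected branch off
        opened.append(action)
    return state, opened


# Toy stand-ins (not a power system model): each closed branch adds a fixed
# amount to the SCC of a single bus, and the "Q-values" equal those amounts.
contribution = np.array([3.0, 2.0, 1.0])
toy_scc = lambda s: np.array([10.0 + contribution @ s])
toy_q = lambda s: contribution
final_state, opened = greedy_switching(toy_q, toy_scc, np.array([12.0]), 3, max_steps=3)
# opened == [0, 1]; the SCC drops from 16 to 13 and then to 11, below the limit of 12
```

In the actual method, `allowed` would additionally exclude branches whose opening splits the network, and `q_function` would be the trained Q-network evaluated on the current state.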

Results
Case studies on the IEEE 30-bus system and the 118-bus system are presented herein to demonstrate the effectiveness of the proposed deep reinforcement learning-based optimal transmission-switching method. The data of these testing systems can be found in [26]. The sub-transient reactance of each generator in both cases is set uniformly as 0.1 p.u.

Illustrative Case Study on the Modified IEEE 30-Bus System
The network structure of the IEEE 30-bus system is shown in Figure 1. One transmission line from Bus-11 to Bus-21 is added, as in [13], for the case study on the IEEE 30-bus system. Under this network structure, the short-circuit current magnitudes of all the buses are computed and are shown in Figure 2.
Figure 2. The short-circuit current magnitudes of all the buses in IEEE 30-bus system before transmission switching.
The maximum limit of short-circuit current is set to 12 kA, and the objective is to reduce the SCC of the non-generator buses below this limit. According to the discussion in Section 3.2, the environment for OTS is set up, and the DQN-based agent is then trained based on Algorithm 1. All the branches are considered in the action space of the agent, except for those that would cause islanding if they were switched off. With the well-trained Q-network, the transmission-switching strategy for short-circuit current limitation is generated. During this decision process, Branch 4-12 is switched off at the first step, and Branch 6-9 is switched off at the second step. After these two steps, there is no bus at which the short-circuit current exceeds the maximum limit. The OTS strategy is thus obtained, and the short-circuit current magnitudes after transmission switching are shown in Figure 3. It can be seen from Figure 3 that the SCCs are reduced below the limit.
Energies 2022, 15, x FOR PEER REVIEW
Figure 3. The short-circuit current magnitudes of all the buses in IEEE 30-bus system after transmission switching.

Comparative Case Study with Conventional Genetic Algorithm
The proposed OTS model is a typical combinatorial optimization model with binary variables. Conventionally, this kind of model is solved by evolutionary algorithms such as genetic algorithms (GA) [27,28]. To further demonstrate the effectiveness of the proposed method, a comparative case study is carried out. The individuals of the population are represented by (18), which is the state of the power network environment. The population size is 100, and the individuals are initialized by independent random sampling. The maximum number of iterations is 100. The mutation rate is 0.2, while the crossover rate is 0.9. The numerical results are shown in Table 1. The OTS solutions of both methods are feasible, as the SCC of the non-generator buses has been reduced to lower than 12 kA, and the minimum margins of the SCCs are comparable to each other. However, the maximum loadability obtained by the proposed method is 4.0817 times that of the base condition, which is higher than that obtained by the genetic algorithm.
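For orientation, a genetic algorithm over binary branch-status vectors with the settings listed above (population size 100, 100 iterations, crossover rate 0.9, mutation rate 0.2) might look like the sketch below. The paper does not describe the selection, crossover, or mutation operators, so the binary tournament selection, one-point crossover, single-bit mutation, and elitism used here are assumptions, as are the function name and the toy fitness. The sketch maximizes the fitness; a minimization objective would be negated first.

```python
import random


def genetic_search(fitness, n_bits, pop_size=100, generations=100,
                   crossover_rate=0.9, mutation_rate=0.2, seed=0):
    """Maximize `fitness` over binary vectors of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    # Independent random initialization of each individual.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = [best[:]]  # elitism: carry the best individual over
        while len(offspring) < pop_size:
            # Binary tournament selection of two parents.
            parent1 = max(rng.sample(population, 2), key=fitness)
            parent2 = max(rng.sample(population, 2), key=fitness)
            child = parent1[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            if rng.random() < mutation_rate:
                k = rng.randrange(n_bits)         # flip one randomly chosen bit
                child[k] = 1 - child[k]
            offspring.append(child)
        population = offspring
        best = max(population, key=fitness)
    return best


# Toy fitness (number of closed branches), standing in for the OTS objective.
best = genetic_search(fitness=sum, n_bits=10)
```

In the comparative study, the fitness of an individual would instead be evaluated from the short-circuit current and continuation power flow computations for the corresponding network structure, with infeasible structures such as split networks penalized or discarded.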


Scalability Case Study on the IEEE 118-Bus System
A case study on the IEEE 118-bus system is presented herein to validate the scalability of the proposed method. The maximum limit in this case is 25 kA. The short-circuit current magnitudes of all the buses before transmission switching are shown in Figure 4. It can be seen that the SCC of Bus-66 is the highest among all the buses in the testing system.
In this testing system, 103 branches are not allowed to be switched off due to islanding and N-1 reliability. The remaining 83 branches are used to form the action space. After the training of the DQN-based agent, the transmission-switching strategy for short-circuit current limitation is generated. The branches that are switched off during the decision process are Branch 65-68, Branch 60-61, and Branch 65-66. The short-circuit current magnitudes after transmission switching are shown in Figure 5. It can be seen that after the switching of these three branches, the short-circuit current is reduced below the limit, which further demonstrates the effectiveness of the proposed DRL-based OTS method.

Conclusions
To prevent the short-circuit current from exceeding the interrupting capacity of the breakers, an optimal transmission-switching model has been proposed in this paper to reduce the short-circuit current while maximizing the loadability of the transmission network. Considering that this optimal transmission-switching model is a complex combinatorial optimization problem with binary decision variables, the deep Q-network-based reinforcement-learning algorithm was proposed to search for the optimal solution. Case studies on two benchmark testing systems were presented.
The numerical results show that (1) the proposed method can select effective branches at each step and reduce the short-circuit current below the limit after implementing the transmission-switching strategies, (2) the proposed method outperforms the conventional genetic algorithm in terms of the maximum loadability of the solution, and (3) the case studies on the IEEE 30-bus and 118-bus systems demonstrate that the proposed method can be applied to transmission networks of different scales.