Article

A Sparse Neural Network Based Control Structure Optimization Game under DoS Attacks for DES Frequency Regulation of Power Grid

1
School of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
2
Computer Information Systems Department, Buffalo State College, Buffalo, NY 14222, USA
3
College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2019, 9(11), 2217; https://doi.org/10.3390/app9112217
Submission received: 22 April 2019 / Revised: 26 May 2019 / Accepted: 26 May 2019 / Published: 30 May 2019
(This article belongs to the Section Optics and Lasers)

Abstract

With the rapid growth of distributed energy sources (DES), the power grid has become a flexible and complex networked control system. However, this also increases its exposure to denial-of-service (DoS) attacks, which degrade the performance of the power grid and can even cause cascading failures. To mitigate the negative effects of DoS attacks and enhance the reliability of the power grid, we propose a networked-control-system structure optimization scheme, derived from a Stackelberg game model, for the frequency regulation of a power grid with DES. The proposed game model considers both the DoS attacker and the control system designer, who acts as the defender, without using any analytical model. For the defender, we propose a sparse neural network based DES control and system structure design scheme. The neural network approximates the desired control output and the reinforcement signal to improve short- and long-term performance. A column-group sparse regularization is introduced into the neural network learning process to explore the control system structure, which involves the placement of sensors, the placement of DES actuators, and the communication topology. For the DoS attacker, the related attack constraints and attack rewards are established. The game equilibrium is taken as the optimal solution for both the DoS attack strategy and the control structure, and an offline optimization algorithm is proposed to compute it. The effectiveness of the proposed scheme is verified by two cases, which illustrate the optimal solutions of both the control structure and the DoS attack strategy.

1. Introduction

With the development of network techniques, electricity supply via the modern power grid increasingly depends on networked control systems (NCSs). Integrating the power grid with communication networks has gradually enhanced the efficiency and reliability of the grid [1,2]. However, new network control techniques also introduce vulnerabilities into the control system of power grids. Because NCSs connect the virtual and physical worlds, cyber attacks against them can cause large disturbances in the power grid, as has been confirmed over the past few years [3].
As a common type of cyber attack, a denial-of-service (DoS) attack occupies communication resources to block the transmission of measurement and control signals in NCSs [1]. Compared with deception attacks [4], DoS attacks not only require little prior knowledge about NCSs, but also disrupt control operations in real time [5]. In particular, blocking real-time control signals can cause instability of the power grid [6]. Many security control approaches, such as stochastic time delay systems, triggering strategies, and game theory, have been applied to counter DoS attacks in the power grid.
Stochastic time delay system: In a stochastic time delay system, DoS is modeled as a stochastic process with a delay in the signal. Subject to intermittent DoS attacks, An et al. proposed a decentralized solution of adaptive output feedback control for a power grid [7]. A switching-type state estimator is presented to estimate the state of power grid by using discontinuous output measurement. Sun et al. modeled DoS attacks as a Markov process that converted DoS attacks to stochastic noises of NCSs first [8]. Then, a resilient control model was used to process the converted noises.
Triggering strategy: Event-triggering time-sequence of control signal is adopted to reduce communication costs in the system. The triggering time sequence is able to defend against DoS attacks. Peng et al. proposed a resilient event-triggering based frequency control method for power system control in energy-limited DoS attacks [1]. The proposed event-triggered communication scheme can tolerate a certain degree of data loss in the open communication induced by energy-limited DoS attacks. Hu et al. designed a periodically resilient event-triggering communication scheme to identify DoS attacks initiated by power-constrained pulse-width-modulated jammers [9].
Game theory: In game theory based cyber security model, a game model is constructed for both DoS attacker and defender to obtain a Nash equilibrium as an optimal solution. Li et al. modeled the interactions between the transmission power of sensors and the interference power of DoS attacker by a signal-to-interference-and-noise-ratio (SINR) based network [10]. A modified Nash Q-learning algorithm was proposed to analyze the related interactions as well. Yuan et al. analyzed the resilient control issues of NCS under DoS attack via a unified game approach [5]. Yuan et al. also built a multi-stage hierarchical game with a corresponding hierarchy of decisions that was implemented to achieve a resilient control system [11]. According to optimal control structures, optimal criteria were constructed for DoS attackers and cyber defenders. Ding et al. modeled the remote estimation under DoS attacks by using the strategy of zero-sum stochastic games and presented a monotone structure to solve the proposed model optimistically [12]. At the same time, Ding et al. also formulated the decision-making process of a target channel as a two-player zero-sum stochastic game framework and proposed a Nash Q-learning algorithm to obtain the optimal strategy [13]. Zhu et al. built a hybrid game-theoretic framework to improve the robustness of power system security [14]. Since game theory based control strategy can process strategic interactions among multiple decision makers, it is often used in large-scale system security control.
Although game theory based methods can be efficient against conventional DoS threats, control methods alone are not sufficient to defend against potential and increasingly sophisticated attacks. In large-scale control systems, structural factors, such as the placement of distributed energy sources (DES), the placement of sensors, and the communication link topology, can critically affect performance under DoS attacks. Existing work on energy storage systems [15], DES placement [16], sensor scheduling [17], and coverage aims at minimizing the electrical and computational costs [18]. For the optimization of control system structure for cyber security, existing research leaves several critical challenges. One is the transient and dynamic behavior of DES and cyber resource allocation: most existing research focuses on steady-state characteristics [19] and ignores the transient and dynamic issues in control systems [20]. Another is that most existing algorithms take a precise analytical model as a premise [21], whereas most large-scale power systems cannot be modeled precisely. In this case, a model-free method is required for control-system-structure optimization.
Considering DoS attacks [22,23] and the control-system-structure optimization problem, we propose a sparse neural network based NCS optimization method, derived from a Stackelberg game model, for the frequency regulation of DES in power systems. A neural controller is trained by reinforcement learning in offline simulation and online processes to improve the system performance. The system performance is also improved under limited cyber resources by optimizing the control structure, which involves the placement of DES, the placement of remote terminal unit (RTU) sensors, and the communication topology. The structure is optimized by imposing a group sparse regularization on the neural network weights. The Stackelberg game model leads to a minimax optimization of system performance, so that the optimized control structure is robust to DoS attacks in the worst case. The contributions of this paper are summarized as follows:
  • Sparse neural network based reinforcement learning is proposed to improve the frequency regulation of DES in control systems without using an analytical model of the power system. The scheme addresses adaptiveness, performance, and structure.
  • The Stackelberg game model is used to derive the optimal control scheme and structure, so that the proposed frequency regulation system is robust to the worst case of DoS attacks. In addition, the reliability of the proposed frequency regulation system is enhanced.
The remainder of this paper is organized as follows: the system model and related problems are formulated in Section 2, which introduces the power system and DoS attack model, and describes the frequency regulation as well; Section 3 elaborates control structure and control law design by sparse neural network based reinforcement learning; Stackelberg game model and the optimization scheme of control structure are derived under DoS attacks in Section 4; and Section 5 demonstrates the simulation results to verify the proposed algorithm.

2. Problem Formulation

This section formulates the power grid model and the control objective. We consider a multi-area system integrated with distributed energy sources (DES). The control objective is to mitigate the frequency deviation. A DoS attack may degrade the control system performance and even lead to failures. Thus, a design problem involving controller design and structure design under cyber attack is also introduced.

2.1. Power Grid Frequency Dynamic Model

We consider an interconnected multi-area power system in which the $n$ areas are connected by tie-lines (also called transmission lines). As shown in Figure 1, each area is equipped with a turbine generator and DES, such as wind power, solar power, and batteries [20].
Each area also contains a load frequency controller (LFC) and a tie-line bias controller (TBC) for frequency synchronization. Although these synchronization measures regulate the frequency, an auxiliary control offered by DES may be necessary to enhance the system performance when the power system encounters a severe disturbance, such as a system fault or a sudden large load drop. Considering the auxiliary control, the dynamic model of area $i$ can be formulated as a discrete linear difference equation [24]:
$$x_i(k+1) = A_i x_i(k) + B_i u_i(k) + \sum_{j \in N_p(i)} B_{ji} x_j(k) + E_i w_i(k), \tag{1}$$
where $i \in \mathbb{N}^{+}[1,n]$ and $x_i = [\Delta f_i \;\; \Delta P_{mi} \;\; \Delta P_{vi} \;\; \Delta P_{tie}^{i} \;\; ACE_i]^T$ is the area state. $\Delta f_i$ is the deviation from the synchronized frequency; $\Delta P_{mi}$ is the mechanical power deviation of the generator; $\Delta P_{vi}$ is the valve position deviation of the turbine; $\Delta P_{tie}^{i}$ is the deviation of the tie-line power injection from physically neighboring areas; $ACE_i$ is the ACE signal of area $i$, with $ACE_i = \alpha_i \Delta f_i + \Delta P_{tie}^{i}$. The auxiliary control output $u_i$ of DES for frequency regulation is the sum of all the power generated by the power-electronic interfaced DES; $w_i$ is the disturbance caused by model error or other time-varying factors; $A_i$ is the system transition matrix; $B_i$ and $B_{ji}$ are the gains of the control input and of the physically neighboring systems, respectively; $E_i$ is the disturbance gain; and $N_p(i)$ denotes the physically neighboring areas of area $i$.
In this linear model, loads are assumed to be constant because load variation is slow relative to the dynamics of frequency regulation. Therefore, $A_i$, $B_i$, and $B_{ji}$ can be modeled as time invariant [24]. The system is thus linear time invariant (LTI) as well as an NCS. The DES controller of area $i$ for frequency regulation is written as
$$u_i = \varphi_i\left(x_i, x_{N_c(i)}\right), \tag{2}$$
where the time stage $k$ is omitted and $N_c(i)$ denotes the cyber-connected areas of area $i$. The controller calculates the DES control output from the local state $x_i$ and the remote states $x_j$, $j \in N_c(i)$, received from the local and remote areas, respectively. The control objective is to mitigate the frequency deviation and reduce the overall cost defined in Section 3, which is based on a quadratic measure of the state $x_i$ and the control output $u_i$.
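As a minimal sketch of how one time step of Equation (1) is evaluated, the following pure-Python snippet advances a single area by one sample. The two-state matrices and the helper names (`mat_vec`, `vec_add`, `area_step`) are hypothetical illustrations, not the five-state parameters or code of the paper, and the control and disturbance are taken as scalars.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def area_step(A, B, E, x, u, w, neighbor_terms):
    """One step of Equation (1) for a single area.

    neighbor_terms is a list of (B_ji, x_j) pairs, one per physical neighbour.
    u and w are scalars; B and E are column vectors stored as plain lists.
    """
    coupling = [0.0] * len(x)
    for B_ji, x_j in neighbor_terms:
        coupling = vec_add(coupling, mat_vec(B_ji, x_j))
    control = [b * u for b in B]
    disturbance = [e * w for e in E]
    return vec_add(mat_vec(A, x), control, coupling, disturbance)


# Hypothetical 2-state toy values, chosen only to exercise the formula.
A = [[0.9, 0.1], [0.0, 0.8]]
B = [0.0, 0.5]
E = [0.1, 0.0]
B_21 = [[0.0, 0.05], [0.0, 0.0]]

x1_next = area_step(A, B, E, x=[1.0, 0.0], u=-0.2, w=0.0,
                    neighbor_terms=[(B_21, [0.0, 2.0])])
print(x1_next)
```

Iterating `area_step` over all areas with the same time index gives a simulation of the interconnected system; the controller of Equation (2) would supply `u` at each step.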

2.2. DoS Attack Model

For the NCS described above, we consider an attacker that launches attacks when the power system requires the emergency auxiliary control offered by DES. A DoS attack blocks communication channels to degrade the control performance and can even cause system failures. Blocked channels may result in the absence of some remote states $x_j$, $j \in N_c(i)$, at the current time stage $k$ [25]. When the controller cannot obtain the remote state from a cyber-connected area $j$, a “zero-control” strategy is applied, which assigns $x_j = 0$. The control output $u_i$ becomes
$$u_i = \varphi_i\left[\left\{\alpha_{ji} x_j \mid j \in N_c(i) \cup \{i\}\right\}\right], \tag{3}$$
where $\alpha_{ji} \in \{0, 1\}$ follows a Bernoulli distribution:
$$P(\alpha_{ji}(k) = 0) = \delta_{ji}, \quad P(\alpha_{ji}(k) = 1) = 1 - \delta_{ji}, \tag{4}$$
where $\delta_{ji} \in [0, 1]$ is the probability of a packet drop in the communication from area $j$ to area $i$, and $k \in \mathbb{N}^{+}$. The probability $\delta_{ji}$ increases with the intensity of the DoS attack. The DoS attacker has limited cyber resources, which bounds the attack intensity. In our model, this constraint is
$$\sum_{i=1}^{n} \sum_{j \in N_c(i)} \delta_{ji} \le C. \tag{5}$$
That is, because the attack intensity is limited by the attacker's cyber cost, the sum of the drop probabilities $\delta_{ji}$ cannot exceed a maximal real number $C$.
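The packet-drop model of Equations (3)-(5) can be sketched in a few lines. The snippet below is an illustration under stated assumptions: the dictionary layout, the function names `within_budget` and `received_states`, and the choice to treat the local state as always available are ours, not the paper's.

```python
import random


def within_budget(delta, C):
    """Check Equation (5): every drop probability lies in [0, 1] and the total is at most C."""
    probs = list(delta.values())
    return all(0.0 <= p <= 1.0 for p in probs) and sum(probs) <= C


def received_states(states, delta, i, rng):
    """States seen by the controller of area i at one time stage.

    states: dict mapping area -> state vector
    delta:  dict mapping (j, i) -> drop probability of the channel from j to i
    A dropped remote state is replaced by zeros (the "zero-control" rule).
    The local state is assumed to be always available (our assumption).
    """
    out = {i: list(states[i])}
    for (j, dest), p_drop in delta.items():
        if dest != i:
            continue
        # alpha_ji = 0 with probability delta_ji, 1 otherwise (Equation (4)).
        alpha = 0 if rng.random() < p_drop else 1
        out[j] = [alpha * v for v in states[j]]
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
states = {1: [0.2, -0.1], 2: [0.5, 0.3], 3: [1.0, 1.0]}
delta = {(2, 1): 0.3, (3, 1): 0.1}
print(within_budget(delta, C=1.4))
print(received_states(states, delta, i=1, rng=rng))
```

Calling `received_states` once per time stage reproduces the intermittent loss of remote measurements that the controller must tolerate.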

2.3. Control, Structure Design and Optimization Problem

Even when the NCS of the power grid suffers DoS attacks, one of its objectives is to enhance the system performance by mitigating the frequency deviation $\Delta f_i$ and minimizing the control cost. To achieve this objective, the control law $\varphi_i$ must be designed. Moreover, the design of the control structure and its optimization also need to be considered. The control structure comprises the placement of sensors (in remote terminal units (RTUs)), the placement of actuators (in DES), and the communication topology, and these placements strongly affect the control performance in a large-scale multi-area power system. In addition, the placed resources (RTU, DES, etc.) are generally expensive.
In mathematical form, the control design seeks a control law $\varphi_i$ that achieves the desired system performance. We use the control cost $Q$ in Equation (6) to measure system performance:
$$\varphi^{*} = \arg\min_{\varphi} Q, \tag{6}$$
where “argmin” here denotes functional minimization. The control structure design aims to select the cyber-connected area set $N_c(i)$ for $i = 1, \ldots, n$, that is,
$$N_c^{*} = \arg\min_{N_c} Q. \tag{7}$$
Therefore, the optimization searches for a combination of $\varphi_i$ and $N_c(i)$, $i = 1, \ldots, n$, that minimizes the control cost in the worst case of a DoS attack satisfying the constraint in Equation (5), as shown in Equation (8):
$$\left(\varphi^{*}, N_c^{*}, \delta^{*}\right) = \arg\max_{\delta} \min_{\varphi, N_c} Q, \tag{8}$$
where $\varphi = [\varphi_1, \ldots, \varphi_n]$, $N_c = [N_c(1), \ldots, N_c(n)]$, and $\delta = [\delta_{N_c(1)}^{1}, \ldots, \delta_{N_c(n)}^{n}]$.
We assume the attacker knows the information of the designed power system. The attacker can figure out the minimal system performance, when the controller reaches its maximal performance. The details of optimization in a Stackelberg game are discussed in Section 4.

3. Control and Structure Design

3.1. Control Design by Reinforcement Learning

Before introducing the control design, we define a cost function to measure the control cost $Q$. Control performance usually involves a quadratic function of the state and control output. Thus, we define the control cost $Q$ by the strategy utility function used in our previous work [24]:
$$Q_i(k) = \min_{u_i(k+j),\, j \in [0, \infty]} \sum_{t=0}^{\infty} \alpha^{N-t} p_i(k+t), \tag{9}$$
where $0 < \alpha < 1$ is the discount rate and $N$ is a positive integer. The binary performance index $p_i$ is defined as
$$p_i(k) = \begin{cases} 1, & a_1 \|x_i(k)\| + a_2 \|u_i(k)\| \le c, \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$
where $\|\cdot\|$ denotes the 2-norm, $a_1$ and $a_2$ are the weights of the state and control cost, respectively, and $c$ is the threshold of the performance indication. If the current cost is within the allowed range (at most $c$), the binary performance index in Equation (10) is 1; otherwise, it is 0. The binary performance index keeps the strategy utility function bounded: when the time horizon is limited, it avoids a numerical crash in the learning process even if the system diverges.
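To make the binary index of Equation (10) and the discounted sum of Equation (9) concrete, here is a small sketch. It rests on assumptions that are ours: the sum is truncated to a finite window of recorded indices, the threshold comparison is non-strict, the control output is a scalar, and the names `performance_index` and `truncated_utility` are hypothetical.

```python
import math


def norm2(v):
    """Euclidean (2-) norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))


def performance_index(x, u, a1, a2, c):
    """Equation (10): 1 if the weighted cost is within the threshold c, else 0."""
    return 1 if a1 * norm2(x) + a2 * abs(u) <= c else 0


def truncated_utility(p_seq, alpha, N):
    """Finite-window version of the sum in Equation (9).

    p_seq holds p(k), p(k+1), ... for the recorded window; term t is
    weighted by alpha ** (N - t).
    """
    return sum(alpha ** (N - t) * p for t, p in enumerate(p_seq))


# Hypothetical numbers: a state of norm 5 and a unit control output.
p_now = performance_index([3.0, 4.0], 1.0, a1=0.1, a2=0.2, c=1.0)
print(p_now)
print(truncated_utility([1, 0, 1], alpha=0.5, N=2))
```

Because each index is 0 or 1, the truncated sum is bounded by the sum of the weights alone, regardless of how large the state becomes, which is the boundedness property the text relies on.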
To prevent the system from diverging, we construct a desired control output $u_{di}$ that applies a damping rate $L_i \in \mathbb{R}^{5 \times 5}$, $\|L_i\| < 1$, to the system, so that the power grid frequency dynamics become
$$x_i(k+1) = L_i x_i(k) + \sum_{j \in N_p(i) \setminus N_c(i)} B_{ji} x_j(k) + E_i w_i(k). \tag{11}$$
We approximate $u_{di}$ by a neural network output $\hat{u}_{di}$, defined as
$$\hat{u}_{di}(k) = M_a W_i \phi(k), \tag{12}$$
where $\phi \in \mathbb{R}^{nl \times 1}$ is the radial basis vector of the neural network calculated from the states $x_1, x_2, \ldots, x_n$, $W_i \in \mathbb{R}^{nl \times nl}$ is the trainable weight matrix, $M_a \in \mathbb{R}^{1 \times nl}$ is a given constant matrix, and $l$ is the dimension of the radial basis for one area. As shown in Figure 2, the proposed neural network structure is based on radial basis functions (RBFs); an RBF-based neural network can identify nonlinear dynamical systems [26]. Under the control $u_i = \hat{u}_{di}$, the system of area $i$ becomes
$$x_i(k+1) = L_i x_i(k) + B_i\left[\hat{u}_{di}(k) - u_{di}(k)\right] + \sum_{j \in N_p(i) \setminus N_c(i)} B_{ji} x_j(k) + E_i w_i(k). \tag{13}$$
According to stability theory, if $B_i[\hat{u}_{di}(k) - u_{di}(k)] + E_i w_i(k)$ and $\sum_{j \in N_p(i) \setminus N_c(i)} B_{ji} x_j(k)$ on the right-hand side of Equation (13) are bounded, the system is uniformly ultimately bounded (UUB) [27]. We assume the disturbance is bounded as $\|w_i(k)\| \le \varepsilon$, where $\varepsilon$ is a small real number, so learning $W_i$ should involve the approximation of $u_{di}$.
The control cost involving system performance is also approximated by a neural network as shown in Equation (14):
$$\hat{Q}_i(k) = M_c W_i \phi(k), \tag{14}$$
where $M_c \in \mathbb{R}^{1 \times nl}$ is a given constant matrix.
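The two network outputs in Equations (12) and (14) share the weight matrix and differ only in the fixed read-out row. A minimal forward-pass sketch follows; it assumes the Gaussian radial basis defined in Section 3.2, and the function names and toy dimensions are hypothetical.

```python
import math


def rbf_block(x, centers, sigma):
    """Gaussian radial basis of one area: exp(-||x - c||^2 / sigma^2) for each centre c."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / sigma ** 2)
            for c in centers]


def stacked_basis(states, centers, sigma):
    """Concatenate the per-area blocks into phi, a vector of length n * l."""
    phi = []
    for x in states:
        phi.extend(rbf_block(x, centers, sigma))
    return phi


def network_outputs(M_a, M_c, W, phi):
    """Return (u_hat, Q_hat) = (M_a W phi, M_c W phi), which share the weights W."""
    hidden = [sum(w * p for w, p in zip(row, phi)) for row in W]
    u_hat = sum(m * h for m, h in zip(M_a, hidden))
    q_hat = sum(m * h for m, h in zip(M_c, hidden))
    return u_hat, q_hat


# Toy setting: n = 2 areas, l = 2 centres, 2-dimensional states (all hypothetical).
centers = [[0.0, 0.0], [1.0, 0.0]]
phi = stacked_basis([[0.0, 0.0], [1.0, 0.0]], centers, sigma=1.0)
W = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(network_outputs([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], W, phi))
```

Sharing `W` between the control read-out and the cost read-out is what lets a single sparsity pattern on the weights describe the structure of the whole controller.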
We train the neural network weight $W_i$ to approximate both the desired control output $u_{di}$ and the performance measure, in order to improve system performance. Therefore, we define the training loss function as
$$V_i(k) = \frac{1}{2}\left\|B_i \hat{u}_{di}(k) - B_i u_{di}(k)\right\|^2 + \frac{1}{2}\left\|\alpha^{N+1} p_i(k) + \alpha^{-1} \hat{Q}_i(k+1) - \hat{Q}_i(k)\right\|^2 + \frac{1}{2}\left\|\hat{Q}_i(k)\right\|^2. \tag{15}$$
The first term denotes the approximation error of the desired control output, the second term denotes the approximation error of system performance, and the third term is used to improve the long-term system performance. As shown in Equation (16), the desired control output $u_{di}$ and system performance $Q_i$ can be expressed by the neural network:
$$u_{di}(k) = M_a W_i^{*} \phi(k) + \upsilon_{ai}(k), \quad Q_i(k) = M_c W_i^{*} \phi(k) + \upsilon_{ci}(k), \tag{16}$$
where $\|\upsilon_{ai}(k)\| \le v$ and $\|\upsilon_{ci}(k)\| \le v$ are the optimal approximation errors, $v$ is a small number, and $W_i^{*}$ is the constant optimal weight matrix. We differentiate the loss function $V_i(k)$ with respect to $W_i(k)$ and neglect $\upsilon_{ai}$ and $\upsilon_{ci}$, which gives the online iteration
$$W_i(k+1) = W_i(k) - \beta_i \frac{\partial V_i(k)}{\partial W_i(k)}, \quad \frac{\partial V_i(k)}{\partial W_i(k)} = \left[M_a^T\left(x_i(k+1) - L_i x_i(k)\right) + M_c^T\left(\alpha^{N+1} p_i(k) + \alpha^{-1} \hat{Q}_i(k+1)\right)\right] \phi(k)^T. \tag{17}$$

3.2. Structure Design by Sparse Neural Networks

The structure design involves the placement of sensors and actuators and the communication topology. It is encoded in the neural network weight $W_i$. The weight matrix is partitioned into column groups according to the radial basis $\phi$, which is defined as
$$\phi = \left[\phi_1^T \;\; \phi_2^T \;\; \cdots \;\; \phi_n^T\right]^T, \tag{18}$$
where $\phi_i = \left[e^{-\|x_i - x_{c1}\|^2/\sigma^2} \;\; e^{-\|x_i - x_{c2}\|^2/\sigma^2} \;\; \cdots \;\; e^{-\|x_i - x_{cl}\|^2/\sigma^2}\right]^T$, $x_{cj}$ is a given radial center, $j \in \mathbb{N}^{+}[1, l]$, and $\sigma$ is a given radial width. By the definition of $\phi$, the weight matrix is partitioned into $n$ column groups, each of which has $l$ columns and corresponds to the radial basis of one area. If the weight matrix $W_i$ is group sparse [28] with respect to these column groups, the structure of the control system can be read off. For instance, if column group $j$ of $W_i$, which is the gain of $\phi_j$, is zero, the controller of area $i$ does not need information from area $j$, and the communication channel from area $j$ to area $i$ is not necessary. If no controller requires information from area $j$, no sensor is necessary in area $j$. If $W_i = 0$, the actuator in area $i$ is not required. The key is to force $W_i$ to be column-group sparse. Thus, a group sparse regularization term is added to the loss function, which becomes
$$V_i(k) = \frac{1}{2}\left\|B_i \hat{u}_{di}(k) - B_i u_{di}(k)\right\|^2 + \frac{1}{2}\left\|\hat{Q}_i(k)\right\|^2 + \frac{1}{2}\left\|\alpha^{N+1} p_i(k) + \alpha^{-1} \hat{Q}_i(k+1) - \hat{Q}_i(k)\right\|^2 + \gamma_i \sum_{j=1}^{n} \left\|vec\left(W_i^{G(j)}\right)\right\|, \tag{19}$$
where $W_i^{G(j)}$ denotes column group $j$ of $W_i$, i.e., the gain of the radial basis $\phi_j$, and $\gamma_i$ is the regularization weight. The sparse regularized learning iteration, which extends Equation (17), is given in Equation (20):
$$W_i(k+1) = W_i(k) - \beta_i \frac{\partial V_i(k)}{\partial W_i(k)}, \quad \frac{\partial V_i(k)}{\partial W_i(k)} = \left[M_a^T\left(x_i(k+1) - L_i x_i(k)\right) + M_c^T\left(\alpha^{N+1} p_i(k) + \alpha^{-1} \hat{Q}_i(k+1)\right)\right] \phi(k)^T + \gamma_i W_i(k) D_i(k), \tag{20}$$
where $D_i = diag\left(\frac{1}{\|vec(W_i^{G(1)})\|}\mathbf{1}, \; \frac{1}{\|vec(W_i^{G(2)})\|}\mathbf{1}, \; \ldots, \; \frac{1}{\|vec(W_i^{G(n)})\|}\mathbf{1}\right)$ and $\mathbf{1} = [1, 1, \ldots, 1] \in \mathbb{R}^{l}$. $W_{ij}$ denotes the $j$th weight block in $W_i$, which multiplies $\phi_j$. Thus, we have
$$\hat{u}_{di} = W_{i1} \phi_1 + W_{i2} \phi_2 + \cdots + W_{in} \phi_n. \tag{21}$$
The stability of the iteration in Equation (20) can be analyzed in a way similar to that in [24]. If $\beta_i$ and $\gamma_i$ are small enough, the iteration in Equation (20) is stable and the resulting errors are bounded.
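The group sparse term contributes the product $W_i D_i$ to the gradient, which rescales every column group by the inverse of its own norm. The sketch below computes that product and applies one update step. It is an illustration only: the data-fit gradient is passed in as a ready-made matrix `grad_fit`, the small `eps` guard against division by zero is our addition, and all names are hypothetical.

```python
import math


def group_norms(W, n, l):
    """Norm of vec(W^{G(j)}) for each of the n column groups (l columns per group)."""
    norms = []
    for j in range(n):
        cols = range(j * l, (j + 1) * l)
        norms.append(math.sqrt(sum(row[c] ** 2 for row in W for c in cols)))
    return norms


def group_penalty_gradient(W, n, l, eps=1e-12):
    """Return W D_i: each column of group j is divided by the norm of group j.

    eps avoids dividing by zero when a group is already exactly zero
    (a numerical safeguard assumed here, not taken from the paper).
    """
    norms = group_norms(W, n, l)
    return [[row[c] / max(norms[c // l], eps) for c in range(n * l)] for row in W]


def sparse_step(W, grad_fit, beta, gamma, n, l):
    """One iteration W <- W - beta * (grad_fit + gamma * W D_i)."""
    g_pen = group_penalty_gradient(W, n, l)
    return [[w - beta * (gf + gamma * gp) for w, gf, gp in zip(rw, rf, rp)]
            for rw, rf, rp in zip(W, grad_fit, g_pen)]


# Toy weights with n = 2 groups of l = 1 column each.
W = [[3.0, 0.0], [4.0, 0.0]]
zero_fit = [[0.0, 0.0], [0.0, 0.0]]
print(group_norms(W, n=2, l=1))
print(sparse_step(W, zero_fit, beta=1.0, gamma=1.0, n=2, l=1))
```

With a zero data-fit gradient the step shrinks the first group toward zero by a fixed amount in norm and leaves the already-zero group untouched, which is the behaviour that drives whole column groups, and hence whole communication links, out of the controller.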

4. Structure Optimization under DoS Attacks

In this section, the control system structure design is solved by a structure optimization problem. Considering DoS attack, a Stackelberg game is formulated in the structure optimization. In addition, the optimization algorithm is also presented.

4.1. Stackelberg Game Formulation

As shown in Equation (20), the iteration searches a system structure and control scheme in simulation, and then a DoS attacker observes the system and takes attack actions. It is a leader–follower sequence, therefore a Stackelberg game model is proposed to obtain the optimal solution of control structure optimization and DoS attacks.
There are two players in the proposed model: the defender, who is the system designer, and the attacker, who launches DoS attacks. The defender's action is $N_c = [N_c(1), N_c(2), \ldots, N_c(n)]$. The reward function of the defender is defined as
$$r(N_c, \delta) = \sum_{i=1}^{n} Q_i\left(N_c(i), \delta\right). \tag{22}$$
The attacker's action is $\delta = [\delta_{N_c(1)}^{1}, \delta_{N_c(2)}^{2}, \ldots, \delta_{N_c(n)}^{n}]$. The reward of the attacker is $-r(N_c, \delta)$. Therefore, it is a zero-sum game. The structure optimization becomes a min-max optimization problem as follows:
$$\left(N_c^{*}, \delta^{*}\right) = \arg\min_{\delta} \max_{N_c} r(N_c, \delta), \tag{23}$$
where $N_c^{*}$ and $\delta^{*}$ form the equilibrium of the game model, which can also be treated as the optimal solution for both defender and attacker. Thus, $r(N_c, \delta^{*}) \le r(N_c^{*}, \delta^{*}) \le r(N_c^{*}, \delta)$.
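For intuition about Equation (23) and the saddle-point inequality, the following toy sketch solves the min-max problem by enumeration over small finite action sets. The reward table is invented for illustration, and the real problem has continuous attack probabilities and a learned inner solution, so this is only a sketch of the equilibrium concept.

```python
def stackelberg_minimax(reward):
    """Solve min over attacks d of max over structures s of reward[d][s].

    reward[d][s] plays the role of r(N_c = s, delta = d).
    Returns (d_star, s_star, value).
    """
    best = None
    for d, row in enumerate(reward):
        s = max(range(len(row)), key=row.__getitem__)  # defender's best response
        if best is None or row[s] < best[2]:
            best = (d, s, row[s])
    return best


def is_saddle_point(reward, d_star, s_star):
    """Check r(s, d*) <= r(s*, d*) <= r(s*, d) for all structures s and attacks d."""
    v = reward[d_star][s_star]
    defender_ok = all(reward[d_star][s] <= v for s in range(len(reward[d_star])))
    attacker_ok = all(v <= reward[d][s_star] for d in range(len(reward)))
    return defender_ok and attacker_ok


# Hypothetical 2 x 2 reward table: rows are attacks, columns are structures.
reward = [[4, 2],
          [3, 1]]
d_star, s_star, value = stackelberg_minimax(reward)
print(d_star, s_star, value, is_saddle_point(reward, d_star, s_star))
```

Not every table has a pure-strategy saddle point; for example `[[1, 0], [0, 1]]` fails the check, which is one reason the existence of the equilibrium is a separate question from computing the min-max value.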

4.2. Structure Optimization under DoS Attacks

For a given DoS attack $\delta$, the optimal structure and control law can be found by the iteration of Equation (20), which yields
$$N_c^{*}(\delta) = \arg\max_{N_c} r(N_c, \delta). \tag{24}$$
If $j \notin N_c^{*}(i)$, the iteration of Equation (20) usually yields a small $\|vec(W_{ij})\|$ instead of $\|vec(W_{ij})\| = 0$ because it is a numerical method. Therefore, a threshold is used in our algorithm to obtain $N_c^{*}(\delta)$. The estimate of $N_c^{*}(\delta)$ is given by Equation (25):
$$\hat{N}_c^{*}(\delta)_i = \left\{ j \mid \|vec(W_{ij})\| \ge \rho_i \right\}, \quad i \in \mathbb{N}^{+}[1, n], \tag{25}$$
where $\mathbb{N}^{+}$ is the set of positive integers and $\rho_i$ is a given small positive threshold. The equilibrium of the Stackelberg game can be reached by solving the optimization problem of Equation (23), or equivalently the following constrained minimization:
$$\delta^{*} = \arg\min_{\delta} r\left(N_c^{*}(\delta), \delta\right), \quad \text{s.t.} \;\; \sum_{i=1}^{n} \sum_{j \in N_c(i)} \delta_{ji} \le C, \quad \delta_{ji} \in [0, 1] \;\; \forall j, i \in \mathbb{N}^{+}[1, n]. \tag{26}$$
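The thresholding rule of Equation (25) and the structural reading from Section 3.2 can be sketched as follows. The nested-list layout `W_blocks[i][j]` for block $W_{ij}$, the zero-based area indices, and the helper names are assumptions made for this illustration.

```python
import math


def block_norm(block):
    """Norm of vec(block) for a block stored as a list of rows."""
    return math.sqrt(sum(v * v for row in block for v in row))


def estimate_structure(W_blocks, rho):
    """Equation (25): for each controller i, keep the areas j whose block norm reaches rho[i]."""
    n = len(W_blocks)
    return {i: {j for j in range(n) if block_norm(W_blocks[i][j]) >= rho[i]}
            for i in range(n)}


def required_sensors(structure):
    """Areas whose state is used by at least one controller."""
    return set().union(*structure.values())


def required_actuators(structure):
    """Areas whose controller uses at least one state, i.e., W_i is not entirely below threshold."""
    return {i for i, used in structure.items() if used}


# Two areas, 1 x 1 blocks, invented magnitudes.
W_blocks = [
    [[[1.0]], [[1e-4]]],   # controller 0: large local gain, negligible gain on area 1
    [[[0.0]], [[1e-5]]],   # controller 1: everything below threshold
]
structure = estimate_structure(W_blocks, rho=[1e-2, 1e-2])
print(structure, required_sensors(structure), required_actuators(structure))
```

In this toy case only area 0 needs a sensor and an actuator, and no communication channel between the two areas is required.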
In summary, the algorithm of structure optimization under DoS attacks is described in Algorithm 1.
Algorithm 1 Structure Optimization under DoS Attacks
  • Input: the radial basis dimension $l$, the number of areas $n$, the learning rate $\beta_i$, and the regularization weight $\gamma_i$.
  • Output: the system structure $N_c^{*}$ and the optimal attack strategy $\delta^{*}$.
    • (a) Initialize $M_a$, $M_c$, $W_i$, $i = 1, 2, \ldots, n$, and $\delta(0)$; each element is drawn from a Gaussian distribution with mean 0.01 and variance 0.14. Set $t = 0$ and the maximal iteration number to MAXITER;
    • (b) For the given $\delta(t)$, find $\hat{N}_c^{*}(\delta(t))$ by the iteration of Equation (20) and the threshold of Equation (25);
    • (c) Measure the gradient of $r(N_c^{*}(\delta), \delta(t))$ with respect to $\delta$;
    • (d) Update $\delta$ to obtain $\delta(t+1)$ based on the gradient obtained in step (c), and handle the constraints of Equation (26) by the Lagrange method [29] or the barrier function method [30];
    • (e) Check whether the change produced by the update is less than a threshold or $t$ equals the maximal iteration number MAXITER. If so, go to step (f); otherwise, set $t = t + 1$ and return to step (b);
    • (f) Output $N_c^{*} = \hat{N}_c^{*}(\delta(t))$ and $\delta^{*} = \delta(t)$.
The $N_c^{*}$ obtained by Algorithm 1 is the optimized control structure of the power system. Algorithm 1 also yields the worst-case system performance under our controller. However, we do not consider the case in which the control system becomes unstable because $C$ is large [5], that is, because the attacker has abundant cyber resources to launch DoS attacks. This issue will be investigated in future work.
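The outer loop of Algorithm 1 can be sketched as a projected gradient iteration on the attack probabilities. Several substitutions are made here and should be read as assumptions: the inner learning step is replaced by a caller-supplied function `best_response_reward(delta)`, the gradient is taken by central finite differences, and the constraints of Equation (26) are handled by a Euclidean projection rather than by the Lagrange or barrier methods named in the algorithm.

```python
def project_to_budget(y, C, iters=60):
    """Euclidean projection of y onto {0 <= d <= 1, sum(d) <= C}.

    If clipping to [0, 1] already meets the budget, that is the projection;
    otherwise a common shift tau is found by bisection.
    """
    def clip(v):
        return min(1.0, max(0.0, v))

    if sum(clip(v) for v in y) <= C:
        return [clip(v) for v in y]
    lo, hi = 0.0, max(y)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(clip(v - mid) for v in y) > C:
            lo = mid
        else:
            hi = mid
    return [clip(v - hi) for v in y]


def numerical_gradient(f, delta, h=1e-4):
    """Central finite-difference gradient of f at delta."""
    grad = []
    for idx in range(len(delta)):
        up, down = list(delta), list(delta)
        up[idx] += h
        down[idx] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def optimize_attack(best_response_reward, delta0, C, step=0.1, tol=1e-6, max_iter=200):
    """Attacker's outer loop: descend on r(N_c*(delta), delta) under the budget C."""
    delta = project_to_budget(delta0, C)
    for _ in range(max_iter):
        grad = numerical_gradient(best_response_reward, delta)
        new_delta = project_to_budget([d - step * g for d, g in zip(delta, grad)], C)
        change = max(abs(a - b) for a, b in zip(new_delta, delta))
        delta = new_delta
        if change < tol:
            break
    return delta


# Hypothetical surrogate for the inner solution: each channel has an
# importance weight, and dropping packets on it lowers the defender's reward.
importance = [3.0, 1.0]

def toy_reward(delta):
    return sum(w * (1.0 - p) for w, p in zip(importance, delta))

print(optimize_attack(toy_reward, [0.1, 0.1], C=1.0))
```

In this toy case the attacker concentrates the whole budget on the more important channel, which mirrors the qualitative behaviour reported in the case studies, where the attack focuses on the buses whose information carries the largest weights.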

5. Experiments and Analysis

This section presents two cases, the IEEE 14 bus system and the IEEE 24 bus system [31], to show the effectiveness and advantages of the proposed scheme. The scheme takes DoS attacks into account and considers the worst-case attack under the constraints of the Stackelberg game model described above. Within this game model, the scheme uses the optimal structure design and reinforcement learning to enhance the system performance. We assume the sub-system parameters are as follows:
A i = 0 j D i T j i 0 0 0 1 / M i D i / M i 1 / M i 0 0 0 0 1 / T d i 1 / T d i 0 0 1 / ( T g i R g i ) 0 1 / T g i K i / T g i 1 b i 0 0 0 , B j i = 0 0 0 0 0 T j i M i h D i T i h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 , B i = 0 1 / M i 0 0 0 T ,
where $M_i$ and $D_i$ are the inertia and damping constants, respectively; $T_{gi}$ and $T_{di}$ are the governor and gas turbine constants, respectively; and $R_{gi}$ and $T_{ij}$ are the regulation and synchronizing constants, respectively [20]. The sub-system parameters are listed in Table 1 [24].
The constraints of DoS attacks are set as follows:
$$\sum_{i=1}^{14} \sum_{j \in N_c(i)} \delta_{ji} \le 1.4. \tag{28}$$
The subsystems in the grid communicate with each other through communication channels. The above constraint means that the jamming probabilities of all communication channels sum to at most 1.4, or an average jamming probability of 0.1 for each communication channel. The larger $C$ in Equation (26) is, the more cyber resources the attacker consumes. In the following simulations, the sampling period is set to 0.1 s.

5.1. Case I: IEEE 14 Bus Test System

The IEEE 14 bus test system is simulated under reinforcement learning control and DoS attacks. The initial state of the power grid simulation is assumed to be affected by a large disturbance. The simulation lasts 5 s. As shown in Figure 3, without any control the system collapses under serious DoS attacks after a large power grid disturbance occurs. The frequency increases to a large value, reaching a maximum of about 40 Hz within 5 s. Therefore, a controller is required to maintain stability. The parameters of the reinforcement learning algorithm are listed in Table 2. Details of the parameter selection principle can be found in [2].
The parameter $\beta_i$ strongly affects the result for the IEEE 14 bus test system. If $\beta_i$ is too large, the online learning of Equation (20) diverges; if it is too small, convergence is slow. The selection of $\gamma_i$ is another key factor: too large a $\gamma_i$ may result in poor control performance and excessive sparsity of $W_i$.
Figure 4 and Figure 5 show the results obtained with the proposed control scheme and the optimal DoS attack strategy. The optimal DoS attack strategy is obtained by solving Equation (26), and the optimal controllers are learned by the online iteration of Equation (20). According to the results in Figure 4, the frequency deviation curves of all sub-systems converge to a steady value around 0. The maximal magnitude of the fluctuations is small, about 1.2 Hz. The swings end in about 4 s.
For the optimal DoS attack strategy shown in Figure 5a, the attacks focus on the communication from buses 3, 6, 10, 11, and 13. Figure 5b shows the optimal control structure, i.e., the placement of sensors and actuators and the communication topology, in terms of the F-norm of each block $W_{ij}$ in $W_i$, $i, j = 1, 2, \ldots, n$. If the norm value is large, it means sub-system j needs information from sub-system i. Thus, sub-system i needs to install sensors, and sub-system j requires actuators as well as a communication link from sub-system i to j. The sensors are mainly installed in buses 2, 3, 6, 10, 11, 13, and 14. According to the solution, the communication lines that are not attacked keep the system stable. Thus, the obtained solution is optimal for both the attacker and the defender (control system designer), i.e., it is a Nash equilibrium.

5.2. Case II: IEEE 24 Bus Test System

This section verifies the effectiveness of the proposed scheme in a relatively large system—IEEE 24 bus test system. The simulation time lasts 10 s. Both reinforcement learning control and DoS attacks are applied. The optimal attack strategy and the optimized control structure are obtained by solving the optimization of Equation (26). Specifically, the optimal attack strategy is solved by the offline optimization algorithm described in Section 4.2, and the optimized control structure is obtained by the online learning of Equation (20).
The parameters of the reinforcement learning algorithm are listed in Table 3. Details of the parameter selection can be found in [2].
The simulation results are shown in Figure 6 and Figure 7. Figure 6 shows the frequency deviation curves of the power grid under the worst case of DoS attacks. The results illustrate the effectiveness of the proposed solution under a large disturbance and DoS attacks. The swings of all the frequency deviation curves end in about 5 s. The magnitude of the swings is small, with a maximal value of 1.5 Hz.
From the optimal attack strategy shown in Figure 7a, the attacker should focus on the communications from buses 2, 3, 4, 14, 19, 21, and 22 with an attack probability of 0.2–0.35. The optimal control structure under the worst case of DoS attacks is shown in Figure 7b. Sensors should be installed in buses 2, 3, 4, 6, 14, 15, 19, 22, 23, and 24. The weight of bus 19 is high, which means the information obtained from bus 19 is highly important. Accordingly, the optimal attack strategy targets the communication lines of bus 19 with a relatively high attack rate.

6. Conclusions

This paper proposes a novel optimization method for the control structure, together with a reinforcement learning method, in an integrated neural network under DoS attacks. The reinforcement learning, which involves the frequency damping and control performance, is approximated by the integrated neural network. The frequency damping includes the desired control input, which holds the system stable and accelerates convergence in online learning. The approximation of control performance is adopted as the reinforcement signal to enhance the long-term system performance. To obtain a reasonable control scheme and structure under DoS attacks, a Stackelberg game model is also proposed. The worst case of DoS attacks is considered and solved by the proposed optimization solution. The optimization solution also yields the optimal control structure, which involves the placement of sensors, actuators (DES), and the communication topology. The simulation results illustrate the effectiveness of the proposed scheme, and both the optimal DoS attacks and the control structure are obtained in the simulation. In future work, the convexity of the game and the existence of the game equilibrium will be analyzed. The constraints of DER, considering the environment and variation, will also be considered. In addition, the error of the proposed algorithm will be analyzed theoretically and numerically.

Author Contributions

Conceptualization, J.S.; methodology, J.S.; software, J.S.; validation, J.S., G.Q. and Z.Z.; formal analysis, Z.Z.; investigation, G.Q. and Z.Z.; resources, J.S.; data curation, J.S. and Z.Z.; writing–original draft preparation, J.S.; writing–review and editing, G.Q.; visualization, Z.Z.; supervision, G.Q.; project administration, J.S. and Z.Z.; funding acquisition, J.S. and Z.Z.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61803061 and 61703347); Fundamental Research Funds for the Central Universities (Grant No. XDJK2019C019); Science and Technology of the Chongqing Natural Science Foundation (Grant No. cstc2016jcyjA0428); the Research Program of Chongqing Municipal Education Commission (Grant No. KJQN201800603); the Innovation Project of Chongqing Overseas Students Entrepreneurial Innovation Support program (Grant No. cx2018074); Chongqing Key Industries Common Key Technology Innovation project (Grant No. cstc2017zdcy-zdyf0366); and Southwest University Education Reform Project (Grant No. 2017JY080).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Chen, P.; Li, J.; Fei, M.R. Resilient event-triggered Hinf load frequency control for networked power systems with energy-limited DoS attacks. IEEE Trans. Power Syst. 2017, 32, 4110–4118.
  2. Sun, J.; Zhu, Z.; Li, H.; Chai, Y.; Qi, G.; Wang, H.; Hu, Y.H. An integrated critic-actor neural network for reinforcement learning with application of DERs control in grid frequency regulation. Int. J. Electr. Power Energy Syst. 2019, 111, 286–299.
  3. Sridhar, S.; Hahn, A.; Govindarasu, M. Cyber-Physical System Security for the Electric Power Grid. Proc. IEEE 2011, 100, 210–224.
  4. Liang, G.; Zhao, J.; Luo, F.; Weller, S.; Dong, Z.Y. A Review of False Data Injection Attacks Against Modern Power Systems. IEEE Trans. Smart Grid 2017, 8, 1630–1638.
  5. Yuan, Y.; Yuan, H.; Lei, G.; Yang, H.; Sun, S. Resilient Control of Networked Control System under DoS Attacks: A Unified Game Approach. IEEE Trans. Ind. Inform. 2016, 12, 1786–1794.
  6. Srikantha, P.; Kundur, D. Denial of service attacks and mitigation for stability in cyber-enabled power grid. In Proceedings of the Innovative Smart Grid Technologies Conference, Washington, DC, USA, 18–20 February 2015.
  7. An, L.; Yang, G.H. Decentralized Adaptive Fuzzy Secure Control for Nonlinear Uncertain Interconnected Systems Against Intermittent DoS Attacks. IEEE Trans. Cybern. 2019, 49, 827–838.
  8. Sun, H.; Peng, C.; Yang, T.; Zhang, H.; He, W. Resilient control of networked control systems with stochastic denial of service attacks. Neurocomputing 2017, 270, 170–177.
  9. Hu, S.; Yue, D.; Xie, X.; Chen, X.; Yin, X. Resilient Event-Triggered Controller Synthesis of Networked Control Systems Under Periodic DoS Jamming Attacks. IEEE Trans. Cybern. 2018.
  10. Li, Y.; Quevedo, D.E.; Dey, S.; Ling, S. SINR-based DoS Attack on Remote State Estimation: A Game-theoretic Approach. IEEE Trans. Control Netw. Syst. 2017, 4, 632–642.
  11. Yuan, Y.; Sun, F.; Liu, H. Resilient control of cyber-physical systems against intelligent attacker: A hierarchal stackelberg game approach. Int. J. Syst. Sci. 2016, 47, 2067–2077.
  12. Ding, K.; Dey, S.; Quevedo, D.E.; Ling, S. Stochastic Game in Remote Estimation under DoS Attacks. IEEE Control Syst. Lett. 2017, 1, 146–151.
  13. Ding, K.; Li, Y.; Quevedo, D.E.; Dey, S.; Ling, S. A multi-channel transmission schedule for remote state estimation under DoS attacks. Automatica 2017, 78, 194–201.
  14. Zhu, Q.; Basar, T. Game-Theoretic Methods for Robustness, Security, and Resilience of Cyberphysical Control Systems: Games-in-Games Principle for Optimal Cross-Layer Resilient Control Systems. IEEE Control Syst. 2015, 35, 46–65.
  15. Atwa, Y.M.; El-Saadany, E.F. Optimal Allocation of ESS in Distribution Systems With a High Penetration of Wind Energy. IEEE Trans. Power Syst. 2010, 25, 1815–1822.
  16. Borges, C.L.T.; Falcão, D.M. Optimal distributed generation allocation for reliability, losses, and voltage improvement. Int. J. Electr. Power Energy Syst. 2006, 28, 413–420.
  17. Zhang, H.; Ayoub, R.; Sundaram, S. Sensor selection for Kalman filtering of linear dynamical systems: Complexity, limitations and greedy algorithms. Automatica 2017, 78, 202–210.
  18. Gupta, V.; Chung, T.H.; Hassibi, B.; Murray, R.M. On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage. Automatica 2006, 42, 251–260.
  19. Qu, C.; Chen, W.; Song, J.B.; Li, H. Distributed Data Traffic Scheduling With Awareness of Dynamics State in Cyber Physical Systems With Application in Smart Grid. IEEE Trans. Smart Grid 2015, 6, 2895–2905.
  20. Zhu, Z.; Sun, J.; Qi, G.; Chai, Y.; Chen, Y. Frequency Regulation of Power Systems with Self-Triggered Control under the Consideration of Communication Costs. Appl. Sci. 2017, 7, 688.
  21. Li, H. Data traffic scheduling for cyber physical systems with application in voltage control of microgrids. IEEE Syst. J. 2017, 8, 542–552.
  22. Cambiaso, E.; Papaleo, G.; Aiello, M. Slowcomm: Design, development and performance evaluation of a new slow DoS attack. J. Inf. Secur. Appl. 2017, 35, 23–31.
  23. Cambiaso, E.; Papaleo, G.; Giovanni, C.; Aiello, M. A Network Traffic Representation Model for Detecting Application Layer Attacks. Int. J. Archit. Comput. 2016, 5, 31–42.
  24. Sun, J.; Li, J. A Stable Distributed Neural Controller for Physically Coupled Networked Discrete-Time System via Online Reinforcement Learning. Complexity 2018, 2018, 5950678.
  25. Ding, D.; Wang, Z.; Ho, D.W.; Wei, G. Observer-Based Event-Triggering Consensus Control for Multiagent Systems With Lossy Sensors and Cyber-Attacks. IEEE Trans. Cybern. 2017, 47, 1936–1947.
  26. Robnik-Šikonja, M. Data Generators for Learning Systems Based on RBF Networks. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 926–938.
  27. Hansen, L.P.; Sargent, T.J. Robust Control and Model Uncertainty. Am. Econ. Rev. 2001, 91, 60–66.
  28. Pan, C.; Liu, W.; Thompson, J.S.; Yang, C.; Jorswieck, E.A. Semi-dynamic Green Resource Management in Downlink Heterogeneous Networks by Group Sparse Power Control. IEEE J. Sel. Areas Commun. 2016, 34, 1250–1266.
  29. Bertsekas, D.P. Constrained Optimization and Lagrange Multiplier Methods; Academic Press: Cambridge, MA, USA, 1982.
  30. Polak, E.; Yang, T.H.; Mayne, D.Q. A Method of Centers Based on Barrier Functions for Solving Optimal Control Problems with Continuum State and Control Constraints. SIAM J. Control Optim. 2006, 31, 159–179.
  31. Zimmerman, R.D.; Murillo-Sanchez, C.E.; Thomas, R.J. MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education. IEEE Trans. Power Syst. 2011, 26, 12–19.
Figure 1. Area structure.
Figure 2. Neural network structure.
Figure 3. Frequency deviation of IEEE 14 bus system without control.
Figure 4. Frequency deviation of IEEE 14 bus system under control.
Figure 5. Optimal DoS attacks and control structure 3D Graph in the IEEE 14 bus system.
Figure 6. Frequency deviation of IEEE 24 bus system under control.
Figure 7. Optimal DoS attacks and control structure 3D graph in the IEEE 24 bus system.
Table 1. Parameters of sub-system.

| Parameter Name | Description | Value |
|---|---|---|
| $M_i$ | inertia constant | 0.2 |
| $D_i$ | damping constant | 0.26 |
| $T_{ji}$ | synchronizing constant | 0.5 |
| $T_{di}$ | governor constant | 5 |
| $R_{gi}$ | regulation constant | 0.5 |
| $b_i$ | frequency bias gain | 1 |
| $T_{gi}$ | gas turbine constant | 0.2 |
| $K_i$ | tie-line bias control gain | 0.1 |

Table 2. Parameter of the controller in the IEEE 14 bus test system.

| Parameter Name | Description | Value |
|---|---|---|
| $\alpha$ | damping factor for cost $p_i$ | 0.9 |
| $N$ | Horizon length for cost | 10 |
| $\beta_i$ | Learning rate of neural network | 0.1 |
| $\gamma_i$ | Weight of group sparse regulation term | 0.012 |
| $\lvert L_i \rvert$ | Norm of damping parameters $L_i$ | 0.1 |

Table 3. Parameter of controller in IEEE 24 bus test system.

| Parameter Name | Description | Value |
|---|---|---|
| $\alpha$ | damping factor for cost $p_i$ | 0.8 |
| $N$ | Horizon length for cost | 10 |
| $\beta_i$ | Learning rate of neural network | 1 |
| $\gamma_i$ | Weight of group sparse regulation term | 0.005 |
| $\lvert L_i \rvert$ | Norm of damping parameters $L_i$ | 0.1 |

Share and Cite

MDPI and ACS Style

Sun, J.; Qi, G.; Zhu, Z. A Sparse Neural Network Based Control Structure Optimization Game under DoS Attacks for DES Frequency Regulation of Power Grid. Appl. Sci. 2019, 9, 2217. https://doi.org/10.3390/app9112217

AMA Style

Sun J, Qi G, Zhu Z. A Sparse Neural Network Based Control Structure Optimization Game under DoS Attacks for DES Frequency Regulation of Power Grid. Applied Sciences. 2019; 9(11):2217. https://doi.org/10.3390/app9112217

Chicago/Turabian Style

Sun, Jian, Guanqiu Qi, and Zhiqin Zhu. 2019. "A Sparse Neural Network Based Control Structure Optimization Game under DoS Attacks for DES Frequency Regulation of Power Grid" Applied Sciences 9, no. 11: 2217. https://doi.org/10.3390/app9112217
