A Sparse Neural Network Based Control Structure Optimization Game under DoS Attacks for DES Frequency Regulation of Power Grid

Sun, Jian; Qi, Guanqiu; Zhu, Zhiqin

doi:10.3390/app9112217


by Jian Sun ¹, Guanqiu Qi ²,* and Zhiqin Zhu ³,*

¹ School of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
² Computer Information Systems Department, Buffalo State College, Buffalo, NY 14222, USA
³ College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Authors to whom correspondence should be addressed.

Appl. Sci. 2019, 9(11), 2217; https://doi.org/10.3390/app9112217

Submission received: 22 April 2019 / Revised: 26 May 2019 / Accepted: 26 May 2019 / Published: 30 May 2019

(This article belongs to the Section Optics and Lasers)


Abstract

With the rapid growth of distributed energy sources, the power grid has become a flexible and complex networked control system. However, this also increases its exposure to denial-of-service (DoS) attacks, which degrade the performance of the power grid and can even cause cascading failures. To mitigate the negative effects of DoS attacks and enhance the reliability of the power grid, we propose a networked control system structure optimization scheme derived from a Stackelberg game model for the frequency regulation of a power grid with distributed energy sources. The proposed game model considers both the DoS attacker and the control system designer (the defender) without using any analytical model. For the defender, we propose a sparse neural network based DES control and system structure design scheme. The neural network approximates the desired control output and the reinforcement signals to improve short- and long-term performance. Column-group sparse regularization is also introduced into the neural network learning process to explore the structure of the control system, which involves the placement of sensors, distributed energy source actuators, and the communication topology. For the DoS attacker, the related attack constraints and attack rewards are established. The game equilibrium is regarded as the optimal solution for both the DoS attack strategy and the control structure, and an offline optimization algorithm is proposed to solve for it. The effectiveness of the proposed scheme is verified in two cases, which illustrate the optimal solutions of both the control structure and the DoS attack strategy.

Keywords:

DoS attack; reinforcement learning; neural network; distributed energy resources; power grid; smart grid

1. Introduction

With the development of network techniques, electricity supply via the modern power grid increasingly depends on networked control systems (NCSs). In integrated power and communication networks, the efficiency and reliability of the power grid have gradually improved [1,2]. However, new network control techniques also introduce vulnerabilities into the control systems of power grids. Because NCSs connect the cyber and physical worlds, cyber attacks against them can cause large disturbances in the power grid, as incidents during the past few years have confirmed [3].

As a common type of cyber attack, a denial-of-service (DoS) attack occupies communication resources to block the transmission of measurement and control signals in NCSs [1]. Compared with deception attacks [4], DoS attacks not only require little prior knowledge about NCSs but also disrupt control operations in real time [5]. In particular, blocking real-time control signals can destabilize the power grid [6]. Many security control approaches, such as stochastic time delay systems, triggering strategies, and game theory, have been applied to defend power grids against DoS attacks.

Stochastic time delay system: In a stochastic time delay system, a DoS attack is modeled as a stochastic process that delays the signal. For power grids subject to intermittent DoS attacks, An et al. proposed a decentralized adaptive output feedback control solution [7], in which a switching-type state estimator estimates the state of the power grid from discontinuous output measurements. Sun et al. first modeled DoS attacks as a Markov process that converts them into stochastic noise in NCSs and then used a resilient control model to handle the converted noise [8].

Triggering strategy: An event-triggered time sequence of control signals is adopted to reduce communication costs in the system, and such a triggering sequence can also help defend against DoS attacks. Peng et al. proposed a resilient event-triggered frequency control method for power systems under energy-limited DoS attacks [1]; the proposed event-triggered communication scheme tolerates a certain degree of data loss in open communication caused by energy-limited DoS attacks. Hu et al. designed a resilient periodic event-triggered communication scheme to identify DoS attacks initiated by power-constrained pulse-width-modulated jammers [9].

Game theory: In game-theoretic cyber security models, a game is constructed between the DoS attacker and the defender, and its Nash equilibrium is taken as the optimal solution. Li et al. modeled the interactions between the transmission power of sensors and the interference power of a DoS attacker in a signal-to-interference-and-noise-ratio (SINR) based network [10] and proposed a modified Nash Q-learning algorithm to analyze these interactions. Yuan et al. analyzed the resilient control of NCSs under DoS attacks via a unified game approach [5]. Yuan et al. also built a multi-stage hierarchical game with a corresponding hierarchy of decisions to achieve a resilient control system [11], in which optimality criteria were constructed for DoS attackers and cyber defenders according to optimal control structures. Ding et al. modeled remote estimation under DoS attacks as a zero-sum stochastic game and presented a monotone structure for solving the proposed model [12]. Ding et al. also formulated the decision-making process over a target channel as a two-player zero-sum stochastic game and proposed a Nash Q-learning algorithm to obtain the optimal strategy [13]. Zhu et al. built a hybrid game-theoretic framework to improve the robustness of power system security [14]. Because game-theoretic control strategies can handle strategic interactions among multiple decision makers, they are often used in large-scale system security control.

Although game-theoretic methods can be effective against conventional DoS threats, control methods alone are not sufficient to counter potential and increasingly sophisticated attacks. In large-scale control systems, structural factors, such as the placement of distributed energy sources (DES) and sensors and the communication link topology, can critically affect performance under DoS attacks. Existing work on energy storage systems [15], DES placement [16], and sensor scheduling [17] and coverage aims to minimize electrical and computational costs [18]. Existing research on optimizing control system structure for cyber security faces several critical challenges. One is the transient and dynamic behavior of DES and cyber resource allocation: most existing research focuses on steady-state characteristics [19] and ignores the transient and dynamic issues in control systems [20]. Another is that most existing algorithms require a precise analytical model [21]. However, most large-scale power systems cannot be modeled precisely, so a model-free method is required for control-system-structure optimization.

Considering DoS attacks [22,23] and the control-system-structure optimization problem, we propose a sparse neural network based NCS optimization method, derived from a Stackelberg game model, for the frequency regulation of DES in power systems. A neural controller is trained by reinforcement learning in offline simulation and online processes to improve system performance. Under limited cyber resources, system performance is further improved by optimizing the control structure, which consists of the placement of DES and RTU sensors and the communication topology; this structure is optimized by imposing group sparse regularization on the neural network weights. The Stackelberg game model yields a minimax optimization of system performance, so that the optimized control structure is robust to worst-case DoS attacks. The contributions of this paper are summarized as follows:

- Sparse neural network based reinforcement learning is proposed to improve the frequency regulation of DES in control systems without requiring an analytical model of the power system; the approach addresses adaptiveness, performance, and structure.
- The Stackelberg game model is used to derive the optimal control scheme and structure, so that the proposed frequency regulation system is robust to worst-case DoS attacks, which enhances its reliability.

The remainder of this paper is organized as follows: Section 2 formulates the system model and related problems, introducing the power system, the DoS attack model, and the frequency regulation problem; Section 3 elaborates on control structure and control law design by sparse neural network based reinforcement learning; Section 4 derives the Stackelberg game model and the control structure optimization scheme under DoS attacks; and Section 5 presents simulation results that verify the proposed algorithm.

2. Problem Formulation

This section introduces the formulation of the power grid and the control objective. We consider a multi-area system integrated with distributed energy sources (DES). The control objective is to mitigate the frequency deviation. DoS attacks may degrade control system performance and even lead to failures. Thus, a design problem involving both controller design and structure design under cyber attacks is also introduced.

2.1. Power Grid Frequency Dynamic Model

We consider an interconnected multi-area power system in which the n areas are connected to each other by tie-lines (also called transmission lines). As shown in Figure 1, each area is equipped with a turbine generator and DES, such as wind power, solar power, and batteries [20].

Each area also contains a load frequency controller (LFC) and a tie-line bias controller (TBC) for frequency synchronization. Although these synchronization measures regulate the frequency, auxiliary control offered by DES may be necessary to enhance system performance when the power system encounters a severe disturbance, such as a system fault or a sudden large load drop. Considering the auxiliary control, the dynamic model of area i can be formulated as a discrete linear difference equation [24]:

x_{i} (k + 1) = A_{i} x_{i} (k) + B_{i} u_{i} (k) + \sum_{j \in N_{p} (i)} B_{j i} x_{j} (k) + E_{i} w_{i} (k),

(1)

where $i \in N^{+} \cap [1, n]$ and $x_{i} = [Δf_{i}\ \ ΔP_{m i}\ \ ΔP_{v i}\ \ ΔP_{tie-i}\ \ \int ACE_{i}]^{T}$ is the state of area i. Here, $Δf_{i}$ is the deviation from the synchronized frequency; $ΔP_{m i}$ is the mechanical power deviation of the generator; $ΔP_{v i}$ is the valve position deviation of the turbine; $ΔP_{tie-i}$ is the deviation of the tie-line power injected from physically neighboring areas; and $ACE_{i}$ is the area control error (ACE) signal of area i, with $ACE_{i} = α_{i} Δf_{i} + ΔP_{tie-i}$. The auxiliary DES control output for frequency regulation, $u_{i}$, is the sum of the powers generated by all power-electronic-interfaced DES; $w_{i}$ is the disturbance caused by model error or other time-varying factors; $A_{i}$ is the system transition matrix; $B_{i}$ and $B_{j i}$ are the gains of the control input and of the physically neighboring areas, respectively; $E_{i}$ is the disturbance gain; and $N_{p}(i)$ denotes the set of physically neighboring areas of area i.

In this linear model, loads are assumed to be constant because they vary slowly relative to the frequency regulation dynamics. Therefore, $A_{i}$, $B_{i}$, and $B_{j i}$ can be modeled as time invariant [24], and the system is a linear time-invariant (LTI) system as well as an NCS. The DES controller of area i for frequency regulation is written as

u_{i} = φ_{i} (x_{i}, x_{N_{c} (i)}),

(2)

where the time stage k is omitted and $N_{c}(i)$ denotes the set of cyber-connected areas of area i. The controller computes the DES control output from the received states $x_{i}$ and $x_{j}$, $j \in N_{c}(i)$, of the local and remote areas, respectively. The control objective is to mitigate the frequency deviation and reduce the overall cost defined in Section 3, which penalizes the magnitudes of the state $x_{i}$ and the control output $u_{i}$.

2.2. DoS Attack Model

For the NCS described above, we consider an attacker who launches attacks when the power system requires emergency auxiliary control from DES. A DoS attack blocks communication channels to degrade control performance and may even cause system failures. Blocked channels can result in the absence of some remote states $x_{j}$, $j \in N_{c}(i)$, at the current time stage k [25]. When the controller cannot obtain the remote state from cyber-connected area j, a “zero-control” strategy is applied by setting $x_{j} = 0$. The control output $u_{i}$ then becomes

u_{i} = φ_{i} (\{α_{j i} x_{j} \mid j \in N_{c} (i) \cup \{i\}\}),

(3)

where $α_{j i} \in \{0, 1\}$ follows a Bernoulli distribution:

\begin{cases} P(α_{j i}(k) = 0) = δ_{j i}, \\ P(α_{j i}(k) = 1) = 1 - δ_{j i}, \end{cases}

(4)

where $δ_{j i} \in [0, 1]$ is the probability of packet drop in the communication from area j to area i and $k \in N^{+}$. A larger $δ_{j i}$ corresponds to a more intense DoS attack. Because the DoS attacker has limited cyber resources, the attack intensity is bounded. In our model, this constraint is

\sum_{i = 1}^{n} \sum_{j \in N_{c} (i)} δ_{j i} \leq C .

(5)

Each $δ_{j i}$ reflects the attack intensity on one channel, and the total intensity is limited by the attacker's cyber cost. Therefore, the sum of the probabilities $δ_{j i}$ cannot exceed a given real number C.
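The DoS model of Equations (3) to (5) can be simulated directly. The following Python sketch, assuming NumPy, draws α_ji for each cyber link, applies the zero-control substitution and checks the attacker's budget. The function names, the dictionary layout of δ and the example values are hypothetical, and the local state is assumed never to be dropped.

```python
import numpy as np

def received_states(i, states, neighbors, delta, rng):
    """States available to the controller of area i under DoS, cf. Eq. (3).

    states:    dict area -> state vector x_j
    neighbors: cyber-connected set N_c(i)
    delta:     dict (j, i) -> packet-drop probability delta_ji
    A dropped packet (alpha_ji = 0) is replaced by zeros ("zero-control").
    The local state x_i is assumed to be always available.
    """
    received = {i: states[i]}
    for j in neighbors:
        # alpha_ji = 1 with probability 1 - delta_ji, Eq. (4)
        alpha = rng.random() >= delta[(j, i)]
        received[j] = states[j] if alpha else np.zeros_like(states[j])
    return received

def within_budget(delta, C):
    """Attacker budget, Eq. (5): the sum of all delta_ji must not exceed C."""
    return sum(delta.values()) <= C

# Hypothetical three-area example: area 1 receives states from areas 2 and 3.
rng = np.random.default_rng(0)
states = {1: np.ones(5), 2: 2.0 * np.ones(5), 3: 3.0 * np.ones(5)}
delta = {(2, 1): 0.3, (3, 1): 0.1}
x_rx = received_states(1, states, [2, 3], delta, rng)
```

Over many draws, the fraction of zeroed packets on link (j, i) approaches δ_ji, which is how the attack intensity enters the controller's inputs.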

2.3. Control, Structure Design and Optimization Problem

Even when the NCS of the power grid is under DoS attack, one of its objectives is to enhance system performance by mitigating the frequency deviation $Δf_{i}$ and minimizing the control cost. Achieving this objective requires designing the control law $φ_{i}$. The design and optimization of the control structure must also be considered. The control structure comprises the placement of sensors (in remote terminal units (RTUs)), actuators (in DES), and the communication topology, and these placements strongly affect control performance in a large-scale multi-area power system. Moreover, the placed resources (RTUs, DES, etc.) are generally expensive.

Mathematically, the control design seeks a control law $φ_{i}$ that achieves the desired system performance. Defining Q as the control cost that measures system performance, the control design is

φ^{*} = \arg\min_{φ} Q,

(6)

where “arg min” also denotes functional minimization. The control structure design selects the cyber-connected area set $N_{c}(i)$ for $i = 1, \dots, n$:

N_{c}^{*} = \arg\min_{N_{c}} Q .

(7)

The overall optimization therefore searches for a combination of $φ_{i}$ and $N_{c}(i)$, $i = 1, \dots, n$, that minimizes the control cost under the worst-case DoS attack subject to constraint (5), as shown in Equation (8):

(φ^{*}, N_{c}^{*}, δ^{*}) = \arg\max_{δ} \min_{φ, N_{c}} Q,

(8)

where $φ = (φ_{1}, \dots, φ_{n})$, $N_{c} = [N_{c}(1), \dots, N_{c}(n)]$, and $δ = [δ_{1 N_{c}(1)}, \dots, δ_{n N_{c}(n)}]$.

We assume the attacker has full knowledge of the designed power system, so it can determine the attack that yields the worst system performance even when the controller performs at its best. The details of this optimization as a Stackelberg game are discussed in Section 4.

3. Control and Structure Design

3.1. Control Design by Reinforcement Learning

Before introducing the control design, we define a cost function to measure the control cost Q. Control performance usually involves quadratic terms in the state and control output. We therefore define Q using the control strategy utility function from our previous work [24]:

Q_{i}(k) = \min_{u_{i}(k + j),\, j \in [0, \infty]} \sum_{t = 0}^{\infty} α^{N - t} p_{i}(k + t),

(9)

where $0 < α < 1$ is the discount rate and N is a positive integer. The binary performance index $p_{i}$ is defined as:

p_{i}(k) = \begin{cases} 1, & a_{1} \|x_{i}(k)\| + a_{2} \|u_{i}(k)\| \leq c, \\ 0, & \text{otherwise}, \end{cases}

(10)

where $\|\cdot\|$ denotes the 2-norm, $a_{1}$ and $a_{2}$ are the weights of the state and control cost, respectively, and c is the performance threshold. If the current cost is within the allowed range (at most c), the binary performance index in Equation (10) is 1; otherwise, it is 0. The binary index keeps the strategy utility function bounded: over a finite time horizon, the utility remains finite even if the system diverges, which avoids a numerical crash in the learning process.
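The role of the binary index can be checked numerically. This Python sketch, assuming NumPy, implements Equation (10) and a finite-horizon truncation of the discounted sum in Equation (9) for a given index sequence, with weights α^(N − t) as printed; the minimization over future controls is not modeled. The weights a1 and a2, the threshold c and the diverging trajectory are hypothetical.

```python
import numpy as np

def performance_index(x, u, a1, a2, c):
    """Binary performance index p_i(k) of Eq. (10)."""
    cost = a1 * np.linalg.norm(x) + a2 * np.linalg.norm(u)
    return 1 if cost <= c else 0

def truncated_utility(p_seq, alpha, N):
    """Finite-horizon truncation of the discounted sum in Eq. (9) for a fixed
    sequence p_i(k), p_i(k+1), ...: sum_t alpha**(N - t) * p_i(k + t)."""
    return sum(alpha ** (N - t) * p for t, p in enumerate(p_seq))

# A diverging trajectory: the state doubles at every step, control is zero.
x_traj = [2.0 ** k * np.ones(5) for k in range(20)]
p_seq = [performance_index(x, np.zeros(1), a1=1.0, a2=0.1, c=5.0) for x in x_traj]
utility = truncated_utility(p_seq, alpha=0.9, N=20)
```

Although the state norm grows without bound, every p_i is 0 or 1, so the truncated utility stays a finite weighted count.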

To prevent the system from diverging, we construct a desired control output $u_{d i}$ that applies a damping matrix $L_{i} \in R^{5 \times 5}$, $\|L_{i}\| < 1$, to the system, so that the power grid frequency dynamics become

x_{i}(k + 1) = L_{i} x_{i}(k) + \sum_{j \in N_{p}(i) \setminus N_{c}(i)} B_{j i} x_{j}(k) + E_{i} w_{i}(k) .

(11)

If we approximate $u_{d i}$ by a neural network output $\hat{u}_{d i}$, the neural network is defined as

{\hat{u}}_{d i} (k) = M_{a} W_{i} ϕ (k),

(12)

where $ϕ \in R^{n l \times 1}$ is the radial basis vector of the neural network computed from the states $x_{1}, x_{2}, \dots, x_{n}$, $W_{i} \in R^{n l \times n l}$ is the trainable weight matrix, and $M_{a} \in R^{1 \times n l}$ is a given constant matrix. As shown in Figure 2, the proposed neural network is based on radial basis functions (RBFs); RBF-based neural networks can identify nonlinear dynamical systems [26]. Here, l is the dimension of the radial basis for one area. With the control $u_{i} = \hat{u}_{d i}$, the system of area i becomes

x_{i}(k + 1) = L_{i} x_{i}(k) + B_{i} [\hat{u}_{d i}(k) - u_{d i}(k)] + \sum_{j \in N_{p}(i) \setminus N_{c}(i)} B_{j i} x_{j}(k) + E_{i} w_{i}(k) .

(13)

According to control stability theory, since $\|L_{i}\| < 1$, if the terms $B_{i}[\hat{u}_{d i}(k) - u_{d i}(k)] + E_{i} w_{i}(k)$ and $\sum_{j \in N_{p}(i) \setminus N_{c}(i)} B_{j i} x_{j}(k)$ on the right-hand side of Equation (13) are bounded, the system is uniformly ultimately bounded (UUB) [27]. We assume that the disturbance $w_{i}(k)$ is bounded as $\|w_{i}(k)\| \leq ε$, where $ε$ is a small real number, so learning $W_{i}$ should focus on approximating $u_{d i}$.
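The UUB argument can be illustrated numerically. The sketch below, assuming NumPy, lumps the bounded terms of Equation (13) into a single disturbance d(k) with ‖d(k)‖ ≤ b and checks the standard bound ‖x(k)‖ ≤ ‖L_i‖^k ‖x(0)‖ + b / (1 − ‖L_i‖), which follows from ‖L_i‖ < 1 by summing a geometric series. The matrix L_i, the bound b and the initial state are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state = 5
# Hypothetical damping matrix L_i: an orthogonal matrix scaled by 0.8,
# so that its spectral norm is 0.8 (< 1).
Q, _ = np.linalg.qr(rng.standard_normal((n_state, n_state)))
L = 0.8 * Q
norm_L = np.linalg.norm(L, 2)
b = 0.05  # bound on the lumped term (approximation error, disturbance, neighbours)

x = 10.0 * np.ones(n_state)  # large initial disturbance
x0_norm = np.linalg.norm(x)
bounds_ok = True
for k in range(1, 200):
    d = rng.standard_normal(n_state)
    d *= b / np.linalg.norm(d)  # disturbance with ||d|| = b
    x = L @ x + d
    bound = norm_L ** k * x0_norm + b / (1 - norm_L)
    bounds_ok = bounds_ok and bool(np.linalg.norm(x) <= bound + 1e-12)

ultimate_bound = b / (1 - norm_L)  # radius of the ball the state ends up in
```

The transient term decays geometrically, so the state eventually stays inside a ball of radius b / (1 − ‖L_i‖); a smaller approximation error in û_di shrinks this ball.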

The control cost, which measures system performance, is also approximated by a neural network, as shown in Equation (14):

{\hat{Q}}_{i} (k) = M_{c} W_{i} ϕ (k),

(14)

where $M_{c} \in R^{1 \times n l}$ is a given constant matrix.

We train the neural network weight $W_{i}$ to approximate both the desired control output $u_{d i}$ and the performance measure, thereby improving system performance. We therefore define the training loss function as

\begin{aligned} V_{i}(k) = {} & \frac{1}{2} \|B_{i} \hat{u}_{d i}(k) - B_{i} u_{d i}(k)\|^{2} \\ & + \frac{1}{2} \|α^{N + 1} p_{i}(k) + α^{-1} \hat{Q}_{i}(k + 1) - \hat{Q}_{i}(k)\|^{2} + \frac{1}{2} \|\hat{Q}_{i}(k)\|^{2} . \end{aligned}

(15)

The first term is the approximation error of the desired control output, the second term is the approximation error of the system performance, and the third term improves long-term system performance. As shown in Equation (16), the desired control output $u_{d i}$ and the system performance $Q_{i}$ can be expressed by the neural network:

\begin{cases} u_{d i}(k) = M_{a} W_{i}^{*} ϕ(k) + υ_{a i}(k), \\ Q_{i}(k) = M_{c} W_{i}^{*} ϕ(k) + υ_{c i}(k), \end{cases}

(16)

where $\|υ_{a i}(k)\| \leq v$ and $\|υ_{c i}(k)\| \leq v$ are the optimal approximation errors, v is a small number, and $W_{i}^{*}$ is the constant weight matrix of the optimal approximation. Differentiating the loss function $V_{i}(k)$ with respect to $W_{i}(k)$ and neglecting $υ_{a i}$ and $υ_{c i}$ yields the online iteration

\begin{cases} W_{i}(k + 1) = W_{i}(k) - β_{i} \dfrac{\partial V_{i}(k)}{\partial W_{i}(k)}, \\ \dfrac{\partial V_{i}(k)}{\partial W_{i}(k)} = \{M_{a}^{T} [x_{i}(k + 1) - L_{i} x_{i}(k)] + M_{c}^{T} [α^{N + 1} p_{i}(k) + α^{-1} \hat{Q}_{i}(k + 1)]\} ϕ(k)^{T} . \end{cases}

(17)
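As a cross-check on the gradient step, the sketch below differentiates loss (15) directly, assuming NumPy, scalar outputs û_di and Q̂_i, B_i stored as a 1-D array, and Q̂_i(k + 1) held fixed as a target (a semi-gradient, as is common in temporal-difference learning). It is our own derivation for illustration, not a transcription of Equation (17), and the function names are hypothetical.

```python
import numpy as np

def loss_15(W, phi, M_a, M_c, B, u_d, p, Q_next, alpha, N):
    """Loss V_i(k) of Eq. (15); Q_next = Q_hat_i(k+1) is a fixed target."""
    u_hat = (M_a @ W @ phi).item()   # Eq. (12), scalar output
    Q_hat = (M_c @ W @ phi).item()   # Eq. (14), scalar output
    e = alpha ** (N + 1) * p + Q_next / alpha - Q_hat
    return 0.5 * np.sum((B * u_hat - B * u_d) ** 2) + 0.5 * e ** 2 + 0.5 * Q_hat ** 2

def grad_15(W, phi, M_a, M_c, B, u_d, p, Q_next, alpha, N):
    """Analytic gradient of loss_15 with respect to W (semi-gradient)."""
    u_hat = (M_a @ W @ phi).item()
    Q_hat = (M_c @ W @ phi).item()
    e = alpha ** (N + 1) * p + Q_next / alpha - Q_hat
    g_actor = (B @ B) * (u_hat - u_d)  # from 0.5 * ||B (u_hat - u_d)||^2
    g_critic = Q_hat - e               # from 0.5 * e^2 + 0.5 * Q_hat^2
    # d(M W phi)/dW = M^T phi^T for a row vector M and a column vector phi
    return (g_actor * M_a.T + g_critic * M_c.T) @ phi.T

def sgd_step(W, beta, *args):
    """One iteration W(k+1) = W(k) - beta * dV/dW."""
    return W - beta * grad_15(W, *args)
```

Shapes follow the text: M_a and M_c are 1 × nl, W_i is nl × nl and ϕ is an nl × 1 column, so the gradient has the same shape as W_i.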

3.2. Structure Design by Sparse Neural Networks

The structure design, which involves the placement of sensors and actuators and the communication topology, is formulated through the neural network weight $W_{i}$. The weight matrix is partitioned into column groups that match the radial basis $ϕ$, defined as

ϕ = [ϕ_{1}^{T}\ \ ϕ_{2}^{T}\ \ \cdots\ \ ϕ_{n}^{T}]^{T},

(18)

where $ϕ_{i} = [e^{-\|x_{i} - x_{c 1}\|^{2} / σ^{2}}\ \ e^{-\|x_{i} - x_{c 2}\|^{2} / σ^{2}}\ \ \cdots\ \ e^{-\|x_{i} - x_{c l}\|^{2} / σ^{2}}]^{T}$, $x_{c j}$ is a given radial basis center, $j \in N^{+} \cap [1, l]$, and $σ$ is a given radial basis width. By this definition of $ϕ$, the neural network weight is partitioned into n column groups, each consisting of l columns that correspond to the radial basis of one area. If the weight matrix $W_{i}$ is group sparse [28] with respect to these column groups, the structure of the control system can be determined. For instance, if column group j of $W_{i}$, which is the gain of $ϕ_{j}$, is zero, the controller of area i does not need information from area j, and the communication channel from area j to area i is unnecessary. If none of the controllers requires information from area j, no sensor is needed in area j. If $W_{i} = 0$, no actuator is required in area i. The key is therefore to force $W_{i}$ to be column-group sparse, which is achieved by adding a group sparse regularization term to the loss function:

\begin{aligned} V_{i}(k) = {} & \frac{1}{2} \left[\|B_{i} \hat{u}_{d i}(k) - B_{i} u_{d i}(k)\|^{2} + \|\hat{Q}_{i}(k)\|^{2}\right] \\ & + \frac{1}{2} \|α^{N + 1} p_{i}(k) + α^{-1} \hat{Q}_{i}(k + 1) - \hat{Q}_{i}(k)\|^{2} + γ_{i} \sum_{j = 1}^{n} \|\operatorname{vec}(W_{i G(j)})\|, \end{aligned}

(19)

where $W_{i G(j)}$ denotes column group j, i.e., the gain of the radial basis $ϕ_{j}$, and $γ_{i}$ is the regularization weight. The sparse-regularized learning iteration, derived in the same way as Equation (17), is given in Equation (20):

\begin{cases} W_{i}(k + 1) = W_{i}(k) - β_{i} \dfrac{\partial V_{i}(k)}{\partial W_{i}(k)}, \\ \dfrac{\partial V_{i}(k)}{\partial W_{i}(k)} = \{M_{a}^{T} [x_{i}(k + 1) - L_{i} x_{i}(k)] + M_{c}^{T} [α^{N + 1} p_{i}(k) + α^{-1} \hat{Q}_{i}(k + 1)]\} ϕ(k)^{T} + γ_{i} W_{i}(k) D_{i}(k), \end{cases}

(20)

where $D_{i} = \operatorname{diag}[\mathbf{1} \|\operatorname{vec}(W_{i G(1)})\|^{-1}\ \ \mathbf{1} \|\operatorname{vec}(W_{i G(2)})\|^{-1}\ \ \cdots\ \ \mathbf{1} \|\operatorname{vec}(W_{i G(n)})\|^{-1}]$ and $\mathbf{1} = [1, 1, \dots, 1] \in R^{l}$. Let $W_{i j}$ denote the jth weight block of $W_{i}$, which multiplies $ϕ_{j}$. Thus, we have

{\hat{u}}_{d i} = W_{i 1} ϕ_{1} + W_{i 2} ϕ_{2} + \dots + W_{i n} ϕ_{n} .

(21)
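The radial basis of Equation (18) and the block form of Equation (21) can be sketched as follows, assuming NumPy, states of dimension 5 and centers shared by all areas (an assumption; the text only says the centers x_cj are given). Because Equation (21) omits M_a, the sketch takes W_ij to be the jth length-l block of the row vector M_a W_i; this interpretation is ours. The usage lines zero one column group and perturb that area's state to show that the output no longer depends on it, which is what makes the sparsity pattern readable as a control structure.

```python
import numpy as np

def rbf_area(x_i, centers, sigma):
    """phi_i of Eq. (18): Gaussian radial basis of one area's state.

    x_i: state of area i, shape (5,); centers: shape (l, 5); returns shape (l,).
    """
    return np.exp(-np.sum((centers - x_i) ** 2, axis=1) / sigma ** 2)

def rbf_all(states, centers, sigma):
    """Stacked basis phi = [phi_1^T ... phi_n^T]^T, shape (n*l,)."""
    return np.concatenate([rbf_area(x, centers, sigma) for x in states])

def control_output(M_a, W, phi):
    """u_hat_di = M_a W_i phi, Eq. (12)."""
    return (M_a @ W @ phi).item()

def control_output_blocks(M_a, W, phi, n, l):
    """Same output as a sum over areas, Eq. (21), with W_ij taken as the
    j-th length-l block of the row vector M_a W_i (our interpretation)."""
    row = M_a @ W
    return sum(float(row[0, j * l:(j + 1) * l] @ phi[j * l:(j + 1) * l])
               for j in range(n))

rng = np.random.default_rng(3)
n, l = 3, 4
centers = rng.standard_normal((l, 5))
states = [rng.standard_normal(5) for _ in range(n)]
M_a = rng.standard_normal((1, n * l))
W = rng.standard_normal((n * l, n * l))
W[:, l:2 * l] = 0.0                  # zero the column group of area 1 (0-based)
phi = rbf_all(states, centers, sigma=1.0)
u1 = control_output(M_a, W, phi)
states[1] = states[1] + 10.0         # perturb area 1's state
u2 = control_output(M_a, W, rbf_all(states, centers, sigma=1.0))
```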

The stability of the iteration in Equation (20) can be analyzed in a similar way to [24]: if $β_{i}$ and $γ_{i}$ are small enough, the iteration is stable and the resulting errors are bounded.
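The regularization gradient γ_i W_i(k) D_i(k) in Equation (20) scales each column of W_i by the inverse norm of its column group. A small Python sketch, assuming NumPy, builds D_i as in Equation (20) and agrees with a finite-difference derivative of the penalty in Equation (19) wherever no group is zero. The `eps` guard for groups that are exactly zero is our addition, because the penalty is not differentiable there; the function names are hypothetical.

```python
import numpy as np

def group_norms(W, n, l):
    """Norms ||vec(W_{iG(j)})|| of the n column groups (l columns each) of W."""
    return np.array([np.linalg.norm(W[:, j * l:(j + 1) * l]) for j in range(n)])

def group_sparse_penalty(W, n, l, gamma):
    """Regularization term gamma * sum_j ||vec(W_{iG(j)})|| of Eq. (19)."""
    return gamma * group_norms(W, n, l).sum()

def group_sparse_grad(W, n, l, gamma, eps=1e-12):
    """Gradient gamma * W @ D of the penalty, with D as in Eq. (20).

    D repeats 1/||vec(W_{iG(j)})|| over the l columns of group j; eps guards
    against division by zero for groups that are already exactly zero.
    """
    inv = 1.0 / np.maximum(group_norms(W, n, l), eps)
    D = np.diag(np.repeat(inv, l))
    return gamma * W @ D

# In Eq. (20), this term is added to the gradient of the unregularized loss.
```

Each group's gradient has norm γ_i regardless of the group's size, so small groups are pushed toward zero at the same rate as large ones; this is what drives whole areas out of the structure.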

4. Structure Optimization under DoS Attacks

In this section, the control system structure design is solved as a structure optimization problem. To account for DoS attacks, a Stackelberg game is formulated for the structure optimization, and the corresponding optimization algorithm is presented.

4.1. Stackelberg Game Formulation

As described by Equation (20), the designer's iteration first searches for a system structure and control scheme in simulation, and the DoS attacker then observes the system and takes attack actions. Because this is a leader-follower sequence, a Stackelberg game model is proposed to obtain the optimal control structure and DoS attack strategy.

There are two players in the proposed model: the defender, which is the system designer, and the attacker, which launches the DoS attacks. The defender's action is $N_{c} = \{N_{c}(1), N_{c}(2), \dots, N_{c}(n)\}$, and its reward function is defined as

r(N_{c}, δ) = \sum_{i = 1}^{n} Q_{i}\big|_{N_{c}(i), δ} .

(22)

The attacker's action is $δ = \{δ_{N_{c}(1) 1}, δ_{N_{c}(2) 2}, \dots, δ_{N_{c}(n) n}\}$, and its reward is $-r(N_{c}, δ)$. The game is therefore zero-sum, and the structure optimization becomes the min-max problem

(N_{c}^{*}, δ^{*}) = \arg\min_{δ} \max_{N_{c}} r(N_{c}, δ),

(23)

where $N_{c}^{*}$ and $δ^{*}$ form the equilibrium of the game model, which can also be regarded as the optimal solution for both the defender and the attacker. Thus, $r(N_{c}, δ^{*}) \leq r(N_{c}^{*}, δ^{*}) \leq r(N_{c}^{*}, δ)$ for any admissible $N_{c}$ and $δ$.
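On a finite set of candidate structures and attacks, the min-max problem in Equation (23) can be solved by enumeration. The sketch below, assuming NumPy and a hypothetical reward table, computes the defender's best response for each attack and then the attacker's choice. In the paper, r is evaluated by simulation and δ is continuous, so this only illustrates the order of the two optimizations.

```python
import numpy as np

def stackelberg_min_max(r):
    """Solve Eq. (23) on a finite table r[s, a] = r(N_c^(s), delta^(a)).

    For each attack a, the defender's best response maximizes r over the
    structures s (cf. Eq. (24)); the attacker then picks the attack that
    minimizes this best-response reward.
    """
    best_structure = np.argmax(r, axis=0)               # defender reply per attack
    best_value = r[best_structure, np.arange(r.shape[1])]
    a_star = int(np.argmin(best_value))                 # attacker's choice
    return int(best_structure[a_star]), a_star, float(best_value[a_star])

# Hypothetical rewards: rows are candidate structures, columns are attacks.
r_table = np.array([[5.0, 2.0, 4.0],
                    [3.0, 6.0, 1.0],
                    [4.0, 4.0, 3.0]])
s_star, a_star, value = stackelberg_min_max(r_table)
```

Here the defender's best replies to the three attacks earn 5, 6 and 4, so the attacker picks the third attack and the defender answers with the first structure.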

4.2. Structure Optimization under DoS Attacks

For a given DoS attack $δ$, the optimal structure and control law can be obtained by the iteration of Equation (20), which yields

N_{c}^{*}\big|_{δ} = \arg\max_{N_{c}} r(N_{c}, δ) .

(24)

If $j \notin N_{c}^{*}(i)$, the iteration of Equation (20), being a numerical method, usually yields a small $\|\operatorname{vec}(W_{i j})\|$ rather than exactly $\|\operatorname{vec}(W_{i j})\| = 0$. Therefore, our algorithm uses a threshold and estimates $N_{c}^{*}|_{δ}$ by Equation (25):

\hat{N}_{c}^{*}\big|_{δ}(i) = \{j \mid \|\operatorname{vec}(W_{i j})\| > ρ_{i}\},\quad i \in N^{+} \cap [1, n],

(25)

where $N^{+}$ is the set of positive integers and $ρ_{i}$ is a given small positive threshold. The equilibrium of the Stackelberg game can be reached by solving the optimization problem of Equation (23) or, equivalently, the constrained minimization problem

\begin{aligned} δ^{*} = {} & \arg\min_{δ} r(N_{c}^{*}\big|_{δ}, δ), \\ \text{s.t.}\ & \sum_{i = 1}^{n} \sum_{j \in N_{c}(i)} δ_{j i} \leq C,\quad δ_{j i} \in [0, 1],\quad j, i \in N^{+} \cap [1, n] . \end{aligned}

(26)
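The thresholding step and the resulting placements can be sketched in Python as follows, assuming NumPy and 0-based area indices. The sketch keeps area j in N̂_c(i) when the norm of block W_ij exceeds ρ_i, consistent with the statement that the blocks of excluded areas come out small, and reads sensor and actuator placement off the structure as described in Section 3.2. The function names are hypothetical.

```python
import numpy as np

def estimate_structure(W_list, n, l, rho):
    """Thresholded structure estimate, cf. Eq. (25).

    W_list[i]: weight matrix W_i with n column groups of l columns.
    rho[i]:    small positive threshold rho_i.
    Returns N_c[i] = set of areas j whose block W_ij has norm above rho[i].
    """
    structure = []
    for i, W in enumerate(W_list):
        norms = [np.linalg.norm(W[:, j * l:(j + 1) * l]) for j in range(n)]
        structure.append({j for j, v in enumerate(norms) if v > rho[i]})
    return structure

def placements(structure):
    """Sensors: areas whose information some controller uses.
    Actuators: areas whose controller uses any information (W_i not ~0)."""
    sensors = set().union(*structure)
    actuators = {i for i, s in enumerate(structure) if s}
    return sensors, actuators
```

Communication links are the pairs (j, i) with j in N̂_c(i), so the same sets describe the topology that the attacker can target.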

In summary, the algorithm of structure optimization under DoS attacks is described in Algorithm 1.

Algorithm 1 Algorithm of Structure Optimization under DoS Attacks

Input: the radial basis dimension l, area number n, learning rate $β_{i}$, and regularization weight $γ_{i}$.
Output: system structure $N_{c}^{*}$ and optimal attack strategy $δ^{*}$.
- (a) Initialize $M_{a}$, $M_{c}$, $W_{i}$, $i = 1, 2, \dots, n$, and $δ(0)$, with each element drawn from a Gaussian distribution with mean 0.01 and variance 0.14. Set $t = 0$ and the maximal iteration number MAXITER;
- (b) For the given $δ(t)$, find $\hat{N}_{c}^{*}|_{δ(t)}$ by the iterations of Equations (20) and (25);
- (c) Measure the gradient of $r[N_{c}^{*}|_{δ}, δ(t)]$ with respect to $δ$;
- (d) Update $δ$ to obtain $δ(t + 1)$ based on the gradient obtained in step c, handling the constraints of Equation (26) by the Lagrange method [29] or the barrier function method [30];
- (e) Check whether the change in $δ$ is less than a threshold or t equals MAXITER. If so, go to step f; otherwise, set $t = t + 1$ and return to step b;
- (f) Obtain $N_{c}^{*} = \hat{N}_{c}^{*}|_{δ(t)}$ and $δ^{*} = δ(t)$.
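Algorithm 1 leaves the constrained update in step d open (Lagrange or barrier methods). The sketch below, assuming NumPy, swaps in a different but standard choice, projected gradient descent with an exact Euclidean projection onto the budget set of Equation (26), and uses central finite differences for step c. The reward passed in is a hypothetical stand-in: in Algorithm 1, each evaluation of r(N̂_c*|δ, δ) would rerun the learning of Equations (20) and (25).

```python
import numpy as np

def project_budget(v, C, iters=100):
    """Euclidean projection onto {delta : 0 <= delta <= 1, sum(delta) <= C}.

    If clipping alone satisfies the budget it is the projection; otherwise
    find lam > 0 with sum(clip(v - lam, 0, 1)) = C by bisection.
    """
    d = np.clip(v, 0.0, 1.0)
    if d.sum() <= C:
        return d
    lo, hi = 0.0, float(np.max(v))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.clip(v - lam, 0.0, 1.0).sum() > C:
            lo = lam
        else:
            hi = lam
    return np.clip(v - hi, 0.0, 1.0)

def fd_gradient(r, delta, h=1e-6):
    """Central finite-difference gradient of r at delta (step c)."""
    g = np.zeros_like(delta)
    for m in range(delta.size):
        e = np.zeros_like(delta)
        e[m] = h
        g[m] = (r(delta + e) - r(delta - e)) / (2 * h)
    return g

def optimize_attack(r, delta0, C, step=0.1, tol=1e-8, max_iter=1000):
    """Projected gradient descent on r over the attack budget set (steps b to f)."""
    delta = project_budget(np.asarray(delta0, dtype=float), C)
    for _ in range(max_iter):
        new = project_budget(delta - step * fd_gradient(r, delta), C)
        if np.linalg.norm(new - delta) < tol:   # step e: change below threshold
            return new
        delta = new
    return delta

weights = np.array([3.0, 1.0, 2.0])  # hypothetical per-channel importance

def toy_reward(d):
    """Hypothetical defender reward that falls as important channels are jammed."""
    return float(np.sum(weights * (1.0 - d)))

delta_star = optimize_attack(toy_reward, [0.1, 0.1, 0.1], C=1.4)
```

With this linear toy reward the attacker spends its budget of 1.4 on the most important channels first, fully jamming the first and partially jamming the third.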

The structure $N_{c}^{*}$ obtained by Algorithm 1 is the optimized control structure of the power system, and Algorithm 1 also yields the worst-case system performance under our controller. However, we do not consider a control system rendered unstable by a large C [5], i.e., an attacker with abundant cyber resources for launching DoS attacks. These issues will be investigated in future work.

5. Experiments and Analysis

This section presents two cases, the IEEE 14 bus and 24 bus systems [31], to show the effectiveness and advantages of the proposed scheme. The scheme takes DoS attacks into account and considers the worst-case attacks under the constraints of the Stackelberg game model described above. Within this game model, the scheme uses the optimal structure design to enhance system performance through reinforcement learning. We assume the sub-system parameters are as follows:

A_{i} = \begin{pmatrix} 0 & -\sum_{j \in D_{i}} T_{j i} & 0 & 0 & 0 \\ 1 / M_{i} & -D_{i} / M_{i} & 1 / M_{i} & 0 & 0 \\ 0 & 0 & -1 / T_{d i} & 1 / T_{d i} & 0 \\ 0 & -1 / (T_{g i} R_{g i}) & 0 & 1 / T_{g i} & K_{i} / T_{g i} \\ 1 & -b_{i} & 0 & 0 & 0 \end{pmatrix},\quad B_{j i} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ \frac{T_{j i}}{M_{i} \sum_{h \in D_{i}} T_{i h}} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},\quad B_{i} = \begin{pmatrix} 0 & 1 / M_{i} & 0 & 0 & 0 \end{pmatrix}^{T},

where $M_{i}$ and $D_{i}$ are the inertia and damping constants, respectively, $T_{g i}$ and $T_{d i}$ are the governor and gas turbine time constants, respectively, and $R_{g i}$ and $T_{i j}$ are the regulation and synchronizing constants, respectively [20]. The related sub-system parameters are listed in Table 1 [24].
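Equation (1) is a discrete-time model, while the matrices above appear to be continuous-time ones, and the simulations below use a sampling period of 0.1 s. The text does not say how the two are linked; a common choice, assumed here, is zero-order-hold discretization, A_d = e^(A T) and B_d = ∫₀ᵀ e^(Aτ) dτ B. This Python sketch, assuming NumPy only, transcribes A_i and B_i as printed and computes both through one exponential of an augmented matrix. The parameter values are hypothetical placeholders, not those of Table 1.

```python
import numpy as np

def expm_taylor(X, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(X, 1), 1.0)))))
    Y = X / 2.0 ** s                      # now ||Y||_1 <= 1
    E = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ Y / k
        E = E + term
    for _ in range(s):                    # undo the scaling: e^X = (e^Y)^(2^s)
        E = E @ E
    return E

def area_matrices(M, D, Tg, Td, Rg, K, b, T_sum):
    """Continuous-time A_i and B_i transcribed as printed; T_sum = sum_j T_ji."""
    A = np.array([
        [0.0, -T_sum, 0.0, 0.0, 0.0],
        [1.0 / M, -D / M, 1.0 / M, 0.0, 0.0],
        [0.0, 0.0, -1.0 / Td, 1.0 / Td, 0.0],
        [0.0, -1.0 / (Tg * Rg), 0.0, 1.0 / Tg, K / Tg],
        [1.0, -b, 0.0, 0.0, 0.0],
    ])
    B = np.array([[0.0], [1.0 / M], [0.0], [0.0], [0.0]])
    return A, B

def discretize_zoh(A, B, dt):
    """Zero-order-hold discretization via the augmented matrix exponential."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm_taylor(aug * dt)
    return E[:n, :n], E[:n, n:]

# Hypothetical parameter values for one area; sampling period 0.1 s.
A, B = area_matrices(M=10.0, D=1.0, Tg=0.08, Td=0.3, Rg=2.4, K=1.0, b=0.5, T_sum=0.5)
Ad, Bd = discretize_zoh(A, B, 0.1)
```

The augmented-matrix form avoids inverting A, which matters here because the first column structure of A_i need not make it invertible.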

The constraints of DoS attacks are set as follows:

\sum_{i = 1}^{14} \sum_{j \in N_{c} (i)} δ_{j i} \leq 1.4 .

The subsystems in the grid communicate with each other through communication channels. The above constraint means that the sum of all communication channel jamming probabilities is at most 1.4, i.e., an average of 0.1 over the 14 buses. The larger C in Equation (26) is, the more cyber resources the attacker can use. In the following simulations, the sampling period is 0.1 s.

5.1. Case I: IEEE 14 Bus Test System

The IEEE 14 bus test system is simulated under reinforcement learning control and DoS attacks. The initial state of the power grid simulation is assumed to be affected by a large disturbance, and the simulation lasts 5 s. As shown in Figure 3, without any control the system collapses under serious DoS attacks after a large power grid disturbance occurs: the frequency grows rapidly and reaches a maximum of about 40 Hz within 5 s. Therefore, a controller is required to maintain stability. The parameters of the reinforcement learning algorithm are listed in Table 2, and details of the parameter selection principle can be found in [2].

The parameter $β_{i}$ strongly affects the results for the IEEE 14 bus test system. If $β_{i}$ is too large, the online learning of Equation (20) diverges; if it is too small, convergence may be slow. The selection of $γ_{i}$ is another key factor: too large a $γ_{i}$ may make $W_{i}$ overly sparse and degrade control performance.
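The sensitivity to β_i has a simple explanation when the targets in loss (15) are held fixed, because the loss is then quadratic in W_i: for a quadratic ½ wᵀHw with positive definite H, gradient descent converges if and only if 0 < β < 2/λ_max(H), and it slows down as β approaches 0. The Python sketch below, assuming NumPy and a hypothetical 2 × 2 curvature matrix, shows the three regimes; it is a textbook illustration, not a reproduction of the learning in Equation (20).

```python
import numpy as np

def gd_final_loss(H, beta, steps=200):
    """Gradient descent on 0.5 * w^T H w from w = 1; returns the final loss."""
    w = np.ones(H.shape[0])
    for _ in range(steps):
        w = w - beta * (H @ w)   # each eigen-component is scaled by (1 - beta * lam)
    return 0.5 * w @ H @ w

H = np.diag([4.0, 1.0])          # hypothetical curvature; largest eigenvalue 4
beta_max = 2.0 / 4.0             # stability limit 2 / lambda_max

loss_ok = gd_final_loss(H, 0.4)      # below the limit: converges quickly
loss_big = gd_final_loss(H, 0.6)     # above the limit: diverges
loss_small = gd_final_loss(H, 0.001) # far below the limit: barely moves
```

In the same spirit, γ_i adds the group-sparse term to the curvature seen by the iteration, which is one reason both β_i and γ_i must be kept small.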

Figures 4 and 5 show the results obtained with the proposed control scheme under the optimal DoS attack strategy. The optimal DoS attack strategy is obtained by solving Equation (26), and the optimal controllers are learned by the online iteration of Equation (20). As Figure 4 shows, the frequency deviation curves of all sub-systems converge to a steady value around 0. The maximal swing magnitude is small, about 1.2 Hz, and the swings end in about 4 s.

For the optimal DoS attack strategy shown in Figure 5a, the attacks focus on the communication from buses 3, 6, 10, 11, and 13. Figure 5b shows the optimal control structure, covering the placement of sensors and actuators and the communication topology, in terms of the F-norm (Frobenius norm) of each block $W_{ij}$ in $W_{i}$, $i, j = 1, 2, \dots, n$. A large norm value means that sub-system j needs information from sub-system i. In that case, sub-system i needs to install sensors, sub-system j requires actuators, and a communication link from sub-system i to sub-system j is required. The sensors are mainly installed in buses 2, 3, 6, 10, 11, 13, and 14. According to the solution, the communication lines that are not attacked keep the system stable. Thus, the obtained solution is optimal for both the attacker and the defender (control system designer) as the game equilibrium.
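The rule for reading the control structure off the block norms can be sketched in Python as follows. The sketch assumes the blocks are stored as a hypothetical 4-D NumPy array `blocks[i, j]` and that a norm counts as "large" when it exceeds a user-chosen threshold; the text does not state a threshold, so that value is an assumption. Not creating a communication link for diagonal blocks (local feedback) is also an assumption of the sketch. Indices are 0-based, whereas buses in the text are numbered from 1.

```python
import numpy as np


def structure_from_blocks(blocks, threshold):
    """Read a control structure off the block Frobenius norms.

    blocks[i, j] is the weight block W_ij (hypothetical layout: an array
    of shape (n, n, rows, cols)). A large ||W_ij||_F is read as "sub-system j
    needs information from sub-system i", so i gets a sensor, j gets an
    actuator, and a communication link i -> j is required.
    """
    n = blocks.shape[0]
    # Frobenius norm of every block: sqrt of the sum of squared entries.
    norms = np.sqrt(np.sum(blocks ** 2, axis=(2, 3)))
    sensors, actuators, links = set(), set(), set()
    for i in range(n):
        for j in range(n):
            if norms[i, j] > threshold:
                sensors.add(i)
                actuators.add(j)
                if i != j:  # assumption: local feedback needs no channel
                    links.add((i, j))
    return norms, sorted(sensors), sorted(actuators), sorted(links)


# Toy example with 3 sub-systems and 2 x 2 blocks (illustrative values).
blocks = np.zeros((3, 3, 2, 2))
blocks[0, 0] = 1.0   # local feedback in sub-system 0
blocks[0, 2] = 0.8   # sub-system 2 uses information from sub-system 0
blocks[1, 1] = 0.6   # local feedback in sub-system 1
blocks[2, 1] = 0.01  # negligible coupling, pruned by the threshold
norms, sensors, actuators, links = structure_from_blocks(blocks, threshold=0.5)
print(sensors, actuators, links)
```

The group-sparse training is what makes this thresholding meaningful: pruned blocks are driven to (near) zero, so the surviving blocks map directly to sensors, actuators, and links.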

5.2. Case II: IEEE 24 Bus Test System

This section verifies the effectiveness of the proposed scheme on a relatively large system, the IEEE 24 bus test system. The simulation lasts 10 s, and both reinforcement learning control and DoS attacks are applied. The optimal attack strategy and the optimized control structure are obtained by solving the optimization problem in Equation (26). Specifically, the optimal attack strategy is solved by the offline optimization algorithm described in Section 4.2, and the optimized control structure is obtained by the online learning of Equation (20).

The parameters of the reinforcement learning algorithm are listed in Table 3. Details of the parameter selection can be found in [2].

The simulation results are shown in Figure 6 and Figure 7. Figure 6 shows the frequency deviation curves of the power grid under the worst case of DoS attacks. The results illustrate the effectiveness of the proposed solution under a large disturbance and DoS attacks. The swings of all frequency deviation curves die out within about 5 s, and their magnitude is small, with a maximum of 1.5 Hz.

According to the optimal attack strategy shown in Figure 7a, the attacker should focus on the communications from buses 2, 3, 4, 14, 19, 21, and 22, with attack probabilities of 0.2–0.35. The optimal control structure under the worst case of DoS attacks is shown in Figure 7b: sensors should be installed in buses 2, 3, 4, 6, 14, 15, 19, 22, 23, and 24. The weight of bus 19 is high, which means the information obtained from bus 19 is highly important. Accordingly, the optimal DoS attack strategy targets the communication lines of bus 19 with a relatively high attack rate.
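The observation that the attacker concentrates on high-weight buses can be illustrated with a deliberately simplified model. The Python sketch below assumes a linear attack reward with hypothetical per-channel importance weights, a per-channel cap, and a total budget; under those assumptions the optimum follows the fractional-knapsack rule of filling the highest-weight channels first. It is not the offline algorithm of Section 4.2, which solves the actual game, and the weights are invented for illustration.

```python
def greedy_jamming_allocation(weights, budget, cap):
    """Maximise sum(weights[k] * delta[k]) subject to
    0 <= delta[k] <= cap and sum(delta.values()) <= budget.

    With a linear reward and these box/budget constraints, filling the
    channels in decreasing order of weight is optimal (fractional knapsack
    with equal unit costs).
    """
    delta = {k: 0.0 for k in weights}
    remaining = budget
    for k in sorted(weights, key=weights.get, reverse=True):
        # Stop once the budget is spent or only worthless channels remain.
        if remaining <= 0 or weights[k] <= 0:
            break
        delta[k] = min(cap, remaining)
        remaining -= delta[k]
    return delta


# Hypothetical importance weights; bus 19 gets the largest weight to mirror
# the qualitative finding in the text, not the paper's actual numbers.
weights = {"bus19": 0.9, "bus2": 0.5, "bus3": 0.4, "bus7": 0.05}
print(greedy_jamming_allocation(weights, budget=1.0, cap=0.35))
```

In the real game the attacker's reward depends on the learned controller, so the importance of each channel is not fixed in advance; the toy model only shows why high-importance links attract the largest attack rates.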

6. Conclusions

This paper proposes a novel control structure optimization method, together with a reinforcement learning method based on an integrated neural network, for power grids under DoS attacks. The reinforcement learning, which involves frequency damping and control performance, is approximated by the integrated neural network. The frequency damping term provides the desired control input, which keeps the system stable and accelerates convergence in online learning. The approximated control performance is adopted as the reinforcement signal to enhance long-term system performance. To obtain a reasonable control scheme and structure under DoS attacks, a Stackelberg game model is also proposed. The worst case of DoS attacks is considered and solved by the proposed optimization algorithm, which also yields the optimal control structure, covering the placement of sensors and actuators (DES) and the communication topology. The simulation results illustrate the effectiveness of the proposed scheme, and both the optimal DoS attack strategy and the optimal control structure are obtained in the simulations. In future work, the convexity of the game and the existence of the game equilibrium will be analyzed. DES constraints that account for environmental conditions and their variation will also be considered. In addition, the error of the proposed algorithm will be analyzed theoretically and numerically.

Author Contributions

Conceptualization, J.S.; methodology, J.S.; software, J.S.; validation, J.S., G.Q. and Z.Z.; formal analysis, Z.Z.; investigation, G.Q. and Z.Z.; resources, J.S.; data curation, J.S. and Z.Z.; writing–original draft preparation, J.S.; writing–review and editing, G.Q.; visualization, Z.Z.; supervision, G.Q.; project administration, J.S. and Z.Z.; funding acquisition, J.S. and Z.Z.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61803061 and 61703347); Fundamental Research Funds for the Central Universities (Grant No. XDJK2019C019); Science and Technology of the Chongqing Natural Science Foundation (Grant No. cstc2016jcyjA0428); the Research Program of Chongqing Municipal Education Commission (Grant No. KJQN201800603); the Innovation Project of Chongqing Overseas Students Entrepreneurial Innovation Support program (Grant No. cx2018074); Chongqing Key Industries Common Key Technology Innovation project (Grant No. cstc2017zdcy-zdyf0366); and Southwest University Education Reform Project (Grant No. 2017JY080).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Chen, P.; Li, J.; Fei, M.R. Resilient event-triggered H∞ load frequency control for networked power systems with energy-limited DoS attacks. IEEE Trans. Power Syst. 2017, 32, 4110–4118.
2. Sun, J.; Zhu, Z.; Li, H.; Chai, Y.; Qi, G.; Wang, H.; Hu, Y.H. An integrated critic-actor neural network for reinforcement learning with application of DERs control in grid frequency regulation. Int. J. Electr. Power Energy Syst. 2019, 111, 286–299.
3. Sridhar, S.; Hahn, A.; Govindarasu, M. Cyber-Physical System Security for the Electric Power Grid. Proc. IEEE 2011, 100, 210–224.
4. Liang, G.; Zhao, J.; Luo, F.; Weller, S.; Dong, Z.Y. A Review of False Data Injection Attacks Against Modern Power Systems. IEEE Trans. Smart Grid 2017, 8, 1630–1638.
5. Yuan, Y.; Yuan, H.; Lei, G.; Yang, H.; Sun, S. Resilient Control of Networked Control System under DoS Attacks: A Unified Game Approach. IEEE Trans. Ind. Inform. 2016, 12, 1786–1794.
6. Srikantha, P.; Kundur, D. Denial of service attacks and mitigation for stability in cyber-enabled power grid. In Proceedings of the Innovative Smart Grid Technologies Conference, Washington, DC, USA, 18–20 February 2015.
7. An, L.; Yang, G.H. Decentralized Adaptive Fuzzy Secure Control for Nonlinear Uncertain Interconnected Systems Against Intermittent DoS Attacks. IEEE Trans. Cybern. 2019, 49, 827–838.
8. Sun, H.; Peng, C.; Yang, T.; Zhang, H.; He, W. Resilient control of networked control systems with stochastic denial of service attacks. Neurocomputing 2017, 270, 170–177.
9. Hu, S.; Yue, D.; Xie, X.; Chen, X.; Yin, X. Resilient Event-Triggered Controller Synthesis of Networked Control Systems Under Periodic DoS Jamming Attacks. IEEE Trans. Cybern. 2018.
10. Li, Y.; Quevedo, D.E.; Dey, S.; Ling, S. SINR-based DoS Attack on Remote State Estimation: A Game-theoretic Approach. IEEE Trans. Control Netw. Syst. 2017, 4, 632–642.
11. Yuan, Y.; Sun, F.; Liu, H. Resilient control of cyber-physical systems against intelligent attacker: A hierarchal stackelberg game approach. Int. J. Syst. Sci. 2016, 47, 2067–2077.
12. Ding, K.; Dey, S.; Quevedo, D.E.; Ling, S. Stochastic Game in Remote Estimation under DoS Attacks. IEEE Control Syst. Lett. 2017, 1, 146–151.
13. Ding, K.; Li, Y.; Quevedo, D.E.; Dey, S.; Ling, S. A multi-channel transmission schedule for remote state estimation under DoS attacks. Automatica 2017, 78, 194–201.
14. Zhu, Q.; Basar, T. Game-Theoretic Methods for Robustness, Security, and Resilience of Cyberphysical Control Systems: Games-in-Games Principle for Optimal Cross-Layer Resilient Control Systems. IEEE Control Syst. 2015, 35, 46–65.
15. Atwa, Y.M.; El-Saadany, E.F. Optimal Allocation of ESS in Distribution Systems With a High Penetration of Wind Energy. IEEE Trans. Power Syst. 2010, 25, 1815–1822.
16. Borges, C.L.T.; Falcão, D.M. Optimal distributed generation allocation for reliability, losses, and voltage improvement. Int. J. Electr. Power Energy Syst. 2006, 28, 413–420.
17. Zhang, H.; Ayoub, R.; Sundaram, S. Sensor selection for Kalman filtering of linear dynamical systems: Complexity, limitations and greedy algorithms. Automatica 2017, 78, 202–210.
18. Gupta, V.; Chung, T.H.; Hassibi, B.; Murray, R.M. On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage. Automatica 2006, 42, 251–260.
19. Qu, C.; Chen, W.; Song, J.B.; Li, H. Distributed Data Traffic Scheduling With Awareness of Dynamics State in Cyber Physical Systems With Application in Smart Grid. IEEE Trans. Smart Grid 2015, 6, 2895–2905.
20. Zhu, Z.; Sun, J.; Qi, G.; Chai, Y.; Chen, Y. Frequency Regulation of Power Systems with Self-Triggered Control under the Consideration of Communication Costs. Appl. Sci. 2017, 7, 688.
21. Li, H. Data traffic scheduling for cyber physical systems with application in voltage control of microgrids. IEEE Syst. J. 2017, 8, 542–552.
22. Cambiaso, E.; Papaleo, G.; Aiello, M. Slowcomm: Design, development and performance evaluation of a new slow DoS attack. J. Inf. Secur. Appl. 2017, 35, 23–31.
23. Cambiaso, E.; Papaleo, G.; Giovanni, C.; Aiello, M. A Network Traffic Representation Model for Detecting Application Layer Attacks. Int. J. Archit. Comput. 2016, 5, 31–42.
24. Sun, J.; Li, J. A Stable Distributed Neural Controller for Physically Coupled Networked Discrete-Time System via Online Reinforcement Learning. Complexity 2018, 2018, 5950678.
25. Ding, D.; Wang, Z.; Ho, D.W.; Wei, G. Observer-Based Event-Triggering Consensus Control for Multiagent Systems With Lossy Sensors and Cyber-Attacks. IEEE Trans. Cybern. 2017, 47, 1936–1947.
26. Robnik-Šikonja, M. Data Generators for Learning Systems Based on RBF Networks. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 926–938.
27. Hansen, L.P.; Sargent, T.J. Robust Control and Model Uncertainty. Am. Econ. Rev. 2001, 91, 60–66.
28. Pan, C.; Liu, W.; Thompson, J.S.; Yang, C.; Jorswieck, E.A. Semi-dynamic Green Resource Management in Downlink Heterogeneous Networks by Group Sparse Power Control. IEEE J. Sel. Areas Commun. 2016, 34, 1250–1266.
29. Bertsekas, D.P. Constrained Optimization and Lagrange Multiplier Methods; Academic Press: Cambridge, MA, USA, 1982.
30. Polak, E.; Yang, T.H.; Mayne, D.Q. A Method of Centers Based on Barrier Functions for Solving Optimal Control Problems with Continuum State and Control Constraints. SIAM J. Control Optim. 2006, 31, 159–179.
31. Zimmerman, R.D.; Murillo-Sanchez, C.E.; Thomas, R.J. MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education. IEEE Trans. Power Syst. 2011, 26, 12–19.

Figure 1. Area structure.

Figure 2. Neural network structure.

Figure 3. Frequency deviation of IEEE 14 bus system without control.

Figure 4. Frequency deviation of IEEE 14 bus system under control.

Figure 5. Optimal DoS attacks and control structure 3D graph in the IEEE 14 bus system.

Figure 6. Frequency deviation of IEEE 24 bus system under control.

Figure 7. Optimal DoS attacks and control structure 3D graph in the IEEE 24 bus system.

Table 1. Parameters of sub-system.

Parameter Name	Description	Value
$M_{i}$	inertia constant	0.2
$D_{i}$	damping constant	0.26
$T_{j i}$	synchronizing constant	0.5
$T_{d i}$	governor constant	5
$R_{g i}$	regulation constant	0.5
$b_{i}$	frequency bias gain	1
$T_{g i}$	gas turbine constant	0.2
$K_{i}$	tie-line bias control gain	0.1

Table 2. Parameters of the controller in the IEEE 14 bus test system.

Parameter Name	Description	Value
$α$	damping factor for cost $p_{i}$	0.9
N	Horizon length for cost	10
$β_{i}$	Learning rate of neural network	0.1
$γ_{i}$	Weight of group sparse regulation term	0.012
$\| L_{i} \|$	Norm of damping parameters $L_{i}$	0.1

Table 3. Parameters of the controller in the IEEE 24 bus test system.

Parameter Name	Description	Value
$α$	damping factor for cost $p_{i}$	0.8
N	Horizon length for cost	10
$β_{i}$	Learning rate of neural network	1
$γ_{i}$	Weight of group sparse regulation term	0.005
$\| L_{i} \|$	Norm of damping parameters $L_{i}$	0.1

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

