A Deep Learning-Based Classification Scheme for False Data Injection Attack Detection in Power System



Introduction
The power system is a complex and interconnected network that transfers electrical energy from generators to users [1,2]. The power grid is continuously operated and monitored by a supervisory control and data acquisition system (SCADA) to ensure a normal operating condition. In particular, the state of the power system is estimated by the measured value, and the system operators use the estimated state to control the actual operation [3][4][5].
By integrating various advanced communication technologies, the power system is moving towards the direction of the smart grid [6][7][8]. However, due to the deep integration of the cyber space with the physical space, the power grid is facing increasing security challenges. In addition, massive real-time power system data has brought about the transformative potential and challenge of protecting smart grid systems. Physical security and cyber security are two significant aspects of power system security. Physical security is the ability of a power system to maintain continuous supply in the event of equipment breakdowns. Cyber security refers to the security of a SCADA system that maintains the operation of the power system. Recently, cyber-attacks have gradually threatened modern power systems due to the ubiquitous use of communication technologies [9][10][11]. Besides, because of the close interlinking between the physical and SCADA systems, the physical security of power systems can be compromised by cyber security vulnerabilities [12][13][14].
Cyber-attacks have led to numerous incidents and are a concern for both power system operators and users. They can undermine or even completely disrupt the control system of the power grid. For instance, in 2010, the Iranian nuclear power plant was invaded by the Stuxnet worm, which falsely altered the system status, spread across the whole SCADA system, and disrupted system protection strategies. On December 23rd, 2015, three Ukrainian regional electric power distribution companies experienced a cyber-attack, which caused power outages affecting nearly 225,000 customers for several hours. Barely a month after the incident, ransomware attacked the Israel Electric Authority through online phishing. These events are fresh examples of the vulnerability of a highly automated smart grid to cyber-attacks. Generally, there are three major types of cyber-attacks: denial of service attack (DoS), replay attack (RA), and false data injection attack (FDIA). The DoS attack occurs when an attacker inserts artificial loads into the service source such that the service is no longer accessible to legitimate requests [15]. The RA involves an attacker replacing the current data with the measurements of an earlier period of time before the control center can make correct decisions about the current system state [16]. The FDIA means that an attacker can access the current power system configuration and manipulate the stored data and measurements. This article focuses on the FDIA, which is regarded as a severe threat to the SCADA system.
There is a growing body of literature that recognizes the FDIA. Studies over the past decades have provided valuable information on the FDIA scenarios and the corresponding detection strategies. Bobba et al. [17] investigated the detection of the FDIA by a strategically selected set of measurements and state variables. The authors show that it is useful to defend against such attacks by protecting a set of basic measurements. Pasqualetti et al. [18] proposed a mathematical framework for cyber-physical systems, characterized fundamental monitoring limitations from system-theoretic and graph-theoretic perspectives, and designed centralized and distributed attack detection and identification monitors. In Reference [19], the authors introduced the attack model with the least amount of effort and formulated the attack strategy, in which several meters are selected for manipulation to cause the maximum damage. To defend against the attacks, the authors also investigated the protection-based defence and detection-based defence. In Reference [20], the problem of false data detection was modelled as a matrix separation problem. The nuclear norm minimization method and low rank matrix factorization method are presented. The authors in [21] introduced two distributed detection methods: distributed observable island detection (DOID) algorithm and distributed time approaching detection (DTAD) algorithm. In Reference [22], the equivalent measurement transformation and the residual researching method are utilized to identify false data. However, to some extent, the above-mentioned traditional methods strongly depend on the prescribed bad data detection threshold and are sensitive to environmental noise. Moreover, they are easily affected by the attack intensity, i.e., the smaller the attack intensity, the lower the detection accuracy.
With the rapid development of artificial intelligence technology, FDIA detection methods based on artificial intelligence have also been widely studied. In Reference [23], the authors designed a support vector machine (SVM) based on the alternating direction method of multipliers, which can effectively identify whether the power system is under attack. Multilayer perceptrons (MLPs), as deep learning models, have been used to detect attacks in [24][25][26]. They treated the FDIA detection problem as a supervised classification problem. In Reference [27], the authors combined the discrete wavelet transform (DWT) and dropout with a recurrent neural network (RNN), extracted the hidden time-frequency domain characteristics, solved the overfitting problem, and increased the accuracy of FDIA detection. This type of artificial intelligence detection model automatically processes features, and its detection accuracy is often higher than that of the traditional methods, but the training of the model relies on large sample datasets and requires too much computation. In addition, the existing works do not consider the impact of historical measurements on the current situation.
In the recent past, a deep belief network (DBN) was proposed as an unsupervised learning method to learn the hierarchical representations and correlation from real-time data [28,29]. It is one of the basic deep learning technologies built by stacking restricted Boltzmann machines (RBMs) [30][31][32]. By implementing automatic feature extraction, the DBN can achieve higher efficiency and accuracy than traditional machine learning algorithms [33][34][35]. Although the DBN demonstrates good performance in static modelling, it encounters challenges in capturing complicated temporal dynamics from time-series input [36]. In light of this, this paper develops an extended version of the DBN, called the conditional deep belief network (CDBN), which updates a conditional Gaussian-Bernoulli RBM (CGBRBM) to model temporal data [37][38][39]. The CDBN-based approach can then identify the hidden correlation and estimate the reliability of the measurement data. The main contributions of this paper are as follows:

• The standard DBN is improved to deal with the continuous real-time series data of the power system flexibly and extract the time correlation.
• A CDBN-based FDIA detection scheme is proposed to evaluate the reliability of the measurement and ensure the safe and stable operation of the power grid.
• By simulating different attack scenarios, the performance of the proposed scheme is evaluated from multiple aspects to ensure its feasibility and effectiveness.
Section 2 presents the system model, the state estimation, and the conventional bad data detection (BDD) system. Section 3 mathematically models the FDIA. Section 4 presents the basic principles of the CDBN and formulates a deep learning-based detection scheme. Section 5 performs case studies to evaluate the performance and effectiveness of the developed methodology. The last section draws conclusions and suggests future work.

State Estimation in Power Systems
Generation, transmission, and distribution are the three main parts of the power system. In a power grid, the control centre must monitor the state of all buses and nodes to make operational decisions as quickly as possible. However, it is impossible to measure all the data directly. Therefore, the control centre estimates the operating conditions of the system by collecting the readings from remote meters.
Let $z = [z_1, z_2, \ldots, z_m]^T$ be an $m \times 1$ vector of all measurements, including loads and power injections at buses, power flows at transmission lines, and so on. $x = [x_1, x_2, \ldots, x_n]^T$ denotes an $n \times 1$ state vector, where $m > n$. $e = [e_1, e_2, \ldots, e_m]^T$, where $e_i \sim N(0, \sigma_i^2)$, is the measurement error. We have

$$z = h(x) + e, \tag{1}$$

where $h(\cdot)$ shows the nonlinear relationship between the measurement $z$ and the state $x$. In a DC power flow model, Equation (1) can be written in the form of a linear matrix:

$$z = Hx + e,$$

where $H$ is an $m \times n$ Jacobian matrix, and $e \sim N(0, \sigma^2)$ is the environmental noise. On this basis, the state vector can be calculated by

$$\hat{x} = (H^T W H)^{-1} H^T W z,$$

where $W = \mathrm{diag}(\sigma_1^{-2}, \sigma_2^{-2}, \ldots, \sigma_m^{-2})$ is the diagonal weight matrix formed from the variances of the measurement errors.

Conventional Bad Data Detection
Erroneous data measurements can occur for a variety of reasons (e.g., device misconfiguration and malicious attacks). These measurements could lead to incorrect state estimates. Therefore, they must be recognized and removed in time. The BDD system can eliminate some random errors. When detecting and identifying the erroneous data, the L2-norm of the measurement residual is first calculated:

$$r = \| z - H\hat{x} \|_2.$$

By comparing the calculated result $r$ with a prescribed threshold $\tau$, the BDD system reports normal data measurements if $r \le \tau$ holds, or bad ones otherwise.
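As a minimal sketch of how the estimate and the residual test fit together, the following Python snippet assumes the standard weighted least-squares estimator $\hat{x} = (H^T W H)^{-1} H^T W z$, with $W$ the diagonal matrix of inverse error variances. The 3 × 2 matrix, the measurement values, and the threshold are illustrative assumptions and are not taken from the IEEE 14-bus case.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def wls_estimate(H, z, variances):
    """Weighted least-squares state estimate via the normal equations."""
    m, n = len(H), len(H[0])
    w = [1.0 / s for s in variances]  # diagonal of W
    G = [[sum(w[k] * H[k][i] * H[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]           # H^T W H
    g = [sum(w[k] * H[k][i] * z[k] for k in range(m)) for i in range(n)]  # H^T W z
    return solve(G, g)

def residual_norm(H, z, x_hat):
    """L2-norm of the measurement residual z - H x_hat."""
    return math.sqrt(sum(
        (z[k] - sum(H[k][i] * x_hat[i] for i in range(len(x_hat)))) ** 2
        for k in range(len(z))))

def is_bad_data(H, z, variances, tau):
    """Flag the measurement vector when the residual exceeds the threshold."""
    return residual_norm(H, z, wls_estimate(H, z, variances)) > tau

# Toy system (assumed): two states, three meters, unit error variances.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
variances = [1.0, 1.0, 1.0]
z_clean = [1.0, 2.0, 3.0]   # consistent with x = [1, 2]
z_bad = [1.0, 2.0, 6.0]     # third meter grossly wrong
tau = 0.5                   # hypothetical threshold

print(is_bad_data(H, z_clean, variances, tau))  # False
print(is_bad_data(H, z_bad, variances, tau))    # True
```

In practice the threshold is usually derived from the noise statistics rather than fixed by hand; the constant here only keeps the example short.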

False Data Injection Attack
When an adversary launches the FDIA, he can manipulate the measurement $z$ to cause an arbitrary change in the estimated value without being detected by the BDD system [40]. Figure 1 presents the process when the state estimation is attacked. Under the condition of the FDIA, an original measurement $z$ can be replaced by a compromised $z_a$, where $z_a = z + a$ and $a$ is an $m \times 1$ malicious data vector. If so, the result of the state estimation then becomes $\hat{x}_a$. In general, the BDD system is likely to recognize a random attack vector $a$. However, in [40], it was found that a few well-designed attack vectors (such as $a = Hc$) can bypass the BDD because the injected false data do not affect the residual:

$$\| z_a - H\hat{x}_a \| = \| z + a - H(\hat{x} + c) \| = \| z - H\hat{x} + (a - Hc) \| = \| z - H\hat{x} \| \le \tau,$$

where $\hat{x}_a = \hat{x} + c$ and $c = [c_1, c_2, \ldots, c_n]^T$ is an arbitrary $n \times 1$ vector. Therefore, the attack is stealthy and can inject any malicious data into the state estimation. However, adversaries can usually only compromise a limited number of measurements, so two main realistic attack scenarios are considered as follows:
1. Least-effort attack: k = 1, adversaries compromise a single measurement to launch the FDIA;
2. Multiple attacks [40]: k > 1, adversaries can compromise up to k measurements to launch the FDIA;
where k is the number of attacked measurements. However, the FDIAs are not constrained by these two scenarios. In the IEEE 14-bus test system, Figure 2 shows the difference in the economic dispatch of the power system before and after the measurement $z$ is attacked. We can see that the total generation and the production cost are higher than those of the original case. Furthermore, as the attack intensity increases, the difference increases accordingly. We find that the FDIA can leave the system out of control and even cause security risks. Our developed scheme can specifically detect this kind of attack.
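The stealth property of $a = Hc$ can be checked numerically. The sketch below assumes a toy system with a single state variable and three meters with unit weights, so that the least-squares estimate has a closed form; the matrix entries, measurement values, and attack magnitudes are all illustrative assumptions.

```python
import math

def ls_estimate(h, z):
    """Least-squares estimate of one state x from z = h * x + e (unit weights)."""
    return sum(hk * zk for hk, zk in zip(h, z)) / sum(hk * hk for hk in h)

def residual_norm(h, z):
    """L2-norm of z - h * x_hat, with x_hat re-estimated from z."""
    x_hat = ls_estimate(h, z)
    return math.sqrt(sum((zk - hk * x_hat) ** 2 for hk, zk in zip(h, z)))

h = [1.0, 1.0, 2.0]      # toy m = 3, n = 1 measurement matrix (assumed)
z = [1.1, 0.9, 2.05]     # noisy measurements of a state close to 1

c = 0.5                                 # shift the attacker wants in the estimate
a_stealth = [hk * c for hk in h]        # structured attack a = Hc
a_random = [0.0, 0.0, 1.0]              # unstructured tampering of one meter

z_stealth = [zk + ak for zk, ak in zip(z, a_stealth)]
z_random = [zk + ak for zk, ak in zip(z, a_random)]

r_clean = residual_norm(h, z)
r_stealth = residual_norm(h, z_stealth)
r_random = residual_norm(h, z_random)

# The structured attack shifts the estimate by c but leaves the residual unchanged.
print(abs(r_stealth - r_clean) < 1e-9)                               # True
print(abs(ls_estimate(h, z_stealth) - ls_estimate(h, z) - c) < 1e-9)  # True
# The unstructured attack inflates the residual, so a threshold test can catch it.
print(r_random > r_clean)                                            # True
```

This is why a residual-based detector alone cannot separate a stealthy attack from clean data, which motivates a learned classifier on the measurements themselves.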

Deep Learning-Based Identification Scheme
In order to detect the FDIA, a deep learning-based identification scheme is developed. We propose a CDBN by combining a conventional DBN with a CGBRBM, which can process real-valued data and consider the impact of previous measurements on current detection results. Figure 3 shows the framework of the CDBN. We employ a CGBRBM and stack K − 1 standard RBMs on top, where K is the number of hidden layers. To indicate whether the measurements are attacked by the FDIA, a BP output unit is added at the end of the scheme to make it a binary classifier.

Conventional RBM
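The RBM is a two-layer neural network, which is the core of the CDBN. As Figure 4 shows, its two layers are the visible layer and the hidden layer. The units between adjacent layers are connected, but there is no connection inside each layer. The visible layer corresponds to the measurement, and the hidden layer can represent feature extraction. The RBM is an energy-based undirected generative model, and its system energy is

$$E(v, h) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j,$$

where $v_i$ and $h_j$ are the states of visible unit $i$ and hidden unit $j$, $w_{ij}$ is the weight between them, $a_i$ and $b_j$ are the corresponding biases, and $n$ and $m$ are the numbers of visible and hidden units, respectively. According to the property of the RBM, given the state of the visible layer, the activation probability of the $j$th hidden unit is:

$$P(h_j = 1 \mid v) = \mathrm{sigm}\Big(b_j + \sum_{i=1}^{n} v_i w_{ij}\Big). \tag{9}$$

Similarly, given the state of the hidden layer, the activation probability of the $i$th visible unit is:

$$P(v_i = 1 \mid h) = \mathrm{sigm}\Big(a_i + \sum_{j=1}^{m} w_{ij} h_j\Big), \tag{10}$$

where $\mathrm{sigm}(x) = 1/(1 + \exp(-x))$. The goal of the RBM training is to obtain the parameters that maximize the likelihood function by gradient ascent on the log-likelihood. By calculating the derivative of the log-likelihood, the weights and the biases can be updated as follows:

$$\Delta w_{ij} = \varepsilon \big( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \big), \quad \Delta a_i = \varepsilon \big( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \big), \quad \Delta b_j = \varepsilon \big( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \big),$$

where $\varepsilon$ is the learning rate, and $\langle \cdot \rangle_{data}$ and $\langle \cdot \rangle_{model}$ are the expectations calculated from the data and model distributions, respectively. $\langle \cdot \rangle_{data}$ is easily obtained by Equations (9) and (10). However, getting $\langle \cdot \rangle_{model}$ is much more difficult. To simplify the process, Hinton proposed an efficient and straightforward contrastive divergence (CD) algorithm based on Gibbs sampling [41].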
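To make the CD idea concrete, the following is a minimal sketch of a single CD-1 update for a binary RBM, assuming the usual simplification in which the model expectation is replaced by a one-step Gibbs reconstruction. The function names, the layer sizes, and the learning rate are illustrative assumptions, not the configuration used in this paper.

```python
import math
import random

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b):
    """P(h_j = 1 | v) = sigm(b_j + sum_i v_i w_ij)."""
    return [sigm(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    """P(v_i = 1 | h) = sigm(a_i + sum_j w_ij h_j)."""
    return [sigm(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_step(v0, W, a, b, eps, rng):
    """One CD-1 update of W, a, b in place for a single training vector v0."""
    ph0 = hidden_probs(v0, W, b)      # positive phase (data)
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, a)     # one-step reconstruction
    ph1 = hidden_probs(pv1, W, b)     # negative phase (approximate model)
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += eps * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(a)):
        a[i] += eps * (v0[i] - pv1[i])
    for j in range(len(b)):
        b[j] += eps * (ph0[j] - ph1[j])

# Tiny example: 3 visible units, 2 hidden units, all parameters start at zero.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
a = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
cd1_step([1.0, 0.0, 1.0], W, a, b, eps=0.1, rng=rng)
```

With zero initial weights the reconstruction does not depend on the sampled hidden states, so this first update is deterministic: weights attached to active visible units increase and those attached to inactive units decrease.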

Conditional Gaussian-Bernoulli RBM
In the standard type of the RBM, input data are binary and static, but the measurements in the power system are usually real-valued and time-series data. To address this limitation, we adopt a conditional Gaussian-Bernoulli RBM (CGBRBM) as the basis for the detection algorithm.
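It can be seen from Figure 5 that the CGBRBM is a variant of the conventional RBM. First, the input units are linear with Gaussian noise, whereas the hidden units are still binary. The second improvement is that the time-series data can be modelled by considering the visible variables in previous time steps. The energy function of the CGBRBM is:

$$E(v_t, h \mid v_{<t}) = \sum_{i=1}^{n} \frac{(v_{i,t} - \hat{a}_{i,t})^2}{2\sigma_i^2} - \sum_{j=1}^{m} \hat{b}_{j,t} h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{v_{i,t}}{\sigma_i} w_{ij} h_j, \tag{12}$$

where $v_{i,t}$ is the $i$th real-valued visible element at time step $t$, $h_j$ is the state of hidden unit $j$, $w_{ij}$ expresses the weight between $v_{i,t}$ and $h_j$, $\sigma_i$ is the standard deviation of the Gaussian noise of visible unit $i$, and $\hat{a}_{i,t}$ and $\hat{b}_{j,t}$ are dynamic biases that depend on the visible variables in the $N$ previous time steps:

$$\hat{a}_{i,t} = a_i + \sum_{k=1}^{N} \sum_{l=1}^{n} a_{lik} v_{l,t-k}, \qquad \hat{b}_{j,t} = b_j + \sum_{k=1}^{N} \sum_{i=1}^{n} b_{ijk} v_{i,t-k}.$$

According to Equation (12), the corresponding activation probabilities become

$$P(h_j = 1 \mid v_t, v_{<t}) = \mathrm{sigm}\Big(\hat{b}_{j,t} + \sum_{i=1}^{n} \frac{v_{i,t}}{\sigma_i} w_{ij}\Big), \qquad P(v_{i,t} \mid h, v_{<t}) = N\Big(\hat{a}_{i,t} + \sigma_i \sum_{j=1}^{m} w_{ij} h_j, \ \sigma_i^2\Big),$$

where $N(\mu, \sigma^2)$ is a Gaussian with mean $\mu$ and variance $\sigma^2$. In practice, fixing $\sigma_i^2$ to 1 makes the learning work better [37]. So, in this case, similar to the conventional RBM, by using the CD algorithm, we can update the weights and the biases as follows:

$$\Delta w_{ij} = \varepsilon \big( \langle v_{i,t} h_j \rangle_{data} - \langle v_{i,t} h_j \rangle_{model} \big),$$
$$\Delta a_{ijk} = \varepsilon \big( \langle v_{i,t-k} v_{j,t} \rangle_{data} - \langle v_{i,t-k} v_{j,t} \rangle_{model} \big), \qquad \Delta b_{ijk} = \varepsilon \big( \langle v_{i,t-k} h_j \rangle_{data} - \langle v_{i,t-k} h_j \rangle_{model} \big),$$
$$\Delta a_i = \varepsilon \big( \langle v_{i,t} \rangle_{data} - \langle v_{i,t} \rangle_{model} \big), \qquad \Delta b_j = \varepsilon \big( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \big),$$

where $a_{ijk}$ and $b_{ijk}$ are the elements of $A_k$ and $B_k$, the weight matrices that connect the visible layer at time step $t - k$ to the current visible layer and to the hidden layer, respectively.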
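The role of the past measurements can be illustrated with a short sketch. It assumes the usual conditional-RBM construction, in which each past visible frame shifts the biases of the current visible and hidden units through autoregressive weight matrices (written `A[k]` and `B[k]` here), and it fixes the Gaussian variance to 1. The array layout, the function names, and the numbers are assumptions made for illustration only.

```python
import math

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def dynamic_biases(a, b, A, B, past):
    """Shift the static biases a (visible) and b (hidden) using past frames.

    past[k] is the visible vector k + 1 steps back.
    A[k][l][i] links past visible unit l to current visible unit i.
    B[k][l][j] links past visible unit l to hidden unit j.
    """
    a_hat, b_hat = a[:], b[:]
    for k, v_prev in enumerate(past):
        for l, val in enumerate(v_prev):
            for i in range(len(a)):
                a_hat[i] += A[k][l][i] * val
            for j in range(len(b)):
                b_hat[j] += B[k][l][j] * val
    return a_hat, b_hat

def hidden_probs(v_t, W, b_hat):
    """P(h_j = 1 | v_t, history) with unit variance."""
    return [sigm(b_hat[j] + sum(v_t[i] * W[i][j] for i in range(len(v_t))))
            for j in range(len(b_hat))]

def visible_mean(h, W, a_hat):
    """Mean of the Gaussian over each current visible unit given h and history."""
    return [a_hat[i] + sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(a_hat))]

# Toy model (assumed): 2 visible units, 1 hidden unit, 1 past frame.
a, b = [0.0, 0.0], [0.0]
A = [[[0.5, 0.0], [0.0, 0.5]]]   # one lag, 2 x 2
B = [[[1.0], [-1.0]]]            # one lag, 2 x 1
W = [[0.0], [0.0]]
past = [[2.0, 1.0]]

a_hat, b_hat = dynamic_biases(a, b, A, B, past)
p_h = hidden_probs([0.3, 0.7], W, b_hat)
mu_v = visible_mean([1.0], W, a_hat)
```

Because the history enters only through the biases, inference for the current frame has the same cost as in a standard RBM once `a_hat` and `b_hat` have been computed.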

CDBN
The CDBN is a probabilistic generative model. It is a deep learning classifier composed of the CGBRBM, the RBM, and the BP [42]. As Figure 3 shows, the data are first input into the CGBRBM at the bottom for training and feature extraction. Then, the extracted features are used as the input values of another RBM. In this way, more RBM layers can be stacked [28]. The training process of the CDBN model consists of two steps [30]: layer-wise unsupervised learning and fine-tuning.
The first step is an unsupervised learning process. By using the CD algorithm, the RBM of each layer is trained layer-by-layer. Finally, we get the CDBN with a few layers, the parameters of which are suitable for extracting the characteristics of this type of data [31]. In order to optimize the parameters mapped to each layer, the whole CDBN model should be fine-tuned. This process uses the labelled data and the BP network for top-down supervised learning. The binary output node can be calculated by Equation (9), and it can be utilized to represent the compromised label and the normal one. In the calculation of the $k$th hidden layer, the weights and the biases are updated as follows:

$$W_{k,i,j} \leftarrow W_{k,i,j} + \eta \, p_{k-1,i} \, \delta_{k,j}, \qquad b_{k,j} \leftarrow b_{k,j} + \eta \, \delta_{k,j},$$

where $\eta$ is the learning rate, $p_{k-1,i}$ is the $i$th activation probability of the $(k-1)$th hidden layer, and

$$\delta_{k,j} = p_{k,j} (1 - p_{k,j}) \sum_{h=1}^{H} W_{k+1,j,h} \, \delta_{k+1,h},$$

where $p_{k,j}$ is the $j$th activation probability of the $k$th hidden layer, $W_{k+1,j,h}$ is the $jh$th element of the $(k+1)$th layer weight matrix, and $H$ is the number of units in the $(k+1)$th layer. Correspondingly, for the output layer, the updated values of the weights and the biases are as follows:

$$W_{o,i} \leftarrow W_{o,i} + \eta \, p_{K,i} \, \delta_o, \qquad b_o \leftarrow b_o + \eta \, \delta_o,$$

where $p_{K,i}$ is the $i$th activation probability of the last RBM layer, and

$$\delta_o = p_o (1 - p_o)(L - l_o),$$

where $p_o$ is the activation probability of the output layer, and $l_o$ and $L$ represent the predicted value and the actual one, respectively.

As shown in Figure 6, the detection process of our scheme can be mainly divided into three stages: the data preprocessing stage, the training stage, and the testing stage. The first stage is to obtain the measurement vector $z$ and inject the attack vector $a$ into it according to a certain proportion. After the normalization process, some sample data are selected as the training set and others as the test set. Next, by completing layer-wise unsupervised learning and fine-tuning, the model is trained in the second stage. Finally, the trained model is used to predict whether the sample data in the test set are under attack. By comparing the predictions with the actual values, the accuracy of our developed scheme can be evaluated.
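The supervised fine-tuning step at the output unit can be sketched as a single delta-rule update on a sigmoid unit. The snippet assumes a squared-error objective and uses the output probability itself as the predicted value; the function name `output_layer_step`, the feature vector, and the learning rate are hypothetical and chosen only to keep the example small.

```python
import math

def sigm(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_layer_step(p_top, w, b, label, eta):
    """One fine-tuning update of the binary output unit.

    p_top : activation probabilities of the last hidden layer
    w, b  : weights and bias of the output unit
    label : 1.0 for a compromised sample, 0.0 for a normal one
    """
    p_o = sigm(b + sum(wi * pi for wi, pi in zip(w, p_top)))
    delta = p_o * (1.0 - p_o) * (label - p_o)   # error signal at the output
    w_new = [wi + eta * pi * delta for wi, pi in zip(w, p_top)]
    b_new = b + eta * delta
    return w_new, b_new, p_o

# Toy example: two top-layer features, an attacked sample (label 1).
p_top = [1.0, 0.0]
w, b = [0.0, 0.0], 0.0
w, b, p_before = output_layer_step(p_top, w, b, label=1.0, eta=1.0)
_, _, p_after = output_layer_step(p_top, w, b, label=1.0, eta=1.0)

print(p_after > p_before)  # True: the output moves towards the label
```

The same error signal would then be propagated to the lower layers, each of which scales it by its own activation derivative before updating its weights.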

Simulation
In this simulation, the performance of our developed scheme is evaluated in the IEEE 14-bus test system. All the data used in the simulation, including the vector of measurements and the Jacobian matrix H, are based on MATPOWER 7.1. MATPOWER is an open-source Matlab (R2017b, MathWorks, Natick, Massachusetts, USA) power system simulation package, which has been widely used in research and education for solving power flow and optimal power flow problems. It includes numerous example power flow and optimal power flow cases. It can simulate most power system scenarios, and the generated data are consistent with the actual situation, which makes them suitable for verifying algorithm performance.
In the IEEE 14-bus test system, by changing the active and reactive power of the load, we first use MATPOWER to complete the power flow calculation for 30,000 consecutive moments. Then, some values (including the branch power flow, the active and reactive power of the generator, and the node voltage, a total of 39 values) are selected from the calculation results of each power flow, and Gaussian noise (such as N(0, 0.25)) is injected into them. Finally, the calculation result is regarded as the measurement of state estimation. There are 30,000 measurements in total, and the number of elements in each measurement is 39. Next, according to the method in [40], the FDIA is launched randomly on 15,000 measurements. The measurement residual after the attack is guaranteed to be less than the prescribed threshold τ, so as to avoid bad data detection. These 30,000 measurements are divided equally into three parts, which are used as the training set, the validation set, and the test set, respectively. For the above two scenarios (least-effort attack and multiple attacks), we consider the following three aspects to evaluate the performance of the mechanism. Each value of the simulation is an average among 30 independent trials.
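The bookkeeping of this dataset can be sketched as follows. The snippet only reproduces the counts stated above (30,000 samples of 39 values, 15,000 attacked, three equal parts) and the N(0, 0.25) noise; it does not run a power flow. The helper names, the fixed seed, and the choice of contiguous blocks for the three parts are assumptions.

```python
import random

def add_noise(measurement, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to every element."""
    std = variance ** 0.5
    return [v + rng.gauss(0.0, std) for v in measurement]

def label_and_split(num_samples, num_attacked, rng):
    """Pick which samples are attacked and split the indices into three equal parts."""
    attacked = set(rng.sample(range(num_samples), num_attacked))
    labels = [1 if i in attacked else 0 for i in range(num_samples)]
    third = num_samples // 3
    parts = (range(0, third), range(third, 2 * third), range(2 * third, num_samples))
    return labels, parts

rng = random.Random(0)
labels, (train_idx, val_idx, test_idx) = label_and_split(30000, 15000, rng)

# Placeholder 39-element measurement standing in for one power flow result.
noisy = add_noise([0.0] * 39, 0.25, rng)
```

Contiguous blocks keep the time order inside each part, which matters when the model conditions on previous measurements; a shuffled split would be a different design choice.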

I. Effect of the Height and Width of the CDBN
We first study the effect of the number of hidden layers and the number of units per layer on the performance of our developed scheme. In this simulation, the number of attacked measurements k is set to 1, the size of the observation window (N) is 1, and the time interval (I) is 2. The number of hidden layers is changed from 2 to 5, and the number of units per hidden layer ranges from 20 to 60. From Figure 7, we can see that the accuracy can be up to 97.3% when there are three hidden layers and the number of units in each layer is 30.

II. Effect of the Observation Window Structure
Next, we consider the effect of the size of the observation window at the previous time (N) and the time interval (I) between two adjacent time steps on the effectiveness of our scheme, where N and I are defined before. According to the conclusion of the previous section, we build a CDBN structure with three hidden layers and 30 units in each layer. The range of N is set from 1 to 4, and I is increased from 1 to 5. We can see the simulation results in Figure 8. Considering the accuracy and the availability, N = 1 and I = 2 represent a reasonable choice for detecting the FDIA.
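One way to read the two window parameters is sketched below: for a measurement at time index t, the model is conditioned on N earlier frames that are I samples apart. This interpretation, the function name `history_window`, and the integer stand-ins for measurement vectors are assumptions made for illustration.

```python
def history_window(series, t, N, I):
    """Return the N past frames series[t - I], series[t - 2I], ..., series[t - N*I]."""
    if t - N * I < 0:
        raise ValueError("not enough history for this window")
    return [series[t - k * I] for k in range(1, N + 1)]

# Integers stand in for 39-element measurement vectors.
series = list(range(10))

print(history_window(series, 5, N=1, I=2))  # [3]
print(history_window(series, 9, N=3, I=2))  # [7, 5, 3]
```

Under this reading, the first N × I samples of a sequence have no complete history and would be skipped during training.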

Multi-Scenario Validation
In this experiment, we discuss the accuracy of our developed scheme in the least-effort attack (k = 1) and multiple attacks (k > 1), respectively. According to Section 5.1.1, we simulate a 3-layer CDBN model with 30 units per layer, set N to 1, and I to 2. Besides, by using the same data set, we compare the performance of our method with the ANN and the SVM, where the ANN consists of a hidden layer with 30 units and the radial basis function (RBF) kernel is used in the SVM. Figure 9 shows the detection results. Specifically, when k = 1, 4, 7, 10, the receiver operating characteristics (ROC) curves of the method are shown in Figure 10 [43]. ROC is one of the essential metrics for evaluating the performance of a classification model.
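For readers who want to reproduce the ROC-based comparison, the area under the ROC curve can be computed directly from detector scores as the probability that a randomly chosen attacked sample is scored higher than a randomly chosen normal one. The sketch below uses this pairwise formulation; the labels and scores are made-up values, not results from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]              # 1 = attacked, 0 = normal (illustrative)
scores = [0.1, 0.4, 0.35, 0.8]     # hypothetical detector outputs

print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, so for a test set of 10,000 measurements a sort-based implementation would be preferable.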

Robustness Validation
In the previous experiment, we set N(0, 0.25) as the environmental noise. It means that we use a Gaussian with mean 0 and variance 0.25 as the environmental noise. However, the real environment may be much worse. To evaluate the robustness, in this part, we fix the number of attacked measurements (k) to 4, and the standard deviation σ of the environmental noise ~ N(0, σ) changes from 0.25 to 2.5. The settings of the other structural parameters are the same as Section 5.1.2. Figure 11 compares the accuracy obtained from the ANN, the SVM, and our developed scheme.

Analysis of Results
In the verification of the CDBN structure, the number of hidden layers, the number of units per layer, the size of the observation window (N), and the time interval (I) are four important parameters. The function of depth is to abstract layer by layer and extract features continuously, while the function of width is to allow each layer of RBM to learn more features. Generally speaking, the performance of an algorithm is more sensitive to depth, and an appropriate width makes it easier to improve performance. Setting too few or too many layers and hidden units may cause under-fitting or over-fitting, respectively, which will decrease the accuracy [44]. If N is larger or smaller than what is required, the observation window cannot adequately reflect the recent changes in the measurements. Similarly, a larger I tends to smooth out or even ignore some short-term but critical fluctuations, whereas a smaller I may cause this change to be too dramatic and lose its reference value [45]. So, by choosing the appropriate parameters, the accuracy of the mechanism can be significantly improved. According to the experimental results, we simulate a three-layer CDBN model with 30 units per layer, set N to 1, and I to 2.
In the multi-scenario validation, we can find that the accuracy of our CDBN-based method is higher than the other two. Moreover, with the increase of k, the detection performance is stable, and the accuracy can reach up to 98.4%. The area under curve (AUC) is close to 1. The developed scheme not only considers the time correlation of the measurements, but the structure of deep learning also makes the feature extraction more accurate. It can be inferred that the CDBN model has good performance and can accurately identify FDIA.
In the robustness validation, as the noise level increases, the accuracy of the three methods decreases. This is understandable, because the higher the noise level, the harder it is to distinguish between normal and compromised measurements. However, the accuracy of the developed method is always the highest of the three. It can be concluded that the CDBN model can deal with more severe situations and is more suitable for FDIA detection in actual power scenarios. In particular, when σ < 2.0, the accuracy can be more than 90%. That is to say, when the difference caused by the environmental noise is smaller than that caused by the FDIA, our CDBN-based method is competent and has good robustness.
Although the detection accuracy is high, there are still some FDIAs undetected. There are three main reasons for this:
1. The choice of the parameters
2. The presence of environmental noise
3. Insufficient data
In conclusion, our developed scheme has the advantages of high detection accuracy, stable performance, and good robustness. It has great practical value in FDIA detection.

Conclusions
This article presents an in-depth study of the state estimation, analyzes the basic principles of the FDIA, and focuses on the detection of power system cyber-attacks. By integrating the DBN structure with the CGBRBM, which can process time-series real-valued measurement data, we introduce a deep learning-based scheme to recognize the potential FDIA for maintaining the stability of the smart grid. It can extract the high-dimensional temporal behaviour features from the input data to construct a classification model and perform detection. In the simulation, we first optimize the model parameters suitable for the FDIA detection. By simulating two realistic attack scenarios, according to the determined optimal parameters, the performance is then demonstrated. The results indicate that our scheme can efficiently detect the FDIA and achieve better accuracy and robustness than the ANN and the SVM. In our future work, more sophisticated attack scenarios will be investigated based on the developed mechanism. Additionally, to be more widely used in the field of the FDIA detection, we will explore our scheme in the AC power system model.

Data Availability Statement:
The model and data used to support the results of this study are available from the corresponding author upon request.

Acknowledgments:
The authors would like to thank all of the editors and anonymous reviewers for their careful reading and insightful remarks.

Conflicts of Interest:
The authors declare no conflict of interest.