Article

Laplace-Domain Hybrid Distribution Model Based FDIA Attack Sample Generation in Smart Grids

1
State Grid Shanghai Municipal Electric Power Company, Shanghai 200122, China
2
College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Symmetry 2023, 15(9), 1669; https://doi.org/10.3390/sym15091669
Submission received: 1 August 2023 / Revised: 24 August 2023 / Accepted: 29 August 2023 / Published: 30 August 2023

Abstract

False data injection attack (FDIA) is a deliberate modification of measurement data collected by the power grid using vulnerabilities in power grid state estimation, resulting in erroneous judgments made by the power grid control center. As a symmetrical defense scheme, FDIA detection usually uses machine learning methods to detect attack samples. However, existing detection models for FDIA typically require large-scale training samples, which are difficult to obtain in practical scenarios, making it difficult for detection models to achieve effective detection performance. In light of this, this paper proposes a novel FDIA sample generation method to construct large-scale attack samples by introducing a hybrid Laplacian model capable of accurately fitting the distribution of data changes. First, we analyze the large-scale power system sensing measurement data and establish the data distribution model of symmetric Laplace distribution. Furthermore, a hybrid Laplace-domain symmetric distribution model with multi-dimensional component parameters is constructed, which can induce a deliberate deviation in the state estimation from its safe value by injecting into the power system measurement. Due to the influence of the multivariate parameters of the hybrid Laplace-domain distribution model, the sample deviation generated by this model can not only obtain an efficient attack effect, but also effectively avoid the recognition of the FDIA detection model. Extensive experiments are carried out over IEEE 14-bus and IEEE 118-bus test systems. The corresponding results unequivocally demonstrate that our proposed attack method can quickly construct large-scale FDIA attack samples and exhibit significantly higher resistance to detection by state-of-the-art detection models, while also offering superior concealment capabilities compared to traditional FDIA approaches.

1. Introduction

As an advanced energy supply and management system, the smart grid integrates sensor, communication and control technologies to improve energy efficiency, reliability and security [1], and combines modern Internet of Things (IoT) technology with the traditional power grid system to achieve effective and reliable power distribution and promote the development of clean energy. Nowadays, it is an indispensable infrastructure for various services such as the smart home [2], smart health care [3] and smart transportation [4]. However, with the rapid development of the smart grid, its security threats are increasingly prominent. The smart grid is essentially a cyber-physical system (CPS), which can also be called energy CPS (e-CPS) by combining the computing, communication and control (3CS) capabilities with the physical world of traditional grid. Despite the advantages of reliability and efficiency, the insufficient level of security measures leads to a greater threat situation. According to the survey, the economic losses caused by malicious network attacks to the U.S. economy in 2016 are estimated to be between USD 5.7 billion and 109 billion [5] and many loopholes still exist in the development of the smart grid. Among them, FDIA is a serious threat. By tampering with the data in the smart grid, these behaviors will seriously damage the reliability and good operation performance of the grid, and may lead to serious consequences and security risks [6].
In recent years, FDIAs have aroused widespread concern in the research community. By injecting false data into the smart grid, an attacker can deliberately interfere with the normal operation of the power grid, which may lead to energy imbalance, wrong decision making and control, misjudgment by the monitoring system and damage to the reliability and resilience of the whole grid. For example, an attacker can tamper with sensor data to make the smart grid misjudge the current energy demand, resulting in energy distribution imbalance and power supply interruption [7]. In addition, an FDIA may cause errors in the control system of the smart grid, thereby affecting the safe and reliable supply of energy. For an FDIA to succeed, the attacker needs an in-depth understanding of the smart grid, including its data characteristics and communication protocols, and must account for attack sparsity, attack specificity, the minimum attack vector, the need for unobservability and the impact of the attack in the smart grid environment [8]. In reality, however, it is difficult for attackers to access the real-time topology configuration and structure of the power grid system, which increases the difficulty of the attack and greatly reduces its effect.
Many researchers have devoted considerable effort to the threat of FDIA in the smart grid and its countermeasures. They focus on FDIA detection and defense methods and have put forward a variety of solutions. For example, Huang et al. proposed a deep reinforcement learning FDIA detection method that uses state attention to solve the problem of state feature extraction in existing reinforcement learning detection methods [9]. Mahi et al. proposed a novel prediction-aided anomaly detection method that uses a sequence-to-sequence CNN-LSTM autoencoder architecture to combat FDIA [10]. However, the above research deals with the detection of traditional, single-mode FDIA. In reality, the conditions required by the traditional FDIA are often not met, and the grid may instead suffer from novel and complex attack modes. Therefore, the traditional FDIA technology has gradually lost its effectiveness against the smart grid. With the progress of technology and the increasing complexity of the smart grid system, we are faced with new forms of FDIA that are more hidden, more advanced and more destructive [11]. The current research status in this field can be summarized from the literature as follows.
In [12], Tu et al. proposed an optimal attack strategy using Kullback–Leibler (KL) divergence, in which the attacker maintains a fixed stealth level and reduces the performance of the system by modifying the control input. In [13], Zhang et al. designed a self-generated FDI attack on the measurement signal that remains invisible. In [14], Xiao et al. proposed an optimal stealth attack on damaged sensors using the backward recursive Riccati equation. In [15], Sushree et al. studied FDI attacks on sensors, actuators and physical systems, and designed bounded and unbounded errors of FDI (False Data Injection) attacks in some cases to improve their effectiveness. In [16], Sun et al. proposed a closed form expression for the optimal Gaussian sparse attack, and two target observation selection strategies were introduced to find the vulnerable observations in the system, in which the algorithm adopts different trade-offs between performance and calculation time. In [17], Wang et al. proposed a new attack method to reduce the detection effect of FDI detector based on SVM (Support Vector Machine). The core of this method is to construct attack vectors that cannot be detected via FDI detector based on SVM to avoid the detection of the power system detector. Although the above methods have shown strong detection performance against FDIA attacks, most of them are based on machine learning methods. However, efficient machine learning detection models require large-scale attack samples as training sets, which are difficult to achieve in practical scenarios because FDIA attack nodes are difficult to accurately characterize, making it difficult to obtain FDIA cascading attack samples. Therefore, some mature machine learning algorithm models are difficult to stably train and have weak generalization ability against FDIA attacks without sufficient labeled samples as training data, greatly reducing the efficiency of FDIA attack detection models in smart grids.
Facing the aforementioned problems, we are thus motivated to design an efficient FDIA attack sample generation method by introducing a hybrid Laplace-domain distribution model, which makes the following novel contributions:
- We propose an efficient FDIA attack sample generation method. Our method can quickly construct large-scale attack training samples for FDIA detection models, thereby solving the problem of sparse attack samples.
- By analyzing the changes in the measured data of each node of the power system, an individual Laplace distribution can be established for each node according to the change of its sensing measurement data. A Laplace-domain hybrid distribution is then constructed by combining multiple symmetric Laplace distribution models and used to generate FDIA attack samples, which improves the concealment of the attack samples.
- We conduct a large number of experiments with different detection models to verify that our attack samples are more deceptive than other attack samples. The experimental results demonstrate that our method outperforms traditional FDIA attack sample construction schemes in terms of attack strength and covert capability, while guaranteeing a low computational complexity.
The rest of this paper is organized as follows. Section 2 reviews the mechanism of the false data injection attack. In Section 3, we describe the details of the proposed FDIA attack sample generation method. Comprehensive experiments are performed to evaluate the performance of proposed scheme. The experimental results and corresponding discussions are presented in Section 4. Finally, Section 5 concludes the paper.

2. False Data Injection Attack

FDIA refers to an attacker’s behavior of introducing false information into the smart grid to disrupt its normal operation or obtain illegal benefits by tampering with or falsifying the data of the power system. This attack may have a serious impact on the operation stability, energy supply reliability and data accuracy of the power system [18]. The basic principle of constructing FDIA is to change some state vectors by manipulating a group of measurement vectors on the AC power system [19]. An attacker can change the real power on the bus to make a series of attack vectors, which are then added to the measurement data so that the estimated state vector is different from the actual state value. Correspondingly, the DC state estimation model can be expressed as follows.
$$z = Hx + e$$
where $z = \{z_1, z_2, \ldots, z_I\}$ is the measurement vector, $x = \{x_1, x_2, \ldots, x_J\}$ is the state vector, $e = \{e_1, e_2, \ldots, e_I\}$ is the measurement error vector, $I$ and $J$ denote the number of measurements and the number of state variables, respectively, and $H$ is the measurement Jacobian matrix.
The most commonly used state estimation method in power systems is the weighted least squares (WLS) method. It takes the weighted sum of squared differences between the measurement vector $z$ and the estimated measurements $Hx$ as the objective function, and the state that minimizes this objective is taken as the optimal estimate, which is formulated as
$$\hat{x} = (H^T W H)^{-1} H^T W z$$
where W is a diagonal matrix and each of its elements is equal to the inverse of the corresponding measurement accuracy.
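The estimator above can be illustrated with a minimal NumPy sketch. The matrix sizes, the noise level `sigma` and the choice of inverse variances as the diagonal weights are assumptions made for illustration only; they are not taken from the experiments in this paper.

```python
import numpy as np

def wls_estimate(H, z, sigma):
    """Weighted least squares estimate x_hat = (H^T W H)^{-1} H^T W z."""
    # Assumption: W holds inverse measurement variances on its diagonal.
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    gain = H.T @ W @ H                      # gain matrix H^T W H
    return np.linalg.solve(gain, H.T @ W @ z)

# Hypothetical toy system: 8 measurements, 3 state variables.
rng = np.random.default_rng(0)
H = rng.normal(size=(8, 3))
x_true = np.array([1.0, -0.5, 0.25])
sigma = np.full(8, 0.01)
z_clean = H @ x_true                        # noiseless measurements
z_noisy = z_clean + rng.normal(scale=sigma) # measurements with error e

x_hat_clean = wls_estimate(H, z_clean, sigma)
x_hat_noisy = wls_estimate(H, z_noisy, sigma)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of the gain matrix, which is usually the more stable choice.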
Since the measurement data in the power grid are usually subject to incomplete and abnormal data, state estimation is necessary to accurately and effectively monitor the state information and provide support for system safety assessment [20]. In the process of state estimation, the bad data with large errors will lead to the deviation between the calculated state estimation value and the real situation, which seriously affects the judgment of the control center operator on the system state. Researchers have adopted a range of approaches to detect, process and eliminate bad data. In general, the largest normalized residual (LNR) test is commonly used for bad data detection (BDD) of state estimation. The residual r is defined as
$$r = z - H\hat{x}$$
In order to make the injected attack vector invisible, we assume that the measurement error $e$ follows an ideal normal distribution, and we use $a = [a_1, a_2, \ldots, a_m]^T$ to represent the FDIA vector injected by the attacker into the measured values. The measurement data actually received are then $z_a = z + a$, and $c = [c_1, c_2, \ldots, c_n]^T$ is the error vector of the state variables caused by the FDIA. In this case, the estimated state variable is $x_a = \hat{x} + c$, and the residual after the attack can be expressed as
$$r_a = z_a - H x_a = z + a - H(\hat{x} + c) = z - H\hat{x} + a - Hc$$
When $a = Hc$, the following formula holds:
$$r_a = z_a - H x_a = z - H\hat{x} = r \le \tau$$
where $\tau$ is the threshold value of the LNR test. It can be seen that, when the above condition is met, the FDIA can successfully pass the LNR test, thus causing changes and losses to the power system state estimation. However, attackers in this mode must understand the internal structure of the power grid system and its real-time topology configuration, and also need to find highly sparse attack vectors that meet specific conditions, which makes such attacks costly.
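The residual argument can be checked numerically. The sketch below is an illustration on a hypothetical random system with unit weights (both are assumptions); it compares the residual before an attack, after a structured attack $a = Hc$, and after an unstructured random injection.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(10, 4))                 # hypothetical Jacobian: 10 measurements, 4 states
W = np.eye(10)                               # assumption: equal measurement weights

def estimate(z):
    """WLS state estimate for measurement vector z."""
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ z)

def residual(z):
    """Residual r = z - H x_hat."""
    return z - H @ estimate(z)

z = H @ rng.normal(size=4) + 0.01 * rng.normal(size=10)

c = np.array([0.1, 0.0, -0.2, 0.0])          # intended state deviation
a = H @ c                                    # structured attack vector a = Hc

r = residual(z)                              # residual without attack
r_a = residual(z + a)                        # residual under a = Hc
r_rand = residual(z + rng.normal(scale=0.1, size=10))  # unstructured injection

print(np.linalg.norm(r), np.linalg.norm(r_a), np.linalg.norm(r_rand))
```

The first two norms coincide while the state estimate shifts by exactly `c`; the unstructured injection inflates the residual, which is what a residual-based bad data detector relies on.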

3. Proposed Method

3.1. The Framework of Proposed Scheme

The primary objective of the proposed framework is to generate attack samples by closely observing and fitting the distribution of the original data. Firstly, the interactive data from various sensors within the virtual power plant are collected and processed, which is then utilized to calculate power flow, while incorporating standard Laplace distribution noise to simulate power system measurement data. Subsequently, a Laplace-domain hybrid distribution model is established to further simulate the measurement data of the power system. This hybrid model can be solved iteratively based on the changing trends observed in the measurement data to effectively capture the inherent characteristics of the data. Furthermore, the solved mixed model is used to generate the corresponding attack vector, which is then injected into the measurement data to change the state estimation of the power system, and the proposed Laplace-domain hybrid distribution model can be further evaluated and corrected by iteratively using different detection models. The complete framework of proposed attack sample generation scheme is shown in Figure 1.

3.2. Laplace Distribution and Observation

The Laplace distribution, also known as the double-exponential distribution, is a continuous probability distribution that is widely employed in various fields, including statistics, signal processing and power system analysis. It is characterized by a symmetric, sharply peaked density and exhibits heavier tails than the normal distribution [21]. This heavy-tailed property makes the Laplace distribution particularly suitable for modeling data with outliers or extreme values. The Laplace distribution is defined by two parameters, the location parameter and the scale parameter, and its density is
$$f(x, \mu, s) = \frac{1}{2s} e^{-\frac{|x - \mu|}{s}}$$
where $x$ is the sample data, $\mu$ is the location parameter, which determines the center of the distribution, and $s$ is the scale parameter, which controls its spread.
Based on the above definition of the Laplace distribution, we analyze the IEEE 14- and IEEE 118-node system data. Because different data nodes in the IEEE node system represent different power system devices, their temporal data follow different distributions. We randomly select the data from four nodes (1, 2, 4 and 11) in the IEEE 14-node system and fit their distributions, as shown in Figure 2. From the figure, we can observe that the actual data distribution of each node is highly consistent with a Laplace distribution, although the fitted parameters differ from node to node. In other words, the data of almost all nodes in the IEEE node system conform to the Laplace distribution with node-specific parameters. Therefore, we expect that, if multiple Laplace models with different parameters are combined into a hybrid Laplacian distribution model that fits the interference errors of the measurement data, adding these interference errors to normal samples can easily confuse existing FDIA detection models, because the distribution of the generated attack samples is approximately the same as that of normal samples.
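A single-node fit of this kind can be sketched as follows. The sketch uses the standard maximum likelihood estimates for a Laplace distribution (sample median for the location, mean absolute deviation from the median for the scale); the synthetic data stand in for node measurements and are not the data used in this paper.

```python
import numpy as np

def laplace_pdf(x, mu, s):
    """Laplace density f(x, mu, s) = exp(-|x - mu| / s) / (2 s)."""
    return np.exp(-np.abs(x - mu) / s) / (2.0 * s)

def fit_laplace(x):
    """Maximum likelihood fit: median for mu, mean absolute deviation for s."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    s = np.mean(np.abs(x - mu))
    return mu, s

# Synthetic stand-in for the data variation of one node (hypothetical values).
rng = np.random.default_rng(0)
samples = rng.laplace(loc=0.3, scale=1.5, size=20000)
mu_hat, s_hat = fit_laplace(samples)

# Sanity check: the fitted density should integrate to roughly one.
grid = np.linspace(-40.0, 40.0, 80001)
area = np.sum(laplace_pdf(grid, mu_hat, s_hat)) * (grid[1] - grid[0])
```

Repeating `fit_laplace` per node would give the node-specific parameter pairs that the observation above refers to.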

3.3. Hybrid Laplace Distribution Model

According to the analysis in Section 3.2, the sensing data of different nodes in the power system generally conform to the Laplace distribution, but their parameters may differ. Therefore, a single Laplace distribution model cannot fit the sensing data distribution of all nodes well. Considering this problem, we design a Laplace-domain hybrid distribution by combining multiple single Laplace distribution models [22]. The proposed hybrid Laplace distribution model is a probability model based on the Laplace distribution. It simulates the sensing data distribution in a power system by linearly combining multiple Laplacian distributions through weighting coefficients. The model is defined as follows.
$$g(x) = \sum_{i=1}^{k} b_i \frac{1}{2 s_i} e^{-\frac{|x - \mu_i|}{s_i}}$$
where $b_i \ge 0$ and $\sum_{i=1}^{k} b_i = 1$; $s_i$ and $\mu_i$ represent the scale and mean value of the $i$-th component distribution in the linear hybrid Laplacian distribution, respectively, and $b_i$ is its weight parameter, $i = 1, 2, \ldots, k$. Accordingly, the parameter vector of the density function of the hybrid Laplacian distribution can be written as follows.
$$\vartheta_k = (b_1, b_2, \ldots, b_k, s_1, s_2, \ldots, s_k, \mu_1, \mu_2, \ldots, \mu_k)$$
Correspondingly, the parameter form in Equation (7) can be further transformed as:
$$g(x \mid \vartheta_k) = \sum_{i=1}^{k} b_i f_i(x, \mu_i, s_i)$$
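To make the mixture form concrete, the following sketch evaluates the hybrid density and draws samples from it by first picking a component with probability $b_i$ and then sampling that component. The three-component parameter values are arbitrary illustrative choices, not fitted values from this paper.

```python
import numpy as np

def mixture_pdf(x, b, mu, s):
    """Hybrid density g(x) = sum_i b_i * Laplace(x; mu_i, s_i)."""
    x = np.asarray(x, dtype=float)[..., None]          # add a component axis
    comp = np.exp(-np.abs(x - mu) / s) / (2.0 * s)     # shape (..., k)
    return comp @ b                                    # weighted sum over components

def sample_mixture(n, b, mu, s, rng):
    """Draw n samples: choose a component by weight, then sample it."""
    idx = rng.choice(len(b), size=n, p=b)
    return rng.laplace(loc=mu[idx], scale=s[idx])

# Hypothetical parameter vector (b, s, mu) with k = 3 components.
b = np.array([0.5, 0.3, 0.2])
mu = np.array([-1.0, 0.0, 2.0])
s = np.array([0.5, 1.0, 2.0])

rng = np.random.default_rng(0)
draws = sample_mixture(50000, b, mu, s, rng)

grid = np.linspace(-60.0, 60.0, 120001)
area = np.sum(mixture_pdf(grid, b, mu, s)) * (grid[1] - grid[0])
```

Because the weights sum to one and each component is a proper density, the mixture integrates to one, and the sample mean approaches $\sum_i b_i \mu_i$.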
Furthermore, according to the above established hybrid Laplace distribution model, we need to solve it to obtain the optimal model parameters. Notably, within this model, the measurement data of each node is addressed individually, and thus the characteristics of power injection changes for each node need to be fitted separately. Moreover, a synchronous injection attack can be utilized to intentionally obfuscate the detection and identification model. Correspondingly, the solution process of the proposed hybrid distribution model can be described as follows.
(1) Firstly, determine the initial values of the model parameters according to the characteristics of the sensing measurement data, so that the model parameter vector is $\vartheta_k^{(0)} = (b_1^{(0)}, b_2^{(0)}, \ldots, b_k^{(0)}, s_1^{(0)}, s_2^{(0)}, \ldots, s_k^{(0)}, \mu_1^{(0)}, \mu_2^{(0)}, \ldots, \mu_k^{(0)})$. Correspondingly, the posterior probability of the sample $X_1, X_2, \ldots, X_n$ (where $X_i \sim f(s_i^{(0)}, \mu_i^{(0)})$) under this initial condition can be expressed as:
$$p_{tj}^{(0)} = \frac{g(x_t, b_j^{(0)}, \mu_j^{(0)}, s_j^{(0)})}{\sum_{i=1}^{k} g(x_t, b_i^{(0)}, \mu_i^{(0)}, s_i^{(0)})}$$
The posterior probability is calculated for each component in turn. For component $j$, the probability density of the sample $x_t$ under that component is computed with the initialized parameters $s_j$ and $\mu_j$ and multiplied by the prior probability $b_j$; normalizing over all components gives the posterior probability, which satisfies $\sum_{j=1}^{k} p_{tj}^{(0)} = 1$. For any weights with $\sum_{j=1}^{k} b_j = 1$, the assignment of samples to the $k$ components with probabilities $p_{tj}^{(0)}$ can be completed sequentially under the initial value $\vartheta_k^{(0)}$. Subsequently, the parameters of each component distribution are updated from these expectations:
$$\begin{aligned} b_j^{(1)} &= \frac{1}{n} \sum_{t=1}^{n} p_{tj}^{(0)} \\ \mu_j^{(1)} &= \frac{\sum_{t=1}^{n} p_{tj}^{(0)} x_t}{\sum_{t=1}^{n} p_{tj}^{(0)}} \\ s_j^{(1)} &= \frac{\sum_{t=1}^{n} p_{tj}^{(0)} (x_t - \mu_j^{(1)})^2}{\sum_{t=1}^{n} p_{tj}^{(0)}} \end{aligned}$$
where $b_j^{(1)}$, $\mu_j^{(1)}$ and $s_j^{(1)}$ are the component weights, component mean values and component scales updated in the first iteration, respectively. Note that $\vartheta_k^{(1)}$ is an estimate derived from the known $\vartheta_k^{(0)}$, that is, the expected result of the first iterative separation, and not yet the maximum likelihood estimate of the hybrid distribution.
(2) In order to find the best model parameters, we introduce maximum likelihood estimation (MLE) on the sample $X_1, X_2, \ldots, X_n$, with the likelihood function
$$L(\vartheta_k) = \prod_{i=1}^{n} \sum_{t=1}^{k} g(x_i, b_t, \mu_t, s_t)$$
In order to maximize the likelihood function $L(\vartheta_k)$ under the condition $p_{tj} = \frac{g(x_t, b_j, \mu_j, s_j)}{\sum_{i=1}^{k} g(x_t, b_i, \mu_i, s_i)}$, we differentiate it with respect to the parameters and set the derivatives to zero to solve for the corresponding $s_j$ and $\mu_j$.
(3) Finally, after $m$ rounds of iteration, the result of round $m + 1$ can be obtained, that is, the solution for $b_j^{(m+1)}$, $\mu_j^{(m+1)}$ and $s_j^{(m+1)}$.
$$\begin{aligned} b_j^{(m+1)} &= \frac{1}{n} \sum_{t=1}^{n} p_{tj}^{(m)} \\ \mu_j^{(m+1)} &= \frac{\sum_{t=1}^{n} p_{tj}^{(m)} x_t}{\sum_{t=1}^{n} p_{tj}^{(m)}} \\ s_j^{(m+1)} &= \frac{\sum_{t=1}^{n} p_{tj}^{(m)} (x_t - \mu_j^{(m+1)})^2}{\sum_{t=1}^{n} p_{tj}^{(m)}} \end{aligned}$$
In the iterative algorithm for maximum likelihood estimation of the hybrid model parameters, the likelihood function is monotonically increasing, i.e., $L(\vartheta_k^{(m+1)}) \ge L(\vartheta_k^{(m)})$. This means that a maximum point of $L(\vartheta_k)$ can always be approached during the iteration, and a convergence threshold $\varepsilon$ can be specified. When $|L(\vartheta_k^{(m+1)}) - L(\vartheta_k^{(m)})| \le \varepsilon$, the likelihood function is considered to have reached its maximum, and the iteration stops to yield the maximum likelihood estimates of the parameters.
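The iteration in steps (1) to (3) can be sketched in Python as below. This is an illustrative sketch rather than the implementation used in the experiments, and it departs from the printed formulas in three labeled ways: the scale update uses the posterior-weighted absolute deviation (the usual Laplace scale estimate) instead of the squared deviation, the stopping rule is applied to the log-likelihood for numerical stability, and the initial values are taken from data quantiles. The function name `fit_laplace_mixture` and the synthetic data are hypothetical.

```python
import numpy as np

def laplace_pdf(x, mu, s):
    return np.exp(-np.abs(x - mu) / s) / (2.0 * s)

def fit_laplace_mixture(x, k=3, max_iter=100, eps=1e-4):
    """Iteratively fit weights b, locations mu and scales s of a k-component mixture."""
    x = np.asarray(x, dtype=float)
    n = x.size
    b = np.full(k, 1.0 / k)
    # Assumption: quantile-based initial locations, shrunken global scale.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    s = np.full(k, np.mean(np.abs(x - np.median(x))) / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # Posterior p_tj: weighted component density, normalized over components.
        dens = b * laplace_pdf(x[:, None], mu, s)        # shape (n, k)
        total = dens.sum(axis=1, keepdims=True)
        p = dens / total
        ll = float(np.log(total).sum())                  # log-likelihood (assumption)
        if abs(ll - prev_ll) <= eps:
            break
        prev_ll = ll
        w = p.sum(axis=0)
        b = w / n                                        # weight update
        mu = (p * x[:, None]).sum(axis=0) / w            # weighted mean, as in the text
        # Swapped-in scale update: weighted absolute deviation (assumption).
        s = np.maximum((p * np.abs(x[:, None] - mu)).sum(axis=0) / w, 1e-8)
    return b, mu, s, ll

# Synthetic two-cluster data standing in for one node's measurement variation.
rng = np.random.default_rng(0)
data = np.concatenate([rng.laplace(-3.0, 0.5, 2000), rng.laplace(3.0, 0.5, 2000)])
b_hat, mu_hat, s_hat, ll = fit_laplace_mixture(data, k=2)
```

The defaults `k=3`, `max_iter=100` and `eps=1e-4` mirror the settings reported in the experimental setup; everything else is a design choice of this sketch.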
After successfully solving the hybrid Laplace distribution model using the original measurement data from the system, the model is utilized to capture the changing patterns in the measurement data of each node. Subsequently, an attack vector corresponding to each node can be incorporated into the measurement data of the power system. This manipulation alters the power injection quantities, thereby indirectly impacting the state estimation process and potentially leading to detrimental consequences for the power system.

4. Experimental Results and Discussions

4.1. Attack Data Generation

In our experiment, the test data are from the IEEE 14-node and IEEE 118-node test systems [23,24]. We collect the real load data of the New York Independent System Operator (NYISO) from 1 January 2020 to 1 May 2022. Notably, we collect more than two years of data because this time range includes seasonal changes throughout the year, which helps to comprehensively understand long-term load patterns and trends, including different seasons, weather conditions and the potential impact of various factors on energy consumption. After obtaining the data, we simulate the measurement data of the power system through load flow calculation on the load data and further obtain each bus state variable and the load information of each node in the IEEE 14-node and IEEE 118-node test systems. The nodes in the system are regarded as different sensor data sources monitoring the same system.
After completing the modeling of the hybrid Laplace distribution model, we solve the model parameters and fit the changes of the measurement data of the power system. Accordingly, the generation of attack vectors can be simulated to change the state estimation of the system and undermine the stability and security of the power system. Figure 3 compares the distribution of normal measurement data variation with the distribution of generated measurement data variation on the IEEE 14-node test system [23], where the abscissa is the number of data samples and the ordinate is the data variation. In order to assess the threat degree of the attack and compare it with other attack methods, we divide the attack vectors into different attack strengths according to the method in [20]: the attack is weak when the ratio of the average power injection deviation $c$ to $x$ is less than 10%, moderate when the ratio is greater than 10% and less than 30%, and strong when the ratio is greater than 30%. Correspondingly, the attack strength can be calculated as follows.
$$\frac{\sum_{i=1}^{n} \frac{c_i}{x_i}}{m} \times 100\%$$
In addition, to avoid bad data detection (BDD) of the power system and improve the concealment of the attack samples, the error vector $c$ should satisfy
$$c = H^{+} a$$
where $c_i$ is the power error injected in the $i$-th dimension, $H^{+}$ is the generalized inverse of the Jacobian matrix, $a$ is the attack vector injected into the measurement data, $n$ is the data dimension, and $m$ is the number of injected attack samples.
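The relation $c = H^{+} a$ and the three strength levels can be illustrated with the short sketch below. The Jacobian is a random placeholder, the percentage is computed under one possible reading of the strength ratio (mean absolute deviation over mean absolute state value), and the handling of the exact 10% and 30% boundaries is not specified in the text; all three are assumptions of this sketch.

```python
import numpy as np

def state_deviation(H, a):
    """Error vector c = H^+ a, with H^+ the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(H) @ a

def strength_percent(c, x):
    # Assumed reading of the ratio: mean |c_i| relative to mean |x_i|, in percent.
    return 100.0 * np.mean(np.abs(c)) / np.mean(np.abs(x))

def attack_level(pct):
    # Thresholds follow the 10% / 30% split; boundary handling is an assumption.
    if pct < 10.0:
        return "weak"
    if pct <= 30.0:
        return "moderate"
    return "strong"

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 3))          # hypothetical Jacobian with full column rank
x = np.array([1.0, 2.0, 4.0])        # hypothetical state values
c_target = 0.2 * x                   # aim for a 20% deviation
a = H @ c_target                     # attack vector consistent with a = Hc

c = state_deviation(H, a)            # recovers c_target from a
pct = strength_percent(c, x)
level = attack_level(pct)
```

For a Jacobian with full column rank, $H^{+} H$ is the identity, so the deviation recovered from `a` equals the intended one.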

4.2. Experimental Setup

In this section, we introduce the hyperparameters, the training and testing sets, and the corresponding environment settings. All simulations are performed on a machine with an Intel Core i7-8750H CPU, a GTX 1050 Ti GPU and 8 GB of RAM. Power flow calculation and state estimation are performed in Matlab using Matpower, while the establishment and solution of the hybrid Laplace distribution model are performed in Python. The number of components of the hybrid distribution model is set to 3 by default and is varied in the comparison experiments. The number of iterations for solving the mixed model is set to 100, and the threshold on the absolute change of the likelihood function is set to 0.0001. The noise error of the simulated measurement data from the power flow calculation is set to 0.25, and Gaussian noise with a mean of zero and a standard deviation of 1 is used.
In each attack case, we generate 15,000 samples, of which 7500 are normal data and 7500 are attack data. We label the normal data as 0 and the attack data as 1 to facilitate subsequent experiments. The proportions of the test, validation and training sets are 0.2, 0.3 and 0.5, respectively, and these sets are used to train and evaluate the detection model. Each sample contains the active injection power of each node, so that the impact of the injection power error can be measured.
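The labeling and splitting scheme described above might look like the following sketch. The random arrays are placeholders for the real normal and attack samples, and `n_nodes = 14` simply mirrors the IEEE 14-node case; neither reflects the actual experimental data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 14                                        # one active injection power per node

# Placeholder feature matrices (assumption): real samples would come from
# power flow simulation and the hybrid Laplace attack generator.
normal = rng.normal(size=(7500, n_nodes))
attack = rng.normal(size=(7500, n_nodes))

X = np.vstack([normal, attack])
y = np.concatenate([np.zeros(7500, dtype=int),      # label 0: normal data
                    np.ones(7500, dtype=int)])      # label 1: attack data

# Shuffle once, then cut into training (0.5), validation (0.3) and test (0.2).
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
n_train = round(0.5 * len(X))
n_val = round(0.3 * len(X))

X_train, y_train = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]
```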

4.3. Experimental Results and Discussions

After the attack samples are generated in our experiment, we compare them with the attack samples generated by the traditional FDIA mode. The comparisons use different popular deep learning detection models for detection. In addition, four metrics (accuracy, precision, recall and $F_1$ score) are used as the evaluation indicators of our output results in the experiment, which can be defined as follows [23].
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
where TP, FP, TN and FN are true positive, false positive, true negative and false negative, respectively. For $F_1$ score, the formula is
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
Obviously, the higher the accuracy and $F_1$ score, the worse the concealment of the attack sample, and the easier it is to be detected by the detection model.
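As a worked example of these four metrics, the sketch below computes them from confusion-matrix counts. The counts are made-up numbers for illustration and do not correspond to any result reported in the tables.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical detector outcome: 80 attacks caught, 20 missed,
# 10 normal samples flagged by mistake, 90 normal samples passed.
accuracy, precision, recall, f1 = detection_metrics(tp=80, fp=10, tn=90, fn=20)
print(accuracy, precision, recall, f1)
```

With these counts the accuracy is 0.85 and the recall is 0.8. From the attacker's point of view, lower values of all four metrics indicate better concealed samples.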
We carry out a series of experiments over the IEEE 14- and IEEE 118-node systems to demonstrate the performance of our proposed attack sample generation scheme. Two different FDIA detection schemes, a CNN-based detection scheme [25] and an LSTM-based detection scheme [23], are used to give the testing results. Three attack levels (weak, moderate, strong) and nine attack strengths, $c = 2\%, 5\%, 10\%, 15\%, 20\%, 25\%, 30\%, 40\%, 50\%$, are employed to provide the comparisons. The corresponding experimental results are shown in Table 1, Table 2, Table 3 and Table 4, where Table 1 and Table 2 present the testing results over the IEEE 14-node system, while Table 3 and Table 4 show the testing results over the IEEE 118-node system.
From these tables, we can observe that our scheme yields lower detection precision, $F_1$ score and recall values. To be specific, when the CNN-based FDIA detection scheme is performed over the IEEE 14-node system, for the weak attack strength of 2%, our scheme achieves a reduction of approximately 21% in precision, 16% in recall and 20% in $F_1$ score. For the strong attack strength of 40%, our LMM-FDIA scheme still achieves a reduction of approximately 9% in precision, 14% in recall and 14% in $F_1$ score. For the IEEE 118-node system, our scheme achieves a similar performance improvement. This implies that our method has a stronger anti-detection capability and can more effectively bypass the detection model to achieve the attack effect. This result can be explained as follows: because the constructed hybrid Laplace distribution simulates a disturbance model that is consistent with the distribution of the original samples, the constructed attack samples closely follow the distribution of the original samples.
In order to better compare our generated attack samples with the attack samples generated by traditional FDIA, we use accuracy as the evaluation metric, and the experimental results are shown in Figure 4, where the abscissa is the attack strength and the ordinate is the accuracy. As can be seen from these figures, the accuracy for our method is always lower than that for FDIA when the attack strength is between 5% and 40%, which means that the attack samples generated by our method are more covert and confusing than traditional FDIA samples and are more difficult for the detection model to detect. In addition, we find an interesting phenomenon: the gap between the two methods narrows at both low and high attack strengths. This is mainly because, under low attack strength, our generated attack samples and traditional FDIA samples are both difficult to detect due to the scarcity of attack samples, while under high attack strength, both are easily detected due to the increase in the number of attack samples. Nevertheless, our scheme still obtains a superior performance compared to traditional FDIA samples.
To gain more insight, we examine the influence of the number of model components through a series of experiments. We set the component parameter from 1 to 6 and use the CNN-based and LSTM-based detection models with $F_1$ score as the measurement. The corresponding experimental results are shown in Figure 5. It can be seen from the results that, when the component parameter is set to 1 or 2, the generated attack samples are easy to detect, which indicates that they do not fit the original sensing data well. When the component parameter is set to 3, the effect of the attack samples is greatly improved and they are difficult for the detection model to recognize, indicating that the characteristics of the data distribution are well fitted at this point. When the component parameter is set to 4, 5 or 6, the effect on the attack samples gradually stabilizes, whereas the memory burden and computational cost increase significantly. Consequently, a smaller parameter is suggested to maintain the desired effect on attack samples while minimizing computational costs.
Furthermore, we also demonstrate the effect through a qualitative comparison with different FDIA attack methods, which is summarized in Table 5. Most of the FDIA schemes in this table utilize the traditional random disturbance generation method. We compare them in terms of concealment, model structure complexity, operational costs and runtime. Compared to some existing FDIA attack methods, using hybrid Laplacian models to construct attack samples can more easily bypass FDIA detection models, does not require knowledge of the complex internal structure of the power system, and has a low operation cost and model complexity as well as high time efficiency. This is mainly because our model directly constructs a hybrid Laplacian model by combining several individual Laplacian distributions and then utilizes large-scale data samples to optimize the parameters. This makes the model construction and parameter optimization process simpler and easier to implement. In addition, we further analyze the construction of the model through parameter discussions, which simplifies the model structure and thereby greatly reduces the process and complexity of parameter optimization. This is also why our model demonstrates significant advantages in the qualitative comparison.

4.4. Impact of Noise Error

To evaluate the robustness of our attack sample generation method, we conduct a series of experiments on the impact of measurement noise. Real power system measurements often contain noise errors, and it is important to understand how these errors affect our method. To simulate these errors [30], Gaussian noise with varying variance is injected into the data of each node in the power system. We test the two attack sample generation schemes using CNN-based and LSTM-based detection models over the IEEE 14-node test system and present the results in Figure 6.
In this experiment, the noise variance of the measurement data is set to 0.25, 0.35, 0.45 and 0.55, respectively. The figure shows that the accuracy of all models decreases as the noise level increases. This decline can be attributed to the increased difficulty of distinguishing between normal and tampered data as the noise variance grows, since the underlying noise-free data are more likely to be obscured by the noise. Furthermore, the experimental results show that the attack samples generated by our method are harder to detect than those of the traditional FDIA sample generation method at every noise level, regardless of the detection model employed. This demonstrates the robustness and applicability of our attack sample generation method compared with existing approaches. In conclusion, our method can construct more robust, interference-resistant FDIA attack samples with stronger concealment capability. It can therefore provide large-scale and efficient attack samples for the construction of deep-learning-based FDIA detection models.
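The noise setting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper `add_gaussian_noise`, the constant placeholder measurements and the number of snapshots are hypothetical, and the four values are interpreted as variances, so the standard deviation passed to the sampler is their square root.

```python
import random


def add_gaussian_noise(measurements, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to every measurement."""
    std = variance ** 0.5  # random.gauss expects a standard deviation, not a variance
    return [[v + rng.gauss(0.0, std) for v in row] for row in measurements]


rng = random.Random(0)
# Hypothetical data: 2000 snapshots of 14 node measurements, all set to 1.0.
clean = [[1.0] * 14 for _ in range(2000)]

for variance in (0.25, 0.35, 0.45, 0.55):
    noisy = add_gaussian_noise(clean, variance, rng)
    residuals = [n - c for n_row, c_row in zip(noisy, clean)
                 for n, c in zip(n_row, c_row)]
    empirical = sum(e * e for e in residuals) / len(residuals)
    print(f"target variance={variance:.2f} empirical variance={empirical:.3f}")
```

The printed empirical variance should be close to each target value, which is a quick sanity check that the noise level is applied as intended before the noisy data are passed to a detection model.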

4.5. Time Complexity Analysis

Time complexity is an estimate of the running time of an algorithm and describes how the time needed for execution varies with the size of the problem [31]. In this experiment, we test the computation time and storage requirements of our hybrid model under various component parameters. In the proposed hybrid model, the parameters control changes in the data distribution. When the parameters change only slightly, the time complexity of the hybrid model is mainly determined by the number of samples and the number of iterations, and it does not change substantially with the parameters; for example, the complexity is O(kn) when the component parameter is set to k. Because k is a small constant, the final theoretical complexity remains linear, i.e., O(n). Meanwhile, since the number of parameters is small, it has a negligible impact on storage, and the space complexity remains constant at O(1). According to the construction of the hybrid model, the actual runtime of our method generally grows linearly. Figure 7 illustrates this relationship, where the x-axis represents the iterations of the model solution and the y-axis represents the time of each round (in seconds). The component parameter of the hybrid model is set to 1, 2, 3 and 4, drawn as the dotted lines in the figure, respectively. Parameter 1 corresponds to the single Laplace distribution model. Although this model has the shortest running time, the generated attack samples are easily detected. When the component parameter is 2, the running time is close to that of component parameter 3, but the generated attack samples are far less effective than those of the hybrid model with component parameter 3.
When the component parameter is 4, the effect of the generated attack samples is similar to that of component parameter 3, but the running time increases significantly. Overall, the proposed hybrid model achieves its best trade-off between attack effect and running time when the component parameter is set to 3.
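To make the O(kn) per-iteration cost concrete, the following Python sketch runs EM on a mixture of k Laplace components. It is an illustration under stated assumptions rather than the implementation used in this paper: all components are assumed to be centered at zero so that only the mixing weights and scales are updated, and the function names, initialization and synthetic data are hypothetical.

```python
import math
import random


def laplace_pdf(x, scale):
    """Density of a zero-mean Laplace distribution with the given scale."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def em_step(data, weights, scales):
    """One EM iteration; the nested loops visit n samples times k components."""
    k = len(weights)
    resp_sum = [0.0] * k  # sum of responsibilities per component
    abs_sum = [0.0] * k   # responsibility-weighted sum of |x| per component
    for x in data:
        dens = [weights[j] * laplace_pdf(x, scales[j]) for j in range(k)]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        for j in range(k):
            r = dens[j] / total      # E-step: responsibility of component j
            resp_sum[j] += r
            abs_sum[j] += r * abs(x)
    n = len(data)
    # M-step: closed-form updates for a zero-mean Laplace mixture.
    new_weights = [s / n for s in resp_sum]
    new_scales = [max(abs_sum[j] / resp_sum[j], 1e-9) if resp_sum[j] > 0 else scales[j]
                  for j in range(k)]
    return new_weights, new_scales


def fit_mixture(data, k, iterations=50):
    """Fit k zero-mean Laplace components; total cost is O(iterations * k * n)."""
    mean_abs = max(sum(abs(x) for x in data) / len(data), 1e-9)
    weights = [1.0 / k] * k
    scales = [mean_abs * (j + 1) / k for j in range(k)]  # distinct starting scales
    for _ in range(iterations):
        weights, scales = em_step(data, weights, scales)
    return weights, scales


rng = random.Random(0)


def sample_laplace(scale):
    """Draw one zero-mean Laplace sample as a randomly signed exponential."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)


# Synthetic measurement changes drawn from two Laplace components (assumed values).
data = [sample_laplace(0.1) if rng.random() < 0.6 else sample_laplace(1.0)
        for _ in range(2000)]
for k in (1, 2, 3):
    weights, scales = fit_mixture(data, k)
    print(k, [round(w, 3) for w in weights], [round(s, 3) for s in scales])
```

Beyond the data themselves, each iteration keeps only 2k accumulators, which is consistent with the constant space complexity stated above for a fixed, small k.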

5. Conclusions

This paper presents a novel approach for generating FDIA attack samples based on a Laplace-domain hybrid distribution model. The proposed method establishes a mixed Laplace distribution and uses the EM algorithm to solve the hybrid distribution, thereby effectively capturing the changes observed in power system measurement data. Subsequently, the corresponding attack vector is generated and injected into the measurement data to induce changes in the system's state estimation. To evaluate the effectiveness of the generated attack samples, we conduct an in-depth analysis of the parameters associated with different component models and of the division of attack intensity. Furthermore, we employ various detection models to assess the performance of the generated attack samples. Extensive experiments are conducted on both the IEEE 14-node and the IEEE 118-node test systems. The experimental results demonstrate that the attack samples generated by our proposed method exhibit significantly higher resistance to detection by the employed detection models.
While our proposed method performs well on the classical IEEE node test systems, it is somewhat weaker on small samples. More data features may be required to fit the data distribution well and improve the concealment of FDIA attack samples, whereas small datasets may lead to inaccurate model construction and thus reduce the attack capability of the generated samples. In addition, the proposed hybrid model is constructed from samples of specific FDIA attacks, which limits its generality across the diverse attacks that target the power system.
In the future, our research will focus on feature engineering by analyzing cascading FDIA attack samples and on building efficient detection models from large-scale generated attack samples, thereby enhancing the overall security and reliability of the smart grid system.

Author Contributions

Conceptualization and Resources: Z.Z.; Methodology and Original draft writing: Y.W.; Software and Validation: N.G. and T.Z.; Review and editing and Supervision: F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Scientific and Technological Project of the State Grid Shanghai Municipal Electric Power Company (Grant No. B30940220003).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Victor, C.; Harindra, S.; Brady, J.; Bjorn, V.; Vivek, K. A Review of Visualization Methods for Cyber-Physical Security: Smart Grid Case Study. IEEE Access 2023, 11, 1.
  2. Thakur, N.; Han, C.Y. An Intelligent Ubiquitous Activity Aware Framework for Smart Home. In Human Interaction, Emerging Technologies and Future Applications III: Proceedings of the 3rd International Conference on Human Interaction and Emerging Technologies: Future Applications (IHIET 2020), Paris, France, 27–29 August 2020; Ahram, T., Taiar, R., Langlois, K., Choplin, A., Eds.; Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2021; Volume 1253.
  3. Vergutz, A.; Noubir, G.; Nogueira, M. Reliability for Smart Healthcare: A Network Slicing Perspective. IEEE Netw. 2020, 34, 91–97.
  4. Wu, L.; Zhang, W.; Zhao, W. Privacy Preserving Data Aggregation for Smart Grid with User Anonymity and Designated Recipients. Symmetry 2022, 14, 847.
  5. Charalambos, K.; Saraju, P.M. Cybersecurity for the Smart Grid. Computer 2020, 53, 10–12.
  6. Aoufi, S.; Derhab, A.; Guerroumi, M. Survey of False Data Injection in Smart Power Grid: Attacks, Countermeasures and Challenges (Article). Inf. Secur. Appl. 2020, 54, 102518.
  7. Qu, Z.; Yang, J.; Lang, Y.; Wang, Y.; Han, X.; Guo, X. Earth-Mover-Distance-Based Detection of False Data Injection Attacks in Smart Grids. Energies 2022, 15, 1733.
  8. Haftu, T.; Adnan, A.; Abdun, M. Comprehensive Survey and Taxonomies of False Data Injection Attacks in Smart Grids: Attack Models, Targets, and Impacts. Renew. Sustain. Energy Rev. 2022, 163, 112423.
  9. Huang, R.; Li, Y.; Wang, X. Attention-Aware Deep Reinforcement Learning for Detecting False Data Injection Attacks in Smart Grids. Electr. Power Energy Syst. 2023, 147, 108815.
  10. Mahi, A.; Hossain, F.; Anwar, A.; Azam, S. False Data Injection Attack Detection in Smart Grid Using Energy Consumption Forecasting. Energies 2022, 15, 4877.
  11. Nafees, M.; Saxena, N.; Cardenas, A.; Grijalva, S.; Burnap, P. Smart Grid Cyber-Physical Situational Awareness of Complex Operational Technology Attacks: A Review. Assoc. Comput. Mach. 2023, 55, 10.
  12. Tu, W.; Dong, J.; Zhai, D. Optimal ϵ-stealthy attack in cyber-physical systems. J. Frankl. Inst. 2021, 358, 151–171.
  13. Zhang, T.; Ye, D. False Data Injection Attacks With Complete Stealthiness in Cyber-physical Systems: A Self-Generated Approach. Automatica 2020, 120, 109117.
  14. Xiao, L. Optimal Attack Strategy Against Fault Detectors for Linear Cyber-Physical Systems. Inf. Sci. 2021, 581, 390–402.
  15. Sushree, P.; Ashok, K.T. Design of False Data Injection Attacks in Cyber-Physical Systems. Inf. Sci. 2022, 608, 825–843.
  16. Sun, K.; Li, Z. Sparse Data Injection Attacks on Smart Grid: An Information-Theoretic Approach. IEEE Sens. 2022, 22, 14553–14562.
  17. Wang, B.; Zhu, P.; Chen, Y.; Xun, P.; Zhang, Z. False Data Injection Attack Based on Hyperplane Migration of Support Vector Machine in Transmission Network of the Smart Grid. Symmetry 2018, 10, 165.
  18. Li, X.; Wang, Y.; Lu, Z. Graph-Based Detection for False Data Injection Attacks in Power Grid. Energy 2023, 263, 125865.
  19. Jorjani, M.; Seifi, H.; Varjani, A. A Graph Theory-Based Approach to Detect False Data Injection Attacks in Power System AC State Estimation. IEEE Trans. Ind. Inform. 2020, 17, 2465–2475.
  20. Li, Y.; Wei, Y.; Li, Y.; Dong, Z.; Shahidehpour, M. Detection of False Data Injection Attacks in Smart Grid: A Secure Federated Deep Learning Approach. IEEE Trans. Smart Grid 2022, 99, 1.
  21. Jing, H.; Liu, Y.; Zhao, J. Asymmetric Laplace Distribution Models for Financial Data: VaR and CVaR. Symmetry 2022, 14, 807.
  22. Amos, N.; Tomasz, J.K. A Uniform-Laplace Mixture Distribution. Comput. Appl. Math. 2023, 115236.
  23. Wu, Y.; Sheng, Y.; Guo, N.; Li, F.; Tian, Y.; Su, X. Hybrid Deep Network Based Multi-Source Sensing Data Fusion for FDIA Detection in Smart Grid. In Proceedings of the 2022 Asia Power and Electrical Technology Conference (APET), Shanghai, China, 11–13 November 2022; pp. 310–315.
  24. Wu, Y.; Wang, Q.; Guo, N.; Tian, Y.; Li, F.; Su, X. Efficient Multi-Source Self-Attention Data Fusion for FDIA Detection in Smart Grid. Symmetry 2023, 15, 1019.
  25. Shen, K.; Yan, W.; Ni, H.; Chu, J. Localization of False Data Injection Attack in Smart Grids Based on SSA-CNN. Information 2023, 14, 180.
  26. Sayghe, A.; Anubi, O.M.; Konstantinou, C. Adversarial Examples on Power Systems State Estimation. In Proceedings of the 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 17–20 February 2020; pp. 1–5.
  27. Mukherjee, D. Data-Driven False Data Injection Attack: A Low-Rank Approach. IEEE Trans. Smart Grid 2022, 13, 2479–2482.
  28. Jiao, R.; Xun, G.; Liu, X.; Yan, G. A New AC False Data Injection Attack Method Without Network Information. IEEE Trans. Smart Grid 2021, 12, 5280–5289.
  29. Tian, J.; Wang, B.; Wang, Z.; Cao, K.; Li, J.; Ozay, M. Joint Adversarial Example and False Data Injection Attacks for State Estimation in Power Systems. IEEE Trans. Cybern. 2022, 52, 13699–13713.
  30. Li, Y.; Li, J.; Qi, J.; Chen, L. Robust Cubature Kalman Filter for Dynamic State Estimation of Synchronous Machines Under Unknown Measurement Noise Statistics. IEEE Access 2019, 7, 29139–29148.
  31. Deng, G.; Qi, N.; Tang, M.; Duan, X. Constructing Dixon Matrix for Sparse Polynomial Equations Based on Hybrid and Heuristics Scheme. Symmetry 2022, 14, 1174.
Figure 1. The complete framework of proposed attack sample generation scheme.
Figure 2. Actual data distribution and standard Laplace distribution for different node data (node 1, node 2, node 3, node 4) in the IEEE 14-node system.
Figure 3. Variation distribution comparison for normal samples and generated attack samples. (a) Normal measurement data variation. (b) Generated measurement data variation.
Figure 4. Accuracy comparison of two attack sample generation schemes on the IEEE 14- and IEEE 118-node systems. In this experiment, CNN-based FDIA detection model and LSTM-based FDIA detection model are used to provide the testing results. (a) CNN-based detection over the IEEE 14 system. (b) LSTM-based detection over the IEEE 14 system. (c) CNN-based detection over the IEEE 118 system. (d) LSTM-based detection over the IEEE 118 system.
Figure 5. Performance comparison under different component parameters.
Figure 6. Comparison of two attack sample generation schemes under different noise levels. (a) CNN-based detection model. (b) LSTM-based detection model.
Figure 7. Comparison of running time of different model component parameters.
Table 1. Performance comparison of two attack sample generation schemes, the traditional FDIA sample generation and FDIA sample generation based on hybrid Laplace model (LMM-FDIA). In this test, CNN-based FDIA attack detection model is used over the IEEE 14-node system.

| Attack Level | Attack Strength | Traditional Scheme: Precision | Traditional Scheme: Recall | Traditional Scheme: F1-Score | LMM-FDIA Based Scheme: Precision | LMM-FDIA Based Scheme: Recall | LMM-FDIA Based Scheme: F1-Score |
|---|---|---|---|---|---|---|---|
| Weak Attacks | 2% | 0.5485 | 0.6137 | 0.5613 | 0.3330 | 0.4520 | 0.3689 |
| Weak Attacks | 5% | 0.6243 | 0.6869 | 0.6408 | 0.4205 | 0.6950 | 0.5179 |
| Weak Attacks | 10% | 0.7021 | 0.7132 | 0.7004 | 0.6556 | 0.6148 | 0.5499 |
| Moderate Attacks | 15% | 0.7813 | 0.7954 | 0.7826 | 0.6636 | 0.5541 | 0.5574 |
| Moderate Attacks | 20% | 0.8372 | 0.8424 | 0.8358 | 0.7351 | 0.6776 | 0.6764 |
| Moderate Attacks | 25% | 0.8478 | 0.8732 | 0.8565 | 0.7794 | 0.7605 | 0.7595 |
| Strong Attacks | 30% | 0.8995 | 0.9094 | 0.8998 | 0.8280 | 0.8143 | 0.7981 |
| Strong Attacks | 40% | 0.9494 | 0.9452 | 0.9462 | 0.8541 | 0.8021 | 0.8090 |
| Strong Attacks | 50% | 0.9754 | 0.9708 | 0.9725 | 0.9281 | 0.9186 | 0.9213 |
Table 2. Performance comparison of two attack sample generation schemes, the traditional FDIA sample generation and FDIA sample generation based on hybrid Laplace model (LMM-FDIA). In this test, LSTM-based FDIA attack detection model is used over the IEEE 14-node system.

| Attack Level | Attack Strength | Traditional Scheme: Precision | Traditional Scheme: Recall | Traditional Scheme: F1-Score | LMM-FDIA Based Scheme: Precision | LMM-FDIA Based Scheme: Recall | LMM-FDIA Based Scheme: F1-Score |
|---|---|---|---|---|---|---|---|
| Weak Attacks | 2% | 0.5399 | 0.5417 | 0.5408 | 0.4943 | 0.4686 | 0.4811 |
| Weak Attacks | 5% | 0.6290 | 0.6000 | 0.6141 | 0.5000 | 0.4078 | 0.4492 |
| Weak Attacks | 10% | 0.6704 | 0.6842 | 0.6959 | 0.5044 | 0.4799 | 0.4854 |
| Moderate Attacks | 15% | 0.7481 | 0.8063 | 0.7761 | 0.5244 | 0.4859 | 0.5045 |
| Moderate Attacks | 20% | 0.8208 | 0.8307 | 0.8257 | 0.5249 | 0.6261 | 0.5054 |
| Moderate Attacks | 25% | 0.8409 | 0.9023 | 0.8705 | 0.5662 | 0.4706 | 0.5140 |
| Strong Attacks | 30% | 0.8586 | 0.9189 | 0.8877 | 0.6058 | 0.5787 | 0.6041 |
| Strong Attacks | 40% | 0.9509 | 0.9421 | 0.9465 | 0.6742 | 0.6495 | 0.6616 |
| Strong Attacks | 50% | 0.9831 | 0.9569 | 0.9698 | 0.7843 | 0.5867 | 0.5960 |
Table 3. Performance comparison of two attack sample generation schemes, the traditional FDIA sample generation and FDIA sample generation based on hybrid Laplace model (LMM-FDIA). In this test, CNN-based FDIA attack detection model is used over the IEEE 118-node system.

| Attack Level | Attack Strength | Traditional Scheme: Precision | Traditional Scheme: Recall | Traditional Scheme: F1-Score | LMM-FDIA Based Scheme: Precision | LMM-FDIA Based Scheme: Recall | LMM-FDIA Based Scheme: F1-Score |
|---|---|---|---|---|---|---|---|
| Weak Attacks | 2% | 0.5176 | 0.5485 | 0.4952 | 0.4372 | 0.5854 | 0.4360 |
| Weak Attacks | 5% | 0.5818 | 0.5714 | 0.5650 | 0.5933 | 0.4132 | 0.3958 |
| Weak Attacks | 10% | 0.7241 | 0.6903 | 0.7005 | 0.7976 | 0.4121 | 0.5355 |
| Moderate Attacks | 15% | 0.8023 | 0.7874 | 0.7912 | 0.8586 | 0.5338 | 0.6525 |
| Moderate Attacks | 20% | 0.8364 | 0.8263 | 0.8281 | 0.8619 | 0.6291 | 0.7165 |
| Moderate Attacks | 25% | 0.8746 | 0.8481 | 0.8583 | 0.9072 | 0.6810 | 0.7730 |
| Strong Attacks | 30% | 0.8926 | 0.8952 | 0.8917 | 0.9464 | 0.7250 | 0.8162 |
| Strong Attacks | 40% | 0.9374 | 0.9376 | 0.9364 | 0.9189 | 0.7970 | 0.8504 |
| Strong Attacks | 50% | 0.9621 | 0.9693 | 0.9651 | 0.9272 | 0.9047 | 0.9157 |
Table 4. Performance comparison of two attack sample generation schemes, the traditional FDIA sample generation and FDIA sample generation based on hybrid Laplace model (LMM-FDIA). In this test, LSTM-based FDIA attack detection model is used over the IEEE 118-node system.

| Attack Level | Attack Strength | Traditional Scheme: Precision | Traditional Scheme: Recall | Traditional Scheme: F1-Score | LMM-FDIA Based Scheme: Precision | LMM-FDIA Based Scheme: Recall | LMM-FDIA Based Scheme: F1-Score |
|---|---|---|---|---|---|---|---|
| Weak Attacks | 2% | 0.5328 | 0.1870 | 0.2769 | 0.4966 | 0.6444 | 0.5610 |
| Weak Attacks | 5% | 0.5718 | 0.6027 | 0.5869 | 0.5062 | 0.9808 | 0.6678 |
| Weak Attacks | 10% | 0.7518 | 0.6126 | 0.6751 | 0.6737 | 0.2288 | 0.3417 |
| Moderate Attacks | 15% | 0.8008 | 0.7654 | 0.7827 | 0.8398 | 0.4703 | 0.6029 |
| Moderate Attacks | 20% | 0.8728 | 0.7549 | 0.8096 | 0.9037 | 0.6748 | 0.7726 |
| Moderate Attacks | 25% | 0.8672 | 0.8735 | 0.8703 | 0.8817 | 0.7130 | 0.7884 |
| Strong Attacks | 30% | 0.9129 | 0.8708 | 0.8914 | 0.8338 | 0.9102 | 0.8703 |
| Strong Attacks | 40% | 0.8995 | 0.9670 | 0.9320 | 0.9499 | 0.8509 | 0.8977 |
| Strong Attacks | 50% | 0.9555 | 0.9769 | 0.9661 | 0.9800 | 0.9373 | 0.9581 |
Table 5. Qualitative comparison with existing FDIA methods.

| Attack Method | Characteristics | Challenges |
|---|---|---|
| FDIA [20] | Effectively avoiding BDD detection; Low model complexity | Easy to detect by DL model; Simple construction of attack vector |
| AFDIA [26] | Effectively avoiding BDD detection; High success rate | Poor robustness; Easy to detect by DL model; Large amount of model parameters |
| D-FDIA [27] | Effectively avoiding BDD detection; Low computational burden | Easy to detect by DL model; Poor concealment of attack vector |
| SG-FDIA [28] | Effectively avoiding BDD detection; High time efficiency | Easy to detect by DL model; Poor robustness |
| M-AFDIA [29] | Effectively avoiding DL model detection; Strong concealment of attack vectors | Easy to detect by BDD; Long running time |
| S-AFDIA [29] | Effectively avoiding BDD and DL model detection | High model complexity; High operation cost; Obtain comprehensive system information |
| LMM-FDIA | Effectively avoiding BDD and DL model detection; Low model complexity and running time; Strong concealment of attack vector | Poor performance on small samples |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, Y.; Zu, T.; Guo, N.; Zhu, Z.; Li, F. Laplace-Domain Hybrid Distribution Model Based FDIA Attack Sample Generation in Smart Grids. Symmetry 2023, 15, 1669. https://doi.org/10.3390/sym15091669

