Article

Efficient Multi-Source Self-Attention Data Fusion for FDIA Detection in Smart Grid

1
State Grid Shanghai Municipal Electric Power Company, Shanghai 200122, China
2
College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China
3
College of Electrical Engineering, Shanghai University of Electric Power, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Symmetry 2023, 15(5), 1019; https://doi.org/10.3390/sym15051019
Submission received: 13 April 2023 / Revised: 27 April 2023 / Accepted: 1 May 2023 / Published: 4 May 2023

Abstract

As a new cyber-attack method in power cyber physical systems, false-data-injection attacks (FDIAs) mainly disturb the operating state of power systems by tampering with the measurement data of sensors, thereby avoiding bad-data detection by the power grid and threatening the security of power systems. However, existing FDIA detection methods usually only focus on the detection feature extraction between false data and normal data, ignoring the feature correlation that easily produces diverse data redundancy, resulting in the significant difficulty of detecting false-data-injection attacks. To address the above problem, we propose a multi-source self-attention data fusion model for designing an efficient FDIA detection method. The proposed data fusing model firstly employs a temporal alignment technique to integrate the collected multi-source sensing data to the identical time dimension. Subsequently, a symmetric hybrid deep network model is built by symmetrically combining long short-term memory (LSTM) and a convolution neural network (CNN), which can effectively extract hybrid features for different multi-source sensing data. Furthermore, we design a self-attention module to further eliminate hybrid feature redundancy and aggregate the differences between attack-data features and normal-data features. Finally, the extracted features and their weights are integrated to implement false-data-injection attack detection using a single convolution operation. Extensive simulations are performed over IEEE14 node test systems and IEEE118 node test systems; the experimental results demonstrate that our model can achieve better data fusion effects and presents a superior detection performance compared with the state-of-the-art.

1. Introduction

In the context of new power-system construction strategies [1], new business systems such as virtual power plants (VPP), load integrators with multiple interactions and cross-network operations have emerged in large numbers. Along with the continuous increase in the volume of business, the interaction between the main bodies on both sides of the public and private sector is becoming increasingly large and complex. VPPs have a higher requirement for the flexible scheduling of communication networks, differentiated service guarantees and network security [2,3]. As a regional multi-energy aggregation model which enables large-scale connection of renewable energy generation to the grid, virtual power plants aggregate distributed power sources [4], controllable loads and energy storage devices in the grid into a virtual controllable aggregate through a distributed power management system. However, due to the existence of a series of new energy sites and low-voltage distributed power generation systems in the virtual-power-plant business system, which may involve multiple stakeholders, there may be a large number of problems in the operation process of the power system, such as random fluctuations in the output of new energy; multi-level, multi-time-scale stability; and the balance of the power system. Therefore, based on existing information-physical system theory and analysis, the dynamic interaction process of information and energy flows on both sides of the virtual power plant is modelled to realize the state sensing, fusion analysis and coordination control of the virtual-power-plant system [5].
On the other hand, in order to achieve flexible and efficient regulation of the source network load and storage in the virtual-power-plant system, the information nodes and power nodes need to complete the regulation and control instructions and upload the physical-equipment status information through frequent data interaction, but the frequent-interaction data contains a large number of multi-source heterogeneous data, e.g., structured data in the form of data tables and unstructured data, such as text and images, etc. If these heterogeneous data cannot be digitally transformed in real time, it will seriously affect the accuracy and stability of the model in the information-physical system of the virtual power plant. However, cross-type data analysis techniques for new power systems, such as VPPs, are currently very weak [6]; the correlation analysis mining capabilities between state quantities are rather insufficient meaning that the front-back-end data fusion is difficult to effectively operate [7]. Therefore, an integrated and unified model for multiple sources of heterogeneous interactive data in a public–private interactive power-grid environment can efficiently help to achieve efficient fusion of multiple sources of heterogeneous data and to improve the performance of virtual-power-plant information-physical system models.
In recent years, some scholars have started to investigate the data generated by the interaction of public and private information in virtual power plants [8], given the increasing number of occasions where big data is present in the construction of smart grids. Currently, work on the integration of heterogeneous data from multiple sources in virtual power plants is mainly focused on sensor-node data fusion. Current power-data-fusion techniques are mainly aimed at the fusion of multiple sources of power equipment, and most of these techniques face the fault location of power equipment and the identification and calibration of multiple sources of data in the distribution network. Jiao et al. [9] used the distance measurement results of distance relays in substations at both ends of transmission lines to propose a new method for improving the fault location accuracy of transmission lines based on multi-source data-fusion techniques. Based on the analysis of existing series-compensated transmission-line fault-location algorithms, some scholars introduced artificial-intelligence techniques to the differential equation mathematical model and fuse the model with neural networks to design a new power-fault location method [10,11]. For multiple data sources in distribution-network data collection, a practical multi-source data pre-processing technique can repair some of the bad data and improve the quality of the state estimation input data, so that the advantages of redundant data can be fully utilized and misjudgement and omissions in data pre-processing can be avoided. In response to the problems of complex rectification, difficult coordination and poor adaptability of traditional protection methods for smart distribution networks, Lin et al. developed a condition-monitoring and fault-handling method [12] based on big-data analysis of smart distribution networks. 
According to the network correlation matrix and regional difference rules, the current and power data collected by the measurement and control integration terminals at each node are pre-processed and the results are fused in time and space to generate a high-dimensional spatial-temporal condition-monitoring matrix. However, the above methods only consider the fusion error as an indicator when fusing data from multiple sources. As the multi-source data contains a large amount of redundant data, the transmission of redundant data results in a waste of bandwidth resources and even causes network blockage in severe cases, reducing the fusion efficiency [13]. Therefore, how to screen feature attributes efficiently with guaranteed fusion error and reduce the transmission of redundant data is a focus of multi-source data-fusion processing. Recently, deep-learning-based methods have been applied to FDIA detection, e.g., the CNN-LSTM scheme [14] and a multi-head-attention-like scheme [15]. They can achieve better results than traditional machine-learning methods in terms of classification accuracy and training speed. Nevertheless, given the multi-business-entity nature of virtual power plants, the data obtained from different business entities are clearly heterogeneous and come from multiple sensing sources, and these data are fragmented between different systems, which makes them highly heterogeneous. Yet, these deep-learning-based methods can only effectively process data from the same data source sensor; for heterogeneous data from different sources, their fusion efficiency is greatly reduced, resulting in a decline in FDIA-detection accuracy.
Overall, a unified data-fusion model can effectively integrate multi-source sensor data by eliminating data redundancy. The fused and efficient data can improve the detection performance for FDIA, providing a reliable guarantee for the stable operation of the power system. However, existing deep-learning-based FDIA detection schemes usually only focus on the detection-feature extraction between false data and normal data, ignoring the feature correlation that easily produces diverse data redundancies, resulting in the significant difficulty of detecting false-data-injection attacks. Facing the aforementioned problem, we were motivated to propose a multi-source self-attention data-fusion model for FDIA detection. The proposed data-fusing model firstly employs a temporal-alignment technique to integrate the collected multi-source sensing data to the identical time dimension. Subsequently, a hybrid deep-learning network was built by combining long short-term memory (LSTM) and a convolution neural network (CNN), which can effectively extract hybrid features for different multi-source sensing data. Furthermore, we designed a self-attention module to further eliminate hybrid-feature redundancy and aggregate the differences between the attack-data features and normal-data features. Finally, the extracted features and their weights were integrated to implement false-data-injection-attack detection using a single convolution operation. Extensive simulations were performed over IEEE14 node test systems and the experimental results demonstrate that our model can obtain better data-fusion effects and presents a superior detection performance compared with the state-of-the-art.
In general, compared to existing works, we make the following novel contributions:
- We build a hybrid deep-learning network by combining long short-term memory and a convolution neural network, which can effectively extract hybrid features for different multi-source sensing data. The proposed network model is a good attempt to achieve a balance between efficient feature construction and high-accuracy attack detection.
- We design a self-attention module which can further eliminate hybrid-feature redundancy by aggregating the differences between attack-data features and normal-data features. Since the proposed self-attention module works in a plug-and-play mode and further optimizes the model scale, it does not add to overall network-training-time consumption.
- Comprehensive experiments performed over IEEE14 and IEEE118 node test systems demonstrate that our model can outperform existing methods in terms of feature effectiveness, FDIA-detection accuracy and network training complexity.
The rest of this paper is organized as follows. Section 2 presents the related work on data-fusion schemes and self-attention mechanisms. In Section 3, we describe the details of the multi-source self-attention data-fusion model for FDIA detection. Section 4 presents the comprehensive experiments performed to evaluate the proposed scheme, together with the experimental results and corresponding discussions. Finally, Section 5 concludes the paper.

2. Related Work

2.1. Data Fusion

FDIAs can cause the unstable operation of a power grid by injecting false data into the power grid, which creates serious challenges in the modern new power system. However, in existing power systems, especially in new power systems represented by VPP, a large number of sensor nodes can generate massive amounts of perceptual data, making it difficult for the injection of false data to be accurately detected using existing detection models. Meanwhile, different sensor nodes cause sensor data to be transmitted in a multi-source heterogeneous state in the power system, which further reduces the accuracy of FDIA detection.
Several papers have been published on multi-source heterogeneous data fusion. Lu et al. [16] employed an improved convolutional neural network, a convolutional neural network combined with a Gated Recurrent Unit (CNN-GRU), to extract temporal and spatial features for false-data-injection-attack detection. Alazab et al. [17] proposed a novel multidirectional long short-term memory (MLSTM) technique to predict the stability of the smart-grid network; experiments show that this hybrid approach significantly outperforms other machine-learning solutions. Alimi et al. [18] proposed a hybrid neural network algorithm that combines support vector machine (SVM) and multi-layer perceptron neural network (MLP-NN) algorithms for predicting and detecting cyber intrusion attacks on power-system networks. Wahid et al. [14] proposed a fusion method based on a combination of convolutional neural networks and long short-term memory with a skip connection (CNN-LSTM), which provided the most reliable and highest prediction accuracy in comparison with other similar schemes. Cheng et al. [19] developed an attack-information-fusion model based on a convolutional neural network and support vector machine (SVM) for DDoS attacks. Saleh et al. [20] proposed a stacked LSTM-based method to fuse the temporal features of different sensors, which achieved good results in driving behavior-classification problems. Han et al. [21] employed a hybrid mechanism to design a fusion modelling method, which was utilized to accurately evaluate the energy consumption of smart buildings. Chen et al. [22] proposed a deep-learning framework based on a convolutional neural network (CNN) and a naive Bayes data-fusion scheme, and then applied it to image detection. Shao et al. [23] proposed a cyber-attack detection model by building a fusion model to process time features and frequency features simultaneously for short-term load forecasting. The above methods enable the fusion of data in smart-grid systems. However, they ignore the temporal correlation in heterogeneous data and lack a unified treatment of temporal features.
Existing work on the integration of heterogeneous data from multiple sources in virtual power plants is focused on sensor-node data fusion. By performing a review of data-fusion techniques in multi-sensor networks, some researchers have designed distributed fusion methods [24] that use maximum likelihood estimates (MLE) to achieve the stable estimation of local sensory data, eliminate data anomalies and solve the problem of fusing asynchronous data. In the study of data fusion based on a cluster structure [25], the Bayes method is used to estimate the number of sending data nodes in order to solve the data-conflict problem of cluster head nodes in data collection. In order to improve the computational efficiency of Bayes data fusion, there are technical solutions that implement the distributed computation of posterior probabilities and use neural networks for the fusion of multi-sensor data for target recognition systems.
Overall, the existing data-fusion methods mainly focus on processing homologous data. In the face of a new type of power system represented by VPP, due to the frequent data interaction between the public and private networks, the data transmitted by different sensor nodes may be always in a heterogeneous state, which makes it difficult for existing methods to eliminate the redundancy between large-scale data. Table 1 tabulates some works on data fusion, and their limitations are, accordingly, listed below. Although multi-neural networks can eliminate partial redundancy between data, the diversity of heterogeneous data still maintains a lot of redundancy so that the data-fusion performance is hard to improve. Therefore, a more accurate data-fusion model needs to be designed to further improve the efficiency of data fusion.

2.2. Self-Attention Mechanism

The attention mechanism and its variants have been widely used in recent years in different types of deep-learning tasks, such as natural language processing, image recognition and speech recognition. The self-attention mechanism is a variant of the attention mechanism, first proposed in 2017 in the Google team’s Transformer model [26]; compared to the attention mechanism, it is more concerned with capturing internal relationships between data or features than with relying on external information. The time complexity of self-attention in the traditional Transformer is $O(n^2)$, which makes it computationally intensive and difficult to train and deploy. As a result, many optimization methods have emerged. For instance, the efficiency of self-attention can be improved by introducing sparsity into self-attention layers, reducing the complexity to $O(n\sqrt{n})$ and $O(n \log n)$ [27,28]. Linformer [29] successfully reduced the complexity to $O(n)$ by decomposing the self-attention operation into multiple smaller self-attention operations using linear projections. Reformer [30] used locality-sensitive hashing to group sequences and reduced the complexity of self-attention from $O(n^2)$ to $O(n \log n)$. Choromanski et al. [31] used a low-rank matrix to replace the self-attention matrix, which can reduce the complexity from $O(n^2)$ to $O(n)$. Lu et al. [32] used the Gaussian kernel function to replace the dot product similarity without further normalization, thus reducing the complexity to $O(n)$.
According to the above analysis, a series of self-attention-mechanism-based network models have been developed to improve the performance of diverse computer vision tasks. However, there are few methods for designing data-fusion models by introducing self-attention mechanisms. Since the self-attention mechanism can effectively aggregate feature differences by learning the features and assigning different weights to represent their importance, and can also have a low computational cost with linear complexity, we borrow this idea for multi-source sensing data fusion. Therefore, this paper introduces a self-attention mechanism into the data-fusion model.

3. Proposed Method

3.1. Proposed Data-Fusion Framework

The framework of our proposed scheme is shown in Figure 1. The framework consists of three main modules: a data pre-processing module, a deep-feature-extraction module, and a data-fusion module. In the data pre-processing module, the collected multi-source heterogeneous sensing data is re-arranged by removing the irrelevant data. Then, a temporal alignment method is used to eliminate the temporal differences between sensors so that the same type of data can be employed under the same time dimension. Subsequently, the deep-feature-extraction module symmetrically combines CNN and LSTM neural networks to extract hybrid sensing features, and the self-attention module is designed to further optimize the extracted features by applying different weighted transformations. Finally, deep convolutional fusion is employed in the data-fusion module, which can aggregate the feature differences to improve the detection rate for FDIAs.

3.2. Data Pre-Processing Module

The data collected from different sensors vary considerably in terms of data composition, data accuracy, data transmission delay and refresh frequency. Thus, the collected multi-source data are first pre-processed to remove a large amount of uncorrelated data. Then, we employ the curve-alignment method to eliminate the temporal difference in heterogeneous data from different sources, and subsequently use the Pearson correlation coefficient to measure serial correlation. Assuming a heterogeneous dataset of two sensors from different sources $\{(a_i, b_i),\ i = 1, 2, 3, \ldots, n\}$ that follows a bivariate normal distribution, $(a_i, b_i) \sim N$, the correlation coefficient of the sample can be calculated as follows.
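$$\hat{\rho}(A, B) = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}} \tag{1}$$
where $\bar{a}$ and $\bar{b}$ are the mean values of the sample data sets A and B, respectively.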
When the data are perceived by the sensors, the strong timing characteristics of the cyber-physical system may cause these data to have a certain temporal correlation. Accordingly, we can convert the Pearson coefficients with discrete sample values in Equation (1) into temporal Pearson coefficients.
$$\rho(J, K) = \frac{\sum_{t=1}^{t_1} (a_J - \tau_J)(a_K - \tau_K)}{\sqrt{\sum_{t=1}^{t_1} (a_J - \tau_J)^2}\,\sqrt{\sum_{t=1}^{t_1} (a_K - \tau_K)^2}} \tag{2}$$
where $\rho(J, K)$ is the correlation coefficient between the sample $a_J$ and the sample $a_K$, $t = 1$ is the initial time for a given sensor, $t_1$ is the deadline, and $\tau_J$ and $\tau_K$ are the mean values of the sample $a_J$ and the sample $a_K$, respectively.
Furthermore, the cross-covariance function of the two heterogeneous data vectors $a_J$ and $a_K$ can be calculated to assess the correlation of the two heterogeneous data vectors. Correspondingly, temporal alignment of the heterogeneous data can then be achieved.
$$C_{JK} = E\left\{\left[a_J(t_i) - \tau_J\right] \cdot \left[a_K(t_i) - \tau_K\right]\right\}, \quad i = 1, 2, 3, \ldots, n \tag{3}$$
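The correlation measures above can be illustrated with a short sketch. It is not the authors' implementation: the function names, the toy sensor readings and in particular the `best_shift` search (trying integer shifts and keeping the one with the highest Pearson coefficient) are assumptions introduced here to show one plausible way that a correlation score could drive temporal alignment.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(
        sum((y - mean_b) ** 2 for y in b)
    )
    if den == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den


def cross_covariance(a, b):
    """Empirical estimate of E[(a - mean_a) * (b - mean_b)]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def best_shift(a, b, max_shift):
    """Hypothetical alignment rule (an assumption, not from the paper):
    try integer shifts and keep the one with the highest correlation.
    A shift s >= 0 pairs a[i] with b[i + s]; s < 0 pairs a[i - s] with b[i]."""
    best_s, best_r = 0, -2.0
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            x, y = a[: len(a) - s], b[s:]
        else:
            x, y = a[-s:], b[: len(b) + s]
        r = pearson(x, y)
        if r > best_r:
            best_s, best_r = s, r
    return best_s, best_r


# Toy readings: sensor_b reports the same signal two samples later than sensor_a.
base = [0, 1, 4, 2, 7, 3, 8, 5, 9, 6, 2, 1]
sensor_a = base[2:]
sensor_b = base[:-2]
shift, corr = best_shift(sensor_a, sensor_b, max_shift=3)
print(shift, round(corr, 6))
```

On this toy input the search recovers the two-sample delay, because only that shift makes the two sequences identical. Real sensor streams with different refresh rates would first need resampling to a common time grid, which this sketch does not cover.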

3.3. Deep-Feature-Extraction Module

In this section, we construct a hybrid deep-network model by combining CNN and LSTM networks, and then design a self-attention module to extract efficient deep features. The whole hybrid deep-network model is shown in Figure 2. Firstly, the model symmetrically divides the processed sensor data into two copies; one is used for extracting convolutional features [33,34] and the other is for extracting temporal features. The CNN layer that is used to extract the convolutional features mainly consists of three one-dimensional convolutional operations (Conv1D) containing 5 × 5, 3 × 3, and 3 × 3 convolutional kernels, respectively, and all the convolutional layers have a convolutional step size of 1. The ReLU activation is then used to enhance the expressive power of the neural network and to improve the efficiency of feature extraction from input data, while Dropout (dropout probability = 0.3) is used to prevent overfitting. One-dimensional convolution operations are used to extract convolutional features along the feature dimension without considering the effect of the time series. The number of training rounds is 15, the batch size is 128, the loss function is the MSE loss, and the optimizer is Adam with the default parameters (0.9, 0.99). The number of LSTM hidden layers used to extract temporal features is 2. The two-layer LSTM allows more fully connected layers to make the information more accurate. The convolutional features obtained from the CNN network and the temporal features obtained from the LSTM network are combined to obtain the final features $f_i$ ($i = 1, 2, 3, \ldots, n$) of the single-source sensor. Notably, the two network models are combined in the same dimension to ensure that the convolutional and temporal features do not affect each other. We concatenate the single-source sensor features to obtain the multi-source heterogeneous set $\tilde{f}$.
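To make the shape of the convolutional branch concrete, the following sketch stacks three stride-1 one-dimensional convolutions followed by ReLU. Several things here are assumptions rather than details from the paper: the kernels are treated as having widths 5, 3 and 3, no padding is applied, a single channel is used, the weights are arbitrary placeholders, and Dropout is omitted because it is only active during training.

```python
def conv1d_valid(signal, kernel):
    """Stride-1 one-dimensional convolution (cross-correlation form, no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def cnn_branch(signal, kernels):
    """Apply each convolution followed by ReLU, in order."""
    out = list(signal)
    for kernel in kernels:
        out = relu(conv1d_valid(out, kernel))
    return out


# Placeholder weights (hypothetical); assumed kernel widths 5, 3 and 3.
kernels = [[0.2] * 5, [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
signal = [float(i % 4) for i in range(16)]
features = cnn_branch(signal, kernels)
print(len(signal), "->", len(features))
```

Without padding, each layer shortens the sequence by its kernel width minus one, so 16 input samples become 12, then 10, then 8. A framework implementation would normally add padding or pooling choices that the text does not specify.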
Furthermore, the entire model of self-attention is presented in Figure 3. In detail, the proposed self-attention is similar to the multi-head self-attention structure and uses three branches (I, K, V) to process the input: the input branch I, the key branch K and the value branch V. The input branch I maps each d-dimensional sequence in $\tilde{f}$ into a scalar using a linear layer with weight $W_I \in \mathbb{R}^{d}$, and then generates a k-dimensional vector. A softmax operation is correspondingly performed on this k-dimensional vector to obtain a context score $C_s \in \mathbb{R}^{k}$.
$$C_s = \sigma\left(\tilde{f} \cdot W_I\right) \tag{4}$$
where $\sigma$ denotes the normalization operation and $\tilde{f} \cdot W_I$ indicates the inner product operation.
In addition, the context score $C_s$ is used to weight the sequence and compute the context vector $C_v$. Specifically, $C_v$ is computed by using a linear layer with weights $W_k \in \mathbb{R}^{d \times d}$ to obtain the key output, which is then weighted by $C_s$. $C_v$ plays the role of the attention matrix in multi-head self-attention. We use it to encode the contextual information, but it is much cheaper to compute than the attention matrix in multi-head self-attention. Figure 4 illustrates a dot product between the input sequence and a latent node to obtain global information, which is used to scale the key sequence. The weights $W_I$ serve as the latent node L, and the resultant vector is normalized using softmax to produce the context scores $C_s$. These context scores are used to weight the key sequence and produce a context vector $C_v$, which encodes contextual information.
$$C_v = C_s \times \left(W_k \cdot \tilde{f}\right) \tag{5}$$
Notably, the input $\tilde{f}$ is linearly projected using a linear layer with weight $W_v \in \mathbb{R}^{d \times d}$, followed by a ReLU activation to obtain the value output. The contextual information in $C_v$ and the value output are combined by means of a Hadamard product. Finally, the result is passed to another linear layer with weight $W_o \in \mathbb{R}^{d \times d}$ to obtain the weighted features $\hat{Q}$.
$$\hat{Q} = \left(C_v \times \mathrm{ReLU}\left(\tilde{f} \cdot W_v\right)\right) \cdot W_o \tag{6}$$
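The three branches can be sketched in plain Python to show why the cost grows linearly with the sequence length k. This is a minimal illustration under stated assumptions, not the authors' code: the function name `linear_self_attention` is made up here, the weights are passed in as nested lists, biases are omitted, and the context vector is formed by summing the score-weighted key rows over the sequence, which is how the description of weighting the key sequence is read here.

```python
import math


def matmul(x, w):
    """Multiply a (rows x d_in) matrix by a (d_in x d_out) matrix."""
    return [
        [sum(row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]


def softmax(v):
    """Numerically stable softmax over a list of scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def linear_self_attention(f, w_i, w_k, w_v, w_o):
    """f: k rows of d features; w_i: length-d vector; w_k, w_v, w_o: d x d."""
    # Input branch: one scalar per row, normalised into context scores C_s.
    c_s = softmax([sum(x * w for x, w in zip(row, w_i)) for row in f])
    # Key branch: project the rows, then sum them weighted by C_s -> C_v.
    keys = matmul(f, w_k)
    d = len(keys[0])
    c_v = [sum(c_s[t] * keys[t][j] for t in range(len(f))) for j in range(d)]
    # Value branch: linear projection followed by ReLU.
    values = [[max(0.0, x) for x in row] for row in matmul(f, w_v)]
    # Hadamard product with the broadcast context vector, then output layer.
    gated = [[c_v[j] * row[j] for j in range(d)] for row in values]
    return c_s, matmul(gated, w_o)


# Toy check with identity projections and a zero scoring vector.
eye = [[1.0, 0.0], [0.0, 1.0]]
f_tilde = [[1.0, 2.0], [3.0, 4.0]]
scores, q_hat = linear_self_attention(f_tilde, [0.0, 0.0], eye, eye, eye)
print(scores, q_hat)
```

No k × k attention matrix is ever built: every step touches each of the k rows once, so the work scales with k rather than with k squared.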

3.4. Deep Convolution Data-Fusion Module

In order to fuse the weighted features Q ^ to improve the accuracy of the data-detection model, we further design a convolutional fusion mechanism for the weighted data using two different scales of sequence convolution operations.
$$\hat{P} = \mathrm{conv}_{\downarrow}\left(\mathrm{conv}_{\uparrow}(\hat{Q})\right) \tag{7}$$
where conv represents a one-dimensional convolution operation and each convolution kernel size is 1 × 1. $\mathrm{conv}_{\uparrow}$ represents the convolution that increases the dimensionality of the features, while $\mathrm{conv}_{\downarrow}$ denotes the convolution that reduces it. Since increasing the dimension first and then reducing it can effectively maintain the time-characteristic information before and after fusion, the use of one-dimensional convolutional operations for multi-source cascaded features can ensure that the convolutional features are further fused without destroying the temporal features. Finally, the LSTM convolutional network is used to further extract the overall temporal features.
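A 1 × 1 one-dimensional convolution applies the same linear map to the channel vector at every time step, which is why the expand-then-reduce fusion leaves the time axis untouched. The sketch below illustrates this under assumptions of our own: the names `pointwise_conv` and `fuse` are hypothetical, biases and activations are omitted, and the channel sizes (2 expanded to 4, then reduced to 2) and weights are arbitrary.

```python
def pointwise_conv(seq, weight):
    """1x1 Conv1D: apply one (c_in x c_out) linear map at every time step."""
    return [
        [
            sum(step[i] * weight[i][j] for i in range(len(weight)))
            for j in range(len(weight[0]))
        ]
        for step in seq
    ]


def fuse(q, w_up, w_down):
    """Expand the channel dimension, then reduce it again."""
    return pointwise_conv(pointwise_conv(q, w_up), w_down)


# Three time steps with two channels each (toy values).
q_hat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_up = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]      # 2 -> 4 channels
w_down = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # 4 -> 2 channels
p_hat = fuse(q_hat, w_up, w_down)
print(p_hat)
```

The weights here were chosen so that the two maps compose to the identity, which makes it easy to see that each time step is processed independently and the number of time steps is unchanged. Trained weights would instead mix channels across the concatenated sensor features.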
It is worth noting that the proposed data-fusion framework is not only able to handle heterogeneous data from two sensors, but can also handle the heterogeneous data from more than two sensors very well. In addition, our proposed self-attention uses common element operations (multiplication and summation) and has a linear time complexity; it is not, thus, time-consuming as the network complexity increases.

4. Experimental Results and Discussions

4.1. Experimental Setup

In our experiments, the IEEE 14 node-test system data was selected as the experimental data. The IEEE 14 node data was built from real loads collected from the New York Independent System Operator from 1 January 2020 to 1 May 2022. Eleven regions were selected to represent the 11 load buses of the IEEE 14 node-test system. The individual bus-state variables were obtained by performing a power-flow calculation on the system. Moreover, the multi-source sensor data for the experiments in this paper were formed by simulating a split of the IEEE 14 node data, as illustrated in Figure 5. Each node in the IEEE 14 node-test system independently monitors the load information of the test system for that node; dividing the nodes in the system can thus be considered as dividing the different sensor data sources that monitor the same system.
In our experiments, the data for the training and validation sets are divided from the normal data of the system. For the training set, the attack data is generated using a false-data-injection method, while the attack strength c and the measurement noise e are combined to evaluate the detection performance of the model. The attack strength c is a parameter used to measure the degree of impact of attackers on the system. It is typically used to measure the percentage of attackers who can successfully inject false data and influence system behavior. The measurement noise e, also known as noise variance, is commonly used to describe the degree to which the distribution of random variables deviates from the mean; Gaussian noise is often added to the input data during training to improve the robustness and generalization ability of the model. In addition, we designed the random FDIA with an attack duration of 1–5 times and a total of 5000 attacks. The attack strength is c = 0.1 and the measurement environment noise follows a Gaussian distribution, where the measurement noise e = 0.25, 0.3, 0.35, 0.4, 0.45, 0.5. The training and validation sets were divided from 30,000 normal data points with a validation-set ratio of 0.3. The testing set consists of 13,298 normal data points and 270 false-attack data points, for a total of 13,568 data points.
To fully characterize the performance of our model, we use four metrics, namely accuracy, recall, precision and $F_1$ score. True Positive (TP) denotes the amount of false data detected correctly, True Negative (TN) the amount of normal data detected as normal, False Positive (FP) the amount of normal data incorrectly detected as false, and False Negative (FN) the amount of false data incorrectly detected as normal; the four metrics can be correspondingly expressed as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{8}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{10}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{11}$$
In the above metrics, accuracy reflects the proportion of correct classifications, recall indicates the percentage of false data that we successfully detected out of all false data, precision means the percentage of false data that we correctly detected out of all predicted false data, and $F_1$ is the harmonic mean of precision and recall [35]. Overall, a larger $F_1$ score implies a better overall performance of the model, while a larger area under the ROC curve [36] indicates better performance.
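The four metrics follow directly from the confusion-matrix counts, as the sketch below shows. The counts in the example are hypothetical and are not results from the paper; they were only chosen to add up to the test-set composition stated above (13,298 normal and 270 attack samples).

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 270 attack samples (243 caught, 27 missed) and
# 13,298 normal samples (27 flagged by mistake, 13,271 passed).
metrics = detection_metrics(tp=243, fp=27, tn=13271, fn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With such an imbalanced test set, accuracy stays above 0.99 even though one in ten attacks is missed, while precision, recall and $F_1$ all sit at 0.9. This is one reason the $F_1$ score is the more informative headline metric for this task.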

4.2. Effectiveness Verification for Proposed Model

We first carried out a series of experiments to test the effectiveness of our proposed network model. All simulation experiments were run on a platform with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 8 GB of memory. The running time is 1064 s. In general, our network model mainly consists of four stages during normal operation: data pre-processing, model training, threshold selection and model testing. In the data pre-processing phase, a temporal alignment operation was performed on the data using Pearson coefficients. The model training phase used the proposed CNN-LSTM network combined with self-attention to train the classification model, where the number of training rounds was 15, the batch size was 128, the loss function was the MSE loss, and the optimizer was Adam with the default parameters (0.9, 0.99). The initial learning rate was $10^{-4}$ and the weights were weakened in each round. During the training of the model, the optimal model parameters were fixed using threshold selection and were used for FDIA detection. We tested the change in loss during model training when data from two sensors and from three sensors were fused, respectively.
Figure 6 shows the test results. It can be seen from this figure that our data-fusion network model can achieve a rapid training loss reduction and eventually become stable, whether two or three sensors are used. This indicates that our network model is effective in achieving fast data fusion. This phenomenon can be easily explained as follows. Unlike multiple self-attention using complex batch matrix multiplication, our model uses simple numerical operations (e.g., multiplication and summation), which can achieve the rapid convergence of the training model and, ultimately, reduce time consumption.
Furthermore, a series of experiments were conducted to validate the effect of the self-attention module in IEEE14 and IEEE118 system data sets. In this test, four basic deep-learning network models—the CNN model, LSTM model, MLP model and CNN-LSTM model—were used to test the effect of self-attention. We tested the fusion of data from two sensors and built the FDIA attack mentioned in Section 4.1, using the F 1 score as an evaluation metric. The corresponding results are shown in Figure 7. We can observe from this figure that the attack-detection performance of the four basic models in a multi-source data environment are significantly improved by adding the self-attention module. To be specific, in IEEE14, the F 1 score of the CNN model achieves an about 26% improvement; LSTM and MLP also have about 2% and 3% gains in F 1 score; while for the CNN-LSTM hybrid model, a 3% F 1 score improvement can be also achieved. Similarly, for the IEEE118 system data set, our fusion mechanism can achieve an about 21% F 1 score improvement comparing with the CNN model, a 3% F 1 score improvement for LSTM, and a 1% F 1 score improvement for the CNN-LSTM hybrid model. This shows that our self-attention module can effectively improve data availability before and after fusion. As the spatial and temporal features can be extracted separately using our designed CNN-LSTM hybrid network, the redundancy in the original data can be effectively eliminated while the two features do not affect each other. Moreover, the self-attention module can further learn deep internal relationships between features and assign weights to them. The network model pays more attention to the differences between attack features and normal features. Accordingly, it greatly enhances the usability of the fused data, resulting in, ultimately, an improvement in the accuracy of the detection of FDIA attacks.
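The percentages quoted above read most naturally as relative F1 gains. As a rough check, and assuming the "Best F1" columns of Table 2 are the values behind Figure 7 (an assumption, since the figure data are not listed separately), the gains can be recomputed as follows. The helper name `relative_gain` is hypothetical.

```python
# F1 scores (best-F1 threshold) before and after adding self-attention,
# read from Table 2. MLP is omitted because Table 2 lists no
# MLP+Self-Attention row.
f1_scores = {
    "IEEE14": {
        "CNN": (0.6780, 0.8577),
        "LSTM": (0.8493, 0.8702),
        "CNN-LSTM": (0.8446, 0.8697),
    },
    "IEEE118": {
        "CNN": (0.7059, 0.8576),
        "LSTM": (0.8457, 0.8709),
        "CNN-LSTM": (0.8536, 0.8671),
    },
}


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for system, models in f1_scores.items():
    for model, (before, after) in models.items():
        print(f"{system} {model}: {relative_gain(before, after):.2f}% relative F1 gain")
```

Under this reading, the CNN rows reproduce the reported figures of about 26% and 21%, and the remaining gains are in the low single digits, in line with the text.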

4.3. Performance Comparison with the State-of-the-Art

To gain more insight, we tested the overall performance of different data-fusion methods before and after the addition of the self-attention module. Similar to previous experiments, we performed a further comparison among three deep-detection methods—CNN, LSTM and CNN-LSTM hybrid model—to show the advantages of the proposed self-attention module. In this experiment, we tested the fusion effectiveness using two sensors and three sensors, respectively, and then compared the detection performance based on accuracy and F 1 scores in IEEE14 and IEEE118.
The corresponding test results are shown in Table 2 and Table 3. In these tables, the four basic models all exceed 90% detection accuracy, both under the threshold chosen by the maximum F1 score and under the threshold set by the ROC curve. With the addition of the self-attention module, these basic models show different degrees of improvement. Specifically, when our self-attention module was added to each basic detection model, the accuracy gains of CNN, LSTM, and CNN-LSTM were 0.5%, 0.08%, and 0.13%, respectively, using the maximum F1 score as the threshold criterion. As the ROC curves focus on the detection of abnormal and normal data, the corresponding gains were 3.8%, 0.35%, and 0.3%, respectively, when the optimal threshold of the ROC curve was used. In contrast, the data-fusion method proposed in this paper achieves the highest accuracy values of 99.46% and 99.08% based on F1 scores and ROC curves [35,36], respectively; on the IEEE118 data set, improvements can also be seen across the different models. This demonstrates that our model achieves the best detection performance.
We can explain this phenomenon as follows. The CNN model does not take the temporal features of the multi-source data into account. The LSTM model considers the temporal features, but it completely ignores the spatial correlation between the multi-source data. The CNN-LSTM hybrid model involves both spatial and temporal features, yet the deep temporal information of the fused spatial-temporal features is still discarded during the feature-fusion process. In our CNN-LSTM hybrid model with the self-attention mechanism, the self-attention module focuses on the internal structure of the features, looks for intrinsic dependencies between different features, and provides a specific weight for each feature. Since different features usually have different levels of importance to the classification, these differences can be described by the weights. Overall, our model not only considers spatial and temporal features, but also uses the self-attention mechanism to further mine the internal connections between features and assign different weights to them according to their importance; finally, it obtains the deep temporal information of the convolved features using one-dimensional convolution operations. Our model greatly enhances the usability of the data and ultimately results in a significant improvement in detection performance. Similar results can also be observed in Table 3 when three sensors are used for data fusion, which further supports the above conclusion.
Additionally, we tested how the performance of the self-attention module changes when its parameters are adjusted. We changed the output dimensionality of the linear layer W_I from 1 to 2, 3 and 4, respectively (i.e., out_features = 2, 3, 4). Similar to the previous experiments, we compared the accuracy and F1 scores to evaluate the performance of the proposed data-fusion scheme with different parameters. The corresponding detection results are shown in Figure 8. As can be seen from this figure, in terms of both F1 score and accuracy, the proposed data-fusion scheme performs best when the output dimension is 1 (out_features = 1), and the overall performance gradually decreases as the output dimension increases. We explain this as follows. As the output dimension increases, the performance of the trained models saturates, so the additional resources spent on training do not yield a better result. Therefore, our experiments set the output dimension of self-attention to out_features = 1 to balance training resources and output performance.
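The two threshold criteria behind the "Best F1" and "ROC" columns of Table 2 and Table 3 can be illustrated with a short sketch. The text does not state how the optimal ROC threshold is chosen, so the sketch assumes Youden's J statistic (TPR minus FPR), which is one common choice and not necessarily the one used in the experiments. All function names and the toy scores are hypothetical.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are flagged as attacks."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_at(scores, labels, threshold):
    """F1 score at a given threshold."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_at(scores, labels, threshold):
    """Youden's J = TPR - FPR (assumed ROC criterion, see lead-in)."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr - fpr


def best_threshold(scores, labels, criterion):
    """Pick the candidate threshold that maximizes the given criterion."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: criterion(scores, labels, t))


# Toy detector outputs (hypothetical): label 1 marks injected false data.
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.4]
labels = [0, 0, 0, 1, 1, 1]
print("best-F1 threshold:", best_threshold(scores, labels, f1_at))
print("ROC (Youden) threshold:", best_threshold(scores, labels, youden_at))
```

On well-separated toy data the two criteria agree; on heavily imbalanced data they can select different thresholds, which is consistent with the different accuracy and F1 values reported under the two column groups.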

4.4. Computational-Complexity Comparison

We further tested the computational complexity of our self-attention module when different parameters are used. Specifically, we measured the running time of the self-attention module shown in Figure 3 for different input sizes by changing only the parameter k. As shown in Figure 9, linear complexity is generally maintained across input sizes, although the running time itself changes. This is mainly because, in branch I of the self-attention model, a k-dimensional sequence is obtained using a linear projection, which is computed relative to the latent node; in addition, the batch matrix multiplication of multi-head self-attention is not needed. Therefore, the complexity is reduced to $O(n)$. Modifying the output dimension only increases the cost of the computation, not the computational complexity. It follows that the time complexity of our self-attention module remains linear as the input size grows, because the running time depends only on k. This demonstrates that our proposed self-attention mechanism can improve the efficiency of data fusion without significantly increasing the time complexity of the network model. Therefore, compared to other data-fusion models, our model has superior overall performance.
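To make the linear-complexity argument concrete, here is a minimal pure-Python sketch in the style of separable self-attention [15]. It is not the implementation used in the experiments: the weight names `w_i`, `w_k`, `w_v`, `w_o`, the helper functions, and the random inputs are all assumptions made for illustration. Each step visits every one of the k tokens once, and no k-by-k attention matrix is formed.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x d) list-of-lists by a (d x m) list-of-lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(values):
    """Numerically stable softmax over a flat list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """x holds k tokens of dimension d; cost grows linearly with k."""
    # Branch I: project each token to one scalar (out_features = 1) and
    # normalize over the k tokens to obtain context scores.
    scores = softmax([row[0] for row in matmul(x, w_i)])
    # Branch K: the score-weighted sum of projected tokens gives a single
    # d-dimensional context vector.
    keys = matmul(x, w_k)
    d = len(keys[0])
    context = [sum(s * key[j] for s, key in zip(scores, keys))
               for j in range(d)]
    # Branch V: ReLU-projected tokens are scaled element-wise by the
    # context vector, then passed through an output projection.
    values = [[max(v, 0.0) for v in row] for row in matmul(x, w_v)]
    mixed = [[v * c for v, c in zip(row, context)] for row in values]
    return matmul(mixed, w_o), scores


def random_matrix(rows, cols, rng):
    """Hypothetical helper: uniform random weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
k, d = 6, 4  # k tokens, each of dimension d
x = random_matrix(k, d, rng)
w_i = random_matrix(d, 1, rng)
w_k = random_matrix(d, d, rng)
w_v = random_matrix(d, d, rng)
w_o = random_matrix(d, d, rng)
out, scores = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(len(out), len(out[0]), round(sum(scores), 6))
```

Doubling k doubles the number of rows processed in each branch, whereas standard dot-product attention would build a k-by-k score matrix and therefore grow quadratically with k.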

5. Conclusions

In this paper, we propose a hybrid deep data-fusion framework that combines LSTM and convolutional neural networks with self-attention. The proposed approach first processes the data using temporal-alignment techniques and mines spatial-temporal features by building a symmetric hybrid CNN-LSTM network model. A self-attention module then mines the internal relationships between features and assigns different weights to them and, finally, convolution with a separate LSTM is performed to further mine the deep temporal information of the spatial-temporal features. The proposed model is validated on a load data set of an IEEE 14-bus system, and the experimental results show that the proposed fusion model achieves better detection performance than the original multi-source heterogeneous sensing data. The proposed method addresses the problem of efficiently fusing multi-sensor heterogeneous data in smart grids, and also provides a solid data basis for attack detection in smart grids.
While our proposed method showed superior performance in the FDIA detection tests, we should note that our data-fusion scheme mainly uses the load data set from the IEEE 14-bus system to simulate multi-source sensing data fusion on the electrical side. For virtual power plant (VPP) scenarios in current new power systems, multi-source data fusion that considers both the public and private sides may be more practical. In addition, although our method achieves good performance on continuous data, a power cyber physical system contains not only continuous data but also discrete data, which have a significant impact on the detection of FDIAs. Considering the above problems, we plan to further improve our work in two ways. First, we will try to introduce a multi-modal attention mechanism to address multi-source heterogeneous data fusion on the public and private sides of new power systems, which remains an open challenge. Second, we will investigate making the deep hybrid network model more lightweight.

Author Contributions

Conceptualization, Y.T.; Methodology, Y.W.; Software, N.G.; Validation, Q.W. and N.G.; Resources, Y.T.; Writing—original draft, Y.W.; Writing—review & editing, F.L. and X.S.; Supervision, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Scientific and Technological Project of the State Grid Shanghai Municipal Electric Power Company (Grant No. B30940220003).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, X.; Zeng, X.; Yao, L.; Rashed, G.I.; Deng, C. Power System State Estimation Based on Fusion of WAMS/SCADA Measurements: A Survey. In Proceedings of the 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 20–22 October 2018; pp. 1–6.
  2. Jiang, C.; Huang, C.; Huang, Q.; Shi, J. A Multi-Source Big Data Security System of Power Monitoring Network Based on Adaptive Combined Public Key Algorithm. Symmetry 2021, 13, 1398.
  3. Ucar, F. A Comprehensive Analysis of Smart Grid Stability Prediction along with Explainable Artificial Intelligence. Symmetry 2023, 15, 289.
  4. Fahmy, F.B.N.; Soliman, M.H.; Talaat, H.E.A.; Attia, M.A. Modern Active Voltage Control in Distribution Networks, including Distributed Generation, Using the Hardware-in-the-Loop Technique. Symmetry 2022, 15, 90.
  5. Jhala, K.; Natarajan, B.; Pahwa, A.; Wu, H. Stability of Transactive Energy Market-Based Power Distribution System under Data Integrity Attack. IEEE Trans. Ind. Inform. 2019, 15, 5541–5550.
  6. Bahrami, M.; Fotuhi-Firuzabad, M.; Farzin, H. Reliability Evaluation of Power Grids Considering Integrity Attacks against Substation Protective IEDs. IEEE Trans. Ind. Inform. 2019, 16, 1035–1044.
  7. Yu, S.; Fang, F.; Liu, Y.; Liu, J. Uncertainties of Virtual Power Plant: Problems and Countermeasures. Appl. Energy 2019, 239, 454–470.
  8. Sergei, K.; Aleksei, M.; Mariia, K. Novel Approach to Collect and Process Power Quality Data in Medium-Voltage Distribution Grids. Symmetry 2021, 13, 460.
  9. Jiao, Z.; Wu, R.; Wang, Z. A Novel Method to Improve the Fault Location Accuracy in Transmission Line Based on Data Fusion Technology. Proc. CSEE 2017, 37, 2571–2578.
  10. Shi, Z.; Yao, W.; Li, Z.; Zeng, L.; Zhao, Y.; Zhang, R.; Tang, Y.; Wen, J. Artificial Intelligence Techniques for Stability Analysis and Control in Smart Grids: Methodologies, Applications, Challenges and Future Directions. Energies 2020, 278, 115733.
  11. Kumar, R.S.; Saravanan, S.; Pandiyan, P.; Tiwari, R. Impact of Artificial Intelligence Techniques in Distributed Smart Grid Monitoring System. In Smart Energy and Electric Power Systems; Elsevier: Amsterdam, The Netherlands, 2023; pp. 79–103.
  12. Lin, D.; Fang, L.; Wan, X.; Wu, Q.; Liu, H. Status Monitoring and Fault Handling Method Based on Big Data Analysis of Intelligent Distribution Network. Digit. Technol. Appl. 2018, 7, 100–101.
  13. Qin, S.-T.; Huang, C.; Tian, J.-Y.; Yang, Y.; Wei, H. Research and Application of Multi-Source Big Data Fusion Method in Power Grid. IEEE Electr. Device 2021, 2, 480–485.
  14. Wahid, A.; Breslin, J.G.; Intizar, M.A. Prediction of Machine Failure in Industry 4.0: A Hybrid CNN-LSTM Framework. Appl. Sci. 2022, 12, 4221.
  15. Mehta, S.; Rastegari, M. Separable Self-Attention for Mobile Vision Transformers. arXiv 2022, arXiv:2206.02680.
  16. Lu, M.; Wang, L.; Cao, Z.; Zhao, Y.; Sui, X. False Data Injection Attacks Detection on Power Systems with Convolutional Neural Network. JPCS 2020, 1633, 012134.
  17. Alazab, M.; Khan, S.; Krishnan, S.S.R.; Pham, Q.-V.; Reddy, M.P.K.; Gadekallu, T.R. A Multidirectional LSTM Model for Predicting the Stability of a Smart Grid. IEEE Access 2020, 8, 85454–85463.
  18. Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M. Real Time Security Assessment of the Power System Using a Hybrid Support Vector Machine and Multilayer Perceptron Neural Network Algorithms. Sustainability 2019, 11, 3586.
  19. Cheng, J.; Cai, C.; Tang, X.; Sheng, V.S.; Guo, W.; Li, M. A DDoS Attack Information Fusion Method Based on CNN for Multi-Element Data. Comput. Mater. Contin. 2020, 63, 131–150.
  20. Saleh, K.; Hossny, M.; Nahavandi, S. Driving Behavior Classification Based on Sensor Data Fusion Using LSTM Recurrent Neural Networks. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017.
  21. Han, X.; Zhang, C.; Tang, Y.; Ye, Y. Physical-Data Fusion Modeling Method for Energy Consumption Analysis of Smart Building. J. Mod. Power Syst. Clean Energy 2022, 10, 482–491.
  22. Chen, F.-C.; Jahanshahi, M.R. NB-CNN: Deep Learning-Based Crack Detection Using Convolutional Neural Network and Naive Bayes Data Fusion. IEEE Trans. Ind. Electron. 2017, 65, 4392–4400.
  23. Shao, X.; Pu, C.; Zhang, Y.; Kim, C.S. Domain Fusion CNN-LSTM for Short-Term Power Consumption Forecasting. IEEE Access 2020, 8, 188352–188362.
  24. Hu, Y.L.; Sun, Y.F.; Yin, B.C. Information Sensing and Interaction Technology in Internet of Things. Chin. J. Comput. 2012, 35, 1147.
  25. Yuan, Y.; Kam, M. Distributed Decision Fusion with a Random-Access Channel for Sensor Network Applications. IEEE Trans. Instrum. Meas. 2004, 53, 1339–1344.
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. NIPS 2017, 30, 5998–6008.
  27. Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating Long Sequences with Sparse Transformers. arXiv 2019, arXiv:1904.10509.
  28. Qiu, J.; Ma, H.; Levy, O.; Yih, S.W.-T.; Wang, S.; Tang, J.; Yih, W.-T. Blockwise Self-Attention for Long Document Understanding. arXiv 2020, arXiv:1911.02972.
  29. Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-Attention with Linear Complexity. arXiv 2020, arXiv:2006.04768.
  30. Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451.
  31. Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking Attention with Performers. arXiv 2021, arXiv:2010.11929.
  32. Lu, J.; Yao, J.; Zhang, J.; Zhu, X.; Xu, H.; Gao, W.; Xu, C.; Xiang, T.; Zhang, L. Soft: Softmax-Free Transformer with Linear Complexity. Adv. Neural Inf. Process. Syst. 2021, 34, 21297–21309.
  33. Dewangan, D.K.; Sahu, S.P. Lane detection in intelligent vehicle system using optimal 2-tier deep convolutional neural network. Multimed. Tools Appl. 2023, 82, 7293–7317.
  34. Dewangan, D.K.; Sahu, S.P. Optimized convolutional neural network for road detection with structured contour and spatial information for intelligent vehicle system. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2252002.
  35. Wixted, J.T.; Mickes, L.; Wetmore, S.A.; Gronlund, S.D.; Neuschatz, J.S. ROC Analysis in Theory and Practice. Behav. Sci. 2017, 6, 343–351.
  36. Hoo, Z.H.; Candlish, J.; Teare, D. What is an ROC Curve? Emerg. Med. J. 2017, 34, 357–359.
Figure 1. Proposed multi-source self-attention data-fusion framework containing data pre-processing module, deep-feature-extraction module, and data-fusion module.
Figure 2. Symmetric hybrid deep-network model.
Figure 3. Self-attention mechanism and the module design.
Figure 4. Arithmetic and structure details of proposed symmetric self-attention model.
Figure 5. IEEE 14 test system node.
Figure 6. The loss change during model training when two sensors and three sensors were used for data fusion, respectively. (a) Training model for two sensors. (b) Training model for three sensors.
Figure 7. Attack-detection performance for four basic models before and after using self-attention module in IEEE14 and IEEE118. (a) IEEE14 system data. (b) IEEE118 system data.
Figure 8. F1 score and accuracy comparisons when different parameters are used. In this test, we set out_features = 1, 2, 3, 4, respectively. (a) The F1 score (best F1) change with different parameters. (b) The accuracy (best F1) change with different parameters. (c) The F1 score (ROC) change with different parameters. (d) The accuracy (ROC) change with different parameters.
Figure 9. Comparison of running times with different parameters.
Table 1. Existing data-fusion models: characteristics and challenges.
AuthorModelCharacteristicsChallenges
Lu et al. [16]CNN-GRUHigh accuracy, real-time,
Fast convergence speed
Require high time
Alazab et al. [17]MLSTMPrecision, recall,
F 1 -score for stable and unstable class
Running and deploying,
Time-consuming
Alimi et al. [18]SVM-MPL-NNHigh accuracyCannot identify location,
eliminate intrusion
Wahid et al. [14]CNN-LSTMCapture abstract features
Reliable prediction accuracy
No integration
of feature information
Cheng et al. [19]CNN-SVMHigh fusion efficiency
Low memory consumption
Low running time
High detection rate
Improved F 1 score
Chen et al. [22]NB-CNNIntegrate multiple featuresGPU dependency
Slow computing speed
Shao et al. [23]CNN-LSTMBetter short-term predictionInferior performance
on large samples
Table 2. Performance comparison of different attack-detection models when two sensors are used in IEEE14 and IEEE118 node systems.
| Detection Model | IEEE14, Best F1: Accuracy | IEEE14, Best F1: F1 | IEEE14, ROC: Accuracy | IEEE14, ROC: F1 | IEEE118, Best F1: Accuracy | IEEE118, Best F1: F1 | IEEE118, ROC: Accuracy | IEEE118, ROC: F1 |
|---|---|---|---|---|---|---|---|---|
| CNN [16] | 0.9888 | 0.6780 | 0.9156 | 0.3014 | 0.9893 | 0.7059 | 0.9267 | 0.3326 |
| CNN+Self-Attention | 0.9945 | 0.8577 | 0.9507 | 0.4338 | 0.9945 | 0.8576 | 0.9507 | 0.4300 |
| LSTM [17] | 0.9939 | 0.8493 | 0.9757 | 0.6053 | 0.9937 | 0.8457 | 0.9391 | 0.3798 |
| LSTM+Self-Attention | 0.9947 | 0.8702 | 0.9791 | 0.6413 | 0.9946 | 0.8709 | 0.9849 | 0.7126 |
| MLP [18] | 0.9941 | 0.8519 | 0.9776 | 0.6254 | 0.9971 | 0.8672 | 0.9752 | 0.6002 |
| CNN-LSTM [14] | 0.9934 | 0.8446 | 0.9758 | 0.6067 | 0.9943 | 0.8536 | 0.9721 | 0.5723 |
| CNN-LSTM+Self-Attention | 0.9947 | 0.8697 | 0.9786 | 0.6364 | 0.9954 | 0.8671 | 0.9792 | 0.6429 |
| Proposed model | 0.9954 | 0.8885 | 0.9929 | 0.8386 | 0.9957 | 0.8736 | 0.9928 | 0.8377 |
Note: The significance of bold emphasis is to show that the value is the maximum in current column.
Table 3. Performance comparison of different attack-detection models when three sensors are used in IEEE14 and IEEE118 node systems.
| Detection Model | IEEE14, Best F1: Accuracy | IEEE14, Best F1: F1 | IEEE14, ROC: Accuracy | IEEE14, ROC: F1 | IEEE118, Best F1: Accuracy | IEEE118, Best F1: F1 | IEEE118, ROC: Accuracy | IEEE118, ROC: F1 |
|---|---|---|---|---|---|---|---|---|
| CNN-LSTM [14] | 0.9936 | 0.8469 | 0.9722 | 0.5730 | 0.9933 | 0.8415 | 0.9721 | 0.5769 |
| Proposed model | 0.9946 | 0.8712 | 0.9908 | 0.8013 | 0.9937 | 0.8536 | 0.9884 | 0.7377 |
Note: The significance of bold emphasis is to show that the value is the maximum in current column.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, Y.; Wang, Q.; Guo, N.; Tian, Y.; Li, F.; Su, X. Efficient Multi-Source Self-Attention Data Fusion for FDIA Detection in Smart Grid. Symmetry 2023, 15, 1019. https://doi.org/10.3390/sym15051019
