A Hierarchical Spatio-Temporal Graph Attention Network for False Data Injection Attack Detection in Smart Grids
Abstract
1. Introduction
- (1) A hierarchical architecture: We decouple the learning process into distinct spatial and temporal stages, moving beyond joint spatio-temporal models. This design provides superior interpretability and aligns more closely with the physical propagation characteristics of cyber intrusions in smart grids.
- (2) Advanced spatial encoding: We employ a GAT [18] to explicitly capture the complex structural dependencies and inherent physical constraints among buses and transmission lines, ensuring a faithful representation of the grid topology.
- (3) Coordinated temporal analysis: We integrate temporal self-attention mechanisms [19] to identify subtle and evolving attack patterns across successive time steps, enabling the detection of sophisticated, coordinated attacks that manifest only over time.
- (4) Interpretable and robust detection: The proposed model is designed not only for high accuracy but also for robustness under class imbalance, with an architecture amenable to explainability analysis, enhancing trustworthiness for real-world deployment.
2. Proposed Methodology
2.1. Graph Representation of the Grid
2.2. Hierarchical Spatio-Temporal Graph Attention Network
- (1) Spatial feature extraction layer: Uses GATv2Conv to compute spatial attention over the graph structure at each time step.
- (2) Temporal feature fusion layer: Employs a linear projection and a temporal attention mechanism to capture temporal dependencies.
- (3) Sequence modeling layer: Uses a gated recurrent unit (GRU) to further learn temporal dynamics.
- (4) Classification output layer: Classifies each sample as attacked or normal based on the final hidden state.
2.3. Spatial Attention Layer with GATv2Conv
- (1) Attention coefficient calculation, where $\mathbf{W}$ is a learnable weight matrix, $\mathbf{a}$ is the attention vector, and ∥ denotes concatenation.
- (2) Normalized attention weights, where $\mathcal{N}_i$ denotes the neighborhood of node $i$.
- (3) Node feature aggregation.
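
The three steps above can be sketched in plain Python. This is a minimal single-head illustration that assumes the standard GATv2 scoring $e_{ij} = \mathbf{a}^{\top}\mathrm{LeakyReLU}(\mathbf{W}[\mathbf{h}_i \,\Vert\, \mathbf{h}_j])$, softmax normalization over $\mathcal{N}_i$, and an ELU after aggregation (the activation listed for the spatial module). Splitting $\mathbf{W}$ into `W_dst` and `W_src`, the function name `gatv2_update`, and the toy 3-bus graph are assumptions made for illustration, not the paper's implementation.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def gatv2_update(h, neighbors, W_dst, W_src, a):
    """Single-head GATv2-style update for every node.

    h           : list of node feature vectors
    neighbors   : dict mapping node i to its neighborhood N_i (self-loop included)
    W_dst/W_src : the two halves of W, so that W [h_i || h_j] = W_dst h_i + W_src h_j
    a           : attention vector
    Returns (new_features, attention_weights).
    """
    new_h, alpha = [], {}
    for i in range(len(h)):
        left = matvec(W_dst, h[i])
        msgs = {j: matvec(W_src, h[j]) for j in neighbors[i]}
        # (1) attention coefficient: e_ij = a^T LeakyReLU(W [h_i || h_j])
        e = {j: sum(ak * leaky_relu(l + m) for ak, l, m in zip(a, left, msgs[j]))
             for j in neighbors[i]}
        # (2) softmax over the neighborhood N_i (shifted by the max for stability)
        top = max(e.values())
        exp_e = {j: math.exp(v - top) for j, v in e.items()}
        z = sum(exp_e.values())
        alpha[i] = {j: v / z for j, v in exp_e.items()}
        # (3) attention-weighted aggregation of neighbor messages, then ELU
        agg = [sum(alpha[i][j] * msgs[j][k] for j in neighbors[i])
               for k in range(len(a))]
        new_h.append([elu(x) for x in agg])
    return new_h, alpha

# Toy 3-bus chain 0 - 1 - 2 with two features per bus (illustrative numbers only)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W_dst = [[0.5, -0.2], [0.1, 0.3]]
W_src = [[0.4, 0.1], [-0.3, 0.2]]
a = [1.0, -1.0]
new_h, alpha = gatv2_update(h, neighbors, W_dst, W_src, a)
```

Each `alpha[i]` is a distribution over the neighbors of bus `i`, which is the quantity an explainability analysis would inspect to see which adjacent buses drive a detection.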
2.4. Temporal Attention Mechanism
- (1) Feature reshaping and projection: First, the spatial features are reshaped and then projected to a lower dimension $D$ via a linear layer with a learnable projection matrix and bias vector.
- (2) Multi-head temporal attention calculation: Query ($Q$), Key ($K$), and Value ($V$) are all derived from the projected features. The model uses two heads for temporal attention to better capture diverse temporal patterns.
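
As a rough illustration of the temporal attention step, the sketch below applies scaled dot-product self-attention [19], $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$, across the time steps of one window. It is deliberately simplified: a single head is used and $Q = K = V$ are taken to be the projected features themselves, whereas a real multi-head layer would first apply learned per-head projections. The function name `temporal_self_attention` and the toy inputs are hypothetical.

```python
import math

def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [x / total for x in exps]

def temporal_self_attention(Z):
    """Single-head scaled dot-product self-attention over time.

    Z is a list of T projected feature vectors, one per time step.
    Returns (context, weights), where weights[t][s] is how strongly
    step t attends to step s.
    """
    d = len(Z[0])
    scale = math.sqrt(d)
    weights = []
    for q in Z:
        # similarity of the query step with every step in the window
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in Z]
        weights.append(softmax(scores))
    # each output step is an attention-weighted mix of all steps
    context = [[sum(w[s] * Z[s][k] for s in range(len(Z))) for k in range(d)]
               for w in weights]
    return context, weights

# Toy window of T = 4 steps; the last step deviates from the others
Z = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 2.0]]
context, weights = temporal_self_attention(Z)
```

The rows of `weights` form a T × T map over the window; in this toy input the deviating last step attends mostly to itself.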
2.5. GRU-Based Temporal Sequence Processing
- (1) Update gate and reset gate.
- (2) Candidate hidden state.
- (3) Final hidden state update.
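
For reference, the three steps above correspond to the standard GRU recurrence. The form below follows the common textbook convention and is an assumption about the paper's notation: $x_t$ is the input at step $t$ (here, the temporally attended feature), $h_{t-1}$ is the previous hidden state, $\odot$ is the element-wise product, and the $W$, $U$, and $b$ symbols are learnable parameters named for illustration.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Some libraries, PyTorch among them, swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabeling the gate.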
2.6. Output Layer and Loss Function
- (1) Classification output layer, where $\sigma$ is the sigmoid activation function and $\hat{y}$ denotes the predicted probability of an attack.
- (2) Loss function: The binary cross-entropy loss is used.
- (3) Regularization strategies: Dropout with a rate of 0.2 is applied to prevent overfitting, weight decay (L2 regularization) is used, and batch normalization is applied after the spatial attention layer to accelerate training convergence.
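
The output and loss steps can be illustrated with a short sketch, assuming the usual mean binary cross-entropy $\mathcal{L} = -\frac{1}{N}\sum_{n}\left[y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n)\right]$ with $\hat{y} = \sigma(\cdot)$. The logits and labels below are made-up values, and the clipping constant `eps` is an implementation convenience rather than something stated in the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

logits = [2.0, -1.5, 0.3, -3.0]   # hypothetical classifier outputs
labels = [1, 0, 1, 0]             # 1 = attack, 0 = normal
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
```

Confident correct predictions (the first and last samples) contribute little to `loss`, while the borderline third sample dominates it.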
Algorithm 1: HST-GAT Forward Propagation
3. Experiment Setup
3.1. Dataset Construction
3.2. Baseline Methods
3.3. Evaluation Metrics
3.4. Implementation Details
4. Results and Discussion
4.1. Performance Analysis and Comparative Evaluation
4.1.1. Overall Performance Comparison
4.1.2. Analysis of Precision–Recall Trade-Offs
4.1.3. Comparative Performance Analysis
4.2. Explainability Analysis
4.2.1. Results Analysis
4.2.2. Case Studies
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
1. Qu, B.; Wang, Z.; Shen, B.; Dong, H.; Zhang, X. Secure particle filtering with Paillier encryption–decryption scheme: Application to multi-machine power grids. IEEE Trans. Smart Grid 2023, 15, 863–873.
2. Sridhar, S.; Hahn, A.; Govindarasu, M. Cyber–physical system security for the electric power grid. Proc. IEEE 2011, 100, 210–224.
3. Zhang, M.; Shen, C.; He, N.; Han, S.; Li, Q.; Wang, Q.; Guan, X. False data injection attacks against smart gird state estimation: Construction, detection and defense. Sci. China Technol. Sci. 2019, 62, 2077–2087.
4. Liu, Y.; Ning, P.; Reiter, M.K. False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. (TISSEC) 2011, 14, 1–33.
5. Alsharif, G.O.; Anagnostopoulos, C.; Marnerides, A.K. Energy Market Manipulation via False-Data Injection Attacks: A Review. IEEE Access 2025, 13, 42559–42573.
6. Li, J.; Sun, C.; Su, Q. Analysis of cascading failures of power cyber-physical systems considering false data injection attacks. Glob. Energy Interconnect. 2021, 4, 204–213.
7. Ozay, M.; Esnaola, I.; Vural, F.T.Y.; Kulkarni, S.R.; Poor, H.V. Machine learning methods for attack detection in the smart grid. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 1773–1786.
8. He, H.; Yan, J. Cyber-physical attacks and defences in the smart grid: A survey. IET Cyber-Phys. Syst. Theory Appl. 2016, 1, 13–27.
9. Miao, K.; Zhang, M.; Guo, F.; Lu, R.; Guan, X. Detection of False Data Injection Attacks in Smart Grids: An Optimal Transport-Based Reliable Self-Training Approach. IEEE Trans. Inf. Forensics Secur. 2025, 20, 709–723.
10. Miao, K.; Zhang, M.; Chen, K.; Li, Y.; Zhan, X.; Guan, X. Learning to Match Prototype for Few-Shot Classification of Attacks and Faults in Smart Grids. IEEE Trans. Cybern. 2025.
11. Graves, A.; Mohamed, A.r.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
12. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long short term memory networks for anomaly detection in time series. In Proceedings of the 2015 European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 22–24 April 2015; pp. 89–94.
13. Hasan, M.N.; Toma, R.N.; Nahid, A.A.; Islam, M.M.; Kim, J.M. Electricity theft detection in smart grid systems: A CNN-LSTM based approach. Energies 2019, 12, 3310.
14. Wu, Y.; Dai, H.N.; Tang, H. Graph neural networks for anomaly detection in industrial Internet of Things. IEEE Internet Things J. 2021, 9, 9214–9231.
15. Xia, W.; He, D.; Yu, L. Locational detection of false data injection attacks in smart grids: A graph convolutional attention network approach. IEEE Internet Things J. 2023, 11, 9324–9337.
16. Takiddin, A.; Ismail, M.; Atat, R.; Davis, K.R.; Serpedin, E. Robust graph autoencoder-based detection of false data injection attacks against data poisoning in smart grids. IEEE Trans. Artif. Intell. 2023, 5, 1287–1301.
17. Miao, K.; Zhang, M.; Fan, B.; Guan, X. Domain Adaptive Representation Learning for Attack Detection in Smart Grids. IEEE Trans. Smart Grid 2025.
18. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
19. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
20. Li, X.; Wang, Y.; Lu, Z. Graph-based detection for false data injection attacks in power grid. Energy 2023, 263, 125865.
21. Li, H.; Dou, C.; Yue, D.; Hancke, G.P.; Zeng, Z.; Guo, W.; Xu, L. End-edge-cloud collaboration-based false data injection attack detection in distribution networks. IEEE Trans. Ind. Inform. 2023, 20, 1786–1797.
22. Su, X.; Deng, C.; Yang, J.; Li, F.; Li, C.; Fu, Y.; Dong, Z.Y. DAMGAT-based interpretable detection of false data injection attacks in smart grids. IEEE Trans. Smart Grid 2024, 15, 4182–4195.
23. Chen, C.; Li, Q.; Chen, L.; Liang, Y.; Huang, H. An improved GraphSAGE to detect power system anomaly based on time-neighbor feature. Energy Rep. 2023, 9, 930–937.
24. Vincent, E.; Korki, M.; Seyedmahmoudian, M.; Stojcevski, A.; Mekhilef, S. Reinforcement learning-empowered graph convolutional network framework for data integrity attack detection in cyber-physical systems. CSEE J. Power Energy Syst. 2024, 10, 797–806.
25. Ji, J.; Liu, Y.; Chen, J.; Yao, Z.; Zhang, M.; Gong, Y. False Data Injection Attack Detection Method Based on Deep Learning with Multi-Scale Feature Fusion. IEEE Access 2024, 12, 89262–89274.
26. Li, Y.; Wei, X.; Li, Y.; Dong, Z.; Shahidehpour, M. Detection of false data injection attacks in smart grid: A secure federated deep learning approach. IEEE Trans. Smart Grid 2022, 13, 4862–4872.
27. Lu, K.D.; Zhou, L.; Wu, Z.G. Representation-learning-based CNN for intelligent attack localization and recovery of cyber-physical power systems. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 6145–6155.





| Method | Architecture Type | Spatial | Temporal | Key Feature |
|---|---|---|---|---|
| TSGCN | Joint spatio-temporal | GCN | LSTM | Integrated feature learning |
| DAMGAT | Attention-based | GAT | Integrated attention | Dynamic attention |
| GGNN-GAT | Graph-based | GGNN+GAT | Basic recurrent layers | Topology-aware |
| CNN-LSTM | Sequential hybrid | CNN | LSTM | Spatial convolution |
| HST-GAT | Hierarchical decoupled | GATv2Conv | Self-attention+GRU | Explicit separation of space/time |

| Hyperparameter | Value | Description |
|---|---|---|
| Time Window Size (T) | 10 | Length of historical measurement sequences |
| Batch Size | 256 | Number of training samples per iteration |
| Learning Rate | | Initial step size for the Adam optimizer |
| Hidden Dimensions (D) | 64 | Dimension of the hidden state in GAT and GRU |
| GAT Heads | 4 | Number of attention heads in spatial module |
| Temporal Heads | 4 | Number of heads in multi-head self-attention |
| Dropout Rate | 0.2 | Probability for dropout layers to prevent overfitting |
| Training Epochs | 100 | Maximum number of passes over the dataset |
| Optimizer | Adam | Algorithm used for weight updates |
| Weight Decay | | L2 regularization coefficient to prevent overfitting |
| Activation Function | ELU | Non-linear activation for the spatial module |

| Bus Systems | Models | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| IEEE 14-bus system | HST-GAT | 98.70% | 100% | 96.02% | 97.97% |
| | GGNN-GAT | 92.99% | 92.84% | 93.05% | 92.94% |
| | TSGCN | 93.04% | 93.25% | 92.67% | 92.96% |
| | DAMGAT | 92.64% | 92.14% | 93.12% | 92.62% |
| | SAGE | 92.87% | 92.90% | 92.70% | 92.80% |
| | GCN | 92.74% | 92.71% | 92.64% | 92.68% |
| | CNN-GRU | 90.02% | 89.91% | 89.97% | 89.94% |
| | Transformer | 91.64% | 92.22% | 90.79% | 91.50% |
| | CNN | 85.61% | 84.97% | 86.25% | 85.60% |
| IEEE 118-bus system | HST-GAT | 92.38% | 100% | 75.56% | 86.08% |
| | GGNN-GAT | 78.19% | 78.08% | 78.07% | 78.03% |
| | TSGCN | 72.08% | 73.01% | 69.70% | 71.25% |
| | DAMGAT | 79.54% | 80.41% | 77.68% | 79.00% |
| | SAGE | 77.46% | 78.17% | 75.71% | 76.90% |
| | GCN | 72.34% | 74.19% | 68.21% | 70.96% |
| | CNN-GRU | 57.11% | 57.45% | 56.99% | 56.89% |
| | Transformer | 66.56% | 66.44% | 66.10% | 66.24% |
| | CNN | 56.78% | 46.90% | 45.64% | 45.97% |
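
As a quick consistency check on the table, the F1-score is the harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$. The short sketch below recomputes it for the two HST-GAT rows from the listed precision and recall; the helper name `f1_score` is introduced here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall of HST-GAT as listed in the table
f1_14 = round(100 * f1_score(1.0, 0.9602), 2)    # IEEE 14-bus
f1_118 = round(100 * f1_score(1.0, 0.7556), 2)   # IEEE 118-bus
print(f1_14, f1_118)  # prints 97.97 86.08
```

Both values match the reported F1-scores. With precision at 100% the model raises no false alarms on either system, so the drop in F1 on the 118-bus system comes entirely from missed attacks (recall of 75.56%).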
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, H.; Cheng, J.; Bai, X.; Wang, D.; Gao, R.; Fan, B. A Hierarchical Spatio-Temporal Graph Attention Network for False Data Injection Attack Detection in Smart Grids. Processes 2026, 14, 507. https://doi.org/10.3390/pr14030507


