TGCformer: A Transformer-Based Dual-Channel Fusion Framework for Power Load Anomaly Detection
Abstract
1. Introduction
- (1) Multi-scale Temporal Feature Extraction: We employ the Time Series Feature extraction based on Scalable Hypothesis tests (TSFresh) to mine statistical, time-domain, and frequency-domain features from load sequences, constructing a rich multi-scale temporal representation.
- (2) Graph-level Embedding Feature Extraction: We utilize a Sparse Unified GATv2 module to model value-aware dependencies among historical states. By enhancing the input embedding with multi-scale contexts and constructing a sparse KNN graph, the method achieves unified modeling of local variations and global recurrences, thereby improving the expressiveness and robustness of graph-level features.
- (3) Cross-attention-based Dynamic Feature Fusion Mechanism: We design a cross-attention interaction module to enable deep coupling and dynamic weighting between multi-scale temporal features and graph-level features. This allows the model to adaptively focus on critical feature channels while suppressing redundancy from irrelevant information, significantly enhancing the accuracy and stability of anomaly detection.
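As a concrete illustration of contribution (1), the sketch below computes a few TSFresh-style descriptors with plain NumPy. The real TSFresh pipeline extracts hundreds of features and filters them by hypothesis testing; the function and feature names here are illustrative, not TSFresh's actual API.

```python
import numpy as np

def tsfresh_style_features(x: np.ndarray) -> dict:
    """Compute a few representative load-sequence descriptors.

    These mirror TSFresh's abs_energy, mean_change, absolute_sum_of_changes
    and fft_coefficient calculators; the real pipeline extracts hundreds of
    features and keeps only the statistically significant ones.
    """
    diffs = np.diff(x)
    spectrum = np.fft.rfft(x)
    return {
        "absolute_energy": float(np.sum(x ** 2)),                  # overall intensity
        "mean_change": float(np.mean(diffs)),                      # overall trend
        "absolute_sum_of_changes": float(np.sum(np.abs(diffs))),   # total volatility
        "dominant_fft_magnitude": float(np.max(np.abs(spectrum[1:]))),  # periodicity
    }

# Illustrative half-hourly load profile (48 samples/day) with daily periodicity
rng = np.random.default_rng(0)
t = np.arange(48 * 7)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 48) + 0.05 * rng.normal(size=t.size)
feats = tsfresh_style_features(load)
```

In the full framework these raw descriptors would then pass through significance testing before fusion.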
2. TGCformer
2.1. Overall Framework
- (1) Temporal Feature Extraction Branch: As shown in Module 1 of Figure 1, the raw power load time series is first processed by the TSFresh module to extract a comprehensive set of multi-scale temporal features, including time-domain statistics, frequency-domain descriptors, and non-linear features. These features explicitly encode the volatility, periodicity, and distributional properties of the load sequences. To reduce redundancy and enhance robustness, statistical significance testing and feature selection are applied (as indicated by the filtering module in the figure). This branch outputs a refined temporal feature vector containing 783 significant features, serving as the temporal semantic representation for subsequent fusion.
- (2) Graph-level Embedding Extraction: In parallel with temporal modeling, Module 2 of Figure 1 depicts the global dependency modeling process. The time series is transformed into a latent value graph by constructing a sparse adjacency matrix with the K-Nearest Neighbors (KNN) algorithm, where each node represents a time point and edges encode semantic proximity based on signal magnitude (rather than temporal distance). The resulting sparse graph is then fed into stacked Sparse Unified GATv2 layers. Incorporating a unified feature embedding (comprising multi-scale statistics and positional encodings), these layers achieve adaptive attention-based aggregation of recurrent pattern information. A mean pooling operation is subsequently applied to aggregate node-level embeddings into a global graph-level representation of dimension 64, preserving both value-based topological relations and global dependency patterns.
- (3) Dual-Channel Fusion based on Cross-Attention: Module 3 of Figure 1 corresponds to the proposed multi-head cross-attention mechanism. In this module, the refined temporal features from the upper branch are projected via a linear layer into a unified 512-dimensional embedding space as Query vectors, while the graph-level embeddings from the lower branch are mapped to the same dimension to serve as Key and Value vectors. Through cross-attention, the model explicitly aligns local temporal patterns with global structure-related graph information, allowing temporal semantics to dynamically guide the weighting of graph-structural dependencies. The multi-head design, as illustrated by the parallel attention heads in the figure, enables the learning of complementary heterogeneous feature interactions across different subspaces. Additionally, residual connections and normalization layers are employed to ensure stable training and effective deep feature encoding.
- (4) Anomaly Classification Head: As shown in Module 4 of Figure 1, the fused dual-channel joint representation is passed to a Multilayer Perceptron (MLP) classifier. The MLP further refines features through stacked linear layers and non-linear activation functions, transforming high-dimensional Time-Graph dependencies into discriminative semantics. Finally, the output layer employs a Sigmoid activation function to map the latent features to a probability score within the interval [0, 1], functioning as a binary classifier that determines whether a given sample is normal or anomalous, thereby achieving robust power load anomaly detection.
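The fusion step of Module 3 can be sketched in PyTorch as follows. The dimensions (783 temporal features, 64-d graph embedding, 512-d shared space, 8 heads) follow the text and the experimental setup; the layer layout itself is a minimal illustrative approximation, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Sketch of the dual-channel fusion (Module 3): temporal features act
    as Query, graph-level embeddings as Key/Value. Layer names and the
    single-token treatment of each branch are illustrative assumptions."""

    def __init__(self, temporal_dim=783, graph_dim=64, d_model=512, n_heads=8):
        super().__init__()
        self.q_proj = nn.Linear(temporal_dim, d_model)   # temporal branch -> Query
        self.kv_proj = nn.Linear(graph_dim, d_model)     # graph branch -> Key/Value
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, temporal_feats, graph_embed):
        # treat each branch as a length-1 token sequence per sample
        q = self.q_proj(temporal_feats).unsqueeze(1)
        kv = self.kv_proj(graph_embed).unsqueeze(1)
        fused, _ = self.attn(q, kv, kv)
        # residual connection + normalization for stable training
        return self.norm(q + fused).squeeze(1)

fusion = CrossAttentionFusion()
out = fusion(torch.randn(4, 783), torch.randn(4, 64))  # batch of 4 samples
```

The fused 512-d vector would then feed the MLP classification head of Module 4.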
2.2. Dual-Channel Structural Feature Extraction and Encoding
2.2.1. Multi-Scale Time Series Statistical Feature Extraction
2.2.2. Graph-Level Embedding Feature Extraction Based on Sparse Unified GATv2
- 1. Sparse Neighbor Graph Construction based on Value Similarity
- 2. Dynamic Graph Attention Modeling
- 3. Graph-level Feature Aggregation
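A minimal sketch of step 1, the value-similarity KNN graph: neighbors are chosen by closeness in signal magnitude rather than in time, so recurring load levels at distant timestamps become connected. K = 8 matches the experimental setup; the specific distance measure (absolute value difference) is an assumption.

```python
import numpy as np

def value_knn_adjacency(x: np.ndarray, k: int = 8) -> np.ndarray:
    """Build a sparse KNN adjacency over time points using VALUE similarity
    (|x_i - x_j|), not temporal distance. Illustrative sketch only."""
    n = x.shape[0]
    # pairwise absolute value differences between all time points
    dist = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist, np.inf)        # exclude self from the neighbor search
    adj = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        nn_idx = np.argsort(dist[i])[:k]  # the k value-closest time points
        adj[i, nn_idx] = 1.0
    return adj

x = np.sin(np.linspace(0, 4 * np.pi, 32))  # toy periodic signal
A = value_knn_adjacency(x, k=4)
```

Each row of the resulting adjacency matrix has exactly k nonzero entries, giving the sparse graph consumed by the stacked GATv2 layers.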
2.3. Dual-Channel Feature Fusion and Encoding Based on Multi-Head Cross-Attention
3. Experiments and Validation
3.1. Data Description
3.2. Experimental Setup and Evaluation Metrics
3.3. Overall Performance Evaluation of TGCformer
3.4. Ablation Study
3.5. Visualization and Interpretability Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Carr, D.; Thomson, M. Non-technical electricity losses. Energies 2022, 15, 2218.
2. De Souza Savian, F.; Siluk, J.C.; Garlet, T.B.; Nascimento, F.M.D.; Pinheiro, J.R.; Vale, Z. Non-technical losses: A systematic contemporary article review. Renew. Sustain. Energy Rev. 2021, 147, 111205.
3. Lepolesa, L.J.; Achari, S.; Cheng, L. Electricity theft detection in smart grids based on deep neural network. IEEE Access 2022, 10, 39638–39655.
4. Wang, X.; Yao, Z.; Papaefthymiou, M. A real-time electrical load forecasting and unsupervised anomaly detection framework. Appl. Energy 2023, 330, 120279.
5. Wang, X.; Wang, H.; Bhandari, B.; Cheng, L. AI-empowered methods for smart energy consumption: A review of load forecasting, anomaly detection and demand response. Int. J. Precis. Eng. Manuf.-Green Technol. 2024, 11, 963–993.
6. Fahmi, A.T.W.K.; Kashyzadeh, K.R.; Ghorbani, S. Enhanced Autoregressive Integrated Moving Average Model for Anomaly Detection in Power Plant Operations. Int. J. Eng. 2024, 37, 1691–1699.
7. Cheng, M.; Zhang, D.; Yan, W.; He, L.; Zhang, R.; Xu, M. Power system abnormal pattern detection for new energy big data. Int. J. Emerg. Electr. Power Syst. 2023, 24, 91–102.
8. Yang, J.; Fei, K.; Ren, F.; Li, Q.; Li, J.; Duan, Y.; Dong, L. Non-technical loss detection using missing values pattern. In Proceedings of the International Conference on Smart Grid and Clean Energy Technologies, Kuching, Malaysia, 9–11 October 2020; pp. 149–154.
9. Hussain, S.; Mustafa, M.W.; Jumani, T.A.; Baloch, S.K.; Saeed, M.S. A novel unsupervised feature-based approach for electricity theft detection using robust PCA and outlier removal clustering algorithm. Int. Trans. Electr. Energy Syst. 2020, 30, 3359–3372.
10. Guerrero, J.I.; Monedero, I.; Biscarri, F.; Biscarri, J.; Millan, R.; Leon, C. Non-technical losses reduction by improving the inspections accuracy in a power utility. IEEE Trans. Power Syst. 2017, 33, 1209–1218.
11. Xia, Y.; Liang, D.; Zheng, G.; Wang, J.; Zeng, J. Helicopter main reduction planetary gear fault diagnosis method based on SVDD. Int. J. Appl. Electromagn. Mech. 2020, 64, 137–145.
12. Vapnik, V.; Chervonenkis, A.Y. A class of algorithms for pattern recognition learning. Avtomat. Telemekh. 1964, 25, 937–945.
13. Liu, H.; Shi, J.; Fu, R.; Zhang, Y. Anomaly Detection of Residential Electricity Consumption Based on Ensemble Model of PSO-AE-XGBOOST. In International Conference on Neural Computing for Advanced Applications; Springer Nature: Singapore, 2024; pp. 44–58.
14. Touzani, S.; Granderson, J.; Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 2018, 158, 1533–1543.
15. Harshini, C.; Deepthi, G.; Reddy, G.A.; Laxmi, G.V.; Rajasree, G. Electricity theft detection in power grids with deep learning and random forests. Int. J. Manag. Res. Rev. 2023, 13, 1–10.
16. Bian, J.; Wang, L.; Scherer, R.; Wozniak, M.; Zhang, P.; Wei, W. Abnormal detection of electricity consumption of user based on particle swarm optimization and long short term memory with the attention mechanism. IEEE Access 2021, 9, 47252–47265.
17. Irwansyah, A.; Muhammad, E.; Arifin, F.; Iman, B.N.; Hermawan, H. Power consumption predictive analytics and automatic anomaly detection based on CNN-LSTM neural networks. J. Rekayasa Elektr. 2023, 19, 127–134.
18. Duan, J. Deep learning anomaly detection in AI-powered intelligent power distribution systems. Front. Energy Res. 2024, 12, 1364456.
19. Kang, H.; Kang, P. Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism. Knowl.-Based Syst. 2024, 290, 111507.
20. Yi, S.; Zheng, S.; Yang, S.; Zhou, G.; He, J. Robust transformer-based anomaly detection for nuclear power data using maximum correntropy criterion. Nucl. Eng. Technol. 2024, 56, 1284–1295.
21. Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time series feature extraction on basis of scalable hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77.
22. Tam, I.; Kalech, M.; Rokach, L.; Madar, E.; Bortman, J.; Klein, R. Probability-based algorithm for bearing diagnosis with untrained spall sizes. Sensors 2020, 20, 1298.
23. Döhler, S. A discrete modification of the Benjamini-Yekutieli procedure. Econom. Stat. 2018, 5, 137–147.
24. Donner, R.V.; Zou, Y.; Donges, J.F.; Marwan, N.; Kurths, J. Recurrence networks—A novel paradigm for nonlinear time series analysis. New J. Phys. 2010, 12, 129–132.
25. Jin, M.; Zheng, Y.; Li, Y.-F.; Chen, S.; Yang, B.; Pan, S. Multivariate time series forecasting with dynamic graph neural odes. IEEE Trans. Knowl. Data Eng. 2022, 35, 9168–9180.
26. Dwivedi, V.P.; Bresson, X. A Generalization of Transformer Networks to Graphs. arXiv 2020, arXiv:2012.09699.
27. Fu, Y.; Liu, X.; Yu, B. PD-GATv2: Positive difference second generation graph attention network based on multi-granularity in information systems to classification. Appl. Intell. 2024, 54, 5081–5096.
28. Ma, W.; Guo, Y.; Zhu, H.; Yi, X.; Zhao, W.; Wu, Y.; Hou, B.; Jiao, L. Intra- and intersource interactive representation learning network for remote sensing images classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5401515.
29. Li, H.; Wu, X.J. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2024, 103, 102147.
30. Lv, Y.; Liu, Y.; Li, S.; Liu, J.; Wang, T. Enhancing marine shaft generator reliability through intelligent fault diagnosis of gearbox bearings via improved Bidirectional LSTM. Ocean Eng. 2025, 337, 121860.
31. Razavi, R.; Gharipour, A. Rethinking the privacy of the smart grid: What your smart meter data can reveal about your household in Ireland. Energy Res. Soc. Sci. 2018, 44, 312–323.
32. Mohassel, R.R.; Fung, A.S.; Mohammadi, F.; Raahemifar, K. A survey on advanced metering infrastructure and its application in smart grids. In Proceedings of the 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), Toronto, ON, Canada, 4–7 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–8.
33. Zanetti, M.; Jamhour, E.; Pellenz, M.; Penna, M.; Zambenedetti, V.; Chueiri, I. A tunable fraud detection system for advanced metering infrastructure using short-lived patterns. IEEE Trans. Smart Grid 2017, 10, 830–840.
34. McLaughlin, S.; Holbert, B.; Fawaz, A.; Berthier, R.; Zonouz, S. A multi-sensor energy theft detection framework for advanced metering infrastructures. IEEE J. Sel. Areas Commun. 2013, 31, 1319–1330.
35. Jokar, P.; Arianpoo, N.; Leung, V.C.M. Electricity theft detection in AMI using customers’ consumption patterns. IEEE Trans. Smart Grid 2015, 7, 216–226.
36. Rahimian, E.; Zabihi, S.; Atashzar, S.F.; Asif, A.; Mohammadi, A. XceptionTime: A novel deep architecture based on depthwise separable convolutions for hand gesture classification. arXiv 2019, arXiv:1911.03803.
37. Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
38. Cheng, M.; Liu, Q.; Liu, Z.; Li, Z.; Luo, Y.; Chen, E. FormerTime: Hierarchical multi-scale representations for multivariate time series classification. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1437–1445.
39. Kuppan, K.; Acharya, D.B.; Divya, B. LSTM-GNN Synergy: A New Frontier in Stock Price Prediction. J. Adv. Math. Comput. Sci. 2024, 39, 95–109.








| No. | Feature & Description | Mathematical Formula |
|---|---|---|
| 1 | Absolute Energy: Reflects the overall amplitude or intensity of the signal. | $E = \sum_{i=1}^{n} x_i^{2}$ |
| 2 | Mean Change: Indicates the overall trend of the time series. | $\frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1}-x_i)$ |
| 3 | Absolute Sum of Changes: Quantifies the total volatility of the sequence. | $\sum_{i=1}^{n-1}\lvert x_{i+1}-x_i \rvert$ |
| 4 | FFT Coefficient: Captures periodicity and frequency characteristics. | $X_k=\sum_{m=0}^{n-1}x_m e^{-2\pi i m k/n}$ |
| 5 | Parameter: Characterizes self-similarity and nonlinearity. | |
| No. | Method Description | Mathematical Formula |
|---|---|---|
| 1 | Proportional Reduction: Randomly select consecutive daily data and scale it down uniformly. | |
| 2 | Threshold-based Perturbation: Randomly reduce only the data points exceeding a high percentile threshold. | |
| 3 | Constant Truncation: Subtract a constant from all values, setting results below zero to zero. | |
| 4 | Time-interval Zero-setting: Set data within random 8-hour intervals on selected days to zero. | |
| 5 | Daily Random Perturbation: Scale each day’s data independently by a daily random factor. | |
| 6 | Monthly Mean Propagation: Scale each month’s data using the previous month’s average as a coefficient. | |
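Three representative injection methods from the table (1, 2, and 4) can be sketched as below. All window sizes, scaling factors, and the half-hourly sampling rate (48 samples/day) are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

rng = np.random.default_rng(42)
SAMPLES_PER_DAY = 48  # assumed half-hourly sampling

def proportional_reduction(x, days=3, factor=0.3):
    """Method 1: scale a random run of consecutive days down uniformly."""
    x = x.copy()
    n_days = len(x) // SAMPLES_PER_DAY
    start = int(rng.integers(0, n_days - days + 1)) * SAMPLES_PER_DAY
    x[start:start + days * SAMPLES_PER_DAY] *= factor
    return x

def threshold_perturbation(x, percentile=90, factor=0.5):
    """Method 2: reduce only the points above a high percentile threshold."""
    x = x.copy()
    thr = np.percentile(x, percentile)
    x[x > thr] *= factor
    return x

def interval_zero_setting(x, hours=8):
    """Method 4: zero out one random 8-hour window on a random day."""
    x = x.copy()
    span = hours * SAMPLES_PER_DAY // 24
    day = int(rng.integers(0, len(x) // SAMPLES_PER_DAY))
    offset = int(rng.integers(0, SAMPLES_PER_DAY - span + 1))
    start = day * SAMPLES_PER_DAY + offset
    x[start:start + span] = 0.0
    return x

# 30 days of clean synthetic load, then one tampered copy
load = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(48 * 30) / 48)
tampered = proportional_reduction(load)
```

Each method lowers total consumption relative to the clean profile, which is the behavior the detector is trained to flag.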
| Category | Configuration/Value | Category | Configuration/Value |
|---|---|---|---|
| Processor | AMD EPYC 9354 | Batch Size | 512 |
| GPU | NVIDIA RTX 4090 | Learning Rate | 1 × 10⁻⁴ |
| Python Version | 3.11.8 | Total Training Epochs | 20 |
| PyTorch Version | 2.2.2 | Weight Decay | 1 × 10⁻⁴ |
| Data Split Ratio | 8:1:1 (Train:Val:Test) | Dropout Rate | 0.1 |
| Optimizer | AdamW | KNN (K) | 8 |
| Loss Function | Focal Loss | GATv2 layers | 4 |
| Random Seed | [123, 199, 1998, 2178, 3047] | Number of Cross-Source Attention Layers | 8 |
| Multi-scale window sizes | {4, 8, 16, 32, 64} | Number of Attention Heads | 8 |
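The configuration above lists Focal Loss as the training objective. A minimal binary form is sketched below; the α and γ values (0.25, 2.0) are illustrative defaults, not values reported in the paper.

```python
import torch

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss on predicted probabilities p and labels y.

    FL = -alpha_t * (1 - p_t)^gamma * log(p_t)
    Down-weights well-classified examples so training focuses on hard,
    rare anomalies. alpha/gamma here are assumed defaults.
    """
    p = p.clamp(1e-7, 1 - 1e-7)                      # numerical safety
    p_t = torch.where(y == 1, p, 1 - p)              # prob of the true class
    alpha_t = torch.where(y == 1, torch.tensor(alpha), torch.tensor(1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()

probs = torch.tensor([0.9, 0.2, 0.7, 0.05])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = binary_focal_loss(probs, labels)
```

The (1 − p_t)^γ factor is what makes this objective suitable for the class-imbalanced anomaly rates (5–15%) used in the experiments.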
| Method | Anomaly Rate | ACC | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|---|
| XceptionTime [36] | 5% | 0.955 ± 0.005 | 0.778 ± 0.447 | 0.107 ± 0.126 | 0.175 ± 0.180 | 0.001 ± 0.001 |
| | 10% | 0.915 ± 0.007 | 0.944 ± 0.073 | 0.161 ± 0.101 | 0.263 ± 0.141 | 0.002 ± 0.002 |
| | 15% | 0.870 ± 0.017 | 0.845 ± 0.102 | 0.153 ± 0.130 | 0.244 ± 0.173 | 0.003 ± 0.002 |
| InceptionTime [37] | 5% | 0.949 ± 0.002 | 0.780 ± 0.114 | 0.171 ± 0.122 | 0.259 ± 0.146 | 0.003 ± 0.002 |
| | 10% | 0.905 ± 0.004 | 0.880 ± 0.110 | 0.221 ± 0.077 | 0.348 ± 0.101 | 0.004 ± 0.004 |
| | 15% | 0.889 ± 0.018 | 0.871 ± 0.054 | 0.311 ± 0.161 | 0.431 ± 0.185 | 0.009 ± 0.009 |
| FormerTime [38] | 5% | 0.951 ± 0.005 | 0.689 ± 0.273 | 0.114 ± 0.042 | 0.187 ± 0.067 | 0.005 ± 0.005 |
| | 10% | 0.912 ± 0.006 | 0.683 ± 0.134 | 0.229 ± 0.031 | 0.339 ± 0.036 | 0.013 ± 0.007 |
| | 15% | 0.872 ± 0.014 | 0.639 ± 0.113 | 0.379 ± 0.148 | 0.454 ± 0.119 | 0.041 ± 0.021 |
| LSTM-GNN [39] | 5% | 0.950 ± 0.000 | 0.475 ± 0.000 | 0.500 ± 0.000 | 0.487 ± 0.000 | 0.500 ± 0.000 |
| | 10% | 0.901 ± 0.000 | 0.450 ± 0.000 | 0.500 ± 0.000 | 0.474 ± 0.000 | 0.500 ± 0.000 |
| | 15% | 0.840 ± 0.013 | 0.471 ± 0.068 | 0.507 ± 0.015 | 0.478 ± 0.033 | 0.493 ± 0.015 |
| TGCformer (Ours) | 5% | 0.982 ± 0.008 | 0.921 ± 0.079 | 0.714 ± 0.227 | 0.772 ± 0.142 | 0.004 ± 0.005 |
| | 10% | 0.990 ± 0.005 | 0.978 ± 0.022 | 0.923 ± 0.038 | 0.949 ± 0.024 | 0.002 ± 0.002 |
| | 15% | 0.979 ± 0.006 | 0.926 ± 0.022 | 0.937 ± 0.022 | 0.931 ± 0.021 | 0.013 ± 0.004 |
| Processing Stage | Device | Avg. Time/Sample | Throughput (Samples/s) |
|---|---|---|---|
| Multi-scale Feature Extraction (TSFresh) | CPU | 98.68 s | 0.01 |
| Graph Embedding Feature Extraction (GATv2) | GPU | 3.82 s | 0.26 |
| TGCformer Inference | GPU | 1.60 ms | 626.9 |
| Feature Set | ACC | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|
| Only_TSFresh | 0.973 ± 0.006 | 0.849 ± 0.058 | 0.898 ± 0.063 | 0.870 ± 0.025 | 0.019 ± 0.010 |
| Only_GATv2 | 0.811 ± 0.077 | 0.394 ± 0.259 | 0.526 ± 0.125 | 0.379 ± 0.070 | 0.157 ± 0.096 |
| TGCformer (Ours) | 0.990 ± 0.005 | 0.978 ± 0.022 | 0.923 ± 0.038 | 0.949 ± 0.024 | 0.002 ± 0.002 |
Citation: Xu, L.; Chen, S.; Wu, X.; Wang, Q.; Liu, Y.; Peng, Y. TGCformer: A Transformer-Based Dual-Channel Fusion Framework for Power Load Anomaly Detection. Electronics 2026, 15, 874. https://doi.org/10.3390/electronics15040874

