TGCformer: A Transformer-Based Dual-Channel Fusion Framework for Power Load Anomaly Detection
Abstract
1. Introduction
- (1) Multi-scale Temporal Feature Extraction: We employ the Time Series Feature extraction based on Scalable Hypothesis tests (TSFresh) to mine statistical, time-domain, and frequency-domain features from load sequences, constructing a rich multi-scale temporal representation.
- (2) Graph-level Embedding Feature Extraction: We utilize a Sparse Unified GATv2 module to model value-aware dependencies among historical states. By enhancing the input embedding with multi-scale contexts and constructing a sparse KNN graph, the method achieves unified modeling of local variations and global recurrences, thereby improving the expressiveness and robustness of graph-level features.
- (3) Cross-attention-based Dynamic Feature Fusion Mechanism: We design a cross-attention interaction module to enable deep coupling and dynamic weighting between multi-scale temporal features and graph-level features. This allows the model to adaptively focus on critical feature channels while suppressing redundancy from irrelevant information, significantly enhancing the accuracy and stability of anomaly detection.
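As a concrete illustration of contribution (1), the sketch below computes a few TSFresh-style descriptors with plain NumPy. The real TSFresh pipeline extracts hundreds of features and filters them by hypothesis testing; the function and feature names here are illustrative, not TSFresh's actual API.

```python
import numpy as np

def tsfresh_style_features(x: np.ndarray) -> dict:
    """Compute a few representative load-sequence descriptors.

    These mirror TSFresh's abs_energy, mean_change, absolute_sum_of_changes
    and fft_coefficient calculators; the real pipeline extracts hundreds of
    features and keeps only the statistically significant ones.
    """
    diffs = np.diff(x)
    spectrum = np.fft.rfft(x)
    return {
        "absolute_energy": float(np.sum(x ** 2)),                  # overall intensity
        "mean_change": float(np.mean(diffs)),                      # overall trend
        "absolute_sum_of_changes": float(np.sum(np.abs(diffs))),   # total volatility
        "dominant_fft_magnitude": float(np.max(np.abs(spectrum[1:]))),  # periodicity
    }

# Illustrative half-hourly load profile (48 samples/day) with daily periodicity
rng = np.random.default_rng(0)
t = np.arange(48 * 7)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 48) + 0.05 * rng.normal(size=t.size)
feats = tsfresh_style_features(load)
```

In the full framework these raw descriptors would then pass through significance testing before fusion.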
2. TGCformer
2.1. Overall Framework
- (1) Temporal Feature Extraction Branch: As shown in Module 1 of Figure 1, the raw power load time series is first processed by the TSFresh module to extract a comprehensive set of multi-scale temporal features, including time-domain statistics, frequency-domain descriptors, and non-linear features. These features explicitly encode the volatility, periodicity, and distributional properties of the load sequences. To reduce redundancy and enhance robustness, statistical significance testing and feature selection are applied (as indicated by the filtering module in the figure). This branch outputs a refined temporal feature vector containing 783 significant features, serving as the temporal semantic representation for subsequent fusion.
- (2) Graph-level Embedding Extraction: In parallel with temporal modeling, Module 2 of Figure 1 depicts the global dependency modeling process. The time series is transformed into a latent value graph by constructing a sparse adjacency matrix with the K-Nearest Neighbors (KNN) algorithm, where each node represents a time point and edges encode semantic proximity based on signal magnitude (rather than temporal distance). The resulting sparse graph is then fed into stacked Sparse Unified GATv2 layers. Incorporating a unified feature embedding (comprising multi-scale statistics and positional encodings), these layers achieve adaptive attention-based aggregation of recurrent pattern information. A mean pooling operation is subsequently applied to aggregate node-level embeddings into a global graph-level representation of dimension 64, preserving both value-based topological relations and global dependency patterns.
- (3) Dual-Channel Fusion based on Cross-Attention: Module 3 of Figure 1 corresponds to the proposed multi-head cross-attention mechanism. In this module, the refined temporal features from the upper branch are projected via a linear layer into a unified 512-dimensional embedding space as Query vectors, while the graph-level embeddings from the lower branch are mapped to the same dimension to serve as Key and Value vectors. Through cross-attention, the model explicitly aligns local temporal patterns with global structure-related graph information, allowing temporal semantics to dynamically guide the weighting of graph-structural dependencies. The multi-head design, as illustrated by the parallel attention heads in the figure, enables the learning of complementary heterogeneous feature interactions across different subspaces. Additionally, residual connections and normalization layers are employed to ensure stable training and effective deep feature encoding.
- (4) Anomaly Classification Head: As shown in Module 4 of Figure 1, the fused dual-channel joint representation is passed to a Multilayer Perceptron (MLP) classifier. The MLP further refines features through stacked linear layers and non-linear activation functions, transforming high-dimensional Time-Graph dependencies into discriminative semantics. Finally, the output layer employs a Sigmoid activation function to map the latent features to a probability score within the interval [0, 1], functioning as a binary classifier that determines whether a given sample is normal or anomalous, thereby achieving robust power load anomaly detection.
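The fusion step of Module 3 can be sketched in PyTorch as follows. The dimensions (783 temporal features, 64-d graph embedding, 512-d shared space, 8 heads) follow the text and the experimental setup; the layer layout itself is a minimal illustrative approximation, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Sketch of the dual-channel fusion (Module 3): temporal features act
    as Query, graph-level embeddings as Key/Value. Layer names and the
    single-token treatment of each branch are illustrative assumptions."""

    def __init__(self, temporal_dim=783, graph_dim=64, d_model=512, n_heads=8):
        super().__init__()
        self.q_proj = nn.Linear(temporal_dim, d_model)   # temporal branch -> Query
        self.kv_proj = nn.Linear(graph_dim, d_model)     # graph branch -> Key/Value
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, temporal_feats, graph_embed):
        # treat each branch as a length-1 token sequence per sample
        q = self.q_proj(temporal_feats).unsqueeze(1)
        kv = self.kv_proj(graph_embed).unsqueeze(1)
        fused, _ = self.attn(q, kv, kv)
        # residual connection + normalization for stable training
        return self.norm(q + fused).squeeze(1)

fusion = CrossAttentionFusion()
out = fusion(torch.randn(4, 783), torch.randn(4, 64))  # batch of 4 samples
```

The fused 512-d vector would then feed the MLP classification head of Module 4.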
2.2. Dual-Channel Structural Feature Extraction and Encoding
2.2.1. Multi-Scale Time Series Statistical Feature Extraction
2.2.2. Graph-Level Embedding Feature Extraction Based on Sparse Unified GATv2
- 1. Sparse Neighbor Graph Construction based on Value Similarity
- 2. Dynamic Graph Attention Modeling
- 3. Graph-level Feature Aggregation
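A minimal sketch of step 1, the value-similarity KNN graph: neighbors are chosen by closeness in signal magnitude rather than in time, so recurring load levels at distant timestamps become connected. K = 8 matches the experimental setup; the specific distance measure (absolute value difference) is an assumption.

```python
import numpy as np

def value_knn_adjacency(x: np.ndarray, k: int = 8) -> np.ndarray:
    """Build a sparse KNN adjacency over time points using VALUE similarity
    (|x_i - x_j|), not temporal distance. Illustrative sketch only."""
    n = x.shape[0]
    # pairwise absolute value differences between all time points
    dist = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist, np.inf)        # exclude self from the neighbor search
    adj = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        nn_idx = np.argsort(dist[i])[:k]  # the k value-closest time points
        adj[i, nn_idx] = 1.0
    return adj

x = np.sin(np.linspace(0, 4 * np.pi, 32))  # toy periodic signal
A = value_knn_adjacency(x, k=4)
```

Each row of the resulting adjacency matrix has exactly k nonzero entries, giving the sparse graph consumed by the stacked GATv2 layers.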
2.3. Dual-Channel Feature Fusion and Encoding Based on Multi-Head Cross-Attention
3. Experiments and Validation
3.1. Data Description
3.2. Experimental Setup and Evaluation Metrics
3.3. Overall Performance Evaluation of TGCformer
3.4. Ablation Study
3.5. Visualization and Interpretability Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Carr, D.; Thomson, M. Non-technical electricity losses. Energies 2022, 15, 2218.
2. De Souza Savian, F.; Siluk, J.C.; Garlet, T.B.; Nascimento, F.M.D.; Pinheiro, J.R.; Vale, Z. Non-technical losses: A systematic contemporary article review. Renew. Sustain. Energy Rev. 2021, 147, 111205.
3. Lepolesa, L.J.; Achari, S.; Cheng, L. Electricity theft detection in smart grids based on deep neural network. IEEE Access 2022, 10, 39638–39655.
4. Wang, X.; Yao, Z.; Papaefthymiou, M. A real-time electrical load forecasting and unsupervised anomaly detection framework. Appl. Energy 2023, 330, 120279.
5. Wang, X.; Wang, H.; Bhandari, B.; Cheng, L. AI-empowered methods for smart energy consumption: A review of load forecasting, anomaly detection and demand response. Int. J. Precis. Eng. Manuf.-Green Technol. 2024, 11, 963–993.
6. Fahmi, A.T.W.K.; Kashyzadeh, K.R.; Ghorbani, S. Enhanced Autoregressive Integrated Moving Average Model for Anomaly Detection in Power Plant Operations. Int. J. Eng. 2024, 37, 1691–1699.
7. Cheng, M.; Zhang, D.; Yan, W.; He, L.; Zhang, R.; Xu, M. Power system abnormal pattern detection for new energy big data. Int. J. Emerg. Electr. Power Syst. 2023, 24, 91–102.
8. Yang, J.; Fei, K.; Ren, F.; Li, Q.; Li, J.; Duan, Y.; Dong, L. Non-technical loss detection using missing values pattern. In Proceedings of the International Conference on Smart Grid and Clean Energy Technologies, Kuching, Malaysia, 9–11 October 2020; pp. 149–154.
9. Hussain, S.; Mustafa, M.W.; Jumani, T.A.; Baloch, S.K.; Saeed, M.S. A novel unsupervised feature-based approach for electricity theft detection using robust PCA and outlier removal clustering algorithm. Int. Trans. Electr. Energy Syst. 2020, 30, 3359–3372.
10. Guerrero, J.I.; Monedero, I.; Biscarri, F.; Biscarri, J.; Millan, R.; Leon, C. Non-technical losses reduction by improving the inspections accuracy in a power utility. IEEE Trans. Power Syst. 2017, 33, 1209–1218.
11. Xia, Y.; Liang, D.; Zheng, G.; Wang, J.; Zeng, J. Helicopter main reduction planetary gear fault diagnosis method based on SVDD. Int. J. Appl. Electromagn. Mech. 2020, 64, 137–145.
12. Vapnik, V.; Chervonenkis, A.Y. A class of algorithms for pattern recognition learning. Avtomat. Telemekh. 1964, 25, 937–945.
13. Liu, H.; Shi, J.; Fu, R.; Zhang, Y. Anomaly Detection of Residential Electricity Consumption Based on Ensemble Model of PSO-AE-XGBOOST. In International Conference on Neural Computing for Advanced Applications; Springer Nature: Singapore, 2024; pp. 44–58.
14. Touzani, S.; Granderson, J.; Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. 2018, 158, 1533–1543.
15. Harshini, C.; Deepthi, G.; Reddy, G.A.; Laxmi, G.V.; Rajasree, G. Electricity theft detection in power grids with deep learning and random forests. Int. J. Manag. Res. Rev. 2023, 13, 1–10.
16. Bian, J.; Wang, L.; Scherer, R.; Wozniak, M.; Zhang, P.; Wei, W. Abnormal detection of electricity consumption of user based on particle swarm optimization and long short term memory with the attention mechanism. IEEE Access 2021, 9, 47252–47265.
17. Irwansyah, A.; Muhammad, E.; Arifin, F.; Iman, B.N.; Hermawan, H. Power consumption predictive analytics and automatic anomaly detection based on CNN-LSTM neural networks. J. Rekayasa Elektr. 2023, 19, 127–134.
18. Duan, J. Deep learning anomaly detection in AI-powered intelligent power distribution systems. Front. Energy Res. 2024, 12, 1364456.
19. Kang, H.; Kang, P. Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism. Knowl.-Based Syst. 2024, 290, 111507.
20. Yi, S.; Zheng, S.; Yang, S.; Zhou, G.; He, J. Robust transformer-based anomaly detection for nuclear power data using maximum correntropy criterion. Nucl. Eng. Technol. 2024, 56, 1284–1295.
21. Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time series feature extraction on basis of scalable hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77.
22. Tam, I.; Kalech, M.; Rokach, L.; Madar, E.; Bortman, J.; Klein, R. Probability-based algorithm for bearing diagnosis with untrained spall sizes. Sensors 2020, 20, 1298.
23. Döhler, S. A discrete modification of the Benjamini-Yekutieli procedure. Econom. Stat. 2018, 5, 137–147.
24. Donner, R.V.; Zou, Y.; Donges, J.F.; Marwan, N.; Kurths, J. Recurrence networks—A novel paradigm for nonlinear time series analysis. New J. Phys. 2010, 12, 129–132.
25. Jin, M.; Zheng, Y.; Li, Y.-F.; Chen, S.; Yang, B.; Pan, S. Multivariate time series forecasting with dynamic graph neural odes. IEEE Trans. Knowl. Data Eng. 2022, 35, 9168–9180.
26. Dwivedi, V.P.; Bresson, X. A Generalization of Transformer Networks to Graphs. arXiv 2020, arXiv:2012.09699.
27. Fu, Y.; Liu, X.; Yu, B. PD-GATv2: Positive difference second generation graph attention network based on multi-granularity in information systems to classification. Appl. Intell. 2024, 54, 5081–5096.
28. Ma, W.; Guo, Y.; Zhu, H.; Yi, X.; Zhao, W.; Wu, Y.; Hou, B.; Jiao, L. Intra- and intersource interactive representation learning network for remote sensing images classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5401515.
29. Li, H.; Wu, X.J. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2024, 103, 102147.
30. Lv, Y.; Liu, Y.; Li, S.; Liu, J.; Wang, T. Enhancing marine shaft generator reliability through intelligent fault diagnosis of gearbox bearings via improved Bidirectional LSTM. Ocean Eng. 2025, 337, 121860.
31. Razavi, R.; Gharipour, A. Rethinking the privacy of the smart grid: What your smart meter data can reveal about your household in Ireland. Energy Res. Soc. Sci. 2018, 44, 312–323.
32. Mohassel, R.R.; Fung, A.S.; Mohammadi, F.; Raahemifar, K. A survey on advanced metering infrastructure and its application in smart grids. In Proceedings of the 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), Toronto, ON, Canada, 4–7 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–8.
33. Zanetti, M.; Jamhour, E.; Pellenz, M.; Penna, M.; Zambenedetti, V.; Chueiri, I. A tunable fraud detection system for advanced metering infrastructure using short-lived patterns. IEEE Trans. Smart Grid 2017, 10, 830–840.
34. McLaughlin, S.; Holbert, B.; Fawaz, A.; Berthier, R.; Zonouz, S. A multi-sensor energy theft detection framework for advanced metering infrastructures. IEEE J. Sel. Areas Commun. 2013, 31, 1319–1330.
35. Jokar, P.; Arianpoo, N.; Leung, V.C.M. Electricity theft detection in AMI using customers’ consumption patterns. IEEE Trans. Smart Grid 2015, 7, 216–226.
36. Rahimian, E.; Zabihi, S.; Atashzar, S.F.; Asif, A.; Mohammadi, A. XceptionTime: A novel deep architecture based on depthwise separable convolutions for hand gesture classification. arXiv 2019, arXiv:1911.03803.
37. Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.-A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
38. Cheng, M.; Liu, Q.; Liu, Z.; Li, Z.; Luo, Y.; Chen, E. FormerTime: Hierarchical multi-scale representations for multivariate time series classification. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 1437–1445.
39. Kuppan, K.; Acharya, D.B.; Divya, B. LSTM-GNN Synergy: A New Frontier in Stock Price Prediction. J. Adv. Math. Comput. Sci. 2024, 39, 95–109.








| No. | Feature & Description | Mathematical Formula |
|---|---|---|
| 1 | Absolute Energy: Reflects the overall amplitude or intensity of the signal. | $E = \sum_{i=1}^{n} x_i^{2}$ |
| 2 | Mean Change: Indicates the overall trend of the time series. | $\frac{1}{n-1}\sum_{i=1}^{n-1}(x_{i+1}-x_i)$ |
| 3 | Absolute Sum of Changes: Quantifies the total volatility of the sequence. | $\sum_{i=1}^{n-1}\lvert x_{i+1}-x_i \rvert$ |
| 4 | FFT Coefficient: Captures periodicity and frequency characteristics. | $X_k=\sum_{m=0}^{n-1}x_m e^{-2\pi i m k/n}$ |
| 5 | Parameter: Characterizes self-similarity and nonlinearity. | |
| No. | Method Description | Mathematical Formula |
|---|---|---|
| 1 | Proportional Reduction: Randomly select consecutive daily data and scale it down uniformly. | |
| 2 | Threshold-based Perturbation: Randomly reduce only the data points exceeding a high percentile threshold. | |
| 3 | Constant Truncation: Subtract a constant from all values, setting results below zero to zero. | |
| 4 | Time-interval Zero-setting: Set data within random 8-hour intervals on selected days to zero. | |
| 5 | Daily Random Perturbation: Scale each day’s data independently by a daily random factor. | |
| 6 | Monthly Mean Propagation: Scale each month’s data using the previous month’s average as a coefficient. | |
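Three representative injection methods from the table (1, 2, and 4) can be sketched as below. All window sizes, scaling factors, and the half-hourly sampling rate (48 samples/day) are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

rng = np.random.default_rng(42)
SAMPLES_PER_DAY = 48  # assumed half-hourly sampling

def proportional_reduction(x, days=3, factor=0.3):
    """Method 1: scale a random run of consecutive days down uniformly."""
    x = x.copy()
    n_days = len(x) // SAMPLES_PER_DAY
    start = int(rng.integers(0, n_days - days + 1)) * SAMPLES_PER_DAY
    x[start:start + days * SAMPLES_PER_DAY] *= factor
    return x

def threshold_perturbation(x, percentile=90, factor=0.5):
    """Method 2: reduce only the points above a high percentile threshold."""
    x = x.copy()
    thr = np.percentile(x, percentile)
    x[x > thr] *= factor
    return x

def interval_zero_setting(x, hours=8):
    """Method 4: zero out one random 8-hour window on a random day."""
    x = x.copy()
    span = hours * SAMPLES_PER_DAY // 24
    day = int(rng.integers(0, len(x) // SAMPLES_PER_DAY))
    offset = int(rng.integers(0, SAMPLES_PER_DAY - span + 1))
    start = day * SAMPLES_PER_DAY + offset
    x[start:start + span] = 0.0
    return x

# 30 days of clean synthetic load, then one tampered copy
load = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(48 * 30) / 48)
tampered = proportional_reduction(load)
```

Each method lowers total consumption relative to the clean profile, which is the behavior the detector is trained to flag.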
| Category | Configuration/Value | Category | Configuration/Value |
|---|---|---|---|
| Processor | AMD EPYC 9354 | Batch Size | 512 |
| GPU | NVIDIA RTX 4090 | Learning Rate | 1 × 10⁻⁴ |
| Python Version | 3.11.8 | Total Training Epochs | 20 |
| PyTorch Version | 2.2.2 | Weight Decay | 1 × 10⁻⁴ |
| Data Split Ratio | 8:1:1 (Train:Val:Test) | Dropout Rate | 0.1 |
| Optimizer | AdamW | KNN (K) | 8 |
| Loss Function | Focal Loss | GATv2 layers | 4 |
| Random Seed | [123, 199, 1998, 2178, 3047] | Number of Cross-Source Attention Layers | 8 |
| Multi-scale window sizes | {4, 8, 16, 32, 64} | Number of Attention Heads | 8 |
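The configuration above lists Focal Loss as the training objective. A minimal binary form is sketched below; the α and γ values (0.25, 2.0) are illustrative defaults, not values reported in the paper.

```python
import torch

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss on predicted probabilities p and labels y.

    FL = -alpha_t * (1 - p_t)^gamma * log(p_t)
    Down-weights well-classified examples so training focuses on hard,
    rare anomalies. alpha/gamma here are assumed defaults.
    """
    p = p.clamp(1e-7, 1 - 1e-7)                      # numerical safety
    p_t = torch.where(y == 1, p, 1 - p)              # prob of the true class
    alpha_t = torch.where(y == 1, torch.tensor(alpha), torch.tensor(1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t)).mean()

probs = torch.tensor([0.9, 0.2, 0.7, 0.05])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = binary_focal_loss(probs, labels)
```

The (1 − p_t)^γ factor is what makes this objective suitable for the class-imbalanced anomaly rates (5–15%) used in the experiments.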
| Method | Anomaly Rate | ACC | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|---|
| XceptionTime [36] | 5% | 0.955 ± 0.005 | 0.778 ± 0.447 | 0.107 ± 0.126 | 0.175 ± 0.180 | 0.001 ± 0.001 |
| | 10% | 0.915 ± 0.007 | 0.944 ± 0.073 | 0.161 ± 0.101 | 0.263 ± 0.141 | 0.002 ± 0.002 |
| | 15% | 0.870 ± 0.017 | 0.845 ± 0.102 | 0.153 ± 0.130 | 0.244 ± 0.173 | 0.003 ± 0.002 |
| InceptionTime [37] | 5% | 0.949 ± 0.002 | 0.780 ± 0.114 | 0.171 ± 0.122 | 0.259 ± 0.146 | 0.003 ± 0.002 |
| | 10% | 0.905 ± 0.004 | 0.880 ± 0.110 | 0.221 ± 0.077 | 0.348 ± 0.101 | 0.004 ± 0.004 |
| | 15% | 0.889 ± 0.018 | 0.871 ± 0.054 | 0.311 ± 0.161 | 0.431 ± 0.185 | 0.009 ± 0.009 |
| FormerTime [38] | 5% | 0.951 ± 0.005 | 0.689 ± 0.273 | 0.114 ± 0.042 | 0.187 ± 0.067 | 0.005 ± 0.005 |
| | 10% | 0.912 ± 0.006 | 0.683 ± 0.134 | 0.229 ± 0.031 | 0.339 ± 0.036 | 0.013 ± 0.007 |
| | 15% | 0.872 ± 0.014 | 0.639 ± 0.113 | 0.379 ± 0.148 | 0.454 ± 0.119 | 0.041 ± 0.021 |
| LSTM-GNN [39] | 5% | 0.950 ± 0.000 | 0.475 ± 0.000 | 0.500 ± 0.000 | 0.487 ± 0.000 | 0.500 ± 0.000 |
| | 10% | 0.901 ± 0.000 | 0.450 ± 0.000 | 0.500 ± 0.000 | 0.474 ± 0.000 | 0.500 ± 0.000 |
| | 15% | 0.840 ± 0.013 | 0.471 ± 0.068 | 0.507 ± 0.015 | 0.478 ± 0.033 | 0.493 ± 0.015 |
| TGCformer (Ours) | 5% | 0.982 ± 0.008 | 0.921 ± 0.079 | 0.714 ± 0.227 | 0.772 ± 0.142 | 0.004 ± 0.005 |
| | 10% | 0.990 ± 0.005 | 0.978 ± 0.022 | 0.923 ± 0.038 | 0.949 ± 0.024 | 0.002 ± 0.002 |
| | 15% | 0.979 ± 0.006 | 0.926 ± 0.022 | 0.937 ± 0.022 | 0.931 ± 0.021 | 0.013 ± 0.004 |
| Processing Stage | Device | Avg. Time/Sample | Throughput (Samples/s) |
|---|---|---|---|
| Multi-scale Feature Extraction (TSFresh) | CPU | 98.68 s | 0.01 |
| Graph Embedding Feature Extraction (GATv2) | GPU | 3.82 s | 0.26 |
| TGCformer Inference | GPU | 1.60 ms | 626.9 |
| Feature Set | ACC | Precision | Recall | F1-Score | FPR |
|---|---|---|---|---|---|
| Only_TSFresh | 0.973 ± 0.006 | 0.849 ± 0.058 | 0.898 ± 0.063 | 0.870 ± 0.025 | 0.019 ± 0.010 |
| Only_GATv2 | 0.811 ± 0.077 | 0.394 ± 0.259 | 0.526 ± 0.125 | 0.379 ± 0.070 | 0.157 ± 0.096 |
| TGCformer (Ours) | 0.990 ± 0.005 | 0.978 ± 0.022 | 0.923 ± 0.038 | 0.949 ± 0.024 | 0.002 ± 0.002 |
Citation: Xu, L.; Chen, S.; Wu, X.; Wang, Q.; Liu, Y.; Peng, Y. TGCformer: A Transformer-Based Dual-Channel Fusion Framework for Power Load Anomaly Detection. Electronics 2026, 15, 874. https://doi.org/10.3390/electronics15040874

