Power System Fault Detection and Localization Using a Dual-Path Spatio-Temporal Multi-Task Graph Convolutional Network

Wu, Zhaoyang; Shi, Fanrong; Li, Hao; Ran, Lili

doi:10.3390/electronics15132767


Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(13), 2767; https://doi.org/10.3390/electronics15132767 (registering DOI)

Submission received: 25 May 2026 / Revised: 12 June 2026 / Accepted: 18 June 2026 / Published: 23 June 2026


Abstract

With the continuous expansion and increasing topological complexity of modern power grids, achieving high-precision fault localization under sparse measurement conditions has become a core challenge in the operation and maintenance of smart grids. Existing deep graph network methods generally struggle with the complex spatio-temporal coupling between fault-type features and fault-location features. To address this, this paper proposes a fault localization framework that combines optimal sparse PMU placement with spatio-temporal graph learning. A reinforcement learning algorithm with a Checking-Action mechanism, termed DQN-CA, is adopted to identify the optimal PMU installation buses. Building on this placement, a dual-path spatio-temporal multi-task graph convolutional network, termed ST-MTGCN, is developed to decouple fault-type-related features from topology-sensitive fault-location features through a global feature dimensionality-reduction path and a K-hop spatial graph convolution path, thereby accomplishing the fault localization task. Experimental results on the IEEE 39-bus system show that, under the OPT13 PMU configuration, ST-MTGCN achieves 99.68% fault type accuracy, 89.94% fault localization accuracy, and 88.62% accuracy over 185 joint fault type-location classes. Comparative experiments, PMU configuration sensitivity analysis, and ablation studies further demonstrate the effectiveness of the proposed framework under sparse measurement conditions.

Keywords:

fault localization; PMU placement; deep reinforcement learning; graph convolutional network

1. Introduction

With the widespread integration of distributed generation and the increasing complexity of modern power grid topologies, wide-area measurement systems (WAMSs) and phasor measurement units (PMUs) have become important technical foundations for situational awareness in smart grids because they provide high-precision, time-synchronized phasor measurements [1,2,3]. However, due to the high cost of PMU devices and the limitations of communication bandwidth, large interconnected power grids usually operate under sparse PMU coverage. In practical dispatching and protection scenarios, when sudden short-circuit faults or abnormal disturbances occur, rapidly identifying the fault type and accurately locating the fault within the grid topology are essential for isolating the affected area, supporting emergency control, and preventing cascading outages. Nevertheless, sparse PMU measurements inevitably introduce spatial blind spots in power system monitoring. Weak transient voltage-dip signals may be attenuated, distorted, or aliased during long-distance propagation along the physical grid topology. Therefore, determining an effective sparse PMU configuration while simultaneously achieving high-precision fault type classification and fault localization has become a key challenge in smart grid operation and maintenance.

Optimal PMU placement (OPP) aims to achieve full observability of the network topology with the minimum number of sensors, keeping equipment investment and communication costs low and thereby providing effective data for fault classification and fault location identification in power systems. Early research largely relied on mathematical analysis (such as integer linear programming and Koopman operator theory) or heuristic algorithms to identify the optimal sensor configuration that satisfies system observability [4,5,6]. However, as the number of grid nodes grows rapidly, these traditional algorithms suffer from the curse of dimensionality in discrete search spaces. In recent years, deep reinforcement learning (DRL) has been introduced into power grid measurement optimization owing to its powerful representation capability in decision-making [7,8,9]. Recent studies have further demonstrated the potential of DRL in power system analysis, including operation optimization, stability control, scheduling decision-making, and measurement configuration [10]. Compared with traditional mathematical programming and heuristic search methods, DRL learns sequential decision-making policies from interactions with the environment and is therefore suitable for high-dimensional combinatorial optimization problems such as OPP [7]. Although algorithms such as DQN have shown some effectiveness in multi-dimensional optimization, they generally suffer from inefficient exploration and redundant optimization attempts in the vast discrete action space of OPP [11].

Once the optimal observation boundary has been established, the goal of fault identification is to classify fault types and locate faults within the grid topology using the sparse phasor data transmitted from the measurement front end. Early research relied on power-based physical evaluation models or traditional relay protection logic [12,13,14]. However, studies have shown that their generalizability is limited under complex ground fault or high-impedance conditions. Subsequently, many deep learning models based solely on temporal data (such as hybrid CNN-LSTM and Transformer models) have been introduced into this field, achieving significant breakthroughs in extracting local transient waveform features and identifying various types of short-circuit faults [15,16,17,18,19]. Nevertheless, such models still suffer from topological blind spots: lacking explicit spatial structural constraints, they tend to confuse electrically close buses under sparse PMU observations and therefore cannot achieve high-precision fault localization.

Recently, industrial large language models have shown promising potential in intelligent fault diagnosis, knowledge reasoning, and maintenance decision support under variable operating conditions [20]. For example, adaptive industrial large language models have been explored for mechanical fault diagnosis under varying working conditions, showing strong capabilities in knowledge-driven reasoning and cross-condition diagnosis [21]. However, high-speed power system fault localization under sparse PMU measurements still relies heavily on numerical transient signal modeling and explicit topology-aware representation learning [22]. Therefore, this work focuses on a spatio-temporal graph learning framework for sparse-measurement-based fault localization, while industrial large language models are regarded as a promising complementary direction for future intelligent operation and maintenance systems.

To overcome the topological blind spots of purely temporal models, graph neural networks (GNNs) and their variants, such as GCN and GraphSAGE, have been increasingly adopted for power system fault localization [23,24,25,26,27,28]. These methods use multi-hop neighbor aggregation to propagate the transient voltage-drop characteristics of unobserved nodes along the physical grid to the measurement nodes. However, single-path graph networks still exhibit shortcomings under sparse measurement conditions. Existing graph-based diagnostic models face complex spatio-temporal coupling when performing fault localization, which causes fault-type classification features to be diluted by graph topological features [29,30,31]. At the same time, over-smoothing in dense physical subnetworks, particularly in densely interconnected areas on the generator side, blends node features through multi-hop aggregation, producing overlapping feature manifolds and blurred localization boundaries [32,33,34].

To address these two limitations of existing methods, namely the low efficiency of PMU placement optimization and the complex spatio-temporal coupling involved in fault localization, this paper proposes a fault detection and localization framework built on an efficiently configured sparse measurement network. The main contributions of this paper are as follows:

A DQN-CA-based optimal PMU placement strategy is developed for sparse measurement configuration. By embedding the Checking-Action mechanism into the action-selection process, redundant and invalid PMU placement actions are suppressed, thereby improving the search efficiency of OPP under full observability constraints.
A dual-path spatio-temporal multi-task graph convolutional network, termed ST-MTGCN, is proposed for joint fault type classification and fault topology localization under sparse PMU measurements. The model combines a global feature reduction path and a K-hop spatial graph convolution path to decouple fault-type-related features from topology-sensitive fault-location features.
A physically grounded joint optimization scheme is designed using multi-task auxiliary supervision. t-SNE visualization and confusion matrices are used to show that the model's residual errors are highly consistent with the topological isomorphism of the power system.

The remainder of this paper is organized as follows. Section 2 introduces the proposed PMU placement optimization model. Section 3 presents the architecture and mechanism of the proposed ST-MTGCN. Section 4 reports and discusses the experimental results. Section 5 concludes the paper and outlines future research directions.

2. An OPP Framework Based on Deep Reinforcement Learning

2.1. Overall Architecture

Given the complexity and diversity of modern power system topologies, this paper addresses fault identification and localization with a two-phase framework: (1) reinforcement learning-based optimal sensor placement (DQN-CA), and (2) comprehensive fault diagnosis using a dual-path spatio-temporal graph neural network (ST-MTGCN). In Phase 1, the goal of achieving full-network observability with the minimum number of PMUs is addressed by a Double DQN with a Checking-Action mechanism (DQN-CA), which solves the optimal PMU placement problem. In Phase 2, the fault data measured at the resulting optimal PMU buses, together with the topological adjacency matrix, are fed into the ST-MTGCN model to perform fault type classification and fault location identification.

2.2. An Optimal Test Point Selection Model Based on DQN-CA

The optimal PMU placement (OPP) problem is an NP-hard combinatorial optimization problem that can be formulated as a Markov decision process (MDP) [35]. As shown in Figure 1, this paper exploits the nonlinear fitting capability and exploration mechanism of deep reinforcement learning to solve it heuristically. The objective is to minimize the number of installed PMUs subject to full observability of the entire power system. Inspired by the local spatial constraints used by cellular automata (CA) in complex spatial search tasks (such as pedestrian evacuation and robot navigation) [36,37], this paper employs a reinforcement learning mechanism that filters out invalid actions at the tensor level, driving the algorithm to converge stably to the optimal placement. Consider a power system with $N$ nodes whose topology is represented by an undirected graph $G = (V, E)$. Because traditional ILP algorithms face the curse of dimensionality as $N$ increases, this method reformulates the problem as a sequential decision-making process. The MDP is defined by the tuple $\langle S, A, R, P \rangle$, whose elements are as follows:

1. State Space: $S$

The system state is represented by a 39-dimensional binary vector that records the PMU installation status at each node of the grid:

$$S = [s_1, s_2, \dots, s_{39}], \quad s_i \in \{0, 1\} \tag{1}$$

Here, $s_i = 1$ indicates that node $i$ has a PMU installed; otherwise, $s_i = 0$.

Given a state $S$, the observability of the network follows from Kirchhoff's laws and Ohm's law: if a PMU is installed at a node, the voltage phasors of that node and all of its adjacent nodes are observable. Hence, a node is observable if and only if a PMU is installed at that node or at one of its adjacent nodes. We define the system's observability matrix $M_{obs} = A + I \in \mathbb{R}^{N \times N}$ (the adjacency matrix of Figure 1 plus the identity matrix), where $A$ is the system's adjacency matrix and $I$ is the identity matrix. The observable state vector $y \in \mathbb{R}^{N}$ is then $y = M_{obs} S^{\top}$, and the OPP problem can be stated as the following integer linear programming (ILP) problem:

$$\min_{S} \sum_{i=1}^{N} s_i \quad \text{s.t.} \quad M_{obs} S^{\top} \geq \mathbf{1}, \quad s_i \in \{0, 1\}$$

Within the MDP, matrix multiplication gives a fast vectorized observability check:

$$O = \min(\mathbf{1}, S M_{obs}) \tag{2}$$

$$N_{obs} = \sum_{i=1}^{39} O_i \tag{3}$$

Here, $O$ is a 39-dimensional Boolean observability vector (the minimum is taken element-wise), and $N_{obs}$ is the total number of currently observable nodes.
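A minimal sketch of the observability check in Eqs. (2) and (3), written in plain Python for illustration. The function names and the six-bus radial feeder are hypothetical; they are not the IEEE 39-bus data, and a practical implementation would more likely perform the matrix product with a tensor library.

```python
def observability_matrix(n_bus, branches):
    """Return M_obs = A + I as a list of 0/1 rows (0-indexed buses)."""
    m = [[1 if i == j else 0 for j in range(n_bus)] for i in range(n_bus)]
    for u, v in branches:
        m[u][v] = m[v][u] = 1  # undirected graph: symmetric adjacency
    return m


def observable_vector(s, m_obs):
    """Eq. (2): O = min(1, S x M_obs), bus j is observable if any PMU covers it."""
    n = len(s)
    return [min(1, sum(s[i] * m_obs[i][j] for i in range(n))) for j in range(n)]


def n_observable(s, m_obs):
    """Eq. (3): N_obs = sum of O_i, the number of currently observable buses."""
    return sum(observable_vector(s, m_obs))


# Hypothetical 6-bus radial feeder 0-1-2-3-4-5 (not the IEEE 39-bus system).
toy_branches = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
M = observability_matrix(6, toy_branches)
state = [0, 1, 0, 0, 1, 0]  # PMUs at buses 1 and 4
print(observable_vector(state, M))  # [1, 1, 1, 1, 1, 1]
print(n_observable(state, M))       # 6
```

Because each PMU covers its own bus and its neighbors, two well-placed PMUs suffice for this toy feeder; the same check drives the reward and the termination test of the MDP.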

2. Action Space and the Checking Action Mechanism: $A$

In standard Q-learning, the agent selects actions based on the action-value function $Q(s, a)$. At each step, the agent selects a node $a \in \{0, 1, \dots, 38\}$ at which to install a new PMU. In the OPP problem, however, installing a PMU at a node that already has one is an invalid action. A standard DQN must learn this rule through lengthy trial, error, and penalty, wasting substantial computational resources. To address this, this paper introduces a Checking Action mask. Before the $\varepsilon$-greedy policy selects the action with the maximum $Q$-value, the mask sets the $Q$-values of nodes that already host a PMU to a very large negative value, thereby eliminating redundant exploration of invalid actions:

$$Q_{masked}(s, i) = \begin{cases} Q(s, i), & \text{if } s_i = 0 \\ -1 \times 10^{9}, & \text{if } s_i = 1 \end{cases} \tag{4}$$

The CA mechanism prunes invalid branches of the Markov decision tree, reducing the search space for a sequence of $K$ placements from $N^{K}$ (with repetitions allowed) to the $N!/(N-K)!$ permutations without repetition, which improves the convergence speed and accuracy of the neural network.
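The masking in Eq. (4) can be sketched as follows. This is an illustrative Python version, not the authors' code: `checking_action_mask` and `select_action` are hypothetical names, `epsilon` is treated here as the probability of taking a random action (an assumed convention), and the random branch is assumed to sample only buses without a PMU.

```python
import random

MASK_VALUE = -1e9  # large negative value used by the Checking Action mask, Eq. (4)


def checking_action_mask(q_values, state):
    """Eq. (4): keep Q for buses without a PMU, push installed buses to -1e9."""
    return [q if s == 0 else MASK_VALUE for q, s in zip(q_values, state)]


def select_action(q_values, state, epsilon, rng=random):
    """Epsilon-greedy selection restricted to buses without a PMU.

    Assumption: the exploration branch also samples only uninstalled buses,
    so no step is ever spent on a duplicate placement.
    """
    valid = [i for i, s in enumerate(state) if s == 0]
    if not valid:
        raise ValueError("every bus already has a PMU")
    if rng.random() < epsilon:
        return rng.choice(valid)
    masked = checking_action_mask(q_values, state)
    return max(range(len(masked)), key=masked.__getitem__)


q = [5.0, 3.0, 4.0, 1.0]
state = [1, 0, 0, 0]  # bus 0 already hosts a PMU
print(checking_action_mask(q, state))        # [-1000000000.0, 3.0, 4.0, 1.0]
print(select_action(q, state, epsilon=0.0))  # 2
```

Even though bus 0 has the highest raw Q-value, the mask removes it, so the greedy choice falls to the best remaining bus.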

3. Reward Function: $R$

The reward function directly guides the model to achieve maximum coverage with the minimum number of PMUs. This method employs a mechanism that combines dense rewards with sparse terminal rewards:

$$R = \begin{cases} 100, & \text{if } N_{obs} = 39 \\ -N_{pmu} + 2 N_{obs}, & \text{otherwise} \end{cases} \tag{5}$$

Here, $N_{pmu}$ is the total number of installed PMUs and $N_{obs}$ is the total number of observable buses. Before full network observability is reached, each additional PMU incurs a penalty of $-1$ through the $-N_{pmu}$ term, while each bus it newly makes observable contributes $+2$ through the $+2 N_{obs}$ term. This shaped reward gives the agent a consistent optimization direction and avoids blind search in the vast state space.
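A small sketch of Eq. (5) makes the shaping concrete. The function name is hypothetical, and it follows Eq. (5), in which the terminal value replaces the shaped term; Algorithm 1 instead writes the terminal bonus as an addition ("+100 if done").

```python
def reward(n_pmu, n_obs, n_bus=39):
    """Eq. (5): 100 once every bus is observable, otherwise -N_pmu + 2 * N_obs."""
    if n_obs == n_bus:
        return 100
    return -n_pmu + 2 * n_obs


# Hypothetical step: a 4th PMU makes 3 more buses observable (10 -> 13).
before = reward(n_pmu=3, n_obs=10)   # -3 + 20 = 17
after = reward(n_pmu=4, n_obs=13)    # -4 + 26 = 22
print(before, after, after - before)  # 17 22 5
```

The change of +5 is the -1 placement penalty plus +2 for each of the three newly observed buses, so a PMU that observes nothing new is always penalized.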

4. Transition Probability: $P$

Finally, the PMU installation process is deterministic: once the agent decides to install a PMU at node $i$ ($a_t = i$), that node is in the installed state in the next state, that is, $s_{t+1}$ equals $s_t$ with $s_i$ set to 1, and

$$P(s_{t+1} \mid s_t, a_t) = 1 \tag{6}$$

where $P(s_{t+1} \mid s_t, a_t)$ denotes the probability of transitioning to state $s_{t+1}$ after executing action $a_t$ in state $s_t$. Algorithm 1 shows the application of DQN-CA to the IEEE 39-bus system.

Algorithm 1: DQN-CA for Optimal PMU Placement

Input: IEEE 39-bus topology G, N_SEEDS, EP_PER_SEED
B* ← all 39 buses    ▷ initial incumbent
for seed = 1 … N_SEEDS:
    init Q_eval, Q_tar (FC 39 → 128 → 128 → 39 + BN + ReLU), replay buffer D, ε = 0.9
    for ep = 1 … EP_PER_SEED:
        s ← 0
        while n_obs(s) < 39:
            a ← ε-greedy(Q_eval(s) with Checking mask: Q[s = 1] ← −∞)
            s' ← s ∪ {a}; done ← (n_obs(s') = 39)
            r ← −n_pmu(s') + 2·n_obs(s') (+100 if done)
            D.push(s, a, r, s', done)    ▷ replay buffer
            a* = argmax Q_eval(s'); target = r + γ·Q_tar(s', a*)·(1 − done)    ▷ Double DQN
            update Q_eval on MSE(Q_eval(s, a), target); periodically Q_tar ← Q_eval
            s ← s'
        if |s| < |B*|: B* ← buses(s)
return B*
Output: best PMU set B* (minimal |B*| with full observability)
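The Double DQN target in Algorithm 1 can be sketched for a single transition. This is a plain-Python illustration with hypothetical names; the discount factor is passed in explicitly because its value is given in Table 3 rather than in this section, and restricting `a*` to buses without a PMU is an assumption that mirrors the Checking mask rather than a step stated in Algorithm 1.

```python
def double_dqn_target(reward, q_eval_next, q_target_next, next_state, done, gamma):
    """Double DQN target: the online net picks a*, the target net evaluates it.

    Assumption: a* is chosen only among buses without a PMU in s', mirroring
    the Checking mask used for action selection.
    """
    if done:
        return float(reward)  # terminal transition: no bootstrapping
    valid = [i for i, s in enumerate(next_state) if s == 0]
    a_star = max(valid, key=lambda i: q_eval_next[i])  # selection by Q_eval
    return reward + gamma * q_target_next[a_star]      # evaluation by Q_tar


# Bus 1 already has a PMU, so a* is chosen between buses 0 and 2 (a* = 2).
print(double_dqn_target(5.0, [1.0, 9.0, 2.0], [10.0, 20.0, 30.0], [0, 1, 0], False, 0.5))  # 20.0
print(double_dqn_target(100, [0.0], [0.0], [1], True, 0.5))  # 100.0
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from the standard DQN target and reduces overestimation of Q-values.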

3. Architecture and Mechanisms of Dual-Path Spatio-Temporal Multitask Convolutional Graph Networks

3.1. A Dual-Path Spatio-Temporal Multi-Task Convolutional Graph Network Architecture

After sparse transient data have been obtained at the optimal PMU nodes, this paper proposes the ST-MTGCN architecture shown in Figure 2. By decoupling fault-type features from fault-location features, the network mitigates the dilution of classification features by graph topological features caused by complex spatio-temporal coupling in graph networks. Table 1 shows the main hyperparameters of the ST-MTGCN model used for fault diagnosis and localization.

3.1.1. Global Dimension Reduction Path

During power system transients, the fault type (such as single-phase-to-ground or two-phase short-circuit) determines the nature of the sudden change in system energy and the waveform distortion characteristics. These characteristics are universal and should be largely independent of the specific fault location; therefore, this paper uses a global dimension-reduction path to extract fault-type features.

The shared high-dimensional feature tensor output by the temporal encoder is

$$H_{shared} \in \mathbb{R}^{B \times C \times N \times T'} \tag{7}$$

where $B$ is the batch size, $C$ is the number of channels, $N$ is the number of measurement points, and $T'$ is the number of downsampled time steps.

In the global path, $H_{shared}$ is first flattened across the channel, node, and time dimensions into a one-dimensional feature vector. This high-dimensional vector is then sharply reduced in dimension by linear layers equipped with batch normalization and nonlinear activation functions:

$$z_{global} = \sigma\left(W_g \, \mathrm{Flatten}(H_{shared}) + b_g\right) \tag{8}$$

Here, $W_g$ and $b_g$ are the trainable weight matrix and bias vector of the fully connected layer in this path, respectively, and $\sigma$ denotes the ReLU activation function. The output $z_{global} \in \mathbb{R}^{52}$ is a compact 52-dimensional global latent variable. It is fed directly to the auxiliary classification head (Aux S Head) for decoupled fault-type supervision and is also used for global waveform reconstruction.
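Eq. (8) for one sample can be sketched as below. The sketch omits the batch normalization layer and uses nested Python lists instead of tensors; the function names are hypothetical. With the feature shape [B, 32, 13, 30] reported in Algorithm 2, the flattened input to $W_g$ has 32 × 13 × 30 = 12,480 entries per sample, which the path compresses to 52.

```python
def flatten(tensor):
    """Flatten one C x N x T' sample (nested lists) into a 1-D list."""
    return [x for channel in tensor for node in channel for x in node]


def global_path(h_shared, weight, bias):
    """Eq. (8) without batch norm: z = ReLU(W_g * Flatten(H_shared) + b_g)."""
    x = flatten(h_shared)
    if any(len(row) != len(x) for row in weight):
        raise ValueError("each weight row must match the flattened length")
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


# Toy sample with C = 2, N = 1, T' = 2 and a 2-unit output layer (hypothetical).
h = [[[1.0, 2.0]], [[3.0, -4.0]]]
W = [[1, 1, 1, 1], [1, 0, 0, 0]]
b = [0.0, -5.0]
print(global_path(h, W, b))  # [2.0, 0.0]
print(32 * 13 * 30)          # 12480 flattened features for the paper's shape
```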

3.1.2. Paths for Spatial Graph Aggregation

Unlike the fault type, the fault location depends on the physical topology of the power grid [38]. The amplitude of transient voltage sags decays spatially along the electrical lines, and this decay gradient is itself a feature for locating the fault source [39,40,41]. Since the optimal buses obtained by DQN-CA are generally scattered across the physical network, this paper designs a spatial graph aggregation path that uses the topological awareness of graph neural networks (GNNs) to let the sparse measurement points exchange information along the physical lines.

To meet the GNN input requirements, $H_{shared}$ is first reshaped by folding its temporal and channel dimensions into the node feature dimension, giving the graph node feature matrix

$$H_{node} \in \mathbb{R}^{N \times F_{in}} \tag{9}$$

where $F_{in}$ is the initial feature dimension of a single node. In addition, a physical adjacency matrix is constructed from the physical line parameters of the power system and the electrical interconnections between PMU nodes:

$$A \in \mathbb{R}^{N \times N} \tag{10}$$

Furthermore, to address the degree imbalance that traditional GCNs encounter on irregular graphs [42], we employ the Graph Sample and Aggregate (GraphSAGE) operator, which is based on a sampling-and-aggregation strategy. Three GraphSAGE convolutional layers are cascaded within the spatial graph path. In the $k$-th graph convolution layer, node $v$ first aggregates the hidden features of its first-order physical neighbors $\mathcal{N}(v)$ and then concatenates the result with its own features from the previous layer. This information propagation is defined as

$$h_{\mathcal{N}(v)}^{(k)} = \mathrm{AGG}\left(\{h_u^{(k-1)}, \forall u \in \mathcal{N}(v)\}\right) \tag{11}$$

$$h_v^{(k)} = \sigma\left(W^{(k)} \left[h_v^{(k-1)} \,\|\, h_{\mathcal{N}(v)}^{(k)}\right]\right) \tag{12}$$

Here, $h_v^{(k)}$ denotes the feature representation of node $v$ at layer $k$; $\mathrm{AGG}(\cdot)$ is a mean aggregator that captures the average voltage drop trend within the local topology; $\|$ denotes tensor concatenation; and $W^{(k)}$ is the graph convolution weight matrix of layer $k$. After three layers of feature propagation, each measurement point integrates transient fluctuation information from its third-order topological neighborhood:

$$h_0 = \phi\left(\mathrm{Linear}(H_{node})\right), \quad h_0 \in \mathbb{R}^{B \times 13 \times 128} \tag{13}$$

$$h_1 = \mathrm{SAGE}_1(h_0, A), \quad h_2 = \mathrm{SAGE}_2(h_1, A), \quad h_3 = \mathrm{SAGE}_3(h_2, A) \tag{14}$$

where $\phi(\cdot)$ is a nonlinear activation, $h_3 \in \mathbb{R}^{B \times 13 \times 384}$, and $A$ is the normalized adjacency matrix of the PMU topology. Finally, graph-level global mean pooling aggregates the features of all nodes into a 384-dimensional graph representation:

$$g = \frac{1}{N} \sum_{v=1}^{N} h_v^{(3)} \tag{15}$$

The feature $g \in \mathbb{R}^{384}$ captures the spatial decay gradient of voltage sags during grid faults and provides topological support for the subsequent localization task.
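A plain-Python sketch of one mean-aggregator GraphSAGE layer (Eqs. (11) and (12)) followed by the mean pooling of Eq. (15). The three-bus path, one-dimensional features, and hand-set weights are hypothetical; bias terms, neighbor sampling, and adjacency normalization are omitted, and sending a zero message to isolated nodes is an assumption.

```python
def sage_mean_layer(h, neighbors, weight):
    """One GraphSAGE layer with a mean aggregator and ReLU, Eqs. (11)-(12).

    h: list of node feature vectors (N lists of length F)
    neighbors: dict mapping node index -> list of first-order neighbors
    weight: F_out rows of length 2F, applied to [h_v || h_N(v)]
    """
    out = []
    for v, h_v in enumerate(h):
        nbrs = neighbors.get(v, [])
        if nbrs:  # Eq. (11): element-wise mean over neighbor features
            agg = [sum(h[u][f] for u in nbrs) / len(nbrs) for f in range(len(h_v))]
        else:     # isolated node: zero neighbor message (assumption)
            agg = [0.0] * len(h_v)
        concat = h_v + agg  # list concatenation implements the || operator
        out.append([max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weight])
    return out


def mean_pool(h):
    """Eq. (15): g = (1/N) * sum over nodes of h_v."""
    return [sum(col) / len(h) for col in zip(*h)]


# Hypothetical 3-bus path 0-1-2 with scalar features and W = [1, 1].
nbrs = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 1.0]]
h1 = sage_mean_layer([[1.0], [2.0], [3.0]], nbrs, W)
h3 = sage_mean_layer(sage_mean_layer(h1, nbrs, W), nbrs, W)  # three layers in total
print(h1)             # [[3.0], [4.0], [5.0]]
print(mean_pool(h3))  # [16.0]
```

After three stacked layers, every node has absorbed information from up to three hops away, which is the third-order neighborhood described above.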

3.1.3. Multitasking and Auxiliary Supervision Head

The global features and the graph features are then concatenated:

$$f = [z_{global}; g], \quad f \in \mathbb{R}^{B \times (52 + 384)} = \mathbb{R}^{B \times 436} \tag{16}$$

An MLP head then performs the joint fault localization task:

$$o_{SD} = \mathrm{Head}_{SD}(f), \quad o_{SD} \in \mathbb{R}^{B \times SD} \tag{17}$$

where $SD$ denotes the total number of joint (fault type, fault location) classes (185 in this study). The fused features are fed into separate prediction heads and reconstruction heads; the main classifier output yields the joint prediction $\hat{y}_{SD}$, which encodes both the fault location and the fault type. In addition, to prevent the dominant 384-dimensional spatial features from overshadowing the weaker 52-dimensional classification features during backpropagation, an independent auxiliary classification head is attached to $z_{global}$. It outputs a fault-type prediction $\hat{y}_S$, balancing the two feature representations in the latent space.
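Because every joint class corresponds to one (fault type, fault location) pair, decoding the main-head output reduces to two table lookups, as in the last line of Algorithm 2. The sketch below is hypothetical: the 185 valid pairs are not listed in this section, so the example uses three made-up pairs, and the sorted ordering of class indices is an assumption.

```python
def build_joint_label_maps(valid_pairs):
    """Assign each valid (fault_type, location) pair a joint S x D class index.

    Returns (pair_to_sd, sd2type, sd2loc) so that a joint prediction can be
    decoded back into its fault type and fault location.
    """
    pairs = sorted(set(valid_pairs))
    pair_to_sd = {pair: k for k, pair in enumerate(pairs)}
    sd2type = [t for t, _ in pairs]
    sd2loc = [d for _, d in pairs]
    return pair_to_sd, sd2type, sd2loc


def decode(logits, sd2type, sd2loc):
    """pr_SD = argmax logits; pr_S = sd2type[pr_SD]; pr_D = sd2loc[pr_SD]."""
    pr_sd = max(range(len(logits)), key=logits.__getitem__)
    return pr_sd, sd2type[pr_sd], sd2loc[pr_sd]


# Hypothetical pairs; the real table would contain 185 entries.
pair_to_sd, sd2type, sd2loc = build_joint_label_maps(
    [("GT", "G30"), ("AG", "Bus4"), ("AG", "Bus3")])
print(pair_to_sd)                                # {('AG', 'Bus3'): 0, ('AG', 'Bus4'): 1, ('GT', 'G30'): 2}
print(decode([0.1, 2.0, -1.0], sd2type, sd2loc))  # (1, 'AG', 'Bus4')
```

Using only valid pairs, rather than a full Cartesian product of types and locations, keeps impossible combinations (for example, a generator trip at a bus without a generator) out of the label space.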

3.1.4. Multi-Task Joint Loss Functions and Physical Reconstruction

To prevent the model from overfitting in the $SD$ joint localization and classification task, this paper introduces an unsupervised regularization mechanism based on physical waveform reconstruction. A global decoder and a spatio-temporal decoder are attached to the ends of the two paths, reconstructing the original transient waveforms $\hat{X}_{global}$ and $\hat{X}_{node}$ from the global features $z_{global}$ and the spatial graph features $g$, respectively. Training is driven by a multi-objective joint loss. The total loss $L_{total}$ consists of the main classification cross-entropy loss $L_{SD}$, the auxiliary classification cross-entropy loss $L_S$, and the two reconstruction mean squared error (MSE) losses:

$$L_{total} = \lambda_{SD} L_{SD} + \lambda_{aux} L_S + \lambda_g \|X - \hat{X}_{global}\|_2^2 + \lambda_n \|X - \hat{X}_{node}\|_2^2 \tag{18}$$

The task weights were set to $\lambda_{SD} = 1.0$, $\lambda_{aux} = 0.3$, $\lambda_g = 0.15$, and $\lambda_n = 0.15$. The model was trained with the AdamW optimizer combined with cosine annealing and gradient clipping so that the dual-path spatio-temporal architecture converges stably on the complex nonlinear loss surface. Table 1 lists the main hyperparameters of the ST-MTGCN model.
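The weighting in Eq. (18) can be sketched for a single sample as follows. The helper names are hypothetical; the reconstruction terms are computed as mean squared errors, matching the "MSE" wording in the text and in Algorithm 2, whereas Eq. (18) writes squared L2 norms, which differ only by a constant factor for a fixed signal length.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def total_loss(l_sd, l_s, mse_g, mse_n,
               lam_sd=1.0, lam_aux=0.3, lam_g=0.15, lam_n=0.15):
    """Eq. (18) with the task weights reported in the text."""
    return lam_sd * l_sd + lam_aux * l_s + lam_g * mse_g + lam_n * mse_n


# Hypothetical values for one sample.
l_sd = cross_entropy([2.0, 0.5, -1.0], target=0)  # joint S x D head
l_s = cross_entropy([1.0, 1.0], target=1)         # auxiliary type head, equals log 2
x = [1.0, 2.0, 3.0]
print(total_loss(l_sd, l_s, mse(x, [1.0, 2.0, 2.0]), mse(x, [0.0, 2.0, 3.0])))
```

With these weights the joint classification term dominates, while the auxiliary and reconstruction terms act as regularizers on the two feature paths.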

Algorithm 2 Pseudocode for the Dual-Path Spatio-Temporal Multi-Task Convolutional Graph Network (ST-MTGCN).

Algorithm 2: ST-MTGCN Training

Input: X ∈ R^{B × 3 × 13 × 30}, A_norm ∈ R^{13 × 13}, y_SD ∈ {0, …, 184}
for epoch = 1 … 150:
    lr ← warmup (5 epochs), then cosine decay to 1 × 10⁻⁶
    for (X, y_SD) in loader:
        y_S = sd2type[y_SD]    ▷ fault-type label for the auxiliary head
        feat = Encoder_GN(X)    ▷ [B, 32, 13, 30]
        z_glob = Linear(flatten(feat)); rec_G = Decoder_GN(z_glob)    ▷ z_glob: [B, 52]
        aux_S = aux_S_head(z_glob)    ▷ [B, 6]
        h_nodes = SAGE × 3(NodeProj(feat), A_norm)    ▷ [B, 13, 384]
        graph_feat = mean_N(h_nodes)    ▷ [B, 384]
        rec_N = STDecoder(h_nodes)
        logits_SD = SDHead([z_glob ‖ graph_feat])    ▷ [B, 185]
        L = CE(logits_SD, y_SD) + 0.3·CE(aux_S, y_S) + 0.15·(MSE_G + MSE_N)
        AdamW step; gradClip = 1.0; EMA update (decay = 0.998)
    if val_SxD(EMA) > best: best ← val_SxD(EMA); save θ*
Inference: pr_SD = argmax logits_SD; pr_S = sd2type[pr_SD]; pr_D = sd2loc[pr_SD]
Output: θ* maximizing validation S × D accuracy
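Two training details in Algorithm 2, the warm-up plus cosine learning-rate schedule and the exponential moving average (EMA) of the weights, can be sketched as follows. The linear shape of the warm-up and the peak learning rate `lr_max` are assumptions (the base rate is listed in Table 1, not in this section), and the EMA is shown on a flat list of parameters for illustration.

```python
import math


def lr_at_epoch(epoch, lr_max, total_epochs=150, warmup_epochs=5, lr_min=1e-6):
    """Linear warm-up for `warmup_epochs`, then cosine annealing down to lr_min.

    `epoch` is 1-based, matching `for epoch = 1 ... 150` in Algorithm 2.
    """
    if epoch <= warmup_epochs:
        return lr_max * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_params, params, decay=0.998):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta, element-wise."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# Hypothetical peak learning rate of 1e-3.
for ep in (1, 5, 6, 78, 150):
    print(ep, lr_at_epoch(ep, 1e-3))
print(ema_update([0.0, 1.0], [1.0, 1.0]))
```

Validating with the EMA weights, as Algorithm 2 does, smooths out epoch-to-epoch noise in the raw parameters, which is why the checkpoint θ* is selected on the EMA model.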

3.2. Fault Data Generation Method

Fault identification, classification, and localization require a comprehensive fault dataset, and current research indicates that few open datasets cover multiple types and locations of power system faults. We therefore adopted the widely used IEEE 39-bus system, operating at a power frequency of 60 Hz, as the physical environment and used the electromagnetic transient (EMT) simulation capabilities of DIgSILENT PowerFactory. The simulation time range is [−0.05 s, +0.2 s]; the fault is injected at $t = 0.02$ s and cleared at $t = 0.08$ s (fault duration: 60 ms). The simulation step size is 0.1 ms, corresponding to a sampling rate of 10 kHz (10,000 points per second). Each time window contains 30 sample points, covering an effective time span of 3 ms, which is used to characterize the transient waveform. For each sample, the active and reactive loads were independently perturbed before EMT simulation, and an AC load flow calculation was re-executed to obtain a valid pre-fault operating point. The training and test sets were generated and stored separately, and no fault scenario or operating-condition sample was shared between them. The test set therefore evaluates the model under unseen operating conditions rather than memorized load snapshots.
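The timing parameters above can be checked with a short worked calculation that uses only the stated values. It shows that the 30-sample window spans about 18% of one 60 Hz cycle and 5% of the 60 ms fault duration.

```latex
\begin{aligned}
\Delta t &= \frac{1}{f_s} = \frac{1}{10\,000\ \text{Hz}} = 0.1\ \text{ms} \\
T_w &= 30\,\Delta t = 3\ \text{ms} \\
T_{\text{sim}} &= 0.2\ \text{s} - (-0.05\ \text{s}) = 0.25\ \text{s} = 2500\,\Delta t \\
T_{\text{fault}} &= 0.08\ \text{s} - 0.02\ \text{s} = 60\ \text{ms} = 600\,\Delta t,
  \qquad \frac{T_w}{T_{\text{fault}}} = \frac{3}{60} = 0.05 \\
T_{60\,\text{Hz}} &= \frac{1}{60\ \text{Hz}} \approx 16.67\ \text{ms},
  \qquad \frac{T_w}{T_{60\,\text{Hz}}} \approx \frac{3}{16.67} \approx 0.18
\end{aligned}
```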

The main simulation parameters for fault data generation are summarized in Table 2.

4. Experiments and Analysis of Results

4.1. Data Generation and Preprocessing

As shown in Algorithm 2, the data partitioning employs a category-balanced fixed-quota strategy, performing high-precision dynamic time-domain simulations while ensuring that the power flow calculations converge. To obtain a rich fault dataset with a consistent sample organization, the raw records were first sorted by ‘fault_type’, ‘fault_Localization’, ‘load_condition’, and ‘timestep’, and then sliced at fixed time steps into single-sample inputs of the network input shape. At the label level, an S × D joint modeling strategy was adopted, encoding each combination of fault type and fault location as one joint class. Six fault types are considered: phase-A-to-ground fault (AG), phase-AB short-circuit fault (AB), phase-BC-to-ground fault (BCG), three-phase-to-ground fault (ABCG), generator trip (GT), and load disconnection (LO). The total number of joint classes is 185. Each (S_i, D_j) combination has 40 training samples and 10 test samples, giving 7400 training samples and 1850 test samples in total.
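The sort-and-slice step can be sketched as follows. This is a hypothetical illustration: the record layout, the use of non-overlapping 30-step windows, and the dropping of incomplete trailing windows are assumptions, and the sort keys are copied from the text as written.

```python
from itertools import groupby

# Sort keys quoted in Section 4.1; the last key orders rows within a scenario.
SORT_KEYS = ("fault_type", "fault_Localization", "load_condition", "timestep")


def scenario_key(record):
    """Identify one simulated scenario: (fault type, fault location, load condition)."""
    return tuple(record[k] for k in SORT_KEYS[:3])


def slice_records(records, window=30):
    """Sort raw rows, then cut each scenario into non-overlapping windows.

    Assumption: trailing rows that do not fill a whole window are dropped.
    """
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in SORT_KEYS))
    samples = []
    for key, rows in groupby(ordered, key=scenario_key):
        rows = list(rows)
        for start in range(0, len(rows) - window + 1, window):
            samples.append((key, rows[start:start + window]))
    return samples


# Hypothetical example: one scenario with 65 time steps yields two 30-step samples.
rows = [{"fault_type": "AG", "fault_Localization": 16, "load_condition": 0, "timestep": t}
        for t in range(65)]
samples = slice_records(list(reversed(rows)))
print(len(samples), samples[0][1][0]["timestep"], samples[1][1][0]["timestep"])  # 2 0 30

# Dataset size check from Section 4.1: 185 joint classes x (40 train + 10 test).
print(185 * 40, 185 * 10)  # 7400 1850
```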

4.2. Results of Optimal PMU Configuration Based on the DQN-CA Algorithm

4.2.1. Training Dynamics and OPT13 Configuration of DQN-CA

In this section, we validate the OPP solutions obtained by the DQN-CA algorithm on the IEEE 39-bus system. As shown in Figure 3, a dense reward function is designed and a Checking Action mechanism is embedded at the front end of the network's decision layer. By setting the Q-values of actions on buses that already host a PMU to a very large negative value, this mechanism filters out redundant and invalid actions before they are executed, significantly narrowing the search space. Table 3 shows the training parameters of the DQN-CA algorithm.

To address the convergence challenges of reinforcement learning in this discrete, sparse-reward setting, the training process incorporates GPU parallel acceleration at the engineering level: $N_{ENVS} = 64$ parallel environments evolve multiple episodes synchronously; a very large batch size ($Batch\_Size = 2048$) improves the accuracy of gradient estimation; and the experience replay buffer is expanded to 100,000 entries. Furthermore, to prevent a single network from becoming trapped in a local suboptimum, 50 independent random seeds were run concurrently, with a limit of 2000 training episodes per seed.

To illustrate the learning behavior of DQN-CA in high-dimensional combinatorial spaces, Figure 4 details the training dynamics of the best-performing random seed (seed = 1) over 2000 episodes. The light-colored trace represents the raw episode data, and the dark solid line represents the 30-episode moving average (30-ep MA). As the reward curve on the left shows, during the early training phase (episodes 0–300) the exploration trajectory is highly uncertain because the agent has no prior knowledge of the grid topology, resulting in large variance in the total reward. As the experience replay buffer accumulates data and the network parameters are updated, the model gradually learns the mapping between placement actions and network observability. After 800 episodes, the fluctuation of the moving-average reward narrows significantly, and after 1000 episodes the curve converges to a stable plateau. This smooth convergence suggests that the Checking Action (action masking) mechanism prevents computation from being wasted on invalid actions and supports stable convergence of the Q-network.

The results show that the optimal configuration found by DQN-CA requires only 13 PMUs, i.e., PMUs at 33% of the buses (13/39). A post-verification that calculates node currents using Kirchhoff's laws confirmed that this 13-bus scheme covers all buses of the IEEE 39-bus system, achieving 100% observability of the network. This OPT13 configuration, with PMUs at buses 2, 6, 9, 10, 12, 14, 17, 19, 20, 22, 23, 25, and 29, serves as the baseline input for dual-path spatio-temporal graph network diagnosis.
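The full-observability claim can be reproduced with the neighbor rule of Section 2.2. In this sketch the branch list is transcribed from the standard IEEE 39-bus (New England) test case (for example, MATPOWER's case39) and is an assumption to be checked against the PowerFactory model used here; the script also confirms that no single PMU in OPT13 is redundant, though it does not prove that 13 is the global minimum.

```python
# IEEE 39-bus branches (lines and transformers), 1-indexed buses; transcribed
# from the standard test case and assumed to match the model used in the paper.
IEEE39_BRANCHES = [
    (1, 2), (1, 39), (2, 3), (2, 25), (2, 30), (3, 4), (3, 18), (4, 5),
    (4, 14), (5, 6), (5, 8), (6, 7), (6, 11), (6, 31), (7, 8), (8, 9),
    (9, 39), (10, 11), (10, 13), (10, 32), (12, 11), (12, 13), (13, 14),
    (14, 15), (15, 16), (16, 17), (16, 19), (16, 21), (16, 24), (17, 18),
    (17, 27), (19, 20), (19, 33), (20, 34), (21, 22), (22, 23), (22, 35),
    (23, 24), (23, 36), (25, 26), (25, 37), (26, 27), (26, 28), (26, 29),
    (28, 29), (29, 38),
]
OPT13 = [2, 6, 9, 10, 12, 14, 17, 19, 20, 22, 23, 25, 29]
ALL_BUSES = set(range(1, 40))


def observed_buses(pmu_buses, branches):
    """A PMU observes its own bus and every bus directly connected to it."""
    observed = set(pmu_buses)
    for u, v in branches:
        if u in pmu_buses:
            observed.add(v)
        if v in pmu_buses:
            observed.add(u)
    return observed


unobserved = ALL_BUSES - observed_buses(set(OPT13), IEEE39_BRANCHES)
print(len(IEEE39_BRANCHES), sorted(unobserved))  # 46 []

# Irredundancy check: dropping any single PMU leaves at least one bus unobserved.
redundant = [b for b in OPT13
             if observed_buses(set(OPT13) - {b}, IEEE39_BRANCHES) == ALL_BUSES]
print(redundant)  # []
```

Many OPT13 buses sit next to a radial generator bus (for example, bus 20 is the only neighbor of bus 34), which explains why each of those PMUs is indispensable under this rule.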

4.2.2. OPP for IEEE 39 Bus Test System

To evaluate the effectiveness of the proposed DQN-CA-based optimal PMU placement strategy, four OPP solution methods were compared on the IEEE 39-bus system: greedy search, BPSO, conventional DQN, and the proposed DQN-CA. Figure 5 shows how the best-found PMU count evolves during optimization. A lower PMU count indicates a more compact feasible PMU configuration under the same observability constraint.

As shown in Figure 5, all methods exhibit a stepwise decreasing trend, indicating that better feasible PMU placement schemes are gradually discovered during the search process. However, clear differences can be observed in terms of convergence speed and final solution quality. The conventional DQN method converges to an 18-PMU configuration and fails to further reduce the PMU number, which indicates that redundant or invalid action exploration may limit its search efficiency in the discrete OPP space. BPSO finally obtains a 14-PMU configuration, while Greedy search also converges to a 14-PMU solution after several updates. In contrast, the proposed DQN-CA obtains a 13-PMU configuration and maintains a stable solution after convergence.

The improved performance of DQN-CA can be attributed to the embedded Checking-Action mechanism. During action selection, DQN-CA masks invalid and redundant placement actions, such as placing a PMU on a bus that already has one (Table 3). This reduces ineffective exploration and makes the search of the high-dimensional discrete placement space more efficient. The resulting OPT13 configuration therefore provides a more compact sparse measurement layout for the subsequent fault diagnosis and localization task.
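Table 3 describes the Checking-Action mechanism as masking installed buses. One way to realize this is to set the Q-values of masked actions to negative infinity before the argmax. The sketch below does this under its own assumptions: buses are indexed 1–39 to match the 39 output neurons, and the mask is applied to both greedy and random (exploratory) action selection. The function names are hypothetical.

```python
import math
import random

NUM_BUSES = 39  # one Q-network output per bus (Table 3)


def valid_actions(installed):
    """Buses that do not yet carry a PMU; installed buses are masked out."""
    return [bus for bus in range(1, NUM_BUSES + 1) if bus not in installed]


def masked_greedy_action(q_values, installed):
    """Argmax of Q over unmasked buses; q_values[k] scores bus k + 1."""
    masked = [-math.inf if bus in installed else q
              for bus, q in enumerate(q_values, start=1)]
    return max(range(1, NUM_BUSES + 1), key=lambda bus: masked[bus - 1])


def masked_epsilon_greedy(q_values, installed, epsilon, rng):
    """Epsilon-greedy selection in which exploration also respects the mask."""
    candidates = valid_actions(installed)
    if not candidates:
        raise ValueError("every bus already has a PMU")
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore only among valid buses
    return masked_greedy_action(q_values, installed)


q = [0.0] * NUM_BUSES
q[15], q[1] = 5.0, 3.0  # hypothetical Q-values: bus 16 highest, bus 2 second
print(masked_greedy_action(q, installed={16}))  # 2
```

Masking at selection time means the agent never spends an episode step, or a replay-buffer slot, on a placement that cannot change observability. That is the efficiency argument made in the paragraph above.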

4.3. Classification of Fault Types and Fault Localization

4.3.1. Comparative Analysis of Results

To further verify the effectiveness of ST-MTGCN, it was compared with representative baselines from traditional machine learning, temporal deep learning, and graph neural networks: SVM, CNN, GCN, GraphSAGE, LSTM, and Transformer-GNN. All models were evaluated under the same OPT13 PMU configuration. Table 4 reports the accuracy of each model on the three tasks: fault type, localization, and S × D joint recognition.
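Table 2 lists 68 candidate locations but only 185 joint classes, so only some (S, D) combinations are valid. One split consistent with both numbers is 4 × 39 + 10 + 19 = 185: the four short-circuit types at the 39 buses, GT at the 10 generator locations, and LO at the 19 load locations. This assignment is our inference from the error discussion in Section 4.3.3 (GT confusions among G04–G06, LO errors across loads), not something stated explicitly, and the location identifiers below are hypothetical. The sketch also shows how the three accuracies in Table 4 relate. A sample counts toward joint accuracy only when both type and location are correct, so joint accuracy can never exceed either single-task accuracy.

```python
SHORT_CIRCUIT = ["AG", "BCG", "AB", "ABCG"]
BUSES = [f"Bus{i:02d}" for i in range(1, 40)]       # hypothetical location ids
GENERATORS = [f"G{i:02d}" for i in range(1, 11)]
LOADS = [f"Load{i:02d}" for i in range(1, 20)]

# Assumed valid (S, D) pairs: short circuits at buses, GT at generators, LO at loads.
VALID_PAIRS = ([(s, d) for s in SHORT_CIRCUIT for d in BUSES]
               + [("GT", d) for d in GENERATORS]
               + [("LO", d) for d in LOADS])
JOINT_INDEX = {pair: k for k, pair in enumerate(VALID_PAIRS)}  # joint class ids 0..184


def task_accuracies(y_true, y_pred):
    """Fault type, location, and joint accuracy from (type, location) pairs."""
    n = len(y_true)
    type_acc = sum(t[0] == p[0] for t, p in zip(y_true, y_pred)) / n
    loc_acc = sum(t[1] == p[1] for t, p in zip(y_true, y_pred)) / n
    joint_acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return type_acc, loc_acc, joint_acc


y_true = [("AG", "Bus05"), ("BCG", "Bus12"), ("GT", "G04"), ("LO", "Load03")]
y_pred = [("AG", "Bus05"), ("ABCG", "Bus12"), ("GT", "G05"), ("LO", "Load03")]
print(len(JOINT_INDEX), task_accuracies(y_true, y_pred))  # 185 (0.75, 0.75, 0.5)
```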

As shown in Table 4, ST-MTGCN achieves the best overall performance, with 99.68% fault type accuracy, 89.94% localization accuracy, and 88.62% S × D joint accuracy. Although LSTM matches ST-MTGCN in fault type accuracy, its localization and joint accuracies are 7.94 and 7.27 percentage points lower, respectively. This indicates that temporal sequence modeling alone is insufficient for accurate topological fault localization under sparse PMU measurements.

ST-MTGCN also shows clear advantages over the graph-based baselines. GCN and GraphSAGE reach S × D joint accuracies of 78.32% and 80.21%, respectively, whereas ST-MTGCN reaches 88.62%. This suggests that simple neighbor aggregation is not sufficient to distinguish topology-sensitive fault locations under sparse observations. Relative to Transformer-GNN, ST-MTGCN improves the joint accuracy by 5.30 percentage points, further supporting the effectiveness of the proposed dual-path spatio-temporal multi-task graph learning architecture.
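The percentage-point gaps quoted in this subsection follow directly from Table 4. The snippet below recomputes the S × D joint-accuracy gap between ST-MTGCN and every baseline, so that all comparisons use the same reference.

```python
# S x D joint accuracy (%) under OPT13, copied from Table 4
TABLE4_JOINT = {
    "SVM": 70.49, "CNN": 78.01, "GCN": 78.32, "GraphSAGE": 80.21,
    "LSTM": 81.35, "Transformer-GNN": 83.32, "ST-MTGCN": 88.62,
}


def gaps_to(reference, table):
    """Percentage-point gap between `reference` and every other model."""
    ref = table[reference]
    return {model: round(ref - acc, 2) for model, acc in table.items() if model != reference}


for model, gap in sorted(gaps_to("ST-MTGCN", TABLE4_JOINT).items(), key=lambda kv: kv[1]):
    print(f"{model:16s} {gap:6.2f} pp")
```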

These results indicate that ST-MTGCN better decouples fault-type-related features from topology-sensitive fault-location features, thereby improving joint recognition of fault type and location.

4.3.2. Fault Type Identification and Evaluation

A. t-SNE Latent Space Clustering Analysis

To validate the feature extraction of the global path in the dual-path decoupled architecture, Figure 6 presents a t-SNE dimension-reduction visualization of the 6-dimensional normalized probability features output by the auxiliary classification head, and Figure 7 presents the confusion matrix for fault type classification.

In Figure 6, data points are colored by their true fault labels. In the two-dimensional embedding, the six fault types form compact, well-separated clusters: samples of the same category lie close together, and clear margins separate the clusters of different categories, with little visible overlap. Notably, the two special fault types, transformer grounding (GT) and loss of excitation/loss of synchronism (LO), occupy isolated regions of the embedding. This is consistent with their electromagnetic transient behavior, which differs from that of conventional busbar short circuits [43], and with established power system dynamics [44]. These observations suggest that the global dimension-reduction path, together with auxiliary supervision, suppresses interference from topology-specific localization features and thereby enables accurate extraction of fault-type characteristics.

B. Analysis of the Confusion Matrix

The confusion matrix in Figure 7 quantifies classification performance across the six fault types. The matrix is strongly diagonal, with an overall classification accuracy of 99.68%. The model identified transformer ground faults (GT) without error. For the four conventional asymmetric and symmetric short-circuit faults (AG, BCG, AB, and ABCG), misclassifications were rare and occurred at class boundaries: of 390 BCG test samples, only 2 were misclassified as ABCG, and of 390 AB samples, only 2 were misclassified as LO. These results suggest that the shared temporal encoding module provides an effective representation of the transient dynamics.
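The confusion matrix in Figure 7 summarizes counts per true class. Overall accuracy is the trace divided by the total, while per-class recall and precision normalize the diagonal by rows and by columns, respectively. The 3-class matrix below is hypothetical and only illustrates these computations; it is not data from Figure 7.

```python
def per_class_recall(cm):
    """Row-normalised diagonal: cm[i][j] counts true class i predicted as class j."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def per_class_precision(cm):
    """Column-normalised diagonal."""
    return [cm[j][j] / sum(row[j] for row in cm) for j in range(len(cm))]


def overall_accuracy(cm):
    """Correct predictions (trace) over all predictions."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


toy_cm = [[98, 2, 0],   # hypothetical counts, not data from Figure 7
          [0, 100, 0],
          [1, 0, 99]]
print(per_class_recall(toy_cm), overall_accuracy(toy_cm))
```

Recall answers "how many samples of this fault type were found", while precision answers "how trustworthy a prediction of this type is". A class can have perfect recall but lower precision if other classes are occasionally predicted as it, as class 1 shows here.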

4.3.3. Assessment of Fault Localization Capabilities

This section focuses on the more challenging task of fault localization, which requires the model to identify the fault source among 68 candidate locations across the network (39 buses, 10 generators, and 19 low-voltage loads). To this end, Figure 8, Figure 9 and Figure 10 present localization confusion matrices plotted separately for each fault type, together with per-category t-SNE visualizations of the feature tensors taken before the fully connected layer. This analysis aims to reveal the limits of ST-MTGCN and the physical and topological mechanisms underlying its errors.

As shown in Figure 8, for the four types of conventional line short-circuit faults (AG, BCG, AB, and ABCG), the model achieves localization accuracies of 86.41%, 95.38%, 81.28%, and 90.00%, respectively. The corresponding confusion matrices show that, in the core region of the main grid (Bus 1 to Bus 29), predictions are concentrated on the main diagonal and off-diagonal errors are close to zero. This indicates that, in well-connected and topologically balanced areas of the main grid, ST-MTGCN can capture the spatial decay of transient voltage sags at unobserved nodes. It does so using a GraphSAGE aggregation operator with $K_{\text{hop}} = 3$ and only the 13 sparse PMU measurements selected by DQN-CA, and thereby encodes the electrical distance gradient across the grid in its latent space.

However, the lower-right corner of the confusion matrices for all short-circuit faults (Bus 30 to Bus 39) shows a common block-wise confusion pattern. Physically, Buses 30–39 form the generator-side subnetwork of the IEEE 39-bus system. Most of these generator buses are attached to the transmission network through a single step-up transformer branch, so they share a very similar local structure and lie at short electrical distances from their terminal buses. Moreover, OPT13 places no PMU on Buses 30–39, so under the global observability constraint this region is observed only indirectly. From the Laplacian smoothing view of graph neural networks [45,46], this combination of structural similarity and scarce observation points is expected to cause over-smoothing [32] when the network aggregates neighborhood information in this region.
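The over-smoothing argument can be illustrated with a toy example; the sketch below does not reproduce the model's layers. Real GraphSAGE layers keep a separate self term and apply learned weights and nonlinearities. Here, as a simplifying assumption, each hop replaces a node's feature with the plain mean over the node and its neighbors on a hypothetical 4-node path graph. Differences between nodes shrink with every hop. After three hops (the $K_{\text{hop}} = 3$ used in ST-MTGCN), the spread has already fallen from 1.0 to about 0.29, and with many hops the nodes become practically indistinguishable.

```python
def mean_aggregate(features, adj):
    """One hop of averaging over each node and its neighbours (no weights, no nonlinearity)."""
    return [
        (features[v] + sum(features[u] for u in adj[v])) / (1 + len(adj[v]))
        for v in range(len(features))
    ]


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical 4-node path graph
x = [1.0, 0.0, 0.0, 0.0]                      # a single "disturbed" node
spreads = [max(x) - min(x)]
for _ in range(50):
    x = mean_aggregate(x, adj)
    spreads.append(max(x) - min(x))

print([round(s, 3) for s in spreads[:4]])  # [1.0, 0.5, 0.417, 0.292]
```

In the paper's setting the analogous effect is that sparsely observed, structurally similar generator buses receive nearly the same aggregated representation, which is what the block-wise confusion in the Bus 30–39 region suggests.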

As shown in Figure 9, in the per-type t-SNE projections of the four short-circuit fault types, samples from Buses 30–39 collapse into a dense central cluster. This overlap in the latent space is consistent with, and helps explain, the lower localization accuracy within this subnetwork [47].

As shown in Figure 9 and Figure 10, the two special fault types, transformer ground fault (GT) and loss of excitation/loss of synchronism (LO), show error patterns that differ from those of conventional short circuits. For GT faults (localization accuracy 86.00%), the confusion matrix shows that misclassifications are concentrated within local clusters of equipment, such as cross-classification among G04, G05, and G06. For LO faults (localization accuracy 79.47%), the t-SNE projection shows poorly separated LO clusters in the latent space, and the confusion matrix shows errors spread widely across the low-voltage-side loads. Physically, the electromechanical oscillations triggered by loss of excitation and loss of synchronism spread widely across the network [48]. The measured signatures are therefore less concentrated near the fault source, which makes the source harder to localize.

4.4. Ablation Study

4.4.1. PMU Configuration Sensitivity Analysis

To investigate the influence of PMU placement on fault localization performance, different PMU configurations were tested using the same ST-MTGCN model. The compared configurations include a random 5-PMU configuration, a sequential 13-PMU configuration, the DQN-CA-optimized OPT13 configuration, and the full 39-PMU configuration. The full 39-PMU configuration is regarded as the upper-bound measurement condition.

As shown in Figure 11, the random 5-PMU configuration achieves only 78.54% S × D joint accuracy due to insufficient system observability. When the number of PMUs is increased to 13 using a sequential but unoptimized configuration, the joint accuracy improves to 83.63%. However, under the same number of PMUs, the DQN-CA-optimized OPT13 configuration achieves 88.62% joint accuracy, which is 4.99 percentage points higher than the sequential 13-PMU configuration.

The full 39-PMU configuration achieves the highest joint accuracy (90.76%) but requires a PMU at every bus. In contrast, OPT13 uses only 13 PMUs, one-third of the full measurement configuration, while coming within 2.14 percentage points of the full-measurement upper bound. This result indicates that the DQN-CA placement strategy provides an effective sparse measurement configuration for ST-MTGCN, achieving a favorable balance between measurement cost and localization accuracy.
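The cost-accuracy trade-off in Figure 11 can be summarized by two ratios per configuration: the share of buses carrying a PMU and the fraction of the full-measurement joint accuracy that is retained. The snippet computes both from the reported numbers. OPT13 uses 33.3% of the PMUs while retaining about 97.6% of the upper-bound accuracy.

```python
# (number of PMUs, S x D joint accuracy in %) as reported for Figure 11
CONFIGS = {
    "Random-5": (5, 78.54),
    "Sequential-13": (13, 83.63),
    "OPT13": (13, 88.62),
    "Full-39": (39, 90.76),
}
N_BUSES = 39
_, upper = CONFIGS["Full-39"]  # full-measurement upper bound

for name, (n_pmu, acc) in CONFIGS.items():
    print(f"{name:14s} PMU share {n_pmu / N_BUSES:6.1%}  "
          f"gap to full {upper - acc:5.2f} pp  retained {acc / upper:6.1%}")
```

Comparing Sequential-13 and OPT13 isolates the effect of placement, because the PMU count is identical: the 4.99 percentage-point difference comes from where the PMUs are installed, not how many.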

4.4.2. Ablation Study of the Proposed ST-MTGCN

To analyze the contribution of each component in the proposed ST-MTGCN, an ablation study was conducted. The evaluated variants include removing the auxiliary supervision branch, removing the reconstruction branch, removing both auxiliary supervision and reconstruction branches, using only the global feature path, and using only the graph path.

As shown in Figure 12, the full ST-MTGCN achieves the highest S × D joint accuracy of 88.62%. Removing the auxiliary supervision branch reduces the joint accuracy to 84.16%, corresponding to a decrease of 4.46 percentage points. This indicates that auxiliary supervision helps alleviate feature interference between the fault type classification task and the fault localization task. Removing the reconstruction branch reduces the joint accuracy to 86.11%, suggesting that the reconstruction constraint contributes to preserving useful latent representations.
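Table 1 lists the training loss as joint cross-entropy + auxiliary cross-entropy + MSE, with label smoothing 0.05. The relative weights of the three terms are not given, so `lambda_aux` and `lambda_rec` below are hypothetical. Applying label smoothing only to the joint term is also an assumption, and the reconstruction target `x` is left abstract. In this sketch the ablation variants correspond to zeroing a weight: `lambda_aux=0` mimics removing auxiliary supervision, and `lambda_rec=0` mimics removing the reconstruction branch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def cross_entropy(logits, target, smoothing=0.0):
    """Cross-entropy against a one-hot target mixed with a uniform distribution."""
    k = len(logits)
    return -sum(
        (smoothing / k + (1.0 - smoothing) * (c == target)) * lp
        for c, lp in enumerate(log_softmax(logits))
    )


def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_loss(joint_logits, joint_target, aux_logits, aux_target, x, x_hat,
               lambda_aux=0.5, lambda_rec=0.1, smoothing=0.05):
    """Hypothetical weighting: L = CE_joint(185 classes) + lambda_aux * CE_aux(6) + lambda_rec * MSE."""
    return (cross_entropy(joint_logits, joint_target, smoothing)
            + lambda_aux * cross_entropy(aux_logits, aux_target)
            + lambda_rec * mse(x, x_hat))
```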

When both auxiliary supervision and reconstruction branches are removed, the joint accuracy further decreases to 81.89%, confirming their combined contribution to stable multi-task learning. In addition, the comparison between the global-path-only and graph-path-only variants reveals the necessity of the dual-path architecture. The global-path-only model achieves 83.81% joint accuracy, while the graph-path-only model drops sharply to 68.49%. This result suggests that relying only on local topology aggregation may lose global fault-type-related discriminative information and may also suffer from insufficient feature separation under sparse PMU measurements.
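Figure 12's results can be expressed as drops relative to the full model. The sum of the two single-branch drops (4.46 + 2.51 = 6.97 percentage points) is close to the drop when both branches are removed (6.73 percentage points), which suggests the two branches contribute largely independently on this task. This is a reading of the reported numbers, not an additional experiment.

```python
ABLATION = {  # S x D joint accuracy (%) as reported for Figure 12
    "Full ST-MTGCN": 88.62,
    "w/o auxiliary supervision": 84.16,
    "w/o reconstruction": 86.11,
    "w/o auxiliary + reconstruction": 81.89,
    "Global path only": 83.81,
    "Graph path only": 68.49,
}
full = ABLATION["Full ST-MTGCN"]
drops = {name: round(full - acc, 2) for name, acc in ABLATION.items() if name != "Full ST-MTGCN"}
separate = drops["w/o auxiliary supervision"] + drops["w/o reconstruction"]

print(drops)
print(f"sum of single-branch drops {separate:.2f} pp vs joint removal "
      f"{drops['w/o auxiliary + reconstruction']:.2f} pp")
```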

Overall, the ablation results verify that the global feature reduction path, the K-hop spatial graph convolution path, the auxiliary supervision branch, and the reconstruction branch jointly contribute to the final fault localization performance.

5. Conclusions

This paper proposes a fault localization and identification method that integrates front-end measurement-point optimization with back-end multi-task inference. Using electromagnetic transient simulations of the IEEE 39-bus standard test system, we obtained an efficient sparse PMU placement with the DQN-CA algorithm, which is based on an action masking mechanism, and introduced, for the first time, the ST-MTGCN dual-path spatio-temporal multi-task architecture. The global dimension-reduction path extracts features that characterize fault types, the multi-hop spatial graph path reconstructs the topological decay gradients that indicate fault locations, and a joint multi-task feature balancing strategy combines the two. Under sparse observation conditions, the method achieves 88.62% accuracy on the 185-class joint fault type-location recognition task.

Although the method achieves high accuracy in fault type classification and fault localization, its ability to generalize to real-world industrial deployments, with more complex spatial configurations and more diverse power network structures, remains to be tested.

Future research will focus on the interference resistance and dynamic robustness of fault localization and identification methods for complex spatial configurations under non-ideal boundary conditions. Possible directions include combining spatio-temporal self-attention mechanisms with masked feature completion algorithms and exploring robust training paradigms based on domain adaptation. The aim is to reduce the distribution shift between simulation data and on-site physical signals and to advance the full-scale engineering implementation of fault localization for real-time online monitoring of large power grids.

Author Contributions

Conceptualization, Z.W. and F.S.; methodology, Z.W. and F.S.; software, Z.W. and F.S.; validation, Z.W. and H.L.; formal analysis, Z.W. and H.L.; investigation, Z.W. and F.S.; resources, Z.W. and F.S.; writing—original draft preparation, Z.W. and F.S.; writing—review and editing, Z.W., F.S., H.L. and L.R.; visualization, Z.W.; funding acquisition, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work is supported by the National Natural Science Foundation of China (No. U23A20651).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Deepa, B.; Hampannavar, S.; Mansani, S. Phasor Estimation Using Micro-Phasor Measurement Unit (μPMU) in Distribution Network for Situational Awareness. J. Electr. Syst. Inf. Technol. 2024, 11, 52.
2. Pattanaik, V.; Malika, B.K.; Panda, S.; Rout, P.K.; Sahu, B.K.; Samanta, I.S.; Bajaj, M.; Blazek, V.; Prokop, L. A Critical Review on Phasor Measurement Units Installation Planning and Application in Smart Grid Environment. Results Eng. 2024, 24, 103559.
3. Fierro-Rincón, J.S.; Lozano-Moncada, C.A.; Gómez-Luna, E.; Grisales-Noreña, L.F.; Sanin-Villa, D. Applications of Distribution Phasor Measurement Units for the Integration of Distributed Energy Resources in Modern Distribution Networks. Appl. Syst. Innov. 2026, 9, 92.
4. Khodadadi-Arpanahi, M.; Torkzadeh, R.; Safavizadeh, A.; Ashrafzadeh, A.; Eghtedarnia, F. A Novel Comprehensive Optimal PMU Placement Considering Practical Issues in Design and Implementation of a Wide-Area Measurement System. Electr. Power Syst. Res. 2023, 214, 108940.
5. Ahmed, M.M.; Amjad, M.; Qureshi, M.A.; Khan, M.O.; Haider, Z.M. Optimal PMU Placement to Enhance Observability in Transmission Networks Using ILP and Degree of Centrality. Energies 2024, 17, 2140.
6. Ge, J.; Xu, Y.; Wu, Z.; Mili, L.; Lu, S.; Hu, Q.; Gu, W. Data-Driven Optimal PMU Placement for Power System Nonlinear Dynamics Using Koopman Approach. IEEE Trans. Ind. Inform. 2024, 20, 11306–11317.
7. Zhou, X.; Wang, Y.; Shi, Y.; Jiang, Q.; Zhou, C.; Zheng, Z. Deep Reinforcement Learning-Based Optimal PMU Placement Considering the Degree of Power System Observability. IEEE Trans. Ind. Inform. 2024, 20, 8949–8960.
8. Lei, X.; Li, Z.; Jiang, H.; Yu, S.S.; Chen, Y.; Liu, B.; Shi, P. Deep-Learning Based Optimal PMU Placement and Fault Classification for Power System. Expert Syst. Appl. 2025, 292, 128586.
9. Hou, Y.; Liang, X.; Zhang, J.; Yang, Q.; Yang, A.; Wang, N. Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games. Appl. Sci. 2023, 13, 8283.
10. Fathollahi, A.; Andresen, B. Deep Deterministic Policy Gradient for Adaptive Power System Stabilization and Voltage Regulation. e-Prime—Adv. Electr. Eng. Electron. Energy 2024, 9, 100675.
11. Lyu, Y.; Wang, H. Optimizing Evacuation for Disabled Pedestrians with Heterogeneous Speeds: A Floor Field Cellular Automaton and Reinforcement Learning Approach. Buildings 2025, 15, 4191.
12. Teng, J.; Sun, Y.; Song, N.; Gan, Y.; Li, Y.; Hou, X. A Power System Fault Assessment Methodology from a Power-Based Perspective. In Proceedings of the 2025 International Conference on New Power System Technology (PowerCon), 2025; IEEE: New York, NY, USA, 2025.
13. Pérez-García, J.M.; Eguía, P.; Perea Olabarria, E.; Pujana, A.; Gyftakis, K.N.; Guerrero, J.M. Diagnosis and Protection of Ground Fault in Electrical Systems: A Comprehensive Analysis. IEEE Access 2025, 13, 189150–189176.
14. Yadav, G.K.; Kirar, M.K.; Gupta, S.C. Adaptive Protection Based Fault Detection, Classification, and Localization in Power System. In Proceedings of the 2025 IEEE 1st International Conference on Smart and Sustainable Developments in Electrical Engineering (SSDEE), 2025; IEEE: New York, NY, USA, 2025.
15. Alhanaf, A.S.; Farsadi, M.; Balik, H.H. Fault Detection and Classification in Ring Power System with DG Penetration Using Hybrid CNN-LSTM. IEEE Access 2024, 12, 59953–59975.
16. Rizeakos, V.; Bachoumis, A.; Andriopoulos, N.; Birbas, M.; Birbas, A. Deep Learning-Based Application for Fault Localization Identification and Type Classification in Active Distribution Grids. Appl. Energy 2023, 338, 120932.
17. Thomas, J.B.; Chaudhari, S.G.; Shihabudheen, K.V.; Verma, N.K. CNN-Based Transformer Model for Fault Detection in Power System Networks. IEEE Trans. Instrum. Meas. 2023, 72, 1–10.
18. Sahu, R.; Panigrahi, P.K.; Lal, D.K.; Pradhan, R.; Mahanty, C. A Deep Recurrent Learning Framework for Multi-Class Microgrid Fault Classification Using LSTM and Bi-LSTM Models. Eng 2026, 7, 143.
19. Giri, T.; Thapa, B.B.; Paneru, B.; Paneru, B. An XAI-Driven CNN-Transformer Model for Transmission Line Fault Detection and Classification. Energy Rep. 2026, 15, 108929.
20. Xu, C.; Wang, Z.; Jin, Y.; Nong, W. An Adaptive Industrial Large Language Model for Mechanical Fault Diagnosis under Variable Operating Conditions. Adv. Eng. Inform. 2026, 74, 104821.
21. Ma, Y.; Zheng, S.; Yang, Z.; Pan, H.; Hong, J. A Knowledge-Graph Enhanced Large Language Model-Based Fault Diagnostic Reasoning and Maintenance Decision Support Pipeline towards Industry 5.0. Int. J. Prod. Res. 2025, 64, 5239–5260.
22. Lai, Y.; Wu, Z.; Chen, M.; Liu, C.; Shao, H. FR-LLM: Multi-Task Large Language Model with Signal-to-Text Encoding and Adaptive Optimization for Joint Fault Diagnosis and RUL Prediction. Reliab. Eng. Syst. Saf. 2026, 269, 112091.
23. Chanda, D.; Soltani, N.Y. A Heterogeneous Graph-Based Multi-Task Learning for Fault Event Diagnosis in Smart Grid. IEEE Trans. Power Syst. 2025, 40, 1427–1438.
24. Karabulut, B.; Manna, C.; Develder, C. Generalization of Graph Neural Network Models for Distribution Grid Fault Detection. In Proceedings of the 2025 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Toronto, ON, Canada, 2025; IEEE: New York, NY, USA, 2025; pp. 1–7.
25. Nguyen, B.L.H.; Vu, T.; Nguyen, T.-T.; Panwar, M.; Hovsapian, R. 1-D Convolutional Graph Convolutional Networks for Fault Detection in Distributed Energy Systems. In Proceedings of the IEEE 1st Industrial Electronics Society Annual On-Line Conference (ONCON), 2023; IEEE: New York, NY, USA, 2023; pp. 1–6.
26. Prasad, U.; Mohanty, S.R.; Singh, S.P.; Jagan, A. Robust Centralized Protection Scheme with AI-Based Fault Diagnosis Capabilities for Graph-Structured AC Microgrids. IEEE Trans. Smart Grid 2025, 16, 1975–1992.
27. Rimkus, M.; Kokoszka, P.; Duan, D.; Wang, X.; Wang, H. Graph Neural Networks for the Localization of Faults in a Partially Observed Regional Transmission System. Scand. J. Stat. 2025, 52, 572–594.
28. Ngo, Q.-H.; Nguyen, B.L.H.; Zhang, J.; Schoder, K.; Ginn, H.; Vu, T. Deep Graph Neural Network for Fault Detection and Identification in Distribution Systems. Electr. Power Syst. Res. 2025, 247, 111721.
29. Nguyen, B.L.H.; Vu, T.V.; Nguyen, T.-T.; Panwar, M.; Hovsapian, R. Spatial-Temporal Recurrent Graph Neural Networks for Fault Diagnostics in Power Distribution Systems. IEEE Access 2023, 11, 46039–46050.
30. Wang, S.; Xiang, X.; Zhang, J.; Liang, Z.; Li, S.; Zhong, P.; Zeng, J.; Wang, C. A Multi-Task Spatiotemporal Graph Neural Network for Transient Stability and State Prediction in Power Systems. Energies 2025, 18, 1531.
31. Huang, S.; Li, J.; Zeng, R.; Li, Z.; Xu, J. Dynamical Graph Neural Networks for Modern Power Grid Analysis. Electronics 2026, 15, 493.
32. Rusch, T.K.; Bronstein, M.M.; Mishra, S. A Survey on Oversmoothing in Graph Neural Networks. arXiv 2023, arXiv:2303.10993.
33. Chen, Z.; Lin, Z.; Chen, S.; Polyanskiy, Y.; Rigollet, P. Avoiding Oversmoothing in Deep Graph Neural Networks: A Multiplicative Ergodic Analysis. arXiv 2025, arXiv:2501.00762.
34. Attali, H.; Pernelle, N.; Buscaldi, D.; Malliaros, F.D. Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing. arXiv 2026, arXiv:2411.17429.
35. Zhang, M.; Wu, Z.; Yan, J.; Lu, R.; Guan, X. Attack-Resilient Optimal PMU Placement via Reinforcement Learning Guided Tree Search in Smart Grids. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1919–1929.
36. Zhong, D.; Yang, Y.; Zhao, Q. No Prior Mask: Eliminate Redundant Action for Deep Reinforcement Learning. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; AAAI Press: Washington, DC, USA, 2024; pp. 17078–17086.
37. Huang, Z.; Liang, R.; Xiao, Y.; Fang, Z.; Li, X.; Ye, R. Simulation of Pedestrian Evacuation with Reinforcement Learning Based on a Dynamic Scanning Algorithm. Phys. A Stat. Mech. Appl. 2023, 625, 129011.
38. Wang, F.; Hu, Z. A Novel Fault Diagnosis and Accurate Localization Method for a Power System Based on GraphSAGE Algorithm. Electronics 2025, 14, 1219.
39. Li, T.; Wu, Z.; Liu, Y.; Jia, R. Voltage Sag Source Localization Based on Multi-Layer Perceptron and Transfer Learning. Front. Energy Res. 2023, 11, 1237239.
40. Daisy, M.; Dashti, R.; Shaker, H.R.; Javadi, S.; Aliabadi, M.H. Fault Localization in Power Grids Using Substation Voltage Magnitude Differences: A Comprehensive Technique for Transmission Lines, Distribution Networks, and AC/DC Microgrids. Measurement 2023, 220, 113403.
41. Yao, R.; Bai, H.; Jiang, S.; Liu, T.; Lei, Y.; Zheng, Y. A Two-Stage Voltage Sag Source Localization Method in Microgrids. Energies 2026, 19, 258.
42. Tang, X.; Yao, H.; Sun, Y.; Wang, Y.; Tang, J.; Aggarwal, C.; Mitra, P.; Wang, S. Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks. In ACM International Conference on Information and Knowledge Management (CIKM); ACM: New York, NY, USA, 2020.
43. Alenezi, M.; Anayi, F.; Packianather, M.; Shouran, M. Enhancing Transformer Protection: A Machine Learning Framework for Early Fault Detection. Sustainability 2024, 16, 10759.
44. Zhang, S. Review of the Development of Power System Out-of-Step Splitting Control and Some Thoughts on the Impact of Large-Scale Access of Renewable Energy. Energy AI 2024, 16, 100357.
45. Li, Q.; Han, Z.; Wu, X.-M. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Palo Alto, CA, USA, 2018; pp. 3538–3545.
46. Bacho, A.; Kutyniok, G.; Maskey, S.; Paolino, R. A Fractional Graph Laplacian Approach to Oversmoothing. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
47. Wang, Y.; Wang, L.; Zhong, W.; Wang, E.; He, B.; Lan, W. A Real-Time Fault Detection Strategy for Cables Based on Adaptive Feature Enhancement and Multi-Scale Temporal Modeling. Discov. Artif. Intell. 2025, 5, 394.
48. Samarawickrama, K.; Kordi, B.; Chang, L. Generator Out-of-Step Protection Using the Trajectory of Relative Speed. In Proceedings of the International Conference on Power Systems Transients (IPST), Thessaloniki, Greece, 12–15 June 2023; IPST: Hayama, Japan, 2023; pp. 1–6.

Figure 1. Offline Perception Layer: Optimal sparse PMU placement satisfying full network observability constraints via the action-masked DQN-CA agent.

Figure 2. Online Cognition Layer: Spatio-temporal feature decoupling and multi-task joint fault inference based on the ST-MTGCN dual-path architecture.

Figure 3. Optimal PMU placement scheme in the IEEE 39-bus system based on the DQN-CA algorithm (red dots indicate the locations of the 13 sparsely deployed measurement nodes).

Figure 4. Training dynamics curves of the DQN-CA agent optimization process: (a) Total reward convergence curve with moving average; (b) Dynamic descending curve of PMU deployment count under the full network observability constraint.

Figure 5. Convergence curves of the best-found PMU number obtained by different OPP methods on the IEEE 39-bus system. The compared methods include Greedy search, BPSO, conventional DQN, and the proposed DQN-CA.

Figure 6. Dimension reduction and visualization analysis based on the t-SNE algorithm for the 6-dimensional normalized probability features output by the auxiliary classification head in the global dimension reduction process.

Figure 7. Confusion Matrix for 6 Categories of Macro-Level Fault Types.

Figure 8. Confusion matrices for the four types of conventional line short-circuit faults (AG, BCG, AB, ABCG) based on the ST-MTGCN model.

Figure 9. Independent t-SNE projections for each of the six fault categories (AG, BCG, AB, ABCG, GT, LO).

Figure 10. Confusion Matrix for Transformer Grounding (GT) and Loss of Excitation/Loss of Synchronism (LO) Faults.

Figure 11. S × D joint accuracy under different PMU configurations. The compared configurations include a random 5-PMU configuration, a sequential 13-PMU configuration, the DQN-CA-optimized OPT13 configuration, and the full 39-PMU configuration.

Figure 12. Ablation analysis of the proposed ST-MTGCN on the S × D joint recognition task.

Table 1. Main hyperparameters of the ST-MTGCN model for fault diagnosis and localization.

Parameter	Values
Input tensor	(B × 3 × 13 × 30)
PMU configuration	OPT13
PMU-induced graph	3-hop graph
Number of joint classes	185
Shared encoder	6 ResBlock2D layers
Encoder channels	128–128–64–64–32–32
Global latent dimension	52
Graph path	3 GraphSAGE layers
Graph hidden dimension	384
Fusion dimension	436
Main classifier	Linear(436–128–64–185)
Auxiliary classifier	Linear(52–32–6)
Optimizer	AdamW
Weight decay	1 × 10⁻³
Gradient clipping	1 × 10⁻³
Epochs	150
Label smoothing	0.05
Loss function	Joint cross-entropy + auxiliary cross-entropy + MSE
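The dimensions in Table 1 are mutually consistent if the fusion layer concatenates the 52-dimensional global latent vector with the 384-dimensional graph embedding (52 + 384 = 436); concatenation is an assumption here. The sketch below checks this and counts the parameters of the two classifier heads, assuming plain fully connected layers with biases. Any normalization or dropout layers inside the heads are not accounted for.

```python
def linear_stack_params(dims):
    """Weights plus biases of a chain of fully connected layers, e.g. [436, 128, 64, 185]."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


CHANNELS, N_PMU, T = 3, 13, 30          # input tensor B x 3 x 13 x 30 (Table 1)
GLOBAL_LATENT, GRAPH_HIDDEN = 52, 384
FUSION = GLOBAL_LATENT + GRAPH_HIDDEN   # assumed: fusion = concatenation of both paths

print(CHANNELS * N_PMU * T)                          # 1170 values per sample
print(FUSION)                                        # 436
print(linear_stack_params([FUSION, 128, 64, 185]))   # 76217 (main classifier)
print(linear_stack_params([GLOBAL_LATENT, 32, 6]))   # 1894 (auxiliary classifier)
```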

Table 2. Simulation parameters for fault data generation.

Parameter	Value (EMT simulation)	Note
System frequency	60 Hz	IEEE 39 standard
Simulation window	[−0.05, +0.2] s	t = 0 is fault injection
Fault-on time	0.02 s	via EvtShc/EvtSwitch
Fault-off time	0.08 s	fault duration 60 ms
Time step Δt	1.0 × 10⁻⁴ s	1/60 s/0.1 ms
Sampling rate fs	10 kHz	16.7 ms/0.1 ms interval
Samples per window T	30	last 30 steps
Effective length	≈3 ms	T·Δt
Observation channels	Va/Vb/Vc	RMS derived from
Fault types S	AG/BCG/AB/ABCG/GT/LO	6 classes
Fault locations D	39 buses + 10 gens + 19 loads	68 locations; 185 valid (S, D) pairs
Fault resistance R_f	U(4.5, 5.5) Ω	resampled per run
Fault reactance X_f	0	short-circuit faults
Load perturbation	N(1.0, 0.15), clip [0.5, 1.5]	Plini, Qlini
Samples per (S, D)	40/10 (train/test)	50 per pair
Dataset total	7400/1850	9250 in total
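The bookkeeping in Table 2 can be reproduced from its own entries: 185 valid (S, D) pairs × 50 samples give 9250 samples, and 30 samples at Δt = 0.1 ms give a 3 ms window. The snippet below reads the 40/10 split as train/test, which matches the 390 test samples per short-circuit type (39 buses × 10) quoted in Section 4.3.2. It also expresses the window length as a fraction of one 60 Hz cycle (about 0.18 cycle).

```python
N_PAIRS = 185                      # valid (S, D) classes, Table 2
TRAIN_PER_PAIR, TEST_PER_PAIR = 40, 10
DT = 1.0e-4                        # simulation time step, s
T = 30                             # samples per window
F_SYS = 60.0                       # system frequency, Hz
FAULT_ON, FAULT_OFF = 0.02, 0.08   # s

n_train, n_test = N_PAIRS * TRAIN_PER_PAIR, N_PAIRS * TEST_PER_PAIR
window_s = T * DT
cycle_s = 1.0 / F_SYS

print(n_train, n_test, n_train + n_test)  # 7400 1850 9250
print(f"window {window_s * 1e3:.1f} ms = {window_s / cycle_s:.2f} cycle; "
      f"fault duration {(FAULT_OFF - FAULT_ON) * 1e3:.0f} ms")
```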

Table 3. Training parameters of the DQN-CA algorithm.

Parameter	Values
Test system	IEEE 39-bus system
Number of input neurons	39
Number of hidden neurons	128
Number of output neurons	39
Q-network structure	FC–BN–ReLU–FC–BN–ReLU–FC
Initialization method	Normal distribution
Target network update frequency	50 episodes
Exploration probability	0.9 → 0.05
Exploration decay rate	0.997
Discount factor	0.9
Learning rate	1 × 10⁻³
Optimizer	Adam
Episodes per seed	2000
Number of random seeds	50
Gradient clipping	10.0
Loss function	MSE
Checking-Action mechanism	Masking installed buses
Terminal reward	+100

Table 4. Performance comparison of representative baseline models and the proposed ST-MTGCN under the OPT13 PMU configuration.

Model	Fault Type	Localization	S × D (Joint)
SVM	89.54%	73.73%	70.49%
CNN	95.59%	78.32%	78.01%
GCN	98.76%	78.38%	78.32%
GraphSAGE	99.41%	80.54%	80.21%
LSTM	99.68%	82.00%	81.35%
Transformer-GNN	99.30%	84.38%	83.32%
ST-MTGCN	99.68%	89.94%	88.62%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, Z.; Shi, F.; Li, H.; Ran, L. Power System Fault Detection and Localization Using a Dual-Path Spatio-Temporal Multi-Task Graph Convolutional Network. Electronics 2026, 15, 2767. https://doi.org/10.3390/electronics15132767
