PC-LossGNN: A Physics-Consistent Spatiotemporal Graph Neural Network for Line Loss Anomaly Classification

Zhu, Xiaojing; Huang, Li; Zhou, Gan; Yang, Junyang; Duan, Chengge

doi:10.3390/sym18061052


by Xiaojing Zhu ¹, Li Huang ¹, Gan Zhou ¹,*, Junyang Yang ² and Chengge Duan ²

¹ School of Electrical Engineering, Southeast University, Nanjing 210096, China

² School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China

* Author to whom correspondence should be addressed.

Symmetry 2026, 18(6), 1052; https://doi.org/10.3390/sym18061052

Submission received: 25 April 2026 / Revised: 2 June 2026 / Accepted: 10 June 2026 / Published: 18 June 2026

(This article belongs to the Special Issue Symmetry and Asymmetry in Data Analysis)


Abstract

Modern distribution networks undergo frequent topology reconfiguration, volatile bi-directional flows, and noisy measurements, making five-class line-loss anomaly classification both valuable and challenging. In this study, PC-LossGNN, a physics-consistent spatiotemporal graph neural network, is proposed for edge-level classification into Normal, Infrastructure, Documentation, Metering, and Theft. A static topology prior is fused with a measurement-adaptive graph and confidence-aware multi-source features, and power-flow physics is injected via residual-guided attention using active/reactive balance, voltage-drop, and ohmic-loss residuals. A dual-path decoder yields calibrated probabilities and interpretable class evidence, and the whole model is trained under an uncertainty-weighted curriculum objective. On six months of real utility data, PC-LossGNN achieves a macro-F1 of 0.8503 and an accuracy of 0.9915, surpassing XGBoost, LSTM, GCN, STGCN, and two recent physics-aware spatiotemporal GNN baselines, ST-RGNN and PA-STGCN. Ablation studies indicate that physics-consistent regularization is pivotal, while adaptive topology and interactive temporal encoding further improve performance. Robustness tests with injected Gaussian noise show more graceful degradation than the baselines. These results suggest that PC-LossGNN provides accurate, physically plausible, and interpretable five-way line-loss diagnostics suitable for real-world operations.

Keywords:

line-loss anomaly classification; spatiotemporal graph neural network; physics-informed learning; distribution networks

1. Introduction

With the increasing penetration of distributed generation, renewable resources, and electrified loads, the operating environment of distribution networks has become increasingly complex and uncertain. Photovoltaic units, wind power, and electric vehicles introduce bidirectional and highly fluctuating power flows, while frequent switching operations for routine maintenance, fault isolation, and load transfer lead to continuously changing topologies. These dynamics not only impose stricter requirements on supply reliability but also create new challenges for maintaining energy efficiency and operational health on the distribution side. Consequently, ensuring a secure power supply while improving efficiency has become a central objective in modern distribution network operation and management [1,2].

Among various operational indicators, line loss remains a fundamental parameter for assessing the efficiency and health of distribution networks [3]. While normal losses are mainly caused by resistive dissipation, deviations beyond reasonable ranges indicate abnormal line losses. Unlike conventional methods that merely detect anomalies, line loss anomaly classification provides further insight into their root causes [4,5]. Technical categories such as infrastructure degradation, parameter or documentation errors, and metering malfunctions, as well as non-technical causes such as electricity theft, can be distinguished. This categorization is of practical importance: it guides operators toward targeted actions—maintenance, record correction, device replacement, or anti-theft enforcement—rather than generic interventions. Without classification, anomaly sources may be obscured, leading to ineffective remedies, distorted efficiency assessments, and elevated risks in settlement and regulation. Therefore, developing effective methods for line loss anomaly classification is of great significance, as it enhances energy efficiency, improves operational and managerial decision-making, ensures fair metering and settlement, and strengthens the secure and reliable operation of distribution networks [6,7].

In recent years, research has gradually shifted from simple anomaly detection toward more refined line loss anomaly classification, which aims not only to identify anomalies but also to reveal their underlying causes [8,9]. Existing approaches can be broadly categorized into several groups. Rule-based and statistical methods rely on predefined thresholds or statistical learning of line loss rates to separate technical from non-technical categories [10,11]. Physics-based methods employ power flow equations, DistFlow models, or impedance-based calculations to distinguish infrastructure-related issues from discrepancies caused by metering or documentation errors. Data-driven machine learning methods, such as decision trees, random forests, and support vector machines (SVM), learn discriminative patterns from historical operational data, whereas deep learning models, including convolutional networks, autoencoders, and graph neural networks (GNNs), exploit spatiotemporal correlations and topology information to infer fine-grained anomaly categories [12,13]. More recently, semi-supervised learning and hierarchical classification frameworks have been introduced to alleviate label scarcity and to align the classification process with practical diagnostic hierarchies in utilities [14,15].

Although significant progress has been made in line loss classification, existing approaches still exhibit notable limitations in practical applications. First, most methods remain constrained to a binary separation between technical and non-technical losses, lacking the capability to distinguish finer-grained categories such as infrastructure degradation, documentation errors, metering faults, and theft, which limits their operational usefulness [16]. Second, current models typically rely on fixed topology assumptions, whereas real distribution networks undergo frequent reconfigurations due to maintenance, load transfer, and fault isolation, making static models unsuitable for dynamic environments. Third, the uncertainty of measurement data, including noise, missing values, and inconsistent device accuracy, is often insufficiently modeled, undermining the stability of classification outcomes under complex conditions [17]. Moreover, many approaches ignore physics consistency constraints during training, relying purely on data-driven features, which may lead to predictions that contradict power balance or loss mechanisms [18,19]. Finally, interpretability remains a critical challenge, as most classifiers can only output labels without providing transparent “class–evidence” explanations, thereby reducing their value for practical decision-making [20,21,22].

These limitations highlight the urgent need for a line loss classification framework that can achieve multi-class granularity, account for topology dynamics and data uncertainty, incorporate physics-consistent constraints, and deliver interpretable results, in order to meet the comprehensive requirements of robustness, reliability, and operational applicability in distribution networks.

To address the limitations of existing line loss classification methods, namely their coarse binary separation between technical and non-technical losses, reliance on fixed topology assumptions, insufficient modeling of measurement uncertainty, and lack of interpretability, this paper proposes PC-LossGNN, a physics-consistent spatiotemporal graph neural network framework whose overall structure is shown in Figure 1. In the encoding stage, the framework integrates static priors with dynamic topology features, introduces power-flow-based physical regularization, and employs residual-guided attention to extract discriminative spatiotemporal features under noisy and incomplete measurements. In the decoding stage, a dual-path classifier combining Softmax-based probabilistic prediction with prototype embedding is designed to enable fine-grained classification into five categories: normal, infrastructure anomaly, documentation anomaly, metering anomaly, and theft. Furthermore, physics-informed constraints derived from power balance, voltage drop, and ohmic loss residuals are incorporated, while a curriculum-inspired optimization strategy progressively reinforces constraint learning, thereby improving robustness and ensuring physical consistency.

The methodological novelty of PC-LossGNN does not lie in independently using adaptive graph learning, spatiotemporal graph convolution, prototype classification, or physics-informed regularization. Instead, it lies in reformulating line-loss anomaly classification as a physics-constrained edge-level diagnostic task, where topology adaptation, spatiotemporal representation learning, residual-guided attention, and class-evidence decoding are jointly coupled around the physical mechanisms of line-loss deviations. Compared with generic STGNN-based anomaly classifiers, PC-LossGNN explicitly links the learned graph representation to branch power-flow residuals, embeds physical consistency into the training objective rather than only using physics-related quantities as auxiliary inputs, and produces prototype-based class evidence for five operationally meaningful line-loss states.

The main contributions of this paper are summarized as follows:

(1): A physics-consistent edge-level STGNN framework is developed for five-class line-loss anomaly diagnosis. Unlike generic STGNN anomaly classifiers, the proposed PC-LossGNN embeds branch-level power balance, voltage-drop, and ohmic-loss residuals into both representation learning and model optimization, thereby aligning learned features with the physical mechanisms of line-loss deviations.
(2): A topology-adaptive spatiotemporal encoder is designed by fusing the static electrical topology prior with a measurement-adaptive graph. This design enables the model to capture operating-condition-dependent coupling among branches while retaining physically feasible connectivity constraints.
(3): A dual-path decoder combining probabilistic prediction and prototype-based class evidence is introduced for fine-grained anomaly interpretation. The decoder not only improves classification performance under severe class imbalance but also provides interpretable evidence for distinguishing Normal, Infrastructure, Documentation, Metering, and Theft states.

The remainder of this paper is organized as follows. Section 2 establishes the preliminaries for five-class line-loss anomaly classification, including the problem formulation, notation, taxonomy, and physics-based residuals used in distribution networks. Section 3 details the proposed PC-LossGNN methodology, covering topology priors, measurement-adaptive graph construction, residual-guided attention, physics-consistent regularization, and the training protocol. Section 4 reports the experimental validation on real utility data and benchmark models, together with ablation studies, robustness and calibration analyses, interpretability assessments, and computational efficiency. Section 5 concludes the paper and outlines limitations and potential avenues for future research.

2. Preliminaries

2.1. Branch Power Flow Model

Power flow equations describe the fundamental physical laws governing the transfer of active and reactive power in distribution networks. These equations ensure the conservation of energy across branches, capturing the interdependence among nodal voltages, line parameters, and branch currents. As shown in Figure 2, the active and reactive power flows at each bus are constrained by Kirchhoff’s current and voltage laws, while line losses are explicitly represented through the resistance and reactance of conductors. As a result, the branch power flow model provides a physically interpretable foundation for state estimation and loss analysis, and serves as a key prior for anomaly classification.

The branch power balance can be expressed as:

P_{j k, t} = P_{i j, t} - I_{i j, t}^{2} r_{i j} - P_{j, t}

(1)

Q_{j k, t} = Q_{i j, t} - I_{i j, t}^{2} x_{i j} - Q_{j, t}

(2)

The voltage constraint is given by:

U_{j, t}^{2} = U_{i, t}^{2} - 2 (P_{i j, t} r_{i j} + Q_{i j, t} x_{i j}) + (r_{i j}^{2} + x_{i j}^{2}) I_{i j, t}^{2}

(3)

The current–power relationship is:

I_{i j, t}^{2} = (P_{i j, t}^{2} + Q_{i j, t}^{2}) / U_{i, t}^{2}

(4)

where P_{jk,t} and Q_{jk,t} denote the active and reactive power at the sending end of branch jk at time t; P_{ij,t} and Q_{ij,t} denote the active and reactive power at the sending end of branch ij at time t; r_{ij} and x_{ij} are the series resistance and reactance of branch ij; I_{ij,t} is the current magnitude on branch ij at time t; U_{i,t} and U_{j,t} are the voltage magnitudes at buses i and j at time t; and P_{j,t} and Q_{j,t} are the net active and reactive power demand (load minus local generation) at bus j at time t, which is why they are subtracted in Equations (1) and (2).
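As a quick numerical check of Equations (1)–(4), the following is a minimal sketch, assuming per-unit quantities and a single downstream branch jk leaving bus j; it is not the authors' implementation. The helper names `BranchState` and `distflow_step` and the operating point are hypothetical, and P_{j,t}, Q_{j,t} are simply the quantities subtracted at bus j.

```python
from dataclasses import dataclass


@dataclass
class BranchState:
    """Sending-end quantities of branch ij at one time step (per unit, assumed)."""
    p_ij: float  # active power at the sending end of ij
    q_ij: float  # reactive power at the sending end of ij
    u_i: float   # voltage magnitude at bus i
    r_ij: float  # series resistance of ij
    x_ij: float  # series reactance of ij


def distflow_step(s: BranchState, p_j: float, q_j: float):
    """Return (I_ij^2, P_jk, Q_jk, U_j^2) from Eqs. (1)-(4) for one downstream branch jk."""
    i_sq = (s.p_ij ** 2 + s.q_ij ** 2) / s.u_i ** 2          # Eq. (4)
    p_jk = s.p_ij - i_sq * s.r_ij - p_j                        # Eq. (1)
    q_jk = s.q_ij - i_sq * s.x_ij - q_j                        # Eq. (2)
    u_j_sq = (s.u_i ** 2
              - 2 * (s.p_ij * s.r_ij + s.q_ij * s.x_ij)
              + (s.r_ij ** 2 + s.x_ij ** 2) * i_sq)            # Eq. (3)
    return i_sq, p_jk, q_jk, u_j_sq


# Hypothetical per-unit operating point.
state = BranchState(p_ij=0.8, q_ij=0.3, u_i=1.0, r_ij=0.01, x_ij=0.02)
i_sq, p_jk, q_jk, u_j_sq = distflow_step(state, p_j=0.2, q_j=0.1)
print(f"I^2={i_sq:.4f}  P_jk={p_jk:.4f}  Q_jk={q_jk:.4f}  U_j={u_j_sq ** 0.5:.4f}")
```

Computing I² first via Equation (4) and reusing it keeps the three remaining equations in the same order as the text.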

2.2. Line Loss Mechanism and Anomaly Patterns

Line losses in distribution networks primarily originate from the Joule effect in conductors and reactive power circulation. Under normal operating conditions, the active power loss of a branch is proportional to the square of the line current and the resistance, while the reactive power loss is associated with the line reactance. However, abnormal loss patterns may arise from technical factors such as conductor aging or poor contact resistance, or from non-technical factors such as electricity theft, all of which introduce deviations between the measured and theoretically expected losses. Accurate modeling of both normal and abnormal loss mechanisms is therefore critical for anomaly detection. A condensed taxonomy covering the five line-loss states (Normal, Infrastructure, Documentation, Metering, and Theft), together with key physics-based indicators and typical signatures, is summarized in Table 1.

The active power loss on branch ij at time t can be expressed as:

P_{l o s s, i j, t} = I_{i j, t}^{2} r_{i j}

(5)

The reactive power loss is:

Q_{l o s s, i j, t} = I_{i j, t}^{2} x_{i j}

(6)

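The total complex loss is then given by the following expression, in which \mathrm{j} denotes the imaginary unit rather than the bus index:

S_{l o s s, i j, t} = P_{l o s s, i j, t} + \mathrm{j} Q_{l o s s, i j, t}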

(7)

Alternatively, the active loss can also be formulated in terms of branch power flow:

P_{l o s s, i j, t} = P_{i j, t}^{i n} - P_{i j, t}^{o u t}

(8)

The relative loss rate, which is often used as an indicator of anomaly, is defined as:

η_{i j, t} = \frac{P_{l o s s, i j, t}}{P_{i j, t}^{i n}}

(9)

When η_{ij,t} deviates significantly from the expected range determined by line parameters and operating conditions, the corresponding branch can be classified as anomalous.
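The loss quantities in Equations (5), (8), and (9) and the anomaly criterion above can be sketched as follows. This is an illustration under stated assumptions: the tolerance band around the ohmic-expected loss rate, its default width, and the function names are hypothetical, since the text only says that the expected range depends on line parameters and operating conditions.

```python
def loss_metrics(p_in: float, p_out: float, i_mag: float, r_ij: float):
    """Measured loss (Eq. 8), ohmic loss (Eq. 5) and relative loss rate (Eq. 9)."""
    measured = p_in - p_out
    ohmic = i_mag ** 2 * r_ij
    rate = measured / p_in
    return measured, ohmic, rate


def flag_branch(p_in: float, p_out: float, i_mag: float, r_ij: float,
                tol: float = 0.01) -> bool:
    """Flag the branch when its loss rate leaves a hypothetical band of
    +/- tol around the rate implied by the ohmic loss alone."""
    _, ohmic, rate = loss_metrics(p_in, p_out, i_mag, r_ij)
    expected_rate = ohmic / p_in
    return abs(rate - expected_rate) > tol


# A branch losing 3% of its input while the ohmic term explains only 1%,
# and a branch whose loss is fully explained by r * I^2.
print(flag_branch(p_in=1.0, p_out=0.97, i_mag=1.0, r_ij=0.01))
print(flag_branch(p_in=1.0, p_out=0.99, i_mag=1.0, r_ij=0.01))
```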

2.3. Temporal Correlation Features

The operating states of distribution networks exhibit strong temporal correlations due to the inertia of electrical loads and the periodicity of renewable generation. In practice, line currents, power flows, and voltage magnitudes at a given time are strongly influenced by their historical values. Such correlations provide an important prior for modeling line loss anomalies, since abnormal losses often manifest as deviations from normal temporal patterns.

The temporal evolution of a state variable x_t can be represented by an autoregressive formulation of order p, with coefficients α_1, …, α_p and a noise term ϵ_t:

x_{t} = α_{1} x_{t - 1} + α_{2} x_{t - 2} + α_{3} x_{t - 3} + \dots + α_{p} x_{t - p} + ϵ_{t}

(10)

For a branch ij, the time series of active power flows can be written as:

P_{i j, t} = f (P_{i j, t - 1}, P_{i j, t - 2}, \dots, P_{i j, t - p}) + ϵ_{i j, t}

(11)

Similarly, the line current sequence satisfies:

I_{i j, t}^{2} = g (I_{i j, t - 1}^{2}, I_{i j, t - 2}^{2}, \dots, I_{i j, t - p}^{2}) + ϵ_{i j, t}^{'}

(12)

To quantify the strength of temporal dependence, the autocorrelation coefficient is commonly adopted:

ρ (τ) = \frac{E [(x_{t} - μ) (x_{t - τ} - μ)]}{σ^{2}}

(13)

where μ and σ² are the mean and variance of x_t and τ is the time lag. A significant deviation of real measurements from these temporal patterns may indicate abnormal conditions, providing an essential temporal prior for anomaly detection.
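A sample estimate of Equation (13), together with a simple check for "significant deviation" based on an AR(1) special case of Equation (10) applied to the centred series, might look like the sketch below. The 3σ residual threshold, the synthetic daily profile (96 samples per day at the 15 min resolution used in Section 4.1), and the injected spike are assumptions for illustration only.

```python
import math


def autocorrelation(x, lag):
    """Sample estimate of rho(tau) in Eq. (13), using full-series mean and variance."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    cov = sum((x[t] - mu) * (x[t - lag] - mu) for t in range(lag, n)) / n
    return cov / var


def ar1_outliers(x, k=3.0):
    """Indices whose one-step AR(1) residual exceeds k residual standard deviations."""
    mu = sum(x) / len(x)
    a1 = autocorrelation(x, 1)  # Yule-Walker estimate of alpha_1 for p = 1
    resid = [(x[t] - mu) - a1 * (x[t - 1] - mu) for t in range(1, len(x))]
    sd = math.sqrt(sum(r * r for r in resid) / len(resid))
    return [t for t, r in zip(range(1, len(x)), resid) if abs(r) > k * sd]


# Four synthetic days of a daily load profile.
series = [1.0 + 0.5 * math.sin(2 * math.pi * t / 96) for t in range(4 * 96)]
print(round(autocorrelation(series, 96), 3))  # strong correlation at a one-day lag
series[200] += 5.0                            # inject one abnormal reading
print(ar1_outliers(series))
```

The biased (divide-by-n) estimator is used so that the estimate stays within [-1, 1].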

2.4. Graph-Based Modeling of Power Systems

The distribution network can be naturally represented as a graph, where buses are modeled as nodes and transmission or distribution lines are modeled as edges. This representation explicitly encodes electrical connectivity and facilitates the application of graph neural networks (GNNs) to capture spatial dependencies. Each node is associated with state features such as voltage magnitude, phase angle, and net power injection, while each edge is associated with line parameters and power flows. As illustrated in Figure 3, the physical single-line diagram of a radial feeder with a lateral (panel a) is mapped to its graph abstraction (panel b), in which buses 1–10 are treated as nodes, and the energized branches are encoded as edges.

Formally, a distribution system is abstracted as a graph:

G = (V, E, A)

(14)

where V is the set of nodes, E is the set of edges, and A \in R^{|V| \times |V|} is the adjacency matrix indicating electrical connectivity.

The feature vector of node i at time t is defined as:

x_{i, t} = [U_{i, t}, θ_{i, t}, P_{i, t}, Q_{i, t}]

(15)

For each edge ij, the corresponding feature vector is:

e_{i j, t} = [r_{i j}, x_{i j}, P_{i j, t}, Q_{i j, t}, I_{i j, t}]

(16)

The propagation rule of a generic GNN layer can then be expressed as:

h_{i}^{(l + 1)} = σ [\sum_{j \in N (i)} ϕ (h_{i}^{(l)}, h_{j}^{(l)}, e_{i j})]

(17)

where h_i^{(l)} denotes the hidden representation of node i at the l-th layer, N(i) denotes its neighbors, ϕ(·) is a learnable message function, and σ(·) is a nonlinear activation.

Through this graph-based formulation, the spatial interactions of line losses can be effectively modeled, and abnormal patterns can be identified in the context of network-wide correlations.
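Equation (17) can be made concrete with a single message-passing step on a three-bus radial feeder. The sketch below is illustrative only: the message function (the neighbor state scaled by a scalar edge weight, for example a hypothetical normalized conductance) and the ReLU activation are assumptions chosen for readability. Any learnable ϕ would fit into the same loop.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def gnn_layer(h: Dict[int, Vector],
              edges: List[Tuple[int, int]],
              edge_feat: Dict[Tuple[int, int], float],
              phi: Callable[[Vector, Vector, float], Vector]) -> Dict[int, Vector]:
    """One application of Eq. (17): sum phi over neighbours, then a ReLU."""
    neighbours: Dict[int, List[int]] = {i: [] for i in h}
    for i, j in edges:  # distribution lines are treated as undirected here
        neighbours[i].append(j)
        neighbours[j].append(i)
    out = {}
    for i in h:
        agg = [0.0] * len(h[i])
        for j in neighbours[i]:
            e_ij = edge_feat[(i, j)] if (i, j) in edge_feat else edge_feat[(j, i)]
            agg = [a + m for a, m in zip(agg, phi(h[i], h[j], e_ij))]
        out[i] = [max(0.0, a) for a in agg]  # sigma = ReLU (assumed)
    return out


def scaled_neighbour(h_i: Vector, h_j: Vector, e_ij: float) -> Vector:
    """Hypothetical message function: neighbour state weighted by the edge scalar."""
    return [e_ij * v for v in h_j]


h0 = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
edges = [(1, 2), (2, 3)]
weights = {(1, 2): 2.0, (2, 3): 0.5}
print(gnn_layer(h0, edges, weights, scaled_neighbour))
```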

3. Methodology

A physics-consistent spatiotemporal graph neural network, PC-LossGNN, is developed for five-way edge classification of line-loss states (Normal, Infrastructure, Documentation, Metering, Theft). The architecture is built upon physics-aware and dynamically adaptive encoders, in which a static-prior plus measurement-driven dynamic graph, an interactive spatiotemporal fusion encoder based on odd-even temporal splitting, and confidence-aware inputs are employed. Predictions are regularized by explicit line-loss formulations to preserve physical consistency under sparse or noisy measurements.

3.1. Task Formulation and Feature Construction

Multi-source node and edge measurements, together with their confidence channels, are organized as shown in Figure 4 to form the confidence-aware inputs \tilde{X} and \tilde{Z}. Node measurements and edge measurements are arranged as two parallel input tensors over the same sliding historical window from t − T + 1 to t. The time axis denotes only the temporal dimension within each tensor and does not imply any sequential dependency between node and edge measurements.

A distribution system is modeled as a graph G = (V, E, A) with |V| = N buses and |E| branches. At time t, node-wise and edge-wise measurements within a sliding window of length T are arranged as

X_{t - T + 1 : t} \in R^{N \times C \times T}, Z_{t - T + 1 : t} \in R^{| E | \times C_{e} \times T}

(18)

Here, C and C_e denote the numbers of node-measurement and edge-measurement feature channels, respectively; together with |E| and T, they fix the dimensions of the node and edge features before the confidence-aware inputs are constructed.

Measurement heterogeneity is encoded by confidence channels (derived from device accuracy and pseudo-measurements) that are broadcast in time and concatenated with the raw features, yielding the confidence-aware inputs \tilde{X} and \tilde{Z}.
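The broadcasting-and-concatenation step that produces \tilde{X}, and in the same way \tilde{Z}, can be sketched with nested lists. This is a minimal sketch under stated assumptions: the entity × channel × time layout, a single time-invariant confidence channel per entity, and the example scores (1.0 for a metered bus, 0.5 for a pseudo-measurement) are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # entity x channel x time


def confidence_aware(x: Tensor3, conf: List[List[float]]) -> Tensor3:
    """Append per-entity confidence channels, broadcast over the window, to x.

    x:    N x C x T raw measurements.
    conf: N x C_conf time-invariant confidence scores.
    Returns N x (C + C_conf) x T.
    """
    out: Tensor3 = []
    for feats, scores in zip(x, conf):
        window = len(feats[0])
        broadcast = [[c] * window for c in scores]  # repeat each score T times
        out.append([list(ch) for ch in feats] + broadcast)
    return out


# Two buses, C = 2 channels (voltage, active power), T = 3 steps.
x = [[[1.01, 1.00, 0.99], [0.5, 0.6, 0.55]],
     [[0.98, 0.97, 0.97], [0.2, 0.2, 0.25]]]
conf = [[1.0], [0.5]]  # bus 2 is assumed to rely on a pseudo-measurement
x_tilde = confidence_aware(x, conf)
print(len(x_tilde), len(x_tilde[0]), len(x_tilde[0][0]))  # N, C + C_conf, T
```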

For each branch ij and time t, a label y_{ij,t} \in \{0, 1, 2, 3, 4\} (Normal or one of the four anomaly types) is assigned. Edge features include static parameters and dynamic flows together with loss cues,

e_{i j, t} = [\begin{matrix} r_{i j}, x_{i j}, P_{i j, t}, Q_{i j, t}, I_{i j, t}, P_{i j, t}^{i n} - P_{i j, t}^{o u t}, r_{i j} I_{i j, t}^{2} \end{matrix}]

(19)

so that both measured and ohmic losses are available to the model. The objective is to learn a mapping F_{Θ} : (\tilde{X}, \tilde{Z}) \mapsto \hat{p}_{ij,t} \in Δ^{4}, where \hat{p}_{ij,t} is the five-class probability vector for branch ij at time t and Δ^{4} denotes the probability simplex in R^{5}.
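Assembling the edge vector of Equation (19) is straightforward. The point of the sketch below is that the measured loss P^{in} − P^{out} and the ohmic estimate r I² appear side by side, so their gap is directly visible to the model. The argument names and example values are hypothetical.

```python
from typing import List


def edge_feature_vector(r: float, x: float, p: float, q: float, i_mag: float,
                        p_in: float, p_out: float) -> List[float]:
    """Eq. (19): static parameters, dynamic flows, measured loss and ohmic loss."""
    measured_loss = p_in - p_out
    ohmic_loss = r * i_mag ** 2
    return [r, x, p, q, i_mag, measured_loss, ohmic_loss]


e = edge_feature_vector(r=0.01, x=0.02, p=0.8, q=0.3, i_mag=1.0,
                        p_in=0.8, p_out=0.75)
gap = e[5] - e[6]  # measured minus ohmic loss; large when losses are unexplained
print([round(v, 4) for v in e], round(gap, 4))
```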

3.2. Physics-Consistent Spatiotemporal Graph Modeling

To reflect electrical connectivity while adapting to operating conditions, a static prior adjacency A^{base} (covering all feasible lines) is combined with a dynamic adjacency inferred from measurements. The dynamic graph is parameterized by node embeddings E_{1}, E_{2} \in R^{N \times d} obtained from \tilde{X}:

A_{t}^{a d p} = S o f t M a x (R e L U (E_{1} E_{1}^{⊤})) ⊙ S o f t M a x (E_{1} E_{2}^{⊤})

(20)

{\tilde{A}}_{t} = σ (τ) A^{b a s e} + (1 - σ (τ)) A_{t}^{a d p}

(21)

Here, ⊙ denotes element-wise (Hadamard) multiplication, SoftMax is applied row-wise, and the mask induced by A^{base} is used to exclude non-electrical links. The same ⊙ notation is used in the subsequent even-odd temporal interaction and endpoint-embedding interaction equations.

It should be noted that the adaptive adjacency matrix in this study is not an unconstrained statistical correlation graph, nor is it interpreted as a direct substitute for the impedance matrix or analytical power-flow sensitivity matrix. The static topology prior and the electrical connectivity mask first restrict the feasible support of graph learning by excluding node pairs without electrical connections. Within this physically feasible support, the measurement-driven adaptive adjacency further adjusts the coupling strength between neighboring nodes or branches according to voltage, power-flow, current, and loss-related features observed in the current time window. Therefore, the learned adaptive edge weights can be understood as a data-driven approximation of operating-condition-dependent physical interactions: network topology and line impedance determine the possible coupling paths, whereas load/generation variations, voltage drops, and power-flow redistribution change the effective influence strength along these paths. In this sense, the adaptive adjacency complements the static topology by representing dynamic coupling variations that cannot be fully captured by a fixed graph, rather than replacing the electrical parameters in conventional power-flow models.

The resulting block is schematized in Figure 5. The topology prior and the adaptive adjacency are fused into the final graph structure, where the former fixes the physically feasible connectivity support and the latter reweights coupling strengths within this support according to the current measurement window. Forward/backward multi-hop diffusion with learnable weights and residual-based gating then produces the updated node states.
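A minimal pure-Python sketch of Equations (20) and (21) follows. Two details are assumptions, not statements from the paper: the A^{base} mask is applied inside both row-wise softmaxes (with self-loops kept so every row has a valid entry), and σ(τ) is read as the logistic sigmoid of a scalar mixing parameter τ. The embeddings are made-up values.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T for row-major nested lists."""
    return [[sum(p * q for p, q in zip(ra, rb)) for rb in b] for ra in a]


def masked_row_softmax(m: Matrix, mask: List[List[bool]]) -> Matrix:
    """Row-wise softmax; masked-out entries receive exactly zero weight."""
    out = []
    for row, keep in zip(m, mask):
        vals = [v if k else -math.inf for v, k in zip(row, keep)]
        top = max(vals)
        ex = [math.exp(v - top) for v in vals]  # math.exp(-inf) == 0.0
        s = sum(ex)
        out.append([e / s for e in ex])
    return out


def fused_adjacency(a_base: Matrix, e1: Matrix, e2: Matrix, tau: float):
    n = len(a_base)
    # Assumed mask: electrical links from A^base plus self-loops.
    mask = [[a_base[i][j] > 0 or i == j for j in range(n)] for i in range(n)]
    relu_e1e1 = [[max(0.0, v) for v in row] for row in matmul_t(e1, e1)]
    s1 = masked_row_softmax(relu_e1e1, mask)
    s2 = masked_row_softmax(matmul_t(e1, e2), mask)
    a_adp = [[s1[i][j] * s2[i][j] for j in range(n)] for i in range(n)]  # Eq. (20)
    g = 1.0 / (1.0 + math.exp(-tau))                                       # sigma(tau), assumed sigmoid
    fused = [[g * a_base[i][j] + (1 - g) * a_adp[i][j] for j in range(n)]
             for i in range(n)]                                           # Eq. (21)
    return fused, a_adp


# Three-bus chain: buses 0 and 2 are not electrically connected.
a_base = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
e1 = [[0.3, -0.1], [0.2, 0.4], [-0.5, 0.1]]
e2 = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.4]]
fused, a_adp = fused_adjacency(a_base, e1, e2, tau=0.0)
print([[round(v, 3) for v in row] for row in fused])
```

Masking before the softmax, rather than multiplying afterwards, keeps the adaptive weights confined to the physically feasible support described above.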

Spatiotemporal encoding is performed by STHGC (spatiotemporal hybrid graph convolution) together with an interactive temporal branching module. After temporal convolution over the sliding window, diffusion-style aggregation with forward/backward walks is applied:

H^{(l + 1)} = \sum_{k = 0}^{K} {({\hat{D}}_{t}^{- 1} {\tilde{A}}_{t})}^{k} H^{(l)} W_{k}^{(l)} + \sum_{k = 1}^{K} {({\hat{D}}_{t}^{- 1} {\tilde{A}}_{t}^{⊤})}^{k} H^{(l)} V_{k}^{(l)}

(22)

where \hat{D}_t is the diagonal degree matrix of the fused graph \tilde{A}_t used in the diffusion, and W_k^{(l)} and V_k^{(l)} are the learnable weights of the k-th forward and backward hop. To capture complementary dependencies between adjacent and interleaved time steps, the temporal window is split into even and odd subsequences, each processed by STHGC and then coupled through cross-stream multiplicative interaction and recombination:

H^{{o d d}^{'}} = S T H G C (H^{o d d}) ⊙ H^{e v e n}, H^{{e v e n}^{'}} = S T H G C (H^{e v e n}) ⊙ H^{o d d}

(23)

H^{{o d d}^{''}} = S T H G C (H^{{o d d}^{'}}) + H^{{e v e n}^{'}}, H^{{e v e n}^{''}} = S T H G C (H^{{e v e n}^{'}}) + H^{{o d d}^{'}}

(24)

The recombined streams are then upsampled and interleaved to yield H^{enc} \in R^{N \times d \times T}. Edge-time embeddings G \in R^{|E| \times d \times T} are obtained by temporal encoding of \tilde{Z}.
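The bidirectional diffusion aggregation of Equation (22) can be sketched for a single layer as follows. Following the equation literally, the same row-degree matrix \hat{D}_t (row sums of \tilde{A}_t) normalizes both walks. The matrix helpers, the hop count K = 1, and the identity weights in the example are illustrative assumptions, and the temporal convolution that precedes this step is omitted.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def madd(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diffusion_layer(a: Matrix, h: Matrix, w: List[Matrix], v: List[Matrix]) -> Matrix:
    """Eq. (22): w holds W_0..W_K and v holds V_1..V_K, so len(v) == len(w) - 1."""
    n = len(a)
    deg = [sum(row) for row in a]                                      # diagonal of D_t
    inv = [1.0 / d if d > 0 else 0.0 for d in deg]
    p_fwd = [[inv[i] * a[i][j] for j in range(n)] for i in range(n)]  # D^-1 A
    p_bwd = [[inv[i] * a[j][i] for j in range(n)] for i in range(n)]  # D^-1 A^T
    out = matmul(h, w[0])                                              # k = 0 term
    h_f, h_b = h, h
    for k in range(1, len(w)):
        h_f = matmul(p_fwd, h_f)                                       # (D^-1 A)^k H
        h_b = matmul(p_bwd, h_b)                                       # (D^-1 A^T)^k H
        out = madd(out, madd(matmul(h_f, w[k]), matmul(h_b, v[k - 1])))
    return out


eye = [[1.0, 0.0], [0.0, 1.0]]
a = [[0.0, 1.0], [1.0, 0.0]]  # two coupled buses
h = [[1.0, 0.0], [0.0, 1.0]]  # one-hot node states
print(diffusion_layer(a, h, w=[eye, eye], v=[eye]))
```

For a symmetric adjacency, the forward and backward walks coincide, so each node in the example receives twice its neighbor's state.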

The interactive spatiotemporal fusion encoder (ISTFE) block is illustrated in Figure 6, where even/odd temporal streams undergo STHGC, cross-stream multiplicative interaction, residual recombination, and upsampling-interleaving to produce the final node-time embeddings.

It should be noted that ISTFE is not intended to constitute a rigorous hierarchical multiresolution decomposition in the signal-processing sense. Wavelet decomposition separates frequency bands using predefined basis functions, temporal pyramids construct hierarchical temporal scales through multiple layers or window lengths, and frequency-domain methods analyze periodic components through spectral transforms; ISTFE, by contrast, adopts a lightweight odd-even two-stream interaction. The purpose of this design is to enhance the modeling of interactions among adjacent samples, interleaved samples, and short-term abrupt patterns while retaining end-to-end trainability and low computational overhead. ISTFE is therefore described here as an interactive temporal encoding module rather than as a strict multiresolution signal decomposition method.
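The odd-even interaction of Equations (23) and (24) is easiest to follow on a scalar sequence. In the sketch below, STHGC is replaced by a hypothetical element-wise tanh stand-in and only a single channel of one node is shown. The window length is assumed to be even, and upsampling is reduced to re-interleaving the two half-length streams. These are all simplifications, not the model's actual blocks.

```python
import math
from typing import Callable, List


def sthgc_stub(seq: List[float]) -> List[float]:
    """Hypothetical stand-in for STHGC (the real block is a graph convolution)."""
    return [math.tanh(v) for v in seq]


def istfe_step(h: List[float],
               f: Callable[[List[float]], List[float]] = sthgc_stub) -> List[float]:
    even, odd = h[0::2], h[1::2]                        # split the window
    odd_1 = [a * b for a, b in zip(f(odd), even)]       # Eq. (23), odd'
    even_1 = [a * b for a, b in zip(f(even), odd)]      # Eq. (23), even'
    odd_2 = [a + b for a, b in zip(f(odd_1), even_1)]   # Eq. (24), odd''
    even_2 = [a + b for a, b in zip(f(even_1), odd_1)]  # Eq. (24), even''
    out: List[float] = []
    for e, o in zip(even_2, odd_2):                     # interleave back to length T
        out.extend([e, o])
    return out


window = [0.2, 0.4, 0.1, 0.9, 0.3, 0.3]  # T = 6 samples of one node channel
print([round(v, 4) for v in istfe_step(window)])
```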

3.3. Edge Representation and Five-Class Decoding

A symmetric edge representation is formed by fusing end-node and edge features:

h_{i j, t} = ϕ ([h_{i, t}^{e n c} ∥ h_{j, t}^{e n c} ∥ | h_{i, t}^{e n c} - h_{j, t}^{e n c} | ∥ h_{i, t}^{e n c} ⊙ h_{j, t}^{e n c} ∥ g_{i j, t}])

(25)

where ϕ(·) is an MLP with gated linear units and g_{ij,t} is the edge-time feature from G. To encourage physical plausibility, attention weights are modulated by branch residuals (defined in Section 3.4):

α_{i j, t} \propto e x p (ψ (h_{i j, t}) - η ρ_{i j, t}^{p h y s}), {\bar{h}}_{i j, t} = α_{i j, t} h_{i j, t}

(26)

with ψ(·) a scalar scorer and η > 0 a coefficient that controls how strongly a large physics residual reduces the attention weight.
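Equations (25) and (26) can be sketched as follows. Three choices are assumptions: only the concatenation inside ϕ(·) is shown (the gated MLP is omitted), the scorer ψ is a hypothetical constant, and the proportionality in Equation (26) is resolved by normalizing over the edges supplied at one time step. The |h_i − h_j| and h_i ⊙ h_j blocks are invariant to swapping the two endpoints.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def edge_representation(h_i: Vector, h_j: Vector, g_ij: Vector) -> Vector:
    """Input to phi(.) in Eq. (25): [h_i || h_j || |h_i - h_j| || h_i * h_j || g_ij]."""
    diff = [abs(a - b) for a, b in zip(h_i, h_j)]
    prod = [a * b for a, b in zip(h_i, h_j)]
    return list(h_i) + list(h_j) + diff + prod + list(g_ij)


def residual_attention(reprs: List[Vector], rho: List[float],
                       psi: Callable[[Vector], float],
                       eta: float = 1.0) -> Tuple[List[float], List[Vector]]:
    """Eq. (26): alpha proportional to exp(psi(h) - eta * rho), normalized over edges (assumed)."""
    scores = [psi(h) - eta * r for h, r in zip(reprs, rho)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    alphas = [w / z for w in weights]
    return alphas, [[a * v for v in h] for a, h in zip(alphas, reprs)]


h_a = edge_representation([0.2, 0.5], [0.4, 0.1], [1.0])
h_b = edge_representation([0.3, 0.3], [0.3, 0.2], [0.0])
alphas, h_bar = residual_attention([h_a, h_b], rho=[0.0, 1.0], psi=lambda h: 0.0)
print([round(a, 3) for a in alphas])
```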

Five-class probabilities are produced by a hybrid decoder combining a softmax head and a prototype (center) head:

{\hat{p}}_{i j, t}^{m l p} = s o f t m a x (W {\bar{h}}_{i j, t} + b)

(27)

{\hat{p}}_{i j, t}^{p r o t o} = s o f t m a x (- \frac{1}{τ} {[∥ M {\bar{h}}_{i j, t} - μ_{c} ∥_{2}^{2}]}_{c = 0}^{4})

(28)

{\hat{p}}_{i j, t} = λ {\hat{p}}_{i j, t}^{m l p} + (1 - λ) {\hat{p}}_{i j, t}^{p r o t o}

(29)

where μ_c are learnable prototypes for Normal/Infrastructure/Documentation/Metering/Theft, M maps \bar{h}_{ij,t} into the prototype embedding space, τ is a temperature, and λ \in [0,1] weights the two heads.
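The hybrid decoder of Equations (27)–(29) reduces to a few lines. The example dimensions, the zero-initialized softmax head (which makes p^{mlp} uniform), the identity projection M, the prototype positions, and the class index order (assumed to follow the listing in the text) are hypothetical. In the model, all of these parameters are learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    top = max(z)
    ex = [math.exp(v - top) for v in z]
    s = sum(ex)
    return [e / s for e in ex]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def hybrid_probs(h_bar: Vector, W: Matrix, b: Vector, M: Matrix,
                 prototypes: List[Vector], tau: float = 1.0, lam: float = 0.5) -> Vector:
    p_mlp = softmax([s + bi for s, bi in zip(matvec(W, h_bar), b)])   # Eq. (27)
    z = matvec(M, h_bar)
    d2 = [sum((zi - mi) ** 2 for zi, mi in zip(z, mu)) for mu in prototypes]
    p_proto = softmax([-d / tau for d in d2])                          # Eq. (28)
    return [lam * a + (1 - lam) * c for a, c in zip(p_mlp, p_proto)]   # Eq. (29)


# Index order assumed to follow the class listing in the text.
CLASSES = ["Normal", "Infrastructure", "Documentation", "Metering", "Theft"]
W = [[0.0, 0.0] for _ in CLASSES]
b = [0.0] * len(CLASSES)
M = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 3.0]]
p = hybrid_probs([2.9, 3.1], W, b, M, prototypes)
print(CLASSES[max(range(len(p)), key=p.__getitem__)])
```

Because both heads output valid distributions, any convex mixture with λ ∈ [0,1] is itself a valid distribution.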

The overall decoder is schematized in Figure 7: the symmetric edge representation in Equation (25) is gated by the physics residuals in Equation (26) and passed to the two parallel heads in Equations (27) and (28), whose outputs are mixed as in Equation (29), while an auxiliary regression toward the ohmic target r_{ij} I_{ij,t}^{2} contributes to L_{phys}.

3.4. Learning Objective, Regularization, and Training Protocol

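Physics residuals are constructed from branch identities (active/reactive conservation, voltage drop, and ohmic-loss equivalence). Given the branch flows P_{ij,t}^{in/out} and Q_{ij,t}^{in/out}, the bus voltages U_{i,t} and U_{j,t}, the branch current I_{ij,t}, the bus-j terms P_{j,t} and Q_{j,t}, and the line parameters (r_{ij}, x_{ij}), the residuals are: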

R_{i j, t}^{P} = P_{i j, t}^{i n} - P_{i j, t}^{o u t} - I_{i j, t}^{2} r_{i j} - P_{j, t}

(30)

R_{i j, t}^{Q} = Q_{i j, t}^{i n} - Q_{i j, t}^{o u t} - I_{i j, t}^{2} x_{i j} - Q_{j, t}

(31)

R_{i j, t}^{U} = U_{j, t}^{2} - U_{i, t}^{2} + 2 (P_{i j, t} r_{i j} + Q_{i j, t} x_{i j}) - (r_{i j}^{2} + x_{i j}^{2}) I_{i j, t}^{2}

(32)

R_{i j, t}^{L} = (P_{i j, t}^{i n} - P_{i j, t}^{o u t}) - r_{i j} I_{i j, t}^{2}

(33)

Although R_{ij,t}^{P} and R_{ij,t}^{L} both involve the branch active-power difference and the ohmic-loss term, R_{ij,t}^{P} measures branch-level active-power balance including the downstream injection term P_{j,t}, whereas R_{ij,t}^{L} directly measures the consistency between the measured active-power drop and the theoretical ohmic loss.
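The four residuals of Equations (30)–(33) can be evaluated directly from branch quantities. In this sketch, the sending-end flows P_{ij,t} and Q_{ij,t} in Equation (32) are taken to be P^{in} and Q^{in}. The example bus j has no local demand (P_{j,t} = Q_{j,t} = 0), so every residual vanishes for a physically consistent operating point. Both choices are assumptions, as is the simplified theft case, which lowers the receiving-end reading while holding the current fixed.

```python
from typing import Dict


def physics_residuals(p_in: float, p_out: float, q_in: float, q_out: float,
                      p_j: float, q_j: float, u_i: float, u_j: float,
                      i_mag: float, r: float, x: float) -> Dict[str, float]:
    i_sq = i_mag ** 2
    return {
        "P": p_in - p_out - i_sq * r - p_j,                   # Eq. (30)
        "Q": q_in - q_out - i_sq * x - q_j,                   # Eq. (31)
        "U": (u_j ** 2 - u_i ** 2 + 2 * (p_in * r + q_in * x)
              - (r ** 2 + x ** 2) * i_sq),                    # Eq. (32)
        "L": (p_in - p_out) - r * i_sq,                       # Eq. (33)
    }


# Consistent operating point built from Eqs. (1)-(4) with P_j = Q_j = 0.
p_in, q_in, u_i, r, x = 0.8, 0.3, 1.0, 0.01, 0.02
i_sq = (p_in ** 2 + q_in ** 2) / u_i ** 2
u_j = (u_i ** 2 - 2 * (p_in * r + q_in * x) + (r ** 2 + x ** 2) * i_sq) ** 0.5
p_out, q_out = p_in - i_sq * r, q_in - i_sq * x
healthy = physics_residuals(p_in, p_out, q_in, q_out, 0.0, 0.0, u_i, u_j, i_sq ** 0.5, r, x)

# Simplified unmetered withdrawal of 0.05 p.u.: the receiving-end reading drops.
theft = physics_residuals(p_in, p_out - 0.05, q_in, q_out, 0.0, 0.0, u_i, u_j, i_sq ** 0.5, r, x)
print({k: round(v, 4) for k, v in theft.items()})
```

The theft case moves only the active-power residuals, which matches the directional-discrepancy signature described at the end of this section.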

An auxiliary scalar head predicts \hat{P}_{loss,ij,t} from \bar{h}_{ij,t} to fit the ohmic target \tilde{P}_{loss,ij,t} := r_{ij} I_{ij,t}^{2}. The physics loss is

L_{phys} = \sum_{ij,t} ω_{ij,t} ( ∥R_{ij,t}^{P}∥_{1} + ∥R_{ij,t}^{Q}∥_{1} + γ_{U} ∥R_{ij,t}^{U}∥_{1} + γ_{L} ∥R_{ij,t}^{L}∥_{1} + γ_{p} |\hat{P}_{loss,ij,t} - \tilde{P}_{loss,ij,t}| )

(34)

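where ω_{ij,t} reflects measurement confidence, so that residuals computed from low-confidence measurements are down-weighted, and γ_U, γ_L, and γ_p are weighting coefficients.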
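Equation (34) is a confidence-weighted sum of absolute residuals. The sketch below assumes each branch-time sample has already been reduced to a small record. The field names, the unit default values of γ_U, γ_L, and γ_p, and the example confidences are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PhysicsSample:
    r_p: float           # R^P of Eq. (30)
    r_q: float           # R^Q of Eq. (31)
    r_u: float           # R^U of Eq. (32)
    r_l: float           # R^L of Eq. (33)
    p_loss_hat: float    # auxiliary-head prediction
    p_loss_tilde: float  # ohmic target r_ij * I_ij^2
    w: float             # confidence weight omega_ij,t


def physics_loss(samples: Iterable[PhysicsSample], gamma_u: float = 1.0,
                 gamma_l: float = 1.0, gamma_p: float = 1.0) -> float:
    total = 0.0
    for s in samples:
        total += s.w * (abs(s.r_p) + abs(s.r_q) + gamma_u * abs(s.r_u)
                        + gamma_l * abs(s.r_l)
                        + gamma_p * abs(s.p_loss_hat - s.p_loss_tilde))
    return total


trusted = PhysicsSample(0.1, 0.1, 0.1, 0.1, 0.02, 0.01, w=1.0)
pseudo = PhysicsSample(0.1, 0.1, 0.1, 0.1, 0.02, 0.01, w=0.5)  # same residuals, lower confidence
print(physics_loss([trusted]), physics_loss([pseudo]))
```

The L1 form keeps a single large residual, such as one caused by a faulty meter, from dominating the sum as a squared penalty would.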

For five-class supervision, a class-balanced focal cross-entropy is employed to handle skewed priors (e.g., Theft being rare):

L_{c l s} = - \sum_{i j, t} \sum_{c = 0}^{4} α_{c} {(1 - {\hat{p}}_{i j, t}^{c})}^{γ} y_{i j, t}^{c} \log {\hat{p}}_{i j, t}^{c}, α_{c} = \frac{1 - β}{1 - β^{n_{c}}}

(35)

with class counts n_c, γ \in [1, 3], and β \in (0, 1). In a highly imbalanced dataset, standard cross-entropy is easily dominated by Normal samples. Because the Normal class accounts for the overwhelming majority of instances, optimization tends to reduce the empirical risk of the majority class first, shifting the decision boundary toward the minority anomaly classes. This behavior may preserve high overall accuracy but suppress the recall of Infrastructure, Documentation, Metering, and Theft samples. The class-balanced weights used in this study increase the effective gradient contribution of minority classes, while the focal modulation further down-weights easily classified Normal samples and emphasizes hard or rare anomaly samples. The loss function is therefore intended not only to improve aggregate performance but also to mitigate decision-boundary bias caused by severe class imbalance. To encourage temporal consistency while allowing sharp transitions when physics deviates, a residual-aware smoothing term is added:

L_{t e m p} = \sum_{(i, j)} \sum_{t > t_{0}} e^{- κ {\bar{ρ}}_{i j, t}^{p h y s}} {∥{\hat{p}}_{i j, t} - {\hat{p}}_{i j, t - 1}∥}_{1}

(36)

where \bar{ρ}_{ij,t}^{phys} is a normalized mixture of the residuals in Equations (30) and (33). A prototype separation term sharpens class structure in the embedding space:

L_{p r o t o} = \sum_{(i, j), t} {(∥ M {\bar{h}}_{i j, t} - μ_{y_{i j, t}} ∥_{2}^{2} - δ \underset{\begin{matrix} c \neq y_{i j, t} \end{matrix}}{m i n} ∥ M {\bar{h}}_{i j, t} - μ_{c} ∥_{2}^{2})}_{+}

(37)
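The two regularizers in Equations (36) and (37) can be sketched per branch and per sample. The following are assumptions for illustration: t_0 = 0, the normalized residual mixture \bar{ρ} is supplied directly, z stands for the projected embedding M\bar{h}_{ij,t}, and κ and δ default to 1. The first function shows how a large residual relaxes the smoothing penalty at a sharp transition. The second penalizes a sample whose squared distance to its own prototype exceeds δ times its squared distance to the nearest competing prototype.

```python
import math
from typing import List

Vector = List[float]


def temporal_smoothing(probs: List[Vector], rho_bar: List[float], kappa: float = 1.0) -> float:
    """Eq. (36) for one branch: residual-gated L1 change of the class probabilities."""
    loss = 0.0
    for t in range(1, len(probs)):
        l1 = sum(abs(a - b) for a, b in zip(probs[t], probs[t - 1]))
        loss += math.exp(-kappa * rho_bar[t]) * l1
    return loss


def prototype_separation(z: Vector, label: int, prototypes: List[Vector],
                         delta: float = 1.0) -> float:
    """Eq. (37) for one sample: hinge on own-prototype vs nearest-other distance."""
    d2 = [sum((a - b) ** 2 for a, b in zip(z, mu)) for mu in prototypes]
    nearest_other = min(d for c, d in enumerate(d2) if c != label)
    return max(0.0, d2[label] - delta * nearest_other)


normal = [1.0, 0.0, 0.0, 0.0, 0.0]
theft = [0.0, 0.0, 0.0, 0.0, 1.0]
calm = temporal_smoothing([normal, theft], rho_bar=[0.0, 0.0])     # no physics support
flagged = temporal_smoothing([normal, theft], rho_bar=[0.0, 2.0])  # large residual at t = 1
print(calm, round(flagged, 4))
```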

where (·)_+ = max(·, 0) and δ > 0 scales the squared distance to the nearest competing prototype. The overall objective adopts uncertainty-based weighting and a curriculum on physics:

L = \frac{1}{2 σ_{cls}^{2}} L_{cls} + \frac{1}{2 σ_{phys}^{2}} L_{phys} + \frac{1}{2 σ_{temp}^{2}} L_{temp} + λ_{proto} L_{proto} + \log (σ_{cls} σ_{phys} σ_{temp})

(38)
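The classification loss of Equation (35) and the uncertainty-weighted total of Equation (38) can be sketched per sample. Parameterizing each σ through s = log σ, so that 1/(2σ²) = ½e^{−2s}, is a common numerical choice and an assumption here. The values β = 0.999, λ_proto = 0.1, and the class counts are also hypothetical, and the physics curriculum mentioned in the text is not modeled.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float = 0.999) -> List[float]:
    """alpha_c = (1 - beta) / (1 - beta ** n_c) from Eq. (35)."""
    return [(1 - beta) / (1 - beta ** n) for n in counts]


def focal_ce(p: Sequence[float], y: int, alphas: Sequence[float], gamma: float = 2.0) -> float:
    """Eq. (35) for one sample with integer label y (only the true class contributes)."""
    return -alphas[y] * (1 - p[y]) ** gamma * math.log(p[y])


def total_objective(l_cls: float, l_phys: float, l_temp: float, l_proto: float,
                    log_sigma: Sequence[float], lambda_proto: float = 0.1) -> float:
    """Eq. (38) with s = log(sigma) for the cls, phys and temp terms."""
    s_cls, s_phys, s_temp = log_sigma
    weighted = 0.5 * (math.exp(-2 * s_cls) * l_cls
                      + math.exp(-2 * s_phys) * l_phys
                      + math.exp(-2 * s_temp) * l_temp)
    return weighted + lambda_proto * l_proto + (s_cls + s_phys + s_temp)


alphas = class_balanced_weights([90000, 4000, 3000, 2500, 500])  # hypothetical counts
easy_normal = focal_ce([0.95, 0.02, 0.01, 0.01, 0.01], 0, alphas)
hard_theft = focal_ce([0.60, 0.05, 0.05, 0.05, 0.25], 4, alphas)
print(alphas[4] > alphas[0], hard_theft > easy_normal)
```

The log σ terms stop the optimizer from shrinking every weight toward zero, because inflating σ to silence a loss term is penalized.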

Figure 8 summarizes the end-to-end pipeline. Confidence-aware node/edge windows are prepared, and a fused adjacency Ã_t is obtained from the static prior and the adaptive graph. Node embeddings H^{enc} (ISTFE + STHGC) and edge embeddings G are combined to form edge representations h_{ij,t}, which are modulated by physics-residual–aware attention. Two parallel heads (an MLP softmax head and a prototype head) are mixed to produce the five-class probability p̂_{ij,t}, while an auxiliary regression toward the ohmic target r_{ij} I_{ij,t}^2 and the residual channels {R^P, R^Q, R^U, R^L} shape the physics loss. The total objective aggregates L_{cls}, L_{phys}, L_{temp}, and L_{proto} with uncertainty-based weighting as in Equation (38).
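The mixing of the two decoder paths can be illustrated with a short sketch. The convex mixing weight β, the temperature τ, and the use of negative squared distances to the prototypes as class scores are assumptions made for illustration; the text states only that the MLP-softmax and prototype heads are mixed. Function names are hypothetical.

```python
import math

CLASSES = ["Normal", "Infrastructure", "Documentation", "Metering", "Theft"]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_probs(z, prototypes, tau=1.0):
    """Class probabilities from negative squared distances to the prototypes mu_c."""
    scores = [-sum((a - b) ** 2 for a, b in zip(z, mu)) / tau for mu in prototypes]
    return softmax(scores)


def dual_path_probs(mlp_logits, z, prototypes, beta=0.5, tau=1.0):
    """Convex mixture of the MLP-softmax path and the prototype path (hypothetical form)."""
    p_mlp = softmax(mlp_logits)
    p_proto = prototype_probs(z, prototypes, tau)
    return [beta * a + (1.0 - beta) * b for a, b in zip(p_mlp, p_proto)]


# Toy 2-D prototypes, one per class; z lies close to the Theft prototype.
prototypes = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0], [6.0, 6.0]]
p = dual_path_probs([0.0, 0.0, 0.0, 0.0, 2.0], z=[5.5, 5.8], prototypes=prototypes)
print(CLASSES[p.index(max(p))])  # Theft
```

A convex mixture keeps the output a valid probability vector, while the prototype path retains per-class similarity scores that can be inspected as class evidence.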

Under this formulation, Infrastructure anomalies tend to present increased ohmic inconsistencies (R^L) and voltage-drop deviations (R^U); Documentation anomalies are characterized by persistent topology/parameter mismatch, reflected as systematic residuals despite stable temporal patterns; Metering anomalies co-locate with confidence-weighted residual spikes and local disagreement between measured and ohmic losses; and Theft typically yields directional flow discrepancies (a large P_{ij,t}^{in} - P_{ij,t}^{out} unexplained by r_{ij} I_{ij,t}^2) with sharp temporal onsets, which the residual-aware attention and smoothing are designed to expose.
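The Theft signature described above, a directional active-power drop that r_{ij} I_{ij,t}^2 does not explain, can be illustrated with a small sketch. It is not the paper's residual definition from Equations (30) and (33); the per-unit values, the threshold, and the persistence length are hypothetical, and the current is held fixed for simplicity.

```python
def unexplained_active_loss(p_in, p_out, r, current):
    """Per-step active-power drop not explained by the ohmic loss r * I^2."""
    return [(pi - po) - r * i ** 2 for pi, po, i in zip(p_in, p_out, current)]


def persistent_onset(gaps, threshold, min_steps):
    """First step of a run of at least min_steps gaps above threshold, or None."""
    run = 0
    for t, gap in enumerate(gaps):
        run = run + 1 if gap > threshold else 0
        if run == min_steps:
            return t - min_steps + 1
    return None


# Hypothetical per-unit series: 0.05 p.u. of unmetered load appears at step 4.
p_in = [1.00] * 8
p_out = [0.99] * 4 + [0.94] * 4
current = [1.0] * 8
gaps = unexplained_active_loss(p_in, p_out, r=0.01, current=current)
print(persistent_onset(gaps, threshold=0.02, min_steps=3))  # 4
```

Requiring a persistent run rather than a single exceedance separates a sustained diversion from the short spikes that the text associates with Metering anomalies.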

4. Experiments

This section presents a series of experiments designed to evaluate the proposed Physics-Consistent Spatiotemporal Graph Neural Network (PC-LossGNN) for line loss anomaly classification. Section 4.1 details the dataset, implementation specifics, and experimental environment. Section 4.2 specifies the evaluation metrics and the baseline models used for comparison. Section 4.3 presents the core performance results of the proposed model and benchmarks them against the baselines. Section 4.4 assesses the statistical significance of the improvement over STGCN. In Section 4.5, ablation studies are conducted to validate the contributions of the model’s key components, such as the physics-consistent loss and the spatiotemporal encoders. Section 4.6 investigates the model’s robustness under varying degrees of measurement noise. Finally, Section 4.7 offers a visual analysis of the learned feature representations and of the class evidence provided by the prototype head.

4.1. Data Source and Implementation Details

The dataset used in this study was constructed from real-world operational records of a distribution network in China collected from May 2022 to November 2022. The network segment includes 15 primary feeders, 1175 buses, and 1210 branches, serving residential, industrial, and commercial loads. Static data were obtained from topology records and asset-management files of the utility, including the base adjacency matrix, branch connectivity, and the resistance and reactance parameters of each line. Dynamic data were collected at a 15 min resolution, including bus-side voltage magnitudes, currents, active/reactive power, and branch-related power-flow and loss quantities. Node features were constructed from bus-side measurements and confidence channels, whereas edge features were mapped to the corresponding branches by associating terminal measurements, line parameters, and power-flow/loss quantities with each branch. This procedure yields node-time and edge-time samples consistent with the model inputs.

The supervisory labels were constructed from historical maintenance records, manual inspection reports, metering anomaly work orders, and physically constrained samples generated for rare events. For real event records, we extracted the event type, affected device or branch, start and end times, and inspection outcome, and then mapped each event to the corresponding branch index and 15 min time intervals. If a branch-time instance fell within a confirmed event window, it was labeled as Infrastructure, Documentation, Metering, or Theft according to the five-class taxonomy. Instances without evidence from maintenance, inspection, or anomaly work orders and with normal physical residuals were labeled as Normal. Records with ambiguous time boundaries, conflicting event types, or insufficient evidence were not directly used as supervised labels, thereby reducing the impact of label noise on model training.
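The event-to-label mapping described above can be sketched as follows. The record fields (`branch`, `type`, `start`, `end`, `confirmed`), the rule that any 15 min interval overlapping a confirmed window receives the event's class, and the example dates are assumptions for illustration. Branch-intervals that receive conflicting types are dropped, mirroring the exclusion of ambiguous records.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)
ANOMALY_CLASSES = {"Infrastructure", "Documentation", "Metering", "Theft"}


def floor_to_step(ts):
    """Round a timestamp down to the start of its 15 min interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def map_events(events):
    """Map confirmed event windows to {(branch, interval_start): class}.

    Returns the label dict and the set of branch-intervals with conflicting
    event types, which are removed from the labels.
    """
    labels, conflicts = {}, set()
    for ev in events:
        if not ev["confirmed"] or ev["type"] not in ANOMALY_CLASSES:
            continue  # insufficient evidence or unknown event type
        if ev["end"] <= ev["start"]:
            continue  # ambiguous time boundaries
        t = floor_to_step(ev["start"])
        while t < ev["end"]:
            key = (ev["branch"], t)
            if key in labels and labels[key] != ev["type"]:
                conflicts.add(key)
            labels.setdefault(key, ev["type"])
            t += STEP
    for key in conflicts:
        labels.pop(key)
    return labels, conflicts


def final_label(key, labels, conflicts, residuals_normal):
    """Label for one branch-time instance, or None if it is not used for supervision."""
    if key in conflicts:
        return None
    if key in labels:
        return labels[key]
    return "Normal" if residuals_normal else None


# Hypothetical records: a Theft window overlapping a Metering window on branch 7.
events = [
    {"branch": 7, "type": "Theft", "confirmed": True,
     "start": datetime(2022, 6, 1, 10, 5), "end": datetime(2022, 6, 1, 10, 40)},
    {"branch": 7, "type": "Metering", "confirmed": True,
     "start": datetime(2022, 6, 1, 10, 20), "end": datetime(2022, 6, 1, 10, 35)},
]
labels, conflicts = map_events(events)
```

Here the 10:15 and 10:30 intervals of branch 7 carry conflicting types and are excluded, leaving only the 10:00 interval labeled as Theft.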

Because theft events and some metering anomalies are rare in real systems, require long verification cycles, and are difficult to label exhaustively, physically constrained anomaly injection based on real operating windows was used to supplement the training samples. The injection process does not alter the base topology or line parameters; instead, it introduces anomaly-consistent perturbations within the empirical ranges of real load, voltage, and power-flow conditions. For example, theft samples are generated by introducing unmetered load or branch-side active-power deficits, which create persistent directional discrepancies between measured power drop and expected ohmic loss. Metering samples are generated by introducing metering bias, scaling errors, or short-term reading anomalies, which create local inconsistency between measurements and physics-based residuals. The injection parameters were sampled from empirical operating ranges and screened using power-balance, voltage-drop, and ohmic-loss residual checks to avoid physically implausible samples. The final experimental dataset contains approximately 10 million time-step-branch instances, with the following class distribution: Normal 98.5%, Metering 0.8%, Infrastructure 0.4%, Documentation 0.25%, and Theft 0.05%, reflecting the severe class imbalance typical of practical anomaly detection tasks.
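A simplified version of the theft-injection and screening step is sketched below. It perturbs only the metered outflow of one branch, keeps the current fixed, and replaces the paper's power-balance, voltage-drop, and ohmic-loss checks with a single bound on the unexplained active-power gap. The deficit range, window length, bound, and function names are hypothetical.

```python
import random


def inject_deficit(p_out, start, length, deficit):
    """Copy of p_out with an unmetered active-power deficit over [start, start + length)."""
    injected = list(p_out)
    for t in range(start, min(start + length, len(p_out))):
        injected[t] -= deficit
    return injected


def passes_screening(p_in, p_out, r, current, max_gap, tol=1e-9):
    """Reject samples whose unexplained gap (P_in - P_out - r I^2) is negative or too large."""
    for pi, po, i in zip(p_in, p_out, current):
        gap = (pi - po) - r * i ** 2
        if gap < -tol or gap > max_gap:
            return False
    return True


def sample_theft(p_in, p_out, r, current, rng, deficit_range, length, max_gap):
    """Draw one injected theft sample, or None if it fails the plausibility screen."""
    deficit = rng.uniform(*deficit_range)
    start = rng.randrange(0, len(p_out) - length + 1)
    candidate = inject_deficit(p_out, start, length, deficit)
    if passes_screening(p_in, candidate, r, current, max_gap):
        return candidate, start, deficit
    return None


rng = random.Random(42)
p_in, p_out, current = [1.0] * 12, [0.99] * 12, [1.0] * 12
result = sample_theft(p_in, p_out, 0.01, current, rng, (0.02, 0.06), length=4, max_gap=0.1)
```

Rejected draws are simply discarded, so only perturbations that stay inside the screened operating envelope enter the training set.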

All experiments were conducted on a workstation equipped with an Intel Core i7-12700K CPU, 64 GB of RAM, and an NVIDIA GeForce RTX 3080 Ti GPU. The PC-LossGNN model was implemented in Python 3.9 using PyTorch and the PyTorch Geometric library. The dataset was chronologically partitioned into training (first 4 months), validation (5th month), and testing (final month) sets. Key hyperparameters were optimized via grid search on the validation set, including a learning rate of 0.001, a sliding window length T of 96 steps (24 h), and a GNN layer depth of 3. To strengthen the statistical reliability of the reported results, all experiments in this study were repeated five times using distinct random seeds that control the initialization of model parameters, the order of mini-batch sampling, and the stochastic behavior of dropout layers. Unless otherwise stated, the performance metrics reported in the following subsections represent the mean and standard deviation computed across these five independent runs.
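The chronological split and the T = 96 sliding windows can be sketched as follows. The sketch assumes that windows are formed within each split, so that no window crosses a split boundary, and it uses a synthetic six-month grid of 15 min timestamps; neither detail is specified in the text, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def chronological_split(timestamps, n_train=4, n_val=1):
    """Assign timestamp indices to train/val/test by calendar-month order."""
    months = sorted({(ts.year, ts.month) for ts in timestamps})
    train_months = set(months[:n_train])
    val_months = set(months[n_train:n_train + n_val])
    split = {"train": [], "val": [], "test": []}
    for idx, ts in enumerate(timestamps):
        month = (ts.year, ts.month)
        if month in train_months:
            split["train"].append(idx)
        elif month in val_months:
            split["val"].append(idx)
        else:
            split["test"].append(idx)
    return split


def sliding_windows(indices, T=96, stride=1):
    """Contiguous length-T windows (as index lists) inside one split."""
    return [indices[s:s + T] for s in range(0, len(indices) - T + 1, stride)]


# Synthetic grid: six calendar months of 15 min steps (96 steps per day).
start = datetime(2022, 5, 1)
timestamps = [start + i * timedelta(minutes=15) for i in range(184 * 96)]
split = chronological_split(timestamps)
daily_test_windows = sliding_windows(split["test"], T=96, stride=96)
```

Splitting by month before windowing prevents a 24 h window from mixing training and evaluation periods.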

4.2. Evaluation Metrics and Baselines

Given the highly imbalanced nature of the classification task, a comprehensive suite of evaluation metrics was employed. These include Accuracy, Precision, Recall, and the F1-Score. We focus on the macro-averaged versions of these metrics, which assign equal weight to each class, thereby providing a more faithful assessment of the model’s performance on minority classes.

In this task, accuracy mainly reflects the model’s ability to identify the Normal class, because Normal instances account for 98.5% of all samples. If accuracy alone is used, a model may still obtain a high score even when most minority anomalies are missed. Therefore, this study emphasizes macro-averaged precision, recall, and F1-score, and further reports per-class F1-scores to examine whether minority anomaly recognition is genuinely improved. In particular, the Theft and Documentation classes have very limited samples and partially overlapping physical signatures with other anomalies; hence, per-class recall and F1-score are more informative than overall accuracy for evaluating practical operational value.
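The gap between accuracy and macro-averaged F1 under this imbalance can be checked with a toy calculation. The sketch builds 10,000 labels in the reported class proportions and scores a trivial classifier that always predicts Normal; per-class F1 is computed as 2TP/(2TP + FP + FN), with zero used when the denominator vanishes.

```python
CLASSES = ["Normal", "Infrastructure", "Documentation", "Metering", "Theft"]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, classes):
    """F1 = 2TP / (2TP + FP + FN) per class, with 0 when the denominator is 0."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes):
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


# 10,000 labels in the reported proportions: 98.5%, 0.4%, 0.25%, 0.8%, 0.05%.
counts = {"Normal": 9850, "Infrastructure": 40, "Documentation": 25, "Metering": 80, "Theft": 5}
y_true = [c for c, n in counts.items() for _ in range(n)]
y_pred = ["Normal"] * len(y_true)
print(accuracy(y_true, y_pred))                     # 0.985
print(round(macro_f1(y_true, y_pred, CLASSES), 4))  # 0.1985
```

A classifier that misses every anomaly still reaches 98.5% accuracy, while its macro F1 falls below 0.2, which is why the macro-averaged and per-class metrics are emphasized here.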

To benchmark the proposed PC-LossGNN model, we compare its performance against a set of representative baseline algorithms: a non-GNN model (XGBoost) trained on flattened time-series features; a purely temporal model (LSTM) that ignores spatial correlations; a purely spatial model (GCN) that lacks a temporal modeling component; and a well-established spatiotemporal GNN (STGCN) adapted for the edge classification task. Two recent physics-aware spatiotemporal GNN baselines are also included to position the proposed framework more rigorously against the state of the art. The first is ST-RGNN [23], a spatiotemporal recurrent graph neural network designed for fault diagnostics in power distribution systems. ST-RGNN combines GRU-based temporal encoding with graph convolution and employs a hierarchical detection, classification, and location structure that is architecturally analogous to the present task. The second is PA-STGCN [24], a physics-aware spatiotemporal graph convolutional network that integrates power-flow equation residuals as auxiliary input features into the graph learning process. Both models were re-implemented in PyTorch Geometric using the architectures and hyperparameter settings reported in the original publications and were trained on the same dataset splits, preprocessing pipeline, and evaluation protocol as all other baselines.

4.3. Performance of the Proposed Model

The proposed PC-LossGNN model demonstrated robust performance on the test set, with the aggregated results over five independent runs summarized in Table 2. The model achieved a mean overall accuracy of 0.9914 ± 0.0001, a metric largely driven by its reliable classification of the dominant Normal class. More informative for the imbalanced setting are the macro-averaged metrics, which assign equal weight to each of the five categories. The model attained a macro-averaged F1-Score of 0.8503 ± 0.0039, indicating a competent and well-balanced classification performance across all five classes, including the rare anomaly types. The small standard deviation confirms that this result is stable across different random initializations and is not dependent on a single favorable training trajectory. The corresponding 5 × 5 confusion matrix of the proposed model on the test set is shown in Figure 9.

To situate the performance of PC-LossGNN within a broader context, it was benchmarked against six baseline models, including two recent physics-aware spatiotemporal GNNs. The quantitative results are presented in Table 3, and the distributions of macro F1-Scores across five independent runs are visualized in Figure 10 and Figure 11.

The analysis reveals a clear performance hierarchy that provides insight into the relative importance of different modeling capabilities. The non-spatiotemporal baselines achieved macro F1-Scores of 0.6659, 0.7086, and 0.7465 for XGBoost, LSTM, and GCN, respectively. This progression suggests that explicitly modeling temporal or spatial dependencies yields incremental gains, but that addressing either dimension in isolation is insufficient to capture the complex dynamics of line loss anomalies.

Among the spatiotemporal models, vanilla STGCN attains a macro F1-Score of 0.8016 ± 0.0075, already a substantial improvement over the single-domain baselines. The two recently proposed physics-aware GNNs build on this foundation. ST-RGNN achieves 0.8099 ± 0.0068, an improvement of 0.8 percentage points over STGCN. This gain is likely due to its GRU-based recurrent temporal encoding, which can capture longer-range dependencies in anomaly signatures than the fixed convolutional kernels of STGCN. PA-STGCN further raises performance to 0.8212 ± 0.0048, a 2.0 percentage point improvement over STGCN. The additional gain of PA-STGCN over ST-RGNN suggests that incorporating power-flow residuals as auxiliary input features provides a useful inductive bias for distinguishing anomaly types whose electrical signatures arise from different physical mechanisms.

Nonetheless, PC-LossGNN outperforms both physics-aware baselines, achieving a macro F1-Score of 0.8503, an improvement of 4.9 percentage points over STGCN and 2.9 percentage points over PA-STGCN. We attribute this additional gain to three factors. First, PC-LossGNN does not merely use physics residuals as input features but embeds physics consistency directly into the training objective through L_phys, which constrains the learned representations to satisfy power balance and ohmic loss relationships. Second, the measurement-adaptive adjacency mechanism allows the graph topology to evolve with operating conditions, whereas both ST-RGNN and PA-STGCN rely on fixed graph structures that cannot capture transient topological changes. Third, the dual-path decoder, together with the class-balanced focal loss and synthetic theft injection, provides targeted improvements for the extreme minority classes, a challenge not specifically addressed by the architecture of either baseline.

Beyond the mean performance, the variance across runs provides additional insight into model reliability. As shown in Figure 10, PC-LossGNN exhibits the narrowest interquartile range among all models, with a standard deviation of only 0.0039 for macro F1-Score. By contrast, XGBoost and LSTM display notably wider distributions, suggesting greater sensitivity to the random seed. This stability is plausibly due to the physics-consistent regularization in PC-LossGNN, which constrains the solution space to physically plausible regions and thereby reduces the dependence on random initialization. The narrow spread in Figure 10 indicates that the performance advantage of PC-LossGNN is consistent across runs rather than an artifact of a single favorable run. A detailed statistical significance analysis comparing PC-LossGNN with STGCN using McNemar’s test and bootstrap confidence intervals is provided in Section 4.4. Figure 12 presents the validation loss curves of PC-LossGNN and the LSTM, GCN, and STGCN baselines; the proposed model converges faster and stabilizes at a lower validation loss.

4.4. Statistical Significance Analysis

To verify that the performance improvement of PC-LossGNN over the vanilla STGCN baseline is not attributable to random variation, two complementary statistical analyses were conducted on the test-set predictions.

The first is McNemar’s test, which examines whether two classifiers differ systematically in their sample-level predictions. A 2 × 2 contingency table was constructed by cross-tabulating the correctness of each model’s prediction for every test instance, as shown in Table 4. Let b denote the number of instances correctly classified by STGCN but misclassified by PC-LossGNN, and let c denote the number of instances in the converse situation. With b = 4091 and c = 8044, the continuity-corrected McNemar statistic χ² = (|b − c| − 1)²/(b + c) equals 1287.05, corresponding to p < 10⁻¹⁰. This result provides overwhelming evidence that the two models differ in their classification behavior. Moreover, the ratio c/b ≈ 2.0 indicates that, among the instances on which the two models disagree, PC-LossGNN is correct nearly twice as often as STGCN. The difference is therefore not only statistically significant but also practically meaningful, reflecting a consistent improvement rather than a symmetric exchange of errors.
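The McNemar statistic reported above can be reproduced from b and c alone. The sketch below uses only the Python standard library; the chi-square tail probability with one degree of freedom is written through the complementary error function, and the function name is illustrative.

```python
import math


def mcnemar_continuity_corrected(b, c):
    """Continuity-corrected McNemar statistic and its chi-square (1 dof) p-value.

    b: instances correct for the reference model but wrong for the candidate.
    c: instances wrong for the reference model but correct for the candidate.
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square survival function with one degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = mcnemar_continuity_corrected(b=4091, c=8044)
print(f"{stat:.2f}")  # 1287.05
```

Only the discordant counts b and c enter the statistic; instances that both models classify correctly, or both misclassify, do not affect it.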

The second analysis employs stratified parametric bootstrap resampling to quantify the uncertainty of the macro-averaged performance gap. In each of 10,000 iterations, the per-class recall for both models was resampled from the asymptotic normal distribution justified by the Central Limit Theorem, and the macro-averaged difference ΔF1 was recorded. Figure 13 presents the resulting bootstrap distribution. The mean of ΔF1 is 0.0902, and the 95% confidence interval spans from 0.0808 to 0.0995. Because the entire interval lies well above zero, the improvement of PC-LossGNN over STGCN is confirmed to be statistically robust and not sensitive to the particular composition of the test set.
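A minimal sketch of this kind of parametric bootstrap is given below. It draws each per-class recall from the normal approximation N(r, r(1 − r)/n), clipped to [0, 1], samples the two models independently, and reports a 95% percentile interval for the macro-averaged difference. The recall values and class counts in the example are illustrative placeholders, not the paper's figures, and the function name is hypothetical.

```python
import math
import random
import statistics


def bootstrap_macro_difference(recall_a, recall_b, counts, n_iter=10_000, seed=0):
    """Parametric bootstrap of the macro-averaged per-class recall difference (A - B).

    Returns the mean difference and a 95% percentile interval.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iter):
        total = 0.0
        for ra, rb, n in zip(recall_a, recall_b, counts):
            # Normal approximation to each class-wise binomial proportion, clipped to [0, 1]
            sa = min(1.0, max(0.0, rng.gauss(ra, math.sqrt(ra * (1 - ra) / n))))
            sb = min(1.0, max(0.0, rng.gauss(rb, math.sqrt(rb * (1 - rb) / n))))
            total += sa - sb
        diffs.append(total / len(counts))
    diffs.sort()
    lower = diffs[int(0.025 * n_iter)]
    upper = diffs[int(0.975 * n_iter) - 1]
    return statistics.fmean(diffs), (lower, upper)


# Illustrative placeholders (order: Normal, Infrastructure, Documentation, Metering, Theft).
recall_a = [0.99, 0.85, 0.80, 0.88, 0.70]
recall_b = [0.99, 0.80, 0.67, 0.85, 0.56]
counts = [100_000, 400, 250, 800, 50]
mean_diff, (lo, hi) = bootstrap_macro_difference(recall_a, recall_b, counts)
print(f"mean={mean_diff:.4f}, 95% CI=({lo:.4f}, {hi:.4f})")
```

Because both models are evaluated on the same test instances, a paired nonparametric bootstrap over instances would preserve their correlation; the independent draws here are a simplification.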

Figure 14 further disaggregates the comparison to the individual class level. PC-LossGNN matches or outperforms STGCN in every category. The two models achieve comparable accuracy on the Normal class, which is expected given the overwhelming prevalence of normal instances. The largest absolute improvements are observed for the Documentation and Theft categories, where PC-LossGNN improves per-class accuracy by approximately 13 and 14 percentage points, respectively. These two categories are the ones expected to benefit most from the physics-informed residual attention mechanism, which is designed to detect persistent discrepancies between measured and expected power flow. The Theft class exhibits the widest confidence intervals for both models, a natural consequence of its extremely low prevalence in the test set (approximately 825 instances). Nevertheless, the improvement remains clearly visible even under this elevated uncertainty.

The per-class analysis in Figure 14 provides further insight into the source of these improvements. All four spatiotemporal models achieve comparable performance on the Normal class, with F1-Scores exceeding 0.994. The differentiating factor is their ability to classify rare anomaly types. PC-LossGNN achieves per-class F1-Scores of 0.862 for Infrastructure, 0.818 for Documentation, 0.894 for Metering, and 0.683 for Theft. These values exceed those of all baselines, with the most pronounced gains in the Theft category: PC-LossGNN improves the Theft F1-Score by 7.8 percentage points over PA-STGCN and by 15.5 percentage points over STGCN. We attribute this large margin to the combination of the physics-residual attention mechanism, which is designed to detect the sustained directional power discrepancy characteristic of energy diversion, and the synthetic data augmentation and focal loss strategies, which give the model sufficient and well-balanced training signal for this extremely rare category.

From the perspective of class imbalance, this result indicates that the improvement of PC-LossGNN is not dominated by the Normal class. The Normal class is already nearly saturated across all compared models, whereas the main performance differences arise from the four anomaly classes, especially the minority Documentation and Theft categories. This observation suggests that physics-residual attention, class-balanced focal loss, and rare-class sample supplementation jointly improve the separability of minority classes, allowing the model to maintain high Normal-class accuracy while avoiding excessive absorption of anomaly samples into the majority-class decision region.

4.5. Ablation Studies

Table 5 reports the macro-averaged results of the ablation variants. The most substantial performance degradation was observed upon removal of the physics-consistent regularization term: without it, the macro F1-Score decreased from 0.8503 ± 0.0039 to 0.7718 ± 0.0077. This finding indicates that embedding physical laws from the branch power flow model is central to the model’s performance, improving generalization and helping to distinguish between complex anomaly types, which is likely to matter most under noisy measurement conditions.

The study further validates the necessity of the integrated spatiotemporal architecture. Removing either the temporal encoder or the spatial GNN encoder resulted in substantial performance drops, with macro F1-Scores falling to 0.7496 ± 0.0058 and 0.7238 ± 0.0125, respectively. This indicates that jointly modeling spatial and temporal dependencies is critical for capturing the complex dynamics of line loss behavior. The result supports the contribution of the interactive temporal encoder to anomaly classification, but it should not be interpreted as evidence that ISTFE performs a strict hierarchical multiresolution decomposition. Accordingly, its role is best described as lightweight modeling of adjacent and interleaved temporal dependencies; more rigorous multiresolution designs, such as wavelet, temporal-pyramid, or frequency-domain decomposition, are left as possible extensions for future work.

Finally, the contribution of the adaptive adjacency mechanism was assessed. A variant using only the static base adjacency matrix still performed competently at 0.8264 ± 0.0037, and incorporating the measurement-driven dynamic adjacency raised the macro F1-Score by 2.4 percentage points to 0.8503 ± 0.0039. This result is consistent with the adaptive adjacency capturing operating-condition-dependent variations in physical coupling strength within the constraint of the static electrical topology, rather than merely adding statistical correlation. When loads, distributed generation outputs, and branch-loss distributions vary over time, such adaptive reweighting helps represent power-flow interactions and line-loss anomaly propagation patterns that are difficult to express with a fixed topology alone.
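The idea of reweighting couplings without leaving the static topology can be illustrated with the sketch below. The fusion rule (a convex combination, with weight α, of a row-normalized static prior and a neighbour-wise softmax over ReLU-clipped embedding similarities) and the embeddings themselves are assumptions for illustration, and may differ from the exact construction of Ã_t used in PC-LossGNN.

```python
import math


def fused_adjacency(a_static, emb_src, emb_dst, alpha=0.5):
    """Hypothetical fusion of a static topology prior with a measurement-adaptive graph.

    a_static: n x n 0/1 adjacency from the topology records.
    emb_src, emb_dst: n x d node embeddings (e.g. derived from recent measurements).
    Adaptive weights are computed only on edges present in a_static, so the static
    electrical topology constrains which couplings can be reweighted.
    """
    n = len(a_static)
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if a_static[i][j]]
        if not nbrs:
            continue
        # ReLU of embedding similarity, then softmax over the neighbours of node i
        scores = [max(0.0, sum(x * y for x, y in zip(emb_src[i], emb_dst[j]))) for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for j, e in zip(nbrs, exps):
            static_w = 1.0 / len(nbrs)  # row-normalized static prior
            fused[i][j] = alpha * static_w + (1.0 - alpha) * e / z
    return fused


# Radial three-bus feeder 0 - 1 - 2; the embeddings couple bus 1 more strongly to bus 2.
a_static = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
emb_src = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
emb_dst = [[0.1, 0.0], [0.0, 0.0], [0.0, 2.0]]
a_tilde = fused_adjacency(a_static, emb_src, emb_dst)
```

Because non-edges of the static graph stay at zero, the adaptive part can only redistribute weight among physically connected branches, which matches the interpretation given in the paragraph above.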

To further disentangle the contributions of the two strategies specifically designed to address class imbalance, two additional ablation experiments were conducted. In the first variant, the class-balanced focal loss defined in Equation (35) was replaced with a standard cross-entropy loss while keeping all other components unchanged. In the second variant, the synthetic theft data injection was removed from the training pipeline so that the model was trained exclusively on the original labeled theft instances. The macro-level results are reported in the last two rows of Table 5, and the per-class breakdown is provided in Table 6. The corresponding distributions are visualized in Figure 15, Figure 16 and Figure 17.

The results reveal that the two strategies play clearly complementary roles. Removing the class-balanced focal loss leads to a broad performance decline across all four anomaly categories, with the macro F1-Score falling from 0.8503 ± 0.0039 to 0.7887 ± 0.0082. The per-class analysis in Figure 15 shows that Theft suffers the most severe decline at 10.8 percentage points, followed by Documentation at 7.3, Infrastructure at 4.3, and Metering at 3.3. This pattern is consistent with the function of the focal loss, which reduces the gradient contribution of easily classified samples. Because the Normal class accounts for 98.5% of all instances, standard cross-entropy training allows the Normal gradient to dominate the optimization landscape. The focal loss counteracts this by down-weighting confidently classified Normal samples, thereby allocating more effective training signal to the underrepresented anomaly categories. Without this reweighting, the optimizer converges toward a decision boundary that favors the majority class, broadly suppressing minority-class recall.
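The reweighting mechanism described above can be illustrated with a short sketch of a class-balanced focal loss. Because Equation (35) is not reproduced in this section, the sketch assumes the common effective-number weighting w_c ∝ (1 − β)/(1 − β^{n_c}) combined with the focal factor (1 − p_y)^γ; β, γ, the class counts, and the function names are illustrative.

```python
import math


def class_balanced_weights(counts, beta=0.9999):
    """Effective-number weights (1 - beta) / (1 - beta**n_c), rescaled to sum to len(counts)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def cb_focal_loss(probs, target, weights, gamma=2.0):
    """Per-sample loss -w_y * (1 - p_y)**gamma * log(p_y)."""
    p_y = max(probs[target], 1e-12)  # guard against log(0)
    return -weights[target] * (1.0 - p_y) ** gamma * math.log(p_y)


# Class order: Normal, Infrastructure, Documentation, Metering, Theft (counts per 10,000).
weights = class_balanced_weights([9850, 40, 25, 80, 5])
easy_normal = [0.99, 0.0025, 0.0025, 0.0025, 0.0025]
hard_theft = [0.70, 0.05, 0.05, 0.05, 0.15]
print(cb_focal_loss(easy_normal, 0, weights))  # near zero: easy majority-class sample
print(cb_focal_loss(hard_theft, 4, weights))   # large: rare class predicted with low confidence
```

With γ = 0 and unit weights the expression reduces to standard cross-entropy, which corresponds to the ablated variant; the class weights address the global imbalance, while the focal factor suppresses the many easy Normal samples.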

In contrast, removing the synthetic theft injection produces a highly concentrated effect. As shown in Figure 16, the F1-Score for the Theft class drops sharply from 0.682 ± 0.005 to 0.393 ± 0.019, a decline of nearly 29 percentage points, while the other four categories remain largely unaffected with changes of less than 2 percentage points. This dramatic collapse reflects the extreme scarcity of real theft labels in the training set. Electricity theft accounts for only 0.05% of all instances, corresponding to fewer than 300 real theft events across the four training months. Without synthetic augmentation, the model lacks sufficient positive examples to learn the distinctive physics-residual signature of theft, which manifests as a persistent directional discrepancy between measured power drop and expected ohmic loss. The resulting model achieves high recall on the other anomaly types but frequently misclassifies theft events as Normal or Metering anomalies. It should be emphasized that the synthetically injected samples are used to improve training coverage for rare anomalies, rather than to replace labels confirmed by real inspections and work orders. This design enables the model to learn the physics-residual pattern of theft events while keeping the annotation procedure grounded in operational records, maintenance logs, and manual verification results.

An important observation from these results is that the two mechanisms are not redundant and cannot substitute for each other. Even when the focal loss is present, it cannot compensate for the absence of theft training samples, because reweighting an empty or near-empty class gradient does not create informative gradients from nothing. Conversely, augmenting the theft data alone does not prevent the broad suppression of Infrastructure, Documentation, and Metering performance that occurs when the majority class dominates training through standard cross-entropy. Only the joint deployment of both strategies produces the synergistic effect observed in the full model. These results further indicate that class imbalance affects not only the aggregate training error but also the effective decision regions of different anomaly classes. The class-balanced focal loss mainly mitigates the global boundary bias caused by Normal-class gradient dominance, whereas synthetic theft injection mainly compensates for the insufficient intra-class representation of the extremely rare Theft category. Their joint use allows the model to improve the separability of minority anomaly classes while maintaining high recognition performance for the Normal class.

The variance across runs also reveals an interesting pattern, as illustrated in Figure 17. The variant without the spatial encoder exhibited the largest standard deviation of 0.0125 for macro F1-Score, indicating that the absence of spatial graph convolution not only degrades average performance but also makes the model more sensitive to random initialization. By contrast, the full PC-LossGNN maintained the smallest standard deviation of 0.0039, further corroborating the stabilizing effect of jointly leveraging physics constraints and spatiotemporal encoding. The complete model produces the most compact and highest-positioned box in the plot, confirming that all architectural components contribute to both performance and consistency.

4.6. Robustness Analysis

A model’s robustness to noisy measurements is a key indicator of its practical utility. To comprehensively evaluate this aspect, we analyzed the performance degradation of PC-LossGNN against the LSTM, GCN, STGCN, ST-RGNN, and PA-STGCN baselines under varying levels of injected Gaussian noise. Figure 18 plots the macro F1-Score for each model as a function of the noise-to-signal ratio applied to the sensor measurements in the test set.
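The noise-injection protocol can be sketched as follows. The definition of the noise-to-signal ratio used here (noise standard deviation equal to the ratio times the standard deviation of each measurement channel) is an assumption, since the text does not state how the ratio is computed; the function names are hypothetical.

```python
import random
import statistics


def add_gaussian_noise(channel, nsr, rng):
    """Add zero-mean Gaussian noise with std = nsr * pstdev(channel) to one measurement channel."""
    sigma = nsr * statistics.pstdev(channel)
    return [x + rng.gauss(0.0, sigma) for x in channel]


def noisy_sweep(channel, ratios, seed=0):
    """Noisy copies of one channel for each noise-to-signal ratio, with a fixed seed."""
    rng = random.Random(seed)
    return {nsr: add_gaussian_noise(channel, nsr, rng) for nsr in ratios}


# Synthetic daily-periodic channel sampled every 15 min for 100 days.
channel = [(t % 96) / 96 for t in range(9600)]
sweep = noisy_sweep(channel, ratios=[0.0, 0.05, 0.1, 0.2])
```

In an evaluation loop, each noisy copy would replace the corresponding test-set input while the trained model and the labels stay fixed.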

The results indicate that the proposed architecture is more robust than the baselines. Although the performance of all models degrades as noise intensity increases, their degradation rates differ substantially. PC-LossGNN exhibits the most graceful decline and maintains a high macro F1-Score even at high noise levels, whereas the baselines are more vulnerable. The GCN and LSTM models, which lack integrated spatiotemporal modeling, experience a rapid drop in performance, and STGCN, while more resilient than the purely spatial or temporal models, still degrades more steeply than PC-LossGNN. We attribute much of this resilience to the physics-consistency loss L_{phys}, which acts as a regularizer by constraining predictions to adhere to the power flow equations; this is expected to suppress physically implausible, noise-induced fluctuations and to exploit the inherent redundancy in the network’s physical structure.

4.7. Interpretability Analysis via Class-Evidence Visualization

To illustrate the effect of the model’s representation learning and the interpretability afforded by the prototype embedding head, t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the high-dimensional edge embeddings from the final layer.

Figure 19 presents the t-SNE projection of the learned embeddings from PC-LossGNN, with the five learnable class prototypes μ_c overlaid as star markers. Normal samples form a large, dense, and well-separated cluster in the upper-left region, while the four anomaly categories form distinct but more diffuse clusters that are clearly separated from the Normal group. The prototypes lie near the geometric centers of their respective clusters, indicating that the prototype separation loss in Equation (37) anchors the embedding geometry to physically meaningful class structures. The partial overlap between the Infrastructure and Documentation clusters is consistent with the confusion between these two classes in the confusion matrix of Figure 9 and reflects their genuinely overlapping electrical signatures: both can produce persistent residual biases under steady-state conditions, which makes them intrinsically difficult to distinguish. The dashed lines in Figure 19 connect several ambiguous Documentation samples to both the Documentation and Infrastructure prototypes, illustrating that their embeddings fall between the two class regions. These are precisely the cases in which the class-evidence decomposition provided by the prototype head is most valuable to human operators.

To demonstrate how the prototype similarity scores function as class evidence in practice, Figure 20 illustrates two representative cases as bar charts. In Figure 20a, a correctly classified Theft sample (T-1) exhibits a clear unimodal similarity profile, with the Theft prototype receiving a similarity of approximately 0.68 and no other prototype exceeding 0.09. This sharp concentration provides unambiguous evidence for the classification decision. In contrast, Figure 20b shows a Documentation sample (D-3) misclassified as Infrastructure: the Infrastructure prototype receives 0.47, while Documentation receives 0.36. Although the prediction is incorrect, the true class retains the second-highest similarity, so an operator inspecting the class evidence would immediately recognize the prediction as uncertain. This evidence decomposition directly supports the operational workflow envisioned in this paper: high-confidence predictions, where one prototype clearly dominates, can be acted upon automatically, while ambiguous cases, where two or more prototypes receive similar scores, are routed to expert review.

The physics residuals of sample T-1 exhibit a persistent directional discrepancy between measured power drop and expected ohmic loss, the canonical signature of energy diversion that the Theft prototype has learned to represent. For sample D-3, the comparatively flat profile across the Infrastructure and Documentation prototypes reflects the fundamental physical ambiguity in distinguishing parameter-mismatch anomalies from equipment-degradation anomalies when both produce similar systematic residual biases. Standard softmax-only classifiers do not provide such a decomposition, because their output probabilities are often poorly calibrated and do not convey how the classification evidence is distributed across competing hypotheses.
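The routing rule described in this subsection can be sketched as a simple threshold on the prototype evidence. The thresholds `min_top` and `min_margin` and the function name are hypothetical; the example uses the similarity values quoted for samples T-1 and D-3 and keeps only the top two scores, because the remaining ones are not reported.

```python
def route(similarities, min_top=0.5, min_margin=0.2):
    """Decide between automatic action and expert review from prototype similarities.

    similarities: dict mapping class name to prototype similarity (at least two entries).
    Returns ("auto" or "review", top class).
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    (top_class, top_score), (_, runner_up) = ranked[0], ranked[1]
    if top_score >= min_top and top_score - runner_up >= min_margin:
        return "auto", top_class
    return "review", top_class


# Scores quoted for Figure 20 (for T-1, the runner-up is bounded by 0.09).
print(route({"Theft": 0.68, "best other class": 0.09}))        # ('auto', 'Theft')
print(route({"Infrastructure": 0.47, "Documentation": 0.36}))  # ('review', 'Infrastructure')
```

Using both an absolute floor and a margin sends a prediction to review either when no prototype is convincingly similar or when two prototypes compete, as in the D-3 case.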

5. Conclusions and Future Work

This paper introduces the PC-LossGNN model for the fine-grained classification of line loss anomalies in distribution networks. Experimental results on a real-world distribution network dataset demonstrate the model’s effectiveness in classifying five categories of line loss. Compared with several machine learning, deep learning, and recent physics-aware graph neural network baselines, the proposed model is more accurate and degrades more gracefully under injected Gaussian measurement noise. We attribute its advantage to the fusion of physics-informed constraints, spatiotemporal encoding, and an adaptive graph structure. In summary, this paper proposes a flexible anomaly classification framework that can enhance the operational awareness and decision-making capabilities of distribution networks.

The proposed framework also opens several directions for future research. In the present study, the model showed some confusion between ‘Documentation’ and ‘Infrastructure’ anomalies, which can produce overlapping electrical signatures in the form of similar systematic residual biases. An important next step is therefore to fuse more diverse data sources, such as maintenance records, to learn more discriminative feature representations and improve classification performance on these fine-grained categories.

Author Contributions

Conceptualization, X.Z. and G.Z.; methodology, X.Z. and L.H.; software, X.Z. and J.Y.; validation, X.Z., L.H. and J.Y.; formal analysis, X.Z. and L.H.; investigation, X.Z. and C.D.; resources, G.Z. and C.D.; data curation, J.Y. and C.D.; writing—original draft preparation, X.Z.; writing—review and editing, L.H. and G.Z.; visualization, X.Z. and J.Y.; supervision, G.Z.; project administration, G.Z.; funding acquisition, G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Science and Technology Major Project of the Department of Science and Technology of Yunnan Province, China (No. 202402AF080006).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

During the preparation of this manuscript, the authors used AI-assisted language-editing tools only for grammar, spelling, punctuation, formatting, and readability checks. The authors reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

1. Lin, M.H.; Ding, Y.; Wang, T.L.; Liu, Y.; Li, Z.W. Transmission Line Anomaly Detection and Real-Time Monitoring System Combining Edge Computing and EfficientDet. IEEE Access 2025, 13, 69573–69581.
2. Bi, W.Z.; Tian, L.; Li, C.; Ma, Z.; Pan, H.Y. Wind-Induced Failure Analysis of a Transmission Tower-Line System with Long-Term Measured Data and Orientation Effect. Reliab. Eng. Syst. Saf. 2023, 229, 108875.
3. Chen, J.L.; Fu, Z.J.; Cheng, X.; Wang, F. An Method for Power Lines Insulator Defect Detection with Attention Feedback and Double Spatial Pyramid. Electr. Power Syst. Res. 2023, 218, 109175.
4. Kanwak, S.; Jiriwibhakorn, S. Artificial Intelligence Based Faults Identification, Classification, and Localization Techniques in Transmission Lines-A Review. IEEE Lat. Am. Trans. 2024, 21, 1291–1305.
5. Li, W.; Zhao, W.; Li, J.M.; Li, J.; Zhao, Y.K. Abnormal Line Loss Identification and Category Classification of Distribution Networks Based on Semi-Supervised Learning and Hierarchical Classification. Front. Energy Res. 2024, 12, 1378722.
6. Xi, Y.H.; Li, M.T.; Zhou, F.; Tang, X.; Li, Z.W.; Tian, J.X. SE-Inception-ResNet Model with Focal Loss for Transmission Line Fault Classification Under Class Imbalance. IEEE Trans. Instrum. Meas. 2024, 73, 3500917.
7. Rhaman, T.; Hasan, T.; Ahammad, A.; Ahmed, I.; Rakhaine, N. MLPNN and Ensemble Learning Algorithm for Transmission Line Fault Classification. Int. Trans. Electr. Energy Syst. 2025, 2025, 6114718.
8. Buazu, M.M.; Tejedor-Aguilera, J.; Cruz-Romero, P.; Gomez-Exposito, A. Hybrid Deep Neural Networks for Detection of Non-Technical Losses in Electricity Smart Meters. IEEE Trans. Power Syst. 2020, 35, 1254–1263.
9. Chen, J.D.; Nanehkaran, Y.A.; Chen, W.R.; Liu, Y.J.; Zhang, D.F. Data-Driven Intelligent Method for Detection of Electricity Theft. Int. J. Electr. Power Energy Syst. 2023, 148, 208948.
10. Yao, M.T.; Zhu, Y.; Li, J.J.; Wei, H.; He, P.H. Research on Predicting Line Loss Rate in Low Voltage Distribution Network Based on Gradient Boosting Decision Tree. Energies 2019, 12, 2522.
11. Liu, K.Y.; Jia, D.L.; Kang, Z.J.; Luo, L. Anomaly Detection Method of Distribution Network Line Loss Based on Hybrid Clustering and LSTM. J. Electr. Eng. Technol. 2022, 17, 1131–1141.
12. Zhang, Z.L.; Yang, Y.; Zhao, H.; Xiao, R. Prediction Method of Line Loss Rate in Low-Voltage Distribution Network Based on Multi-Dimensional Information Matrix and Dimensional Attention Mechanism-Long-and Short-Term Time-Series Network. IET Gener. Transm. Distrib. 2022, 16, 4187–4203.
13. Jiang, W.; Tang, H.B. Distribution Line Parameter Estimation Considering Dynamic Operating States with a Probabilistic Graphical Model. Int. J. Electr. Power Energy Syst. 2020, 121, 106133.
14. Huang, L.; Zhou, G.; Zhang, J.; Zeng, Y.; Li, L. Calculation Method of Theoretical Line Loss in Low-Voltage Grids Based on Improved Random Forest Algorithm. Energies 2023, 16, 2971.
15. Liang, C.; Chen, C.; Wang, W.Z.; Ma, X.P.; Li, Y.Y.; Jiang, T. Line Loss Interval Algorithm for Distribution Network with DG Based on Linear Optimization under Abnormal or Missing Measurement Data. Energies 2022, 15, 4158.
16. Sun, Z.Q.; Xuan, Y.; Huang, Y.; Cao, Z.K.; Zhang, J.S. Traceability Analysis for Low-Voltage Distribution Network Abnormal Line Loss Using a Data-Driven Power Flow Model. Front. Energy Res. 2023, 11, 1272095.
17. Li, Z.C.; Liu, Y.D.; Yan, Y.J.; Wang, P.; Jiang, X.C. An Identification Method for Asymmetric Faults with Line Breaks Based on Low-Voltage Side Data in Distribution Networks. IEEE Trans. Power Deliv. 2021, 36, 3629–3639.
18. Dou, Y.B.; Tan, S.W.; Xie, D.W. Comparison of Machine Learning and Statistical Methods in the Field of Renewable Energy Power Generation Forecasting: A Mini Review. Front. Energy Res. 2023, 11, 1218603.
19. Wang, X.Y.; McArthur, S.D.J.; Strachan, S.M.; Kirkwood, J.D.; Paisley, B. A Data Analytic Approach to Automatic Fault Diagnosis and Prognosis for Distribution Automation. IEEE Trans. Smart Grid 2018, 9, 6265–6273.
20. Chen, K.J.; Huang, C.W.; He, J.L. Fault Detection, Classification and Location for Transmission Lines and Distribution Systems: A Review on the Methods. High Volt. 2016, 1, 25–33.
21. Jamil, M.; Sharma, S.K.; Singh, R. Fault Detection and Classification in Electrical Power Transmission System Using Artificial Neural Network. SpringerPlus 2015, 4, 334.
22. Abdullah, A. Ultrafast Transmission Line Fault Detection Using a DWT-Based ANN. IEEE Trans. Ind. Appl. 2018, 54, 1182–1193.
23. Nguyen, B.L.H.; Vu, T.V.; Nguyen, T.-T.; Panwar, M.; Hovsapian, R. Spatial-Temporal Recurrent Graph Neural Networks for Fault Diagnostics in Power Distribution Systems. IEEE Access 2023, 11, 46039–46050.
24. Wu, T.; Carreño, I.L.; Scaglione, A.; Arnold, D. Spatio-Temporal Graph Convolutional Neural Networks for Physics-Aware Grid Learning Algorithms. IEEE Trans. Smart Grid 2023, 14, 4086–4099.

Figure 1. Overall framework.

Figure 2. Branch power flow model of distribution network.

Figure 3. Graph structure mapping of distribution network.

Figure 4. Parallel construction of confidence-aware node and edge input tensors in PC-LossGNN. Node and edge measurements are organized independently over the same sliding historical window from t − T + 1 to t. The temporal axis indicates the time dimension within each tensor and does not imply a sequential relationship between node and edge measurements.

Figure 5. Spatiotemporal hybrid graph convolution with static prior and adaptive adjacency.

Figure 6. Interactive spatiotemporal fusion encoder (ISTFE) based on even-odd temporal splitting for line-loss feature modeling.

Figure 7. Five-class edge decoder with prototype head and physics-guided auxiliary loss.

Figure 8. End-to-end pipeline of PC-LossGNN for five-class line-loss anomaly classification.

Figure 9. 5 × 5 confusion matrix for the proposed PC-LossGNN model.

Figure 10. Performance comparison of all models with error bars indicating mean ± std over five runs.

Figure 11. Box plots of macro F1-Score across five independent runs for all compared models.

Figure 12. Validation loss curves of the proposed PC-LossGNN and baseline models.

Figure 13. Bootstrap distribution of ΔMacro F1-Score over 10,000 iterations. The solid red line indicates the mean, the shaded region denotes the 95% confidence interval, and the dashed black line marks zero.

Figure 14. Per-class accuracy comparison between STGCN and PC-LossGNN with 95% confidence intervals.

Figure 15. Per-class F1-Score comparison under minority-class intervention ablation with 95% confidence intervals.

Figure 16. Per-class F1-Score degradation relative to the full model when removing each minority-class intervention.

Figure 17. Box plots of macro F1-Score for ablation variants across five independent runs.

Figure 18. Model performance vs. measurement noise level.

Figure 19. Learned embeddings from PC-LossGNN with learnable class prototypes $\mu_{c}$ overlaid as star markers. Dashed lines connect ambiguous Documentation samples to both neighboring prototypes.

Figure 20. Prototype similarity profiles for (a) a correctly classified Theft sample T-1 and (b) a misclassified Documentation sample D-3 predicted as Infrastructure.
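Figure 13 summarizes a bootstrap distribution of ΔMacro F1-Score over 10,000 iterations with its mean and 95% confidence interval. A paired bootstrap of this kind can be sketched as follows. The sketch assumes that test samples are resampled with replacement, that both models are scored on the same resampled indices, and that the interval is a simple percentile interval; the function names and the toy labels are hypothetical, and the authors' exact resampling unit is not stated in this section.

```python
import random


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, computed in one pass over the labels."""
    tp = dict.fromkeys(classes, 0)
    fp = dict.fromkeys(classes, 0)
    fn = dict.fromkeys(classes, 0)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    f1s = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def paired_bootstrap_delta(y_true, pred_a, pred_b, classes, n_boot=10_000, seed=0):
    """Mean and percentile 95% CI of macro-F1(a) - macro-F1(b) under paired resampling."""
    rng = random.Random(seed)
    n = len(y_true)
    deltas = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # same resampled indices for both models
        yt = [y_true[i] for i in idx]
        f1_a = macro_f1(yt, [pred_a[i] for i in idx], classes)
        f1_b = macro_f1(yt, [pred_b[i] for i in idx], classes)
        deltas.append(f1_a - f1_b)
    deltas.sort()
    mean = sum(deltas) / n_boot
    ci = (deltas[int(0.025 * n_boot)], deltas[int(0.975 * n_boot) - 1])
    return mean, ci


# Toy imbalanced labels: model A corrects a subset of model B's minority-class errors.
y_true = [0] * 30 + [1] * 10 + [2] * 10
pred_b = list(y_true)
for i in range(30, 34):
    pred_b[i] = 2
for i in range(40, 44):
    pred_b[i] = 1
pred_a = list(y_true)
pred_a[30], pred_a[40] = 2, 1
mean_delta, (ci_low, ci_high) = paired_bootstrap_delta(y_true, pred_a, pred_b, [0, 1, 2])
print(f"mean dMacroF1 = {mean_delta:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

Because both models are evaluated on identical resamples, the interval describes the paired difference rather than two independent uncertainties; an interval that excludes zero corresponds to the dashed zero line lying outside the shaded region in Figure 13.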

Table 1. Five-class taxonomy for line-loss anomaly classification.

Class	Key Physics-Based Indicators	Typical Signatures (Data/Operation)
Normal	$\eta_{ij,t}$ within expected band	$R^{P}$
Infrastructure	Persistent $R^{L} > 0$ (measured loss $> I_{ij,t}^{2} r_{ij}$)	$R^{U}$
Documentation	Systematic bias in $R^{P}$, $R^{Q}$, $R^{U}$ under steady conditions; inconsistent $R^{L}$ not explained by load	Long-lasting constant offset; cross-branch statistics inconsistent with records; weak dependence on operating variations
Metering	Residual spikes near low-confidence sensors; $R^{L}$ decoupled from $I_{ij,t}^{2}$; intermittent $R^{P}$, $R^{Q}$ bursts	Short, bursty anomalies; alignment with maintenance/reading timestamps; time-synchronization or CT/PT issues suspected
Theft	$P_{ij,t}^{\mathrm{in}} - P_{ij,t}^{\mathrm{out}} \gg I_{ij,t}^{2} r_{ij}$ (persistent $R^{L} > 0$); downstream power balance violated ($R^{P}$ abnormal)	Sudden onset and time-windowed behavior; decoupled from normal load patterns; off-peak concentration frequently observed

Note: $\eta_{ij,t}$ is the relative loss rate; $R^{P}$ and $R^{Q}$ are the active/reactive power conservation residuals; $R^{U}$ is the voltage-drop residual; and $R^{L}$ is the ohmic-loss equivalence residual. Thresholds are configured according to sensor accuracy (confidence channels), network voltage level, and historical “normal” bands.
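The indicators in Table 1 can be made concrete with a short sketch for a single branch and time step. The definitions below are assumptions: they use the DistFlow branch equations (sending-end powers, squared voltage magnitudes, per-unit quantities), take $R^{P}$ and $R^{Q}$ as the mismatch between receiving-end power and the metered downstream total, take $R^{L}$ as measured loss minus $I_{ij,t}^{2} r_{ij}$, and take $\eta_{ij,t}$ as measured loss divided by sending-end active power. The exact expressions in the paper's branch power flow model may differ, and all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class BranchMeasurement:
    """One time step on branch i -> j; all quantities assumed to be in per unit."""
    p_in: float   # sending-end active power P_ij^in
    q_in: float   # sending-end reactive power Q_ij^in
    p_out: float  # receiving-end active power P_ij^out
    q_out: float  # receiving-end reactive power Q_ij^out
    v_i: float    # sending-end voltage magnitude
    v_j: float    # receiving-end voltage magnitude
    i_ij: float   # branch current magnitude I_ij
    r: float      # branch resistance r_ij
    x: float      # branch reactance x_ij


def physics_residuals(m, p_downstream, q_downstream):
    """Relative loss rate and the four residuals under assumed DistFlow definitions."""
    i2 = m.i_ij ** 2
    eta = (m.p_in - m.p_out) / m.p_in
    r_p = m.p_out - p_downstream  # active-power balance at node j
    r_q = m.q_out - q_downstream  # reactive-power balance at node j
    # DistFlow voltage drop: V_j^2 = V_i^2 - 2(r P + x Q) + (r^2 + x^2) I^2
    v_j2_model = m.v_i ** 2 - 2 * (m.r * m.p_in + m.x * m.q_in) + (m.r ** 2 + m.x ** 2) * i2
    r_u = m.v_j ** 2 - v_j2_model
    r_l = (m.p_in - m.p_out) - i2 * m.r  # measured loss minus ohmic loss I^2 r
    return {"eta": eta, "R_P": r_p, "R_Q": r_q, "R_U": r_u, "R_L": r_l}


def consistent_branch(p_in, q_in, v_i, r, x):
    """Synthesize a snapshot that satisfies the DistFlow equations exactly."""
    i2 = (p_in ** 2 + q_in ** 2) / v_i ** 2
    v_j = math.sqrt(v_i ** 2 - 2 * (r * p_in + x * q_in) + (r ** 2 + x ** 2) * i2)
    return BranchMeasurement(p_in, q_in, p_in - r * i2, q_in - x * i2, v_i, v_j,
                             math.sqrt(i2), r, x)


normal = consistent_branch(p_in=0.5, q_in=0.2, v_i=1.0, r=0.01, x=0.02)
res_normal = physics_residuals(normal, normal.p_out, normal.q_out)  # all residuals near 0

# Hypothetical diversion: 0.05 p.u. missing from the metered receiving-end power.
diverted = replace(normal, p_out=normal.p_out - 0.05)
res_diverted = physics_residuals(diverted, diverted.p_out, diverted.q_out)  # R_L near 0.05
```

In this toy case only $R^{L}$ reacts to the unexplained loss, which matches the "measured loss $\gg I_{ij,t}^{2} r_{ij}$" indicator; persistence over time, which Table 1 uses to separate Theft and Infrastructure from Metering spikes, would require evaluating these residuals across the sliding window.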

Table 2. Overall classification performance of PC-LossGNN.

Metric	Value (Mean ± Std)
Accuracy	0.9914 ± 0.0001
Precision (Macro)	0.8442 ± 0.0021
Recall (Macro)	0.8615 ± 0.0039
F1-Score (Macro)	0.8503 ± 0.0039

Table 3. Performance comparison of PC-LossGNN against six baseline models.

Model	Accuracy	Precision (Macro)	Recall (Macro)	F1-Score (Macro)
XGBoost	0.9860 ± 0.0002	0.6787 ± 0.0085	0.6483 ± 0.0130	0.6651 ± 0.0108
LSTM	0.9876 ± 0.0003	0.7185 ± 0.0063	0.6970 ± 0.0074	0.7121 ± 0.0105
GCN	0.9879 ± 0.0003	0.7477 ± 0.0064	0.7454 ± 0.0041	0.7507 ± 0.0050
STGCN	0.9893 ± 0.0001	0.8039 ± 0.0058	0.7907 ± 0.0078	0.7987 ± 0.0089
ST-RGNN [23]	0.9896 ± 0.0002	0.8109 ± 0.0049	0.8053 ± 0.0065	0.8099 ± 0.0068
PA-STGCN [24]	0.9903 ± 0.0003	0.8261 ± 0.0027	0.8183 ± 0.0033	0.8212 ± 0.0048
PC-LossGNN	0.9914 ± 0.0001	0.8442 ± 0.0021	0.8615 ± 0.0039	0.8503 ± 0.0039
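Tables 2 and 3 report accuracies above 0.98 for every model while macro-F1 ranges from 0.67 to 0.85. The sketch below, on a toy confusion matrix that is not the paper's data, shows why: when one class dominates, accuracy is driven by that class, whereas macro averaging weights every class equally. It assumes macro-F1 is the unweighted mean of per-class F1, which is consistent with Table 6, where the mean of the Full Model's per-class F1 values (0.849) is close to the reported 0.8503. The function name is hypothetical.

```python
def classification_metrics(cm):
    """Accuracy and macro precision/recall/F1 from a confusion matrix.

    cm[t][p] counts samples of true class t predicted as class p. Macro F1 is
    taken as the unweighted mean of per-class F1 scores (an assumption).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[t][c] for t in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision_macro": sum(precisions) / k,
        "recall_macro": sum(recalls) / k,
        "f1_macro": sum(f1s) / k,
    }


# Toy 3-class matrix with one dominant class.
toy_cm = [
    [980, 5, 5],
    [4, 4, 2],
    [1, 1, 8],
]
metrics = classification_metrics(toy_cm)
print(metrics)  # accuracy about 0.982, f1_macro about 0.677
```

Two weak minority classes pull macro-F1 down to about 0.68 even though accuracy stays above 0.98, the same pattern separating the baselines in Table 3.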

Table 4. McNemar’s contingency table of sample-level predictions on the test set for PC-LossGNN and STGCN.

	STGCN Correct	STGCN Wrong
PC-LossGNN Correct	1,636,720	8044
PC-LossGNN Wrong	4091	1145

Note: N = 1,650,000. McNemar χ² = 1287.05 with continuity correction, p < 10⁻¹⁰.
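The statistic in the note can be reproduced directly from Table 4. McNemar's test uses only the discordant cells (8044 samples where only PC-LossGNN is correct and 4091 where only STGCN is correct); the sketch below applies the continuity-corrected statistic and the upper tail of a chi-square distribution with one degree of freedom. The function name is hypothetical.

```python
import math

# Table 4: rows = PC-LossGNN correct/wrong, columns = STGCN correct/wrong.
table = [[1_636_720, 8_044],
         [4_091, 1_145]]


def mcnemar_corrected(b, c):
    """McNemar chi-square with continuity correction and its 1-df p-value."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


n_total = sum(sum(row) for row in table)  # should equal the reported N
b, c = table[0][1], table[1][0]           # discordant pairs only
chi2, p_value = mcnemar_corrected(b, c)
print(f"N = {n_total:,}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

The cells sum to N = 1,650,000 and the corrected statistic evaluates to 1287.05, matching the note; the concordant cells (1,636,720 and 1145) do not enter the test at all.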

Table 5. Ablation study of PC-LossGNN components.

Model Variant	Accuracy	Precision (Macro)	Recall (Macro)	F1-Score (Macro)
PC-LossGNN (Full Model)	0.9914 ± 0.0002	0.8451 ± 0.0064	0.8598 ± 0.0076	0.8503 ± 0.0039
w/o Physics Loss ($L_{\mathrm{phys}}$)	0.9902 ± 0.0003	0.7869 ± 0.0081	0.7611 ± 0.0093	0.7718 ± 0.0077
w/o Temporal Encoder	0.9885 ± 0.0003	0.7581 ± 0.0032	0.7426 ± 0.0072	0.7496 ± 0.0058
w/o Spatial Encoder (GNN)	0.9877 ± 0.0003	0.7332 ± 0.0065	0.7076 ± 0.0071	0.7238 ± 0.0125
w/o Adaptive Adjacency	0.9912 ± 0.0002	0.8183 ± 0.0048	0.8295 ± 0.0034	0.8264 ± 0.0037
w/o Focal Loss	0.9908 ± 0.0001	0.8055 ± 0.0034	0.7910 ± 0.0069	0.7887 ± 0.0082
w/o Synth. Theft Injection	0.9911 ± 0.0001	0.8362 ± 0.0076	0.8198 ± 0.0057	0.8250 ± 0.0025

Table 6. Per-class F1-Score under minority-class intervention ablation.

Variant	Normal	Infra.	Doc.	Metering	Theft
Full Model	0.996 ± 0.000	0.861 ± 0.009	0.817 ± 0.015	0.889 ± 0.002	0.682 ± 0.005
w/o Focal Loss	0.996 ± 0.000	0.818 ± 0.009	0.744 ± 0.011	0.856 ± 0.011	0.574 ± 0.014
w/o Synth. Theft	0.996 ± 0.000	0.848 ± 0.009	0.809 ± 0.010	0.883 ± 0.004	0.393 ± 0.019
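Tables 5 and 6 show that removing the focal loss lowers per-class F1 on every minority class, with the largest drop on Theft (0.682 to 0.574), while Normal is unaffected. The sketch below uses the standard multi-class focal-loss form to show the mechanism; the focusing parameter `gamma = 2.0`, the optional per-class weights `alpha`, and the toy logits are assumptions, since the values used in the paper are not given in this section.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Per-sample multi-class focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma = 0 with alpha = None recovers ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical 5-class logits (Normal, Infrastructure, Documentation, Metering, Theft).
logits = [4.0, 0.0, 0.0, 0.0, 0.0]
easy = focal_loss(logits, target=0) / focal_loss(logits, target=0, gamma=0.0)
hard = focal_loss(logits, target=4) / focal_loss(logits, target=4, gamma=0.0)
print(f"focal/CE ratio: easy Normal sample {easy:.3f}, misclassified Theft sample {hard:.3f}")
```

The ratio shows the intended effect: a confidently correct sample keeps well under 1% of its cross-entropy, while a badly misclassified sample keeps almost all of it, which shifts the training signal toward hard, typically minority-class, edges.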

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
