Article

Physics-Informed LSTM with Adaptive Parameter Updating for Non-Stationary Time Series: A Case Study on Disconnector Health Monitoring

1 School of Electric Power, South China University of Technology, Guangzhou 510641, China
2 Meizhou Power Supply Bureau, Guangdong Power Grid Corporation, Meizhou 514500, China
3 School of Automation Engineering, South China University of Technology, Guangzhou 510641, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(6), 970; https://doi.org/10.3390/math14060970
Submission received: 29 January 2026 / Revised: 9 March 2026 / Accepted: 11 March 2026 / Published: 12 March 2026

Highlights

What are the main findings?
  • Proposed a Hybrid Physics-Informed LSTM framework with thermal constraints.
  • Developed an Adaptive Parameter Updating algorithm for inverse problems.
  • Achieved 35% error reduction in long-term recursive forecasting.
What are the implications of the main findings? 
  • Provides a robust method for solving time-varying ODEs in deep learning.
  • Mitigates parameter drift in non-stationary dynamic systems.
  • Enhances physical consistency through online parameter tracking.

Abstract

Accurate prediction of contact temperature in disconnectors is critical for early fault detection. However, purely physics-based models face difficulties in parameter identification, while purely data-driven models often suffer from error accumulation in long-term forecasting. To address these challenges, this paper proposes a novel framework named Hybrid Physics-Informed Long Short-Term Memory (Hybrid-PI-LSTM). Firstly, this paper mathematically formulates the transient heat transfer process as a constrained optimization problem governed by a nonlinear ordinary differential equation (ODE), embedding physical laws into the loss function as a regularization term to promote dynamic consistency. Secondly, to address the inverse problem of parameter drift caused by environmental changes, an Adaptive Parameter Updating (APU) mechanism is introduced. This algorithm utilizes a gradient-based iterative approach to dynamically estimate equivalent physical coefficients (e.g., heat capacity) from observational residuals during inference. Finally, numerical experiments on a real-world dataset demonstrate that the proposed framework significantly outperforms baseline models. Specifically, it achieves a Root Mean Squared Error (RMSE) of 0.283 at a 720-step forecasting horizon, reducing the prediction error by over 35% compared to static-parameter physical models. The results indicate that the proposed adaptive constraint mechanism contributes to enhanced long-term numerical stability and physics-guided parameter tracking.

1. Introduction

Disconnectors are critical for ensuring the safe operation of power grids [1]. However, their long-term exposure to complex outdoor environments means that their contacts are highly susceptible to increased contact resistance due to oxidative corrosion or mechanical vibration [2], which subsequently induces localized overheating faults [3]. Statistics indicate that contact overheating has become one of the primary causes of disconnector failure [4]. In the face of increasingly stringent requirements for power grid operation, condition-based maintenance using online monitoring data has become a prominent industry trend [5,6]. Therefore, utilizing multi-source monitoring data to achieve precise contact temperature trend prediction is of significant importance for early fault warning and full-lifecycle management.
Existing temperature prediction methods are primarily categorized into physics-based mechanism modeling and data-driven approaches. Physical models, such as the Finite Element Method (FEM) and Thermal Network Method (TNM), are solved based on energy balance equations. Previous studies [7,8] indicate that, while FEM can accurately calculate complex 3D electro-thermal coupled fields, its prohibitive computational costs limit its online application. Research [9] shows that, although TNM simplifies calculations through equivalent networks, it faces challenges in parameter identification within practical engineering. Other research has shown that key thermal parameters, such as contact resistance and convection heat transfer coefficients, are not constant [10]; rather, they are significantly influenced by surface oxidation states and meteorological factors such as wind speed and solar irradiance [11]. Fixed model parameters often struggle to adapt to time-varying outdoor conditions, leading to a gradual degradation in the accuracy of physical models during long-term operation.
With the development of deep learning, data-driven methods have demonstrated advantages in power equipment condition prediction. For instance, approaches based on Temporal Convolutional Networks [12] and deep learning surrogate models [13] have achieved efficient temperature prediction by mining historical data features. Fuzzy neural networks [14] have also proven effective in handling uncertainties in monitoring data. However, purely data-driven models are essentially “black boxes,” making it challenging to extract underlying physical mechanisms [15,16].
To overcome the limitations of single approaches, the “Physics-Informed” paradigm, which synergizes physical mechanisms with deep learning, has emerged [17]. This method embeds physical laws as constraints into neural networks and has achieved breakthroughs in various fields, such as Long Short-Term Memory (LSTM) Physics-Informed Neural Network (PINN) hybrid models for lithium battery temperature estimation [18], digital twin modeling for steam turbines [19], and thermal response prediction for composite materials [20]. Although the application of this method in the field of disconnector temperature prediction has not been studied, a critical examination of existing physics-informed frameworks in broader engineering contexts reveals a notable mathematical limitation. From a mathematical modeling perspective, standard PINN frameworks typically assume that the coefficients in the governing partial differential equations (PDEs) or ordinary differential equations (ODEs) are constant or follow a known deterministic function. For instance, recent studies have successfully applied physics-informed models to predict solder joint reliability [21], simulate temperature fields in asphalt pavement [22], and correct biases in sea surface salinity data [23]. While effective in their respective domains, these approaches predominantly treat physical parameters—such as material degradation rates, thermal conductivity, or diffusion coefficients—as static values within the optimization process. Consequently, these static-parameter models face limitations when solving the coupled forward-inverse problem in non-stationary environments, where physical properties continuously drift due to complex meteorological disturbances.
From a broader mathematical perspective, determining these unobservable, time-varying thermal coefficients from temperature measurements intrinsically constitutes an inverse problem. The theoretical foundations and the inherently ill-posed nature of such parameter identification tasks have been extensively investigated in the literature. Foundational studies have systematically classified these inverse problems, highlighting their widespread application and inherent instability [24]. In the specific context of thermal dynamics, researchers have rigorously explored the structural identification of unknown source terms and parameters in heat equations [25], as well as complex inverse source problems for mixed parabolic-hyperbolic models [26]. These rigorous mathematical investigations underscore a critical consensus: solving coupled forward-inverse thermal problems without introducing appropriate regularization or robust constraint mechanisms typically leads to non-uniqueness and error amplification. Recognizing these fundamental mathematical challenges, it becomes evident that a novel constrained framework is required.
To address these challenges, this paper proposes a Hybrid Physics-Informed LSTM (Hybrid-PI-LSTM) network with parameter-adaptive updating capabilities for the online prediction of disconnector contact temperature. The model uses LSTM to extract temporal features and innovatively introduces the unsteady thermal balance equation as a physical constraint layer for network training. Unlike existing PINN methods, the Adaptive Parameter Updating (APU) mechanism utilizes prediction residuals to backward-correct key parameters such as heat dissipation coefficients, thereby facilitating the dynamic sensing of environmental disturbances. Experimental results show that this method significantly reduces cumulative errors in long-term prediction and provides physics-based criteria for equipment status assessment through physically consistent parameter evolution trajectories.

2. Mathematical Problem Formulation and Dataset Description

2.1. Mathematical Definition of the Prediction Task

This paper addresses the need for online prediction of disconnector contact temperatures in substations. Under conditions of multi-factor operation and environmental disturbances, the objective is to perform trend prediction and provide early warnings for contact hotspot temperatures. Let the multi-factor input feature vector at time $t$ be $x_t \in \mathbb{R}^d$, composed of electrical load and environmental variables: the current $I_t$, ambient temperature $T_{amb,t}$, relative humidity $RH_t$, wind speed $v_t$, solar irradiance $G_t$, and time-related features. The prediction target is the contact hotspot temperature $T_{1,t}$.
Given a historical observation window of length $W$, the input sequence is constructed as follows:
$$X_t = \left( x_{t-W+1}, x_{t-W+2}, \ldots, x_t \right)$$
The corresponding future ground-truth temperature sequence is
$$Y_t^H = \left( T_{1,t+1}, T_{1,t+2}, \ldots, T_{1,t+H} \right)$$
where $H$ is the prediction horizon. The aim of this paper is to learn a nonlinear mapping on the time-series data:
$$f_\theta : X_t \mapsto \hat{Y}_t^H = \left( \hat{T}_{1,t+1}, \ldots, \hat{T}_{1,t+H} \right)$$
where $\theta$ represents the parameters to be learned.
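The windowing scheme above can be sketched as a small helper that slices a synchronized feature matrix and target series into $(X_t, Y_t^H)$ pairs; the function and variable names below are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def make_windows(features, target, W, H):
    """Build (X_t, Y_t) training pairs: X_t holds the last W feature
    vectors, Y_t the next H target temperatures."""
    X, Y = [], []
    for t in range(W - 1, len(target) - H):
        X.append(features[t - W + 1 : t + 1])   # x_{t-W+1}, ..., x_t
        Y.append(target[t + 1 : t + 1 + H])     # T_{1,t+1}, ..., T_{1,t+H}
    return np.array(X), np.array(Y)
```

Each returned input window holds the last $W$ feature vectors, and each output window the next $H$ ground-truth temperatures, matching the definitions above.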

2.2. Dataset Description and Preprocessing Strategy

The dataset utilized in this paper was derived from multi-factor synchronous acquisition sequences measured by the Meizhou Power Supply Bureau. The acquired dataset encompasses both electrical load and environmental disturbance information. The sampling interval is 2 min, and a total of 25,200 samples were collected, encompassing various load levels and diverse meteorological conditions. To mitigate the impact of different dimensions and numerical ranges on training stability, Min–Max Normalization is applied to the input features:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ denotes the normalized value after processing. Furthermore, all baseline and ablation models adopt a 5:1:1 ratio for partitioning the training, validation, and testing sets, with segments strictly divided in chronological order to avoid future information leakage caused by random shuffling.
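A minimal sketch of this preprocessing, assuming per-column Min–Max scaling and a strict chronological 5:1:1 partition (helper names are illustrative):

```python
import numpy as np

def minmax_normalize(x):
    # x' = (x - x_min) / (x_max - x_min), applied per feature column
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

def chronological_split(n, ratios=(5, 1, 1)):
    # 5:1:1 train/val/test split in strict time order (no shuffling)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return (slice(0, n_train),
            slice(n_train, n_train + n_val),
            slice(n_train + n_val, n))
```

For the 25,200-sample dataset described above, this yields 18,000 training, 3600 validation, and 3600 testing samples.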

2.3. Overall Methodological Framework

The technical roadmap for the disconnector contact temperature trend prediction proposed in this paper is shown in Figure 1. The overall process consists of three stages:
  • Data Preprocessing: Cleaning, aligning, and normalizing multi-source heterogeneous monitoring data.
  • Model Construction: Building the Hybrid-PI-LSTM network, which includes a physical constraint layer and an adaptive parameter module.
  • Prediction and Evaluation: Generating multi-step future temperature trajectories based on model predictions and conducting a comprehensive performance evaluation combined with physical residual analysis.

3. Methodology

3.1. LSTM

LSTM is a type of recurrent neural network designed for sequence modeling. A primary advantage of LSTM is its ability to control the retention and updating of information through a gating mechanism at each time step, thereby mitigating the gradient vanishing problem common in standard recurrent neural networks on long sequences [27]. For an input vector $x_t \in \mathbb{R}^d$ at the $t$-th time step, the LSTM maintains a hidden state $h_t$ and a cell state $C_t$, updating them via an input gate, a forget gate, an output gate, and a candidate memory state $\tilde{C}_t$. Its structure is illustrated in Figure 2.
Mathematically, the internal state transitions shown in Figure 2 are governed by the following gated equations. First, the forget gate $f_t$ determines which information from the previous cell state $C_{t-1}$ should be discarded:
$$f_t = \sigma\left( W_f \cdot \left[ h_{t-1}, x_t \right] + b_f \right)$$
Next, the input gate $i_t$ and the candidate memory cell $\tilde{C}_t$ jointly determine the information to be stored:
$$i_t = \sigma\left( W_i \cdot \left[ h_{t-1}, x_t \right] + b_i \right)$$
$$\tilde{C}_t = \tanh\left( W_C \cdot \left[ h_{t-1}, x_t \right] + b_C \right)$$
The current cell state $C_t$ is updated by combining the previous state and the new candidate values:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Finally, the output gate $o_t$ and the final hidden state $h_t$ are computed as
$$o_t = \sigma\left( W_o \cdot \left[ h_{t-1}, x_t \right] + b_o \right)$$
$$h_t = o_t \odot \tanh\left( C_t \right)$$
where $\sigma(\cdot)$ denotes the sigmoid activation function, $\odot$ represents the Hadamard product (element-wise multiplication), and $W$ and $b$ are the learnable weight matrices and bias vectors, respectively.
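The gated equations above can be written directly as a single NumPy time step; this is an illustrative re-implementation for clarity, not the paper's training code (which uses PyTorch's built-in LSTM):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step implementing the gated equations above.
    W and b are dicts of weight matrices / bias vectors for the
    gates "f", "i", "C", and "o"."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])     # candidate memory
    C = f * C_prev + i * C_tilde               # Hadamard products
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    h = o * np.tanh(C)                         # new hidden state
    return h, C
```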

3.2. Governing Equations: Unsteady Thermal Balance ODE

To provide the necessary physical constraints without introducing high-complexity mechanism modeling or difficult-to-identify boundary conditions, this paper treats the hotspot region of the disconnector contact as a single-node thermal capacitance block [28]. Based on the first law of thermodynamics [2], the instantaneous energy balance of the disconnector contact is governed by the equilibrium between heat generation, environmental heat input, heat dissipation, and thermal storage:
$$\dot{E}_{in}(t) + \dot{E}_{gen}(t) - \dot{E}_{out}(t) = \dot{E}_{st}(t)$$
Specifically, the heat generation rate $\dot{E}_{gen}(t)$ is primarily derived from the Joule heating effect driven by the load current $I_t$. To adequately account for the electrothermal coupling, this paper employs a first-order linear approximation for the temperature-dependent contact resistance $R(T_{1,t})$:
$$R\left( T_{1,t} \right) \approx R_0 \left[ 1 + \alpha_R \left( T_{1,t} - T_{ref} \right) \right]$$
$$\dot{E}_{gen}(t) \approx I_t^2 R_0 \left[ 1 + \alpha_R \left( T_{1,t} - T_{ref} \right) \right]$$
where $T_{ref}$ is the reference temperature, $R_0$ is the reference resistance at $T_{ref}$, and $\alpha_R$ is the temperature coefficient of resistance.
The heat input $\dot{E}_{in}(t)$ from the environment is mainly attributed to the absorption of solar irradiance $G_t$, expressed as
$$\dot{E}_{in}(t) = \alpha A G_t + \xi_t$$
where $\alpha$ is the equivalent absorption coefficient, $A$ is the effective area, and $\xi_t$ is a perturbation term that accounts for unmodeled weak periodic disturbances or systematic biases. In the computational implementation, the mathematical expectation of $\xi_t$ is set to zero to prevent the neural network from absorbing fitting errors into $\xi_t$ and thereby weakening the physical constraints; consequently, $\xi_t$ is not treated as a free, learnable parameter.
The total heat loss rate $\dot{E}_{out}(t)$ is composed of multiple dissipation pathways: convective heat transfer $Q_{conv}$, radiative heat transfer $Q_{rad}$, evaporative heat loss $Q_{evap}$, and conduction loss $Q_{cond}$. Notably, the convective component incorporates a dynamic heat transfer coefficient $h(v_t)$ to model the enhancement effect of the time-varying wind speed $v_t$:
$$\dot{E}_{out}(t) = Q_{conv}(t) + Q_{rad}(t) + Q_{evap}(t) + Q_{cond}(t)$$
$$Q_{conv}(t) = h(v_t)\, A \left( T_{1,t} - T_{amb,t} \right)$$
$$h(v_t) = h_0 + a_1 v_t + a_2 RH_t$$
$$Q_{rad}(t) = \varepsilon \sigma A \left[ \left( T_{1,t} + 273.15 \right)^4 - \left( T_{amb,t} + 273.15 \right)^4 \right]$$
$$Q_{evap}(t) = k_{evap}\, v_t\, RH_t$$
$$Q_{cond}(t) = k_{cond} \left( T_{1,t} - T_{amb,t} \right)$$
where $\sigma$ is the Stefan–Boltzmann constant, $h_0$ is the baseline convection heat transfer coefficient, $a_1$ and $a_2$ are preset empirical coefficients, $\varepsilon$ is the equivalent emissivity, and $k_{evap}$ and $k_{cond}$ are the empirical evaporative heat dissipation and equivalent thermal conduction coefficients.
Finally, the rate of change in stored energy $\dot{E}_{st}(t)$ determines the instantaneous temperature variation rate, scaled by the equivalent heat capacity $C_1$:
$$\dot{E}_{st}(t) = C_1 \frac{dT_1(t)}{dt}$$
By substituting these detailed components back into the energy balance equation and rearranging terms, the overall governing ODE for the contact temperature dynamics is obtained, as presented in Equation (22):
$$\frac{dT_1(t)}{dt} = \frac{1}{C_1} \Big\{ I_t^2 R_0 \left[ 1 + \alpha_R \left( T_{1,t} - T_{ref} \right) \right] + \alpha A G_t + \xi_t - \left( h_0 + a_1 v_t + a_2 RH_t \right) A \left( T_{1,t} - T_{amb,t} \right) - \varepsilon \sigma A \left[ \left( T_{1,t} + 273.15 \right)^4 - \left( T_{amb,t} + 273.15 \right)^4 \right] - k_{evap}\, v_t\, RH_t - k_{cond} \left( T_{1,t} - T_{amb,t} \right) \Big\}$$
The subset of equivalent thermal parameters that is dynamically updated and recorded during the inference stage is defined as
$$\Theta_{phy} = \left\{ C_1, h_0, \varepsilon, \alpha, k_{evap}, k_{cond} \right\}$$
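For concreteness, the right-hand side of Equation (22) can be coded as a single function. The dictionary keys below are illustrative names for the symbols defined above, and the parameter values in any actual call would come from Table 2:

```python
import numpy as np

SIGMA_SB = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def thermal_rhs(T1, u, p):
    """Right-hand side F(T1, u; Theta_phy) of the governing ODE (Eq. 22).
    u: exogenous inputs (I, G, v, RH, T_amb); p: physical parameters.
    The perturbation term xi_t is taken as zero, per its zero expectation."""
    E_gen = u["I"] ** 2 * p["R0"] * (1.0 + p["alpha_R"] * (T1 - p["T_ref"]))
    E_in = p["alpha"] * p["A"] * u["G"]
    Q_conv = (p["h0"] + p["a1"] * u["v"] + p["a2"] * u["RH"]) * p["A"] * (T1 - u["T_amb"])
    Q_rad = p["eps"] * SIGMA_SB * p["A"] * ((T1 + 273.15) ** 4 - (u["T_amb"] + 273.15) ** 4)
    Q_evap = p["k_evap"] * u["v"] * u["RH"]
    Q_cond = p["k_cond"] * (T1 - u["T_amb"])
    return (E_gen + E_in - Q_conv - Q_rad - Q_evap - Q_cond) / p["C1"]
```

As a sanity check, with zero current, zero irradiance, calm air, and the contact at ambient temperature, every term vanishes and the predicted rate of change is zero.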

3.3. Construction of the Hybrid-PI-LSTM Model

This paper proposes a Hybrid-PI-LSTM model that fuses physical mechanisms with deep learning. The model consists of three main parts:
  • An LSTM-based temporal feature extraction and rolling prediction structure [29];
  • A composite loss function embedded with thermal balance constraints;
  • An APU mechanism for time-varying outdoor environments.
This section will elaborate on these three core modules.

3.3.1. Network Architecture Design

The Hybrid-PI-LSTM model first maps the original multi-dimensional features to a unified high-dimensional latent space through a fully connected (dense) pre-layer. Subsequently, the LSTM backbone performs temporal modeling on the encoded multi-factor sequence to obtain the temporal hidden state representation:
$$h_t = \mathrm{LSTM}_\theta \left( x_{t-W+1:t} \right)$$
The hidden state is then mapped to the next-step temperature prediction:
$$\hat{T}_{1,t+1} = g_\theta \left( h_t \right)$$
To align with online deployment conditions and characterize the impact of error propagation on long-term trends, this paper employs recursive rolling prediction to generate future multi-step sequences. For a prediction horizon $H$, it is defined as
$$\hat{T}_{1,t+k} = f_\theta \left( \hat{X}_{t+k-1} \right), \quad k = 1, 2, \ldots, H$$
where $\hat{X}_{t+k-1}$ is the updated input window:
$$\hat{X}_{t+k} = \mathrm{Update} \left( \hat{X}_{t+k-1},\; u_{t+k},\; \hat{T}_{1,t+k} \right)$$
where $u_{t+k}$ denotes the available exogenous inputs and $\hat{T}_{1,t+k}$ is the fed-back predicted temperature. It should be explicitly stated that, for practical online forecasting, the future covariates $u_{t+k}$ are assumed to be accessible via external operational systems prior to the prediction execution. Specifically, the environmental variables and load currents are anticipated to be provided by the grid’s existing meteorological and load forecasting systems, respectively.
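The recursive rolling scheme can be sketched as follows, with a placeholder one-step predictor standing in for the trained network and a simplified Update rule that appends each prediction to the known future covariates (the window layout is an assumption made for illustration):

```python
import numpy as np

def recursive_rollout(predict_next, X_t, covariates, H):
    """Recursive multi-step forecast: each predicted temperature is fed
    back into the input window together with the known covariates u_{t+k}.
    `predict_next` stands in for the trained one-step model f_theta;
    each window row is assumed to be [u, T] with temperature last."""
    window = X_t.copy()
    preds = []
    for k in range(H):
        T_hat = predict_next(window)                  # one-step prediction
        preds.append(T_hat)
        new_row = np.append(covariates[k], T_hat)     # [u_{t+k}, T_hat]
        window = np.vstack([window[1:], new_row])     # slide the window
    return np.array(preds)
```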
To construct the residual term, the predicted temperature change rate $\dot{\hat{T}}_{1,t+k}$ is first required, which can be numerically approximated using discrete differences:
$$\dot{\hat{T}}_{1,t+k} \approx \frac{\hat{T}_{1,t+k} - \hat{T}_{1,t+k-1}}{\Delta t}$$
where $\Delta t$ is the sampling interval. Subsequently, $u_{t+k}$ and the predicted temperature $\hat{T}_{1,t+k}$ are substituted into Equation (22) to obtain the physics-implied change rate $\dot{T}_{phy,t+k}$. Accordingly, the physical residual at each step is defined as
$$r_{t+k} = \dot{\hat{T}}_{1,t+k} - \dot{T}_{phy,t+k} = \dot{\hat{T}}_{1,t+k} - F \left( \hat{T}_{1,t+k}, u_{t+k}; \Theta_{phy} \right)$$
where $F(\cdot)$ coincides with the right-hand side of Equation (22).

3.3.2. Formulation of Physics-Informed Loss Function

The model employs a composite loss function that combines data and physical terms to jointly optimize the network parameters $\theta$ and the physical parameter set $\Theta_{phy}$. The total loss is defined as in [30]:
$$\mathcal{L}_{total} = \mathcal{L}_{data} + \lambda_{phy} \mathcal{L}_{phy}$$
where $\mathcal{L}_{data}$ penalizes the fitting error between the predicted and actual temperatures, while $\mathcal{L}_{phy}$ constrains the dynamic evolution of the predicted temperature to align with Equation (22). $\lambda_{phy}$ is a weighting coefficient that balances data fitting against physical consistency. The data loss term adopts the mean sequence-level error:
$$\mathcal{L}_{data} = \frac{1}{H} \sum_{k=1}^{H} \left\| \hat{T}_{1,t+k} - T_{1,t+k} \right\|_2^2$$
where $\hat{T}_{1,t+k}$ denotes the generated temperature prediction at the $k$-th step and $T_{1,t+k}$ the corresponding ground truth. The physical loss is built from the physical residual
$$r_{t+k} = \dot{\hat{T}}_{1,t+k} - F \left( \hat{T}_{1,t+k}, u_{t+k}; \Theta_{phy} \right)$$
and is defined as the mean square of the residuals:
$$\mathcal{L}_{phy} = \frac{1}{H} \sum_{k=1}^{H} \left\| r_{t+k} \right\|_2^2$$
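A compact sketch of the composite loss, combining the data term with the finite-difference physical residual of Equation (22). Here `T_phy_rate` stands for the precomputed values of F(·) along the predicted trajectory, and the names and defaults are illustrative:

```python
import numpy as np

def physics_informed_loss(T_pred, T_true, T_phy_rate, dt, lam_phy=1e-3):
    """Composite loss L_total = L_data + lambda_phy * L_phy.
    T_phy_rate[k] holds F(T_pred[k], u_k; Theta_phy); the predicted rate
    is a backward finite difference, so residuals exist for steps k >= 1."""
    L_data = np.mean((T_pred - T_true) ** 2)
    T_dot_pred = np.diff(T_pred) / dt            # (T_hat_k - T_hat_{k-1}) / dt
    r = T_dot_pred - T_phy_rate[1:]              # physical residual r_{t+k}
    L_phy = np.mean(r ** 2)
    return L_data + lam_phy * L_phy
```

In the actual model both terms are computed on tensors inside the training graph so that gradients flow back through the network; the NumPy version only illustrates the arithmetic.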

3.3.3. Adaptive Parameter Updating Algorithm

The online update is implemented via a finite-step optimization approach. Let $W_u$ denote the length of the update window. At a given trigger moment, the most recent $W_u$ samples are selected to form the update segment, where the corresponding ground-truth and predicted temperatures are denoted as $\{ T_{1,i} \}_{i=1}^{W_u}$ and $\{ \hat{T}_{1,i} \}_{i=1}^{W_u}$, respectively. The parameter re-estimation is achieved by minimizing the following objective function:
$$\Theta_{phy}^{upd,new} = \arg\min_{\Theta_{phy}^{upd}} J \left( \Theta_{phy}^{upd} \right)$$
$$J \left( \Theta_{phy}^{upd} \right) = \omega J_{data} + (1 - \omega) J_{phy} + \gamma J_{reg}, \quad \omega \in [0, 1]$$
where $\omega$ is a weighting coefficient and $\gamma$ is the regularization weight. The data term $J_{data}$ constrains the deviation between the predicted and actual values within the window:
$$J_{data} = \frac{1}{W_u} \sum_{i=1}^{W_u} \left\| \hat{T}_{1,i} - T_{1,i} \right\|_2^2$$
The physical term $J_{phy}$ encourages the predicted trajectory within the window to align with the thermal balance prior:
$$J_{phy} = \frac{1}{W_u} \sum_{i=1}^{W_u} \left\| r_i \right\|_2^2, \quad r_i = \dot{\hat{T}}_{1,i} - F \left( \hat{T}_{1,i}, u_i; \Theta_{phy} \right)$$
The regularization term $J_{reg}$ constrains the smoothness and numerical stability of the parameter updates, taking the form of a penalty on the deviation from the previous round’s result:
$$J_{reg} = \left\| \Theta_{phy}^{upd} - \Theta_{phy}^{upd,old} \right\|_2^2$$
To efficiently solve the nonlinear optimization problem defined above, this paper employs an iterative gradient-based updating strategy. A critical conceptual distinction must be made between simply treating these physical parameters as learnable weights during the offline training phase and the proposed online APU mechanism. If parameters (e.g., $h_0$, $\varepsilon$) are optimized only offline, they become fixed constants during inference. However, in real-world substations, these thermal properties are inherently non-stationary due to progressive contact surface oxidation, equipment aging, and seasonal microclimate shifts, so a model relying on static, offline-trained parameters is highly susceptible to concept drift. Therefore, unlike static physical models, the parameter set $\Theta_{phy}^{upd}$ is treated as a dynamic variable vector calibrated on the fly during inference. The update rule at the $k$-th iteration step follows the negative gradient of the total objective function $J$:
$$\Theta_{phy}^{(k+1)} = \Theta_{phy}^{(k)} - \eta \nabla_\Theta J \left( \Theta_{phy}^{(k)} \right)$$
where $\eta$ is the adaptive learning rate. The gradient $\nabla_\Theta J$ is composed of components corresponding to the data fitting, physical consistency, and regularization terms. Specifically, the gradient of the physical loss term $J_{phy}$ is derived using the chain rule:
$$\frac{\partial J_{phy}}{\partial \Theta_{phy}} = \frac{2}{W_u} \sum_{i=1}^{W_u} r_i \cdot \frac{\partial r_i}{\partial \Theta_{phy}} = -\frac{2}{W_u} \sum_{i=1}^{W_u} r_i \cdot S_i$$
Here, $S_i = \nabla_\Theta F \left( T_{1,i}, u_i; \Theta_{phy} \right)$ denotes the sensitivity matrix of the physical equation with respect to the thermal parameters. For instance, the sensitivity components for the convection coefficient $h_0$ and the equivalent heat capacity $C_1$ are explicitly
$$\frac{\partial F}{\partial h_0} = -\frac{A}{C_1} \left( T_{1,i} - T_{amb,i} \right)$$
$$\frac{\partial F}{\partial C_1} = -\frac{1}{C_1^2} \left( Q_{in} - Q_{out} \right)$$
This gradient-driven mechanism facilitates the dynamic correction of the physical parameters along the path of steepest descent to minimize the residual energy imbalance in the current observation window.
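One APU iteration can be sketched as a plain gradient step on the physical term plus the regularizer. The data term and the adaptive learning-rate logic are omitted for brevity, and the function name, weighting constants, and array layout are illustrative assumptions:

```python
import numpy as np

def apu_step(params, residuals, sensitivities, eta=0.05, gamma=0.1,
             params_old=None):
    """One gradient-descent iteration of the APU mechanism (simplified).
    residuals: r_i over the update window (shape (W_u,));
    sensitivities: rows S_i = dF/dTheta (shape (W_u, n_params)).
    grad J_phy = -(2 / W_u) * sum_i r_i * S_i, via the chain rule above."""
    W_u = len(residuals)
    grad_phy = -(2.0 / W_u) * (residuals[:, None] * sensitivities).sum(axis=0)
    ref = params_old if params_old is not None else params
    grad_reg = 2.0 * (params - ref)            # from J_reg = ||.||^2
    return params - eta * (grad_phy + gamma * grad_reg)
```

Called repeatedly (K iterations in Algorithm 1), this drives the parameters in the direction that reduces the residual energy imbalance over the current window.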

3.3.4. Algorithmic Implementation Procedure

Based on the constructed Hybrid-PI-LSTM architecture, the physics-consistent loss function, and the online adaptive updating mechanism, this section summarizes the complete computational workflow of the model. Algorithm 1 details the systematic steps from multi-dimensional feature input to physics-constrained training, and finally to the execution of physical parameter re-estimation during the testing phase. Crucially, to avoid look-ahead bias (data leakage), the testing phase is formulated as a causal online rolling process, where parameter updating at time t relies solely on historical observations to predict unknown future steps.
Algorithm 1. The pseudo-code of the proposed Hybrid-PI-LSTM modeling approach
1: Input: Multi-factor time-series measurements $X(t)$ of length $N$, including current, ambient temperature, humidity, wind speed, and solar irradiance, as well as the contact temperature $T(t)$.
2: Output: The trained Hybrid-PI-LSTM model parameters $\Theta$, and predictions $\hat{T}(t+1)$ with evaluation results.
3: Hyperparameters:
• Network architecture: 1 hidden layer, 32 neurons, ReLU activation, input window $L = 6$.
• Optimization: epochs $E$; batch size 32; learning rate $\eta = 1 \times 10^{-3}$.
• Physics parameters: initial values for $C_1, h_0, \varepsilon, \alpha, k_{evap}, k_{cond}$.
• Loss weights: physics-consistency loss weight $\lambda_{phy} = 1 \times 10^{-3}$.
• Adaptive thresholds: re-estimation error threshold $\epsilon_{err} = 1$; correction step $\gamma = 0.05$; update iterations $K = 50$.
4:Normalize input features. Construct sliding window sequences of length L . Partition dataset into training, validation, and testing subsets. Initialize learnable physical parameters and the dynamic correction network.
5:for epoch in range (1, E + 1 ) do
6:     for each batch do
7:      Generate state-dependent physical parameters and compute final temperature prediction T ^ .
8:      Compute Data Loss L d a t a , Physical Loss L p h y and Total loss L t o t a l   =   L d a t a + λ p h y L p h y .
9:      Compute gradient backpropagation and update model parameters Θ  using the Adam optimizer.
10:    end for
11:end for
12:Online Testing Phase:
13:for current time step t in the test set do
14:  Observation: Receive the newly arrived true temperature $T(t)$.
15:  Trigger Evaluation: Calculate the error $e_t = \left| T(t) - \hat{T}(t) \right|$ using the past prediction for step $t$.
16:    If $e_t > \epsilon_{err}$ or $t \bmod 60 = 0$ then
17:      APU Execution: Re-estimate the physical parameters using historical data within the window $[t - L, t]$
18:    end if
19:  Forecasting: Freeze the parameters and execute a recursive forward rollout to predict the unknown future trajectory $\hat{T}(t+1)$ to $\hat{T}(t+H)$
20:end for
21:Archive experimental configurations, export numerical results, and generate visualization plots for performance assessment.

4. Numerical Experiments and Performance Evaluation

4.1. Experimental Setup and Evaluation Metrics

4.1.1. Definition of Error Metrics

To comprehensively evaluate the model’s performance, this paper employs Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) as the primary accuracy indicators. The specific formulas are as follows:
$$MAE = \frac{1}{H} \sum_{k=1}^{H} \left| \hat{T}_{1,t+k} - T_{1,t+k} \right|$$
$$MSE = \frac{1}{H} \sum_{k=1}^{H} \left( \hat{T}_{1,t+k} - T_{1,t+k} \right)^2$$
$$RMSE = \sqrt{ \frac{1}{H} \sum_{k=1}^{H} \left( \hat{T}_{1,t+k} - T_{1,t+k} \right)^2 }$$
$$MAPE = \frac{100\%}{H} \sum_{k=1}^{H} \left| \frac{\hat{T}_{1,t+k} - T_{1,t+k}}{T_{1,t+k} + \delta} \right|$$
where $\delta$ is a negligibly small positive constant added to the denominator to prevent division by zero. In addition to standard accuracy metrics, this paper further investigates the growth curves of these indicators with respect to the prediction horizon $H$, analyzes their distribution patterns and temporal evolution, and examines the trajectory of the online equivalent thermal parameters $\Theta_{phy}^{upd}$ to assess their physical consistency.
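The four metrics can be computed in a few lines; δ guards the MAPE denominator as noted above, and the function name is illustrative:

```python
import numpy as np

def forecast_metrics(T_pred, T_true, delta=1e-8):
    """MAE, MSE, RMSE, and MAPE over a horizon of H steps."""
    err = T_pred - T_true
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(err / (T_true + delta)))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}
```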

4.1.2. Baseline Models and Hyperparameter Settings

To evaluate the effectiveness and comparative advantages of the proposed Hybrid-PI-LSTM in the online multi-step prediction task for the disconnector contact hotspot temperature $T_1$, this paper constructed an evaluation framework incorporating three distinct categories of baseline models. These baselines encompass purely mechanistic approaches, standard data-driven models, and advanced modern long sequence time-series forecasting (LSTF) architectures:
  • Purely Physics-Based Model: This paper implemented a Finite-Difference Model (FDM) based on the governing energy balance equations. Unlike the proposed APU mechanism, this purely mechanistic solver keeps key equivalent thermal parameters (e.g., heat capacity, convection coefficients) constant during the long-term inference stage, serving as a baseline to highlight the limitations of uncalibrated physics models in non-stationary environments.
  • Long-Horizon Sequence Models: To align with the latest advancements in time-series forecasting, this paper included Transformer-family and linear/MLP-centric models. Specifically, benchmarks were performed against Informer [31] (utilizing ProbSparse attention for long sequences), Autoformer [32] (featuring deep decomposition and auto-correlation mechanisms), PatchTST [33] (leveraging patching and channel-independence for performance forecasting), DLinear [34] (an efficient decomposition-linear model), and TSMixer [35] (an advanced lightweight MLP-based architecture).
  • Standard Deep Learning and Regression Models: This paper retained foundational architectures, including Plain LSTM (without physical constraints), Ridge Regression, Convolutional Neural Network (CNN) [36], and Gated Recurrent Unit (GRU), to serve as standard data-driven references and ablation baselines.
In terms of implementation, all models were built using the PyTorch (version 2.7.1) deep learning framework and optimized using the Adam optimizer. The hyperparameter configurations, detailed in Table 1, were determined through a combination of domain expertise and an empirical grid search evaluated on the validation set. Specifically, the sliding window length was set to W = 6 . Given the 2 min sampling interval, this corresponds to a 12 min observation horizon, which is intended to capture short-term transient dynamics while mitigating the introduction of excessive historical noise. To ensure temporal consistency, the value of W u used to compute the physical residual loss during the APU triggering phase was set equal to the model’s sliding observation window, with W u = W = 6 . For the network architecture, a lightweight structure comprising a single LSTM layer with 32 hidden units was selected. Grid search results indicated that this compact architecture is sufficient to extract temporal dependencies from the low-dimensional input features while helping to mitigate overfitting and minimize inference latency for edge deployment in substations. Finally, the loss-weighting coefficient was empirically tuned to λ p h y = 1 × 10−3 to balance the magnitude of the data-driven MSE loss and the physics-based residual loss, preventing either term from dominating the gradient during the backpropagation process. Other general training hyperparameters were jointly optimized using the grid search ranges specified in Table 1.
Furthermore, to facilitate the reproducibility of the proposed Hybrid-PI-LSTM, the initial values of the equivalent thermal parameters $\Theta_{phy}$ optimized by the APU mechanism are explicitly defined. These initial values were determined based on empirical estimations for standard disconnector materials (e.g., copper alloy) and typical outdoor environmental conditions, and are detailed in Table 2. For data pre-processing and time-feature encoding, all continuous meteorological and electrical input features, including the temporal identifiers (i.e., timestamps), were scaled using standard Min–Max normalization to map the values into the uniform range $[0, 1]$.

4.2. Comparative Analysis of Forecasting Accuracy

To systematically validate the performance of Hybrid-PI-LSTM in predicting substation disconnector contact temperatures, this paper compares the prediction performance of different models over future horizons of H ∈ {180, 360, 540, 720} time steps (corresponding to 6 to 24 h). To ensure a rigorous and comprehensive evaluation, this paper adopts a dual-perspective testing protocol. First, the global performance metrics (MAE, RMSE, MSE, MAPE) and the subsequent ablation studies are derived by averaging the forecasting results from multiple sliding windows sampled across the entire testing set. Second, since global statistical averaging may obscure temporal error drift and accumulation during continuous long-horizon forecasting, the error frequency distributions and temporal evolution analyses are obtained from a representative 720-step continuous rollout.
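The four global metrics can be computed per forecast window as follows (a standard NumPy sketch; MAPE is expressed in percent, matching the reported tables):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, RMSE, MSE, and MAPE (%) for one forecast window."""
    y_true = np.asarray(y_true, dtype=float)
    err = np.asarray(y_pred, dtype=float) - y_true
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        "MSE": mse,
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
    }
```

Averaging these dictionaries over all sliding windows of the testing set yields the global figures reported below.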

4.2.1. Statistical Analysis of Multi-Step Prediction Errors

Figure 3 illustrates the performance curves of the data-driven and hybrid models across MAE, RMSE, MSE, and MAPE metrics as the prediction horizon H increases. Additionally, Table 3 presents the forecasting errors of the physics-based FDM, which are tabulated separately due to their significantly larger error magnitudes.
From a horizontal comparison perspective, Hybrid-PI-LSTM generally achieves lower forecasting errors across all prediction horizons. In short-term prediction ( H = 180 ), the Hybrid-PI-LSTM achieves an MAPE of 0.22% and an RMSE of 0.097. In contrast, the purely data-driven Plain LSTM yields an MAPE of 0.689% and an RMSE of 0.236. Notably, even the advanced LSTF models (such as PatchTST, Informer, and Autoformer) report initial RMSEs of around 0.22 to 0.26. This suggests that the incorporation of thermodynamic priors may help the model capture temperature evolution more effectively than relying solely on purely data-driven sequence mappings.
This advantage becomes more apparent as the prediction horizon extends. At H = 720 , the RMSE of Hybrid-PI-LSTM remains relatively stable at 0.283, with an MAPE of 0.653%. Meanwhile, classic models like CNN and GRU exhibit substantial error accumulation, with RMSEs exceeding 0.66. While the LSTF models demonstrate stronger robustness against sequence degradation—maintaining RMSEs between 0.30 and 0.33—they do not explicitly incorporate the mechanistic constraints of the proposed method. Specifically, compared to the Plain LSTM, the proposed method reduces RMSE by 41.8%, MSE by 66.1%, MAE by 47.6%, and MAPE by 48.9% at the longest horizon.
From a longitudinal trend perspective, observing the FDM in Table 3 reveals a notable limitation: its RMSE increases from 0.315 at H = 180 to 1.511 at H = 720 . This drift suggests that fixed-parameter mechanistic models may have limited adaptability to time-varying outdoor meteorological disturbances, indicating that synergizing deep sequence extraction with an APU constraint helps mitigate error drift, thereby contributing to improved stability over long time domains.

4.2.2. Temporal Stability and Error Distribution Analysis

To explore the sources of performance differences more deeply, this paper further investigates the temporal dynamic evolution of prediction errors. Figure 4 presents two groups of 3D ridge plots, illustrating the error trajectories for classical baselines (Figure 4a) and modern LSTF architectures (Figure 4b).
Observations from the classical model group reveal that GRU and CNN exhibit pronounced “wave-like” oscillations along the time axis. Such high-amplitude fluctuations indicate that traditional black-box models may be more susceptible to local extrema under non-stationary environmental changes. Even the widely adopted LSTM, despite a lower overall error baseline, shows progressive error amplification over time, suggesting difficulty in maintaining long-sequence stability.
More critically, the advanced LSTF models also demonstrate certain limitations in capturing the thermal dynamics. Transformer-based architectures (Autoformer and Informer) exhibit noticeable error oscillations, with ridge lines fluctuating considerably and remaining relatively distant from the zero plane. Meanwhile, TSMixer, PatchTST, and DLinear manage to reduce the overall error magnitude; however, they still display sharp periodic peaks and a gradual upward baseline drift as the prediction horizon extends. This observation suggests that even advanced purely data-driven sequence mappings may encounter challenges in maintaining temporal consistency without mechanistic guidance.
In contrast, the error ridge line of the proposed Hybrid-PI-LSTM remains relatively close to the zero plane in both comparison groups. It exhibits no obvious abrupt peaks, cumulative divergence, or baseline drift throughout the entire prediction period. This comparatively stable trajectory suggests that integrating thermal balance constraints with the APU mechanism may help alleviate certain vulnerabilities of black-box models, thereby contributing to stability and generalization robustness in long-sequence thermal forecasting.
Further analysis of the full-horizon prediction curves in Figure 5a,b reveals the dynamic limitations of the baseline models during non-stationary thermal shifts. At the critical high-temperature interval (steps 350–450), the ground truth reaches an extremum of approximately 41.2 °C.
As observed in the classical models (Figure 5a), LSTM and CNN exhibit severe amplitude attenuation, peaking at approximately 40.1 °C (a systemic negative bias of >1.0 °C). Conversely, GRU distinctly overshoots the peak. Even the advanced LSTF architectures (Figure 5b) struggle to track the true trajectory. TSMixer exhibits the most severe overshooting (exceeding 41.8 °C), accompanied by anomalous high-frequency fluctuations throughout the prediction horizon. Autoformer presents a similar overshooting and oscillating characteristic, failing to capture the smooth thermodynamic decay after step 500. On the other hand, Informer, PatchTST, and DLinear significantly underestimate the peak amplitude.
A closer examination of the inflection points reveals that all purely data-driven models exhibit a pronounced phase lag of 10–20 time steps. This phase mismatch and amplitude deviation directly cause the intense “wave-like” error accumulations observed in Figure 4, suggesting that pure black-box models react to, rather than predict, rapid thermodynamic changes.
In contrast, the proposed Hybrid-PI-LSTM closely tracks the ground-truth trajectory across the entire 720-step horizon. It maintains a maximum instantaneous error within 0.2 °C at the critical peak, exhibiting minimal overshoot, amplitude attenuation, or phase lag. This tracking indicates that embedding the thermal balance constraints and APU mechanism helps constrain the hypothesis space of the neural network. By forcing the model to adhere to intrinsic thermodynamic laws rather than merely fitting delayed statistical correlations, the hybrid approach helps alleviate the dynamic limitations of traditional black-box models in long-term recursive forecasting.
Numerical analysis of the statistical distribution features further reveals the reliability of the predictions. As shown in the error boxplots in Figure 6, the error distribution of the proposed Hybrid-PI-LSTM is highly concentrated. Its Interquartile Range (IQR) is confined to a narrow interval of [−0.3, 0.0] °C, with the median nearly coincident with the zero axis and no extreme outliers observed. Conversely, classical baseline models tend to exhibit either severe systematic underestimation (e.g., CNN’s median at −0.6 °C) or substantial error dispersion (e.g., GRU’s extreme outliers, which reach +1.4 °C).
While the advanced LSTF architectures reduce the extreme outliers seen in classical models, they still exhibit noticeable systematic biases. Specifically, the Transformer-based models (Informer, Autoformer, PatchTST) all shift their error distributions toward the negative direction (medians around −0.3 °C), indicating a persistent tendency toward underestimation. Conversely, DLinear presents a distinct positive bias (overestimating the temperature), whereas TSMixer shows the most dispersed IQR among the advanced LSTF baselines, suggesting weaker robustness under non-stationary dynamics.
The error frequency distribution statistics in Table 4 support these observations. The errors of Hybrid-PI-LSTM are highly concentrated, with over 50% of samples falling within the narrow [−0.3, −0.1) °C range and no occurrences in peripheral extreme intervals (e.g., beyond ±0.7 °C). In contrast, classical models exhibit substantial “long-tail” and “divergent” characteristics. Furthermore, the frequency data reveal the epistemic biases of the advanced LSTF models: Informer concentrates over 97% of its prediction errors in the negative domain [−0.7, −0.1) °C, while DLinear distributes over 72% of its errors into the positive domain.
This wide-ranging and systematically biased frequency distribution reflects the epistemic uncertainty of purely data-driven sequence modeling. While modern architectures reduce the overall error variance, they may still learn biased statistical mappings when lacking physical mechanism constraints. By embedding thermodynamic priors and the APU mechanism, Hybrid-PI-LSTM helps mitigate these systemic drifts, significantly reducing extreme outliers and achieving a more reliable, zero-centered predictive distribution.

4.2.3. Performance Evaluation Under Realistic Forecast Inputs

The preceding evaluations utilized true future covariates to establish the theoretical performance limits of the models by isolating sequence-mapping capabilities from external input uncertainties. However, to rigorously validate the practical generalization of the proposed framework, an additional experiment was conducted using realistic forecast inputs.
In this supplementary evaluation, the true future covariates within the testing set were entirely substituted with 24 h day-ahead prediction data directly acquired from the operational systems of the power supply bureau, encompassing both their meteorological and load forecasting modules. To objectively assess the models’ robustness against input uncertainties, the proposed Hybrid-PI-LSTM was compared with two advanced Transformer-based architectures, PatchTST and Autoformer. The comprehensive error metrics across various prediction horizons are presented in Table 5.
As observed in Table 5, the introduction of inherent forecasting errors from the external systems naturally leads to a slight performance degradation across all models compared to the ideal true-input scenario. Nevertheless, the Hybrid-PI-LSTM maintains relatively stable performance, achieving an RMSE of 0.296 and an MAE of 0.230 at the H = 720 horizon. In contrast, the purely data-driven advanced baselines, PatchTST and Autoformer, exhibit greater sensitivity to input disturbances, yielding RMSEs of 0.331 and 0.361, respectively. This evidence suggests that the incorporation of thermodynamic constraints and the APU mechanism may help restrict the hypothesis space, thereby enhancing the model’s structural robustness and reducing error accumulation when subjected to practical forecasting uncertainties.

4.3. Ablation Study on Constraint Mechanisms

To evaluate the specific contributions of the “physical constraint term” and the “parameter-adaptive updating mechanism” in multi-step prediction, this paper constructed a three-tiered ablation study and compared model performance under identical test conditions:
  • w/o Physics: Removes all physical constraints, degenerating into a purely data-driven LSTM model.
  • w/ Static-Phys: Introduces thermal balance residual constraints but maintains constant thermal parameters during inference.
  • Hybrid-PI-LSTM: The complete model proposed in this paper, including physical constraints and the online parameter-adaptive re-estimation mechanism.
Figure 7 displays the error evolution curves for different prediction horizons H across the ablation variants. The results reveal a clear hierarchical progression. A comparison between w/o Physics and w/ Static-Phys shows that introducing static physical constraints yields consistent performance improvements across all prediction horizons; notably, at H = 360, MAE drops from 0.341 to 0.262 and MAPE from 0.992% to 0.733%. The Hybrid-PI-LSTM, incorporating the parameter-adaptive mechanism, further extends this advantage, reducing MAE to 0.190 and enhancing predictive precision. As the prediction horizon extends to H = 540 and H = 720, the error of w/ Static-Phys begins to show a clear upward trend, indicating that, even when optimally trained offline, fixed parameters struggle to adapt to cumulative drift over long periods in non-stationary environments. In contrast, the Hybrid-PI-LSTM remains stable. In the long-term prediction at H = 720, the complete model's RMSE is 0.283, a 35% reduction compared to the RMSE of 0.435 yielded by the w/ Static-Phys model. This demonstrates that the online update mechanism, i.e., dynamically adjusting equivalent thermal parameters during the inference phase rather than relying solely on offline training, plays a critical role in mitigating divergence in long-term forecasting.
Figure 8 compares the histograms of physical residual distributions for the w/ Static-Phys and Hybrid-PI-LSTM models. While both models generally center their residuals around zero (indicating that physical constraints maintain the prediction trajectory at a magnitude consistent with the thermal balance equation), the Hybrid-PI-LSTM exhibits higher kurtosis. The majority of its residuals are compressed into a noticeably narrower range, indicating a higher degree of conformity between its predicted trajectory and the thermal balance equation.
Figure 9 further illustrates the dynamic evolution of physical residuals over time. In the w/ Static-Phys model, the physical residual r_phy shows a continuous positive shift during the temperature rise interval from step 300 to step 450. This implies that the fixed-parameter model cannot explain the full thermal dynamic changes during this period, resulting in systemic bias. Conversely, the residual curve of the Hybrid-PI-LSTM model fluctuates near the zero axis with reduced trend drift. From a mechanistic viewpoint, this explains why the Hybrid-PI-LSTM maintains robust prediction stability even at long horizons: it compensates in real time, via parameter updates, for the physical model bias caused by environmental changes.

4.4. Computational Complexity and Edge Deployment Feasibility

While the proposed APU mechanism introduces an iterative gradient-based optimization loop during the inference stage, it is crucial to evaluate its computational cost to ensure its feasibility for real-time edge deployment in actual substations. To quantify this, this paper conducted a comprehensive runtime analysis simulating a continuous online rolling prediction scenario. The inference latency was measured on a standard workstation equipped with an Intel Core i5 CPU, 16 GB of RAM, and an NVIDIA RTX 4060 Ti GPU.
As detailed in Table 6, purely data-driven baselines—including both classical networks (e.g., LSTM at 0.297 ms/step) and modern LSTF architectures (e.g., PatchTST at 0.586 ms/step)—exhibit low inference latency, typically taking less than 0.6 ms per forward pass. In comparison, the Hybrid-PI-LSTM, which activates the APU algorithm to iteratively compute physical residuals and execute backward gradient corrections, takes an average of 8.101 ms per step.
Although the APU inherently increases the computational cost relative to static baselines, this latency must be contextualized within the actual industrial operating environment. According to power grid operation and maintenance experts, the data sampling interval for disconnector condition monitoring systems is generally set between 1 and 2 min. Given the 2 min (120,000 ms) sampling interval adopted in the utilized real-world dataset, an inference latency of 8.101 ms occupies only 0.00675% of the available time window between consecutive samples. This relatively low computational occupancy rate suggests that the proposed optimization loop is unlikely to cause data congestion or real-time processing bottlenecks. Consequently, the Hybrid-PI-LSTM framework shows strong potential for lightweight, real-time edge computing deployment on industrial IoT gateways in substations.
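The occupancy figure quoted above follows directly from the measured latency and the sampling interval:

```python
# Fraction of the 2 min sampling window consumed by one APU-enabled
# inference step (values taken from the runtime analysis above).
latency_ms = 8.101
interval_ms = 2 * 60 * 1000          # 2 min sampling interval in ms
occupancy_pct = 100.0 * latency_ms / interval_ms
print(f"{occupancy_pct:.5f} %")      # ≈ 0.00675 %
```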

4.5. Parameter Identification and Sensitivity Analysis

To distinguish the proposed framework from the “black-box” nature of traditional deep learning models, this section aims to examine the internal working mechanism of Hybrid-PI-LSTM by visualizing the online evolution trajectories of key thermal parameters. Table 7 presents the results of the parameter re-estimation performed every 60 time steps during the test sequence.
Observing this time-varying parameter sequence reveals that C_1, k_evap, and k_cond gradually increase with update steps, while α shows a monotonic decrease. Meanwhile, h_0 and ε exhibit a non-monotonic trend of initially increasing and subsequently decreasing. From the perspective of physical consistency, the continuous increase in C_1 may reflect that, as the heating duration extends, heat gradually diffuses from the contact surface to distal metal components such as connecting rods and conductive arms. The online update mechanism helps compensate for the thermal inertia of components unmodeled in the single-node model by increasing the equivalent heat capacity. Similarly, the evolution of h_0 likely corresponds to transformations of the local wind field; the model mitigates the deviation of static empirical formulas under specific operating conditions by adjusting the baseline heat dissipation capability.
It is worth noting that the smoothness of the aforementioned parameter evolution trajectories supports the effectiveness of the regularization term J_reg in the cost function. In summary, this “dynamic physical parameter optimization” capability is a key factor enabling the Hybrid-PI-LSTM to mitigate systemic bias and maintain dynamic consistency in long-term prediction.
To evaluate the sensitivity of the APU algorithm to parameter initialization, a +20% perturbation was applied to the initial values of key parameters (setting C_1 = 360 and h_0 = 12). The corresponding parameter re-estimation results are presented in Table 8. Comparing Table 7 (baseline) with Table 8 (perturbed), it is evident that the equivalent parameters evolve along entirely distinct trajectories depending on their starting points. However, as illustrated in Figure 10, despite these diverging parameter paths, the multi-step forecasting error metrics (MAE and RMSE) across various prediction horizons remain nearly identical for both scenarios.
This phenomenon reveals a crucial characteristic of the proposed framework: the APU algorithm exhibits favorable robustness against initialization variance. Rather than strictly converging to a single set of absolute values, the gradient-based APU dynamically estimates equivalent parameter combinations that adequately satisfy the thermal dynamic constraints. Consequently, the model facilitates stable and reproducible prediction results despite the introduced initial parameter biases.
In addition to initialization robustness, a further sensitivity analysis was conducted on three algorithmic hyperparameters: the update window size W_u, the number of update iterations K, and the error threshold ϵ_err. The evaluation was performed at the forecasting horizon H = 720.
Regarding the window size, varying W_u ∈ {3, 6, 12} yielded RMSEs of 0.294, 0.283, and 0.288, respectively. This suggests that the default W_u = 6 provides a reasonable balance between retaining sufficient historical physics context and smearing high-frequency transients. For the optimization iterations, testing K ∈ {20, 50, 100} resulted in RMSEs of 0.301, 0.283, and 0.282, respectively. These results indicate that the gradient-based parameter update generally stabilizes within 50 iterations, with minimal accuracy gains observed at higher values.
Furthermore, varying the emergency safety threshold ϵ_err ∈ {0.8, 1.0, 1.2} yielded identical RMSE values of 0.283. This indicates that the regular 60-step time-driven update was sufficient to suppress cumulative drift, keeping the absolute error consistently below the safety threshold. Consequently, the ϵ_err mechanism functions primarily as a supplementary safeguard against unmodeled anomalies, rather than as a driver of routine updates.
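Under the hyperparameters above, one APU re-estimation can be sketched as a small gradient-descent loop on the windowed physical residual (a simplified NumPy version: the `residual_fn` interface, learning rate, and the finite-difference gradient are assumptions for this self-contained sketch; the paper's implementation backpropagates through the ODE residual instead):

```python
import numpy as np

def apu_update(theta, residual_fn, lr=0.2, K=50, eps=1e-5):
    """Re-estimate equivalent thermal parameters theta by K gradient
    steps on the mean squared physical residual over the window W_u.

    residual_fn(theta) -> residual vector over the W_u-step window
    (hypothetical interface for this sketch).
    """
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(K):
        base = np.mean(residual_fn(theta) ** 2)
        grad = np.zeros_like(theta)
        for i in range(theta.size):      # forward-difference gradient
            t = theta.copy()
            t[i] += eps
            grad[i] = (np.mean(residual_fn(t) ** 2) - base) / eps
        theta -= lr * grad
    return theta
```

In deployment, such a loop would be triggered by the regular 60-step schedule (or by the ϵ_err emergency condition), with the returned parameters substituted into the physical constraint for the subsequent horizon.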

5. Conclusions and Limitations

5.1. Conclusions

This paper investigates the fundamental challenges of predicting disconnector contact temperatures under non-stationary environmental and operational conditions. An analysis of the dynamic failure modes of purely data-driven sequence models demonstrates that the absence of thermodynamic constraints tends to cause temporal phase lags and cumulative epistemic biases in long-horizon forecasting.
To overcome these limitations, this paper proposes the Hybrid-PI-LSTM framework. Rather than simply utilizing physical equations as a static offline penalty, this paper highlights the critical necessity of online parameter adaptation. The proposed APU mechanism treats equivalent thermal parameters as dynamic variables, continuously calibrating them via gradient descent during the inference phase. This methodology effectively restricts the neural network’s hypothesis space and helps mitigate the impact of concept drift caused by unmodeled environmental changes and equipment aging. Consequently, the framework achieves centered error convergence and physics-informed parameter evolution trajectories over extended prediction horizons, offering a mechanistically constrained alternative to traditional black-box deep learning in power system condition monitoring.

5.2. Limitations and Future Work

Despite the demonstrated advantages, the proposed framework has specific limitations that will guide future research:
  • Inference Cost, Model Fidelity, and Deployment Constraints: The APU inherently introduces a gradient-based optimization loop during inference. While the runtime analysis confirms that the 8.1 ms latency is minimal for the 2 min sampling interval under the current lumped-parameter ODE constraint, this computational overhead could pose a bottleneck in two specific scenarios. First, the additional computation may become prohibitive if the framework is scaled to ultra-high-frequency transient monitoring tasks (e.g., kHz-level sampling). Second, the gradient computation within the APU loop would exponentially increase the inference latency if the physical constraints are expanded to encompass high-fidelity spatial PDEs for 3D thermal field analysis. Balancing high-fidelity multi-physics constraints with edge-device computational limits remains a critical direction for future lightweight deployments.
  • Optimization Stability under Extreme Physical Degradation: The APU is effective at tracking continuous, slowly varying concept drifts (e.g., progressive surface oxidation or seasonal microclimate shifts). However, in scenarios involving sudden, discontinuous physical damage (e.g., abrupt contact fracture or catastrophic sensor failure), the gradient-based residual re-estimation might encounter non-convex optimization challenges. In such extreme edge cases, the algorithm could experience temporary parameter instability before convergence. Future work will explore integrating bounded-optimization solvers or physical-rule-based fallback mechanisms to enhance algorithmic robustness under extreme fault transients.

Author Contributions

Conceptualization, X.L. and L.Y.; methodology, X.L.; formal analysis, X.L. and L.Y.; investigation, L.Y. and Z.Z.; resources, X.Z. and Y.C.; writing—original draft preparation, X.L.; writing—review and editing, L.Y.; supervision, X.Z. and Y.C.; project administration, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of Guangdong Power Grid Co., Ltd., grant number 031400KC23070002 (Research on Power Equipment Overheating Trend Prediction System Based on Neural Dynamic Learning Methods).

Data Availability Statement

To facilitate reproducibility and enable algorithmic verification by other researchers, the core source code of the Hybrid-PI-LSTM framework, along with an anonymized and truncated sample data subset, has been made publicly available on GitHub at [https://github.com/Luoxuesong1/Hybrid-PI-LSTM, accessed on 10 March 2026]. The full-scale raw operational dataset is subject to strict confidentiality agreements with the power grid corporation and is available from the corresponding author upon reasonable and justified request.

Acknowledgments

The authors gratefully acknowledge the technical support provided by the Science and Technology Project of Guangdong Power Grid Co., Ltd.

Conflicts of Interest

Xinwei Zhang and Yuhong Chen were employed by Meizhou Power Supply Bureau, Guangdong Power Grid Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The Meizhou Power Supply Bureau, Guangdong Power Grid Corporation and Science and Technology Project of Guangdong Power Grid Co., Ltd. had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figure 1. Methodological framework. Hybrid-PI-LSTM: Hybrid Physics-Informed Long Short-Term Memory.
Figure 2. Schematic diagram of the LSTM unit structure.
Figure 3. Performance comparison of different models across varying prediction horizons H. (a) MAE; (b) MAPE; (c) MSE; (d) RMSE. CNN: Convolutional Neural Network; GRU: Gated Recurrent Unit.
Figure 4. 3D ridge plot comparison of prediction error evolution over time for different models. (a) Classical baselines; (b) modern long short-term forecasting (LSTF).
Figure 5. Comparison of multi-step prediction curves. (a) Classical baselines; (b) modern LSTF.
Figure 6. Boxplot statistics of prediction error distributions for different models.
Figure 7. Ablation study of physical constraints and adaptive update mechanisms on prediction errors across different horizons. (a) MAE; (b) MAPE; (c) MSE; (d) RMSE.
Figure 8. Histograms of physics-based residual distributions comparing the effect of parameter adaptation. (a) w/ Static-Phys; (b) Hybrid-PI-LSTM.
Figure 9. Comparison of dynamic evolution trajectories of physics-based residuals over time.
Figure 10. Comparison of forecasting error metrics across different prediction horizons under baseline and perturbed initializations: (a) MAE; (b) RMSE.
Table 1. Model hyperparameters.
Parameter | Range
Hidden Units | 16~64
Layers | 1~3
Initial Learning Rate | 1 × 10−4 ~ 1 × 10−3
Training Epochs | 100~2000
Batch Size | 32~128
Window Length | 6
λ_phy | 1 × 10−3
Table 2. Initial values.
Parameter | Value
C_1 | 300
h_0 | 10
ε | 0.3
α | 0.5
k_evap | 1.5
k_cond | 2.0
Table 3. Finite-Difference Model forecasting errors. MAE: Mean Absolute Error; MAPE: Mean Absolute Percentage Error; MSE: Mean Squared Error; RMSE: Root Mean Square Error.
H | MAE | MAPE (%) | MSE | RMSE
180 | 0.281 | 0.939 | 0.100 | 0.316
360 | 0.774 | 2.102 | 1.405 | 1.185
540 | 1.260 | 3.310 | 2.923 | 1.710
720 | 1.079 | 2.895 | 2.285 | 1.512
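For reference, the four error metrics defined in the caption of Table 3 can be computed as in the sketch below. This is a minimal NumPy-based illustration; the function name and the sample values are illustrative, not taken from the paper.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """MAE, MAPE (%), MSE, and RMSE between observed and predicted series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)  # assumes y_true has no zeros
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    return mae, mape, mse, rmse

# Illustrative values only:
mae, mape, mse, rmse = forecast_metrics([10.0, 20.0, 30.0], [11.0, 19.0, 33.0])
# mae ≈ 1.667, mape ≈ 8.333, mse ≈ 3.667, rmse ≈ 1.915
```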
Table 4. Frequency distribution of prediction errors.
Error Interval | Hybrid-PI-LSTM | LSTM | Ridge Regression | CNN | GRU | DLinear | PatchTST | TSMixer | Informer | Autoformer
[−1.5, −1.3)0002000000
[−1.3, −1.1)00080000000
[−1.1, −0.9)010102000000
[−0.9, −0.7)0595122000002
[−0.7, −0.5)6019714115002702774
[−0.5, −0.3)145325368303128024379256
[−0.3, −0.1)36913087461590272134292187
[−0.1, 0.1)1468164398179951902282
[0.1, 0.3)00140122176249441570115
[0.3, 0.5)0046913420026804
[0.5, 0.7)00460857005500
[0.7, 0.9)0096093104200
[0.9, 1.1)0042053003800
[1.1, 1.3)0031038001000
[1.3, 1.5)001302900200
Table 5. Performance comparison of forecasting models under realistic prediction inputs.
Model | MAE (H = 180 / 360 / 540 / 720) | MAPE (%) (H = 180 / 360 / 540 / 720)
Hybrid-PI-LSTM | 0.074 / 0.205 / 0.266 / 0.230 | 0.240 / 0.571 / 0.711 / 0.628
PatchTST | 0.196 / 0.304 / 0.282 / 0.299 | 0.642 / 0.853 / 0.761 / 0.837
Autoformer | 0.224 / 0.261 / 0.281 / 0.316 | 0.745 / 0.772 / 0.844 / 0.893
Model | RMSE (H = 180 / 360 / 540 / 720) | MSE (H = 180 / 360 / 540 / 720)
Hybrid-PI-LSTM | 0.097 / 0.260 / 0.333 / 0.296 | 0.009 / 0.067 / 0.111 / 0.088
PatchTST | 0.227 / 0.341 / 0.329 / 0.331 | 0.051 / 0.116 / 0.109 / 0.110
Autoformer | 0.266 / 0.303 / 0.347 / 0.361 | 0.071 / 0.092 / 0.120 / 0.130
Table 6. Comparison of average inference latency and computational occupancy rate per prediction step across different models.
Model | Average Latency (ms/Step) | Occupancy Rate
Hybrid-PI-LSTM | 8.101 | 0.00675%
LSTM | 0.297 | 0.00025%
Ridge Regression | 0.055 | 0.00005%
CNN | 0.173 | 0.00014%
GRU | 0.223 | 0.00019%
DLinear | 0.349 | 0.00029%
PatchTST | 0.586 | 0.00049%
TSMixer | 0.332 | 0.00028%
Informer | 0.370 | 0.00031%
Autoformer | 0.362 | 0.00030%
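Per-step latencies such as those in Table 6 can be obtained with straightforward wall-clock timing. The sketch below shows one common measurement pattern (a warm-up phase followed by averaged repeats); the function name and the protocol are assumptions for illustration, not necessarily the benchmarking setup used in the paper.

```python
import time

def avg_latency_ms(predict, inputs, warmup=10, repeats=100):
    """Average wall-clock latency of one call to `predict`, in milliseconds."""
    for _ in range(warmup):          # warm-up excludes one-off setup costs (JIT, caches)
        predict(inputs)
    t0 = time.perf_counter()
    for _ in range(repeats):
        predict(inputs)
    return (time.perf_counter() - t0) * 1000.0 / repeats
```

An occupancy rate would additionally normalize this latency by the time available per control cycle, a quantity the table does not restate.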
Table 7. Adaptive update trajectories of effective thermal parameters during online inference.
Step | C_1 | h_0 | ε | α | k_evap | k_cond
60 | 409.89 | 11.930 | 0.4418 | 0.3176 | 1.2011 | 2.9077
120 | 418.14 | 12.169 | 0.4615 | 0.2977 | 1.2210 | 2.9663
180 | 426.56 | 12.412 | 0.4810 | 0.2777 | 1.2410 | 3.0261
240 | 435.15 | 12.660 | 0.5006 | 0.2578 | 1.2609 | 3.0871
300 | 443.91 | 12.915 | 0.5205 | 0.2379 | 1.2809 | 3.1493
360 | 452.85 | 13.175 | 0.5403 | 0.2179 | 1.3008 | 3.2128
420 | 461.97 | 13.440 | 0.5602 | 0.1980 | 1.3207 | 3.2776
480 | 471.28 | 13.603 | 0.5733 | 0.1780 | 1.3407 | 3.3376
540 | 480.77 | 13.336 | 0.5535 | 0.1581 | 1.3606 | 3.2718
600 | 490.45 | 13.075 | 0.5338 | 0.1381 | 1.3806 | 3.2073
660 | 500.33 | 12.819 | 0.5141 | 0.1182 | 1.4005 | 3.1441
720 | 510.41 | 12.578 | 0.4957 | 0.0983 | 1.4205 | 3.0833
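The trajectories in Tables 7 and 8 are produced by the APU mechanism, which the abstract describes as gradient-based iterative estimation of equivalent physical coefficients from observational residuals. As a minimal sketch of one such update, assume a simplified lumped thermal ODE C·dT/dt = q_in − h·(T − T_amb); this ODE form, the two-parameter subset (C, h), and all names below are illustrative assumptions, not the authors' full six-parameter model.

```python
import numpy as np

def apu_step(theta, T, T_amb, q_in, dt, lr=1e-2):
    """One gradient step on the residual of C * dT/dt = q_in - h * (T - T_amb).

    theta = (C, h): effective heat capacity and convection coefficient.
    T, T_amb, q_in are observation windows sampled at interval dt.
    """
    C, h = theta
    dTdt = np.diff(T) / dt                    # finite-difference time derivative
    Tm, Ta, q = T[:-1], T_amb[:-1], q_in[:-1]
    r = C * dTdt - (q - h * (Tm - Ta))        # ODE residual at each sample
    # Gradients of the loss 0.5 * mean(r^2) w.r.t. C and h
    grad_C = np.mean(r * dTdt)
    grad_h = np.mean(r * (Tm - Ta))
    return C - lr * grad_C, h - lr * grad_h
```

Iterating such steps during inference drives the physics residual toward zero, which is the sense in which the tabulated coefficients track slowly drifting operating conditions.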
Table 8. Adaptive update trajectories of effective thermal parameters under +20% initialization perturbation (C_1 = 360, h_0 = 12).
Step | C_1 | h_0 | ε | α | k_evap | k_cond
60 | 453.75 | 14.042 | 0.4116 | 0.3030 | 1.2726 | 2.8508
120 | 462.88 | 14.324 | 0.4313 | 0.2838 | 1.2925 | 2.9083
180 | 472.19 | 14.611 | 0.4511 | 0.2639 | 1.3125 | 2.9669
240 | 481.70 | 14.905 | 0.4708 | 0.2440 | 1.3324 | 3.0267
300 | 491.40 | 15.205 | 0.4907 | 0.2241 | 1.3523 | 3.0877
360 | 501.28 | 15.511 | 0.5105 | 0.2041 | 1.3723 | 3.1500
420 | 511.37 | 15.823 | 0.5304 | 0.1842 | 1.3922 | 3.2135
480 | 521.65 | 16.142 | 0.5503 | 0.1643 | 1.4121 | 3.2782
540 | 532.15 | 16.467 | 0.5702 | 0.1443 | 1.4321 | 3.3443
600 | 542.86 | 16.798 | 0.5900 | 0.1244 | 1.4520 | 3.4118
660 | 553.79 | 17.136 | 0.6098 | 0.1045 | 1.4719 | 3.4805
720 | 564.94 | 17.478 | 0.6294 | 0.0845 | 1.4919 | 3.5505

Share and Cite

Luo, X.; Yang, L.; Zhang, X.; Chen, Y.; Zhang, Z. Physics-Informed LSTM with Adaptive Parameter Updating for Non-Stationary Time Series: A Case Study on Disconnector Health Monitoring. Mathematics 2026, 14, 970. https://doi.org/10.3390/math14060970