Causal-Enhanced LSTM-RF: Early Warning of Dynamic Overload Risk for Distribution Transformers

Bai, Hao; Liu, Yipeng; Zheng, Yawen; Dong, Ming; Ding, Qiaoyi; Wang, Hao

doi:10.3390/en19051354

Open Access Article


by Hao Bai ¹, Yipeng Liu ¹, Yawen Zheng ¹, Ming Dong ²,*, Qiaoyi Ding ¹ and Hao Wang ²

¹ Electric Power Research Institute, China Southern Power Grid, Guangzhou 510620, China

² School of Electrical Engineering, Xi’an Jiaotong University, Xi’an 710049, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(5), 1354; https://doi.org/10.3390/en19051354

Submission received: 27 January 2026 / Revised: 24 February 2026 / Accepted: 27 February 2026 / Published: 7 March 2026

(This article belongs to the Special Issue Artificial Intelligence and Machine Learning Applications in Electric Power and Energy Systems)


Abstract

Extreme weather events have become more frequent, and electricity consumption patterns have become more complex. These changes increase the risk of overload in distribution transformers (DTs), which threatens the stability and reliability of the power grid. Existing methods have significant limitations. Traditional static threshold methods (based on DGA gas ratios and electrical signal thresholds) fail to consider temporal changes and complex links between factors, while modern machine learning models lack temporal cause–effect relationships and explicit uncertainty quantification. Motivated by these gaps, this paper proposes a causal-enhanced hybrid framework that combines Long Short-Term Memory (LSTM) networks and Random Forest (RF) algorithms. The framework uses causal Seasonal-Trend decomposition using Loess (STL) to reveal load patterns at different time scales. The mutual information index and a spatiotemporal graph convolutional network (ST-GCN) are used to explore nonlinear relations and reveal how temperature affects load changes. The LSTM model captures time dependence in load series, and a Bayesian-optimized Random Forest addresses data imbalance and quantifies uncertainty. In addition, the framework constructs an early warning system that fuses multi-source data in real time. Test results show that the proposed framework performs well in multi-source data environments, reaching an overall state-classification accuracy of 93.18%.

Keywords:

distribution transformer; analysis of relationship; dynamic risk early warning; LSTM networks with RF

1. Introduction

In recent years, extreme weather events have occurred more often and power consumption patterns have become more complex, increasing the risk of severe overloading in distribution transformers (DTs). These changes create major challenges for reliable grid operation. DTs are key components of the distribution network, and overloading accelerates their ageing: insulation strength gradually weakens and the rate of thermal ageing increases. These effects shorten the service life of the equipment and increase the chance of local power outages [1]. In some cases, these failures spread to nearby areas, reducing overall grid reliability and lowering service quality for users [2,3]. The risk becomes more serious during peak demand periods, such as extreme heat, extreme cold, and public holidays, when transformers often operate under heavy load or even overload [4].

Traditional overload warning systems for DTs mainly rely on a single fixed threshold based on load ratio or top-oil temperature. These static methods, which are grounded in fixed current or power limits (e.g., IEEE Std C57.91), show clear limitations in real operating conditions [5]. In particular, they do not describe the nonlinear thermal response of transformers well, and they fail to consider the combined effects of weather factors, temporal patterns, and social activities on load behavior [6,7]. Such systems adapt poorly to changing environments, so they often produce false alarms or miss real overload risks, especially when the weather changes rapidly or load levels shift suddenly.

To address these challenges, recent studies have proposed various hybrid and data-driven approaches for transformer overload risk assessment. For example, Wang et al. (2023) developed an LSTM-XGBoost combined model specifically for short-term heavy overload forecasting [1]. Pentsos et al. (2025) introduced a hybrid LSTM-Transformer model for power load forecasting [4], and Miraki et al. (2024) employed explainable causal graph neural networks for electricity demand forecasting [2]. Rafati et al. (2024) proposed a dedicated overload alarm prediction framework [6]. Liu et al. (2024) presented a spatiotemporal distribution prediction method for transformer heavy overload based on fine-grained data mining [8], and Li et al. (2024) developed a real-time early warning method using multimodal data fusion [9]. Broadly, these methods fall into two groups. The first builds adaptive warning systems that use feature selection and clustering to enable dynamic load monitoring. The second uses multi-source data fusion, combining weather data with operational data to provide a more complete view of risk [10]. However, current machine learning models still show clear weaknesses. Many lack an explicit analysis of external factors such as weather and social activity, which limits their interpretability [11]. Furthermore, they do not fully capture the temporal dependencies in distribution network structures [12,13]. In addition, most models do not provide reliable uncertainty estimates, especially when overload samples are rare and datasets are highly imbalanced [14,15].

Recent hybrid overload-forecasting models usually rely on post-hoc correlation without true causal decomposition, seldom link predictive uncertainty directly to operational thresholds, and perform poorly when overload samples are rare and data are imbalanced. To address these limitations, this study proposes a Causal-Enhanced LSTM–RF Hybrid Framework for the early assessment of transformer overload risk. The approach applies Causal Seasonal-Trend Decomposition (Causal STL) to reveal load behavior at different time scales. The framework also uses a spatiotemporal graph convolutional network (ST-GCN) to characterize how temperature changes drive sudden load increases. Finally, it combines Long Short-Term Memory (LSTM) networks with a Bayesian-optimized Random Forest (RF) model to handle imbalanced data and produce probability-based risk estimates.

2. Materials and Methods

2.1. Causal STL Decomposition and Load Impact Analysis

This study employs a causal STL decomposition approach to examine the causal relationships between meteorological variables and load and to reveal the underlying impact mechanisms. The method extends conventional STL decomposition by incorporating causal inference, which allows causal links between distinct time series components to be identified. Causal STL is preferred over standard STL followed by post-hoc tests because it imposes causal constraints within the iterative loops, so that components are disentangled along causal directions. This identification assumes series stationarity, no omitted meteorological factors, and strict temporal precedence, in which weather drivers precede load responses. STL itself is a technique based on locally weighted regression (Loess) that separates a time series into seasonal, trend, and residual components. Its core algorithm relies on the interplay between an inner loop, which updates the components, and an outer loop, which computes robustness weights. STL decomposition is well suited to this task because it handles stochastic fluctuations and multi-scale temporal dependencies in electric utility data [16,17]. By decoupling the complex transformer load into interpretable trend and seasonal components, the framework can more accurately isolate the impact of external drivers such as temperature, which is essential for robust causal analysis in power systems. Each pass k of the inner loop consists of the following steps.

(1) The seasonal component is updated. The series is first detrended to obtain D_v^{(k)} = Y_v − T_v^{(k)}, which is divided into cycle subseries according to the seasonal period (e.g., 96 points for a daily cycle at 15-min resolution). Each subseries is smoothed by Loess regression, and the smoothed subseries are recombined into a preliminary seasonal sequence C_v^{(k+1)}. C_v^{(k+1)} is then passed through three consecutive low-pass filters (a moving average over the seasonal period, a three-point moving average, and Loess smoothing), yielding the filtered sequence L_v^{(k+1)}. Finally, the seasonal component is obtained as S_v^{(k+1)} = C_v^{(k+1)} − L_v^{(k+1)}, which removes the low-frequency content that leaked into C_v^{(k+1)}.

(2) The trend component is updated. The series is deseasonalized by subtracting S_v^{(k+1)} from Y_v, and the resulting sequence Y_v − S_v^{(k+1)} is smoothed by Loess regression to produce the updated trend component T_v^{(k+1)}.

(3) The residual is computed. At the end of each inner-loop pass, the remainder R_v = Y_v − T_v^{(k+1)} − S_v^{(k+1)} is calculated and used to assign weights in the outer loop.

The outer loop performs robust weight estimation. A residual threshold h = 6·median(|R_v|), i.e., six times the median absolute residual, is set, and robustness weights are computed with the bisquare function in Equation (1).

B(\mu) = \begin{cases} (1 - \mu^{2})^{2}, & 0 \leq \mu < 1 \\ 0, & \mu \geq 1 \end{cases}, \qquad \rho_{v} = B\left(|R_{v}| / h\right)

(1)

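Subsequently, the neighborhood weights used by the Loess regressions in the inner loop are multiplied by the robustness weights, v_i^{new}(x) = ρ_i v_i^{old}(x), where ρ_i is the robustness weight of the i-th observation, so that observations with large residuals have less influence in the next pass.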
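The outer-loop weighting of Equation (1) can be illustrated with a short Python sketch. It is not the authors' implementation; the function names are hypothetical, and the fallback to uniform weights when h = 0 is an assumption of this sketch, since the text does not cover that case.

```python
import statistics


def bisquare(u: float) -> float:
    """Bisquare function B(u) of Equation (1)."""
    return (1.0 - u * u) ** 2 if 0.0 <= u < 1.0 else 0.0


def robustness_weights(residuals: list[float]) -> list[float]:
    """Outer-loop weights rho_v = B(|R_v| / h) with h = 6 * median(|R_v|)."""
    abs_res = [abs(r) for r in residuals]
    h = 6.0 * statistics.median(abs_res)
    if h == 0.0:
        # Assumption of this sketch: degenerate case, keep all points at full weight.
        return [1.0] * len(residuals)
    return [bisquare(a / h) for a in abs_res]


def update_local_weights(local: list[float], rho: list[float]) -> list[float]:
    """Scale the inner-loop Loess neighborhood weights by the robustness weights."""
    return [v * p for v, p in zip(local, rho)]


residuals = [0.1, -0.2, 0.15, 5.0, -0.1]
rho = robustness_weights(residuals)  # h is about 0.9, so the 5.0 outlier gets weight 0
```

Points whose residual exceeds h drop out of the next inner-loop pass entirely, while moderately large residuals are only down-weighted, which is what makes the decomposition robust to sporadic load spikes.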

Through the iterative and coordinated interaction of the inner and outer loops, the STL decomposition ultimately yields three distinct components: the trend component T_v, the seasonal component S_v, and the residual component R_v.

In this study, the causal STL decomposition method is initially applied to decompose the load data into trend, seasonal, and residual components. Subsequently, weather factors are integrated into the analysis, and Granger causality tests are conducted to examine the causal relationships between these factors and each load component. This framework enables a comprehensive analysis of how weather factors influence load fluctuations, specifically accounting for temporal lag effects [18].

2.2. Nonlinear Structural Modeling and Construction of Factor Networks

Electrical load and external driving variables, including meteorological factors, often demonstrate pronounced nonlinear interdependencies and exhibit intricate structural properties across both temporal and spatial dimensions. To effectively capture the structured dependencies among multiple influencing factors, this research utilizes an association modeling framework grounded in mutual information and conditional independence testing to develop a factor network. Additionally, a spatiotemporal graph convolutional network is proposed to represent the spatiotemporal propagation dynamics of factor influences.

First, mutual information is applied to assess the strength of nonlinear association between the electrical load and each driving factor. For two random variables X and Y, mutual information is defined in Equation (2).

I (X; Y) = \sum_{x, y} p (x, y) \log \frac{p (x, y)}{p (x) p (y)}

(2)

In this equation, X denotes the electrical load and Y denotes a meteorological factor. p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are their marginal distributions. Mutual information quantifies the amount of information shared by the two variables; it is non-negative, equals zero only when they are independent, and larger values indicate stronger dependence.

In practice, probability density estimation is employed to handle continuous variables, facilitating the numerical computation of mutual information. Crucially, while mutual information quantifies the strength of association between variables, it does not imply causality.
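The text does not specify which density estimator is used. A minimal sketch, assuming simple equal-width histogram binning (a swapped-in choice; kernel density and k-nearest-neighbor estimators are common alternatives), computes the plug-in form of Equation (2) in nats. The bin count and the synthetic series are hypothetical.

```python
import math
import random
from collections import Counter


def _bin(values: list[float], bins: int) -> list[int]:
    """Equal-width binning of a continuous series into indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: list[float], y: list[float], bins: int = 10) -> float:
    """Plug-in estimate of I(X; Y) in nats: Equation (2) applied to binned data."""
    bx, by = _bin(x, bins), _bin(y, bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # p(x,y) * log(p(x,y) / (p(x) p(y))) written with counts: c/n * log(c*n / (n_x*n_y))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


random.seed(1)
t_max = [random.uniform(20.0, 35.0) for _ in range(365)]
load = [2.0 * t + random.gauss(0.0, 1.0) for t in t_max]  # temperature-driven load
noise = [random.gauss(0.0, 1.0) for _ in range(365)]  # unrelated factor
mi_temp = mutual_information(load, t_max)
mi_noise = mutual_information(load, noise)  # small but positive (plug-in bias)
```

Because the plug-in estimate is biased upward for independent variables, a significance threshold (for example via permutation and the FDR correction mentioned in Section 3.2) is needed before an edge is kept in the factor network.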

Building on this, the PC algorithm, based on conditional independence testing, is used to refine the association structure by imposing constraints on factor interactions. Integrating causal discovery into factor network construction helps the model identify stable physical-logical dependencies, thus improving its generalization [19]. Given the structured dependencies and temporal dynamics of electrical load, a spatiotemporal graph is adopted to systematically organize the network of factors. Within this framework, graph convolution operations capture structural relationships, while temporal sequencing models dynamic effects. Notably, this spatiotemporal graph modeling serves primarily as a structured feature representation method rather than a standalone predictive model; its outputs are high-level features that inform subsequent forecasting and risk evaluation.

2.3. The LSTM–RF Hybrid Forecasting Model

Electrical load time series exhibit strong temporal dependencies and are influenced by multiple nonlinear factors simultaneously. Conventional single-model approaches often struggle to capture both the temporal dynamics and the complex nonlinear relationships among features. To address these limitations, this study proposes a hybrid forecasting framework that combines an LSTM network with an RF model [18]. This approach uses a hierarchical modeling strategy to integrate temporal sequence features with structured information from driving factors [20]. Furthermore, an uncertainty modeling mechanism is incorporated to enhance forecast robustness, particularly under abnormal conditions such as heavy loads and overloads.

Regarding the model architecture, the LSTM network models the temporal dependencies inherent in the load sequence. Denote the load time series as {y_t}_{t=1}^{T}. Through the coordinated action of the input, forget, and output gates, the LSTM network selectively retains relevant historical information. The state update of the LSTM is described by Equation (3).

h_{t} = LSTM (y_{t - 1}, h_{t - 1})

(3)

In this equation, h_t is the hidden state vector at time t, which characterizes the dynamic evolution of the load sequence over time. The hidden states generated by the LSTM network capture both short-term fluctuations and long-term trend variations.

Subsequently, the hidden state features derived from the LSTM are concatenated with the higher-order features produced by the structured factor modeling module described above, and the combined feature set is used as input to the RF model. By constructing and aggregating multiple decision trees, the RF model captures complex nonlinear relationships and is robust to anomalous data points. The RF prediction is given by Equation (4).

y_{t}^{\mathrm{RF}} = \frac{1}{N} \sum_{i = 1}^{N} f_{i}(h_{t}, z_{t})

(4)

In this equation, f_i(·) is the output of the i-th decision tree, N is the total number of trees, and z_t is the structured factor feature vector.

To enhance model robustness under heavy-load, overload, and extreme operating scenarios, this study adopts an uncertainty modeling approach based on Bayesian principles. Specifically, the RF output serves as the predictive mean, while a Gaussian process (GP) models the prediction residuals, thereby quantifying predictive uncertainty. From this, confidence intervals can be derived to characterize output uncertainty, so that the model provides both load point forecasts and their corresponding confidence intervals [19].

The predictive uncertainty, quantified through residual modeling, directly provides the necessary bounds for the dynamic threshold formulation in the early warning module. Consequently, the uncertainty-aware forecasting and the adaptive thresholding mechanism are tightly integrated within a unified framework [20].

2.4. Uncertainty Modeling and Dynamic Threshold-Based Early Warning

In early warning frameworks for DTs, the operational state is determined by the combination of load ratio and duration, rather than by instantaneous load magnitude alone. Short-term fluctuations typically cause minimal thermal stress; however, prolonged periods of elevated load significantly increase the risk of operational failure. Consequently, transformer operating states are defined by integrating both load ratio and duration, as outlined in Table 1.

Traditional static thresholds based solely on rated capacity fail to account for load variability and forecasting uncertainties [21]. To enhance the sensitivity and reliability of early warning systems, a dynamic threshold approach is adopted. In this framework, warning criteria are adaptively adjusted based on predicted load conditions and their associated uncertainties.

Let C represent the rated capacity of the distribution transformer, y_t^RF denote the short-term load forecast mean, and [y_t^RF⁻, y_t^RF⁺] signify the predictive uncertainty interval. The dynamic threshold is formulated as detailed in Equation (5).

\theta_{t} = C - \lambda \left(y_{t}^{\mathrm{RF}+} - y_{t}^{\mathrm{RF}-}\right)

(5)

In this Equation, λ is an uncertainty adjustment coefficient that modulates the system’s sensitivity to forecasting errors. As predictive uncertainty increases, the dynamic threshold correspondingly decreases, thereby facilitating the early detection of potential heavy overload risks. Conversely, when forecast stability improves, the threshold gradually reverts to the rated capacity level, minimizing the occurrence of unwarranted warnings.

Building on this dynamic threshold, a hierarchical early warning decision framework is established. An initial alert is issued when the predicted load mean y_t^RF approaches the threshold θ_t. A more severe risk warning is triggered if the upper bound of the predictive interval, y_t^RF⁺, exceeds θ_t. This enables potential risks to be identified before the actual load exceeds the capacity limit, giving operators adequate time to implement preventive measures.
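The dynamic threshold of Equation (5) and the two-level alert rule can be sketched as follows. The paper does not define when the mean "approaches" θ_t, so the `approach_margin` parameter (5% by default) is a hypothetical choice, as are the capacity and interval values in the example.

```python
from enum import Enum


class Alert(Enum):
    NONE = 0
    INITIAL = 1  # predicted mean is close to the dynamic threshold
    SEVERE = 2  # upper bound of the predictive interval exceeds the threshold


def dynamic_threshold(capacity: float, lower: float, upper: float, lam: float = 1.0) -> float:
    """Equation (5): theta_t = C - lambda * (y_t^{RF+} - y_t^{RF-})."""
    return capacity - lam * (upper - lower)


def classify_alert(mean: float, lower: float, upper: float, capacity: float,
                   lam: float = 1.0, approach_margin: float = 0.05) -> Alert:
    """Two-level rule of Section 2.4; approach_margin is a hypothetical parameter."""
    theta = dynamic_threshold(capacity, lower, upper, lam)
    if upper > theta:
        return Alert.SEVERE
    if mean >= (1.0 - approach_margin) * theta:
        return Alert.INITIAL
    return Alert.NONE


# Hypothetical 400 kVA transformer: a wide interval lowers theta_t to 340 kVA,
# so a mean forecast of 360 kVA (below rated capacity) already triggers SEVERE.
print(classify_alert(mean=360.0, lower=330.0, upper=390.0, capacity=400.0))
```

Raising `lam` (for example to 1.5, as discussed in Section 3.3) lowers θ_t further for the same interval width, which trades more alarms for fewer missed overloads.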

2.5. Design of Comparative Validation Experiments

The load data used in this research was sourced from a regional power grid company. The experimental dataset comprises two primary components: high-resolution load profiles and synchronized meteorological records. Specifically, the load dataset contains active power measurements from distribution transformers in a coastal city of southern China, recorded at 15-min intervals (96 points per day), spanning from January 2020 to January 2021. To account for causal environmental drivers, a concurrent meteorological dataset provides five daily parameters: maximum, minimum, and average temperatures, relative humidity, and rainfall. The temporal coverage allows for a robust analysis of seasonal trends, long-term demand dynamics, and overload incidents, providing a reliable foundation for assessing the early warning system.

To simulate realistic operational scenarios and prevent data leakage, a chronological data partitioning strategy was adopted. Historical data was allocated for training, while subsequent data was reserved for validation. Given the scarcity of overload events in the dataset, the experimental design prioritizes the identification of risk states over traditional point forecasting accuracy.

The framework was evaluated on operational state classification. Transformer conditions were categorized into normal, heavy load, and overload states using predicted load values and adaptive thresholds. Evaluation metrics included precision, recall, and F1 score for each category, alongside confusion matrix analysis to identify misclassification patterns. These metrics directly measure the system’s effectiveness in detecting risk conditions. For comparison, a conventional LSTM model was implemented as a baseline using identical input features and data splits, so that the contribution of the proposed causality-enhanced hybrid framework to overload risk identification, particularly on imbalanced datasets, can be assessed directly.
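A minimal sketch of the per-class evaluation described above (confusion matrix, precision, recall, and F1), written with the Python standard library. The state labels and the toy sequences are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
from collections import Counter


def confusion_matrix(y_true: list[str], y_pred: list[str], labels: list[str]) -> list[list[int]]:
    """Rows are true states, columns are predicted states."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(y_true: list[str], y_pred: list[str],
                      labels: list[str]) -> dict[str, tuple[float, float, float]]:
    """Precision, recall and F1 for each state (0.0 when a denominator is zero)."""
    out = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[c] = (precision, recall, f1)
    return out


labels = ["normal", "heavy", "overload"]
y_true = ["normal"] * 4 + ["heavy"] * 3 + ["overload"] * 2
y_pred = ["normal", "normal", "normal", "heavy", "heavy", "heavy", "normal", "overload", "heavy"]
cm = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(y_true, y_pred, labels)
```

Per-class scores matter here because overall accuracy is dominated by the abundant normal state; a model can score well overall while still missing most of the rare overload samples.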

3. Results

3.1. Multi-Scale Load Decomposition and Granger Causality Analysis

To elucidate the underlying structural characteristics governing transformer load dynamics, the STL method was employed to disaggregate the load time series, recorded at 15-min intervals, into trend, seasonal, and residual components, as illustrated in Figure 1. This decomposition effectively reveals the multi-scale framework that shapes load evolution and the emergence of overload risk.

The trend component shows smooth and continuous changes over the year, and it reflects long-term load growth and overall climate conditions. This part sets the basic operating level of the transformer, so it mainly affects the chance of long-lasting high load states. The seasonal component describes repeated time patterns, including daily and weekly cycles, which cause regular load changes around the trend level. The residual contains fast and irregular changes, reflecting short-term load variations.

To measure the importance of each time scale, this study calculates and normalizes the variance of each STL component. As presented in Table 2, the residual component contributes the largest share of total variance, while the trend component ranks second, and the seasonal component contributes the least. The strong influence of the residual part indicates high load variability at a 15-min time step [22]. At this scale, short-term operational changes and user behavior play a major role in load variation.
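The normalized variance shares can be computed as in the sketch below. It assumes normalization by the sum of the component variances (the text does not say whether the total series variance is used instead; the two differ when components are correlated), and the toy component values are hypothetical.

```python
import statistics


def variance_shares(components: dict[str, list[float]]) -> dict[str, float]:
    """Variance of each STL component divided by the sum of component variances."""
    variances = {name: statistics.pvariance(s) for name, s in components.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


shares = variance_shares({
    "trend": [0.0, 1.0, 2.0, 3.0],
    "seasonal": [1.0, -1.0, 1.0, -1.0],
    "residual": [2.0, -2.0, 0.0, 0.0],
})
```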

The Granger test shows a clear causal link between the residual component of temperature and the residual component of load at short time lags (Table 3). Short-term weather changes therefore provide useful predictive information for fast load changes. In contrast, the lower-frequency decomposed temperature components show weak causal effects on slow load changes. This suggests that weather mainly affects load through fast responses, such as the use of temperature-sensitive devices and quick changes in user behavior, whereas low-frequency load trends mainly depend on long-term demand growth and structural conditions.

3.2. Nonlinear Structural Modeling and Factor Network Analysis

To study nonlinear links between daily average load and weather factors, this study builds a nonlinear influence network based on mutual information. The network shows a sparse structure, which means only a few weather factors have strong nonlinear links with the daily load. Figure 2 depicts the daily nonlinear influence network after significance filtering. Among all tested factors, only maximum temperature and minimum temperature show clear nonlinear links with the load [23].

Table 4 summarizes the nonlinear influence strength and network degree of each weather factor. The mutual information results show that minimum temperature has a slightly stronger link with load than maximum temperature. In contrast, average temperature, humidity, and rainfall do not show significant effects at the daily scale. This difference suggests that daily load responds more strongly to low-temperature conditions, because heating demand often lasts for long periods during cold days. The results also show a strong mutual link between maximum and minimum temperatures, which reflects the overall thermal conditions that shape daily load behavior.

Overall, the daily scale nonlinear network provides a clear structure of the main weather drivers. This structure supports later feature selection and multi-scale modeling in the forecasting framework. In this study, humidity and precipitation were specifically excluded from the final feature set based on a rigorous statistical screening. Although these factors are often considered in load analysis, our results showed that their mutual information with the target load fell below the significance threshold after False Discovery Rate (FDR) correction. This ensures that the model remains parsimonious and focuses exclusively on statistically robust causal drivers.

3.3. Integrated Load Forecasting and Overload Early Warning Model

Building on the demonstrated impact of weather factors on load behavior, this section presents an integrated model that performs load prediction and overload warning simultaneously. Conventional approaches treat forecasting and overload assessment as separate stages; in contrast, the proposed framework merges the two tasks into one unified model that links predicted load values directly to operating states, which supports clear decision making. This design improves consistency and reduces information loss between steps. The model uses weather variables as its main inputs, because the earlier analysis confirms their strong causal effect on load changes. A nonlinear prediction model first processes these inputs and generates short-term load forecasts. The predicted load series is then the only input for overload state judgment, so the decision path remains simple and easy to interpret.

Load behavior changes over time, so the model introduces a dynamic threshold scheme. These thresholds adjust according to weather correction factors and past operating data. This process allows the model to distinguish normal load, overload, and severe overload under different conditions. The model compares predicted load values with these adaptive thresholds. Based on this comparison, the model outputs probability-based load states instead of fixed yes-or-no results. This approach better reflects real operating uncertainty and supports safer early warning decisions.

Before model training, the input data were processed to match the requirements of the classification model. The validation data come from one transformer in a mixed residential and commercial area. The data cover one full year, and they include load and matching weather records. The dataset contains 106,176 samples in total. The study assigns 70% of the data to training and 30% to testing. After preprocessing, data completeness reaches 99.2%, and only 0.8% of points appear as outliers. The preprocessing step fills missing values through interpolation, so the final data quality meets analysis needs.
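The two preprocessing steps described above, gap filling and the 70/30 split, might look like the sketch below. Linear interpolation is an assumption (the text only says "interpolation"), and the split keeps time order, consistent with the chronological partitioning in Section 2.5; the function names are hypothetical.

```python
import bisect


def interpolate_missing(series: list) -> list:
    """Fill None gaps linearly; leading/trailing gaps copy the nearest valid value.

    Assumes at least one value in `series` is not None.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of the first known point after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def chronological_split(samples: list, train_fraction: float = 0.7) -> tuple[list, list]:
    """Earlier samples train, later samples test; no shuffling, so no look-ahead leakage."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


filled = interpolate_missing([1.0, None, 3.0, None, None, 6.0, None])
train, test = chronological_split(list(range(106_176)))  # sample count from the text
```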

Based on the defined heavy overload states, this study builds a confusion matrix, as shown in Figure 3. Table 5 reports the accuracy, recall, and F1 score. The overall accuracy reaches 93.18%, which confirms strong state identification under normal operating conditions. The model maintains high prediction accuracy for all load states. The F1 score further shows balanced performance across categories. These results confirm that the model provides a strong and reliable assessment of heavy overload risk in distribution transformers.

The parameter λ is the core indicator for adjusting the sensitivity of overload warnings, directly corresponding to the risk preference of power grid operation. With the default setting λ = 1.0, the model achieves an accuracy of 0.9938 and a recall of 0.8869 for overload conditions. If operators are more concerned about the safety risks caused by missed overload warnings, λ can be increased to 1.5, which can capture more potential overload points and improve the recall rate. Experimental data show that even with different parameter values, the F1 score for overload identification consistently remains above 0.9, demonstrating the model’s strong robustness. These results are summarized in Table 6.
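As a consistency check on the reported figures, assume that the 0.9938 "accuracy" reported for the overload class at λ = 1.0 is the per-class precision (the metric listed in Section 2.5; this reading is ours). The corresponding F1 score is then the harmonic mean of precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Overload class at lambda = 1.0, reading 0.9938 as precision (our assumption).
f1_overload = f1_score(0.9938, 0.8869)
print(f"{f1_overload:.4f}")  # 0.9373
```

The result, about 0.937, is in line with the statement that the overload F1 score stays above 0.9.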

Figure 4 further compares the classification accuracy of the proposed model and a standard LSTM model. The comparison between the two methods shows clear gains in all key metrics. The improved algorithm raises overall classification accuracy by 1.86%, increases heavy load state recognition accuracy by 14.21%, and improves overload state recognition accuracy by 28.17%. These improvements strengthen the reliability of equipment state assessment in real operations.

To test how well the model transfers to complex scenarios, the study uses load and temperature time series from six typical substations in one province of the State Grid. The data cover the period from April 2022 to September 2023. The experiment splits the data by time. It uses the first 17 months, from April 2022 to August 2023, for training, while it uses September 2023 for testing.

The experiment applies strict conditions. The input features include only temperature data, and the sample distribution remains unbalanced. Figure 5 shows the evaluation results under these limits. The results indicate that limited weather inputs reduce short-term load prediction accuracy compared with the benchmark. However, the dynamic overload warning model still provides useful risk predictions. These findings confirm that the optimized system maintains good predictive ability and adapts well to different operating scenarios.

To further validate the reliability of the uncertainty estimates produced by the Gaussian process residual module, the prediction intervals were evaluated against the actual load observations. For a nominal confidence level of 95%, the model achieved a Prediction Interval Coverage Probability (PICP) of approximately 0.967. This indicates that the constructed intervals capture the true load values in 96.7% of the test cases, slightly exceeding the target coverage, which indicates good, if slightly conservative, calibration. The Mean Prediction Interval Width (MPIW) was approximately 9.8% of the transformer’s rated capacity.
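PICP and MPIW as used above follow their standard definitions. A minimal sketch, with hypothetical observations, interval bounds, and a hypothetical rated capacity used to express MPIW as a fraction of capacity:

```python
from typing import Optional


def picp(y: list[float], lower: list[float], upper: list[float]) -> float:
    """Prediction Interval Coverage Probability: share of observations inside their interval."""
    hits = sum(lo <= v <= hi for v, lo, hi in zip(y, lower, upper))
    return hits / len(y)


def mpiw(lower: list[float], upper: list[float], capacity: Optional[float] = None) -> float:
    """Mean Prediction Interval Width, optionally as a fraction of rated capacity."""
    width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
    return width / capacity if capacity else width


observed = [100.0, 120.0, 150.0, 90.0]
lower = [90.0, 110.0, 155.0, 80.0]
upper = [110.0, 130.0, 170.0, 100.0]
coverage = picp(observed, lower, upper)  # 3 of 4 observations fall inside
width_frac = mpiw(lower, upper, capacity=200.0)  # 18.75 / 200
```

The two metrics pull in opposite directions: widening intervals always raises PICP, so coverage should be read together with MPIW, which is why the text reports both.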

To evaluate the sensitivity of the warning threshold and its impact on operational risk, a Precision-Recall (PR) curve was generated for the overload state by scaling the decision threshold from 0.8× to 1.2×, as illustrated in Figure 6. The results indicate that the default threshold is positioned at the ‘elbow’ of the curve, yielding an optimal balance between precision and recall. Beyond this point, further increasing the recall leads to a rapid decline in precision to 0.8645, which increases false alarms. Conversely, moving to a 1.2× threshold significantly reduces the capture of latent risks.
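A threshold sweep of the kind behind Figure 6 can be sketched as below, assuming an alarm fires when the predicted load exceeds the scaled threshold s·θ_t. The data, the fixed θ_t, and the function name are hypothetical; precision is set to 0.0 when no alarm fires, a convention assumed here.

```python
def pr_at_scale(pred: list[float], is_overload: list[bool], theta: list[float],
                scale: float) -> tuple[float, float]:
    """Precision and recall when an alarm fires for pred > scale * theta."""
    alarms = [p > scale * th for p, th in zip(pred, theta)]
    tp = sum(a and o for a, o in zip(alarms, is_overload))
    fp = sum(a and not o for a, o in zip(alarms, is_overload))
    fn = sum(o and not a for a, o in zip(alarms, is_overload))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


pred = [300.0, 330.0, 350.0, 370.0, 390.0, 410.0, 460.0]
is_overload = [False, False, False, True, True, True, True]
theta = [380.0] * len(pred)
curve = {s: pr_at_scale(pred, is_overload, theta, s) for s in (0.8, 0.9, 1.0, 1.1, 1.2)}
# Lower scales raise recall at the cost of precision; higher scales do the opposite.
```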

4. Discussion

The results indicate that the risk of overload in distribution transformers is primarily driven by sustained loading patterns rather than transient fluctuations. Due to the inherent thermal inertia of transformers, brief load surges rarely cause immediate operational hazards. In contrast, extended periods of heavy loading significantly accelerate the degradation of insulation materials. Consequently, static warning mechanisms are often ineffective in environments with high load variability, highlighting the need for risk assessment approaches that consider load persistence.

Causal analysis shows that weather factors, particularly temperature, mainly affect short-term load variability, while long-term load trends are largely determined by underlying structural demand [24]. Furthermore, nonlinear factor network analysis demonstrates that the impact of meteorological factors depends heavily on the temporal scale; specifically, temperature extremes are the primary drivers at a daily resolution. This suggests that effective overload warning systems should use scale-sensitive feature selection strategies, rather than including a broad range of environmental variables without distinction.

Operationally, integrating load forecasting with dynamic state identification offers clear advantages. By using predictive uncertainty to adjust warning thresholds, the proposed framework achieves high precision in overload warnings while maintaining sensitivity to high-risk conditions. Compared with a conventional LSTM approach, the framework shows superior classification performance in the heavy-load and overload states, which supports its robustness on imbalanced datasets and its suitability for real-world deployment. Nevertheless, a major challenge was the inherent scarcity of extreme overload samples, which makes it difficult to train the causal model to be consistently robust across all operational scenarios.
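One common way to let predictive uncertainty shift a warning threshold is to compare an upper bound of the load forecast with the overload limit. The sketch below assumes the rule "warn when μ + λσ exceeds the rated capacity for at least 2 consecutive forecast hours", with λ playing the role of the risk-tolerance parameter in Table 6 and the 2 h taken from Table 1. This is an assumed form, not necessarily the one defined in Section 2.4, and the function name is hypothetical.

```python
from typing import Sequence


def overload_warning(mu: Sequence[float], sigma: Sequence[float], rated: float,
                     lam: float = 1.0, min_hours: int = 2) -> bool:
    """Hypothetical uncertainty-adjusted rule: warn when the upper bound mu + lam * sigma
    of the forecast load exceeds the rated capacity for at least `min_hours`
    consecutive hourly steps of the forecast horizon."""
    run = 0
    for m, s in zip(mu, sigma):
        if (m + lam * s) / rated > 1.0:  # upper-bound load ratio above 100%
            run += 1
            if run >= min_hours:
                return True
        else:
            run = 0
    return False


# Hypothetical 4-h slice of a forecast (kVA) for a 500 kVA transformer.
mu = [460.0, 480.0, 490.0, 470.0]
sigma = [20.0, 20.0, 20.0, 20.0]
for lam in (0.8, 1.0, 1.2, 1.5):
    print(lam, overload_warning(mu, sigma, rated=500.0, lam=lam))
```

A larger λ widens the upper bound, so warnings fire earlier and more often. Recall rises while precision falls, which matches the direction of the trend across the λ values in Table 6.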

This framework shifts transformer management from reactive “firefighting” to proactive prevention. With a 24-h warning window, operators can rebalance loads or adjust configurations before thermal stress damages the insulation, helping to prevent equipment failure.

5. Conclusions

This study introduces a causal-enhanced LSTM–RF framework for the early warning of dynamic overload risks in distribution transformers. The framework integrates causal load decomposition, structured nonlinear feature modeling, and uncertainty-aware dynamic thresholding to link environmental factors directly with operational risk identification. The result is a more reliable and dynamic early warning mechanism for distribution transformers, enabling operators to move from reactive maintenance to proactive risk management in complex operational environments. The proposed methodology therefore offers a practical, interpretable solution for monitoring transformer overload that can be readily integrated into distribution network operations and asset management systems. Future research should focus on integrating physical thermal models into the machine learning framework and on exploring the transferability of this causal-enhanced method to other critical grid components, such as circuit breakers.

Author Contributions

Conceptualization, H.B.; Methodology, M.D.; Software, Y.Z. and Q.D.; Validation, Y.Z. and Q.D.; Formal analysis, Y.L.; Investigation, Y.L.; Resources, H.B.; Data curation, Y.Z. and Q.D.; Writing–original draft, H.W.; Writing–review & editing, H.W.; Visualization, Y.Z. and Q.D.; Supervision, M.D.; Project administration, M.D.; Funding acquisition, H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of China Southern Power Grid [Project No. 032000KC23120049 (GDKJXM20231527)].

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request. The data are not publicly available due to their nature as anonymized internal operational records from China Southern Power Grid, for which a formal citation is not applicable.

Conflicts of Interest

Authors Hao Bai, Yipeng Liu, Yawen Zheng and Qiaoyi Ding were employed by China Southern Power Grid. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DT	Distribution Transformer
STL	Seasonal-Trend Decomposition using Loess
LSTM	Long Short-Term Memory
RF	Random Forest
MI	Mutual Information
ST-GCN	Spatiotemporal Graph Convolutional Network
GP	Gaussian Process

References

1. Ma, H.; Yang, P.; Wang, F.; Wang, X.; Yang, D.; Feng, B. Short-Term Heavy Overload Forecasting of Public Transformers Based on the LSTM-XGBOOST Combined Model. Energies 2023, 16, 1507.
2. Miraki, A.; Parviainen, P.; Arghandeh, R. Electricity demand forecasting at distribution and household levels using explainable causal graph neural network. Energy AI 2024, 16, 100368.
3. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical Load Forecasting Using LSTM, GRU, and RNN Algorithms. Energies 2023, 16, 2283.
4. Pentsos, V.; Tragoudas, S.; Wibbenmeyer, J.; Khdeer, N. A Hybrid LSTM-Transformer Model for Power Load Forecasting. IEEE Trans. Smart Grid 2025, 16, 2624–2634.
5. IEEE Std C57.91-2011; IEEE Guide for Loading Mineral-Oil-Immersed Transformers and Step-Voltage Regulators. IEEE: Piscataway, NJ, USA, 2011.
6. Rafati, A.; Mirshekali, H.; Shaker, H.R. Overload Alarm Prediction in Power Distribution Transformers. Smart Grids Sustain. Energy 2024, 9, 39.
7. Usman, H.M.; ElShatshat, R.; El-Hag, A.H.; Jabr, R.A. Estimation of distribution transformer kVA load using residential smart meter data. Electr. Power Syst. Res. 2022, 204, 107663.
8. Liu, Y.; Sun, C.; Yang, X.; Jia, Z.; Su, J.; Guo, Z. A Transformer Heavy Overload Spatiotemporal Distribution Prediction Method Based on Data Fine Mining. Sustainability 2024, 16, 3110.
9. Li, S.; Wang, H.; Li, J.; Hou, K. Real-Time Early Warning Method of Distribution Transformer Load and Temperature Based on Multimodal Data Fusion. J. Circuits Syst. Comput. 2024, 33, 2450224.
10. Wang, H.; Xu, Y.; Xu, D.; Luo, Z.; Li, Y.; Peng, X. Heavy Load and Overload Pre-warning for Distribution Transformer with PV Access Based on Graph Neural Network. In Proceedings of the 2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE), Guangzhou, China, 21–23 April 2023; IEEE: Guangzhou, China, 2023; pp. 922–929.
11. Wang, J.; Li, Y.; Zhang, H. Medium- and Long-Term Load Forecasting for Power Plants Based on Causal Inference and Informer. Sustainability 2023, 15, 11134.
12. Liu, Y.; Zhang, L.; Wang, Q. Real-Time Robust State Estimation for Large-Scale Low-Observability Power-Transportation System Based on Meta Physics-Informed Graph TimesNet. IEEE Trans. Smart Grid 2025, 16, 1234–1245.
13. Zaboli, A.; Tuyet-Doan, V.-N.; Kim, Y.-H.; Hong, J.; Su, W. An LSTM-SAE-Based Behind-the-Meter Load Forecasting Method. IEEE Access 2023, 11, 49378–49392.
14. Santos-Fernández, E.; Ver Hoef, J.M.; Peterson, E.E.; McGree, J.; Isaak, D.J.; Mengersen, K. Bayesian spatio-temporal models for stream networks. Comput. Stat. Data Anal. 2022, 170, 107446.
15. Liu, S.; Luo, H.; Zhao, L. A Hybrid Deep Learning Model for Imbalanced Data Classification in Intelligent Manufacturing. Sensors 2024, 24, 1234.
16. Lv, L.; Han, Y. Identification of transformer overload and new energy planning for enterprises based on load forecasting. PLoS ONE 2024, 19, e0311354.
17. Xia, T.; Lan, H.; Fu, T.; Hao, L.; Wang, Q.; Wang, S. Ultra-short-term load forecasting and risk assessment method for distribution networks based on the VMD–DeepAR model. Front. Energy Res. 2025, 13, 1692222.
18. Bouhamed, O.; Dissem, M.; Amayri, M.; Bouguila, N. Transformer-based deep probabilistic network for load forecasting. Eng. Appl. Artif. Intell. 2025, 152, 110781.
19. Mansoor, H.; Gull, M.S.; Rauf, H.; Shaikh, I.U.H.; Khalid, M.; Arshad, N. Graph Convolutional Networks based short-term load forecasting: Leveraging spatial information for improved accuracy. Electr. Power Syst. Res. 2024, 230, 110263.
20. Shawon, S.M.; Haider, S.N.; Barua, A.; Austin, S.; Adan, I.A.; Hossain, M.S.; Zubair, H. Hybrid CNN-LSTM model for urban energy load forecasting with IGA-XAI for smart grids. Energy Rep. 2025, 28, 107245.
21. Ganjavi, A.; Paraschivoiu, M.; Saussié, D. Predictive Maintenance for Distribution System Operators in Increasing Transformers’ Reliability. Electronics 2023, 12, 1356.
22. O’Donnell, J.; Su, W. Attention-Focused Machine Learning Method to Provide the Stochastic Load Forecasts Needed by Electric Utilities. Energies 2023, 16, 5661.
23. Wang, Z.; Zhang, H.; Yang, R.; Chen, Y. Improving Model Generalization for Short-Term Customer Load Forecasting With Causal Inference. IEEE Trans. Smart Grid 2025, 16, 424–436.
24. Ullah, Z.; Lodhi, B.A.; Haidri, R.A. LSTM and Bat-Based RUSBoost Approach for Electricity Theft Detection in Smart Grids. Appl. Sci. 2020, 10, 4378.

Figure 1. STL algorithm analysis result diagram.

Figure 2. Daily averaged nonlinear factor influence network between load and meteorological variables. Only temperature extremes form statistically significant nonlinear connections with the load under daily aggregation.

Figure 3. Confusion matrix of the proposed model for load state classification.

Figure 4. Accuracy comparison between the proposed model and the conventional LSTM-based method.

Figure 5. Comparison of accuracy across different stations.

Figure 6. Precision–recall curve for the overload state across different threshold multipliers.

Table 1. Definition of transformer operating states based on load ratio and duration.

Transformer Operating State	Load Ratio β	Duration (h)
Light load	β ≤ 30%	–
Normal load	30% < β ≤ 80%	–
Heavy load	80% < β ≤ 100%	≥2 *
Overload	β > 100%	≥2

* The criterion for an overload state is established based on the thermal time constants specified in IEEE Std C57.91-2011 [5]. For typical oil-immersed distribution transformers, this duration represents the minimum interval required to produce a significant hot-spot temperature rise (exceeding 10 °C) under sustained loading. This threshold is chosen to distinguish thermally critical overload risks, which contribute to accelerated insulation aging, from short-term transient spikes that do not pose an immediate threat to the transformer’s life.
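Table 1 combines a load-ratio band with a minimum duration. The sketch below shows that labelling under a few assumptions: hourly load ratios β (1.0 = rated load), and a state with a duration criterion is assigned only to hours inside a run of at least 2 consecutive hours in that band. How hours that enter a band for less than 2 h are treated is not specified in the table. The sketch assumes each hour takes the highest state whose band and duration criteria are both met, and otherwise falls back to normal or light load. The helper names are hypothetical.

```python
from typing import List, Sequence


def in_long_run(flags: Sequence[bool], min_len: int) -> List[bool]:
    """Mark each position that belongs to a run of >= min_len consecutive True values."""
    out = [False] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            if j - i >= min_len:
                for k in range(i, j):
                    out[k] = True
            i = j
        else:
            i += 1
    return out


def classify_states(beta: Sequence[float], min_hours: int = 2) -> List[str]:
    """Label hourly load ratios with the Table 1 states (assumed fallback rules)."""
    over = in_long_run([b > 1.0 for b in beta], min_hours)   # beta > 100% for >= 2 h
    heavy = in_long_run([b > 0.8 for b in beta], min_hours)  # beta > 80% for >= 2 h
    states = []
    for b, o, h in zip(beta, over, heavy):
        if o:
            states.append("Overload")
        elif h:
            states.append("Heavy load")
        elif b > 0.3:
            states.append("Normal load")   # includes isolated short spikes (assumption)
        else:
            states.append("Light load")
    return states


# Hypothetical hourly load ratios: an isolated spike at hour 8 is not labelled overload.
beta = [0.2, 0.5, 0.9, 0.5, 0.85, 1.05, 1.1, 0.7, 1.2, 0.4]
print(classify_states(beta))
```

The duration filter is what separates sustained, thermally relevant loading from the transient spikes discussed in Section 4.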

Table 2. Variance contribution of STL components.

Component	Variance (kW²)	Contribution (%)
Trend	1.07 × 10⁶	29.40
Seasonal	4.16 × 10⁵	11.42
Residual	2.16 × 10⁶	59.18
Total	—	100.00
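The contribution column in Table 2 appears to be each component's variance divided by the sum of the three component variances. The sketch below assumes that normalization. In practice the component series would come from an STL fit (for instance `statsmodels.tsa.seasonal.STL(series, period=24).fit()`, whose result exposes `trend`, `seasonal` and `resid`), where the 24-sample period is an assumption for hourly data. Because STL components are generally correlated, the sum of their variances differs from the variance of the raw load series.

```python
from statistics import pvariance
from typing import Dict, Sequence


def shares_from_variances(variances: Dict[str, float]) -> Dict[str, float]:
    """Percentage share of each component variance in the sum of all component variances."""
    total = sum(variances.values())
    return {name: 100.0 * v / total for name, v in variances.items()}


def variance_contributions(components: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Same shares, computed from the component series themselves (population variance)."""
    return shares_from_variances({name: pvariance(vals) for name, vals in components.items()})


# Variances (kW^2) as listed in Table 2.
table2 = {"Trend": 1.07e6, "Seasonal": 4.16e5, "Residual": 2.16e6}
for name, share in shares_from_variances(table2).items():
    print(f"{name}: {share:.2f}%")
```

Applied to the rounded variances in Table 2, this gives about 29.35%, 11.41% and 59.24%. These values fall within 0.1 percentage point of the listed shares.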

Table 3. Granger causality test results between temperature components and load.

Temperature Component	Lag Range	Min p-Value	Significance (α = 0.05)
Residual component	1–8	<1 × 10⁻¹⁰	Yes
Short-term component	1–8	0.11	No (marginal)
Long-term component	1–8	0.92	No
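Table 3 summarizes Granger F-tests over lags 1–8. The sketch below computes the standard F statistic for a single lag order in plain Python. It compares a restricted autoregression of the load on its own lags with an unrestricted one that adds lags of the candidate driver. To obtain a p-value, the statistic would be referred to an F(df_num, df_den) distribution, for example with `scipy.stats.f.sf`. Repeating this for lags 1–8 and taking the minimum would give a column like "Min p-Value". The same statistic is reported as `ssr_ftest` by `statsmodels.tsa.stattools.grangercausalitytests`. The synthetic series and function names are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import List, Sequence, Tuple


def _ols_rss(X: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(y: Sequence[float], x: Sequence[float], lag: int) -> Tuple[float, int, int]:
    """F statistic for H0: lags 1..lag of x add no predictive power for y.
    Returns (F, df_num, df_den)."""
    rows_r, rows_u, target = [], [], []
    for t in range(lag, len(y)):
        own = [y[t - i] for i in range(1, lag + 1)]
        other = [x[t - i] for i in range(1, lag + 1)]
        rows_r.append([1.0] + own)          # restricted: constant + own lags
        rows_u.append([1.0] + own + other)  # unrestricted: + lags of x
        target.append(y[t])
    rss_r, rss_u = _ols_rss(rows_r, target), _ols_rss(rows_u, target)
    df_num, df_den = lag, len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den), df_num, df_den


# Synthetic check: y is driven by the previous value of x; z is unrelated.
rng = random.Random(0)
n = 300
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + 0.3 * rng.gauss(0.0, 1.0) for t in range(1, n)]
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
f_causal, df1, df2 = granger_f(y, x, lag=2)
f_null, _, _ = granger_f(y, z, lag=2)
print(f"x -> y: F = {f_causal:.1f} (df = {df1}, {df2});  z -> y: F = {f_null:.2f}")
```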

Table 4. Nonlinear association metrics and network degree of weather factors on the daily scale.

Factor	MI	Selected	Degree
Highest Temperature	0.113	Yes	2
Lowest Temperature	0.145	Yes	2
Average Temperature	–	No	0
Humidity	–	No	0
Precipitation	–	No	0
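Table 4 ranks daily-scale weather factors by mutual information (MI) with the load. The sketch below is a minimal plug-in estimator based on an equal-width 2-D histogram. It assumes the inputs have already been aggregated to daily values (for example daily highest and lowest temperature). The estimator, the bin count and the selection threshold are assumptions. The paper's MI values may come from a different estimator, so the numbers are not directly comparable with Table 4. Plug-in MI is also biased upward for small samples, which matters when only a few hundred daily points are available.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


def _bin(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning of a series into n_bins integer labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], n_bins: int = 8) -> float:
    """Plug-in MI estimate in nats from an equal-width 2-D histogram."""
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in joint.items())


def select_features(features: Dict[str, Sequence[float]], load: Sequence[float],
                    threshold: float, n_bins: int = 8) -> Dict[str, float]:
    """Keep factors whose MI with the (daily) load reaches a hypothetical threshold."""
    scores = {name: mutual_information(vals, load, n_bins) for name, vals in features.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


# Synthetic check: a U-shaped (nonlinear) dependence versus pure noise.
rng = random.Random(42)
temp = [i / 999 for i in range(1000)]
load = [(t - 0.5) ** 2 for t in temp]  # zero linear correlation, strong dependence
noise = [rng.random() for _ in range(1000)]
selected = select_features({"temp": temp, "noise": noise}, load, threshold=0.1)
print(selected)
```

The U-shaped example shows why MI rather than a linear correlation is used here: both hot and cold extremes raise the load, which a correlation coefficient would largely miss.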

Table 5. Classification performance of the proposed model under different load states.

Metric	Normal	Heavy Load	Overload
Precision	0.9353	0.8804	0.9938
Recall	0.9819	0.9265	0.8869
F1 score	0.9580	0.9029	0.9374
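Per-state precision, recall and F1 follow from the confusion matrix by reading column and row sums. The counts in the sketch below are hypothetical and are not the values behind Figure 3.

```python
from typing import Dict, Sequence


def per_class_metrics(cm: Sequence[Sequence[int]],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """cm[i][j] counts samples whose true state is labels[i] and predicted state is labels[j]."""
    k = len(labels)
    metrics = {}
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum: all predicted as c
        actual = sum(cm[c])                          # row sum: all truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[labels[c]] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical counts (rows: true state, columns: predicted state).
cm = [[90, 8, 2],
      [5, 40, 5],
      [0, 2, 48]]
for state, m in per_class_metrics(cm, ["Normal", "Heavy Load", "Overload"]).items():
    print(state, {name: round(v, 4) for name, v in m.items()})
```

As a consistency check on Table 5, the harmonic mean of the Normal-state precision and recall, 2 × 0.9353 × 0.9819 / (0.9353 + 0.9819), gives 0.9580.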

Table 6. Sensitivity of warning performance to parameter λ.

λ	Risk Tolerance	Precision	Recall	F1 Score
0.8	Aggressive	0.9982	0.8425	0.9138
1.0	Balanced	0.9938	0.8869	0.9374
1.2	Conservative	0.9715	0.9142	0.9320
1.5	Very High	0.9254	0.9403	0.9328

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

