1. Introduction
The distribution network, as a critical link connecting the power grid and end-users, directly impacts the reliability of social and economic operations. With the advancement of the “dual carbon” strategy, control and management strategies for distributed energy resources (DERs) have become increasingly complex [1,2]. The high penetration of distributed photovoltaics, combined with the widespread adoption of highly variable loads such as electric vehicles, has transformed traditional unidirectional radial distribution networks into complex active distribution networks (ADNs) characterized by strong source–load interaction. Particularly at the low-voltage distribution level, voltage violation issues caused by high-penetration photovoltaics have emerged as a major control challenge [3,4]. This transformation has significantly increased system vulnerability and operational uncertainty.
In the context of increasingly frequent extreme weather events (e.g., heatwaves) [5] and sharp load surges due to “coal-to-electricity” programs in rural areas, fault patterns in distribution networks now exhibit strong nonlinearity, high randomness, and complex evolution mechanisms [6]. Consequently, keeping pace with emerging trends in active distribution network fault detection [7], the proactive prediction and early identification of fault precursors based on multi-source data have gradually replaced the conventional post-fault emergency repair paradigm, and have become a key strategy for enhancing grid resilience [8].
The core of fault prediction lies in extracting effective features from massive time-series data and inferring evolutionary trends. Early studies mainly relied on physical models or simple signal processing techniques, such as wavelet transform-based methods for early short-circuit fault detection, which performed well in specific frequency band feature extraction [9]. However, these approaches show poor adaptability in active distribution networks with frequently changing topologies. Subsequently, data-driven methods have gradually taken the lead. Among them, shallow machine learning algorithms have been widely applied: some researchers utilized Random Forests with ensemble voting mechanisms to improve fault prediction accuracy [10]; others employed XGBoost combined with Particle Swarm Optimization (PSO) to achieve efficient fault identification [11,12]; and innovative approaches based on “fault gene sequences” and sequence alignment techniques have been proposed for precise state mapping of smart distribution networks [13]. Additionally, improved BP neural networks (e.g., those optimized by Harris Hawks Optimization) have demonstrated good performance in fault prediction considering weather factors [14]. Nevertheless, such shallow models typically rely heavily on manual feature engineering and struggle to capture long-term temporal dependencies, resulting in limited generalization capability when facing massive high-dimensional monitoring data (e.g., protection relays, disturbance recorders) [15].
To extract deep features from time-series data, deep learning approaches have emerged. Notably, deep learning has shown tremendous potential not only in time-series forecasting but also in vision-based detection of critical distribution equipment (e.g., the SGI-YOLOv9 model) [16]. In the time-series domain, recurrent neural networks (RNNs) and their variants, notably Long Short-Term Memory (LSTM), have become mainstream. LSTM effectively mitigates the vanishing gradient problem of traditional RNNs through its gating mechanism and has been widely applied to short-circuit current prediction [17] and early fault probability sequence classification [18]. Although LSTM performs excellently on short- to medium-term sequences, it still suffers from memory bottlenecks and low computational efficiency when dealing with ultra-long-term fault precursors in distribution networks (such as gradual insulation aging or persistent overloading).
In recent years, with the rise of large language models (LLMs) and Transformer architectures, attention-based time-series forecasting has become a research hotspot. Some studies have explored optimized LLMs for insulation fault prediction in distribution networks, demonstrating their great potential in capturing global contextual information [19]. However, while standard Transformers excel at modeling global dependencies, their self-attention mechanisms incur quadratic computational complexity with respect to sequence length. Moreover, they tend to produce a “smoothing effect,” making them insensitive to subtle transient features such as voltage sags and local disturbances [20]. In real distribution networks, faults often originate from minor local perturbations before eventually evolving into system-wide failures.
In summary, existing methods for distribution network fault prediction face a fundamental trade-off, which is the difficulty in capturing instantaneous abrupt changes while simultaneously accounting for long-term evolutionary trends. Standalone LSTM struggles to maintain long-term memory, whereas standard Transformers tend to overlook critical local details. To address this challenge, this paper proposes a hybrid prediction model that integrates Extended LSTM (XLSTM) with Informer, termed XLSTM-Informer. The model consists of two key components: (1) an XLSTM-based local feature encoder that leverages improved exponential gating and matrix memory structures to specifically capture high-frequency, transient fault precursors; (2) an Informer-based global decoder that employs the ProbSparse sparse self-attention mechanism to efficiently infer long-term fault evolution trends. Using real operational data from a regional distribution network, this study particularly evaluates the model’s performance under typical extreme scenarios, including summer peak loads and winter heating spikes, with the aim of providing a high-precision, robust solution for proactive defense of active distribution networks under complex operating conditions.
2. Related Work
To contextualize the contributions of this study, this section critically reviews the evolution of fault prediction methodologies in distribution networks. The existing literature is broadly categorized into three dominant paradigms: traditional deep learning baselines (RNNs/CNNs), long-sequence attention mechanisms (Transformer variants), and emerging hybrid architectures. The strengths and inherent limitations of each category are analyzed below.
RNN- and CNN-based approaches: Traditional deep learning models, such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), have been widely deployed in grid fault diagnosis. LSTM is favored for its gating mechanisms that handle temporal dependencies, while CNNs excel at extracting local spatial features.
Critique: However, these methods face inherent limitations. LSTMs suffer from high memory costs and gradient vanishing problems when processing ultra-long sequences (e.g., continuous weekly monitoring). Similarly, while CNNs capture local fluctuations effectively, their limited receptive field makes them struggle to model the long-range global correlations required for trend prediction.
Transformer and its variants: To address the long-term dependency issue, attention-based models such as the Transformer and Informer have emerged. The Informer, in particular, utilizes a ProbSparse attention mechanism to reduce computational complexity to O(L log L), making it suitable for long-sequence forecasting.
Critique: Despite their success in capturing global trends, standard transformers exhibit a “smoothing effect.” The self-attention mechanism tends to average out high-frequency signals, treating valuable transient fault symptoms (e.g., voltage spikes) as noise. Consequently, they often fail to capture the sharp, instantaneous mutations that are critical for early fault warning.
Hybrid mechanisms and research gap: Recent studies have attempted to fuse RNNs and transformers to combine their strengths. However, most existing hybrid models rely on simple parallel concatenation or lack specialized memory structures for high-frequency signals.
Research Gap: There is a lack of a unified framework that can simultaneously lock onto local transient features (using matrix memory) and model global evolutionary trends (using sparse attention). This paper addresses this gap by proposing the XLSTM-Informer fusion model, specifically designed to balance sensitivity to mutations with long-term forecasting stability.
3. Materials and Methods
3.1. Overall Architecture
To address the challenges of weak fault symptoms and strong randomness in active distribution networks, this paper proposes a novel trend prediction framework named XLSTM-Informer. As illustrated in Figure 1, the framework consists of three integrated modules:
Data Preprocessing Module: Responsible for cleaning raw measurement data, normalizing features, and generating time-series samples via sliding windows.
Local Feature Extraction Module (Encoder): Utilizing the Extended LSTM (XLSTM) to capture instantaneous local variations and short-term dependencies in voltage/current sequences.
Global Trend Inference Module (Decoder): Employing the Informer architecture with ProbSparse self-attention to model long-range dependencies and output multi-step trend predictions.
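As a rough illustration of how the three modules fit together, the sketch below wires a stand-in local encoder and global decoder into one forward pass. The layer choices (a plain LSTM in place of the XLSTM encoder, a single Transformer encoder layer in place of the Informer decoder) and all sizes are illustrative assumptions, not the configuration used in this paper:

```python
import torch
import torch.nn as nn

class XLSTMInformerSketch(nn.Module):
    """Structural sketch of the three-stage pipeline. A plain LSTM stands
    in for the XLSTM encoder and one Transformer encoder layer for the
    Informer decoder; all sizes are illustrative assumptions."""
    def __init__(self, n_features=3, d_model=64, l_out=24):
        super().__init__()
        self.local_encoder = nn.LSTM(n_features, d_model, batch_first=True)
        self.global_decoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, n_features)
        self.l_out = l_out

    def forward(self, x):                # x: (batch, L_in, n_features)
        h, _ = self.local_encoder(x)     # local short-term features
        g = self.global_decoder(h)       # global dependencies via attention
        # Generative-style output: emit all L_out steps in one pass.
        return self.head(g[:, -self.l_out:, :])

y_hat = XLSTMInformerSketch()(torch.randn(8, 96, 3))   # -> (8, 24, 3)
```

Normalized windows from the preprocessing module would be fed in as the `x` tensor; the real model replaces both stand-in layers with the components described in Sections 3.3 and 3.4.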
3.2. Data Acquisition and Processing
The dataset used in this study originates from real-world measurement data of a distribution network in the Tangshan area, Northern China. To ensure model convergence and prediction accuracy, rigorous data processing is required.
(1) Min–Max Normalization: Since the dataset contains multiple electrical quantities (e.g., voltage, current, active power) with varying magnitudes, direct input into the neural network may cause gradient oscillation. Min–max normalization maps all features to the [0, 1] range as follows:

x′ = (x − xmin) / (xmax − xmin)

where x represents the original observed value, and xmin and xmax denote the minimum and maximum values of the feature sequence, respectively.
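The normalization step can be sketched as follows; the function name and the zero-fill fallback for constant sequences are our own choices:

```python
import numpy as np

def min_max_normalize(x):
    """Map a 1-D feature sequence to [0, 1] via min-max scaling."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    # Guard against constant sequences to avoid division by zero.
    if x_max == x_min:
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Example: a short voltage sequence in volts.
v_norm = min_max_normalize([238.0, 240.5, 242.0, 239.0])
```

In practice each electrical quantity (voltage, current, active power) would be scaled independently, with the training-set minima and maxima reused on the test set.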
(2) Sliding Window Construction: To transform the continuous time-series forecasting problem into a supervised learning task, a sliding window strategy is implemented. As shown in Figure 2, the input sequence length is set to Lin (historical horizon), and the prediction sequence length is set to Lout (forecasting horizon):

Input Sequence: X = [x(t − Lin + 1), …, x(t)]
Target Sequence: Y = [x(t + 1), …, x(t + Lout)]

In the experiments, based on the sampling frequency and fault evolution characteristics, Lin = 96 and Lout = 24 are set.
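A minimal sketch of the windowing procedure, assuming a multivariate series stored as a (time, features) array:

```python
import numpy as np

def make_windows(series, l_in=96, l_out=24):
    """Slice a (time, features) series into supervised (input, target) pairs:
    each input covers l_in past steps, each target the next l_out steps."""
    series = np.asarray(series, dtype=float)
    inputs, targets = [], []
    for t in range(len(series) - l_in - l_out + 1):
        inputs.append(series[t : t + l_in])
        targets.append(series[t + l_in : t + l_in + l_out])
    return np.stack(inputs), np.stack(targets)

# Example: 200 samples of 3 electrical features (e.g., V, I, P).
data = np.random.rand(200, 3)
X, Y = make_windows(data)        # X: (81, 96, 3), Y: (81, 24, 3)
```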
Figure 2.
Schematic diagram of the sliding window data segmentation.
3.3. The Local Feature Encoder: XLSTM
Traditional LSTMs suffer from limited storage capacity and gradient decay when processing high-frequency sampling data in distribution networks. To overcome this, the Extended LSTM (XLSTM) is employed as the local feature encoder.
Figure 3 presents the structure of the XLSTM unit with exponential gating and matrix memory.
The XLSTM introduces two key improvements over the standard LSTM:
Exponential Gating: The traditional Sigmoid activation function is replaced by an exponential gating mechanism. This allows for sharper selection of input information, enabling the model to be more sensitive to instantaneous fault symptoms (such as sudden voltage sags) while suppressing background noise.
Matrix Memory: The scalar cell state (ct) is expanded into matrix structures. This significantly increases the memory capacity, ensuring that critical local features are preserved even before entering the global attention module.
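To give a concrete sense of exponential gating, the toy step below follows an sLSTM-style formulation with exponential input/forget gates and a normalizer state. It is a heavily simplified scalar sketch (no output gate, no log-domain stabilizer), not the full XLSTM cell:

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, w):
    """One scalar step of an sLSTM-style cell with exponential gating.
    Simplified sketch: no output gate and no log-domain stabilizer."""
    z = np.tanh(w["z"] * x + w["rz"] * h_prev)    # candidate input
    i = np.exp(w["i"] * x + w["ri"] * h_prev)     # exponential input gate
    f = np.exp(w["f"] * x + w["rf"] * h_prev)     # exponential forget gate
    c = f * c_prev + i * z                        # cell state (matrix-valued in XLSTM)
    n = f * n_prev + i                            # normalizer state
    h = c / n                                     # normalized hidden state
    return h, c, n

# A spike in x drives the input gate exponentially, so the new candidate
# dominates the state far more sharply than a sigmoid gate would allow.
w = {"z": 0.5, "rz": 0.1, "i": 1.0, "ri": 0.0, "f": 0.0, "rf": 0.0}
h, c, n = 0.0, 0.0, 0.0
for x in [0.1, 0.2, 3.0, 0.1]:
    h, c, n = slstm_step(x, h, c, n, w)
```

Because the hidden state is a weighted average of the candidates, it stays bounded even though the gates themselves are unbounded exponentials; the full XLSTM additionally stabilizes the gates in log space and replaces the scalar state with the matrix memory described above.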
3.4. The Global Trend Decoder: Informer
After extracting local features, the high-dimensional hidden states are fed into the Informer module to infer future evolutionary trends. The Informer is specifically designed to solve the computational inefficiency of standard Transformers in long-sequence forecasting.
(1) ProbSparse Self-Attention: The standard self-attention mechanism requires O(L²) computational complexity, which is resource-intensive. Informer employs the ProbSparse mechanism, which selects only the “Top-u” queries with the highest dominant correlations to compute attention weights:

A(Q̄, K, V) = Softmax(Q̄Kᵀ / √d) V

where Q̄ represents the sparse query matrix containing only the Top-u dominant queries. This reduces the complexity to O(L log L), allowing the model to efficiently capture long-term dependencies and global periodicity in the distribution network data.
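The Top-u selection can be sketched as below. For clarity this toy version computes the full score matrix before ranking queries, whereas the actual Informer estimates the sparsity measure from a sampled subset of keys to stay within O(L log L); the mean-of-V fallback for the remaining queries follows the Informer design:

```python
import numpy as np

def probsparse_attention(Q, K, V, u=None):
    """Toy ProbSparse self-attention: rank queries by a sparsity measure
    (max minus mean of their key scores), attend fully for the Top-u
    queries, and fall back to mean(V) for the remaining 'lazy' queries."""
    L, d = Q.shape
    u = u if u is not None else max(1, int(np.ceil(np.log(L))))
    scores = Q @ K.T / np.sqrt(d)                      # (L, L) full scores
    sparsity = scores.max(axis=1) - scores.mean(axis=1)
    top = np.argsort(sparsity)[-u:]                    # dominant queries
    out = np.tile(V.mean(axis=0), (L, 1))              # lazy-query fallback
    w = np.exp(scores[top] - scores[top].max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)               # row-wise softmax
    out[top] = w @ V
    return out
```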
(2) Generative Decoder: Different from the step-by-step recursive prediction of RNNs, the Informer uses a generative-style decoder to output the entire prediction sequence (Lout) in one forward step. This avoids the accumulation of prediction errors during multi-step forecasting.
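A toy numerical illustration of why one-shot decoding avoids error accumulation, assuming a unit linear trend and a fixed per-step model bias (both invented for this example):

```python
def recursive_forecast(last, l_out, bias=0.1):
    """Autoregressive decoding: the (biased) prediction is fed back as
    input, so the per-step bias compounds over the horizon."""
    preds, x = [], last
    for _ in range(l_out):
        x = x + 1.0 + bias       # toy model of a unit linear trend
        preds.append(x)
    return preds

def generative_forecast(last, l_out, bias=0.1):
    """One-shot decoding: every horizon step is predicted from the observed
    history, so the bias stays constant instead of accumulating."""
    return [last + k * 1.0 + bias for k in range(1, l_out + 1)]

truth = [k * 1.0 for k in range(1, 25)]                      # ideal unit trend
err_rec = abs(recursive_forecast(0.0, 24)[-1] - truth[-1])   # grows to 2.4
err_gen = abs(generative_forecast(0.0, 24)[-1] - truth[-1])  # stays at 0.1
```

Over a 24-step horizon the recursive error is 24 times the single-step bias, while the one-shot error matches the single-step bias, which is the motivation for the Informer's generative decoder.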
3.5. Experimental Environment and Evaluation Metrics
All experiments were conducted on a workstation equipped with an NVIDIA GeForce RTX 3080 GPU and an Intel Core i9 CPU. The proposed model was implemented using the PyTorch (2.0.0) deep learning framework. The optimization algorithm used is Adam, with an initial learning rate of and a batch size of 32.
To quantitatively evaluate the prediction performance, three standard metrics were selected: Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The formulas are as follows:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
MAPE = (100%/n) Σᵢ |(yᵢ − ŷᵢ) / yᵢ|

where yᵢ represents the actual measured value, ŷᵢ represents the predicted value, and n is the number of samples.
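The three metrics can be computed directly; the MAPE form here reports percent and assumes no zero-valued ground truth:

```python
import numpy as np

def mse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # Reported in percent; assumes the ground truth contains no zeros.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

# Example on a short voltage segment (values in volts).
y_true = [240.0, 238.0, 242.0]
y_pred = [239.0, 239.0, 241.0]
```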
4. Results
4.1. Experimental Data and Seasonal Settings
To accurately capture the transient details of fault evolution, the data acquisition system is configured for high-frequency sampling at 1 min intervals. The dataset covers typical winter and summer load distribution periods and comprises approximately 40,000 multivariate time-series samples, with invalid records filtered out.
In the rural distribution network under study, the low-voltage side voltage at the transformer (head end of the distribution area) is typically maintained around 240 V. This intentionally elevated operating strategy compensates for line losses and ensures voltage stability at the end users, often exceeding the urban grid’s standard threshold of 235 V. This practice is closely related to the characteristics of rural power supply, such as long supply radii and dispersed loads.
In particular, samples with voltages exceeding 245 V, falling below 220 V, or approaching or dropping below 198 V are regarded as critical fault precursors or abnormal states (with 245 V and 220 V serving as the key upper and lower limit references).
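These thresholds can be encoded as a simple labeling rule; treating at or below 198 V as the “critical” boundary, and the band between 220 V and 245 V as acceptable, is our reading of the text:

```python
def classify_voltage(v):
    """Label a voltage sample in volts using the thresholds stated above.
    Treating exactly 198 V as 'critical' is our reading of the text."""
    if v <= 198.0:
        return "critical"          # at or below the deep-undervoltage bound
    if v > 245.0 or v < 220.0:
        return "abnormal"          # outside the 220-245 V acceptable band
    return "normal"

labels = [classify_voltage(v) for v in (240.0, 250.0, 210.0, 195.0)]
```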
To visually illustrate the aforementioned characteristics, Figure 4 presents the typical normal operating waveform of the distribution area. As observed, although the voltage is maintained at a high level of approximately 240 V, the curve remains smooth with no abrupt changes. In contrast, Figures 5 and 6 demonstrate two representative types of fault precursors: Figure 5 shows a rapid voltage drop under heavy load accompanied by high-frequency fluctuations, while Figure 6 depicts an abnormal voltage rise to 250 V, exhibiting a dangerous overvoltage trend. The model proposed in this paper aims to precisely distinguish the normal high voltage in Figure 4 from the abnormal voltage violations shown in Figures 5 and 6.
The experimental validation is conducted using the real-world dataset described in Section 3.2. To comprehensively verify the robustness of the proposed XLSTM-Informer model under complex load profiles, the test set is specifically structured to cover three typical seasonal scenarios in rural distribution networks:
Spring (Transition Season): Characterized by moderate load levels and relatively stable fluctuations.
Summer (Cooling Season): Characterized by persistently high loads due to the continuous operation of large-scale air conditioning equipment.
Winter (Heating Season): Characterized by sharp evening peaks and high volatility, driven by the centralized usage of electric heating devices (following the “coal-to-electricity” conversion policy) in rural areas.
To rigorously evaluate the model’s generalization ability under complex operating conditions, we established strict quantitative selection criteria for “extreme scenarios.” Based on the load characteristics of the distribution network in Northern China and the operational impact of the “Coal-to-Electricity” policy, three typical datasets were extracted. The specific definitions and thresholds are as follows:
- (1) Scenario I: Summer Sustained High Load: This scenario represents the grid status during summer heatwaves, driven by the continuous operation of large-scale air conditioning loads. The primary challenge here is the thermal accumulation in transformers rather than instantaneous volatility. Selection Criteria: Time Window: 10:00 to 16:00 (peak temperature period). Load Threshold: The Load Factor (LF = Pavg/Prated) must exceed 90% continuously. Duration: The high-load state must persist for at least 5 h. Rationale: Sustained high load reduces the thermal margin of the equipment, making the voltage baseline sensitive to minor fluctuations.
- (2) Scenario II: Winter Sharp Peak and High Volatility: This scenario captures the evening peaks in winter, heavily influenced by the “coal-to-electricity” policy. The synchronization of residential electric heating equipment creates sharp load impulses. Selection Criteria: Time Window: 18:00 to 22:00 (residential heating peak). Volatility Threshold (Ramp Rate): The Load Ramp Rate (Rt) must exceed 15% of the rated capacity within a 15 min interval. Rationale: The rapid ramp-up of heating loads causes inductive impact currents, testing the model’s ability to capture instantaneous voltage sags (transient features) rather than just trends.
- (3) Scenario III: Transitional Season (Baseline): This scenario serves as a control group, selected from spring/autumn data where the daily average load factor is between 0.3 and 0.5, with a voltage variance of less than 0.02.
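The scenario selection rules above can be sketched as a filter over a load window that has already been restricted to the relevant time-of-day period; the window lengths follow the stated thresholds (5 h sustained load, 15 min ramp) at the 1 min sampling interval, while the function and label names are our own:

```python
import numpy as np

def label_scenario(load, p_rated, dt_min=1):
    """Tag a load window (already restricted to the relevant time-of-day
    period) with the scenario rules stated above. Thresholds come from the
    text; the windowing arithmetic is our own sketch."""
    load = np.asarray(load, dtype=float)
    lf = load / p_rated                      # per-sample load factor
    # Scenario I: load factor above 90% sustained for at least 5 hours.
    steps_5h = int(5 * 60 / dt_min)
    over = (lf > 0.9).astype(int)
    if len(over) >= steps_5h:
        runs = np.convolve(over, np.ones(steps_5h, dtype=int), mode="valid")
        if (runs == steps_5h).any():
            return "summer_high_load"
    # Scenario II: ramp exceeding 15% of rated capacity within 15 minutes.
    steps_15m = int(15 / dt_min)
    ramps = np.abs(load[steps_15m:] - load[:-steps_15m]) / p_rated
    if (ramps > 0.15).any():
        return "winter_volatile"
    return "transitional"
```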
All models were trained and tested on the same hardware platform (NVIDIA GeForce RTX 3080) as detailed in Section 3.5.
4.2. Overall Performance Comparison
To rigorously evaluate the general trend prediction capability of the proposed model, we conducted a comprehensive benchmark on the complete test dataset. The XLSTM-Informer was compared against mainstream deep learning baselines (CNN, LSTM) as well as state-of-the-art Transformer variants, including Autoformer, FEDformer, and the original Informer. The quantitative results are presented in Table 1.
As shown in Table 1, the proposed model consistently achieves superior performance across all evaluation metrics. Notably, compared to the standard Informer, our model yields a significant reduction in error rates, lowering the MSE by approximately 30.1% and the MAPE by 10.6%. These results provide compelling evidence that the integration of the XLSTM module effectively enhances the capture of local transient details without compromising the global trend modeling advantages inherent to the Informer architecture.
As can be observed from Figure 7, the proposed model achieves satisfactory performance.
4.3. Ablation Studies
To verify the specific contributions of the XLSTM (Local Encoder) and the Informer (Global Decoder), ablation experiments are conducted. The results are shown in Table 2.
The following observations can be made regarding the internal mechanisms of the proposed method:
Contribution of Matrix Memory: Model D (XLSTM-Sigmoid) achieves a lower prediction error compared to Model C (LSTM + Informer). Since both models utilize the standard Sigmoid activation, this improvement suggests that replacing scalar memory cells with matrix memory structures contributes to expanding the state capacity, thereby enabling the model to retain more informative historical patterns under complex grid conditions.
Impact of Exponential Gating: Notably, the proposed Model E (XLSTM-Informer) shows further performance gains over the ablation variant Model D. By employing the exponential gating mechanism instead of the Sigmoid activation, the model effectively mitigates the gradient saturation issue. This modification allows Model E to assign larger weights to instantaneous error gradients, enhancing its sensitivity to transient fault symptoms (e.g., voltage sags) and reducing the prediction lag.
Overall Effectiveness: Consequently, Model E demonstrates the best overall performance across the evaluated metrics. This indicates that the serial fusion strategy effectively couples the high-frequency feature extraction of the XLSTM encoder with the long-term trend modeling of the Informer decoder, offering a competitive and reliable solution for distribution network fault symptom prediction.
4.4. Performance Analysis Under Different Seasonal Load Profiles
This section focuses on the model’s adaptability to the distinct load characteristics of summer and winter, which is critical for practical engineering applications.
Quantitative Analysis (across seasons): Table 3 details the prediction performance across the three seasons. As shown in Table 3, the model maintains high accuracy (MAPE < 3%) even in winter, where load volatility is highest due to the stochastic nature of rural electric heating.
Interpretation of Results:
Inverse Performance Trend: Interestingly, the results exhibit an inverse relationship between load volatility and prediction error. The model achieves its best performance in winter (lowest MSE of 0.0066 and MAPE of 0.45%), despite this season being characterized by “sharp peaks” due to electric heating loads.
Reason for Winter Superiority: This phenomenon highlights the specific advantage of the XLSTM module. Standard LSTMs often struggle with sharp peaks due to gradient saturation. However, the exponential gating mechanism in our proposed model is specifically designed to assign higher importance to these large, instantaneous gradients (the “sharp peaks”). Consequently, the model captures the regular heating cycles in winter more precisely than the “moderate but random” fluctuations observed in spring.
Overall Stability: The comparison confirms that the proposed method is not only accurate under stable conditions (spring) but becomes increasingly effective when handling complex, high-load scenarios (summer and winter), demonstrating strong robustness against load mutations.
5. Discussion
5.1. Mechanism Analysis of Model Superiority
The experimental results in Section 4 demonstrate that the proposed XLSTM-Informer model consistently outperforms traditional deep learning methods across various metrics. This superiority can be attributed to the complementary nature of its two core components:
Solving the Long-Term Dependency Problem: In the “summer” scenario (Figure 5), the load exhibits a continuously high level due to air conditioning. Traditional RNN-based models (like LSTM) suffer from memory forgetting in such long sequences, often failing to maintain the high-load trend prediction. The Informer module, with its ProbSparse self-attention mechanism, effectively captures these global dependencies, ensuring the prediction curve does not drift over time.
Overcoming the “Smoothing Effect”: A common drawback of standard transformer models is their tendency to produce smooth outputs, acting like a low-pass filter that ignores high-frequency mutations. This is fatal for fault prediction. In the “winter” scenario, where electric heating causes sudden load spikes, the XLSTM module plays a crucial role. Its exponential gating mechanism acts as a high-sensitivity trigger, allowing the model to respond instantly to these abrupt changes, thereby capturing potential fault symptoms that other models miss.
5.2. Practical Value for Active Warning in Rural Grids
The study utilized data from a rural distribution network, which presents unique challenges compared to urban grids, such as weaker infrastructure and higher load volatility due to the “coal-to-electricity” policy.
Robustness Across Seasons: As shown in Table 3, the model maintains high accuracy (MAPE < 3%) in both the high-load summer and the volatile winter. This indicates that the model is robust enough to be deployed in real-world environments with changing seasonal patterns without frequent retraining.
Transition from Passive to Active O&M: Traditionally, distribution network maintenance relies on post-fault repair. The proposed model provides accurate multi-step trend prediction (covering the next few hours). This capability allows grid operators to identify potential overloads or voltage violations before they occur, enabling proactive measures such as load shedding or voltage regulation. This shifts the operational paradigm from “passive defense” to “active warning.”
5.3. Limitations and Future Work
Despite the promising results in fault symptom trend prediction, this study has certain limitations that point towards future research directions. First, the current model focuses on the time-series forecasting of electrical quantities (voltage, current). While it can successfully predict future trends and identify potential deviations (early warnings) based on prediction errors, it does not yet possess the capability to autonomously diagnose the specific type of fault (e.g., single-phase grounding, inter-phase short circuit, or high-impedance fault). In practical engineering, after an early warning is triggered, operators need to know not only “that something is wrong” but also “what exactly is wrong” to take targeted measures.
Therefore, future work will focus on rapid and accurate fault classification based on the predicted trends. Specifically, we propose a “Two-Stage Predict-and-Diagnose” framework with the following detailed designs:
Connection Scheme (Serial Fusion Strategy): We plan to construct a serial pipeline where the proposed XLSTM-Informer acts as the upstream “symptom predictor.” Crucially, we will adopt a feature-level fusion strategy rather than simple data transmission. Input Interface: The high-precision waveform sequence generated by the model will serve as the primary input. Latent Feature Sharing: To enhance information density, the high-dimensional hidden state vectors extracted by the XLSTM encoder—which contain rich historical volatility patterns—will be concatenated with the predicted sequence and fed into the downstream classifier.
Classification Network Architecture: Instead of generic classifiers, we intend to design a Multi-Scale Temporal Convolutional Network (TCN) with Attention Mechanism. Structure: The network will utilize dilated causal convolutions with varying kernel sizes to extract features from the predicted trajectories at different time scales. Mechanism: An attention layer will be integrated to automatically assign weights to critical time steps (e.g., the exact moment of a voltage sag), enabling the system to distinguish fine-grained fault signatures.
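A minimal sketch of the envisioned multi-scale TCN with attention pooling is given below; all sizes, the three dilation rates, and the four-class head are illustrative assumptions for this future-work design, not a finished implementation:

```python
import torch
import torch.nn as nn

class TinyTCNClassifier(nn.Module):
    """Sketch of the envisioned classifier: parallel dilated causal
    convolutions at several scales, attention pooling over time steps,
    then a fault-type head. Sizes are illustrative assumptions."""
    def __init__(self, n_features=3, n_classes=4, channels=16):
        super().__init__()
        # Parallel dilated causal branches capture patterns at different scales.
        self.branches = nn.ModuleList([
            nn.Conv1d(n_features, channels, kernel_size=3,
                      padding=2 * d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.attn = nn.Linear(3 * channels, 1)   # per-time-step attention score
        self.head = nn.Linear(3 * channels, n_classes)

    def forward(self, x):                        # x: (batch, T, n_features)
        x = x.transpose(1, 2)                    # -> (batch, n_features, T)
        T = x.shape[-1]
        # Causal trim: keep only the first T outputs of each padded conv.
        feats = torch.cat([torch.relu(b(x))[..., :T] for b in self.branches], dim=1)
        feats = feats.transpose(1, 2)            # (batch, T, 3*channels)
        w = torch.softmax(self.attn(feats), dim=1)
        pooled = (w * feats).sum(dim=1)          # attention-weighted pooling
        return self.head(pooled)                 # fault-type logits

logits = TinyTCNClassifier()(torch.randn(2, 24, 3))   # -> (2, 4)
```

In the proposed pipeline, the input would be the predicted trajectory (optionally concatenated with the XLSTM hidden states), and the attention weights would highlight critical time steps such as the onset of a voltage sag.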
Fast Pre-fault Diagnosis: By utilizing the multi-step prediction capability of the current model, this cascading system can analyze the predicted future data rather than waiting for the fault to fully develop. This aims to achieve “pre-fault diagnosis,” enabling the system to identify the fault type and isolate the faulty section faster and more accurately before the protection relay trips.