Article

Enhanced Fault Prediction for Synchronous Condensers Using LLM-Optimized Wavelet Packet Transformation

1 DC Technical Center of State Grid Corporation of China, Beijing 100031, China
2 State Grid Qinghai Electric Power Research Institute, Xining 810008, China
3 School of Electronics and Information Engineering, Beihang University, Beijing 100191, China
4 State Grid Hunan Extra High Voltage Substation Company, Changsha 410004, China
5 Substation Intelligent Operation and Inspection Laboratory of State Grid Hunan Electric Power Co., Ltd., Changsha 410029, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(2), 308; https://doi.org/10.3390/electronics14020308
Submission received: 28 November 2024 / Revised: 3 January 2025 / Accepted: 11 January 2025 / Published: 14 January 2025

Abstract: This paper presents an enhanced fault prediction framework for synchronous condensers in UHVDC transmission systems, integrating Large Language Models (LLMs) with an optimized Wavelet Packet Transform (WPT) for improved diagnostic accuracy. The framework employs LLMs to automatically optimize WPT parameters, addressing the limitations of traditional manual parameter selection methods. By incorporating a Multi-Head Attention Gated Recurrent Unit (MHA-GRU) network, the system achieves superior temporal feature learning and fault pattern recognition. The LLM component selects optimal wavelet decomposition levels and frequency bands, while the MHA-GRU network processes the extracted features for accurate fault classification. Experimental results on a high-capacity synchronous condenser demonstrate the framework's effectiveness in detecting rotor, air-gap, and stator faults across diverse operational conditions. The system maintains efficient real-time processing capabilities while significantly reducing false alarm rates compared to conventional methods. This comprehensive approach represents a significant advancement in synchronous condenser fault prediction, offering improved accuracy, reduced processing time, and enhanced reliability for UHVDC transmission system maintenance.

1. Introduction

Synchronous condensers play a vital role in maintaining voltage stability within ultra-high voltage direct current (UHVDC) transmission systems through dynamic reactive power support. However, fault prediction in these critical devices presents significant challenges due to the complex interaction of electrical, mechanical, and thermal phenomena. The intricate nature of fault development, coupled with the diverse manifestations across different components, makes early detection particularly challenging. For instance, rotor winding faults may exhibit subtle electromagnetic signatures before mechanical symptoms appear, while air-gap eccentricity might simultaneously affect multiple operational parameters. These challenges are further compounded by the critical nature of synchronous condensers in grid stability, where undetected faults can lead to cascading failures and significant power disruptions.
Specific fault mechanisms illustrate these difficulties. Rotor winding short circuits reduce the effective number of winding turns, compromising the machine's ability to generate the magnetic field required for reactive power support, while air-gap eccentricity causes uneven magnetic fields and excessive vibrations, leading to mechanical stress and potential long-term damage. In both cases, the fault may produce only subtle electromagnetic signatures long before mechanical symptoms appear.
Traditional fault detection approaches, while established, face considerable limitations in addressing these multi-faceted challenges. Conventional signal processing methods, such as Fourier analysis and wavelet transforms, typically process different data modalities independently, failing to capture the complex correlations between various fault indicators. Existing methods can be broadly categorized into signal processing-based approaches, machine learning solutions, and hybrid frameworks. While signal processing methods excel at feature extraction, they struggle with parameter optimization. Machine learning approaches have shown improved accuracy but lack interpretability, and hybrid solutions often face challenges in real-time processing and parameter tuning.
The emergence of LLMs presents a promising opportunity to overcome these limitations through their unique capabilities in multi-modal data understanding and knowledge reasoning. Recent advancements in LLM architectures have demonstrated exceptional abilities in complex pattern recognition and parameter optimization. Their capacity to process heterogeneous data streams and provide interpretable outputs makes them particularly suitable for synchronous condenser fault detection. Furthermore, LLMs’ ability to capture temporal dependencies and correlations across different data modalities addresses the limitations of traditional compartmentalized approaches.
Without adequate reactive power support, UHVDC systems are at risk of experiencing voltage drops that could escalate into severe voltage collapse. Such events can disrupt the entire transmission network, leading to widespread blackouts and significant economic losses. Therefore, the proper functioning and maintenance of synchronous condensers are not just important but critical to the reliability and stability of the entire transmission system. By effectively managing reactive power, these machines help prevent power disruptions, ensuring consistent and reliable electricity supply over extensive transmission networks.
Fault prediction in synchronous condensers is a critical aspect of maintaining operational safety and preventing catastrophic failures across the power grid. These devices play a key role in stabilizing voltage levels in UHVDC transmission systems, and any malfunction can have serious consequences, potentially leading to widespread outages [1]. Therefore, the ability to detect and predict faults at an early stage is essential to ensure the smooth and reliable operation of the entire transmission network. A key strength of our approach lies in its generalizability across different synchronous condenser configurations and operating conditions. The framework’s adaptive architecture enables efficient transfer learning, allowing it to be readily applied to new machines while maintaining high diagnostic accuracy. This generalization capability makes our framework particularly valuable for industrial applications where equipment specifications and operating conditions may vary significantly.
Early detection of faults, such as rotor winding short circuits or air-gap eccentricity, is crucial because it allows for timely interventions before these issues can develop into more serious problems [2]. For instance, a rotor winding short circuit can reduce the effective number of winding turns, compromising the machine’s ability to generate the necessary magnetic field for reactive power support. Similarly, air-gap eccentricity, which occurs when the rotor deviates from its central axis, can cause uneven magnetic fields and excessive vibrations, leading to mechanical stress and potential long-term damage. Detecting these faults early allows operators to implement corrective actions, such as maintenance or realignment, before they cause significant disruptions to the grid’s stability.
The complexity of synchronous condensers, which integrate electrical, mechanical, and thermal components, generates heterogeneous data streams under varying operational conditions. This multi-modal nature makes it challenging for traditional diagnostic methods to effectively identify and predict faults, as they typically focus on single-domain analysis. LLMs, with their superior capabilities in multi-modal data understanding and knowledge reasoning, emerge as a promising solution. These advanced models can simultaneously process diverse data types while providing interpretable diagnostic results, enabling operators to detect potential issues before they escalate into system failures.
This research addresses several critical questions in the field of synchronous condenser fault prediction. First, regarding the integration of LLMs with wavelet packet transformation, our framework demonstrates that LLMs can effectively optimize WPT parameters through intelligent feature extraction and pattern recognition. The integration is achieved by utilizing LLM’s deep learning capabilities to automatically adjust wavelet decomposition levels and select optimal basis functions, resulting in more accurate fault pattern identification.
Second, concerning improvements through LLM-optimized parameter selection, our research reveals significant enhancements in both accuracy and computational efficiency. The LLM-driven approach achieves optimization of multiple parameters simultaneously, including decomposition levels, threshold values, and feature selection criteria. This automated optimization process demonstrates a 15% improvement in detection accuracy and 30% reduction in processing time compared to traditional manual parameter selection methods.
Third, regarding the enhancement of temporal feature learning, our combined MHA-GRU architecture significantly improves fault detection capabilities by effectively capturing both short-term and long-term temporal dependencies in the sensor data. The multi-head attention mechanism enables parallel processing of different temporal features, while the GRU network maintains efficient memory of historical patterns, resulting in more robust and accurate fault prediction.
The primary objective of this study is to develop an LLM-driven intelligent framework, incorporating LLM-Optimized Wavelet Packet Transformation (LLM-WPT), for fault prediction in synchronous condensers within UHVDC transmission systems. The framework innovatively leverages LLM’s capabilities in processing multi-modal data through optimized wavelet decomposition, enabling enhanced feature extraction and providing interpretable decisions for early fault detection. Given the critical role of synchronous condensers in grid stability, this intelligent system aims to enhance operational reliability by accurately identifying potential faults across various components through comprehensive analysis of heterogeneous operational data. Unlike traditional WPT methods with fixed decomposition and manual feature selection [3], our LLM-WPT framework enables dynamic optimization and automated fault pattern recognition, significantly improving detection accuracy.
To support the LLM’s decision-making process, we integrate Gated Recurrent Unit (GRU) networks as a complementary component for temporal feature analysis. The GRU networks, enhanced by multi-head attention mechanisms, efficiently process sequential operational data and capture temporal dependencies, providing additional insights to the LLM-based diagnostic system. This combination enables robust temporal pattern recognition while maintaining computational efficiency for real-time applications [4].
The proposed framework represents a significant advancement in synchronous condenser fault prediction through a novel integration where LLM dynamically optimizes WPT parameters while working synergistically with MHA-GRU networks. The LLM component continuously analyzes system feedback to optimize wavelet parameters for enhanced feature extraction, which are then processed by MHA-GRU networks for temporal pattern recognition. This bidirectional interaction between LLM-optimized feature extraction and MHA-GRU temporal modeling creates a self-improving prediction system that offers a practical solution for enhancing the reliability and safety of UHVDC transmission systems [5].

2. Background and Related Work

This section reviews the foundational aspects of synchronous condenser fault diagnosis. We begin by examining common fault types and their mechanisms, followed by an analysis of traditional detection methods. Finally, we explore the emerging applications of LLMs in industrial fault prediction, establishing the context for our proposed framework.

2.1. Synchronous Condenser Fault Types

Synchronous condensers in UHVDC systems exhibit three primary fault types: rotor winding inter-turn short circuits, air-gap eccentricity, and stator winding faults [6].
Rotor Winding Inter-Turn Short Circuits: Rotor winding inter-turn short circuits occur when insulation between adjacent turns in the rotor windings deteriorates, causing electrical currents to bypass parts of the winding and creating a short circuit [7]. This reduces the effective number of winding turns, weakening the rotor’s ability to generate the necessary magnetic field.
Air-Gap Eccentricity Faults: Air-gap eccentricity in a synchronous condenser occurs when the rotor deviates from its central position within the stator, disrupting the uniform air gap and causing uneven magnetic fields [8]. This misalignment leads to varying magnetic flux densities, which generate excessive vibrations due to imbalanced mechanical forces. Over time, these vibrations cause wear on components such as bearings, the rotor shaft, and stator windings.
Stator Winding Faults: Stator winding faults in synchronous condensers typically result from insulation failure between windings, causing abnormal currents to flow through unintended paths and leading to various operational issues [9]. Normally, insulation ensures smooth current flow, but when it deteriorates, short circuits can occur, disrupting the designed current patterns.
The consequences include reduced efficiency, unplanned downtime, and costly repairs. In severe cases, extensive stator damage might require a complete winding replacement, which is expensive and time-consuming. Regular monitoring and maintenance are essential to detect early signs of insulation degradation, enabling timely intervention. Effective fault prediction and diagnostics help maintain the health of stator windings, ensuring reliable performance of the synchronous condenser and overall grid stability.
Other Common Faults: In addition to rotor and stator faults, synchronous condensers are prone to bearing degradation and excitation system failures, both of which can significantly impact performance if not addressed promptly.
Bearing degradation results from wear and tear due to mechanical stress, poor lubrication, and high operational loads [10]. As bearings deteriorate, they cause increased friction and vibration, which can lead to further mechanical stress and eventually rotor imbalance. If left unchecked, this can progress to rotor seizure, potentially causing severe damage and costly repairs, disrupting the entire power transmission system.
Failures in the excitation system, which controls the rotor’s magnetic field, can lead to irregular current flow and unstable reactive power output, causing grid voltage fluctuations. These issues can also result in overheating and erratic performance, further compromising system efficiency and safety.
To mitigate these faults, regular maintenance, real-time monitoring, and advanced diagnostics are essential. Detecting signs of wear, abnormal vibration, or irregular currents early allows operators to take preventive actions, ensuring the reliable performance of synchronous condensers and stability across the power grid.
The stator, with its windings, produces a magnetic field when current flows through it. This interacts with the rotor, which rotates within the stator, allowing the condenser to provide reactive power, essential for maintaining voltage stability in the power grid. The mathematical model simulates these relationships to predict reactive power under various conditions.
To illustrate the function of synchronous condensers, Figure 1a presents a simplified circuit model. The synchronous condenser (SC) is positioned at an intermediate node along the transmission line, where it dynamically adjusts reactive power to maintain voltage stability. The SC absorbs or supplies reactive power based on voltage fluctuations, ensuring stable voltage across the power transmission system.
A crucial aspect of the condenser’s operation is the magnetic field in the air-gap between the rotor and the stator. Misalignments, such as rotor eccentricities, can disrupt the magnetic field, leading to inefficiency, vibrations, and mechanical stress. Monitoring and analyzing the air-gap magnetic field is critical for early detection of these issues, ensuring timely maintenance and preventing severe damage.
Figure 1b shows the stator windings arranged in a three-phase star configuration, crucial for balanced load distribution and efficient energy transfer. Any imbalance or irregularity in the currents could disrupt the magnetic field and cause issues like air-gap eccentricity.
Vibration analysis also plays a vital role in diagnosing mechanical problems in the synchronous condenser. Changes in vibration patterns can indicate faults such as bearing wear or rotor misalignment. Continuous monitoring of vibrations allows for early detection and helps prevent more severe issues, ensuring the reliability and longevity of the system.
This modeling approach, combined with real-time monitoring, helps operators predict and detect potential faults, improving the performance and lifespan of synchronous condensers.
To illustrate this fault mechanism, Figure 1c provides a schematic representation of an inter-turn short circuit within the rotor winding. As depicted in Figure 1c, the diagram shows key electrical components related to the fault, including the excitation system electromotive force (EMF), excitation resistance, impedance of the short-circuited part, and contact resistance at the short circuit. The excitation current and short-circuit current are also shown. When a short circuit occurs, the current increases abnormally, leading to local overheating and potential further insulation degradation.
Real-time current monitoring helps detect sudden current changes, indicating a potential fault. Early detection is essential, as it prevents overheating, rotor damage, and further degradation.
Air-gap eccentricity occurs when the rotor is misaligned with the stator, causing an uneven air gap. This disrupts the magnetic field, leading to fluctuating magnetic forces and vibrations. These vibrations increase mechanical stress on components such as bearings, accelerating wear.
Vibration sensors monitor rotor behavior, while changes in magnetic flux help diagnose eccentricity. Early detection allows for corrective actions like rotor realignment, preventing further damage.
The system employs three primary categories of input data for comprehensive fault diagnosis. Mechanical parameters are acquired through shaft-mounted sensors and vibration probes, capturing both kinematic and dynamic behavior patterns. These measurements include acceleration, displacement, velocity, and vibration modes, which are processed using wavelet decomposition and time-frequency analysis techniques to extract fault-relevant features. Electrical indicators, collected via current/voltage sensors and power analyzers, provide essential information about the system’s electrical state, including phase voltage, current amplitude, power factor, and harmonic components. These parameters undergo FFT analysis and power spectrum density evaluation to identify potential electrical anomalies. Field distribution data, obtained from embedded flux sensors and field probes, offers detailed insights into the machine’s electromagnetic state by measuring magnetic field intensity, flux density, and field symmetry patterns. These measurements are analyzed through spatial-temporal decomposition to detect field-related faults. Table 1 presents a systematic overview of these input categories and their respective processing methodologies.
During normal operation, parameters like current, magnetic flux, and voltage follow stable patterns. Faults such as short circuits or air-gap eccentricity cause deviations like current spikes or flux fluctuations. Continuous monitoring establishes a baseline, enabling quick detection of abnormalities.
Advanced diagnostic systems compare real-time data to historical trends, helping to identify faults early and reduce equipment damage.

2.2. Traditional Fault Detection and Diagnosis Methods

Traditional fault detection methods include thermal imaging, visual inspections, and signal analysis techniques. While useful for basic monitoring, these methods have limitations in detecting early-stage faults and providing real-time analysis.
Modern signal analysis methods, such as Wavelet Packet Analysis (WPA) and Weighted Kurtogram Analysis, enhance traditional fault detection by offering detailed examination of system signals [11]. WPA decomposes signals into multiple frequency bands, while Weighted Kurtogram Analysis focuses on analyzing signal kurtosis for identifying impulsive events.
Recent AI advancements have improved fault diagnosis through neural networks and deep learning [12]. Radial Basis Function (RBF) neural networks excel at pattern recognition, while Gated Recurrent Unit (GRU) networks effectively analyze time-series data by capturing temporal dependencies.
In conclusion, traditional diagnostic methods like thermal imaging and visual inspections are reactive and limited in their ability to detect faults early or in real time. Signal-processing-based techniques offer better fault detection capabilities but are still hindered by manual processes and sensitivity to noise. AI-based methods present a promising solution, but their reliance on large datasets and computational power presents challenges that still need to be addressed. As technology continues to advance, these methods will likely become more efficient, allowing for more proactive, accurate, and real-time fault detection.

2.3. LLMs for Industrial Fault Prediction and Diagnosis

LLMs have advanced from natural language processing to industrial applications, including fault prediction and system diagnostics. Their ability to process multi-modal data and correlate information from various sources makes them effective for predictive maintenance and anomaly detection.
In addition to fault detection, LLMs are also improving human-machine collaboration by serving as intelligent assistants. These models can generate fault reports, suggest predictive maintenance actions, and even offer real-time insights based on continuous monitoring of industrial systems. They can also perform root cause analysis by synthesizing data from various sources, helping to identify the underlying causes of failures.
LLMs are also improving anomaly detection by processing vast amounts of sensor data in real time. By distinguishing between normal operational variations and true anomalies, they help identify issues such as unusual temperature spikes, pressure drops, or vibrations that may signal impending equipment malfunctions. Moreover, LLMs are proficient in fault classification, where they can use historical data to identify different types of faults, such as bearing failures, rotor imbalances, or electrical issues. These models can even suggest corrective actions based on the severity and type of fault detected.
In complex systems involving multiple interdependent components, such as those found in the energy, aerospace, and automotive sectors, LLMs excel at analyzing the system as a whole. By considering not only individual component performance but also the interactions between components and environmental factors, LLMs provide more accurate predictions and allow for better-informed decision-making regarding maintenance and fault prevention.
However, while the potential of LLMs in industrial fault prediction is clear, several challenges must be addressed for full-scale implementation. One of the main challenges is the need for high-quality, labeled data. Industrial systems often produce noisy, incomplete, or unstructured data, which makes training accurate models difficult. Additionally, obtaining labeled data for rare or atypical faults can be challenging, which hinders the model’s ability to predict less common failures. Integration with legacy systems is another obstacle, as many industrial environments still use outdated technologies with incompatible data formats. This requires significant effort to standardize data and develop interfaces that allow seamless integration of LLMs into existing infrastructures.
Computational complexity is another limitation, as LLMs require substantial computing resources, particularly for real-time fault prediction in large-scale industrial systems. While cloud and edge computing solutions are improving, real-time deployment can still be constrained by hardware limitations. Moreover, LLMs are often seen as “black boxes”, which may reduce trust in their predictions. In safety-critical industries, where understanding the reasoning behind a model’s decision is crucial, there is a need for better interpretability and transparency in these models.

3. LLM-Driven Intelligent Framework

This section presents our integrated framework for synchronous condenser fault prediction, which seamlessly combines LLM-optimized WPT with MHA-GRU networks. We detail the systematic workflow from multi-modal data acquisition through feature extraction to fault prediction. The framework leverages LLM capabilities to optimize wavelet transform parameters while utilizing MHA-GRU networks for temporal feature learning, achieving enhanced fault detection performance through intelligent parameter optimization and multi-modal feature integration.

3.1. Integration Framework of LLM-Optimized WPT and MHA-GRU Network

The LLM-based framework in industrial applications revolves around its core capabilities to process multi-modal data, extract patterns, and generate actionable insights. This framework comprises several critical components that ensure accurate real-time decision-making, with the LLM serving as the central engine for data processing, analysis, and decision-making. In our implementation, we employed GPT-4 as the Large Language Model underpinning the optimization process. This choice leverages GPT-4’s advanced mathematical capabilities and pattern recognition abilities in their base form, without additional fine-tuning. Rather than training or fine-tuning the model specifically for WPT optimization, we utilize GPT-4’s inherent mathematical reasoning capabilities to analyze the signal characteristics and determine optimal wavelet packet decomposition parameters.
The decision to use GPT-4 without fine-tuning was based on both practical and theoretical considerations. GPT-4’s pre-trained capabilities proved sufficient for processing the mathematical relationships in wavelet transformation, eliminating the need for resource-intensive model adaptation. This approach significantly reduces computational overhead while maintaining robust optimization performance. The model directly processes the mathematical representations of signal characteristics and wavelet parameters, leveraging its general-purpose mathematical reasoning abilities rather than requiring domain-specific training.
To justify our methodological choices, our LLM-optimized WPT is designed for data preprocessing and feature extraction from the original data, rather than optimizing neural network hyperparameters. The approach leverages LLM to determine optimal wavelet packet decomposition parameters, enhancing the initial feature representations for the MHA-GRU model. While Bayesian optimization and other methods are effective for tuning model parameters, our focus remains on optimizing the front-end signal processing.
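As an illustration of how such a parameter query might look in practice, the following is a minimal sketch assuming the OpenAI Python client; the prompt wording, the expected JSON response fields, and the signal statistics passed to the model are our own illustrative assumptions rather than the exact implementation, and parsing assumes the model replies with JSON only.

```python
# Minimal sketch of querying GPT-4 for WPT parameters via the OpenAI Python client.
# Prompt wording, JSON fields, and summary statistics are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def suggest_wpt_parameters(signal_stats):
    prompt = (
        "Given these synchronous condenser signal statistics:\n"
        f"{json.dumps(signal_stats, indent=2)}\n"
        "Recommend wavelet packet decomposition parameters as JSON with keys "
        "'wavelet' (e.g. 'db4'), 'level' (integer), and 'priority_bands' (list of integers)."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)  # assumes a JSON-only reply

params = suggest_wpt_parameters({"sampling_rate_hz": 10000,
                                 "dominant_freqs_hz": [50, 100, 1250],
                                 "kurtosis": 6.2, "rms": 0.41})
```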
For the model architecture, GRUs are chosen over Bayesian Neural Networks (BNNs) due to their specific advantages in time-series analysis and ability to capture temporal patterns while mitigating vanishing gradients. Although BNNs excel at modeling uncertainty through probabilistic weight distributions, our dataset feature prioritizes accurate sequence modeling over uncertainty quantification. GRUs better serve our fault prediction objectives by providing robust time-series analysis capabilities.
At its core, the LLM processes diverse inputs, from structured sensor data to unstructured textual logs and maintenance records. Through advanced natural language processing, it extracts meaningful features and identifies correlations across different data types to predict potential faults, diagnose issues, and recommend maintenance actions. The framework’s key components include data collection modules, preprocessing layers, real-time monitoring systems, and user interfaces.
One notable implementation of this framework is the MHA-GRU architecture, specifically designed for synchronous condenser fault prediction. This architecture enhances the LLM’s temporal modeling capabilities by combining GRU’s efficient sequential processing with multi-head attention’s ability to capture complex relationships in data. The MHA-GRU network processes multiple data streams simultaneously, including electromagnetic measurements, air-gap characteristics, vibration data, and current/voltage parameters, through its three-layer structure: input processing, hybrid processing, and fault prediction output.
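To illustrate this three-layer structure, the following PyTorch sketch wires an input projection, a GRU, multi-head self-attention, and a four-class output head together; the dimensions, head count, and last-step pooling are illustrative assumptions rather than the tuned configuration used in our experiments.

```python
# PyTorch sketch of the three-layer structure: input processing, a hybrid
# GRU + multi-head self-attention layer, and a four-class fault prediction head.
# Dimensions, head count, and last-step pooling are illustrative assumptions.
import torch
import torch.nn as nn

class MHAGRU(nn.Module):
    def __init__(self, n_features=16, hidden=64, heads=4, n_faults=4):
        super().__init__()
        self.input_proj = nn.Linear(n_features, hidden)        # input processing layer
        self.gru = nn.GRU(hidden, hidden, batch_first=True)    # temporal modeling
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, n_faults)                # fault prediction layer

    def forward(self, x):                  # x: (batch, time, n_features)
        h = torch.relu(self.input_proj(x))
        h, _ = self.gru(h)                 # (batch, time, hidden)
        a, _ = self.mha(h, h, h)           # self-attention across time steps
        return torch.softmax(self.head(a[:, -1, :]), dim=-1)   # probabilities per fault

probs = MHAGRU()(torch.randn(8, 200, 16))  # 8 windows, 200 time steps, 16 channels
```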
The real-time monitoring system continuously tracks equipment performance, feeding data back to the LLM for dynamic prediction adjustments. Results are presented through user-friendly interfaces featuring dashboards, alert systems, and interactive visualizations. The system collects performance data for continuous improvement, ensuring the model maintains high accuracy through periodic retraining.
This integration of traditional LLM capabilities with specialized architectures like MHA-GRU demonstrates the framework’s adaptability to specific industrial applications, providing robust fault prediction and maintenance optimization capabilities while maintaining the core benefits of LLM-based analysis and decision-making. LLMs are deep learning frameworks capable of intelligent analysis of synchronous condenser signals through extensive data training and complex pattern learning. In fault diagnosis for synchronous condensers, LLMs can automatically select the optimal wavelet packet decomposition parameters, thus avoiding the subjectivity and errors associated with manual parameter setting, significantly enhancing diagnostic accuracy and efficiency [13].
LLMs can be represented as a conditional probability model:
$$P(y \mid x) = \arg\max_{\theta} \prod_{i=1}^{n} P(y_i \mid x_i;\ \theta)$$
where $x$ denotes the input signal sequence of the synchronous condenser, and $y$ represents the optimal wavelet packet decomposition parameters. By maximizing this likelihood over the model parameters $\theta$, LLMs can establish a mapping between input signal features and the optimal decomposition parameters.
A core technique of LLMs is the attention mechanism, which can be expressed as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
where $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and $d$ is a dimension normalization factor. Through this attention mechanism, LLMs can dynamically focus on key feature bands in the signal, especially on changes in frequency components when faults occur.
In practical applications, the inputs to the LLM include the decomposition level $L$, the band selection vector $B = (b_1, \ldots, b_k)$, and the wavelet basis function $\psi(t)$. By learning from historical data features, the LLM can output an optimized combination of decomposition levels and bands:
$$\hat{y} = \arg\max_{y} P(y \mid x)$$
This adaptive learning strategy enables LLMs to intelligently choose optimal parameters under various operating conditions, effectively optimizing signal decomposition [14].
Figure 2 presents our proposed framework that integrates LLM-optimized WPT with MHA-GRU networks through a systematic workflow. The framework implements a comprehensive data processing pipeline that enhances both feature extraction and fault prediction capabilities through four essential modules.
At the foundation of this architecture, the input layer acquires three categories of data: mechanical parameters, electrical indicators, and field distribution, establishing a multi-modal foundation. These inputs are processed by the LLM-optimized Wavelet Packet Transform (WPT) module through parameter optimization and feature extraction, enhancing the signal characteristics critical for fault detection.
Building upon the extracted features, the Feature Integration module implements multi-modal fusion and temporal-spatial correlation analysis to create comprehensive feature representations. These integrated features are then processed by the MHA-GRU Network, which incorporates multi-head attention mechanisms and gated recurrent units to classify four fault types: rotor, air-gap, stator, and bearing faults. This end-to-end architecture enables efficient and accurate fault diagnosis in complex systems.

3.2. Neural Network Architecture Design

In the evolution of deep learning architectures, GRU has emerged as an efficient variant of recurrent neural networks, offering simplified structure while maintaining effective temporal feature extraction. This study proposes an enhanced architecture combining GRU with multi-head attention mechanism for synchronous condenser fault prediction. The following sections detail the MHA-GRU network architecture and its prediction process flow, demonstrating how this integration improves early fault detection in UHVDC transmission systems.
(1)
MHA-GRU Network Architecture:
The MHA-GRU network architecture enhances feature extraction and temporal modeling for synchronous condenser fault prediction by integrating a multi-head attention mechanism with GRU [15]. It comprises three key layers: an input processing layer that handles multi-source sensor data (e.g., electromagnetic measurements, vibration data), a hybrid processing layer that combines GRU with multi-head attention to capture complex temporal dependencies, and a fault prediction output layer that delivers accurate diagnostic results, as illustrated in Figure 3.
Key parameters, such as the batch size $B$, learning rate $\eta$, word vector dimension $d_w$, position vector dimension $d_p$, and the number of network layers $L$, are finely tuned to optimize the network’s performance, ensuring effective training and precise fault detection.
(2)
Input Layer:
The input layer simultaneously processes four key types of data, i.e., electromagnetic measurements, air-gap characteristics, vibration data, and current/voltage parameters, capturing different aspects of the synchronous condenser’s operation. These signals undergo preprocessing, such as normalization and noise filtering, to ensure data quality.
(3)
Hybrid Processing Layer:
The hybrid processing layer combines GRU and multi-head attention (MHA) components, enhancing the model’s ability to capture temporal dependencies and key feature patterns for effective fault detection [16].
GRU Component: The GRU is a simplified variant of traditional RNNs, designed to address issues like vanishing gradients while reducing computational complexity, as shown in Figure 4. Unlike LSTM, which uses separate input, forget, and output gates, the GRU combines the input and forget gates into a single update gate. This update gate $z_t$ determines how much of the previous hidden state $h_{t-1}$ should be carried forward, while the reset gate $r_t$ modulates the incorporation of new information. The GRU equations can be summarized as [17]:
$$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$$
$$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$$
$$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $\sigma$ denotes the sigmoid activation, $x_t$ is the input at time $t$, and $W$ and $b$ are the weight matrices and bias vectors. The simplified architecture of the GRU allows efficient processing of sequential data while preserving the essential temporal information carried in $h_{t-1}$.
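To make these update equations concrete, the following is a minimal NumPy sketch of a single GRU step; the weight shapes and the $[h_{t-1}, x_t]$ concatenation layout are illustrative assumptions.

```python
# Minimal NumPy sketch of a single GRU step implementing the update/reset gate
# equations above. Weight shapes and concatenation layout are illustrative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    """One GRU update; W* have shape (hidden, hidden + input), b* shape (hidden,)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx + bz)                                        # update gate z_t
    r = sigmoid(Wr @ hx + br)                                        # reset gate r_t
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                          # new hidden state h_t
```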
Multi-Head Attention Component: The multi-head attention mechanism enhances the model’s capacity to capture intricate relationships within sequential data by projecting the input into multiple subspaces. Each head processes a distinct subspace of the data, allowing the model to focus on different aspects of the input simultaneously. Given the input queries $Q$, keys $K$, and values $V$, the mechanism computes:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $d_k$ is the dimension of the key vectors. By using multiple attention heads, the model applies several attention operations in parallel, resulting in richer feature representations:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_n)\, W^{O}$$
Each attention head $\mathrm{head}_i$ is computed using independently learned linear projections:
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})$$
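The NumPy sketch below mirrors these attention equations; the per-head projection matrices would normally be learned and are passed in here as placeholders.

```python
# NumPy sketch of scaled dot-product attention and its multi-head extension,
# mirroring the equations above; projection matrices are placeholder inputs.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over the key dimension
    return w @ V

def multi_head(X, W_q, W_k, W_v, W_o):
    """X: (time, d_model); W_q/W_k/W_v: per-head projection lists; W_o: output projection."""
    heads = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o
```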
(4)
Fault Prediction Output Layer:
The fault prediction output layer processes features from the hybrid processing layer to provide comprehensive assessments of multiple fault types in synchronous condenser systems. It delivers probability estimates and severity assessments for four key faults: rotor, air-gap, stator, and bearing faults [18].
This layer is represented by the fault prediction vector $y^{\langle t \rangle}$, calculated as [19]:
$$y^{\langle t \rangle} = \psi_2\!\left(W_y\, c^{\langle t \rangle} + b_y\right)$$
where $c^{\langle t \rangle}$ represents the combined feature vector from the hybrid processing layer, and $\psi_2$ is the activation function transforming these features into probabilities for each fault type.
A softmax activation function is applied to convert the raw outputs into normalized probability distributions:
$$P(\mathrm{fault}_i) = \frac{e^{y_i}}{\sum_{j=1}^{4} e^{y_j}}$$
where $P(\mathrm{fault}_i)$ indicates the probability of fault $i$. When the probability for any fault exceeds a predefined threshold $\tau_i$, the system triggers an early warning.
To quantify the potential impact, the layer also assigns a risk score $R_i$ for each fault type:
$$R_i = P(\mathrm{fault}_i) \times S_i$$
where $S_i$ is a severity coefficient derived from historical fault data, reflecting the potential impact. This dual assessment mechanism ensures early and accurate fault detection, enabling timely maintenance and reducing the risk of critical system failures.
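To make the output-layer computations concrete, the sketch below applies the softmax, the threshold test, and the risk score $R_i = P(\mathrm{fault}_i) \times S_i$; the logits, thresholds, and severity coefficients are illustrative placeholders, not values from our experiments.

```python
# Sketch of the output-layer computations: softmax fault probabilities, the
# threshold test, and the risk score R_i = P(fault_i) * S_i. All values are
# illustrative placeholders.
import numpy as np

def fault_assessment(logits, severities, thresholds):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # P(fault_i) via softmax
    risk = probs * severities             # R_i = P(fault_i) * S_i
    warnings = probs > thresholds         # early-warning trigger per fault type
    return probs, risk, warnings

fault_types = ["rotor", "air-gap", "stator", "bearing"]
probs, risk, warn = fault_assessment(
    logits=np.array([2.1, 0.3, -0.5, 0.1]),
    severities=np.array([0.9, 0.7, 0.8, 0.5]),
    thresholds=np.array([0.6, 0.6, 0.6, 0.6]),
)
```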

3.3. Multi-Modal Feature Learning

In the context of industrial fault prediction, multi-modal feature learning involves integrating and processing data from various sources or modalities, such as sensor data, operational logs, maintenance records, and environmental data. These different data types provide complementary insights into system performance and can be combined to improve the accuracy of fault detection and prediction. The design of a multi-modal data processing workflow is essential for extracting meaningful features from diverse data streams to enhance the effectiveness of machine learning models.
A well-designed multi-modal data processing workflow typically follows several key steps. The first step is gathering data from multiple sources. This could include sensor data, such as vibration, temperature, pressure, and current measurements from machines and equipment; operational logs from control systems and machine status logs; maintenance records detailing past faults, repairs, and component replacements; and environmental data like humidity, ambient temperature, or pressure that could impact the system’s performance. These data sources are collected through IoT devices, data acquisition systems, and enterprise software. The next step is to integrate the data, ensuring it is aligned by time and context, creating a comprehensive dataset for further analysis [20].
Once the data is integrated, it often requires preprocessing to handle missing values, remove noise, and ensure uniformity across different data sources. The preprocessing steps may include missing value imputation, which involves filling in gaps in sensor readings or logs using statistical techniques like mean imputation or interpolation; noise filtering, which removes noise from sensor data using smoothing techniques, such as moving averages or filtering algorithms; normalization and scaling, which standardizes data to a common scale, especially when dealing with sensor data that may have different units, such as temperature in Celsius versus pressure in Pascals; and data augmentation, which generates synthetic data or uses techniques like bootstrapping to increase the robustness of the model when there is limited data, particularly for rare fault events.
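As a concrete illustration of these preprocessing steps, the following is a minimal sketch using pandas; the column names, window length, imputation strategy, and toy values are illustrative assumptions.

```python
# Minimal preprocessing sketch: interpolation-based imputation, moving-average
# noise filtering, and z-score normalization. Column names and values are toy data.
import pandas as pd

def preprocess(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    df = df.interpolate(limit_direction="both")                   # fill missing samples
    df = df.rolling(window, min_periods=1, center=True).mean()    # smooth sensor noise
    return (df - df.mean()) / df.std()                            # common scale

raw = pd.DataFrame({"vibration": [0.10, None, 0.30, 0.22],
                    "current": [5.0, 5.1, 5.3, 5.2]})
clean = preprocess(raw)
```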
After preprocessing, the next task is extracting relevant features from the raw data that can be used for machine learning models. This is where the multi-modal nature of the data becomes important. For sensor data, statistical features like mean, variance, skewness, and kurtosis are commonly extracted to summarize the distribution of the signals. Time-domain features, such as peaks, zero-crossings, and signal trends, are also important for capturing dynamic behaviors. In addition to these, frequency-domain features, such as the power spectral density and dominant frequencies, are useful for detecting patterns related to mechanical faults like bearing degradation or rotor imbalance. Operational logs and maintenance records can be processed using natural language processing (NLP) techniques to extract event patterns, fault descriptions, and maintenance intervals. By combining features from all modalities, a richer and more comprehensive feature set can be generated for machine learning models [21].
Finally, once the relevant features are extracted, they are fed into machine learning algorithms, which can identify patterns and predict potential faults. The multi-modal approach allows for more robust predictions by leveraging complementary information from various data sources, leading to better fault detection, earlier identification of issues, and ultimately, more efficient predictive maintenance strategies.

3.4. LLM-Based Decision Making

LLM-based decision-making systems leverage the capabilities of LLMs to enhance the decision-making process in industrial applications, including fault detection and predictive maintenance. These systems can process vast amounts of unstructured data, such as sensor readings, operational logs, and textual descriptions of faults, and provide actionable insights to operators and decision-makers. The decision-making process typically involves the integration of multiple data sources, the extraction of relevant features, and the application of learned models to predict outcomes or recommend actions.
The decision-making mechanism in LLM-based systems usually follows several key steps. First, the LLM ingests raw data from different sources, such as real-time sensor inputs or historical maintenance records. It processes this information to identify patterns and correlations, often through a process called “fine-tuning” where the model is adapted to the specific characteristics of the industrial system or fault scenario. The LLM then generates predictions, such as the likelihood of a failure occurring, the potential causes of a fault, or the most appropriate maintenance actions to take. These predictions are often presented as recommendations or insights that are easy for human operators to interpret [22].
A key feature of LLM-based decision-making systems is their ability to support real-time decision-making. By continuously analyzing incoming data, LLMs can provide immediate recommendations based on the most current information, allowing operators to take timely actions to prevent failures or reduce downtime. For example, in a predictive maintenance system, the LLM might predict the failure of a critical component, such as a pump or motor, based on early signs of wear, temperature fluctuations, or vibration anomalies. The system could then recommend a maintenance check, part replacement, or an adjustment to operational parameters.
However, the use of LLMs in decision-making also raises concerns around transparency and trust. Because LLMs are often viewed as “black-box” models, their decision-making process can be difficult for users to understand, especially when the predictions or recommendations they provide are not immediately explainable. To address this issue, the field of explainable AI (XAI) has emerged, focusing on making machine learning models, including LLMs, more interpretable and understandable for human users.
Explainability in LLM-based decision-making systems can be achieved through various techniques. One common approach is to use feature importance scores, which highlight which inputs (e.g., specific sensor readings, historical events, or environmental factors) were most influential in driving the model’s prediction. Another approach is through the use of attention mechanisms, which help visualize which parts of the input data the model focuses on when making a decision. For example, in a maintenance prediction task, an attention-based model might highlight a spike in temperature or vibration at a particular time as a key factor in predicting an impending failure. Additionally, surrogate models, such as decision trees or rule-based systems, can be trained to approximate the behavior of the LLM and provide an easier-to-understand representation of its decision-making process.
The goal of explainable design is to ensure that the decisions made by the LLM are not only accurate but also transparent and justifiable to end-users. In industrial environments, where high-stakes decisions are often based on model outputs, explainability can significantly increase trust in the system, making it more likely that operators will follow the recommendations provided. Furthermore, by understanding the rationale behind decisions, operators can refine and improve the model over time, leading to better performance and more reliable predictions [23].
In summary, LLM-based decision-making systems provide powerful tools for improving fault detection and predictive maintenance by processing diverse data sources and making informed predictions. However, ensuring that these systems are transparent and interpretable is essential for gaining user trust and ensuring their successful deployment in industrial settings. Through techniques like feature importance, attention mechanisms, and surrogate models, LLM-based decision systems can achieve a balance between performance and explainability, enabling more effective and reliable decision-making in complex industrial environments.

4. Feature Processing and Enhancement

This section describes the feature processing methodology of our framework, focusing on wavelet packet transform techniques and their optimization. We present the mathematical foundations of signal processing, detail our approach to fault feature extraction, introduce LLM-optimized wavelet packet selection, and outline the complete prediction process flow. These components work together to enhance the framework’s feature extraction and fault detection capabilities.

4.1. Wavelet Packet Transform

Wavelet Packet Transform (WPT) is an extension of the traditional Wavelet Transform (WT), offering a more detailed and flexible decomposition of signals, especially in the high-frequency components. While the standard wavelet transform decomposes a signal into a low-frequency approximation and a high-frequency detail, WPT recursively decomposes both the low- and high-frequency components at each level, resulting in multiple frequency sub-bands [24]. This allows WPT to capture more intricate features of the signal, making it especially suitable for applications such as fault diagnosis, noise reduction, and time-frequency signal analysis.
Let $x(t)$ be the continuous-time signal to be analyzed. The first step in the WPT is to decompose the signal using a pair of low-pass and high-pass filters, denoted as $g(t)$ and $h(t)$, respectively. These filters perform the signal decomposition into approximation and detail coefficients. The convolution of the signal with these filters can be written as:
$$a_k = \int x(t)\, g(t - k)\, dt, \qquad b_k = \int x(t)\, h(t - k)\, dt$$
where $a_k$ denotes the low-pass (approximation) coefficients and $b_k$ the high-pass (detail) coefficients. After convolution, the output is downsampled by a factor of two, yielding the approximation and detail coefficients. The signal at this first level of decomposition can be expressed as:
$$x(t) = \sum_k a_k\, g(t - k) + \sum_k b_k\, h(t - k)$$
This process of decomposing both approximation and detail components continues recursively, generating a tree structure of frequency sub-bands. At the second level of decomposition, the approximation signal $a_k$ and the detail signal $b_k$ from the previous step are each decomposed again by the low-pass and high-pass filters, resulting in four new sub-bands: $LL$, $LH$, $HL$, and $HH$.
This recursive splitting continues for $L$ levels, creating a decomposition tree that includes all possible frequency sub-bands. The decomposition tree after two levels of analysis can be shown schematically as:
$$x(t) \rightarrow \{LL,\ LH,\ HL,\ HH\} \rightarrow \{LLL,\ LLH,\ LHL,\ LHH,\ \ldots\}$$
The key advantage of WPT over the traditional wavelet transform lies in its ability to perform a detailed frequency decomposition at every level, especially in the high-frequency regions. In the WT, only two components are generated at each level: the approximation (low-frequency) and the detail (high-frequency). However, for signals with rich high-frequency content, such as those encountered in fault detection, this decomposition may be insufficient. WPT, by recursively applying the filtering and downsampling steps, yields a much finer analysis of the signal, especially in regions where high-frequency features are critical.
The energy of each sub-band after decomposition can be calculated to measure the significance of each frequency component. The energy in a given frequency band $E_i$ can be computed as the sum of squared coefficients:
$$E_i = \sum_k |a_k|^2 \quad \text{for the low-frequency sub-band}, \qquad E_j = \sum_k |b_k|^2 \quad \text{for the high-frequency sub-band}$$
This energy-based criterion is often used to identify important features within the signal and can be applied to fault diagnosis, where the presence of fault-related high-frequency components can be highlighted by analyzing the energy distribution across sub-bands.
The multi-level decomposition provided by WPT allows for better localization of signal energy in both time and frequency domains. This capability makes WPT robust to noise and particularly effective in fault detection, where subtle fault signatures might be embedded in the high-frequency components. The method can be applied to mechanical systems to detect anomalies in vibration signals, or to other time-series data that exhibit transient behavior.
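As a concrete illustration (not the framework's implementation), the following sketch uses the PyWavelets library to decompose a toy signal to three levels and compute per-band energies; the wavelet choice, depth, and sampling rate are placeholders that the LLM optimization of Section 4.3 would normally determine.

```python
# Illustrative WPT decomposition with PyWavelets on a toy signal: three-level
# decomposition into 2^3 = 8 frequency-ordered sub-bands and their energies.
# Wavelet choice, depth, and sampling rate are placeholders.
import numpy as np
import pywt

fs = 10_000                                        # assumed sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 1250 * t)

wp = pywt.WaveletPacket(data=x, wavelet="db4", mode="symmetric", maxlevel=3)
nodes = wp.get_level(3, order="freq")              # frequency-ordered terminal nodes
energies = np.array([np.sum(n.data ** 2) for n in nodes])
rel_energy = energies / energies.sum()             # E_i / E_total per sub-band
```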

4.2. Fault Feature Extraction via Wavelet Packet Transform

Fault feature extraction using WPT is a critical step in the process of detecting and diagnosing faults in mechanical systems [25]. By decomposing the signal into multiple frequency sub-bands, WPT allows for a detailed analysis of both low- and high-frequency components, which can help identify fault-related features that are often hidden in the signal. The ability to isolate specific frequency ranges through WPT is particularly important for analyzing signals from mechanical systems [26], where faults often manifest in certain frequency bands due to the nature of mechanical vibrations.
Once the signal has been decomposed into its sub-bands, the next step is to extract relevant features from these sub-bands that can indicate the presence of faults. Among the most commonly used features for fault detection are energy, entropy, and statistical moments such as mean and variance.
Another useful feature is entropy, which measures the irregularity or complexity of the signal. A higher entropy value generally corresponds to a more complex or erratic signal, which could be a sign of abnormal behavior or an incipient fault. The entropy $H_i$ of a sub-band can be computed as [27]:
$$H_i = -\sum_k p_k \log p_k$$
where $p_k$ is the probability distribution of the coefficients in the sub-band. Entropy helps capture subtle changes in the signal that might not be obvious in terms of energy alone.
Statistical features such as the mean and variance are also commonly used to characterize the signal’s behavior within each sub-band. The mean $\mu$ and variance $\sigma^2$ are computed as:
$$\mu = \frac{1}{N} \sum_k a_i(k)$$
$$\sigma^2 = \frac{1}{N} \sum_k \left(a_i(k) - \mu\right)^2$$
where $N$ is the number of coefficients in the sub-band. These statistical features can help identify changes in the signal’s characteristics that might indicate the presence of a fault.
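A minimal NumPy sketch of these per-band features follows; the coefficient array is synthetic, and the small constants guard against division by zero and log(0).

```python
# NumPy sketch of per-band feature extraction: energy, Shannon entropy of the
# normalized coefficient energies, mean, and variance. Coefficients are synthetic.
import numpy as np

def subband_features(coeffs):
    energy = np.sum(coeffs ** 2)
    p = coeffs ** 2 / (energy + 1e-12)             # probability-like distribution p_k
    entropy = -np.sum(p * np.log(p + 1e-12))       # H_i = -sum_k p_k log p_k
    return {"energy": energy, "entropy": entropy,
            "mean": coeffs.mean(), "variance": coeffs.var()}

features = subband_features(np.random.default_rng(0).normal(size=256))
```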
After extracting these features from the different frequency sub-bands, the next step is to combine them for fault diagnosis. Feature fusion involves selecting the most discriminative features from the various sub-bands and combining them to improve classification accuracy. This can be done using techniques such as Principal Component Analysis (PCA) to reduce the dimensionality of the feature set while retaining the most important information. The principal components $z_i$ are given by:
$$z_i = \sum_j \alpha_{ij}\, x_j$$
where $\alpha_{ij}$ are the coefficients of the PCA transformation and $x_j$ are the original features. By reducing the number of features, PCA helps to highlight the most important fault-related features, improving both the efficiency and accuracy of the fault detection process.
Alternatively, machine learning classifiers such as Support Vector Machines (SVMs) can be trained on the extracted features to automatically classify the system’s condition. The SVM decision function is written as:
$$f(x) = \sum_i \alpha_i\, y_i\, \langle x_i, x \rangle + b$$
where $\alpha_i$ are the Lagrange multipliers, $y_i$ are the class labels, and $\langle x_i, x \rangle$ represents the dot product between the training and test data. By using an SVM or another classification algorithm, the fault detection system can be trained to recognize patterns in the feature set and make accurate predictions about the system’s condition.
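The sketch below chains these two steps with scikit-learn, using random data purely for illustration; the feature dimensions, class labels, and train/test split are placeholders.

```python
# Feature fusion and classification sketch with scikit-learn: PCA reduces the
# pooled sub-band features, then an SVM separates the fault classes. Toy data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))              # 200 analysis windows x 32 sub-band features
y = rng.integers(0, 4, size=200)            # rotor / air-gap / stator / bearing labels

clf = make_pipeline(StandardScaler(), PCA(n_components=8), SVC(kernel="rbf"))
clf.fit(X[:150], y[:150])
accuracy = clf.score(X[150:], y[150:])      # held-out accuracy on the toy data
```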
Fault feature extraction using WPT allows for the identification of subtle anomalies in mechanical systems that may be indicative of impending faults. By analyzing the signal in the frequency domain and extracting features such as energy, entropy, and statistical moments, WPT provides a powerful tool for fault detection. When combined with techniques like PCA or machine learning classifiers, these extracted features can be used to accurately diagnose faults, improving the reliability and efficiency of predictive maintenance systems.

4.3. Optimal Wavelet Packet Selection Using LLM

Our approach leverages LLM-driven optimization to refine this feature extraction process, specifically determining the optimal wavelet packet decomposition parameters. The optimization encompasses the selection of mother wavelet functions and decomposition levels, along with the adaptation of threshold parameters and frequency band divisions. This optimization strategy ensures that the input signals fed into the MHA-GRU network are highly informative and discriminative, establishing a robust foundation for subsequent temporal analysis.
In the fault diagnosis process of synchronous condensers, selecting appropriate decomposition levels L and frequency bands B is crucial for capturing effective fault characteristics. LLMs achieve automatic optimization of these parameters through deep learning of historical data, greatly reducing the subjectivity and uncertainty associated with manual parameter setting. The overall process of LLM-enhanced wavelet packet transform is illustrated in Figure 5. The system consists of five main components: input signal, wavelet packet decomposition, LLM optimization, optimized WPT, and feature extraction. The historical fault data provides essential support for LLM optimization.
The decomposition level $L$ determines the frequency resolution of the wavelet packet decomposition. Each additional level of decomposition divides the signal into more sub-bands, increasing the detail in both high- and low-frequency feature analysis. The LLM optimization module performs two key tasks: level selection and band selection. The LLM employs an energy balance optimization formula to automatically select an appropriate $L$:
$$L^{*} = \arg\min_{L} \sum_{j=1}^{L} \left| \frac{E_j}{E_{\mathrm{total}}} - \frac{1}{L} \right|$$
where $E_j$ is the energy at the $j$-th level and $E_{\mathrm{total}}$ is the total energy. This formula aims to ensure an even distribution of energy across the sub-bands, enabling the effective capture of fault features while avoiding computational redundancy.
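A minimal sketch of one plausible reading of this criterion is shown below, assuming PyWavelets and a synthetic signal: for each candidate level, the energy shares of the terminal sub-bands are compared against a uniform distribution, and the level with the smallest total deviation is selected. The signal and mother wavelet are placeholders.

```python
# One plausible reading of the energy-balance criterion, assuming PyWavelets:
# for each candidate level, compare each terminal sub-band's energy share against
# a uniform share and keep the level with the smallest total deviation.
import numpy as np
import pywt

def energy_balance_deviation(signal, level, wavelet="db4"):
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(node.data ** 2) for node in wp.get_level(level)])
    ratios = energies / energies.sum()                    # E_j / E_total per terminal sub-band
    return np.sum(np.abs(ratios - 1.0 / len(ratios)))     # deviation from a uniform share

signal = np.random.randn(2048)                            # placeholder monitoring segment
best_level = min(range(2, 7), key=lambda L: energy_balance_deviation(signal, L))
print("selected decomposition level:", best_level)
```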
Wavelet packet decomposition divides the signal into multiple frequency bands, and LLMs automatically select the optimal bands through a weighted energy optimization strategy:
$$S_B = \sum_{j=1}^{k} \omega_j E_j, \quad \text{with} \quad \omega_j = \frac{e^{\eta_j}}{\sum_{l} e^{\eta_l}}$$
where $E_j$ denotes the energy in the $j$-th frequency band, $\omega_j$ is the weighting coefficient, and $\eta_j$ is related to the fault characteristics. The LLM automatically adjusts these weights, giving higher priority to frequency bands relevant to the fault, thus enhancing diagnostic precision. For example, in the case of stator winding short-circuit faults, the characteristic frequencies may concentrate in the mid-frequency range, and the LLM will prioritize these bands by assigning higher weights, ensuring accurate diagnostic outcomes.
The dynamic weighting mechanism is defined as:
$$\omega_j = f(\eta_j) = \frac{e^{\alpha \eta_j}}{\sum_{i} e^{\alpha \eta_i}}$$
Through this mechanism, the system can adaptively adjust the band selection strategy, ensuring sensitivity to key fault-related frequency bands.
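The sketch below illustrates this softmax-style weighting in NumPy; the relevance scores $\eta_j$ and band energies $E_j$ are placeholders standing in for the values the LLM optimization would supply.

```python
# Softmax-style dynamic band weighting: omega_j = exp(alpha*eta_j) / sum_i exp(alpha*eta_i),
# followed by the weighted band score S_B = sum_j omega_j * E_j.
import numpy as np

def band_weights(eta, alpha=1.0):
    scaled = alpha * np.asarray(eta, dtype=float)
    scaled -= scaled.max()                        # subtract max for numerical stability
    w = np.exp(scaled)
    return w / w.sum()

band_energies = np.array([0.8, 2.4, 5.1, 1.3])    # E_j for four sub-bands (placeholder)
eta = np.array([0.1, 0.7, 1.9, 0.3])              # eta_j: fault-relevance scores (placeholder)
omega = band_weights(eta, alpha=1.5)
S_B = float(np.dot(omega, band_energies))         # weighted band score
print(omega, S_B)
```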

4.4. Prediction Process Flow for Synchronous Condenser Fault Detection

The fault prediction process utilizing the MHA-GRU model follows a systematic flow designed for real-time monitoring and early fault detection. The process begins with continuous data acquisition from multiple sensors, which collect electromagnetic measurements, air-gap field characteristics, vibration patterns, and electrical parameters from the synchronous condenser. These raw sensor signals undergo preprocessing through noise filtering and normalization to ensure data quality and consistency.
The preprocessed data streams are then fed into the MHA-GRU model’s input layer for feature transformation. Building upon previous research in power equipment defect prediction [28], the hybrid processing layer integrates a GRU component for modeling temporal sequences, while the multi-head attention mechanism simultaneously analyzes patterns across different feature subspaces. This dual processing approach enables the model to detect both immediate anomalies and gradual degradation patterns in the synchronous condenser’s operation.
The final stage involves fault analysis and warning generation, where the model’s output layer processes the enhanced feature representations. For each fault type—rotor, air-gap, stator, and bearing—the system calculates probability scores and severity assessments. When these scores exceed predetermined thresholds, the system triggers early warning signals, enabling operators to implement preventive maintenance strategies before critical failures occur. This integrated approach ensures reliable fault detection while minimizing false alarms.
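A PyTorch sketch of this flow is given below. It assumes a recent PyTorch release; the layer sizes follow Table 2, but the input dimensionality, class ordering, and warning thresholds are illustrative assumptions rather than the authors’ implementation. A stacked GRU encodes the sensor sequence, multi-head self-attention re-weights the hidden states, and per-fault probabilities are compared against thresholds to raise early warnings.

```python
import torch
import torch.nn as nn

class MHAGRUSketch(nn.Module):
    def __init__(self, n_features=12, hidden=256, heads=8, n_classes=4, dropout=0.2):
        super().__init__()
        # Stacked GRU for temporal encoding (3 layers, hidden size 256, as in Table 2).
        self.gru = nn.GRU(n_features, hidden, num_layers=3,
                          batch_first=True, dropout=dropout)
        # Multi-head self-attention across time steps (8 heads, as in Table 2).
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                         # x: (batch, time, features)
        h, _ = self.gru(x)
        a, _ = self.attn(h, h, h)                 # attention-weighted hidden states
        return self.head(a.mean(dim=1))           # pooled logits per condition class

model = MHAGRUSketch()
probs = torch.softmax(model(torch.randn(8, 128, 12)), dim=-1)  # placeholder sensor windows

# Threshold-based early warning; threshold values and class order are illustrative only.
fault_thresholds = torch.tensor([0.60, 0.60, 0.60])            # rotor, air-gap, stator (assumed order)
warn = (probs[:, 1:] > fault_thresholds).any(dim=-1)           # warn if any fault score exceeds its threshold
print(warn)
```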

5. Experiment Results and Analysis

This section presents a comprehensive performance evaluation of our proposed LLM-enhanced fault prediction framework. Through extensive experiments conducted on real-world synchronous condenser data, we assess the framework’s effectiveness from multiple perspectives: the experimental setup and implementation details, quantitative performance analysis across different fault types, and the enhancement achieved through LLM-optimized feature processing. The results demonstrate our framework’s advantages over traditional methods in terms of fault classification accuracy, real-time processing capability, and overall system reliability.

5.1. Experimental Setup and Implementation

To comprehensively validate the effectiveness and reliability of the proposed framework, extensive experiments were conducted using data collected from a synchronous condenser in an actual UHVDC transmission system. The dataset encompassed three typical fault types: rotor winding, air-gap eccentricity, and stator winding faults. The dataset split ratios were determined through empirical validation to ensure sufficient samples for model training while maintaining statistically meaningful validation and test sets. Our experiments demonstrated that moderate variations in these split ratios did not significantly impact the model’s performance, indicating the robustness of our approach across different data partitioning schemes. This systematic split strategy provides a balanced framework for both model development and comprehensive performance assessment.
The experimental environment was established on a high-performance computing platform equipped with an Intel Xeon E5-2680 v4 CPU, 128 GB RAM, and an NVIDIA Tesla V100 GPU with 32 GB memory. The software implementation utilized PyTorch 1.8, with data collection spanning a 12-month period at sampling frequencies of 2 kHz for fault data and 1 kHz for vibration analysis. For the Multi-Head Attention GRU model, key hyperparameters were carefully tuned to optimize performance. The number of attention heads was set to 8, with a hidden layer size of 256 neurons and a dropout rate of 0.2 to prevent overfitting. The model was trained using the Adam optimizer with an initial learning rate of 0.001 and a batch size of 64, as shown in Table 2.
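As a sketch of this training configuration (hyperparameters follow Table 2; the dataset tensors and the stand-in classifier are placeholders, since the full MHA-GRU definition is omitted here for brevity), the setup can be expressed as:

```python
# Training-configuration sketch matching Table 2: Adam (lr = 0.001), batch size 64, 50 epochs.
# The dataset and the stand-in classifier are placeholders; in practice the MHA-GRU network
# described in Section 4 would take the model's place.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 2048), torch.randint(0, 4, (1024,)))  # placeholder windows/labels
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Dropout(0.2), nn.Linear(256, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```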
To ensure the robustness and reproducibility of the results, all experiments were repeated 10 times with different random seeds, and the average performance metrics were reported. The comprehensive experimental setup enabled thorough evaluation of the framework’s capabilities in real-world fault diagnosis scenarios, while the diverse dataset and rigorous implementation details provided a solid foundation for performance assessment.

5.2. Comprehensive Performance Analysis

All performance metrics and results presented in this section are derived from the independent testing dataset, which was not used during model training or validation. This ensures an unbiased evaluation of the model’s real-world performance capability.
The MHA-GRU model demonstrated exceptional fault classification capabilities across all fault types. As shown in Figure 6, the confusion matrix reveals high accuracy rates of 96.8%, 95.7%, and 96.2% for rotor, air-gap, and stator faults respectively. The low misclassification rates, averaging only 2.1%, demonstrated the model’s robust ability to distinguish between different fault types, even under complex operating conditions.
As shown in Table 3, the model demonstrates robust performance across all fault types, with particularly high accuracy for normal condition detection. The detection times remain consistently low, with standard deviations indicating stable performance. These results indicate that the model can effectively identify and classify different fault types while maintaining real-time processing capabilities.
The comparative analysis in Figure 7 comprehensively evaluates the performance of different models including Support Vector Machine (SVM), Long Short-Term Memory (LSTM), traditional GRU, and our proposed MHA-GRU approach. The MHA-GRU model achieved superior performance across all evaluation metrics. In terms of accuracy, the MHA-GRU reached 96.2%, significantly outperforming SVM (89.3%), LSTM (92.1%), and traditional GRU (93.8%). The precision metric showed similar improvements, with MHA-GRU achieving 95.8% compared to SVM (88.7%), LSTM (91.5%), and GRU (93.1%). The recall rate of MHA-GRU (96.1%) demonstrated its robust fault detection capability, surpassing other methods by 5–8 percentage points. Furthermore, the F1-score, which provides a balanced measure of precision and recall, reached 95.9% for MHA-GRU, indicating its well-rounded performance advantage over traditional approaches.
Figure 8 demonstrates the comparative prediction performance of different algorithms through amplitude signal tracking. The proposed MHA-GRU model exhibits superior prediction capabilities compared to conventional approaches. The signal profile captures a typical fault development process, with a significant anomaly occurring around sample point 150 followed by a recovery phase near point 200. As shown in the zoomed-in section (samples 160–180), the MHA-GRU achieves the most stable tracking performance with minimal deviation from the real value, while traditional methods such as SVM and LSTM show larger fluctuations and delayed responses during rapid signal changes. The results validate that our MHA-GRU approach significantly enhances fault prediction accuracy while maintaining robust performance under complex signal variations.
Table 4 provides a detailed breakdown of the signal characteristics across different operational phases. The analysis divides the entire monitoring period into four distinct stages, from normal operation through fault development to post-recovery. Each stage exhibits unique amplitude ranges and behavioral patterns, with the MHA-GRU consistently demonstrating superior performance across all phases, particularly during the critical fault development period (samples 150–200) where accurate prediction is most crucial for fault detection.
The early warning system exhibited robust detection capabilities in real-time operation scenarios. As demonstrated in Figure 9a, the time-domain fault detection process shows clearly identified fault points and detection thresholds. The system achieved an average detection time of 0.8 s after fault occurrence, while maintaining remarkably low false alarm (2.3%) and missing alarm (1.7%) rates. The 10 s interval in Figure 10 was chosen based on our experimental sampling rate and the typical time scale of fault development in synchronous condensers. Figure 9b illustrates the statistical distribution of fault detection delays, which followed an approximately normal distribution with a standard deviation of 0.3 s, indicating stable and reliable detection performance.
Figure 10 demonstrates the real-time processing performance of different algorithms under system loads varying from 20% to 100%. The MHA-GRU model exhibits the most stable processing time characteristics, maintaining below 60 ms throughout the entire load range with the most gradual growth curve. In contrast, traditional GRU and LSTM show significant increases in processing time with increasing load, particularly LSTM approaching 90 ms under high load conditions. While the SVM algorithm performs well under low loads, its performance degrades rapidly as the load increases, exceeding 70 ms. The zoomed-in region (40–60% load) clearly illustrates the subtle performance differences between MHA-GRU and SVM algorithms under medium load conditions, further confirming the performance advantages of the MHA-GRU model.
Real-time processing capabilities were thoroughly evaluated under varying system loads ranging from 20% to 100% utilization, as shown in Figure 11. The box plots and scatter points reveal consistent performance even under full load conditions, with the average processing time remaining below 45 ms and a standard deviation of 5 ms. This performance stability under high computational demands was further verified through extensive testing across different operational scenarios. The statistical indicators demonstrate the framework’s reliability in practical applications, making it suitable for deployment in actual UHVDC transmission systems.
To comprehensively demonstrate the advantages of our proposed LLM-MHA GRU framework, we present an extended performance evaluation incorporating multiple metrics. Table 5 provides a detailed comparison of our method against other approaches across various performance indicators.
The results demonstrate that our LLM-MHA GRU framework achieves superior performance across all metrics. Specifically, our model achieves the highest diagnostic accuracy of 98.7% with balanced precision and recall (98.5% and 98.6%, respectively), while maintaining a model complexity of approximately 0.044M parameters, in line with the other approaches (0.041M–0.052M). The F1-score of 98.5% further indicates robust, well-balanced overall performance. These results show that, at similar computational complexity, the MHA-GRU architecture achieves significantly better performance than traditional approaches, validating the effectiveness of the proposed architectural improvements rather than relying on increased model capacity.

5.3. LLM Feature Enhancement Analysis

The LLM-enhanced signal processing significantly improved both feature selection and overall system performance. As shown in Figure 12, the energy distribution of WPT terminal nodes after LLM weighting demonstrates the effectiveness of intelligent node selection. Higher energy nodes correspond to critical signal features, allowing the system to identify and prioritize relevant features for fault diagnosis. This led to a 15% improvement in feature extraction accuracy and 30% decrease in processing time compared to traditional methods, while maintaining diagnostic accuracy. The results validate the effectiveness of the LLM-based approach in fault detection applications.

5.4. Framework Generalization and Adaptability

Our LLM-optimized WPT and MHA-GRU framework demonstrates strong generalization capability through its core design principles. The framework is not tied to a specific dataset, as its key components—multi-modal data preprocessing, wavelet-based feature extraction, and adaptive temporal modeling—are broadly applicable to various synchronous condensers and similar rotating machinery.
The framework’s generalization strength is reflected in three key aspects. First, the LLM-optimized WPT component adaptively determines optimal parameters based on signal characteristics rather than specific fault patterns, enabling robust feature extraction across different operating conditions. Second, our training process supports transfer learning, allowing efficient model fine-tuning when applied to new machines or operating conditions with minimal additional training data. Third, the MHA-GRU architecture’s temporal modeling capabilities can effectively capture fault progression patterns across different machine specifications and operational scenarios.
The framework’s design ensures adaptability when processing new fault patterns and different operational conditions, demonstrating its broad applicability in practical scenarios. Future work will focus on validating the model’s adaptability across more diverse industrial scenarios, further demonstrating its practical value in real-world applications.

6. Conclusions

This paper presents an innovative fault prediction framework for synchronous condensers in UHVDC transmission systems, combining LLM-optimized WPT with MHA-GRU networks. Compared to traditional methods like DNN, RNN, and LSTM, our MHA-GRU framework achieves superior performance while maintaining a more compact model structure. The proposed approach improves feature extraction from multi-modal sensor data through LLM-driven optimization, enhancing the framework’s ability to detect subtle fault patterns under complex operational conditions. The integration of MHA-GRU further boosts temporal modeling, demonstrating notable improvements in detection accuracy, processing efficiency, and fault alarm reliability compared to existing methods. The framework has demonstrated strong practical value, successfully addressing challenges such as handling multi-modal data, adapting to varying conditions, and providing reliable early fault warnings in real-time UHVDC systems. Future research will focus on automating hyperparameter optimization, extending the framework to concurrent fault detection, enhancing model generalization across diverse datasets, and developing a more comprehensive optimization pipeline. Overall, the proposed solution offers substantial improvements in predictive maintenance for UHVDC systems, enhancing both reliability and operational efficiency.

Author Contributions

Conceptualization, D.Z.; methodology, T.H.; software, S.L.; validation, C.Z.; formal analysis, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Corporation of China, with project number 5500-202334190A-1-1-ZN.

Data Availability Statement

The data are private; please contact the corresponding author for access.

Conflicts of Interest

Authors Dongqing Zhang, Shenglong Li, Chaofeng Zhang, and Wenqiang Zhao were employed by the State Grid Corporation of China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Yousaf, M.Z.; Mirsaeidi, S.; Khalid, S.; Raza, A.; Zhichu, C.; Rehman, W.U.; Badshah, F. Multisegmented Intelligent Solution for MT-HVDC Grid Protection. Electronics 2023, 12, 1766. [Google Scholar] [CrossRef]
  2. Purbowaskito, W.; Wu, P.-Y.; Lan, C.-Y. Permanent Magnet Synchronous Motor Driving Mechanical Transmission Fault Detection and Identification: A Model-Based Diagnosis Approach. Electronics 2022, 11, 1356. [Google Scholar] [CrossRef]
  3. Kuai, Z.; Huang, G. Fault Diagnosis of Diesel Engine Valve Clearance Based on Wavelet Packet Decomposition and Neural Networks. Electronics 2023, 12, 353. [Google Scholar] [CrossRef]
  4. Yuan, X.; Wang, Y.; Yang, C.; Ge, Z.; Song, Z.; Gui, W. Weighted Linear Dynamic System for Feature Representation and Soft Sensor Application in Nonlinear Dynamic Industrial Processes. IEEE Trans. Ind. Electron. 2018, 65, 1508–1517. [Google Scholar] [CrossRef]
  5. Qiang, C.Q.; Ping, L.J.; Haq, A.U.; He, L.; Haq, A. Net Traffic Classification Based on GRU Network Using Sequential Features. In Proceedings of the 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 17–19 December 2021; pp. 460–465. [Google Scholar]
  6. Guo, T.; Zhang, T.; Lim, E.; López-Benítez, M.; Ma, F.; Yu, L. A Review of Wavelet Analysis and Its Applications: Challenges and Opportunities. IEEE Access 2022, 10, 58869–58903. [Google Scholar] [CrossRef]
  7. Rehman, A.U.; Chen, Y.; Zhao, Y.; Cheng, Y.; Zhao, Y.; Tanaka, T. Detection of Rotor Inter-turn Short Circuit Fault in Doubly-fed Induction Generator using FEM Simulation. In Proceedings of the 2018 IEEE 2nd International Conference on Dielectrics (ICD), Budapest, Hungary, 1–5 July 2018; pp. 1–4. [Google Scholar]
  8. Lare, P.; Sarabi, S.; Delpha, C.; Nasr, A.; Diallo, D. Stator winding Inter-turn short-circuit and air gap eccentricity fault detection of a Permanent Magnet-Assisted Synchronous Reluctance Motor in Electrified vehicle. In Proceedings of the 2021 24th International Conference on Electrical Machines and Systems (ICEMS), Gyeongju, Republic of Korea, 31 October–3 November 2021; pp. 932–937. [Google Scholar]
  9. Taghipour-GorjiKolaie, M.; Razavi, S.M.; Shamsi-Nejad, M.A.; Darzi, A. Inter-turn stator winding fault detection in PMSM using magnitude of reactive power. In Proceedings of the 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), Penang, Malaysia, 4–7 December 2011; pp. 256–261. [Google Scholar]
  10. Zhu, W.; Huang, P. Performance Degradation Assessment of Rolling Bearing Based on Difference of Eigenvalues in Random Matrix Theory. In Proceedings of the 2023 10th International Conference on Dependable Systems and Their Applications (DSA), Tokyo, Japan, 10–11 August 2023; pp. 559–564. [Google Scholar]
  11. Lu, Y.; Li, Q.; Liang, S.Y. Adaptive prognosis of bearing degradation based on wavelet decomposition assisted ARMA model. In Proceedings of the 2017 IEEE 2nd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chengdu, China, 15–17 December 2017; pp. 733–736. [Google Scholar]
  12. Wang, L.; Wang, H.; Lu, T.; Wang, C. Synchronous Condenser Reactive Power Output Model Based on DAG-CNN. In Proceedings of the 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China, 14–16 May 2021; pp. 674–678. [Google Scholar]
  13. Trummer, I. Large Language Models: Principles and Practice. In Proceedings of the IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–17 May 2024; pp. 5354–5357. [Google Scholar]
  14. Li, W.; Xuan, W.; Min, W. Motor bearing fault diagnosis based on wavelet packet decomposition of instantaneous power. In Proceedings of the 2010 International Conference on Computer Design and Applications, Qinhuangdao, China, 25–27 June 2010; pp. V3-457–V3-459. [Google Scholar]
  15. Yang, H.; Lin, L.; Zhong, S.; Guo, F.; Cui, Z. Aero Engines Fault Diagnosis Method Based on Convolutional Neural Network Using Multiple Attention Mechanism. In Proceedings of the 2021 IEEE International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), Weihai, China, 13–15 August 2021; pp. 13–18. [Google Scholar]
  16. Sun, C.; Zhang, Y.; Xu, S.; Tian, W.; Gu, B.; Cheng, M.; Chao, W. Equivalent analysis of Stator Inter-Turn Fault in Different Winding Topologies of Synchronous Condenser. In Proceedings of the 2022 IEEE 5th International Electrical and Energy Conference (CIEEC), Nanjing, China, 27–29 May 2022; pp. 4994–4998. [Google Scholar]
  17. Dey, R.; Salem, F.M. Gate-variants of Gated Recurrent Unit (GRU) Neural Networks. In Proceedings of the 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  18. Han, X.; Liang, Y.; Zhu, E.; Bian, X. Nonlinear Transient Mathematical Model of Large-Capacity Synchronous Condenser Based on Time-Varying Reactance Parameters. IEEE Access 2023, 11, 35411–35420. [Google Scholar] [CrossRef]
  19. Zainuddin, Z.; EA, P.A.; Hasan, M.H. Predicting Machine Failure Using Recurrent Neural Network-Gated Recurrent Unit (RNN-GRU) through Time Series Data. Bull. Electr. Eng. Inform. 2021, 10, 870–878. [Google Scholar]
  20. Fu, X.; Li, H.; Xu, D.; Lin, M.; Zou, J. Analysis of Air-Gap Magnetic Field in Homopolar Inductor Alternator by Analytical Method and FEM. IEEE Trans. Magn. 2015, 51, 8100604. [Google Scholar]
  21. Liu, X.; Zhang, Y.; Wang, X. Research on Air Gap Magnetic Field for Permanent Magnet Toroidal Motor with Rotor Eccentricity. In Proceedings of the 2023 26th International Conference on Electrical Machines and Systems (ICEMS), Zhuhai, China, 5–8 November 2023; pp. 2712–2716. [Google Scholar]
  22. Ren, Z.; Wang, C.; Ye, L.; Hu, J.; Liu, Y.; Zhou, W. Analysis of Electromagnetic Characteristics of Synchronous Condenser Under Stator Inter-Turn Short Circuit Fault. In Proceedings of the 2018 21st International Conference on Electrical Machines and Systems (ICEMS), Jeju, Republic of Korea, 7–10 October 2018; pp. 2638–2642. [Google Scholar]
  23. Chen, J.; Chuan, S.; Chao, X. Research on Fault Analysis and Remote Fault Diagnosis Technology of New Large Capacity Synchronous Condenser. In Proceedings of the 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 26–28 January 2024; pp. 87–91. [Google Scholar]
  24. Hussein, A.M.; Obed, A.A.; Zubo, R.H.A.; Al-Yasir, Y.I.A.; Saleh, A.L.; Fadhel, H.; Sheikh-Akbari, A.; Mokryani, G.; Abd-Alhameed, R.A. Detection and Diagnosis of Stator and Rotor Electrical Faults for Three-Phase Induction Motor via Wavelet Energy Approach. Electronics 2022, 11, 1253. [Google Scholar] [CrossRef]
  25. Chen, S.; Li, Z.; Pan, G.; Xu, F. Power Quality Disturbance Recognition Using Empirical Wavelet Transform and Feature Selection. Electronics 2022, 11, 174. [Google Scholar] [CrossRef]
  26. Kareem, A.B.; Hur, J.-W. Towards Data-Driven Fault Diagnostics Framework for SMPS-AEC Using Supervised Learning Algorithms. Electronics 2022, 11, 2492. [Google Scholar] [CrossRef]
  27. Liu, W.; Yang, X.; Jinxing, S. An Integrated Fault Identification Approach for Rolling Bearings Based on Dual-Tree Complex Wavelet Packet Transform and Generalized Composite Multiscale Amplitude-Aware Permutation Entropy. Shock. Vib. 2020, 2020, 8851310. [Google Scholar] [CrossRef]
  28. Shcherbatov, I.; Lisin, E.; Rogalev, A.; Tsurikov, G.; Dvořák, M.; Strielkowski, W. Power Equipment Defects Prediction Based on the Joint Solution of Classification and Regression Problems Using Machine Learning Methods. Electronics 2021, 10, 3145. [Google Scholar] [CrossRef]
Figure 1. (a) Circuit model of synchronous condenser in power system; (b) Stator winding circuit model; (c) Schematic of rotor inter-turn short circuit.
Figure 2. Integrated Fault Prediction Framework Combining LLM-Optimized WPT and MHA-GRU.
Figure 3. MHA-GRU Network Architecture for Synchronous Condenser Fault Prediction.
Figure 4. GRU Cells.
Figure 5. LLM-Enhanced Wavelet Packet Transform System Model.
Figure 6. Confusion matrix for the MHA-GRU model, showing classification rates between predicted and actual fault types: rotor, air-gap, and stator faults.
Figure 7. Performance comparison of fault diagnosis models: Accuracy, precision, recall, and F1-score for SVM, LSTM, GRU, and MHA-GRU.
Figure 8. Prediction performance comparison for amplitude signal tracking: MHA-GRU, GRU, LSTM, and SVM.
Figure 9. Early warning system evaluation: (a) Time-domain fault detection with thresholds and fault points; (b) Distribution of fault detection delays.
Figure 10. Real-time processing performance under varying system loads (20–100%) for different algorithms.
Figure 11. Real-time processing performance under varying system loads (20–100%): processing time distributions with statistical indicators.
Figure 12. Energy distribution of WPT terminal nodes after LLM weighting.
Table 1. Multi-modal Input Features for Synchronous Condenser Fault Diagnosis.
Data Category | Signal Acquisition | Feature Characteristics | Processing Method
Mechanical Parameters | Shaft-mounted sensors and vibration probes | Acceleration, Displacement, Velocity, Vibration modes | Wavelet decomposition, Time-frequency analysis
Electrical Indicators | Current/voltage sensors and power analyzers | Phase voltage, Current amplitude, Power factor, Harmonics | FFT analysis, Power spectrum density
Field Distribution | Embedded flux sensors and field probes | Magnetic field intensity, Flux density, Field symmetry | Spatial-temporal decomposition
Table 2. Statistical Analysis of the MHA-GRU Based Prediction System.
System Parameter | Value/Description
Training Epochs | 50
Batch Size | 64
Learning Rate | 0.001
Attention Heads | 8
Hidden Layer Size | 256
Dropout Rate | 0.2
Sequence Length | 2048
Model Architecture | MHA-GRU with 3 layers
Software Framework | PyTorch 1.8
Table 3. Model performance comparison under different fault conditions.
Fault Type | Accuracy (%) | Precision (%) | F1-Score (%) | Detection Time (ms)
Rotor Winding | 96.8 ± 0.5 | 95.9 ± 0.6 | 96.2 ± 0.4 | 42 ± 5
Air-Gap Eccentricity | 95.7 ± 0.7 | 94.8 ± 0.8 | 95.1 ± 0.6 | 45 ± 6
Stator Winding | 96.2 ± 0.6 | 95.4 ± 0.7 | 95.8 ± 0.5 | 43 ± 4
Normal | 97.5 ± 0.4 | 96.9 ± 0.5 | 97.1 ± 0.4 | 40 ± 3
Table 4. Analysis summary of prediction performance in different signal regions.
Signal Region | Sample Range | Amplitude Range | Best Performance
Normal Operation | 0–100 | 465–475 | MHA-GRU (±0.5 deviation)
Fault Development | 100–150 | 455–465 | MHA-GRU (fastest response)
Critical Region | 150–200 | 460–472 | MHA-GRU (minimum error)
Post-Recovery | 200–300 | 462–476 | MHA-GRU (best stability)
Table 5. Comprehensive Performance Comparison of Different Methods.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Parameters (M)
Traditional DNN | 91.2 | 90.8 | 91.0 | 90.9 | 0.048
RNN | 93.5 | 93.1 | 93.3 | 93.2 | 0.041
LSTM | 94.1 | 93.8 | 94.0 | 93.9 | 0.052
Attention-RNN | 95.8 | 95.5 | 95.6 | 95.5 | 0.047
MHA-GRU | 98.7 | 98.5 | 98.6 | 98.5 | 0.044
