1. Introduction
The On-Load Tap Changer (OLTC), as a critical component of transformers, directly influences the voltage regulation capability of power systems and the operational safety of equipment [1]. Due to the frequent switching operations performed during OLTC operation, its internal contacts, transition resistance, and insulation media are subjected to the combined effects of arc discharge, thermal stress, and electrical stress over prolonged periods, which easily lead to contact wear, carbon accumulation, abnormal discharges, and other faults [2]. Early manifestations of OLTC faults are typically latent and difficult to detect through conventional operational parameters. However, once faults accumulate to a critical threshold, severe consequences such as contact welding or an open or short circuit of the transition resistance may occur, thereby jeopardizing the safety and stability of transformer operation and the entire power system [3].
Dissolved Gas Analysis (DGA) is a well-established technique for transformer fault diagnosis, providing valuable insights into the types and concentrations of gases dissolved in insulating oil [4,5]. In OLTCs, however, frequent switching operations introduce unique phenomena such as arcing and contact wear, which alter oil decomposition pathways and result in gas compositions distinct from those in the transformer main tank. Among these, acetylene (C2H2) is particularly complex: while commonly recognized as an indicator of severe internal faults such as arcing, it is also a natural byproduct of normal OLTC operation, thereby complicating fault identification [6,7,8].
Although DGA is critical for OLTC fault diagnosis, gas monitoring in OLTC oil is still predominantly performed through intermittent sampling, constrained by two fundamental factors. First, existing commercial gas monitors face inherent limitations in OLTC applications. Most devices require manual on-site sampling at intervals of weeks, and even online monitors often struggle to achieve stable daily or higher-frequency measurements due to restricted installation space, severe mechanical vibrations, and electromagnetic interference during switching operations. Moreover, high-frequency monitoring may cause sensor drift and oil contamination, further reducing data reliability. Second, OLTCs typically operate at relatively high switching frequencies. An OLTC undergoes multiple switching operations per day, each of which induces short-lived peaks in gas concentrations lasting only one to two hours. Conventional low-frequency sampling, conducted every few days, is unable to capture these transient variations, leading to highly discrete gas profiles and substantially limiting the accuracy of trend analysis and fault prediction. Although recent research has sought to refine DGA techniques to alleviate the effects of sparse spatiotemporal data [9,10], systematic investigations dedicated to OLTC applications remain limited. Established interpretation methods, such as the key gas ratio approach and the Duval triangle method, are widely applied in main transformer tank diagnostics [11], yet have not been adequately extended to OLTCs. Addressing this gap is essential to advance reliable fault diagnosis in tap changers.
At present, predictive approaches for DGA primarily encompass statistical forecasting [12], machine learning [13], and deep learning methods [14]. Statistical models, such as grey models and traditional time series approaches, are capable of capturing overall data trends but often suffer from limited accuracy when confronted with non-stationary and nonlinear sequences [15]. To improve predictive performance, machine learning methods, particularly gradient boosting decision trees (GBDT) and their advanced variants, have been widely adopted for dissolved gas concentration forecasting [16,17]. While these models are effective in utilizing structured features, they are typically constructed on static representations and thus fail to adequately capture the temporal evolution of gas concentrations, which is critical given the intermittent and dynamic nature of OLTC oil monitoring. More recently, deep learning techniques have emerged as powerful alternatives, demonstrating superior performance in modeling nonlinear dependencies and temporal dynamics, thereby establishing themselves as the leading direction in DGA-based predictive research [18,19]. In particular, deep neural network-based ensemble frameworks outperform traditional approaches such as logistic regression, random forests, and adaptive boosting by offering stronger nonlinear feature extraction capabilities. Nevertheless, most existing studies focus on specific operating conditions without fully considering the temporal dependencies and spatial correlations inherent in gas evolution, thereby limiting model generalizability. To address these limitations, Transformer architectures have garnered increasing attention due to their advantages in time series modeling [20,21]. LogTrans [22], for instance, improves long-term dependency modeling through local causal convolutions while retaining the global attention mechanism of Transformers to enhance feature extraction from gas concentration variations. Building upon this, Informer [23] introduces a probabilistic sparse self-attention mechanism and a scalable time decomposition strategy, greatly increasing efficiency in long-sequence forecasting. Against this backdrop, ETSformer has recently emerged as a novel time series forecasting model that integrates exponential smoothing attention with frequency-based attention mechanisms, offering enhanced capability in capturing both trend evolution and periodic patterns [24,25,26]. While ETSformer has demonstrated strong performance in generic time series forecasting tasks, it has not yet been applied to dissolved gas analysis. Considering the distinctive characteristics of OLTC oil, where frequent switching operations lead to intermittent gas fluctuations and blurred boundaries between normal and fault-related gas generation, ETSformer provides a promising framework to uncover latent temporal structures and improve the reliability of gas concentration prediction in this domain.
Accurate identification and correction of anomalous data that deviate from normal operating patterns are essential for reliably reflecting the actual operational state and evolutionary trends of the OLTC, thereby enhancing data credibility and improving the effectiveness of fault diagnosis and condition assessment [27,28,29]. In fields such as power equipment monitoring and load sensing, anomaly detection has become a central research focus. Existing methods, such as Local Outlier Factor (LOF) and Isolation Forest (IF) [30,31], have been successfully applied to various anomaly detection scenarios. However, these approaches predominantly rely on global criteria for anomaly identification, exhibiting limited sensitivity to local anomalies and often failing to detect subtle deviations [32]. To address this limitation, this study introduces a novel anomaly detection method based on the Local Gradient of Outlier Degree (LGOD), designed to improve the recognition of localized anomalies.
In addition, Locally Weighted Regression (LWR) is employed to further correct the identified anomalous data, thereby improving data quality and ensuring temporal continuity [33]. LWR is widely recognized for its ability to handle nonlinear and non-stationary data, providing flexible local fitting that preserves underlying patterns without imposing rigid global assumptions. It has been successfully applied in domains such as environmental monitoring and sensor calibration, where irregular data frequently occur [34,35]. However, its performance is strongly influenced by the choice of kernel bandwidth and weighting scheme, which may result in over-smoothing or underfitting when applied to highly volatile signals. Within the proposed framework, LWR is integrated to exploit its strengths in local approximation while mitigating its limitations through complementary combination with LGOD and ETSformer. This integration enables a more accurate characterization of OLTC condition evolution trends under intermittent monitoring regimes.
Existing studies on OLTC dissolved gas prediction have achieved progress through statistical and deep learning approaches; however, challenges remain due to data sparsity, measurement noise, and vulnerability to outliers, which limit their applicability in real monitoring environments. Motivated by these limitations, this study develops an integrated predictive framework that combines LGOD-based anomaly detection, LWR-based data reconstruction, and the ETSformer deep time series forecasting model. The proposed method effectively improves data quality, enhances temporal continuity, and strengthens the modeling of dynamic gas evolution. Experimental and ablation results demonstrate that the framework significantly outperforms conventional methods, while the complementary contributions of its components provide clear evidence of its necessity and practical value.
2. Algorithm Introduction
2.1. An LGOD-Based Method for Anomaly Detection in Time Series Data
Transient disturbances induced by OLTC operations can lead to localized fluctuations in dissolved gas concentration measurements, resulting in anomalous data patterns [36]. Therefore, it is essential to incorporate an anomaly detection algorithm with fine-grained sensitivity to accurately identify such deviations. To address this need, this study proposes a novel anomaly detection method for dissolved gas data based on the Local Gradient of Outlier Degree (LGOD). This method quantifies variations in the local resultant force experienced by data points across multiple neighborhood scales, thereby enabling effective discrimination of outliers.
The local resultant force characterizes the aggregate influence, conceptualized as gravitational attraction, exerted on a data point by its neighboring points, as illustrated in Figure 1. Based on gravitational theory, the attraction (or force) between two data points $P_i$ and $P_j$ can be defined as follows:
Here, $G$ denotes the gravitational constant, $m_i$ and $m_j$ represent the masses assigned to data points $P_i$ and $P_j$, respectively, $d_{ij}$ denotes the Euclidean distance between the two points, and $\hat{\mathbf{d}}_{ij}$ is the unit vector pointing from $P_i$ to $P_j$.
By combining the gravitational constant and the masses into a proportionality factor C, the gravitational force expression can be simplified.
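The pairwise force and its simplified form can be written out from the definitions above. The following is a sketch that assumes the standard Newtonian inverse-square law; the unit-vector symbol $\hat{\mathbf{d}}_{ij}$ and the choice to absorb both masses into $C$ are notational assumptions, not confirmed forms of the original equations.

```latex
\begin{align}
\mathbf{F}_{ij} &= G\,\frac{m_i\,m_j}{d_{ij}^{2}}\,\hat{\mathbf{d}}_{ij},
\qquad \hat{\mathbf{d}}_{ij} = \frac{P_j - P_i}{d_{ij}}, \\
\mathbf{F}_{ij} &= \frac{C}{d_{ij}^{2}}\,\hat{\mathbf{d}}_{ij},
\qquad C = G\,m_i\,m_j .
\end{align}
```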
As illustrated in Figure 1, the local resultant force is defined as the vector sum of the gravitational influences exerted on data object i by all its neighbors within the k-nearest neighborhood [37]. The corresponding formulation is given by
Here, k denotes the number of nearest neighbors of data point i. The local resultant force (LRF) provides a comprehensive characterization of the spatial distribution and directional influence acting on point i within its neighborhood [38]. The simplified expression for the LRF is given by
Here, $w_i$ represents the mass of data object i, which is defined as
According to the above definition, points located in low-density regions exhibit stronger local resultant forces due to larger distances and smaller masses, whereas points in high-density regions tend to have weaker local resultant forces as a result of greater mass and mutual gravitational cancellation.
Figure 2 illustrates the local resultant force as the number of neighbors k increases. By comparing Figure 2a,b, it is evident that the local resultant force of point $P_2$ changes minimally, while that of point $P_1$ exhibits a pronounced variation. This indicates that different types of points experience significantly distinct changes in their local resultant forces as the neighborhood size expands.
Since the local resultant force directions of outliers or boundary points tend to be aligned, their local resultant force magnitudes increase with the number of neighbors. In contrast, internal points exhibit diverse attraction directions, resulting in negligible local resultant forces as the neighborhood size expands. To quantitatively characterize the LRF, the following metric is defined, as expressed in Equation (6):
Here, $|\mathbf{F}_i^{k}|$ denotes the magnitude of the local resultant force acting on data object i with respect to its k neighbors, and K represents the maximum number of neighbors considered for point i. By cumulatively summing the local resultant force magnitudes of data object i across increasing neighborhood sizes, the local resultant force variation rate is obtained. The calculation is formulated as follows:
To enable automatic identification of outliers based on the variation rate, this study employs a hierarchical partitioning method to adaptively determine the threshold. Specifically, the difference in local resultant force variation rates between data objects i and j is defined as LRFVar(i, j), which is computed as follows:
By sorting the local resultant force variation rates of all data objects within the dataset, an LRFList is constructed. The differences in local resultant force variation rates between adjacent data objects in this list are then calculated, where pronounced peaks typically indicate outliers characterized by significantly large variation rates. Accordingly, by setting a threshold τ, values of LRFVar exceeding this threshold are identified as anomalies. The threshold τ is defined as follows:
Here, EX denotes the expected value, and SD represents the standard deviation. The adjustment parameter α lies within (0, 3], with a default value of 2.5. By applying a positive adjustment to the expected value of the LRFVarList, anomalies can be effectively identified. Based on this principle, the LGOD method ranks the dissolved gas data in OLTC oil according to the variation rate of the local resultant force, enabling automatic anomaly detection without the need for manual intervention or complex assumptions.
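The scoring and thresholding steps described above can be illustrated with a small standard-library sketch. It is not the authors' implementation: it assumes 2-D points with no duplicates, lets every neighbor pull with unit strength (masses and the factor C are dropped), takes the variation rate as the sum of resultant-force magnitudes for k = 1..K, and uses τ = EX + α·SD with the population standard deviation. The function names are hypothetical.

```python
import math
import statistics


def lrf_variation_rates(points, K):
    """Cumulative local-resultant-force magnitude of each point for k = 1..K."""
    rates = []
    for i, p in enumerate(points):
        # Neighbours of p ordered by Euclidean distance (points assumed distinct).
        neighbours = sorted(
            (q for j, q in enumerate(points) if j != i),
            key=lambda q: math.dist(p, q),
        )
        fx = fy = 0.0  # running resultant force acting on p
        total = 0.0    # sum of resultant magnitudes over k = 1..K
        for q in neighbours[:K]:
            d = math.dist(p, q)
            # Assumption: each neighbour pulls with unit strength along the
            # unit vector from p to q.
            fx += (q[0] - p[0]) / d
            fy += (q[1] - p[1]) / d
            total += math.hypot(fx, fy)
        rates.append(total)
    return rates


def detect_outliers(points, K=4, alpha=2.5):
    """Indices whose variation rate lies above the first gap exceeding tau."""
    rates = lrf_variation_rates(points, K)
    order = sorted(range(len(points)), key=rates.__getitem__)
    # Differences between adjacent entries of the sorted rate list.
    gaps = [rates[order[n + 1]] - rates[order[n]] for n in range(len(order) - 1)]
    tau = statistics.mean(gaps) + alpha * statistics.pstdev(gaps)
    for n, gap in enumerate(gaps):
        if gap > tau:
            return sorted(order[n + 1:])
    return []


# Thirty evenly spaced points on a unit circle plus one distant point.
ring = [(math.cos(2 * math.pi * j / 30), math.sin(2 * math.pi * j / 30))
        for j in range(30)]
points = ring + [(5.0, 0.0)]
flagged = detect_outliers(points)
```

In this toy layout every ring point has neighbors on both sides, so its pulls largely cancel, whereas the distant point sees all its neighbors in nearly the same direction and its score grows with each added neighbor. On an open-ended series the first and last samples behave like boundary points and may also score high, consistent with the remark above about outliers and boundary points.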
2.2. Data Reconstruction Method Based on Locally Weighted Regression
Locally Weighted Regression (LWR) is a non-parametric regression technique widely employed for data reconstruction following anomaly detection. Its core principle lies in assigning weights to data points based on their proximity to the target point, whereby neighboring points receive higher weights while distant points are assigned lower weights. This weighting scheme reduces the influence of remote points on the fitted model, thereby enabling effective capture of local data characteristics [39,40]. The specific procedural steps of LWR are as follows:
First, weights are computed using a Gaussian kernel function [41]. For the data point $x_t$ at the current time step t, the weights of its neighboring points are determined based on their distances to $x_t$. The weighting function is specifically defined as
Here, $w_i$ denotes the weight of the i-th neighboring point, and h is the bandwidth parameter. Within the neighborhood of each data point $x_t$, a weighted linear regression is performed based on the weight distribution. By minimizing the weighted sum of squared residuals, the local regression coefficients are estimated, which are then used to reconstruct the current data point. The regression model is formulated as follows:
Here, β represents the regression coefficients, $y_i$ denotes the observed value, and $y_i'$ corresponds to the reconstructed value.
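The two LWR steps above (Gaussian weighting, then a weighted linear fit) can be sketched as follows. This is a minimal illustration under stated assumptions: the kernel is taken as w_i = exp(-(x_i - x_t)^2 / (2h^2)), the local model is a straight line with intercept, and samples flagged as anomalous are excluded from the fit through the hypothetical `skip` argument. The function name is likewise hypothetical.

```python
import math


def lwr_reconstruct(xs, ys, x0, h, skip=()):
    """Reconstruct the value at x0 from a Gaussian-weighted linear fit."""
    sw = swx = swy = swxx = swxy = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i in skip:
            continue  # flagged samples do not contribute to the fit
        w = math.exp(-((x - x0) ** 2) / (2.0 * h * h))  # Gaussian kernel weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    # Closed-form solution of the weighted least-squares line y = b0 + b1 * x.
    # Requires at least two distinct retained x values (otherwise denom is 0).
    denom = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * x0


# A linear series with one corrupted sample at index 5.
xs = [float(t) for t in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 100.0
repaired = lwr_reconstruct(xs, ys, x0=5.0, h=2.0, skip={5})
```

A small bandwidth h follows the nearest samples closely, while a large h approaches an ordinary global linear fit, which is the over-smoothing trade-off discussed in the Introduction.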
2.3. ETSformer-Based Forecasting Model
The variations in gas concentrations induced by OLTC operations are closely correlated with switching frequency and load fluctuations, necessitating a time series forecasting model capable of simultaneously capturing long-term trends and periodic oscillations [42]. To this end, the ETSformer model is introduced for temporal modeling and prediction of dissolved gas concentrations in oil. ETSformer is an innovative Transformer architecture that integrates an exponential smoothing mechanism [43]. By decomposing the time series into level, growth (trend), and seasonal components layer by layer, it significantly enhances the model’s ability to capture temporal structural biases. Leveraging the expressive power of deep neural networks and an efficient residual learning framework, ETSformer precisely captures the latent trend evolution and periodic fluctuations inherent in the dissolved gas data, effectively modeling their complex dynamic dependencies.
For the problem of dissolved gas prediction in OLTC oil, the data are first modeled as a multivariate time series. Let $x_t \in \mathbb{R}^m$ denote the feature observation vector at time step t, and $y_t \in \mathbb{R}$ represent the corresponding gas concentration value. Given the historical feature window $X_{t-L:t-1} = [x_{t-L}, \ldots, x_{t-1}]$ and the corresponding gas concentration observations $Y_{t-L:t-1} = [y_{t-L}, \ldots, y_{t-1}]$, the objective is to predict the gas concentrations for the next H steps, denoted as $Y_{t+1:t+H} = [y_{t+1}, \ldots, y_{t+H}]$. Here, the look-back window length L defines the number of past observations provided as model input, while the forecasting horizon H specifies the number of future steps to be predicted.
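The look-back and horizon definitions above translate into a sliding-window construction of training pairs. The sketch below assumes a chronologically ordered, evenly sampled univariate series and assumes that the horizon starts immediately after the look-back window; `make_windows` is a hypothetical helper name.

```python
def make_windows(series, L, H):
    """Return (look-back, horizon) pairs from a chronologically ordered series."""
    pairs = []
    for start in range(len(series) - L - H + 1):
        past = series[start:start + L]            # L past observations (input)
        future = series[start + L:start + L + H]  # next H observations (target)
        pairs.append((past, future))
    return pairs


# Twenty samples with a look-back of 11 and a horizon of 7 give three pairs.
series = list(range(20))
pairs = make_windows(series, L=11, H=7)
```

Each pair consumes L + H consecutive samples, so a series of length N yields N - L - H + 1 pairs; short monitoring records therefore limit the number of training examples.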
Step 1: Input Embedding.
The modeling process begins with an input embedding module that transforms the raw dissolved gas concentration data within the look-back window into a latent representation space. This is achieved through a temporal convolutional filter [44], which effectively captures short-term correlations among different gas components:
Step 2: Encoder with Growth and Seasonal Decomposition.
Subsequently, the encoder extracts growth and seasonal features of the dissolved gas concentration data through a cascaded, layer-wise mechanism [45]. At each layer, the residual sequence $Z_{t-L:t-1}$ is taken as input. The Multi-Head Exponential Smoothing Attention (MH-ESA) module and the Frequency Attention (FA) module jointly operate to update the residuals $Z_{t-L:t}$, while simultaneously generating the latent growth component $B_{t-L:t}$ and the seasonal component $S_{t-L:t}$. This iterative process allows for progressive refinement and separation of temporal patterns:
Step 3: Level Component Extraction.
In parallel, the level component is dynamically estimated. At each time step t, the level value is computed as a weighted combination of the current level estimate and the extrapolated level-growth prediction from the previous step [45]. This ensures that the level component evolves smoothly while adapting to recent changes:
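The level update described in Step 3 has the same shape as the classical exponential-smoothing (Holt) recursion. The sketch below is an illustration of that recursion only, not the ETSformer layer itself: it assumes the inputs are already de-seasonalized scalars, that the growth values are supplied externally, and that the smoothing weight `alpha` is a fixed number in (0, 1) rather than a learned parameter. All names are hypothetical.

```python
def update_level(observations, growth, alpha, level0, growth0):
    """Exponential-smoothing level recursion.

    level_t = alpha * obs_t + (1 - alpha) * (level_{t-1} + growth_{t-1})
    """
    levels = []
    prev_level, prev_growth = level0, growth0
    for obs, g in zip(observations, growth):
        # Blend the current estimate with the extrapolation from the last step.
        level = alpha * obs + (1.0 - alpha) * (prev_level + prev_growth)
        levels.append(level)
        prev_level, prev_growth = level, g
    return levels


levels = update_level(
    observations=[12.0, 14.0],
    growth=[1.0, 1.0],
    alpha=0.5,
    level0=10.0,
    growth0=1.0,
)
```

With alpha close to 1 the level tracks the latest observation, while with alpha close to 0 it follows the previous level plus growth, which is the smooth-but-adaptive behavior the text describes.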
Step 4: Forecasting Seasonal Dynamics.
For each latent periodic feature dimension within the look-back window, denoted as $S^{(n)}_{t-L:t,i}$, the Frequency Attention module extrapolates the seasonal component into the forecasting horizon, producing $S^{(n)}_{t:t+H}$.
Finally, the decoder integrates the extracted level, growth, and seasonal components into a coherent prediction framework. By stacking multiple encoder layers and residual connections, ETSformer progressively enhances the representation of dynamic dependencies. The outputs are linearly mapped back from the latent space to the observation space, yielding the multi-step forecasts of dissolved gas concentrations over the future H steps:
As illustrated in Figure 3, the ETSformer-based forecasting framework employs a series of stacked encoders to extract both growth and seasonal features from historical dissolved gas concentration data, while simultaneously utilizing exponential smoothing to update the level component. The final decoder integrates these three structural components (level, growth, and seasonal) to generate accurate and interpretable multi-step predictions of gas concentrations in OLTC oil.
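The additive composition performed by the decoder can be summarized in one line. The expression below is a sketch based on the description above (level plus a linear projection of the summed growth and seasonal forecasts over N layers); the symbols $E$, $B^{(n)}$, $S^{(n)}$, and $N$ follow the notation of this section, but the exact form should be treated as an assumption rather than the authors' equation.

```latex
\begin{equation}
\hat{Y}_{t:t+H} \;=\; E_{t:t+H} \;+\;
\operatorname{Linear}\!\left( \sum_{n=1}^{N}
\left( B^{(n)}_{t:t+H} + S^{(n)}_{t:t+H} \right) \right)
\end{equation}
```

Because the three terms are added rather than mixed, each forecast can be inspected as a level baseline, a growth contribution, and a seasonal contribution.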
2.4. The Framework of the OLTC Dissolved Gas Prediction Model
The proposed OLTC dissolved gas concentration prediction method, which integrates the LGOD-based outlier detection approach with the ETSformer forecasting model, is illustrated in Figure 4. The overall procedure consists of the following steps:
Step 1: The original dissolved gas data from the OLTC is first subjected to outlier detection using the LGOD algorithm. Detected outliers are subsequently removed. Local weighted regression is then employed to impute and reconstruct the data, thereby completing the data preprocessing phase.
Step 2: The preprocessed dataset is partitioned into training, validation, and test sets to facilitate subsequent model training and performance evaluation.
Step 3: An ETSformer-based forecasting model is constructed for predicting OLTC dissolved gas concentrations. The training set is fed into the encoder module of the model for learning, during which model parameters are optimized to achieve the best predictive configuration.
Step 4: The trained model is validated using the validation dataset. Furthermore, the predictive performance is compared against various baseline models to rigorously assess the effectiveness of the proposed method in forecasting OLTC dissolved gas concentrations.
In this study, a comprehensive comparative evaluation is conducted to validate the effectiveness of the proposed OLTC dissolved gas concentration forecasting method. The evaluation includes deep learning models such as Transformer, LogTrans, and Informer, as well as machine learning models including LightGBM and CatBoost. To assess the prediction performance of dissolved gas concentrations in OLTC oil, four widely adopted evaluation metrics are employed: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) [46]. The definitions of these evaluation metrics are presented as follows, where $y_i$ denotes the actual measured concentration of a dissolved gas at the i-th time step, and $\hat{y}_i$ represents the corresponding predicted value generated by the forecasting model.
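The four metrics named above have standard textbook definitions, which the following sketch implements. It assumes equal-length sequences and nonzero actual values (MAPE is undefined otherwise) and reports MAPE in percent; the function name `evaluate` is hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides each error by the actual value, so actual values must be nonzero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


scores = evaluate([100.0, 200.0], [110.0, 190.0])
```

For gases whose concentration stays near zero, the division in MAPE makes the metric unstable, so MAE or RMSE is usually the more informative figure in that regime.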
3. Experimental Results and Analysis
This study initially conducts an in-depth analysis of dissolved gas monitoring data collected from the OLTC oil of a 220 kV power substation in China, spanning from 1 March 2022 to 1 January 2023, comprising a total of 200 gas volume fraction samples. To overcome data limitations, the OLTC oil sampling system was enhanced to achieve stable, near-daily data collection over the course of one year. It should be noted that such high-frequency monitoring remains uncommon in industry due to additional equipment and operational costs, as well as structural constraints in older OLTCs that limit sampling frequency. To address residual data scarcity and sample imbalance, the original dataset is preprocessed using the LGOD anomaly detection algorithm to identify and remove outliers, followed by imputation of missing values to enhance data quality and reliability. Subsequently, the ETSformer model is employed to forecast dissolved gas concentrations in OLTC oil, and its predictive accuracy is rigorously compared against various benchmark models to validate the proposed method’s effectiveness and superiority in practical applications. Furthermore, the study extends to predictive analysis based on an additional set of 50 chromatographic monitoring samples that reflect the entire insulation degradation process of the OLTC. This dataset captures the evolution of dissolved gas concentrations as the OLTC transitions from normal operation to insulation deterioration. By integrating the proposed deep temporal forecasting approach, the analysis effectively uncovers abnormal patterns and dynamic evolution trends inherent in the data.
3.1. Normal Data Preprocessing and Forecasting
The original dissolved gas monitoring data from the OLTC oil of a 220 kV power substation, as shown in Figure 5, include six key gas species: C2H6, CO, H2, C2H4, CO2, and CH4. Figure 5 provides a direct visualization of the raw OLTC monitoring dataset, illustrating the temporal distribution and baseline characteristics of each gas concentration prior to any preprocessing. It should be noted that C2H2, although a critical diagnostic indicator under fault conditions, is not included in the predictive analysis of this dataset. The overall acetylene concentration measured during normal OLTC operation was consistently close to zero, with only a few scattered points exhibiting abnormal values or abrupt increases. Compared with the more dynamic variations observed in other gases, these limited fluctuations were insufficient to provide meaningful predictive patterns under normal operating conditions. Therefore, C2H2 was excluded from the forecasting experiments at this stage; however, its diagnostic relevance is not overlooked, and it is further analyzed in the subsequent fault evolution study, where acetylene becomes a key indicator of insulation degradation and arc discharge.
To optimize the model’s training effectiveness and generalization capability, the complete dataset of 200 samples was first partitioned into training and testing subsets at a ratio of 3:1, prior to any preprocessing operations. The proposed preprocessing methodology was then applied exclusively to the training subset: the LGOD anomaly detection algorithm was used to identify and label outliers, followed by the LWR method for data imputation and correction, thereby enhancing data quality and integrity while avoiding information leakage. The reconstructed training set thus retained consistency with expected physical behavior, whereas the testing subset was left unaltered to provide a fair and unbiased evaluation of model generalization. Taking the H2 concentration data from Figure 5 as an illustrative example, Figure 6 shows this procedure in detail: panel (a) presents the raw time series with detected anomalous points highlighted, while panel (b) displays the reconstructed series after anomaly correction, demonstrating improved stability and continuity aligned with physical expectations.
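The split-before-preprocessing order described above can be sketched in a few lines. The text gives the 3:1 ratio but does not state how samples were assigned to each subset; the sketch assumes a chronological cut, which is a common choice for time series, and `chronological_split` is a hypothetical helper.

```python
def chronological_split(samples, train_parts=3, test_parts=1):
    """Split an ordered sequence into leading train and trailing test subsets."""
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


# 200 ordered samples at a 3:1 ratio give 150 training and 50 test samples.
samples = list(range(200))
train, test = chronological_split(samples)
# Anomaly detection and reconstruction would then be applied to `train` only,
# leaving `test` untouched so that no information leaks into the evaluation.
```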
This study develops a dissolved gas concentration forecasting model for OLTC oil based on the ETSformer architecture, achieving enhanced temporal prediction performance through systematic parameter optimization and structural innovations. The model employs a hierarchical encoder–decoder design, where the encoder consists of two stacked hierarchical layers, each producing a 1024-dimensional latent representation augmented by a 2048-dimensional feedforward neural network for nonlinear feature enhancement. The decoder features a dual-cascade structure that hierarchically extracts multiscale temporal features of gas concentration evolution, combined with an 8-head attention mechanism to effectively capture global dependencies across time steps.
To accommodate the temporal characteristics of dissolved gas concentration, the model input is formulated as a historical look-back window of length L, with the corresponding forecasting horizon denoted as H. Through systematic evaluation and validation by Granger causality tests, the optimal configuration was determined as L = 11 days and H = 7 days. This setting ensures that the historical input window captures sufficient physical diffusion dynamics of gases within the oil, while the prediction horizon provides a practically meaningful forecasting interval for early-warning applications [47].
The optimization strategy adopts a two-stage hyperparameter search framework [48]: in the first stage, grid search identifies the optimal learning rate, with η = 3 × 10^−4 yielding the best convergence on the validation loss; in the second stage, Bayesian optimization determines the frequency control parameter K within {0, 1, 2, 3}, achieving an optimal trade-off between model complexity and generalization at K = 2 [49]. Training is conducted using a progressive learning schedule over 50 iterations, incorporating a dynamic weight decay strategy with linear decay proportional to iteration count. The ETSformer model is constructed and trained on the training dataset, with predictive accuracy rigorously validated against the test set, demonstrating the model’s effectiveness in capturing complex temporal dynamics in dissolved gas concentration forecasting.
The final predictions of gas concentrations based on the trained ETSformer model are illustrated in Figure 7. As shown, the ETSformer’s forecasted curves exhibit a high degree of concordance with the actual observed values, demonstrating superior fitting capability. Notably, the model outperforms comparative baselines, particularly in capturing the fluctuation patterns across multiple gas components, thereby delivering more precise and reliable prediction results.
To comprehensively demonstrate the superior performance of the ETSformer model in dissolved gas concentration forecasting, this study conducted detailed predictive analyses and comparative evaluations for six dissolved gases in OLTC oil: CO, C2H4, CO2, C2H6, CH4, and H2. The specific results are summarized in Table 1. The ETSformer model achieved mean absolute percentage errors (MAPE) of 0.81%, 2.57%, 0.73%, 1.57%, 4.57%, and 2.32%, respectively. In comparison, the machine learning baselines LightGBM and CatBoost yielded MAPE values of 7.22%, 7.32%, 6.83%, 4.41%, 14.41%, 9.15% and 6.85%, 6.84%, 5.78%, 4.13%, 12.57%, 8.89%, respectively. The deep learning baselines LogTrans and Informer obtained MAPE values of 4.02%, 4.05%, 4.43%, 3.77%, 9.56%, 6.43% and 4.28%, 4.22%, 3.29%, 2.49%, 7.25%, 5.87%, respectively.
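To make the comparison easier to read at a glance, the per-gas MAPE values quoted above can be averaged per model. The snippet below uses only the figures stated in the text, in the gas order CO, C2H4, CO2, C2H6, CH4, H2; the unweighted mean across gases is an illustrative summary chosen here, not a metric reported by the authors.

```python
# Per-gas MAPE (%) as quoted in the text, ordered CO, C2H4, CO2, C2H6, CH4, H2.
mape = {
    "ETSformer": [0.81, 2.57, 0.73, 1.57, 4.57, 2.32],
    "LightGBM": [7.22, 7.32, 6.83, 4.41, 14.41, 9.15],
    "CatBoost": [6.85, 6.84, 5.78, 4.13, 12.57, 8.89],
    "LogTrans": [4.02, 4.05, 4.43, 3.77, 9.56, 6.43],
    "Informer": [4.28, 4.22, 3.29, 2.49, 7.25, 5.87],
}

# Unweighted mean across the six gases, then models ordered from best to worst.
mean_mape = {model: sum(values) / len(values) for model, values in mape.items()}
ranking = sorted(mean_mape, key=mean_mape.get)
```

Under this summary ETSformer averages about 2.1%, the two Transformer baselines fall between roughly 4.6% and 5.4%, and the gradient-boosting baselines between roughly 7.5% and 8.2%.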
These results clearly indicate that Transformer-based deep learning models outperform traditional machine learning approaches in time series forecasting tasks. Their core advantage lies in the exceptional capability for temporal feature extraction and modeling of global dependencies, effectively preserving the temporal continuity of data and capturing latent patterns. Addressing the challenge posed by discontinuous data acquisition, the ETSformer model integrates an exponential smoothing mechanism that enhances its representation of long-term trend characteristics, enabling robust prediction performance despite extended time intervals caused by intermittent data sampling.
3.2. Ablation Study Comparative Results
To rigorously assess the contribution of each component in the proposed framework, an ablation study was conducted using the training and testing sets. The comparative results for four evaluation metrics, namely MSE, RMSE, MAE, and MAPE, are presented in Figure 8, where Figure 8a–d correspond to MSE, RMSE, MAE, and MAPE, respectively. Specifically, the full model integrating LGOD anomaly detection, LWR data repair, and ETSformer forecasting was compared against three simplified variants: (i) removing LGOD while retaining LWR and ETSformer, (ii) removing LWR while retaining LGOD and ETSformer, and (iii) employing ETSformer alone without preprocessing.
The results clearly demonstrate that the proposed full model consistently achieves the lowest errors across all four metrics, thereby confirming the complementary role of anomaly detection and data repair in enhancing deep temporal modeling. The exclusion of LGOD significantly increases prediction errors, particularly for gases with high volatility such as H2 and C2H4. This highlights the critical role of anomaly detection in filtering spurious fluctuations and preventing distortion of temporal patterns caused by irregular outliers. Similarly, removing LWR leads to noticeable degradations in prediction performance for gases with relatively smoother but incomplete concentration trajectories, such as CO and CH4. This confirms that robust data repair and imputation are indispensable for improving continuity in sparse datasets, which directly translates into higher stability and reduced error in predictive modeling.
Furthermore, the ETSformer-only variant exhibits the weakest performance across all gases and metrics, indicating that although ETSformer provides strong temporal modeling capacity, its predictive accuracy is fundamentally limited without high-quality, preprocessed inputs. Importantly, the ablation results also reveal gas-specific sensitivities: gases such as CO2 show moderate error increases when either LGOD or LWR is removed, reflecting their smoother evolution, while H2 and C2H4 predictions suffer disproportionately in the absence of LGOD due to their susceptibility to short-lived spikes. Conversely, CH4 and CO exhibit larger error increases when LWR is omitted, underscoring their reliance on robust imputation for accurate representation of gradual but incomplete dynamics.
Overall, the ablation study provides compelling evidence that the superior performance of the proposed framework is not attributable to ETSformer alone but rather to the synergistic integration of LGOD, LWR, and ETSformer. LGOD primarily safeguards the model from anomalous disturbances, LWR ensures continuity and reliability in sparse monitoring datasets, and ETSformer captures long-term dependencies and temporal patterns. Together, these components form a cohesive pipeline that substantially improves predictive accuracy and robustness for all characteristic gases.
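The four configurations compared in the ablation study can be expressed as simple on/off switches over the two preprocessing stages. The sketch below is a hypothetical harness: the variant keys, `stage_names`, `build_pipeline`, and the stub stages are assumptions introduced for illustration and stand in for the actual LGOD, LWR, and ETSformer implementations.

```python
from typing import Callable, Dict, List

Series = List[float]
Stage = Callable[[Series], Series]

# Preprocessing stages kept by each variant; the forecaster is always present.
VARIANTS: Dict[str, Dict[str, bool]] = {
    "full": {"lgod": True, "lwr": True},
    "no_lgod": {"lgod": False, "lwr": True},
    "no_lwr": {"lgod": True, "lwr": False},
    "etsformer_only": {"lgod": False, "lwr": False},
}

def stage_names(flags: Dict[str, bool]) -> List[str]:
    """List the stages of a variant in execution order."""
    names: List[str] = []
    if flags["lgod"]:
        names.append("LGOD")
    if flags["lwr"]:
        names.append("LWR")
    names.append("ETSformer")
    return names

def build_pipeline(flags: Dict[str, bool], stages: Dict[str, Stage]) -> Stage:
    """Compose the enabled stages into a single callable, applied left to right."""
    order = stage_names(flags)

    def run(series: Series) -> Series:
        out = list(series)
        for name in order:
            out = stages[name](out)
        return out

    return run

# Hypothetical stub stages that only record the order in which they are called.
trace: List[str] = []

def make_stub(name: str) -> Stage:
    def stub(series: Series) -> Series:
        trace.append(name)
        return series
    return stub

stubs = {name: make_stub(name) for name in ("LGOD", "LWR", "ETSformer")}
build_pipeline(VARIANTS["no_lwr"], stubs)([1.0, 2.0])
print(trace)
```

Keeping the variants in a single table makes it straightforward to run every configuration on the same train/test split, so that differences in error are attributable to the removed stage alone.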
3.3. Processing and Predictive Analysis of Fault Data
In this study, a separate set of dissolved gas analysis data reflecting the entire insulation degradation process of a single OLTC was utilized. Three representative fault evolution stages, labeled as Events A, B, and C, were identified during the earlier data collection period from this same device, based on abnormal results detected using the three-ratio method [50]. These events were selected to facilitate a phase-wise analysis of characteristic gases, as summarized in Table 2.
Event A corresponds to the initial anomaly stage, primarily associated with early thermal aging and poor contact. This stage is characterized by a concurrent increase in concentrations of C2H4, CO, H2, CO2, and CH4. Specifically, the CO2 concentration reached 1021.24 μL/L, significantly exceeding the threshold of 900 μL/L, while the H2 concentration rose markedly to 7.6 μL/L compared to baseline levels. In addition, the CH4 concentration reached 6.2 μL/L, indicating a relatively elevated level.
Event B indicates a medium-temperature overheating process, suggesting aggravated deterioration of contact conditions. This is reflected by a general increase in characteristic gas concentrations. In particular, the C2H6 concentration increased by 112.5% compared to Event A, and the CO concentration rose to 327.8 μL/L, representing an approximate 10% increase. The concentration of C2H4 also increased to 0.62 μL/L. While CO2 concentration slightly decreased to 981.1 μL/L, the CH4 concentration remained relatively stable at 5.1 μL/L. According to the three-ratio method [50], the CH4/H2 and C2H6/CH4 ratios were approximately 0.61 and 0.10, respectively, consistent with the gas signature of medium-temperature overheating faults.
Event C presents a typical scenario of severe overheating accompanied by pronounced insulation degradation. This stage is marked by a substantial increase in CO concentration, reaching 388.55 μL/L—well beyond the normal threshold. Concurrently, the concentrations of H2 and C2H4 continued to rise, reaching 8.85 μL/L and 0.89 μL/L, respectively, indicating an escalating fault condition. Notably, the acetylene concentration exhibited a distinct increase, further corroborating the occurrence of advanced insulation deterioration.
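The stage-to-stage changes quoted for Events A to C can be checked with a simple relative-change calculation. The sketch below uses only concentrations stated in the text; the helper name `percent_change` and the dictionary keys are illustrative.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

# Concentrations in uL/L, taken from the event descriptions above.
changes = {
    "CO2 A->B": percent_change(1021.24, 981.1),
    "CO B->C": percent_change(327.8, 388.55),
    "H2 A->C": percent_change(7.6, 8.85),
    "C2H4 B->C": percent_change(0.62, 0.89),
}

for label, value in changes.items():
    print(f"{label}: {value:+.2f}%")
```

On these figures, CO2 declines by roughly 4% between Events A and B, whereas CO, H2, and C2H4 all increase toward Event C.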
For the three types of warning events, this study employs the proposed ETSformer model to perform predictive modeling of key dissolved gas concentrations. The predictive performance is evaluated against several representative methods, including LightGBM, CatBoost, LogTrans, and Informer, as summarized in Table 3. The results demonstrate that ETSformer consistently achieves the lowest Mean Absolute Error (MAE) values across all events and gas types. Specifically, for Event A, ETSformer attains MAE values of 0.85, 0.12, and 0.21 μL/L for CO, C2H4, and H2, respectively, substantially lower than those of LightGBM, which reach 4.92, 0.56, and 1.05 μL/L. Similarly, in Event C, ETSformer maintains prediction errors within 1.32 μL/L for CO and below 0.3 μL/L for H2, demonstrating its robustness in accurately capturing gas concentration variations under more severe fault conditions. Furthermore, for Event B, ETSformer achieves an MAE of only 0.03 μL/L for C2H2, markedly outperforming all benchmark models and highlighting its high sensitivity in detecting minor yet critical changes in gas levels. Overall, these results indicate that ETSformer exhibits superior predictive capability in modeling preprocessed dissolved gas concentration data, effectively capturing subtle fluctuations and dynamic trends that may signify incipient transformer faults.
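As a worked illustration of the Event A comparison, the relative MAE reduction of ETSformer with respect to LightGBM follows directly from the values quoted above. The symbol $\Delta$ is introduced here for convenience and does not appear in Table 3.

```latex
\[
\Delta_{\mathrm{CO}} = \frac{4.92 - 0.85}{4.92} \approx 82.7\%, \qquad
\Delta_{\mathrm{C_2H_4}} = \frac{0.56 - 0.12}{0.56} \approx 78.6\%, \qquad
\Delta_{\mathrm{H_2}} = \frac{1.05 - 0.21}{1.05} = 80.0\%
\]
```

In other words, for these three gases the ETSformer error in Event A is roughly one fifth of the LightGBM error.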
4. Conclusions
This paper proposes an integrated prediction framework for dissolved gas concentrations in on-load tap changer (OLTC) oil, combining LGOD-based outlier detection, locally weighted regression (LWR) for data imputation, and ETSformer for time-series forecasting. First, the LGOD method is employed to accurately identify abnormal data points by considering local data heterogeneity, thereby enhancing both the sensitivity and reliability of outlier detection. Subsequently, LWR is utilized to smoothly repair anomalous values, preserving the underlying data trends while mitigating the influence of local deviations on model accuracy. Finally, the ETSformer model is used to perform deep temporal modeling on the corrected time series. Experimental results demonstrate that the proposed framework significantly outperforms traditional statistical and machine learning approaches in terms of both prediction accuracy and computational efficiency. This superiority is particularly evident under intermittent data acquisition conditions. In addition, ablation experiments confirm that the performance gains are not solely attributable to ETSformer itself, but rather arise from the complementary integration of LGOD, LWR, and ETSformer, each of which contributes distinct advantages to data quality, continuity, and temporal modeling. The framework exhibits robust performance across both normal and fault scenarios, accurately capturing gas concentration trends and subtle fluctuations.
In future work, the integration of deep learning-based multimodal data fusion approaches—leveraging real-time measurements such as vibration, temperature, and current—could further enhance the prediction accuracy and robustness under complex operating conditions. At the same time, we recognize the limitation that all experiments were conducted on a single dataset. Cross-dataset validation remains an essential step to fully establish the generalizability of the proposed framework. However, given the scarcity of high-frequency OLTC datasets, such work could not be included in the current study and is identified as a key direction for future research. These advancements would improve model adaptability and promote its application in smart grids and power equipment monitoring, ultimately strengthening fault warning capabilities and maintenance efficiency.