Article

Optimized NRBO-VMD-AM-BiLSTM Hybrid Architecture for Enhanced Dissolved Gas Concentration Prediction in Transformer Oil Soft Sensors

1 College of Energy and Power Engineering, Inner Mongolia University of Technology, Hohhot 010051, China
2 Inner Mongolia Power (Group), Co., Ltd., Hohhot 010010, China
3 College of Electric Power, Inner Mongolia University of Technology, Hohhot 010080, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(16), 5182; https://doi.org/10.3390/s25165182
Submission received: 6 July 2025 / Revised: 6 August 2025 / Accepted: 18 August 2025 / Published: 20 August 2025
(This article belongs to the Section Electronic Sensors)

Abstract

Soft sensors have emerged as indispensable tools for predicting dissolved gas concentrations in transformer oil, critical indicators for fault diagnosis that defy direct measurement. Addressing the persistent challenge of prediction inaccuracy in existing methods, this study introduces a novel hybrid architecture integrating time-series decomposition, deep learning prediction, and signal reconstruction. Our approach begins with variational mode decomposition (VMD) to disassemble original gas concentration sequences into stationary intrinsic mode functions (IMFs). Crucially, VMD's pivotal parameters (modal quantity and quadratic penalty term), which govern bandwidth allocation and mode orthogonality, are optimized via a Newton–Raphson-based optimization (NRBO) algorithm that minimizes envelope entropy to ensure sparsity preservation through information-theoretic energy concentration metrics. Subsequently, a bidirectional long short-term memory network with attention mechanism (AM-BiLSTM) independently forecasts each IMF. Final concentration trends are reconstructed through superposition and inverse normalization. The experimental results demonstrate the superior performance of the proposed model, achieving a root mean square error (RMSE) of 0.51 µL/L and a mean absolute percentage error (MAPE) of 1.27% in predicting hydrogen (H2) concentration. Rigorous testing across multiple dissolved gases confirms exceptional robustness, establishing this NRBO-VMD-AM-BiLSTM framework as a transformative solution for transformer fault diagnosis.

1. Introduction

Oil-immersed transformers are critical components in power systems, serving as the core hub for energy conversion and transmission [1]. The transformer’s health status critically determines the fault tolerance and contingency management capabilities of regional electricity supply networks. During long-term operation, transformers are susceptible to faults caused by extreme temperature fluctuations, insulation aging, and inadequate maintenance practices. Dissolved gas analysis (DGA) serves as a prevalent diagnostic technique for assessing the health status of oil-immersed transformers. By analyzing the composition, concentration, and production rates of dissolved gases under normal and fault conditions, potential internal faults can be identified. The deployment of online gas monitoring devices enables the periodic collection of gas concentration data, facilitating predictive modeling to anticipate transformer conditions [2,3]. Soft sensors, as a key technology in predictive modeling, have gained significant attention due to their ability to estimate difficult-to-measure variables indirectly. This proactive approach supports targeted maintenance strategies, reducing operational costs and enhancing reliability. Consequently, the accurate prediction of dissolved gas concentration trends in transformer oil using soft sensors is of paramount importance for transformer maintenance and plays a pivotal role in power system research and applications [4,5,6].
Online gas monitoring devices for oil-immersed transformers typically collect gas composition and concentration data, generating long-term time-series datasets [7]. A thorough analysis of these time series, combined with advanced forecasting methods, can predict future gas concentration trends. However, the complexity of transformer operation, influenced by factors such as temperature, environmental conditions, load variations, and core structure, leads to intricate insulation degradation mechanisms. These mechanisms result in nonlinear and non-stationary gas concentration time series, making accurate prediction a challenging task [8].
Soft sensors, which leverage data-driven models to predict key variables, offer a promising solution to this challenge by integrating historical data and real-time measurements. In recent years, artificial intelligence (AI)-based soft sensors have gained popularity and have been successfully applied to transformer condition assessment [9,10,11]. Nevertheless, conventional AI approaches face inherent limitations in handling dissolved gas time series: (1) unidirectional architectures (e.g., long short-term memory, LSTM) cannot capture bidirectional dependencies, where gas generation is influenced by both preceding operational states and subsequent physicochemical processes; (2) without adaptive feature weighting, they struggle with multivariate coupling under dynamic operating conditions [12,13,14]. As a result, researchers have increasingly adopted hybrid models that combine multiple methods to enhance prediction accuracy. Meta-heuristic algorithms, such as the genetic algorithm (GA), slime mould algorithm (SMA), sparrow search algorithm (SSA), and particle swarm optimization (PSO), are frequently employed to optimize relevant model parameters. By optimizing models, these algorithms can enhance the predictive performance for important time-series data. For example, a bidirectional gated recurrent unit (BiGRU) model optimized by SSA has shown significant improvements over single-model approaches. Additionally, ensemble empirical mode decomposition (EEMD) has been employed to improve the quality of modal decomposition, with highly correlated subsequences serving as inputs to bidirectional LSTM (BiLSTM) models. These hybrid approaches have demonstrated superior predictive accuracy and provided new insights into gas concentration forecasting [15,16,17,18]. However, the challenge remains in effectively integrating multiple models and leveraging their unique characteristics to further enhance prediction performance, particularly in soft sensor development.
In addition to the aforementioned methods, research scholars have also employed mode decomposition techniques from the field of signal processing. These techniques decompose the nonlinear and non-stationary time-series data of dissolved gases in oil into multiple stationary subsequences. Subsequently, regression models or neural networks are used to independently forecast each subsequence. The final prediction result is then obtained by reconstructing these forecasted components. Decomposition techniques like empirical mode decomposition (EMD) and its ensemble variant (EEMD), while valuable for non-stationary signals, suffer from mode mixing, where noise or transient events cause spectral overlap between intrinsic mode functions (IMFs) [19,20,21]. This ambiguity in component separation obscures fault-indicative frequencies and propagates errors to downstream prediction modules.
Variational mode decomposition (VMD), in contrast, utilizes a variational optimization framework to minimize the total variation in the time series and the mutual information between its modal functions. Compared to EMD and EEMD, VMD offers superior localization performance and noise suppression capabilities, making it particularly suitable for decomposing complex time-series signals. However, VMD’s effectiveness is highly dependent on the selection of its parameters [22,23]. To optimize VMD’s parameters, this study employs a Newton–Raphson-based optimizer (NRBO), which integrates the Newton–Raphson Search Rule (NRSR) with a Trap Avoidance Operator (TAO) [24,25]. The NRBO enhances search efficiency, accelerates convergence, and mitigates the risk of local optima entrapment.
To overcome the limitations of conventional approaches, this study presents a novel soft sensor model for dissolved gas concentration in transformer oil. This hybrid model integrates BiLSTM with an attention mechanism (AM-BiLSTM) and NRBO-optimized VMD. The process begins by decomposing the original dissolved gas concentration time series into more stationary subsequences using the optimized VMD. These subsequences are then input into the BiLSTM network. The attention mechanism dynamically calculates weights for different input variables and time steps, further enhancing prediction accuracy. By leveraging adaptive feature fusion mechanisms, the developed architecture capitalizes on soft sensor functionalities and demonstrates significantly enhanced predictive capability for dissolved gas concentrations compared to existing methods.

2. Methodology

2.1. NRBO-Optimized VMD

2.1.1. Variational Mode Decomposition

An adaptive optimization scheme is implemented by VMD to ascertain the mode-specific center frequencies and bandwidth parameters through constrained variational calculus. This method is particularly suitable for handling complex and nonlinear time series.
The constrained variational structure of VMD is formulated as follows:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_n \left[ \left( \delta(n) + \frac{j}{\pi n} \right) * u_k(n) \right] e^{-j\omega_k n} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(n) = T(n) \tag{1}$$
where $T(n)$ represents the time series of dissolved gas concentration, $u_k(n)$ denotes the $k$-th decomposed modal component, $K$ represents the number of modes, $\delta(n)$ represents the Dirac delta function, $\omega_k$ represents the central frequency of the $k$-th mode, $\partial_n$ is the partial derivative with respect to $n$, and the symbol $*$ denotes the convolution operation.
To optimize the constrained variational model, a combination of a quadratic penalty term α , the Lagrange operator λ ( n ) , and the augmented Lagrangian function is employed. The augmented Lagrangian function is expressed as:
$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_n \left[ \left( \delta(n) + \frac{j}{\pi n} \right) * u_k(n) \right] e^{-j\omega_k n} \right\|_2^2 + \left\| T(n) - \sum_{k=1}^{K} u_k(n) \right\|_2^2 + \left\langle \lambda(n),\; T(n) - \sum_{k=1}^{K} u_k(n) \right\rangle \tag{2}$$
The optimization process proceeds iteratively as follows:
(1) Initialization:
Set initial values for $\hat{u}_k^{1}$, $\omega_k^{1}$, $\hat{\lambda}^{1}$, and $m = 0$. Perform cyclic iterations to update $m = m + 1$, $u_k$, $\omega_k$, and $\lambda$ using Equations (3)–(5).
(2) Frequency range definition:
Define the range of non-negative frequencies and update each mode $u_k$ accordingly:
$$\hat{u}_k^{m+1}(\omega) = \frac{\hat{T}(\omega) - \sum_{s \neq k} \hat{u}_s^{m}(\omega) + \hat{\lambda}^{m}(\omega)/2}{1 + 2\alpha \left( \omega - \omega_k^{m} \right)^2} \tag{3}$$
where $\hat{u}_k^{m+1}(\omega)$, $\hat{T}(\omega)$, and $\hat{\lambda}^{m}(\omega)$ correspond to the Fourier transforms of $u_k^{m+1}(n)$, $T(n)$, and $\lambda(n)$, respectively.
(3) Update central frequencies:
$$\omega_k^{m+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{m+1}(\omega) \right|^2 \, \mathrm{d}\omega}{\int_0^{\infty} \left| \hat{u}_k^{m+1}(\omega) \right|^2 \, \mathrm{d}\omega} \tag{4}$$
(4) Iteration within non-negative frequency range:
The Lagrange multiplier $\lambda$ is then updated over the non-negative frequency range, with $\tau$ denoting the update step:
$$\hat{\lambda}^{m+1}(\omega) = \hat{\lambda}^{m}(\omega) + \tau \left( \hat{T}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{m+1}(\omega) \right) \tag{5}$$
(5) Convergence check:
Set the judgment precision ε to satisfy:
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{m+1} - \hat{u}_k^{m} \right\|_2^2}{\left\| \hat{u}_k^{m} \right\|_2^2} < \varepsilon \tag{6}$$
If the conditions of Equation (6) are met, the optimization process is complete. Otherwise, return to step (2) for recalculation.
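As an illustration of how the update rules in Equations (3)–(5) and the stopping test in Equation (6) fit together, the following NumPy sketch implements one simplified variant of the VMD iteration. It is a minimal sketch rather than the authors' implementation: boundary mirroring, initialization strategy, and the handling of negative frequencies are deliberately simplified, and the parameter defaults merely echo the values discussed in this section.

```python
import numpy as np

def vmd_sketch(signal, K=7, alpha=285, tau=0.1, eps=1e-6, max_iter=500):
    """Minimal VMD iteration following Equations (3)-(6); illustrative only."""
    N = len(signal)
    freqs = np.fft.fftfreq(N)                       # normalized frequency axis
    T_hat = np.fft.fft(signal)                      # spectrum of the input series
    u_hat = np.zeros((K, N), dtype=complex)         # mode spectra
    omega = np.linspace(0, 0.5, K, endpoint=False)  # initial center frequencies
    lam_hat = np.zeros(N, dtype=complex)            # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = T_hat - u_hat.sum(axis=0) + u_hat[k]
            # Eq. (3): Wiener-filter-like update of mode k in the frequency domain
            u_hat[k] = (residual + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (4): center frequency as the spectral centroid over non-negative freqs
            pos = freqs >= 0
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-12)
        # Eq. (5): dual ascent on the reconstruction constraint
        lam_hat = lam_hat + tau * (T_hat - u_hat.sum(axis=0))
        # Eq. (6): relative-change convergence check
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-12
        if np.sum(num / den) < eps:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))     # back to the time domain (IMFs)
    return modes, omega
```

In practice a maintained implementation (for example, the vmdpy package) would be preferable; the sketch only serves to make the roles of $K$, $\alpha$, and $\tau$ concrete.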
VMD effectiveness is influenced by multiple factors, with the decomposition component number K and the penalty factor α being critical parameters. Different values of K can lead to significant variations in decomposition results, while α governs the bandwidth of each component. Taking experimental data on dissolved gases in transformer oil as an example, this study investigates the specific impact of K and α on VMD performance, analyzing changes in center frequencies, as shown in Figure 1 and Figure 2.
1. Impact of Component Number K (fixed α = 100):
Figure 1 demonstrates that the choice of K significantly affects decomposition quality. When K = 2, the frequency iteration count for the decomposed components is clearly insufficient, exhibiting characteristics of under-decomposition. Conversely, when K = 9, pronounced mode mixing emerges during the initial iteration stages, indicating over-decomposition has occurred.
2. Impact of Penalty Factor α (fixed K = 7):
Further analysis, keeping K = 7 constant, reveals the influence of the penalty factor α on the decomposition process. The results show that smaller α values (e.g., α = 50) cause severe mode mixing, resulting in insufficient differentiation between components. As α increases, the frequency differences between components become progressively more distinct. However, excessively large α values (e.g., α = 500) degrade signal reconstruction accuracy.

2.1.2. Newton–Raphson-Based Optimizer

The NRBO is a state-of-the-art metaheuristic algorithm recognized for its robust search capabilities and effective local optima avoidance [26]. This section outlines the NRBO’s implementation steps, focusing on its application to VMD parameter optimization.
(1) Population initialization:
The optimization process begins with the initialization of the population. Let N P denote the number of individuals in the population, and assume the optimization problem involves dim dimensions. The initial positions of the population are randomly generated using the following formula:
$$x_j^{n} = lb + \mathrm{rand} \times (ub - lb)$$
where $x_j^{n}$ represents the position of the $n$-th individual in the $j$-th dimension, $j \in \{1, 2, \ldots, \dim\}$, $n \in \{1, 2, \ldots, N_P\}$. $lb$ and $ub$ are the lower and upper bounds of the parameter to be optimized, respectively, and $\mathrm{rand}$ is a random number uniformly distributed between 0 and 1.
(2) Fitness value calculation:
Subsequent to population initialization, the fitness of each individual is evaluated based on a predefined fitness function. The best and worst fitness values, along with their corresponding positions, are identified and denoted as x b and x w , respectively.
(3) Application of the NRSR rule:
In the t t h iteration, the n t h individual seeks a new position using the NRSR rule, formulated as follows:
$$x_n^{t+1} = r_1 \left[ r_1 X1_n^{t} + (1 - r_2) X2_n^{t} \right] + (1 - r_2) X3_n^{t}$$
where $x_n^{t+1}$ is the new position determined by the NRSR rule, and $r_1$ and $r_2$ are random numbers in (0, 1). The three intermediate positions $X1_n^{t}$, $X2_n^{t}$, and $X3_n^{t}$ are computed as follows:
$$X1_n^{t} = x_n^{t} - Nr + \rho$$
$$X2_n^{t} = x_b - Nr + \rho$$
$$X3_n^{t} = x_n^{t} - \delta \left( X2_n^{t} - X1_n^{t} \right)$$
where $Nr$ is the value calculated using the NRSR, $\rho$ represents the step-size factor, and $\delta$ represents the adaptive coefficient:
$$Nr = \mathrm{randn} \times \frac{ \left( y_w - y_b \right) \Delta x }{ 2 \left( y_w + y_b - 2 x_n^{t} \right) }$$
$$\rho = \mathrm{rand} \times \left( x_b - x_n^{t} \right) + \mathrm{rand} \times \left( x_{a_1}^{t} - x_{a_2}^{t} \right)$$
$$\delta = \left( 1 - \frac{2t}{T} \right)^5$$
$$\Delta x = \mathrm{rand} \times \left| x_b - x_n^{t} \right|$$
where $y_w$ and $y_b$ are two positions obtained from the NRSR search results for $x_n^{t}$, which enhance the search capability of the NRBO, and $\Delta x$ is the exploration range. $\mathrm{randn}$ is a random number following the standard normal distribution, $a_1$ and $a_2$ are two distinct random integers in $[1, N_P]$, and $T$ represents the maximum number of iterations.
(4) Avoiding Local Optima:
The Trap Avoidance Operator (TAO) combines the best position $x_b$ with the candidate position $x_n^{t+1}$ to generate a superior solution $x_{TAO}^{t}$. If a randomly generated number is less than the decision factor $DF$ (typically set to 0.6), the position is updated as follows:
$$x_n^{t+1} = \begin{cases} x_{TAO}^{t+1}, & \text{if } \mathrm{rand} < DF \\ x_n^{t+1}, & \text{otherwise} \end{cases}$$
$$x_{TAO}^{t} = \begin{cases} x_n^{t+1} + \theta_1 \left( \mu_1 x_b - \mu_2 x_n^{t} \right) + \theta_2 \delta \left( \mu_1 \, \mathrm{Mean}(x^{t}) - \mu_2 x_n^{t} \right), & \text{if } \mu_1 < 0.5 \\ x_b + \theta_1 \left( \mu_1 x_b - \mu_2 x_n^{t} \right) + \theta_2 \delta \left( \mu_1 \, \mathrm{Mean}(x^{t}) - \mu_2 x_n^{t} \right), & \text{otherwise} \end{cases}$$
$$x_{TAO}^{t+1} = x_{TAO}^{t}$$
where $\theta_1$ and $\theta_2$ are random numbers within the intervals (−1, 1) and (−0.5, 0.5), respectively, $DF$ is the decision factor that governs whether the TAO update is applied, $\mathrm{Mean}(x^{t})$ denotes the mean position of the current population, and $\mu_1$ and $\mu_2$ are random coefficients defined as follows:
$$\mu_1 = 3\beta \times \mathrm{rand} + (1 - \beta)$$
$$\mu_2 = \beta \times \mathrm{rand} + (1 - \beta)$$
where $\beta$ is a binary number taking the value 0 or 1: if a random number $\Delta \geq 0.5$, then $\beta$ is set to 0; otherwise, it is set to 1. Owing to the randomness of $\mu_1$ and $\mu_2$, the population becomes more diverse, allowing it to escape local optima.
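To make the interplay of the NRSR and TAO steps concrete, the sketch below condenses the procedure above into a single Python function. It is a simplified, illustrative reading of the equations in this section rather than the reference NRBO implementation: the roles of $y_w$ and $y_b$ are approximated by the worst and best population members, a greedy replacement step is assumed, and candidate positions are clipped to the search bounds.

```python
import numpy as np

def nrbo(fitness, lb, ub, n_pop=30, dim=2, max_iter=20, df=0.6):
    """Simplified Newton-Raphson-based optimizer (NRSR + TAO); illustrative only."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    pop = lb + np.random.rand(n_pop, dim) * (ub - lb)        # population initialization
    fit = np.array([fitness(x) for x in pop])

    for t in range(max_iter):
        best, worst = pop[fit.argmin()].copy(), pop[fit.argmax()].copy()
        delta = (1 - 2 * (t + 1) / max_iter) ** 5             # adaptive coefficient
        for n in range(n_pop):
            r1, r2 = np.random.rand(), np.random.rand()
            a1, a2 = np.random.choice(n_pop, 2, replace=False)
            dx = np.random.rand(dim) * np.abs(best - pop[n])  # exploration range
            # NRSR step (worst/best members stand in for y_w and y_b here)
            nr = np.random.randn(dim) * (worst - best) * dx / (2 * (worst + best - 2 * pop[n]) + 1e-12)
            rho = np.random.rand() * (best - pop[n]) + np.random.rand() * (pop[a1] - pop[a2])
            x1 = pop[n] - nr + rho
            x2 = best - nr + rho
            x3 = pop[n] - delta * (x2 - x1)
            cand = r1 * (r1 * x1 + (1 - r2) * x2) + (1 - r2) * x3
            # Trap Avoidance Operator
            if np.random.rand() < df:
                beta = 0.0 if np.random.rand() >= 0.5 else 1.0
                mu1 = 3 * beta * np.random.rand() + (1 - beta)
                mu2 = beta * np.random.rand() + (1 - beta)
                theta1 = np.random.uniform(-1, 1)
                theta2 = np.random.uniform(-0.5, 0.5)
                base = cand if mu1 < 0.5 else best
                cand = (base + theta1 * (mu1 * best - mu2 * pop[n])
                        + theta2 * delta * (mu1 * pop.mean(axis=0) - mu2 * pop[n]))
            cand = np.clip(cand, lb, ub)
            f_cand = fitness(cand)
            if f_cand < fit[n]:                               # greedy selection (simplification)
                pop[n], fit[n] = cand, f_cand
    return pop[fit.argmin()], fit.min()
```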

2.1.3. Parameter Optimization of VMD via NRBO

In the VMD process, the two critical parameters K and α are treated as individuals within the NRBO population. The optimal parameter combination is iteratively determined through NRSR and TAO enhancement, continuing until convergence criteria are satisfied. The NRBO significantly improves VMD performance by ensuring spectral orthogonality of decomposed modes, particularly when processing nonlinear multicomponent temporal signals.
A complete optimization flowchart for VMD parameter tuning using NRBO is illustrated in Figure 3.
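Following the flowchart in Figure 3, the fitness evaluated by the NRBO is the envelope entropy of the modes produced by a candidate $(K, \alpha)$ pair. The sketch below, which assumes the vmd_sketch and nrbo helpers defined earlier, shows one plausible way to compute that fitness with SciPy's Hilbert transform; whether mode entropies are aggregated by their mean or their minimum varies between implementations, and the mean is used here purely for illustration. The commented search bounds simply span the parameter values examined in Section 2.1.1.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_entropy(signal, K, alpha):
    """Mean envelope entropy of the modes produced by VMD(K, alpha); lower is better."""
    modes, _ = vmd_sketch(signal, K=int(round(K)), alpha=alpha)  # sketch defined earlier
    entropies = []
    for mode in modes:
        envelope = np.abs(hilbert(mode))           # Hilbert envelope of the mode
        p = envelope / (envelope.sum() + 1e-12)    # normalize to a probability-like vector
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    return float(np.mean(entropies))               # sparser, better-separated modes score lower

# Illustrative search over K in [2, 9] and alpha in [50, 500], with the population size
# and iteration budget quoted in Section 4.3:
# best_params, best_entropy = nrbo(lambda x: envelope_entropy(h2_series, x[0], x[1]),
#                                  lb=[2, 50], ub=[9, 500], n_pop=30, dim=2, max_iter=20)
```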

2.2. Bidirectional Long Short-Term Memory

The strong temporal dependencies in dissolved gas concentration time series challenge conventional unidirectional LSTM networks, which fail to fully capture the data's complex temporal dynamics. To overcome this constraint, a BiLSTM architecture employing bidirectional temporal processing through parallel forward and backward layers is implemented. This configuration preserves critical historical information from both temporal contexts, effectively modeling nonlinear interdependencies to enhance predictive accuracy.
The LSTM architecture, serving as the foundational component for bidirectional implementations, resolves vanishing gradient limitations inherent in conventional recurrent neural networks through specialized gating mechanisms that orchestrate information propagation [27,28,29].
At each time step t , the LSTM unit receives the current input X t , the previous short-term state h t 1 , and the previous cell state C t 1 . The input gate i t , forget gate f t , and output gate o t are responsible for controlling the flow of information into and out of the memory cell. The cell state C t is updated based on the input and forget gates, while the output gate determines the short-term state h t that is passed to the next time step. The resultant hidden state of the LSTM architecture is collectively governed by the output gate modulation and current cell state configuration, as depicted in Figure 4. The governing equations describing these computational processes are formulated as follows:
$$\tilde{C}_t = \tanh\left( W_c \left[ h_{t-1}, X_t \right] + b_c \right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh\left( C_t \right)$$
where $f_t$, $i_t$, and $o_t$ represent the status results of the forget gate, input gate, and output gate, respectively, $W_c$ denotes the weight matrix of the input unit state, $b_c$ refers to the bias term of the input unit state, $\tanh$ indicates the activation function, and $\odot$ denotes element-wise multiplication.
In the BiLSTM architecture, dual independent LSTM modules process input sequences in complementary temporal directions, as shown in Figure 5. The forward-propagating LSTM unit chronologically processes sequential data from initial to terminal points, preserving temporal dependencies preceding the current state. Conversely, the backward-propagating LSTM unit operates in reverse chronological order, integrating subsequent temporal context relative to the observation point. These bidirectional contextual representations are subsequently concatenated to generate the composite contextual representation of the BiLSTM layer:
$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}\left( X_t, \overrightarrow{h}_{t-1} \right), \quad \overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\left( X_t, \overleftarrow{h}_{t+1} \right), \quad y_t = \sigma\left( W_y \left[ \overrightarrow{h}_t, \overleftarrow{h}_t \right] + b_y \right)$$
where $\overrightarrow{h}_t$ represents the forward hidden layer state, $\overleftarrow{h}_t$ represents the backward hidden layer state, and $W_y$ and $b_y$ are the weight matrix and bias term, respectively.
The BiLSTM network is particularly well suited for the task of predicting dissolved gas concentrations in transformer oil due to its ability to model long-term dependencies and capture complex temporal patterns. By leveraging both historical and future information, the BiLSTM can more accurately predict future gas concentration trends, which is critical for effective transformer fault diagnosis and maintenance planning.
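A minimal PyTorch illustration of the bidirectional processing described above is given below; the input dimension of 7 (six operating variables plus one IMF subsequence) and the hidden size are assumptions for the example, not the tuned configuration.

```python
import torch
import torch.nn as nn

# Minimal bidirectional LSTM: the forward and backward passes run in parallel and
# their hidden states are concatenated at every time step, as in the equations above.
bilstm = nn.LSTM(input_size=7, hidden_size=32, batch_first=True, bidirectional=True)
x = torch.randn(16, 24, 7)       # (batch, time steps, features): illustrative shapes
out, (h_n, c_n) = bilstm(x)      # out: (16, 24, 64) = forward 32 + backward 32 per step
```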

2.3. Attention Mechanism

The attention mechanism is a data processing technique in machine learning that emulates human visual attention. It draws inspiration from the human ability to focus on key areas while ignoring others in a specific context, with the aim of filtering and utilizing information more effectively. In machine learning, the attention mechanism enhances model performance by assigning different weights to various feature vectors, thereby emphasizing important features and disregarding less critical information. Applying the attention mechanism in BiLSTM networks involves assigning a weight to each element of the input sequence, with this weight determining the element's influence on the final output [30]. The weights are calculated as follows:
$X_t$ is the input sequence to the BiLSTM network, and $h_t$ denotes the hidden states obtained after each input passes through the network. Encoding $h_t$ yields a set of query vectors $q$, which are then scored using a similarity measure defined as follows:
$$s\left( h_t, q \right) = V^{T} \tanh\left( W h_t + U q \right)$$
where $V$, $W$, and $U$ are parameter matrices learned through training. By employing the softmax function for normalization, the weights $a_t$ associated with the input vectors can be derived, as detailed subsequently:
$$a_t = \mathrm{softmax}\left( s\left( h_t, q \right) \right) = \frac{\exp\left( s\left( h_t, q \right) \right)}{\sum_{t'=1}^{n} \exp\left( s\left( h_{t'}, q \right) \right)}$$
Based on the weights and the corresponding value vectors, the weighted sum is calculated to update y t , which is the BiLSTM output after being processed by the attention mechanism.
The attention mechanism empowers the model to dynamically weight salient features within input sequences, thereby augmenting computational efficacy through adaptive feature prioritization, as demonstrated in Figure 6.
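The scoring and normalization steps above correspond to additive attention over the BiLSTM hidden states. The following PyTorch sketch is one possible realization; the module name and dimensions are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive attention over BiLSTM hidden states, mirroring the scoring and
    softmax steps described above; dimensions are illustrative."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects hidden states
        self.U = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the query vector
        self.v = nn.Linear(hidden_dim, 1, bias=False)           # reduces to a scalar score

    def forward(self, h, q):
        # h: (batch, time, hidden_dim), q: (batch, hidden_dim)
        scores = self.v(torch.tanh(self.W(h) + self.U(q).unsqueeze(1)))  # (batch, time, 1)
        weights = F.softmax(scores, dim=1)                               # attention weights a_t
        context = (weights * h).sum(dim=1)                               # weighted sum of states
        return context, weights.squeeze(-1)
```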

2.4. AM-BiLSTM Architecture

The AM-BiLSTM model consists of an input layer, a BiLSTM layer, an AM layer, and an output layer. The input data comprise dissolved gas concentrations preprocessed through VMD. The BiLSTM layer provides robust nonlinear mapping capabilities essential for modeling complex temporal patterns in the input data. Simultaneously, the incorporated attention mechanism enhances the model’s ability to identify critical time steps within the feature sequences. This integrated architecture significantly improves temporal correlation modeling across multivariate gas concentration features. By dynamically weighting influential features, the attention mechanism enables focused learning on diagnostically significant patterns, thereby enhancing prediction accuracy. The output layer generates the final predictive values. The complete structure of the proposed AM-BiLSTM framework is depicted in Figure 7.
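As a rough sketch of how the four layers can be wired together in PyTorch, the module below stacks a bidirectional LSTM, the AdditiveAttention sketch from the previous section, and a linear output head; the layer sizes and the two-step output horizon are assumptions for illustration, not the optimized hyperparameters reported later.

```python
import torch
import torch.nn as nn

class AMBiLSTM(nn.Module):
    """Sketch of the AM-BiLSTM stack (input -> BiLSTM -> attention -> output);
    layer sizes are illustrative, not the tuned values."""
    def __init__(self, n_features=7, hidden_dim=32, horizon=2):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = AdditiveAttention(2 * hidden_dim)   # defined in the sketch above
        self.head = nn.Linear(2 * hidden_dim, horizon)       # multi-step-ahead output

    def forward(self, x):
        h, _ = self.bilstm(x)                  # (batch, time, 2*hidden_dim)
        query = h[:, -1, :]                    # last hidden state acts as the query
        context, weights = self.attention(h, query)
        return self.head(context)              # predicted values for each step ahead
```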

3. NRBO-VMD-AM-BiLSTM Forecasting Model

The operational workflow of the NRBO-VMD-AM-BiLSTM predictive framework is shown in Figure 8 and mainly includes four key stages: data preparation, data decomposition, model construction, and evaluation and comparison of prediction results (see Figure 9).
(1) Multidimensional Data Preparation
Operational parameters influencing oil-immersed transformers, including ambient temperature, relative humidity, top oil thermal profiles, and peak load metrics, are integrated with dissolved gas concentration measurements to construct a multivariate chronological dataset. To address dimensional discrepancies across parameters, min–max normalization is applied prior to partitioning the dataset into training and testing subsets at an 8:2 ratio.
(2) Optimized Mode Decomposition
The VMD process is enhanced through NRBO to determine critical hyperparameters: K and α . This optimized configuration decomposes raw gas concentration signals into multiple IMFs exhibiting spectral stationarity and mode orthogonality.
(3) Model Construction
The multivariate time series from (1) and the subsequences from (2) are recombined into a new multivariate time series and sequentially input into the BiLSTM layer, which is responsible for extracting features from input data. Then, the attention layer calculates the weights of BiLSTM hidden layers to identify and emphasize key information. Ultimately, the final component prediction results are obtained in the output layer of BiLSTM through linear transformation and activation.
(4) Model Evaluation
All component prediction results are summed and then denormalized to obtain the final predicted gas concentration. Comprehensive evaluation using prediction metrics verifies the superior predictive accuracy of the proposed algorithm compared to benchmark models; a minimal end-to-end sketch of stages (1)–(4) follows this list.
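The following sketch strings stages (1)–(4) together, assuming the vmd_sketch, nrbo, and envelope_entropy helpers introduced in Section 2; train_am_bilstm and model_predict are hypothetical placeholders for fitting and applying the AM-BiLSTM and are not functions defined in this paper.

```python
import numpy as np

def forecast_gas_series(series, exog, train_frac=0.8):
    """End-to-end sketch of stages (1)-(4): normalize, decompose, predict per IMF,
    then reconstruct. series: 1-D gas concentrations; exog: 2-D operating variables."""
    # (1) Min-max normalization of the gas series and the operating variables
    s_min, s_max = series.min(), series.max()
    series_n = (series - s_min) / (s_max - s_min + 1e-12)
    exog_n = (exog - exog.min(axis=0)) / (exog.ptp(axis=0) + 1e-12)

    # (2) NRBO-tuned VMD decomposition into near-stationary IMFs (sketches defined earlier)
    (K_opt, alpha_opt), _ = nrbo(lambda x: envelope_entropy(series_n, x[0], x[1]),
                                 lb=[2, 50], ub=[9, 500], n_pop=30, dim=2, max_iter=20)
    imfs, _ = vmd_sketch(series_n, K=int(round(K_opt)), alpha=alpha_opt)

    # (3) One AM-BiLSTM per IMF, trained on the first 80% of the multivariate sequence
    split = int(train_frac * len(series_n))
    predictions = []
    for imf in imfs:
        features = np.column_stack([exog_n, imf])                   # operating variables + IMF
        model = train_am_bilstm(features[:split], imf[:split])      # hypothetical training helper
        predictions.append(model_predict(model, features[split:]))  # hypothetical inference helper

    # (4) Superpose the component forecasts and invert the normalization
    total = np.sum(predictions, axis=0)
    return total * (s_max - s_min) + s_min
```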
The predictive accuracy of the NRBO-VMD-AM-BiLSTM model is evaluated using two metrics: the root mean square error (RMSE) and the mean absolute percentage error (MAPE). Lower values of both metrics signify enhanced predictive efficacy, reflecting the model's improved generalization capability and measurement fidelity.
$$RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( x_{act}^{i} - x_{pred}^{i} \right)^2 }$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{ x_{act}^{i} - x_{pred}^{i} }{ x_{act}^{i} } \right|$$
where x a c t i represents the true value of the i t h test sample, x p r e d i represents the model’s predicted value for i t h test sample, and n denotes the total number of test samples, which is 100 in this context.
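For readers reproducing the evaluation, a direct NumPy transcription of the two metrics might look as follows.

```python
import numpy as np

def rmse(actual, predicted):
    # Root mean square error over the test samples
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    # Mean absolute percentage error, expressed in percent
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))

# Example with the test-set size used here (n = 100):
# print(rmse(y_true, y_pred), mape(y_true, y_pred))
```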

4. Prediction Results

4.1. Testing Platform

The testing platform is a self-configured computer running 64-bit Windows 11, equipped with an RTX 2050 GPU (8 GB) and 24 GB of RAM; the deep neural architectures are implemented in PyTorch 1.13.1.

4.2. Experimental Data

The experimental dataset is derived from real dissolved gas data in oil from a 220 kV oil-immersed transformer of a Chinese power company, comprising a total of 500 valid data entries. The data include six parameters: maximum and minimum temperatures, average daily temperature, humidity, top oil temperature of the transformer, and maximum load, which, together with the dissolved gas concentrations, form a multivariate time series. Using the hydrogen (H2) concentration as a representative case study, the raw measurement values are presented in Figure 10.
The data in Figure 10 exhibit distinct nonlinear characteristics, indicating that the rate of change of the H2 concentration varies across different time periods. This behavior is potentially influenced by factors such as the aging of internal insulation and electrothermal faults in the transformer, which lead to a complex and variable dynamic process of gas generation and release.

4.3. Time-Series Data Preprocessing

The principal objective of the NRBO-VMD-AM-BiLSTM framework is to determine optimal configurations for the VMD hyperparameters K and α via NRBO, where minimum envelope entropy serves as the evolutionary fitness criterion. With a termination condition of 20 iterations, comparative analysis of convergence behaviors across varying population sizes reveals performance characteristics, as visualized in Figure 11.
As depicted in Figure 11, the analysis results indicate that the NRBO optimization algorithm achieved its optimal performance when the population size was set at 30, and the optimal parameter values were determined to be K = 7 and α = 285. The distribution of the central frequencies of the decomposed modes corresponding to these optimal solutions is illustrated in Figure 12.
As clearly demonstrated in Figure 12, decomposition of the H2 concentration time series using the optimal parameters yields well-separated modal components with non-overlapping center frequencies within defined bounds and no cross-mixing. These components are further illustrated in Figure 13.
Upon examination of Figure 13, it is apparent that the modal components do not overlap and the distribution boundaries are well defined, suggesting that the chosen parameters effectively isolate the signal components across various frequencies.

4.4. Comparison of Algorithm Optimization Results

To validate the performance advantages of the proposed NRBO algorithm, this study selected SMA, GA, SSA, and PSO as comparative algorithms for hyperparameter optimization of the AM-BiLSTM model. Optimization was performed on three key hyperparameters: learning rate [0.001, 0.01], number of neurons in the hidden layer [16, 64], and training epochs [40, 100] [31]. RMSE was adopted as the fitness value for algorithmic optimization. Figure 14 illustrates the fitness evolution curves of each optimization algorithm during the training process. The experimental results demonstrate that compared to the other four algorithms, the NRBO algorithm exhibits faster convergence speed and achieves superior solutions within fewer iterations. This advantage enables its enhanced performance in hyperparameter optimization tasks. The optimized hyperparameters of the AM-BiLSTM model are presented in Table 1.
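A possible way to wire the quoted search ranges into the NRBO sketch from Section 2.1.2 is shown below; am_bilstm_fitness, train_am_bilstm, and model_predict are hypothetical helpers standing in for model training and inference, and the bounds simply restate the ranges given above.

```python
# Hyperparameter search for the AM-BiLSTM, reusing the nrbo and rmse sketches above.
def am_bilstm_fitness(params, train_x, train_y, val_x, val_y):
    lr, hidden, epochs = params[0], int(round(params[1])), int(round(params[2]))
    model = train_am_bilstm(train_x, train_y, lr=lr, hidden_dim=hidden, epochs=epochs)  # hypothetical
    return rmse(val_y, model_predict(model, val_x))   # RMSE serves as the fitness value

# Search bounds follow the quoted ranges: learning rate [0.001, 0.01],
# hidden neurons [16, 64], training epochs [40, 100].
# best_hp, best_rmse = nrbo(lambda p: am_bilstm_fitness(p, Xtr, ytr, Xval, yval),
#                           lb=[0.001, 16, 40], ub=[0.01, 64, 100], dim=3, max_iter=20)
```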
To compare the performance of the AM-BiLSTM model under different optimization algorithms, two-step-ahead predictions were conducted on the test set, and error values for the future two-step predictions were calculated, as illustrated in Figure 15.
The results demonstrate that hyperparameter optimization significantly enhances model prediction accuracy while effectively reducing forecasting errors. As illustrated in Figure 15, the model optimized by the NRBO algorithm achieves optimal performance, with both RMSE and MAPE error metrics lower than those of the unoptimized model and the other optimization algorithms. Notably, the NRBO-optimized model exhibits superior stability in the second-step prediction, indicating its distinct advantage for time-series forecasting tasks.

4.5. Prediction Results of NRBO-VMD-AM-BiLSTM Model

Predictions of the modal components are presented in Figure 16. The predicted values are remarkably consistent with the actual values, thereby demonstrating the superior forecasting capabilities of the NRBO-VMD-AM-BiLSTM model across diverse amplitude time series. The H2 concentration forecast is completed by linearly combining the predicted modal components, with the results presented in Figure 17. The prediction of the H2 concentration exhibits a high level of accuracy, with the majority of the predicted points coinciding with the actual data points, thereby highlighting the model’s high precision.

4.6. Performance Comparison of Different Models

To validate the effectiveness of the proposed NRBO-VMD-AM-BiLSTM model in enhancing prediction accuracy for dissolved gas concentrations in transformer oil, comparative experiments were conducted against eight benchmark models on the identical dataset: NRBO-BiLSTM, AM-NRBO-BiLSTM, VMD-AM-BiLSTM, NRBO-BiGRU, VMD-AM-NRBO-BiGRU, NRBO-EEMD-AM-BiLSTM, BiLSTM, and LSTM. The prediction results and error statistics for all models are presented in Table 2.
BiLSTM demonstrates predictive capability for dissolved gas concentrations in transformer oil, but its accuracy remains suboptimal due to challenges in determining optimal parameters. As evidenced in Table 2, BiLSTM achieves a 28.1% reduction in MAPE and 16.13% lower RMSE compared to the LSTM model in the first-step prediction, confirming its superior adaptability to experimental data. The AM-NRBO-BiLSTM model further reduces errors to 1.04 µL/L (RMSE) and 1.54% (MAPE), representing reductions of 0.26 µL/L and 0.84%, respectively, versus the baseline BiLSTM. These results validate the enhanced performance of parameter-optimized BiLSTM architectures. AM contributes to this improvement by dynamically reweighting hidden state parameters of BiLSTM outputs.
The proposed NRBO-VMD-AM-BiLSTM model outperforms all comparative architectures in concentration forecasting. It effectively captures the temporal dynamics of the H2 concentration, with VMD decomposition reducing original time-series complexity and enhancing prediction accuracy. This superior performance versus EEMD-based approaches confirms VMD’s efficacy in processing complex dissolved gas time series, ultimately boosting model predictive capability. Collectively, the proposed hybrid model achieves substantial accuracy improvements over both individual and composite benchmarks, demonstrating particular efficacy for H2 forecasting.
To further validate NRBO-VMD-AM-BiLSTM’s applicability, concentration predictions were conducted for CH4, CO, and total hydrocarbon gases in transformer oil. Final prediction results after decomposition–reconstruction and comparative error metrics across methods are presented in Figure 18 and Figure 19.
Figure 18a indicates that the majority of CH4 concentration values are distributed between 9.5 and 10.2 µL/L, and the prediction model is capable of accurately capturing these concentration variations. For the concentrations of CO and total hydrocarbons, the predicted curves closely match the actual value curves, demonstrating the model’s highly stable predictive performance. Figure 19 presents the prediction error results for different models. When employing the NRBO-VMD-AM-BiLSTM combined model, the prediction errors for CO and CH4, denoted as MAPE, are only 0.71% and 1.02%, respectively, which are the lowest among all the forecast models, exhibiting precise predictive capabilities.
The error results from Figure 19 further reveal that the prediction errors obtained after decomposition using the NRBO-VMD combined model are lower than those when using VMD or EEMD alone. This substantiates the presence of significant noise in the time series, which may originate from environmental temperature fluctuations during sampling, precision limitations of online gas monitoring equipment, and the long-term aging of transformer oil, leading to a substantial amount of non-stationary behavior in the series. Effective modal decomposition through NRBO-VMD, coupled with the inclusion of data such as ambient temperature and humidity, top oil temperature, and maximum load in the training set, provides a more stable time series and richer feature information for forecasting.
Combining the predictive outcomes from Figure 18 and Figure 19, it is evident that the NRBO-VMD-AM-BiLSTM combined forecasting model can achieve accurate predictions for dissolved gases in oil, demonstrating good performance.

4.7. Threshold Exceedance Prediction Results

A normally operating 220 kV main transformer (model: SSZ-150000/220) exhibited significant dissolved gas anomalies on 18 September 2012. DGA monitoring data revealed critical changes: total hydrocarbon content exceeded the threshold with detectable C2H2 presence, as detailed in Table 3. The volume fractions of CH4, C2H6, C2H4, and total hydrocarbons surged to 74.05 µL/L, 23.17 µL/L, 80.90 µL/L, and 179.66 µL/L, respectively. According to the IEC 60599:2015 Standard [3], these concentrations reached technically significant levels, indicating potential abnormalities such as the thermal decomposition of solid insulation materials or localized overheating within the transformer. Although on-site outage inspection identified no critical safety hazards, enhanced monitoring was implemented during continued operation.
To validate the capability of the proposed NRBO-VMD-AM-BiLSTM hybrid model in predicting abrupt gas concentration surges, this study utilized 405 DGA samples collected from 25 November 2011 to 20 December 2022, covering both normal operational data and periods when thresholds were exceeded. For model validation, the latest 80 samples were designated as the test set, with particular focus on evaluating total hydrocarbon gas prediction performance. The forecasting results are presented in Figure 20.
The prediction model achieves an RMSE of 1.54 µL/L and MAPE of 2.71% for total hydrocarbon concentration forecasting. As demonstrated by the comparative curves in Figure 20, while minor deviations exist in absolute magnitude between predicted and measured values, the model successfully captures peak magnitude characteristics and trend inflection points in gas concentration evolution. Crucially, the predicted curve maintains high temporal alignment with the actual monitoring curve at critical timestamps.
This trend-predictive capability delivers significant practical value for power equipment condition assessment. It enables early anticipation of operational state transitions, providing data-driven support for fault diagnosis and maintenance decision making. Through timely analysis of gas evolution patterns, operators can accurately identify underlying equipment abnormalities, formulate targeted inspection protocols, and implement precision maintenance strategies. Consequently, this approach effectively prevents fault escalation while ensuring the safe and reliable operation of power transformers.

5. Conclusions

To address the challenge of predicting dissolved gas concentrations in transformer oil, a combined optimization model, NRBO-VMD-AM-BiLSTM, has been proposed to enhance the accuracy of such predictions.
(1) The VMD technique, optimized by NRBO, is employed for modal decomposition of the gas concentration time series, extracting modal components across various frequency scales. This step effectively addresses the complexity and nonlinearity of dissolved gas predictions in transformer oil, laying a foundation for subsequent precise forecasting.
(2) Relevant feature sequences, such as ambient temperature and humidity, top oil temperature, and maximum load, are added to the training set. These, combined with the subsequences obtained after modal decomposition, form a multivariate time series, providing the BiLSTM prediction model with more stable time-series data and richer feature information. Furthermore, an attention mechanism is introduced to calculate the weights of the BiLSTM’s hidden layers, thereby improving the model’s predictive accuracy.
(3) Comparative analysis of the prediction results for key fault gases (H2, CO, CH4, total hydrocarbons) in transformer oil indicates that the model achieves accurate concentration forecasts.
(4) For dissolved gas concentrations reaching attention-threshold levels in transformer oil, the model effectively predicts both magnitude peaks and trend inflection points while accurately characterizing concentration evolution patterns. This capability provides reliable trend analysis, enabling proactive assessment of transformer operational integrity. Consequently, targeted mitigation measures can be implemented to prevent fault escalation.

Author Contributions

Conceptualization, N.W. and W.L.; methodology, N.W.; formal analysis, X.L.; investigation, N.W.; data curation, W.L.; writing—original draft preparation, N.W.; writing—review and editing, N.W.; supervision, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Inner Mongolia Basic Science Research Business Expenses of Directly Affiliated Universities of China No. JY20220421.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to policy reasons.

Conflicts of Interest

N.W. was employed by Inner Mongolia Power (Group) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Ward, S.A.; El-Faraskoury, A.; Badawi, M.; Ibrahim, S.A.; Mahmoud, K.; Lehtonen, M.; Darwish, M.M.F. Towards Precise Interpretation of Oil Transformers Via Novel Combined Techniques based on DGA and Partial Discharge Sensors. Sensors 2021, 21, 2223. [Google Scholar] [CrossRef]
  2. Xing, Z.; He, Y.; Chen, J.; Wang, X.; Du, B. Health Evaluation of Power Transformer using Deep Learning Neural Network. Electr. Pow. Syst. Res. 2023, 215, 109016. [Google Scholar] [CrossRef]
  3. IEC 60599; Mineral Oil-Filled Electrical Equipment in Service—Guidance on the Interpretation of Dissolved and Free Gases Analysis. IEC: Geneva, Switzerland, 2015.
  4. Mohler, I.; Ujević Andrijić, Ž.; Bolf, N. Soft Sensors Model Optimization and Application for the Refinery Real-Time Prediction of Toluene Content. Chem. Eng. Commun. 2018, 205, 411–421. [Google Scholar] [CrossRef]
  5. Liu, Y.; Xie, M. Rebooting Data-Driven Soft-Sensors in Process Industries: A Review of Kernel Methods. J. Process. Contr. 2020, 89, 58–73. [Google Scholar] [CrossRef]
  6. Lei, P.; Ma, F.; Zhu, C.; Li, T. LSTM Short-Term Wind Power Prediction Method based on Data Preprocessing and Variational Modal Decomposition for Soft Sensors. Sensors 2024, 24, 2521. [Google Scholar] [CrossRef]
  7. Bustamante, S.; Manana, M.; Arroyo, A.; Castro, P.; Laso, A.; Martinez, R. Dissolved Gas Analysis Equipment for Online Monitoring of Transformer Oil: A Review. Sensors 2019, 19, 4057. [Google Scholar] [CrossRef]
  8. Jin, L.; Kim, D.; Chan, K.Y.; Abu-Siada, A. Deep Machine Learning based Asset Management Approach for Oil-immersed Power Transformers using Dissolved Gas Analysis. IEEE Access 2024, 12, 27794–27809. [Google Scholar] [CrossRef]
  9. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 2–9 February 2021; Volume 35, pp. 11106–11114. [Google Scholar] [CrossRef]
  10. Christina, A.J.; Salam, M.; Rahman, Q.; Wen, F.; Ang, S.; Voon, W. Causes of Transformer Failures and Diagnostic Methods—A Review. Renew. Sustain. Energy Rev. 2018, 82, 1442–1456. [Google Scholar] [CrossRef]
  11. Kong, X.; Cai, B.; Yu, Y.; Yang, J.; Wang, B.; Liu, Z.; Shao, X.; Yang, C. Intelligent Diagnosis Method for Early Faults of Electric-Hydraulic Control System based on Residual Analysis. Reliab. Eng. Syst. Saf. 2025, 261, 111142. [Google Scholar] [CrossRef]
  12. Zhang, H.; Sun, J.; Hou, K.; Li, Q.; Liu, H. Improved Information Entropy Weighted Vague Support Vector Machine Method for Transformer Fault Diagnosis. High Volt. 2022, 7, 510–522. [Google Scholar] [CrossRef]
  13. Li, J.; Zhang, Q.; Wang, K.; Wang, J.; Zhou, T.; Zhang, Y. Optimal Dissolved Gas Ratios Selected by Genetic Algorithm for Power Transformer Fault Diagnosis based on Support Vector Machine. IEEE Trans. Dielectr. Electr. Insul. 2016, 23, 1198–1206. [Google Scholar] [CrossRef]
  14. Yang, Y.; Liu, A.; Xin, H.; Wang, J. Fault Early Warning of Wind Turbine Gearbox based on Multi-input Support Vector Regression and Improved Ant Lion Optimization. Wind Energy 2021, 24, 812–832. [Google Scholar] [CrossRef]
  15. Yan, W.; Lu, C.; Liu, Y.; Zhang, X.; Zhang, H. An Energy Data-Driven Approach for Operating Status Recognition of Machine Tools Based on Deep Learning. Sensors 2022, 22, 6628. [Google Scholar] [CrossRef]
  16. Tan, Y.; Zhao, G. Transfer Learning with Long Short-Term Memory Network for State-of-Health Prediction of Lithium-Ion Batteries. IEEE Trans. Ind. Electron. 2019, 67, 8723–8731. [Google Scholar] [CrossRef]
  17. Mirowski, P.; Lecun, Y. Statistical Machine Learning and Dissolved Gas Analysis: A Review. IEEE Trans. Power Deliv. 2012, 27, 1791–1799. [Google Scholar] [CrossRef]
  18. Baruah, N.; Maharana, M.; Nayak, S.K. Performance Analysis of Vegetable Oil-based Nanofluids used in Transformers. IET Sci. Meas. Technol. 2019, 13, 995–1002. [Google Scholar] [CrossRef]
  19. Wu, Z.; Huang, N.E. Ensemble Empirical Mode Decomposition: A Noise-Assisted Data Analysis Method. Adv. Adapt. Data Anal. 2008, 1, 1–41. [Google Scholar] [CrossRef]
  20. Xue, W.; Dai, X.; Zhu, J.; Luo, Y.; Yang, Y.A. Noise Suppression Method of Ground Penetrating Radar Based on EEMD and Permutation Entropy. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1625–1629. [Google Scholar] [CrossRef]
  21. Wang, N.; Li, W.; Li, J.; Li, X.; Gong, X. Prediction of Dissolved Gas Content in Transformer Oil Using the Improved SVR Model. IEEE Trans. Appl. Supercond. 2024, 34, 9002804. [Google Scholar] [CrossRef]
  22. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544. [Google Scholar] [CrossRef]
  23. Ding, C.; Ding, Q.C.; Feng, L.; Wang, Z.L. Prediction Model of Dissolved Gas in Transformer Oil Based on VMD-SMA-LSSVM. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 1432–1440. [Google Scholar] [CrossRef]
  24. Mahmoud, S.A.; Ahmed, E.; Maged, S.A.; Sami, E.F. Parameter Estimation of Proton Exchange Membrane Fuel Cells using Chaotic Newton-Raphson-based Optimizer. Res. Eng. 2024, 24, 103369. [Google Scholar] [CrossRef]
  25. Ahmadianfar, I.; Bozorg-Haddad, O.; Chu, X. Gradient-based Optimizer: A New Metaheuristic Optimization Algorithm. Inf. Sci. 2020, 540, 131–159. [Google Scholar] [CrossRef]
  26. Sowmya, R.; Premkumar, M.; Jangir, P. Newton Raphson-based Optimizer: A New Population-based Metaheuristic Algorithm for Continuous Optimization Problems. Eng. Appl. Artif. Intel. 2024, 128, 107532. [Google Scholar] [CrossRef]
  27. Huang, C.-J.; Kuo, P.-H. A Deep CNN-LSTM Model for Particulate Matter (PM2.5) Forecasting in Smart Cities. Sensors 2018, 18, 2220. [Google Scholar] [CrossRef]
  28. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM Networks. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef]
  29. Van Houdt, G.; Mosquera, C.; Nápoles, G. A Review on the Long Short-Term Memory Model. Artif. Intell. Rev. 2020, 53, 5929–5955. [Google Scholar] [CrossRef]
  30. Ghaffarian, S.; Valente, J.; Van Der Voort, M.; Tekinerdogan, B. Effect of Attention Mechanism in Deep Learning-Based Remote Sensing Image Processing: A Systematic Literature Review. Remote Sens. 2021, 13, 2965. [Google Scholar] [CrossRef]
  31. Zhou, D.; Liu, Y.; Wang, X.; Wang, F.; Jia, Y. Combined Ultra-Short-Term Photovoltaic Power Prediction based on CEEMDAN Decomposition and RIME Optimized AM-TCN-BiLSTM. Energy 2021, 318, 134847. [Google Scholar] [CrossRef]
Figure 1. Variation in central frequency with different K values.
Figure 2. Variation in central frequency with different α values.
Figure 3. Flowchart of the NRBO method.
Figure 4. LSTM structure.
Figure 5. BiLSTM structure.
Figure 6. The attention mechanism model.
Figure 7. The AM-BiLSTM framework.
Figure 8. NRBO-VMD-AM-BiLSTM prediction flowchart.
Figure 9. Framework of the proposed prediction model.
Figure 10. H2 raw concentration values.
Figure 11. Comparison of minimum envelope entropy values under different population sizes.
Figure 12. NRBO-optimized central frequencies of VMD modes.
Figure 13. H2 modal components.
Figure 14. Fitness value convergence comparison of optimization algorithms.
Figure 15. Error value convergence comparison of optimization algorithms.
Figure 16. The predicted modal components of H2.
Figure 17. The prediction of H2 concentrations.
Figure 18. Multivariate gas concentration forecasting results. (a) CH4 concentration. (b) CO concentration. (c) Total hydrocarbons concentration.
Figure 19. Comparison of prediction errors across gas types. (a) MAPE. (b) RMSE.
Figure 20. Total hydrocarbon concentration prediction results.
Table 1. Hyperparameter values under different optimization algorithms.

Optimization Algorithm | Learning Rate | Neuron Count | Training Epochs
SMA | 0.00419 | 32 | 50
GA | 0.00772 | 44 | 52
SSA | 0.00801 | 37 | 71
PSO | 0.00407 | 56 | 74
NRBO | 0.00182 | 19 | 86
Table 2. Comparative prediction results and error metrics for different models.

Model | RMSE/(µL/L) (Step 1) | MAPE/% (Step 1) | RMSE/(µL/L) (Step 2) | MAPE/% (Step 2)
LSTM | 1.55 | 3.31 | 2.39 | 5.34
BiLSTM | 1.30 | 2.38 | 1.84 | 4.65
NRBO-BiLSTM | 1.02 | 2.16 | 2.01 | 4.46
NRBO-BiGRU | 1.10 | 1.99 | 1.56 | 4.11
AM-NRBO-BiLSTM | 1.04 | 1.54 | 1.77 | 4.31
VMD-AM-BiLSTM | 0.94 | 1.51 | 1.34 | 3.69
VMD-AM-NRBO-BiGRU | 0.30 | 1.35 | 1.18 | 3.72
NRBO-EEMD-AM-BiLSTM | 0.28 | 0.86 | 1.16 | 3.55
NRBO-VMD-AM-BiLSTM | 0.102 | 0.47 | 0.91 | 2.52
Table 3. DGA attention values (all concentrations in µL/L).

Date | H2 | CO | CO2 | CH4 | C2H6 | C2H4 | C2H2 | Total Hydrocarbon
17 September | 64.25 | 951.65 | 2581.39 | 26.33 | 14.01 | 22.06 | 0.00 | 62.4
18 September | 101.33 | 816.34 | 4222.24 | 74.05 | 23.17 | 80.90 | 0.94 | 179.66

