Research on Prediction of Dissolved Gas Concentration in a Transformer Based on Dempster–Shafer Evidence Theory-Optimized Ensemble Learning

Zhang, Pan; Hu, Kang; Yang, Yuting; Yi, Guowei; Zhang, Xianya; Peng, Runze; Liu, Jiaqi

doi:10.3390/electronics14071266


by Pan Zhang ¹, Kang Hu ¹, Yuting Yang ¹, Guowei Yi ¹, Xianya Zhang ¹, Runze Peng ²,* and Jiaqi Liu ²

¹ State Grid Hubei Electric Power Co., Ltd., Xiaogan Power Supply Company, Xiaogan 432000, China

² School of Computer Science and Engineering, Central South University, Changsha 410083, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(7), 1266; https://doi.org/10.3390/electronics14071266

Submission received: 14 February 2025 / Revised: 21 March 2025 / Accepted: 22 March 2025 / Published: 24 March 2025

(This article belongs to the Special Issue Artificial Intelligence Applications in Electrical and Energy Systems)


Abstract

The variation in dissolved gas concentration in the transformer serves as a crucial indicator for assessing the health status and potential faults of the transformer. However, traditional models and existing machine learning and deep learning models exhibit limitations when applied to real-world scenarios in power systems, lacking adaptability and failing to meet the requirements for accuracy and efficiency of prediction in practical applications. This paper proposes a Dempster–Shafer evidence theory-optimized Bagging ensemble learning model, aiming to improve the accuracy and stability of dissolved gas concentration prediction in transformers. By incorporating Dempster–Shafer evidence theory for the fusion of base learners and optimizing the basic probability distribution parameters by using the sequential least squares programming algorithm, this model significantly improves the adaptability and robustness of prediction. The experimental results show that compared to the ordinary Bagging method and the SARIMA model, the overall mean squared error of the Bagging prediction results optimized by the Dempster–Shafer evidence theory is only 22% of the mean square error of the Bagging prediction results and 38% of the mean square error of the SARIMA prediction results.

Keywords:

dissolved gas concentration; Dempster–Shafer evidence theory; Bagging; sequential least squares programming algorithm; mean square error

1. Introduction

Transformers, as core equipment in the power system, primarily serve to transport electricity after voltage boosting at power plants and deliver it to residential users after voltage reduction at substations. They ensure efficient voltage regulation and power quality control, and their safety is crucial to the normal operation of the entire grid system. Therefore, timely fault diagnosis of transformers is key to ensuring the normal operation of the grid system [1]. Transformer failures have various causes, such as insulation aging or damage, overload operation, and natural disasters like lightning strikes. However, most failures, whether electrical faults [2] (e.g., winding short circuits and external short circuits) or thermal faults [3] (e.g., winding overheating and core overheating), lead to temperature changes in the transformer. As the temperature of the transformer changes, the transformer insulating oil reacts accordingly, producing various gases that dissolve in the insulating oil. Abnormal concentration changes of gases such as hydrogen, acetylene, and methane are often important early warning signals of faults. Therefore, methods such as dissolved gas analysis have emerged as important ways to assess the internal operating state and health of the transformer [4].

In the current power system, health monitoring and fault warning for transformers are inseparable from the prediction and analysis of dissolved gas concentration. Relying solely on real-time monitoring data for fault diagnosis may result in irretrievable losses [5]. With the development and widespread application of artificial intelligence, the future concentration of dissolved gas can be predicted by analyzing historical operation data and the variation law of gas concentration. Such prediction can not only reveal potential risks in advance but also provide a scientific basis for operation and maintenance strategies, thereby extending the service life of the transformer, reducing the equipment failure rate, and ensuring the safe and stable operation of the power grid. Therefore, dissolved gas concentration prediction has become one of the active research directions in the field of smart grids and power equipment management [6].

Currently, there has been a lot of research on transformer dissolved gas prediction, but most of it simply applies existing prediction algorithms to transformer dissolved gases without considering the actual operating conditions of the transformer, such as the amount of data that can be collected during operation and the sensitivity of the power grid system to computational complexity. The prediction of dissolved gas concentration relies on historical operational data of the transformer. However, in practical applications, the collected data often show little or no fluctuation and may contain noise and missing values, so they fail to provide a real, effective, and sufficient dataset for gas concentration prediction research [7]. At present, many researchers use traditional statistical methods or single machine learning models for prediction [8]. These models typically struggle to capture the nonlinear and non-stationary characteristics in the time series of dissolved gas concentration, resulting in limited prediction accuracy. Moreover, due to the differences in the operating environments and conditions of different transformers, the applicability of generic models is constrained, making it difficult to meet the demands of complex real-world scenarios. Additionally, practical transformer gas concentration prediction demands high computational efficiency, and as the power grid becomes more intelligent, the overall computational load of the system also increases. Some prediction methods, however, have high computational complexity and struggle to meet these efficiency demands. Therefore, improving the efficiency of model training and inference remains challenging.

Considering the aforementioned challenges of transformer dissolved gas concentration prediction, this paper proposes a transformer dissolved gas concentration prediction model based on Dempster–Shafer (D-S) evidence theory-optimized ensemble learning, aiming to solve the problem of poor adaptability and stability of prediction models under the limitations of computational resources and of dataset quality and scale in real power systems. The main contributions are as follows:

This paper establishes a Bagging model using decision trees as the base learner for predicting dissolved gas concentrations in transformers, and applies D-S evidence theory to the aggregation layer of the Bagging model, optimizing the flexibility and stability of the existing Bagging model’s predictions. It is noteworthy that both Bagging models and D-S evidence theory have been extensively researched, but this paper is the first to combine them to construct an optimized Bagging model.
The fusion effect of D-S evidence theory depends on the values of the basic probability assignment, but few studies have optimized the basic probability assignment. Therefore, this paper uses the sequential least squares programming algorithm to optimize the basic probability assignment values in D-S evidence theory, taking the minimization of the mean square error as the objective function. The results show that this optimization of D-S evidence theory also improves the prediction accuracy of the entire prediction model.

The work is organized as follows: Section 2 introduces recent research related to the prediction of dissolved gas concentration in transformers. Section 3 introduces the Bagging model based on D-S evidence theory. Section 4 presents a comparative experimental analysis of the Bagging model based on D-S evidence theory against other methods. Section 5 summarizes the work of this paper and outlines future expectations.

2. Related Work

At present, research on the prediction of dissolved gas concentration in transformers has been very comprehensive. This paper divides it into three categories: prediction models based on traditional prediction methods, prediction models based on machine learning, and prediction models based on deep learning.

The traditional prediction models mainly construct a mathematical model to predict the future concentration trend by analyzing the historical data on dissolved gas concentration. Among them, the autoregressive integrated moving average model combines the ideas of autoregression, moving averages, and differencing, which makes it suitable for the short-term prediction of gas concentration, and it performs better when the trend of change is relatively stable. Based on the autoregressive integrated moving average model, Ref. [9] smoothed the concentration data to improve the prediction effect, but found that the model had limited capability in handling nonlinear characteristics and was sensitive to model parameters. The seasonal time-series model incorporates seasonal components, making it suitable for sequences with periodic changes, but it has limitations under complex multivariate correlations. Considering the correlation between gas concentration and external seasonal factors, Patil M [10] proposed a hybrid algorithm model composed of seasonal autoregressive integrated moving average, convolutional neural network, and gated recurrent unit for gas concentration prediction. In another study [11], the researchers used non-parametric kernel density estimation (NKDE) to optimize the autoregressive integrated moving average (ARIMA) prediction model, improving the accuracy of ARIMA in predicting dissolved gas concentrations. However, these models based on traditional prediction methods cannot effectively deal with the nonlinearity of data and have insufficient adaptability to practical application scenarios.

With the development of data-driven technology, machine learning has emerged as an important tool for dissolved gas concentration prediction. Compared with traditional methods, machine learning has significant advantages in dealing with nonlinear relationships, high-dimensional data, and complex patterns. Mahamdi Y proposed a transformer fault prediction model based on Naive Bayes and decision trees in [12], which performed well in predicting six types of transformer faults. Ekojono compared the fault diagnosis effects of various machine learning algorithms on dissolved gas analysis in [13], ultimately comparing the random forest model with the Duval Triangle Method (DTM), but did not optimize the machine learning models. Al-Sakini S R [14] optimized various machine learning algorithms and performed model switching under different conditions to combine these models for predicting transformer fault diagnosis. Grey relational analysis was used in Ref. [15] to analyze the correlation between dissolved gas concentration and transformer load, oil temperature, and ambient temperature. A model based on an improved grey wolf optimizer and least squares support vector machine was proposed for the prediction of dissolved gas concentration in transformers, which successfully deals with the influence of uncertain factors in prediction. Zhou X [16] proposed a random forest model based on firefly optimization to predict the content of dissolved gas in transformer oil, aiming to issue a warning in time before further deterioration of the fault. Wang N [17] used integrated empirical mode analysis and a cuckoo search algorithm to optimize the support vector regression model, and proposed a combined prediction model to predict the future concentration of characteristic gases in the transformer. However, the parameter optimization process of these models requires a large amount of computing resources, and the resulting improvement does not justify that cost.
In addition, Elânio Bezerra F [18] used principal component analysis for dissolved gas data processing, and proposed a nonlinear autoregressive neural network model to reduce the average absolute percentage error of dissolved gas concentration prediction in the transformer. However, this model requires a large number of high-quality historical data for training, and its performance is sensitive to data quality.

In recent years, deep learning has made remarkable progress in the field of prediction, especially in dealing with nonlinear, multivariate, high-dimensional, and time-series data, and has been widely used in the prediction of dissolved gas concentration in transformers. Wang L [19] conducted a simple combined application of machine learning algorithms, using long short-term memory networks to predict the concentration of dissolved gases in the future, and integrating various machine learning algorithms for fault classification. Xing M [20] proposed a prediction model based on the Mish-SN Temporal Convolutional Network (MSTCN) for predicting transformer dissolved gas concentrations, and the results showed an improvement in prediction accuracy. Das S used a deep neural network model in [21] to predict the concentration of corrosive sulfur in transformer insulating oil, thereby reducing the probability of inter-turn faults occurring within the transformer windings. Shao J in [22] combined probability statistics with deep learning to propose a prediction model based on probability density and long short-term memory networks, and used this to predict dissolved gas concentrations and assess transformer conditions. Luo D constructed a topological structure diagram of different gas relationships in [23] and proposed a prediction model based on temporal convolutional networks and graph convolutional networks, improving the prediction error of dissolved gas concentration. Hu C [24] utilized the long short-term memory network from deep learning to predict the concentration of gases in oil, achieving higher prediction accuracy compared to traditional prediction models and machine learning-based approaches. Zhang Y [25] used the Sparrow search algorithm to optimize the parameters of the long short-term memory network model, in order to improve the prediction accuracy of the transformer dissolved gas concentration prediction model. 
Zhang X [26] combined a genetic algorithm with a long short-term memory network for the prediction of the concentration of dissolved gas in a transformer, which solved the problem of low prediction accuracy caused by parameter selection. After obtaining the relevant variables of dissolved gas analysis, Ref. [27] used the long short-term memory model and the random forest algorithm to predict the dissolved gas concentration. Additionally, Zhang W [28] proposed a Bayesian probabilistic matrix factorization and gated recurrent unit neural network model for the prediction of dissolved gas concentration in transformer oil with missing data, and improved the prediction accuracy to a certain extent. Yang T [29] used the cross-entropy theory to fuse the univariate prediction model based on the temporal convolution network and the multivariate informer prediction model to obtain a combined prediction model, which improves the accuracy and stability of the dissolved gas concentration prediction. However, these deep learning-based models are usually quite complex, so their training and parameter tuning require considerably more time, which may introduce delays and other burdens on the overall operation of the power grid system. Moreover, these models place strict requirements on the scale of data, and the data that can be collected in practical applications may not meet these requirements, so the final prediction performance falls short of expectations.

The prediction model in this paper is designed based on the Bagging model, which is a classic ensemble learning model. Ensemble learning methods have been widely applied in various fields. For example, Ref. [30] used the Stacked Ensemble Forecasting (SEF) method to build a photovoltaic prediction model that enables photovoltaic systems to provide accurate forecasts. Additionally, Ref. [31] applied ensemble learning to predict landslide natural disasters and compared the advantages and disadvantages of different ensemble strategies. Moreover, there has been research introducing ensemble learning into the power grid system. For example, Ref. [32] uses ensemble learning to combine three machine learning models to adjust the parameters of the power system stabilizer (PSS). Additionally, Ref. [33] proposes a real-time power system state estimation (PSSE) model based on the idea of ensemble learning, using a dense residual neural network as the base learner.

3. Bagging Model Based on D-S Evidence Theory

To improve the accuracy and stability of transformer dissolved gas concentration prediction, this paper proposes a model based on the Bagging model from ensemble learning and uses D-S evidence theory to fuse the base learners. This improves the accuracy of the ensemble learning model in predicting transformer dissolved gas concentration and better handles the uncertainties caused by sub-datasets, thereby increasing the model’s robustness to noisy data. Moreover, the sequential least squares programming algorithm is used to iteratively update the decision variables until it finds the optimal solution that satisfies all the constraints. This approach handles complex constraints conveniently while improving convergence speed and calculation efficiency, resulting in a D-S evidence theory-optimized ensemble learning model for transformer dissolved gas concentration prediction. In this section, the model is divided into two parts: D-S evidence theory based on sequential least squares programming algorithm optimization and the transformer dissolved gas concentration prediction model based on Bagging.

3.1. D-S Evidence Theory Based on Sequential Least Squares Programming Algorithm Optimization

D-S evidence theory is a theoretical framework for dealing with uncertainty and multi-source information fusion. It provides flexible reasoning tools for complex decision-making problems through belief functions, making it particularly suitable for reasoning under incomplete information. Its core idea is to construct a framework of all possible hypotheses and describe the degree of support for the subsets of hypotheses using basic probability assignment. In the Bagging model, multiple base models each predict the test set, and the final prediction result is obtained by aggregating these predictions. In a conventional Bagging model, the prediction phase involves averaging the predictions of all base learners or using simple voting or weighted voting to obtain the final prediction result. However, since each base learner is trained on a randomly selected subset of the data, directly averaging the predictions is unstable and lacks flexibility, leading to inconsistent performance across different scenarios. Therefore, this paper applies D-S evidence theory to the prediction phase of the Bagging model to fuse the prediction results of all base models, thereby obtaining the final prediction result. This approach can better handle the uncertainty introduced by sub-datasets and enhance the robustness of the model to noisy data.

In D-S evidence theory, the basic probability assignment represents the degree of credibility for the occurrence of a specific hypothesis or event. For all possible events, the sum of the basic probability assignment is equal to 1. After applying the D-S evidence theory to the fusion of the prediction results in the Bagging model, the fusion prediction value is calculated as Formula (1):

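$$Y = \sum_{i=1}^{N} m_{i} \, f_{i}(x) \tag{1}$$

where $m_{i}$ is the basic probability assignment value of the i-th base learner, $f_{i}(x)$ is the prediction of the i-th base learner for input $x$, and $N$ is the number of base learners. The values $m_{i}$ satisfy the relationship in Formula (2):

$$\begin{cases} m_{i} \geq 0 \\ \sum_{i=1}^{N} m_{i} = 1 \end{cases} \tag{2}$$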
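As a minimal sketch of Formulas (1) and (2), the fused prediction is a weighted sum of the base learners' outputs with non-negative weights that sum to 1. The function name `fuse_predictions` and the numeric values are illustrative assumptions and do not come from the paper.

```python
def fuse_predictions(predictions, masses):
    """Formula (1): Y = sum_i m_i * f_i(x), with m_i >= 0 and sum_i m_i = 1."""
    if len(predictions) != len(masses):
        raise ValueError("one mass per base learner is required")
    if any(m < 0 for m in masses):
        raise ValueError("masses must be non-negative")
    if abs(sum(masses) - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    return sum(m * p for m, p in zip(masses, predictions))


# Hypothetical outputs f_i(x) of three base learners for one sample.
base_predictions = [10.0, 12.0, 20.0]

# Equal masses reproduce the plain Bagging average.
uniform = fuse_predictions(base_predictions, [1 / 3, 1 / 3, 1 / 3])
# Unequal masses reduce the influence of the third, outlying learner.
weighted = fuse_predictions(base_predictions, [0.5, 0.4, 0.1])
```

With equal masses the fusion reduces to ordinary averaging, which is why the choice of masses is the only place where the fused model can differ from conventional Bagging.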

In addition, the Dempster rule is used to fuse the prediction results of multiple base models to generate a new probability assignment, as shown in Formula (3):

$$m_{AB}(c) = \frac{\sum_{a \cap b = c} m_{A}(a) \, m_{B}(b)}{1 - K} \tag{3}$$

where $m_{AB}(c)$ is the fused probability assignment, that is, the value assigned to result c after the prediction results a and b of the two models A and B are combined; $m_{A}(a)$ denotes the original probability assignment value for result a from model A, and $m_{B}(b)$ denotes the original probability assignment value for result b from model B. $1 - K$ is a normalization factor, which removes the mass assigned to conflicting combinations so that the result meets the probability distribution requirements. $K$ is the conflict coefficient: when the prediction results of the base models conflict strongly, D-S theory quantifies the degree of conflict by calculating $K$ and normalizes out the impact of the conflict in the fusion process. This mechanism ensures that the final decision is generated with high credibility, significantly enhancing the robustness of the decision-making process. The calculation of $K$ is shown in Formula (4):

$$K = \sum_{a \cap b = \emptyset} m_{A}(a) \, m_{B}(b) \tag{4}$$

The greater the conflict coefficient $K$, the greater the inconsistency between the results of the two models. Since $K$ represents the degree of conflict between the two pieces of evidence, it is necessary to eliminate the influence of these conflicts from the final support allocation. Therefore, dividing by $1 - K$ normalizes the result, ensuring that the new support allocation satisfies the probability constraints.
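To make Formulas (3) and (4) concrete, the following sketch combines two mass functions with Dempster's rule. The two-element frame of discernment ("normal", "fault") and the mass values are illustrative assumptions; the paper does not specify them.

```python
from itertools import product


def dempster_combine(m_a, m_b):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule.

    Returns the fused masses (Formula (3)) and the conflict coefficient K (Formula (4)).
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m_a.items(), m_b.items()):
        c = a & b
        if c:
            fused[c] = fused.get(c, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # product mass whose intersection is empty
    if conflict >= 1.0:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    return {c: v / (1.0 - conflict) for c, v in fused.items()}, conflict


# Hypothetical frame of discernment with illustrative masses.
NORMAL, FAULT = frozenset({"normal"}), frozenset({"fault"})
EITHER = NORMAL | FAULT

m_a = {NORMAL: 0.6, FAULT: 0.1, EITHER: 0.3}
m_b = {NORMAL: 0.5, FAULT: 0.3, EITHER: 0.2}
fused, k = dempster_combine(m_a, m_b)
```

Here the conflicting pairs are (normal, fault) and (fault, normal), so K = 0.6 × 0.3 + 0.1 × 0.5 = 0.23, and every remaining product is divided by 1 − K = 0.77 so that the fused masses again sum to 1.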

It can be seen from Formula (1) that the prediction performance of applying D-S evidence theory to the fusion of Bagging model’s prediction results depends on the basic probability assignment value of each base learner. Therefore, this paper also aims to minimize the prediction mean square error and optimize the basic probability assignment value of each base learner according to the sequential least squares programming algorithm.

The sequential least squares programming algorithm is an algorithm for constrained optimization problems, primarily designed to handle nonlinear objective functions and constraints. The algorithm iteratively updates the decision variables, which here are the basic probability assignment values. In each iteration, the algorithm uses a Taylor expansion at the current point to approximate the objective function and the constraints: the objective is replaced by a local quadratic model and the constraints are linearized. This approximation allows for effective optimization near the current point. During each iteration, a subproblem is solved to minimize the approximated objective function within the neighborhood of the current point while satisfying the linearized constraints. Through continuous iteration and updates, the algorithm can ultimately find an optimal solution that satisfies all the constraints. The sequential least squares programming algorithm is capable of handling complex constraints, including nonlinear and equality constraints, and excels in convergence speed and computational efficiency. In this research, the objective function of the sequential least squares programming algorithm is shown in Formula (5):

$$f(m) = \min \left( \sum_{i=1}^{n} (Y_{i} - y_{i})^{2} \right) \tag{5}$$

where $Y_{i}$ represents the actual result value, and $y_{i}$ represents the aforementioned fusion prediction value. During the optimization process, the rule for updating the basic probability assignment value in each iteration is shown in Formula (6):

$$m_{k+1} = m_{k} + \alpha_{k} \, d_{k} \tag{6}$$

where $\alpha_{k}$ represents the step size, and $d_{k}$ represents the calculated direction. By optimizing the basic probability assignment value using the sequential least squares programming algorithm, the fusion prediction can not only be closer to the real value, thereby improving the accuracy of prediction, but also reduce the conflict degree $K$, so as to improve the stability and robustness of the fusion prediction.
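The optimization described above can be sketched with SciPy's SLSQP solver. This is an illustration under stated assumptions rather than the authors' implementation: the synthetic predictions, the helper name `optimize_masses`, and the use of `scipy.optimize.minimize` are choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize


def optimize_masses(base_preds, y_true):
    """Find masses m (m_i >= 0, sum m_i = 1) that minimise the squared error of the fusion.

    base_preds: array of shape (n_samples, n_learners); column i holds f_i(x) per sample.
    y_true:     array of shape (n_samples,) with the measured values.
    """
    n_learners = base_preds.shape[1]
    m0 = np.full(n_learners, 1.0 / n_learners)  # start from the plain Bagging average

    def objective(m):
        # Sum of squared errors between measured and fused values, as in Formula (5).
        return np.sum((y_true - base_preds @ m) ** 2)

    result = minimize(
        objective,
        m0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_learners,  # m_i >= 0
        constraints=[{"type": "eq", "fun": lambda m: m.sum() - 1.0}],  # sum m_i = 1
    )
    return result.x, result.fun


# Synthetic data (assumption): learner 0 is accurate, learners 1 and 2 are noisy.
rng = np.random.default_rng(0)
y_true = np.linspace(1.0, 5.0, 50)
base_preds = np.column_stack([
    y_true + rng.normal(0.0, 0.05, 50),
    y_true + rng.normal(0.0, 0.50, 50),
    y_true + rng.normal(0.0, 1.00, 50),
])

masses, sse = optimize_masses(base_preds, y_true)
uniform_sse = np.sum((y_true - base_preds.mean(axis=1)) ** 2)
```

The step size and search direction of Formula (6) are handled inside the solver; the caller only supplies the objective, the bounds, and the equality constraint from Formula (2).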

3.2. Prediction Model of Transformer Dissolved Gas Concentration Based on Bagging

Ensemble learning is a machine learning paradigm that combines multiple weak learners to obtain a stronger and more comprehensive model. Its main goal is to improve the performance and robustness of the model, overcoming the limitations of individual models. Its fundamental idea is based on the principle of “collective wisdom”, which holds that a combination of multiple models can yield better prediction performance than a single model. In the health monitoring and maintenance of the transformer, dissolved gas analysis is a critical method for assessing the state of the transformer and detecting potential faults. The Bagging model in ensemble learning, due to its efficient ensemble characteristics, can significantly improve the prediction accuracy of different gas concentrations. Especially in the case of small sample sizes and high noise levels, it has become a popular model for transformer dissolved gas concentration prediction. This model works by randomly extracting multiple subsets from the original training dataset and independently training a base learner on each subset. Ultimately, the predictions of these base learners are combined to produce the final result. The core steps of the Bagging model begin with Bootstrap sampling, that is, randomly extracting samples from the original dataset with replacement to generate multiple subsets, so that some data may be repeated within a subset and some data may not be sampled at all. Subsequently, a base learner, which is a decision tree in this paper, is trained independently on each sub-dataset. Finally, the predictions of the base learners are aggregated: regression tasks use the average of the predicted values, and classification tasks use a voting mechanism. This model can effectively enhance the stability and accuracy of prediction while demonstrating strong capabilities in handling noise and outliers. However, due to the randomness of the subsets, the prediction performance depends on how the predictions of the base learners are combined.

In the Bagging-based transformer dissolved gas concentration prediction model proposed in this paper, the base learner is a decision tree model. A decision tree is a supervised learning algorithm based on a tree-like structure, capable of handling both classification and regression tasks, so it is suitable for prediction tasks. It divides the dataset through a series of rules, ultimately generating a hierarchical tree structure to predict the value of the sample. The core idea of the decision tree is to recursively select the optimal feature and its split point, and divide the dataset into several subsets so that each subset is as pure as possible. The goal is for each leaf node to correspond to a single continuous value that approximates the targets of its samples. The process of generating a decision tree takes the feature matrix X and the target variable y as input and builds the internal and leaf nodes of the tree. The optimal split is found according to the information gain index, that is, among all candidate features and split points in the current dataset, the one that minimizes the mean square error after splitting. The dataset is then recursively divided into smaller subsets until a stopping condition is reached, that is, the dataset cannot be divided further or the maximum depth of the tree is reached. The calculation of the information gain is shown in Formula (7):

$$IG(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_{v}|}{|D|} \, H(D_{v}) \tag{7}$$

where $H(D)$ represents the entropy of dataset $D$, and $H(D_{v})$ represents the entropy of the subset divided based on the value v of feature A. The calculation of the mean square error is shown in Formula (8):

$$MSE(D) = \frac{1}{|D|} \sum_{i=1}^{|D|} (y_{i} - y_{pred})^{2} \tag{8}$$

Here, $y_{pred}$ represents the predicted value of the leaf node. After recursively generating the subtrees, the prediction result is output, which is the average value of the samples in the leaf node.
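The split search can be illustrated for a single feature. The sketch below assumes that, for regression, the mean square error of Formula (8) takes the place of the entropy in the size-weighted sum of Formula (7); the helper names and the data are hypothetical.

```python
def mse(values):
    """Formula (8): mean squared deviation from the node's prediction (the node mean)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Return (threshold, weighted_mse) of the best binary split on a single feature."""
    pairs = sorted(zip(xs, ys))
    best = (None, mse(ys))  # baseline: MSE of the unsplit parent node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Size-weighted impurity of the two children, mirroring the sum in Formula (7).
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Illustrative data: the target jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, child_mse = best_split(xs, ys)
parent_mse = mse(ys)
```

A full tree would repeat this search over every feature and recurse into the two children until a stopping condition is met.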

In general, the process and structure of the transformer dissolved gas concentration prediction model based on D-S evidence theory-optimized ensemble learning proposed in this paper are shown in Figure 1:

The framework in Figure 1 is based on the Bagging model process, which we have enclosed in a pink dashed box. Inside, it consists of three parts: the Bootstrap sampling layer, the base learner training layer, and the aggregation layer, which are enclosed in light green, blue, and cyan dashed boxes, respectively. Unlike the ordinary Bagging model, we use D-S evidence theory for optimization within the aggregation layer, which is also the main contribution of our work. The marked sections in the diagram indicate a total of six steps, detailed as follows. The first step is to generate sub-datasets, that is, to randomly extract multiple independent sub-datasets from the original dataset. In the Bagging model, the samples of each sub-dataset are drawn from the training set with replacement; as a result, some samples may be extracted multiple times, while others may not be extracted at all.
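The first step can be sketched as sampling indices with replacement. The function name, the subset count, and the seed are assumptions made for this illustration.

```python
import random


def bootstrap_indices(n_samples, n_subsets, seed=0):
    """Draw n_subsets index lists of length n_samples, sampling with replacement."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_subsets)
    ]


subsets = bootstrap_indices(n_samples=10, n_subsets=3)
for indices in subsets:
    # Indices that never appear in this subset were not drawn at all.
    never_drawn = sorted(set(range(10)) - set(indices))
    print(sorted(indices), "never drawn:", never_drawn)
```

Printing the sorted indices makes the two effects described above visible: repeated indices within a subset and indices that are missing from it.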

The second step is to train the base learner for each sub-dataset to obtain multiple independent models, that is, to train a decision tree model for each sub-dataset to obtain multiple independent decision tree models. The training process of a decision tree includes selecting split features, traversing all features and their possible split points, and selecting the optimal feature for splitting according to information gain. The tree is then recursively constructed by repeating the process of feature selection and node splitting for each subset until the stopping condition is satisfied. The stopping conditions include the depth of the tree exceeding the preset maximum value, the number of samples in the node being lower than the set threshold, and further splitting not significantly improving the information gain.

The third step is to use the trained base learners to predict the samples and obtain the prediction results, that is, the multiple decision tree models obtained in step 2 predict the same samples so that one prediction result is obtained from each decision tree model. During prediction, the model starts from the root node and traverses down the decision tree based on the feature values of the sample. At each node, a decision is made according to the selected feature and its threshold: if the feature value satisfies the condition, the model follows one path; otherwise, it follows the other. This continues until a leaf node is reached, and the predicted value of that leaf node is returned.
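The traversal in the third step can be sketched with a small hand-built tree. The dictionary layout, the feature names `oil_temperature` and `load`, and the thresholds are hypothetical and are not taken from the paper.

```python
def predict(node, sample):
    """Walk from the root to a leaf and return the value stored in that leaf."""
    while "value" not in node:  # internal nodes carry a split, leaves carry a value
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


# Hand-built tree with hypothetical features and thresholds.
tree = {
    "feature": "oil_temperature", "threshold": 60.0,
    "left": {"value": 12.5},
    "right": {
        "feature": "load", "threshold": 0.8,
        "left": {"value": 18.0},
        "right": {"value": 25.0},
    },
}

# Goes left at the root and stops at the first leaf.
cool = predict(tree, {"oil_temperature": 55.0, "load": 0.9})
# Goes right at the root, then right again at the load split.
hot_loaded = predict(tree, {"oil_temperature": 70.0, "load": 0.9})
```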

The fourth step is to calculate the basic probability assignment value for the prediction result of each base learner. In the Bagging-based transformer dissolved gas concentration prediction model, the initial basic probability assignment values are set uniformly, so that every base learner starts with the same value and the initial fusion equals the plain average. The basic probability assignment values are then iteratively updated by an optimization algorithm to minimize the mean square error of the prediction result. Specifically, the optimization algorithm is the sequential least squares programming algorithm, whose objective function is given by Formula (5). During the optimization process, the basic probability assignment values are updated in each iteration according to the rule defined by Formula (6). The core idea is to construct an objective function based on the mean square error, impose the constraint on the sum of the basic probability assignment values, and iteratively update these values to find the optimal solution.

The fifth step is to use D-S evidence theory to fuse the prediction results of all base learners and generate the final prediction value. After obtaining the fused prediction result, the model evaluates it. If the mean square error of the prediction result has converged, the process proceeds to step six; otherwise, it returns to step four to recalculate the basic probability assignment. In D-S evidence theory, the basic probability assignment represents the degree of belief in the occurrence of a hypothesis or event, and the basic probability assignments of all possible events sum to 1. When D-S evidence theory is applied to fuse the prediction results of the Bagging model, the fused prediction value is calculated according to Formula (1). D-S evidence theory thus provides an effective mechanism for result fusion in the Bagging model. By weighting the predictions of the different models, it integrates information more effectively and reduces prediction uncertainty. Moreover, because the belief assignment values can be set flexibly, the prediction performance of the model can be improved to some extent; in particular, this mitigates the instability of the fused prediction caused by the randomness of the ensemble learning sub-datasets.

The sixth step is to use the D-S evidence theory-optimized ensemble learning model to predict the transformer dissolved gas concentration and to compare the prediction with the actual result in order to analyze the accuracy of the model. Specifically, sub-datasets are generated from the dataset to be predicted, and the base learners make predictions for each sub-dataset. The prediction results of all base learners are then fused by D-S evidence theory, using the optimal basic probability assignment obtained through the repeated iterations of step 4 and step 5. Finally, the fused prediction result is compared with the actual value.

4. Results of Experiment

To demonstrate the effectiveness of the proposed D-S evidence theory-optimized ensemble learning model for predicting transformer dissolved gas concentration, and its ability to enhance the performance of ensemble learning algorithms, we conducted experiments on a machine with an Intel Core i5-13500H CPU and 16 GB of RAM running Windows 11, using Python 3.9 and the Scikit-learn machine learning library.

In actual power systems, the probability of a fault occurring is very low. Most of the data collected by the sensors in transformers therefore remain unchanged and cannot be used to train or test prediction models; the data fluctuate only in the periods before and after a fault. As a result, the dissolved gas concentration data that can be used to train prediction models in practice are very limited. The dataset used in this paper comes from a transformer at a substation. It consists of small-sample time-series data of dissolved gas concentrations collected at non-uniform time intervals from 10 July 2020 to 28 February 2022. The dataset has four feature dimensions, corresponding to four gases (H2, C2H6, C2H4, and CH4), with 21 data points per dimension.

It should be noted that [34] proposed a seasonal autoregressive integrated moving average (SARIMA) model for time-series prediction of dissolved gas concentration in transformers, and that this method is more accurate and stable than the autoregressive (AR) model and the long short-term memory (LSTM) model. We therefore compare the method proposed in this paper with the method proposed in [34]. The experimental data used in this paper are intended specifically to demonstrate the effectiveness and superiority of the proposed model; however, the model can also be extended to higher-dimensional data, such as datasets involving more types of gases, and to larger-scale data.

The first 20 data points of the dataset are used as the training set for the D-S evidence theory-optimized ensemble learning model. The trained model is then used to predict the gas concentrations at the next time point, and the last data point of the dataset is used to assess the accuracy of the prediction. This mirrors the practical task of using recent dissolved gas concentration data to predict the concentrations in the near future.

With the above experimental settings, the prediction results are shown in Table 1. The accuracy of each model for the i-th gas is given by Formula (9):

P_{i} = 1 - \frac{|Y_{i}^{\mathrm{true}} - Y_{i}|}{Y_{i}^{\mathrm{true}}}

(9)

The overall prediction accuracy of each model is given by Formula (10):

P = \frac{1}{C} \sum_{i = 1}^{C} \left( 1 - \frac{|Y_{i}^{\mathrm{true}} - Y_{i}|}{Y_{i}^{\mathrm{true}}} \right)

(10)

where C represents the number of gas categories, Y_{i}^{\mathrm{true}} is the actual concentration of the i-th gas, and Y_{i} is the predicted value of the model for the i-th gas.

Based on these formulas, we calculated the prediction accuracy of the ensemble learning model, the D-S evidence theory-optimized ensemble learning model, and the SARIMA model. For H2, the accuracy of the ensemble learning model is approximately 98.7%, that of the D-S evidence theory-optimized ensemble learning model is about 99.0%, and that of the SARIMA model is around 93.2%. For C2H6, the ensemble learning model achieves approximately 98.1%, the D-S evidence theory-optimized ensemble learning model about 99.7%, and the SARIMA model about 99.5%. For C2H4, the ensemble learning model achieves approximately 99.6%, the D-S evidence theory-optimized ensemble learning model about 99.9%, and the SARIMA model around 99.3%. For CH4, the ensemble learning model achieves approximately 97.7%, the D-S evidence theory-optimized ensemble learning model about 96.8%, and the SARIMA model about 97.8%. As shown in Table 2, although the D-S evidence theory-optimized ensemble learning model slightly underperforms the ensemble learning and SARIMA models in predicting CH4, it improves the prediction accuracy for the other gases and the overall accuracy.
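As a check on these figures, the short sketch below applies Formulas (9) and (10) to the actual and predicted concentrations listed in Table 1. The function and variable names are illustrative only; the numbers are copied from Table 1.

```python
# Actual and predicted concentrations copied from Table 1.
actual = {"H2": 20.600, "C2H6": 112.500, "C2H4": 64.200, "CH4": 31.200}
predicted = {
    "Bagging": {"H2": 20.333, "C2H6": 110.362, "C2H4": 64.433, "CH4": 31.905},
    "SARIMA": {"H2": 19.198, "C2H6": 111.890, "C2H4": 64.638, "CH4": 31.890},
    "D-S optimized Bagging": {"H2": 20.400, "C2H6": 112.182, "C2H4": 64.115, "CH4": 32.196},
}


def gas_accuracy(y_true, y_pred):
    """Formula (9): one minus the relative absolute error for one gas."""
    return 1.0 - abs(y_true - y_pred) / y_true


def overall_accuracy(y_true_by_gas, y_pred_by_gas):
    """Formula (10): mean of the per-gas accuracies over the C gases."""
    scores = [gas_accuracy(y_true_by_gas[g], y_pred_by_gas[g]) for g in y_true_by_gas]
    return sum(scores) / len(scores)


for model, preds in predicted.items():
    per_gas = {g: round(gas_accuracy(actual[g], preds[g]), 3) for g in actual}
    print(model, per_gas, round(overall_accuracy(actual, preds), 3))
```

Rounded to three decimals, the overall accuracies come out at 0.985 for Bagging, 0.974 for SARIMA, and 0.989 for the D-S evidence theory-optimized model, in agreement with Table 2.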

For intuitive comparison, we also present an area chart and a bar chart based on the data in Table 1 and Table 2, shown in Figure 2 and Figure 3, respectively. Figure 2 shows that the area corresponding to the D-S evidence theory-optimized Bagging model is significantly smaller than the areas of the standard Bagging model and the SARIMA model, which means that its overall prediction error is notably lower than that of the other two models. Figure 3 shows that the D-S evidence theory-optimized Bagging model achieves higher accuracy than the other two models in predicting H2, C2H6, and C2H4. Although its accuracy is slightly lower for CH4, its overall average prediction accuracy is higher than that of the other models, exceeding the SARIMA model by approximately 1.4 percentage points. These results demonstrate the effectiveness of the proposed D-S evidence theory-optimized ensemble learning model for predicting transformer dissolved gas concentrations, as well as its improved accuracy and robustness relative to the original ensemble learning model.

The objective of the sequential least squares programming algorithm in the proposed D-S evidence theory-optimized Bagging model is to minimize the overall mean square error. This paper therefore calculates, from the experimental results, the mean square errors of the predictions of the D-S evidence theory-optimized Bagging model, the standard Bagging model, and the SARIMA model, thereby indirectly verifying the effectiveness of the optimization algorithm.

The calculated mean square errors are shown in Table 3. The overall mean square error of the D-S evidence theory-optimized Bagging model is only about 22% of that of the standard Bagging model and approximately 38% of that of the SARIMA model. This indicates that the D-S evidence theory-optimized Bagging model significantly reduces the overall mean square error of the prediction results. It also demonstrates the stability of the proposed D-S evidence theory-optimized ensemble learning model for predicting transformer dissolved gas concentration and the effectiveness of the sequential least squares programming optimization algorithm.
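The per-gas values in Table 3 are consistent with the squared error of the single held-out data point in Table 1, and the overall value with the sum of these squared errors over the four gases. This reading is an inference from the tables rather than something stated explicitly in the text. Under that assumption, the following sketch recomputes the overall values and the two ratios quoted above; all names are illustrative.

```python
# Actual and predicted concentrations copied from Table 1.
actual = [20.600, 112.500, 64.200, 31.200]  # H2, C2H6, C2H4, CH4
predicted = {
    "Bagging": [20.333, 110.362, 64.433, 31.905],
    "SARIMA": [19.198, 111.890, 64.638, 31.890],
    "D-S optimized Bagging": [20.400, 112.182, 64.115, 32.196],
}


def squared_errors(y_true, y_pred):
    """Squared error per gas for the single held-out data point."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


# Assumption: the overall value in Table 3 is the sum of the per-gas values.
overall = {model: sum(squared_errors(actual, preds)) for model, preds in predicted.items()}

ratio_vs_bagging = overall["D-S optimized Bagging"] / overall["Bagging"]
ratio_vs_sarima = overall["D-S optimized Bagging"] / overall["SARIMA"]

for model, value in overall.items():
    print(f"{model}: {value:.3f}")
print(f"ratio vs Bagging: {ratio_vs_bagging:.2f}, ratio vs SARIMA: {ratio_vs_sarima:.2f}")
```

The recomputed totals agree with the overall column of Table 3 to within rounding, and the two ratios round to 0.22 and 0.38.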

In addition, based on the data in Table 3, we plot a radar chart of the mean square errors of the Bagging model, the D-S evidence theory-optimized Bagging model, and the SARIMA model for predicting transformer dissolved gas concentration, as shown in Figure 4.

Similarly, the mean square error of the D-S evidence theory-optimized Bagging model is higher than that of the other two models only for CH4 (0.992, compared with 0.497 for Bagging and 0.476 for SARIMA), and this increase is far smaller than the total reduction in mean square error that the model achieves for the other three gases. In terms of the total mean square error, the standard Bagging model has the highest value, followed by the SARIMA model, while the D-S evidence theory-optimized Bagging model has the lowest. Therefore, the proposed D-S evidence theory-optimized Bagging model not only improves on the standard Bagging model but also significantly enhances accuracy and stability compared to existing models for predicting transformer dissolved gas concentrations.

5. Conclusions

In actual power systems, existing prediction models often exhibit poor accuracy and stability because of limitations in computational resources and dataset size. To address this shortcoming, this paper proposes a Bagging ensemble learning model optimized with D-S evidence theory. By using D-S evidence theory to fuse the prediction results of the base learners, and by using the sequential least squares programming algorithm to optimize the basic probability assignment parameters, the model becomes more robust and adaptable, which improves the accuracy and stability of transformer dissolved gas concentration predictions. Combining D-S evidence theory with Bagging in this way enhances the accuracy of the Bagging model at the aggregation layer. Compared to the traditional Bagging and SARIMA models, the proposed method achieves higher overall prediction accuracy and lower mean square error, with particularly clear advantages in the prediction of H2, C2H6, and C2H4. Moreover, this paper applies D-S evidence theory to optimize the decision layer of the model, an approach that is also suitable for fault prediction of other equipment such as circuit breakers and switchgear, demonstrating broad applicability. However, when the number of gas types in the data increases, the calculation of the basic probability values in D-S evidence theory consumes a significant amount of computational resources, resulting in an excessively long overall training time for the prediction model. We will address this issue in subsequent work.

Author Contributions

P.Z., K.H. and Y.Y. designed the project and drafted the manuscript, as well as collected the data. G.Y., X.Z., R.P. and J.L. wrote the code and performed the analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Hubei Electric Power Co., Ltd. Xiaogan Power Supply Company (research on proactive warning method of abnormal operation of substation equipment based on deep learning). The contract number is SGHBXG00JXJS2310925.

Data Availability Statement

The data presented in this study are available in this article.

Conflicts of Interest

Authors Pan Zhang, Kang Hu, Yuting Yang, Guowei Yi and Xianya Zhang were employed by the company State Grid Hubei Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Youssef, M.M.; Ibrahim, R.A.; Desouki, H.; Moustafa, M.M.Z. An overview on condition monitoring & health assessment techniques for distribution transformers. In Proceedings of the 2022 6th International Conference on Green Energy and Applications (ICGEA), Singapore, 4–6 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 187–192.
2. Xian, R.; Wang, L.; Zhang, B.; Li, J.; Xian, R.; Li, J. Identification method of interturn short circuit fault for distribution transformer based on power loss variation. IEEE Trans. Ind. Inform. 2023, 20, 2444–2454.
3. Shiravand, V.; Faiz, J.; Samimi, M.H.; Djamali, M. Improving the transformer thermal modeling by considering additional thermal points. Int. J. Electr. Power Energy Syst. 2021, 128, 106748.
4. Saroja, S.; Haseena, S.; Madavan, R. Dissolved gas analysis of transformer: An approach based on ML and MCDM. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 2429–2438.
5. Ramesh, J.; Shahriar, S.; Al-Ali, A.R.; Osman, A.; Shaaban, M.F. Machine learning approach for smart distribution transformers load monitoring and management system. Energies 2022, 15, 7981.
6. Wani, S.A.; Rana, A.S.; Sohail, S.; Rahman, O.; Parveen, S.; Khan, S.A. Advances in DGA based condition monitoring of transformers: A review. Renew. Sustain. Energy Rev. 2021, 149, 111347.
7. Odongo, G.; Musabe, R.; Hanyurwimfura, D. A multinomial DGA classifier for incipient fault detection in oil-impregnated power transformers. Algorithms 2021, 14, 128.
8. Thango, B.A. Dissolved gas analysis and application of artificial intelligence technique for fault diagnosis in power transformers: A South African case study. Energies 2022, 15, 9030.
9. Xing, Z.; He, Y.; Wang, X.; Shao, K.; Duan, J. VMD-IARIMA-Based Time-Series Forecasting Model and its Application in Dissolved Gas Analysis. IEEE Trans. Dielectr. Electr. Insul. 2022, 30, 802–811.
10. Patil, M.; Paramane, A.; Das, S.; Rao, U.M.; Rozga, P. Hybrid Algorithm for Dynamic Fault Prediction of HVDC Converter Transformer Using DGA Data. IEEE Trans. Dielectr. Electr. Insul. 2024, 31, 2128–2135.
11. Wang, T.L.; Yang, J.G.; Li, J.S.; Li, Y.; Wu, P.; Lin, J.S.; Liang, J.B. Time Series Prediction of Dissolved Gas Concentrations in Transformer Oil using ARIMA-NKDE Method. In Proceedings of the 2022 2nd International Conference on Electrical Engineering and Control Science (IC2ECS), Nanjing, China, 16–18 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 129–133.
12. Mahamdi, Y.; Boubakeur, A.; Mekhaldi, A.; Benmahamed, Y. Power transformer fault prediction using naive Bayes and decision tree based on dissolved gas analysis. ENP Eng. Sci. J. 2022, 2, 1–5.
13. Ekojono; Prasojo, R.A.; Apriyani, M.E.; Rahmanto, A.N. Investigation on machine learning algorithms to support transformer dissolved gas analysis fault identification. Electr. Eng. 2022, 104, 3037–3047.
14. Al-Sakini, S.R.; Bilal, G.A.; Sadiq, A.T.; Al-Maliki, W.A.K. Dissolved Gas Analysis for Fault Prediction in Power Transformers Using Machine Learning Techniques. Appl. Sci. 2024, 15, 118.
15. Zeng, B.; Guo, J.; Zhang, F.; Zhu, W.; Xiao, Z.; Huang, S.; Fan, P. Prediction model for dissolved gas concentration in transformer oil based on modified grey wolf optimizer and LSSVM with grey relational analysis and empirical mode decomposition. Energies 2020, 13, 422.
16. Zhou, X.; Tian, T.; He, N.; Ma, Y.; Liu, W.; Yan, Z.; Luo, Y.; Li, X.; Ni, H. Prediction Method of Dissolved Gas in Transformer Oil Based on Firefly Algorithm-Random Forest. In Proceedings of the 2022 Asia Power and Electrical Technology Conference (APET), Shanghai, China, 11–13 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 51–55.
17. Wang, N.; Li, W.; Li, J.; Li, X.; Gong, X. Prediction of Dissolved Gas Content in Transformer Oil Using the Improved SVR Model. IEEE Trans. Appl. Supercond. 2024, 34, 9002804.
18. Elânio Bezerra, F.; Zemuner Garcia, F.A.; Ikuyo Nabeta, S.; Martha de Souza, G.F.; Chabu, I.E.; Santos, J.C.; Junior, S.N.; Pereira, F.H. Wavelet-like transform to optimize the order of an autoregressive neural network model to predict the dissolved gas concentration in power transformer oil from sensor data. Sensors 2020, 20, 2730.
19. Wang, L.; Littler, T.; Liu, X. Dynamic incipient fault forecasting for power transformers using an LSTM model. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 1353–1361.
20. Xing, M.; Ding, W.; Li, H.; Zhang, T. A power transformer fault prediction method through temporal convolutional network on dissolved gas chromatography data. Secur. Commun. Netw. 2022, 2022, 5357412.
21. Das, S.; Paramane, A.; Rao, U.M.; Chatterjee, S.; Kumar, K.S. Corrosive dibenzyl disulfide concentration prediction in transformer oil using deep neural network. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 1608–1615.
22. Shao, J.; Wang, J.; Pan, X.; Wang, R.; Liu, S.; Jin, Z.; Wang, Z. Probabilistic Modeling of Dissolved Gas Concentration for Predicting Operating Status of Oil-Immersed Transformers. IEEE Trans. Ind. Inform. 2024, 21, 1339–1348.
23. Luo, D.; Fang, J.; He, H.; Lee, W.J.; Zhang, Z.; Zai, H.; Chen, W.; Zhang, K. Prediction for dissolved gas in power transformer oil based on TCN and GCN. IEEE Trans. Ind. Appl. 2022, 58, 7818–7826.
24. Hu, C.; Zhong, Y.; Lu, Y.; Luo, X.; Wang, S. A prediction model for time series of dissolved gas content in transformer oil based on LSTM. In Proceedings of the Journal of Physics: Conference Series; IOP Publishing: Xi’an, China, 20–22 August, 2020; Volume 1659, p. 012030.
25. Zhang, Y.; Liu, D.; Liu, H.; Wang, Y.; Wang, Y.; Zhu, Q. Prediction of dissolved gas in transformer oil based on SSA-LSTM model. In Proceedings of the 2022 9th International Conference on Condition Monitoring and Diagnosis (CMD), Kitakyushu, Japan, 13–18 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 177–182.
26. Zhang, X.; Wang, S.; Jiang, Y.; Wu, F.; Sun, C. Prediction of dissolved gas in power transformer oil based on LSTM-GA. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2021; Volume 675, p. 012099.
27. Mahrukh, A.W.; Lian, G.X.; Bin, S.S. Prediction of power transformer oil chromatography based on LSTM and RF model. In Proceedings of the 2020 IEEE International Conference on High Voltage Engineering and Application (ICHVE), Beijing, China, 6–10 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
28. Zhang, W.; Zeng, Y.; Li, Y.; Zhang, Z. Prediction of dissolved gas concentration in transformer oil considering data loss scenarios in power system. Energy Rep. 2023, 9, 186–193.
29. Yang, T.; Fang, Y.; Zhang, C.; Tang, C.; Hu, D. Prediction of dissolved gas content in transformer oil based on multi-information fusion. High Volt. 2024, 9, 685–699.
30. Oprea, S.V.; Bâra, A. A stacked ensemble forecast for photovoltaic power plants combining deterministic and stochastic methods. Appl. Soft Comput. 2023, 147, 110781.
31. Zeng, T.; Wu, L.; Peduto, D.; Glade, T.; Hayakawa, Y.S.; Yin, K. Ensemble learning framework for landslide susceptibility mapping: Different basic classifier and ensemble strategy. Geosci. Front. 2023, 14, 101645.
32. Shahriar, M.S.; Shafiullah, M.; Pathan, M.I.H.; Sha’aban, Y.A.; Bouchekara, H.R.; Ramli, M.A.; Rahman, M.M. Stability improvement of the PSS-connected power system network with ensemble machine learning tool. Energy Rep. 2022, 8, 11122–11138.
33. Bhusal, N.; Shukla, R.M.; Gautam, M.; Benidris, M.; Sengupta, S. Deep ensemble learning-based approach to real-time power system state estimation. Int. J. Electr. Power Energy Syst. 2021, 129, 106806.
34. Liu, J.; Zhao, Z.; Zhong, Y.; Zhao, C.; Zhang, G. Prediction of the dissolved gas concentration in power transformer oil based on SARIMA model. Energy Rep. 2022, 8, 1360–1367.

Figure 1. Flowchart of transformer dissolved gas concentration prediction model based on D-S evidence theory-optimization ensemble learning.

Figure 2. Comparison area of prediction error.

Figure 3. Prediction accuracy comparison bar chart.

Figure 4. Predicted mean square error comparison radar chart.

Table 1. Prediction error results table.

	H2	C2H6	C2H4	CH4	Overall Deviation
Actual value	20.600	112.500	64.200	31.200	0
Bagging	20.333	110.362	64.433	31.905	3.343
SARIMA	19.198	111.890	64.638	31.890	3.140
Optimized Bagging based on D-S evidence theory	20.400	112.182	64.115	32.196	1.599

Table 2. Prediction accuracy results table.

	H2	C2H6	C2H4	CH4	Overall Accuracy
SARIMA	0.932	0.995	0.993	0.978	0.974
Bagging	0.987	0.981	0.996	0.977	0.985
Optimized Bagging based on D-S evidence theory	0.990	0.997	0.999	0.968	0.989

Table 3. Mean square error results of predictions.

	H2	C2H6	C2H4	CH4	Overall Mean Square Error
Bagging	0.071	4.571	0.054	0.497	5.194
SARIMA	1.966	0.373	0.192	0.476	3.006
Optimized Bagging based on D-S evidence theory	0.040	0.101	0.007	0.992	1.140


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
