Entropy-based Bagging for Fault Prediction of Transformers Using Oil-dissolved Gas Data

The development of the smart grid has resulted in new requirements for fault prediction of power transformers. This paper presents an entropy-based Bagging (E-Bagging) method for prediction of characteristic parameters related to power transformer faults. A comprehensive information entropy parameter of the sample data is introduced to improve the resampling process of the E-Bagging method. The generalization ability of E-Bagging is enhanced significantly by the comprehensive information entropy. A total of 1200 sets of oil-dissolved gas data of transformers are used as examples of fault prediction. Comparisons between E-Bagging, traditional Bagging, and individual prediction approaches are presented. The results show that E-Bagging achieves higher accuracy and greater stability of prediction than traditional Bagging and the individual prediction approaches.


Introduction
Prediction of potential failures in electrical power equipment can effectively improve the reliability of electrical grids. Because power transformers are very important equipment in electrical grids, their failure prediction has received attention from many researchers. Prediction of failures in power transformers consists of two steps. The first is to estimate characteristic parameters of transformer faults based on sample data related to transformer conditions. The second is to use a fault diagnostic tool to evaluate transformer faults in combination with the estimated characteristic parameters. Many approaches, including regression analysis [1], time series analysis [2], artificial neural networks (ANN) [3-5], grey models (GM) [6,7], and support vector machines (SVM) [8], have been used to estimate the characteristic parameters of transformer faults. The estimation accuracy of these approaches can be acceptable if sample data with sufficient accuracy are available. However, sample data of transformer faults may be obtained from various sources, and the accuracy levels of the sample data may differ. Trained on sample data with different accuracy levels, a fault prediction model may generate different results for the characteristic parameters of transformer faults. Designing a robust model of transformer fault prediction and improving its generalization ability remains an open task that is attracting more and more researchers.
Ensemble learning is an effective approach to enhance the generalization of machine learning algorithms. Bagging [9] and Adaboost [10] are two major ensemble learning methods that form a combined classifier or predictive tool based on data resampling. They are capable of improving the stability of classification or prediction models while keeping the maximum time complexity of the models unchanged. Adaboost is suited to combining weak classification or prediction models with strong independence, while Bagging is applicable to combining strong classification or prediction models. Some approaches, such as sub-Bagging [11] and weighted member models [12], have been proposed to improve ensemble learning with satisfactory results. This paper presents an improved Bagging algorithm for power transformer fault prediction based on oil-dissolved gas. The improved Bagging is called entropy-based Bagging (E-Bagging), because comprehensive information entropy is used to improve Bagging. The probability of selecting sample data for training is determined by the comprehensive information entropy of the sample data. Sample data containing a greater amount of information have a larger comprehensive entropy and a greater probability of being selected for training. Examples and analysis results show that the improved Bagging enhances the generalization and accuracy of transformer fault prediction based on oil-dissolved gas.

Basic Process of Bagging
As shown in Figure 1, Bagging is a parallel process for training a machine learning algorithm. Data subsets $S_1$ to $S_c$ are randomly selected from the whole data set U for training. This process is called sample data resampling. The data subset $S_i$ is used for training to obtain a member prediction function $H_i$. Many popular methods, such as SVM [8], ANN [3-5], and GM [6,7], can be used to construct the member prediction function. The combined prediction function H is derived from the member functions $H_1, H_2, \cdots, H_c$ by a weighted mean method as below [13]:

$$H = \sum_{i=1}^{c} \alpha_i H_i \qquad (1)$$

where $\alpha_i$ is the weight of member prediction function $H_i$ and $\alpha_i = 1/c$.

In Figure 1, $H_i$ could be a very time-consuming prediction model, such as an artificial neural network. If the whole sample data set were used to train each individual prediction model, the training time might be unacceptably long. However, each data subset is much smaller than the whole sample data set, so the time to train an individual model on a data subset is reduced considerably. Bagging can thus save much time by training all individual models in parallel and then forming the combined prediction model.

The parallel process of Bagging shows that resampling is a very important step for selecting representative sample data for training. It may influence the diagnosis or prediction results of Bagging. The usual resampling process is a simple random selection of sample data without prior information, so every sample has the same probability of being selected. Simple random resampling may fail to improve, or may even reduce, the accuracy of the diagnosis or prediction results of Bagging.
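The uniform resampling and equal-weight combination described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `draw_subsets` and `combine` are hypothetical, the member functions are toy stand-ins rather than trained models, and drawing each subset without replacement is an assumption, since the text only says the subsets are selected randomly.

```python
import numpy as np

def draw_subsets(n_samples, subset_size, n_members, rng):
    """Uniform resampling: every sample has the same selection probability.

    Assumption: indices are drawn without replacement inside each subset.
    """
    return [rng.choice(n_samples, size=subset_size, replace=False)
            for _ in range(n_members)]

def combine(members, x):
    """Equal-weight mean of member predictions: H(x) = sum_i (1/c) * H_i(x)."""
    c = len(members)
    alpha = np.full(c, 1.0 / c)                 # alpha_i = 1/c
    preds = np.array([h(x) for h in members])   # one prediction per member
    return alpha @ preds

rng = np.random.default_rng(0)
subsets = draw_subsets(n_samples=1200, subset_size=400, n_members=10, rng=rng)

# Toy stand-ins for trained member functions H_1..H_3 (not real models).
members = [lambda x, k=k: x + k for k in range(3)]
print(combine(members, 2.0))  # mean of the member outputs 2, 3 and 4
```

Because the members are independent of one another, the list comprehension in `combine` and the per-subset training that would precede it are the parts that can run in parallel.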

Data Resampling Based on Comprehensive Information Entropy
Sample data of transformer faults come from transformers at different voltage levels and under various operating conditions. Representative sample data should be selected with high probability during the resampling process of Bagging. A uniform standard is thus required to determine the selection probability of the sample data. A comprehensive information entropy parameter of the sample data is introduced below to determine this selection probability. The comprehensive information entropy is a weighted sample entropy of the sample data.

Computation of Sample Entropy
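Information entropy is a measure of the uncertainty associated with a random variable in information theory. Sample entropy was proposed as a complexity measure of time series in [14]. In this paper, it is used as an objective measure of the amount of information contained in sample data.
Let $U = [u_1, u_2, \cdots, u_N]$ denote an N-element sample data sequence of transformer faults. The sample entropy of the data sequence is computed in the following steps:

(1) Form a set of m-dimensional sample data segments defined as:

$$X(i) = [u_i, u_{i+1}, \cdots, u_{i+m-1}], \quad i = 1, 2, \cdots, N-m+1 \qquad (2)$$

(2) Calculate d[X(i), X(j)], the distance between X(i) and X(j), defined as the maximum absolute difference between corresponding elements of the two sample data segments:

$$d[X(i), X(j)] = \max_{k=0,\cdots,m-1} |u_{i+k} - u_{j+k}| \qquad (3)$$

If $u_i$ is a q-dimensional vector, the value of $|u_{i+k} - u_{j+k}|$ is calculated from the q components of the two vectors [equation (4) missing in source].

(3) Given a tolerance parameter r, count the segments X(j) for which d[X(i), X(j)] is smaller than r. $B_i^m(r)$ is a measure of the degree of similarity between the sample data segment X(i) and the sample data sequence U, defined as:

$$B_i^m(r) = \frac{1}{N-m}\,\mathrm{count}\{\, d[X(i), X(j)] < r,\ j \neq i \,\} \qquad (5)$$

where the function count(·) returns the number of segments X(j) for which d[X(i), X(j)] is smaller than r.

(4) For i ∈ [1, N − m + 1], calculate the average of $B_i^m(r)$:

$$B^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} B_i^m(r) \qquad (6)$$

(5) Form a new set of (m+1)-dimensional sample data segments as below, and repeat steps (2) to (4) to obtain $B^{m+1}(r)$:

$$X_{m+1}(i) = [u_i, u_{i+1}, \cdots, u_{i+m}], \quad i = 1, 2, \cdots, N-m \qquad (7)$$

(6) Calculate the sample entropy E of the sample data sequence as:

$$E(m, r) = \lim_{N \to \infty} \left[ -\ln \frac{B^{m+1}(r)}{B^m(r)} \right] \qquad (8)$$

When N is a finite number, the sample entropy of the sample data sequence can be estimated as:

$$E(m, r, N) = -\ln \frac{B^{m+1}(r)}{B^m(r)} \qquad (9)$$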
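The steps above can be sketched for a scalar sequence as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the names `match_fraction` and `sample_entropy` are hypothetical, self-matches are excluded, each count is normalized by the number of other segments of the same length, and the q-dimensional case is not covered because its distance formula is not available here. The defaults `m=2` and `r=0.2` are arbitrary.

```python
import numpy as np

def match_fraction(u, m, r):
    """B^m(r): average fraction of other length-m segments within tolerance r."""
    n_segments = len(u) - m + 1
    # Row i holds the segment X(i) = [u_i, ..., u_{i+m-1}].
    segs = np.array([u[i:i + m] for i in range(n_segments)])
    # Maximum absolute element-wise difference between every pair of segments.
    dist = np.max(np.abs(segs[:, None, :] - segs[None, :, :]), axis=2)
    # Matches per segment; subtract 1 to drop the self-match (requires r > 0).
    matches = np.sum(dist < r, axis=1) - 1
    return np.mean(matches / (n_segments - 1))

def sample_entropy(u, m=2, r=0.2):
    """Finite-N estimate E = -ln(B^{m+1}(r) / B^m(r)) for a scalar sequence."""
    u = np.asarray(u, dtype=float)
    b_m = match_fraction(u, m, r)
    b_m1 = match_fraction(u, m + 1, r)
    if b_m == 0 or b_m1 == 0:
        return float("inf")  # no matches found: the estimate is undefined
    return -np.log(b_m1 / b_m)

rng = np.random.default_rng(0)
regular = np.tile([1.0, 2.0], 50)   # strictly periodic sequence
noisy = rng.normal(size=100)        # irregular sequence
print(sample_entropy(regular, r=0.5), sample_entropy(noisy, r=0.5))
```

A periodic sequence yields an entropy close to zero, while an irregular sequence yields a clearly larger value, which is the property the resampling step relies on when it treats entropy as a measure of information content.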

Comprehensive Information Entropy
Comprehensive information entropy of the sample data segment $u_i$ is defined as:

$$C_i = \omega_i E_i \qquad (10)$$

where $E_i$ is the sample entropy associated with $u_i$ and $\omega_i$ is an additional weight of the sample entropy. Suppose $Y = [y_1, y_2, \cdots, y_n]$ is a data sequence of additional information about the transformers. Given a target vector $\hat{y}$, the additional weight of sample entropy can be computed as: [equation (11) missing in source] where $y_{ik}$ is the kth element of the vector $y_i$ and $\hat{y}_k$ is the kth element of the target vector $\hat{y}$. $\bar{y}_k$ and $\underline{y}_k$ represent the maximum and minimum of the kth elements of all vectors belonging to Y, respectively.
The additional weight of sample entropy is a normalized parameter between 0 and 1.

Procedures of Data Resampling
The probability of a sample being selected for training is calculated from its comprehensive information entropy as: [equation (12) missing in source]

Given a vector $I_1$ with m elements equal to zero and m < n, where n is the number of sample data, the selected sample data are denoted $S_1$. The data resampling procedure is as follows:

(1) Randomly generate a vector of random numbers $r_i$, each between 0 and 1;

(2) Let k = 0 and form a vector [the remainder of the procedure is missing in source].

If the Bagging consists of c member prediction functions, the above procedure is repeated to obtain $S_1, S_2, \cdots, S_c$ for training.
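One way to realize entropy-driven resampling is sketched below. It rests on two assumptions that are not confirmed by the text: the selection probability is taken to be proportional to the comprehensive information entropy, and each subset is drawn without repeated indices using NumPy's weighted `Generator.choice` rather than the step-by-step random-vector procedure above. The function names and the toy entropy values are hypothetical.

```python
import numpy as np

def selection_probabilities(comprehensive_entropy):
    """Assumed rule: probability proportional to comprehensive entropy."""
    c = np.asarray(comprehensive_entropy, dtype=float)
    if np.any(c < 0) or c.sum() <= 0:
        raise ValueError("entropy values must be non-negative with a positive sum")
    return c / c.sum()

def resample(p, subset_size, n_subsets, rng):
    """Draw index sets S_1..S_c of size m, with no repeats inside a set."""
    return [rng.choice(len(p), size=subset_size, replace=False, p=p)
            for _ in range(n_subsets)]

rng = np.random.default_rng(0)
entropy = np.array([0.1, 0.1, 0.1, 5.0, 5.0, 5.0])  # toy values, not from the paper
p = selection_probabilities(entropy)
subsets = resample(p, subset_size=2, n_subsets=1000, rng=rng)

# Count how often each sample index was selected across all subsets.
counts = np.bincount(np.concatenate(subsets), minlength=len(p))
print(counts)  # the three high-entropy samples are picked far more often
```

Passing a constant entropy vector makes every probability equal to 1/n, which recovers the simple random resampling of traditional Bagging.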

E-Bagging Procedures
The procedures of the E-Bagging algorithm are as follows:

(1) Calculate the comprehensive information entropy of the sample data and use (12) to obtain $S_i$, where i = 1, 2, ⋅⋅⋅, c;

(2) Use $S_i$ to train the member prediction function $H_i$;

(3) Repeat steps (1) and (2) until the training of the member prediction function $H_c$ is complete;

(4) Combine the member prediction functions $H_1, H_2, \cdots, H_c$ to obtain the ensemble prediction function H by using (1).

From the above procedures, it can be seen that traditional Bagging is a special case of E-Bagging in which all sample data have the same selection probability. The maximum time complexity of computing the comprehensive information entropy in the E-Bagging algorithm is O(mNc), where m is the number of sample data in every subset, N is the number of the whole sample data, and c is the number of data subsets. This cost is much smaller than the time required by a member prediction algorithm used in E-Bagging. Therefore, the time complexity of the E-Bagging algorithm is the same as that of the traditional Bagging algorithm.
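The four steps can be put together in a compact sketch. Everything named here is hypothetical: `fit_linear` is a least-squares line standing in for a real member model such as GM, SVM, or BPNN, the selection probabilities `p` are assumed to be precomputed from the comprehensive information entropy, and the synthetic data are placeholders rather than oil-dissolved gas measurements.

```python
import numpy as np

def fit_linear(x, y):
    """Hypothetical member model: a least-squares line fitted to one subset."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda q: slope * q + intercept

def e_bagging(x, y, p, subset_size, n_members, rng, fit=fit_linear):
    """Resample with probabilities p, train c members, return their mean."""
    members = []
    for _ in range(n_members):
        # Steps (1)-(2): draw S_i according to p and train H_i on it.
        idx = rng.choice(len(x), size=subset_size, replace=False, p=p)
        members.append(fit(x[idx], y[idx]))
    # Step (4): equal-weight combination of the member predictions.
    return lambda q: np.mean([h(q) for h in members], axis=0)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.05, size=x.size)  # synthetic target

entropy = rng.uniform(0.5, 1.5, size=x.size)  # placeholder entropy values
p = entropy / entropy.sum()
H = e_bagging(x, y, p, subset_size=20, n_members=10, rng=rng)

# Equal probabilities reduce the procedure to traditional Bagging.
uniform = np.full(x.size, 1.0 / x.size)
H_uniform = e_bagging(x, y, uniform, subset_size=20, n_members=10, rng=rng)
print(H(0.5), H_uniform(0.5))
```

Keeping the member model behind the `fit` argument mirrors the structure of the method: the resampling rule changes between Bagging and E-Bagging, while the member prediction algorithm stays the same.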

Processing of Sample Data
The fault prediction of transformers is based on a total of 1200 sets of sample data of gas dissolved in the oil of transformers with voltage levels between 50 kV and 220 kV. The sample data are normalized as: [equation missing in source] where $u_{ij}$ is the jth element of the sample data $u_i$. Because the elements of each sample data vector correspond to the measurement results of five types of oil-dissolved gas, H2, CH4, C2H2, C2H4, and C2H6, each sample data vector has five elements.

Voltage level and operation time are two items of additional information about the transformers. Table 1 shows the voltage levels and the partition of operation time of the transformers. The mapping values of the additional information are also presented in Table 1. Every data vector of additional information thus consists of two elements according to Table 1. For example, if $y_i$ denotes the additional information of a 110 kV transformer that has been in operation for 15 years, $y_i = [2, 3]$.

Prediction Accuracy
Table 2 shows the average relative errors of the prediction results of E-Bagging. Four prediction models, GM [6], SVM [8], back-propagation neural network (BPNN) [5], and a combination forecasting method (CF) [15] based on the previous three prediction models, are used as individual models for Bagging. For E-Bagging, the size of each data subset is m = 400 and the number of member functions is c = 10. The average relative errors of these four models and of traditional Bagging based on the four models are also presented in Table 2. As can be seen from Table 2, the average relative errors of E-Bagging are smaller than those of both Bagging and the individual models. This indicates that the improvements in E-Bagging increase the accuracy of transformer fault prediction.

Prediction Stability
The above 1200 sets of sample data were divided into five groups for testing the prediction stability of E-Bagging. The first group consists of 800 sample data. The first 100 sample data of the first group were replaced by the first 100 of the remaining 400 sample data to obtain the second group. The third group was obtained by taking the first 200 of the remaining 400 sample data to replace the first 200 sample data of the first group. In the same way, 100 more sample data of the first group were replaced each time to obtain the other two groups. From the first group to the fifth group, the change in the sample data increases. Each of the five groups was used for training in turn.
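The construction of the five training groups can be written as index arithmetic. The sketch below assumes the samples are addressed by their storage order, with indices 0 to 799 forming the first group and indices 800 to 1199 forming the remaining 400; the function name `build_groups` is hypothetical.

```python
import numpy as np

def build_groups(n_total=1200, base_size=800, step=100, n_groups=5):
    """Return index arrays for the training groups described above."""
    base = np.arange(base_size)            # first group: samples 0..799
    spare = np.arange(base_size, n_total)  # remaining samples 800..1199
    groups = []
    for g in range(n_groups):
        k = g * step
        # Replace the first k base samples with the first k spare samples.
        groups.append(np.concatenate([spare[:k], base[k:]]))
    return groups

groups = build_groups()
overlap = [len(np.intersect1d(g, groups[0])) for g in groups]
print(overlap)  # number of samples each group shares with the first group
```

Every group keeps 800 samples, and the overlap with the first group shrinks by 100 per step, so the fifth group shares only half of its samples with the first.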
Table 3 shows the average relative errors of E-Bagging, Bagging, and the individual models, each trained with the five groups of sample data. The prediction of E-Bagging is stable, even though different sample data are used for its training. In contrast, the average relative errors of the individual models change more than those of E-Bagging when the five groups of sample data are used for training. To compare the stability of the prediction methods, the standard deviation σ of the average relative error is used as a quantification measure: [equation missing in source] where $e_i$ is the average relative error of fault prediction and $\bar{e}$ is the average of the $e_i$.
Table 4 shows the standard deviations of the average relative errors of prediction obtained with E-Bagging, Bagging, and the individual models. Except when SVM is used as the member model, E-Bagging generates a significantly smaller standard deviation of the average relative error of fault prediction than Bagging. E-Bagging also generates a much smaller standard deviation of the average relative error of fault prediction than the individual models. Table 4 also shows the standard deviation of the average relative error of E-Bagging when $\omega_i \equiv 1$. The additional weight of sample entropy can be used to increase the stability of E-Bagging, except when SVM is used as the member model.
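The stability measure can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the average relative error is taken as the mean of |predicted − actual| / |actual|, and the standard deviation uses the population form (`ddof=0`), which can be switched to the sample form with `ddof=1`. The function names are hypothetical and the error values are made-up illustrations, not results from Tables 3 or 4.

```python
import numpy as np

def average_relative_error(predicted, actual):
    """Assumed definition: mean of |predicted - actual| / |actual|."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

def error_std(avg_rel_errors, ddof=0):
    """Standard deviation of the per-group average relative errors e_i."""
    return float(np.std(np.asarray(avg_rel_errors, dtype=float), ddof=ddof))

# Made-up errors for five training groups (illustration only).
steady = [0.10, 0.11, 0.10, 0.09, 0.10]    # a method that barely reacts
drifting = [0.10, 0.14, 0.18, 0.22, 0.26]  # a method that degrades
print(error_std(steady), error_std(drifting))
```

A smaller σ means that the prediction error changes less as the training data change, which is the sense in which the text calls a method more stable.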

Conclusions
This paper presents an E-Bagging method based on comprehensive information entropy. Examples and analysis of transformer fault prediction are also presented. The results of the work can be summarized as follows: (1) Resampling is an important process of Bagging. The comprehensive information entropy of sample data helps to select representative sample data for training during the resampling process and to improve the generalization ability of Bagging.

Figure 1. Basic process of Bagging for an ensemble of prediction functions.

Figure 2 further shows the average relative errors of E-Bagging, Bagging, and CF. CF is the individual model used in both E-Bagging and Bagging. The change in the average relative error of E-Bagging based on CF is the smallest among the three prediction approaches. When no ensemble is used, CF shows a much greater change in the average relative error of prediction than E-Bagging and Bagging.

Figure 2. Error rate of CF by different ensemble methods.

(2) The E-Bagging method improves the prediction accuracy of transformer faults. Based on 1200 sample data of oil-dissolved gas, E-Bagging generates significantly smaller average relative errors of transformer fault prediction than traditional Bagging and the individual prediction algorithms. (3) E-Bagging shows good generalization ability in the prediction of transformer faults. Examples of training with various sample data show that the stability of E-Bagging is greater than that of traditional Bagging and the individual prediction algorithms.

Table 1. Mapping values of additional information of transformers.

Table 2. Average relative error of fault prediction.

Table 3. Average relative error of fault prediction.

Table 4. Standard deviation of average relative error of fault prediction.