Machine Learning-Based Sensor Data Modeling Methods for Power Transformer PHM

An emerging prognostic and health management (PHM) technology has recently attracted a great deal of attention from academies, industries, and governments. The need for higher equipment availability and lower maintenance cost is driving the development and integration of prognostic and health management systems. PHM models depend on the smart sensors and data generated from sensors. This paper proposed a machine learning-based methods for developing PHM models from sensor data to perform fault diagnostic for transformer systems in a smart grid. In particular, we apply the Cuckoo Search (CS) algorithm to optimize the Back-propagation (BP) neural network in order to build high performance fault diagnostics models. The models were developed using sensor data called dissolved gas data in oil of the power transformer. We validated the models using real sensor data collected from power transformers in China. The results demonstrate that the developed meta heuristic algorithm for optimizing the parameters of the neural network is effective and useful; and machine learning-based models significantly improved the performance and accuracy of fault diagnosis/detection for power transformer PHM.


Introduction
Prognostic and System Health Management (PHM) generally provides capabilities such as fault detection, fault prediction, and component life tracking to assess product reliability.PHM technologies include sensing, anomaly detection, diagnosis, prediction and decision support for intelligent machinery maintenance and health operation.Taking advantage of advances in sensor technologies, PHM enables a pro-active fault prevention strategy through continuously monitoring the health of complex systems.A power transformer is a piece of equipment that is of great importance to the electronic system.Thus, its performance can have a great impact on the power grid [1][2][3].Power transformer aging is an important factor leading to grid failure, which can also cause three main fault types in transformers: electrical, mechanical, and thermal failure.Among them, mechanical failure ranks first [4,5].Therefore, it is critical to improve the accuracy of fault diagnosis of power transformers [6,7].Some traditional methods for fault diagnosis of transformers such as dissolved gas analysis (DGA) [8][9][10], short circuit reactance (SCR) [11], and frequency response analysis (FRA) [12] have been widely used in industries.Nevertheless, these methods were limited by the low accuracy of fault diagnosis when the component of the dissolved gas in oil is complicated.High-dimensional fault data on power transformers can lead to the nonlinearity of the whole system, and FRA and SCR in this condition cannot find the real locations of fault, and also cannot provide the information about the types of transformers [11].
The methods of power transformer faults' diagnosis [13] include mainly the International Electrotechnical Commission (IEC) four-ratio and the three-ratio method, characteristic gas method and so on.However, these methods generate large errors in the diagnosis of power transformers.The accuracy will be greatly reduced when the sample data are too small or there are some outliers the samples.Therefore, artificial intelligence technology with excellent performance is desired to be used in transformer fault diagnosis.Intelligent algorithms based on the DGA data are the widely-used methods in transformer fault diagnosis, especially the back-propagation (BP) neural network [14,15].The BP neural network can be utilized to find the connection weights and bias to implement accurate diagnostic methods or models for DGA.The updated parameters of BP neural network follow the rule of gradient descending to avoid mistaking the parameters as the optimal parameters.
Nowadays, many smart optimization algorithms and machine learning algorithms have been applied to different domains such as power transformers since these methods have great fault diagnosis performance.There are plenty of power transformer fault diagnoses and other cutting-edge research.In the fault diagnosis of power transformers, various intelligent and machine learning methods are used to detect the state of transformers.
As for power transformer fault diagnosis, Khmais et al. [16] developed a fault classification method of power transformer based on support vector machine (SVM) using train data to build a multi-layer SVM classifier.This classifier has superior performance in identifying transformer fault types.Li et al. [17] presented an intelligent method for power transformer fault diagnosis based on selected gas ratio and SVM.They used a genetic algorithm (GA) to obtain the optimal dissolved gas ratio (ODGR) for DGA ratio selection and support vector machine parameter optimization.Three and four-digit coding with faulty information and fuzzy logic is used to improve the result by Hooshmand et al. [18].The method has been applied to the diagnosis of dissolved oil in the transformer.Wang [19] developed a new transformer fault diagnosis method based on a probabilistic neural network (PNN) and dissolved gas analysis.A hybrid evolutionary algorithm based on particle swarm optimization (PSO) and BP is used to optimize the parameters of PNN.In order to solve the problem of power transformer accidents, Trappey et al. [20] developed an intelligent engineering asset management system.Data-driven models are used to detect potential faults in transformers.The Principal component analysis (PCA) and BP-Artificial Neural Network (BP-ANN) are used as prediction models to carry out this task.Zheng et al. [21] proposed a transformer solubility prediction method based on PSO and least squares support vector machine (LS-SVM).The results demonstrated that the method is superior to BPNN, Generalized Regression Neural Network (GRNN), Radial Basis Function Neural Network (RBFNN) and Support Vector Regression (SVR) methods.
With regard to another piece of equipment detected by novel methods, Zhou et al. [22] presented a method of intelligent fault diagnosis based on ontology and FMECA (Failure Mode, Effects and Critically Analysis) for the fault diagnosis of wind turbines.This method realizes the knowledge sharing between deep knowledge and shallow knowledge, improves the fault diagnosis ability and makes a better decision for the diagnosis system.In order to improve the efficiency and accuracy of transient probability analysis of flexible mechanisms, a dynamic network method (DNNM) based on Improved PSO/Bayesian regularization (BR) is proposed by Song et al. [23].The results show that the method improves the computational efficiency and provides a meaningful insight for flexible mechanisms.In order to address the problem of rolling bearing tip under complex working conditions, it is often affected by mechanical and electrical system faults.Therefore, Lu et al. [24] proposed a deep learning method based on a convolutional neural network (CNN).Evssukoff and Gentil [25] proposed a recursive neural fuzzy system for fault detection and isolation in nuclear reactors.It generates good performance in detecting and isolating various security related faults.This paper is extended from the DPDC 2008 conference, entailed, "Cuckoo Search Optimized NN-based Fault Diagnosis Approach for Power Transformer PHM" [26].Based on the recommendation from the SPSC 2008 committee, we extensively rewrote the paper by extending the experiments and providing more validation results obtained from real transformer sensor data collected in a smart grid.The main contributions are as follows: (1) to develop machine learning-based models for transformer PHM, we proposed a novel method to enhance the cuckoo search algorithm for optimizing the parameters of multi-layer back-propagation neural network for fault diagnosis of a power transformer.
(2) We introduce the important factors such as improvement rate (IR) to update the function of cuckoos (solutions).(3) Given the mutation of the process of finding optimal solution, we consider that the mutation of solution x, which is controlled by mutation probability P m .(4) We evaluated the developed machine learning-based PHM models by using the real operational data collected from power transformers in a smart grid.The results demonstrated the high performance of the PHM models for transformer fault diagnosis.
The paper is organized as follows.After the Introduction section, Section 2 presents the machine learning-based method, using a Cuckoo search algorithm to optimize the BP neural network for power transformer fault diagnosis.Section 3 introduces the developed machine learning-based model for power transformer fault diagnosis; Section 4 presents the experiments and the results; Section 5 discusses the results and draws the conclusions.

Modified Cuckoo Search (MCS) Algorithm
Cuckoo Search Algorithm (CS) is a nature-inspired meta heuristic algorithm which imitates parasitic brood behavior of cuckoos [27].To simulate the behavior of cuckoo nesting, the CS algorithm sets three rules.The cuckoo produces an egg each time, which represents a solution to the problem, and randomly places the eggs in a nest for hatching.In addition, the number of nests is fixed and set a value P a ∈ (0, 1) to describe the probability that the nest owner finds the that the egg is a foreign egg.CS is enhanced by the Levy flight so that CS can explore global space and local space of solution and combine them with local search and global search mechanisms that make itself efficient [28].In addition, important parameters P a and step-size α of CS algorithm in fine-tuning of solution vectors are used to adjust the convergence rate of the algorithm.However, the standard CS algorithm uses a constant value for these parameters by the experience.Unquestionable parameter setting and constant parameters during iterations will decrease the performance of CS algorithm [29].
Thus, in order to improve the ability and overcome disadvantages, a modified Cuckoo Search Algorithm (MCS) is proposed in [30], which the main task is to implement the iterative process in which parameters P a and α are updated via function in the appropriate range.
In order to use feedback information during evolution, parameters P a and α are set as proportional to the improvement rate (IR).In addition, the IR can be computed by where where NI is the number of improvement of solutions.f is the fitness function we set.NN is the total population size.
The discovery probability P a and step size α are dynamically updated as follows: where α max and α min are the maximum and minimum values of step size α, respectively.P amax and P amax are the maximum and minimum values of discovery probability P a , respectively.m and n are nonlinear factors for adjusting the speed of change of the control parameters.
There are two different strategies in MCS for exploration and exploitation.The first strategy uses Mantegna's algorithm [31] as follows: where Here, L is the characteristic scale of the data set.α in Equation ( 5) is the step size.λ is the Levy exponent which controls the scale of distribution.s is the step size that can be computed as follows: where U ∼ N(0, σ 2 ), V ∼ N(0, 1).In addition, σ 2 can be calculated as follows: The second strategy is to attract the closest individuals of the current solution and conduct a global random walk.
These two strategies are randomly selected by the switching probability P c , and the second strategy is described as follows: where the integers q1, q2 and q3 represent three mutually different indices randomly selected in the range [1, 2, ..., NN], which are different from the integer i. T is scaling factor and t 1 is a random number within the interval [0, 1].
In addition, to strengthen the global search capability of MCS, a mutation strategy is also introduced through mutation probability P m .In addition, the mutation is as follows: where t 2 is a random number in the interval [−1, 1].i and j are different integers selected within the range [1, 2, ..., NN].k is an integer within the range [1, 2, . . ., D], D is the solution space dimension.
In addition, the parameter Pa is to judge the probability of hosts finding exotic birds' eggs.It can determine whether to generate the next new nest.At this time, the location update equation is given by where the integers q 1 , q 2 and q 3 denote three different integers.r 1 and r 2 are randomly generated numbers in [0, 1].
The initial location of MCS can be expressed as: where the Ub and Lb are the upper and lower bounds of the search space, respectively.The pseudo-code of modified CS algorithm is shown as Algorithm 1.
Algorithm 1: Pseudo-code of the modified CS algorithm

Back-Propagation (BP) Neural Network
Back-propagation (BP) neural network is a multi-layer feed-forward neural network, which belongs to an uncertain nonlinear mathematical model [32][33][34].The BP network consists of an input layer, hidden layer and output layer.The two processes of forward propagation and back propagation are of great importance to the BP neural network [35,36].The BP network can have better performers in classification and prediction because of the combination of these two processes.In the forward propagation, the data are passed through the input layer and combined with the hidden layer weights and thresholds to calculate layer by layer, and finally reach the output layer to obtain the classification result.In back propagation, when the output in the output layer does not comply with expectations, the error signal will propagate back.It uses an error gradient descent algorithm to reduce the mean square error (MSE) between the network output value and the actual output value, and the network adjusts the weights and thresholds layer by layer from the output layer to all hidden layers.Finally, the corrected result is output to the output layer.

Feed-forward
After recording the input value vector x, the activation a l in the input layer l can be computed in a simple and compact vectorized form: where w l and b l j are the weight and the bias between the (l − 1)th and the lth layer.To set the corresponding activation, this paper uses the most popular sigmoid function: The quadratic error criterion function of sample n is C: •

Back-Propagation
While reaching the layer L, the output error δ L can be calculated by ∇ a C contains the rate of C changing.denotes the entry-wise product of two vectors.Subsequently, the error in the next layer is where (w l+1 ) T is the transpose for the (l + 1)th layer.σ (z l ) is the Hadamard product within the interval .
According to the error gradient descent method, the threshold can be calculated as follows: Any weight in the network is: By combining Label (11) with Label (12), the error goes backward through the activation function in layer l.
The BP neural network model structure can be seen in Figure 1.

Muti-Hidden layer
Output layer

MCS Optimized BP Neural Network (MCS-BP)
The fault diagnosis of a power transformer based on an MCS optimized BP neural network can be used as a comprehensive diagnosis platform, which combines the data of gas in oil with the detection system, and then obtains good results by supervised learning methods.
As shown in Figure 2, MCS optimizes the block diagram of BP neural network.The following are the main steps: Step 1: At first, use an IEC three-ratio method to process the features of DGA data of power transformer.
Step 2: Randomly choose the different types of faults of power transformers into the neural network.
Step 3: Initialize the parameters of the BP neural network.
Step 4: Initialize the modified cuckoo search size x i (i = 1, 2, ..., N), population size N, switching probability P c , mutation probability P m , and value of step size α 0 , maximum value of step size α max , minimum value of step size α min , maximum value of discovery probability P amax , minimum value of discovery probability P amin , nonlinear factor m and n, scaling factor F and the fitness function f (x).
The fitness function we used in this paper is the mean square error (MSE), as follows: where Y i is the measure value and O i is the predicted result.
Step 5: Calculate the fitness value of the initial nest via the fitness function, and then select the current optimal solution in the solution space.
Step 6: Generate a random number K1 and compare with P c .Compare K1 and P c , if K1 > P c , update nests x t+1 i via Equation ( 5), otherwise by Equation (9).
Step 7: Generate a random number K2 and compare with P m of MCS.If K2 < P m , perform the mutation via Equation (10); otherwise, it is unchanged.
Step 8: Calculate the updated solution's fitness value and update the discovery probability P a and the step size α via Equation (3) and Equation (4).
Step 9: Generate a random number K3 and compare with P a .If K3 > P a , update nests x t+1 i via Equation (11), or do not change.Compare the last fitness values with new birds' nests, keep the optimal bird's nest as the contemporary best nest x b .
Step 10: If it can reach the maximum iteration condition, proceed to the next step, or return to Step 6.
Step 11: Substitute the optimized weights and bias of the BP neural network.
Step 12: Input the test set into the trained BP neural network to get the classification output.

MCS-BP for Power Transformer Fault Diagnosis Platform
In this paper, power transformer fault diagnosis is mainly divided into four parts: data collection and preprocessing, segmentation of data set, neural network model-training, and comparison between test set output and train set output, as is shown as Figure 3.In Figure 3, firstly power transformer DGA data will be processed in feature selection via the IEC three-ratio method.This procession can be seen in Table 1.Then, 70% of the data can be used in the training model, which has been sorted randomly to ensure the train set and test set containing all types of faults.The other 30% of the data is utilized to test the optimized model.In this study, we test five types of faults of power transformers, which are the thermal faults T > 700 • C, thermal faults T < 300 • C, high energy discharge, low energy discharge and partial discharge.It can be seen as Table 2, and each group of data is balanced.There are 109 sets of data.
Table 2. Fault type used in analysis.Through this optimization model, the potential faults of power transformers can be predicted and classified.

Experimental Setup
In order to evaluate the performance of the proposed method in power transformer fault diagnosis, we obtain real-world data to implement experiments.By IEC three-ratio processing 109 sets of DGA data, the feature-filtered data sets are obtained, some of which are shown in Table 3.In this paper, the neural network is used as the basic classifier, so the encoding method of data is shown in Table 4.The MCS-BP method is compared with other excellent predictive classifiers BP, CS-BP, Multi-Verse Optimizer-Multi-Layer Perceptron (MVO-MLP), PSO-BP, GA-BP, PNN and SVM, respectively.Firstly, the accuracy and error rate of the algorithm of MCS-BP is compared with CS-BP and BP, and the superiority of the algorithm of MCS in the optimization of a neural network is proved.Secondly, by comparing the MCS-BP algorithm with other machine learning algorithms and optimization algorithms, it is proved that the algorithm has strong robustness and classification performance.

Experimental Results
Firstly, the MCS-BP algorithm and CS-BP algorithms are compared.In this experiment, the parameters of the BP neural network are the same, its structure is 3-4-5, three-dimensional input, five-dimensional output, for MCS and CS, the MCS algorithm sets the Levy exponent λ to 1.5, the step size α 0 to 0.1, α max to 0.5, α min to 0.05, the discovery probability P amax to 0.5, P amin to 0.1, P c to 0.3, P m to 0.3, m to 0.5, and n to 0.5.The CS algorithm sets the discovery probability P a to 0.25, λ to 1, and the step size α to 0.4.The accuracy of classification is shown in Table 5.For five types of fault outputs, MCS-BP has a high recognition rate of 97.14% with asterisk, and the classification recognition rate of different fault types is higher than or equal to the BP and CS-BP algorithms.
As shown in Table 6, the training set and test set MSE of MCS-BP are the smallest.Compared with the other two algorithms, the performance of MCS is better than that of normal BP and standard CS.Next, we use many different algorithms to further evaluate the classification performance of MCS-BP.The GA sets the crossover probability to 0.7 and the mutation probability to be 0.01.The PSO algorithm sets the maximum speed to 1, the minimum speed to −1, the solution space is [−5, 5] and the learning factor to 1.49445.The recognition rate is shown in Table 7. From Table 7, it can be found that the recognition rate and Micro F1-score of MCS-BP is obviously higher than that of other algorithms, and for fault type T1 and PD, the classification recognition rate of other algorithms is very low, and the diagnosis rate of MCS-BP is still very high.The error rate comparison is shown in Table 6, from which we can see that the MSE of MCS-BP is still the smallest, indicating the excellent performance of MCS-BP.From Table 8, we further know that the performance of the classification of MCS-BP is better than other models via each fault type F1-score and the Macro F1-score (98.46%).
In order to further evaluate the performance of MCS-BP, we plot the output results of different algorithms, such as Figure 4, Figure 4a,c,e,g for the output of training set and the layout of predicted classification results, and Figure 4b,d,f,h for the output of test set classification.Its test set classification effect is not ideal; however, the MCS-BP (a), (b) shows a good recognition rate, and (b) shows that only the third fault type judgment is an error.By Figure 4, we know that the developed method is better than other algorithms with respect to stability as the test data results can reflect that our model won't fall into the problem of over-fitting, and both the train sample and test samples have great classification results.It indicates that this model is helpful for fault diagnosis of power transformer because it can give a suitable decision for which the fault type of power transformer is contained.
In addition, to clarify the optimization effect of MCS on BP during the iteration process, we consider the variation of the fitness (error) of the MCS, as shown in Figure 5. From generation 1 to generation 50, the fitness value of MCS in the process of optimizing neural network decreases rapidly in generations 1 to 4, which shows that the efficiency of global optimization and local optimization of MCS is very high.Then, it falls into a local extremum in generation 18 or so.However, after a period of iteration, it jumps out of a local extremum point and continues to search for optimization.In addition, we completed the Receiver Operating Characteristic (ROC) Curve, which is shown in Figure 6, and obtain the value area under the curve (AUC).Both of them can prove that our model has great performance of classification for power transformers.For Figure 6a, we can see that each fault class can be classified well.

Conclusions
In this paper, we propose a machine learning-based method, CS optimized BP neural network model for power transformer fault diagnosis.This algorithm can adjust the search step of solution space adaptively to find a better global optimal solution, and the fitness value of each solution is utilized to build the mutation probability to avoid local convergence.In addition, the MCS enhances the exploitation capacity and convergence rate.We conducted the experiments to validate the developed models by using 109 sets of real-world data collected from power transformers.Compared with other algorithms, experimental results show that the MCS method we developed outperformed other methods and can converge to the optimal solution for most test cases.
To validate the machine learning-based models or methods for fault diagnosis, more extensive experiments and more advanced metrics and evaluation tools are in high demand.This will be our future work.We will continue to enhance the performance of the algorithms and models and evaluate the performance of the models under different circumstances of the error rate and operating efficiency, using other evaluation tools and metrics.

Figure 3 .
Figure 3. Structure flow of the power transformer fault diagnosis process.

Figure 4 .
Figure 4.The classification results of different models.(a), (c), (e) and (g) represent the results of train sample classification for different methods, respectively.(b), (d), (f) and (h) are the results of test sample classification for different methods, respectively.

Figure 5 .
Figure 5.The curve of fitness of MCS-BP.

Figure 6 .
Figure 6.The curve of ROC-AUC of different models.(a-d) represent the receiver operating characteristic curve with AUC for MCS-BP, MVO-MLP, PSO-BP and GA-BP, respectively.

Table 3 .
Statistical data of partial samples.

Table 4 .
Output target coding of different faults.

Table 5 .
The comparison of basic methods (* means the best result in the table).

Table 6 .
Comparison of sample errors.

Table 7 .
The comparison of different methods (* means the best result in the table).

Table 8 .
The comparison of different methods with F1-score (* means the best result in the table).