An Integrated Model for Transformer Fault Diagnosis to Improve Sample Classification near Decision Boundary of Support Vector Machine

Abstract: Support vector machine (SVM), an artificial intelligence technique, has been widely employed in transformer fault diagnosis based on dissolved gas analysis (DGA). However, SVM easily misclassifies samples located near the decision boundary, which lowers the accuracy of fault diagnosis. To address this issue, this paper proposes a genetic algorithm (GA) optimized probabilistic SVM (GAPSVM) integrated with the fuzzy three-ratio (FTR) method. The GAPSVM judges from its output probabilities whether a sample is near the decision boundary and diagnoses the samples that are not; FTR then diagnoses the samples that are near the decision boundary. By combining GAPSVM and FTR, the integrated model can accurately diagnose samples near the decision boundary of SVM. In addition, to avoid redundant and erroneous features, GA is also used to select the optimal DGA features. The diagnostic accuracy of the proposed GAPSVM integrated with the FTR fault diagnosis method reached 86.80% after 10 repeated calculations using 118 groups of IEC technical committee (TC) 10 samples. Moreover, the robustness is also demonstrated on 30 groups of DGA samples from the State Grid Co. of China and 15 practical cases with missing values.


Introduction
Oil-immersed power transformers are important pieces of power transmission equipment in the power system. Transformer failure causes widespread power blackout, resulting in economic losses that cannot be estimated [1][2][3]. Therefore, the safe and stable operation of the transformer is important to the power system, and it is of great importance to diagnose transformer faults such as over-heating and discharges in time and correctly.
In the existing research, dissolved gas analysis (DGA) has been widely used as the on-line fault monitoring approach for power transformer fault diagnosis. The gases dissolved in the transformer oil mainly include hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), and ethane (C2H6).

Gas Features Dissolved in Oil
The conventional DGA features mainly include H2, CH4, C2H2, C2H4, C2H6, CO, CO2, and total hydrocarbon (TH). To find the optimal DGA features (ODF), the contents of these gases and the ratios of every two gas contents form the set of candidate DGA features. The candidate features are numbered in Table 1: No.1-No.8 are the conventional DGA features, and No.9-No.36 are the ratios of every two gas contents. The ODF is selected from these 36 candidates. Feature engineering is an important procedure in machine learning. Redundant features reduce the calculation speed of the algorithm, and incorrect features may reduce its accuracy [27]. The feature selection method based on GA and SVM proposed in [28] is improved and used in this work for ODF selection.
The chromosomes of GA are generated by binary coding, as shown in Figure 1. Each chromosome consists of three genes. The first two genes are the penalty factor c and the kernel parameter σ of SVM; the third gene encodes the 36 DGA features in order. The encoding "1" represents a DGA feature that has been selected, while "0" represents one that has not been selected. The parameter settings of GA are shown in Table 2. The ODF is obtained by GA iterations using k-fold cross-validation (CV) accuracy as the fitness function.
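The chromosome layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of bits per parameter (`n_bits`), the search ranges for c and σ, and the ordering of the 28 ratio features are hypothetical, since Table 1, Table 2, and Figure 1 are not reproduced here.

```python
from itertools import combinations

# The eight conventional DGA features named in the text.
GASES = ["H2", "CH4", "C2H2", "C2H4", "C2H6", "CO", "CO2", "TH"]


def feature_names():
    """Return 36 candidate features: 8 gas contents, then 28 pairwise ratios.

    The ordering of the ratios is an assumption, not the source's Table 1.
    """
    ratios = [f"{a}/{b}" for a, b in combinations(GASES, 2)]
    return GASES + ratios


def bits_to_value(bits, low, high):
    """Map a binary string linearly onto the interval [low, high]."""
    max_int = 2 ** len(bits) - 1
    return low + (high - low) * int(bits, 2) / max_int


def decode(chromosome, n_bits=10, c_range=(0.1, 100.0), sigma_range=(0.01, 10.0)):
    """Split a binary chromosome into (c, sigma, selected feature names).

    n_bits, c_range and sigma_range are hypothetical illustration values.
    """
    c_bits = chromosome[:n_bits]
    sigma_bits = chromosome[n_bits:2 * n_bits]
    mask = chromosome[2 * n_bits:]
    if len(mask) != 36:
        raise ValueError("the feature gene must contain exactly 36 bits")
    selected = [name for name, bit in zip(feature_names(), mask) if bit == "1"]
    c = bits_to_value(c_bits, *c_range)
    sigma = bits_to_value(sigma_bits, *sigma_range)
    return c, sigma, selected
```

In a GA loop, the fitness of each chromosome would then be the k-fold CV accuracy of an SVM trained with the decoded c, σ, and selected feature columns.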
The conventional SVM is a linear, two-class classifier, which must be extended because transformer fault diagnosis is a nonlinear, multi-class problem. The nonlinear SVM model and its flowchart are shown in Figure 1.
where ξ_i is a slack variable and C is a penalty factor. The Lagrange function is presented as follows:
Additionally, the decision function is:
where K(x_i, x_j) is the kernel function, which maps the low-dimensional space to a high-dimensional space. The commonly used kernel functions are Gaussian radial basis functions (RBF), polynomial functions, etc. There is only one parameter to be fitted in the RBF function. Therefore, RBF is used as the kernel function of SVM:
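For reference, the textbook soft-margin SVM that matches the symbols described above can be written as follows. This is the standard formulation, not necessarily the exact form printed in the source: the bias symbol b, the feature map φ, and the 2σ² scaling in the RBF kernel are assumptions.

```latex
\[
\begin{aligned}
&\min_{\omega, b, \xi}\ \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\bigl(\omega^{\top}\varphi(x_i) + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0, \\
&L(\omega, b, \xi, \alpha, \beta) = \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{n}\xi_i
 - \sum_{i=1}^{n}\alpha_i\bigl[y_i\bigl(\omega^{\top}\varphi(x_i) + b\bigr) - 1 + \xi_i\bigr]
 - \sum_{i=1}^{n}\beta_i\xi_i, \\
&f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i y_i K(x_i, x) + b\Bigr), \\
&K(x_i, x_j) = \exp\Bigl(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\Bigr).
\end{aligned}
\]
```

Here α_i ≥ 0 and β_i ≥ 0 are the Lagrange multipliers, and σ is the single RBF parameter mentioned in the text.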

Probabilistic SVM
To output the probability of each class, Platt [25] proposed a sigmoid-fitting method to obtain probabilistic outputs for SVM instead of uncalibrated values.
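In Platt's method the sigmoid, its cross-entropy objective, and the target probabilities are usually written as below. These are the standard forms and are presumably what the referenced equations express; the class counts N_+ and N_- (numbers of positive and negative training samples) are symbols introduced here, and some implementations use the simpler target t_i = (y_i + 1)/2 instead.

```latex
\[
p_i = \frac{1}{1 + \exp(A f_i + B)}, \qquad
\min_{A, B}\ -\sum_{i}\bigl[t_i \log p_i + (1 - t_i)\log(1 - p_i)\bigr], \qquad
t_i =
\begin{cases}
\dfrac{N_{+} + 1}{N_{+} + 2}, & y_i = +1, \\[2ex]
\dfrac{1}{N_{-} + 2}, & y_i = -1.
\end{cases}
\]
```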
where f_i is the sample's unthresholded output, y_i is the sample's label, and A and B are the parameters to be fitted by minimizing a cross-entropy function of p_i and t_i, which is shown in Equation (6). t_i is the target probability, which is defined in Equation (7).
In the conventional IEC three-ratio method for transformer fault diagnosis, the ratios C2H2/C2H4, CH4/H2, and C2H4/C2H6 are each encoded according to the interval in which they fall; the coding rule of the three-ratio method is shown in Table 3. The fault types can be recognized according to the corresponding codes in Table 4.
Table 3. Coding rule of three-ratio method.

Ranges of Gas Ratios | Codes of Different Gas Ratios
Table 4 (recoverable rows; the three codes are listed in the order of the ratios named above, C2H2/C2H4, CH4/H2, C2H4/C2H6):
No. | Fault type | Codes
 | Thermal fault of low temperature (< 300 °C) | 0; 0 or 2; 1 or 2
4 | Thermal fault of high temperature (≥ 300 °C) | 0; 2; 1 or 2
5 | No fault | 0; 0; 0
However, the coding boundaries are too sharp and depend heavily on experience; a very small increase in a gas ratio may abruptly change the codes. In fact, the boundaries of each code should be fuzzy [29].
In the FTR model, the IEC codes 0, 1, 2 are replaced by ZERO, ONE, TWO, so that each gas ratio can be represented by a fuzzy vector. The vector [u_ZERO(r_i), u_ONE(r_i), u_TWO(r_i)] is used in place of the IEC codes to obtain fuzzy boundaries, where r_1 = C2H2/C2H4, r_2 = CH4/H2, and r_3 = C2H4/C2H6, and u_ZERO(r_i), u_ONE(r_i), u_TWO(r_i) are membership functions. In previous studies on the fuzzy three-ratio model, the triangular membership function is often used, because it has fewer parameter settings and the sine curve transition is relatively smooth. Therefore, the triangular membership function is also chosen in this paper. The conventional logic "AND" is replaced by "min" and "OR" by "max", and the fuzzy fault diagnosis vector f(i) is then calculated [30]. To make the sum of f(i) equal to one, the normalization is shown as Equation (8).
According to Equation (8), if f(i) is the maximum, the transformer is considered to have the No. i fault. If the second maximum f(j) is very close to f(i), the transformer is considered to have both the No. i and No. j faults.
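The min/max logic, the normalization, and the decision rule just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the triangular breakpoints passed to `triangular`, the closeness tolerance `tol`, and the function names are hypothetical, and the example rule only encodes the "thermal fault of high temperature" code pattern (0; 2; 1 or 2).

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def high_temperature_rule(u_r1, u_r2, u_r3):
    """Fuzzy version of the code pattern (0; 2; 1 or 2).

    Each argument maps "ZERO"/"ONE"/"TWO" to a membership degree.
    AND becomes min, OR becomes max.
    """
    return min(u_r1["ZERO"], u_r2["TWO"], max(u_r3["ONE"], u_r3["TWO"]))


def normalize(f):
    """Scale the fuzzy diagnosis vector so that its entries sum to one."""
    total = sum(f)
    if total == 0:
        return [0.0] * len(f)
    return [v / total for v in f]


def diagnose(f, tol=0.05):
    """Return 1-based fault numbers: the maximum, plus the runner-up if close.

    tol is a hypothetical threshold for "very close".
    """
    g = normalize(f)
    order = sorted(range(len(g)), key=lambda i: g[i], reverse=True)
    faults = [order[0] + 1]
    if len(order) > 1 and g[order[0]] - g[order[1]] <= tol:
        faults.append(order[1] + 1)
    return faults
```

For example, `diagnose([0.1, 0.8, 0.1])` reports only fault No. 2, whereas two nearly equal leading entries would report both fault numbers.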

Analysis of PSVM and the Combination Method of GAPSVM and FTR
The outputs of the GAPSVM are the probabilities of each fault type for a sample; p_i represents the probability of the No. i fault. Thus, the following conditions may arise:
• If p_i > 0.5, the SVM has high confidence that the sample belongs to the corresponding fault type.
• If p_i ≤ 0.5 for every fault type, the sample is near the decision boundary of the SVM. In this situation the SVM has low confidence in its classification, and misclassification usually occurs.
• The sample is more likely to be assigned to the class with the higher probability.
Based on the above, if the maximum probability over all fault types does not exceed 0.5, the SVM is considered insufficient for the sample and the FTR model is chosen for fault diagnosis.
Energies 2020, 13, 6678
The flowchart of the GAPSVM integrated with the FTR model is shown in Figure 2. The ODF is selected by GA combined with SVM, and the DGA samples are divided into a training set and a testing set. Afterwards, the GAPSVM gives the probabilities of each fault type. If the maximum probability exceeds 0.5, the diagnosis result is given by GAPSVM; otherwise, the sample is diagnosed by FTR. The total equation for the integrated model is given as Equation (9). P_i is the maximum GAPSVM output probability of a given sample. When P_i is less than or equal to 0.5, the sample is considered to be near the decision boundary of SVM, so it is diagnosed by FTR; u(r_i) is the fuzzy vector of FTR. FTR calculates the fuzzy vectors according to max and min and finally diagnoses the fault type of the sample. When P_i is more than 0.5, the sample is considered not near the decision boundary of SVM, and the sample is diagnosed by GAPSVM. In Equation (9), ξ_i is a slack variable, C is a penalty factor, K(x_i, x_j) is the kernel function, and L(ω, λ, ξ, α, β) is the Lagrange function.
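The switching rule between the two diagnosers can be sketched as below. It assumes, for illustration only, that the GAPSVM probabilities and the normalized FTR vector are indexed over the same list of fault types; the function name and return format are hypothetical.

```python
def integrated_diagnosis(svm_probs, ftr_vector, threshold=0.5):
    """Pick the diagnosing model from the maximum GAPSVM probability.

    Returns (index of the diagnosed fault type, name of the model used).
    A maximum probability equal to the threshold is routed to FTR,
    matching the "less than or equal to 0.5" condition in the text.
    """
    best = max(range(len(svm_probs)), key=lambda i: svm_probs[i])
    if svm_probs[best] > threshold:
        return best, "GAPSVM"
    ftr_best = max(range(len(ftr_vector)), key=lambda i: ftr_vector[i])
    return ftr_best, "FTR"
```

A confident sample such as `[0.7, 0.1, 0.1, 0.05, 0.05]` keeps the GAPSVM label, while a sample whose largest probability is 0.4 is handed to FTR.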

Fault Sample Data Source and Data Preprocessing
IEC TC 10 is a standard benchmarking dataset for power transformer diagnosis. In total, 118 samples of IEC TC 10 fault data are randomly divided into training and testing datasets in each computation. The training set includes 93 samples and the testing set contains 25 samples. The information of the 118 samples is shown in Table 5. LE-D, HE-D, LM-T, H-T, and N-C represent low energy discharge, high energy discharge, low and medium temperature fault, high temperature fault, and normal condition, respectively. To eliminate the error caused by large data variation, the DGA data are normalized by the following equation:
where x_i is the i-th sample to be normalized, x_imax and x_imin are the maximum and minimum values before normalization, and x_ni is the normalized value.
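The symbols x_imax, x_imin, and x_ni suggest ordinary min-max scaling to the interval [0, 1]. A minimal sketch under that assumption follows; the function name and the handling of a constant feature are choices made here, not taken from the source.

```python
def min_max_normalize(values):
    """Scale a list of feature values to [0, 1] using its own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice the minimum and maximum would be computed per feature on the training set and reused for the testing set, so that both sets share one scale.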

DGA Feature Optimization Result Analysis
After GA optimal feature selection has been run 50 times, the DGA features are screened according to CV accuracy, and the best three sets of DGA features are reported with their CV accuracy. For the GA optimal feature selection process, the fitness curve of GA is shown in Figure 3, and the accuracy for all sample points, together with the optimal point found by GA, is shown in Figure 4. Figure 4a shows the training accuracy for different c and σ. Figure 4b is the top view of Figure 4a, with the optimal point found by GA marked.
To compare the accuracy of different DGA features, the input features of GAPSVM are divided into three categories: (1) the DGA full data, including H2, CH4, C2H2, C2H4, C2H6, CO, CO2, and TH; (2) the IEC three-ratio feature, including CH4/H2, C2H4/C2H6, and C2H2/C2H4; (3) the ODF, including H2/CH4, H2/C2H6, H2/TH, CH4/C2H2, CH4/C2H6, C2H2/C2H4, C2H4/TH, C2H6/TH, and CO/CO2. After 30 repeated genetic algorithm optimized SVM (GASVM) calculations, the accuracy on the training and testing sets for the three DGA feature sets is shown in Table 7. Both the training and testing accuracy of the ODF are higher than those of the other two feature sets, which indicates that the ODF significantly improves the training and testing accuracy of fault diagnosis. Moreover, the ODF did not significantly increase the time complexity.

Threshold Optimization of the Integrated Model
The authors of [24] proposed that when the output probability of PSVM is approximately 0.5, the sample is near the decision boundary. In this paper, finding the optimal threshold that determines whether GAPSVM or FTR is chosen for diagnosis is therefore of great importance. When the selected threshold is larger, more samples are diagnosed by FTR; when it is smaller, more samples are diagnosed by GAPSVM. Hence, choosing the right threshold is essential to the accuracy of the model. This paper therefore takes nine values from 0.3 to 0.7 in steps of 0.05 as candidate thresholds. The training set and the testing set are randomly selected for 10 repeated calculations; the threshold with the highest average fault diagnosis accuracy on the testing set over the 10 repeated calculations is the optimal threshold. The average diagnostic accuracy of each threshold is shown in Figure 5. The figure shows that the testing accuracy is highest when the threshold is 0.5. When the threshold is too small, many samples near the decision boundary are still diagnosed by GAPSVM, which has lower diagnostic accuracy for such samples; when the threshold is too large, many samples that are not near the decision boundary are diagnosed by FTR, whose diagnostic accuracy for such samples is lower than that of GAPSVM. The balance is reached at a threshold of 0.5, so the optimal threshold is chosen as 0.5.
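The threshold search described above amounts to a small grid search over repeated random splits. The sketch below shows only the bookkeeping; the accuracy lists would come from the 10 repeated train/test runs, and the function names are hypothetical.

```python
def candidate_thresholds(start=0.30, stop=0.70, step=0.05):
    """Return the candidate thresholds from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 2) for k in range(n)]


def best_threshold(accuracy_by_threshold):
    """Select the threshold with the highest mean testing accuracy.

    accuracy_by_threshold maps each threshold to the list of testing
    accuracies obtained over the repeated calculations.
    """
    means = {t: sum(a) / len(a) for t, a in accuracy_by_threshold.items()}
    return max(means, key=means.get), means
```

With the default arguments, `candidate_thresholds()` yields the nine values 0.3, 0.35, ..., 0.7 used in the text.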

Analysis of Accuracy of GAPSVM
The training and testing accuracy for samples whose maximum output probability is greater than 0.5 and for samples whose maximum output probability is 0.5 or smaller, after 30 repeated GAPSVM calculations with the ODF as input, are listed in Table 8.
Table 8. Accuracy of max probability >0.5 and max probability ≤0.5 samples.
Max Probability | Training Accuracy | Testing Accuracy
It can be identified from Table 8 that the training and testing accuracy for maximum probability >0.5 is much higher than for maximum probability ≤0.5, which reflects the deficiency of the GAPSVM when a sample's maximum probability is ≤0.5. As described in Section 2, the FTR model is applied to diagnose the samples with max probability ≤0.5; the training set and testing set accuracy of the FTR model is 76.76% and 75.25%, respectively. The FTR model significantly improves the accuracy of the testing set and does not reduce the accuracy of the training set, even though its accuracy is lower than that of the SVM on the training samples with max probability ≤0.5, because the average number of training samples with max probability ≤0.5 is only 1.7 over the 30 calculations.

Comparisons with Other Diagnosis Methods
Back propagation neural network (BPNN), K-Nearest Neighbor (kNN), and GASVM are commonly used in traditional power transformer fault diagnosis; here the ODF is adopted as the input feature of these methods. The testing accuracy of these methods and of two published studies is listed in Table 9, and the accuracy over 10 computations of the different methods is shown in Figure 6. As shown in Table 9, the testing accuracy of the GAPSVM integrated with the FTR model proposed in this paper reaches 86.80%, which is higher than that of kNN (64.00%), BPNN (81.60%), GASVM (82.00%), the method in [18] (83.60%), and the method in [19] (84.40%). It can also be seen from Figure 6 that, in most cases, this model performs better than the traditional methods.
Table 9:
Method | Testing accuracy (%)
kNN | 64.00
BPNN | 81.60
GASVM | 82.00
Method in [18] | 83.60
Method in [19] | 84.40
This Paper | 86.80

Model Evaluation
To verify the validity and generalization ability of the proposed model, 30 groups of DGA fault samples from the State Grid Co. of China are used as testing samples for the model trained in Section 3.4; the diagnostic results are shown in Table 10.
Fault Type | LE-D | HE-D | LM-T | H-T | N-C
True samples | 6 | 6 | 5 | 5 | 7
Predicted samples | 6 | 6 | 7 | 6 | 4
From the diagnostic results of the 30 DGA samples, the proposed model correctly classifies 26 samples, an accuracy of 86.67%. Furthermore, the confusion matrix, F-measure, precision, and recall are introduced to examine the performance of the proposed model. The confusion matrix illustrates the relationship between predicted fault types and true fault types. Precision indicates the percentage of the samples identified as positive that are indeed positive, while recall indicates the percentage of the positive samples in the dataset that are predicted correctly. The F-measure is a weighted harmonic average of precision and recall, which provides a single score that balances both. Equations of each measure index are shown as follows.
It can be identified from Table 11 that the model can effectively diagnose most of the fault types. Precision, recall, and F-measure are 0.875, 0.874, and 0.859, respectively. The above measure indexes and the confusion matrix demonstrate the validity and generalization of the proposed model.
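The measure indexes defined above can be computed from a confusion matrix as sketched here. Two assumptions are made for illustration: the F-measure is taken with equal weight on precision and recall (F1), and the per-class values are combined by a plain macro average; the source does not state which weighting or averaging it used.

```python
def per_class_metrics(cm):
    """Compute (precision, recall, F1) for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(cm[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(metrics):
    """Average precision, recall and F1 over classes with equal weight."""
    n = len(metrics)
    return tuple(sum(m[j] for m in metrics) / n for j in range(3))
```

Applied to a five-class confusion matrix over LE-D, HE-D, LM-T, H-T, and N-C, `macro_average(per_class_metrics(cm))` gives one precision, recall, and F-measure triple for the whole model.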

Model Validation Using Practical Dataset
To verify the performance of the proposed method in practical applications and on other datasets, the dataset of [18] is used. This dataset reflects the fact that some DGA data are missing in the actual operation of transformers: in each sample, one or two gases are null. The information of the dataset is shown in Table 12. Firstly, each missing dissolved gas value is replaced by the average value of that gas for the corresponding fault type. Because the C2H6 values in HE-D are all missing, the C2H6 value of HE-D is replaced by the average C2H6 value for that fault type in the IEC TC 10 database. Then, kNN, BPNN, GASVM, the method in [18], the method in [19], and the model proposed in this paper are used to diagnose the fault types of the DGA samples. The fault types of the DGA samples diagnosed by this method are shown in Table 12, and the diagnostic accuracy of the different methods is shown in Table 13. The diagnostic results show that the integrated model proposed in this paper correctly diagnoses 13 of the 15 DGA fault samples. The fault diagnosis accuracy reached 86.67%, which is higher than kNN (66.67%), BPNN (73.33%), GASVM (73.33%), the method in [18] (80%), and the method in [19] (80%). The diagnostic results demonstrate the superiority and robustness of the integrated model.
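The imputation step described above can be sketched as a class-wise mean fill with an external fallback. The data layout (a list of dictionaries with a `"fault"` key and `None` for missing gases), the function names, and the `fallback_means` argument standing in for the IEC TC 10 averages are all hypothetical.

```python
def class_means(samples, gases):
    """Mean of each gas per fault type, ignoring missing (None) values."""
    sums, counts = {}, {}
    for s in samples:
        for gas in gases:
            value = s.get(gas)
            if value is not None:
                key = (s["fault"], gas)
                sums[key] = sums.get(key, 0.0) + value
                counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def impute(samples, gases, fallback_means):
    """Fill missing gases with the fault-type mean, else the fallback mean.

    fallback_means maps (fault type, gas) to a mean taken from another
    database, for gases that are missing in every sample of a fault type.
    """
    means = class_means(samples, gases)
    filled = []
    for s in samples:
        row = dict(s)  # copy so the input samples are not modified
        for gas in gases:
            if row.get(gas) is None:
                key = (row["fault"], gas)
                row[gas] = means.get(key, fallback_means.get(key))
        filled.append(row)
    return filled
```

Note that this fill uses the known fault label of each sample, as the text describes, so it applies to labelled validation cases rather than to unlabelled field data.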

Conclusions
In this paper, GA combined with SVM is used to select the ODF, which is adopted as the input feature of the proposed fault diagnosis model. To address the insufficiency of GASVM for samples located near the decision boundary, a model combining AI and expert experience, based on the GAPSVM integrated with FTR, is proposed; this is the main innovation of this paper. The conclusions are as follows:
• The ODF is selected from 36 DGA features by the GA and SVM. With the ODF as input, the average testing accuracy of GASVM is 82.96%, which is higher than that of the IEC three-ratio feature (75.41%) and the DGA full data (57.53%). The ODF is more suitable as the input feature of the power transformer fault diagnosis model.
• The AI and expert experience combined model is established based on the IEC TC 10 dataset, and the average testing accuracy is 86.80% after 10 computations, which is higher than kNN (64.00%), BPNN (81.60%), GASVM (82.00%), the method in [18] (83.60%), and the method in [19] (84.40%). Specifically, this model efficiently avoids misclassification when a sample is near the decision boundary of GAPSVM. Moreover, when 30 groups of DGA data from the State Grid Co. of China are diagnosed by the proposed model trained on 118 groups of IEC TC 10 DGA data, the diagnostic accuracy is 86.67%. Additionally, the validity and generalization are verified by the classification measure indexes.
• A total of 15 real cases with missing values are tested by six methods. GAPSVM integrated with the FTR model correctly diagnosed the fault types of 13 of the cases, which indicates that AI-based algorithms integrated with expert experience have great robustness.