A Transformer Fault Diagnosis Model Based On Hybrid Grey Wolf Optimizer and LS-SVM

Abstract: Dissolved gas analysis (DGA) is a widely used method for transformer internal fault diagnosis. However, traditional DGA techniques, including the Key Gas method, the Dornenburg ratio method, the Rogers ratio method, the International Electrotechnical Commission (IEC) three-ratio method and the Duval triangle method, suffer from shortcomings such as coding deficiencies, excessive coding boundaries and critical value criterion defects, which affect the reliability of fault analysis. The grey wolf optimizer (GWO) is a swarm intelligence optimization algorithm proposed in 2014, and the original GWO easily falls into local optima. This paper presents a new meta-heuristic method that hybridizes GWO with differential evolution (DE) to avoid local optima, improve the diversity of the population and make an appropriate compromise between exploration and exploitation. A fault diagnosis model based on a hybrid grey wolf optimized least square support vector machine (HGWO-LSSVM) is proposed and applied to transformer fault diagnosis, with the optimal hybrid DGA feature set selected as the input of the model. Kernel principal component analysis (KPCA) is used for feature extraction, which decreases the training time of the model. The proposed method shows high fault diagnosis accuracy in comparison with traditional DGA methods, the least square support vector machine (LSSVM), GWO-LSSVM, particle swarm optimization (PSO)-LSSVM and genetic algorithm (GA)-LSSVM. It also shows good fitness and a fast convergence rate. The accuracies calculated in this paper, however, are significantly affected by misidentifications of faults in the DGA data collected from the literature.


Introduction
The transformer is one of the most critical pieces of equipment for power transmission and transformation, and its safety and reliability are the basis for the continuous operation and power supply of the power grid. Transformer failures may bring huge losses to the power grid, and the repair and maintenance of a transformer are very expensive and difficult. Identifying incipient transformer faults in time is therefore very important, as it may avoid power outages and economic losses. DGA is an important and successful tool for detecting incipient faults of oil-filled transformers. Based on the correspondence between the types of gas dissolved in the oil and the internal fault, the DGA method can identify the abnormal state of the transformer from the composition and content of the various gases, and can determine the fault type, its severity and its development trend. Several

Energies 2019, 12, 4170 3 of 18

select the optimal training parameters of the TWSVM classifier, and finally, the actual fault samples and random tests were used to verify the validity of the model. Hazlee Azil Illias and Wee Zhao Liang [23] proposed a transformer fault diagnosis model based on a hybrid SVM and improved evolutionary particle swarm optimization (SVM-MEPSO), which used a stepwise regression approach for data reduction; the results show that the hybrid SVM-MEPSO time-varying acceleration coefficient (TVAC) technique obtains the highest accuracy compared with other PSO algorithms. The optimal hybrid DGA feature subset (OHFS) was selected from three feature sets by using a genetic algorithm-support vector machine-feature screen (GA-SVM-FS) model and used as the input of an improved social group optimization (ISGO) optimized multi-SVM classifier to develop a transformer fault diagnosis model, which achieved the highest fault diagnosis accuracy (92.86%) compared with other diagnostic models [24]. In addition, other scholars have used the SVM [49] and the relevance vector machine (RVM) [50] for transformer fault diagnosis and achieved good results.
The intelligent approaches mentioned above have directly or indirectly improved the accuracy of DGA-based transformer fault diagnosis methods. However, there are deficiencies in the parameter optimization, feature set selection and data preprocessing methods, which limit the practical application of AI algorithms in transformer fault diagnosis. The grey wolf optimizer [51], a swarm intelligence algorithm proposed in 2014 by Mirjalili et al., has the advantages of superior performance, few parameters and ease of implementation, and has attracted the attention of many scholars [52][53][54]. Compared with GA, PSO and DE, GWO shows superior performance in exploitation and exploration, high local optima avoidance and fast convergence. Due to its competitive performance, the GWO is employed for parameter optimization in this study. Because the original GWO can still converge slowly and fall into local optima, various improved strategies for the GWO have been proposed and have achieved good results [55][56][57][58]. This paper proposes a hybrid grey wolf optimization algorithm (HGWO) that combines the DE algorithm with the GWO. It uses the powerful search ability of DE to update the positions of the grey wolves α, β and δ, so that the GWO jumps out of stagnation and does not fall into a local optimum, which accelerates convergence and improves the performance of the algorithm. In addition, the mutation and selection operations of the DE algorithm are used to generate the initial population, which improves the diversity of the population. The HGWO is then applied as the optimizer of a transformer fault diagnosis model based on HGWO-LSSVM, with the optimal hybrid DGA feature set selected as the input. The KPCA method is used for feature extraction. Finally, the proposed model is tested and compared with other models. This paper is organized as follows: Section 2 introduces the basic theory of the HGWO-LSSVM model. In Section 3, the HGWO-LSSVM model is proposed, and in Section 4 its performance is tested and compared with other diagnostic models, which demonstrates the effectiveness of the proposed model. Finally, the conclusions are summarized and potential future work is discussed in Section 5.

Kernel Principal Component Analysis
Principal component analysis (PCA) is a linear dimensionality-reduction method for data compression. It extracts the main components from high-dimensional variables, which reduces the dimension and complexity of the data. Because the extracted data can only characterize linear structure, the nonlinear components of the original data are lost, which discards valid information. The principle of KPCA is based on PCA. In KPCA, a kernel function realizes the nonlinear mapping of the original data into a high-dimensional linear feature space, and PCA is then used to extract the features. The essence of KPCA is to perform PCA on the data mapped to the feature space. Let $x_1, x_2, x_3, \ldots, x_N \in R^{m}$ be the data samples, where m denotes the sample dimension. They are mapped from the original space to the high-dimensional linear feature space F by the nonlinear function $\phi(\cdot)$, and the covariance matrix $C^{F}$ of $\phi(x_j)$ is:

$$C^{F} = \frac{1}{N} \sum_{j=1}^{N} \phi(x_j)\,\phi(x_j)^{T}$$

The eigenvalues and eigenvectors of $C^{F}$ satisfy $\lambda V = C^{F} V$, where the eigenvalue $\lambda \ge 0$ and V is the eigenvector. Define the $N \times N$ matrix $K_{ij} = K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ and expand the eigenvectors as

$$V = \sum_{i=1}^{N} \alpha_i\, \phi(x_i)$$

Substituting this expansion into the eigenvalue problem gives $N \lambda \alpha = K \alpha$, so the coefficient vectors $\alpha$ are eigenvectors of K. Then the kth $(k = 1, 2, \ldots, N)$ principal element $t_k$ in the feature space is:

$$t_k = V^{k} \cdot \phi(x) = \sum_{i=1}^{N} \alpha_i^{k}\, K(x_i, x)$$

As in the general principal component analysis algorithm, the input data need to satisfy the zero-mean condition. This can be achieved by replacing K with the following:

$$\tilde{K} = K - LK - KL + LKL$$

where $L_{i,j} = 1/N$. In the F space, KPCA has the same mathematical and statistical characteristics as linear PCA: the principal components are uncorrelated, they represent the maximum variance of the sample data, and reconstructing the sample data from them gives a minimum mean square error. In addition, KPCA extracts more sample information than linear PCA. For the same classification performance, KPCA requires fewer principal components than linear PCA. Compared with other nonlinear feature extraction methods, it does not need to solve a nonlinear optimization problem and only involves the eigenvalue decomposition of a matrix. KPCA has been widely used in feature extraction [42] and has achieved good results.
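The KPCA procedure described above can be illustrated with a short NumPy sketch. It is only a minimal example under stated assumptions: the RBF kernel, the function names `rbf_kernel_matrix` and `kpca`, and the default values of `sigma2` and `threshold` are illustrative choices rather than the implementation used in this paper. The sketch centers the kernel matrix, performs the eigenvalue decomposition, and keeps the smallest number of components whose cumulative contribution rate reaches the threshold.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma2):
    # K_ij = exp(-||x_i - x_j||^2 / (2 * sigma2)); the RBF kernel is an assumption here.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def kpca(X, sigma2=1.0, threshold=0.90):
    """Return the projected samples and the cumulative contribution rates."""
    n = X.shape[0]
    K = rbf_kernel_matrix(X, sigma2)
    # Center the kernel matrix: K~ = K - LK - KL + LKL with L_ij = 1/N.
    L = np.full((n, n), 1.0 / n)
    Kc = K - L @ K - K @ L + L @ K @ L
    # Eigenvalue decomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1]          # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                     # drop numerically zero components
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # Smallest number of components whose cumulative contribution reaches the threshold.
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, threshold) + 1)
    # Scale the coefficients so that the feature-space eigenvectors have unit norm.
    alphas = eigvecs[:, :k] / np.sqrt(eigvals[:k])
    # Principal elements t_k of the training samples: sum_i alpha_i^k K~(x_i, x_j).
    T = Kc @ alphas
    return T, ratio[:k]
```

Calling `kpca(X, sigma2, 0.90)` on a sample matrix with one row per sample returns the projected features together with the cumulative contribution rates of the retained components, so the number of retained components can be read from the number of columns.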

Differential Evolution
Storn and Price [59] proposed differential evolution (DE), a powerful method for global optimization. DE produces a new population through the mechanisms of mutation, crossover and selection in order to obtain the optimal solution, which improves the diversity of the population. Because of its simple principle, few control parameters and strong robustness, DE has been widely used in constrained optimization [60][61][62], nonlinear control optimization [63], feature selection [64] and other optimization problems [65][66][67][68].
When DE is used to solve an optimization problem, it mainly involves the following operations:

Initialization of Population
Like other swarm intelligence optimization algorithms, DE first needs to initialize the population:

$$x_i(0) = \left(x_{i,1}(0), x_{i,2}(0), \ldots, x_{i,n}(0)\right), \quad i = 1, 2, \ldots, NP \tag{4}$$

where $x_i(0)$ is the ith individual, $j = 1, 2, \ldots, n$ is the dimension index and NP is the population size.

$$x_{i,j}(0) = x_{i,j}^{L} + rand(0, 1) \cdot \left(x_{i,j}^{U} - x_{i,j}^{L}\right) \tag{5}$$

where $x_{i,j}^{L}$ and $x_{i,j}^{U}$ are the lower bound and the upper bound of the jth dimension, respectively, and rand(0, 1) is a random number in the range [0, 1].

Mutation
DE realizes individual variation through a differential strategy. The common differential strategy is to randomly select two different individuals in the population, scale their vector difference, and add the scaled difference to the individual to be mutated:

$$v_i(g + 1) = x_{r1}(g) + F \cdot \left(x_{r2}(g) - x_{r3}(g)\right) \tag{6}$$

where r1, r2 and r3 are random integers in the range [1, NP] that differ from each other and from i, F is the scaling factor, and $x_i(g)$ represents the ith individual in the gth generation population.

Crossover
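The crossover operation is carried out between the gth generation individual $x_i(g)$ and its mutant intermediate $v_i(g + 1)$ to produce the trial individual $u_i(g + 1)$:

$$u_{i,j}(g + 1) = \begin{cases} v_{i,j}(g + 1), & rand(0, 1) \le CR \\ x_{i,j}(g), & \text{otherwise} \end{cases} \tag{7}$$

where CR is the crossover probability.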

Selection
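A greedy selection strategy is adopted in DE, that is, the better of the current individual and the trial individual is kept as the new one:

$$x_i(g + 1) = \begin{cases} u_i(g + 1), & f\left(u_i(g + 1)\right) \le f\left(x_i(g)\right) \\ x_i(g), & \text{otherwise} \end{cases} \tag{8}$$

where $u_i(g + 1)$ is the trial individual produced by the crossover operation and f is the fitness function, written here for a minimization problem.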
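The four DE operations described above (initialization, mutation, crossover and greedy selection) can be combined into a compact loop. The following sketch is illustrative only: the function name `differential_evolution`, its default parameter values, the clipping of mutants to the bounds and the forced crossover of one randomly chosen dimension (`j_rand`, a common safeguard in DE implementations) are assumptions that are not taken from this paper.

```python
import numpy as np

def differential_evolution(f, lower, upper, pop_size=30, F=0.5, CR=0.2,
                           generations=100, seed=0):
    """Minimize f over the box [lower, upper] with a basic DE/rand/1 scheme."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    # Initialization: x_ij(0) = x^L_j + rand(0, 1) * (x^U_j - x^L_j).
    pop = lower + rng.random((pop_size, dim)) * (upper - lower)
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three mutually distinct individuals, all different from i.
            candidates = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lower, upper)
            # Crossover: take the mutant component with probability CR.
            mask = rng.random(dim) <= CR
            mask[rng.integers(dim)] = True   # assumption: force at least one mutant gene
            u = np.where(mask, v, pop[i])
            # Greedy selection: keep the trial individual only if it is not worse.
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

For example, `differential_evolution(lambda x: float(np.sum(x ** 2)), [-5, -5], [5, 5])` searches for the minimum of a two-dimensional sphere function inside the given box.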

Grey Wolf Optimizer
The grey wolf optimizer, a swarm intelligence algorithm introduced by Mirjalili et al. [51], is a powerful meta-heuristic algorithm that can compete with PSO, GA, DE and many other algorithms in terms of solution accuracy, computational effort and avoidance of premature convergence [69,70]. Because of these advantages, it has attracted considerable research interest in several domains and has been successfully applied in recent years to global optimization [71], control engineering [72,73], feature selection [74] and scheduling problems [75,76].
Based on the physical and social behavior of grey wolves, the mathematical model of the GWO algorithm contains five parts: social hierarchy, encircling, hunting, attacking and searching. A brief introduction is presented as follows.

Social Hierarchy
In GWO, a hierarchical model is constructed according to the social hierarchy of grey wolves. The fitness of each individual is calculated, the three grey wolves with the best fitness are labeled in order as α, β and δ, and the remaining grey wolves are marked as ω. The optimization process of GWO is mainly guided by the best three solutions in each generation (i.e., α, β, δ).

Encircling Prey
When a grey wolf hunts, it gradually approaches the prey and surrounds it. The mathematical model of this behavior is as follows:

$$D = \left| C \cdot X_p(t) - X(t) \right| \tag{9}$$

$$X(t + 1) = X_p(t) - A \cdot D \tag{10}$$

$$A = 2a \cdot r_1 - a \tag{11}$$

$$C = 2 \cdot r_2 \tag{12}$$

where t is the iteration number, A and C are coefficient vectors, $X_p$ is the position vector of the prey, X(t) is the position vector of the wolf, a is linearly reduced from 2 to 0 during the iteration, and $r_1$ and $r_2$ are random vectors in [0, 1].

Hunting
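In order to simulate the search behavior of grey wolves, it is assumed that α, β and δ have a strong ability to identify the potential prey. During each iteration, the best three wolves (α, β, δ) are retained, and the locations of the other search agents are then updated based on their locations. The mathematical model can be expressed as follows:

$$D_\alpha = \left| C_1 \cdot X_\alpha - X \right|, \quad D_\beta = \left| C_2 \cdot X_\beta - X \right|, \quad D_\delta = \left| C_3 \cdot X_\delta - X \right| \tag{13}$$

$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \tag{14}$$

$$X(t + 1) = \frac{X_1 + X_2 + X_3}{3} \tag{15}$$

where $X_\alpha$, $X_\beta$ and $X_\delta$ are the positions of α, β and δ, X represents the position of the wolf, and $D_\alpha$, $D_\beta$ and $D_\delta$ represent the distances between the current candidate and the best three wolves. When |A| > 1, the grey wolves scatter across the regions to search for prey, and when |A| < 1, the grey wolves focus on hunting the prey in the search area.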

Attacking Prey
According to the formula for encircling prey, the decrease of a reduces the fluctuation range of A accordingly: since $A = 2a \cdot r_1 - a$, A is a random vector in [−a, a], where a decreases linearly during the iteration. When A lies in [−1, 1], the position of the search agent at the next moment can be anywhere between the current grey wolf and the prey. Parameter a is updated linearly in each iteration, ranging from 2 to 0, as follows:

$$a = 2 - t \cdot \frac{2}{Max\ Iter} \tag{16}$$

where t is the iteration number and Max Iter is the total number of iterations allowed for the optimization.

Searching Prey
Grey wolves rely mainly on α, β and δ to find the prey. They search for the prey location at the beginning and then concentrate on attacking the prey. In the model, |A| > 1 drives the search agent away from the prey, enabling GWO to perform a global search. C is another search coefficient of the GWO algorithm. As can be seen from the formula for encircling prey, C is a random vector in the range [0, 2], which provides a random weight that emphasizes (C > 1) or de-emphasizes (C < 1) the influence of the prey. This helps GWO to exhibit random search behavior during the optimization process and to avoid falling into a local optimum.
Although the GWO algorithm shows its superiority in many fields, when the training sample set is large it faces the problems of local optima, slow computation and low accuracy. Therefore, this paper combines DE with the GWO to improve the performance of the original GWO algorithm. The powerful search ability of DE forces the GWO to jump out of stagnation when attacking the prey, which avoids local optima and achieves an appropriate compromise between exploration and exploitation, further accelerating the convergence and improving the accuracy of GWO. In addition, the mutation and selection operations of the DE algorithm are used to generate the initial population, which improves the diversity of the population.
The pseudo-code of the GWO (Algorithm 1) is presented in the following form:
(3) Find α, β, and δ as the first three best solutions based on their fitness values.
(4) t = 0.
while t ≤ Max Iter do
    for each Wolf i ∈ pack do
        Update current wolf's position according to Equation (15).
    end
    - Update a, A, and C as in Equations (16), (11) and (12).
    - Evaluate the positions of individual wolves.
    - Update α, β, and δ positions as the first best three solutions in the current population.
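The GWO update rules in Equations (11), (12), (15) and (16) can be written compactly with NumPy. The following sketch is a minimal illustration for a minimization problem, not the implementation used in this paper. The function name `grey_wolf_optimizer`, the default pack size and the clipping of positions to the search bounds are assumptions.

```python
import numpy as np

def grey_wolf_optimizer(f, lower, upper, n_wolves=20, max_iter=100, seed=0):
    """Minimize f over the box [lower, upper] with the basic GWO update."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    X = lower + rng.random((n_wolves, dim)) * (upper - lower)
    for t in range(max_iter):
        fitness = np.array([f(x) for x in X])
        # alpha, beta and delta: the three best wolves of the current population.
        leaders = X[np.argsort(fitness)[:3]].copy()
        a = 2.0 - 2.0 * t / max_iter              # Equation (16): a goes from 2 to 0
        new_X = np.zeros_like(X)
        for leader in leaders:
            A = 2.0 * a * rng.random((n_wolves, dim)) - a   # Equation (11)
            C = 2.0 * rng.random((n_wolves, dim))           # Equation (12)
            D = np.abs(C * leader - X)                      # distance to this leader
            new_X += leader - A * D                         # X_1, X_2 or X_3
        # Equation (15): average of the three leader-guided positions.
        X = np.clip(new_X / 3.0, lower, upper)    # clipping is an assumption
    fitness = np.array([f(x) for x in X])
    best = int(np.argmin(fitness))
    return X[best], fitness[best]
```

In this sketch the three leaders are re-selected from the current population in every iteration, which follows the last step of Algorithm 1. Implementations that additionally store the best solution found so far are also common.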

Least Square Support Vector Machine
SVM is a machine learning method based on VC-dimension theory and the structural risk minimization principle, proposed by the Bell Labs researcher Vapnik in the 1990s [34], which has excellent learning performance and generalization ability. Compared with other machine learning algorithms, SVM has significant advantages in dealing with overfitting and local optima. Since SVM was proposed, it has been successfully applied in many fields, such as regression analysis and pattern recognition. The least squares support vector machine (LS-SVM) is an extension of the standard SVM that transforms the quadratic programming problem into a set of linear equations, which gives a much faster solution and strong real-time performance. Let $D = \{(x_i, y_i) \mid i = 1, 2, 3, \ldots, N\}$ be the training sample set, where $x_i$ is the input and $y_i$ is the output. For nonlinear regression, LS-SVM is modeled as follows:

$$y_i = \omega^{T} \varphi(x_i) + b + e_i \tag{17}$$

where ω represents the weight vector and $\varphi(x_i)$ is a nonlinear function that maps the input space to the high-dimensional feature space. b is the deviation, and $e_i$ represents the fitting error, which is the error between the actual training output and the estimated output for sample i. ω and b can be obtained from the following optimization problem:

$$\min_{\omega, b, e} J(\omega, e) = \frac{1}{2} \omega^{T} \omega + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2} \tag{18}$$

Equation (18) is subject to the equality constraint:

$$y_i = \omega^{T} \varphi(x_i) + b + e_i, \quad i = 1, 2, \ldots, N \tag{19}$$

In Equation (18), the first term adjusts the weights and penalizes large weights, and the second term represents the training error. For Equation (18), define the Lagrange function L:

$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{N} \alpha_i \left[ \omega^{T} \varphi(x_i) + b + e_i - y_i \right] \tag{20}$$

In Equation (20), $\alpha_i$ is the Lagrange multiplier and γ is the penalty parameter, which balances the complexity of the LS-SVM model y(x) against the training error. According to the KKT (Karush-Kuhn-Tucker) optimality conditions, the partial derivatives of Equation (20) with respect to ω, b, $e_i$ and $\alpha_i$ are set to 0, which gives the optimality conditions:

$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{N} \alpha_i \varphi(x_i), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{N} \alpha_i = 0, \quad \frac{\partial L}{\partial e_i} = 0 \Rightarrow \alpha_i = \gamma e_i, \quad \frac{\partial L}{\partial \alpha_i} = 0 \Rightarrow \omega^{T} \varphi(x_i) + b + e_i - y_i = 0 \tag{21}$$

Eliminating ω gives the LS-SVM regression model:

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \tag{22}$$

where $K(x, x_i)$ is the kernel function, x represents the input vector, and $x_i$ is the center of the kernel function. α and b are the solution of Equation (21). Because there is a nonlinear relationship between the transformer fault and the DGA data, the radial basis function (RBF) kernel, which is suitable for nonlinear problems and has few kernel parameters, is selected as the kernel function in this research:

$$K(x, x_i) = \exp\left( -\frac{\left\| x - x_i \right\|^{2}}{2 \sigma^{2}} \right) \tag{23}$$

where $\sigma^{2}$ is the kernel parameter. The penalty parameter γ and the kernel parameter $\sigma^{2}$ have a great effect on the accuracy of the LS-SVM model. The generalization ability of the model increases as γ decreases, while the training error increases. The smaller the kernel parameter, the higher the model complexity, whereas a large kernel parameter easily leads to underfitting. Reasonable values of γ and $\sigma^{2}$ are therefore the key to the success of the model.
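Eliminating ω and e from the KKT conditions of Equation (21) leaves a single linear system in b and α, namely $\sum_i \alpha_i = 0$ together with $(K + I/\gamma)\,\alpha + b \cdot 1 = y$. The sketch below solves this system with NumPy and evaluates the regression model of Equation (22). It is a minimal illustration, and the function names `rbf`, `lssvm_fit` and `lssvm_predict` are hypothetical. For a two-class problem with labels −1 and +1, the sign of the prediction can be used as the class decision, which is an assumption made here for the example.

```python
import numpy as np

def rbf(A, B, sigma2):
    # RBF kernel matrix between the rows of A and the rows of B, Equation (23).
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM linear system and return (b, alpha)."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                                    # first row: sum(alpha) = 0
    M[1:, 0] = 1.0                                    # b multiplies a column of ones
    M[1:, 1:] = rbf(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    # Equation (22): y(x) = sum_i alpha_i K(x, x_i) + b.
    return rbf(X_new, X_train, sigma2) @ alpha + b
```

Because training reduces to one call to a linear solver, evaluating a candidate pair (γ, σ²) is cheap, which is what makes a population-based search over these two parameters practical.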

Fault Diagnosis Model Based on HGWO-LSSVM
In the proposed fault diagnosis method based on the HGWO-LSSVM model, the HGWO is used to optimize the parameters of the LSSVM algorithm. The construction of the model includes the following parts:
(1) Sample collection. The DGA data of various fault modes are collected to form the fault sample set, which is used as the training set of the fault diagnosis model.
(2) Feature set selection. The commonly used feature set and the optimal hybrid feature set are selected as the input of the model, respectively.
(3) Sample division. The samples are divided into two groups: training data and test data. The training data are used in the simulation to establish the mathematical model, and the test data are used to validate the model.
(4) Sample normalization. After normalization, all sample values lie in the range [0, 1], which speeds up the calculation of the model. The conversion function of normalization is as follows:

$$x_i^{*} = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$

where $x_i^{*}$ is the normalized value, $x_i$ represents the actual value, and $x_{max}$ and $x_{min}$ represent the maximum and minimum values, respectively.
(5) Feature extraction. The KPCA method is used for feature extraction to reduce the dimension of the sample data, and the number of principal components is selected so that the cumulative contribution rate is greater than 90%.
Step 1: Set the initial parameters, including the population size, the maximum number of iterations, the dimension, the scaling factors and the crossover probability factor CR.
Step 2: Initialize the population according to Equation (4), where X consists of the kernel width parameter σ and the regularization parameter C of the least squares support vector machine (the penalty parameter denoted γ in the LS-SVM formulation).
Step 3: Calculate the individual fitness values and arrange them in descending order, with the top three individuals $X_\alpha$, $X_\beta$ and $X_\delta$ as the leading wolves.
Step 4: Update the positions of the individuals in the parent population using Equation (15).
Step 5: According to Equations (6) and (7), use the differential evolution algorithm to perform mutation and crossover to generate new children.
Step 6: Update the parent population according to Equation (8), and then update the coefficients C, A and a according to Equations (12), (11) and (16).
Step 7: Update the parental $P_\alpha$, $P_\beta$ and $P_\delta$, and sort the grey wolf parent population again. The termination condition of the algorithm is then checked. When the condition is satisfied, the parent $P_\alpha$ and $f(P_\alpha)$ are returned, and the obtained optimal solutions C and σ are output.
Step 8: Establish an LSSVM model based on σ and C.
The fault diagnosis model based on LSSVM integrated with KPCA and HGWO is shown in Figure 1. It includes two main parts. In the first, the transformer DGA data are preprocessed by KPCA. In the second, the parameters of the LSSVM model are optimized by HGWO.
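Steps 3 to 7 above can be sketched as a single loop in which each iteration applies the GWO position update and then a DE mutation, crossover and greedy selection pass. The sketch below is only one possible reading of these steps. The function name `hgwo`, the random draw of the scaling factor between `f_min` and `f_max`, the clipping to the bounds and the use of a generic objective to be minimized (instead of the LSSVM classification accuracy) are all assumptions made for illustration.

```python
import numpy as np

def hgwo(f, lower, upper, pop_size=50, max_iter=200,
         f_min=0.2, f_max=0.8, cr=0.2, seed=0):
    """Minimize f with a GWO move followed by a DE pass in every iteration."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    parents = lower + rng.random((pop_size, dim)) * (upper - lower)
    fit = np.array([f(x) for x in parents])
    for t in range(max_iter):
        # Step 3: the three best parents act as alpha, beta and delta.
        leaders = parents[np.argsort(fit)[:3]].copy()
        a = 2.0 - 2.0 * t / max_iter
        # Step 4: GWO position update, Equation (15).
        moved = np.zeros_like(parents)
        for leader in leaders:
            A = 2.0 * a * rng.random((pop_size, dim)) - a
            C = 2.0 * rng.random((pop_size, dim))
            moved += leader - A * np.abs(C * leader - parents)
        moved = np.clip(moved / 3.0, lower, upper)
        # Step 5: DE mutation and crossover generate the children.
        children = moved.copy()
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            F = f_min + rng.random() * (f_max - f_min)   # assumption: random scaling factor
            v = np.clip(moved[r1] + F * (moved[r2] - moved[r3]), lower, upper)
            mask = rng.random(dim) <= cr
            mask[rng.integers(dim)] = True               # assumption: at least one mutant gene
            children[i] = np.where(mask, v, moved[i])
        # Step 6: greedy selection between each parent and its child.
        child_fit = np.array([f(x) for x in children])
        better = child_fit <= fit
        parents[better] = children[better]
        fit[better] = child_fit[better]
    best = int(np.argmin(fit))
    return parents[best], fit[best]
```

To tune an LSSVM with such a loop, the objective `f` would take a two-dimensional candidate (σ, C), train the classifier and return a quantity to minimize, for example the classification error on validation data. Because of the greedy selection, the best fitness in the parent population never gets worse from one iteration to the next.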


Case Study and Analysis
MATLAB (R2018b, MathWorks, Natick, Massachusetts, USA) is used to implement the HGWO-optimized LSSVM fault diagnosis model described above. In addition, a large amount of transformer DGA data was collected, preprocessed and classified to verify the effectiveness of the fault diagnosis model.

Fault Sample Collection
During the operation of a power transformer, internal overheating or discharge faults cause the transformer oil to decompose and generate gases, mainly including H2, CH4, C2H6, C2H4, C2H2, CO and CO2. When faults of different types and degrees occur, the content of these seven gases varies significantly. Therefore, the content of these seven gases can be selected as the feature set.
In this paper, transformer DGA data have been collected from many publications. These publications analyze the transformer fault condition and the handling process, and finally determine the specific fault cause and fault type through a disassembly inspection. The fault types of the transformer include low temperature overheating T1 (<300 °C), medium temperature overheating T2 (300~700 °C), high temperature overheating T3 (>700 °C), low energy discharge (D1), high energy discharge (D2) and partial discharge (PD), together with the normal mode (N). The distribution of the sample DGA data used in this study is shown in Table 1. In addition, part of the field DGA data with actual faults and the fault types diagnosed by the IEC ratio method are shown in Table 2. Because there are relatively few fault samples of low temperature overheating, low temperature overheating and medium temperature overheating are regarded as one category. Thus, the failure types involved in this paper comprise five fault categories, namely low to medium temperature overheating (T2), high temperature overheating (T3), low energy discharge (D1), high energy discharge (D2) and partial discharge (PD), plus the normal mode (N), for a total of 6 categories.

Feature Set Selection
Feature selection is crucial for a classification model. It is necessary to select features that reflect the core characteristics of the samples and to reduce the computational errors introduced during model training. In a transformer fault diagnosis model, the DGA data are used as the inputs of the diagnostic model. The feature sets that have been widely used so far fall into two categories: dissolved gas concentrations and dissolved gas ratios [77], as shown in Table 3. Studies have shown [24,47,48,78] that using a hybrid feature set including both DGA gases and gas ratios as input is preferable to using only DGA gases or only gas ratios. The optimal hybrid feature set selected in this paper consists of CH , which has been shown to give high diagnostic accuracy [24].

Multi-Class Classification Model
The fault diagnosis process of the transformer is essentially a multi-class classification problem. As a binary classifier, LS-SVM cannot be used directly for multi-class classification. In the diagnosis model proposed in this paper, a multi-class binary tree based on LS-SVM is developed.
The model includes a total of 5 sub-classifiers, which identify the six states: low to medium temperature overheating (T2), high temperature overheating (T3), low energy discharge (D1), high energy discharge (D2), partial discharge (PD) and normal mode. LS-SVM1 separates the normal state from the fault state, while LS-SVM2 separates discharge faults from thermal faults. LS-SVM3 classifies thermal faults as either low to medium temperature overheating or high temperature overheating, and LS-SVM4 classifies discharge faults as either partial discharge or energy discharge (low or high), while LS-SVM5 distinguishes low energy discharge from high energy discharge. Meanwhile, to improve training and diagnostic efficiency, the input of each sub-classifier contains the most effective feature parameters for identifying the fault, which are optimized by HGWO. The multi-class binary tree constructed in this paper is shown in Figure 2.
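The routing logic of the binary tree described above can be expressed in a few lines. In the sketch below, each sub-classifier is represented by a hypothetical callable that returns True or False for a feature vector. The dictionary keys and the function name `diagnose` are illustrative names, and in the actual model each callable would wrap one trained LS-SVM.

```python
from typing import Callable, Dict, Sequence

# A sub-classifier maps a feature vector to a yes/no decision.
BinaryClassifier = Callable[[Sequence[float]], bool]

def diagnose(x: Sequence[float], clf: Dict[str, BinaryClassifier]) -> str:
    """Route one sample through the five-node binary tree and return its label."""
    if not clf["is_fault"](x):                 # LS-SVM1: normal vs. fault
        return "N"
    if clf["is_discharge"](x):                 # LS-SVM2: discharge vs. thermal
        if clf["is_partial_discharge"](x):     # LS-SVM4: PD vs. energy discharge
            return "PD"
        # LS-SVM5: low energy vs. high energy discharge
        return "D2" if clf["is_high_energy"](x) else "D1"
    # LS-SVM3: low to medium vs. high temperature overheating
    return "T3" if clf["is_high_temperature"](x) else "T2"
```

With this structure, a sample passes through at most four sub-classifiers (for the D1 and D2 branches), and each sub-classifier can be trained only on the samples that reach its node.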

Results and Discussion
HGWO is used to optimize the parameters of the LS-SVM in the multi-classification model. The initial parameters of the HGWO algorithm are set as follows: the population size is 50, the maximum iteration number is 200, and the variable dimension is 2. In the differential evolution algorithm, the scaling factors M max and M min are 0.8 and 0.2, respectively, and the crossover probability factor CR is 0.2. The HGWO-LSSVM fault diagnosis model has been implemented on the MATLAB simulation platform on an 8-core Lenovo laptop (T470P, Lenovo, Beijing, China) with 8 GB of memory and a 2.8 GHz clock, running the Windows 10 Enterprise operating system (64-bit).

Traditional DGA methods, including the IEC three-ratio method, the Rogers ratio method, the Duval triangle method and the Dornenburg ratio method, are used to diagnose the testing data set for comparison. Table 5 shows the fault diagnosis accuracy of the different methods on the same samples. The Dornenburg ratio method shows the lowest accuracy. The accuracy of the Rogers ratio method is 63.84%, lower than that of the three-ratio method and the Duval triangle method, and the three-ratio method is more accurate than the Duval triangle method. Because the three-ratio and Duval triangle methods are derived from typical faults, they can fail when dealing with some complex faults. The accuracy of the proposed method is 97.45%. Compared with the traditional DGA methods, the LSSVM method shows a relatively good diagnostic accuracy, and when the LSSVM parameters are optimized by HGWO, the accuracy of the fault diagnosis improves substantially. However, misclassifications in the original DGA data collected from the literature may introduce errors into the accuracies reported in this paper.
To verify the superiority of the proposed method, the same sample data are used to construct fault diagnosis models with LSSVM, GWO-LSSVM, PSO-LSSVM, GA-LSSVM, etc. The results are compared with those of the method in this paper, as shown in Table 6 and Figure 3. It can be seen from Table 6 that:
(1) The average training time of the classifier in the proposed method is far less than that of the classifiers constructed by the other methods, indicating that the proposed method can greatly shorten the training time of the transformer fault diagnosis model, which improves the efficiency of fault diagnosis and increases the online diagnostic capability.
(2) On the same fault sample set, the proposed method achieves a higher average classification accuracy in the diagnosis of the various types of faults. In addition, compared with the other optimization algorithms, GWO-LSSVM achieves higher classification accuracy and faster convergence, which demonstrates the good performance of the GWO algorithm in parameter optimization.
(3) Compared with GWO-LSSVM, the proposed HGWO-LSSVM fault diagnosis model achieves higher fault classification accuracy and faster training, indicating that after the combination with the DE algorithm, population diversity is improved through operations such as crossover and mutation. At the same time, the DE algorithm forces GWO to jump out of the stagnation state when attacking the prey, thus improving the local optimum avoidance.
To further verify the improvement in model accuracy obtained with the optimal hybrid feature set, the dissolved gas concentrations and the optimal hybrid feature set were applied as inputs, respectively. The results are shown in Table 7. It can be seen from Table 7 that using the optimal hybrid feature set as the input significantly improves the accuracy of the fault diagnosis model, which means that data preprocessing and feature selection play an important role in the construction of the fault diagnosis model.
According to the results above, the fault diagnosis model proposed in this paper not only has higher diagnostic accuracy, but also consumes less time and has higher efficiency. However, misclassifications in the raw data may affect the accuracies reported in this paper.

Conclusions
In this paper, a transformer fault diagnosis model based on HGWO-LSSVM is proposed. First, transformer DGA data are collected from many publications and the optimal hybrid feature set is selected as the input of the model. KPCA is used for feature extraction. The hybrid grey wolf optimizer, which combines GWO with DE, is proposed to optimize the LSSVM and develop a fault diagnosis model. The proposed model is compared with traditional DGA methods and other models such as LSSVM, GWO-LSSVM, PSO-LSSVM and GA-LSSVM. The major conclusions of this paper are as follows:
(1) Compared with traditional DGA methods, the model proposed in this paper achieves better performance in transformer fault diagnosis, indicating the effectiveness of the proposed model.
(2) Compared with the other optimization algorithms, GWO-LSSVM achieves higher classification accuracy and faster convergence, which shows that the GWO algorithm performs better in parameter optimization than PSO and GA.
(3) The proposed HGWO-LSSVM model achieves higher fault classification accuracy and faster training than GWO-LSSVM, which verifies the effectiveness of combining DE with GWO.
(4) The dissolved gases and the hybrid DGA features are used as DGA feature sets, respectively. The accuracy of the fault diagnosis model based on the optimal hybrid feature set is nearly 10% higher than that based on the DGA gases alone, which shows that the optimal hybrid feature set can indeed improve the accuracy of the fault diagnosis model.
(5) The accuracies calculated in this paper, however, are significantly affected by misidentifications of faults in the DGA data collected from the literature. Therefore, to ensure that the reported accuracy of the model is reliable, it is very important to ensure the accuracy of the raw data.
At present, most transformer fault diagnosis models rarely take into account the correlation between transformer faults and factors other than the DGA data, which leads to a low generalization ability, so that the accuracy of fault diagnosis decreases on a new data set. In fact, in addition to its relationship with the DGA data, the failure of a transformer may also be related to the insulating oil type [79], voltage level, operating oil temperature, load, years of operation, and so on [80]. In this paper, the DGA data are arranged according to the voltage level, covering the four voltage levels of 110 kV, 220 kV, 500 kV and 750 kV. In future work, the DGA data will therefore be classified by voltage level to develop fault diagnosis models, from which the relationship between the voltage level and the fault type of the DGA data can be analyzed. Based on the results of that study, a more generalized model can be proposed, which can further improve the accuracy of transformer fault diagnosis.

(6) Model construction. The steps of the transformer fault diagnosis algorithm based on the HGWO-LSSVM model are as follows:

Figure 1. Flowchart of the fault diagnosis model based on the hybrid grey wolf optimized least square support vector machine (HGWO-LSSVM).

Figure 2. Binary tree of the transformer fault diagnosis model.

Figure 3. Comparison of accuracy for different fault diagnosis models.

Table 1. Distribution of transformer sample data.

Table 2. Partial field dissolved gas analysis (DGA) data with actual faults.

Table 5. Accuracy rate for the different diagnostic methods.

Table 6. Comparison of different fault diagnosis models.

Table 7. Comparison of using dissolved gases concentration and optimal hybrid DGA feature subset (OHFS) as input, respectively.