Prediction Model for Dissolved Gas Concentration in Transformer Oil Based on Modified Grey Wolf Optimizer and LSSVM with Grey Relational Analysis and Empirical Mode Decomposition

Abstract: The oil-immersed transformer is one of the most important components in the power system, and predicting the dissolved gas concentration in its oil is vital for the early detection of incipient faults. In this paper, a model for predicting the dissolved gas concentration in power transformers is proposed, based on the modified grey wolf optimizer and least squares support vector machine (MGWO-LSSVM) with grey relational analysis (GRA) and empirical mode decomposition (EMD), in which the influence of transformer load, oil temperature and ambient temperature on gas concentration is taken into consideration. Firstly, GRA is used to analyze the correlation between dissolved gas concentration and transformer load, oil temperature and ambient temperature, and the optimal feature set affecting gas concentration is extracted and selected as the input of the prediction model. Then, EMD is used to decompose the non-stationary series of dissolved gas concentration into stationary subsequences with different scales. Finally, the MGWO-LSSVM is used to predict each subsequence, and the predicted values of all subsequences are combined to get the final result. DGA samples from two transformers are used to verify the proposed method, which shows higher prediction accuracy, stronger generalization ability and better robustness in comparison with LSSVM, particle swarm optimization (PSO)-LSSVM, GWO-LSSVM, MGWO-LSSVM, EMD-PSO-LSSVM, EMD-GWO-LSSVM, EMD-MGWO-LSSVM, GRA-EMD-PSO-LSSVM and GRA-EMD-GWO-LSSVM.


Introduction
The transformer is the core equipment of the power system, and its running state is closely related to the reliability and stability of the power grid. A catastrophic transformer failure leads to a power outage and damages the power system, bringing huge economic loss and social harm. Therefore, it is very important to detect potential faults in transformers. Dissolved gas analysis (DGA) is widely used in the diagnosis of latent internal transformer faults. A failure of a power transformer usually results in degradation of the insulation, leading to the release of gases that dissolve in the oil. The composition of the dissolved gas is closely related to the abnormal state inside the transformer. The fault-related characteristic gases mainly include hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), ethane (C2H6), carbon monoxide (CO) and carbon dioxide (CO2). In the case of an electrical or thermal fault, the gas concentrations vary gradually and regularly over time. By analyzing the trend of the dissolved gas concentration, the operating state of the transformer can be assessed and the fault type determined, and the development of latent faults can be predicted in time, so as to avoid serious events and minimize losses. Therefore, it is of great significance to study prediction methods for dissolved gas concentration, and the prediction results can provide a basis for transformer state evaluation and fault prediction.
Recently, various artificial intelligence techniques have been used to develop time series prediction models and have achieved good results, such as the artificial neural network (ANN) [1], grey model (GM) [2], support vector machine (SVM) [3][4][5] and least squares support vector machine (LSSVM) [6,7]. However, these methods have some drawbacks. For example, the ANN trains slowly, easily falls into local minima and needs a large number of training samples. GM (1,1) can reveal the development law of a process from a small amount of incomplete information, but it only considers the development of a single gas and lacks a comprehensive consideration of multiple gases.
In addition, many improved prediction models have been put forward that can be applied to the prediction of dissolved gas in power transformer oil. Lu et al. used Gaussian process regression (GPR) to predict the dissolved gas concentration, in which the grey relational coefficients of the gas concentrations were analyzed by grey relational analysis to improve the performance of the model [8]. Using an improved fruit fly optimization algorithm (FFOA) to select the smoothing factor, Lin et al. proposed a combined prediction model based on kernel principal component analysis (KPCA) and generalized regression neural network (GRNN) to predict gas concentration, obtaining better data fitting and more accurate prediction [9]. Zheng et al. proposed an improved particle swarm optimization algorithm combined with LSSVM based on wavelet technology to predict the dissolved gas concentration; the comparison showed that the mean absolute percentage error (MAPE) of this method was significantly better than that of the other four methods [10]. Pereira et al. proposed a nonlinear autoregressive neural network model combined with the discrete wavelet transform to predict the concentration of dissolved gas, which showed better prediction results than the existing prediction models and the commonly used time series techniques [11]. Lin et al. proposed a transformer operation state prediction method based on long short-term memory and deep belief network (LSTM_DBN), which predicted the dissolved gas concentration by developing a long short-term memory (LSTM) model [12]. On the basis of the radial basis function neural network (RBFNN), back propagation neural network (BPNN), LSSVM with two different kernel functions and the grey model, Liu et al. proposed a combined prediction model based on cross entropy, in which the weight coefficient of each algorithm is determined by cross entropy theory, and analyzed its application [13]. Peimankar et al. proposed an integrated time series prediction algorithm based on an evolutionary multi-objective optimization algorithm for predicting dissolved gas concentration in power transformers, which has higher accuracy and reliability [14].
Energies 2020, 13, 422
Although the intelligent prediction methods mentioned above improve the accuracy of the prediction model, there is still room for improvement. Moreover, these methods do not consider the possible influence of transformer load, operating oil temperature, ambient temperature and other variables on gas concentration, which limits their practical application. At present, the commonly used intelligent optimization algorithms, including the genetic algorithm (GA), particle swarm optimization (PSO) and differential evolution (DE), have achieved good results in the parameter optimization of LSSVM. The grey wolf optimizer (GWO) is a swarm intelligence optimization algorithm proposed by Mirjalili et al. in 2014 [15]. Compared with GA, PSO and DE, it converges strongly, has few parameters and is easy to implement, and it has attracted the attention of many scholars [16][17][18]. However, because GWO converges slowly in the late stage, it easily falls into local optima. The modified grey wolf optimizer (MGWO) proposed in this paper adjusts the balance between exploration and exploitation and assigns more weight to the fittest grey wolves, so as to find better new positions during the iterations. The MGWO is applied to the parameter optimization of LSSVM. In the prediction model, the influence of the transformer oil temperature, load and ambient temperature on the concentration of dissolved gas is taken into consideration. Firstly, grey relational analysis (GRA) is used to evaluate the correlation between gas concentration and transformer load, oil temperature and ambient temperature, and the main factors influencing the gas concentration are extracted as the input of the model. Then, the non-stationary DGA series is decomposed into subsequences with different scales by empirical mode decomposition (EMD). Finally, the MGWO-LSSVM model is used to predict the subsequences of each gas, the predictions are combined to obtain the final dissolved gas concentration, and the validity and superiority of the model are verified.
This paper is organized as follows: In Section 2, the basic theory of the GRA-EMD-MGWO-LSSVM model is introduced. The GRA-EMD-MGWO-LSSVM model is proposed in Section 3. In Section 4, the performance of GRA-EMD-MGWO-LSSVM model is verified by comparing with other models and Section 5 provides the conclusion of the work and discussion of potential future work.

Grey Relational Analysis
Grey relational analysis (GRA) is an analysis method based on grey system theory [19]. Its basic idea is to evaluate the degree of correlation between factors according to how similar the geometric shapes of their change curves are. By quantitatively analyzing the development trend of a dynamic process, the method compares the geometric relations of the time series of the relevant statistical data and calculates the grey relational degree of each factor. The concrete steps of GRA are as follows:
Step 1 Let the reference time series be $X_0 = \{X_0(k) \mid k = 1, 2, \ldots, n\}$ and the comparison sequences be $X_i = \{X_i(k) \mid k = 1, 2, \ldots, n\}$, $i = 1, 2, \ldots, m$. The original data are made dimensionless by the initial value transformation of Formula (1):
$$x_i(k) = \frac{X_i(k)}{X_i(1)}, \quad i = 0, 1, \ldots, m;\ k = 1, 2, \ldots, n \tag{1}$$
Step 2 Calculate the grey relational coefficient. The grey relational coefficient is calculated as follows:
$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|} \tag{2}$$
where $\xi_i(k)$ is the grey relational coefficient of $x_0(k)$ and $x_i(k)$, which reflects how close the two sequences are at point $k$. The constant $\rho$ is the resolution coefficient with value range (0, 1); a smaller $\rho$ gives greater discrimination. To increase the difference between correlation coefficients, $\rho = 0.5$ is usually taken.
Step 3 Calculate the grey relational degree. By integrating the grey relational coefficients of all points, the grey relational degree of $X_i$ and $X_0$ can be calculated as follows:
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k) \tag{3}$$
where $r_i$ reflects the degree of correlation between $X_i$ and $X_0$. A larger $r_i$ means a higher degree of correlation, a closer relationship and a more similar development trend and rate.
Step 4 Grey relational degree ranking. The grey relational degrees of the influencing factor sequences with respect to the system behavior characteristic sequence are ranked from large to small.
Since there is no definite qualitative or quantitative description of the relationship between dissolved gas concentration, oil temperature, transformer load and ambient temperature, and the mutual restriction among the gases is uncertain, the grey relational degree is used to measure the affinity among all factors and to identify the main factors that influence each dissolved gas concentration.
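The four GRA steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `grey_relational_degrees`, the use of the initial-value transformation (dividing by the first sample, which must be non-zero) and the toy series are assumptions made for the example, and the numbers are not data from the paper.

```python
def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Return the grey relational degree of each comparison sequence."""
    # Step 1: initial-value transformation (assumes non-zero first samples).
    x0 = [v / reference[0] for v in reference]
    xs = [[v / seq[0] for v in seq] for seq in comparisons]

    # Absolute differences |x0(k) - xi(k)| for every sequence i and point k.
    deltas = [[abs(a - b) for a, b in zip(x0, xi)] for xi in xs]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every sequence coincides with the reference
        return [1.0 for _ in comparisons]

    degrees = []
    for row in deltas:
        # Step 2: grey relational coefficient at every point k.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Step 3: grey relational degree = mean coefficient.
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Illustrative, made-up series (not measurements from the paper).
gas = [10.0, 12.0, 14.0, 16.0]
factors = {
    "load": [20.0, 24.0, 28.0, 32.0],               # proportional to the gas series
    "ambient temperature": [5.0, 5.0, 5.0, 5.0],    # flat series
}
degrees = grey_relational_degrees(gas, list(factors.values()))

# Step 4: rank the factors from the largest degree to the smallest.
ranking = sorted(zip(factors, degrees), key=lambda pair: pair[1], reverse=True)
```

In this toy case the proportional series gets a degree of 1 and the flat series a lower one, so `ranking` lists the load first.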

Empirical Mode Decomposition
Empirical mode decomposition (EMD) [20][21][22] is an adaptive signal decomposition method based on the local characteristics of a signal. It retains the multi-resolution advantage of the wavelet transform while avoiding the difficulty of choosing a wavelet basis and a decomposition scale, so it is more suitable for the analysis of nonlinear, non-stationary signals. EMD assumes that any complex signal is composed of simple intrinsic mode functions (IMF) that are mutually independent. EMD separates the different scales or trends in a time series step by step and produces a series of data sequences, each with its own characteristic scale, by which the non-stationary nonlinear data is transformed into stationary linear data. Compared with the original data sequence, the decomposed sequences are more regular, which helps to identify hidden relationships and can improve the prediction accuracy [23][24][25]. The steps of EMD for a given time series are as follows:
Step 1: Determine the upper envelope $e_{up}(t)$ and the lower envelope $e_{low}(t)$ from the local maxima and local minima of the time series $x(t)$, and calculate the mean envelope $m_1(t)$:
$$m_1(t) = \frac{e_{up}(t) + e_{low}(t)}{2} \tag{4}$$
Step 2: Subtract $m_1(t)$ from $x(t)$ to get $h_1(t) = x(t) - m_1(t)$. Treat $h_1(t)$ as a new signal $x(t)$ and repeat Step 1. After $k$ rounds of sifting, when $h_{1k}(t)$ meets the IMF conditions, it is taken as the first IMF component $c_1(t)$ of the time series, as shown in (5). It contains the shortest periodic component of the original sequence.
$$c_1(t) = h_{1k}(t) \tag{5}$$
Step 3: After separating the first IMF component from the time series $x(t)$, the remaining component $r_1(t)$ of $x(t)$ can be obtained as follows:
$$r_1(t) = x(t) - c_1(t) \tag{6}$$
Step 4: Take $r_1(t)$ as a new time series and repeat Steps 1 to 3 to obtain a series of qualified IMF components $c_i(t)$ and the residual $r_n(t)$. Then, the original time series $x(t)$ can be described by the IMF components and the residual component as follows:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t) \tag{7}$$
From Step 1 to Step 4, the original time series is decomposed into subsequences of different frequencies, namely the IMFs and the residual r. Then, the trend of each subsequence is predicted, and the predictions of the subsequences are superimposed to obtain the prediction of the original sequence.
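The sifting procedure can be illustrated with a deliberately simplified Python sketch. Two simplifications are assumptions of this example and not part of the method described above: the envelopes are piecewise-linear instead of the cubic splines normally used in EMD, and sifting stops after a fixed number of passes instead of testing the IMF conditions. The function names and the test signal are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of the interior local maxima and minima of x."""
    maxima, minima = [], []
    for k in range(1, len(x) - 1):
        if x[k - 1] < x[k] >= x[k + 1]:
            maxima.append(k)
        elif x[k - 1] > x[k] <= x[k + 1]:
            minima.append(k)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knots, held flat at both ends."""
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            for left, right in zip(knots, knots[1:]):
                if left <= t <= right:
                    w = (t - left) / (right - left)
                    env.append((1 - w) * x[left] + w * x[right])
                    break
    return env


def emd(x, max_imfs=5, sifts=10):
    """Decompose x into a list of IMF-like components and a residual."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: treat the rest as the trend
        h = list(residual)
        for _ in range(sifts):
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper, lower = envelope(h, maxima), envelope(h, minima)
            mean = [(u + v) / 2 for u, v in zip(upper, lower)]   # Step 1
            h = [a - m for a, m in zip(h, mean)]                 # Step 2
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]          # Step 3
    return imfs, residual


# Synthetic test signal: an oscillation riding on a slow linear trend.
signal = [math.sin(0.9 * k) + 0.05 * k for k in range(60)]
imfs, residual = emd(signal)
rebuilt = [sum(c[k] for c in imfs) + residual[k] for k in range(len(signal))]
```

Whatever the quality of the individual components, summing them with the residual reproduces the input series, which is the property the prediction model relies on when it superimposes the component forecasts.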

Standard Grey Wolf Optimization
The grey wolf optimizer (GWO) [15] is a swarm intelligence algorithm proposed by Mirjalili et al. and inspired by the predation behavior of grey wolves. It mimics the hunting behavior and social leadership of grey wolves in nature and uses four types of grey wolves to simulate the social hierarchy. The three wolves with the best fitness are alpha (α), beta (β) and delta (δ), while the remaining wolves are omega (ω). The GWO algorithm realizes global optimization by imitating the predation behavior of grey wolves, namely encircling, hunting and attacking. The optimization process is mainly guided by the three best solutions (α, β and δ) in each generation of the population.
The mathematical model of grey wolves encircling their prey is as follows:
$$D = |C \cdot X_p(t) - X(t)| \tag{8}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{9}$$
$$A = 2a \cdot r_1 - a \tag{10}$$
$$C = 2 \cdot r_2 \tag{11}$$
where $t$ is the current iteration number; $A$ and $C$ are coefficient vectors; $X_p$ is the position vector of the prey; $X(t)$ is the position vector of the current grey wolf; $a$ is the convergence factor, which decreases linearly from 2 to 0 over the iterations; and $r_1$ and $r_2$ are random vectors in (0, 1). Assuming that α, β and δ are capable of identifying potential prey positions, the three best wolves in the current population are retained during each iteration and the positions of the other search agents are updated based on their positions. The mathematical model of this behavior can be expressed as follows:
$$D_\alpha = |C_1 \cdot X_\alpha - X|, \quad D_\beta = |C_2 \cdot X_\beta - X|, \quad D_\delta = |C_3 \cdot X_\delta - X| \tag{12}$$
where $X_\alpha$, $X_\beta$ and $X_\delta$ are the current positions of the α, β and δ wolves, respectively; $X$ is the current position of the ω wolf; $D_\alpha$, $D_\beta$ and $D_\delta$ are the distances between the current candidate and the three best wolves; and $C_1$, $C_2$ and $C_3$ are random vectors. The position of the prey can be determined from the α, β and δ positions using the following equations:
$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \tag{13}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{14}$$
where $X_1$, $X_2$ and $X_3$ are the prey positions estimated from the α, β and δ wolves, respectively, and $X(t+1)$ is the final prey position determined from $X_1$, $X_2$ and $X_3$.
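One iteration of the standard GWO update can be sketched as follows. This is a hedged illustration for a minimization problem: the function name `gwo_step`, the list-of-lists representation of the pack and the per-dimension sampling of the random coefficients are choices made for this example, not details given in the paper.

```python
import random


def gwo_step(wolves, fitness, a, rng=random):
    """Move every wolf once, guided by the three fittest wolves (alpha, beta, delta)."""
    # Minimization: the lowest fitness values identify alpha, beta and delta.
    leaders = sorted(wolves, key=fitness)[:3]
    moved = []
    for x in wolves:
        new_x = []
        for d in range(len(x)):
            candidates = []
            for leader in leaders:
                A = 2 * a * rng.random() - a          # coefficient A, in [-a, a)
                C = 2 * rng.random()                  # coefficient C, in [0, 2)
                D = abs(C * leader[d] - x[d])         # distance to this leader
                candidates.append(leader[d] - A * D)  # X1, X2 or X3
            new_x.append(sum(candidates) / 3)         # plain average of X1, X2, X3
        moved.append(new_x)
    return moved


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


pack = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
# With a = 0 the coefficient A vanishes, so every wolf lands on the leaders' mean.
collapsed = gwo_step(pack, sphere, a=0.0)
```

The limiting case `a = 0` shows why a shrinking convergence factor turns the search from exploration into exploitation: the random term disappears and the pack contracts onto the three leaders.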

Modified Grey Wolf Optimization
Although the GWO algorithm has shown its advantages in many fields, it easily falls into local optima and its convergence speed and accuracy are relatively low, which limits its application. Therefore, this paper modifies the original grey wolf optimization algorithm as follows.
In the process of the grey wolf population approaching the target, the position update Equation (14) gives equal importance to α, β and δ. It ignores the different characteristics of the three wolves, so the leading role of the best wolf is not well reflected. Considering that the contribution of each grey wolf is different, different weights are given to grey wolves of different social hierarchy [26], and the weighted position update of Equation (15) was adopted in place of Equation (14).
In an optimization algorithm, more exploration of the search space lowers the probability of getting trapped in local optima, whereas too much exploitation means too little randomness, and the algorithm may not reach the global optimum. Hence, a balance between exploration and exploitation should be maintained during the iterations. In the original GWO, the value of $a$ decreases linearly from 2 to 0 during the iterations, and the update equation is as follows:
$$a = 2\left(1 - \frac{t}{\mathrm{MaxIter}}\right) \tag{16}$$
where MaxIter is the maximum number of iterations and $t$ is the current iteration number. With this update, half of the iterations are devoted to exploration and the other half to exploitation. In this paper, Formula (17) was used for the attenuation of $a$ during the iterations, with $a$ still decreasing from 2 to 0 [27]:
$$a = 2\left(1 - \frac{t^2}{\mathrm{MaxIter}^2}\right) \tag{17}$$
With the decay function in Formula (17), about 70% of the iterations are used for exploration and 30% for exploitation. In this way, more iterations are used for exploration and fewer for exploitation, so as to achieve better performance than the original GWO algorithm [27]. The pseudo-code of the MGWO (Algorithm 1) is presented in the following form.
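Algorithm 1. Pseudo-code of the MGWO.
(1) Initialize the grey wolf population and the parameters a, A and C.
(2) Calculate the fitness of each search agent.
(3) Find α, β, and δ as the first three best solutions based on their fitness values.
(4) While t < Max Iter:
- For each search agent, update the current wolf's position according to Equation (15).
- Update a as in Equation (17).
- Update A and C as in Equations (10) and (11).
- Calculate the fitness of all search agents and update α, β, and δ.
- Set t = t + 1.
(5) Return the position of α as the best solution.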
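A compact Python sketch of this loop is given below under clearly stated assumptions. The weights `(0.5, 0.3, 0.2)` for α, β and δ are hypothetical placeholders, since the exact weighting of Equation (15) is not reproduced here; the quadratic decay `2 * (1 - t**2 / max_iter**2)` is assumed as the form of Formula (17) because it decreases from 2 to 0 and yields the stated 70%/30% split; and counting an iteration as exploration when a > 1 is the interpretation used for that split. Function names and the sphere objective are illustrative only.

```python
import random


def decay_linear(t, max_iter):
    """Convergence factor of the original GWO: linear from 2 to 0."""
    return 2 * (1 - t / max_iter)


def decay_quadratic(t, max_iter):
    """Assumed MGWO decay: stays above 1 for roughly 70% of the iterations."""
    return 2 * (1 - t ** 2 / max_iter ** 2)


def exploration_share(decay, max_iter):
    """Fraction of iterations with a > 1, i.e. where |A| can exceed 1."""
    return sum(decay(t, max_iter) > 1 for t in range(max_iter)) / max_iter


def mgwo(fitness, bounds, n_wolves=20, max_iter=100,
         weights=(0.5, 0.3, 0.2), seed=0):
    """Minimize `fitness` inside the box `bounds`; the weights are hypothetical."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)
    for t in range(max_iter):
        a = decay_quadratic(t, max_iter)
        leaders = sorted(wolves, key=fitness)[:3]          # alpha, beta, delta
        moved = []
        for x in wolves:
            new_x = []
            for d, (lo, hi) in enumerate(bounds):
                pos = 0.0
                for w, leader in zip(weights, leaders):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pos += w * (leader[d] - A * abs(C * leader[d] - x[d]))
                new_x.append(min(max(pos, lo), hi))        # stay inside the box
            moved.append(new_x)
        wolves = moved
        best = min(wolves + [best], key=fitness)           # keep the best ever seen
    return best


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


share_linear = exploration_share(decay_linear, 1000)
share_quadratic = exploration_share(decay_quadratic, 1000)
best = mgwo(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

For LSSVM tuning, `fitness` would map a candidate pair (γ, σ) to a validation error; that wiring is left out to keep the sketch short.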

Least Square Support Vector Machine
SVM [28] is a machine learning technique based on the structural risk minimization principle, which has shown excellent learning performance and generalization ability in regression analysis, pattern recognition, fault diagnosis and time series trend prediction. LSSVM is an extension of the standard SVM. It replaces the inequality constraints of the traditional SVM with equality constraints and the slack variables with the squared training error, thus transforming the quadratic programming problem into a system of linear equations, which gives a faster solution and better real-time performance. The mathematical model of the least squares support vector machine is as follows [7]: Given a training set $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an input vector and $y_i$ the corresponding output, the optimal decision function is constructed in the feature space as follows:
$$f(x) = \omega^T \varphi(x) + b \tag{18}$$
where $\omega$ is the weight vector, $b$ is the bias and $\varphi(\cdot)$ is the nonlinear mapping into the feature space. The structural risk minimization (SRM) principle is used to calculate the parameters $\omega$ and $b$ by minimizing Formula (19):
$$\min_{\omega, b, \xi} J(\omega, \xi) = \frac{1}{2}\|\omega\|^2 + \frac{1}{2}\gamma \sum_{i=1}^{N} \xi_i^2 \tag{19}$$
Equation (19) is subject to the equality constraint:
$$y_i = \omega^T \varphi(x_i) + b + \xi_i, \quad i = 1, 2, \ldots, N \tag{20}$$
In Equation (19), $\|\omega\|^2$ controls the complexity of the model, $\gamma \sum_{i=1}^{N} \xi_i^2$ is the error control term, $\gamma$ is the regularization parameter and $\xi_i$ is the fitting error of sample $i$. By introducing the Lagrange multipliers $\alpha_i$, the unconstrained optimization problem can be obtained as follows:
$$L(\omega, b, \xi, \alpha) = \frac{1}{2}\|\omega\|^2 + \frac{1}{2}\gamma \sum_{i=1}^{N} \xi_i^2 - \sum_{i=1}^{N} \alpha_i \left[\omega^T \varphi(x_i) + b + \xi_i - y_i\right] \tag{21}$$
According to the KKT (Karush-Kuhn-Tucker) optimality conditions, the partial derivatives of $L$ with respect to $\omega$, $b$, $\xi_i$ and $\alpha_i$ are set to zero, and the LSSVM model is finally obtained:
$$f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \tag{22}$$
where $K(x, x_i)$ is the kernel function. Because the relationship among the dissolved gas concentrations in transformer oil is nonlinear, the radial basis function (RBF) kernel, which is suitable for nonlinear problems and has few kernel parameters, is selected as the kernel function:
$$K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) \tag{23}$$
where $\sigma^2$ is the kernel parameter. In this paper, the MGWO was adopted to optimize the LSSVM model parameters $\gamma$ and $\sigma$ to improve the prediction accuracy of the model. The steps of the MGWO-LSSVM model for time series prediction were as follows:
Step 1 Normalize the sample data by linearly mapping the original data to the interval [−1, 1], and use the result as the training data set.
Step 2 Train on the training data with different parameter values and record the training results for each parameter setting. In this step, the parameters of the prediction model for each decomposed IMF sequence should be optimized.
Step 3 Select the optimal parameters according to the training error and the overall performance of the different parameter settings in the training results.
Step 4 Generate the trained model by learning the training data with the selected parameters, and then feed in the test data to check the prediction results. If the prediction results do not meet the accuracy requirement, return to Step 3 and re-select the parameters; otherwise, proceed to Step 5.
Step 5 After the parameters are determined, use the data sequence for prediction, and finally conduct error analysis.
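To make the "system of linear equations" concrete, the following dependency-free Python sketch fits an LSSVM regressor for given γ and σ. It is an illustration under assumptions: the block system [[0, 1ᵀ], [1, K + I/γ]] · [b, α]ᵀ = [0, y]ᵀ is the standard LSSVM form that follows from a ½γΣξᵢ² error term; the RBF kernel is written with 2σ² in the denominator, which is one common convention; and all function names and the toy data are hypothetical. In the paper's workflow γ and σ would be proposed by the optimizer rather than fixed by hand.

```python
import math


def scale_to_unit(values):
    """Step 1: map values linearly onto [-1, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


def rbf(u, v, sigma):
    """RBF kernel exp(-||u - v||^2 / (2 sigma^2))."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / (2 * sigma ** 2))


def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def lssvm_fit(X, y, gamma, sigma):
    """Return (alpha, b) by solving the LSSVM linear system."""
    n = len(X)
    M = [[0.0] + [1.0] * n]                      # first row: sum(alpha) = 0
    for i in range(n):
        M.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    z = solve(M, [0.0] + list(y))
    return z[1:], z[0]


def lssvm_predict(x, X, alpha, b, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b


# Toy regression problem (made-up data): y = x^2 on five points.
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [x[0] ** 2 for x in X]
alpha, b = lssvm_fit(X, y, gamma=1e6, sigma=0.5)
fitted = [lssvm_predict(x, X, alpha, b, sigma=0.5) for x in X]
scaled = scale_to_unit([2.0, 4.0, 6.0])
```

With a large γ the fit nearly interpolates the training points; a smaller γ trades training accuracy for smoothness, which is why γ and σ are tuned jointly.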

Prediction Model Based on GRA-EMD-MGWO-LSSVM
The time series data of dissolved gas concentration in transformer oil are strongly nonlinear and non-stationary. In view of the prominent advantages of EMD in processing non-stationary data, this paper proposes a prediction model for dissolved gas concentration that integrates grey relational analysis, empirical mode decomposition and the least squares support vector machine optimized by the modified grey wolf optimizer. The prediction steps are as follows:
Step 1 Sample collection: The DGA data are collected, including the sampling time, oil temperature, load, ambient temperature and gas concentrations. The gas concentrations, oil temperature, load and ambient temperature are taken as the comparison sequences, and the gas concentration to be predicted as the reference sequence.
Step 2 Grey relational analysis: The initial value transformation is performed on the original data according to Formula (1), the grey relational coefficient is calculated according to Formula (2), and the grey relational degree is calculated according to Formula (3). According to the grey incidence matrix, the main influencing factors of each gas concentration are screened out, and the factors with a weak relational degree are eliminated.
Step 3 EMD processing: The EMD method is used to decompose the time series of dissolved gas concentration into the IMF components c i (t) and the residual r n (t).
Step 4 MGWO-LSSVM model prediction: An LSSVM regression model is established for each IMF component c i (t) and for r n (t) to obtain the predicted values of each decomposed sequence, with MGWO used to select the optimal parameters.
Step 5 The predicted values of all components are superimposed to obtain the predicted values of gas concentration.
Step 6 Verification of prediction results: The predictions are compared with the actual data, the error indices are calculated and an error analysis is conducted.
The proposed prediction model based on MGWO-LSSVM combined with EMD and GRA is shown in Figure 1. It consists of four main parts. Firstly, GRA is used to conduct a correlation analysis of the original DGA data. Secondly, EMD is used to decompose the DGA data to get the IMF components of each gas concentration. Thirdly, MGWO is used to optimize the LSSVM model that predicts the trend of each IMF component. Finally, the prediction result is evaluated.
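The way the parts fit together can be sketched with two small helper functions. Everything named here is hypothetical: `select_factors` applies a threshold on the grey relational degree (0.5 is used as the default, in line with the 0.50 cut-off reported in the case study), `forecast_next` superimposes per-component forecasts, and `persistence` is only a stand-in for the trained MGWO-LSSVM predictor. The numbers are made up for illustration.

```python
def select_factors(degrees, threshold=0.5):
    """Keep the factors whose grey relational degree exceeds the threshold."""
    return [name for name, degree in degrees.items() if degree > threshold]


def forecast_next(components, predict_component):
    """Forecast every sub-sequence separately and superimpose the forecasts."""
    return sum(predict_component(series) for series in components)


def persistence(series):
    """Stand-in predictor: repeat the last observed value of the sub-sequence."""
    return series[-1]


# Made-up grey relational degrees and decomposed components (IMF1, IMF2, residual).
degrees = {"load": 0.46, "oil temperature": 0.70, "ambient temperature": 0.65}
inputs = select_factors(degrees)
components = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.25], [10.0, 10.0, 10.0]]
prediction = forecast_next(components, persistence)
```

Replacing `persistence` with a model trained per component, and feeding it the selected factors as extra inputs, gives the structure of the full model.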
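To evaluate the prediction accuracy of the proposed algorithm, the mean absolute percentage error (MAPE) $\eta_{MAPE}$, the root mean square error (RMSE) $\eta_{RMSE}$ and the maximum absolute percentage error $\eta_{max}$ are used [29,30]. They are defined as:
$$\eta_{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \times 100\% \tag{24}$$
$$\eta_{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{25}$$
$$\eta_{max} = \max_{i} \left(\frac{|y_i - \hat{y}_i|}{y_i}\right) \times 100\% \tag{26}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of samples; max takes the maximum of the relative errors $|y_i - \hat{y}_i| / y_i$. A large average relative error means a low prediction accuracy.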
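The three error indices translate directly into code. The sketch below is a minimal Python version; the function names and the sample values are illustrative, and the actual values are assumed to be non-zero because they appear in the denominators.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100 * sum(abs(y - p) / abs(y) for y, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, in the unit of the data."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def max_percentage_error(actual, predicted):
    """Largest single relative error, in percent."""
    return 100 * max(abs(y - p) / abs(y) for y, p in zip(actual, predicted))


# Made-up concentrations (µL/L) and predictions, for illustration only.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]
scores = (mape(actual, predicted), rmse(actual, predicted),
          max_percentage_error(actual, predicted))
```

MAPE and the maximum percentage error are scale-free, whereas RMSE keeps the unit of the gas concentration, so the three indices are best read together.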

Case Study and Analysis
In this paper, the prediction model based on GRA-EMD-MGWO-LSSVM was implemented by the MATLAB simulation platform (R2018b, MathWorks, Natick, Massachusetts, USA) on an 8-core Lenovo laptop (T470P, Lenovo, Beijing, China) with 8 GB memory and 2.8 GHz clock, running Windows 10 enterprise operating system (64-bit). In addition, two examples were given to test and verify the model.

Prediction Example 1
In this part, the DGA data came from a 750 kV transformer of the State Grid Corporation of China. The equipment model was BKD-120000/800. Starting on 9 January 2012, DGA samples were taken every three days, and the oil temperature, load and ambient temperature were recorded at each sampling.

Grey Relational Analysis
The grey relational analysis results of the DGA data are shown in Table 2; the grey relational degree of each factor ranged from 0.46 to 0.93. It can be seen from Table 2 that:
(1) There was a strong correlation between the load and the gas concentrations of H2 and CO, and a somewhat weaker correlation between the load and CH4, C2H2, C2H4, CO2 and total hydrocarbon. The correlation between the load and the gas concentration of C2H6 was weak.
(2) The correlation between oil temperature and each gas concentration was strong, with the grey relational degree ranging from 0.65 to 0.85.
(3) The grey relational degree of ambient temperature and gas concentration was between 0.65 and 0.86, indicating that there was a strong correlation between ambient temperature and each gas.
(4) The grey relational degree between oil temperature and ambient temperature was 0.85, which indicates that the ambient temperature was positively correlated with the oil temperature range.
(5) The grey relational degree of the total hydrocarbon and the hydrocarbon gases was between 0.77 and 0.93, which verified the strong correlation between the hydrocarbon gases and total hydrocarbon, since the total hydrocarbon concentration is the sum of the four hydrocarbon gas concentrations.

Empirical Mode Decomposition
EMD was performed on the time series data of dissolved gas concentration in Table 1, and the EMD results of H2, CH4, C2H6, C2H2, C2H4, CO, CO2 and total hydrocarbon are shown in Figure 2. As can be seen from Figure 2a

Results and Discussion
The factors with a grey relational degree greater than 0.50 were taken as the main factors to establish the MGWO-LSSVM prediction model. Taking H2 as an example, the IMFs and residual of H2 together with the transformer load, oil temperature and ambient temperature were used as the input of the model. The data from 9 January 2012 to 13 June 2012 were used as the training set, and those from 17 June 2012 to 3 July 2012 as the test set. The prediction results of H2 by the GRA-EMD-MGWO-LSSVM model are shown in Figure 3.
In order to verify the validity of the proposed model, it was compared with several other models. The comparison results are shown in Tables 3 and 4. For the prediction models in Table 3, only the dissolved gas concentration data were used as the input, without EMD processing. In Table 4, the dissolved gas concentration data processed by EMD were used as the input of EMD-PSO-LSSVM, EMD-GWO-LSSVM and EMD-MGWO-LSSVM, while for GRA-EMD-PSO-LSSVM, GRA-EMD-GWO-LSSVM and GRA-EMD-MGWO-LSSVM, the transformer load, oil temperature, ambient temperature and the EMD processing results were used as the input. It can be seen from Tables 3 and 4 that:
(1) Compared with LSSVM, PSO-LSSVM and GWO-LSSVM, the MGWO-LSSVM model obtained higher prediction accuracy, which shows its effectiveness.
(2) After EMD processing, EMD-PSO-LSSVM, EMD-GWO-LSSVM and EMD-MGWO-LSSVM each predicted better than their counterparts PSO-LSSVM, GWO-LSSVM and MGWO-LSSVM. The EMD method decomposes the concentration time series of dissolved gas in oil into a series of stationary data sequences, each with its own characteristic scale, which reduces the influence of the nonlinearity and non-stationarity of the dissolved gas concentration data on the prediction results and improves the accuracy of the prediction model.
(3) By using GRA to analyze the correlation between dissolved gas concentration and transformer load, oil temperature and ambient temperature, the factors that influence each gas concentration were extracted and irrelevant information was reduced, which improved the prediction accuracy.
(4) Compared with the other prediction models, the GRA-EMD-MGWO-LSSVM model proposed in this paper achieved higher prediction accuracy.

Prediction Example 2
In this part, the raw DGA data were collected from the main transformer of a 750 kV substation of the State Grid Corporation of China to verify the proposed model. The transformer was put into operation in April 2010 and the equipment model was ODFPS-700000/750GY. From April 21, 2011, trace amounts of acetylene appeared in the transformer, and the gas concentrations, oil temperature, load and ambient temperature were recorded once a day in the follow-up tests. Table 5 shows the data of the transformer from April 22, 2011 to May 18, 2011, where A1 to A11 represent H2 (µL/L), CH4 (µL/L), C2H6 (µL/L), C2H2 (µL/L), C2H4 (µL/L), CO (µL/L), CO2 (µL/L), total hydrocarbon (µL/L), oil temperature (°C), load (MW) and ambient temperature (°C), respectively.

Grey Relational Analysis
The grey relational degree of each factor was calculated according to GRA, as shown in Table 6; the grey relational degree ranged from 0.43 to 0.82. It can be seen from Table 6 that:
(1) The grey relational degree of load and gas concentration was between 0.43 and 0.62, indicating a strong correlation between the load and the gas concentrations of CH4, C2H2, C2H4 and CO2, while the correlation between the load and H2 and C2H6 was weak.
(2) The grey relational degree between oil temperature and each gas concentration was between 0.58 and 0.69, which indicates that oil temperature had a strong correlation with each gas concentration.
(3) The grey relational degree of ambient temperature and gas concentration was between 0.64 and 0.70, indicating that there was also a strong correlation between ambient temperature and gas concentration.
(4) The grey relational degree between oil temperature and ambient temperature was 0.78, which indicates that the ambient temperature was positively correlated with the oil temperature range. Lower ambient temperature was conducive to heat dissipation, resulting in lower oil temperature.
(5) The grey relational degree of the total hydrocarbon and the hydrocarbon gases was between 0.68 and 0.82, which verified the strong correlation between the hydrocarbon gases and total hydrocarbon, since the total hydrocarbon concentration is the sum of the four hydrocarbon gas concentrations.

Empirical Mode Decomposition
The EMD results of H2, CH4, C2H6, C2H2, C2H4, CO, CO2 and total hydrocarbon are shown in Figure 4. The original nonlinear H2 sequence was decomposed into two less volatile IMF components (IMF1-IMF2) and one residual component, as shown in Figure 4a.

Results and Discussion
The IMFs and the residual of each gas were each predicted by the MGWO-LSSVM model. The data from June 7, 2010 to February 7, 2012 were used as the training set, and the data from February 20, 2012 to July 17, 2012 as the test set. The prediction result of H2 by the GRA-EMD-MGWO-LSSVM model is shown in Figure 5. The prediction results of the other prediction models are compared with those of the proposed model in Tables 7 and 8.

It can be seen from Tables 7 and 8 that: (1) The MGWO-LSSVM model had higher prediction accuracy, which verified the effectiveness of the MGWO algorithm. (2) The prediction models with the EMD method showed higher prediction accuracy, which was consistent with the results shown in Table 5. (3) The prediction models with the GRA method also showed better performance, which was consistent with the results shown in Table 5. (4) Compared with the other prediction models, the GRA-EMD-MGWO-LSSVM model proposed in this paper achieved higher prediction accuracy.
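The per-component prediction step relies on LSSVM regression, which reduces training to solving one linear system. The sketch below shows that system in plain Python under stated assumptions: an RBF kernel, hand-picked values for the regularization parameter gamma and kernel width sigma (no optimizer such as MGWO is involved), and hypothetical function names and toy data. It is not the authors' implementation.

```python
import math


def rbf(x, z, sigma):
    """RBF kernel exp(-||x - z||^2 / (2 sigma^2)); kernel choice is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve the LSSVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X_train))


# Toy data (hypothetical): one input feature, four samples.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0.0, 0.8, 0.9, 0.1]
b_toy, alpha_toy = lssvm_fit(X_toy, y_toy, gamma=100.0, sigma=1.0)
y_hat = lssvm_predict(X_toy, b_toy, alpha_toy, [1.5], sigma=1.0)
```

In a decomposition-based pipeline, one such model would be fitted per IMF and for the residual, and the final forecast would be the sum of the component forecasts.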

Conclusions
This paper presented a GRA-EMD-MGWO-LSSVM model for predicting the dissolved gas concentration in the oil of oil-immersed transformers. Firstly, the original time series data of dissolved gas concentration in oil were analyzed by the grey relational analysis method to extract the optimal feature set affecting gas concentration, and the concentration data of each dissolved gas were decomposed by empirical mode decomposition. Then, the least squares support vector machine optimized by the modified grey wolf optimizer was used to predict the subsequences of each gas. Finally, the predicted subsequences were recombined to obtain the final prediction results. The prediction model proposed in this paper was compared with LSSVM, PSO-LSSVM, GWO-LSSVM, MGWO-LSSVM, EMD-PSO-LSSVM, EMD-GWO-LSSVM, EMD-MGWO-LSSVM, GRA-EMD-PSO-LSSVM, and GRA-EMD-GWO-LSSVM by using DGA samples from two transformers. The main conclusions are as follows: (1) The modification strategy improved the performance of the original GWO, which increased the accuracy of the prediction model. (2) The introduction of GRA and EMD into the prediction model greatly improved its accuracy. (3) The effect of transformer load, oil temperature and ambient temperature on the dissolved gas concentration was explored, and these factors were considered in the model, which improved the performance of the prediction model. (4) The model proposed in this paper showed high prediction accuracy, stronger generalization ability and robustness. (5) The model proposed in this paper maintained high prediction accuracy and showed strong generalization ability for different DGA data samples.
Although the proposed model showed good prediction performance, there were some differences in prediction accuracy across DGA data samples. For transformers in different operating states, oil temperature, load and ambient temperature had different effects on the DGA data, which needs more research in the future. In addition, the concentration of dissolved gas in oil is also related to the transformer operating life, operating state and other factors [31]. Further field tests indicate that, for some transformers with a long operating life, the dissolved gas concentration increases with load due to equipment aging. Therefore, in future work, more operating data will be collected, and the relationship between dissolved gas concentration and the transformer operating life, operating state and other factors will be analyzed to develop a prediction model with stronger generalization ability and higher prediction accuracy, and to further improve the practical applicability of the prediction model.