Mixed Kernel Function Support Vector Regression with Genetic Algorithm for Forecasting Dissolved Gas Content in Power Transformers

Abstract: Forecasting dissolved gas content in power transformers plays a significant role in detecting incipient faults and maintaining the safety of the power system. Though various forecasting models have been developed, there is still room to further improve prediction performance. In this paper, a new forecasting model is proposed by combining mixed kernel function-based support vector regression (MKF-SVR) and genetic algorithm (GA). First, the forecasting performance of SVR models constructed with a single kernel is compared, and the Gaussian kernel and polynomial kernel are retained due to their better learning and prediction ability. Next, a mixed kernel, which integrates a Gaussian kernel with a polynomial kernel, is used to establish an SVR-based forecasting model. GA and leave-one-out cross validation are employed to determine the free parameters of MKF-SVR, while mean absolute percentage error (MAPE) and squared correlation coefficient (r²) are applied to assess the quality of the parameters. The proposed model is implemented on a practical dissolved gas dataset and promising results are obtained. Finally, the forecasting performance of the proposed model is compared with three other approaches: RBFNN, GRNN and GM. The experimental and comparison results demonstrate that the proposed model outperforms these popular models in terms of forecasting accuracy and fitting capability.


Introduction
Power transformers are some of the most vital and expensive devices in power grids. They play a significant role in transferring energy and converting voltages to different levels. Any unexpected malfunction or failure of a power transformer may jeopardize the continuity of the power supply, cause catastrophic damage to electrical equipment and the power system, and bring economic losses for power utilities and society. Therefore, considerable efforts have been made to detect and monitor the operating conditions of power transformers to keep them working under safe conditions [1][2][3].
Due to thermal stresses, electrical stresses and aging, the insulation systems (i.e., mineral oil, cellulose and solid insulation) of power transformers inevitably deteriorate and decompose. As a result, several kinds of gases are produced and dissolve in the mineral oil during the degradation process. Numerous approaches and models based on dissolved gas concentration and gas characteristics have been developed and utilized for the last decades [4][5][6][7]. The dissolved gas analysis (DGA) technique, a simple and effective method, has been widely applied to interpret the working conditions and number of the support vectors instead of the dimensions of input data, which not only prevents the "dimension curse", but also reduces computational cost.
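Given a dataset $D = \{(x_i, y_i)\}$, $(x_i \in R^n, y_i \in R, i = 1, 2, \cdots, n)$, where x is the n-dimensional input variable and y is the corresponding output value. n represents the number of samples. A linear problem can be described by the function shown below:

$$f(x) = \omega \cdot x + b \tag{1}$$

where ω and b denote the weight coefficient and constant coefficient, respectively; f(x) is the forecasting value. When it comes to a non-linear problem, a mapping function ϕ(x) is applied to transform the low-dimension nonlinear problem to a high-dimension linear problem. The regression function is shown as Equation (2):

$$f(x) = \omega \cdot \phi(x) + b \tag{2}$$

The parameters ω and b can be estimated by minimizing the regularized risk function:

$$\min_{\omega, b} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} L_\varepsilon\big(y_i - f(x_i)\big) \tag{3}$$

where C is the penalty factor, which is used to balance empirical risk and confidence degree. $L_\varepsilon(\cdot)$ denotes the ε-insensitive loss function and ε is the insensitive loss parameter. Two non-negative slack variables $\xi_i$ and $\xi_i^*$ are introduced to facilitate the solving process, and the optimization problem becomes:

$$\min_{\omega, b, \xi, \xi^*} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad \begin{cases} y_i - \omega \cdot \phi(x_i) - b \le \varepsilon + \xi_i \\ \omega \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \tag{4}$$

Lagrangian multipliers are introduced to convert the problem described above to a dual optimization problem, which is shown as Equation (5):

$$\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \quad \text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \; 0 \le \alpha_i, \alpha_i^* \le C \tag{5}$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrangian multipliers and $K(x_i, x_j)$ is the kernel function. Then, the support vector regression function f(x) can be obtained as follows:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{6}$$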
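To make the role of the dual coefficients concrete, the following minimal Python sketch evaluates the ε-insensitive loss and the support vector regression function described above for a set of already-solved multipliers. The function names, the toy support vectors and the coefficient values are hypothetical illustrations and are not taken from the paper; solving the dual problem itself is left to a library such as LIBSVM.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def linear_kernel(u, v):
    """Plain dot product, used here only to keep the example small."""
    return sum(a * b for a, b in zip(u, v))


def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs holds the differences (alpha_i - alpha_i*), one per support vector.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical toy values, not results from the paper.
svs = [(1.0, 0.0), (0.0, 2.0)]
coefs = [0.5, -0.25]
prediction = svr_predict((2.0, 1.0), svs, coefs, 0.1, linear_kernel)
print(prediction, eps_insensitive_loss(1.0, prediction, eps=0.1))
```

Only samples with a non-zero coefficient contribute to the sum, which is why the cost of a prediction depends on the number of support vectors rather than on the input dimension.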

Multi-Kernel Function
The kernel function is the most significant component of SVR. It is used to project the original low-dimensional data into a higher-dimensional space, which converts a nonlinear problem into a linear problem [30]. Different kernel functions have different mapping capabilities, which results in different prediction accuracy. Therefore, significant efforts have been made to choose proper kernels [31,32]. Four commonly applied kernel functions are listed below:

(1) Linear kernel function:

$$K(x_i, x_j) = x_i \cdot x_j \tag{7}$$

(2) Polynomial kernel function:

$$K(x_i, x_j) = (x_i \cdot x_j + 1)^d \tag{8}$$

(3) Gaussian kernel function (or RBF):

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{9}$$

(4) Sigmoid kernel function:

$$K(x_i, x_j) = \tanh\left(\gamma (x_i \cdot x_j) + c\right) \tag{10}$$

where d is the power exponent, γ is the kernel width parameter and c is a constant offset.

Generally, these kernel functions can be divided into two categories: local kernels and global kernels, which differ markedly in their projecting ability. For the global kernel functions, such as the linear kernel function and polynomial kernel function, data points far away from each other still affect the kernel value. A higher order of the polynomial kernel function has better interpolation ability, while a lower order has better extrapolation ability. In contrast, the local kernel functions, including the Gaussian kernel function and sigmoid kernel function, allow only data points close to each other to have an impact on the kernel value [23,24]. The data distribution characteristics of the polynomial kernel and Gaussian kernel are shown in Figures 1 and 2, respectively.
Considering the advantages and disadvantages of local kernels and global kernels, we integrate different kernel functions to obtain one mixed-kernel function (MKF), which is shown as Equation (11). According to Mercer's condition, when k_1 and k_2 are admissible kernel functions, a combination of them with non-negative weights is also an admissible kernel.

A prevalent MKF is a mixture of the Gaussian kernel and polynomial kernel, which is defined as Equation (12):

$$K_{mix}(x, x_i) = \omega \exp\left(-\gamma \|x - x_i\|^2\right) + (1 - \omega)(x \cdot x_i + 1)^d \tag{12}$$

where γ and d are the kernel width and power exponent, respectively, and ω is the mixing coefficient. Obviously, a single kernel method can be regarded as a specific case of the MKF. That is, the MKF becomes a polynomial kernel when ω = 0, and a Gaussian kernel when ω = 1. Figure 3 depicts the effect of mixing a polynomial kernel and a Gaussian kernel when the test point X = 0.3, d = 1 and γ = 50. It can be seen that the mixed kernel function possesses the merits of both the local kernel and the global kernel, and is able to improve fitting and generalization ability.
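The mixing described above can be sketched in a few lines of Python. This is an illustration only: the polynomial form (u·v + 1)^d and the function names are assumptions, since the exact parameterization used in the experiments is not stated here. The settings d = 1, γ = 50 and test point 0.3 are the ones quoted for Figure 3.

```python
import math


def gaussian_kernel(u, v, gamma):
    """Local kernel: decays quickly as u moves away from v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(u, v, d):
    """Global kernel; the offset of 1 is an assumed, commonly used form."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** d


def mixed_kernel(u, v, omega, gamma, d):
    """Convex combination: omega = 1 gives the Gaussian kernel, omega = 0 the polynomial one."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * gaussian_kernel(u, v, gamma) + (1.0 - omega) * polynomial_kernel(u, v, d)


test_point = (0.3,)
for x in (0.0, 0.3, 1.0):
    print(x, mixed_kernel((x,), test_point, omega=0.5, gamma=50.0, d=1))
```

Far from the test point the Gaussian term vanishes and the polynomial term dominates, while near the test point both terms contribute, which is the behaviour the mixed kernel is meant to combine.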

Genetic Algorithm
The genetic algorithm (GA), initially developed by John Holland in the 1970s, is a global heuristic search and optimization technique. GA is inspired by Darwin's principle of the "survival of the fittest" and natural evolution. GA has been applied to optimization problems in many diverse fields and has achieved substantial progress [33][34][35][36]. Compared with other optimization algorithms, GA converges more easily, is computationally efficient and obtains a better global view of the search space because it balances exploitation and exploration effectively [37].
In general, GA starts with a randomly produced population, which represents the candidate solutions of a specific problem. Each candidate solution is called a chromosome or individual. A chromosome is composed of all the parameters that need to be optimized. The quality of a chromosome is assessed by a fitness function, which is established according to the objective function of the optimization problem. Genetic operations, including selection, crossover and mutation, are employed to manipulate the reproduction of the population during the optimization process. Selection chooses individuals with higher fitness to reproduce offspring for the next generation. By this process, the population size is controlled and excellent individuals are passed into the next generation with a higher probability. Crossover exchanges part of the genetic information of two chosen chromosomes in a specific way to generate new individuals. Hence, individuals of the next generation inherit some characteristics from each parent. Mutation produces new individuals by randomly altering the genetic information of a chromosome. The main purpose of mutation is to maintain the genetic diversity of the population and avoid getting stuck in local minima. These genetic operations are repeated until the stopping criterion is met. A common optimization procedure of GA is shown in Figure 4.
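The loop described above can be sketched as follows. This is a generic real-coded GA written for illustration, not the configuration used in this paper: binary tournament selection, uniform crossover, the default population size and the probabilities are all assumptions, and the best individual is tracked across generations rather than taken from the last one.

```python
import random


def run_ga(fitness, bounds, pop_size=20, generations=40,
           crossover_prob=0.8, mutation_prob=0.1, seed=0):
    """Maximize `fitness` over real-valued chromosomes bounded by `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def select():
        # Selection (assumed binary tournament): the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            child = list(p1)
            if rng.random() < crossover_prob:
                # Crossover: each gene is inherited from one of the two parents.
                child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_prob:
                    # Mutation: redraw the gene to keep diversity in the population.
                    child[i] = rng.uniform(lo, hi)
            offspring.append(child)
        population = offspring
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best


# Toy objective with its maximum at x = 2; purely illustrative.
best = run_ga(lambda ind: -(ind[0] - 2.0) ** 2, bounds=[(-5.0, 5.0)])
print(best)
```

The stopping criterion here is simply a fixed number of generations; other criteria, such as stagnation of the best fitness, could be substituted without changing the structure of the loop.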

Procedure for Forecasting Using Proposed Regression
Usually, the concentrations of dissolved gases, including H2, CH4, C2H2, C2H4 and C2H6, are recorded in chronological order. Therefore, forecasting of dissolved gas content in power transformers is treated as a non-linear time series problem. The historical dissolved gas data are used as the time sequence in the forecasting process. There are two steps to establish an effective and accurate forecasting model in this study: data preprocessing, and training and verification of the forecasting model.

Data Preprocessing
For a time series problem, it is essential to preprocess the raw data due to the possibility of missing values or false data. Firstly, the input data (the historical data of H2, CH4, C2H2, C2H4 and C2H6) need to be carefully examined in order to remove any singular values and fill in missing data by an interpolation technique. Then, normalization should be implemented prior to the construction of training and testing data to reduce estimation error and improve generalization. In this study, the original data are normalized with Equation (13):

$$x_n = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{13}$$

where $x_i$ and $x_n$ are the data before and after normalization, respectively; $x_{max}$ and $x_{min}$ represent the maximum and minimum of the primary data.
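As a small illustration of this normalization step, the sketch below assumes that Equation (13) is the usual min-max scaling to the interval [0, 1], which is consistent with the quantities defined above but is not confirmed by the text. The function names and the sample concentrations are hypothetical.

```python
def min_max_normalize(values):
    """Scale a series to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map normalized forecasts back to the original concentration scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical gas concentrations, not data from the paper.
raw = [10.0, 20.0, 30.0]
scaled = min_max_normalize(raw)
print(scaled, denormalize(scaled, min(raw), max(raw)))
```

Keeping the minimum and maximum of the training data is what allows forecasts produced on the normalized scale to be converted back to gas concentrations.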
Because historical dissolved gas data may not be recorded at equal time intervals, it is necessary to convert these unequal interval series into equal interval series to build a more convenient forecasting model. Hermite spline interpolation [19] and linear interpolation [21] are the two most popular interpolation approaches. In this paper, Hermite spline interpolation is selected to convert the primary data to equal time intervals.
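The conversion to equal time intervals can be illustrated with the simpler of the two approaches named above. The sketch below uses linear interpolation, not the Hermite spline interpolation selected in this paper, and the function name, the time stamps and the step size are assumptions made for the example.

```python
def resample_linear(times, values, step):
    """Resample an irregularly sampled series onto an equally spaced time grid."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    n_steps = int((times[-1] - times[0]) / step + 1e-9)
    grid, resampled = [], []
    j = 0  # index of the left end of the bracketing interval
    for k in range(n_steps + 1):
        t = times[0] + k * step
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        weight = (t - t0) / (t1 - t0)
        grid.append(t)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled


# Hypothetical record with a missing sample at day 2.
grid, series = resample_linear([0.0, 1.0, 3.0, 4.0], [0.0, 10.0, 30.0, 40.0], step=1.0)
print(grid, series)
```

A Hermite spline would replace the straight-line segment between neighbouring samples with a smooth cubic piece, but the resampling grid is built in the same way.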

Training and Testing of the Forecasting Model
According to the historical data of the dissolved gas sequence $G_n = \{g_1, g_2, \cdots, g_n\}$, the training set T is built as described by Equation (14). After the historical data are divided into a training set and a testing set, a forecasting model based on MKF-SVR is trained to predict the development trend of the dissolved gas content in a power transformer. It has been mentioned in Section 2 that the free parameters of SVR and the kernel functions have a great impact on the forecasting performance. Hence, GA is introduced in this paper to optimize these free parameters and improve forecasting accuracy and generalization ability. The main details of training by GA are elaborated below:

(1) Initialization of GA and encoding parameters
In this investigation, the population size, maximum iteration number, crossover probability and mutation probability are predefined in the initialization process. The chromosome, composed of the free parameters (such as penalty factor C, kernel bandwidth σ, insensitive loss parameter ε, power exponent d, mixing coefficient ω and so on), is set randomly. These parameters are encoded with real coding, as it is suitable for complex problems and makes it simple to apply genetic operators to individuals [38]. The range and value of the free parameters employed in the optimization are displayed in Table 1.

(2) Definition and calculation of fitness function
The fitness function is the core part of GA and is used to estimate the performance of each individual. The leave-one-out cross-validation (LOO CV) method is adopted to calculate the forecasting accuracy. In the LOO CV method, each sample of the training set is used in turn as the validation set while the other samples are used as the training set, so that each sample is validated exactly once. Mean absolute percentage error (MAPE) and squared correlation coefficient (r²) are employed as the fitness function to measure the quality of each chromosome and evaluate the forecasting accuracy. Generally speaking, the lower the MAPE, the higher the forecasting accuracy. The value of r² is limited to the range [0,1], and the greater the value, the better the forecasting performance. MAPE and r² are calculated as follows:

$$MAPE = \frac{1}{l}\sum_{i=1}^{l}\left|\frac{y_i - f(x_i)}{y_i}\right| \times 100\% \tag{16}$$

$$r^2 = \frac{\left(l\sum_{i=1}^{l} f(x_i) y_i - \sum_{i=1}^{l} f(x_i) \sum_{i=1}^{l} y_i\right)^2}{\left(l\sum_{i=1}^{l} f(x_i)^2 - \left(\sum_{i=1}^{l} f(x_i)\right)^2\right)\left(l\sum_{i=1}^{l} y_i^2 - \left(\sum_{i=1}^{l} y_i\right)^2\right)} \tag{17}$$

where $x_i$ represents the training data; $y_i$ and $f(x_i)$ denote the actual value and the value forecast by the proposed model, respectively; l represents the size of the training set.
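The pieces of the fitness evaluation can be sketched in Python as below. The sliding-window construction of the samples is an assumption (a common way to embed a gas sequence with input dimension m), since the exact form of the training set equation is not reproduced here; the r² formula follows the squared correlation coefficient as reported by LIBSVM. All function names are hypothetical.

```python
def make_windows(series, m):
    """Assumed embedding: m consecutive values as input, the next value as target."""
    inputs = [series[i:i + m] for i in range(len(series) - m)]
    targets = [series[i + m] for i in range(len(series) - m)]
    return inputs, targets


def loo_splits(n):
    """Yield (training indices, validation index) so each sample is validated once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def squared_correlation(actual, predicted):
    """Squared correlation coefficient between actual and forecast values."""
    n = len(actual)
    s_a, s_p = sum(actual), sum(predicted)
    s_ap = sum(a * p for a, p in zip(actual, predicted))
    s_aa = sum(a * a for a in actual)
    s_pp = sum(p * p for p in predicted)
    return (n * s_ap - s_a * s_p) ** 2 / ((n * s_pp - s_p ** 2) * (n * s_aa - s_a ** 2))


X, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], m=2)
print(X, y)
print(list(loo_splits(len(X))))
print(mape([100.0, 200.0], [110.0, 180.0]), squared_correlation([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
```

In a full fitness evaluation, a model would be trained on each LOO training split with the candidate parameters, the held-out sample would be forecast, and the collected forecasts would be scored with these two metrics.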

(3) Genetic operation
Based on the estimated fitness value, a chromosome with a higher fitness value is more likely to be selected to reproduce offspring by crossover and mutation. In this paper, roulette wheel selection, arithmetical crossover and uniform mutation are adopted to carry out the genetic operations [39]. The whole process is repeated until the maximum iteration number is reached, and then the best solution of the last generation is taken as the optimal result. The entire optimization process of the proposed approach is shown in Figure 5.
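The three operators named above can be sketched for real-coded chromosomes as follows. These are textbook forms written for illustration, not the authors' implementation; the function names are hypothetical. Roulette wheel selection requires non-negative fitness values where larger is better, so an error measure such as MAPE would first need a transformation (for example 1 / (1 + MAPE), which is an assumption here).

```python
import random


def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its non-negative fitness."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must be non-negative and not all zero")
    pick = rng.uniform(0.0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guards against rounding at the upper end of the wheel


def arithmetic_crossover(parent1, parent2, rng):
    """Children are complementary weighted averages of the two parents."""
    lam = rng.random()
    child1 = [lam * a + (1.0 - lam) * b for a, b in zip(parent1, parent2)]
    child2 = [(1.0 - lam) * a + lam * b for a, b in zip(parent1, parent2)]
    return child1, child2


def uniform_mutation(individual, bounds, prob, rng):
    """With probability `prob`, redraw a gene uniformly within its allowed range."""
    return [rng.uniform(lo, hi) if rng.random() < prob else gene
            for gene, (lo, hi) in zip(individual, bounds)]


rng = random.Random(1)
parents = [[1.0, 0.1], [10.0, 0.5], [100.0, 0.9]]
chosen = roulette_wheel_select(parents, [0.2, 0.3, 0.5], rng)
print(chosen, arithmetic_crossover(parents[0], parents[1], rng))
```

Because arithmetic crossover only produces points between the two parents, the children always stay inside the parameter ranges as long as the parents do, and uniform mutation is what reintroduces values from the rest of the range.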

The parameters contained in the optimal solution (or chromosome) are used to establish the final forecasting model. Testing samples are constructed as described by Equation (15) and used to calculate the forecasting values with the established model. The MAPE index, described by Equation (16), is used to test the forecasting accuracy of the proposed method. After the dissolved gas contents of interest are obtained, a local standard (GB/T 7252-2001) can be employed to diagnose the working condition and incipient faults of the power transformer.

Experimental Results for Forecasting Dissolved Gas Content in Power Transformer Oil
Several dissolved gas content sequences of 110 kV and 220 kV power transformers from China Southern Power Grid are used to demonstrate the forecasting performance of the proposed method. These DGA data are shown in Table 2. The dissolved gas data are first divided into a training set and a testing set according to data size and related references. Case 1 and case 2 are sampled every day, while case 3 is sampled every week. It should be noted that no singular value is eliminated in this study and that no more than 5% of the data (11 out of 272) are missing across all three cases. Afterwards, normalization is implemented to improve generalization capability and reduce computational error. All experimental tests of the proposed approach are conducted in the MATLAB (R2016) environment with the aid of the LIBSVM toolbox [40]. After data preprocessing and normalization, the time sequences for the training set and testing set are established according to Equation (15).
In this study, we first test the forecasting performance of SVR models established on different single kernels, shown as Equations (7)-(10). GA is utilized to optimize the kernel parameters of ε-SVR, while LOO CV is applied to estimate the fitness and select the best of the candidate solutions. Numerical experiments for each model are repeated 50 times to reduce the effect of randomness on the final results. Results of the forecasting model for the training sets of case 1 and case 2 are shown in Tables 3 and 4, respectively.
It can be seen from Tables 3 and 4 that the SVR models based on the linear kernel and sigmoid kernel have relatively better average MAPE than the models established on the Gaussian kernel and polynomial kernel for all cases. However, the average r² of the ε-SVR model with the linear kernel or sigmoid kernel is far lower than that of the models based on the other two kernels. The r² provided by the linear or sigmoid kernel is generally no more than 0.2 and 0.3, while the values obtained by the Gaussian and polynomial kernels are no less than 0.6 and 0.8 for case 1 and case 2, respectively.
A small r² indicates that the established model cannot effectively depict the developing trends of the time series. Therefore, it is concluded that the sigmoid kernel and linear kernel are not suitable for forecasting the dissolved gas content of power transformers, and they are not studied further in the following investigation. Models based on the Gaussian and polynomial kernels have a better squared correlation coefficient and acceptable forecasting accuracy. Therefore, we apply the Gaussian and polynomial kernels to develop a novel MKF-SVR model for predicting dissolved gas contents. Forecasting results of the training set are also presented in Tables 3 and 4, respectively. Compared with the Gaussian and polynomial kernels, the MKF-SVR model has slightly lower MAPE and comparable r². Among the forecasting results obtained by the mixed kernel, the worst average MAPE is no more than 2% and 5%, and the lowest r² is no less than 0.95 and 0.97 for case 1 and case 2, respectively. Besides, the squared correlation coefficient r² of the MKF-SVR model is far better than that of the models based on the linear and sigmoid kernels. The results reveal that the proposed MKF-SVR model integrates the advantages of the local and global kernels and demonstrate the superiority of the proposed approach.
All cases described in Table 2 are examined with the MKF-SVR model (repeated 50 times). For the proposed model, the mixed kernel function is shown as Equation (12), and the free parameters that need to be optimized are listed in Table 1. It should be pointed out that the dimension m of the input vector plays an important role in forecasting performance. An improper dimension value m will lead to undesirable forecasting results [20]. Hence, parameter m is also considered in the optimization process. The optimal parameters for each gas and the corresponding prediction performance are presented in Table 5.
It is shown in Table 5 that the optimal value of m varies from case to case, which indicates that it is necessary to tune the input vector dimension to gain better performance. Moreover, the variation of the mixing coefficient ω suggests that integrating different kernels is needed to obtain better forecasting performance. Take H2 of case 1 as an example: according to Table 4, the minimum MAPE for the Gaussian and polynomial kernels is 0.4961 and 0.8222, while for the mixed kernel, ω_G is equal to 0.9991, which means that the mapping characteristic of the kernel function is mainly determined by the Gaussian kernel. Although the weight of the polynomial kernel is negligible (ω_p = 0.0009), the MAPE of the training set is greatly improved to 0.1884, which indicates that the participation of the polynomial kernel has greatly improved the learning ability and decreased the forecasting error. Predicted values and absolute percentage error (APE, obtained by Equation (18)) of the training set and testing set are shown in Figures 6 and 7. Compared with models based on the Gaussian and polynomial kernels, the MKF-SVR model depicts the variation trend of dissolved gas more accurately and more reliably. In addition, the APE of models based on mixed kernels is generally lower than that of models established on other kernels. Specific forecasting values and the corresponding MAPE for all training sets and testing sets are presented in Table 6. For each case, the lowest MAPE for the forecasting values of the testing set is displayed in bold. Most of the results obtained by MKF-SVR are preferable to those of the other methods. To sum up, the MKF-SVR model can generally depict the developing trends of dissolved gas accurately and improve the forecasting accuracy.
Furthermore, we compare the forecasting performance of MKF-SVR with three other popular methods (GRNN, RBFNN and GM) in order to demonstrate the superiority of the proposed method. The experimental results and forecasting values of the training and testing sets are presented in Table 7 and Figure 8 (for case 1, H2). Table 7 shows that the proposed MKF-SVR method has better MAPE and r² than the other traditional methods. According to Figure 8, the forecasting value of the grey model increases monotonically, which is not in accordance with the actual values at all and gives the biggest error and lowest r², due to the limitation of the grey model mentioned in Section 1. In comparison with RBFNN, GRNN not only has better forecasting results, but also depicts the developing trends better. Nevertheless, these models are based on the principle of empirical risk minimization, so their forecasting performance can only be further improved by adding extra samples. The proposed MKF-SVR method applies the principle of structural risk minimization, which gives it satisfactory generalization ability with fewer samples. Moreover, it has better learning and prediction ability owing to the combination of the local kernel and global kernel, which helps it capture the developing trends of dissolved gas in power transformers. In conclusion, the forecasting accuracy and fitting performance of the proposed MKF-SVR model surpass those of the other popular approaches.
Bias or noise may exist in the measurement of the dissolved gas content, which might affect the reliability and accuracy of the proposed model. Hence, the robustness of the proposed model is examined with noisy data. The noisy data are obtained by Equation (19):

$$data_{noisy} = data_{ori}\,(1 + p \cdot rand) \quad (0\% < p \le 100\%) \tag{19}$$

where $data_{ori}$ and $data_{noisy}$ are the original data and noisy data, respectively; p denotes the percentage level of noise; rand represents a value drawn from a uniform distribution between 0 and 1. When the noisy data are ready, the data preprocessing techniques mentioned in Section 3 are carried out and the proposed model with the optimal parameters is employed to forecast the dissolved gas content. Forecasting values of the training set and testing set are obtained and APE is adopted to estimate the forecasting performance. Data of H2 in case 1 and case 2 are used to demonstrate the prediction capability, and the absolute values of APE of the forecasting results at different p are shown in Figure 9.
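The noise model used in this robustness test is simple enough to sketch directly. In the Python illustration below, p is expressed as a fraction (0.05 for 5%), and the APE function uses the usual definition of absolute percentage error, which is an assumption because Equation (18) is not reproduced here. The function names and sample values are hypothetical.

```python
import random


def add_noise(data, p, rng):
    """Apply data_noisy = data_ori * (1 + p * rand), with rand drawn from U(0, 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1 (that is, 0% < p <= 100%)")
    return [value * (1.0 + p * rng.random()) for value in data]


def ape(actual, predicted):
    """Absolute percentage error of a single forecast, in percent (assumed definition)."""
    return 100.0 * abs((actual - predicted) / actual)


rng = random.Random(42)
original = [12.0, 12.5, 13.1, 13.4]  # hypothetical H2 concentrations
noisy = add_noise(original, p=0.05, rng=rng)
print(noisy, ape(100.0, 95.0))
```

Since rand is non-negative, this noise only ever inflates the readings, by at most a factor of 1 + p; a test of symmetric measurement error would need a different noise model.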
It can be seen from Figure 9 that APE increases as p increases, although the differences for the training sets are slight in both case 1 and case 2. The APE of the testing set varies more: for case 1, it increases noticeably when p is larger than 5%, while for case 2 there are only minor differences in APE when the noise level increases from 0 to 20%. Besides, the maximum change in APE for the testing sets of both cases is no more than 10%, which is acceptable in practical applications. Therefore, it can be concluded from Figure 9 that the proposed model has good forecasting performance and desirable robustness.

Conclusions
In this paper, a mixed-kernel function based support vector regression model (MKF-SVR) is proposed to forecast the dissolved gas content in power transformers. First, the forecasting performance of SVR models with a single kernel function is checked, and the results suggest that models based on the sigmoid kernel or linear kernel are not suitable for prediction of dissolved gas content. A mixed kernel function, which combines the Gaussian kernel and polynomial kernel, is applied to develop the novel MKF-SVR model. Genetic algorithm and LOO-CV are adopted to optimize the free parameters. The forecasting performance of the proposed MKF-SVR model is tested with actual gas data and the results indicate that the proposed model is generally superior to single kernel function based SVR models. Moreover, the prediction results of RBFNN, GRNN and GM are compared with those of MKF-SVR, and the comparison demonstrates that the proposed model has better forecasting accuracy and fitting capability than the other models. Additionally, the forecasting results based on noisy data verify the desirable robustness of the proposed model. In the future, several extra factors, including oil temperature, working load and environmental conditions, should be taken into consideration for forecasting the development trends of dissolved gas levels in power transformers. Besides, more kernel types and different optimization algorithms can also be investigated to improve the forecasting performance.
Author Contributions: T.K. conceived the experiment and wrote the manuscript. A.T. and Y.Y. debugged the code. W.G. supervised the research and Z.Z. edited the manuscript. All authors have approved the submitted manuscript.