Methane Detection Based on Improved Chicken Algorithm Optimization Support Vector Machine

Abstract: Methane, a flammable and explosive gas, is the main component of marsh gas, firedamp, and rock gas. It is therefore important to detect methane concentration safely and effectively. Many models have been proposed to enhance the performance of methane prediction; however, in our experiments the traditional models showed unavoidable shortcomings in parameter optimization, which resulted in poor prediction performance. Accordingly, an improved chicken swarm optimization algorithm combined with a support vector machine (ICSO-SVM) is proposed to predict methane concentration precisely. The traditional chicken swarm optimization algorithm (CSO) easily falls into a local optimum due to its characteristics, so the ICSO algorithm was developed: the position update formula of the chicks involves not only the rooster of the same subgroup but also the roosters of other subgroups, so the ICSO algorithm is less likely to fall into a local extremum. In this paper, the following work was done. Sample data were obtained with a methane detection system of our own design. To verify the validity of the ICSO algorithm, the ICSO, CSO, genetic algorithm (GA), and particle swarm optimization (PSO) algorithms were tested, and the four resulting models were applied to methane concentration prediction. The results showed that the ICSO algorithm had the best convergence behavior, relative error percentage, and average mean squared error. The average mean squared error of the ICSO-SVM model was smaller than that of the other three models, the model was more stable, and its average recovery rate was closest to 100%. Therefore, the ICSO-SVM model can efficiently predict methane concentration.


Introduction
Air pollution is a serious environmental issue that has attracted more and more attention globally in recent years [1][2][3]. Methane is the main greenhouse pollutant, and also the main component of mine gas, biogas, and various liquid fuels [4,5]. The lower explosive limit of methane in air is 5.0%, the upper limit is 15.0%, and its explosivity is strongest at a volume fraction of 9.5% [6]. Methane in the atmosphere also contributes to the greenhouse effect and accelerates global warming [7,8]. For these reasons, methane detection is an indispensable field of research. Traditional detection generally adopts chemical methods, which require chemical reagents. These reagents have many disadvantages, such as being dangerous, needing frequent replacement, and having short lifetimes, so these methods are not conducive to online real-time detection. In addition, absorption spectroscopy is a detection method that offers a rapid, direct, and selective technique to measure gas concentration.
Appl. Sci. 2019, 9, x FOR PEER REVIEW

Support Vector Machine (SVM)
The SVM was proposed by Vapnik in 1995 [24,25]. SVM, as a machine learning method, is effective for small-sample, nonlinear, and high-dimensional problems. SVM was developed from solving linear problems; it can construct an optimal hyperplane under linearly separable conditions. In practical applications, however, most problems are nonlinear, so the nonlinear input data are mapped to a high-dimensional feature space. For example, given a set of n training samples in R^d:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), \quad x_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\},\ i = 1, \ldots, n, \quad (1)$$

where x_i are the input training samples, y_i are the sample labels, and d is the dimension.
If the samples are separable, there is a classification hyperplane that separates the two types of samples. Crosses and open circles represent the two types of samples in Figure 1. The points nearest to the classification hyperplane are called support vectors. H is the classification hyperplane; hyperplanes H1 and H2 pass through the two types of support vectors and are parallel to H. The distance between H1 and H equals the distance between H2 and H, and the distance between H1 and H2 is called the classification interval. A hyperplane divides the data into two categories, as follows:

$$w \cdot x + b = 0, \quad (2)$$

where w is the normal vector of the hyperplane, x is the input vector of the training set, and b is the constant term of the hyperplane. The hyperplanes through the two types of support vectors are defined as:

$$w \cdot x + b = \pm 1. \quad (3)$$

The interval d between hyperplanes H1 and H2 can be obtained from Equation (3):

$$d = \frac{2}{\|w\|}. \quad (4)$$

The regression function of the classification hyperplane is defined as:

$$f(x) = w \cdot x + b. \quad (5)$$

The optimal hyperplane has the maximum margin between the two classes. Finding it is transformed into solving a quadratic optimization problem, into which slack variables are introduced [26]. The quadratic form can be represented as:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - w \cdot x_i - b \le \varepsilon + \xi_i, \quad w \cdot x_i + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad (6)$$

where ξ_i, ξ_i^* are relaxation factors, C is the penalty factor, ε is the insensitivity coefficient, and s.t. denotes the constraints.
Due to the computational complexity of the quadratic optimization, Equation (6) is transformed into a dual problem via Lagrange duality theory, as in Equations (7) and (8):

$$\max_{\alpha,\,\alpha^*} \ -\varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j), \quad (7)$$

$$\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C, \quad (8)$$

where α_i, α_i^* are the Lagrange multipliers. With β_i = α_i − α_i^*, the expression of f(x) in Equation (5) becomes:

$$f(x) = \sum_{i=1}^{n} \beta_i (x_i \cdot x) + b. \quad (9)$$

For non-support vectors, β_i = 0, so f(x) can be represented by the remaining support vectors as:

$$f(x) = \sum_{i \in N} \beta_i (x_i \cdot x) + b, \quad (10)$$

where N is the subset of the input data set indexing the support vectors. For a particular problem, a model can thus be determined by a subset of the given data. When the problem is nonlinear, Equation (10) cannot accurately represent f(x); the nonlinear input data are mapped to a high-dimensional feature space by a nonlinear mapping Φ(x). To reduce the amount of calculation involved, the inner product in the high-dimensional feature space is converted into a function evaluation in the input space using the kernel function K(x_i, x).
The expression of f(x) in Equation (10) can then be written as:

$$f(x) = \sum_{i \in N} \beta_i K(x_i, x) + b. \quad (11)$$

Common kernel functions include the linear kernel, the polynomial kernel, the sigmoid kernel, and the Gaussian radial basis kernel. The Gaussian radial basis kernel performs better on problems with little a priori information [20]. It is expressed as:

$$K(x_i, x) = \exp\left(-\frac{\|x - x_i\|^2}{g}\right), \quad g = 2r^2, \quad (12)$$

where r is the radius of the radial basis kernel function and g is the kernel parameter. The values of g and the penalty factor C heavily affect the performance of the SVM. The ICSO algorithm is intended to optimize these parameters rather than relying on random selection.
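The influence of C and g can be illustrated with a short sketch. This is a minimal example assuming scikit-learn's SVR, whose RBF kernel is written K(x, x′) = exp(−γ‖x − x′‖²), so the paper's kernel parameter g corresponds to γ = 1/g; the data here are synthetic stand-ins, not the paper's spectra.

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D data standing in for (spectral feature, concentration) pairs.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(40)

# scikit-learn writes the RBF kernel as exp(-gamma * ||x - x'||^2),
# so the paper's g = 2 r^2 maps to gamma = 1 / g.
g = 0.5
model = SVR(kernel="rbf", C=10.0, gamma=1.0 / g, epsilon=0.01)
model.fit(X, y)

pred = model.predict(X)
mse = float(np.mean((y - pred) ** 2))
print(f"training MSE: {mse:.4f}")
```

Varying C and g by hand quickly shows how strongly they change the fit, which is what motivates optimizing them with a search algorithm instead of picking them at random.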

Chicken Swarm Optimization Algorithm (CSO)
The CSO algorithm is a kind of bionic random search algorithm, which imitates the foraging behaviors of a chicken swarm. The CSO algorithm consists of several subgroups. Each subgroup consists of a rooster, some hens, and several chicks. The roosters have the best fitness value, and the chicks have the worst fitness value. The position of each individual (roosters, hens, and chicks) represents a solution of the problem. The rooster has the best search ability compared to hens and chicks in each subgroup.
The rooster is the leader of its subgroup, with its position update equation defined as follows:

$$x_{i,j}(t+1) = x_{i,j}(t)\left(1 + \Phi(0, \sigma^2)\right), \quad (14)$$

$$\sigma^2 = \begin{cases} 1, & f_{ir} \le f_{kr}, \\ \exp\left(\dfrac{f_{kr} - f_{ir}}{|f_{ir}| + \varepsilon}\right), & \text{otherwise}, \end{cases} \quad k \ne i, \quad (15)$$

where x_{i,j}(t+1) is the position of the rooster at time t+1, x_{i,j}(t) is its position at time t, i is the subgroup number, j is the rooster index, and Φ(0, σ²) is a Gaussian distribution with zero mean and standard deviation σ. f_{ir} and f_{kr} are the fitness values of the rooster itself and of a randomly selected rooster k (k ≠ i), and ε is the smallest constant that is not equal to 0. The hens follow the roosters when foraging, so their position is affected by roosters in both the same subgroup and other subgroups. The hens' position update equation is as follows:

$$x_{i,j}(t+1) = x_{i,j}(t) + C_1 \cdot \mathrm{rand} \cdot (x_{r_1,j}(t) - x_{i,j}(t)) + C_2 \cdot \mathrm{rand} \cdot (x_{r_2,j}(t) - x_{i,j}(t)), \quad (16)$$

$$C_1 = \exp\left(\frac{f_i - f_{r_1}}{|f_i| + \varepsilon}\right), \quad (17)$$

$$C_2 = \exp\left(f_{r_2} - f_i\right), \quad (18)$$

where rand is a random number over [0, 1], f_{r_1} is the fitness value of the r_1th rooster, which belongs to the same subgroup as the ith hen, and r_2 is the index of a chicken (rooster or hen) that is randomly selected and not equal to r_1. C_1 and C_2 are the weights of the same subgroup and of a different subgroup for the hen, respectively. The chicks follow their mothers when foraging; their position update equation is as follows:

$$x_{i,j}(t+1) = x_{i,j}(t) + F \cdot (x_{m,j}(t) - x_{i,j}(t)), \quad (19)$$

where x_{m,j} is the position of the ith chick's mother and F (F ∈ [0, 2]) is a following coefficient, which means that the chick follows its mother when foraging.
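The three update rules above can be sketched as small functions. This is a hedged sketch assuming fitness is to be minimized; the names (`x_i`, `f_ir`, and so on) mirror the equations rather than any reference implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
EPS = 1e-12  # the smallest constant, keeps the denominators non-zero

def rooster_update(x_i, f_ir, f_kr):
    # Rooster rule: multiplicative Gaussian perturbation whose variance
    # grows when a randomly chosen rival rooster is fitter.
    sigma2 = 1.0 if f_ir <= f_kr else np.exp((f_kr - f_ir) / (abs(f_ir) + EPS))
    return x_i * (1.0 + rng.normal(0.0, np.sqrt(sigma2), size=x_i.shape))

def hen_update(x_i, f_i, x_r1, f_r1, x_r2, f_r2):
    # Hen rule: pulled toward its own rooster (weight C1) and toward a
    # randomly chosen chicken from elsewhere (weight C2).
    c1 = np.exp((f_i - f_r1) / (abs(f_i) + EPS))
    c2 = np.exp(f_r2 - f_i)
    return (x_i + c1 * rng.random() * (x_r1 - x_i)
                + c2 * rng.random() * (x_r2 - x_i))

def chick_update(x_i, x_m, F=1.0):
    # Chick rule: simply follow the mother hen with coefficient F in [0, 2].
    return x_i + F * (x_m - x_i)

print(chick_update(np.array([1.0, 2.0]), np.array([3.0, 4.0]), F=0.5))  # → [2. 3.]
```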

Improved Chicken Swarm Optimization (ICSO)
In the swarm, the chicks follow their mother hen when foraging. The chicks have the worst foraging ability and the smallest foraging range; that is to say, the chicks have the worst global search ability. In the CSO algorithm, the hens are the most numerous, so their search ability has a great influence on the convergence of the algorithm. From Equation (16), we can see that the position of a hen is affected by roosters in the same subgroup and in other subgroups, and that hens have no self-learning ability. If the roosters fall into a local optimum, the hens and chicks follow them into it, which affects the convergence of the whole algorithm. In the Improved Chicken Swarm Optimization (ICSO) algorithm, learning factors C_3 and C_4 are introduced into the chicks' position equation to solve this problem. The chicks' position then depends not only on the rooster of the same subgroup but also on the roosters of other subgroups. The chicks' position update equation is modified as follows:

$$x_{i,j}(t+1) = x_{i,j}(t) + F \cdot (x_{m,j}(t) - x_{i,j}(t)) + C_3 \cdot (x_{r_3,j}(t) - x_{i,j}(t)) + C_4 \cdot (x_{r_4,j}(t) - x_{i,j}(t)), \quad (20)$$

where C_3 and C_4 are constant learning factors by which the chicks follow the rooster of the same subgroup and the roosters of other subgroups, respectively, x_{r_3,j} is the position of the rooster in the same subgroup as the chick, and x_{r_4,j} is the position of a rooster in another subgroup.
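The modified chick rule amounts to a one-line update. A minimal sketch; the coefficients and test vectors below are illustrative values, not the paper's settings.

```python
import numpy as np

def icso_chick_update(x_i, x_m, x_r3, x_r4, F=1.0, C3=0.5, C4=0.25):
    # Besides following its mother (F), the chick also learns from its own
    # subgroup's rooster (C3) and from a rooster of another subgroup (C4),
    # which is what helps it escape local optima.
    return (x_i + F * (x_m - x_i)
                + C3 * (x_r3 - x_i)
                + C4 * (x_r4 - x_i))

x_i = np.zeros(2)
print(icso_chick_update(x_i, np.ones(2), 2 * np.ones(2), 4 * np.ones(2)))  # → [3. 3.]
```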

ICSO Optimized SVM Model
The steps of the ICSO-optimized SVM are as follows: (1) Parameter setting. The population size pop: namely, the number of chickens (roosters, hens, and chicks). The maximum number of iterations M: the chickens finish their foraging after repeating the search procedure M times. The reconstruction coefficient G: the role assignment of chickens and the subgroup divisions are redone every G generations. The numbers of roosters, hens, mother hens, and chicks are denoted RP, HP, MP, and CP, respectively. The learning factors are denoted C_3 and C_4. The penalty factor C and the kernel parameter g are set within a search range. (2) Initialization. Initialize the positions of the chickens randomly within the search range, calculate the fitness value of each chicken, and record the individual optimum p_best and the global optimum g_best. (3) Every G generations, rank the chickens by fitness and re-divide them into roosters, hens, and chicks. (4) Update the positions of the roosters, hens, and chicks with Equations (14), (16), and (20), and recalculate the fitness values of the chickens. Update the values of p_best and g_best. (5) Repeat steps (3) and (4) until the iteration stop condition is reached, and output the optimum value.
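The steps above can be sketched end to end. This is a simplified, assumption-laden sketch: it uses scikit-learn's SVR with 3-fold cross-validated MSE as the fitness, a small synthetic data set in place of the spectra, greedy acceptance of improved positions, and it omits the mother-hen bookkeeping (MP); it shows the loop structure rather than reproducing the paper's implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(2)

# Synthetic stand-in for the normalized spectral samples.
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(60)

lo = np.array([0.01, 0.01])   # lower bounds for (C, g)
hi = np.array([100.0, 10.0])  # upper bounds for (C, g)

def fitness(pos):
    # 3-fold cross-validated MSE of an SVR with the candidate (C, g).
    C, g = pos
    model = SVR(kernel="rbf", C=C, gamma=1.0 / g, epsilon=0.01)
    return -cross_val_score(model, X, y, cv=3,
                            scoring="neg_mean_squared_error").mean()

pop, M, G = 20, 15, 5          # population size, iterations, reshuffle period
RP, HP = 4, 10                 # roosters and hens; the rest are chicks
F, C3, C4, EPS = 1.0, 0.4, 0.4, 1e-12

swarm = rng.uniform(lo, hi, size=(pop, 2))       # step (2): initialization
fit = np.array([fitness(p) for p in swarm])

for t in range(M):
    if t % G == 0:             # step (3): re-rank roles every G generations
        order = fit.argsort()  # smallest CV MSE first -> roosters
        swarm, fit = swarm[order], fit[order]
    for i in range(pop):       # step (4): position updates
        if i < RP:             # rooster move
            k = int(rng.integers(RP))
            s2 = 1.0 if fit[i] <= fit[k] else np.exp((fit[k] - fit[i]) / (abs(fit[i]) + EPS))
            new = swarm[i] * (1.0 + rng.normal(0.0, np.sqrt(s2), 2))
        elif i < RP + HP:      # hen move
            r1, r2 = int(rng.integers(RP)), int(rng.integers(pop))
            c1 = np.exp((fit[i] - fit[r1]) / (abs(fit[i]) + EPS))
            c2 = np.exp(min(fit[r2] - fit[i], 50.0))   # capped to avoid overflow
            new = swarm[i] + c1 * rng.random() * (swarm[r1] - swarm[i]) \
                           + c2 * rng.random() * (swarm[r2] - swarm[i])
        else:                  # ICSO chick move
            m = int(rng.integers(RP, RP + HP))
            r3, r4 = int(rng.integers(RP)), int(rng.integers(RP))
            new = swarm[i] + F * (swarm[m] - swarm[i]) \
                           + C3 * (swarm[r3] - swarm[i]) + C4 * (swarm[r4] - swarm[i])
        new = np.clip(new, lo, hi)
        f_new = fitness(new)
        if f_new < fit[i]:     # greedy acceptance (a simplification)
            swarm[i], fit[i] = new, f_new

best = swarm[fit.argmin()]
print(f"best (C, g) = ({best[0]:.3f}, {best[1]:.3f}), CV MSE = {fit.min():.4f}")
```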

Performance Evaluation Criterion
For the spectral data of methane, which vary over a large range, pretreatment should be done before training. The experimental data were normalized as follows:

$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \quad (21)$$

where x is the raw data, y is the processed data, x, y ∈ R^m, x_min = min(x), and x_max = max(x). The processed data fluctuate in the range 0-1, with y_i ∈ [0, 1], i = 1, 2, …, n.
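The min-max normalization above is a one-liner in practice; a minimal sketch with made-up sample values:

```python
import numpy as np

def minmax_normalize(x):
    # y = (x - x_min) / (x_max - x_min), mapping the raw data into [0, 1].
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

y = minmax_normalize([2000, 5000, 11000, 20000])
print(y)  # the smallest sample maps to 0.0 and the largest to 1.0
```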
The mean squared error (MSE), relative error (RE), and recovery rate (r) were used to evaluate the predictive effect of the models. Their values can be computed as follows:

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - y_i')^2, \quad (22)$$

$$r = \frac{y_i'}{y_i} \times 100\%, \quad (23)$$

$$\text{RE} = \frac{y_i' - y_i}{y_i} \times 100\%,$$

where y_i is the true concentration value, y_i' is the predicted concentration value, and N is the number of samples in the set.
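The three criteria can be computed directly; a minimal sketch with made-up concentration values (not the paper's data):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error over the N test samples.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def relative_error_pct(y_true, y_pred):
    # Per-sample relative error in percent.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return (y_pred - y_true) / y_true * 100.0

def recovery_rate_pct(y_true, y_pred):
    # Per-sample recovery rate in percent; 100% is a perfect prediction.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return y_pred / y_true * 100.0

y_true = [4000.0, 8000.0]
y_pred = [4100.0, 7800.0]
print(mse(y_true, y_pred))                # → 25000.0
print(recovery_rate_pct(y_true, y_pred))  # → [102.5  97.5]
```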

Introduction of Datasets
The experiment was carried out using the methane detection system shown in Figure 2. Based on the infrared absorption characteristics of methane gas, the long-optical-path differential absorption method for methane detection was studied. The system is mainly composed of a light source, a filter system, a double chamber, a signal collector, and a processing part. The light source is a superluminescent light-emitting diode (SLED). The power spectrum of the SLED was obtained using a steady-state spectrograph (AQ6317C, YOKOGAWA, Tokyo, Japan), as shown in Figure 3. The filter system uses slits, a collimator, a grating, a focus lens, and plane mirrors to obtain the monochromatic light needed for the experiment. The chamber consists of two parts, reference chamber I and test chamber II, each 0.9 m long. Reference chamber I is filled with nitrogen, and test chamber II is filled with the target gas (methane). As shown in Figure 2, the effective optical path can be extended to 2.7 m because the light is reflected twice in the chamber. Light from the source is scattered at the air inlet of the chamber; therefore, a graded-index (GRIN) rod lens is placed at the inlet and outlet of the chamber. The pigtail of the GRIN rod lens is fused with the transmission optical fiber.
Nitrogen was used as the diluting gas to create concentration standards of methane gas. Concentrations of methane at 2000 ppm, 3000 ppm, 4000 ppm, 5000 ppm, 6000 ppm, 7000 ppm, 8000 ppm, 9000 ppm, 10,000 ppm, 11,000 ppm, 12,000 ppm, 13,000 ppm, 14,000 ppm, 15,000 ppm, 16,000 ppm, 17,000 ppm, 18,000 ppm, 19,000 ppm, and 20,000 ppm were prepared. For each concentration, we made three repeated measurements. The measurement results are shown in Table 1, which reveals that the maximum measurement error was 0.045 and the average error was 0.0075. The absorption spectra of methane at four concentrations are shown in Figure 4. The linear relationship between optical power and methane concentration is shown in Figure 5; the linear correlation coefficient is 0.9888, and the fitted line is y = −0.2344x − 41.41.
From the experimental results, we can see that the methane detection system shown in Figure 2 can be used to detect methane. The data sets in this paper were obtained from the above detection system.
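As a small illustration, the fitted line y = −0.2344x − 41.41 from Figure 5 can be inverted to estimate a concentration from a measured optical power reading (in the figure's units); the round-trip value below is purely synthetic, not a measured point.

```python
def concentration_from_power(p, a=-0.2344, b=-41.41):
    # Invert the fitted line p = a * c + b to recover the concentration c.
    return (p - b) / a

# Round-trip check with a synthetic concentration of 5000 ppm.
c = 5000.0
p = -0.2344 * c - 41.41
print(round(concentration_from_power(p), 3))  # → 5000.0
```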


Results and Analysis
The Windows 7 Ultimate operating system was used to perform the experiments. The proposed model was implemented in MATLAB R2014a. The hardware was an Intel(R) Core(TM) i3-4160 CPU (4th generation, Intel Corporation, Santa Clara, CA, USA, 2014) with 4 GB of RAM. The effectiveness and superiority of our method were verified through the following aspects.
The results of the ICSO algorithm were compared with the CSO, PSO, and GA algorithms.

Parameter Setting and Analysis
In this subsection, we give all parameter settings used in this paper and focus on analyzing some parameters used in our method.
The parameter settings and analysis of ICSO, CSO, PSO, and GA, determined after experimental verification, are as follows. If the population size pop and the number of iterations M are too small, it is difficult to converge to a global optimum; if they are too large, the computation takes much time. We set both values to 100 after experimental verification, and set the cross-validation value to 3. The other parameters of the four algorithms are listed in Table 2. The four algorithms ran independently, and the average convergence curves obtained are shown in Figures 6-9.
As can be seen from Figures 6-9, the ICSO algorithm found the optimal fitness value after the 4th iteration, CSO after the 9th iteration, GA after about the 10th iteration, and PSO after about the 13th iteration. The average fitness value of the ICSO algorithm began to converge after the 21st iteration and that of the CSO after about the 22nd iteration. The GA converged after the 9th iteration, but did not converge to the optimal fitness. The average fitness value of PSO stabilized at the 3rd iteration, but there is a large gap between its average fitness curve and its optimal fitness curve.
According to the above results, the ICSO algorithm converges to a global optimum solution fastest. The CSO algorithm also converges to a global optimum solution, but more slowly. The GA and PSO algorithms cannot converge to the global optimum. A comprehensive comparison shows that the convergence behavior of the ICSO algorithm is the best.

Table 2. The parameters of the four algorithms. (1 genetic algorithm; 2 particle swarm optimization algorithm; 3 chicken swarm optimization algorithm; 4 improved chicken swarm optimization algorithm.)

Prediction Results
In our experiment, there were 40 concentrations (1000 ppm-40,000 ppm) of methane. We randomly split the dataset into an 80% training set and a 20% test set; in other words, 32 samples were selected to train the models, while the remaining 8 samples were used to test them. The training and testing sets were randomly selected from the whole dataset. We repeated the train-test procedure five times with the four models (ICSO-SVM, CSO-SVM, GA-SVM, and PSO-SVM) and calculated the mean values. The predicted results of the four models are shown in Table 3.
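The repeated 80/20 train-test procedure can be sketched as follows. This assumes scikit-learn and a synthetic one-feature data set of 40 samples in place of the spectral data; only the splitting and averaging logic mirrors the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(3)

# 40 synthetic samples with one feature, standing in for the 40 concentrations.
X = rng.uniform(0, 1, size=(40, 1))
y = 3.0 * X[:, 0] + 0.02 * rng.standard_normal(40)

mses = []
for seed in range(5):                              # five random 80/20 splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)    # 32 train / 8 test
    model = SVR(kernel="rbf", C=10.0, gamma=1.0, epsilon=0.01).fit(X_tr, y_tr)
    mses.append(float(np.mean((model.predict(X_te) - y_te) ** 2)))

print(f"average test MSE over 5 splits: {np.mean(mses):.5f}")
```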

In order to analyze the performances of four models clearly, we calculated the relative error percentages of the four models, as shown in Figure 10.
As shown in Figure 10, the relative error lines of the ICSO-SVM and CSO-SVM models fluctuate little, while those of the GA-SVM and PSO-SVM models are volatile. The maximum relative error percentage of the ICSO-SVM model was 4%, clearly lower than that of the other three models.
To eliminate bias in the test results, we repeated this train-test procedure 50 times with different random splits. We then averaged the recovery of each test to get the recovery rate and the mean squared error for each model.
The recovery rate can be calculated with Equation (23), and the mean squared error with Equation (22). The recovery rates for the 50 repetitions of the four models are shown in Figures 11-14, and the recovery rates and mean squared errors of the ICSO-SVM, CSO-SVM, GA-SVM, and PSO-SVM models are summarized in Table 4. It can be seen from Figures 11-14 that the recovery rate of the ICSO-SVM model remained stable within [90, 110], those of the CSO-SVM and GA-SVM models within [80, 120], and that of the PSO-SVM model within [75, 120]. The results of the stability study show that the ICSO-SVM model has the best stability. From Table 4, the four models can be ranked by average recovery rate (closest to 100% first) as ICSO-SVM > CSO-SVM > GA-SVM > PSO-SVM, and by average mean squared error (smallest first) as ICSO-SVM > CSO-SVM > PSO-SVM > GA-SVM. The experimental results indicate that the ICSO-SVM model has the best prediction performance.

Conclusions
In order to detect the concentration of methane accurately, a support vector machine optimized by improved chicken swarm optimization (ICSO-SVM) was used in this paper. First, the data were obtained with the methane detection system. Next, in order to verify the validity of the ICSO-SVM model for predicting methane, the CSO-SVM, GA-SVM, and PSO-SVM models were used for comparison.
This study draws the following conclusions: (1) The mean squared error was adopted as the fitness function of the models. The experimental