Simultaneous Feature Selection and Support Vector Machine Optimization Using an Enhanced Chimp Optimization Algorithm

Abstract: The Chimp Optimization Algorithm (ChOA) is a novel meta-heuristic algorithm proposed in recent years. It divides the population into four different levels for the purpose of hunting. However, it still has some defects that cause the algorithm to fall into local optima. To overcome these defects, an Enhanced Chimp Optimization Algorithm (EChOA) is developed in this paper. Highly Disruptive Polynomial Mutation (HDPM) is introduced to further explore the population space and increase population diversity. Then, the Spearman's rank correlation coefficient between the chimps with the highest and the lowest fitness is calculated. To avoid local optima, the chimps with low fitness values are endowed with visual ability through the Beetle Antennae Search Algorithm (BAS). Through the introduction of these three strategies, the exploration and exploitation abilities of the population are enhanced. On this basis, this paper proposes an EChOA-SVM model, which can optimize the SVM parameters while selecting features, so that the maximum classification accuracy can be achieved with as few features as possible. To verify its effectiveness, the proposed method is compared with seven common methods, including the original algorithm, on seventeen benchmark data sets from the UCI machine learning repository, evaluating accuracy, number of features, and fitness. Experimental results show that the classification accuracy of the proposed method is better than that of the other methods on most data sets, and the number of features it requires is also smaller than that of the other algorithms.


Introduction
With society gradually becoming digitalized, how to effectively extract useful information from complex and huge data has become a focus of research in recent years. Machine learning is one of the important research fields for information recognition and pattern recognition in data sets [1], and it has been widely used due to its strong data processing ability. Machine learning evolved from computer science [2] and is a multidisciplinary, interdisciplinary field. It covers probability theory, statistics, approximation theory, and complex algorithms, and it can be used to design efficient and accurate prediction algorithms [3]. It can be divided into supervised learning and unsupervised learning according to the learning mode [4]. SVM is one of the most widely used supervised learning algorithms; Vapnik and other researchers built its cornerstone [5]. It is widely used in pattern recognition [6,7], text recognition [8], biomedicine [9][10][11], medical imaging [12], anomaly detection [13], and other fields. SVM classifies feature spaces into two categories and defines a non-parametric method for finding decision boundaries [14]. SVM has shown excellent performance on many problems [15]. Using support vector machines to solve problems can improve generalization performance, improve computational efficiency, reduce running time, and produce very accurate classification models [16,17]. SVM can solve both linear and nonlinear classification problems. When the data set is linearly inseparable, the nonlinear problem is usually transformed into a linear one; the data set can then be used to construct an optimal hyperplane in the feature space, so that the feature space can be divided into two classes more easily [18]. However, when machine learning processes high-dimensional data sets, problems such as noise and redundant data arise [19].
In this case, feature selection [20] can be used to reduce the number of features.
Feature selection is the task of selecting the optimal feature subset from the original data set [21]. According to their form, feature selection methods can be divided into three types: filter, wrapper, and embedded [22][23][24]. The filter method [27] uses numerous indicators to evaluate and select high-ranking features according to their discriminant attributes [28], after which it selects the subset with the richest information [29]. The wrapper method, which relies on the classification method [25], is more effective than the other two, but its computational cost is relatively greater [26]. Its main idea is to treat the selection of subsets as a search optimization problem: it first generates different combinations, then evaluates them, and finally compares them with other combinations. The wrapper method is often used in feature selection problems due to the superiority of its results, and it can interact directly with the classifier [30]. It can also be effectively combined with meta-heuristic optimization algorithms so as to achieve better results in practical applications. The wrapper-based approach is used in this article.
In recent years, many researchers have combined optimization algorithms with SVM to solve problems. Influenced by this, this paper combines ChOA with SVM for feature selection and parameter optimization. ChOA is a meta-heuristic optimization algorithm proposed by Khishe and Mosavi in 2019 [31], inspired by the hierarchy mechanism of chimps in nature when they are hunting. It can alleviate the problems of slow convergence and local optima when solving high-dimensional problems. However, ChOA still has some shortcomings. First, the population diversity of the algorithm is insufficient in the initial stage. Second, there is a risk of falling into local optima in the final search stage. Therefore, an Enhanced Chimp Optimization Algorithm is proposed in this paper. Three strategies are introduced successively: highly disruptive polynomial mutation [32], Spearman's rank correlation coefficient, and the Beetle Antennae Search Algorithm [33]. In the initial stage, HDPM is used to increase the population's variety, which improves the exploration ability of the optimization algorithm. Since the position update of each chimp is determined by the chimps of every level, this paper improves the global search ability by improving the position-updating ability of the lower-level chimps. The algorithm first introduces Spearman's rank correlation coefficient to measure the distance between low-grade and high-grade chimps. For the lower chimps that are far away from the higher chimps, the Beetle Antennae Search Algorithm is introduced. It gives the chimps with low fitness a visual ability, so that they can change their movement direction according to the surrounding environment. This strategy improves the local and global search ability of the chimps with lower fitness. On this basis, this paper proposes an EChOA-SVM model.
This model is used for feature selection and parameter optimization at the same time to obtain good classification accuracy and performance. The main contributions of this paper are summarized as follows:
1. An Enhanced Chimp Optimization Algorithm is proposed to overcome the shortcomings of ChOA and make it better suited to feature selection problems.
2. The HDPM strategy is introduced in the initial stage to enhance population diversity.
3. Spearman's rank correlation coefficient helps to identify the chimps that need to be improved; BAS is then introduced to improve their position-updating ability.
4. The EChOA-SVM model is used for feature selection and SVM parameter optimization simultaneously.
The model is evaluated on 17 benchmark data sets from the UCI machine learning repository [34]. To verify the effectiveness of the method, it is compared with seven optimization algorithms: ChOA [31], GWO [35], WOA [36], ALO [37], GOA [38], MFO [39], and SSA [40].
The structure of this article is as follows. Section two reviews the relevant literature. Section three briefly introduces the basic principle of ChOA. Section four proposes the new EChOA and then introduces the related theory of EChOA-SVM. Section five compares and analyzes the experimental results.

Literature Review
Optimization algorithms have been widely used in different fields such as medicine, multi-objective optimization, data classification, feature selection, and Support Vector Machine optimization. Zhao and Zhang proposed a Learning-based Evolutionary Multiobjective Algorithm [41]. In a comparison of five algorithms, the proposed algorithm is significantly superior to the others in terms of convergence and the approximation of the Pareto front. Dulebenets proposed an Adaptive Polyploid Memetic Algorithm to solve the cross-docking terminal vehicle scheduling problem [42]. This algorithm can assist the correct operation planning of CDT. Liu, Wang, and Huang proposed an Alternative Algorithm for multiobjective optimization problems [43], which is proven to be better than other many-objective evolutionary algorithms and is easily extended to solve constrained multi-objective optimization problems. Junayed et al. established an optimization model and solution algorithm to optimize the factory-in-a-box supply chain [44]. Gianni et al. established specific rules and formulas that can distinguish bacterial from viral meningitis by machine learning methods [45]; the method even achieved 100% accuracy in detecting bacterial meningitis. Panda and Majhi used the Salp Swarm Algorithm to train a Multilayer Perceptron for data classification [46]. Compared with other classical optimization algorithms, the results show that this method is very advantageous.
This paper mainly studies the application of optimization algorithms to feature selection and support vector machine optimization. Feature selection is the task of selecting the optimal feature subset from the original data set, which can be regarded as an optimization problem [47]. To obtain better classification accuracy, the optimal SVM parameters must be found for the feature subset, and the selection of the feature subset in turn affects the parameters of the SVM. Therefore, to obtain the ideal classification accuracy, both feature selection and optimization of the SVM parameters are required [48]. Huang and Wang [49] proposed a GA-based method for simultaneous feature selection and support vector machine parameter optimization. Compared with other methods, it can achieve higher classification accuracy with fewer features. Shih-Wei Lin et al. [50] proposed a feature selection and parameter optimization method based on SA and compared it with a grid algorithm; the results showed that the proposed method could significantly improve the classification accuracy. Heming Jia and Kangjian Sun proposed an IBMO-SVM classification model [51]. This model can help SVM find the optimal feature subset and parameters at the same time; compared with other optimization algorithms, it shows good performance on both low- and high-dimensional data sets. The EChOA-SVM model proposed in this paper is inspired by these methods.

Chimp Optimization Algorithm (ChOA)
ChOA is a mathematical model based on the intelligence and social diversity of chimps [31]. Driving, chasing, blocking, and attacking are accomplished by four different types of chimps: attackers, barriers, chasers, and drivers. The four hunting steps are completed in two stages: the exploration stage, which includes driving, blocking, and chasing the prey, and the exploitation stage, which consists of attacking the prey. The driving and chasing behaviors are represented by Equations (1) and (2):

d = | c · x_prey(t) − m · x_chimp(t) |    (1)

x_chimp(t + 1) = x_prey(t) − a · d    (2)

where t is the current iteration, and x_prey and x_chimp are the position vectors of the prey and the chimp. The coefficient vectors a, c, and m are given by Equations (3)–(5):

a = 2 · f · r1 − f    (3)

c = 2 · r2    (4)

m = Chaotic_value    (5)

where f is non-linearly decreased from 2.5 to 0 over the course of the iterations, r1 and r2 are random vectors in [0, 1], and m is a chaotic vector. The dynamic coefficient f can follow different curves and slopes, so that chimps can use different abilities to search for the prey.
Chimps update their positions based on the positions of the four best chimps (attacker, barrier, chaser, and driver). This mathematical model is represented by Equations (6)–(8):

d_attacker = | c1 · x_attacker − m1 · x |,  d_barrier = | c2 · x_barrier − m2 · x |,
d_chaser = | c3 · x_chaser − m3 · x |,  d_driver = | c4 · x_driver − m4 · x |    (6)

x1 = x_attacker − a1 · d_attacker,  x2 = x_barrier − a2 · d_barrier,
x3 = x_chaser − a3 · d_chaser,  x4 = x_driver − a4 · d_driver    (7)

x(t + 1) = (x1 + x2 + x3 + x4) / 4    (8)
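The update rule described by Equations (1)–(8) can be sketched in Python (the paper's implementation is in Matlab; this is an illustrative NumPy version, with the coefficient `f` and the chaotic vector `m` supplied by the caller):

```python
import numpy as np

def chimp_step(positions, roles, f, m):
    """One ChOA position update following Eqs. (1)-(8).

    positions : (pop, dim) array of chimp positions
    roles     : dict mapping 'attacker', 'barrier', 'chaser', 'driver'
                to the (dim,) position vectors of the four best chimps
    f         : dynamic coefficient, decreased non-linearly from 2.5 to 0
    m         : (dim,) chaotic vector
    """
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        candidates = []
        for leader in (roles['attacker'], roles['barrier'],
                       roles['chaser'], roles['driver']):
            r1, r2 = np.random.rand(x.size), np.random.rand(x.size)
            a = 2 * f * r1 - f                 # Eq. (3)
            c = 2 * r2                         # Eq. (4)
            d = np.abs(c * leader - m * x)     # Eq. (6)
            candidates.append(leader - a * d)  # Eq. (7)
        new_positions[i] = np.mean(candidates, axis=0)  # Eq. (8)
    return new_positions
```

Each chimp's new position is the average of the four candidate positions driven by the attacker, barrier, chaser, and driver, mirroring the four-level hierarchy of the hunt.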

Highly Disruptive Polynomial Mutation (HDPM)
The highly disruptive polynomial mutation is an improved version of the polynomial mutation method. It overcomes the shortcoming that polynomial mutation has almost no effect when the variable is close to the boundary [32]. The mutated variable x' is generated by Equation (12):

x' = x + δ · (ub − lb)    (12)

δ = [2r + (1 − 2r)(1 − δ1)^(ηm + 1)]^(1/(ηm + 1)) − 1,  if r ≤ 0.5
δ = 1 − [2(1 − r) + 2(r − 0.5)(1 − δ2)^(ηm + 1)]^(1/(ηm + 1)),  if r > 0.5

where δ1 = (x − lb)/(ub − lb), δ2 = (ub − x)/(ub − lb), ub and lb are the upper and lower boundaries of the search space, r is a random number between 0 and 1, and ηm is a non-negative distribution index. As can be seen from the formula, HDPM can explore the entire search space even when the variable lies on the boundary.
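A minimal Python sketch of the HDPM operator of Equation (12), assuming the standard formulation of the mutation (the distribution index ηm = 20 is an illustrative default, not a value taken from the paper):

```python
import numpy as np

def hdpm(x, lb, ub, eta_m=20.0, rng=None):
    """Highly Disruptive Polynomial Mutation of one variable, Eq. (12)."""
    if rng is None:
        rng = np.random.default_rng()
    r = rng.random()
    d1 = (x - lb) / (ub - lb)          # normalized distance to lower bound
    d2 = (ub - x) / (ub - lb)          # normalized distance to upper bound
    p = eta_m + 1.0
    if r <= 0.5:
        delta = (2*r + (1 - 2*r) * (1 - d1)**p)**(1/p) - 1
    else:
        delta = 1 - (2*(1 - r) + 2*(r - 0.5) * (1 - d2)**p)**(1/p)
    return x + delta * (ub - lb)       # mutant always stays inside [lb, ub]
```

Unlike plain polynomial mutation, the mutant can reach the whole interval even when x sits exactly on a boundary, which is what preserves diversity in the initialization phase.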

Spearman's Rank Correlation Coefficient
Spearman's rank correlation coefficient is a statistical index that reflects the degree of association between two groups of variables, computed on a rank basis. It is calculated as Equation (13):

ρ = 1 − (6 Σ d_i²) / (n(n² − 1))    (13)

where d_i is the difference between the ranks of each pair of samples, and n is the length of the series. If the absolute value of the correlation coefficient equals 1, the two series are perfectly monotonically correlated; the closer the coefficient is to 0, the weaker the correlation.
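Equation (13) can be computed directly from ranks; a small Python sketch (assuming no tied values, as in the formula):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation coefficient, Eq. (13); assumes no ties."""
    x, y = np.asarray(x), np.asarray(y)
    n = x.size
    rank_x = np.argsort(np.argsort(x))   # rank of each element of x
    rank_y = np.argsort(np.argsort(y))
    d = rank_x - rank_y                  # rank differences d_i
    return 1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1))
```

For example, two series with identical rank order give ρ = 1, while a strictly reversed pair gives ρ = −1.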

Beetle Antennae Search Algorithm (BAS)
The long-horned beetle has two long antennae, which can pick up the scent of prey to expand its detection range and act as a protective alarm mechanism [33]. The beetle explores nearby areas by swinging the antennae on either side of its body to receive smells, and it moves toward the side where it detects a higher odor concentration, as shown in Figure 1. The Beetle Antennae Search Algorithm is designed based on this property of beetles. The search direction of the beetle is represented by Equation (14):

b = rnd(k, 1) / ‖rnd(k, 1)‖    (14)

where b is the normalized random direction vector of the beetle, rnd(·) is a random function, and k is the dimension of the position.
Next, Equation (15) presents the search behaviors on the left and right sides to simulate the beetle's activity tracking:

x_r = x_t + d_t · b,  x_l = x_t − d_t · b    (15)

where x_l and x_r represent the locations within the left and right search areas respectively, x_t represents the beetle's position at the t-th time instant, and d_t represents the perceived length of the antennae, which gradually decreases with the passage of time.
The position update of the beetle can be represented by Equation (16):

x_{t+1} = x_t − δ_t · b · sign(f(x_r) − f(x_l))    (16)

where δ_t is the step size of the search, whose initial value should be comparable to the size of the search area, and sign(·) is the sign function. The odor concentration at x is expressed by f(x), which is also known as the fitness function.

Improvement Strategy
Although the original ChOA divides the chimps into four different levels to complete the hunting process, the algorithm still has two obvious defects when applied to higher-dimensional problems. The first is the lack of population diversity in the initial stage. The other is the risk of falling into a local optimum in the final search stage. Therefore, this article proposes an Enhanced Chimp Optimization Algorithm, which can be better applied to feature selection problems.
Three strategies, including HDPM, Spearman's rank correlation coefficient, and the BAS algorithm, are introduced into EChOA on the basis of the original algorithm. First, the HDPM strategy is introduced to enhance population diversity in the initialization phase. Traditional polynomial mutation (PM) has almost no effect when the variable is close to the boundary. The HDPM strategy uses Equation (12) to generate the mutated position of the chimp, which helps to further explore the regions and boundaries of the initial space. In this way, even if the variables are on the search boundary, the search space can be fully utilized to ensure the diversity of the population. Second, the Spearman's rank correlation coefficient between the driver and the attacker chimp is calculated. As can be seen from Equation (7), the position update is jointly determined by all chimps of different grades; by improving the positions of the lower chimps, the population can effectively avoid falling into local optima. The distance between the driver and the attacker can be determined by calculating their Spearman's rank correlation coefficient with Equation (13). The driver and the attacker are negatively correlated or uncorrelated when the coefficient is less than or equal to 0, which indicates whether they are close to or far from each other. Finally, BAS is introduced for the drivers that are far away from the attacker. The position of such a chimp is updated by Equation (16), which gives it a visual ability, so that it can judge the surrounding environment and decide its direction of movement. This improves the performance of the driver chimps and avoids local optima.
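The driver-improvement strategy described above can be sketched as follows (an illustrative Python fragment, not the paper's code; the antenna length, step size, and the ρ ≤ 0 threshold follow the description in the text):

```python
import numpy as np

def improve_driver(driver, attacker, fitness, d_t=0.5, step=0.1, rng=None):
    """If the driver is far from the attacker (Spearman's rho <= 0 between
    their position vectors, Eq. (13)), nudge it with one BAS step (Eq. (16));
    otherwise leave it unchanged."""
    if rng is None:
        rng = np.random.default_rng()
    n = driver.size
    diff = np.argsort(np.argsort(driver)) - np.argsort(np.argsort(attacker))
    rho = 1.0 - 6.0 * np.sum(diff**2) / (n * (n**2 - 1))
    if rho > 0:
        return driver                        # already close to the attacker
    b = rng.standard_normal(n)
    b /= np.linalg.norm(b) + 1e-12           # random unit direction, Eq. (14)
    x_r, x_l = driver + d_t * b, driver - d_t * b   # antennae, Eq. (15)
    return driver - step * b * np.sign(fitness(x_r) - fitness(x_l))  # Eq. (16)
```

Only the stragglers pay the cost of the extra two fitness evaluations, which keeps the overhead of the strategy small.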

EChOA for Optimizing SVM and Feature Selection
EChOA-SVM first uses the optimization algorithm to search for the optimal feature subset, and then classifies the feature subset with SVM. For feature selection, the positions of the population are mapped to 0 and 1 by a logistic function: if a feature corresponds to 1, it is selected; if it corresponds to 0, it is not. Since the choice of kernel function and its parameters determines the performance of SVM, EChOA is also needed to optimize the parameters in order to obtain better classification accuracy. The EChOA-SVM model generates the initial agents from the penalty parameter c and the kernel parameter g, and calculates the fitness value to evaluate the chimps' positions. Finally, it outputs the optimal values of c and g and the best accuracy to obtain the final classification result.
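The agent decoding and fitness evaluation described above can be sketched in Python. The bounds for c and g, the logistic threshold of 0.5, and the weight alpha = 0.99 are illustrative assumptions (the error/feature-count weighting is a common choice in wrapper feature selection, not a value stated here); the classification error would come from training an SVM on the masked features:

```python
import numpy as np

def decode_agent(agent, c_bounds=(0.01, 100.0), g_bounds=(1e-4, 10.0)):
    """Map a continuous position vector to (c, g, feature mask).

    agent[0] and agent[1] are squashed by a logistic function and scaled to
    the SVM penalty parameter c and kernel parameter g; the remaining entries
    are squashed and thresholded at 0.5 to give the binary feature mask."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(agent, dtype=float)))
    c = c_bounds[0] + s[0] * (c_bounds[1] - c_bounds[0])
    g = g_bounds[0] + s[1] * (g_bounds[1] - g_bounds[0])
    mask = s[2:] > 0.5
    return c, g, mask

def fitness(agent, error_rate, n_features, alpha=0.99):
    """Weighted fitness: classification error vs. fraction of features kept."""
    _, _, mask = decode_agent(agent)
    return alpha * error_rate + (1 - alpha) * mask.sum() / n_features
```

With this encoding, a single EChOA position vector simultaneously carries the SVM parameters and the feature subset, so both are optimized in one search.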
The description of the EChOA-SVM is shown as follows. The flow chart is shown in Figure 2.

Datasets Details
The implementation of the proposed algorithm is done using Matlab. Seventeen data sets from the University of California at Irvine (UCI) machine learning repository [34] have been selected to evaluate the proposed EChOA-SVM approach. Table 1 describes the number of instances, the number of features, and the number of classes of each data set. Data sets of different sizes and dimensionalities are selected to observe the performance of the algorithm at different scales. All the experiments are carried out on Matlab R2016a, and the computer is configured as Intel(R) Core (TM) i5-1035G1 CPU @ 1.00GHz 1.19GHz, using Microsoft Windows 10. Each experiment is run ten times independently to reduce random influence. In addition, the general parameters are set as follows: the population size is 30, and each run is limited to 100 iterations as the stopping criterion.
All settings and parameters that are used in the experiments for all algorithms are presented in Table 2.

Results and Discussion
To compare the different feature selection methods with the method proposed in this paper, three indexes are used [53]:
1. Accuracy: the ratio between the number of correctly classified instances and the total number of instances, which reflects the accuracy of the classifier's recognition results. It is an important index for evaluating algorithm performance.
2. Number of features: reflects the ability to eliminate redundancy and shows whether the method can find the optimal feature subset.
3. Fitness value: reflects the quality of the solution selected by the classifier; a better fitness value corresponds to a better solution.
Table 3 shows the mean and the standard deviation of the accuracy. Figure 3 presents the boxplot charts of the accuracy for eight of the used data sets. The boxplots are drawn from the classification accuracies given by each algorithm at the end of SVM training, and they indicate the improved performance of EChOA for optimizing the SVM.
Table 3. Mean and standard deviation of classification accuracy for all data sets with applying feature selection.
The performance of EChOA is better than the other optimization algorithms in terms of average accuracy. EChOA obtains the best performance on 13 of the 17 data sets, which accounts for 76.5% of the total. On the other four data sets, EChOA is also strongly competitive. Although EChOA does not achieve the highest accuracy on Coimbra, Dermatology, and Glass, its classification accuracy ranks third among the eight optimization algorithms, just behind GWO and WOA. It is noteworthy that the accuracy of EChOA is 10 percent higher than ChOA on the Bupa data set, 12 percent higher than GWO on the Knowledge data set, and 18 percent higher than GWO on the Lymphography data set. At the same time, it can be seen from the boxplots that each optimization algorithm generates ten accuracies over the ten runs on each data set.
From the perspective of their distribution, the accuracy of EChOA is very stable; it hardly varies across the 17 data sets. Especially on the Coimbra and Heart data sets, EChOA remains very stable even though the accuracy of almost all the other optimization algorithms varies significantly between runs. All of this shows the superiority of the EChOA algorithm. Table 4 shows the mean and the standard deviation of the number of features. Figure 4 presents the histogram of the number of features for all used data sets. It can be seen intuitively that the number of features required by EChOA is almost always below the average of all the algorithms. There are only two data sets on which it does not obtain the best result (Bupa and Transfusion), which accounts for only 11.8 percent of the total. On the Liver data set, EChOA even requires only one feature, showing that EChOA can use the feature space more efficiently. In the analysis of classification accuracy, it was found that EChOA's accuracy is lower than GWO and WOA on Coimbra, Dermatology, and Glass; however, Table 4 shows that the number of features required by EChOA on these three data sets is smaller than for GWO and WOA. Especially on the Dermatology data set, EChOA requires nearly seven fewer features than GWO and WOA. As can be seen in Figure 4, on the Coimbra, Dermatology, Divorce, and Liver data sets there are two or three optimization algorithms with far higher feature counts than the others, but EChOA never shows this behavior. It should be noticed that the number of features required by ChOA is much higher than that of EChOA on the Dermatology and Wine data sets, which also indicates the superiority of EChOA over ChOA and the other algorithms in terms of feature count. Table 5 shows the mean and the standard deviation of the fitness.
Figure 5 presents the line chart of the fitness for six of the used data sets; the lines represent the convergence curves of the corresponding algorithms. EChOA attains the minimum fitness value on 12 data sets, and its convergence time is very short. On the Coimbra, Dermatology, Diagnostic, Divorce, and Heart data sets, the convergence speed is higher than that of ChOA, and some fitness values are also lower than ChOA's. At the same time, the EChOA convergence curve is relatively flat, and its performance on each data set is relatively good, almost always below the average fitness value; it never shows behavior like that of GOA on the Liver data set, where the fitness value is much higher than the other algorithms'. As can be seen from Figure 5, the convergence curve of EChOA on the Heart data set is smoother than the other algorithms', which shows that the local search ability of the algorithm has been greatly improved. EChOA reaches its best objective function value within fewer than 10 iterations on the Dermatology and Divorce data sets, earlier than the other optimization algorithms, which indicates that the convergence rate of EChOA is improved. Overall, EChOA is better and more stable than most of the compared algorithms.

Conclusions
This work presents a novel hybrid method for optimizing SVM based on EChOA. The proposed approach is able to tune the parameters of the SVM kernel and, at the same time, find the best feature subset and accuracy. The experimental results show that the proposed algorithm is effective in improving the classification accuracy of SVM and that EChOA has certain advantages over the other seven optimization algorithms in terms of accuracy, number of features, and fitness value, with relatively stable comprehensive performance. All of this indicates that the EChOA algorithm is highly competitive.
Although EChOA can effectively improve the exploitation and exploration of ChOA, it introduces three strategies simultaneously, which increases the complexity of the algorithm. Therefore, it is necessary to consider how to remove ineffective improvement strategies to alleviate this problem. In subsequent studies, parallel strategies such as a co-evolutionary mechanism can be introduced.
In future research, EChOA can be applied to multi-objective problems, social manufacturing optimization, and video coding optimization. Simultaneously, the EChOA-SVM model proposed in this paper can be studied on a larger scale. It can also be applied to other practical problems such as data mining.