Abstract
Feature selection is an NP-hard combinatorial problem in which the number of possible feature subsets grows exponentially with the number of features; as the feature size increases, exhaustive search becomes impractical. In addition, a feature set normally includes irrelevant and redundant information alongside the relevant features. Therefore, in this paper, binary variants of a competitive swarm optimizer are proposed for wrapper feature selection. The proposed approaches are used to select a subset of significant features for classification purposes. The binary versions introduced here employ S-shaped and V-shaped transfer functions, which allow the search agents to move on the binary search space. The proposed approaches are tested on 15 benchmark datasets collected from the UCI machine learning repository, and the results are compared with those of other conventional feature selection methods. Our results demonstrate the capability of the proposed binary competitive swarm optimizer in terms of not only high classification performance but also low computational cost.
1. Introduction
Nowadays, many applications involve extracting useful information from collected data. The extracted information is known as features, which are useful for describing the target concept [1]. However, an increase in the number of features causes the "curse of dimensionality", in which the performance of the system degrades and its complexity grows. This is mainly due to the existence of irrelevant and redundant information, which badly affects the performance of the classification model [2]. To resolve this issue, a proper selection of the extracted features is critically important. Hence, the feature selection problem has become a major concern in many research areas [3].
Feature selection is a pre-processing step that determines a subset of significant features which can strongly improve the performance of the system. It not only eliminates redundant information, but also reduces the temporal and spatial complexity of the classification model [4]. Generally, feature selection methods can be classified into two approaches: filter and wrapper. The former identifies the relevant features using proxy measures such as mutual information and data characteristics, while the latter utilizes a predictive model to evaluate candidate feature subsets and find a nearly optimal one [5,6]. Compared to the wrapper approach, filter feature selection is independent of the learning algorithm and is computationally less expensive. However, wrapper feature selection can usually offer better performance [7].
In the wrapper approach, feature selection is treated as a combinatorial optimization problem, which can be solved using metaheuristic algorithms [8,9]. The most common wrapper feature selection methods are the genetic algorithm (GA) and binary particle swarm optimization (BPSO). The GA is an evolutionary algorithm that generates a population of solutions called chromosomes. In each generation, the solutions are evolved through selection, crossover, and mutation operations [10]. Several studies have shown that the GA is well suited to high-dimensional feature selection problems [10,11]. However, the GA suffers from high time consumption and sensitivity to parameter settings. Thus, Ghareb et al. [12] performed a hybridization between an enhanced genetic algorithm (EGA) and a filter approach for text categorization. The authors first employed the filter approach to identify potential initial solutions, which were then evaluated by the EGA. Ma and Xia [13] introduced a novel tribe competition-based genetic algorithm (TCbGA) to tackle the feature selection problem in pattern classification. Another study proposed an adaptive multi-parent crossover GA for epileptic seizure identification [14].
BPSO is a binary variant of particle swarm optimization (PSO). Unlike the GA, BPSO is a swarm-based algorithm that generates a population of solutions called particles. The particles adjust their positions by changing their velocities according to their own experience as well as the experience of their neighbors [15]. BPSO is a useful tool and has been widely applied to feature selection. However, BPSO suffers from premature convergence and stagnation, leading to ineffective solutions [15,16]. Therefore, Chuang et al. [17] proposed a chaotic binary particle swarm optimization (CBPSO) for feature selection, in which chaotic maps determine the inertia weight in each iteration. Jain et al. [18] developed an improved binary particle swarm optimization (iBPSO) for gene selection and cancer classification. The authors first applied correlation-based feature selection (CFS) to reduce the dimensionality, and then evaluated the relevant features using iBPSO. Another study introduced a BPSO with a personal best (pbest) guide strategy to tackle the feature selection problem in the classification of electromyography signals [19].
The competitive swarm optimizer (CSO) is a newly introduced variant of PSO [20]. In comparison with other metaheuristic algorithms, CSO has shown superior performance in several benchmark tests. Generally, CSO employs a pairwise competition strategy that partitions the solutions into winners and losers, where the winners are passed directly to the next iteration. In this way, CSO is computationally less expensive, since only half of the population is evaluated in each iteration. In this paper, we propose binary versions of CSO to solve the feature selection problem in classification tasks. The binary versions introduced here are obtained by implementing transfer functions. In this approach, transfer functions from the S-shaped and V-shaped families are used to allow the search agents to move around the binary search space. The proposed approaches are validated on 15 benchmark datasets, and the results are compared with other conventional methods.
The rest of this paper is organized as follows: Section 2 details the background of the competitive swarm optimizer (CSO). Section 3 presents the proposed binary version of the competitive swarm optimizer (BCSO), and Section 4 describes the application of the BCSO to feature selection. The experimental results are discussed in Section 5. Finally, conclusions are offered in Section 6.
2. The Competitive Swarm Optimizer
The competitive swarm optimizer (CSO) is a recent metaheuristic optimization algorithm proposed by Cheng and Jin in 2015 [20]. The CSO is a new variant of particle swarm optimization (PSO), and it has been proven to work more effectively on large-scale optimization. In addition, the CSO is able to find the global optimum within a short period, which leads to fast computational speed. In the CSO, the population of particles is randomly partitioned into two groups of equal size. A competition is then made between particles from each group. In each competition, the particle that scores a better fitness value is called the winner and is directly moved to the next iteration. On the contrary, the loser updates its velocity and position by learning from the winner. Mathematically, the velocity and position of the loser are updated as follows:
$$v_l^d(t+1) = r_1 v_l^d(t) + r_2 \left( x_w^d(t) - x_l^d(t) \right) + \varphi\, r_3 \left( \bar{x}^d(t) - x_l^d(t) \right) \tag{1}$$

$$x_l^d(t+1) = x_l^d(t) + v_l^d(t+1) \tag{2}$$

where $v_l$ is the velocity of the loser particle, $x_w$ is the position of the winner particle, $x_l$ is the position of the loser particle, $\bar{x}$ is the mean position of the current swarm, $r_1$, $r_2$, and $r_3$ are three independent random vectors in [0, 1], $\varphi$ is the social factor, $d$ is the dimension of the search space, and $t$ is the iteration number. The pseudocode of the CSO is presented in Algorithm 1.
| Algorithm 1. Competitive swarm optimizer |
| Input parameters: N, Tmax, and φ |
| (1) Initialize a population of particles, x |
| (2) Calculate the fitness of particles, F(x) |
| (3) Define the best particle as gbest |
| (4) for t = 1 to maximum number of iterations, Tmax |
| // Competition Strategy // |
| (5) for i = 1 to half of population, N/2 |
| (6) Randomly select two particles, xk and xm |
| (7) if F(xk) better than F(xm) |
| (8) xw = xk, xl = xm |
| (9) else |
| (10) xw = xm, xl = xk |
| (11) end if |
| (12) Add xw into new population |
| (13) Remove xk and xm from the population |
| (14) next i |
| //Velocity and Position Update // |
| (15) for i = 1 to half of population, N/2 |
| (16) for d = 1 to the dimension of search space, D |
| (17) Update velocity of loser using Equation (1) |
| (18) Update position of loser as shown in Equation (2) |
| (19) next d |
| (20) Calculate the fitness of new loser, F(xl) |
| (21) Move new loser into new population |
| (22) Update gbest if there is a better solution |
| (23) next i |
| (24) Pass new population to next iteration |
| (25) next t |
| Output: Global best solution |
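For concreteness, the following is a minimal Python sketch of Algorithm 1 under stated assumptions: NumPy for array operations, an even population size, a simple sphere function standing in for the fitness, and illustrative defaults for n, t_max, phi, and the search bounds. It is a sketch of the procedure above, not the authors' implementation.

```python
import numpy as np

def sphere(x):
    """Example objective (minimization); any fitness function can be swapped in."""
    return float(np.sum(x ** 2))

def cso(fitness, dim, n=40, t_max=100, phi=0.1, lb=-10.0, ub=10.0):
    """Minimal continuous CSO following Algorithm 1 (n is assumed even)."""
    x = np.random.uniform(lb, ub, (n, dim))      # particle positions
    v = np.zeros((n, dim))                       # particle velocities
    f = np.array([fitness(p) for p in x])
    g = int(np.argmin(f))
    gbest, gbest_f = x[g].copy(), f[g]

    for _ in range(t_max):
        x_mean = x.mean(axis=0)                  # mean position of the swarm
        perm = np.random.permutation(n)          # random pairing for competitions
        for i in range(0, n, 2):
            a, b = perm[i], perm[i + 1]
            w, l = (a, b) if f[a] < f[b] else (b, a)   # winner keeps its position
            r1, r2, r3 = np.random.rand(3, dim)
            # Equation (1): the loser learns from the winner and the swarm mean
            v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (x_mean - x[l])
            # Equation (2): position update, clipped to the search bounds
            x[l] = np.clip(x[l] + v[l], lb, ub)
            f[l] = fitness(x[l])
            if f[l] < gbest_f:
                gbest, gbest_f = x[l].copy(), f[l]
    return gbest, gbest_f

best, best_f = cso(sphere, dim=30)
```

The cost advantage described above is visible in the inner loop: only the loser's velocity, position, and fitness are recomputed, so each iteration requires only N/2 fitness evaluations.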
3. Binary Version of the Competitive Swarm Optimizer
The CSO is a swarm intelligence method that mimics the concept of competition between particles in a population. As mentioned in [20], the CSO has been tested on several benchmark functions and showed superior performance against other conventional optimization algorithms. The CSO algorithm utilizes the competition strategy and a new velocity updating rule, which is beneficial for improving the exploration and convergence rate [20]. This motivates us to adapt the CSO so that it can be used for wrapper-based feature selection.
Generally speaking, wrapper feature selection is a binary optimization problem in which each element of the solution is represented as either 0 or 1 [8]. In the traditional CSO, the particles move around the search space by updating their positions within the continuous real domain. However, real continuous values are not suitable for binary optimization, since the solution should be represented in binary form. For this reason, the CSO must be converted into a binary version.
One of the effective ways to convert the continuous optimization into a binary version is the utilization of a transfer function. In binary optimization, a transfer function is a mathematical function that determines the probability of changing a position vector’s dimension from 0 to 1, and vice versa [21]. More importantly, a transfer function is an extremely cheap operator, and it can improve the exploitation and exploration of the CSO in feature selection [22]. Hence, the transfer function has become our main focus in this work. In this paper, we propose eight versions of binary competitive swarm optimizers for feature selection.
3.1. S-Shaped Family
In general, the transfer function can be categorized into S-shaped and V-shaped families. In this sub-section, the implementation of the S-shaped transfer function is described. The S-shaped transfer function forces the search agents to move around the binary search space [8]. Previously, S-shaped transfer functions have been successfully applied in binary particle swarm optimization (BPSO), binary antlion optimizer (BALO) and binary salp swarm algorithm (BSSA) [8,21,23]. The four commonly used S-shaped transfer functions (S1–S4) are expressed as follows:
$$S_1\left(v_l^d(t+1)\right) = \frac{1}{1 + e^{-2 v_l^d(t+1)}} \tag{3}$$

$$S_2\left(v_l^d(t+1)\right) = \frac{1}{1 + e^{-v_l^d(t+1)}} \tag{4}$$

$$S_3\left(v_l^d(t+1)\right) = \frac{1}{1 + e^{-v_l^d(t+1)/2}} \tag{5}$$

$$S_4\left(v_l^d(t+1)\right) = \frac{1}{1 + e^{-v_l^d(t+1)/3}} \tag{6}$$

where $v_l$ is the velocity of the loser particle, $d$ is the dimension, and $t$ is the iteration number. The S-shaped transfer functions are illustrated in Figure 1. In these approaches, the velocity of the loser is first calculated as shown in Equation (1). The transfer function is then used to convert the velocity into a probability value in [0, 1]. After that, the position of the loser is updated as:

$$x_l^d(t+1) = \begin{cases} 1, & \text{if } r_4 < S\left(v_l^d(t+1)\right) \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where S can be S1, S2, S3, or S4, and $r_4$ is a random vector uniformly distributed in [0, 1].
Figure 1. S-shaped transfer functions (S1–S4).
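As a sketch, the S-shaped family and the update rule of Equation (7) can be written compactly in Python, assuming NumPy and the transfer function definitions of [21]; the names S_FUNCS and s_shaped_update are ours, not the paper's:

```python
import numpy as np

# S-shaped transfer functions (S1-S4), following the usual definitions in [21]
S_FUNCS = {
    "S1": lambda v: 1.0 / (1.0 + np.exp(-2.0 * v)),
    "S2": lambda v: 1.0 / (1.0 + np.exp(-v)),
    "S3": lambda v: 1.0 / (1.0 + np.exp(-v / 2.0)),
    "S4": lambda v: 1.0 / (1.0 + np.exp(-v / 3.0)),
}

def s_shaped_update(x_loser, v_loser, s_func):
    """Equation (7): each bit is set to 1 with probability S(v), else 0.
    The previous position x_loser is unused; the S-shaped rule discards it."""
    r4 = np.random.rand(*v_loser.shape)
    return (r4 < s_func(v_loser)).astype(int)
```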
3.2. V-Shaped Family
In this sub-section, the implementation of the V-shaped transfer function is presented. The V-shaped transfer function allows the search agents to perform the search within the binary search space. Many studies employ the V-shaped transfer function to convert the metaheuristic algorithms into a binary version [23,24,25]. The four frequently used V-shaped transfer functions (V1–V4) are defined as follows:
$$V_1\left(v_l^d(t+1)\right) = \left| \operatorname{erf}\left( \frac{\sqrt{\pi}}{2}\, v_l^d(t+1) \right) \right| \tag{8}$$

$$V_2\left(v_l^d(t+1)\right) = \left| \tanh\left( v_l^d(t+1) \right) \right| \tag{9}$$

$$V_3\left(v_l^d(t+1)\right) = \left| \frac{v_l^d(t+1)}{\sqrt{1 + \left(v_l^d(t+1)\right)^2}} \right| \tag{10}$$

$$V_4\left(v_l^d(t+1)\right) = \left| \frac{2}{\pi} \arctan\left( \frac{\pi}{2}\, v_l^d(t+1) \right) \right| \tag{11}$$

where $v_l$ is the velocity of the loser particle, $d$ is the dimension, and $t$ is the iteration number. The V-shaped transfer functions are illustrated in Figure 2. Unlike the S-shaped transfer function, the V-shaped transfer function does not force the search agents to take the value 0 or 1; instead, it flips the current bit with a probability determined by the velocity. In this approach, the position of the loser particle is updated as:

$$x_l^d(t+1) = \begin{cases} \neg x_l^d(t), & \text{if } r_5 < V\left(v_l^d(t+1)\right) \\ x_l^d(t), & \text{otherwise} \end{cases} \tag{12}$$
where V can be V1, V2, V3, or V4, $r_5$ is a random vector uniformly distributed in [0, 1], and $\neg$ denotes the complement (bit flip).
Figure 2. V-shaped transfer functions (V1–V4).
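A matching sketch of the V-shaped family and the flip rule of Equation (12), under the same assumptions (NumPy and SciPy available; the names V_FUNCS and v_shaped_update are ours):

```python
import numpy as np
from scipy.special import erf

# V-shaped transfer functions (V1-V4), following the usual definitions in [21]
V_FUNCS = {
    "V1": lambda v: np.abs(erf(np.sqrt(np.pi) / 2.0 * v)),
    "V2": lambda v: np.abs(np.tanh(v)),
    "V3": lambda v: np.abs(v / np.sqrt(1.0 + v ** 2)),
    "V4": lambda v: np.abs(2.0 / np.pi * np.arctan(np.pi / 2.0 * v)),
}

def v_shaped_update(x_loser, v_loser, v_func):
    """Equation (12): flip each bit with probability V(v); otherwise keep it."""
    r5 = np.random.rand(*v_loser.shape)
    return np.where(r5 < v_func(v_loser), 1 - x_loser, x_loser)
```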
The pseudocode of the binary competitive swarm optimizer (BCSO) is shown in Algorithm 2, where N and Tmax are the number of particles and the maximum number of iterations, respectively. In the first step, a population of N particles is randomly initialized, and the velocity of each particle is initialized to zero. Then, the fitness of each particle is evaluated, and the best particle is defined as the global best, gbest. In each iteration, the particles are randomly divided into two groups, and a competition is made between each pair of coupled particles. From the competition, the winners are passed directly into the new population. The losers, on the other hand, update their velocities using Equation (1). The velocity is then converted into a probability value by an S-shaped or V-shaped transfer function, and the position of the loser particle is updated using Equation (7) or Equation (12). Next, the fitness of each new loser is evaluated, and the new losers are moved into the new population for the next iteration. At the end of each iteration, the global best solution gbest is updated. The procedure is repeated until the maximum number of iterations is reached, and the global best solution is returned.
| Algorithm 2. Binary competitive swarm optimizer. |
| Input parameters: N, Tmax, and φ |
| (1) Initialize a population of particles, x |
| (2) Calculate the fitness of particles, F(x) |
| (3) Define the best particle as gbest |
| (4) for t = 1 to maximum number of iterations, Tmax |
| // Competition Strategy // |
| (5) for i = 1 to half of population, N/2 |
| (6) Randomly select two particles, xk and xm |
| (7) if F(xk) better than F(xm) |
| (8) xw = xk, xl = xm |
| (9) else |
| (10) xw = xm, xl = xk |
| (11) end if |
| (12) Add xw into new population |
| (13) Remove xk and xm from the population |
| (14) next i |
| //Velocity and Position Update // |
| (15) for i = 1 to half of population, N/2 |
| (16) for d = 1 to the dimension of search space, D |
| (17) Update velocity of loser using Equation (1) |
| (18) Convert velocity into probability using S-shaped or V-shaped transfer function |
| (19) Update position of loser as shown in Equation (7) or Equation (12) |
| (20) next d |
| (21) Calculate the fitness of new loser, F(xl) |
| (22) Move new loser into new population |
| (23) Update gbest if there is a better solution |
| (24) next i |
| (25) Pass new population to next iteration |
| (26) next t |
| Output: Global best solution |
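Putting the pieces together, a minimal Python sketch of Algorithm 2 might look as follows, reusing the hypothetical S_FUNCS/V_FUNCS helpers above and assuming an even population size and a fitness function to be minimized; again, this is our sketch rather than the authors' code:

```python
import numpy as np

def bcso(fitness, dim, transfer, update, n=10, t_max=100, phi=0.1):
    """Minimal BCSO sketch (Algorithm 2); `transfer` is one of the S-/V-shaped
    functions and `update` is s_shaped_update or v_shaped_update."""
    x = np.random.randint(0, 2, (n, dim))        # binary positions
    v = np.zeros((n, dim))                       # real-valued velocities
    f = np.array([fitness(p) for p in x])
    g = int(np.argmin(f))
    gbest, gbest_f = x[g].copy(), f[g]

    for _ in range(t_max):
        x_mean = x.mean(axis=0)                  # mean position of the swarm
        perm = np.random.permutation(n)          # random pairing for competitions
        for i in range(0, n, 2):
            a, b = perm[i], perm[i + 1]
            w, l = (a, b) if f[a] < f[b] else (b, a)   # winner / loser
            r1, r2, r3 = np.random.rand(3, dim)
            # Equation (1): velocity of the loser
            v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (x_mean - x[l])
            # Transfer function + Equation (7) or (12): binary position update
            x[l] = update(x[l], v[l], transfer)
            f[l] = fitness(x[l])
            if f[l] < gbest_f:
                gbest, gbest_f = x[l].copy(), f[l]
    return gbest, gbest_f
```

For instance, `bcso(fit, dim, V_FUNCS["V2"], v_shaped_update)` would correspond to the BCSO-V2 variant discussed in Section 5.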
4. Application of the Binary Competitive Swarm Optimizer for Feature Selection
In this section, the proposed binary competitive swarm optimization approaches are applied to solve the feature selection problem in classification tasks. In wrapper feature selection, the solution is represented in binary form. Bit 1 indicates that the feature is selected, while bit 0 denotes the unselected feature [23]. For example, let solution X = {0, 1, 0, 0, 1, 0, 0, 0, 1, 1}. As can be seen, solution X consists of 10 dimensions (features). Among them, only four features (2nd, 5th, 9th, and 10th) are selected.
Feature selection is an NP-hard combinatorial problem. For a dataset with feature size D, the number of possible feature subsets is $2^D - 1$, which makes exhaustive search impractical. Therefore, the proposed approaches are used to evaluate the best feature subset. In this paper, a fitness function that considers both the classification error rate and the number of selected features is applied. Mathematically, the fitness function can be expressed as:

$$\text{Fitness} = \alpha \cdot ER(K) + (1 - \alpha) \cdot \frac{|S|}{|C|} \tag{13}$$
where $ER(K)$ is the classification error rate computed by a classifier relative to the selection decision K of the features, $|C|$ is the total number of features in the dataset, $|S|$ is the length of the selected feature subset, and $\alpha$ is a parameter in [0, 1] that controls the influence of the classification error rate. Following [8,23,26], $\alpha$ is set to 0.99, since classification performance is the most important measure in this framework.
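A sketch of this fitness function in Python (the guard for empty subsets and the name error_rate_fn are our assumptions, not part of the paper):

```python
import numpy as np

ALPHA = 0.99  # weight on the classification error rate, following [8,23,26]

def feature_fitness(mask, error_rate_fn):
    """Equation (13): ALPHA * ER(K) + (1 - ALPHA) * |S| / |C|."""
    mask = np.asarray(mask)
    if mask.sum() == 0:          # guard: an empty subset cannot be classified
        return 1.0
    er = error_rate_fn(mask)     # classifier error on the selected features
    return ALPHA * er + (1 - ALPHA) * mask.sum() / mask.size
```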
5. Experimental Results and Discussions
5.1. Experiment Setup
In this sub-section, the performance of the proposed binary competitive swarm optimizer approaches is investigated. The proposed approaches are validated on fifteen benchmark datasets acquired from the UCI machine learning repository [27]. Table 1 summarizes the datasets in terms of the number of instances, number of features, and number of classes. Note that the features in the LSVT Voice Rehabilitation dataset are normalized in order to prevent numerical problems.
Table 1. List of the used datasets.
As for wrapper feature selection, the classification error rate in the fitness function is computed by using the k-nearest neighbor (KNN) classifier with Euclidean distance metric and k = 5. The KNN is chosen due to its promising performance and fast computation speed in previous work [10]. In this paper, we use a hold-out strategy in which each dataset is partitioned into 80% for training and 20% for testing.
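For illustration, the error_rate_fn assumed earlier could be realized with scikit-learn along these lines (the 80/20 hold-out split and k = 5 match the setup described here; the function name and seed are ours):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_error_rate(X, y, mask, seed=0):
    """Hold-out KNN error (k = 5, Euclidean) on the features selected by `mask`."""
    X_sel = X[:, np.asarray(mask, dtype=bool)]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.2, random_state=seed)   # 80% train / 20% test
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
    knn.fit(X_tr, y_tr)
    return 1.0 - knn.score(X_te, y_te)
```

Binding the data then gives a complete fitness for the BCSO sketch of Section 3, e.g. `fit = lambda m: feature_fitness(m, lambda mm: knn_error_rate(X, y, mm))`.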
5.2. Comparison Algorithms and Evaluation Metrics
To examine the efficiency and efficacy of the proposed approaches, four state-of-the-art feature selection methods, namely binary particle swarm optimization (BPSO), the genetic algorithm (GA), binary differential evolution (BDE) [28], and the binary salp swarm algorithm (BSSA) [23], are used for performance comparison. To ensure a fair comparison, the population size (N) and maximum number of iterations (Tmax) are fixed at 10 and 100, respectively [23]. The dimension of the search space (D) equals the total number of features in each dataset. Table 2 lists the parameter settings of the utilized approaches. Note that there is no additional parameter setting for BSSA.
Table 2. Parameter settings of the utilized approaches.
In the experiment, six evaluation metrics, including the best fitness, worst fitness, mean fitness, standard deviation of fitness (STD), feature size (number of selected features), and accuracy, are recorded. The details of the evaluation metrics can be found in [9,29,30]. To achieve statistically meaningful results, each approach is repeated for 30 independent runs. Thereafter, the average statistical measurements obtained throughout 30 independent runs are displayed as the experimental results. All the evaluations are conducted with MATLAB 2017 software (MathWorks, Natick, MA, USA) by using a computer with 2.90 GHz Intel Core i5-9400 CPU and 16 GB RAM.
5.3. Assessments of the BCSO in Feature Selection
In the first part of the experiment, the best transfer function for the BCSO is determined. Eight transfer functions (from the S-shaped and V-shaped families) are utilized in this work. Table 3 displays the experimental results of the best fitness, worst fitness, mean fitness, STD of fitness, and feature size of the BCSOs on the 15 datasets. Note that the best results among the eight BCSOs are highlighted in bold. In this table, smaller best, worst, mean, and STD fitness values indicate better performance. As for the feature size, a lower value indicates that fewer features are selected by the algorithm; in other words, a smaller feature size means that more irrelevant and redundant features have been eliminated. From Table 3, it is observed that BCSO-V2 offered the smallest best fitness value on five datasets (6, 7, 9, 13, and 14), outperforming the other transfer functions on feature selection tasks. On the other hand, BCSO-S4 achieved the best STD value in most cases, ensuring highly consistent results. In terms of feature size, BCSO-V3 contributed the lowest number of selected features for most of the datasets.
Table 3. The experimental results of BCSO with eight different transfer functions.
Another important measurement is the accuracy obtained from the features selected by each approach. Figure 3 shows the boxplot of the BCSO with eight different transfer functions across the 15 datasets. As can be seen, BCSOs with V-shaped transfer functions usually achieve better results than those with S-shaped transfer functions. This is because the V-shaped transfer function does not force the search agent to take the bit 1 or 0, which results in excellent performance. Across the 15 datasets, the best classification performance is achieved by BCSO-V1 and BCSO-V2. Based on the results in Table 3 and Figure 3, it can be concluded that the BCSO with transfer function V2 yielded superior performance in evaluating the relevant features, surpassing the other transfer functions in the current work.
Figure 3. Boxplot of the BCSO with eight different transfer functions across 15 datasets.
5.4. Comparison with Other Algorithms
Table 4 presents the comparison of the results of the BCSO with those of BDE, BPSO, BSSA, and GA. In Table 4, the best result on each metric is bolded. The results in Table 4 show that the BCSO delivered competitive performance on the feature selection tasks. In comparison with BDE, BPSO, BSSA, and GA, the BCSO achieved the optimal best fitness value on 11 datasets. This result implies that the BCSO was superior to the other algorithms in identifying the significant features.
Table 4. Comparison of the results of BCSO with BDE, BPSO, BSSA, and GA.
On the other hand, one can see that BSSA yielded the smallest feature size (number of selected features) in this work. This finding indicates that BSSA can usually select a minimal subset of features while maintaining high performance. In terms of robustness, the most consistent results were provided by BDE, as indicated by its smallest standard deviation.
Figure 4 shows the accuracy of BDE, BPSO, GA, BSSA, and BCSO on the 15 datasets. From Figure 4, the best classification performance was achieved by the BCSO, which showed superior accuracy on 11 datasets. It is evident that the BCSO is a useful feature selection tool that provides better classification performance in this work. On datasets 1 and 8, the best accuracy was achieved by BPSO, while BSSA and GA obtained the best accuracy on datasets 5 and 10, respectively. The worst performance was found for BDE, which implies that BDE did not work well in this study.
Figure 4. Accuracy of BDE, BPSO, BSSA, GA, and BCSO on 15 datasets.
Figures 5 and 6 illustrate the convergence curves of BDE, BPSO, BSSA, GA, and BCSO on the 15 datasets. Note that the fitness shown is the average fitness value over 30 runs. In these figures, it is observed that the BCSO provided competitive performance against BPSO, GA, BSSA, and BDE. Among the rivals, the worst performance was again shown by BDE, which failed to approach the global optimum efficiently and thus produced ineffective solutions. The BCSO, in contrast, keeps tracking the global optimum, which indicates good exploitation and exploration capability. As a result, the BCSO maintains very good diversity and achieved the best performance on most of the datasets.
Figure 5. Convergence curves of BDE, BPSO, BSSA, GA, and BCSO on datasets 1–8.
Figure 6. Convergence curves of BDE, BPSO, BSSA, GA, and BCSO on datasets 9–15.
Furthermore, the Wilcoxon rank sum test with a 95% confidence level is applied to examine whether the classification performance achieved by the BCSO is significantly better than that of the other methods. The BCSO is selected as the reference algorithm since it offers the best classification results in this work. The results of the Wilcoxon test with p-values are presented in Table 5. For ease of understanding, the symbols "w/t/l" indicate that the BCSO is superior to (win), equal to (tie), or inferior to (lose) the other algorithms. As can be seen, the classification performance of the BCSO was significantly better than that of BPSO, BSSA, GA, and BDE (p-value < 0.05) in most cases. For example, the performance of the BCSO was significantly better than that of BPSO on nine datasets. Additionally, an analysis of variance (ANOVA) with a post-hoc test is applied to investigate whether there is a significant difference between the BCSO and the other algorithms across the 15 datasets. Again, the performance of our BCSO was significantly better (p-value < 0.05) when compared to BDE, BPSO, BSSA, and GA. The results obtained clearly show the superiority of the BCSO with respect to the feature selection problem.
Table 5. p-values of the Wilcoxon rank sum test of the BCSO accuracy results versus other algorithms.
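As a sketch of this statistical comparison (assuming SciPy; the function and variable names are ours), each entry of Table 5 corresponds to a two-sided rank sum test on the 30 per-run accuracies of two algorithms on one dataset:

```python
from scipy.stats import ranksums

def compare_accuracies(acc_bcso, acc_other, alpha=0.05):
    """Wilcoxon rank sum test at the 95% confidence level.
    Returns the p-value and whether the difference is significant."""
    _, p_value = ranksums(acc_bcso, acc_other)
    return p_value, p_value < alpha
```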
Table 6 reports the computational cost of BDE, BPSO, BSSA, GA, and BCSO on the 15 datasets. Judging from Table 6, the lowest computation time was achieved by the BCSO. This is expected, since the proposed approaches only update the velocity and position of the losers (half of the population) during the evaluations; in this way, they compute faster than the other conventional feature selection methods. On the contrary, the slowest processing speed was found for the GA, followed by BPSO. Based on the results obtained, the BCSO not only achieves the best classification performance, but is also computationally less expensive. Evidently, the BCSO is a powerful feature selection tool, and it can be applied to other engineering applications.
Table 6. Computational cost of BDE, BPSO, GA, BSSA, and BCSO on 15 datasets.
6. Conclusions
In this paper, binary variants of the CSO are proposed and applied to feature selection tasks. The continuous CSO is converted into a binary version by using transfer functions; eight different transfer functions from the S-shaped and V-shaped families are implemented in the BCSO. The S-shaped transfer functions force the search agents to move on the binary search space, whereas the V-shaped transfer functions allow the search agents to perform the search around the binary search space. The proposed BCSO is validated using 15 benchmark datasets. Firstly, the BCSO with the optimal transfer function is investigated; in comparison with the other transfer functions, we found that the BCSO with the V2 transfer function was the most suitable, achieving the best performance in the current work. Secondly, the performance of the BCSO is verified against four other conventional feature selection methods. Based on the results obtained, the BCSO outperformed the other methods (BDE, BSSA, BPSO, and GA) in finding the significant features, demonstrating high search capability. In addition, the BCSO can often select a smaller number of significant features while contributing high accuracy. Moreover, the processing speed of the BCSO is extremely fast, which makes it more appropriate for real-world applications. All in all, it can be inferred that the BCSO is a valuable feature selection tool. In the future, the BCSO can be applied to other binary optimization tasks, such as unit commitment problems, optimized neural networks, and the knapsack problem.
Author Contributions
Conceptualization: J.T.; formal analysis: J.T.; funding acquisition: A.R.A.; investigation: J.T.; methodology: J.T.; software: J.T.; supervision: A.R.A.; validation: J.T.; writing—original draft: J.T.; writing—review and editing: J.T., A.R.A., and N.M.S.
Funding
This research and the article processing charge were funded by the Ministry of Higher Education (MOHE) Malaysia under grant number GLuar/STEVIA/2016/FKE-CeRIA/l00009.
Acknowledgments
The authors would like to thank the Ministry of Higher Education Malaysia for funding this research under grant GLuar/STEVIA/2016/FKE-CeRIA/l00009.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Mafarja, M.; Aljarah, I.; Heidari, A.A.; Hammouri, A.I.; Faris, H.; Al-Zoubi, A.M.; Mirjalili, S. Evolutionary Population Dynamics and Grasshopper Optimization approaches for feature selection problems. Knowl.-Based Syst. 2018, 145, 25–45. [Google Scholar] [CrossRef]
- Arora, S.; Anand, P. Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 2019, 116, 147–160. [Google Scholar] [CrossRef]
- Hafiz, F.; Swain, A.; Patel, N.; Naik, C. A two-dimensional (2-D) learning framework for Particle Swarm based feature selection. Pattern Recognit. 2018, 76, 416–433. [Google Scholar] [CrossRef]
- Lin, K.C.; Hung, J.C.; Wei, J. Feature selection with modified lion’s algorithms and support vector machine for high-dimensional data. Appl. Soft Comput. 2018, 68, 669–676. [Google Scholar] [CrossRef]
- Lin, K.C.; Zhang, K.Y.; Huang, Y.H.; Hung, J.C.; Yen, N. Feature selection based on an improved cat swarm optimization algorithm for big data classification. J. Supercomput. 2016, 72, 3210–3221. [Google Scholar] [CrossRef]
- Chen, Y.P.; Li, Y.; Wang, G.; Zheng, Y.F.; Xu, Q.; Fan, J.H.; Cui, X.T. A novel bacterial foraging optimization algorithm for feature selection. Expert Syst. Appl. 2017, 83, 1–17. [Google Scholar] [CrossRef]
- Xue, B.; Zhang, M.; Browne, W.N. Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Trans. Cybern. 2013, 43, 1656–1671. [Google Scholar] [CrossRef] [PubMed]
- Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65. [Google Scholar] [CrossRef]
- Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
- Huang, C.L.; Wang, C.J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240. [Google Scholar] [CrossRef]
- De Stefano, C.; Fontanella, F.; Marrocco, C.; Scotto di Freca, A. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit. Lett. 2014, 35, 130–141. [Google Scholar] [CrossRef]
- Ghareb, A.S.; Bakar, A.A.; Hamdan, A.R. Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst. Appl. 2016, 49, 31–47. [Google Scholar] [CrossRef]
- Ma, B.; Xia, Y. A tribe competition-based genetic algorithm for feature selection in pattern classification. Appl. Soft Comput. 2017, 58, 328–338. [Google Scholar] [CrossRef]
- Al-Sharhan, S.; Bimba, A. Adaptive multi-parent crossover GA for feature optimization in epileptic seizure identification. Appl. Soft Comput. 2019, 75, 575–587. [Google Scholar] [CrossRef]
- Chuang, L.Y.; Chang, H.W.; Tu, C.J.; Yang, C.H. Improved binary PSO for feature selection using gene expression data. Comput. Biol. Chem. 2008, 32, 29–38. [Google Scholar] [CrossRef]
- Tan, T.Y.; Zhang, L.; Neoh, S.C.; Lim, C.P. Intelligent skin cancer detection using enhanced particle swarm optimization. Knowl.-Based Syst. 2018, 158, 118–135. [Google Scholar] [CrossRef]
- Chuang, L.Y.; Yang, C.H.; Li, J.C. Chaotic maps based on binary particle swarm optimization for feature selection. Appl. Soft Comput. 2011, 11, 239–248. [Google Scholar] [CrossRef]
- Jain, I.; Jain, V.K.; Jain, R. Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification. Appl. Soft Comput. 2018, 62, 203–215. [Google Scholar] [CrossRef]
- Too, J.; Abdullah, A.R.; Mohd Saad, N.; Tee, W. EMG Feature Selection and Classification Using a Pbest-Guide Binary Particle Swarm Optimization. Computation 2019, 7, 12. [Google Scholar] [CrossRef]
- Cheng, R.; Jin, Y. A Competitive Swarm Optimizer for Large Scale Optimization. IEEE Trans. Cybern. 2015, 45, 191–204. [Google Scholar] [CrossRef]
- Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
- Saremi, S.; Mirjalili, S.; Lewis, A. How important is a transfer function in discrete heuristic algorithms. Neural Comput. Appl. 2015, 26, 625–640. [Google Scholar] [CrossRef]
- Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Al-Zoubi, A.M.; Mirjalili, S.; Fujita, H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowl.-Based Syst. 2018, 154, 43–67. [Google Scholar] [CrossRef]
- Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286. [Google Scholar] [CrossRef]
- Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2010, 9, 727–745. [Google Scholar] [CrossRef]
- Emary, E.; Zawbaa, H.M. Feature selection via Lèvy Antlion optimization. Pattern Anal. Appl. 2018, 19, 1–20. [Google Scholar] [CrossRef]
- UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 24 March 2019).
- Zorarpacı, E.; Özel, S.A. A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst. Appl. 2016, 62, 91–103. [Google Scholar] [CrossRef]
- Zawbaa, H.M.; Emary, E.; Grosan, C. Feature Selection via Chaotic Antlion Optimization. PLoS ONE 2016, 11, e0150652. [Google Scholar] [CrossRef]
- Too, J.; Abdullah, A.R.; Mohd Saad, N. A New Co-Evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy for Feature Selection. Informatics 2019, 6, 21. [Google Scholar] [CrossRef]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).