Article

Binary Competitive Swarm Optimizer Approaches for Feature Selection

by Jingwei Too 1,*, Abdul Rahim Abdullah 1,* and Norhashimah Mohd Saad 2
1 Fakulti Kejuruteraan Elektrik, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, Durian Tunggal, Melaka 76100, Malaysia
2 Fakulti Kejuruteraan Elektronik dan Kejuruteraan Komputer, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, Durian Tunggal, Melaka 76100, Malaysia
* Authors to whom correspondence should be addressed.
Computation 2019, 7(2), 31; https://doi.org/10.3390/computation7020031
Submission received: 16 May 2019 / Revised: 7 June 2019 / Accepted: 10 June 2019 / Published: 14 June 2019
(This article belongs to the Section Computational Engineering)

Abstract: Feature selection is known as an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. As the feature size grows, an exhaustive search becomes impractical. In addition, a feature set normally includes irrelevant, redundant, and relevant information. Therefore, in this paper, binary variants of a competitive swarm optimizer are proposed for wrapper feature selection. The proposed approaches are used to select a subset of significant features for classification purposes. The binary versions introduced here employ the S-shaped and V-shaped transfer functions, which allow the search agents to move on the binary search space. The proposed approaches are tested on 15 benchmark datasets collected from the UCI machine learning repository, and the results are compared with other conventional feature selection methods. Our results demonstrate the capability of the proposed binary versions of the competitive swarm optimizer, not only in terms of high classification performance but also low computational cost.

1. Introduction

Many modern applications involve extracting useful information from collected data. The extracted information is known as a feature, and it is useful in describing the target concept [1]. However, an increase in the number of features causes the "curse of dimensionality," in which the performance of the system degrades and its complexity grows. This is mainly due to the existence of irrelevant and redundant information, which badly affects the performance of the classification model [2]. To resolve this issue, a proper selection of the extracted features is critically important. Hence, the feature selection problem has become one of the major concerns in many research areas [3].
Feature selection is a pre-processing step that determines a subset of significant features which can strongly improve the performance of the system. It not only eliminates redundant information, but also reduces the temporal and spatial complexity of the classification model [4]. Generally, feature selection can be classified into two approaches: filter and wrapper. The former identifies the relevant features using proxy measures such as mutual information and data characteristics, while the latter utilizes a predictive model to train on the feature set and evaluate a nearly optimal feature subset [5,6]. Compared to the wrapper, filter feature selection is independent of the learning algorithm and is computationally less expensive. However, wrapper feature selection can usually offer better performance [7].
In the wrapper approach, feature selection is treated as a combinatorial optimization problem, which can be solved using metaheuristic algorithms [8,9]. The most common wrapper feature selection methods are the genetic algorithm (GA) and binary particle swarm optimization (BPSO). The GA is an evolutionary algorithm that generates a population of solutions called chromosomes. In each generation, the solutions are evolved through selection, crossover, and mutation operations [10]. Several studies have shown that the GA is well suited to high-dimensional feature selection problems [10,11]. However, the GA suffers from high time consumption and sensitivity to parameter settings. Thus, Ghareb et al. [12] hybridized an enhanced genetic algorithm (EGA) with a filter approach for text categorization. The authors first employed the filter approach to identify potential initial solutions, which were then evaluated by the EGA. Ma and Xia [13] introduced a novel tribe competition-based genetic algorithm (TCbGA) to tackle the feature selection problem in pattern classification. Another study proposed an adaptive multi-parent crossover GA for epileptic seizure identification [14].
BPSO is a binary variant of particle swarm optimization (PSO). Unlike the GA, BPSO is a swarm-based algorithm that generates a population of solutions called particles. The particles adjust their positions by changing their velocities according to their own experience as well as the experience of their neighbors [15]. BPSO is a useful tool that has been widely applied to feature selection. However, BPSO suffers from premature convergence and stagnation, which lead to ineffective solutions [15,16]. Therefore, Chuang et al. [17] proposed a chaotic binary particle swarm optimization (CBPSO) for feature selection in which chaotic maps were implemented to determine the inertia weight in each iteration. Jain et al. [18] developed an improved binary particle swarm optimization (iBPSO) for gene selection and cancer classification. The authors first applied correlation-based feature selection (CFS) to reduce the dimensionality, and then evaluated the relevant features using iBPSO. Another study introduced a BPSO with a personal best (pbest) guide strategy to tackle the feature selection problem in electromyography signal classification [19].
The competitive swarm optimizer (CSO) is a newly introduced variant of PSO [20]. In comparison with other metaheuristic algorithms, the CSO has shown superior performance on several benchmark tests. Generally, the CSO employs a competition strategy that partitions the solutions into winners and losers, where the winners move directly to the next iteration. In this way, the CSO is computationally less expensive, since only half of the population is evaluated in each iteration. In this paper, we propose binary versions of the CSO to solve the feature selection problem in classification tasks. The binary versions introduced here are built on transfer functions: functions from the S-shaped and V-shaped families allow the search agents to move around the binary search space. The proposed approaches are validated on 15 benchmark datasets, and the results are compared with other conventional methods.
This paper is organized as follows: Section 2 details the background of the competitive swarm optimizer (CSO). Section 3 presents the proposed binary version of the competitive swarm optimizer (BCSO), and Section 4 describes the application of the BCSO to feature selection. The experimental results are discussed in Section 5. Finally, conclusions are offered in Section 6.

2. The Competitive Swarm Optimizer

The competitive swarm optimizer (CSO) is a recent metaheuristic optimization algorithm proposed by Cheng and Jin in 2015 [20]. The CSO is a new variant of particle swarm optimization (PSO), and it has been proven to work more effectively on large-scale optimization. In addition, the CSO is able to find the global optimum in a very short period, which leads to fast computational speed. In the CSO, the population of particles is randomly partitioned into two groups of equal size. A competition is then made between particles drawn from each group. In each competition, the particle that scores the better fitness value is the winner and moves directly to the next iteration. On the contrary, the loser updates its velocity and position by learning from the winner. Mathematically, the velocity and position of the loser are updated as follows:
$$v_{l,d}(t+1) = r_1 v_{l,d}(t) + r_2 \left( x_{w,d}(t) - x_{l,d}(t) \right) + \phi r_3 \left( \bar{x}_d(t) - x_{l,d}(t) \right) \tag{1}$$
$$x_{l,d}(t+1) = x_{l,d}(t) + v_{l,d}(t+1) \tag{2}$$
where vl is the velocity of the loser particle, xw is the position of the winner particle, xl is the position of the loser particle, x̄ is the mean position of the current swarm, r1, r2, and r3 are three independent random vectors in [0, 1], ϕ is the social factor, d is the dimension of the search space, and t is the iteration number. The pseudocode of the CSO is presented in Algorithm 1, and a short code sketch follows it.
Algorithm 1. Competitive swarm optimizer
Input parameters: N, Tmax, and ϕ
(1) Initialize a population of particles, x
(2) Calculate the fitness of particles, F(x)
(3) Define the best particle as gbest
(4) for t = 1 to maximum number of iterations, Tmax
    // Competition Strategy //
(5)  for i = 1 to half of population, N/2
(6)   Randomly select two particles, xk and xm
(7)   if F(xk) better than F(xm)
(8)    xw = xk, xl = xm
(9)   else
(10)    xw = xm, xl = xk
(11)   end if
(12)   Add xw into new population
(13)   Remove xk and xm from the population
(14)  next i
    //Velocity and Position Update //
(15)  for i = 1 to half of population, N/2
(16)   for d = 1 to the dimension of search space, D
(17)    Update velocity of loser using Equation (1)
(18)    Update position of loser as shown in Equation (2)
(19)   next d
(20)   Calculate the fitness of new loser, F(xl)
(21)   Move new loser into new population
(22)   Update gbest if there is better solution
(23)  next i
(24)  Pass new population to next iteration
(25) next t
Output: Global best solution
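To make Algorithm 1 concrete, the following is a minimal Python sketch of one CSO iteration, covering the pairwise competition and the loser update of Equations (1) and (2). The names (cso_step, fitness_fn) are illustrative only, and an even population size is assumed; this is a sketch, not the authors' implementation.

```python
import numpy as np

def cso_step(x, v, fitness_fn, phi=0.2):
    """One CSO iteration: pairwise competitions, then the loser update
    of Equations (1) and (2). The population size N is assumed even."""
    N, D = x.shape
    x_mean = x.mean(axis=0)                           # mean position of the swarm
    fit = np.array([fitness_fn(p) for p in x])
    for a, b in np.random.permutation(N).reshape(-1, 2):
        w, l = (a, b) if fit[a] < fit[b] else (b, a)  # winner has the better (lower) fitness
        r1, r2, r3 = np.random.rand(3, D)
        # Equation (1): the loser learns from the winner and the swarm mean
        v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (x_mean - x[l])
        x[l] = x[l] + v[l]                            # Equation (2); the winner is unchanged
    return x, v
```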

3. Binary Version of the Competitive Swarm Optimizer

The CSO is a swarm intelligence method that mimics the concept of competition between particles in a population. As mentioned in [20], the CSO has been tested on several benchmark functions and showed superior performance against other conventional optimization algorithms. The CSO utilizes the competition strategy and a new velocity updating rule, which are beneficial in improving the exploration and convergence rate [20]. This motivates us to model the CSO so that it can be used for wrapper-based feature selection.
Generally speaking, wrapper feature selection is considered a binary optimization problem in which each element of a solution is represented as either 0 or 1 [8]. In the traditional CSO, the particles move around the search space by updating their positions within the continuous real domain. However, real continuous values are not suitable for binary optimization, since the solution should be represented in binary form. For this reason, the CSO is converted into a binary version.
One effective way to convert a continuous optimizer into a binary version is the use of a transfer function. In binary optimization, a transfer function is a mathematical function that determines the probability of changing an element of the position vector from 0 to 1, and vice versa [21]. More importantly, a transfer function is an extremely cheap operator, and it can improve the exploitation and exploration of the CSO in feature selection [22]. Hence, the transfer function is the main focus of this work. In this paper, we propose eight versions of the binary competitive swarm optimizer for feature selection.

3.1. S-Shaped Family

In general, transfer functions can be categorized into S-shaped and V-shaped families. In this sub-section, the implementation of the S-shaped transfer functions is described. The S-shaped transfer function forces the search agents to move around the binary search space [8]. Previously, S-shaped transfer functions have been successfully applied in binary particle swarm optimization (BPSO), the binary antlion optimizer (BALO), and the binary salp swarm algorithm (BSSA) [8,21,23]. The four commonly used S-shaped transfer functions (S1–S4) are expressed as follows:
$$S_1(v_{l,d}(t+1)) = \frac{1}{1 + \exp(-2 v_{l,d}(t+1))} \tag{3}$$
$$S_2(v_{l,d}(t+1)) = \frac{1}{1 + \exp(-v_{l,d}(t+1))} \tag{4}$$
$$S_3(v_{l,d}(t+1)) = \frac{1}{1 + \exp(-v_{l,d}(t+1)/2)} \tag{5}$$
$$S_4(v_{l,d}(t+1)) = \frac{1}{1 + \exp(-v_{l,d}(t+1)/3)} \tag{6}$$
where vl is the velocity of the loser particle, d is the dimension, and t is the iteration number. Illustrations of the S-shaped transfer functions are presented in Figure 1. In these approaches, the velocity of the loser is first calculated as shown in Equation (1). The transfer function then converts the velocity into a probability value in [0, 1]. After that, the position of the loser is updated as:
$$x_{l,d}(t+1) = \begin{cases} 1, & \text{if } S(v_{l,d}(t+1)) > r_4 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where S can be S1, S2, S3, or S4, and r4 is a random vector uniformly distributed in [0, 1].
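As an illustration, the conversion in Equations (3)–(7) can be sketched in Python as follows; the dictionary keys and function names are illustrative only.

```python
import numpy as np

# S-shaped transfer functions S1-S4 (Equations 3-6).
S = {
    "S1": lambda v: 1.0 / (1.0 + np.exp(-2.0 * v)),
    "S2": lambda v: 1.0 / (1.0 + np.exp(-v)),
    "S3": lambda v: 1.0 / (1.0 + np.exp(-v / 2.0)),
    "S4": lambda v: 1.0 / (1.0 + np.exp(-v / 3.0)),
}

def s_shaped_update(v_loser, tf="S2"):
    """Equation (7): set bit d to 1 if S(v_d) > r4, otherwise to 0."""
    r4 = np.random.rand(*v_loser.shape)
    return (S[tf](v_loser) > r4).astype(int)
```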

3.2. V-Shaped Family

In this sub-section, the implementation of the V-shaped transfer functions is presented. The V-shaped transfer function allows the search agents to perform the search within the binary search space. Many studies employ V-shaped transfer functions to convert metaheuristic algorithms into binary versions [23,24,25]. The four frequently used V-shaped transfer functions (V1–V4) are defined as follows:
$$V_1(v_{l,d}(t+1)) = \left| \operatorname{erf}\!\left( \frac{\sqrt{\pi}}{2} \, v_{l,d}(t+1) \right) \right| \tag{8}$$
$$V_2(v_{l,d}(t+1)) = \left| \tanh\!\left( v_{l,d}(t+1) \right) \right| \tag{9}$$
$$V_3(v_{l,d}(t+1)) = \left| \frac{v_{l,d}(t+1)}{\sqrt{1 + \left( v_{l,d}(t+1) \right)^2}} \right| \tag{10}$$
$$V_4(v_{l,d}(t+1)) = \left| \frac{2}{\pi} \arctan\!\left( \frac{\pi}{2} \, v_{l,d}(t+1) \right) \right| \tag{11}$$
where vl is the velocity of the loser particle, d is the dimension, and t is the iteration number. Illustrations of the V-shaped transfer functions are shown in Figure 2. Unlike the S-shaped transfer function, the V-shaped transfer function does not force the search agents to take the bit values 0 or 1; instead, it switches the current bit according to the velocity. In this approach, the position of the loser particle is updated as:
$$x_{l,d}(t+1) = \begin{cases} 1 - x_{l,d}(t), & \text{if } V(v_{l,d}(t+1)) \geq r_5 \\ x_{l,d}(t), & \text{otherwise} \end{cases} \tag{12}$$
where V can be V1, V2, V3, or V4, and r5 is a random vector uniformly distributed in [0, 1].
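A corresponding sketch for Equations (8)–(12), again with illustrative names; note that the V-shaped rule flips the current bit rather than overwriting it.

```python
import numpy as np
from scipy.special import erf

# V-shaped transfer functions V1-V4 (Equations 8-11).
V = {
    "V1": lambda v: np.abs(erf(np.sqrt(np.pi) / 2.0 * v)),
    "V2": lambda v: np.abs(np.tanh(v)),
    "V3": lambda v: np.abs(v / np.sqrt(1.0 + v ** 2)),
    "V4": lambda v: np.abs(2.0 / np.pi * np.arctan(np.pi / 2.0 * v)),
}

def v_shaped_update(x_loser, v_loser, tf="V2"):
    """Equation (12): flip bit d when V(v_d) >= r5; otherwise keep the bit."""
    r5 = np.random.rand(*v_loser.shape)
    flip = V[tf](v_loser) >= r5
    return np.where(flip, 1 - x_loser, x_loser)
```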
The pseudocode of the binary competitive swarm optimizer (BCSO) is shown in Algorithm 2, where N and Tmax are the number of particles and the maximum number of iterations, respectively. In the first step, a population of N particles is randomly initialized, and the velocity of each particle is initialized to zero. Then, the fitness of each particle is evaluated, and the best particle is defined as the global best, gbest. In each iteration, the particles are randomly divided into two groups, and a competition is made between each pair of particles. From the competition, the winners pass directly into the new population. The losers, on the other hand, update their velocities using Equation (1). Each velocity is then converted into a probability value by employing an S-shaped or V-shaped transfer function. Afterward, the position of the loser particle is updated using Equation (7) or Equation (12). Next, the fitness of each new loser is evaluated, and the new losers are moved into the new population for the next iteration. At the end of each iteration, the global best solution, gbest, is updated. The procedure repeats until the maximum number of iterations is reached, and the global best solution is returned. A compact code sketch follows Algorithm 2.
Algorithm 2. Binary competitive swarm optimizer.
Input parameters: N, Tmax, and ϕ
(1) Initialize a population of particles, x
(2) Calculate the fitness of particles, F(x)
(3) Define the best particle as gbest
(4) for t = 1 to maximum number of iterations, Tmax
    // Competition Strategy //
(5)  for i = 1 to half of population, N/2
(6)   Randomly select two particles, xk and xm
(7)   if F(xk) better than F(xm)
(8)    xw = xk, xl = xm
(9)   else
(10)    xw = xm, xl = xk
(11)   end if
(12)   Add xw into new population
(13)   Remove xk and xm from the population
(14)  next i
    //Velocity and Position Update //
(15)  for i = 1 to half of population, N/2
(16)   for d = 1 to the dimension of search space, D
(17)    Update velocity of loser using Equation (1)
(18)    Convert velocity into probability using S-shaped or V-shaped transfer function
(19)    Update position of loser as shown in Equation (7) or Equation (12)
(20)   next d
(21)   Calculate the fitness of new loser, F(xl)
(22)   Move new loser into new population
(23)   Update gbest if there is better solution
(24)  next i
(25)  Pass new population to next iteration
(26) next t
Output: Global best solution
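Putting the pieces together, the following is a compact Python sketch of Algorithm 2 using the V2 transfer function (the variant found best in Section 5). It is a minimal illustration under the assumption of an even population size, with illustrative names; it is not the authors' MATLAB code.

```python
import numpy as np

def bcso(fitness_fn, D, N=10, T=100, phi=0.2):
    """Minimal BCSO sketch (Algorithm 2) with the V2 transfer function.

    fitness_fn maps a 0/1 vector of length D to a scalar to be minimized;
    N is assumed even so that particles can be paired."""
    x = np.random.randint(0, 2, size=(N, D))            # random binary population
    v = np.zeros((N, D))                                # velocities start at zero
    fit = np.array([fitness_fn(p) for p in x])
    g = int(np.argmin(fit))
    gbest, gfit = x[g].copy(), fit[g]                   # global best, gbest
    for _ in range(T):
        x_mean = x.mean(axis=0)                         # mean position of the swarm
        for a, b in np.random.permutation(N).reshape(-1, 2):
            w, l = (a, b) if fit[a] < fit[b] else (b, a)   # competition
            r1, r2, r3 = np.random.rand(3, D)
            # Equation (1): the loser learns from the winner and the swarm mean
            v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (x_mean - x[l])
            prob = np.abs(np.tanh(v[l]))                # V2 transfer function, Eq. (9)
            flip = prob >= np.random.rand(D)
            x[l] = np.where(flip, 1 - x[l], x[l])       # Equation (12): flip the bits
            fit[l] = fitness_fn(x[l])
            if fit[l] < gfit:                           # track the global best
                gbest, gfit = x[l].copy(), fit[l]
    return gbest, gfit
```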

4. Application of the Binary Competitive Swarm Optimizer for Feature Selection

In this section, the proposed binary competitive swarm optimization approaches are applied to solve the feature selection problem in classification tasks. In wrapper feature selection, the solution is represented in binary form: bit 1 indicates that the feature is selected, while bit 0 denotes an unselected feature [23]. For example, let solution X = {0, 1, 0, 0, 1, 0, 0, 0, 1, 1}. Solution X consists of 10 dimensions (features), of which only four (the 2nd, 5th, 9th, and 10th) are selected.
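This mapping from a binary solution to a feature subset can be illustrated with a short snippet:

```python
import numpy as np

# Bit 1 keeps the feature; bit 0 discards it.
X = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 1])
selected = np.flatnonzero(X)      # zero-based indices: [1, 4, 8, 9]
print(selected + 1)               # 1-based: features 2, 5, 9, 10, as in the text
```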
Feature selection is an NP-hard combinatorial problem. For a dataset with feature size D, there are 2^D − 1 possible feature subsets, which makes an exhaustive search impractical. Therefore, the proposed approaches are used to find the best feature subset. In this paper, a fitness function that considers both the classification error rate and the number of selected features is applied. Mathematically, the fitness function can be expressed as:
$$Fitness = \alpha \, ER(K) + (1 - \alpha) \frac{|S|}{|C|} \tag{13}$$
where ER(K) is the classification error rate computed by a classifier relative to the selection decision K of the features, |C| is the total number of features in the dataset, |S| is the length of the selected feature subset, and α is a parameter in [0, 1] that controls the influence of the classification error rate. Following [8,23,26], α is set to 0.99, since classification performance is the most important measure in this framework.
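A minimal sketch of Equation (13), assuming the classification error rate has already been computed (for example, by the KNN wrapper described in Section 5.1):

```python
def fitness(solution, error_rate, alpha=0.99):
    """Equation (13): weighted sum of error rate and selected-feature ratio."""
    n_selected = int(solution.sum())   # |S|: number of selected features
    n_total = len(solution)            # |C|: total number of features
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```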

5. Experimental Results and Discussions

5.1. Experiment Setup

In this sub-section, the performance of the proposed binary competitive swarm optimizer approaches is investigated. The proposed approaches are validated on fifteen benchmark datasets acquired from the UCI machine learning repository [27]. Table 1 outlines the details of the datasets in terms of the number of instances, the number of features, and the number of classes. Note that the features in the LSVT Voice Rehabilitation dataset are normalized in order to prevent numerical problems.
For wrapper feature selection, the classification error rate in the fitness function is computed using the k-nearest neighbor (KNN) classifier with the Euclidean distance metric and k = 5. The KNN is chosen due to its promising performance and fast computation speed in previous work [10]. In this paper, we use a hold-out strategy in which each dataset is partitioned into 80% for training and 20% for testing.
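For illustration, the error-rate evaluation described above could be sketched as follows; the experiments in this paper were run in MATLAB, so the scikit-learn calls here are only an assumed, equivalent setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def knn_error_rate(X, y, mask):
    """Error of a 5-NN classifier on an 80/20 hold-out split,
    restricted to the features selected by the binary mask."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0                      # empty subsets get the worst possible error
    Xtr, Xte, ytr, yte = train_test_split(X[:, cols], y, test_size=0.2)
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(Xtr, ytr)
    return 1.0 - knn.score(Xte, yte)    # error rate = 1 - accuracy
```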

5.2. Comparison Algorithms and Evaluation Metrics

To examine the efficiency and efficacy of the proposed approaches, four state-of-the-art feature selection methods, namely binary particle swarm optimization (BPSO), the genetic algorithm (GA), binary differential evolution (BDE) [28], and the binary salp swarm algorithm (BSSA) [23], are used for performance comparison. To ensure a fair comparison, the population size (N) and the maximum number of iterations (Tmax) are fixed at 10 and 100, respectively [23]. In addition, the dimension of the search space (D) is equal to the total number of features in each dataset. Table 2 exhibits the parameter settings of the utilized approaches. Note that there is no additional parameter setting for BSSA.
In the experiment, six evaluation metrics, including the best fitness, worst fitness, mean fitness, standard deviation of fitness (STD), feature size (number of selected features), and accuracy, are recorded. Details of the evaluation metrics can be found in [9,29,30]. To achieve statistically meaningful results, each approach is repeated for 30 independent runs, and the average statistical measurements over the 30 runs are reported as the experimental results. All evaluations are conducted in MATLAB 2017 (MathWorks, Natick, MA, USA) on a computer with a 2.90 GHz Intel Core i5-9400 CPU and 16 GB of RAM.

5.3. Assessments of the BCSO in Feature Selection

In the first part of the experiment, the best transfer function for the BCSO is determined. Eight transfer functions (from the S-shaped and V-shaped families) are utilized in this work. Table 3 displays the experimental results of the best fitness, worst fitness, mean fitness, STD of fitness, and feature size of the BCSOs on the 15 datasets. In this table, the smaller the best, worst, mean, and STD fitness values are, the better the performance is; the best result among the eight BCSOs in each row is therefore the smallest value. As for the feature size, a lower value indicates that fewer features are selected by the algorithm; in other words, a smaller feature size means that more irrelevant and redundant features have been eliminated. From Table 3, it is observed that BCSO-V2 offered the smallest best fitness value on five datasets (6, 7, 9, 13, and 14), outperforming the other transfer functions in the feature selection tasks. On the other hand, BCSO-S4 achieved the best STD value in most cases, ensuring highly consistent results. In terms of feature size, BCSO-V3 contributed the lowest number of selected features for most of the datasets.
Another important measurement is the accuracy obtained from the features selected by each approach. Figure 3 shows the boxplot of the BCSO with eight different transfer functions across the 15 datasets. As can be seen, BCSOs with V-shaped transfer functions usually achieve better results than those with S-shaped transfer functions. This is because the V-shaped transfer function does not force the search agents to take the bit values 1 or 0, thus resulting in excellent performance. Across the 15 datasets, the best classification performance is achieved by BCSO-V1 and BCSO-V2. Based on the results in Table 3 and Figure 3, it can be concluded that the BCSO with transfer function V2 yielded superior performance in evaluating the relevant features, overtaking the other transfer functions in the current work.

5.4. Comparison with Other Algorithms

Table 4 presents a comparison of the results of the BCSO with BDE, BPSO, BSSA, and GA; since all reported metrics are to be minimized, the best result on each metric is the smallest value. From Table 4, it is seen that the BCSO showed competitive performance in the feature selection tasks. In comparison with BDE, BPSO, BSSA, and GA, the BCSO achieved the best fitness value on 11 datasets. This result implies that the BCSO was superior to the other algorithms in identifying the significant features.
On the other hand, one can see that BSSA achieved the smallest feature size (number of selected features) in this work. This finding indicates that BSSA can usually select a minimal subset of features while maintaining high performance. In terms of robustness, the most consistent results were provided by BDE, which had the smallest standard deviation values.
Figure 4 shows the accuracy of BDE, BPSO, GA, BSSA, and BCSO on the 15 datasets. From Figure 4, the best classification performance was achieved by the BCSO. Compared to the other methods, the BCSO showed superior accuracy on 11 datasets. It is evident that the BCSO is a useful feature selection tool that provides better classification performance in this work. For datasets 1 and 8, the best accuracy was achieved by the BPSO, while BSSA and GA obtained the best accuracy on datasets 5 and 10, respectively. Inspecting the results, the worst performance was given by BDE, which implies that BDE did not work very well in this study.
Figure 5 and Figure 6 illustrate the convergence curves of BDE, BPSO, BSSA, GA, and BCSO for the 15 datasets. Note that the fitness shown is the average fitness value over 30 runs. In these figures, the BCSO provides competitive performance against BPSO, GA, BSSA, and BDE. Among the rivals, the worst performance was achieved by BDE, which did not find the global optimum efficiently and thus produced ineffective solutions. In contrast, the BCSO keeps tracking the global optimum, which indicates good exploitation and exploration capability. As a result, the BCSO offers very good diversity and achieved the best performance on most of the datasets.
Furthermore, the Wilcoxon rank-sum test with a 95% confidence level is applied to examine whether the classification performance achieved by the BCSO is significantly better than that of the other methods. The BCSO is selected as the reference algorithm since it offers the best classification results in this work. The p-values of the Wilcoxon test are presented in Table 5. For ease of understanding, the symbols "w/t/l" indicate that the BCSO is superior to (win), equal to (tie), or inferior to (lose) the other algorithm. As can be seen, the classification performance of the BCSO was significantly better than that of BPSO, BSSA, GA, and BDE (p-value < 0.05) in most cases. For example, the performance of the BCSO was significantly better than that of the BPSO on nine datasets. Additionally, an analysis of variance (ANOVA) with a post-hoc test is applied to investigate whether there is a significant difference between the BCSO and the other algorithms across the 15 datasets. Again, the performance of our BCSO was significantly better (p-value < 0.05) than that of BDE, BPSO, BSSA, and GA. The results evidently show the superiority of the BCSO with respect to the feature selection problem.
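A short sketch of the pairwise significance test described above; it assumes the per-run accuracies of two methods are available as arrays.

```python
from scipy.stats import ranksums

def compare(acc_bcso, acc_other, alpha=0.05):
    """Two-sided Wilcoxon rank-sum test at the 95% confidence level."""
    stat, p = ranksums(acc_bcso, acc_other)
    return p, p < alpha        # True when the difference is significant
```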
Table 6 outlines the computational cost of BDE, BPSO, BSSA, GA, and BCSO on the 15 datasets. Judging from Table 6, the lowest computation time is achieved by the BCSO. This is expected, since the proposed approaches only update and evaluate the velocities and positions of the losers (half of the population). In this way, the proposed approaches compute faster than the other conventional methods in feature selection. On the contrary, the slowest processing speed is found for the GA, followed by the BPSO. Based on the results obtained, the BCSO not only achieves the best classification performance, but is also computationally less expensive. Evidently, the BCSO is a powerful feature selection tool, and it can be applied to other engineering applications.

6. Conclusions

In this paper, binary variants of the CSO are proposed and applied to feature selection tasks. The continuous CSO is converted into a binary version by using transfer functions. Eight different transfer functions from the S-shaped and V-shaped families are implemented in the BCSO. The S-shaped transfer functions force the search agents to move on the binary search space, while the V-shaped transfer functions allow the search agents to perform the search around the binary search space. The proposed BCSO is validated on 15 benchmark datasets. Firstly, the BCSO with the optimal transfer function is investigated. In comparison with the other transfer functions, we found that the BCSO with the V2 transfer function was the most suitable, achieving the best performance in the current work. Secondly, the performance of the BCSO is verified against four other conventional feature selection methods. Based on the results obtained, the BCSO outperformed the other methods (BDE, BSSA, BPSO, and GA) in finding the significant features, ensuring a high searching capability. In addition, the BCSO can often select a smaller number of significant features that contribute to high accuracy. Moreover, the processing speed of the BCSO is extremely fast, which makes it more appropriate for real-world applications. All in all, it can be inferred that the BCSO is a valuable feature selection tool. In the future, the BCSO can be applied to other binary optimization tasks, such as unit commitment problems, optimized neural networks, and the knapsack problem.

Author Contributions

Conceptualization: J.T.; formal analysis: J.T.; funding acquisition: A.R.A.; investigation: J.T.; methodology: J.T.; software: J.T.; supervision: A.R.A.; validation: J.T.; writing—original draft: J.T.; writing—review and editing: J.T., A.R.A., and N.M.S.

Funding

This research and the article processing charge were funded by the Ministry of Higher Education (MOHE) Malaysia under grant number GLuar/STEVIA/2016/FKE-CeRIA/l00009.

Acknowledgments

The authors would like to thank the Ministry of Higher Education Malaysia for funding this research under grant GLuar/STEVIA/2016/FKE-CeRIA/l00009.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Hammouri, A.I.; Faris, H.; Al-Zoubi, A.M.; Mirjalili, S. Evolutionary Population Dynamics and Grasshopper Optimization approaches for feature selection problems. Knowl.-Based Syst. 2018, 145, 25–45.
  2. Arora, S.; Anand, P. Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 2019, 116, 147–160.
  3. Hafiz, F.; Swain, A.; Patel, N.; Naik, C. A two-dimensional (2-D) learning framework for Particle Swarm based feature selection. Pattern Recognit. 2018, 76, 416–433.
  4. Lin, K.C.; Hung, J.C.; Wei, J. Feature selection with modified lion's algorithms and support vector machine for high-dimensional data. Appl. Soft Comput. 2018, 68, 669–676.
  5. Lin, K.C.; Zhang, K.Y.; Huang, Y.H.; Hung, J.C.; Yen, N. Feature selection based on an improved cat swarm optimization algorithm for big data classification. J. Supercomput. 2016, 72, 3210–3221.
  6. Chen, Y.P.; Li, Y.; Wang, G.; Zheng, Y.F.; Xu, Q.; Fan, J.H.; Cui, X.T. A novel bacterial foraging optimization algorithm for feature selection. Expert Syst. Appl. 2017, 83, 1–17.
  7. Xue, B.; Zhang, M.; Browne, W.N. Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach. IEEE Trans. Cybern. 2013, 43, 1656–1671.
  8. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65.
  9. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381.
  10. Huang, C.L.; Wang, C.J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240.
  11. De Stefano, C.; Fontanella, F.; Marrocco, C.; Scotto di Freca, A. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit. Lett. 2014, 35, 130–141.
  12. Ghareb, A.S.; Bakar, A.A.; Hamdan, A.R. Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst. Appl. 2016, 49, 31–47.
  13. Ma, B.; Xia, Y. A tribe competition-based genetic algorithm for feature selection in pattern classification. Appl. Soft Comput. 2017, 58, 328–338.
  14. Al-Sharhan, S.; Bimba, A. Adaptive multi-parent crossover GA for feature optimization in epileptic seizure identification. Appl. Soft Comput. 2019, 75, 575–587.
  15. Chuang, L.Y.; Chang, H.W.; Tu, C.J.; Yang, C.H. Improved binary PSO for feature selection using gene expression data. Comput. Biol. Chem. 2008, 32, 29–38.
  16. Tan, T.Y.; Zhang, L.; Neoh, S.C.; Lim, C.P. Intelligent skin cancer detection using enhanced particle swarm optimization. Knowl.-Based Syst. 2018, 158, 118–135.
  17. Chuang, L.Y.; Yang, C.H.; Li, J.C. Chaotic maps based on binary particle swarm optimization for feature selection. Appl. Soft Comput. 2011, 11, 239–248.
  18. Jain, I.; Jain, V.K.; Jain, R. Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification. Appl. Soft Comput. 2018, 62, 203–215.
  19. Too, J.; Abdullah, A.R.; Mohd Saad, N.; Tee, W. EMG Feature Selection and Classification Using a Pbest-Guide Binary Particle Swarm Optimization. Computation 2019, 7, 12.
  20. Cheng, R.; Jin, Y. A Competitive Swarm Optimizer for Large Scale Optimization. IEEE Trans. Cybern. 2015, 45, 191–204.
  21. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm Evol. Comput. 2013, 9, 1–14.
  22. Saremi, S.; Mirjalili, S.; Lewis, A. How important is a transfer function in discrete heuristic algorithms. Neural Comput. Appl. 2015, 26, 625–640.
  23. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Al-Zoubi, A.M.; Mirjalili, S.; Fujita, H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowl.-Based Syst. 2018, 154, 43–67.
  24. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286.
  25. Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2010, 9, 727–745.
  26. Emary, E.; Zawbaa, H.M. Feature selection via Lèvy Antlion optimization. Pattern Anal. Appl. 2018, 19, 1–20.
  27. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 24 March 2019).
  28. Zorarpacı, E.; Özel, S.A. A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst. Appl. 2016, 62, 91–103.
  29. Zawbaa, H.M.; Emary, E.; Grosan, C. Feature Selection via Chaotic Antlion Optimization. PLoS ONE 2016, 11, e0150652.
  30. Too, J.; Abdullah, A.R.; Mohd Saad, N. A New Co-Evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy for Feature Selection. Informatics 2019, 6, 21.
Figure 1. S-shaped transfer functions (S1–S4).
Figure 2. V-shaped transfer functions (V1–V4).
Figure 3. Boxplot of the BCSO with eight different transfer functions across 15 datasets.
Figure 4. Accuracy of BDE, BPSO, BSSA, GA, and BCSO on 15 datasets.
Figure 5. Convergence curves of BDE, BPSO, BSSA, GA, and BCSO on datasets 1–8.
Figure 6. Convergence curves of BDE, BPSO, BSSA, GA, and BCSO on datasets 9–15.
Table 1. List of the used datasets.

| No. | Dataset | Number of Instances | Number of Features | Number of Classes |
|---|---|---|---|---|
| 1 | Arrhythmia | 452 | 279 | 16 |
| 2 | Breast Cancer Wisconsin | 699 | 9 | 2 |
| 3 | Dermatology | 366 | 34 | 6 |
| 4 | Diabetic Retinopathy Debrecen | 1151 | 19 | 2 |
| 5 | Hepatitis | 155 | 19 | 2 |
| 6 | Ionosphere | 351 | 34 | 2 |
| 7 | Libras Movement | 360 | 90 | 15 |
| 8 | LSVT Voice Rehabilitation | 126 | 309 | 2 |
| 9 | SCADI | 70 | 205 | 7 |
| 10 | Wine | 178 | 13 | 3 |
| 11 | Breast Cancer Coimbra | 116 | 9 | 2 |
| 12 | Iris | 150 | 4 | 3 |
| 13 | Lung Cancer | 32 | 56 | 2 |
| 14 | Musk 1 | 476 | 167 | 2 |
| 15 | Seeds | 210 | 7 | 3 |
Table 2. Parameter settings of the utilized approaches.

| Algorithm | Parameter | Value |
|---|---|---|
| BPSO | Inertia weight, w | [0.9–0.4] |
| BPSO | Acceleration coefficients, c1 and c2 | 2 |
| BPSO | Maximum velocity, Vmax | 6 |
| GA | Crossover rate, CR | 0.8 |
| GA | Mutation rate, MR | 0.01 |
| BDE | Crossover rate, CR | 0.9 |
| BCSO | Social factor, ϕ | 0.2 |
| BCSO | Maximum velocity, Vmax | 6 |
Table 3. The experimental results of the BCSO with eight different transfer functions.

| Dataset | Metric | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Best fitness | 0.3927 | 0.3947 | 0.4008 | 0.4015 | 0.3656 | 0.3645 | 0.3641 | 0.3656 |
| 1 | Worst fitness | 0.4030 | 0.4041 | 0.4052 | 0.4045 | 0.4035 | 0.4031 | 0.4034 | 0.4023 |
| 1 | Mean fitness | 0.3945 | 0.3974 | 0.4013 | 0.4020 | 0.3717 | 0.3709 | 0.3703 | 0.3726 |
| 1 | STD | 0.0027 | 0.0031 | 0.0009 | 0.0006 | 0.0096 | 0.0095 | 0.0096 | 0.0094 |
| 1 | Feature size | 133.00 | 138.07 | 134.57 | 134.17 | 134.03 | 132.70 | 132.17 | 133.57 |
| 2 | Best fitness | 0.0286 | 0.0301 | 0.0316 | 0.0316 | 0.0215 | 0.0221 | 0.0219 | 0.0201 |
| 2 | Worst fitness | 0.0314 | 0.0308 | 0.0319 | 0.0320 | 0.0302 | 0.0304 | 0.0304 | 0.0306 |
| 2 | Mean fitness | 0.0291 | 0.0301 | 0.0316 | 0.0316 | 0.0223 | 0.0231 | 0.0228 | 0.0213 |
| 2 | STD | 0.0007 | 0.0001 | 0.0001 | 0.0001 | 0.0019 | 0.0020 | 0.0020 | 0.0026 |
| 2 | Feature size | 3.50 | 3.33 | 4.43 | 4.47 | 3.47 | 3.57 | 3.57 | 3.63 |
| 3 | Best fitness | 0.0064 | 0.0067 | 0.0069 | 0.0069 | 0.0173 | 0.0182 | 0.0201 | 0.0214 |
| 3 | Worst fitness | 0.0295 | 0.0261 | 0.0224 | 0.0197 | 0.0437 | 0.0437 | 0.0437 | 0.0437 |
| 3 | Mean fitness | 0.0080 | 0.0074 | 0.0073 | 0.0072 | 0.0209 | 0.0217 | 0.0240 | 0.0259 |
| 3 | STD | 0.0046 | 0.0030 | 0.0022 | 0.0018 | 0.0058 | 0.0057 | 0.0057 | 0.0067 |
| 3 | Feature size | 21.63 | 22.83 | 23.47 | 23.53 | 15.90 | 15.83 | 16.17 | 15.73 |
| 4 | Best fitness | 0.2976 | 0.3033 | 0.3056 | 0.3056 | 0.2874 | 0.2885 | 0.2828 | 0.2863 |
| 4 | Worst fitness | 0.3047 | 0.3057 | 0.3056 | 0.3056 | 0.3094 | 0.3092 | 0.3101 | 0.3101 |
| 4 | Mean fitness | 0.2979 | 0.3034 | 0.3056 | 0.3056 | 0.2897 | 0.2904 | 0.2858 | 0.2891 |
| 4 | STD | 0.0013 | 0.0006 | 0.0000 | 0.0000 | 0.0046 | 0.0038 | 0.0064 | 0.0055 |
| 4 | Feature size | 8.70 | 10.80 | 11.23 | 11.23 | 7.87 | 7.63 | 7.33 | 7.20 |
| 5 | Best fitness | 0.1300 | 0.1443 | 0.1464 | 0.1464 | 0.1222 | 0.1263 | 0.1276 | 0.1245 |
| 5 | Worst fitness | 0.1355 | 0.1465 | 0.1465 | 0.1465 | 0.1433 | 0.1434 | 0.1434 | 0.1434 |
| 5 | Mean fitness | 0.1306 | 0.1444 | 0.1464 | 0.1464 | 0.1256 | 0.1289 | 0.1294 | 0.1274 |
| 5 | STD | 0.0015 | 0.0004 | 0.0000 | 0.0000 | 0.0063 | 0.0047 | 0.0041 | 0.0049 |
| 5 | Feature size | 6.37 | 7.17 | 7.10 | 7.13 | 5.73 | 5.37 | 5.77 | 5.93 |
| 6 | Best fitness | 0.1207 | 0.1393 | 0.1423 | 0.1432 | 0.0886 | 0.0868 | 0.0894 | 0.0890 |
| 6 | Worst fitness | 0.1423 | 0.1429 | 0.1437 | 0.1437 | 0.1423 | 0.1423 | 0.1419 | 0.1428 |
| 6 | Mean fitness | 0.1252 | 0.1400 | 0.1425 | 0.1435 | 0.1005 | 0.0980 | 0.1016 | 0.1035 |
| 6 | STD | 0.0056 | 0.0010 | 0.0005 | 0.0003 | 0.0139 | 0.0152 | 0.0152 | 0.0146 |
| 6 | Feature size | 11.30 | 13.47 | 14.03 | 14.00 | 11.03 | 11.33 | 10.77 | 11.03 |
| 7 | Best fitness | 0.2321 | 0.2356 | 0.2428 | 0.2410 | 0.2004 | 0.1991 | 0.2060 | 0.2107 |
| 7 | Worst fitness | 0.2643 | 0.2666 | 0.2702 | 0.2698 | 0.2696 | 0.2683 | 0.2683 | 0.2683 |
| 7 | Mean fitness | 0.2386 | 0.2401 | 0.2471 | 0.2463 | 0.2190 | 0.2181 | 0.2219 | 0.2266 |
| 7 | STD | 0.0083 | 0.0069 | 0.0071 | 0.0080 | 0.0192 | 0.0194 | 0.0158 | 0.0153 |
| 7 | Feature size | 46.67 | 49.63 | 48.47 | 48.93 | 38.47 | 38.57 | 38.63 | 39.67 |
| 8 | Best fitness | 0.0935 | 0.1146 | 0.1449 | 0.1595 | 0.1049 | 0.1114 | 0.1022 | 0.1062 |
| 8 | Worst fitness | 0.1648 | 0.1701 | 0.1622 | 0.1701 | 0.1871 | 0.1857 | 0.1857 | 0.1844 |
| 8 | Mean fitness | 0.1033 | 0.1193 | 0.1483 | 0.1596 | 0.1261 | 0.1272 | 0.1215 | 0.1273 |
| 8 | STD | 0.0162 | 0.0107 | 0.0045 | 0.0012 | 0.0205 | 0.0193 | 0.0237 | 0.0218 |
| 8 | Feature size | 156.60 | 155.57 | 155.80 | 156.37 | 140.60 | 140.27 | 141.00 | 142.63 |
| 9 | Best fitness | 0.2169 | 0.2217 | 0.2287 | 0.2288 | 0.2086 | 0.2040 | 0.2063 | 0.2063 |
| 9 | Worst fitness | 0.2358 | 0.2358 | 0.2358 | 0.2288 | 0.2403 | 0.2404 | 0.2427 | 0.2427 |
| 9 | Mean fitness | 0.2181 | 0.2234 | 0.2295 | 0.2288 | 0.2126 | 0.2097 | 0.2132 | 0.2138 |
| 9 | STD | 0.0043 | 0.0045 | 0.0022 | 0.0000 | 0.0073 | 0.0094 | 0.0091 | 0.0088 |
| 9 | Feature size | 97.57 | 98.77 | 98.50 | 99.07 | 73.17 | 74.17 | 74.33 | 73.90 |
| 10 | Best fitness | 0.0741 | 0.0799 | 0.0878 | 0.0878 | 0.0524 | 0.0514 | 0.0497 | 0.0472 |
| 10 | Worst fitness | 0.0863 | 0.0845 | 0.0880 | 0.0880 | 0.0848 | 0.0848 | 0.0904 | 0.0906 |
| 10 | Mean fitness | 0.0742 | 0.0799 | 0.0878 | 0.0878 | 0.0550 | 0.0549 | 0.0533 | 0.0502 |
| 10 | STD | 0.0014 | 0.0005 | 0.0000 | 0.0000 | 0.0068 | 0.0069 | 0.0079 | 0.0079 |
| 10 | Feature size | 5.73 | 4.90 | 4.47 | 4.53 | 4.90 | 4.87 | 5.07 | 5.30 |
| 11 | Best fitness | 0.1352 | 0.1352 | 0.1357 | 0.1358 | 0.1467 | 0.1455 | 0.1495 | 0.1456 |
| 11 | Worst fitness | 0.1537 | 0.1425 | 0.1385 | 0.1358 | 0.2101 | 0.2101 | 0.2101 | 0.2100 |
| 11 | Mean fitness | 0.1355 | 0.1353 | 0.1357 | 0.1358 | 0.1499 | 0.1501 | 0.1529 | 0.1491 |
| 11 | STD | 0.0021 | 0.0009 | 0.0003 | 0.0000 | 0.0120 | 0.0135 | 0.0114 | 0.0118 |
| 11 | Feature size | 5.47 | 5.47 | 5.90 | 6.00 | 4.17 | 4.37 | 4.10 | 4.47 |
| 12 | Best fitness | 0.0025 | 0.0025 | 0.0025 | 0.0025 | 0.0037 | 0.0037 | 0.0039 | 0.0038 |
| 12 | Worst fitness | 0.0091 | 0.0066 | 0.0061 | 0.0061 | 0.0099 | 0.0099 | 0.0099 | 0.0099 |
| 12 | Mean fitness | 0.0026 | 0.0025 | 0.0025 | 0.0025 | 0.0039 | 0.0039 | 0.0040 | 0.0040 |
| 12 | STD | 0.0007 | 0.0004 | 0.0004 | 0.0004 | 0.0009 | 0.0009 | 0.0009 | 0.0009 |
| 12 | Feature size | 1.00 | 1.00 | 1.00 | 1.00 | 1.03 | 1.03 | 1.10 | 1.07 |
| 13 | Best fitness | 0.0321 | 0.2409 | 0.2077 | 0.2685 | 0.0031 | 0.0031 | 0.0032 | 0.0088 |
| 13 | Worst fitness | 0.2577 | 0.2632 | 0.2631 | 0.2741 | 0.2740 | 0.2740 | 0.2684 | 0.2631 |
| 13 | Mean fitness | 0.0868 | 0.2471 | 0.2136 | 0.2689 | 0.0282 | 0.0346 | 0.0326 | 0.0422 |
| 13 | STD | 0.0781 | 0.0082 | 0.0158 | 0.0014 | 0.0638 | 0.0655 | 0.0644 | 0.0648 |
| 13 | Feature size | 25.57 | 24.77 | 23.67 | 25.10 | 17.53 | 17.60 | 17.93 | 18.60 |
| 14 | Best fitness | 0.0742 | 0.0824 | 0.0924 | 0.0969 | 0.0667 | 0.0622 | 0.0645 | 0.0674 |
| 14 | Worst fitness | 0.1043 | 0.1078 | 0.1055 | 0.1048 | 0.1071 | 0.1074 | 0.1077 | 0.1067 |
| 14 | Mean fitness | 0.0812 | 0.0882 | 0.0942 | 0.0983 | 0.0767 | 0.0737 | 0.0747 | 0.0779 |
| 14 | STD | 0.0078 | 0.0063 | 0.0035 | 0.0021 | 0.0106 | 0.0124 | 0.0107 | 0.0103 |
| 14 | Feature size | 84.87 | 82.6 | 81.03 | 81.70 | 82.07 | 82.10 | 79.40 | 81.97 |
| 15 | Best fitness | 0.0517 | 0.0517 | 0.0517 | 0.0517 | 0.0504 | 0.0504 | 0.0503 | 0.0504 |
| 15 | Worst fitness | 0.0556 | 0.0517 | 0.0517 | 0.0517 | 0.0652 | 0.0652 | 0.0652 | 0.0660 |
| 15 | Mean fitness | 0.0517 | 0.0517 | 0.0517 | 0.0517 | 0.0509 | 0.0510 | 0.0509 | 0.0510 |
| 15 | STD | 0.0004 | 0.0000 | 0.0000 | 0.0000 | 0.0023 | 0.0024 | 0.0025 | 0.0026 |
| 15 | Feature size | 3.17 | 3.20 | 3.20 | 3.20 | 2.30 | 2.30 | 2.20 | 2.27 |
Table 4. Comparison of the results of the BCSO with BDE, BPSO, BSSA, and GA.

| Dataset | Metric | BDE | BPSO | BSSA | GA | BCSO |
|---|---|---|---|---|---|---|
| 1 | Best fitness | 0.3965 | 0.3604 | 0.3854 | 0.3806 | 0.3645 |
| 1 | Worst fitness | 0.4034 | 0.3994 | 0.4071 | 0.4013 | 0.4031 |
| 1 | Mean fitness | 0.3966 | 0.3631 | 0.3862 | 0.3811 | 0.3709 |
| 1 | STD | 0.0009 | 0.0073 | 0.0032 | 0.0027 | 0.0095 |
| 1 | Feature size | 156.97 | 131.73 | 102.43 | 132.67 | 132.70 |
| 2 | Best fitness | 0.0279 | 0.0259 | 0.0260 | 0.0228 | 0.0221 |
| 2 | Worst fitness | 0.0297 | 0.0262 | 0.0320 | 0.0254 | 0.0304 |
| 2 | Mean fitness | 0.0279 | 0.0259 | 0.0264 | 0.0228 | 0.0231 |
| 2 | STD | 0.0002 | 0.0000 | 0.0011 | 0.0003 | 0.0020 |
| 2 | Feature size | 4.53 | 4.43 | 3.87 | 3.77 | 3.57 |
| 3 | Best fitness | 0.0291 | 0.028 | 0.0358 | 0.0203 | 0.0182 |
| 3 | Worst fitness | 0.0365 | 0.042 | 0.0445 | 0.0346 | 0.0437 |
| 3 | Mean fitness | 0.0292 | 0.0289 | 0.0366 | 0.0206 | 0.0217 |
| 3 | STD | 0.0009 | 0.0021 | 0.0019 | 0.0017 | 0.0057 |
| 3 | Feature size | 19.10 | 15.13 | 14.30 | 15.33 | 15.83 |
| 4 | Best fitness | 0.306 | 0.2986 | 0.2945 | 0.3043 | 0.2885 |
| 4 | Worst fitness | 0.3103 | 0.3151 | 0.3151 | 0.3118 | 0.3092 |
| 4 | Mean fitness | 0.3061 | 0.3004 | 0.2967 | 0.3044 | 0.2904 |
| 4 | STD | 0.0006 | 0.0039 | 0.0049 | 0.0009 | 0.0038 |
| 4 | Feature size | 10.80 | 8.13 | 6.77 | 8.97 | 7.63 |
| 5 | Best fitness | 0.1425 | 0.1296 | 0.1216 | 0.1343 | 0.1263 |
| 5 | Worst fitness | 0.1446 | 0.1466 | 0.1466 | 0.1411 | 0.1434 |
| 5 | Mean fitness | 0.1426 | 0.1304 | 0.1223 | 0.1344 | 0.1289 |
| 5 | STD | 0.0003 | 0.0030 | 0.0032 | 0.0008 | 0.0047 |
| 5 | Feature size | 7.90 | 5.47 | 4.47 | 6.40 | 5.37 |
| 6 | Best fitness | 0.1359 | 0.1016 | 0.1095 | 0.1184 | 0.0868 |
| 6 | Worst fitness | 0.1402 | 0.1398 | 0.1438 | 0.1341 | 0.1423 |
| 6 | Mean fitness | 0.1360 | 0.1045 | 0.1103 | 0.1186 | 0.0980 |
| 6 | STD | 0.0006 | 0.0078 | 0.0044 | 0.0020 | 0.0152 |
| 6 | Feature size | 16.47 | 15.23 | 8.60 | 12.90 | 11.33 |
| 7 | Best fitness | 0.2611 | 0.2269 | 0.2433 | 0.2430 | 0.1991 |
| 7 | Worst fitness | 0.2669 | 0.2708 | 0.2733 | 0.2628 | 0.2683 |
| 7 | Mean fitness | 0.2612 | 0.2313 | 0.2450 | 0.2433 | 0.2181 |
| 7 | STD | 0.0008 | 0.0091 | 0.0049 | 0.0025 | 0.0194 |
| 7 | Feature size | 48.03 | 45.93 | 28.23 | 41.60 | 38.57 |
| 8 | Best fitness | 0.1653 | 0.1065 | 0.1418 | 0.1379 | 0.1114 |
| 8 | Worst fitness | 0.1809 | 0.1806 | 0.1923 | 0.1830 | 0.1857 |
| 8 | Mean fitness | 0.1656 | 0.1187 | 0.1473 | 0.1393 | 0.1272 |
| 8 | STD | 0.0021 | 0.0184 | 0.0120 | 0.0069 | 0.0193 |
| 8 | Feature size | 171.73 | 151.27 | 99.73 | 141.50 | 140.27 |
| 9 | Best fitness | 0.2335 | 0.2256 | 0.2170 | 0.2142 | 0.2040 |
| 9 | Worst fitness | 0.2382 | 0.2451 | 0.2451 | 0.2332 | 0.2404 |
| 9 | Mean fitness | 0.2336 | 0.2286 | 0.2175 | 0.2146 | 0.2097 |
| 9 | STD | 0.0007 | 0.0055 | 0.0032 | 0.0024 | 0.0094 |
| 9 | Feature size | 99.07 | 81.7 | 51.23 | 91.47 | 74.17 |
| 10 | Best fitness | 0.0752 | 0.0531 | 0.0613 | 0.0511 | 0.0514 |
| 10 | Worst fitness | 0.0835 | 0.0980 | 0.0980 | 0.0678 | 0.0848 |
| 10 | Mean fitness | 0.0754 | 0.0574 | 0.0628 | 0.0514 | 0.0549 |
| 10 | STD | 0.0012 | 0.0103 | 0.0054 | 0.0022 | 0.0069 |
| 10 | Feature size | 6.03 | 4.63 | 4.67 | 5.60 | 4.87 |
| 11 | Best fitness | 0.1739 | 0.1812 | 0.1710 | 0.1556 | 0.1455 |
| 11 | Worst fitness | 0.1993 | 0.2259 | 0.2259 | 0.1831 | 0.2101 |
| 11 | Mean fitness | 0.1744 | 0.1840 | 0.1724 | 0.1561 | 0.1501 |
| 11 | STD | 0.0035 | 0.0089 | 0.0068 | 0.0035 | 0.0135 |
| 11 | Feature size | 5.40 | 4.27 | 4.10 | 4.47 | 4.37 |
| 12 | Best fitness | 0.0114 | 0.0059 | 0.0073 | 0.0051 | 0.0037 |
| 12 | Worst fitness | 0.0115 | 0.0126 | 0.0126 | 0.0064 | 0.0099 |
| 12 | Mean fitness | 0.0114 | 0.0069 | 0.0074 | 0.0051 | 0.0039 |
| 12 | STD | 0.0000 | 0.0021 | 0.0006 | 0.0001 | 0.0009 |
| 12 | Feature size | 1.47 | 1.03 | 1.17 | 1.17 | 1.03 |
| 13 | Best fitness | 0.1647 | 0.0918 | 0.1241 | 0.1419 | 0.0031 |
| 13 | Worst fitness | 0.2196 | 0.0975 | 0.2742 | 0.2026 | 0.2740 |
| 13 | Mean fitness | 0.1661 | 0.0951 | 0.1326 | 0.1429 | 0.0346 |
| 13 | STD | 0.0079 | 0.0027 | 0.0252 | 0.0071 | 0.0655 |
| 13 | Feature size | 29.23 | 21.47 | 17.53 | 24.43 | 17.60 |
| 14 | Best fitness | 0.0878 | 0.0692 | 0.0899 | 0.0816 | 0.0622 |
| 14 | Worst fitness | 0.1015 | 0.1005 | 0.1099 | 0.1012 | 0.1074 |
| 14 | Mean fitness | 0.0881 | 0.0742 | 0.0911 | 0.0822 | 0.0737 |
| 14 | STD | 0.0018 | 0.0074 | 0.0033 | 0.0029 | 0.0124 |
| 14 | Feature size | 108.63 | 82.43 | 63.23 | 80.80 | 82.10 |
| 15 | Best fitness | 0.0599 | 0.0624 | 0.0554 | 0.0520 | 0.0504 |
| 15 | Worst fitness | 0.0611 | 0.0667 | 0.0667 | 0.0602 | 0.0652 |
| 15 | Mean fitness | 0.0600 | 0.0625 | 0.0556 | 0.0521 | 0.0510 |
| 15 | STD | 0.0002 | 0.0006 | 0.0013 | 0.0010 | 0.0024 |
| 15 | Feature size | 2.90 | 3.00 | 2.47 | 2.83 | 2.30 |
Table 5. p-values of the Wilcoxon rank-sum test of the BCSO accuracy results versus other algorithms.

| Dataset | BDE | BPSO | BSSA | GA |
|---|---|---|---|---|
| 1 | 0.000000 | 0.380188 | 6.00 × 10⁻⁶ | 0.000434 |
| 2 | 3.70 × 10⁻⁵ | 0.000532 | 0.001921 | 0.630455 |
| 3 | 0.001156 | 0.017644 | 2.60 × 10⁻⁵ | 0.107494 |
| 4 | 0.000000 | 0.016211 | 0.012478 | 1.38 × 10⁻⁴ |
| 5 | 0.016281 | 0.587798 | 0.546144 | 0.214683 |
| 6 | 0.000000 | 0.001541 | 4.00 × 10⁻⁶ | 0.000000 |
| 7 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 8 | 0.000000 | 0.356995 | 5.70 × 10⁻⁵ | 0.001066 |
| 9 | 0.000591 | 0.003016 | 0.013394 | 0.090513 |
| 10 | 0.033928 | 0.919999 | 0.151188 | 0.797191 |
| 11 | 0.004939 | 0.000866 | 3.20 × 10⁻⁵ | 0.319416 |
| 12 | 0.024637 | 0.312817 | 0.169401 | 0.570163 |
| 13 | 0.000000 | 4.00 × 10⁻⁶ | 2.00 × 10⁻⁶ | 0.000000 |
| 14 | 0.000000 | 0.170318 | 0.000000 | 3.00 × 10⁻⁶ |
| 15 | 0.005555 | 5.80 × 10⁻⁵ | 0.021498 | 0.333711 |
| w/t/l | 15/0/0 | 9/6/0 | 12/3/0 | 7/8/0 |
Table 6. Computational cost of BDE, BPSO, BSSA, GA, and BCSO on 15 datasets (average computational time in seconds).

| Dataset | BDE | BPSO | BSSA | GA | BCSO |
|---|---|---|---|---|---|
| 1 | 1.9970 | 2.2986 | 1.8710 | 3.4647 | 1.4204 |
| 2 | 2.6003 | 2.5338 | 2.3849 | 4.3947 | 1.2760 |
| 3 | 1.5281 | 1.4024 | 1.3335 | 2.4316 | 0.7592 |
| 4 | 7.3024 | 7.0196 | 6.6112 | 12.305 | 3.6870 |
| 5 | 0.8450 | 0.7938 | 0.7571 | 0.9028 | 0.4451 |
| 6 | 1.3588 | 1.3879 | 1.2595 | 2.2471 | 0.7262 |
| 7 | 1.4888 | 1.5563 | 1.3818 | 2.4323 | 0.8689 |
| 8 | 0.8920 | 1.1479 | 0.9308 | 1.4286 | 0.9336 |
| 9 | 0.6814 | 0.8616 | 0.7186 | 1.2021 | 0.6763 |
| 10 | 0.8608 | 0.8524 | 0.8020 | 1.2649 | 0.4529 |
| 11 | 0.6864 | 0.6677 | 0.6298 | 1.2466 | 0.3558 |
| 12 | 0.7488 | 0.7489 | 0.7048 | 1.0054 | 0.3944 |
| 13 | 0.5667 | 0.6357 | 0.5720 | 0.8743 | 0.3884 |
| 14 | 2.0475 | 2.0989 | 1.8085 | 3.6051 | 1.2123 |
| 15 | 0.9156 | 0.8726 | 0.8549 | 1.5100 | 0.4702 |
