A New Co-Evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy for Feature Selection

: Feature selection is a task of choosing the best combination of potential features that best describes the target concept during a classiﬁcation process. However, selecting such relevant features becomes a di ﬃ cult matter when large number of features are involved. Therefore, this study aims to solve the feature selection problem using binary particle swarm optimization (BPSO). Nevertheless, BPSO has limitations of premature convergence and the setting of inertia weight. Hence, a new co-evolution binary particle swarm optimization with a multiple inertia weight strategy (CBPSO-MIWS) is proposed in this work. The proposed method is validated with ten benchmark datasets from UCI machine learning repository. To examine the e ﬀ ectiveness of proposed method, four recent and popular feature selection methods namely BPSO, genetic algorithm (GA), binary gravitational search algorithm (BGSA) and competitive binary grey wolf optimizer (CBGWO) are used in a performance comparison. Our results show that CBPSO-MIWS can achieve competitive performance in feature selection, which is appropriate for application in engineering, rehabilitation and clinical areas.


Introduction
Various pattern recognition studies have shown that a proper selection of features can lead to satisfactory classification performance.However, it is difficult to determine which feature is relevant due to the lack of experience and prior knowledge [1][2][3].In addition, a weak feature (a feature that contributes low classification accuracy) might be able to enhance the classification performance when it is combined with other potential features.Moreover, the selection of features is considered as an non-deterministic polynomial-time (NP) hard combinatorial problem, where the number of possible solutions increases exponentially with the number of features.Hence, an exhaustive search is impractical [4].In fact, a feature set with large number of features not only introduces the extra computational complexity, but also significantly degrades the performance of system.Therefore, the feature selection process is critically important for classification tasks.
Feature selection is a technique that aims to find a subset of input features, through which can improve or maintain the classification accuracy [5].Generally, feature selection can be categorized into filter and wrapper approaches.The former is based on statistical, information theory, distance measurement and intrinsic characteristic of the data.By contrast, the latter evaluates the best combination of features (feature subset) by optimizing the classification performance.As compared to the wrapper approach, the filter approach does not involve any specific learning algorithm in the process of evaluation, which is more general than the wrapper approach.However, the wrapper approach can always achieve better classification results, which has become a major interest of researches in feature selection [6].Thus, this study focuses on wrapper feature selection.
Recently, there are many metaheuristic algorithms that have been proposed for wrapper feature selection.Huang et al. [7] proposed a new ant colony optimization (ACO) with minimum redundancy maximum relevance criterion (mRMR) as the heuristic measurement for electromyography signals classification.The authors reported the proposed approach can offer better classification results as compared to principle component analysis (PCA) and original feature sets.Mesa et al. [8] introduced a novel mRMR with F-test Correlation Out (FCO) for channel and feature selection.In the same year, Venugopal et al. [9] applied the genetic algorithm (GA) and information gain (IG) to select the relevant features for measuring the muscle fatigue conditions.Phinyomark et al. [10] employed the sequential forward selection (SFS) for feature selection tasks.Moreover, Purushothaman and Vikas [11] made use of particle swarm optimization (PSO) and ACO to solve the feature selection problem in finger movement recognition.Another recent study proposed a new competitive binary grey wolf optimizer (CBGWO) for electromyography signals classification, which shown to be outperformed binary particle swarm optimization (BPSO), GA and binary grey wolf optimization (BGWO) in evaluating the optimal feature subset [12].
Among those feature selection methods, PSO and BPSO are the most frequently used.This is mainly due to their advantageous of simplicity and low computational complexity, which has become of major interest to researchers in feature selection studies [13,14].However, BPSO has the limitations of premature convergence, and it is not good at avoiding the local optimal [15][16][17].In addition, BPSO suffers from the setting of the inertia weight, thus leading unsatisfactory performance [18].Therefore, Chuang et al. [13] developed an improved BPSO for gene selection.The proposed approach aimed to reset the global best solution (gbest) when it does improve for three iterations.Banka and Dara [19] designed a Hamming distance based BPSO with novel fitness function to tackle the high dimensional feature selection problem.Furthermore, Bharti and Singh [20] integrated the opposition based strategy, chaos theory, mutation and fitness based dynamic inertia weight into BPSO for efficient feature selection in text clustering.
In this paper, our goal was to develop a new variant of BPSO that works effectively in feature selection problems.A new feature selection method namely co-evolution binary particle swarm optimization with multiple inertia weight strategy (CBPSO-MIWS) is proposed in this work.To resolve the limitations of BPSO, two strategies are introduced in CBPSO-MIWS.The first strategy is a co-evolution concept, which partitions the population of particles into several species (sub-populations).In this way, the particles can share information within different species, and this increases the global search capability.The second strategy is a multiple inertia weight strategy, which promotes the use of multiple inertia weight schemes in each species iteratively.Since multiple species are involved in CBPSO-MIWS, each species can perform the search with different inertia weight schemes, which is good at improving diversity.The proposed CBPSO-MIWS was tested with 10 benchmark datasets collected from the UCI machine learning repository.In order to examine the efficiency and efficacy of proposed CBPSO-MIWS, four recent and popular feature selection methods include BPSO, genetic algorithm (GA), binary gravitational search algorithm (BGSA) and CBGWO were used in performance comparison.The experimental result showed that CBPSO-MIWS had promising performance in most of the datasets.
The remainder of this paper is organized as follows: Section 2 details the standard binary particle swarm optimization.Section 3 briefly describes the proposed CBPSO-MIWS and its application for feature selection.Section 4 presents the experimental results.The discussions are shown in Section 5.At last, Section 6 concludes the findings of this work.

Binary Particle Swarm Optimization
Binary particle swarm optimization (BPSO) is a binary version of particle swarm optimization (PSO) that has been proposed to solve the binary optimization tasks [21].Like PSO, BPSO involves the personal best (pbest) and global best (gbest) solutions in the velocity and position update.For each particle (solution), the velocity is updated as [13,22]: where v is the velocity, x is the solution (position of particle), w is the inertia weight, c 1 and c 2 are the acceleration factors, r 1 and r 2 are two independent random numbers in [0,1], pbest is the personal best solution, gbest is the global best solution for the entire population, i is the order of particle in the population, d is the dimension of search space, and t is the number of iterations.Note that the velocity is bounded by the maximum velocity, v max and minimum velocity, v min .In this study, the v max and v min were set at 6 and −6, respectively [13].
Afterward, the velocity is converted into probability value using Equation ( 2), and the position of particle is updated as shown in Equation (3): where rand is a random number uniformly distributed between 0 and 1.In BPSO, pbest and gbest play an important role in guiding the particle to move toward the global optimum.Considering the minimization function was applied in this paper.Iteratively, the pbest and gbest are updated as follows: where x is the solution, pbest is the personal best solution, gbest is the global best solution for the entire population, F(.) is the fitness function, and t is the number of iterations.

Co-evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy
Generally speaking, BPSO is a useful optimization tool and it has been successfully applied for many feature selection tasks.However, BPSO suffers from the premature convergence and slow convergence rate [15][16][17].Additionally, one of the major drawbacks of BPSO is the setting of the inertia weight [18].In order to solve the limitations of BPSO, a new co-evolution binary particle swarm optimization with a multiple inertia weight strategy (CBPSO-MIWS) is proposed in this work.
Among birds or fishes, there can be several types of species.Instead of working on a specific species, the co-evolution between different types of species can lead to more efficient local and global search capability.In CBPSO-MIWS, the population of particles is equally divided into several sub-populations, where each sub-population is assumed to consist of one species.For example, a population of 30 particles is partitioned into three sub-populations (species), where each sub-population is comprised of 10 particles.The example is illustrated in Figure 1.

Multiple Inertia Weight Strategy
Briefly, inertia weight is one of the important parameters in BPSO, which is useful in balancing the exploration and exploitation behavior [18].A smaller value of inertia weight ensures high exploitation.By contrast, a larger value of inertia weight guarantees high exploration.In order to achieve optimal performance, a proper balance between exploration and exploitation is critically essential.According to the literature, several types of inertia weight schemes have been proposed to enhance the performance of PSO.However, an inertia weight scheme that performs better in problem A might not work effectively in problem B. To date, there is no universal inertia weight strategy that can provide an optimal performance for all engineering problems.
To resolve the issues above, a multiple inertia weight strategy (MIWS) was introduced.MIWS consists of several inertia weight schemes, which takes advantage of different inertia weight strategies in the evaluation process.Since multiple species are involved in CBPSO-MIWS, a search with different kind of inertia weight strategies can be performed by each species, which is beneficial in enhancing the diversity and avoiding the local optimal.In other words, instead of using a fixed inertia weight strategy, each sub-population carries out the search with different inertia weight to seek out the global optimal solution.In this study, four inertia weight schemes were implemented and they are listed as follows [23][24][25][26]: Inertia weight scheme 1 (IWS 1): Inertia weight scheme 2 (IWS 2): Inertia weight scheme 3 (IWS 3): Inertia weight scheme 4 (IWS 4):

Multiple Inertia Weight Strategy
Briefly, inertia weight is one of the important parameters in BPSO, which is useful in balancing the exploration and exploitation behavior [18].A smaller value of inertia weight ensures high exploitation.By contrast, a larger value of inertia weight guarantees high exploration.In order to achieve optimal performance, a proper balance between exploration and exploitation is critically essential.According to the literature, several types of inertia weight schemes have been proposed to enhance the performance of PSO.However, an inertia weight scheme that performs better in problem A might not work effectively in problem B. To date, there is no universal inertia weight strategy that can provide an optimal performance for all engineering problems.
To resolve the issues above, a multiple inertia weight strategy (MIWS) was introduced.MIWS consists of several inertia weight schemes, which takes advantage of different inertia weight strategies in the evaluation process.Since multiple species are involved in CBPSO-MIWS, a search with different kind of inertia weight strategies can be performed by each species, which is beneficial in enhancing the diversity and avoiding the local optimal.In other words, instead of using a fixed inertia weight strategy, each sub-population carries out the search with different inertia weight to seek out the global optimal solution.In this study, four inertia weight schemes were implemented and they are listed as follows [23][24][25][26]: Inertia weight scheme 1 (IWS 1): Inertia weight scheme 2 (IWS 2): Inertia weight scheme 3 (IWS 3): Inertia weight scheme 4 (IWS 4): where w max and w min are bound on inertia weight, r 3 is a random number uniformly distributed in [0,1], p is the nonlinear modulation index, w 0 is the initial inertia weight, t is the number of iteration and T max is the maximum number of iterations.These inertia weight schemes were chosen due to their promising performances and low complexity in previous works.It is worth noting that other inertia weight schemes are also applicable in CBPSO-MIWS.As for simplicity, we only consider four inertia weight schemes in this paper.
There are several types of inertia weight schemes implemented in CBPSO-MIWS.The question is, which inertia weight scheme (IWS) should be selected for each species in the process of evaluation?Generally, it is extremely difficult to choose a proper inertia weight scheme since the optimal inertia weight might be highly depended on the data and model specification.In order to resolve this problem, we adopted a random selection strategy, which randomly selects an inertia weight scheme for each species in each iteration.Mathematically, the random selection strategy can be represented as follows: where IWS is the inertia weight scheme and rand(1,4) is a random number generated-either 1, 2, 3 or 4. In this way, all inertia weight schemes have equal probability to be selected by each sub-population in each iteration.This, in turn, will not only enhance the diversity of the swarm, but also prevent the algorithm from being trapped in the local optimal.
The pseudocode of CBPSO-MIWS is demonstrated in Algorithm 1.In the first step, an initial population of N particles is randomly initialized (either 1 or 0), and the velocity of each particle is initialized to zero.Next, the population of particles is equally divided into ns sub-populations, where ns is the number of species.Afterward, the fitness of each particle for each species is evaluated.The pbest n and gbest n of each species are set, and the overall best particle from all species is known as Gbest.In each iteration, for each species, the inertia weight scheme (IWS) is selected, as shown in the Equation (10).Then, the inertia weight is computed based on the selected IWS.For each particle in each species, the velocity and position are updated using Equation ( 1) and (3), respectively.In the next step, the fitness of each particle of each species is evaluated.The pbest n and gbest n are again updated.At the end of each iteration, the overall global best particle, gbest is updated.The algorithm is repeated until the maximum number of iterations is reached.Finally, the overall best particle is selected as the optimal feature subset.

Algorithm 1. Pseudocode of CBPSO-MIWS
Initialize a population of particles, Divide the population into ns sub-populations/species, S n (n = 1, 2 . . ., ns) 3) Evaluate the fitness of particles for each species, F(S n ) using fitness function 4) Define the global best particle of each species as gbest n (n = 1, 2 . . ., ns), and select the overall global best particle from gbest n and set it as Gbest 5) Set the personal best particles for each species aspbest n (n = 1, 2 . . ., ns) 6) for t = 1 to the maximum number of iteration, T max 7) for n = 1 to the number of sub-population/species, ns // Multiple Inertia Weight Strategy // 8) Randomly select one IWS using Equation (10) 9) Compute the inertia weight based on the selected IWS 10)

Proposed CBPSO-MIWS for Feature Selection
In this section, the application of CBPSO-MIWS for solving the feature selection problem is described.Feature selection is known as a NP hard combinatorial problem, where the number of possible solutions increases exponentially with the number of features, D. Hence, an exhaustive search that requires a very high computational complexity was impractical.In this paper, a new CBPSO-MIWS is proposed to tackle the feature selection problem in classification tasks.Ultimately, our main goal was to select k potential features from a large available feature set, where k < D. As for feature selection, the solution is represented in binary form, which can be either bit 0 or 1. Bit 1 shows that the feature is selected, while the bit 0 exhibits the unselected feature [13].Figure 2 illustrates an example of a solution with 10 dimensions.As can be observed, four features (2nd, 3rd, 6th and 10th) are selected from the original feature set.

Proposed CBPSO-MIWS for Feature Selection
In this section, the application of CBPSO-MIWS for solving the feature selection problem is described.Feature selection is known as a NP hard combinatorial problem, where the number of possible solutions increases exponentially with the number of features, D. Hence, an exhaustive search that requires a very high computational complexity was impractical.In this paper, a new CBPSO-MIWS is proposed to tackle the feature selection problem in classification tasks.Ultimately, our main goal was to select k potential features from a large available feature set, where k < D. As for feature selection, the solution is represented in binary form, which can be either bit 0 or 1. Bit 1 shows that the feature is selected, while the bit 0 exhibits the unselected feature [13].Figure 2 illustrates an example of a solution with 10 dimensions.As can be observed, four features (2nd, 3rd, 6th and 10th) are selected from the original feature set.
Figure 3 demonstrates the flowchart proposed for CBPSO-MIWS feature selection and classification tasks.Firstly, the benchmark feature set is acquired from UCI machine learning dataset.Then, the proposed CBPSO-MIWS is applied to evaluate the most informative feature subset.In CBPSO-MIWS, a population of initial solutions with D dimensions are randomly generated, where D represents the number of features.The fitness of the initial solutions is evaluated, and the best solution (Gbest) is set.Iteratively, the solutions are updated as shown in Algorithm 1.As for wrapper feature selection, the fitness function that considered both classification performance and feature size is utilized, it can be defined as follows: and: Number of wrongly classified instances Total number of instances Error  (12) where |S| is the length of feature subset, |T| is the total number of features in each dataset, Error is the classification error rate and α is the parameter in [0,1] to control the influence of classification performance and feature size.In this paper, we set α to 0.01 according to [2,3].Note that the Error is computed by using k-nearest neighbor (KNN) with Euclidean distance and k = 5.The KNN was chosen since it offers a very low computational complexity, and it is easy to implement [1,3,27].In the final feature selection step, the global best solution (best feature subset) that comprises of the optimal features is produced.After that, the feature subset is fed into the KNN classifier for the classification process.Figure 3 demonstrates the flowchart proposed for CBPSO-MIWS feature selection and classification tasks.Firstly, the benchmark feature set is acquired from UCI machine learning dataset.Then, the proposed CBPSO-MIWS is applied to evaluate the most informative feature subset.In CBPSO-MIWS, a population of initial solutions with D dimensions are randomly generated, where D represents the number of features.The fitness of the initial solutions is evaluated, and the best solution (Gbest) is set.Iteratively, the solutions are updated as shown in Algorithm 1.As for wrapper feature selection, the fitness function that considered both classification performance and feature size is utilized, it can be defined as follows: and: Error = Number of wrongly classified instances Total number of instances (12) where |S| is the length of feature subset, |T| is the total number of features in each dataset, Error is the classification error rate and α is the parameter in [0,1] to control the influence of classification performance and feature size.In this paper, we set α to 0.01 according to [2,3].Note that the Error is computed by using k-nearest neighbor (KNN) with Euclidean distance and k = 5.The KNN was chosen since it offers a very low computational complexity, and it is easy to implement [1,3,27].In the final feature selection step, the global best solution (best feature subset) that comprises of the optimal features is produced.After that, the feature subset is fed into the KNN classifier for the classification process.

Dataset and Parameter Setting
In this paper, ten benchmark datasets collected from the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php) were used to validate the performance of CBPSO-MIWS.Table 1 lists the ten benchmark datasets used in this work.For each dataset, the features were first normalized in the range between 0 and 1.In the process of fitness evaluation, the dataset was randomly partitioned into 80% for the training set and 20% for the testing set [4].

Dataset and Parameter Setting
In this paper, ten benchmark datasets collected from the UCI machine learning repository (https://archive.ics.uci.edu/ml/index.php) were used to validate the performance of CBPSO-MIWS.
Table 1 lists the ten benchmark datasets used in this work.For each dataset, the features were first normalized in the range between 0 and 1.In the process of fitness evaluation, the dataset was randomly partitioned into 80% for the training set and 20% for the testing set [4].Furthermore, in order to measure the effectiveness of the proposed CBPSO-MIWS, four recent and popular feature selection methods include BPSO [13,22], genetic algorithm (GA) [28], binary gravitational search algorithm (BGSA) [29] and competitive binary grey wolf optimizer (CBGWO) [12] were used in comparison.GA is an evolutionary algorithm that utilizes the selection, crossover and mutation operators to evolve solutions.On one hand, BGSA is a binary version of the gravitational search algorithm (GSA), and it advances the resolution by calculating the total force to update acceleration, velocity and position of particles.Finally, CBGWO is an improved version of BGWO, in which the competition and leader enhancement strategy are integrated.The parameter settings of BPSO, GA, BGSA, CBGWO and CBPSO-MIWS are listed in Table 2.In the experiment, we tested the CBPSO-MIWS different numbers of species, ns (1, 2, 3, 4, and 5), and we found that the best result was obtained when ns = 3 and 4. To ensure fair comparison, the population size and the maximum number of iterations were fixed at 10 and 100 for each feature selection method.All the analysis was done in MATLAB 2017 software (MathWorks, Massachusetts and United States) by using a computer with Intel Core i3 3.3 GHz and 8.0 GB RAM.

Evaluation Metrics
The CBPSO-MIWS and other feature selection methods were executed for 20 independent runs in order to obtain useful statistical results.For performance evaluation, the best fitness, worst fitness, mean fitness, standard deviation of fitness (STD), accuracy, and feature size were recorded.These parameters can be calculated as follows [27,30,31]: Worst Fitness = max T max t=1 F t (14) Mean Fitness = 1 T max Accuracy = Number of correctly predicted instances Total number of instances where F is the fitness values of best solution, µ is the mean, t is the number of iteration and T max is the maximum number of iterations.In the final step, the average of the parameters over 20 independent runs were calculated and presented as the experimental results.

Experimental Results
Figures 4 and 5 illustrate the convergence curves of five different feature selection methods for 10 datasets.Note that the fitness is the average fitness value obtained from 20 runs.In these Figures, the proposed CBPSO-MIWS is marked with diamond shape on green line.It is observed that the performance of CBPSO-MIWS was superior for most datasets.For dataset 3, 5, 7, 8 and 9, it can be clearly seen that CBPSO-MIWS outperformed other feature selection methods, which converged faster and deeper, to seek out the optimal solution.Such improvement is mostly coming from co-evolution and multiple inertia weight strategy, which greatly enhances the performance of CBPSO-MIWS in feature selection.Moreover, CBPSO-MIWS can usually achieve better fitness value than BPSO.This implies that the proposed approach overtakes BPSO by overcoming the limitations of BPSO in both premature convergence and the setting of inertia weight.On the other hand, CBGWO provided competitive performance in this work, especially for datasets 2, 4 and 6.
Table 3 outlines the results of best fitness, worst fitness, mean fitness, STD, accuracy and feature size of five different feature selection methods for 10 datasets.Note that the best parameter value for each method is bolded.In Table 3, the lower the values of best fitness, worst fitness and mean fitness are, the better the performance is.The STD represents the robustness and consistency of the algorithm.Thus, the feature selection method that scores the lowest STD value has very good robustness, which can produce highly consistent results.On one hand, a higher value of accuracy indicates that more samples have been successfully predicted.For instance, the feature size with a smaller number of features reveals less features are selected.Successively, CBPSO-MIWS showed a very competitive performance in best, worst and mean fitness values.The experimental results highlight that CBPSO-MIWS was very good in selecting the relevant features, thus leading to promising performance.Table 3 outlines the results of best fitness, worst fitness, mean fitness, STD, accuracy and feature size of five different feature selection methods for 10 datasets.Note that the best parameter value for each method is bolded.In Table 3, the lower the values of best fitness, worst fitness and mean fitness are, the better the performance is.The STD represents the robustness and consistency of the algorithm.Thus, the feature selection method that scores the lowest STD value has very good  From Table 3, CBPSO-MIWS offered mean fitness with a very low STD value.This again expresses the high consistency of CBPSO-MIWS in feature selection.Another important result is the accuracy.Based on the result obtained, CBPSO-MIWS ranked first in six datasets, which outperformed other methods in feature selection problems.By applying the CBPSO-MIWS to evaluate the optimal feature subset, a high classification performance can be guaranteed.As for feature size, it shows that roughly half of the features can be eliminated for all methods, which indicates that some of the features are redundant and they badly degraded the classification result.The experimental result clearly evinced the impact of feature selection in classification tasks.
Since the classification result of each subject is the average accuracy obtained from 20 independent runs, intuitively, the two sample t-test with 95% confidential level was applied to examine whether the classification performance achieved by the proposed CBPSO-MIWS was significantly better than the other methods.In the statistical test, a null hypothesis indicates that the classification performance of two different methods were similar.If the p-value is less than 0.05, then the null hypothesis is rejected, which claims that there is a significant difference in classification performance among two different methods.Table 4 demonstrates the results of the t-test with p-values by using the CBPSO-MIWS as the reference method.In this Table, " * " indicates the performance of proposed method was significantly better, while " * * " means the performance of CBPSO-MIWS was significantly worse.As can be observed, the classification performance of CBPSO-MIWS was significantly better than BPSO and GA (p-value < 0.05) on eight datasets, respectively.On one side, CBPSO-MIWS significantly outperformed BGSA and CBGWO (p-value < 0.05) for four datasets.In this study, the most important measurement is the accuracy, which indicates goodness of the features selected by the proposed method in classification tasks.For the ease of understanding, the accuracies obtained by the 10 datasets were averaged, and the result of mean accuracy is displayed in Figure 6.In this Figure, the error bars represent the standard deviation value.Averaged across 10 datasets, it shows that the best mean accuracy was achieved by CBPSO-MIWS (90.17%), followed by CBGWO (89.35%).When inspecting the results, the classification performance of CBPSO-MIWS was superior against BPSO, GA and BGSA, and slightly better than CBGWO.This again validates the effectiveness of the CBPSO-MIWS in selecting the significant features.
Informatics 2019, 6, x 12 of 15 CBPSO-MIWS was significantly worse.As can be observed, the classification performance of CBPSO-MIWS was significantly better than BPSO and GA (p-value < 0.05) on eight datasets, respectively.On one side, CBPSO-MIWS significantly outperformed BGSA and CBGWO (p-value < 0.05) for four datasets.In this study, the most important measurement is the accuracy, which indicates goodness of the features selected by the proposed method in classification tasks.For the ease of understanding, the accuracies obtained by the 10 datasets were averaged, and the result of mean accuracy is displayed in Figure 6.In this Figure, the error bars represent the standard deviation value.Averaged across 10 datasets, it shows that the best mean accuracy was achieved by CBPSO-MIWS (90.17%), followed by CBGWO (89.35%).When inspecting the results, the classification performance of CBPSO-MIWS was superior against BPSO, GA and BGSA, and slightly better than CBGWO.This again validates the effectiveness of the CBPSO-MIWS in selecting the significant features.Table 5 exhibits the average computational cost of five feature selection methods.As can be observed, the highest computational result was achieved by GA.In comparison with BPSO, CBPSO-MIWS was more time consuming.Nevertheless, CBPSO-MIWS contributed better classification performance.On the other hand, CBGWO offered the fastest processing speed in this work.This expected because CBGWO utilizes a competition strategy, in which only half of the population is used in evaluation process.Even though the computational complexity of CBPSO-MIWS was higher Table 5 exhibits the average computational cost of five feature selection methods.As can be observed, the highest computational result was achieved by GA.In comparison with BPSO, CBPSO-MIWS was more time consuming.Nevertheless, CBPSO-MIWS contributed better classification performance.On the other hand, CBGWO offered the fastest processing speed in this work.This expected because CBGWO utilizes a competition strategy, in which only half of the population is used in evaluation process.Even though the computational complexity of CBPSO-MIWS was higher than CBGWO, CBPSO-MIWS could often affirm promising results.For instance, only four simple inertia weight schemes are utilized in CBPSO-MIWS.Hence, it is believed that the performance of CBPSO-MIWS can show great improvement when more formidable inertia weight schemes are implemented.

Discussion
In the present study, a new co-evolution binary particle swarm optimization with multiple inertia weight strategy (CBPSO-MIWS) has been proposed for wrapper feature selection.In the proposed scheme, the co-evolution concept and multiple inertia weight strategy are adopted to heighten the performance of CBPSO-MIWS in feature selection.Owing to the co-evolution strategy, the particles are able to share the information while performing the search on different search spaces.This in turn will maximize the global search capability in the process of evaluation.On one hand, the multiple inertia weight strategy promotes the usage of different weight components, which is good at improving the diversity of the algorithm.By making full use of these mechanisms, CBPSO-MIWS has the ability to prevent premature convergence, thus leading to promising results.This study has shown the impact of feature selection in classification tasks.CBPSO-MIWS not only eliminated redundant and irrelevant features, but also enhanced the classification performance.It is worth noting that CBPSO-MIWS can be applied without prior knowledge.CBPSO-MIWS automatically selects the best feature subset for each dataset, and then stores the selected features for real time application.The experimental results evidently show the superiority of CBPSO-MIWS in the feature selection problem.The findings of the current work indicate that CBPSO-MIWS is a powerful method, which can select the optimal feature subset that best describes the target in the classification process.Thus, CBPSO-MIWS can be a useful tool in engineering, rehabilitation and clinical applications.
There were several limitations in this work.First, only four types of inertia weight schemes were used in this research.It must be mentioned that other inertia weight schemes are also applicable in CBPSO-MIWS.Second, the number of species/sub-population, ns is fixed at 3 in the present work.The users are encouraged to test different numbers of species in order to achieve the optimal performance.Note that the number of species is related to the population size.A larger number of species is recommended if the population size is larger, which ensures the goodness of the multiple inertia weight strategy in the optimization task.

Conclusions
Feature selection is an effective way to improve classification performance with a minimal number of features.In this paper, we have proposed a new variant of BPSO, namely co-evolution binary particle swarm optimization with a multiple inertia weight strategy (CBPSO-MIWS) for wrapper feature selection.The main contribution was the proposal of a co-evolution concept and multiple inertia weight strategy (MIWS) into CBPSO-MIWS, which improved the and prevented the algorithm from having premature convergence.Ten benchmark datasets from the UCI repository were used to test the proposed approach, and the results were compared with other recent and popular feature selection approaches.Based on the results obtained, CBPSO-MIWS can provide competitive and promising performances against other approaches.In comparison with BPSO, CBPSO-MIWS usually selected a subset of minimal features that gave the highest accuracy in this work.Thus, it is concluded CBPSO-MIWS can be useful in engineering, rehabilitation and clinical applications.
This research makes open to the public, different future research directions where co-evolution and multiple inertia weight strategies can be implemented in other optimization algorithms.In future, more inertia weight schemes can be added into CBPSO-MIWS for performance enhancement.Moreover, the adaptive scheme that can automatically adjust the number of species can be implemented in CBPSO-MIWS for the extension of future work.

Figure 1 .
Figure 1.The example of structure of co-evolution binary particle swarm optimization with multiple inertia weight strategy (CBPSO-MIWS).

Figure 1 .
Figure 1.The example of structure of co-evolution binary particle swarm optimization with multiple inertia weight strategy (CBPSO-MIWS).

Figure 2 .
Figure 2.An example of a solution with 10 dimensions.

Figure 3 .
Figure 3. Overview of proposed CBPSO-MIWS for feature selection and classification.

Figure 3 .
Figure 3. Overview of proposed CBPSO-MIWS for feature selection and classification.

Figure 5 .
Figure 5. Convergence curves of five feature selection methods on datasets 7 to 10.

Figure 6 .
Figure 6.Mean accuracy of five different feature selection methods over 10 datasets.

Figure 6 .
Figure 6.Mean accuracy of five different feature selection methods over 10 datasets.

Table 1 .
The ten utilized benchmark datasets.

Table 1 .
The ten utilized benchmark datasets.

Table 3 .
Experimental results of five feature selection methods for 10 datasets.

Table 4 .
The result of t-test with p-values (The CBPSO-MIWS is used as reference algorithm).

Table 4 .
The result of t-test with p-values (The CBPSO-MIWS is used as reference algorithm).

Table 5 .
The average computational cost of five feature selection methods.