Hybrid Binary Particle Swarm Optimization Differential Evolution-Based Feature Selection for EMG Signals Classification

Abstract: To date, the usage of electromyography (EMG) signals in myoelectric prosthetics allows patients to recover functional rehabilitation of their upper limbs. However, increasing the number of EMG features has been shown to degrade classification performance. Therefore, feature selection is an essential step to enhance classification performance and reduce the complexity of the classifier. In this paper, a hybrid method, namely, binary particle swarm optimization differential evolution (BPSODE), was proposed to tackle feature selection problems in EMG signals classification. The performance of BPSODE was validated using the EMG signals of 10 healthy subjects acquired from a publicly accessible EMG database. First, discrete wavelet transform was applied to decompose the signals into wavelet coefficients. The features were then extracted from each coefficient and formed into the feature vector. Afterward, BPSODE was used to evaluate the most informative feature subset. To examine the effectiveness of the proposed method, four state-of-the-art feature selection methods were used for comparison. The parameters, including accuracy, feature selection ratio, precision, F-measure, and computation time, were used for performance measurement. Our results showed that BPSODE was superior, not only in offering a high classification performance, but also in having the smallest feature size. From the empirical results, it can be inferred that BPSODE-based feature selection is useful for EMG signals classification.


Introduction
Electromyography (EMG) is a biomedical signal that records the electric potential generated during muscle contraction. Recently, the usefulness of EMG as a control source for myoelectric prosthetics has received much attention from biomedical researchers. The recognition of hand movements enables the application of multi-functional myoelectric prosthetics in engineering, rehabilitation, and clinical areas. However, myoelectric control is still limited by inadequate control techniques [1]. In addition, EMG signals are easily influenced by noise due to their complex nature [2]. Therefore, most researchers apply advanced signal processing, feature extraction, and feature selection techniques to extract only the useful information from the signal.
In previous studies, discrete wavelet transform (DWT) was found to be the most frequently used signal processing method due to its effectiveness in the analysis of EMG signals [3,4]. Intuitively, DWT offers an optimal time-frequency resolution by decomposing the signal into multi-resolution coefficients. (Axioms 2019, 8, 79)

Binary Particle Swarm Optimization
Binary particle swarm optimization (BPSO) was first proposed by Kennedy and Eberhart [19] to solve binary optimization problems. In BPSO, the population is known as a swarm, which comprises N particles that flow through the multidimensional search space. Each particle represents a potential solution, and it moves through the search space to seek out the best solution. Each particle searches for the global maximum or minimum according to its own experience and knowledge [20].
For a D-dimensional problem, the velocity of the i-th particle is expressed as $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$ and its position is denoted as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where i represents the order of the particle in the population. In BPSO, the best location found by each particle is known as Pbest, and the global best solution in the population is called Gbest. For each iteration t, the particle updates its velocity as follows:

$$v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \left(Pbest_{id} - x_{id}(t)\right) + c_2 r_2 \left(Gbest_{d} - x_{id}(t)\right) \quad (1)$$

where x is the position of the particle, v denotes the velocity of the particle, i is the order of the particle in the population, d is the dimension of the search space, w is the inertia weight, c1 and c2 are the acceleration coefficients, and r1 and r2 are two independent random numbers uniformly distributed between 0 and 1. Then, the velocity is converted into a probability value using the sigmoid function:

$$S\left(v_{id}(t+1)\right) = \frac{1}{1 + \exp\left(-v_{id}(t+1)\right)} \quad (2)$$

Afterward, the position of the particle is updated as:

$$x_{id}(t+1) = \begin{cases} 1, & \text{if } \delta < S\left(v_{id}(t+1)\right) \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

where δ is a random number uniformly distributed between 0 and 1.
In BPSO, the inertia weight is gradually decreased from a higher to a lower value in order to ensure a stable balance between global and local exploration [21]. At each iteration, the inertia weight is computed as:

$$w(t) = w_{max} - \left(w_{max} - w_{min}\right)\frac{t}{T} \quad (4)$$

where w_max and w_min are the bounds on the inertia weight, t is the current iteration, and T is the maximum number of iterations. In this study, w_max and w_min were set to 0.9 and 0.4, respectively.
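To make the update rules concrete, the BPSO step can be sketched in Python. This is a minimal NumPy sketch, not the authors' MATLAB implementation; the function names and default coefficients are our own illustrative choices:

```python
import numpy as np

def bpso_step(X, V, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One BPSO iteration: update velocities with Eq. (1), map them to
    probabilities with the sigmoid of Eq. (2), and re-sample the binary
    positions as in Eq. (3)."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # Eq. (1)
    prob = 1.0 / (1.0 + np.exp(-V))                             # Eq. (2)
    X = (rng.random(X.shape) < prob).astype(int)                # Eq. (3)
    return X, V

def linear_inertia(t, T, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (4)."""
    return w_max - (w_max - w_min) * t / T
```

Each call returns a new binary swarm and the updated velocities; the inertia weight shrinks from 0.9 toward 0.4 as the iteration counter t approaches T.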

Binary Differential Evolution
Differential evolution (DE) is an evolutionary heuristic approach proposed by Storn and Price [22] to minimize non-linear and continuous functions. Originally, DE was designed to solve continuous-valued problems; for feature selection, DE is modified into binary differential evolution (BDE) according to Reference [13]. Binary differential evolution is a simple, direct, and efficient feature selection method. It is composed of three main operators: mutation, crossover, and selection.
Firstly, BDE randomly generates an initial population for a D-dimensional problem, where D is the number of features that need to be optimized. During the mutation stage, three vectors x_r1, x_r2, and x_r3 are randomly selected from the population for each vector x_i. Note that r1 ≠ r2 ≠ r3 ≠ i. Then, the difference vector is computed as follows:

$$diff_{id} = \begin{cases} x_{r1,d}, & \text{if } x_{r1,d} \neq x_{r2,d} \\ 0, & \text{otherwise} \end{cases} \quad (5)$$

where i is the order of the vector in the population and d is the dimension of the vector. If the dth dimension of x_r1 is equal to that of x_r2, the difference vector becomes 0; otherwise, it becomes the same as x_r1. Next, the mutation is performed as shown in Equation (6):

$$m_{id} = \begin{cases} 1, & \text{if } diff_{id} = 1 \\ x_{r3,d}, & \text{otherwise} \end{cases} \quad (6)$$
After that, the crossover process is executed as follows:

$$u_{id} = \begin{cases} m_{id}, & \text{if } \delta \leq CR \text{ or } d = d_{rand} \\ x_{id}, & \text{otherwise} \end{cases} \quad (7)$$

where u is the trial vector, x is the current vector, d is the dimension of the search space, CR ∈ (0,1) is the crossover rate, d_rand is a random feature index distributed between 1 and D, and δ is a random number distributed between 0 and 1.
For the selection process, if the fitness value of the trial vector is better, then the current vector will be replaced. Otherwise, the current vector is kept for the next generation.
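The three BDE operators can be sketched together in Python. This is a hedged NumPy sketch under the reconstruction above (the mutation rule in particular follows our reading of Equations (5)-(7)), with `bde_trial` as our own function name:

```python
import numpy as np

def bde_trial(X, i, CR, rng):
    """Build the trial vector for individual i: binary difference and
    mutation (Eqs. (5)-(6)), then binomial crossover (Eq. (7))."""
    N, D = X.shape
    r1, r2, r3 = rng.choice([j for j in range(N) if j != i], size=3, replace=False)
    diff = np.where(X[r1] != X[r2], X[r1], 0)   # Eq. (5): binary difference vector
    mutant = np.where(diff == 1, 1, X[r3])      # Eq. (6): mutation
    cross = rng.random(D) <= CR                 # Eq. (7): binomial crossover
    cross[rng.integers(D)] = True               # d_rand: keep at least one mutant bit
    return np.where(cross, mutant, X[i])
```

The selection step then simply compares the fitness of the returned trial vector with that of `X[i]` and keeps the better one.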

Materials and Methods
Figure 1 illustrates the flow diagram of the proposed EMG pattern recognition system. In the first step, the EMG data are acquired from the publicly accessible EMG database. Next, the discrete wavelet transform (DWT) is applied to decompose the EMG signals into multi-resolution coefficients. Then, the features are extracted from the wavelet coefficients to form the feature vector. After that, five feature selection methods, including BBA, BDE, BFPA, BPSO, and BPSODE, are used to evaluate the optimal feature subset. In the final step, the k-nearest neighbor (KNN) classifier is employed for the classification process.

EMG Data
The Non-Invasive Adaptive Prosthetics (NinaPro) project [23] is a publicly accessible EMG database that has previously been applied in EMG pattern recognition studies. In this study, the NinaPro database 4 (DB4), composed of the EMG signals of twelve different hand movement types (Exercise A), was utilized. The twelve hand movement types included index flexion, index extension, middle flexion, middle extension, ring flexion, ring extension, little finger flexion, little finger extension, thumb adduction, thumb abduction, thumb flexion, and thumb extension [24]. The DB4 contained the EMG data of 10 healthy subjects. In the experiment, 12 electrodes (12 channels) were implemented. The subjects were instructed to perform each movement type for 5 s, followed by a resting state of 3 s. In addition, each movement type was repeated six times, and the EMG signal was sampled at a rate of 2000 Hz [24]. Note that all resting states were removed before any further processing.


Discrete Wavelet Transform-Based Feature Extraction
Recently, discrete wavelet transform (DWT) has shown its potential and capability in biomedical signal processing. Discrete wavelet transform has the advantage of varying the time and frequency window, which can provide an optimal time-frequency resolution in EMG pattern recognition [25].
Basically, DWT decomposes the EMG signal into multi-resolution coefficients by filtering the signal with a high-pass filter, h(n), and a low-pass filter, g(n). The first decomposition of DWT can be expressed as:

$$y_{high}(k) = \sum_{n} x(n)\,h(2k - n) \quad (8)$$

$$y_{low}(k) = \sum_{n} x(n)\,g(2k - n) \quad (9)$$

where x(n) is the input EMG signal, and y_high(k) and y_low(k) represent the detail and approximation coefficients, respectively. In wavelet decomposition, the detail (D) exhibits the signal at high frequency, whereas the low-frequency component is represented by the approximation (A) [26]. Previous works indicated that the selection of the mother wavelet and the decomposition level are the main factors that can strongly affect the performance of DWT in EMG pattern recognition. According to the findings of Reference [27], DWT at the fourth decomposition level was employed in this work. An illustration of DWT is displayed in Figure 2.
As for the mother wavelet selection, twelve mother wavelets, including db4, db6, db8, sym4, sym6, sym8, bior2.2, bior3.3, bior4.4, coif3, coif4, and coif5, were investigated. From the experiment, we found that DWT with bior4.4 offered the optimal performance in the current work. Hence, only DWT with bior4.4 at the fourth decomposition level was applied in the rest of this paper.
In this work, five popular features, namely, mean absolute value (MAV), wavelength (WL), zero crossing (ZC), slope sign change (SSC), and maximum fractal length (MFL), were extracted from each wavelet coefficient to form the feature set. These features were selected due to their promising performances in previous works [3,4,28].
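To make the extraction pipeline concrete, the sketch below computes the five features over the eight wavelet coefficient vectors. It assumes the PyWavelets package for the bior4.4 decomposition; the threshold-free ZC and SSC counts are a simplification of the usual thresholded definitions:

```python
import numpy as np

def dwt_coefficients(x, wavelet="bior4.4", level=4):
    """Split the signal level by level, keeping the approximation (A) and
    detail (D) at every level: 4 levels -> 8 coefficient vectors."""
    import pywt  # PyWavelets, assumed installed
    coeffs, a = [], np.asarray(x, dtype=float)
    for _ in range(level):
        a, d = pywt.dwt(a, wavelet)  # one high-/low-pass filtering step
        coeffs.extend([a, d])
    return coeffs

def five_features(c):
    """MAV, WL, ZC, SSC, and MFL of one coefficient vector."""
    c = np.asarray(c, dtype=float)
    dc = np.diff(c)
    mav = np.mean(np.abs(c))                  # mean absolute value
    wl = np.sum(np.abs(dc))                   # waveform length
    zc = int(np.sum(c[:-1] * c[1:] < 0))      # zero crossings (no threshold)
    ssc = int(np.sum(dc[:-1] * dc[1:] < 0))   # slope sign changes (no threshold)
    mfl = np.log10(np.sqrt(np.sum(dc ** 2)))  # maximum fractal length
    return [mav, wl, zc, ssc, mfl]
```

Applied to all 12 channels, this yields the 12 channels × 5 features × 8 coefficients = 480 features per movement used later in the experiments.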

Proposed Hybrid Binary Particle Swarm Optimization Differential Evolution
In this paper, a hybrid binary particle swarm optimization differential evolution method (BPSODE) that combines the superior capabilities of the BPSO and BDE algorithms is proposed to solve the feature selection problem in EMG signals classification. In the proposed BPSODE, the BPSO and BDE algorithms are computed in sequence. For example, BPSO is computed in the first, third, and fifth iterations, whereas the second, fourth, and sixth iterations are performed by BDE. In this way, BPSODE can fully take advantage of BPSO and BDE without additional computation cost. However, both BPSO and BDE have the limitation of premature convergence. To prevent BPSODE from being trapped in local optima, two simple schemes are introduced. The first scheme is a dynamic inertia weight, which enables BPSODE to track the optimal solution in a dynamic environment. The second scheme is a dynamic crossover rate; instead of using a fixed crossover rate, a dynamic crossover rate is more capable of balancing exploration and exploitation.

Dynamic Inertia Weight
The inertia weight is a parameter proposed by Shi and Eberhart [29] to enhance the performance of PSO. Generally, a larger inertia weight leads to good global exploration. On the contrary, a smaller inertia weight tends to promote local exploration around the best solution [30]. In BPSO, the inertia weight linearly decreases from 0.9 to 0.4 to balance global and local exploration. However, in the experiment, we found that such a mechanism did not work very well in BPSODE. Thus, we applied a dynamic inertia weight as shown in Equation (10):

$$w = 0.5 + \frac{r_3}{2} \quad (10)$$
where r3 is a random number distributed between 0 and 1. Figure 3a illustrates an example of the dynamic inertia weight. As can be seen, the inertia weight was generated uniformly between 0.5 and 1. Since it is difficult to estimate the exploration and exploitation stages, a random inertia weight is more appropriate in this dynamic environment [31].

Dynamic Crossover Rate
The crossover rate (CR) is a parameter introduced in BDE. It controls the number of d-dimensional parameter values copied from the mutant vector [32]. A higher value of CR indicates that more parameters are duplicated from the mutant vector. By contrast, a lower CR means that fewer parameters are reproduced from the mutant vector. In BPSODE, a dynamic CR is proposed as shown in Equation (11):

$$CR(t) = 1 - \frac{t}{T} \quad (11)$$
where t is the current iteration and T is the maximum number of iterations. An example of the dynamic crossover rate is exhibited in Figure 3b. One can see that the crossover rate was reduced from 1 to 0 as the number of iterations increased. At the beginning, a higher CR ensured more parameters were reproduced to improve the exploration (global search). As time passed, a lower CR guaranteed the exploitation process (local search).

Algorithm 1 demonstrates the pseudocode of BPSODE. Initially, the position of each particle is randomly initialized in binary form (bit 1 or 0), and the velocity of each particle is initialized to zero. Next, the fitness of each particle is evaluated, and the Pbest and Gbest are defined. Then, the BPSO (odd-numbered iterations) and BDE (even-numbered iterations) algorithms are computed in sequence. For odd-numbered iterations, the inertia weight is updated as shown in Equation (10); afterward, the velocity and position of each particle are updated using Equations (1) and (3), respectively, and the fitness of each particle is evaluated. For even-numbered iterations, the crossover rate is updated as shown in Equation (11); after that, the mutation and crossover operations are computed as shown in Equations (6) and (7), respectively, generating the trial vector. The fitness of the newly generated trial vector is then evaluated and compared with that of the current particle. If the trial vector results in better fitness, the current particle is replaced; otherwise, the current particle is kept for the next iteration. At the end of each iteration, the Pbest and Gbest are updated. The algorithm is repeated until the termination criterion (maximum number of iterations) is satisfied. At last, the global best solution is reported. The BPSO branch of Algorithm 1 proceeds as follows:

if mod(t, 2) = 1
    w = 0.5 + rand(0, 1)/2
    for d = 1 to D
        update v_id(t + 1) via Equation (1) and compute S(v_id(t + 1)) via Equation (2)
        if δ < S(v_id(t + 1)) then x_id(t + 1) = 1 else x_id(t + 1) = 0
    end for
    evaluate the fitness of the new particle, F(x_i(t + 1))
end if
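The alternating scheme of Algorithm 1 can be sketched end to end in Python. This is a self-contained NumPy sketch, not the authors' MATLAB code; `fitness` is any function to be minimized over binary vectors, and the population/iteration defaults are illustrative only:

```python
import numpy as np

def bpsode(fitness, D, N=20, T=40, c1=2.0, c2=2.0, seed=0):
    """BPSO on odd iterations, BDE on even ones, with a dynamic inertia
    weight (Eq. (10)) and a dynamic crossover rate (Eq. (11))."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(N, D))          # random binary positions
    V = np.zeros((N, D))                         # zero initial velocities
    fit = np.array([fitness(x) for x in X])
    pbest, pfit = X.copy(), fit.copy()
    g = np.argmin(fit)
    gbest, gfit = X[g].copy(), fit[g]
    for t in range(1, T + 1):
        if t % 2 == 1:                           # BPSO iteration
            w = 0.5 + rng.random() / 2           # Eq. (10)
            r1, r2 = rng.random((N, D)), rng.random((N, D))
            V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
            X = (rng.random((N, D)) < 1 / (1 + np.exp(-V))).astype(int)
            fit = np.array([fitness(x) for x in X])
        else:                                    # BDE iteration
            CR = 1 - t / T                       # Eq. (11)
            for i in range(N):
                a, b, c = rng.choice([j for j in range(N) if j != i], 3, replace=False)
                diff = np.where(X[a] != X[b], X[a], 0)
                mutant = np.where(diff == 1, 1, X[c])
                cross = rng.random(D) <= CR
                cross[rng.integers(D)] = True
                u = np.where(cross, mutant, X[i])
                fu = fitness(u)
                if fu < fit[i]:                  # greedy selection
                    X[i], fit[i] = u, fu
        better = fit < pfit                      # update Pbest and Gbest
        pbest[better], pfit[better] = X[better], fit[better]
        g = np.argmin(pfit)
        if pfit[g] < gfit:
            gbest, gfit = pbest[g].copy(), pfit[g]
    return gbest, gfit
```

With a toy fitness such as the Hamming distance to a hidden target bit-string, the loop quickly homes in on the target, mirroring how the real fitness of Equation (13) drives the feature mask.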

Application of BPSODE for Feature Selection
In BPSODE, the position of the particle is expressed in binary form, either bit value 1 or 0, where bit 1 and bit 0 represent a selected feature and a non-selected feature, respectively. For example, given a solution X = {0, 1, 1, 0, 0, 1, 1, 0, 0, 0}, four features (the 2nd, 3rd, 6th, and 7th features) are selected.
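Decoding such a solution into feature indices is a one-liner; a hypothetical NumPy example:

```python
import numpy as np

# Bit 1 marks a selected feature; bit 0 marks a discarded one.
X = np.array([0, 1, 1, 0, 0, 1, 1, 0, 0, 0])
selected = np.flatnonzero(X) + 1   # 1-based feature indices
# selected is [2, 3, 6, 7]: four of the ten features are kept
```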
As for wrapper feature selection, a fitness function that maximizes the classification performance and minimizes the number of features is utilized, and it can be defined as:

$$Fitness = \alpha\,E_R + (1 - \alpha)\,\frac{|R|}{|S|} \quad (12)$$

$$E_R = \frac{\text{No. of wrongly predicted instances}}{\text{Total number of instances}} \quad (13)$$

where E_R is the error rate computed by a learning algorithm, |R| is the length of the feature subset, |S| is the total number of features, and α is the parameter that controls the weight between the error rate and the ratio of selected features. Considering the classification performance to be the most important measurement, α was set to 0.9 in this work. For fitness evaluation, the k-nearest neighbor (KNN) with Euclidean distance and k = 1 was used as the learning algorithm. The KNN was chosen because it is a common, fast, and simple machine learning algorithm that has been widely applied in feature selection studies [33,34]. For performance evaluation, the 10-fold cross-validation method was implemented. In this scheme, the data were randomly divided into 10 equal parts. Each part took a turn as the test set while the remaining nine parts were used for the training set. The results obtained from the 10 folds were then averaged and recorded.
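A minimal sketch of this wrapper fitness, assuming scikit-learn for the 1-NN classifier and cross-validation (`wrapper_fitness` and its handling of the empty mask are our own choices, not the authors'):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_fitness(mask, features, labels, alpha=0.9, folds=10):
    """alpha * error rate (1-NN, k-fold cross-validation) plus
    (1 - alpha) * |R| / |S|; lower is better."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():                  # an empty subset cannot classify
        return 1.0                      # worst possible fitness
    knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
    acc = cross_val_score(knn, features[:, mask], labels, cv=folds).mean()
    return alpha * (1 - acc) + (1 - alpha) * mask.sum() / mask.size
```

Note the floor on the empty mask: without it, a candidate selecting zero features would crash the classifier rather than being penalized.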

Results and Discussions
The EMG signals of the 10 subjects were gathered from the NinaPro database 4, comprising 10 different datasets. In the next step, DWT was applied to decompose the EMG signals into multi-resolution coefficients. It is worth noting that DWT produced eight coefficients (four details and four approximations) at the fourth decomposition level. Then, five features were extracted from each wavelet coefficient, and the feature set was formed. In total, 480 features (12 channels × 5 features × 8 coefficients) were extracted for each movement of each subject. On the other hand, 72 instances (12 hand movement types × 6 repetitions) were acquired from each subject. As a result, for each subject (dataset), a feature vector with a matrix of 72 × 480 was formed. In order to prevent numerical problems, the features were normalized between 0 and 1. Afterward, the feature selection algorithms were used to select the most informative feature subset. At last, the selected features were fed into the KNN for the classification of twelve different hand movement types (12 classes). The classification process is critically important because it reflects how accurate a myoelectric prosthetic can be. In a nutshell, a myoelectric prosthetic with higher accuracy allows users to perform the hand movement types accurately.
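The per-feature min-max normalization step can be sketched as follows (a generic NumPy sketch; the 72 × 480 shape matches the dimensions above, with random values standing in for the real features):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(72, 480))                # stand-in feature matrix
col_min, col_max = F.min(axis=0), F.max(axis=0)
F_norm = (F - col_min) / (col_max - col_min)  # each feature scaled to [0, 1]
```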

Comparison Algorithms and Evaluation Metrics
In this study, five feature selection algorithms, including BBA [17], BFPA [18], BPSO [14], BDE [13], and BPSODE, were used to evaluate the best feature subset (best combination of features and channels). The specific parameter settings of the utilized algorithms are given in Table 1. To ensure a fair comparison, the maximum number of iterations (T) was fixed at 100, and the population size (N) was chosen as 80. Note that the analysis for the selection of the population size is discussed in Section 4.2.1. All analyses were conducted in MATLAB 9.3 using a computer with an Intel Core i5-9400F CPU at 2.90 GHz and 16.0 GB of RAM. To evaluate the effectiveness of the proposed method, four statistical metrics, including accuracy, feature selection ratio (FSR), precision, and F-measure, were calculated, and they are defined as follows [35-37]:

$$Accuracy = \frac{\text{No. of correctly predicted instances}}{\text{Total number of instances}} \quad (14)$$

$$FSR = \frac{|R|}{|S|} \quad (15)$$

$$Precision = \frac{TP}{TP + FP} \quad (16)$$

$$F\text{-}measure = \frac{2\,TP}{2\,TP + FP + FN} \quad (17)$$

where |R| is the length of the feature subset, |S| is the total number of features, TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. To obtain the statistical results, each algorithm (i.e., BBA, BPSO, BDE, BFPA, and BPSODE) was executed for 20 independent runs. Then, the averaged results obtained from the 20 runs were recorded for performance comparison.
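The four metrics reduce to simple counts; a sketch (function names are ours):

```python
def accuracy(correct, total):
    return correct / total           # fraction of correctly predicted instances

def feature_selection_ratio(r, s):
    return r / s                     # |R| / |S|: fraction of features kept

def precision(tp, fp):
    return tp / (tp + fp)            # Eq. (16)

def f_measure(tp, fp, fn):
    # harmonic mean of precision and recall
    return 2 * tp / (2 * tp + fp + fn)
```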

Effect of Population Size
In the first part of the experiment, we studied the effect of the population size. Briefly, the population size is one of the key factors that can strongly affect the performance of BPSODE in feature selection. A higher population size can usually offer better performance; however, more computation time is required [38]. In this paper, five different population sizes (i.e., 20, 40, 60, 80, and 100) were investigated. Figure 4 illustrates the boxplot of BPSODE with the five different population sizes across 10 subjects. We used accuracy as the evaluation metric since it is the most important measurement in this work. From Figure 4, it is noted that the optimal result was seen with the population size of 80. In comparison with the other population sizes, the population size of 80 contributed to the highest median value (red line in the box) of 92.64%. The result shows that the population size of 80 overwhelmed its competitors in the current work. Thereafter, only the population size of 80 was applied in the rest of this paper.

Comparison Results
In the second part of the experiment, we examined the efficacy of BPSODE by comparing its performance with BBA, BDE, BPSO, and BFPA. Figure 5 illustrates the classification performance of the five different feature selection methods on 10 subjects (detailed results on accuracy can be found in Table 2). As can be seen, the accuracies achieved by BDE and BPSO were relatively poor. This result suggests that the features selected by BDE and BPSO might contain redundant and irrelevant information, which caused them to be trapped in local optima and stagnate early.
From Figure 5, it can be seen that BPSODE reached the highest accuracy in most cases (five out of ten subjects). Especially for Subject 3, a great increment of 4.43% in accuracy was found as compared to BDE. Based on the results obtained, the best feature selection method was found to be BPSODE, followed by BBA. On average across the 10 subjects, the experimental results showed that BPSODE overtook the other algorithms with the highest mean accuracy of 92.5%. Obviously, BPSODE has proven its capability in effectively searching for significant features in the feature space.

Table 2 demonstrates the experimental results of the accuracy, feature selection ratio (FSR), precision, and F-measure of the five different feature selection methods on 10 subjects. In this table, the best result for each metric is highlighted in bold text. A higher FSR means that more features are selected, while a lower FSR indicates that fewer features are selected by the algorithm. On the other hand, the higher the accuracy, precision, and F-measure, the better the performance.
Inspecting the results on FSR, it is seen that roughly half of the original features were eliminated, especially for BPSODE, which facilitated a smaller number of features while keeping a high classification performance. Based on the results obtained, the lowest FSR was achieved by BPSODE in all cases. The reduction in the number of features not only decreased the complexity of the recognition system, but also enhanced the prediction accuracy.
From Table 2, it can be seen that BPSODE scored the highest precision and F-measure values for five subjects. These findings suggest that BPSODE is more capable of solving feature selection problems in EMG signals classification. The superiority of BPSODE mainly comes from the hybridization strategy, which adopts the advantages of both BDE and BPSO for searching for significant features in the feature space.
Furthermore, the statistical t-test with a 95% confidence level was used to examine whether there was a significant difference in classification performance between BPSODE and the other competitors. The results of the t-tests with p-values are presented in Table 3. In this table, the symbols "w/t/l" indicate that BPSODE was significantly better than (win), equal to (tie), or significantly worse than (lose) the other feature selection methods. The t-test shows that BPSODE was significantly better than BDE and BPSO (p-value < 0.05) for at least three subjects. In addition, BPSODE did not produce any significantly worse results against its competitors. This again validates the efficiency of BPSODE for solving the feature selection problem in EMG signals classification. Figure 6 illustrates the convergence curves of the five different feature selection methods on 10 subjects. Note that the fitness is the average fitness value obtained from 20 runs. As can be seen, BPSODE achieved the lowest fitness value on most subjects, followed by BBA. Through the observation in Figure 6, BDE converged quickly at first but failed to accelerate further, which explains why BDE did not work very well for high-dimensional feature selection. On the other side, one can see that BPSO and BBA converged faster at the initial stage; however, as the iterations passed, BPSO and BBA were trapped in local optima. Even though BPSODE did not give the fastest convergence speed, it kept tracking the global optimum, thus maintaining a very good diversity. As a result, BPSODE overtook BBA, BFPA, BPSO, and BDE in evaluating the most informative feature subset.
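A sketch of the paired t-test used here, with hypothetical per-subject accuracies standing in for the real values in Table 2 (`scipy.stats.ttest_rel` would return the p-value directly; the NumPy-only version below compares the statistic against the two-tailed critical value for 9 degrees of freedom):

```python
import numpy as np

# Hypothetical per-subject accuracies (%) for two methods, 10 subjects each.
acc_bpsode = np.array([92.5, 91.0, 93.4, 90.2, 94.1, 91.8, 92.9, 90.7, 93.0, 92.3])
acc_bpso   = np.array([90.1, 89.5, 88.9, 88.0, 91.2, 89.7, 90.3, 88.6, 90.8, 89.9])

d = acc_bpsode - acc_bpso                              # paired differences
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))  # paired t statistic
# Two-tailed critical value of Student's t for df = 9 at the 95% level
significant = abs(t_stat) > 2.262
```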
Table 4 outlines the computational cost of the five different feature selection methods on 10 subjects. As can be observed, BPSODE was computationally efficient in finding the best feature subset. In comparison with BPSO and BDE, BPSODE did not show much of an increment in computation time. This is because, in BPSODE, the BPSO and BDE algorithms are computed in sequence, and thus no additional computation cost is needed for the evaluations.
In this paper, we proposed BPSODE to solve feature selection problems in EMG signals classification. The BPSODE is the hybridization of BPSO and BDE, which inherits the advantages of both BPSO and BDE in feature selection. From the experiments, it can be inferred that the performance of BPSODE was superior to that of BPSO and BDE. In terms of accuracy, FSR, precision, and F-measure values, BPSODE proved to be the most powerful algorithm in this work.
The following observations explain why BPSODE outperformed BPSO and BDE in feature selection. Firstly, the hybridization of BPSO and BDE allowed a good exchange between exploitation and exploration, which restricts BPSODE from being trapped in local optima. Secondly, the dynamic crossover rate offered high diversity in the searching process. Lastly, the implementation of the dynamic inertia weight improved the convergence, which enhanced the performance of BPSODE in searching for the potential solution.
On the whole, the proposed BPSODE outperformed other conventional feature selection methods in exploring the feature search space. The BPSODE not only enhanced the BPSO algorithm with the aid of BDE in exploration, but also prevented itself from being trapped in local solutions. The present study showed that proper hybridization was able to overcome the limitations of two different algorithms, leading to promising results.

Conclusions
In this study, a hybrid binary particle swarm optimization differential evolution (BPSODE) method was proposed to solve the feature selection problem in EMG signals classification. In BPSODE, the BPSO and BDE algorithms are computed in sequence; hence, no extra computational cost is required. Additionally, two simple schemes, the dynamic inertia weight and the dynamic crossover rate, were introduced to improve the convergence and diversity of BPSODE in the searching process. In comparison with BBA, BFPA, BDE, and BPSO, our BPSODE can effectively remove redundant features and maximize the classification accuracy. Consequently, BPSODE overtook the other algorithms in terms of classification performance, FSR, precision, and F-measure values. Therefore, it can be inferred that BPSODE is a powerful feature selection tool, and it can be useful in engineering, rehabilitation, and clinical applications. In the future, the hybridization of other feature selection methods is recommended for tackling feature selection problems.

Figure 1. Flow diagram of the proposed electromyography (EMG) pattern recognition system.


Figure 2. Illustration of the discrete wavelet transform (DWT) at the fourth decomposition level.


Figure 3. Samples of the two schemes: (a) dynamic inertia weight and (b) dynamic crossover rate.

Figure 4. Boxplot of BPSODE with five different population sizes across 10 subjects.


Figure 5. Accuracy of the five different feature selection methods on 10 subjects.


Table 2. Experimental results of the five different feature selection methods on 10 subjects. The best result for each metric is highlighted in bold text.

Table 3. p-values of the t-test on 10 subjects.

Table 4. Computational cost of the five different feature selection methods on 10 subjects.
