
Electronics 2019, 8(10), 1130; https://doi.org/10.3390/electronics8101130

Article
A New Quadratic Binary Harris Hawk Optimization for Feature Selection
1 Fakulti Kejuruteraan Elektrik, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
2 Fakulti Kejuruteraan Elektronik dan Kejuruteraan Komputer, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
* Authors to whom correspondence should be addressed.
Received: 14 September 2019 / Accepted: 30 September 2019 / Published: 7 October 2019

Abstract:
Harris hawk optimization (HHO) is a recently proposed metaheuristic algorithm that has proven to work effectively on several challenging optimization tasks. However, the original HHO was developed for continuous optimization problems, not for problems with binary variables. This paper proposes a binary version of HHO (BHHO) to solve the feature selection problem in classification tasks. The proposed BHHO is equipped with an S-shaped or V-shaped transfer function to convert the continuous variables into binary ones. Moreover, another variant of HHO, namely quadratic binary Harris hawk optimization (QBHHO), is proposed to enhance the performance of BHHO. In this study, twenty-two datasets collected from the UCI machine learning repository are used to validate the performance of the proposed algorithms. A comparative study is conducted to compare the effectiveness of QBHHO with other feature selection algorithms such as binary differential evolution (BDE), genetic algorithm (GA), binary multi-verse optimizer (BMVO), binary flower pollination algorithm (BFPA), and binary salp swarm algorithm (BSSA). The experimental results show the superiority of the proposed QBHHO in terms of classification performance, feature size, and fitness values compared to the other algorithms.
Keywords:
feature selection; binary optimization; classification; Harris hawk optimization; quadratic transfer function

1. Introduction

In recent years, data representation has become one of the essential factors that can significantly affect the performance of classification models. To date, more and more high dimensional data are gathered during data acquisition, which introduces the curse of dimensionality to data mining tasks [1]. In addition, the presence of redundant and irrelevant features is another issue that degrades the performance of the system and brings additional computational cost. Therefore, feature selection has become a critical step in the data mining process. The main goal of feature selection is to select the best combination of potential features that offers a better understanding of the classification model. Feature selection not only improves the prediction accuracy but also reduces the dimension of the data [2,3].
Generally, feature selection methods can be categorized into two classes: filter and wrapper. The filter method is simple and can obtain results quickly. However, the filter method does not depend on the learning algorithm, often resulting in unsatisfactory performance [4,5]. Compared to the filter method, the wrapper method can usually provide higher classification accuracy. The wrapper method includes a machine learning algorithm as part of the evaluation, which enables it to achieve better classification results than the filter method. Thus, wrapper methods have been more widely used in feature selection works [6,7,8].
Wrapper feature selection is known as an NP-hard combinatorial optimization problem in which the number of possible solutions increases exponentially with the number of features. To date, many researchers have adopted metaheuristic algorithms (as wrapper methods) to tackle the feature selection problem in classification tasks. In previous works, metaheuristic algorithms have shown excellent performance when dealing with the feature selection problem. For instance, the binary grey wolf optimization (BGWO) was developed as a wrapper feature selection method to resolve this problem in [9]. Additionally, an enhanced version of the grey wolf optimizer, namely the multi-strategy ensemble grey wolf optimizer (MEGWO), was designed to solve the feature selection problem in real-world applications [10]. The authors proposed an enhanced global-best lead strategy, an adaptable cooperative strategy, and a disperse foraging strategy to boost the performance of GWO. Sindhu et al. [11] introduced an improved sine cosine algorithm (ISCA) that integrated an elitism strategy for feature selection. Jude Hemanth and Anita [12] developed modified genetic algorithms to evolve the usage of the genetic algorithm for tackling the feature selection issue in brain image classification. Moreover, binary versions of the spotted hyena optimizer and the multi-verse optimizer were proposed as wrapper methods to solve feature selection problems [13,14]. Recently, a chaotic dragonfly algorithm (CDA) was developed for wrapper feature selection [15].
In this paper, we propose a binary version of Harris hawk optimization (HHO) to tackle the feature selection problem in classification tasks. HHO is a metaheuristic algorithm recently proposed in 2019. Generally, HHO mimics the behaviors of Harris hawks in nature: exploring for prey, the surprise pounce, and different attack strategies. According to the literature, HHO showed superior performance in several benchmark tests compared to other well-established metaheuristic algorithms [16]. Among the competitors, HHO is highly capable of maintaining a stable balance between exploration (global search) and exploitation (local search), which allows it to score the best properties in optimization tasks. As a bonus, HHO utilizes a series of search strategies in exploitation that enable it to provide a constructive impact on local search. Thus, HHO can be considered a powerful algorithm for optimization problems. However, HHO was originally designed to solve continuous optimization problems and cannot be directly applied to binary optimization problems such as feature selection. To the best of our knowledge, no binary version of HHO has been proposed to solve feature selection problems in the literature. Moreover, according to the No Free Lunch (NFL) theorem, no universal metaheuristic algorithm is good at solving all optimization problems [17]. This motivates us to propose a new binary version of HHO in this work.
In this study, we integrate the S-shaped and V-shaped transfer functions into the algorithm to convert the continuous HHO into a binary version (BHHO). Furthermore, we propose another new variant of HHO, namely quadratic binary Harris hawk optimization (QBHHO), for performance enhancement. Unlike BHHO, QBHHO integrates a quadratic transfer function for the conversion. The proposed BHHO and QBHHO algorithms are used to solve feature selection problems as wrapper methods. Twenty-two datasets collected from the UCI machine learning repository are employed to test the performance of the proposed algorithms in this work. Moreover, five state-of-the-art methods, including binary differential evolution (BDE), genetic algorithm (GA), binary multi-verse optimizer (BMVO), binary flower pollination algorithm (BFPA), and binary salp swarm algorithm (BSSA), are applied to examine the effectiveness of the proposed algorithm in feature selection. The experimental results reveal the superiority of QBHHO not only in classification performance but also in the minimal number of selected features.
The remainder of this paper is organized as follows: Section 2 introduces the standard Harris hawk optimization (HHO) algorithm. Section 3 presents the proposed binary Harris hawk optimization (BHHO) algorithms. In Section 4, the details of the quadratic binary Harris hawk optimization (QBHHO) algorithm are outlined. Section 5 demonstrates the application of the proposed BHHO and QBHHO algorithms to feature selection. Section 6 discusses the findings of the experiments. Section 7 concludes this research.

2. Harris Hawk Optimization

Harris hawk optimization (HHO) is a recent metaheuristic algorithm proposed by Heidari and his colleagues in 2019 [16]. HHO mimics the behaviors of Harris hawks in nature: exploring for prey, the surprise pounce, and different attack strategies. In HHO, the candidate solutions are represented by hawks, while the best solution (nearly optimal solution) is known as the prey. The Harris hawks try to track the prey using their powerful eyes and then perform the surprise pounce to catch the detected prey.
Generally, HHO is modeled with exploration and exploitation phases. The HHO algorithm transfers from exploration to exploitation, and then switches among different exploitation behaviors, based on the escaping energy of the prey. Mathematically, the escaping energy of the prey can be computed as [16]:
E = 2 E0 (1 − t/T)    (1)
E0 = 2r − 1    (2)
where t is the current iteration, T is the maximum number of iterations, E0 is the initial energy randomly generated in [−1, 1], and r is a random number in [0, 1]. Figure 1 shows an illustration of the escaping energy of the prey over 300 iterations. As can be seen, the escaping energy shows a decreasing trend. When the escaping energy of the prey satisfies |E| ≥ 1, HHO allows the hawks to search globally over different regions. On the contrary, HHO tends to promote local search around the neighborhood of the best solutions when |E| < 1.
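The energy schedule of Equations (1) and (2) can be sketched in a few lines (Python here; the paper's experiments were run in MATLAB). Since |E0| ≤ 1, the envelope of |E| decays linearly as 2(1 − t/T), so the second half of a run (t > T/2) always satisfies |E| < 1 and is therefore purely exploitative:

```python
import random

def escaping_energy(t, T):
    """Escaping energy of the prey, Eqs. (1)-(2): E = 2*E0*(1 - t/T), E0 = 2r - 1."""
    E0 = 2 * random.random() - 1      # initial energy in [-1, 1]
    return 2 * E0 * (1 - t / T)
```

Note that E0 is redrawn every iteration, so |E| oscillates inside the shrinking envelope rather than decreasing monotonically.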

2.1. Exploration Phase

In the exploration phase, the position of the hawk is updated based on a random location and the other hawks as follows [16]:
X(t + 1) = Xk(t) − r1 |Xk(t) − 2 r2 X(t)|,   if q ≥ 0.5
X(t + 1) = (Xr(t) − Xm(t)) − r3 (lb + r4 (ub − lb)),   if q < 0.5    (3)
where X is the position of the hawk, Xk is the position of a randomly selected hawk, Xr is the position of the prey (the global best solution in the entire population), t is the current iteration, ub and lb are the upper and lower boundaries of the search space, and r1, r2, r3, r4, and q are five independent random numbers in [0, 1]. Xm is the mean position of the current population of hawks, and it can be computed using Equation (4).
Xm(t) = (1/N) Σ_{n=1}^{N} Xn(t)    (4)
where Xn is the n-th hawk in the population, and N is the number of hawks (population size).
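The exploration update of Equations (3) and (4) might be sketched as follows; the function names and the NumPy vector representation are ours, not the paper's:

```python
import numpy as np

def exploration_step(X, i, X_prey, lb, ub, rng):
    """Exploration update of hawk i when |E| >= 1, Eqs. (3)-(4).
    X is the (N, D) population, X_prey the best solution found so far."""
    r1, r2, r3, r4, q = rng.random(5)
    if q >= 0.5:
        # Perch based on a randomly selected hawk Xk
        Xk = X[rng.integers(X.shape[0])]
        return Xk - r1 * np.abs(Xk - 2.0 * r2 * X[i])
    # Perch relative to the prey and the mean position Xm of Eq. (4)
    Xm = X.mean(axis=0)
    return (X_prey - Xm) - r3 * (lb + r4 * (ub - lb))
```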

2.2. Exploitation Phase

In the exploitation phase, the position of the hawk is updated based on four different situations. The behavior is selected based on the escaping energy of the prey (E) and the chance of the prey successfully escaping (r < 0.5) or not (r ≥ 0.5) before the surprise pounce.

2.2.1. Soft Besiege

The soft besiege happens when r ≥ 0.5 and |E| ≥ 0.5. In this situation, the hawk updates its position using Equation (5):
X(t + 1) = ΔX(t) − E |J Xr(t) − X(t)|    (5)
where E is the escaping energy of prey, X is the position of the hawk, t is the current iteration, ΔX is the difference between the position of the prey and current hawk, and J is the jump strength. The ΔX and J are defined as follows [16]:
ΔX(t) = Xr(t) − X(t)    (6)
J = 2 (1 − r5)    (7)
where r5 is a random number in [0, 1] that changes randomly in each iteration.
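A minimal sketch of the soft besiege step, under the same NumPy conventions as above:

```python
import numpy as np

def soft_besiege(X_i, X_prey, E, rng):
    """Soft besiege update (r >= 0.5 and 0.5 <= |E| < 1), Eqs. (5)-(7)."""
    J = 2.0 * (1.0 - rng.random())                # jump strength, Eq. (7)
    delta = X_prey - X_i                          # Eq. (6)
    return delta - E * np.abs(J * X_prey - X_i)   # Eq. (5)
```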

2.2.2. Hard Besiege

HHO performs the hard besiege when r ≥ 0.5 and |E| < 0.5. In this situation, the position of the hawk is updated as follows [16]:
X(t + 1) = Xr(t) − E |ΔX(t)|    (8)
where X is the position of the hawk, Xr is the position of prey, E is the escaping energy of prey, and ΔX is the difference between the position of the prey and current hawk.

2.2.3. Soft Besiege with Progressive Rapid Dives

The soft besiege with progressive rapid dives occurs when r < 0.5 and |E| ≥ 0.5. The hawk progressively selects the best possible dive to catch the prey. In this circumstance, the new positions of the hawk are generated as follows [16]:
Y = Xr(t) − E |J Xr(t) − X(t)|    (9)
Z = Y + α × Levy(D)    (10)
where Y and Z are two newly generated hawks, E is the escaping energy, J is the jump strength, X is the position of the hawk, t is the current iteration, Xr is the position of prey, D is the total number of dimensions, α is a random vector with dimension D, and Levy is the levy flight function that can be computed as:
Levy(x) = 0.01 × (u × σ) / |v|^(1/β)    (11)
where u, v are two independent random numbers generated from the normal distribution, and σ is defined as:
σ = [ (Γ(1 + β) × sin(πβ/2)) / (Γ((1 + β)/2) × β × 2^((β − 1)/2)) ]^(1/β)    (12)
where β is a default constant set to 1.5. In this phase, the position of the hawk is updated as in Equation (13).
X(t + 1) = Y,   if F(Y) < F(X(t))
X(t + 1) = Z,   if F(Z) < F(X(t))    (13)
where F(.) is the fitness function, Y and Z are two new solutions obtained from Equations (9) and (10).
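The Levy flight of Equations (11) and (12) and the greedy selection of Equation (13) can be sketched as below; note that the greedy rule guarantees the hawk's fitness never worsens in this phase. The helper names are ours:

```python
import math
import numpy as np

def levy_flight(D, beta=1.5, rng=None):
    """Levy step, Eqs. (11)-(12), with u and v drawn from normal distributions."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, D)
    v = rng.normal(0, 1, D)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def soft_besiege_dives(X_i, X_prey, E, fitness, rng):
    """Soft besiege with progressive rapid dives, Eqs. (9)-(13)."""
    D = X_i.size
    J = 2.0 * (1.0 - rng.random())
    Y = X_prey - E * np.abs(J * X_prey - X_i)          # Eq. (9)
    Z = Y + rng.random(D) * levy_flight(D, rng=rng)    # Eq. (10)
    if fitness(Y) < fitness(X_i):                      # greedy rule, Eq. (13)
        return Y
    if fitness(Z) < fitness(X_i):
        return Z
    return X_i
```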

2.2.4. Hard Besiege with Progressive Rapid Dives

The last situation is hard besiege with progressive rapid dives, which is performed when r < 0.5 and |E| < 0.5. In this condition, two new solutions are generated as follows [16]:
Y = Xr(t) − E |J Xr(t) − Xm(t)|    (14)
Z = Y + α × Levy(D)    (15)
where E is the escaping energy, J is the jump strength, Xm is the mean position of the hawks in current population, t is the current iteration, Xr is the position of the prey, D is the total number of dimensions, α is a random vector with dimension D, and Levy is the levy flight function. Afterward, the position of the hawk is updated as:
X(t + 1) = Y,   if F(Y) < F(X(t))
X(t + 1) = Z,   if F(Z) < F(X(t))    (16)
where F(.) is the fitness function, Y and Z are two new solutions obtained from Equations (14) and (15).

3. The Proposed Binary Harris Hawk Optimization

Previous work indicates that HHO can usually offer significantly better results than other well-established optimizers in several benchmark tests. Among rivals, HHO showed highly competitive performance concerning the quality of exploration and exploitation [16]. The main reason for the excellent exploration is the different diversification mechanisms that facilitate the high global search capability of HHO in the initial iterations. As for exploitation, HHO utilizes different Levy flight-based patterns with short-length jumps and a series of searching strategies to boost its exploitative behavior. Moreover, HHO benefits from a dynamic, randomized escaping energy parameter that allows it to retain a smooth transition between local search and global search [16]. These attractive behaviors of HHO motivate us to apply it to the feature selection problem.

3.1. Representation of Solutions

Feature selection is considered a combinatorial binary optimization problem in which the solutions are represented in a binary search space. However, HHO is designed to solve continuous optimization problems and is thus not directly suitable for the feature selection problem. To develop the binary version of HHO, the solutions should be represented in binary form (either 0 or 1). Thus, several modifications are needed to meet this requirement.

3.2. Transformation of Solutions

According to the literature, using a transfer function is one of the most effective ways to convert a continuous optimizer into a binary one. In comparison with other operators, the transfer function is cheaper, simpler, and faster, which eases implementation [18,19]. In most previous studies, researchers employed either an S-shaped or a V-shaped transfer function for the conversion. Therefore, in this study, we use four S-shaped (S1–S4) and four V-shaped (V1–V4) transfer functions to convert the continuous HHO into a binary version. Table 1 shows the S-shaped and V-shaped transfer functions with their mathematical definitions. The illustrations of the S-shaped and V-shaped transfer functions are demonstrated in Figure 2.
By integrating the transfer function into binary HHO (BHHO), the algorithm is able to search the binary search space. In BHHO, the position of the hawk is updated in two stages. In the first stage, BHHO updates the position of the hawk (Xid(t)) into a new position (ΔXid(t + 1)) in the same way as HHO. Note that the new position (ΔXid(t + 1)) is represented in continuous form. In the second stage, the S-shaped or V-shaped transfer function is used to transform the new position into a probability value. The new position of the hawk is then updated using Equation (17) or Equation (18). In this way, the position of the hawk can be expressed in binary form.
In the S-shaped family, BHHO updates the new position of the hawk as follows [20]:
Xid(t + 1) = 1,   if rand(0, 1) < T(ΔXid(t + 1))
Xid(t + 1) = 0,   otherwise    (17)
where T(x) is the S-shaped transfer function, rand(0,1) is a random number in [0,1], X is the position of the hawk, i is the order of hawk in the population, d is the dimension, and t is the current iteration.
Unlike the S-shaped transfer function, the V-shaped transfer function does not force the search agent to 0 or 1. In the V-shaped family, the new position of the hawk is updated as [21]:
Xid(t + 1) = ¬Xid(t),   if rand(0, 1) < T(ΔXid(t + 1))
Xid(t + 1) = Xid(t),   otherwise    (18)
where T(x) is the V-shaped transfer function, rand(0,1) is a random number in [0,1], X is the position of the hawk, i is the order of hawk in the population, d is the dimension, t is the current iteration, and ¬ X is the complement of X.
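The two-stage binary update can be sketched as follows. Since Table 1 is not reproduced in this text, the sketch uses one standard member of each family from the transfer-function literature (the sigmoid for S-shaped and |tanh| for V-shaped); treat that choice as an assumption:

```python
import numpy as np

def s_shaped(x):
    """A standard S-shaped transfer function (the classic sigmoid)."""
    return 1.0 / (1.0 + np.exp(-x))

def v_shaped(x):
    """A standard V-shaped transfer function, |tanh(x)|."""
    return np.abs(np.tanh(x))

def binary_update(x_old, dx_new, family, rng):
    """Second-stage binary update: Eq. (17) for the S-shaped family,
    Eq. (18) for the V-shaped family."""
    if family == "S":
        p = s_shaped(dx_new)
        return (rng.random(p.shape) < p).astype(int)   # Eq. (17): set the bit
    p = v_shaped(dx_new)
    flip = rng.random(p.shape) < p
    return np.where(flip, 1 - x_old, x_old)            # Eq. (18): flip the bit
```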

3.3. Binary Harris Hawk Optimization Algorithm

The pseudocode of BHHO is demonstrated in Algorithm 1.
Algorithm 1. Binary Harris hawk optimization.
Inputs: N and T
1:  Initialize the Xi for N hawks
2:  for (t = 1 to T)
3:   Evaluate the fitness value of the hawks, F(X)
4:   Define the best solution as Xr
5:   for (i = 1 to N)
6:     Compute the E0 and J as shown in (2) and (7), respectively
7:    Update the E using (1)
     // Exploration phase //
8:    if (|E| ≥ 1)
9:       Update the position of the hawk using (3)
10:     Calculate the probability using S-shaped or V-shaped transfer function
11:     Update new position of the hawk using (17) or (18)
     // Exploitation phase //
12:   elseif (|E| < 1)
     // Soft besiege //
13:       if (r ≥ 0.5) and (|E| ≥ 0.5)
14:        Update the position of the hawk as shown in (5)
15:        Calculate the probability using S-shaped or V-shaped transfer function
16:        Update new position of the hawk using (17) or (18)
     // Hard besiege //
17:       elseif (r ≥ 0.5) and (|E| < 0.5)
18:        Update the position of the hawk using (8)
19:        Calculate the probability using S-shaped or V-shaped transfer function
20:        Update new position of the hawk using (17) or (18)
     // Soft besiege with progressive rapid dives //
21:     elseif (r < 0.5) and (|E| ≥ 0.5)
22:        Update the position of the hawk using (13)
23:        Calculate the probability using S-shaped or V-shaped transfer function
24:      Update new position of the hawk using (17) or (18)
     // Hard besiege with progressive rapid dives //
25:     elseif (r < 0.5) and (|E| < 0.5)
26:        Update the position of the hawk using (16)
27:        Calculate the probability using S-shaped or V-shaped transfer function
28:        Update new position of the hawk using (17) or (18)
29:     end if
30:    end if
31:   next i
32:   Update Xr if there is a better solution
33:  next t
Output: Global best solution

4. The Proposed Quadratic Binary Harris Hawk Optimization

In the previous section, the BHHO algorithms have been discussed. However, in the experiments, we found that the performance of BHHO was still far from perfect, which indicates that BHHO cannot efficiently solve the feature selection problems. As mentioned above, the transfer function is one of the simplest and most effective ways to convert HHO into a binary version. However, BHHO with an S-shaped or V-shaped transfer function might not work effectively in feature selection tasks. In this regard, we propose the new quadratic binary Harris hawk optimization (QBHHO) to enhance the performance of BHHO in the current work. Unlike BHHO, QBHHO utilizes a quadratic transfer function for the conversion. In this work, we propose four quadratic transfer functions (Q1–Q4), as shown in Table 2. Figure 3 presents the illustrations of the quadratic transfer functions.
QBHHO first updates the position of the hawk and then converts the new position into a probability value using the quadratic transfer function. Afterward, the following rule is used for updating the hawk's position [22]:
Xid(t + 1) = ¬Xid(t),   if rand(0, 1) < T(ΔXid(t + 1))
Xid(t + 1) = Xid(t),   otherwise    (19)
where T(x) is the quadratic transfer function, rand(0,1) is a random number in [0,1], X is the position of the hawk, i is the order of hawk in the population, d is the dimension, t is the current iteration, and ¬ X is the complement of X. The pseudocode of QBHHO is demonstrated in Algorithm 2.
Algorithm 2. Quadratic binary Harris hawk optimization.
Inputs: N and T
1:  Initialize the Xi for N hawks
2:  for (t = 1 to T)
3:   Evaluate the fitness value of the hawks, F(X)
4:   Define the best solution as Xr
5:   for (i = 1 to N)
6:        Compute the E0 and J as shown in (2) and (7), respectively
7:    Update the E using (1)
    // Exploration phase //
8:    if (|E| ≥ 1)
9:       Update the position of the hawk using (3)
10:     Calculate the probability using quadratic transfer function
11:     Update new position of the hawk using (19)
     // Exploitation phase //
12:   elseif (|E| < 1)
     // Soft besiege //
13:       if (r ≥ 0.5) and (|E| ≥ 0.5)
14:        Update the position of the hawk as shown in (5)
15:        Calculate the probability using quadratic transfer function
16:        Update new position of the hawk using (19)
     // Hard besiege //
17:       elseif (r ≥ 0.5) and (|E| < 0.5)
18:        Update the position of the hawk using (8)
19:        Calculate the probability using quadratic transfer function
20:        Update new position of the hawk using (19)
     // Soft besiege with progressive rapid dives //
21:     elseif (r < 0.5) and (|E| ≥ 0.5)
22:        Update the position of the hawk as shown in (13)
23:        Calculate the probability using quadratic transfer function
24:        Update new position of the hawk using (19)
     // Hard besiege with progressive rapid dives //
25:     elseif (r < 0.5) and (|E| < 0.5)
26:        Update the position of the hawk using (16)
27:        Calculate the probability using quadratic transfer function
28:        Update new position of the hawk using (19)
29:     end if
30:    end if
31:   next i
32:   Update Xr if there is a better solution
33:  next t
Output: Global best solution
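The QBHHO update of Equation (19) can be sketched as below. The exact Q1–Q4 definitions appear in Table 2, which is not reproduced in this text, so the quadratic form T(x) = min(1, (|x|/x_max)²) and the cutoff x_max = 6 used here are illustrative assumptions only:

```python
import numpy as np

def quadratic_transfer(x, x_max=6.0):
    """Illustrative quadratic transfer function (an assumption standing in
    for the Q1-Q4 definitions of Table 2): T(x) = min(1, (|x| / x_max)^2)."""
    return np.minimum(1.0, (np.abs(x) / x_max) ** 2)

def qbhho_update(x_old, dx_new, rng):
    """Bit-flip rule of Eq. (19), same form as the V-shaped update."""
    p = quadratic_transfer(dx_new)
    flip = rng.random(p.shape) < p
    return np.where(flip, 1 - x_old, x_old)
```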

5. Application of Proposed BHHO and QBHHO for Feature Selection

In this section, the application of the proposed BHHO and QBHHO algorithms to the feature selection problem is presented. Generally, feature selection is an NP-hard combinatorial binary optimization problem in which the number of possible solutions increases exponentially with the number of features. Let D be the total number of features. The number of possible solutions is 2^D − 1, which makes an exhaustive search impractical. Therefore, we propose BHHO and QBHHO to automatically find a promising solution (feature subset) that can significantly improve the performance of the classification model.
In feature selection, the solutions are represented in binary form, i.e., each bit is either '0' or '1'. Bit '1' indicates that the feature is selected, whereas bit '0' represents an unselected feature [23]. Taking the sample solution (a solution consisting of 10 features) in Figure 4 as an example, a total of five features (1st, 3rd, 4th, 7th, and 8th) have been selected.
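Decoding such a bit mask into a feature subset is a one-liner; the example below reproduces the Figure 4 solution:

```python
import numpy as np

# The ten-feature example of Figure 4: bit '1' marks a selected feature.
mask = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 0])

selected = np.flatnonzero(mask) + 1   # 1-based feature indices
print(selected.tolist())              # [1, 3, 4, 7, 8]
```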
In wrapper feature selection, a fitness function (objective function) is required to evaluate the individual solution. The primary goal of feature selection is to enhance prediction accuracy and to reduce the number of features. In this work, the fitness function that considers both criteria is utilized, and it is defined as [9]:
Fitness = α × Error + (1 − α) × |S| / |F|    (20)
Error = (No. of wrongly predicted instances) / (Total number of instances)    (21)
where Error is the error rate computed by a learning algorithm, |S| is the length of feature subset, |F| is the total number of features, and α is a parameter used to control the influence of classification performance and feature size. In Equation (20), the first term is the classification performance, while the second term is the feature reduction.
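A direct transcription of Equations (20) and (21), with the paper's α = 0.99 as the default:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Wrapper fitness, Eq. (20): alpha * Error + (1 - alpha) * |S| / |F|."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total

# With alpha = 0.99 the error term dominates: halving the feature count is
# worth less than a 1% change in the error rate.
print(round(fitness(0.10, 5, 10), 4))   # 0.104
```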

6. Experiment and Results

6.1. Dataset

In this study, twenty-two benchmark datasets collected from the UCI machine learning repository are used to validate the performance of the proposed approaches [24]. Table 3 outlines the datasets used in this work. For each dataset, the features are normalized to between 0 and 1 to prevent numerical problems.

6.2. Parameter Settings

In the present study, the k-nearest neighbor (KNN) algorithm with Euclidean distance and k = 5 is used to compute the error rate (the first term of the fitness function). Different BHHO and QBHHO algorithms are employed to find the most informative feature subset. The algorithms are repeated 30 times with different random seeds. In addition, K-fold cross-validation is applied to compute the error rate in the fitness function to prevent overfitting [15]. In each of the 30 runs, the dataset is partitioned into K equal parts. One part is used as the testing set, while the remaining K − 1 parts are used as the training set. The procedure is repeated K times using different parts for the testing and training sets, and the average results are recorded. Finally, the average statistical measurements collected over the 30 independent runs are reported as the final results. In previous works, KNN was shown to be fast, simple, and easy to implement [13,25,26]. Thus, KNN is chosen as the learning algorithm in this work.
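The evaluation loop can be sketched as below with a hand-rolled 5-NN classifier and a manual K-fold split (the paper ran its experiments in MATLAB; the helper names here are ours):

```python
import numpy as np

def knn_error(X_tr, y_tr, X_te, y_te, k=5):
    """Error rate of a k-NN classifier with Euclidean distance (k = 5 here)."""
    wrong = 0
    for x, y in zip(X_te, y_te):
        d = np.linalg.norm(X_tr - x, axis=1)
        votes = y_tr[np.argsort(d)[:k]]        # labels of the k nearest neighbors
        pred = np.bincount(votes).argmax()     # majority vote
        wrong += int(pred != y)
    return wrong / len(y_te)

def kfold_error(X, y, k_folds=10, k=5, seed=0):
    """Average k-NN error over K folds, as used inside the fitness function."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k_folds):
        tr = np.setdiff1d(idx, fold)           # remaining K - 1 parts for training
        errs.append(knn_error(X[tr], y[tr], X[fold], y[fold], k))
    return float(np.mean(errs))
```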
All the experiments are executed in MATLAB 9.3 on a PC with an Intel Core i5-9400F CPU at 2.90 GHz and 16.0 GB of RAM. In this study, we set K = 10. The number of hawks (population size) is set to 10, and the maximum number of iterations is 100. This hyper-parameter setting was utilized in various previous works [27,28]. The dimension of the search space is equal to the total number of features of each dataset. Following [9,28,29], we choose α = 0.99 since classification performance is the most important criterion in the current work.

6.3. Evaluation of Proposed BHHO and QBHHO Algorithms

In the first part of the experiment, the performances of the BHHO and QBHHO algorithms are validated on the 22 datasets. The commonly used evaluation metrics, including the best fitness value, mean fitness value, standard deviation of the fitness value (STD), classification accuracy, and number of selected features (feature size), are measured; they are defined as follows [9,30,31]:
Best Fitness = min_{m=1,...,R} Gb_m    (22)
Mean Fitness = (1/R) Σ_{m=1}^{R} Gb_m    (23)
STD = sqrt( Σ_{m=1}^{R} (Gb_m − μ)² / (R − 1) )    (24)
Average accuracy = (1/R) Σ_{m=1}^{R} (No. of correctly predicted_m / Total number of instances)    (25)
Average feature size = (1/R) Σ_{m=1}^{R} |S|_m    (26)
where Gb is the global best solution obtained from run m, µ is the mean fitness, |S| is the number of selected features, m is the order of run, and R is the maximum number of runs.
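The run-level statistics of Equations (22)–(24) reduce to a few lines; the Gb values below are hypothetical, for illustration only:

```python
import statistics

# Global best fitness from R = 5 hypothetical independent runs.
gb = [0.12, 0.10, 0.11, 0.13, 0.10]

best_fitness = min(gb)                    # Eq. (22)
mean_fitness = sum(gb) / len(gb)          # Eq. (23)
std_fitness = statistics.stdev(gb)        # Eq. (24): sample STD with R - 1
print(best_fitness, round(mean_fitness, 3))
```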
The twelve proposed approaches, BHHO (S1–S4 and V1–V4) and QBHHO (Q1–Q4), are compared to investigate the best binary version of HHO for feature selection. Table 4, Table 5 and Table 6 present the experimental results of the best, mean, and STD fitness values of the proposed algorithms. In these tables, the best result is highlighted in bold. In Table 4, BHHO-V1 scored the optimal best fitness value on most of the datasets (twelve datasets), followed by QBHHO-Q4 (eleven datasets). From Table 5, the algorithm that contributed the lowest mean fitness value was found to be QBHHO-Q4 (twelve datasets), followed by BHHO-S1 and BHHO-S2 (five datasets). This shows that QBHHO-Q4 offered the best overall search performance. Meanwhile, BHHO-S3 yielded the most consistent results, with the lowest STD values on eight datasets. Furthermore, the convergence curves of the proposed algorithms on the 22 datasets are shown in Figure 5 and Figure 6.
Table 7 and Table 8 outline the results of the average classification accuracy and average feature size of the proposed algorithms. As can be observed, QBHHO-Q4 outperformed the other algorithms in finding promising solutions, thus leading to optimal classification accuracy. Among rivals, QBHHO-Q4 achieved the highest average classification accuracy on eleven datasets. This shows that the quadratic transfer function can usually overtake the S-shaped and V-shaped transfer functions in feature selection. On the other hand, QBHHO-Q2 provided the smallest number of selected features on most of the datasets, followed by BHHO-V4. Even though the V-shaped transfer function can significantly reduce the number of features, some relevant features are eliminated, thus resulting in unsatisfactory performance. On the whole, QBHHO with the quadratic transfer function Q4 is the best binary version of HHO in the current work. Therefore, only QBHHO with transfer function Q4 is used in the rest of this paper.

6.4. Comparison with Other Metaheuristic Algorithms

In the second part of the experiment, five recent and popular metaheuristic algorithms, including binary differential evolution (BDE) [32], binary flower pollination algorithm (BFPA) [33], binary multi-verse optimizer (BMVO) [14], binary salp swarm algorithm (BSSA) [28], and genetic algorithm (GA) [34], are applied to examine the efficacy and efficiency of the proposed QBHHO on the feature selection problem. BDE is a variant of differential evolution (DE) that consists of differentiation, mutation, crossover, and selection processes. BFPA is a binary version of the flower pollination algorithm (FPA) in which the S2 transfer function is implemented for conversion. BMVO integrates the V4 transfer function to convert the continuous variables into binary ones. BSSA is a binary variant of the salp swarm algorithm (SSA) that implements the V3 transfer function. GA comprises parent selection, crossover, and mutation operators. We utilize a simple GA with roulette wheel selection and single-point crossover for comparison. Table 9 lists the parameter settings of the utilized algorithms.
Table 10, Table 11 and Table 12 display the experimental results of the best, mean, and STD fitness values of the six different algorithms. In these tables, the best results are bolded. Based on the results obtained, QBHHO showed competitive performance in feature selection. In comparison with BDE, BFPA, BMVO, BSSA, and GA, QBHHO was highly capable of finding a nearly optimal solution. From Table 10, QBHHO yielded the optimal best fitness value on fourteen datasets, which overwhelmed the other competitors in this work. This result shows that the quadratic transfer function helps the algorithm find the optimal solution. Moreover, QBHHO offered the lowest mean fitness values on most of the datasets. This again validates the efficacy of QBHHO in exploring untried areas when searching for the global optimum. In Table 12, BFPA yielded the smallest STD value in most cases, which contributed to highly consistent results. However, BFPA cannot find the optimal solution very well, thus leading to ineffective results.
Figure 7 and Figure 8 demonstrate the convergence curves of the six different algorithms on the 22 datasets. As can be seen, QBHHO can usually offer high diversity. Taking dataset 7 (horse colic) and dataset 21 (diabetic) as examples, QBHHO converged faster to a promising solution, overtaking the other algorithms in feature selection. That is, QBHHO keeps tracking the global optimum, thus leading to a high-quality solution. This can be attributed to the strong search properties of the HHO algorithm and the superiority of the quadratic transfer function employed in QBHHO.
Table 13 presents the experimental results of the average classification accuracy of the six algorithms; the best results are highlighted in bold. According to Table 13, the average classification accuracy obtained by QBHHO was far superior to the other competitors in most cases: out of 22 datasets, QBHHO contributed the highest average classification accuracy on at least twelve. Figure 9 and Figure 10 exhibit the boxplots of the six algorithms on the 22 datasets; the red line in each box represents the median value, and the symbol ‘+’ denotes an outlier. As can be seen, QBHHO scored the highest median value in most cases, providing better classification performance than the BDE, BMVO, GA, BSSA, and BFPA algorithms.
Table 14 outlines the experimental results of the average feature size of the six algorithms. Inspecting Table 14, QBHHO chose the fewest features in most cases (15 datasets), while BSSA selected the fewest features on nine datasets. Based on these results, QBHHO was good at finding a small subset of relevant features, which led to high classification performance.
Furthermore, we apply the Wilcoxon signed-rank test to examine whether the performance of the proposed QBHHO is significantly better than that of the other algorithms. In the Wilcoxon signed-rank test, if the achieved p-value is less than 0.05, the performances of the two algorithms are significantly different; otherwise, they are considered similar. Table 15 exhibits the p-values of the Wilcoxon signed-rank test. In this table, the notation “w/t/l” indicates the number of datasets on which the proposed QBHHO was significantly better than (win), statistically equal to (tie), or significantly worse than (lose) the other algorithm. Note that results with a p-value greater than 0.05 are underlined. Inspecting Table 15, the performance of QBHHO was significantly better than that of the other algorithms on most datasets. These results again verify the superiority of QBHHO for solving the feature selection problem in classification tasks. To sum up, the experiments show the excellent properties of QBHHO in terms of classification accuracy and feature size. In conclusion, QBHHO is a useful tool that is appropriate for rehabilitation, clinical, and engineering applications.
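A sketch of how such a pairwise test can be run with SciPy's implementation is shown below; the per-run accuracy samples are synthetic placeholders, not the paper's data.

```python
import random
from scipy.stats import wilcoxon  # SciPy's paired, non-parametric test

rng = random.Random(42)
# Synthetic per-run accuracies of two algorithms over 30 independent runs.
acc_a = [0.90 + rng.gauss(0.0, 0.01) for _ in range(30)]
acc_b = [0.88 + rng.gauss(0.0, 0.01) for _ in range(30)]

stat, p = wilcoxon(acc_a, acc_b)
verdict = "significant" if p < 0.05 else "similar"
```

The test is paired (run i of one algorithm against run i of the other) and makes no normality assumption, which is why it is the standard choice for comparing stochastic metaheuristics over repeated runs.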

7. Conclusions

In this paper, the BHHO and QBHHO algorithms are proposed to tackle the feature selection problem in classification tasks. BHHO integrates an S-shaped or V-shaped transfer function to convert the continuous HHO into a binary version, while QBHHO introduces a quadratic transfer function to enhance the performance of BHHO in feature selection. The proposed algorithms are validated on 22 benchmark datasets from the UCI machine learning repository, and their performances are evaluated based on the best fitness value, mean fitness value, standard deviation of the fitness value, classification accuracy, and feature size. Among the BHHO and QBHHO variants, QBHHO with the quadratic transfer function Q4 offered the best performance in the current work. Furthermore, the performance of QBHHO is compared with other algorithms, including BDE, BFPA, BMVO, BSSA, and GA. The experimental results show that the proposed QBHHO usually achieves the highest classification accuracy as well as the smallest feature size when dealing with feature selection tasks. All in all, QBHHO is a powerful tool for solving the feature selection problem in classification tasks.
In future work, QBHHO can be applied to other feature selection and real-world problems. In addition, classifiers such as the support vector machine (SVM) and neural networks (NN) can be implemented to enhance the performance of the current algorithm. Moreover, QBHHO can be employed to solve other binary optimization problems, such as the knapsack problem and neural network optimization. Lastly, the quadratic transfer function can be integrated into other metaheuristic algorithms to solve binary optimization problems.

Author Contributions

Conceptualization: J.T.; formal analysis: J.T.; funding acquisition: J.T. and A.R.A.; investigation: J.T.; methodology: J.T.; resources: J.T.; software: J.T.; supervision: A.R.A.; validation: J.T.; visualization: J.T.; writing—original draft: J.T.; writing—review and editing: J.T., A.R.A., and N.M.S.

Funding

This research and the Article Processing Charge were funded by the Skim Zamalah UTeM.

Acknowledgments

The authors would like to thank the Skim Zamalah UTeM for funding this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tran, B.; Xue, B.; Zhang, M. Genetic programming for multiple-feature construction on high-dimensional classification. Pattern Recognit. 2019, 93, 404–417. [Google Scholar] [CrossRef]
  2. Qiu, C. A Novel Multi-Swarm Particle Swarm Optimization for Feature Selection. Genet. Program. Evol. Mach. 2019, 1–27. [Google Scholar] [CrossRef]
  3. Jia, H.; Li, J.; Song, W.; Peng, X.; Lang, C.; Li, Y. Spotted Hyena Optimization Algorithm With Simulated Annealing for Feature Selection. IEEE Access 2019, 7, 71943–71962. [Google Scholar] [CrossRef]
  4. Hu, L.; Gao, W.; Zhao, K.; Zhang, P.; Wang, F. Feature selection considering two types of feature relevancy and feature interdependency. Expert Syst. Appl. 2018, 93, 423–434. [Google Scholar] [CrossRef]
  5. Yan, K.; Ma, L.; Dai, Y.; Shen, W.; Ji, Z.; Xie, D. Cost-sensitive and sequential feature selection for chiller fault detection and diagnosis. Int. J. Refrig. 2018, 86, 401–409. [Google Scholar] [CrossRef]
  6. Bharti, K.K.; Singh, P.K. Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering. Appl. Soft Comput. 2016, 43, 20–34. [Google Scholar] [CrossRef]
  7. Emary, E.; Zawbaa, H.M. Feature selection via Lévy Antlion optimization. Pattern Anal. Appl. 2018, 22, 857–876. [Google Scholar] [CrossRef]
  8. Too, J.; Abdullah, A.R.; Saad, N.M. Hybrid Binary Particle Swarm Optimization Differential Evolution-Based Feature Selection for EMG Signals Classification. Axioms 2019, 8, 79. [Google Scholar] [CrossRef]
  9. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary grey wolf optimization approaches for feature selection. Neurocomputing 2016, 172, 371–381. [Google Scholar] [CrossRef]
  10. Tu, Q.; Chen, X.; Liu, X. Multi-strategy ensemble grey wolf optimizer and its application to feature selection. Appl. Soft Comput. 2019, 76, 16–30. [Google Scholar] [CrossRef]
  11. Sindhu, R.; Ngadiran, R.; Yacob, Y.M.; Zahri, N.A.H.; Hariharan, M. Sine–cosine algorithm for feature selection with elitism strategy and new updating mechanism. Neural Comput. Appl. 2017, 28, 2947–2958. [Google Scholar] [CrossRef]
  12. Hemanth, D.J.; Anitha, J. Modified Genetic Algorithm approaches for classification of abnormal Magnetic Resonance Brain tumour images. Appl. Soft Comput. 2019, 75, 21–28. [Google Scholar] [CrossRef]
  13. Kumar, V.; Kaur, A. Binary spotted hyena optimizer and its application to feature selection. J. Ambient. Intell. Humaniz. Comput. 2019, 1–21. [Google Scholar] [CrossRef]
  14. Al-Madi, N.; Faris, H.; Mirjalili, S. Binary multi-verse optimization algorithm for global optimization and discrete problems. Int. J. Mach. Learn. Cybern. 2019, 1–21. [Google Scholar] [CrossRef]
  15. Sayed, G.I.; Tharwat, A.; Hassanien, A.E. Chaotic dragonfly algorithm: An improved metaheuristic algorithm for feature selection. Appl. Intell. 2019, 49, 188–205. [Google Scholar] [CrossRef]
  16. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Futur. Gener. Comput. Syst. 2019, 97, 849–872. [Google Scholar] [CrossRef]
  17. Wolpert, D.; Macready, W. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef]
  18. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
  19. Saremi, S.; Mirjalili, S.; Lewis, A. How important is a transfer function in discrete heuristic algorithms. Neural Comput. Appl. 2015, 26, 625–640. [Google Scholar] [CrossRef]
  20. Rodrigues, D.; Pereira, L.A.; Nakamura, R.Y.; Costa, K.A.; Yang, X.-S.; Souza, A.N.; Papa, J.P. A wrapper approach for feature selection based on Bat Algorithm and Optimum-Path Forest. Expert Syst. Appl. 2014, 41, 2250–2258. [Google Scholar] [CrossRef]
  21. Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat Comput 2010, 9, 727–745. [Google Scholar] [CrossRef]
  22. Jordehi, A.R. Binary particle swarm optimisation with quadratic transfer function: A new binary optimisation algorithm for optimal scheduling of appliances in smart homes. Appl. Soft Comput. 2019, 78, 465–480. [Google Scholar] [CrossRef]
  23. Hancer, E.; Xue, B.; Karaboga, D.; Zhang, M. A binary ABC algorithm based on advanced similarity scheme for feature selection. Appl. Soft Comput. 2015, 36, 334–348. [Google Scholar] [CrossRef]
  24. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 24 March 2019).
  25. Xue, B.; Zhang, M.; Browne, W.N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. 2014, 18, 261–276. [Google Scholar] [CrossRef]
  26. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Al-Zoubi, A.M.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286. [Google Scholar] [CrossRef]
  27. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl. Based Syst. 2018, 161, 185–204. [Google Scholar] [CrossRef]
  28. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Al-Zoubi, A.M.; Mirjalili, S.; Fujita, H. An efficient binary Salp Swarm Algorithm with crossover scheme for feature selection problems. Knowl. Based Syst. 2018, 154, 43–67. [Google Scholar] [CrossRef]
  29. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65. [Google Scholar] [CrossRef]
  30. Zawbaa, H.M.; Emary, E.; Grosan, C. Feature Selection via Chaotic Antlion Optimization. PLoS ONE 2016, 11, e0150652. [Google Scholar] [CrossRef]
  31. Too, J.; Abdullah, A.R.; Saad, N.M. A New Co-Evolution Binary Particle Swarm Optimization with Multiple Inertia Weight Strategy for Feature Selection. Informatics 2019, 6, 21. [Google Scholar] [CrossRef]
  32. Zorarpacı, E.; Özel, S.A. A hybrid approach of differential evolution and artificial bee colony for feature selection. Expert Syst. Appl. 2016, 62, 91–103. [Google Scholar] [CrossRef]
  33. Rodrigues, D.; Yang, X.S.; De Souza, A.N.; Papa, J.P. Binary Flower Pollination Algorithm and Its Application to Feature Selection. In Recent Advances in Swarm Intelligence and Evolutionary Computation; Studies in Computational Intelligence; Springer: Cham, Switzerland, 2015; pp. 85–100. ISBN 978-3-319-13825-1. [Google Scholar]
  34. De Stefano, C.; Fontanella, F.; Marrocco, C.; Di Freca, A.S. A GA-based feature selection approach with an application to handwritten character recognition. Pattern Recognit. Lett. 2014, 35, 130–141. [Google Scholar] [CrossRef]
Figure 1. An illustration of escaping energy.
Figure 2. Sample S-shaped and V-shaped transfer functions. (a) S-shaped transfer functions and (b) V-shaped transfer functions.
Figure 3. Sample quadratic transfer functions with xmax = 6.
Figure 4. Sample solution with 10 dimensions.
Figure 5. Convergence curves of proposed algorithms on datasets 1–11.
Figure 6. Convergence curves of proposed algorithms on datasets 12–22.
Figure 7. Convergence curves of six different algorithms on datasets 1–11.
Figure 8. Convergence curves of six different algorithms on datasets 12–22.
Figure 9. Boxplot of six different algorithms on datasets 1–11.
Figure 10. Boxplot of six different algorithms on datasets 12–22.
Table 1. The utilized S-shaped and V-shaped transfer functions.
S-Shaped Family | Transfer Function | V-Shaped Family | Transfer Function
S1 | T(x) = 1 / (1 + e^(−2x)) | V1 | T(x) = |erf((√π/2)·x)|
S2 | T(x) = 1 / (1 + e^(−x)) | V2 | T(x) = |tanh(x)|
S3 | T(x) = 1 / (1 + e^(−x/2)) | V3 | T(x) = |x / √(1 + x²)|
S4 | T(x) = 1 / (1 + e^(−x/3)) | V4 | T(x) = |(2/π)·arctan((π/2)·x)|
Table 2. The utilized quadratic transfer functions.
Name | Transfer Function
Q1 | T(x) = |x / (0.5·x_max)| if x < 0.5·x_max; 1 otherwise
Q2 | T(x) = (x / (0.5·x_max))² if x < 0.5·x_max; 1 otherwise
Q3 | T(x) = (x / (0.5·x_max))³ if x < 0.5·x_max; 1 otherwise
Q4 | T(x) = (x / (0.5·x_max))^(1/2) if x < 0.5·x_max; 1 otherwise
Table 3. The list of used datasets.
No. | Dataset | Number of Instances | Number of Features
1 | Glass | 214 | 10
2 | Hepatitis | 155 | 19
3 | Iris | 150 | 4
4 | Lymphography | 148 | 18
5 | Primary Tumor | 339 | 17
6 | Soybean | 307 | 35
7 | Horse Colic | 368 | 27
8 | Ionosphere | 351 | 34
9 | Zoo | 101 | 16
10 | Wine | 178 | 13
11 | Breast Cancer W | 699 | 9
12 | Lung Cancer | 32 | 56
13 | Musk 1 | 476 | 166
14 | Arrhythmia | 452 | 279
15 | Dermatology | 366 | 34
16 | SPECT Heart | 267 | 22
17 | Libras Movement | 360 | 90
18 | ILPD | 583 | 10
19 | Seeds | 210 | 7
20 | LSVT | 126 | 310
21 | Diabetic | 1151 | 19
22 | Parkinson | 756 | 754
Table 4. Experimental result of the best fitness value of the proposed algorithms.
Dataset | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 | Q1 | Q2 | Q3 | Q4
1 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104
2 | 0.1180 | 0.1214 | 0.1209 | 0.1235 | 0.1143 | 0.1204 | 0.1138 | 0.1143 | 0.1143 | 0.1143 | 0.1143 | 0.1143
3 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314
4 | 0.1198 | 0.1204 | 0.1291 | 0.1269 | 0.1230 | 0.1230 | 0.1230 | 0.1230 | 0.1198 | 0.1230 | 0.1230 | 0.1230
5 | 0.5771 | 0.5771 | 0.5771 | 0.5849 | 0.5795 | 0.5771 | 0.5885 | 0.5879 | 0.5885 | 0.5908 | 0.5885 | 0.5765
6 | 0.2156 | 0.2268 | 0.2341 | 0.2337 | 0.2110 | 0.2077 | 0.2077 | 0.2171 | 0.2171 | 0.2215 | 0.2143 | 0.2116
7 | 0.1472 | 0.1621 | 0.1463 | 0.1559 | 0.1190 | 0.1190 | 0.1190 | 0.1190 | 0.1190 | 0.1190 | 0.1190 | 0.1190
8 | 0.0719 | 0.0903 | 0.0954 | 0.0966 | 0.0637 | 0.0637 | 0.0659 | 0.0637 | 0.0659 | 0.0691 | 0.0688 | 0.0637
9 | 0.0353 | 0.0341 | 0.0335 | 0.0335 | 0.0334 | 0.0334 | 0.0334 | 0.0248 | 0.0334 | 0.0335 | 0.0341 | 0.0334
10 | 0.0112 | 0.0104 | 0.0112 | 0.0104 | 0.0155 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0104 | 0.0112 | 0.0104
11 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303 | 0.0303
12 | 0.2032 | 0.2356 | 0.2342 | 0.2683 | 0.1985 | 0.1670 | 0.1994 | 0.1987 | 0.1989 | 0.1989 | 0.1680 | 0.1670
13 | 0.1080 | 0.1051 | 0.1080 | 0.1080 | 0.0752 | 0.0761 | 0.0814 | 0.0791 | 0.0822 | 0.0796 | 0.0786 | 0.0843
14 | 0.3359 | 0.3488 | 0.3562 | 0.3615 | 0.2535 | 0.2555 | 0.2533 | 0.2579 | 0.2600 | 0.2754 | 0.2620 | 0.2754
15 | 0.0178 | 0.0172 | 0.0172 | 0.0172 | 0.0164 | 0.0142 | 0.0142 | 0.0148 | 0.0142 | 0.0154 | 0.0166 | 0.0148
16 | 0.1463 | 0.1463 | 0.1492 | 0.1459 | 0.1369 | 0.1412 | 0.1479 | 0.1450 | 0.1445 | 0.1445 | 0.1450 | 0.1412
17 | 0.2110 | 0.2124 | 0.2142 | 0.2107 | 0.1800 | 0.1806 | 0.1855 | 0.1857 | 0.1884 | 0.1914 | 0.1855 | 0.1925
18 | 0.2525 | 0.2525 | 0.2542 | 0.2542 | 0.2525 | 0.2525 | 0.2525 | 0.2525 | 0.2525 | 0.2525 | 0.2542 | 0.2525
19 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467
20 | 0.0970 | 0.0972 | 0.0960 | 0.1033 | 0.0416 | 0.0510 | 0.0502 | 0.0498 | 0.0595 | 0.0582 | 0.0581 | 0.0505
21 | 0.2824 | 0.2824 | 0.2895 | 0.2895 | 0.2755 | 0.2793 | 0.2759 | 0.2759 | 0.2759 | 0.2781 | 0.2781 | 0.2755
22 | 0.0964 | 0.0985 | 0.0977 | 0.0953 | 0.0775 | 0.0745 | 0.0838 | 0.0732 | 0.0756 | 0.0775 | 0.0725 | 0.0843
Table 5. Experimental result of the mean fitness value of the proposed algorithms.
Dataset | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 | Q1 | Q2 | Q3 | Q4
1 | 0.0237 | 0.0250 | 0.0273 | 0.0270 | 0.0232 | 0.0232 | 0.0232 | 0.0232 | 0.0232 | 0.0233 | 0.0232 | 0.0232
2 | 0.1378 | 0.1382 | 0.1394 | 0.1402 | 0.1332 | 0.1327 | 0.1311 | 0.1331 | 0.1317 | 0.1353 | 0.1394 | 0.1276
3 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0378 | 0.0379 | 0.0378 | 0.0378
4 | 0.1464 | 0.1500 | 0.1519 | 0.1546 | 0.1502 | 0.1469 | 0.1496 | 0.1513 | 0.1533 | 0.1554 | 0.1602 | 0.1474
5 | 0.5938 | 0.5978 | 0.6023 | 0.6030 | 0.6067 | 0.6073 | 0.6064 | 0.6100 | 0.6049 | 0.6114 | 0.6095 | 0.6019
6 | 0.2416 | 0.2484 | 0.2515 | 0.2516 | 0.2320 | 0.2323 | 0.2387 | 0.2393 | 0.2347 | 0.2400 | 0.2426 | 0.2275
7 | 0.1709 | 0.1865 | 0.1858 | 0.1885 | 0.1323 | 0.1304 | 0.1286 | 0.1320 | 0.1332 | 0.1412 | 0.1438 | 0.1272
8 | 0.0935 | 0.1019 | 0.1038 | 0.1036 | 0.0739 | 0.0740 | 0.0728 | 0.0725 | 0.0739 | 0.0790 | 0.0799 | 0.0717
9 | 0.0474 | 0.0462 | 0.0473 | 0.0471 | 0.0497 | 0.0513 | 0.0535 | 0.0484 | 0.0496 | 0.0541 | 0.0541 | 0.0477
10 | 0.0175 | 0.0176 | 0.0178 | 0.0177 | 0.0200 | 0.0199 | 0.0188 | 0.0188 | 0.0187 | 0.0201 | 0.0205 | 0.0180
11 | 0.0319 | 0.0318 | 0.0318 | 0.0318 | 0.0323 | 0.0324 | 0.0319 | 0.0322 | 0.0322 | 0.0324 | 0.0326 | 0.0319
12 | 0.3180 | 0.3089 | 0.3109 | 0.3214 | 0.2561 | 0.2530 | 0.2517 | 0.2538 | 0.2551 | 0.2617 | 0.2532 | 0.2431
13 | 0.1214 | 0.1215 | 0.1256 | 0.1248 | 0.1005 | 0.0997 | 0.1011 | 0.1043 | 0.1037 | 0.0983 | 0.0986 | 0.0951
14 | 0.3590 | 0.3686 | 0.3714 | 0.3739 | 0.2814 | 0.2835 | 0.2798 | 0.2821 | 0.2853 | 0.3020 | 0.3007 | 0.2950
15 | 0.0216 | 0.0223 | 0.0242 | 0.0251 | 0.0206 | 0.0221 | 0.0215 | 0.0236 | 0.0211 | 0.0234 | 0.0221 | 0.0199
16 | 0.1643 | 0.1633 | 0.1651 | 0.1657 | 0.1632 | 0.1603 | 0.1635 | 0.1667 | 0.1712 | 0.1723 | 0.1692 | 0.1540
17 | 0.2256 | 0.2272 | 0.2263 | 0.2276 | 0.2036 | 0.2054 | 0.2065 | 0.2100 | 0.2054 | 0.2094 | 0.2043 | 0.2069
18 | 0.2677 | 0.2671 | 0.2675 | 0.2678 | 0.2679 | 0.2698 | 0.2705 | 0.2693 | 0.2706 | 0.2733 | 0.2739 | 0.2683
19 | 0.0527 | 0.0527 | 0.0527 | 0.0527 | 0.0530 | 0.0529 | 0.0527 | 0.0527 | 0.0529 | 0.0531 | 0.0539 | 0.0527
20 | 0.1193 | 0.1200 | 0.1190 | 0.1202 | 0.0806 | 0.0779 | 0.0811 | 0.0790 | 0.0830 | 0.0845 | 0.0860 | 0.0854
21 | 0.3007 | 0.3030 | 0.3051 | 0.3055 | 0.2908 | 0.2909 | 0.2901 | 0.2893 | 0.2892 | 0.2946 | 0.2964 | 0.2873
22 | 0.1037 | 0.1042 | 0.1044 | 0.1039 | 0.0921 | 0.0903 | 0.0943 | 0.0944 | 0.0907 | 0.0916 | 0.0915 | 0.0936
Table 6. Experimental result of the STD of the proposed algorithms.
Dataset | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 | Q1 | Q2 | Q3 | Q4
1 | 0.0075 | 0.0091 | 0.0098 | 0.0095 | 0.0070 | 0.0070 | 0.0070 | 0.0070 | 0.0070 | 0.0071 | 0.0070 | 0.0070
2 | 0.0070 | 0.0078 | 0.0080 | 0.0065 | 0.0108 | 0.0076 | 0.0087 | 0.0087 | 0.0105 | 0.0106 | 0.0108 | 0.0085
3 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0033 | 0.0034 | 0.0033 | 0.0033
4 | 0.0117 | 0.0110 | 0.0109 | 0.0113 | 0.0139 | 0.0112 | 0.0112 | 0.0133 | 0.0133 | 0.0183 | 0.0198 | 0.0115
5 | 0.0073 | 0.0087 | 0.0077 | 0.0076 | 0.0093 | 0.0114 | 0.0081 | 0.0094 | 0.0086 | 0.0114 | 0.0126 | 0.0098
6 | 0.0121 | 0.0096 | 0.0090 | 0.0110 | 0.0131 | 0.0120 | 0.0126 | 0.0127 | 0.0118 | 0.0140 | 0.0183 | 0.0103
7 | 0.0133 | 0.0119 | 0.0115 | 0.0109 | 0.0117 | 0.0122 | 0.0100 | 0.0105 | 0.0104 | 0.0179 | 0.0180 | 0.0066
8 | 0.0068 | 0.0054 | 0.0048 | 0.0045 | 0.0054 | 0.0059 | 0.0047 | 0.0046 | 0.0052 | 0.0042 | 0.0088 | 0.0040
9 | 0.0064 | 0.0078 | 0.0084 | 0.0078 | 0.0123 | 0.0096 | 0.0096 | 0.0107 | 0.0103 | 0.0123 | 0.0102 | 0.0091
10 | 0.0032 | 0.0030 | 0.0030 | 0.0035 | 0.0034 | 0.0042 | 0.0037 | 0.0042 | 0.0043 | 0.0046 | 0.0048 | 0.0042
11 | 0.0008 | 0.0009 | 0.0010 | 0.0009 | 0.0012 | 0.0011 | 0.0009 | 0.0010 | 0.0009 | 0.0014 | 0.0014 | 0.0009
12 | 0.0393 | 0.0354 | 0.0350 | 0.0294 | 0.0375 | 0.0401 | 0.0328 | 0.0359 | 0.0415 | 0.0443 | 0.0389 | 0.0314
13 | 0.0059 | 0.0062 | 0.0070 | 0.0080 | 0.0091 | 0.0125 | 0.0115 | 0.0104 | 0.0080 | 0.0104 | 0.0125 | 0.0077
14 | 0.0094 | 0.0081 | 0.0058 | 0.0061 | 0.0183 | 0.0172 | 0.0184 | 0.0146 | 0.0143 | 0.0149 | 0.0169 | 0.0128
15 | 0.0022 | 0.0025 | 0.0028 | 0.0028 | 0.0027 | 0.0043 | 0.0035 | 0.0048 | 0.0042 | 0.0044 | 0.0039 | 0.0029
16 | 0.0073 | 0.0077 | 0.0071 | 0.0071 | 0.0187 | 0.0119 | 0.0137 | 0.0173 | 0.0205 | 0.0199 | 0.0198 | 0.0076
17 | 0.0070 | 0.0069 | 0.0081 | 0.0081 | 0.0107 | 0.0107 | 0.0113 | 0.0096 | 0.0093 | 0.0076 | 0.0122 | 0.0079
18 | 0.0077 | 0.0070 | 0.0062 | 0.0064 | 0.0072 | 0.0083 | 0.0078 | 0.0073 | 0.0079 | 0.0089 | 0.0085 | 0.0080
19 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0030 | 0.0028 | 0.0037 | 0.0030
20 | 0.0120 | 0.0118 | 0.0091 | 0.0103 | 0.0155 | 0.0131 | 0.0137 | 0.0132 | 0.0157 | 0.0111 | 0.0112 | 0.0143
21 | 0.0084 | 0.0079 | 0.0071 | 0.0068 | 0.0076 | 0.0060 | 0.0062 | 0.0067 | 0.0074 | 0.0102 | 0.0094 | 0.0065
22 | 0.0036 | 0.0030 | 0.0039 | 0.0045 | 0.0079 | 0.0063 | 0.0067 | 0.0083 | 0.0066 | 0.0062 | 0.0081 | 0.0054
Table 7. Experimental result of the average classification accuracy of the proposed algorithms.
Dataset | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 | Q1 | Q2 | Q3 | Q4
1 | 0.9771 | 0.9759 | 0.9737 | 0.9740 | 0.9776 | 0.9776 | 0.9776 | 0.9776 | 0.9776 | 0.9775 | 0.9776 | 0.9776
2 | 0.8647 | 0.8642 | 0.8633 | 0.8622 | 0.8676 | 0.8680 | 0.8700 | 0.8676 | 0.8691 | 0.8653 | 0.8609 | 0.8736
3 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9664 | 0.9662 | 0.9664 | 0.9664
4 | 0.8588 | 0.8550 | 0.8529 | 0.8500 | 0.8519 | 0.8548 | 0.8524 | 0.8505 | 0.8488 | 0.8467 | 0.8417 | 0.8545
5 | 0.4074 | 0.4032 | 0.3986 | 0.3975 | 0.3923 | 0.3918 | 0.3927 | 0.3888 | 0.3944 | 0.3878 | 0.3901 | 0.3978
6 | 0.7631 | 0.7556 | 0.7516 | 0.7513 | 0.7692 | 0.7687 | 0.7622 | 0.7613 | 0.7668 | 0.7616 | 0.7584 | 0.7742
7 | 0.8297 | 0.8146 | 0.8157 | 0.8132 | 0.8671 | 0.8691 | 0.8709 | 0.8675 | 0.8662 | 0.8581 | 0.8555 | 0.8723
8 | 0.9082 | 0.9004 | 0.8990 | 0.8992 | 0.9265 | 0.9263 | 0.9275 | 0.9279 | 0.9264 | 0.9213 | 0.9205 | 0.9289
9 | 0.9580 | 0.9590 | 0.9573 | 0.9577 | 0.9543 | 0.9527 | 0.9503 | 0.9557 | 0.9543 | 0.9497 | 0.9500 | 0.9563
10 | 0.9880 | 0.9878 | 0.9876 | 0.9873 | 0.9845 | 0.9843 | 0.9855 | 0.9855 | 0.9857 | 0.9841 | 0.9841 | 0.9867
11 | 0.9736 | 0.9735 | 0.9735 | 0.9736 | 0.9723 | 0.9726 | 0.9732 | 0.9729 | 0.9727 | 0.9723 | 0.9720 | 0.9732
12 | 0.6833 | 0.6933 | 0.6911 | 0.6800 | 0.7422 | 0.7456 | 0.7467 | 0.7444 | 0.7433 | 0.7367 | 0.7456 | 0.7556
13 | 0.8833 | 0.8830 | 0.8786 | 0.8791 | 0.9004 | 0.9014 | 0.8998 | 0.8962 | 0.8972 | 0.9030 | 0.9025 | 0.9065
14 | 0.6401 | 0.6317 | 0.6299 | 0.6274 | 0.7162 | 0.7141 | 0.7177 | 0.7155 | 0.7122 | 0.6953 | 0.6965 | 0.7027
15 | 0.9853 | 0.9841 | 0.9816 | 0.9807 | 0.9831 | 0.9818 | 0.9821 | 0.9801 | 0.9829 | 0.9806 | 0.9820 | 0.9842
16 | 0.8401 | 0.8408 | 0.8385 | 0.8379 | 0.8382 | 0.8415 | 0.8382 | 0.8347 | 0.8301 | 0.8288 | 0.8322 | 0.8481
17 | 0.7775 | 0.7756 | 0.7767 | 0.7754 | 0.7959 | 0.7944 | 0.7931 | 0.7896 | 0.7943 | 0.7902 | 0.7954 | 0.7931
18 | 0.7339 | 0.7347 | 0.7342 | 0.7339 | 0.7330 | 0.7307 | 0.7301 | 0.7316 | 0.7300 | 0.7270 | 0.7263 | 0.7327
19 | 0.9510 | 0.9510 | 0.9510 | 0.9510 | 0.9506 | 0.9508 | 0.9510 | 0.9510 | 0.9508 | 0.9505 | 0.9495 | 0.9510
20 | 0.8853 | 0.8844 | 0.8853 | 0.8839 | 0.9194 | 0.9222 | 0.9189 | 0.9208 | 0.9169 | 0.9158 | 0.9142 | 0.9153
21 | 0.6998 | 0.6977 | 0.6956 | 0.6952 | 0.7090 | 0.7087 | 0.7096 | 0.7100 | 0.7104 | 0.7046 | 0.7027 | 0.7126
22 | 0.9018 | 0.9008 | 0.9003 | 0.9004 | 0.9089 | 0.9103 | 0.9067 | 0.9066 | 0.9104 | 0.9098 | 0.9095 | 0.9083
Table 8. Experimental result of the average feature size of the proposed algorithms.
Dataset | S1 | S2 | S3 | S4 | V1 | V2 | V3 | V4 | Q1 | Q2 | Q3 | Q4
1 | 1.03 | 1.13 | 1.23 | 1.23 | 1.07 | 1.07 | 1.07 | 1.07 | 1.07 | 1.03 | 1.07 | 1.07
2 | 7.17 | 7.17 | 7.83 | 7.20 | 4.03 | 3.83 | 4.53 | 3.83 | 4.00 | 3.73 | 3.17 | 4.63
3 | 1.83 | 1.83 | 1.83 | 1.83 | 1.83 | 1.83 | 1.83 | 1.83 | 1.83 | 1.80 | 1.83 | 1.83
4 | 12.00 | 11.53 | 11.23 | 10.90 | 6.50 | 5.57 | 6.23 | 5.87 | 6.47 | 6.43 | 6.20 | 6.17
5 | 12.07 | 11.97 | 11.77 | 11.13 | 8.70 | 8.80 | 8.77 | 8.27 | 9.13 | 9.00 | 9.63 | 9.67
6 | 24.90 | 22.37 | 19.43 | 18.97 | 12.43 | 11.63 | 11.70 | 10.43 | 13.20 | 13.77 | 12.27 | 14.07
7 | 6.37 | 8.13 | 9.27 | 9.63 | 2.07 | 2.03 | 2.10 | 2.10 | 2.07 | 1.93 | 2.00 | 2.07
8 | 8.80 | 11.10 | 12.77 | 12.93 | 3.63 | 3.47 | 3.73 | 3.87 | 3.60 | 3.93 | 3.93 | 4.43
9 | 9.37 | 8.97 | 8.17 | 8.27 | 7.23 | 7.07 | 6.87 | 7.20 | 7.00 | 6.83 | 7.33 | 7.20
10 | 7.37 | 7.20 | 7.27 | 6.57 | 6.03 | 5.70 | 5.77 | 5.73 | 5.90 | 5.67 | 6.17 | 6.23
11 | 5.17 | 4.97 | 5.07 | 5.07 | 4.37 | 4.67 | 4.90 | 4.80 | 4.60 | 4.53 | 4.43 | 4.87
12 | 25.43 | 29.40 | 28.83 | 25.90 | 5.03 | 6.13 | 4.77 | 4.53 | 5.50 | 5.67 | 7.07 | 6.27
13 | 96.40 | 94.70 | 89.87 | 85.63 | 30.93 | 34.30 | 30.73 | 25.97 | 32.23 | 38.37 | 34.60 | 40.60
14 | 73.23 | 110.50 | 137.83 | 139.57 | 13.23 | 11.93 | 10.47 | 11.20 | 9.97 | 8.67 | 8.10 | 18.60
15 | 24.03 | 22.33 | 20.37 | 20.53 | 13.47 | 13.77 | 12.87 | 13.07 | 13.93 | 14.43 | 14.60 | 14.27
16 | 13.30 | 12.50 | 11.47 | 11.70 | 6.67 | 7.53 | 7.27 | 6.80 | 6.73 | 6.33 | 6.73 | 7.93
17 | 47.93 | 45.23 | 47.03 | 46.83 | 13.77 | 16.23 | 15.00 | 15.57 | 15.63 | 14.90 | 15.53 | 19.40
18 | 4.30 | 4.50 | 4.37 | 4.37 | 3.53 | 3.27 | 3.23 | 3.50 | 3.30 | 3.00 | 2.90 | 3.70
19 | 2.93 | 2.93 | 2.93 | 2.93 | 2.87 | 2.90 | 2.93 | 2.93 | 2.90 | 2.87 | 2.77 | 2.93
20 | 176.60 | 174.67 | 167.60 | 162.13 | 27.10 | 28.87 | 25.27 | 20.17 | 25.37 | 35.47 | 30.50 | 46.67
21 | 6.77 | 7.03 | 7.13 | 7.03 | 5.03 | 4.80 | 4.80 | 4.30 | 4.70 | 4.20 | 3.93 | 5.30
22 | 487.00 | 454.47 | 425.70 | 404.87 | 144.53 | 115.10 | 144.20 | 141.67 | 145.67 | 172.83 | 144.47 | 210.00
Table 9. Parameter settings of the utilized algorithms.
Algorithm | Parameter | Value
BDE | Number of vectors, N | 10
BDE | Maximum number of generations, T | 100
BDE | Crossover rate, CR | 0.9
BFPA | Number of flowers, N | 10
BFPA | Maximum number of iterations, T | 100
BFPA | Switch probability, P | 0.8
BSSA | Number of salps, N | 10
BSSA | Maximum number of iterations, T | 100
BMVO | Number of universes, N | 10
BMVO | Maximum number of iterations, T | 100
BMVO | Coefficient, WEP | [0.02, 1]
GA | Number of chromosomes, N | 10
GA | Maximum number of generations, T | 100
GA | Crossover rate, CR | 0.8
GA | Mutation rate, MR | 0.01
Table 10. Experimental result of the best fitness value of six different algorithms.
Dataset | QBHHO | BDE | BFPA | BMVO | BSSA | GA
1 | 0.0104 | 0.0397 | 0.0104 | 0.0104 | 0.0104 | 0.0104
2 | 0.1143 | 0.1367 | 0.1301 | 0.1220 | 0.1265 | 0.1220
3 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314 | 0.0314
4 | 0.1230 | 0.1274 | 0.1215 | 0.1371 | 0.1377 | 0.1258
5 | 0.5765 | 0.5921 | 0.5801 | 0.5963 | 0.6035 | 0.5795
6 | 0.2116 | 0.2252 | 0.2324 | 0.2455 | 0.2568 | 0.2037
7 | 0.1190 | 0.1804 | 0.1801 | 0.1538 | 0.1190 | 0.1217
8 | 0.0637 | 0.0946 | 0.0975 | 0.0722 | 0.0694 | 0.0643
9 | 0.0334 | 0.0459 | 0.0353 | 0.0341 | 0.0433 | 0.0341
10 | 0.0104 | 0.0170 | 0.0104 | 0.0163 | 0.0104 | 0.0112
11 | 0.0303 | 0.0311 | 0.0303 | 0.0306 | 0.0303 | 0.0314
12 | 0.1670 | 0.2356 | 0.2381 | 0.2679 | 0.2654 | 0.1354
13 | 0.0843 | 0.0968 | 0.1080 | 0.1080 | 0.1077 | 0.0504
14 | 0.2754 | 0.3644 | 0.3711 | 0.3447 | 0.3291 | 0.3038
15 | 0.0148 | 0.0221 | 0.0150 | 0.0237 | 0.0237 | 0.0121
16 | 0.1412 | 0.1506 | 0.1506 | 0.1506 | 0.1506 | 0.1374
17 | 0.1925 | 0.2116 | 0.2121 | 0.2137 | 0.2113 | 0.1832
18 | 0.2525 | 0.2542 | 0.2525 | 0.2525 | 0.2573 | 0.2542
19 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467 | 0.0467
20 | 0.0505 | 0.0985 | 0.0887 | 0.0946 | 0.0993 | 0.0528
21 | 0.2755 | 0.3080 | 0.2927 | 0.2852 | 0.2848 | 0.2778
22 | 0.0843 | 0.0906 | 0.0960 | 0.0913 | 0.1006 | 0.0629
Table 11. Experimental result of the mean fitness value of six different algorithms.
Dataset | QBHHO | BDE | BFPA | BMVO | BSSA | GA
1 | 0.0232 | 0.0626 | 0.0444 | 0.0259 | 0.0232 | 0.0301
2 | 0.1276 | 0.1534 | 0.1418 | 0.1390 | 0.1422 | 0.1409
3 | 0.0378 | 0.0387 | 0.0378 | 0.0378 | 0.0378 | 0.0379
4 | 0.1474 | 0.1662 | 0.1495 | 0.1580 | 0.1732 | 0.1563
5 | 0.6019 | 0.6122 | 0.5974 | 0.6101 | 0.6325 | 0.6058
6 | 0.2275 | 0.2580 | 0.2485 | 0.2633 | 0.2813 | 0.2230
7 | 0.1272 | 0.2390 | 0.2036 | 0.1736 | 0.1468 | 0.1469
8 | 0.0717 | 0.1133 | 0.1100 | 0.0955 | 0.0857 | 0.0814
9 | 0.0477 | 0.0615 | 0.0494 | 0.0522 | 0.0663 | 0.0544
10 | 0.0180 | 0.0257 | 0.0190 | 0.0196 | 0.0249 | 0.0214
11 | 0.0319 | 0.0336 | 0.0319 | 0.0321 | 0.0335 | 0.0331
12 | 0.2431 | 0.3328 | 0.3236 | 0.3228 | 0.3245 | 0.2377
13 | 0.0951 | 0.1226 | 0.1261 | 0.1243 | 0.1326 | 0.0746
14 | 0.2950 | 0.3786 | 0.3788 | 0.3641 | 0.3483 | 0.3269
15 | 0.0199 | 0.0317 | 0.0228 | 0.0291 | 0.0387 | 0.0204
16 | 0.1540 | 0.1780 | 0.1662 | 0.1675 | 0.1854 | 0.1618
17 | 0.2069 | 0.2283 | 0.2278 | 0.2262 | 0.2285 | 0.2059
18 | 0.2683 | 0.2893 | 0.2671 | 0.2702 | 0.2743 | 0.2737
19 | 0.0527 | 0.0588 | 0.0527 | 0.0527 | 0.0539 | 0.0553
20 | 0.0854 | 0.1166 | 0.1236 | 0.1155 | 0.1236 | 0.0839
21 | 0.2873 | 0.3247 | 0.3088 | 0.3010 | 0.3025 | 0.3048
22 | 0.0936 | 0.1000 | 0.1051 | 0.1050 | 0.1091 | 0.0780
Table 12. Experimental result of the STD of six different algorithms.
Dataset | QBHHO | BDE | BFPA | BMVO | BSSA | GA
1 | 0.0070 | 0.0133 | 0.0124 | 0.0093 | 0.0070 | 0.0142
2 | 0.0085 | 0.0089 | 0.0072 | 0.0074 | 0.0088 | 0.0108
3 | 0.0033 | 0.0042 | 0.0033 | 0.0033 | 0.0033 | 0.0034
4 | 0.0115 | 0.0140 | 0.0114 | 0.0099 | 0.0129 | 0.0134
5 | 0.0098 | 0.0149 | 0.0087 | 0.0089 | 0.0136 | 0.0121
6 | 0.0103 | 0.0173 | 0.0109 | 0.0105 | 0.0157 | 0.0121
7 | 0.0066 | 0.0251 | 0.0116 | 0.0141 | 0.0194 | 0.0143
8 | 0.0040 | 0.0077 | 0.0046 | 0.0079 | 0.0070 | 0.0087
9 | 0.0091 | 0.0084 | 0.0063 | 0.0089 | 0.0110 | 0.0119
10 | 0.0042 | 0.0059 | 0.0032 | 0.0030 | 0.0061 | 0.0045
11 | 0.0009 | 0.0016 | 0.0008 | 0.0010 | 0.0015 | 0.0014
12 | 0.0314 | 0.0510 | 0.0338 | 0.0315 | 0.0302 | 0.0445
13 | 0.0077 | 0.0100 | 0.0083 | 0.0070 | 0.0103 | 0.0138
14 | 0.0128 | 0.0080 | 0.0044 | 0.0086 | 0.0085 | 0.0147
15 | 0.0029 | 0.0081 | 0.0030 | 0.0032 | 0.0066 | 0.0041
16 | 0.0076 | 0.0099 | 0.0067 | 0.0091 | 0.0117 | 0.0165
17 | 0.0079 | 0.0103 | 0.0093 | 0.0074 | 0.0082 | 0.0114
18 | 0.0080 | 0.0159 | 0.0072 | 0.0067 | 0.0065 | 0.0104
19 | 0.0030 | 0.0076 | 0.0030 | 0.0030 | 0.0037 | 0.0052
20 | 0.0143 | 0.0102 | 0.0105 | 0.0110 | 0.0130 | 0.0148
21 | 0.0065 | 0.0097 | 0.0057 | 0.0079 | 0.0075 | 0.0124
22 | 0.0054 | 0.0043 | 0.0046 | 0.0047 | 0.0039 | 0.0074
Table 13. Experimental result of the average classification accuracy of six different algorithms.
Dataset | QBHHO | BDE | BFPA | BMVO | BSSA | GA
1 | 0.9776 | 0.9414 | 0.9576 | 0.9751 | 0.9776 | 0.9714
2 | 0.8736 | 0.8502 | 0.8613 | 0.8631 | 0.8580 | 0.8611
3 | 0.9664 | 0.9658 | 0.9664 | 0.9664 | 0.9664 | 0.9664
4 | 0.8545 | 0.8393 | 0.8557 | 0.8455 | 0.8290 | 0.8469
5 | 0.3978 | 0.3891 | 0.4036 | 0.3902 | 0.3667 | 0.3939
6 | 0.7742 | 0.7473 | 0.7557 | 0.7389 | 0.7211 | 0.7799
7 | 0.8723 | 0.7636 | 0.7988 | 0.8274 | 0.8528 | 0.8529
8 | 0.9289 | 0.8909 | 0.8935 | 0.9063 | 0.9146 | 0.9207
9 | 0.9563 | 0.9457 | 0.9560 | 0.9523 | 0.9383 | 0.9500
10 | 0.9867 | 0.9806 | 0.9861 | 0.9855 | 0.9802 | 0.9835
11 | 0.9732 | 0.9732 | 0.9738 | 0.9729 | 0.9712 | 0.9726
12 | 0.7556 | 0.6700 | 0.6789 | 0.6778 | 0.6733 | 0.7633
13 | 0.9065 | 0.8835 | 0.8789 | 0.8787 | 0.8691 | 0.9291
14 | 0.7027 | 0.6244 | 0.6235 | 0.6355 | 0.6492 | 0.6741
15 | 0.9842 | 0.9756 | 0.9836 | 0.9757 | 0.9664 | 0.9840
16 | 0.8481 | 0.8272 | 0.8379 | 0.8358 | 0.8176 | 0.8409
17 | 0.7931 | 0.7762 | 0.7759 | 0.7757 | 0.7717 | 0.7965
18 | 0.7327 | 0.7124 | 0.7352 | 0.7305 | 0.7258 | 0.7275
19 | 0.9510 | 0.9459 | 0.9510 | 0.9510 | 0.9497 | 0.9484
20 | 0.9153 | 0.8892 | 0.8814 | 0.8875 | 0.8769 | 0.9189
21 | 0.7126 | 0.6776 | 0.6924 | 0.6992 | 0.6965 | 0.6961
22 | 0.9083 | 0.9066 | 0.9002 | 0.8984 | 0.8930 | 0.9261
Table 14. Experimental result of the average feature size of six different algorithms.
Dataset | QBHHO | BDE | BFPA | BMVO | BSSA | GA
1 | 1.07 | 4.57 | 2.40 | 1.20 | 1.07 | 1.80
2 | 4.63 | 9.63 | 8.63 | 6.57 | 3.17 | 6.50
3 | 1.83 | 1.93 | 1.83 | 1.83 | 1.83 | 1.87
4 | 6.17 | 12.80 | 11.90 | 9.07 | 7.03 | 8.50
5 | 9.67 | 12.60 | 11.93 | 10.80 | 9.43 | 9.80
6 | 14.07 | 27.60 | 22.97 | 16.80 | 18.10 | 17.67
7 | 2.07 | 13.33 | 12.00 | 7.40 | 2.80 | 3.33
8 | 4.43 | 17.80 | 15.70 | 9.20 | 3.93 | 9.77
9 | 7.20 | 12.40 | 9.27 | 7.97 | 8.40 | 7.80
10 | 6.23 | 8.40 | 6.83 | 6.80 | 6.90 | 6.57
11 | 4.87 | 6.40 | 5.37 | 4.80 | 4.50 | 5.40
12 | 6.27 | 34.03 | 31.77 | 21.13 | 6.37 | 18.83
13 | 40.60 | 120.70 | 104.23 | 70.67 | 50.40 | 73.60
14 | 18.60 | 189.53 | 169.40 | 90.93 | 28.53 | 119.30
15 | 14.27 | 25.67 | 22.27 | 17.37 | 18.37 | 15.30
16 | 7.93 | 15.20 | 12.63 | 10.73 | 10.60 | 9.40
17 | 19.40 | 60.33 | 53.60 | 37.90 | 22.00 | 39.50
18 | 3.70 | 4.57 | 4.97 | 3.37 | 2.80 | 3.93
19 | 2.93 | 3.63 | 2.93 | 2.93 | 2.83 | 2.97
20 | 46.67 | 214.37 | 190.10 | 129.03 | 55.47 | 112.40
21 | 5.30 | 10.47 | 8.27 | 6.00 | 3.87 | 7.43
22 | 210.00 | 569.33 | 477.77 | 338.67 | 241.80 | 364.70
Table 15. P-values of Wilcoxon signed-rank test.
Dataset | BDE | BFPA | BMVO | BSSA | GA
1 | 0.00000 | 1.00 × 10^−5 | 0.06250 | 1.00000 | 0.01563
2 | 0.00000 | 1.00 × 10^−5 | 1.00 × 10^−5 | 0.00000 | 5.00 × 10^−5
3 | 0.01563 | 1.00000 | 1.00000 | 1.00000 | 1.00000
4 | 1.00 × 10^−5 | 0.13054 | 2.00 × 10^−5 | 0.00000 | 0.00029
5 | 0.00411 | 0.06503 | 0.00047 | 0.00000 | 0.14430
6 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.05982
7 | 0.00000 | 0.00000 | 0.00000 | 9.00 × 10^−5 | 2.00 × 10^−5
8 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 2.00 × 10^−5
9 | 1.00 × 10^−5 | 0.01557 | 0.00605 | 0.00000 | 0.00203
10 | 6.00 × 10^−5 | 0.13839 | 0.11775 | 4.00 × 10^−5 | 0.00772
11 | 6.00 × 10^−5 | 1.00000 | 0.12378 | 4.00 × 10^−5 | 0.00028
12 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.92625
13 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00 × 10^−5
14 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000
15 | 0.00000 | 0.00071 | 0.00000 | 0.00000 | 0.52355
16 | 0.00000 | 1.00 × 10^−5 | 0.00000 | 0.00000 | 0.04429
17 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.44051
18 | 3.00 × 10^−5 | 0.66882 | 0.07104 | 0.00033 | 0.01761
19 | 4.00 × 10^−5 | 1.00000 | 1.00000 | 0.06250 | 0.00098
20 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.46528
21 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 2.00 × 10^−5
22 | 0.00011 | 0.00000 | 0.00000 | 0.00000 | 0.00000
w/t/l | 22/0/0 | 15/7/0 | 16/6/0 | 19/3/0 | 12/7/3

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).