A New Quadratic Binary Harris Hawk Optimization for Feature Selection

Harris hawk optimization (HHO) is one of the recently proposed metaheuristic algorithms and has proven to work effectively on several challenging optimization tasks. However, the original HHO was developed to solve continuous optimization problems, not problems with binary variables. This paper proposes a binary version of HHO (BHHO) to solve the feature selection problem in classification tasks. The proposed BHHO is equipped with an S-shaped or V-shaped transfer function to convert the continuous variables into binary ones. Moreover, another variant of HHO, namely quadratic binary Harris hawk optimization (QBHHO), is proposed to enhance the performance of BHHO. In this study, twenty-two datasets collected from the UCI machine learning repository are used to validate the performance of the proposed algorithms. A comparative study is conducted to compare the effectiveness of QBHHO with other feature selection algorithms such as binary differential evolution (BDE), the genetic algorithm (GA), binary multi-verse optimizer (BMVO), binary flower pollination algorithm (BFPA), and binary salp swarm algorithm (BSSA). The experimental results show the superiority of the proposed QBHHO in terms of classification performance, feature size, and fitness values compared to the other algorithms.


Introduction
In recent years, data representation has become one of the essential factors that can significantly affect the performance of classification models. More and more high-dimensional data are gathered during data acquisition, which introduces the curse of dimensionality into data mining tasks [1]. In addition, the presence of redundant and irrelevant features degrades the performance of the system and brings additional computational cost. Therefore, feature selection has become a critical step in the data mining process. The main goal of feature selection is to select the best combination of potential features, which offers a better understanding of the classification model. Feature selection not only improves the prediction accuracy but also reduces the dimension of the data [2,3].
Generally, feature selection methods can be categorized into two classes: filter and wrapper. The filter method is simple and obtains results quickly. However, the filter method does not depend on the learning algorithm, which often results in unsatisfactory performance [4,5]. Compared with the filter method, the wrapper method can usually provide higher classification accuracy because it includes a machine learning algorithm as part of the evaluation. Thus, wrapper methods have been more widely used in feature selection works [6][7][8].

Harris Hawk Optimization
Harris hawk optimization (HHO) is a recent metaheuristic algorithm proposed by Heidari and his colleagues in 2019 [16]. HHO mimics the concepts of Harris hawks to explore the prey, surprise pounce, and different attack strategies of Harris hawks in nature. In HHO, the candidate solutions are represented by hawks, while the best solution (nearly optimal solution) is known as prey. The Harris hawks try to track the prey by using their powerful eyes and then perform the surprise pounce to catch the prey detected.
Generally, HHO is modeled with exploration and exploitation phases. The algorithm is transferred from exploration to exploitation, and the exploitation behavior is then changed, based on the escaping energy of the prey. Mathematically, the escaping energy of the prey can be computed as [16]:

E = 2 E0 (1 − t/T)    (1)

E0 = 2r − 1    (2)

where t is the current iteration, T is the maximum number of iterations, E0 is the initial energy randomly generated in [−1, 1], and r is a random number in [0, 1]. Figure 1 shows an illustration of the escaping energy of the prey over 300 iterations. As can be seen, the escaping energy shows a decreasing trend. When the escaping energy of the prey satisfies |E| ≥ 1, HHO allows the hawks to search globally in different regions. On the contrary, HHO promotes the local search around the neighborhood of the best solutions when |E| < 1.
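To make the energy schedule concrete, the sketch below evaluates Equation (1) with E0 = 2r − 1 redrawn each iteration, reproducing the shrinking envelope seen in Figure 1. This is an illustrative Python sketch (the paper's experiments were run in MATLAB), and the function name is ours.

```python
import random

def escaping_energy(t, T, E0):
    """Escaping energy of the prey, Eq. (1): E = 2*E0*(1 - t/T)."""
    return 2 * E0 * (1 - t / T)

# E0 = 2r - 1 (Eq. (2)) is redrawn each iteration, so E oscillates
# inside a linearly shrinking envelope |E| <= 2*(1 - t/T).
random.seed(0)
T = 300
energies = [escaping_energy(t, T, 2 * random.random() - 1) for t in range(T)]
assert all(abs(E) <= 2 * (1 - t / T) for t, E in enumerate(energies))
```

Note that global search (which requires |E| ≥ 1) becomes impossible once the envelope 2(1 − t/T) drops below 1, i.e. in the second half of the run, which is exactly the exploration-to-exploitation transition described above.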

Exploration Phase
In the exploration phase, the position of the hawk is updated based on a random hawk and the rest of the flock as follows [16]:

X(t + 1) = Xk(t) − r1 |Xk(t) − 2 r2 X(t)|,  if q ≥ 0.5
X(t + 1) = (Xr(t) − Xm(t)) − r3 (lb + r4 (ub − lb)),  if q < 0.5    (3)

where X is the position of the hawk, Xk is the position of a randomly selected hawk, Xr is the position of the prey (global best solution in the entire population), t is the current iteration, ub and lb are the upper and lower boundaries of the search space, and r1, r2, r3, r4, and q are five independent random numbers in [0, 1]. Xm is the mean position of the current population of hawks, and it can be computed using Equation (4):

Xm(t) = (1/N) Σ_{n=1}^{N} Xn(t)    (4)

where Xn is the n-th hawk in the population, and N is the number of hawks (population size).
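The exploration update above can be sketched in a few lines of Python (an illustrative sketch under our own function names, with positions as plain lists and scalar bounds; the original implementation is in MATLAB):

```python
import random

def mean_position(population):
    """Mean position of the flock, Eq. (4)."""
    N, D = len(population), len(population[0])
    return [sum(hawk[d] for hawk in population) / N for d in range(D)]

def exploration_update(X, Xk, Xr, Xm, lb, ub):
    """Exploration-phase position update, Eq. (3), applied per dimension."""
    r1, r2, r3, r4, q = (random.random() for _ in range(5))
    if q >= 0.5:
        # Perch based on a randomly selected hawk Xk
        return [Xk[d] - r1 * abs(Xk[d] - 2 * r2 * X[d]) for d in range(len(X))]
    # Perch relative to the prey Xr and the mean position Xm of the flock
    return [Xr[d] - Xm[d] - r3 * (lb + r4 * (ub - lb)) for d in range(len(X))]
```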

Exploitation Phase
In the exploitation phase, the position of the hawk is updated based on four different situations. This behavior is governed by the escaping energy of the prey (E) and the chance of the prey escaping successfully (r < 0.5) or unsuccessfully (r ≥ 0.5) before the surprise pounce.

Soft Besiege
The soft besiege happens when r ≥ 0.5 and |E| ≥ 0.5. In this situation, the hawk updates its position using Equation (5):

X(t + 1) = ∆X(t) − E |J Xr(t) − X(t)|    (5)

where E is the escaping energy of the prey, X is the position of the hawk, t is the current iteration, ∆X is the difference between the position of the prey and the current hawk, and J is the jump strength. ∆X and J are defined as follows [16]:

∆X(t) = Xr(t) − X(t)    (6)

J = 2 (1 − r5)    (7)

where r5 is a random number in [0, 1] that changes randomly in each iteration.
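A minimal Python sketch of the soft-besiege update of Equations (5)-(7) (illustrative only; the function name is ours and positions are plain lists):

```python
import random

def soft_besiege(X, Xr, E):
    """Soft-besiege update, Eqs. (5)-(7), applied per dimension."""
    J = 2 * (1 - random.random())  # jump strength, Eq. (7)
    # Eq. (5): X(t+1) = dX - E*|J*Xr - X|, with dX = Xr - X (Eq. (6))
    return [(Xr[d] - X[d]) - E * abs(J * Xr[d] - X[d]) for d in range(len(X))]
```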

Hard Besiege
The HHO performs the hard besiege when r ≥ 0.5 and |E| < 0.5. In this situation, the position of the hawk is updated as follows [16]:

X(t + 1) = Xr(t) − E |∆X(t)|    (8)

where X is the position of the hawk, Xr is the position of the prey, E is the escaping energy of the prey, and ∆X is the difference between the position of the prey and the current hawk.
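The hard-besiege rule of Equation (8) is simpler, since no jump strength is involved (again an illustrative Python sketch with our own naming):

```python
def hard_besiege(X, Xr, E):
    """Hard-besiege update, Eq. (8): X(t+1) = Xr - E*|Xr - X|."""
    return [Xr[d] - E * abs(Xr[d] - X[d]) for d in range(len(X))]
```

As E shrinks toward 0, the update collapses onto the prey position Xr, which matches the intensifying local search described above.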

Soft Besiege with Progressive Rapid Dives
The soft besiege with progressive rapid dives occurs when r < 0.5 and |E| ≥ 0.5. The hawk progressively selects the best possible dive to catch the prey competitively. In this circumstance, two candidate positions are generated as follows [16]:

Y = Xr(t) − E |J Xr(t) − X(t)|    (9)

Z = Y + α × Levy(D)    (10)

where Y and Z are two newly generated hawks, E is the escaping energy, J is the jump strength, X is the position of the hawk, t is the current iteration, Xr is the position of the prey, D is the total number of dimensions, α is a random vector of dimension D, and Levy is the Levy flight function that can be computed as:

Levy(x) = 0.01 × (u × σ) / |v|^(1/β)    (11)

where u and v are two independent random numbers generated from the normal distribution, and σ is defined as:

σ = [Γ(1 + β) sin(πβ/2) / (Γ((1 + β)/2) β 2^((β−1)/2))]^(1/β)    (12)

where β is a default constant set to 1.5. In this phase, the position of the hawk is updated as in Equation (13):

X(t + 1) = Y, if F(Y) < F(X(t)); Z, if F(Z) < F(X(t))    (13)

where F(.) is the fitness function, and Y and Z are the two new solutions obtained from Equations (9) and (10).
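The Levy flight step of Equations (11) and (12) can be sketched as follows (illustrative Python under our own function name; u and v are drawn from a standard normal distribution as stated above):

```python
import math
import random

def levy(D, beta=1.5):
    """Levy flight step vector of dimension D, Eqs. (11)-(12)."""
    # sigma, Eq. (12), depends only on beta
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    steps = []
    for _ in range(D):
        u = random.gauss(0, 1)
        v = random.gauss(0, 1)
        steps.append(0.01 * u * sigma / abs(v) ** (1 / beta))  # Eq. (11)
    return steps
```

The heavy-tailed distribution of these steps is what produces the occasional long jump that helps the dives escape local neighborhoods.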

Hard Besiege with Progressive Rapid Dives
The last situation is the hard besiege with progressive rapid dives, which is performed when r < 0.5 and |E| < 0.5. In this condition, two new solutions are generated as follows [16]:

Y = Xr(t) − E |J Xr(t) − Xm(t)|    (14)

Z = Y + α × Levy(D)    (15)

where E is the escaping energy, J is the jump strength, Xm is the mean position of the hawks in the current population, t is the current iteration, Xr is the position of the prey, D is the total number of dimensions, α is a random vector of dimension D, and Levy is the Levy flight function. Afterward, the position of the hawk is updated as:

X(t + 1) = Y, if F(Y) < F(X(t)); Z, if F(Z) < F(X(t))    (16)

where F(.) is the fitness function, and Y and Z are the two new solutions obtained from Equations (14) and (15).

The Proposed Binary Harris Hawk Optimization
Previous work indicates that HHO can usually offer significantly superior results to other well-established optimizers on several benchmark tests. Among rivals, HHO showed highly competitive performance concerning the quality of exploration and exploitation [16]. The main reason for the excellent exploration is the different diversification mechanisms that give HHO a high global search capability in the initial iterations. As for exploitation, HHO utilizes different Levy flight-based patterns with short-length jumps and a series of searching strategies to boost the exploitative behavior. Moreover, HHO benefits from a dynamically randomized escaping energy parameter that allows it to retain a smooth transition between local and global search [16]. These attractive behaviors of HHO motivate us to apply it to the feature selection problem.

Representation of Solutions
Feature selection is a combinatorial binary optimization problem whose solutions live in a binary search space. However, HHO is designed to solve continuous optimization problems and is therefore not directly suitable for the feature selection problem. To develop the binary version of HHO, the solutions must be represented in binary form (either 0 or 1). Thus, several modifications are needed to meet this requirement.

Transformation of Solutions
According to the literature, the utilization of a transfer function is one of the most effective ways to convert a continuous optimizer into a binary one. In comparison with other operators, the transfer function is cheaper, simpler, and faster, which leads to ease of implementation [18,19]. In most previous studies, researchers employed either an S-shaped or a V-shaped transfer function for the conversion. Therefore, in this study, we use four S-shaped (S1-S4) and four V-shaped (V1-V4) transfer functions to change the continuous HHO into a binary version. Table 1 shows the S-shaped and V-shaped transfer functions with their mathematical definitions. The illustrations of the S-shaped and V-shaped transfer functions are demonstrated in Figure 2.

By integrating the transfer function into binary HHO (BHHO), the algorithm is able to perform the search in the binary search space. In BHHO, the position of the hawk is updated in two stages. In the first stage, BHHO updates the position of the hawk (Xi^d(t)) into a new position (∆Xi^d(t + 1)) in the same way as HHO. Note that the new position (∆Xi^d(t + 1)) is expressed in continuous form. In the second stage, the S-shaped or V-shaped transfer function is used to transform the new position into a probability value. The new position of the hawk is then updated using Equation (17) or Equation (18). In this way, the position of the hawk can be expressed in binary form.
In the S-shaped family, BHHO updates the new position of the hawk as follows [20]:

Xi^d(t + 1) = 1, if rand(0,1) < T(∆Xi^d(t + 1)); 0, otherwise    (17)

where T(x) is the S-shaped transfer function, rand(0,1) is a random number in [0, 1], X is the position of the hawk, i is the order of the hawk in the population, d is the dimension, and t is the current iteration. Unlike the S-shaped transfer function, the V-shaped transfer function does not force the search agent to 0 or 1. In the V-shaped family, the new position of the hawk is updated as [21]:

Xi^d(t + 1) = ¬Xi^d(t), if rand(0,1) < T(∆Xi^d(t + 1)); Xi^d(t), otherwise    (18)

where T(x) is the V-shaped transfer function, rand(0,1) is a random number in [0, 1], X is the position of the hawk, i is the order of the hawk in the population, d is the dimension, t is the current iteration, and ¬X is the complement of X.

Binary Harris Hawk Optimization Algorithm
The pseudocode of BHHO is demonstrated in Algorithm 1.

Algorithm 1. Binary Harris hawk optimization.
Inputs: N (population size) and T (maximum number of iterations)
1: Initialize Xi for the N hawks
2: for t = 1 to T
3:     Evaluate the fitness values of the hawks, F(X)
4:     Define the best solution as Xr
5:     for i = 1 to N
6:         Compute E0 and J as shown in (2) and (7), respectively
7:         Update E using (1)
8:         if (|E| ≥ 1) then // Exploration //
9:             Update the position of the hawk using (3)
10:            Calculate the probability using the S-shaped or V-shaped transfer function
11:            Update the new position of the hawk using (17) or (18)
12:        elseif (r ≥ 0.5) and (|E| ≥ 0.5) then // Soft besiege //
13:            Update the position of the hawk as shown in (5)
14:            Calculate the probability using the S-shaped or V-shaped transfer function
15:            Update the new position of the hawk using (17) or (18)
16:        elseif (r ≥ 0.5) and (|E| < 0.5) then // Hard besiege //
17:            Update the position of the hawk using (8)
18:            Calculate the probability using the S-shaped or V-shaped transfer function
19:            Update the new position of the hawk using (17) or (18)
20:        elseif (r < 0.5) and (|E| ≥ 0.5) then // Soft besiege with progressive rapid dives //
21:            Update the position of the hawk using (13)
22:            Calculate the probability using the S-shaped or V-shaped transfer function
23:            Update the new position of the hawk using (17) or (18)
24:        elseif (r < 0.5) and (|E| < 0.5) then // Hard besiege with progressive rapid dives //
25:            Update the position of the hawk using (16)
26:            Calculate the probability using the S-shaped or V-shaped transfer function
27:            Update the new position of the hawk using (17) or (18)
28:        end if
29:    end for
30: end for

The Proposed Quadratic Binary Harris Hawk Optimization
In the previous section, the BHHO algorithms have been discussed. However, in our experiments, we found that the performance of BHHO was still far from perfect, indicating that BHHO cannot efficiently solve feature selection problems. As mentioned above, the transfer function is one of the simplest and most effective ways to convert HHO into a binary algorithm; however, BHHO with an S-shaped or V-shaped transfer function might not work effectively in feature selection tasks. In this regard, we propose the new quadratic binary Harris hawk optimization (QBHHO) to enhance the performance of BHHO. Unlike BHHO, QBHHO utilizes a quadratic transfer function to convert HHO into a binary algorithm. In this work, we propose four quadratic transfer functions (Q1-Q4), as shown in Table 2. Figure 3 presents illustrations of the quadratic transfer functions. Table 2. The utilized quadratic transfer functions.

The QBHHO first updates the position of the hawk and then converts the new position into a probability value using the quadratic transfer function (see Table 2 for the mathematical definitions). Afterward, the following rule is used to update the hawk's position [22]:

Xi^d(t + 1) = ¬Xi^d(t), if rand(0,1) < T(∆Xi^d(t + 1)); Xi^d(t), otherwise    (19)

where T(x) is the quadratic transfer function, rand(0,1) is a random number in [0, 1], X is the position of the hawk, i is the order of the hawk in the population, d is the dimension, t is the current iteration, and ¬X is the complement of X. The pseudocode of QBHHO is demonstrated in Algorithm 2.

Algorithm 2. Quadratic binary Harris hawk optimization.
Inputs: N (population size) and T (maximum number of iterations)
1: Initialize Xi for the N hawks
2: for t = 1 to T
3:     Evaluate the fitness values of the hawks, F(X)
4:     Define the best solution as Xr
5:     for i = 1 to N
6:         Compute E0 and J as shown in (2) and (7), respectively
7:         Update E using (1)
8:         if (|E| ≥ 1) then // Exploration //
9:             Update the position of the hawk using (3)
10:            Calculate the probability using the quadratic transfer function
11:            Update the new position of the hawk using (19)
12:        elseif (r ≥ 0.5) and (|E| ≥ 0.5) then // Soft besiege //
13:            Update the position of the hawk as shown in (5)
14:            Calculate the probability using the quadratic transfer function
15:            Update the new position of the hawk using (19)
16:        elseif (r ≥ 0.5) and (|E| < 0.5) then // Hard besiege //
17:            Update the position of the hawk using (8)
18:            Calculate the probability using the quadratic transfer function
19:            Update the new position of the hawk using (19)
20:        elseif (r < 0.5) and (|E| ≥ 0.5) then // Soft besiege with progressive rapid dives //
21:            Update the position of the hawk as shown in (13)
22:            Calculate the probability using the quadratic transfer function
23:            Update the new position of the hawk using (19)
24:        elseif (r < 0.5) and (|E| < 0.5) then // Hard besiege with progressive rapid dives //
25:            Update the position of the hawk using (16)
26:            Calculate the probability using the quadratic transfer function
27:            Update the new position of the hawk using (19)
28:        end if
29:    end for
30: end for

Application of Proposed BHHO and QBHHO for Feature Selection
In this section, the application of the proposed BHHO and QBHHO algorithms to the feature selection problem is presented. Generally, feature selection is an NP-hard combinatorial binary optimization problem, in which the number of possible solutions increases exponentially with the number of features. Let D be the total number of features; the number of possible solutions is then 2^D − 1, which makes an exhaustive search impractical. Therefore, we propose BHHO and QBHHO to automatically find a promising solution (feature subset) that can significantly improve the performance of the classification model.
In feature selection, the solutions are represented in binary form, where each bit is either '0' or '1'. Bit '1' indicates that the feature is selected, whereas bit '0' represents an unselected feature [23]. Taking the sample solution in Figure 4 (a solution consisting of 10 features) as an example, a total of five features (the 1st, 3rd, 4th, 7th, and 8th) have been selected.
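The mask-based representation above can be sketched in a few lines of Python (illustrative; the helper name is ours), using the Figure 4 example of five selected features out of ten:

```python
def select_features(samples, mask):
    """Keep only the columns whose mask bit is 1."""
    idx = [d for d, bit in enumerate(mask) if bit == 1]
    return [[row[d] for d in idx] for row in samples]

# Figure 4 example: features 1, 3, 4, 7, and 8 selected (1-based)
mask = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0]
X = [list(range(10)), list(range(10, 20))]
assert select_features(X, mask) == [[0, 2, 3, 6, 7], [10, 12, 13, 16, 17]]
```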
In wrapper feature selection, a fitness function (objective function) is required to evaluate each individual solution. The primary goal of feature selection is to enhance the prediction accuracy and to reduce the number of features. In this work, a fitness function that considers both criteria is utilized, and it is defined as [9]:

Fitness = α × Error + (1 − α) × |S| / |F|    (20)

Error = (No. of wrongly predicted instances) / (Total number of instances)

where Error is the error rate computed by a learning algorithm, |S| is the length of the feature subset, |F| is the total number of features, and α is a parameter used to control the influence of the classification performance and the feature size. In Equation (20), the first term represents the classification performance, while the second term represents the feature reduction.
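Equation (20) is simple enough to state directly in code (an illustrative Python sketch with our own function name; α = 0.99 is the value the paper adopts later):

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq. (20): fitness = alpha * Error + (1 - alpha) * |S| / |F|."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total

# With alpha = 0.99 accuracy dominates: a subset with slightly higher
# error loses even if it keeps far fewer features.
assert fitness(0.10, 5, 100) < fitness(0.12, 2, 100)
```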

Dataset
In this study, twenty-two benchmark datasets collected from the UCI machine learning repository are used to validate the performances of proposed approaches [24]. Table 3 outlines the datasets used in this work. For each individual dataset, the features are normalized between 0 and 1 to prevent the numerical problem.

Parameter Settings
In the present study, the k-nearest neighbor (KNN) algorithm with Euclidean distance and k = 5 is used to compute the error rate (the first term of the fitness function). Different BHHO and QBHHO algorithms are employed to find the most informative feature subset. The algorithms are repeated 30 times with different random seeds. Besides, K-fold cross-validation is applied when computing the error rate in the fitness function to prevent overfitting [15]. In each of the 30 runs, the dataset is partitioned into K equal parts: one part is used as the testing set, while the remaining K − 1 parts are used as the training set. The procedure is repeated K times using different parts for the testing and training sets, and the average results are recorded. Finally, the average statistical measurements are collected over the 30 independent runs and displayed as the final results. In previous works, KNN was shown to be fast, simple, and easy to implement [13,25,26]. Thus, KNN is chosen as the learning algorithm in this work.
All the experiments are executed in MATLAB 9.3 on a PC with an Intel Core i5-9400F CPU at 2.90 GHz and 16.0 GB of RAM. In this study, we set K = 10. The number of hawks (population size) is set to 10, and the maximum number of iterations is 100; this hyper-parameter setting was utilized in various previous works [27,28]. The dimension of the search space is equal to the total number of features of each dataset. Following [9,28,29], we choose α = 0.99 since the classification performance is the most important criterion in the current work.

Evaluation of Proposed BHHO and QBHHO Algorithms
In the first part of the experiment, the performances of the BHHO and QBHHO algorithms are validated on the 22 datasets. The evaluation metrics include the best fitness value, mean fitness value, standard deviation of the fitness value (STD), classification accuracy, and number of selected features (feature size), which are defined as follows [9,30,31]:

Best fitness = min_{m=1..R} F(Gb^m)

Mean fitness = (1/R) Σ_{m=1}^{R} F(Gb^m)

STD = sqrt( (1/(R − 1)) Σ_{m=1}^{R} (F(Gb^m) − µ)^2 )

Average accuracy = (1/R) Σ_{m=1}^{R} (No. of correctly predicted instances in run m) / (Total number of instances)

Average feature size = (1/R) Σ_{m=1}^{R} |S^m|

where Gb^m is the global best solution obtained from run m, µ is the mean fitness, |S| is the number of selected features, m is the order of the run, and R is the maximum number of runs.
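The fitness statistics over the R = 30 runs can be computed with the standard library (an illustrative Python sketch, with our own function name; the sample standard deviation divides by R − 1 as in the STD definition above):

```python
import statistics

def summarize(run_fitness):
    """Best, mean, and sample STD of the global-best fitness over R runs."""
    best = min(run_fitness)
    mean = sum(run_fitness) / len(run_fitness)
    std = statistics.stdev(run_fitness)  # divides by R - 1
    return best, mean, std

best, mean, std = summarize([0.10, 0.12, 0.11])
assert best == 0.10
assert abs(mean - 0.11) < 1e-12
```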
The twelve proposed approaches, BHHO (S1-S4 and V1-V4) and QBHHO (Q1-Q4), are compared to investigate the best binary version of HHO for feature selection. Tables 4-6 present the experimental results for the best, mean, and STD fitness values of the proposed algorithms; in these tables, the best result is highlighted in bold. In Table 4, BHHO-V1 scored the optimal best fitness value on the most datasets (twelve), followed by QBHHO-Q4 (eleven). From Table 5, the algorithm contributing the lowest mean fitness value was QBHHO-Q4 (twelve datasets), followed by BHHO-S1 and BHHO-S2 (five datasets each). This shows that QBHHO-Q4 provided the highest diversity. Meanwhile, BHHO-S3 produced the most consistent results, with the lowest STD values on eight datasets. Furthermore, the convergence curves of the proposed algorithms on the 22 datasets are shown in Figures 5 and 6.
Tables 7 and 8 outline the results for the average classification accuracy and the average feature size of the proposed algorithms. As can be observed, QBHHO-Q4 outperformed the other algorithms in finding promising solutions, thus leading to optimal classification accuracy. Among the rivals, QBHHO-Q4 achieved the highest average classification accuracy on eleven datasets. This shows that the quadratic transfer function can usually overtake the S-shaped and V-shaped transfer functions in feature selection. On the other hand, QBHHO-Q2 provided the smallest number of selected features on most of the datasets, followed by BHHO-V4. Even though the V-shaped transfer function can significantly reduce the number of features, relevant features are eliminated, resulting in unsatisfactory performance. On the whole, QBHHO with the quadratic transfer function Q4 is the best binary version of HHO in the current work. Therefore, only QBHHO with transfer function Q4 is used in the rest of this paper.

Comparison with Other Metaheuristic Algorithms
In the second part of the experiment, five recent and popular metaheuristic algorithms, including binary differential evolution (BDE) [32], binary flower pollination algorithm (BFPA) [33], binary multi-verse optimizer (BMVO) [14], binary salp swarm algorithm (BSSA) [28], and the genetic algorithm (GA) [34], are applied to examine the efficacy and efficiency of the proposed QBHHO on the feature selection problem. BDE is a variant of differential evolution (DE) that comprises differentiation, mutation, crossover, and selection processes. BFPA is a binary version of the flower pollination algorithm (FPA) in which the S2 transfer function is implemented for the conversion. BMVO integrates the V4 transfer function to convert the continuous variables into binary ones. BSSA is a binary variant of the salp swarm algorithm (SSA) that implements the V3 transfer function. GA comprises parent selection, crossover, and mutation operators; we utilize a simple GA with roulette wheel selection and single-point crossover for comparison. Table 9 lists the parameter settings of the utilized algorithms. Tables 10-12 display the experimental results for the best, mean, and STD fitness values of the six algorithms; the best results are bolded. Based on the results obtained, QBHHO showed competitive performance in feature selection. In comparison with BDE, BFPA, BMVO, BSSA, and GA, QBHHO was highly capable of finding the near-optimal solution. From Table 10, QBHHO yielded the optimal best fitness value on fourteen datasets, which overwhelmed the other competitors in this work. This result proves that the quadratic transfer function helps the algorithm to find the optimal solution. Moreover, QBHHO offered the lowest mean fitness values on most of the datasets, which again validates the efficacy of QBHHO in exploring untried areas when searching for the global optimum.
In Table 12, the BFPA achieved the smallest STD value in most cases, which indicates highly consistent results. However, BFPA cannot locate the optimal solution very well, thus leading to ineffective results. Figures 7 and 8 demonstrate the convergence curves of the six algorithms on the 22 datasets. As can be seen, QBHHO can usually offer high diversity. Taking datasets 7 (horse colic) and 21 (diabetic) as examples, QBHHO converged faster to the promising solution, overtaking the other algorithms in feature selection. That is, QBHHO keeps tracking the global optimum, thus leading to high-quality solutions. This can be attributed to some strong properties of the HHO algorithm and the superiority of the quadratic transfer function used in QBHHO. Table 13 presents the experimental results for the average classification accuracy of the six algorithms; the best results are highlighted in bold. According to Table 13, the average classification accuracy obtained by QBHHO was far superior to that of the other competitors in most cases. Out of 22 datasets, QBHHO contributed the highest average classification accuracy on at least twelve datasets. Figures 9 and 10 exhibit the boxplots of the six algorithms on the 22 datasets. In these figures, the red line in the box represents the median value, and the symbol '+' denotes an outlier. As can be seen, QBHHO scored the highest median value in most cases, thus providing better classification performance than the BDE, BMVO, GA, BSSA, and BFPA algorithms. Furthermore, we apply the Wilcoxon signed-rank test to examine whether the performance of the proposed QBHHO is significantly better than that of the other algorithms in this work. In the Wilcoxon signed-rank test, if the obtained p-value is less than 0.05, the performances of the two algorithms are significantly different; otherwise, the performances of the two algorithms are similar.
Table 15 exhibits the results of the Wilcoxon signed-rank test with p-values. In this table, the symbols "w/t/l" indicate that the proposed QBHHO was significantly better than (win), equal to (tie), or significantly worse than (lose) the other algorithms. Note that results with a p-value greater than 0.05 are underlined. Inspecting the results in Table 15, the performance of QBHHO was significantly better than that of the other algorithms on most datasets. The results again verify the superiority of QBHHO in solving the feature selection problem in classification tasks. To sum up, the experiments show the excellent properties of QBHHO in terms of classification accuracy and feature size. As an immediate conclusion, QBHHO is a useful tool, and it is appropriate for rehabilitation, clinical, and engineering applications.

Conclusions
In this paper, the BHHO and QBHHO algorithms are proposed to tackle the feature selection problem in classification tasks. BHHO integrates an S-shaped or V-shaped transfer function to convert the continuous HHO into a binary version, while the quadratic transfer function is introduced in QBHHO to enhance the performance of BHHO in feature selection. The proposed algorithms are validated on 22 benchmark datasets from the UCI machine learning repository. The performances of the proposed algorithms are evaluated based on the best fitness value, mean fitness value, standard deviation of the fitness value, classification accuracy, and feature size. Among the BHHO and QBHHO algorithms, QBHHO with the quadratic transfer function Q4 offered the best performance in the current work. Furthermore, the performance of QBHHO is compared with other algorithms, including BDE, BFPA, BMVO, BSSA, and GA. The experimental results show that the proposed QBHHO can usually achieve the highest classification accuracy as well as the smallest feature size when dealing with feature selection tasks. All in all, QBHHO is a powerful tool for solving the feature selection problem in classification tasks.
In the future, QBHHO can be applied to other feature selection and real-world problems. In addition, classifiers such as the support vector machine (SVM) and neural networks (NN) can be implemented to enhance the performance of the current algorithm. Moreover, QBHHO can be employed to solve other binary optimization problems, such as the knapsack problem and neural network optimization. Lastly, the quadratic transfer function can be integrated into other metaheuristic algorithms to solve binary optimization problems.