Feature Selection for High-Dimensional Datasets through a Novel Artificial Bee Colony Framework

Abstract: High-dimensional datasets generally contain many redundant and irrelevant features, which degrade classification performance and lengthen execution time. To tackle this problem, feature selection techniques are used to screen out redundant and irrelevant features. The artificial bee colony (ABC) algorithm is a popular meta-heuristic algorithm with a high exploration capacity but a low exploitation capacity. To balance these two capacities of the ABC algorithm, a novel ABC framework is proposed in this paper. Specifically, the solutions are first updated by the employed bee phase to retain the original exploration ability, so that the algorithm can explore the solution space extensively. Then, the solutions are modified in the onlooker bee phase by the updating mechanism of an algorithm with strong exploitation ability. Finally, we remove the scout bee phase from the framework, which not only reduces the exploration ability but also speeds up the algorithm. To verify this idea, the operators of the grey wolf optimization (GWO) algorithm and the whale optimization algorithm (WOA) are introduced into the framework to enhance the exploitation capability of the onlooker bees; the resulting algorithms are named BABCGWO and BABCWOA, respectively. On 12 high-dimensional datasets, these two algorithms are found to be superior to four state-of-the-art feature selection algorithms in terms of classification error rate, feature subset size and execution speed.


Introduction
Due to the rapid development of data acquisition technology, a great deal of digital information can be collected easily and assembled into datasets. However, not all features in a dataset are useful for a target problem: high-dimensional datasets contain many redundant and irrelevant features, so feature selection (FS) is a vital data preprocessing step in data mining and machine learning [1]. FS is, however, an NP-hard problem: an n-dimensional dataset has 2^n feature subsets, which makes exhaustive search infeasible. A good FS method not only yields higher classification accuracy but also reduces computational complexity. To improve the search efficiency of FS, many algorithms have been proposed; they can be roughly divided into three types: filter, wrapper and embedded methods [2]. Among them, the wrapper method is widely used because of its good classification ability. Therefore, this paper studies the wrapper FS method.
The wrapper approach mainly consists of three parts: a classifier, a feature subset evaluation criterion and a search technique [3]. Among them, an effective search technique is crucial for the performance of FS algorithms. Meta-heuristic methods, such as the artificial bee colony (ABC) algorithm [4], the particle swarm optimization (PSO) algorithm [5], the differential evolution (DE) algorithm [6], the grey wolf optimization (GWO) algorithm [7] and the whale optimization algorithm (WOA) [8], have been widely adopted as search techniques for FS. The main contributions of this paper are as follows: (1) To trade off the exploitation and exploration abilities of ABC, we use operators with strong exploitation abilities to enhance the exploitation ability in the onlooker bee phase; (2) We analyze the functional behavior of the scout bee phase and find that it may be redundant when dealing with high-dimensional FS problems, so eliminating this phase can reduce the computational time of the algorithm; (3) The proposed framework is designed as a general framework that can be used to adapt many ABC variants to FS problems.
The remainder of this paper is organized as follows: Section 2 briefly reviews related work on the ABC algorithm. In Section 3, the original ABC algorithm is introduced and analyzed. Section 4 presents the details of our proposed approach. In Section 5, the experimental results are compared and discussed. The proposed algorithms are further analyzed in Section 6. Finally, conclusions and future work are outlined in Section 7.

Related Works
Recently, meta-heuristic algorithms have attracted the attention of many scholars. They can be used to solve many real engineering tasks, such as path planning [14][15][16], feature selection [17][18][19], function optimization [20][21][22] and the traveling salesman problem [23][24][25]. Although various meta-heuristics have been developed for FS over the years, the significant increase in data dimensionality brings great challenges; it is therefore worth continuing to look for effective strategies that make meta-heuristic algorithms perform better on high-dimensional FS problems [26].
ABC was proposed in 2005 by Karaboga's group to optimize algebraic problems [27]. Single-objective ABC was first used to address the FS problem in 2012 [18,28]. Almost all meta-heuristic algorithms suffer from an imbalance between exploration and exploitation [29], and the ABC algorithm is no exception, so many studies on the ABC algorithm seek to improve its exploitation capability. To accelerate the convergence speed of the ABC algorithm, Chao et al. [30] proposed the KnABC algorithm, which introduced knee points into the employed bee and onlooker bee phases; the results show that this algorithm significantly reduces the number of features and increases the classification accuracy. Shunmugapriya et al. [31] used the ACO algorithm for colony initialization and took the initialization results as the food sources of the ABC algorithm for further optimization, thereby integrating the ACO and ABC algorithms; the resulting algorithm's performance was better than that of ABC or ACO alone. Djellali et al. [10] proposed two hybrid ABC algorithms, ABC-PSO and ABC-GA, which integrate the PSO algorithm and the GA algorithm into different bee phases of the original ABC framework, respectively; the experimental results showed that ABC-GA obtained better results than some other existing methods. Shunmugapriya et al. [32] proposed the EABC-FS algorithm, in which the employed bees and onlooker bees make full use of the best solutions in the current swarm to enhance the exploitation ability of the ABC algorithm; the experimental results showed that these fusion strategies greatly improved performance. Moreover, many other studies have shown that the ABC algorithm suffers from insufficient exploitation ability, which causes it to become trapped in local optima and converge slowly [28,33].
Although the above-mentioned hybrid variants of the ABC algorithm have achieved promising performance, they do not deeply analyze the exploitation and exploration abilities in different bee phases of the overall framework. Moreover, few of these algorithms have been developed for high-dimensional FS. Therefore, this paper proposes a novel exploration and exploitation trade-off ABC algorithm by modifying the original overall framework, and applies it to high-dimensional datasets. This new framework strengthens the exploitation ability in the onlooker bee phase by using operators with high exploitation capacities. Additionally, the function of scout bees is discussed in detail, and verified by experiments.

Introduction and Analysis of ABC Algorithm
The ABC algorithm is a kind of swarm intelligence (SI) algorithm that simulates the honey-gathering behavior of a bee swarm. This algorithm includes three types of bees: employed bees, onlooker bees and scout bees. Each food source corresponds to a solution to the given task, and the fitness of the solution indicates the quality of the food source. The overall process of the ABC algorithm is as follows [34].
First of all, a population of size SN is initialized randomly via Equation (1):

x_id = x_d^min + r * (x_d^max - x_d^min),    (1)

where i = 1, 2, . . . , SN and d = 1, 2, . . . , D. SN is the number of food sources and D is the dimensionality of the search space; the number of employed bees (and of onlooker bees) is equal to the number of food sources. r is a uniformly distributed random number in [0, 1], and x_d^min and x_d^max represent the minimum and maximum values of the dth dimension, respectively. After initialization, the bees begin to search.
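Equation (1) can be sketched in a few lines of code. The paper's implementation is in MATLAB; the following is a minimal Python sketch, with the function name chosen for illustration:

```python
import random

def initialize_population(sn, dim, x_min, x_max):
    """Random initialization of SN food sources (Equation (1)):
    x_id = x_min_d + r * (x_max_d - x_min_d), with r ~ U[0, 1] per component."""
    return [[x_min[d] + random.random() * (x_max[d] - x_min[d])
             for d in range(dim)]
            for _ in range(sn)]
```

Each component is drawn independently, so every food source starts at a uniformly random point of the box defined by the per-dimension bounds.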
(1) Employed bee phase: According to Equation (2), a new food source v_i is produced around the current food source x_i:

v_id = x_id + ϕ_id * (x_id - x_kd),    (2)

where ϕ_id is a random number within [-1, 1], and x_id and x_kd represent the dth dimension of x_i and of a randomly chosen neighbor x_k (k ≠ i), respectively. v_i is compared with x_i: if the fitness of v_i is superior to that of x_i, then x_i is replaced by v_i for entry into the next step and its counter is reset to 0; otherwise, x_i is retained and its counter increases by 1.
(2) Onlooker bee phase: Each onlooker bee selects a food source depending on the probability value p_i via the roulette-wheel scheme. p_i is associated with the food source information provided by the employed bees and is generated by Equation (3):

p_i = fit_i / Σ_{j=1}^{SN} fit_j,    (3)

where fit_i is the fitness value of solution x_i. Each selected food source is updated using Equation (2).
(3) Scout bee phase: If the counter of a food source is greater than or equal to a preset number of trials (usually called the limit for abandonment), this food source is discarded. The employed bee of an abandoned food source turns into a scout bee, which regenerates a food source via Equation (1) to replace the abandoned one.
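The employed-bee update of Equation (2) and the roulette probabilities of Equation (3) can be sketched as follows (a Python illustration; function names are ours, not from the paper):

```python
import random

def employed_bee_candidate(pop, i):
    """Generate a candidate via Equation (2): v_id = x_id + phi_id * (x_id - x_kd),
    with a random partner k != i and a single random dimension d perturbed."""
    sn, dim = len(pop), len(pop[0])
    k = random.choice([j for j in range(sn) if j != i])
    d = random.randrange(dim)
    v = pop[i][:]                       # copy the current food source
    phi = random.uniform(-1.0, 1.0)
    v[d] = pop[i][d] + phi * (pop[i][d] - pop[k][d])
    return v

def selection_probabilities(fitnesses):
    """Roulette-wheel probabilities of Equation (3): p_i = fit_i / sum_j fit_j."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]
```

The greedy comparison between v_i and x_i (and the counter bookkeeping) then decides which of the two survives into the next iteration.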
In the ABC algorithm, the employed bees are in charge of finding viable solutions throughout the search area and providing the onlooker bees with food information. Based on this information, the onlooker bees search for new food sources near the food sources already found. In the onlooker bee phase, the same updating formula (Equation (2)) is used as in the employed bee phase. As we can see, the ABC algorithm does not take advantage of the elitism principle: both the employed bees and the onlooker bees use Equation (2) to obtain new food sources. This equation has a powerful global search ability, but its search efficiency is low and its exploitation ability is weak. The roulette-wheel scheme makes food sources with higher fitness values more likely to be selected, so its use in the onlooker bee phase strengthens the exploitation ability, but this exploitation ability remains far weaker than the algorithm's powerful exploration ability. Therefore, as Hong and Ahn [35] have pointed out, the exploitation level of the onlooker bee phase should be increased. In addition, the scout bee phase not only reduces the probability of falling into a local optimum, but also reduces the rate of convergence; under the action of the scout bees, even the optimal solution may be discarded [34]. In summary, the ABC algorithm has an outstanding exploration capacity but inefficient exploitation, and this imbalance prevents it from reaching better solutions because convergence is too slow.

The Proposed Framework
Striking a balance between exploration and exploitation ability has a great impact on a meta-heuristic algorithm's performance. For an algorithm with good exploration ability, we can restore this balance by introducing operators with strong exploitation ability. Based on the analysis in Section 3, this paper presents a novel ABC framework with three key points:
(1) The employed bee phase of the ABC algorithm is retained, so that the framework can explore the search space widely and avoid premature convergence to a local optimum.
(2) The updating mode of the onlooker bee phase is replaced by a new updating strategy inspired by algorithms with more powerful exploitation capacities, whose searching schemes are introduced as operators. According to our observation, high diversity in the bee swarm helps the algorithm explore more of the potentially promising search space, but after a certain period the solutions should converge toward optimal solutions, with a corresponding reduction in colony diversity. We believe that applying operators with strong exploitation abilities reduces the diversity of the algorithm in the late stage and brings a higher convergence speed, and therefore helps our framework find better solutions.
(3) The scout bee phase is removed, because its exploration would increase the diversity of the algorithm in the later period; moreover, it wastes execution time and consumes computational resources and memory.
Figure 1 illustrates our proposed framework and its differences from the original ABC algorithm.
Overall, the two methods utilize the same updating mechanism in the employed bee phase. However, without the scout bee phase, our method does not need to compute the value of the counter throughout the algorithm. Since the onlooker bee phase in our method is updated by the operators of an algorithm with strong exploitation abilities, we do not use roulette-wheel selection, so we do not need to calculate the selection probability of each individual.
FS is, in essence, an optimization problem in a binary search space: the value of each element of a solution is limited to 0 or 1 [36]. However, the ABC algorithm was originally proposed for continuous spaces. To adapt our proposed ABC framework to FS, we need to transform the continuous values to binary values. This transformation is fulfilled by Equation (4):

x_id = 1 if sigmoid(x_id) > r, and x_id = 0 otherwise,    (4)

where r is a random value in [0, 1]. The function sigmoid(x) is formulated as in Equation (5):

sigmoid(x) = 1 / (1 + e^(-x)).    (5)
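Equations (4) and (5) amount to a stochastic thresholding of each continuous component. A minimal Python sketch:

```python
import math
import random

def sigmoid(x):
    """Equation (5): sigmoid(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(solution):
    """Equation (4): each bit becomes 1 if sigmoid(x_id) > r (r ~ U[0, 1]), else 0."""
    return [1 if sigmoid(x) > random.random() else 0 for x in solution]
```

Strongly negative components are almost always mapped to 0 and strongly positive ones to 1, while components near zero are assigned randomly, which preserves some diversity during binarization.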

Abandonment of Scout Bee Phase to Reduce the Exploration Capacity
As the last phase of the ABC algorithm, the scout bee phase abandons any individual that has not improved for a long time and then creates a new individual to replace it. This phase gives the algorithm a certain exploration advantage. However, it has been shown [34] that the scout bee phase contributes little when processing high-dimensional tasks, and its exploration risks discarding promising solutions; as such, we remove this phase. In the following experiments, we analyze the influence of removing the scout bee phase on the diversity and convergence ability of the algorithm.

Enhancement of Exploitation: Illustrative Examples with GWO and WOA
The original ABC algorithm has a low exploitation capacity, especially in the onlooker bee phase. The enhancement of the exploitation capacity in this phase is the most vital factor to regaining the trade-off between exploitation and exploration in the whole procedure of the ABC algorithm. There are many algorithms that have powerful exploitation capacities, such as the GWO algorithm and WOA algorithm. Compared with other algorithms, the GWO algorithm and WOA algorithm make full use of the information related to excellent individuals in the updating process, which gives them powerful exploitation abilities. As such, we take these two algorithms as examples. In our research, we fuse each algorithm as an operator into the onlooker bee phase, and replace the updating mode of the original ABC algorithm in the same phase to enhance the exploitation capacity of our whole framework.
In the GWO algorithm, the grey wolves are divided into four hierarchies, namely alpha (α), beta (β), delta (δ) and omega (ω). In solving optimization problems, the α wolf is the best solution, the β and δ wolves are the second- and third-best solutions, respectively, and the ω wolves are the remaining candidates. The α, β and δ wolves lead the pack to search for prey. The position of each wolf is updated as follows:

D_α = |C_1 * X_α - X(t)|, D_β = |C_2 * X_β - X(t)|, D_δ = |C_3 * X_δ - X(t)|,
X_1 = X_α - A_1 * D_α, X_2 = X_β - A_2 * D_β, X_3 = X_δ - A_3 * D_δ,
X(t + 1) = (X_1 + X_2 + X_3) / 3,

where X_α, X_β and X_δ refer to the position vectors of α, β and δ, respectively; D_α, D_β and D_δ denote the distances between the current wolf and the prey (α, β, δ), respectively; and t indicates the current iteration. A = 2a * r_1 - a and C = 2 * r_2, where r_1 and r_2 are uniformly distributed random numbers in [0, 1], and the value of a decreases linearly from 2 to 0 as the number of cycles increases. The three best solutions are learnt from during the updating process of the GWO algorithm, which gives it a strong exploitation ability [7,37,38].

The WOA algorithm is also an SI algorithm, which employs the current optimal solution as the prey; the search agents update their positions based on the best solution. The mathematical model is described by the equations:

X(t + 1) = X_p(t) - A * D (shrinking encircling),
X(t + 1) = D' * e^(bl) * cos(2πl) + X_p(t) (spiral updating),
X(t + 1) = X_rand(t) - A * D'' (search for prey),

where X_p(t) is the best search agent, X_rand(t) is a random position vector, b is a manually determined constant, and l is a random number in [-1, 1]. The distances D, D' and D'' are calculated as follows:

D = |C * X_p(t) - X(t)|, D' = |X_p(t) - X(t)|, D'' = |C * X_rand(t) - X(t)|,

where A and C are calculated in the same way as above. The WOA updating process selects the best solution to learn from, which makes the exploitation ability of the algorithm more powerful [8,12]. This paper introduces the operators of the GWO algorithm and the WOA algorithm into our proposed framework to verify its validity; the two resulting methods are named BABCGWO and BABCWOA, respectively. The pseudocode is outlined in Algorithm 1.
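The two operators can be sketched per dimension as follows. This is a hedged Python illustration, not the paper's MATLAB code: A and C are redrawn per dimension (a common implementation choice), and only the exploitation branches of WOA that the onlooker phase relies on are shown (the X_rand exploration branch is omitted):

```python
import math
import random

def gwo_update(x, x_alpha, x_beta, x_delta, a):
    """GWO position update: average of three moves toward the alpha,
    beta and delta wolves, with A = 2a*r1 - a and C = 2*r2."""
    new = []
    for d in range(len(x)):
        comps = []
        for leader in (x_alpha, x_beta, x_delta):
            A = 2 * a * random.random() - a
            C = 2 * random.random()
            dist = abs(C * leader[d] - x[d])     # D = |C*X_leader - X|
            comps.append(leader[d] - A * dist)   # X_i = X_leader - A*D
        new.append(sum(comps) / 3.0)
    return new

def woa_update(x, x_best, a, b=1.0):
    """WOA position update: with probability 0.5 shrink toward the best
    agent, otherwise follow the logarithmic spiral around it."""
    new = []
    for d in range(len(x)):
        if random.random() < 0.5:
            A = 2 * a * random.random() - a
            C = 2 * random.random()
            dist = abs(C * x_best[d] - x[d])     # D = |C*X_p - X|
            new.append(x_best[d] - A * dist)     # encircling: X_p - A*D
        else:
            l = random.uniform(-1.0, 1.0)
            dist = abs(x_best[d] - x[d])         # D' = |X_p - X|
            new.append(dist * math.exp(b * l) * math.cos(2 * math.pi * l)
                       + x_best[d])
    return new
```

Both operators pull every individual toward the best known solution(s), which is exactly the exploitation pressure the onlooker bee phase of the proposed framework needs.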

Algorithm 1. Pseudocode of BABCGWO/BABCWOA
Input: Population size SN, maximum number of iterations NMAX.
Output: The optimal individual x_best and the best fitness value f(x_best).
Initialize the population by using Equation (1) and binarize it via Equation (4).
Evaluate the fitness value of each individual.
While the number of iterations is less than NMAX
    // Employed bee phase
    For each individual, generate a candidate via Equation (2), binarize it via Equation (4),
    and keep the better of the candidate and the current individual.
    // Onlooker bee phase
    For each individual, generate a candidate via the GWO operator (BABCGWO) or the WOA
    operator (BABCWOA), binarize it via Equation (4), and keep the better of the two.
    Update x_best.
End
Output x_best and f(x_best).
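The overall loop of the framework can be condensed into a toy end-to-end sketch. Everything here is illustrative Python under stated assumptions: the objective is a stand-in "error rate" (fraction of zero bits, lower is better), and the onlooker operator is a simplified best-solution pull in the spirit of the GWO/WOA operators, not the paper's exact MATLAB code:

```python
import math
import random

def toy_fitness(bits):
    # Stand-in objective: fraction of zero bits (pretend "error rate").
    return 1.0 - sum(bits) / len(bits)

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-50.0, min(50.0, x))))  # clamped for safety

def _binarize(xs):
    return [1 if _sigmoid(v) > random.random() else 0 for v in xs]

def run_framework(sn=20, dim=30, n_max=50):
    pop = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(sn)]
    bits = [_binarize(x) for x in pop]
    fits = [toy_fitness(b) for b in bits]
    for t in range(n_max):
        # Employed bee phase: original ABC update (Equation (2)) + greedy selection.
        for i in range(sn):
            k = random.choice([j for j in range(sn) if j != i])
            d = random.randrange(dim)
            v = pop[i][:]
            v[d] += random.uniform(-1, 1) * (pop[i][d] - pop[k][d])
            vb = _binarize(v)
            fv = toy_fitness(vb)
            if fv < fits[i]:
                pop[i], bits[i], fits[i] = v, vb, fv
        # Onlooker bee phase: exploitation operator pulling toward the best
        # solution (stand-in for GWO/WOA); no roulette wheel, no scout phase.
        best = pop[fits.index(min(fits))][:]
        a = 2.0 * (1 - t / n_max)  # a decreases from 2 to 0
        for i in range(sn):
            v = [best[d] - (2 * a * random.random() - a)
                 * abs(2 * random.random() * best[d] - pop[i][d])
                 for d in range(dim)]
            vb = _binarize(v)
            fv = toy_fitness(vb)
            if fv < fits[i]:
                pop[i], bits[i], fits[i] = v, vb, fv
    i_best = fits.index(min(fits))
    return bits[i_best], fits[i_best]
```

Note the two structural differences from basic ABC that define the framework: the onlooker phase no longer reuses Equation (2) or the roulette wheel, and no counters are maintained because the scout phase is gone.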

Computational Complexity Analysis
The computational complexity of an algorithm is an important measure of its running time, usually expressed in big O notation. According to the above analysis, the basic ABC algorithm, the BABCGWO algorithm and the BABCWOA algorithm have the same computational complexity, and the total computational complexity over all cycles is O(SN * D * NMAX). In Section 5.2, we conduct an experimental analysis of the specific execution time of each algorithm.

Experimental Design
To verify the effectiveness of the proposed FS algorithms, a series of experiments is carried out on 12 standard datasets, including two-category and multi-class datasets, obtained from http://featureselection.asu.edu/datasets.php (accessed on 18 January 2020) and http://archive.ics.uci.edu/mL/datasets.php (accessed on 18 January 2020). They include microarray gene expression data, image detection data, email text data and so on. They come from different application fields, the number of features varies from 310 to 22,283, and the number of instances varies from 62 to 165, which allows a comprehensive evaluation of the proposed and compared algorithms. Table 1 shows the details of the datasets. We verify the effectiveness of the BABCGWO and BABCWOA algorithms by comparing them with the ABC algorithm and several variants on the high-dimensional datasets. The ABC algorithm without the scout bee phase is named the none-scout ABC algorithm (NSABC). The variants of the BABCGWO and BABCWOA algorithms with the scout bee phase added back are named the BABCGWO with scout bees algorithm (BABCGWOWS) and the BABCWOA with scout bees algorithm (BABCWOAWS), respectively. To avoid contingency, all algorithms are run 10 times independently. The population size is set to 50 and the number of iterations to 100. Each algorithm is implemented in MATLAB.
A suitable classifier is important when assessing feature subsets. K-nearest neighbor (KNN) [39] is a common classification method that assigns a sample to a category according to its K nearest neighbors; in this research, the value of K is set to 5. In order to reduce the influence of over-fitting, the average classification error rate of 10-fold cross-validation is taken as the fitness value. The fitness function is computed as follows:

fitness = (1/10) * Σ_{k=1}^{10} ER_k,

where ER_k is the classification error rate on the kth validation fold.
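The fitness evaluation described above (a KNN classifier with K = 5 inside k-fold cross-validation, with the mean fold error rate as fitness) can be sketched in plain Python. This is an illustrative re-implementation, not the paper's MATLAB code; fold assignment here is a simple shuffled round-robin:

```python
import random
from collections import Counter

def knn_predict(train, labels, x, k=5):
    """Classify x by majority vote among its k nearest training samples
    (squared Euclidean distance)."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_error_rate(data, labels, n_folds=10, k=5):
    """Fitness of a feature subset: mean classification error rate over
    n_folds cross-validation folds with a KNN classifier."""
    idx = list(range(len(data)))
    random.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = []
    for f in range(n_folds):
        test_set = set(folds[f])
        train = [i for i in idx if i not in test_set]
        wrong = sum(knn_predict([data[i] for i in train],
                                [labels[i] for i in train],
                                data[j], k) != labels[j]
                    for j in folds[f])
        errors.append(wrong / len(folds[f]))
    return sum(errors) / n_folds
```

In the actual wrapper FS loop, `data` would contain only the columns selected by the candidate feature subset, so each fitness evaluation retrains and revalidates the classifier on that subset.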

Experimental Results and Analysis
To test the performance of our proposed framework, the diversity, convergence curves, classification error rate, feature subset size and computing time of the algorithms are investigated in this subsection. The best results are shown in bold in the tables.

Figure 2 shows the diversity curves of the six algorithms on the 12 datasets. The diversity of ABC is clearly higher than that of the other algorithms on all datasets except DBWorld and Pixraw10P. This is consistent with the search process of the ABC algorithm: its exploration is stronger, so its diversity is higher. In the early stage, high diversity helps to avoid being trapped in local optima, but after a limited number of cycles the algorithm needs to converge on the optimal solution. The diversity of NSABC decreases considerably, which weakens the exploration ability of the ABC algorithm. After the GWO and WOA operators are introduced into the framework, the diversity of these algorithms decreases faster than that of the NSABC algorithm on most datasets. The lower the diversity, the weaker the exploration ability and the stronger the exploitation ability of the algorithm; this shows that the introduction of the GWO and WOA operators strengthens the exploitation ability effectively. In addition, Figure 2 shows that the diversity curves of the BABCGWOWS and BABCGWO algorithms are similar, as are those of the BABCWOAWS and BABCWOA algorithms; it can be seen that the scout bees have little effect on the diversity of the algorithms in this framework.

The convergence curves of the algorithms are plotted in Figure 3, which shows the decline of the error rate. Each curve is plotted by averaging the error rate obtained at each generation over the 10 runs.
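The paper does not spell out the diversity formula behind Figure 2; one common swarm-diversity measure, shown here as an assumed stand-in, is the mean distance of the individuals to the population centroid:

```python
def swarm_diversity(pop):
    """Mean Euclidean distance of individuals to the population centroid:
    a common (assumed, not the paper's stated) swarm-diversity measure."""
    sn, dim = len(pop), len(pop[0])
    centroid = [sum(x[d] for x in pop) / sn for d in range(dim)]
    return sum(sum((x[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5
               for x in pop) / sn
```

Under any such measure, a population collapsing onto the best solution yields a diversity near zero, which is the late-stage behavior the exploitation operators are meant to induce.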
The convergence results of NSABC and ABC are similar on the LSVT, Yale, colon, DBWorld, DLBCL, Pixraw10P and GLI_85 datasets, and the error rates reached by the NSABC algorithm are slightly higher than those of the ABC algorithm on the other datasets. Compared with the ABC algorithm, the BABCGWO and BABCWOA algorithms clearly converge faster to good-quality solutions. On most datasets, the error rates of BABCGWO and BABCWOA are similar to or lower than those of BABCGWOWS and BABCWOAWS, respectively. It can be concluded that the BABCGWO and BABCWOA algorithms perform better than ABC in terms of both convergence speed and solution quality.

Table 2 shows the worst, best, mean and standard deviation of the error rate for each algorithm. The ultimate goal of FS is to improve generalization performance, i.e., to achieve a lower error rate on unseen data; a lower error rate indicates that the algorithm has found a better feature subset. Since almost all SI-based algorithms are stochastic in nature, they may produce different results in each run; therefore, the standard deviation is used to measure the variation of the results. The smaller the standard deviation, the more stable the algorithm.
From Table 2, we can see that the error rate of the NSABC algorithm is slightly higher than that of the ABC algorithm on most datasets, but the increase is never more than 0.01. After the introduction of the operators with strong exploitation ability in the onlooker bee phase, the error rate improved on all datasets except Pixraw10P. Specifically, BABCWOA's average error rate is at least 0.005 lower than that of the ABC algorithm, with the smallest gap on the Prostate dataset. On the Yale dataset the average error rate decreased the most, by nearly 0.06; on the SRBCT and DBWorld datasets it decreased by about 0.05; and on the other datasets it decreased by about 0.01 to 0.03. The average error rate of BABCGWO was reduced even more: on the Yale dataset, BABCGWO's error rate is 0.125 lower than the ABC algorithm's, and its average error rate is at least 0.017 lower than ABC's, with the smallest gap on the GLI_85 dataset. On seven datasets the average error rate decreased by more than 0.04. As can be seen, the error rate of BABCGWOWS did not change much compared to BABCGWO on most datasets, and the comparison between BABCWOAWS and BABCWOA is similar.
In terms of the worst error rate, the BABCWOA algorithm improved on half of the datasets, while the BABCGWO algorithm improved on all datasets except Pixraw10P; the maximum error rate decreased the most on the Yale dataset (by 0.119). Judging from the standard deviations, although the error rate of the BABCWOA algorithm improved on the whole, it was not as stable as the ABC algorithm on a few datasets, such as Yale and ALLAML; the stability of the BABCGWO algorithm is similar to that of the ABC algorithm. It can be concluded that introducing operators with strong exploitation ability into the proposed framework can indeed improve the ABC algorithm to a certain extent, and that the scout bee phase is not active and has little effect on reducing the error rate. As per the results in Table 3, the improved algorithms select more features than the ABC algorithm. Although dimensionality reduction is one of the targets of FS, achieving a lower error rate is more important in many practical applications; although the ABC algorithm selects a small feature subset, the error rates in Table 2 show that such a small number of selected features cannot achieve a low error rate.
According to Figure 4, the calculation time is clearly reduced when the scout bee phase is removed. After the introduction of the GWO operator, the BABCGWO algorithm showed little difference in time compared with the ABC or NSABC algorithms on some datasets, and BABCGWO was much faster than the ABC algorithm on the colon, SRBCT, Leukemia1, DLBCL, ALLAML, Pixraw10P, Prostate and Leukemia2 datasets. After the introduction of the WOA operator, the running time of the BABCWOA algorithm increased to some extent on all datasets except SRBCT and ALLAML, which may be because the running time is also proportional to the number of selected features; as can be observed from Table 3, the BABCWOA algorithm selects more features than the other algorithms. To sum up, the proposed framework effectively speeds up convergence, reduces diversity and finds better solutions. Although the number of selected features increases, the classification error rate decreases significantly after an operator with strong exploitation ability is introduced into the framework. The scout bee phase has very little effect on improving the fitness of the solutions while consuming computational resources and memory, so it is omitted in this framework. From the above analysis, we conclude that the proposed framework can effectively improve the performance of the ABC algorithm.

Further Analysis
The comparisons in Section 5 show that the proposed BABCGWO and BABCWOA algorithms are more efficient than the ABC algorithm. To make the evaluation complete, we further verify their effectiveness by comparing them with four state-of-the-art FS algorithms on the high-dimensional datasets: the popular PSO variants CSO [40] and VSCCPSO [41], the novel GWO variant ALO_GWO [42], and an ABC variant named ACABC [31]. In particular, CSO, VSCCPSO and ALO_GWO have achieved excellent results on high-dimensional datasets. The parameters of CSO, VSCCPSO, ALO_GWO and ACABC are set as in their original papers.
In this section, the classification error rate, the feature subset size, the computational time and the convergence curves of the six algorithms are investigated. The best results are shown in bold in the tables. To further verify the improvement afforded by the two proposed algorithms, Wilcoxon's rank sum test [43,44] with a significance level of 0.05 is applied to test the statistical significance of the differences between algorithms: the error rate, the number of features and the execution time of the two proposed algorithms are each compared with those of the other four FS algorithms. In the Wilcoxon rank sum test tables, the symbol "+" indicates that the proposed algorithm is significantly better than the compared algorithm, the symbol "=" means that the performance of the two algorithms is similar, and the symbol "−" is the opposite of "+", indicating that the proposed algorithm is significantly worse than the compared algorithm.

Table 4 shows the worst, best, average and standard deviation of the error rate for each algorithm. In terms of the maximum and average error rates, the BABCGWO algorithm performed better than all the other algorithms on all datasets except Yale and Pixraw10P, and its error rate was several percentage points lower on most datasets. The BABCWOA algorithm outperformed the four compared algorithms on half of the datasets, and achieved the best average error rate of all algorithms on the Pixraw10P dataset. In terms of the minimum error rate, the BABCGWO algorithm was lower than the other algorithms on all datasets except LSVT, Yale and GLI_85, and the BABCWOA algorithm also outperformed the four algorithms on more than half of the datasets. The standard deviation of the BABCGWO algorithm was ≤0.01 on most datasets and 0.02 only on the colon dataset, which is significantly superior to most of the other algorithms, indicating that the BABCGWO algorithm has better stability than the other algorithms.
However, the standard deviation of the BABCWOA algorithm is mostly about 0.02, which is close to that of the four compared algorithms. To further determine whether the error rates of the proposed algorithms differ significantly from those of the other algorithms, we use the Wilcoxon rank-sum test. As can be seen from the error-rate results in Table 5, the error rates of the proposed algorithms were significantly lower than those of the ACABC and CSO algorithms on almost all 12 datasets. Compared with the VSCCPSO algorithm, the BABCGWO algorithm was superior on all datasets except LSVT, Yale and Pixraw10P; the BABCWOA algorithm was also significantly better than VSCCPSO on some datasets, with little difference in error rate between the two on most of the remaining ones. Compared with the ALO_GWO algorithm, the error rate of the BABCGWO algorithm was significantly lower on all datasets except SRBCT and DBWorld, while there was almost no notable difference between the BABCWOA and ALO_GWO algorithms.

Table 5. Wilcoxon rank sum test on error rates of algorithms.

The experimental results in Table 6 show the average number of selected features and the average execution time of the six algorithms across 10 runs on the 12 datasets. One of the purposes of FS is to remove redundant and irrelevant features so as to strengthen the classification performance of the algorithm. For the same error rate, a smaller number of selected features indicates that the algorithm has found a better feature subset.
The experimental results show that the average number of features selected by the BABCGWO algorithm is smaller than that of the other algorithms on all 12 datasets, and its running time is shorter than that of the other algorithms on all datasets except Yale, where it is second only to the CSO algorithm. The BABCWOA algorithm selects fewer features than the four compared algorithms on all datasets except LSVT and Yale; it runs faster than the compared algorithms on 8 of the 12 datasets and is second only to the CSO algorithm on the remaining 4. Therefore, although the error rate of the BABCWOA algorithm is not significantly improved on some datasets, it does reduce the size of the feature subset and the running time. This indicates that, compared with the other algorithms, the proposed algorithms can find a smaller feature subset in a shorter time while achieving a lower error rate.

The Wilcoxon rank-sum test results in Table 7 show that the feature subsets selected by the proposed algorithms are significantly smaller than those of the ACABC and CSO algorithms on all 12 datasets. Compared with ALO_GWO and VSCCPSO, the number of features selected by the proposed algorithms fails to be significantly lower on only a few datasets.
As can be seen from the results in Table 8, the proposed algorithms are comparable to, or slower than, the other algorithms on only a few datasets; on most datasets, the two algorithms are significantly faster.

Conclusions
There are often redundant and irrelevant features in high-dimensional datasets, so FS methods are used for data preprocessing. Given the strong exploration ability of the ABC algorithm, this study proposes a framework that integrates the updating operators of algorithms with strong exploitation ability into the ABC algorithm, so as to balance exploration and exploitation. Moreover, since removing the scout bee phase both curbs the excess exploration ability and saves computational resources when processing high-dimensional datasets, the scout bee phase is left out of our framework. On this basis, the BABCGWO and BABCWOA algorithms are proposed to deal with the FS problem in high-dimensional datasets. The experimental results on 12 high-dimensional datasets show that the BABCGWO and BABCWOA algorithms are significantly superior to the compared algorithms as regards dimensionality reduction, classification error rate and execution time. This shows that the proposed framework can balance the capabilities of exploration and exploitation, and effectively improve the overall performance in FS.
However, the proposed method mainly focuses on the single-objective feature selection problem, where the main aim is to reduce the classification error rate. In the future, we will investigate a multi-objective FS algorithm that simultaneously maximizes the classification performance and minimizes the number of selected features. Moreover, we would like to employ the algorithms in different domains to verify their universality. In conclusion, the proposed framework is effective: the exploration ability of the ABC algorithm is successfully combined with the updating mechanism of an algorithm with strong exploitation ability, such that the BABCGWO and BABCWOA algorithms can find solutions with lower error rates and fewer features in a shorter period of time.