An Improved Artificial Bee Colony for Feature Selection in QSAR

Abstract: Quantitative Structure–Activity Relationship (QSAR) aims to correlate molecular structure properties with corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of a honey bee colony, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only avoids the need to convert continuous values into discrete values but also reduces the computational cost. In addition, a novel greedy selection strategy, which prefers feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used for the evaluation of the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we also study whether to implement the scout bee phase when tackling regression problems and draw the interesting conclusion that the scout bee phase is redundant when dealing with feature selection in low-dimensional and medium-dimensional regression problems.


Introduction
Quantitative structure-activity relationship (QSAR) plays a vital role in drug design and discovery [1]. It aims to build the relationship between the molecular structure properties of chemical compounds and their corresponding biological activities [2]. In QSAR modeling, the structure properties of the chemical compounds are encoded by a variety of features (molecular descriptors), such as topological, constitutional, and thermodynamic parameters. QSAR models can be defined as regression or classification models by using different computational strategies [3]. The features are related to biological activities by using statistical methods or artificial intelligence approaches, such as Multiple Linear Regression (MLR) [4], Support Vector Regression (SVR) [5], Boosted Tree [6], and Partial Least Squares (PLS) regression [7]. In particular, machine learning methods have become extensively used in this field during the last few years [8][9][10][11][12][13][14][15]. These methods effectively improve the accuracy of QSAR modeling to a certain extent.
However, several computational issues must be addressed when QSAR models are inferred by machine learning methods. One of them is handling the complexity of the data sets when selecting the features that are important for defining a particular QSAR model. Specifically, not all features are related to the activity, and redundant or irrelevant features may cause over-fitting or weak correlation [16]. An optimal feature subset containing only relevant and non-redundant features increases the prediction accuracy and the interpretability of the QSAR model. Thus, feature selection (FS), which selects an optimal subset of all features, is a vital pre-processing step in QSAR studies to increase the interpretability and improve the prediction accuracy [17].
In principle, feature selection is an NP-hard combinatorial problem. For a search space with D dimensions, the number of subsets to search is 2^D. In other words, the search space grows exponentially with the dimension of the given problem, so an exhaustive search is intractable with limited computational resources.
Evolutionary Computation (EC) techniques are optimization methods inspired by scientific understanding of natural or social behavior, which can be regarded as search procedures at some abstraction level [18]. In general, these algorithms can be classified as either Evolutionary Algorithms (EAs) or Swarm Intelligence (SI) algorithms [19]. EAs start by randomly generating a set of candidate solutions, iteratively combine these solutions, and implement survival of the fittest until an acceptable solution is achieved. The classic EAs include Genetic Algorithm (GA) [20], Differential Evolution (DE) [21], Biogeography-Based Optimization (BBO) [22], and Genetic Programming (GP) [23], etc. SI algorithms start with a set of individuals, and a new set of individuals is created based on historical information and related information in each iteration. A considerable number of new SI algorithms have emerged, such as Ant Colony Algorithm (ACO) [24], Bat Algorithm (BA) [25], Firefly Algorithm (FA) [26], Cuckoo Search (CS) [27], Coyote Optimization Algorithm (COA) [28], and Social Network Optimization (SNO) [29].
SI is a relatively new category of evolutionary computation compared with EAs and other single-solution-based approaches, and it has received considerable attention for feature selection due to its potential global search ability. In particular, the interaction between features can be considered during the screening process, which overcomes a shortcoming of traditional feature selection algorithms. The surveys [30,31] document the proven usage of SI algorithms for FS.
The Artificial Bee Colony (ABC) [32] algorithm, which simulates the intelligent foraging behavior of a honeybee swarm, is one of the most well-known SI algorithms. Karaboga et al. concluded that, although ABC uses fewer control parameters, it performs better than, or at least comparably to, other typical SI algorithms [33]. Ozger et al. [34] carried out a comparative study of different binary ABC algorithms for feature selection. BitABC [35] employs bitwise operators such as AND, OR, and XOR to generate new candidate solutions, while other binary ABC algorithms use different functions to convert continuous vectors into binary vectors, such as the rounding function [36], sigmoid function [37], and tangent function [38]. The experimental results showed that BitABC generated better feature subsets in shorter computational time. Moreover, many studies combine ABC with other optimization algorithms, such as DE [39], ACO [40], and PSO [41], and they achieve promising results as well. However, the ABC algorithm has seldom been applied to regression and prediction problems.
To improve the accuracy and interpretability of QSAR, which is a regression and prediction problem, we apply the ABC algorithm to feature selection in QSAR. The major novelties and contributions of our study are as follows: (1) To avoid converting the continuous space into a discrete space and to reduce the consumption of computing resources, a two-point crossover operator and a two-way mutation operator are employed to generate food sources in the employed bee and onlooker bee phases. (2) To achieve fast convergence, a novel greedy selection strategy is employed, which greatly reduces the possibility of food sources being abandoned. (3) Furthermore, we investigate the influence on QSAR performance of different threshold values that determine whether to implement the scout bee phase, and draw the interesting conclusion that the scout bee phase is redundant when dealing with feature selection in low-dimensional and medium-dimensional regression problems.
The rest of this paper is organized as follows: Section 2 reviews the related work of FS methods based on SI. Section 3 briefly describes QSAR modeling and the FS problem. Section 4 presents the basic ABC algorithm and proposes two improved ABC variants for FS in QSAR. Section 5 describes the experimental datasets and parameter settings. Section 6 presents the experimental results. Conclusions are given in Section 7.

Related Work
SI algorithms are well known for their global exploration capability and have recently been gaining more attention from the feature selection community. The well-known "No Free Lunch" (NFL) theorem [42] proves that no heuristic algorithm can solve all types of optimization problems. Specifically, since the exploration-exploitation balance is an unsolved issue within SI algorithms, each SI algorithm introduces an experimental solution through a combination of deterministic models and stochastic principles. Under such conditions, each SI algorithm holds distinctive characteristics that properly satisfy the requirements of particular problems [18]. Therefore, a particular SI algorithm is not able to solve all problems adequately. This motivates many researchers to investigate the effectiveness of different algorithms in different fields. Between 2010 and 2020, a total of 85 papers used SI algorithms for feature selection in different fields [30].
For medical applications, Mehrdad et al. integrated node centrality with the PSO algorithm [43] to improve FS performance. Neggaz et al. [44] applied the sine-cosine algorithm and the disruption operator to the Salp Swarm Algorithm (SSA) to improve the accuracy of disease diagnosis. Mafarja and Mirjalili [45] proposed a novel Whale Optimization Algorithm (WOA) for FS, in which crossover and mutation operators are used to enhance the exploitation of the WOA algorithm. An ABC-based FS method suppressed less relevant features in breast cancer datasets; then, to minimize the potential of ABC being trapped in a local optimum, the classification accuracy of GBDT was employed to evaluate the quality of the inputs [46]. To select a DNA microarray subset of relevant and non-redundant features for computational complexity reduction, Indu et al. [47] proposed a two-phase hybrid model based on improved binary PSO (iBPSO). A recursive PSO method was developed by Prasad et al. [48] for gene selection. Before this, an Ant Colony Optimization-selection (ACO-S) was used to generate a gene subset of the smallest size with salient features while yielding high classification accuracy [49]. Furthermore, Yan et al. [50] hybridized V-WSP, proposed by Ballabio et al. [51], with PSO to improve the accuracy of laser-induced breakdown spectroscopy. Moreover, to solve the feature selection problem for acoustic defect detection, a single-objective feature selection algorithm hybridizing the Shuffled Frog Leaping Algorithm (SFLA) with an improved minimum-redundancy maximum-relevancy criterion (ImRMR) was proposed by Zhang et al. [52]. To handle the challenge in network detection that detecting anomalies from high-dimensional network traffic features is time-consuming, an FA-based feature selection was attempted by Selvakumar and Muneeswaran [53] to obtain an optimized detection rate.
In addition, FS methods based on the firefly algorithm have been investigated for Arabic text classification [54] and facial expression classification [55].
Additionally, various SI algorithms have been applied to FS in QSAR. Kumar et al. [56] first used a multi-layer variable selection strategy and then used GA to select meaningful descriptors from a large set of initial descriptors. PSO has been widely applied to descriptor selection in QSAR. For instance, Shen et al. [57] proposed a modified PSO, named PSO-PLS, for variable selection in MLR and PLS modeling. A hybridization of PSO with GA was used as an FS technique by Goodarzi et al. [58]. After that, Wang et al. [59] proposed a weighted sampling PSO-PLS (WS-PSO-PLS) to select the optimal descriptor subset in QSAR/QSPR models. Moreover, an improved binary Pigeon Optimization Algorithm (POA) was applied to select the most relevant descriptors (variables) in QSAR/QSPR classification models [60].
Compared with PSO and ACO, there are fewer studies applying the ABC algorithm to FS. Most of them address classification or clustering problems, and the algorithm is rarely used to select features for regression problems. Therefore, in this paper, the ABC algorithm is used to select features for PLS modeling, the most straightforward linear regression-based modeling method in QSAR.

QSAR Modeling
In QSAR studies, the number of parameters describing the molecular structure of compounds is generally much larger than the number of samples, and there may be obvious chance correlations and multicollinearity between these parameters. By decomposing and screening the information in the data system, the Partial Least Squares (PLS) method can extract the variables with strong explanatory power to overcome the adverse effects of chance correlations and multicollinearity in modeling. Therefore, the PLS method is often used as the prediction model in QSAR modeling.
The PLS method is often utilized to predict the relationship between compounds and their corresponding biological activities or chemical properties. It models the relationship between two data matrices, the independent variables X and the target variable Y, by a linear multivariate model with factor analysis [61]. The basic principle of PLS regression depends on latent variables, which are extracted from a set of descriptors containing the basic information essential for modeling the target. QSAR is thus modeled using one dependent variable (the response) and several independent variables (the molecular descriptors).
The value of Q^2, a well-known metric that employs the cross-validation technique, measures the accuracy of QSAR modeling, and it is defined as follows:

Q^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (1)

where y_i, \hat{y}_i, and \bar{y} are the observed activity value of compound i, the value predicted by the PLS model via the cross-validation procedure, and the average observed value over all compounds, respectively, and n is the total number of compounds.
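As a minimal illustration of Equation (1), the following Python sketch computes Q^2 from observed activities and cross-validated predictions (the paper's experiments use MATLAB; the function name and the data values here are hypothetical):

```python
def q_squared(y, y_pred):
    # Q^2 = 1 - PRESS / TSS, where y_pred holds cross-validated predictions
    y_bar = sum(y) / len(y)
    press = sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - press / tss

# Hypothetical observed activities and cross-validated predictions:
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(q_squared(y, y_hat), 2))  # 0.98
```

A Q^2 close to 1 indicates that the cross-validated predictions track the observed activities closely.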

Feature Selection
Let S be a dataset of L samples with D features. A feature selection problem can be described as follows: select d features (d < D) from all features so as to optimize a given function H(·). In regression analysis and prediction, H(·) generally represents the accuracy or error rate. Generally, a solution X of an FS problem is encoded as a binary string:

X = (x_1, x_2, \ldots, x_D), \quad x_j \in \{0, 1\}    (2)

where x_j = 1 indicates that the jth feature is selected into the subset X, and x_j = 0 that it is not. Then, taking the case where H(·) is the prediction accuracy, the FS problem can be formulated as follows:

\max_{X} H(X) \quad \text{s.t.} \quad x_j \in \{0, 1\}, \; j = 1, 2, \ldots, D    (3)
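The binary encoding above can be sketched in Python as follows (the mask values and D = 6 are hypothetical):

```python
# One candidate solution: a binary mask over D = 6 hypothetical descriptors.
x = [1, 0, 1, 1, 0, 0]

selected = [j for j, bit in enumerate(x) if bit == 1]  # descriptor indices kept for modeling
num_selected = sum(x)                                  # size of the feature subset
print(selected, num_selected)  # [0, 2, 3] 3
```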

The Basic Artificial Bee Colony Algorithm
ABC is a swarm intelligence algorithm that simulates the foraging behavior of a honey bee colony [32]. It has been used widely in many fields for solving optimization problems [62]. In the hive, three types of bees are assigned to the foraging task: employed bees, onlooker bees, and scout bees. Employed bees use previous source information to find better food sources and share the information with onlooker bees. Onlooker bees waiting in the hive exploit a source with the help of the information shared by employed bees. Scout bees search for undiscovered sources based on an internal rule or possible external clues. The basic implementation of ABC is as follows. (1) Initialization phase: From the perspective of an optimization problem, each food source represents a candidate solution, described as a vector X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,D}) and generated by Equation (4):

x_{i,j} = x_j^{min} + rand(0, 1) \cdot (x_j^{max} - x_j^{min})    (4)

where i = 1, 2, \ldots, SN and SN is the number of food sources; j = 1, 2, \ldots, D and D is the dimensionality of the search space; x_{i,j} is the jth dimension of x_i; and x_j^{max} and x_j^{min} are the maximum and minimum boundary values, respectively.
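The initialization of Equation (4) can be sketched as follows (Python rather than the paper's MATLAB; the function name, seed, and bounds are hypothetical):

```python
import random

def init_food_sources(SN, D, x_min, x_max, seed=0):
    # Equation (4): x_ij = x_min_j + rand(0, 1) * (x_max_j - x_min_j),
    # with identical bounds for every dimension in this simplified sketch.
    rng = random.Random(seed)
    return [[x_min + rng.random() * (x_max - x_min) for _ in range(D)]
            for _ in range(SN)]

sources = init_food_sources(SN=5, D=3, x_min=0.0, x_max=1.0)
```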
(2) Employed bee phase: Each employed bee is associated with a food source. Employed bees modify the position of their food source to find new, better ones. To do so, they learn from a neighbor source selected randomly among all sources except their own. The new food source is produced by Equation (5):

x'_{i,j} = x_{i,j} + \phi_{i,j} (x_{i,j} - x_{k,j})    (5)

where \phi_{i,j} is a uniformly distributed random value within [-1, 1]. After x'_i is produced, its fitness value is evaluated according to Equation (1). Then, a greedy selection is applied between x'_i and x_i: if x'_i is better, it replaces x_i in the next iteration and its counter holding the number of trials is reset to 0; otherwise, x_i is kept for the next iteration and its counter holding the number of trials is increased by 1.
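The neighbor search of Equation (5) can be sketched as follows; here only one randomly chosen dimension is perturbed, as in the basic ABC (Python sketch with hypothetical values):

```python
import random

def employed_bee_step(x_i, x_k, rng):
    # Equation (5): perturb one randomly chosen dimension j of x_i
    # using a neighbor x_k and phi drawn uniformly from [-1, 1].
    v = list(x_i)
    j = rng.randrange(len(x_i))
    phi = rng.uniform(-1.0, 1.0)
    v[j] = x_i[j] + phi * (x_i[j] - x_k[j])
    return v

rng = random.Random(1)
candidate = employed_bee_step([0.2, 0.8, 0.5], [0.6, 0.1, 0.9], rng)
```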
(3) Onlooker bee phase: After receiving the information concerning the nectar amounts (fitness values) and positions of food sources from the employed bees, each onlooker bee selects a food source according to the fitness values by a roulette-wheel scheme, where the better the fitness value of a source, the higher its probability of being selected. The probability value of each food source is calculated by Equation (6):

P_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}    (6)

After calculating the probability value of each source, a random number rand(0, 1) is generated to determine whether the source is chosen. If P_i > rand(0, 1), x_i is chosen and updated just as in the employed bee phase.
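The probability assignment of Equation (6) can be sketched as follows (Python; the fitness values are hypothetical):

```python
def selection_probabilities(fitness):
    # Equation (6): P_i = fit_i / sum_n fit_n
    total = sum(fitness)
    return [f / total for f in fitness]

probs = selection_probabilities([0.2, 0.3, 0.5])
```

Sources with higher fitness receive proportionally higher selection probability, which is what the roulette-wheel scheme exploits.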
(4) Scout bee phase: Each source has a counter which is zero at the beginning. If the counter holding the number of trials exceeds the predefined threshold value, its corresponding food source will be abandoned and replaced by a new food source, which is generated by Equation (4).

ABC Algorithm for FS in QSAR
The basic ABC algorithm was originally proposed for optimization problems in continuous space; however, FS is an optimization problem in discrete space. Each feature subset is represented by a binary string, where "1" means the feature is selected and "0" means it is not. Hence, the value obtained by Equation (4) needs to be converted into a discrete value by Equation (7):

x_{i,j} = \begin{cases} 1, & x_{i,j} \ge 0.5 \\ 0, & \text{otherwise} \end{cases}    (7)

If the value of a dimension is greater than or equal to the threshold value 0.5, the corresponding feature is selected and its value is set to 1. Otherwise, it is not selected and its value is set to 0. Accordingly, an ABC-based algorithm for feature selection in QSAR is proposed, namely ABC-PLS. The pseudo code of the ABC-PLS algorithm is given in Algorithm 1.
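The threshold conversion of Equation (7) can be sketched as follows (Python; the input vector is hypothetical):

```python
def to_binary(x, threshold=0.5):
    # Equation (7): feature j is selected iff x_j >= 0.5
    return [1 if v >= threshold else 0 for v in x]

print(to_binary([0.73, 0.12, 0.50, 0.49]))  # [1, 0, 1, 0]
```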

Algorithm 1 Pseudo code of the ABC-PLS algorithm
Input: Population size SN, maximum number of iterations MaxIt, abandonment limit L, counter = 0, t = 0.
Output: The optimal food source x_best, the best fitness value f(x_best).
1: Initialize SN food sources according to Equation (4) and convert them into discrete values by Equation (7).
2: Evaluate the fitness value of each food source by using Equation (1).
3: while t < MaxIt do
4: %Employed bee phase
5: for each employed bee do
6:   Randomly select a different food source x_k.
7:   Generate a new food source according to Equation (5) and convert it into discrete values by using Equation (7).
8:   Evaluate the fitness value of each food source by using Equation (1).
9:   Update x_i according to greedy selection, and increase its counter by 1 if not updated.
10: end for
11: Calculate the selection probability of each food source by using Equation (6).
12: %Onlooker bee phase
13: for each onlooker bee do
14:   Select a food source x_i according to the selection probability by the roulette-wheel scheme.
15:   Randomly select a different food source x_k.
16:   Generate a new food source according to Equation (5) and convert it into discrete values by using Equation (7).
17:   Evaluate the fitness value of each food source by using Equation (1).
18:   Update x_i according to greedy selection, and increase its counter by 1 if not updated.
19: end for
20: %Scout bee phase
21: for each food source do
22:   if counter ≥ L then
23:     Replace it with a new food source generated by Equation (4) and convert it into discrete values by Equation (7).
24:     Evaluate the fitness value of the new food source by using Equation (1).
25:   end if
26: end for
27: end while
28: Output x_best and f(x_best).

An Improved ABC Algorithm for FS in QSAR
Since ABC-PLS needs to convert continuous values into discrete values in all four phases of the algorithm, it consumes more computational resources (time and memory). Inspired by Hancer [63], a two-point crossover operator and a two-way mutation operator are employed to generate food sources in the employed bee and onlooker bee phases. In the algorithm proposed by Hancer, x_i and x_k generate two new food sources by the crossover operator and another two new food sources by the mutation operator. Therefore, a set of SN solutions expands to 5 × SN solutions after cross-mutation. These 5 × SN solutions are then ranked using non-dominated sorting, and SN solutions are selected to update the population through rank and crowding distance. Instead of expanding the size of the solution set, which consumes much computational time and memory, the two-point crossover operator and the two-way mutation operator used in this paper are described as follows.

Two-Point Crossover Operator

Crossover is operated between the current food source x_i and a randomly selected food source x_k (x_i ≠ x_k). Two positions m and n are randomly determined (m < n < D), and all values of x_k between positions m and n are copied into x_i to generate a new food source [63]. An illustrative example of the crossover operator is presented in Figure 1.
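One possible reading of the two-point crossover can be sketched as follows (Python; the segment-copy convention and the bit strings are illustrative assumptions, since the operator is specified pictorially in Figure 1):

```python
import random

def two_point_crossover(x_i, x_k, rng):
    # Copy the segment of x_k between two random cut points m < n into x_i.
    m, n = sorted(rng.sample(range(len(x_i)), 2))
    return x_i[:m] + x_k[m:n] + x_i[n:]

rng = random.Random(42)
child = two_point_crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], rng)
```

With all-ones and all-zeros parents, the child is all ones except for one contiguous block of zeros inherited from x_k.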

Two-Way Mutation Operator
First, a random number in the range [0, 1] is uniformly generated. If the generated number is greater than 0.5, a position with value 1 is randomly chosen and set to 0; otherwise, a position with value 0 is randomly chosen and set to 1 [63]. In this way, a new food source is generated. The mutation operator used in this paper is shown in Figure 2. Furthermore, if a food source in the employed bee or onlooker bee phase is not updated for a long time, it is abandoned and reinitialized to produce a new food source, which reduces the convergence speed of the algorithm. In view of this, a greedy selection strategy is employed after mutation; it is described as follows.
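The two-way mutation can be sketched as follows (Python; the fallback for all-zero or all-one sources is our own assumption, since the paper does not specify that corner case):

```python
import random

def two_way_mutation(x, rng):
    # With probability 0.5 flip a random 1 to 0, otherwise a random 0 to 1;
    # fall back to the other direction for degenerate all-zero/all-one sources.
    x = list(x)
    flip_from = 1 if rng.random() > 0.5 else 0
    candidates = [j for j, v in enumerate(x) if v == flip_from]
    if not candidates:
        flip_from = 1 - flip_from
        candidates = [j for j, v in enumerate(x) if v == flip_from]
    j = rng.choice(candidates)
    x[j] = 1 - flip_from
    return x

rng = random.Random(7)
mutant = two_way_mutation([1, 0, 1, 1, 0], rng)
```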

Novel Greedy Selection Strategy
If the fitness value of x'_i is higher than that of x_i, x'_i replaces x_i and enters the next iteration. If the fitness of x'_i is the same as that of x_i but its number of selected features is less than or equal to that of x_i, x'_i also replaces x_i and enters the next iteration. Otherwise, x_i enters the next iteration and x'_i is discarded. To make this easier to understand, we give the example shown in Figure 3. There are nine cases of whether an individual updates or not in Figure 3, where x_t and x'_t denote a food source and its offspring in the current iteration, respectively, and x_{t+1} denotes the individual entering the next iteration. fitness is the prediction accuracy and sum is the number of selected features. The first three cases indicate that, if x'_t has the same fitness value as x_t and its number of selected features is the same as or fewer than that of x_t, x_t will be replaced by x'_t; otherwise, x_t will enter the next iteration directly. If the fitness value of x_t is smaller than that of x'_t, x_t will also be replaced by x'_t regardless of the number of features selected, as shown in cases 4-6. In the other three cases, x_t has a larger fitness value than x'_t and enters the next iteration without update.
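The novel greedy selection strategy can be sketched as follows (Python; the function name, fitness values, and masks are hypothetical):

```python
def greedy_select(x_old, f_old, x_new, f_new):
    # Keep the offspring if its fitness is higher, or equal with a feature
    # subset that is no larger; otherwise keep the parent.
    if f_new > f_old:
        return x_new
    if f_new == f_old and sum(x_new) <= sum(x_old):
        return x_new
    return x_old

# Equal accuracy but fewer selected features: the offspring wins.
winner = greedy_select([1, 1, 1, 0], 0.82, [1, 0, 1, 0], 0.82)
print(winner)  # [1, 0, 1, 0]
```

By accepting equal-fitness offspring with smaller subsets, the trial counter is reset far more often, which is why food sources are rarely abandoned.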
Combining the above three components, ABC-PLS-1 is proposed. Figure 4 shows the flowchart of ABC-PLS-1, and its pseudo code is outlined in Algorithm 2. Overall, the two-point crossover and two-way mutation operators not only avoid converting the continuous space into a discrete space, but also reduce the consumption of computing resources. Furthermore, the greedy selection strategy greatly reduces the possibility of food sources being abandoned, so the algorithm converges quickly to the optimal solution; as a result, the scout bee phase of the ABC algorithm does not improve the prediction performance and can be omitted. This conclusion will be verified by setting different thresholds that determine whether to carry out the scout bee phase.

Algorithm 2 Pseudo code of the ABC-PLS-1 algorithm
Input: Population size SN, maximum number of iterations MaxIt, abandonment limit L, counter = 0, t = 0.
Output: The optimal food source x_best, the best fitness value f(x_best).
1: Initialize SN binary food sources.
2: Evaluate the fitness value of each food source by using Equation (1).
3: while t < MaxIt do
4: % Employed bee phase
5: for each employed bee do
6:   Randomly select a different food source x_k.
7:   Generate a new food source by applying the crossover and mutation operators to x_i and x_k.
8:   Evaluate the fitness value of each food source by using Equation (1).
9:   Update x_i according to the greedy selection strategy, and increase its counter by 1 if not updated.
10: end for
11: Calculate the selection probability of each food source by using Equation (6).
12: % Onlooker bee phase
13: for each onlooker bee do
14:   Select a food source x_i according to the selection probability by the roulette-wheel scheme.
15:   Randomly select a different food source x_k.
16:   Generate a new food source by applying the crossover and mutation operators to x_i and x_k.
17:   Evaluate the fitness value of each food source by using Equation (1).
18:   Update x_i according to the greedy selection strategy, and increase its counter by 1 if not updated.
19: end for
20: % Scout bee phase
21: for each food source do
22:   if counter ≥ L then
23:     Replace it with a new food source generated by Equation (4) and convert it into discrete values by Equation (7).
24:     Evaluate the fitness value of the new food source by using Equation (1).
25:   end if
26: end for
27: end while
28: Output x_best and f(x_best).

Datasets and Parameters
To verify the performance of the proposed ABC-PLS and ABC-PLS-1 algorithms, a series of experiments is conducted on three common QSAR datasets: Artemisinin, benzodiazepine receptors (BZR), and Selwood. The Artemisinin dataset contains 178 compounds and 89 features. The BZR dataset contains 163 compounds and 75 features. The Selwood dataset contains 29 compounds and 53 features [59]. The basic information about the datasets is given in Table 1. We investigate the performance of the proposed algorithms by comparing them with three FS algorithms for QSAR: PSO-PLS [57], WS-PSO-PLS [59], and BFDE-PLS [64]. For the compared algorithms, the parameters are set as recommended in the corresponding papers. All algorithms are implemented in the MATLAB language. The population size is 50, and the maximum number of iterations is 200. The threshold value (i.e., Limit) in the ABC algorithm is set to 100. For fair comparison, each algorithm is run 100 times independently. Table 2 gives the parameter settings of all algorithms.


Performance Metric
A 5-fold cross-validation method is employed to evaluate the performance of QSAR. Here, the metric Q^2 reflects the prediction accuracy and is calculated as in Equation (1). The number of selected features is denoted NUM. To assess the stability of the algorithms, the Root Mean Square Error (RMSE) is calculated as well, which is defined as follows:

RMSE = \sqrt{\frac{\sum_{i=1}^{M_X} (y_i - \hat{y}_i)^2}{M_X}}

where y_i and \hat{y}_i have the same meaning as in Equation (1), and M_X denotes the number of compounds.

Table 3 shows the experimental results of the six QSAR methods. The best results are identified in boldface. Without introducing intelligent algorithms, the PLS model selects all features in each dataset, and the mean Q^2 and root mean square error are, respectively, 0.6 and 0.99 on the Artemisinin dataset, 0.4 and 0.85 on the BZR dataset, and 0.24 and 0.65 on the Selwood dataset. However, the performance of PLS is improved when an intelligent algorithm is introduced into the model. The experimental results show that the intelligent algorithms can eliminate the irrelevant features in the datasets by using global and local search.
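The RMSE metric defined above can be sketched as follows (Python; the observed and predicted values are hypothetical):

```python
import math

def rmse(y, y_pred):
    # RMSE = sqrt( sum_i (y_i - yhat_i)^2 / M_X ), M_X = number of compounds
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y))

print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))  # 1.1547
```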

Experimental Results and Analysis
The following comparison results can be obtained from Table 3. On the Artemisinin dataset, the mean Q^2 of ABC-PLS-1 is 3.64% larger than that of PSO-PLS and 1.48% larger than that of WS-PSO-PLS, and its root mean square error is 5.73% smaller than that of PSO-PLS and 2.39% smaller than that of WS-PSO-PLS. On the BZR dataset, the mean Q^2 of ABC-PLS-1 is 1.8% larger than that of PSO-PLS and 1.29% larger than that of WS-PSO-PLS; its root mean square error is 1.49% smaller than that of PSO-PLS and 1.07% smaller than that of WS-PSO-PLS; and ABC-PLS-1 selects 5.88 fewer features than PSO-PLS and 4.14 fewer than WS-PSO-PLS. On the Selwood dataset, the mean Q^2 of ABC-PLS-1 is 6.73% larger than that of PSO-PLS and 1.74% larger than that of WS-PSO-PLS; its root mean square error is 7.67% smaller than that of PSO-PLS and 2.28% smaller than that of WS-PSO-PLS; and ABC-PLS-1 selects 5.8 fewer features than PSO-PLS and 3.16 fewer than WS-PSO-PLS. On all three datasets, the mean Q^2 of ABC-PLS-1 is larger than that of BFDE-PLS and its root mean square error is smaller. However, ABC-PLS-1 selects more features than BFDE-PLS on the Artemisinin and BZR datasets. ABC-PLS-1 also selects more features than ABC-PLS on the Artemisinin dataset, but it is superior to ABC-PLS on the other two datasets. In conclusion, although the number of features selected by ABC-PLS-1 is not smaller than that of BFDE-PLS on Artemisinin and BZR, the prediction accuracy and root mean square error of ABC-PLS-1 are clearly better than those of ABC-PLS, PSO-PLS, WS-PSO-PLS, and BFDE-PLS on all three datasets.
A rank-sum test at a significance level of 0.05 is used to compare the mean Q^2 on the three datasets to determine whether ABC-PLS-1 is significantly different from PSO-PLS, WS-PSO-PLS, BFDE-PLS, and ABC-PLS. As shown in Table 4, ABC-PLS-1 is significantly better than the others in mean Q^2 on all datasets. Figure 5 shows the Q^2 obtained by each algorithm over 100 runs on the three datasets. The Q^2 of ABC-PLS-1 is generally higher than that of the other four algorithms on the Artemisinin and Selwood datasets. In the last subfigure, the Q^2 of ABC-PLS-1 is higher than that of PSO-PLS, WS-PSO-PLS, and ABC-PLS, and it is stable.
Convergence curves of the algorithms on the three datasets are shown in Figure 6. Each curve is the average of 100 runs at each iteration. ABC-PLS-1 converges faster to a good-quality solution compared to the other state-of-the-art methods on the Artemisinin and BZR datasets. Although BFDE-PLS finally converges to a higher-quality solution than ABC-PLS-1 on the Selwood dataset, it is greatly inferior to the others on the Artemisinin and BZR datasets, and its convergence speed is slow. Overall, ABC-PLS-1 achieves better performance than the others with respect to both convergence speed and solution quality. Furthermore, in order to verify the validity of ABC-PLS-1, the Root Mean Square Error (RMSE) of ABC-PLS-1 is compared with that of the other four algorithms. Figure 7 presents box plots of the RMSE of the five algorithms on the three datasets; the "+" marks in the figure are outliers. As can be seen from the figure, the mean line of ABC-PLS-1 is lower than those of PSO-PLS, WS-PSO-PLS, BFDE-PLS, and ABC-PLS on all three datasets. Therefore, ABC-PLS-1 is better and more stable than the others.
For a better evaluation of our proposed FS methods, not only the accuracy and the size of the feature subsets but also the computational time is investigated. The computational time, in terms of mean values over the 100 runs, is presented in Table 5. Like other wrapper methods, the proposed algorithm requires a high computational cost to evaluate the fitness of individuals. The CPU execution time of ABC-PLS-1 is less only than that of BFDE-PLS. However, it is a remarkable fact that the accuracy of a feature selection method is far more important than its computational complexity in many high-precision applications, such as biological genetic engineering, medical diagnosis, and drug design and discovery. In fact, in these applications, we prefer the FS method with the highest accuracy, even at the cost of higher computational complexity. Although the proposed ABC-PLS-1 has no edge in time consumption, it boosts the accuracy of FS in QSAR, which is exactly what QSAR modeling needs.
According to the above experimental results, we conclude that the proposed ABC-PLS-1 performs well in QSAR. In addition, to investigate whether the scout bee phase is redundant when dealing with feature selection for low-dimensional and medium-dimensional regression prediction problems, we conduct further experiments on ABC-PLS-1 by setting different values of Limit. Table 6 shows the experimental results of the three performance metrics when the scout bee operator takes different Limit values on the three datasets. The best results are identified in boldface. In the case of no scout bee phase, i.e., Limit = ∞, the Q^2 on the Artemisinin, BZR, and Selwood datasets is, respectively, 0.7731, 0.5757, and 0.9338, which is, respectively, 0.15%, 0.33%, and 0.12% larger than in the case of setting Limit to 100. The root mean square errors are, respectively, 0.7468, 0.7153, and 0.1906, which are, respectively, 0.25%, 0.34%, and 0.19% smaller than in the case of setting Limit to 100. The number of selected features on the Artemisinin dataset is 0.3 smaller than in the case of setting Limit to 100. Therefore, the scout bee operator is redundant in dealing with feature selection for low-dimensional and medium-dimensional datasets in regression.

Conclusions
To improve the prediction accuracy and interpretability of QSAR modeling, two ABC variants are proposed for feature selection in QSAR in this paper, namely ABC-PLS and ABC-PLS-1. In the former, we convert the continuous space to a discrete space by a threshold and then apply the algorithm to feature selection in QSAR. In the latter, to avoid converting the continuous space into a discrete space and to reduce the consumption of computing resources, the two-point crossover operator and the two-way mutation operator are introduced in the employed bee and onlooker bee phases. Furthermore, a novel greedy selection strategy is employed to help the algorithm converge quickly to the optimal solution by reducing the possibility of food sources being abandoned. The performance of the proposed algorithms on feature selection in QSAR is compared with that of three state-of-the-art FS methods on three QSAR datasets. The comparison results show that the proposed ABC-PLS-1 outperforms the other algorithms not only in prediction accuracy and feature subset size, but also in stability. Moreover, we also study whether the scout bee phase is necessary by setting different values of Limit, and conclude that the scout bee phase is redundant when dealing with feature selection in low-dimensional and medium-dimensional regression problems.
In future research, we will propose a multi-objective ABC algorithm for QSAR to maximize the prediction accuracy and minimize the number of selected features simultaneously.