A Self-Adaptive Cuckoo Search Algorithm Using a Machine Learning Technique

Metaheuristics are intelligent problem-solvers that have been very efficient in solving huge optimization problems for more than two decades. However, the main drawback of these solvers is the need for problem-dependent and complex parameter settings to reach good results. This paper presents a new cuckoo search algorithm able to self-adapt its configuration, particularly its population size and abandon probability. The self-tuning process is governed by machine learning: cluster analysis is employed to autonomously compute the number of agents needed at each step of the solving process. The goal is to efficiently explore the space of possible solutions while alleviating the human effort of parameter configuration. We illustrate interesting experimental results on the well-known set covering problem, where the proposed approach competes against various state-of-the-art algorithms, achieving better results in a single run versus 20 different configurations. In addition, the results obtained are compared with those of similar hybrid bio-inspired algorithms, illustrating promising results for this proposal.


Introduction
Recent studies on bio-inspired procedures for solving complex optimization problems have shown that finding good results and the best performance are laborious tasks, so it is necessary to apply off-line parameter adjustment to metaheuristics [1][2][3][4][5]. This adjustment is considered an optimization problem in itself, and several studies propose solutions for it, but these always depend on static terms [6]. Many of these studies use mathematical formulas to change the value of each parameter during execution. In this context, parameters such as the population size of metaheuristics are initially set without considering their variation or behavior. We propose using a machine learning (ML) technique that lets us analyze the population and determine the number of solutions and the abandon probability.
As we can see in [7], machine learning and optimization are two rapidly expanding topics of artificial intelligence with a wide range of computer science applications. Due to the rapid progress in the performance of computing and communication techniques, these two research areas have proliferated and drawn widespread attention in a wide variety of applications [8]. For example, in [9], the authors present different clustering techniques to evaluate credit risk in a given European population. Although both fields belong to different communities, they are fundamentally based on artificial intelligence, and techniques from ML and optimization frequently interact to improve their learning and/or search capabilities.
On the other hand, advances in operations research and computer science have brought forward new solution approaches in optimization theory, such as heuristics and metaheuristics. While the former are experience-based procedures, which usually provide good solutions in short computing times, metaheuristics are general templates that can easily be tailored to address a wide range of problems. They have been shown to provide near-optimal solutions in reasonable computing times to problems for which traditional methods are not applicable [10]. Moreover, as shown in [11], the tendency to use hybrid methods to solve recent problems, such as those that COVID-19 has brought with it, has proven effective. Within these two main topics, we are interested in exploring the integration of ML into metaheuristics in order to enhance characteristics or attributes of those algorithms: the solutions, the performance, or the time needed to obtain results. To this end, we propose to use an unsupervised machine learning technique that lets us learn from the search space of the metaheuristic, exploiting its characteristics to enhance the metaheuristic parameters in an online way: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a technique that gathers these characteristics. We propose to use the noise cluster to set the abandon probability and the solution-clustering result to determine the number of nests of the cuckoo search algorithm (CSA). There are studies that propose hybrids to enhance the CSA with an ML technique, as in [12], where the authors present a K-means technique to determine discrete parameters to improve the CSA. In [13], the contribution is a hybrid method with the sine-cosine algorithm to enhance the search space used by the CSA.
In a different but related scenario, the authors of [14] present CuckooVina, a combination of cuckoo search and differential evolution for conformational search. Thus, some studies propose hybridizations to enhance the CSA, but none of them provides the ML-based self-adaptation capabilities that we present here.
To illustrate and validate our approach, we propose an improved cuckoo search algorithm with self-adaptive capability (SACSDBSCAN). This approach was tested on the set covering problem, whose goal is to cover a range of needs at the lowest cost and which is widely used to evaluate research proposals.
The remainder of the paper is structured as follows. Section 2 introduces the theoretical background on the optimization and clustering algorithms integrated into the proposed approach. Section 3 presents the original CSA and the parameters that must be set to run it; it also introduces the DBSCAN algorithm, its characteristic operation, and how we propose to exploit it in metaheuristics. Section 4 presents the integration of DBSCAN into the CSA and how parameter values are configured during the execution of our approach. Section 5 shows experimental results and discussion. Finally, Section 6 concludes and suggests lines of future research.

Related Work
As previously mentioned, parameter adjustment of metaheuristics is a complex task and is considered an optimization problem in itself; in many cases it depends on trial-and-error testing to find a good combination of parameters. In this scenario, studies take different approaches, many of them using a mathematical formula to vary the parameter values of the CSA.
For example, to determine the step size α and the Pa parameter of cuckoo search, Ref. [15] uses a mathematical formula that depends on the range between the minimal and maximal values of those parameters and on the number of iterations of the algorithm; the aim is to cover a bigger target area, as well as neighboring areas. In [16], the authors propose a self-adaptive step size, varying the α parameter according to the fitness in each iteration by applying a mathematical formula. In addition, Ref. [17] varies the α and Pa values according to its own proposal, where α changes its value according to Equation (1).
The CSA is based on three idealized rules:
1. Each cuckoo lays one egg at a time and drops it into a randomly selected nest.
2. The best nests with high-quality eggs are carried over to the next generations.
3. The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability Pa ∈ [0, 1]; in this case, a new random solution is generated.
Every new generation is determined by a Lévy flight [34], given by Equation (2):

x_i^d(t + 1) = x_i^d(t) + α ⊕ Lévy(λ),    (2)

where x_i^d(t) is element d of solution i at iteration t, and x_i^d(t + 1) is the corresponding element at iteration t + 1. α > 0 is the step size, which should be related to the scales of the problem of interest and to the upper (Ub) and lower (Lb) bounds that the problem determines; in this scenario, values between 0 and 1.
The Lévy flight represents a random walk whose random step length is drawn from a Lévy distribution, which has an infinite variance with an infinite mean.
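A common way to realize the Lévy-flight update of Equation (2) in practice is Mantegna's algorithm. The sketch below illustrates the idea for a single solution element; the clipping of the new position to [Lb, Ub] is an assumption about boundary handling, not a detail stated in the text.

```python
import math
import random

def levy_step(beta=1.5):
    """One Lévy-distributed step length via Mantegna's algorithm,
    a common approximation of Lévy flights (beta is the stability index)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def next_position(x, alpha, lb, ub, beta=1.5):
    """Apply x(t+1) = x(t) + alpha * Levy for one element,
    clipping the result to the problem bounds [lb, ub]."""
    x_new = x + alpha * levy_step(beta)
    return max(lb, min(ub, x_new))
```

Because the step-length distribution is heavy-tailed, most moves are small (local search) while occasional long jumps escape local optima.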

Density Based Spatial Clustering Application with Noise
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [35] is a popular data-clustering algorithm that automatically groups similar information together. It can find clusters of any shape in a data collection with noise and outliers. Clusters are dense regions of the data space separated by regions with a lower density of points.
The goal is to define dense areas, determined by the number of points in close proximity to a given location. It is crucial to understand that DBSCAN requires two important parameters:

1. Epsilon (ε): determines how close points must be to each other to be considered part of a cluster.
2. Minimum points (MinPts): the minimal number of points required to form a dense region.
The underlying premise is that there must be at least a certain number of points in the vicinity of a given radius for each group. The ε parameter determines the neighborhood radius around a given point: every point x in the data set with a number of neighbors greater than or equal to MinPts is tagged as a core point; x is a border point if its number of neighbors is less than MinPts but it lies within the neighborhood of a core point. Finally, if a point is neither a core nor a border point, it is referred to as a noise point or outlier.
This technique proves helpful in metaheuristics when users do not know much about the data being analyzed. See the pseudo-code in Algorithms 1 and 2 to understand how this strategy works.
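To make the core/border/noise classification concrete, here is a minimal, self-contained DBSCAN sketch. It is an illustrative implementation, not the pseudo-code of Algorithms 1 and 2; it uses Euclidean distance and counts a point as its own neighbor.

```python
from collections import deque

NOISE = -1

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns labels where labels[i] is a cluster
    id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster = 0

    def neighbors(i):
        return [j for j in range(len(points)) if euclid(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE            # provisionally noise; may become a border point
            continue
        labels[i] = cluster              # i is a core point: start a new cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:       # border point reachable from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```

On a set with two dense groups and one isolated point, the isolated point receives the NOISE label while each group forms its own cluster.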
The computed groups are shown on the right side of Figure 1. Each iteration of the process adjusts the positions until the algorithm converges. It is important to mention that the noise points are those that do not fall into any cluster while iterating the algorithm; we can visualize them as the white points around the clusters identified in the graph (red, yellow, blue, and green). All these noise points are then grouped into a single cluster, so that every point ends up belonging to a specific cluster.

Proposed Approach: Integrating DBSCAN in CSA
In this section, we describe how DBSCAN was integrated into the CSA, together with all the key elements needed for this technique to work. We add a short piece of code at the end of each iteration that decides whether parameter-control intervention is appropriate and uses the DBSCAN algorithm to calculate the parameter values (see Algorithm 3). In the following sections, we explain some relevant topics needed to understand our approach. Section 4.1 explains under which criteria DBSCAN intervenes in the metaheuristic to analyze the search space and cluster the solutions. Then, Section 4.2 indicates how the noise cluster is fundamental to determining the abandon probability.

Free Execution Parameter
We include a variable to control when to intervene in the parameter values of the CSA, so that the metaheuristic maintains its independence of execution and its specific behavior; in this case, we consider it prudent to set it to one hundred iterations of free running. When this number of iterations is reached, the algorithm performs a procedure to update the CSA parameter values, as described in line 39 of Algorithm 3.

Online Parameter Setting
As mentioned in the previous section, the update of the CSA parameter values occurs when the free execution parameter has reached its limit; then the DBSCAN algorithm is run and the generated clusters are used to infer the parameter values of the metaheuristic. How the Pa and nest parameters are derived is detailed in the following sections.

Probability Abandon Nest
To set the value of the nest-abandon probability, we use the number of noise points obtained from the DBSCAN execution. That value counts the points excluded from every cluster, so we can use it to make the metaheuristic explore new points of the search space. To make this possible, we associate the percentage of noise points with the abandon probability value. To let the metaheuristic keep its normal execution, we use bounds between 10 and 40 percent: if the noise points are more than 40%, the Pa value is set to 0.4; likewise, if the noise points are fewer than 10%, the Pa value is set to 0.1. In all other cases, the value is set to the percentage of noise points.
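The mapping above is a simple clamp of the noise fraction to [0.1, 0.4]. A sketch (function name and the zero-points guard are our own assumptions):

```python
def abandon_probability(n_noise, n_points, low=0.10, high=0.40):
    """Map the DBSCAN noise fraction to the abandon probability Pa,
    clamped to [low, high] as described in the text."""
    if n_points == 0:        # guard: no population to analyze yet
        return low
    noise_frac = n_noise / n_points
    return min(high, max(low, noise_frac))
```

For example, 50 noise points out of 100 yields Pa = 0.4, while 5 out of 100 yields Pa = 0.1.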

Number of Nests
To determine when to vary the number of nests in the CSA, our approach keeps in memory the last best fitness value to compare with the new candidate solutions. If the best fitness value has not varied by the fourth intervention of DBSCAN, we consider it necessary to increase the number of nests to amplify the search; in this case, the number of nests increases by five each time this scenario occurs. On the other hand, if the global best value improves four times consecutively, then the number of nests decreases, eliminating the five worst.
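The rule can be sketched as a pair of counters updated at each DBSCAN intervention. The class name, the minimum population size, and the counter-reset behavior are our own assumptions for illustration; the step of five and the patience of four interventions come from the text.

```python
class NestAdapter:
    """Grow the population by 5 nests after 4 interventions without
    improvement; shrink by 5 (dropping the worst) after 4 consecutive
    improvements. Assumes a minimization problem."""
    def __init__(self, step=5, patience=4, min_nests=10):
        self.step, self.patience, self.min_nests = step, patience, min_nests
        self.best = float("inf")
        self.stagnant = 0
        self.improved = 0

    def update(self, current_best, n_nests):
        if current_best < self.best:
            self.best = current_best
            self.improved += 1
            self.stagnant = 0
        else:
            self.stagnant += 1
            self.improved = 0
        if self.stagnant >= self.patience:
            self.stagnant = 0
            return n_nests + self.step                       # widen the search
        if self.improved >= self.patience:
            self.improved = 0
            return max(self.min_nests, n_nests - self.step)  # drop the 5 worst
        return n_nests
```

Separating the two counters keeps the growth and shrink triggers independent, so a single improvement resets the stagnation count and vice versa.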

Exploration Influence
During the tests, we realized that the CSA solution space converges to a single large cluster containing all candidate solutions while evaluating the executions; in some instances, certain clusters have values compressed very close to each other. In this scenario, we deem it entirely appropriate to have the CSA explore the search space. For this, half of the cluster's points are renewed with new random ones, allowing the metaheuristic to diversify the search. The replacement criterion follows Formula (3), where ClusterSol corresponds to all solutions in the cluster currently being evaluated. ExplInfl is evaluated against the best solution of the global population; if the absolute value of the difference between both is larger than one, then we renew half of the points in those clusters.
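A sketch of this renewal rule follows. The text does not state which half of the cluster is replaced, so keeping the better half is our own assumption; `fitness` and `random_solution` stand for callbacks the metaheuristic would supply.

```python
def renew_compressed_cluster(cluster_sols, global_best_fitness,
                             fitness, random_solution, threshold=1.0):
    """If the best fitness inside a cluster differs from the global best
    by more than `threshold`, replace half of the cluster's solutions
    with fresh random ones (assumption: the worse half is replaced)."""
    cluster_best = min(fitness(s) for s in cluster_sols)
    if abs(cluster_best - global_best_fitness) <= threshold:
        return cluster_sols
    ordered = sorted(cluster_sols, key=fitness)   # best first (minimization)
    half = len(ordered) // 2
    return ordered[:len(ordered) - half] + [random_solution() for _ in range(half)]
```

Clusters already close to the global best are left untouched, so the rule only injects diversity where the cluster has stagnated far from the incumbent.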

Set Covering Problem
Many studies use the Set Covering Problem (SCP) to represent real scenarios, as can be seen in the management of airline crews [36], in optimizing the location of emergency facilities [37], in manufacturing [38], and in production [39]. The SCP is a classical combinatorial optimization problem belonging to the NP-hard class [40], formally defined as follows: let A = (a_ij) be a binary matrix with M rows (∀ i ∈ I = {1, ..., M}) and N columns (∀ j ∈ J = {1, ..., N}), and let C = (c_j) be a vector representing the cost of each column j, assuming c_j > 0, ∀ j ∈ {1, ..., N}. A column j covers a row i if a_ij = 1. The SCP entails identifying a group of resources that address a set of needs at the least cost. In matrix form, a feasible solution corresponds to a subset of columns, and the needs are associated with rows and regarded as constraints. The goal of the challenge is to find the columns that best cover all of the rows.
The Set Covering Problem identifies a low-cost subset S of columns that covers each row with at least one column from S. It can be expressed using integer programming as follows:

minimize Σ_{j∈J} c_j x_j
subject to Σ_{j∈J} a_ij x_j ≥ 1, ∀ i ∈ I,
x_j ∈ {0, 1}, ∀ j ∈ J,

where x_j = 1 if column j is selected and x_j = 0 otherwise.
Instances: We use 65 instances from Beasley's OR-Library, arranged into 11 sets, to evaluate the algorithm's performance when solving the SCP. Table 1 presents the following details for each instance group: number of rows M, number of columns N, cost range, and density (percentage of non-zeroes in the matrix).
Reducing the instance size of the SCP: In [41], different pre-processing approaches have been proposed to reduce the size of the SCP, Column Domination and Column Inclusion being the most effective. These methods are used to accelerate the processing of the algorithm.
Column Domination removes unnecessary columns from the problem in such a way that the final solution is unaffected. Steps:
• All columns are sorted by cost in ascending order.
• Columns with equal cost are sorted in descending order by the number of rows they cover.
• Check whether the rows of column j can be covered by a set of other columns with a combined cost less than c_j (the cost of column j).
• If so, column j is said to be dominated and can be eliminated from the problem.
Column Inclusion: once the domination process has terminated, the inclusion process is performed. If a row is covered by only one column, there is no better column to cover that row, which implies that this column must be included in the optimal solution. All of this processing is applied to the instance data, so that new solutions satisfy the constraints.
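A sketch of both pre-processing steps is shown below. Note that the general multi-column domination check described above is itself a covering problem, so this illustration uses a simplified single-column test (column j is dropped if one kept, no-more-expensive column covers all of its rows); the data layout (`cols[j]` as the set of rows that column j covers) is our own assumption.

```python
def reduce_scp(costs, cols):
    """Simplified SCP pre-processing. `cols[j]` is the set of rows that
    column j covers; `costs[j]` is its cost. Returns (kept column ids,
    columns forced into every solution)."""
    # Sort by ascending cost, ties broken by descending coverage.
    order = sorted(cols, key=lambda j: (costs[j], -len(cols[j])))
    kept = []
    for j in order:
        # Column domination (single-column variant): drop j if a kept,
        # no-more-expensive column already covers every row of j.
        if any(cols[j] <= cols[k] and costs[k] <= costs[j] for k in kept):
            continue
        kept.append(j)
    # Column inclusion: a row covered by exactly one remaining column
    # forces that column into the solution.
    rows = set().union(*(cols[j] for j in kept))
    forced = set()
    for i in rows:
        covering = [j for j in kept if i in cols[j]]
        if len(covering) == 1:
            forced.add(covering[0])
    return kept, forced
```

For example, a column with the same coverage as a cheaper column is eliminated, and any column that uniquely covers some row is forced into the solution.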

Experimental Results
To evaluate the performance of our proposal, we test the SCP instances [42], comparing the original CS under different configurations against SACSDBSCAN.

Methodology
To adequately evaluate the performance of metaheuristics, a performance analysis is required [43]. For this work, we compare the best solution supplied by the CSA to the best-known result of the benchmark retrieved from the OR-Library [44]. Figure 2 depicts the procedures involved in a thorough examination of the enhanced metaheuristic. We create objectives and recommendations for the experimental design to show that the proposed approach is a viable alternative for determining metaheuristic parameters. Then, as a vital indicator for assessing future results, we evaluate the best value. We use ordinal analysis and statistical testing to evaluate whether a strategy is significantly better in this circumstance. Lastly, we detail the hardware and software used to replicate the computational experiments, and we present all of the results in tables and graphs. As a result, we conduct a contrast statistical test for each case, using the Kolmogorov-Smirnov-Lilliefors test [45] to assess sample independence and the Mann-Whitney-Wilcoxon test [46] to statistically evaluate the data; Figure 3 describes the organization.
The Kolmogorov-Smirnov-Lilliefors test allows us to assess sample independence, calculated on the Z_min or Z_max values (depending on whether the task is minimization or maximization) obtained from each instance's 31 executions. The relative percentage deviation (RPD) is used to assess the results. The RPD computes the difference between the objective value Z_min and the minimal best-known value Z_opt for each instance in our experiment, and it is determined as follows:

RPD = 100 × (Z_min − Z_opt) / Z_opt.
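As a small illustration, the RPD measure can be computed directly from its definition:

```python
def rpd(z_min, z_opt):
    """Relative percentage deviation: 100 * (Z_min - Z_opt) / Z_opt.
    An RPD of 0 means the best-known value was reached."""
    return 100.0 * (z_min - z_opt) / z_opt
```

So a run that reaches 110 against a best-known value of 100 has an RPD of 10%.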

Set Covering Problem Results
Infrastructure: SACSDBSCAN was implemented in Java 1.8. The personal computer (PC) used has the following specifications: macOS with a 2.7 GHz Intel Core i7 CPU and 16 GB of RAM.
Setup variables: The configuration of our proposed approach is shown in Table 2 below. Sixty-five SCP instances were considered, each run 31 times; the results are presented in Tables 3 and 4.

Overview: The algorithms are ranked by the Z_min achieved. The instances that reached Z_min are also displayed. To examine the results of the algorithms that solved the SCP, we compare the sample distribution of each instance using a violin plot, which allows us to observe the entire distribution of the data. We select and discuss the most difficult instance of each group (4.10, 5.10, 6.5, A.5, B.5, C.5, D.5, NRE.5, NRF.5, NRG.5, and NRH.5) to summarize all the instances below. In [47], the authors display the results obtained, the detailed information, and the configuration they use. The information is organized as follows: MIN, the minimum value reached; MAX, the maximum value reached; AVG, the average value; BKS, the best-known solution; RPD, determined by the equation defined above; and lastly, the average of the fitness obtained.
The first method was the standard cuckoo search algorithm with various settings, and the second was SACSDBSCAN, as previously indicated. Tables 5-8 show the behavior of our proposed algorithm versus the original algorithm. The best results are highlighted with underlining and maroon color. For example, in instance 4.1, the best solution reached by our proposal overcomes that of the classical CS algorithm. The same strategy is used in all comparisons. The distribution of the data in all instances (Figures 4-14) shows that the performance of our proposal is better than that of the traditional cuckoo search optimizer, concentrating the largest portion of the results on the optimal values, while in the original CS they are visibly distant. For example, in instance B.5, we can see that the distribution is better in our proposal, which at the same time reflects the tendency to move the results toward the best values across executions, placing the center of the distribution in the best-value quartile. Another instance showing this behavior is C.5, where our proposal again generates a large number of optimal results.
The scenario in D.5 and NRE.5 is similar: the behavior of SACSDBSCAN shows that it reaches better results compared to CS. In instance NRF.5, both algorithms produce a similar shape, reflecting the nature of the sample data; even so, our proposal obtains better solutions. Instance NRG.5 is the only scenario where SACSDBSCAN remains six points away from Z_opt. Finally, for instance NRH.5, the behavior of SACSDBSCAN is again superior to CS.

Statistical Test
As previously stated, we offer the following hypotheses in order to determine independence:
- H0: Z_min/Z_max follows a normal distribution.
- H1: states the opposite.
The tests performed yielded p-values lower than 0.05; therefore, H0 cannot be assumed. Now that we know the samples are independent and cannot be assumed to follow a normal distribution, it is not feasible to use the central limit theorem. Therefore, to evaluate the heterogeneity of the samples, we use a non-parametric evaluation, the Mann-Whitney-Wilcoxon test, to compare all the results of the hardest instances, with the following hypotheses:
- H0: CS is better than SACSDBSCAN.
- H1: states the opposite.
Finally, the statistical contrast test reveals which technique is considerably superior. The Wilcoxon signed-rank test was used to compare the algorithms on the hardest SCP instances (Tables 9-19). Since the significance level is set to 0.05, p-values smaller than 0.05 mean that H0 cannot be assumed.
To conduct the test runs that support the study, we use a method from the PISA system. We specify all data distributions (each in a file, one datum per line), and the algorithm returns a p-value for the hypotheses.
The following tables show the results of the Mann-Whitney-Wilcoxon test. To understand them, it is necessary to know the following acronym: SWS = statistically without significance. In all the cases, as mentioned above, the reported p-values are less than 0.05, while entries marked SWS have no statistical significance. With this knowledge, in each instance mentioned we can see that the SACSDBSCAN algorithm was better than the original CS.
If we focus on the instances where our proposal improves on the result obtained by the original CS algorithm, we can infer that the solutions achieved are distributed around their optimal value, which reflects the very positive behavior of this algorithm. This is reflected in the violin plots of Figures 8, 10 and 14.

Comparison Results in Similar Hybrid Algorithms
Within the literature, recent studies can be found that use hybrid algorithms to solve the covering problem [5,48-50]. However, to compare against hybrid algorithms that resemble our proposal, we have considered a comparison of results with hybrid algorithms that work with bio-inspired metaheuristics improved by ML and that solve the set covering problem. In this scheme, we have three algorithms. The first one is the crow search algorithm boosted by the DBSCAN method (CSADBSCAN). The second studied approach is the integration of the crow search algorithm and the K-means method (CSAKmean). Both hybridizations were proposed by Valdivia et al. in [51]. Finally, we employ an improved version of the cuckoo search algorithm with the K-means transition algorithm (KMTA), recently proposed by García et al. in [52].
Tables 20 and 21 present the best values reached by CSADBSCAN, CSAKmean, and KMTA. Those algorithms implement different strategies to improve metaheuristics with ML. To summarize the best values obtained, we add the AVG measure in the final row of each table. Unfortunately, KMTA only reports results for the first instance of each family, so N/R means Not Reported.

Conclusions
In this paper, we conclude that using a machine learning technique for autonomous parameter setting of a metaheuristic achieved the minimum fitness value in 38 of the 65 test instances with one single configuration against 20 different configurations. This demonstrates that, with our proposed algorithm, it is not necessary to undertake the complex task of finding the best parameter setting of the CS metaheuristic, which is, most of the time, done by trial and error. The results of the experiments show that it is beneficial to use the DBSCAN algorithm, inferring from its output the changes to make to the parameter values. The comparison with other bio-inspired hybrid algorithms applied to the set covering problem demonstrates that the use of DBSCAN obtains better average fitness values in comparison with studies that report all of their best result values. The exploration criterion that we use lets the algorithm vary the search space to find other best candidates, as seen in the box plots and the distributions of the instances. In addition, associating the noise points with Pa lets SACSDBSCAN keep the variety in its behavior without losing the stochastic factor that characterizes a metaheuristic.
The free execution parameter allows the metaheuristic to maintain its natural behavior across executions. When the freedom parameter is reached, we can analyze the results of the metaheuristic and classify its solution space so as to perform the corresponding intervention on the candidate solutions, eliminating the worst in favor of possibly better new ones.
As future work, we consider improving the criterion for population increase and decrease using clustering strategies. In another line of work, we want to implement different machine learning techniques to build algorithms that allow effective use of population increase/decrease in metaheuristics, and thus deliver tools that make these algorithms more efficient. In addition, we are considering new works that compare this algorithm with other hybrid variants of cuckoo search, and with other types of ML-improved metaheuristics different from those presented in this work, applied to continuous problems.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Symbols
α — step size used to generate the next solution.
Pa — abandon probability of the CSA.