Differential Evolution with Linear Bias Reduction in Parameter Adaptation

In this study, a new parameter control scheme is proposed for the differential evolution algorithm. The developed linear bias reduction scheme controls the Lehmer mean parameter value depending on the optimization stage, allowing the algorithm to improve its exploration properties at the beginning of the search and to speed up exploitation at the end of the search. The L-SHADE approach is considered as the baseline algorithm, together with its modifications, namely the jSO and DISH algorithms. The experiments are performed on the CEC 2017 and 2020 bound-constrained benchmark problems, and the performed statistical comparison of the results demonstrates that the linear bias reduction allows significant improvement of the differential evolution performance for various types of optimization problems.


Introduction
The Computational Intelligence (CI) methods include a variety of approaches, such as Evolutionary Computation (EC), Fuzzy Logic (FL), and Neural Networks (NN). Despite the differences among these directions, classical numerical optimization is one of the most important research areas connecting the parts of CI, and thus the development of modern optimization techniques remains a key direction. New heuristic optimization methods, including Evolutionary Algorithms (EA) and Swarm Intelligence (SI), are usually developed by introducing new algorithmic schemes [1] or parameter control and adaptation techniques [2]. Among the numerical optimization techniques, the Differential Evolution (DE) algorithm [3] has attracted the attention of a large number of researchers due to its simplicity and high performance [4]. The results of the annual competitions on numerical optimization, conducted within the IEEE Congress on Evolutionary Computation, show [5] that in the last few years the winners have mostly been DE-based frameworks, which include hybridization or novel parameter tuning techniques.
Currently, one of the main directions of studies on differential evolution is the development of parameter adaptation techniques. A parameter tuning mechanism should rely on the quality of the search process, usually measured by fitness values and their improvement, and for a tuning scheme to be efficient, it should be designed with respect to the algorithm's properties, which is why many different adaptation schemes have been developed for specific algorithms. The final goal of every tuning scheme is similar: provide the algorithm with the ability to tune its parameters to suitable values at every stage of the search, so that the best possible solution is reached at the end. Several recent surveys have considered the existing variants of DE and their properties [6,7].
The parameter tuning techniques include parameter adaptation and parameter control methods: the former change parameter values based on feedback from the search process, while the latter follow a predetermined schedule, for example, depending on the amount of computational resource spent.

Differential Evolution Algorithm
The original differential evolution algorithm was proposed by Storn and Price [13], and its main feature was the usage of differences between the vectors representing solutions, hence the name of the algorithm. It has received much attention from researchers mainly because it is easy to implement and has high efficiency. DE starts with a population of randomly initialized solutions x_{i,j} = x_{min,j} + rand[0, 1] · (x_{max,j} − x_{min,j}), where i = 1, . . . , NP, j = 1, . . . , D, rand[0, 1] is a uniformly distributed random value, x_{min,j} and x_{max,j} are the lower and upper bounds of the search area for variable j, NP is the initial population size, and D is the problem dimension. After initialization, the algorithm proceeds with the mutation, crossover, and selection operations. The theoretical considerations of all these steps were studied in [14].
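As a quick illustration, the initialization step above can be sketched in Python with NumPy; the function name `init_population` is an illustrative choice and not taken from the original implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_population(NP, D, x_min, x_max):
    """Uniform random initialization: x_{i,j} = x_min_j + rand[0,1) * (x_max_j - x_min_j)."""
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    # one uniform draw per component, scaled into the search box
    return x_min + rng.random((NP, D)) * (x_max - x_min)
```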
There are several mutation schemes known for DE; the basic strategy is called rand/1 and is implemented as follows:

v_{i,j} = x_{r1,j} + F(x_{r2,j} − x_{r3,j}), (1)

where F is the scaling factor parameter and the indexes r1, r2, and r3 are mutually different, randomly generated, and different from i. Several more advanced strategies have been proposed, including current-to-pbest/1, introduced in JADE [15] and also used in the SHADE framework [8]:

v_{i,j} = x_{i,j} + F(x_{pbest,j} − x_{i,j}) + F(x_{r1,j} − x_{r2,j}), (2)

where pbest is the index of one of the pb · 100% best individuals in terms of fitness and pbest ≠ r1 ≠ r2 ≠ i. The mutation strategies significantly influence the algorithm performance and require specific parameter tuning techniques; most of them randomly sample the F and Cr values. Several modifications of the current-to-pbest/1 strategy have been proposed, including the current-to-pbest-w/1 strategy introduced in jSO [11], which significantly improved the performance:

v_{i,j} = x_{i,j} + F_w(x_{pbest,j} − x_{i,j}) + F(x_{r1,j} − x_{r2,j}), (3)

where F_w is defined as follows:

F_w = 0.7F if NFE < 0.2NFE_max, F_w = 0.8F if NFE < 0.4NFE_max, and F_w = 1.2F otherwise, (4)

where NFE is the current number of function evaluations and NFE_max is the total computational resource. In L-SHADE-RSP (L-SHADE with Rank-based Selective Pressure) [16], the mutation strategy was modified with selective pressure, resulting in current-to-pbest-w/r, which allowed faster convergence.
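The rand/1 and current-to-pbest/1 strategies can be sketched as follows; this is a minimal illustration with hypothetical helper names, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_1(pop, i, F):
    """rand/1 mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    NP = len(pop)
    r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def current_to_pbest_1(pop, fitness, i, F, pb=0.17):
    """current-to-pbest/1 mutation: v = x_i + F*(x_pbest - x_i) + F*(x_r1 - x_r2)."""
    NP = len(pop)
    n_best = max(1, int(pb * NP))
    # pbest is one of the pb*100% best individuals (minimization: smallest fitness)
    pbest = int(rng.choice(np.argsort(fitness)[:n_best]))
    candidates = [j for j in range(NP) if j not in (i, pbest)]
    r1, r2 = rng.choice(candidates, size=2, replace=False)
    return pop[i] + F * (pop[pbest] - pop[i]) + F * (pop[r1] - pop[r2])
```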
In current-to-pbest-w/r, the last index r2 is selected considering the fitness values of the individuals in the population: the rank values rank_i = i, i = 1, . . . , NP, are set, with the largest rank assigned to the best individual (smallest goal function value in the case of minimization). The probabilities of individuals being selected are calculated as follows:

pr_i = rank_i / (rank_1 + rank_2 + . . . + rank_NP). (5)

After the mutation step, the crossover is performed, where the newly generated donor vector v_i is combined with the current vector x_i. There are two types of crossover operations known for DE, namely binomial and exponential crossover. The exponential crossover is mostly used when there are known connections between adjacent variables, while the binomial crossover is more popular and is implemented as follows:

u_{i,j} = v_{i,j} if rand[0, 1] < Cr or j = jrand, otherwise u_{i,j} = x_{i,j}, (6)

where Cr is the crossover rate parameter. The jrand index is randomly generated in [1, D] and is needed to make sure that at least one component is taken from the donor vector. The jSO algorithm additionally changes the Cr generation, so that large values of the crossover rate are not allowed at the beginning of the search:

Cr_i = max(Cr_i, 0.7) if NFE < 0.25NFE_max, Cr_i = max(Cr_i, 0.6) if NFE < 0.5NFE_max. (7)

After generating the trial solution u_i, the bound constraint handling method is applied to make sure that all vectors are within the search bounds:

u_{i,j} = (x_{min,j} + x_{i,j})/2 if u_{i,j} < x_{min,j}, u_{i,j} = (x_{max,j} + x_{i,j})/2 if u_{i,j} > x_{max,j}. (8)

After calculating the fitness of the trial solution f(u_i), the selection step is performed:

x_i = u_i if f(u_i) ≤ f(x_i), otherwise x_i is kept unchanged. (9)

Most DE implementations use such greedy selection, where the newly generated solution is accepted only if it improves (or at least matches) its corresponding current solution x_i.
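Binomial crossover and greedy selection can be sketched as follows, assuming a minimization problem (function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def binomial_crossover(x, v, Cr):
    """u_j = v_j if rand < Cr or j == jrand, else x_j; jrand guarantees one donor gene."""
    D = len(x)
    jrand = rng.integers(D)
    mask = rng.random(D) < Cr
    mask[jrand] = True  # at least one component always comes from the donor vector
    return np.where(mask, v, x)

def greedy_selection(x, u, f):
    """Keep the trial u only if it does not worsen the fitness (minimization)."""
    return u if f(u) <= f(x) else x
```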
In addition, JADE, SHADE, L-SHADE, jSO, L-SHADE-RSP, and other approaches use an external archive of solutions. The archive is initially empty and is filled with the parent vectors replaced during selection. The archive usually has the same size as the population, and once its size hits the maximum value, newly inserted solutions replace randomly chosen ones. The archived solutions are used in the current-to-pbest/1 family of mutation strategies in the last random index: r2 is selected either from the population or from the archive.
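The archive logic described above can be sketched as a small fixed-capacity container (the class name is a hypothetical choice):

```python
import numpy as np

rng = np.random.default_rng(2)

class Archive:
    """Fixed-capacity external archive of replaced parent vectors, as used in JADE/SHADE."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def add(self, x):
        if len(self.items) < self.capacity:
            self.items.append(x)
        else:
            # archive is full: overwrite a randomly chosen archived solution
            self.items[rng.integers(self.capacity)] = x
```

During mutation, the index r2 would then be drawn from the union of the population and `archive.items`.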

Related Studies on Differential Evolution
The original proposal of the DE algorithm [13] included the three main control parameters of the algorithm: the population size NP, scaling factor F, and crossover rate Cr. Most of the further studies on DE focused either on developing methods to control or adapt these parameters or on proposing new mutation strategies and modifying the algorithm scheme. Some of the earliest theoretical works [17,18] considered the convergence properties of DE and their connection to parameter settings on a set of benchmark problems, resulting in a set of recommendations for parameter values, mainly in connection with the current population variance [19,20].
Although these findings were important for the field, the adaptive DE variants appeared to have more influence on it. Starting with one of the earlier studies [21], which proposed the popular jDE approach with random generation of F and Cr and memorization of successful values, a set of other adaptive DE variants was proposed, including SDE [22], SaDE (Self-adaptive Differential Evolution) [23], and the already mentioned JADE algorithm [15]. All these methods are based on sampling parameter values from a uniform, normal, or Cauchy distribution. The most efficient schemes used guided adaptation, where more efficient parameter values were determined based on fitness values, as in the JADE [15] or SHADE [8] approach. In [24], the scale factor local search approach was proposed: unlike the jDE algorithm, where the F parameter is generated randomly, local search algorithms are used, such as golden section search or hill climbing. The idea of such a local search is similar to the steepest descent method, where the step size is optimized to get the best possible improvement.
In [25], a parameter control strategy was proposed in which the scaling factor was linearly decreased with the iteration number, which allowed achieving better results compared to other methods. Several studies [26,27] have proposed a different approach to parameter adaptation, where a pool of fixed parameter values is maintained and some of them are randomly chosen during the search process. A number of papers described the adaptive selection of mutation strategies used in the DE algorithm, for example, adaptive DE with four mutation strategies [28] or a multi-population ensemble of mutation strategies [29]. In [30], a multi-population DE with scaling factor inheritance was proposed, where efficient solutions from one population were transferred to another together with the scaling factor.
Despite the demonstrated efficiency of these approaches, the annual competitions on bound-constrained single-objective optimization organized within the Congress on Evolutionary Computation demonstrate that the most promising approaches are DE variants with adaptive parameter values, such as SHADE [31] or jDE variants, and hybrid methods. Examples of the latter include LSHADE-SPACMA [32], where L-SHADE was hybridized with CMA-ES (Covariance Matrix Adaptation Evolution Strategy) with dynamic resource redistribution, and [33], where a super-fit individual generated with CMA-ES was included in the population. The multi-strategy methods also demonstrate highly competitive results [34].
Several surveys were published discussing the problem of parameter adaptation in DE and considering the advantages and drawbacks of the existing methods; a detailed taxonomy was presented in [35]. According to the results of this study, one of the most efficient schemes is success-history adaptation, initially introduced in the SHADE algorithm, whose L-SHADE variant won the CEC 2014 competition; it builds on the parameter tuning techniques introduced in JADE. SHADE maintains H memory cells, each keeping a pair (M_{F,h}, M_{Cr,h}), where h is the memory index. For every mutation and crossover operation, new values are generated as follows:

F = randc(M_{F,k}, 0.1), Cr = randn(M_{Cr,k}, 0.1), (10)

where randc(m, s) is a Cauchy distributed random value, randn(m, s) is a normally distributed random value with location m and scale s, and the index k is chosen randomly in the range [1, H] for every individual. The memory cells are used as location parameters of the distributions; if the generated F value is below zero, it is generated again, while if F > 1, it is set to one; likewise, if Cr < 0, then Cr = 0, and if Cr > 1, then Cr = 1. The memory cells are updated using the values of F and Cr which allowed generating solutions with better fitness values. The successful F and Cr values are stored in S_F and S_Cr, together with the corresponding fitness improvements Δf_j = |f(x_j) − f(u_j)|. The update is performed as follows:

M^{g+1}_{F,k} = 0.5(M^g_{F,k} + mean_wL(S_F)), (11)

M^{g+1}_{Cr,k} = 0.5(M^g_{Cr,k} + mean_wL(S_Cr)), (12)

where g is the current generation number and mean_wL is the weighted Lehmer mean calculated as follows:

mean_wL(S) = Σ_j w_j S_j^2 / Σ_j w_j S_j, (13)

where w_j = Δf_j / Σ_k Δf_k. (14)

Except for the success-history adaptation, there are several other popular parameter adaptation methods, for example, the jDE algorithm [21], which has been shown to be very efficient in solving complex optimization problems when a large computational resource is given [36]. In jDE, the following mechanism was proposed: for every individual in the population, a memory cell keeping a pair of F and Cr values is maintained. Initially, the values for all cells are set to M_{F,i} = 0.5 and M_{Cr,i} = 0.9, i = 1, . . . , NP, and new values are generated before the mutation and crossover operations:

F_{i,new} = 0.1 + 0.9 · rand[0, 1] if rand[0, 1] < τ1, otherwise F_{i,new} = M_{F,i}. (15)

In a similar manner, the Cr values are generated:

Cr_{i,new} = rand[0, 1] if rand[0, 1] < τ2, otherwise Cr_{i,new} = M_{Cr,i}. (16)

The parameters τ1 and τ2 control the frequency of the F and Cr change and are usually set to 0.1. If the offspring generated with these parameter values has better fitness than its parent, then these F and Cr values are saved in the corresponding memory cells. The resulting jDE algorithm appeared to be highly competitive despite its simplicity, and its modifications jDE100 [37] and j2020 [36] have shown promising results during the CEC 2019 and CEC 2020 competitions on single-objective bound-constrained optimization with a large computational resource.
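The success-history sampling and memory update described above can be sketched as follows; this is a simplified illustration with hypothetical function names, where a standard Cauchy variate is drawn via the inverse-CDF transform tan(π(u − 1/2)):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_F_Cr(M_F, M_Cr, h):
    """F ~ Cauchy(M_F[h], 0.1), resampled while F <= 0 and clipped to 1 from above;
    Cr ~ Normal(M_Cr[h], 0.1), clipped to [0, 1]."""
    F = -1.0
    while F <= 0:
        # standard Cauchy draw via tan(pi * (u - 0.5)), shifted and scaled
        F = M_F[h] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
    F = min(F, 1.0)
    Cr = float(np.clip(rng.normal(M_Cr[h], 0.1), 0.0, 1.0))
    return F, Cr

def weighted_lehmer_mean(S, df):
    """mean_wL(S) = sum(w * S^2) / sum(w * S), weights w_j = df_j / sum(df)."""
    S, df = np.asarray(S, dtype=float), np.asarray(df, dtype=float)
    w = df / df.sum()
    return (w * S ** 2).sum() / (w * S).sum()

def update_memory(M_F, M_Cr, k, S_F, S_Cr, df):
    """M^{g+1} = 0.5 * (M^g + mean_wL(S)); the cell index k cycles through the cells."""
    if len(S_F) > 0:  # skip the update when no successful solutions were produced
        M_F[k] = 0.5 * (M_F[k] + weighted_lehmer_mean(S_F, df))
        M_Cr[k] = 0.5 * (M_Cr[k] + weighted_lehmer_mean(S_Cr, df))
        k = (k + 1) % len(M_F)
    return k
```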
One of the three main parameters of DE is the population size NP. There have been several important studies on the DE population size, including [38]. Although several population size tuning algorithms exist, for example the structured population size reduction introduced in the SPSRDEMMS (Structured Population Size Reduction Differential Evolution with Multiple Mutation Strategies) algorithm [39], the Linear Population Size Reduction (LPSR) idea proposed in L-SHADE is one of the most widely used. In LPSR, the number of individuals to be deleted is determined at the end of each generation g, and the worst ones in terms of fitness are removed. The new population size is calculated as follows:

NP_{g+1} = round(NP_max − (NP_max − NP_min) · NFE / NFE_max), (17)

where NP_min = 4 is the minimal population size and NP_max is the initial population size. The idea of population size reduction is to allow a wide search at the beginning to cover as much of the search space as possible, and to later decrease the population size for better convergence to the best located optimum. The archive size NA is decreased in the same way as NP.
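The LPSR schedule can be sketched directly (the function name is an illustrative choice):

```python
def lpsr_population_size(nfe, nfe_max, np_max, np_min=4):
    """Linear Population Size Reduction: NP shrinks linearly from np_max to np_min
    as the spent budget nfe approaches the total budget nfe_max."""
    return round(np_max + (np_min - np_max) * nfe / nfe_max)
```

For example, with NP_max = 100 and a budget of 10,000 evaluations, the population holds 100 individuals at the start and only 4 at the very end.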

Proposed Approach: Linear Bias Reduction
One of the problems in parameter adaptation for DE is the existence of structural bias in the parameter adaptation process. For example, the problem of reaching different areas of the search space by DE was discussed in [40]. The JADE and SHADE algorithms used the Lehmer mean [41] instead of the classical arithmetic mean to counteract the structural bias in terms of parameter values: it is much easier for the algorithm to generate better solutions with smaller F and Cr values than with larger ones, because smaller parameter values result in a more local search and thus easy improvements. However, such greedy behavior could only be beneficial from a short-term perspective, and only for local search. In [12], the generalized Lehmer mean was proposed, where a larger p parameter in the mean was allowed:

mean_{p,m,wL}(S) = (Σ_j w_j S_j^p / Σ_j w_j S_j^{p−m})^{1/m}. (18)

This equation in its general form defines a group of means, and the values of p and m control the bias towards smaller or larger values. For example, mean_{0,1,wL}(S) is the harmonic mean, mean_{0.5,1,wL}(S) the geometric mean, mean_{1,1,wL}(S) the arithmetic mean, and mean_{2,1,wL}(S) the contraharmonic mean. Any biased mean could be obtained by setting the parameter values, and the graph of possible mean_{p,m,wL} values is presented in Figure 1. The behavior of the parameter adaptation in DE could be significantly influenced by the newly generated (M_{F,h}, M_{Cr,h}) values, so the setting of p and m influences the sampled F and Cr values, leading to a more explorative search with large p and small m, or a more exploitative search with small p and large m. Further in this study, the fixed setting m = 1 is used, so that the denominator exponent equals p − 1 and mean_{p,1,wL} reduces to the classical weighted Lehmer mean; changing only the p parameter is then sufficient to control the bias of the mean.
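The generalized weighted Lehmer mean can be sketched as follows, assuming positive inputs; with m = 1, p = 1 gives the arithmetic mean, p = 2 the contraharmonic mean, and larger p biases the result toward the larger elements:

```python
import numpy as np

def generalized_lehmer_mean(S, w, p, m=1.0):
    """mean_{p,m,wL}(S) = (sum(w*S^p) / sum(w*S^(p-m)))^(1/m);
    m = 1 recovers the classical weighted Lehmer mean sum(w*S^p)/sum(w*S^(p-1))."""
    S = np.asarray(S, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(((w * S ** p).sum() / (w * S ** (p - m)).sum()) ** (1.0 / m))
```

For S = [0.2, 0.8] with equal weights, p = 0 gives the harmonic mean 0.32, p = 1 the arithmetic mean 0.5, and p = 2 the contraharmonic mean 0.68, illustrating how p shifts the bias.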
Inspired by the idea of linear population size reduction, the Linear Bias Reduction (LBR) technique is proposed. Since in most cases the computational resource limit is known in advance, more exploration can be afforded at the beginning of the search process to find the most promising areas of the search space, while at the end, fast convergence is required. The linear bias reduction starts with a large p_max value and gradually decreases the Lehmer mean parameter down to p_min as follows:

p = p_max − (p_max − p_min) · NFE / NFE_max. (19)

In this study, the lower limit is set to p_min = 1, which corresponds to the arithmetic mean, while the upper limit p_max is varied. The idea of LBR is inherited from the linear parameter reduction scheme of the L-SHADE algorithm [10] and other studies, such as [25]. The pseudocode of L-SHADE-LBR is presented in Algorithm 1.
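The LBR schedule itself is a one-liner (the function name is an illustrative choice):

```python
def lbr_p(nfe, nfe_max, p_max, p_min=1.0):
    """Linear Bias Reduction: the Lehmer mean parameter p decreases linearly
    from p_max (explorative, biased toward large F/Cr) to p_min (arithmetic mean)."""
    return p_max - (p_max - p_min) * nfe / nfe_max
```

The returned p would then be plugged into the generalized Lehmer mean used by the memory-cell update at every generation.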

The next section contains the experimental setup and the results of applying LBR to state-of-the-art DE algorithms on the CEC 2017 and 2020 competition benchmark problems.

Results
The evaluation of the linear bias reduction approach was performed on two sets of benchmark problems, namely the CEC 2017 [5] and CEC 2020 [42] competitions on bound-constrained single-objective optimization. These two sets of test problems were chosen because they represent two different scenarios: CEC 2017 tests algorithms on 30 different problems in 10, 30, 50, and 100 dimensions with a limited computational resource, while CEC 2020 has only 10 problems in 5, 10, 15, and 20 dimensions but allows a much larger computational resource, growing almost exponentially with the dimension. For the CEC 2017 problems, the resource was set to 10,000·D fitness evaluations, and for CEC 2020 it was set to 50,000, 10^6, 3 × 10^6, and 10^7 for 5D, 10D, 15D, and 20D, respectively. All test problems were shifted and rotated to prevent the algorithms from exploiting the structure of the benchmark functions.
The LBR modification was tested with the L-SHADE, jSO, and DISH algorithms, and the following parameters were set for all algorithms: initial population size NP_max = 25 · log(D) · √D for CEC 2017 and NP_max = 30 · D · √D for CEC 2020, final population size NP_min = 4, pbest = 0.17, archive size NA = NP, and number of memory cells H = 5. The initial value of p_max in LBR was set to 40 for the F update and 32 for the Cr update; these values were chosen according to the experimental results reported in [12]. The algorithms were implemented in C++ with GCC and run on a PC with Ubuntu 20.04, an Intel Core i7-8700K processor, and 32 GB RAM, with the post-processing of results performed using Python 3.6.
The comparison of the results was performed for L-SHADE, jSO, and DISH with and without LBR. When LBR was not used, the p value in the Lehmer mean was set to two. The Mann-Whitney statistical test with normal approximation, tie-breaking, and a significance level of 0.01 was applied to identify the differences between the algorithm modifications. The results presented in Table 1 show that LBR gives up to 16 improvements for L-SHADE on 100-dimensional functions and 17 improvements on 50-dimensional functions. For the jSO and DISH algorithms, which perform better than L-SHADE, LBR gives smaller improvements. Furthermore, the greater the dimension of the test problems, the more improvements are achieved; for example, for 10-dimensional problems, LBR does not deliver any improvements, and for L-SHADE and jSO there are even several performance deteriorations. Most of the improvements are observed for multimodal, hybrid, and composition functions.
Tables 2-5 contain the results for all CEC 2017 functions in all dimensions; the best values for each function are highlighted in bold. The results presented in Tables 2-5 show that the linear bias reduction has the largest influence on 50- and 100-dimensional functions. Table 6 contains the Mann-Whitney test results for the CEC 2020 benchmark problems.
The results presented in Table 6 show that LBR yields up to two improvements for the L-SHADE algorithm for 10D, 15D, and 20D, with one case of decreased performance. The improvements were mainly on functions f5 and f6, i.e., hybrid functions. For the jSO algorithm, there were up to three improvements for 15D, and for the DISH algorithm, which also includes the jSO adaptations, there were up to two improvements, observed on f5, f6, and f7.

Discussion
Every evolutionary algorithm should be designed and tuned according to the specific purpose; for example, if the goal is to find the best possible function value in a very limited time period, the convergence speed of the algorithm is crucial; however, if the computational resource is relatively large, then the explorative properties of the algorithm become more important. Obviously, both the fast convergence and the high explorative properties are important, and the algorithm that is able to combine them and make use of both could be recommended for real-world usage.
The linear bias reduction scheme proposed in this study follows this idea: the experiments showed that it improves the performance of several state-of-the-art non-hybrid DE variants on both the CEC 2017 and CEC 2020 problems due to the larger parameter values sampled at the beginning of the search, which promote exploration, and the smaller values sampled closer to the end, which speed up convergence. The LBR approach does not require any significant additional computational overhead and could be easily implemented in most modern DE-based optimizers, improving their performance.
Although the idea of exploring at the beginning of the search and exploiting at the end is not new to the field of evolutionary algorithms, it can be implemented in many ways, for example, with linear population size reduction; however, as this study shows, not all such mechanisms have been discovered and properly studied. The idea of LBR is relatively simple and could be applied to other approaches with corresponding modifications.
For example, in a genetic algorithm, a similar success-history-based adaptation could be used for tuning the mutation probability, as was shown in [43]. Implementing the LBR proposed in this study for this version of GA could significantly improve its performance. In general, any parameter adaptation mechanism used in any EA or SI algorithm that requires the tuning of numerical parameters could be modified with LBR, with minor changes depending on the algorithm. The implementation of LBR for multiobjective algorithms is also possible; however, to the best of our knowledge, only a few numerical parameter tuning variants have been proposed for multiobjective optimization. As LBR relies only on the amount of computational resource spent, it could be incorporated into any algorithm for which the resource limit is known in advance.

Conclusions
This paper proposes a parameter tuning modification for the differential evolution algorithm called linear bias reduction, which allows more exploration at the beginning of the search and faster convergence to the best located optimum at the end. LBR is a relatively simple technique which could be used not only for DE but also for other approaches. The performed experiments demonstrated its efficiency, especially on complex hybrid functions, while causing only a few performance losses on simpler problems. Further directions of study may include, but are not limited to, experimenting with non-linear bias reduction, bias increase, and various initial and final parameter values in LBR, as well as implementing LBR-like techniques for other evolutionary or swarm intelligence algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

EA	Evolutionary Algorithm
DE	Differential Evolution
L-SHADE	Linear population size reduction Success History-based Adaptive Differential Evolution
LBR	Linear Bias Reduction
CEC	Congress on Evolutionary Computation