Success History-Based Adaptive Differential Evolution Using Turning-Based Mutation

Abstract: Single objective optimization algorithms are the foundation on which more complex methods, such as constrained optimization, niching, and multi-objective algorithms, are built. Improvements to single objective optimization algorithms are therefore important because they can impact these other domains as well. This paper proposes a method using turning-based mutation that is aimed at the premature convergence of algorithms based on SHADE (Success-History based Adaptive Differential Evolution) in high-dimensional search spaces. The proposed method is tested on the Single Objective Bound Constrained Numerical Optimization (CEC2020) benchmark sets in 5, 10, 15, and 20 dimensions for the SHADE, L-SHADE, and jSO algorithms. The effectiveness of the method is verified by a population diversity measure and a population clustering analysis. In addition, the new versions (Tb-SHADE, TbL-SHADE, and Tb-jSO) using the proposed turning-based mutation obtain clearly better optimization results than the original algorithms (SHADE, L-SHADE, and jSO) as well as the advanced DISH and jDE100 algorithms on the 10, 15, and 20 dimensional functions, but have an advantage over the advanced j2020 algorithm only on the 5 dimensional functions.


Introduction
The single objective global optimization problem involves finding a solution vector x = (x_1, ..., x_D) that minimizes the objective function f(x), where D is the dimension of the problem. The task of black-box optimization is to solve the global optimization problem without a known objective function form or structure, that is, f is a "black box". This setting appears in many engineering optimization problems, where complex simulations are used to calculate the objective function.
The differential evolution (DE) algorithm, proposed by Storn and Price in 1995, laid the foundation for a series of successful algorithms for continuous optimization. DE is a stochastic black-box search method that was originally designed for numerical optimization problems [1]. It is also an evolutionary algorithm in which an individual is replaced only by a better solution, so the best solutions always survive to the next generation: a phenomenon known as elitism. The extensive study fields of DE were recently summarized in [2].
A popular variant of DE [18] is the algorithm proposed by Tanabe and Fukunaga called Success-History based Adaptive Differential Evolution (SHADE) [19]. During the optimization process, the control parameters, the scale factor F and the crossover rate CR, are adjusted to adapt to the given problem, and the "current-to-pbest/1" mutation strategy and an external archive of poor-quality solutions are employed. In selection, the fitness of the trial vector f(u) is compared with the fitness of the selected vector f(x); the vector with the better fitness value survives to the next generation.
This paper focuses on improving the mutation process, so the paragraphs below describe the mutation process of the DE algorithm; the complete steps of DE can be found in the literature [1]. The mutation strategy of DE/rand/1/bin can be expressed as follows:

v_i,G = x_r1,G + F_i · (x_r2,G − x_r3,G), (1)

where v_i,G is the mutated vector, and x_r1,G, x_r2,G, and x_r3,G are three different individuals randomly selected from the population. F_i is the scaling factor, and G is the index of the current generation. If any dimension v_j,i,G of the mutated vector is outside the boundary of the search range [x_min, x_max], the following correction is performed for boundary-handling of infeasible solutions [45]:

v_j,i,G = (x_min + x_j,i,G)/2 if v_j,i,G < x_min; v_j,i,G = (x_max + x_j,i,G)/2 if v_j,i,G > x_max, (2)

where j is the dimensional index and i is the individual index.
The pseudo-code of the DE/rand/1/bin algorithm is shown in Algorithm 1.
Algorithm 1 DE/rand/1/bin
1: initialize P, NP, F, CR and MaxFES;
2: while FES < MaxFES do
3:   for each individual x do
4:     use mutation Formula (1) to create mutated vector v;
5:     execute boundary-handling (2) to handle infeasible solutions;
6:     use binomial crossover to create trial vector u;
7:     use selection of classical DE to create individual of next generation;
8:   end for
9: end while
10: return the best found solution.
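The loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration only: the function names, the sphere objective, and the parameter values are illustrative assumptions, not the benchmark settings of this paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def de_rand_1_bin(f, dim, x_min, x_max, np_size=20, F=0.5, CR=0.9, max_fes=20000):
    """Minimal DE/rand/1/bin loop following Algorithm 1 (illustrative sketch)."""
    pop = rng.uniform(x_min, x_max, (np_size, dim))
    fit = np.array([f(x) for x in pop])
    fes = np_size
    while fes < max_fes:
        for i in range(np_size):
            # mutation (1): v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3, i distinct
            r1, r2, r3 = rng.choice([j for j in range(np_size) if j != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # boundary handling (2): move halfway between the bound and the parent
            low, high = v < x_min, v > x_max
            v[low] = (x_min + pop[i][low]) / 2
            v[high] = (x_max + pop[i][high]) / 2
            # binomial crossover: take v with probability CR, one dimension always
            jrand = rng.integers(dim)
            mask = rng.random(dim) < CR
            mask[jrand] = True
            u = np.where(mask, v, pop[i])
            # selection of classical DE: elitist one-to-one replacement
            fu = f(u)
            fes += 1
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
            if fes >= max_fes:
                break
    return pop[fit.argmin()], fit.min()

best_x, best_f = de_rand_1_bin(lambda x: np.sum(x**2), dim=5, x_min=-100, x_max=100)
```

On a smooth unimodal function such as the sphere, this basic loop already converges close to the optimum within the given budget.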
It can be seen from the description of the DE algorithm that users need to set three control parameters: the crossover rate CR, the scaling factor F, and the population size NP. The setting of these parameters is very important to the performance of DE.
Fine-tuning the control parameters is a time-consuming task, which is why most advanced variants of DE use parameter adaptation. This is also why Tanabe and Fukunaga proposed the SHADE [19] algorithm in 2013. Because the algorithms used in this paper are based on SHADE, it is described in more detail below.

SHADE
Among the control parameters of SHADE, the crossover rate CR and the scaling factor F are adapted. The algorithm is based on JADE [20], proposed by Zhang and Sanderson, and so the two share many mechanisms [18]. The major difference between them is in the historical memories M_F and M_CR and their update mechanisms. The next subsections describe the historical memory update of SHADE and the differences between the DE and SHADE algorithms in initialization, mutation, crossover, and selection, respectively. The complete steps of SHADE can be found in the literature [19].

Initialization
In SHADE, the population is initialized in the same manner as in DE, but two additional components, the historical memory and the external archive, also need to be initialized.
Initialize the control parameters stored in the historical memory, the crossover rate CR and the scale factor F, to 0.5:

M_CR,i = M_F,i = 0.5, i = 1, ..., H, (3)

where H is the size of the user-defined historical memory, and the index k used to update the historical memory is initialized to one. In addition, the external archive of poor-quality solutions is initialized as empty, i.e., A = ∅.

Mutation
In contrast to DE/rand/1/bin, the "current-to-pbest/1" mutation strategy is used in SHADE:

v_i,G = x_i,G + F_i · (x_pbest,G − x_i,G) + F_i · (x_r1,G − x_r2,G), (4)

where x_i,G is the given individual, and x_pbest,G is an individual randomly selected from the best NP × p_i (p_i ∈ [0, 1]) individuals in the current population. Vector x_r1,G is an individual randomly selected from the current population, and x_r2,G is an individual randomly selected from the union of the external archive A and the current population, with r1 ≠ r2 ≠ i. F_i is a scaling factor, v_i,G is the mutated vector, and G is the index of the current generation. The greed of the "current-to-pbest/1" mutation strategy depends on the control parameter p_i, which balances exploration and exploitation capabilities (a small value of p is more greedy) and is calculated as shown in Equations (5) and (6):

p_i = rand[p_min, 0.2], (5)
p_min = 2/NP, (6)

where rand[] is a uniformly distributed random value and NP is the size of the population. The scaling factor F_i is generated using the following formula:

F_i = randc_i(M_F,ri, 0.1), (7)

where randc_i() is a random value from the Cauchy distribution, and M_F,ri is randomly selected from the historical memory M_F (index ri is a uniformly distributed random value from [1, H]). If F_i > 1, F_i is set to 1; if F_i ≤ 0, Equation (7) is repeated to attempt to generate a valid value. The boundary handling of SHADE is identical to that of DE, as shown in Equation (2).
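The sampling of the scaling factor in Equation (7), including the regeneration of non-positive draws and the clipping at 1, can be sketched as follows; the function name and memory size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_F(M_F, H):
    """Draw F_i as in Eq. (7): Cauchy around a random memory slot,
    regenerated while non-positive, clipped above at 1 (illustrative sketch)."""
    ri = rng.integers(H)                       # uniform memory index from [0, H)
    F = rng.standard_cauchy() * 0.1 + M_F[ri]  # randc(M_F[ri], 0.1)
    while F <= 0:
        F = rng.standard_cauchy() * 0.1 + M_F[ri]
    return min(F, 1.0)

M_F = np.full(5, 0.5)
Fs = [sample_F(M_F, 5) for _ in range(1000)]
```

The heavy tail of the Cauchy distribution occasionally produces large F values, which is why the clipping and regeneration steps are needed.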

Crossover
DE has two classic crossover strategies, binomial and exponential. The crossover strategy of SHADE is the same as that of DE/rand/1/bin, i.e., binomial crossover. However, while the crossover rate of DE/rand/1/bin is set in advance, the CR_i of SHADE is generated by the following formula:

CR_i = randn_i(M_CR,ri, 0.1), (8)

where randn_i() is a random value from the Gaussian distribution, and M_CR,ri is randomly selected from the historical memory M_CR (index ri is a uniformly distributed random value from [1, H]); values outside [0, 1] are truncated to that interval.
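The crossover-rate sampling of Equation (8) can be sketched analogously; names are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_CR(M_CR, H):
    """Draw CR_i as in Eq. (8): Gaussian around a random memory slot,
    truncated to [0, 1] (illustrative sketch)."""
    ri = rng.integers(H)
    CR = rng.normal(M_CR[ri], 0.1)  # randn(M_CR[ri], 0.1)
    return float(np.clip(CR, 0.0, 1.0))

M_CR = np.full(5, 0.5)
CRs = [sample_CR(M_CR, 5) for _ in range(1000)]
```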

Selection
The selection process of SHADE is the same as that of DE. However, the external archive needs to be updated during selection: if a better trial individual is generated, the original individual x_i,G is stored in the external archive. If the external archive exceeds its capacity, a randomly chosen entry is deleted.

Historical Memory Update
Historical memory update is also an important operation in SHADE. The historical memories M_CR and M_F are initialized by Formula (3), but their contents change as the algorithm iterates. These memories store the "successful" crossover rates CR and scaling factors F; "successful" here means that the trial vector u was selected instead of the original vector x to survive to the next generation. In each generation, the values of these "successful" CR and F are first stored in the arrays S_CR and S_F, respectively. After each generation, one unit of each of the historical memories M_F and M_CR is updated. The updated unit is specified by the index k, which is initialized to one and increases by one after each generation; if k exceeds the memory capacity H, it is reset to one. The following formulas are used to update the k-th unit of the historical memories:

M_F,k,G+1 = mean_WL(S_F) if S_F ≠ ∅, otherwise M_F,k,G, (9)
M_CR,k,G+1 = mean_WA(S_CR) if S_CR ≠ ∅, otherwise M_CR,k,G. (10)

If all individuals in the G-th generation fail to generate a better trial vector, i.e., S_F = S_CR = ∅, the historical memory is not updated. The weighted Lehmer mean mean_WL and the weighted mean mean_WA are calculated using the following formulas, respectively:

mean_WL(S_F) = (Σ_{k=1}^{|S_F|} w_k · S_F,k²) / (Σ_{k=1}^{|S_F|} w_k · S_F,k), (11)
mean_WA(S_CR) = Σ_{k=1}^{|S_CR|} w_k · S_CR,k. (12)

To improve the adaptability of the parameters, the weight vector w is calculated based on the absolute value of the difference between the objective function value of the given vector and that of the trial vector in the current generation G, as follows:

w_k = Δf_k / Σ_{m=1}^{|S_CR|} Δf_m, with Δf_k = |f(u_k,G) − f(x_k,G)|. (13)

The pseudo-code of the SHADE algorithm is shown in Algorithm 2.
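One memory-update step per Equations (9)-(13) can be sketched as below; the function signature and the example values are illustrative assumptions.

```python
import numpy as np

def update_memory(M_F, M_CR, k, S_F, S_CR, df):
    """One historical-memory update per Eqs. (9)-(13). df holds |f(u)-f(x)|
    for each successful replacement (illustrative sketch)."""
    if len(S_F) > 0:                      # S_F and S_CR are filled together
        S_F, S_CR, df = map(np.asarray, (S_F, S_CR, df))
        w = df / df.sum()                 # improvement-based weights, Eq. (13)
        M_F[k] = (w * S_F**2).sum() / (w * S_F).sum()  # weighted Lehmer mean, Eq. (11)
        M_CR[k] = (w * S_CR).sum()        # weighted arithmetic mean, Eq. (12)
        k = (k + 1) % len(M_F)            # advance the circular index
    return k

M_F, M_CR = np.full(5, 0.5), np.full(5, 0.5)
k = update_memory(M_F, M_CR, 0, S_F=[0.6, 0.9], S_CR=[0.4, 0.8], df=[1.0, 3.0])
```

With the example values, the second success carries three times the weight of the first, so the updated memory slots are pulled toward F = 0.9 and CR = 0.8.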

Linear Decrease in Population Size: L-SHADE
In [21], a linear reduction of the population size was introduced to SHADE to improve its performance. The basic idea is to gradually reduce the population size during evolution to improve exploitation capabilities. In L-SHADE, the population size is recalculated after each generation using Formula (14). If the new population size NP_new is smaller than the previous population size NP, all individuals are sorted by objective function value and the worst NP − NP_new individuals are removed. The size of the external archive |A| decreases synchronously with the population size:

NP_new = round( ((NP_f − NP_init)/MaxFES) · FES + NP_init ), (14)

where NP_f and NP_init are the final and initial population sizes, respectively, MaxFES and FES are the maximum and current number of fitness function evaluations, respectively, and round() is a rounding function.
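The schedule of Formula (14) is a straight line from the initial to the final population size; the parameter values below are illustrative, except for the floor of 4 individuals commonly used in L-SHADE.

```python
def lshade_pop_size(fes, max_fes, np_init, np_f=4):
    """Linear population-size schedule of Eq. (14); np_f=4 is the usual floor."""
    return round((np_f - np_init) / max_fes * fes + np_init)

# population shrinks from 100 to 4 over the evaluation budget
sizes = [lshade_pop_size(fes, 100000, np_init=100) for fes in (0, 50000, 100000)]
```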

Weighted Mutation Strategy with Parameterization Enhancement: jSO
The jSO [24] algorithm won the CEC2017 single-objective real-parameter optimization competition [46]. It is a variant of the iL-SHADE algorithm that uses a weighted mutation strategy [47]. The iL-SHADE algorithm extends L-SHADE by initializing all parameters in the historical memories M_F and M_CR to 0.8, statically setting the last unit of the historical memories M_F and M_CR to 0.9, updating M_F and M_CR with the weighted Lehmer mean, and limiting the crossover rate CR and scaling factor F in the early stage; the parameter p of the "current-to-pbest/1" mutation strategy is calculated as:

p = ((p_max − p_min)/MaxFES) · FES + p_min, (15)

where p_min and p_max are the minimum and maximum values of p, respectively, and FES and MaxFES are the current and maximum number of fitness function evaluations, respectively. The jSO algorithm sets p_max = 0.25 and p_min = p_max/2, the initial population size to NP_init = 25·√D·log D, and the size of the historical memory to H = 5. All parameters in M_F and M_CR are initialized to 0.3 and 0.8, respectively, and the weighted mutation strategy "current-to-pbest-w/1" is used:

v_i,G = x_i,G + F_w · (x_pbest,G − x_i,G) + F_i · (x_r1,G − x_r2,G), (16)

where F_w is calculated as:

F_w = 0.7·F_i if FES < 0.2·MaxFES; F_w = 0.8·F_i if FES < 0.4·MaxFES; F_w = 1.2·F_i otherwise. (17)
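The jSO parameter schedules described above can be sketched as follows; the function names are illustrative, and the p schedule is written as the linear interpolation of Equation (15) with jSO's settings p_max = 0.25, p_min = p_max/2.

```python
import math

def jso_p(fes, max_fes, p_max=0.25):
    """Linear p schedule of Eq. (15) with p_min = p_max / 2 (jSO settings)."""
    p_min = p_max / 2
    return (p_max - p_min) / max_fes * fes + p_min

def jso_Fw(F, fes, max_fes):
    """Stage-dependent weight of Eq. (17) used by current-to-pbest-w/1."""
    if fes < 0.2 * max_fes:
        return 0.7 * F
    if fes < 0.4 * max_fes:
        return 0.8 * F
    return 1.2 * F

# jSO initial population size NP_init = 25 * sqrt(D) * log(D), here for D = 20
np_init = round(25 * math.sqrt(20) * math.log(20))
```

Early in the run the mutation is damped (F_w < F_i), while in the final stage it is amplified (F_w = 1.2·F_i) to sharpen exploitation.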

Turning-Based Mutation
The opposition-based DE (ODE) algorithm was proposed by Shahryar et al. [48]. Opposition-based learning (OBL) was used for generation jumping and population initialization, and opposite numbers were used to improve the convergence rate of DE. Shahryar et al. let all vectors of the initial population take the opposite number during initialization and allowed the trial vectors to take the opposite number in the selection operation. They then compared the fitness values and selected the vector with the better fitness to accelerate the convergence of the DE algorithm. We borrow the idea of "opposition" from the above algorithm, but the purpose of this paper is to change the direction of mutation under certain conditions in order to maintain population diversity and enable a longer exploration phase.
Suppose that the search space is two-dimensional (2D). There is a ring-shaped region whose center is the global suboptimal individual x_pbest,G, with outer radius OR and inner radius IR. If the Euclidean distance Distance between the given individual and the global suboptimal individual is smaller than the outer radius OR and larger than the inner radius IR, the differential vector de_i from the mutation Formulas (1) and (4) takes the opposite number, and some dimensions are randomly selected and assigned random values within the search range. Experiments have verified that a suitable outer radius OR can be calculated as:

OR = ((IR − OR_init)/MaxFES) · FES + OR_init, (18)

where OR_init is the initial value of the outer radius and IR is the inner radius, which is also the minimum value of the outer radius; both are set in proportion to the width of the search range x_max − x_min (Equations (19) and (20)). The outer radius OR decreases as the number of fitness evaluations increases. MaxFES and FES are the maximum and current number of fitness function evaluations, respectively, and x_max and x_min are the upper and lower bounds of the search range, respectively. The Euclidean distance Distance between the given individual and the global suboptimal individual is calculated as:

Distance = sqrt( Σ_{j=1}^{D} (x_j,i,G − x_j,pbest,G)² ). (21)

The differential vector de_i from the mutation Equations (1) and (4) is, respectively:

de_i = F_i · (x_r2,G − x_r3,G), or de_i = F_i · (x_pbest,G − x_i,G) + F_i · (x_r1,G − x_r2,G). (22)

The pseudo-code of the operation on the differential vector de_i in turning-based mutation is shown as Operation 1:

Operation 1
1: if Distance > IR and Distance < OR then
2:   de_i = −de_i;
3:   for m = 1 to M do
4:     de_R(m),i = rand(x_min, x_max);
5:   end for
6: end if

where R is the randomly disordered dimension index array, M is the number of randomly selected dimensions, and x_max and x_min are the upper and lower bounds of the search range, respectively.
Finally, the mutation operation is performed as shown in Equation (23):

v_i,G = x_i,G + de_i. (23)

If the Euclidean distance Distance between the given individual and the global suboptimal individual is smaller than the outer radius OR and larger than the inner radius IR, the improved method changes the direction of mutation of the given individual to maintain population diversity and a longer exploration phase, thus enhancing the global search ability and the ability to escape local optima. Then, as the number of fitness evaluations increases, the performance of the algorithm can be improved. If the Euclidean distance Distance between the given individual and the global suboptimal individual is smaller than or equal to the inner radius IR, the individual is allowed to mutate in the original direction. This enables the given individual to quickly converge to the global optimal or suboptimal position and avoids the problem of non-convergence caused by turning-based mutation.
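Putting the pieces together, one turning-based mutation step (linearly shrinking outer radius, the distance test, Operation 1, and Equation (23)) can be sketched as follows. The radii, the number of re-randomized dimensions, and all names are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(7)

def turning_mutation(x_i, x_pbest, de, fes, max_fes, x_min, x_max,
                     or_init, ir, m_dims=1):
    """Sketch of one turning-based mutation step (Operation 1 + Eq. (23)).
    or_init, ir and m_dims are illustrative settings."""
    # outer radius shrinks linearly from or_init down to ir as FES grows (Eq. (18))
    outer = (ir - or_init) / max_fes * fes + or_init
    dist = np.linalg.norm(x_i - x_pbest)   # Euclidean distance, Eq. (21)
    de = de.copy()
    if ir < dist < outer:
        de = -de                           # "turn": take the opposite number
        # re-randomize a few randomly chosen dimensions of the differential vector
        dims = rng.permutation(len(de))[:m_dims]
        de[dims] = rng.uniform(x_min, x_max, m_dims)
    return x_i + de                        # mutated vector, Eq. (23)

x_i = np.zeros(5)
x_pbest = np.full(5, 1.0)                  # distance sqrt(5) falls inside the ring
v = turning_mutation(x_i, x_pbest, de=np.ones(5), fes=0, max_fes=100,
                     x_min=-100, x_max=100, or_init=40.0, ir=2.0)
```

In this example the individual lies inside the ring, so the differential vector is negated and one of its dimensions is replaced by a random value from the search range.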
Since Equation (21) and Operation 1 need to be executed in the mutation process of each individual, the overall time complexity [42] of the improved algorithms is slightly higher than that of the original algorithms, as shown in Tables 1-3. The pseudo-code of the Tb-SHADE algorithm (SHADE using turning-based mutation) is shown as Algorithm 3, that of the TbL-SHADE algorithm (L-SHADE using turning-based mutation) as Algorithm 4, and that of the Tb-jSO algorithm (jSO using turning-based mutation) as Algorithm 5. The improved parts of these algorithms are underlined.

Experimental Settings
To verify the improved method experimentally, the original algorithms, the improved algorithms, and the advanced DISH and jDE100 algorithms were tested on the Single Objective Bound Constrained Numerical Optimization (CEC2020) benchmark sets in 5, 10, 15, and 20 dimensions. The termination criteria, i.e., the maximum number of fitness function evaluations (MaxFES) and the minimum error value (Min error value), were set as in Table 4. The search range was [x_min, x_max] = [−100, 100], and 30 independent repeated experiments were conducted. The parameter settings of most algorithms [19,21,24] are shown in Tables 5 and 6; the parameter settings of the j2020 algorithm can be found in [29]. The hypothesis that the turning-based mutation maintains a longer exploration phase can be verified by analyzing the clustering and diversity of the population during the optimization process. These two analyses are described in more detail below.

Cluster Analysis
The clustering algorithm selected for this experiment is Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [49], which clusters based on density rather than cluster centers, so it can find clusters of arbitrary shape. The DBSCAN algorithm needs two control parameters and a distance measure. The settings were as follows: (1) the distance between core points, Eps = 1% of the decision space; for the CEC2020 benchmark sets, Eps = 2; (2) the minimum number of points forming a cluster, MinPts = 4 (the minimum number of mutation individuals); and (3) the Chebyshev distance [50] as the distance measure: if the distance between the corresponding attributes of two individuals is greater than 1% of the decision space, they are not considered directly density-reachable.
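The core reachability test under these settings can be sketched as below (the full DBSCAN run would typically use an off-the-shelf implementation such as scikit-learn's with metric='chebyshev'; the function name here is illustrative).

```python
import numpy as np

def directly_density_reachable(a, b, eps=2.0):
    """CEC2020 setting: two individuals are directly density-reachable only if
    every coordinate differs by at most eps (Chebyshev distance <= eps)."""
    return np.max(np.abs(np.asarray(a) - np.asarray(b))) <= eps

# an aggregated pair (all coordinates within 2) vs. a spread-out pair
close = directly_density_reachable([0.5, 1.0], [1.9, 0.2])
far = directly_density_reachable([0.5, 1.0], [3.1, 0.2])
```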

Population Diversity
The population diversity (PD) measure is taken from [51]. It is based on the square root of the sum of squared deviations, Equation (25), of individual components from their corresponding mean values, Equation (24):

x̄_j = (1/NP) Σ_{i=1}^{NP} x_ij, (24)
PD = sqrt( (1/NP) Σ_{i=1}^{NP} Σ_{j=1}^{D} (x_ij − x̄_j)² ), (25)

where i is the iterator over the members of the population and j over the components (dimensions).
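The PD measure of Equations (24)-(25) is a root-mean-square deviation of the individuals from the component-wise population mean, and can be computed directly:

```python
import numpy as np

def population_diversity(pop):
    """PD measure of Eqs. (24)-(25): RMS deviation of the population
    from its component-wise mean (illustrative sketch)."""
    pop = np.asarray(pop, dtype=float)
    mean = pop.mean(axis=0)                                # Eq. (24)
    return np.sqrt(((pop - mean) ** 2).sum() / len(pop))   # Eq. (25)

pd_spread = population_diversity([[0.0, 0.0], [2.0, 2.0]])   # diverse pair
pd_collapsed = population_diversity([[1.0, 1.0], [1.0, 1.0]])  # converged pair
```

A fully converged population yields PD = 0, so a drop of PD toward zero during a run signals the end of the exploration phase.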

Results
Tables 7-18 compare the error values (when the error value was smaller than 10−8, the corresponding value was considered optimal) obtained by the original algorithms (SHADE, L-SHADE, and jSO) and their improved versions using turning-based mutation (Tb-SHADE, TbL-SHADE, and Tb-jSO, respectively). The results of the comparison are shown in the last column of each table: if the performance of the original version was significantly better, the "−" sign is used; if the performance of the improved version was significantly better, the "+" sign is used; if their performances were similar, "=" is used. The better performance values are displayed in bold, and the last row of these tables shows the result of the overall comparison. Tables 19-22 provide the error values obtained by the advanced algorithms DISH, jDE100, and j2020 on CEC2020. All tables provide the best, mean, and std (standard deviation) values of 30 independent repetitions of the experiments.

Convergence diagrams are shown in Figures 1-12. Figures 1-4 show the convergence curves of SHADE and Tb-SHADE for some test functions in 5D, 10D, 15D, and 20D; Figures 5-8 show those of L-SHADE and TbL-SHADE; and Figures 9-12 show those of jSO and Tb-jSO. It is apparent that the red line of the turning-based mutation version of the algorithm was often slower to converge but attained better objective function values.

Tables 23-34 show the number of runs (#runs) with population aggregation, the average generation (Mean CO) of the first cluster during these runs, and the average population diversity (Mean PD) of these generations. The rankings of the Friedman test [52] were obtained by using the average value (Mean) of each algorithm on all 10 test functions in Tables 7-22, and are shown in Tables 35-38.
The related statistical values of the Friedman test are shown in Table 39. If the chi-square statistic was greater than the critical value, the null hypothesis was rejected; p represents the probability of the null hypothesis holding. The null hypothesis here was that there is no significant difference in performance among the nine algorithms considered on CEC2020.

Results and Discussion
The results on the CEC2020 benchmark sets are discussed first. As shown in Tables 7-18, the scores were two improvements against two instances of worsening (5D), four improvements against two (10D), five improvements against two (15D), and four improvements against none (20D) in the case of SHADE; three improvements against none (5D), four improvements against one (10D), six improvements against none (15D), and five improvements against none (20D) in the case of L-SHADE; and one improvement against none (5D), four improvements against one (10D), four improvements against one (15D), and two improvements against two (20D) in the case of jSO. On some test functions, the improved algorithm even escaped the local optimum and found the optimal value (if the error was smaller than 10−8, the relevant value was considered optimal); examples are f3 in Tables 10, 13, and 14, f8 in Tables 9, 13, 16, and 17, f9 in Table 12, and f10 in Table 11. In most cases, the improved version was clearly better than the original algorithm, except for Tb-SHADE (5D) and Tb-jSO (20D).
According to the convergence curves in Figures 1-12, in most cases the improved algorithm showed convergence similar to the original in the early stage of the optimization process, but clearly maintained a longer exploration phase and achieved better objective function values in the middle and late stages; in a few cases (such as f4 in Figure 5), the improved algorithm converged more slowly but did not achieve a better objective function value than the original.
As the numerical analyses in Tables 23-34 show, in most cases the improved algorithms exhibited fewer clustering runs (#runs), later clustering (Mean CO), and higher population diversity (Mean PD) than the original algorithms. However, Tb-SHADE (5D) had a lower population diversity on f6-f9, as did TbL-SHADE (all dimensions) on f2-f7, which might be related to the linear decrease in the population size. Tb-jSO showed similar numbers of clustering runs in all dimensions and a lower population diversity on some test functions in 5D. Therefore, in most cases, the improved versions maintained the diversity of the population and a longer exploration phase in the optimization process.
The significant improvements in Tables 7-18 and the clustering analysis in Tables 23-34 can be linked: the results marked with the "+" symbol in the former set of tables were always connected with later clustering, no clustering at all, or clustering in fewer of the 30 runs (for the last case, see, for example, column #runs for f3 in Tables 24-26). Consequently, the performance improvement effected by the updated versions is related to the maintenance of population diversity and a longer exploration phase.
According to the Friedman rankings in Tables 35-38, Tb-SHADE, TbL-SHADE, and Tb-jSO were clearly better than the original algorithms and the advanced DISH and jDE100 in 10D, 15D, and 20D. However, Tb-SHADE did not perform as well as SHADE in 5D and did not perform as well as DISH in 5D, 10D, and 20D. In addition, the j2020 algorithm delivered the best performance and ranked first in 10D, 15D, and 20D, while one of the improved versions, TbL-SHADE, delivered the best performance and ranked first in 5D. jDE100 (the winner of CEC2019), which ranked last in Tables 35-38, did not seem suitable for CEC2020. Table 39 shows that the null hypothesis was rejected in all dimensions, so the differences underlying the Friedman rankings are statistically significant. All in all, the three improved algorithms obtained good optimization results in contrast to the original algorithms as well as the advanced DISH and jDE100 algorithms, but were slightly worse than the advanced j2020 algorithm.

Conclusions
In this paper, a relatively simple and direct method using turning-based mutation was proposed and tested on the Single Objective Bound Constrained Numerical Optimization (CEC2020) benchmark sets in 5, 10, 15, and 20 dimensions with the SHADE, L-SHADE, and jSO algorithms. The basic idea of the proposed method is to change the direction of mutation under certain conditions to maintain population diversity and a longer exploration phase. It can thus avoid premature convergence and escape local optima to obtain better optimization results. The experimental results showed that this method is effective on the CEC2020 benchmark sets in 10, 15, and 20 dimensions. The strong point of the proposed method is that it can easily be applied to variants of SHADE. A disadvantage is that it increases the time complexity, and its effectiveness lacks theoretical proof. Our future research in this area will focus on further experiments and on applying the proposed method to more algorithms; for example, the improved method may be useful for some practical problems featuring constraints.