Differential Evolution: A Survey and Analysis

Abstract: Differential evolution (DE) has been extensively used in optimization studies since its development in 1995 because of its reputation as an effective global optimizer. DE is a population-based metaheuristic technique that develops numerical vectors to solve optimization problems. DE strategies have a significant impact on DE performance and play a vital role in achieving stochastic global optimization. However, DE is highly dependent on the control parameters involved. In practice, the fine-tuning of these parameters is not always easy. Here, we discuss the improvements and developments that have been made to DE algorithms. In particular, we present a state-of-the-art survey of the literature on DE and its recent advances, such as the development of adaptive, self-adaptive and hybrid techniques.


Introduction
Optimization algorithms are important approaches for resolving difficult optimization problems [1]. Optimization is defined as the procedure of finding the minimum or maximum value of a function $f(x)$ [2,3]. Several factors make these problems difficult to solve. First, a comprehensive search is infeasible if the problem domain space is too large. Second, the evaluation function may be noisy or vary with time, generating a series of solutions instead of a single solution. Third, the constraints sometimes prevent arriving at a feasible solution such that the optimization approach is the only solution [4]. The mathematical representation for the optimization involves finding an $X^* \in D^N$ such that $f(X^*) \le f(X)$ for all $X \in D^N$ (in the case of minimization), where $X^*$ is the optimal solution in the N-dimensional search space $D^N$, N is the dimension of the optimization (the number of optimization parameters) and $f_i^{\min}$ and $f_i^{\max}$ are the objective functions. Differential evolution (DE) is a stochastic algorithm for solving numerical continuous optimization problems. Since its inception, the DE algorithm has become a powerful global optimizer. Developed by Kenneth Price in 1994, DE is a promising optimization algorithm that converges to the real optimum without using significant amounts of resources. Furthermore, its performance was validated in the evolutionary domain by the IEEE Conference on Evolutionary Computation (CEC) in 1996 [5].
More recently, different versions of DE have secured the top ranks in many competitions between evolutionary algorithms (EAs) by the IEEE CEC conference series (http://www.ntu.edu.sg/home/epnsugan/index_files/cec-benchmarking.htm). However, an impressive number of different DE algorithms have been introduced by the research community over the past decades. To understand all the improvements that have been made to DE, a theoretical study and relative analysis of different enhancements are presented in this paper. Because various DE algorithms involving different techniques are comprehensively discussed, the main motivation behind this survey is to deepen understanding of the characteristics of different DE strategies, with the goal of benefiting from the various approaches. In fact, understanding how to combine these DEs harmoniously and their underlying concepts could be crucial to attaining effective designs or improving the performance of DE algorithms in particular or any optimization algorithms in general. Moreover, the literature shows that no single algorithm has been demonstrated to be effective for various applications. Thus, this study may provide a roadmap through which developers may gain a full understanding of this field.
DE algorithms differ from other EAs in that they shape offspring by perturbing solutions with scaled differences of selected individual vectors, rather than by recombining individuals through a probabilistic scheme. In fact, the differential mutation strategy is the main component that distinguishes DE from other population algorithms. Applying the mutation to all candidates defines an exploration rule based on other candidate solutions. Therefore, the mutation strategy enhances a population's capability for discovering new promising offspring based on the current distribution of solutions within the domain space [6]. The performance of DE is based on two major components: the chosen strategy and the control parameters. The strategy underlying DE consists of the mutation, crossover and selection operators, which are utilized at each generation to determine the global optimum. The control parameters consist of the population size NP, the scaling factor F and the crossover rate Cr [7,8].
Despite the potential of DE, it is clear to the research community that some adjustments to classic DE are essential to significantly enhance its performance, especially in addressing high-dimensional problems. Stagnation, premature convergence and sensitivity to control parameters are the main issues that influence the performance of DE [9]. Stagnation occurs when the population stops progressing without having converged, even to a suboptimal solution, although the diversity of the population remains high [10]. In other words, the population does not improve over a period of iterations, and the algorithm is not capable of finding a new search domain [11]. There are many causes of stagnation, including control parameters that become inefficient for a specific problem in the decision space [9,10]. Many studies have proposed a variety of ways to improve the current DE algorithm through modifications [12,13], including the use of differential mutations with perturbations, mutations with selection pressure, and operator adaptation techniques [14][15][16][17]. Qing conducted an extensive study on differential evolution [18] and observed that the performance of differential evolution and the quality of the results depended on the type of crossover technique used, either binomial or exponential [16,19].
This article provides a comprehensive survey of the different types of state-of-the-art differential evolution algorithms available as global numerical optimizations over continuous search spaces. Thus, this study provides a foundation for researchers who are interested in optimization in general or who care about recent developments in DE. The growing research area is divided into adaptive, self-adaptive, and hybridization strategies. This comprehensive study sheds light on most improvements and developments pertaining to different types of DE families, including primary concepts and a variety of DE formats.
This article is organized as follows. Section 2 briefly describes classic differential evolution. Section 3 analyzes the parameter strategies and their effects on DE performance and surveys different DE approaches. Section 4 presents the conclusions drawn from our study.

Classic Differential Evolution
Suppose that we seek the optimum $X^*$, represented by the vector $X^*_i$, $i = 1, \ldots, D$, $X \in \mathbb{R}^D$, within the boundary constraints $L \le X \le H$. Differential evolution (DE) is population-based, where the initial population is $P_{i,j} \in \mathbb{R}^{D \times NP}$ with random initialization $L \le P_{i,j} \le H$. Initialization of the population is an important step that assumes that there is no previous information about the optimum solution. Therefore, the population is initialized only within the boundary constraints, the upper bound (H) and the lower bound (L), so the population can be initialized by the following:

$$P_{i,j} = L_i + \mathrm{rand}(0,1) \cdot (H_i - L_i)$$

After the initialization phase, the evolution involves the three processes of mutation, crossover, and selection. The classic differential evolution strategy uses three random vectors $v_1$, $v_2$ and $v_3$ that are selected from the population (Equation (1)): three individuals are drawn at random with indices in $[1, \ldots, NP]$, where $v_1 \ne v_2 \ne v_3$. The mutation operation combines them to construct the mutation vector $v_y$ shown in Figure 1. The associated equation is

$$v_y = v_3 + F \cdot (v_1 - v_2) \qquad (1)$$

The mutation process is the main distinctive component of DE and is considered the strategy by which DE is carried out. There are different types of mutation strategies, each one distinguished with an abbreviation based on the classic mutation strategy described by Equation (1), i.e., DE/rand/1/bin, where DE represents differential evolution and "rand" represents random, which indicates that the vectors are selected randomly. The number one indicates the number of difference pairs; in this strategy, it is one pair $(v_1 - v_2)$. The last term represents the type of crossover used. This term could be "exp", for exponential, or "bin", for binomial [20].
Then, to complement the previous step (mutation strategy), DE also applies uniform crossover to construct trial vectors $v_{trial}$, which are built from parameter values that have been copied from two different vectors. In particular, DE selects a vector from the population, indicated as the target vector $v_x$, which must be different from $v_1$, $v_2$ and $v_3$, and then crosses it with the mutant vector $v_y$; the binomial crossover is generated as follows:

$$v_{trial,j} = \begin{cases} v_{y,j} & \text{if } \mathrm{rand}(0,1) \le Cr \\ v_{x,j} & \text{otherwise} \end{cases}$$

The crossover probability, $Cr \in (0, 1)$, is a pre-defined rate that specifies the fraction of the parameters that is transferred from the mutant. Thus, it is used to control which source contributes a given parameter. The crossover rate is compared with uniform random values drawn from rand(0, 1); if the random value is smaller than or equal to Cr, then the trial parameter is copied from the mutant vector $v_y$; otherwise, the parameter is inherited from $v_x$.
The next operation is selection, in which the trial vector $v_{trial}$ competes with the target vector $v_x$:

$$v_x^{g+1} = \begin{cases} v_{trial} & \text{if } f(v_{trial}) \le f(v_x^{g}) \\ v_x^{g} & \text{otherwise} \end{cases}$$

where $f(x)$ is the objective function. If the objective value of the trial vector, $f(v_{trial})$, is less than or equal to that of the target vector, $f(v_x)$, the trial vector replaces the target vector in the next generation. Otherwise, the population maintains the target vector. Therefore, the different DE phases prevent the population from ever deteriorating; the population either remains the same or improves. Furthermore, the trial vector continues to refine the population even when its fitness is the same as that of the current vector. This factor is crucial in DE because it gives the algorithm the ability to keep moving through the landscape over successive generations [21]. The termination condition can be either a preset maximum number of generations or a pre-specified target of the objective function value [22].
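As an illustration of one mutation, crossover and selection step, consider a two-dimensional worked example. All numbers below are hypothetical and chosen only for easy arithmetic: $F = 0.5$, $Cr = 0.9$, the sphere function $f(x) = x_1^2 + x_2^2$ as the objective, and the two uniform random draws 0.3 and 0.95 for the crossover.

$$v_1 = (1.0,\ 2.0), \quad v_2 = (0.5,\ 1.0), \quad v_3 = (0,\ 0), \quad v_x = (1,\ 1)$$

$$v_y = v_3 + F \cdot (v_1 - v_2) = (0,\ 0) + 0.5 \cdot (0.5,\ 1.0) = (0.25,\ 0.5)$$

$$0.3 \le Cr \Rightarrow \text{first parameter from } v_y, \qquad 0.95 > Cr \Rightarrow \text{second parameter from } v_x, \qquad \text{trial} = (0.25,\ 1)$$

$$f(\text{trial}) = 0.0625 + 1 = 1.0625 \le f(v_x) = 2$$

Because the trial vector has the lower objective value, it replaces $v_x$ in the next generation.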

Differential Evolution Strategies
The various equations underpinning DE possess certain aspects in common when applied to continuous optimization. All consist of an original point, sometimes referred to as the base point. The original algorithm carries out the search operation such that it finds the optimum as soon as possible. We can generalize the DE formula to the form α = β + F × δ, where β represents the base vector and δ the difference between vectors. Thus, the main goal of all DE equations is to provide the optimal direction based on the base vector β and the differential δ (Figure 2).
Establishing β and δ is crucial to creating an efficient strategy that can be applied to the chosen individuals from the population. However, all possible combinations of β and δ can be classified into the following strategies: local, random, directed, and hybrid. In random strategies, abbreviated as "RAND", all individuals are formed randomly, and there is no prior information about the objective function. In directed strategies, abbreviated as "DIR", a suitable value for the base vector is chosen according to the objective function to ensure a suitable direction. Hybrid strategies include the combination of "RAND" and "DIR", labeled RAND/DIR. In another approach, the best overall vector is used, not only the best among the selected individuals; this approach is referred to as "BEST". Combining "RAND" and "BEST" yields the hybrid RAND/BEST strategy. In addition, the combination of more than two approaches, e.g., RAND/BEST/DIR, can yield favorable results by exploiting the advantages of each approach.
Table 1 shows that all DE strategies employed are formed based on the DE/rand/k variation, which applies k pairs of difference vectors:

$$v = x_1 + \sum_{i=1}^{k} F_i \cdot (x_{2i} - x_{2i+1})$$

where the scaling factors are frequently presumed to be the same, $F_1 = F_2 = \ldots = F_k = F$. Substituting the arbitrary base vector $x_1$ with $v_{best}$, "the best vector" from the population, provides a different DE approach, indicated as DE/best/1:

$$v = v_{best} + F \cdot (x_2 - x_3)$$

Most mutation strategies can be formed by a general formula based on the sum of k scaled difference vectors and a weighted average, with weight $\lambda \in [0, 1]$, of the best vector and an arbitrary one:

$$v = \lambda\, v_{best} + (1 - \lambda)\, x_1 + F \cdot \sum_{i=1}^{k} (x_{2i} - x_{2i+1})$$

Table 1. The differentiation operation can be carried out using many mutation strategies.
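The family of mutation strategies described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the weight `lam` and the use of plain lists are assumptions made here, not part of the surveyed algorithms. Setting `lam = 0` gives a DE/rand/k mutant and `lam = 1` gives a DE/best/k mutant.

```python
import random

def rand_k_mutant(pop, k, F, rng):
    """DE/rand/k: random base vector plus k scaled difference pairs."""
    # 2k + 1 mutually distinct individuals: one base and k pairs.
    idx = rng.sample(range(len(pop)), 2 * k + 1)
    mutant = list(pop[idx[0]])
    for i in range(k):
        a, b = pop[idx[2 * i + 1]], pop[idx[2 * i + 2]]
        for j in range(len(mutant)):
            mutant[j] += F * (a[j] - b[j])
    return mutant

def general_mutant(pop, best, k, F, lam, rng):
    """Weighted average of best and a random base, plus k difference pairs.

    lam is a hypothetical name for the weight between the best vector
    and the arbitrary base vector.
    """
    idx = rng.sample(range(len(pop)), 2 * k + 1)
    base = pop[idx[0]]
    mutant = [lam * bj + (1.0 - lam) * xj for bj, xj in zip(best, base)]
    for i in range(k):
        a, b = pop[idx[2 * i + 1]], pop[idx[2 * i + 2]]
        for j in range(len(mutant)):
            mutant[j] += F * (a[j] - b[j])
    return mutant
```

Both functions require a population of at least 2k + 1 individuals so that all sampled vectors are distinct.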

Strategy Formulation
One aspect common to all the mutation strategy methods is the base vector, which controls the search direction. The difference vector provides a mutation rate term, such as a self-adaptive term, that is added to an arbitrary or guided base vector to construct a trial individual. Over generations, the individuals of a population reside in increasingly better positions and reform themselves. The various combinations of these vectors can be categorized into four groups based on information pertaining to the values gathered from the objective function: random, directed, local and hybrid.
The RAND approach consists of strategies in which the trial individual is produced without knowledge of the value of the objective function. Similarly, the RAND/DIR approach includes strategies that use the values of the objective function to determine a promising direction. Likewise, the RAND/BEST approach applies the best individual approach to proceed with a trial. Additionally, the RAND/BEST/DIR approach combines the last two groups into one that includes all of their collective benefits. However, a suitable direction is obtained by using the best individual to decrease the search space and exploration time [23,24]. Thus, the "dir" and "dir-best" strategies, which use objective function values to generate trial individuals, can produce an exploitation function. In fact, the random selection of parents for a trial enhances exploration capabilities [25][26][27]. Thus, the locations of individuals carry information about the fitness landscape. Therefore, an effective mutation strategy that leads to uniform random vectors represents the entire search space well.

Initialization
DE is a population-based optimization technique that begins the search for a solution from a random initial population. Predefined parameter bounds describe the region from which the $N_p$ vectors of this initial population are chosen, between the upper bound $b_U$ and the lower bound $b_L$, where the subscripts L and U indicate lower and upper, respectively. The following equation uses a random number generator to assign every parameter j of each vector i a value from within the predefined upper and lower bounds:

$$x_{j,i,0} = b_{j,L} + \mathrm{rand}_j(0,1) \cdot (b_{j,U} - b_{j,L})$$

The random function $\mathrm{rand}_j(0, 1)$ produces a uniform random number within the range (0, 1).
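The initialization step can be sketched as follows. The function name `init_population` and the list-of-lists representation are assumptions made for illustration; the per-parameter bounds are passed as two lists.

```python
import random

def init_population(n_pop, lower, upper, rng):
    """Draw n_pop vectors uniformly at random inside [lower, upper]."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_pop)
    ]

# Usage: five 2-D vectors with different bounds per parameter.
population = init_population(5, [-1.0, 0.0], [1.0, 10.0], random.Random(1))
```

Note that Python's `random()` draws from the half-open interval [0, 1), a negligible difference from the open range stated in the text.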

Crossover
To balance the differential mutation search strategy, DE also applies uniform crossover to construct trial vectors. A trial vector is constructed from values that have been copied from two different vectors. In particular, DE crosses each vector with its mutant as follows:

$$u_{j,i,g} = \begin{cases} v_{j,i,g} & \text{if } \mathrm{rand}_j(0,1) \le C_r \\ x_{j,i,g} & \text{otherwise} \end{cases}$$

The crossover probability, $C_r \in [0, 1]$, is predefined in the classic version of DE and controls the fraction of parameter values that is cloned from the mutant vector. $C_r$ is compared with a random number $\mathrm{rand}_j(0, 1)$. If the random number is less than or equal to $C_r$, the trial parameter is inherited from the mutant $v_{j,i,g}$; otherwise, the parameter is cloned from the vector $x_{j,i,g}$.
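A minimal sketch of the binomial crossover described above is given below; the function name is an assumption. Many practical implementations additionally force one randomly chosen parameter to come from the mutant so that the trial always differs from the target; that refinement is not described in the text and is omitted here.

```python
import random

def binomial_crossover(target, mutant, cr, rng):
    """Copy each parameter from the mutant with probability cr, else from the target."""
    return [m if rng.random() <= cr else t for t, m in zip(target, mutant)]

# Usage: with cr = 1.0 every parameter is taken from the mutant.
trial = binomial_crossover([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 1.0, random.Random(0))
```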

Selection
In this stage, we determine whether the trial vector $u_{i,g}$ has an objective function value that is less than or equal to that of its target vector $x_{i,g}$. If so, the trial vector replaces the target vector in the next iteration; otherwise, the target retains its place in the population. This process is carried out by comparing each trial vector with the target vector from which its parameters are cloned. After the population is updated, mutation, recombination and selection are repeated until the optimum value is found or a predefined stop criterion is reached, such as a certain number of iterations.
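The selection rule amounts to a single comparison, sketched below for a minimization problem. The names `select` and `sphere` are illustrative assumptions.

```python
def sphere(x):
    """Simple test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

def select(target, trial, f):
    """Keep the trial if it is at least as good as the target (minimization)."""
    return trial if f(trial) <= f(target) else target
```

Accepting ties (`<=` rather than `<`) lets the population keep moving when the trial and target have equal objective values.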

Differential Evolution Pseudo-Code and Flowchart
The flow chart of classic differential evolution is shown in Figure 3, and Algorithm 1 provides a pseudo-code of the DE algorithm for minimizing a cost function, specifically, the DE/rand/1/bin strategy.
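For readers who prefer runnable code, the following is a minimal Python sketch of a DE/rand/1/bin loop for minimization. It is not a transcription of Algorithm 1: the function name, default parameter values, fixed generation budget and seed are assumptions made here, and no bound handling is applied to mutants that leave the initial box.

```python
import random

def de_rand_1_bin(f, lower, upper, n_pop=20, F=0.5, cr=0.9,
                  generations=200, seed=0):
    """Minimize f with a basic DE/rand/1/bin loop (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initialization: uniform random vectors inside the bounds.
    pop = [[lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(dim)]
           for _ in range(n_pop)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_cost = [], []
        for i in range(n_pop):
            # Three distinct individuals, all different from the target i.
            r1, r2, r3 = rng.sample([r for r in range(n_pop) if r != i], 3)
            # Mutation: base vector plus one scaled difference pair.
            mutant = [pop[r3][j] + F * (pop[r1][j] - pop[r2][j])
                      for j in range(dim)]
            # Binomial crossover between target and mutant.
            trial = [mutant[j] if rng.random() <= cr else pop[i][j]
                     for j in range(dim)]
            # Selection: the trial survives if it is at least as good.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost
    best = min(range(n_pop), key=cost.__getitem__)
    return pop[best], cost[best]

# Usage: minimize the 3-D sphere function on [-5, 5]^3.
best_x, best_cost = de_rand_1_bin(lambda x: sum(v * v for v in x),
                                  [-5.0] * 3, [5.0] * 3)
```

The sketch builds each new generation from the previous one; some implementations instead replace individuals in place within a generation.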

DE Applications
Due to the rapid rise of DE as a simple and robust optimizer, developers have applied the technique in a wide range of domains and fields of technology (http://www1.icsi.berkeley.edu/~storn/code.html). Yalcin proposed a new method for the 3D tracking of license plates from video using a DE algorithm, which could be fine-tuned according to the license plate boundaries [28]. A color image quantization application using DE was proposed by Qinghua and Hu. The main objective of image processing techniques during the color image quantization phase is to decrease the number of colors in an image with a low amount of deformation. DE can be used to adjust colormaps and find the optimal candidate colormap [29]. With respect to the bidding market, Alvaro et al. applied DE in developing a competitive electricity market application that finds the optimal bids based on daily bidding activity [30]. Sickel et al. used DE in developing a power plant control application in which a reference governor produces an optimal group of points for controlling the power plant [31]. Wang et al. proposed a flexible QoS multicast routing algorithm for the next-generation Internet that improves the quality of service (QoS) of multicasts to manage the increasing demand for network resources [32]. With respect to the electric power systems industry, Ela et al. applied DE to determine the optimal power flow [33]. Goswami et al. proposed a DE application for model-based well log-data inversion to discover features of earth formations based on the dimensions of physical phenomena [34]. Another application concerns network reconfiguration for distribution systems: the network reconfiguration application proposed by Tzong and Lee involves the application of Improved Mixed-Integer Hybrid Differential Evolution [35]. Another DE application developed by Boughari et al. sets suitable controllers for aircraft stability and control augmentation systems [36].

Parameter Control
As suggested in the previous section, the DE algorithm is a simple and effective optimization algorithm for real-world problems when its control parameters are properly set [8,37,38]. In this section, we review the most current approaches for improving DE. First, the DE algorithm applies certain control parameters to the system implementation. The performance of DE is influenced by the values of parameters such as the crossover and mutation rates. Although some studies have recommended certain values for these parameters, their effect on performance is complex and their best values are unclear. In particular, there is a wide variety of recommended values that are appropriate for different problems [39][40][41].
The mutation rate "F", the crossover rate "$C_r$" and the population size "$N_p$" maintain the balance between exploration and exploitation [6]. Exploration is associated with finding new solutions in unvisited regions, and exploitation is associated with refining the search around solutions that are already known to be suitable; the two processes are linked in the evolutionary search [42,43]. Therefore, the mutation and crossover rates influence the convergence rate and the effectiveness of the search [44].

Deterministic Parameter Control
The parameters are altered using a deterministic rule regardless of the feedback from the evolutionary search, with jitter and dither being two operators that are used in this technique. Dither scales the length of the difference vector, as the same factor, $F_i$, is applied to all the elements of a subtracted vector. Jitter multiplies each element of the subtracted vector by a different scale factor, $F_j$. Jitter therefore changes the direction of the difference vector as well as its length, an essentially different procedure from classic DE's mutation with a constant F. However, this approach shows robustness for non-deceptive objective functions [3]. Nonetheless, fixed values were applied at each iteration: F was created for each individual within the range (0.4, 1), whereas the interval (0.5, 0.7) was selected for Cr [54,55].
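The difference between dither and jitter can be made concrete with a short sketch. The function names are assumptions, and the uniform sampling range (0.4, 1) is borrowed from the range quoted above for F purely for illustration.

```python
import random

def dither_difference(a, b, rng, f_low=0.4, f_high=1.0):
    """Dither: one random scale factor for the whole difference vector."""
    F = rng.uniform(f_low, f_high)
    return [F * (x - y) for x, y in zip(a, b)]

def jitter_difference(a, b, rng, f_low=0.4, f_high=1.0):
    """Jitter: an independent random scale factor for every element."""
    return [rng.uniform(f_low, f_high) * (x - y) for x, y in zip(a, b)]
```

With dither the scaled vector stays parallel to the original difference; with jitter each coordinate is stretched independently, so the direction generally changes.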
Another approach is the composite DE (CoDE) algorithm proposed by Wang et al. In CoDE, a trial vector is selected from a set of candidates produced by utilizing diverse DE strategies [56]. The main objective is to randomly combine several trial vector strategies with different parameter settings at each iteration to construct new trial vectors. These combinations help to successfully solve many problems. Wang et al. used a group of trial vector strategies and a group of three control parameter settings to create strategy and parameter candidate pools. The selected strategies are DE/rand/1/bin, DE/rand/2/bin and DE/current-to-rand/1, and the three common choices for the control parameter settings are (F = 1.0, Cr = 0.1), (F = 1.0, Cr = 0.9), and (F = 0.8, Cr = 0.2). In each generation, the three strategies are applied to each target vector, each with a control parameter setting picked at random from the pool. Then, the candidate with the best fitness value is designated the trial vector. The strategies and parameter settings were chosen because they are frequently implemented in many DEs and their performances have been evaluated. The three pairs of parameter settings, which provide diverse effects, produce new improved candidates. Furthermore, the different values of the control parameters maintain different levels of search performance.
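The trial-generation step of CoDE can be sketched as below. This is a simplified illustration rather than the authors' implementation: the function names are assumptions, both difference pairs of DE/rand/2 share the same F, the random weight in DE/current-to-rand/1 is a plain uniform draw, and the population must contain at least six individuals.

```python
import random

# (F, Cr) pairs quoted in the text.
PARAM_POOL = [(1.0, 0.1), (1.0, 0.9), (0.8, 0.2)]

def _binomial(target, mutant, cr, rng):
    return [m if rng.random() <= cr else t for t, m in zip(target, mutant)]

def code_trial(pop, i, f, rng):
    """Build three candidates for target i and return the fittest (minimization)."""
    others = [r for r in range(len(pop)) if r != i]
    x, dim = pop[i], len(pop[i])
    candidates = []

    # DE/rand/1/bin with a randomly picked parameter setting.
    F, cr = rng.choice(PARAM_POOL)
    r1, r2, r3 = rng.sample(others, 3)
    v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
    candidates.append(_binomial(x, v, cr, rng))

    # DE/rand/2/bin: two difference pairs.
    F, cr = rng.choice(PARAM_POOL)
    r1, r2, r3, r4, r5 = rng.sample(others, 5)
    v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
         + F * (pop[r4][j] - pop[r5][j]) for j in range(dim)]
    candidates.append(_binomial(x, v, cr, rng))

    # DE/current-to-rand/1: arithmetic move toward a random vector, no crossover.
    F, _ = rng.choice(PARAM_POOL)
    r1, r2, r3 = rng.sample(others, 3)
    k = rng.random()
    candidates.append([x[j] + k * (pop[r1][j] - x[j])
                       + F * (pop[r2][j] - pop[r3][j]) for j in range(dim)])

    return min(candidates, key=f)
```

The returned candidate would then compete with the target vector in the usual selection step.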

Adaptive Parameter Control
The adaptive technique has been applied with classic DE/rand/1/bin. While the performance is relatively favorable, the technique still suffers from convergence rate issues [50,51]. If well designed, adaptive and self-adaptive parameter controls can enhance the robustness and the convergence rate by automatically adapting the parameters. Rather than using only the best explored solution, some approaches use inferior solutions from previous generations and their differences from the present population to indicate a promising area for finding the optimum. One method of adapting the parameters is the adaptive DE algorithm (ADE), which adapts F from feedback but relies on an additional parameter (γ) that must itself be adjusted [57,58]. Self-adaptive parameter control, in contrast, handles the value assignments itself and adjusts them dynamically. A parameter is altered dynamically during processing according to pre-defined rules using adaptive control, self-adaptive control or a combination thereof [45,59].
The main purpose of adaptive DE is to help exploit and explore relationships that avoid premature convergence problems and to optimize the final results. In general, there are many techniques for hybridizing a conventional evolutionary algorithm to solve optimization problems. The initial population of DE is formed by problem-specific heuristics. Then, other solutions obtained using another EA might be enhanced with a local search. This type of combination is called a memetic algorithm [21,59]. The benefits of this hybridization lead to various operators that might exploit problem knowledge, such as merging more promising individuals to be inherited. Furthermore, mutation operations may be biased to contain solutions of promising individuals with higher probabilities than those of others.

Differential Evolution with Self-Adapting Populations (DESAP)
Differential evolution with self-adapting populations (DESAP) dynamically adjusts the crossover and mutation parameters δ and η and the population size π [17]. Each individual i is associated with its own control parameters $\delta_i$, $\eta_i$, and $\pi_i$. δ and π have meanings similar to Cr and NP, respectively. The mutation factor F is kept static, and η denotes the probability of implementing an extra mutation operation using a normal distribution. The main technique of DESAP is unlike that of the traditional DE/rand/1/bin algorithm [16]. The parameters are adapted by evolving them through the mutation and crossover processes, as the procedures are applied to each $x_i$. The updated values of the parameters are carried over with $u_i$ if $f(u_i) \le f(x_i)$. However, DESAP still requires further development to produce better performance. In fact, despite its simplicity, DESAP performed better than DE on one of De Jong's five test problems, whereas the other solutions are almost identical. DESAP represents an opportunity to reduce the number of control parameters further by updating the size of the population, as is done with the additional parameters.

Fuzzy Adaptive Differential Evolution (FADE)
Fuzzy adaptive differential evolution (FADE), presented by Lampinen and Liu [19], is a different type of DE algorithm that applies fuzzy logic controllers to adjust the control parameters $F_i$ and $Cr_i$ for the mutation and crossover operations. Similarly to DESAP, the size of the population is presumed to be tuned in advance and remains static during the evolution procedure [20]. The fuzzy-logic control method has been verified on a group of 10 benchmark functions and displays better solutions than those of classic DE for high-dimensional problems.

Self-Adaptive Differential Evolution (SaDE)
Self-adaptive differential evolution (SaDE) simultaneously applies a pair of mutation techniques, "DE/rand/1" and "DE/current-to-best/2" [52]. The parameter adaptation technique consists of two parts: the strategy probabilities $p_i$, where $i = 1, 2$, and the DE parameters $F_i$ and $Cr_i$. The probability of producing a mutation vector with each of the two strategies is initially 0.5 and is updated every 50 iterations using the following method:

$$p_1 = \frac{ns_1 \cdot (ns_2 + nf_2)}{ns_2 \cdot (ns_1 + nf_1) + ns_1 \cdot (ns_2 + nf_2)}, \qquad p_2 = 1 - p_1$$

where $ns_i$ and $nf_i$ are the numbers of offspring vectors constructed by the ith strategy ($i = 1, 2$) that succeeded or failed in the selection process over the last 50 generations. It is assumed that this adaptation process can progressively develop the most appropriate mutation strategy at diverse learning phases for a given problem. The mutation factors $F_i$ are independently created at each iteration based on a normal distribution, NR, with a mean of 0.5 and a standard deviation of 0.3:

$$F_i = NR(0.5,\ 0.3)$$

The crossover rates $Cr_i$ are independently formed based on a normal distribution with a mean of $Cr_m$ and a standard deviation of 0.1. The mean $Cr_m$ is initially 0.5, is changed every 25 iterations and is set to be the mean of the successful Cr values over the previous 25 generations:

$$Cr_i = NR(Cr_m,\ 0.1), \qquad Cr_m = \frac{1}{K} \sum_{k=1}^{K} Cr_{suc}(k)$$

where K is the number of successful Cr values and $Cr_{suc}(k)$ indicates the kth value.
To accelerate the convergence, a local search technique (Quasi-Newton method) is applied to good individuals after 200 generations. SaDE has been further developed by applying five mutation strategies to resolve a group of constrained problems [60].
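The bookkeeping behind SaDE's adaptation can be sketched as follows. The function names are assumptions, the fallback value of 0.5 when no counts are available is a choice made here, and no clipping of the sampled F and Cr values is shown.

```python
import random

def strategy_probability(ns1, nf1, ns2, nf2):
    """Probability of using strategy 1, from success/failure counts of both strategies."""
    num = ns1 * (ns2 + nf2)
    den = ns2 * (ns1 + nf1) + num
    return num / den if den > 0 else 0.5  # fallback is an assumption

def sample_F(rng):
    """Scale factor drawn from a normal distribution with mean 0.5, std 0.3."""
    return rng.gauss(0.5, 0.3)

def sample_Cr(cr_mean, rng):
    """Crossover rate drawn around the current mean with std 0.1."""
    return rng.gauss(cr_mean, 0.1)

def update_cr_mean(successful_cr, previous_mean):
    """New Cr mean: average of the Cr values that produced surviving trials."""
    if not successful_cr:
        return previous_mean
    return sum(successful_cr) / len(successful_cr)
```

For example, if strategy 1 succeeded 30 times out of 50 and strategy 2 succeeded 10 times out of 50, `strategy_probability(30, 20, 10, 40)` gives 0.75, which equals the success rate of strategy 1 divided by the sum of both success rates.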

Self-Adaptive NSDE (SaNSDE)
Neighborhood search differential evolution (NSDE) is similar to classic DE except that Equation (1) is replaced with

$$v_y = v_3 + \begin{cases} d_i \cdot N(0.5,\ 0.5) & \text{if } U(0,1) < 0.5 \\ d_i \cdot \delta & \text{otherwise} \end{cases}$$

where $d_i = v_1 - v_2$ is the differential deviation, $N(0.5, 0.5)$ denotes a Gaussian random number with a mean of 0.5 and a standard deviation of 0.5, and δ indicates a Cauchy random variable with a scale parameter of t = 1. Self-adaptive DE (SaDE) [8] was developed to resolve the issues related to control parameters and learning technique. In SaDE, two DE learning strategies are chosen according to their performance. The most appropriate learning technique and parameter values are increasingly self-adapted according to the learning experience gained during evolution [61].
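The NSDE mutation can be sketched as below. The function names are assumptions, and the sketch draws a single random scale per mutant vector; drawing a fresh value per component would be an equally plausible reading of the formula. The Cauchy variate is generated with the inverse-CDF method.

```python
import math
import random

def cauchy(rng, scale=1.0):
    """Standard Cauchy variate (location 0) via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))

def nsde_mutant(v1, v2, v3, rng):
    """Replace the fixed F of DE/rand/1 with a Gaussian or Cauchy random scale."""
    if rng.random() < 0.5:
        step = rng.gauss(0.5, 0.5)   # Gaussian with mean 0.5, std 0.5
    else:
        step = cauchy(rng, 1.0)      # Cauchy with scale t = 1
    return [c + (a - b) * step for a, b, c in zip(v1, v2, v3)]
```

The heavy tails of the Cauchy branch occasionally produce very long steps, which is the intended neighborhood-search effect.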
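SaNSDE is an adaptive differential evolution algorithm that produces mutation vectors in a manner similar to SaDE [61]. The difference is that the mutation factors are drawn from either a normal distribution or a Cauchy distribution:

$$F_i = \begin{cases} \text{NormalDistribution}(\mu, \sigma^2), & \text{if } rand(0, 1) < f_p \\ \text{CauchyDistribution}(\mu, \delta), & \text{otherwise} \end{cases}$$

where NormalDistribution$(\mu, \sigma^2)$ indicates a random value with mean $\mu$ and variance $\sigma^2$ and CauchyDistribution$(\mu, \delta)$ indicates a random value with location and scale parameters $\mu$ and $\delta$. The probability $f_p$ of choosing between the two distributions is adapted from the success and failure counts of each distribution, in the same way as the strategy probability in SaDE. The crossover rate adaptation is similar to the method used in SaDE, but the factor $Cr_m$ is updated every 25 generations as a weighted average of the successful values $Cr_{suc}$:

$$Cr_m = \sum_{k=1}^{K} w_k \, Cr_{suc}(k), \qquad w_k = \frac{\Delta_k}{\sum_{j=1}^{K} \Delta_j},$$

where the weight $w_k$ is calculated from the positive improvement $\Delta = f(x) - f(u)$ achieved in the selection step associated with each successful crossover rate $Cr_{suc}(k)$.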
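The improvement-weighted update of $Cr_m$ can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the concrete distribution parameters in `sample_f` (normal with mean 0.5 and standard deviation 0.3, Cauchy with scale 1) are borrowed from the SaDE and NSDE descriptions above rather than taken from the SaNSDE text itself.

```python
import math
import random


def sample_f(fp, rng):
    """Draw F from a normal distribution with probability fp, else Cauchy."""
    if rng.random() < fp:
        return rng.gauss(0.5, 0.3)                     # assumed parameters
    return math.tan(math.pi * (rng.random() - 0.5))   # Cauchy, scale 1


def weighted_cr_mean(successful_cr, improvements):
    """Average successful Cr values, weighted by fitness improvement."""
    total = sum(improvements)
    if total <= 0:
        raise ValueError("improvements must be positive")
    return sum(cr * d / total for cr, d in zip(successful_cr, improvements))


# A Cr value that produced a larger improvement pulls the mean toward itself.
cr_m = weighted_cr_mean([0.2, 0.8], [1.0, 3.0])
f_value = sample_f(0.5, random.Random(7))
```

Compared with the plain arithmetic mean used in SaDE, this weighting lets crossover rates associated with large fitness gains dominate the update.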

Self-Adapting Parameter Setting in Differential Evolution (jDE)
jDE is another adaptive DE algorithm built on the classic DE/rand/1/bin scheme. jDE keeps the population size fixed throughout the optimization process and adapts the control parameters $F_i$ and $CR_i$ associated with each individual, so that better parameter values generate vectors that are more likely to survive. At the beginning of the process, the parameter values are $F_i = 0.5$ and $CR_i = 0.9$ for each individual. In each generation, new values are produced with probabilities $\tau_1 = \tau_2 = 0.1$ by sampling uniform distributions over (0.1, 1) and (0, 1), respectively. That is,

$$F_{i,G+1} = \begin{cases} 0.1 + 0.9 \cdot rand_1, & \text{if } rand_2 < \tau_1 \\ F_{i,G}, & \text{otherwise} \end{cases} \qquad CR_{i,G+1} = \begin{cases} rand_3, & \text{if } rand_4 < \tau_2 \\ CR_{i,G}, & \text{otherwise} \end{cases}$$

where $rand_j$, $j = 1, 2, 3, 4$, are uniform random values in [0, 1]. The updated parameters are used in the mutation and crossover processes to produce the new trial vector. This mechanism replaces the previous parameter values with the new ones only if the new vector passes the selection phase. jDE yields improved results compared with the classic DE/rand/1/bin strategy.
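The per-individual parameter refresh in jDE is simple enough to show directly. The sketch below is illustrative only: `jde_update` is a hypothetical name, and $\tau_1 = \tau_2 = 0.1$ is taken here as an assumed setting.

```python
import random

TAU1 = 0.1  # assumed probability of regenerating F
TAU2 = 0.1  # assumed probability of regenerating CR


def jde_update(f, cr, rng):
    """Return candidate (F, CR) values for one individual."""
    r1, r2, r3, r4 = (rng.random() for _ in range(4))
    new_f = 0.1 + 0.9 * r1 if r2 < TAU1 else f   # uniform over [0.1, 1.0)
    new_cr = r3 if r4 < TAU2 else cr             # uniform over [0, 1)
    return new_f, new_cr


rng = random.Random(1)
f, cr = 0.5, 0.9                 # initial values for every individual
cand_f, cand_cr = jde_update(f, cr, rng)
# The candidate values are kept only if the trial vector built with them
# wins the selection step; otherwise the individual retains (f, cr).
```

Because the parameters travel with the individual and survive only together with a successful trial vector, good settings propagate without any user tuning.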

Adaptive DE Algorithm (ADE)
Hu and Yan proposed another adaptive DE algorithm, in which the parameters F and Cr are modified at each iteration according to the current generation and the fitness [62]. The aim is to find values of F and Cr that balance reliability and efficiency. The mutation and crossover operations are calculated for each generation. Thus, for each parent $x_i^g$ of generation $g$, the offspring $x_i^{g+1}$ is constructed after calculating the mutation factor F(G) and the crossover rate CR(G) for the Gth generation.

Modified DE with a Laplace distribution (MDE) uses only one array, which is updated whenever a better solution is found. Continuously updating this single array improves the convergence speed, leading to fewer function evaluations than those required by classical DE [63]. In MDE, the scale factor F is varied randomly according to a Laplace distribution [63]. The Laplace distribution is analogous to the normal distribution [64]. However, its tails are longer than those of the normal distribution, so larger perturbations occur more often, which helps to avoid premature convergence. Experimental results demonstrate that MDE offers enhanced performance compared with the classical DE approach [65].

Modified DE with P-Best Crossover (MDE_pBX)
MDE_pBX generates F values from a Cauchy distribution and Cr values from a Gaussian distribution, each with a location parameter that is adapted using the power mean of all F or Cr values that produced successful offspring [66]. The mutation strategy used in this algorithm (DE/current-to-gr_best/1) can be expressed as follows:

$$V_{i,G} = X_{i,G} + F_i \left( X_{gr\_best,G} - X_{i,G} + X_{r_1^i,G} - X_{r_2^i,G} \right),$$

where $X_{gr\_best,G}$ is the best of q% vectors randomly selected from the current generation, whereas $X_{r_1^i,G}$ and $X_{r_2^i,G}$ are two distinct vectors chosen randomly from the current population that are equal neither to $X_{gr\_best,G}$ nor to the target vector.
In the p-best crossover process, for each mutant vector, a vector is chosen at random from the p top-ranking vectors in the current population [67]. Then, a standard crossover is executed as per (5) between the mutant vector and the randomly chosen p-best vector to produce the trial vector with the same index. The variable p is decreased linearly over the generations:

$$p = \left\lceil \frac{Np}{2} \left( 1 - \frac{G - 1}{G_{max}} \right) \right\rceil,$$

where G is the current generation number, $G_{max}$ is the maximum number of generations and Np is the population size. The scale factor $F_i$ of each individual is generated independently as

$$F_i = \text{Cauchy}(F_m, 0.1),$$

where $F_m$ is initialized to 0.5 and updated as

$$F_m = w_F \cdot F_m + (1 - w_F) \cdot mean_{Pow}(F_{success}), \qquad w_F = 0.8 + 0.2 \cdot rand(0, 1),$$

$$mean_{Pow}(F_{success}) = \left( \sum_{x \in F_{success}} \frac{x^n}{|F_{success}|} \right)^{1/n},$$

where $n = 1.5$ and $|F_{success}|$ is the cardinality of the set $F_{success}$ of successful scale factors. The crossover probability $Cr_i$ is adapted in the same way, using the set $Cr_{success}$ of successful crossover probabilities.
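The two numerical ingredients of MDE_pBX, the power-mean update of the location parameters and the shrinking p-best group, can be sketched as below. The function names are hypothetical. The linear schedule used in `p_best_size`, $p = \lceil (Np/2)(1 - (G-1)/G_{max}) \rceil$, and the floor of 1 are assumptions of this sketch.

```python
import math
import random


def power_mean(values, n=1.5):
    """Power mean of order n: (sum(x**n) / len(values)) ** (1 / n)."""
    return (sum(x ** n for x in values) / len(values)) ** (1.0 / n)


def update_location(old_value, successes, rng, n=1.5):
    """Blend the old F_m (or Cr_m) with the power mean of successful values."""
    if not successes:
        return old_value
    w = 0.8 + 0.2 * rng.random()          # random weight in [0.8, 1.0)
    return w * old_value + (1.0 - w) * power_mean(successes, n)


def p_best_size(g, g_max, pop_size):
    """Size of the p-best group at generation g (assumed linear decrease)."""
    return max(1, math.ceil(pop_size / 2 * (1 - (g - 1) / g_max)))


rng = random.Random(3)
f_m = update_location(0.5, [0.4, 0.9, 0.7], rng)
sizes = [p_best_size(g, 100, 50) for g in (1, 50, 100)]
```

A power mean with n > 1 is never smaller than the arithmetic mean, so the update is biased toward larger successful values, while the shrinking p-best group shifts the crossover from exploration toward exploitation.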

DE with Self-Adaptive Mutation and Crossover (DESAMC)
DE with self-adaptive mutation and crossover (DESAMC) is a more recent version of DE [68,69]. In this approach, F is adapted using an affection index ($A_{fi}$), calculated from fitness information. A small $A_{fi}$ indicates that the individual is far from the global best vector (the best solution); consequently, strong global exploration is needed. F is adapted through a formula based on the hyperbolic tangent function tanh, and the crossover rate is adapted as a function of $t$ and $t_{max}$, where $t$ is the current generation, $t_{max}$ is the maximum number of generations and $CR^+$ and $CR^-$ are the maximum and minimum values of CR, respectively.

Adaptive Differential Evolution with Optional External Archive (JADE)
JADE adapts the control parameters at each generation on the basis of the success rate, as an alternative to progressive self-adaptation [29]. Qin and Suganthan [52] and Zhang and Sanderson [29] proposed the new mutation strategy DE/current-to-pbest/1. Furthermore, they used new adaptive parameters, $\mu_{CR}$ and $\mu_F$.
The crossover and selection operations are implemented as in the classic DE algorithm.
The baseline JADE uses a greedy mutation strategy called DE/current-to-pbest/1 (without an archive):

$$V_{i,g} = X_{i,g} + F_i \left( V_{best,g} - X_{i,g} \right) + F_i \left( V_{1,g} - V_{2,g} \right),$$

where $V_{best,g}$ is randomly chosen as one of the best individuals of the current population [56]. Similarly, $V_{1,g}$ and $V_{2,g}$ are randomly selected from the current population. In the variant with an archive, the second vector of the difference, denoted $V_{3,g}$, is instead randomly chosen from the union of the current population and the archive.
JADE also maintains an optional archive. Initially, the archive is empty; parent solutions that fail in the selection process are then added to it [70]. The archive is kept small to avoid significant computational overhead. The archive has a limited size; thus, if its size grows beyond R, a shrink operation removes solutions at random so that the size does not exceed the limit set by (α, NP).
The archive provides information about promising search directions and improves the diversity of the population. In addition, randomized F values can help expand population diversity [71].
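The interplay between the DE/current-to-pbest/1 mutation and the archive can be sketched as follows. This is a simplified illustration, not the reference implementation: the function names and argument layout are hypothetical, minimization is assumed, the p-best group is taken as the top `p` fraction of the population, and random removal is assumed for shrinking the archive.

```python
import random


def current_to_pbest_1(pop, fitness, i, f, p, archive, rng):
    """Build the mutant for target index i (minimization assumed)."""
    n = len(pop)
    n_best = max(1, int(round(p * n)))
    ranked = sorted(range(n), key=lambda k: fitness[k])
    x_pbest = pop[rng.choice(ranked[:n_best])]      # one of the best vectors
    r1 = rng.choice([k for k in range(n) if k != i])
    # Second difference vector: population (minus i and r1) plus the archive.
    pool = [pop[k] for k in range(n) if k not in (i, r1)] + list(archive)
    x_r2 = rng.choice(pool)
    x_i, x_r1 = pop[i], pop[r1]
    return [xi + f * (pb - xi) + f * (a - b)
            for xi, pb, a, b in zip(x_i, x_pbest, x_r1, x_r2)]


def add_to_archive(archive, parent, max_size, rng):
    """Store a defeated parent; drop random entries if the archive overflows."""
    archive.append(parent)
    while len(archive) > max_size:
        archive.pop(rng.randrange(len(archive)))


rng = random.Random(0)
pop = [[float(k), float(k)] for k in range(5)]
fitness = [sum(x * x for x in v) for v in pop]
archive = []
add_to_archive(archive, [9.0, 9.0], max_size=5, rng=rng)
mutant = current_to_pbest_1(pop, fitness, 2, 0.5, 0.2, archive, rng)
```

Passing an empty archive reproduces the variant without an archive, since the second difference vector is then drawn from the population only.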

Adaptation of µ CR and µ F
The adaptation technique used in JADE is applied to $\mu_{CR}$ and $\mu_F$ to produce the mutation factor $F_i$ and the crossover rate $CR_i$ associated with each individual vector $x_i$. At each generation, the crossover rate $CR_i$ of each individual $x_i$ is generated independently from a normal distribution with mean $\mu_{CR}$, which is initially 0.5, and standard deviation 0.1, i.e., $CR_i$ = Normal Distribution($\mu_{CR}$, 0.1).
Then, $S_{CR}$ is formed, which represents the set of all successful crossover rates $CR_i$. The parameter $\mu_{CR}$ is updated in each generation; in addition, randomly selected solutions are deleted from the archive to keep its size at R. $\mu_{CR}$ is calculated as follows:

$$\mu_{CR} = (1 - c) \cdot \mu_{CR} + c \cdot mean_A(S_{CR}),$$

where $c$ is a positive constant between 0 and 1 and $mean_A$ is the arithmetic mean. Similarly, the mutation factor $F_i$ is generated from a Cauchy distribution, $F_i$ = Cauchy Distribution($\mu_F$, 0.1), where $\mu_F$ is initialized as 0.5; $F_i$ is truncated to 1 if $F_i \geq 1$ and regenerated if $F_i \leq 0$.
Let $S_F$ indicate the set of all successful mutation factors $F_i$. Then, $\mu_F$ is updated as follows:

$$\mu_F = (1 - c) \cdot \mu_F + c \cdot mean_L(S_F),$$

where $mean_L$ indicates the Lehmer mean, which is calculated as follows:

$$mean_L(S_F) = \frac{\sum_{F \in S_F} F^2}{\sum_{F \in S_F} F}.$$

In the differential covariance matrix adaptation evolutionary algorithm (DCMA-EA) [72], the goal of the covariance matrix adaptation is to approximate the inverse Hessian matrix, analogously to a quasi-Newton technique. Furthermore, to increase the utility of the DCMA-EA, the greedy selection method of DE is applied to improve individuals in the next generation [73]. The algorithm uses a new differential perturbation structure, in which a new population vector is formed by adding to the mean $m$ the perturbation $\sigma \cdot B \cdot D \cdot randn(dim)^T$, where $randn(dim)^T$ is a vector of random numbers taken from a normal distribution with zero mean and a standard deviation of 1, with a number of elements equal to the dimension of the function at hand. The parameter $m$ and the evolution of $\sigma$ determine the overall standard deviation. By using a shared population, the new mutant vectors are produced from the target vectors by combining this perturbation with a DE-style difference of two vectors, where $x_{r_1^i,g}$ and $x_{r_2^i,g}$ are two vectors randomly selected from the population, $m$ is the mean of the current population, $B$ is an orthonormal basis of eigenvectors and $D$ contains the square roots of the corresponding non-negative eigenvalues. $P$ is a control value that balances the contributions of the mean vector of the current population and of the target vector. Of the two parameters F and P, the scale factor is computed as $F_i = 0.5 + 0.5 \cdot rand(0, 1)$.

Differential Evolution with Multiple Strategies
In this approach, four different mutation strategies and one crossover operator are used within a single algorithm framework, as proposed by Elsayed et al. [74]. The main objective is to adapt the mutation strategy by choosing one from a pool of allowable schemes. Although the algorithm involves mutation strategies with dissimilar features, the authors argue that no single strategy yields suitable performance throughout the search. Therefore, the strategy to be used depends on the progress of the evolution, which is measured by the success of the search operators.
The feasibility status and the fitness value are used to measure the improvement, including the reduction in infeasibility. If the population becomes increasingly feasible, the improvement index is calculated from $V^{best}_{i,t}$ and $avg \cdot V_{i,t}$, where $V^{best}_{i,t}$ is the best individual at generation $t$ and $avg \cdot V_{i,t}$ is the average of the violation.

Hybrid DE Algorithms
Hybridization is another way to improve convergence in optimization. Hybridized approaches balance global and local search techniques. Hybridization combines the advantages of two or more algorithms in a single algorithm that is expected to generate better offspring [75]. Each approach has its strengths and weaknesses; thus, combining different approaches can improve performance [76]. Hybridization can be implemented at four levels of interaction [76]. The first is the individual level, or search level, which defines the behavior of an individual in the population. The second is the population level, which concerns the dynamics of a population. The third is the external level, which provides interaction with other methods. The fourth is the meta level, in which a superior metaheuristic includes another algorithm as one of its strategies [77].
Each optimization technique has specific operators and procedures; for example, the DE algorithm consists of mutation, crossover and selection. In a hybridized technique, operators from two algorithms can cooperate to exploit the complementary characteristics of different optimization strategies [78]. Choosing a suitable, balanced combination of algorithms is the key to achieving enhanced performance. Nevertheless, developing an effective hybrid algorithm is not easy because it requires proficiency in different areas of optimization. There are many types of problems for which a classic or modified differential evolution algorithm might fail to find a suitable solution [79]. Therefore, DE hybridization approaches have recently become widespread due to their ability to handle many real-world problems. Some of the benefits of DE hybridization have been previously discussed [18]. To enhance the performance of DE, such as its convergence speed or solution quality, and to solve larger systems, DE must incorporate hybrid evolutionary methodologies [80]. In general, there are three types of hybridization for evolutionary algorithms in global optimization: hybridization with local search, hybridization with another global optimizer and hybridization involving both [81]. In this section, we highlight several hybrid differential evolution algorithms reported in the literature.

Hybridization of DE with Other Evolution Algorithms
DE has been frequently hybridized with PSO because both algorithms use simple difference operations to perturb the current population [82]. The difference between the current and the best individual is used both in the population update of PSO and in the DE/current-to-best/1 mutation strategy.
The particle swarm optimization (PSO) method was proposed by J. Kennedy and R.C. Eberhart [83,84]. The technique performs well compared with other evolutionary algorithms and metaheuristics. The approach is inspired by social behavior and has been applied to optimization problems. It operates on a group of individuals called a swarm of particles [85]. The same notation used for DE is used for PSO; a vector represents a candidate solution of the optimization task. At each iteration $t$, a particle changes its position under the influence of its velocity $v_i(t)$ via the equation $x_i(t) = x_i(t - 1) + v_i(t)$. Two models control the updating of the velocity $v_i(t)$.
In the gbest model the neighborhood is the whole population, whereas in the lbest model it is the subpopulation surrounding the particle. The gbest model is the practical choice for obtaining the best results. Let $p_g$ be the best position found by the population; thus, the social influence is mathematically expressed as $\rho_2 \cdot (p_g - x_i(t - 1))$. The velocity of each particle is therefore updated at each iteration as follows:

$$v_i(t) = v_i(t - 1) + \rho_1 \cdot (p_i - x_i(t - 1)) + \rho_2 \cdot (p_g - x_i(t - 1)),$$

where $\rho_1$ and $\rho_2$ are the control parameters and $p_i$ is the best position found so far by particle $i$. PSO has several disadvantages, the most significant of which is premature convergence. PSO consists of three components: the previous velocity $v_i(t - 1)$, the cognitive behavior $\rho_1 \cdot (p_i - x_i(t - 1))$ and the social behavior $\rho_2 \cdot (p_g - x_i(t - 1))$.
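The three components of the PSO update can be made concrete with a short sketch of one gbest step, combining the previous velocity with the terms $\rho_1 (p_i - x_i)$ and $\rho_2 (p_g - x_i)$ and then moving the particle. The function name is hypothetical, and treating $\rho_1$ and $\rho_2$ as plain numbers supplied by the caller (rather than as freshly drawn random coefficients) is a simplification assumed here.

```python
def pso_step(x, v, p_i, p_g, rho1, rho2):
    """Return (new_position, new_velocity) for one particle."""
    new_v = [
        vd + rho1 * (pid - xd) + rho2 * (pgd - xd)   # inertia + cognitive + social
        for xd, vd, pid, pgd in zip(x, v, p_i, p_g)
    ]
    new_x = [xd + vd for xd, vd in zip(x, new_v)]    # x(t) = x(t-1) + v(t)
    return new_x, new_v


# One particle in two dimensions.
x, v = [0.0, 0.0], [1.0, 1.0]
personal_best, global_best = [2.0, 2.0], [4.0, 4.0]
new_x, new_v = pso_step(x, v, personal_best, global_best, rho1=0.5, rho2=0.25)
```

With these numbers each velocity component becomes 1 + 0.5·2 + 0.25·4 = 3, so the particle moves from 0 to 3 in each dimension.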
Because PSO is built on these three components, it does not work well if any of them is defective; for example, a particle holding a poor solution will slow progress toward the optimal solution. DE, by contrast, does not carry the first two components of PSO. Individuals are constructed by a random walk in the search space, after which the better position is selected [86].
In PSO, the next position is determined by the particle's best position so far, $p_i$, and by the particle's velocity $v_i$. In addition, the third component of PSO can be interpreted in DE as the RAND/BEST strategy. PSO updates the velocity of a particle using three terms. Das et al. proposed PSO-DV (particle swarm with differentially perturbed velocity) [87]. In this scheme, particle velocities are perturbed by a new term containing the weighted difference of the position vectors of any two distinct particles randomly selected from the swarm. This differential velocity term mimics the DE mutation [88]; that is, PSO-DV applies the DE differential operator to update the velocity of PSO. Then, unlike in PSO, a particle is moved to a new position only if the new position produces a better fitness value. In PSO-DV, for each particle $i$ in the swarm, two other distinct particles $j$ and $k$ ($i \neq j \neq k$) are chosen randomly. The difference between their positions is calculated as a difference vector:

$$\delta = x_k - x_j.$$

In the resulting velocity update, CR is the crossover rate, $\delta_d$ is the $d$th component of the difference vector and $\beta$ is a scale factor in the range [0, 1].
Hendtlass proposed the first combination of DE and PSO, called SDEA, in which the individuals obey swarm principles [60]. DE is used to move the individuals to promising regions in a random fashion.
DEPSO [91] performs well on integer numerical problems but is not efficient for problems with a small feasible space. Mutations are provided by a DE operator on $p_i$, with a trial vector $T_i = p_i$; for the $d$th dimension:

$$\text{IF } (rand() < CR \text{ or } d = k) \text{ THEN } T_{id} = p_{gd} + \delta_{2,d},$$

where $k$ is a random integer within [1, D], which ensures that the mutation acts on at least one dimension. CR is a crossover constant, and $\delta_2$ is the case of N = 2 for the general difference vector $\delta_N$, which is defined as follows:

$$\delta_N = \frac{1}{N} \sum_{1}^{N} \Delta, \qquad \Delta = P_A - P_B,$$

where $\Delta$ is the difference vector and $P_A$ and $P_B$ are randomly chosen from the p-best set.
In the hybrid of DE and GA [94], GA operates on the binary variables, while DE optimizes the associated power-related variables [95]. The advantage of a GA lies in its ability to discover a decent solution to a problem when an exhaustive iterative approach is too time-consuming and an analytical solution is unavailable [96]. GA allows for the fast discovery of a solution. Although the genetic algorithm is not excessively complex, its parameters and implementation generally require a large amount of tuning [97].
The advantage of DE is that it frequently finds better solutions than those yielded by GA and other evolutionary algorithms [98][99][100]. Furthermore, DE is easy to apply to a wide variety of problems, including those with noisy, multi-modal and multi-dimensional search spaces, which typically make problems difficult to optimize. Although DE has two important parameters, Cr and F, those parameters do not require the same amount of tuning as those associated with other evolutionary algorithms [101]. Liao proposed a hybridization of DE and a local search algorithm modeled after the harmony search (HS) algorithm to find the global optimum [102]. The main goal of this type of hybridization is to handle problems with mixed discrete and real-valued dimensions.
Boussaïd et al. proposed a hybridization of DE and biogeography based optimization (BBO) to deliver solutions through the optimal power distribution method in a wireless sensor network (WSN) [103,104].
Guo et al. proposed an enhanced form of DE with self-adaptive parameters whose selection step is based on simulated annealing [106]. In classic DE, the selection technique is a greedy rule, which is easily trapped in a local optimum. This algorithm instead uses a new selection technique based on simulated annealing, in which $t_G$ represents the temperature at the Gth generation.
Pholdee and Bureerat offered a hybrid algorithm that combines the real-code population-based incremental learning (RPBIL) algorithm with the trial vector method of DE [107]. RPBIL can be extended to multi-objective optimization similarly to multi-objective PBIL using binary codes, in which a probability vector represents the population for single-objective problems [108]. When addressing multi-objective problems, more probability vectors are used to maintain population variety. As with the binary-code PBIL, the multi-objective version of RPBIL uses several probability matrices to represent a real-code population, where each probability matrix is called a tray [109].
A three-dimensional matrix represents a group of probability trays with dimensions $n \times n_l \times n_T$, where $n_T$ is the number of trays. Each tray is used to produce a real-code subpopulation, which has approximately $N_P / n_T$ solutions as its members.
An initial population is formed for the multi-start search procedure from the initial probability trays. An initial Pareto archive is obtained, and non-dominated solutions are then selected to update the probability trays. Then, the centroid $r_G$ of a set of non-dominated solutions is used to update a probability tray in the series, where the $r_G$ of the set that has the lowest value of the first objective function is applied to update the first tray, and so on.
The updating procedure for each tray is obtained by substituting $X_{best}$ with $r_G$. Subsequently, a population is generated from the updated trays. The Pareto archive is updated by replacing its members with the non-dominated solutions of the union of the current population and the members of the preceding archive. If the number of archive members is larger than the predefined archive size, a clustering method is used to remove some non-dominated solutions from the archive. These steps are repeated until a stopping condition is fulfilled [110].
Neri et al. [111,112] proposed a compact DE hybridized with a memetic search to yield faster convergence [113]. The algorithm represents the population as a multi-dimensional Gaussian distribution and is called disturbed exploitation compact differential evolution (DEcDE) [114]. The DEcDE algorithm uses an evolutionary framework based on DE logic, assisted by a shallow-depth local search algorithm [114].
The algorithm was designed as an MC model that achieves high efficiency on a diverse set of problems despite its limits in terms of complexity and memory usage. At the start of the DEcDE algorithm, a (2 × n) probability vector (PV) is produced:

$$PV = [\mu[i], \sigma[i]],$$

where $\mu$ and $\sigma$ are, respectively, the mean and standard deviation values for each design variable of a Gaussian probability distribution function (PDF) truncated within the interval [−1, 1].
Zhan and Zhang proposed a differential evolution (DE) algorithm with a random walk (DE-RW) [115]. DE-RW is analogous to the classic DE algorithm, with a minor alteration in the crossover procedure, which mixes the individual vector and the mutant vector and additionally performs a random walk when forming the trial vector. The probability of the random walk decreases with the ratio g/G, where g and G are the current generation number and the maximum number of generations, respectively. A few notable DE algorithms are summarized in Table 2.
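The modified crossover of DE-RW can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the rule assumed here is that each component takes the mutant value with probability CR (or in one forced dimension k), otherwise is resampled uniformly within the bounds $[L_d, H_d]$ with probability RW, and otherwise keeps the target value, with RW decreasing linearly as RW = 0.1 − 0.099·g/G.

```python
import random


def rw_probability(g, g_max):
    """Random-walk probability, decreasing from 0.1 to 0.001 over the run."""
    return 0.1 - 0.099 * g / g_max


def de_rw_crossover(x, v, cr, rw, low, high, rng):
    """Mix target x and mutant v, with occasional uniform random resets."""
    dim = len(x)
    k = rng.randrange(dim)          # dimension that always takes the mutant
    u = []
    for d in range(dim):
        if rng.random() <= cr or d == k:
            u.append(v[d])                             # mutant component
        elif rng.random() <= rw:
            u.append(rng.uniform(low[d], high[d]))     # random-walk component
        else:
            u.append(x[d])                             # target component
    return u


rng = random.Random(0)
x = [0.0] * 5
v = [1.0] * 5
rw = rw_probability(g=10, g_max=100)
trial = de_rw_crossover(x, v, 0.9, rw, [-5.0] * 5, [5.0] * 5, rng)
```

Early in the run the random walk injects more fresh values to preserve diversity; late in the run it is almost switched off so that the search can converge.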

Conclusions
This paper presents a survey of DE and analyzes the modifications recently proposed in the literature to enhance its performance. Various state-of-the-art differential evolution algorithms incorporating different strategies have been studied. DE performance is affected by the types of techniques applied. Enhanced DE algorithms are categorized into three groups: adaptive, self-adaptive and hybrid. As demonstrated in this article, seemingly minor modifications can greatly enhance the performance of DE. All the DE techniques and mechanisms described here modify parts of the classic DE components to yield better performance. The selection of proper techniques in DE depends on the mathematical characteristics of the problems addressed. Therefore, by properly adjusting the control parameters, similar levels of performance can be attained with different crossover schemes. This article offers a guide for differential evolution practitioners in choosing suitable control parameters for DE. Thus, if information about the problem to be addressed is available, suitable parameter values should be adopted based on the basic guidelines and findings summarized in this paper. Researchers interested in DE can find valuable contributions in the literature about DE.

Figure 1. Random vectors selected in the mutation strategy (classic differential evolution (DE)).

Figure 2. The differential β and base vector δ provide the optimal direction.

In the MDE_pBX power mean of the successful scale factors, $n = 1.5$ and $|F_{success}|$ is the cardinality of the set $F_{success}$. The crossover probability $Cr_i$ of each individual vector is generated independently as

$$Cr_i = \text{Gaussian}(Cr_m, 0.1), \qquad Cr_m = w_{Cr} \cdot Cr_m + (1 - w_{Cr}) \cdot mean_{Pow}(Cr_{success}), \qquad w_{Cr} = 0.8 + 0.2 \cdot rand(0, 1),$$

$$mean_{Pow}(Cr_{success}) = \left( \sum_{x \in Cr_{success}} \frac{x^n}{|Cr_{success}|} \right)^{1/n}.$$

In DE-RW, the components of the trial vector are formed as

$$u_{id} = \begin{cases} v_{id}, & \text{if } rand(0, 1) \leq CR \text{ or } d = k \\ rand(L_d, H_d), & \text{else if } rand(0, 1) \leq RW \\ x_{id}, & \text{otherwise} \end{cases}$$

where $L_d$ and $H_d$ are the low and high search bounds of the $d$th dimension and the parameter RW is used to control the effect of the random walk. The parameter RW is controlled as follows:

$$RW = 0.1 - 0.099 \times g / G.$$

Differential Covariance Matrix Adaptation Evolutionary Algorithm (CMA-ES): Saurav et al. proposed the differential covariance matrix adaptation evolutionary algorithm for real-parameter optimization.

Yu et al. proposed an adaptive hybrid algorithm based on PSO and DE (HPSO-DE) whose population is composed of both PSO and DE individuals [89]. The strategy incorporates the advantages of the two algorithms and maintains population diversity. Therefore, HPSO-DE has the ability to escape from local optima [90]. Zhang et al. offered DEPSO, which applies a similar principle of updating PSO individuals via DE. Liu et al. offered a hybridization of PSO and DE in a two-population scheme [92]. Three mutation strategies borrowed from DE (DE/rand/1, DE/current-to-best/1 and DE/rand/2) are applied to refresh the previous best solutions [93]. Trivedi et al. proposed a hybrid of DE and GA to solve scheduling problems.

Table 2. Summary of different DE algorithms with a variety of approaches.