Parameter Combination Framework for the Differential Evolution Algorithm

The differential evolution (DE) algorithm is a popular and efficient evolutionary algorithm that can be used for single objective real-parameter optimization. Its performance is greatly affected by its parameters. Generally, parameter control strategies involve determining the most suitable value for the current state; there has been little research on parameter combination and parameter distribution, which are also useful for improving algorithm performance. This paper proposes using parameter region division and parameter strategy combination to flexibly adjust the parameter distribution. Based on this idea, a group-based two-level parameter combination framework is designed to support various modes of parameter combination and to enrich the parameter distribution characteristics. Under this framework, two customized parameter combination strategies are given for a single-operation DE algorithm and a multi-operation DE algorithm. The experiments verify the effectiveness of the two strategies and illustrate the significance of the framework.


Introduction
Differential evolution (DE) is a simple and efficient evolutionary algorithm that is used primarily for real-parameter global optimization [1]. It has been successfully applied to many real-world problems [2][3][4][5]; however, the performance of DE algorithms greatly depends on parameter settings, especially in complex optimization problems. It is therefore important to study parameter control strategies for DE algorithms.
Parameters of population-based intelligent optimization algorithms can be divided into three categories: (1) operation-related parameters such as scale factor (F) and crossover probability (CR), which are used by search operations in classical DE; (2) population size (NP) and related parameters generated by strategies for dynamically adjusting the population; and (3) high-level strategy parameters: some algorithms use multi-operation, multi-strategy, or multi-population mechanisms, and may introduce new parameters. A number of parameter strategies have been proposed [6][7][8]. The first category of parameters has a large influence on the algorithm and has always been the focus of parameter control strategy research. In recent years, the second category of parameters has gradually attracted attention. The third category of parameters is related to specific high-level strategies and has no universal significance.
Different categories employ diverse methods and technologies. Within a category, parameter control strategies also vary and lead to distinct effects for specific problems. Current parameter control strategy research focuses mostly on searching for the most suitable parameter values for the current problem, process, stage, or individual. The methods employed by parameter control strategies may be deterministic rules or adaptive strategies. There are two problems involved in these methods: one is how to define the concept of suitable parameter values, and the other is how to set the parameter values. For example, some deterministic strategies hold that in the early stage of evolution, the search process should concentrate on exploration, and thus larger F and CR are suitable parameter values, while in the later stage, the algorithm should focus on exploitation, and thus smaller F and CR may be more suitable. Corresponding methods have been proposed in some studies, such as decreasing the parameter value with the generation according to linear or nonlinear functions [9,10]. Some adaptive strategies consider that the parameter values that make the individual evolution successful are suitable values, and corresponding parameter control strategies have been proposed, such as setting parameter values by comparing the success rates of different parameter values [11,12], by collecting statistics on the distribution of successful parameters [13][14][15][16][17][18][19][20], or by retaining successful parameter values [21,22].
There are also a few studies that concentrate on exploring the effects of parameter combination. Parameter combination can be the combination of the values of multiple parameters of an algorithm. Some studies discuss the effect of different value combinations of F and CR for DE, and some parameter control strategies try to reach the best combination of values for F and CR [22]. Some recent studies consider the three parameters F, CR, and NP simultaneously [12,23]. In [24], the authors suggest that the parameter value space can be decomposed into three regions with different convergence types; however, they also state that such a decomposition is difficult to obtain. The authors proposed a parameter control strategy to keep the parameter values in the good convergence region. In addition to the combination of multiple parameters, the combination of multiple values of one parameter within the population is a different idea. Parameter distribution is a complex form of this idea: when the parameter control strategy is an individual-level strategy, the parameter values of the entire population present some distribution characteristics. In [25], for each of the target vectors, the value of F is switched between 0.5 and 2 in a uniformly randomized way, and the CR value is also selected between 0 and 1. Thus, across the population, F and CR are each a combination of two values. The study of parameter combinations can also be extended to include the combination of parameter strategies, and the combination of parameter strategies with algorithm operations. In this paper, we are particularly interested in the parameter distribution of the population and the combination of parameter distributions with different algorithm operations.
Based on our experiments, we believe that the parameter combination (parameter distribution) has an impact on algorithm performance, and that designing a proper parameter combination mode (parameter distribution characteristic) can improve the performance of the algorithm. Therefore, this paper proposes the idea of parameter region division and parameter strategy combination to support adjusting the parameter distribution, while still supporting the traditional purpose of finding suitable parameter values. Based on this idea, we design a two-level parameter combination framework to flexibly support various modes of parameter combination. Two parameter combination strategies are then customized for DE under this framework. Our experiments verify the effectiveness of the two strategies and illustrate the significance of the framework.
The remainder of this paper is organized as follows. Section 2 introduces the classic DE algorithm and related works on parameter control strategies. Section 3 discusses the effect of parameter distribution on the DE algorithm. Section 4 introduces the proposed idea and the design of the two-level parameter combination framework. Section 5 customizes two parameter combination strategies, and Section 6 reports the experimental results and analysis. Finally, Section 7 presents the conclusions and proposals for future research.

Classical Differential Evolution
Differential evolution (DE) is a population-based heuristic random search algorithm. It has three steps in each generation of evolution: mutation, crossover, and selection [1].
The population can be described as $P_g = \{x_{1,g}, x_{2,g}, \ldots, x_{NP,g}\}$, where g is the current generation of the evolution process and NP is the population size. $x_{i,g} = \{x_{1,i,g}, x_{2,i,g}, \ldots, x_{D,i,g}\}$ is an individual vector, where D represents the dimension of the solution space. The population is initialized uniformly, after which DE enters the iterative evolutionary process until the termination condition is met.
(1) Mutation step: each individual in the population, acting as a target vector, generates a corresponding mutation vector through the mutation strategy. There are many variants of the mutation strategy. The naming convention is DE/x/y, where x is the selection method of the base vector and y is the number of difference vectors. The mutation strategy of classical DE is named DE/rand/1 and is as follows:
$$v_i = x_{r1,g} + F \cdot (x_{r2,g} - x_{r3,g})$$
where $x_{i,g}$ is the target individual; $x_{r1,g}$, $x_{r2,g}$, $x_{r3,g}$, $x_{r4,g}$, and $x_{r5,g}$ are random individuals selected from the population, which differ from one another and from $x_{i,g}$ ($x_{r4,g}$ and $x_{r5,g}$ are needed only by variants with two difference vectors); $x_{rb,g}$ is the best individual in the current generation (the base vector of variants such as DE/best/1); $v_i$ is the generated mutation vector; and F is the scale factor parameter.
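(2) Crossover step: the mutation vector $v_i$ and target vector $x_i$ exchange internal components to generate the candidate vector $u_i$. There are generally two methods for crossover operations: binomial crossover and exponential crossover. Binomial crossover is the commonly used method, and its equation is as follows:
$$u_{j,i} = \begin{cases} v_{j,i}, & \text{if } rand_j(0,1) \le CR \text{ or } j = j_{rand} \\ x_{j,i}, & \text{otherwise} \end{cases}$$
where CR is the crossover probability parameter, $u_i$ is the trial vector, j is one of the dimensions of the vector (j = 1, 2, . . ., D), $rand_j(0,1)$ is a uniform random number drawn for each dimension, and $j_{rand}$ is a randomly chosen dimension index that ensures at least one component of $u_i$ comes from $v_i$.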
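(3) Selection step: this step determines which vectors enter the next generation. The DE algorithm uses binary greedy selection between the target vector $x_i$ and the trial vector $u_i$. For a minimization problem with objective function f, the equation is as follows:
$$x_{i,g+1} = \begin{cases} u_i, & \text{if } f(u_i) \le f(x_{i,g}) \\ x_{i,g}, & \text{otherwise} \end{cases}$$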
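The three steps can be put together in a few lines. The following is a minimal Python sketch of DE/rand/1 with binomial crossover and greedy selection, assuming a minimization problem with box bounds. The function name `de_rand_1_bin`, the default values of `np_size`, `f`, and `cr`, the bound-clipping repair rule, and the `sphere` test function are illustrative assumptions and are not settings taken from this paper.

```python
import random


def de_rand_1_bin(objective, bounds, np_size=30, f=0.5, cr=0.9,
                  generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Uniform initialization inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_size)]
    fit = [objective(x) for x in pop]

    for _ in range(generations):
        next_pop, next_fit = list(pop), list(fit)
        for i in range(np_size):
            # Mutation: r1, r2, r3 are mutually distinct and differ from i.
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            v = [pop[r1][j] + f * (pop[r2][j] - pop[r3][j]) for j in range(dim)]
            # Binomial crossover: index j_rand always takes the mutant component.
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                 for j in range(dim)]
            # Assumed repair rule: clip the trial vector back into the bounds.
            u = [min(max(u[j], bounds[j][0]), bounds[j][1]) for j in range(dim)]
            # Greedy selection between the target and the trial vector.
            fu = objective(u)
            if fu <= fit[i]:
                next_pop[i], next_fit[i] = u, fu
        pop, fit = next_pop, next_fit

    best = min(range(np_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = de_rand_1_bin(sphere, [(-5.0, 5.0)] * 5)
```

Here F and CR are fixed for the whole run, which is the baseline that the parameter control strategies discussed in this paper replace with per-generation or per-individual values.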

Related Works on Parameter Control Strategy for DE
The first category of parameters is related to operations performed by the individual; parameter control strategies can thus be population-level strategies or individual-level strategies. With a population-level parameter control strategy, the entire population shares the same parameter values. With an individual-level parameter control strategy, each individual has its own parameter values; thus, individuals in the population have different behavioral characteristics.
Research on the first parameter category mainly concerns parameters F and CR. The methods adopted by parameter control strategies include the following: using a fixed value, using a random value, changing the value according to the evolutionary generation, changing the value according to individual fitness, changing the value according to the population model, test feedback based on statistics, and test feedback based on the evaluation of a single individual.
Parameter control studies of the DE algorithm began with the random mechanism. This is an individual-level mechanism, in which each individual is assigned a random parameter value. The values of the entire population may have distribution characteristics such as a uniform distribution [9], a normal distribution [26], or a Cauchy distribution. Several studies discretized the parameter domain into a value set and sampled randomly from it. The set may be for a single parameter [27] or for a combination of several parameters [28].
The strategy of changing parameter values according to the evolutionary generation is generally a population-level strategy. The parameter value is set by different functions (or rules) of the generation, such as linear functions [9] or nonlinear functions [10].
Parameter strategies that vary by individual fitness are individual-level strategies, where parameter values differ for each individual according to its fitness. The authors of [29] consider that the parameter values for individuals with good fitness should be smaller, and vice versa. In [30], the F value is set with the same idea as in [29]; however, the CR is adjusted according to the fitness of the donor vector, which is not a general strategy. In [31], the fitness of the donor vector is also used to adjust the parameter value.
Strategies for changing values according to the population model are primarily population-level strategies. The model is constructed according to the entire population and uses deterministic rules or an adaptive method to adjust parameter values for the population. The fuzzy adaptive differential evolution algorithm (FADE) [32] uses a fuzzy model to adapt parameters F and CR, and in [33], a two-level parameter adaptation strategy is proposed. The first level adjusts the population parameters according to the population model, and the second level adjusts the individual parameters according to the individual's fitness, based on the population parameter values.
Test feedback strategies based on statistics are primarily adaptive individual-level strategies. An adaptive DE algorithm named JADE [13] uses the results of the last generation to compute the mean of the parameters (F and CR) of all good individuals by the Lehmer mean, which is used to guide the distribution of parameter values for the next generation. The values of the individual parameters are assigned according to a normal distribution for CR and a Cauchy distribution for F, both based on the mean. Success History based DE (SHADE) [14] improved the JADE strategy by (1) proposing a weighted method for the mean formula, and (2) proposing a history list for storing the successful means of several recent generations. Many algorithms have used the JADE or SHADE parameter strategy directly or with improvements [15][16][17][18][19][20]. References [11,12] discretize the parameter domain and use the results of the previous generation or period to compute the success rate of each discrete parameter value. They then use the success rate to modify the selection probability of each discrete parameter value to guide the individual's parameter allocation.
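The JADE-style sampling and update described above can be sketched as follows. This is an illustrative Python sketch rather than the reference implementation. It follows the common description of JADE, in which the Lehmer mean is applied to the successful F values and the arithmetic mean to the successful CR values. The spread of 0.1 for both distributions, the learning rate `c = 0.1`, and all function names are assumptions made for the example.

```python
import math
import random


def lehmer_mean(values):
    """Lehmer mean sum(v^2) / sum(v); it weights larger values more heavily."""
    return sum(v * v for v in values) / sum(values)


def sample_cr(mu_cr, rng, sigma=0.1):
    """Draw CR from Normal(mu_cr, sigma) and clip it to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mu_cr, sigma)))


def sample_f(mu_f, rng, scale=0.1):
    """Draw F from Cauchy(mu_f, scale); resample if F <= 0, truncate at 1."""
    while True:
        # Inverse-CDF sampling of the Cauchy distribution.
        f = mu_f + scale * math.tan(math.pi * (rng.random() - 0.5))
        if f > 0.0:
            return min(f, 1.0)


def update_means(mu_f, mu_cr, good_f, good_cr, c=0.1):
    """Move the location parameters toward the successful values."""
    if good_f:
        mu_f = (1 - c) * mu_f + c * lehmer_mean(good_f)
    if good_cr:
        mu_cr = (1 - c) * mu_cr + c * sum(good_cr) / len(good_cr)
    return mu_f, mu_cr


rng = random.Random(1)
# One (F, CR) pair per individual for the coming generation.
params = [(sample_f(0.5, rng), sample_cr(0.5, rng)) for _ in range(100)]
# After the generation, feed back the pairs whose trial vectors survived.
new_mu_f, new_mu_cr = update_means(0.5, 0.5, [0.4, 0.8], [0.2, 0.6])
```

Because every individual draws its own pair around a shared center, the population's parameters form a cloud around one point, which is the distribution shape this family of strategies produces.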
Test feedback strategies based on individual evaluation are individual-level strategies. Each individual is evaluated, and the parameter value of the individual is adjusted independently. A typical strategy is proposed by jDE [21]. The ensemble of mutation strategies and parameters in DE (EPSDE) [22] presents a discretized version of this type of parameter strategy. The concept behind these methods is that good parameter values survive and propagate, while poor ones are discarded.
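As an illustration of this survive-and-propagate idea, the following Python sketch shows a jDE-style self-adaptation rule. The constants (F regenerated in [0.1, 1.0], regeneration probabilities of 0.1) are the values commonly reported for jDE and should be treated as assumptions here. The function names `jde_propose` and `jde_keep` are hypothetical.

```python
import random

F_LOWER, F_UPPER = 0.1, 0.9   # regenerated F lies in [0.1, 1.0)
TAU_F, TAU_CR = 0.1, 0.1      # regeneration probabilities for F and CR


def jde_propose(f_old, cr_old, rng):
    """Propose the (F, CR) pair used to build this individual's trial vector."""
    f_new = F_LOWER + rng.random() * F_UPPER if rng.random() < TAU_F else f_old
    cr_new = rng.random() if rng.random() < TAU_CR else cr_old
    return f_new, cr_new


def jde_keep(trial_fitness, target_fitness, proposed, inherited):
    """Keep the proposed pair only if the trial vector survives selection."""
    return proposed if trial_fitness <= target_fitness else inherited


rng = random.Random(0)
proposals = [jde_propose(0.5, 0.9, rng) for _ in range(1000)]
```

Most proposals simply repeat the inherited pair, so a newly drawn value spreads through the population only by surviving selection repeatedly.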
There are also some studies that use a metaheuristic optimization algorithm to optimize the parameters of DE, such as [34,35].
In addition to F and CR, several studies have introduced other parameters in the first category. In JADE, a new mutation operation is proposed [13], and a new parameter p is introduced. JADE uses a fixed value for p, but in jSO [20], p is changed according to the generation. Some studies utilize the neighborhood relationship to select parents; neighborhood size may thus be a new parameter [36][37][38]. Reference [38] uses four neighborhood topologies, where the related parameters are fixed.
The relationship between DE performance and the second category of parameters has not been deeply studied. Some studies relate NP to the problem dimension D, but the recommended values of NP vary widely, from 1D to 40D [39]. Some studies set NP as fixed and independent of D; 50 and 100 are commonly used in many DE variants [13,14,21]. Recently, several strategies with a variable population size have been proposed, and the concept of a gradual decrease in population size is adopted in some studies. dynNP-DE [40] reduces the population size by half for every 25% of function evaluations. L-SHADE provides a linear population size reduction strategy [18] that changes the NP every generation. Another approach, which changes the NP adaptively according to improvements to the best solution, is adopted in several studies that alternately increase or decrease the NP [41,42].
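The two population reduction schedules mentioned above can be sketched in Python as follows. The linear rule follows the usual description of L-SHADE, with a minimum population size of 4 taken as an assumed default. The halving rule is only one reading of "reduces the population size by half for every 25% of function evaluations" and assumes four stages. Both function names are hypothetical.

```python
def linear_np(nfe, max_nfe, np_init, np_min=4):
    """Population size after `nfe` of `max_nfe` evaluations (linear decrease)."""
    return round(np_init + (np_min - np_init) * nfe / max_nfe)


def halving_np(nfe, max_nfe, np_init, stages=4):
    """Population size that is halved at the start of each later stage."""
    stage = min(int(stages * nfe / max_nfe), stages - 1)
    return np_init // (2 ** stage)


schedule = [linear_np(nfe, 1000, 100) for nfe in (0, 500, 1000)]
staged = [halving_np(nfe, 1000, 80) for nfe in (0, 250, 500, 750, 1000)]
```

In both cases the individuals that are removed when the size shrinks are typically the worst ones, which is a further design choice not shown in this sketch.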
The third category of parameters does not share common parameters; different high-level strategies may use different parameters [15,17,22,26,27].

The Effect of Parameter Distribution on DE Algorithm
For the population-level parameter control strategy, all individuals share the same parameter value, and the parameter value may vary during the evolutionary process, making the parameter value more suitable for the current evolutionary state of the population. For the individual-level parameter control strategy, each individual has its own parameters, and thus the parameter values of the entire population present some distribution characteristics in the parameter space. However, current individual-level parameter strategies, such as JADE [13], SHADE [14], and jDE [21], still aim to find suitable parameter values; their focus is not on the parameter combination or distribution. Few studies focus specifically on the effects of parameter distribution. In this section, we study by experiment the influence of the parameter control strategy on the parameter distribution and the influence of the parameter distribution on algorithm performance.
We chose three individual-level parameter control strategies: the jDE parameter strategy [21], the SHADE parameter strategy [14], and a random parameter strategy. The test functions were taken from the CEC2014 benchmark test functions [43], with a maximum number of evaluations of 10,000 × D. The optimization effect depends on the operations, the operation-related parameters (F, CR), the population size (NP), and the test functions. For comparison, we varied the parameter strategy while the other factors remained the same.
The first test was for function f1 (30D), and the NP was set to 300. We adopted the DE/current-to-pbest/1 [13] mutation operation and the binomial crossover operation, and the other operations were the same as in classic DE. We observed the distribution of all individuals and the corresponding parameter values during the run and captured snapshots, as shown in Figure 1. In each subfigure, the upper graph displays the individual distribution, and the lower graph displays the corresponding parameter distribution. The individual distribution is plotted on the plane of the first and second dimensions of the individuals, while the parameter distribution is plotted on the plane of F and CR. The numbers at the top of each subfigure represent the generation and the current optimal value, and red stars represent successful parameters. From Figure 1, the distribution characteristics of the three strategies can be seen, as described below.

• The parameters of the random parameter strategy are evenly distributed regardless of the evolutionary state. Evidently, many of these parameter values are not beneficial to evolution, which reduces the efficiency of the strategy.
• The population parameter distribution of the jDE parameter strategy gradually coincides with the distribution of successful parameter values as evolution proceeds; however, the process is relatively slow.
• The parameter distribution of the SHADE parameter strategy spreads around a center point, and the distribution is shifted by the difference between the mean of the successful parameter values and the mean of all parameter values. The SHADE parameter strategy reinforces part of the distribution of successful parameter values.
The SHADE strategy is superior for f1, while the jDE strategy tends to converge because of the many small F values in its distribution, which hampers exploration. The random strategy yields the worst results because of its many ineffective parameter values. It is evident that parameter distribution affects algorithm performance.
The second test used the DE/rand/1 mutation strategy and test function f2 on 50D. The NP was set to 100. The other settings were the same as in the first test. A snapshot of the running state is shown in Figure 2, where it can be seen that the distribution characteristics of the three parameter strategies have not changed.
However, an entirely different result appears: the SHADE strategy that performed well in the first test fell into premature convergence. Premature convergence is identified as the state in which the individual distribution has converged in a wrong region for a long period. As shown in the individual distribution subgraph of Figure 2c, the first dimension has converged to 1e-5 precision, but the offset between the convergence location and the location of the optimal value is on the order of 1e1 and has not improved in many generations. In contrast, the jDE strategy, which was apt to converge in the first test, found the correct region and yielded the optimal result.
These two tests demonstrate that the operation is related to the parameter strategy. For the DE/rand/1 operation, the jDE strategy performs better than the SHADE strategy on test function f2, while for the DE/current-to-pbest/1 operation, the jDE strategy performs more poorly than SHADE on test function f1. The SHADE strategy, which is generally an excellent parameter strategy, is not ideal for all operations. Thus, different combination modes of operation and parameter strategy should produce different search characteristics and may be suitable for different problems.
Since the jDE and SHADE parameter strategies produce very different parameter distribution characteristics, we consider that the difference in algorithm performance in the tests is due to the distribution of the two parameters F and CR. We designed further experiments based on the second test to verify whether changing the parameter distribution would improve the performance of the algorithm. The algorithm is identical to that of the second test except for the parameter strategy, and the test function is still f2. For the parameter strategy, we designed several different parameter region combinations. For the sake of discussion, we divided the domain of each of F and CR into two subdomains: large values [0.5, 1] and small values [0, 0.5]. The plane of F and CR thus forms four regions, as shown in Figure 3. Experiments were performed on all types of region combination patterns, and the results are shown in Table 1. For each combination, the subregions are assigned the same number of individuals. Each region uses the SHADE parameter strategy. Premature convergence is identified as described in the second test.
From Table 1, we can conclude that different regions have different effects and that combination patterns can affect the distribution of parameters and strengthen or weaken the effect. For example, with region 4, premature convergence occurs, but when it is combined with region 1, the result is superior to those for either region individually. The proper distribution or combination of parameter values can thus be useful for algorithms.
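The region combination experiment can be illustrated with a short sketch. The Python code below, offered only as an illustration, splits the (F, CR) plane into the four regions described above and deals individuals to the chosen regions in equal numbers. The region labels are hypothetical, since the numbering used in Figure 3 is not reproduced here, and values are drawn uniformly inside each region as a simple stand-in for the SHADE strategy that the experiment uses within each region.

```python
import random

# Hypothetical labels; each entry is ((F_low, F_high), (CR_low, CR_high)).
REGIONS = {
    "smallF_smallCR": ((0.0, 0.5), (0.0, 0.5)),
    "smallF_largeCR": ((0.0, 0.5), (0.5, 1.0)),
    "largeF_smallCR": ((0.5, 1.0), (0.0, 0.5)),
    "largeF_largeCR": ((0.5, 1.0), (0.5, 1.0)),
}


def assign_parameters(np_size, chosen, rng):
    """Give each individual an (F, CR) pair from one of the chosen regions.

    Individuals are dealt to the regions round-robin, so the region sizes
    differ by at most one.
    """
    params = []
    for i in range(np_size):
        name = chosen[i % len(chosen)]
        (f_lo, f_hi), (cr_lo, cr_hi) = REGIONS[name]
        # Assumed stand-in: uniform sampling inside the region.
        params.append((name, rng.uniform(f_lo, f_hi), rng.uniform(cr_lo, cr_hi)))
    return params


rng = random.Random(0)
combo = assign_parameters(100, ["smallF_smallCR", "largeF_largeCR"], rng)
```

Choosing two diagonal regions, as in this usage example, yields a bimodal distribution that a single center-based strategy cannot produce on its own.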
From the experiments, we draw several conclusions:
(1) different parameter control strategies may produce different parameter distributions;
(2) the parameter distribution influences the performance of the algorithm;
(3) the parameter strategy is related to the operation adopted by DE variants;
(4) adjusting the parameter distribution is a meaningful way to improve the algorithm.
The distribution of parameters cannot, however, be flexibly controlled by a simple formula or a single strategy. For example, the jDE parameter strategy treats all successful parameter values equally, and the SHADE strategy limits the combination of parameter values in that it cannot support maximal and minimal values simultaneously. Thus, we have designed a flexible parameter combination framework that can support multiple forms of parameter combination and customize the parameter distribution.

Two-Level Parameter Combination Framework
We propose the idea of parameter region division and parameter strategy combination. The method divides the domain of the related parameters into several subdomains, so that the feasible parameter space forms multiple regions. The parameter distribution can be customized by choosing one or more regions. These regions can overlap or be disconnected, and some regions can be removed from the feasible parameter space. In each region, a parameter strategy can be chosen, which produces the specific parameter distribution within the region. The parameter strategy can be an adaptive strategy, so the merit of traditional parameter strategies, namely finding suitable parameter values within the region, is retained. The combination of regions determines the scope of the distribution, while the combination of parameter strategies determines its details. When a region degenerates into a point, the method reduces to a simple parameter value combination method.
We designed a group-based two-level parameter combination scheme to support the idea. Each group corresponds to a parameter strategy, which includes a region and a parameter control method. The scheme has a two-level architecture, as described in Figure 4. Level-2 adjusts the parameter distribution with the basic parameter strategy in the specific region, and level-1 combines or adjusts the basic parameter strategies or regions by population grouping. The number of individuals in each group represents the ratio of each region. Therefore, adjusting the group size changes the parameter density in each region. The scheme is implemented as a general framework. In the traditional algorithm structure, parameter strategy and evolutionary operation are coupled together, so changing and combining parameter strategies is very difficult. In contrast, a framework structure can support parameter value combination, parameter region combination, parameter strategy

Search Object with Specific Parameter Region and Strategy
The concept of a search object is defined to represent an algorithm that is based on a specific parameter strategy design and performed by a group of individuals. A search object includes information and a search action.
The information of the search object, STRinfo, is a five-tuple: STRinfo = (opstrategy, pastrategies, group, environment, model), where opstrategy represents the evolutionary operation and pastrategies is a set of parameter strategies (pastrategy) for the different parameters of the operation. Each pastrategy is a five-tuple: pastrategy = (pa, type, num, domain, painfo). Here, pa is the parameters that the strategy deals with, type is the parameter strategy type, num is the strategy ID, domain is the parameter scope, and painfo is the information for the parameter strategy model. group is the group of individuals that performs the evolutionary operation: group = (size, indiset, character). Here, size is the group size, indiset is the individual set, and character is the characteristic of the set that guides the grouping. environment is the environment of the searching group, which generally represents the neighborhood scope, and model is the evaluation information of the current strategy.
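One possible way to represent these tuples in code is with plain data classes. The Python sketch below mirrors the field names given above; the field types, the default values, and the example instance are assumptions made for illustration and are not prescribed by the framework.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class PaStrategy:
    pa: Tuple[str, ...]                      # parameters handled, e.g. ("F", "CR")
    type: str                                # strategy type, e.g. "SHADE" or "jDE"
    num: int                                 # strategy ID
    domain: Dict[str, Tuple[float, float]]   # parameter scope (region) per parameter
    painfo: Dict[str, Any] = field(default_factory=dict)  # strategy model state


@dataclass
class Group:
    size: int                                          # group size
    indiset: List[int] = field(default_factory=list)   # indices of the individuals
    character: str = ""                                # characteristic guiding grouping


@dataclass
class STRinfo:
    opstrategy: str                          # evolutionary operation
    pastrategies: List[PaStrategy]           # one strategy per parameter set
    group: Group                             # individuals performing the operation
    environment: Any = None                  # e.g. neighborhood scope
    model: Tuple[float, float] = (0.0, 0.0)  # evaluation information


shade_large = PaStrategy(pa=("F", "CR"), type="SHADE", num=1,
                         domain={"F": (0.5, 1.0), "CR": (0.5, 1.0)})
so = STRinfo(opstrategy="DE/rand/1/bin", pastrategies=[shade_large],
             group=Group(size=3, indiset=[0, 1, 2]))
```

Keeping the region (`domain`) inside the parameter strategy rather than inside the operation is what allows the same operation to be paired with different regions in different groups.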
The parameter strategies are designed as algorithm components, such as the jDE strategy, the SHADE strategy, the random strategy, and the L-SHADE strategy for the NP parameter. Strategies of the same type use a uniform interface so that they can be combined easily with the algorithm operation. An operation-related parameter strategy is divided into two phases: phase 1 collects information and establishes the model, whereas phase 2 assigns parameter values according to the model.
The action of the search object is described in Algorithm 1. After the offspring have been evaluated, each parameter component is executed in phase 1 to collect information and construct the parameter strategy model (steps 7-9), and the algorithm evaluation information is then collected to generate STRinfo.model (step 10).
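The control flow of a search object can be sketched as follows. This is an illustrative Python outline under stated assumptions: `RandomStrategy`, `run_search_object`, and the `shrink` operation are hypothetical names, the toy strategy stores only the successful values as its "model", and `shrink` is a placeholder rather than a DE operator.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


class RandomStrategy:
    """Toy parameter component with the two-phase interface (hypothetical)."""

    def __init__(self, low: float, high: float, rng: random.Random) -> None:
        self.low, self.high, self.rng = low, high, rng
        self.successful: List[float] = []  # stand-in for the strategy model

    def assign(self, n: int) -> List[float]:
        """Phase 2: assign one parameter value to each individual."""
        return [self.rng.uniform(self.low, self.high) for _ in range(n)]

    def collect(self, values: Sequence[float], success: Sequence[bool]) -> None:
        """Phase 1: record the values that produced successful offspring."""
        self.successful = [v for v, ok in zip(values, success) if ok]


def run_search_object(
    parents: List[Vector],
    parent_fitness: List[float],
    strategies: Dict[str, RandomStrategy],
    operation: Callable[[List[Vector], Dict[str, List[float]]], List[Vector]],
    objective: Callable[[Vector], float],
) -> Tuple[List[Vector], List[float], float]:
    """One execution of a search object on its group (minimization)."""
    n = len(parents)
    # Phase 2 of every parameter component: generate the parameter values.
    params = {name: s.assign(n) for name, s in strategies.items()}
    # Evolutionary operation, then evaluation of the offspring.
    offspring = operation(parents, params)
    fitness = [objective(x) for x in offspring]
    success = [fo < fp for fo, fp in zip(fitness, parent_fitness)]
    # Phase 1 of every parameter component: update the strategy models.
    for name, s in strategies.items():
        s.collect(params[name], success)
    # Evaluation information; only the success rate is returned here.
    return offspring, fitness, sum(success) / n


def sphere(x: Vector) -> float:
    return sum(v * v for v in x)


def shrink(parents: List[Vector], params: Dict[str, List[float]]) -> List[Vector]:
    """Placeholder operation (not a DE operator): scale each parent by 1 - F."""
    return [[(1.0 - f) * v for v in x] for x, f in zip(parents, params["F"])]


rng = random.Random(1)
parents = [[rng.uniform(-100, 100) for _ in range(3)] for _ in range(4)]
strategies = {"F": RandomStrategy(0.1, 0.9, rng)}
offspring, fitness, rate = run_search_object(
    parents, [sphere(x) for x in parents], strategies, shrink, sphere
)
```

Because the operation receives the parameter values as plain data, the same `run_search_object` function works for any component that exposes the two phases.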

Grouping for Search Object
During the search process, multiple search objects exist with different parameter strategy settings (STRinfo.pastrategies). Each search object represents an algorithm with a specific parameter strategy and generates a specific parameter distribution. The parameter distribution of the entire population is the combination of those of the search objects. Grouping is therefore the first-level distribution adjustment mechanism: by changing the group sizes, it changes the ratio and density of the parameter distribution contributed by each search object. We design two grouping methods in this section to adjust the group size: a competition method and a collaboration method. Let m be the number of search objects. The population is dynamically divided into a group set G at runtime, where group_i ∈ G, and NP_i denotes the size of group_i. Clearly, NP_1 + NP_2 + ... + NP_m = NP. Each group is assigned to a search object, and all search objects share information about the entire population. We first define the evaluation model for a search object and then the methods for computing NP_i.

Evaluation Model for Search Object
We define the vector STRinfo.model to describe the data collected in every generation: STRinfo.model = (s_1, s_2), where s_1 is the success rate of the search object and s_2 is its fitness improvement state. In every generation, for each search object so_i, we compute the success rate (SR_i) and the fitness improvement mean (FIM_i), where SI_i is the set of all successful individuals in STRinfo_i.group. We then convert SR_i and FIM_i into ratios over all search objects, PSR_i and PFIM_i. For each search object, we then update v1 and v2 with the coefficient c set to 0.5. Finally, we calculate the evaluation model soState_i for each search object so_i using w, which increases linearly from 0 to 0.5 with the generation number.
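To make the quantities above concrete, here is one possible reading in Python. Every formula in it is an assumption, since the text only names the quantities: SR_i is taken as the fraction of successful individuals in the group, FIM_i as the mean fitness improvement over SI_i, v1 and v2 as exponentially smoothed versions of PSR_i and PFIM_i with coefficient c, and soState_i as the blend (1 - w) * v1 + w * v2. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize(values: Sequence[float]) -> List[float]:
    """Convert raw scores into ratios over all search objects."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def update_states(
    improvements: Sequence[Sequence[float]],  # per search object: gains of SI_i
    group_sizes: Sequence[int],
    v1: Sequence[float],
    v2: Sequence[float],
    generation: int,
    max_generation: int,
    c: float = 0.5,
) -> Tuple[List[float], List[float], List[float]]:
    # Assumed SR_i: successful individuals divided by the group size.
    sr = [len(imp) / size for imp, size in zip(improvements, group_sizes)]
    # Assumed FIM_i: mean fitness improvement of the successful individuals.
    fim = [sum(imp) / len(imp) if imp else 0.0 for imp in improvements]
    psr, pfim = normalize(sr), normalize(fim)
    # Assumed update: exponential smoothing with coefficient c.
    new_v1 = [(1 - c) * old + c * new for old, new in zip(v1, psr)]
    new_v2 = [(1 - c) * old + c * new for old, new in zip(v2, pfim)]
    # w grows linearly from 0 to 0.5 over the run.
    w = 0.5 * generation / max_generation
    so_state = [(1 - w) * a + w * b for a, b in zip(new_v1, new_v2)]
    return new_v1, new_v2, so_state


v1, v2, so_state = update_states(
    improvements=[[1.0, 3.0], [2.0]],
    group_sizes=[4, 4],
    v1=[0.5, 0.5],
    v2=[0.5, 0.5],
    generation=0,
    max_generation=100,
)
```

Under this reading, early generations rank search objects mainly by success rate, and the fitness improvement gains weight as w grows.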

Collaboration-Based Grouping Method
In collaboration mode, each search object has its own task and a basic proportion (P_base,i) of the population for its group. That is, search object so_i is guaranteed a basic number of individuals, NP*P_base,i. If the sum of all P_base,i (i = 1, ..., m) is less than 1, the remaining individuals are allocated to the search objects according to their model scores, and NP_i is computed accordingly.
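A minimal sketch of this allocation rule follows. It assumes that the unreserved share of the population is split in proportion to soState_i and that fractional sizes are resolved by the largest-remainder method; neither detail is specified in the text, and `collaboration_sizes` is a hypothetical name.

```python
from typing import List, Sequence


def collaboration_sizes(
    np_total: int, p_base: Sequence[float], so_state: Sequence[float]
) -> List[int]:
    """Group sizes: guaranteed base share plus a score-proportional bonus."""
    free = 1.0 - sum(p_base)              # share not reserved by P_base
    total_state = sum(so_state)
    if total_state > 0:
        share = [s / total_state for s in so_state]
    else:
        share = [1.0 / len(so_state)] * len(so_state)
    raw = [np_total * (pb + free * sh) for pb, sh in zip(p_base, share)]
    sizes = [int(r) for r in raw]         # floor of each fractional size
    # Assumed rounding rule: hand the leftover individuals to the groups
    # with the largest fractional parts so that the sizes sum to NP.
    leftover = np_total - sum(sizes)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Example with NP = 150 and the stage-1 base proportions used for PVCDE;
# the so_state scores are made-up illustration values.
sizes = collaboration_sizes(150, [0.7, 0.1, 0.1], [0.5, 0.3, 0.2])
```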

Competition-Based Grouping Method
In competitive mode, better search objects are encouraged, and poorer search objects transfer their resources to the better ones according to their scores from the evaluation model. The soState values of all search objects are sorted and the set of best search objects is found, as shown in (14). Each of the remaining search objects reduces its group size by ∆_i, and ∆_i is added to the best ones, as shown in (15) and (16) [44]. ∆_i is computed from the difference between the best search object and the i-th search object, as shown in (17) [44]. Here, rate is the reduction ratio, which is set to 0.05. The group size has a lower limit (P_min): NP_i is no less than NP*P_min. We set P_min to 0.05 for large NP and 0.1 for small NP.
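The transfer of individuals from weaker to stronger search objects can be illustrated as follows. The exact form of ∆_i is not reproduced here, so the sketch assumes ∆_i = floor(rate * NP * (soState_best - soState_i) / soState_best), clipped so that no group falls below NP*P_min, and it distributes the released individuals evenly over the best search objects. `competition_sizes` is a hypothetical name.

```python
import math
from typing import List, Sequence


def competition_sizes(
    sizes: Sequence[int],
    so_state: Sequence[float],
    rate: float = 0.05,
    p_min: float = 0.05,
) -> List[int]:
    """Move individuals from weaker search objects to the best ones."""
    np_total = sum(sizes)
    floor_size = math.ceil(np_total * p_min)      # lower limit NP * P_min
    best_state = max(so_state)
    best = [i for i, s in enumerate(so_state) if s == best_state]
    new_sizes = list(sizes)
    pool = 0
    if best_state > 0:
        for i, s in enumerate(so_state):
            if i in best:
                continue
            # Assumed form of delta_i: proportional to the relative gap.
            delta = int(rate * np_total * (best_state - s) / best_state)
            delta = max(0, min(delta, new_sizes[i] - floor_size))
            new_sizes[i] -= delta
            pool += delta
    # Hand the released individuals to the best search objects in turn.
    for k in range(pool):
        new_sizes[best[k % len(best)]] += 1
    return new_sizes


before = [50, 30, 20]
after = competition_sizes(before, so_state=[0.2, 0.5, 0.3])
```

Because the population size is conserved, applying the rule every regrouping interval gradually concentrates the population on the search object with the highest score while the lower limit keeps every strategy alive.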

The Algorithm of the Framework
The framework allows different parameter combination modes to be customized. Following the design principles of evolutionary algorithms, an algorithm is designed by defining several search objects. The initial group size ratios (Igsr) for all search objects, the grouping strategy, and the regrouping interval (GroInt) must be decided. Furthermore, the design of each search object must include the evolutionary operation, all the algorithm parameters, and the parameter strategy and parameter domain for each parameter. The algorithm of the framework is described in Algorithm 2; in the first generation, the population is grouped according to the initial group size ratios (Igsr).
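The initial grouping and the regrouping schedule can be sketched as below. This is an assumption-laden illustration: random assignment of individuals is only one possible grouping strategy, the rounding drift is absorbed by the first group, and `initial_groups` and `needs_regrouping` are hypothetical names.

```python
import random
from typing import List, Sequence


def initial_groups(
    np_total: int, igsr: Sequence[float], rng: random.Random
) -> List[List[int]]:
    """Randomly split population indices according to the Igsr ratios."""
    # Round every group except the first; the first absorbs rounding drift.
    sizes = [round(np_total * r) for r in igsr[1:]]
    sizes.insert(0, np_total - sum(sizes))
    indices = list(range(np_total))
    rng.shuffle(indices)
    groups, start = [], 0
    for size in sizes:
        groups.append(indices[start:start + size])
        start += size
    return groups


def needs_regrouping(generation: int, gro_int: int) -> bool:
    """True every gro_int generations (generation counted from 1)."""
    return generation % gro_int == 0


groups = initial_groups(150, [8 / 10, 1 / 10, 1 / 10], random.Random(0))
```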

Customizing Two Parameter Combination Strategies
In this section, we customize two specific parameter combination strategies by defining their search objects (STRinfo).

Combine Parameter Regions or Values for Single-Operation DE
This strategy, which we name PVCDE (Parameter Values Combination for DE), is designed for single-operation DE, in which multiple search objects use the same evolutionary operation but different parameter values (or regions). For the DE algorithm, we chose the mutation operation DE/current-to-pbest/1 proposed in the JADE algorithm. In this operation, x_pbest is an individual selected randomly from the top NP × p (p ∈ [0, 1]) individuals of the sorted current generation, x_r2 is a random individual in the population, and x_r1 is selected from the union of the population and the archive, which contains the individuals that failed in the selection step. The crossover strategy is binomial crossover, and the selection strategy is the same as in classic DE. The parameters are NP, F, CR, p, and archive-size. NP and archive-size are set to fixed values, whereas F, CR, and p are controlled by the region combination method (for F and CR) and the value combination method (for p).
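For reference, a sketch of the mutation in Python is given below. It follows the form published for JADE, v_i = x_i + F (x_pbest - x_i) + F (x_a - x_b), where x_a is drawn from the population and x_b from the union of the population and the archive; the variable names `x_pop` and `x_union` are used instead of r1 and r2 to avoid committing to an index convention, and `current_to_pbest_1` is a hypothetical function name. Minimization is assumed.

```python
import random
from typing import List, Sequence

Vector = List[float]


def current_to_pbest_1(
    pop: Sequence[Vector],
    fitness: Sequence[float],
    archive: Sequence[Vector],
    i: int,
    F: float,
    p: float,
    rng: random.Random,
) -> Vector:
    """Mutant vector for individual i (requires at least 3 individuals)."""
    n = len(pop)
    n_best = max(1, round(n * p))
    ranked = sorted(range(n), key=lambda k: fitness[k])   # best first
    x_pbest = pop[rng.choice(ranked[:n_best])]
    # One random individual from the population, different from i.
    a = rng.choice([k for k in range(n) if k != i])
    # One random vector from population plus archive, different from both.
    union = [pop[k] for k in range(n) if k not in (i, a)] + list(archive)
    x_pop, x_union = pop[a], rng.choice(union)
    x_i = pop[i]
    return [
        x_i[d] + F * (x_pbest[d] - x_i[d]) + F * (x_pop[d] - x_union[d])
        for d in range(len(x_i))
    ]


rng = random.Random(3)
pop = [[rng.uniform(-100, 100) for _ in range(2)] for _ in range(5)]
fit = [sum(v * v for v in x) for x in pop]
mutant = current_to_pbest_1(pop, fit, archive=[], i=0, F=0.5, p=0.2, rng=rng)
```

A larger p widens the pool from which x_pbest is drawn, which is why combining p = 0.1 with p = 0.5 shifts a search object toward exploration.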
In addition to the regions in Figure 3, we add two regions for F and CR.
• Region 5: the entire range except for very small values of F.
• Region 6: the entire range.
We design three search objects. The first search object (so_1) is used for wide-range search with large F, the second search object (so_2) is used for local search with small F and CR, and the third search object (so_3) performs the search most favorable to the current state by adapting the parameter values over the entire parameter domain. Because the parameter regions overlap, the search focus can be changed by adjusting the group size of each search object. All the search objects adopt the SHADE parameter control strategy for F and CR within their own regions. The SHADE parameter control strategy is adaptive, and its parameter distribution characteristics are analyzed in Section 3.
We divide the evolutionary process into two stages: the first is an exploration stage and the second is an exploitation stage. The design of the search objects is shown in Table 2. In the initial stage, the search focuses on exploration; thus, we strengthen region 1 by assigning a larger group size to so_1, and weaken region 3 and region 2 by assigning a smaller group size to so_3, since its parameter distribution may move across the four regions. Giving so_2 a small group size allows region 4 to cooperate with region 1. In this stage, grouping uses the collaboration-based method with predefined ratios of group size to NP. After the initial stage, the search enters the exploitation stage; thus, we strengthen so_3 and weaken so_1 and so_2 through adaptive competition-based grouping. Table 2 shows slight differences between the two stages. so_1 changes from region 1 to region 1 ∪ region 2 in order to match a category of separable problems. so_3 changes from region 5 to region 6 so that minimal F values are avoided in the exploration stage. For parameter p, 0.1 is the value suggested in the relevant papers; in the exploration stage, we combine 0.1 with 0.5 in so_3 to strengthen its exploration.
A small modification of the SHADE strategy is also adopted in region 6. As in SHADE [20], F_i for individual x_i is generated from a Cauchy distribution as randc_i(M_F,ri, 0.1). When M_F,ri is less than 0.05, we replace the scale parameter 0.1 with 1.1*M_F,ri. F_i is still restricted to [0, 1], as in SHADE, but there are more chances of obtaining a small value (in the interval from 0 to M_F,ri) for F_i. The same method is used for CR, except that the minimum value of CR_i is limited to 0.1/D.
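The modified sampling rule can be sketched as follows. The sketch keeps SHADE's usual handling of out-of-range values (F_i above 1 is truncated to 1, non-positive F_i is regenerated, and CR_i is drawn from a normal distribution and clipped); the retry limit, the fallback value, and the function names `sample_f` and `sample_cr` are assumptions made for the example.

```python
import math
import random


def sample_f(m_f: float, rng: random.Random, max_tries: int = 1000) -> float:
    """Cauchy sample for F with the reduced scale when m_f is small."""
    scale = 1.1 * m_f if m_f < 0.05 else 0.1
    for _ in range(max_tries):
        # Inverse-CDF sampling of a Cauchy(m_f, scale) variate.
        f = m_f + scale * math.tan(math.pi * (rng.random() - 0.5))
        if f > 1.0:
            return 1.0          # truncate values above 1
        if f > 0.0:
            return f            # accept; non-positive values are redrawn
    return m_f                  # assumed fallback if every draw was rejected


def sample_cr(m_cr: float, dim: int, rng: random.Random) -> float:
    """Normal sample for CR with the same scale rule and a floor of 0.1/D."""
    scale = 1.1 * m_cr if m_cr < 0.05 else 0.1
    cr = rng.gauss(m_cr, scale)
    return min(1.0, max(0.1 / dim, cr))


rng = random.Random(0)
small_fs = [sample_f(0.02, rng) for _ in range(1000)]
small_crs = [sample_cr(0.02, 50, rng) for _ in range(1000)]
```

With a memory value of 0.02, the scale becomes 0.022 instead of 0.1, so far fewer draws land well above the memory value than with the fixed scale.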

Combine Different Parameter Strategies for Multi-Operation DE
This strategy, which we name PSCDE (Parameter Strategies Combination for DE), is designed for multi-operation DE, in which the search objects use different evolutionary operations. The method involves selecting a proper parameter strategy and region for each evolutionary operation. We select two mutation operations, DE/current-to-pbest/1 and DE/rand/1, and construct two search objects. The crossover strategy is binomial crossover, and the selection strategy is the same as in classic DE. We match the DE/current-to-pbest/1 operation with the SHADE parameter control strategy, and the DE/rand/1 operation with the jDE parameter strategy. The parameter distribution of the jDE strategy has been analyzed in Section 3. We divide the evolutionary process into three stages with small changes to the parameter region or value, as shown in Table 3. Cooperation is the suitable working mode for the two search objects: so_1 may fail when competing with so_2, yet it may help the population escape from a small range when it falls into a local extremum. Therefore, we use the collaboration-based grouping method with predefined group size ratios (Igsr).

Experiments and Discussion
In this section, we describe several experiments that test the proposed framework and methods. We performed the evaluation on the benchmark suite of the CEC2014 Special Session on Real-Parameter Single Objective Optimization [40]. The CEC2014 benchmark set consists of 30 test functions: F1-F3 are unimodal functions, F4-F16 are simple multimodal functions, F17-F22 are hybrid functions, and F23-F30 are composition functions. The search space is [−100, 100]^D. For every function, the maximum number of function evaluations is 10,000 × D. According to the rules of CEC2014, the error between the best value found and the true optimal value is considered to be zero if it is less than 1e-8. The algorithms in this paper are implemented in MATLAB and run in MATLAB R2018a.
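The two bookkeeping rules of the protocol, the evaluation budget and the error threshold, amount to the following short Python helpers. The function names are hypothetical; the constants are the ones stated above.

```python
def max_evaluations(dim: int) -> int:
    """Evaluation budget per run: 10,000 function evaluations per dimension."""
    return 10_000 * dim


def reported_error(best_value: float, optimum: float, tol: float = 1e-8) -> float:
    """Error of a run; values below the tolerance are reported as zero."""
    error = best_value - optimum
    return 0.0 if error < tol else error


budget_30d = max_evaluations(30)
```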
We first compare SHADE with PVCDE. SHADE is a superior DE variant. PVCDE uses the same operations as SHADE, and the level-2 parameter strategy of PVCDE is also similar to that of SHADE. The only difference between them is the region division and combination. The comparison can thus test whether the method of region division and combination is valid. The algorithm settings are as follows:
• SHADE: The parameters for the DE/current-to-pbest/1 operation are set to p = 0.1 and archive-size = 2*NP, which are the same as in the original paper. The memory-size of the history list used by the parameter strategy is set to 6, following a previous study [24]. The population size is set to NP = 4D.
• PVCDE: The population size is set to NP = 5D for the purpose of grouping, with archive-size = 1.4*NP and memory-size = 6. The initial group size ratios (Igsr) are set to [8/10, 1/10, 1/10], and the basic proportions (P_base) of all search objects in stage 1 are set to [7/10, 1/10, 1/10], which emphasizes so_1. The groups are reassigned every ten generations, and the grouping method is random grouping. The switch from stage 1 to stage 2 occurs after 500 generations or when the success rate of so_1 (STRinfo_1.model.s_1) falls below 0.01.
In the experiment comparing SHADE with PVCDE, all the functions are tested 51 times for D = 30, 50, and 100. The results are shown in Table 4, which lists the mean and standard deviation of the error values over the 51 runs. The Wilcoxon rank-sum test is performed on the experimental results at the 0.05 significance level. The symbols −, +, and ≈ indicate that SHADE is significantly worse than, significantly better than, or similar to PVCDE, respectively. The last row of the table summarizes the rank-sum test results.
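The way a −, +, or ≈ mark is derived from two sets of run errors can be sketched with a self-contained rank-sum test. The version below uses the normal approximation without a tie correction, which is an assumption made for brevity; the statistical routine actually used for the tables is not specified in the text, and `rank_sum_test` and `mark` are hypothetical names.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation)."""
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_a - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def mark(errors_other: Sequence[float], errors_ref: Sequence[float],
         alpha: float = 0.05) -> str:
    """'+' if the other algorithm is significantly better (smaller errors)
    than the reference, '-' if significantly worse, '≈' otherwise."""
    z, p_value = rank_sum_test(errors_other, errors_ref)
    if p_value >= alpha:
        return "≈"
    return "+" if z < 0 else "-"


low_errors = [float(v) for v in range(1, 11)]
high_errors = [float(v) for v in range(11, 21)]
```

For example, `mark(low_errors, high_errors)` yields "+", because the first sample has uniformly smaller errors than the reference.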
As seen in Table 4, for the 30D problems, PVCDE is significantly better on 15 functions according to the rank-sum statistics and significantly worse than SHADE on five. As the dimension increases, the number of superior results also increases. The number of cases in which PVCDE performs better than SHADE exceeds the number in which it performs worse. Therefore, we believe that parameter combination can improve algorithm performance.
Analyzing the results for specific functions, we found that some problems suit SHADE and others suit PVCDE. For functions f9, f10, f15, f16, and f30, PVCDE performs worse than SHADE on most of the high-dimensional problems. f10 is a multimodal separable problem; after the initial stage, its most suitable parameter values become 1 for F and 0 for CR. SHADE has a mechanism that locks CR to 0, which suits f10 well. Because a CR that has been locked to 0 cannot be recovered, PVCDE uses this technique only in so_1 and so_2; in so_3, we slightly modify the SHADE strategy so that CR is not locked to 0. A similar situation occurs with f9, f15, and f16, which are non-separable, rotated functions. Since the current evolutionary operators are not very effective for these rotated functions, no wide range of suitable parameter values exists. After a period, CR is still locked to 0 in SHADE, which amounts to a one-dimensional search. In such cases, the parameter-value combination scheme loses its effect in the later stage, which reduces its efficiency. f30 is a different case, in which both algorithms are prone to local extrema; however, PVCDE is more likely to fall into a larger local extremum. For other functions, such as f1, f5, f6, f7, f12, f14, f17, f18, f20, and f21, PVCDE is better than SHADE on most of the high-dimensional problems, which demonstrates the effect of the parameter combination strategy. For example, so_1 was the most useful for f1, and for f6 in 30D, the combination of so_1 and so_2 helped so_3 reduce the number of runs that fell into a local extremum.
The variation of the parameter distribution is illustrated in Figure 5 for the sample function f12 (50D). The snapshots were captured at runtime in different generations. The red stars in the figure mark the successful parameters, that is, those with which individuals evolved successfully. The variation process can be observed as follows. In the initial stage, region 1 is strengthened by so_1, and region 4 also receives a small share through so_2. The success rate of region 1 then decreases and the evolution enters stage 2, in which the three search objects compete and region 2 gains a higher density. In the later period, the CR of so_1 and so_2 is locked to 0, whereas so_3, which does not use this technique, moves to the left boundary. The PVCDE strategy thus provides more control over the parameter distribution.
In the second experiment, we combined a parameter strategy of the second category with our method by adding LPSR, a parameter strategy for NP, to our framework. LPSR was proposed in L-SHADE and reduces the population size linearly with the number of function evaluations; L-SHADE is a variant of SHADE with LPSR. Because L-SHADE was successful in the CEC2014 competition, we constructed L-PVCDE with the same NP strategy and compared the two. The algorithm settings are as follows:
• L-SHADE: The parameter settings are the same as the original settings in [18]. The population size decreases from 18D to 4, and the parameters for the DE/current-to-pbest/1 operation are set to p = 0.11, archive-size = 2.6*NP, and memory-size = 6.
• L-PVCDE: The population size decreases from 18D to 10 because of the grouping restriction. The initial group size ratios (Igsr) are set to [16/18, 1/10, 1/10], and the basic proportions of all search objects in stage 1 are set to [15/18, 1/18, 1/18]. The other settings are the same as for PVCDE.
We performed the test for D = 50, with 25 runs for each of the 30 functions. The results are shown in Table 5, which lists the mean and standard deviation for each algorithm. The Wilcoxon rank-sum test was applied to compare L-SHADE and L-PVCDE; the symbols −, +, and ≈ indicate that L-SHADE is significantly worse than, significantly better than, or similar to L-PVCDE, respectively. Since L-PVCDE differs from L-SHADE only in the combination of parameter regions, the comparison tests whether the combination is useful.
As seen in Table 5, L-PVCDE obtains better results than L-SHADE on 16 functions and worse results on seven. According to the Wilcoxon rank-sum test, the functions on which L-PVCDE is worse than L-SHADE are f9, f10, f13, f15, f16, f23, and f26. Among the functions on which L-PVCDE is superior, it has clear advantages in both the mean and the rank-sum test for f1, f5, f6, f12, f14, f17, f18, f20, and f21, whereas for f4, f8, f24, f25, f27, f28, and f30 it is better according to the rank-sum test but its advantage in the mean is not very pronounced. Overall, the parameter-value combination strategy works for many problems.
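The LPSR rule used in the second experiment reduces the population size linearly from its initial value to its minimum as function evaluations are consumed. A minimal sketch, using the formula published for L-SHADE and the 50D settings quoted above (initial size 18D, minimum size 10 for L-PVCDE, budget 10,000 × D); `lpsr_size` is a hypothetical name.

```python
def lpsr_size(nfe: int, max_nfe: int, np_init: int, np_min: int) -> int:
    """Population size after nfe evaluations under linear reduction."""
    return round((np_min - np_init) / max_nfe * nfe + np_init)


D = 50
MAX_NFE = 10_000 * D          # evaluation budget for 50D
NP_INIT = 18 * D              # initial population size
NP_MIN = 10                   # lower bound used for L-PVCDE

start_size = lpsr_size(0, MAX_NFE, NP_INIT, NP_MIN)
mid_size = lpsr_size(MAX_NFE // 2, MAX_NFE, NP_INIT, NP_MIN)
end_size = lpsr_size(MAX_NFE, MAX_NFE, NP_INIT, NP_MIN)
```

The minimum of 10 rather than 4 leaves every search object with at least a few individuals at the end of the run, which is presumably what the grouping restriction refers to.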
In the third experiment, we compared PSCDE with multi-operation algorithms (EPSDE [22], MPEDE [15], and CoDE [27]). Because the parameter strategies of PSCDE originate from jDE and SHADE, jDE and SHADE were also tested to see whether PSCDE leads to improvements. The compared algorithms used the settings from their original papers. For PSCDE, NP was set to 100, as in most of the compared algorithms, and the basic proportions of all search objects were set to [1/2, 1/2] in all stages. The groups were reassigned every 10 generations, and the grouping method was chosen between random and index-based grouping with probability 0.5 each. The stage is switched every 500 generations.
We performed the test for D = 50 with 25 runs. The results are shown in Table 6, which lists the mean and standard deviation for each algorithm, with the smallest mean in bold. PSCDE obtains the best mean on 11 functions, followed by SHADE, which obtains the best mean on 7 functions. Because evaluation by the mean is sometimes biased, a rank-sum test was performed for pairwise comparison. The symbols −, +, and ≈ indicate that an algorithm is significantly worse than, significantly better than, or similar to PSCDE, respectively. In this test, SHADE, jDE, and PSCDE have the same population size, NP = 100. PSCDE performs well in comparison with SHADE and jDE: it obtains better results than SHADE on 21 functions and poorer results on four, and better results than jDE on 24 functions and poorer results on three. Thus, PSCDE shows a significant improvement over SHADE and jDE. Compared with SHADE and jDE simultaneously, PSCDE wins on 17 functions and loses on one function, f10. This effect results from the cooperation of the two strategies: they compensate for each other and obtain better results than either alone. On 12 functions, PSCDE gives a compromise between SHADE and jDE, or no significant improvement. Thus, a combination algorithm can yield a result that exceeds all its sub-algorithms, or at least one close to that of the best sub-algorithm; this robustness is a benefit of combination. EPSDE, CoDE, and MPEDE are all algorithms with multiple mutation operations; they concentrate on combining operations and use a unified parameter strategy. According to the Wilcoxon rank-sum tests, PSCDE is significantly better than EPSDE, MPEDE, and CoDE on 29, 18, and 21 functions, respectively; it is significantly worse on 0, 3, and 3 functions, and there is no significant difference on 1, 9, and 6 functions, respectively. The test indicates that combining operations with parameter strategies is important, and that the combination of parameter strategies also plays an important role in the algorithm's performance.
Furthermore, a multiple comparison test was performed on the CEC2014 50D results. The algorithms include those proposed in this paper (PVCDE, L-PVCDE, and PSCDE) and the comparison algorithms (SHADE with 200 individuals, L-SHADE with 100 individuals, jDE, EPSDE, MPEDE, and CoDE). We used the Friedman and Holm-Bonferroni tests [45][46][47] to obtain the comparison results shown in Table 7. The null hypothesis is: 'There is no significant difference between L-PVCDE and the j-th algorithm'. The null hypothesis is rejected in 3 cases out of 9. We used the SPSS statistical software to perform a further Friedman analysis and obtained the uniform subsets of the algorithms shown in Table 8, in which the algorithms are divided into 4 subsets according to their similarity. The algorithms that include a strategy for the NP parameter are superior to the others, which means that controlling the population parameters and the operational parameters simultaneously is very worthwhile. Among the algorithms without an NP strategy, PVCDE and PSCDE perform similarly and are superior to the others. This also illustrates that further study of parameter control strategies is very important for differential evolution.
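The Holm-Bonferroni step of such a multiple comparison can be sketched in a few lines. The sketch takes unadjusted p-values from pairwise comparisons against a control algorithm and decides which null hypotheses to reject; the p-values in the example are made-up illustration values, not results from Table 7, and `holm` is a hypothetical name.

```python
from typing import List, Sequence


def holm(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break               # all larger p-values are retained as well
    return reject


decisions = holm([0.001, 0.04, 0.03, 0.5])
```

Here only the first hypothesis is rejected: 0.001 is below 0.05/4, but the next smallest value, 0.03, exceeds 0.05/3, so the procedure stops.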

Conclusions
Parameter strategies are very important for DE and have a large impact on algorithm performance. Many parameter strategies based on different methods have been proposed. Current research on DE algorithms focuses mainly on the combination of operations or the combination of multiple algorithms. However, research on the combination of parameters (including the combination of parameter values, regions, and strategies, and the combination of parameter strategies with operations) has been a missing link. This paper explores this area, and the experiments confirm that studying parameter combination is meaningful. The proposed methods improve the original algorithms to an extent, which also supports our idea. Current research on the various types of combination algorithms focuses almost entirely on specific combination schemes, whose implementations are difficult to change. Therefore, in addition to the proposed methods, the framework we have designed is also meaningful: it allows parameter combinations to be customized easily and can be used for parameter combination research or for the design of new methods. However, several difficulties remain with the proposed methods. For instance, evaluating the effect of the current combination state is an open problem that deserves further study.

Figure 1. Individual and parameter distribution maps of three parameter strategies for f1: (a) random strategy; (b) jDE strategy; and (c) SHADE strategy.

Figure 2. Individual and parameter distribution maps of three parameter strategies for f2: (a) random strategy; (b) jDE strategy; and (c) SHADE strategy.

Figure 4. Grouping-based two-level parameter combination scheme.

Algorithm 1. Execute search object algorithm.
Input: STRinfo
Output: offspring set, STRinfo.model
1: According to STRinfo.group, assign the individual set.
2: foreach parameter strategy in STRinfo.pastrategies
3:   Execute the parameter component with phase 2 to generate the parameter value set.
4: end foreach
5: Execute the operation (STRinfo.opstrategy) and return the offspring set.
6: Compute the objective of the offspring.
7: foreach parameter strategy in STRinfo.pastrategies
8:   Execute the parameter component with phase 1 to collect information and construct the parameter strategy model.
9: end foreach
10: Collect the algorithm evaluation information to generate STRinfo.model.

Algorithm 2. The algorithm of the framework.
1: Initialize the population.
2: Evaluate the individuals of the population.
3: According to the design of the search objects, create the runtime list of so with STRinfo.
4: while (not (termination condition))
5:   if (the first generation)
6:     Group the population according to the initial group size ratios (Igsr).
Settings for the second experiment. L-SHADE: p = 0.11, archive-size = 2.6*NP, and memory-size = 6. L-PVCDE: the population size decreases from 18D to 10 because of the grouping restriction; the initial group size ratios (Igsr) are set to [16/18, 1/10, 1/10], and the basic proportions of all search objects in stage 1 are set to [15/18, 1/18, 1/18]. The other settings are the same as for PVCDE.

Table 1. Results for all region combination patterns.

Table 3. The parameter scheme of PSCDE.

Table 8. The uniform subsets of algorithms (Friedman test by SPSS).