Differential Evolution Optimal Parameters Tuning with Artificial Neural Network

Differential evolution (DE) is a simple and efficient population-based stochastic algorithm for solving global numerical optimization problems. DE largely depends on algorithm parameter values and search strategy. Knowledge on how to tune the best values of these parameters is scarce. This paper aims to present a consistent methodology for tuning optimal parameters. At the heart of the methodology is the use of an artificial neural network (ANN) that learns to draw links between the algorithm's performance and parameter values. To do so, first, a data-set is generated and normalized, then the ANN approach is performed, and finally, the best parameter values are extracted. The proposed method is evaluated on a set of 24 test problems from the Black-Box Optimization Benchmarking (BBOB) benchmark. Experimental results show that three distinct cases may arise with the application of this method. For each case, specifications about the procedure to follow are given. Finally, a comparison with four tuning rules is performed in order to verify and validate the proposed method's performance. This study provides a thorough insight into optimal parameter tuning, which may be of great use for practitioners.


Introduction
Optimization is the act of making something as effective as possible. From a mathematical perspective, optimization is the process of finding the global maximum or minimum of an objective function. Conventional techniques such as gradient-based methods or deterministic hill climbing are generally not competent to solve nonlinear global optimization problems [1]. In that context, evolutionary algorithms (EAs) have been widely employed. EAs' intrinsic stochastic component allows them to find global optima where conventional techniques fail. The field of evolutionary algorithms mainly comprises the genetic algorithm (GA), evolutionary programming (EP), evolution strategies (ESs) and differential evolution (DE); see Das et al. [2].
The problems that intelligent population-based algorithms such as DE and other EAs are capable of solving are complex. These algorithms do not require gradient calculations of the function to be optimized. Furthermore, they place practically no strong requirements on the definition of the optimization problem: for example, they can be applied to non-continuous and non-differentiable cost functions, and even without an analytical expression that directly links the optimization variables to the cost function. Moreover, by their nature they easily avoid local minima. Complex optimization problems are shown in the following references [3][4][5][6][7]. In differential evolution, candidate solutions must be defined by a finite parameter set, and the algorithm's performance can be mathematically expressed as a function of its parameters, P = f(NP, F, CR), as in Equation (1). Specifically, adaptive techniques do not directly focus on learning the function f(·), but rather test various parameter combinations with distinct DE strategies, and the best of them are applied during the evolution.
In the current study, the utilization of an artificial neural network (ANN) is proposed to learn the function f(·) and afterward extract the optimal combination of parameters (NP, F and CR). An ANN consists of many interconnected simple functional units (neurons) that perform as parallel information-processors and approximate the function that maps inputs to outputs [33]. ANNs' potential to solve problems with high performance and their ability to adapt to different problems have led to implementations in numerous fields such as autonomous driving [34,35], solar and wind energy systems [36,37] and financial time series forecasting [38]. The field of ANNs is in full motion; for a review of its progress and application areas in real-world scenarios, see Abiodun et al. [39].
This work's main contribution is to present a consistent methodology for tuning the optimal parameters of DE. At its heart is the use of an ANN to learn the relationship between the algorithm's performance and its parameters. Knowledge on how to tune the optimal values of these parameters is scarce [15,23]; in that context, this paper suggests a way to fill that knowledge gap. In the proposed methodology, once the objective function is designed, the experimental requirements are defined and the DE strategy is chosen, four consecutive steps are applied: data-set generation, data-set normalization, ANN approach and best parameters extraction. In order to analyze the distinct cases that may arise, the proposed methodology is evaluated on a set of 24 objective functions from the Black-Box Optimization Benchmarking (BBOB) suite [40]. Finally, the validation of the proposed method is performed, on the one hand, by checking whether the neural network predictions are correct, and on the other, by making a comparison with the four most common tuning rules: Storn and Price [9], Gämperle et al. [12], Rönkkönen et al. [13] and Zielinsky et al. [14].
The rest of this paper is organized as follows. In Section 2, the experimental set-up is defined. In Section 3, the basic DE algorithm and ANN model are presented. Section 4 introduces the proposed methodology in detail. In Section 5, the results are presented and discussed. Finally, the conclusions are drawn in Section 6.

Simulation Set-Up
The objective functions are defined and evaluated over R^D. The global optimum search domain is x_opt ∈ [−5, 5]^D. In this study, the optimization problem is a bi-dimensional (D = 2) minimization problem. BBOB defines 24 noise-free real-parameter single-objective functions, which are detailed and described in [40]. These objective functions are composed of five groups.
The first group functions are separable (Fcn1-Fcn5). The second group functions have low or moderate conditioning (Fcn6-Fcn9). The third group functions have high conditioning and are unimodal (Fcn10-Fcn14). The fourth group functions have adequate global structure and are multimodal (Fcn15-Fcn19). The fifth group functions have weak global structure and are multimodal (Fcn20-Fcn24). Apart from the first group, all other objective functions are non-separable. The termination criterion for DE is reaching 20 generations; therefore, the DE algorithm is forced to converge quickly to the optimal point. The required fitness precision for each objective function is ∆f = 10^−2. Consequently, the target value to be achieved is defined as f_t = f_opt + ∆f, where f_opt is the objective function's global optimal point value.
In this paper, the performance (P) expressed in Equation (1) is quantified by the success rate (SR) defined in Equation (2). SR is the probability of achieving the requirements, so it directly relates to the algorithm's effectiveness. The DE algorithm's intrinsic stochastic component produces different results with the same parameter setting; therefore, the performance is quantified by the ratio of satisfactory runs to total independent runs, where satisfactory runs are the DE executions that achieved the required f_t. Consequently, higher SR values are preferable to lower ones:

SR = s_runs / t_runs, (2)

where s_runs is the number of satisfactory DE runs, and t_runs is the total number of independent DE runs, which in this paper is 40. All simulations are performed in MATLAB 2020b with the system configuration of an Intel Core i7-8550U 1.80 GHz processor, 8 GB RAM and a 64-bit Windows 10 operating system.
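As an illustration, Equation (2) can be computed from the final fitness values of a batch of independent runs, as in the following minimal Python sketch (the paper's experiments use MATLAB; the function name and inputs here are illustrative):

```python
import numpy as np

def success_rate(final_fitness, f_opt, delta_f=1e-2):
    """SR = s_runs / t_runs: the fraction of runs whose best fitness
    reached the target value f_t = f_opt + delta_f."""
    final_fitness = np.asarray(final_fitness, dtype=float)
    f_t = f_opt + delta_f                  # target value to achieve
    s_runs = np.sum(final_fitness <= f_t)  # satisfactory runs
    return s_runs / final_fitness.size     # t_runs = total runs
```

For example, with f_opt = 0 and four runs ending at fitness 0.005, 0.02, 0.009 and 0.5, only two runs reach f_t = 0.01, giving SR = 0.5.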

Differential Evolution Algorithm
In this section, the standard DE algorithm is introduced. DE is a robust population-based meta-heuristic search algorithm in which a population of D-dimensional individuals is used to optimize a problem. Each individual of the population is a candidate solution to the problem and is coded as a vector. The population in generation G is defined as {X_i^G = (X_i,1^G, X_i,2^G, . . . , X_i,D^G), i = 1, 2, . . . , NP}, where D is the dimensionality of the problem and NP is the population size. DE is composed of four steps: initialization, mutation, crossover and selection.

Initialization
In the initialization step, three main parameters of the algorithm are defined: population size (NP), mutation factor (F) and crossover rate (CR) [41]. Moreover, the initial population of NP individuals is randomly generated according to a uniform distribution within the search space. Once initialized, the DE algorithm performs mutation, crossover and selection operations iteratively until the user-defined stopping criteria are reached.
In this iterative process, new individuals are generated and evolved over generations. In this work, all generated individuals for the search space are feasible solutions since the problems are non-constrained. If it is necessary to handle non-feasible solutions, see Caraffini et al. [42].

Mutation Operation
In each generation G, following a mutation strategy, the mutation vectors {V_i^G = (V_i,1^G, V_i,2^G, . . . , V_i,D^G), i = 1, 2, . . . , NP} are generated. In this paper, the most popular mutation strategy, 'DE/rand/1', is used. In this strategy, two vectors (individuals) are randomly chosen, their difference is multiplied by a mutation factor F, and the result is added to a third random vector. The difference vector automatically adapts to the scale of the optimized function, and that is the key success factor of the DE algorithm [43]. The 'DE/rand/1' strategy is defined as follows:

V_i^G = X_r1^G + F · (X_r2^G − X_r3^G),

where r1, r2 and r3 are randomly selected distinct integers within the range [1, NP] and are also different from i. The most popular mutation strategies are defined in the following list. Each strategy affects population diversity differently; therefore, the search convergence rate might vary [44].
• 'DE/best/1': V_i^G = X_best^G + F · (X_r1^G − X_r2^G)
• 'DE/best/2': V_i^G = X_best^G + F · (X_r1^G − X_r2^G) + F · (X_r3^G − X_r4^G)
• 'DE/current-to-best/1': V_i^G = X_i^G + F · (X_best^G − X_i^G) + F · (X_r1^G − X_r2^G)
• 'DE/current-to-pbest/1': V_i^G = X_i^G + F · (X_p^G − X_i^G) + F · (X_r1^G − X_r2^G)

where r1, r2, r3 and r4 are randomly selected distinct integers within the range [1, NP] and are also different from i. X_best^G is the best individual of the current population, and X_p^G is randomly chosen as one of the top 100p% individuals in the current population with p ∈ (0, 1] [22]. The mutation factor F defines the search step of the optimization. According to Storn and Price, the F range is [0, 2] [9]. Generally, smaller values of F are better when the population is close to the global best value, whereas larger values of F are preferred when the population is far from it [45]. New mutation strategies have been proposed in recent years, such as random neighbor-based mutation ('DE/neighbor/1') [46], a new triangular mutation [28], a mutation vector inspired by the biological phenomenon called hemostasis [41], an enhanced mutation strategy with a time stamp scheme [47] and 'DE/pbad-to-pbest-to-gbest/1' [48].
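The strategies listed above can be sketched in a single dispatch function. This is an illustrative Python snippet (not the paper's MATLAB code); it assumes the population is stored as a NumPy array of shape (NP, D) with NP ≥ 6, and `best` and `pbest` are supplied by the caller for the strategies that need them:

```python
import numpy as np

def mutate(X, i, F, strategy="rand/1", best=None, pbest=None, rng=None):
    """Generate one mutant vector V_i for individual i of population X.
    `best` is the current best individual; `pbest` is one of the
    top-100p% individuals (used only by 'current-to-pbest/1')."""
    if rng is None:
        rng = np.random.default_rng()
    NP = len(X)
    # r1..r4: distinct random indices, all different from i
    candidates = [j for j in range(NP) if j != i]
    r1, r2, r3, r4 = rng.choice(candidates, size=4, replace=False)
    if strategy == "rand/1":
        return X[r1] + F * (X[r2] - X[r3])
    if strategy == "best/1":
        return best + F * (X[r1] - X[r2])
    if strategy == "best/2":
        return best + F * (X[r1] - X[r2]) + F * (X[r3] - X[r4])
    if strategy == "current-to-best/1":
        return X[i] + F * (best - X[i]) + F * (X[r1] - X[r2])
    if strategy == "current-to-pbest/1":
        return X[i] + F * (pbest - X[i]) + F * (X[r1] - X[r2])
    raise ValueError(f"unknown strategy: {strategy}")
```

Note that with F = 0, 'DE/best/1' simply returns the best individual, which is a quick way to sanity-check the implementation.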

Crossover Operation
After the mutation operation, the crossover operation is performed to increase the diversity of the mutation vectors. The crossover operator generates a trial vector {U_i^G = (U_i,1^G, U_i,2^G, . . . , U_i,D^G), i = 1, 2, . . . , NP} from the interaction of the mutation vector V_i^G and the target vector X_i^G. In this paper, the classic binomial crossover strategy defined in Equation (9) is used ('DE/rand/1/bin'):

U_i,j^G = V_i,j^G if rand_j[0, 1] ≤ CR or j = j_rand; otherwise U_i,j^G = X_i,j^G, (9)

where rand_j[0, 1] is a uniform random number and j_rand is a randomly chosen index that guarantees the trial vector inherits at least one component from the mutation vector. However, it is possible to use other crossover strategies, such as exponential crossover [49,50] or an eigenvector-based crossover operator [51].
The crossover rate CR reflects the probability with which the trial vector inherits the mutation vector's values. According to Storn and Price, the CR range is [0, 1] [9]. CR has a direct influence on the diversity of the population and, therefore, on the search convergence capacity [45].
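Binomial crossover can be sketched as below; forcing the j_rand component to come from the mutant is the standard convention, and the function name is illustrative:

```python
import numpy as np

def binomial_crossover(x, v, CR, rng=None):
    """Build trial vector U from target x and mutant v: each component
    is inherited from v with probability CR, and the j_rand component
    is always taken from v so that U differs from x."""
    if rng is None:
        rng = np.random.default_rng()
    D = x.size
    mask = rng.random(D) <= CR
    mask[rng.integers(D)] = True  # j_rand: at least one mutant gene
    return np.where(mask, v, x)
```

With CR = 1 the trial vector equals the mutant; with CR = 0 it differs from the target in exactly one (random) component.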

Selection Operation
Following the crossover operation, the fitness value of the trial vector is calculated. Afterward, the trial vector U_i and the target vector X_i compete, and if U_i has a better fitness value than X_i, then U_i replaces X_i in the next generation. The selection operation is defined as follows:

X_i^{G+1} = U_i^G if f(U_i^G) ≤ f(X_i^G); otherwise X_i^{G+1} = X_i^G,

where f(U_i^G) and f(X_i^G) are the fitness values of U_i^G and X_i^G, respectively.
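Putting the four steps together, a minimal 'DE/rand/1/bin' loop might look as follows. This is an illustrative Python sketch (the paper's implementation is in MATLAB), with the sphere function as a stand-in objective and defaults echoing the paper's set-up (D = 2, 20 generations, search domain [−5, 5]^D):

```python
import numpy as np

def de_rand_1_bin(fobj, D=2, NP=40, F=0.5, CR=0.9, G_max=20,
                  bounds=(-5.0, 5.0), seed=0):
    """Minimal 'DE/rand/1/bin' minimizer: uniform random initialization,
    rand/1 mutation, binomial crossover, greedy one-to-one selection."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(NP, D))          # initialization
    fit = np.array([fobj(x) for x in X])
    for _ in range(G_max):
        for i in range(NP):
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            V = X[r1] + F * (X[r2] - X[r3])        # mutation
            mask = rng.random(D) <= CR             # binomial crossover
            mask[rng.integers(D)] = True           # j_rand component
            U = np.where(mask, V, X[i])
            fU = fobj(U)
            if fU <= fit[i]:                       # greedy selection
                X[i], fit[i] = U, fU
    b = fit.argmin()
    return X[b], fit[b]

# Sphere function as a stand-in objective (f_opt = 0 at the origin)
x_best, f_best = de_rand_1_bin(lambda x: np.sum(x**2))
```

Because selection is greedy, the best fitness in the population is non-increasing over generations.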

Artificial Neural Network Model
In this section, the mathematical model of an ANN under the DE environment is explained in detail. ANNs are composed of a set of artificial neurons and grouped in interconnected layers that, in their entirety, simulate the behavior of biological neural networks. The connection strength between neurons is quantified by layer weights (W). In this paper, the most popular ANN class called multilayer perceptron (MLP) is designed. The MLP contains at least three layers: an input layer, a hidden layer and an output layer.
As described in Section 2, the DE algorithm's performance (P) is quantified by the success rate (SR). As explained in Section 4, the SR value varies only with the parameters (NP, F and CR), since the DE strategy is fixed during the data-set generation. The ANN learns the problem of mapping the inputs NP, F and CR to the output SR. Based on this reasoning, in this particular case, a 3-K-1 MLP topology is designed. The hidden layer neuron number (K) is obtained with an experimental approach in the ANN training phase. Figure 1 illustrates the MLP created in this paper. Inside the MLP, additional constant units known as biases are applied to decide whether a neuron fires or not [52]. The main elements of the designed MLP are as follows:
• Input layer neurons are {X_1, X_2, X_3}, the bias is {X_0} and the first-layer weights are defined in W^(1);
• Hidden layer neuron outputs are {a_1^(2), . . . , a_K^(2)}, the bias is {a_0^(2)} and the second-layer weights are defined in W^(2);
• The output layer neuron's output is the prediction ŜR.

In the MLP, information is propagated from the input layer to the output layer. In that propagation, mathematical operations are performed to produce a final prediction output value (ŜR). The input layer only transmits the information, whereas the hidden and output layers, apart from transmitting it, also transform it by using an activation function.
The parameter values of NP, F and CR are strictly positive, so a sigmoid activation function is used in the hidden layer and a linear activation function in the output layer. The mathematical model of network propagation is explained below.
Initially, the network takes the inputs and multiplies them by the weights of the first layer. The result of the multiplication passes through the sigmoid activation function to the second layer. The outputs of the hidden layer are defined as follows:

a_j^(2) = g(Σ_{i=0}^{3} W_ji^(1) · X_i), j = 1, 2, . . . , K,

where W_ji^(1) is the weight value from input neuron i to hidden neuron j.
Abbreviated in vector-matrix form, a^(2) = g(W^(1) X), where the dimension of the first-layer weight matrix is W^(1) ∈ R^{K×4}, the dimension of the input vector is X ∈ R^{4×1}, the dimension of the hidden layer output vector is a^(2) ∈ R^{K×1} and the sigmoid activation function is g(s) = 1/(1 + e^−s). Subsequently, the hidden layer outputs are multiplied by the weights of the second layer, and the result passes through the linear activation function to the output of the network. The output of the network is the predicted value ŜR, defined as follows:

ŜR = f(W^(2) z^(2)), (15)

where the dimension of the second-layer weight matrix is W^(2) ∈ R^{1×(K+1)}, the dimension of the input vector is z^(2) ∈ R^{(K+1)×1}, the output predicted value is ŜR ∈ R^{1×1} and the linear activation function is f(s) = s.
In conclusion, it is clearly observed that the output prediction value (ŜR) depends directly on the input parameter values (NP, F and CR), the first- and second-layer weight values (W^(1), W^(2)) and their dimension (K hidden neurons). The weight values and the hidden layer neuron number (K) are defined in the neural network training phase.
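The forward propagation described above can be sketched as follows; the weight shapes follow the 3-K-1 topology with bias units, and the function name is illustrative:

```python
import numpy as np

def mlp_forward(params, W1, W2):
    """Forward pass of the 3-K-1 MLP: sigmoid hidden layer, linear
    output. `params` is the normalized (NP, F, CR) triple; the biases
    enter as constant units X0 = 1 and a0 = 1, so W1 is K x 4 and
    W2 is 1 x (K+1)."""
    x = np.concatenate(([1.0], params))   # X in R^{4x1}, with bias X0
    s = W1 @ x                            # first-layer weighted sums
    a = 1.0 / (1.0 + np.exp(-s))          # sigmoid g(s) = 1/(1+e^-s)
    z = np.concatenate(([1.0], a))        # z^(2) in R^{(K+1)x1}
    return float(W2 @ z)                  # linear output: predicted SR
```

As a quick check, with K = 2, zero first-layer weights (so every hidden activation is 0.5) and W2 = [0.1, 0.4, 0.4], the output is 0.1 + 0.5·(0.4 + 0.4) = 0.5 for any input.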

Proposed Method
In solving an optimization problem with DE, the objective function and the DE algorithm are designed first (the left sub-block of Figure 2). In this work, as stated in Section 2, 24 single-objective functions are performed to check the proposed method in distinct environments. On the other hand, the DE algorithm is designed as defined in Section 3.1. After this, the DE algorithm parameters (NP, F, and CR) are tuned. A tedious trial-and-error approach for tuning the best DE algorithm parameters is commonly used in real-world problems. To tackle this issue, a consistent methodology for tuning the optimal parameters of DE is presented in this work.
The proposed methodology comprises four consecutive steps: data-set generation, data-set normalization, ANN approach and best parameters extraction. These four steps are graphed in Figure 2. For each objective function, each step is executed just one time. As a result of this process, the optimal DE parameters (NP, F and CR) are obtained for each objective function.
In the data-set generation, the DE algorithm is executed with various NP, F and CR parameter combinations, and the achieved SR is saved. In the data-set normalization, the obtained combinations of NP, F, CR and SR are normalized to help the learning process of the ANN. In the ANN approach, the training and testing of the ANN are performed. Lastly, using the best ANN of the previous step, the optimal parameters of DE (NP, F and CR) are extracted. As declared before, the ANN learns the function f(·) defined in Equation (1). Once the objective function and DE strategy are chosen, the performance of DE (P), quantified by SR, varies only with the parameters NP, F and CR. Therefore, the ANN learns exactly the mapping from the input parameters (NP, F and CR) to the output SR.

Objective Function
As concluded in Section 3.2, the network's prediction (ŜR) depends directly on the weight values (W^(1), W^(2)), and these values are calculated in the training phase of the neural network. The training procedure requires a training set that relates the network's input parameters to the output value. For that reason, initially, a data-set of examples is generated.

Data-Set Generation
In this step, the DE algorithm with different combinations of input parameters (NP, F and CR) is performed, and their output (SR) is calculated and saved. The parameter NP is taken from [10,100] by steps of 10. The parameter F is taken from [0.1, 2] by steps of 0.1. The parameter CR is taken from [0.1, 1] by steps of 0.1. As a consequence of this process, a data-set of 2000 combinations of NP, F and CR with their respective SR is generated.
It should be noted that it is possible to take bigger or smaller steps. With smaller steps, more combinations of parameters are achieved, and the neural network's learning will be more precise, although the time of the data-set generation will be longer. In this study, for each objective function, the generation of the data-set required an average of six hours.
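The parameter grid described above can be generated as in this illustrative Python sketch; each of the 2000 combinations would then be evaluated with 40 independent DE runs to obtain its SR:

```python
import numpy as np
from itertools import product

# Grid for the data-set generation step (values taken from the paper)
NP_values = np.arange(10, 101, 10)                   # 10, 20, ..., 100
F_values = np.round(np.arange(0.1, 2.01, 0.1), 2)    # 0.1, 0.2, ..., 2.0
CR_values = np.round(np.arange(0.1, 1.01, 0.1), 2)   # 0.1, 0.2, ..., 1.0

grid = list(product(NP_values, F_values, CR_values))
# 10 * 20 * 10 = 2000 (NP, F, CR) combinations, matching the paper
```

Each grid point (NP, F, CR) is one row of the data-set; its SR label comes from running the DE algorithm 40 times with those parameters.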

Data-Set Normalization
Once the data-set is generated, the normalization of the NP and F parameters is performed as defined in Equations (16) and (17), respectively. The normalization of CR and SR is not performed since they are already within the range [0, 1]. After the normalization, the four parameters used in the ANN (NP, F, CR and SR) are in the same range, which helps the ANN's accurate learning.

NPnorm = (NP − NPmin) / (NPmax − NPmin), (16)

where NPnorm is the normalized NP, NPmax is the highest NP value and NPmin is the lowest NP value.

Fnorm = (F − Fmin) / (Fmax − Fmin), (17)

where Fnorm is the normalized F, Fmax is the highest F value and Fmin is the lowest F value.
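Assuming the standard min-max scaling that the surrounding text implies, Equations (16) and (17) can be sketched as:

```python
def min_max_normalize(v, v_min, v_max):
    """Equations (16)/(17): map v from [v_min, v_max] into [0, 1]."""
    return (v - v_min) / (v_max - v_min)

# With the paper's ranges NP in [10, 100] and F in [0.1, 2]:
NP_norm = min_max_normalize(55, 10, 100)    # -> 0.5
F_norm = min_max_normalize(0.1, 0.1, 2.0)   # -> 0.0
```

The same formula is inverted (v = v_min + v_norm · (v_max − v_min)) to de-normalize the parameters in the extraction step.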

ANN Approach
In this third step, firstly, the ANN training is performed in order to learn the relationship between the input parameters (NPnorm, Fnorm and CR) and the desired output (SR). The training is summarized in the left scheme of Figure 3. Secondly, the trained ANN is validated in the testing procedure, which is summarized in the right scheme of Figure 3.
The data-set obtained after the second step of the method is divided into three sets: training data-set, validation data-set and testing data-set. In the training procedure, the training data-set and validation data-set are used. The training set, which represents 75% of the data-set, is used to choose the best layer weights (W^(1) and W^(2)). The validation set, which represents 12.5% of the data-set, is used to prevent the ANN from overfitting. In the testing procedure, the test set, which represents the remaining 12.5% of the data-set, is used to evaluate and validate the trained ANN's performance. The training of the ANNs has been performed using the Neural Network Toolbox from MATLAB. At the beginning of the training procedure, each layer's weights are initialized with the Nguyen-Widrow method [53]. Iteratively, the training set is propagated through the network and the network's prediction error is calculated; the weights of the layers are then updated using the Levenberg-Marquardt (LM) algorithm [54]. With this iterative procedure, the neural network acquires the ability to approximate non-linear functions. Training stops when the gradient falls below 10^−8 or after six validation checks. The metric used to evaluate the neural network's performance in this work is the root-mean-squared error (RMSE).
As stated in Section 3.2, the ANN's performance depends on the weight values and the number of hidden neurons (K). In this paper, K is defined by an experimental procedure. Six neural network architectures with different numbers of hidden neurons are created; K varies from 5 to 30, in intervals of 5 neurons. For each architecture, 20 networks are trained to reduce the effect of randomness in the weight initialization, and the network that presents the smallest test error is chosen as the representative of the architecture. Therefore, there is a total of six neural networks representing six architectures. Lastly, their performance is compared, and the neural network that shows the smallest testing error is selected as the final network from which to extract the optimal parameter values.

Best Parameters Extraction
Finally, the extraction of the best combination of parameters is performed. To this end, new combinations are created: the parameter NP is taken from [10, 100] by steps of 5, the parameter F is taken from [0.1, 2] by steps of 0.05, and the parameter CR is taken from [0.1, 1] by steps of 0.05. As a consequence of this process, a data-set of 14,709 combinations of NP, F and CR is generated. Note that the ANN trained in the previous step has learned the relation of the normalized input parameters to the output SR variable, so the new combinations generated in this step also need to be normalized following Equations (16) and (17).
Subsequently, the new normalized combinations of parameters are propagated through the neural network, and their respective output predictions ŜR are saved. In this case, the data-set consists of 14,709 combinations of NP, F, CR and ŜR. These combinations have to be de-normalized following Equations (16) and (17). Finally, the combinations with the highest ŜR values are extracted from the data-set, and these are the optimal parameters to face the problem.
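The extraction step can be sketched as below. Here `predict_sr` is a hypothetical stand-in for the trained ANN (not the paper's model), used only to show the propagate-and-argmax logic over the finer grid of normalized inputs:

```python
import numpy as np
from itertools import product

def predict_sr(np_norm, f_norm, cr):
    """Hypothetical stand-in for the trained ANN: peaks at high
    normalized NP, low normalized F and high CR, mimicking the
    pattern reported in the paper."""
    return 1.0 - abs(np_norm - 0.9) - abs(cr - 0.9) - f_norm

# Finer grid used in the extraction step
NP_grid = np.arange(10, 101, 5)
F_grid = np.round(np.arange(0.1, 2.001, 0.05), 3)
CR_grid = np.round(np.arange(0.1, 1.001, 0.05), 3)

best_sr, best_combo = float("-inf"), None
for NP, F, CR in product(NP_grid, F_grid, CR_grid):
    # propagate the *normalized* inputs, as the ANN was trained on them
    sr = predict_sr((NP - 10) / 90, (F - 0.1) / 1.9, CR)
    if sr > best_sr:
        best_sr, best_combo = sr, (NP, F, CR)
# best_combo holds the (already de-normalized) optimal NP, F and CR
```

With the real trained network in place of `predict_sr`, `best_combo` is the optimal parameter combination (or, keeping all combinations with the highest ŜR, the set of optimal combinations).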

ANN Training Results
In this section, the results obtained after applying the third step of the proposed method are summarized. Table 1 presents the training and testing performance of each neural network architecture on the BBOB objective functions. The network with the smallest testing error is chosen to extract the optimal parameter values. Table 2 summarizes the chosen architecture for each objective function.
From the data given in Table 1, it can be seen that an increase in hidden layer neurons (K) generally leads to a decrease in test error. However, a smaller network with 15 hidden neurons obtained the best performance in Fcn6 and Fcn11. We can also see that the training and testing errors in Fcn23 and Fcn24 were zero. As explained in Section 5.2.3, the designed DE algorithm cannot converge to the optimal point in 20 generations with the required fitness in these two functions. Therefore, the obtained SR value with all parameters is zero, and consequently, it is a straightforward mapping problem for the neural network.

Table 2. The chosen hidden neuron number (K) for each objective function.

Function  K     Function  K
Fcn1      30    Fcn13     25
Fcn2      25    Fcn14     30
Fcn3      20    Fcn15     25
Fcn4      30    Fcn16     30
Fcn5      30    Fcn17     25
Fcn6      15    Fcn18     30
Fcn7      30    Fcn19     20
Fcn8      30    Fcn20     30
Fcn9      25    Fcn21     20
Fcn10     30    Fcn22     30
Fcn11     15    Fcn23      5
Fcn12     30    Fcn24      5

Optimal Parameters Results
In this section, the results obtained after applying the fourth step of the proposed methodology are summarized. As stated in Section 4.4, a data-set of 14,709 combinations of NP, F, CR and SR is obtained in each objective function. From this last data-set, the combination of parameters with the highest SR value is extracted. In this extraction, three behaviors are observed.

First Case: Multiple Combinations with High Effectiveness
In this first case, the DE algorithm has the ability to converge in 20 generations to the optimal point with the required fitness precision of ∆f = 10^−2. Furthermore, the required fitness is achieved with multiple parameter combinations and high effectiveness, defined as SR ≥ 0.95. That means that if the DE algorithm is executed 100 times, there are several combinations of parameters that will achieve the required fitness a minimum of 95 times.
In Table 3, the objective functions with this behavior and their respective number of combinations that achieved the required fitness are summarized. Of these multiple parameter combinations, any one is competent to solve the problem modeled by the objective function. There are numerous combinations of parameters for each objective function, so it is unmanageable to show all of them analytically in this paper; for that reason, they are published as a MATLAB file in the Supplementary Materials. However, as an example, all 14,709 parameter combinations of Fcn1 and Fcn2 are graphed in Figures 4 and 5. This illustrates the association of parameters and performance and helps analyze DE's behavior on the objective function. Each point reflects a combination of NP, F and CR, and the point color depends on the SR value: dark blue means that combination has a high SR, whereas dark red means it has a low SR. From these two figures, it can be seen that in Fcn1 the density of blue points is higher than in Fcn2. That is logical because, as defined in Table 3, the number of combinations with high effectiveness is bigger in Fcn1 than in Fcn2. Finally, by analyzing each objective function's extracted data-set, the best ranges are defined and summarized in Table 4. It is necessary to remark that these ranges are a generalization of the extracted analytical data-set; there are also combinations of parameters outside these ranges that achieve high performance. One can notice that the best ranges are generally represented by high NP and CR values and low F values, an effect visible in Figure 5.

Second Case: Medium Effectiveness

In this case, the DE algorithm can converge in 20 generations to the optimal point with the required fitness precision of ∆f = 10^−2, but with medium SR effectiveness. In this study, medium effectiveness is expressed as SR < 0.95.
In this second case, the combination with the highest effectiveness value is extracted. Table 5 shows the second case's objective functions, their best parameter combinations and the achieved SR. In all objective functions the best combination had the maximum NP, while F tended to the minimum and CR to the maximum. However, the SR value was distinct in each objective function. Note that depending on the user's specific problem, the achieved effectiveness may or may not be sufficient. If greater effectiveness is needed, the user should increase the number of DE generations or try other DE strategies or DE variants.

Function Number | Best Parameters Combination | SR Value

Lastly, as an example, all 14,709 parameter combinations of Fcn3 are graphed in Figure 6. It can be seen that the density of blue points is much lower compared to the red ones, and also compared to the blue ones of Figures 4 and 5. Furthermore, as in the first case, the best combinations are obtained with high NP and CR values and low F values.

Figure 6. The illustration of 14,709 parameter combinations of Fcn3. Each point reflects a combination of NP, F and CR. The point color depends on the SR precision value. The dark blue color means a high DE precision in achieving the requirements. The dark red color means a low DE precision in achieving the requirements.

Third Case: Zero Effectiveness
The DE algorithm could not converge in 20 generations to the optimal point with the required fitness precision of ∆f = 10^−2: all 14,709 combinations of parameters had a zero SR value. That means there is no combination of parameters that achieved the required fitness. To observe how far the real fitness is from the required one, the DE algorithm was executed t_runs times with an a priori competent parameter combination, and the mean fitness precision (∆f_mean) was calculated as follows:

∆f_mean = (1/t_runs) Σ_{i=1}^{t_runs} ∆f_i,

where ∆f_i is the difference between the best fitness value achieved in run i and the objective function's global optimal point value f_opt, and t_runs is the total number of independent DE runs, which in this case was 40. Table 6 shows the third case's objective functions, the used combination of parameters and the achieved ∆f_mean. It is observed that the mean fitness precision was much higher than the required one, so the obtained fitness value was far from the optimal one. In this circumstance, the user has two main possibilities: the definition of more flexible requirements or the use of another DE strategy or variant.

Table 6. The third case objective functions, the used combination of parameters and the mean fitness value ∆f_mean.
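The ∆f_mean measure can be computed as in this short sketch (function and argument names are illustrative):

```python
import numpy as np

def mean_fitness_precision(best_fitness_per_run, f_opt):
    """Delta-f_mean: the average distance of each run's best fitness
    from the global optimum f_opt, over t_runs independent runs."""
    deltas = np.asarray(best_fitness_per_run, dtype=float) - f_opt
    return deltas.mean()
```

For instance, three runs ending at fitness 1.5, 2.5 and 2.0 on a function with f_opt = 1.0 give ∆f_mean = 1.0, far above the required precision of 10^−2.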

Validation of the Method
In the previous section, the best combinations of parameters for each objective function have been shown. As explained, these combinations are obtained from the prediction (ŜR) of the ANN. In this section, these predictions, given in the Supplementary Materials and in Tables 5 and 6, are checked by running the DE algorithm with the extracted optimal parameters and calculating their respective SR value.
The obtained results are presented in Table 7. This table gives, for each objective function, the used combination of parameters and the mean and standard deviation of SR averaged over 10 independent runs. For the first case functions, one of the multiple extracted optimal combinations is chosen; for the second and third case functions, the combinations defined in Tables 5 and 6 are chosen. Next, a comparison with prevailing tuning rules is performed in order to verify and analyze the proposed method's performance. In the introduction, the four most popular tuning rules were identified: Storn and Price [9], Gämperle et al. [12], Rönkkönen et al. [13] and Zielinsky et al. [14]. Following these rules, a parameter combination is designed and evaluated with ten independent DE algorithm runs for each objective function. The designed parameter combinations are as follows:
The statistical results achieved with the combinations described above are summarized in Table 8. It includes the mean and standard deviation value of SR calculated over ten DE independent runs.
Several interesting observations can be drawn from the data reported in Tables 7 and 8 and from the results presented in this paper. First, we can conclude that the ANN predictions are accurate. In the first and third case functions, the ŜR predictions and the calculated SR values are practically identical. In the second case functions, there is a mean difference of 2.3% between these two values. This difference is not related to the ANN's accuracy; rather, it is provoked by the algorithm's intrinsic stochastic component, since it is difficult to obtain identical results in two independent DE runs. Even so, accounting for this characteristic of the algorithm, the present results confirm that the ANN predictions are precise.
Second, the proposed method outperforms the rules of Storn and Price [9], Gämperle et al. [12], Rönkkönen et al. [13] and Zielinsky et al. [14]. Third, in this study, the typical pattern among the optimal parameter values has been a high population size (NP), a low mutation factor (F) and a high crossover factor (CR). Consequently, the best performance is achieved with many individuals taking small steps while maintaining high diversity. Furthermore, the claim made in [55] is corroborated, namely that in low-dimensional problems (D < 30) a population size of 100 individuals is a competent choice, whereas a population size below 50 individuals is rarely recommended.
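The role of CR in this pattern can be illustrated with the binomial crossover step alone: CR controls the fraction of trial-vector components inherited from the mutant (one component is always forced from it), while F separately scales the difference vector and thus the step length. The sketch below, with illustrative dimension and sample sizes, measures the mean changed fraction for a low and a high CR value.

```python
import random

rng = random.Random(42)

def crossover_mask(dim, CR, rng):
    """Binomial crossover mask: component j comes from the mutant with
    probability CR; one index jr is always taken from the mutant."""
    jr = rng.randrange(dim)
    return [rng.random() < CR or j == jr for j in range(dim)]

def mean_changed_fraction(CR, trials=2000, dim=10):
    """Average fraction of components inherited from the mutant."""
    changed = sum(sum(crossover_mask(dim, CR, rng)) for _ in range(trials))
    return changed / (trials * dim)

# A low CR changes few components per trial (small, coordinate-wise moves),
# whereas a high CR mixes in almost the whole mutant, which is what the
# optimal combinations found in this study favour.
frac_low = mean_changed_fraction(0.1)
frac_high = mean_changed_fraction(0.9)
```

The measured fractions sit slightly above CR itself because of the forced component, which guarantees every trial vector differs from its parent.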
Fourth, the main weakness of the tuning rules defined in Table 8 is that they apply the same rule to every problem; that is, they do not adapt to the characteristics of the problem at hand. Here, this approach has been shown not to produce good results. Among the tuning rules, Storn and Price [9] and Zielinsky et al. [14] obtain the best performance on the separable functions (Fcn1–Fcn5), and Rönkkönen et al. [13] on the non-separable functions (Fcn6–Fcn24). The main reason is that, on separable functions, Storn and Price [9] and Zielinsky et al. [14] use a significantly larger CR value than Rönkkönen et al. [13], while on non-separable functions Rönkkönen et al. [13] employ larger NP and CR values. On the other hand, the tuning rules generally obtain their best results on Fcn1, Fcn5, Fcn7, Fcn14, Fcn21 and Fcn22. Within the first-case group, these are the functions where the DE algorithm converges most easily to the optimal point with the required fitness. Indeed, Table 3 shows that these are precisely the functions for which the number of combinations achieving the required fitness is highest. The most significant advantage of these tuning rules is that no new approach needs to be designed or coded.
Finally, the proposed method can be extrapolated to different problems, and this is the key factor differentiating it from other tuning rules. In the DE algorithm, there is a strong interaction between the parameters NP, F and CR, the DE strategy, the objective function to optimize, the global search domain and the requirements such as the maximum number of generations and the required fitness. In the designed method, once the objective function is defined, the DE strategy is chosen and the requirements are set, the optimal parameter values are calculated in accordance with these characteristics. In summary, this spares the user from having to guess which parameter values are competent enough to solve the problem, and from resorting to the tedious trial-and-error approach.
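The extraction step this workflow relies on can be sketched as a scan over candidate combinations, selecting the one with the highest predicted SR. Below, `predict_sr` stands in for the trained ANN-MLP; since the trained network is not reproduced here, a toy analytical stand-in is used, shaped only to mimic the high-NP, low-F, high-CR pattern reported above, and the candidate grids are illustrative.

```python
import itertools

def best_parameters(predict_sr, NP_grid, F_grid, CR_grid):
    """Return the candidate (NP, F, CR) combination with the highest
    predicted success rate, together with that prediction."""
    best, best_sr = None, float("-inf")
    for NP, F, CR in itertools.product(NP_grid, F_grid, CR_grid):
        sr = predict_sr(NP, F, CR)
        if sr > best_sr:
            best, best_sr = (NP, F, CR), sr
    return best, best_sr

# Toy stand-in for the trained ANN-MLP: monotonically prefers large NP,
# small F and large CR, mimicking the pattern reported in the text.
def toy_model(NP, F, CR):
    return NP / 100.0 - F + CR

params, predicted_sr = best_parameters(
    toy_model,
    NP_grid=[25, 50, 100], F_grid=[0.3, 0.5, 0.9], CR_grid=[0.1, 0.5, 0.9])
```

With a real trained model in place of `toy_model`, the same scan yields the combinations that are then validated by the independent DE runs of Table 7.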

Conclusions
The present study focuses on a method to tune the optimal parameters of the DE algorithm. In the DE algorithm, there is a strong interaction between the parameters NP, F and CR, the DE strategy, the objective function to optimize, the global search domain and the requirements such as the maximum number of generations and the required fitness. The proposed method operates precisely within the interplay of these variables.
The core of the method is the use of an ANN-MLP to predict the optimal parameter values as a function of the objective function, the DE strategy and the user-defined requirements.
Subject to the DE algorithm's ability to solve the problem, three different cases may arise when applying this method. For each case, the procedure to follow is specified in detail.
The final results show that the ANN's predictions are more accurate for the first- and third-case objective functions than for the second-case ones. This difference arises because, for the second-case functions, the effect of DE's stochastic component is more significant than in the other two cases. Even so, taking this stochastic characteristic of DE into account, the present results confirm that the predictions are considerably accurate.
Finally, a comparison with four prevailing tuning rules is performed, and it is concluded that the proposed method outperforms these rules. Future research could examine the method's application to high-dimensional problems with different user-defined requirements and DE strategies. In future work, we aim to apply the proposed method to robotic-manipulator axis interpolation combined with automated guided vehicle movement.