1. Introduction
Nonlinear unconstrained optimization is an active research area, since many real-life problems can be modeled as continuous nonlinear optimization problems [1]. To deal with this kind of optimization problem, various nature-inspired, population-based search mechanisms have been developed [2]. A few of these are Differential Evolution (DE) [3,4], Evolution Strategies (ES) [2,5], Particle Swarm Optimization (PSO) [6,7,8,9], Ant Colony Optimization (ACO) [10,11,12,13], Bacterial Foraging Optimization (BFO) [14,15], Genetic Algorithm (GA) [16,17,18], Genetic Programming (GP) [2,19,20,21], Cuckoo Search (CS) [22,23], Estimation of Distribution Algorithm (EDA) [24,25,26,27,28] and Grey Wolf Optimization (GWO) [29,30].
DE does not need specific information about the problem at hand [31]. That is why DE has been applied to a wide variety of optimization problems over the past two decades [30,32,33,34]. DE has merits over PSO, GA, ES and ACO, as it depends on only a few control parameters. It is also easy to implement and user friendly [2]. Due to these advantages, we selected DE to perform the global search in the suggested hybrid design. In addition, because of its simplicity, DE has been widely applied to practical optimization problems [35,36,37,38,39,40,41,42]. However, its convergence to known optima is not guaranteed [2,31,43]. Stagnation of DE is another weakness identified in various studies [31].
Traditional search approaches, such as the Nelder–Mead algorithm, Steepest Descent and the Davidon–Fletcher–Powell (DFP) method [44], may be hybridized with DE to improve its search capability. Algorithms that embed a local search (LS) in a global search to enhance solution quality are called Memetic Algorithms (MAs) [31,45]. Some recent MAs can be found in [1,31]. Very recently, Broyden–Fletcher–Goldfarb–Shanno LS was merged with an adaptive DE version, JADE [46], which produced the MA, Hybridization of Adaptive Differential Evolution with an Expensive Local Search Method [47]. In the majority of the established designs, LS is applied to the overall best solutions, while in our design it is applied to the migrated elements of the archive. In addition, the population is adaptively decreased.
In this work, we propose a hybrid algorithm that combines DFP [44,48,49] with a recently developed algorithm, RJADE/TA [50], to enhance RJADE/TA’s performance in local regions. The main idea is to apply DFP to the elements that are shifted to the archive and to record both solutions, the one brought forward and the new potential solution, to reduce the chance of losing the globally best solution. For this purpose, firstly, DFP is applied to the archived information. Secondly, a decreasing population mechanism is suggested. The new algorithm is denoted by RJADE/TAADPLS.
The structure of this work is as follows. Section 2 presents primary DE, DFP, and RJADE/TA methods. Section 3 reviews the related literature. In Section 4, the suggested hybrid algorithm is outlined. Section 5 is devoted to the validation of results achieved by RJADE/TAADPLS. Finally, the conclusions are summarized in Section 6.
2. Primary DE, DFP, and RJADE/TA
We reviewed traditional DE and JADE in detail in our previous works [47,50]. Here, we briefly review primary DE, DFP and RJADE/TA for ready reference.
2.1. Primary DE
DE [3,4] starts with a random population in the given search region. After initialization, a mutation strategy is applied: three different individuals are randomly selected from the population, and the scaled difference of two of them is added to the third to produce a mutant vector. Following mutation, the mutant and the target vectors are combined through a crossover operator to produce a trial vector. Finally, the target and trial vectors are compared based on a fitness function, and the better one is selected for the next generation (see Lines 7–20 of Algorithm 1).
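The three steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation used in the experiments: it uses the common rand/1/bin variant, fixed values F = 0.5 and CR = 0.9, and the sphere function as a stand-in objective; the names `de_generation` and `sphere` are hypothetical.

```python
import random

def de_generation(pop, fitness, f_obj, F=0.5, CR=0.9, rng=random):
    """One generation of classic DE (rand/1/bin) for minimization."""
    n_pop, n_dim = len(pop), len(pop[0])
    new_pop, new_fit = [x[:] for x in pop], fitness[:]
    for i in range(n_pop):
        # Three distinct individuals, all different from the target i.
        r1, r2, r3 = rng.sample([k for k in range(n_pop) if k != i], 3)
        # Mutation: scaled difference of two individuals added to the third.
        mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(n_dim)]
        # Binomial crossover; position d_rand always comes from the mutant.
        d_rand = rng.randrange(n_dim)
        trial = [mutant[d] if (d == d_rand or rng.random() < CR) else pop[i][d]
                 for d in range(n_dim)]
        # Greedy selection: keep the trial only if it is no worse.
        f_trial = f_obj(trial)
        if f_trial <= fitness[i]:
            new_pop[i], new_fit[i] = trial, f_trial
    return new_pop, new_fit

def sphere(x):
    """Stand-in objective (assumption): sum of squares, minimum at the origin."""
    return sum(v * v for v in x)

rng = random.Random(0)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(5)] for _ in range(20)]
fit = [sphere(x) for x in pop]
initial_best = min(fit)
for _ in range(100):
    pop, fit = de_generation(pop, fit, sphere, rng=rng)
final_best = min(fit)
```

Because selection is greedy, the best fitness in the population can never get worse from one generation to the next, which is the property the adaptive variants discussed later build on.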
Algorithm 1 Outline of the RJADE/TA Procedure.
 1: To form the primary population ${P}_{p}$, produce ${N}^{[pop]}$ vectors uniformly and randomly, ${\mathbf{w}}_{[j,{s}_{1}]}^{\{y\}},{\mathbf{w}}_{[j,{s}_{2}]}^{\{y\}},\dots ,{\mathbf{w}}_{[j,{s}_{{N}^{[pop]}}]}^{\{y\}}$;
 2: ${M}^{[first]}={M}^{[sec]}=\varnothing $;
 3: Initialize $\lambda CR=\lambda F=0.5$; $p=5\%$; $c=0.1$;
 4: Set ${S}_{CR}={S}_{F}=\varnothing $;
 5: Evaluate ${P}_{p}$;
 6: while $FEs<MaxFEs$ do
 7:   ${F}_{j}=rand(\lambda F,0.1)$;
 8:   Randomly sample ${\mathbf{w}}_{(best)}^{\{p,y\}}$ in $100p\%$ of the population;
 9:   Choose ${\mathbf{w}}_{[i,{s}_{1}]}^{\{y\}}\ne {\mathbf{w}}_{[i,s]}^{\{y\}}$ in ${P}_{p}$;
10:   Choose ${\tilde{\mathbf{w}}}_{[i,{s}_{2}]}^{\{y\}}\ne {\mathbf{w}}_{[i,{s}_{2}]}^{\{y\}}$ in ${P}_{p}\cup {M}^{[first]}$ by random selection;
11:   Produce the mutant vector ${\mathbf{w}}_{[i,mut]}^{\{y\}}$ as ${\mathbf{w}}_{[i,mut]}^{\{y\}}={\mathbf{w}}_{[i,s]}^{\{y\}}+{F}_{j}({\mathbf{w}}_{(best)}^{\{p,y\}}-{\mathbf{w}}_{[i,s]}^{\{y\}})+{F}_{j}({\mathbf{w}}_{[i,{s}_{1}]}^{\{y\}}-{\tilde{\mathbf{w}}}_{[i,{s}_{2}]}^{\{y\}})$;
12:   Produce the trial vector ${\mathbf{q}}_{[i,j]}^{\{y\}}$ as follows.
13:   for $i=1$ to n do
14:     if $i<{i}_{rand}$ or $rand(0,1)<C{R}_{j}$ then
15:       ${q}_{[i,j]}^{\{y\}}={w}_{[i,mu{t}_{j}]}^{\{y\}}$;
16:     else
17:       ${q}_{[i,j]}^{\{y\}}={w}_{[i,{s}_{j}]}^{\{y\}}$;
18:     end if
19:   end for
20:   Best selection $\{{\mathbf{w}}_{[i,s]}^{\{y\}},{\mathbf{q}}_{[i,s]}^{\{y\}}\}$;
21:   if ${\mathbf{q}}_{[i,s]}^{\{y\}}$ is the best then
22:     ${\mathbf{w}}_{[i,s]}^{\{y\}}\to {M}^{[first]}$, ${CR}_{j}\to {S}_{CR}$, ${F}_{j}\to {S}_{F}$;
23:   end if
24:   If size of ${M}^{[first]}>{N}^{[pop]}$, delete extra solutions from ${M}^{[first]}$ randomly;
25:   Update ${M}^{[sec]}$ as follows.
26:   if $y=\kappa $ then
27:     ${\mathbf{w}}_{[j,best]}^{\{y\}}\to {M}^{[sec]}$;
28:     ${P}_{p}-{\mathbf{w}}_{[j,best]}^{\{y\}}$;
29:     Centroid calculation $\to {\mathbf{w}}_{[j,c]}^{\{y\}}=\frac{1}{{N}^{[pop]}-1}{\sum}_{i=2}^{{N}^{[pop]}}{\mathbf{w}}_{[j,{s}_{i}]}^{\{y\}}$;
30:     Reflection mechanism $\to {\mathbf{w}}_{[j,r]}^{\{y\}}={\mathbf{w}}_{[j,c]}^{\{y\}}+({\mathbf{w}}_{[j,c]}^{\{y\}}-{\mathbf{w}}_{[j,best]}^{\{y\}})$;
31:   end if
32:   $\lambda CR=(1-c)\cdot \lambda CR+c\cdot {mean}_{A}({S}_{CR})$;
33:   $\lambda F=(1-c)\cdot \lambda F+c\cdot {mean}_{L}({S}_{F})$;
34: end while
35: Result: The best solution ${\mathbf{w}}_{(best)}^{\{y\}}$ corresponding to the minimum function value $f(\mathbf{w})$ from ${P}_{p}\cup {M}^{[sec]}$ in the optimization.
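The parameter adaptation in Lines 32 and 33 of Algorithm 1 can be illustrated as follows. The sketch assumes, as in JADE, that ${mean}_{A}$ is the arithmetic mean and ${mean}_{L}$ is the Lehmer mean (sum of squares divided by the sum); leaving the parameters unchanged when no trial vector succeeded is our own safeguard, and the function names are hypothetical.

```python
def arithmetic_mean(values):
    return sum(values) / len(values)

def lehmer_mean(values):
    # Lehmer mean sum(F^2) / sum(F); it gives more weight to larger F values.
    return sum(v * v for v in values) / sum(values)

def update_location_parameters(lam_cr, lam_f, s_cr, s_f, c=0.1):
    """Lines 32-33 of Algorithm 1; parameters stay unchanged if no trial succeeded."""
    if s_cr:
        lam_cr = (1 - c) * lam_cr + c * arithmetic_mean(s_cr)
    if s_f:
        lam_f = (1 - c) * lam_f + c * lehmer_mean(s_f)
    return lam_cr, lam_f

# Two successful trials in this generation (illustrative values).
lam_cr, lam_f = update_location_parameters(0.5, 0.5, s_cr=[0.2, 0.4], s_f=[0.5, 1.0])
```

With $c=0.1$, each update moves the location parameters only one tenth of the way towards the mean of the successful values, so a single lucky generation cannot dominate the adaptation.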

2.2. Reflected Adaptive Differential Evolution with Two External Archives (RJADE/TA)
RJADE/TA [50] is an adaptive DE variant. Its main idea is to archive the comparatively best solutions of the population at regular intervals of the optimization process and to reflect the overall poor solutions. RJADE/TA adds the techniques presented in Table 1 to JADE.
To prevent premature convergence and stagnation, the best solution found so far, ${\mathbf{w}}_{[j,best]}^{\{y\}}$, is migrated to the second archive ${M}^{[sec]}$ and replaced in the population by its reflection. RJADE/TA maintains two archives, termed ${M}^{[first]}$ and ${M}^{[sec]}$ for convenience. The first update of the second archive, ${M}^{[sec]}$, is made after half of the available resources ($MaxFEs$) are utilized. Afterwards, ${M}^{[sec]}$ is updated adaptively at regular intervals of generations (see Algorithm 1).
The overall best candidates are transferred to ${M}^{[sec]}$, whereas ${M}^{[first]}$ records the recently explored poor solutions. The size of ${M}^{[first]}$ is fixed, equal to the population size ${N}^{[pop]}$, while the size of ${M}^{[sec]}$ may exceed ${N}^{[pop]}$. As ${M}^{[sec]}$ keeps information of all best solutions found, no solution is deleted from it. ${M}^{[sec]}$ records only one solution of the current iteration, which may be a child or a parent, whereas ${M}^{[first]}$ keeps a history of more than one inferior “parent solution” only. ${M}^{[first]}$ is updated at every iteration, and ${M}^{[sec]}$, initialized as ∅, is updated adaptively with a gap of $\kappa $ iterations. The recorded history of ${M}^{[first]}$ is utilized in reproduction later on. In contrast, in ${M}^{[sec]}$, the recorded best individual is reflected to produce a new solution, which is then sent to the population. Once a candidate solution is posted to ${M}^{[sec]}$, it remains passive during the whole optimization. When the search procedure terminates, the recorded information contributes towards the selection of the best candidate solution.
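The archive-and-reflect step of the second archive (Lines 26 to 31 of Algorithm 1) can be sketched as below. This is an illustration under stated assumptions rather than the reference implementation: the centroid is taken over the population without the best individual, the reflected point is re-evaluated immediately, and the names `archive_and_reflect` and `sphere` are hypothetical.

```python
def archive_and_reflect(pop, fitness, f_obj, second_archive):
    """Archive the current best individual and replace it by its reflection."""
    n_dim = len(pop[0])
    best = min(range(len(pop)), key=lambda k: fitness[k])
    w_best = pop[best]
    # The archived copy stays passive until the final best solution is chosen.
    second_archive.append((w_best[:], fitness[best]))
    # Centroid of the population without the best individual (assumed reading).
    others = [x for k, x in enumerate(pop) if k != best]
    centroid = [sum(x[d] for x in others) / len(others) for d in range(n_dim)]
    # Reflection: w_r = w_c + (w_c - w_best).
    reflected = [centroid[d] + (centroid[d] - w_best[d]) for d in range(n_dim)]
    pop[best], fitness[best] = reflected, f_obj(reflected)
    return reflected

def sphere(x):
    """Stand-in objective (assumption)."""
    return sum(v * v for v in x)

pop = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
fitness = [sphere(x) for x in pop]
second_archive = []
reflected = archive_and_reflect(pop, fitness, sphere, second_archive)
# The final answer is taken from the population together with the archive.
best_overall = min(min(fitness), min(f for _, f in second_archive))
```

In this toy example the reflected point is worse than the archived one, yet nothing is lost: the archived solution still determines `best_overall`, which is exactly the role of the passive second archive.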
2.3. Davidon–Fletcher–Powell (DFP) Method
The DFP method is a variable metric method, which was first proposed by Davidon [51] and then modified by Powell and Fletcher [52]. It belongs to the class of gradient-dependent LS methods. If an appropriate line search is used, the DFP method is assured to converge to a minimum [49]. It calculates the difference between the old and new points, as given in Equation (1). Then, it finds the difference of the gradients at these points, as calculated in Equation (2). It then updates the Hessian matrix $\mathbf{H}$ as presented in Equation (3). Afterwards, it locates the optimal search direction ${\mathbf{s}}^{\left[j\right]}$ with the help of the Hessian matrix information, as calculated in Equation (4). Finally, the output solution ${\mathbf{w}}^{[j+1]}$ is computed by Equation (5), where ${\alpha}^{\left[j\right]}$ is calculated by a line search method; the golden section search method is used in this work.
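The five steps can be sketched with the textbook form of DFP, which we assume corresponds to Equations (1) to (5): the point difference $\delta$, the gradient difference $\gamma$, the update $\mathbf{H}\leftarrow \mathbf{H}+\delta {\delta}^{T}/({\delta}^{T}\gamma )-\mathbf{H}\gamma {\gamma}^{T}\mathbf{H}/({\gamma}^{T}\mathbf{H}\gamma )$ of the inverse-Hessian approximation, the direction $\mathbf{s}=-\mathbf{H}\mathbf{g}$, and the step $\mathbf{w}+\alpha \mathbf{s}$. The step-length interval $[0,1]$, the analytical gradient, the curvature safeguard and all function names are assumptions made for this illustration.

```python
import math

def golden_section(phi, a=0.0, b=1.0, tol=1e-8):
    """Minimize a unimodal one-dimensional function phi on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if phi(c) < phi(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                        # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def dfp(f, grad, w, iterations=2, alpha_max=1.0):
    """A few DFP iterations starting from w, with H initialized to the identity."""
    n = len(w)
    H = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    g = grad(w)
    for _ in range(iterations):
        s = [-v for v in mat_vec(H, g)]                       # search direction
        alpha = golden_section(
            lambda a: f([wi + a * si for wi, si in zip(w, s)]), 0.0, alpha_max)
        w_new = [wi + alpha * si for wi, si in zip(w, s)]     # new point
        g_new = grad(w_new)
        delta = [x - y for x, y in zip(w_new, w)]             # point difference
        gamma = [x - y for x, y in zip(g_new, g)]             # gradient difference
        Hg = mat_vec(H, gamma)
        dg, gHg = dot(delta, gamma), dot(gamma, Hg)
        if dg > 1e-12 and gHg > 1e-12:   # skip the update if curvature is degenerate
            for r in range(n):
                for c in range(n):
                    H[r][c] += delta[r] * delta[c] / dg - Hg[r] * Hg[c] / gHg
        w, g = w_new, g_new
    return w

def quad(w):
    """Illustrative convex quadratic with minimum at (1, -2)."""
    return (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2

def quad_grad(w):
    return [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]

w_opt = dfp(quad, quad_grad, [0.0, 0.0], iterations=2)
```

On this two-dimensional quadratic, two iterations with an accurate line search are enough to reach the minimizer, which gives some intuition for why a small number of DFP iterations can already fine-tune an archived solution.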
3. Related Work
To fix the abovementioned weaknesses of DE, many researchers have merged various LS techniques into DE. Nelder–Mead LS is hybridized with DE [53] to improve the local exploitation of DE. Recently, two new LS strategies were proposed and hybridized iteratively with DE in [1,31]. These hybrid designs show performance improvement over the algorithms in comparison. Two LS strategies, Trigonometric and Interpolated, are inserted in DE to enhance its poor exploration. Two other LS techniques are merged in DE along with a restart strategy to improve its global exploration [54]. The results reported for this algorithm are better than those of the algorithms it was compared with. Furthermore, alopex-based LS is merged in DE [55] to improve its population diversity. In another experiment, DE’s slow convergence is improved by combining orthogonal design LS [56] with it. To avoid local optima in DE, random LS is hybridized [57] with it. On the other hand, some researchers borrowed DE’s mutation and crossover for traditional LS methods (see, e.g., [58,59]).
To the best of our knowledge, none of the algorithms reviewed in this section integrates DFP into DE’s framework. Further, the proposed work maintains two archives: the first one stores inferior solutions and the second one keeps the best solutions migrated to it by the global search. Furthermore, the second archive improves the solution quality by applying DFP there. Hence, our proposed work has the advantage that the second archive keeps complete information of the solution before and after LS. This way, any good solution found is not lost. It also adopts a population decreasing mechanism.
5. Validation of Results
In this section, first we briefly illustrate the five algorithms used for comparison and then the experimental results are presented.
5.1. Global Search Algorithms in Comparison
Among the five algorithms for comparison, the first two, RJADE/TA and RJADE/TALS, are our recently proposed hybrid algorithms, while the remaining three, jDE, jDEsoo and jDErpo, are non-hybrid but adaptive and popular DE variants.
5.1.1. RJADE/TA
RJADE/TA [50], similar to RJADE/TAADPLS, utilizes two archives for information. One of the archives stores inferior solutions, while the other keeps a record of superior solutions. However, in RJADE/TAADPLS, the second archive stores elite solutions, which are then improved by DFP. Further details of RJADE/TA can be seen in Section 2.2.
5.1.2. RJADE/TALS
RJADE/TALS [60] is a very recently proposed hybrid of global and local search. However, it differs from RJADE/TAADPLS in that it utilizes a reflection mechanism and a fixed population, while RJADE/TAADPLS uses DFP as LS without reflection, together with a population decreasing approach.
5.1.3. jDE
jDE [61] is an adaptive version of DE, which is based on self-adaptation of the control parameters F and $CR$. In jDE, the parameters F and $CR$ keep changing during the evolution process, while the population size ${N}^{[pop]}$ is kept unchanged. Every solution in jDE has its own F and $CR$ values. Better individuals are produced due to better values of F and $CR$, and such parameter values propagate to the upcoming generations of jDE. Because of its unique mechanism and simplicity, jDE has gained popularity among researchers in the field of optimization. Since its introduction, it has frequently been used as a baseline for comparison with new algorithms.
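The self-adaptation idea can be illustrated with the update rule commonly associated with jDE. The constants below (lower bound 0.1 and range 0.9 for F, and regeneration probabilities of 0.1) are the values usually quoted for jDE and are assumptions here, since they are not stated in this paper; the function name `jde_propose` is hypothetical.

```python
import random

F_LOWER, F_RANGE = 0.1, 0.9   # assumed jDE constants: F is drawn from [0.1, 1.0]
TAU_F, TAU_CR = 0.1, 0.1      # assumed probabilities of regenerating F and CR

def jde_propose(F_i, CR_i, rng=random):
    """Propose control parameters for the trial vector of individual i."""
    # With a small probability each parameter is resampled, otherwise inherited.
    F_new = F_LOWER + rng.random() * F_RANGE if rng.random() < TAU_F else F_i
    CR_new = rng.random() if rng.random() < TAU_CR else CR_i
    return F_new, CR_new

rng = random.Random(1)
proposals = [jde_propose(0.5, 0.9, rng=rng) for _ in range(1000)]
```

The proposed pair replaces the individual's own values only when the resulting trial vector wins the selection, which is how good parameter values survive into later generations.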
5.1.4. jDEsoo and jDErpo
jDEsoo [62] is a new version of DE that deals with single-objective optimization. jDEsoo subdivides the population and implements more than one DE strategy. To enhance the diversity of the population, it removes those individuals that have remained unchanged in the last few generations. It was primarily developed for the CEC 2013 competition.
jDErpo [61] is an improvement of jDE. It is based on the following mechanisms. Firstly, it incorporates two mutation strategies, different from jDE, DE and RJADE/TA. Secondly, it uses an adaptively increasing strategy for adjusting the lower bounds of the control parameters. Thirdly, it utilizes two pairs of control parameters for two different mutation strategies, in contrast to the one pair of parameters used in jDE, classic DE and RJADE/TA. jDErpo was also specially designed for solving the CEC 2013 competition problems.
5.2. Parameter Settings/Termination Criteria
Experiments were performed on the 28 benchmark test problems of CEC 2013 [63]. They are referred to as BMF1–BMF28. The parameter settings were kept the same as required in [63]. The dimension n of each problem was set to 10, the population size ${N}^{[pop]}$ to 100, and $MaxFEs$ to $10{,}000\times n$. The number of elite solutions r was kept as 1. The number of iterations w of DFP was set to 2. The reduction of population per archive update r was also chosen as 1. The gap $\kappa $ between successive updates of ${M}^{[sec]}$ was kept as 20. The optimization was terminated if either $MaxFEs$ was reached or the difference between the means of function error values was less than ${10}^{-8}$, as suggested in [50,63].
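For reference, these settings can be collected in a small configuration object. This is only a sketch: the class and field names are hypothetical, and the termination test is simplified to a single error value falling below the tolerance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentSettings:
    n: int = 10                 # problem dimension
    n_pop: int = 100            # initial population size
    fes_per_dim: int = 10_000   # MaxFEs = 10,000 * n
    elite_count: int = 1        # r: solutions migrated to the archive
    dfp_iterations: int = 2     # w: DFP iterations per archived solution
    pop_reduction: int = 1      # r: individuals removed per archive update
    kappa: int = 20             # generations between archive updates
    error_tol: float = 1e-8     # error threshold for early termination

    @property
    def max_fes(self) -> int:
        return self.fes_per_dim * self.n

def should_terminate(settings, fes_used, error):
    """Stop when the budget is spent or the error falls below the tolerance."""
    return fes_used >= settings.max_fes or error < settings.error_tol

settings = ExperimentSettings()
```

With $n=10$, the budget evaluates to 100,000 function evaluations per run.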
5.3. Comparison of RJADE/TAADPLS against Established Global Optimizers
The mean of function error values, i.e., the difference between the known and approximated values, for jDE, jDEsoo, jDErpo, RJADE/TA and RJADE/TAADPLS is presented in Table 2. In Table 2, + indicates that the algorithm won against our algorithm, RJADE/TAADPLS; − indicates that the particular algorithm lost against our algorithm; and = indicates that both algorithms obtained the same statistics. The comparison showed that RJADE/TAADPLS performed better overall than all of its competitors. RJADE/TAADPLS achieved better mean error values than jDE and jDEsoo on 17 out of 28 problems; the many − signs in columns 2 and 3 of Table 2 support this fact. In contrast, jDE and jDEsoo performed better on six and eight problems, respectively.
RJADE/TAADPLS showed performance improvement against the jDErpo and RJADE/TA algorithms as well. In general, RJADE/TAADPLS performed better than all algorithms in comparison, especially in the category of multimodal and composite functions. The proposed mechanism not only uses LS for local tuning without reflection, but also implements an ADP approach, which could be the reasons for its good performance.
5.4. Performance Evaluation of RJADE/TAADPLS Versus RJADE/TALS
We empirically studied the performance of RJADE/TAADPLS against RJADE/TALS. Table 3 presents the mean results achieved by both methods in 51 runs. The best results are shown in bold face. The results in Table 3 show that the proposed RJADE/TAADPLS performed better than RJADE/TALS on 13 out of 28 problems. On five problems, they obtained the same results, and RJADE/TALS performed better on the remaining 10 test problems.
It is interesting to note that RJADE/TAADPLS showed outstanding performance in the category of composite functions, where it solved BMF22–BMF28 better than RJADE/TALS. Again, the two different mechanisms of RJADE/TAADPLS, the ADP approach and the LS without reflection, could be the reasons for its better performance. Further, Table 4 presents the percentage performance of RJADE/TAADPLS and RJADE/TALS. Since both algorithms showed equal results on five test problems, we compared the percentages for the remaining 23 problems. As shown in Table 4, RJADE/TAADPLS was able to solve $57\%$ of problems against $43\%$ of problems solved by RJADE/TALS out of 23 test instances.
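The percentages follow directly from the win counts reported above once the five ties are excluded; a short check (variable names are ours):

```python
wins_adpls, wins_tals, ties = 13, 10, 5

total = wins_adpls + wins_tals + ties   # all 28 benchmark problems
decided = total - ties                  # 23 problems with a clear winner

pct_adpls = round(100 * wins_adpls / decided)   # 13 / 23, rounded to a whole percent
pct_tals = round(100 * wins_tals / decided)     # 10 / 23, rounded to a whole percent
```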
Furthermore, box plots were drawn from all means obtained in 25 runs of RJADE/TA, RJADE/TALS and RJADE/TAADPLS. Figure 2 and Figure 3 plot one function out of every three. Box plots are very good tools to show the spread of the data. Figure 2b–d shows that the boxes obtained by RJADE/TAADPLS were lower than the other two boxes, indicating its better performance. Figure 2a presents the plot of BMF3, in which the boxes of the two algorithms in comparison were lower than that of RJADE/TAADPLS; thus, they performed better. Figure 3b,d,f shows that the boxes obtained by RJADE/TAADPLS on BMF19, BMF25 and BMF27 were lower than the boxes of RJADE/TA and RJADE/TALS, indicating higher performance of RJADE/TAADPLS. Figure 3a,c,e shows that the two other algorithms were better on the respective test instances.
5.5. Analysis/Discussion of Various Parameters Used
The number of solutions r to be migrated to the archive and undergo DFP was kept as 1, since DFP is an expensive method due to gradient calculation, and applying it to more than one solution might slow down the algorithm. Users may choose two, but at most three is suggested. The number of iterations w of DFP applied to archive elements was kept as 2, since DFP could fine-tune the solutions in only two iterations. Moreover, the decreasing number r of population per archive update was also chosen as 1. Since the archive was updated after a regular gap of global evolution, the population was decreased by one each time. If it were reduced by more than one solution, a stage would come where the diversity of the population would decrease and the algorithm would either stop at local optima or converge prematurely. We suggest that the decreasing number be at most 3. In general, these parameters are user defined but should be chosen wisely so that the global and local searches complement each other, instead of causing premature convergence or stagnation.
6. Conclusions
This paper proposed a new hybrid algorithm, RJADE/TAADPLS, in which an LS mechanism, DFP, is combined with a DE-based global search scheme, RJADE/TA, to benefit from their searching capabilities in local and global regions. Further, a population decreasing mechanism is also adopted. The key idea is to shift the overall best solution to the archive at specified regular intervals of RJADE/TA, where it undergoes DFP for further improvement. The archive stores both the best solution and its improved form. Furthermore, the population is decreased by one solution at each archive update. We evaluated and compared our hybrid method with five established algorithms on the CEC 2013 test suite. The results demonstrated that our new algorithm is better than the competing algorithms on the majority of the tested problems; in particular, it showed superior performance on hard multimodal and composite problems of CEC 2013. In future, the present work will be extended to constrained optimization. As a second task, some other gradient-free LS methods, global optimizers and archiving strategies will be tried to design more efficient algorithms for global optimization.