An Enhanced Differential Evolution Algorithm with Rank-Up Selection: RUSDE

Recently, the differential evolution (DE) algorithm has been widely used to solve many practical problems. However, DE may suffer from stagnation during the iteration process. We therefore propose an enhanced differential evolution algorithm with a rank-up selection, named RUSDE. First, the rank-up individuals in the current population are selected and stored in a new archive; second, the mutation strategy is switched according to the updating status of the current population when selecting the parents. Both methods improve the performance of DE. We conducted numerical experiments on the CEC 2014 benchmark functions, where the results demonstrated the excellent performance of this algorithm. Furthermore, the algorithm was applied to the real-world optimization problem of four-bar linkages, where the results show that RUSDE performs better than the other algorithms.

DE starts from a randomly generated initial population. A new individual is generated by adding the scaled difference of two randomly chosen individuals to a third one. The new individual is then compared with the corresponding individual in the current population: if the fitness of the new individual is better, it replaces the old individual in the next generation. Through continuous evolution, competition, retention, and deletion, the optimal solution is obtained.
However, two problems exist in DE. First, successful solutions may stop being generated; second, the population may keep moving without converging to a fixed point; both can lead to stagnation. To improve DE's performance and reduce stagnation, researchers have proposed diverse variants. Qin adjusted the parameters of DE and adaptively determined a more suitable generation strategy along with its parameter settings during the search (SaDE) [25]. Brest introduced an efficient technique for adapting the control parameter settings of DE (jDE) [26]. Zhang created an optimal archive containing newly updated individuals, which helps to generate better ones (JADE) [27]. An adaptive strategy for the control parameters built on the basis of JADE obtained better results (SHADE) [28]. Guo improved the optimal archive with all updated individuals and built a new successful-parent-selecting framework (SPSDE) [29], and later introduced the concept of the eigenvector to propose a new crossover method (EIGDE) [30]. The ranking idea has been used to search for better positions, for example by having low-ranking individuals learn from top-ranking ones [31,32]. Lixin combined adaptive population classification, adaptive control parameters, and mutation into an individual-dependent mechanism (IDE) [33]. Other concepts and methods include the neighborhood-based mutation operator (DEGL) [34], an ensemble of parameters and mutation strategies (EPSDE) [35], MDEpBX [36], a similarity-based mutant vector generation strategy (DE-SIM) [37], multipopulation-based mutation operators (CAMDE) [38], a random-neighbor-based strategy (RNDE) [39], and adaptive Lagrange interpolation search (ADELI) [40].
So far, many algorithms adopt one of two learning strategies: individuals with low rank learn from those with high rank, as in rank-jDE [31] and TLBO [5]; or individuals that fail to update learn from those that update, as in JADE [27], SHADE [28], and SPSDE [29]. Both strategies have proven effective in improving the performance of DE. This study therefore proposes a new learning strategy to further reduce the stagnation of DE and accelerate convergence: individuals whose rank does not rise learn from those whose rank rises. We call the resulting algorithm the enhanced differential evolution algorithm with rank-up selection (RUSDE). The proposed framework contains two parts: an archive of rank-up individuals is maintained during the search, and a learning strategy is adopted based on the consecutive update counts of the top-ranked individual and of the current individual. We discuss the influence of the archive size and test RUSDE's performance on the CEC 2014 benchmarks [41], where it compares favorably with 11 other algorithms. We also apply RUSDE to the optimal synthesis of four-bar mechanisms and obtain competitive results.
The rest of this paper is organized as follows. Section 2 reviews the original DE, and Section 3 proposes RUSDE. Section 4 presents experimental results on different functions that show the superiority of RUSDE. Section 5 presents an example of the synthesis problem of a four-bar mechanism. Section 6 concludes the paper.

Differential Evolution
Differential evolution is a powerful evolutionary algorithm for single-objective optimization problems [10]. Its process contains four steps.
Step 1, Initialization. Initialize the population X of size N; for a D-dimensional problem, the i-th individual's d-th dimension is randomly generated within the bounds.
Step 2, Mutation. Mutation generates a new population by perturbing the parents. The strategies are diverse; Reference [10] lists some of the most frequently used, such as "DE/rand/1", "DE/best/1", "DE/current-to-best/1", "DE/rand/2", and "DE/best/2". Taking "DE/rand/1" as an example, the mutant vector is given by Equation (1):

v_i = x_a + F (x_b − x_c)    (1)

where x_a, x_b, and x_c are distinct individuals randomly chosen from the population X, all different from the index i, and F is a user-specified scaling factor in the range (0, 2).
Step 3, Crossover. To increase population diversity, a child individual u_i^child is generated by the crossover operation. The binomial crossover scheme of DE can be written as Equation (2):

u_i,d^child = v_i,d if rand(0, 1) ≤ CR or d = d_rand; otherwise u_i,d^child = x_i,d    (2)

where d_rand is randomly chosen from [1, D] and CR is a user-defined crossover rate in the range [0, 1].
Step 4, Selection. A greedy selection mechanism decides which individuals survive into the next generation according to their fitness values after crossover: only the better one survives. The selection procedure can be expressed as Equation (3):

x_i(G+1) = u_i^child if f(u_i^child) ≤ f(x_i(G)); otherwise x_i(G+1) = x_i(G)    (3)
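The four steps above can be sketched in a few lines of Python. This is a minimal illustration of DE/rand/1/bin, not the authors' implementation; the function name and the defaults (F = 0.5, CR = 0.9) are our own choices.

```python
import random

def de_minimize(f, bounds, N=50, F=0.5, CR=0.9, gens=200, seed=1):
    """Minimal DE/rand/1/bin sketch: initialize, mutate, crossover, select."""
    rng = random.Random(seed)
    D = len(bounds)
    # Step 1: random initialization within the bounds
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(N)]
    fit = [f(x) for x in X]
    for _ in range(gens):
        for i in range(N):
            # Step 2: DE/rand/1 mutation, Eq. (1): v = x_a + F*(x_b - x_c)
            a, b, c = rng.sample([j for j in range(N) if j != i], 3)
            v = [X[a][d] + F * (X[b][d] - X[c][d]) for d in range(D)]
            # Step 3: binomial crossover, Eq. (2)
            d_rand = rng.randrange(D)
            u = [v[d] if (rng.random() <= CR or d == d_rand) else X[i][d]
                 for d in range(D)]
            # Step 4: greedy selection, Eq. (3)
            fu = f(u)
            if fu <= fit[i]:
                X[i], fit[i] = u, fu
    best = min(range(N), key=lambda i: fit[i])
    return X[best], fit[best]

# Example: minimize the 5-D sphere function
x, fx = de_minimize(lambda x: sum(t * t for t in x), [(-5, 5)] * 5)
```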

Differential Evolution Variants
In recent years, a number of DE variants have emerged to improve DE's performance. We divide them into several categories.
Improving the mutation, crossover, or both strategies: For example, opposition numbers are generated during population initialization (ODE) to realize generation jumping and local improvement [42,43]; the differential covariance matrix information is extracted to determine the search direction, similar to CMA-ES [44] combined with DE (DCMA) [45,46]. Covariance matrix learning is also used to create a proper coordinate system for the crossover operator, as in the eigenvector approach (EIGDE) [30] and bimodal distribution parameter setting [47]. Other examples include orthogonal crossover [48], adaptive Lagrange interpolation search (ADELI) [40], Gaussian mutation [49], and multiple strategies [50,51].
Alternating the parents: Some researchers select the parents according to fitness values or rank to generate better solutions, such as neighborhood and directional information [55], the successful-parent-selecting framework (SPSDE) [29], and rank-based selection (rank-jDE) [31].
Building the population structure: Multi-population strategies are introduced to adaptively enhance the performance of DE, such as IDE [33] and CAMDE [38], and an aging strategy is employed to jump out of local optima [56]. The pseudo-code for DE is given in Algorithm 1.
So far, the main idea of most ranking algorithms has been that individuals with low rank learn from individuals with high rank. This is similar to human social behavior: elites occupy the top positions and attract attention, so everyone likes to learn from them. However, another group of people also attracts attention. Although they are not ranked high in their industries, their rankings have recently risen quickly for some reason, so people begin to study the reasons for their rise and learn from them. The development of national economies is similar: some developed countries have advanced technology but go through periods of slow economic growth, while some developing countries with relatively weaker technology grow rapidly. People therefore analyze why these economies grow quickly and learn from them to develop their own. This prompts us to propose the rank-up algorithm in this paper, which contains two parts.
First, to increase the learning speed of DE, individuals whose rank has not risen learn from those whose rank has. Specifically, we store the individuals whose rank has risen recently in an archive Y. The archive is an M × D matrix, where M is its size and D is the dimension.
To judge whether an individual's rank is up, we use Equation (4):

rankupvalue(i, G) = rankvalue(i, G) − rankvalue(i, G − 1)    (4)

where rankupvalue(i, G) is the i-th individual's rank-up value in the G-th generation and rankvalue(i, G) is its rank index in the G-th generation. If rankupvalue(i, G) < 0, the individual's rank is up, and it is stored in the archive Y following the rule that the newest entry replaces the oldest one.

Second, the current population is examined. We use flagrank(i) to count the consecutive generations in which the i-th individual's rank has not risen, and flagbest to count the consecutive generations in which the global best individual has not been updated, as shown in Equations (5) and (6). The parent selection in the mutation process is then adapted accordingly. The learning behavior of the current individual is divided into six cases according to its ranking status: if the current individual's rank rises, the individual itself is used as the starting position of the learning behavior; if its rank has not risen for one generation, it is replaced by a random individual from the current population X; if its rank has not risen for two consecutive generations, it is replaced by a random individual from the archive Y. For the global best individual of the current population, if it has not been updated, no individual learns from it; otherwise, all individuals learn from it. These cases are combined in Equation (7).
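As an illustration, the rank bookkeeping of Equation (4) and the first-in-first-out archive update could be sketched as follows; the helper names (rank_values, update_archive) are hypothetical, not from the paper.

```python
def rank_values(fitness):
    """rankvalue(i): position of individual i when sorted by fitness (best = 0)."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    rank = [0] * len(fitness)
    for pos, i in enumerate(order):
        rank[i] = pos
    return rank

def update_archive(archive, M, X, rank_prev, rank_now):
    """Eq. (4): rankupvalue(i) = rankvalue(i, G) - rankvalue(i, G-1).
    A negative value means the rank went up; store that individual in Y,
    the newest replacing the oldest once the archive holds M entries."""
    for i in range(len(X)):
        if rank_now[i] - rank_prev[i] < 0:      # rank improved
            archive.append(list(X[i]))
            if len(archive) > M:
                archive.pop(0)                   # oldest entry leaves first
    return archive
```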
flagrank(i) = 0 if the i-th individual's rank rises; flagrank(i) + 1 otherwise    (5)

flagbest = 0 if the best individual's fitness is updated; flagbest + 1 otherwise    (6)

where Y1, Y2, and Y3 are individuals randomly selected from the archive Y; X1 and X2 are two random individuals from the current population; x_best is the global best individual of the current population; X_better is an individual ranked better than the current individual; X_i is the current individual; and F_i is the i-th individual's learning factor. In this study, F_i is set as in Ref. [26], shown in Equation (8).
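The case-based parent choice described above can be sketched roughly as follows. Equation (7) combines these cases with difference vectors built from X and Y, which we omit here; the function and variable names are our own, not the paper's.

```python
import random

def choose_base(i, X, Y, flagrank, flagbest, rng=random):
    """Pick the starting point of the mutation for individual i, following
    the cases described in the text (a sketch of the logic behind Eq. (7)):
      rank rose this generation       -> use X[i] itself
      rank stalled for one generation -> a random individual from X
      rank stalled for two or more    -> a random individual from archive Y
    Learning from x_best is switched off while flagbest > 0."""
    if flagrank[i] == 0:
        base = X[i]
    elif flagrank[i] == 1:
        base = rng.choice(X)
    else:
        base = rng.choice(Y) if Y else rng.choice(X)
    learn_from_best = (flagbest == 0)
    return base, learn_from_best
```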
Crossover: in this study, we use the same method as [26] to perform the crossover, shown in Equation (9).
where rn(i) is a random integer from the set {1, 2, ..., D}, used to ensure that at least one dimension of u_i^child comes from v_i. CR is the crossover rate, whose value range is [0, 1]. In this study, CR is set to 0.9, the same as in the literature [26].
Selection: when the new individual u_i^child is generated, its fitness value is calculated and compared with that of X_i; only the better one survives.
In summary, the pseudo-code of the rank-up selection DE algorithm is shown in Algorithm 2.
In this paper, the crossover and selection parts of RUSDE are the same as those of jDE [26]; hence, RUSDE is built on the jDE framework. Moreover, to alleviate the stagnation of DE and improve the individual updating rate, many algorithms create an archive of updated individuals and randomly extract individuals from it to generate offspring, such as the SPS framework [29]. RUSDE instead builds the archive from rank-up individuals. The difference is that a rank-up individual's fitness must have been updated, whereas an individual whose fitness was updated may not rise in rank; in other words, this filter is stricter. Therefore, in the numerical experiments, RUSDE is compared with jDE and SPS-jDE to show its performance.

Benchmarks and Experimental Settings
This study used the CEC 2014 benchmark suite [41] to evaluate RUSDE against other algorithms. The CEC 2014 benchmark functions fall into four categories: F1-F3 are unimodal functions, F4-F16 are simple multimodal functions, F17-F22 are hybrid functions, and F23-F30 are composition functions [57]. The shifted global optimum was determined by the random shift vector o = [o_1, o_2, ..., o_D], and the rotation matrix M was generated as described in a previous study [58]. The benchmarks are listed in Appendix A.

The Influence of Parameter M
In RUSDE, the size of the selection archive Y may have a crucial influence on the quality of the optimal solution. We used the CEC 2014 benchmark functions (see Appendix A) to test the effect of different values of the parameter M. For the 30-dimensional problems, we varied M with N = 100 and FES = 200,000, running the algorithm 20 times for each setting. Table 1 lists the average ranks. As M increases, the archive Y enlarges and its information may become outdated, worsening the average rank; if M is too small, the diversity of Y deteriorates, which also worsens the average rank. When M = 50, half of the population size, the average rank of RUSDE is the smallest. To assess the statistical significance of the influence of M, we applied the Friedman test with the null hypothesis h0: different values of M have no effect on the results of RUSDE. The resulting F-score is 5.117, larger than the critical value 2.545 (significance level 0.05, 10 degrees of freedom), so the null hypothesis is rejected: the parameter M has a significant effect on the optimal value calculated by RUSDE. Therefore, in the later experiments we take M = 50.
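The Friedman statistic with the F approximation used here (the Iman-Davenport correction) can be computed from a rank table as follows. This is a generic sketch with hypothetical naming, not the exact script used in the experiments.

```python
def friedman_f(ranks):
    """Iman-Davenport F statistic from a table of ranks, where
    ranks[p][a] is the rank of algorithm/setting a on problem p (1 = best).
    chi2 = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2/4), with R_j the average
    rank of column j over the N problems; F = (N-1)*chi2 / (N*(k-1) - chi2).
    Compare F with the critical value of the F distribution with
    (k-1, (k-1)(N-1)) degrees of freedom."""
    N, k = len(ranks), len(ranks[0])
    R = [sum(row[j] for row in ranks) / N for j in range(k)]
    chi2 = 12.0 * N / (k * (k + 1)) * (
        sum(r * r for r in R) - k * (k + 1) ** 2 / 4.0)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)
```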

Comparisons with Other Algorithms and Statistical Analysis of The Results
We used CEC 2014 to test 11 evolutionary algorithms (EAs) on the 30- and 50-dimensional problems and ranked their average values. The best, mean, and std values are shown in Tables 2 and 3. For the 30-dimensional problems, LSHADE, RUSDE, and SaDE took the first, second, and third average ranks, respectively. RUSDE was ranked first for F2, F7, F9, F12, F16, F19, and F26, a total of seven first ranks; LSHADE was ranked first for F1, F7, F9, F11, F15, F17, F18, F20, F21, F26, and F29, the most first ranks; SaDE came third. For the 50-dimensional problems, LSHADE, RUSDE, and SPS-jDE took the first, second, and third average ranks, respectively. LSHADE was ranked first for F1, F9, F11, F18, F21, F26, F29, and F30 (eight first ranks, the most); SPS-jDE was ranked first for F6, F7, F11, F12, F13, and F15 (six first ranks, the second most); and RUSDE was ranked first for F16 and F19 (two first ranks, the third most). Overall, the average rank of RUSDE was the second smallest, indicating that RUSDE has the best comprehensive performance except for LSHADE.
Assuming that the results of all algorithms are normally distributed, we used a two-tailed t-test to compare RUSDE with the other 10 algorithms at a significance level of 0.05, with the null hypothesis h0: there is no difference between the compared algorithm and RUSDE. The t-test computes a t-value, from which the p-value is obtained; the t-value measures the size of the difference relative to the variation in the results of RUSDE and the other algorithm. The greater the magnitude of t, the greater the evidence against the null hypothesis, and a smaller p-value means stronger evidence in favor of the alternative hypothesis. P and T stand for p-value and t-value in Tables 4 and 5, where "+", "-", and "=" indicate that RUSDE is significantly better than, worse than, or not significantly different from the compared algorithm, respectively. The results are shown in Tables 4 and 5. For the 30-dimensional problems, the number of "+" exceeded the number of "-" for the other 10 algorithms, and for SaDE and SPS-jDE the margin was slight; this indicates that RUSDE performed slightly better than SaDE and SPS-jDE, slightly worse than LSHADE, and better than the remaining eight algorithms. For the 50-dimensional problems, the number of "+" exceeded the number of "-" for the other 10 algorithms, and for SPS-jDE the numbers were equal; this shows that the performance of RUSDE is not significantly different from that of SPS-jDE and LSHADE, and better than the remaining nine algorithms.
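For reference, the pooled-variance t-value described above can be computed as follows; this is a generic sketch, and |t| would then be compared with the two-tailed critical value for len(a) + len(b) − 2 degrees of freedom to accept or reject h0.

```python
from statistics import mean, variance

def t_value(a, b):
    """Two-sample t statistic with pooled variance (equal variances
    assumed, as in the text), for two lists of run results a and b."""
    na, nb = len(a), len(b)
    # pooled sample variance
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
```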
The Friedman test was also used to evaluate the algorithms [63]. The null hypothesis (H0) is that there is no difference between the algorithms, and the alternative hypothesis (H1) is that at least one algorithm obtained results different from the others. With 12 algorithms in total, the degrees of freedom are 11, and at a confidence level of 0.05 the critical value is 2.4663. The F-scores for the mean value ranks of the 30D and 50D problems are 19.869 and 23.008, respectively, both much higher than the critical value, so H0 is rejected in favor of H1. Since the F-tests show significant differences between the algorithms, we used the post hoc Duncan's test to assess the differences between RUSDE and the other 11 algorithms. Table 6 reports the results based on the significance (p-value): for the 30D problems, RUSDE performs better than the other algorithms, except for SaDE, SPS-jDE, and LSHADE; for the 50D problems, RUSDE performs better than the other algorithms, except for SaDE, rank-jDE, SPS-jDE, and LSHADE. Figure 1 shows the convergence curves of RUSDE and the 11 other algorithms on CEC 2014. For the unimodal function F2, RUSDE always converged faster; for the multimodal function F7, RUSDE converged faster and escaped the local optimum; for the multimodal function F12, RUSDE converged at a normal speed in the early stage but escaped the local optimum late in the run and obtained a better solution; for the composite and hybrid functions F16, F19, and F26, RUSDE was superior to the other EAs.

Application Example
The four-bar mechanism is a common mechanism widely used in many machines and devices. The dimensional synthesis of four-bar mechanisms aims to synthesize a mechanism with the minimum errors between the coupler points and desired points to meet the design requirements. A typical case presented in [64] is tested in this paper.

The Classic Case of Four-Bar Mechanism
The schematic of the four-bar mechanism together with the variables is shown in Figure 2. In the world coordinate system XOY, the position of coupler point C can be written as Equation (10).
where r_1, r_2, r_3, and r_4 are the lengths of the links; r_cx and r_cy are the coordinates of the coupler point C in the relative coordinate system X_cOY_c; x_0 and y_0 are the coordinates of the joint O_2 in the world coordinate system XOY; θ_0 is the angle of the stationary link with respect to the X-axis; and θ_2^i is the i-th input angle, corresponding to the i-th desired point, in the relative coordinate system.
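Equation (10) can be evaluated by first locating the joint B at the intersection of two circles (radius r_3 about the crank pin A and radius r_4 about the fixed pivot O_4) and then rotating the coupler offset (r_cx, r_cy) by the coupler angle θ_3. The sketch below assumes a linkage with O_2 at the origin of the relative frame and O_4 on its x-axis; all names are our own, and the branch argument selects one of the two assembly configurations.

```python
from math import cos, sin, atan2, hypot, sqrt

def coupler_point(r1, r2, r3, r4, rcx, rcy, x0, y0, th0, th2, branch=1):
    """Coupler-point position of a four-bar linkage, a sketch of Eq. (10)."""
    ax, ay = r2 * cos(th2), r2 * sin(th2)       # crank pin A
    dx, dy = r1, 0.0                            # fixed pivot O4
    d = hypot(dx - ax, dy - ay)
    # joint B: circle-circle intersection (radius r3 about A, r4 about O4)
    a = (d * d + r3 * r3 - r4 * r4) / (2 * d)
    h = sqrt(max(r3 * r3 - a * a, 0.0))
    ux, uy = (dx - ax) / d, (dy - ay) / d
    bx = ax + a * ux - branch * h * uy
    by = ay + a * uy + branch * h * ux
    th3 = atan2(by - ay, bx - ax)               # coupler angle
    # coupler point C in the relative frame
    cx = ax + rcx * cos(th3) - rcy * sin(th3)
    cy = ay + rcx * sin(th3) + rcy * cos(th3)
    # transform to the world frame XOY: rotate by th0, translate by (x0, y0)
    X = x0 + cx * cos(th0) - cy * sin(th0)
    Y = y0 + cx * sin(th0) + cy * cos(th0)
    return X, Y
```

A quick sanity check: with rcx = r3 and rcy = 0, the coupler point coincides with joint B, so its distances to A and to O_4 equal r_3 and r_4, respectively.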

The Constraints and Goal Function
The link lengths must satisfy the Grashof condition, which can be expressed by Equation (11).
To avoid the order defect, the sequence of input angles must follow a single clockwise or counterclockwise rotation, as shown in Equation (12).
The objective function has two parts: the first is the sum of squared Euclidean distances between the coupler points and the corresponding desired points, and the second comprises the penalty functions, which encode the Grashof and sequence conditions. The objective function of the optimization problem can thus be expressed as Equation (13), where h_1(x) and h_2(x) are the Grashof and sequence conditions, respectively, equal to 0 when the condition is satisfied and 1 otherwise, and M_1 and M_2 are large values that penalize the objective function, both set to 10^4.
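Under the definitions above, Equation (13) with its two penalty terms could be sketched as follows; the helper names are hypothetical, and the coupler points are assumed to be precomputed from the kinematics.

```python
def grashof_ok(r):
    """h1, Eq. (11): Grashof condition s + l <= p + q for the four links,
    where s and l are the shortest and longest links."""
    s, p, q, l = sorted(r)
    return s + l <= p + q

def sequence_ok(angles):
    """h2, Eq. (12): input angles strictly monotonic, i.e. all clockwise
    or all counterclockwise."""
    diffs = [b - a for a, b in zip(angles, angles[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

def objective(coupler_pts, desired_pts, links, input_angles, M1=1e4, M2=1e4):
    """Eq. (13): sum of squared Euclidean distances between generated and
    desired coupler points, plus M1*h1 + M2*h2 penalties (h = 0 when the
    condition holds, 1 otherwise)."""
    err = sum((cx - px) ** 2 + (cy - py) ** 2
              for (cx, cy), (px, py) in zip(coupler_pts, desired_pts))
    h1 = 0 if grashof_ok(links) else 1
    h2 = 0 if sequence_ok(input_angles) else 1
    return err + M1 * h1 + M2 * h2
```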

The Experimental Settings and Results
We compared RUSDE with five algorithms, KH, DGSTLBO, SCA, DE, and jDE, on the synthesis problem of four-bar mechanisms. The parameter settings are N = 100, D = 15, and FEs = 100,000.
The design variables of this example are X = [r_1, r_2, r_3, r_4, r_cx, r_cy, x_0, y_0, θ_0, θ_2^1, θ_2^2, θ_2^3, θ_2^4, θ_2^5, θ_2^6]. This problem requires tracing a trajectory along eighteen given points arranged in an irregular closed path with prescribed timing. The desired points are:

The best, mean, and standard deviation values obtained by the six algorithms over 30 runs are presented in Table 7. The best solution obtained by RUSDE is the most accurate among the six algorithms. The convergence graph of the best values for the synthesis problem of the four-bar mechanism is shown in Figure 3: RUSDE has the fastest convergence speed and the highest convergence accuracy among the six competitive algorithms. Therefore, RUSDE is superior to the other five algorithms in both convergence speed and accuracy.

Conclusions
In this study, we proposed an enhanced learning method, named RUSDE, to improve the performance of differential evolution (DE). In this method, the rank-up individuals are selected and stored in an archive Y; compared with selecting fitness-updated individuals, this filter is stricter. Different mutation strategies, inspired by human social behavior, are adopted, with parents selected according to their updating status. We used the CEC 2014 benchmarks, comprising unimodal, basic multimodal, expanded multimodal, and hybrid problems, to test the performance of RUSDE, and discussed the influence of the archive size M. The results showed that a reasonable average rank is obtained when the archive size is half the population. We compared RUSDE with DErand, jDE, SaDE, Rank-jDE, SPS-jDE, jDE-EIG, KH, LBSA, DGSTLBO, LSHADE, and SCA. The numerical results showed that RUSDE outperformed DErand, jDE, SaDE, Rank-jDE, jDE-EIG, KH, LBSA, DGSTLBO, and SCA on the 30D and 50D problems. The statistical analysis showed that for the 30D problems, RUSDE performed better than the other algorithms except SaDE, SPS-jDE, and LSHADE, and for the 50D problems, better than the others except SaDE, rank-jDE, SPS-jDE, and LSHADE. An application of the proposed method to the optimization of a four-bar mechanism showed that RUSDE performed better than the other algorithms. A limitation of RUSDE is that the size of the selection archive Y has a crucial influence on solution quality, and different archive sizes suit different problems. In future research, an adaptive mechanism to decide the archive size M will be developed for RUSDE. Moreover, we will introduce this method into other similar algorithms and apply it to practical problems in the field of engineering control systems.
Funding: This research was partially supported by a grant (#DMR-186-168) from National Chung Hsing University, Taichung, Taiwan.

Conflicts of Interest:
The authors declare no conflict of interest.