Fuzzy Strategy Grey Wolf Optimizer for Complex Multimodal Optimization Problems

Traditional grey wolf optimizers (GWOs) have difficulty balancing convergence and diversity when applied to multimodal optimization problems (MMOPs), resulting in low-quality solutions and slow convergence. To address these drawbacks, a fuzzy strategy grey wolf optimizer (FSGWO) is proposed in this paper. A bivariate joint normal distribution is used as a fuzzy method to realize the adaptive adjustment of the control parameters of FSGWO. Next, a fuzzy mutation operator and a fuzzy crossover operator are designed to generate new individuals based on the fuzzy control parameters. Moreover, a noninferior selection strategy is employed to update the grey wolf population, which makes the entire population available for estimating the location of the optimal solution. Finally, FSGWO is verified on the 30 test functions of IEEE CEC2014 and five engineering application problems. Compared with state-of-the-art algorithms, FSGWO is superior. Specifically, for the 50D test functions of CEC2014, the average calculation accuracy of FSGWO is 33.63%, 46.45%, 62.94%, 64.99%, and 59.82% higher than that of the equilibrium optimizer algorithm, modified particle swarm optimization, the original GWO, hybrid particle swarm optimization and GWO, and selective opposition-based GWO, respectively. For the 30D and 50D test functions of CEC2014, the results of the Wilcoxon signed-rank test show that FSGWO is better than the competitive algorithms.


Introduction
Many complex optimization problems in industrial applications have multiple global optimal solutions or near-optimal solutions that provide decision-makers with different decision preferences, and such problems are often referred to as multimodal optimization problems (MMOPs) [1]. Air service network design is an example of an MMOP, which requires all feasible routes to transport goods to the destination; if the current route cannot be executed due to weather conditions, an alternative route with a similar cost can be selected from the feasible routes to transport goods [2]. Other examples of MMOPs include structural damage detection [3], image segmentation [4], flight control systems [5], job shop scheduling [6], truss structure optimization [7], protein structure prediction [8], and electromagnetic design [9]. Most MMOPs are nonconvex and nonlinear; classical numerical optimization methods are sensitive to nonconvexity and nonlinearity, so they encounter difficulties in solving MMOPs. In contrast, evolutionary algorithms are not sensitive to the nonconvexity and nonlinearity of optimization problems and have been widely used to solve MMOPs, such as genetic algorithms [10], evolutionary algorithms [11], particle swarm optimization [12], ant colony optimization [13], cuckoo search algorithms [14], memetic algorithms [15], niching chaos optimization [16], grey wolf optimization [17], harmony search algorithms [18], fireworks algorithms [19], and gravitational search algorithms [20]. Among these evolutionary algorithms for solving MMOPs, grey wolf optimizers (GWOs) have the advantages of easy implementation and requiring few parameters [21]. However, traditional GWOs struggle to balance diversity and convergence when solving MMOPs. The experimental results in this paper show that the proposed algorithm has advantages over competitive algorithms in solving MMOPs, and that the proposed improvement ideas are feasible for balancing the diversity and convergence of traditional GWOs.
Specifically, for the 50D test functions of CEC2014, the average calculation accuracy of FSGWO is 33.63%, 46.45%, 62.94%, 64.99%, and 59.82% higher than those of the equilibrium optimizer algorithm (EO), modified particle swarm optimization (MPSO), original GWO, hybrid particle swarm optimization and grey wolf optimizer (HPSOGWO), and selective opposition-based GWO (SOGWO), respectively, which indicates that the FSGWO can significantly improve the calculation accuracy when solving high-dimensional MMOPs. For the 30D and 50D test functions of CEC2014, the results of the Wilcoxon signed-rank test show that the proposed algorithm is better than the competitive algorithms.
The remainder of this paper is arranged as follows: Section 2 introduces the principle of the original GWO. Section 3 provides the key details and the flowchart of FSGWO. Section 4 presents the experimental results, and Section 5 is the discussion. Finally, Section 6 summarizes the study.

Algorithm Flow of the Original Grey Wolf Optimizer
The grey wolf population is denoted by X = {X 1 , X 2 , . . . , X M }, where M is the number of grey wolves. X p is an individual in X. In the original GWO [21], the three best solutions appearing in the iterative process are called the three leading wolves and are denoted by X α , X β , and X δ . The three leading wolves are used to guide grey wolf individuals to round up prey, which is also known as updating individuals.
The search direction D_α from X_p to X_α is calculated as

D_α = |C_1 · X_α − X_p|,    (1)

where C_1 is the neighborhood radius, a random vector between (0, 2); the operator · represents the element-wise product; C_1 · X_α is a neighborhood point of X_α; and |·| is the absolute value operator.

A mutant individual of X_p generated by X_α is written as

X_1 = X_α − A_1 · D_α,    (2)

where A_1 is the search step, a random vector between (−2, 2). Similarly, a mutant individual of X_p generated by X_β can be presented as

X_2 = X_β − A_2 · D_β,  with D_β = |C_2 · X_β − X_p|,    (3)

where D_β is the search direction, C_2 is a random vector between (0, 2), and A_2 is a random vector between (−2, 2). In the same way, a mutant individual of X_p generated by X_δ is

X_3 = X_δ − A_3 · D_δ,  with D_δ = |C_3 · X_δ − X_p|,    (4)

where D_δ is the search direction, C_3 is a random vector between (0, 2), and A_3 is a random vector between (−2, 2). The new individual X_u, generated from the above three mutant individuals, can be expressed as

X_u = (X_1 + X_2 + X_3) / 3.    (5)

Finally, X_p in X is replaced with X_u, completing the update of the individual X_p. This update strategy of GWO has two drawbacks. (i) Compared with X_p, all dimensions of X_u are mutated, which leads to divergence when solving high-dimensional MMOPs and reduces the quality of solutions. (ii) GWO uses only the three leading wolves as heuristic information to guide individuals toward the optimal solution. The distribution of the optimal solutions of MMOPs is complex, and the three leading wolves cannot quickly estimate the region of the optimal solution; thus, the algorithm converges slowly.
The pseudocode of the original GWO algorithm is shown in Algorithm 1.
Algorithm 1. Pseudocode of the original GWO.
Initialize the grey wolf population X_i (i = 1, 2, . . . , M)
Initialize a, A, and C
Calculate the fitness of each search agent
X_α = the best search agent
X_β = the second-best search agent
X_δ = the third-best search agent
while (t < maximum number of iterations)
    for each search agent
        Update the position of X_i
    end for
    Update a, A, and C
    Calculate the fitness of all search agents
    Update X_α, X_β, and X_δ
    t = t + 1
end while
return X_α
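As a concrete illustration, the loop above can be sketched in Python. This is a minimal sketch, not the authors' code: the linearly decreasing parameter a and the sphere test function in the usage example are standard choices assumed here for demonstration.

```python
import numpy as np

def gwo(fitness, dim, lb, ub, n_wolves=30, max_iter=200, seed=0):
    """Minimal sketch of the original GWO update of Eqs. (1)-(5)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_wolves, dim))
    fit = np.array([fitness(x) for x in X])
    # best-so-far leading wolves alpha, beta, delta
    order = np.argsort(fit)
    leaders = X[order[:3]].copy()
    leader_fit = fit[order[:3]].copy()
    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)                 # decreases linearly from 2 to 0
        for p in range(n_wolves):
            X_new = np.zeros(dim)
            for L in leaders:
                C = 2.0 * rng.random(dim)               # neighborhood radius in (0, 2)
                A = a * (2.0 * rng.random(dim) - 1.0)   # search step in (-a, a)
                D = np.abs(C * L - X[p])                # Eq. (1): search direction
                X_new += L - A * D                      # Eqs. (2)-(4): mutant individuals
            X[p] = np.clip(X_new / 3.0, lb, ub)         # Eq. (5): average of three mutants
            fit[p] = fitness(X[p])
            # keep alpha, beta, delta as the three best solutions found so far
            if fit[p] < leader_fit[2]:
                leaders[2], leader_fit[2] = X[p].copy(), fit[p]
                order = np.argsort(leader_fit)
                leaders, leader_fit = leaders[order], leader_fit[order]
    return leaders[0], leader_fit[0]
```

For example, on a 5-dimensional sphere function, `gwo(lambda x: float(np.sum(x * x)), dim=5, lb=-5.0, ub=5.0)` drives the best fitness toward zero.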

Fuzzy Adaptive Control Parameters
For GWOs, the design of adaptive control parameters is a challenge. In [28-31], nonlinear functions are used to design adaptive control parameters for GWOs, but such adaptive parameters are not effective in solving MMOPs, and the quality of the obtained solutions is not high. The main reason is that a fixed nonlinear function cannot reflect the complexity of the solution space of MMOPs and cannot use information from the current iteration to estimate the position of the optimal solution.
In recent studies, fuzzy methods have been used to address the issues of adaptive control parameters of evolutionary algorithms. To achieve an optimal balance of exploitation and exploration in the chicken swarm optimization algorithm [43], the fuzzy system is applied to adaptively adjust the number of chickens and random factors. In [44], the fuzzy system is used for the design of the crossover rate control parameter of the differential evolution algorithm, which improves the diversity of the population. In [45], the fuzzy inference system is employed to automatically tune the control parameters of the whale optimization algorithm, which improves the convergence of the algorithm. The common feature of these fuzzy methods is updating the control parameters with the information of the optimal solution in the current iteration.
Fuzzy methods provide new ideas for the design of adaptive control parameters for GWOs. Inspired by this, in this study, bivariate joint normal distribution is used as a fuzzy method to design adaptive control parameters of GWO and new evolutionary operators are designed based on these fuzzy control parameters. Finally, the improved GWO is employed to solve MMOPs.

The Proposed Algorithm
The flowchart of the FSGWO is shown in Figure 1. The algorithm parameters and population are initialized first; the fitness of each individual in the initial population is calculated, and the three individuals with the best fitness values are selected as the three initial leading wolves. In the iterative part of the algorithm, new control parameters are obtained by sampling the bivariate joint normal distribution, and then a new individual X_u is generated through mutation and crossover operations. If X_u is better than X_p, X_p is replaced with X_u; otherwise, it is not replaced. Finally, the parameters of the bivariate joint normal distribution are adaptively updated, and the algorithm continues to the next iteration. After the end of the iterations, the best leading wolf is taken as the optimal solution and output. The key points of the FSGWO are described below.

Mutation Strategy with a Fuzzy Search Direction
The mutation of the grey wolf is realized by adding a fuzzy search direction to the grey wolf. The term fuzzy search direction refers to the product of the fuzzy step r a and the search direction D c , and its calculation method is described as follows.
First, the three leading wolves are used to estimate the current position X_c of the prey:

X_c = (X_α + X_β + X_δ) / 3.    (6)

The search direction D_c of X_p is defined as

D_c = (X_c − X_p) + (X_p1 − X_p2),    (7)

where X_p1 and X_p2 are two individuals randomly selected from X, with X_p ≠ X_p1 ≠ X_p2. The term X_c − X_p represents the search information from X_p to the prey, which is global search information. The term X_p1 − X_p2 represents the search information between individuals, which is local search information.
There are three differences between Equation (7) and Equation (1). (i) Equation (7) has no absolute value operator and retains the heuristic effect of negative signs on the search. (ii) In the early stage of iterations, the positions of the three leading wolves are generally scattered. Therefore, Equation (7) uses the average value of the three leading wolves to estimate the position of the prey, which can reduce the adverse effects caused by the dispersion of guiding positions and help accelerate the convergence. (iii) Grey wolves have the habit of hunting collectively and surround the prey by exchanging information on the location of their prey. Equation (7) uses X p1 − X p2 to realize the exchange of prey location information between grey wolves, but there is no method for exchanging prey location information between grey wolves in Equation (1).
The mutant individual X_v of X_p is generated with the fuzzy search direction:

X_v = X_p + r_a · D_c,    (8)

where r_a is the fuzzy step, a random vector between (0, 1) and a control parameter of FSGWO; its generation method is described later. The product r_a · D_c is the fuzzy search direction. The expression X_p + r_a · D_c starts from X_p and searches for prey along the fuzzy search direction r_a · D_c.
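A minimal Python sketch of Eqs. (6)-(8) is given below; the function and variable names are illustrative, and the per-dimension fuzzy step r_a is supplied by the caller.

```python
import numpy as np

def fuzzy_mutation(X, p, leaders, r_a, rng):
    """Fuzzy mutation of Eqs. (6)-(8): move X[p] along the fuzzy
    search direction r_a * D_c (a sketch, not the authors' code)."""
    X_c = leaders.mean(axis=0)                     # Eq. (6): estimated prey position
    candidates = [i for i in range(X.shape[0]) if i != p]
    p1, p2 = rng.choice(candidates, size=2, replace=False)
    D_c = (X_c - X[p]) + (X[p1] - X[p2])           # Eq. (7): global + local information
    return X[p] + r_a * D_c                        # Eq. (8): mutant individual X_v
```

Note that with r_a = 0 the mutant coincides with X_p, and larger fuzzy steps move it further along D_c.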

Fuzzy Crossover Operator
All dimensions of X ν are mutated. To make the search stable, selecting the values on some dimensions from X ν and then copying them into the corresponding dimensions of X u is necessary. This is achieved by a fuzzy crossover operator. The term fuzzy crossover operator refers to a crossover operator that uses the fuzzy crossover factor r b .
Let j denote the jth dimension of X_p and X_v. The crossover operation on the jth dimension can be expressed as

X_u^j = X_v^j if r ≥ r_b^j; otherwise, X_u^j = X_p^j,    (9)

where r_b is the fuzzy crossover factor, a random vector between (0, 1); r_b^j is the value in the jth dimension of r_b; and r is a random number (scalar) following the standard uniform distribution. The condition r ≥ r_b^j indicates that the value of the jth dimension of X_v is copied to the jth dimension of X_u using a roulette strategy.
After completing the operation of Equation (9), a dimension w is randomly specified, and then the operation X_u^w = X_v^w is performed to generate the new individual X_u, which guarantees that X_u differs from X_p in at least one dimension. The term fuzzy control parameter means that the control parameters r_a and r_b are not directly related through formulas, but both affect the diversity and convergence of FSGWO, and there is an inherent fuzzy correlation between them. Therefore, these two control parameters are treated as related in this paper, and a bivariate joint normal distribution is used to describe that fuzzy relationship:

r_c ∼ N(µ, Σ),    (10)

where µ = [µ_ra, µ_rb], and µ_ra and µ_rb are both scalars. The covariance matrix Σ is defined as

Σ = [ s_2²          s_1·s_2·s_3
      s_1·s_2·s_3   s_3² ],    (11)

where s_1 is a random number (scalar) following the standard uniform distribution, and s_2 and s_3 are random numbers (scalars) following the standard normal distribution.
The values of s_2 and s_3 obtained by sampling the standard normal distribution may be greater than 1, so the diagonal elements of Σ vary from iteration to iteration, which gives the sampled r_c diversity as well.
By sampling Equation (10) d times, a matrix r_c with d rows and two columns is obtained:

r_c = [r_a, r_b] = [ r_a^1 r_b^1 ; r_a^2 r_b^2 ; . . . ; r_a^d r_b^d ],    (12)

where d is the dimensionality of the MMOP. The parameters r_a and r_b are fuzzily related through Equation (10), so the crossover operation of Equation (9) is referred to as the fuzzy crossover operation. The control parameters in r_c are sufficient for one complete mutation and crossover operation on a d-dimensional individual.
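The sampling of r_c and the fuzzy crossover can be sketched together as follows. Clipping the sampled parameters into (0, 1) is our assumption: the text states that r_a and r_b are vectors between (0, 1), while normal samples are unbounded.

```python
import numpy as np

def sample_r_c(mu, sigma, d, rng):
    """Eq. (12): sample a d-by-2 matrix r_c from N(mu, sigma); column 0 is
    r_a, column 1 is r_b. Clipping to (0, 1) is an assumption."""
    r_c = rng.multivariate_normal(mu, sigma, size=d)
    return np.clip(r_c, 1e-6, 1.0)

def fuzzy_crossover(X_p, X_v, r_b, rng):
    """Eq. (9) plus the forced dimension: dimension j takes X_v[j] when
    r >= r_b[j]; afterwards one random dimension w is copied from X_v."""
    r = rng.random(X_p.shape[0])
    X_u = np.where(r >= r_b, X_v, X_p)
    w = rng.integers(X_p.shape[0])
    X_u[w] = X_v[w]                 # guarantees at least one mutated dimension
    return X_u
```

The forced copy at dimension w mirrors the step described after Equation (9), so X_u never degenerates to an exact copy of X_p.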

Updated Parameters of the Bivariate Joint Normal Distribution
To improve the diversity of r a and r b , it is necessary to update µ and ∑ before each iteration of FSGWO.

Update of µ
Updating µ with a fuzzy perturbation is described as follows. X_p before and after the update is denoted by X_p^old and X_p^new, respectively, and their fitness values by f(X_p^old) and f(X_p^new). The absolute value |f(X_p^old) − f(X_p^new)| represents the change in the fitness value caused by the update of X_p. In population X, the individual with the largest |f(X_p^old) − f(X_p^new)| is denoted by X_m, where m is the index of X_m:

m = argmax_p |f(X_p^old) − f(X_p^new)|.    (13)

The control parameters of X_m are stored in the mth row of r_c, denoted by r_c^m = [r_a^m, r_b^m]. r_c^m can be regarded as heuristic information for updating µ, namely, the fuzzy perturbation.
Updating µ with r_c^m can be written as

µ_new = (1 − c) · µ_old + c · r_c^m,    (14)

where c is a conversion factor, a constant between (0, 1). The term c · r_c^m takes part of r_c^m as heuristic information to update µ. To avoid an excessive perturbation that would cause the algorithm to diverge, c is usually set to 0.1 or 0.2.

Update of ∑
The update method for Σ is relatively simple. First, s_1 is obtained by sampling the standard uniform distribution; then s_2 and s_3 are sampled from the standard normal distribution. Finally, a new Σ is obtained by substituting s_1, s_2, and s_3 into Equation (11).
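The two updates can be sketched together. Because the original formulas are lost in the text, Eq. (14) is reconstructed here as a convex combination of the old µ and the fuzzy perturbation r_c^m, and Eq. (11) as a covariance with diagonal (s_2², s_3²) and correlation coefficient s_1; both reconstructions are assumptions.

```python
import numpy as np

def update_distribution(mu, r_c_m, c, rng):
    """Per-iteration update of N(mu, Sigma). The forms of Eq. (14) and
    Eq. (11) used here are reconstructions (assumptions), not the
    authors' exact formulas."""
    mu_new = (1.0 - c) * mu + c * r_c_m        # c = 0.1 or 0.2: mild perturbation
    s1 = rng.random()                          # standard uniform
    s2, s3 = rng.standard_normal(2)            # standard normal
    sigma_new = np.array([[s2 * s2, s1 * s2 * s3],
                          [s1 * s2 * s3, s3 * s3]])
    return mu_new, sigma_new
```

With correlation s_1 < 1, the rebuilt covariance is always positive semidefinite and can therefore be sampled in Equation (10).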

Steps of FSGWO
The steps of FSGWO are shown in Algorithm 2.

Algorithm 2. Steps of the FSGWO algorithm.
Input: grey wolf population size M, maximum number of iterations T, dimension d of the problem, upper bound ub and lower bound lb of the variables.
Output: optimal solution X_α.
1: Initialize the grey wolf population X.
2: Calculate the fitness value f(X_p) of each grey wolf. The fitness of the entire population is denoted by f(X_old); set f(X_new) = f(X_old).
3: Initialize the three leading wolves X_α, X_β, and X_δ.
4: for t = 1 to T
   4.1: Generate the control parameter matrix r_c with Equation (12).
   4.2: For each X_p, generate the mutant X_v with Equations (6)-(8) and the new individual X_u with Equation (9).
   4.3: If X_u is better than X_p, replace X_p with X_u (noninferior selection).
   4.4: Update the three leading wolves X_α, X_β, and X_δ.
   4.5: Update µ with Equations (13) and (14) and Σ with Equation (11).
   end for
5: Output the optimal solution X_α.
Some details of the algorithm steps in Algorithm 2 are described below. After an individual is updated, if the value of an element exceeds the upper bound, it is corrected to rand × ub; if it falls below the lower bound, it is corrected to rand × lb, where rand is a random number following the standard uniform distribution.
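The correction rule can be sketched as follows, assuming scalar bounds as in the CEC2014 experiments, where [lb, ub] = [−100, 100].

```python
import numpy as np

def repair_bounds(x, lb, ub, rng):
    """Out-of-range repair as described in the text: values above ub are
    reset to rand*ub and values below lb to rand*lb, with rand ~ U(0, 1)."""
    x = np.asarray(x, dtype=float).copy()
    high = x > ub
    low = x < lb
    x[high] = rng.random(high.sum()) * ub
    x[low] = rng.random(low.sum()) * lb
    return x
```

Because rand lies in (0, 1), the repaired value always falls back inside [lb, ub] rather than sticking to the boundary, which preserves some diversity near the bounds.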

Analysis of Computational Complexity
Let M be the population size, d the dimension of the problem, and T the number of iterations. According to the algorithm steps shown in Algorithm 2, the computational cost of FSGWO is concentrated in the iterative part: in each of the T iterations, M individuals of dimension d are mutated, crossed over, and evaluated, so the cost of the iterative part is O(T × M × d). This is of the same order as the iterative part of the original GWO; therefore, the computational complexity of FSGWO is the same as that of GWO.

Results
In this section, the FSGWO algorithm is verified on 30 test functions of IEEE CEC2014 [46] and 5 engineering application problems.
The compared algorithms include GWO [21], HPSOGWO [47], SOGWO [36], EO [48], and MPSO [49]. GWO is the original GWO. HPSOGWO is an improved GWO with a particle swarm individual update strategy. SOGWO uses selective opposition to enhance the diversity of GWO, and the convergence speed is faster. EO is inspired by control volume mass balance models used to estimate both dynamic and equilibrium states. MPSO is a particle swarm optimization algorithm using chaotic nonlinear inertia weights and has a good balance of diversity and convergence.
The key parameters of the competitive algorithms are shown in Table 1; the source codes of these algorithms were provided by the original papers. The parameter values of the competitive algorithms were taken from the original papers and the default settings of the source codes. The settings of the control parameters of FSGWO are described in Algorithm 2. In Table 1, N is the population size, which is uniformly set to 50 in this paper. In EO, a_1 is a constant that controls exploration ability, a_2 is a constant that manages exploitation ability, and GP is a parameter used to balance exploration and exploitation. In MPSO, c_1 and c_2 are the acceleration factors, and ω_1 and ω_2 are inertia weights used to balance exploration and exploitation. In GWO, parameter a is the neighborhood radius. In HPSOGWO, rand is a random number between (0, 1) and ω is an inertia weight. In SOGWO, parameter a is the neighborhood radius. In FSGWO, c is the conversion factor between (0, 1), and r_a and r_b are the adaptive control parameters.

Table 1. Key parameters of the competitive algorithms.

Results of the Test Functions of CEC2014
The IEEE Congress on Evolutionary Computation 2014 (CEC2014) test suite has 30 complex optimization functions [46], where F1-F3 are unimodal functions and F4-F30 are multimodal functions. In this paper, the proposed algorithm was verified on all 30 functions (F1-F30) of CEC2014. According to the requirements of the competition, the value range of each decision variable was [−100, 100], and the maximum number of fitness function evaluations was d × 10^4. The experiment was repeated 51 times. The absolute value |f(x) − f(x*)| was the final result of a run, where f(x) is the optimal value of the function obtained by the algorithm and f(x*) is the theoretical optimal value of the function. The smaller |f(x) − f(x*)| is, the closer the obtained optimum is to the theoretical optimum. If |f(x) − f(x*)| < 10^−8, the result was recorded as 0.

Table 2 shows the calculation results of the related algorithms, in which the mean (Mean) and standard deviation (STD) were calculated from the results of 51 runs of each algorithm. In Table 2, the optimal mean and standard deviation for each function are highlighted with a bold font and gray background.

From the mean results in Table 2, the results of FSGWO on the unimodal functions F1-F3 were at least four orders of magnitude better than those of GWO, HPSOGWO, and SOGWO. For F1, the exponent of the result of FSGWO was e+03, while the exponents of the results of GWO, HPSOGWO, and SOGWO were all e+07. For F2, the exponent of the result of FSGWO was e+00, while those of GWO, HPSOGWO, and SOGWO were e+09, e+09, and e+08, respectively. For F3, the exponent of the result of FSGWO was e+00, while those of GWO, HPSOGWO, and SOGWO were all e+04. The results on the unimodal functions show that FSGWO had excellent local optimization ability.
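The per-run scoring rule described above amounts to the following small helper; the function name is ours.

```python
def cec_error(f_best, f_star, tol=1e-8):
    """CEC2014 reporting rule from the text: the run's final result is
    |f(x) - f(x*)|, truncated to 0 when it falls below 1e-8."""
    err = abs(f_best - f_star)
    return 0.0 if err < tol else err
```

For example, a run ending within 1e-10 of the theoretical optimum is reported as 0, while a run ending 1.5 above it is reported as 1.5.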

Results of 30-Dimensional Test Functions
From the mean in Table 2, it can be seen that the calculation results of the proposed algorithm for 24 functions (F1-F5, F8-F12, F14, F16-F23, and F26-F30) were better than those of the competitive algorithms, accounting for 80.0%, which indicates that FSGWO could obtain high-quality solutions for MMOPs.
From the STD in Table 2, it can be seen that the results of the proposed algorithm for 23 functions (F1-F5, F8-F12, F14, F16-F23, F26, and F28-F30) were better than those of the competitive algorithms, accounting for 76.7%, which indicates that the FSGWO algorithm had good stability and convergence.
The mean results in Table 2 were analyzed by the Wilcoxon signed-rank test at a significance level of 0.05, and the results are shown in Table 3. In Table 3, all p values were less than 0.05, which means that the mean values of FSGWO were significantly different from those of the other algorithms. The results in Tables 2 and 3 show that the quality of the solutions obtained by FSGWO was better than that of the competitive algorithms for the 30D test functions.

According to the data in Table 2, the percentage improvement in calculation accuracy of FSGWO over each competing algorithm can be calculated, and the results are shown in Table 4. In Table 4, a negative number means that the calculation accuracy of FSGWO for that test function was not as good as that of the competitive algorithm, and the term Average represents the average percentage improvement of FSGWO over the competitive algorithm across the 30 test functions. Table 4 shows that for the 30 30D test functions, the average calculation accuracy of FSGWO was 46.98%, 54.35%, 64.84%, 69.02%, and 62.27% higher than those of EO, MPSO, GWO, HPSOGWO, and SOGWO, respectively. The calculation accuracy of FSGWO was significantly higher than those of the competitive algorithms.

Figure 2 shows box plots of the related algorithms for the 30 test functions, drawn from the 51 run results of each algorithm. In Figure 2, the short red line represents the median; the black box spans the upper quartile (Q3) and the lower quartile (Q1); and the blue solid prisms represent outliers.
For 27 functions (F1-F5, F7-F23, and F26-F30), the box length of FSGWO was shorter than those of the competitive algorithms or comparable to them, which meant that the proposed algorithm had good convergence; therefore, the results of 51 runs were relatively concentrated, and the box length was shorter.
For 28 functions (F1-F23 and F26-F30), the median of the proposed algorithm was smaller than those of the competitive algorithms or comparable to them, which indicated that the proposed algorithm had good diversity and local optimization ability and could find high-quality solutions.
For 28 functions (F1-F25 and F28-F30), the number of outliers of FSGWO was smaller than those of the compared algorithms, or the outliers were mainly distributed around the median, which indicates that the calculation results of FSGWO were close to a normal distribution and the robustness of FSGWO was better than that of the compared algorithms.

From the changing tendencies of the convergence curves in the early stage of the iterations, the fitness values of the proposed algorithm decreased faster than those of the competitive algorithms for 29 functions (F1-F5 and F7-F30), which indicates that FSGWO had good diversity and convergence and could quickly locate the optimal solution in the early stage of the iterations.
From the overall change tendencies of the curves and the final convergence positions, for the 25 functions (F1-F5, F8-F12, F14-F23, and F26-F30), the convergence curves of FSGWO were better than those of the competitive algorithms.
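The per-function and average improvement percentages reported in Tables 4 and 7 can, in principle, be reproduced from the mean errors. The exact formula is not stated in the paper, so the relative-difference form below is an assumption.

```python
import numpy as np

def improvement_pct(fsgwo_means, other_means):
    """Per-function improvement of FSGWO over a competitor, assumed to be
    (other - fsgwo) / other * 100; negative entries mean FSGWO was worse
    on that function, and the returned scalar corresponds to the 'Average'
    row. This is a sketch, not the paper's verified formula."""
    fsgwo_means = np.asarray(fsgwo_means, dtype=float)
    other_means = np.asarray(other_means, dtype=float)
    pct = (other_means - fsgwo_means) / other_means * 100.0
    return pct, float(pct.mean())
```

Under this convention, an entry of 50.0 means FSGWO halved the competitor's mean error on that function.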

Results of 50-Dimensional Test Functions
The IEEE CEC2014 test functions were then set to 50 dimensions. Table 5 shows the calculation results of the related algorithms for the 50-dimensional functions; the optimal mean and standard deviation for each function are highlighted with a bold font and gray background. As the dimension increased, so did the complexity of the MMOPs. For 23 functions (F1-F5, F7-F12, F14, F16-F18, F20-F23, F26, and F28-F30), the mean values of FSGWO were better than those of the competitive algorithms, accounting for 76.7%.

The Wilcoxon signed-rank test was used to analyze the mean values in Table 5, with a significance level of 0.05; the results are shown in Table 6. All p values were less than 0.05, which means that the calculation results of FSGWO for the 50-dimensional functions were significantly different from those of the competitive algorithms.

According to the data in Table 5, the percentage improvement in calculation accuracy of FSGWO over each competing algorithm can be calculated, and the results are shown in Table 7. In Table 7, a negative number means that the calculation accuracy of FSGWO for that test function was not as good as that of the competitive algorithm, and the term Average represents the average percentage improvement of FSGWO over the competitive algorithm for the 30 test functions. Table 7 shows that for the 30 50-dimensional test functions, the average calculation accuracy of FSGWO was 33.63%, 46.45%, 62.94%, 64.99%, and 59.82% higher than those of EO, MPSO, GWO, HPSOGWO, and SOGWO, respectively. The calculation accuracy of FSGWO was significantly higher than those of the competitive algorithms for the 50D test functions.

Figure 4 shows the convergence curves of the related algorithms for the 50-dimensional functions, plotted with the average of 51 runs of each algorithm.
From the changing tendencies of the convergence curves and the final convergence positions, the convergence tendencies of the proposed algorithm for 25 test functions (F1-F5, F7-F14, F16-F18, F20-F23, and F26-F30) were better than those of the competitive algorithms.

Verification of the Validity of the Fuzzy Control Parameters
In this paper, the correlation between control parameters r a and r b is modeled by the bivariate joint normal distribution N(µ, ∑). A comparative experiment was conducted to verify the effectiveness of that modeling idea. The control parameters r a and r b were assumed to be independent random variables. In addition, both parameters followed the standard uniform distribution; the other parts of the FSGWO algorithm remained unchanged, and the new FSGWO at this time was denoted as FSGWO1.
The 30-dimensional functions (F1-F30) were solved by FSGWO and FSGWO1, and the calculation results are shown in Table 8, in which the optimal mean and standard deviation for each function are highlighted with a bold font and gray background. For 24 complex multimodal functions (F1-F5, F7-F12, F14-F16, F18, F19, F21-F23, F25, F26, and F28-F30), the mean values of FSGWO were better than those of FSGWO1, accounting for 80.0%. The Wilcoxon signed-rank test was used to analyze the mean values in Table 8 at a significance level of 0.05, and the results are shown in Table 9. The p value (2.7610e−03) was less than the significance level, indicating that there was a substantial difference in Mean between FSGWO and FSGWO1.

Figure 5 shows the convergence curves of FSGWO and FSGWO1 for the 30 30D test functions, plotted with the averages of 51 runs of the two algorithms. The change tendencies and the final convergence positions of FSGWO were better than or comparable to those of FSGWO1 for 29 functions (F1-F5 and F7-F30).

According to the results of Tables 8 and 9 and Figure 5, the optimization ability of FSGWO was better than that of FSGWO1. Therefore, using the bivariate joint normal distribution as a fuzzy method to realize the adaptive adjustment of the control parameters r_a and r_b is valid, and this fuzzy method helps improve the convergence speed of FSGWO and the quality of the solutions.

Verification of the Effectiveness of the Fuzzy Perturbation Strategy
In Equation (14), the fuzzy perturbation r_c^m is used to update the mean µ of the bivariate joint normal distribution. A comparative experiment was used to verify that r_c^m is an effective design. In FSGWO, r_c^m is extracted from r_c using the index m given by Equation (13); in the comparison variant, m was randomly generated instead of being computed with Equation (13), the other parts of FSGWO remained unchanged, and the resulting algorithm was denoted FSGWO2.
The 30-dimensional functions were solved with FSGWO and FSGWO2, and the results are shown in Table 10, in which the optimal mean and standard deviation for each function are highlighted with a bold font and gray background. The mean values of FSGWO were better than those of FSGWO2 for 24 functions (F1-F5, F7-F12, F14, F15, F17-F19, F21-F24, F26, and F28-F30), accounting for 80.0%. Table 11 presents the result of the Wilcoxon test for the mean values of Table 10 at a significance level of 0.05. The p value was 2.9719e−03, which was less than the significance level, indicating that there was a significant difference in the mean between FSGWO and FSGWO2.

Figure 6 shows the convergence curves of FSGWO and FSGWO2, plotted with the averages of 51 runs of the two algorithms. FSGWO converged faster than FSGWO2 or was comparable to FSGWO2 for 29 functions (F1-F5 and F7-F30), accounting for 96.7%. The results in Tables 10 and 11 and Figure 6 show that the design idea of the fuzzy perturbation is effective.

Results for Economic Load Dispatch Problems of Power Systems
Economic load dispatch (ELD) is a complex optimization problem with constraints in power systems [50,51]. The task of ELD is to reasonably dispatch the load of the system to each generator to minimize the fuel cost of the system and satisfy the relevant constraints.

The Basic Model of the ELD Problem
The economic load dispatch of thermal power units is discussed in this paper. The fuel cost of an ELD can be approximately expressed as

min F = Σ_{i=1}^{N_G} F_i(P_i),  F_i(P_i) = a_i P_i² + b_i P_i + c_i + |e_i sin(f_i (P_i^min − P_i))|,

where i is the ID of a generator unit and N_G is the total number of generators, which is the dimension of the ELD. For the ith unit, P_i is the output power, P_i^min is the minimum output power, F_i(P_i) is the generation cost function, and a_i, b_i, c_i, e_i, and f_i are the power generation cost coefficients. The absolute value operator |·| converts the negative domain of the sine function into a positive domain to generate multimodality; thus, the ELD is a constrained high-dimensional multimodal optimization problem.
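The valve-point cost function above can be sketched directly; the role assignment of a_i, b_i, c_i follows the common convention (quadratic, linear, constant), which is an assumption, and all coefficient values in the usage example are illustrative.

```python
import numpy as np

def fuel_cost(P, a, b, c, e, f, P_min):
    """Total ELD fuel cost with valve-point effects:
    F_i(P_i) = a_i*P_i^2 + b_i*P_i + c_i + |e_i*sin(f_i*(P_i^min - P_i))|.
    A sketch with illustrative coefficient arrays, not the CEC2011 data."""
    P = np.asarray(P, dtype=float)
    cost_i = a * P**2 + b * P + c + np.abs(e * np.sin(f * (P_min - P)))
    return float(cost_i.sum())
```

Setting e_i = 0 removes the valve-point ripple and reduces each F_i to a smooth quadratic, which is a convenient sanity check.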
The main constraints of ELDs are as follows.
(i) Power balance constraints.
These constraints require that the total power generation of each unit is equal to the system load P D and the transmission loss P L .
where P min i and P max i are the lower and upper limits of the output power of the ith unit, respectively. These constraints require that P i is between [P min i , P max i ].
where P t−1 i and P t i are the output power of the ith unit in the (t − 1)th period and the tth period, respectively; ∆P i is the maximum change rate of the output power of the ith unit in the two adjacent periods. When multiperiod load dispatch is involved, (iv) Prohibited operating zones.
where P pz i and P pz i are the lower and upper limits of the prohibited operation zones of the ith unit, respectively. As there are physical limitations of generator components and unstable factors such as steam valve or bearing vibration, the output power of the ith unit is prohibited in some zones.
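As a concrete illustration, the fuel-cost objective with the valve-point (sine) term and a quadratic penalty for the power balance constraint can be sketched as follows. The three-unit coefficients, limits, and load below are hypothetical illustrative values, not taken from the CEC2011 dataset.

```python
import math

# Hypothetical coefficients for a 3-unit system: (a_i, b_i, c_i, e_i, f_i)
COEFFS = [
    (0.0016, 7.92, 561.0, 300.0, 0.0315),
    (0.0048, 7.97, 78.0, 150.0, 0.063),
    (0.0019, 7.85, 310.0, 200.0, 0.042),
]
P_MIN = [100.0, 50.0, 100.0]   # lower output limits (MW), illustrative
P_MAX = [600.0, 200.0, 400.0]  # upper output limits (MW), illustrative
P_D = 850.0                    # system load (MW); transmission loss ignored here

def fuel_cost(P):
    """Total fuel cost with the valve-point term |e * sin(f * (Pmin - P))|."""
    total = 0.0
    for (a, b, c, e, f), pmin, p in zip(COEFFS, P_MIN, P):
        total += a * p**2 + b * p + c + abs(e * math.sin(f * (pmin - p)))
    return total

def balance_penalty(P, weight=1e6):
    """Quadratic penalty for violating the power balance constraint."""
    return weight * (sum(P) - P_D) ** 2

def fitness(P):
    return fuel_cost(P) + balance_penalty(P)
```

A metaheuristic such as FSGWO would then minimize `fitness` over the box [P_i^min, P_i^max]^N_G, handling ramp rates and prohibited operating zones with additional penalties or repair operators.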

Results of ELD Cases
The data of the static ELD cases were taken from the IEEE CEC2011 competition dataset [51]; the numbers of units were 40 and 140, respectively. According to the requirements of the competition, the maximum number of fitness function evaluations was 15,000. The best result (best), mean, median, worst result (worst), and standard deviation (STD) were computed over 25 independent runs of each algorithm.
In addition to the six algorithms shown in Table 1, the compared algorithms included the island-based harmony search (iHS) [52], intellects-masses optimizer (IMO) [53], modified intellects-masses optimizer (MIMO) [53], adaptive population-based simplex (APS 9) [54], enhanced salp swarm algorithm (ESSA) [55], and the genetic algorithm with a new multiparent crossover (GA-MPC) [56]. iHS is a multipopulation evolutionary algorithm based on an island-model harmony search. IMO is a dual-population cultural algorithm whose parameters hardly need to be adjusted. MIMO is an IMO algorithm with a trust-region reflection strategy and strong local search ability. APS 9 is an improved adaptive population-based simplex method. ESSA is a multistrategy enhanced salp swarm algorithm. GA-MPC is a genetic algorithm whose crossover combines three consecutive parents. Table 12 shows the results of the related algorithms for the 40-unit case. The results of the 1st-6th algorithms were computed for this paper, and the results of the 7th-12th algorithms were taken from the original papers. In Table 12, the best values of the indicators are highlighted with bold font and a gray background; the symbol "-" indicates that the corresponding data are not provided in the original paper. For the mean index in Table 12, the exponents of the results of all the algorithms were e + 05, and the coefficients were also very close, which means that all the algorithms could approximate the optimal solution; the small differences among the results were mainly caused by the different local optimization abilities of the algorithms. The best, mean, and median of FSGWO were 1.2260e + 05, 1.2514e + 05, and 1.2519e + 05, respectively, which were better than those of the competitive algorithms, indicating that FSGWO had relatively strong local optimization ability and stability. Table 13 shows the results of the related algorithms for the 140-unit case.
In Table 13, the best values of the indicators are highlighted with a bold font and gray background. The best, mean, and median of FSGWO were 1.7551e + 06, 1.8119e + 06, and 1.8107e + 06, respectively, which were still better than those of the competitive algorithms, indicating that the proposed algorithm retained excellent optimization performance on the high-dimensional ELD problem. The simplex of ESSA and the trust region of MIMO are both classical numerical optimization strategies; Table 13 shows that the fuzzy search strategy of FSGWO was competitive with these numerical strategies when used to solve high-dimensional ELD problems. According to the results in Tables 12 and 13, when FSGWO was used to solve high-dimensional ELD problems, its optimization ability was better than those of the competitive algorithms, and FSGWO could obtain higher-quality solutions.

Design of Three-Bar Truss
In structural engineering, a truss is a triangulated system that provides an efficient way to span long distances. Because the members of a truss carry only axial force, the purpose of a truss design is to use less material while maintaining the effectiveness of the entire system. A reduction in the amount of material used is usually expressed as a reduction in the diameter of a member. A three-bar planar truss structure is shown in Figure 7 [57]. In this problem, x_1, x_2, and x_3 are the normalized diameters of the three members, and x_3 has the same diameter as x_1. The aim is to minimize the volume of the three-bar truss by optimizing the values of x_1 and x_2. The problem shown in Figure 7 can be expressed as an optimization problem (Equation (20)), where x_1 and x_2 are between [0, 1]. FSGWO was applied to solve the optimization problem of Equation (20). The competitive algorithms included the memory-based grey wolf optimizer (m-GWO) [58], modified sine cosine algorithm (m-SCA) [59], moth-flame optimization (MFO) [60], and cuckoo search (CS) [57]. Table 14 shows the results of the related algorithms for the three-bar truss design problem. In Table 14, the best objective function value is highlighted with a bold font and gray background. The data for the competitive algorithms are taken from the original papers. From Table 14, the objective function value of FSGWO is 263.8958, which is better than those of the competitive algorithms.

Design of Pressure Vessel
Figure 8 shows a cylindrical pressure vessel, which has a hemispherical head at the end and is designed according to the ASME boiler and pressure vessel code [57]. This problem has four decision variables: the thickness of the shell (Ts), the thickness of the head (Th), the inner radius (R), and the length of the cylindrical section without the head (L). The goal of this problem is to minimize the production cost of the vessel while satisfying the relevant conditions. The four decision variables of the pressure vessel are represented by x_1, x_2, x_3, and x_4. The problem shown in Figure 8 can be expressed as an optimization problem (Equation (21)), where x_1 and x_2 are between [0, 99] and x_3 and x_4 are between [10, 200].
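Equation (21) itself is not reproduced above; the sketch below uses the widely cited formulation of the pressure vessel benchmark from the literature (an assumption on our part), with a simple penalty for its four standard inequality constraints.

```python
import math

def vessel_cost(x):
    """Material, forming, and welding cost in the common formulation."""
    x1, x2, x3, x4 = x  # Ts, Th, R, L
    return (0.6224 * x1 * x3 * x4
            + 1.7781 * x2 * x3**2
            + 3.1661 * x1**2 * x4
            + 19.84 * x1**2 * x3)

def vessel_penalty(x, weight=1e9):
    """Penalty for the four standard inequality constraints g_k(x) <= 0."""
    x1, x2, x3, x4 = x
    g = [
        -x1 + 0.0193 * x3,                                          # shell thickness
        -x2 + 0.00954 * x3,                                         # head thickness
        -math.pi * x3**2 * x4 - (4.0/3.0) * math.pi * x3**3 + 1296000.0,  # volume
        x4 - 240.0,                                                 # length limit
    ]
    return weight * sum(max(0.0, gi) ** 2 for gi in g)

def vessel_fitness(x):
    return vessel_cost(x) + vessel_penalty(x)
```

For example, the point x = (0.8, 0.4, 40, 210) satisfies all four constraints, so its fitness equals its cost; FSGWO would then search the box given above for the cheapest feasible design.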
FSGWO was applied to solve the optimization problem of Equation (21). The competitive algorithms included the grey wolf optimization method based on a beetle antennae strategy (BGWO) [61], the improved grey wolf optimizer (I-GWO) [62], moth-flame optimization with orthogonal learning and Broyden-Fletcher-Goldfarb-Shanno (BFGSOLMFO) [63], and the slime mould algorithm (SMA) [64]. Table 15 shows the results of the related algorithms for the pressure vessel problem. In Table 15, the best objective function value is highlighted with a bold font and gray background. The data for the competitive algorithms are taken from the original papers. From Table 15, the objective function value of FSGWO is 5885.3328, which is better than those of the competitive algorithms.

Design of Gear Train
Figure 9 shows a gear train design problem with four gears A, B, C, and D [59]. The numbers of teeth of the four gears are represented by the variables x_1, x_2, x_3, and x_4; each is an integer between [12, 60]. The goal of this problem is to make the gear ratio as close as possible to the target value of 1/6.931. The problem of Figure 9 can be expressed as an optimization problem:
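The objective of Equation (22) is omitted above; in the formulation commonly used for this benchmark (an assumption on our part), it is the squared deviation of the gear ratio from 1/6.931:

```python
def gear_error(x):
    """Squared deviation of the gear ratio (x2 * x3) / (x1 * x4) from 1/6.931."""
    x1, x2, x3, x4 = x  # integer numbers of teeth
    return (1.0 / 6.931 - (x2 * x3) / (x1 * x4)) ** 2

# One near-optimal integer design reported in the literature:
best = (49, 16, 19, 43)
print(gear_error(best))  # on the order of 1e-12
```

Evaluating this design gives an error of about 2.7e-12, consistent in magnitude with the Table 16 results quoted below.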
FSGWO was applied to solve the optimization problem of Equation (22). The competitive algorithms included m-SCA [59], CS [57], the linear prediction evolution algorithm (LPE) [65], and the hybrid grey wolf optimizer and sine cosine algorithm (GWOSCA) [66]. Table 16 shows the results of the related algorithms for the gear train design problem. In Table 16, the best objective function values are highlighted with a bold font and gray background. The data for the competitive algorithms are taken from the original papers. From Table 16, the objective function value of FSGWO is 2.7009e-12, which is as good as those of m-SCA, CS, and LPE. Moreover, Table 16 shows that gear train design is a typical multimodal optimization problem: for the objective function value of 2.7009e-12, there are three different nearly optimal solutions that correspond to different gear train designs. According to the cost, volume, weight, and reliability of the gear train, decision-makers can choose a design that meets their requirements from among these solutions.

Design of Cantilever Beam
Figure 10 shows a cantilever beam design problem with five nodes [59]. Each node in Figure 10 is regarded as a square hollow cross-section with constant thickness. The first node is fixedly supported, and an external vertical force acts at the end of the fifth node. The variable x_i represents the width of the cross-section of the ith node, and its value is between [0.01, 100]. The goal of this problem is to minimize the weight of the cantilever beam. The problem of Figure 10 can be expressed as an optimization problem (Equation (23)). FSGWO was applied to solve the optimization problem of Equation (23). The competitive algorithms included CS [57], BGWO [61], m-SCA [59], and MFO [60]. Table 17 shows the results of the related algorithms for the cantilever beam design problem. In Table 17, the best objective function values are highlighted with a bold font and gray background.
The data for the competitive algorithms are taken from the original papers. From Table 17, the objective function value of FSGWO is 1.33996, which is as good as that of BGWO.
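Equation (23) is not reproduced above; the sketch below uses the formulation commonly cited for this benchmark (an assumption on our part), in which the weight is proportional to the sum of the widths and a single stiffness constraint applies.

```python
def beam_weight(x):
    """Weight of the cantilever beam, proportional to the sum of the widths."""
    return 0.0624 * sum(x)

def beam_constraint(x):
    """Stiffness constraint g(x) <= 0 in the common formulation."""
    c = (61.0, 37.0, 19.0, 7.0, 1.0)
    return sum(ci / xi**3 for ci, xi in zip(c, x)) - 1.0

# A near-optimal design reported in the literature (approximate values):
x_star = (6.016, 5.309, 4.494, 3.502, 2.153)
```

At this design the weight is about 1.34, matching the magnitude of the Table 17 results, and the stiffness constraint is active (g approximately 0), as expected for a weight-minimal design.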

Discussion
The convergence curves of Figures 3 and 4 show that the fitness values of the FSGWO algorithm decreased faster than those of the competitive algorithms in the early stage of iterations. This advantage is related to the improvement of FSGWO in population updates.
Step 4.2.4 in Algorithm 2 utilizes the noninferior selection strategy for population updating, which allows a new individual to enter the population only if it is not inferior to the individual it replaces. As the entire grey wolf population can be used to estimate the position of the optimal solution, the probability of detecting the region where the optimal solution is located increases, and FSGWO converges faster in the early stage of the iterations. In contrast, the traditional GWO uses only three leading wolves to estimate the region of the optimal solution, so the probability of finding the optimal solution is relatively low and the convergence curve decreases slowly.
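The population update described above can be sketched as a simple greedy (noninferior) replacement; the function names here are illustrative, not taken from the paper's pseudocode.

```python
def noninferior_update(population, offspring, fitness):
    """Replace each parent with its offspring only if the offspring is not worse.

    population, offspring: lists of candidate solutions (e.g., lists of floats)
    fitness: callable mapping a solution to a value to be minimized
    """
    new_pop = []
    for parent, child in zip(population, offspring):
        new_pop.append(child if fitness(child) <= fitness(parent) else parent)
    return new_pop

# Toy usage with a 1-D sphere function:
pop = [[3.0], [-2.0], [0.5]]
off = [[1.0], [-2.5], [0.1]]
new = noninferior_update(pop, off, lambda x: x[0] ** 2)
print(new)  # [[1.0], [-2.0], [0.1]]
```

Because a slot is never overwritten by a worse solution, every individual in the population remains a usable estimate of the optimal region, which is the property the discussion relies on.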
The results in Tables 2-17 and Figure 2 show that the FSGWO algorithm can obtain higher-quality solutions, which indicates that the proposed algorithm has strong optimization ability and stability. These advantages of FSGWO come from the following improvements. (i) The fuzzy search direction D_c incorporates both global and local search information, which enhances the ability of FSGWO to approximate the optimal solution, thereby producing high-quality solutions. (ii) The new individual X_u generated by the fuzzy crossover operator does not mutate in all dimensions, which effectively controls the divergence of the algorithm and improves the stability and robustness of FSGWO. (iii) The binary joint normal distribution and fuzzy perturbation adaptively adjust the control parameters r_a and r_b of FSGWO, which not only reduces the blindness of control parameter selection but also improves the local search ability and stability of FSGWO.
Other evolutionary algorithms also have control parameters, and the proposed modeling idea is also suitable for those algorithms. For example, suppose an evolutionary algorithm has four control parameters, denoted as r_1, r_2, r_3, and r_4. The internal relation of these four control parameters can be modeled by a quaternary joint normal distribution N(µ, Σ) with mean vector µ = (µ_1, µ_2, µ_3, µ_4)^T, where µ_1, µ_2, µ_3, and µ_4 are scalars between (0, 1) whose initial values are 0.5. The covariance matrix Σ is constructed from a term s_1, which is a random number following the standard uniform distribution, and four terms s_2-s_5, which are random numbers following the standard normal distribution; the initial values of s_1-s_5 are all 0.1. A set of control parameters (r_1, r_2, r_3, r_4) can then be obtained by sampling the quaternary joint normal distribution N(µ, Σ). In the iterative process, µ and Σ are updated in the same way as in this study.

In summary, the experimental results show that the improvements of FSGWO in balancing diversity and convergence are feasible and effective. The proposed algorithm can produce high-quality solutions when used to solve high-dimensional complex MMOPs and has good convergence and stability.
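As an illustration, sampling a set of control parameters from a quaternary normal distribution can be sketched as follows. The covariance matrix used here is a simple symmetric positive-definite placeholder, since the paper's exact construction of Σ from s_1-s_5 is not reproduced above.

```python
import math
import random

def cholesky(A):
    """Cholesky factor L (lower triangular) of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def sample_control_params(mu, cov, rng=random):
    """Draw (r1, r2, r3, r4) ~ N(mu, cov) and clip each component into (0, 1)."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    r = [mu[i] + sum(L[i][k] * z[k] for k in range(len(mu))) for i in range(len(mu))]
    return [min(max(ri, 1e-6), 1.0 - 1e-6) for ri in r]

mu = [0.5, 0.5, 0.5, 0.5]  # initial means from the text
# Placeholder covariance: small variances with mild positive correlation
cov = [[0.01 if i == j else 0.002 for j in range(4)] for i in range(4)]
params = sample_control_params(mu, cov)
```

Each iteration of such an algorithm would redraw (r_1, r_2, r_3, r_4) from the current N(µ, Σ) and then update µ and Σ from the search feedback, exactly as the text describes for FSGWO's two parameters.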

Conclusions
To address the issue that the traditional GWO solves high-dimensional MMOPs with a slow convergence speed and low-quality solutions, a fuzzy strategy grey wolf optimizer (FSGWO) is proposed in this paper; its key improvements are as follows. (i) A new individual mutation strategy is proposed, which utilizes both global and local search information in the fuzzy search direction of mutation and enhances the ability of grey wolf individuals to find the optimal solutions. (ii) A fuzzy crossover operator prevents new individuals from mutating in all dimensions, which effectively improves the local search ability of FSGWO and the quality of its solutions. (iii) The noninferior selection strategy is applied to update the population, so only noninferior new individuals are allowed to enter the population. Therefore, the entire grey wolf population can be used to estimate the region where the optimal solution is located, which speeds up the convergence of FSGWO. (iv) The two control parameters of FSGWO are modeled by a binary joint normal distribution whose parameters are adaptively updated by a fuzzy perturbation, which effectively reduces the blindness of control parameter selection and improves the stability of the proposed algorithm. Finally, FSGWO is verified on 30 complex test functions of IEEE CEC2014 and 5 engineering application problems; the results show that the convergence of the proposed algorithm and the quality of its solutions are better than those of the competitive algorithms, which means that the improvements of FSGWO are feasible and effective.
Recent studies have shown that multiple populations have advantages over a single population in maintaining diversity and convergence [67,68]. In future work, we are interested in some novel topics on GWOs with multiple populations, such as methods for exchanging optimal-solution information between different populations and the design of individual search directions in multiple populations. In addition, state-of-the-art evolutionary algorithms, such as self-adaptive quasi-oppositional stochastic fractal search [69] and combined social engineering particle swarm optimization [70], have many creative population update strategies that can be used for reference in future improvements of FSGWO.