A Comparative Study of Common Nature-Inspired Algorithms for Continuous Function Optimization

Over the past decades, many nature-inspired optimization algorithms (NIOAs) have been proposed and widely applied. Several surveys have investigated individual NIOAs, their variants and their applications; however, these studies mainly focus on a single NIOA, and a comprehensive comparative study across the existing NIOAs is still lacking. To fill this gap, we conducted this comprehensive survey. More than 120 meta-heuristic algorithms were collected and, among them, the 11 most popular and common NIOAs were selected. Their accuracy, stability, efficiency and parameter sensitivity are evaluated on the 30 black-box optimization benchmarking (BBOB) functions. Furthermore, we apply the Friedman test and the Nemenyi test to analyze the performance of the compared NIOAs. We provide a unified formal description of the 11 NIOAs in order to compare their similarities and differences in depth, together with a systematic summary of the challenging problems and research directions for the NIOAs field as a whole. This comparative study aims to provide a broader perspective on, and meaningful insight into, NIOAs.


Introduction
Nature-inspired optimization algorithms (NIOAs) are a group of algorithms inspired by natural phenomena, including swarm intelligence, biological systems and physical and chemical systems [1]. NIOAs comprise bio-inspired algorithms and physics- and chemistry-based algorithms; the bio-inspired algorithms further include swarm intelligence-based algorithms and evolutionary algorithms [1]. NIOAs are an important branch of artificial intelligence (AI) and have made significant progress in the last 30 years. Thus far, a large number of common NIOAs and their variants have been proposed, such as the genetic algorithm (GA) [2], the particle swarm optimization (PSO) algorithm [3], the differential evolution (DE) algorithm [4], the artificial bee colony (ABC) algorithm [5], the ant colony optimization (ACO) algorithm [6], the cuckoo search (CS) algorithm [7], the bat algorithm (BA) [8], the firefly algorithm (FA) [9], the immune algorithm (IA) [10], grey wolf optimization (GWO) [11], the gravitational search algorithm (GSA) [12] and the harmony search (HS) algorithm [13]. In addition to theoretical studies of NIOAs, many previous works have investigated in depth how NIOAs are applied to various domains. Individual NIOAs have been reviewed comprehensively [14][15][16][17][18][19][20][21][22][23][24][25], with the algorithms and their variants presented at a good breadth and depth. In the rest of this chapter, we summarize the existing survey work on NIOAs, discuss our motivations for this survey, present our research methodology and the scope of this work and, finally, describe our contributions to this field.

Scope of Discussion
This survey focuses on single-objective numerical optimization algorithms, which are the basis of more complex optimization algorithms, such as multi-objective optimization algorithms and constrained optimization algorithms. We exclude the ant colony optimization (ACO) algorithm, ranked fourth in Table S1 of Supplementary Material A, because it is designed to solve combinatorial optimization problems (COPs) [6] and is therefore out of the scope of this work. We also exclude the "Biogeography-Based Optimization" algorithm, ranked thirteenth in Table S1 of Supplementary Material A, since both its total number of published articles (950) and its average number of articles published annually (73) are lower than the thresholds set for the selection of NIOAs. We therefore consider the selection of the 11 algorithms reasonable. In this paper, we do not cover the variants and applications of the 11 NIOAs, since they have been discussed sufficiently in many reviews [14][15][16][17][18][19][20][21][22][23][24][25]. This selection method identifies the NIOAs that remain continuous research hotspots; some less studied algorithms are excluded, although they were proposed a long time ago (see Table S1 in Supplementary Material A).

Our Contributions
The contributions of this paper are as follows.
1. We present a comprehensive list of more than 120 meta-heuristic algorithms (MHAs) and make a preliminary statistical analysis of the basic information of the chosen NIOAs, which provides a panoramic view for the study of NIOAs. To our knowledge, this is the first attempt to systematically study the existing NIOAs.
2. We analyze and summarize the common characteristics and differences of all the chosen NIOAs to provide clear insight into the construction, design and application of NIOAs.
3. We compare and analyze the accuracy, stability, efficiency and parameter sensitivity of the chosen NIOAs on functions with different features in both low- and high-dimensional spaces, which reflects the essential characteristics of each algorithm.
4. We discuss the challenges and future directions of the whole NIOAs field, which provides a reference framework for future research on NIOAs.

Structure of the Paper
The rest of this paper is organized as follows. In Section 2, we extract a unified representation as a foundation for comparing the 11 NIOAs, and under this representation the principles of the 11 NIOAs are discussed. In Section 3, the common characteristics and differences of the 11 NIOAs are analyzed and summarized. Section 4 focuses on the comparative study of the accuracy, stability, efficiency and parameter sensitivity of these NIOAs on the 30 BBOB functions; the Friedman test and the Nemenyi test are used to analyze the performance of the compared NIOAs, and the 11 NIOAs are also applied to solve a constrained engineering optimization problem. We discuss the challenges and future directions of the NIOAs in Section 5 and conclude this paper in Section 6.

Common NIOAs
Most NIOAs share a similar structure, although they are defined in various forms. In this section, the common process is first extracted to offer a unified description of the NIOAs, and then the principles of the 11 NIOAs are outlined and discussed under this unified structure. The unified representation makes it convenient to analyze the similarities and dissimilarities of these algorithms.

The Common Process for the 11 NIOAs
The common process of most NIOAs is described in Figure 3 and can be divided into four steps. In step S1, the population and the related parameters are initialized. Usually, the initial population is generated by random methods, which ensures that it covers as much of the solution space as possible; the population size is selected based on expert experience and specific requirements and, generally, should be as large as possible. Most NIOAs use iterative methods, and the maximum number of iterations and the precision threshold are two common termination conditions, which should also be initialized in step S1. The fitness function is the unique indicator that reflects the performance of each individual solution; it is designed from the target function (i.e., the BBOB functions described in Section 4.1), which usually has a maximum or minimum value. Generally, an individual has its own local optimal solution, and the whole population has a global optimum. In step S2, the fitness values of the population are computed in each iteration, and if the global best solution satisfies the termination conditions, the NIOA outputs the results (in step S4). Otherwise, step S3 is executed, which performs the key operations (defined by various components or operators) to exchange information among the whole population in order to evolve excellent individuals. Then, the population is updated, and the workflow jumps back to step S2 to execute the next iteration. According to the above process, a set of commonly used symbols is given in Table 1 as a unified description for the 11 NIOAs, where D represents the dimension number of the objective functions, M is the individual number of each NIOA and N is the total number of iterations.
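As an illustration, the four steps S1 to S4 can be sketched as a generic minimization loop in Python. This is only a sketch: the function name `nioa_minimize` and the `variation` callback (which stands in for the algorithm-specific step S3) are illustrative names, not part of any surveyed algorithm.

```python
import random

def nioa_minimize(f, dim, bounds, pop_size=20, max_iter=100, delta=1e-8,
                  variation=None):
    """Generic skeleton of the four-step NIOA process (S1-S4).

    `variation` is the algorithm-specific step S3: it takes the population
    and the global best solution and returns a new candidate population.
    """
    lo, hi = bounds
    # S1: random initialization of the population and parameters.
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for t in range(max_iter):
        # S2: evaluate fitness and check the termination conditions.
        best = min(pop + [best], key=f)
        if f(best) <= delta:
            break
        # S3: algorithm-specific variation/update of the population
        # (here a plain Gaussian random walk if no callback is supplied).
        pop = variation(pop, best) if variation else [
            [x + random.gauss(0, 0.1) for x in ind] for ind in pop]
    # S4: output the optimized result.
    return best, f(best)
```

Each of the 11 algorithms below differs essentially only in how step S3 is realized.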

Table 1. Commonly used symbols for the unified description of the 11 NIOAs.

Conception | Symbol | Description
Space dimension | D | the dimension number of the objective functions
Individual (solution) | x i (t) | the expression of the i th solution on the t th iteration, also used to represent the i th individual
Local best solution | p i (t) | the best solution found so far by the i th individual
Global best solution | p g (t) | the best solution found so far by the whole population
Population size | M | the individual number of each NIOA
Total iterations | N | the total number of iterations
Precision threshold | δ | the precision required for termination

Genetic Algorithm (GA)
Holland [2] proposed the GA algorithm, which is based on the natural selection (named "selection operator s o "), genetic recombination (named "crossover operator c o ") and mutation (named "mutation operator m o ") mechanisms. The encoding method of the GA algorithm is decided by the specific problem, and common encoding schemes include binary, natural number, real number, matrix, tree and quantum encodings. There are many types of selection, crossover and mutation operators: for s o , roulette wheel selection, stochastic universal sampling, local selection and tournament selection; for c o , one-point, two-point, multi-point and uniform crossover; for m o , the basic mutation operator (which chooses one or more genes to change randomly) and the inversion operator (which randomly chooses two gene points and inverts the genes between them). The types of the three operators are associated with the encoding schemes. Supposing ϑ 1 , ϑ 2 and ϑ 3 are the probabilities of selection, crossover and mutation, respectively, the steps of the GA algorithm are described as Algorithm 1.
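For concreteness, one GA generation combining the three operators named above can be sketched in Python over a binary encoding. This is a minimal sketch, not the paper's Algorithm 1: the helper name `ga_step` is an assumption, and the fitness function must return positive values for roulette-wheel selection to be well defined.

```python
import random

def ga_step(pop, fitness, p_c=0.8, p_m=0.01):
    """One GA generation: roulette-wheel selection (s_o), one-point
    crossover (c_o) and basic bit mutation (m_o).
    `pop` is a list of equal-length bit lists; higher fitness is better."""
    fits = [fitness(ind) for ind in pop]
    total = sum(fits)

    def select():
        # Roulette wheel: pick an individual with probability ~ fitness.
        r = random.uniform(0, total)
        acc = 0.0
        for ind, fit in zip(pop, fits):
            acc += fit
            if acc >= r:
                return list(ind)
        return list(pop[-1])

    nxt = []
    while len(nxt) < len(pop):
        a, b = select(), select()
        if random.random() < p_c:
            # One-point crossover: swap tails after a random cut point.
            cut = random.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for child in (a, b):
            # Basic mutation: flip each bit with probability p_m.
            for i in range(len(child)):
                if random.random() < p_m:
                    child[i] = 1 - child[i]
            nxt.append(child)
    return nxt[:len(pop)]
```

A typical usage is `ga_step(pop, onemax)` with a "onemax"-style fitness that counts ones (shifted to stay positive).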

Particle Swarm Optimization (PSO) Algorithm
Kennedy [3] put forward the PSO algorithm, which simulates bird swarm behavior. The movement, represented by the position and velocity of the i th individual in the d th dimension for the (t + 1) th iteration, is described in Equation (1),
where c 1 and c 2 are the learning factors, and rand 1 and rand 2 are random numbers uniformly distributed in the range [0, 1]. The steps of the PSO algorithm are described as Algorithm 2.

Algorithm 2 PSO
Input: the parameters M, N, δ, c 1 and c 2
Begin
S1: initialize x i (t) and v i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p i (t) and p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: update x i (t) and v i (t) according to Equation (1), iterative times t = t + 1; go to step S2;
S4: output the optimized results.
End
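Since Equation (1) itself is not reproduced above, the sketch below implements the canonical PSO velocity/position update that it commonly denotes. The inertia weight `w` is an assumption of this sketch (the basic PSO of Kennedy omits it); all names are illustrative.

```python
import random

def pso_update(x, v, p_i, p_g, c1=2.0, c2=2.0, w=1.0):
    """Canonical PSO update for one particle: the new velocity mixes the
    old velocity with pulls toward the local best p_i and global best p_g."""
    new_v = [w * vd
             + c1 * random.random() * (pd - xd)   # cognitive term
             + c2 * random.random() * (gd - xd)   # social term
             for xd, vd, pd, gd in zip(x, v, p_i, p_g)]
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v
```

In practice the velocity is usually also clamped to a maximum magnitude, a detail omitted here.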

Artificial Bee Colony (ABC) Algorithm
Karaboga [5] presented the ABC algorithm, in which there are three kinds of bees: employed foragers, scouts and onlookers. An employed forager is associated with one food source and shares it with other bees with a certain probability; scouts are in charge of searching for new food sources, and onlookers find food sources through information shared by the employed foragers. The position of the i th individual on the t th iteration, x i (t), is generated by Equation (2).
Here, L d and U d are the lower and upper bounds in the d th dimensional space, respectively; rand (0, 1) is the random number uniformly distributed in the range (0, 1). Employed foragers search for new food sources according to Equation (3).
where 0 < i, j ≤ M, i ≠ j, and ϕ is a random number uniformly distributed in the range (0, 1). Onlookers choose food sources according to Equations (4) and (5). If a food source cannot be improved after Limit searches, the ABC algorithm deletes it and the corresponding employed forager becomes a scout. Supposing there are F employed foragers initially, the steps of the ABC algorithm are described as Algorithm 3.

Algorithm 3 ABC
Input: the parameters M, N, δ, F, Limit
Begin
S1: initialize M individuals x i (t) randomly by Equation (2), and appoint M/2 bees to be the employed foragers, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: employed foragers search for new food sources by Equation (3) and compute f (i); update a food source if the new one is better than the old one; onlookers choose the food sources of employed foragers according to Equations (4) and (5) and generate new food sources by Equation (3); update a food source if the new one is better than the old one; if some food sources need to be given up (they cannot be improved after Limit searches), the corresponding bees become scouts and generate new sources by Equation (2); iterative times t = t + 1, go to step S2;
S4: output the optimized results.
End
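A minimal sketch of the neighbourhood search of Equation (3) and the fitness-proportional onlooker selection of Equations (4) and (5). Following the text above, ϕ is drawn from (0, 1); the helper names `abc_candidate` and `onlooker_probs` are illustrative.

```python
import random

def abc_candidate(x_i, x_k):
    """Equation (3): perturb one random dimension of food source x_i
    relative to a randomly chosen neighbour x_k."""
    v = list(x_i)
    d = random.randrange(len(x_i))
    phi = random.random()  # per the text, phi ~ U(0, 1)
    v[d] = x_i[d] + phi * (x_i[d] - x_k[d])
    return v

def onlooker_probs(fits):
    """Equations (4)-(5): onlookers pick food sources with probability
    proportional to fitness (fitness assumed positive)."""
    total = sum(fits)
    return [f / total for f in fits]
```

An onlooker would then sample a source index from `onlooker_probs` and call `abc_candidate` on it.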

Bat Algorithm (BA)
Yang [8] presented the BA algorithm, which is based on the echolocation behavior of bats. Suppose the frequency of a sound wave is Freq ∈ [Freq min , Freq max ], where Freq min and Freq max are the lower and upper bounds, respectively. The sound intensity and the pulse emissivity are defined as A ∈ [A min , A max ] and r, respectively; the individuals update their positions by Equation (6),
where β i is a random number uniformly distributed in the range [0, 1]. A bat generates a new solution by Equation (7), where ε is a random number in the range [−1, 1] and Ā is the average sound intensity of all the bats. A i and r i are updated by Equations (8) and (9), where α and γ are two constants and r i 0 is the initial value of r i ; the steps of the BA algorithm are described as Algorithm 4.
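The updates of Equations (6) to (9) can be sketched in Python as follows. The default frequency range and the constants α and γ are illustrative assumptions, as are the helper names.

```python
import math
import random

def bat_move(x, v, p_g, freq_min=0.0, freq_max=2.0):
    """Equation (6): frequency, velocity and position update for one bat,
    pulled relative to the global best position p_g."""
    beta = random.random()
    freq = freq_min + (freq_max - freq_min) * beta
    new_v = [vd + (xd - gd) * freq for xd, vd, gd in zip(x, v, p_g)]
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v

def bat_local(x, mean_loudness):
    """Equation (7): random walk around a solution, scaled by the average
    sound intensity of all bats."""
    return [xd + random.uniform(-1, 1) * mean_loudness for xd in x]

def bat_params(A_i, r0_i, t, alpha=0.9, gamma=0.9):
    """Equations (8)-(9): loudness decays geometrically while the pulse
    emission rate grows toward its initial value r0_i."""
    return alpha * A_i, r0_i * (1 - math.exp(-gamma * t))
```
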

Algorithm 4 BA
Input: the parameters M, N, δ, Freq, A, α and γ
Begin
S1: initialize M individuals x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: update Freq, x i (t) and v i (t) by Equation (6); generate a random number rand 1 ; if rand 1 > r, the bat with the global optimum generates a new solution by Equation (7); bats generate new solutions randomly, and BA generates a random number rand 2 ; if rand 2 < A and the new solution is better than the old one, BA updates the corresponding position by Equation (7) and updates A i and r i by Equations (8) and (9); iterative times t = t + 1, go to step S2;
S4: output the optimized results.
End

Immune Algorithm (IA)
In 1958, Burnet [34] presented the clonal selection theory, and Bersini [10] first used an artificial immune system to solve a discrete problem. Generally speaking, the common immune algorithm (IA) adopts a learning strategy similar to that of GA, but IA uses the affinity to guide the searching process [35]. The affinity is defined by information entropy; the average information entropy H(i, j) of antibodies i and j is described as follows.
where H l (i, j) = ∑_{z=1}^{s} −p zl log p zl indicates the information entropy of the l th bit of the genes of antibodies i and j. p zl is the probability that the l th bit of the genes of antibodies i and j takes the gene letter K z , K z ∈ {K 1 , K 2 , . . . , K s }; s is the number of gene letters. The affinity between antibodies i and j reflects the similarity of the two antibodies and is defined as follows.
The concentration of antibodies reflects the diversity of the whole population. The density of the i th antibody is described as follows.
where h 1 is the threshold of the affinity. The activity degree refers to the comprehensive ability of the antibody to respond to antigen and be activated by other antibodies; generally, the antibody with large affinity and small concentration will have a large activity degree. The activity degree of the i th antibody is defined as follows.
where f it i is the fitness value of the i th antibody, Con i is the concentration of the i th antibody, and h 2 is the threshold of antibody density. The steps of IA are described as Algorithm 5.

Algorithm 5 IA
Input: the parameters M, N, δ, P c , P m
Begin
S1: initialize M antibodies x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: compute the affinity, the concentration and the activity degree according to Equations (11), (12) and (13); execute the selection operation by the roulette method to choose antibodies with a large activity degree, and then execute the crossover and mutation operations according to the probabilities P c and P m , respectively; iterative times t = t + 1, go to step S2;
S4: output the optimized results.
End

Firefly Algorithm (FA)
The FA algorithm [9] was proposed by Yang Xin-She, and its main operations are the update of the firefly luminance I, the computation of the firefly attraction degree β and the update of the firefly position. Suppose the attraction factor is γ, the maximum luminance is I 0 , the maximum attraction degree is β 0 and the step factor is step. The luminance is defined as Equations (14) and (15).
The attraction degree is described as Equation (16).
Equation (17) defines the position updating of the i th firefly when it moves toward the j th firefly. Here, x i (t) and x j (t) are the positions of the i th firefly and the j th firefly, respectively, at the iteration t. For convenience, we use x i (t) to represent the i th firefly.
where ε i is a random number that follows a Gaussian or uniform distribution. The steps of FA are described as Algorithm 6.

Algorithm 6 FA
Input: the parameters M, N, δ, I 0 , β 0 and step
Begin
S1: initialize M individuals x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M; if t > 1 and the new position is better than the old one, update x i (t) and p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: for the i th firefly x i (t), FA searches for another firefly x j (t) (i ≠ j) in the population whose luminance is calculated with Equation (14); if the luminance of x j (t) is larger than that of x i (t), x i (t) moves toward x j (t) by Equation (17), iterative times t = t + 1; go to step S2;
S4: output the optimized results.
End
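The attraction and movement of Equations (16) and (17) can be sketched in Python as below. The sketch assumes the common form β = β 0 exp(−γ r²) with r the Euclidean distance between the two fireflies, and a Gaussian ε i ; the function name is illustrative.

```python
import math
import random

def firefly_move(x_i, x_j, beta0=1.0, gamma=1.0, step=0.2):
    """Equations (16)-(17): firefly x_i moves toward the brighter
    firefly x_j with distance-dependent attraction plus a random step."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # squared distance
    beta = beta0 * math.exp(-gamma * r2)              # attraction, Eq. (16)
    return [a + beta * (b - a) + step * random.gauss(0, 1)
            for a, b in zip(x_i, x_j)]
```
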

Cuckoo Search (CS) Algorithm
The CS algorithm [7] was also put forward by Yang Xin-She, based on the brood parasitism of certain cuckoo species and the characteristics of Lévy flights. The CS algorithm follows three idealized rules: (1) each cuckoo lays one egg at a time and dumps its egg in a randomly selected nest; (2) the best nests with the highest-quality eggs carry over to the next generations; (3) the number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability p a ∈ [0, 1]; that is, a fraction p a of the M nests is replaced by new nests. The i th individual updates its host nest x i (t) by Equation (18), where α is the scaling factor of the step size and usually equals 1, the product ⊗ means entrywise multiplication and Levy(λ) indicates that the Lévy flight draws from the Lévy distribution. It is difficult to sample the real Lévy distribution, and Equation (19) is usually used to approximate the Lévy flight, where u and v follow Gaussian distributions, u ∼ N(0, σ u ²), v ∼ N(0, 1) and β = 1.5. Some cuckoo eggs may be found and discarded by the host, with the abandonment probability p a . When a cuckoo egg is abandoned, the cuckoo finds a new nest site by Equation (20), where x j (t) and x k (t) are two different solutions selected by random permutation, H(·) is a Heaviside function, ε is a random number drawn from a uniform distribution, s is the step size, defined as a random number in the range (0, 1), and α is a scaling factor of the step size. The steps of the CS algorithm are described as Algorithm 7.

Algorithm 7 CS
Input: the parameters M, N, δ, α and p a
Begin
S1: initialize M host nests x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: choose a cuckoo randomly to generate a new solution by Equation (18); choose a nest among the M individuals randomly; if the new solution is better than the chosen nest, replace it; a fraction (p a ) of the worst nests are replaced by new nests by Equation (20), iterative times t = t + 1; go to step S2;
S4: output the optimized results.
End
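Mantegna's approximation of Equation (19) can be sketched in Python as below. Since the exact form of Equation (18) is not reproduced above, `cuckoo_update` shows one common realization in which the Lévy step is scaled by the distance to the global best nest; treat that scaling, and all names, as illustrative assumptions.

```python
import math
import random

def levy_step(beta=1.5):
    """Equation (19): Mantegna's approximation of a Levy-flight step,
    step = u / |v|^(1/beta) with u ~ N(0, sigma_u^2), v ~ N(0, 1)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_update(x, p_g, alpha=1.0):
    """One common form of Equation (18): a new nest obtained by a Levy
    flight whose step is biased by the distance to the best nest p_g."""
    return [xd + alpha * levy_step() * (xd - gd) for xd, gd in zip(x, p_g)]
```

The heavy tail of the Lévy distribution occasionally produces very long jumps, which is what gives CS its global exploration ability.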

Differential Evolution (DE) Algorithm
The DE [4] algorithm, proposed by Storn, has three operations: mutation, crossover and selection, but is different from the GA algorithm. The main differences are: (1) in GA, two sub-individuals are generated by crossing two parent individuals, whereas in DE, new individuals are generated by perturbing the different vectors of several individuals; (2) in GA, the progeny individual replaces the parent individual with a certain probability, while in DE, the individual is only updated when the new individual is better than the old one. More specifically, the basic strategy of DE can be described as follows.
1. Mutation operation. Each individual x i (t) executes the mutation operation according to Equation (21), where x r1 (t), x r2 (t) and x r3 (t) are three individuals selected randomly from the whole population, r1 ≠ r2 ≠ r3 ≠ i, and F ∈ [0, 2] is the mutation factor.

Crossover operation
The crossover individual u i (t) = (u i,1 (t), u i,2 (t), . . . , u i,D (t)) can be generated by the mutation individual v i (t) and its parent individual x i (t), as described in Equation (22), where rand is a random number in the range [0, 1], CR is the crossover factor and it is a constant in the range [0, 1]; j_rand is an integer selected randomly from the range [1, D].

Selection operation
The DE algorithm adopts the "greedy" strategy; the next-generation individual is chosen between parent individual x i (t) and the crossover individual u i (t), which has the better fitness value, as described in Equation (23).
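One DE generation, combining the mutation of Equation (21), the binomial crossover of Equation (22) and the greedy selection of Equation (23), can be sketched in Python as below (minimization; the default values of F and CR are common illustrative choices, and the population must contain at least four individuals).

```python
import random

def de_generation(pop, f, F=0.5, CR=0.9):
    """One DE generation: mutation (Eq. 21), binomial crossover (Eq. 22)
    and greedy selection (Eq. 23), minimizing f."""
    M, D = len(pop), len(pop[0])
    nxt = []
    for i, x in enumerate(pop):
        # Mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3, i distinct.
        r1, r2, r3 = random.sample([k for k in range(M) if k != i], 3)
        v = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(D)]
        # Binomial crossover: dimension j_rand always comes from v.
        j_rand = random.randrange(D)
        u = [v[d] if (random.random() < CR or d == j_rand) else x[d]
             for d in range(D)]
        # Greedy selection: keep the better of parent and trial vector.
        nxt.append(u if f(u) <= f(x) else x)
    return nxt
```

Because selection is greedy, the best fitness in the population can never get worse from one generation to the next.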
The steps of the DE algorithm are described as Algorithm 8.
Algorithm 8 DE
Input: the parameters M, N, δ, F and CR
Begin
S1: initialize M individuals x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, update p g (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: execute the mutation operation by Equation (21), the crossover operation by Equation (22) and the selection operation by Equation (23), iterative times t = t + 1; go to step S2;
S4: output the optimized results.
End

Gravitational Search Algorithm (GSA)
The GSA algorithm [12] was proposed by Esmat Rashedi and is based on the law of gravity and mass interactions. On the t th iteration, the force acting on particle x i (t) from particle x j (t) is defined as Equation (24), where I pi is the passive gravitational mass related to the particle x i (t), I aj is the active gravitational mass related to the particle x j (t), ε is a small constant and G(t) is the gravitational constant at iteration t, which is defined as Equation (25), where G 0 = 100 and α = 20. R i,j (t) is the Euclidean distance between the two particles x i (t) and x j (t), described as follows.
On the t th iteration, the total force acting on the particle x i (t) in the dimension d is defined as Equation (27), where rand j is a random number in the interval [0, 1], the acceleration of the particle x i (t) in the dimension d is defined as Equation (28): where I ii (t) is the inertial mass of particle x i (t); one particle updates its velocity and position according to its acceleration, as described in Equation (29).
The GSA algorithm updates the gravitational and inertial masses by Equation (30), where f i (t) is the fitness value of the particle x i (t), and best(t) and worst(t) represent the best and the worst fitness values among all the particles, respectively; they are defined as Equation (31).
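The mass normalization of Equations (30) and (31) and the randomly weighted total force and acceleration of Equations (24), (27) and (28) can be sketched in Python as below (minimization). Note that the passive mass of particle i cancels when the force is divided by its inertial mass, so only the attracting masses appear in the acceleration; the helper names are illustrative.

```python
import math
import random

def gsa_masses(fits):
    """Equations (30)-(31): normalized masses from fitness values
    (minimization: smaller fitness means larger mass)."""
    best, worst = min(fits), max(fits)
    if best == worst:
        return [1.0 / len(fits)] * len(fits)
    m = [(f - worst) / (best - worst) for f in fits]
    s = sum(m)
    return [mi / s for mi in m]

def gsa_acceleration(pop, masses, i, G, eps=1e-12):
    """Equations (24), (27)-(28): randomly weighted total force on
    particle i divided by its inertial mass (which cancels)."""
    D = len(pop[i])
    acc = [0.0] * D
    for j, xj in enumerate(pop):
        if j == i:
            continue
        R = math.dist(pop[i], xj)  # Euclidean distance, Eq. (26)
        for d in range(D):
            acc[d] += (random.random() * G * masses[j]
                       * (xj[d] - pop[i][d]) / (R + eps))
    return acc
```

The velocity/position update of Equation (29) then adds this acceleration to a randomly damped velocity.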
The steps of the GSA algorithm are described as Algorithm 9.

Grey Wolf Optimizer (GWO)
The grey wolf optimizer [11] was proposed in 2014; it employs four types of grey wolves to simulate the leadership hierarchy: alpha (α), beta (β), delta (δ) and omega (ω). The α wolf is called the dominant wolf because its orders are followed by the pack; the β wolf is probably the best candidate to become the α wolf in case the latter passes away or becomes very old; the δ wolf respects the α but commands the other lower-level wolves as well; the lowest-ranking grey wolf is ω, which plays the role of scapegoat. Grey wolves encircle prey during a hunt. In GWO, the encircling behavior is described by the following equations, where A and C are coefficient vectors, x p (t) is the position vector of the prey (the solution) and x(t) is the position vector of a grey wolf. A and C are calculated from a coefficient a, which is linearly decreased from 2 to 0 over the course of the iterations, and from r 1 and r 2 , random values in the range [0, 1]. To mathematically simulate the hunting behavior of grey wolves, GWO supposes that α, β and δ have better knowledge of the potential location of the prey; it saves the first three best solutions obtained so far and obliges the other wolves to update their positions according to the positions of these best wolves, as described below.
Here, A 1 , A 2 , A 3 and C 1 , C 2 ,C 3 are the coefficients that are generated by different random values. The steps of the GWO algorithm are described as Algorithm 10.

Algorithm 10 GWO
Input: the parameters M, N, δ
Begin
S1: initialize M individuals x i (t) randomly, 0 < i ≤ M, iterative times t = 1;
S2: compute f (i), 0 < i ≤ M, rank the solutions and find the current top three best wolves x α (t), x β (t) and x δ (t); if it satisfies (t > N or precision ≤ δ), then go to step S4; otherwise, go to step S3;
S3: update the solution of each individual via Equation (42), update the coefficients a, A and C, iterative times t = t + 1; go to step S2;
S4: output the optimized results.
End
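The position update of Equation (42) can be sketched in Python as below: each of the α, β and δ wolves yields a position estimate, and the wolf moves to their mean. The function name and argument layout are illustrative.

```python
import random

def gwo_update(x, leaders, a):
    """Equation (42): move a wolf toward the mean of the position
    estimates derived from the alpha, beta and delta wolves.
    `leaders` is [x_alpha, x_beta, x_delta]; `a` decreases from 2 to 0."""
    D = len(x)
    estimates = []
    for leader in leaders:
        # Fresh random coefficient vectors per leader: A in [-a, a], C in [0, 2].
        A = [2 * a * random.random() - a for _ in range(D)]
        C = [2 * random.random() for _ in range(D)]
        # X_k = leader - A * |C * leader - x|
        estimates.append([leader[d] - A[d] * abs(C[d] * leader[d] - x[d])
                          for d in range(D)])
    return [sum(e[d] for e in estimates) / len(estimates) for d in range(D)]
```

When a is large (early iterations), |A| can exceed 1 and wolves are pushed away from the leaders (exploration); as a shrinks, they converge on the leaders (exploitation).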

Harmony Search (HS) Optimization
Harmony search (HS) optimization algorithm [13] is inspired by improvization in music-playing. In a band, all players adjust their pitch to achieve a wonderful harmony. In the process of global optimization, all decision variables constantly adjust their own values to make the objective function achieve global optimization. A harmony memory (HM) with size given by the parameter HMS (harmony memory size) stores the best harmony vectors during the optimization. A new harmony vector is generated from the HM based on memory considerations, pitch adjustments and randomization. For x i (t), i = 1, 2, . . . , D, it chooses a new value using the parameter of harmony memory considering rate (HMCR), which varies between 0 and 1 as follows.
Equation (43) indicates that x i (t) would be updated from the HM if a randomly generated number r 1 is less than or equal to HMCR; otherwise, one feasible value would be randomly chosen from X i , the definition space of the i th dimensional variable. Every component x i (t) is then examined to determine whether it should be pitch-adjusted. This procedure uses the parameter of pitch adjusting rate (PAR), which sets the rate of adjustment for the pitch chosen from the HM, as follows.
Here, r 2 is a randomly generated number and α is defined in Equation (45), where BW is an arbitrary distance bandwidth for the continuous design variable, and u(−1, 1) is a uniform distribution between −1 and 1. The steps of the HS algorithm are described as Algorithm 11.

Algorithm 11 HS
Input: the parameters M (or HMS), N, δ, HMCR, PAR and BW
Begin
S1: initialize the harmony memory (HM) with M randomly generated solutions; set iterative times t = 1;
S2: if the termination condition is satisfied (t > N or the precision δ is reached), go to step S4; otherwise, go to step S3;
S3: generate a new harmony according to Equations (43) and (44); if it is better than the worst harmony in HM, replace that harmony; set t = t + 1; go to step S2;
S4: output the optimized results.
End
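As a concrete illustration of Algorithm 11, a minimal Python sketch of HS might look as follows. Only the parameter names HMS, HMCR, PAR and BW come from the text; the function name, the default values and the worst-harmony replacement rule are our illustrative choices following the standard HS formulation.

```python
import numpy as np

def harmony_search(f, dim, bounds, hms=20, hmcr=0.9, par=0.3, bw=0.1,
                   iters=2000, seed=0):
    """Minimal Harmony Search sketch (illustrative names/defaults)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    HM = rng.uniform(lo, hi, (hms, dim))          # S1: random harmony memory
    fit = np.array([f(x) for x in HM])
    for _ in range(iters):                        # S2/S3: improvise a harmony
        new = np.empty(dim)
        for i in range(dim):
            if rng.random() <= hmcr:              # memory consideration (Eq. 43)
                new[i] = HM[rng.integers(hms), i]
                if rng.random() <= par:           # pitch adjustment (Eq. 44)
                    new[i] += bw * rng.uniform(-1, 1)
            else:                                 # random value from X_i
                new[i] = rng.uniform(lo, hi)
        new = np.clip(new, lo, hi)
        worst = np.argmax(fit)
        fn = f(new)
        if fn < fit[worst]:                       # keep only improving harmonies
            HM[worst], fit[worst] = new, fn
    return HM[np.argmin(fit)], float(fit.min())   # S4: best harmony found
```

Note that only one new harmony is improvised per iteration, which is why HS is cheap per step but can need many iterations to converge.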

Common Characteristics
As shown in Section 2, although the NIOAs simulate different population behaviors, all of them are iterative methods and share some common characteristics that satisfy the Reynolds model [36], which describes the basic rules for the aggregation motion of a simulated flock created by a distributed behavioral model.

1. Randomicity.
Randomicity is the uncertainty of an event with a certain probability and can enhance the global search capability of individuals. All 11 NIOAs initialize individuals randomly, which can cover the search space as widely as possible; other mechanisms have also been adopted in them to enhance the exploration and exploitation abilities, such as the mutation operators m o in GA, IA and DE, the random parameters rand 1 and rand 2 in PSO, rand(0, 1) and ϕ in ABC, β i and ε in BA, ε i in FA, Lévy flight in CS, rand and rand j in DE, rand j and rand i in GSA, and r 1 , r 2 , A 1 , A 2 , A 3 and C 1 , C 2 , C 3 in GWO.
2. Information Interactivity. The individuals in the NIOAs exchange information directly or indirectly, which can increase the probability of obtaining the global optimum. For instance, GA, IA and DE adopt the crossover operator c o to exchange information; in PSO, each particle utilizes the global optimum p g (t) to update its position; employed foragers or onlookers in ABC update their velocities v i,d (t + 1) using another position x j,d (t); bats in BA use the global optimum p g (t) to update their positions; in FA, firefly x i (t) moves toward x j (t) by mixing the position information of x i (t) and x j (t); in GSA, the force F d i,j (t) is computed according to the positions of particles x i (t) and x j (t), which is used to update the position of each particle; each wolf in GWO updates its position according to the positions of wolves α, β and δ; and, last but not least, a new harmony in HS is generated from the HM.
3. Optimality. The individuals in the NIOAs move toward the global best solution through different mechanisms of information exchange. For example, the good genes in GA and DE are chosen for the next generation through the operators s o , c o and m o ; particles in PSO update their positions under the influence of the local optimum p i (t) and the global optimum p g (t); onlookers in ABC choose food sources that have better fitness than the old ones; the bat in BA generates a new solution and updates its position only if the new solution is better than the old one; the good antibodies in IA are saved in a memory database to participate in the next iteration; in FA, a firefly moves toward the fireflies that have larger luminance; CS replaces a solution only when the new one is better than the chosen solution, and a fraction of worse solutions are replaced by newly generated solutions; in GSA, the gravitational and inertial masses are calculated by fitness evaluation, and the better individuals have higher attractions and move more slowly; wolves in GWO update their positions according to the positions of wolves α, β and δ, which have the best solutions in the wolf population; and a new harmony vector in HS can replace an old harmony in HM only if the new harmony is better than the old one.
In addition to the aforementioned common characteristics of theoretical implementation, these common NIOAs have been adapted into different versions to handle different problems, including combinatorial optimization problems (COPs) and multi-objective optimization problems (MOOPs). Similar variant methods are adopted to improve the optimization performance of NIOAs, for example, adaptive technology, fuzzy theory, chaos theory, quantum theory and hybridization technology. The classic articles on the above work are listed in Section 3.2, which provides a comprehensive summary of the 11 different NIOAs.

Variant Methods of Common NIOAs
The summarization of variant methods in other survey work [26][27][28][29][30] is fragmented; in this work, we systematically summarize the popular variants of the 11 common NIOAs, and the popular methods are described in Table 2. Given the massive number of papers, the summarized literature covers the state-of-the-art or representative papers, and the superscript denotes the citation count from the Web of Science and Scopus databases (as of 13 October 2020). As described in Table 2, the following observations have been made: (1) All of the 11 common NIOAs have versions for handling MOOPs and COPs and have been improved through adopting various adaptive strategies, for example, automatically tuning parameters.
(2) Some classic mathematical and physical theories have been used to enhance the performance of NIOAs, such as fuzzy theory [133], chaos theory [56] and quantum theory [48,132], improving the exploration and exploitation abilities of NIOAs.
In general, the proposers maintain the view that a hybridization mechanism can make full use of the advantages of some NIOAs to overcome the disadvantages of others. The following are some examples of this view: DE has strong global exploration and rapid convergence; PSO easily falls into local optima; ABC has a slow convergence speed and poorly balances exploration and exploitation; BA and FA perform well on low dimensional problems but are not good at solving high dimensional and nonlinear problems; GWO has a poor diversity of population and slow convergence, etc. Although many excellent hybrid NIOAs have been proposed (as mentioned above), what we have to admit is a confusing trend in NIOA enhancement: in order to improve some NIOAs, the "proposers" always claimed that the selected NIOAs have a series of shortcomings, such as easily falling into local optima, slow convergence and poor performance on high-dimensional problems, and that the proposed hybrid methods can improve the performance of the selected NIOAs. It is very likely that, without being fully aware of the merits and demerits of the common NIOAs, a so-called "novel" algorithm is only a mixture of NIOAs. The performance of such hybrid NIOAs, including convergence speed, ability to solve high-dimensional problems and accuracy, needs to be verified with comprehensive experiments. Section 4 will give a systematic comparison and analysis of the performance of the 11 common NIOAs.

Differences
From our observation, three important aspects influence the performance of NIOAs: the method of parameter tuning, the learning strategy and the topological structure of the population. As the original 11 NIOAs adopt empirical parameters, we will not discuss the problem of parameter tuning in this paper; however, we will address the sensitivity of the algorithms to their parameter settings in Section 4.2.
In this section, we discuss the differences in learning strategies, the topology structures, the time complexities and the interactions of algorithmic components for the 11 NIOAs.

Learning strategies
Each NIOA has its own learning strategy. GA exchanges information among genes through the selection, crossover and mutation operators, while IA updates antibodies according to the concept of activity degree. In GA, the selection operation is executed with probability ϑ 1 , while in IA, the selection operation uses the roulette method to choose antibodies with large activity degree; the main differences between the GA and DE algorithms were described in Section 2.2.8. In PSO, x i (t) is updated by its local best optimum p i (t) and the global best optimum p g (t) of the whole population (see Equation (1)), and in BA, the updating methods of positions and velocities have some similarities to those of PSO. To some extent, BA can be regarded as a combination of the global optimum p g (t) in PSO with an extensive local search method, which is described by the sound intensity A i (t) and the pulse emissivity r i (t) (see Equations (7)-(9)). In FA, if the attraction factor γ (see Equation (17)) tends to zero, FA is a special case of PSO. According to Equations (1) and (29), GSA can also be regarded as a variant of PSO; the difference between them is that the velocity update of the former is influenced by the local optimum p i (t) and the global optimum p g (t), while that of the latter is affected by all the other individuals (the total force). In CS, each cuckoo randomly updates its solution by Lévy flight (see Equation (18)), and new solutions are generated to replace the worst solutions through learning from other individuals. The harmony in HS interacts with the best individuals (solutions in harmony memory) according to a certain probability, updating solutions based on memory considerations, pitch adjustments and randomization (see Equations (43) and (44)). Compared to the other nine NIOAs, GWO and ABC have hierarchical evolution mechanisms, and the roles can be changed dynamically according to the quality of individuals' solutions.
Wolves in GWO learn from the current top three best wolves α, β and δ (see Equation (42)), and onlookers in ABC learn from the selected employed foragers that found food sources (see Equation (4)).

Topological structures
According to the scope of interaction among individuals, the topologies of the 11 NIOAs can be roughly grouped into two categories: global neighborhood topology (GNT) and local neighborhood topology (LNT). In this paper, we consider the neighborhood topology within each iteration of the NIOAs, and thus we have the following observations and conclusions. GA is LNT because it can only exchange information between two genes in each iteration; DE is also LNT for a similar reason; ABC can be regarded as LNT too because onlookers only follow certain employed foragers, and scouts generate new solutions randomly and have no interaction with other bees; FA belongs to LNT because a firefly moves towards another firefly that has larger luminance; CS is LNT because individuals either generate new solutions independently (see Equation (18)) or produce new solutions and exchange information with other individuals (see Equation (20)); similar to GA, IA is also LNT because two antibodies exchange information through crossover factors. As each particle updates its position using p g (t), PSO is regarded as GNT; BA is classified as GNT for the same reason as PSO; GWO is also GNT because each wolf updates its position following the best three wolves; GSA belongs to GNT because each particle updates its position following the total force from all the other particles; HS updates solutions from the HM, which stores the best harmony vectors of the whole population, so it is GNT. The topologies of the 11 NIOAs are illustrated in Figure 4, where each circle represents an individual; a solid line indicates that two individuals exchange information in the current iteration, and a dotted line indicates that two individuals may exchange information, in the sense of probability, during the whole evolution process.

GNT and LNT have their own advantages and disadvantages.
Generally speaking, all individuals in GNT are connected to each other and are attracted to the global best solution of the whole population; its merits include rapid convergence and strong exploitation ability, while it is more likely to be confined to a local optimum. On the contrary, each individual in LNT only connects to several other individuals in its neighborhood and is attracted by the best position of the neighborhood. LNT can make individuals search diverse regions of the problem space and has a strong exploration ability, while it may have a slow convergence speed.
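The difference between the two topology classes can be illustrated with PSO-style attractor selection. This is a hedged sketch: the helper names and the ring neighborhood are our illustrative choices, not constructs from the paper.

```python
import numpy as np

def gbest_target(positions, fitness, i):
    """GNT: every particle is attracted to the single global best position."""
    return positions[np.argmin(fitness)]

def lbest_target(positions, fitness, i):
    """LNT (ring neighborhood): particle i only sees neighbors i-1, i, i+1,
    so information about the global best spreads gradually around the ring."""
    n = len(positions)
    nbrs = [(i - 1) % n, i, (i + 1) % n]
    return positions[nbrs[np.argmin(fitness[nbrs])]]
```

Under GNT every particle immediately pulls toward the same point (fast convergence, exploitation), while under LNT different neighborhoods can pursue different attractors for a while (slower convergence, exploration), matching the trade-off described above.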

The interactions of algorithmic components

It is necessary to consider insights into the contribution of each component in NIOAs. The interactions of algorithmic components can reflect the core optimization power of the overall method [136]. According to the learning strategies and topological structures of NIOAs, the interactions of algorithmic components can be described as follows. GA, IA and DE exchange information through three components: the selection, crossover and mutation operators. For GA, selection and crossover are the two main components, and information exchange happens between two genes in each iteration; mutation is executed with a low probability, which can increase the diversity of the population. Owing to the local topology, GA has slow convergence. IA uses affinity to guide the searching process. For DE, the mutation is the main operation, generating new solutions by perturbing the difference vectors of several individuals. PSO updates the velocity of each particle using its historical and globally optimal solutions; the topology is a full connection, so information exchange is very fast and it easily falls into local optima. ABC has three components: employed foragers, scouts and onlookers. Employed foragers learn from other randomly selected individuals to update their velocities, onlookers find solutions by sharing information with a specific employed forager and scouts generate new solutions randomly. The topological structure of ABC is LNT, and the roles of bees can be changed dynamically. BA has a mechanism of exchanging information similar to that of PSO; it updates the velocity using the global optimum, and its topological structure is GNT. FA updates positions by exchanging information with another firefly; it can be regarded as a special case of PSO, but it belongs to LNT. The particle in GSA updates its velocity according to the acceleration, which uses the total force from all the particles; thus, GSA belongs to GNT.
GWO has four types of grey wolves, including alpha (α), beta (β), delta (δ) and omega (ω). GWO saves the top three best solutions (α, β, δ) obtained so far and obliges the other wolves to update their positions according to the position of the best three wolves. HS exchanges information with the best solutions (stored in harmony memory) according to a certain probability; it belongs to GNT.

Time complexities analysis
The time complexity of the 11 NIOAs is described in Table 3 below, where D is the number of dimensions of the objective functions, M is the number of individuals of each NIOA and N is the total number of iterations, as defined in Table 1. In order to calculate the time complexity of individual operations in the NIOAs, we divide the NIOA operations into various components and assign corresponding computational costs. Specifically, T init denotes the computational cost of initialization, T eval is the cost of evaluating a single solution and T iter is the computational cost of one iteration of the main loop, which is determined by the cost of updating solutions (T upd ), evaluating solutions (T e ) and calculating statistics (T stats ). For the 11 compared NIOAs, the computational costs T init , T eval , T e and T stats have the same values, calculated as follows: T init = D·M, T eval = D, T e = M·T eval , T stats = M. Thus, the computational cost of one iteration is T iter = T upd + T e + T stats , where T upd varies with the different NIOAs, and the total computational cost of an NIOA (for example, PSO) is T PSO = T init + T iter ·N.
In order to analyze the sensitivity of the compared NIOAs to their parameter settings, we apply two groups of parameters for each of the 11 NIOAs. The first group is consistent with the parameters of the 11 original algorithms. The second group is different from the first and is randomly selected according to the principles of the 11 NIOAs. For example, the mutation probability ϑ 3 in GA should generally not be too large, and its value in the second group is set to 0.25. The parameters of the compared NIOAs are described in Table 4.
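The cost model above can be written as a small helper. This is a sketch: the function name and the example value of T upd are ours, since T upd is algorithm-specific.

```python
def nioa_cost(D, M, N, T_upd):
    """Total cost model from the text: T_init = D*M, T_eval = D,
    T_e = M*T_eval, T_stats = M, T_iter = T_upd + T_e + T_stats,
    total = T_init + N * T_iter.  T_upd varies per algorithm."""
    T_init = D * M            # cost of initializing M individuals in D dims
    T_e = M * D               # evaluating all M solutions (T_eval = D each)
    T_stats = M               # per-iteration statistics over the population
    T_iter = T_upd + T_e + T_stats
    return T_init + N * T_iter
```

For instance, with D = 10, M = 30, N = 100 and a hypothetical T_upd = 300, the total cost is 300 + 100 · (300 + 300 + 30) = 63,300 elementary operations.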
For GA and IA in this work, we adopt roulette wheel selection for the selection operators, multi-point crossover and one-point crossover for the crossover operators and the basic mutation method for the mutation operators, which chooses one or more genes to change randomly.

The two groups of parameters for each algorithm are listed in Table 4 (for example, GA uses ϑ 1 = 1, ϑ 2 = 0.8 and a small mutation probability ϑ 3 in the first group; the remaining rows are given in the table). Similarly, the experimental results for the four kinds of criterion values on the second group of parameters are presented in Tables S14-S19 of Supplementary Material D for the low dimension (D = 10) and Tables S20-S25 of Supplementary Material E for the high dimension (D = 50). Again, the values in bold represent the algorithm with the best result.
Based on the above experimental results, we count the number of times that these 11 NIOAs achieved a "good" result on the 30 functions, as described in Table S26 of Supplementary Material J. In order to objectively analyze the performance of the NIOAs, a result on the BEST criterion is regarded as "good" if it is within 20% of the optimal solution; for the STD criterion, values less than 50 for F1-F10, less than 150 for F11-F20 and less than 300 for F21-F30 are considered "good" results. In order to compare the performance of the 11 NIOAs, the results of Tables S2-S13 of Supplementary Materials B and C are summarized in Table 5, and the results of Tables S14-S25 of Supplementary Materials D and E are summarized in Table 6. As described in Tables 5 and 6, the bold number is the number of wins for each NIOA on a specific criterion, and the corresponding winning functions are shown in the brackets that follow. In addition, as described in Tables S27-S30 of Supplementary Material K, we also give the mean error (AVERAGE ± STD) values of the 11 NIOAs on the 30 BBOB functions, where the values in bold represent the algorithm with the best result.
In this section, we not only compare the accuracy and stability of different NIOAs, but also analyze the sensitivity of each selected NIOA by setting two groups of parameters for the compared NIOAs. It should be noted that it is impossible to find a super NIOA that solves all optimization problems, and it is more meaningful to design an NIOA for a specific problem. However, the experimental results may help researchers understand the learning strategies and topologies of these NIOAs. In addition, the two groups of parameters chosen to analyze the sensitivity of the 11 NIOAs to their parameter settings can roughly reflect the robustness of the compared NIOAs.

Analysis of accuracy and stability
We compare the 11 algorithms in terms of accuracy and stability under the two groups of parameters. The rankings of accuracy and stability of the compared NIOAs are roughly the same. Based on the experimental results of Tables 5 and 6, Table S26 of Supplementary Material J and Tables S27-S30 of Supplementary Material K, the following observations can be made: (1) As described in Table S26, for accuracy in low dimensional space, all the NIOAs can obtain good results on at least half of the BBOB functions, while in high dimensional space, the accuracy of all the NIOAs is much worse than in the low dimensional space; only GSA, DE and CS obtain relatively good results. For stability in low dimensional space, all the NIOAs can obtain good results on at least half of the BBOB functions, while in high dimensional space, the stability of all the NIOAs is much worse; only CS, DE, ABC and GSA obtain relatively good stability.
(2) As shown in Tables 5 and 6, DE and CS achieve better solutions and stability than the other nine NIOAs for both groups of parameters. In particular, for the BEST criterion on the two groups of parameters, DE achieves the best results among the 11 NIOAs. DE and CS are the most stable algorithms among the 11 NIOAs under the different parameter settings. According to the mean error values of Tables S27-S30, DE and CS achieve obviously better results in low dimensional space than the other nine NIOAs, while in high dimensional space, CS, DE and GWO obtain better mean error values than the other eight NIOAs.

Analysis of the parameter sensitivity on the 11 NIOAs
Undoubtedly, the optimization results of all the compared NIOAs are sensitive to their parameter settings, as described in Supplementary Materials B-E. In this work, in order to analyze the degrees of sensitivity of the compared NIOAs, we consider an NIOA sensitive to its parameter settings if the difference between its two results on the same BBOB function under the two groups of parameters is greater than one order of magnitude (one result value is 10 times greater than the other). The statistical results are shown in Table 7 (the sensitivity comparison of each criterion for the 11 NIOAs under two groups of parameters); if the experimental results on certain functions differ by two orders of magnitude, two stars are marked on the corresponding function, and so on. As described in Table 7, the following observations can be made: (1) DE, CS, HS, GSA, GWO, FA, BA and IA are sensitive to their parameter settings in high dimensional space and are relatively insensitive in low dimensional space, which indicates that one should carefully select the parameters of these NIOAs when applying them to high dimensional problems. Specifically, DE and HS are the most sensitive to their parameter settings in high dimensional space.

(2) ABC, PSO and GA are sensitive to their parameter settings in both high and low dimensional spaces, and PSO is the most sensitive of all.
According to the above observations, we draw the following preliminary conclusions: (1) The NIOAs that have an explicit learning strategy for solution updates can acquire better performance than the NIOAs that learn from other individuals with large randomness (for example, via a probability method). For example, in GA, the progeny individual replaces the parent individual with a certain probability, while in DE, an individual is only updated when the new individual is better than the old one; the cuckoos in CS update their positions once the new solution generated by Lévy flights is better than the old one, while an individual in BA can only be updated by a better one under the probability constraint rand 2 < A, which means that it may not be updated even by a better individual; IA executes crossover and mutation operations by choosing antibodies with large activity degree, but the computation of the activity degree has some uncertainties, for example, the design of the thresholds h 1 in Equation (12) and h 2 in Equation (13). It seems that the algorithms which continuously and randomly generate new solutions and firmly learn from excellent individuals can achieve better performance.
(2) With the increase of the dimension, all the NIOAs become more sensitive to their parameter settings, which indicates that it is more difficult to choose a suitable set of parameters for NIOAs on high dimensional problems, except for DE, CS, GSA and GWO, which perform better in high dimensional space than the other seven NIOAs.

The Efficiency Comparison and Analysis
For the sake of error elimination, we compute the average value of each iteration over 20 independent experiments and obtain the change curves of the global optimized fitness under 1500 and 15,000 iterations on the 30 functions for the low and high dimensions, respectively. For the first group of parameters, when D equals 10, the convergence curves of the 11 NIOAs on the 30 BBOB functions are described in Figures S1-S30 of Supplementary Material F, and for the case of D = 50, the corresponding curves are described in Figures S31-S60 of Supplementary Material G. For the second group of parameters, the corresponding convergence curves are described in Figures S61-S90 of Supplementary Material H and Figures S91-S120 of Supplementary Material I, respectively. Based on these experimental results, the following observations can be made: (1) FA and HS have the worst optimization efficiency on most of the 30 functions in both the low and high dimensions, because they either evolve solutions through completely random strategies (see Equations (43) and (44)), as HS does, or learn from other individuals in a local topology perturbed by a random factor (see Equation (17)), as FA does.
(2) With increasing iterations, the curves of most compared NIOAs tend to stabilize, while those of ABC and DE oscillate on most of the 30 functions in both the low and high dimensions; the amplitude and frequency of the oscillations of ABC are greater than those of DE, and for both algorithms, they are larger in the high dimension than in the low dimension. Nevertheless, over the whole iteration period, the optimization results of ABC and DE are gradually improved.
(3) PSO, GSA and GWO have the fastest convergence speed on most of the 30 functions in both the low and high dimensions, because all of them adopt the explicit strategy of learning from the global best solution; that is, the individuals in these NIOAs learn firmly from the global optimum, which leads to rapid convergence.

The Comparison of Running Time
The running times of the 11 NIOAs on the 30 BBOB functions are summarized in Table S31 of Supplementary Material L. Based on these data, the following observations can be made: (1) DE and CS are the fastest algorithms on all 30 functions for the dimensions of 10 and 50, respectively. FA is the slowest algorithm in both the low and high dimensional spaces; its running time on the 30 functions is 1-2 orders of magnitude higher than that of the other 10 NIOAs. GSA has the second-worst running time in both dimensional spaces. The slow running times of FA and GSA echo their time complexity given in Table 3 of Section 3.3.
(2) PSO, GA, ABC, BA, GWO and HS are fast when D = 10, while in the high dimensional space, their running times are obviously longer than in the low dimensional space. Thus, from the view of running time, these algorithms are more suitable for low dimensional problems, while DE and CS are suitable for problems in both high and low dimensional spaces.
(3) For D = 10, the running time of FA is 20 times that of DE on almost all the functions; when D = 50, FA is at least 20 times (at most 35 times) slower than CS on all the functions. Thus, the difference in running time among the NIOAs is very large, and hence it is important to select fast NIOAs for optimization problems with strict running-time requirements.

Statistical Tests for Algorithm Comparison
In this study, we consider two statistical tests: the Friedman test [137] and the Nemenyi test [137]. A Friedman test is conducted to analyze the performance of the compared NIOAs. Table 8 provides the Friedman test statistic F F and the corresponding critical value in terms of each evaluation criterion. As shown in Table 8, the null hypothesis (that all of the compared algorithms perform equivalently) was clearly rejected for each evaluation criterion at a significance level of α = 0.05 for the experimental results in both the 10 and 50 dimensional spaces. Consequently, we proceed to conduct a post hoc test [137] in order to analyze the relative performance among the compared NIOAs. The Nemenyi test [137] is used to test whether each of the NIOAs performed competitively against the other compared NIOAs in both the 10- and 50-dimensional spaces. In the test, two NIOAs are considered to have a significant difference in performance if their corresponding average ranks differ by at least the critical difference CD = q α √(k(k + 1)/(6N)). For example, q α is equal to 3.219 at the significance level α = 0.05, and thus CD takes the value of 2.7563 (k = 11, N = 30). Figures 5 and 6 show the CD diagrams for each of the four evaluation criteria for the experimental results in the 10-dimensional space under the two groups of parameters. As CS obtains the best average rank on the 30 functions, CS is taken as the control algorithm. Any compared NIOA whose average rank is within one CD of that of CS is connected to CS with a red line, as described in Figures 5 and 6. The algorithms that are unconnected to CS are considered to have a significantly different performance. For example, in Figure 5a (the WORST criterion), the average rank of CS was 1.6333, and the critical value would be 4.3896 after adding CD.
Since GSA, BA, GA, HS, PSO, FA and IA obtained 5.6333, 5.8333, 6.9, 7.2667, 8.7, 9.2 and 10.5667 as their respective average rankings, they were significantly worse than CS. From Figures 5 and 6, we can see that CS and DE obtained the best average ranks on all four criteria, followed by ABC and GWO; CS and DE have obviously better performance than the other NIOAs. In other words, CS and DE obtained the best accuracy and the best stability in low-dimensional space.

Figure 7 shows the CD diagrams for the four kinds of evaluation criteria in the 50-dimensional space under the first group of parameters, and Figure 8 shows them under the second group of parameters. CS and DE still perform well in the high-dimensional space. In particular, under the first group of parameters, DE obtains the best average rank on the WORST, BEST and AVERAGE criteria and the second-best average rank on the STD criterion, whereas CS ranks first on the STD criterion. For the second group of parameters, CS ranks first on all four criteria, while DE ranks second on the WORST and STD criteria.
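As a concrete check of the figures quoted above, the Nemenyi critical difference and the significance decision for Figure 5a can be reproduced in a few lines of Python. This is a minimal sketch; the average ranks used are the values reported above:

```python
import math

def nemenyi_cd(q_alpha: float, k: int, n: int) -> float:
    """Nemenyi critical difference: CD = q_alpha * sqrt(k * (k + 1) / (6 * N)),
    where k is the number of algorithms and N the number of benchmark functions."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# Values used in this study: 11 NIOAs on 30 BBOB functions, q_0.05 = 3.219.
cd = nemenyi_cd(q_alpha=3.219, k=11, n=30)
print(f"CD = {cd:.4f}")  # ~2.756, matching the value quoted above

# Significance check against the control algorithm CS (average rank 1.6333),
# using the average ranks reported for Figure 5a (WORST):
ranks = {"GSA": 5.6333, "BA": 5.8333, "GA": 6.9, "HS": 7.2667,
         "PSO": 8.7, "FA": 9.2, "IA": 10.5667}
threshold = 1.6333 + cd  # ~4.39: ranks above this differ significantly from CS
worse_than_cs = [name for name, r in ranks.items() if r > threshold]
print(worse_than_cs)  # all seven listed algorithms exceed the threshold
```

Since every rank in the list exceeds the threshold of about 4.39, all seven algorithms are declared significantly worse than CS, exactly as stated above.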


Performance Comparison on Engineering Optimization Problem
In order to further compare the performance of the 11 NIOAs, we apply them to solve a constrained engineering optimization problem, namely the Tension/Compression Spring Design problem. A spring is a general mechanical part that can produce a large elastic deformation under load. The weight of the spring (such as a valve spring of an internal combustion engine cylinder or the springs of various buffers) has a great influence on the normal operation of the relevant mechanical equipment. The design, as shown in Figure 9, aims to minimize the weight of a tension/compression spring [138]. In this design, the constraints include the minimum deflection, shear stress, surge frequency and the limit of the outer diameter. There are three design variables: the average coil diameter x1, the wire diameter x2 and the number of active coils x3, which together define a set of complex constraints.

Not only the objective function but also the constraint conditions must be considered when solving such constrained optimization problems. The penalty function method is one of the most commonly used constraint-handling techniques; it transforms the constrained optimization problem into an unconstrained one by adding a penalty term to the original objective function. In this study, we adopt the dynamic penalty function [139], defined as F(x, t) = f(x) + (C·t)^α · Σi [max(0, gi(x))]^β, where each constraint is written as gi(x) ≤ 0, t is the current iteration number, and C, α and β are three parameters; in general, C = 1, α = 1 and β = 2. We run each compared NIOA 20 times independently, with 1000 iterations per run. Table 9 gives the experimental results of the 11 compared NIOAs on the spring design problem. We can observe that all 11 NIOAs have very close BEST values (between 0.012 and 0.013).
The CS algorithm ranks first on all four evaluation criteria: WORST, AVERAGE, BEST and STD. For the WORST criterion, CS, FA, DE, GSA and GWO achieve good results. The results of the engineering optimization problem indicate that all 11 NIOAs obtain good results, and that CS, FA, DE, GSA and GWO are better and more stable than the other six NIOAs.
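The dynamic penalty transformation used in this experiment can be sketched as follows. This is a minimal illustration on a toy one-dimensional problem, not the actual spring-design model of [138]:

```python
def dynamic_penalty(f, constraints, C=1.0, alpha=1.0, beta=2.0):
    """Wrap objective f with the dynamic penalty function [139]:
    F(x, t) = f(x) + (C * t)**alpha * sum(max(0, g(x))**beta for g in constraints),
    where each constraint is expressed as g(x) <= 0 and t is the iteration count.
    Defaults follow the parameter settings used in this study: C=1, alpha=1, beta=2."""
    def F(x, t):
        violation = sum(max(0.0, g(x)) ** beta for g in constraints)
        return f(x) + (C * t) ** alpha * violation
    return F

# Toy illustration (not the spring model): minimize x^2 subject to x >= 1,
# i.e. g(x) = 1 - x <= 0.
F = dynamic_penalty(f=lambda x: x * x, constraints=[lambda x: 1.0 - x])
print(F(0.5, t=10))  # infeasible point: 0.25 + 10 * 0.5**2 = 2.75
print(F(2.0, t=10))  # feasible point: penalty term is zero, so just f(x) = 4.0
```

Because the penalty weight (C·t)^α grows with the iteration counter t, infeasible points are penalized more and more heavily as the search proceeds, which gradually pushes the population toward the feasible region.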

Challenges and Future Directions
Indeed, how to improve the performance of NIOAs is a very complex problem, which is jointly influenced by the parameter-tuning method, the topology structure and the learning strategy. In this study, we draw some preliminary conclusions in order to provide a reference framework for the selection and improvement of NIOAs. In the past 30 years, a large number of meta-heuristic algorithms (more than 120 in our statistics) and their variants have been proposed in order to provide efficient and effective solutions to optimization problems in the field of AI. Although great progress has been made on NIOAs, which have been widely and successfully applied to various application fields, challenging problems still exist, mainly reflected in the following four aspects.
1. The first one is the lack of sufficient research on the fundamental theories and tools of NIOAs. From our observation, the challenges of fundamental research on NIOAs include the following four points.
(1) The biological or natural mechanisms imitated by the NIOAs are not yet fully understood. Most of the NIOAs are proposed by experts in psychology or computer science and engineering, and close collaboration with biologists is extremely important in order to deeply understand and abstract such mechanisms and functions, so that NIOAs can reasonably and effectively reflect nature, biology and the real environment.
(2) It is also necessary to lay a solid foundation of mathematical theories to support NIOAs. Such examples include a rigorous time complexity analysis and convergence proof, a deep analysis of the topological structures of various NIOAs, a suitable and comprehensive theoretical explanation of how to balance the contradiction between easily falling into local optima and slow convergence speed, and an in-depth analytic study of methods for automatic parameter tuning in order to solve the parameter-dependence problem. Specifically, the classic fundamental works [140][141][142] have made some achievements in time complexity analysis and convergence proofs, and their authors give a list of future research directions, which we summarize as follows. For topology analysis, it is indicated that the local neighborhood topology of some specific algorithms is more suitable for complex problems [143], and the investigation of the PSO paradigm finds that the effect of the population topology interacts with the function being optimized [144]. Although these previous efforts have recommended population topologies, they still have not precisely identified the topological factors that result in the best performance on a range of functions [144]. An automatic parameter-tuning process is usually computationally expensive, especially for real-world application problems; therefore, it is desirable to have a benchmark test suite that suits the NIOAs' tuning toolbox and is easy to use [145]. Due to the lack of a solid mathematical foundation, almost all NIOAs work in a black-box mode; thus, researchers keep proposing so-called "novel" algorithms and declaring that their optimizers find better solutions than other NIOAs [136].
(3) Research is insufficient on extending the basic continuous NIOAs to different optimization problems, including COPs and MOOPs. The study here of the different learning strategies and topological structures of more than 120 MHAs can provide diverse solutions to COPs and MOOPs. Actually, the current research on the mathematical theories in the aforementioned point (2) and on problem extensions mainly focuses on a few NIOAs, including GA, PSO and DE, so further research on more NIOAs is required.
(4) Another problem is the lack of visualization platforms for NIOAs research. From our observation, there are few discussions on this aspect except for an early simple attempt [146]. In addition, few benchmark test suites target specific optimization problems such as automatic parameter tuning [145]. Owing to the insufficient and inadequate theoretical investigation of NIOAs, it becomes quite difficult to clearly distinguish the characteristics of different NIOAs (most of the algorithms look very similar), and this, per se, becomes another optimization problem: the optimal selection of an NIOA for a given problem. This is also a motivation for our attempt to compare and analyze 11 common NIOAs theoretically and experimentally.
2. The second one is that NIOAs are less capable of solving continuous optimization problems in complex environments. Real environments are complicated: the optimization problems can be high-dimensional, large-scale, multi-modal and multi-objective; the optimization environments can be dynamic, highly constrained and uncertain; the fitness evaluations may contain noise, be imprecise and time-consuming, and sometimes the fitness functions can be non-deterministic. The complexity of real environments poses a great challenge to NIOAs. Although some efforts [147][148][149] have been made to solve the aforementioned problems, how to handle these issues is still a very difficult problem.
3. The third one is that there are too few combinations of NIOAs with other related disciplines. NIOAs intrinsically have a parallel and distributed architecture, yet little attention is paid to combinations with parallel and distributed technologies, including GPU-based hardware, robot swarms and cloud platforms. A few works [150][151][152] focus on these issues, and such interdisciplinary research holds great potential for NIOAs.
4. The fourth one is that less effort has been made to apply NIOAs to various problem domains. Actually, on the one hand, it is impossible for one single NIOA to adapt to all application problems; on the other hand, a certain kind of NIOA may be more effective for certain kinds of problems [134]. Existing enhancement methods for NIOAs (for example, combinations of different NIOAs) lack an in-depth and targeted discussion of why the enhancement methods are adopted. Furthermore, various NIOAs have been adopted to handle the same application problem, but it is often not clear why a particular NIOA was chosen (researchers just happened to use it).
Consequently, it is our belief that in the future, researchers should tackle the following four problems in the NIOAs field. These four problems indicate four future research directions for NIOAs study.
1. Strengthening solid theoretical analysis of the NIOAs. Some theoretical problems of NIOAs have only been studied for specific NIOAs (for example, PSO), such as time complexity analysis, convergence proofs, topology analysis, automatic parameter tuning and the mechanisms of the exploitation and exploration processes. There are still many problems to be solved in the existing research work [140][141][142], and the theoretical analysis of other NIOAs needs to be deepened. COPs and MOOPs should be further studied by extending and combining the various existing NIOAs. Furthermore, it is necessary to develop a visualization platform for deconstructing, modeling and simulating the interactions of components in NIOAs, to make it convenient to study the mechanisms of self-organization, direct/indirect communication and the processes of intelligent emergence in various swarm systems and application cases. It is also necessary to establish a benchmark test suite and an easy-to-use algorithm toolbox for different problems, for example, automatic parameter tuning and the aforementioned problems in complex environments.
2. Designing novel NIOAs to solve complicated optimization problems. Many real-world optimization problems are very complex, such as multi-modal and multi-objective problems, constrained or uncertain problems, large-scale optimization problems, and optimization problems with noisy, imprecise or time-varying fitness evaluations. It is another important direction to design more targeted and effective NIOAs for the above problems.
3. Deep fusion with other related disciplines. In order to improve the performance of the current NIOAs, it is indispensable to combine NIOAs with other related disciplines or directions, such as distributed and parallel computing, machine learning, quantum computation and robot engineering. More concretely, because NIOAs by nature possess the characteristics of distributed parallelism, it is easier and more natural for them to be implemented in distributed and parallel environments, such as cloud platforms and GPU-based hardware environments. Furthermore, for some large-scale optimization problems, a robot swarm can be a good solution that combines NIOAs and robot engineering. With support from machine learning methods, NIOAs can become efficient at handling multi-modal multi-objective optimization problems; conversely, NIOAs can provide optimization support to machine learning tasks, such as clustering and association rule mining.
4. Combination with specific applications. It is necessary to design customized NIOAs for specific application problems; the topological structure, learning strategy and parameter-selection method of a customized NIOA can be matched to a specific problem in order to achieve good convergence speed and optimization performance. Existing applications rarely involve a targeted design of NIOAs; most of them use NIOAs directly or cannot explain the reasons behind the algorithm design for their specific problems.

Conclusions
Nature-inspired optimization algorithms (NIOAs) can provide satisfactory solutions to NP-hard problems, which are difficult and sometimes even impossible for traditional optimization methods to handle. Thus, NIOAs have been widely applied, both theoretically and in practice, to various fields: function optimization problems (convex or concave, high- or low-dimensional, and single-peak or multi-peak), combinatorial optimization problems (the traveling salesman problem (TSP), knapsack problem, bin-packing problem, layout-optimization problem, graph-partitioning problem and production-scheduling problem), automatic control problems (control system optimization, robot structure and trajectory planning), image-processing problems (image recognition, restoration and edge-feature extraction) and data-mining problems (feature selection, classification, association rule mining and clustering).
Many NIOAs and their variants have been proposed in the last 30 years. However, for specific optimization problems, researchers tend to choose NIOAs based on their narrow experience or biased knowledge, because an overall and systematic comparison and analysis of these NIOAs has been lacking. This study aims to bridge this gap; the contributions of this paper are fourfold. First, we provide a unified formal description of the NIOAs and analyze the similarities and differences among the 11 common NIOAs. Second, we comprehensively compare the performance of the 11 NIOAs, which can reflect the essential characteristics of each algorithm. Third, we present a relatively comprehensive list of all the NIOAs so far, the first attempt to systematically summarize the existing NIOAs, although it is very hard work. Fourth, we comprehensively discuss the challenges and future directions of the whole NIOAs field, which can provide a reference for further research on NIOAs. Actually, we are not aiming to find a super algorithm that can solve all problems in different fields once and for all (that is an impossible task). Instead, we propose a useful reference to help researchers choose suitable algorithms more pertinently for different application scenarios, in order to take good advantage of and make full use of the different NIOAs. We believe that, with this survey work, more novel problem-oriented NIOAs will emerge in the future, and we hope that this work can be a good reference and handbook for NIOAs innovation and applications.
Undoubtedly, it is necessary and meaningful to make a comprehensive comparison of the common NIOAs, and we believe that more efforts are required to further this review in the future. First, the state-of-the-art variants of the 11 common NIOAs should be compared and analyzed comprehensively, discussing their convergence, topological structures, learning strategies, parameter-tuning methods and application fields. Second, there are more than 120 MHAs with various topological structures and learning strategies; for example, the recently proposed chicken swarm optimization (CSO) and spider monkey optimization (SMO) algorithms have hierarchical topological structures and grouping/regrouping learning strategies. Thus, a comprehensive analysis of the various topological structures and learning strategies of NIOAs is another direction of future work.