Initialisation Approaches for Population-Based Metaheuristic Algorithms: A Comprehensive Review

Abstract: A situation in which the set of initial solutions happens to lie near the position of the true optimum (the most favourable or desirable solution) can increase the probability of finding the true optimum and significantly reduce the search effort. In optimisation problems, the location of the global optimum is unknown a priori, and initialisation is a stochastic process. The population size is equally important; for problems with high dimensions, a small population may lie sparsely in unpromising regions and may return suboptimal solutions with bias. In addition, the different distributions used as position vectors for the initial population have different sampling emphases and, hence, different degrees of diversity. The initialisation control parameters of population-based metaheuristic algorithms play a significant role in improving the performance of the algorithms. Researchers have identified this significance and have put much effort into finding various distribution schemes that enhance the diversity of the initial populations of the algorithms, and into obtaining the correct balance of population size and number of iterations that guarantees optimal solutions for a given problem set. Despite the affirmation of the role initialisation plays, to our knowledge few studies or surveys have been conducted on this subject. Therefore, this paper presents a comprehensive survey of the different initialisation schemes used to improve the quality of solutions obtained by most metaheuristic optimisers for a given problem set. The popular schemes used to improve the diversity of the population can be categorised into random numbers, quasirandom sequences, chaos theory, probability distributions, hybrids of other heuristic or metaheuristic algorithms, Lévy, and others. We discuss the different levels of success of these schemes and identify their limitations.
Similarly, we identify gaps and present useful insights for future research directions. Finally, we present a comparison of the effects of population size, the maximum number of iterations, and ten (10) different initialisation methods on the performance of three (3) population-based metaheuristic optimizers: the bat algorithm (BA), the Grey Wolf Optimizer (GWO), and the butterfly optimization algorithm (BOA).


Introduction
The primary concern of optimisation is finding either the minima or maxima of the objective function, subject to some given constraints. Optimisation problems naturally occur in machine learning, artificial intelligence, computer science, and operations research. Optimisation has been used to improve processes in all human endeavours. A wide variety of techniques for optimisation exist. These techniques include linear programming, quadratic programming, convex optimization, interior-point method, trust-region method, conjugate-gradient methods, evolutionary algorithms, heuristics, and metaheuristics [1]. The era of artificial intelligence ushered in techniques for optimisation that are capable of finding near-optimal solutions to challenging and complex real-world optimisation problems. Then came the nature-inspired and bio-inspired metaheuristic optimization era, with huge successes recorded and increasing popularity over the past four decades.
In the figures, the plotted points represent the current location of the population, the red asterisk (*) represents the current best solution, and the red star (★) denotes the global optimal solution of the Bukin function. The nAOA converged towards the optimal solution after a few iterations, as shown in Figure 3. Similarly, the nAOA was initialised with the random number, and the distribution of the population after the first iteration is shown in Figure 4. The distribution of the population of nAOA quickly falls into a local optimum after a few iterations, as shown in Figure 5. Although initialisation plays a significant role in the performance of most metaheuristic optimizers, few studies or surveys have been conducted on the subject area.
A search using the keywords survey OR review, initialisation (initialization), and metaheuristics yielded no comprehensive review or survey articles in the literature. However, in discussing PSO variants, the authors of [11] provide a paragraph on attempts to improve PSO performance using different initialisation schemes. They discuss how low-discrepancy sequences and variants of opposition-based learning enhance the initial swarm population. Another attempt, using GA, was presented in [12], where the effects of three initialisation functions, namely, nearest neighbour (NN), insertion (In), and Solomon's heuristic, were studied. Li, Liu, and Yang [13] evaluated the effect of 22 different probability distribution initialisation methods on the convergence and accuracy of five optimisation algorithms. In this regard, we formulate the research question given below to accomplish our work: what literature modified the initialisation control parameters, comprising the size and diversity of the population and the maximum number of iterations, to improve the algorithms' performance?
The following questions are formulated to answer the main research question:
i. What research exists that used distributions other than the random number for initialisation of the population to improve the performance of metaheuristic algorithms?
ii. What study exists that fine-tuned the population size and the number of iterations of different algorithms?
iii. What are the major initialisation distributions used by population-based algorithms?
iv. What problems were solved by the modified algorithms?
v. What are other challenges yet to be explored by researchers in the research area?
To the best of our knowledge, no survey or review article in the literature focuses on general efforts to improve the performances of different metaheuristic optimizers using different initialisation schemes, which motivates the current research contribution. Therefore, this study presents a comprehensive survey of the different initialisation methods employed by metaheuristic algorithm designers and optimisation enthusiasts to improve the performance of the different metaheuristic optimizers available in the literature. The study covers articles published between 2000 and 2021, and the specific contributions of this paper are summarised as follows:
i. We present a comprehensive review of the different distributions used to improve the diversity of the initial population of population-based metaheuristic algorithms.
ii. We categorise the schemes into random numbers, quasirandom sequences, chaos theory, probability distributions, hybrids of other heuristic or metaheuristic algorithms, Lévy, and others.
iii. We discuss the different levels of success of these schemes and identify their limitations.
iv. We present an in-depth glossary of efforts to improve the performance of metaheuristic algorithms using several initialisation schemes; metaheuristic research enthusiasts can easily reference this glossary.
v. Finally, we provide the research gaps, useful insights, and future directions.
The rest of the paper is organised as follows. In Section 2, we provide the methodology used for collecting papers. The major initialisation methods used to improve the performance of the algorithms are presented in Section 3. In Section 4, we discuss the various application areas of the present study. Results and discussion of findings from our experiment are presented in Section 5. Finally, Section 6 presents the concluding remarks.

Methodology and Paper Collection Technique
This section discusses the procedure used for paper selection, collection, and review. The search keywords, search techniques, data sources, databases, and inclusion and exclusion criteria are explained. We followed the systematic literature review procedure provided in the work of [14], and we were guided by the work of [15].

Keywords
In order to retrieve articles relevant to our review goal, we carefully selected some useful keywords that we used to search the databases: initialization (initialisation), metaheuristic, optimization (optimisation), OR algorithm. The initial search for articles was carried out between 20 and 24 September 2020, and the final search was carried out between 25 and 30 October 2021. The articles retrieved for each keyword search were perused in order to collect more related articles from their citations and reference sections.

Academic Databases
The selected keywords were used to search and retrieve relevant works from the body of literature. We targeted only articles published in reputable peer-reviewed journals, edited books, and conference proceedings indexed in two (2) academic databases: the Web of Science (WoS) and Scopus repositories. These repositories hold high-quality articles published in SCI-indexed journals and ranked international conferences. We performed a search based on the above keywords in these repositories up to 2021.

Inclusion/Exclusion Criteria
We formulated some inclusion and exclusion criteria in order to collect solely relevant literature. The collected articles were either included or excluded based on these criteria after perusing their titles, abstracts, conclusions and, in some cases, the complete content. The selected criteria are given in Table 1.

Table 1. Inclusion and exclusion criteria.

Inclusion: Articles that used different initialisation schemes to improve the performance of metaheuristic algorithms.
Exclusion: While we discuss the commonly used pseudo-random number initialization scheme, we excluded algorithms that merely use the scheme; including these articles would mean reviewing the entire body of metaheuristic algorithms, which is outside the scope of this work.

Inclusion: Articles published in reputable peer-reviewed journals, conference proceedings, and edited books.
Exclusion: Articles published as parts of textbooks, abstracts, editorials, and keynote speeches.

Inclusion: Articles written in the English language.
Exclusion: Articles written in languages other than English.

Eligibility
We applied the inclusion and exclusion criteria to determine the eligibility of the selected articles. A total of 99 articles were returned by the WoS repository and 58 by Scopus. Figure 6 shows the distribution of document types from the WoS repository: 83 articles, 16 conference proceedings, five (5) early access documents, and one book chapter. Similarly, Figure 7 shows the distribution of document types from Scopus: 39 articles, 17 conference papers, one book, and one conference review. Both figures show that more articles are published in journals than in conference proceedings and book chapters.

After cross-referencing the two repositories, we found many papers that appear in both, and we excluded these articles from one of the repositories. In addition, we found articles that were included in the search because they contained the keyword "initialization" but did not relate to our research; we also excluded these. A total of 52 articles were selected for this survey after applying the inclusion and exclusion criteria.


Major Initialisation Methods
This section discusses recent efforts to improve the initial conditions of the population of metaheuristic algorithms. It provides an answer to our research question: what research exists that used distributions other than the random number to initialise the population in order to improve the performance of metaheuristic algorithms? The different initialisation schemes identified in the literature were categorised into pseudo-random number (Monte Carlo) methods, quasirandom methods, probability distributions, hybrids, chaos theory, Lévy, ad hoc knowledge of the domain, and others. This categorisation was performed to aid our discussion of the schemes identified.

Pseudo-Random Number or Monte Carlo Methods
By default, random number generation (the Monte Carlo method) is the most widely used initialisation scheme for metaheuristic algorithms. It uses the uniform probability distribution to generate pseudo-random number sequences that serve as location vectors for the population. Many population-based metaheuristic algorithms use this scheme, and interested readers can refer to the respective optimisers for details. The role of random number generation as an essential part of the initialisation process has been greatly emphasised [16,17]. Despite its popularity, the random number sequence suffers because its discrepancy is not low, so it does not cover the search space efficiently [8]. The discrepancy of the random numbers greatly influences how genuinely random the resulting randomly generated solutions are within the solution search space [18]. Research works, such as those by [19,20], have shown that random numbers do not achieve the low discrepancy that would aid the convergence of the algorithms. Figure 8 shows how random numbers tend to form clusters after several iterations instead of filling up the search space. This is a significant disadvantage of using random number generators to initialise the population of metaheuristic algorithms. We did not include a table for this category because most existing metaheuristic algorithms belong here; such a table would be huge, and there is no area of application to which this scheme has not been applied.
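As a baseline for the schemes discussed in the following subsections, this default scheme can be sketched in a few lines of Python (the function name and parameters below are our own illustrative choices, not from any specific optimiser):

```python
import random

def init_population_random(pop_size, dim, lower, upper, seed=None):
    """Initialise a population with uniform pseudo-random positions.

    Each individual is a point drawn uniformly from the box
    [lower, upper]^dim -- the default scheme in most
    population-based metaheuristics.
    """
    rng = random.Random(seed)
    return [[rng.uniform(lower, upper) for _ in range(dim)]
            for _ in range(pop_size)]

pop = init_population_random(pop_size=30, dim=5, lower=-10.0, upper=10.0, seed=42)
print(len(pop), len(pop[0]))  # 30 5
```

Every scheme reviewed below can be seen as replacing `rng.uniform` with a generator that samples the box differently.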


Quasirandom Methods
Quasirandom number generators are known to generate sequences that are proven to have low discrepancy [9]. Low discrepancy sequences, like Van der Corput, Sobol, Faure, and Halton, are potent computational method tools, which have been used to improve the performance of optimisation algorithms. Quasirandom numbers are effective initialisation mechanisms for metaheuristic algorithms to uniformly cover the search space in order to obtain the optimal solution. The particle swarm population in the work of [21] was initialised using the randomized low discrepancy sequences of Halton, Sobol, and Faure. The three modified PSO were applied to the benchmark test functions, and results were then compared with the global best PSO. This showed that PSO was significantly improved with Sobol, while the results showed a varying improvement with Faure and Halton. Similarly, the Van der Corput and Sobol sequences were used to initialise the PSO and were then applied to solve the benchmark functions [8]. The results obtained were promising when compared to the original PSO.
The krill population in the KH algorithm was initialised using the Faure, Sobol, and Van der Corput sequences [22]. The benchmark test functions were used to test the efficacy of the modified KH, and the findings revealed significant improvements in the performance of the KH algorithm when initialised with the Faure, Sobol, and Van der Corput low-discrepancy sequences. This was also the case with the guaranteed convergence particle swarm optimization (GCPSO) algorithm, which used Niching methods to initialise the swarm population [23]; the Niching methods are based on the Faure low-discrepancy sequence, and the benchmark test functions were used to evaluate the performance of GCPSO, with promising results.
Initialisation schemes implemented using low-discrepancy sequences are known to perform poorly as the problem dimension or graph size scales up. Figure 9 shows how the Halton sequence spreads and fills the search space by the 1000th iteration, improving the convergence of the algorithms. We note that the authors of [24] used the Halton sequence to initialise the search agents of the Wingsuit Flying Search (WFS) algorithm. Table 2 summarises the glossary of efforts that used low-discrepancy sequences (quasirandom numbers) to initialise the population of some metaheuristic optimizers; interested readers can refer to the references for more details. In all the papers reviewed in this section, the authors claimed that fine-tuning the initialisation control parameters (population size and diversity, and the maximum number of iterations or function evaluations) improved the performance of the algorithm.
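The Halton sequence mentioned above is simple to generate: each coordinate is a van der Corput sequence in its own (pairwise coprime) base. A compact Python sketch, with helper names of our own choosing, is:

```python
def halton(index, base):
    """index-th element of the van der Corput sequence in the given base.

    Reflects the base-`base` digits of `index` about the radix point,
    which fills [0, 1) far more evenly than pseudo-random draws.
    """
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def halton_point(index, bases=(2, 3)):
    """One d-dimensional Halton point; bases must be pairwise coprime."""
    return tuple(halton(index, b) for b in bases)

# First few 2-D Halton points with bases (2, 3); the first is (1/2, 1/3).
points = [halton_point(i) for i in range(1, 6)]
print(points[0])
```

Scaling each coordinate from [0, 1) into the search bounds then yields a low-discrepancy initial population.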


Probability Distributions
A probability distribution describes the possible values, and their likelihoods, that a random variable can take within a defined interval. Different probability distributions, with their rigorous statistical properties, can be used to initialise the population of metaheuristic algorithms. Li, Liu, and Yang [13] used variants of the Beta distribution, uniform distribution, normal distribution, logarithmic normal distribution, exponential distribution, Rayleigh distribution, Weibull distribution, and Latin hypercube sampling [31] to form 22 different initialisation schemes in order to evaluate PSO, CS, DE, ABC, and GA. The variants of the probability distributions are as follows:

• Beta distribution
The Beta distribution is a continuous probability distribution over the interval (0, 1). It can be written as X ∼ Be(a, b). Varying the values of a and b resulted in variants of the Beta distribution, generating sequences with different behaviours in the search space. Three variants of the Beta distribution were used.

• Uniform distribution
A uniform distribution is defined over the interval [a, b], and it is usually written as X ∼ U(a, b). One variant of the uniform distribution was used.

• Normal distribution
The Gaussian normal distribution is usually written as X ∼ N(µ, σ²). Varying the values of µ and σ² resulted in three (3) variants of the normal distribution, which generate sequences with different behaviours in the search space.

• Logarithmic normal distribution
The logarithmic normal distribution is often written as ln X ∼ N(µ, σ²). Four (4) variants of the logarithmic normal distribution were created by varying the values of µ and σ².

• Exponential distribution
An exponential distribution is asymmetric with a long tail and can be written as X ∼ exp(λ). Varying λ resulted in three variants of the distribution, which were used to initialise the population of the five algorithms.

• Rayleigh distribution
The Rayleigh distribution can be written as X ∼ Rayleigh(σ). Three (3) variants of the distribution were created by varying the value of σ.

• Weibull distribution
This distribution can be considered a generalisation of several other distributions. It can be written as X ∼ Weibull(λ, k); for example, k = 1 corresponds to an exponential distribution, while k = 2 leads to the Rayleigh distribution. In the same vein, three variants of the distribution were created.
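To make the families above concrete, the following minimal Python sketch draws one sample from each of them using only the standard library's `random` module (the dispatcher function and parameter spellings are our own; Rayleigh, which has no stdlib sampler, is drawn by inverse-transform sampling):

```python
import math
import random

rng = random.Random(0)

def sample(dist, **p):
    """Draw one value from the named distribution family (sketch)."""
    if dist == "beta":        return rng.betavariate(p["a"], p["b"])
    if dist == "uniform":     return rng.uniform(p["a"], p["b"])
    if dist == "normal":      return rng.gauss(p["mu"], p["sigma"])
    if dist == "lognormal":   return rng.lognormvariate(p["mu"], p["sigma"])
    if dist == "exponential": return rng.expovariate(p["lam"])
    if dist == "weibull":     return rng.weibullvariate(p["lam"], p["k"])
    if dist == "rayleigh":
        # Inverse transform: X = sigma * sqrt(-2 ln U), U ~ U(0, 1]
        return p["sigma"] * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    raise ValueError(dist)

x = sample("rayleigh", sigma=1.0)
print(x >= 0.0)  # True: Rayleigh samples are non-negative
```

Rescaling such samples into the search bounds, exactly as with the uniform case, produces the 22 initialisation schemes' population variants.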
The convergence and accuracy of the five metaheuristic optimizers, initialised with the 22 different initialisation schemes, were evaluated on the benchmark test functions and the CEC2020 test functions [13]. The findings showed that PSO and CS are more sensitive to the initialisation scheme used, whereas DE was less susceptible to it. In addition, PSO relies on a larger population size, whereas CS requires a smaller one, and DE does well with an increased number of iterations. The Beta, Rayleigh, and exponential distributions were the strongest performers, as the results showed that they greatly influence the convergence of the optimisers used.
Georgioudakis, Lagaros, and Papadrakakis [31] incorporated Latin hypercube sampling (LHS) to initialise four (4) optimisers, namely evolution strategies (ES), covariance matrix adaptation (CMA), elitist covariance matrix adaptation (ECMA), and differential evolution (DE). They used these optimisers to investigate the relation between the geometry of structural components and their service life, aiming to improve the service life of structural components under fatigue. Their choice of LHS instead of random Monte Carlo simulation reduced the number of samples needed to compute the statistical quantities in the problem formulation.
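A minimal Latin hypercube sampler can be written without library support; this sketch (NumPy-based, not the MATLAB lhsdesign implementation used in [31]) places exactly one sample in each of pop_size equal-probability strata per dimension:

```python
import numpy as np

def latin_hypercube_init(pop_size, dim, lb, ub, rng=None):
    """Latin hypercube sampling: each of `pop_size` equal-width strata
    along every dimension receives exactly one sample."""
    rng = rng or np.random.default_rng()
    samples = np.empty((pop_size, dim))
    for j in range(dim):
        # One uniform draw inside each stratum, then shuffle the strata
        # independently per dimension to decorrelate the coordinates.
        strata = (np.arange(pop_size) + rng.uniform(0, 1, pop_size)) / pop_size
        samples[:, j] = rng.permutation(strata)
    return lb + samples * (ub - lb)
```

Compared with plain Monte Carlo, this stratification guarantees coverage of every marginal interval, which is exactly the property that reduces the number of samples needed.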
The stochastic fractal search (SFS) technique was used in the work of [32] to improve the performance of the multi-layer perceptron (MLP) neural network, obtaining the optimal set of weights and threshold parameters. The hybrid approach was tested on the IEEE 14- and 118-bus systems, and the results were compared with the non-optimised MLP and with MLPs optimised by a genetic algorithm (MLP-GA) and particle swarm optimisation (MLP-PSO). The precision was up by 20-50%, and the computational time was down by 30-50%. However, SFS tends to ignore local search; the correct balance between global and local search is desired. Similarly, the Lévy flight was replaced by stochastic random sampling of simpler fat-tailed distributions enhanced with scaled chaotic sequences to boost cuckoo search (CS) performance in solving the complex wellbore trajectories problem [33].
Probability distributions generally suffer from issues such as equiprobable disjunct intervals and errors in correlations between variables. We summarise efforts in this category in Table 3.

Hybrid with Other Metaheuristic Algorithms
In this approach, most researchers used another metaheuristic algorithm to find an optimal set of initial positions for the population. Metaheuristic algorithms with a high convergence rate in a specific problem domain are often used to find an initial solution, which is then fed into the other metaheuristic algorithm as the initial condition. A hybridisation of ABC and TS was proposed in the work of [39], where the bee population was initialised using randomised breadth-first search (BFS). The performance of their hybrid was better than the algorithms they compared it with; however, it suffers from the time complexity problem of BFS. The authors of [40] initialised the monarch butterfly algorithm by equally partitioning the search space and using the F and t random distributions to mutate the divided population. The results showed significant improvements. The krill in the work of [41] were initialised using pairwise linear optimisation, which uses fuzzy rules to create clusters that serve as the initial points for KH. However, the results showed that this improvement would only suit systems based on fuzzy approximators. The CRO was improved using the VNS algorithm with a new processor selection model for the initialisation. The results are promising; however, parameter sensitivity still needs to be resolved [42].
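The general pattern of this category — run a cheap coarse optimiser for a small budget and hand its final population to the main metaheuristic — can be sketched as follows (a hypothetical illustration, not the method of any specific paper cited above; the toy hill climber merely stands in for the seeding algorithm):

```python
import numpy as np

def seeded_init(coarse_optimiser, objective, pop_size, dim, lb, ub, budget=200):
    """Hybrid initialisation sketch: a cheap coarse optimiser refines a
    random population, and its output seeds the main metaheuristic."""
    population = np.random.uniform(lb, ub, (pop_size, dim))
    return coarse_optimiser(objective, population, lb, ub, budget)

def random_restart_hill_climb(objective, population, lb, ub, budget):
    """Toy coarse optimiser: greedy Gaussian perturbation of each individual,
    keeping a trial point only if it improves the objective."""
    for _ in range(budget):
        trial = np.clip(population + np.random.normal(0, 0.1, population.shape), lb, ub)
        improved = (np.apply_along_axis(objective, 1, trial)
                    < np.apply_along_axis(objective, 1, population))
        population[improved] = trial[improved]
    return population
```

The trade-off discussed later in this section applies directly: the seeding budget adds runtime, so the coarse stage must stay cheap relative to the main search.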
The cuckoo population was initialised using quasi-opposition-based learning (QOBL) [43]; reaching the optimum is made more likely by considering both a guess and its quasi-opposite guess. The initialisation scheme of BA was improved using a quasirandom sequence with low discrepancy called Torus [25]. Their results were good; however, they were not evaluated on higher-dimensional problems. Four (4) different dispatching-rule (DR)-based initialisation strategies were used by [44], with varying advantages and disadvantages. The best result was obtained when all of the strategies were used together, which means that the diversity of the population contributed less to the algorithm's overall performance. In [45], a scheme inspired by SAM was developed: a simplified heuristic model that begins the swarm search with an initial set of high-quality solutions.
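The quasi-opposition idea can be sketched as follows (an illustrative NumPy sketch under common QOBL conventions, not the exact implementation of [43]): each random guess is paired with a quasi-opposite point drawn uniformly between the interval centre and the opposite point, and the fitter half of the combined set is kept:

```python
import numpy as np

def qobl_init(pop_size, dim, lb, ub, objective, rng=None):
    """Quasi-opposition-based learning initialisation sketch: generate a
    random population and its quasi-opposite counterpart, then keep the
    fitter half of the combined set (lower objective = better)."""
    rng = rng or np.random.default_rng()
    pop = rng.uniform(lb, ub, (pop_size, dim))
    centre = (lb + ub) / 2.0
    opposite = lb + ub - pop                       # opposite point of each guess
    # Quasi-opposite: a uniform draw between the centre and the opposite point.
    quasi = centre + rng.uniform(0, 1, pop.shape) * (opposite - centre)
    combined = np.vstack([pop, quasi])
    fitness = np.apply_along_axis(objective, 1, combined)
    return combined[np.argsort(fitness)[:pop_size]]
```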
ABC was used to find the optimal cluster centre of the FCM [46]. An improved ABC was also proposed to solve the vehicle routing problem (VRP) [47]; among other improvements, the bees were initialised using push-forward insertion. An improved DE, named the enhanced differential evolution algorithm (EDE), used opposition-based learning for the initialisation, along with other improvements, in order to enhance the performance of DE [48]. The optimised stream clustering algorithm (OpStream) used an optimal solution of a metaheuristic algorithm to initialise the first set of clusters [49]. The optimal solution of the optimal shortening of covering arrays (OSCAR) problem was used as the initialisation function of a metaheuristic algorithm [50].
Mandal, Chatterjee, and Maitra [51] used PSO to address the weakness that hampered the Chan and Vese algorithm for image segmentation problems, namely its low performance when the contours are not well initialised; the contours are initialised simultaneously with the population. Their hybrid solution made contour initialisation irrelevant to the performance of the algorithm. Another effort was presented by [52], where a scheme to initialise the fuzzy c-means (FCM) clustering algorithm using PSO was proposed, with finding the optimal cluster centres set as the objective function of the PSO.
A memetic algorithm (MA) that uses the greedy randomised adaptive search procedure (GRASP) metaheuristic and path relinking to initialise and mutate the population was proposed [53]; however, the scalability of the MA was untested. The authors of [54] proposed an initialisation scheme that used both the Metropolis-Hastings (MH) algorithm and the function domain contraction technique (FDCT). MH is helpful when directly generating a sequence from a probability distribution is difficult; however, MH is best suited to complex, high-dimensional optimisations, and its effectiveness is problem-dependent. When MH is not suitable, the FDCT is employed. The FDCT is a sequential three-step solution starting with a random solution generator; if this is not feasible, the GBEST PSO generator is applied, and if the previous two fail, the search space reduction technique (SSRT) is applied. These steps ensure that the initialised population leads to a better solution.
The competitive swarm optimiser (CSO) is a variant of PSO used by [55] to improve the extreme learning machine (ELM) network by depending on the individual competition of the particles, which optimises its weights and structure. Although the results show great promise, it took more training time to generate effective models. Sawant, Prabukumar, and Samiappan [56] proposed and evaluated an approach to initialise the cuckoo nests based on the correlation between spectral bands; the goal is to ensure convergence by making sure the locations of the nests do not repeat. The k-means clustering algorithm is used to select specific clusters of bands based on their correlation coefficients. Another approach was presented to resolve the lack of diversity of PSO and its sensitivity to initialisation, which quickly leads to premature convergence: the crown jewel defence (CJD) is used to escape being stuck in local optima by relocating and reinitialising the global and local best positions. However, the performance of this improvement was not tested in higher dimensions [57].
DE and local search were combined to enhance the chances of an optimal solution to the hybrid flow-shop scheduling problem [58]. The brainstorm optimisation (BSO) was improved in the work of [59] by implementing a scheme that triggers reinitialisation based on the current population. In the work of [60], the authors used FA to detect the maxima and the number of image clusters through histogram-based segmentation; the maxima are then used to initialise the parameter estimates of the Gaussian mixture model (GMM). In the work of [61], the authors proposed a scheme that enhances the initial conditions of an algorithm by treating them as a sub-optimisation problem in which the initial conditions are the parameters to be optimised by the MLA. Their results showed improvements compared to the other algorithms used. The FA was also used in the work of [62] as an optimiser to obtain the initial locations of the translation parameters for wavelet neural networks (WNNs). This reduced the number of hidden nodes of the WNN and significantly increased its generalisation capability.
However, time and computational complexity may be a problem for this approach. In addition, there is no proven way to hybridise these algorithms, so success depends greatly on the experience of the researcher. A summary of research efforts in this category is given in Table 4.

Chaos Theory
Chaos theory describes the unpredictability of systems, and over the years, many advances have been made in this area. Chaotic sequences have the following properties: sensitivity to initial conditions, ergodicity, and randomness. This type of sequence has the advantages of introducing chaos or unpredictability into the optimisation, increasing the range of chaotic motion, and using the chaos-induced variables to search the space effectively [69].
Using the logistic chaotic function, ref. [70] proposed novel improvements to CS, one of which is the use of the logistic chaotic function to initialise the population. While their results are promising, they suffer from high computational complexity. The same scheme was used in the work of [71] to improve BA, where the bat population was initialised using chaotic sequences instead of the random number generator. In addition, the bacterial population of BFO was initialised using chaotic sequences generated by logistic mapping [72]. Similarly, the butterflies in the work of [73] were initialised using a homogeneous chaotic sequence adapted to the ultraviolet changes. Among other improvements proposed in the work of [74], a chaotic initialisation strategy was used to initialise the whales in the multi-strategy ensemble whale optimization algorithm (MSWOA).
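A chaotic initialisation of the kind used in [70-72] can be sketched with the logistic map (an illustrative sketch; the control parameter and seed values here are common choices, not those of any cited paper, and the seed must avoid the map's fixed and periodic points):

```python
import numpy as np

def logistic_chaos_init(pop_size, dim, lb, ub, mu=4.0, x0=0.7):
    """Chaotic initialisation sketch using the logistic map
    x_{k+1} = mu * x_k * (1 - x_k), which is fully chaotic at mu = 4.
    x0 must avoid the degenerate seeds 0, 0.25, 0.5, 0.75, and 1."""
    n = pop_size * dim
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        x[k] = mu * x[k - 1] * (1.0 - x[k - 1])
    # The trajectory lies in [0, 1]; scale it into the search box.
    return lb + x.reshape(pop_size, dim) * (ub - lb)
```

The sensitivity to the seed is the point: tiny changes in x0 yield entirely different (but deterministic and ergodic) initial populations.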

Ad Hoc Knowledge of the Domain
In the ad hoc knowledge of the domain approach, the authors use background knowledge of the domain to design the initialisation scheme of an algorithm; the nature of the problem is what influences the diversity and spread of the initial population. The approach proposed in the work of [86] used domain knowledge to generate initial solutions, which serve as the initial points for the metaheuristic method. Their results were better and, in some cases, competitive; however, we believe that this method is excessively problem-dependent, and as such generalisation is impossible. In the same vein, ref. [87] proposed the initialisation of the bats based on ad hoc knowledge of the PV domain; specifically, they used the peaks with similar duty ratios that occur on the power-versus-duty-ratio curve of the boost converter. Yao et al. [88] used the objective function to minimise the wear and tear of the actuators when initialising the population.
The clans in EHO [89] were initialised by considering the acoustic decay model used to obtain the distance between the sensor and the noise source. Depending on the noise level, the source coordinates lie at the intersection of the radii, which is unlikely to be a single point; the clans are initialised at the centre of the intersection. The technique suffers from being problem-dependent and requires much adaptation before being used in other domains. Finally, a scheme to help PSO capture the global peaks without reinitialisation when its position and value on the P-V curve change was developed by [90]: particles are sent to the areas of anticipated peaks to cater for them once they are located. Table 6 gives a summary of this approach.

Lévy Flights
A two-way approach to improving the initialisation scheme of the bees algorithm was also proposed [93]: the patch environment and Lévy motion imitate the natural food environment and the foraging motion of the bees, respectively. Although the patch concept is used in the original bees algorithm for the neighbourhood search, its use for initialisation, together with the Lévy motion, greatly improved performance. In addition, the performance of the GWO algorithm [94] was enhanced using Lévy flight (LF) and greedy selection; a modified GWO algorithm was proposed to solve global and real-world optimisation problems, with these strategies integrated into the modified hunting phases to boost the efficacy of GWO. However, no test was carried out on a specific optimisation domain; hence, no comparison was made. A glossary of efforts on the use of this approach is given in Table 7; the authors claimed superiority of their results over other algorithms.
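Lévy-flight steps are commonly generated with Mantegna's algorithm; the sketch below (an illustrative implementation, with beta = 1.5 as a typical stability index) produces the heavy-tailed steps that spread an initial population, or a search move, over a mix of short hops and occasional very long jumps:

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """One Lévy-flight step via Mantegna's algorithm:
    step = u / |v|^(1/beta), with u ~ N(0, sigma_u^2) and v ~ N(0, 1)."""
    rng = rng or np.random.default_rng()
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1.0 / beta)
```

For initialisation, one might scatter individuals as centre + scale * levy_step(dim), clipped to the search bounds; the heavy tail is what lets a few individuals probe distant regions.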

Others
Other approaches to improve the diversity, spread, and optimality of the initial population of metaheuristic algorithms exist in the literature. This category includes approaches that used mathematical and statistical functions to aid the initial population in an exhaustive search.
A nonlinear simplex method was used to initialise the swarms [102]; the results showed that the particles gravitated better towards good-quality solutions. An approach in which one particle is placed at the centre of the search space and the rest are spread around it was considered by [103]. The results are promising; however, the approach is not entirely without bias. The use of complex-valued encoding for metaheuristic optimisation is gaining attention from researchers, and a comprehensive and extensive overview of this approach is presented in [104].
Complex-valued encoding metaheuristic algorithms have been applied extensively in function optimisation, engineering design optimisation, and combinatorial optimisation. Regular metaheuristic algorithms are based on continuous or discrete encoding; the advantage of complex-valued encoding is that it expands the search region and efficiently avoids falling into local minima. Finally, eight metaheuristic algorithms were enhanced using complex-valued encoding and tested on 29 benchmark test functions and five engineering design optimisation problems. Analysis of the results with tests of statistical significance showed that the complex-valued encoding variants returned the best performances. We present a summary of what authors have done in this category in Table 8.
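One simple way to realise a complex-valued encoding is sketched below (an illustrative decoding convention, not the scheme of [104]; published variants differ in how the modulus and argument map to the real decision variable):

```python
import numpy as np

def complex_encoded_init(pop_size, dim, lb, ub, rng=None):
    """Complex-valued encoding sketch: each variable is carried by a complex
    number z = a + bi, and the real decision variable is decoded from the
    modulus and argument of z. This doubles the degrees of freedom per
    variable, which is what expands the search region."""
    rng = rng or np.random.default_rng()
    radius = (ub - lb) / 2.0
    a = rng.uniform(-radius, radius, (pop_size, dim))
    b = rng.uniform(-radius, radius, (pop_size, dim))
    z = a + 1j * b
    rho = np.abs(z) / (radius * np.sqrt(2.0))    # modulus normalised to [0, 1]
    sign = np.sign(np.sin(np.angle(z)))          # argument supplies the sign
    decoded = (lb + ub) / 2.0 + sign * rho * (ub - lb) / 2.0
    return z, np.clip(decoded, lb, ub)
```

An update rule would then operate on the real and imaginary parts separately and re-decode after every move.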

Areas of Application
Much of the research that improved the performance of metaheuristic algorithms by improving the nature and diversity of their initial populations has been applied in different areas of human endeavour, with significant successes recorded. Figure 10 gives the various application areas of the articles that were found in the literature. Figure 10 shows the computer science subcategory as having the highest number of publications, which can be attributed to the fact that optimisation problems naturally occur in this area, with over 43 articles published in journals indexed in Scopus and over 60 articles published in journals indexed in WoS. This means that the vast majority of these improvements are applied to solve optimisation problems in computer science, particularly in the area of artificial intelligence. The area of artificial intelligence alone has about 40 articles indexed in WoS, making it the most researched area in computer science. The most cited paper in this area is that of [67], which proposed a hybrid of differential evolution and a greedy algorithm to exploit the advantages of both methods to improve initialisation, among other improvements. This was used in solving the multi-skill resource-constrained project scheduling problem, and it has been cited 30 times. The hybrid of metaheuristic algorithms is the most common initialisation approach used, apart from the random number generator. Chaos theory and low-discrepancy sequences are also popular in this area of application.

Engineering
Optimisation problems naturally occur in engineering, with many metaheuristic algorithms being used to solve problems in this domain; this area has the second-highest number of publications. WoS subdivides this category into electrical, electronic, multidisciplinary, industrial, manufacturing, telecommunication, and mechanical categories, whereas Scopus combines them into one category. The sub-area of electrical and electronic engineering is the most researched, with over 25 articles indexed in WoS; in total, 12 articles are indexed in Scopus and over 30 articles in WoS. The most cited article in this category is by [65], in which the authors developed a multi-objective evolutionary algorithm (MOEA)-based proactive-reactive method that introduced a stability objective and heuristic initialisation strategies for the initial solution; the decision-making approach was also validated. The article has been cited 105 times. The area of telecommunications is also well researched, with nine articles indexed in the repository.

Mathematics
The area of mathematics has provided the foundation for optimisation techniques used by metaheuristic algorithms. Over 28 articles indexed in Scopus are related to this area, and WoS further divided this category into multidisciplinary, computational biology, interdisciplinary, and applied mathematics. This area intersects with engineering and computer science, and many articles in this category are also classified under these other categories.

Others
We categorise all areas with five or fewer publications into the category "others". This category comprises automation control systems, remote sensing, robotics, acoustics, chemistry, environmental sciences, management, transportation science technology, energy, neuroscience, and social sciences. Clearly, 90% of the articles published in this subject area and indexed in WoS or Scopus primarily apply to or solve problems in the areas of computer science and engineering. There is also great diversity in the application areas, as can be seen in the pockets of research that fall under the other categories with few publications.

Experimental Setup
This section presents the three (3) different metaheuristic algorithms and ten (10) initialisation schemes used in our work. The choice of these algorithms and initialisation methods is based on their performance in solving optimisation problems, the availability of their codes online, and the fact that they form part of the broader set of algorithms and initialisation methods used in our current research projects. Table 9 summarises these algorithms, including the year the article was first published, the authors, and the application area of the first publication. Tables 10 and 11 summarise, respectively, the ten initialisation schemes (each with its MATLAB generator, e.g., Latin hypercube sampling via lhsdesign and the Sobol sequence via sobol) and the algorithm-specific control parameters used for the experiments.

The variation of the population size and number of iterations is as given in Table 12. The variation is such that a large population size goes with a small number of iterations and vice versa; we also included situations where the two are relatively even.

Table 12. Population size and number of iterations.
Population size:       10    20    30    50    100   300   500   1000
Number of iterations:  1000  900   800   600   500   300   100   10

We also conducted a series of experiments to evaluate the effect of the initialisation schemes presented in Table 10 on the three metaheuristic algorithms. We carried out the experiments on ten classical test functions, namely sphere, quartic, Zakharov, Schwefel 1.2, Booth, Michalewicz, Rastrigin, Rosenbrock, Griewank, and Ackley, consisting of a wide variety of separable, unimodal, non-separable multimodal, and multi-dimensional problems with varying numbers of local optima. F1 and F2 are unimodal and separable benchmark functions with the dimension (D) set at 30. Additionally, F3 and F4 have dimensions set at 30D and are unimodal and non-separable benchmark functions. Similarly, F5, F6, and F7 are multimodal and separable benchmark functions with dimensions set at 2D, 10D, and 30D, respectively. The multimodal and non-separable benchmark functions are F8, F9, and F10, with dimensions set to 30D.
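For reference, a few of the classical test functions listed above can be written compactly (standard textbook definitions; the dimension is taken from the input vector):

```python
import numpy as np

def sphere(x):            # unimodal, separable; global minimum 0 at the origin
    return float(np.sum(x ** 2))

def rastrigin(x):         # multimodal, separable; global minimum 0 at the origin
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

def rosenbrock(x):        # multimodal, non-separable; global minimum 0 at (1, ..., 1)
    return float(np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2))

def ackley(x):            # multimodal, non-separable; global minimum 0 at the origin
    d = x.size
    return float(-20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
                 - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20 + np.e)
```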

Initialisation Parameter Values
All algorithms were implemented in MATLAB R2019a, and the experiments were conducted on Windows 10 with an Intel Core i7-8550U CPU and 16 GB of RAM. The maximum number of iterations is set at 1000, and the number of independent runs is set at 20. We round any solution value less than 10^-8 down to zero, and the results are reported using the following performance indicators: best, worst, mean, standard deviation, and mean algorithm runtime. We then statistically compared the results using Friedman's test and post hoc analysis based on the Wilcoxon signed-ranks test.
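The Friedman test used here ranks the schemes within each problem and compares mean ranks; a NumPy-only sketch of the statistic (illustrative, standing in for the statistical software actually used) is:

```python
import numpy as np

def friedman_statistic(results):
    """Friedman chi-square statistic. `results` is an (n_problems, k_schemes)
    matrix of performance scores (lower is better). Ranks are computed within
    each problem; tied scores receive their average rank."""
    n, k = results.shape
    ranks = np.empty_like(results, dtype=float)
    for i in range(n):
        order = results[i].argsort()
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        for v in np.unique(results[i]):      # average ranks over tie groups
            mask = results[i] == v
            r[mask] = r[mask].mean()
        ranks[i] = r
    rj = ranks.sum(axis=0)                   # rank sum per scheme
    chi2 = 12.0 / (n * k * (k + 1)) * np.sum(rj ** 2) - 3.0 * n * (k + 1)
    return chi2, ranks.mean(axis=0)
```

The statistic is referred to a chi-square distribution with k − 1 degrees of freedom; a significant result is then followed by pairwise Wilcoxon signed-rank tests with a Bonferroni correction, as described above.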

Results and Discussion
The experimental results on the effect of population size and the maximum number of iterations on the metaheuristic algorithms considered are presented in Tables 13-15. The Friedman test results for all of them are given in Table 16, which shows a statistically significant difference in the effect of population size and number of iterations for all algorithms tested; the chi-square statistics and p-values are shown in Table 16, and all the p-values are less than the tolerance level of 0.05. Post hoc analysis with Wilcoxon signed-rank tests was conducted with a Bonferroni correction applied, resulting in a significance level set at p < 0.001.
The test results for BA are shown in Table 13. We noted that the best results are returned when the population size is 1000 and the number of iterations is 10; this setting has the lowest mean rank, as shown in the corresponding column of Table 16. Further post hoc results confirmed that this difference is significant. The implication is that BA performs better with larger population sizes. Similarly, the results for GWO are given in Table 14, and it can be seen that GWO failed to return the optimal solution for Rosenbrock. However, it performed optimally when the population size was 50 and the number of iterations was 600, and the lowest mean rank was recorded in this category. A further post hoc test confirmed that GWO performed better when the number of iterations was greater. The results for BOA are presented in Table 15, and we noted that excellent results are returned for small population sizes. The lowest mean rank is returned when the population size is 30 and the number of iterations is 800. The post hoc test confirmed that BOA performs optimally with a greater number of iterations.

The results of the experiments conducted to show the effect of the 10 different initialisation schemes on the algorithms are presented next, and the findings are discussed. The best, worst, mean, standard deviation, and mean runtime results obtained from the experiments are shown in Tables 17-19. It can be seen from the results that the ten initialisation schemes have different effects on the performance of the algorithms: for some functions, the results are better than for others. For some functions, the results appear inconsistent because, while the best value is accurate, the mean value seems inaccurate. The inconsistency could mean that the initial population was close to the global optimum when the best value was returned. It could also mean that the diversity is well suited to the function, hence its ability to yield a good result. In other cases, more iterations might be needed, or a different, more diverse population might be required to achieve the desired result.

The results for the experiments conducted on BA are given in Table 17. The betarnd(3,2), betarnd(2.5,2.5), raylrnd(0.4), and sobol schemes outperformed rand for most functions. To obtain the general effect of the initialisation schemes on BA, we treated the ten initialisation schemes as observations for Friedman's test; the summary is given in the corresponding column of Table 20. The p-value is 0.000, which is less than α = 0.05; hence, we rejected the null hypothesis. This means that the performance of BA is sensitive to the initialisation schemes. After a post hoc test based on the Wilcoxon signed-ranks test of all the initialisation schemes, using a Bonferroni correction with a significance level set at p < 0.001, betarnd(2.5,2.5) returned the lowest mean rank and is ranked first; therefore, we recommend betarnd(2.5,2.5) for BA. The results for BOA are shown in Table 18, and they show that betarnd(3,2) and unifrnd(0,1) are the best-performing initialisation schemes. As shown in Table 20, BOA has a p-value of 0.050, which is equal to α = 0.05; hence, we retained the null hypothesis. This means that BOA is not sensitive to the initialisation schemes. Similarly, the results for GWO (Table 19) show that lognrnd(0,0.5) and betarnd(3,2) are the best-performing initialisation schemes. Friedman's test showed that the p-value is 0.287, which is greater than α = 0.05; hence, we retained the null hypothesis. This means that GWO is not sensitive to the initialisation schemes.

Conclusions
Many works in the literature clearly outline the role of the initial population in the overall performance of metaheuristic algorithms. However, despite the role that initialisation plays and the efforts put forward by researchers in this area, to our knowledge, no comprehensive survey of articles on the subject exists. Therefore, the present study presents a comprehensive survey of the different approaches to improving the performance of metaheuristic optimisers through their initialisation schemes. We also show the publication trends for research in this area and the numbers of citations. Finally, we provide a glossary of efforts that have been made to improve the performance of metaheuristic algorithms using their initialisation schemes, together with the areas of application of these improvements, for easy reference by metaheuristic research enthusiasts.
The number of articles published to date in the repositories discussed earlier shows that the area which focuses on the initialisation of the population of metaheuristic algorithms is relatively uncharted. Many metaheuristic algorithms have been proposed; however, less effort has been made regarding their initialisation schemes. Most researchers opt for the commonly used random number generator, whose disadvantages have been studied extensively; its ease of implementation may have contributed to its widespread use. The hybridisation of metaheuristic algorithms, in contrast, has yielded great results in the literature, and authors have had a great degree of success using different initialisation schemes for the algorithms. We see a promising avenue whereby researchers can explore these high-performing initialisation schemes to assess their efficacy; the size of the population and the number of iterations can be varied along with these schemes, which can help to increase the performance of the algorithms.
Our experiments demonstrate that, for the classical functions under consideration, BA is sensitive to the initialisation schemes, whereas GWO and BOA are not. The sensitivity of the algorithms is also problem-dependent, meaning that some functions were insensitive to the initialisation scheme. The population size and number of iterations play a role in the performance of the algorithms: we found that BA performed better with larger population sizes, whereas GWO and BOA performed better with greater numbers of iterations. This conclusion is heavily dependent on the problem dimension; however, we believe that good population diversity and an adequate number of iterations will most likely lead to optimal solutions.
We also identified the need for initialisation methods that are best suited to a specific problem domain, with statistical backing, to yield optimal solutions for that set of problems. Unfortunately, most papers on metaheuristics perform very little statistical validation, and when they do, it is often only on a single problem that the researchers describe; benchmarking metaheuristics with systematic and sound statistical techniques is usually lacking from many published works. In addition, a tuning/adaptive scheme could be developed that is capable of choosing an initialisation method from a suite of initialisation schemes, depending on the nature of the problem encountered, leading to better solutions. This approach would also promote diversity of the population.