Studying the Impact of Initialization for Population-Based Algorithms with Low-Discrepancy Sequences

Abstract: Meta-heuristic algorithms have been extensively used to solve a wide range of optimization problems. Population initialization plays a prominent role in these algorithms: it affects diversity and convergence and therefore the ability to identify a robust optimal solution. Recognizing the importance of diversity, many researchers have focused on improving the reliability and quality of meta-heuristic algorithms. This paper proposes three low-discrepancy sequences, the WELL sequence, the Knuth sequence, and the Torus sequence, for initializing the population in the search space instead of the uniform random distribution. The paper also provides a detailed survey of initialization methods for PSO and DE based on quasi-random sequence families such as the Sobol sequence, the Halton sequence, and the uniform random distribution. The proposed methods for PSO (TO-PSO, KN-PSO, and WE-PSO), BA (BA-TO, BA-WE, and BA-KN), and DE (DE-TO, DE-WE, and DE-KN) are evaluated on well-known benchmark test problems and on the training of artificial neural networks. The synthesis of our strategies demonstrates promising gains of low-discrepancy sequences over uniform random numbers. The experimental findings indicate that initialization based on low-discrepancy sequences is considerably stronger than initialization with uniform random numbers. Furthermore, our work outlines the profound effects of the proposed methodology on convergence and diversity. It is expected that this comparative simulation survey of low-discrepancy sequences will help investigators analyze meta-heuristic algorithms in detail.


Introduction
The term 'optimization' refers to finding the best solution to a problem at minimum cost in terms of memory, time, and resources. Sometimes processing is fast but consumes a lot of memory, while at other times both speed and memory are acceptable but accuracy suffers. Optimization targets the best solution of any problem [1]. A solution is considered the best if it is satisfactory in terms of processing speed, resource utilization, and accuracy of the result [2]. Optimization algorithms are utilized to tackle local and global search problems. A typical goal behind the use of these algorithms is to discover the optima of a model, described by known inputs, that represents the problem to be solved [3]. Optimization algorithms have become among the most widely adopted algorithms and are operational in virtually all application areas, such as industry, sports, medicine, agriculture, and finance [4].
Evolutionary algorithms (EAs) have been introduced and strongly employed in different fields of science and engineering [5]. EAs have been broadly utilized to solve maximization and minimization problems and to find the best optimal value. Compared with ordinary strategies based on mathematical programming or formal logic, EAs are observed to be more powerful and adaptable [6]. Despite this, when solving complex optimization problems such as complex nonlinear problems, EAs face the risk of being trapped in local optima, which also slows down convergence [7]. To enhance the performance of EAs and to avoid premature convergence, new variants of evolutionary algorithms need to be developed. Accordingly, building on genetic evolution procedures, several researchers have worked on improving existing EAs or developing new ones. The most widely used EAs include the genetic algorithm (GA) [8] and differential evolution (DE) [9]. DE is recognized as a simple yet strong evolutionary algorithm that has been utilized to tackle hard optimization problems in various science and engineering disciplines [10]. Like other EAs, the DE algorithm follows a comparable structure [11], which incorporates three essential genetic operators, i.e., mutation, crossover, and selection. These genetic operators contribute major roles to the performance of DE [12].
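As background for the three operators named above, the following is a minimal, self-contained sketch of one generation of the classic DE/rand/1/bin scheme (mutation, binomial crossover, greedy selection) on a toy sphere objective; the parameter values and function names are illustrative only and do not reproduce the exact DE configuration used later in the paper.

```python
# Minimal sketch of one DE/rand/1/bin generation (illustrative, not the
# authors' implementation).
import numpy as np

def de_generation(pop, fitness, objective, F=0.5, CR=0.9, rng=None):
    """Apply mutation, binomial crossover, and greedy selection once."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        # mutation: v = x_a + F * (x_b - x_c) with a, b, c distinct from i
        a, b, c = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        # binomial crossover: take each gene from the mutant with probability CR
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True          # guarantee at least one mutant gene
        trial = np.where(cross, mutant, pop[i])
        # greedy selection: keep the trial vector only if it is not worse
        f_trial = objective(trial)
        if f_trial <= fitness[i]:
            new_pop[i], new_fit[i] = trial, f_trial
    return new_pop, new_fit

sphere = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(0)
pop = rng.uniform(-100, 100, size=(20, 5))
fit = np.array([sphere(x) for x in pop])
pop, fit = de_generation(pop, fit, sphere, rng=rng)
print(fit.min())
```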
The intelligent collective behavior of individually non-intelligent species, such as ants searching for food, birds flying in flocks, or fish swimming in schools, is termed swarm intelligence (SI). SI is inspired by how ants, bees, birds, and fish fulfill their goals as a swarm [13]. If every member of a swarm worked individually, without social interaction, it would be difficult to achieve these goals, because individuality alone lacks collective intelligence. However, when the members cooperate, their social interaction improves, and they also interact with the environment, which makes it easier for them to accomplish difficult tasks [14]. SI-based algorithms include ant colony optimization (ACO), the bat algorithm (BA) [15][16][17][18], and particle swarm optimization (PSO) [19]. PSO [20] has attracted much attention because of its simplicity of implementation and strong search abilities. It is inspired by the social foraging behavior of fish and birds that seek food in groups.
The major issue with these meta-heuristic algorithms, when they are applied to complex numerical optimization problems, is premature convergence [21]. Regardless of the nature of the non-linear problem, this issue is confronted while running a heuristic algorithm; for example, meta-heuristic algorithms like PSO [22] and DE get stuck in local optima after a small number of epochs. The population fails to produce a new, diverse swarm because of inappropriate initialization strategies that do not explore the whole search space [23]. In the field of evolutionary computing, the performance of meta-heuristic algorithms is affected by the generation of random numbers when initializing the population in the multidimensional search space [24]. Meta-heuristic algorithms tend to reach the optimum value when solving problems in low-dimensional search spaces. However, their performance tends to be insignificant when the dimensionality of the problem is high, which causes the particles to get stuck in local optima [25]. The initialization of population-based meta-heuristic algorithms can be performed by using chaotic initialization [26][27][28], opposition-based initialization, or quasi-random sequences. This paper presents the impact of quasi-random sequences on the initialization of the population of meta-heuristic algorithms.
For a given optimization problem, population initialization plays a significant role in meta-heuristic algorithms: it influences diversity and convergence and also helps to find an optimal solution efficiently. Recognizing the importance of diversity in particular, several researchers have worked on improving the performance of meta-heuristic algorithms. In order to improve convergence, quasi-random sequences are more useful for initializing the population than the uniform random distribution [29].
Quasi-random sequences suffer from several issues when solving real-world problems of different dimensionality [30]. Some quasi-random sequences give better results on large dimensions and vice versa [31,32]. Our objective is to find the most suitable quasi-random sequence for meta-heuristic algorithms, one that gives superior results regardless of the dimensionality of the problem.
Considering this fact, we have proposed three novel quasi-random initialization strategies, based on the WELL sequence, the Knuth sequence [33], and the Torus sequence, to initialize the population in the search space. We initialized the PSO, BA, and DE algorithms with these proposed strategies. In our first contribution, we compared the novel PSO techniques with the simple random distribution [34] and the family of low-discrepancy sequences [35] on several unimodal and multimodal complex benchmark functions and on the training of an artificial neural network [36]. The experimental results show that PSO with Knuth-based initialization (KN-PSO) outperforms traditional PSO, PSO with Sobol-based initialization (SO-PSO), PSO with Halton-based initialization (H-PSO), PSO with Torus-based initialization (TO-PSO), and PSO with WELL-based initialization (WE-PSO) [37]. Similarly, in the second contribution, DE is initialized with these proposed strategies (WELL, Knuth, and Torus sequences) [38][39][40], and in the third contribution the same strategies are used to initialize BA.
The rest of the paper is organized as follows: Section 2 reviews previous work. Section 3 presents the methodology of the different algorithms together with six initialization strategies. Section 4 contains the experimental setup. Section 5 presents the results and discussion of the implementation and comparison of the algorithms, using the initialization techniques, on the benchmark test functions. Section 6 presents the comparison of PSO, BA, and DE regarding data classification. Lastly, Section 7 concludes the paper.

Previous Work
Many research studies have proposed different variants based on initialization techniques, and we discuss some of them in detail in this section. Initializing the swarm well helps PSO to search more efficiently [41]. In this work, the swarm was initialized with the nonlinear simplex method (NSM). NSM requires only function evaluations, without any derivatives, for computation. NSM starts with an initial simplex and produces a sequence of steps that move the vertex with the highest function value in the direction opposite to the vertex with the lowest one. The particles were initialized with the initial simplex in the D-dimensional search space, where the D + 1 vertices of the simplex are D + 1 particles of the swarm, and the NSM method is applied for N − D + 1 steps for a swarm of size N. In this way, each particle in the swarm has information about the region. Finally, they compared their results with simple PSO and found significant improvement. This variant was introduced by Mark Richards and Dan Ventura in 2004.
In their work [42], the authors proposed using centroidal Voronoi tessellations to initialize the swarm. Voronoi tessellation is a technique for partitioning a region into compartments based on a set of generators: each partition is associated with one generator and consists of all the points closer to that generator than to any other. The generators are then selected as the initial positions of the particles, and in this way the particle swarm optimization algorithm is initialized. They compared this approach with basic PSO on many benchmark functions and found improved performance in high-dimensional spaces.
Halton sampling was introduced for PSO by Nguyen Xuan Hoai, Nguyen Quang Uy, and R.I. McKay in 2007 [43]. The Halton sequence is a low-discrepancy deterministic sequence used to generate points in space; it is not fully random. To randomize it, X. Wang and F.J. Hickernell proposed a randomized Halton sequence by using the von Neumann-Kakutani transformation. The authors used this sequence to initialize the global best of PSO. They performed tests on various benchmark functions and compared the results with PSO initialized with uniform random numbers. They found better performance, especially for complex problems and smaller populations.
In [45], a new operator called the systematic mutation (SM) operator was used to improve the performance of PSO. Instead of using normal random numbers, the new operator uses the quasi-random Sobol sequence to initialize the swarm, since quasi-random sequences are less random than pseudo-random sequences, which is helpful for computational methods. Two variants were proposed, SM-PSO1 and SM-PSO2. The main difference between the two versions is that in SM-PSO1 the best particle is mutated, while in SM-PSO2 the worst particle is mutated. Better results were found in comparison with BPSO and other variants.
Jiyong et al. proposed a new method of initialization in 2011 [46]. In their work, they added functionality to automatically detect when the particles have prematurely converged and to reinitialize the swarm. They also redesigned the inertia weight to balance the global and local search abilities, naming the variant IAWPSO.
Another variant was proposed by P. Murugan in 2012 and applied to the transmission expansion problem to decide on the installation of new circuits under increasing electricity usage, where it was found fruitful [47]. He used a new initialization technique called population monitored for complementary magnitudes initialization, based on decision variables. All particles are initialized with an integer within the lower and upper limits of the decision variable in such a way that each particle is unique. The initial population is created so that each particle represents a possible solution and all particles are distinct; almost 50% of the particles are opposite to the other 50% with respect to the lower and upper limits of the decision variable. The important point of this initialization is to maintain uniqueness and diversity among the initially generated particles of the swarm.
SIS-PSO was introduced by Liang Yin, Xiao-Min Hu, and Jun Zhang in 2013 [48]. The authors introduced a new initialization technique named the space-based initialization strategy. They broke down each dimension i of the search area into two segments, S1i and S2i, with borders [li, (li + ui)/2] and [(li + ui)/2, ui], where each segment is linked with a probability initialized to 0.5. They applied SIS-PSO to thirteen functions, compared the results with GPSO and CLPSO, and found significant improvement.
A further variant was introduced by Moaath Shatnawi, Mohammad Faidzul Nasrudin, and Shahnorbanun Sahran in 2017 [49], who proposed a new variant of PSO called polar PSO. They explained that most of the distortion was caused by polar particles, and hence introduced a new method for reinitializing the polar particles by redefining the distance based on the dimensionality of the point. This method removed the distortion occurring during the computation. The results were compared with BPSO and showed some improvement.
This variant was proposed by Laxmi et al. in 2017 [50]. In this work, the Nawaz-Enscore-Ham (NEH) heuristic was used to initialize the swarm, and the variant was named PHPSO. The job sequences generated by NEH are placed in ascending order of the sums of their total flow time (TFT); the construction of a job sequence depends on its initial order, and the minimum-TFT sequence among all sequences becomes the current sequence for the next iteration. The population generated by the NEH method is used to initialize the population of PSO. They applied this algorithm to the no-wait flow shop scheduling problem, compared the results with DPSO and HPSO, and found comparatively better results.
A new variant of PSO combined with stochastic gradient descent (SGD), named PSO-SGD, was proposed by Hayder M. Albeahdili, Tony Han, and Naz E. Islam in 2015 for training convolutional neural networks [51]. The proposed technique was divided into two phases. PSO was used to initialize and train the CNN parameters in the first phase; when PSO showed slow progress for a few iterations, SGD was used in the second phase. Additionally, they combined PSO with the genetic algorithm (GA), which helped the particles and overcame the slowness of SGD. They applied the new algorithm to different benchmark datasets, and it performed well on three of them. The proposed technique avoided local optima and premature saturation, which are known problems when any single algorithm is used.
The authors in [52] examined the impact of generating the initial population with techniques other than traditional random numbers or quasi-random numbers. They applied the non-linear simplex method for generating the initial population of DE, and the proposed algorithm was termed NSDE. The performance of the proposed algorithm was measured on twenty benchmark functions and compared with the standard DE and the opposition-based DE (ODE) algorithm. Numerical results illustrate that the proposed technique enhances the convergence rate.
To tackle the image thresholding problem, an enhanced variant of the standard DE algorithm with a local search (termed LED) and low-discrepancy sequences was introduced [53]. Experimental results conclude that the introduced algorithm is superior for finding the optimum threshold.
For the steelmaking-continuous casting (SCC) problem, the authors in [54] presented a novel enhanced DE technique based on a two-step procedure for producing an initial population, as well as a novel mutation approach. Furthermore, an incremental methodology for generating the initial population was also incorporated into DE to handle dynamic events. Computational experiments show the effectiveness of the presented approach over others. Additionally, regarding application areas, the authors in [55] utilized BA for the antenna optimization problem, pan evaporation was estimated using BA in [56], and in [57] the authors applied a new variant of DE to path planning for mobile robots.
According to the above-mentioned studies, we conclude that the efficiency of meta-heuristic algorithms is affected by using random numbers for the initialization of the population. For this reason, various articles have used quasi-random number sequences for population initialization in meta-heuristic algorithms. However, most researchers used only a limited set of quasi-random sequences for initializing the population and did not perform any comparative analysis of their effect on population-based algorithms. Similarly, the Knuth, WELL, and Torus quasi-random sequences have not yet been proposed for initializing the population in DE and BA. After analyzing the literature, we identified the above-mentioned gaps and attempt to fill them.

Methodology
The most important step in any meta-heuristic algorithm is to initialize its population properly. If the initialization is not done well, the algorithm may search unnecessary areas and may fail to find the optimum solution; proper initialization is therefore very important for the performance of any algorithm. The objective of this paper is to examine the effectiveness of quasi-random sequences. PSO is random in nature, so it does not have a specific pattern that guarantees reaching the global optimum point. Taking advantage of this randomness and considering this fact, we propose three novel quasi-random initialization strategies, based on the WELL sequence, the Knuth sequence, and the Torus sequence, to initialize the population in the search space. We initialize the PSO, DE, and BA algorithms with these proposed strategies. We compare the novel techniques with the simple random distribution and with the family of low-discrepancy sequences on several unimodal and multimodal complex benchmark functions and on the training of an artificial neural network. A brief description of the quasi-random sequence approaches and of the proposed algorithms using the WELL, Knuth, and Torus sequences for PSO, DE, and BA is given below.
As stated above, the goal of this study is to analyze the effectiveness of low-discrepancy sequences. Therefore, we compare the proposed algorithms based on the WELL, Torus, and Knuth distributions with the simple PSO, BA, and DE based on the pseudo-random uniform distribution, and with other low-discrepancy distributions based on the Sobol sequence and the Halton sequence.

Low Discrepancy Sequences
Discrepancy measures how uniformly a set of numbers is distributed. Let P = {x1, x2, ..., xn} be a set of n points in the s-dimensional unit cube [0, 1)^s. For a vector y = (y1, y2, ..., ys) in [0, 1)^s, let J(y) denote the axis-parallel box [0, y1) x [0, y2) x ... x [0, ys). The star discrepancy of P is given in (1):

D*_n(P) = sup over y in [0, 1)^s of | A(J(y); P)/n − λ(J(y)) |,   (1)

where A(J(y); P) is the number of points of P that fall inside J(y) and λ(J(y)) is the volume of J(y). Although other measures of discrepancy exist, the star discrepancy is the one most commonly used. A low discrepancy value means a more uniform distribution of points in the space.
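To make the comparison quantitative, discrepancy can be estimated numerically. The sketch below, which assumes SciPy >= 1.7 and uses its qmc module, computes an L2-star discrepancy estimate for a pseudo-random sample and for a Sobol sample of the same size; the sizes and dimensions are illustrative.

```python
# Sketch: compare the L2-star discrepancy of uniform pseudo-random points
# against Sobol low-discrepancy points (assumes SciPy >= 1.7).
import numpy as np
from scipy.stats import qmc

n, s = 256, 5                                   # number of points, dimensions
rng = np.random.default_rng(0)

random_sample = rng.random((n, s))              # uniform pseudo-random points in [0, 1)^s
sobol_sample = qmc.Sobol(d=s, scramble=False).random(n)   # low-discrepancy points

for name, sample in [("uniform", random_sample), ("Sobol", sobol_sample)]:
    d = qmc.discrepancy(sample, method="L2-star")   # lower value = more uniform spread
    print(f"{name:8s} L2-star discrepancy: {d:.3e}")
```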

Uniform Random Numbers
Random numbers are generated through a pseudo-random sequence following the uniform distribution [44], which can be characterized by the probability density function of the continuous uniform distribution. The probability density function is given in (2):

f(w) = 1/(v − u) for u ≤ w ≤ v, and f(w) = 0 otherwise,   (2)

where u and v are the parameters fitted by maximum likelihood. At the edges u and v, the value of f(w) is immaterial, because it has no impact on the integral of f(w)dw over any range. The likelihood function used to estimate the maximum-likelihood parameters is given in (3):

L(u, v | w1, ..., wn) = f(w1) · f(w2) · ... · f(wn) = (v − u)^(−n), provided u ≤ min(wi) and v ≥ max(wi).   (3)
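For reference, a minimal uniform-random population initializer over a bounded search space can be sketched as follows; the bounds and sizes are illustrative, and the low-discrepancy initializers described next are meant to be drop-in replacements for it.

```python
# Sketch: uniform-random population initialization in [lower, upper]^dim.
import numpy as np

def init_uniform(pop_size, dim, lower=-100.0, upper=100.0, seed=None):
    """Draw pop_size points uniformly at random from the search space."""
    rng = np.random.default_rng(seed)
    return lower + (upper - lower) * rng.random((pop_size, dim))

swarm = init_uniform(pop_size=50, dim=30, seed=1)
print(swarm.shape)   # (50, 30)
```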

Sobol
The Sobol sequence was first introduced by the Russian mathematician Sobol [45]. Its coordinates are constructed from a linear recurrence relation in each dimension. Let a non-negative integer s have the binary expansion shown in (4):

s = s1 + s2·2 + s3·2^2 + ... + sw·2^(w−1).   (4)

Then the sth point in dimension D can be generated using (5):

x_s^D = s1·v1^D ⊕ s2·v2^D ⊕ ... ⊕ sw·vw^D,   (5)

where ⊕ denotes the bit-wise exclusive-or operation and vq^D is the qth direction number of dimension D. The direction numbers can be generated using the recurrence in (6):

vi^D = c1·v(i−1)^D ⊕ c2·v(i−2)^D ⊕ ... ⊕ c(q−1)·v(i−q+1)^D ⊕ v(i−q)^D ⊕ (v(i−q)^D >> q), for i > q,   (6)

where >> denotes a right bit shift and the cq are the coefficients of a primitive polynomial of degree q.
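A practical way to obtain Sobol points without implementing the direction-number recurrence by hand is SciPy's QMC engine. The sketch below (assuming SciPy >= 1.7 is available) draws Sobol points and scales them to the search-space bounds, mirroring the uniform initializer above.

```python
# Sketch: Sobol-based population initialization via scipy.stats.qmc.
import numpy as np
from scipy.stats import qmc

def init_sobol(pop_size, dim, lower=-100.0, upper=100.0, seed=None):
    """Draw pop_size Sobol points scaled to [lower, upper]^dim."""
    sampler = qmc.Sobol(d=dim, scramble=True, seed=seed)
    unit_points = sampler.random(pop_size)      # points in [0, 1)^dim
    # (SciPy warns if pop_size is not a power of two; harmless for a sketch)
    return qmc.scale(unit_points, [lower] * dim, [upper] * dim)

swarm = init_sobol(pop_size=50, dim=30, seed=1)
print(swarm.min() >= -100.0, swarm.max() <= 100.0)   # True True
```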

Halton
Halton sequences were introduced by J. Halton [43] and can be considered an enhanced version of the van der Corput sequence (Gentle, 2006). The Halton sequence constructs a pattern of points by using coprime numbers as bases. The pseudo code to generate Halton sequences is as follows:

Halton Sequences
// input: initial index = s and base = coprime
// output: instances = h
Set the interval over [0, 1)
For each iteration k1, k2, k3, ..., kn: do
•	For each particle p1, p2, p3, ..., pn, compute the radical inverse of the index in the given base.

WELL
The well equi-distributed long-period linear (WELL) sequence was proposed in [58]. It was initially developed as an updated version of the Mersenne twister algorithm. The algorithm for generating the WELL distribution follows a general recurrence, described as follows: x and r are two integers with r > 0 and 0 < x < k, and k = r·w − x, where w is the word length of the generator; A0 to A7 represent binary matrices of size r·w acting on r-bit blocks; mx describes the bit mask that holds the first w − x bits; and t0 to t7 are temporary vector variables.
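As a concrete companion to the Halton pseudo code above, the following is a minimal from-scratch Halton generator based on the radical-inverse (van der Corput) function with one prime base per dimension; the prime list and point counts are illustrative, not the authors' implementation.

```python
# Sketch: Halton points via the radical-inverse function, one prime base
# per dimension.
import numpy as np

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # coprime bases for up to 10 dimensions

def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, factor = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * factor
        factor /= base
    return result

def halton(n_points, dim):
    """Return n_points Halton points in [0, 1)^dim (point indices start at 1)."""
    return np.array([[radical_inverse(i, PRIMES[d]) for d in range(dim)]
                     for i in range(1, n_points + 1)])

print(halton(5, 2))
```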

Knuth
The Knuth sequence was designed and proposed by the authors in [33]. As discussed above, an inbuilt library function, Knuth(xmin, xmax), is used to generate Knuth-sequence random points. The following is the pseudo code to generate Knuth sequences.

To shuffle an array a of n elements (indices 0 ... n − 1):
•	for i from n − 1 down to 1: pick a random index j such that 0 ≤ j ≤ i and exchange a[i] with a[j].

Torus
Torus is a geometric term and was first used by the authors in [59] to generate a Torus mesh for a geometric coordinate system. The Torus mesh is commonly used in game development and can be generated using either a left-handed or a right-handed coordinate system. The shapes of the Torus in 1D, 2D, and 3D are a circle, a donut, and a 2D rectangle, respectively. The Torus in 3D can be represented by (7)-(9):

x(θ, δ) = (D + r cos θ) cos δ,   (7)
y(θ, δ) = (D + r cos θ) sin δ,   (8)
z(θ, δ) = r sin θ,   (9)

where θ and δ are the angles of the circles, D is the distance from the tube center to the Torus center, and r denotes the radius of the tube circle. Inspired by this Torus mesh, low-discrepancy sequences have been generated that are initialized with the prime series to produce a Torus effect. The mathematical notation of the Torus series is shown in (10):

u_i = ( f(i·√p1), f(i·√p2), ..., f(i·√ps) ),   (10)

where pk denotes the kth prime number and f is the fractional-part function, calculated as f(a) = a − floor(a). Because of the prime constraint, the dimension of the Torus is limited to 100,000 if the prime parameter of the Torus function is used; for more than 100,000 dimensions, the numbers must be provided manually.
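The Torus series of (10) is straightforward to reproduce. The sketch below is an illustrative reconstruction from the formula, using the fractional parts of integer multiples of the square roots of the first primes; it is not the original library implementation, and the hard-coded prime list limits it to ten dimensions.

```python
# Sketch of the Torus sequence of Eq. (10): the k-th coordinate of the i-th
# point is frac(i * sqrt(p_k)), with p_k the k-th prime.
import numpy as np

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # extend this list for more dimensions

def torus(n_points, dim):
    roots = np.sqrt(np.array(PRIMES[:dim], dtype=float))   # sqrt of the first `dim` primes
    i = np.arange(1, n_points + 1).reshape(-1, 1)
    return (i * roots) % 1.0                                # fractional parts in [0, 1)

points = torus(n_points=50, dim=10)
print(points.shape, float(points.min()), float(points.max()))
```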
Algorithm 1 Proposed Pseudo Code of PSO Using Novel Method of Initialization
1. Initialize the swarm (using the WELL, Knuth, or Torus sequence).
2. Set the epoch count I = 0, the population size Nz, the dimension of the problem Dz, and wmax and wmin.
3. For each particle Pz:
4. Compute the fitness score fz.
7. Set the global best position gbest.
8. Set the local best position pbest, where fz is the locally optimal fitness.
9. Compare the current particle's fitness score xz in the swarm with its old local best position pbest_z; if xz is better than pbest_z, substitute pbest_z with xz, otherwise keep xz unchanged.
10. Compare the current particle's fitness score xz in the swarm with its old global best position gbest_z; if xz is better than gbest_z, substitute gbest_z with xz, otherwise keep xz unchanged.
11. Compute v(z+1), the updated velocity vector.
12. Compute x(z+1), the updated position vector.
13. If the stopping criterion is not met, go to step 2; otherwise terminate.

In our second contribution in the paper, we introduce three novel methods of population initialization: DE-WE, DE-KN, and DE-TO. Algorithm 2 shows the flow of the proposed distribution-based DE initialization. In our last contribution, we introduce three novel methods of population initialization: BA-WE, BA-KN, and BA-TO. Algorithm 3 shows the flow of the proposed distribution-based BA initialization.

Algorithm 3 Proposed Pseudo Code of BA Using Novel Method of Initialization
(1) Bat-Initialization(); using WELL, Knuth, or Torus
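To show how any of these initializers plugs into a population-based optimizer, the following is a minimal sketch of a global-best PSO loop in the spirit of Algorithm 1, with the initializer passed in as a function so that a uniform generator can be swapped for a Sobol, Halton, WELL, Knuth, or Torus one; all names and parameter values are illustrative assumptions, not the exact code used in the experiments.

```python
# Sketch: global-best PSO with a pluggable population initializer.
import numpy as np

def pso(objective, init_fn, dim=10, pop=50, iters=1000,
        c1=1.45, c2=1.45, w_max=0.9, w_min=0.4, lower=-100.0, upper=100.0):
    rng = np.random.default_rng(0)
    x = init_fn(pop, dim, lower, upper)             # initial positions from the chosen sequence
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()            # global best position
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters     # linearly decreasing inertia weight
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f                        # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()        # update global best
    return g, float(pbest_f.min())

sphere = lambda p: float(np.sum(p ** 2))
uniform_init = lambda n, d, lo, hi: lo + (hi - lo) * np.random.default_rng(1).random((n, d))
best, best_f = pso(sphere, uniform_init, dim=10, iters=200)
print(best_f)
```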

Experimental Setup
To achieve effective operation of the algorithms, it is necessary to tune the parameters associated with each approach to their most suitable values. Largely, these parameters are fixed before the execution of the algorithm and remain uniform throughout the run. Various studies suggest that the most appropriate methodology for selecting the parameters of any algorithm is to obtain the optimal parameters through exhaustive experiments. In this study, the experimental settings of the parameters are given in Table 1 and the parameter settings of the algorithms in Table 2, based on the literature cited in this section. Along with this, the objective functions and their details are given in Table 3. The search-space boundary is [−100, 100], the population size is kept at 50, and 10 runs are performed. Further, 10, 20, and 30 dimensions are used with 1000, 2000, and 3000 iterations, respectively.
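For reproducibility, the setup described above can be captured in a single configuration object; the sketch below uses the values stated in the text, while the structure and key names are illustrative assumptions.

```python
# Sketch: experimental configuration from the text, as a plain dictionary.
EXPERIMENT_CONFIG = {
    "search_space": (-100.0, 100.0),     # lower and upper bound of every variable
    "population_size": 50,
    "runs": 10,
    "settings": [                        # dimension / iteration pairs used in the tests
        {"dimensions": 10, "iterations": 1000},
        {"dimensions": 20, "iterations": 2000},
        {"dimensions": 30, "iterations": 3000},
    ],
}
```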

Results and Discussion
This section briefly describes the simulation results of the proposed approaches and their graphical representation. It is divided into three sub-sections, each dedicated to the simulation results of one algorithm: PSO, DE, and BA, respectively. In addition, each algorithm is also examined through statistical tests, which are reported in the corresponding sub-sections.

Discussion on PSO Results
The simulations were implemented in C++ and run on a computer with the Windows operating system. The benchmark functions used are listed in Table 3. In Table 1, D (Dimensions) shows the dimensionality of the problem, S (Search Space) represents the interval of the variables, I denotes the iterations, and Pop the population size; in Table 2, f_min denotes the common global optimum (minimum) value. The simulations use c1 = c2 = 1.45, an inertia weight w in the interval [0.9, 0.4], and a swarm size of 50. The function dimensions are D = 10, 20, and 30, and the maximum number of epochs is 3000. All techniques were run with the same parameters for a fair comparison. To check the performance of each technique, all algorithms were tested over 30 runs.
A further purpose of this study is to observe how the characteristics of the experimental results depend on the dimensions of these standard benchmark functions.
The objective of this study is to find the most suitable initialization approach for PSO. In the first experiment, the proposed WE-PSO, TO-PSO, and KN-PSO were investigated against the other approaches SO-PSO, H-PSO, and standard PSO. The objective of the second simulation is to examine the effect of the dimension on standard function optimization. Lastly, the simulation results of WE-PSO, TO-PSO, and KN-PSO were compared with standard PSO, SO-PSO, and H-PSO. In the rest of the paper, the simulation results are discussed in detail.
The core objective of this simulation setup is to determine how the superiority of results depends on the dimension of the functions to be optimized. In the experiments, three dimensions were used for the benchmark functions: D = 10, D = 20, and D = 30. The simulation results are presented in Table 4. From these results, it was found that functions with larger dimensions are tougher to optimize; as can be seen from Table 4, when the dimension size is D = 20 or D = 30, our proposed approach KN-PSO shows better results at higher dimensions than the other approaches WE-PSO, TO-PSO, standard PSO, H-PSO, and SO-PSO. The benchmark functions are given in Table 3 and their parameter settings in Table 1. Table 4 shows that with dimension D = 30, KN-PSO is superior and converges better than WE-PSO, TO-PSO, standard PSO, SO-PSO, and H-PSO. The comparative analysis in Table 4 shows that with a smaller dimension size (D = 10) standard PSO performs well, while as the dimension size increases, KN-PSO outperforms the others in convergence significantly. Hence, KN-PSO is best for higher dimensions. The corresponding statistical results are given in Table 5.

Discussion on DE Results
Population initialization is a vital factor in evolutionary computing-based algorithms, as it considerably influences diversity and convergence. In order to improve diversity and convergence, quasi-random sequences are more useful for initializing the population than the random distribution. In this paper, the capability of DE was extended to make it suitable for optimization problems by introducing new initialization techniques based on low-discrepancy sequences: the Knuth sequence-based (DE-KN), the Torus sequence-based (DE-TO), and the WELL sequence-based (DE-WE) variants, used to solve optimization problems in large-dimension search spaces.
For global optimization, a considerable variety of benchmark problems can be used. Each benchmark problem has its own characteristics, and the variety of detailed characteristics of such functions determines the level of complexity of the benchmark problems. Table 3 lists the benchmark problems utilized for the efficiency analysis of the above-mentioned optimization algorithms, giving their names, ranges, domains, and formulas. This study incorporates benchmark problems that have been extensively utilized in the literature, in order to convey a deep knowledge of the performance of the above-mentioned optimization algorithms.
To measure the effectiveness and robustness of optimization algorithms, benchmark functions are applied. In this study, fifteen computationally expensive black-box functions with various characteristics and traits are used. The purpose of utilizing these benchmark functions is to examine the effectiveness of the above-mentioned proposed approaches.
In this section, the low-discrepancy sequence methods are compared with each other with respect to their capabilities and efficiency, with the help of fifteen high-dimensional benchmark functions. Nevertheless, the overall performance of optimization algorithms varies with the parameter settings and with other testing criteria. Benchmark problems can be employed to demonstrate the performance of the low-discrepancy sequence approaches at various complexity levels. Table 6 contains the experimental simulation results on the benchmark functions, and the exhaustive statistical results are presented in Table 7.
Besides this, DE-TO handles high-dimensionality problems better than the other methods, despite the complexity and surface topology of the examined problems. Figures 23-38 show the achievements of the traditional DE, DE-S, DE-H, DE-KN, and DE-WE algorithms with regard to their efficiency and capability. The results demonstrate that DE-TO outperforms the others on higher-dimensionality problems. In summary, dimensionality strongly influences the working of most algorithms; however, DE-TO is more consistent as the dimension of the problem increases. This consistency shows that the DE-TO algorithm has a greater exploration capability.
For the statistical comparison, the widely used mean ranks obtained by the Kruskal-Wallis and Friedman tests are employed to compare the differences between the DE-TO algorithm and the other algorithms, DE-KN, DE-WE, DE-S, DE-H, and standard DE; the results are given in Table 7.
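For illustration, both non-parametric tests named above are available in SciPy; the sketch below applies them to per-run best fitness values of the DE variants, with synthetic placeholder data standing in for the actual results.

```python
# Sketch: Kruskal-Wallis and Friedman tests over per-run results of several
# variants (data here are synthetic placeholders).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
variants = ["DE", "DE-S", "DE-H", "DE-KN", "DE-WE", "DE-TO"]
# one array of 30 run results per variant (synthetic stand-ins)
results = [rng.normal(loc=mu, scale=1.0, size=30)
           for mu in (2.0, 1.8, 1.7, 1.6, 1.6, 1.4)]

h_stat, p_kw = stats.kruskal(*results)              # Kruskal-Wallis H test
chi2, p_fr = stats.friedmanchisquare(*results)      # Friedman test on the paired runs
print(f"Kruskal-Wallis p = {p_kw:.4f}, Friedman p = {p_fr:.4f}")
```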

Discussion on BA Results
The initialization technique plays a vital role in evolutionary and swarm-based stochastic algorithms. The traditional BA is not good at global search; therefore, the performance of BA can be increased by assigning robust initial fitness to the particles, which may enhance the diversity of the swarm. In this work, the primary concern is to reach the optimal solution, which is 0 in the ideal case. We investigated different distribution approaches, namely Knuth, WELL, Torus, Sobol, Halton, and random, to initialize BA and ensure swarm diversity at the very initial stage of the process. It is observed from Table 8 that Knuth distribution-based BA initialization gives better results than the other quasi-random sequences. To validate the numerical results, the mean ranks obtained by the Kruskal-Wallis and Friedman tests for BA-KN, BA-WE, BA-TO, BA-SO, BA-HA, and standard BA are given in Table 9.
For further verification of the performance of the proposed algorithms TO-PSO, WE-PSO, and KN-PSO, a comparative study on real-world benchmark dataset problems is carried out for the training of a neural network. We performed experiments using seven benchmark datasets (Diabetes, Heart, Wine, Seed, Vertebral, Blood Tissue, and Mammography) taken from the well-known UCI machine-learning repository. The training weights are initialized within the interval [−50, 50]. The accuracy of the feed-forward neural network is evaluated in terms of root mean squared error (RMSE). Table 10 shows the characteristics of the datasets used.

Discussion
The multi-layer feed forward neural network is trained with the back propagation algorithm, standard PSO, SO-PSO, H-PSO, and proposed TO-PSO, KN-PSO, and WE-PSO.
A comparison of these training approaches is carried out on real classification problem datasets taken from the UCI repository. The cross-validation method is used to compare the performances of the different classification techniques. In this paper, k-fold cross validation with k = 10 is used to compare the classification performance of training the neural network with back propagation, standard PSO, SO-PSO, H-PSO, and the proposed TO-PSO, KN-PSO, and WE-PSO. The dataset is divided into 10 chunks, where each chunk contains the same proportion of each class; one chunk is used for testing while the remaining nine are used for training. The experimental results of the algorithms (back propagation, standard PSO, SO-PSO, H-PSO, and the proposed TO-PSO, KN-PSO, and WE-PSO) are compared with each other on seven well-known real datasets taken from UCI, and their performances are evaluated. In Table 11, the simulation results show that training the neural network with the KN-PSO algorithm outperforms the others in accuracy and is capable of providing better classification accuracy than the traditional approaches. The KN-PSO algorithm may also be used effectively for data classification and statistical problems in the future. Figure 55 represents the accuracy graph for the seven datasets.
The classification testing accuracies were imported from a Microsoft Excel spreadsheet into RStudio version 1.2.5001 to statistically confirm the winning approach among all the others. The testing accuracies of all seven variants of PSONN were analyzed by a one-way ANOVA test and a post-hoc Tukey multi-comparison test [60] at a 0.05 significance level. Table 12 depicts the results of the one-way ANOVA of the testing accuracy on the classification data. The significance value in Table 12 is 0.04639, which is less than 0.05, giving evidence that there is a significant difference among all variants of PSONN at a 95% confidence level; accordingly, the variants of PSONN are significantly distinct from each other. Figure 56 presents the graph of the one-way ANOVA results, which shows that KN-PSONN significantly outperforms all other variants of PSONN. Figure 57 presents the results of the multi-comparison of the PSONN variants through the post-hoc Tukey test. The resulting graph shows that the KN-PSONN variant is significantly different from all other variants; hence, KN-PSONN is proved statistically different from all other PSONN approaches at a 95% confidence level.
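The analysis described above (one-way ANOVA followed by post-hoc Tukey comparisons) was run in RStudio; for readers who prefer Python, a minimal equivalent sketch is given below, using synthetic placeholder accuracies and assuming SciPy and statsmodels are installed.

```python
# Sketch: one-way ANOVA and post-hoc Tukey HSD over per-fold test accuracies
# of several variants (accuracies here are synthetic placeholders).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
variants = ["PSO", "SO-PSO", "H-PSO", "TO-PSO", "WE-PSO", "KN-PSO"]
# 10-fold testing accuracy per variant (synthetic stand-ins)
acc = [rng.normal(loc=0.80 + 0.01 * i, scale=0.02, size=10) for i in range(len(variants))]

f_stat, p_value = stats.f_oneway(*acc)                 # one-way ANOVA
print(f"ANOVA p-value: {p_value:.4f}")

values = np.concatenate(acc)
labels = np.repeat(variants, 10)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))   # pair-wise Tukey comparisons
```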

NN Classifications with DE-Based Initialization Approaches
The proposed approaches DE-KN, DE-TO, and DE-WE, together with the family of low-discrepancy sequences, are extremely suitable for tackling global optimization problems. A comparative study on real-world benchmark dataset problems is carried out for the training of a neural network. We performed experiments using seven benchmark datasets (Diabetes, Heart, Wine, Seed, Vertebral, Blood Tissue, and Mammography) taken from the well-known UCI machine-learning repository. The training weights are initialized within the interval [−50, 50], and the accuracy of the feed-forward neural network is evaluated in terms of root mean squared error (RMSE). The characteristics of the datasets are shown in Table 10, including the total number of units used for each dataset, the number of input instances, the nature of the dataset, and the number of classes (i.e., binary-class or multi-class problem). The impact of increasing the number of target classes is independent, since the proposed strategy is purely concerned with weight optimization rather than feature selection or reducing high dimensionality. The 10-fold cross-validation method has been used for the training and testing process.
The experimental results of the algorithms (back propagation, standard DE, DE-S, DE-H, DE-WE, DE-TO, and DE-KN) are compared with each other on seven well-known real datasets taken from UCI, and their performances are evaluated. In Table 13, the simulation results show that training the neural network with the DE-H algorithm outperforms the others in accuracy and is capable of providing better classification accuracy than the traditional approaches. The DE-H algorithm may also be used effectively for data classification and statistical problems in the future. Figure 58 represents the accuracy graph for the seven datasets. To verify the experimental results statistically, the testing accuracies on the classification datasets were loaded into RStudio (version 1.2.5001). The classification results of the seven DE approaches were tested with the one-way ANOVA statistical test and the post-hoc Tukey pair-wise comparison test [60] at a significance level of 0.05. The findings of the one-way ANOVA on the classification data are given in Table 14, where the significance value of 0.02043 is less than the above-mentioned significance threshold. The findings in Table 14 show that there are significant dissimilarities among all variants of DE at a 95% confidence level. Figure 59 presents the one-way ANOVA graph, which gives evidence that DE-H is significantly better than the other DE approaches. Figure 60 presents the findings of the pair-wise comparisons of the DE approaches with the post-hoc Tukey statistical test; the resulting graph shows that the DE-H approach is statistically significantly different from the other DE approaches at a 95% confidence level.

NN Classifications with BA-Based Initialization Approaches
The multi-layer feed-forward neural network is trained with the back propagation algorithm, standard BA, BA-SO, BA-H, and the proposed BA-TO, BA-KN, and BA-WE. The comparison of these training approaches is carried out on real classification problem datasets taken from the UCI repository. The k-fold cross-validation method with k = 10 is used to compare the classification performance of training the neural network with back propagation, standard BA, BA-SO, BA-H, and the proposed BA-TO, BA-KN, and BA-WE. The dataset is divided into 10 chunks, where each chunk contains the same proportion of each class; one chunk is used for testing while the remaining nine are used for training. The experimental results of the algorithms (back propagation, standard BA, BA-SO, BA-H, and the proposed BA-TO, BA-KN, and BA-WE) are compared with each other on seven well-known real datasets taken from UCI, and their performances are evaluated. In Table 15, the simulation results show that training the neural network with the BA-KN algorithm outperforms the others in accuracy and is capable of providing better classification accuracy than the traditional approaches; the BA-KN algorithm may also be used effectively for data classification and statistical problems in the future. Figure 61 represents the accuracy graph for the seven datasets. To provide statistical evidence for the simulation results, the classification results of the seven BA initialization variants were examined with the one-way ANOVA test and the post-hoc Tukey test [60] for pair-wise comparisons at a 0.05 significance level. The outcomes of the one-way ANOVA test are presented in Table 16, which shows a significance value of 0.03623, less than 0.05. The outcomes in Table 16 reveal that there is significant divergence among all initialization variants of BA at a 95% confidence level. Figure 62 displays the one-way ANOVA graph, which gives proof that KN-BANN is significantly superior to the other initialization variants of BANN. Figure 63 displays the outcomes of the pair-wise comparisons of the BANN initialization variants using the post-hoc Tukey statistical test; the plotted graph shows that the KN-BANN initialization variant is statistically different from the other initialization variants of BANN at a 95% confidence level.

Conclusions
This paper introduces new initialization strategies based on the WELL, Knuth, and Torus sequences that are used to initialize the population in the PSO, BA, and DE algorithms. Using the low-discrepancy sequence family, the suggested methods are assessed on a robust suite of benchmark test functions and on artificial neural network training. The simulation results show that the use of the low-discrepancy sequence family preserves the swarm's diversity, increases the pace of convergence, and identifies a better swarm area. The suggested low-discrepancy sequence families provide wider diversity and improved local searchability. The experimental findings indicate that KN-PSO, BA-KN, and DE-H have excellent convergence precision and improved avoidance of local optima. The proposed methods are contrasted with a random distribution, with the family of low-discrepancy sequence approaches, and with the traditional PSO, BA, and DE algorithms, producing better performance. According to our analysis, it can be inferred that quasi-random sequences are substantially stronger and more feasible for all population-based algorithms. For future work, our goal is to address higher-dimensional problems and constrained optimization problems. Moreover, we have not improved other algorithm operators, such as mutation, in this study; the effect of such operators combined with low-discrepancy sequences would be fascinating to examine. Extending this research to other stochastic meta-heuristic algorithms establishes a further direction of our work.

Funding:
The manuscript APC is supported by Universiti Malaysia Sabah, Jalan UMS, 88400, KK, Malaysia. Furthermore, this work is partially funded by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under Project UIDB/50008/2020, and by the Brazilian National Council for Scientific and Technological Development (CNPq) via Grant No. 313036/2020-9.
Institutional Review Board Statement: Not applicable.