Introducing a Parallel Genetic Algorithm for Global Optimization Problems

The topic of efficiently finding the global minimum of multidimensional functions is widely applicable to numerous problems in the modern world. Many algorithms have been proposed to address these problems, among which genetic algorithms and their variants are particularly notable. Their popularity is due to their exceptional performance in solving optimization problems and their adaptability to various types of problems. However, genetic algorithms require significant computational resources and time, prompting the need for parallel techniques. Moving in this research direction, a new global optimization method is presented here that exploits the use of parallel computing techniques in genetic algorithms. This innovative method employs autonomous parallel computing units that periodically share the optimal solutions they discover. Increasing the number of computational threads, coupled with solution exchange techniques, can significantly reduce the number of calls to the objective function, thus saving computational power. Also, a stopping rule is proposed that takes advantage of the parallel computational environment. The proposed method was tested on a broad array of benchmark functions from the relevant literature and compared with other global optimization techniques regarding its efficiency.


Introduction
Typically, the task of locating the global minimum [1] of a function f : S → R, S ⊂ R^n, is defined as follows:

x* = arg min_{x∈S} f(x),   (1)

where the set S is defined as

S = [a_1, b_1] ⊗ [a_2, b_2] ⊗ ... ⊗ [a_n, b_n].

The values a_i and b_i are the left and right bounds, respectively, for the coordinate x_i. A systematic review of the optimization procedure can be found in the work of Fouskakis [2].
The previously defined problem has been tackled using a variety of methods, which have been successfully applied to a wide range of problems in various fields, such as medicine [3,4], chemistry [5,6], physics [7-9], economics [10,11], etc. Global optimization methods are divided into two main categories: deterministic and stochastic methods [12]. To the first category belong the interval methods [13,14], where the set S is iteratively divided into subregions, and those that do not contain the global solution are discarded based on predefined criteria. Many related works have been published in the area of deterministic methods, including the work of Maranas and Floudas, who proposed a deterministic method for chemical problems [15], the TRUST method [16], the method suggested by Evtushenko and Posypkin [17], etc. In the second category, the search for the global minimum is based on randomness. Stochastic optimization methods are commonly used because they can be programmed more easily and do not depend on any prior information about the objective problem. Some stochastic optimization methods that have been used by researchers include ant colony optimization [18,19], controlled random search [20-22], particle swarm optimization [23-25], simulated annealing [26-28], differential evolution [29,30], and genetic algorithms [31-33]. Finally, there is a plethora of research on metaheuristic algorithms [34-36], offering new perspectives and solutions to problems in various fields.
The current work proposes a series of modifications in order to effectively parallelize the widely adopted method of genetic algorithms for solving Equation (1). Genetic algorithms, initially proposed by John Holland, constitute a fundamental technique in the field of stochastic methods [37]. Inspired by biology, these algorithms simulate the principles of evolution, including genetic mutation, natural selection, and the exchange of genetic material [38]. The integration of genetic algorithms with machine learning has proven effective in addressing complex problems and validating models. This interaction is highlighted in applications such as the design and optimization of 5G networks, contributing to path loss estimation and improving performance in indoor environments [39]. It is also applied to optimizing the movement of digital robots [40] and conserving energy in industrial robots with two arms [41]. Additionally, genetic algorithms have been employed to find optimal operating conditions for motors [42], optimize the placement of electric vehicle charging stations [43], and manage energy [44], and they have applications in other fields such as medicine [45,46] and agriculture [47].
Although genetic algorithms have proven to be effective, the optimization process requires significant computational resources and time. This emphasizes the necessity of implementing parallel techniques, as the execution of the algorithms is significantly accelerated by the combined use of multiple computational resources. Modern parallel programming techniques include the message-passing interface (MPI) [48] and the OpenMP library [49]. Parallel programming techniques have also been incorporated into global optimization in various cases, such as the combination of simulated annealing and parallel techniques [50], the use of parallel methods in particle swarm optimization [51], the incorporation of radial basis functions in parallel stochastic optimization [52], etc. One of the main advantages of genetic algorithms over other global optimization techniques is that they can be easily parallelized and can exploit modern computing units as well as the previously mentioned parallel programming techniques.
In the relevant literature, two major categories of parallel genetic algorithms appear, namely, island genetic algorithms and cellular genetic algorithms [53]. The island model is a parallel genetic algorithm (PGA) that manages several subpopulations on separate islands and executes the genetic algorithm process on each island simultaneously for a different set of solutions. Island models have been utilized in various cases, such as molecular sequence alignment [54], the quadratic assignment problem [55], the placement of sensors/actuators in large structures [56], etc. Also, recently, Tsoulos et al. proposed an implementation of an island PGA [57]. In the parallel cellular model of genetic algorithms, solutions are organized into a grid, and various operators, such as crossover and mutation, are applied to neighboring regions within the grid. For each solution, an offspring is created, replacing it within its neighborhood. The model is flexible regarding the structure of the grid, the neighborhood strategies, and the settings. Implementations may involve multiple processors or graphics processing units, with information exchange possible through physical communication networks. The theory of parallel genetic algorithms has been thoroughly presented by a number of researchers in the literature [58,59]. Also, parallel genetic algorithms have been incorporated into combinatorial optimization [60].
The proposed method is based on the island technique and suggests a number of improvements to the general scheme of parallel genetic algorithms. Among these improvements is a series of techniques for propagating optimal solutions among islands, which aim to speed up the convergence of the overall algorithm. In addition, the individual islands of the genetic algorithm periodically apply a local minimization technique with two goals: to discover the most accurate local minima of the objective function and to speed up the convergence of the overall algorithm without wasting computing power on previously discovered function values. Furthermore, an efficient termination rule based on asymptotic considerations, which has been validated across a series of global optimization methods, is also incorporated into the current algorithm. The proposed method was applied to a series of problems appearing in the relevant literature. The experimental results indicate that the new method can effectively find the global minimum of the functions in a large percentage of cases and that the above modifications significantly accelerated the discovery of the global minimum as the number of individual islands in the genetic algorithm increased.
The remainder of the article follows this structure: In Section 2, the genetic algorithm is analyzed, and the parallelization, the dissemination techniques (propagation techniques (PT) or migration methodologies), and the termination criteria are discussed. Subsequently, in Section 3, the test functions used are presented in detail, along with the experimental results. Finally, in Section 4, some conclusions are outlined, and future explorations are formulated.

Method Description
This section begins with a detailed description of the base genetic algorithm and continues with the details of the suggested modifications.

The Genetic Algorithm
Genetic algorithms are inspired by natural selection and the process of evolution. In their basic form, they start with an initial population of chromosomes, representing possible solutions to a specific problem. Each element of a chromosome is a "gene", and the length of a chromosome is equal to the dimension of the problem. The algorithm processes these solutions through iterative steps, replicating and evolving the population. In each generation, the selected solutions are crossed and mutated to improve their fit to the problem. As generations progress, the population converges toward solutions with improved fitness. Important factors affecting genetic algorithm performance include the population size, the selection rate, the crossover and mutation probabilities, and the strategic replacement of solutions. The choice of these parameters affects the ability of the algorithm to explore the solution space and converge to the optimal result. Subsequently, the operation of the genetic algorithm is presented step by step, through the replication and advancement of solution populations [61,62]. The steps of a typical genetic algorithm are shown in Algorithm 1.

Parallelization of Genetic Algorithm and Propagation Techniques
In the parallel island model of Figure 1, an evolving population is divided into several "islands", each working concurrently to optimize a specific set of solutions. In this figure, each island implements a separate genetic algorithm, as described in Section 2.1. The steps of the overall algorithm are presented in Algorithm 2. In contrast to classical parallelization, which handles a central population, the island model features decentralized populations evolving independently. Each island exchanges information with the others at specific points in evolution through migration, where solutions move from one island to another, influencing the overall convergence toward the optimal solution. Migration settings determine how often migrations occur and which solutions are selected for exchange. Each island can follow a similar search strategy, but for more variety or faster convergence, different approaches can be employed. Islands may have identical or diverse strategies, providing flexibility and efficiency in exploring the solution space. To implement this parallel model, each island is connected to a computational resource. For instance, as depicted in Figure 2, the execution of the parallel island model involves five islands, each managing a distinct set of solutions using five processor units (PUs). During the migration process, information related to solutions is exchanged among the PUs. Figure 2 also depicts the four different techniques for spreading the chromosomes with the best functional values. In Figure 2a, we observe the migration of the best chromosomes from one island to another (randomly chosen). In Figure 2b, migration occurs from a randomly chosen island to all others. In Figure 2c, it occurs from all islands to a randomly chosen one, and finally, in Figure 2d, migration occurs from each island to all others.

Algorithm 1
The steps of the genetic algorithm.
(a) For every chromosome g_i, i = 1, ..., N_c, calculate its fitness value.
3. Selection step. The chromosomes are sorted with respect to their fitness values. Denote as N_b the integer part of (1 − p_s) × N_c; the N_b chromosomes with the lowest fitness values are copied to the next generation. The rest of the chromosomes are replaced by offspring created in the crossover procedure. Each offspring is created from two chromosomes (parents) of the population through tournament selection, which proceeds as follows: a set of N_t > 1 randomly selected chromosomes is formed, and the individual with the lowest fitness value in this set is selected as a parent.
4. Crossover step. Two selected solutions (parents) are combined to create new solutions (offspring). During crossover, genes are exchanged between the parents, introducing diversity. For each selected pair of parents (z, w), two additional chromosomes, z̃ and w̃, are generated through the following equations:

z̃_i = a_i z_i + (1 − a_i) w_i
w̃_i = a_i w_i + (1 − a_i) z_i

where i = 1, ..., n, and the values a_i are uniformly distributed random numbers.
5. Replacement step.
(a) For every chromosome g_i that is not copied to the next generation: replace g_i using the next offspring created in the crossover procedure.
(b) End For
6. Mutation step. Some genes in the offspring are randomly modified with probability p_m. This introduces more diversity into the population and helps identify new solutions.
7. Termination check step. If the termination criterion defined in the work of Tsoulos [64], which is outlined in Section 2.3, is met, or k > N_g, then go to the local search step; otherwise, go to step 2a.
8. Local search step. To improve the success in finding better solutions, a local optimization procedure is applied. In the present study, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) variant proposed by Powell [65] is employed as the local search procedure. It is applied to the chromosome in the population with the lowest fitness value.

Algorithm 2
The overall algorithm.
1. Set N_I as the total number of parallel processing units.
2. Set N_R as the number of generations after which each processing unit sends its best chromosomes to the remaining processing units.
3. Set N_P as the number of migrated chromosomes between the parallel processing units.
4. Set PT as the propagation technique.
5. Set k = 0 as the generation number.
6. For j = 1, ..., N_I, perform in parallel:
(a) Execute a generation of the GA algorithm described in Algorithm 1 on processing unit j.
(b) If k mod N_R = 0, then
 i. Obtain the best N_P chromosomes from processing unit j.
 ii. Propagate these N_P chromosomes to the rest of the processing units using the propagation scheme PT, described subsequently.
(c) End If
7. Set k = k + 1.
8. Check the proposed termination rule. If the termination rule is valid, then go to step 9; otherwise, go to step 6.
9. Terminate and report the best value from all processing units. Apply a local search procedure to this value to enhance the located global minimum.
The migration or propagation techniques, as described in this study, are performed periodically and synchronously, every N_R generations, on each processing unit. The migration techniques are defined as follows:
• 1to1: Optimal solutions migrate from a random island to another random one, replacing the worst solutions (see Figure 2a).
• 1toN: Optimal solutions migrate from a random island to all others, replacing the worst solutions (see Figure 2b).
• Nto1: All islands send their optimal solutions to a random island, replacing the worst solutions (see Figure 2c).
• NtoN: All islands send their optimal solutions to all other islands, replacing the worst solutions (see Figure 2d).
If, for example, the migration method "1toN" is executed, a random island transfers chromosomes to all other islands, except for itself. However, the label "N" is kept instead of "N − 1" because the migrated chromosomes also remain on the island that sends them. The number of solutions participating in the migration and replacement process is fully customizable and is discussed in the experiments below.

Termination Rule
The termination criterion employed in this study was originally introduced in the research conducted by Tsoulos [64], and it is formulated as follows:
• In each generation k, the chromosome g* with the best functional value f(g*) is retrieved from the population. If this value does not change for a number of generations, then the algorithm should probably terminate.
• Consider σ^(k) as the associated variance of the quantity f(g*) at generation k. The algorithm terminates when

σ^(k) ≤ σ^(k_last) / 2,

where k_last is the last generation in which a lower value of f(g*) was discovered.

Experiments
A series of benchmark functions from the relevant literature is introduced here, along with the conducted experiments and a discussion of the experimental results.

Test Functions
To assess the effectiveness of the proposed method in locating the global minimum of functions, a set of well-known test functions cited in the relevant literature [66,67] was employed. The functions used here are as follows:
• The Bent cigar function is defined as follows:

f(x) = x_1^2 + 10^6 Σ_{i=2}^{n} x_i^2,

with the global minimum f(x*) = 0. For the conducted experiments, the value n = 10 was used.

• The Bf1 function (Bohachevsky 1) is defined as follows:

f(x) = x_1^2 + 2x_2^2 − (3/10) cos(3πx_1) − (4/10) cos(4πx_2) + 7/10.

• The Bf2 function (Bohachevsky 2) is defined as follows:

f(x) = x_1^2 + 2x_2^2 − (3/10) cos(3πx_1) cos(4πx_2) + 3/10.

• The Branin function is given by

f(x) = (x_2 − (5.1/(4π^2)) x_1^2 + (5/π) x_1 − 6)^2 + 10 (1 − 1/(8π)) cos(x_1) + 10,

with −5 ≤ x_1 ≤ 10, 0 ≤ x_2 ≤ 15.
• The CM function. The cosine mixture function is given by the following:

f(x) = Σ_{i=1}^{n} x_i^2 − (1/10) Σ_{i=1}^{n} cos(5πx_i).

The value n = 4 was used in the conducted experiments.
• Discus function. The function is defined as follows:

f(x) = 10^6 x_1^2 + Σ_{i=2}^{n} x_i^2,

with global minimum f(x*) = 0. For the conducted experiments, the value n = 10 was used.

• The Easom function. The function is given by the following equation:

f(x) = −cos(x_1) cos(x_2) exp(−(x_1 − π)^2 − (x_2 − π)^2).

• The exponential function. The function is given by the following:

f(x) = −exp(−0.5 Σ_{i=1}^{n} x_i^2), −1 ≤ x_i ≤ 1.

The global minimum is situated at x* = (0, 0, ..., 0), with a value of −1. In our experiments, we applied this function for n = 4, 16, 64, and 100 and refer to the respective instances as EXP4, EXP16, EXP64, and EXP100.
• Griewank2 function. The function is given by the following:

f(x) = 1 + (1/200) Σ_{i=1}^{2} x_i^2 − Π_{i=1}^{2} cos(x_i / √i).

The global minimum is located at x* = (0, 0, ..., 0) with a value of 0.
• Gkls function. f(x) = Gkls(x, n, w) is a function with w local minima, described in [68], with x ∈ [−1, 1]^n, where n is a positive integer between 2 and 100. The value of the global minimum is −1, and in our experiments, we used n = 2, 3 and w = 50, 100.
• The high-conditioned elliptic function is defined as follows:

f(x) = Σ_{i=1}^{n} (10^6)^{(i−1)/(n−1)} x_i^2,

featuring a global minimum at f(x*) = 0. The experiments were conducted using the value n = 10.

• Potential function. As a test case, the molecular conformation corresponding to the global minimum of the energy of N atoms interacting via the Lennard-Jones potential [69] is utilized. The pairwise potential to be minimized is defined as follows:

V_LJ(r) = 4ε [ (σ/r)^12 − (σ/r)^6 ].

In the current experiments, two different cases were studied: N = 3, 5.
• Rastrigin function. This function is given by the following:

f(x) = x_1^2 + x_2^2 − cos(18x_1) − cos(18x_2).

• Sinusoidal function. The function is given by the following:

f(x) = −(2.5 Π_{i=1}^{n} sin(x_i − z) + Π_{i=1}^{n} sin(5(x_i − z))), 0 ≤ x_i ≤ π.

The global minimum is situated at x* = (2.09435, 2.09435, ..., 2.09435) with a value of f(x*) = −3.5. In the performed experiments, we examined scenarios with n = 4, 8 and z = π/6. The parameter z is employed to offset the position of the global minimum [70].

• Test2N function. This function is given by the following equation:

f(x) = (1/2) Σ_{i=1}^{n} (x_i^4 − 16x_i^2 + 5x_i), x_i ∈ [−5, 5].

The function has 2^n local minima in the specified range; in our experiments, we used n = 4, 5, 6, 7, 8, 9.
• Test30N function. This function is given by the following:

f(x) = (1/10) sin^2(3πx_1) Σ_{i=2}^{n−1} ((x_i − 1)^2 (1 + sin^2(3πx_{i+1}))) + (x_n − 1)^2 (1 + sin^2(2πx_n)), x ∈ [−10, 10]^n.

This function has 30^n local minima in the specified range, and we used n = 3, 4 in the conducted experiments.

Experimental Results
To evaluate the performance of the parallel genetic algorithm, a series of experiments was carried out. These experiments varied the number of parallel computing units from 1 to 10. The parallelization was achieved using the freely available OpenMP library [49], and the method was implemented in ANSI C++ within the OPTIMUS optimization package, accessible at https://github.com/itsoulos/OPTIMUS (accessed on 7 June 2024). All experiments were conducted on a system equipped with an AMD Ryzen 5950X processor and 128 GB of RAM, running the Debian Linux operating system. The experimental settings are shown in Table 1. To ensure the reliability and validity of the research, each experiment was conducted 30 times; the results are reported in Tables 2-4.
In Table 2, the number of objective function invocations for each problem and its solving time for various combinations of processing units (PUs) and chromosomes are provided. In the columns listing objective function invocation values, values in parentheses represent the percentage of executions in which the overall optimum was successfully identified. The absence of this fraction indicates a 100% success rate, meaning that the global minimum was found in every run. The total number of chromosomes remains constant in each case, e.g., 1PU × 500 chrom, 2PU × 250 chrom, etc. Generally, across all problems, there is a decrease in the number of objective function invocations and in the execution time as the number of parallel computing units increases. This is a positive result, indicating that parallelization improves the performance of the genetic algorithm.
Figures 3 and 4 are derived from Table 2. A statistical comparison of objective function invocations and solving times likewise shows performance improvements and computation time reductions as the number of computing units increases. Specifically, in Figure 3, the objective function invocations are halved compared to the initial invocations with only two computational units, and this reduction continues significantly as the number of computational units increases. In Figure 4, we observe similar behavior in the algorithm termination times: the times are significantly shorter in the parallel process with ten (10) computational units compared to a single computational unit. In the comparisons presented above, there is a reduction in the required computational power, as shown in Figure 3, along with a decrease in the time required to find solutions, as depicted in Figure 4. Table 2 also presents additional details regarding objective function invocations and computational times, such as minima, maxima, means, and standard deviations. In conclusion, as the workload is distributed among an increasing number of computational units, performance improves, which reinforces the overall methodology.
In Table 3, chromosome migration with the best functional values occurs in every generation, with a fixed number of ten chromosomes (N_P = 10) participating in the propagation process. To enhance the implementation of the propagation techniques, the local search rate (LSR) applied in Table 3 was increased from 0.1% (as presented in Table 2) to 0.5%. However, the level of local optimization was carefully controlled, because an excessive increase could lead to a higher number of calls to the objective function, while reducing the LSR might decrease the success rate in identifying optimal chromosomes. In the statistical representation of Figure 5, we observe the superiority of the '1toN' propagation, meaning the transfer of ten chromosomes from a random island to all others; the 'NtoN' propagation appears to be equally effective. As a general rule, if the migration methods are ranked by performance, the order is as follows: '1toN' (Figure 2b), 'NtoN' (Figure 2d), '1to1' (Figure 2a), and 'Nto1' (Figure 2c). The first two strategies, where migration reaches all islands, demonstrate better performance compared to the other two, where migration only affects one island. The success of '1toN' and 'NtoN', albeit with a slight difference between them, appears to be due to the migration of the best chromosomes to all islands, which improves the convergence of the algorithm toward better candidate solutions in a shorter time frame. The actual times are shown in Figure 6. A common feature of these two techniques is that the optimal solutions are distributed to all computing units, thereby improving the performance of each individual unit and consequently enhancing the overall performance of the algorithm.
To compare the proposed method against other stochastic global optimization methods, including particle swarm optimization (PSO), improved PSO (IPSO) [71], differential evolution with random selection (DE), differential evolution with tournament selection (TDE) [72], a genetic algorithm (GA), and the parallel genetic algorithm (PGA), certain parameters remained constant. Also, the parallel implementation of the GAlib library [73] was used in the comparative experiments. The population size for all methods consists of 500 particles, agents, or chromosomes. In PGA, the population consists of 20PU × 25 chrom, while all other parameters remain the same as those described in Table 2. Any method employing LSR maintains this parameter at the same value. The double-box termination rule is consistent across all methods.
The values resulting from the experiments in Table 4 are depicted in Figures 7 and 8. The box plots of Figure 7 reveal the superiority of PGA, as the number of objective function calls remains at approximately 10,000 across all problems. Conversely, IPSO, DE, and TDE (especially DE) show a low number of calls in some problems, while in others, they experience significant increases. Each method has a specific lower limit of calls during initialization and optimization, which varies from method to method; PGA consistently meets this threshold with very small deviations, as illustrated in the same figure. Figure 8 presents the total call values for each method. This work was also compared against the parallel version of GAlib, found in the recent literature. Although GAlib achieves a similar success rate in discovering the global minimum of the benchmark functions, it requires significantly more function calls than the proposed method for the same setup parameters. In Figure 9, it is observed that the collaboration of processing units significantly accelerates the process of finding minima. Additionally, a new experiment was conducted in which the number of chromosomes varied from 250 to 1000 and the number of processing units changed from 1 to 10. The total number of function calls for each case is graphically shown in Figure 10. The method maintains the same behavior for any number of chromosomes: the number of required calls is significantly reduced by adding new parallel processing units. Of course, as expected, the total number of calls increases as the number of available chromosomes increases.

Conclusions
According to the relevant literature, despite the high success rate that genetic algorithms exhibit in finding good functional values, they require significant computational power, leading to long processing times. This manuscript introduces a parallel technique for global optimization that employs a genetic algorithm to solve the problem. Specifically, the initial population of chromosomes is divided into several subpopulations that run on different computational units. During the optimization process, the islands operate independently but periodically exchange chromosomes with good functional values. The number of chromosomes participating in migration is controlled by the parameter N_P of the method. Additionally, periodic local optimization is performed on each computational unit, tuned so as not to require excessive computational power (function calls).
Experimental results revealed that even parallelization with just two computational units significantly reduces both the number of function calls and the processing time, and the method remains effective as more computational units are added. Furthermore, the most effective information exchange technique was observed to be the so-called '1toN', in which a randomly selected subpopulation sends information to all other subpopulations. The 'NtoN' technique, in which all subpopulations send information to all other subpopulations, performs almost equally well, with only a slight difference between the two.
Similar dissemination techniques have been applied to other stochastic methods, such as the differential evolution (DE) method by Charilogis and Tsoulos [74] and the particle swarm optimization (PSO) method by Charilogis and Tsoulos [75]. In the case of differential evolution, the recommended dissemination technique is '1to1' (Figure 2a) rather than the '1toN' (Figure 2b) suggested in this study; however, in the cases of PSO and GA, the recommended dissemination technique is the same.
The parallelization of other variants of genetic algorithms, or even of different stochastic techniques for global optimization, can be explored to further enhance the methodology. However, in such heterogeneous environments, more efficient termination criteria, or even a combination of criteria, may be required.
Author Contributions: I.G.T. conceptualized the idea and methodology, supervised the technical aspects related to the software, and contributed to manuscript preparation. V.C. conducted the experiments using various datasets, performed statistical analysis, and collaborated with all authors in manuscript preparation. All authors have reviewed and endorsed the conclusive version of the manuscript.

1. Initialization step.
(a) Set N_c as the number of chromosomes.
(b) Set N_g as the maximum number of allowed generations.
(c) Initialize randomly N_c chromosomes in S. Each chromosome denotes a potential solution to the problem of Equation (1).
(d) Set p_s as the selection rate of the algorithm, with p_s ≤ 1.
(e) Set p_m as the mutation rate, with p_m ≤ 1.
(f) Set k = 0 as the generation counter.
2. Fitness calculation step.

The global minimum of the function is −176.541793.
• Hartman 3 function. The function is given by the following:


Figure 3 .
Figure 3. Statistical comparison of function calls with different numbers of processor units.

Figure 4 .
Figure 4. Statistical comparison of times with different numbers of processor units.

Figure 5 .
Figure 5. Statistical comparison of function calls with 5 PUs and different propagation techniques.

Figure 6 .
Figure 6. Comparison of times with 5 PUs and different propagation techniques.

Figure 7 .
Figure 7. Statistical comparison of function calls using different stochastic optimization methods.

Figure 8 .
Figure 8. Comparison of total function calls using different stochastic optimization methods.

Figure 9 .
Figure 9. Different variations of the ELP problem.

Figure 10 .
Figure 10. Comparison of function calls with different numbers of chromosomes.

Table 1 .
The following settings were initially used to conduct the experiments.

Table 2 .
Statistical analysis comparing execution times (seconds) and function calls across varying numbers of processor units.

Table 3 .
Evaluating function calls and times (seconds) using various propagation techniques for comparison.

Table 4 .
Comparison of function calls using different stochastic optimization methods.