Improved Method for Parallelization of Evolutionary Metaheuristics

This paper introduces a method for distributing any population-based metaheuristic. It improves on the naive approach, independent multiple runs, while adding negligible overhead. Existing methods that coordinate instances across a cluster typically trade off design complexity, communication load, and solution propagation rate, requiring more work to develop and more resources to run. The aim of the new method is not to achieve state-of-the-art results, but to provide a better baseline than multiple independent runs. The main concept of the method is that one of the instances receives updates with the current best solution of all other instances. This work describes the general approach and its particularization to both genetic algorithms and ant colony optimization for solving Traveling Salesman Problems (TSPs). It also includes extensive tests on the TSPLIB benchmark problems, measuring both the quality of the resulting solutions and anytime performance (solution quality versus the time to reach it). These tests show that the new method yields better solutions for about two thirds of the problems and equivalent solutions for the remaining third, and consistently exhibits better anytime performance.


Introduction
Metaheuristics encompass a wide range of algorithms for optimization. Originally the term applied to heuristics or strategies that coordinate lower-level search procedures; in time the concept extended to include methods that employ ways of escaping local optima in complex search spaces, typically several neighbourhood strategies or some degree of randomness. While these algorithms cannot compete with exact optimization methods where those are applicable, such as solving linear programs, they excel at attaining good enough solutions for problems that do not match the expressive capabilities of exact methods or that span a search space too large to be exhaustively explored by them [1].
Given their diversity, there are many different classifications of metaheuristics according to different features. The classification most relevant to this work divides them into two classes: population-based methods, which maintain and evolve a set of candidate solutions, and single-solution (trajectory) methods. This work interprets metaheuristics as a form of stochastic predictor of the optimal solution: they take a specific instance and calculate, through a combination of deterministic and random operations, a value that tries to approximate the solution to the problem.
Thus, running several instances of a metaheuristic is equivalent to using a group of stochastic predictors. Even though in the case of metaheuristics it is possible to know which of the instances yields the best solution by comparing their fitness, some ideas can still be drawn from the statistician's toolbox to extract more value from running several instances.
The group of methods employed in statistics and machine learning collectively known as ensemble methods, or ensemble learning, deals with exactly this scenario. All of them combine multiple models or predictors to obtain a better result than is possible with any one of the models individually [3].
The overarching theme in all ensemble methods is that improvement can be obtained from the aggregation of multiple (weak) models to build a better one.
The same concept is present in the very procedure of many metaheuristics: the partial reuse of solutions in Genetic Algorithms and the preference for frequently traveled edges in Ant Colony Optimization, to name a few.
Path Relinking showcases this behaviour better than any other metaheuristic: its concept itself is exactly to combine good solutions to find better ones. Starting from a pool of solutions (generated randomly and/or using some heuristic), it selects the best ones and builds paths between pairs of them. Each step consists of a modification of the current point as in a local search, but always moving towards the other solution (guiding solution). As new solutions are generated, they may become new path sources or guiding solutions.
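The path-building step just described can be sketched for permutation-encoded solutions. The function and names below are hypothetical, a minimal sketch rather than any specific Path Relinking implementation; each step swaps one element into agreement with the guiding solution:

```python
def path_relink(source, guide, cost):
    """Walk from `source` towards `guide` (both city permutations),
    repairing one position per step, and return the best tour seen.

    Illustrative sketch; `cost` maps a tour to its objective value."""
    current = list(source)
    best, best_cost = list(current), cost(current)
    for i in range(len(current)):
        if current[i] == guide[i]:
            continue  # this position already matches the guiding solution
        # Move towards the guide: bring guide[i] to position i via a swap,
        # the same kind of elementary move a local search would use.
        j = current.index(guide[i])
        current[i], current[j] = current[j], current[i]
        c = cost(current)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost
```

After the loop, `current` equals the guiding solution, so every intermediate point of the path (including both endpoints) has been evaluated.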
Taking all of this into account, this paper proposes a method that combines multiple instances of any population-based metaheuristic to improve its efficiency, while introducing as little additional overhead as possible. For easier reading, the method is referred to as the Multiverse method in the remainder of the paper, as opposed to multiple independent runs, which is called Multistart. The metaphor is that each instance is a universe of its own, but within this method they all form a single entity: hence the Multiverse.
In the Multiverse method, one of the multiple instances has special status: the collector. It receives updates of the best solutions of the other instances. This is straightforward, as all population-based metaheuristics already step through iterations (or generations), combine the solutions in their populations to create new, improved ones, and possess mechanisms to work with multiple solutions (the population). All other instances contribute their current best solution at each iteration to the population of the collector; the same process that takes place normally, applied to this extended population, is responsible for the mixing of solutions.
The added overhead is small: injection of external solutions into the population, and one-way communication of a single solution from each instance. Furthermore, this communication scheme fits a star-like topology, such as the one provided by a standard switch, rather than the more complex and costly mesh favoured by other configurations, such as the Islands model, and more typical of super-computers than of clusters built from commodity-grade computers.
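In outline, this one-way exchange can be pictured with a toy, single-process sketch. The Instance class and helper function below are illustrative stand-ins, not the paper's actual code:

```python
class Instance:
    """Stand-in for one metaheuristic instance holding a population of
    (cost, solution) pairs; lower cost is better."""
    def __init__(self, population):
        self.population = population

    def best(self):
        return min(self.population)  # lowest-cost individual


def multiverse_exchange(instances, collector_index=0):
    """Star topology: each non-collector instance contributes its current
    best solution, and the collector merges them into its own population.
    Communication is one-way and carries a single solution per instance."""
    collector = instances[collector_index]
    for k, inst in enumerate(instances):
        if k != collector_index:
            collector.population.append(inst.best())
```

In a real deployment each `Instance` runs on its own node and the append is replaced by a message to the controller, which relays it to the collector; the workers themselves are never modified.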
John Holland introduced the concept of Genetic Algorithms (GAs) in [4]. The field has since expanded well beyond the original idea, into a more general class of Evolutionary Algorithms that share the underlying idea of simulating Darwinian evolution and natural selection through mutation and recombination of individuals. Reeves [5] provides a comprehensive review of the history and application of Genetic Algorithms.
Ant Colony Optimization (ACO) actually encompasses a number of similar algorithms for solving discrete optimization problems, inspired by the behaviour of real ant colonies in nature. The first ACO system, known as Ant System (AS), was introduced in [6]. Ant Colony Optimization as such is described in [7]. In time, additional variants have surfaced, such as Ant Colony System (ACS) and MAX-MIN Ant System (MMAS). A broad and detailed overview of these developments is available in [8].
Intensive research exists on the application of GPUs to accelerate metaheuristics in general, and GAs or ACO in particular. A couple of examples suffice to show that this approach lacks the generality that this work aims at. The PhD thesis [9] addresses metaheuristics in general and describes in depth the complications of GPU use: multiple memory spaces to manage, CPU-GPU communication, partitioning the code into sequential and parallel chunks, etc. More specifically for GAs, Krömmer et al. [10] implement GPU-accelerated Genetic Algorithm and Differential Evolution methods for task scheduling by offloading problem-specific computation to the GPU. For ACO specifically, Delévacq et al. [11] review GPU implementations, but even the more general among them still apply only to a specific algorithm of the ACO family for a specific problem.
In the case of multiprocessing, the main approach is to partition the population, so that each processor evolves a subpopulation. When there is no communication among processors, this is equivalent to the general Multistart method described above. This is one of three possible migration strategies, as introduced in [12]; when communication is allowed, the best solutions from each subpopulation are transferred to other subpopulations, which is called migration. The other two migration strategies are allowing communication among all nodes, which incurs severe overhead, and allowing each processor communication only with a subset of all the processors, typically exploiting neighbourhood in the network topology; this corresponds to the Islands model mentioned before, and its main drawback is the limited propagation rate of good solutions.
This type of parallelization strategy improves on the sequential approach in the following aspects [13]:

• The selection of individuals is local to the subpopulation, which requires less computation compared to selecting from the whole population.

• Each subpopulation can progress asynchronously, reducing the synchronization overhead.

• The algorithm is more robust, as the performance of each processor is independent from the others.
See [14] for further references and examples of applications of parallel Genetic Algorithms following this scheme.
Others approach the parallel strategies for metaheuristics in general. For instance, Salto [15] reviews multiple parallel metaheuristics applied to cutting, packing, and related problems, finding two main classes: master-slave, where the master manages the population and the slaves perform operations on the individuals or subpopulation assigned to them, such as fitness evaluation; and structured population, a generalization of the use of independent subpopulations with varying degrees of communication.
Likewise, Crainic [16] introduces a similar classification into three types, two of which match the ones described above (parallelization of low level computation within an iteration, and multiple subpopulations with varying degrees of granularity and communication); the third type involves partitioning the solution space across the components of the solution variables, with each processor working on a given subset, assuming all other components constant. This usually involves multiple iterations for a suitable exploration of the solution space, and is reminiscent of co-evolution methods. Similarly, Pedemonte et al. [17] present an ACO-specific classification of parallelization strategies, which can be easily mapped to the categories above.
For the evaluation and comparison of parallel metaheuristics, Alba [18] analyzes meaningful metrics and common pitfalls. The design of algorithm assessment methods in this work draws from this analysis to ensure that the results are relevant. In particular, the comparison is based on a predefined effort metric in order to compare solution quality, and it applies a statistical test on the distributions of solutions rather than comparing an aggregated value, such as the mean or the median. Sections 3 and 4 detail the evaluation mechanism.
For the rest of this paper, Section 2 details how to build Multiverse-enabled versions of a GA and ACO for solving the Traveling Salesman Problem (TSP) as a practical demonstration of the method. Both the GA and the ACO are used for experimentally evaluating its performance as compared to Multistart in Section 3. Section 4 provides details on the methodology followed. Finally, Section 5 draws the main conclusions of the work and presents the future work.

General Method and Application to a Genetic Algorithm for TSP
Population-based metaheuristics follow a common procedure:
1. Create an initial population
2. Evaluate the population
3. Select the individuals that contribute to the next generation
4. Generate the new population by combining and/or modifying the selected individuals
5. Stop or go back to step 2, according to a stopping criterion: a number of iterations, a given fitness level, some number of generations without improvement, etc.
Different algorithms define alternative strategies for each of the steps in the procedure, and may in implementation blur the distinctions between them. GA and ACO are examples of the specialization of the general procedure. They can be considered the archetypes for two broad classes of population based metaheuristics: those which build solutions atomically from previous individuals (like GAs) and those that construct new solutions by combining elements or components of previous solutions one at a time (like ACOs). As such archetypes, they are the obvious choice to test the generality of the Multiverse method.
The Multiverse method injects a new step just before closing the loop. In this step, every instance of the metaheuristic sends its current best solution to the controlling node, which relays them to the collector instance; the collector adds them to its own population. Due to the very design of this type of metaheuristic, everything else goes on as usual, giving the new procedure:
1. Create an initial population
2. Evaluate the population
3. Select the individuals that contribute to the next generation
4. Generate the new population by combining and/or modifying the selected individuals
5. All but the collector: send the best solution so far. Collector: receive and add the individuals to the population
6. Stop or go back to step 2 (according to a stopping criterion)
Since such a generic algorithm is not amenable to testing, this work develops two specific instances of it, a GA and an ACO, both solving a TSP, and compares their performance to that of a corresponding Multistart variant with the same codebase, except for the communication of solutions to the collector. As the aim of this paper is to assess the improvements brought by the Multiverse approach, the base algorithms are far from state-of-the-art. Instead, the focus is on creating a level playing field for both versions, with as little influence from external factors as possible. For this reason, the GA relies on relatively simple crossover and mutation operators among those available in the literature, the ACO uses the standard parametrization, and neither of them performs a local search step. Similarly, there is no comparison between the results and the best known solutions in the literature; doing so would require a much greater effort in the design and programming of the algorithms, but is irrelevant to the objective.
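The six-step procedure can be exercised end to end on a toy problem. The sketch below is not the paper's code: it minimizes the number of ones in a bitstring with a mutate-and-select loop per instance, with instance 0 acting as the collector, and all names and parameter values are illustrative:

```python
import random

def run_multiverse(n_instances=4, n_bits=20, pop_size=10, iters=50, seed=1):
    """Toy Multiverse run on a bitstring-minimization problem."""
    rng = random.Random(seed)
    cost = lambda x: sum(x)  # number of ones; 0 is optimal
    # Step 1: one random population per instance
    pops = [[[rng.randint(0, 1) for _ in range(n_bits)]
             for _ in range(pop_size)] for _ in range(n_instances)]
    for _ in range(iters):
        incoming = []
        for k in range(n_instances):
            pops[k].sort(key=cost)                 # step 2: evaluate and rank
            parents = pops[k][:pop_size // 2]      # step 3: select better half
            children = []
            for p in parents:                      # step 4: mutate (flip a bit)
                child = list(p)
                child[rng.randrange(n_bits)] ^= 1
                children.append(child)
            pops[k] = parents + children
            if k != 0:                             # step 5: workers send best
                incoming.append(list(pops[k][0]))
        # step 5 (collector side): merge incoming bests, keep size bounded
        pops[0] = sorted(pops[0] + incoming, key=cost)[:pop_size]
    # step 6 reached after a fixed number of iterations
    return min((min(p, key=cost) for p in pops), key=cost)
```

The Multistart counterpart is the same loop with the two collector-related statements removed, which mirrors how the paper derives both variants from one codebase.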
Listing 1 shows, in pseudocode, the operation of the Multiverse Genetic Algorithm for TSP, which closely mirrors the procedure outlined above. The condition Collector is true for the collector node in Multiverse mode; the code for Multistart is therefore exactly the same, except for the two lines under this condition, which never execute. This pseudocode corresponds to the nodes that run the GA instances; there is an additional node, the controller, that launches the execution of the instances, keeps the synchronization between them, tracks and consolidates the best solutions, and in the Multiverse case communicates the best solutions of the other instances to the collector. Lines 1-6 initialize the population with random tours and calculate the costs associated with them, informing the controller. Inside the loop, lines 8-14 generate the new population by including the elite individuals directly, selecting the ones that will act as parents, and applying the crossover and mutation operators on them. Lines 15-17 add the solutions from the other instances in the collector. Finally, lines 18-23 trim the solutions down to the population size by discarding the ones with worse fitness, update the best solution if needed, and send the current best solution to the controller.
The SelectParents function selects a fraction of individuals from the population; this fraction is given by the crossover probability parameter. The selection is random, but the probability for each individual to be selected is inversely proportional to its cost. It deterministically includes the best individual as an elitist strategy. The crossover operator is a variation of order crossover: it randomly selects a subtour of one parent and places it in the same position in the offspring; the rest of the tour is filled with the remaining cities in the order in which they appear in the other parent. The ApplyCrossover function repeatedly draws two individuals from the parents group and generates two offspring with them (inverting which is used as the first parent). The selection at this stage is again biased by the inverse of the cost, so that better solutions are selected more often. The mutation operator swaps the positions of two randomly selected cities. The ApplyMutation function goes through all the individuals selected as parents and applies the mutation operator with the probability given by the mutation probability parameter; each time it applies the operator, it adds a new individual to the population. EliteSolutions includes the best individuals from the original population in the new one; the number of individuals to transfer in this way is given by the elitism parameter.
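The two operators just described can be sketched in a few lines. This is an illustrative implementation of the subtour-copy crossover variant and the swap mutation, with hypothetical function names, not the paper's code:

```python
import random

def order_crossover(p1, p2, rng=random):
    """Copy a random subtour of p1 into the child at the same positions,
    then fill the rest with p2's cities in their order of appearance."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    fill = (c for c in p2 if c not in kept)  # remaining cities, p2's order
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child

def swap_mutation(tour, rng=random):
    """Exchange the positions of two randomly selected cities."""
    i, j = rng.sample(range(len(tour)), 2)
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return t
```

Both operators preserve the permutation property, so every offspring is a valid tour without any repair step.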
Listing 2 shows the pseudocode for the Multiverse ACO for solving a TSP. The code is the same for the Multistart version, as the only difference is the two lines under the condition Collector, which is true only for the collector process in the Multiverse version. As in the case of the GA, this pseudocode corresponds to the nodes that run the ACO instances, and there is an additional controller node. In each iteration, each of the NumAnts ants builds a tour by iteratively selecting the next city to go to. The function GetNext performs this selection by assigning to each candidate city a probability of being chosen, calculated as

p(c′ | c) = τ_cc′^α η_cc′^β / Σ_{k ∈ C} τ_ck^α η_ck^β,

where c is the current city in the tour, c′ is the candidate city for the next step, C is the set of all potential candidate cities for the next step, including c′; τ_ij is the pheromone level associated with going from city i to city j in the tour; η_ij is the heuristic associated with going from city i to city j, namely the inverse of the distance from i to j; and α and β are algorithm parameters. The update of the pheromone matrix consists of the evaporation step, calculated as τ_ij = ρ τ_ij, with the non-negative evaporation parameter ρ < 1, and the pheromone addition by the best ant along the tour T it built,

τ_ij = τ_ij + 1 / Σ_{(k,l) ∈ T} d_kl  for each edge (i, j) ∈ T,

where d_ij is the distance from i to j.
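The selection rule and the pheromone update can be sketched as follows. This is illustrative code with hypothetical names, assuming the common 1/L(T) deposit by the best ant, where L(T) is the length of its tour:

```python
import random

def get_next(current, candidates, tau, eta, alpha=1.0, beta=1.0, rng=random):
    """Choose the next city with probability proportional to
    tau[c][c']**alpha * eta[c][c']**beta over the unvisited candidates."""
    weights = [tau[current][c] ** alpha * eta[current][c] ** beta
               for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def update_pheromones(tau, best_tour, dist, rho=0.95):
    """Evaporate every trail (tau <- rho * tau), then let the best ant
    deposit 1/L(T) on each edge of the tour T it built."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= rho
    edges = [(best_tour[k], best_tour[(k + 1) % len(best_tour)])
             for k in range(len(best_tour))]
    length = sum(dist[i][j] for i, j in edges)  # L(T), closing the loop
    for i, j in edges:
        tau[i][j] += 1.0 / length
```

Note that for an asymmetric TSP the deposit on edge (i, j) does not imply a deposit on (j, i), matching the directed distance matrix.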

Experimental Results
The following analysis takes all 18 Asymmetric Traveling Salesman Problem instances in the TSPLIB benchmark library, and runs both the Multiverse and Multistart variants of each GA and ACO on them. Due to the probabilistic nature of the algorithms, it repeats each run 25 times, tracking running time and the best solution found at each iteration. It also records the random seed generated in each run so that the exact same results can be reproduced. Tables 1 and 2 show the outcome of the analysis for the Genetic Algorithm and the Ant Colony Optimization, respectively. The fields are:

Instance
The file containing the problem data.

∆ Avg
The percent difference between the average across repetitions of final solution fitness for Multistart and Multiverse.

∆ Min
The percent difference between the best solution fitness across repetitions for Multistart and Multiverse.

RTime
The percent difference between the averages across repetitions of CPU time elapsed for Multistart and Multiverse.

WTime
The percent difference between the averages across repetitions of wall time elapsed for Multistart and Multiverse.

∆ HV
The percent difference between the averages across repetitions of the hypervolume for Multistart and Multiverse. Hypervolume is a measure used to compare solutions to multi-objective problems. In such problems, the outcome is given as a Pareto front of non-dominated solutions, that is, solutions such that no other solution improves on one of the objectives without getting worse on another; these are also known as efficient solutions. Pérez Cáceres et al. [19] apply the hypervolume measure to assess the anytime performance of ACO algorithms by considering the evolution of the best solution found so far at each iteration as a bi-objective problem which aims at minimizing both the cost and the time to reach a given solution quality. This work includes the hypervolume in the same way, as a measure of anytime performance.
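For a trajectory of (time, best cost) points with both objectives minimized, the 2-D hypervolume with respect to a reference point can be computed with a simple sweep. The sketch below is illustrative, not the paper's evaluation code:

```python
def hypervolume(points, ref):
    """Area dominated by a set of (time, cost) points, both minimized,
    with respect to the reference point `ref`; larger is better."""
    rt, rc = ref
    # Keep points inside the reference box, sorted by increasing time.
    pts = sorted(p for p in points if p[0] < rt and p[1] < rc)
    hv, last_c = 0.0, rc
    for t, c in pts:
        if c < last_c:                      # non-dominated staircase step
            hv += (rt - t) * (last_c - c)   # rectangle added by this point
            last_c = c
    return hv
```

For example, the points (1, 2) and (2, 1) with reference (3, 3) dominate an L-shaped region of area 3: two 2-unit rectangles overlapping in a unit square.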
Tables 1 and 2 show that the final solution quality is better on average using Multiverse, and that the best solution across runs is also attained by Multiverse for most of the instances. Elapsed times (CPU time and wall time) alternate between positive and negative, which is consistent with negligible overhead of Multiverse with respect to Multistart, as the random effect of synchronization is dominant; the time differences would be consistently positive if the additional communication had a significant impact. Finally, the hypervolume is also consistently better on average for Multiverse.
Instance br17 is rather small (17 cities) and always yielded the globally optimal solution except in one run of Multistart. The analyses below do not include this instance, as it behaves as an outlier, posing problems for normalization and the application of statistical tests. At this size, there is no practical difference between the two methods. For the ACO, instance ftv33 was also problematic for the same reason in the analysis of solution quality, but had enough variability to be included in the analysis of hypervolume. Figure 1 shows a boxplot of the best costs achieved in the 25 runs of each instance for Multistart and Multiverse GA, providing a deeper view of the ∆ Avg and ∆ Min columns in Table 1. Since the values for each instance are very different from each other, the values are normalized as

x_norm = (x − x̄) / (x̄ − x_min),

where x_norm is the normalized value for x, x̄ is the average value for Multistart, and x_min is the minimum value for Multistart. This performs a translation and scaling of both the Multistart and Multiverse values so that the resulting boxes are comparable and fit well in the graph, while keeping the relative shapes and positions unmodified within each instance. The figure corroborates the typically improved performance of the Multiverse method (with labels ending in .mv) over the Multistart method (with labels ending in .ms). There are some instances, such as ftv55 and ftv64, where the high variability of solutions using Multiverse makes it difficult to reach a conclusion. Running a directed Mann-Whitney U test at α = 0.05 for each instance supports the hypothesis that there is a statistically significant difference in solution quality in favor of Multiverse for 10 out of 17 instances, and no significant difference for the other 7. Table 3 shows which instances pass the test. Figure 2 is the equivalent of Figure 1 for ACO. The findings are similar, with 11 out of 17 instances significantly improving according to the directed Mann-Whitney U test, as shown in Table 4. Table 3.
Instances for which the Mann Whitney U test supports that there is a significant difference between Multiverse and Multistart Genetic Algorithm in favor of Multiverse (MV) or no difference (EQ). There is no instance so that there is a difference in favor of Multistart.
MV: ftv33, rbg403, p43, rbg443, ftv47, ft70, rbg358, ftv38, ftv35, ry48p
EQ: ftv170, ftv44, ftv55, kro124p, ftv64, ftv70, ft53
The corresponding hypervolume figures are the equivalents of Figures 1 and 2, respectively, but for hypervolumes instead of cost achieved. The analysis is similar, but for hypervolume the improvement obtained by Multiverse is more evident. The application of a directed Mann-Whitney U test at α = 0.05 for each instance supports the hypothesis that there is a statistically significant difference for the Genetic Algorithm in 15 out of 17 instances in favor of Multiverse, and no significant difference for the other two: ftv170 and ftv70. For Ant Colony Optimization, the advantage is found in 11 out of 18 instances (see Table 5); larger instances benefit the most from the Multiverse approach.
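The per-instance normalization applied in the boxplots can be expressed as a small helper. The exact formula is a reconstruction (the Multistart mean maps to 0 and the Multistart minimum to −1), and the function name is hypothetical:

```python
from statistics import mean

def normalize(values, ms_values):
    """Translate and scale `values` by the Multistart statistics of the
    same instance: the Multistart mean maps to 0, its minimum to -1.

    `ms_values` are the Multistart results for the instance; `values`
    may be either the Multistart or the Multiverse results."""
    xbar, xmin = mean(ms_values), min(ms_values)
    return [(x - xbar) / (xbar - xmin) for x in values]
```

Because both variants are scaled by the same Multistart-derived constants, relative shapes and positions within each instance are preserved, which is what makes the boxes comparable across instances.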

Materials and Methods
The hardware setup for the tests described in Section 3 consisted of a cluster of 18 virtual machines, each with 512 MB of RAM, running Arch Linux on top of an Intel Xeon E5-2695 (Intel Corporation, Santa Clara, CA, USA) at 2.40 GHz.
The logs containing all the information needed for the analysis are provided as Supplementary Materials to this paper. The program for the tests was written in Python, using MPI for communication across nodes, and is available on GitHub at the URL https://github.com/valthalion/endof. The repository also includes the code used to process the results and automatically generate the tables and charts used in the paper.
The overall procedure started from the 18 instances of Asymmetric Traveling Salesman Problem in the benchmark collection TSPLIB. For each of them, four sets of runs were executed: GA Multistart, GA Multiverse, ACO Multistart, and ACO Multiverse. Each set consisted of 25 runs with random seeds. Each run logged to a text file the running time, random seed, final solution, and best solution at each iteration. The logs could be recreated by re-running the experiments using the recorded seeds.
The logs were parsed and loaded into a MySQL database for easier manipulation. The repository contains the Python script for populating the database, as well as the SQL code to generate the database schema. Another script processed the database to generate the aggregated values reported in the tables across this paper, including generating LaTeX code for some of them, the statistical tests comparing corresponding Multistart and Multiverse instances, and the charts summarizing the results: boxplots of best solution and hypervolume, and the graph of evolution of the objective function with the iterations.
The algorithms used standard parameters, except for the termination condition, which was set to a number of iterations equal to 10 times the number of cities. For ACO, evaporation was ρ = 0.95, three ants deposited pheromones out of 50 ants per iteration, a single elitist solution was kept, and α = β = 1. For GA, a population of 50 solutions was used, without elitism, with 50% and 5% crossover and mutation probabilities, respectively. The crossover operator selected a chunk of the first parent given by two cut points, and placed it in the same position in the offspring; it then filled the rest of the offspring with the remaining elements in the order they appear in the second parent. This was done twice, exchanging the roles of first and second parent, to generate two offspring. The mutation operator randomly exchanged two cities in the tour.
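For reference, the reported settings can be collected in a configuration sketch; the structure and function name below are hypothetical, with the values taken from the description above:

```python
def make_params(n_cities):
    """Illustrative bundle of the experiment parameters; the iteration
    budget scales with instance size (10 iterations per city)."""
    return {
        "iterations": 10 * n_cities,
        "aco": {
            "rho": 0.95,             # evaporation
            "ants": 50,              # ants per iteration
            "depositing_ants": 3,    # ants that deposit pheromones
            "elitist_solutions": 1,  # elitist solutions kept
            "alpha": 1.0,
            "beta": 1.0,
        },
        "ga": {
            "population": 50,
            "elitism": 0,            # no elitism in the GA runs
            "crossover_prob": 0.50,
            "mutation_prob": 0.05,
        },
    }
```

Scaling the iteration budget with instance size keeps the effort metric comparable across the TSPLIB instances, in line with the predefined-effort comparison discussed in Section 1.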

Conclusions and Future Work
The evolution of hardware leads towards parallelization of algorithms in order to extract the full performance of new systems and so tackle larger and more complex problems. This trend is already evident in the literature, and multiple approaches exist with different trade-offs of solution quality, algorithm complexity, communication overhead, and generalization capability: from problem-specific GPU acceleration to multiple independent runs. Multiple independent runs, or Multistart, sits at one end of the trade-off spectrum: fully generic, with minimum complexity and overhead, and is typically considered the baseline against which to compare other methods. This work introduces the Multiverse method as an alternative to multiple independent runs. The Multiverse method is also fully generic, in the sense that it is directly applicable to any population-based metaheuristic, and incurs negligible overhead both in terms of added complexity and communication.
In order to test these claims, and to evaluate its performance in solution quality with respect to Multistart, this paper describes concrete instantiations of the method for solving the Traveling Salesman Problem using GA and ACO as representatives of the two broad classes of population-based metaheuristics. The tests run on the asymmetric TSP instances in the benchmark library TSPLIB. The Multiverse variant is often superior in solution quality and never worse, and consistently outperforms the Multistart approach in anytime behaviour, reaching better solutions faster. The tests show statistical significance for these findings, and confirm that the added overhead is smaller than the intrinsic variability in running time for the problems.
These results support the Multiverse method as a potential candidate to replace multiple independent runs as the baseline for distributed algorithm comparison. To further consolidate this outcome, the next step in its development is to test whether these benefits hold for a real-world problem. In particular we will apply it to Ant Colony Optimization solving a line scheduling problem in the steel industry, as described in [20].