Implementation of a Parallel Algorithm Based on a Spark Cloud Computing Platform

Abstract: Parallel algorithms, such as the ant colony algorithm, take a long time when solving large-scale problems. In this paper, the MAX-MIN Ant System algorithm (MMAS) is parallelized to solve the Traveling Salesman Problem (TSP) on a Spark cloud computing platform. We combine MMAS with Spark MapReduce to execute the path building and the pheromone operation on a distributed computer cluster. To improve the precision of the solution, the 2-opt local optimization strategy is incorporated into MMAS. The experimental results show that Spark substantially accelerates the ant colony algorithm when the city scale of TSP or the number of ants is relatively large.


Introduction
The ant colony algorithm is a heuristic search algorithm that performs well on various combinatorial optimization, task scheduling and network routing problems [1][2][3]. However, its time complexity is high, and it is very time consuming when solving large-scale problems. To deal with this problem, much research has focused on the parallel ant colony algorithm. Up to now, parallel ant colony algorithms have mainly been implemented using multi-core CPUs [4,5]. However, the programming models of these approaches are complex and their performance is limited by the number of CPU cores. With the development of cloud computing, researchers have used Hadoop, the first generation of cloud computing platform based on the MapReduce framework [6], to accelerate the ant colony algorithm [7,8]. This programming model allows the user to neglect many of the issues associated with distributed computing: splitting up and assigning the input to various systems, scheduling and running computation tasks on the available nodes, and coordinating the necessary communication between tasks. Because the intermediate calculation results of Hadoop MapReduce are saved on a hard disk, tasks take a long time to start. Therefore, the ant colony algorithm, which needs many iterations, cannot obtain a good speedup on Hadoop. Spark [9] overcomes these drawbacks: its intermediate calculation results are kept in memory and its tasks start quickly. The programming model of Spark MapReduce is quite different from that of Hadoop MapReduce. In this paper, we first introduce Spark MapReduce and then combine it with the ant colony algorithm.
The previous studies [7,8] ignore the precision of the solutions for TSP. To obtain optimal solutions, the ant colony algorithm needs to be combined with other optimization algorithms. A composite algorithm based on Hadoop is proposed in [10], which combines the ant colony algorithm with simulated annealing (SA). Although the composite algorithm improves the precision of solutions, its speedup is much lower than that reported in [7,8]. Therefore, it is very important to choose an optimization algorithm that is suitable for the MapReduce framework. We incorporate the 2-opt local search strategy [11] into the ant colony algorithm to speed up convergence and thereby improve the efficiency of solving large-scale problems.

Principle of MAX-MIN Ant System Algorithm
The ant colony algorithm is an optimization algorithm inspired by the behavior of real ants. It is one of the important methods for solving TSP. The basic ant colony algorithm needs a long search time and is prone to premature stagnation. To overcome these disadvantages, researchers have continued to improve the basic ant colony algorithm. In 1997, Stützle et al. proposed the MAX-MIN Ant System algorithm (MMAS) [12]. There are two main differences between MMAS and the basic ant colony algorithm. First, MMAS only updates the pheromone on the shortest path of the current iteration, which ensures that the algorithm searches in the vicinity of the shortest path and gradually approaches the global optimal solution. Second, the pheromone on each edge is restricted to a fixed range [τmin, τmax] to keep the algorithm from converging to a local optimal solution too quickly. τmin and τmax are the minimum and maximum values of the pheromone, respectively.
The main process of using MMAS to solve TSP is as follows:
(1) Initializing ants, cities and tabu list. First, place m ants randomly in n cities. The pheromone value of every path between cities is initialized to the maximum value τmax. Each ant has a tabu list that records the cities the ant has already visited. Each tabu list initially contains only the city where the ant is placed. The amount of pheromone released by ants on each path is initialized to 0.
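(2) Constructing the ant paths. Ant k moves from city i to city j with probability

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}(t)]^{\beta}}{\sum_{l \in \mathrm{allowed}_k} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}(t)]^{\beta}}, & j \in \mathrm{allowed}_k \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where τij(t) is the amount of pheromone deposited for the transition from city i to city j at time t, α is a parameter that controls the influence of τij(t), ηij(t) is the desirability of the transition from i to j at time t (a priori knowledge, typically 1/dij, where dij is the distance between cities i and j) and β is a parameter that controls the influence of ηij(t). The terms τil and ηil in the denominator represent the trail level and attractiveness of the other possible city transitions. allowedk is the set of cities that are not in the tabu list of ant k. Equation (1) shows that ants never choose cities in the tabu list, which ensures the legitimacy of solutions.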
(3) Operating the pheromone. In each iteration, every ant completes a traversal of all cities. When the tabu lists of all ants have been filled, path construction is finished. Compute the length of the path that each ant has traveled. Compare all paths to find the shortest one and save it; its length is denoted as C^best. The pheromone is updated by

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}^{best} \tag{2}$$

where ρ is the pheromone evaporation coefficient and Δτij^best = 1/C^best for the edges on the shortest path (and 0 for all other edges). After the update, each pheromone value is limited to the range [τmin, τmax]. Go back to step (2) after the pheromone operation until the termination condition of the algorithm is met: the result converges or the maximum number of iterations is reached.
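
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: it assumes a symmetric TSP, roulette-wheel selection for the transition rule, and an update of the form τ ← (1 − ρ)τ + 1/C^best clamped to [τmin, τmax]. The function names, the four-city instance and all parameter values (m, α, β, ρ, τmin, τmax) are hypothetical.

```python
import math
import random

def choose_next_city(current, allowed, tau, eta, alpha, beta, rng):
    """Pick the next city with probability proportional to tau^alpha * eta^beta."""
    weights = [(tau[current][j] ** alpha) * (eta[current][j] ** beta) for j in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]

def build_tour(n, tau, eta, alpha, beta, rng):
    """Step (2): one ant visits every city once; the tour doubles as its tabu list."""
    tour = [rng.randrange(n)]  # step (1): random start city
    while len(tour) < n:
        allowed = [j for j in range(n) if j not in tour]
        tour.append(choose_next_city(tour[-1], allowed, tau, eta, alpha, beta, rng))
    return tour

def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def update_pheromone(tau, best_tour, best_len, rho, tau_min, tau_max):
    """Step (3): evaporate everywhere, deposit on the best tour, clamp to the limits."""
    n = len(tau)
    best_edges = set()
    for i in range(n):
        a, b = best_tour[i], best_tour[(i + 1) % n]
        best_edges.update({(a, b), (b, a)})  # symmetric TSP assumed
    for i in range(n):
        for j in range(n):
            if i != j:
                deposit = 1.0 / best_len if (i, j) in best_edges else 0.0
                value = (1.0 - rho) * tau[i][j] + deposit
                tau[i][j] = min(tau_max, max(tau_min, value))

def mmas_iteration(dist, tau, m, alpha, beta, rho, tau_min, tau_max, rng):
    """One full iteration: m ants build tours, then the pheromone is updated."""
    n = len(dist)
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    tours = [build_tour(n, tau, eta, alpha, beta, rng) for _ in range(m)]
    best_tour = min(tours, key=lambda t: tour_length(t, dist))
    best_len = tour_length(best_tour, dist)
    update_pheromone(tau, best_tour, best_len, rho, tau_min, tau_max)
    return best_tour, best_len

# Hypothetical toy instance: the four corners of a unit square.
cities = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.hypot(ax - bx, ay - by) for (bx, by) in cities] for (ax, ay) in cities]
tau_min, tau_max = 0.01, 1.0
tau = [[tau_max] * len(cities) for _ in cities]  # pheromone starts at tau_max
best_tour, best_len = mmas_iteration(
    dist, tau, m=5, alpha=1.0, beta=2.0, rho=0.02,
    tau_min=tau_min, tau_max=tau_max, rng=random.Random(0),
)
print(best_tour, best_len)
```

In a full run, `mmas_iteration` would be called repeatedly on the same `tau` matrix until the result converges or the iteration limit is reached.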

The MapReduce Framework
In April 2004, Google researchers found that most distributed computing can be abstracted as a MapReduce operation. A MapReduce program is composed of a Map procedure that performs filtering and sorting and a Reduce procedure that performs a summary operation. MapReduce can be used for processing massive data and for parallel algorithms with high time complexity.
The operation process of the MapReduce framework is shown in Figure 1. In the Map stage, the input is split into multiple parallel parts and Map tasks are generated. The tasks are then assigned to the computing nodes and processed in the same way. In the Reduce stage, Reduce tasks merge the results calculated by the Map tasks and then output the final results. MapReduce is one of the basic cloud computing technologies of Google. Programs based on the MapReduce framework can run in parallel on thousands of commodity computers. Programmers can make use of the power of a computer cluster even if they have no experience in distributed system design.
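
As a rough illustration of the split, Map and Reduce stages described above, the following single-machine Python sketch computes a sum of squares. It only mimics the data flow and involves no cluster; the helper names `split`, `map_task` and `reduce_task` are hypothetical.

```python
from functools import reduce

def split(data, parts):
    """Split the input into `parts` roughly equal chunks, one per Map task."""
    return [data[i::parts] for i in range(parts)]

def map_task(chunk):
    """Map: every chunk is processed in the same way, independently of the others."""
    return sum(x * x for x in chunk)

def reduce_task(a, b):
    """Reduce: merge two partial results into one."""
    return a + b

data = list(range(1, 11))
chunks = split(data, 3)                  # input splitting
partials = list(map(map_task, chunks))   # Map stage (would run in parallel on a cluster)
total = reduce(reduce_task, partials)    # Reduce stage
print(total)  # 385
```

On a real cluster the framework performs the splitting and the distribution of Map tasks automatically; only the two task functions are written by the programmer.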

Spark Platform
The Spark platform was developed in the UC Berkeley AMP lab. Spark is a fast and general-purpose cluster computing system based on the MapReduce framework. Spark is developed in the Scala language, which is concise and efficient.
The fundamental programming abstraction of Spark is the Resilient Distributed Dataset (RDD), a logical collection of data partitioned across machines. By allowing user programs to load data into a cluster's memory and to query it repeatedly, Spark is well suited to iterative algorithms, such as the ant colony algorithm.
The working principle of the Spark platform is shown in Figure 2. Spark divides a computer cluster into a master node and multiple worker nodes. The master node is responsible for task scheduling, resource allocation and error management. Worker nodes process Map tasks and Reduce tasks in parallel. User program distribution and input data splitting are completed automatically by the platform.

Implementation of MMAS Based on Spark Platform
The ant colony algorithm is inherently parallel. Each iteration of the ant colony algorithm can be divided into two stages. In the first stage, each ant traverses all cities independently, and its walking distance and path are computed and saved. In the second stage, the walking distances of all ants are compared to find the shortest one, and then the pheromone is updated. These two stages can be realized by Map and Reduce, respectively, so the MapReduce framework is suitable for the ant colony algorithm. The implementation of MMAS based on the MapReduce framework and the Spark platform is as follows:
(1) Data parallelism. First, initialize the city distance matrix and the pheromone matrix. Let the number of ants be m and establish an array of size m. Use parallelize() of Spark to convert the array into a distributed dataset, then use this dataset as the input of map() to start m Map tasks. The array is used only to start the Map tasks and has no effect on the results, so its elements can take any value.
(2) Map tasks. Because all Map tasks use the same algorithm, we only need to implement the process by which one ant constructs a path. Construct the path according to MMAS and use the 2-opt algorithm to optimize it. Then output the walking distance and path of each ant. The computation of the algorithm is concentrated mainly in the Map stage. The outputs of map() are key-value pairs in which the distance is the key and the path is the value. Sort the pairs by distance using sortByKey().
(3) Reduce tasks. Use take(1) to find the shortest distance and its corresponding path.
(4) Iteration. Update the pheromone matrix according to the results of the Reduce tasks, and then save the current minimum distance and optimal path. Iterate the Map tasks and Reduce tasks to obtain the final result.
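
The data flow of steps (1) to (3) can be sketched in plain Python as follows. This is a single-machine emulation under stated assumptions, not the authors' Spark program: the built-in `map` and `sorted` stand in for Spark's map(), sortByKey() and take(1), the pheromone-guided path construction is stubbed out with a random permutation to keep the example short, and the six-city instance and the names `two_opt` and `ant_task` are hypothetical.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour, including the edge back to the start city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """2-opt local search: reverse a segment whenever that shortens the tour."""
    best = tour[:]
    best_len = tour_length(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                cand_len = tour_length(candidate, dist)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len

def ant_task(seed, dist):
    """Map task for one ant: build a path, refine it, emit (distance, path)."""
    rng = random.Random(seed)
    tour = list(range(len(dist)))
    rng.shuffle(tour)  # placeholder: a real Map task would follow the MMAS transition rule
    tour, length = two_opt(tour, dist)
    return (length, tour)

# Hypothetical instance: six cities on a 2 x 1 grid.
cities = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
dist = [[math.hypot(ax - bx, ay - by) for (bx, by) in cities] for (ax, ay) in cities]

m = 8
seeds = list(range(m))  # the array of size m; here it also seeds each ant's random generator
# With Spark, the next three lines would roughly correspond to
# sc.parallelize(seeds).map(...).sortByKey().take(1).
pairs = list(map(lambda s: ant_task(s, dist), seeds))  # Map stage
ranked = sorted(pairs, key=lambda kv: kv[0])           # sort by distance (the key)
best_len, best_path = ranked[0]                        # shortest distance and its path
print(best_len, best_path)
```

Step (4) would then update the pheromone matrix from `best_len` and `best_path` and repeat the three stages.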

Experiments
The experimental hardware platform consists of four Linux servers with a total of 64 CPU cores and 128 GB of memory. We use five standard TSP instances from TSPLIB [13]. Compared with an ordinary serial program, the resource scheduling and task allocation of the Spark platform consume extra time, so we first compare the run time of the ordinary program (single core) with that of the 2-core Spark platform. The experimental parameters and the results are shown in Table 1. Column "opt" gives the length of the known optimal solution of every instance. Columns "MMAS" and "MMAS+2-Opt" give the best results achieved for every instance by the two algorithms, respectively. Table 1 shows that MMAS combined with 2-Opt based on MapReduce can obtain optimal solutions. The running time of the 2-core Spark platform is longer than that of the ordinary program when the city scale is small. As the city scale increases, the acceleration effect becomes greater. The Spark platform is therefore suitable for large instances, such as rat195 and pr299.
Because the number of ants has a great influence on the solution accuracy, we choose the eil101 instance and vary the number of ants to compare the ordinary program with the 2-core Spark platform. The experimental results are shown in Figure 3.
When the number of ants increases by 64, the running time of the ordinary program increases by 4 min, whereas that of the Spark platform increases by 2 min. The Spark platform starts tasks so quickly that adding Map tasks does not introduce extra time consumption.
We then use pr299 and increase the number of CPU cores to test the speedup of the Spark platform. If the number of CPU cores is less than 64, the cores are evenly divided among the four servers. For example, when we test four cores, the program is executed on four servers (every server uses one core), and the servers are connected over a LAN. The running times for different numbers of CPU cores are shown in Table 2. For a parallel computing system, the main performance index is speedup. The comparison between Spark and Hadoop is shown in Figure 4. The experimental results show that the Spark platform noticeably accelerates the ant colony algorithm and performs better than Hadoop. When the number of CPU cores is between 1 and 16, the speedup is close to linear; beyond that, it falls off. By observing the CPU load, we find that the CPUs run at full load when the number of cores is between 1 and 16, and that the CPU load decreases when the number of cores increases to 32 or 64, which means that the amount of computation is too small to keep all cores of the Spark platform busy. The parallel ant colony algorithm based on the Spark platform is therefore more suitable for handling large-scale problems.
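
The speedup measure used in this comparison can be written down as a short Python sketch. The timings below are made-up placeholder values chosen for easy arithmetic; they are not the measurements of Table 2, and the function names are hypothetical.

```python
import math

def speedup(t1, tn):
    """Speedup of an n-core run relative to the single-core program: T1 / Tn."""
    return t1 / tn

def log2_speedup(t1, tn):
    """log2(T1 / Tn); ideal linear speedup on n cores gives log2(n)."""
    return math.log2(t1 / tn)

def efficiency(t1, tn, n):
    """Fraction of the ideal linear speedup that is actually achieved."""
    return speedup(t1, tn) / n

# Placeholder timings in seconds (NOT measured data).
t1 = 800.0
hypothetical_runs = {16: 50.0, 64: 25.0}
for n, tn in hypothetical_runs.items():
    print(n, speedup(t1, tn), log2_speedup(t1, tn), efficiency(t1, tn, n))
```

With these placeholder values, the 16-core run reaches the ideal speedup of 16 (efficiency 1.0), while the 64-core run reaches a speedup of 32 (efficiency 0.5), which is the kind of fall-off described above when the workload is too small for the available cores.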

Conclusions
This paper presents an implementation scheme for the parallel ant colony algorithm based on the Spark platform. Using the Spark MapReduce framework can improve the efficiency of the ant colony algorithm. At present, cloud computing is widely used in big data processing, while its potential for speeding up parallel algorithms remains to be developed. Not only the ant colony algorithm but also other parallel algorithms, such as neural networks, logistic regression or the particle swarm algorithm, can be sped up by using the Spark platform.

Figure 1. The operation process of the MapReduce framework.

Figure 2. The working principle of Spark.

Figure 3. Time consumption for different numbers of ants.

Figure 4. Comparison between the speedups of Spark and Hadoop. To make the linear speedup appear as a straight line, we let y = log2(T1/Tn), where T1 is the running time of the ordinary program (single core) and Tn is the running time on n cores.

Table 1. Comparison between the single-core program and the Spark platform.

Table 2. Time consumption for different numbers of cores.