Metaheuristic Based Scheduling Meta-Tasks in Distributed Heterogeneous Computing Systems

Scheduling is a key problem in distributed heterogeneous computing systems, since good schedules are needed to benefit from the large computing capacity of such systems, and it is an NP-complete problem. In this paper, we present a metaheuristic technique, namely the Particle Swarm Optimization (PSO) algorithm, for this problem. PSO is a population-based search algorithm based on the simulation of the social behavior of bird flocking and fish schooling. Particles fly through the problem search space to find optimal or near-optimal solutions. The scheduler aims at minimizing the makespan, which is the time at which the latest task finishes. Experimental studies show that the proposed method is more efficient than previously reported PSO and GA approaches for this problem.


Introduction
A distributed heterogeneous computing (HC) system consists of a distributed suite of different high-performance machines, interconnected by high-speed networks, to perform different computationally intensive applications that have various computational requirements. Heterogeneous computing systems range from diverse elements or paradigms within a single computer, to a cluster of different types of PCs, to coordinated, geographically distributed machines with different architectures (e.g., Grids [1]).
To exploit the different capabilities of a suite of heterogeneous resources effectively and satisfy users with high expectations for their applications, a crucial problem that needs to be solved in the framework of HC is the scheduling problem.
Optimal scheduling involves mapping a set of tasks to a set of resources to efficiently exploit the capabilities of such systems. As mentioned in [2], optimally mapping tasks to machines in an HC suite is an NP-complete problem, and therefore the use of heuristics is one of the suitable approaches. According to the type of tasks being scheduled, the scheduling problem can be classified into two types: scheduling meta-tasks and scheduling a directed acyclic graph (DAG) composed of communicating tasks. In this paper, we consider the meta-task scheduling problem, which involves the allocation of a set of independent tasks from different users to a set of computing resources.
Ritchie and Levine [11] used a hybrid ant colony optimization for scheduling in HC systems. In this method, the authors combined ant colony optimization with local and tabu search to find shorter schedules. Yarkhan and Dongarra [12] used a simulated annealing approach for grid job scheduling. Page and Naughton [13] used a genetic algorithm for scheduling in HC systems. In this method the scheduling strategy operates in a dynamically changing computing resource environment and adapts to variable communication costs and variable availability of processing resources. Braun et al. [14] described eleven heuristics and compared them on different types of HC environments. The authors showed that the GA scheduler can obtain better results than the others.
Xhafa et al. [15] used Genetic Algorithm-based schedulers for computational grids; they implemented and compared most GA operators to find the best GA scheduler for this problem. In [16] the authors also focused on Struggle Genetic Algorithms and their tuning for the scheduling of independent jobs in computational grids. Hash-based implementations of the struggle genetic operator for the GAs were proposed. Abraham et al. [17] used a fuzzy particle swarm optimization and Izakian et al. [18] used a discrete version of particle swarm optimization for the scheduling problem.
Xhafa et al. [19] exploited the capabilities of Cellular Memetic Algorithms (CMA) for obtaining efficient batch schedulers for grid systems. The authors implemented and studied several methods and operators of CMA for job scheduling in grid systems. Abraham et al. [20] illustrated the usage of several nature-inspired meta-heuristics (SA, GA, PSO, and ACO) for scheduling jobs in computational grids using single and multi-objective optimization approaches. Xhafa and Abraham [21] also reviewed the most important concepts from grid computing related to scheduling problems and their resolution using heuristic and meta-heuristic approaches. The authors identified different types of scheduling based on different criteria, such as static vs. dynamic environment, multi-objectivity, adaptivity, etc.
Different criteria can be used for evaluating the efficiency of scheduling algorithms, the most important of which is makespan. Makespan is the time when an HC system finishes the latest task. An optimal schedule will be the one that minimizes the makespan.
PSO is an algorithm that follows a collaborative population-based search model. It has been applied successfully to a number of problems, including standard function optimization problems [22], permutation problems [23] and the training of multi-layer neural networks [24], and its use is rapidly increasing. A PSO algorithm contains a swarm of particles in which each particle includes a potential solution. By analogy with evolutionary computation paradigms such as the Genetic Algorithm, a swarm is similar to a population, while a particle is similar to an individual. The particles fly through a multidimensional search space in which the position of each particle is adjusted according to its own experience and the experience of its neighbors. A PSO system combines local search methods (through self experience) with global search methods (through neighboring experience), attempting to balance exploration and exploitation [25].
In this paper, we present a version of the particle swarm optimization approach for scheduling meta-tasks in HC systems, where the goal of the scheduler is to minimize the makespan. In order to evaluate the performance of the proposed method, it is compared with the genetic algorithm presented in [14] for scheduling tasks in HC systems and the continuous PSO presented in [25] for the task assignment problem. The experimental results show that the presented method is more efficient and can be effectively used for HC systems scheduling. The remainder of this paper is organized as follows. In Section 2, we formulate the problem; in Section 3 the PSO paradigm is briefly discussed; Section 4 describes the proposed method and Section 5 reports the experimental results. Finally, Section 6 concludes this work.

Problem Definition
An HC environment is composed of computing resources, where each resource can be a single PC, a cluster of workstations or a supercomputer. Let $T = \{T_1, T_2, \ldots, T_n\}$ denote the set of tasks submitted to the HC system in a specific time interval. Assume the tasks are independent of each other (with no inter-task data dependencies) and preemption is not allowed (they cannot change the resource they have been assigned to). Also assume that, at the time of submitting these tasks, $m$ machines $M = \{M_1, M_2, \ldots, M_m\}$ are within the HC environment. In this paper it is assumed that each machine uses the First-Come, First-Served (FCFS) method for performing the received tasks. We assume that each machine in the HC environment can estimate how much time is required to perform each task. In [14] the Expected Time to Compute (ETC) matrix is used to estimate the time required for executing a task on a machine. An ETC matrix is an $n \times m$ matrix in which $n$ is the number of tasks and $m$ is the number of machines. One row of the ETC matrix contains the estimated execution time of a given task on each machine. Similarly, one column of the ETC matrix consists of the estimated execution time of each task on a given machine. Thus, for an arbitrary task $T_j$ and an arbitrary machine $M_i$, $ETC(T_j, M_i)$ is the estimated execution time of $T_j$ on $M_i$. In the ETC model we take the usual assumption that we know the computing capacity of each resource, an estimation or prediction of the computational needs of each task, and the load of prior work of each resource.
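Assume that $C_{i,j}$ ($i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$) is the execution time for performing the $j$th task on the $i$th machine and $W_i$ ($i \in \{1, 2, \ldots, m\}$) is the previous workload of $M_i$. Then (1) gives the time required for $M_i$ to complete the tasks assigned to it, where the sum runs over the tasks $T_j$ assigned to $M_i$:

$$\sum C_{i,j} + W_i \qquad (1)$$

According to the aforementioned definition, the makespan can be estimated using (2):

$$makespan = \max_{i \in \{1, 2, \ldots, m\}} \left\{ \sum C_{i,j} + W_i \right\} \qquad (2)$$

In this paper the goal of the scheduler is to minimize the makespan.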
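As a minimal sketch of (1) and (2), the following Python snippet computes the per-machine completion times and the makespan for a given assignment. The function names, the 0-based machine indices and the toy numbers are assumptions made for illustration; `etc[j][i]` plays the role of $C_{i,j}$ and `workload[i]` the role of $W_i$.

```python
def completion_times(etc, assignment, workload):
    """Return, for each machine, its previous workload plus the
    execution times of the tasks assigned to it (Equation (1)).

    etc[j][i]     : estimated time of task j on machine i (n x m)
    assignment[j] : 0-based index of the machine that runs task j
    workload[i]   : previous workload of machine i
    """
    times = list(workload)
    for task, machine in enumerate(assignment):
        times[machine] += etc[task][machine]
    return times


def makespan(etc, assignment, workload):
    """Largest completion time over all machines (Equation (2))."""
    return max(completion_times(etc, assignment, workload))


# Toy instance (hypothetical numbers): 3 tasks, 2 machines.
etc = [[3.0, 5.0],
       [2.0, 1.0],
       [4.0, 6.0]]
workload = [1.0, 0.0]
assignment = [0, 1, 0]  # tasks 0 and 2 on machine 0, task 1 on machine 1

print(completion_times(etc, assignment, workload))  # [8.0, 1.0]
print(makespan(etc, assignment, workload))          # 8.0
```

Here machine 0 finishes at 1 + 3 + 4 = 8 and machine 1 at 0 + 1 = 1, so the makespan is 8.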

Particle Swarm Optimization
Particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by bird flocking and fish schooling, originally designed and introduced by Kennedy and Eberhart [10] in 1995. The algorithmic flow in PSO starts with a population of particles whose positions, which represent the potential solutions for the studied problem, and velocities are randomly initialized in the search space. In each iteration, the search for the optimal position is performed by updating the particle velocities and positions, and the fitness value of each particle's position is determined using a fitness function. The velocity of each particle is updated using two best positions: the personal best position and the neighborhood best position. The personal best position, pbest, is the best position the particle has visited, and nbest is the best position the particle and its neighbors have visited since the first time step. Based on the size of the neighborhoods, two PSO algorithms can be developed. When the whole swarm is considered as the neighborhood of a particle, nbest is called the global best (gbest); if smaller neighborhoods are defined for each particle, nbest is called the local best (lbest). gbest uses the star neighborhood topology and lbest usually uses the ring neighborhood topology. There are two main differences between gbest and lbest with respect to their convergence characteristics. Due to its larger particle interconnectivity, the gbest PSO converges faster than the lbest PSO, but the lbest PSO is less susceptible to being trapped in local optima. A particle's velocity and position are updated as follows:

$$V_k^{t+1} = V_k^{t} + c_1 r_1 \left( pbest_k^{t} - X_k^{t} \right) + c_2 r_2 \left( nbest_k^{t} - X_k^{t} \right) \qquad (3)$$

$$X_k^{t+1} = X_k^{t} + V_k^{t+1} \qquad (4)$$

where $k = 1, 2, \ldots, P$; $c_1$ and $c_2$ are positive constants, called acceleration coefficients, which control the influence of pbest and nbest on the search process; $P$ is the number of particles in the swarm; and $r_1$ and $r_2$ are random values in the range [0, 1] sampled from a uniform distribution.
Figure 1 shows the pseudo-code of the particle swarm optimization approach.

create a swarm with P particles.
initialize the position and velocity of each particle randomly.
calculate fitness value of each position.
calculate pbest and nbest for each particle.
repeat
    update velocity of each particle using Equation (3).
    update position of each particle using Equation (4).
    calculate fitness value of each particle.
    update pbest for each particle.
    update nbest for each particle.
until stopping condition is true;
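The loop in Figure 1 can be sketched in Python for the gbest (star topology) case. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the sphere objective, the parameter defaults, the clamping of velocities to [-v_max, v_max], drawing r1 and r2 per dimension, and minimizing the objective directly (rather than maximizing a fitness value) are all choices made here for the example.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares."""
    return sum(v * v for v in x)


def gbest_pso(objective, dim, num_particles=20, iterations=100,
              c1=2.0, c2=2.0, v_max=1.0, bound=5.0, seed=0):
    """Minimize `objective` with a gbest PSO following Figure 1."""
    rng = random.Random(seed)
    # Random initial positions and velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
           for _ in range(num_particles)]
    # Initial personal bests and global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(num_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for _ in range(iterations):
        # Update velocity, then position, of every particle.
        for k in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (vel[k][d]
                     + c1 * r1 * (pbest[k][d] - pos[k][d])
                     + c2 * r2 * (gbest[d] - pos[k][d]))
                # Assumption: clamp the velocity to keep the swarm stable.
                vel[k][d] = max(-v_max, min(v_max, v))
                pos[k][d] += vel[k][d]
        # Evaluate the new positions and update pbest.
        for k in range(num_particles):
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
        # Update the global best (nbest under the star topology).
        g = min(range(num_particles), key=pbest_val.__getitem__)
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
        history.append(gbest_val)
    return gbest, gbest_val, history


best, best_val, history = gbest_pso(sphere, dim=3)
print(best_val <= history[0])  # True: the global best never gets worse
```

An lbest variant would replace `gbest` in the velocity update with the best pbest among each particle's ring neighbors.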

PSO for Task Scheduling in HC Systems
In this section, we propose a version of particle swarm optimization for HC system scheduling in which a heuristic is added to PSO. Particles must be designed to represent an allocation of tasks to the available machines in the HC system, and the velocity has to be redefined accordingly.

Particles Encoding
One of the key issues in designing a successful PSO algorithm is the representation step, i.e., finding a suitable mapping between a problem solution and a PSO particle. In this paper each particle's position is encoded in an $n$-dimensional search space, in which $n$ is the number of tasks to be scheduled. The value of each dimension is a natural number in the range $[1, m]$ indicating the machine number, in which $m$ is the number of available machines in the HC system at the time of scheduling. Assume that $X_k = \{X_{k1}, X_{k2}, \ldots, X_{kn}\}$ is the position of the $k$th particle; $X_{kj}$ indicates the machine to which task $T_j$ is assigned by the scheduler in this particle. Note that in this encoding a machine number can appear more than once in a particle.
Since pbest and nbest are positions (the personal best position and the neighborhood best position of each particle), they are encoded in the same way as the particle's position. In this paper we use the star topology for nbest (gbest PSO).
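In our proposed method, the velocity of each particle is an $m \times n$ matrix whose elements are real numbers in the range $[1, V_{max}]$. Formally, if $V_k$ is the velocity matrix of the $k$th particle, then:

$$V_{kij} \in [1, V_{max}], \quad \forall i \in \{1, 2, \ldots, m\}, \; \forall j \in \{1, 2, \ldots, n\} \qquad (5)$$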

Updating Particles
In our proposed method, as in classic PSO, the particle's velocity is updated first and is then used to update the particle's position. Figure 2 shows the pseudo-code for updating the velocity matrix of particle $k$.

for each task j = 1, 2, …, n do
    if pbest_kj […] X_kj then
        […]
    end
end
In this figure, $c_1$ and $c_2$ are acceleration coefficients, $r_1$ and $r_2$ are random values in the range [0, 1] sampled from a uniform distribution, and $X_k$ is the position of particle $k$. To update a particle's position, we use the updated velocity matrix together with a heuristic $\eta$, a problem-dependent function that adds an explicit bias towards the most attractive solutions. For each task, the probability of executing it on each machine is calculated according to (6), where $p_{kij}$ is the probability of performing task $T_j$ on machine $M_i$ in particle $k$, and $\eta_{kij}$ represents the a priori effectiveness of performing task $T_j$ on machine $M_i$ in particle $k$. Since in this paper we aim at minimizing the makespan, $\eta_{kij}$ is obtained using (7), in which $CT_{kij}$ is the completion time of task $T_j$ on machine $M_i$ in particle $k$, obtained as the workload of machine $M_i$ plus the time required for executing task $T_j$ on machine $M_i$.
After obtaining $p_{kij}$ for all $i = 1, 2, \ldots, m$, we select a machine for task $T_j$ in particle $k$ according to (8). In this equation, $r_0 \in [0, 1]$ is a user-specified parameter and $r$ is a random number in the range (0, 1) sampled from the uniform distribution.
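The position update can be sketched in Python under explicitly assumed forms of (6), (7) and (8). The sketch assumes that $p_{kij}$ is proportional to the product of the velocity element and $\eta_{kij}$, that $\eta_{kij} = 1 / CT_{kij}$, and that the selection rule picks the most probable machine when $r \le r_0$ and otherwise samples a machine according to the probabilities. These forms, the function name `update_position` and the 0-based indices are assumptions chosen to match the surrounding description; the exact equations of the method may differ.

```python
import random


def update_position(velocity, etc, workload, r0=0.9, rng=None):
    """Build a new position (task -> machine) for one particle.

    velocity[i][j] : positive velocity element for machine i, task j (m x n)
    etc[j][i]      : estimated time of task j on machine i (n x m, positive)
    workload[i]    : previous workload of machine i
    """
    rng = rng or random.Random()
    m, n = len(velocity), len(etc)
    load = list(workload)
    position = []
    for j in range(n):
        # Assumed form of (7): eta = 1 / completion time, where the
        # completion time is the machine's workload plus the task's time.
        eta = [1.0 / (load[i] + etc[j][i]) for i in range(m)]
        # Assumed form of (6): probability proportional to velocity * eta.
        weights = [velocity[i][j] * eta[i] for i in range(m)]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Assumed form of (8): greedy choice if r <= r0, else sample.
        if rng.random() <= r0:
            machine = max(range(m), key=probs.__getitem__)
        else:
            machine = rng.choices(range(m), weights=probs, k=1)[0]
        position.append(machine)
        # Update the workload of the selected machine.
        load[machine] += etc[j][machine]
    return position, load


# Toy instance (hypothetical numbers): 3 tasks, 2 machines.
etc = [[3.0, 5.0],
       [2.0, 1.0],
       [4.0, 6.0]]
velocity = [[1.0, 1.0, 1.0],
            [1.0, 1.0, 1.0]]
# With r0 = 1.0 the choice is always greedy, hence deterministic.
print(update_position(velocity, etc, [1.0, 0.0], r0=1.0))
# ([0, 1, 1], [4.0, 7.0])
```

With uniform velocities the greedy branch simply picks the machine with the earliest completion time for each task; a larger velocity element for a machine raises the chance that the task is placed there.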

Fitness Evaluation
Since in this paper the makespan is used to evaluate the performance of the scheduler, the fitness value of each solution can be estimated using (9):

$$fitness = \frac{1}{makespan} \qquad (9)$$

Figure 3 shows the pseudo-code of our proposed method.

Create and initialize swarm with P particles // X, pbest, nbest are n-dimensional and V is an m × n matrix
repeat
    for each particle k = 1, …, P do
        if […];
        end
    end
    for each particle k = 1, …, P do
        for each task j = 1, 2, …, n do
            if pbest_kj […] X_kj […];
            end
        end
        for each task j = 1, 2, …, n do
            for each machine i = 1, 2, …, m do
                calculate p_kij using Equation (6);
            end
            select a machine for allocating to task T_j using Equation (8);
            update the workload of the selected machine;
        end
    end
until stopping condition is true;

Experimental Results
In order to evaluate the performance of the proposed method, it was compared with a genetic algorithm [14] and a continuous PSO [25] for the task assignment problem in multiprocessor systems. The goal of the scheduler in these methods is to minimize the makespan. The methods were implemented in VC++ and run on a Pentium IV 3.2 GHz PC. To optimize the performance of the proposed method, the PSO proposed in [25] and the GA in [14], fine tuning was performed and the best values for their parameters were selected. For the proposed method the following ranges of parameter values were tested: $c_1$ […] [14] for simulating the HC environment.
The simulation model in [14] is based on the expected time to compute (ETC) matrix for 512 tasks and 16 machines. The instances of the benchmark are classified into 12 different types of ETC matrices according to the following three metrics: task heterogeneity, machine heterogeneity, and consistency. In an ETC matrix, the amount of variance among the execution times of the tasks on a given machine is defined as task heterogeneity. Machine heterogeneity represents the variation that is possible among the execution times of a given task across all the machines. An ETC matrix is said to be consistent if, whenever a machine $M_i$ executes any task $T_j$ faster than machine $M_k$, machine $M_i$ executes all tasks faster than machine $M_k$. In contrast, inconsistent matrices characterize the situation where machine $M_i$ may be faster than machine $M_k$ for some tasks and slower for others. Partially-consistent matrices are inconsistent matrices that include a consistent sub-matrix of a predefined size [14]. Instances consist of 512 tasks and 16 machines and are labeled as u-x-yy-zz as follows:
• u means that a uniform distribution was used in generating the matrices.
• x shows the type of consistency; c means consistent, i means inconsistent, and p means partially-consistent.
• yy indicates the heterogeneity of the tasks; hi means high and lo means low.
• zz represents the heterogeneity of the machines; hi means high and lo means low.
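The consistency property can be checked mechanically. The following Python sketch applies the definition literally to an ETC matrix stored as `etc[j][i]` (task `j`, machine `i`); the function name `is_consistent` and the toy matrices are assumptions for illustration, and ties in execution time are treated as "not faster".

```python
def is_consistent(etc):
    """Return True if, for every ordered pair of machines (a, b),
    machine a being faster than b on some task implies that a is
    faster than b on every task."""
    m = len(etc[0])
    for a in range(m):
        for b in range(m):
            if a == b:
                continue
            faster = [row[a] < row[b] for row in etc]
            if any(faster) and not all(faster):
                return False
    return True


# Machine 0 is always the fastest, machine 2 always the slowest.
print(is_consistent([[1.0, 2.0, 3.0],
                     [2.0, 4.0, 6.0]]))  # True

# Machine 0 wins on the first task but loses on the second.
print(is_consistent([[1.0, 2.0],
                     [3.0, 1.0]]))       # False
```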
In our experiment, the initial population for the compared methods is generated using two scenarios: (a) randomly generated particles from a uniform distribution, and (b) one particle using the min-min heuristic (that can achieve a very good reduction in makespan [6,14]) and the others are random solutions.
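For scenario (b), a seed solution can be produced with the min-min heuristic. The sketch below follows the commonly described form of min-min: for every unassigned task find the machine giving its minimum completion time, then assign the task whose minimum completion time is smallest, and repeat. The function name, 0-based indices and tie-breaking by lowest task index are assumptions; the variant used in [6,14] may differ in such details.

```python
def min_min(etc, workload):
    """Return (assignment, makespan) built by the min-min heuristic.

    etc[j][i]   : estimated time of task j on machine i (n x m)
    workload[i] : previous workload of machine i
    """
    n, m = len(etc), len(etc[0])
    ready = list(workload)          # time at which each machine is free
    assignment = [None] * n
    unassigned = set(range(n))
    while unassigned:
        best = None                 # (completion time, task, machine)
        for j in sorted(unassigned):
            # Machine with the earliest completion time for task j.
            i = min(range(m), key=lambda i: ready[i] + etc[j][i])
            candidate = (ready[i] + etc[j][i], j, i)
            if best is None or candidate < best:
                best = candidate
        ct, j, i = best
        assignment[j] = i
        ready[i] = ct
        unassigned.remove(j)
    return assignment, max(ready)


# Toy instance (hypothetical numbers): 3 tasks, 2 machines.
etc = [[3.0, 5.0],
       [2.0, 1.0],
       [4.0, 6.0]]
print(min_min(etc, [1.0, 0.0]))  # ([0, 1, 1], 7.0)
```

The resulting assignment vector has the same form as a particle position, so it can be placed directly into the initial swarm alongside randomly generated particles.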
The statistical results of over 50 independent runs are compared in Table 1 for scenario (a). In the table the first column indicates the instance name, the second, third, and fourth columns indicate the makespan achieved by GA [14], PSO [25] and our proposed method respectively.
As shown in Table 1, the proposed PSO approach achieved the best results in all instances. Our method also achieves a large reduction in makespan in all instances; this is due to the heuristic $\eta$ used in the proposed method, which minimizes the makespan efficiently.

[Table 1: makespan achieved by GA [14], PSO [25] and the proposed method for scenario (a).]

Table 2 shows the statistical results of over 50 independent runs in scenario (b). As shown in this table, the min-min heuristic can obtain a good reduction in makespan. In this scenario our method surpasses the others in most instances, except those with low heterogeneity in both tasks and machines. Figures 4 and 5 show the standard deviation of the compared methods for scenario (a) and scenario (b), respectively. As shown in Figure 4, the proposed method has the lowest standard deviation; this is due to the use of the heuristic $\eta$ in our method. Figure 5 also shows that the magnitude of the standard deviation decreases in scenario (b) thanks to the use of the min-min heuristic. In this scenario, the PSO approach proposed in [25] has the lowest standard deviation in most instances, and our method also has an acceptable standard deviation. Figure 6 shows a comparison of the CPU times the compared methods require to achieve their results. The proposed method needs the lowest time for convergence in most cases; however, as the number of tasks and the problem search space grow, the time needed by the proposed method increases more than that of the GA, and in the case of 1,024 tasks the GA scheduler needs the lowest time for convergence.

[Figures 4 and 5: standard deviation of the compared methods for scenarios (a) and (b).]

[Figure 6: Comparison of convergence time between different methods.]

Conclusions
To exploit the different capabilities of a suite of heterogeneous resources effectively and satisfy users with high expectations for their applications, a crucial problem that needs to be solved in the framework of HC is the scheduling problem. In this paper, we have combined a particle swarm optimization approach with a heuristic for scheduling tasks in distributed heterogeneous systems so as to minimize the makespan. The performance of the proposed method was compared with GA and continuous PSO by carrying out exhaustive simulation tests under different settings. Experimental results show that our method surpasses the other techniques in most cases. In the future, we will formulate the proposed method for minimizing makespan and flowtime as a multi-objective problem.