1. Introduction
Cloud computing has emerged as an effective distributed computing utility for deploying large and complex scientific workflow applications [1,2]. Workflows decompose complex, data-intensive applications into smaller tasks and execute those tasks serially or in parallel, depending on the nature of the application. A workflow application is represented graphically by a Directed Acyclic Graph (DAG) that reflects the interdependencies among the workflow's tasks: the nodes represent the computational tasks of the workflow, and the directed edges between the nodes capture data dependencies (that is, data transfers), control dependencies (that is, order of execution), and precedence requirements between the tasks. Consequently, resource allocation and scheduling of the tasks of a given workflow in a cloud environment are issues of great importance.
Optimization of workflow scheduling is an active research area in the Infrastructure as a Service (IaaS) cloud. It is an NP-complete problem, so building an optimal workflow scheduler with reasonable performance and computation speed is very challenging in the heterogeneous distributed environment of clouds [3].
A Multi-objective Optimization Problem (MOP) is characterized by multiple conflicting objectives that require simultaneous optimization. Unlike single-objective optimization, there is no single feasible solution that optimizes all objective functions; instead, a set of non-dominated solutions with optimal trade-offs, known as Pareto optimal solutions, can be found for MOPs. The set of all Pareto optimal solutions in the objective space is called the Pareto front [4]. Many existing studies treat cloud workflow scheduling as a single- or bi-objective optimization problem without considering some important requirements of the users or the providers. Therefore, it is highly desirable to formulate the scheduling of workflow applications as a MOP that takes into account the requirements of both the user and the service provider. For example, the cloud workflow scheduler might consider the user's Quality of Service (QoS) objectives, such as makespan and cost, as well as the provider's objectives, such as efficient load balancing over the Virtual Machines (VMs).
Predict Earliest Finish Time (PEFT) [5] is a heuristic, efficient in terms of makespan, proposed for task scheduling in heterogeneous systems. It assigns priorities to tasks and schedules them in priority order onto the best-known VM. However, list-based heuristics are only locally optimal, so a metaheuristic approach can be very effective in achieving better solutions for workflow scheduling in the cloud. Since each metaheuristic algorithm has its own merits and demerits, hybrid approaches, which combine heuristic rules with metaheuristic algorithms, have been shown to produce better results [6,7] and have attracted much attention in recent years for solving multi-objective workflow scheduling problems in the cloud.
Symbiotic Organisms Search (SOS) [8] is a nature-inspired metaheuristic optimization algorithm inspired by the interactive behavior between organisms living together in an ecosystem in order to survive. SOS is a simply structured, powerful, easy-to-use, and robust algorithm for solving global optimization problems. It has strong global exploration and fast convergence, and it requires only common control parameters, such as the population size and initialization. Recently, a discrete version of SOS [9] was proposed for scheduling a bag of tasks in the cloud environment.
This paper proposes a hybrid metaheuristic for multi-objective workflow scheduling in a cloud based on the list-based heuristic algorithm PEFT and the discrete version of the metaheuristic algorithm SOS to achieve optimum convergence and diversity of the Pareto front. The two conflicting objectives of the proposed scheme Hybrid Bio-inspired Metaheuristic for Multi-objective Optimization (HBMMO) are to minimize makespan and to reduce cost along with the efficient utilization of the VMs. Therefore, the proposed multi-objective approach based on a Pareto optimal non-dominated solution considers the users’ as well as providers’ requirements for workflow scheduling in the cloud.
The remaining sections of the paper are organized as follows. Section 2 discusses the background and investigates related work in the recent literature. Section 3 presents the system model and the problem formulation of the proposed method. Section 4 describes the proposed algorithm. Section 5 discusses the results of a simulation and their analysis. Finally, Section 6 presents the main conclusions of the study.
2. Related Work
Several heuristic and metaheuristic algorithms have addressed workflow scheduling in the cloud environment using different strategies [10,11,12,13,14]. Critical Path On a Processor (CPOP), Heterogeneous Earliest Finish Time (HEFT) [10], Heterogeneous Critical Parent Trees (HCPT), High Performance Task Scheduling (HPS), Performance Effective Task Scheduling (PETS), Lookahead, and PEFT [5] are some of the well-known list-based scheduling heuristics. All of them attempt to find a suitable schedule on the basis of pre-defined rules and the problem size. Hence, they are only locally optimal and infeasible for large and complex workflow scheduling problems in the cloud. Recently, Anwar and Deng (2018) [15] proposed a model, Dynamic Scheduling of Bag of Tasks based workflows (DSB), for scheduling large and complex scientific workflows on elastic, heterogeneous, scalable, and dynamically provisioned VMs; it minimizes the financial cost under a deadline constraint. However, all of these approaches optimize a single objective only.
Metaheuristic-based techniques are used to find near-optimal solutions for these complex workflow scheduling problems. Recently, a number of nature-inspired metaheuristics, such as Artificial Bee Colony (ABC), Ant Colony Optimization (ACO), the Bat Algorithm (BA), Cuckoo Search (CS), Differential Evolution (DE), the Firefly Algorithm (FA), the Genetic Algorithm (GA), Harmony Search (HS), the Immune Algorithm (IA), the League Championship Algorithm (LCA), the Lion Optimization Algorithm (LOA), the Memetic Algorithm (MA), Particle Swarm Optimization (PSO), and Simulated Annealing (SA) [16], have been applied to the task scheduling problem.
A metaheuristic algorithm can be improved in terms of solution quality or convergence speed by combining it with another population-based metaheuristic or with a local search-based metaheuristic [17]. Domanal et al. (2017) [18] proposed a hybrid bio-inspired algorithm for task scheduling and resource management in the cloud, achieving efficient resource utilization, improved reliability, and reduced average response time. Pooranian et al. (2015) [19] hybridized a gravitational emulation local search strategy with particle swarm optimization to improve the obtained solution. Abdullahi and Ngadi (2016) [20] proposed an SA-based SOS in order to improve the convergence rate and quality of the solution.
The MOP is a very promising direction for tackling the problem of workflow scheduling in the cloud. Zhang (2014) [21] used the MOP approach based on Pareto optimal non-dominated solutions for the workflow scheduling problem in the cloud. Zhu et al. (2016) [2] proposed an evolutionary multi-objective scheduling for cloud (EMS-C) algorithm to solve the workflow scheduling problem on the IaaS platform. Extensions of HEFT [10], the Pareto Optimal Scheduling Heuristic (POSH) [22], and Multi-Objective Heterogeneous Earliest Finish Time (MOHEFT) [3] were designed to provide users with a set of trade-off optimal solutions for scheduling workflows in the cloud. A multi-objective heuristic algorithm, Min-min based time and cost tradeoff (MTCT), was proposed by Xu et al. (2016) [23]. The Balanced and file Reuse-Replication Scheduling (BaRRS) algorithm was proposed to select the optimal solution based on makespan and cost [24]. However, these approaches focus on only two objectives.
Recently, some hybrid multi-objective algorithms have combined the good features of two or more approaches: adaptive hybrid PSO [25], the hybrid multi-objective population migration algorithm [26], Multi-Objective SOS (MOSOS) with an adaptive penalty function [27], non-dominance sort-based Hybrid PSO (HPSO) [28], and the Fragmentation-Based Genetic Algorithm (FBGA) [29]. Although considerable research has been conducted on Pareto-based optimal methods [30,31,32], further study is needed to enhance the convergence and diversity of the approximate Pareto front in the context of cloud computing.
3. Problem Description for the Proposed Methodology
Table 1 summarizes important notations and their definitions used throughout this paper.
3.1. System Model
The cloud data center used in this study is represented by a set of m heterogeneous VMs, VMS = {vm_1, vm_2, ..., vm_m}, where vm_k ∈ VMS such that 1 ≤ k ≤ m, as shown in Figure 1. Each VM has its own processing speed measured in Millions of Instructions Per Second (MIPS), memory in Megabytes (MB), storage space in MB, bandwidth in Megabits per second (Mbps), and cost per unit of time.
Tasks of scientific workflow applications can be represented by a DAG, G = (T, E), where T = {t_1, t_2, ..., t_n} is the set of vertices representing the different tasks of the workflow, and E is the set of directed edges between the vertices representing dependencies and precedence constraints. An edge e_{i,j} between the tasks t_i and t_j indicates the precedence constraint that task t_j cannot start its execution before t_i finishes and sends all the needed output data to task t_j. In this case, task t_i is considered one of the immediate predecessors of task t_j, and task t_j is considered one of the immediate successors of task t_i. A task can have multiple predecessor and multiple successor tasks, denoted as pred(t_i) and succ(t_i), respectively. A task is considered a ready task when all its predecessors have finished execution. Each task is assumed to have a workload, denoted by w_i, which determines its runtime on a specific VM type. Also, each edge has a weight that indicates the data transfer size of the output data from t_i to t_j, denoted by data_{i,j}. Any task without a predecessor is called the entry task t_entry, and a task with no successor is called the exit task t_exit, i.e., pred(t_entry) = ∅ and succ(t_exit) = ∅, respectively. In this work, we assume that the given workflow has a single t_entry and a single t_exit. If a given workflow has more than one entry or exit task, then a virtual t_entry or t_exit task with zero workload and zero-weight dependency edges is added to the DAG.
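To make the DAG model above concrete, the following sketch (illustrative class and attribute names, not from the paper) stores task workloads, edge data-transfer sizes, and predecessor/successor sets, and normalizes a workflow to a single entry and exit task by inserting zero-weight virtual tasks:

```python
from collections import defaultdict

class WorkflowDAG:
    """Minimal workflow DAG: tasks carry a workload, edges carry the
    size of the data transferred between dependent tasks."""
    def __init__(self):
        self.workload = {}                  # task -> workload (runtime estimate)
        self.data = {}                      # (t_i, t_j) -> data transfer size
        self.succ = defaultdict(list)       # succ(t_i)
        self.pred = defaultdict(list)       # pred(t_j)

    def add_task(self, t, workload):
        self.workload[t] = workload

    def add_edge(self, t_i, t_j, size):
        self.data[(t_i, t_j)] = size
        self.succ[t_i].append(t_j)
        self.pred[t_j].append(t_i)

    def normalize(self):
        """Ensure a single entry/exit by adding zero-weight virtual tasks."""
        entries = [t for t in self.workload if not self.pred[t]]
        exits = [t for t in self.workload if not self.succ[t]]
        if len(entries) > 1:
            self.add_task("t_entry", 0)
            for t in entries:
                self.add_edge("t_entry", t, 0)
        if len(exits) > 1:
            self.add_task("t_exit", 0)
            for t in exits:
                self.add_edge(t, "t_exit", 0)
```

A task is "ready" exactly when every task in `pred[t]` has finished, which is the condition the scheduler checks before dispatching it.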
3.2. Assumptions
The current study makes the following assumptions, similar to the work presented by Anwar and Deng (2018) [15]:
- (1) The workflow application is assumed to be executed in a single cloud data center, so that one possible source of execution delay, storage cost, and data transmission cost between data centers is eliminated.
- (2) An on-demand pricing model is considered, where any partial utilization of a leased VM is charged as a full time period.
- (3) The communication time between tasks executed on the same VM is assumed to be zero.
- (4) The scheduling of tasks is non-preemptive, which means that a task cannot be interrupted while being executed until it has completed its execution.
- (5) Each task can be assigned to a single VM, and a VM can process several tasks.
- (6) Multi-tenant scenarios are not considered, i.e., each VM can only run one task at a time.
- (7) The processing capacity of a VM is provided either by the IaaS provider or can be calculated based on the work reported by Ostermann et al. (2010) [33]. The estimated times are scaled by the processing capacity of the VM instances, i.e., 1 s of each task in a workflow runs for 1 s on a VM instance with one Elastic Compute Unit (ECU). Note that an ECU is the central processing unit (CPU) performance unit defined by Amazon. The processing capacity of an ECU (based on a 1.1 GHz 2007 Opteron core performing 4 flops per Hz) was estimated at 4.4 GFLOPS (Giga Floating Point Operations Per Second) [33].
- (8) When a VM is leased, a boot time of 97 s for proper initialization is considered, based on the measurements reported by Mao and Humphrey (2011) [34] for the Amazon EC2 cloud.
- (9) We adopted a performance degradation of 24% in the Amazon EC2 cloud, similar to the work presented in [6,35,36], based on results achieved by Schad et al. (2010) [37] and Schad (2015) [38].
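Assumptions (2) and (7)-(9) translate directly into small cost and runtime helpers. The sketch below is illustrative (the function names and billing-period parameter are assumptions); only the 97 s boot time and 24% degradation constants come from the text:

```python
import math

BOOT_TIME = 97        # seconds of VM initialization (Mao and Humphrey, assumption (8))
DEGRADATION = 0.24    # 24% worst-case CPU performance degradation (assumption (9))

def lease_cost(lease_seconds, period_seconds, price_per_period):
    """Assumption (2): any partial billing period is charged as a full one.
    Boot time counts toward the lease duration."""
    periods = math.ceil((BOOT_TIME + lease_seconds) / period_seconds)
    return periods * price_per_period

def degraded_runtime(base_runtime, ecus):
    """Assumption (7): runtime scales with capacity in ECUs; assumption (9):
    the effective speed may be degraded by up to 24%."""
    return (base_runtime / ecus) / (1 - DEGRADATION)
```

For example, a lease of 3503 s plus the 97 s boot time fills exactly one hourly period, while one extra second of work spills into a second fully charged period.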
3.3. Multi-Objective Optimization
A MOP has multiple conflicting objectives that need to be optimized simultaneously. Therefore, the goal is to find good trade-off solutions that represent the best possible compromises among the objectives. A MOP can be formulated as:

minimize F(x) = (f_1(x), f_2(x), ..., f_k(x))

subject to x ∈ Ω,

where Ω represents the decision space and F(x) consists of k objective functions. Since multi-objective optimization usually involves conflicting objectives, there is no single solution that optimizes all objectives simultaneously. Hence, the desired solution is any feasible solution that offers an optimal trade-off among the objectives. For this purpose, the concept of Pareto dominance is commonly employed. Given two solutions x, y ∈ Ω, solution x Pareto dominates y if and only if f_i(x) ≤ f_i(y) for every i ∈ {1, ..., k} and f_j(x) < f_j(y) for at least one j ∈ {1, ..., k}, where f_i is the i-th objective function in the k-dimensional objective space. A solution x* ∈ Ω is Pareto optimal if and only if there is no y ∈ Ω such that y Pareto dominates x*, that is, it is not dominated by any other solution within the decision space. The set of all Pareto optimal solutions is termed the Pareto set, and its image in the objective space is called the Pareto front. Workflow scheduling in the cloud can be seen as a MOP whose goal is to find a set of good trade-off solutions enabling the user to select the desired trade-off among the objectives.
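The dominance test described above can be expressed directly in code. This is a minimal sketch for minimization objectives (both helper names are illustrative):

```python
def dominates(fx, fy):
    """True if objective vector fx Pareto dominates fy (all objectives
    minimized): fx is no worse in every objective and strictly better
    in at least one."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def pareto_front(points):
    """Keep only the non-dominated objective vectors (the Pareto front
    of the given finite set)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, with makespan/cost pairs [(1, 5), (2, 2), (3, 1), (4, 4)], the point (4, 4) is dominated by (2, 2) and is excluded from the front.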
3.4. Problem Formulation
The objectives of the proposed work are to minimize the makespan, cost, and degree of imbalance among the VMs. In the workflow scheduling problem, the fitness of a solution is the trade-off between the three objectives.
The cloud workflow scheduling problem can be formulated as: minimize F = (f_1, f_2, f_3), subject to the constraints given in Equations (13), (16), and (17).

The fitness function is defined by Equations (3)-(6), where f_1, f_2, and f_3 indicate minimizing the three objectives, namely the makespan, the cost, and the degree of imbalance among the VMs, respectively. Equation (7) indicates that the makespan of a workflow depends on the finish time of the exit task. Equation (8) defines the execution time ET(t_i, vm_k) of task t_i on VM vm_k, considering the VM's performance variability, which represents the potential uncertainties, variation, or degradation in CPU performance and network resources due to the multi-tenant, heterogeneous, shared, and virtualized nature of real cloud environments. In other words, it is the amount by which the speed of a VM may degrade, which ultimately degrades the execution time of tasks. Equation (9) calculates the communication time CT(t_i, t_j) between the tasks t_i and t_j, which is the ratio of the data transfer size from task t_i to t_j to the smallest bandwidth between the VMs on which t_i and t_j are executed, respectively. When successive tasks execute on the same VM, CT(t_i, t_j) = 0. Equation (10) represents the start time ST(t_i, vm_k) of task t_i to be executed on VM vm_k. It is computed from the available time avail(vm_k) of the VM and the maximum, over all predecessors of t_i, of the predecessor's finish time plus the communication time between that predecessor and t_i. After t_i is assigned to vm_k, avail(vm_k) is updated to the finish time of the immediate predecessor task that has been executed on the same VM. Specifically, when t_i is the entry task of the application, its start time is the available time of the VM to which it is mapped during resource allocation. The finish time FT(t_i, vm_k) of task t_i executed on VM vm_k is defined by Equation (11). The total execution cost of the workflow is defined in Equation (12). Equation (13) ensures the precedence constraint that a task can only start execution after its predecessor tasks have finished and all the required input data has been received.

Equation (14) measures the degree of imbalance of all leased VMs based on the Euclidean distance; minimizing this value results in higher utilization of the VMs. Equation (15) defines the utilization rate of VM vm_k. Equation (16) ensures that a task is assigned to exactly one VM and executed only once. Equation (17) guarantees that a task cannot be interrupted while being executed until it has completed its execution.
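The three objectives can be sketched for a given task-to-VM mapping. The following simplified evaluation (all names are illustrative) assumes tasks on the same VM run back-to-back and ignores precedence and communication times, which the paper's Equations (7)-(15) additionally account for:

```python
import math

def objectives(schedule, runtime, price):
    """schedule: task -> vm; runtime[task][vm]: execution time of the
    task on that VM; price[vm]: cost per time unit.
    Returns (makespan, cost, degree_of_imbalance)."""
    busy = {vm: 0.0 for vm in price}
    for task, vm in schedule.items():
        busy[vm] += runtime[task][vm]
    makespan = max(busy.values())                        # f1 (simplified)
    cost = sum(busy[vm] * price[vm] for vm in price)     # f2 (simplified)
    util = [busy[vm] / makespan for vm in price]         # per-VM utilization rate
    mean = sum(util) / len(util)
    # f3: Euclidean distance of utilizations from their mean (imbalance)
    imbalance = math.sqrt(sum((u - mean) ** 2 for u in util))
    return makespan, cost, imbalance
```

A perfectly balanced schedule drives the third component to zero, which is exactly what minimizing the degree of imbalance rewards.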
4. Proposed Work
This section describes the proposed multi-objective workflow method HBMMO, which optimizes the scheduling of workflow tasks in the cloud environment. In this section, we show how we extended the discrete version of SOS in order to achieve the required objectives of minimizing both the makespan and the cost of executing workflows on the cloud and efficiently balance the load of the VMs. The flow diagram of the proposed algorithm is shown in
Figure 2, and the pseudocode of our proposed HBMMO technique is presented in Algorithm 1. The following subsections describe the phases of the proposed algorithm.
4.1. Initialization
The first step of the proposed optimization model is generating a population of candidate solutions, called an ecosystem, using different initialization schemes, where each candidate solution is called an organism. The initial population includes one schedule generated by the PEFT heuristic, and the remaining schedules are randomly generated under the condition that each organism satisfies all dependencies. The organism generated by the PEFT heuristic can serve as an approximate endpoint of the Pareto front. The user is required to provide all the necessary inputs, including the size of the ecosystem, the number of VMs, and the number of objective functions. The PEFT heuristic provides guidance that improves the performance of the proposed method and allows faster convergence to near-optimal solutions; by utilizing it, better initial candidate solutions may be obtained. The organisms adjust their positions in the solution space through the three phases of the SOS algorithm. Each organism of the ecosystem represents a valid, feasible schedule of the entire workflow, and an organism's length equals the size of the given workflow. Let N be the number of organisms, n be the number of tasks in a given workflow, and m be the number of VMs for executing the workflow tasks; then the ecosystem is expressed as Eco = {X_1, X_2, ..., X_N}. The position of the i-th organism, expressed as a vector of n elements, can be given as X_i = (x_{i,1}, x_{i,2}, ..., x_{i,n}), where x_{i,j} ∈ {1, 2, ..., m} such that 1 ≤ j ≤ n. In other words, X_i represents a task-VM mapping scheme of the workflow while preserving the precedence constraints. Table 2 shows an example of an organism X_i mapping 10 tasks onto 4 VMs. The best position identified by all organisms so far is represented by X_best. Since each organism represents a mapping of the tasks of a given workflow to the VMs while keeping the precedence constraints, each organism is a potential solution in the solution space for the submitted workflow, and the proposed algorithm is used to find the optimal one.
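The encoding and initialization above can be sketched as follows. This is an illustrative sketch (function names assumed): an organism is simply a length-n vector of VM indices, and the PEFT-generated mapping, if supplied, replaces one random organism to guide convergence:

```python
import random

def random_organism(n_tasks, n_vms, rng=random):
    """An organism X_i: the j-th element is the index of the VM
    assigned to task j."""
    return [rng.randrange(n_vms) for _ in range(n_tasks)]

def init_ecosystem(n_org, n_tasks, n_vms, peft_schedule=None):
    """Randomly generated ecosystem, optionally seeded with the PEFT
    mapping as one organism (as the proposed method does)."""
    eco = [random_organism(n_tasks, n_vms) for _ in range(n_org)]
    if peft_schedule is not None:
        eco[0] = list(peft_schedule)
    return eco
```

Note that the vector only records the task-VM assignment; precedence constraints are enforced when the schedule is decoded and evaluated, since tasks are dispatched in dependency order.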
4.2. Fitness Evaluation
At each iteration of the algorithm, the relationship among organisms (i.e., solutions) is decided based on the desired optimization fitness function using their corresponding positions according to Equation (3). Then, the organism with the best fitness value is updated.
4.3. Optimization
The optimization strategy is performed by applying the three search and update phases (i.e., mutualism, commensalism, and parasitism) to represent the symbiotic interaction between the organisms. The non-dominated organisms found along these phases are stored in an elite ecosystem. The three phases of the symbiotic relationships are described as follows.
4.3.1. Mutualism
The mutualism between organism X_i and a randomly selected organism X_j, with j ≠ i, is modeled in Equations (19) and (20):

X_i^new = X_i + R × (X_best − MV × BF_1)  (19)

X_j^new = X_j + R × (X_best − MV × BF_2)  (20)

where MV = (X_i + X_j)/2 is known as the 'Mutual Vector', which represents the mutualistic characteristics between organisms X_i and X_j that increase their survival advantage; R is a vector of uniformly distributed random numbers between 0 and 1; X_best denotes the organism with the best objective fitness value, i.e., the maximum level of adaptation in the ecosystem; and BF_1 and BF_2 are the adaptive benefit factors representing the level of benefit to each of the two organisms X_i and X_j, respectively, which vary automatically during the search process as defined in [39]. The organisms are updated only if their new fitness is better than their pre-interaction fitness. Otherwise, X_i^new and X_j^new are discarded, while X_i and X_j survive to the next population generation.
After mutualism, the elite ecosystem is shown in Equation (21).
4.3.2. Commensalism
The commensalism between organisms X_i and X_j, with j ≠ i, is modeled in Equation (22):

X_i^new = X_i + R × (X_best − X_j)  (22)

where R is a vector of uniformly distributed random numbers between −1 and 1, and (X_best − X_j) denotes the benefit given to X_i by X_j. The organism X_i is replaced by X_i^new only if the new fitness is better than the pre-interaction fitness. Otherwise, X_i^new is discarded, while X_i survives to the next population generation. After commensalism, the elite ecosystem is shown in Equation (23).
4.3.3. Parasitism
The parasitism between organism X_i and a randomly selected organism X_j, with j ≠ i, is implemented as follows. X_i is given a role similar to the anopheles mosquito through the creation of an artificial parasite, termed the Parasite Vector (PV), in the search space by fine-tuning stochastically selected attributes of organism X_i in order to differentiate PV from X_i. A random organism X_j is selected as a host to PV, and their fitness values are evaluated. If PV has a better fitness value than X_j, then X_j is replaced by PV; otherwise, PV will no longer be able to survive in the ecosystem. After parasitism, the elite ecosystem is shown in Equation (24).
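A discrete analogue of the three phases can be sketched on the integer task-VM vectors. This is an assumed sketch, not the paper's exact discretization: the continuous update rules are applied componentwise and the results are rounded and clamped back to valid VM indices:

```python
import random

def _clamp(v, n_vms):
    """Round a continuous update back to a valid VM index."""
    return max(0, min(n_vms - 1, int(round(v))))

def mutualism(xi, xj, xbest, n_vms, bf1=1, bf2=2, rng=random):
    """Discrete analogue of Equations (19)-(20): both organisms move
    toward xbest relative to their mutual vector MV = (xi + xj)/2."""
    new_i, new_j = [], []
    for a, b, g in zip(xi, xj, xbest):
        mv = (a + b) / 2.0
        new_i.append(_clamp(a + rng.random() * (g - mv * bf1), n_vms))
        new_j.append(_clamp(b + rng.random() * (g - mv * bf2), n_vms))
    return new_i, new_j

def commensalism(xi, xj, xbest, n_vms, rng=random):
    """Discrete analogue of Equation (22): xi benefits from xj."""
    return [_clamp(a + rng.uniform(-1, 1) * (g - b), n_vms)
            for a, b, g in zip(xi, xj, xbest)]

def parasitism(xi, n_vms, rng=random):
    """Parasite vector: copy xi and randomly perturb a subset of its
    stochastically selected positions."""
    pv = list(xi)
    for k in rng.sample(range(len(pv)), max(1, len(pv) // 3)):
        pv[k] = rng.randrange(n_vms)
    return pv
```

In each phase, the candidate replaces the original organism only if its fitness improves, exactly as the greedy acceptance rule of SOS prescribes.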
4.4. Selection of Best Fit Solutions
The solutions from the elite ecosystems produced by the optimization phases are combined as given by Equation (25). The size of the combined population is larger than the number of organisms N in the ecosystem. The fitness of each organism in the combined population is checked for dominance against the other members. Then, only the N organisms with the highest ranks are selected, based on fast non-dominated sorting and crowding distance [40], for the next generation. The solutions are selected according to the non-domination rank of the front to which they belong. If several solutions have the same rank, then the solutions with higher crowding distance values are selected for the next generation: a solution with a higher crowding distance is less crowded by other solutions, which helps preserve the diversity of the region. Each objective function is normalized prior to computing the crowding distance. Note that the size of the ecosystem comprising the best solutions is kept constant at N. The solution with the highest rank is selected as the best solution X_best for the next generation.
In the proposed work, the fitness evaluation function is normalized to convert all of the objectives into minimization problems in the range [0, 1] and to maximize the spread of the solutions across the Pareto front. The normalized value of the i-th of k objective functions for a solution x is defined as

f_i^norm(x) = (f_i(x) − f_i^min) / (f_i^max − f_i^min),  (26)

where f_i(x) is the i-th objective function value for solution x, and f_i^min and f_i^max are the minimum and maximum values of the i-th objective function in the ecosystem.
The crowding distance is used to select among solutions that have the same rank in a front. The crowding distance of the two boundary solutions is assigned an infinite value. The crowding distance of an intermediate i-th solution in a non-dominated solution set S, denoted CD_i, is defined as the average distance between the two adjacent solutions on either side of it along each of the objectives, as given in Equation (27):

CD_i = Σ_{j=1}^{k} (f_j^{i+1} − f_j^{i−1}) / (f_j^max − f_j^min),  (27)

where |S| is the number of non-dominated solutions obtained, f_j^i is the j-th objective function value of the i-th solution in the set S, and f_j^max and f_j^min are the maximum and minimum normalized values of the j-th objective function in the same set, respectively. Here, the non-dominated solutions with the smallest and largest objective function values, referred to as boundary solutions, are assigned an infinite distance value so that they are always selected.
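Equation (27) can be sketched as follows (a minimal implementation for a single front of objective vectors; the function name is illustrative):

```python
def crowding_distance(front):
    """Per Equation (27): for each objective, sort the front, assign the
    boundary solutions infinite distance, and accumulate for each interior
    solution the normalized gap between its two neighbors."""
    n = len(front)
    k = len(front[0])
    dist = [0.0] * n
    for j in range(k):
        order = sorted(range(n), key=lambda i: front[i][j])
        fmin, fmax = front[order[0]][j], front[order[-1]][j]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if fmax == fmin:
            continue  # degenerate objective: all values identical
        for pos in range(1, n - 1):
            i = order[pos]
            dist[i] += (front[order[pos + 1]][j]
                        - front[order[pos - 1]][j]) / (fmax - fmin)
    return dist
```

Within a front of equal rank, solutions are then sorted by this distance in descending order, so sparser regions of the front are preferred and diversity is preserved.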
4.5. Termination Condition
The termination condition determines when the final solutions are returned by the simulation. In this study, the algorithm terminates when a maximum number of iterations is reached. When the optimization process ends, the final set of all optimal solutions in the objective space, called the Pareto front, is presented to the user. According to the scenario presented in this study, a candidate solution belongs to the Pareto front if no other solution is at least as good in all three objectives f_1, f_2, and f_3 and strictly better in at least one of them.
Algorithm 1. Hybrid Bio-inspired Metaheuristic for Multi-objective Optimization (HBMMO) for Scientific Workflow Scheduling in the Cloud.
Input: Workflow and set of VMs
Output: Pareto optimal set of solutions
1   //Initialization phase (Section 4.1)
2   Initialize parameters
3   Initialize the population with randomly generated solutions where each solution satisfies all constraints
4   Replace one of the organisms by the mapping generated by the PEFT algorithm
5   Initialize X_best
6   while termination criteria not fulfilled do
7       //Fitness evaluation phase (Section 4.2)
8       Evaluate the fitness of each organism //according to Equation (3)
9       Select the best solution as X_best
10      //Optimization phase (Section 4.3)
11      //Apply Mutualism (Section 4.3.1)
12      Randomly select X_j where j ≠ i
13      Update organisms X_i and X_j //according to Equations (19)-(20)
14      //Commensalism (Section 4.3.2)
15      Update X_i //according to Equation (22)
16      //Parasitism (Section 4.3.3)
17      Randomly select X_j where j ≠ i
18      Create a parasite vector (PV)
19      if fitness of PV is better than X_j then
20          accept PV to replace X_j
21      else reject PV and keep X_j
22      end if
23      //Selection of best fit solution phase (Section 4.4)
24      Generate the combined population
25      Calculate normalized fitness values for each objective //according to Equation (26)
26      Apply the non-dominated sort to find the solutions in fronts F_1, ..., F_q, where q is the minimum number of fronts needed to cover N organisms
27      for each front F_l do
28          for each objective function f_j do
29              for each solution i in F_l do // |F_l| is the size of F_l
30                  Evaluate the crowding distance of solution i //according to Equation (27)
31                  Sort F_l according to crowding distance in descending order
32              end for
33              Calculate the total crowding distance value for every front
34          end for
35      end for
36      Store the best solutions as the Pareto set in each generation
37  end while