A Hybrid Metaheuristic for Multi-Objective Scientific Workflow Scheduling in a Cloud Environment

Abstract: Cloud computing has emerged as a high-performance computing environment with a large pool of abstracted, virtualized, flexible, and on-demand resources and services. Scheduling of scientific workflows in a distributed environment is a well-known NP-complete problem and therefore intractable with exact solutions. It becomes even more challenging on the cloud computing platform due to its dynamic and heterogeneous nature. The aim of this study is to optimize multi-objective scheduling of scientific workflows in a cloud computing environment based on the proposed metaheuristic-based algorithm, Hybrid Bio-inspired Metaheuristic for Multi-objective Optimization (HBMMO). The strong global exploration ability of the nature-inspired metaheuristic Symbiotic Organisms Search (SOS) is enhanced by involving an efficient list-scheduling heuristic, Predict Earliest Finish Time (PEFT), in the proposed algorithm to obtain better convergence and diversity of the approximate Pareto front in terms of reduced makespan, minimized cost, and efficient load balance of the Virtual Machines (VMs). The experiments using different scientific workflow applications highlight the effectiveness, practicality, and better performance of the proposed algorithm.


Introduction
Cloud computing has emerged as an effective distributed computing utility which may be used for deploying large and complex scientific workflow applications [1,2]. Workflows decompose complex, data-intensive applications into smaller tasks and execute those tasks in serial or parallel depending on the nature of the application. A workflow application is represented graphically using a Directed Acyclic Graph (DAG) to reflect the interdependencies among the workflow's tasks, where the nodes represent computational tasks of the workflow and the directed edges between the nodes determine data dependencies (that is, data transfers), control dependencies (that is, order of execution), and precedence requirements between the tasks. However, resource allocation and scheduling of tasks of a given workflow in a cloud environment are issues of great importance.
Optimization of workflow scheduling is an active research area in the Infrastructure as a Service (IaaS) cloud. It is an NP-complete problem, so building an optimum workflow scheduler with reasonable performance and computation speed is very challenging in the heterogeneous distributed environment of clouds [3].
A Multi-objective Optimization Problem (MOP) is characterized by multiple conflicting objectives that require simultaneous optimization. Unlike single-objective optimization, there is no single feasible solution that optimizes all objective functions; instead, a set of non-dominated solutions with optimal trade-offs, known as Pareto optimal solutions, can be found for MOPs. Nature-inspired metaheuristics such as Harmony Search (HS), the Immune Algorithm (IA), the League Championship Algorithm (LCA), the Lion Optimization Algorithm (LOA), the Memetic Algorithm (MA), Particle Swarm Optimization (PSO), and Simulated Annealing (SA) [16] have been applied in solving the task scheduling problem.
A metaheuristic algorithm can be improved in terms of solution quality or convergence speed by combining it with another population-based metaheuristic or some local search-based metaheuristic [17]. Domanal et al. (2017) [18] proposed a hybrid bio-inspired algorithm for task scheduling and resource management of cloud resources in terms of efficient resource utilization, improved reliability, and reduced average response time. Pooranian et al. (2015) [19] hybridized a gravitational emulation local search strategy with particle swarm optimization to improve the obtained solution. The authors of [20] proposed an SA-based SOS in order to improve the convergence rate and quality of the solution.
The MOP perspective is a very promising direction for tackling the problem of workflow scheduling in the cloud. Zhang (2014) [21] used the MOP approach based on Pareto optimal non-dominated solutions for the workflow scheduling problem in the cloud. The authors of [2] proposed an evolutionary multi-objective scheduling for cloud (EMS-C) algorithm to solve the workflow scheduling problem on the IaaS platform. Extensions of HEFT [10], the Pareto Optimal Scheduling Heuristic (POSH) [22], and Multi-Objective Heterogeneous Earliest Finish Time (MOHEFT) [3] were designed to provide users with a set of trade-off optimal solutions for scheduling workflows in the cloud. A multi-objective heuristic algorithm, Min-min based time and cost tradeoff (MTCT), was proposed by Xu et al. (2016) [23]. The Balanced and file Reuse-Replication Scheduling (BaRRS) algorithm was proposed to select the optimal solution based on makespan and cost [24]. However, these approaches focus on only two objectives.
Recently, some hybrid multi-objective algorithms have been proposed that combine the good features of two or more approaches: adaptive hybrid PSO [25], the hybrid multi-objective population migration algorithm [26], Multi-Objective SOS (MOSOS) with an adaptive penalty function [27], non-dominance sort-based Hybrid PSO (HPSO) [28], and the Fragmentation-Based Genetic Algorithm (FBGA) [29]. Although there has been considerable research conducted on Pareto-based optimal methods [30][31][32], further study is needed to enhance the convergence and diversity of the approximate Pareto front in the context of cloud computing. Table 1 summarizes important notations and their definitions used throughout this paper.

Problem Description for the Proposed Methodology

System Model
The cloud data center used in this study is represented by a set of k heterogeneous VMs, M = {m 1 , m 2 , m 3 , . . . , m k }, where m r ∈ M such that 1 ≤ r ≤ k, as shown in Figure 1. Each VM has its own processing speed measured in Millions of Instructions Per Second (MIPS), memory in Megabytes (MB), storage space in MB, bandwidth in Megabits per second (Mbps), and cost per unit of time. Tasks of a scientific workflow application can be represented by a DAG, W = (V, E), where V = {v 1 , v 2 , v 3 , . . . , v n } is the set of vertices representing the n different tasks of the workflow, and E is the set of directed edges between the vertices representing dependencies and precedence constraints. An edge e ij ∈ E between the tasks v i and v j indicates the precedence constraint that the task v j cannot start its execution before v i finishes and sends all the needed output data to task v j . In this case, task v i is considered one of the immediate predecessors of task v j , and task v j is considered one of the immediate successors of task v i . Task v i can have multiple predecessor and successor tasks, denoted as pred(v i ) and succ(v i ), respectively. A task is considered ready when all its predecessors have finished execution. Each task v i is assumed to have a workload, denoted by W L i , which determines its runtime on a specific VM type. Also, each edge e ij has a weight that indicates the data transfer size of the output data from v i to v j , denoted by DS i . Any task without a predecessor is called the entry task v entry , and a task with no successor is called the exit task v exit , i.e., pred(v entry ) = ∅ and succ(v exit ) = ∅, respectively. In this work, we assume that the given workflow has a single v entry and a single v exit .
So, if a given workflow has more than one entry or exit task, then a virtual v entry or v exit task with W L entry = 0, DS entry = 0, ST entry = 0, FT entry = 0, ET entry = 0, and CT entry = 0 is added to the DAG.
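As a concrete illustration, the virtual entry/exit construction above can be sketched in Python. The task and edge containers below are hypothetical, chosen only for this example, and are not part of the paper's implementation:

```python
def add_virtual_entry_exit(tasks, edges):
    """Ensure the DAG has a single entry and a single exit task.

    tasks: dict mapping task id -> workload WL_i
    edges: dict mapping (i, j) -> data transfer size DS of edge e_ij
    Virtual tasks get zero workload and zero-size edges, so they add
    no execution or communication time.
    """
    preds = {t: set() for t in tasks}
    succs = {t: set() for t in tasks}
    for (i, j) in edges:
        preds[j].add(i)
        succs[i].add(j)

    entries = [t for t in tasks if not preds[t]]
    exits = [t for t in tasks if not succs[t]]

    if len(entries) > 1:                  # add v_entry with WL_entry = 0
        tasks["v_entry"] = 0
        for t in entries:
            edges[("v_entry", t)] = 0     # DS = 0 => zero transfer time
    if len(exits) > 1:                    # add v_exit with WL_exit = 0
        tasks["v_exit"] = 0
        for t in exits:
            edges[(t, "v_exit")] = 0
    return tasks, edges
```

For example, a workflow with two entry tasks and one exit task gains a single virtual `v_entry` connected to both entries, while no `v_exit` is added.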

Assumptions
The current study considers the following assumptions similar to the work presented by Anwar and Deng (2018) [15].
(1) The workflow application is assumed to be executed in a single cloud data center, so that one possible source of execution delay, storage cost, and data transmission cost between data centers is eliminated. (2) An on-demand pricing model is considered, where any partial utilization of the leased VM is charged as a full time period. (3) The communication time for the tasks executed on the same VM is assumed to be zero. (4) The scheduling of tasks is considered to be non-preemptive, which means that a task cannot be interrupted while being executed until it has completed its execution. (5) Each task can be assigned to a single VM, and a VM can process several tasks. (6) Multi-tenant scenarios are not considered, i.e., each VM can only run one task at a time.
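Assumption (2), under which any partial utilization of a leased VM is billed as a full time period, can be illustrated with a short sketch (the function name and parameters are illustrative, not part of the paper's model):

```python
import math

def lease_cost(usage_seconds, period_seconds, price_per_period):
    """On-demand pricing: any partial billing period is charged in full,
    so the number of billed periods is the ceiling of the usage ratio."""
    periods = math.ceil(usage_seconds / period_seconds)
    return periods * price_per_period
```

For instance, using a VM for 3700 s under a 3600 s billing period incurs the cost of two full periods.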

Multi-Objective Optimization
A MOP has multiple conflicting objectives which need to be optimized simultaneously. Therefore, the goal is to find good trade-off solutions that represent the best possible compromises among the objectives. A MOP can be formulated as: minimize f(x) = (f 1 (x), f 2 (x), . . . , f d (x)), subject to x ∈ ω, wherein ω represents the decision space and f(x) consists of d objective functions.
Since multi-objective optimization usually involves conflicting objectives, there is no single solution which can optimize all objectives simultaneously. Hence, the desired outcome is a set of solutions, each optimal with respect to some trade-off among the objectives. For this purpose, the concept of Pareto dominance is mostly employed. Given two solutions x 1 and x 2 , x 1 is said to Pareto dominate x 2 if x 1 is at least as good as x 2 in every objective and strictly better in at least one. A solution x * ∈ ω is denoted as Pareto optimal if and only if there exists no x ∈ ω such that x Pareto dominates x * , that is, it is not dominated by any other solution within the decision space. The set of all Pareto optimal solutions is termed the Pareto set, and its image in the objective space is called the Pareto front. Workflow scheduling in the cloud can be seen as a MOP whose goal is to find a set of good trade-off solutions enabling the user to select the desired trade-off amongst the objectives.
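A minimal sketch of Pareto dominance and Pareto-front extraction for minimization objectives; the helper names are hypothetical:

```python
def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (minimization):
    f_a is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f_a, f_b))
            and any(a < b for a, b in zip(f_a, f_b)))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For example, among the bi-objective points (1, 2), (2, 1), (2, 2), and (3, 3), only the first two are non-dominated.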

Problem Formulation
The objectives of the proposed work are to minimize the makespan, cost, and degree of imbalance among the VMs. In the workflow scheduling problem, the fitness of a solution is the trade-off between the three objectives.
The cloud workflow scheduling problem can be formulated as follows. The fitness function f is defined by Equations (3)-(6), where f 1 , f 2 , and f 3 indicate minimizing the three objectives, namely makespan, cost, and degree of imbalance among the VMs, respectively. Equation (7) indicates that the makespan of a workflow depends on the finish time of the exit task. Equation (8) defines the execution time of task v i on VM m r considering the VM's performance variability, which represents the potential uncertainties, variation, or degradation in CPU performance and network resources due to the multi-tenant, heterogeneous, shared, and virtualized nature of real cloud environments. In other words, it is the amount by which the speed of a VM may degrade, which ultimately may increase the execution time of tasks. Equation (9) calculates the communication time between the tasks v i and v j as the ratio of the data transfer size from task v i to v j to the smallest bandwidth between the VMs m r and m s , where m r and m s are the VMs on which v i and v j are executed, respectively. When successive tasks execute on the same VM, CT ij = 0. Equation (10) represents the start time (ST ir ) of task v i on VM m r . It is computed as the maximum of the available time of the VM (AT r ) and, over all predecessors of v i , the predecessor's finish time plus the communication time between that predecessor and v i .
After v i is assigned to run on m r , AT r is updated to the finish time of the last task executed on that VM. Specifically, when v i is the entry task of the application, the start time is simply the available time of the VM m r to which v i is mapped during resource allocation. The finish time (FT ir ) of task v i executed on VM m r is defined by Equation (11). The total execution cost for the workflow is defined in Equation (12). Equation (13) ensures the precedence constraint that a task can only start execution after its predecessor tasks have finished and all the required input data are received. Equation (14) measures the degree of imbalance of all leased VMs based on the Euclidean distance; minimizing this value results in higher utilization of the VMs. Equation (15) defines the utilization rate of VM m r . Equation (16) ensures that a task is assigned to exactly one VM and executed only once. Equation (17) guarantees that a task cannot be interrupted while being executed until it has completed its execution.
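One possible reading of Equations (7)-(15) can be sketched as follows. This is only an illustration under simplified per-second billing; all identifiers are hypothetical, and the exact formulas in the paper (e.g., the billing granularity and the performance-variability model) may differ in detail:

```python
def evaluate_schedule(order, mapping, wl, ds, speed, bw, price, perf_var=0.0):
    """Evaluate makespan, cost, and degree of imbalance for one schedule.

    order   : tasks in a topological order (precedence-safe)
    mapping : task -> VM index
    wl      : task -> workload (instructions)
    ds      : (i, j) -> output data size of edge e_ij
    speed   : VM -> MIPS; bw : VM -> bandwidth; price : VM -> cost per second
    perf_var: fractional slowdown modelling VM performance variability
    """
    ft = {}                                 # finish times FT_ir
    avail = {m: 0.0 for m in speed}         # AT_r: VM available time
    busy = {m: 0.0 for m in speed}          # accumulated busy time per VM
    for t in order:
        m = mapping[t]
        et = wl[t] / (speed[m] * (1.0 - perf_var))           # cf. Eq. (8)
        ready = 0.0
        for (i, j), size in ds.items():
            if j == t and i in ft:
                # cf. Eq. (9): CT = 0 on the same VM
                ct = 0.0 if mapping[i] == m else size / min(bw[mapping[i]], bw[m])
                ready = max(ready, ft[i] + ct)               # cf. Eq. (10)
        st = max(avail[m], ready)
        ft[t] = st + et                                      # cf. Eq. (11)
        avail[m] = ft[t]
        busy[m] += et
    makespan = max(ft.values())                              # cf. Eq. (7)
    cost = sum(busy[m] * price[m] for m in speed)            # cf. Eq. (12)
    util = [busy[m] / makespan for m in speed]               # cf. Eq. (15)
    mean_u = sum(util) / len(util)
    imbalance = sum((u - mean_u) ** 2 for u in util) ** 0.5  # cf. Eq. (14)
    return makespan, cost, imbalance
```

For two sequential tasks of 10 instructions each on a single 10-MIPS VM, the sketch yields a makespan of 2.0 s, a cost of 2.0 units, and zero imbalance.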

Proposed Work
This section describes the proposed multi-objective workflow method HBMMO, which optimizes the scheduling of workflow tasks in the cloud environment. In this section, we show how we extended the discrete version of SOS in order to achieve the required objectives of minimizing both the makespan and the cost of executing workflows on the cloud while efficiently balancing the load of the VMs. The flow diagram of the proposed algorithm is shown in Figure 2 and the pseudocode of our proposed HBMMO technique is presented in Algorithm 1. The following subsections present the phases of the proposed algorithm.

Initialization
The first task of the proposed optimization model is to generate a population of candidate solutions, called an ecosystem, using different initialization schemes, where each candidate solution is called an organism. The initial population includes one schedule generated by the PEFT heuristic, while the remaining schedules are randomly generated under the condition that each organism satisfies all dependencies. The organism generated by the PEFT heuristic can serve as an approximate endpoint of the Pareto front. The user is required to provide all the necessary inputs, including the size of the ecosystem, the number of VMs, and the number of objective functions.
The PEFT heuristic provides guidance that improves the performance of the proposed method and allows for faster convergence to suboptimal solutions. By utilizing the PEFT heuristic, better initial candidate solutions may be obtained. The organisms adjust their positions in the solution space through the three phases of the SOS algorithm. Each organism of the ecosystem represents a valid, feasible schedule of the entire workflow, and an organism's length equals the size of the given workflow. Let N be the number of organisms, n be the number of tasks in a given workflow, and k be the number of VMs for executing the workflow tasks; then the ecosystem is expressed as X = [X 1 , X 2 , X 3 , . . . , X N ]. The position of the i th organism is expressed as a 1 × n vector X i = [x i,1 , x i,2 , . . . , x i,n ], where x i,q ∈ X i such that 1 ≤ x i,q ≤ k. In other words, X i represents a task-VM mapping scheme of the workflow while preserving the precedence constraints. Table 2 shows an example of an organism X i mapping 10 tasks onto 4 VMs. The best position identified by all organisms so far is represented by X best . Thus, each organism represents a potential solution in the solution space for the submitted workflow, and the proposed algorithm is used to find the optimal trade-off solutions.
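The encoding and PEFT-seeded initialization can be sketched as follows; `random_organism` and `init_ecosystem` are hypothetical helpers. Note that in this sketch precedence constraints are enforced when a schedule is evaluated in topological order, not by the vector encoding itself:

```python
import random

def random_organism(n_tasks, k_vms, rng=random):
    """One organism: a 1 x n vector where position q holds the VM index
    (1..k) assigned to task v_q."""
    return [rng.randint(1, k_vms) for _ in range(n_tasks)]

def init_ecosystem(N, n_tasks, k_vms, peft_schedule):
    """Ecosystem of N organisms: one PEFT-generated schedule plus
    N - 1 randomly generated ones."""
    eco = [list(peft_schedule)]
    eco += [random_organism(n_tasks, k_vms) for _ in range(N - 1)]
    return eco
```

For instance, `init_ecosystem(100, 10, 4, peft)` builds an ecosystem of 100 organisms of length 10, the first being the PEFT mapping.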

Fitness Evaluation
At each iteration of the algorithm, the relationship among organisms (i.e., solutions) is decided based on the desired optimization fitness function using their corresponding positions according to Equation (3). Then, the organism with the best fitness value X best is updated.

Optimization
The optimization strategy is performed by applying the three search and update phases (i.e., mutualism, commensalism, and parasitism) to represent the symbiotic interaction between the organisms. The non-dominated organisms found along these phases are stored in an elite ecosystem. The three phases of the symbiotic relationships are described as follows.

Mutualism
The mutualism between organism X i and a randomly selected organism X j with i ≠ j is modeled in Equations (19)-(20).
where MV = (X i + X j )/2 is known as the 'Mutual Vector', which represents the mutualistic characteristics between organisms X i and X j to increase their survival advantage; R(0, 1) is a vector of uniformly distributed random numbers between 0 and 1; X best denotes the organism with the best objective fitness value, i.e., the maximum level of adaptation in the ecosystem; and ABF 1 and ABF 2 are the adaptive benefit factors representing the level of benefit to each of the two organisms X i and X j , respectively, which vary automatically during the search process. The adaptive benefit factors are defined as in [39]. The organisms are updated only if their new fitness is better than their pre-interaction fitness. Otherwise, X new i and X new j are discarded while X i and X j survive to the next population generation.
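The mutualism update can be sketched in its standard continuous SOS form. For simplicity the sketch uses fixed benefit factors in place of the adaptive ABF 1 and ABF 2 of [39], and a rounding step maps positions back to valid VM indices for the discrete scheduling encoding; all identifiers are illustrative:

```python
import random

def mutualism(x_i, x_j, x_best, bf1=1, bf2=2, rng=random):
    """Standard SOS mutualism step: both organisms move toward X_best
    guided by the mutual vector MV = (X_i + X_j) / 2."""
    mv = [(a + b) / 2 for a, b in zip(x_i, x_j)]      # mutual vector
    new_i = [a + rng.random() * (b - m * bf1)
             for a, b, m in zip(x_i, x_best, mv)]
    new_j = [a + rng.random() * (b - m * bf2)
             for a, b, m in zip(x_j, x_best, mv)]
    return new_i, new_j

def discretize(x, k):
    """Round a continuous position back to VM indices in [1, k]."""
    return [min(max(int(round(v)), 1), k) for v in x]
```

The new positions replace the old ones only if they improve the fitness, as stated above.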
After mutualism, the elite ecosystem is shown in Equation (21).

Commensalism
The commensalism between organisms X i and X j with i ≠ j is modeled in Equation (22).
where rand(−1, 1) is a vector of uniformly distributed random numbers between −1 and 1, and X best − X j denotes the benefit given to X i by X j . The organism X i is updated by X new i only if its new fitness is better than its pre-interaction fitness. Otherwise, X new i is discarded while X i survives to the next population generation. After commensalism, the elite ecosystem is shown in Equation (23).
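The commensalism update of Equation (22) can be sketched as follows (identifiers hypothetical; the discrete rounding step shown for mutualism would be applied afterwards):

```python
import random

def commensalism(x_i, x_j, x_best, rng=random):
    """SOS commensalism step: X_i benefits from X_j while X_j is
    unaffected; the benefit term is (X_best - X_j) scaled by a random
    factor in [-1, 1]."""
    return [a + rng.uniform(-1, 1) * (b - c)
            for a, b, c in zip(x_i, x_best, x_j)]
```

As in mutualism, the updated organism replaces X i only if its fitness improves.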

Parasitism
The parasitism between organism X i and a randomly selected organism X j with i ≠ j is implemented as follows.
Let X i be given a role similar to the anopheles mosquito through the creation of an artificial parasite, termed a Parasite Vector (PV), in the search space by fine-tuning stochastically selected attributes of organism X i in order to differentiate PV from X i . A random organism X j is selected as a host to PV and their fitness values are evaluated. If PV has a better fitness value than X j , then X j is replaced by PV; otherwise, PV will no longer be able to survive in the ecosystem. After parasitism, the elite ecosystem is shown in Equation (24).
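A possible sketch of parasite-vector creation; the number and choice of mutated positions are illustrative assumptions:

```python
import random

def parasitism(x_i, k_vms, rng=random):
    """Create a Parasite Vector by re-assigning randomly chosen positions
    of X_i to random VM indices (for very small k the PV may occasionally
    coincide with X_i)."""
    pv = list(x_i)
    n = len(pv)
    for q in rng.sample(range(n), rng.randint(1, n)):
        pv[q] = rng.randint(1, k_vms)
    return pv
```

The PV then competes with a randomly selected host X j and replaces it only if its fitness is better.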

Selection of Best Fit Solutions
The solutions from the elite ecosystem after the optimization process are combined together as given by Equation (25).
The size of the combined population X combined is larger than the number of organisms N in the ecosystem. The fitness of each organism in X combined is checked for dominance against the other members. Then, only N organisms with the highest ranks are selected for the next generation based on fast non-dominated sorting and crowding distance [40]. The solutions are selected based on the non-domination rank of the front to which they belong. If several solutions share the same non-domination rank, then the solutions with higher crowding distance are selected for the next generation. A solution with a higher crowding distance value is less crowded by other solutions, which helps preserve the diversity of the region. Each objective function is normalized prior to computing the crowding distance. Note that the size of the ecosystem comprising the best solutions is kept the same, that is, N. The solution with the highest rank is selected as the best solution X best for the next generation.
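The crowding-distance computation with per-objective min-max normalization, as used in NSGA-II-style selection [40], can be sketched as follows (illustrative, not the paper's implementation):

```python
def crowding_distance(front):
    """Crowding distance of each solution within one non-dominated front.
    Each objective is min-max normalized; boundary solutions get infinite
    distance so they are always preserved."""
    n, d = len(front), len(front[0])
    dist = [0.0] * n
    for m in range(d):
        vals = [p[m] for p in front]
        lo, hi = min(vals), max(vals)
        if hi == lo:
            continue                      # objective is constant on this front
        order = sorted(range(n), key=lambda i: vals[i])
        dist[order[0]] = dist[order[-1]] = float("inf")
        for r in range(1, n - 1):
            dist[order[r]] += (vals[order[r + 1]] - vals[order[r - 1]]) / (hi - lo)
    return dist
```

Solutions in the same front are then kept in decreasing order of crowding distance until N organisms are selected.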
In the proposed work, the fitness evaluation function is normalized to convert all of the objectives into minimization problems in the range [0, 1] and to maximize the spread of the solutions across the Pareto front. The normalized fitness function value across the d objective functions of a solution x is given in Equation (27).

Termination Condition
The termination condition is an important factor that determines the final solutions of the simulation. In this study, the algorithm terminates when a maximum-iterations criterion is satisfied. When the optimization process ends, the final set of all non-dominated solutions in the objective space, called the Pareto front, is presented to the user. According to the scenario presented in this study, a candidate solution belongs to the Pareto front if no other solution is at least as good in all three objectives f 1 (X), f 2 (X), and f 3 (X) and strictly better in at least one of them.

Algorithm 1: The proposed HBMMO algorithm
1  Generate the initial ecosystem X of N organisms
2  Replace one of the organisms by the mapping generated by the PEFT algorithm
3  Initialize X best
4  while termination criteria not fulfilled do
5      // Fitness evaluation phase (Section 4.2)
6      Evaluate the fitness f(X) of each organism // according to Equation (3)
7      Select the best solution as X best
8      // Optimization phase (Section 4.3)
9      // Apply Mutualism (Section 4.3.1)
10     Randomly select X j where i ≠ j
11     Update organisms X i and X j // according to Equations (19)-(20)
12     // Apply Commensalism (Section 4.3.2)
13     Randomly select X j where i ≠ j
14     Update organism X i // according to Equation (22)
15     // Apply Parasitism (Section 4.3.3)
16     Randomly select X j where i ≠ j
17     Create a parasite vector (PV)
18     if fitness of PV is better than X j then
19         accept PV to replace X j
20     else reject PV and keep X j
21     end if
22     // Selection of best fit solution phase (Section 4.4)
23     Generate the combined population X combined
24     Calculate normalized fitness values for each objective // according to Equation (26)
25     Apply the non-dominated sort to find the solutions in fronts F 1 , F 2 , F 3 , . . . , F l
26     Select the N best organisms based on non-domination rank and crowding distance
27 end while
28 Output: the final set of non-dominated solutions (Pareto front)

Experimental Setup
The proposed HBMMO was implemented by conducting simulation experiments using an extension of CloudSim [41] called the WorkflowSim-1.0 toolkit [42], which is a modern framework aimed at modeling and simulating scientific workflow scheduling in cloud computing environments. It provides a higher layer of workflow management and also adds functionalities required to support the analysis of various scheduling overheads. Table 3 gives the parameters used in the simulation setup. Experimentation was carried out with different real workflow applications published by the Pegasus project, including Montage, CyberShake, Epigenomics, LIGO Inspiral Analysis, and SIPHT [43,44]. Montage is an input/output (I/O)-intensive astronomical application for constructing custom mosaics of the sky. CyberShake is a data-intensive application for generating probabilistic seismic hazard curves for a region. Epigenomics is a CPU-intensive workflow for automating various operations in genome sequence processing. LIGO Inspiral Analysis is a CPU-intensive workflow used in gravitational-wave physics. SIPHT is a computation-intensive workflow used in bioinformatics for automating the search for untranslated RNA (sRNA) encoding-genes for bacterial replicons. Datasets of all the mentioned workflows are provided in the form of DAX files (https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator) in XML format. They are later converted to DAG-based workflows by using workflow management system framework tools, such as Pegasus [44]. Figure 3 shows simplified representations of small instances of the workflows used in our experiments, and the characteristics of these workflows are presented in Table 4.

Evaluation Metrics
The performance analysis of the proposed algorithm is carried out with existing state-of-the-art algorithms using the following metrics.

Inverted Generational Distance (IGD)
The inverted generational distance evaluates the proximity between the optimal solutions obtained by the proposed algorithm (that is, the obtained Pareto front) and the true Pareto front [40]. The metric is mathematically given by [40] in Equation (28).
where M is the number of non-dominated solutions obtained in the objective space along the Pareto front, and d i is the Euclidean distance (in objective space) between each solution and the nearest member of the true Pareto front. A result of GD = 0 indicates that all the optimal solutions generated by the proposed algorithm are in the true Pareto front; any other result represents its deviation from the true Pareto front. Therefore, a smaller value of generational distance (GD) reveals a better performance of the achieved solution set. In other words, closer proximity between the obtained Pareto front and the true Pareto front signifies better solutions.
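The metric of Equation (28) can be sketched as follows, assuming the true Pareto front is known (e.g., from a reference run); identifiers are illustrative:

```python
def generational_distance(obtained, true_front):
    """Mean Euclidean distance from each obtained solution to its nearest
    member of the true Pareto front; 0 means all solutions lie on the
    true front."""
    def euclid(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    d = [min(euclid(p, q) for q in true_front) for p in obtained]
    return sum(d) / len(d)
```

For example, obtained points (0, 0) and (3, 4) against a true front containing only (0, 0) give distances 0 and 5, hence a value of 2.5.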

Hypervolume (HV)
This metric indicates the volume of the objective space covered between the obtained Pareto front X and a reference point [45]. For calculating hypervolume, first a vector of non-dominated solutions is generated as an approximation of the actual Pareto front, while the solutions dominated by this vector are discarded. A hypercube hc i is created for each non-dominated solution i ∈ X obtained by the algorithms. Then, a union of all hypercubes is taken. HV is mathematically given in Equation (29).
where hc i is the hypercube for each solution X i . This metric is useful for providing combined information about the convergence and diversity of the Pareto optimal solutions. Algorithms that result in solutions with a large value of HV are desirable, because a larger value signifies that the solution set is close to the Pareto front and also has a good distribution. A result of HV = 0 indicates that there is no solution close to the true Pareto front and the corresponding algorithm fails to produce the optimal solution set. For the purpose of comparison between algorithms, the objective values of the obtained solutions are separately normalized between the interval [0, 1], with 1 representing the optimal value, before calculating the HV. In this study, a reference point (1, 1, 1) is selected in the calculations of HV.
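Exact hypervolume computation in three objectives is somewhat involved; a simple Monte Carlo estimate against the reference point (1, 1, 1) on normalized objectives can be sketched as follows (an illustrative approximation, not the paper's implementation):

```python
import random

def hypervolume_mc(front, ref=(1.0, 1.0, 1.0), samples=100_000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated by a normalized
    front with respect to a reference point (minimization): the fraction
    of random points in the reference box dominated by some solution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        pt = tuple(rng.uniform(0, r) for r in ref)
        if any(all(f <= p for f, p in zip(sol, pt)) for sol in front):
            hits += 1
    vol_ref = 1.0
    for r in ref:
        vol_ref *= r
    return vol_ref * hits / samples
```

A front containing the ideal point (0, 0, 0) dominates the whole reference box, so its estimated HV is 1.0.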

Simulation Results
The proposed HBMMO algorithm was evaluated against a set of well-known techniques for solving multi-objective optimization problems, including NSGA-II [40], MOPSO [46], and MOHEFT [3]. For a fair comparison, all algorithms employed the same number of fitness function evaluations. The parametric values for NSGA-II are set as: population size n = 100, maximum iterations i = 500, crossover rate cr = 0.8, and mutation rate mr = 0.01; for MOPSO: population size n = 100, learning factors c 1 = 1.5 and c 2 = 1.5, and inertia weight w decreasing from 0.9 to 0.7; and for MOHEFT: number of trade-off solutions n = 100. To obtain the Pareto optimal solutions, the scheduling is repeated 50 times for each algorithm and the results are averaged. The VMs are selected randomly such that the fastest VM is three times faster than the slowest one as well as three times more expensive. Figure 4 shows the multi-objective non-dominated solutions obtained for the Montage, CyberShake, Epigenomics, and LIGO workflows, respectively. It shows that a lower makespan is correlated with a higher cost and vice versa. The solutions obtained using HBMMO are more uniformly distributed than those of MOHEFT, MOPSO, and NSGA-II, indicating a better search ability. It can be seen that the Pareto fronts obtained using HBMMO are superior for all of the workflow instances under consideration. Even in the case of CyberShake, where MOHEFT performs comparatively well, HBMMO is still able to maintain better convergence and a more uniform distribution of solutions. Figure 5 shows the results obtained by computing the mean GD for Montage with 25 tasks, CyberShake with 30 tasks, Epigenomics with 24 tasks, LIGO Inspiral Analysis with 30 tasks, and SIPHT with 30 tasks. It is observed that the GD value for the proposed HBMMO algorithm is lower than that of the other algorithms.
This implies that the solution set generated by the proposed method has a better ability to converge towards the true Pareto front. As can be seen from Figure 6, the mean HV of HBMMO is statistically better than that of the comparative algorithms in most of the scenarios. Compared with NSGA-II, the performance gain is over 50% in most of the cases, whereas the improvement rate of HBMMO over MOHEFT is slightly better for small- and medium-size workflows. It can be concluded that HBMMO has better search efficiency and can achieve better non-dominated fronts. Figure 7 shows a graph of the average Degree of Imbalance (DI) of tasks among VMs, which describes the fairness of the tasks' distribution across the VMs; a smaller DI value reveals a better load distribution. HBMMO achieves the best load distribution of tasks among the VMs. Figure 8 shows that HBMMO achieves a better average makespan than the comparative algorithms for all of the benchmark workflows. Figure 9 shows the improvement rate of HBMMO over the MOHEFT algorithm in terms of makespan and cost of the workflow applications. In the case of makespan, HBMMO gains an improvement of 7.26%, 14.04%, 4.54%, and 16.6% over MOHEFT for the Montage, CyberShake, Epigenomics, and LIGO workflows, respectively. The cost of HBMMO was better by 14%, 18%, 8%, and 4% over the MOHEFT algorithm for the Montage, CyberShake, Epigenomics, and LIGO workflows, respectively.

Analysis of Variance (ANOVA) Test
The statistical significance of the experimental results is validated by applying the one-way ANOVA test [47], which determines whether there is any significant variation in the means of the groups. It involves a null hypothesis H 0 and an alternate hypothesis H 1 , where H 0 states that there is no significant difference between the results of the groups and H 1 states that there is a significant difference between the results of the groups. Table 5 shows the results for each workflow. It can be seen that the variation between the groups is significant whereas the variation within the groups is trivial. The proposed method is statistically distinct from the comparative algorithms due to the greater F-statistic and lower p-value. The p-values in the tests are extremely small or close to zero, so they are not given here. Thus, the null hypothesis is rejected and the alternate hypothesis is accepted. Therefore, it is evident that the proposed HBMMO significantly outperforms the other state-of-the-art algorithms.
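The one-way ANOVA F-statistic underlying this test can be sketched from first principles (standard library only; `scipy.stats.f_oneway` computes the same statistic together with the p-value):

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F-statistic: between-group mean square divided by
    within-group mean square; a large F (small p) rejects H0 that all
    group means are equal."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)       # df_between = k - 1
    ms_within = ss_within / (n - k)         # df_within = n - k
    return ms_between / ms_within
```

For example, the groups [1, 2, 3] and [2, 3, 4] yield F = 1.5 with (1, 4) degrees of freedom.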

Conclusions and Future Work
In this paper, a novel Hybrid Bio-inspired Metaheuristic for Multi-objective Optimization (HBMMO) algorithm based on a non-dominated sorting strategy is proposed for the workflow scheduling problem in the cloud with more than two objectives and implemented using WorkflowSim. It is a hybridization of the list-based heuristic algorithm PEFT and a discrete version of the metaheuristic algorithm SOS, which aims to minimize the overall makespan, the overall execution cost, and the degree of imbalance among the VMs. Well-known real-world workflows were selected to evaluate the performance of the proposed method, and the results demonstrate that the proposed HBMMO algorithm is highly effective and promising, with potentially wide applicability for the scientific workflow scheduling problem in an IaaS cloud, attaining a uniformly distributed solution set with better convergence towards the true Pareto optimal front. In future work, we intend to develop an environment-friendly distributed scheduler for VMs across cloud data centers so that energy can be saved and CO 2 emissions can be reduced.