Article

An Evolutionary Algorithm for Multi-Objective Workflow Scheduling with Adaptive Dynamic Grouping

by Guochen Zhang, Aolong Zhang, Chaoli Sun * and Qing Ye
College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(13), 2586; https://doi.org/10.3390/electronics14132586
Submission received: 7 May 2025 / Revised: 19 June 2025 / Accepted: 23 June 2025 / Published: 26 June 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

For workflow scheduling with complex dependencies in cloud computing environments, existing research predominantly focuses on multi-objective algorithm optimization while neglecting the critical factor of workflow topological structure. The proposed Adaptive Dynamic Grouping (ADG) strategy breaks through this limitation via two innovative mechanisms: first, a dynamic variable grouping model based on task dependencies that effectively compresses the decision space and reduces global search overhead; second, an adaptive resource allocation strategy that dynamically distributes execution opportunities according to each variable group's contribution to optimization, accelerating convergence toward the Pareto frontier. Experimental results on five real-world workflows and virtual machines from three major cloud providers demonstrate ADG's superior performance in simultaneously optimizing execution time, cost, and energy consumption, providing an efficient solution for cloud-based workflow scheduling.

1. Introduction

Cloud computing, as an integration of parallel computing, distributed computing, and other advanced technologies, utilizes internet-based virtualization to deliver elastic computing resources, massive data storage, and efficient processing capabilities [1]. This paradigm enables ubiquitous access to computing resources through on-demand services [2], significantly propelling advancements in data analytics, Internet of Things (IoT), and artificial intelligence (AI) applications. Nevertheless, the escalating complexity of computational tasks on cloud platforms has elevated workflow scheduling to a pivotal research and practical challenge. The optimal scheduling of workflow tasks has emerged as a fundamental requirement for maximizing cloud computing system performance.
A typical workflow comprises interdependent computing tasks with complex topological dependencies. The scheduling challenge extends beyond mere execution time optimization to encompass multiple objectives including operational costs and energy efficiency [3]. Current scheduling methodologies predominantly employ heuristic optimization algorithms to derive approximate solutions [4]. However, these algorithms suffer from inherent limitations in generalization and global search capabilities due to their problem-specific design constraints [4]. While multi-objective scheduling algorithms address multiple constraints, their “black-box” approach fails to leverage the inherent task dependencies and structural knowledge embedded within workflows [5]. This fundamental limitation leads to suboptimal search efficiency, particularly when handling problems with high-dimensional decision spaces [5].
While existing decomposition methods effectively partition large-scale optimization problems, they often fail to fully utilize critical inter-task correlation information. This limitation becomes particularly apparent when handling complex task interdependencies, frequently leading to premature convergence to local optima. To address these challenges, this paper introduces an Adaptive Dynamic Grouping (ADG) algorithm for multi-objective workflow scheduling that innovatively incorporates workflow structural knowledge. The proposed solution features three key components: an intelligent grouping strategy that organizes decision variables into functionally cohesive units based on task parallelism and dependencies, a localized optimization approach that confines perturbations to task-specific operations to reduce computational overhead, and an adaptive resource allocation mechanism that dynamically prioritizes high-value variable groups while replacing underperforming ones to accelerate convergence. Comprehensive experiments were conducted across 20 real-world workflows and 12 VM types from major cloud providers (Amazon EC2, Alibaba Cloud, Microsoft Azure). The results demonstrate ADG’s superior performance in simultaneously optimizing execution time, cost, and energy consumption compared to existing scheduling algorithms.

2. Problem Description

This paper formulates the workflow scheduling problem in cloud computing as a multi-objective optimization problem, focusing on optimizing execution time, cost, and energy consumption while fully accounting for core components such as the workflow's topological structure during scheduling.

2.1. Cloud Computing Workflow Scheduling Model

In cloud computing services, all heterogeneous virtual machines (VMs) are represented by the set $V = \{V_1, V_2, \ldots, V_m\}$, where $V_i$ denotes the $i$-th virtual machine. Each VM is characterized by four basic attributes, $\{Mips, Cpus, Percost, Bandwidth\}$, where $Mips$ and $Cpus$ represent the computing power and the number of CPUs of the VM, respectively; $Percost$ indicates the rental cost per unit time of the VM; and $Bandwidth$ refers to the data transmission rate of the VM. The computing performance of VMs is heterogeneous, meaning that the execution time of the same task may vary across different VMs, and the number of tasks that each VM can handle concurrently also differs. Additionally, a task can only be assigned to one VM at a time [6]. An example of the workflow scheduling process on a cloud platform is shown in Figure 1. The scheduling center randomly distributes 10 cloud tasks uploaded by the user to four virtual machines. After all VMs complete their tasks, the execution results are obtained. Among them, virtual machine $VM_2$ has the longest completion time, so the Makespan of this task scheduling is 10.
A workflow is typically composed of a set of tasks interconnected by constraint dependencies and communication priority relationships, with its task topology usually represented by a directed acyclic graph (DAG) denoted as $G = (T, E)$. Here, $T$ is the set of nodes in the DAG, where each node $t_i \in T$ represents a task ($i$ being the task number, with a total of $n$ tasks), and $E$ represents the set of constraints between tasks. Tasks with dependency relationships are connected by directed edges; for example, $e_{i,j}$ signifies the constraint between task $t_i$ and task $t_j$, meaning task $t_j$ can only execute after receiving data transmitted from task $t_i$. The set of predecessor tasks of node $t_j$ is denoted as $P(t_j)$ (nodes with directed edges pointing to $t_j$), and the set of successor tasks of node $t_j$ is denoted as $S(t_j)$ (nodes pointed to by directed edges from $t_j$). When a task has multiple predecessors, it can only execute after all predecessor tasks are completed and all communication requirements for the task are satisfied. Specifically, a task with no predecessor nodes is called a workflow ingress task, while a task with no successor nodes is called a workflow egress task. In the example shown in Figure 2, when $n = 9$, the predecessor task set of task $t_6$ is $P(t_6) = \{t_1, t_2, t_3\}$, the successor task set of task $t_2$ is $S(t_2) = \{t_6, t_7\}$, the workflow ingress task is $t_1$, and the workflow egress task is $t_9$.
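As a small illustration, the DAG model above can be captured with plain adjacency sets. The edge list below is hypothetical, chosen only to reproduce the Figure 2 examples cited in the text ($P(t_6) = \{t_1, t_2, t_3\}$, $S(t_2) = \{t_6, t_7\}$, entry $t_1$, exit $t_9$); it is not the paper's exact graph.

```python
from collections import defaultdict

# Hypothetical edge list consistent with the Figure 2 examples in the text.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 6), (2, 7),
         (3, 6), (4, 8), (5, 8), (6, 9), (7, 9), (8, 9)]

pred = defaultdict(set)  # P(t_j): predecessors of each task
succ = defaultdict(set)  # S(t_i): successors of each task
for i, j in edges:
    pred[j].add(i)
    succ[i].add(j)

tasks = sorted(set(pred) | set(succ))
entry = [t for t in tasks if not pred[t]]  # ingress tasks (no predecessors)
exit_ = [t for t in tasks if not succ[t]]  # egress tasks (no successors)

print(sorted(pred[6]))  # [1, 2, 3]
print(sorted(succ[2]))  # [6, 7]
print(entry, exit_)     # [1] [9]
```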

2.2. Optimization Goals

2.2.1. The Calculation of Makespan

In workflow scheduling, the execution time of task $t_i$ on virtual machine $v_j$ is computed as follows:

$$CT(t_i, v_j) = \frac{TL(t_i)}{Mips(v_j)}$$

where $TL(t_i)$ denotes the computational workload of task $t_i$ (measured in instructions), and $Mips(v_j)$ represents the processing capacity of $v_j$'s CPU (in millions of instructions per second). The start time of task $t_i$ is determined by the completion of all its predecessor tasks:

$$ST(t_i, v_j) = \max_{t_k \in pre(t_i)} \{TTF(e_{ik})\}$$

where $TTF(e_{ik})$ is the total data transfer time from predecessor task $t_k$ to $t_i$. The finish time of task $t_i$ is the sum of its start and execution times:

$$FT(t_i, v_j) = ST(t_i, v_j) + CT(t_i, v_j)$$

When tasks $t_i$ and $t_k$ reside on different VMs, communication overhead must be considered:

$$TT_{ik}(e_{ik}) = \begin{cases} 0, & \text{if } v(t_i) = v(t_k) \\ \dfrac{m_{ik}}{\max\{bd(v(t_i)), bd(v(t_k))\}}, & \text{otherwise} \end{cases}$$

Here, $m_{ik}$ denotes the data volume transferred between the tasks, and $bd(v(t_i))$ and $bd(v(t_k))$ represent the network bandwidths of their respective VMs. The communication time is zero if the tasks are collocated; otherwise, it is determined by the data volume and the maximum available bandwidth. The end-to-end communication delay between dependent tasks is as follows:

$$TTF(e_{ik}) = TT_{ik}(e_{ik}) + FT(t_k)$$

where $FT(t_k)$ indicates the execution completion time of $t_k$.

For scheduling algorithms, especially in the context of cloud computing, the maximum completion time (Makespan) is the most fundamental evaluation metric. It is defined as the total duration elapsed from the start of the first task in a workflow to the completion of the last task, serving as a critical indicator of a scheduling algorithm's time efficiency. Mathematically, it is expressed as follows:

$$Makespan = \max_{t_i \in T} \{FT(t_i)\}$$
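To make the recursion concrete, here is a minimal Python sketch (not the authors' implementation) of Equations (1)-(6): finish times are propagated through the DAG in topological order. For simplicity it ignores queueing when several tasks share a VM; `order`, `TL`, `mips`, `bw`, `data`, and the assignment `x` are assumed inputs.

```python
def makespan(order, pred, x, TL, mips, bw, data):
    """order: topologically sorted task ids; x[i]: VM index assigned to task i."""
    FT = {}
    for i in order:
        ct = TL[i] / mips[x[i]]                   # Eq. (1): execution time CT
        st = 0.0
        for k in pred[i]:                         # Eq. (2): earliest start time ST
            if x[i] == x[k]:
                tt = 0.0                          # Eq. (4): collocated, no transfer
            else:
                tt = data[(k, i)] / max(bw[x[i]], bw[x[k]])
            st = max(st, FT[k] + tt)              # Eq. (5): TTF = TT + FT(t_k)
        FT[i] = st + ct                           # Eq. (3): FT = ST + CT
    return max(FT.values())                       # Eq. (6): Makespan
```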

2.2.2. Task Execution Cost

In workflow scheduling, the virtual machine leasing cost is another key metric for users, calculated as

$$C = \sum_{j=1}^{m} per(v_j) \times \left\lceil \frac{ACT(v_j)}{l} \right\rceil$$

where $per(v_j)$ denotes the unit leasing price of virtual machine $v_j$, $ACT(v_j)$ is its actual usage duration in hours, $l$ represents the billing cycle (typically 1 h or 1 min), and $\lceil \cdot \rceil$ indicates that partial cycles are rounded up to the next full unit using the ceiling operator.
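A minimal sketch of this billing rule, assuming `per_cost` holds each VM's price per cycle and `usage` its actual usage time in the same unit as the cycle length `l`:

```python
import math

def leasing_cost(per_cost, usage, l=1.0):
    # Each VM's usage is rounded up to whole billing cycles with the ceiling operator.
    return sum(p * math.ceil(a / l) for p, a in zip(per_cost, usage))

# Example: a VM priced at $0.174 per hour used for 2.3 h under hourly billing
# is charged for ceil(2.3) = 3 cycles, i.e. about $0.522.
print(leasing_cost([0.174], [2.3]))
```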

2.2.3. Energy Consumption for Virtual Machines

The energy consumption of virtual machines (VMs) is modeled based on the framework outlined in Reference [7]. The total power consumption consists of two components, an idle power component and an active power component, as described by Equation (8):

$$ES_j = \int_{st}^{et} \left( A(t) \times P_I + \lambda \times f_v(t)^3 \right) dt$$

where $st$ and $et$ denote the power-on time and power-off time of the VM, respectively; $A(t)$ is a binary state variable representing the operational state of the VM at time $t$ (1 for active, 0 for idle); $P_I$ is the power consumption per unit time in the idle state; $f_v(t)$ is the CPU frequency of the VM at time $t$; and $\lambda$ is a constant reflecting the relationship between the operating frequency and supply voltage of the VM.

Thus, the total energy consumption of servers on the cloud computing platform can be expressed as follows:

$$EnergyCost = \sum_{j=1}^{m} ES_j$$

For a task $t_i$, the decision variable $x_i$ represents the index of the assigned VM (ranging from 1 to $m$). For a workflow with $n$ tasks, the corresponding decision variables are collectively defined as $X = \{x_1, x_2, \ldots, x_n\}$, and the size of the decision space is $m^n$. Based on the above analysis, the multi-objective workflow scheduling model can be formulated as follows:

$$\text{Minimize } f(x) = \{f_1(x), f_2(x), f_3(x)\} \quad \text{s.t. } x \in \{1, 2, \ldots, m\}^n$$
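A sketch of this decision representation: a candidate schedule is an integer vector in $\{1, \ldots, m\}^n$, evaluated on the three objectives. The objective callables passed in are assumed to implement the makespan, cost, and energy formulas above.

```python
import random

def random_solution(n, m):
    """A random task-to-VM assignment x in {1, ..., m}^n."""
    return [random.randint(1, m) for _ in range(n)]

def evaluate(x, objectives):
    """objectives: (makespan, cost, energy) callables, each mapping x to a float."""
    return tuple(f(x) for f in objectives)
```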

3. An Evolutionary Algorithm for Workflow Scheduling with Adaptive Dynamic Grouping

The proposed Adaptive Dynamic Grouping (ADG) strategy deeply analyzes the workflow structure and uses a dynamic decision variable grouping mechanism to cluster large-scale variables based on task dependency relationships. This approach compresses the decision space, reduces problem complexity, and avoids the high cost of global search. The strategy can be embedded into the population selection and evaluation processes of existing multi-objective optimization algorithms to enhance their optimization capabilities. The specific embedding process is described in Algorithm 1.
Algorithm 1 The pseudocode of ADG
1: G ← GroupDecisionVariables()
2: Initialize a population P
3: Calculate HV(P) on the non-dominated solutions of P
4: for g = 1 → |G| do
5:     Generate new values for the decision variables in G_g for P
6:     P′ ← the regenerated population on group G_g based on P
7:     Non-dominated sorting of P′ and calculation of HV(P′)
8:     ΔC_g ← max{HV(P′) − HV(P), 0}
9: end for
10: while the stop condition is not reached do
11:     G_k ← roulette wheel selection based on ΔC
12:     for l = 1 → L do
13:         if ΔC_k^l + ΔC_k^(l−1) = 0 then
14:             G_k′ ← Subdivide(G_k)
15:             Replace G_k with G_k′ in G
16:             TC ← 0
17:         end if
18:         Q ← Reproduction(P_k)
19:         P ← EnvironmentalSelection(P ∪ Q)
20:         Update the non-dominated solutions using P
21:         HV(P′) ← the HV value of the new P
22:         ΔC_k^l ← HV(P′) − HV(P)
23:         HV(P) ← HV(P′)
24:         TC ← ΔC_k^l + TC
25:     end for
26:     ΔC_k ← max{TC/L, 0}
27: end while

3.1. The Encoding Method

Assume that the current task scheduling problem involves six tasks and four virtual machines (VMs). The encoding process is shown in Figure 3, with the following specific mappings: tasks $t_1$ and $t_3$ are allocated to VM $v_4$, tasks $t_2$ and $t_6$ to VM $v_1$, task $t_4$ to VM $v_3$, and task $t_5$ to VM $v_2$. Given the complex dependencies among workflow tasks, traditional scheduling methods that rely on pre-randomized task sequences are ill-suited. Therefore, this study employs a single-layer encoding strategy, which directly encodes task-VM mappings, bypassing redundant task ordering and focusing on the core problem of resource allocation. The proposed task priority scheduling strategy is detailed in Section 3.4.
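The Figure 3 mapping as a worked example: position $i$ of the vector holds the VM assigned to task $t_i$.

```python
from collections import defaultdict

# Single-layer encoding from Figure 3: index = task id, value = assigned VM.
x = [4, 1, 4, 3, 2, 1]  # t1->v4, t2->v1, t3->v4, t4->v3, t5->v2, t6->v1

vm_tasks = defaultdict(list)
for task, vm in enumerate(x, start=1):
    vm_tasks[vm].append(task)

print(dict(vm_tasks))  # {4: [1, 3], 1: [2, 6], 3: [4], 2: [5]}
```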

3.2. The Framework of ADG

In the context of workflow scheduling on cloud platforms, the coupling of large-scale task sets with heterogeneous resources often makes scheduling problems highly complex, which greatly complicates the search and optimization processes of algorithms [8]. However, the inherent topological structure and task dependency relationships of workflows offer an important avenue for problem-solving. A deep analysis of workflow structures reveals that task execution typically exhibits significant locality: adjusting the scheduling allocation of local tasks usually only affects the execution progress of associated tasks, with minimal impact on the overall system. Based on this, complex workflow systems can be decoupled into independent or weakly coupled subregions, thereby transforming the global optimization problem into several local optimization problems over those subregions. This divide-and-conquer strategy enables the contribution of each group of decision variables to the optimization objectives to be analyzed from historical iteration results. The core idea of the ADG algorithm is illustrated in Algorithm 1.
In Algorithm 1, the problem's structural features are first sampled independently to generate the subproblems in G (line 1), with the decision variable selection details provided in Section 3.3. Subsequently, population P is initialized, and the hypervolume (HV) of its non-dominated solutions is computed (lines 2–3). For each subproblem $g$, a new population P′ is regenerated from P by resetting the values of the variables selected for $g$. After the objective values of P′ are evaluated, the non-dominated solution set is updated, and the HV difference between P′ and the initial population determines the initial contribution $\Delta C_g$ of subproblem $g$ (lines 4–8).
During the iterative process, subproblem $G_k$ is selected using roulette wheel selection (line 11). If $G_k$ fails to yield a positive contribution for two consecutive cycles, its historical contributions are cleared and it is replaced by a new subproblem $G_k'$ produced by the dynamic grouping adjustment strategy (lines 13–17; strategy details in Section 3.3). Otherwise, if $G_k$ continues to contribute positively, the algorithm proceeds to offspring generation, producing a mixed population Q for environmental selection to derive a new population P (lines 18–19). The non-dominated solution set is then updated, and the HV value is recalculated to obtain the contribution degree $\Delta C_k^l$ (lines 20–23). Finally, the mean contribution degree is computed and assigned as the contribution degree of subproblem $G_k$ (line 26).
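A minimal sketch of the contribution-based selection step (Algorithm 1, line 11): each group's chance of being chosen is proportional to its HV contribution $\Delta C$. The small floor that keeps zero-contribution groups selectable is an assumption, not the paper's stated rule.

```python
import random

def roulette_select(delta_c, floor=1e-9):
    # Selection probability proportional to each group's HV contribution.
    weights = [max(c, 0.0) + floor for c in delta_c]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r <= acc:
            return k
    return len(weights) - 1  # guard against floating-point round-off
```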

3.3. Dynamic Decision Variable Grouping Mechanism Based on Workflow Structure Decomposition

The GroupDecisionVariables method decomposes high-dimensional scheduling problems into low-dimensional subspaces by parsing workflow topologies and adopting independent feature sampling, providing a hierarchical solving framework for iterative optimization. As illustrated in Figure 4, for a typical workflow scenario (13 tasks with $t_1$ as the entry task and $t_{13}$ as the exit task), the method first filters the decision variables corresponding to tasks without successor nodes, incorporating the exit task $t_{13}$ into the initial group Group1. Given that only a single task exists at the current layer depth, the decision variables of the upstream tasks $t_{10}$, $t_{11}$, and $t_{12}$ are recursively aggregated to form the first subproblem SubProblem1, ensuring effective dimensionality for subproblem optimization.
Subsequently, the remaining tasks $T' = \{t_1, t_2, \ldots, t_9\}$ form a new grouping space where the "end-task-first aggregation strategy" is repeated: nodes without successors (e.g., $t_7$, $t_8$, $t_9$) are prioritized, and their decision variables are merged with preceding groups to construct SubProblem2. This process proceeds recursively via depth-first traversal until all task decision variables are integrated into hierarchical subproblems. By backtracking task dependencies from exit to entry tasks, this grouping strategy naturally aligns with the DAG (directed acyclic graph) structure of workflows, maintaining subproblem independence while ensuring weak coupling between adjacent subproblems, thereby providing structured support for subsequent contribution-based dynamic subproblem selection.
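A sketch of this end-task-first grouping under stated assumptions: layers of tasks with no unprocessed successors are peeled repeatedly, and a thin layer absorbs its upstream neighbours so each subproblem keeps an effective dimensionality. The minimum group size is an assumed parameter, not the paper's exact rule.

```python
def group_decision_variables(tasks, succ, min_size=2):
    remaining, groups = set(tasks), []
    while remaining:
        group = set()
        while remaining and len(group) < min_size:
            # Tasks whose successors have all been grouped already.
            layer = {t for t in remaining if not (succ[t] & remaining)}
            group |= layer       # e.g. exit task t13 absorbs upstream t10-t12
            remaining -= layer
        groups.append(group)
    return groups                # subproblems ordered from exit toward entry
```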
The Subdivide method realizes dynamic grouping adjustment through a dual screening mechanism: first, it identifies the core group with the highest contribution degree from the current grouping set, treating it as an independent workflow unit; then, it employs a top-down hierarchical traversal strategy to extract the initial task nodes without predecessor dependencies within this workflow. Through depth-first decomposition, the core group is refined into multiple functional subgroups. The average computational resource requirements (such as CPU cycles, memory usage, etc.) of the tasks in each subgroup are calculated, and the subgroup with the highest resource requirement, termed the "key task cluster", is selected. This cluster corresponds to the decision variable set with the greatest impact on system performance, replacing the current redundant group with the lowest contribution degree to form an optimized subgroup set $G'$. This process iterates collaboratively with the offspring propagation and environmental selection mechanisms in genetic algorithms to continuously optimize the grouping structure.
Figure 5 illustrates the screening process for key subgroups: suppose Group3 is identified as the highest-contribution group in a certain iteration; hierarchical splitting then yields three functional subgroups (Group3.1, Group3.2, and Group3.3). By quantitatively analyzing the resource consumption characteristics of the tasks (e.g., computational complexity, I/O frequency), Group3.3, with the highest resource intensity, is selected as the new decision variable group to replace the current redundant group with the lowest contribution degree.
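A sketch of the Subdivide screening walked through in Figure 5. The `split` callable (e.g., DFS from predecessor-free tasks) and `demand` (per-task resource requirement) are assumed inputs; the rule shown, keeping the subgroup with the highest mean resource demand and using it to replace the lowest-contribution group, follows the description above.

```python
def subdivide(groups, delta_c, split, demand):
    core = max(range(len(groups)), key=lambda g: delta_c[g])  # highest contribution
    subgroups = split(groups[core])          # refine into functional subgroups
    key_cluster = max(subgroups,
                      key=lambda sg: sum(demand[t] for t in sg) / len(sg))
    worst = min(range(len(groups)), key=lambda g: delta_c[g])  # redundant group
    groups[worst] = key_cluster              # replace it with the key task cluster
    return groups
```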
Based on the above grouping mechanism, the preliminary task division can be determined. On this basis, the task priority scheduling policy (detailed in Section 3.4) further determines the execution order of tasks within each group, with the two mechanisms working in close collaboration to achieve efficient workflow scheduling.

3.4. Task Priority Scheduling Policy and Group Task-VM Mapping Reproduction Policy

In cloud workflow task scheduling systems, two primary optimization dimensions exist: determining the execution timing relationships among tasks and mapping tasks efficiently to heterogeneous virtual machine (VM) resources. The task-resource mapping problem can be directly modeled using decision variables, while the task execution order is dynamically optimized through the Priority Scheduling Policy proposed in this section. This policy establishes a partial order among intra-group tasks to ensure that when multiple tasks from the same subgroup are assigned to the same VM, their execution sequence strictly adheres to the workflow's dependency constraints. Specifically, the execution priority $Rank(t_i, v_j)$ of task $t_i$ on VM $v_j$ is computed recursively using Equation (11):

$$Rank(t_i, v_j) = ST(t_i, v_j) + \bar{w}_i + \max_{t_p \in suc(t_i)} Rank(t_p)$$

where $ST(t_i, v_j)$, defined as the maximum data transfer time among all predecessor tasks of $t_i$, characterizes the earliest feasible start time of task $t_i$ on virtual machine $v_j$, and $\bar{w}_i$, the average execution time of task $t_i$ across the entire VM cluster, is computed as shown in Equation (12):

$$\bar{w}_i = \frac{\sum_{j=1}^{M} TL(t_i)/Mips(v_j)}{M}$$
Workflow tasks on each virtual machine are sorted in descending order of the priority values computed by Equation (11). Mathematical derivation shows that the priority value of any task is strictly greater than that of all its successor tasks, a property ensuring that the "predecessor-first, successor-later" dependency constraints are automatically satisfied during task scheduling. This priority-based sorting mechanism rules out invalid solutions at the algorithm design level, eliminating the need for additional constraint-handling overhead.
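A sketch of Equations (11)-(12), assuming per-task values of `ST` (earliest feasible start) and `w_bar` (average execution time over the cluster) are precomputed: ranks are built recursively from the exit tasks backwards, and scheduling tasks in decreasing rank order then respects every precedence constraint.

```python
def compute_ranks(tasks, succ, ST, w_bar):
    rank = {}
    def r(t):
        if t not in rank:
            # Eq. (11): own start + average execution + best successor rank.
            rank[t] = ST[t] + w_bar[t] + max((r(p) for p in succ[t]), default=0.0)
        return rank[t]
    for t in tasks:
        r(t)
    return rank  # schedule tasks on each VM in decreasing rank order

def average_execution_time(TL_i, mips):
    # Eq. (12): mean of TL(t_i)/Mips(v_j) over all M virtual machines.
    return sum(TL_i / m for m in mips) / len(mips)
```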
The scheduling model proposed in this paper centers on balancing three objectives—completion time, execution cost, and energy consumption—adopting a modular design to flexibly integrate with existing multi-objective optimization algorithm frameworks. Take NSGA-III [9] as an example: by embedding the proposed adaptive decision variable grouping mechanism, the algorithm significantly enhances population management efficiency when addressing high-dimensional decision spaces.
To maintain diversity within the evolving population and ensure the correctness of the evolutionary direction, this paper proposes an intra-group task load-balanced mapping strategy integrated with crossover and mutation operators to assist the multi-objective algorithm in efficiently generating offspring during the evolutionary process. Specifically, 50% of the individuals are randomly selected to undergo genetic operations (including binary crossover and polynomial mutation), while the remaining individuals generate offspring through the task load-balanced mapping strategy. This strategy achieves dynamic adaptation of the scheduling scheme to heterogeneous computing environments through a dual mechanism of task load balancing and virtual machine performance adaptation.
The specific execution process of the load-balanced mapping strategy is as follows: first, the virtual machines within the group are sorted in ascending order of their completion times, with their indices denoted as $K_i$ ($i = 1, 2, \ldots, m$); the minimum number of tasks over all virtual machines is extracted and denoted as $Z$; then, the $Z$ tasks with the smallest resource consumption on virtual machine $v_{K_1}$ are swapped one-to-one with the $Z$ tasks with the largest resource consumption on $v_{K_m}$, while ensuring that the task priority sequence remains unchanged during the exchange. This iterative process continues until load-balancing adjustments are completed for all virtual machines.
The left side of Figure 6 presents the task allocation relationships corresponding to a set of decision variables and the completion time sequences of the target virtual machines. After reordering the virtual machines within the group in ascending order of their completion times, the updated sequence on the right side is obtained. It can be observed that virtual machine $v_3$ has the fewest tasks ($Z = 3$), and the following exchange operations are ultimately performed: virtual machines $v_4$, $v_2$, and $v_3$ each select the tasks with the smallest resource consumption to swap with the tasks with the largest resource consumption on virtual machines $v_5$, $v_6$, and $v_1$, respectively. As illustrated in the right figure, the system achieves load balancing through bidirectional task swapping between virtual machines $v_4$ and $v_5$, involving three task pairs ($T_1 \leftrightarrow T_{24}$, $T_{14} \leftrightarrow T_{12}$, $T_{30} \leftrightarrow T_{19}$). The blue and red lines in the diagram indicate the task swaps between the two VMs. This scheduling strategy first exchanges tasks between $v_5$ (with the maximum completion time) and $v_4$ (with the minimum completion time) to reduce the system makespan. Subsequently, it sequentially executes cascade optimization between $v_2$ and $v_6$, and between $v_3$ and $v_1$, establishing a comprehensive load-balancing closed loop. This example intuitively demonstrates the application of the strategy in an actual scheduling scenario.
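A sketch of the swap procedure walked through in Figure 6: VMs are sorted by completion time (indices $K_1 \ldots K_m$), $Z$ is the smallest task count over all VMs, and pairing proceeds inwards ($K_1 \leftrightarrow K_m$, $K_2 \leftrightarrow K_{m-1}$, ...). The `cost` callable is an assumed per-task resource-consumption measure; restoring the intra-VM priority order after each swap is left implicit.

```python
def rebalance(vm_tasks, completion_time, cost):
    K = sorted(vm_tasks, key=lambda v: completion_time[v])  # ascending finish time
    Z = min(len(ts) for ts in vm_tasks.values())
    lo, hi = 0, len(K) - 1
    while lo < hi:
        fast, slow = vm_tasks[K[lo]], vm_tasks[K[hi]]
        cheap = sorted(fast, key=cost)[:Z]                  # Z cheapest on fast VM
        dear = sorted(slow, key=cost, reverse=True)[:Z]     # Z dearest on slow VM
        for c, d in zip(cheap, dear):                       # one-to-one exchange
            fast[fast.index(c)] = d
            slow[slow.index(d)] = c
        lo, hi = lo + 1, hi - 1
    return vm_tasks
```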

4. Experiment

This section outlines the experimental setup, including the workflow configuration and the parameters and metrics of the real virtual machines, followed by ablation experiments on the proposed algorithm. The performance of the proposed algorithm is compared with four representative algorithms, and the results are analyzed and discussed. The experimental environment was configured using WorkflowSim [10], a cloud workflow simulation platform. All experiments were conducted on a Lenovo computer (Lenovo, Beijing, China) running Windows 11, equipped with a 13th Gen Intel(R) Core(TM) i9-13900HX CPU and 32 GB of Samsung DDR5 5600 MHz memory.

4.1. Experimental Setup

In the experiments, five types of real-world workflows are employed: CyberShake [11], Epigenomics [12], Inspiral [13], Montage [14], and SIPHT [15], as illustrated in Figure 7. Each represents a distinct domain: CyberShake analyzes seismic hazard-related data; Epigenomics matches epigenetic states of human cells in biology; Inspiral performs gravitational waveform analysis in physics; Montage constructs customized astronomical sky images in astronomy; and SIPHT is a bioinformatics method for analyzing bacterial small untranslated regulatory RNAs. Figure 7 depicts the architectures of these workflows, which span diverse application domains and serve as widely adopted benchmarks for evaluating workflow scheduling algorithms, with each available at four task-set scales. Table 1 lists the 12 VM types from leading cloud service providers, Amazon EC2, Alibaba Cloud, and Microsoft Azure, used in this study's experiments.

4.2. Ablation Experiment

To validate the effectiveness of each component of the proposed algorithm, four ablation experiments were designed and conducted, focusing on the analysis of decision variable grouping, task priority ranking, and offspring generation mechanisms. The experiments were carried out in four different scales of CyberShake cloud computing workflow scenarios, using the twelve VM types described previously. The specific comparison methods are as follows:
(1) NGD-NSGAIII: a comparison method without the dynamic grouping mechanism;
(2) NRP-NSGAIII: a comparison method without the intra-group task priority ranking;
(3) NVM-NSGAIII: a comparison method without the task mapping strategy for offspring generation;
(4) ADG-NSGAIII: the complete algorithm proposed in this paper.
Figure 8 presents the convergence curves of the above algorithms for the three optimization objectives—completion time, execution cost, and energy consumption—across four scales of CyberShake workflows.
The experimental comparison results demonstrate that the proposed algorithm comprehensively outperforms the other ablation variants. Specifically, NGD-NSGAIII (without the dynamic grouping mechanism) performs comparably to the proposed algorithm at small task scales (30/50 tasks), but its objective performance lags significantly in large-scale scenarios. NRP-NSGAIII (lacking intra-group task prioritization) shows extremely irregular convergence in completion time because it neglects task dependency constraints, allowing successor tasks to be scheduled before their predecessors; the resulting startup waits for predecessor tasks trigger significant execution delays and severe fluctuations in the convergence curve. NVM-NSGAIII (without the task mapping strategy for offspring generation) exhibits stronger randomness in its search due to the absence of heuristic guidance: offspring generated randomly via crossover operators often contain numerous invalid or low-quality solutions, requiring more iterations to converge to optimal solutions. This not only slows convergence considerably but also has an obvious negative impact on the scheduling efficiency of cloud computing tasks.

4.3. Algorithm Comparison Experiment

In the experimental validation phase, the proposed Adaptive Dynamic Grouping-based multi-objective optimization algorithm (ADG-NSGAIII) is benchmarked against four comparison algorithms: (1) Improved Many-Objective Particle Swarm Optimization (I_MaOPSO) [16], (2) Non-dominated Sorting Genetic Algorithm II (NSGAII) [17], (3) Non-dominated Sorting Genetic Algorithm III (NSGAIII) [9], and (4) Reference Vector Guided Evolutionary Algorithm (RVEA) [18]. Table 2 presents the Hypervolume (HV) comparison results of these five algorithms across 20 workflow instances (encompassing three optimization objectives, Makespan, execution cost, and energy consumption, with decision variable dimensions ranging from 24 to 1000). Table 3 further showcases the HV values of NSGAII, NSGAIII, and RVEA before and after integrating the proposed Adaptive Dynamic Grouping mechanism, calculated using the PlatEMO multi-objective optimization platform [19]; the associated efficiency metrics, namely the average execution time over a fixed number of iterations and the cumulative time required to achieve equivalent convergence, are reported in Tables 4 and 5. Algorithm 1 illustrates the implementation details of the grouping mechanism using NSGAIII as a representative example.
To validate the performance difference between the algorithm using the ADG strategy and the algorithm without it, a paired-sample t-test was conducted for statistical analysis. In the experiment, each algorithm was executed 10 times independently for problem scenarios of different scales, and the average values were calculated, generating a total of 12 groups of comparative data. According to Formula (13), where $\bar{d}$ represents the mean of the differences, $s_d$ the standard deviation of the differences, and $n$ the number of pairs, the calculated t-value was 2.6547, with a corresponding p-value of 0.0224.

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
From a statistical perspective, when the significance level is typically set at 0.05, the p-value of 0.0224 is less than 0.05, indicating that the difference between the two groups of data is statistically significant. This result fully demonstrates that the ADG strategy proposed in this paper can significantly enhance the algorithm performance. Its effectiveness has been verified across problem scales of varying sizes, confirming its practical engineering application value.
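A sketch of the paired-sample t-test in Equation (13). The two input lists hold the paired averages (with/without ADG) over the 12 problem scenarios; any data passed in would be placeholders, not the paper's measurements.

```python
import math

def paired_t(with_adg, without_adg):
    d = [a - b for a, b in zip(with_adg, without_adg)]   # per-scenario differences
    n = len(d)
    d_bar = sum(d) / n                                   # mean difference
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    return d_bar / (s_d / math.sqrt(n))                  # Eq. (13)
```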
To validate the execution efficiency of the algorithm, this paper conducts a comparative analysis of the execution times between algorithms with and without the ADG strategy, exploring two key dimensions. Firstly, the time consumption for both algorithms to complete 10 iterations is compared, as presented in Table 4. Secondly, taking the cost results of NSGA-II after 100 iterations as the benchmark, the number of iterations and corresponding time required for other algorithms to achieve the same optimization effect are analyzed, as shown in Table 5. The symbol “-” indicates that the algorithm fails to reach the optimization level of NSGA-II within 100 iterations.
The experimental data reveal that in the 10-iteration scenario, algorithms adopting the ADG strategy generally exhibit higher average execution times than those without it, mainly because the ADG strategy incurs additional computational cost from calculating group contributions and performing dynamic grouping during the population selection phase. In terms of optimization effectiveness, however, the ADG strategy shows remarkable advantages. Its reduction in iteration counts is modest for small-scale problems, but as the problem scale grows, for example when n = 100, the number of iterations required by the algorithm with the ADG strategy decreases significantly. When n = 1000 and the single-iteration computation time increases substantially, the ADG strategy markedly shortens the overall optimization time by reducing the number of iterations, demonstrating particularly striking efficiency improvements. Evidently, the ADG strategy can significantly boost the optimization efficiency of algorithms on large-scale problems, effectively balancing computational overhead against optimization performance.
In addition to the comparison with multi-objective workflow scheduling algorithms, this paper also designed a set of experiments comparing the optimization effects against various classical heuristic scheduling algorithms in the CyberShake workflow environment. These algorithms include the First-Come-First-Served (FCFS) algorithm [20], the Round Robin (RR) algorithm [20], the Shortest Job First (SJF) algorithm [20], the Min-Min algorithm [21], and the Min-Max algorithm [22]. Figure 9 shows the experimental results on the three optimization objectives.

5. Conclusions and Future Work

The ADG strategy proposed in this paper addresses the challenges of cloud computing workflow scheduling by deeply mining workflow structure knowledge to achieve efficient grouping of decision variables. The strategy integrates an Adaptive Dynamic Grouping adjustment mechanism to ensure precise allocation of evolutionary opportunities among different subgroups, while adopting a task load-balanced virtual machine mapping strategy for offspring generation, fundamentally enhancing search efficiency and convergence performance. Extensive comparative experiments in real-world workflow scenarios and cloud platform environments demonstrate that the strategy exhibits significant advantages in high-dimensional multi-objective optimization algorithms: compared with algorithms without this strategy, the number of iterations is reduced by nearly 50%, the average execution time is shortened by over 40%, and the t-test verifies its effectiveness and stability.
Despite achieving breakthrough results in workflow scheduling, the algorithm has two main limitations. Firstly, it heavily relies on specific domain knowledge systems; only with a thorough understanding of intra-domain dependencies can task dependencies be leveraged during the design phase for dynamic grouping and optimization, which restricts its direct applicability to large-scale multi-objective optimization problems in other domains, such as financial optimization and logistics scheduling. Secondly, the computational time complexity of the hypervolume (HV) metric increases exponentially with the number of objectives, leading to excessive consumption of computational resources and significantly prolonged processing times. Future research will focus on these two issues by expanding the algorithm’s application scenarios to emerging fields like edge computing, exploring lightweight computational complexity evaluation metrics for integration into the algorithm framework, and continuously enhancing the algorithm’s performance and general adaptability to provide novel solutions for advancing cloud computing task scheduling technologies.

Author Contributions

Conceptualization, G.Z. and C.S.; Software, A.Z.; Data curation, Q.Y.; Writing—original draft, G.Z.; Writing—review & editing, C.S.; Funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China (Grant No. 62372319).

Data Availability Statement

The datasets used in this study, CyberShake, Epigenomics, Inspiral, Montage, and SIPHT, were obtained from publicly available sources. The experimental results have been presented in the manuscript, with no additional data generated.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Deb, M.; Choudhury, A. Hybrid cloud: A new paradigm in cloud computing. In Machine Learning Techniques and Analytics for Cloud Security; Wiley: Hoboken, NJ, USA, 2021; pp. 1–23.
2. Al-Dhuraibi, Y.; Paraiso, F.; Djarallah, N.; Merle, P. Elasticity in cloud computing: State of the art and research challenges. IEEE Trans. Serv. Comput. 2017, 11, 430–447.
3. Baldan, F.J.; Ramirez-Gallego, S.; Bergmeir, C.; Herrera, F.; Benitez, J.M. A forecasting methodology for workload forecasting in cloud systems. IEEE Trans. Cloud Comput. 2016, 6, 929–941.
4. Bindu, G.B.; Ramani, K.; Bindu, C.S. Optimized resource scheduling using the meta heuristic algorithm in cloud computing. IAENG Int. J. Comput. Sci. 2020, 47, 360–366.
5. Adhikari, M.; Amgoth, T.; Srirama, S.N. A survey on scheduling strategies for workflows in cloud environment and emerging trends. ACM Comput. Surv. 2019, 52, 1–36.
6. Song, Y.; Xin, R.; Chen, P.; Zhang, R.; Chen, J.; Zhao, Z. Identifying performance anomalies in fluctuating cloud environments: A robust correlative-GNN-based explainable approach. Future Gener. Comput. Syst. 2023, 145, 77–86.
7. Xia, Y.; Luo, X.; Jin, T.; Li, J.; Xing, L. A tri-chromosome-based evolutionary algorithm for energy-efficient workflow scheduling in clouds. Swarm Evol. Comput. 2024, 91, 101751.
8. Zitzler, E.; Thiele, L.; Laumanns, M.; Fonseca, C.M.; Da Fonseca, V.G. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Trans. Evol. Comput. 2003, 7, 117–132.
9. Deb, K.; Jain, H. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Trans. Evol. Comput. 2013, 18, 577–601.
10. Chen, W.; Deelman, E. WorkflowSim: A toolkit for simulating scientific workflows in distributed environments. In Proceedings of the 2012 IEEE 8th International Conference on E-Science, Chicago, IL, USA, 8–12 October 2012; pp. 1–8.
11. Maechling, P.; Deelman, E.; Zhao, L.; Graves, R.; Mehta, G.; Gupta, N.; Mehringer, J.; Kesselman, C.; Callaghan, S.; Okaya, D.; et al. SCEC CyberShake workflows—Automating probabilistic seismic hazard analysis calculations. In Workflows for e-Science: Scientific Workflows for Grids; Springer: Berlin/Heidelberg, Germany, 2007; pp. 143–163.
12. Li, H.; Ruan, J.; Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008, 18, 1851–1858.
13. Brown, D.A.; Brady, P.R.; Dietz, A.; Cao, J.; Johnson, B.; McNabb, J. A case study on the use of workflow technologies for scientific analysis: Gravitational wave data analysis. In Workflows for e-Science: Scientific Workflows for Grids; Springer: Berlin/Heidelberg, Germany, 2007; pp. 39–59.
14. Berriman, G.B.; Deelman, E.; Good, J.C.; Jacob, J.C.; Katz, D.S.; Kesselman, C.; Laity, A.C.; Prince, T.A.; Singh, G.; Su, M.H. Montage: A grid-enabled engine for delivering custom science-grade mosaics on demand. In Proceedings of the Optimizing Scientific Return for Astronomy Through Information Technologies, Glasgow, UK, 24–25 June 2004; SPIE: Bellingham, WA, USA, 2004; Volume 5493, pp. 221–232.
15. Livny, J.; Teonadi, H.; Livny, M.; Waldor, M.K. High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS ONE 2008, 3, e3197.
16. Saeedi, S.; Khorsand, R.; Bidgoli, S.G.; Ramezanpour, M. Improved many-objective particle swarm optimization algorithm for scientific workflow scheduling in cloud computing. Comput. Ind. Eng. 2020, 147, 106649.
17. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T.A. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
18. Cheng, R.; Jin, Y.; Olhofer, M.; Sendhoff, B. A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 2016, 20, 773–791.
19. Tian, Y.; Cheng, R.; Zhang, X.; Jin, Y. PlatEMO: A MATLAB platform for evolutionary multi-objective optimization [educational forum]. IEEE Comput. Intell. Mag. 2017, 12, 73–87.
20. Siahaan, A.P.U. Comparison analysis of CPU scheduling: FCFS, SJF and Round Robin. Int. J. Eng. Dev. Res. 2016, 4, 124–132.
21. Mustapha, S.M.F.D.S.; Gupta, P. Fault aware task scheduling in cloud using min-min and DBSCAN. Internet Things Cyber-Phys. Syst. 2024, 4, 68–76.
22. Mishra, S.K.; Sahoo, B.; Parida, P.P. Load balancing in cloud computing: A big picture. J. King Saud Univ.-Comput. Inf. Sci. 2020, 32, 149–158.
Figure 1. An example of the workflow scheduling model.
Figure 2. Example of the workflow model.
Figure 3. Example of the encoding method.
Figure 4. Example of grouping workflow tasks.
Figure 5. Example of the adaptive dynamic grouping adjustment strategy.
Figure 6. Example of the task-balancing mapping policy.
Figure 7. Structures of five real-world workflows.
Figure 8. (a) Convergence curves for Makespan in different CyberShake workflows; (b) convergence curves for Cost in the ablation experiment; (c) convergence curves for EnergyCost in the ablation experiment.
Figure 9. (a) Makespan results compared with the heuristic algorithms; (b) Cost results compared with the heuristic algorithms; (c) EnergyCost results compared with the heuristic algorithms.
Table 1. Attribute range of the virtual machines.

| Types | Mips (MB/s) | CPUs | PerCost ($) | Bandwidth (MB) |
|---|---|---|---|---|
| EC2.S | 512 | 1 | 0.043 | 512 |
| EC2.M | 1024 | 1 | 0.086 | 768 |
| EC2.L | 2048 | 2 | 0.174 | 1280 |
| EC2.XL | 2048 | 4 | 0.350 | 2560 |
| Alibaba Cloud.S | 1024 | 2 | 0.047 | 1280 |
| Alibaba Cloud.M | 1024 | 2 | 0.351 | 1280 |
| Alibaba Cloud.L | 2048 | 4 | 0.050 | 2048 |
| Alibaba Cloud.XL | 5120 | 8 | 0.257 | 2048 |
| Azure.S | 768 | 1 | 0.096 | 1024 |
| Azure.M | 1280 | 2 | 0.192 | 2048 |
| Azure.L | 2560 | 4 | 0.383 | 1640 |
| Azure.XL | 3072 | 8 | 0.766 | 3072 |
Table 2. Comparison results concerning metric HV.

| Algorithm | I_MaOPSO +/−/≈ | NSGAII +/−/≈ | NSGAIII +/−/≈ | RVEA +/−/≈ |
|---|---|---|---|---|
| 20 Workflows | 4/9/7 | 2/10/8 | 3/13/4 | 1/11/8 |
Table 3. Comparison results concerning metric HV value.

| CyberShake | n | NSGAII | NSGAIII | RVEA |
|---|---|---|---|---|
| With ADG | 30 | 9.1860 × 10⁻¹ | 9.2081 × 10⁻¹ | 8.2873 × 10⁻¹ |
| | 50 | 9.2981 × 10⁻¹ | 9.3005 × 10⁻¹ | 8.6430 × 10⁻¹ |
| | 100 | 7.8492 × 10⁻¹ | 8.6712 × 10⁻¹ | 8.2984 × 10⁻¹ |
| | 1000 | 7.3192 × 10⁻¹ | 7.7946 × 10⁻¹ | 5.5573 × 10⁻¹ |
| Without ADG | 30 | 8.4348 × 10⁻¹ | 8.4719 × 10⁻¹ | 9.1223 × 10⁻¹ |
| | 50 | 8.7973 × 10⁻¹ | 8.7298 × 10⁻¹ | 9.1858 × 10⁻¹ |
| | 100 | 7.8538 × 10⁻¹ | 7.8348 × 10⁻¹ | 7.6878 × 10⁻¹ |
| | 1000 | 6.3113 × 10⁻¹ | 6.1608 × 10⁻¹ | 4.4648 × 10⁻¹ |
Table 4. The average execution time over 10 iterations.

| CyberShake | n | NSGAII | NSGAIII | RVEA |
|---|---|---|---|---|
| With ADG | 30 | 3.8 s | 4.7 s | 4.3 s |
| | 50 | 6.6 s | 10.0 s | 11.2 s |
| | 100 | 14.3 s | 30.5 s | 23.5 s |
| | 1000 | 386.2 s | 2949.2 s | 1322.6 s |
| Without ADG | 30 | 2.1 s | 3.3 s | 3.2 s |
| | 50 | 3.6 s | 7.5 s | 7.6 s |
| | 100 | 8.6 s | 25.5 s | 21.1 s |
| | 1000 | 326.3 s | 2898.5 s | 1021.3 s |
Table 5. Comparison of the number of iterations when achieving the same optimization effect.

| CyberShake | n | NSGAII | NSGAIII | RVEA |
|---|---|---|---|---|
| Without ADG | 30 | 100 times | - | - |
| | 50 | 100 times | 87 times | 86 times |
| | 100 | 100 times | 76 times | - |
| | 1000 | 100 times | 53 times | 93 times |
| With ADG | 30 | 87 times | 91 times | 88 times |
| | 50 | 73 times | 82 times | 80 times |
| | 100 | 52 times | 46 times | 68 times |
| | 1000 | 43 times | 32 times | 57 times |