1. Introduction
Cloud computing provides on-demand access to a shared pool of configurable computing resources, such as servers, storage, networks, and applications, over the Internet. Cloud computing systems provide services by flexibly allocating underlying physical resources to users via virtualization technologies [1,2]. In recent years, container-based virtualization has emerged in cloud computing, attracting widespread attention and adoption across both industry and academia due to its lightweight and flexible nature [3]. For example, major cloud providers (e.g., AWS, Google Cloud, and Azure) offer container orchestration services (e.g., Kubernetes and ECS) to support large-scale application deployment and management [4,5].
The proliferation of Internet of Things (IoT) applications further amplifies the importance of efficient resource management in cloud environments [4]. Some IoT systems generate massive volumes of time-sensitive data that require rapid processing and low-latency responses, placing stringent demands on underlying infrastructure. Containerized virtualization has emerged as a key enabler for IoT cloud deployments.
When receiving tasks from users, the container scheduler in a cloud computing system determines the priority of tasks (i.e., which tasks should be processed first) and maps each task’s container to a server (i.e., which server should host the container to fulfill resource requirements). The container scheduler is a key enabling technology, since it affects the resource utilization efficiency and users’ service-level agreements [6]. However, one common challenge associated with the containerized service model is the issue of resource fragmentation. As containers are rapidly deployed and terminated, available CPU, memory, and storage resources on servers are split into small blocks. The considerable heterogeneity of IoT tasks and cumulative resource allocation/deallocation cycles naturally lead to resource fragmentation, making it difficult to achieve high resource utilization and fast job completion [7,8].
Against this backdrop, jointly optimizing resource defragmentation and task scheduling becomes an even more formidable challenge in queuing cloud systems. Due to the high cost and substantial energy consumption of cloud infrastructure, it is costly to over-procure such infrastructure to process a large number of user requests in real time. In this scenario with limited resources, task queuing becomes inevitable, especially in private cloud environments. In such a queuing cloud system, resource defragmentation and task scheduling are inherently interdependent processes. Effective resource defragmentation consolidates scattered idle resources, which enables the scheduler to allocate resources to more tasks and reduces the average task completion time. Conversely, task scheduling can proactively prevent excessive fragmentation. This bidirectional interdependence highlights the necessity of an integrated optimization framework that unifies defragmentation and scheduling to simultaneously enhance resource utilization efficiency while reducing task completion time.
A number of resource defragmentation policies have been explored by leveraging container/Virtual Machine (VM) live migration. Live migration allows running containers/VMs to be transferred between physical servers without service interruption, making it possible to consolidate the resource fragmentation of physical servers and optimize resource allocations. Existing resource defragmentation schemes mainly focus on metrics such as the number of active hosts [9,10], power consumption [11,12], migration cost [10,13,14], and resource fragmentation [9,15]. While these efforts have made meaningful progress in reducing resource fragmentation and improving infrastructure efficiency, they mainly target scenarios where the sequence of incoming user tasks is unknown and fail to account for the joint optimization of resource defragmentation and task scheduling.
To address the above problem, we investigate the joint optimization of resource defragmentation and task scheduling in a queuing cloud computing system. As shown in Figure 1, users continuously submit tasks, which enter a task queue to wait for processing. A container scheduler migrates running containers as needed and dispatches the queued tasks to the three servers.
Figure 2 illustrates a comparative analysis of different resource defragmentation and task-scheduling strategies. Figure 2a depicts a best fit-based task-scheduling strategy without resource defragmentation [16], where no container is migrated and only the queued tasks that fit the scattered free resources can be scheduled. Figure 2b depicts a separate optimization of resource defragmentation and task scheduling, where containers are first migrated to maximize the available resources [15] and the queued tasks are then scheduled; in this resource defragmentation stage, containers are migrated to minimize resource fragmentation without regard for the resource requirements of the queued tasks. Figure 2c depicts a joint optimization of resource defragmentation and task scheduling, where only a single container is migrated under the consideration of the resource requirements of the queued tasks, after which the queued tasks are scheduled. This example indicates that, compared with task scheduling alone, resource defragmentation improves resource utilization by enabling more tasks to be scheduled, at the cost of container migration. Through the joint optimization of resource defragmentation and task scheduling, the same tasks are scheduled while fewer container migration operations are involved.
There are two primary challenges in the joint optimization of resource defragmentation and task scheduling: the heterogeneous and dynamic nature of task resource requirements, as well as variations in task duration. These factors collectively contribute to persistent resource fragmentation, which, in turn, increases task queuing delays and degrades overall system efficiency. Effectively addressing these issues requires not only intelligent task sequencing to minimize makespan but also proactive container migration strategies that consolidate fragmented resources with minimal overhead. Furthermore, the interdependence between scheduling and defragmentation necessitates a coordinated approach that dynamically balances immediate task execution needs with long-term resource availability.
In this paper, we formulate the joint resource defragmentation and task-scheduling problem, with the aim of minimizing the task completion time and maximizing resource utilization. The problem is then transformed into an online decision problem that aligns with the dynamic, real-time nature of cloud environments. We further propose a Deep Reinforcement Learning (DRL)-based resource defragmentation and task-scheduling approach called DRL-RDG, a two-layer iterative approach that solves the queued task-scheduling subproblem through a DRL algorithm and the resource defragmentation subproblem through a Resource Defragmentation approach based on a Greedy strategy (RDG). The proposed DRL-RDG balances short-term migration costs against long-term scheduling gains in task completion time and resource utilization.
The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 formulates the joint resource defragmentation and task-scheduling problem. Section 4 transforms the problem into an online decision problem and then introduces the DRL-RDG approach. Section 5 presents simulation studies that demonstrate the efficiency of the proposal. Finally, Section 6 concludes this paper.
2. Related Work
A number of VM/container resource defragmentation approaches and task scheduling approaches in cloud computing have been proposed in recent years.
To mitigate the resource fragmentation issue, VM/container consolidation has been extensively researched, primarily focusing on a better assignment of VMs/containers to hosts. The existing VM/container consolidation approaches mainly focus on power consumption [11,12], resource utilization [17], the number of active hosts [9,10,18], migration cost [10,13,14], resource fragmentation [9,15], etc. To solve the consolidation problem, many metaheuristics and heuristics have been proposed. Metaheuristics provide approximate solutions based on genetic algorithms [19], ant colony optimization [20], particle swarm optimization [21], etc. Heuristics are proposed for better performance in terms of the solvable instance size and time complexity [9,10,15,22]. For example, Gudkov et al. [10] proposed a heuristic for solving the VM consolidation problem with a controlled trade-off between the number of released hosts and the amount of migration memory; the key idea is to place a VM in a host with a lack of free space using induced migrations [10]. Zhu et al. proposed a nimble, fine-grained consolidation algorithm, which focuses on utilizing resource fragmentation to increase the number of additional VM allocations [15]. Kiaee et al. considered joint VM and container consolidation in an edge-cloud environment and proposed an autoencoder-based solution comprising two stages of consolidation subproblems, namely, a joint VM and container multi-criteria migration decision and an edge-cloud power service-level agreement for VM placement [7]. However, these consolidation approaches mainly focus on scenarios where the sequence of incoming user tasks is unknown and fail to account for the joint optimization of resource defragmentation and task scheduling.
Many works on task scheduling in cloud computing have been proposed to improve load balancing [23,24], minimize task completion time [25], maximize throughput [26], etc. Early research often focused on heuristic methods to efficiently allocate tasks to available virtual machines, such as first in–first out (FIFO), shortest job first (SJF), and genetic algorithms [25,27,28]. More recently, machine learning-based techniques, including reinforcement learning, have been applied to adaptively learn optimal scheduling policies in dynamic and heterogeneous cloud environments [29,30,31]. For example, Guo et al. studied delay-optimal scheduling in a queuing cloud computing system and proposed a heuristic algorithm in which a min–min best fit policy is used to solve inter-queue scheduling and an SJF policy is used to solve intra-queue buffering [25]. Kang et al. proposed an automatic generation network-based DRL approach to learn the optimal policy for dispatching arriving user requests, with a reward designed to minimize the task response time and maximize resource utilization [31].
However, these studies on task scheduling overlook the issue of resource fragmentation. This limitation becomes particularly critical in high-load scenarios, where fragmented resources prevent efficient task placement, leading to increased delays and resource under-utilization. Consequently, there is a growing recognition of the need to integrate resource defragmentation awareness into task-scheduling strategies, especially in queuing cloud systems where resource continuity and task scheduling are deeply interdependent.
4. Joint Resource Defragmentation and Task-Scheduling Approach
In this paper, we consider an online dynamic scheduling system where container migration and task-scheduling decisions must be made at specific moments to respond to real-time task arrivals. In this section, we first transform the problem into an online decision problem and then introduce the DRL-RDG approach.
4.1. Problem Transformation
In the online scheduling system, we consider the following process: at the beginning of a specific time slot t, the state of tasks and the server resource information required for scheduling decisions are captured, including the queued tasks, the running containers, and the resource availability of servers. Based on this information, the scheduler determines which tasks in the queue are to be assigned to which servers, as well as whether any running containers need to be migrated and, if so, to which servers.
The scheduler maintains the sets of completed tasks, queued tasks, and running tasks, which are dynamically updated at the beginning of each time slot t to reflect the latest system state. For each queued task, the task-scheduling decision variable in the current time slot t indicates the server to which the task is assigned; for each running task, the resource defragmentation decision variable indicates whether its container is migrated and, if so, the server to which it is migrated.
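For illustration, a minimal Python sketch of how these per-slot decision variables can be represented is shown below; the structure and names (SlotDecision, schedule, migrate) are assumptions for exposition rather than the paper's notation.

```python
# Illustrative (hypothetical names): per-slot decision variables for the online scheduler.
from dataclasses import dataclass, field

@dataclass
class SlotDecision:
    # schedule[task_id] = server_id for tasks taken from the queue in this slot
    schedule: dict = field(default_factory=dict)
    # migrate[task_id] = target_server_id for running containers moved in this slot
    migrate: dict = field(default_factory=dict)

# Example: assign queued task 7 to server 2 and migrate running task 3's container to server 0.
decision = SlotDecision()
decision.schedule[7] = 2
decision.migrate[3] = 0
```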
At each time slot t, the scheduler also tracks the current cumulative task completion time and the resource utilization rate during the slot. The goal of the online task-scheduling and resource defragmentation problem is to minimize a weighted objective that combines the average task completion time with the (negated) resource utilization rate.
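As an illustrative sketch only, a per-slot objective consistent with this description and with the weights used later in the experiments (0.1 and 0.9) can be written as follows; the symbols below are assumed here and are not the paper's original notation.

```latex
% Illustrative notation (assumed): T_avg(t) is the average completion time of finished tasks,
% U(t) is the resource utilization rate in slot t, and w1 + w2 = 1 are the weights.
\[
  \min \; w_1 \, T_{\mathrm{avg}}(t) \; - \; w_2 \, U(t)
\]
```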
4.2. DRL-Based Resource Defragmentation and Task-Scheduling Approach
The proposed DRL-RDG is a two-layer iterative approach that addresses the coupled challenges of the online resource defragmentation and task-scheduling problem in dynamic computing environments. In the first layer, given a resource defragmentation decision, DRL-RDG leverages a DRL-based task-scheduling approach to solve the queued task-scheduling subproblem. In the second layer, conversely, given the current task-scheduling decisions, DRL-RDG employs the RDG approach to solve the resource defragmentation subproblem. The iteration between the two layers ensures that task scheduling and resource defragmentation mutually reinforce each other: effective scheduling reduces the need for frequent defragmentation, while proactive defragmentation creates more efficient resource configurations for subsequent scheduling decisions.
In the following, we first introduce the proposed RDG. Then, we introduce an RL-based resource defragmentation and task-scheduling approach called RL-RDG to highlight and substantiate the advantages of the DRL algorithm. Finally, we present the proposed DRL-RDG.
4.2.1. Resource Defragmentation Approach Based on Greedy Strategy (RDG)
In RDG, we define a resource imbalance index and a free-space size index. The resource imbalance index measures the disparity between CPU and memory utilization on a server or for a container; it captures the degree to which one resource (either CPU or memory) is over-utilized relative to the other, and it is defined for each container and for each server. Its domain is (−1, 1), where a larger absolute value indicates a more severe imbalance, highlighting servers that require adjustment to align CPU and memory utilization. The free-space size index quantifies the underutilized resource capacity of a server; as the overall free-space size increases, the resource fragments are further reduced. The overall free-space size of the system is the sum of the free-space sizes of all servers.
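A minimal sketch of how such indices can be computed is given below; the concrete formulas (a simple utilization difference and a weighted free-capacity sum) are assumptions for illustration, not the paper's exact definitions.

```python
# Illustrative index computations (assumed formulas, not the paper's exact definitions).
def imbalance_index(cpu_used, cpu_cap, mem_used, mem_cap):
    """Signed disparity between CPU and memory utilization; range (-1, 1)."""
    return cpu_used / cpu_cap - mem_used / mem_cap

def free_space(cpu_used, cpu_cap, mem_used, mem_cap, w_cpu=0.5, w_mem=0.5):
    """Weighted normalized free capacity of a server."""
    return w_cpu * (1 - cpu_used / cpu_cap) + w_mem * (1 - mem_used / mem_cap)

# Example: a server with CPU-heavy load is positively imbalanced.
print(imbalance_index(80, 96, 64, 256))   # ~0.58 -> CPU over-utilized relative to memory
print(free_space(80, 96, 64, 256))        # ~0.46 of capacity free on average
```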
The detailed procedure of RDG is outlined in Algorithm 1. The core objective of RDG is to systematically select containers for migration from imbalanced servers to target servers, aiming to improve overall resource balance and reduce fragmentation. In line 2, RDG computes the priority of each server as a weighted combination of its resource imbalance index and its free-space size index, where a weight controls the relative importance of the imbalance index. This priority favors servers that are both highly imbalanced and have little free space, as these are the most critical candidates for defragmentation. RDG then sorts the servers by priority. In lines 4–19, RDG attempts to migrate containers from the server with the highest priority. In lines 8–9, RDG evaluates the migration priority of containers through a correlation metric between the container and the server: a higher positive correlation indicates that the container's resource profile (its own imbalance) strongly aligns with and potentially exacerbates the server's overall imbalance, while a resource-demand term ensures that containers with significant resource demands are considered. Thus, containers with the highest positive scores are prioritized for migration, as they are likely the main contributors to the server's imbalance.
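As an illustration only, the server priority and the container–server correlation might be computed as in the following sketch; the weighting scheme and the correlation form are assumptions consistent with the description above, not the paper's equations.

```python
# Illustrative priority and correlation scores (assumed forms, for intuition only).
def server_priority(server_imbalance, server_free_space, w=0.5):
    """High when the server is strongly imbalanced and has little free space."""
    return w * abs(server_imbalance) + (1 - w) * (1 - server_free_space)

def container_correlation(container_imbalance, server_imbalance, container_demand):
    """Positive when the container's skew aligns with (and worsens) the server's skew;
    the demand term keeps containers with significant resource demands in consideration."""
    return container_imbalance * server_imbalance + container_demand
```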
Algorithm 1 Resource Defragmentation approach based on Greedy strategy (RDG)
Input: the initial placement matrix and the maximum allowed number of migrations
Output: the resource defragmentation decision
1: Compute the resource imbalance index and the free-space size of each server
2: Compute the priority of each server and sort the servers in descending order of priority to form a priority list L
3: Initialize the migration counter to 0
4: for each server in L do
5:   if the migration counter has reached the maximum allowed number of migrations then
6:     Break
7:   end if
8:   Identify the containers on the current server and compute the correlation between each container and the server
9:   Sort the containers in descending order of correlation to obtain the migration candidates
10:  for each migration candidate do
11:    Find the optimal target server
12:    if an optimal target server exists and the migration is feasible then
13:      Migrate the candidate container to the target server
14:      Update the server states and the indices
15:      Increment the migration counter
16:      Break
17:    end if
18:  end for
19: end for
20: return the resource defragmentation decision
Containers are then ranked by the correlation value, and the container with the highest positive score is marked as the top migration candidate. The target server in line 11 must have sufficient resources and, among the feasible servers, must minimize the increase in overall system fragmentation after the hypothetical migration. If such a target server exists, the container is migrated from the source server to the target server, and the system state metrics are updated. This process repeats over the servers in priority order until the migration counter reaches the maximum allowed number of migrations. The output of RDG is a resource defragmentation decision matrix. Based on the greedy strategy, RDG ensures efficient resource defragmentation, thereby improving overall resource utilization efficiency.
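The greedy target-server choice in line 11 can be sketched as follows; the best-fit criterion used here is a simplified stand-in for the fragmentation-increase criterion described above, and the Server structure is assumed for illustration.

```python
# Illustrative greedy choice of a migration target (simplified best-fit stand-in for the
# "smallest increase in overall fragmentation" criterion described in the text).
from dataclasses import dataclass

@dataclass
class Server:
    cpu_free: float
    mem_free: float

def pick_target(cpu_req, mem_req, source_idx, servers):
    """Return the index of a feasible server that the container packs into most tightly,
    or None if no server can host it."""
    best_idx, best_leftover = None, float("inf")
    for i, s in enumerate(servers):
        if i == source_idx or s.cpu_free < cpu_req or s.mem_free < mem_req:
            continue  # infeasible target
        leftover = (s.cpu_free - cpu_req) + (s.mem_free - mem_req)
        if leftover < best_leftover:
            best_idx, best_leftover = i, leftover
    return best_idx

# Example: migrate a container needing 16 CPUs and 32 units of memory away from server 0.
servers = [Server(20, 40), Server(18, 36), Server(64, 200)]
print(pick_target(16, 32, 0, servers))  # -> 1 (tightest feasible fit)
```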
4.2.2. RL-RDG
In RL-RDG, the RDG algorithm is used to solve the resource defragmentation subproblem, and the Q-learning algorithm is used to solve the task-scheduling subproblem.
Q-learning is a common reinforcement learning algorithm that learns the rewards of specific actions in given states by constructing and updating a Q-table. The Q-table consists of Q-values representing the expected reward of taking an action in a specific state. In RL-RDG, the state contains the resource allocation status of each server. An action assigns a set of one or more queued tasks to the corresponding servers, and it is valid only if those servers have enough free CPU and memory to accommodate the assigned tasks. Executing an action causes a system state transition, and the reward is defined in accordance with the optimization objective. The Q-value is then updated with the standard Q-learning rule, in which a discount factor weights future rewards.
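For reference, the standard tabular Q-learning update that this description corresponds to is shown below, where $\alpha$ is the learning rate and $\gamma$ the discount factor; the exact form used in the paper may differ in detail.

```latex
% Standard Q-learning update (textbook form, shown for reference):
\[
  Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr]
\]
```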
The detailed procedure of RL-RDG is outlined in Algorithm 2. The algorithm operates through nested loops. The outer loop (Lines 2–16) continues until no valid actions exist in the current state or all queued tasks have been allocated. The inner loop learns Q-values by updating the Q-table. In each iteration,
Line 2: The RDG algorithm first performs container migration based on the current system state to optimize resource fragmentation, updating the server state (H);
Lines 3–6: The Q-table is initialized, and the existence of valid actions in the initial state is checked;
Lines 8–14: The inner Q-learning loop selects actions using the ε-greedy policy, observes rewards and next states, and updates Q-values until convergence;
Line 15: The optimal action sequence is selected based on the learned Q-values to schedule tasks.
Algorithm 2 RL-RDG
Input: the queued tasks, the running containers, the server states, and the learning parameters
Output: the resource defragmentation and task-scheduling decisions
1: repeat
2:   Migrate containers according to the RDG algorithm and update the server state H
3:   Obtain the initial state and its set of valid actions, and initialize the Q-table
4:   if no valid action exists then
5:     Break
6:   end if
7:   Set the current state to the initial state
8:   repeat
9:     if queued tasks remain then
10:      Select an action according to the ε-greedy policy
11:      Observe the next state and the reward R, and update the Q-table
12:      Update the current state
13:    end if
14:  until the Q-table converges
15:  Select the optimal action sequence based on the learned Q-values to schedule tasks
16: until no valid actions exist or all queued tasks are allocated
17: return the task allocation records and migration decisions
The action selection strategy in line 10 is the ε-greedy policy: the action that maximizes the Q-value is selected with high probability, while the remaining exploration probability is shared uniformly among the optional actions, so that the number of optional actions determines each action's share. The policy thus classifies optional actions into two categories: one category consists of the action(s) that maximize(s) the Q-value, and the other includes all remaining actions. The inner iterative process continues until the Q-table converges. The convergence criterion is defined as either reaching the maximum iteration limit of 1000 or the maximum change in Q-values between consecutive iterations falling below a threshold of 0.001. It is worth noting that the iteration limit and the threshold value can be adjusted according to specific application scenarios and problem scales. After task scheduling, RDG is triggered to perform container migration based on the updated server state. The outer iteration continues until no valid actions exist in the current state or all queued tasks have been allocated.
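A minimal Python sketch of ε-greedy selection over the valid actions is shown below; the uniform handling of exploration is the textbook form and is assumed here.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict mapping each valid action to its current Q-value.
    With probability 1 - epsilon pick a greedy action, otherwise explore uniformly."""
    actions = list(q_values)
    if random.random() < epsilon:
        return random.choice(actions)          # exploration
    best = max(q_values.values())
    greedy = [a for a in actions if q_values[a] == best]
    return random.choice(greedy)               # exploitation (ties broken randomly)
```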
4.2.3. DRL-RDG
In our proposed DRL-RDG, the deep Q-network algorithm replaces the traditional Q-learning algorithm to solve the task-scheduling subproblem. The deep Q-network is a common deep reinforcement learning algorithm that learns the expected long-term rewards of specific actions in given states using a neural network. It can overcome the limitation of Q-learning’s discrete Q-table, which fails to handle high-dimensional state spaces efficiently.
In DRL-RDG, there are two neural networks, an online network Q and a target network, together with a replay memory D. The two networks have the same structure but maintain different parameter vectors. Each network contains four layers: an input layer, two fully connected hidden layers, and an output layer. For a given state and action, the online network and the target network each output an estimated Q-value. The replay memory D stores the observed experience tuples (state, action, reward, next state), which are used to train the neural network.
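The four-layer structure described above (input layer, two fully connected hidden layers, and output layer) can be sketched in PyTorch as follows; the layer widths and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Input: flattened system state; output: one Q-value per candidate action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),   # first fully connected layer
            nn.Linear(hidden, hidden), nn.ReLU(),      # second fully connected layer
            nn.Linear(hidden, action_dim),             # output layer: Q-values
        )

    def forward(self, state):
        return self.net(state)

# Online and target networks share the architecture but not the parameters.
online = QNetwork(state_dim=100, action_dim=50)
target = QNetwork(state_dim=100, action_dim=50)
target.load_state_dict(online.state_dict())  # initial synchronization
```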
The DRL-RDG algorithm is outlined in Algorithm 3. The algorithm operates in two nested loops:
The outer loop (Lines 2–25) iterates until all queued tasks are scheduled. In each iteration, RDG first performs container migration to optimize resource fragmentation (Line 3). The initial state is then constructed from the updated system state (Line 4). If no valid scheduling actions exist from this state, the loop terminates (Lines 5–7).
The inner loop (Lines 8–23) represents the DQN training and decision-making phase over multiple episodes and steps. Within each step, if queued tasks exist, an action is selected using the ε-greedy policy (Line 12). This policy balances exploration (selecting a random action with probability ε) and exploitation (selecting the action with the highest Q-value from Q). After executing the action, the system observes the new state and reward, stores the experience in the replay memory (Line 13), and updates the current state (Line 14).
Algorithm 3 DRL-RDG
Input: the queued tasks, the running containers, and the server states
Output: the resource defragmentation and task-scheduling decisions
1: Initialize the replay memory D, the online network Q with random parameters, and the target network with the same parameters
2: repeat
3:   Migrate containers according to the RDG algorithm and update the server state H
4:   Obtain the initial state
5:   if no valid action exists then
6:     Break
7:   end if
8:   for episode = 1, …, E do
9:     Set the current state to the initial state
10:    for step = 1, …, C do
11:      if queued tasks remain then
12:        Select an action according to the ε-greedy policy
13:        Observe the next state and the reward R, and store the experience in D
14:        Update the current state
15:      end if
16:      Sample a batch of experiences from D
17:      Compute the target values and update Q to minimize the loss
18:      step = step + 1
19:      if mod(step, updatestep) = 0 then
20:        Synchronize the target network with Q
21:      end if
22:    end for
23:  end for
24:  Select the optimal action based on the learned Q-values to schedule tasks
25: until no valid actions exist or all queued tasks are allocated
26: return the task allocation records and migration decisions
The ε-greedy policy in line 12 is based on Equation (16), with the Q-values given by the online network Q. The neural network training phase (Lines 16–23) involves sampling a batch of experiences (B) from the replay memory (D). Based on these experience samples, the key idea of the update of Q is to minimize the difference between the Q-values predicted by the online network (Q) and the target values generated by the target network ($\hat{Q}$). The loss function is expressed as $L = \frac{1}{|B|} \sum_{(s,a,R,s') \in B} \bigl( R + \gamma \max_{a'} \hat{Q}(s',a') - Q(s,a) \bigr)^{2}$, where $|B|$ is the cardinality of the set B. This mean squared error loss ensures that the online network's predictions gradually align with the more stable target values.
Every updatestep steps, the target network is updated by copying Q; that is, the target network parameters are synchronized with the online network parameters (Lines 19–21) to stabilize training. The iteration between RDG and the deep Q-network is the same as the iteration between RDG and Q-learning in the RL-RDG algorithm; it also continues until no valid actions exist in the current state or all queued tasks have been allocated.
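Under the same assumptions as the network sketch above, one training step on a sampled batch (target computation, mean squared error loss, and periodic target synchronization) might look like the following; the tensor layout and the absence of a terminal-state mask are simplifications.

```python
import torch
import torch.nn.functional as F

def train_step(online, target, optimizer, batch, gamma=0.9):
    """batch: (states, actions, rewards, next_states) tensors sampled from replay memory."""
    states, actions, rewards, next_states = batch
    # Q-values predicted by the online network for the actions actually taken.
    q_pred = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target values from the (frozen) target network: R + gamma * max_a' Q_hat(s', a').
    with torch.no_grad():
        q_next = target(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next
    loss = F.mse_loss(q_pred, q_target)   # mean squared error over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically (every `updatestep` steps): target.load_state_dict(online.state_dict())
```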
5. Performance Evaluation
In this section, we use simulations to evaluate the performance of RL-RDG and DRL-RDG.
The system investigated in this paper is a queuing cloud computing system. The queuing environment is simulated by configuring servers with 96 CPUs and 256 GB of memory. The analysis of a Google trace shows that the number of tasks arriving every 5 min follows a Poisson distribution [32], and the task duration follows a heavy-tailed distribution, with 80% of jobs in the trace being shorter than the average job duration [32,33]. Therefore, in the simulation, the number of tasks arriving in each time slot is drawn from a Poisson distribution whose rate equals the configured task arrival rate, while the task duration and the resource requirements follow heavy-tailed distributions [25]. The CPU requirement of tasks follows the log-normal distribution LN(3, 0.75), with the maximum CPU requirement set to 64. The memory requirement of tasks follows the log-normal distribution LN(4, 0.75), with the maximum memory requirement set to 128. The task duration follows the Pareto distribution Pa(1.5, 2). The weight coefficients of CPU and memory utilization are both set to 0.5, and the weight coefficients of the average task completion time and the resource utilization rate are set to 0.1 and 0.9, respectively.
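For concreteness, the following numpy snippet sketches how such a synthetic workload can be drawn; the Poisson rate lam and the random seed are assumptions (the arrival rate is tied to the experiment scenario), and resource requirements are capped at the stated maxima.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_slot(lam=50, max_cpu=64, max_mem=128):
    """Draw one time slot of tasks: (cpu, memory, duration) per arriving task."""
    n = rng.poisson(lam)                                   # number of arrivals in the slot
    cpu = np.minimum(rng.lognormal(mean=3.0, sigma=0.75, size=n), max_cpu)
    mem = np.minimum(rng.lognormal(mean=4.0, sigma=0.75, size=n), max_mem)
    dur = (rng.pareto(a=1.5, size=n) + 1) * 2              # Pareto(shape = 1.5, scale = 2)
    return np.column_stack([cpu, mem, dur])

tasks = generate_slot()
print(tasks.shape)  # (n_tasks, 3)
```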
To evaluate the efficiency of DRL-RDG, we investigate the performance in terms of the task completion time and the resource utilization rate under various server cluster scales and task intensities, which are controlled through different settings of the number of servers (m) and the task arrival rate. The benchmark methods include FIFO-RDG, SJF-RDG, and DRL-o. FIFO-RDG and SJF-RDG are approaches under the two-layer iterative framework, while DRL-o only considers task scheduling without resource defragmentation. Their core mechanisms are introduced as follows:
SJF-RDG: In the first layer, the Shortest Job First (SJF) algorithm, which prioritizes tasks with shorter execution times, is employed to solve the task-scheduling subproblem. In the second layer, the RDG algorithm is employed to solve the resource defragmentation subproblem.
FIFO-RDG: In the first layer, the First In–First Out (FIFO) algorithm, which schedules tasks in the chronological order of their arrival, is employed to solve the task-scheduling subproblem. In the second layer, the RDG algorithm is employed to solve the resource defragmentation subproblem.
DRL-o: In DRL-o, only the deep Q-network algorithm is employed to solve the task-scheduling problem without resource defragmentation.
5.1. Server Cluster Scale
In this scenario, the number of servers varies from 30 to 70, and the average number of tasks arriving per time slot is equal to the number of servers. This parameter setting is designed to simulate a high-intensity and dynamically scalable cluster environment, where the system faces consistent task intensity in terms of the task-to-server ratio as the number of servers increases.
Figure 3 shows the performance comparison in terms of the task completion time and the resource utilization rate under various server cluster scales.
Table 2 shows the quantitative comparison of performance metrics between DRL-RDG and RL-RDG.
Figure 3a shows that DRL-RDG and RL-RDG outperform SJF-RDG and FIFO-RDG in terms of average task completion time. This is because both algorithms can learn and dynamically adjust scheduling strategies to adapt to load changes, unlike the relatively fixed strategies of SJF-RDG and FIFO-RDG. DRL-RDG and RL-RDG also outperform DRL-o due to resource defragmentation. Specifically, for DRL-RDG, as the number of servers increases, its advantage over RL-RDG in average latency becomes more significant. In high-dimensional scenarios (with more servers), DRL-RDG can better handle complex state spaces, while RL-RDG is limited by insufficient Q-value learning, resulting in slightly higher latency than DRL-RDG. Compared with FIFO-RDG, the SJF-RDG scheduling strategy prioritizes short tasks, preventing short tasks from being blocked by long tasks and thereby reducing the overall average latency. DRL-o's average task completion time lies between those of RL-RDG/SJF-RDG and FIFO-RDG: while DRL-o benefits from DRL's adaptive learning ability, it suffers from persistent resource fragmentation, leading to longer queuing times.
Figure 3b,c show that DRL-RDG and RL-RDG achieve relatively high resource utilization compared to SJF-RDG, FIFO-RDG, and DRL-o. DRL-RDG shows a slightly better trend as the number of servers increases, which is related to its better performance in handling complex scenarios. Note that as the number of servers increases, the utilization rate of these algorithms exhibits a slight improvement. This is because more servers mean more fragments can be integrated, and the efficiency of the overall resource pool is enhanced. In summary, the learning-based DRL-RDG and RL-RDG algorithms show significant advantages compared to the traditional SJF-RDG and FIFO-RDG in terms of average task completion time and resource utilization.
5.2. Task Arrival Rate
In this scenario, the number of servers is fixed at 50, while the task arrival rate varies incrementally from 40 to 60. This parameter configuration is intentionally designed to simulate a fixed-scale cluster under dynamically varying task intensities.
Figure 4 shows the performance comparison in terms of the task completion time and the resource utilization rate under various task arrival rates.
Table 3 shows the quantitative comparison of performance metrics between DRL-RDG and RL-RDG.
Figure 4a shows that as the task arrival rate increases, the average task completion time of all algorithms shows an upward trend. This is because a higher task arrival rate increases the probability of task queuing, thereby prolonging the average task completion time. Specifically, as the task arrival rate increases, SJF-RDG reduces the average completion time more effectively than FIFO-RDG by prioritizing short-duration tasks. DRL-o exhibits an average task completion time that is comparable to that of SJF-RDG. In contrast to SJF-RDG, RL-RDG and DRL-RDG further decrease the average completion time. This is achieved by comprehensively accounting for the task duration time, resource requirements of tasks, and the long-term implications of the current task-scheduling strategy.
Figure 4b,c show that as the task arrival rate increases, the average CPU and memory utilization of algorithms first rise and then stabilize. In the initial stage of an increasing task arrival rate, more tasks make better use of CPU and memory resources, thereby increasing resource utilization. When the task arrival rate reaches a certain level, the system attains a steady state, leading to stabilized resource utilization. Compared to RL-RDG and DRL-RDG, DRL-o exhibits lower average CPU and memory utilization. Since DRL-o does not perform resource defragmentation, scattered idle resources cannot be consolidated and allocated to tasks effectively. Compared to RL-RDG and DRL-RDG, SJF-RDG lacks long-term strategic optimization and, thus, exhibits more fluctuating utilization patterns.
5.3. Statistical Analysis of Performance Differences
To quantitatively validate the observed performance differences between DRL-RDG and RL-RDG, we conduct Mann–Whitney U tests. The statistical analysis results presented in
Table 4 demonstrate that all
p-values are below the 0.05 significance level, confirming the statistical significance of our findings across all evaluation metrics and system scales. The statistical results confirm the performance advantages of our proposed DRL-RDG approach.