1. Introduction
Cloud computing provides on-demand access to a shared pool of configurable computing resources, such as servers, storage, networks, and applications, over the Internet. Cloud computing systems provide services by flexibly allocating underlying physical resources to users via virtualization technologies [1,2]. In recent years, container-based virtualization has emerged in cloud computing, attracting widespread attention and adoption across both industry and academia due to its lightweight and flexible nature [3]. For example, major cloud providers (e.g., AWS, Google Cloud, and Azure) offer container orchestration services (e.g., Kubernetes and ECS) to support large-scale application deployment and management [4,5].
The proliferation of Internet of Things (IoT) applications further amplifies the importance of efficient resource management in cloud environments [4]. Some IoT systems generate massive volumes of time-sensitive data that require rapid processing and low-latency responses, placing stringent demands on underlying infrastructure. Containerized virtualization has emerged as a key enabler for IoT cloud deployments.
When receiving tasks from users, the container scheduler in a cloud computing system determines the priority of tasks (i.e., which tasks should be processed first) and maps each task’s container to a server (i.e., which server should host the container to fulfill resource requirements). The container scheduler is a key enabling technology, since it affects the resource utilization efficiency and users’ service-level agreements [6]. However, one common challenge associated with the containerized service model is the issue of resource fragmentation. As containers are rapidly deployed and terminated, available CPU, memory, and storage resources on servers are split into small blocks. The considerable heterogeneity of IoT tasks and cumulative resource allocation/deallocation cycles naturally lead to resource fragmentation, making it difficult to achieve high resource utilization and fast job completion [7,8].
Against this backdrop, jointly optimizing resource defragmentation and task scheduling becomes an even more formidable challenge in queuing cloud systems. Due to the high cost and substantial energy consumption of cloud infrastructure, it is costly to over-procure such infrastructure to process a large number of user requests in real time. In this scenario with limited resources, task queuing becomes inevitable, especially in private cloud environments. In such a queuing cloud system, resource defragmentation and task scheduling are inherently interdependent processes. Effective resource defragmentation consolidates scattered idle resources, which enables the scheduler to allocate resources to more tasks and reduces the average task completion time. Conversely, task scheduling can proactively prevent excessive fragmentation. This bidirectional interdependence highlights the necessity of an integrated optimization framework that unifies defragmentation and scheduling to simultaneously enhance resource utilization efficiency while reducing task completion time.
A number of resource defragmentation policies have been explored by leveraging container/Virtual Machine (VM) live migration. Live migration allows running containers/VMs to be transferred between physical servers without service interruption, making it possible to consolidate the resource fragmentation of physical servers and optimize resource allocations. Existing resource defragmentation schemes mainly focus on metrics such as the number of active hosts [9,10], power consumption [11,12], migration cost [10,13,14], and resource fragmentation [9,15]. While these efforts have made meaningful progress in reducing resource fragmentation and improving infrastructure efficiency, they mainly target scenarios where the sequence of incoming user tasks is unknown and fail to account for the joint optimization of resource defragmentation and task scheduling.
To address the above problem, we investigate the joint optimization of resource defragmentation and task scheduling in a queuing cloud computing system. As shown in Figure 1, users continuously submit tasks, which enter a task queue to wait for processing. A container scheduler migrates running containers as needed and dispatches the queued tasks to the three servers.
Figure 2 illustrates a comparative analysis of different resource defragmentation and task-scheduling strategies. Figure 2a depicts a best fit-based task-scheduling strategy without resource defragmentation [16], where no container is migrated and only the queued tasks that fit the scattered free resources can be scheduled. Figure 2b depicts a separate optimization of resource defragmentation and task scheduling, where containers are first migrated to maximize the available resources [15] and the queued tasks are then scheduled; in this resource defragmentation stage, containers are migrated to minimize resource fragmentation without regard for the resource requirements of the queued tasks. Figure 2c depicts a joint optimization of resource defragmentation and task scheduling, where only a single container is migrated under the consideration of the resource requirements of the queued tasks, after which the queued tasks are scheduled. This example indicates that, compared with task scheduling alone, resource defragmentation improves resource utilization by enabling more tasks to be scheduled, at the cost of container migration. Through the joint optimization of resource defragmentation and task scheduling, the same tasks are scheduled while fewer container migration operations are involved.
There are two primary challenges in the joint optimization of resource defragmentation and task scheduling: the heterogeneous and dynamic nature of task resource requirements, as well as variations in task duration. These factors collectively contribute to persistent resource fragmentation, which, in turn, increases task queuing delays and degrades overall system efficiency. Effectively addressing these issues requires not only intelligent task sequencing to minimize makespan but also proactive container migration strategies that consolidate fragmented resources with minimal overhead. Furthermore, the interdependence between scheduling and defragmentation necessitates a coordinated approach that dynamically balances immediate task execution needs with long-term resource availability.
In this paper, we formulate the joint resource defragmentation and task-scheduling problem, with the aim of minimizing the task completion time and maximizing resource utilization. The problem is then transformed into an online decision problem that aligns with the dynamic, real-time nature of cloud environments. We further propose a Deep Reinforcement Learning (DRL)-based resource defragmentation and task-scheduling approach called DRL-RDG, a two-layer iterative approach that solves the queued task-scheduling subproblem through a DRL algorithm and the resource defragmentation subproblem through a Resource Defragmentation approach based on a Greedy strategy (RDG). The proposed DRL-RDG balances short-term migration costs against long-term scheduling gains in task completion time and resource utilization.
The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 formulates the joint resource defragmentation and task-scheduling problem. Section 4 transforms the problem into an online decision problem and then introduces the DRL-RDG approach. Section 5 presents simulation studies that demonstrate the efficiency of the proposal. Finally, Section 6 concludes this paper.
2. Related Work
A number of VM/container resource defragmentation approaches and task scheduling approaches in cloud computing have been proposed in recent years.
To mitigate the resource fragmentation issue, VM/container consolidation has been extensively researched, primarily focusing on a better assignment of VMs/containers to hosts. The existing VM/container consolidation approaches mainly focus on power consumption [11,12], resource utilization [17], the number of active hosts [9,10,18], migration cost [10,13,14], resource fragmentation [9,15], etc. To solve the consolidation problem, many metaheuristics and heuristics have been proposed. Metaheuristics provide approximate solutions based on genetic algorithms [19], ant colony optimization [20], particle swarm optimization [21], etc. Heuristics are proposed for better performance in terms of the solvable instance size and time complexity [9,10,15,22]. For example, Gudkov et al. [10] proposed a heuristic for solving the VM consolidation problem with a controlled trade-off between the number of released hosts and the amount of migration memory; the key idea is to place a VM in a host with a lack of free space using induced migrations [10]. Zhu et al. proposed a nimble, fine-grained consolidation algorithm, which focuses on utilizing resource fragmentation to increase the number of additional VM allocations [15]. Kiaee et al. considered joint VM and container consolidation in an edge-cloud environment and proposed an autoencoder-based solution comprising two stages of consolidation subproblems, namely, a joint VM and container multi-criteria migration decision and an edge-cloud power service-level agreement for VM placement [7]. However, these consolidation approaches mainly focus on scenarios where the sequence of incoming user tasks is unknown and fail to account for the joint optimization of resource defragmentation and task scheduling.
Many works on task scheduling in cloud computing have been proposed to improve load balancing [23,24], minimize task completion time [25], maximize throughput [26], etc. Early research often focused on heuristic methods to efficiently allocate tasks to available virtual machines, such as first in–first out (FIFO), shortest job first (SJF), and genetic algorithms [25,27,28]. More recently, machine learning-based techniques, including reinforcement learning, have been applied to adaptively learn optimal scheduling policies in dynamic and heterogeneous cloud environments [29,30,31]. For example, Guo et al. studied delay-optimal scheduling in a queuing cloud computing system and proposed a heuristic algorithm in which a min–min best fit policy is used to solve inter-queue scheduling and an SJF policy is used to solve intra-queue buffering [25]. Kang et al. proposed an automatic generation network-based DRL approach to learn the optimal policy for dispatching arriving user requests, with a reward designed to minimize the task response time and maximize resource utilization [31].
However, these studies on task scheduling overlook the issue of resource fragmentation. This limitation becomes particularly critical in high-load scenarios, where fragmented resources prevent efficient task placement, leading to increased delays and resource under-utilization. Consequently, there is a growing recognition of the need to integrate resource defragmentation awareness into task-scheduling strategies, especially in queuing cloud systems where resource continuity and task scheduling are deeply interdependent.
4. Joint Resource Defragmentation and Task-Scheduling Approach
In this paper, we consider an online dynamic scheduling system where container migration and task-scheduling decisions must be made at specific moments to respond to real-time task arrivals. In this section, we first transform the problem into an online decision problem and then introduce the DRL-RDG approach.
4.1. Problem Transformation
In the online scheduling system, we consider the following process: at the beginning of a specific time slot t, the state of tasks and the server resource information required for scheduling decisions are captured, including the queued tasks, the running containers, and the resource availability of servers. Based on this information, the scheduler determines which tasks in the queue are to be assigned to which servers, as well as whether any running containers need to be migrated and, if so, to which servers.
The scheduler maintains the sets of completed tasks, queued tasks, and running tasks, which are dynamically updated at the beginning of each time slot t to reflect the latest system state. For each queued task, the task-scheduling decision variable in the current time slot t indicates the server to which the task is assigned; for each running task, the resource defragmentation decision variable indicates whether its container is migrated and, if so, the server to which it is migrated.
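For illustration, a minimal Python sketch of how these per-slot decision variables can be represented is shown below; the structure and names (SlotDecision, schedule, migrate) are assumptions for exposition rather than the paper's notation.

```python
# Illustrative (hypothetical names): per-slot decision variables for the online scheduler.
from dataclasses import dataclass, field

@dataclass
class SlotDecision:
    # schedule[task_id] = server_id for tasks taken from the queue in this slot
    schedule: dict = field(default_factory=dict)
    # migrate[task_id] = target_server_id for running containers moved in this slot
    migrate: dict = field(default_factory=dict)

# Example: assign queued task 7 to server 2 and migrate running task 3's container to server 0.
decision = SlotDecision()
decision.schedule[7] = 2
decision.migrate[3] = 0
```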
At each time slot t, the scheduler also tracks the current cumulative task completion time and the resource utilization rate during the slot. The goal of the online task-scheduling and resource defragmentation problem is to minimize a weighted objective that combines the average task completion time with the (negated) resource utilization rate.
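As an illustrative sketch only, a per-slot objective consistent with this description and with the weights used later in the experiments (0.1 and 0.9) can be written as follows; the symbols below are assumed here and are not the paper's original notation.

```latex
% Illustrative notation (assumed): T_avg(t) is the average completion time of finished tasks,
% U(t) is the resource utilization rate in slot t, and w1 + w2 = 1 are the weights.
\[
  \min \; w_1 \, T_{\mathrm{avg}}(t) \; - \; w_2 \, U(t)
\]
```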
4.2. DRL-Based Resource Defragmentation and Task-Scheduling Approach
The proposed DRL-RDG is a two-layer iterative approach that addresses the coupled challenges of the online resource defragmentation and task-scheduling problem in dynamic computing environments. In the first layer, given a resource defragmentation decision, DRL-RDG leverages a DRL-based task-scheduling approach to solve the queued task-scheduling subproblem. In the second layer, conversely, given the current task-scheduling decisions, DRL-RDG employs the RDG approach to solve the resource defragmentation subproblem. The iteration between the two layers ensures that task scheduling and resource defragmentation mutually reinforce each other: effective scheduling reduces the need for frequent defragmentation, while proactive defragmentation creates more efficient resource configurations for subsequent scheduling decisions.
In the following, we first introduce the proposed RDG. Then, we introduce an RL-based resource defragmentation and task-scheduling approach called RL-RDG to highlight and substantiate the advantages of the DRL algorithm. Finally, we present the proposed DRL-RDG.
4.2.1. Resource Defragmentation Approach Based on Greedy Strategy (RDG)
In RDG, we define a resource imbalance index and a free-space size index. The resource imbalance index measures the disparity between CPU and memory utilization on a server or for a container; it captures the degree to which one resource (either CPU or memory) is over-utilized relative to the other, and it is defined for each container and for each server. Its domain is (−1, 1), where a larger absolute value indicates a more severe imbalance, highlighting servers that require adjustment to align CPU and memory utilization. The free-space size index quantifies the underutilized resource capacity of a server; as the overall free-space size increases, the resource fragments are further reduced. The overall free-space size of the system is the sum of the free-space sizes of all servers.
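A minimal sketch of how such indices can be computed is given below; the concrete formulas (a simple utilization difference and a weighted free-capacity sum) are assumptions for illustration, not the paper's exact definitions.

```python
# Illustrative index computations (assumed formulas, not the paper's exact definitions).
def imbalance_index(cpu_used, cpu_cap, mem_used, mem_cap):
    """Signed disparity between CPU and memory utilization; range (-1, 1)."""
    return cpu_used / cpu_cap - mem_used / mem_cap

def free_space(cpu_used, cpu_cap, mem_used, mem_cap, w_cpu=0.5, w_mem=0.5):
    """Weighted normalized free capacity of a server."""
    return w_cpu * (1 - cpu_used / cpu_cap) + w_mem * (1 - mem_used / mem_cap)

# Example: a server with CPU-heavy load is positively imbalanced.
print(imbalance_index(80, 96, 64, 256))   # ~0.58 -> CPU over-utilized relative to memory
print(free_space(80, 96, 64, 256))        # ~0.46 of capacity free on average
```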
The detailed procedure of RDG is outlined in Algorithm 1. The core objective of RDG is to systematically select containers for migration from imbalanced servers to target servers, aiming to improve overall resource balance and reduce fragmentation. In line 2, RDG computes the priority of each server as a weighted combination of its resource imbalance index and its free-space size index, where a weight controls the relative importance of the imbalance index. This priority favors servers that are both highly imbalanced and have little free space, as these are the most critical candidates for defragmentation. RDG then sorts the servers by priority. In lines 4–19, RDG attempts to migrate containers from the server with the highest priority. In lines 8–9, RDG evaluates the migration priority of containers through a correlation metric between the container and the server: a higher positive correlation indicates that the container's resource profile (its own imbalance) strongly aligns with and potentially exacerbates the server's overall imbalance, while a resource-demand term ensures that containers with significant resource demands are considered. Thus, containers with the highest positive scores are prioritized for migration, as they are likely the main contributors to the server's imbalance.
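As an illustration only, the server priority and the container–server correlation might be computed as in the following sketch; the weighting scheme and the correlation form are assumptions consistent with the description above, not the paper's equations.

```python
# Illustrative priority and correlation scores (assumed forms, for intuition only).
def server_priority(server_imbalance, server_free_space, w=0.5):
    """High when the server is strongly imbalanced and has little free space."""
    return w * abs(server_imbalance) + (1 - w) * (1 - server_free_space)

def container_correlation(container_imbalance, server_imbalance, container_demand):
    """Positive when the container's skew aligns with (and worsens) the server's skew;
    the demand term keeps containers with significant resource demands in consideration."""
    return container_imbalance * server_imbalance + container_demand
```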
Algorithm 1 Resource Defragmentation approach based on Greedy strategy (RDG)
Input: the initial placement matrix and the maximum allowed number of migrations
Output: the resource defragmentation decision
1: Compute the resource imbalance index and the free-space size of each server
2: Compute the priority of each server and sort the servers in descending order of priority to form a priority list L
3: Initialize the migration counter to 0
4: for each server in L do
5:   if the migration counter has reached the maximum allowed number of migrations then
6:     Break
7:   end if
8:   Identify the containers on the current server and compute the correlation between each container and the server
9:   Sort the containers in descending order of correlation to obtain the migration candidates
10:  for each migration candidate do
11:    Find the optimal target server
12:    if an optimal target server exists and the migration is feasible then
13:      Migrate the candidate container to the target server
14:      Update the server states and the indices
15:      Increment the migration counter
16:      Break
17:    end if
18:  end for
19: end for
20: return the resource defragmentation decision
Containers are then ranked by the correlation value, and the container with the highest positive score is marked as the top migration candidate. The target server in line 11 must have sufficient resources and, among the feasible servers, must minimize the increase in overall system fragmentation after the hypothetical migration. If such a target server exists, the container is migrated from the source server to the target server, and the system state metrics are updated. This process repeats over the servers in priority order until the migration counter reaches the maximum allowed number of migrations. The output of RDG is a resource defragmentation decision matrix. Based on the greedy strategy, RDG ensures efficient resource defragmentation, thereby improving overall resource utilization efficiency.
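The greedy target-server choice in line 11 can be sketched as follows; the best-fit criterion used here is a simplified stand-in for the fragmentation-increase criterion described above, and the Server structure is assumed for illustration.

```python
# Illustrative greedy choice of a migration target (simplified best-fit stand-in for the
# "smallest increase in overall fragmentation" criterion described in the text).
from dataclasses import dataclass

@dataclass
class Server:
    cpu_free: float
    mem_free: float

def pick_target(cpu_req, mem_req, source_idx, servers):
    """Return the index of a feasible server that the container packs into most tightly,
    or None if no server can host it."""
    best_idx, best_leftover = None, float("inf")
    for i, s in enumerate(servers):
        if i == source_idx or s.cpu_free < cpu_req or s.mem_free < mem_req:
            continue  # infeasible target
        leftover = (s.cpu_free - cpu_req) + (s.mem_free - mem_req)
        if leftover < best_leftover:
            best_idx, best_leftover = i, leftover
    return best_idx

# Example: migrate a container needing 16 CPUs and 32 units of memory away from server 0.
servers = [Server(20, 40), Server(18, 36), Server(64, 200)]
print(pick_target(16, 32, 0, servers))  # -> 1 (tightest feasible fit)
```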
4.2.2. RL-RDG
In RL-RDG, the RDG algorithm is used to solve the resource defragmentation subproblem, and the Q-learning algorithm is used to solve the task-scheduling subproblem.
Q-learning is a common reinforcement learning algorithm that learns the rewards of specific actions in given states by constructing and updating a Q-table. The Q-table consists of Q-values representing the expected reward of taking an action in a specific state. In RL-RDG, the state contains the resource allocation status of each server. An action assigns a set of one or more queued tasks to the corresponding servers, and it is valid only if those servers have enough free CPU and memory to accommodate the assigned tasks. Executing an action causes a system state transition, and the reward is defined in accordance with the optimization objective. The Q-value is then updated with the standard Q-learning rule, in which a discount factor weights future rewards.
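For reference, the standard tabular Q-learning update that this description corresponds to is shown below, where $\alpha$ is the learning rate and $\gamma$ the discount factor; the exact form used in the paper may differ in detail.

```latex
% Standard Q-learning update (textbook form, shown for reference):
\[
  Q(s,a) \leftarrow Q(s,a) + \alpha \Bigl[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \Bigr]
\]
```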
The detailed procedure of RL-RDG is outlined in Algorithm 2. The algorithm operates through nested loops. The outer loop (Lines 2–16) continues until no valid actions exist in the current state or all queued tasks have been allocated. The inner loop learns Q-values by updating the Q-table. In each iteration,
Line 2: The RDG algorithm first performs container migration based on the current system state to optimize resource fragmentation, updating the server state (H);
Lines 3–6: The Q-table is initialized, and the existence of valid actions in the initial state is checked;
Lines 8–14: The inner Q-learning loop selects actions using the ε-greedy policy, observes rewards and next states, and updates Q-values until convergence;
Line 15: The optimal action sequence is selected based on the learned Q-values to schedule tasks.
Algorithm 2 RL-RDG
Input: the queued tasks, the running containers, the server states, and the learning parameters
Output: the resource defragmentation and task-scheduling decisions
1: repeat
2:   Migrate containers according to the RDG algorithm and update the server state H
3:   Obtain the initial state and its set of valid actions, and initialize the Q-table
4:   if no valid action exists then
5:     Break
6:   end if
7:   Set the current state to the initial state
8:   repeat
9:     if queued tasks remain then
10:      Select an action according to the ε-greedy policy
11:      Observe the next state and the reward R, and update the Q-table
12:      Update the current state
13:    end if
14:  until the Q-table converges
15:  Select the optimal action sequence based on the learned Q-values to schedule tasks
16: until no valid actions exist or all queued tasks are allocated
17: return the task allocation records and migration decisions
The action selection strategy in line 10 is the ε-greedy policy: the action that maximizes the Q-value is selected with high probability, while the remaining exploration probability is shared uniformly among the optional actions, so that the number of optional actions determines each action's share. The policy thus classifies optional actions into two categories: one category consists of the action(s) that maximize(s) the Q-value, and the other includes all remaining actions. The inner iterative process continues until the Q-table converges. The convergence criterion is defined as either reaching the maximum iteration limit of 1000 or the maximum change in Q-values between consecutive iterations falling below a threshold of 0.001. It is worth noting that the iteration limit and the threshold value can be adjusted according to specific application scenarios and problem scales. After task scheduling, RDG is triggered to perform container migration based on the updated server state. The outer iteration continues until no valid actions exist in the current state or all queued tasks have been allocated.
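A minimal Python sketch of ε-greedy selection over the valid actions is shown below; the uniform handling of exploration is the textbook form and is assumed here.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict mapping each valid action to its current Q-value.
    With probability 1 - epsilon pick a greedy action, otherwise explore uniformly."""
    actions = list(q_values)
    if random.random() < epsilon:
        return random.choice(actions)          # exploration
    best = max(q_values.values())
    greedy = [a for a in actions if q_values[a] == best]
    return random.choice(greedy)               # exploitation (ties broken randomly)
```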
4.2.3. DRL-RDG
In our proposed DRL-RDG, the deep Q-network algorithm replaces the traditional Q-learning algorithm to solve the task-scheduling subproblem. The deep Q-network is a common deep reinforcement learning algorithm that learns the expected long-term rewards of specific actions in given states using a neural network. It can overcome the limitation of Q-learning’s discrete Q-table, which fails to handle high-dimensional state spaces efficiently.
In DRL-RDG, there are two neural networks, an online network Q and a target network, together with a replay memory D. The two networks have the same structure but maintain different parameter vectors. Each network contains four layers: an input layer, two fully connected hidden layers, and an output layer. For a given state and action, the online network and the target network each output an estimated Q-value. The replay memory D stores the observed experience tuples (state, action, reward, next state), which are used to train the neural network.
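The four-layer structure described above (input layer, two fully connected hidden layers, and output layer) can be sketched in PyTorch as follows; the layer widths and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Input: flattened system state; output: one Q-value per candidate action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),   # first fully connected layer
            nn.Linear(hidden, hidden), nn.ReLU(),      # second fully connected layer
            nn.Linear(hidden, action_dim),             # output layer: Q-values
        )

    def forward(self, state):
        return self.net(state)

# Online and target networks share the architecture but not the parameters.
online = QNetwork(state_dim=100, action_dim=50)
target = QNetwork(state_dim=100, action_dim=50)
target.load_state_dict(online.state_dict())  # initial synchronization
```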
The DRL-RDG algorithm is outlined in Algorithm 3. The algorithm operates in two nested loops:
The outer loop (Lines 2–25) iterates until all queued tasks are scheduled. In each iteration, RDG first performs container migration to optimize resource fragmentation (Line 3). The initial state is then constructed from the updated system state (Line 4). If no valid scheduling actions exist from this state, the loop terminates (Lines 5–7).
The inner loop (Lines 8–23) represents the DQN training and decision-making phase over multiple episodes and steps. Within each step, if queued tasks exist, an action is selected using the ε-greedy policy (Line 12). This policy balances exploration (selecting a random action with probability ε) and exploitation (selecting the action with the highest Q-value from Q). After executing the action, the system observes the new state and reward, stores the experience in the replay memory (Line 13), and updates the current state (Line 14).
Algorithm 3 DRL-RDG
Input: the queued tasks, the running containers, and the server states
Output: the resource defragmentation and task-scheduling decisions
1: Initialize the replay memory D, the online network Q with random parameters, and the target network with the same parameters
2: repeat
3:   Migrate containers according to the RDG algorithm and update the server state H
4:   Obtain the initial state
5:   if no valid action exists then
6:     Break
7:   end if
8:   for episode = 1, …, E do
9:     Set the current state to the initial state
10:    for step = 1, …, C do
11:      if queued tasks remain then
12:        Select an action according to the ε-greedy policy
13:        Observe the next state and the reward R, and store the experience in D
14:        Update the current state
15:      end if
16:      Sample a batch of experiences from D
17:      Compute the target values and update Q to minimize the loss
18:      step = step + 1
19:      if mod(step, updatestep) = 0 then
20:        Synchronize the target network with Q
21:      end if
22:    end for
23:  end for
24:  Select the optimal action based on the learned Q-values to schedule tasks
25: until no valid actions exist or all queued tasks are allocated
26: return the task allocation records and migration decisions
The ε-greedy policy in line 12 is based on Equation (16), with the Q-values given by the online network Q. The neural network training phase (Lines 16–23) involves sampling a batch of experiences (B) from the replay memory (D). Based on these experience samples, the key idea of the update of Q is to minimize the difference between the Q-values predicted by the online network (Q) and the target values generated by the target network ($\hat{Q}$). The loss function is expressed as $L = \frac{1}{|B|} \sum_{(s,a,R,s') \in B} \bigl( R + \gamma \max_{a'} \hat{Q}(s',a') - Q(s,a) \bigr)^{2}$, where $|B|$ is the cardinality of the set B. This mean squared error loss ensures that the online network's predictions gradually align with the more stable target values.
Every updatestep steps, the target network is updated by copying Q; that is, the target network parameters are synchronized with the online network parameters (Lines 19–21) to stabilize training. The iteration between RDG and the deep Q-network is the same as the iteration between RDG and Q-learning in the RL-RDG algorithm; it also continues until no valid actions exist in the current state or all queued tasks have been allocated.
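Under the same assumptions as the network sketch above, one training step on a sampled batch (target computation, mean squared error loss, and periodic target synchronization) might look like the following; the tensor layout and the absence of a terminal-state mask are simplifications.

```python
import torch
import torch.nn.functional as F

def train_step(online, target, optimizer, batch, gamma=0.9):
    """batch: (states, actions, rewards, next_states) tensors sampled from replay memory."""
    states, actions, rewards, next_states = batch
    # Q-values predicted by the online network for the actions actually taken.
    q_pred = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target values from the (frozen) target network: R + gamma * max_a' Q_hat(s', a').
    with torch.no_grad():
        q_next = target(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next
    loss = F.mse_loss(q_pred, q_target)   # mean squared error over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Periodically (every `updatestep` steps): target.load_state_dict(online.state_dict())
```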
5. Performance Evaluation
In this section, we use simulations to evaluate the performance of RL-RDG and DRL-RDG.
The system investigated in this paper is a queuing cloud computing system. The queuing environment is simulated by configuring servers with 96 CPUs and 256 GB of memory. The analysis of a Google trace shows that the number of tasks arriving every 5 min follows a Poisson distribution [32], and the task duration follows a heavy-tailed distribution, with 80% of jobs in the trace being shorter than the average job duration [32,33]. Therefore, in the simulation, the number of tasks arriving in each time slot is drawn from a Poisson distribution whose rate equals the configured task arrival rate, while the task duration and the resource requirements follow heavy-tailed distributions [25]. The CPU requirement of tasks follows the log-normal distribution LN(3, 0.75), with the maximum CPU requirement set to 64. The memory requirement of tasks follows the log-normal distribution LN(4, 0.75), with the maximum memory requirement set to 128. The task duration follows the Pareto distribution Pa(1.5, 2). The weight coefficients of CPU and memory utilization are both set to 0.5, and the weight coefficients of the average task completion time and the resource utilization rate are set to 0.1 and 0.9, respectively.
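For concreteness, the following numpy snippet sketches how such a synthetic workload can be drawn; the Poisson rate lam and the random seed are assumptions (the arrival rate is tied to the experiment scenario), and resource requirements are capped at the stated maxima.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_slot(lam=50, max_cpu=64, max_mem=128):
    """Draw one time slot of tasks: (cpu, memory, duration) per arriving task."""
    n = rng.poisson(lam)                                   # number of arrivals in the slot
    cpu = np.minimum(rng.lognormal(mean=3.0, sigma=0.75, size=n), max_cpu)
    mem = np.minimum(rng.lognormal(mean=4.0, sigma=0.75, size=n), max_mem)
    dur = (rng.pareto(a=1.5, size=n) + 1) * 2              # Pareto(shape = 1.5, scale = 2)
    return np.column_stack([cpu, mem, dur])

tasks = generate_slot()
print(tasks.shape)  # (n_tasks, 3)
```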
To evaluate the efficiency of DRL-RDG, we investigate the performance in terms of the task completion time and the resource utilization rate under various server cluster scales and task intensities, which are controlled through different settings of the number of servers (m) and the task arrival rate. The benchmark methods include FIFO-RDG, SJF-RDG, and DRL-o. FIFO-RDG and SJF-RDG are approaches under the two-layer iterative framework, while DRL-o only considers task scheduling without resource defragmentation. Their core mechanisms are introduced as follows:
SJF-RDG: In the first layer, the Shortest Job First (SJF) algorithm, which prioritizes tasks with shorter execution times, is employed to solve the task-scheduling subproblem. In the second layer, the RDG algorithm is employed to solve the resource defragmentation subproblem.
FIFO-RDG: In the first layer, the First In–First Out (FIFO) algorithm, which schedules tasks in the chronological order of their arrival, is employed to solve the task-scheduling subproblem. In the second layer, the RDG algorithm is employed to solve the resource defragmentation subproblem.
DRL-o: In DRL-o, only the deep Q-network algorithm is employed to solve the task-scheduling problem without resource defragmentation.
5.1. Server Cluster Scale
In this scenario, the number of servers varies from 30 to 70, and the average number of tasks arriving per time slot is equal to the number of servers. This parameter setting is designed to simulate a high-intensity and dynamically scalable cluster environment, where the system faces consistent task intensity in terms of the task-to-server ratio as the number of servers increases.
Figure 3 shows the performance comparison in terms of the task completion time and the resource utilization rate under various server cluster scales.
Table 2 shows the quantitative comparison of performance metrics between DRL-RDG and RL-RDG.
Figure 3a shows that DRL-RDG and RL-RDG outperform SJF-RDG and FIFO-RDG in terms of average task completion time. This is because both algorithms can learn and dynamically adjust scheduling strategies to adapt to load changes, unlike the relatively fixed strategies of SJF-RDG and FIFO-RDG. DRL-RDG and RL-RDG also outperform DRL-o due to resource defragmentation. Specifically, for DRL-RDG, as the number of servers increases, its advantage over RL-RDG in average latency becomes more significant. In high-dimensional scenarios (with more servers), DRL-RDG can better handle complex state spaces, while RL-RDG is limited by insufficient Q-value learning, resulting in slightly higher latency than DRL-RDG. Compared with FIFO-RDG, the SJF-RDG scheduling strategy prioritizes short tasks, preventing short tasks from being blocked by long tasks and thereby reducing the overall average latency. DRL-o's average task completion time lies between those of RL-RDG/SJF-RDG and FIFO-RDG: while DRL-o benefits from DRL's adaptive learning ability, it suffers from persistent resource fragmentation, leading to longer queuing times.
Figure 3b,c show that DRL-RDG and RL-RDG achieve relatively high resource utilization compared to SJF-RDG, FIFO-RDG, and DRL-o. DRL-RDG shows a slightly better trend as the number of servers increases, which is related to its better performance in handling complex scenarios. Note that as the number of servers increases, the utilization rate of these algorithms exhibits a slight improvement. This is because more servers mean more fragments can be integrated, and the efficiency of the overall resource pool is enhanced. In summary, the learning-based DRL-RDG and RL-RDG algorithms show significant advantages compared to the traditional SJF-RDG and FIFO-RDG in terms of average task completion time and resource utilization.
5.2. Task Arrival Rate
In this scenario, the number of servers is fixed at 50, while the task arrival rate varies incrementally from 40 to 60. This parameter configuration is intentionally designed to simulate a fixed-scale cluster under dynamically varying task intensities.
Figure 4 shows the performance comparison in terms of the task completion time and the resource utilization rate under various task arrival rates.
Table 3 shows the quantitative comparison of performance metrics between DRL-RDG and RL-RDG.
Figure 4a shows that as the task arrival rate increases, the average task completion time of all algorithms shows an upward trend. This is because a higher task arrival rate increases the probability of task queuing, thereby prolonging the average task completion time. Specifically, as the task arrival rate increases, SJF-RDG reduces the average completion time more effectively than FIFO-RDG by prioritizing short-duration tasks. DRL-o exhibits an average task completion time that is comparable to that of SJF-RDG. In contrast to SJF-RDG, RL-RDG and DRL-RDG further decrease the average completion time. This is achieved by comprehensively accounting for the task duration time, resource requirements of tasks, and the long-term implications of the current task-scheduling strategy.
Figure 4b,c show that as the task arrival rate increases, the average CPU and memory utilization of algorithms first rise and then stabilize. In the initial stage of an increasing task arrival rate, more tasks make better use of CPU and memory resources, thereby increasing resource utilization. When the task arrival rate reaches a certain level, the system attains a steady state, leading to stabilized resource utilization. Compared to RL-RDG and DRL-RDG, DRL-o exhibits lower average CPU and memory utilization. Since DRL-o does not perform resource defragmentation, scattered idle resources cannot be consolidated and allocated to tasks effectively. Compared to RL-RDG and DRL-RDG, SJF-RDG lacks long-term strategic optimization and, thus, exhibits more fluctuating utilization patterns.
5.3. Statistical Analysis of Performance Differences
To quantitatively validate the observed performance differences between DRL-RDG and RL-RDG, we conduct Mann–Whitney U tests. The statistical analysis results presented in
Table 4 demonstrate that all
p-values are below the 0.05 significance level, confirming the statistical significance of our findings across all evaluation metrics and system scales. The statistical results confirm the performance advantages of our proposed DRL-RDG approach.