Article

Dynamic Task Scheduling Based on Greedy and Deep Reinforcement Learning Algorithms for Cloud–Edge Collaboration in Smart Buildings

School of Electronics and Artificial Intelligence, Shaanxi University of Science & Technology, Xi’an 710021, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3327; https://doi.org/10.3390/electronics14163327
Submission received: 9 August 2025 / Revised: 18 August 2025 / Accepted: 19 August 2025 / Published: 21 August 2025

Abstract

Driven by technologies such as the Internet of Things and artificial intelligence, smart buildings have developed rapidly, and the demand for processing massive amounts of data has risen sharply. Traditional cloud computing faces challenges such as high network latency and heavy bandwidth pressure. Although edge computing can effectively reduce latency, it suffers from resource limitations and difficulties with cluster collaboration. Cloud–edge collaboration has therefore become an inevitable choice for meeting the real-time and reliability requirements of smart buildings. To address the shortcomings of existing task-scheduling methods in the smart building scenario (ignoring container compatibility constraints, struggling to balance global optimization with real-time performance, and adapting poorly to dynamic environments), this paper proposes a two-stage cloud–edge collaborative dynamic task-scheduling mechanism. First, a task-scheduling system model supporting container compatibility is constructed, aiming to minimize system latency and energy consumption while meeting the real-time requirements of tasks. Second, a hierarchical, progressive solution is designed for this scheduling problem: in the first stage, a Resource-Aware Cost-Driven Greedy algorithm (RACDG) enables edge nodes to quickly generate initial task-offloading decisions; in the second stage, for the tasks marked for offloading in the initial decision, a Proximal Policy Optimization algorithm based on Action Masks (AMPPO) achieves global dynamic scheduling. Finally, simulation experiments comparing the proposed algorithm with other classical algorithms show that it reduces system delay by 26–63.7% and energy consumption by 21.7–66.9%, while maintaining a task completion rate above 91.3% under high-load conditions. The approach exhibits good scheduling robustness and application potential, providing an effective solution for cloud–edge collaborative task scheduling in smart buildings.

1. Introduction

With the rapid development of advanced technologies such as the Internet of Things, cloud computing, and artificial intelligence, smart buildings, as an important component of urban intelligence, are gradually becoming a research hotspot [1,2,3]. Smart buildings achieve intelligent automation of functions such as environmental control, energy management, and security monitoring by integrating various sensors, intelligent devices, and information systems [4]. However, as smart buildings continue to develop, the demand for processing massive amounts of data grows daily. The traditional centralized cloud computing architecture suffers from high network latency and heavy bandwidth pressure when processing these data, and it struggles to meet smart buildings' strict requirements for real-time performance and reliability [5]. Edge computing, by processing data locally, can effectively reduce latency, relieve pressure on the cloud, and respond quickly to real-time tasks such as equipment failure early warning [6]. However, when edge computing is applied to building control, the relative independence of building campuses means that edge devices cannot form large-scale edge clusters, and edge-layer autonomy based on collective intelligence cannot overcome the local resource constraints of individual edge nodes. Cloud–edge collaboration has therefore become an inevitable choice for the development of smart buildings. It combines the powerful computing capacity of cloud computing with the real-time processing advantages of edge computing, scheduling computationally intensive tasks to the cloud while executing latency-sensitive tasks locally [7], thereby improving service responsiveness and resource utilization in smart buildings.
Task scheduling, a critical element of cloud–edge collaboration [8], dynamically assigns tasks to cloud or edge resources according to their specific needs. This intelligent allocation optimizes resource utilization and accelerates task completion. The approach sees broad application across diverse sectors, including industrial manufacturing, intelligent transportation, and intelligent healthcare [9,10,11,12]. However, due to the characteristics of smart buildings, such as their heterogeneous and limited edge node resources, diverse task types, and dynamic and bursty task generation, the existing task scheduling research has the following limitations.
(1)
Ignoring container compatibility constraints: Most existing studies assume that edge nodes have homogeneous computing resources and that tasks can by default be flexibly migrated to any node for execution, without fully considering constraints such as container availability in actual deployments [13]. In practice, each containerized application has independent functions and operating logic, and tasks can only run in suitable application containers. For example, a video transcoding task relies on containers with specific codec libraries, while a data analysis task requires containers adapted to dedicated algorithm frameworks.
(2)
Difficulty in balancing global optimization and real-time performance: Global optimization algorithms based on reinforcement learning and similar techniques require global state modeling and complex computation, resulting in high decision latency [14], which makes it difficult to handle sudden events such as fire alarms in smart buildings. Heuristic algorithms, although fast, easily settle on locally optimal solutions [15]. For example, greedy algorithms are prone to causing resource imbalance and cannot account for global optimization requirements such as energy consumption, delay, and resource utilization.
(3)
Difficulty in adapting to dynamic environments: Meta-heuristic algorithms such as particle swarm optimization and grey wolf optimization [16] lack environmental perception and learning capabilities. In complex scenarios such as dynamically fluctuating resource states in smart buildings, they are prone to falling into local optima and have difficulty adjusting strategies dynamically, so task scheduling lags behind real-time demand and cannot respond efficiently to changes in the system state.
To overcome the limitations of prior research, this study introduces a two-stage cloud-edge collaborative dynamic task-scheduling mechanism. This approach leverages the rapid solving capability of heuristic algorithms and the adaptive nature of deep reinforcement learning, seeking the optimal trade-off among latency, energy consumption, and task completion rate. The primary contributions of this work are summarized as follows:
  • A task-scheduling system model with container compatibility constraints is developed for the typical application scenario of smart buildings. Under the assumption of static container deployment, the model jointly considers task characteristics, real-time requirements, and edge node resource status to optimize the overall task latency and energy consumption. Additionally, a timeout penalty mechanism based on each task’s maximum tolerable delay is introduced to ensure the timeliness of critical tasks.
  • A two-stage dynamic task scheduling mechanism is proposed: In the first stage, a Resource-Aware Cost-Driven Greedy scheduling algorithm (RACDG) is designed to make rapid offloading decisions at edge nodes. In the second stage, the Proximal Policy Optimization algorithm based on Action Mask (AMPPO) is introduced to optimize the task-scheduling strategy based on the global state of the system, mask illegal actions, improve the global scheduling performance, and adapt to the dynamic task load changes in the smart building.
  • A simulation environment for the smart building scenario is established to simulate real conditions such as dynamic arrival of tasks according to the Poisson process, heterogeneous edge node resources, and limited deployment of application containers. Moreover, a variety of representative scheduling algorithms are selected for comparative experiments. The experimental results show that the proposed algorithm can significantly improve the task completion rate under high-load conditions, reduce system delay and energy consumption, and demonstrate good scheduling robustness and practical application potential.
The rest of the paper is organized as follows. The related work is presented in Section 2. Then, Section 3 describes the system model and the objective optimization problem. Section 4 describes the proposed algorithm in detail. Section 5 presents the simulation results and analysis. Finally, we conclude our work in Section 6.

2. Related Work

2.1. Cloud–Edge Collaborative Computing

In the field of Internet of Vehicles (IoV), Shu et al. [17] proposed the Adaptive Computing Offloading and Resource Allocation Strategy (ACORAS), which optimizes the offloading of computing tasks and resource allocation for vehicular devices by leveraging the synergistic benefits of cloud and edge computing. This strategy significantly reduces both latency and energy consumption. Specifically, ACORAS employs the Particle Swarm Optimization (PSO) algorithm to dynamically adjust offloading decisions and resource allocation strategies, thus enhancing computational efficiency and response speed within the IoV environment. This provides a practical solution for the application of cloud-edge collaboration in IoV scenarios. In intelligent healthcare systems, the cloud-edge collaboration architecture similarly demonstrates its advantages in processing large-scale medical data and improving the responsiveness of healthcare services. Su et al. [12] proposed a bi-level optimization scheduling model based on the cloud-edge collaboration framework, aimed at optimizing resource scheduling in intelligent healthcare systems, particularly the efficient utilization of Distributed Generation (DG), Energy Storage (ES), and Controllable Load (CL). By offloading tasks to edge computing nodes and integrating cloud processing, the system ensures timely medical data processing while also minimizing transmission delays and energy consumption. In summary, the potential of cloud-edge collaboration architecture is progressively being realized across various industries. As the technology continues to mature, it is expected to play an increasingly pivotal role in a wider range of application scenarios in the future.

2.2. Task-Scheduling Algorithm

Existing research on task offloading and scheduling mainly falls into heuristic algorithms, meta-heuristic algorithms, and reinforcement learning methods.
Heuristic algorithms are known for their intuitive nature and relative simplicity, enabling the rapid generation of feasible solutions. To address the limited computational capabilities of terminal devices and optimize the offloading of dependent tasks to edge nodes or the cloud, thereby preventing execution delays and errors, Zhang et al. [18] developed a greedy strategy-based offloading algorithm aimed at reducing response time. Similarly, Hao et al. [19] introduced a heuristic greedy approach to the task-offloading problem, specifically targeting the minimization of the last task’s response time. They further integrated a domain search algorithm to devise an offloading strategy, effectively tackling the response time optimization issue. This combined method significantly reduced latency and reflected a global optimization perspective. Nevertheless, a common drawback of such algorithms is their dependence on fixed rules, which often hinders their adaptability to dynamic environments characterized by fluctuating system states.
Meta-heuristic algorithms can usually achieve faster convergence speed and have the ability to handle multiple objectives. Xiao et al. [20] proposed an improved binary particle swarm optimization algorithm to minimize task delay and energy consumption, enabling the joint optimization of task offloading and content caching. Jia et al. [21] developed a modified whale optimization algorithm targeting the reduction of task-scheduling time, cost, and virtual machine load within cloud computing environments. Song et al. [22] established a task-offloading model for mobile edge computing and proposed a particle swarm optimization algorithm based on fuzzy rules to minimize application completion time and energy consumption. However, these algorithms have shortcomings such as high computational complexity, insufficient real-time performance, limited adaptability to dynamic environments, difficulties with parameter optimization, etc., and may affect scheduling efficiency due to iteration time consumption and hardware computing power limitations in dynamic scenarios such as smart buildings.
Reinforcement learning is a data-driven approach that adaptively learns an optimal policy by interacting with the environment. When the characteristics of the task-offloading problem are not fully known or cannot be captured by predefined rules or policies, reinforcement learning is well suited to solving it. Wang et al. [23] adopted the Q-learning algorithm to optimize response time for UAV (Unmanned Aerial Vehicle) task offloading. Zhao et al. [24] proposed a deep reinforcement learning-based service request scheduling method with pointer networks, aiming to optimize resource utilization and reduce runtime and waiting time in edge computing. Resource states in cloud–edge collaborative scenarios are time-varying; reinforcement learning emphasizes interaction with the environment, and deep learning algorithms continue to advance, so integrating the two technologies in cloud–edge scenarios is one of the future trends. Sellami et al. [25] employed DRL for dynamic task scheduling, yielding marked improvements in energy efficiency and latency. Tang et al. [26] leveraged task-priority DRL to optimize cloud–edge offloading strategies, cutting average energy and delay costs.
Among various reinforcement learning algorithms, Proximal Policy Optimization (PPO) has been widely applied to task offloading and resource scheduling problems in recent years due to its strong stability and convergence. Compared with traditional policy gradient methods, PPO improves the stability and sample efficiency of the training process by restricting the magnitude of policy updates, which prevents overly large updates. This makes PPO particularly suitable for edge computing environments with complex state spaces and dynamic resource conditions. Chen et al. [27] proposed a computation offloading strategy based on PPO and self-attention mechanisms, which significantly enhanced the computation offloading efficiency and resource allocation performance in MEC-powered smart factories. Liu et al. [28] proposed a computation offloading and resource allocation strategy based on PPO to address the issues of delay and resource scarcity caused by multi-user task offloading, significantly optimizing delay performance in MEC environments. Although PPO has achieved remarkable results in various task scheduling applications, determining how to handle more complex scheduling demands and achieve optimal scheduling strategies under limited resources in smart building scenarios remains an open issue.
In the smart building scenario, Feng et al. [29] examined in depth the task-scheduling challenges of edge computing in a smart building environment. To address challenges including real-time processing constraints, high energy consumption, disordered resource matching, and diversified functional requirements, they adopted the SSA-GA optimization algorithm and a task-scheduling model based on an improved AHP method. These approaches simultaneously optimized processing latency and energy efficiency while enhancing user satisfaction and experience. Shen et al. [30] addressed the limited resources of edge computing environments and the high delay and energy consumption of task offloading in smart buildings: they used differential dictionary coding compression to estimate the data compression ratio and its overhead, and combined the Levy flight algorithm with an improved grey wolf algorithm to find the optimal offloading scheme, significantly reducing the overall delay and energy consumption of task offloading. Nevertheless, these studies mostly rely on static or idealized system models and lack comprehensive consideration of key practical factors such as dynamic task generation, edge-node resource heterogeneity, and container deployment constraints. As a result, they show limited adaptability and generalization in typical smart-building scheduling scenarios involving sudden task-load changes and complex resource constraints.
Considering the limitations of existing research and algorithms, this paper proposes a two-stage collaborative scheduling algorithm that combines local perception with global optimization. A two-level strategy of heuristic fast decision-making followed by deep reinforcement learning refinement balances computational efficiency against optimization accuracy and improves the system's adaptability and real-time responsiveness in dynamic environments.

3. System Model and Problem Definition

3.1. System Model

Most mainstream cloud–edge collaboration architectures adopt a three-layer cloud–edge–device structure centered on the edge layer, in which numerous edge devices are interconnected to form edge clusters that control the cloud–edge collaboration process. In a smart building environment, however, each area is independent: edge devices must rely on the cloud for information exchange and cannot use cluster intelligence to manage the edge layer as a whole. It is therefore necessary to place the cloud at the center and manage all edge devices and the cloud–edge collaboration process in a unified way [31].
Therefore, in view of the application characteristics of edge computing in smart building scenarios and the current limitations in the field of building control, this paper constructs a cloud-management-edge-device four-layer cloud-edge collaboration overall system model architecture, as shown in Figure 1.
Application cloud platform layer: Deployed on the cloud server, this layer provides four core application services: data visualization and analysis, remote device monitoring, AI model training and maintenance, and controller configuration management. It forms the upper-layer support for the intelligent operation of smart-building services.
Cloud management platform layer: As a general capability component of the platform, independent of the application cloud, this layer focuses on three functions: edge device management, resource scheduling management, and cloud–edge collaborative control. Task scheduling and offloading decisions are made mainly at this layer, which allocates tasks appropriately between the edge and management layers and improves overall resource utilization and task response efficiency.
Edge node layer: This layer consists of edge controller hardware and the node operating software running on it. Nodes communicate with the field device layer and connect to the cloud via Wi-Fi or 4G/5G. Part of the subsystem control logic is sunk from the cloud to the edge nodes and runs in the form of application containers, so each node retains independent building management and control capabilities even when offline.
Device layer: This layer includes the lighting, temperature control, video monitoring, and security subsystems in the building, connected to the edge nodes through an industrial field bus or Ethernet cable.
In the management and control of smart buildings, the triggering, computation, and result execution of computing tasks at each edge node proceed as shown in Figure 2, involving three layers of the overall cloud–edge collaboration architecture: the application cloud, the cloud management node, and the edge nodes.
In Figure 2, the orange arrows represent all possible event-triggering conditions for each subsystem; the black arrows represent the flow of tasks actually executed in each node; the red arrows represent tasks that each edge node hopes to offload to the cloud for execution; the blue arrows represent the final decision of the cloud to allocate tasks to edge nodes or accept requests for execution in the cloud; and the green arrows represent the calculation results returned after the completion of the calculation task.
Compared with traditional architectures, this architecture is more realistic in terms of task sources, execution modes, and resource modeling. In traditional architectures, tasks are usually triggered by end devices, and the edge controller directly allocates resources to execute them upon receipt; cloud resources are commonly presumed infinite, with tasks executed instantaneously after offloading. In this architecture, tasks are instead issued by timed triggers in the edge nodes, by containerized applications, or by the application cloud platform.
In addition, this architecture requires all tasks to run in specific application containers [32]. If the edge node does not deploy the required containers, the tasks need to be offloaded to the cloud for execution. Compared with the traditional scheduling mechanism assuming “unlimited cloud resources”, the resource pool of the cloud management node in the model established in this paper is limited, and even if the tasks are offloaded to the cloud, they need to queue for resource allocation, which improves the fit degree of the model to the actual deployment conditions and simulation reliability.
Based on the above architecture, the task-scheduling system consists of a cloud management node and edge nodes, represented by the set $N = \{0, 1, 2, \ldots, n\}$, where 0 denotes the cloud management node. The total computing resources of the management node and each edge node are represented by the set $F = \{f_0^{\max}, f_1^{\max}, f_2^{\max}, \ldots, f_n^{\max}\}$, the total bandwidth resources by the set $W = \{w_0^{\max}, w_1^{\max}, w_2^{\max}, \ldots, w_n^{\max}\}$, and the total storage resources by the set $S = \{s_0^{\max}, s_1^{\max}, s_2^{\max}, \ldots, s_n^{\max}\}$.

3.1.1. Task Model

Tasks are a series of independent jobs that arrive dynamically at the edge nodes at arbitrary times and in random order, represented by the set $M = \{m_1, m_2, \ldots, m_k\}$, where $k$ is the total number of tasks. Each arriving task carries its basic information, denoted $m_k = \{c_k, d_k, v_k, t_k^{\max}\}$, where $c_k$ is the amount of computation required to complete task $m_k$, $d_k$ is the amount of data the task requires, $v_k$ is the application container task $m_k$ needs, and $t_k^{\max}$ is the maximum tolerable delay of task $m_k$.
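For concreteness, the task tuple defined above can be expressed as a small data structure. The following Python sketch is illustrative; the field names are our own rather than taken from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Task m_k = (c_k, d_k, v_k, t_k^max) from the task model."""
    c: float      # computation required to complete the task (e.g., CPU cycles)
    d: float      # data volume the task must transmit (e.g., bits)
    v: int        # index of the application container the task requires
    t_max: float  # maximum tolerable delay (s)

# Example: a video-transcoding task bound to container 3
task = Task(c=2.0e9, d=5.0e6, v=3, t_max=1.5)
```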

3.1.2. Decision-Making Model

In order to compute the system's scheduling policy at each moment more accurately, the execution process of the task-scheduling system is divided into $T$ time periods, indexed by $t \in \Gamma = \{1, \ldots, T\}$. The application repository of the cloud management node holds $V$ different application containers, represented by the set $V = \{1, 2, \ldots, v\}$, so we assume each task has a corresponding service container. The same container can execute multiple different tasks, but each container can process only one task at a time. The many-to-one relationship between tasks and containers is expressed as follows: for any task $m_k \in M$ and $v \in V$, $v = \mathrm{map}(m_k)$ indicates that task $m_k$ must be processed in container $v$. When multiple application containers are deployed on the cloud management node and edge nodes simultaneously, the storage resource constraint of each node must be satisfied:
$$\sum_{v \in V} a_n^v(t)\, h_n^v \le s_n^{\max}, \quad \forall t, \forall n$$
where $a_n^v(t) \in \{0, 1\}$ denotes whether node $n$ deploys container $v$ in time period $t$, and $h_n^v$ denotes the storage resources node $n$ allocates to the container. To reduce modeling complexity and deployment overhead, this work assumes a static container deployment strategy: containers remain fixed during the task-scheduling process and are not migrated dynamically.
In each time period, every edge node reports a set of tasks to be offloaded to the management node, denoted $x_n(t) = \{m_{n,1}, m_{n,2}, \ldots, m_{n,k}\}$, where $x_n(t)$ is the queue of $k$ tasks that node $n$ wishes to offload to the cloud for execution in time period $t$. After aggregation by the cloud management node, these queues are represented as $X(t) = \{x_1(t), x_2(t), \ldots, x_n(t)\}$. To describe the task-scheduling policy executed by the system in time period $t$, let
$$y(t) = \{\, y_n^{m_k}(t) : n \in N,\; m_k \in x_n(t) \,\}$$
where $y_n^{m_k}(t) \in \{0, 1\}$ is the final scheduling decision for task $m_k$ in the to-be-offloaded set reported by node $n$ in time period $t$: $y_n^{m_k}(t) = 1$ means the task is offloaded to the cloud for execution, while $y_n^{m_k}(t) = 0$ means it is executed at the edge node.

3.1.3. Delay Model

The total task completion time in this architecture is primarily determined by computation time, data transmission latency, and queuing delays.
A task’s computation time is determined by its computational requirements and the computing power of the executing node. For task m k , the computation time needed to execute the task is
$$t_c^{m_k} = \left(1 - y_n^{m_k}(t)\right) \frac{c_k}{f_n^{m_k}(t)} + y_n^{m_k}(t)\, \frac{c_k}{f_0^{m_k}(t)}$$
where $f_n^{m_k}(t)$ is the computing resource allocated by node $n$ to task $m_k$ during time period $t$.
The transmission time is the time needed to transfer the task's data. If task $m_k$ is executed at the edge node, $t_{tr}^{m_k} = 0$. When the task is offloaded from the edge node to the cloud, the data transmission time of task $m_k$ is
$$t_{tr}^{m_k} = y_n^{m_k}(t)\, \frac{d_k}{r_{n,0}}$$
where $r_{n,0}$ is the data transmission rate between edge node $n$ and the cloud management node, expressed as
$$r_{n,0} = w_n^{m_k}(t) \log_2 \left( 1 + \frac{h_n(t)\, p_{n,tr}}{\sigma^2} \right)$$
where $w_n^{m_k}(t)$ is the channel bandwidth node $n$ uses to transmit the data of task $m_k$ in time period $t$; $h_n(t)$ is the channel gain between edge node $n$ and the cloud management node, i.e., the network state; $p_{n,tr}$ is the transmission power of edge node $n$; and $\sigma^2$ is the Gaussian white noise power.
The queuing delay of a task is the time it waits at a node because resources are insufficient. Each task's waiting time equals the accumulated computation time of the tasks ahead of it in the queue, expressed as
$$t_{w,q}^{m_k} = \sum_{m_{k'} \in Q(n)} t_c^{m_{k'}}$$
where $Q(n)$ denotes the set of tasks queued ahead of $m_k$ at node $n$. If task $m_k$ is the first task in the queue, then $t_{w,q}^{m_k} = 0$.
The total time taken to execute the tasks at any node $n \in N$ in time period $t$ is
$$T_{total} = \sum_{m_k \in M} \left( t_c^{m_k} + t_{tr}^{m_k} + t_{w,q}^{m_k} \right)$$
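Putting the three delay components together, a minimal Python sketch of the per-task delay computation might look as follows; the helper names and the reuse of the Task structure from Section 3.1.1 are illustrative assumptions.

```python
import math

def transmission_rate(w, h, p_tr, sigma2):
    """r_{n,0} = w * log2(1 + h * p_tr / sigma^2) (Shannon capacity)."""
    return w * math.log2(1 + h * p_tr / sigma2)

def task_delay(task, y, f_edge, f_cloud, r_n0, queue_wait):
    """Per-task delay: computation + transmission + queuing.

    y = 1 offloads the task to the cloud; y = 0 executes it at the edge.
    queue_wait is the accumulated compute time of tasks ahead in the queue.
    """
    t_c = (1 - y) * task.c / f_edge + y * task.c / f_cloud
    t_tr = y * task.d / r_n0   # zero when the task stays at the edge
    return t_c + t_tr + queue_wait
```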

3.1.4. Energy Consumption Model

Task execution energy consumption primarily comprises computation and communication components.
Computation energy refers to the power consumed by processing tasks at a given node, expressed as
$$E_c^{m_k} = \left(1 - y_n^{m_k}(t)\right) p_{n,c}\, \frac{c_k}{f_n^{m_k}(t)} + y_n^{m_k}(t)\, p_{0,c}\, \frac{c_k}{f_0^{m_k}(t)}$$
where $p_{n,c}$ is the computing power of node $n$, usually expressed as $p_{n,c} = b f_n^3$, where $b$ is a coefficient determined by the chip architecture [33].
Communication energy refers to power consumption during data transmission, formulated as
$$E_{tr}^{m_k} = y_n^{m_k}(t)\, p_{n,tr}\, t_{tr}^{m_k}$$
Thus, total task energy consumption comprises the sum of computation and communication energy, formulated as
$$E_{total} = \sum_{m_k \in M} \left( E_c^{m_k} + E_{tr}^{m_k} \right)$$

3.2. Optimization Objective

This paper jointly considers the nature of each task, its real-time requirements, and node resources; optimizes the total delay and energy consumption of system tasks executed at the edge or in the cloud; and introduces a timeout penalty term based on each task's maximum tolerable delay to guarantee the real-time performance of critical tasks. The specific objective function is as follows:
$$\min \sum_{r=1}^{R} \left( \alpha T_{total}^{r} + \beta E_{total}^{r} + P_i \right)$$
$$\begin{aligned}
\text{s.t.}\quad
& C1: \sum_{v \in V} a_n^v(t)\, h_n^v \le s_n^{\max}, && \forall n \in N,\; \forall t \in \Gamma \\
& C2: 0 < f_n^{m_k}(t) \le f_n^{\max}, && \forall n \in N,\; \forall m_k \in M,\; \forall t \in \Gamma \\
& C3: 0 < \textstyle\sum_{m_k \in M} f_n^{m_k}(t) \le f_n^{\max}, && \forall n \in N,\; \forall t \in \Gamma \\
& C4: 0 < w_n^{m_k}(t) \le w_n^{\max}, && \forall n \in N,\; \forall m_k \in M,\; \forall t \in \Gamma \\
& C5: 0 < \textstyle\sum_{m_k \in M} w_n^{m_k}(t) \le w_n^{\max}, && \forall n \in N,\; \forall t \in \Gamma
\end{aligned}$$
where
$$P_i = \begin{cases} \mu \cdot \left(T_{total}^{i} - t_i^{\max}\right), & \text{if } T_{total}^{i} > t_i^{\max} \\ 0, & \text{otherwise} \end{cases}$$
is the timeout penalty term of a task, whose penalty coefficient $\mu$ can be set according to the task's maximum tolerable delay; $\alpha, \beta \in [0, 1]$ are weight coefficients satisfying $\alpha + \beta = 1$, representing the tasks' sensitivity to delay and energy consumption, respectively; $R$ is the total number of tasks in the system in this time period; $T_{total}^{r}$ is the completion time of the $r$th task; and $E_{total}^{r}$ is the energy consumed by executing the $r$th task. $C1$ is the edge-node storage resource constraint, $C2$ and $C3$ are the edge-node computing resource constraints, and $C4$ and $C5$ are the edge-node bandwidth resource constraints.
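To make the objective concrete, the following sketch evaluates the cost of Equation (11) for one time period, given per-task delay and energy values; the weights and penalty coefficient are illustrative defaults, not values from the paper.

```python
def task_cost(T_total, E_total, t_max, alpha=0.5, beta=0.5, mu=1.0):
    """Weighted delay + energy plus the timeout penalty P_i (alpha + beta = 1)."""
    penalty = mu * (T_total - t_max) if T_total > t_max else 0.0
    return alpha * T_total + beta * E_total + penalty

def system_cost(per_task_metrics, alpha=0.5, beta=0.5, mu=1.0):
    """Objective of Equation (11): sum of per-task costs over the R tasks."""
    return sum(task_cost(T, E, t_max, alpha, beta, mu)
               for (T, E, t_max) in per_task_metrics)
```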

4. Greedy and Deep Reinforcement Learning-Based Task Scheduling Algorithm

4.1. Problem Decomposition

In practical deployments, edge nodes and the cloud management node are physically separated, so some delay in the information transmission process is inevitable, and achieving complete information synchronization between edge nodes and the management node would carry a large time cost. This paper therefore divides the task-scheduling problem into two stages. The first stage is executed by each edge node in a distributed manner: considering only its own resources, each node provides an initial task-offloading decision to the cloud. The second stage is performed centrally by the cloud management node: from the perspective of global resources, it combines the offloading decisions of the edge nodes to generate the final task-scheduling policy. The specific scheduling flow is shown in Figure 3.
Theoretically, although Equation (11) provides a comprehensive mathematical formulation of the task-scheduling objective, incorporating multiple performance dimensions such as task completion delay, energy consumption, and deadline violation penalties, the model essentially constitutes a Mixed-Integer Nonlinear Programming (MINLP) problem. As the number of tasks $M$ and computing nodes $N$ increases, the solution space expands exponentially, with a theoretical complexity of $O(N^M \times T^M)$, where $T$ denotes the number of scheduling time steps. This renders the problem NP-hard, for which no polynomial-time algorithm is known to guarantee exact solutions in large-scale scenarios.
In practical deployment environments, directly solving this global optimization model incurs substantial computational overhead and latency, making it unsuitable for real-time and responsive scheduling systems. To address this challenge, this study builds upon the proposed two-stage cooperative scheduling framework and introduces a hybrid optimization strategy that integrates a greedy heuristic with deep reinforcement learning. This approach significantly reduces algorithmic complexity while effectively approximating near-optimal solutions, thereby enabling a practical and efficient edge–cloud collaborative scheduling solution.

4.2. Distributed Edge Node Decision Phase

At this stage, each edge node participates in task scheduling from a local perspective, assuming that only itself and a single cloud-management node exist within the entire task decision-making system. It is further assumed that the cloud management node will fully execute the scheduling strategy proposed by the edge node. As a result, each edge node does not need to consider the global system state but only needs to make preliminary scheduling decisions for received tasks based on locally observable information.
Given the localized nature of scheduling decisions at this stage, the greedy algorithm is adopted as an efficient and practical approach due to its characteristic of selecting the currently optimal choice at each decision point. This algorithm offers low computational complexity and minimal implementation overhead, making it well suited for deployment in resource-constrained edge computing environments. Moreover, the greedy strategy can achieve near-optimal scheduling performance in most scenarios, effectively meeting the requirements for decision-making timeliness and system responsiveness. Based on these considerations, this paper proposes a resource-aware cost-driven greedy scheduling algorithm for this stage, with its specific execution procedure illustrated in Algorithm 1.
Algorithm 1: Greedy scheduling algorithm based on a resource-aware cost-driven approach.
In Algorithm 1, each edge node first acquires the information of every task in the task queue awaiting allocation, including its required computation workload, data volume, necessary container, and maximum tolerable latency. The node then sequentially selects tasks from this queue for distribution. For each task, it checks whether the node contains the container required for its execution. If the container is not available, the task is placed into the edge node’s offloading queue. If the container is available, the algorithm calculates the execution cost of the task on both the edge and the cloud. When the edge execution cost is lower, the task is executed locally by adding it to the edge node’s local execution queue, followed by an update of the node’s resource state. Conversely, if the cloud execution cost is lower, the task is added to the offloading queue, removed from the distribution queue, and the queue is subsequently updated.
The time complexity of the algorithm is analyzed as follows: Initially, each edge node retrieves the task information from the task allocation queue, and each retrieval operation takes constant time, $O(1)$. Since this operation is performed independently for each task, the total time complexity for retrieving the information is $O(T)$, where $T$ is the number of tasks in the queue. Afterward, the algorithm sequentially selects tasks and checks the availability of the required containers, which is also a constant-time operation, $O(1)$. Subsequently, the algorithm calculates the execution costs of the task on both the edge and the cloud, a simple mathematical operation that takes constant time, $O(1)$. Based on the calculation results, the task is placed in the appropriate queue, and the update operation likewise takes constant time, $O(1)$. Therefore, each operation in the task-processing phase has constant time complexity.
In conclusion, the overall time complexity of the algorithm is dominated by traversing the task queue and processing each task, resulting in a total time complexity of $O(T)$, where $T$ is the number of tasks in the task allocation queue.
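A minimal Python sketch of the first-stage greedy procedure described above is given below; `edge_cost`, `cloud_cost`, and the node attributes are assumed helpers rather than the paper's actual interfaces, and the Task structure from Section 3.1.1 is reused.

```python
def racdg_decide(node, tasks):
    """First-stage greedy offloading decision at one edge node (Algorithm 1 sketch).

    Returns (local_queue, offload_queue) for the tasks awaiting allocation.
    """
    local_queue, offload_queue = [], []
    for task in tasks:
        # Container compatibility check: without the required container,
        # the task must be offloaded to the cloud.
        if task.v not in node.containers:
            offload_queue.append(task)
            continue
        # Compare the weighted delay/energy cost of local vs. cloud execution.
        if edge_cost(node, task) <= cloud_cost(node, task):
            local_queue.append(task)
            node.reserve_resources(task)  # update the local resource state
        else:
            offload_queue.append(task)
    return local_queue, offload_queue
```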

4.3. Cloud Management Node Decision Phase

In this phase, the objective of task scheduling is to decide whether to offload tasks to the cloud or distribute them to other edge nodes according to the global resource situation. Due to the need for real-time scheduling in environments characterized by dynamic global resource states, and the challenges arising from large state spaces and complex action selection, this paper proposes a PPO-based dynamic scheduling algorithm incorporating action masking. This approach facilitates more reasonable decision-making and improves resource utilization efficiency.

4.3.1. Markov Decision Process Modeling

Reinforcement learning is a machine learning method that learns optimal strategies through interactions between agents and the environment. In cloud–edge collaborative task scheduling, agents try different scheduling decisions and adjust strategies according to reward signals fed back by the environment to maximize long-term cumulative rewards. To formally describe this process, the cloud–edge collaborative task-scheduling problem is modeled as a Markov Decision Process (MDP), represented by the five-tuple $(S, A, P, R, \gamma)$, whose elements denote, from left to right, the state space, action space, state transition function, reward function, and discount factor. In this paper, the three key elements of state space, action space, and reward function are defined as follows:
State space: In the process of task scheduling, the scheduler will make decisions according to the current task information and the current resource state information of each edge server and cloud server, so the state space consists of the above two kinds of information. The state space is defined as
$$S = \left\{\, s_t \mid s_t = (m, r, q) \,\right\}$$
where $s_t$ is the state at the $t$th time step; $m = \{c, d, v, t^{\max}\}$ is the task's basic information, namely the computation required for task completion, the data volume, the container required for execution, and the maximum tolerable delay; $r = \{f, w\}$ is the current resource state of each edge server and the cloud server, namely the available computing and bandwidth resources; and $q = Q(n)$ is the current execution queue of each node.
Action space: The scheduler must decide whether each task is executed on the cloud node or an edge node, so the action space is defined as
$$A = \{0, 1, \ldots, n\}$$
where action 0 means the task is offloaded to the cloud for execution, action 1 means the task is assigned to edge node 1, and so on.
Reward function: The reward function is usually tied to the objective function. The optimization goal in this paper is to minimize the system's task completion time, energy consumption, and task timeout penalty in each time period, while the agent seeks to maximize the reward, so the reward function is defined as the negative cost:
$$R = -\left( \alpha T_{total} + \beta E_{total} + \mu \cdot \frac{T_{total}^{i} - t_i^{\max}}{t_i^{\max}} \right)$$

4.3.2. PPO Dynamic Scheduling Algorithm Based on Action Mask

Proximal Policy Optimization (PPO) is a reinforcement learning method based on policy gradients, aimed at optimizing the policy to maximize long-term rewards. This algorithm was first proposed by Schulman et al. [34], with its key innovation lying in the introduction of a clipped objective function, built upon traditional policy gradient methods (e.g., TRPO), to stabilize the training process. This modification prevents excessively large policy updates, significantly enhancing both the stability of training and sample efficiency. PPO is based on the Actor–Critic architecture, where the Actor network generates action policies based on the current state and optimizes the policy through gradient ascent to maximize expected returns. The Critic network, on the other hand, evaluates the value of each state and provides feedback to help the Actor network refine its policy. Based on the advantages of the PPO algorithm, this paper applies it to global task scheduling decisions, addressing dynamic resource demands and ensuring efficient task allocation and resource utilization.
To avoid selecting nodes with insufficient resources or container mismatches during task scheduling, this paper incorporates an action-masking mechanism into the PPO algorithm to filter out actions that are infeasible under the current system state [35]. Specifically, at each scheduling decision step, the system generates an action mask from the current resource status $r = \{f, w\}$ and task requirements $m = \{c, d, v, t^{\max}\}$. For the action space $A = \{0, 1, \ldots, n\}$, an action mask vector $\mathit{Mask} = [mask_0, mask_1, \ldots, mask_n] \in \{0, 1\}^{n+1}$ is generated, where the entry for the $n$th action is defined as
$$mask_n = \begin{cases} 1, & \text{if node } n \in F(c, d, v) \\ 0, & \text{otherwise} \end{cases}$$
Here, $F(c, d, v)$ denotes the set of feasible nodes that satisfy the task's execution requirements under the current state, defined as
$$F(c, d, v) = \left\{\, n \mid v \in V,\; f_n \ge c,\; w_n \ge d \,\right\}$$
In this formulation, $f_n$ and $w_n$ denote the remaining computing resources and bandwidth of node $n$, respectively; $c$ and $d$ are the task's required computation and data volume; $V$ is the set of container types available on node $n$; and $v$ is the container type the task requires.
A mask value of 1 indicates a feasible action under the current state, whereas a value of 0 denotes an infeasible action due to constraint violations. Actions masked as 0 are assigned zero probability in the policy network’s output, thereby restricting the exploration to valid actions and enhancing the convergence and robustness of policy learning.
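In implementation terms, the mask can be built from the feasibility rule above and applied to the policy logits before sampling; a minimal PyTorch-style sketch follows, in which the node attributes are illustrative assumptions.

```python
import torch

def build_mask(task, nodes):
    """mask_n = 1 iff node n hosts the required container and has enough resources."""
    return torch.tensor(
        [1.0 if (task.v in n.containers and n.f >= task.c and n.w >= task.d)
         else 0.0
         for n in nodes])

def masked_distribution(logits, mask):
    """Give infeasible actions zero probability before sampling an action."""
    masked_logits = logits.masked_fill(mask == 0, float('-inf'))
    return torch.distributions.Categorical(logits=masked_logits)
```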
In this algorithm, the output $V(s_t)$ of the Critic network is the estimated value of state $s_t$. We use Generalized Advantage Estimation (GAE) to compute the advantage function $\hat{A}_t$, which estimates the advantage as a weighted sum of multi-step temporal-difference (TD) errors, minimizing bias while keeping variance low. It is defined as
$$\hat{A}_t^{GAE} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \delta_{t+l}$$
where $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ is the TD error, $\gamma$ is the discount factor, and $\lambda$ is a hyperparameter. The objective function of the PPO algorithm is
$$L_t^{CLIP}(\theta) = \mathbb{E}_t \left[ \min \left( Pr_t(\theta) \hat{A}_t,\; \mathrm{clip}\left(Pr_t(\theta), 1 - \epsilon, 1 + \epsilon\right) \hat{A}_t \right) \right]$$
where $Pr_t(\theta) = \frac{\pi(a_t \mid s_t; \theta)}{\pi(a_t \mid s_t; \theta_{old})}$ is the probability ratio between the new and old policies; $\epsilon$ is the hyperparameter controlling the clipping interval; and $\mathrm{clip}(\cdot)$ constrains $Pr_t(\theta)$ to lie within the interval $[1 - \epsilon, 1 + \epsilon]$.
The PPO algorithm not only optimizes the policy but also updates the value function $V(s_t)$ by minimizing the mean squared error:
$$L^{VF}(\theta) = \mathbb{E}_t \left[ \left( V(s_t; \theta) - R_t \right)^2 \right]$$
where $V(s_t; \theta)$ is the value prediction for the current state and $R_t = \sum_{k=0}^{n} \gamma^k r_{t+k}$ is the cumulative return.
At the same time, to encourage exploration and prevent premature convergence to local optima, an entropy regularization term is introduced:
$$L^{ENT}(\theta) = \mathbb{E}_t \left[ H\left( \pi_\theta(s_t) \right) \right]$$
where $H(\pi_\theta(s_t))$ is the entropy of the policy, measuring the uncertainty of the policy distribution.
Thus, the total loss function of the algorithm can be expressed as
$$L(\theta) = \mathbb{E}_t \left[ L^{CLIP}(\theta) - c_1 L^{VF}(\theta) + c_2 L^{ENT}(\theta) \right]$$
where $c_1$ is the value-function loss coefficient and $c_2$ is the entropy regularization coefficient.
The policy network is initialized with the same parameters as the value network. The parameter update process is expressed as
$$\theta \leftarrow \theta + lr \cdot \nabla_\theta L(\theta)$$
where $lr$ is the learning rate and $\nabla_\theta L(\theta)$ is the gradient of $L(\theta)$ with respect to the parameters $\theta$.
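As an illustrative complement to the update rules above, a compact PyTorch-style sketch of the advantage estimation and the combined loss is shown below; the coefficient defaults are common PPO settings, and the function signatures are our own assumptions rather than the paper's implementation.

```python
import torch

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation, computed by a backward pass."""
    adv, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]  # TD error delta_t
        running = delta + gamma * lam * running
        adv[t] = running
    return torch.tensor(adv)

def ppo_loss(logp_new, logp_old, advantages, values, returns, entropy,
             eps=0.2, c1=0.5, c2=0.01):
    """Clipped surrogate - c1 * value loss + c2 * entropy, negated for descent."""
    ratio = torch.exp(logp_new - logp_old)               # Pr_t(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    l_clip = torch.min(ratio * advantages, clipped * advantages).mean()
    l_vf = ((values - returns) ** 2).mean()
    return -(l_clip - c1 * l_vf + c2 * entropy.mean())
```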
The algorithm framework proposed for the decision stage of the cloud management node is shown in Figure 4, and the pseudo-code of the training process is shown in Algorithm 2.
In Algorithm 2, we first randomly initialize the policy network, value network, and experience replay pool. At the beginning of each round, the system environment state is reset, the resource states of all edge nodes and the cloud are initialized, and the tasks to be offloaded are collected to generate the cloud's allocation queue. The algorithm then enters the task-scheduling loop (lines 6–19): the policy network generates action probabilities from the current task attributes and resource status; all nodes are checked against constraints such as container availability, computing resources, and bandwidth, and the action mask is set dynamically; after mask filtering, the offloading decision is executed; task delay, energy consumption, and timeout penalties are computed to generate the reward; and the state transition data is stored in the experience pool. At the end of each round (lines 20–25), data is sampled from the experience pool according to the advantage estimates, the policy and value network parameters are updated with the PPO algorithm, and the policy is optimized via the policy gradient. Finally, the memory is cleared and the next round begins, until the model converges or the preset number of training rounds is reached. A compact sketch of this loop is given after Algorithm 2.
Algorithm 2: PPO dynamic scheduling algorithm based on action masks.
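The skeleton below restates the training loop just described, under an assumed environment API (`reset`, `step`, `action_mask`), reusing `masked_distribution` from the earlier sketch, and with `update_ppo` as a hypothetical helper wrapping `gae_advantages` and `ppo_loss`; it is a sketch, not the paper's implementation.

```python
def train(env, actor, critic, episodes=10_000, horizon=2048):
    """Outer loop of Algorithm 2: collect masked rollouts, update periodically."""
    buffer = []
    for episode in range(episodes):
        state, done = env.reset(), False      # reset nodes, gather offload queue
        while not done:
            mask = env.action_mask(state)     # feasibility per the mask rule
            dist = masked_distribution(actor(state), mask)
            action = dist.sample()
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, dist.log_prob(action)))
            state = next_state
            if len(buffer) >= horizon:        # update every 2048 timesteps
                update_ppo(actor, critic, buffer)  # gae_advantages + ppo_loss
                buffer.clear()
```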
The time complexity of this algorithm is primarily influenced by the number of episodes $N$, the number of tasks per episode $T$, and the complexity of the policy network $K$. Below, we systematically analyze the time complexity of each stage and derive the overall time complexity of the algorithm.
(1)
Initialization Stage: At the beginning of each episode, the policy network, value network, and memory are randomly initialized. These operations are independent of the number of tasks or episodes; hence, the time complexity of this stage is $O(1)$.
(2)
Environment State and Task Queue Generation: At the start of each episode, the system's environment state is reset, and the resource states of all edge nodes and the cloud are initialized; this reset takes $O(1)$. The algorithm then collects the tasks to be offloaded and generates the cloud's task allocation queue. Assuming $T$ tasks per episode, generating the queue takes $O(T)$.
(3)
Task Scheduling Loop: The policy network generates action probabilities based on the current task attributes and resource states; the computational complexity of this step depends on the network architecture (e.g., the number of layers and parameters) and is denoted $O(K)$. For each task to be scheduled, the algorithm then traverses all nodes to verify container availability and resource sufficiency; with $M$ nodes, this action-masking step costs $O(T \times M)$. After mask filtering, the algorithm executes the offloading decision and computes each task's delay, energy consumption, and timeout penalty to generate rewards; since each task is processed individually, this part takes $O(T)$. Finally, storing the state transition data in memory incurs a constant cost of $O(1)$. The overall time complexity of the task-scheduling loop is therefore $O(T \times (K + M)) \approx O(T \times K)$, since the action-masking cost $O(T \times M)$ is negligible compared with policy inference and can be absorbed into the overall scheduling complexity.
(4)
Policy Update: At the end of each episode, data is sampled from the memory, and the PPO algorithm updates the policy and value network parameters. The complexity of PPO depends mainly on the network architecture, so its time complexity is $O(K)$. Clearing the memory takes $O(1)$.
Therefore, based on the above analysis, the time complexity of each iteration is $O(T \times K)$, and the overall time complexity is $O(N \times T \times K)$.

5. Simulation and Analysis

5.1. Simulation Settings

A cloud-edge collaborative task scheduling simulation platform for smart building scenarios was developed based on Python 3.8 and the deep learning framework PyTorch 2.4.1. The experimental system consists of five heterogeneous edge nodes and one cloud management node. The edge nodes exhibit significant differences in computing power, bandwidth, and storage capacity, reflecting the heterogeneous resource characteristics found in real-world scenarios.
To simulate typical service deployment scenarios in smart buildings, all application containers are assumed to be deployed on the cloud, while each edge node randomly deploys 7–9 types of container instances based on its local storage capacity. Regarding task types, 14 typical tasks are defined, covering core functions such as security monitoring, energy optimization, and device control. These tasks vary in computing resource and data transmission requirements, have different maximum tolerable latencies, and exhibit explicit container dependencies. To better reflect the dynamic nature of task requests in practical building systems, task generation weights are assigned to different task types, and the arrival process of tasks is modeled by a Poisson distribution to simulate varying arrival frequencies and system load fluctuations.
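For reference, dynamic task arrivals following a Poisson process can be generated by sampling exponential inter-arrival gaps. The following is a minimal sketch of such an arrival model; the seed and rate are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def poisson_arrivals(rate, horizon):
    """Arrival times (s) of a Poisson process with `rate` tasks/s over `horizon` s."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)  # exponential inter-arrival gap
        if t > horizon:
            return times
        times.append(t)

# e.g., an average of 10 tasks/s over a 60 s window
arrivals = poisson_arrivals(rate=10, horizon=60)
```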
All experiments were conducted on a computer equipped with an Intel® Core™ i5-9300HF (2.40 GHz) processor and 16 GB of RAM. Referring to previous work [36,37], the detailed parameter settings of the system model are shown in Table 1, and the training hyperparameters of the AMPPO algorithm in the second stage are presented in Table 2.
To measure the performance of the proposed algorithms, the following five algorithms were selected as benchmarks for comparison.
(1)
Edge-first strategy (EF): All tasks are preferentially executed on local edge nodes. If edge resources are insufficient, the tasks are discarded;
(2)
Cloud-First strategy (CF): All tasks are, by default, offloaded to the cloud for execution;
(3)
Random Selection strategy (RS): In each scheduling round, tasks are randomly assigned to either the cloud or edge nodes for execution;
(4)
RACDG: the Resource-Aware Cost-Driven Greedy algorithm proposed in Section 4, used as the first-stage solution of this work;
(5)
PSO [36]: a task-offloading strategy based on particle swarm optimization for industrial IoT environments, aiming to minimize task processing delay and energy consumption.

5.2. Model Hyperparameter Settings and Training Cost Analysis

5.2.1. Hyperparameter Settings

To investigate the impact of batch size on model performance, we compare the reward trajectories under BatchSize settings of 32, 64, and 128, as illustrated in Figure 5. The model trained with BatchSize = 64 achieves the highest cumulative reward during the later training stages and exhibits a relatively stable convergence trend. This indicates a favorable trade-off between learning speed and training stability. In contrast, BatchSize = 32 shows rapid reward improvement in early epochs but suffers from larger fluctuations later, implying a risk of overfitting. BatchSize = 128 yields significantly lower rewards overall, suggesting that excessively large batches may reduce update frequency and lead to underfitting. Therefore, BatchSize = 64 is identified as the optimal choice in this scenario considering both performance and stability.
To investigate the influence of learning rate on model performance, this study compares the reward trajectories under three settings: LR = 0.00001, 0.0001, and 0.001, as illustrated in Figure 6. When the learning rate is excessively low (e.g., LR = 0.00001), the model fails to converge effectively throughout the training process, with rewards remaining at a low level (approximately between −900 and −600) and exhibiting considerable volatility, indicating a significant under-update problem. A moderate learning rate (LR = 0.0001) enables gradual convergence, with the reward eventually approaching −400; however, certain oscillations persist in the later training stages. In contrast, LR = 0.001 demonstrates the most favorable convergence behavior and training stability. The reward rapidly increases to a high-performance range within the first 2000 episodes and remains stable thereafter, suggesting that this setting facilitates more efficient policy updates and improved generalization capability. Therefore, LR = 0.001 is selected as the optimal learning rate configuration for this task.
For the remaining hyperparameters, including the discount factor γ , entropy regularization term coefficient c 2 , and GAE hyperparameter λ , we follow the widely adopted default settings established in PPO literature [34,37]. These values have been consistently validated across diverse tasks and are known to ensure stable and robust performance.

5.2.2. Training Cost Analysis

To assess the practical feasibility of the AMPPO model in dynamic environments, the wall-clock training time was recorded. The training of the AMPPO model over 10,000 episodes (1,000,000 timesteps) took approximately 21 min in total, with an average episode time of 0.126 s. The policy was updated every 2048 timesteps, resulting in approximately 488 update iterations during training, with each policy update taking an average of 2.58 s. These empirical results demonstrate that AMPPO achieves efficient training performance, supporting its feasibility for deployment in smart building systems with strict real-time requirements.

5.3. Experimental Results and Analysis

In order to evaluate the performance of different scheduling strategies comprehensively, the following main performance indicators are selected in the experiment:
  • Total Time: Refers to the cumulative time taken for all tasks from their arrival in the system to the completion of execution, measured in seconds (s).
  • Total Energy Consumption: Refers to the sum of computational and communication energy consumed during the execution of all tasks, measured in joules (J).
  • On-Time Completion Rate of Tasks: Refers to the proportion of tasks completed within their maximum tolerable latency, defined as $\mathrm{TCR} = N_{\mathrm{completed}} / N_{\mathrm{total}}$.
Meanwhile, to enhance the statistical robustness of the experimental results, each group of simulations was independently repeated 20 times. For each performance metric, the sample mean and the 95% confidence interval (CI), calculated based on the t-distribution, are reported. The confidence interval quantifies the uncertainty associated with the sample mean, with its upper and lower bounds determined by the sample standard deviation and the corresponding t-critical value. The 95% CI is calculated as
$$CI = \bar{x} \pm t_{0.975} \cdot \frac{s}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the number of repetitions.
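The interval can be computed directly from the 20 repetitions; a minimal sketch using SciPy's t-distribution is shown below, with the sample values invented purely for illustration.

```python
import numpy as np
from scipy import stats

def mean_ci95(samples):
    """Sample mean and 95% t-based confidence interval."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    mean, s = x.mean(), x.std(ddof=1)      # sample mean and std (ddof = 1)
    half = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
    return mean, (mean - half, mean + half)

# e.g., 20 repeated measurements of total task completion time (s)
mean, (lo, hi) = mean_ci95([12.1, 11.8, 12.4, 11.9, 12.0] * 4)
```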

5.3.1. Overall Performance Comparison

Under experimental conditions with a task arrival rate of 10 tasks per second and a total of 600 tasks, a comprehensive performance comparison of various scheduling algorithms was conducted, as illustrated in Figure 7, Figure 8 and Figure 9. To verify the statistical significance of the performance differences, a one-way analysis of variance (ANOVA) was performed. The results indicate the proposed algorithm outperforms the other algorithms in terms of task completion time, energy consumption, and task on-time completion rate. The statistical significance of these differences (p < 0.001) underscores the effectiveness and reliability of the proposed approach.
Due to the resource constraints at edge nodes, the EF strategy performed poorly in this scenario. It completed only approximately 240 tasks, resulting in an average execution time of 4.47 s and a task drop rate as high as 60%. In contrast, all other algorithms successfully scheduled all 600 tasks without any task loss. As shown in Figure 7, both the proposed two-stage collaborative scheduling algorithm and the first-stage RACDG algorithm performed exceptionally well. Notably, the two-stage algorithm further reduced the total task completion time by 16.9% compared to RACDG, and by 63.7%, 42.4%, and 26% compared to the traditional CF, RS, and PSO algorithms, respectively.
Figure 8 presents the total energy consumption under different algorithms. The CF strategy, which offloads all tasks to the cloud, incurs significant transmission energy costs, leading to the highest energy consumption and increased instability. In contrast, the EF strategy processes all tasks locally on low-power edge nodes, thereby avoiding high-transmission-energy costs. However, since it completes only about 40% of tasks, its overall energy consumption remains low. The proposed two-stage algorithm significantly reduces the total system energy consumption by 66.9%, 11.7%, 21.7%, and 21.4% compared to the CF, RS, PSO, and RACDG algorithms, respectively.
Figure 9 shows the on-time task completion rates achieved by the different algorithms. Due to severe task discarding under resource limitations, the EF strategy achieves a completion rate of only 23.8%, significantly lower than all the others. While CF completes all tasks, its exclusive reliance on cloud execution results in long transmission delays, causing many tasks to miss their deadlines; its on-time completion rate is therefore only 55.43%. The RS algorithm, which randomly offloads tasks without considering node resource availability, often causes overload and achieves an on-time completion rate of 80.1%. The proposed two-stage collaborative scheduling algorithm performs best, attaining an on-time task completion rate of 93.3%, which is 4% and 5.6% higher than those achieved by the PSO and RACDG algorithms, respectively.

5.3.2. Performance Analysis Under Different Mission Arrival Rates

In this experimental design, the total task count is fixed at 500, and the task arrival rate is varied uniformly over [3, 15] tasks/s. Figure 10, Figure 11 and Figure 12 systematically show the key performance indicators of the different scheduling strategies under these dynamic load conditions: total task completion time, total system energy consumption, and on-time task completion rate.
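A workload of this shape can be sketched as follows. The Poisson arrival assumption and the field names are ours (the paper does not specify its arrival process), while the per-task ranges follow Table 1.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_tasks(n_tasks, arrival_rate):
    # Exponential inter-arrival times give a Poisson stream at the requested
    # rate -- an assumption for illustration. Field ranges follow Table 1.
    arrivals = np.cumsum(rng.exponential(1.0 / arrival_rate, size=n_tasks))
    return [{
        "arrival_s": float(t),
        "cycles": rng.uniform(2e8, 4e9),        # c_k, task compute size
        "data_mb": rng.uniform(1, 50),          # d_k, task data size
        "deadline_s": rng.uniform(0.1, 5.0),    # t_k^max, tolerable latency
        "container": int(rng.integers(14)),     # v_k, one of 14 container types
    } for t in arrivals]

# Sweep the arrival rates evaluated in Figures 10-12.
workloads = {rate: generate_tasks(500, rate) for rate in (3, 6, 9, 12, 15)}
```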
As shown in Figure 10, across the entire range of task arrival rates, the proposed two-stage collaborative scheduling algorithm exhibits significant performance advantages. Under low-load conditions (3 tasks/s), most algorithms achieve relatively low total task completion times. EF and CF show slightly higher total times due to uneven resource utilization and increased transmission overhead, respectively. Even at this stage, the proposed algorithm outperforms both EF and CF with a notably lower total task time. As the load increases to moderate levels (6–9 tasks/s), the total completion time of EF and CF rises sharply. EF suffers from resource bottlenecks at edge nodes, reducing task processing efficiency, while CF incurs substantial latency due to cloud-only task offloading. Although RS, PSO, and RACDG benefit from collaborative mechanisms and show moderate improvements, their task completion times remain higher than that of the proposed algorithm. Under high load scenarios (12–15 tasks/s), the EF algorithm misleadingly shows a reduction in the total task time due to the high number of dropped tasks. However, the actual task completion rate is significantly degraded, indicating extremely low system throughput. Meanwhile, CF experiences a steep increase in total time, exceeding that of all other algorithms, demonstrating that its cloud-centric strategy breaks down under high concurrency. Although RS, PSO, and RACDG alleviate resource contention to some extent, they are still unable to effectively balance the system load in highly concurrent environments. In contrast, the proposed algorithm consistently achieves the lowest total task completion time across all load levels. Moreover, the narrow error bars indicate low performance variance, underscoring the method’s stability and robustness. Compared to benchmark algorithms such as RS, PSO, and RACDG, the proposed method demonstrates superior adaptability and resilience under high-load conditions.
Figure 11 illustrates the total system energy consumption incurred by each scheduling algorithm under varying task arrival rates. As shown in the figure, the energy consumption trends differ significantly across algorithms as the load increases. The proposed two-stage collaborative scheduling algorithm consistently maintains low energy consumption across all load conditions, demonstrating excellent energy efficiency and scalability. In contrast, the CF algorithm exhibits the highest energy consumption at every task arrival rate, with total consumption rising sharply with increasing load. At 15 tasks/s, its energy consumption exceeds 1100 J. This is primarily due to CF’s cloud-centric strategy, which results in substantial communication overhead and energy costs from frequent task offloading and data transmission. Furthermore, CF shows large error bars, indicating high variability and poor performance stability. The EF algorithm processes tasks exclusively on low-power edge nodes, thereby avoiding the energy overhead associated with cloud communication. However, due to resource constraints, EF completes only around 40% of all tasks. Consequently, its low-energy-consumption figures are misleading, as they do not reflect actual scheduling effectiveness or system throughput. Benchmark algorithms such as RS, PSO, and RACDG demonstrate moderate energy performance. Among them, PSO shows a significant increase in energy consumption and large variance under high-load conditions (15 tasks/s), highlighting its limited robustness. Overall, the proposed algorithm achieves the lowest or second-lowest energy consumption at all load levels. Notably, under the high-load scenario of 15 tasks/s, it outperforms RS, PSO, and RACDG by a substantial margin, while maintaining minimal variance, which further confirms its robustness and stability in dynamic environments.
Figure 12 presents the on-time task completion rate under varying task arrival rates. As the arrival rate increases, the system load intensifies, leading to a noticeable decline in the real-time performance of most algorithms. Despite this trend, the proposed two-stage collaborative scheduling algorithm consistently maintains an on-time completion rate above 91.3%, demonstrating strong robustness and adaptability under dynamic load conditions. PSO and RACDG also exhibit good adaptability, with their completion rates declining gradually as the load increases. In contrast, the EF algorithm performs the worst in terms of real-time task handling. Since EF relies solely on edge resources, it becomes severely constrained under high load, resulting in substantial task dropping. Consequently, its on-time completion rate decreases dramatically from 65% to below 20%, failing to meet real-time requirements in high-concurrency scenarios. Although the CF algorithm is capable of completing all tasks, its on-time completion rate is significantly impacted by cloud transmission delays. As the task arrival rate increases, its timely completion rate declines from approximately 77% to around 40%, indicating that fully cloud-based offloading strategies suffer from poor real-time responsiveness under severe communication bottlenecks. Additionally, the RS algorithm lacks effective resource-awareness. When the arrival rate reaches 9 tasks/s, its on-time completion rate exhibits a marked drop, further revealing its limitations in dynamic scheduling under increasing system pressure.

5.3.3. Performance Analysis by Total Number of Tasks

In this group of experiments, the task arrival rate was set to 10 tasks/s, and the total number of tasks was drawn uniformly from [200, 1000]. Figure 13, Figure 14 and Figure 15 show the performance of the different scheduling strategies in terms of total task completion time, total system energy consumption, and task completion rate under different total task counts.
Figure 13 illustrates the impact of increasing task volume on the total task completion time across different scheduling algorithms. As the total number of tasks grows, all algorithms exhibit an upward trend in execution time. Among them, the EF and CF algorithms show the most pronounced increases. In particular, CF’s total completion time rapidly reaches or even exceeds 2000 s when the task count surpasses 600, primarily due to the high communication overhead of full cloud offloading. The RS and PSO algorithms demonstrate relatively stable performance at moderate task scales. However, when the number of tasks increases to 800 and 1000, both algorithms experience a sharp acceleration in completion time, revealing their limited scalability under high-load conditions. Compared to these benchmark methods, the RACDG algorithm exhibits better scalability, with a more gradual increase in task completion time as the task volume grows from 200 to 1000. Notably, the proposed two-stage collaborative scheduling algorithm consistently achieves the lowest execution time across all task volumes. Its growth trend remains the most moderate, clearly demonstrating its strong scalability and effectiveness in handling large-scale task scheduling scenarios.
Figure 14 illustrates the total system energy consumption of each scheduling algorithm under varying task volume conditions. As the number of tasks increases, all algorithms exhibit an upward trend in energy consumption, reflecting the growing processing load associated with larger task scales. The EF algorithm consistently shows the lowest energy consumption. However, it is important to note that EF only completes a subset of the total tasks due to resource limitations, which results in misleadingly low energy consumption figures. Consequently, despite its seemingly favorable energy metrics, EF exhibits poor overall scheduling performance and lacks practical applicability in real-world scenarios. In contrast, the CF algorithm consistently incurs the highest energy consumption across all task volumes. Its energy usage increases rapidly with task scale, especially when the number of tasks reaches 1000, at which point energy consumption exceeds 1500 J. This is primarily attributed to the significant communication overhead introduced by full cloud offloading. The RS, PSO, and RACDG algorithms maintain relatively low energy consumption when the task volume is small. However, once the number of tasks surpasses 600, their energy usage escalates sharply, indicating reduced scheduling efficiency and imbalanced resource utilization under large-scale workloads. Notably, the proposed two-stage collaborative scheduling algorithm consistently achieves low energy consumption across all experimental conditions. It exhibits a stable and gradual growth trend, and in large-scale scenarios (800 and 1000 tasks), its total energy consumption is significantly lower than that of CF, RS, and PSO. These results confirm the algorithm’s superior energy efficiency and scalability for large-scale task scheduling.
Figure 15 presents the task completion rates of each scheduling algorithm under varying total task volumes. As the task count increases, the proposed two-stage collaborative scheduling algorithm consistently maintains a completion rate above 93.3%, demonstrating strong scalability and robustness. The EF algorithm exhibits persistently low task completion rates due to extensive task dropping under resource constraints. Meanwhile, the CF algorithm also performs poorly in this metric, with its completion rate declining from 63% to 54% as the task volume increases, primarily due to severe transmission latency caused by cloud offloading. The RS algorithm performs relatively well under small task loads. However, as the task count grows, its completion rate drops significantly, indicating limited adaptability and high performance variability. In contrast, both the PSO and RACDG algorithms (RACDG being the first-stage method proposed in this paper) maintain task completion rates above 85% even under high task loads. The rate of decline is relatively modest, highlighting their superior adaptability and effectiveness in large-scale task scheduling scenarios.

5.4. Time Complexity Analysis

To comprehensively evaluate the computational complexity and practical deployability of each scheduling strategy, Table 3 presents a comparative analysis of the theoretical time complexity of the proposed method against the five baseline algorithms. Here, T denotes the total number of tasks, I represents the maximum number of iterations of PSO, P is the number of particles, M is the number of edge nodes, K refers to the per-decision cost of policy network inference, and N denotes the number of training episodes.
Traditional heuristic approaches (EF, CF, RS) do not rely on sophisticated resource-awareness mechanisms or policy learning procedures, resulting in a linear time complexity of O ( T ) . These strategies are suitable for scenarios with stringent real-time requirements but relatively low demands on scheduling precision. RACDG, as a resource-aware greedy scheduling algorithm, can rapidly adapt to current resource states and task demands, achieving locally optimal decisions. While maintaining a time complexity of O ( T ) , it significantly enhances scheduling efficiency, particularly in resource-constrained edge computing environments. PSO, a representative swarm intelligence optimization method, exhibits a time complexity of O ( I · P · T · M ) , increasing linearly with the number of particles and iterations. Although it is suitable for high-precision offline optimization, it suffers from considerable scheduling latency in large-scale online task scenarios.
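To make the O(T) argument concrete, the sketch below shows a greedy loop of the kind RACDG performs: one cost evaluation per candidate node for each task, so the total work grows linearly in T when the node count M is fixed. The Node model and cost proxy are simplified illustrations under our own assumptions, not the paper's exact RACDG formulation.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Minimal node model; fields and cost proxy are illustrative only."""
    cpu_hz: float        # available compute capacity (cycles/s)
    containers: set      # container types this node can run
    load: float = 0.0    # work already queued, in cycles

    def cost(self, cycles: float) -> float:
        # Proxy for the weighted delay/energy cost of running `cycles` here.
        return (self.load + cycles) / self.cpu_hz

def racdg_schedule(tasks, edge_nodes, cloud):
    """One greedy pass: each task evaluates a constant number of candidates
    (M edge nodes + cloud), so total work is O(T) for fixed M."""
    plan = []
    for cycles, container in tasks:                 # O(T) iterations
        feasible = [n for n in edge_nodes if container in n.containers]
        feasible.append(cloud)                      # cloud is always feasible
        best = min(feasible, key=lambda n: n.cost(cycles))
        best.load += cycles                         # commit the decision
        plan.append(best)
    return plan

edges = [Node(5e9, {0, 1}), Node(8e9, {1, 2})]
cloud = Node(1e11, set(range(14)))
plan = racdg_schedule([(3e8, 1), (1e9, 2), (2e9, 5)], edges, cloud)
```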
In contrast, the proposed algorithm leverages a policy network to achieve comprehensive resource awareness and dynamic scheduling decisions. Its training complexity is O(N · T · K), which is relatively high; once trained, however, each online scheduling decision requires only a single forward pass of cost O(K). Under complex and constrained environments, it demonstrates significantly superior scheduling performance, offering stronger generalization ability and practical applicability than the baseline methods.
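The per-decision inference cost K corresponds to one forward pass of the masked policy. A minimal sketch of the standard action-mask mechanism [35] is shown below; the tensor shapes and function name are illustrative assumptions (one logit per candidate node, here five edge nodes plus the cloud), not the paper's exact network.

```python
import torch

def masked_action(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Infeasible actions receive -inf logits, so the softmax assigns them
    # exactly zero probability and they can never be sampled.
    masked_logits = logits.masked_fill(~mask, float("-inf"))
    return torch.distributions.Categorical(logits=masked_logits).sample()

logits = torch.randn(6)  # one logit per candidate: 5 edge nodes + cloud
mask = torch.tensor([True, False, True, True, False, True])  # feasibility
action = masked_action(logits, mask)  # always an index of a feasible node
```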

6. Conclusions

This paper studies the cloud–edge collaborative task scheduling problem for smart buildings. In view of the problems with existing task-scheduling methods in the smart building scenario, such as ignoring container compatibility constraints, difficulty in balancing global optimization with real-time performance, and difficulty in adapting to dynamic environments, a cloud–edge collaborative task scheduling system model was constructed that comprehensively considers the limited resources and heterogeneity of edge nodes as well as the container compatibility and real-time requirements of tasks. To minimize system delay and energy consumption while ensuring the real-time requirements of tasks, a hierarchical, progressive solution was designed: in the first stage, a resource-aware, cost-driven greedy algorithm generates rapid initial task offloading decisions at the edge nodes; in the second stage, a proximal policy optimization algorithm based on action masks achieves global dynamic scheduling. The experimental results show that, compared with other algorithms, the proposed algorithm effectively reduces system delay and energy consumption and maintains a high task completion rate under high load. It offers good scheduling robustness and application potential, and provides an effective solution for cloud–edge collaborative task scheduling in smart buildings. Nevertheless, this study has certain limitations. The proposed task-scheduling model assumes static container deployment and does not consider dynamic container deployment or migration at runtime. In real-world smart building environments, containerized services may migrate due to workload fluctuations and dynamic changes in resource availability, posing greater challenges to task-scheduling strategies. Future work will therefore extend the current model with dynamic container deployment mechanisms, aiming to enhance the flexibility and robustness of the scheduling system and to better align with the requirements of dynamic resource management and quality-of-service assurance in smart building scenarios.

Author Contributions

Conceptualization, P.Y.; Methodology, J.H.; Software, J.H.; Validation, J.H.; Resources, P.Y.; Writing—original draft, J.H.; Writing—review and editing, P.Y.; Visualization, J.H.; Supervision, P.Y.; Project administration, P.Y.; Funding acquisition, P.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research and Development Program Project of Shaanxi Province (Project No.: 2023-YBGY-213) and Special Program Project for Serving Local Areas of Shaanxi Provincial Department of Education (Project No.: 23JC016).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Haiyirete, X.; Zhang, W.; Gao, Y. Evolving Trends in Smart Building Research: A Scientometric Analysis. Buildings 2024, 14, 3023. [Google Scholar] [CrossRef]
  2. Apanaviciene, R.; Vanagas, A.; Fokaides, P.A. Smart Building Integration into a Smart City (SBISC): Development of a New Evaluation Framework. Energies 2020, 13, 2190. [Google Scholar] [CrossRef]
  3. Doukari, O.; Seck, B.; Greenwood, D.; Feng, H.; Kassem, M. Towards an Interoperable Approach for Modelling and Managing Smart Building Data: The Case of the CESI Smart Building Demonstrator. Buildings 2022, 12, 362. [Google Scholar] [CrossRef]
  4. Ghaffarianhoseini, A.; Berardi, U.; AlWaer, H.; Chang, S.; Halawa, E.; Ghaffarianhoseini, A.; Clements-Croome, D. What Is an Intelligent Building? Analysis of Recent Interpretations from an International Perspective. Archit. Sci. Rev. 2015, 59, 338–357. [Google Scholar] [CrossRef]
  5. Jayanetti, A.; Halgamuge, S.; Buyya, R. Deep Reinforcement Learning for Energy and Time Optimized Scheduling of Precedence-Constrained Tasks in Edge–Cloud Computing Environments. Future Gener. Comput. Syst. 2022, 137, 14–30. [Google Scholar] [CrossRef]
  6. Premsankar, G.; Di Francesco, M.; Taleb, T. Edge Computing for the Internet of Things: A Case Study. IEEE Internet Things J. 2018, 5, 1275–1284. [Google Scholar] [CrossRef]
  7. Liu, L.; Zhu, H.; Wang, T.; Tang, M. A Fast and Efficient Task Offloading Approach in Edge-Cloud Collaboration Environment. Electronics 2024, 13, 313. [Google Scholar] [CrossRef]
  8. Sahoo, S.K.; Mishra, S.K. A Survey on Task Scheduling in Edge-Cloud. SN Comput. Sci. 2025, 6, 217. [Google Scholar] [CrossRef]
  9. Song, C.; Zheng, H.; Han, G.; Zeng, P.; Liu, L. Cloud Edge Collaborative Service Composition Optimization for Intelligent Manufacturing. IEEE Trans. Ind. Inform. 2023, 19, 6849–6858. [Google Scholar] [CrossRef]
  10. Zhang, W.; Tuo, K. Research on Offloading Strategy for Mobile Edge Computing Based on Improved Grey Wolf Optimization Algorithm. Electronics 2023, 12, 2533. [Google Scholar] [CrossRef]
  11. Zhu, S.; Zhao, M.; Zhang, Q. Multi-objective Optimal Offloading Decision for Multi-user Structured Tasks in Intelligent Transportation Edge Computing Scenario. J. Supercomput. 2022, 78, 17797–17825. [Google Scholar] [CrossRef]
  12. Su, X.; An, L.; Cheng, Z.; Weng, Y. Cloud-edge collaboration-based bi-level optimal scheduling for intelligent healthcare systems. Future Gener. Comput. Syst. 2023, 141, 28–39. [Google Scholar] [CrossRef]
  13. Urblik, L.; Kajati, E.; Papcun, P.; Zolotová, I. Containerization in Edge Intelligence: A Review. Electronics 2024, 13, 1335. [Google Scholar] [CrossRef]
  14. Thodoroff, P.; Li, W.; Lawrence, N. Benchmarking Real-Time Reinforcement Learning. In Proceedings of the NeurIPS Workshop on Pre-Registration in Machine Learning, 2022; Volume 181, pp. 26–41. Available online: https://proceedings.mlr.press/v181/thodoroff22a.html (accessed on 6 November 2024).
  15. Wang, Y. Review on Greedy Algorithm. Theor. Nat. Sci. 2023, 14, 233–239. [Google Scholar] [CrossRef]
  16. Nujhat, N.; Haque, S.F.; Sarker, S. Task Offloading Exploiting Grey Wolf Optimization in Collaborative Edge Computing. J. Cloud Comput. 2024, 13, 1. [Google Scholar] [CrossRef]
  17. Shu, W.; Yu, H.; Zhai, C.; Feng, X. An Adaptive Computing Offloading and Resource Allocation Strategy for Internet of Vehicles Based on Cloud-Edge Collaboration. IEEE Trans. Intell. Transp. Syst. 2024, 1–10. [Google Scholar] [CrossRef]
  18. Zhang, J.; Chen, J.; Bao, X.; Liu, C.; Yuan, P.; Zhang, X.; Wang, S. Dependent Task Offloading Mechanism for Cloud–Edge–Device Collaboration. J. Netw. Comput. Appl. 2023, 216, 103656. [Google Scholar] [CrossRef]
  19. Hao, T.; Zhan, J.; Hwang, K.; Gao, W.; Wen, X. AI-oriented Workload Allocation for Cloud-Edge Computing. In Proceedings of the 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Melbourne, Australia, 10–13 May 2021; pp. 555–564. [Google Scholar] [CrossRef]
  20. Xiao, Z.; Shu, J.; Jiang, H.; Lui, J.C.; Min, G.; Liu, J.; Dustdar, S. Multi-Objective Parallel Task Offloading and Content Caching in D2D-Aided MEC Networks. IEEE Trans. Mob. Comput. 2023, 22, 6599–6615. [Google Scholar] [CrossRef]
  21. Jia, L.W.; Li, K.; Shi, X. Cloud Computing Task Scheduling Model Based on Improved Whale Optimization Algorithm. Wirel. Commun. Mob. Comput. 2021, 13, 4888154. [Google Scholar] [CrossRef]
  22. Song, F.; Xing, H.; Luo, S.; Zhan, D.; Dai, P.; Qu, R. A Multiobjective Computation Offloading Algorithm for Mobile-Edge Computing. IEEE Internet Things J. 2020, 7, 8780–8799. [Google Scholar] [CrossRef]
  23. Wang, M.; Shi, S.; Gu, S.; Gu, X.; Qin, X. Q-learning Based Computation Offloading for Multi-UAV-Enabled Cloud-Edge Computing Networks. IET Commun. 2020, 14, 2481–2490. [Google Scholar] [CrossRef]
  24. Zhao, Y.; Li, B.; Wang, J.; Jiang, D.; Li, D. Integrating Deep Reinforcement Learning with Pointer Networks for Service Request Scheduling in Edge Computing. Knowl.-Based Syst. 2022, 258, 109983. [Google Scholar] [CrossRef]
  25. Sellami, B.; Hakiri, A.; Ben Yahia, S.; Berthou, P. Energy-aware Task Scheduling and Offloading Using Deep Reinforcement Learning in SDN-enabled IoT Network. Comput. Netw. 2022, 210, 108957. [Google Scholar] [CrossRef]
  26. Tang, T.; Li, C.; Liu, F. Collaborative Cloud-Edge-End Task Offloading with Task Dependency Based on Deep Reinforcement Learning. Comput. Commun. 2023, 209, 78–90. [Google Scholar] [CrossRef]
  27. Chen, Y.; Peng, K.; Ling, C. COPSA: A Computation Offloading Strategy Based on PPO Algorithm and Self-Attention Mechanism in MEC-Empowered Smart Factories. J. Cloud Comput. 2024, 13, 153. [Google Scholar] [CrossRef]
  28. Liu, K.; Yang, W. Task Offloading and Resource Allocation Strategies Based on Proximal Policy Optimization. In Proceedings of the 6th International Conference on Natural Language Processing (ICNLP), Xi’an, China, 22–24 March 2024; pp. 693–698. [Google Scholar] [CrossRef]
  29. Feng, X.; Yi, L.; Wang, L.B. An Efficient Scheduling Strategy for Collaborative Cloud and Edge Computing in System of Intelligent Buildings. J. Adv. Comput. Intell. Intell. Inform. 2023, 27, 948–958. [Google Scholar] [CrossRef]
  30. Shen, Z.; Lu, X.L. An Edge Computing Offloading Method Based on Data Compression and Improved Grey Wolf Algorithm in Smart Building Environment. Appl. Res. Comput. 2024, 41, 3311–3316. [Google Scholar] [CrossRef]
  31. Wang, Y. Design and Application of Cloud-Edge Collaborative Management Components for Smart Buildings. Master’s Thesis, Zhejiang University, Hangzhou, China, 2022. [Google Scholar] [CrossRef]
  32. Tang, B.; Luo, J.; Obaidat, M.S.; Li, H. Container-based Task Scheduling in Cloud-Edge Collaborative Environment Using Priority-aware Greedy Strategy. Clust. Comput. 2023, 26, 3689–3705. [Google Scholar] [CrossRef]
  33. Zhang, W.; Wen, Y.; Guan, K.; Kilper, D.C.; Luo, H.; Wu, D.O. Energy-Optimal Mobile Cloud Computing Under Stochastic Wireless Channel. IEEE Trans. Wirel. Commun. 2013, 12, 4569–4581. [Google Scholar] [CrossRef]
  34. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  35. Tang, C.; Liu, C.; Chen, W.; You, S.D. Implementing Action Mask in Proximal Policy Optimization (PPO) Algorithm. ICT Express 2020, 6, 200–203. [Google Scholar] [CrossRef]
  36. You, Q.; Tang, B. Efficient Task Offloading Using Particle Swarm Optimization Algorithm in Edge Computing for Industrial Internet of Things. J. Cloud Comput. 2021, 10, 41. [Google Scholar] [CrossRef]
  37. Engstrom, L.; Ilyas, A.; Santurkar, S.; Tsipras, D.; Janoos, F.; Rudolph, L.; Madry, A. Implementation Matters in Deep RL: A Case Study on PPO and TRPO. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020; Available online: https://openreview.net/forum?id=r1etN1rtPB (accessed on 10 December 2024).
Figure 1. Architecture diagram of the overall cloud–edge collaboration system model.
Figure 2. Task execution scheme.
Figure 3. Flowchart of the two-stage cooperative scheduling mechanism.
Figure 4. AMPPO algorithm framework diagram.
Figure 5. Convergence analysis under different batch sizes.
Figure 6. Convergence analysis under different learning rates.
Figure 7. Total time for different algorithms.
Figure 8. Total energy consumption for different algorithms.
Figure 9. Task completion rates for different algorithms.
Figure 10. Comparison of total time under different task arrival rates.
Figure 11. Comparison of total energy consumption under different task arrival rates.
Figure 12. Comparison of task completion rates under different task arrival rates.
Figure 13. Comparison of total time under different total numbers of tasks.
Figure 14. Comparison of total energy consumption under different total numbers of tasks.
Figure 15. Comparison of task completion rates under different total numbers of tasks.
Table 1. System model parameter settings.

Symbol | Description | Value
n | Number of edge nodes | 5
f_n^max | Maximum compute resources of an edge node | [3, 15] GHz
f_0^max | Maximum compute resources of the cloud node | 100 GHz
w_n^max | Maximum bandwidth resources of an edge node | [10, 20] MHz
w_0^max | Maximum bandwidth resources of the cloud node | 100 MHz
c_k | Task compute size | [2 × 10^8, 4 × 10^9] cycles
d_k | Task data size | [1, 50] MB
v_k | Container types | 14
t_k^max | Maximum tolerable task latency | [0.1, 5] s
h_n(t) | Channel gain from edge node n to the cloud node | 10^-5
p_n^tr | Transmission power of edge nodes | [0.1, 1] W
σ^2 | Gaussian white noise power | 10^-9 W
b | Chip structure coefficient in the energy consumption model | 10^-27
α | Total time weight coefficient | 0.5
β | Total energy weight coefficient | 0.5
μ | Timeout penalty coefficient | 5, 2, 0.5
Table 2. Hyperparameter settings of the AMPPO algorithm.

Symbol | Description | Value
lr | Learning rate | 0.001
γ | Discount factor | 0.99
ε | Clip range | 0.2
λ | GAE hyperparameter | 0.95
c_1 | Value function loss coefficient | 0.5
c_2 | Entropy regularization term coefficient | 0.01
S | Sample batch size | 64
Table 3. Comparison of time complexity among algorithms.

Algorithm | EF | CF | RS | PSO | RACDG | Proposed
Time complexity | O(T) | O(T) | O(T) | O(I · P · T · M) | O(T) | O(N · T · K)