Article

Multi-Agent Deep Reinforcement Learning-Based Inference Task Scheduling and Offloading for Maximum Inference Accuracy under Time and Energy Constraints

1 Information Engineering College, Xinjiang Institute of Engineering, Urumqi 830099, China
2 School of Computer and Communication Engineering, University of Science and Technology Beijing, Haidian District, Xueyuan Road 30, Beijing 100083, China
3 School of Computer Science, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
* Author to whom correspondence should be addressed.
Electronics 2024, 13(13), 2580; https://doi.org/10.3390/electronics13132580
Submission received: 17 May 2024 / Revised: 24 June 2024 / Accepted: 29 June 2024 / Published: 30 June 2024
(This article belongs to the Special Issue Advances in 5G Wireless Edge Computing)

Abstract: The journey toward real-time AI-driven IoT applications faces a significant hurdle in the limited resources of IoT devices. Particularly for battery-powered edge devices, deciding between performing task inference locally or offloading it to edge servers, while ensuring timely results and conserving energy, is a critical issue. The problem is further complicated when an edge device hosts multiple local inference models. The challenge of allocating inference models to tasks, between local models and edge server models, under strict time and energy constraints while maximizing overall accuracy is strongly NP-hard and has not been addressed in the literature. Therefore, in this work we propose MASITO, a novel multi-agent deep reinforcement learning framework designed to address this intricate problem. By dividing the problem into two sub-problems, namely task scheduling and edge server selection, we propose a cooperative multi-agent system that addresses each sub-problem. MASITO's design allows for faster training and more robust schedules through cooperative behavior in which agents compensate for each other's sub-optimal actions. Moreover, MASITO dynamically adapts to different network configurations, which allows for high-mobility edge computing applications. Experiments on the ImageNet-mini dataset demonstrate the framework's efficacy: MASITO reduces scheduling times by 60% to 90% compared with genetic algorithms (GAs), simulated annealing (SA), and particle swarm optimization (PSO), while maintaining comparable average accuracy in worst-case scenarios and superior accuracy in best-case scenarios.

1. Introduction

As advancements in hardware and machine learning algorithms continue to unfold, artificial intelligence (AI) is becoming increasingly pervasive in Internet of Things (IoT) applications. One notable trend driving this evolution is the empowerment of IoT devices to conduct local inference tasks. This shift is propelled by two key factors: the remarkable progress in hardware capabilities and the refinement of machine learning algorithms. Modern IoT devices are equipped with increasingly sophisticated processors and specialized neural processing engines, enabling them to process complex AI computations locally with greater efficiency. Concurrently, machine learning algorithms have undergone significant enhancements, becoming more lightweight, efficient, and capable of running on resource-constrained devices. Together, these advancements enable IoT devices to perform real-time data analysis and decision-making at the edge, reducing latency, conserving bandwidth, and enhancing privacy by minimizing data transmission to central servers [1,2].
In addition to local inference capabilities, the role of inference task offloading to edge servers remains vital, particularly for tasks with higher computational demands and time-sensitive applications. While local inference is advantageous for quick decision-making and reducing reliance on centralized resources, certain AI tasks may exceed the processing capacity of individual IoT devices. In such scenarios, offloading these tasks to edge servers equipped with more powerful hardware and ample computational resources becomes indispensable. This approach not only alleviates the burden on edge devices but also ensures the timely execution of complex computations, which is critical for meeting stringent response time requirements in time-sensitive applications [3].
Having multiple local inference models onboard an edge device varying in size and accuracy empowers the system with dynamic adaptability, enabling edge devices to intelligently assign the most suitable model to a task based on various factors such as task properties, device resources, and application parameters (see Figure 1). This dynamic approach leverages the diverse capabilities of different inference models to optimize performance and efficiency in real-time scenarios. By analyzing task requirements, such as computational complexity, accuracy thresholds, and latency constraints, edge devices can intelligently select the most appropriate model from their repertoire to favor accuracy over energy, for example, or speed over accuracy. Simultaneously, considering device resources, including processing power, memory availability, and energy reserves, ensures optimal utilization of onboard resources without overburdening the device. Additionally, by incorporating application-specific parameters, edge devices can tailor inference model selection to align with application objectives and constraints [4].
The challenge of assigning local and edge server inference models to tasks under time and energy constraints while maximizing accuracy is similar in nature to the unbounded multidimensional knapsack problem [5], which consists of filling a knapsack limited by volume and weight so as to maximize profit. Here, however, the aim is to fill a schedule constrained by time and energy with inference models so as to maximize accuracy. This class of problem is NP-hard, with no known polynomial-time solution. Therefore, in this work, we tackle this complex optimization problem by dividing it into two smaller sub-problems and proposing an edge computing framework for AI-driven real-time applications that uses state-of-the-art deep reinforcement learning to schedule tasks between parallel local inference and offloading under time and energy limitations while maximizing inference accuracy.
In real-time IoT applications, such as smart healthcare [6], autonomous vehicles [7,8], and critical resource deployment [9,10], timely decision-making is critical for ensuring safety, efficiency, and responsiveness. By efficiently managing inference tasks at the edge, solving this problem enables the seamless integration of AI capabilities into these systems, enhancing their intelligence and autonomy. Moreover, the optimization of inference task allocation contributes to resource conservation, extending the operational lifespan of battery-powered edge devices and reducing overall energy consumption. Beyond IoT, edge computing, and AI integration, solutions to this problem have implications for broader computing paradigms, including cloud-edge orchestration and distributed computing architectures. Therefore, addressing this challenge not only advances the capabilities of edge computing systems but also paves the way for innovative applications that leverage the synergy between AI and edge technologies.
In the literature, there is a notable lack of research on inference task scheduling and offloading under constraints, primarily owing to the novelty of the problem under investigation. A few works, such as [11], focus on the decision of whether to process data locally or offload to an edge server based on the probability of the inference resulting in low accuracy while adhering to a given energy constraint. This represents a smaller subproblem of the more general problem tackled in this work, since their system uses only a single local inference model and a single edge server, resulting in a binary decision. Moreover, the time aspect is not addressed, which is crucial for real-time applications. On the other hand, the study outlined in [4] bears more resemblance to our problem, as multiple inference models within the edge device are considered while adhering to time constraints. However, their investigation overlooks the energy aspect, which is significant for battery-powered edge devices. Additionally, their system exclusively allows offloading to a single edge server. This underscores a research gap wherein the overarching problem necessitates attention, considering both time and energy constraints while maximizing inference accuracy through the allocation of inference models from a set of local and edge server models.
Utilizing deep reinforcement learning (DRL) agents to orchestrate inference task scheduling and offloading in edge computing environments, our work seeks to optimize the allocation of inference models, respecting time and energy constraints while maximizing accuracy, thereby advancing the efficacy and efficiency of AI-driven IoT applications. The main contributions of this paper can be summarized as follows:
  • We formulate the problem of scheduling inference models between local inference and edge server offloading, executed in parallel, under time and energy constraints, and we analyze its complexity.
  • We propose MASITO, a multi-agent deep reinforcement learning (DRL)-based framework for inference task scheduling and offloading for edge computing consisting of cooperating agents for task scheduling and edge server selection.
  • We perform experiments on the framework and compare its performance against other schemes, namely genetic algorithms (GAs), simulated annealing (SA), and particle swarm optimization (PSO), demonstrating its effectiveness and advantages over these schemes.
The rest of this paper is organized as follows. Section 2 presents the related works and points out the research gap. In Section 3, we describe the system model. In Section 4, we propose MASITO and explain the framework components. In Section 5, we present the experiment setup and results in addition to analysis of the obtained results. Finally, we conclude this work in Section 6.

2. Related Works

Task offloading in edge computing has attracted significant attention from the research community as a result of its crucial role in optimizing resource utilization and enhancing system performance. Various solutions and architectures have been proposed to address task-offloading problems [3]. In this section, we focus on the multi-access edge computing (MEC) network architecture in which computing resources are placed at the edge of the network, close to end users. This reduces latency and improves efficiency for data processing and service delivery in applications like IoT and mobile networks [12,13,14,15].
For instance, works proposed by Yang et al. (2020) [16], Liu et al. (2019) [17], and Zhang et al. (2021) [18] focus on designing offloading decisions based on data characteristics and network conditions, aiming to minimize latency and maximize throughput. Similarly, solutions proposed by Chen et al. (2021) [19], Li et al. (2020) [20], and Xu et al. (2020) [21] examine task-offloading strategies considering energy consumption and device capabilities, developing energy-aware scheduling algorithms. Additionally, studies by Cozzolino et al. (2023) [22], Abdenacer et al. (2023) [6], and Younis et al. (2019) [23] address both energy and latency, aiming to reduce both metrics or find a balance between them. These approaches are often not suitable for real-time applications, which require meeting strict time constraints and tight energy budgets.
Studies that consider explicit time and energy constraints, either individually or in combination, are notably scarce in the literature. For example, works by Li et al. (2021) [24], Tajallifar et al. (2021) [25], and Liu et al. (2016) [26] propose time-aware task-offloading frameworks that process tasks by priority based on given deadlines to dynamically allocate resources for meeting real-time requirements. Similarly, Zhao et al. (2021) [27] and Jiang et al. (2022) [28] introduce energy-aware task-offloading approaches that optimize energy consumption by balancing workload distribution across edge devices and servers. Furthermore, Mohammad et al. (2020) [29], Azizi et al. (2022) [30], Wang et al. (2019) [31], and Ben et al. (2024) [32,33] tackle the joint optimization of time and energy constraints in task-offloading decisions, using mathematical modeling techniques to formulate the problem as a multi-objective optimization problem. These studies emphasize the necessity of accounting for both time and energy constraints when making task-offloading decisions, underscoring their interdependence and the requirement for comprehensive optimization strategies in edge computing environments.
In the context of inference tasks, accuracy as a task-offloading objective has seen less attention from the research community. With the rise of AI model deployment in IoT, it is becoming an important performance metric. Nikoloska et al. (2020) [11] propose a data selection scheme based on a confidence metric for edge devices to select the data samples which could lead to poor inference accuracy, in which case they are offloaded to the edge server. Fresa et al. (2021) [4] propose a task scheduling and offloading scheme based on LP-Relaxation and dynamic programming where all possible cases of scheduling two inference tasks between the edge device and the edge server are considered.
Various optimization methods have been used to tackle the task-offloading problem in edge computing, reflecting the complexity and diversity of the challenges involved. Traditional optimization techniques, such as mixed-integer linear programming (MILP) [34] and branch and bound [35], have framed task allocation as a mathematical optimization problem, allowing for exact or near-optimal solutions under certain constraints. Additionally, machine learning methods have gained prominence for their capacity to adaptively learn and optimize task-offloading decisions based on historical data and real-time observations [36,37]. Furthermore, metaheuristic algorithms, including genetic algorithms, simulated annealing, and particle swarm optimization, have been employed to address the NP-hard nature of the task-offloading problem by efficiently exploring the solution space and identifying high-quality solutions [32,33,38,39]. Each method presents unique advantages and trade-offs, depending on the specific problem characteristics and the requirements of the edge computing environment.
Deep reinforcement learning (DRL) approaches stand out as highly adaptive schemes, making them suitable for high-mobility edge computing networks. Additionally, with proper configuration and design, DRL schemes can consistently match and outperform metaheuristic schemes while consuming fewer resources. Chen et al. (2021) [19] propose a resource allocation scheme for augmented reality (AR) in single and multi-access edge computing (MEC) systems. Using a DRL multi-agent deep deterministic policy gradient (MADDPG) method, their system managed to minimize energy consumption for each user subject to latency requirements and limited resources. Their proposed system uses MEC servers for offload path planning, which incurs a communication overhead while edge devices wait for plans. Additionally, their proposed DRL agent considers a fixed number of MEC servers. Alfakih et al. (2020) [36] propose a system model consisting of mobile edge computing networks (MECNs) with multiple access points. The edge devices can connect to an MECN through the access points. The edge device uses a state–action–reward–state–action (SARSA) reinforcement learning agent to decide whether to offload the task to the nearest edge server, an adjacent edge server, or the remote cloud. This work relies on networking infrastructure that might not always be available and cost-effective, potentially causing additional delays in real-time applications. Furthermore, the proposed agent has limited actions in terms of deciding which server to offload to, significantly reducing the ability to precisely control the overall task offload latency. Gao et al. (2024) [40] propose Com-DDPG, an offloading strategy for MEC utilizing multi-agent reinforcement learning to improve offloading performance. Within the Internet of Vehicles (IoV) transmission radius, multiple agents collaborate to learn environmental changes, such as the number of mobile devices and task queues, and develop an offloading strategy for edge servers. The method models task dependency, priority, and resource consumption, and formulates communication among agents. Reinforcement learning determines the offloading strategy, with a Long Short-Term Memory (LSTM) network enhancing internal state predictions and a bidirectional recurrent neural network (BRNN) improving communication features among agents. Li et al. (2024) [41] propose a Multi-Action and Environment-Adaptive Proximal Policy Optimization algorithm (MEPPO), an enhancement of the conventional PPO algorithm, aimed at optimizing task scheduling and resource allocation in dynamic Vehicular Edge Computing (VEC) environments. The method encompasses three core aspects: generating task-offloading and priority decisions to reduce service request completion time, dynamically allocating transmit power based on expected transmission distances to minimize energy consumption, and adapting scheduling decisions to varying numbers of vehicle tasks by manipulating the state space of the PPO algorithm. This approach ensures efficient and adaptive management of tasks and resources in real-world scenarios.
In summary, a significant gap in the literature is identified, where inference task-offloading decisions are determined by both time and energy constraints while aiming to maximize overall accuracy. Our work stands out by addressing inference task scheduling and offloading with a strong emphasis on accuracy as a key performance metric. Furthermore, our approach directly integrates time and energy constraints to control scheduling and offloading decisions, focusing on parallel task execution. Additionally, deep reinforcement learning enables rapid scheduling and real-time adaptation to network changes.

3. System Model

We consider a sensing system consisting of edge devices equipped with a set of local inference models $\mathcal{L} = \{1, \ldots, L\}$. These edge devices have access to a set of edge servers $\mathcal{E}$. Each edge server is equipped with a single inference model. The set of edge server inference models is denoted as $\mathcal{S} = \{1, \ldots, S\}$. At each time slot, the edge device receives a set of inference tasks denoted as $\mathcal{J} = \{1, \ldots, J\}$. An edge device constructs a selected set of inference models $\mathcal{M} = \{1, \ldots, M\}$ for each time slot using $\mathcal{S}$ and $\mathcal{L}$. This set is then used to assign inference models to tasks. All notations used in this section are presented in Table 1.

3.1. Inference Accuracy

Edge devices can deploy inference models in several ways. One approach is to use a single tunable model, where adjusting input hyperparameters alters accuracy and inference times. Alternatively, multiple instances of similar models with different internal structures, such as varying layer sizes in Deep Neural Networks (DNNs), can be deployed. Another option is to use diverse types of models with different sizes and top-1 average accuracies. Since the actual top-1 accuracy of each model for a specific inference task is unknown beforehand, we rely on the average accuracy estimated from historical top-1 accuracy measurements. The average top-1 accuracy of model i is denoted as $A_i$, where $i \in \mathcal{M}$. The average top-1 accuracy of models on edge servers is set to be significantly higher than that of local inference models on edge devices [32]:
$$A_j > A_i \quad \forall i \in \mathcal{L},\; \forall j \in \mathcal{S}$$

3.2. Time Delay Model

The average inference time for each model i, denoted as $T_i^{\text{inf}}$ where $i \in \mathcal{M}$, is estimated using the average of historically measured inference times, where data pre-processing time is considered part of $T_i^{\text{inf}}$. We define $T_i^{\text{lat}}$ as the average latency of edge server i estimated from previous measurements. $T_i^{\text{lat}}$ is continuously updated after every transmission, providing an indirect indicator of edge device mobility. Let $T_{ij}^{\text{off}}$ be the estimated time to offload task j, where $j \in \mathcal{J}$, to edge server i. $T_{ij}^{\text{off}}$ can be calculated using the channel bandwidth and the size of task j, denoted as $size_j$, given by
$$T_{ij}^{\text{off}} = \frac{size_j}{b_i} + T_i^{\text{lat}}$$
where $b_i$ represents the bandwidth of the channel between the edge device and edge server i. The bandwidth can be estimated from historical transmissions.
Let $T_{ij}^{\text{task}}$ be the total time to process a given task j using model i, including inference and offloading times:
$$T_{ij}^{\text{task}} = T_i^{\text{inf}} \quad \forall i \in \mathcal{L},\; \forall j \in \mathcal{J}$$
$$T_{ij}^{\text{task}} = T_i^{\text{inf}} + T_{ij}^{\text{off}} + T_i^{\text{resp}} \quad \forall i \in \mathcal{S},\; \forall j \in \mathcal{J}$$
where $T_i^{\text{resp}}$ represents the average response time from edge server i, given by
$$T_i^{\text{resp}} = \frac{size_r}{b_i} + T_i^{\text{lat}}$$
where $size_r$ is a constant representing the response size.
We define $x_{ij} \in \{0, 1\}$ as a binary variable representing the decision of whether inference model i is assigned to inference task j. Let $T_k^{\text{slot}}$ be the total time to process a complete time slot k. Local inference and offloading are performed in parallel, and therefore we define $T_k^{\text{slot}}$ as the maximum of the total local inference time $T_k^{\text{local}}$ and the total server time $T_k^{\text{server}}$, which includes the offloading, inference, and response times of all offloaded tasks. $T_k^{\text{server}}$ is calculated using Algorithm 1.
$$T_k^{\text{slot}} = \max\left(T_k^{\text{local}}, T_k^{\text{server}}\right) \quad \forall k \in \Delta$$
where
$$T_k^{\text{local}} = \sum_{i=1}^{L} \sum_{j=1}^{J} x_{ij} T_i^{\text{inf}}$$
  • $T_i^{\text{server}}$ is the total of the offload, inference, and response times for all tasks offloaded to the edge server with model i.
  • $T^{\text{off\_accu}}$ is a variable that accumulates and tracks offload times across all edge servers.
  • $T_i^{\text{inf\_accu}}$ is a variable that accumulates and tracks inference times for each edge server with model i.
Algorithm 1: Steps to calculate $T_k^{\text{server}}$ (presented as a pseudocode figure in the original article).
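Since the pseudocode itself is only available as an image, the following Python sketch reconstructs Algorithm 1 from the accumulator definitions above and the worked example below; the function and variable names are ours, and the numbers in the usage lines are purely illustrative.

```python
from collections import defaultdict

def server_times(offloaded_tasks, t_resp):
    """Accumulate offload, inference, and response times per edge server.

    offloaded_tasks: (server_id, t_off, t_inf) tuples in schedule order.
    t_resp: server_id -> average response time T_i^resp.
    Returns the per-server totals T_i^server and T_k^server (max over servers).
    """
    t_off_accu = 0.0                 # offloads are serialized on the device's comm queue
    t_inf_accu = defaultdict(float)  # per-server inference queue occupancy
    t_server = defaultdict(float)    # per-server offload + inference + response total

    for sid, t_off, t_inf in offloaded_tasks:
        t_off_accu += t_off          # the task arrives only after all earlier offloads
        # Inference starts once the task has arrived and the server is free.
        t_inf_accu[sid] = max(t_inf_accu[sid], t_off_accu) + t_inf
        # The response is sent back after the inference completes.
        t_server[sid] = max(t_server[sid], t_inf_accu[sid]) + t_resp[sid]
    return dict(t_server), max(t_server.values(), default=0.0)

# Toy usage (times in ms): two tasks offloaded to server 1, one to server 2.
totals, t_k_server = server_times(
    [(1, 30.0, 12.0), (2, 25.0, 10.0), (1, 40.0, 12.0)], t_resp={1: 5.0, 2: 5.0})
t_k_slot = max(80.0, t_k_server)     # T_k^slot = max(T_k^local, T_k^server)
```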
The edge devices, encompassing both standard edge devices and edge servers, are assumed to have two queues, as depicted in Figure 2: one for computation and another for communication. This configuration facilitates parallel processing of inference tasks while offloading or receiving data. Figure 3 illustrates an example schedule for nine inference tasks, where three tasks are processed using local inference models and six tasks are offloaded to three different edge servers. At $t_1$, the scenario is as follows:
$$T^{\text{off\_accu}} = T_{1,1}^{\text{off}}$$
$$T_1^{\text{inf\_accu}} = \max\left(T_1^{\text{inf\_accu}}, T^{\text{off\_accu}}\right) + T_{1,1}^{\text{inf}} = T^{\text{off\_accu}} + T_{1,1}^{\text{inf}} = T_{1,1}^{\text{off}} + T_{1,1}^{\text{inf}}$$
$$T_1^{\text{server}} = \max\left(T_1^{\text{server}}, T_1^{\text{inf\_accu}}\right) + T_1^{\text{resp}} = T_1^{\text{inf\_accu}} + T_1^{\text{resp}} = T_{1,1}^{\text{off}} + T_{1,1}^{\text{inf}} + T_1^{\text{resp}}$$
By following the steps outlined in Algorithm 1 and applying them to the example shown in Figure 3, we obtain the $T_i^{\text{server}}$ values presented in Table 2. The value $T^{\text{server}}$ is determined as the maximum over all edge server total times. Finally, the total slot time $T^{\text{slot}}$ is calculated by taking the larger of the total local inference time $T^{\text{local}}$ and $T^{\text{server}}$:
$$T^{\text{server}} = \max\left(T_1^{\text{server}}, T_2^{\text{server}}, T_3^{\text{server}}\right) = T_2^{\text{server}}$$
$$T^{\text{slot}} = \max\left(T^{\text{local}}, T_2^{\text{server}}\right) = T_2^{\text{server}}$$

3.3. Inference Energy

The energy cost of offloading task j to edge server i, denoted as $E_{ij}^{\text{off}}$, depends on the offload time $T_{ij}^{\text{off}}$ and $c_i$, the average energy cost per time unit of transmitting data to edge server i. Several factors influence $c_i$, including the communication medium (Wi-Fi, Cellular, Bluetooth, or Zigbee), each with distinct power requirements, data rates, and transmission ranges. Higher transmission power levels generally lead to greater energy consumption, especially for maintaining communication over longer distances or in challenging environments. Additionally, the power consumed by the wireless device in idle or standby mode contributes to the overall energy cost. Signal strength and quality also impact energy consumption, as maintaining reliable communication in weak or noisy signal environments may require higher power levels. Environmental factors such as interference, obstacles, and electromagnetic noise can affect energy consumption by influencing signal propagation and reception quality. Furthermore, optimization techniques like data compression, packet aggregation, adaptive modulation, and power control algorithms can reduce energy consumption by improving spectral efficiency and minimizing transmission overhead [32].
We calculate $c_i$ internally by monitoring battery usage and network adapter configurations, including transmission power levels. By averaging these measured power usage metrics, we can estimate $c_i$. The energy cost $E_{ij}^{\text{off}}$ of offloading task j to edge server i is then given by
$$E_{ij}^{\text{off}} = T_{ij}^{\text{off}} \cdot c_i$$
Similarly, the energy cost of receiving the inference response, denoted by $E_i^{\text{resp}}$, is given by
$$E_i^{\text{resp}} = T_i^{\text{resp}} \cdot c_i$$
Let $E_i^{\text{inf}}$ represent the average energy cost of performing inference for a task using model i. Given that the inference energy cost is minimal compared to the offloading energy cost, it is treated as a constant. This constant can be estimated using the inference time and the maximum power consumption of the edge device's CPU under worst-case full load. Let $E_{ij}^{\text{task}}$ denote the total energy cost of processing task j using model i, given by:
$$E_{ij}^{\text{task}} = E_i^{\text{inf}} \quad \forall i \in \mathcal{L},\; \forall j \in \mathcal{J}$$
$$E_{ij}^{\text{task}} = E_i^{\text{inf}} + E_{ij}^{\text{off}} + E_i^{\text{resp}} \quad \forall i \in \mathcal{S},\; \forall j \in \mathcal{J}$$
Finally, we define $E_k^{\text{slot}}$ as the total energy consumption for slot k, given by
$$E_k^{\text{slot}} = \sum_{i=1}^{M} \sum_{j=1}^{J} x_{ij} E_{ij}^{\text{task}}$$
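To make the energy bookkeeping concrete, here is a minimal sketch of the per-task and per-slot costs (our own helper names; the usage values are illustrative, not measured):

```python
def task_energy(e_inf, offloaded=False, t_off=0.0, t_resp=0.0, c=0.0):
    """E_ij^task: local tasks cost only inference energy; offloaded tasks also pay
    transmit (E_ij^off = T_ij^off * c_i) and response (E_i^resp = T_i^resp * c_i)."""
    if not offloaded:
        return e_inf
    return e_inf + t_off * c + t_resp * c

# E_k^slot sums E_ij^task over the chosen assignments (x_ij = 1). Toy values (J, s, W).
slot_tasks = [
    dict(e_inf=0.40),                                                    # local model
    dict(e_inf=0.05, offloaded=True, t_off=0.030, t_resp=0.005, c=2.0),  # offloaded
]
print(sum(task_energy(**t) for t in slot_tasks))  # E_k^slot
```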

3.4. Problem Formulation

In this section, we formulate two optimization problems: first, assigning inference models to inference tasks while respecting the given time and energy constraints and maximizing overall accuracy; second, selecting an optimal subset of the available edge servers that maximizes the average accuracy of the produced schedules while reducing scheduling time.

3.4.1. Inference Task Scheduling Problem

The problem can be formulated as follows:
$$\text{Maximize} \quad A_k^{\text{slot}} = \sum_{i=1}^{M} \sum_{j=1}^{J} x_{ij} A_i \tag{1}$$
where $A_k^{\text{slot}}$ is the total accuracy for a time slot k. Let $E_k^{\text{slot}}$ be the total energy consumption of slot k. Given E and T as the energy and time constraints, respectively, Equation (1) is subject to the following:
$$T_k^{\text{slot}} \le T \quad \forall k \in \Delta \tag{2}$$
$$E_k^{\text{slot}} \le E \quad \forall k \in \Delta \tag{3}$$
$$\sum_{i=1}^{M} x_{ij} = 1 \quad \forall j \in \mathcal{J} \tag{4}$$
Equation (2) guarantees that the parallel processing time of each slot respects the time constraint. Similarly, Equation (3) ensures that the energy consumption of each time slot respects the given constraint. Finally, Equation (4) guarantees that each inference task is assigned exactly one inference model, so that a complete solution is produced.
This problem can be thought of as an instance of the well-known classic knapsack problem (KP), in which we try to fill a schedule (i.e., a knapsack) with inference models (i.e., pieces) to maximize accuracy (i.e., profit) while respecting time and energy constraints (i.e., the knapsack's weight and volume capacities). Here, the pieces and the knapsack have two dimensions, making the problem an instance of the multi-dimensional KP. Additionally, inference models (i.e., pieces) can be reused to construct a schedule, so the problem becomes an instance of the unbounded multi-dimensional KP (UMdKP). However, we consider parallel schedules in which inference tasks can be processed locally and on edge servers simultaneously; in UMdKP terms, pieces may overlap in the weight dimension (i.e., time) but not in the volume dimension, which breaks the analogy. Alternatively, a separate knapsack can be considered for each edge server, making the problem an instance of the multiple knapsack problem (MKP). However, the problem then becomes much more difficult to model, especially when trying to uphold the time constraint across all knapsacks.
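To make the formulation concrete, the toy exhaustive search below (our own illustration, tractable only for very small instances given the NP-hardness discussed above) enumerates every assignment $x_{ij}$ and keeps the feasible schedule with the highest total accuracy. The model profiles are invented numbers, and the slot-time computation is deliberately simplified relative to Algorithm 1.

```python
from itertools import product

# Each model: (accuracy %, time ms, energy J, is_local). Invented profiling values;
# an offloaded model's "time" here already folds in offload, inference, and response.
MODELS = [
    (65.0, 20.0, 0.4, True),    # small local model
    (73.0, 45.0, 0.9, True),    # larger local model
    (85.0, 60.0, 2.5, False),   # edge server model
]

def slot_metrics(assignment):
    """Total accuracy, parallel slot time, and energy for one model-per-task assignment."""
    acc = sum(MODELS[m][0] for m in assignment)
    t_local = sum(MODELS[m][1] for m in assignment if MODELS[m][3])
    t_server = sum(MODELS[m][1] for m in assignment if not MODELS[m][3])
    energy = sum(MODELS[m][2] for m in assignment)
    return acc, max(t_local, t_server), energy   # local and server sides run in parallel

def best_schedule(n_tasks, t_max, e_max):
    """Brute force over the M^J assignments; Equations (2)-(4) define feasibility."""
    best, best_acc = None, -1.0
    for assignment in product(range(len(MODELS)), repeat=n_tasks):
        acc, t, e = slot_metrics(assignment)
        if t <= t_max and e <= e_max and acc > best_acc:
            best, best_acc = assignment, acc
    return best, best_acc

print(best_schedule(n_tasks=4, t_max=120.0, e_max=6.0))
```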

3.4.2. Edge Server Selection Problem

Let $\mathcal{M}_k$ be the set of selected edge servers for time slot k. Let $y_j$ be a binary variable representing whether the inference model from edge server j is selected to be part of $\mathcal{M}$.
Objective:
$$\text{Maximize} \quad A_k = \sum_{i=1}^{L} x_i A_i + \sum_{j=1}^{S} y_j A_j \tag{5}$$
Subject to:
$$\sum_{i=1}^{L} x_i + \sum_{j=1}^{S} y_j = |\mathcal{M}| \tag{6}$$
$$x_i \in \{0, 1\} \quad \forall i \in \mathcal{L}$$
$$y_j \in \{0, 1\} \quad \forall j \in \mathcal{S}$$
where:
  • $A_i$ is the average accuracy of local inference model i.
  • $A_j$ is the average accuracy of the inference model from edge server j.
This formulation (Equation (6)) ensures that each selected edge server contributes its single inference model to the selected subset $\mathcal{M}$. The binary variable $y_j$ determines the selection of edge servers, and the objective function maximizes the total accuracy obtained from the selected models. Solving this optimization problem provides the optimal or near-optimal selection of the subset $\mathcal{M}$ for each time slot k.
This type of problem is a combinatorial optimization problem known as a subset selection problem, where the goal is to select a subset of elements from a given set while optimizing a certain objective function subject to certain constraints.
In this specific case, we are tasked with selecting a subset $\mathcal{M}$ of inference models from both local models and models hosted on edge servers to perform inference tasks. The objective is to maximize the total accuracy obtained from the selected models while minimizing the cardinality of $\mathcal{M}$ (i.e., selecting the fewest models necessary to achieve faster scheduling times and higher accuracy).
Such optimization problems can be solved using various methods. Greedy algorithms select models based on certain criteria (e.g., highest accuracy or best cost-benefit ratio) until the cardinality constraint is met; while they do not guarantee optimal solutions, they provide fast and efficient solutions in many cases (see the sketch below). Moreover, dynamic programming can be used if the problem exhibits overlapping subproblems and optimal substructure, an approach particularly useful for small problem sizes with a limited number of feasible solutions. Metaheuristic algorithms such as genetic algorithms, simulated annealing, or particle swarm optimization can also be used to explore the solution space and find near-optimal solutions.
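A minimal sketch of such a greedy selection, assuming each edge server exposes an identifier and an average accuracy (hypothetical fields, not MASITO's learned selection), could look as follows:

```python
def greedy_select(local_models, server_models, m_size):
    """Build M: all local models plus the highest-accuracy edge server models
    until the cardinality budget |M| = m_size is met."""
    ranked = sorted(server_models, key=lambda s: s[1], reverse=True)  # by accuracy
    budget = m_size - len(local_models)          # slots left for server models
    chosen = ranked[:max(budget, 0)]
    return list(local_models) + [sid for sid, _ in chosen]

# Toy usage: three local models, pick the best 2 of 4 servers for |M| = 5.
print(greedy_select(["resnet18", "resnet34", "shufflenet"],
                    [("es1", 0.84), ("es2", 0.86), ("es3", 0.79), ("es4", 0.81)],
                    m_size=5))
# -> ['resnet18', 'resnet34', 'shufflenet', 'es2', 'es1']
```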

4. Multi-Agent DRL-Based Selective Inference Task Scheduling and Offloading Framework (MASITO)

Task scheduling and offloading in edge computing systems involve complex decision-making processes due to dynamic network conditions, varying computational loads, time, and energy constraints. Deep reinforcement learning (DRL) is best suited to handling such complexity by enabling agents to learn optimal decision-making policies through interaction with the environment. Task scheduling and offloading decisions in edge computing systems often need to optimize multiple conflicting objectives, such as adhering to time and energy constraints while maximizing accuracy. DRL algorithms can balance these objectives by learning complex trade-offs and generating efficient scheduling and offloading strategies. DRL agents can adapt to such dynamic environments by continuously learning and updating their policies based on real-time feedback.
The proposed MASITO framework, depicted in Figure 4, takes advantage of DRL and uses two cooperating reinforcement learning agents, namely an edge server selection agent and a scheduling agent. At the beginning of each time slot, the agents have access to information about the tasks received for that time slot, as well as about the local inference models, available edge server models, the given constraints, and network status. Using this information, the edge server selection agent selects a single edge server model for each task to be included in the selected set of inference models alongside the local inference models. This set is then used by the scheduling agent to assign an inference model to each task.
In the context of edge computing, the system state at a given time slot is primarily influenced by the task scheduling and offloading decisions made in the previous time slot. This characteristic lends itself well to modeling as a Markov Decision Process (MDP), where the state transition dynamics satisfy the Markov property: the future state depends only on the current state and the action taken. However, accurately determining the transition probabilities between states, especially in finite-state MDPs, is challenging due to the dynamic and uncertain nature of the network environment. While existing works often assume that transition probabilities can be obtained through offline training, this approach may not accurately reflect real-world network conditions, which are inherently unknown and subject to change. Therefore, to devise an optimal task scheduling and offloading strategy, the scheduling and server selection agents are trained using model-free deep reinforcement learning (DRL) algorithms, specifically Double Deep Q-Networks (DDQNs). Model-free DRL algorithms learn directly from interactions with the environment without relying on explicit models of the system dynamics, making them suitable for capturing the complex and dynamic nature of edge computing environments. By leveraging DDQNs, these agents can iteratively learn and update their decision-making policies based on real-time feedback, ultimately enabling more adaptive and effective task scheduling and offloading strategies that maximize system performance under time and energy constraints.

4.1. MASITO Design

For each time slot k, the edge device receives a set of inference tasks $\mathcal{J}$. For each task, a fixed-size pool of selected inference models $\mathcal{M}$, consisting of all the local inference models $\mathcal{L}$ plus a single slot reserved for a selected edge server model $m_s$, is constructed:
$$\mathcal{M} = \mathcal{L} \cup \{m_s\}$$
The design illustrated in Figure 5 facilitates the operation of DDQNs with fixed inputs and outputs for both agents, enabling them to adapt to a dynamic network of edge servers, which may join and disconnect arbitrarily. The selection of the edge server model $m_s$ is made based on the current inference task so as to best adhere to the given time and energy constraints. Subsequently, the set of selected inference models is used by the scheduling agent to assign a single model to the ongoing inference task. Initially, the schedule $H_k$ for time slot k is initialized with a default local model $m_d$, where $m_d \in \mathcal{L}$. After each scheduling assignment, the default model in the position of the currently processed task is replaced with the selected model. This design ensures the availability of a complete schedule before all time slot tasks have been processed, facilitating evaluation at each step and enabling the computation of a reward for each single action. This is advantageous for training, as it provides a reward for every action, as opposed to designs in which the agent selects inference models for all slot tasks and only then receives a single reward.
A schedule $H_k$ is evaluated by computing its total time $T_k^{\text{slot}}$, total energy consumption $E_k^{\text{slot}}$, and total accuracy $A_k^{\text{slot}}$. We identify two methods for evaluating a schedule. The first relies solely on estimated averages, including inference time, inference energy, and accuracy, for each selected model in a schedule. This approach benefits agents because they receive relatively consistent average values, facilitating faster learning. Additionally, multiple agents can run in parallel to produce a single schedule, resulting in significant speed advantages, since only cached average values are needed and no real inference is required after each selection step. However, relying solely on average estimates may lead to incorrect solutions, especially when outliers such as task sizes and network latency deviate significantly from the estimated averages. Hence, we propose a second approach in which inference is performed after every selection step. This yields real inference times, energy consumption, and accuracy, as opposed to estimated averages, allowing the agent to adapt subsequent actions to compensate for previous sub-optimal actions and to better handle outlier data, which can cause unexpected time and energy spikes. Consequently, this approach leads to more accurate schedules. Moreover, during training, the second approach benefits from instant and accurate rewards.

4.2. Inference Task Scheduling

In this section, we propose an MDP model for the inference task scheduling problem.

4.2.1. State Space

The state for the scheduling agent is designed such that, given the set of selected inference models $\mathcal{M}$, the time and energy constraints T and E, and the currently processed task j, the agent can estimate $T_k^{\text{slot}}$, $E_k^{\text{slot}}$, and $A_k^{\text{slot}}$ for time slot k. As discussed earlier, we initially start with a default schedule H and progressively replace models with selected ones; the state is therefore represented as the set of all candidate schedules produced by considering every possible action (i.e., every model in $\mathcal{M}$). A state $s_{k,j}^{\text{sched}}$ for time slot k and task j is given by
$$s_{k,j}^{\text{sched}} = \left\{ \left[ Id_i, T, E, A_k^{\text{slot}}, E_k^{\text{slot}}, T_k^{\text{server}}, T_k^{\text{local}}, T_k^{\text{slot}}, size_j, c \right] \;\middle|\; i \in \mathcal{M} \right\}$$
where $Id_i$ is the id of inference model i, T and E represent the time and energy constraints, respectively, and $size_j$ is the size of task j. c is the number of tasks remaining in the schedule, which indicates to the agent how to spend the constraint budget according to how many tasks remain; c is calculated by subtracting j from the number of tasks per time slot. The values $A_k^{\text{slot}}$, $E_k^{\text{slot}}$, $T_k^{\text{server}}$, $T_k^{\text{local}}$, and $T_k^{\text{slot}}$ are estimated from the corresponding values of the previous state $s_{k,j-1}^{\text{sched}}$ by replacing the placeholder default inference model $m_d$ with model $m_i$.
Let $z_i$ be a binary variable representing whether inference model $m_i$ is a local model or belongs to an edge server:
$$z_i = \begin{cases} 0 & m_i \text{ is a local model} \\ 1 & m_i \text{ is an edge server model} \end{cases}$$
$$E_k^{\text{slot}} = E_k^{\text{slot}} - E_d^{\text{inf}} + E_{ij}^{\text{task}}$$
$$A_k^{\text{slot}} = A_k^{\text{slot}} - A_d + A_i$$
$$T_k^{\text{off\_accu}} = T_k^{\text{off\_accu}} + z_i T_i^{\text{off}}$$
$$T_k^{\text{server}} = \max\left(T_k^{\text{server}},\; T_k^{\text{off\_accu}} + z_i \left(T_i^{\text{inf}} + T_i^{\text{resp}}\right)\right)$$
$$T_k^{\text{local}} = T_k^{\text{local}} - T_d^{\text{inf}} + (1 - z_i)\, T_i^{\text{inf}}$$
$$T_k^{\text{slot}} = \max\left(T_k^{\text{local}}, T_k^{\text{server}}\right)$$
All state values except $Id_i$ and c are normalized using a min-max approach: rather than using values from different ranges, we scale all values to a range between 0 and 1.
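The per-action state update is mechanical enough to express directly. The sketch below mirrors the update equations with our own dictionary-based layout (the `prev`, `model`, and `defaults` structures are assumptions for illustration, not the paper's data layout):

```python
def update_state(prev, model, defaults):
    """Apply the Section 4.2.1 update equations for one candidate model i.

    prev:     current schedule features (A_slot, E_slot, T_off_accu, T_server,
              T_local), with the default model m_d still in task j's position.
    model:    candidate features (A, E_task, T_off, T_inf, T_resp, is_server).
    defaults: the default model's contributions (A_d, E_d_inf, T_d_inf) to remove.
    """
    z = 1 if model["is_server"] else 0            # z_i
    s = dict(prev)
    s["E_slot"] = prev["E_slot"] - defaults["E_d_inf"] + model["E_task"]
    s["A_slot"] = prev["A_slot"] - defaults["A_d"] + model["A"]
    s["T_off_accu"] = prev["T_off_accu"] + z * model["T_off"]
    s["T_server"] = max(prev["T_server"],
                        s["T_off_accu"] + z * (model["T_inf"] + model["T_resp"]))
    s["T_local"] = prev["T_local"] - defaults["T_d_inf"] + (1 - z) * model["T_inf"]
    s["T_slot"] = max(s["T_local"], s["T_server"])
    return s

def minmax(value, lo, hi):
    """Min-max normalization of a raw state value into [0, 1]."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0
```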

4.2.2. Action Space

The action space for the scheduling agent consists of the index of the selected inference model:
$$a^{\text{sched}} = \{\, i \mid i \in \mathcal{M} \,\}$$

4.2.3. Reward Function

After every action, the resulting schedule is evaluated by calculating $T_k^{\text{slot}}$, $A_k^{\text{slot}}$, and $E_k^{\text{slot}}$; then, using the time and energy constraints, the reward $r_{k,j}$ for time slot k and task j is defined in Equation (7). Let $\hat{A}_k^{\text{slot}}$, $\hat{T}_k^{\text{slot}}$, and $\hat{E}_k^{\text{slot}}$ be the normalized values of $A_k^{\text{slot}}$, $T_k^{\text{slot}}$, and $E_k^{\text{slot}}$, respectively:
$$r_{k,j} = \alpha \hat{A}_k^{\text{slot}} - \beta \left| \hat{T}_k^{\text{slot}} - \hat{T} \right| - \gamma \left| \hat{E}_k^{\text{slot}} - \hat{E} \right| \tag{7}$$
where $\hat{T}$ and $\hat{E}$ are the normalized values of T and E, respectively, and $\alpha$, $\beta$, and $\gamma$ are weighting coefficients that determine the relative importance of each term in the reward function.
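Equation (7) translates directly into a few lines of code; a minimal sketch follows, in which the coefficient defaults are placeholders rather than the values used in the experiments:

```python
def reward(a_hat, t_hat, e_hat, t_c_hat, e_c_hat, alpha=1.0, beta=0.5, gamma=0.5):
    """Equation (7): reward normalized accuracy, penalize the distance of the
    normalized slot time and energy from the normalized constraints."""
    return alpha * a_hat - beta * abs(t_hat - t_c_hat) - gamma * abs(e_hat - e_c_hat)

# Example: normalized accuracy 0.8, slot time/energy slightly under their budgets.
print(reward(a_hat=0.8, t_hat=0.45, e_hat=0.27, t_c_hat=0.5, e_c_hat=0.3))  # 0.76
```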

4.3. Edge Server Selection

In this section, we propose an MDP model for the edge server selection problem.

4.3.1. State Space

The state for the edge server selection agent contains information about the network latency and accuracy of the currently evaluated edge server, along with the current task j and the given constraints. This gives the agent enough context to decide whether to use this edge server for offloading. Let $m_s$ be the inference model corresponding to the previously selected edge server and $m_{s+1}$ the inference model corresponding to the currently evaluated edge server. We define the state $s_{k,j}^{\text{sel}}$ as follows:
$$s_{k,j}^{\text{sel}} = \left[ T_s^{\text{lat}}, T_{s+1}^{\text{lat}}, T_k^{\text{server}}, T_k^{\text{local}}, E_k^{\text{slot}}, T, E, size_j \right]$$
where T and E represent the time and energy constraints, respectively, and $size_j$ is the size of the currently processed task j.

4.3.2. Action Space

The action space of the edge server selection agent consists of a binary decision representing whether to use the evaluated edge server ($a^{\text{sel}} = 1$) or to skip it and keep using the previous one ($a^{\text{sel}} = 0$):
$$a^{\text{sel}} = \{0, 1\}$$
The edge server selection agent receives a similar reward to the scheduling agent since they have a common goal of producing the best schedule given the same constraints.

4.4. DDQN for MASITO

Deep Q-Network (DQN) is a reinforcement learning algorithm that combines deep neural networks with Q-learning to learn optimal policies for decision-making in sequential decision-making tasks. At its core, DQN aims to approximate the optimal action-value function, Q * ( s , a ) , which represents the expected cumulative reward when taking action a in state s and then following the optimal policy thereafter. The key innovation of DQN lies in using deep neural networks to approximate the action-value function, enabling it to handle high-dimensional state spaces commonly encountered in real-world applications. DQN learns by iteratively updating the parameters of the neural network to minimize the temporal difference error between the predicted Q-values and the target Q-values. By leveraging experience replay, where past experiences are stored and sampled randomly during training, DQN improves sample efficiency and stability by breaking temporal correlations in the data. Through this process, DQN learns to make optimal decisions by iteratively refining its policy based on feedback received from the environment.
Double Deep Q-Networks (DDQNs) improve upon DQNs by addressing the overestimation bias in action values, which can lead to suboptimal policies. In DDQNs, this bias is mitigated by decoupling action selection from action evaluation through two separate networks, a policy network and a target network (see Figure 6). While the policy network selects the best action based on the current state, the target network estimates the value of that action. By periodically updating the parameters of the target network with those of the policy network, a DDQN ensures that the target values used to compute the temporal difference error are more stable and less prone to overestimation. This results in more accurate and reliable Q-value estimates, leading to improved convergence and ultimately better performance than the original DQN algorithm. The target Q-value is calculated according to Equation (8), the loss function is given by Equation (9), and the target network is updated using Equation (10).
$$Q_{\text{target}}(s, a) = r + \gamma\, Q_t\!\left(s', \operatorname*{argmax}_{a'} Q(s', a'; \theta); \theta'\right) \tag{8}$$
$$L(\theta) = \mathbb{E}\!\left[\left(r + \gamma\, Q_t\!\left(s', \operatorname*{argmax}_{a'} Q(s', a'; \theta); \theta'\right) - Q(s, a; \theta)\right)^{2}\right] \tag{9}$$
$$\theta' = \tau \theta + (1 - \tau)\, \theta' \tag{10}$$
where $Q(s, a; \theta)$ represents the Q-value predicted by the policy network for state s and action a with parameters $\theta$, $Q_t(s, a)$ represents the target Q-value for state s and action a, r is the immediate reward obtained after taking action a in state s, $s'$ is the next state, $\gamma$ is the discount factor determining the importance of future rewards, $\theta'$ represents the parameters of the target network, and $\tau$ is the soft update parameter determining the rate at which the target network parameters are updated.
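For implementers, Equations (8)-(10) correspond to one standard DDQN update step. The PyTorch sketch below is a generic rendering under our own naming, not the authors' released code; the done-mask term is a common addition for terminal states that the equations above leave implicit:

```python
import torch
import torch.nn.functional as F

def ddqn_update(policy_net, target_net, optimizer, batch, gamma=0.99, tau=0.005):
    """One DDQN step: Eq. (8) target, Eq. (9) loss, Eq. (10) soft update."""
    s, a, r, s_next, done = batch    # tensors sampled from the replay memory

    # Eq. (8): the policy network picks the argmax action, the target network rates it.
    with torch.no_grad():
        a_next = policy_net(s_next).argmax(dim=1, keepdim=True)
        q_next = target_net(s_next).gather(1, a_next).squeeze(1)
        q_target = r + gamma * q_next * (1.0 - done)   # done masks terminal states

    # Eq. (9): temporal-difference loss on the actions actually taken.
    q_pred = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Eq. (10): soft update theta' = tau * theta + (1 - tau) * theta'.
    for p, p_t in zip(policy_net.parameters(), target_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
    return loss.item()
```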
The deep neural network for the scheduling agent takes an array of size $|\mathcal{M}| \times 10$ as input and outputs an array of size $|\mathcal{M}|$, while the deep neural network for the edge server selection agent takes an array of size $(|\mathcal{M}| + 5) \times 2$ as input and outputs an array of size 2.
Algorithm 2 describes the steps used to train the DDQN agents. It uses an epsilon-greedy strategy in which an agent decides between exploring new actions and exploiting known actions based on a parameter $\epsilon$. Initially, $\epsilon$ is set to $\epsilon_{\max}$ and is then reduced each episode by a factor of $\epsilon_{\text{decay}}$ until it reaches a minimum value of $\epsilon_{\min}$. This ensures sufficient exploration to avoid local optima and a gradual shift towards exploitation, leveraging accumulated knowledge for better decision-making. Furthermore, replay memory is used: past experiences (state, action, reward, next state) are stored in a buffer, allowing the agent to randomly sample mini-batches of these experiences during training. This breaks the temporal correlations between consecutive experiences, stabilizing the training process and improving learning efficiency. Algorithm 3 presents the main steps of MASITO, illustrated in Figure 5.
Algorithm 2: DDQN agent training (presented as a pseudocode figure in the original article).
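Since Algorithm 2 is likewise reproduced only as an image, its structure can be reconstructed from the description above roughly as follows (a hedged sketch in which `env` and its methods are assumed interfaces, and `ddqn_update` is the function from the previous sketch):

```python
import random
from collections import deque

import torch

def collate(batch):
    """Stack a sampled mini-batch of (s, a, r, s', done) tuples into tensors."""
    s, a, r, s2, d = zip(*batch)
    return (torch.stack(s), torch.tensor(a), torch.tensor(r, dtype=torch.float32),
            torch.stack(s2), torch.tensor(d, dtype=torch.float32))

def train_agent(env, policy_net, target_net, optimizer, episodes,
                eps_max=1.0, eps_min=0.05, eps_decay=0.995,
                memory_size=10_000, batch_size=64):
    """Epsilon-greedy DDQN training with replay memory, as described for Algorithm 2."""
    memory = deque(maxlen=memory_size)          # replay buffer of past experiences
    eps = eps_max
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:           # explore with probability eps
                action = env.sample_action()
            else:                               # exploit the learned policy
                with torch.no_grad():
                    action = policy_net(state.unsqueeze(0)).argmax().item()
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, float(done)))
            if len(memory) >= batch_size:       # learn from a random mini-batch
                ddqn_update(policy_net, target_net, optimizer,
                            collate(random.sample(memory, batch_size)))
            state = next_state
        eps = max(eps_min, eps * eps_decay)     # decay exploration each episode
```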
Algorithm 3: MASITO steps (presented as a pseudocode figure in the original article).
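Algorithm 3 is also available only as a figure. Based on Section 4.1 and Figure 5, one plausible reading of its per-slot flow is sketched below; the candidate-server iteration order and the agent interfaces are our assumptions, with stub agents standing in for the trained DDQNs:

```python
def masito_slot(tasks, local_models, edge_servers, sel_agent, sched_agent, T, E):
    """One MASITO time slot: pick an edge server per task, then a model per task."""
    schedule = [local_models[0]] * len(tasks)    # default schedule H_k, all m_d
    current = edge_servers[0]                    # previously selected server m_s
    for j, task in enumerate(tasks):
        candidate = edge_servers[j % len(edge_servers)]  # server under evaluation (assumed order)
        if sel_agent(current, candidate, schedule, task, T, E) == 1:
            current = candidate                  # action 1: adopt the candidate server
        pool = list(local_models) + [current]    # M = L union {m_s}
        # Scheduling agent picks one model index from the pool for task j.
        schedule[j] = pool[sched_agent(pool, schedule, task, T, E)]
    return schedule

# Stub agents: always keep the first server, always pick the first local model.
print(masito_slot(tasks=["img0", "img1"], local_models=["resnet18", "resnet34"],
                  edge_servers=["es1", "es2"], sel_agent=lambda *a: 0,
                  sched_agent=lambda *a: 0, T=500, E=30))
```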

5. Experimental Results

In this section, we evaluate the performance of MASITO through extensive experiments and comparisons with the following algorithms.

5.1. Baseline Algorithms

Genetic algorithms (GAs) have been extensively utilized in the literature [38,42] for addressing high-dimensional optimization problems due to their versatility and effectiveness. In a GA, a population of candidate solutions, represented as chromosomes, undergoes iterative evolution through processes such as selection, crossover, and mutation, inspired by principles of natural selection and genetics. Selection mechanisms, such as tournament selection, play a crucial role in GA operation by determining which individuals proceed to the next generation based on their fitness. The tournament selection method, employed in our study, involves randomly selecting a subset of individuals from the population and choosing the fittest individual among them to serve as a parent for producing offspring. This approach ensures that individuals with higher fitness have a greater probability of being selected as parents, thereby guiding the evolution of the population toward more optimal solutions. The widespread use of GAs in the literature, coupled with their ability to handle high-dimensional problems and diverse solution spaces, justifies their selection as a baseline for comparison in our study.
Particle swarm optimization (PSO) stands out as a foundational metaheuristic extensively employed in MdKP research [43,44]. Its efficacy in navigating high-dimensional search spaces has established it as a reliable baseline method. Drawing inspiration from the collective behavior observed in natural phenomena like bird flocks or fish schools, PSO operates through a cooperative search mechanism. Individuals, termed particles, iteratively adjust their positions based on personal experiences and swarm knowledge. This collaborative approach enables PSO to explore diverse regions of the search space, adapt to dynamic conditions, and converge toward optimal solutions efficiently. Assessing our proposed framework against PSO provides valuable insights into its effectiveness and competitiveness in addressing the optimization task, and while PSO traditionally caters to continuous search spaces with real-valued vectors, our experiments necessitate its adaptation for discrete search spaces. This adaptation involves representing solutions as integer position vectors, corresponding to inference model indexes, and applying clipping mechanisms to integer velocity vectors after each update to ensure adherence to predefined ranges.
Simulated annealing (SA) has been widely adopted in the literature for both task scheduling [45,46] and the MdKP [47,48], demonstrating its effectiveness. Its stochastic nature, employing a probabilistic acceptance criterion, enables it to navigate complex and multimodal optimization landscapes, making it well-suited for tackling challenging optimization problems. This characteristic allows SA to evade local optima and explore diverse regions of the search space, rendering it a compelling choice as a baseline method.
All compared algorithms use the same fitness function f, given in Equation (14). The parameters for all implemented algorithms are detailed in Table 3. Python 3.12 serves as the programming language for all implementations, supplemented by PyTorch for handling the inference models. Notably, parameters such as the GA population size and PSO swarm size have been tuned to the minimum values offering the best accuracy at the lowest execution time; increasing them only lengthens execution without improving accuracy.
$$\delta_T = \frac{T - T_k^{\text{slot}}}{T}$$
$$\delta_E = \frac{E - E_k^{\text{slot}}}{E}$$
$$\omega_T = \begin{cases} 1 & \text{if } \delta_T < 0 \text{ and } |\delta_E| \le \delta_{\min} \\ 1 - |\delta_T| & \text{otherwise} \end{cases} \tag{11}$$
$$\omega_E = \begin{cases} 1 & \text{if } \delta_E < 0 \text{ and } |\delta_T| \le \delta_{\min} \\ 1 - |\delta_E| & \text{otherwise} \end{cases} \tag{12}$$
$$\omega = \tfrac{1}{2}\,\omega_T + \tfrac{1}{2}\,\omega_E \tag{13}$$
$$f_k = \frac{\omega\, A_k^{\text{slot}}}{100} \tag{14}$$
In this context, $\delta_T$ and $\delta_E$ denote the distance ratios of $T_k^{\text{slot}}$ and $E_k^{\text{slot}}$ from the specified constraints T and E, respectively. $\delta_{\min}$ denotes the minimum distance ratio at which a constraint is considered fulfilled. $\omega_T$ and $\omega_E$ are the time and energy penalties, respectively, structured to scale with the distance from the given constraints. This design compels the agent to minimize that distance and effectively utilize the allocated constraint budget. However, in scenarios where a single constraint imposes the limitation, Equations (11) and (12) ensure that the agent is not forced to maximize utilization of both budgets simultaneously: the penalty for the other constraint is waived as long as it remains within acceptable bounds.
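Written out in code, the baseline fitness of Equations (11)-(14) is simply the following (the default $\delta_{\min}$ value is an illustrative placeholder):

```python
def fitness(acc, t_slot, e_slot, t_max, e_max, delta_min=0.05):
    """Equations (11)-(14): total accuracy scaled by constraint-distance penalties."""
    d_t = (t_max - t_slot) / t_max   # delta_T: signed distance ratio from T
    d_e = (e_max - e_slot) / e_max   # delta_E: signed distance ratio from E
    # A penalty is waived when the schedule overshoots one budget while the
    # other budget is essentially exhausted (within delta_min of its limit).
    w_t = 1.0 if (d_t < 0 and abs(d_e) <= delta_min) else 1.0 - abs(d_t)
    w_e = 1.0 if (d_e < 0 and abs(d_t) <= delta_min) else 1.0 - abs(d_e)
    w = 0.5 * w_t + 0.5 * w_e        # Equation (13)
    return w * acc / 100.0           # Equation (14); acc is the summed accuracy in percent

# Example: a schedule slightly under both budgets.
print(fitness(acc=300.0, t_slot=480.0, e_slot=29.0, t_max=500.0, e_max=30.0))
```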

5.2. Experiment Setup

We present a real-world case study employing a laptop as the edge device, as detailed in Table 4. The laptop is equipped with a quad-core processor running at 3.0 GHz and 8 GB of RAM. Connectivity is facilitated through WiFi (802.11ac), linking the edge device to an access point. The access point, in turn, connects to a set of edge servers comprising desktop computers via Ethernet. To ensure the validity of our findings in practical settings, it is crucial to note that the experiments were conducted in an environment devoid of concurrent devices competing for network resources, thus mitigating potential traffic contention issues. In our experiments, we opt to utilize power as a constraint, measured in watts, rather than energy, measured in kWh, for the sake of convenience.

5.3. Experiment Case Study

We use an image classification case study on the ImageNet-mini dataset [49], comprising 3923 images with sizes ranging from 10 KB to 10 MB, to assess the effectiveness of our system. For deployment on the edge device, we select a suite of lightweight classification inference models, including ResNet-18 and ResNet-34 [50] as well as ShuffleNet-V2 [51]. Conversely, the edge servers are equipped with a larger, more precise inference model, ResNeXt-101 [52]. During the deployment phase, we test these models to estimate their average inference time and accuracy on each machine, as outlined in Table 5. Notably, the average inference time on edge servers varies across machines due to differences in hardware capabilities and is therefore excluded from the table.
The task scheduling agent undergoes training with parameters detailed in Table 6.
Initially, training commences on a subset comprising 33% of the dataset, spanning 50 episodes, while introducing randomized variations in the time and energy constraints within the ranges $T \in \{200, 250, 300, \ldots, 500\}$ ms and $E \in \{10, 15, 20, \ldots, 50\}$ W. In this initial phase of pre-deployment training, edge servers are simulated locally, leveraging high-accuracy inference models while introducing random variations in latency. Once the scheduling agent has been sufficiently trained, the subsequent phase trains the server selection agent. The parameters employed for training the server selection agent align closely with those used for the scheduling agent, with the exceptions specified in Table 6.

5.4. Evaluation Metrics

We assess the performance of the agents on the remaining 66.66% of the dataset using four key metrics: scheduling time, accuracy, execution time, and power consumption. Scheduling time, measured in milliseconds, represents the duration required by the algorithm to generate a schedule for a given time slot; it is calculated individually for each time slot and then averaged across all slots. Accuracy, execution time, and power consumption collectively quantify the quality of the generated schedules and their adherence to the specified constraints. Each schedule for every time slot is evaluated by running inference with the designated inference models. These metrics are then aggregated across all time slots to provide a comprehensive assessment over the dataset.

5.5. Performance under Different Number of Iterations

Examining the results depicted in Figure 7, Subplot 1, we note that the scheduling times of GA and PSO scale linearly with the number of iterations. A small number of iterations, around 20 for GA and 100 for PSO, is adequate for generating high-accuracy schedules, whereas for SA we find 500 iterations to be sufficient; these values are used in subsequent experiments. Conversely, while SA initially appears to maintain a constant scheduling time, closer inspection (see Figure 8, Subplot 1) reveals a linear growth pattern, albeit at a slower pace. Interestingly, MASITO remains unaffected by the number of iterations, owing to its capacity to generate solutions in a single forward pass through the deep neural network.
In Subplot 2, we observe that all schemes yield schedules of nearly comparable accuracy, with minor deviations; SA and PSO produce noticeably higher-accuracy schedules as the number of iterations grows.
Subplot 3 highlights MASITO's distinct advantage in exploiting the available time budget compared with the other methods, which lets it achieve comparable average accuracy. This behavior can be attributed to the edge server selection agent's tendency to choose higher-latency, higher-power servers that yield similar accuracy but higher rewards. It aligns with the reward function design, which penalizes larger deviations from the given constraints and thereby incentivizes selecting servers that make fuller use of the time and power budgets.
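To make this incentive structure concrete, here is a hedged sketch of a reward of the kind described above; the coefficients and the linear penalty form are illustrative assumptions, not MASITO's exact function.

```python
def reward(accuracy, t_used, t_max, e_used, e_max,
           w_acc=1.0, w_t=0.5, w_e=0.25):
    """Accuracy reward minus penalties for deviating from the budgets.
    Penalizing the *absolute* deviation (rather than only the excess)
    rewards schedules that use the budgets fully, which is consistent
    with the server-selection behavior observed in Subplot 3.
    Coefficients w_acc, w_t, w_e are illustrative assumptions."""
    time_dev = abs(t_used - t_max) / t_max
    energy_dev = abs(e_used - e_max) / e_max
    return w_acc * accuracy - w_t * time_dev - w_e * energy_dev
```

Under this form, lowering w_e relative to w_t and w_acc would reproduce the looser power compliance discussed in Sections 5.7 and 5.8, while raising it would tighten compliance at some cost in accuracy.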

5.6. Performance Evaluation under Different Time Constraints

To assess MASITO's constraint compliance and accuracy, we conduct experiments in which one constraint is fixed while the other is varied. In Figure 9, we fix the power constraint at 30 W and vary the time constraint from 200 to 500 ms. Subplot 1 shows that the scheduling times of GA, PSO, and SA grow linearly as the time constraint increases, whereas MASITO maintains a constant scheduling time regardless of the constraint value. Subplot 2 shows that all methods produce schedules whose accuracy increases with the time constraint; MASITO matches the accuracy of GA and SA and surpasses PSO, despite its significantly shorter scheduling times. Notably, in Subplot 3, all compared algorithms adhere to the specified time constraint, with only minor deviations for MASITO. This indicates MASITO's ability to manage time constraints effectively while maintaining high accuracy and low scheduling times.

5.7. Performance Evaluation under Different Power Constraints

Similarly, in Figure 10, we fix the time constraint at 500 ms and vary the power constraint from 10 to 50 W. Subplot 1 shows results similar to those discussed above, although the power constraint has a more pronounced effect on the scheduling times of GA, PSO, and SA, which increase noticeably as the constraint varies, while MASITO remains unaffected. Subplot 2 reveals that the accuracy levels of all compared schemes are broadly similar, with slight variations for MASITO. In Subplot 3, the metaheuristic methods produce schedules whose execution times fall well below the specified time constraint, especially at the 20 W power constraint, indicating a greater reliance on offloading to low-latency edge servers; MASITO, by contrast, keeps execution times as close as possible to the constraint in order to maximize reward. Subplot 4 shows that all compared schemes adhere to the specified power constraint up to 40 W, beyond which they hit the time constraint and stop increasing power consumption despite the larger power budget. MASITO complies with the power constraint less strictly because the reward function coefficients favor accuracy and time over power; adjusting these coefficients would tighten compliance, potentially at the expense of accuracy.

5.8. Performance Evaluation under a Varying Number of Edge Servers

In this section, we assess the scalability of the framework by fixing the time and power constraints at 500 ms and 30 W, respectively, while varying the number of available edge servers, as shown in Figure 11. This lets us monitor how scheduling time changes with larger networks and whether the compared methods capitalize on the added resources. In Figure 11, Subplot 2, as the number of available edge servers increases, MASITO consistently generates solutions on par with, and occasionally surpassing, those of GA, PSO, and SA, which we attribute to the server selection agent's ability to pick the best servers while operating at lower scheduling times. Subplot 3 highlights MASITO's ability to fully use the allocated time budget to maximize reward, in contrast to the other schemes, which often settle for lower-time solutions. Lastly, Subplot 4 shows that MASITO allows solutions that exceed the specified power constraint, prioritizing time compliance and higher accuracy over strict power adherence as a strategic compromise to maximize reward; this behavior can be fine-tuned by adjusting the reward function coefficients to achieve a more balanced trade-off between time, accuracy, and power.

6. Conclusions

In this study, we address the optimization problem of selective inference task scheduling and offloading under time and energy constraints while maximizing accuracy. The problem is strongly NP-hard, and to solve it we propose MASITO, a novel framework leveraging cooperative deep reinforcement learning agents. Our approach deploys a scheduling DRL agent that allocates inference models to inference tasks, balancing local inference against offloading to edge servers, and a complementary DRL agent that selects the optimal edge server for each offloaded task under the specified time and energy constraints. The agents operate synergistically, compensating for each other's sub-optimal actions, and MASITO dynamically adapts to diverse network configurations, enabling high-mobility edge computing applications. Experimental validation on the ImageNet-mini dataset demonstrates the efficacy of the framework against genetic algorithms (GAs), particle swarm optimization (PSO), and simulated annealing (SA): MASITO maintains consistently lower scheduling times regardless of the constraints and the number of edge servers, a critical factor for real-time applications, while achieving comparable average accuracy in worst-case scenarios and superior accuracy in best-case scenarios. MASITO also shows potential for continuous improvement as more data are processed.
Looking ahead, we propose two avenues for future research. Firstly, we advocate for the implementation of a federated learning scheme among edge devices to facilitate the sharing of learned experiences, expediting the convergence to optimal agents. Secondly, we suggest the development of an automated system capable of adapting the reward function coefficients dynamically according to specific application scenarios, enhancing the adaptability and robustness of MASITO in diverse real-world settings.

Author Contributions

Each co-author played a distinct and vital role in this research. A.B.S. led the research and implemented the conceptual ideas. H.N. and S.D. provided oversight throughout, ensuring the study's coherence and alignment with the research goals. A.K. contributed expertise in consulting and methodology design, strengthening the investigative framework. A.N. reviewed and edited the manuscript, improving its overall quality. N.A. provided co-supervision and further guidance. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Baccour, E.; Mhaisen, N.; Abdellatif, A.A.; Erbad, A.; Mohamed, A.; Hamdi, M.; Guizani, M. Pervasive AI for IoT applications: A survey on resource-efficient distributed artificial intelligence. IEEE Commun. Surv. Tutor. 2022, 24, 2366–2418.
2. Shlezinger, N.; Bajić, I.V. Collaborative inference for AI-empowered IoT devices. IEEE Internet Things Mag. 2022, 5, 92–98.
3. Islam, A.; Debnath, A.; Ghose, M.; Chakraborty, S. A Survey on Task Offloading in Multi-access Edge Computing. J. Syst. Archit. 2021, 118, 102225.
4. Fresa, A.; Champati, J.P. Offloading Algorithms for Maximizing Inference Accuracy on Edge Device Under a Time Constraint. arXiv 2021, arXiv:2112.11413.
5. Cacchiani, V.; Iori, M.; Locatelli, A.; Martello, S. Knapsack problems—An overview of recent advances. Part II: Multiple, multidimensional, and quadratic knapsack problems. Comput. Oper. Res. 2022, 143, 105693.
6. Abdenacer, N.; Abdelkader, N.N.; Qammar, A.; Shi, F.; Ning, H.; Dhelim, S. Task Offloading for Smart Glasses in Healthcare: Enhancing Detection of Elevated Body Temperature. In Proceedings of the 2023 IEEE International Conference on Smart Internet of Things (SmartIoT), Xining, China, 25–27 August 2023; pp. 243–250.
7. Naouri, A.; Nouri, N.A.; Khelloufi, A.; Sada, A.B.; Naouri, S.; Ning, H.; Dhelim, S. BusCache: V2V-based infrastructure-free content dissemination system for Internet of Vehicles. IEEE Access 2024, 12, 37663–37678.
8. Aung, N.; Dhelim, S.; Chen, L.; Lakas, A.; Zhang, W.; Ning, H.; Chaib, S.; Kechadi, M.T. VeSoNet: Traffic-Aware Content Caching for Vehicular Social Networks Using Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8638–8649.
9. Naouri, A.; Ning, H.; Nouri, N.A.; Khelloufi, A.; Ben Sada, A.; Naouri, S.; Qammar, A.; Dhelim, S. Maximizing UAV fog deployment efficiency for critical rescue operations: A multi-objective optimization approach. Future Gener. Comput. Syst. 2024, 159, 255–271.
10. Naouri, A.; Nouri, N.A.; Khelloufi, A.; Sada, A.B.; Ning, H.; Dhelim, S. Efficient fog node placement using nature-inspired metaheuristic for IoT applications. Clust. Comput. 2024.
11. Nikoloska, I.; Zlatanov, N. Data selection scheme for energy efficient supervised learning at IoT nodes. IEEE Commun. Lett. 2020, 25, 859–863.
12. Khelloufi, A.; Ning, H.; Naouri, A.; Sada, A.B.; Qammar, A.; Khalil, A.; Mao, L.; Dhelim, S. A Multimodal Latent-Features-Based Service Recommendation System for the Social Internet of Things. IEEE Trans. Comput. Soc. Syst. 2024, 1–16.
13. Zhang, D.G.; Dong, W.M.; Zhang, T.; Zhang, J.; Zhang, P.; Sun, G.X.; Cao, Y.H. New Computing Tasks Offloading Method for MEC Based on Prospect Theory Framework. IEEE Trans. Comput. Soc. Syst. 2024, 11, 770–781.
14. Khelloufi, A.; Khelil, A.; Naouri, A.; Sada, A.B.; Ning, H.; Aung, N.; Dhelim, S. A Hybrid Feature and Trust-Aggregation Recommender System in the Social Internet of Things. IEEE Access 2024.
15. Dhelim, S.; Aung, N.; Kechadi, M.T.; Ning, H.; Chen, L.; Lakas, A. Trust2Vec: Large-Scale IoT Trust Management System Based on Signed Network Embeddings. IEEE Internet Things J. 2023, 10, 553–562.
16. Yang, T.; Chai, R.; Zhang, L. Latency optimization-based joint task offloading and scheduling for multi-user MEC system. In Proceedings of the 2020 IEEE 29th Wireless and Optical Communications Conference (WOCC), Newark, NJ, USA, 1–2 May 2020; pp. 1–6.
17. Liu, C.F.; Bennis, M.; Debbah, M.; Poor, H.V. Dynamic task offloading and resource allocation for ultra-reliable low-latency edge computing. IEEE Trans. Commun. 2019, 67, 4132–4150.
18. Zhang, H.; Yang, Y.; Huang, X.; Fang, C.; Zhang, P. Ultra-low latency multi-task offloading in mobile edge computing. IEEE Access 2021, 9, 32569–32581.
19. Chen, X.; Liu, G. Energy-efficient task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge networks. IEEE Internet Things J. 2021, 8, 10843–10856.
20. Li, J.; Dai, M.; Su, Z. Energy-aware task offloading in the Internet of Things. IEEE Wirel. Commun. 2020, 27, 112–117.
21. Xu, Z.; Zhao, L.; Liang, W.; Rana, O.F.; Zhou, P.; Xia, Q.; Xu, W.; Wu, G. Energy-aware inference offloading for DNN-driven applications in mobile edge clouds. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 799–814.
22. Cozzolino, V.; Tonetto, L.; Mohan, N.; Ding, A.Y.; Ott, J. Nimbus: Towards Latency-Energy Efficient Task Offloading for AR Services. IEEE Trans. Cloud Comput. 2023, 11, 1530–1545.
23. Younis, A.; Tran, T.X.; Pompili, D. Energy-latency-aware task offloading and approximate computing at the mobile edge. In Proceedings of the 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Monterey, CA, USA, 4–7 November 2019; pp. 299–307.
24. Li, Z.; Chang, V.; Ge, J.; Pan, L.; Hu, H.; Huang, B. Energy-aware task offloading with deadline constraint in mobile edge computing. EURASIP J. Wirel. Commun. Netw. 2021, 2021, 56.
25. Tajallifar, M.; Ebrahimi, S.; Javan, M.R.; Mokari, N.; Chiaraviglio, L. Energy-efficient task offloading under E2E latency constraints. IEEE Trans. Commun. 2021, 70, 1711–1725.
26. Liu, K.; Peng, J.; Li, H.; Zhang, X.; Liu, W. Multi-device task offloading with time-constraints for energy efficiency in mobile cloud computing. Future Gener. Comput. Syst. 2016, 64, 1–14.
27. Zhao, M.; Yu, J.J.; Li, W.T.; Liu, D.; Yao, S.; Feng, W.; She, C.; Quek, T.Q. Energy-aware task offloading and resource allocation for time-sensitive services in mobile edge computing systems. IEEE Trans. Veh. Technol. 2021, 70, 10925–10940.
28. Jiang, H.; Dai, X.; Xiao, Z.; Iyengar, A. Joint task offloading and resource allocation for energy-constrained mobile edge computing. IEEE Trans. Mob. Comput. 2022, 22, 4000–4015.
29. Mohammad, U.; Sorour, S.; Hefeida, M. Task allocation for mobile federated and offloaded learning with energy and delay constraints. In Proceedings of the 2020 IEEE International Conference on Communications Workshops (ICC Workshops), Virtually, 7–11 June 2020; pp. 1–6.
30. Azizi, S.; Othman, M.; Khamfroush, H. DECO: A deadline-aware and energy-efficient algorithm for task offloading in mobile edge computing. IEEE Syst. J. 2022, 17, 952–963.
31. Wang, Q.; Guo, S.; Liu, J.; Yang, Y. Energy-efficient computation offloading and resource allocation for delay-sensitive mobile edge computing. Sustain. Comput. Inform. Syst. 2019, 21, 154–164.
32. Ben Sada, A.; Khelloufi, A.; Naouri, A.; Ning, H.; Dhelim, S. Hybrid metaheuristics for selective inference task offloading under time and energy constraints for real-time IoT sensing systems. Clust. Comput. 2024, 1–17.
33. Ben Sada, A.; Khelloufi, A.; Naouri, A.; Ning, H.; Dhelim, S. Energy-Aware Selective Inference Task Offloading for Real-Time Edge Computing Applications. IEEE Access 2024, 12, 72924–72937.
34. Alameddine, H.A.; Sharafeddine, S.; Sebbah, S.; Ayoubi, S.; Assi, C. Dynamic task offloading and scheduling for low-latency IoT services in multi-access edge computing. IEEE J. Sel. Areas Commun. 2019, 37, 668–682.
35. Ni, W.; Tian, H.; Lyu, X.; Fan, S. Service-dependent task offloading for multiuser mobile edge computing system. Electron. Lett. 2019, 55, 839–841.
36. Alfakih, T.; Hassan, M.M.; Gumaei, A.; Savaglio, C.; Fortino, G. Task offloading and resource allocation for mobile edge computing by deep reinforcement learning based on SARSA. IEEE Access 2020, 8, 54074–54084.
37. Huang, L.; Feng, X.; Zhang, C.; Qian, L.; Wu, Y. Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing. Digit. Commun. Netw. 2019, 5, 10–17.
38. Li, Z.; Zhu, Q. Genetic algorithm-based optimization of offloading and resource allocation in mobile-edge computing. Information 2020, 11, 83.
39. Abbas, A.; Raza, A.; Aadil, F.; Maqsood, M. Meta-heuristic-based offloading task optimization in mobile edge computing. Int. J. Distrib. Sens. Netw. 2021, 17, 15501477211023021.
40. Gao, H.; Wang, X.; Wei, W.; Al-Dulaimi, A.; Xu, Y. Com-DDPG: Task Offloading Based on Multiagent Reinforcement Learning for Information-Communication-Enhanced Mobile Edge Computing in the Internet of Vehicles. IEEE Trans. Veh. Technol. 2024, 73, 348–361.
41. Li, P.; Xiao, Z.; Wang, X.; Huang, K.; Huang, Y.; Gao, H. EPtask: Deep Reinforcement Learning Based Energy-Efficient and Priority-Aware Task Scheduling for Dynamic Vehicular Edge Computing. IEEE Trans. Intell. Veh. 2024, 9, 1830–1846.
42. Chakraborty, S.; Mazumdar, K. Sustainable task offloading decision using genetic algorithm in sensor mobile edge computing. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1552–1568.
43. Haddar, B.; Khemakhem, M.; Hanafi, S.; Wilbaut, C. A hybrid quantum particle swarm optimization for the Multidimensional Knapsack Problem. Eng. Appl. Artif. Intell. 2016, 55, 1–13.
44. Bansal, J.C.; Deep, K. A Modified Binary Particle Swarm Optimization for Knapsack Problems. Appl. Math. Comput. 2012, 218, 11042–11061.
45. Tanha, M.; Hosseini Shirvani, M.; Rahmani, A.M. A hybrid meta-heuristic task scheduling algorithm based on genetic and thermodynamic simulated annealing algorithms in cloud computing environments. Neural Comput. Appl. 2021, 33, 16951–16984.
46. Fanian, F.; Bardsiri, V.K.; Shokouhifar, M. A new task scheduling algorithm using firefly and simulated annealing algorithms in cloud computing. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 195–202.
47. Chen, Y.; Hao, J.K. Memetic search for the generalized quadratic multiple knapsack problem. IEEE Trans. Evol. Comput. 2016, 20, 908–923.
48. Kierkosz, I.; Luczak, M. A hybrid evolutionary algorithm for the two-dimensional packing problem. Cent. Eur. J. Oper. Res. 2014, 22, 729–753.
49. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. arXiv 2017, arXiv:1606.04080.
50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
51. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. arXiv 2018, arXiv:1807.11164.
52. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. arXiv 2017, arXiv:1611.05431.
Figure 1. Edge computing system with inference task scheduling and offloading.
Figure 2. Edge device model.
Figure 3. An example of a schedule for 3 edge servers and 10 inference tasks.
Figure 4. MASITO architecture.
Figure 5. MASITO flow diagram.
Figure 6. DDQN algorithm.
Figure 7. Comparing MASITO to GA, PSO, and SA while varying number of iterations with time constraint set to 500 ms and power constraint set to 30 W.
Figure 8. A zoomed-in view of Subplot 1 from Figure 7.
Figure 9. Comparing MASITO to GA, PSO, and SA with different time constraints while power constraint is set to 30 W.
Figure 10. Comparing MASITO to GA, PSO, and SA with different power constraints while time constraint is set to 500 ms.
Figure 11. Comparing MASITO to GA, PSO, and SA with different counts of edge servers while time constraint is set to 500 ms and power constraint is set to 30 W.
Table 1. Symbols and notations.
$\mathcal{L}$: the set of edge device local inference models
$\mathcal{E}$: the set of available edge servers
$\mathcal{S}$: the set of edge server inference models
$\mathcal{M}$: the set of all inference models available to the edge device
$\mathcal{J}$: the set of inference tasks for each time slot
$A_i$: the average top-1 accuracy of model $i$
$T_i^{\mathrm{inf}}$: the average inference time of model $i$
$T_i^{\mathrm{lat}}$: the average communication latency for edge server $i$
$T_{ij}^{\mathrm{off}}$: the estimated time to offload task $j$ to edge server $i$
$\mathit{size}_j$: the size of task $j$
$b_i$: the bandwidth of the communication channel for edge server $i$
$T_{ij}^{\mathrm{task}}$: the total time it takes to process task $j$ using model $i$
$T_i^{\mathrm{resp}}$: the average response time from edge server $i$
$x_{ij}$: whether inference model $i$ is assigned to task $j$
$T_k^{\mathrm{slot}}$: the total time to process a complete time slot $k$
$T_k^{\mathrm{local}}$: the total local inference time for time slot $k$
$T_k^{\mathrm{server}}$: the total server time for time slot $k$
$T^{\mathrm{off\_accu}}$: offloading time accumulator
$T^{\mathrm{inf\_accu}}$: inference time accumulator
$E_{ij}^{\mathrm{off}}$: the energy cost of offloading task $j$ to edge server $i$
$c_i$: the average energy cost of transmitting data to edge server $i$
$E_i^{\mathrm{resp}}$: the energy cost of the inference response
$E_i^{\mathrm{inf}}$: the average energy cost of inference using model $i$
$E_{ij}^{\mathrm{task}}$: the total energy cost of processing task $j$ using model $i$
$A_k^{\mathrm{slot}}$: the total accuracy for time slot $k$
$E_k^{\mathrm{slot}}$: the total energy for time slot $k$
Table 2. Example values of $T_i^{\mathrm{server}}$ for time steps 2–5.
$t_2$: $T_1^{\mathrm{server}} = T_{1,1}^{\mathrm{off}} + T_{1,1}^{\mathrm{inf}} + T_{1,2}^{\mathrm{inf}} + T_{1,2}^{\mathrm{resp}}$
$t_3$: $T_2^{\mathrm{server}} = T_{1,1}^{\mathrm{off}} + T_{1,2}^{\mathrm{off}} + T_{2,1}^{\mathrm{off}} + T_{2,1}^{\mathrm{inf}} + T_{2,1}^{\mathrm{resp}}$
$t_4$: $T_2^{\mathrm{server}} = T_{1,1}^{\mathrm{off}} + T_{1,2}^{\mathrm{off}} + T_{2,1}^{\mathrm{off}} + T_{2,1}^{\mathrm{inf}} + T_{2,1}^{\mathrm{resp}} + T_{2,2}^{\mathrm{resp}}$
$t_5$: $T_3^{\mathrm{server}} = T_{1,1}^{\mathrm{off}} + T_{1,2}^{\mathrm{off}} + T_{2,1}^{\mathrm{off}} + T_{2,2}^{\mathrm{off}} + T_{3,1}^{\mathrm{off}} + T_{3,1}^{\mathrm{inf}} + T_{3,1}^{\mathrm{resp}}$
Table 3. Metaheuristic algorithm parameters.
GA: mutation probability 0.25; mutation fading 0.90; population 20; generations 20; tournament size 10.
PSO: swarm size 50; iterations 100.
SA: initial temperature $1 \times 10^{3}$; cooling rate $1 \times 10^{-1}$; iterations 500.
Table 4. Edge device parameters.
Average Response Size: 0.1 MB
$|\mathcal{M}|$: 4
Default Model: ShuffleNet-V2
CPU: Quad Core @ 3.0 GHz
RAM: 8 GB
Tasks Per Time Slot: 10
Table 5. Inference model parameters.
Model | Average Accuracy (%) | Average Inference Time (ms) | Size (MB) | Number of Parameters
ShuffleNet-V2 | 66.15 | 19.44 | 5.3 | 1,366,792
ResNet-18 | 72.01 | 28.07 | 44.7 | 11,689,512
ResNet-34 | 76.79 | 42.45 | 83.3 | 21,797,672
ResNeXt-101 (edge servers) | 87.05 | - | 319.3 | 83,455,272
Table 6. Agent parameters.
Scheduling Agent: $\epsilon_{\max}$ 1.0; $\epsilon_{\min}$ 0.01; $\epsilon_{\mathrm{decay}}$ 0.995; learning rate $1 \times 10^{-3}$; replay memory capacity $1 \times 10^{6}$; batch size 64; $\gamma$ 0.99; $\tau$ $1 \times 10^{-3}$; update every N steps 4; hidden layer size 128; hidden layers 3.
Server Selection Agent: hidden layer size 64; hidden layers 3; $\epsilon_{\mathrm{decay}}$ 0.985.