Joint Task Offloading, Resource Allocation, and Load-Balancing Optimization in Multi-UAV-Aided MEC Systems

Introduction
Wireless sensors and Internet of Things (IoT) devices are now ubiquitous, and their applications and utility have expanded significantly. Furthermore, with the introduction of stable, high-speed internet (e.g., 5G and 6G), many new services and applications have emerged in the fields of virtual/augmented reality, facial recognition, video games and video streaming, e-health, vehicular networks, and natural language processing [1,2]. However, these services and applications require increased computing capacity and energy efficiency, which creates new challenges for devices with limited computation and battery capacity [3,4]. As a result, task offloading has been proposed as a prominent solution to these limitations, in which intensive and delay-sensitive tasks can be offloaded and then executed remotely on more powerful devices [5,6].
Mobile cloud computing (MCC) is regarded as a valuable solution for offloading the intensive tasks of IoT devices to cloud computing resources [7][8][9]. However, due to the centralization of cloud computing resources, this solution suffers from high latency. MCC is also confronted with security issues [10,11]. As a result, researchers tend to distribute cloud computing resources to the nearby edge of IoT users, resulting in a new paradigm known as mobile edge computing (MEC), which efficiently addresses latency and security issues [12,13]. On the other hand, owing to their constant use in a variety of fields, Unmanned Aerial Vehicle (UAV) systems have seen an increase in popularity and attracted considerable attention in recent years. Furthermore, UAVs' low deployment cost and high mobility make them popular and appealing devices for aiding communication networks, acting as relay nodes [14], flying base stations [15], or terminal nodes [16].
Several models and approaches have recently been proposed for using computational offloading in UAV-assisted MEC systems [17]. From one perspective, some models focus on a single objective, while others cover multiple objectives [18]. From another perspective, some approaches address task offloading for a single edge server, while others cover multiple edge servers with and without the cloud [19]. Nevertheless, most of these solutions allow mobile users to only offload their tasks to the stations they are connected to, implying an unbalanced load at the stations [12,20]. Consequently, it may be difficult for some mobile users to complete their computation tasks within the acceptable latency threshold. Meanwhile, implementing an optimal strategy for multiple users in dynamic and complex systems such as a multi-user MEC system is another challenging issue that should be carefully addressed. Motivated by these considerations, we propose an efficient technique for multi-user and multi-edge systems in this paper to balance base station loads and reduce system costs. A novel and efficient offloading algorithm is also designed to find a near-optimal solution. The main contributions reported in this paper are summarized as follows:

• An efficient load-balancing algorithm is introduced for optimizing the load among GBSs, in which the mobile users are redistributed to the most appropriate GBSs with respect to their location, task size, and CPU cycles. In addition, UAVs are utilized as potential MEC servers to provide computation and communication resources by hovering over overcrowded areas where the GBS server is still overloaded.

• Task offloading, load balancing, and resource allocation are jointly optimized for multi-tier UAV-aided MEC systems via the formulation of an integer programming problem with the primary objective of minimizing the system cost.

• A novel form of deep reinforcement learning is introduced in which the application task's requirements represent the system state and the offloading decision is used to define the action. The solution is then derived using an efficient distributed deep reinforcement-learning-based algorithm.

• Simulation findings demonstrate that the proposed model not only exhibits fast and effective convergence performance but also significantly decreases the system costs compared with a benchmark approach.
The rest of this paper is structured as follows. In Section 2, the work pertaining to task-offloading models is highlighted, whereas the system model and the problem formulation are introduced in Section 3. Following that, in Section 4, we develop a novel and efficient task-offloading algorithm based on deep reinforcement learning to derive the offloading solution. Afterward, experimental results are presented and discussed in Section 5. Finally, the conclusion and future recommendations are presented in Section 6.

Related Work
In the last decade, the computing power of Unmanned Aerial Vehicles (UAVs) has improved by several orders of magnitude. UAVs have emerged as a promising technology for supporting a wide range of human activities, including target tracking, water quality inspection, disaster relief, power patrol inspection, and surveillance, due to their inherent mobility and increasing computing power. As a consequence, there is significant interest in embedding UAVs into edge computing systems to provide value-added edge computing services. The recent proliferation of IoT devices, in conjunction with edge computing, has uncovered numerous opportunities for novel UAV-based applications. Due to their limited battery life and processing power, however, UAVs face difficulties when performing computationally intensive tasks. Multiple research proposals on UAV-assisted MEC systems have been put forward, in which learning and optimization have been widely employed. In addition, single-UAV edge computing systems may not be adequate for serving remote users and accommodating diverse UAV application scenarios; therefore, solutions based on multiple UAVs have been proposed to handle this issue.
The authors of [21] proposed iTOA, an intelligent task-offloading approach for UAV MEC networks, to help determine which offloading decisions would alleviate the burden on the UAV computation platform and be most beneficial to it. The key features of their work are the use of a deep Monte Carlo tree search (MCTS) scheme and a splitting deep neural network (sDNN). The role of MCTS is to find the optimal offloading trajectory that maximizes reward functions related to latency. More specifically, the problem has been formulated as an average service latency minimization task and, in order to apply MCTS, recast as a Markov decision process (MDP). The authors further integrated a Long Short-Term Memory (LSTM) network to better predict channel quality, whereas the sDNN serves to hasten search convergence by providing prior probabilities to the MCTS. The simulation findings show that the proposed iTOA achieves 33% and 60% lower latency than the game-theory and greedy-search benchmark task-offloading methods, respectively.
Similarly, the work presented in [22] uses an MDP and reinforcement learning to address the MEC task-offloading problem in a UAV patrol system. To that purpose, a two-stage Stackelberg game model has been developed to simulate the system, which includes UAVs and network edge nodes. The problem has also been described as an optimization problem that utilizes a multi-agent deep deterministic policy gradient (MADDPG) scheme to find its solution. Extensive simulations were conducted to show the effectiveness of the suggested approach in terms of delay, utility of the UAVs, and system performance when compared to a random approach, a non-dominated sorting genetic algorithm, and a QoS priority algorithm. In another scenario, UAVs have been utilized to offer MEC services to IoT devices, particularly in scenarios such as terrestrial signal blockage and shadowing that make it difficult for IoT devices on the ground to reach edge clouds.
In this regard, the authors of [23] addressed the requirements of MEC quality of service and the UAV's limited battery lifetime and provided a system that takes into account IoT job offloading and UAV placement. The problem is formulated as a non-convex problem that minimizes the weighted sum of the service delay of IoT devices and the energy consumption of the UAV. To tackle the problem, the authors employed an algorithm based on successive convex approximation. The authors carried out a number of experiments to show that their collaborative UAV-EC framework outperforms baseline methods that rely only on UAVs or ECs for MEC in IoT.
The authors of [24] explore scenarios in which complex missions with interdependent tasks must be carried out using multiple UAVs. They proposed a multi-agent reinforcement learning approach for determining a near-optimal offloading policy of bandwidth and task allocation to minimize the computing missions' average response time while taking the dynamic nature of the environment and the UAVs' energy constraints into account. An MDP formulation has also been adopted, in which the offloading policy is used as the action and the reward is tied to response-time performance. Three distinct topologies for task interdependencies in complex missions are investigated, as well as their impact on the offloading policy. In their experimental work, the authors demonstrated good convergence ability and a significant reduction in the response time of complex missions.
Considering the scenario of large-scale, sparsely distributed user equipment, Zhao et al. [25] proposed a collaborative framework based on multi-agent deep learning to jointly determine the UAVs' trajectories, the communication resource management of UAVs, and the computation task allocation in a way that minimizes the total execution delay and energy consumption in multi-UAV MEC systems. Furthermore, to deal with the high-dimensional action space and find the best offloading policy, a multi-agent twin delayed deep deterministic policy gradient algorithm is used. The framework's implementation showed a significant reduction in total system cost when compared to other approaches in both fixed and mobile user equipment scenarios.
Multi-agent reinforcement learning was also employed in [26] to assist in decision making regarding task offloading from UAVs to edge clouds while minimizing the overall latency perceived by the user and the UAV energy consumption. The distributed architecture presented in that paper made it possible to achieve the two aforementioned goals.
He et al. [27] emphasized the importance of fully utilizing the UAVs' computation capabilities for effective remote edge computing in a multi-UAV MEC context. To that end, the authors proposed a multi-hop task-offloading scheme with on-the-fly computation, in which UAVs perform multi-hop task offloading collaboratively and also use their local processors to perform some tasks. To jointly optimize deployment strategies and resource allocation, two distributed algorithms with linear complexity have been proposed. In the experimental study, both special and general cases are considered, and simulation results indicate that the overall rate of the multi-UAV network is improved by applying the proposed framework compared to the baseline scheme.
In [28], a two-layer optimization strategy is proposed to jointly optimize UAV task scheduling at the first layer, via a dynamic programming-based bidding optimization method, and bit allocation with UAV trajectories while resolving the potential path-conflict problem. According to simulation results, the proposed strategy outperforms greedy and random strategies in terms of the user's total energy consumption, and it was able to eliminate UAV trajectory conflicts while satisfying safety constraints. Meanwhile, Yang et al. [29] addressed the load-balancing problem for multi-UAV-aided mobile-edge computing. Specifically, UAVs are used as MEC nodes to provide computation and storage capabilities for IoT devices. In addition, a new differential evolution (DE)-based technique is proposed to redeploy the UAVs to the best locations according to the IoT distributions. Moreover, a deep learning-based approach is designed to solve the task-scheduling problem at each UAV and thereby improve task performance. However, the approach in [29] assumed that UAVs have computation capabilities large enough to serve the IoT devices, which is not true in the real world, especially in a large-scale environment. Consequently, this impedes the capability of the system to effectively address real-time applications when the number of IoT devices increases. In contrast, in our work, UAVs are used to assist the MEC nodes in addressing the load problem.
To reduce the ground nodes' energy consumption, the authors of [30] investigated collaborative multi-task computation and cache offloading for UAV-aided MEC networks, taking into account the Quality of Experience of time-sensitive tasks. To handle the multiple problem variants, a block coordinate descent-based solution is proposed. Three manageable subproblems were identified and solved iteratively, including trajectory optimization, UAV resource allocation (including computation capability and bandwidth), and offloading decisions that take into account task type and ratio. Multiple experiments were conducted in a simulated environment, and the authors reported that the results provided insight into the effectiveness of collaborative offloading, as it was able to manage two jobs (computing and caching) while reducing the overall GN energy consumption and meeting the QoE requirements of various task types. Meanwhile, Zhou et al. [31] proposed a cooperative task offloading and resource allocation algorithm (CTORAA) that uses Lyapunov optimization to minimize energy consumption while taking system performance into account. The authors considered hybrid energy sources for multi-clouds with UAV assistance. Three critical subproblems pertaining to task-offloading control, local computing control, and charging control and cloud computing were considered. The first two were mathematically formulated as convex optimization problems, whereas the third was formulated as a combinatorial optimization problem and solved using Simulated Annealing (SA). To demonstrate the efficacy of the algorithm, a mathematical analysis was conducted in which the authors showed that CTORAA can achieve an arbitrarily defined profit-stability trade-off. In addition, simulation results demonstrated that the algorithm could adapt to varying task arrival rates, assure the stability of the queue, and outperform two benchmark approaches, "Fixed" and "Random", in finding the optimal solution that minimizes energy consumption.
Elsewhere, the task-offloading problem for a multi-Internet-of-Vehicles environment has been addressed in [32,33]. Specifically, He et al. [32] modeled task offloading, security, and resource allocation as a multi-objective problem for UAV-assisted vehicular ad hoc networks, where minimizing the task delay is the main goal. Afterward, a relax-and-rounding and Lagrangian-based algorithm is proposed to solve this problem in an effective manner. In [33], an integrated problem of task offloading, RSU cooperation, and task division is proposed for MEC-based vehicular networks, where minimizing the task delay and increasing the performance of the service are the main goals. Additionally, an efficient routing mechanism is introduced to boost service reliability and decrease the rate of session failures. Moreover, a model-based deep neural network is developed to solve this problem and derive the optimum solution.
Furthermore, recent task-offloading and resource-allocation models have been proposed for MEC networks [34,35]. Specifically, Mohamed et al. [34] proposed a multi-tiered edge-based resource-allocation optimization framework for the heterogeneous execution of tasks. This framework can facilitate the different offloading operations over diverse IoT environments. Moreover, an optimization strategy is proposed with the goal of reducing energy consumption while improving processing computations, task execution time, and network bandwidth. In [35], resource allocation was investigated for multi-UAV-aided MEC networks, in which a mixed-integer programming problem is formulated with the goal of decreasing the overall system cost in terms of latency and energy. Moreover, an efficient algorithm based on deep reinforcement learning is proposed that jointly handles UAV movement control, MU power control, and MU association to solve this problem and derive the solution.
Recently, task-offloading and resource-allocation optimization for mobile edge computing environments has been addressed in [36][37][38]. More specifically, Xu et al. [36] proposed a cooperative task-offloading approach for UAV-aided MEC systems, where the mobile device nearest to a UAV can serve as an assisting hop to transfer the tasks of distant mobile devices to the UAV for remote processing. In addition, a block coordinate descent-based algorithm is proposed to optimize the trajectories of UAVs and decrease the overall energy consumption of mobile devices and UAVs. Meanwhile, in a different contribution, a multi-task offloading and resource-allocation approach was proposed for MEC systems in satellite IoT [37], in which task allocation and scheduling are initially handled by multiple unmanned aerial vehicle-based air base stations, with edge computing provided by satellites. Moreover, a directed acyclic graph is utilized to derive the main dependencies between tasks, and then an attention mechanism and proximal policy optimization collaborative-based algorithm is proposed to obtain the offloading strategy. In [38], an energy- and delay-optimized trajectory planning framework is proposed for a multi-UAV multi-IoT network. More specifically, Banerjee et al. designed a multi-UAV multi-IoT system environment in hexagonal cells, in which each cell involves a set of IoT devices and a single UAV. In addition, in each cell, the IoT devices are grouped into clusters that UAVs hover over to collect and delegate tasks when needed. Moreover, the major objective of this study is the optimization of energy consumption and the transition times between hovering points. Finally, to identify the Pareto-optimal front and choose the optimal solution, a multi-objective optimization technique is applied.
Table 1 provides a concise summary of the reviewed literature in terms of the main objective, solution, and application and tier environments, as well as highlighting the main weaknesses.
It is observed from the above summary of related works that numerous efforts and approaches have been investigated for addressing the task-offloading issues of multi-user MEC systems with single and/or multiple edge servers, with or without the cloud. Nevertheless, most of these solutions allow mobile users to only transmit their tasks to the connected MEC server, which implies an unbalanced load on the edge servers.
Consequently, some mobile users may find it difficult to complete their computation tasks within the acceptable latency threshold. Meanwhile, implementing an optimal strategy for multiple users in dynamic and complex systems such as a multi-user MEC system is another challenging issue that should be carefully addressed. Motivated by these considerations, an efficient load-balancing model for multi-user and multi-edge systems is presented in this paper in order to balance the loads among base stations and reduce system costs. Furthermore, a novel task-offloading approach based on deep learning techniques is proposed to efficiently derive the offloading solution.

System Model and Problem Formulation
This section starts by introducing the multi-UAV-aided MEC system model. Following this, optimization problems pertaining to the task-offloading, resource-allocation, and load-balancing models are formulated with the aim of minimizing the total system cost.

System Model
In this study, we consider a multi-UAV-aided MEC system, as shown in Figure 1, which consists of three main layers. In the first layer, a set D = {1, . . ., K} of device users is distributed, in which each device has an intensive application required to be executed. In the second layer, there are sets G = {1, . . ., M} and U = {1, . . ., N} of ground base stations (GBSs) and UAVs, respectively, which offer storage and computation capabilities to the device users, given the devices' limited computation and battery capacity. Moreover, UAVs can be utilized as potential MEC servers to provide communication and computation resources by hovering over crowded areas that remain overloaded due to the fixed locations of the GBSs. Furthermore, the UAVs and GBSs are controlled and managed through a backbone router, in which software-defined network (SDN) controller technology is efficiently utilized. Finally, in the last layer, a single cloud is configured and attached to the backbone router in the second layer through the core of the network.
Consequently, each device user can process its intensive computation application locally using its own resources, or offload it to be processed at one of the available GBSs or UAVs (in the case that the GBSs are overcrowded) or at the cloud server. Thereupon, S = {0, 1, . . ., M, M + 1, M + 2, . . ., J, J + 1} denotes the set of available servers that can process the intensive application of a mobile user, where 0 indicates the local resources of the mobile user, 1 to M indicates one of the available GBSs, M + 1 to J (i.e., J = M + N) indicates one of the available UAVs, and J + 1 indicates the cloud server.
Moreover, let y_i,j ∈ {0, 1} be the binary offloading decision for the intensive application of mobile user i that will be allocated to server j, where i ∈ K, j ∈ S. More specifically, (y_i,0 = 1) means that mobile user i decides to process its application locally, and (y_i,J+1 = 1) means that mobile user i decides to offload and process its application remotely at the cloud server; otherwise, mobile user i decides to offload and process its application remotely at a GBS or UAV. Overall, the application of each mobile user must be processed by exactly one server (including itself, i.e., server 0); therefore, ∑_{j=0}^{J+1} y_i,j = 1 for each i ∈ K. Based on previous studies in MEC [39,40], we adopt a quasi-static model in our simulations, in which the number of devices remains unchanged over an offloading period but may change across different periods. A further discussion of the load-balancing, communication, and computation models and their requirements for mobile-edge cloud computing is presented below, taking into account the key roles they play.
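The one-task-one-server rule above can be illustrated with a small Python sketch. The set sizes M and N below are illustrative placeholders, not the paper's simulation parameters.

```python
# One-hot encoding of the offloading decision y_i over servers 0..J+1,
# where 0 is local execution, 1..M are GBSs, M+1..J are UAVs, J+1 is the cloud.
M, N = 3, 2            # illustrative numbers of GBSs and UAVs
J = M + N
num_servers = J + 2    # server indices 0..J+1

def make_decision(server_index: int) -> list:
    """One-hot offloading decision vector y_i for a single mobile user."""
    y = [0] * num_servers
    y[server_index] = 1
    return y

def is_valid(y: list) -> bool:
    """Each task must be processed by exactly one server: sum_j y_{i,j} == 1."""
    return sum(y) == 1 and all(v in (0, 1) for v in y)

y_local = make_decision(0)        # process locally
y_cloud = make_decision(J + 1)    # offload to the cloud
assert is_valid(y_local) and is_valid(y_cloud)
```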

Load Balancing
In this subsection, we investigate the design of the load-balancing process among GBSs using UAVs. First, at time t, the distribution of mobile devices across the GBSs is unbalanced, with some GBSs being overloaded while others are underloaded, as depicted in Figure 1. Consequently, network congestion degrades the service quality and application latency for these devices. Therefore, the main contribution of the load-balancing scheme lies in how to balance the load among GBSs, which can be achieved in two main phases. In the initial phase, we redistribute the reallocatable IoT devices (i.e., devices that exist in the intersection areas of GBSs) to the best (i.e., least-loaded) GBSs by compelling them to hand over. Then, in the second phase, we locate the GBSs that are still overcrowded based on a predetermined threshold value θ and assign one or more UAVs to hover above them to alleviate their loads by providing computing and storage capabilities to their mobile users. The detailed steps for balancing the load among GBSs are as follows.
Initially, a summary of the information about the mobile devices associated with each GBS is sent to the central manager through the connected GBSs. This information includes the total number of mobile devices, the available data rate for each user, the CPU cycles and data size required for each task associated with a mobile device, and all the mobile devices that exist around the intersection areas of GBSs (i.e., the black mobile users in Figure 1) and may be reallocated to another nearby GBS. The central manager then iterates over all mobile devices in the intersection areas and, based on the collected information, forces them to hand over to the most appropriate available GBS. The user's computation capabilities and data rate are updated once the appropriate GBS is chosen for each mobile device, and these steps are repeated until all mobile devices have been assigned to the optimal GBS. Subsequently, upon receiving the updated information, the central manager determines the new number of mobile users at each GBS and finds the GBSs that are still overloaded (i.e., those with a number of users greater than a given threshold θ). It then dispatches one of the available UAVs to hover above each of these GBSs so that it can provide additional computation and storage capabilities. By doing so, the overhead consumption of mobile devices is minimized. Algorithm 1 outlines the comprehensive process for balancing the load among the GBSs.
A snapshot example of the IoT devices' distribution across GBSs is shown in Figure 2 to illustrate the algorithm's execution. We can observe from this figure that seventeen IoT devices are distributed across three GBSs, where twelve devices are connected to GBS 1, and two and three devices are connected to GBS 2 and GBS 3, respectively. Additionally, D10, D11, D12, D15, and D16 are all located across GBSs' boundaries, making it possible to reallocate these devices as needed. Furthermore, each GBS is capable of providing 20 GHz of computation capability and a bandwidth of 20 MHz to be shared among the connected devices. Finally, our goal is to redistribute the workloads among GBSs as well as provide UAV-enabled edge computing services in the still-overloaded areas so that quality is improved and energy is reduced. Redistributing the workloads among GBSs can be achieved through two main phases: handing over devices to the best-performing GBS and providing UAV-enabled edge computing services at the still-overloaded GBSs. Based on the given parameter values, these phases can be technically achieved as follows.
Algorithm 1 GBSs Balancing Load
1: Initialization: Each mobile device i is located and associated with a given GBS j.
2: /* Phase one - Redistribute mobile devices across GBSs */
3: for all GBSs j and at a given time slot t do
4:   ζ ← numbers of mobile devices
5:   ϕ ← mobile devices' requirements in terms of CPU cycles and task size
6:   η ← the computation capabilities and data rate assigned for each device at the GBSs
7:   ϑ ← the mobile devices that can be reallocated to another GBS
8:   Determine the optimum GBS for each mobile user and then force it to hand over with respect to the values of ζ, η, ϑ
9: end for
10: /* Phase two - Provide the overloaded GBSs with UAVs */
11: for all GBSs j and at a given time slot t do
12:   ζ ← the updated number of mobile devices associated with each GBS
13:   if ζ ≥ θ then
14:     Provide one UAV with computation and storage capabilities hovering above this GBS
15:   end if
16: end for
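Algorithm 1 can be sketched in Python as follows. The data layout (dicts of device sets) and the least-loaded handover rule are simplifying assumptions for illustration; the paper's actual rule also weighs task size, CPU cycles, and data rate.

```python
# Hedged sketch of the two-phase GBS load-balancing procedure (Algorithm 1).
def balance_load(gbs_devices, reallocatable, theta, available_uavs):
    """gbs_devices: dict gbs_id -> set of attached device ids.
    reallocatable: dict device_id -> list of candidate gbs_ids (intersection area).
    theta: overload threshold on the device count of a GBS.
    Returns the updated gbs_devices and a dict gbs_id -> assigned UAV."""
    # Phase one: hand over devices in intersection areas to the least-loaded candidate.
    for dev, candidates in reallocatable.items():
        current = next(g for g, devs in gbs_devices.items() if dev in devs)
        best = min(candidates, key=lambda g: len(gbs_devices[g]))
        if best != current:
            gbs_devices[current].discard(dev)
            gbs_devices[best].add(dev)
    # Phase two: dispatch a UAV to each GBS that is still overloaded.
    uav_assignments = {}
    uavs = list(available_uavs)
    for g, devs in gbs_devices.items():
        if len(devs) >= theta and uavs:
            uav_assignments[g] = uavs.pop(0)
    return gbs_devices, uav_assignments
```

For instance, with one device shared between a crowded and an idle GBS, the sketch hands it to the idle one, then assigns a UAV to any GBS whose count still reaches θ.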
As shown in Table 2, the central control manager first collects summary information about the current environment, such as the number of GBSs and their connected devices, the number of devices that can be reallocated, the available data rate, and the task requirements of each device, such as data size and CPU cycles. Then, it iterates over the reallocatable devices and determines the optimal GBS based on the estimated execution time. For instance, D10 will be handed over to GBS 3, where the estimated time (i.e., upload plus computation) is 23.2 s at GBS 1 and 6.8 s at GBS 3. In addition, D11 and D12 will be handed over to GBS 2, which reduces their time, whereas D15 and D16 remain connected to GBS 3. After the IoT devices are allocated to the best GBSs, some GBSs may still be overloaded (e.g., GBS 1 in our example). Therefore, we can proceed to the second phase, UAV deployment, which is performed as follows. Based on the updated number of connected devices at each GBS, and according to the given threshold θ (e.g., θ = 6), UAVs hover above the areas of GBSs that have a number of devices greater than or equal to θ (i.e., GBS 1) and provide computation capabilities to their IoT devices.
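The estimated-time rule used in phase one can be sketched as below. The numeric rates and CPU shares in the usage are placeholders, not the actual parameters of Table 2.

```python
# Sketch of choosing a target GBS by estimated time (upload plus computation).
def estimated_time(task_bits, cpu_cycles, rate_bps, cpu_hz):
    """Upload time plus computation time for a task at a candidate GBS."""
    return task_bits / rate_bps + cpu_cycles / cpu_hz

def best_gbs(task_bits, cpu_cycles, candidates):
    """candidates: dict gbs_id -> (rate_bps, cpu_hz); pick the minimum-time GBS."""
    return min(candidates,
               key=lambda g: estimated_time(task_bits, cpu_cycles, *candidates[g]))

# Illustrative usage: a 4 Mb task needing 2e9 cycles, two candidate GBSs.
candidates = {1: (1e6, 1e9), 3: (5e6, 5e9)}
target = best_gbs(4e6, 2e9, candidates)
```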

Communication Model
In this subsection, we investigate the transmission time and energy consumption associated with communications between the mobile devices and the servers. Moreover, each intensive application is denoted by a tuple (α_i, µ_i, β_i, Γ_i) for i ∈ K, where α_i and µ_i, respectively, indicate the input and output data size of the application, β_i indicates the number of CPU cycles needed to accomplish the application, and Γ_i indicates the deadline associated with the application.
Consequently, in this paper, guided by the work in [41], the total energy and time consumption for returning the output data is neglected because of its small size compared with the input data. Moreover, to mitigate the uplink interference of multi-user transmission in the same cell, an Orthogonal Frequency Division Multiple Access approach is adopted [42].
Following that, according to the Shannon law [43], if mobile user i decides to offload and then process its application remotely at a GBS or UAV j ∈ {1, . . ., J}, then the uplink data rate between the mobile device and the edge can be expressed as

R_i,j = B_J log2(1 + p_i g_0 / ω_0),

where B_J and p_i, respectively, indicate the bandwidth and the transmission power of mobile user i, and g_0 and ω_0, respectively, indicate the associated channel gain and noise power.
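The Shannon-law uplink rate described above can be computed with a one-line helper; the numeric values in the usage below are illustrative, not the paper's simulation settings.

```python
import math

# Sketch of the uplink rate R = B * log2(1 + p * g0 / w0) between a device
# and a GBS/UAV, assuming the standard Shannon-capacity form.
def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Achievable uplink data rate (bit/s)."""
    return bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_power_w)

# Illustrative usage: 20 MHz bandwidth, 0.1 W transmit power.
rate = uplink_rate(20e6, 0.1, 1e-3, 1e-7)
```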
If mobile user i decides to offload and then process its application remotely at the cloud server J + 1, then one of the available GBSs or UAVs will be chosen as a relay node to transfer the application's data and requirements. In this study, the GBS or UAV with the greatest uplink data rate is chosen as the relay node, i.e.,

R_i,J+1 = max_{j ∈ {1, . . ., J}} R_i,j.

Furthermore, if mobile user i decides to process its application locally, for completeness, we assume that the uplink data rate is R_i,0 = ∞.
Finally, the communication overhead is the upload time T^U_i = α_i / R_{i,j}, plus the propagation delay ζ between the edge server and the cloud whenever the task is relayed onward to the cloud.
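Under the symbols defined above, the communication model can be sketched as follows. The Shannon-rate form follows the text's description of [43]; the g_0 and ω_0 values below are illustrative, not taken from the paper.

```python
import math

def uplink_rate(bandwidth_hz, p_i, g_0, w_0):
    """Shannon law: R_{i,j} = B_J * log2(1 + p_i * g_0 / w_0)."""
    return bandwidth_hz * math.log2(1.0 + p_i * g_0 / w_0)

def upload_time(alpha_bits, rate_bps):
    """T^U_i = alpha_i / R_{i,j}: time to push the input data over the uplink."""
    return alpha_bits / rate_bps

def comm_overhead(alpha_bits, rate_bps, to_cloud=False, zeta=0.015):
    """Upload time, plus the edge-to-cloud propagation delay zeta when relayed."""
    return upload_time(alpha_bits, rate_bps) + (zeta if to_cloud else 0.0)

# 10 MHz bandwidth and 0.2 W transmit power match the experiment setup later;
# a task relayed to the cloud pays the 15 ms propagation delay on top.
r = uplink_rate(10e6, 0.2, 1e-5, 1e-9)
print(comm_overhead(8e6, r, to_cloud=True))
```

Note that the output data size µ_i does not appear here, consistent with the text's decision to neglect the download cost.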

Computation Model
In this subsection, we investigate the time and energy consumption associated with processing each task at the available servers. The computation capability, in CPU cycles per second, of each server j is denoted by f_j, where j ∈ {0, ..., J + 1}.
Note that the computation capabilities of the cloud exceed those of the GBSs and UAVs, which in turn exceed those of the mobile devices. In addition, we assume that the computation capability of a GBS or UAV is shared equally among all the mobile devices that transmit their tasks to that server, so the capability assigned to each such device at server j is f_{i,j} = f_j / n_j, where n_j is the number of devices served by j. Consequently, the time spent executing the task of mobile i is T^C_{i,j} = β_i / f_{i,j}, and the corresponding energy drawn at the device is E^C_{i,j} = ξ T^C_{i,j}, where ξ is a constant coefficient denoting the device's energy consumption while idle. Finally, the overhead of processing the task of each device user i, accounting for the load-balancing, communication, and computation models, can be expressed as ω_i = w^t_i T_i + w^e_i E_i, where w^t_i, w^e_i ∈ [0, 1] are scalar weights on the time and energy consumption, respectively, which depend on the nature of the application. For instance, if w^t_i = 0 and w^e_i = 1, the mobile user runs an energy-sensitive application, or perhaps the device's battery is low, whereas if w^t_i = 1 and w^e_i = 0, the application is time-sensitive. For other objectives, w^t_i and w^e_i can be set to different values, and these weights can be adjusted at any time through the application settings.
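A minimal sketch of the sharing rule and the weighted overhead follows from the description above; the equal split of f_j and the weighted sum are from the text, while the numeric values are illustrative.

```python
def assigned_capacity(f_j, n_devices):
    """A server's capacity f_j is shared equally among its connected devices."""
    return f_j / n_devices

def exec_time(beta_cycles, f_assigned):
    """Computation time: required CPU cycles over the assigned CPU frequency."""
    return beta_cycles / f_assigned

def overhead(t_total, e_total, w_t=0.5, w_e=0.5):
    """omega_i = w_t * time + w_e * energy, with weights set by the application."""
    return w_t * t_total + w_e * e_total

f_i = assigned_capacity(10e9, 5)  # edge server shared by 5 devices -> 2 GHz each
t = exec_time(2e9, f_i)           # 2e9 cycles at 2 GHz -> 1.0 s
print(overhead(t, 0.4))           # equal weights: 0.5 * 1.0 + 0.5 * 0.4
```

Setting w_t = 1, w_e = 0 (or vice versa) reproduces the purely time-sensitive (or energy-sensitive) cases discussed above.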

Problem Formulation
The problem for the multi-user, multi-UAV-aided MEC system is formulated in this subsection. Load balancing, task offloading, and resource allocation are all taken into account for each mobile user, and the energy and time overheads are jointly optimized in the objective. The problem is formulated as an integer program whose main objective is to minimize the total system cost. The first constraint handles the delay requirement of each user, constraint C2 ensures that each task is executed exactly once, and constraint C3 enforces the binary nature of the task-offloading variable. The problem's solution is derived by determining the best values of the offloading variables. Nevertheless, the feasible set is not convex since y is a binary variable, and the problem is NP-hard, since the objective is not convex [44]. Solving it in polynomial time is therefore difficult, particularly with a large number of device users, since the problem size grows exponentially with the number of devices. As a result, reinforcement learning can be used as an alternative to conventional optimization methods to obtain near-optimal solutions.
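To make the exponential growth concrete, a toy brute-force search over the offloading vectors can be sketched as follows. The additive per-device cost is a hypothetical stand-in for the objective; in the real problem, per-device costs are coupled through shared server capacity, which only makes enumeration harder.

```python
from itertools import product

def exhaustive_search(costs_per_choice, num_devices):
    """Enumerate every offloading vector y and return the cheapest one."""
    choices = range(len(costs_per_choice))
    return min(
        product(choices, repeat=num_devices),
        key=lambda y: sum(costs_per_choice[c] for c in y),
    )

# 3 choices per device (say local, edge, cloud) and 4 devices: 3**4 = 81
# candidate vectors, still enumerable.
print(exhaustive_search([3.0, 1.5, 2.0], 4))  # -> (1, 1, 1, 1)
# At K = 100 devices there are 3**100 (~5e47) candidates: enumeration is
# hopeless, which is why a learning-based solver is attractive.
print(3 ** 100 > 5e47)  # -> True
```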

Deep Reinforcement Learning-Based Approach for Solving the Problem
Throughout this section, we demonstrate how deep reinforcement learning can be used to solve our optimization problem effectively, reducing the time and effort involved in solving it. First, we introduce reinforcement learning and highlight its key elements. Then, a distributed deep learning scheme is introduced to achieve a near-optimal solution.

Reinforcement Learning Introduction
It is important to note that reinforcement learning (RL), one of the most active fields of machine learning, can cope with unpredictable and dynamic environments by taking a variety of actions to maximize the accumulated reward. More specifically, RL comprises five main elements: environment, agent, action, reward, and state space. For a given environment at time t, the agent observes the state s_t and, based on the policy π = P(a_t | s_t), selects an action that moves it from state s_t to the next state s_{t+1}. The agent then applies the reward function R(s, a) to earn a reward r. Finally, the agent repeats this procedure until it reaches the final state, maximizing the total return R_t = ∑_{k=0}^∞ γ^k r_{t+k}, where γ ∈ [0, 1] is the discount factor.
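The discounted return R_t = ∑_{k=0}^∞ γ^k r_{t+k} above can be computed over a finite reward trace in a few lines (reward values are illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * r_{t+k} over a finite trace of rewards."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# With gamma = 0.5, the same unit reward counts less the later it arrives:
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```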

Reinforcement Learning Key Elements
For the system model to be converted into its reinforcement-learning equivalents, the state, action, and reward function, the key elements of RL, must be defined. In our environment, where multiple users run intensive applications on an MEC system, they are specified as follows:
• State: the computational requirements of the intensive applications define the state space S, i.e., s_t = {(α_i, β_i, Γ_i)_t | i ∈ K}.
• Action: the offloading decision specifies the action space A, in which an action a_t = {(y_i)_t | i ∈ K} is selected based on s_t following the policy π(a_t | s_t).
• Reward: the reward value is given by the objective function in Equation (11) of our problem formulation. The objective value at time t is computed under policy π(a_t | s_t), given the state s_t and the selected action a_t. The same procedure is repeated as the time index increases, t = 1, 2, ..., T. Accordingly, the goal is to minimize the long-run average reward lim_{T→∞} (1/T) ∑_{t=0}^T r_t under policy π, where r_t denotes ω_i in Equation (11).

Distributed Deep Reinforcement Learning-Based Algorithm
It is vital to highlight that distributed deep Q-learning is an extended version of the deep Q-learning algorithm that employs a set of deep neural networks (DNNs) capable of parallel processing, thereby deriving the most appropriate solution efficiently [45]. This section presents a distributed deep reinforcement learning-based strategy to approximately minimize the total reward in Equation (11).
The architecture of our proposed distributed deep reinforcement learning-based algorithm is shown in Figure 3, in which B DNNs are used with a shared, fixed-size replay memory M. The application task's requirements are given as input (i.e., the system state), and the most appropriate offloading decision is obtained as output. More specifically, the system is provided with the state s_t. From all the generated actions, the action with the least reward value is selected as output, y*_t = arg min_{b ∈ B} Q(s_t, y^b_t), and then stored as a transition experience (s_t, y*_t) in the replay memory M. Subsequently, the DNNs are trained and updated by sampling a random batch of stored transitions from memory M.
In Algorithm 2, the procedure for deriving the near-optimal task-offloading solution is outlined, as follows. First, B DNNs are initialized with different random weight values w^b_t, and the replay memory M is allocated with a finite size, initially empty. Then, at time t, the application task's requirements (i.e., α, β, and Γ) are given as the input state s_t, and each DNN generates an offloading action y^b_t according to the mapping f_{w^b_t} : s_t → y^b_t, where b ∈ B is the index of the DNN and f_{w^b_t} represents the b-th DNN with weights w^b_t. The core steps of each iteration are:

4: Each DNN uses the same input state s_t.
5: Generate a set of actions from the DNNs: {a^b_t} = f_{w^b_t}(s_t).
6: Select the action with the least value: a*_t = arg min_{b ∈ B} Q(s_t, a^b_t).
7: Store the transition (s_t, a*_t) in the memory M.
8: Select a random sample of transitions from the replay memory.
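One iteration of the procedure above can be sketched as follows; the B candidate "DNNs" and the Q estimate here are lightweight stand-ins, not trained networks.

```python
import random
from collections import deque

def step(state, policies, q_value, memory, batch_size=4):
    """One iteration: propose, pick the min-Q action, store, sample a batch."""
    actions = [policy(state) for policy in policies]       # each DNN acts on s_t
    best = min(actions, key=lambda a: q_value(state, a))   # a*_t = argmin Q(s_t, a)
    memory.append((state, best))                           # store transition in M
    batch = random.sample(list(memory), min(batch_size, len(memory)))
    return best, batch                                     # batch would train the DNNs

memory = deque(maxlen=1024)                                # finite-size replay memory M
policies = [lambda s, b=b: (s + b) % 3 for b in range(3)]  # B = 3 stand-in DNNs
best, batch = step(7, policies, lambda s, a: abs(a - 1), memory)
print(best)  # -> 1, the proposal with the smallest stand-in Q value
```

In the actual algorithm, the sampled batch is used to update each DNN's weights, so the B networks gradually diversify and improve their proposals.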

Experiment Setup
We conducted our simulations on a desktop computer with 16 GB of RAM and an Intel® Core(TM) i7-4770 processor clocked at 3.4 GHz. Regarding the environment, 100 mobile devices are distributed across five GBSs and three UAVs, and each mobile device needs to process one intensive application. The computation capacities are 0.6 × 10^9 cycles per second for each mobile device, 10 × 10^9 cycles per second for the edge servers, and 1 × 10^12 cycles per second for the cloud server. The transmission power of each mobile device is set to 0.2 W, and 10 MHz of bandwidth is available between the mobile devices and the edge servers. Each mobile device is assigned an input data size drawn uniformly at random from (10, 30) MB, with a computation requirement of 1900 cycles per byte. The per-bit time and energy costs for each mobile device are set to 4.75 × 10^-7 seconds per bit and 3.25 × 10^-7 Joules per bit, respectively [46]. The propagation delay between the edge servers and the cloud is estimated at 15 ms. We set the weights of execution time and energy consumption to w_i^t = 0.5 and w_i^e = 0.5, meaning that each device considers both metrics equally. Furthermore, regarding the deep-learning algorithm's parameters, each DNN has four layers, two of which are hidden layers with 120 and 80 neurons, and the episode count, mini-batch size, learning rate, and memory size are set to 20,000, 32, 0.01, and 1024, respectively. Following these specifications, we ran 50 simulation rounds and report average values.
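For reference, the stated parameters can be collected into one configuration sketch; the field names are our own, while the values are those listed above.

```python
CONFIG = {
    "num_devices": 100, "num_gbs": 5, "num_uavs": 3,
    "f_local_hz": 0.6e9,            # mobile-device CPU, cycles/s
    "f_edge_hz": 10e9,              # GBS / UAV CPU, cycles/s
    "f_cloud_hz": 1e12,             # cloud CPU, cycles/s
    "tx_power_w": 0.2, "bandwidth_hz": 10e6,
    "input_size_mb": (10, 30),      # uniform random per device
    "cycles_per_byte": 1900,
    "propagation_delay_s": 0.015,
    "w_t": 0.5, "w_e": 0.5,
    "episodes": 20000, "batch_size": 32,
    "learning_rate": 0.01, "memory_size": 1024,
    "dnn_hidden": (120, 80),
}

# Sanity check the settings imply: the largest task run purely locally needs
# 30 MB * 1900 cycles/byte / 0.6 GHz = 95 s, which is why offloading matters.
worst_local_s = (CONFIG["input_size_mb"][1] * 1e6
                 * CONFIG["cycles_per_byte"] / CONFIG["f_local_hz"])
print(worst_local_s)  # -> 95.0
```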

Convergence Performance of System
This subsection illustrates the convergence performance of our approach; for each parameter, several values were tested, and the best-performing value was selected for use in the remaining simulations.
First, the learning rate is tuned in Figure 4, where different values are compared with respect to the reward ratio. Convergence becomes faster as the learning rate increases and is fastest at 0.01. However, for the largest value (i.e., 0.1), the convergence speed drops and the training settles in a local optimum. We therefore select 0.01 as the most suitable learning rate. Secondly, the number of DNNs is tuned in Figure 5. Our model converges more quickly as the number of DNNs increases; with three DNNs, the reward ratio reaches 0.96 after 2000 learning steps, whereas with only a single DNN the model does not converge well. Lastly, the mini-batch size, i.e., the number of samples trained on at each interval, is tuned in Figure 6. Convergence with a batch size of 32 is faster than with the other values, because smaller mini-batches yield noisier gradient-descent directions that update the neural-network weights more rapidly. Consequently, the batch size is set to 32, which appears to be optimal.

System Performance
As a means of demonstrating and validating the model, simulations were conducted under four different scenarios, which are as follows:

• Local Policy (LP): no offloading; the application's tasks are executed locally on the device's own resources.
• Full Offloading Policy (FOP): all of the application's tasks are offloaded to the GBSs for remote processing.
• Proposed Model Policy: tasks are processed according to the offloading decision produced by our proposed model, which minimizes the total system overhead.
• Task Offloading Policy (TOP [47]): tasks are handled according to the model proposed in [47], whereby each mobile user sends its tasks to its currently connected GBS; GBS selection is not taken into consideration.
First, Figure 7 compares the total cost for various numbers of mobile users under the four policies. The proposed model has the lowest system cost of all the policies. With a small number of users, the total costs of the TOP and FOP policies approach that of the proposed model, but the gap widens as the number of users increases. Furthermore, when the number of users exceeds 60, the cost of the FOP policy exceeds that of the LP policy. The likely reason is that the GBSs lack the computation capability to handle many simultaneously connected users over shared channels, which is precisely why redistributing users among GBSs and using UAVs as assisting nodes significantly improves system performance. Second, Figure 8 compares the total cost for different numbers of GBSs under the four policies. LP is unaffected by the number of GBSs, whereas the costs of the other policies steadily decrease: the LP policy does not utilize GBS resources, while under the other policies mobile users are allocated more resources as GBSs are added, shortening processing time and thereby lowering system cost. Selecting the right GBSs and UAVs for transmission and processing also has a significant impact on system performance. Furthermore, Figure 9 compares the successful task-processing ratio, i.e., the proportion of successfully completed tasks relative to the total number of tasks, for different numbers of users. The ratio is 100% for a small number of users (i.e., fewer than 20) under the three policies. However, it steadily degrades as the number of users increases, reaching 85% and 71% for the TOP and FOP policies, respectively, whereas it decreases only slightly for our proposed model, reaching 98% at 100 users. This difference arises because, under the TOP and FOP policies, the available GBS resources become contested as the number of users grows, whereas our model balances the load among servers and uses UAVs to efficiently exploit the edge servers' computation resources. Finally, Figure 10 shows the total cost of the four strategies for five different types of applications (listed in Table 3). The total cost of applications in categories A, B, and C is highest under the FOP policy, whereas for the other applications (D and E) the LP policy is the most expensive. The reason is that the communication requirements of applications A, B, and C exceed their computation requirements; for these communication-intensive applications, the LP policy is the better choice. D and E, on the other hand, are computation-intensive, so offloading policies are the best option. Moreover, balancing the load among GBSs and employing UAVs as assistant nodes improves the proposed model's performance compared with the TOP policy.

Conclusions
For multi-user, multi-tier UAV-aided MEC systems, an integrated model of load balancing, resource allocation, and task offloading is proposed. In this model, an effective load-balancing scheme optimizes the load among ground MEC servers by handing off users in the intersection areas between GBSs to the most suitable server. In addition, UAVs are utilized as additional MEC servers, providing communication and computation resources by hovering over crowded areas where a ground-based MEC server remains overloaded. Task offloading, load balancing, and resource allocation are jointly optimized by formulating an integer problem whose primary objective is minimizing system cost. This formulation is NP-hard and therefore challenging to solve in polynomial time. For this problem, a novel deep reinforcement learning formulation is presented in which the application tasks' requirements represent the system state and the offloading decision defines the action, and the solution is derived using an efficient distributed deep reinforcement learning-based algorithm. In conclusion, experimental results demonstrate that our model converges quickly and significantly reduces system cost (by about 41.9%, 44.2%, and 11%) compared with local execution, the full-offloading policy, and the task-offloading work in [47], respectively.

Figure 3. Architecture of the proposed distributed deep reinforcement learning-based algorithm.

Figure 4. Convergence of performance over different learning rate values.

Figure 5. Convergence of performance over different numbers of DNNs.

Figure 6. Convergence of performance over different batch sizes.

Figure 7. A comparison of total cost for different numbers of mobile users (M = 5, N = 3).

Figure 9. A comparison of successful task processing ratio under different numbers of users (K = 5, N = 3).

Table 1. A related work comparison.