1. Introduction
The rapid development of the Internet of Things (IoT) and new communication technologies has driven the explosive growth of mobile devices, accompanied by a wide variety of computationally intensive and latency-sensitive applications [1,2]. In [3], Zhou et al. emphasize the significance of the convergence of communication, computing, and caching (3C) for future mobile networks, especially in the context of 6G. As an extension of cloud computing, Mobile Edge Computing (MEC) [4] offloads compute-intensive tasks to servers deployed at the edge of the wireless access network, meeting the demand for low-latency communication and computing in an energy-saving and efficient manner [5,6,7]. In [8], Cai et al. analyze the challenges of collaboration among different edge computing paradigms and propose solutions such as blockchain-based smart contracts and software-defined networking/network function virtualization (SDN/NFV) to tackle resource isolation. In the early stages, fixed base stations (BSs) were mainly used as edge servers to provide computational services for mobile users [9]. For example, in [10], Zhang et al. equipped a Macro Base Station (MBS) with an edge server with better processing capability, and optimized the offloading of computing tasks in fifth-generation (5G) heterogeneous networks while satisfying latency constraints. However, in practical situations such as complex environments, remote areas or disaster relief sites, terrestrial MEC systems with fixed locations usually suffer from severe performance degradation or even inoperability due to communication obstruction [11]. Fortunately, this dilemma is resolved by UAV-enabled MEC: with their high mobility, light weight, and freedom from geographical constraints, unmanned aerial vehicles (UAVs) can serve as edge nodes to fulfill computational requirements across various scenarios [12]. Wang et al. [13] presented an iterative robust enhancement algorithm for a UAV-assisted ground–air cooperative MEC system, which minimized the system energy consumption by jointly optimizing the task offloading and flight trajectories of the UAVs in a non-convex setting. To meet the low-latency demand of computational tasks in wireless sensor networks (WSNs), Yang et al. [14] constructed a framework in which edge UAV nodes carry simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) for computation offloading and content caching. By cooperatively optimizing UAV position, offloading strategy and passive beamforming, they reduced the energy consumption of the system while achieving good convergence.
Nevertheless, due to the shortcomings of limited transmission power, low load capacity and few computing resources, a traditional single UAV or a few independent UAVs cannot efficiently complete numerous multifarious tasks [15]. In this case, UAV swarms are considered a viable solution: by managing and assigning tasks cooperatively under the leadership of a control unit, the swarm can maximize energy utilization and minimize delay. With this advantage, UAV swarm-assisted MEC has become a popular trend in current research [16,17]. In [18], an MEC system composed solely of UAVs is presented, in which member UAVs generate tasks, helper UAVs offload computation, and a header UAV aggregates the processed results. To mitigate the total energy expenditure of the UAV swarm, the authors optimized the communication channels, offloading targets and offloading rates through a two-stage resource allocation algorithm. Seid et al. [19] considered a UAV cluster under an aerial-to-ground (A2G) network in response to emergencies, where member UAVs, under the coordination of the UAV cluster head (UCH), offload and compute independent tasks generated by Edge Internet of Things (EIoT) devices. Furthermore, a scheme based on deep reinforcement learning (DRL) for coordinated computation offloading and resource allocation was proposed to control the total computation cost, in terms of energy consumption and latency, of the EIoT devices, BSs, and UCH. In the research of Li et al. [20], UAVs are allowed to dynamically gather into multiple swarms to assist in completing MEC. In order to maximize the long-term energy efficiency of the system, a comprehensive optimization problem addressing dynamic clustering and scheduling was formulated, and a UAV swarm dynamic coordination method based on reinforcement learning was developed to attain equilibrium. To match mobile devices and plan UAV trajectories, Miao et al. [21] put forward a global and local path planning algorithm based on ground station and onboard computer control in the framework of UAV swarm-assisted MEC task offloading, which minimizes the energy loss of global path planning for the UAV cluster while maximizing access and minimizing task completion delay. The issue of computational offloading and routing selection of UAV swarms under the UAV-Edge-Cloud computing architecture was highlighted in [22], in which the authors developed a polynomial-time near-optimal approximation algorithm using Markov approximation techniques [23], driven by the goal of maximizing throughput while minimizing routing and computational costs. To summarize, research on UAV- and UAV swarm-assisted MEC for task offloading and resource allocation is well under way, but progress is mostly reflected in new scenarios and updated methods. In fact, computational performance is related to task size. Due to the temporal or spatial correlation of tasks, the raw data generated by users usually contain duplicates [24], which wastes computational time and resources. Thus, it is necessary to perform preprocessing, such as data compression, before task offloading to reduce the burden on the MEC system and speed up task completion [25,26]. In [26], Cheng et al. investigated a scenario where a single UAV serves multiple wireless devices, adopting the block coordinate descent (BCD) method to decompose the energy minimization problem into several subproblems.
As an important field of modern information technology, data compression encodes data while retaining useful information, with the aim of reducing the size of data during storage and transmission [27,28,29]. Typically, data compression is categorized into lossless and lossy compression. Lossless compression maintains the integrity of the data, i.e., the decompressed data match the original data exactly, while lossy compression sacrifices tiny details that are difficult for human perception in order to achieve a higher compression ratio [30,31]. In practical MEC applications, research on data compression technology has already made some progress. Han et al. [32] introduced data compression to reduce offloading delay when using MEC to augment the computational capability of blockchain mining nodes, and then proposed a block coordinate descent (BCD) iterative algorithm to minimize the total energy consumption of the system. In order to more accurately optimize the latency of an MEC system with multiple users and servers, Liang et al. [33] performed lossless compression before task offloading and associated the reliability of edge servers with delay and energy consumption [34], and then designed a distributed computational offloading strategy to make up for the shortcomings of centralized algorithms. Regarding the enhancement of data transmission efficiency in non-orthogonal multiple access (NOMA) MEC systems, Tu et al. [35] proposed a partial compression mechanism, which allows users to send tasks to the BS in two ways: lossless compression followed by offloading, or direct offloading without preprocessing. Ultimately, this approach achieved good results. Ding et al. [36] designed a multi-BS intelligent MEC system for the power grid, which utilizes two-level compression to save energy. Specifically, the small BSs receive information collected and losslessly compressed by MDs, and then transmit it to the micro-BS for processing after applying the same compression method. It can thus be seen that MEC services benefit greatly from data compression strategies. Since the transmission of tasks typically consumes more energy than their computation, data compression techniques are especially advantageous in situations where the onboard energy budget is limited, as in UAV swarm-supported MEC systems.
In this manuscript, we study the task offloading strategy of an MEC system that incorporates UAV swarms and data compression. The considered scenario focuses on situations where BSs are unavailable, and mainly includes one head UAV (USH), several member UAVs (USMs), and some randomly scattered user equipments (UEs). Since both the energy reserves and the processing capability of its onboard edge server are higher than those of the USMs, the USH can fly close to the UEs, take over tasks exceeding their capacity, and then redistribute a small portion to the USMs. In the process of data transmission, orthogonal frequency division multiple access (OFDMA) technology is employed to improve communication efficiency. To save communication bandwidth while preserving the value of the data, the UEs first perform lossless compression on the data to be computed, and the USH performs the corresponding decompression. Next, we formulate the task offloading strategy as a problem of minimizing the total energy and time expenditure, and adopt a Markov decision process (MDP) model, a prioritized experience replay (PER) mechanism and DRL to optimize the resource allocation, offloading rate and data compression ratio. Finally, simulation experiments demonstrate the advantages of the proposed deep deterministic policy gradient offloading scheme with prioritized experience replay (PER-DDPG) in saving system energy consumption, reducing latency and improving resource utilization efficiency. The main contributions are summarized as follows.
An MEC system with multi-user and multi-auxiliary co-existence (i.e., a UAV swarm, task offloading, and lossless data compression) is proposed, in which the UAVs are classified into a USH and USMs according to their functional differences. Through a detailed analysis of task transmission, USH movement, computation execution, onboard energy and other aspects, an optimization problem of collaborative task offloading and data compression is formulated.
The optimization issue is modeled as an MDP, and the action space, state space, and reward function are tailored to the specific requirements of the UAV swarm-enabled MEC system. The deep deterministic policy gradient (DDPG) algorithm in DRL is then adopted to solve this issue, reducing the delay and energy consumption of offloading. Furthermore, prioritized experience replay is introduced into the DDPG algorithm to improve the efficiency of experience replay and speed up training, which increases the stability of the training process and reduces sensitivity to changes in some hyperparameters.
Extensive numerical simulations are carried out to verify the convergence, stability and effectiveness of the proposed algorithm. As the total number of tasks increases, the proposed algorithm attains a lower total system cost than the baseline comparison algorithms. In addition, compared with the non-compressed scheme, the compressed scheme better reduces the system cost, delay and energy consumption when there are more user tasks.
The remainder of this manuscript is structured as follows. Section 2 outlines the system model and formulates an optimization problem to minimize system energy usage and delay. Section 3 builds the MDP model and proposes the PER-DDPG offloading algorithm to solve the above problem. In Section 4, the performance of our proposed scheme is shown and compared with other schemes. Finally, we give the conclusion and future work in Section 5.
2. System Model
We consider an MEC network architecture enabled by a cluster of UAVs, as illustrated in Figure 1, which mainly comprises K UEs and N UAVs. The UEs are arbitrarily scattered throughout a rectangular area on the ground, and continually generate computational tasks that are too large for them to handle. In this case, an aerial UAV swarm containing a USH and several USMs can provide support. The USMs are randomly distributed within a 300 m radius around the USH while maintaining a minimum separation distance to ensure collision avoidance, and the USH selects the nearest USM for task offloading based on the shortest-path principle. Compared to the USMs, the USH has more processing power, so it is responsible for establishing a communication link with the UE and taking over most of the computational task, then assigning a limited number of tasks to the USMs and keeping the rest for itself. Moreover, to optimize network transmission, reduce both time delay and energy expenditure, and ensure the accuracy and completeness of the computation results, a lossless compression technique is applied at the UE and the data are decompressed accordingly at the USH. Let and denote the sets of UEs and USMs, respectively. To facilitate the analysis of the dynamic changes of the system, we adopt a quasistatic network scenario and discretize an observation period D into T slots of equal length , indexed by .
2.1. Data Compression
Assume that the computational task generated by UE k in time slot t is , where and represent the total amount of data to be processed and the CPU cycles consumed by , respectively, and let denote the proportion of locally computed data in the total task; the amount of computation data that the USH and USMs must complete together is then expressed as .
In order to prevent the loss of information contained in the computing tasks, and also to reduce the resources consumed by data transmission, it is necessary to apply lossless compression on the UE side. Correspondingly, when the USH completes the offloading, it first reconstructs the task and then carries out the computation. We use a continuous variable to represent the compression ratio of the lossless compression algorithm adopted by the UE, where is the maximum compression ratio considering the network requirements; the amount of data sent by UE k to the USH is thus reduced to . As given in [26,37], the CPU cycles required to compress 1 bit of the original task can be approximated as
where is a positive constant that depends on the lossless compression technique. Let stand for the computing capability of UE k in time slot t; the delay and energy expenditure incurred by compression are then derived by
where is the computational efficiency of , and we assume that each UE belonging to has the same . In addition, data decompression is much easier than compression and its latency is much smaller, so we can reasonably ignore it [26,38].
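The compression step above can be sketched in a few lines. This is a minimal illustration, not the paper's exact expressions: the exponential cycles-per-bit form, the constants `kappa` and `zeta`, and the dynamic CMOS energy model are all labeled assumptions.

```python
import math

def compression_cost(data_bits, rho, f_ue, kappa=2.0, zeta=1e-27):
    """Delay and energy of lossless compression on a UE.

    Illustrative assumptions (not the paper's exact constants):
    - cycles per input bit grow as exp(kappa*rho) - exp(kappa), a
      common model for lossless compressors, so rho = 1 (no
      compression) costs zero cycles;
    - zeta is the effective-capacitance coefficient of the UE chip
      (the "computational efficiency" in the text);
    - f_ue is the UE CPU frequency in cycles/s.
    """
    cycles_per_bit = math.exp(kappa * rho) - math.exp(kappa)
    total_cycles = cycles_per_bit * data_bits
    delay = total_cycles / f_ue                 # seconds
    energy = zeta * (f_ue ** 2) * total_cycles  # dynamic CMOS energy model
    return delay, energy

# A 1 Mb task compressed at ratio rho = 2 on a 1 GHz UE:
d, e = compression_cost(1e6, rho=2.0, f_ue=1e9)
```

Note how the cost is convex and increasing in the compression ratio: pushing the ratio higher saves transmission resources but quickly inflates the UE-side delay and energy, which is exactly the trade-off the later optimization balances.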
2.2. Task Transmission
Defining a 3D coordinate system with the ground as the reference plane, the location of UE k in time slot t is indicated by . At this point, the coordinates of USH and USM n are and , respectively, where and H is the current height of USH and USM relative to the ground.
2.2.1. Communication Between UEs and USH
Considering the impact of obstacles such as buildings, cars and terrain on radio propagation, we model the communication channel between the UEs and the USH as a superposition of the line-of-sight (LoS) and non-line-of-sight (NLoS) path losses with different probabilities of occurrence. Let f and c represent the carrier frequency and the speed of light, and let and denote the average additional attenuation; the path losses and for LoS and NLoS at time slot t are then expressed as
where is the distance from UE k to the USH, which can be acquired by
Next, the probabilities of LoS and NLoS connections between UE k and the USH can be calculated by
where a and b are constants that depend on the operating environment. The average path loss between the UE and USH is thus formulated as
In order to avoid interference between channels, an orthogonal access scheme is used for both uplink and downlink communications. Assuming that the bandwidth allocated by the USH to each UE is B, and the noise powers of the communication links are and , respectively, the wireless transmission rate between the USH and UE k at time slot t is denoted as
Notably, our framework employs a large-scale fading model for analytical tractability. For higher modeling precision, small-scale fading components, such as Rayleigh and Rician fading, as well as more refined path loss models [39], could also be integrated.
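The probabilistic LoS/NLoS channel and the resulting rate can be sketched as follows. The sigmoid elevation-angle form of the LoS probability is the widely used air-to-ground model; all numeric constants (`a`, `b`, the extra attenuations, the noise power) are illustrative urban-environment placeholders, not necessarily the paper's values.

```python
import math

def a2g_rate(d, h, p_tx, B, f=2e9, a=9.61, b=0.16,
             eta_los=1.0, eta_nlos=20.0, noise=1e-13):
    """Average uplink rate of a probabilistic LoS/NLoS A2G channel.

    - d: UE-USH 3D distance (m); h: UAV altitude (m), h <= d;
    - path loss = free-space loss + extra LoS/NLoS attenuation (dB);
    - LoS probability is a sigmoid of the elevation angle (degrees).
    All constants are illustrative, not the paper's values.
    """
    c = 3e8
    theta = math.degrees(math.asin(h / d))          # elevation angle
    p_los = 1.0 / (1.0 + a * math.exp(-b * (theta - a)))
    fspl = 20 * math.log10(4 * math.pi * f * d / c)
    # Average path loss weighted by LoS/NLoS occurrence probabilities
    pl = p_los * (fspl + eta_los) + (1 - p_los) * (fspl + eta_nlos)
    snr = p_tx * 10 ** (-pl / 10) / noise
    return B * math.log2(1 + snr)                   # bits/s

# Example: UE 200 m from a USH hovering at 100 m, 0.1 W, 1 MHz band
r = a2g_rate(d=200.0, h=100.0, p_tx=0.1, B=1e6)
```

At a fixed distance, a higher altitude raises the elevation angle and hence the LoS probability, lowering the average path loss; this is the coupling that makes the USH trajectory part of the optimization.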
2.2.2. Communication Between USH and USMs
As part of the swarm, both the USH and the USMs communicate and compute in the air at height H, so their channels can be approximated as containing only LoS components. If the communication bandwidth provided by the USH for each USM is B, the data transfer rate between USM n and the USH can be denoted as
as long as it satisfies , where and represent the data transmission power of the USH and the average channel power gain at a reference distance of 1 m, respectively.
2.3. USH Movement
The location of the USH at the current time slot t is known to be , ; after the USH has flown for a time slot of length at uniform speed , its location changes to , where is the angle formed by the projection of in the horizontal plane with the x-axis and satisfies . If we use to denote the maximum flight speed that the USH can achieve, the following constraint holds:
It is important to note that the USH must avoid collisions with any USM during flight. To ensure safe operation within the same horizontal plane at a fixed altitude, we define a minimum safety distance between the USH and the USMs. External obstacles (e.g., buildings) are not considered in this study, as the UAVs are assumed to operate at a sufficient height to avoid interference from ground-based objects.
To ensure that the USH can smoothly take over the computational tasks generated by the UEs, the UEs must be within the coverage area of the USH. Assuming that the coverage area of the USH is a circle with radius , and using to denote the maximum coverage angle of the USH at time slot t, we have and . Meanwhile, the location of UE k must satisfy the constraints
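The kinematics and coverage constraints above can be checked with a minimal sketch. The function names, the footprint radius H·tan(theta_max), and the example numbers are illustrative, not the paper's notation.

```python
import math

def ush_step(x, y, v, phi, delta, v_max):
    """Advance the USH one slot at uniform speed v along heading phi.

    Enforces the maximum-speed constraint v <= v_max; symbols match
    the generic quantities in the text (position, speed, heading,
    slot length), not the paper's exact notation.
    """
    assert 0.0 <= v <= v_max, "flight speed exceeds the maximum"
    return x + v * delta * math.cos(phi), y + v * delta * math.sin(phi)

def ue_covered(ue_xy, ush_xy, H, theta_max):
    """A UE is served only if it lies inside the circular footprint
    of radius H * tan(theta_max) directly below the USH."""
    dx = ue_xy[0] - ush_xy[0]
    dy = ue_xy[1] - ush_xy[1]
    return math.hypot(dx, dy) <= H * math.tan(theta_max)

# Fly east for one 1 s slot at 20 m/s, then test coverage at 100 m altitude
pos = ush_step(0.0, 0.0, v=20.0, phi=0.0, delta=1.0, v_max=30.0)
inside = ue_covered((50.0, 0.0), pos, H=100.0, theta_max=math.pi / 4)
```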
2.4. Task Execution
In the MEC system proposed in this manuscript, the UEs, USH and USMs jointly complete the computing task. For each , the USH is preset to establish communication with only one UE or USM. Let indicate the connection status between UE k and the USH, and let indicate that between the USH and USM n, specifically as follows:
with the constraints:
2.4.1. UE Implementation
As previously defined, represents the number of CPU cycles needed for UE k to perform the entire task independently, is the proportion of the computing task assigned to UE k, is the computing capability of UE k, and is the computational efficiency of the UE; we can then obtain the time delay and energy consumption of local computing as follows:
In addition, the offloading of compressed data from the UE to the USH also incurs time and energy costs, which we represent as:
where is the transmit power of UE k.
2.4.2. USH Implementation
After decompressing the received task, the USH divides it into two parts; one is offloaded to the USMs, and the other is left for itself to compute. The corresponding delays of the former and the latter are expressed as
where is the proportion of the task assigned to USM n by the USH in time slot t and satisfies , while represents the computing capability of the USH. Let be the transmit power of the USH in time slot t; the energy consumed by the USH for offloading and computation can be expressed as
The USH performs computation and communication while flying, so the flight time has no effect on the system delay, but the flight energy consumption is a factor worth including. Using to denote the mass of the USH, the flight energy consumption of the USH in time slot t is
2.4.3. USM Implementation
Compared with the UEs and USH, which both transmit and compute, the USM has a relatively simple role and only needs to execute the computing tasks offloaded from the USH. Let represent the computation capacity of the USM; we can obtain the delay and energy usage of USM n in time slot t as
2.5. Problem Formulation
After the respective tasks are determined, the modules used for computation and transmission are separate on both the UEs and the USH, which means that computation and transmission can proceed simultaneously without interfering with each other; the overall system delay is thus derived by
The overall energy usage of the system comprises the components consumed by the UEs, USH and USMs for compression, communication and computation, expressed as
Our objective is to reduce the delay and energy usage of the entire system, which can be achieved by jointly optimizing parameters such as the compression ratio , the association status during task offloading and , the offloading rates and , and the USH flight trajectory M. Using the weights and to indicate the degree of emphasis on delay and energy consumption, while , and represent the energy available to the UE, USH and USM over the system cycle, the optimization problem of the UAV swarm-enabled MEC system is formulated as
where constraints (32b) and (32c) restrict the flight speed, safe distance, coverage area and communication status of the USH, while (32d), (32e) and (32f) limit the compression ratio and task offloading proportions. Moreover, the energy consumption requirements of the UE, USH and USM are specified by (32g), (32h) and (32i).
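The weighted objective and the parallel computation/transmission structure described above can be summarized in a small sketch. The chaining of the delay terms follows the textual description (local branch in parallel with the compress-offload-compute chain) and is an assumption for illustration, not a verbatim equation from the paper.

```python
def system_cost(delay, energy, w_t, w_e):
    """Weighted objective minimized by the formulation: a convex
    combination of total delay and total energy. The weights are
    assumed to sum to 1; both terms should be normalized to
    comparable scales before weighting."""
    assert abs(w_t + w_e - 1.0) < 1e-9, "weights are assumed to sum to 1"
    return w_t * delay + w_e * energy

def slot_delay(t_local, t_compress, t_tx_ue, t_ush, t_tx_usm, t_usm):
    """Illustrative per-slot delay: because computing and
    transmitting use separate modules, the local branch and the
    offloading chain run in parallel and the slower one dominates.
    Term names are placeholders for the quantities in the text."""
    chain = t_compress + t_tx_ue + max(t_ush, t_tx_usm + t_usm)
    return max(t_local, chain)

# Local branch (2.0 s) dominates this illustrative slot
cost = system_cost(delay=slot_delay(2.0, 0.2, 0.5, 1.0, 0.3, 0.4),
                   energy=1.5, w_t=0.3, w_e=0.7)
```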
4. Numerical Simulation
In this section, the offloading efficiency of PER-DDPG in the UAV swarm-assisted MEC system is verified by numerical simulation, and its performance advantage in the data compression scenario is demonstrated. The experiments use Python 3.7 and the TensorFlow 1 framework to simulate the system environment on the PyCharm platform.
4.1. Parameter Settings
This study assumes a collaborative computing network consisting of 50 UEs, 1 USH, and 4 USMs within a rectangular area of 400 m × 400 m. The USH dynamically adjusts its flight path based on the line-of-sight (LoS) and non-line-of-sight (NLoS) communication conditions with the UEs, while the USMs remain fixed in position. The UEs’ positions are randomly updated in each time slot, and their assigned tasks are synchronously refreshed across the network. These parameters are primarily configured based on the literature [26,40]. Additional communication, computational and other related simulation parameters are detailed in Table 1. The soft update coefficient of the PER-DDPG algorithm is , and the batch size for randomly sampled data is 64.
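The soft target-network update governed by the coefficient mentioned above works as follows. The value `tau = 0.1` in the example is a placeholder, and the flat-list representation of the weights is an illustrative simplification.

```python
def soft_update(target, online, tau):
    """Polyak-average the target-network weights toward the online
    network: theta_target <- tau * theta + (1 - tau) * theta_target.
    Weights are shown as flat lists of floats for illustration; in
    the actual DDPG agent this is applied per tensor."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

# The target takes a small step toward the online weights each update
tgt = soft_update(target=[0.0, 1.0], online=[1.0, 0.0], tau=0.1)
```

Small coefficients make the target networks change slowly, which stabilizes the bootstrapped critic targets during training.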
4.2. Convergence Analysis
To comprehensively evaluate the performance of the proposed PER-DDPG-based algorithm, this manuscript conducts a series of comparative experiments with three benchmark algorithms: the standard DDPG algorithm, the DQN algorithm, and the uncompressed PER-DDPG (PER-DDPG_Ncpr). The evaluation process in this manuscript consists of four main phases: (1) hyperparameter optimization for the PER-DDPG algorithm to identify the optimal configuration with superior convergence performance; (2) sensitivity analysis of delay and energy consumption metrics to determine the optimal weight coefficients; (3) performance comparison with the benchmark algorithms under identical experimental conditions; and (4) scalability analysis through varying data volumes and user numbers to assess the algorithm’s robustness.
The impact of learning rate ( ) variations on algorithm performance is demonstrated in Figure 3. This study conducted systematic parameter tuning and comparative analysis, selecting a set of learning rates with distinct performance characteristics for detailed evaluation, specifically and . The selection of an appropriate learning rate significantly influences the training dynamics and ultimately determines the convergence behavior of the algorithm. As shown in Figure 3, with learning rates of , , and , the algorithm exhibits substantial oscillation amplitudes, slower convergence, and difficulty in stabilizing at optimal values. This occurs because excessively large learning rates cause the algorithm to overshoot the optimal solution, while overly small learning rates result in insufficient updates and slow progress. In contrast, a learning rate of yields faster convergence, reduced oscillation amplitude, and a superior computational offloading strategy, achieving the optimal reward convergence value. Consequently, the learning rate in this study is set to .
The configuration of the discount factor ( ) likewise significantly influences the algorithm’s convergence characteristics. A higher value emphasizes long-term returns, potentially leading to overly conservative strategies and unstable performance in certain states. Conversely, a lower value prioritizes immediate rewards, which may result in suboptimal long-term strategies and aggressive behavior, causing instability and fluctuations during training. Figure 4 demonstrates the impact of varying discount factors, specifically , , and , on algorithm performance. Experimental results indicate that when the discount factor is set to or , the algorithm exhibits substantial oscillation amplitudes and difficulty in achieving stable convergence. In contrast, setting it to significantly reduces the oscillation amplitude, achieving optimal convergence performance. Consequently, the discount factor in this study is set to .
The setting of the weights in a multi-objective optimization problem should account for the sensitivity of both energy consumption and delay. The system delay weight and energy consumption weight are set within the range of . As shown in Figure 5, the convergence of the algorithm is poor when the weights are evenly distributed, and when the sensitivity is skewed towards delay, the optimal strategy cannot be obtained. However, when , , the algorithm converges more stably and the optimal strategy can be achieved.
Based on the comparative analysis of the above experiments, the PER-DDPG algorithm demonstrates optimal performance metrics when , , , . Therefore, we will adopt this set of optimized hyperparameter configurations in subsequent experiments.
Figure 6 shows the reward convergence comparison of the different algorithms. It can be observed that as the number of training steps grows, the rewards of the four algorithms demonstrate a consistent upward trajectory and eventually converge to stable values, which indicates that the reinforcement learning agents are able to learn better strategies for minimizing the latency and energy consumption of the UEs through interaction with the environment. The fluctuations observed in the performance curves are primarily caused by the dynamic nature of the optimization process, including the exploration–exploitation trade-off in DRL, the varying complexity of tasks, and the real-time adjustments in resource allocation and task offloading. These factors collectively contribute to the non-monotonic behavior of the algorithm’s performance over time.
Regarding convergence speed, the proposed PER-DDPG algorithm converges in around 110 steps, while the other benchmark algorithms converge only after 200 steps, giving the proposed scheme superior convergence speed. When considering convergence stability, the PER-DDPG_Ncpr algorithm exhibits the smoothest convergence, followed by the PER-DDPG algorithm; this is because PER-DDPG_Ncpr has a smaller action space, which effectively reduces the policy search space and improves learning efficiency. Additionally, the DDPG algorithm overcomes the restriction of DQN to discrete action spaces through its deterministic policy and policy-gradient updates, and is therefore suitable for continuous action space problems. In terms of overall performance, the PER-DDPG algorithm adopted in this manuscript achieves the highest reward with stable convergence. This is attributed to the introduction of PER, which samples experiences according to their importance, improving learning efficiency and sample utilization and further optimizing the algorithm’s performance in complex environments.
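The PER mechanism credited here can be sketched as a minimal proportional replay buffer. The class name, the `alpha` exponent, and the FIFO eviction are illustrative choices; a full PER implementation additionally applies importance-sampling weight correction and usually a sum-tree for efficiency, both omitted for brevity.

```python
import random

class PERBuffer:
    """Minimal proportional prioritized experience replay: transitions
    are sampled with probability proportional to |TD-error|^alpha, so
    informative experiences are replayed more often than under uniform
    sampling. Importance-sampling correction is omitted for brevity."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:       # FIFO eviction
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        idx = random.choices(range(len(self.data)),
                             weights=self.prios, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):          # refresh after learning
            self.prios[i] = (abs(e) + self.eps) ** self.alpha

buf = PERBuffer(capacity=1000)
buf.add(("s", "a", 1.0, "s2"), td_error=2.0)
buf.add(("s", "a", 0.0, "s2"), td_error=0.1)
batch, idx = buf.sample(4)
```

After each learning step the agent writes the new TD errors back via `update`, so priorities track which transitions the critic currently finds surprising.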
4.3. Performance Comparison
The performance of different algorithms under varying task volumes is compared in Figure 7. When the task data volume increases from Mb to Mb, the proposed algorithm yields the lowest total delay and energy usage in the system. Compared to the PER-DDPG_Ncpr, DDPG, and DQN algorithms, there is a minimum reduction of , , and in total latency and energy consumption, respectively. This demonstrates that the strategy found by the proposed algorithm delivers better results than the three baseline algorithms. Additionally, the adoption of data compression significantly lowers the overall energy usage for users, and its benefits become more pronounced as the task data size increases.
Figure 8 shows the relationship between the total system cost and the number of UEs. With a growing number of UEs, the total cost exhibits a rising trend. This is primarily due to two factors: the increase in total system cost caused by the growing number of UEs, combined with the limited communication resources of the system, which leads to resource competition among terminals and further increases the cost. In contrast to the other three algorithms, the approach used in this article allocates system resources more rationally, and its total cost is always lower. For example, when the number of UEs is 50, the total system cost is reduced by at least , , and compared to the PER-DDPG_Ncpr, DDPG, and DQN algorithms, respectively.
Based on the above numerical analysis, the PER-DDPG algorithm introduced in this study demonstrates superior performance compared to the other three algorithms. By jointly optimizing task offloading, resource allocation, and the data compression ratio, the algorithm selects strategies that keep the overall system processing cost low.