Article

Exact and Approximation Algorithms for Task Offloading with Service Caching and Dependency in Mobile Edge Computing

School of Information Science and Engineering, Yunnan University, Kunming 650500, China
*
Author to whom correspondence should be addressed.
Future Internet 2025, 17(6), 255; https://doi.org/10.3390/fi17060255
Submission received: 11 May 2025 / Revised: 3 June 2025 / Accepted: 6 June 2025 / Published: 10 June 2025

Abstract

With the continuous development of the Internet of Things (IoT) and communication technologies, the demand for low latency in practical applications is becoming increasingly significant. Mobile edge computing, as a promising computational model, is receiving growing attention. However, most existing studies fail to consider two critical factors: task dependency and service caching. Additionally, the majority of proposed solutions provide no guarantee on how close they come to the optimal solution. We investigate the task offloading problem in mobile edge computing. Considering the requirements of applications for service caching and task dependency, we define an optimization problem that minimizes the delay under a maximum completion cost constraint, and present a $(1+\epsilon)$-approximation algorithm and an exact algorithm. Specifically, the offloading scheme is determined based on the relationships between tasks as well as the cost and delay incurred by data transmission and task execution. Simulation results demonstrate that the offloading schemes obtained by our algorithms consistently outperform those of other algorithms. Moreover, the approximation ratio of the approximation algorithm is validated to be less than $(1+\epsilon)$, and the exact algorithm consistently produces the optimal solution.

1. Introduction

Although the CPUs of mobile devices are constantly being updated and their computational capabilities are gradually improving, the processing power of mobile devices still appears insufficient, especially when compared with the explosive growth of data. This is due to limitations such as battery capacity and computational resources. With the continuous development of the Internet of Things (IoT) and communication technologies, there are an increasing number of computation-intensive and latency-sensitive applications, such as image processing and machine learning, as well as those requiring real-time processing, such as augmented reality (AR) and autonomous driving, which are particularly challenging for mobile devices to handle [1,2]. Although cloud servers have powerful processing capabilities, running resource-intensive applications requires significant time for data transmission, resulting in unpredictable communication delays [3,4,5]. The emergence of mobile edge computing (MEC) addresses the shortcomings of both cloud and mobile devices, offering great potential for solving the task offloading problem [6,7,8,9,10].
Although [11,12,13,14] propose new computational models for MEC, it still faces numerous challenges. Taking object and pose recognition as an example, a typical application of this kind requires several steps: feature extraction, object model matching, pose estimation, clustering, and classification [15]. When devising offloading strategies for these tasks, two factors need to be considered.
The first factor is whether the device has cached the necessary services for task execution. This is one of the key issues in the development of future 6G technology [16,17]. Service caching not only enhances network reliability, adaptability, and bandwidth utilization, but also, by caching the required services on the execution node, effectively supports offloading strategies, thereby improving the performance of the algorithm [18]. However, most research on service caching focuses on the issue of service placement [2,19,20], which incurs higher costs and time compared to task transmission and execution [20]. For instance, a machine learning model requires a large amount of data for training, and migrating this service to another execution node would take a considerable amount of time, which is undesirable for delay-sensitive applications. Therefore, as in [10], we assume that the service is already placed on the execution node and focus on the impact of service caching on the offloading strategy.
The second factor is whether there is dependency among tasks. For instance, the pose estimation task can only be executed after the object model matching task. Each subtask requires the result of its parent task to continue execution, and different offloading strategies directly affect real-time performance. Traditional coarse-grained scheduling only considers operations at the application level [21], which makes it unable to address the complex and diverse demands of current applications. Although most current studies take this into account, they either rank subtasks based on delay constraints and execute each subtask sequentially [22,23,24,25], or use heuristic algorithms to solve such problems [26,27,28,29]. This results in certain limitations and makes it difficult to ensure performance guarantees.
Therefore, service caching and task dependency both impact the performance and feasibility of offloading strategies. When considering task offloading, if service caching is considered without considering the dependency between tasks, the offloading strategy may fail to execute [10]. Conversely, if only task dependency is considered, significant time may be spent caching services for the offloading strategy, which affects the performance of the algorithm. Meanwhile, both service caching and task dependency significantly influence the integration of MEC and future 6G technologies. Considering service caching leverages the user-proximity feature of MEC to meet the high-bandwidth, high-data-rate, and low-latency requirements of 6G, thereby promoting the convergence of communication, computation, and caching (3C) [16]. On the other hand, incorporating task dependency facilitates the integration of AI technologies, such as object and pose recognition applications, with MEC, further enhancing the role and applicability of MEC in the 6G era [17]. Although many current studies take both factors into account, their approaches are either based on deep learning or solved through heuristic algorithms [10,19,30,31], which cannot guarantee that the offloading strategy obtained is the optimal solution or close to the optimal solution, and they lack strong performance guarantees.
To overcome these constraints, effectively solve the offloading problem, and ensure the optimality of the offloading schemes, we propose both a polynomial-time approximation algorithm and an exact algorithm. These algorithms are improved based on dynamic programming (DP) and the Bellman–Ford algorithm, respectively. We summarize the main contributions as follows.
  • We define the delay-minimization problem under task dependency, service caching, and the maximum completion cost constraint, and prove its NP-hardness. Due to dependency, the delay and cost of task offloading are related to all the predecessor nodes. However, many existing studies fail to take this into account, which is of practical importance for delay-sensitive applications.
  • We then propose a $(1+\epsilon)$-approximation algorithm to solve the aforementioned problem, and derive its time complexity as $O\left(\frac{d_{in} N M^2 L}{\epsilon}\left(\log\log(UB/LB)+1\right)\right)$, where $L$ denotes the depth of the task graph, $d_{in}$ denotes the maximum indegree of the task graph, $M$ is the number of execution devices, $N$ is the number of tasks, and $LB$ and $UB$ represent the lower and upper bounds of the delay, respectively. In this paper, our approximation algorithm enables each task to make its offloading decision based on local information during execution, without the need to consider global decisions. Moreover, it provides worst-case performance guarantees, which aligns well with the distributed structure of MEC.
  • We propose an exact algorithm that can obtain the optimal solution to the aforementioned problem. Although its time complexity is theoretically exponential, in practice, the running time is much smaller than the theoretical value when solving real-world problems.
  • We finally evaluate the proposed algorithms using a random task model and a real task model, then compare them with the FixDoc algorithm [19], the Hermes algorithm [32], and a brute force algorithm. The simulation results validate the approximation ratio $(1+\epsilon)$ and the superiority of the offloading schemes obtained by our algorithms over the other algorithms.
The rest of this paper is organized as follows. Section 2 summarizes the related work. In Section 3, we model the problem and prove its complexity. Section 4 presents the polynomial-time approximation algorithm. Section 5 introduces the exact algorithm. In Section 6, we conduct experiments and discuss the proposed algorithms. Finally, Section 7 concludes the paper.

2. Related Works

An increasing number of applications require low latency and low costs, such as autonomous driving, facial recognition, and AR. MEC, as a new computing model, is gaining more and more attention.
Yang et al. [11] combined an intelligent reflecting surface (IRS) with the MEC model and considered an IRS-aided multi-device MEC system, where each IoT device follows a binary offloading policy to minimize the total energy consumption of the IoT devices. Liu et al. [12] proposed a novel STAR-RIS-assisted MEC system that extends the coverage range of wireless access points (APs) from half-space to full-space. Chen et al. [13] discussed the problem of computational task offloading and resource allocation in remote areas lacking ground communication infrastructure through an air–ground integrated MEC system. Qin et al. [14] introduced a new user-centric mobile edge computing (UCMEC) model that addresses the issues of resource allocation imbalance and network interference in traditional MEC architectures. Although the above studies have made improvements to task offloading in traditional MEC systems, they do not consider two crucial constraints in task offloading: task dependency and service caching.
With the increasing complexity of applications, edge servers are constrained by limited resources and cannot cache all task services [2,20]. Research on service caching mainly focuses on the joint optimization of service placement and task offloading. However, compared to task execution and offloading, service placement and updates often incur higher costs and time, negatively affecting system stability [20]. Moreover, due to the uncertainty of tasks and delay sensitivity, offloading schemes may cause congestion at certain execution nodes, significantly impacting the required offloading time [33]. Based on these factors, Farhadi et al. [20] proposed a task offloading scheme that decouples the time scales of task offloading and service deployment. Their method introduces a joint optimization framework for task offloading and service deployment under multiple constraints.
However, the above-mentioned work does not take into account the dependency between tasks. With the continuous development of applications, the vast majority of applications consist of sets of interdependent tasks, making the offloading of dependent tasks essential [15,19]. Arabnejad et al. [22] addressed the cost-minimization problem of dependent tasks under delay constraints in cloud environments. They set sub-delay constraints for each subtask based on the delay constraints and ranked them, then proposed a heuristic ProLiS algorithm and a metaheuristic L-ACO algorithm to solve this problem. Lou et al. [23] applied the aforementioned task-ranking method to the MEC environment and proposed a two-stage heuristic algorithm to obtain the cost-minimized offloading scheme. Wang et al. [26] focused on reducing delay and improving resource utilization for dependent tasks, and designed a Nash Equilibrium-based Joint Task Scheduling and Offloading Strategy (N-JSU). Hosny et al. [27] proposed a method based on an Enhanced Genetic Algorithm (EMGA), which considers task dependencies while optimizing multiple objectives. Cai et al. [28] combined a prioritized queue strategy with a joint delay-quality dependence model, transformed the task assignment process into a tree search problem, and solved it using a heuristic tree-based algorithm. However, these studies only consider task dependencies while neglecting service caching, limiting their applicability to real-world applications.
In fact, the tasks of applications now not only need to determine the execution order based on their dependencies, but each subtask also requires specific services to be executed. Therefore, it is crucial to simultaneously consider both factors for task offloading. Liu et al. [19] proposed a DP-based heuristic algorithm targeting task-completion-time minimization. Zhao et al. [10] addressed the limitations of computational resources in mobile devices and proposed a convex optimization-based approach to solve this problem. Zhao et al. [30] proposed a dynamic caching-assisted task offloading scheme (CachOf), which combines collaborative computing of edge servers and dynamic resource allocation strategies. They then applied deep reinforcement learning (DRL) methods to optimize the task offloading strategy. Li et al. [31] proposed a method based on the multi-objective artificial bee colony algorithm (MOTOCP) to maximize cache hit rate and minimize service delay. The method gradually improves the algorithm’s performance by balancing multiple objectives through Pareto optimality. However, the aforementioned methods are either based on deep learning or heuristic algorithms. The former cannot guarantee an offloading solution within polynomial time, while the latter cannot guarantee that each offloading solution obtained will be optimal or near-optimal.
In most practical applications, tasks often depend on specific services for execution, and many tasks require low-latency performance; because of the distance to the server, transmission delay inevitably tends to be relatively high. In the context of approximation algorithms for wireless networks, Tan et al. [34,35] considered the sum-rate maximization problem via Perron–Frobenius theory. They focused on finding approximately optimal solutions by studying two related problems: the signal-to-interference-plus-noise ratio (SINR) approximation power control (SAPC) problem and the max–min weighted SINR problem. For the SAPC problem, the authors proposed an asynchronous iterative algorithm, while for the max–min weighted SINR problem, they developed a fast two-timescale algorithm and derived a synchronous algorithm. However, the approximation ratio in their work depends on the parameters of the problem instances, and the problem model they consider differs significantly from ours. For the use of approximation algorithms in the task offloading problem, Zhang et al. [36] considered the energy-minimization problem under latency constraints with linear task dependency, and proposed the Lagrangian Relaxation Based Aggregated Cost algorithm to obtain both optimal and approximate solutions to the constrained optimization problem. However, unlike our approach, they did not provide an explicit approximation ratio. Zhou et al. [37], aiming to jointly optimize the tradeoff between latency and energy in a multi-server MEC system, proposed an approximation algorithm based on the Markov approximation framework. Additionally, Younis et al. [38] introduced an energy–latency-aware task offloading and approximate computation problem; they decomposed the original problem into three subproblems and used duality techniques and convex optimization to obtain an approximate solution. Although both of these studies proposed approximation algorithms for the task offloading problem, their approximation ratios depend on the objective values of the problem. In contrast, by setting the value of $\epsilon$, we can guarantee a constant approximation ratio of $(1+\epsilon)$ in every case, which provides a stronger approximation guarantee. To the best of our knowledge, only Hermes [32] has proposed a polynomial-time approximation algorithm for the delay-minimization problem under a cost constraint. The approximation ratio of that algorithm is also $(1+\epsilon)$, the same as ours. However, their approach does not take into account service caching constraints, which may lead to infeasible offloading solutions.

3. System Model and Problem Formulation

3.1. System Model

We consider a model comprising a set of mobile devices, a set of edge nodes, and a cloud node. As shown in Figure 1, each node in every layer can communicate not only with nodes within the same layer but also with any node in other layers. The cloud node possesses robust processing capabilities, featuring minimal processing delay and the capacity to cache all services. However, it incurs higher processing costs and is located at a considerable distance from both mobile devices and edge nodes, resulting in increased transmission delay and cost. In contrast, mobile devices exhibit the lowest processing capabilities among the three entities and are only capable of caching a limited number of services. Nonetheless, mobile devices have the advantage of minimal transmission delay and the lowest transmission costs when communicating with each other. We consider a set of execution nodes, including cloud nodes, edge nodes, and mobile devices. Let $\mathcal{M}_v = \{m_1, m_2, \ldots, m_M\}$ denote the set of execution nodes that satisfy the service cache requirements for task $v$.
We consider an application that must be completed within a maximum cost $T$. The application consists of multiple interdependent tasks, and these dependencies can be modeled as a directed acyclic graph (DAG) $G = (V, E)$, where $V$ represents the set of tasks, and $E$ represents the set of edges used to represent the dependency relationships between tasks. An edge $(i, j) \in E$ indicates that the output data of task $i$ will serve as the input data for task $j$, meaning task $j$ can only be executed after task $i$ is completed and the data has been transmitted. Our delay and cost calculation models are based on [9,19,39]. Next, we provide a detailed introduction to the processing and communication models.
(1) Processing model: We assume that execution node $v$ has cached the services required by task $j$ and can process multiple tasks simultaneously, so the time to process task $j$ on execution node $v$ consists solely of the computation time. Specifically, the time and cost to process task $j$ on execution node $v$ are
$$T_j^v = \frac{L_j}{f_v}, \qquad C_j^v = p_v \cdot T_j^v,$$
respectively, where $L_j = S_j \cdot C_v$ [11] represents the computational workload of task $j$, with $S_j$ denoting the data size of task $j$ and $C_v$ representing the number of CPU cycles required by execution node $v$ to process 1 bit of data. $f_v$ denotes the CPU frequency of execution node $v$, where the CPU frequency of servers is generally higher than that of mobile devices. $p_v$ represents the computational cost per unit time for execution node $v$, which is an approximation in this paper [39].
(2) Communication model: The communication time between execution nodes includes both the task data upload and the result download durations. Following [9], we assume uplink/downlink channel reciprocity and denote the channel gain for transmitting the output data of task $i$ from execution node $k$ to execution node $v$ as the input data of task $j$ by $h_{ij}^{kv} = A_d \left( \frac{3 \times 10^8}{4 \pi f_c d_{kv}} \right)^{d_e}$, where $A_d = 4.11$ denotes the antenna gain, $f_c = 915$ MHz denotes the carrier frequency, $d_e = 2.6$ denotes the path loss exponent, and $d_{kv}$ denotes the distance between execution nodes $k$ and $v$. We assume that the channel gain does not change during the transmission between tasks $i$ and $j$. The uplink and downlink rates for transmitting the output data of task $i$ from execution node $k$ to execution node $v$ as the input data of task $j$ are given by $R_{ij}^{kv} = B \cdot \log_2\left(1 + \frac{c\, h_{ij}^{kv}}{\sigma^2}\right)$ and $\tilde{R}_{ij}^{kv} = \tilde{B} \cdot \log_2\left(1 + \frac{c\, h_{ij}^{kv}}{\tilde{\sigma}^2}\right)$, respectively, where $B$ and $\tilde{B}$ denote the uplink and downlink communication bandwidths, respectively; $c = 0.935$ watt denotes the communication cost per unit time, which is an approximation [39,40]; and $\sigma^2$ and $\tilde{\sigma}^2$ denote the uplink and downlink noise powers, respectively. For simplicity, we assume equal noise power without loss of generality. Therefore, the communication time and cost for transmitting the output data of task $i$ from execution node $k$ to execution node $v$ as the input data of task $j$ can be expressed as follows:
$$T_{ij}^{kv} = \frac{O_i}{R_{ij}^{kv}} + \frac{I_j}{\tilde{R}_{ij}^{kv}}, \qquad C_{ij}^{kv} = c \cdot T_{ij}^{kv},$$
where $O_i$ is the output data size of task $i$ and $I_j$ is the input data size of task $j$. Since the output of task $i$ serves as the input of task $j$ during communication, we have $O_i = I_j$.
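To make the processing and communication models concrete, the following minimal Python sketch evaluates Equations (1) and (2) for a single execution and a single transmission under the parameter values stated above; the task size, distance, and per-node cost used in the example call are hypothetical illustration values, not values from the paper's experiments.

```python
import math

# Parameters as stated in the text; all values in the example call are illustrative.
A_d, f_c, d_e = 4.11, 915e6, 2.6   # antenna gain, carrier frequency, path loss exponent
B = B_tilde = 1e6                  # uplink/downlink bandwidth (Hz)
sigma2 = sigma2_tilde = 1e-10      # uplink/downlink noise power (W)
c = 0.935                          # communication cost per unit time (W)

def channel_gain(d_kv):
    """h_ij^{kv} = A_d * (3e8 / (4*pi*f_c*d_kv))^{d_e}."""
    return A_d * (3e8 / (4 * math.pi * f_c * d_kv)) ** d_e

def comm_time_and_cost(O_i, d_kv):
    """Equation (2): upload O_i bits, download I_j = O_i bits, nodes k -> v."""
    h = channel_gain(d_kv)
    R_up = B * math.log2(1 + c * h / sigma2)                # uplink rate (bit/s)
    R_down = B_tilde * math.log2(1 + c * h / sigma2_tilde)  # downlink rate (bit/s)
    T = O_i / R_up + O_i / R_down                           # I_j = O_i
    return T, c * T

def proc_time_and_cost(S_j, C_v, f_v, p_v):
    """Equation (1): workload L_j = S_j * C_v cycles, T = L_j / f_v, cost = p_v * T."""
    T = S_j * C_v / f_v
    return T, p_v * T

# Example: 1 Mb of data exchanged between two nodes 30 m apart, then processed
# on a node with C_v = 50 cycles/bit, f_v = 10 GHz, p_v = 4 (illustrative values).
t_comm, c_comm = comm_time_and_cost(O_i=1e6, d_kv=30.0)
t_proc, c_proc = proc_time_and_cost(S_j=1e6, C_v=50, f_v=10e9, p_v=4.0)
print(f"comm: {t_comm:.4f} s, {c_comm:.4f}; proc: {t_proc:.4f} s, {c_proc:.4f}")
```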
We assume that an application must begin and end on the same execution node. To satisfy this requirement in task modeling, we insert two virtual nodes (task s and task t) into the directed acyclic graph. These virtual nodes correspond to tasks with zero execution time and cost. Specifically, task s is added to the local device to initiate the application, and task t is added to the local device to collect the results at the end of execution. We use V to represent the task set before the insertion of the virtual nodes. Consequently, the total number of tasks can be expressed as
$$N = |V| + 2.$$
Figure 2 illustrates an example of an offloading scheme that considers both service caching and task dependency, which consists of three execution nodes. Execution node $m_2$ caches the functionality of task 1, and execution node $m_3$ caches the functionalities of tasks 2 and 3. The application task set is initiated by execution node $m_1$. Specifically, the virtual nodes are offloaded to $m_1$. However, since neither $m_1$ nor $m_3$ caches the functionality of task 1, task 1 is offloaded to $m_2$. Similarly, tasks 2 and 3 are offloaded to $m_3$. In this example, $m_1$ and $m_2$ represent mobile devices, while $m_3$ represents an edge node. The cloud node is not considered in this case.

3.2. Problem Formulation

Our objective is to determine a task offloading scheme x that minimizes the total delay while satisfying the cost constraint, considering a network with task dependencies and service caching.
$$C(j, x) = C_j^{x_j} + \sum_{i \in \mathcal{C}(j)} \left( C(i, x) + C_{ij}^{x_i x_j} \right).$$
$C(j, x)$ represents the cost of completing task $j$ under offloading scheme $x$, and $x_j$ denotes the device selected for offloading task $j$. The set $\mathcal{C}(j)$ represents the set of predecessor nodes of node $j$. $C_j^{x_j}$ represents the processing cost incurred by server $x_j$ for handling task $j$. Additionally, $C_{ij}^{x_i x_j}$ denotes the communication cost between server $x_i$, which offloads task $i$, and server $x_j$, which offloads task $j$. As described in the above equation, the total cost of completing task $j$ is the sum of the costs of completing all its predecessor tasks, the data transmission costs, and the execution cost of task $j$.
$$T(j, x) = \max_{i \in \mathcal{C}(j)} \left\{ T(i, x) + T_{ij}^{x_i x_j} \right\} + T_j^{x_j}.$$
$T(j, x)$ represents the time required to complete task $j$ under offloading scheme $x$. $T_j^{x_j}$ represents the processing delay incurred by server $x_j$ for handling task $j$. Additionally, $T_{ij}^{x_i x_j}$ denotes the communication delay between server $x_i$, which offloads task $i$, and server $x_j$, which offloads task $j$. Task $j$ can only be executed after the data from all its predecessor nodes has been transmitted, so its start time is determined by the slowest predecessor node.
$$\mathcal{P}: \quad \min_{x \in [\mathcal{M}_v]^{|V|}} \ T(V, x)$$
$$\text{s.t.} \quad C(N, x) \leq T, \qquad |x_i| - |x_j| \leq 0, \ \forall x_i, x_j \in x, \ (i, j) \in E$$
In this context, $C(N, x)$ and $T(V, x)$ are defined in (3) and (4), respectively. Here, $\mathcal{M}_v = \{m_1, m_2, \ldots, m_M\}$ denotes the set of execution nodes that satisfy the service cache requirements for task $v$, $V$ represents the set of all tasks, $N$ denotes the last task in the sequence, and $T$ represents the cost constraint. The first constraint indicates that the cost incurred when completing the virtual task $t$ should not exceed the cost limit. The second constraint ensures that a successor task can only be offloaded once its predecessor tasks have been completed.
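To illustrate how the recursions (3) and (4) evaluate a candidate solution, the following sketch computes $C(j, x)$ and $T(j, x)$ for every task of a DAG under a fixed scheme $x$; the dictionary-based containers (`preds`, `proc_cost`, `comm_delay`, and so on) are illustrative assumptions, not the notation of an actual implementation.

```python
# A minimal sketch, assuming `preds` maps each task to its predecessor list and
# the per-task/per-link tables hold C_j^m, T_j^m, C_ij^{kv}, T_ij^{kv}.
def topological_order(preds):
    """Kahn-style ordering of a predecessor map {task: [parents]}."""
    order, placed = [], set()
    while len(order) < len(preds):
        for j in preds:
            if j not in placed and all(i in placed for i in preds[j]):
                order.append(j)
                placed.add(j)
    return order

def evaluate(x, preds, proc_delay, proc_cost, comm_delay, comm_cost):
    """Return {task: C(j, x)} and {task: T(j, x)} for a scheme x: task -> node."""
    C, T = {}, {}
    for j in topological_order(preds):
        # Equation (3): costs are additive over all predecessors.
        C[j] = proc_cost[(j, x[j])] + sum(
            C[i] + comm_cost[(i, j, x[i], x[j])] for i in preds[j])
        # Equation (4): execution starts when the slowest predecessor arrives.
        T[j] = proc_delay[(j, x[j])] + max(
            (T[i] + comm_delay[(i, j, x[i], x[j])] for i in preds[j]), default=0.0)
    return C, T
```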
Theorem 1. 
$\mathcal{P}$ is NP-hard.
Proof. 
Consider a serial task graph in which service caching and data transmission are not considered. We transform a special case of $\mathcal{P}$ with binary offloading decisions into the 0–1 knapsack problem. Since the 0–1 knapsack problem is NP-hard, problem $\mathcal{P}$ is NP-hard as well [32].
We rewrite this special case of the original problem $\mathcal{P}$ as problem $\mathcal{P}'$, where $C_i^0 = 0$ is the cost of local execution, $T_i^0$ denotes the delay of local execution, and $C_i^1$ and $T_i^1$ represent the cost and delay required to complete task $i$ on the server, respectively.
$$\mathcal{P}': \quad \min_{x_i \in \{0,1\}} \ \sum_{i=1}^{N} \left[ (1 - x_i) T_i^0 + x_i T_i^1 \right]$$
$$\text{s.t.} \quad \sum_{i=1}^{N} x_i C_i^1 \leq T$$
By setting $T_i^0 = v_i$, $T_i^1 = 0$, and $C_i^1 = w_i$, minimizing $\sum_{i=1}^{N} (1 - x_i) v_i$ becomes equivalent to maximizing $\sum_{i=1}^{N} x_i v_i$, so problem $\mathcal{P}'$ can be transformed into problem $Q$, which corresponds to the 0–1 knapsack problem.
$$Q: \quad \max_{x_i \in \{0,1\}} \ \sum_{i=1}^{N} x_i v_i$$
$$\text{s.t.} \quad \sum_{i=1}^{N} x_i w_i \leq T$$
Thus, if we can solve $\mathcal{P}'$, we can also solve $Q$, so the NP-hardness of the 0–1 knapsack problem carries over to $\mathcal{P}'$. Since $\mathcal{P}'$ is a special case of the original problem $\mathcal{P}$, it follows that $\mathcal{P}$ is also NP-hard.    □

4. Approximation Algorithm

In this section, we propose an approximation algorithm and provide a proof of its approximation ratio and time complexity.

4.1. Upper and Lower Bounds of Delay

Before describing the approximation algorithm, we first need to determine the upper and lower bounds of the delay, as shown in Algorithm 1. We assume that the task set is initiated by device a.
From line 1 to line 2, we initialize the lower and upper bounds of the delay for all tasks on all servers to infinity (since it is not feasible to set the value to infinity in practice, we substitute it with $10^6$ or a larger value), and set the lower bound of the virtual task $s$ at device $a$ to 0. From line 3 to line 7, based on the task order in List I, we calculate the upper and lower bounds of the delay for each task on the execution nodes that satisfy its service requirements. This calculation is performed greedily, without considering the cost constraint. List I is the task order obtained through a topological sorting algorithm, such as Kahn's algorithm, which ensures that all task dependencies are respected. Finally, at line 8, $UB$ and $LB$ are obtained as the delay bounds.
Algorithm 1 FULC($G(V,E)$, List I)
1: Initialize $T_{\min}(v, m), T_{\max}(v, m) \leftarrow \infty$, $\forall v \in V$, $m \in \mathcal{M}$.
2: Set $T_{\min}(s, a) \leftarrow 0$, $T_{\max}(s, a) \leftarrow 0$.
3: for $j \in V \setminus \{s\}$ in List I do
4:     for $m \in \mathcal{M}_j$ do
5:         $T_{\min}(j, m) \leftarrow T_j^m + \min_{x_i \in \mathcal{M}_i} \max_{i \in \mathcal{C}(j)} \left\{ T_{\min}(i, x_i) + T_{ij}^{x_i m} \right\}$
           $T_{\max}(j, m) \leftarrow T_j^m + \max_{x_i \in \mathcal{M}_i} \max_{i \in \mathcal{C}(j)} \left\{ T_{\max}(i, x_i) + T_{ij}^{x_i m} \right\}$
6:     end for
7: end for
8: $LB \leftarrow T_{\min}(t, a)$, $UB \leftarrow T_{\max}(t, a)$.
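A compact Python rendering of Algorithm 1 may be helpful; it assumes the tasks are supplied in topological order and uses the fact that, because predecessors are placed independently, the minimum over joint placements of a maximum equals the maximum of per-predecessor minima. All container names are illustrative.

```python
INF = float("inf")

def fulc(tasks, preds, M, proc_delay, comm_delay, s, t, a):
    """Greedy delay bounds without the cost constraint (Algorithm 1).
    `tasks` is List I (topological order, s first); M[v] is the node set M_v."""
    Tmin = {(v, m): INF for v in tasks for m in M[v]}
    Tmax = {(v, m): INF for v in tasks for m in M[v]}
    Tmin[(s, a)] = Tmax[(s, a)] = 0.0
    for j in tasks:
        if j == s:
            continue
        for m in M[j]:
            # Best and worst placement of each predecessor, taken independently.
            lo = max((min(Tmin[(i, xi)] + comm_delay[(i, j, xi, m)] for xi in M[i])
                      for i in preds[j]), default=0.0)
            hi = max((max(Tmax[(i, xi)] + comm_delay[(i, j, xi, m)] for xi in M[i])
                      for i in preds[j]), default=0.0)
            Tmin[(j, m)] = proc_delay[(j, m)] + lo
            Tmax[(j, m)] = proc_delay[(j, m)] + hi
    return Tmin[(t, a)], Tmax[(t, a)]  # (LB, UB)
```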

4.2. Polynomial-Time Approximation Algorithm

We use $C(i, j, k)$ to represent the minimum completion cost of task $i$ on device $j$ under a maximum consumption delay of $k$. The optimal strategy is derived by solving all subproblems of this problem for $i \in V$ and $j \in \mathcal{M}_i$.
We do not solve all subproblems for $k \in [1, UB]$, but instead discretize the delay values.
$$q_S(x) = k, \quad (k-1)S < x \leq kS, \qquad S = \frac{LB \cdot \epsilon}{L + 1}.$$
$L$ denotes the depth of the task graph (the greatest number of nodes contained within a single path). If all subproblems of a given problem have already been solved, the problem can be described as follows:
$$k_{add} = q_S\left( T_j^m + T_{ij}^{x_i m} \right), \qquad k - k_{add} \geq 0.$$
$$C_{new}(j, m, k) = C_j^m + \min_{\{x_i \in \mathcal{M}_i\}_{i \in \mathcal{C}(j)}} \sum_{i \in \mathcal{C}(j)} \left[ C_{ij}^{x_i m} + C\left(i, x_i, k - \max_{i \in \mathcal{C}(j)} k_{add}\right) \right].$$
$k_{add}$ represents the quantized value of the transmission delay and execution delay. Equation (7) calculates the minimum cost that satisfies the maximum consumption delay $k$ over all feasible solutions, where $\max_{i \in \mathcal{C}(j)} k_{add}$ corresponds to the slowest predecessor node in each feasible solution. Since the cost function (7) is additive, the minimum cost to complete task $j$ on device $m$ depends only on the sum of the costs of completing all its predecessor tasks and the data transmission costs. Furthermore, since all predecessor tasks of task $j$ are independent of each other, each predecessor task can be completed independently on a different device, and the data transmission processes are also independent. As a result, the minimum of the sum of the completion costs of all predecessor tasks and the data transmission costs can be transformed into the sum, over each predecessor task $i$, of the minimum completion cost of $i$ plus the corresponding transmission cost from task $i$ to task $j$. Therefore, we can transform (7) into (8) while ensuring that the quality of the solution remains unchanged. At the same time, the solution space is reduced from $M^p$ to $pM$, where $p$ denotes the number of predecessor tasks of task $j$, and $M$ is the number of execution devices for task $j$.
$$C_{new}(j, m, k) = C_j^m + \sum_{i \in \mathcal{C}(j)} \min_{x_i \in \mathcal{M}_i} \left[ C_{ij}^{x_i m} + C\left(i, x_i, k - \max_{i \in \mathcal{C}(j)} k_{add}\right) \right].$$
After solving all subproblems of $C(i, j, k)$, all the data needs to be aggregated at the local device $a$. Therefore, the original problem can be transformed into the following form:
$$\min k \quad \text{s.t.} \quad C(t, a, k) \leq T.$$
We propose an approximation algorithm to solve the task offloading problem, as shown in Algorithm 2. In lines 1 to 2, we initialize the minimum cost of all tasks at a maximum consumption delay of 0 to infinity, set the virtual task $s$ to have a minimum cost of 0 at a maximum consumption delay of 0, and set the quantized delay horizon $U \leftarrow \lceil UB/S \rceil + L + 1$. From lines 3 to 16, we compute, based on DP [41], the minimum cost for each task on the execution nodes that satisfy its service requirements, under the constraint that the maximum consumption delay does not exceed $k$. If a feasible solution satisfying the cost constraint is found before $k$ exceeds $U$, we return the corresponding offloading scheme. Otherwise, we return FAIL.
Algorithm 2 PTA($G(V,E)$, $\mathcal{M}_v$, $UB$, $LB$, $T$, $\epsilon$, List I)
1: Initialize $S \leftarrow \frac{LB \cdot \epsilon}{L + 1}$, $C(v, m, 0) \leftarrow \infty$ for all nodes $v \in V$, $m \in \mathcal{M}_v$.
2: Set $C(s, a, 0) \leftarrow 0$, $U \leftarrow \lceil UB/S \rceil + L + 1$.
3: for $k = 1, 2, \ldots, U$ do
4:     for $j \in V$ in List I do
5:         for $m \in \mathcal{M}_j$ do
6:             $C(j, m, k) \leftarrow C(j, m, k-1)$.
               Compute $k_{add}$, $C_{new}(j, m, k)$ according to (6) and (8).
7:             if $C_{new}(j, m, k) < C(j, m, k)$ then
8:                 $C(j, m, k) \leftarrow C_{new}(j, m, k)$.
9:                 if $C(t, a, k) \leq T$ then
10:                    return the corresponding plan and delay.
11:                end if
12:            end if
13:        end for
14:    end for
15: end for
16: return FAIL.
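The following sketch shows the structure of the quantized DP in Algorithm 2. For readability, each predecessor $i$ placed on $x_i$ is charged its own quantized budget $k - k_{add}$; since $C(i, x_i, \cdot)$ is nonincreasing in the budget, this never does worse than charging every predecessor for the slowest one, and it still guarantees that task $j$ finishes within $k$ quantized units. Container names are illustrative, and only the achieved delay (not the full plan) is returned.

```python
import math

INF = float("inf")

def pta(tasks, preds, M, proc_d, proc_c, comm_d, comm_c,
        UB, LB, T, eps, L, s, t, a):
    """Quantized DP (Algorithm 2). `tasks` is List I without the virtual source s;
    M[s] = [a] is assumed. C[(j, m, k)] is the minimum cost to finish task j on
    node m within quantized delay budget k."""
    S = LB * eps / (L + 1)                       # quantization step, Equation (5)
    U = math.ceil(UB / S) + L + 1
    C = {(s, a, k): 0.0 for k in range(U + 1)}   # the virtual source is free
    for k in range(1, U + 1):
        for j in tasks:
            for m in M[j]:
                total = proc_c[(j, m)]
                for i in preds[j]:               # sum of per-predecessor minima, Eq. (8)
                    choices = []
                    for xi in M[i]:
                        # k_add = q_S(T_j^m + T_ij^{x_i m}), Equation (6)
                        k_add = math.ceil((proc_d[(j, m)] + comm_d[(i, j, xi, m)]) / S)
                        if k - k_add >= 0:
                            choices.append(comm_c[(i, j, xi, m)]
                                           + C.get((i, xi, k - k_add), INF))
                    total += min(choices, default=INF)
                C[(j, m, k)] = min(C.get((j, m, k - 1), INF), total)
        if C.get((t, a, k), INF) <= T:
            return k * S                         # first feasible budget found
    return None                                  # FAIL
```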
Lemma 1. 
Let $x$ represent any offloading scheme, and let $\tilde{d}(x)$ denote the quantized delay value of the offloading scheme $x$. The delay of the offloading scheme $x$ satisfies $d(x) \leq \tilde{d}(x) \cdot S \leq d(x) + (L+1)S$.
Proof. 
The proof is presented in Appendix A.    □
However, since Algorithm 2 alone does not guarantee polynomial-time complexity, we present an approximate testing procedure for $UB$ and $LB$ (Algorithm 3), which iteratively invokes Algorithm 2 to reduce the gap between $UB$ and $LB$ as well as the quantization step size, thereby bounding the overall time complexity by a polynomial.
Algorithm 3 TUL($G(V,E)$, $UB$, $LB$, $T$, $\epsilon$, List I)
1: Set $B_L \leftarrow LB$, $B_U \leftarrow \frac{UB}{1+\epsilon}$.
2: while $B_U / B_L > 2$ do
3:     $B \leftarrow \sqrt{B_U \times B_L}$.
4:     if PTA($G(V,E)$, $\mathcal{M}_v$, $B$, $B$, $T$, $\epsilon$, List I) returns FAIL then
5:         Test($\epsilon$, $B$) = Yes, $B_L \leftarrow B$.
6:     else
7:         Test($\epsilon$, $B$) = No, $B_U \leftarrow B$.
8:     end if
9: end while
10: return PTA($G(V,E)$, $\mathcal{M}_v$, $(1+\epsilon)B_U$, $B_L$, $T$, $\epsilon$, List I).
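The bound-tightening loop itself is short enough to sketch directly; here `pta_call` is assumed to be a wrapper that runs Algorithm 2 on the fixed problem instance with the supplied UB and LB and returns None on FAIL.

```python
import math

def tul(pta_call, UB, LB, eps):
    """Geometric bisection of [B_L, B_U] (Algorithm 3); each probe sets UB = LB = B
    so that a single PTA run decides on which side of B the optimum lies (Lemma 2)."""
    B_L, B_U = LB, UB / (1 + eps)
    while B_U / B_L > 2:
        B = math.sqrt(B_U * B_L)           # halves log(B_U / B_L) per iteration
        if pta_call(UB=B, LB=B) is None:   # FAIL: Test(eps, B) = Yes, so d* > B
            B_L = B
        else:                              # feasible: d* < (1 + eps) * B
            B_U = B
    return pta_call(UB=(1 + eps) * B_U, LB=B_L)
```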
Lemma 2. 
Test($\epsilon$, $B$) = Yes when PTA($G(V,E)$, $\mathcal{M}_v$, $B$, $B$, $T$, $\epsilon$, List I) returns FAIL, in which case $d^* > B$; otherwise, $d^* < (1+\epsilon)B$, where $d^*$ represents the delay of the optimal solution.
Proof. 
The proof is presented in Appendix B.    □
Theorem 2. 
Given valid bounds $0 < LB \leq d^* \leq UB$, by Lemma 1 and Lemma 2 we can obtain a $(1+\epsilon)$-approximate result in $O\left(\frac{d_{in} N M^2 L}{\epsilon}\left(\log\log(UB/LB)+1\right)\right)$ time.
Proof. 
From Algorithm 2, each DP procedure solves $NMU$ subproblems [32]. Let $d_{in}$ denote the maximum indegree of the task graph. Solving each subproblem in Equation (8) involves up to $d_{in}$ minimization problems across $M$ devices. Consequently, the complexity of each call to Algorithm 2 within Algorithm 3 can be bounded by
$$O(d_{in} N M^2 U) = O\left( \frac{d_{in} N M^2 L}{\epsilon} \cdot \frac{UB}{LB} + d_{in} N M^2 L \right).$$
If $UB \geq LB$ and $\epsilon \leq 1$, then
$$O(d_{in} N M^2 U) = O\left( \frac{d_{in} N M^2 L}{\epsilon} \cdot \frac{UB}{LB} \right).$$
In Algorithm 1, we first compute the maximum delay $UB$ and minimum delay $LB$ for the problem without cost constraints, which serve as valid bounds. By Lemma 2, Test($\epsilon$, $B$) is an effective procedure for reducing the dynamic range. Lines 4 to 8 in Algorithm 3 ensure that, at each iteration, $B_L$ is a valid lower bound and $(1+\epsilon)B_U$ is a valid upper bound. In Algorithm 3, binary search is used over the range from $\log B_L$ to $\log B_U$ until the condition $B_U / B_L \leq 2$ is satisfied. During the iterative process, $\log(B_U / B_L)$ decreases to either $\log\left(\sqrt{B_U B_L} / B_L\right)$ or $\log\left(B_U / \sqrt{B_U B_L}\right)$. Since $\log\left(\sqrt{B_U B_L} / B_L\right) = \log\left(B_U / \sqrt{B_U B_L}\right) = \frac{1}{2} \log(B_U / B_L)$, the dynamic range is halved after each iteration. Therefore, the binary search requires $\log\log(UB/LB)$ calls to Algorithm 2. The complexity of lines 4 to 8 in Algorithm 3 is $O\left(\frac{d_{in} N M^2 L}{\epsilon} \log\log(UB/LB)\right)$. Since $UB/LB = (1+\epsilon)B_U / B_L = O(1)$ in line 10 of Algorithm 3, the complexity of line 10 is $O\left(\frac{d_{in} N M^2 L}{\epsilon}\right)$. Therefore, the total complexity is $O\left(\frac{d_{in} N M^2 L}{\epsilon}\left(\log\log(UB/LB)+1\right)\right)$.
According to Lemma 1, let $x$ be the offloading scheme obtained by Algorithm 2 and $x^*$ the optimal offloading scheme. We can deduce that Algorithm 2 produces an offloading scheme $x$ satisfying
$$d(x) \leq \tilde{d}(x) \cdot S \leq \tilde{d}(x^*) \cdot S \leq d(x^*) + (L+1)S = d(x^*) + LB \cdot \epsilon \leq d^*(1 + \epsilon).$$
  □

5. Exact Algorithm

In this section, we introduce the exact algorithm and prove its time complexity. Note that the exact algorithm in this paper is primarily used for comparison with the approximation algorithm, although it also performs well in certain scenarios.

5.1. Label Structure

For each task in the task set, a label set $L_v$, where $v \in V$, needs to be stored. Each label in this set can be represented as $L_{\{v,m,h\}} = \{f_{delay}(x), f_{cost}(x), h_p, m_p\}$, where $f_{delay}(x)$ and $f_{cost}(x)$ denote the delay and cost consumed by the offloading scheme $x$ for completing task $v$. $h_p$ is the length index, representing the position of the label on the y-axis in Figure 3; its maximum value is $H$. $m_p$ is the execution node index, representing the position of the label on the x-axis in Figure 3, used to distinguish the offloading scenarios of an individual task on different execution nodes.

5.2. Dominance Test

We use $x$ and $y$ to represent two labels. The operator $\odot$ represents the dominance test. By $x \odot y$, we compute the non-dominated subset of $\{x, y\}$ based on the delay and cost of $x$ and $y$.
We use $(=, =)$ to denote the situation where $x(\text{delay}) = y(\text{delay})$ and $x(\text{cost}) = y(\text{cost})$, and similar notation is used for the other cases. The $(=, =)$ case implies two equivalent offloading schemes. During our computation, we only need to store the minimal complete set, so we can select either $x$ or $y$ arbitrarily for storage. However, if we want to store the maximal complete set, since $x$ and $y$ represent different offloading schemes with the same indicators, both should be stored, i.e., $\{x, y\}$ should be saved. In all other cases, the dominance relationship is clear: $x \odot y = \{x, y\}$ means that $x$ and $y$ do not dominate each other; $x \odot y = \{x\}$ means that $y$ is dominated by $x$, and vice versa.
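A minimal sketch of how a per-node label set can be maintained under the dominance test is given below, storing the minimal complete set (ties discarded); labels are reduced to (delay, cost) pairs for illustration.

```python
def insert_label(frontier, new):
    """Insert `new` = (delay, cost) into a Pareto frontier, applying x ⊙ y."""
    d, c = new
    for (d2, c2) in frontier:
        if d2 <= d and c2 <= c:      # new label dominated (or equal): discard it
            return frontier
    # Keep only labels the new one does not dominate, then add the new label.
    kept = [(d2, c2) for (d2, c2) in frontier if not (d <= d2 and c <= c2)]
    kept.append(new)
    return kept

# Example mirroring Figure 3: {1,4} and {2,3} are mutually non-dominated,
# {3,3} is rejected (dominated by {2,3}), and {2,2} evicts {2,3}.
F = []
for lab in [(1, 4), (2, 3), (3, 3), (2, 2)]:
    F = insert_label(F, lab)
print(F)  # [(1, 4), (2, 2)]
```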

5.3. Exact Algorithm

The Bellman–Ford framework-based exact algorithm (Algorithm 4) is essentially a label correction (LC) algorithm [42,43] built on an improved version of the Bellman–Ford algorithm. In lines 1 to 2, we first initialize all label sets as empty. The outer loop runs for $N-1$ iterations, similar to the classic Bellman–Ford algorithm. From lines 3 to 18, we iterate over all tasks according to List I. For each task, we generate all offloading schemes by combining the label sets of all predecessor nodes, and then compute new labels from these schemes. Afterward, we perform a dominance test (DT) between each new label and the existing labels with the same execution node index; the test returns false if the newly generated label is not dominated by any other label. In that case, we remove the labels dominated by the new label and insert the new label into the label set; otherwise, we discard the new label. In simple terms, during each iteration, the Pareto frontier is stored for the label set of each node on each server based on the DT table in Table 1. Finally, in line 19, we find, in the label set of the virtual node $t$, the offloading scheme that satisfies the cost constraint and minimizes the delay, which serves as the optimal solution.
As shown in Figure 3, the labels $\{1,4\}$ and $\{2,3\}$ are retained because they are non-dominated. However, if a new label $\{3,3\}$ arrives at the same execution node as labels $\{1,4\}$ and $\{2,3\}$, it cannot be added to the set $L_v$, as $\{3,3\}$ is dominated by $\{2,3\}$. On the other hand, if the new label is $\{2,2\}$, it will be added to the set $L_v$ and the label $\{2,3\}$ will be removed. Moreover, since the x-axis in Figure 3 represents different execution nodes, even though the labels $\{2,4\}$ and $\{3,3\}$ are dominated by the labels $\{1,4\}$ and $\{2,3\}$, respectively, they can still be added to the set $L_v$.
Algorithm 4 BFF($G(V,E)$, $\mathcal{M}_v$, $T$, List I)
1: Initialize $L_{j,\mathcal{M}_j,h} \leftarrow \emptyset$ for all $j \in V$, for any $h \in [1, H]$.
2: Set $L_{s,a,1} \leftarrow \{0, 0, 1, a\}$.
3: for $q = 1$ to $N-1$ do
4:     for $j \in V$ in List I do
5:         for $m \in \mathcal{M}_j$ do
6:             $k \leftarrow$ all permutations of $L_i$, $i \in \mathcal{C}(j)$.
7:             $L_{new} \leftarrow \{f_{delay}(k), f_{cost}(k), h^*, m\}$.
8:             if DT($L_{new} \odot L_{j,m,\cdot}$) = false then
9:                 for $h \in [1, H]$ do
10:                    if $L_{new} \odot L_{j,m,h} = L_{new}$ then
11:                        $L_{j,m,h} \leftarrow \emptyset$.
12:                    end if
13:                end for
14:                $L_{j,m} \leftarrow L_{j,m} \cup L_{new}$.
15:            end if
16:        end for
17:    end for
18: end for
19: The optimal solution is the label value in $L_t$ that satisfies the cost constraint $T$ and minimizes the delay.
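Finally, one relaxation step of BFF for a single task $j$ on a single node $m$ (lines 4 to 15 of Algorithm 4) can be sketched as follows, reusing `insert_label` from the dominance-test sketch above; `labels[i]` maps each execution node of task $i$ to its current frontier, the $h$-index bookkeeping is folded into the list position, and all names are illustrative.

```python
from itertools import product

def relax(j, m, preds, labels, proc_d, proc_c, comm_d, comm_c):
    """Combine every choice of predecessor labels, build candidate labels for
    (j, m), and keep the per-node Pareto frontier via the dominance test."""
    frontier = labels[j].get(m, [])
    pred_opts = [[(xi, lab) for xi in labels[i] for lab in labels[i][xi]]
                 for i in preds[j]]
    for combo in product(*pred_opts):        # all permutations of predecessor labels
        delay = max((lab[0] + comm_d[(i, j, xi, m)]
                     for i, (xi, lab) in zip(preds[j], combo)), default=0.0)
        cost = sum(lab[1] + comm_c[(i, j, xi, m)]
                   for i, (xi, lab) in zip(preds[j], combo))
        new = (delay + proc_d[(j, m)], cost + proc_c[(j, m)])
        frontier = insert_label(frontier, new)
    labels[j][m] = frontier
```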
Theorem 3. 
The maximum time complexity of BFF is $O\left(N^2 M (H M_{\max})^{d_{in}}\right)$.
Proof. 
The outermost loop runs $N-1$ times, where $N$ denotes the total number of tasks. In each iteration of this loop, every task and server is traversed, so there are $NM$ operations to find the new label sets. Since each new label set is derived by permuting the label sets stored in all predecessor nodes, and each node's label set contains at most $H M_{\max}$ labels, where $M_{\max}$ denotes the maximum value of $|\mathcal{M}_v|$, the new label set can have at most $(H M_{\max})^{d_{in}}$ labels. Thus, the maximum time complexity is $N^2 M (H M_{\max})^{d_{in}}$. □

6. Simulation Results

6.1. Simulation Environment and Settings

The simulation was performed on a commodity computer with an Intel i7-13700H CPU (Intel Corporation: Santa Clara, CA, USA) at 2.40 GHz and 16 GB RAM. All the considered algorithms were implemented in MATLAB 2023b. In our simulation, there are seven mobile devices, three edge servers, and one cloud server. We set $\epsilon = 0.4$, $c = 0.935$ watt, $P = 0.944$ watt, $S_n \in [4, 80]$ Mb, $O_j = I_j \in [0.1, 2]$ Mb, $C_C = C_{EN} = \frac{1}{2} C_{EC} = 50$ cycles/bit, and $f_C = 2 f_{EN} = 10 f_{EC}$, where $C$, $EN$, and $EC$ denote the cloud server, edge server, and mobile device, respectively. Additionally, we set $B = \tilde{B} = 10^6$ Hz, $\sigma^2 = \tilde{\sigma}^2 = 10^{-10}$ watt, and $f_{EN} = 10$ GHz [9]. The processing cost of a mobile device is denoted as $P$, while the processing costs at the edge server and the cloud server are 4 watt and 10 watt, respectively [39]. To better reflect the distance differences between execution nodes, we set the distance between mobile devices to 30 m, the distance between mobile devices and edge servers to 100 m, the distance between edge servers to 150 m, and the distances from mobile devices and edge servers to the cloud server to 500 m and 400 m, respectively. Current research methods are predominantly based on deep learning or heuristic algorithms [10,30,31], which cannot guarantee finding an approximate or optimal solution; therefore, we do not compare our method with them. To evaluate the performance of the proposed algorithm, we compare it with the following methods:
  • Offloading only to servers (OOS): The BFF algorithm offloads tasks only to edge and cloud servers. If the final offloading scheme fails to meet the cost constraints, a significant amount of time must be spent on the local device to cache the services required to execute these tasks.
  • FixDoc [19]: The algorithm is presented as Algorithm 1 in [19]. It is a task offloading method based on DP and the earliest method proposed for the problem we consider. We use this algorithm to derive the task offloading strategy that minimizes the cost while satisfying the cost constraint.
  • Hermes [32]: A polynomial-time approximation algorithm for the delay-minimization problem under a cost constraint. To ensure fairness, we impose the service caching constraint on the original algorithm.
  • Brute force: This algorithm enumerates all possible offloading strategies to obtain the optimal solution.

6.2. Simulation Results

In Figure 4 and Figure 5, we compare the effect of the cost constraint on the delay values and running times. Similar to [10], we generate a DAG with a real-world structure, namely that of the Gaussian elimination (GE) algorithm [44], with the dimension of the graph set to 4. For each task, the number of mobile devices and edge servers that cache the corresponding service is drawn from [2, 3]. Since the time complexity of OOS is identical to that of BFF while its performance is inferior, and the brute force algorithm yields the same results as BFF with significantly longer running times, we exclude OOS and the brute force algorithm from the running time comparison. As the cost constraint increases, the delay values of the offloading solutions generated by the various algorithms progressively decrease. When the cost constraint is very small, the solutions obtained by OOS perform the worst. This is due to the high server cost, which causes all feasible solutions to violate the cost constraint; as a result, the algorithm is forced to spend a considerable amount of time performing local caching in order to execute the tasks. On the other hand, both TUL and Hermes consistently achieve a $(1+\epsilon)$-approximation solution. As the cost constraint is relaxed, the running times of these algorithms decrease; however, the running time of TUL is significantly smaller than that of Hermes. In contrast, BFF computes all possible solutions based on the DT table in each experiment and then selects the optimal solution that satisfies the cost constraint, so its running time does not change as the cost constraint is relaxed. In all cases, the solutions obtained by BFF are identical to the optimal solution. When the cost constraint is very small, the running time of BFF is slightly smaller than that of Hermes. Although the time complexity of BFF is theoretically exponential, in practical applications the number of labels retained during each task iteration tends to be relatively small, making the algorithm efficient in real-world scenarios.
Figure 6 and Figure 7 describe the effect of the task set size on delay values and running times when the cost constraint is set to a tiny value. The task graph is a randomly generated DAG. For each task, the number of mobile devices and edge servers that cache the corresponding service is drawn from [2, 3]. Due to the long running time of the brute force algorithm and the fact that its results are identical to those of BFF, it is not included in this experiment. Given that the time complexity of OOS matches that of BFF, but its empirical performance is inferior, we exclude OOS from the running time comparison. As the number of tasks grows, the delay values of the offloading solutions derived from the different algorithms generally increase. Among them, OOS consistently shows the poorest performance, with the highest delay values across all task sets. On the other hand, FixDoc demonstrates the fastest running time, as it solely utilizes DP to compute the minimum completion cost of task offloading; however, the solutions it provides result in relatively high delay values. Both TUL and Hermes consistently provide a $(1+\epsilon)$-approximation solution regardless of the task set being used, but TUL achieves this with a significantly smaller running time than Hermes, which makes it more efficient in terms of execution speed. In contrast, BFF produces solutions with the lowest delay values, indicating that it optimizes delay performance better than the other algorithms. Its running time is clearly smaller than that of Hermes when the task quantity is low, but as the task set continues to grow, the running time gap between it and TUL becomes increasingly large.
Figure 8 and Figure 9 provide a detailed illustration of how the approximation parameter $\epsilon$ influences the performance of the TUL algorithm. We used the same experimental setup as in Figure 4 and Figure 5, with the cost constraint set to be very small. From the experimental findings, it is evident that as $\epsilon$ gradually increases, the delay values of the offloading solutions produced by TUL also tend to increase, while the running time of TUL decreases. Additionally, the running time of TUL is significantly smaller than that of the brute force algorithm. However, it is important to note that, even as $\epsilon$ increases, the delay values of these solutions never exceed $(1+\epsilon)$ times the optimal solution. This behavior highlights the effectiveness of TUL in maintaining a close approximation to the optimal solution.
In Table 2 and Table 3, we compare the effects of the number of tasks, the number of execution nodes, and the value of $\epsilon$ on the two algorithms to analyze their approximation ratios, time complexities, and overall differences. In this experiment, the task graph is a randomly generated DAG representing the task dependency structure. From the results, it is observed that when the number of tasks $N$ increases while keeping other parameters constant, the delay and running time of each algorithm increase. However, due to the random generation of the DAG, increasing the number of tasks may increase the width of the DAG rather than its depth, which sometimes results in a smaller delay value despite a larger number of tasks. When the number of execution nodes $M$ increases while keeping other parameters constant, the delay and running time of each algorithm decrease. This is because the increased number of execution nodes leads to a larger set of feasible solutions, thereby improving the chances of finding better solutions. When the values of $N$, $M$, and $\epsilon$ are small, the running times and results of TUL and BFF show little difference. Since the time complexities of TUL and BFF are linearly related to $N$ and $N^2$, respectively, the running time difference between TUL and BFF is not significant when the number of tasks increases. However, as the number of execution nodes increases, the running time difference between BFF and TUL becomes much more significant. This is because the running time of BFF is proportional to $M^{d_{in}}$, while the running time of TUL is proportional to $M^2$; in a DAG, the maximum indegree $d_{in}$ is usually greater than 2, so the running time of BFF grows much more rapidly with $M$ than that of TUL. Additionally, we observe that when $\epsilon$ decreases, TUL's running time varies more strongly with the numbers of tasks and execution nodes. This is because the running time of TUL is inversely proportional to $\epsilon$, so for the same changes in the number of tasks and execution nodes, smaller values of $\epsilon$ result in larger variations in running time. Furthermore, when the value of $\epsilon$ decreases while keeping other parameters constant, TUL typically produces more accurate results, but at the cost of longer running times. Although there are cases where a larger $\epsilon$ yields better results, TUL guarantees a $(1+\epsilon)$-approximate solution: regardless of the value of $\epsilon$, TUL will never exceed the optimal solution by more than a factor of $(1+\epsilon)$, further validating TUL's approximation ratio.

7. Conclusions

In this paper, we propose a delay-minimization task offloading problem that takes into account task dependencies, service caching, and a maximum completion cost. Unlike traditional MEC, we add peripheral mobile devices as candidate offloading targets to make the model more realistic. Furthermore, we design a $(1+\epsilon)$-approximation algorithm and an exact algorithm to solve this problem. The simulation results demonstrate that our approximation algorithm indeed achieves the approximation ratio $(1+\epsilon)$ and outperforms other algorithms for this problem, while the exact algorithm consistently produces the optimal solution.

Author Contributions

Conceptualization, B.C. and J.Z.; methodology, B.C. and J.Z.; software, B.C.; validation, B.C.; formal analysis, B.C. and J.Z.; investigation, B.C. and J.Z.; resources, J.Z.; data curation, B.C.; writing—original draft preparation, B.C.; writing—review and editing, J.Z.; visualization, B.C.; supervision, J.Z.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Young Talent Project of the “Xingdian Talent Support Program” of Yunnan Province of China (grant number C619300A129).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

Based on (6) and (8), it can be concluded that the maximum number of overestimations of $q_S$ for any offloading scheme equals the length of the longest path of $G$. Therefore, the maximum number of overestimated $q_S$ terms for any offloading scheme $x$ is $L$. Then, $d(x) \leq \tilde{d}(x) \cdot S \leq d(x) + LS \leq d(x) + (L+1)S$.

Appendix B. Proof of Lemma 2

We assume $x$ is the offloading scheme obtained by Algorithm 2, and $x^*$ is the optimal offloading scheme. When Test($\epsilon$, $B$) = Yes, according to Lemma 1, we can derive
$$\frac{d^*}{S} + L \geq \tilde{d}(x^*) \geq \tilde{d}(x) > \frac{B}{S} + L, \qquad \Rightarrow \quad d^* > B.$$
When Test($\epsilon$, $B$) = No, similarly, we can derive
$$d^* \leq \tilde{d}(x^*) \cdot S \leq \tilde{d}(x) \cdot S \leq B + (L+1)S = B + LB \cdot \epsilon.$$
Since $B = UB = LB$ in this call, we have $d^* < (1+\epsilon)B$.

References

  1. Mach, P.; Becvar, Z. Mobile edge computing: A survey on architecture and computation offloading. IEEE Commun. Surv. Tutor. 2017, 19, 1628–1656. [Google Scholar] [CrossRef]
  2. Xu, J.; Chen, L.; Zhou, P. Joint service caching and task offloading for mobile edge computing in dense networks. In Proceedings of the IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 207–215. [Google Scholar]
  3. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge computing: Vision and challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
  4. Chen, M.H.; Liang, B.; Dong, M. Joint offloading and resource allocation for computation and communication in mobile cloud with computing access point. In Proceedings of the IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9. [Google Scholar]
  5. Bozorgchenani, A.; Mashhadi, F.; Tarchi, D.; Monroy, S.A.S. Multi-objective computation sharing in energy and delay constrained mobile edge computing environments. IEEE Trans. Mob. Comput. 2020, 20, 2992–3005. [Google Scholar] [CrossRef]
  6. Chen, M.; Hao, Y. Task offloading for mobile edge computing in software defined ultra-dense network. IEEE J. Sel. Areas Commun. 2018, 36, 587–597. [Google Scholar] [CrossRef]
  7. Abbas, N.; Zhang, Y.; Taherkordi, A.; Skeie, T. Mobile edge computing: A survey. IEEE Internet Things J. 2017, 5, 450–465. [Google Scholar] [CrossRef]
  8. Mao, Y.; Zhang, J.; Letaief, K.B. Joint task offloading scheduling and transmit power allocation for mobile edge computing systems. In Proceedings of the IEEE Wireless Communications and Networking Conference, San Francisco, CA, USA, 19–22 March 2017; pp. 1–6. [Google Scholar]
  9. Bi, S.; Huang, L.; Zhang, Y.J.A. Joint optimization of service caching placement and computation offloading in mobile edge computing systems. IEEE Trans. Wirel. Commun. 2020, 19, 4947–4963. [Google Scholar] [CrossRef]
  10. Zhao, G.; Xu, H.; Zhao, Y.; Qiao, C.; Huang, L. Offloading tasks with dependency and service caching in mobile edge computing. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2777–2792. [Google Scholar] [CrossRef]
  11. Yang, Y.; Gong, Y.; Wu, Y.C. Intelligent-reflecting-surface-aided mobile edge computing with binary offloading: Energy minimization for IoT devices. IEEE Internet Things J. 2022, 9, 12973–12983. [Google Scholar] [CrossRef]
  12. Liu, Z.; Li, Z.; Wen, M.; Gong, Y.; Wu, Y.-C. STAR-RIS-aided mobile edge computing: Computation rate maximization with binary amplitude coefficients. IEEE Trans. Commun. 2023, 71, 4313–4327. [Google Scholar] [CrossRef]
  13. Chen, Y.; Li, K.; Wu, Y.; Huang, J.; Zhao, L. Energy efficient task offloading and resource allocation in air-ground integrated MEC systems: A distributed online approach. IEEE Trans. Mob. Comput. 2024, 23, 8129–8142. [Google Scholar] [CrossRef]
  14. Qin, L.; Lu, H.; Chen, Y.; Chong, B.; Wu, F. Towards decentralized task offloading and resource allocation in user-centric MEC. IEEE Trans. Mob. Comput. 2024, 23, 11807–11823. [Google Scholar] [CrossRef]
  15. Ra, M.R.; Sheth, A.; Mummert, L.; Pillai, P.; Wetherall, D.; Govindan, R. Odessa: Enabling interactive perception applications on mobile devices. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, Bethesda, MD, USA, 28 June–1 July 2011. [Google Scholar]
  16. Zhou, Y.; Liu, L.; Wang, L.; Hui, N.; Cui, X.; Wu, J.; Peng, Y.; Qi, Y.; Xing, C. Service-aware 6G: An intelligent and open network based on the convergence of communication, computing and caching. Digit. Commun. Netw. 2020, 6, 253–260. [Google Scholar] [CrossRef]
  17. Cheng, G.; Jiang, C.; Yue, B.; Wang, R.; Alzahrani, B.; Zhang, Y. AI-driven proactive content caching for 6G. IEEE Wirel. Commun. 2023, 30, 180–188. [Google Scholar] [CrossRef]
  18. Kuo, T.Y.; Lee, M.C.; Kim, J.H.; Lee, T.S. Quality-aware joint caching, computing and communication optimization for video delivery in vehicular networks. IEEE Trans. Veh. Technol. 2023, 72, 5240–5256. [Google Scholar] [CrossRef]
  19. Liu, L.; Tan, H.; Jiang, S.H.C.; Han, Z.; Li, X.Y.; Huang, H. Dependent task placement and scheduling with function configuration in edge computing. In Proceedings of the IEEE International Symposium on Quality of Service, Phoenix, AZ, USA, 24–25 June 2019; pp. 1–10. [Google Scholar]
  20. Farhadi, V.; Mehmeti, F.; He, T.; La Porta, T.F.; Khamfroush, H.; Wang, S.; Chan, K.S.; Poularakis, K. Service placement and request scheduling for data-intensive applications in edge clouds. IEEE Trans. Netw. 2021, 29, 779–792. [Google Scholar] [CrossRef]
  21. Lv, X.; Du, H.; Ye, Q. TBTOA: A DAG-based task offloading scheme for mobile edge computing. In Proceedings of the IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 4607–4612. [Google Scholar]
  22. Arabnejad, V.; Bubendorfer, K.; Ng, B. Budget and deadline aware e-science workflow scheduling in clouds. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 29–44. [Google Scholar] [CrossRef]
  23. Lou, J.; Tang, Z.; Zhang, S.; Jia, W.; Zhao, W.; Li, J. Cost-effective scheduling for dependent tasks with tight deadline constraints in mobile edge computing. IEEE Trans. Mob. Comput. 2023, 22, 5829–5845. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Chen, J.; Zhou, Y.; Yang, L.; He, B.; Yang, Y. Dependent task offloading with energy-latency tradeoff in mobile edge computing. IET Commun. 2022, 16, 1993–2001. [Google Scholar] [CrossRef]
25. Wu, Q.; Ishikawa, F.; Zhu, Q.; Xia, Y.; Wen, J. Deadline-constrained cost optimization approaches for workflow scheduling in clouds. IEEE Trans. Parallel Distrib. Syst. 2017, 28, 3401–3412. [Google Scholar] [CrossRef]
  26. Wang, M.; Zhang, Y.; He, X.; Yu, S. Joint scheduling and offloading of computational tasks with time dependency under edge computing networks. Simul. Model. Pract. Theory 2023, 129, 102824. [Google Scholar] [CrossRef]
  27. Hosny, K.M.; Awad, A.I.; Khashaba, M.M.; Fouda, M.M.; Guizani, M.; Mohamed, E.R. Enhanced multi-objective gorilla troops optimizer for real-time multi-user dependent tasks offloading in edge-cloud computing. J. Netw. Comput. Appl. 2023, 218, 103702. [Google Scholar] [CrossRef]
  28. Cai, Q.; Zhou, Y.; Liu, L.; Qi, Y.; Shi, J. Prioritized assignment with task dependency in collaborative mobile edge computing. IEEE Trans. Mob. Comput. 2024, 23, 13505–13521. [Google Scholar] [CrossRef]
  29. Zhao, M.; Zhang, X.; He, Z.; Chen, Y.; Zhang, Y. Dependency-aware task scheduling and layer loading for mobile edge computing networks. IEEE Internet Things J. 2024, 11, 34364–34381. [Google Scholar] [CrossRef]
  30. Zhao, L.; Zhao, Z.; Hawbani, A.; Liu, Z.; Tan, Z.; Yu, K. Dynamic caching dependency-aware task offloading in mobile edge computing. IEEE Trans. Comput. 2025, 74, 1510–1523. [Google Scholar] [CrossRef]
  31. Li, Y.; Zhu, X.; Li, N.; Wang, L.; Chen, Y.; Yang, F.; Zhai, L. Collaborative content caching and task offloading in multi-access edge computing. IEEE Trans. Veh. Technol. 2023, 72, 5367–5372. [Google Scholar] [CrossRef]
  32. Kao, Y.H.; Krishnamachari, B.; Ra, M.R.; Bai, F. Hermes: Latency optimal task assignment for resource-constrained mobile computing. IEEE Trans. Mob. Comput. 2017, 16, 3056–3069. [Google Scholar] [CrossRef]
  33. Eshraghi, N.; Liang, B. Joint offloading decision and resource allocation with uncertain task computing requirement. In Proceedings of the IEEE INFOCOM, Paris, France, 29 April–2 May 2019; pp. 1414–1422. [Google Scholar]
34. Tan, C.W.; Chiang, M.; Srikant, R. Fast algorithms and performance bounds for sum rate maximization in wireless networks. IEEE/ACM Trans. Netw. 2013, 21, 706–719. [Google Scholar] [CrossRef]
  35. Tan, C.W. Wireless network optimization by Perron-Frobenius theory. Found. Trends® Netw. 2015, 9, 107–218. [Google Scholar] [CrossRef]
  36. Zhang, W.; Wen, Y.; Wu, D.O. Collaborative task execution in mobile cloud computing under a stochastic wireless channel. IEEE Trans. Wirel. Commun. 2015, 14, 81–93. [Google Scholar] [CrossRef]
  37. Zhou, W.; Fang, W.; Li, Y.; Yuan, B.; Li, Y.; Wang, T. Markov approximation for task offloading and computation scaling in mobile edge computing. Mob. Inf. Syst. 2019, 2019, 8172698. [Google Scholar] [CrossRef]
  38. Younis, A.; Tran, T.X.; Pompili, D. Energy-latency-aware task offloading and approximate computing at the mobile edge. In Proceedings of the 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems, Monterey, CA, USA, 4–7 November 2019; pp. 299–307. [Google Scholar]
  39. Sundar, S.; Liang, B. Offloading dependent tasks with communication delay and deadline constraint. In Proceedings of the IEEE INFOCOM, Honolulu, HI, USA, 16–19 April 2018; pp. 37–45. [Google Scholar]
40. Flipsen, B.; Geraedts, J.; Reinders, A.; Bakker, C.; Dafnomilis, I.; Gudadhe, A. Environmental sizing of smartphone batteries. In Proceedings of the 2012 Electronics Goes Green 2012+, Berlin, Germany, 9–12 September 2012; pp. 1–9. [Google Scholar]
  41. Hassin, R. Approximation schemes for the restricted shortest path problem. Math. Oper. Res. 1992, 17, 36–42. [Google Scholar] [CrossRef]
  42. Paixão, J.M.; Santos, J.L. Labeling methods for the general case of the multi-objective shortest path problem—A computational study. In Computational Intelligence and Decision Making: Trends and Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 489–502. [Google Scholar]
  43. Gandibleux, X.; Beugnies, F.; Randriamasy, S. Martins’ algorithm revisited for multi-objective shortest path problems with a MaxMin cost function. 4OR 2006, 4, 47–59. [Google Scholar] [CrossRef]
44. Topcuoglu, H.; Hariri, S.; Wu, M.Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 2002, 13, 260–274. [Google Scholar] [CrossRef]
Figure 1. System model.
Figure 2. An example of an offloading scheme that considers both service caching and task dependency.
Figure 3. Illustration of storing and updating labels of label set L_v.
Figure 4. Effect of cost on delay.
Figure 5. Effect of cost on running time.
Figure 6. Effect of task size on delay.
Figure 7. Effect of task size on running time.
Figure 8. Effect of ϵ on TUL.
Figure 9. Effect of ϵ on running time of TUL.
Table 1. Dominance test.

| | x_delay > y_delay | x_delay < y_delay | x_delay = y_delay |
|---|---|---|---|
| x_cost > y_cost | {y} | {x, y} | {y} |
| x_cost < y_cost | {x, y} | {x} | {x} |
| x_cost = y_cost | {y} | {x} | {x} or {y} |
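Table 1 compares a candidate label x = (delay, cost) against a stored label y and lists which label(s) survive: a label is discarded exactly when the other one is at least as good in both objectives. The following is a minimal Python sketch of this dominance filter applied to a label set L_v; the names `Label` and `insert_label` are our illustration, not identifiers from the paper.

```python
from typing import List, NamedTuple

class Label(NamedTuple):
    delay: float
    cost: float

def insert_label(labels: List[Label], x: Label) -> List[Label]:
    """Offer label x to the set; keep only non-dominated labels (Table 1)."""
    for y in labels:
        # y dominates x: y is no worse in both objectives -> keep {y}, discard x.
        # (Covers the equal-equal cell too, where keeping either label suffices.)
        if y.delay <= x.delay and y.cost <= x.cost:
            return labels
    # x survives: drop every stored label that x dominates -> keep {x}.
    kept = [y for y in labels if not (x.delay <= y.delay and x.cost <= y.cost)]
    kept.append(x)
    return kept

if __name__ == "__main__":
    L_v: List[Label] = []
    for lab in [Label(5, 3), Label(4, 4), Label(6, 2), Label(4, 3)]:
        L_v = insert_label(L_v, lab)
    print(L_v)  # only the Pareto-optimal labels remain: (6, 2) and (4, 3)
```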
Table 2. Effect of task quantity, the number of execution nodes, and the value of ϵ on the delay of the two algorithms.

| | N=7, M=2 | N=7, M=8 | N=12, M=2 | N=12, M=8 | N=17, M=2 | N=17, M=8 |
|---|---|---|---|---|---|---|
| BFF | 763 | 335 | 1345 | 510 | 1326 | 765 |
| TUL (ϵ = 0.2) | 763 | 335 | 1345 | 510 | 1326 | 765 |
| TUL (ϵ = 0.5) | 763 | 367 | 1345 | 510 | 1326 | 813 |
| TUL (ϵ = 0.8) | 763 | 335 | 1533 | 591 | 1326 | 861 |
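As a quick arithmetic check (ours, using the Table 2 entries as reconstructed above), the largest observed deviations of TUL from the exact BFF delay stay well within the claimed (1 + ϵ) factor:

```latex
\frac{D_{\mathrm{TUL}(\epsilon=0.8)}}{D_{\mathrm{BFF}}}\bigg|_{N=12,\,M=2}
  = \frac{1533}{1345} \approx 1.14 \le 1 + 0.8,
\qquad
\frac{D_{\mathrm{TUL}(\epsilon=0.5)}}{D_{\mathrm{BFF}}}\bigg|_{N=17,\,M=8}
  = \frac{813}{765} \approx 1.06 \le 1 + 0.5.
```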
Table 3. Effect of task quantity, the number of execution nodes, and the value of ϵ on the running time of the two algorithms.

| | N=7, M=2 | N=7, M=8 | N=12, M=2 | N=12, M=8 | N=17, M=2 | N=17, M=8 |
|---|---|---|---|---|---|---|
| BFF | 0.025 | 3.73 | 0.059 | 4.42 | 0.12 | 140.48 |
| TUL (ϵ = 0.2) | 0.07 | 4.13 | 0.14 | 6.08 | 0.29 | 17.54 |
| TUL (ϵ = 0.5) | 0.02 | 1.03 | 0.02 | 1.39 | 0.07 | 2.84 |
| TUL (ϵ = 0.8) | 0.007 | 0.51 | 0.01 | 0.79 | 0.04 | 1.58 |
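The running-time trend in Table 3 (larger ϵ, faster TUL) is what one would expect from the scale-and-round technique for the restricted shortest path problem [41], in which delays are quantized with a step proportional to ϵ, so coarser steps yield fewer distinct label values. The sketch below shows that standard rounding step under our assumptions; it is illustrative only, not the paper's exact TUL implementation.

```python
import math

def quantize_delay(delay: float, delay_lower_bound: float,
                   epsilon: float, n_tasks: int) -> int:
    """Hassin-style rounding step [41] (an illustrative sketch).

    Quantizing delays to multiples of step = delay_lower_bound * epsilon / n_tasks
    accumulates at most epsilon * delay_lower_bound <= epsilon * OPT rounding
    error over a path of at most n_tasks hops, preserving the (1 + epsilon)
    guarantee while capping how many distinct delay values a label set must
    store -- consistent with the shorter running times for larger ϵ in Table 3.
    """
    step = delay_lower_bound * epsilon / n_tasks
    return math.floor(delay / step)
```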