Mobility- and Energy-Aware Cooperative Edge Offloading for Dependent Computation Tasks

Abstract: Cooperative edge offloading to nearby end devices via Device-to-Device (D2D) links in edge networks with sliced computing resources has mainly been studied for end devices (helper nodes) that are stationary (or follow predetermined mobility paths) and for independent computation tasks. However, end devices are often mobile, and a given application request commonly requires a set of dependent computation tasks. We formulate a novel model for the cooperative edge offloading of dependent computation tasks to mobile helper nodes. We model the task dependencies with a general task dependency graph. Our model employs the state-of-the-art deep-learning-based PECNet mobility model and offloads a task only when the sojourn time in the coverage area of a helper node or Multi-access Edge Computing (MEC) server is sufficiently long. We formulate the minimization problem for the consumed battery energy for task execution, task data transmission, and waiting for offloaded task results on end devices. We convert the resulting non-convex mixed integer nonlinear programming problem into an equivalent quadratically constrained quadratic programming (QCQP) problem, which we solve via a novel Energy-Efficient Task Offloading (EETO) algorithm. The numerical evaluations indicate that the EETO approach consistently reduces the battery energy consumption across a wide range of task complexities and task completion deadlines and can thus extend the battery lifetimes of mobile devices operating with sliced edge computing resources.


Motivation
The introduction of Multi-access Edge Computing (MEC) has facilitated the rapid growth of low-latency services provided by emerging paradigms, such as the Tactile Internet [1], the Internet of Things (IoT) [2][3][4], and Machine-Type-Communications (MTC) [5], as well as demanding applications, such as online gaming, virtual or augmented reality [6,7], and real-time data analytics [8]. The MEC concept brings the computation and storage resources as close as possible to the mobile end devices, namely towards the edge of the network, e.g., to cellular base stations (BSs) and WiFi access points [9][10][11][12][13]. Essentially, the MEC concept is a part of the ongoing trend to jointly provide communication,

Overview of Contributions and Structure of This Article
The main contribution of this article, which extends the conference paper [24], is the development and evaluation of the online Energy-Efficient Task Offloading (EETO) algorithm that minimizes the battery energy consumption in mobile end devices with dependent computation tasks. Towards the development of EETO, we explicitly model the dependencies of the individual computation tasks of an application request in a task dependency graph in Section 3.2. In Section 3.5, we incorporate the recently developed and validated deep-learning-based PECNet user trajectory prediction into our model to account for the sojourn times of a mobile UE that offloads tasks within the coverage areas of the prospective helper nodes, which are presently nearby but mobile and which can assist by taking over computation tasks or by relaying computation tasks to a stationary MEC server.
We formulate the energy minimization problem in Section 4, considering the battery energy expenditures for task computation, task data transmission, and waiting for offloaded tasks to complete. In order to efficiently solve the energy minimization problem, which is a non-convex mixed integer nonlinear program, we conduct a transformation to a quadratically constrained quadratic programming (QCQP) problem in Section 5.1 and specify the EETO algorithm in Section 5.2. To the best of our knowledge, the EETO algorithm is the first task offloading algorithm in a device-enhanced MEC setting that accommodates dependent computation tasks and employs deep-learning-based trajectory prediction for both the mobile UE that offloads tasks and the mobile helper nodes. The performance evaluation in Section 6 indicates that EETO flexibly accommodates a wide range of scenarios and parameter settings and consistently achieves low battery energy consumption, whereas benchmarks typically perform well only for a specific scenario or narrow parameter range.

Mobility Models
User mobility, i.e., the movement of devices or vehicles in the area of an MEC network, introduces several challenges for task offloading in cooperative MEC networks, which have been considered in relatively few prior studies. A matching-based algorithm for choosing the proper task offloading method to road side units (RSUs) and nearby vehicles in vehicular edge computing has been proposed in [46]. The matching-based algorithm aims at optimizing the system utility in terms of latency as well as computing and communication costs. The vehicle locations in [46] are modeled by 2D Euclidean coordinates, while their mobility follows a constant velocity model which is predetermined along roadways; a similar roadways-based mobility model has been considered in [47] (various other studies, e.g., [48], have examined similar mobility models for offloading to RSUs only, not to other vehicles). The mobility models based on roadways are appropriate for the specific case of vehicular networks but are not suitable for general mobility scenarios that are not restricted to predetermined roadways. The study [49] considered cooperative task offloading with mobile MEC servers that are mounted on unmanned aerial vehicles (UAVs), and a noncooperative version of this UAV-assisted MEC is studied in [50]. The flight path of the UAVs is optimized to effectively support the offloading of independent tasks that can be partitioned. In contrast to these preceding studies, we consider general mobility scenarios for the end devices and stationary MEC servers.
The study [51] used a deep-learning-based algorithm to predict the user mobility trajectories and locations in general mobility scenarios to develop an online algorithm for the non-cooperative offloading of tasks from a mobile end device to stationary MEC servers. However, the study [51] did not consider the cooperation between adjacent end devices. In contrast, we consider cooperative task offloading to adjacent mobile end devices reached with D2D communication and to stationary MEC servers with cellular communication.
The study [52] considered an elementary position and direction vector mobility model for cooperative task offloading. A generic three-layer cooperative edge computing network architecture is presented in [53] taking into account the mobility effects of the users by considering the sojourn time with exponential distribution for the coverage of fog nodes. The study [53] parameterizes the exponential distribution for the sojourn times via a Gaussian distribution and notes that a future work direction is to employ machine learning for mobility modeling. The study [54] considered the cooperative task offloading of independent tasks utilizing human pedestrian trajectories that are predicted via data mining techniques [55,56]. In the present paper, we tackle the future work direction noted in [53] and advance the mobility modeling in [54] by formulating the task offloading model with the PECNet deep learning trajectory method, as elaborated in Section 3.5, in the context of dependent computation tasks.

Task Dependency Models
The existing task offloading studies typically considered independent computation tasks. To the best of our knowledge, only the non-cooperative MEC task offloading study [51] considered task dependencies. However, the task dependencies in [51] are limited to a simple sequential (linear) task dependency, where each task depends only on the immediately preceding task in a linear task sequence. A general task dependency graph has been considered in the non-cooperative task offloading study [57]. However, both studies [51,57] offloaded tasks in a non-cooperative fashion, i.e., only to installed stationary MEC and cloud servers (not to mobile helper end devices). In contrast, we consider arbitrary task dependencies that are represented by a general task dependency graph for cooperative task offloading to mobile helper end devices and stationary MEC servers.
To the best of our knowledge, the present study is the first to develop and evaluate a task offloading algorithm that minimizes the battery energy consumption for arbitrary task dependencies in a cooperative edge computing setting with mobile helper nodes. The proposed EETO algorithm employs a state-of-the-art deep-learning-based trajectory prediction for general mobility settings.

Overview
We consider a three-layer heterogeneous network with multiple devices and small cells. As illustrated in Figure 1, each small cell has a base station (BS), which could, for instance, be a Wi-Fi access point or a femto cell base station. In addition, each BS is affiliated with an MEC server, e.g., an MEC server could be directly attached to the BS, or the MEC service could be provided to the BS via an edge cloud network architecture [58]. The devices, which in the 3GPP standardization language are referred to as user equipment nodes, are mobile. While the 3GPP standardization language uses the abbreviation UE for all user equipment nodes, we reserve the UE abbreviation for a device with a set of computation-intensive tasks. The remaining devices that have sufficient computation/communication resources to be able to act as helpers or relays, such as mobile phones, tablets, and laptop computers, are referred to as user equipment helper nodes (UHs). For simplicity of the system model, we consider a single UE with a set of computation-intensive tasks in this study. The system model with a single UE is directly applicable for scenarios that combine a low density of end devices (UEs) with computation-intensive tasks with a generally high density of UHs, such that each UE can essentially form its own surrounding cloud of adjacent D2D-connected UHs. Similar to our system model, the preceding non-cooperative dependent-task offloading studies considered either a single UE with a set of dependent tasks [51] or a set of UEs (each with one task) that, in the aggregate, form one set of dependent tasks [57]. The system model in this article can serve as a basis for future model extensions to multiple UEs (each with a set of dependent tasks) or, effectively, to multiple independent sets of dependent tasks. One possible strategy for this extension is to model the UHs with already assigned tasks as being outside of the D2D ranges of UEs that seek to offload tasks.
The UHs are randomly distributed along the UE's path, and we let $\mathcal{I} = \{UH_1, UH_2, \ldots, UH_I\}$ represent the set of UHs. A $UH_i \in \mathcal{I}$ has a service coverage radius $R_i$. Similarly, we denote $\mathcal{M} = \{S_1, S_2, \ldots, S_M\}$ as the set of edge servers, whereby server $S_m \in \mathcal{M}$ has the service coverage radius $R_m$. The offloading process is controlled by the macro BS (MBS), which knows the channel states and the user positions, whereby the computational load and reliability aspects of the MBS control can be addressed through decentralized control plane techniques [59]. The MBS is responsible for making the offloading decisions. The system model notations are summarized in Table 1.

Task Model
We assume that the computation-intensive applications can be divided into tasks of different sizes which have to be executed within limited time frame constraints, whereby parallel execution of tasks is possible, subject to the task dependency constraints. For example, a video navigation application running on a smartphone [60] can be modeled as a set of tasks with dependencies as characterized by a general dependency graph. More formally, we denote $\mathcal{K} = \{1, 2, \ldots, K\}$ for the set of tasks in a computation-intensive application. Each task is associated with a set of parameters $\{b_k, c_k\}$, whereby $b_k$ is the data size of computation task $k$ (in bits), and $c_k$ denotes the required computation resources to execute the computations for each bit in task $k$ (in CPU cycles/bit), corresponding to a total amount of $b_k c_k$ computation resources (in CPU cycles) required to execute the task $k$. The computationally intensive application has a deadline $T^{max}_d$ for the execution of all $K$ tasks.
The dependency relationship of tasks implies an execution order, according to which a given task may have to wait for its predecessors to be executed first (see Figure 2). We define the concepts of the start and finish time of a task to model this effect for the computation offloading decision algorithm as follows:

• Finish time is the time instant of the execution completion of task $k$:
$$FT_k = ST_k + T^{exe}_k, \tag{1}$$
whereby $ST_k$ is the start time of task $k$ as defined next, and $T^{exe}_k$ is the execution time (span) for task $k$.
• Start time is the time instant when the execution of task $k$ can commence:
$$ST_k = \begin{cases} 0, & \mathcal{P}(k) = \emptyset, \\ \max_{j \in \mathcal{P}(k)} FT_j, & \mathcal{P}(k) \neq \emptyset, \end{cases} \tag{2}$$
whereby the set $\mathcal{P}(k)$ contains all predecessor tasks of task $k$. According to Equation (2), the execution of a task $k$ without predecessors ($\mathcal{P}(k) = \emptyset$) can start immediately, while the start time of a task $k$ that depends on predecessor tasks ($\mathcal{P}(k) \neq \emptyset$) equals the maximum finish time $FT_j$ of the respective predecessor tasks $j \in \mathcal{P}(k)$.

Figure 2. Example illustration of general task dependency graph specifying the required task execution order for an application with a total of $K = 10$ tasks. Tasks $k = 2$, 3, and 4 depend on the prior completion of task $k = 1$. The last task $K = 10$ has to be completed by the deadline $T^{max}_d$.
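The start and finish time definitions can be evaluated recursively over the dependency graph. The following is a minimal Python sketch, assuming an acyclic graph; the function name `schedule_times`, the four-task graph, and the execution times are hypothetical illustration values and are not taken from the evaluation.

```python
from typing import Dict, List, Tuple


def schedule_times(exec_time: Dict[int, float],
                   predecessors: Dict[int, List[int]]) -> Dict[int, Tuple[float, float]]:
    """Return {task: (start_time, finish_time)} for an acyclic dependency graph."""
    times: Dict[int, Tuple[float, float]] = {}

    def finish(k: int) -> float:
        if k not in times:
            preds = predecessors.get(k, [])
            # Start time: 0 without predecessors, else the latest predecessor finish time.
            start = max((finish(j) for j in preds), default=0.0)
            # Finish time: start time plus execution time.
            times[k] = (start, start + exec_time[k])
        return times[k][1]

    for k in exec_time:
        finish(k)
    return times


# Hypothetical example: tasks 2 and 3 depend on task 1; task 4 depends on tasks 2 and 3.
exec_time = {1: 2.0, 2: 1.0, 3: 3.0, 4: 0.5}
predecessors = {2: [1], 3: [1], 4: [2, 3]}
times = schedule_times(exec_time, predecessors)
```

In this example, task 4 can only start once the slower of its two predecessors (task 3) has finished, which is the behavior that the maximum in the start time definition encodes.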

Communication Model
The achievable up-link data rate for the transmission of task $k$ can be obtained based on the Shannon theorem as
$$r_k = B \log_2\left(1 + \frac{P^{tr}_k H_k}{\sigma^2}\right), \tag{3}$$
whereby $B$ is the channel bandwidth between the sender and receiver. The sender options include the UE and the UHs, and the receivers can be the UHs and the MEC servers. We denote $P^{tr}_k$ for the transmission power for task $k$, denote $H_k$ for the channel gain between the sender and receiver while transmitting task $k$, and denote $\sigma^2$ for the Gaussian channel noise variance (with default value $\sigma^2 = 10^{-9}$ W). We note that more complex channel models, e.g., models that include fading and shadowing, can be substituted into our overall task offloading model in a straightforward manner and are left as a future research direction.
In our scenario, the computed results of a task are of negligible size, and their downlink transmission is not explicitly modeled. We note that in order to mitigate interference, the UE should be actively uploading (transmitting) only one task at a time to a server via the cellular channel and one task at a time to a helper or relay node via the D2D channel, whereby both cellular and D2D transmission can occur simultaneously as they do not interfere with each other. We model both the cellular channel and the D2D channel to have bandwidth B. We do not consider the transmission sequencing to at most one cellular and one D2D transmission at a time in the current model and leave this transmission sequencing as a future model refinement.
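The Shannon-based up-link rate and the resulting task transmission delay can be sketched as follows. The function names and the bandwidth, transmission power, and channel gain values are hypothetical illustration values; only the default noise variance of $10^{-9}$ W is taken from the text.

```python
import math


def uplink_rate(bandwidth_hz: float, tx_power_w: float,
                channel_gain: float, noise_var_w: float = 1e-9) -> float:
    """Achievable rate in bit/s from the Shannon capacity formula."""
    snr = tx_power_w * channel_gain / noise_var_w
    return bandwidth_hz * math.log2(1.0 + snr)


def transmission_delay(task_bits: float, rate_bps: float) -> float:
    """Time in seconds to upload a task of task_bits bits at rate_bps."""
    return task_bits / rate_bps


# Hypothetical link: 1 MHz bandwidth, 0.1 W transmission power, channel gain 1e-6.
rate = uplink_rate(1e6, 0.1, 1e-6)      # the SNR is approximately 100
delay = transmission_delay(1e6, rate)   # upload time for a 1 Mbit task
```

Because the rate grows only logarithmically with the transmission power, raising the power to shorten the upload quickly stops paying off in terms of consumed energy.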

Computation Model
There are four alternative decisions for task execution in our dynamic offloading decision algorithm: (a) local execution on the UE, (b) remote execution on a helper UH, (c) remote execution on the MEC server with the task transmitted directly by the UE to the MEC server, or (d) remote execution on the MEC server with the task transmitted via a relay (UH). We proceed to define each possible decision in detail.

Local Execution
If our optimization algorithm decides to execute task $k$ locally at the UE, then the execution time is
$$T^l_k = \frac{b_k c_k}{f^{UE}_k}, \tag{4}$$
where $b_k$ is the data size of task $k$ in bits, $c_k$ denotes the required computational CPU cycles per bit for task $k$, and $f^{UE}_k$ is the UE's CPU cycle frequency allocated to execute task $k$. Based on Equation (4) and the effective switched capacitance $e$, which characterizes the chip architecture [61], the UE battery energy consumption for local execution can be estimated as
$$E^l_k = e^{UE}\, b_k c_k \left(f^{UE}_k\right)^2, \tag{5}$$
whereby we set the default effective switched capacitance $e$ values to $e^{UE} = 10^{-25}$ F and $e^{UH_i} = 0.8 \cdot 10^{-27}$ F in our evaluations in Section 6. We set the decision variable $x_k = 1$ if task $k$ is executed locally; otherwise, $x_k = 0$.

Helper Execution
The task execution process by a helper $UH_i$ consists of two steps. First, the UE transfers task $k$ to the helper $UH_i$ via D2D communication on the up-link, and then task $k$ is executed by $UH_i$. As illustrated in Figure 3, the total execution delay is
$$T^{h_i}_k = \frac{b_k}{r^{UH_i}_k} + \frac{b_k c_k}{f^{UH_i}_k}, \tag{6}$$
whereby $r^{UH_i}_k$ is the transmission rate from the UE to $UH_i$, and $f^{UH_i}_k$ is the CPU cycle frequency of $UH_i$ allocated for the execution of task $k$. We neglect the delay for sending the result back from helper node $UH_i$ to the UE in our model due to the typically small number of output bits. Thus, the total task execution delay $T^{h_i}_k$ consists of the transmission delay $t^{tr}$ for sending the task to the helper and the computation delay at the helper, while the transmission delay for returning the computation result is neglected in our model.
Following Equation (6), the battery energy consumption of the helper execution mode consists of (a) the transmission from the UE to the helper $UH_i$, (b) the helper energy consumption for task execution, and (c) the energy that the UE consumes to wait for receiving the result back from the helper node:
$$E^{h_i}_k = P^{tr}_k \frac{b_k}{r^{UH_i}_k} + e^{UH_i}\, b_k c_k \left(f^{UH_i}_k\right)^2 + P^{wait}_k \frac{b_k c_k}{f^{UH_i}_k}, \tag{7}$$
where $P^{tr}_k$ is the UE transmission power, and $P^{wait}_k$ is the UE idle circuit power while the UE is waiting to receive the result back.
As described in Section 3.1, there are $I$ helpers available for the UE in the area. Let $h^i_k = 1$, $h^i_k \in \mathcal{H}$, indicate that the UE chose to offload task $k$ to $UH_i$, whereby the set $\mathcal{H} = \{h^i_k \mid k \in \mathcal{K}, i \in \mathcal{I}\}$ contains all helper node selection variables. Since task $k$ can only be offloaded to at most one helper at the same time, an offloading selection algorithm should follow the constraint
$$\sum_{i \in \mathcal{I}} h^i_k \leq 1, \quad \forall k \in \mathcal{K}. \tag{8}$$
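To make the trade-off between local and helper execution concrete, the following sketch evaluates the delay and battery energy of both options. It assumes the common dynamic-power model in which the computation energy equals the effective switched capacitance times the number of CPU cycles times the squared CPU frequency; the function names and the task size, CPU frequencies, D2D rate, and power values are hypothetical, while the two capacitance values are the defaults stated in the text.

```python
def local_cost(b, c, f_ue, e_ue):
    """Return (delay in s, UE energy in J) for executing a task on the UE."""
    delay = b * c / f_ue                      # CPU cycles divided by cycle frequency
    energy = e_ue * b * c * f_ue ** 2         # assumed e * cycles * f^2 model
    return delay, energy


def helper_cost(b, c, rate, f_uh, e_uh, p_tr, p_wait):
    """Return (delay in s, battery energy in J) for executing a task on a helper."""
    t_tr = b / rate                           # D2D upload from the UE to the helper
    t_exe = b * c / f_uh                      # execution on the helper
    energy = (p_tr * t_tr                     # (a) UE transmission energy
              + e_uh * b * c * f_uh ** 2      # (b) helper computation energy
              + p_wait * t_exe)               # (c) UE waiting energy
    return t_tr + t_exe, energy


# Hypothetical task: 1 Mbit with 100 CPU cycles/bit.
t_local, e_local = local_cost(b=1e6, c=100, f_ue=1e9, e_ue=1e-25)
t_helper, e_helper = helper_cost(b=1e6, c=100, rate=5e6, f_uh=2e9,
                                 e_uh=0.8e-27, p_tr=0.1, p_wait=0.01)
```

With these illustration values, the helper option is slower than local execution because of the upload time, but it consumes far less battery energy, which is the kind of trade-off that the offloading decision has to resolve under the deadline.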

Server Execution
For task execution on the server, there are two possible paths for offloading the tasks.

Direct Offloading from UE to MEC Server
The time delay for executing task $k$ can be calculated as
$$T^{s_m}_k = \frac{b_k}{r^{S_m}_k} + \frac{b_k c_k}{f^{S_m}_k}, \tag{9}$$
where $r^{S_m}_k$ is the transmission rate from the UE to server $S_m$, and $f^{S_m}_k$ denotes the CPU computation cycle frequency of server $S_m$ for the execution of task $k$. The corresponding UE battery energy consumption is
$$E^{s_m}_k = P^{tr}_k \frac{b_k}{r^{S_m}_k} + P^{wait}_k \frac{b_k c_k}{f^{S_m}_k}. \tag{10}$$
In Equation (10), the MEC server energy consumption for executing task $k$ is not included, since MEC servers do not typically rely on battery power. Throughout this study, the focus is on saving battery energy consumption. Incorporating the saving of energy consumption in the MEC servers, which are powered from the wired grid, is an interesting direction for future research.
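Since there are $M$ servers available for the UE in the area, the UE can choose to offload task $k$ to $S_m$, and in this case, $s^m_k = 1$, $s^m_k \in \mathcal{S}$, whereby the set $\mathcal{S} = \{s^m_k \mid k \in \mathcal{K}, m \in \mathcal{M}\}$ contains all MEC server node selection variables. Since a given task $k$ can only be offloaded to one MEC server (and not split among multiple MEC servers), the offloading selection algorithm should follow the constraint
$$\sum_{m \in \mathcal{M}} s^m_k \leq 1, \quad \forall k \in \mathcal{K}. \tag{11}$$

Offloading from UE via Relay UH to MEC Server
The UE first sends task $k$ to the relay $UH_i$, which then forwards the task to the server $S_m$. The delay consists of three steps for transmissions and computation:
$$T^{s_{i,m}}_k = \frac{b_k}{r^{UH_i}_k} + \frac{b_k}{r^{S_{m,i}}_k} + \frac{b_k c_k}{f^{S_m}_k}, \tag{12}$$
whereby $r^{UH_i}_k$ is the transmission rate from the UE to $UH_i$, and $r^{S_{m,i}}_k$ is the transmission rate from $UH_i$ to server $S_m$. The UE and UH battery energy consumption is
$$E^{s_{i,m}}_k = P^{tr}_k \frac{b_k}{r^{UH_i}_k} + P^{tr,i}_k \frac{b_k}{r^{S_{m,i}}_k} + P^{wait}_k \left(\frac{b_k}{r^{S_{m,i}}_k} + \frac{b_k c_k}{f^{S_m}_k}\right), \tag{13}$$
where $P^{tr,i}_k$ is the transmission power of $UH_i$. We assume that the computation result is sent back directly from server $S_m$ to the UE and that the transmission delay for the result is negligible. We define the selection variables $s^{i,m}_k = 1$, $s^{i,m}_k \in \mathcal{HS}$, for offloading task $k$ via $UH_i$ to $S_m$, whereby the set $\mathcal{HS} = \{s^{i,m}_k \mid k \in \mathcal{K}, i \in \mathcal{I}, m \in \mathcal{M}\}$ contains all relay and MEC server selection variables. Since a given task $k$ can only be serviced via a single helper node $UH_i$ by a single server $S_m$, the offloading selection algorithm should follow the constraint
$$\sum_{i \in \mathcal{I}} \sum_{m \in \mathcal{M}} s^{i,m}_k \leq 1, \quad \forall k \in \mathcal{K}. \tag{14}$$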
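The two server execution paths can be compared with a short sketch. It rests on two assumptions that are labeled in the code: the UE is taken to idle at its waiting power while the server executes the task (and, for the relay path, while the relay forwards the task), and the grid-powered server contributes no battery energy. The function names and all numerical values are hypothetical.

```python
def server_direct_cost(b, c, rate_ue_s, f_s, p_tr, p_wait):
    """Return (delay in s, UE battery energy in J) for direct UE-to-server offloading."""
    t_tr = b / rate_ue_s                      # cellular upload from the UE to the server
    t_exe = b * c / f_s                       # execution on the server
    # Assumption: the UE waits at power p_wait during the server execution;
    # the server energy is excluded because the server is not battery powered.
    return t_tr + t_exe, p_tr * t_tr + p_wait * t_exe


def server_relay_cost(b, c, rate_ue_uh, rate_uh_s, f_s, p_tr, p_tr_uh, p_wait):
    """Return (delay in s, UE plus relay battery energy in J) for offloading via a relay."""
    t_d2d = b / rate_ue_uh                    # D2D upload from the UE to the relay
    t_fwd = b / rate_uh_s                     # forwarding from the relay to the server
    t_exe = b * c / f_s                       # execution on the server
    # Assumption: the UE waits during both the forwarding and the server execution.
    energy = p_tr * t_d2d + p_tr_uh * t_fwd + p_wait * (t_fwd + t_exe)
    return t_d2d + t_fwd + t_exe, energy


# Hypothetical task: 1 Mbit with 100 CPU cycles/bit and a 10 GHz server allocation.
t_direct, e_direct = server_direct_cost(1e6, 100, 1e7, 1e10, 0.1, 0.01)
t_relay, e_relay = server_relay_cost(1e6, 100, 5e6, 1e7, 1e10, 0.1, 0.1, 0.01)
```

For identical transmission powers, the relay path only pays off when the direct cellular link of the UE is considerably worse than the D2D link plus the relay's link, e.g., when the UE would need a much higher transmission power to reach the server directly.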

Mobility Model
Based on recent studies, most user trajectories contain similar patterns [51,62]. To incorporate this insight so as to achieve effective mobility-aware task offloading, we employ machine learning to predict the UE's and the UHs' paths to estimate their available service coverage time. Specifically, we employ the Predicted Endpoint Conditioned Network (PECNet) [63], a deep-learning-based method for predicting socially compliant trajectories that infers the users' destinations to assist the prediction. Using the inferred destinations in addition to the historical data of motion paths enhances the plausibility of the predicted trajectories and yields coherent user trajectories. The main idea of PECNet is to divide the prediction problem into two parts. The first part estimates the potential destinations of users using an endpoint estimation variational autoencoder (VAE). The second part predicts socially compliant trajectories while jointly considering the motion history and potential destinations of all users in the scene.
The PECNet system model [63], which is illustrated in Figure 4, includes three key elements: a past trajectory encoder, an endpoint VAE, and a social pooling module. First, a user's motion histories are encoded via the past trajectory encoder. Then, the result is fed into the endpoint VAE to estimate the user's destination. Subsequently, the social pooling module uses the encoded past trajectory and the estimated destinations of all users to jointly predict the future path of every user in the scene. The final outputs are paths whose future segments (i.e., the predictions) are strongly dependent on the past locations (i.e., the inputs). Similar to prior studies involving mobility prediction, e.g., [12,54], we consider time to be suitably slotted (whereby we consider the typical 0.4 s slot duration for pedestrian mobility). In our scenario, the positions of the UE and a helper node $UH_i$ at time $t$ are defined as $X^{UE}_t = (x_t, y_t)$ and $X^{UH_i}_t = (x^i_t, y^i_t)$, respectively. The trajectories of the UE and helpers $UH_i$ from time $(t - n + 1)$ to $t$ are:
$$X^{UE} = \{X^{UE}_{t-n+1}, \ldots, X^{UE}_t\}, \qquad X^{UH_i} = \{X^{UH_i}_{t-n+1}, \ldots, X^{UH_i}_t\}.$$
This information can be collected by the macro BS. The PECNet takes as input the trajectories of the UE and helpers $UH_i$ and outputs the predicted movements from time $(t + 1)$ to $(t + l)$:
$$\hat{X}^{UE} = \{\hat{X}^{UE}_{t+1}, \ldots, \hat{X}^{UE}_{t+l}\}, \qquad \hat{X}^{UH_i} = \{\hat{X}^{UH_i}_{t+1}, \ldots, \hat{X}^{UH_i}_{t+l}\},$$
where each predicted position is again a pair of 2D coordinates. The sojourn time $T^{h_i}_{s,t}$ of the UE in $UH_i$'s coverage from time $t$ is then obtained from these predicted positions via Equation (20), i.e., as the time span for which the UE is predicted to remain within the coverage radius $R_i$ of $UH_i$. The same process can be utilized to obtain the sojourn time $T^{s_m}_{s,t}$ of the UE in the coverage of MEC server $S_m$, as well as the sojourn time $T^{s_{i,m}}_{s,t}$ of $UH_i$ in the coverage of MEC server $S_m$.
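One plausible way to turn two predicted trajectories into a sojourn time is to count the consecutive predicted time slots during which the two nodes stay within the coverage radius. The sketch below follows this assumption; the function name `sojourn_time` and the example trajectories are hypothetical, the predicted positions would in practice come from the trajectory predictor, and only the 0.4 s slot duration and the 50 m D2D range are taken from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sojourn_time(pred_a: List[Point], pred_b: List[Point],
                 radius_m: float, slot_s: float = 0.4) -> float:
    """Time (in s) until the predicted distance between two nodes first exceeds radius_m.

    Assumption: the sojourn time is the number of consecutive in-range
    predicted slots, counted from the first predicted slot, times the slot duration.
    """
    slots = 0
    for (xa, ya), (xb, yb) in zip(pred_a, pred_b):
        if math.hypot(xa - xb, ya - yb) > radius_m:
            break                              # coverage is lost from this slot onward
        slots += 1
    return slots * slot_s


# Hypothetical prediction over 8 slots: the UE moves away from a helper that stays put.
# A stationary MEC server can be handled the same way by repeating its position.
ue_pred = [(10.0 * n, 0.0) for n in range(1, 9)]
uh_pred = [(0.0, 0.0)] * 8
t_sojourn = sojourn_time(ue_pred, uh_pred, radius_m=50.0)
```

Note that the prediction horizon $l$ caps the sojourn time that such a computation can report: if the two nodes stay in range for all $l$ predicted slots, the returned value is a lower bound on the actual sojourn time.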
As we observe from Figure 1, due to the users' mobility, the service coverage is limited; however, in our method, by defining the concept of sojourn time $T_{s,t}$ and finish time $FT_k$ of the tasks, users only choose a destination if the period of the coverage availability ($T_{s,t}$) is longer than the required time ($FT_k$) for the task execution.
The validations in [63] indicate that PECNet is highly accurate. We assume that the PECNet predictions are correct, similar to related studies that assume perfectly correct mobility predictions, e.g., [48,53]. Various mechanisms for mitigating any rarely occurring prediction errors can be examined in future research. One strategy for reducing the chance of prediction errors could be to impose "safety margins" on the predicted sojourn times. Another strategy could be to classify the tasks according to their level of criticality and to classify the helpers according to their reliability; then, tasks could be assigned under consideration of the task criticality and helper reliability to minimize potential offloading failures.

Dynamic Computation Offloading Problem Formulation
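Considering the offloading decision options defined in Section 3, our joint cost optimization of the computation and communication cooperation in the device-enhanced MEC system for the execution of task $k$ considering the task's execution deadline can be defined based on the battery energy consumption for the execution of task $k$:
$$E^{exe}_k = x_k E^l_k + \sum_{i \in \mathcal{I}} h^i_k E^{h_i}_k + \sum_{m \in \mathcal{M}} s^m_k E^{s_m}_k + \sum_{i \in \mathcal{I}} \sum_{m \in \mathcal{M}} s^{i,m}_k E^{s_{i,m}}_k, \tag{21}$$
where $x_k$, $h^i_k$, $s^m_k$, and $s^{i,m}_k$ represent the binary variables for the offloading decision according to the strategies introduced in Section 3.4. For the execution of task $k$, only one of these variables can be 1, and the rest are 0, giving the task execution time
$$T^{exe}_k = x_k T^l_k + \sum_{i \in \mathcal{I}} h^i_k T^{h_i}_k + \sum_{m \in \mathcal{M}} s^m_k T^{s_m}_k + \sum_{i \in \mathcal{I}} \sum_{m \in \mathcal{M}} s^{i,m}_k T^{s_{i,m}}_k. \tag{22}$$
Hence, the finish time $FT_k$ of task $k$ is obtained by inserting $T^{exe}_k$ from Equation (22) into Equation (1).
The optimization problem of minimizing the energy cost considering the service execution deadline, the mobility of the end devices, and the task dependencies can be formulated based on the total energy consumption $E^{exe}_{tot} = \sum_{k \in \mathcal{K}} E^{exe}_k$:
$$\begin{aligned} \text{OP1}: \quad & \min\; E^{exe}_{tot} \\ \text{s.t.} \quad & \text{C1}: x_k, h^i_k, s^m_k, s^{i,m}_k \in \{0, 1\}, \quad \forall k \in \mathcal{K}, i \in \mathcal{I}, m \in \mathcal{M}; \\ & \text{C2}: x_k + \sum_{i \in \mathcal{I}} h^i_k + \sum_{m \in \mathcal{M}} s^m_k + \sum_{i \in \mathcal{I}} \sum_{m \in \mathcal{M}} s^{i,m}_k = 1, \quad \forall k \in \mathcal{K}; \\ & \text{C3}: ST_k \text{ according to Equation (2)}, \quad \forall k \in \mathcal{K}; \\ & \text{C4}: FT_K \leq T^{max}_d; \\ & \text{C5}: FT_k \leq T_{s,k}, \quad \forall k \in \mathcal{K}, \end{aligned} \tag{23}$$
where the minimization is over the offloading decision variables, and $[ST_1, \ldots, ST_K]$ are the start times. The decision-making variables are present in the constraints C1 and C2. Constraint C3 defines the start time for each task $k$ based on its predecessor tasks. The execution of tasks without predecessor tasks can start immediately, and the start time of all other tasks equals the maximum finish time of their respective predecessors. Constraint C4 indicates that the finish time of the last task $K$ should be less than or equal to the total time deadline $T^{max}_d$ of the application. Constraint C5 represents that an offloading destination may only be chosen if the sojourn time $T_{s,k}$ of the UE in the range of the computation resource is longer than the time $FT_k$ needed to finish executing task $k$. Utilizing Equation (20), the sojourn time for the different offloading options can be calculated as
$$T_{s,k} = \begin{cases} T^{h_i}_{s,t}, & h^i_k = 1, \\ T^{s_m}_{s,t}, & s^m_k = 1, \\ \min\left\{T^{h_i}_{s,t},\, T^{s_{i,m}}_{s,t},\, T^{s_m}_{s,t}\right\}, & s^{i,m}_k = 1. \end{cases} \tag{24}$$
In Equation (24), $T^{h_i}_{s,t}$ denotes the UE sojourn time in the range of helper node $UH_i$, $T^{s_m}_{s,t}$ is the UE sojourn time in the range of MEC server $S_m$, and $T^{s_{i,m}}_{s,t}$ is the sojourn time of helper $UH_i$ in the coverage area of MEC server $S_m$. If the decision is made to offload task $k$ to either the helper node $UH_i$ ($h^i_k = 1$) or MEC server $S_m$ ($s^m_k = 1$), the finish time $FT_k$ of task $k$ should be shorter than the sojourn time $T_{s,k}$ in the coverage of that computation resource, meaning that any chosen resource should remain in range until the completion of the execution of task $k$.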
The offloading to server $S_m$ via relaying by helper node $UH_i$ requires that the UE is in the coverage area of the helper (with corresponding sojourn time $T^{h_i}_{s,t}$), that the helper is in the coverage of the server ($T^{s_{i,m}}_{s,t}$), and that the UE is still in the coverage of the server ($T^{s_m}_{s,t}$) for directly returning the computation results from the server to the UE.
We further note that the sojourn time $T_{s,k}$ in constraint C5 depends on the time $t$, as explicitly indicated by the $t$ subscript on the right-hand side of Equation (24) and in Equation (20). In order to avoid notational clutter, we have omitted the $t$ subscript on the left-hand side of Equation (24) and in the optimization problem in Equation (23). However, we emphasize that the optimization problem in Equation (23) is solved in an online fashion at a particular time $t$ when a UE request for computing a set of $K$ tasks arrives. The start time $ST_k$ of a task (with $ST_1 = 0$) is relative to the UE request arrival time $t$.
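For very small instances, the structure of the optimization problem can be checked by exhaustive enumeration. The sketch below is such a brute-force reference and is explicitly not the EETO algorithm: it tries every combination of offloading options, discards the combinations that violate the deadline or a sojourn time limit, and keeps the combination with the lowest total energy. The function names, the option labels, and the per-option tuples (execution time, energy, sojourn time) are hypothetical, and the tasks are assumed to be numbered in a topological order of the dependency graph.

```python
import itertools
import math


def evaluate(choice, options, predecessors, deadline):
    """Return the total energy of a strategy, or None if it is infeasible."""
    finish, energy = {}, 0.0
    for k in sorted(options):                  # assumes a topological task numbering
        t_exe, e_exe, sojourn = options[k][choice[k]]
        start = max((finish[j] for j in predecessors.get(k, [])), default=0.0)
        finish[k] = start + t_exe
        if finish[k] > sojourn:                # resource leaves coverage before completion
            return None
        energy += e_exe
    if finish[max(options)] > deadline:        # the last task misses the deadline
        return None
    return energy


def brute_force(options, predecessors, deadline):
    """Enumerate all strategies; return (minimum energy, best choice)."""
    tasks = sorted(options)
    best = (math.inf, None)
    for combo in itertools.product(*(options[k].keys() for k in tasks)):
        choice = dict(zip(tasks, combo))
        energy = evaluate(choice, options, predecessors, deadline)
        if energy is not None and energy < best[0]:
            best = (energy, choice)
    return best


# Hypothetical two-task chain; each option is (execution time, energy, sojourn time).
options = {
    1: {"local": (1.0, 5.0, math.inf), "helper": (0.6, 1.0, 0.5)},
    2: {"local": (1.0, 5.0, math.inf), "server": (0.4, 0.5, 10.0)},
}
best = brute_force(options, predecessors={2: [1]}, deadline=1.5)
```

In this example, the cheap helper option for task 1 is rejected because the helper leaves the D2D range before the task would finish, and running both tasks locally misses the deadline, so the only feasible strategy is local execution of task 1 followed by server execution of task 2. The number of combinations grows exponentially with the number of tasks, which is why an efficient solution approach is needed for realistic problem sizes.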

Solution of Dynamic Computation Offloading Optimization Problem
The optimization problem OP1 is a non-convex mixed integer nonlinear programming problem due to the binary constraints and, therefore, hard to solve in polynomial time [64]. To facilitate an efficient solution, it should first be transformed to an equivalent quadratically constrained quadratic programming (QCQP) problem. Then, the binary offloading decisions can be recovered using a semidefinite relaxation (SDR) and stochastic mapping method. The QCQP can be efficiently solved, e.g., with conic optimization solvers [65], especially with accelerators [66,67]. On a contemporary personal computer with a moderate performance level, the QCQP solution takes on the order of 1 second for the typical problems considered in the evaluation in Section 6. With the appropriate computing hardware at BSs and suitable accelerators, solution times on the order of 0.1 seconds or less are anticipated, making the online solution of the optimization feasible for scenarios with pedestrian-level mobility. We outline additional strategies for speeding up the offloading decision making in future research in Section 7.2.

Conversion of OP1 into QCQP
The first conversion step for OP1 is to convert the integer constraints to a quadratic formulation:
$$x_k (x_k - 1) = 0, \quad h^i_k (h^i_k - 1) = 0, \quad s^m_k (s^m_k - 1) = 0, \quad s^{i,m}_k (s^{i,m}_k - 1) = 0.$$
With these quadratic constraint formulations, the QCQP transformation OP2 of OP1 retains the constraints C2, C4, and C5 and replaces C3 by
C3$_0$: $ST_k = 0$, $\forall j \in \mathcal{P}(k)$, $\mathcal{P}(k) = \emptyset$, $\forall k \in \mathcal{K}$;
C3$_1$: $ST_k - FT_j \geq 0$, $\forall j \in \mathcal{P}(k)$, $\mathcal{P}(k) \neq \emptyset$, $\forall k \in \mathcal{K}$;
where we reformulated C1 and vectorized C3. Next, we define a vector $v$ of dimension $((2 + I + M + IM)K + 1) \times 1$ as $v = [\alpha, \beta, 1]^T$, and a standard unit vector $e_j$ with the $j$th entry equal to 1 and the same dimension as $v$. Here, $\alpha$ collects the $(1 + I + M + IM)K$ offloading decision variables, i.e., the set of decisions to which devices the tasks should be offloaded to or through, and $\beta$ collects the $K$ start times; $n_0$ is the vector of energy consumptions associated with a full set of decisions, extended by a row of zeros for the optimizer, and the vectors $n_{1k}$, $n_2$, $n_T$, and $n_{T_s}$ are the reformulated equations for the constraints, consisting of standard unit vectors as well as task execution times and sojourn times. The vector $n_{1k}$ corresponds to the $(1 + I + M + IM)$ probabilities of the offloading strategies for task $k$ and additionally the task $k$ deadline (used in C5). In the definition of $n_2$, $T^l_K$ is the local task $K$ execution time from Equation (4), $T^{h_i}_K$ is the $UH_i$ task $K$ execution time from Equation (6), $T^{s_m}_K$ is the server $m$ task $K$ execution time from Equation (9), and $T^{s_{i,m}}_K$ is the task $K$ execution time with relaying via $UH_i$ to server $m$ from Equation (12); these execution times for the various tasks $k$, $k = 1, \ldots, K$, are also used in the definition of $n_T$. The vector $n_{T_s}$ of the sojourn times has the same dimension as $n_T$. Thus, the QCQP transformation OP3 of OP2 can be written in terms of $v$, e.g.,
C3$_0$: $(e_{(1+I+M+IM)K+1})^T v = 0$, $\forall j \in \mathcal{P}(k)$, $\mathcal{P}(k) = \emptyset$, $\forall k \in \mathcal{K}$.
For the further change to the homogeneous QCQP OP4, we define $g = [v^T\ 1]^T$, $n_1 = \mathrm{diag}(n_{1k})$, $n_3 = \mathrm{diag}(n_3)$, $a = (2 + I + M + IM)K$, $b = (1 + I + M + IM)K + 1$, and $c = (1 + I + M + IM)K + k$.
In order to convert the necessary constraints for the variety of device roles into the standard form for QCQP solvers, we convert all constraints into matrix form. Then, the homogeneous minimization problem OP4 has the constraints
C2: $g^T M_2 g = 1$, $\forall k \in \mathcal{K}$;
C3$_0$: $g^T M_3 g = 0$, $\forall j \in \mathcal{P}(k)$, $\mathcal{P}(k) = \emptyset$, $\forall k \in \mathcal{K}$;
C3$_1$: $g^T M_3 g \geq 0$, $\forall j \in \mathcal{P}(k)$, $\mathcal{P}(k) \neq \emptyset$, $\forall k \in \mathcal{K}$;
C4: $g^T M_4 g \leq T^{max}_d$;
C5: $g^T M_5 g \leq 0$, $\forall k \in \mathcal{K}$.
For the final conversion step, we define the symmetric, positive semidefinite matrix $G = g g^T$ and write OP4 as the problem OP5 in Equation (31), with $\mathrm{Tr}(\cdot)$ denoting the trace of a square matrix. While OP5 has a few additional constraints, the individual constraints are much easier for the solver to handle. For example, constraints C6-C9 are simple checks for single elements of the matrix $G$.
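The lifting to the matrix $G$ rests on the trace identity for quadratic forms. The following toy illustration uses a single binary variable $x$ with $g = [x, 1]^T$; the matrix $M$ is a hypothetical example chosen for this illustration and is not one of the constraint matrices of OP4. It shows how the quadratic binary constraint becomes linear in the entries of $G$:

```latex
\begin{align}
g^T M g &= \mathrm{Tr}\left(g^T M g\right) = \mathrm{Tr}\left(M g g^T\right) = \mathrm{Tr}(M G), \\
g &= \begin{bmatrix} x \\ 1 \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 0 \end{bmatrix}
\;\Rightarrow\; g^T M g = x^2 - x = x(x - 1), \\
G &= g g^T = \begin{bmatrix} x^2 & x \\ x & 1 \end{bmatrix}
\;\Rightarrow\; \mathrm{Tr}(M G) = G_{11} - G_{12} = 0 .
\end{align}
```

After the lifting, the objective and the constraints are linear in $G$, and the only remaining non-convex requirement is that $G$ has rank 1, i.e., that it can be factored as $g g^T$.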

Energy-Efficient Task Offloading (EETO) Algorithm
In this section, we propose a stochastic mapping method to obtain the optimized offloading strategy according to the inter-task dependency and the whole application completion time $T^{max}_d$. We apply SDR by dropping the last non-convex constraint of rank 1 to obtain an approximate solution $\tilde{G}$, the last row of which includes $[\alpha, \beta, 1, 1]$. If $\mathrm{rank}(\tilde{G}) = 1$, then we directly extract $\alpha$ as an offloading decision for all tasks from the last row of $\tilde{G}$. Otherwise, we employ a probability-based stochastic mapping method to recover the solution. For each task $k$, we select the largest value of each offloading decision from the elements group with index $k$ in $\alpha$ and denote them as $t_1$, $t_2$, $t_3$, and $t_4$. We map $t_1$, $t_2$, $t_3$, and $t_4$ with the probability-based stochastic mapping method in Equations (32) and (33), where $Q_1$, $Q_2$, $Q_3$, and $Q_4$ are the probabilities of the corresponding offloading decision being 1. With Equations (32) and (33), we stochastically map the probabilities that the corresponding offloading decision would be selected with $Q_1$, $Q_2$, $Q_3$, and $Q_4$, respectively. Then, we randomly set one of the elements to 1 according to these probabilities, while we set the remaining elements to 0, which means that only one offloading decision is selected for the currently considered task. For each task, we perform the same stochastic mapping, and after a decision has been made for each task, we obtain the offloading strategy $\alpha$. Furthermore, we compare $FT_K$ with $T^{max}_d$, and the strategy can be a final solution only if $FT_K \leq T^{max}_d$. For higher accuracy, we repeat the process $L$ times to obtain a set of solutions and select the solution that yields the minimum energy cost. The algorithm can be summarized as Algorithm 1.
We note that if none of the task execution strategies from Section 3.4 is feasible for a task, then the task is too complex and the solution of the QCQP fails, i.e., there is no feasible solution for the EETO algorithm.

Algorithm 1
Input: Parameters (see Table 1)
Output: Offloading strategy α
Initialize: Predict the paths of the UE and the helpers with PECNet from the historical paths X_UE, X_UH_i; calculate the sojourn times T_{s,k} with Equation (24). Initialize all matrices in Equation (31).
Solve the SDP in Equation (31) without the rank-1 constraint to get the optimal relaxed solution G;
Extract the first (1 + I + M + IM)K elements of the last row of the relaxed G as the relaxed α;
if rank(G) == 1 then
    α = relaxed α;
else
    for l = 1 : L do
        for k = 1 : K do
            s(k) = the (1 + I + M + IM) elements in α(l) related to task k;
            Perform the probability-based stochastic mapping: set one element of s(k) to 1 and the others to 0;
            Compute the current FT_k from α(l) and Equations (1) and (22);
            if FT_k > T_{s,k} then
                Discard α(l);
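The recovery step can be illustrated with a minimal Python sketch. It makes several assumptions: the relaxed values of each task are clipped to [0, 1] and used directly as sampling weights (the actual mapping in Equations (32) and (33) is not reproduced here), and `finish_time` and `energy` are hypothetical callables standing in for the model's finish time and energy expressions. The toy costs and delays are made up for illustration only.

```python
import random
from typing import Callable, List, Optional, Sequence

Strategy = List[List[int]]  # one one-hot row of offloading options per task


def stochastic_mapping(
    alpha_relaxed: Sequence[Sequence[float]],
    finish_time: Callable[[Strategy], float],
    energy: Callable[[Strategy], float],
    deadline: float,
    num_trials: int,
    rng: random.Random,
) -> Optional[Strategy]:
    """Return the lowest-energy feasible one-hot strategy, or None if none is found."""
    best, best_energy = None, float("inf")
    for _ in range(num_trials):
        candidate: Strategy = []
        for values in alpha_relaxed:
            # Assumption: clipped relaxed values act as (unnormalized) probabilities.
            weights = [min(max(v, 0.0), 1.0) for v in values]
            if sum(weights) == 0.0:
                weights = [1.0] * len(values)  # fall back to a uniform draw
            chosen = rng.choices(range(len(values)), weights=weights, k=1)[0]
            # Exactly one option per task is set to 1, all others to 0.
            candidate.append([1 if j == chosen else 0 for j in range(len(values))])
        if finish_time(candidate) > deadline:
            continue  # discard candidates that miss the deadline
        e = energy(candidate)
        if e < best_energy:
            best, best_energy = candidate, e
    return best


# Toy usage: two tasks with three options each (all numbers are illustrative).
relaxed = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
cost = [1.0, 0.5, 0.8]    # hypothetical energy per option
delay = [2.0, 1.0, 1.5]   # hypothetical delay per option


def picks(strategy: Strategy) -> List[int]:
    return [row.index(1) for row in strategy]


strategy = stochastic_mapping(
    relaxed,
    finish_time=lambda s: sum(delay[j] for j in picks(s)),
    energy=lambda s: sum(cost[j] for j in picks(s)),
    deadline=3.0,
    num_trials=200,
    rng=random.Random(0),
)
```

Repeating the draw `num_trials` times plays the role of the L repetitions: more repetitions raise the chance of finding a low-energy feasible strategy at the cost of more evaluations.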

EETO Evaluation
This section presents simulation results to evaluate the performance of our joint communication and computation cooperation offloading method (EETO) for mobile helper nodes with dependent computation tasks.

Simulation Setup
The simulations employ the parameters in Table 1. The UE and the UHs are initially randomly distributed and follow the mobility patterns of the publicly available PECNet dataset [63]. The coverage ranges of the BSs (servers) and of each user end device are 400 m and 50 m, respectively, i.e., the UE can reach a BS that is up to 400 m away, but can only reach a UH_i, i = 1, . . . , I, that is 50 m or less away. We employ the PECNet model for the Stanford Drone Dataset, which is a common mobility benchmark containing over 11,000 unique pedestrians in a university campus setting [63]. Initially, we set the UE mobility speed to 1 m/s. As outlined in Section 2.3, there is no prior benchmark for offloading dependent computation tasks in a cooperative edge computing setting. Therefore, we compare EETO with the following four alternative task offloading approaches for the dependent-task edge computing setting: all local (ALO), all server (ASO), computation cooperation optimization (CPCO), and communication cooperation optimization (CMCO). In ALO, all tasks are executed locally on the UE (corresponding to local execution, Section 3.4.1). In ASO, all tasks are executed remotely on MEC servers (i.e., directly offloaded from the UE to an MEC server, Section 3.4.3.1). In CPCO, the computation resources of adjacent devices can be used, and therefore, the UH_i can act as helper nodes, i.e., the options from Sections 3.4.1, 3.4.2, and 3.4.3.1 are available to the optimization. In CMCO, the UH_i can only act as relay nodes that transmit the computation tasks to an MEC server, i.e., the options from Sections 3.4.1, 3.4.3.1, and 3.4.3.2 are available to the optimization. Thus, the main difference between CPCO and CMCO pertains to the function of the helper nodes UH_i: in CPCO, the UH_i help only with task execution (computation) but not with relay communication, whereas in CMCO, the UH_i help only with relay communication for task offloading to the MEC server but not with task execution.
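The option sets of the five compared approaches and the two coverage ranges can be summarized in a short Python sketch. The identifiers (`Option`, `ALLOWED`, `reachable_options`) are hypothetical names chosen for illustration. As a simplification, the sketch does not check whether a relaying helper can itself reach a BS.

```python
from enum import Enum


class Option(Enum):
    LOCAL = "3.4.1"            # local execution on the UE
    HELPER_EXEC = "3.4.2"      # execution on a helper UH_i
    SERVER_DIRECT = "3.4.3.1"  # direct offloading from the UE to an MEC server
    SERVER_RELAY = "3.4.3.2"   # offloading to an MEC server via a helper relay


# Execution options available to each approach, as described in the setup.
ALLOWED = {
    "ALO": {Option.LOCAL},
    "ASO": {Option.SERVER_DIRECT},
    "CPCO": {Option.LOCAL, Option.HELPER_EXEC, Option.SERVER_DIRECT},
    "CMCO": {Option.LOCAL, Option.SERVER_DIRECT, Option.SERVER_RELAY},
    "EETO": set(Option),
}

BS_RANGE_M = 400.0   # coverage range of a BS (server)
D2D_RANGE_M = 50.0   # coverage range of a user end device


def reachable_options(method: str, dist_bs_m: float, dist_helper_m: float) -> set:
    """Options of `method` that remain usable at the given distances (in meters)."""
    options = set(ALLOWED[method])
    if dist_bs_m > BS_RANGE_M:
        options.discard(Option.SERVER_DIRECT)
    if dist_helper_m > D2D_RANGE_M:
        options -= {Option.HELPER_EXEC, Option.SERVER_RELAY}
    return options
```

For example, `reachable_options("CMCO", 500.0, 30.0)` leaves only local execution and the relay option, since the BS is out of direct reach of the UE.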
We ran 50 independent simulation replications for each evaluation. The resulting 95% confidence intervals are less than 5% of the corresponding sample means and are omitted from the plots to avoid visual clutter.
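A check of this kind can be sketched in Python as follows. The sketch assumes a Student-t confidence interval over the replications and interprets the 5% bound as the half-width relative to the sample mean; both are assumptions, since the exact interval construction is not stated here. The sample values are synthetic.

```python
import math
import random
import statistics

# Two-sided 95% Student-t quantile for 49 degrees of freedom (50 replications).
T_975_DF49 = 2.0096


def ci_half_width(samples, t_quantile=T_975_DF49):
    """Half-width of the confidence interval for the sample mean."""
    return t_quantile * statistics.stdev(samples) / math.sqrt(len(samples))


def relative_half_width(samples, t_quantile=T_975_DF49):
    """Half-width expressed as a fraction of the sample mean."""
    return ci_half_width(samples, t_quantile) / statistics.mean(samples)


# Synthetic energy values for 50 replications (illustrative only).
rng = random.Random(1)
energies = [rng.gauss(10.0, 1.0) for _ in range(50)]
rel = relative_half_width(energies)
print(f"relative half-width: {rel:.3f}")
```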

Impact of Task Complexity c_k and Size b_k
The average battery energy consumption as a function of the CPU cycles c_k needed to execute each bit of task k is shown for the five methods in Figure 5a. We observe from Figure 5a that the ALO energy consumption increases linearly as the computations c_k per bit increase. Specifically, when the required CPU cycles c_k per bit are less than 10 cycles/bit, ALO attains the minimum energy consumption. The left part of Table 2 indicates that when the required computations c_k increase to 15 cycles/bit, ALO can still satisfy the deadline requirements (marked in blue); however, the energy consumption is higher than for the other algorithms (see Figure 5a). The left part of Table 2 also indicates that for computation-intensive tasks with c_k ≥ 20 cycles/bit, ALO fails to finish the task execution within the deadline T_d^max (marked in red), which clearly demonstrates the need for a task offloading method.
Figure 5. Average battery energy consumption E_tot^exe vs. task computation demands c_k and data size b_k; default b_k, c_k, and T_d^max from Table 1; random task dependency. (a) Task computation demands c_k, (b) data size b_k.
Figure 5a indicates that the ASO battery energy consumption is nearly constant with a slight increase. In the ASO case, the UE consumes battery energy for transmitting the data directly to the server. When the computations c_k per bit increase, the UE consumes slightly more energy while waiting for the server (however, the energy consumption for waiting is far lower than the transmission energy). For simple tasks (with small c_k), the task transmission to the MEC server (directly, ASO, or indirectly via helper relays, CMCO) consumes more UE and UH battery energy than executing the simple tasks locally (ALO) or on nearby helpers (CPCO). However, as tasks become more complex (i.e., as c_k increases), the battery energy consumption for task execution on nearby helpers (CPCO) starts to exceed the battery energy consumption for transmitting the tasks via helpers (CMCO) to the MEC server.
We observe from Figure 5a that our proposed EETO method achieves the lowest average battery energy consumption among all approaches. EETO optimally trades off between the computation cooperation (CPCO) and the communication cooperation (CMCO) functionalities of the helper nodes, i.e., between all task execution options in Section 3.4, and thus achieves the minimal average battery energy consumption across the full range of task complexities, i.e., required computation cycles c_k per bit for task k. Importantly, in typical heterogeneous operating scenarios with a wide range of task complexities c_k (as further examined in Figure 8), EETO makes the optimal decision for each task k depending on the individual characteristics of the task and can thus extract substantial energy consumption reductions compared to the CPCO and CMCO benchmarks, which can exploit only one type of cooperation.
Figure 5b shows that when the task data size b_k increases, more battery energy is required to offload the tasks to the helper or server, and more computation resources are required to complete the execution process. Therefore, the average battery energy consumption of all approaches increases as b_k increases. However, the EETO energy consumption is lower than that of the other methods for each data size b_k. The right part of Table 2 indicates that for increasing task data size b_k, ALO cannot satisfy the latency requirements, thus demonstrating again the need for an offloading method.

Impact of Transmission Power P_k^tr and UE Speed
Figure 6a shows the average battery energy consumption as a function of the transmit power P_k^tr, indicating an overall growing trend of the average battery energy consumption with increasing P_k^tr. This is mainly because P_k^tr influences the transmission data rate logarithmically (see Equation (3)) and the battery energy consumption for task offloading linearly (see Equations (7), (10), and (13)).
Figure 6. Average battery energy consumption E_tot^exe vs. transmit power and UE speed; fixed parameters: b_k, c_k, T_d^max (and P_k^tr for plot (b)) from Table 1; random task dependency (UE speed 1 m/s for plot (a)). (a) Transmit power P_k^tr, (b) UE speed.
We observe from Figure 6a that for a low P_k^tr = 45 mW, CMCO and EETO consume only about two-thirds of the battery energy of CPCO and less than half of the battery energy of ASO, mainly due to the energy-efficient relay task transmission to the MEC servers (however, a low P_k^tr reduces the offloading speed and makes it harder to meet the task deadlines). In contrast, for a high transmission power, EETO and CPCO achieve substantial battery energy savings compared to ASO and CMCO by avoiding the energy-expensive direct transmission to an MEC server and the relay transmission by a helper (whereby CMCO transmits the task data twice, once from the UE to the UH and then from the UH to the MEC server).
Figure 6b shows the average battery energy consumption as a function of the UE speed, while the UH_i are maintained at their default speeds, which have an average of roughly 0.7 m/s in the Stanford data set [63]. We observe from Figure 6b an initially decreasing energy consumption as the UE speed increases to 1.5 m/s and then an increasing energy consumption as the UE speed increases above 1.5 m/s. This is mainly because a UE with a moving speed similar to that of the surrounding UH_i typically has more UH_i in its vicinity for a longer sojourn time. Importantly, Figure 6b indicates that EETO consistently reduces the battery energy consumption compared to CPCO and CMCO across the full range of considered UE speeds.

Impact of Task Dependency
Figure 7a,b show the finish time FT_K and the average energy consumption E_tot^exe of the five algorithms for various task dependency graphs, namely sequential, random, and parallel. In the sequential dependency graph, each task k has one predecessor, and the predecessor task needs to be completed before the execution of task k can start. In contrast, in the random dependency graph, each task k can have zero or multiple predecessor tasks, which requires entirely different offloading decisions compared to the sequential dependency graph. Since multiple tasks can be executed at the same time with the random and parallel graphs, the execution time is shorter than for the sequential graph for all methods (for ALO, the sequential finish time is 21.2 s, i.e., outside the range plotted in Figure 7a). We also observe that EETO does not always have the shortest execution time, since its aim is to minimize the energy while keeping the time within the deadline T_d^max. However, the EETO execution time is very close to the shortest execution time.
Figure 7. Impact of task dependency graph; fixed parameters: uniform random data size b_k = 280-320 KB, c_k = 30-50 cycles/bit (uniform random), task deadline T_d^max.
Focusing on the energy consumption, we observe from Figure 7b that the parallel task dependency leads to the lowest energy consumption. This reduction in energy consumption is mainly due to the reduced waiting time, i.e., the more independent the tasks are, the less energy is consumed for waiting for the completion of predecessor tasks. Overall, we observe from Figure 7b that among all methods and all task dependency cases, our proposed EETO achieves the lowest battery energy consumption.

Impact of Task Deadline T_d^max
Figure 8 shows the average battery energy consumption as a function of the task completion deadline T_d^max for two different types of task heterogeneity. Figure 8a considers a wide range of computational complexities c_k of the individual tasks k, k = 1, . . . , K, that are drawn uniformly randomly over the range [15, 55] cycles/bit with a prescribed latency requirement T_d^max for the entire set of K tasks. ASO and CMCO execute all tasks on MEC servers within 2 s; thus, even if the deadline T_d^max increases, the energy consumption remains essentially constant. However, an increasing deadline T_d^max allows energy-efficient helpers to execute tasks, even if they have long processing times, thus reducing the CPCO and EETO energy consumption. By optimally trading off the task execution options, EETO achieves moderate energy reductions for the heterogeneous task scenarios in Figure 8 compared to the minimum of the CMCO and CPCO energy consumptions. Importantly, EETO flexibly achieves the minimum energy consumption across the entire examined T_d^max range.
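The interplay between the task dependency graph, the finish time, and the deadline discussed above can be illustrated with a small Python sketch. It is a simplified illustration, not the finish-time model of the article: it assumes that a task starts as soon as all its predecessors have finished, that independent tasks run fully in parallel, and that transmission delays are ignored. The execution times, graphs, and deadline are made-up values.

```python
from typing import Dict, List


def finish_times(exec_time: Dict[int, float], preds: Dict[int, List[int]]) -> Dict[int, float]:
    """Finish time of each task; a task starts once all its predecessors are done.

    Assumes task IDs are topologically ordered (predecessors have smaller IDs).
    """
    ft: Dict[int, float] = {}
    for k in sorted(exec_time):
        ready = max((ft[p] for p in preds.get(k, [])), default=0.0)
        ft[k] = ready + exec_time[k]
    return ft


# Illustrative execution times (seconds) of four tasks.
exec_time = {1: 1.0, 2: 2.0, 3: 3.0, 4: 1.0}
sequential = {2: [1], 3: [2], 4: [3]}   # chain: each task has one predecessor
mixed = {3: [1, 2], 4: [3]}             # zero or multiple predecessors
parallel: Dict[int, List[int]] = {}     # no dependencies
deadline = 6.5                          # hypothetical deadline in seconds

for name, graph in [("sequential", sequential), ("mixed", mixed), ("parallel", parallel)]:
    last = max(finish_times(exec_time, graph).values())
    status = "meets" if last <= deadline else "misses"
    print(f"{name}: finish time {last} s, {status} the deadline")
```

In this toy example, only the sequential chain misses the deadline, which mirrors the observation that more independent tasks shorten the finish time and leave more room for energy-efficient execution options.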

Summary of This Article
We developed and evaluated the novel Energy-Efficient Task Offloading (EETO) algorithm for dependent computation tasks in a cooperative edge computing setting with mobile end devices that contribute computation and communication relay support in a sliced edge network environment. EETO accommodates arbitrary task dependencies that are characterized by a general task dependency graph and employs a deep-learning-based trajectory prediction for the device sojourn times in the wireless transmission ranges. We have formulated the task offloading optimization problem as a quadratically constrained quadratic programming (QCQP) problem and developed a solution strategy that obtains the task offloading decisions through a semidefinite relaxation and stochastic mapping from the QCQP solution.
The simulation evaluations indicate that EETO consistently achieves low battery energy consumption across heterogeneous parameter settings and scenarios. In particular, EETO substantially outperforms naive benchmarks that compute all tasks locally or offload all tasks to MEC servers. EETO also outperforms benchmarks with a limited set of offloading decision options; specifically, we considered benchmarks that allow helper nodes to function only as task processing (computation) nodes (CPCO) or only as task communication (relay) nodes for offloading to an MEC server (CMCO). We found that the CPCO and CMCO benchmarks attain the EETO performance in some scenarios, while in other scenarios, EETO achieves significant performance gains.

Limitations and Future Research Directions
A limitation of the developed EETO algorithm is the computational effort for solving the QCQP, as noted in Section 5. One strategy for simplifying the QCQP could be to reduce the considered task offloading options for scenarios where a limited set of options achieve or nearly achieve the optimal EETO performance. For instance, a machine learning approach could learn the scenarios where a limited set of offloading options, such as the naive options or CPCO or CMCO, closely approach the EETO performance and then consider only the reduced set of offloading options in the solution of the optimization problem. Based on training data sets that can be created with the optimal EETO offloading decision making developed in this article, future research could investigate such simplified offloading strategies for scenarios that would need to be identified with pre-trained machine learning models.
Another strategy for simplifying the QCQP could be to "thin" out the set of potential helper nodes or MEC server nodes that are considered for task offloading, that is, the helper and MEC server nodes to be considered in the solution of the QCQP could be pre-selected according to criteria that indicate a high level of potential usefulness of a helper or MEC server. In scenarios with a high density of potential helper nodes in the vicinity of a UE that seeks to offload tasks, the helper nodes could be thinned out randomly. Alternatively, the pre-selection could be based on the distance from the UE to the helper and MEC server nodes or the wireless channel conditions. The pre-selection could be aided by machine learning strategies. For the development of such machine learning strategies, the formal optimization model and QCQP formulation and solution could be utilized to generate solution data sets for training. A related machine learning strategy could learn a direct mapping, i.e., a mapping from a given set of task properties as well as available communication and computing resources to a set of task offloading decisions. A model trained with solution data sets obtained from the QCQP solutions could potentially provide such a direct mapping with a simple neural network forward pass.
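A distance-based pre-selection and a random thinning of the helper set could look as follows in Python. This is only a sketch of the idea under simple assumptions (planar positions in meters, a 50 m D2D range as in the simulation setup); the function names and the cap on the number of helpers are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def preselect_by_distance(
    ue_pos: Point,
    helper_pos: Sequence[Point],
    max_helpers: int,
    d2d_range_m: float = 50.0,
) -> List[int]:
    """Indices of at most `max_helpers` helpers within D2D range, nearest first."""
    in_range = []
    for i, pos in enumerate(helper_pos):
        dist = math.dist(ue_pos, pos)
        if dist <= d2d_range_m:
            in_range.append((dist, i))
    in_range.sort()
    return [i for _, i in in_range[:max_helpers]]


def thin_randomly(helper_ids: Sequence[int], keep: int, rng: random.Random) -> List[int]:
    """Random subset of at most `keep` helpers, e.g., for dense helper populations."""
    return sorted(rng.sample(list(helper_ids), min(keep, len(helper_ids))))


# Toy usage with four helpers around a UE at the origin.
helpers = [(10.0, 0.0), (0.0, 30.0), (100.0, 0.0), (3.0, 4.0)]
nearest_two = preselect_by_distance((0.0, 0.0), helpers, max_helpers=2)
random_two = thin_randomly(range(len(helpers)), keep=2, rng=random.Random(0))
```

Only the pre-selected helpers would then enter the QCQP, which reduces the number of decision variables per task.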
The present study has focused on developing a general mathematical model for the offloading of dependent computation tasks to mobile helper nodes, whereby the task properties as well as the available communication and computation resources are characterized through parsimonious model parameters. A future research direction is to conduct end-to-end evaluations of the task offloading in testbeds with popular distributed computing applications, such as distributed data management and database operations [68,69], distributed Internet of Things data analytics [70,71], distributed video analytics [72][73][74], and general distributed artificial intelligence and data analytics [75,76]. Moreover, popular user-oriented applications, such as augmented and virtual reality [77][78][79][80][81][82] and online gaming [83], are important application domains to examine in the context of cooperative task offloading to sliced networked computing resources. Such testbed evaluations could examine the typical values of the model parameters, e.g., for the task complexity c_k, as well as the impacts of real-world wireless bandwidth fluctuations and network congestion. Another important direction for future work is to investigate incentive mechanisms that promote fair cooperation between users.