D2D-Assisted Multi-User Cooperative Partial Offloading in MEC Based on Deep Reinforcement Learning

Mobile edge computing (MEC) and device-to-device (D2D) communication can alleviate the resource constraints of mobile devices and reduce communication latency. In this paper, we construct a D2D-MEC framework and study multi-user cooperative partial offloading and computing resource allocation. We maximize the number of devices served under the maximum delay constraints of the applications and the limited computing resources. In the considered system, each user can offload its tasks to an edge server and a nearby D2D device. We first formulate the optimization problem, show that it is NP-hard, and then decouple it into two subproblems. The convex optimization method is used to solve the first subproblem, and the second subproblem is formulated as a Markov decision process (MDP). A deep reinforcement learning algorithm based on a deep Q network (DQN) is developed to maximize the amount of tasks that the system can compute. Extensive simulation results demonstrate the effectiveness and superiority of the proposed scheme.


Introduction
In recent years, with the development of wireless networks and the popularity of smart mobile devices, mobile applications such as augmented reality (AR), virtual reality (VR), and facial recognition payment have grown exponentially [1,2]. These applications tend to be computation-intensive and require low latency, but the battery capacities, computation resources, and storage capacities of mobile user equipment (UE) are very limited. As a result, most emerging applications may not be suitable for local execution on mobile devices [3,4]. To address this daunting challenge, the functions of the central network are increasingly moving towards the edge of the network [4].
Mobile edge computing (MEC) is regarded as a promising paradigm. This technique moves service platforms with computing, storage, and communication capabilities to the edge node (base station (BS)) nearest to the mobile devices [5-9]. MEC allows resource-constrained mobile terminals to migrate part or all of their complex applications to the edge cloud, making it a low-latency, low-energy, and efficient solution [10-12]. However, the heterogeneous characteristics of BSs and MEC servers and the limited communication and edge computing resources bring challenges to computation offloading methods [13].
Computation offloading is one of the key issues in MEC [14,15]. Its main task is to plan the offloading scheme of computing tasks and the allocation scheme of computing resources so as to reduce delays, save energy, and improve computing resource utilization. In this paper, we study cooperative partial offloading in a D2D-assisted MEC system with a single BS. Unlike traditional algorithms that consider latency and energy consumption, we maximize the computing power of the entire system. An improved algorithm based on DQN is presented to solve the proposed problem. The main contributions of this paper are summarized as follows:
• We construct a D2D-MEC framework that combines D2D communications and MEC technology. User equipment with limited computational capability can offload part of its computation-intensive tasks to the MEC server located in the BS and to idle equipment nearby, and the BS is responsible for the allocation of computing resources. In order to maximize the computing power of the whole system under the condition of limited computing resources, we propose an MEC framework including partial offloading, resource allocation, and user association under the maximum delay constraint of the application.
• We formulate an optimization problem with constraints on both delay and computational resources, which is NP-hard. By analyzing the internal structure of the optimization problem, it is decomposed into two subproblems. We prove that the optimal solutions of the two subproblems constitute the optimal solution of the original problem. Convex optimization is employed to obtain the optimal solution of the first subproblem. The second subproblem is described as a Markov decision process (MDP) used to maximize the amount of tasks computed by the system, in which offloading decisions, resource allocation, and user association are determined simultaneously. A DQN-based model-free reinforcement learning algorithm is proposed to maximize the objective function.
• Extensive simulations demonstrate that the proposed algorithm outperforms traditional MEC schemes, Q-learning, DQN, and other conventional algorithms under different system parameters.
The rest of this paper is organized as follows. In Section 2, we present the system model, including a network model, a channel model, a computation model, and problem formulation. In Section 3, we decompose the original problem into two subproblems. In Section 4, we propose a reinforcement learning algorithm based on DQN to solve subproblem 2. In Section 5, we show the simulation results. Finally, we conclude this study in Section 6.

Network Model
As shown in Figure 1, the scenario we consider consists of a single BS equipped with an edge cloud server and mobile devices within the coverage area of the BS. A mobile device sends data to the BS through the cellular network and sends data to a nearby mobile device through D2D communication. Within the range of D2D communication, the mobile devices are divided into Task Devices (TDs), denoted by U = {u_i | i = 1, 2, . . . , U}, and D2D Resource Devices (D2D RDs), represented by k_i, i = 1, 2, . . . , K. The set K = {k_i | i = 0, 1, 2, . . . , K} represents all resource devices, where k_0 is the BS. RDs can provide computing resources for the tasks on TDs. As in [31-33], the applications considered in this study are all oriented to data partitioning: the computing task on a TD can be arbitrarily divided into three parts, which are computed in parallel on the local device, the edge cloud, and a D2D RD.
We divide the system time into several time slots. The system state is constant within a time slot but changes between slots. At the beginning of each slot, the BS allocates computing resources. The computing task on TD i ∈ U is denoted as φ_i = {Q_i, C_i, τ_i}, where Q_i is the size of the task data, C_i indicates the number of CPU cycles required per bit of data, representing the computational complexity of the application, and τ_i is the maximum delay; f_i is the local computing capacity of TD i. The computing resources of k_i ∈ K are denoted by F_i, i = 0, 1, 2, . . . , K.
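To make the notation concrete, the task and device parameters above can be captured by simple data structures; this is only an illustrative sketch, and the field names (Q, C, tau, f_local, F) are ours rather than identifiers from the paper.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Computing task phi_i on task device (TD) i."""
    Q: float      # data size of the task (bits)
    C: float      # CPU cycles required per bit (computational complexity)
    tau: float    # maximum tolerable delay (s)

@dataclass
class TaskDevice:
    task: Task
    f_local: float  # local computing capacity f_i (cycles/s)

@dataclass
class ResourceDevice:
    F: float        # computing resources F_i (cycles/s); index 0 is the BS/MEC server
```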

Channel Model
We assume that the wireless channel state remains constant while each computation task is transmitted between the TD and the RD. The transmission rate between TD i and the BS is calculated by

$R_{i0} = B_{i0} \log_2\left(1 + \frac{p_i^c h_{i0}}{\sigma^2}\right)$.    (1)

The transmission rate between TD i and D2D RD j is calculated by

$R_{ij} = B_{ij} \log_2\left(1 + \frac{p_i^d h_{ij}}{\sigma^2}\right)$,    (2)

where h_{ij} is the channel power gain between TD i and RD j; B_{ij} is the bandwidth allocated to the cellular channel or D2D channel between TD i and RD j; σ² is the noise power; p_i^c is the cellular transmission power from TD i to the BS; and p_i^d is the D2D transmission power from TD i to a D2D RD. Since the two powers are limited by the maximum uplink power p_i^max of the TD, they are subject to the constraint p_i^c + p_i^d ≤ p_i^max.
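A small numerical sketch of these rate expressions, assuming the Shannon-capacity form written above; the bandwidth, gain, and noise values below are placeholders, not parameters from the paper.

```python
import math

def transmission_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """R = B * log2(1 + p * h / sigma^2) in bit/s."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)

# Cellular link (TD i -> BS) and D2D link (TD i -> RD j); the two transmit
# powers must respect p_c + p_d <= p_max as stated in the channel model.
R_i0 = transmission_rate(10e6, 1.0, 1e-10, 1e-13)   # placeholder values only
R_ij = transmission_rate(10e6, 1.0, 1e-9, 1e-13)
```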

Computation Model
The computing task on a TD is divided into three parts, which are computed locally, on the edge cloud, and on a D2D RD, respectively. x_{ij} ∈ {0, 1}, ∀i ∈ U, ∀j ∈ K\k_0 is the user association between TD i and RD j: x_{ij} = 1 indicates that TD i offloads part of the computing task to D2D RD j, and otherwise x_{ij} = 0. Since a TD selects at most one D2D RD for computational offloading, there is the constraint $\sum_{j \in \mathcal{K} \setminus k_0} x_{ij} \le 1, \forall i \in \mathcal{U}$. Let α_i, β_i ∈ [0, 1], i ∈ U denote the proportions of the computing task on TD i that are offloaded to the edge cloud and to a D2D RD, respectively. Since the locally computed ratio should be non-negative, α_i and β_i should satisfy the constraint 0 ≤ α_i + β_i ≤ 1. Let f_{ij}, ∀i ∈ U, ∀j ∈ K denote the computational resource allocated by RD j to TD i. Since RDs have limited computing resources, there is the constraint $\sum_{i \in \mathcal{U}} f_{ij} \le F_j, \forall j \in \mathcal{K}$.
• Local Computing: The local computation delay of the task on TD i can be computed as $D_i^{l,c} = (1 - \alpha_i - \beta_i) Q_i C_i / f_i$.
• Edge Computing: The total latency of edge computing for TD i consists of three parts: (1) the time for uploading computing tasks, D_i^{e,t}; (2) the time for executing tasks on the MEC server, D_i^{e,c}; and (3) the time for downloading the computing results. Similar to [31,34], this study ignores the delay of sending results back to the TDs from the MEC server, because the size of the results is usually much smaller than the size of the transmitted data. Therefore, according to Equation (1), the delay for TD i to complete edge cloud computing can be computed as $D_i^{e} = D_i^{e,t} + D_i^{e,c} = \alpha_i Q_i / R_{i0} + \alpha_i Q_i C_i / f_{ik_0}$.
• D2D RD Computing: Similar to edge cloud computing, the delay for TD i to complete D2D RD computing can be obtained as $D_i^{D} = \beta_i Q_i / R_{ij} + \beta_i Q_i C_i / f_{ij}$.
Therefore, according to Equations (5)-(7), and since the three parts are computed in parallel, the total delay for completing the task φ_i on TD i is $D_i = \max\{D_i^{l,c}, D_i^{e}, D_i^{D}\}$.
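Since the three parts of a task are computed in parallel, the task completion time is the maximum of the three branch delays. The helper functions below sketch this computation model under the formulas written above (the expressions are reconstructed, so treat their exact form as an assumption):

```python
def local_delay(alpha, beta, Q, C, f_local):
    """Delay of the locally computed share (1 - alpha - beta)."""
    return (1.0 - alpha - beta) * Q * C / f_local

def edge_delay(alpha, Q, C, R_i0, f_edge):
    """Upload alpha*Q bits to the BS, then execute on the MEC server."""
    return 0.0 if alpha == 0 else alpha * Q / R_i0 + alpha * Q * C / f_edge

def d2d_delay(beta, Q, C, R_ij, f_d2d):
    """Upload beta*Q bits over the D2D link, then execute on RD j."""
    return 0.0 if beta == 0 else beta * Q / R_ij + beta * Q * C / f_d2d

def total_delay(alpha, beta, Q, C, f_local, R_i0, f_edge, R_ij, f_d2d):
    """Parallel execution: the task finishes when the slowest branch finishes."""
    return max(local_delay(alpha, beta, Q, C, f_local),
               edge_delay(alpha, Q, C, R_i0, f_edge),
               d2d_delay(beta, Q, C, R_ij, f_d2d))
```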

Problem Formulation
In this paper, we improve the computing capability of the whole system. Under the constraints of computing resources and maximum delay, the number of devices served by the system reflects its computing capability [25].
We regard the number of TDs that can complete their computing tasks as the optimization target, and o_i^u denotes the completion status of the computing task on TD i: o_i^u = 1 indicates that the task is completed within the maximum delay; otherwise, o_i^u = 0. The list of notations is given in the Abbreviations part. The optimization problem of this study can be formulated as

P1: $\max_{\mathbf{x}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{f}} \sum_{i \in \mathcal{U}} o_i^u$    (9)
subject to constraints (10)-(16),

where x = {x_{u_1 k_1}, . . . , x_{u_U k_K}} denotes the user association vector between the TDs and the D2D RDs; α = {α_{u_1}, α_{u_2}, . . . , α_{u_U}} is the offloading decision vector of edge cloud computing; β = {β_{u_1}, β_{u_2}, . . . , β_{u_U}} is the offloading decision vector of D2D RD computing; and f = {f_{u_1 k_0}, . . . , f_{u_U k_K}} is the allocation decision of computing resources on the RDs. Constraint (10) indicates that the user association variable is binary: x_{ij} = 1 indicates that TD i offloads part of the task to D2D RD j, and x_{ij} = 0 otherwise. Constraint (11) ensures that a TD can select only one of the multiple D2D RDs for task offloading. Constraints (12)-(14) indicate that the edge cloud, D2D RD, and local data ratios are all non-negative and cannot exceed one. Constraints (15) and (16) ensure that RD j cannot allocate more computing resources to all TDs than its maximum computing capability.
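Evaluating the objective of P1 for a candidate decision is straightforward once the per-TD delays are known; a minimal sketch follows (the function names are ours):

```python
def is_completed(total_delay_i, tau_i):
    """Indicator o_i^u: 1 if task phi_i finishes within its deadline tau_i."""
    return 1 if total_delay_i <= tau_i else 0

def objective_p1(total_delays, deadlines):
    """Number of TDs served by the system for a candidate {x, alpha, beta, f}."""
    return sum(is_completed(d, tau) for d, tau in zip(total_delays, deadlines))
```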
Theorem 1. Problem P1 is NP-hard.

Proof. See Appendix A.
Theorem 1 shows that P1 is an NP-hard problem, and the objective function (9) is non-convex, so the problem is difficult to solve directly. In order to solve P1, we first decompose and simplify the problem, and then solve it using a reinforcement learning method instead of a conventional optimization method.

Problem Decomposition
We maximize the number of TDs served by the system under the constraint of limited computing resources. The requirement for completing a task on a TD is that the computing time is less than the maximum delay. According to D_i^{e,c} in Equation (6), the computing resource f_{ik_0} allocated by the edge server to TD i is proportional to the data ratio α_i. Similarly, according to D_i^{D,c} in Equation (7), the computational resource f_{ij} allocated by D2D RD j to TD i is proportional to the data ratio β_i. Based on the above analysis, reducing the value of α_i + β_i reduces the computing resources of the system occupied by TD i. As a result, the system has more computing resources available to serve other TDs.
Based on the above analysis, in order to solve P1, we first determine the value of α_i + β_i, ∀i ∈ U and the resource allocation scheme f, and then determine α_i, β_i, and the user association x. Finally, we prove that the optimal solution obtained in this way is also the optimal solution of P1. Define the variables γ_i = α_i + β_i, ∀i ∈ U. In P2, the values of {α, x} are fixed, and the optimal {γ, f} are calculated. The goal of the optimization is to minimize the demand for computing resources of the system. In the mathematical formulation of P2, constraint (18) is set according to (14), and (19)-(21) represent the constraints on the local computing delay D_i^{l,c}, the edge cloud computing delay D_i^{e}, and the D2D computing delay D_i^{D}, respectively.

Theorem 2. The optimal solution of P2 is given by {γ*, f*} in (22)-(24).

Proof. See Appendix B.
It can be observed that γ_i* is a constant independent of {x, α}, and f_{ij}* can be represented by the variable α_i. Therefore, substituting the solution of P2 into P1 yields problem P3, where o_i^u is determined by the delay constraint with {γ*, f*} substituted.

Theorem 3. The optimal solutions {γ*, f*} and {x*, α*} obtained by P2 and P3 are the optimal solutions of P1.
Proof. See Appendix C.
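The closed forms in (22)-(24) are not visible in this extracted text, but they follow from making the delay constraints (19)-(21) tight. The sketch below shows that computation under the reconstructed delay formulas, so the exact expressions should be read as assumptions rather than as the paper's equations.

```python
def min_offload_ratio(Q, C, tau, f_local):
    """Smallest gamma_i = alpha_i + beta_i such that the local share meets the
    deadline: (1 - gamma) * Q * C / f_local <= tau."""
    return max(0.0, 1.0 - tau * f_local / (Q * C))

def min_edge_resource(alpha, Q, C, tau, R_i0):
    """Smallest f_ik0 such that alpha*Q/R_i0 + alpha*Q*C/f_ik0 <= tau (alpha > 0)."""
    if alpha == 0.0:
        return 0.0
    slack = tau - alpha * Q / R_i0          # time left for execution after upload
    return float("inf") if slack <= 0 else alpha * Q * C / slack

def min_d2d_resource(beta, Q, C, tau, R_ij):
    """Analogous minimum f_ij for the D2D branch."""
    if beta == 0.0:
        return 0.0
    slack = tau - beta * Q / R_ij
    return float("inf") if slack <= 0 else beta * Q * C / slack
```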

DQN-Based Computation Offloading
In order to solve problem P3, we express it as an MDP, which can be solved using model-free reinforcement learning, and we propose an improved reinforcement learning algorithm based on DQN. Compared with conventional DQN, the improved algorithm enables the agent to learn a better solution and to converge. In this paper, we increase the number of task users that the system can serve by optimizing the offloading strategy and the allocation of computing resources in the system. Since the optimal solution is unknown, the agent can easily settle on a sub-optimal solution after finding a feasible allocation scheme and obtaining a positive reward. Therefore, in order to make the agent keep searching for a better solution, we compare and update the recorded action trajectory during the learning process, and we let the agent make random selections with a certain probability while it learns from the recorded action trajectory, so that the agent has the opportunity to find a better solution. In addition, the environment gives a different reward value when the agent follows the recorded action trajectory than when the Q-network decides the action, which is used for fast convergence.
We call the proposed algorithm DQN-PTR, which integrates the priority action trajectory replay method into DQN. In this section, we first define the three key elements of the MDP problem, i.e., the state space, action space, and reward function. Then, we explain the detailed implementation of our proposed algorithm for reinforcement learning.
• State Space: The system state consists of the remaining computing resources of the RDs and the completion status of the tasks on the TDs, where o_i ∈ {0, 1}, i ∈ U represents the completion of the task on TD i. We define s_0 as the system state observed by the BS at the beginning of the time slot; that is, s_0 = {F_0, F_1, . . . , F_K, 0, 0, . . . , 0}.
• Action Space: The action space consists of two parts: x_i = {x_{ik_1}, . . . , x_{ik_K}}, i ∈ U represents the offload association between the TDs and the D2D RDs, and α is the task offload ratio of the currently assigned TD. According to constraints (26) and (27), it is stipulated that the action a_t at step t satisfies the condition ∑_j x_{ij}(t) = 1.
• Reward Function: The objective function of P3 is the number of TD devices that complete their computation.
Considering that the sizes of the computing tasks of the devices are different, to ensure the fairness of the evaluation, the reward function is defined as the sum of the sizes of the computing tasks completed in the current time slot. In the reinforcement learning process, in each episode, as the environment performs the action of step t, the state and the action space of the next step change accordingly. Assume that the action selected by the agent at step t is a_t = {x_{ij} = 1, α(t)}. a_t is executed by the environment when s_t and a_t satisfy the conditions given by (35)-(37). With the successful execution of a_t, the remaining resources of the RDs and the set of TDs that have not yet been allocated computing resources are reduced accordingly. If any of the conditions (35)-(37) is not met, a_t is infeasible, i.e., s_{t+1} = s_t, A_{t+1} = A_t.
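Read operationally, the description above is an environment step: an action associates one TD with an RD and fixes its offload split, the reward is the size of the data successfully scheduled, and an infeasible action leaves the state unchanged. The class below is a simplified sketch under those assumptions (the feasibility test stands in for the dropped conditions (35)-(37), and all names are ours).

```python
import numpy as np

class OffloadEnv:
    """Simplified D2D-MEC environment: state = remaining RD resources + TD flags."""

    def __init__(self, rd_resources, tasks):
        self.F = np.array(rd_resources, dtype=float)  # F_0 (MEC), F_1..F_K (D2D RDs)
        self.tasks = tasks                            # list of (Q, C, tau) tuples
        self.done = np.zeros(len(tasks))              # completion flags o_i

    def state(self):
        return np.concatenate([self.F, self.done])

    def step(self, td, rd, f_edge, f_d2d):
        """Apply one action; return (next_state, reward)."""
        Q, _, _ = self.tasks[td]
        feasible = (self.done[td] == 0 and f_edge <= self.F[0]
                    and f_d2d <= self.F[rd])          # stands in for (35)-(37)
        if not feasible:
            return self.state(), 0.0                  # infeasible: state unchanged
        self.F[0] -= f_edge                           # consume MEC resources
        self.F[rd] -= f_d2d                           # consume D2D RD resources
        self.done[td] = 1
        return self.state(), Q                        # reward: size of scheduled task
```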

Algorithm Design Based on DQN
The structure of the DQN-PTR algorithm proposed in this paper is shown in Figure 2. It mainly includes four parts: the environment, the replay buffer, the networks, and the trajectory record. The environment performs actions, computes rewards, and updates the state and the action space. The replay buffer stores the task offloading experiences, which are used to train the Q-network. The network part includes two networks, which are used to predict the Q value and the target Q value, respectively; the network parameters are updated according to the difference between the two Q values. The recorded action trajectory is updated at the end of each training episode: if the result of the current episode is better than the recorded result, the action trajectory of the current episode replaces the originally recorded action trajectory. We propose a DQN-PTR-based task offloading algorithm in Algorithm 1 and explain its main steps in detail, as follows:

1. Initialize the action-value function Q(s, a) and the target action-value function Q̂(s, a) with parameters θ and θ', respectively. Initialize the experience replay buffer D as an empty set of size N.
2. Initialize ε = 0, and specify that the growth rate of ε is ε_increment = 0.0001 and that ε grows to ε_max = 0.9999. This parameter determines the probability of random selection when the agent chooses an action; the probability of random selection decreases as the network parameters are updated.
3. Initialize the optimal trajectory record O = ∅ and the maximum total return value R_max = 0.
4. In each episode, the agent in the BS collects s_0. Initialize R = 0 to accumulate the total return value of this episode. µ ∈ [0, 1] is used to determine whether the actions in this episode are decided by the Q-network or by the optimal trajectory.
5. If the training episode index is less than or equal to 100, ε is used to decide whether the action is selected randomly or according to the maximum Q value.
6. When the training episode index is greater than 100: if µ > 0.9, the selection of actions in this episode is the same as in step 5; if µ ≤ 0.9, the actions in this episode follow the optimal trajectory O. It should be noted that the reward settings in environment 1 and environment 2 are different.

7. When step t ends, store (s_t, a_t, r_t, s_{t+1}) in the experience replay buffer; update the total reward value of the current episode, R = R + r_t; retrieve multiple records from the experience replay buffer to update the Q-network; and increase the value of ε.

The corresponding lines of Algorithm 1 are as follows:
    For each step t do
10:     If episode ≤ 100 or µ > 0.9 then
11:         If rand ∈ [0, 1] ≥ ε then
12:             Select a random action a_t
13:         else
14:             Set a_t = arg max_a Q(s_t, a; θ)
15:         end if
            Execute action a_t, observe the next state s_{t+1} and reward r_t according to environment 2
20:     end if
21:     Store transition (s_t, a_t, r_t, s_{t+1}) in D
23:     If the episode terminates at step t + 1 then
24:         If R > R_max then
25:             Replace the recorded trajectory O with the action trajectory of the current episode
        Sample a random mini-batch of transitions (s_j, a_j, r_j, s_{j+1}) from D
34:     Perform a gradient descent step on (y_j − Q(s_j, a_j; θ))² with respect to the network parameters θ
35:     If ε < ε_max then
36:         ε = ε + ε_increment
37:     end if
38:     Every C steps, reset Q̂ = Q
39:   end for
40: end for
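A condensed Python sketch of the action-selection and trajectory-record logic described above (the Q-network interface `predict`/`n_actions` is assumed for illustration; note that in this scheme ε grows toward 1, so it acts as the probability of choosing the greedy action, and exploration becomes rarer as training proceeds):

```python
import random
import numpy as np

def select_action(q_net, state, step, episode, eps, mu, best_traj):
    """DQN-PTR action selection (simplified).
    eps grows from 0 toward eps_max; mu ~ U[0, 1] is drawn once per episode;
    best_traj is the recorded action trajectory with the highest return so far."""
    if episode <= 100 or mu > 0.9 or step >= len(best_traj):
        if random.random() >= eps:                     # random exploration
            return random.randrange(q_net.n_actions)
        return int(np.argmax(q_net.predict(state)))    # greedy w.r.t. Q-network
    return best_traj[step]                             # replay the priority trajectory

def update_record(best_traj, best_return, episode_traj, episode_return):
    """At episode end, keep the action trajectory with the largest total return."""
    if episode_return > best_return:
        return list(episode_traj), episode_return
    return best_traj, best_return
```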

Analysis of Simulation Results
In this section, we evaluate the performance of the computational offloading scheme proposed in this paper and the DQN-PTR algorithm through computer simulations. We first present the simulation parameters of the system. Then, we discuss the experimental results.

Simulation Setup
In the simulation, we assume the following scenario. We consider partial offloading between multiple devices in a small area covered by a single base station. The edge server on the base station side is equipped with a reinforcement learning agent that makes decisions about the offloading scheme in this area. In addition, the computing resources in the system are limited, and computing tasks cannot be completed by local computing alone. According to references [7,25,26,29,34,35], we set the simulation parameters to match our research scenario. The transmission power, channel bandwidth, and background noise of each device are 2 W, 10 MHz, and −170 dBm, respectively. The computing capacities of the TDs and RDs are 24 Mcycles/s and 35 Mcycles/s, respectively. The data size and maximum latency of each task are 2.15 Mbits and 1 s, respectively. The number of CPU cycles per bit is set to 20 cycles/bit.
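For reference, the simulation parameters listed above collected in one place (the dictionary keys are ours; the values are the ones stated in this subsection):

```python
SIM_PARAMS = {
    "tx_power_w": 2.0,           # transmission power of each device
    "bandwidth_hz": 10e6,        # channel bandwidth
    "noise_dbm": -170,           # background noise
    "td_capacity_cps": 24e6,     # TD computing capacity (cycles/s)
    "rd_capacity_cps": 35e6,     # D2D RD computing capacity (cycles/s)
    "task_size_bits": 2.15e6,    # data size of each task
    "max_delay_s": 1.0,          # maximum latency of each task
    "cpu_cycles_per_bit": 20,    # computational complexity
}
```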
We compare the proposed algorithm with traditional MEC schemes, two benchmark algorithms, and reinforcement learning algorithms (a simplified sketch of the GBA baseline is given after this list):
• Full Local: All TDs execute their tasks via local computing.
• Local-cloud: The computing task on a TD can be divided into two parts, which are computed locally and on the edge cloud, respectively. In order to make full use of computing resources, all local computing resources on TD i are allocated to task φ_i. If the local resources are insufficient, the computing resources are supplemented by the edge cloud.
• RBA: TD i randomly selects a D2D RD device for computation offloading and utilizes all the computing resources of that D2D RD device. If the computing resources of the local device and the D2D RD are insufficient, the edge cloud supplements the computing resources.
• GBA: TD i selects the D2D RD device with the largest remaining resources. Under the condition of making full use of local computing resources, the remaining computing tasks are evenly divided between the D2D RD device and the edge cloud for computing.
• Q-learning: Q-learning is a basic reinforcement learning algorithm; it is used here to solve P3. In the simulation, the state space, action space, and reward function of Q-learning are the same as those in DQN-PTR.
• DQN: DQN is an improved reinforcement learning algorithm based on Q-learning. In the simulation, the state space, action space, and reward function of DQN are the same as those in DQN-PTR.
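As promised above, a minimal sketch of the GBA selection rule, assuming the straightforward reading of its description (pick the RD with the most remaining resources, then split the non-local work evenly between that RD and the edge cloud); the implementation used in the paper may handle feasibility and tie-breaking differently.

```python
def gba_assign(remaining_rd_resources, non_local_work):
    """Greedy baseline (GBA), simplified.
    remaining_rd_resources: dict mapping D2D RD id -> remaining cycles/s.
    non_local_work: the part of the task that cannot be computed locally."""
    best_rd = max(remaining_rd_resources, key=remaining_rd_resources.get)
    to_rd = non_local_work / 2.0       # half to the chosen D2D RD
    to_cloud = non_local_work / 2.0    # half to the edge cloud
    return best_rd, to_rd, to_cloud
```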

Simulation Results
In Figure 3, we show the number of supported (or unexecuted) TDs versus the total number of TDs in the system. The computing capacity of the MEC server is 50 Mcycles/s, and the number of D2D RDs is one half of the number of TDs. Since the scenarios we study mainly concern computationally intensive tasks, none of the tasks can be completed by local computing alone. Relying only on the computing resources of the local devices and the MEC server, the upper limit of the system computing capacity is reached when the number of TDs reaches five. When D2D RDs are considered in the system, the total number of computing tasks that the system can complete naturally increases as the number of TDs increases. In Figure 3, the DQN-PTR method proposed in this paper achieves the best results. The DQN algorithm works satisfactorily when the number of devices is relatively small, but its results become worse as the number increases, and the gap between Q-learning and DQN-PTR is always large. This is because, as the number of devices increases, the number of actions and states in reinforcement learning also increases, which leads to worse training results for the same number of training episodes. Additionally, when the number of TDs is greater than 10, the results obtained by the RBA and GBA algorithms are both smaller than those of the algorithm proposed in this paper.
In Figure 4, we show the number of supported (or unexecuted) TDs as the computing resources of the MEC server increase. The numbers of TDs and D2D RDs in the system are 30 and 15, respectively. It can be observed from Figure 4 that if the DQN-PTR algorithm is used for computation offloading planning, the MEC server only needs to provide 100 Mcycles/s of computing resources to compute all tasks in the system. In addition, if the computing resources of the D2D RDs in the system are utilized, even the relatively poor allocation algorithms (DQN, RBA, and GBA) can save about 300 Mcycles/s of cloud computing resources compared with traditional local-cloud offloading. In Figure 5, we present the number of supported (or unexecuted) TDs versus the number of D2D RDs. The computing power of the MEC server is fixed at 50 Mcycles/s and the number of TDs is 20, so the total data volume of all tasks in the system is 43.00 Mbits. As can be seen from Figure 5, our proposed DQN-PTR algorithm requires the fewest D2D RD devices. The DQN, GBA, and Q-learning algorithms require 15 D2D RD devices to complete all tasks, while RBA requires 20 D2D RD devices. Figure 6 shows the learning curves of Q-learning, DQN, and DQN-PTR. The numbers of TDs and D2D RDs are 20 and 10, respectively. In order to ensure fairness, the three algorithms have the same parameters, i.e., the maximum number of steps allowed per episode is 50, the ε-greedy value increases from 0 to 0.95 with a growth rate of 0.0001, and the learning rate, replay memory size, and mini-batch size are 0.0001, 10,000, and 200, respectively. Referring to [36] for the analysis of learning curves, we find that DQN-PTR performs better than DQN and Q-learning. Although Q-learning has the fastest learning speed, our proposed algorithm is more stable than DQN in terms of the fluctuation of the curve and obtains the highest average return value.

Conclusions
This paper has proposed an integrated framework for multi-user partial offloading and resource allocation, combining MEC and D2D technologies. Under this framework, decisions on computation offloading and on the allocation of the computing resources of the MEC server and idle devices can be made jointly. We have also designed a convex optimization method to simplify the problem. Finally, we have derived a DQN-based reinforcement learning algorithm to solve the problem. Simulation results have shown that the proposed scheme achieves better performance than other benchmark schemes under different system parameters.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations and symbols are used in this manuscript:
Q_i: The data size of the task φ_i
C_i: CPU cycles per bit required for task φ_i
τ_i: The maximum delay of the task φ_i
f_i: The local computing capacity of TD i
F_i: The computing resources of RD i
R_ij: The transmission rate between TD i and RD j
B_ij: The bandwidth allocated to the channel between TD i and RD j
p_i^c: The cellular transmission power from TD i to the BS
D_i: The total delay for completing the task φ_i on TD i
o_i^u: The completion status of the computing task on TD i

Appendix A
Consider the case in which the MEC server and all TDs do not provide computing resources and the transfer time is negligible. In this case, the number of computing tasks is much larger than the number of D2D RDs. Therefore, the maximum number of tasks supported by a single D2D RD is determined by F_K. In P1, the problem of maximizing the number of tasks can then be rewritten as problem P_A. It is difficult to obtain the optimal solution of P_A: since x is binary, the feasible set is not convex, and P_A is a non-convex integer program. According to the statements in [17,37,38], P_A is NP-hard. From [39], P1 is also NP-hard because P1 is a special case of P_A.

Appendix B
According to (10) and (11), ∑_{j=k_1}^{k_K} x_{ij} is equal to 0 or 1. Therefore, in order to prove Theorem 2, the problem is classified according to the value of the user association x.

• When ∑_{j=k_1}^{k_K} x_{ij} = 0, the computing task of TD i is computed only locally and at the edge cloud; thus β_i = 0, f_{ij} = 0, ∀j ∈ K\k_0, and α_i = 1. In this case, constraint (21) is obviously satisfied. We then have (A1). From (20), we can obtain (A2). We then take the derivative of the right-hand side of the above inequality. To sum up, when the equal signs of (A1) and (A2) hold, f_{ik_0} takes its minimum value; that is, (22) and (23) are proved.
• When ∑_{j=k_1}^{k_K} x_{ij} = 1, TD i offloads part of the computing task to a certain D2D RD, which we denote as D2D RD j. According to (21), we have (A5). Similar to (A2), when the equal signs of (A1) and (A5) hold, f_{ij} takes its minimum value; that is, (24) is proved.
• When ∑_{j=k_1}^{k_K} x_{ij} = 0, substituting α_i = 1 into (24) gives f_{ij}* = 0.
Therefore, Theorem 2 is proved in both cases.
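For readability, the bound implied by (20) that the argument above tightens presumably takes the following form (reconstructed from the edge-delay expression; the exact display in the original is not visible here):

```latex
% Rearranging the edge-cloud delay constraint (20):
\frac{\alpha_i Q_i}{R_{i0}} + \frac{\alpha_i Q_i C_i}{f_{i k_0}} \le \tau_i
\quad\Longrightarrow\quad
f_{i k_0} \ge \frac{\alpha_i Q_i C_i}{\tau_i - \alpha_i Q_i / R_{i0}},
\qquad \tau_i > \frac{\alpha_i Q_i}{R_{i0}},
% and the minimum is attained when the inequality holds with equality.
```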

Appendix C
In order to prove that the optimal solutions of P2 and P3 are the same as the optimal solutions of P1, it is necessary to prove that the maximum values of the objective functions calculated by the optimal solutions obtained by P3 and P1 are the same. To this end, we first denote the optimal solution set of P2 and P3 as: {γ * , f * , x * , α * }. The maximum value of the objective function of P3 is N 3 . The optimal solution set of P1 is set as: {x * 1 , α * 1 , β * 1 , f * 1 }, where γ * 1 = α * 1 + β * 1 ,α * 1 = α * 1 γ * 1 ,β * 1 = γ * 1 − α * 1 , and the maximum value of the objective function is N 1 .
Similarly, it can be proved that {x_1*, α_1*, β_1*, f_1*} satisfy the constraints (18) and (26)-(29). Suppose that the conditions (19)-(21) are not all satisfied; this means that there are offloading schemes {x_{1i}*, α_{1i}*, β_{1i}*, f_{1i}*} of some TD i, i ∈ U, that do not satisfy the constraints (19)-(21), that is, D_i > τ_i and o_i^u = 0, while f_{1i}* ≥ 0 and 0 ≤ α_{1i}* + β_{1i}* ≤ 1 still hold. In this case, TD i and all the computing resources f_{1i}* = {f_{1ij}*}, ∀j ∈ K allocated to TD i do not affect the value of the objective function. Group all TDs that do not satisfy the constraints into the set M = {m_i | i = 1, 2, . . . , M}. Set N = U \ M = {n_i | i = 1, 2, . . . , U − M} and redefine the computing resources in the system as F_j' = F_j − ∑_{i=1}^{M} f_{1 m_i j}*, j ∈ K. Resource allocation is then performed again on the TD set N and the network resource state F_j' through P1; the obtained resource allocation scheme is equivalent to {x_1*, α_1*, β_1*, f_1*} and satisfies the constraints of P2 and P3. So the maximum value of the objective function is still N_1 and the solution is a feasible solution of P3. The maximum objective function value obtained by solving this reduced problem through P2 and P3 is defined as N_3', so N_1 ≤ N_3'. In addition, since the number of TDs and the computing resources in the network are both reduced, the obtained maximum objective function value satisfies N_3' ≤ N_3, so N_1 ≤ N_3' ≤ N_3.
To sum up, N_1 ≤ N_3' ≤ N_3 and N_3 ≤ N_1, so it must be that N_3 = N_1. Theorem 3 is proved.