Deep Learning-Based Dynamic Computation Task Offloading for Mobile Edge Computing Networks

This paper investigates the computation offloading problem in mobile edge computing (MEC) networks with dynamically weighted tasks. We aim to minimize the system utility of the MEC network by jointly optimizing the offloading decisions and the bandwidth allocation. The joint optimization is formulated as a mixed-integer programming (MIP) problem. In general, offloading decisions can be generated efficiently by deep learning-based algorithms, and the remaining resource allocation can then be solved by traditional optimization methods. However, such methods adapt poorly to new environments and require a large number of training samples to retrain the deep learning model once the environment changes. To overcome this weakness, we propose a deep supervised learning-based computation offloading (DSLO) algorithm for dynamic computation tasks in MEC networks. We further introduce batch normalization to speed up model convergence and improve model robustness. Numerical results show that DSLO requires only a few training samples and can quickly adapt to new MEC scenarios. Specifically, it achieves 99% normalized system utility with only four training samples per MEC scenario. Therefore, DSLO enables the fast deployment of computation offloading algorithms in future MEC networks.


Introduction
The rapid development of wireless communication technologies has driven the emergence of more and more latency-sensitive and resource-intensive applications and services, such as augmented reality, voice recognition, image recognition, and mobile health care. These mobile applications have dramatically increased the demand for computing and storage resources on wireless devices, which are typically provided by cloud servers. However, because cloud servers are usually deployed far away, offloading tasks to the cloud incurs high transmission latency. Mobile edge computing (MEC) [1] is therefore considered an effective solution to this problem. The fundamental principle of MEC is that, by deploying hosts with cloud computing capabilities at the edge of the wireless network, wireless devices can offload tasks to edge servers, alleviating the problem of insufficient computational resources, reducing the delay of computing tasks [2,3], and saving energy [4,5].
As one of the key technologies of MEC, computation offloading addresses resource limitations and low processing efficiency by offloading computation-intensive tasks to MEC hosts at the network edge. On the one hand, the task offloading process is affected by various factors, e.g., computational resources and wireless communication quality. On the other hand, if computation tasks are aggressively offloaded to the edge servers, the system bandwidth becomes occupied and the uplink wireless channel becomes severely congested, which significantly increases the transmission delay of computation tasks. Therefore, making the best offloading decision is the critical issue in edge offloading. In general, computation offloading and resource allocation are jointly formulated as mixed-integer nonlinear programming problems. Most existing solutions are based on heuristic or approximation algorithms [6,7]. However, these solutions rely on expert knowledge and exact mathematical models, which often must be rebuilt from scratch once the MEC environment changes, resulting in inefficient offloading decisions; the computational complexity of such methods is also relatively high. Therefore, designing a low-complexity algorithm for MEC networks that can be applied to time-varying environments remains a challenge.
Recently, deep learning has been widely used to optimize offloading decisions [8]. Since a neural network is essentially a black-box model that requires neither precise expert knowledge nor an accurate mathematical model, deep learning offers a promising approach to the above challenges. Deep reinforcement learning learns to solve complex problems, such as video games, chess, and intelligent robot control, through continuous trial and error. Currently, computation offloading based on deep Q-learning (DQL) [9] is the most widely studied approach; it discretizes the state/action space and optimizes offloading decisions and system resources through online learning. However, the discretization of continuous variables limits the performance of DQL and makes it unsuitable for high-dimensional action spaces [10]. In addition, most of these methods assume a static interactive MEC environment with sufficient training samples to train the neural networks. Once the MEC scenario changes, it is difficult to collect enough training samples to converge in the new scenario. Given the dynamic nature of offloading tasks, how to converge quickly to a new scenario with few samples remains a problem worth investigating.
There are some studies on dynamic MEC networks in the literature. Huang et al. [11] constructed dynamic MEC scenarios with time-varying wireless channel gains and task weight factors and proposed a meta-learning-based computation offloading (MELO) algorithm to obtain a general model that can quickly adapt to new task scenarios with a small number of training samples. Wang et al. [12] constructed dynamic task scenarios with different topologies and proposed a meta-reinforcement learning-based approach to improve training efficiency and reduce sample dependence. Unfortunately, these works ignore the system's energy consumption in pursuit of low latency. Moreover, the training process of meta-learning-based computation offloading is tedious: an additional copy of the model must be created to compute the gradients at each training step.
In this paper, we consider a multi-scenario MEC network with multiple WDs and a single edge server, where each MEC scenario is characterized by a specific task weight factor. Each WD in the network decides in real time whether to offload its computing task to the edge server or execute it locally. By jointly optimizing the WDs' computation offloading and bandwidth allocation, we propose a Deep-Supervised-Learning-based computation Offloading (DSLO) algorithm that learns to minimize the network utility. The main contributions of this paper are as follows:

• We model the system utility of the MEC network with prioritized computing tasks as the weighted sum of energy consumption and delay cost. To minimize the system utility, we decompose the joint optimization problem into an offloading decision subproblem and a transmission bandwidth allocation subproblem, which are solved via deep learning and via optimization methods, respectively.

The remainder of this paper is organized as follows. The related work is introduced in Section 2. In Section 3, we present the system model and problem formulation. We propose the DSLO algorithm in Section 4. Numerical results are presented in Section 5, and a conclusion is provided in Section 6. The important notations and abbreviations used throughout this paper are summarized in Abbreviations.

Related Work
So far, considerable research effort has been devoted to offloading scheme design for MEC networks. The computation offloading optimization problem can be modeled by three elements [13]: state parameters, decision actions, and system utility. The current research on computation offloading can be divided into computation offloading for static MEC networks and for dynamic MEC networks, depending on the number of state parameter categories considered.
Under computation offloading for static MEC networks [14][15][16], the offloading decision is optimized for a single category of state parameters, such as the wireless channel gain or the task computation amount. You et al. [17] considered a single-user system with the wireless channel gain as the state parameter and optimized the performance of local computing and offloading under energy harvesting and processing latency constraints. Tran et al. [18] proposed a low-complexity heuristic offloading framework with the wireless channel gain as the state parameter. Bi et al. [19] considered a cache-assisted MEC system, where the server can selectively cache previously generated programs for future reuse. Resorting to deep neural networks (DNNs), Huang et al. [20] proposed a deep reinforcement learning-based computation offloading framework (DROO), which maximizes the system computation rate by jointly optimizing computation offloading and resource allocation according to time-varying channel gains. Since a static MEC environment provides sufficient training samples through repeated interaction, most deep learning-based offloading algorithms for static MEC networks are built on DNNs.
Considering the dynamic properties of wireless applications, MEC networks make computational offloading decisions by evaluating multiple classes of state parameters jointly. Min et al. [21] considered a dynamic MEC network consisting of a time-varying radio link transmission rate and investigated the computational offloading of IoT devices with energy harvesting in a dynamic MEC network. They proposed an offloading scheme based on deep reinforcement learning, which uses a convolutional neural network (CNN) to compress the state space to speed up the convergence process of the algorithm. Huang et al. [11] considered a dynamic computing task scenario consisting of different weight priority coefficients and proposed a meta-learning-based computing offloading (MELO) framework. MELO can efficiently adapt to a new MEC scenario and minimize the total system delay. Qu et al. [22] considered a dynamic computing task scenario composed of different computing power and bandwidth and proposed a computing offloading framework based on meta-reinforcement learning. The algorithm can quickly adapt to complex and dynamic environments and can be used to improve the robustness of task offloading decisions in IoT environments. Wang et al. [12] constructed dynamic computing task scenarios based on different network topologies, modeled the computational tasks as directed acyclic graphs (DAGs), and designed a sequence-to-sequence (seq2seq) neural network model to generate offloading strategies that can quickly adapt to new environments. Chen et al. [23] considered an MEC system composed of a random task arrival model, where the state parameters include the size of the input data, the maximum tolerable delay, the number of CPU cycles, and the time slot of the task arrival. 
They designed a temporal feature extraction network composed of one-dimensional convolutional (Conv1D) residual blocks and a long short-term memory (LSTM) network to solve the joint optimization problem of computational offloading and resource allocation. For dynamic MEC network computational offloading, the state parameters are too complicated to optimize the decision actions via classical optimizations, where computation offloading and resource allocation are jointly formulated as a mixed-integer nonlinear programming problem [24]. Considering the dynamics and complexity of the actual MEC environment, we assume that the modeled dynamic MEC scenario contains only a small number of training samples.
Recent works [12,13,20,23,25] resort to deep learning methods, which use deep neural networks [26] to learn from data samples and generate an optimal mapping from the state space to the action space. However, they require a large number of data samples to train the neural network and cannot converge to optimal offloading performance when training samples are scarce. Achieving efficient deep learning-based computation offloading with only a few training samples is challenging in dynamic MEC networks. To the best of our knowledge, relatively few studies investigate binary offloading designs for dynamic MEC networks. Unlike [11], our study also accounts for energy consumption, feeds the weight factors into the neural network together with the workloads as input data, and uses a CNN for offloading prediction. Therefore, our work focuses on designing binary offloading strategies under time-varying workloads and different computation tasks, aiming to minimize the weighted sum of energy consumption and time delay.

System Model
We consider an MEC network composed of one edge server, one wireless access point (AP), and N WDs, denoted as N = {1, 2, . . . , N}, as shown in Figure 1. The AP and the edge server are connected by optical fiber, whose transmission delay can be ignored. Each WD has one task that needs to be processed locally or offloaded to the edge server through the AP. Without loss of generality, each WD executes one prioritized computing task with a specific task weight. Denote w_n ∈ W as the task weight of WD_n, whose value varies in the countable set W depending on the task category. An MEC task scenario is considered different from another if at least one task weight differs. Each WD decides whether to offload its task to the edge server or compute it locally. Denote a = {a_n ∈ {0, 1} | n ∈ N} as the offloading decision, where a_n = 1 indicates that WD_n offloads its task to the edge server and a_n = 0 means its task is processed locally.

Energy Consumption
Given an offloading decision, we study the resource allocation optimization problem of the MEC network and model dynamic computation tasks via task weights. We use a tuple (d_n, γ_n) to represent WD_n's task, for n ∈ N, where d_n is the workload of WD_n and γ_n is the number of CPU cycles required to complete the task. When WD_n offloads its task to the edge server, the energy consumption includes data transmission and task computation, which can be expressed as

E^c_n = E^t_n + α d_n,

where E^t_n is the transmission energy consumed by WD_n for uploading its workload to the edge server, which is a linear function of the workload d_n; α d_n is the task computation energy consumption; and α is the factor of the computation energy consumption at the edge server. The superscripts c and t stand for "edge computing" and "transmission", respectively. When α = 0, only the transmission energy consumption of the WDs is considered. Here, we ignore the energy consumption and delay of the edge server when transmitting the computation results back to WD_n; similar to [27], we assume that the result size is much smaller than the input data size.
When WD_n executes its task locally, we define e^l_n as the local energy consumption per data bit of WD_n, where the superscript l denotes "local computing". The energy consumed by WD_n to execute its entire task locally is then

E^l_n = e^l_n d_n.

Combining the energy consumption of computation offloading and local execution under the offloading decision a_n, the total energy consumption of WD_n is E_n = E^c_n a_n + E^l_n (1 − a_n).
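As an illustration, the per-device energy model above can be written in a few lines of Python. The coefficient defaults are taken from the simulation settings of Section 5; the function name is ours.

```python
import numpy as np

def total_energy(d, a, e_t=1.42e-7, e_l=3.25e-7, alpha=3.5e-7):
    """Per-device energy E_n = a_n*E^c_n + (1-a_n)*E^l_n.

    d: workloads in bits; a: binary offloading decisions;
    e_t, e_l, alpha: J/bit coefficients (Section 5 values).
    """
    E_c = e_t * d + alpha * d   # offloading: transmission + edge computation
    E_l = e_l * d               # local execution
    return a * E_c + (1 - a) * E_l

d = np.array([1e6, 2e6])        # workloads in bits
a = np.array([1, 0])            # WD 1 offloads, WD 2 computes locally
E = total_energy(d, a)
```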

Time Delay
In addition to the total energy consumption, another major factor affecting the system's overall efficiency is the time delay. It includes the transmission delay when WDs offload tasks to the MEC server, the processing delay at the MEC server, and the WDs' local execution latency.
Considering the delay under computation offloading, when WD_n's task is offloaded to the edge server, we use c_n to denote the bandwidth allocated to WD_n for task transmission. The latency is then

T^c_n = d_n / c_n + γ_n / f_n,

where d_n / c_n is the transmission delay of WD_n, γ_n / f_n is its computation delay, and f_n is the edge processing rate. Meanwhile, the local execution delay of WD_n is

T^l_n = t^l_n d_n,

where t^l_n is the local execution delay per data bit of WD_n. Therefore, the total time delay combines the delay under computation offloading and local execution, given as T_n = T^c_n a_n + T^l_n (1 − a_n).

Problem Formulation
We formulate the joint computation offloading and bandwidth allocation for the MEC network as a mixed-integer optimization problem. We introduce the system utility Q(d, a, w, c) to represent the total cost of the MEC network:

Q(d, a, w, c) = Σ_{n∈N} E_n w_n + β max{T_n w_n | ∀n ∈ N},

where d = {d_n | n ∈ N}, a = {a_n | n ∈ N}, c = {c_n | n ∈ N}, and β weights energy consumption against task completion delay. Considering dynamic MEC scenarios with different computation tasks d and weights w, we aim to optimize the offloading decisions a and the bandwidth allocation c to minimize the system utility Q(d, a, w, c). The optimization problem is defined as

(P1): min_{a, c} Q(d, a, w, c)
s.t. a_n ∈ {0, 1}, ∀n ∈ N,
     c_n ≥ 0, ∀n ∈ N,
     Σ_{n∈N} c_n ≤ C.

Here, the last two constraints ensure that the bandwidth c_n allocated to each WD is non-negative and that the total allocation does not exceed the bandwidth limit C. Since the constraints on the offloading decision and the bandwidth allocation are decoupled, we decompose problem (P1) into an offloading decision subproblem and a bandwidth allocation subproblem (P2), as illustrated in Figure 2. Considering the continuously changing data sizes d_n and the occasionally changing task weights w_n, the offloading decision subproblem aims to find an offloading policy π that effectively produces the optimal offloading decision a*. The subproblem (P2) then optimizes the bandwidth allocation c given a*, subject to the bandwidth constraints above.
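Once a decision a and an allocation c are fixed, the system utility Q can be evaluated directly. A sketch follows; the per-bit coefficients default to the simulation values of Section 5, and β = 1 is our assumption, since the paper does not fix its value in the model.

```python
import numpy as np

def system_utility(d, a, w, c, gamma, f=10e9, beta=1.0,
                   e_t=1.42e-7, e_l=3.25e-7, alpha=3.5e-7, t_l=4.75e-7):
    """Q(d, a, w, c) = sum_n E_n*w_n + beta * max_n T_n*w_n.

    Defaults follow Section 5; beta = 1.0 is an assumed value.
    """
    E = np.where(a == 1, (e_t + alpha) * d, e_l * d)       # energy per WD
    c_safe = np.where(a == 1, c, 1.0)                      # avoid div-by-zero
    T = np.where(a == 1, d / c_safe + gamma / f, t_l * d)  # delay per WD
    return np.sum(E * w) + beta * np.max(T * w)

# one WD offloading a 1 Mb task over the full 100 Mbps uplink
Q = system_utility(d=np.array([1e6]), a=np.array([1]), w=np.array([1.0]),
                   c=np.array([1e8]), gamma=np.array([2.375e8]))
```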
Given a*, the subproblem (P2) is convex and can be efficiently solved with the standard CVX toolbox.
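As a solver-free alternative sketch (our own construction, not the paper's method): for a fixed offloading decision, only the max-delay term depends on c, so the optimal allocation for the offloading WDs equalizes their weighted delays and can be found by bisection on the achievable weighted delay T.

```python
def allocate_bandwidth(d, w, gamma, C=100e6, f=10e9, tol=1e-9):
    """Min-max bandwidth allocation for the offloading WDs (subproblem P2).

    Minimizes max_n w_n * (d_n/c_n + gamma_n/f) subject to sum(c) <= C,
    c_n >= 0, by bisecting on the achievable weighted delay T.
    """
    def needed(T):
        # bandwidth each WD needs so that w_n*(d_n/c_n + gamma_n/f) <= T
        return [d_n / (T / w_n - g_n / f)
                for d_n, w_n, g_n in zip(d, w, gamma)]

    # lower bound: computation delay alone; upper bound: grow until feasible
    lo = max(w_n * g_n / f for w_n, g_n in zip(w, gamma)) + tol
    hi = lo + max(w_n * d_n for w_n, d_n in zip(w, d)) / (C / len(d)) + 1.0
    while sum(needed(hi)) > C:
        hi *= 2
    for _ in range(100):          # bisection on T
        mid = (lo + hi) / 2
        if sum(needed(mid)) <= C:
            hi = mid
        else:
            lo = mid
    return needed(hi)

# symmetric case: the optimum splits C equally between the two WDs
c = allocate_bandwidth([1e6, 1e6], [1.0, 1.0], [2.375e8, 2.375e8])
```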
We assume that the workload d changes faster than the task weights w due to the heterogeneity of mobile devices; the MEC scenario changes occasionally when the task weights change. Existing deep learning-based computation offloading methods are mostly designed for static network scenarios and require a large number of training samples; once the MEC scenario changes, it is difficult for the deep learning model to adapt to the new scenario. In the next section, we propose a deep learning-based algorithm that uses few training samples and can quickly adapt to new MEC scenarios.

Deep Supervised Learning-Based Offloading Algorithm
In this section, we propose a DSLO algorithm to achieve the optimal binary offloading decision. The algorithm takes advantage of the translation invariance of the convolutional neural network to capture the local features of input data and accelerate the convergence of the offloading process. For dynamic MEC task scenarios, DSLO can quickly adapt to the new workload d and task weights w. The overall pipeline of the DSLO algorithm is illustrated in Figure 3.

We consider a dynamic MEC task scenario, where the task weight w of the scenario is variable and the task workloads d change independently. The dataset contains data samples drawn from multiple MEC scenarios; each data sample (d_k, w_k, a_k) is a combination of workload, weight factor, and optimal offloading decision. The dataset is further randomly split into I_train for the training phase and I_test for the testing phase. Correspondingly, we denote K_train and K_test as the index sets of data samples for training and testing under each MEC scenario.

Neural Network Architecture
We implement DSLO based on two classical neural network architectures, CNN and DNN. As shown in Figure 4, the DSLO-CNN model uses three convolutional layers and three fully connected layers. Each layer is followed by the rectified linear unit (ReLU) activation function except for the output layer, where the Sigmoid activation function pushes the output values toward 0 or 1. The DSLO-CNN algorithm captures the correlation between d_i and w_i by scanning block by block, focusing on local content. The DSLO-DNN structure is simpler and uses only fully connected layers, as shown in Figure 5. It weights and processes all inputs through full connections; although the influence of every input on each node is considered, the correlation between the workloads d and the task weights w is not highlighted. Table 1 describes the parameters of both DSLO architectures. Each convolution kernel has size 2. BN denotes batch normalization, whose size depends on that of the corresponding layer. We use ReLU activation at every layer except the output layer to improve generalization and alleviate overfitting to a single scenario. We use BN to address internal covariate shift in the feature maps, preventing overfitting by smoothing gradient flow and improving network generalization. Its mathematical expression is

y_i = γ (x_i − μ) / sqrt(σ² + ε) + β,

where the batch of inputs is {x_i | i = 1, . . . , S}, S is the batch size, y_i is the output after BN, μ and σ² are the mean and variance of the input batch, and γ and β are the learnable scale factor and offset, respectively. To avoid a zero denominator, ε is usually set to 1.0 × 10⁻⁵ for numerical stability [28].
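The BN transform above can be written directly in NumPy; this is a minimal per-batch sketch that omits the running statistics a deployed BN layer keeps for inference.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """y_i = gamma * (x_i - mu) / sqrt(sigma^2 + eps) + beta, per feature."""
    mu = x.mean(axis=0)    # batch mean of each feature
    var = x.var(axis=0)    # batch variance of each feature
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = batch_norm(x)          # each feature is normalized to mean 0, variance ~1
```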

Train DSLO
To train DSLO, we randomly sample a batch of MEC scenarios {Ψ_i | i ∈ I_train}, where each scenario Ψ_i contains the training samples indexed by K_train ⊂ K. We merge the training samples from different MEC scenarios into M = {(d_k, w_k, a_k)_i | k ∈ K_train, i ∈ I_train}. During each round of training, we randomly sample a set of training samples M_b from M, denoted as D = {(d_m, w_m, a_m) | m ∈ M_b}. We sample randomly with replacement to prevent the model from becoming overly dependent on a particular batch of data during training and falling into a local optimum [29]. Then, we train the neural network model by minimizing the mean-squared-error loss [30]

L(θ) = (1 / |M_b|) Σ_{m∈M_b} || f_θ(d_m, w_m) − a_m ||²,

where f_θ is the parameterized model representing the mapping from computation task to offloading decision. The model parameters θ are then updated by gradient descent [31],

θ ← θ − η ∇_θ L(θ),

where ∇_θ L(θ) is the gradient of the loss with respect to θ and η is a step-length hyperparameter. After G rounds of training iteration, the network model converges and is used as the offloading decision prediction model for a dynamic MEC network composed of different workloads and weight factors.
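The training recipe above (sample a batch with replacement, minimize the MSE loss, take a gradient-descent step) can be sketched with a toy single-neuron stand-in for f_θ. This is not the paper's CNN/DNN; the synthetic "offload when the workload is large" labels, the step length, and the round count are all our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy dataset: offload (a=1) when the normalized workload exceeds 0.5
d_feat = rng.uniform(0.0, 1.0, size=(256, 1))
labels = (d_feat[:, 0] > 0.5).astype(float).reshape(-1, 1)

W = np.zeros((1, 1)); b = np.zeros(1)
eta = 2.0                                    # step length (assumed)
for _ in range(3000):                        # G training rounds (assumed)
    idx = rng.integers(0, 256, size=64)      # sample M_b with replacement
    x, y = d_feat[idx], labels[idx]
    pred = sigmoid(x @ W + b)                # f_theta
    err = pred - y                           # d(MSE)/d(pred), up to factor 2
    grad = err * pred * (1 - pred)           # chain rule through sigmoid
    W -= eta * x.T @ grad / len(idx)         # gradient-descent update
    b -= eta * grad.mean(0)

acc = ((sigmoid(d_feat @ W + b) > 0.5) == (labels > 0.5)).mean()
```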

Test DSLO
During the test phase, for each MEC scenario in the test set {Ψ_i | i ∈ I_test}, we randomly sample a batch of data samples indexed by K_test ⊂ K from Ψ_i. For each test sample, we input the workload d^i_k and weight factor w^i_k into DSLO and obtain the predicted offloading decision â = f_θ(d^i_k, w^i_k). Finally, based on â, we evaluate the network utility Q by solving the subproblem (P2). Before leaving this section, we provide the pseudo-code of the DSLO algorithm in Algorithm 1:

Algorithm 1: DSLO
// Training procedure
for G training rounds do
    Sample a batch M_b of training samples from M
    Compute the loss according to (14)
    Update the parameters of the network model according to (15)
end
// Testing procedure
Given a new sample set from K_test, generate its offloading decision â
Evaluate the network utility Q by solving the subproblem (P2)

Parameter Settings
In this section, we study the numerical performance of the proposed DSLO algorithm under different MEC scenarios. In the simulation, we follow the settings in [27] and set the local computation time of the mobile devices as t^l_n = 4.75 × 10⁻⁷ s/bit and the local processing energy consumption as e^l_n = 3.25 × 10⁻⁷ J/bit. We assume that the input data size d_n of each task is randomly distributed between 10 and 30 MB. In addition, we consider the x264 video encoding application as the computing task of the WDs, where the number of computation cycles is proportional to the input size as γ_n = 1900 cycles/byte × d_n [32]. The transmission energy consumption of the mobile devices is e^t_n = 1.42 × 10⁻⁷ J/bit. The CPU rate of the edge server is 10 × 10⁹ cycles/s. We further set α = 3.5 × 10⁻⁷ J/bit and the uplink bandwidth limit C = 100 Mbps. Table 2 summarizes the simulation parameters for easy reference. To train and evaluate the DSLO algorithm, we pre-generate a dataset composed of various MEC task scenarios. For an MEC task scenario, given the workload d_t and the task weight w_t at time t, we compute the best offloading action by enumerating all 2^N binary combinations, calculating the corresponding utilities Q_t by solving subproblem (P2), and choosing the offloading decision a*_t with the minimum utility Q*_t as the optimal decision. Then, we add the state parameters (d_t, w_t) together with the optimal decision and utility (a*_t, Q*_t) into the dataset. Without loss of generality, we set the task weights W = {1, 1.5}. We evaluate an MEC network with N = 10 WDs and include all 1024 MEC task scenarios with different weight combinations in the dataset, which are further used to train and test DSLO.
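The label-generation step above can be sketched as follows. For tractability, this sketch splits the bandwidth equally among the offloading WDs instead of solving (P2) exactly, so it is an approximation of the paper's procedure; the parameter defaults are the Section 5 values and β = 1 is assumed.

```python
import itertools
import numpy as np

def best_offloading(d, w, gamma, C=100e6, f=10e9, beta=1.0,
                    e_t=1.42e-7, e_l=3.25e-7, alpha=3.5e-7, t_l=4.75e-7):
    """Label a sample by enumerating all 2^N binary decisions.

    Bandwidth is shared equally among offloaders as a simplification of
    solving (P2) exactly; beta = 1.0 is an assumed weight.
    """
    N = len(d)
    best_a, best_Q = None, np.inf
    for bits in itertools.product([0, 1], repeat=N):
        a = np.array(bits)
        k = a.sum()
        c = C / k if k else 1.0                       # equal share of uplink
        E = np.where(a == 1, (e_t + alpha) * d, e_l * d)
        T = np.where(a == 1, d / c + gamma / f, t_l * d)
        Q = np.sum(E * w) + beta * np.max(T * w)
        if Q < best_Q:
            best_a, best_Q = a, Q
    return best_a, best_Q

d = np.array([1e6, 2e6])              # workloads in bits
gamma = 1900.0 * d / 8.0              # cycles: 1900 cycles/byte
a_star, Q_star = best_offloading(d, np.array([1.0, 1.0]), gamma)
```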
Regarding network utility performance, we compare our DSLO algorithm with three representative benchmarks:

1. Random strategy: each WD's binary offloading decision is generated at random.
2. Linear Relaxation (LR) algorithm [33]: the binary offloading variable constrained by (11) is relaxed to a real number â_n ∈ [0, 1]. The optimization problem (P1) with this relaxed constraint is convex with respect to {â_n} and can be solved with a convex optimization toolbox. Once â_n is obtained, the binary offloading decision is recovered as a_n = 1 if â_n ≥ 0.5, and a_n = 0 otherwise.
3. Greedy strategy: we enumerate all offloading decision combinations and adopt the best one.
The following numerical results are an average of 200 realizations running on a laptop with Intel(R) Core(TM) i7-4710MQ CPU @ 2.50 GHz.
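The rounding rule used by the LR benchmark amounts to a one-line thresholding step, sketched here with a function name of our own choosing:

```python
import numpy as np

def round_lr(a_hat, thresh=0.5):
    """Map the relaxed LR solution a_hat in [0,1]^N to binary decisions."""
    return (np.asarray(a_hat) >= thresh).astype(int)

a = round_lr([0.2, 0.5, 0.9])
```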

DSLO with Plenty of Training Samples
We first evaluate the performance upper bound of DSLO with a large number of data samples. In Figure 6, we study the convergence of the DSLO algorithm for a specific MEC scenario with task weights w = {1, 1.5, 1, 1.5, 1, 1.5, 1, 1.5, 1, 1.5}. In this scenario, we set |M| = |K_train| = 5000, |M_b| = 128, and |K_test| = 1000, so there are enough training samples to train DSLO until convergence. For better illustration, we define the normalized system utility

Q̂ = Q* / Q,

the ratio between the utility of the enumerated optimal offloading action and that of the action generated by the evaluated algorithm. Both DSLO-CNN and DSLO-DNN converge to a normalized system utility Q̂ ≥ 99%. Regarding convergence speed, DSLO-CNN takes only 1000 time frames, while DSLO-DNN takes 5000 time frames. Therefore, the DSLO-CNN algorithm converges quickly to an approximately optimal Q̂. In Figure 7, we compare the system utility of different offloading algorithms under varying numbers of WDs; the MEC task scenarios differ due to the different numbers of WDs. For fairness, we consider two MEC scenarios, Ψ_1 and Ψ_2, whose weight factors are evenly distributed; the per-WD weight factors are given in Table 3. Comparing Ψ_1 and Ψ_2 at the same position, e.g., when N = 15 and the CNN is adopted, the minimum system utility Q obtained is 2230 and 2183, respectively, which is a small gap. To evaluate the performance upper bound, both DSLO variants were trained with 5000 independent workloads until convergence. Each value in Figure 7 is an average over 1000 independent test samples. Across different MEC scenarios, both DSLO-DNN and DSLO-CNN achieve approximately optimal performance, matching the Greedy algorithm and significantly outperforming the LR and Random algorithms.
For a given algorithm, changing only the task weights (priority relationships) among WDs does not produce a large difference in the results. In Table 4, we compare the CPU execution latency of the DSLO algorithms for different numbers of WDs. The CPU execution latency of DSLO is significantly lower than that of the widely used heuristic LR algorithm. When N = 5, the CPU execution latency of LR is 4.14 × 10⁻² s, and it increases roughly linearly with the number of WDs, reaching 3.14 × 10⁻¹ s for 15 WDs. In comparison, the test latency of DSLO always stays below 10⁻³ s. While DSLO-CNN achieves better convergence performance due to its more complex neural network, its training latency is about twice that of DSLO-DNN. Interestingly, the test latency of DSLO-CNN is almost the same as that of DSLO-DNN, generating an offloading decision within 1 ms. In Figure 8, we evaluate the effect of BN in both DSLO algorithms. BN greatly improves the convergence of DSLO in terms of both convergence speed and the achieved system utility. As shown in Figure 8a, with the BN structure, the training time of DSLO-CNN is reduced from 4000 time frames to 1000 time frames. As shown in Figure 8b, models without BN layers have a high tendency to fall into local optima during training.

DSLO with Few Training Samples
In Figure 9, we study the convergence of the DSLO algorithm for dynamic MEC scenarios with few training samples per scenario. Specifically, we extract 70% of the 1024 MEC task scenarios as the training set, i.e., |I_train| = 714, and set |M_b| = 128. For each MEC scenario, we use only |K_train| = 2 and |K_test| = 100. As shown in Figure 9, DSLO-DNN converges slowly to a normalized system utility Q̂ = 0.98 by the 9000th time frame. In comparison, it converges to Q̂ ≥ 0.99 within 4000 time frames when there are plenty of training samples, as shown in Figure 6. On the other hand, the DSLO-CNN algorithm maintains a high convergence speed, reaching a normalized system utility Q̂ ≥ 99% within 2000 time frames. In Figure 10, we evaluate the convergence of DSLO with different numbers of training scenarios. As shown in Figure 10a, when |I_train| = 10%|I|, we randomly choose 103 of the 1024 MEC scenarios to build the training dataset; the other system parameters are the same as in Figure 9. It follows that there are only 206 training samples when |I_train| = 10%|I|. Even with so few training samples, DSLO-CNN achieves a normalized system utility Q̂ = 98% after convergence. As shown in Figure 10b, when |I_train| = 30%|I|, the DSLO-CNN algorithm converges within t = 1000 with Q̂ = 99%. In Figure 10c,d, DSLO-CNN converges faster and achieves more stable performance as the number of training scenarios increases. In all these cases, DSLO-DNN converges much more slowly than DSLO-CNN. Hence, the proposed DSLO-CNN is suitable for dynamic MEC scenarios with few training samples. In Figure 11, we further test the adaptability of DSLO to new MEC scenarios statistically, plotting both the median and the confidence intervals of Q̂ over all test scenarios. As shown in Figure 11a, we set the number of training scenarios to 70% of all scenarios, i.e., |I_train| = 70%|I|.
We adjust the training set by changing |K_train| while keeping the other parameters unchanged. When |K_train| = 1, there is only one training sample per MEC scenario {Ψ_i | i ∈ I_train}. The median of DSLO-CNN is greater than 98%, while the median of DSLO-DNN is no more than 96%. As the number of training samples increases, DSLO adapts better to new scenarios. In general, DSLO-CNN outperforms DSLO-DNN, especially with few training samples. As shown in Figure 11b, we evaluate DSLO trained under different sizes of |I_train|, while the number of training samples per scenario is kept constant at |K_train| = 2. When |I_train| = 10%|I|, the median of DSLO-CNN is greater than 96%. When |I_train| = 70%|I|, the median of DSLO-CNN reaches more than 99%, and the confidence intervals lie mostly above 98%. In comparison, the median of DSLO-DNN is always less than 98%. Under different training set scales, DSLO-CNN always adapts to new MEC scenarios better than DSLO-DNN.

Comparisons with MELO and ARM
To further evaluate the effectiveness and superiority of the proposed algorithm, we compare it with other computation offloading algorithms. In addition to the LR [33] algorithm, two benchmark algorithms, MELO [11] and ARM [34], are evaluated. MELO is a meta-learning-based computation offloading algorithm that learns prior knowledge from historical MEC scenarios and quickly converges in new MEC scenarios by fine-tuning with only a few samples. ARM is an adaptive risk minimization framework that extracts global features from different scenarios to update the network model and improves the model's adaptability to unknown scenarios.
Before fine-tuning, the models are pre-trained with a sufficient number of samples, where |I train | = 70%|I | and |K train | = 4. It is worth mentioning that the input of the MELO model does not contain weight information; the other parameters are set in the same way as for the DSLO algorithm. To reduce the effect of randomness, we report the average of the fine-tuning tests over 10 MEC scenarios. As shown in Figure 12, the LR algorithm produces a constant normalized system utility Q = 0.934. The MELO algorithm converges to the same performance as DSLO-DNN within 30 fine-tuning steps, resulting in a normalized system utility Q above 98.5%. The ARM algorithm, implemented on a CNN, generates a normalized system utility Q = 0.99, which is greater than those of MELO and DSLO-DNN. However, the normalized system utility Q generated by DSLO-CNN is always greater than 99%. Hence, DSLO-CNN makes full use of the weight information and achieves better model initialization parameters than the other computation offloading algorithms with only a small number of training samples, allowing it to adapt quickly to new MEC scenarios.
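The normalized system utility Q reported throughout these comparisons, and its average over the 10 fine-tuning test scenarios, can be sketched as below. Since the system utility is minimized, we assume here that Q is normalized against the optimal (minimal) utility so that 1.0 means optimal; this definition is an assumption for illustration, as the exact formula is given earlier in the paper.

```python
def normalized_utility(achieved, optimal):
    """Normalized system utility: ratio of the optimal (minimal) utility
    to the utility achieved by the learned offloading policy.
    Values close to 1.0 indicate near-optimal decisions."""
    return optimal / achieved

def average_over_scenarios(per_scenario_q):
    """Average the normalized utility over the fine-tuning test
    scenarios, as done over 10 MEC scenarios in Figure 12."""
    return sum(per_scenario_q) / len(per_scenario_q)
```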

Conclusions
In this paper, we propose a deep supervised learning-based offloading algorithm, DSLO, which requires few training samples and can adapt to different MEC computing task scenarios. We take advantage of convolutional neural networks to efficiently handle multidimensional state parameters. The proposed algorithm can quickly adapt to dynamically changing MEC scenarios and achieve near-optimal offloading decisions. Extensive numerical results validate the accuracy of the proposed algorithm, demonstrate its performance advantage over the existing MELO algorithm, and characterize both the DSLO-CNN and DSLO-DNN variants. DSLO outperforms the current benchmark algorithms in terms of system utility and CPU execution latency. In general, DSLO-CNN requires fewer training samples than DSLO-DNN and can achieve 99% normalized system utility Q by using only four training samples per MEC scenario.
The proposed DSLO algorithm relies on a small amount of labeled data, which limits its application in MEC scenarios whose optimal labels are unavailable or difficult to obtain. Moreover, DSLO requires all training data to be collected and trained on a centralized MEC server, which may raise security and privacy issues related to personal data. For future work, we expect a DSLO algorithm implemented via reinforcement learning methods without manually labeled data. Furthermore, considering data privacy in distributed MEC scenarios, implementing DSLO on a federated learning framework is interesting and necessary. Considering the heterogeneity and complexity of MEC networks, we expect that DSLO can be further extended to dynamically changing MEC scenarios with online learning, which will benefit future offloading algorithm deployment on large-scale MEC networks.

Conflicts of Interest:
The authors declare no conflict of interest.

Notation
N        The number of WDs
w_n      The weight assigned to the n-th WD
a_n      Offloading decision of the n-th WD
d_n      Workload of the n-th WD
γ_n      Number of CPU cycles required by the n-th WD to complete its task
E^c_n    Energy consumption of the edge server to execute the n-th WD's task
E^t_n    Energy consumption to transfer the offloading task of the n-th WD
c_n      Bandwidth allocated to the n-th WD
C        Total bandwidth
T^c_n    Edge processing delay of the n-th WD
f_n      Processor computing speed of the n-th WD
e^l_n    Local per-unit-data energy consumption of the n-th WD
E^l_n    Local computing energy consumption of the n-th WD
t^l_n    Local per-unit-data execution delay of the n-th WD
T^l_n    Local execution delay of the n-th WD
Q(·)     The system utility function
π        Offloading policy function
L(·)     The training loss function of the model
Ψ        Scenario of MEC
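As a reading aid for the notation above, the sketch below illustrates how the per-device local quantities compose from the unit-data symbols, under the assumption that energy and delay scale linearly with the workload d_n (consistent with the pairing of e^l_n/E^l_n and t^l_n/T^l_n in the table, but not the paper's full utility model, which also covers the edge-side terms E^c_n, E^t_n, and T^c_n).

```python
def local_cost(d_n, e_l_n, t_l_n):
    """Local execution cost of the n-th WD.

    d_n:   workload (amount of task data)
    e_l_n: local energy consumption per unit of data
    t_l_n: local execution delay per unit of data
    Returns (E_l_n, T_l_n) assuming linear scaling with workload,
    an illustrative assumption based on the notation table.
    """
    E_l_n = e_l_n * d_n  # local computing energy consumption
    T_l_n = t_l_n * d_n  # local execution delay
    return E_l_n, T_l_n
```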