Deep Reinforcement Learning-Based Video Offloading and Resource Allocation in NOMA-Enabled Networks

Abstract: With the proliferation of video surveillance deployments and related applications, real-time video analysis is critical to achieving intelligent monitoring, autonomous driving, and similar services. Analyzing video streams with high accuracy and low latency through traditional cloud computing is a non-trivial problem. In this paper, we propose a non-orthogonal multiple access (NOMA)-based edge real-time video analysis framework with one edge server (ES) and multiple user equipments (UEs). A cost minimization problem composed of delay, energy, and accuracy is formulated to improve the quality of experience (QoE) of the UEs. To solve this problem efficiently, we propose a joint video frame resolution scaling, task offloading, and resource allocation algorithm based on the Deep Q-Learning Network (JVFRS-TO-RA-DQN), which effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence. JVFRS-TO-RA-DQN consists of two DQN networks to mitigate the curse of dimensionality: one selects the offloading and resource allocation action, and the other selects the resolution scaling action. The experimental results show that JVFRS-TO-RA-DQN effectively reduces the cost of edge computing and converges better than the baseline schemes.


Introduction
Along with the development of communication infrastructure and embedded systems, a large number of cameras have been deployed around the world to collect environmental information for applications including traffic monitoring, electronic health care, object tracking, and smart robotics [1]. According to Cisco's forecast, video streaming will account for 80% of total Internet traffic by 2023 [2]. In particular, surveillance cameras can produce video at nearly 25-30 frames per second, and even low-frame-rate (1.25 Hz) moving image sequences generate more than 100 Mb of data per second. However, camera sensors have limited computing ability and only support low-complexity recognition algorithms, which limits the video recognition accuracy. In addition, deploying a network that relies only on cameras to meet the computing requirements is too costly. To obtain information from a video, the video frames captured by the user equipments (UEs) must be sent to a data center with abundant computing resources, where the desired scene information is extracted. However, the bandwidth needed to transmit the video stream efficiently while preserving analysis accuracy is prohibitive. Video analytics is also computation-intensive, and analyzing the video on the UEs or in cloud data centers alone cannot meet the resource and delay requirements. Although many researchers have tried to solve this problem, the delay of video analysis remains a major challenge because existing solutions do not perform well enough. Mobile edge computing (MEC)-based video analysis is the only feasible method to satisfy the demand for processing a large volume of video streams in real time.
As an emerging computing architecture, MEC decentralizes computing ability from centralized cloud centers to users and mobile devices close to the edge of the network [3]. Computing tasks at the edge of the network not only reduces the delay and the bandwidth load, but also reduces the risk of privacy leakage and improves data security. Most existing work on task offloading adopts orthogonal frequency division multiple access (OFDMA) [4], which can only provide a single channel resource for each UE and has low bandwidth utilization. To improve the utilization of bandwidth resources, non-orthogonal multiple access (NOMA) has been proposed [5]. Unlike orthogonal multiple access (OMA), NOMA accommodates more users via non-orthogonal resource allocation; that is, it serves multiple users on the same subchannel at the same frequency and time, thus improving spectral efficiency [6,7]. Applying NOMA to the video offloading process can effectively increase the capacity of the bandwidth resources, reduce the delay of video stream offloading, and improve the user's quality of experience (QoE) [8].
However, real-time video analysis is not easy. Offloading video to the edge for analysis places high demands on resources and latency, and requires dynamically balancing the transmission and computation processes. Specifically, analyzing video frame by frame requires a large amount of computing resources [9,10], which may easily lead to a long delay [11]. Moreover, the complex network structure consumes several GB of memory. Because the computational resources of the edge are limited, reasonable offloading decisions and resource allocation are needed to satisfy the delay requirement. Generally, the offloading decision optimization problem is combined with resource allocation, which leads to a non-convex, NP-hard problem in which it is difficult to determine the best way to allocate resources in a distributed environment.
Recently, deep reinforcement learning (DRL) has been widely used in many mobile communication applications [12]. Using DRL to improve the performance of large-scale dynamic video analysis is promising; however, existing video analytics systems lack an effective mechanism to optimize the video configuration adaptively, which causes low resource utilization. We use the video frame resolution as an example to illustrate the impact of the video configuration on the accuracy and latency of video analysis. In this paper, we divide the total time into several equal time slots and define "fps" as the number of video frames in each time slot. Images with a higher resolution may be analyzed more accurately; however, they may cause a long transmission delay. When the network bandwidth is time-varying, transmitting video with a higher resolution may increase resource consumption and cause high latency, while transmitting video with a lower resolution may reduce the network utilization and the analysis accuracy. Therefore, it is necessary to propose an algorithm that selects the video frame resolution adaptively according to the state of the network.
Based on the above discussion, this paper proposes the joint video frame resolution scaling, task offloading, and resource allocation algorithm based on the Deep Q-Learning Network (JVFRS-TO-RA-DQN) to optimize video edge offloading and resource allocation decisions. The key contributions of this paper are as follows.

• A two-layer NOMA-enabled video edge scheduling architecture is proposed, where UEs are divided into different NOMA clusters, and the tasks generated by UEs in the same cluster are offloaded over a common subchannel to improve the offloading efficiency.

• The QoE of the UEs is optimized by formulating a cost minimization problem composed of delay, energy, and accuracy, in order to balance these three metrics.

• The JVFRS-TO-RA-DQN algorithm is proposed to solve the joint optimization problem. It contains two DQN networks: one selects the offloading and resource allocation action, and the other selects the video frame resolution scaling action, which effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence.

• The experimental results show that the JVFRS-TO-RA-DQN algorithm achieves better performance than the baseline schemes in terms of improving video analysis accuracy, reducing total delay, and decreasing energy consumption.

The rest of this article is organized as follows. In Section 2, we review the related work. Section 3 describes the problem and the scheduling model, and Section 4 presents the details of our algorithm. In Section 5, the simulation results are analyzed, and finally, we summarize our work in Section 6.

NOMA-Enabled Task Offloading in MEC Scenarios
Video surveillance systems have been used extensively in various industries and are gradually becoming intelligent, for example through face recognition and object detection [13,14]. With the rapidly growing number of video monitoring applications, an increasing amount of video data must be transferred and analyzed. MEC has great potential for reducing delays [15][16][17][18] and energy consumption [19][20][21] due to its intelligence in computing and caching. Offloading computation-intensive and latency-sensitive tasks to the MEC server can reduce the transmission burden and latency. To further optimize spectrum resource allocation and achieve high-speed transmission and wide coverage, most studies have combined NOMA and MEC. The authors in [22] focused on the partial offloading and binary offloading problems under time division multiple access and NOMA, and tried to maximize the computing efficiency of the system. The authors in [23] proposed an ultra-dense heterogeneous network (UDHN)-based NOMA-MEC system and studied the multi-SBS, multi-user resource allocation problem to minimize user energy consumption and task delay. The authors in [24] focused on reducing the transmission delay and optimizing workload offloading allocation in a downlink NOMA-based MEC system; to solve this problem, they designed a channel quality ranking algorithm to obtain the optimal offloading decision. The authors in [25] considered random task arrivals and the uncertainty of channel conditions, and proposed a decentralized DRL framework to solve the power allocation problem, where the state was based on local observations. In addition, NOMA is applied in many practical scenarios in which multiple users offload tasks to the MEC server simultaneously, such as robotics, unmanned aerial vehicles (UAVs), and smart healthcare, to improve spectrum utilization. The authors in [26] proposed a communication-enabled indoor intelligent robots (IRs) service framework, which adopted NOMA to support highly reliable communications. The efficiency and communication reliability of the IRs were maximized using a DRL-based algorithm. The authors in [27] considered a computation offloading framework in which UAVs used NOMA and MEC techniques to serve mobile users. They introduced federated learning and reinforcement learning to address the privacy restrictions between the UAVs. To satisfy the ultra-reliable low-latency connectivity requirements of remote e-Health systems, the authors in [28] applied NOMA to e-Health systems and showed that NOMA performs very well in fifth-generation-and-beyond scenarios. These solutions provide insights into applying NOMA to enable efficient task offloading in MEC scenarios, and also provide a feasible scheme for the efficient transmission of video data.

Video Analysis in MEC Scenarios
Due to the increasing resource demand of video surveillance applications [29], offloading them to edge devices for computing has been widely studied in industry and academia. These studies focus on optimizing the target through intelligent offloading or adaptive configuration. The authors in [30] observed ROI changes from the perspective of the UE and, to handle them, decoupled the rendering and offloading parts, using fast object tracking locally. The authors in [31] investigated edge-end cloud collaboration for real-time video analytics and designed an online algorithm to achieve near-optimal utility by adjusting the quality of the video frames generated on the UEs. The authors in [32] designed a new video configuration decision-making system, which examined the influence of video content on the frame rate and the resolution of the video stream. The authors in [33] optimized the video configuration and network bandwidth allocation using Lyapunov optimization and Markov approximation, which addressed the resource limitations and dynamic network changes in edge-based video analysis systems. To account for fairness and long-term system cost while optimizing the overall user QoE, the authors in [34] proposed an intelligent edge cache system to address the bandwidth requirement and delay tolerance of 360° panoramic video. To realize secure video sharing in vehicular edge computing, the authors in [35] designed an attribute-based encryption algorithm with static and dynamic attributes, and used a blockchain to record access strategies, which could ensure the data security and privacy of the video footage. To improve the QoE of live-streaming video, the authors in [36] first selected the candidate transcoding tasks by their contribution to popularity-weighted video quality and then assigned these tasks to MEC in a greedy manner. The authors in [37] proposed a segment prefetching and edge caching algorithm to improve the QoE of Hyper Text Transfer Protocol (HTTP) adaptive video streaming. They first proposed and analyzed different segment prefetching strategies that adapt dynamically to the current conditions of the network and the needs of service providers. Moreover, they presented segment prefetching policies based on different approaches and techniques, and studied their performance and feasibility. However, for real surveillance video, the resolution can only be selected downward. When the network resources are sufficient, super-resolution techniques can be considered to obtain a higher video frame resolution and higher video analysis accuracy.

Video Offloading Based on DRL
In the last few years, the development of artificial intelligence (AI) technologies has led to rapid progress of DRL in modeling, routing, and resource management in model-free environments. An adaptive video configuration network was pursued in [38] based on a black-box approach, independent of a detailed analytical performance model. The authors designed an intelligent system named Cuttlefish, a smart coder that adapts to the needs of the users without any pre-programmed models or specific assumptions. The authors in [39] dealt with the problem of joint configuration adaptation and bandwidth allocation in an edge-assisted real-time video analysis system. They presented a novel approach that selects the configurations for multiple video streams immediately, based on the state of the network and the content of the video. To enable collaboration in the MEC network, an AI-based task allocation algorithm trained with a self-play strategy was presented in [40]. The algorithm could detect a change in the network environment and adjust the resource allocation decision accordingly. The authors in [41] proposed a two-layer learning model based on a DQN and a back propagation neural network to solve the joint decision of task offloading, wireless channel allocation, and image compression ratio selection in video analysis, and to balance the accuracy of image recognition against the processing delay. The authors in [42] presented a new approach to allocating resources in MEC networks using a radio map and DRL. They then presented a collaborative offloading and resource allocation algorithm to reduce system latency and energy consumption. The authors in [43] considered the real-time video analytics of cameras based on edge coordination. To realize highly energy-efficient video analysis in a digital twin, a mobile device and edge coordination video analysis framework based on deep reinforcement learning was proposed, which takes energy consumption, analysis accuracy, and delay into consideration. However, using a DQN network to train the multi-parameter problem requires calculating the probabilities of the various actions and selecting the action with the maximum probability, which leads to a high training delay and reduces effectiveness. Therefore, it is worthwhile to study how to increase the training efficiency while guaranteeing the precision of the network.

System Model
The video edge scheduling model consists of two layers with different functionalities, the end layer and the edge layer, as shown in Figure 1. In the end layer, a set of UEs, denoted by $\mathcal{M} = \{1, 2, \ldots, M\}$, is randomly distributed on the ground. $\mathcal{N} = \{1, 2, \ldots, N\}$ represents the set of NOMA clusters, and $\mathcal{K} = \{1, 2, \ldots, K\}$ denotes the set of subchannels. All UEs in a NOMA cluster share one subchannel at the same time for offloading, and each subchannel has equal bandwidth. The UEs adopt a binary offloading rule; that is, the task of each UE must be either processed locally or offloaded to the edge layer. The UEs continuously transmit video analysis tasks to the edge layer. In the edge layer, a MEC server is integrated on a computation access point (CAP), which receives the tasks offloaded to the edge layer, and the MEC server provides computing services for these tasks.
For the video analysis task $D_m$ generated by UE $m$, $\tau$ represents the number of bits required for a pixel to carry information, and $\eta_m^2$ indicates the video frame resolution of task $D_m$.
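To make the link between resolution and task size concrete, the sketch below assumes that the raw size of one frame is the product of the bits per pixel and the pixel count, that is, $\tau \eta_m^2$ bits. The product form and the function name `frame_bits` are assumptions made for illustration; the text only defines $\tau$ and $\eta_m^2$ individually. The value of 24 bits per pixel is the one listed in the parameter settings.

```python
def frame_bits(side_px: int, bits_per_pixel: int = 24) -> int:
    """Raw size in bits of a square frame of side_px x side_px pixels.

    Assumption: size = bits_per_pixel * pixel_count (tau * eta^2).
    """
    if side_px <= 0 or bits_per_pixel <= 0:
        raise ValueError("side_px and bits_per_pixel must be positive")
    return bits_per_pixel * side_px * side_px


if __name__ == "__main__":
    # Resolutions used in the experiments range from 200x200 to 700x700 px.
    for side in (200, 500, 700):
        print(f"{side}x{side} px -> {frame_bits(side)} bits")
```

Under this assumption, moving from 200 × 200 px to 700 × 700 px multiplies the raw frame size by 12.25, which is why the resolution choice dominates the transmission cost.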
Although the edge environment is constantly changing, the network state and video data are stable within a short time range. Therefore, we split time into discrete time slots of equal duration. At the start of each time slot, the resources are reconfigured according to the present state and the historical trend to obtain the best resource distribution for the overall, long-term result. In the rest of this section, we explain the NOMA-enabled transmission model (Section 3.1) and the edge computing model (Section 3.2).

NOMA-Enabled Transmission Model
We assume that each UE can only be grouped into one NOMA cluster [44]. We define $x_{m,n} = 1$ to indicate that UE $m$ is assigned to NOMA cluster $n$, and $x_{m,n} = 0$ otherwise. Then, $\sum_{n=1}^{N} x_{m,n} \le 1$ for every UE $m$. Since the result obtained after video processing is very small, we do not consider the process of sending the result back to the UEs, and only consider the process of offloading the video stream to the MEC server. The uplink transmission rate of UE $m$ under the NOMA scheme depends on the following quantities: $W$ denotes the total transmission bandwidth, which is divided equally among the $K$ subchannels; $p_{m,k}$ represents the transmit power of UE $m$ to the MEC server on subchannel $k$; $h_{m,k}$ represents the channel gain between UE $m$ and the MEC server on subchannel $k$; and $\sigma_k^2$ indicates the noise power on subchannel $k$. The transmission delay of UE $m$, given by Equation (4), additionally depends on $\rho_m$, the compression ratio of the video frame for UE $m$, which is determined by the video resolution and bit rate [45].
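As a hedged illustration of how these quantities could combine, the sketch below uses the textbook uplink NOMA model with successive interference cancellation (SIC): the receiver decodes the UEs of a cluster in descending order of received power, so each UE only sees interference from the UEs decoded after it. This decoding order, the function name `noma_uplink_rates`, and the example numbers are assumptions for illustration and are not taken from the paper's rate expression.

```python
import math


def noma_uplink_rates(powers_w, gains, noise_w, total_bandwidth_hz, num_subchannels):
    """Per-UE uplink rates (bit/s) for UEs sharing one NOMA subchannel.

    Assumption: SIC decodes UEs in descending order of received power p*h,
    so a UE is interfered only by the weaker UEs that are decoded later.
    """
    if len(powers_w) != len(gains):
        raise ValueError("powers_w and gains must have the same length")
    sub_bw = total_bandwidth_hz / num_subchannels  # equal split of W over K
    received = [p * h for p, h in zip(powers_w, gains)]
    order = sorted(range(len(received)), key=lambda i: received[i], reverse=True)
    rates = [0.0] * len(received)
    for pos, i in enumerate(order):
        # Signals decoded later have not been cancelled yet.
        interference = sum(received[j] for j in order[pos + 1:])
        sinr = received[i] / (interference + noise_w)
        rates[i] = sub_bw * math.log2(1.0 + sinr)
    return rates


if __name__ == "__main__":
    # Two UEs in one cluster, W = 12 MHz split over K = 4 subchannels.
    rates = noma_uplink_rates([0.5, 0.5], [1e-6, 1e-7], 1e-9, 12e6, 4)
    for ue, rate in enumerate(rates):
        print(f"UE {ue}: {rate / 1e6:.2f} Mbit/s")
```

A useful sanity check on this model is that the sum of the per-UE rates equals the capacity of a single user transmitting with the total received power, which is the usual argument for the spectral efficiency of NOMA within a cluster.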
When the video stream task $D_m$ is offloaded to the edge for computing, the energy consumption generated by UE $m$ during transmission is given by Equation (5).
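The transmission delay and energy can be sketched as follows. Two assumptions are made here and are not confirmed by the text: the compressed frame size is the raw size multiplied by the compression ratio $\rho_m$, and the transmission energy is the transmit power multiplied by the transmission time. The function names and example values are hypothetical.

```python
def transmission_delay_s(raw_bits: float, compression_ratio: float, rate_bps: float) -> float:
    """Upload time of one frame.

    Assumption: compressed size = compression_ratio * raw_bits.
    """
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return compression_ratio * raw_bits / rate_bps


def transmission_energy_j(transmit_power_w: float, delay_s: float) -> float:
    """Energy spent by the UE while transmitting (power x time)."""
    return transmit_power_w * delay_s


if __name__ == "__main__":
    # Hypothetical example: a 200x200 px frame at 24 bit/px, compressed to 10%,
    # uploaded at 10 Mbit/s with 0.5 W transmit power.
    delay = transmission_delay_s(960_000, 0.1, 10e6)
    energy = transmission_energy_j(0.5, delay)
    print(f"delay = {delay * 1e3:.2f} ms, energy = {energy * 1e3:.2f} mJ")
```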

Edge Computation Model
(1) Edge Computing: We denote $F$ as the total computing capacity of the MEC server, and $\kappa_m$ as the ratio of computing capacity allocated by the MEC server to UE $m$. These quantities determine the computation delay of task $D_m$ offloaded to the MEC server, given by Equation (6). The energy consumption generated by UE $m$ during edge processing, given by Equation (7), involves $\upsilon = 10^{-27}$, the effective switched capacitance of the CPU, which is determined by the CPU hardware architecture.
(2) Local Computing: For task $D_m$ computed locally, we use $F_m^{loc}$ to represent the computing capacity of UE $m$, which determines both the latency and the energy consumption of processing the task locally.
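The two computing modes can be compared with a small sketch. It assumes the commonly used model in which the delay is the number of required CPU cycles divided by the available frequency, and the energy is $\upsilon f^2 c$ for $c$ cycles at frequency $f$. These closed forms and the function names are assumptions for illustration; the cycle count uses the 100 cycles/bit figure from the parameter settings.

```python
def compute_delay_s(cycles: float, capacity_hz: float) -> float:
    """Processing time, assuming delay = required cycles / CPU frequency."""
    if capacity_hz <= 0:
        raise ValueError("capacity_hz must be positive")
    return cycles / capacity_hz


def dynamic_energy_j(cycles: float, capacity_hz: float,
                     switched_capacitance: float = 1e-27) -> float:
    """CPU energy, assuming the model E = upsilon * f^2 * cycles."""
    return switched_capacitance * capacity_hz ** 2 * cycles


if __name__ == "__main__":
    cycles = 100 * 960_000          # 100 cycles/bit for a 960,000-bit frame
    edge_hz = 0.25 * 12e9           # kappa_m = 0.25 of a 12 GHz MEC server
    local_hz = 1e9                  # a UE inside the [0.4, 2] GHz range
    print("edge delay :", compute_delay_s(cycles, edge_hz), "s")
    print("local delay:", compute_delay_s(cycles, local_hz), "s")
    print("local energy:", dynamic_energy_j(cycles, local_hz), "J")
```

With these example numbers the edge share finishes the task three times faster than the UE, so offloading pays off whenever the added transmission delay stays below the difference.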

Problem Formulation
To optimize multiple conflicting goals jointly, a common approach is to assign different weights to the goals and optimize their weighted sum. In this article, improving analysis accuracy and reducing processing latency are the basic goals. According to [46], the analytical accuracy $\phi_m$ of task $D_m$ is expressed as the ratio of the number of correctly identified objects to the total number of objects in a video frame; the resulting accuracy function, Equation (10), is widely used in the literature [47][48][49].
Combining Equations (4) and (6), the total latency generated when UE $m$ offloads to the MEC server is composed of the transmission delay and the computation delay. Combining Equations (5) and (7), the total energy consumption generated when UE $m$ offloads to the MEC server is composed of the transmission energy consumption and the computation energy consumption. According to the computation and communication models, the total latency required for task $D_m$ to be processed at time slot $t$, and the corresponding total energy consumption, are determined by whether the task is processed locally or offloaded. The network state and video content are constantly changing, so the offloading decision and resource allocation strategies must be adjusted continually to accommodate the dynamics of the environment. When designing adaptive algorithms, our goal is to optimize the cost function, consisting of delay $T_m$, energy consumption $E_m$, and video analysis accuracy $\phi_m$, under long-term resource constraints. The cost minimization problem is modeled based on the design of the utility function in [47]. In the cost function, $\omega_t$ is the weight for the delay and $\omega_e$ is the weight for the energy consumption, where $\omega_t + \omega_e < 1$.
Constraint $C_1$ guarantees that a UE can be assigned to only one NOMA cluster. Constraint $C_2$ denotes that each UE can only choose between computing locally and offloading to the MEC server. Constraint $C_3$ ensures that the total delay of UE $m$ is less than the tolerance time $T_m^{max}$ of task $D_m$, and that the total energy consumption of UE $m$ does not exceed the threshold $E_m^{max}$. Constraint $C_4$ guarantees that the transmit power of UE $m$ does not exceed the threshold $P_m^{max}$. Constraint $C_5$ ensures that the allocated computation capacity does not exceed the total capacity of the MEC server. Constraint $C_6$ guarantees that the video frame resolution of task $D_m$ is higher than the threshold $\eta_m^{min}$. Two important challenges in solving this problem are the difficulty of the problem itself and the prediction of future network status, video content, and other information. Since edge nodes typically run for months or years, dealing with unpredictable future information requires relaxing the constraints in each time slot of the objective function to the average over a long period of time. In addition, the optimization problem is a mixed-integer nonlinear program, which is hard to solve even if the future information is known. To address these two challenges, we need to design an algorithm that provides the best offloading and resource allocation for video streaming without being able to foresee future information.
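The structure of the objective and of the per-UE constraints can be sketched as follows. The exact cost expression is not reproduced here, so the sketch assumes a weighted sum in which delay and energy are penalized with weights $\omega_t$ and $\omega_e$ and accuracy is rewarded with the remaining weight $1 - \omega_t - \omega_e$. That form, the class `TaskOutcome`, and the helper names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    delay_s: float     # total delay T_m of the task
    energy_j: float    # total energy E_m of the task
    accuracy: float    # analysis accuracy phi_m in [0, 1]


def weighted_cost(outcome: TaskOutcome, w_delay: float, w_energy: float) -> float:
    """Assumed cost: w_t*T + w_e*E - (1 - w_t - w_e)*phi, with w_t + w_e < 1."""
    if w_delay < 0 or w_energy < 0 or w_delay + w_energy >= 1:
        raise ValueError("weights must be non-negative and sum to less than 1")
    w_accuracy = 1.0 - w_delay - w_energy
    return (w_delay * outcome.delay_s
            + w_energy * outcome.energy_j
            - w_accuracy * outcome.accuracy)


def is_feasible(assignment_row, outcome: TaskOutcome,
                max_delay_s: float, max_energy_j: float) -> bool:
    """Check C1/C2 (binary, at most one cluster) and C3 (delay, energy caps)."""
    binary = all(x in (0, 1) for x in assignment_row)
    one_cluster = sum(assignment_row) <= 1
    within_limits = (outcome.delay_s <= max_delay_s
                     and outcome.energy_j <= max_energy_j)
    return binary and one_cluster and within_limits


if __name__ == "__main__":
    outcome = TaskOutcome(delay_s=0.02, energy_j=0.05, accuracy=0.9)
    print("cost:", weighted_cost(outcome, 0.4, 0.3))
    print("feasible:", is_feasible([0, 1, 0, 0], outcome, 0.030, 15.0))
```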

Deep Reinforcement Learning-Based Algorithm
Based on the optimization target and constraints, a DRL-based algorithm is adopted. In the rest of this section, we first define the state space, the action space, and the reward function. We then present a more detailed description of the proposed algorithm framework.

Deep Reinforcement Learning Model
The deep reinforcement learning process reformulates the computation offloading problem as a Markov decision process (MDP). A typical MDP is a five-element tuple $\{S, A, P, R, \gamma\}$, where $S$ represents the state space, $A$ represents the finite action space, $P$ is the state transition probability, $R$ represents the reward function, and $\gamma \in [0, 1]$ is the discount factor for future rewards. The elements of the tuple are defined as follows.
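The role of the discount factor $\gamma$ can be illustrated with the standard discounted return, $\sum_{t} \gamma^{t} r_t$, which is the quantity an MDP agent seeks to maximize. The helper name `discounted_return` is chosen here for illustration.

```python
def discounted_return(rewards, gamma: float) -> float:
    """Sum of gamma**t * rewards[t], accumulated from the last step backwards."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    for gamma in (0.0, 0.5, 1.0):
        print(f"gamma = {gamma}: return = {discounted_return(rewards, gamma)}")
```

With $\gamma = 0$ only the immediate reward counts, while $\gamma = 1$ weighs all future rewards equally; intermediate values trade off short-term cost against long-term cost.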

State Space
At time slot $t$, the state of the UEs includes basic information about the computational task. The state $S_{m,t} \in S_t$ of UE $m$ at time slot $t$ is given by Equation (16).

Action Space
At time slot $t$, the action of UE $m$ consists of two vectors: the task offloading and resource allocation vector $\alpha_m$, and the video frame resolution scale vector $\beta_m$. Vector $\alpha_m$ contains two actions: the resource allocation action $x_{m,n}$, and the offloading decision action $\sum_{n=1}^{N} x_{m,n}$. Here, $x_{m,n}$ represents whether UE $m$ is assigned to NOMA cluster $n$, and $\sum_{n=1}^{N} x_{m,n}$ represents whether task $D_m$ needs to be offloaded to the MEC server. Vector $\beta_m$ represents the action for selecting the video frame resolution compression ratio, where $\beta_m \in [0.5, 1.5]$.
In the MEC network proposed in this paper, the MEC server distributes the offloading and resource allocation policy to the UEs; however, the video frame resolution must also be selected.
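One way to picture the two action vectors is to enumerate them as discrete choices, which is what a DQN output layer requires. In the sketch below, the offloading action is either "local" (all $x_{m,n} = 0$) or a one-hot assignment to one cluster, and the scaling factor $\beta_m$ is discretized into evenly spaced levels on $[0.5, 1.5]$. The discretization step and the function names are assumptions for illustration; the text only gives the interval.

```python
def offloading_actions(num_clusters: int):
    """Index 0 is local computing; index n is a one-hot assignment to cluster n."""
    actions = [tuple([0] * num_clusters)]
    for n in range(num_clusters):
        actions.append(tuple(1 if j == n else 0 for j in range(num_clusters)))
    return actions


def scaling_levels(num_levels: int, low: float = 0.5, high: float = 1.5):
    """Evenly spaced candidate values for the resolution scaling factor beta."""
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    step = (high - low) / (num_levels - 1)
    return [low + i * step for i in range(num_levels)]


if __name__ == "__main__":
    alpha_choices = offloading_actions(4)
    beta_choices = scaling_levels(5)
    print("offloading actions:", alpha_choices)
    print("scaling levels    :", beta_choices)
    # A single joint network needs one output per (alpha, beta) pair,
    # whereas two networks need only the sum of the two sizes.
    print("joint outputs   :", len(alpha_choices) * len(beta_choices))
    print("separate outputs:", len(alpha_choices) + len(beta_choices))
```

The last two lines show why splitting the decision across two networks helps: the joint action count grows multiplicatively, while the two separate heads grow additively.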

Reward Function
The cost minimization function in this article contains multiple factors. Specifically, our goal is to reduce latency and energy consumption, as well as to improve video analysis accuracy under long-term resource constraints. Therefore, the reward functions are designed based on the optimization problem.
We propose JVFRS-TO-RA-DQN, which contains two DQN networks. The first DQN network selects the optimal offloading and resource allocation strategy, and the second DQN network selects the appropriate video frame resolution scaling factor to ensure the maximum accuracy of video analysis and to reduce the system delay and energy consumption. The two reward functions are described below.
After action $A_{m,t}$ is performed, a reward $r_{m,t}$ is obtained, which guides the next action $A_{m,t+1}$ that the edge server chooses to perform. The reward function is generally related to the objective function, which aims to minimize the delay of the system in the context of task offloading and resource allocation. However, the aim of reinforcement learning training is to maximize the long-term cumulative reward, and the offloading reward function of the UEs at time $t$ is designed accordingly. For the UEs, a penalty should be applied if the accuracy of the next state is not within the threshold after taking action $\alpha_{m,t}$, and the resolution scaling reward function is designed accordingly. Finally, we use $r_{m,t} = \xi_{m,t} + \zeta_{m,t}$ to represent the total reward of the system, that is, the sum of the offloading reward and the resolution scaling reward. By maximizing the long-term cumulative reward, an efficient joint video frame resolution scaling, task offloading, and resource allocation strategy, which we abbreviate as JVFRS-TO-RA-DQN in this paper, can be developed to minimize system delay and energy consumption while improving video analysis accuracy.
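A minimal sketch of a two-part reward of this kind is given below. The concrete forms are assumptions, since the reward expressions are not reproduced here: the offloading reward is taken as the negative of the cost (so that maximizing reward minimizes cost), and the scaling reward is zero when the accuracy meets a threshold and a fixed negative penalty otherwise. The function names and the penalty value are hypothetical.

```python
def offloading_reward(cost: float) -> float:
    """Assumed form: lower cost gives higher reward."""
    return -cost


def scaling_reward(accuracy: float, min_accuracy: float, penalty: float = 1.0) -> float:
    """Assumed form: penalize only when accuracy falls below the threshold."""
    return 0.0 if accuracy >= min_accuracy else -penalty


def total_reward(cost: float, accuracy: float, min_accuracy: float,
                 penalty: float = 1.0) -> float:
    """Total reward as the sum of the two parts."""
    return offloading_reward(cost) + scaling_reward(accuracy, min_accuracy, penalty)


if __name__ == "__main__":
    print(total_reward(cost=0.25, accuracy=0.90, min_accuracy=0.80))
    print(total_reward(cost=0.25, accuracy=0.70, min_accuracy=0.80))
```

Giving each network its own dense signal is one way to read the claim that the two-layer design avoids the sparsity of a single reward function.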

JVFRS-TO-RA-DQN Algorithm
In this section, we propose JVFRS-TO-RA-DQN to solve the problem of joint video frame resolution scaling, task offloading, and resource allocation. The proposed algorithm is based on DQN, which can learn from offline historical data gathered through simulated experience, without requiring full knowledge of the environment. The detailed algorithm is shown in Figure 2.
According to the current state of the system, the DQN algorithm maximizes the predefined reward function by choosing an action $A_{m,t}$ from a finite set of actions.
In the training process, in addition to the state $S_{m,t}$, action $A_{m,t}$, policy $\pi$, and reward function $r_{m,t}$, the state-action value function $Q^{\pi}(S_{m,t}, A_{m,t})$ determines the action $A_{m,t}$ taken in state $S_{m,t}$ through a mapping function $\pi(A_{m,t} | S_{m,t})$. If $Q^{\pi}(S_{m,t}, A_{m,t})$ is updated at each time step, it is assumed to converge to the optimal state-action value function $Q^{\pi'}(S_{m,t}, A_{m,t})$. According to the Bellman equation, the quality of a specific action in a given state is evaluated by Equation (20), in which $\lambda \in (0, 1]$ represents the learning rate, which reflects how quickly the algorithm adapts to a new environment. Since the complexity of Equation (20) grows exponentially with the number of state-action pairs, the Q values become more difficult to solve as the number of state-action pairs increases. Predicting the Q values of different state-action pairs accurately is therefore the crux of the DQN algorithm. In contrast to traditional tabular Q-learning, DQN uses a replay memory to store the data generated after each step. During training, the network samples a batch of stored experiences from the replay memory for learning. The replay memory holds enough training data to fit the Q values of different state-action pairs with a deep neural network $Q(\cdot; \theta)$ parameterized by the weights $\theta$. $Q^{\pi}(S, A)$ is updated by minimizing the loss function in Equation (22). The gradient descent algorithm is used to minimize the loss in Equation (22) and thereby update the weights $\theta$, which reduces the error between the evaluation and the target. As a result, the neural network predicts more accurately as training continues. The JVFRS-TO-RA-DQN algorithm is given in Algorithm 1.
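The update rule and the replay memory can be illustrated without a neural network. The sketch below uses the standard Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \lambda\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, on a table, and returns the squared temporal-difference error that a DQN would minimize by gradient descent on $\theta$. It is a simplified stand-in, not the paper's PyTorch implementation; the class and function names are chosen here for illustration.

```python
import random
from collections import defaultdict, deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int, rng=random):
        size = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


def td_target(q, reward, next_state, actions, gamma):
    """Bootstrapped target: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def q_update(q, state, action, reward, next_state, actions, lr, gamma):
    """Apply one Q-learning step and return the squared TD error (the loss)."""
    error = td_target(q, reward, next_state, actions, gamma) - q[(state, action)]
    q[(state, action)] += lr * error
    return error ** 2


if __name__ == "__main__":
    q_table = defaultdict(float)
    memory = ReplayMemory(capacity=100)
    memory.push("s0", 0, 1.0, "s1")
    for state, action, reward, next_state in memory.sample(1):
        loss = q_update(q_table, state, action, reward, next_state,
                        actions=[0, 1], lr=0.5, gamma=0.9)
        print("loss:", loss, "Q(s0, 0):", q_table[("s0", 0)])
```

Sampling from the buffer instead of learning from consecutive steps breaks the correlation between successive transitions, which is the usual motivation for replay memory in DQN.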
Algorithm 1: JVFRS-TO-RA-DQN
Initialize state $S_{m,t}$ in Equation (16)
[...]
13: Compute the offloading target Q value and the scaling target Q value
14: Train the offloading target Q value and the scaling target Q value
15: Perform gradient descent with respect to $\theta$
16: Update the evaluate Q-network and target Q-network
17: end for
18: end for
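The action selection step of a two-network agent might look like the sketch below, in which each network's Q-value vector yields its own action. The epsilon-greedy exploration rule is common DQN practice and is assumed here; the surviving steps of Algorithm 1 do not state which exploration rule is used. The function names and the example Q values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def select_joint_action(offload_q, scaling_q, epsilon: float, rng: random.Random):
    """Choose the offloading action and the scaling action independently."""
    offload_action = epsilon_greedy(offload_q, epsilon, rng)
    scaling_action = epsilon_greedy(scaling_q, epsilon, rng)
    return offload_action, scaling_action


if __name__ == "__main__":
    rng = random.Random(0)
    offload_q = [0.1, 0.9, 0.3, 0.2, 0.0]   # local + four clusters
    scaling_q = [0.2, 0.1, 0.6, 0.4, 0.3]   # five scaling levels
    print(select_joint_action(offload_q, scaling_q, epsilon=0.1, rng=rng))
```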

Experimental Results and Discussion
In this section, we evaluate the performance of JVFRS-TO-RA-DQN in terms of the cost under different network conditions, the delay and analysis accuracy under different minimum frame resolutions, and the convergence performance. We first describe the parameter settings before turning to the simulation results.

Parameter Settings
In this study, we adopt Python 3.7 as the software tool to simulate the framework, and the deep learning framework used in JVFRS-TO-RA-DQN is PyTorch 1.4.0. The hardware is a computer with an Intel I7-13700HQ @ 2.5 GHz and 16 GB of memory. To verify the effectiveness of the proposed JVFRS-TO-RA-DQN algorithm, a network consisting of one MEC server, four NOMA clusters, and ten UEs is considered, with the UEs randomly distributed within [0, 200] m of the MEC server. The total communication bandwidth $W$ is 12 MHz. We define $c_m$ in relation to the data size of the task as $c_m = c_m^{bit} l_m$, where $c_m^{bit}$ is 100 cycles/bit. The computational capacity of the MEC server $F$ is 12 GHz, the computational capacity of the UEs $F_m^{loc}$ lies in the range [0.4, 2] GHz, the average energy consumption threshold $E_m^{max}$ is 15 J, and the number of bits required to carry information per pixel of video $\tau$ is 24. The minimum transmission power $p$ of the UEs is 0.5 W, while the maximum tolerance time $T_m^{max}$ of task $D_m$ is 30 ms. Based on the experimental data from [46], we adopt Equation (10) as the analytic accuracy function on both the edge server and the UEs, and the video frame resolution is higher than 40,000 px (200 × 200). The parameters are shown in Table 1.
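For reproducing the setup, the parameters listed above can be collected in a single configuration object. The sketch below is one possible layout; the class name `SimulationConfig` and the field names are hypothetical, while the values are the ones stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationConfig:
    num_mec_servers: int = 1
    num_noma_clusters: int = 4
    num_ues: int = 10
    max_distance_m: float = 200.0                    # UEs placed within [0, 200] m
    bandwidth_hz: float = 12e6                       # total bandwidth W
    cycles_per_bit: float = 100.0                    # c_bit
    mec_capacity_hz: float = 12e9                    # F
    ue_capacity_range_hz: tuple = (0.4e9, 2e9)       # range of F_loc
    max_energy_j: float = 15.0                       # E_max
    bits_per_pixel: int = 24                         # tau
    min_transmit_power_w: float = 0.5
    max_delay_s: float = 0.030                       # tolerance time of 30 ms
    min_resolution_px: tuple = (200, 200)

    def subchannel_bandwidth_hz(self, num_subchannels: int) -> float:
        """Equal split of the total bandwidth across the subchannels."""
        return self.bandwidth_hz / num_subchannels


if __name__ == "__main__":
    config = SimulationConfig()
    print(config)
    print("per-subchannel bandwidth:", config.subchannel_bandwidth_hz(4), "Hz")
```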

Result Analysis
To assess its performance fairly, the proposed scheme was compared with four baseline schemes. A comparison between the six algorithms is shown in Table 2.
We first run the experiment in a specific scenario with k = 4, M = 10, F = 10 GHz, W = 12 MHz, and $\eta_m^{\min} = 200 \times 200$ px, and record the system delay under the different schemes. We run 1500 experiments and average the results, which are shown in Table 3. The average latency of the JVFRS-TO-RA-DQN scheme is 167.71 ms, which is about 48.14% less than that of the JVFRS-TO-RA-DQN-OMA scheme, about 33.38% less than that of the TO-RA-DQN-NOMA scheme, and about 59.01% less than that of the MA-NOMA scheme.
The influence of different communication bandwidths on the cost function is shown in Figure 3.
In addition to the delay and energy consumption, the video analytic accuracy is also affected by the video frame resolution. We vary the minimum video frame resolution in this experiment to estimate the influence of resolution on accuracy. We run 1500 experiments and average the results in a scenario with k = 4, M = 10, F = 10 GHz, W = 10 MHz, and $\eta_m^{\min}$ ranging from 200 × 200 px to 700 × 700 px. When calculating the accuracy, we assume that the system can detect all objects when the video frame resolution is 700 × 700 px. Otherwise, the video frame resolution is lower than 700 × 700 px, and the optimized resolution lies between the minimum and the maximum video frame resolutions. The video frame resolution of the LCO scheme, the ECO-OMA scheme, and the TO-RA-DQN-NOMA scheme is set to the average of the minimum and the maximum video frame resolutions.
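The relative latency reductions quoted for Table 3 are simple arithmetic. The sketch below uses hypothetical helper names to show how such a percentage is computed and how a baseline latency could be back-computed from the reported 167.71 ms. The back-computed values are only what the quoted percentages imply; they are not figures read from Table 3.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


def implied_baseline(proposed: float, reduction_pct: float) -> float:
    """Baseline value implied by a proposed value and a stated reduction."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return proposed / (1.0 - reduction_pct / 100.0)


proposed_ms = 167.71  # average latency of JVFRS-TO-RA-DQN quoted in the text
stated_reductions = {  # reductions quoted in the text, in percent
    "JVFRS-TO-RA-DQN-OMA": 48.14,
    "TO-RA-DQN-NOMA": 33.38,
    "MA-NOMA": 59.01,
}

for scheme, pct in stated_reductions.items():
    base = implied_baseline(proposed_ms, pct)
    print(f"{scheme}: implied average latency of roughly {base:.1f} ms")
```

Applying `relative_reduction` to an implied baseline and the proposed latency recovers the quoted percentage, which is a quick consistency check when reading such tables.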
Figure 5 shows the effect of different minimum frame resolutions on the average delay. As illustrated in Figure 5, the average delay of all the algorithms increases as the minimum video frame resolution increases, except for MA-NOMA. This is because MA-NOMA always keeps the highest resolution, so a change in the minimum frame resolution does not affect it. The proposed algorithm maintains the lowest average delay, which means that it can adaptively adjust the video frame resolution to reduce the average system delay. The average delay of the JVFRS-TO-RA-DQN-OMA scheme is higher than that of the proposed algorithm, which means that the NOMA-enabled MEC system outperforms the OMA-based one in terms of delay.
Figure 6 shows the effect of different minimum frame resolutions on the video analytic accuracy. The MA-NOMA algorithm always maintains the highest accuracy because the system can detect all objects when the video frame resolution is 700 × 700 px. The analytic accuracy of the proposed algorithm is lower than that of TO-RA-DQN-NOMA because the proposed algorithm sacrifices some analytic accuracy to reduce delay and energy consumption. The influence of delay and energy on the system gradually increases; when the minimum video frame resolution is larger than 500 × 500 px, the video analytic accuracy is almost steady.
We then depict the convergence process of the TO-RA-DQN-NOMA algorithm and the proposed algorithm in a scenario with k = 4, M = 10, F = 10 GHz, W = 12 MHz, and $\eta_m^{\min} = 200 \times 200$ px.
Figure 7 depicts the performance differences between the TO-RA-DQN-NOMA algorithm and the proposed algorithm at different learning rates. First, the proposed algorithm obtains higher rewards than the TO-RA-DQN-NOMA scheme. This is because different algorithms lead to different offloading decisions, so the video frames offloaded to the MEC server differ, and the average delay and video analysis accuracy differ as well. The simulation data in Table 4 indicate that the proposed algorithm outperforms the TO-RA-DQN-NOMA algorithm in terms of video analytic accuracy. Second, the proposed algorithm converges faster than the TO-RA-DQN-NOMA algorithm: with learning rates of $10^{-6}$ and $10^{-7}$, the proposed algorithm converges to a reward value after about 200 training epochs, whereas the TO-RA-DQN-NOMA algorithm needs more than 400 training epochs at the same learning rates.
Figure 8 shows the convergence performance of the proposed algorithm under different numbers of UEs. We run the experiment in a scenario with k = 4, F = 10 GHz, W = 10 MHz, $\eta_m^{\min} = 200 \times 200$ px, and M = 10, 15, and 20. The algorithm converges rapidly and steadily regardless of the number of UEs. Furthermore, the average reward with 10 UEs clearly exceeds that with 15 UEs. The average reward converges after approximately 120 training epochs with 10 UEs, after approximately 270 epochs with 15 UEs, and after more than 400 epochs with 20 UEs. This is because, when there are fewer UEs, the UEs can offload computing tasks at a higher efficiency than when there are more UEs, which reduces the energy consumption of the UEs while enhancing the users' QoE in terms of latency, energy consumption, and video analytic accuracy.
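The text reports convergence epochs read from the reward curves but does not state the criterion used. As an illustration only, the following sketch applies one possible rule: the curve is considered converged from the first epoch after which its moving average stays within a small relative band around the final moving average. The function name, the window size, and the tolerance are all assumptions, not the authors' procedure.

```python
from typing import Optional, Sequence


def convergence_epoch(rewards: Sequence[float], window: int = 20,
                      tol: float = 0.01) -> Optional[int]:
    """Start index of the first moving-average window after which every
    window average stays within `tol` (relative) of the final average.
    Returns None if there are fewer than `window` rewards."""
    if window <= 0 or len(rewards) < window:
        return None
    # averages[i] is the mean of rewards[i : i + window].
    averages = [sum(rewards[i:i + window]) / window
                for i in range(len(rewards) - window + 1)]
    final = averages[-1]
    band = tol * max(abs(final), 1e-12)
    # Walk backwards to the last window that falls outside the band.
    for i in range(len(averages) - 1, -1, -1):
        if abs(averages[i] - final) > band:
            return i + 1
    return 0


# Synthetic curve (not data from the paper): flat at 0, then flat at 1.
curve = [0.0] * 50 + [1.0] * 100
print(convergence_epoch(curve, window=10, tol=0.01))
```

A rule like this is sensitive to the window and tolerance, so epochs obtained with it are only comparable across curves when the same settings are used.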

Conclusions
With the popularization of video surveillance and the diversification of its applications, real-time video stream analysis is of great value for intelligent monitoring, smart cities, autonomous driving, and other scenarios. In this paper, we designed a NOMA-enabled smart video analysis system with multiple UEs to improve the video analytic accuracy, reduce the average delay, and decrease the energy consumption. To optimize the QoE of the UEs, we formulated a cost minimization problem composed of delay, energy, and accuracy that weighs these three factors against one another. This cost minimization problem is an NP-hard, high-dimensional, nonlinear mixed-integer program whose optimal solution is difficult to compute. We therefore proposed the JVFRS-TO-RA-DQN algorithm, which contains two DQN networks: one selects the offloading and resource allocation actions, and the other selects the video frame resolution scaling actions. This design effectively overcomes the sparsity of the single-layer reward function and accelerates training convergence. Extensive simulation experiments showed that the JVFRS-TO-RA-DQN algorithm achieves better performance than the baseline schemes in improving video analysis accuracy, reducing total delay, and decreasing energy consumption.

Figure 1. The video edge scheduling architecture.
The video stream computation task of UE m is expressed as $D_m = \{l_m, c_m, T_m^{\max}\}$, where $l_m$ represents the data size of the video stream task $D_m$, $c_m$ represents the total central processing unit (CPU) cycles required for the task, and $T_m^{\max}$ denotes the maximum tolerable time for the task. If task $D_m$ is not completed within $T_m^{\max}$, it is declared to have failed. The data size $l_m$ of task $D_m$ can be expressed as

Algorithm 1: JVFRS-TO-RA-DQN algorithm
Input: $D_m$, $w$, $F$, $\gamma$.
Output: $\alpha_m$, $\beta_m$.
1: Initialize the evaluate network with random weights $\theta$
2: Initialize the target network as a copy of the evaluate network, with weights $\theta'$
3: Initialize replay memory D
4: Initialize an empty state set S_Set
5: for episode = 1 to Max do
6:

(1) Local Computing Only (LCO): the video streams are processed entirely at the UEs, with $\sum_{n=1}^{N} x_{m,n} = 0, \forall m \in M$, and the video frame resolution is fixed.
(2) Edge Computing Only via OMA (ECO-OMA): the video streams are entirely offloaded to and processed at the MEC server, with $x_{m,n} = 1, \forall m \in M, n \in N$, and the video frame resolution is fixed.
(3) JVFRS-TO-RA-DQN via OMA (JVFRS-TO-RA-DQN-OMA): unlike JVFRS-TO-RA-DQN, the task $D_m$ generated by UE m is offloaded to the MEC server through OMA, and each UE has an independent subchannel. We use $y_m$ to denote whether task $D_m$ is offloaded to the MEC server: $y_m = 1$ denotes that task $D_m$ is offloaded to the MEC server; otherwise, $y_m = 0$.
(4) Task offloading and resource allocation algorithm based on DQN via NOMA (TO-RA-DQN-NOMA) [50]: compared with JVFRS-TO-RA-DQN, TO-RA-DQN-NOMA does not consider changes in the video frame resolution, which means that it has a fixed video frame resolution.
(5) Maximum accuracy algorithm via NOMA (MA-NOMA) [46]: compared with JVFRS-TO-RA-DQN, MA-NOMA achieves maximum accuracy by using the largest frame resolution with NOMA.
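The two fixed baselines differ only in how the offloading indicators $x_{m,n}$ are set. The following minimal sketch builds those indicator matrices as nested lists and checks the LCO condition that each row sums to zero. The function names and the number of subchannels used in the example are hypothetical; only the indicator values come from the scheme definitions above.

```python
from typing import List

Decision = List[List[int]]  # decision[m][n] holds x_{m,n}


def lco_decision(num_ues: int, num_subchannels: int) -> Decision:
    """LCO: no UE uses any subchannel, so every x_{m,n} is 0."""
    return [[0] * num_subchannels for _ in range(num_ues)]


def eco_decision(num_ues: int, num_subchannels: int) -> Decision:
    """ECO-OMA as stated in the text: x_{m,n} = 1 for all m and n."""
    return [[1] * num_subchannels for _ in range(num_ues)]


def processes_locally(row: List[int]) -> bool:
    """A UE computes locally when the sum over n of x_{m,n} is 0."""
    return sum(row) == 0


M = 10  # number of UEs used in the experiments
N = 4   # hypothetical number of subchannels, chosen only for this example

print(all(processes_locally(row) for row in lco_decision(M, N)))
print(any(processes_locally(row) for row in eco_decision(M, N)))
```

The learned schemes (3) to (5) replace these constant matrices with decisions chosen per UE and per time step, which is what the DQN agents are trained to do.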
We run 1500 experiments and average the results in a scenario with k = 4, M = 10, F = 10 GHz, $\eta_m^{\min} = 200 \times 200$ px, and W ranging from 2 MHz to 12 MHz. Figure 3 shows the effect of different communication bandwidths on the cost under the six schemes in this scenario. The cost of every scheme except LCO decreases as the communication bandwidth increases. The LCO scheme processes the video streams only at the UEs, so a change in the network communication bandwidth does not affect it, and its average cost remains almost unchanged. For the other schemes, the cost decreases because every UE can be allocated more bandwidth, which reduces the delay and energy consumption of transmission. At W = 6 MHz, the cost of the proposed algorithm is about 49.51% lower than that of the JVFRS-TO-RA-DQN-OMA scheme and about 34.17% lower than that of the TO-RA-DQN-NOMA scheme.

Figure 3. The effect of communication bandwidth on the cost.

Figure 4 depicts the effect of different computational capacities of the MEC server on the cost function. We run 1500 experiments and average the results in a scenario with k = 4, M = 10, W = 10 MHz, $\eta_m^{\min} = 200 \times 200$ px, and F ranging from 2 GHz to 12 GHz. Because the UEs in the LCO scheme do not use the computing resources of the MEC server, the cost of the LCO scheme does not change as the computational capacity of the MEC server increases. The cost of the other schemes decreases as the computational capacity of the MEC server increases, since more computing resources are allocated at the MEC server and the computation time is shortened accordingly. When the computational capacity of the MEC server is much larger than that of the UEs, the average cost is mainly determined by other factors.
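The diminishing effect of MEC capacity can be illustrated with the common model in which executing $c_m$ CPU cycles at frequency $f$ takes $c_m / f$ seconds. The sketch below assumes this model, an equal split of the server capacity among the UEs, and made-up values for the task size and for the delay that does not depend on the server; none of these numbers come from the paper.

```python
def execution_delay_s(cycles: float, frequency_hz: float) -> float:
    """Time to run `cycles` CPU cycles at `frequency_hz` (assumed model: c / f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return cycles / frequency_hz


NUM_UES = 10             # M = 10, as in the experiments
CYCLES_PER_TASK = 1.0e9  # hypothetical c_m
OTHER_DELAY_S = 0.05     # hypothetical delay unrelated to the MEC CPU


def total_delay_s(capacity_ghz: float) -> float:
    """Per-UE delay when the server capacity is shared equally (assumption)."""
    per_ue_hz = capacity_ghz * 1e9 / NUM_UES
    return OTHER_DELAY_S + execution_delay_s(CYCLES_PER_TASK, per_ue_hz)


# Hypothetical sweep of the server capacity.
for capacity in range(2, 13, 2):
    print(f"F = {capacity:2d} GHz -> delay {total_delay_s(capacity):.3f} s")
```

Under these assumptions each additional unit of capacity removes less delay than the previous one, and the total approaches the capacity-independent term, which is consistent with the observation that other factors dominate the cost once the server is much faster than the UEs.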

Figure 4. The effect of computational capacity of MEC on the cost.

Figure 5. The effect of minimum frame resolution on average delay.

Figure 6. The effect of minimum frame resolution on video analytic accuracy.

Figure 7. Convergence performance of the proposed algorithm and the TO-RA-DQN-NOMA algorithm under different learning rates.

Figure 8. Convergence performance of the proposed algorithm with different numbers of UEs.
8: With probability $\varepsilon$, select a random offloading and resource allocation decision $\alpha_{m,t}$; with probability $\delta$, select a random resolution $\beta_{m,t}$
9: Execute action $\alpha_{m,t}$ and receive a reward $\xi_{m,t}$; execute action $\beta_{m,t}$ and receive a reward $\zeta_{m,t}$
10: Combine $\alpha_{m,t}$ and $\beta_{m,t}$ as $A_{m,t}$, calculate $r_{m,t}$ from $\xi_{m,t}$ and $\zeta_{m,t}$, and observe the next state $S_{m,t+1}$
11: Store the interaction tuple $\{S_{m,t}, A_{m,t}, r_{m,t}, S_{m,t+1}\}$ in D
12: Sample a random tuple $\{S_{m,t}, A_{m,t}, r_{m,t}, S_{m,t+1}\}$ from D
13:
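The interaction steps of Algorithm 1 (random exploration for each of the two agents, reward combination, storage in replay memory, and sampling) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Q-values are plain lists standing in for the outputs of the two DQN networks, the combined reward is assumed to be the sum of $\xi$ and $\zeta$ because the combination rule is not given here, and the toy environment is entirely hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

Transition = Tuple[int, Tuple[int, int], float, int]


def epsilon_greedy(q_values: List[float], explore_prob: float,
                   rng: random.Random) -> int:
    """Pick a random action with probability `explore_prob`, else the argmax."""
    if rng.random() < explore_prob:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def combine_rewards(xi: float, zeta: float) -> float:
    """ASSUMPTION: r is the sum of the two agents' rewards."""
    return xi + zeta


def interact(state: int, q_offload: List[float], q_resolution: List[float],
             env_step: Callable[[int, int, int], Tuple[float, float, int]],
             memory: Deque[Transition], eps: float, delta: float,
             rng: random.Random) -> int:
    """One pass over steps 8 to 11: act, observe, and store the transition."""
    alpha = epsilon_greedy(q_offload, eps, rng)      # offloading/resource action
    beta = epsilon_greedy(q_resolution, delta, rng)  # resolution action
    xi, zeta, next_state = env_step(state, alpha, beta)
    reward = combine_rewards(xi, zeta)
    memory.append((state, (alpha, beta), reward, next_state))
    return next_state


def toy_env_step(state: int, alpha: int, beta: int) -> Tuple[float, float, int]:
    """Hypothetical environment with placeholder rewards."""
    return -float(alpha), -float(beta), state + 1


rng = random.Random(0)
memory: Deque[Transition] = deque(maxlen=1000)
state = 0
for _ in range(5):
    state = interact(state, [0.1, 0.5, 0.2], [0.3, 0.1],
                     toy_env_step, memory, 0.1, 0.1, rng)

sampled = rng.choice(list(memory))  # step 12: sample a stored tuple
print(len(memory), sampled)
```

Keeping separate exploration probabilities for the two agents mirrors the use of $\varepsilon$ and $\delta$ in the listing; in a full implementation the sampled tuples would be used to update the evaluate networks.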

Table 3. The average latency of six schemes.

Table 4. The video analytic accuracy of the two schemes.