Task Unloading Strategy of Multi UAV for Transmission Line Inspection Based on Deep Reinforcement Learning

Due to limited computing power and energy resources, an unmanned aerial vehicle (UAV) team usually offloads inspection tasks to the cloud for processing when performing emergency fault inspection, which leads to low efficiency of transmission line inspection. To solve this problem, this paper proposes a task offloading strategy based on deep reinforcement learning (DRL) for the scenario of multiple UAVs and a single edge server. First, a "device-edge-cloud" collaborative offloading architecture is constructed in the UAV edge environment. Secondly, the offloading of power line inspection tasks is formulated as an optimization problem that minimizes delay under the constraints of edge server computing and communication resources. Finally, the problem is modeled as a Markov decision process, and a deep Q-network (DQN) is used to obtain the minimum delay of the system. In addition, an experience replay mechanism and a greedy algorithm are introduced in the learning process to improve the offloading accuracy. The experimental results show that the proposed offloading strategy saves 54%, 37% and 26% of the task completion time compared with local offloading, cloud offloading and random offloading, respectively. It effectively reduces the UAV inspection delay and improves the transmission line inspection efficiency.


Introduction
The rapid development of the social economy is inseparable from the rapid expansion of the power grid. Large-scale and complex transmission lines require substantial resources for inspection. Because transmission lines are distributed across different environments, their hidden dangers also differ [1]. The traditional manual inspection method suffers from low inspection efficiency, many blind spots, and a high risk of safety accidents, and cannot meet the requirements of modern inspection. These shortcomings have promoted the development of unmanned aerial vehicle (UAV) inspection [2].
The use of UAVs for transmission line inspection is a key part of the interconnected power grid [3]. The computing equipment carried by a UAV has weak computing power and limited energy. Therefore, the UAV usually offloads fault tasks to the cloud for processing during transmission line inspection [4]. Cloud servers can provide strong computing power. However, the high transmission cost and relatively long delay may prevent inspectors from obtaining fault information in a timely manner, especially for delay-sensitive inspection tasks. If the delay of some tasks exceeds the allowable limit, line maintenance will be untimely and a large-scale power outage may even result [5].
Edge computing is close to the data source and offers good real-time performance, low latency, and fast response, which effectively addresses the problems of a centralized cloud. Therefore, edge computing technology is gradually being applied to the field of transmission line inspection. As a new network architecture, mobile edge computing (MEC) provides computing resources at the edge of the network (for example, edge computing servers deployed next to base stations), which can process transmission line inspection tasks at close range, reduce task delays and improve the reliability of inspection tasks [6][7][8][9]. When the battery energy of the UAV and the computing power of the edge server are both limited, making full use of edge resources is the key to improving the inspection efficiency of transmission lines, and this in turn depends on how tasks are reasonably divided and offloaded to edge servers [10,11].
Computation offloading is an important application of edge computing, and a reasonable offloading strategy can effectively improve the efficiency of task completion. Since the computing resources and energy of UAVs are limited, and the reliability of cloud services needs to be considered, a reasonable offloading strategy is required [12][13][14]. Delay affects the user's quality of service (QoS) and may cause a coupled program to fail because the calculation result of the offloaded segment is missing. Therefore, every offloading decision must at least meet the delay limit acceptable to the mobile device program [15,16]. Reference [17] uses a Markov decision process to obtain the optimal edge computing offloading strategy for dynamic program mining technology, but a fixed state transition probability matrix is difficult to obtain in practice, which reduces the accuracy of decision making. Reference [18] introduces a scenario in which a single ground base station provides computation offloading services for a single UAV. The UAV flies from the initial position to the destination; some tasks are computed locally, and the rest are offloaded to the ground base station for execution. The UAV flight path and bit allocation are jointly optimized, and a sub-optimal solution for the minimum UAV energy consumption is obtained by successive convex approximation. However, the computing power of a single UAV is limited and cannot meet the needs of multi-user intensive tasks. Based on the energy harvesting causal constraints and the UAV speed constraints, Reference [19] proposes a two-stage algorithm and a three-stage alternating algorithm to solve the computation rate maximization problem of a wireless-powered MEC system in the partial and binary offloading modes, respectively. The algorithm is fast and has low complexity. However, this strategy does not take into account the limited energy of UAVs. In Reference [20], the offloading problem is expressed as a mixed integer nonlinear programming problem, and the solution is designed based on a genetic algorithm and particle swarm optimization, so that the energy consumption to complete the task is minimized. Tang et al. [21] proposed a joint system load optimization method based on a discrete binary particle swarm optimization algorithm, which effectively reduces system delay and energy consumption. Reference [22] proposes an online task offloading and energy-efficient frequency scaling algorithm, which minimizes device energy consumption while bounding the task queue length. Liu et al. [23] used Lyapunov optimization and duality theory to reformulate the problem and decompose it into a set of sub-problems. Each sub-problem can be solved separately and in a distributed manner by mobile devices or edge servers, effectively reducing system latency. However, most of the above offloading strategies target single-user, single-edge-server scenarios.
In order to solve the above problems, this paper proposes an offloading strategy for power line inspection based on deep reinforcement learning (DRL). Different from the above methods, this offloading strategy is suitable for multiple UAVs and an edge server with limited computing power, and has a wider scope of application. In this paper, a "device-edge-cloud" collaborative offloading architecture is constructed in the UAV edge environment. The computing model is established on the basis of edge server computing and communication resource constraints, and the task offloading problem is formulated as a delay minimization problem. Then, a Markov decision process is constructed and the optimization problem is solved using a deep Q-network (DQN). In addition, an experience replay mechanism and a greedy algorithm are introduced in the learning process. The experimental results show that the offloading strategy proposed in this paper has lower delay and higher reliability, which can effectively improve the inspection efficiency of transmission lines.

UAV Inspection System Model
The DRL-based "end-edge-cloud" system model for transmission line fault inspection in a multi-UAV, single edge server (ES) environment is shown in Figure 1. The system is divided into a UAV layer, an MEC layer and a cloud layer. The UAV layer contains the UAVs conducting fault inspection within the range of the single edge computing server, denoted as N = {1, 2, 3, ..., H}. The computing resources of each UAV are limited, that is, only some of its tasks can be processed using local computing resources. The edge layer mainly consists of the base station located near the transmission line, with the edge server installed next to it, and each UAV can connect to the edge server through a wireless channel. This paper assumes that there is a base station near the transmission line that can meet the requirements of UAVs offloading tasks to the ES. The UAV's computing tasks can also be offloaded to the cloud, in which case the base station uploads the task to the cloud through a wired channel for computing. The cloud service layer contains high-performance computing resources and is directly connected to the service providers, so it can meet the requirements of UAV fault inspection tasks efficiently and quickly.

Communication Model
In the edge computing system model, the UAV and the edge server are connected wirelessly to achieve two-way data transmission. Assuming that there are no obstacles in the air, the line-of-sight channel dominates the UAV communication link. Therefore, the channel gain from the UAV to the edge server is described by a free-space path loss model. The data transmission rate of UAV $n$ to the edge server is determined by the following quantities: $y_n$ is the number of sub-channels allocated to UAV $n$, $B$ is the bandwidth of a sub-channel, $P_n$ is the transmitting power of UAV $n$, $\beta_0^{uav}$ is the channel power gain at a reference distance of 1 m, $d_n$ is the distance between UAV $n$ and the edge server, and $N_0$ is the noise power spectral density.
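As an illustration of how these quantities combine, the following minimal sketch computes a Shannon-type rate under a free-space path loss model. The exact expression is an assumption made for this example (channel gain $\beta_0^{uav}/d_n^2$ and noise power $N_0 y_n B$ over the allocated bandwidth); the function name and argument names are hypothetical.

```python
import math

def uplink_rate(y_n, B, P_n, beta0, d_n, N0):
    """Assumed Shannon-type uplink rate (bit/s) of UAV n to the edge server.

    y_n: number of allocated sub-channels, B: sub-channel bandwidth (Hz),
    P_n: transmit power (W), beta0: channel power gain at 1 m,
    d_n: UAV-to-server distance (m), N0: noise power spectral density (W/Hz).
    """
    if y_n == 0:
        return 0.0                       # no sub-channel, no transmission
    bandwidth = y_n * B                  # total bandwidth allocated to UAV n
    gain = beta0 / d_n ** 2              # free-space path loss: gain decays with d^2 (assumption)
    snr = P_n * gain / (N0 * bandwidth)  # noise power = density * bandwidth (assumption)
    return bandwidth * math.log2(1.0 + snr)
```

Under this assumed form the rate grows with the number of allocated sub-channels and falls as the UAV moves away from the edge server, which matches the qualitative role of $y_n$ and $d_n$ described above.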
In this system, the edge server and the cloud server are connected by wire. As the cloud server is far from the edge node, the edge node forwards the UAV offloading data to the cloud. The delay of data transmission is generally similar to the delay of result return, and the delay is independent of the amount of input data. Therefore, the round-trip delay generated by data transmission between the edge server and the cloud server is $2\,t^{cloud}_{off}$, where $t^{cloud}_{off}$ represents the delay generated when data is forwarded from the edge server to the cloud.

Local Calculation Delay of UAV
When the UAV chooses to perform the task locally, it does not need to transmit the input data of the task to the edge server through the wireless channel, but only needs to use its local computing resources for processing. In this case, two delays must be considered: the delay $t^{w}_n$ generated when UAV $n$ waits for idle local computing resources, and the execution delay $t^{p}_n$ generated when the UAV processes the task locally. The local computing delay of UAV $n$ is:
$$t^{L}_n = t^{w}_n + t^{p}_n = t^{w}_n + \frac{C_n}{f_n}$$
where $C_n$ represents the amount of computation required to complete the task, and $f_n$ represents the computation rate of the local computing resources of the UAV.

Delay of Offloading to Edge Server
When a UAV computing task cannot meet its requirements through local execution, it is offloaded to the edge server for processing. The delay then consists of the following three parts: $t^{w}_n$ is the delay incurred while waiting for an idle channel and queuing in the edge server; $t^{up}_n$ is the transmission delay of uploading the task to the edge server; $t^{s}_n$ is the execution delay of processing the service requirements on the edge server. Since the returned data is smaller than the input data, the return delay is generally ignored. That is, when the UAV chooses to execute on the edge server, the total delay is the sum $t^{w}_n + t^{up}_n + t^{s}_n$. The execution delay depends on $Z_n$, the number of computing resources allocated by the edge server to UAV $n$; on $C$, the number of CPU cores assigned to the task; and on $f_0$, the cycle frequency of a CPU core.

Delay of Offloading to the Cloud Server
If the computing task of a UAV needs to be offloaded to the cloud for execution, the input parameters of the task are first uploaded to the edge layer by the UAV through the wireless channel, and then forwarded to the cloud for processing by the corresponding edge server through the wired channel. Due to the strong computing capability of the cloud server, the task processing delay is negligible compared with the transmission delay. The delay $t^{C}_n$ generated when the UAV offloads the computing task to the cloud therefore mainly includes three parts: the waiting delay for an idle wireless channel, the transmission delay for uploading data to the edge computing node, and the round-trip data transmission delay between the edge server and the cloud server. The upload delay depends on $D_n$, the data volume of the input parameters required by the task.
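The three delay components described in the preceding subsections can be sketched as follows. This is an illustration under stated assumptions rather than the exact expressions of the model: the upload delay is taken as $D_n / r_n$ for an uplink rate $r_n$, and the edge execution delay as $C_n / (Z_n f_0)$; all function and argument names are hypothetical.

```python
def local_delay(t_wait, C_n, f_n):
    """Waiting delay plus local execution delay C_n / f_n."""
    return t_wait + C_n / f_n

def edge_delay(t_wait, D_n, r_n, C_n, Z_n, f0):
    """Waiting + upload + edge execution delay (return delay ignored).

    Upload delay D_n / r_n and execution delay C_n / (Z_n * f0) are
    assumptions made for this sketch.
    """
    return t_wait + D_n / r_n + C_n / (Z_n * f0)

def cloud_delay(t_wait, D_n, r_n, t_off):
    """Waiting + upload + round-trip edge-cloud delay; cloud processing is neglected."""
    return t_wait + D_n / r_n + 2.0 * t_off
```

For example, with a 25 MB input ($2 \times 10^8$ bit), an assumed uplink rate of $10^8$ bit/s and an assumed one-way forwarding delay of 0.5 s, `cloud_delay(0.0, 2e8, 1e8, 0.5)` combines a 2 s upload with a 1 s round trip.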

Total Delay of Edge Computing System
In the edge computing system in this paper, each UAV computing task is executed at exactly one of three locations: locally, on the edge server, or in the cloud. Based on the above, the total delay of the system is obtained by summing, over all UAVs within λ, the delay of the location selected for each task. Here $a_n$ equals 1 if the task is executed locally and 0 otherwise, and $b_n$ equals 1 if the task is offloaded to the edge server and 0 if it is offloaded to the cloud server.
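A minimal sketch of this aggregation is given below. It assumes that the indicators select exactly one delay per UAV (local if $a_n = 1$, edge if $a_n = 0$ and $b_n = 1$, cloud otherwise); the function name and the per-UAV delay lists are hypothetical inputs.

```python
def total_delay(a, b, t_local, t_edge, t_cloud):
    """Sum the delay of the execution location selected for each UAV.

    a[n] = 1 means local execution; otherwise b[n] = 1 means edge server
    and b[n] = 0 means cloud server.
    """
    total = 0.0
    for a_n, b_n, tl, te, tc in zip(a, b, t_local, t_edge, t_cloud):
        if a_n == 1:
            total += tl      # executed on the UAV itself
        elif b_n == 1:
            total += te      # offloaded to the edge server
        else:
            total += tc      # forwarded to the cloud server
    return total
```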

Objective Function
In order to improve the offloading rate, the objective of this paper is to minimize the task delay subject to the task delay constraint and to resource constraints (b) to (e). Constraint (b) indicates that in any time period, the number of sub-channels allocated to tasks should not exceed the total number of sub-channels that can be allocated. Constraint (c) indicates that in any time period, the computing resources allocated to the UAVs do not exceed the total amount of computing resources available within that period. Constraint (d) states that when the number of sub-channels allocated to a UAV is 0, its task is executed locally. Constraint (e) indicates that if the computing resources allocated to a UAV are 0 and its task is not executed locally, the task can only be offloaded to the cloud server for execution. This problem is a non-convex programming problem, so this paper uses deep reinforcement learning to solve it.
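Constraints (b) to (e) can be checked for a candidate allocation as in the following sketch. The encoding is an assumption for illustration: `y[n]` and `z[n]` are the sub-channels and computing resources allocated to UAV $n$, `a[n]` and `b[n]` are the indicators of the total-delay model, and `total_channels` and `total_cores` are hypothetical names for the resource budgets. The delay constraint itself is not modeled here.

```python
def is_feasible(y, z, a, b, total_channels, total_cores):
    """Return True if an allocation satisfies constraints (b) to (e)."""
    if sum(y) > total_channels:          # (b) sub-channel budget
        return False
    if sum(z) > total_cores:             # (c) computing resource budget
        return False
    for y_n, z_n, a_n, b_n in zip(y, z, a, b):
        if y_n == 0 and a_n != 1:        # (d) no sub-channel -> must run locally
            return False
        if z_n == 0 and a_n == 0 and b_n != 0:
            return False                 # (e) no edge resources and not local -> cloud only
    return True
```

A search or learning procedure can use such a check to discard infeasible actions before evaluating their delay.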

DRL-Based Offloading Policy
In order to improve the reliability of offloading, an agent is deployed on the edge server. The agent is the central component of the learning system: it perceives information from the environment and performs actions on the environment. After the agent makes the offloading decision, the UAV offloads the task to the designated device for processing according to the decision result. At the same time, the agent sends the task offloading results to the cloud for aggregation and optimization of the offloading parameters. The cloud then transmits the optimized parameters back to the agent, so as to continuously update and optimize the service offloading decision.

Environment
The agent in the edge server interacts with the edge computing environment.In each time slot, the agent observes the environment and obtains x(n) = {D, W, q, η}, where D is the information of the drone, W is the number of tasks, q is the communication resource of the edge server, and η is the computing resource of the edge server.

State
When the DRL algorithm is used for UAV edge computing offloading, its state reflects the UAV environment. The current state of the system reflects the service requirements, network conditions, and edge server computing resources of all UAV users in the environment for a certain period of time, i.e., $s^{(k)} = \{d_k, c_k, p_k, h_k\}$, where $d_k$ is the amount of input data of the service to be offloaded, $c_k$ is the amount of computation required to execute the service, $p_k$ represents the signal transmission power of the UAV, and $h_k$ represents the channel gain between the user and the edge computing node. At each time slot, the agent receives the current state $s$ from the environment, takes action $a$ based on the current state, and reaches a new state. Through this interaction with the environment, the agent generates a sequence $s^{(1)}, a^{(1)}, s^{(2)}, \cdots, a^{(k-1)}, s^{(k)}$, which forms a Markov decision process.

Actions and Rewards
In each slot, the agent selects an action $a^{(k)}$ from the action space $A$, which assigns all CPU cores to the offloaded tasks waiting at the head of each buffer. If a task is allocated zero CPU cores, it waits for the next allocation.
In the edge computing system in this paper, the agent interacts with the environment through states, actions and rewards. When the agent receives a task request from a UAV, it finds the optimal action based on the current environment state. An action is then returned to the UAV indicating whether the task is performed locally, on the edge server or in the cloud. After the action is executed, the agent receives the reward for the action. Finally, the environment moves to the next state. This paper defines the reward obtained by executing action $a^{(x)}$ in system state $s^{(x)}$ at time step $x$ as $\mu^{(x)} = t^{L}_n - t_n$, that is, the time saved compared with local execution on the UAV.
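The reward definition can be illustrated with a short sketch. The `Target` enumeration and the helper `best_target` are hypothetical names introduced for this example; only the reward expression (local delay minus actual delay) is taken from the text.

```python
from enum import Enum

class Target(Enum):
    LOCAL = 0
    EDGE = 1
    CLOUD = 2

def reward(t_local, t_actual):
    """Time saved relative to local execution (negative if offloading is slower)."""
    return t_local - t_actual

def best_target(delays):
    """Return the target with the largest reward; `delays` maps Target -> seconds."""
    t_local = delays[Target.LOCAL]
    return max(delays, key=lambda tgt: reward(t_local, delays[tgt]))
```

With this definition, local execution always yields a reward of zero, so the agent is rewarded only when offloading actually shortens the task.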

DRL-Based Offloading Algorithm
The Q function is defined as $Q(s^{(k)}, a^{(k)})$, which represents the value of taking action $a^{(k)}$ in state $s^{(k)}$. After receiving the state $s$ and taking action $a$, the optimal Q function gives the maximum expected reward achievable by any policy $\pi$, that is, $Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[R^{(k)} \mid s^{(k)} = s, a^{(k)} = a, \pi\right]$. The Bellman equation follows from this optimal Q function [24], as shown in Formula (12):
$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[\mu + \gamma \max_{a'} Q^{*}(s', a') \mid s, a\right] \qquad (12)$$
where $\mu$ is the immediate reward, $s'$ and $a'$ are the next state and action, and $\gamma$ is the discount factor, which takes a value between 0 and 1.
The Q function can be obtained by iteratively updating the Bellman equation:
$$Q_{k+1}(s, a) = \mathbb{E}_{s'}\left[\mu + \gamma \max_{a'} Q_{k}(s', a') \mid s, a\right] \qquad (13)$$
This iterative algorithm guarantees that $Q_k$ tends to $Q^{*}$ as $k$ tends to infinity, but it takes a long time. Therefore, the Q function is approximated by a neural network acting as a nonlinear function approximator, where the input of the neural network is the state vector and the output is the Q value for each action.
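The convergence of this iteration can be seen on a small deterministic example. The sketch below applies the Bellman update to a hypothetical two-state, two-action problem (the transition and reward tables are invented for illustration and are unrelated to the inspection scenario); with deterministic transitions the expectation reduces to a single term.

```python
def q_value_iteration(next_state, reward, gamma, iterations):
    """Iterate Q_{k+1}(s,a) = r(s,a) + gamma * max_a' Q_k(s',a') from Q_0 = 0.

    next_state[s][a] and reward[s][a] describe a deterministic toy problem.
    """
    n_s, n_a = len(reward), len(reward[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iterations):
        # The comprehension reads the old table and rebinds q afterwards,
        # so every entry is updated synchronously from Q_k.
        q = [[reward[s][a] + gamma * max(q[next_state[s][a]])
              for a in range(n_a)] for s in range(n_s)]
    return q

# Hypothetical toy problem: action 0 leads to state 0, action 1 to state 1.
NEXT = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [0.0, 2.0]]
Q = q_value_iteration(NEXT, REWARD, gamma=0.5, iterations=60)
```

Because the update is a contraction with factor $\gamma$, the error shrinks geometrically; for this toy problem the table approaches [[1.5, 3.0], [1.5, 4.0]]. Enumerating every state-action pair in this way is what becomes impractical for large state spaces and motivates the neural network approximation.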
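Define $Q(s, a; \theta)$ as an approximation of the optimal Q function, that is, $Q(s, a; \theta) \approx Q^{*}(s, a)$; a neural network with weight parameter $\theta$ used in this way is called a Q-network. The Q-network updates its parameter $\theta_i$ by minimizing a sequence of loss functions $L_i(\theta_i)$ at iteration $i$, and the loss function is shown in Formula (14):
$$L_i(\theta_i) = \mathbb{E}\left[\left(y_i - Q(s, a; \theta_i)\right)^2\right] \qquad (14)$$
where $y_i = \mathbb{E}_{s'}\left[\mu + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \mid s, a\right]$ is the target value at iteration $i$.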
Differentiating the loss function with respect to the weight parameter $\theta_i$ gives the gradient shown in (15):
$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}\left[\left(\mu + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i)\right) \nabla_{\theta_i} Q(s, a; \theta_i)\right] \qquad (15)$$
The parameters of the DQN are iteratively updated using stochastic gradient descent (SGD). Since approximating a Q function with a nonlinear function may lead to an unstable learning process, this paper adopts an experience replay mechanism to improve the performance. Experience replay breaks the correlation between consecutive training samples, which could otherwise drive the network into a local minimum. During each training step, SGD updates are made on random mini-batches drawn from the memory pool D. Since the DQN algorithm is an off-policy algorithm that requires sufficient exploration, this paper adopts the ε-greedy search strategy, and the specific algorithm is shown in Algorithm 1.
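The two auxiliary mechanisms, experience replay and ε-greedy exploration, can be sketched independently of the network. The class and function names below are hypothetical; the memory is a fixed-capacity buffer that discards the oldest sequence when full, which is one common way to realize the memory pool D.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity pool of (s, a, mu, s_next) sequences."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest entries are dropped first

    def store(self, s, a, mu, s_next):
        self.buffer.append((s, a, mu, s_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal ordering.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise take the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon` to 0 yields a purely greedy choice, while larger values trade exploitation for exploration of rarely tried offloading targets.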
First, initialize the state, the experience pool, and the Q-network. Observe the current state; with probability ε select a random action, and otherwise use the greedy strategy to select the action $a^{(k)}$ with the maximum Q value. Execute the action to obtain the reward and the next state, and store the sequence $(s^{(k)}, a^{(k)}, \mu^{(k)}, s^{(k+1)})$ in the experience pool. Secondly, the Q-network updates its parameters by minimizing the loss function, using SGD to iteratively update the parameter $\theta_i$ of the DQN until the specified number of iterations is complete. Lastly, each UAV receives the optimal offloading decision, which reduces the total time consumed by the entire system.

Algorithm 1. DRL-based offloading algorithm
Input: current state s(k), experience pool D, initial Q-network
Output: optimal action
Initialize replay memory D with capacity M
Initialize the Q function with random weights
for episode = 1, ..., M do
    for each time step k do
        Observe the current state s(k)
        With probability ε, randomly choose an action a(k)
        Otherwise, use the greedy strategy to select the action a(k) with the largest current Q value
        Execute action a(k) in the emulator to get µ(k) and the next state s(k+1)
        Store the sequence (s(k), a(k), µ(k), s(k+1)) in replay memory D
        Randomly sample a mini-batch of stored sequences from replay memory D
        Using the loss function of Formula (14), compute the gradient given by Formula (15)
        Perform an SGD iteration according to Formula (15) and update the weights θ = θ + ∆θ
    end for
end for
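The loop of Algorithm 1 can be illustrated end to end on a toy problem. The sketch below is not the paper's implementation: it replaces the deep network with a linear approximator over one-hot states (equivalent to a table), uses the current weights for the target instead of $\theta_{i-1}$, and runs on a hypothetical emulator whose delay table `DELAYS` is invented for illustration. All names are assumptions.

```python
import random
from collections import deque

# Hypothetical delays in seconds per task type: (local, edge, cloud).
DELAYS = [(1.0, 0.6, 1.5), (2.0, 0.9, 1.6), (0.3, 0.8, 1.4), (4.0, 2.5, 1.8)]
N_STATES, N_ACTIONS = len(DELAYS), 3

def step(state, action, rng):
    """Toy emulator: the reward is the time saved versus local execution."""
    mu = DELAYS[state][0] - DELAYS[state][action]
    return mu, rng.randrange(N_STATES)      # next task type is drawn at random

def train(steps=5000, capacity=500, batch=8, gamma=0.5, lr=0.02, eps=0.2, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # Q(s, a; theta)
    memory = deque(maxlen=capacity)                        # replay memory D
    s = rng.randrange(N_STATES)
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: theta[s][i])
        mu, s_next = step(s, a, rng)
        memory.append((s, a, mu, s_next))
        # SGD on the squared error between target y and Q(s, a; theta)
        for sj, aj, muj, sj_next in rng.sample(list(memory), min(batch, len(memory))):
            y = muj + gamma * max(theta[sj_next])
            theta[sj][aj] += lr * (y - theta[sj][aj])
        s = s_next
    return theta

def greedy_policy(theta):
    """Index of the best action per state: 0 = local, 1 = edge, 2 = cloud."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in theta]
```

Calling `greedy_policy(train())` returns one offloading target per task type. In this toy setting the next task does not depend on the action, so the learned policy should coincide with choosing the smallest delay in each row of `DELAYS`.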

Simulation and Analysis
In this section, the simulation is carried out in MATLAB 2018a. The computer is equipped with a Core i9-9900K CPU with a frequency of 3.4 GHz, 16 GB of RAM, and the Windows 10 operating system.
The decision performance of the proposed DRL algorithm is simulated first, and then local offloading (LO), cloud offloading (CO), and random offloading (RO) are simulated. Finally, the algorithm in this paper is compared with the three offloading strategies LO, CO and RO. It is assumed that the UAV swarm covers the transmission line within a range of 1000 m, which contains one edge server that can provide the offloading service. As the UAV group randomly generates fault tasks, five types of inspection tasks are set, with input data sizes of [25, 35, 50, 65] MB and computation amounts of [0.5, 0.6, 0.9, 1.2] GC, respectively. The requirements of each task are randomly drawn from these task types, and the average delay of the inspection tasks performed by all UAVs is calculated. Other simulation parameters are shown in Table 1.
Figure 2 shows the average task success rate for different task arrival rates. The proposed algorithm has a higher success rate than the random offloading algorithm and local execution. At a low task arrival rate, the average task success rate is about 0.99. As the task arrival probability increases, the task success rates of local execution and random offloading decay linearly and exponentially, respectively, whereas the success rate of the proposed offloading algorithm decreases only slightly and remains almost always above 0.95. This is mainly because the DRL algorithm proposed in this paper is based on the "end-edge-cloud" collaborative edge computing system, which can intelligently hold an offloading task until idle computing resources appear. If the offloading task cannot be performed, it is transmitted to the cloud. Therefore, the DRL algorithm achieves good decision performance.
As shown in Figure 3, because the computing power of UAVs is usually insufficient, the local response time does not meet the maximum delay requirement when a large number of tasks have to be processed, so the average user delay increases with the raw data size. Compared with the LO, RO and CO algorithms, the average delay of the DRL offloading strategy does not increase significantly. The CO algorithm offloads all tasks to the cloud for execution. Although the cloud server is rich in computing resources, long-distance transmission incurs a long transmission delay. The LO algorithm saves the transmission delay and the waiting delay, but it increases the execution delay due to the limited computing resources of the UAV. Although the edge server can perform multiple tasks simultaneously, the random allocation of resources prevents the RO algorithm from reaching the optimal edge service offloading policy, so the RO delay is higher than that of DQN. It is worth noting that, because the computing power of the edge server is limited, the average user service delay of the proposed algorithm becomes higher than that of the CO algorithm when the raw data size exceeds 60 MB. This is mainly because the delay generated in data transmission affects the DRL strategy. With the development of communication technology, it is hoped that the data transmission delay can be reduced in the future. Compared with the three offloading algorithms LO, RO and CO, the average task completion delay of the offloading strategy proposed in this paper is reduced by 54%, 26%, and 37%, respectively.
Figure 4 shows that the proposed DRL offloading strategy consumes significantly less UAV energy than the local execution and random offloading strategies. This is mainly because the LO algorithm consumes a large amount of the UAV's computing energy, and the RO offloading strategy consumes a large amount of the UAV's communication energy, both of which greatly shorten the flight time of the UAV. The DRL offloading strategy is built on the three-layer architecture and considers the communication conditions in the computational modeling, so the energy consumption of the UAV is greatly reduced.

Conclusions
In order to meet the low-latency service requirements of UAV transmission line inspection and improve the efficiency of transmission line inspection, this paper proposes a DRL-based UAV edge computing offloading strategy. The offloading strategy targets the scenario in which a single edge server serves multiple UAVs, and a "device-edge-cloud" collaborative service offloading architecture is proposed. The task offloading problem in the UAV edge environment is formulated as a delay minimization problem under the constraints of edge server communication and computing resources. Secondly, a deep Q-network is introduced to obtain the optimal offloading strategy of the UAV. In order to improve the accuracy of the offloading strategy, a greedy search strategy is added, and SGD iterative updates are used to solve the optimization problem and minimize the total service time of the UAVs. The simulation results show that, compared with the three offloading strategies of local offloading, cloud offloading and random offloading, the DRL offloading strategy saves 54%, 37% and 26% of the task completion time, respectively. It effectively reduces the UAV inspection delay and improves the transmission line inspection efficiency.

Electronics 2022, 11

Figure 3. Relationship between delay and data size.

Figure 4. Diagram of energy consumption and data size.