1. Introduction
The rapid development of the social economy is inseparable from the rapid expansion of the power grid. Large-scale, complex transmission lines require substantial resources for inspection, and because their distribution differs, the hidden dangers they face also differ [1]. Traditional manual inspection of transmission lines suffers from low efficiency, many blind spots, and a high risk of safety accidents, and therefore cannot meet modern inspection requirements. These shortcomings have driven the development of unmanned aerial vehicle (UAV) inspection [2].
The use of UAVs for transmission line inspection is a key part of the interconnected power grid [3]. However, the computing equipment carried by a UAV has weak computing power and limited energy, so the UAV may have to offload fault-detection tasks to the cloud for processing during transmission line inspection [4]. Cloud servers provide strong computing power, but their high transmission cost and relatively long delay prevent inspectors from obtaining fault information in a timely manner, especially for delay-sensitive inspection tasks. If the delay of a task exceeds its allowable limit, the line will not be maintained in time, which may even lead to a large-scale power outage [5].
Edge computing is close to the data source and offers good real-time performance, low latency, and fast response, which effectively overcomes the problems of a centralized cloud. It is therefore gradually being applied to transmission line inspection. As a new network architecture, mobile edge computing (MEC) provides computing resources at the edge of the network (for example, at base stations with co-located edge computing servers), which can process inspection tasks at close range, reduce task delay, and improve task reliability [6,7,8,9]. Given that the battery energy of the UAV and the computing power of the edge computing server are both limited, making full use of edge resources is the key technology for improving inspection efficiency, and the core problem is how to reasonably partition tasks and offload them to edge servers [10,11].
Computation offloading is an important application of edge computing, and a reasonable offloading strategy can effectively improve task-completion efficiency. Because the computing resources and energy of UAVs are limited, and the reliability of cloud services must also be considered, a well-designed offloading strategy is required [12,13,14]. Delay affects the user’s quality of service (QoS) and may prevent a coupled program from running normally because an intermediate result is missing; therefore, every offloading decision must at least satisfy the delay limit acceptable to the mobile device’s program [15,16]. Reference [17] uses a Markov decision process to obtain the optimal edge-computing offloading strategy via dynamic programming, but a fixed state-transition probability matrix is difficult to obtain in practice, which reduces decision accuracy. Reference [18] considers a scenario in which a single ground base station provides computation offloading services for a single UAV: the UAV flies from its initial position to a destination, computing some tasks locally and offloading the rest to the base station for execution. The UAV flight path and bit allocation are jointly optimized, and a sub-optimal solution minimizing UAV energy consumption is obtained by successive convex approximation. However, the computing power of a single UAV is limited and cannot meet the needs of multi-user intensive tasks. Under energy-harvesting causality constraints and UAV speed constraints, Reference [19] proposed a two-stage algorithm and a three-stage alternating algorithm to maximize the computation rate of a wireless-powered MEC system in the partial and binary offloading modes, respectively. The algorithms are fast and of low complexity, but the strategy does not account for the limited energy of UAVs. Reference [20] formulates the offloading problem as a mixed-integer nonlinear program and designs a solution based on a genetic algorithm and particle swarm optimization, minimizing the energy consumed to complete the task. Tang et al. [21] proposed a joint system-load optimization method based on a discrete binary particle swarm optimization algorithm, which effectively reduces system delay and energy consumption. Reference [22] proposes an online task offloading and energy-efficient frequency scaling algorithm that minimizes device energy consumption while bounding the task queue length. Liu et al. [23] used Lyapunov optimization and duality theory to reformulate the problem and decompose it into a set of subproblems, each of which can be solved separately and in a distributed manner by mobile devices or edge servers, effectively reducing system latency. However, most of the above offloading strategies target single-user, single-edge-server scenarios.
To address these problems, this paper proposes a power-inspection offloading strategy based on deep reinforcement learning (DRL). Unlike the above methods, this strategy is suitable for multiple UAVs and edge servers with limited computing power, and thus has a wider scope of application. A “device-edge-cloud” collaborative offloading architecture is constructed in the UAV edge environment, a computing model is established under the computing and communication resource constraints of the edge server, and the task offloading problem is formulated as a delay-minimization problem. A Markov decision process is then constructed, and the optimization problem is solved with a deep Q-network (DQN); an experience replay mechanism and an ϵ-greedy policy are introduced during learning. The experimental results show that the proposed offloading strategy achieves lower delay and higher reliability, effectively improving transmission line inspection efficiency.
2. UAV Inspection System Model
The DRL-based “end-edge-cloud” system model for transmission line fault inspection with multiple UAVs and a single edge server (ES) is shown in Figure 1. The system is divided into a UAV layer, an MEC layer, and a cloud layer. The UAV layer contains $H$ UAVs conducting fault inspection within the range of a single edge computing server, denoted as $\mathcal{N} = \{1, 2, 3, \ldots, H\}$. The computing resources of each UAV are limited, so only some of a UAV’s tasks can be processed with local computing resources.
The edge layer consists mainly of the base station installed near the transmission line, beside which the edge server is located; a UAV can connect to the corresponding edge server through a wireless channel. This paper assumes that there is a base station near the transmission line that can satisfy the UAVs’ requirements for offloading tasks to the ES. A UAV’s computing tasks can also be offloaded to the cloud, in which case the base station uploads the task to the cloud through a wired channel for computing. The cloud service layer contains high-performance computing resources and is directly connected to service providers, so it can meet the requirements of UAV fault inspection tasks efficiently and quickly.
4. DRL-Based Offloading Policy
To improve the reliability of offloading, an agent is deployed on the edge server. The agent is the core component of the decision process: it perceives information from the environment and performs actions on it. After the agent makes an offloading decision, the UAV offloads its task to the designated device for processing according to the decision result. At the same time, the agent sends the task offloading results to the cloud for aggregation and optimization of the offloading parameters. The cloud then transmits the optimized parameters back to the decision maker, continuously updating and optimizing the service offloading decisions.
4.1. Key Elements of Reinforcement Learning
4.1.1. Environment
The agent in the edge server interacts with the edge computing environment. In each time slot, the agent observes the environment and obtains $(D, W, q, f)$, where $D$ is the information of the UAVs, $W$ is the number of tasks, $q$ is the communication resource of the edge server, and $f$ is the computing resource of the edge server.
4.1.2. State
When the DRL algorithm is used for UAV edge-computing offloading, the state reflects the UAV environment. The current system state captures the service requirements, network conditions, and edge-server computing resources of all UAV users over a given period, i.e., $s = (d_k, c_k, p_k, h_k)$, where $d_k$ is the amount of input data of the service to be offloaded, $c_k$ is the amount of computation required to execute the service, $p_k$ is the signal transmission power of the UAV, and $h_k$ is the channel gain between the user and the edge computing node. In each time slot $t$, the agent receives the current state $s$ from the environment, takes an action $a$ based on it, and reaches a new state. Through this interaction with the environment, the agent generates a sequence $\{s_0, a_0, r_0, s_1, a_1, r_1, \ldots\}$, which forms a Markov decision process.
4.1.3. Actions and Rewards
In each time slot, the agent selects an action from the action space $A$, which assigns all CPU cores to the offloaded tasks waiting at the head of each buffer. If a task is allocated zero CPU cores, it waits for the next allocation.
In the edge computing system of this paper, the agent interacts with the environment through states, actions, and rewards. When the agent receives a task request from a UAV, it finds the optimal action based on the current environment state and returns to the UAV an action indicating whether the task is executed locally, on an edge server, or in the cloud. After the action is executed, the agent receives the reward for that action, and the environment moves to the next state. This paper defines the reward obtained by executing action $a_x$ in system state $s_x$ at time step $x$ as $r_x = T_{\mathrm{local}} - T_{\mathrm{exec}}$, that is, the time saved compared with local execution on the UAV, where $T_{\mathrm{exec}}$ is the completion time under the chosen action.
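To make these elements concrete, the following minimal Python sketch collects the state, action space, and reward described above. The names (`Location`, `State`, `reward`) and the use of dataclasses are our own illustrative choices, not notation from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Location(Enum):
    """Possible execution sites for an inspection task (Section 2)."""
    LOCAL = 0   # on the UAV itself
    EDGE = 1    # on the edge server (ES)
    CLOUD = 2   # forwarded by the base station to the cloud

@dataclass
class State:
    """State observed by the agent in each time slot (Section 4.1.2)."""
    d_k: float  # input data size of the service to be offloaded
    c_k: float  # computation required to execute the service
    p_k: float  # signal transmission power of the UAV
    h_k: float  # channel gain between the UAV and the edge node

def reward(t_local: float, t_exec: float) -> float:
    """Reward of Section 4.1.3: time saved versus local execution."""
    return t_local - t_exec
```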
4.2. DRL-Based Offloading Algorithm
The Q function is defined as $Q^{\pi}(s, a)$, which represents the quality of taking action $a$ in state $s$. After receiving state $s$ and taking action $a$, the optimal Q function gives the maximum expected reward obtainable under policy $\pi$, that is, $Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[ R_t \mid s_t = s, a_t = a, \pi \right]$. The Bellman equation can be obtained from this optimal Q function [24], as shown in Formula (12):

$$Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \right] \quad (12)$$

where $\gamma$ is the discount factor, which takes a value between 0 and 1.
The Q function can be obtained by iteratively applying the Bellman equation:

$$Q_{k+1}(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q_{k}(s', a') \;\middle|\; s, a \right] \quad (13)$$

This iterative algorithm guarantees that $Q_k$ converges to $Q^{*}$ as $k \to \infty$, but it takes a long time. Therefore, the Q function is approximated by a neural network as a nonlinear function, where the input of the network is the state vector and the outputs are the Q values of each action.
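As a minimal illustration of this iteration, the sketch below runs the Bellman update of Formula (13) on a toy MDP with randomly generated transition probabilities and rewards; all values are made up for illustration. In the offloading problem these quantities are not known in advance, which is precisely why the paper replaces the exact iteration with a Q-network.

```python
import numpy as np

# Toy MDP: P[s, a, s'] are assumed transition probabilities, R[s, a] rewards.
n_states, n_actions, gamma = 4, 3, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
for k in range(500):
    # Q_{k+1}(s, a) = E[ r + gamma * max_a' Q_k(s', a') ]  (Formula (13))
    Q = R + gamma * P @ Q.max(axis=1)
print(Q)  # approaches Q* as k grows
```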
Define $Q(s, a; \theta)$ as an approximation of the Q function, that is, $Q(s, a; \theta) \approx Q^{*}(s, a)$; the neural network with weight parameters $\theta$ is called a Q-network. The Q-network updates its parameters by minimizing a sequence of loss functions $L_i(\theta_i)$ at each iteration $i$, where the loss function is shown in Formula (14):

$$L_i(\theta_i) = \mathbb{E}_{s, a}\left[ \left( y_i - Q(s, a; \theta_i) \right)^{2} \right] \quad (14)$$

where $y_i = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \;\middle|\; s, a \right]$ is the target value for iteration $i$.
Differentiating the loss function with respect to the weight parameters $\theta$ yields the gradient shown in Formula (15):

$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s, a, s'}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right] \quad (15)$$
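The following sketch shows a single SGD step implementing this gradient, under the simplifying assumption of a linear approximation $Q(s, a; \theta) = \theta_a^{\top} s$ (so that $\nabla_{\theta_a} Q = s$). The paper itself uses a multi-layer Q-network, but the update rule has the same form.

```python
import numpy as np

def sgd_step(theta, theta_target, s, a, r, s_next, gamma=0.9, lr=0.01):
    """One SGD step on Formulas (14)-(15) with a linear Q-model.

    theta, theta_target: weight matrices of shape (n_actions, state_dim),
    playing the roles of theta_i and theta_{i-1} in Formula (15).
    """
    y = r + gamma * np.max(theta_target @ s_next)   # target value y_i
    td_error = y - theta[a] @ s                     # y_i - Q(s, a; theta_i)
    theta[a] += lr * td_error * s                   # descend the gradient (15)
    return theta
```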
The parameters of the DQN are updated iteratively using stochastic gradient descent (SGD). Since approximating the Q function with a nonlinear function can make the learning process unstable, this paper adopts an experience replay mechanism to improve performance: it breaks the correlation between consecutive training samples so that the network avoids poor local minima. During each training step, SGD updates are performed on random mini-batches drawn from the memory pool $D$. Since deep Q-learning is an off-policy algorithm that requires sufficient exploration, this paper adopts the ϵ-greedy search strategy. The specific procedure is shown in Algorithm 1.
First, the state, experience pool, and Q-network are initialized. The agent observes the current state, either randomly selects an action or chooses the action with the maximum Q value via the greedy strategy, then executes the action to obtain the reward and the next state, and stores the sequence $(s_t, a_t, r_t, s_{t+1})$ in experience pool $D$.
Secondly, the Q-network updates its parameters by minimizing the loss-function sequence $L_i(\theta_i)$, using SGD to iteratively update the parameters $\theta_i$ of the DQN until the iterations are complete.
Lastly, each UAV receives its optimal offloading decision, reducing the total service time of the system.
Algorithm 1 DRL-based offloading algorithm
Input: current state $s_t$, experience pool $D$, initial Q-network
Output: optimal action $a^{*}$
1: Initialize replay memory $D$ with capacity $M$
2: Initialize the Q function with random weights $\theta$
3: for episode = 1, …, M do
4:  for t = 1, …, T do
5:   Observe the current state $s_t$
6:   With probability ϵ, randomly choose an action $a_t$
7:   Otherwise, use the greedy strategy to select the action with the largest current Q value, $a_t = \arg\max_a Q(s_t, a; \theta)$
8:   Execute action $a_t$ in the emulator to obtain reward $r_t$ and the next state $s_{t+1}$
9:   Store the sequence $(s_t, a_t, r_t, s_{t+1})$ in replay memory $D$
10:   Randomly sample a mini-batch of transitions $(s_j, a_j, r_j, s_{j+1})$ from replay memory $D$
11:   Compute the loss function of Formula (14) and its gradient, Formula (15)
12:   Perform an SGD step according to Formula (15) and update the weights $\theta$
13:  end for
14: end for
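For reference, the sketch below is one possible end-to-end reading of Algorithm 1 in Python. The environment is a random stub standing in for the UAV edge-computing simulator, and all dimensions and hyperparameters (state size, learning rate, ϵ, replay capacity) are illustrative assumptions rather than values from the paper; for brevity it reuses the linear Q-model from the previous sketch in place of a deep network.

```python
import random
import numpy as np

STATE_DIM, N_ACTIONS = 4, 3          # s = (d_k, c_k, p_k, h_k); a = {local, edge, cloud}
GAMMA, LR, EPSILON = 0.9, 0.01, 0.1  # assumed hyperparameters
CAPACITY, BATCH, EPISODES, STEPS = 10_000, 32, 100, 50

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # random init (line 2)
memory = []                                                 # replay memory D (line 1)

def env_step(state, action):
    """Stub environment: replace with the edge-computing simulator."""
    return rng.random(), rng.random(STATE_DIM)              # reward, next state

for episode in range(EPISODES):                             # line 3
    s = rng.random(STATE_DIM)
    for t in range(STEPS):                                  # line 4
        if rng.random() < EPSILON:                          # epsilon-greedy (lines 6-7)
            a = rng.integers(N_ACTIONS)
        else:
            a = int(np.argmax(theta @ s))
        r, s_next = env_step(s, a)                          # line 8
        memory.append((s, a, r, s_next))                    # line 9
        if len(memory) > CAPACITY:
            memory.pop(0)
        if len(memory) >= BATCH:                            # lines 10-12
            for sj, aj, rj, sj1 in random.sample(memory, BATCH):
                y = rj + GAMMA * np.max(theta @ sj1)        # target value y_i
                theta[aj] += LR * (y - theta[aj] @ sj) * sj # SGD step on (15)
        s = s_next
```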
5. Simulation and Analysis
In this section, the simulations are carried out in MATLAB 2018a. The computer is equipped with a Core i9-9900K CPU at 3.4 GHz and 16 GB of RAM, running Windows 10.
The decision performance of the proposed DRL algorithm is simulated first, followed by local offloading (LO), cloud offloading (CO), and random offloading (RO); the proposed algorithm is then compared with these three offloading strategies. It is assumed that the UAV swarm inspects the transmission line within a 1000 m range covered by one edge server that provides offloading services. Since the UAV group generates fault tasks randomly, several types of inspection tasks are defined, with input data sizes of [25, 35, 50, 65] MB and computation amounts of [0.5, 0.6, 0.9, 1.2] GC, respectively. Each UAV's task requirements are randomly generated from these task types, and the average delay of the inspection tasks performed by all UAVs is calculated. The other simulation parameters are listed in Table 1.
Figure 2 shows the average task success rate under different task arrival rates. The proposed algorithm achieves a higher success rate than the random offloading algorithm and local execution. At a low task arrival rate, its average task success rate is about 0.99. As the task arrival probability increases, the success rates of local execution and the random offloading strategy decay approximately linearly and exponentially, respectively, while that of the proposed offloading algorithm decreases only slightly, remaining above about 0.95. This is mainly because the proposed DRL algorithm is based on the “end-edge-cloud” collaborative edge computing system, which can intelligently hold an offloading task until idle computing resources appear and, if the task still cannot be executed, transmit it to the cloud. The DRL algorithm therefore achieves good decision performance.
As shown in Figure 3, because the computing power of UAVs is usually insufficient, the locally computed response time cannot meet the maximum delay requirement when a large number of tasks must be processed, so the average user delay increases with the raw data size. Compared with the LO, RO, and CO algorithms, the increase in the average delay of the DRL offloading strategy is not significant. The CO algorithm offloads all tasks to the cloud for execution; although the cloud server is rich in computing resources, long-distance transmission incurs a long transmission delay. The LO algorithm saves the transmission and waiting delays, but the limited computing resources of the UAV increase the execution delay. Although the edge server can execute multiple tasks simultaneously, the random resource allocation of the RO algorithm prevents it from reaching the optimal edge-service offloading policy, so its delay is higher than that of the DQN. It is worth noting that because the computing power of the edge server is limited, the average service delay of the proposed algorithm exceeds that of the CO algorithm when the raw data size is greater than 60 MB. This is mainly because the delay generated in data transmission affects the DRL strategy; with the development of communication technology, this transmission delay is expected to decrease in the future. Compared with the LO, RO, and CO offloading algorithms, the average task-completion delay of the proposed offloading strategy is reduced by 54%, 26%, and 37%, respectively.
Figure 4 shows that the proposed DRL offloading strategy consumes significantly less UAV energy than the local execution and random offloading strategies. This is mainly because the LO algorithm consumes a large amount of the UAV's computing energy, while the RO strategy consumes a large amount of its communication energy, both of which greatly shorten the UAV's flight time. The proposed DRL offloading strategy is built on the three-layer architecture and incorporates the communication conditions into the computational model, so the UAV's energy consumption is greatly reduced.
6. Conclusions
To meet the low-latency service requirements of UAV transmission line inspection and improve inspection efficiency, this paper proposes a DRL-based UAV edge-computing offloading strategy. The strategy targets the scenario in which a single edge server serves multiple UAVs, and a “device-edge-cloud” collaborative service offloading architecture is proposed. The task offloading problem in the UAV edge environment is formulated as a delay-minimization problem under the communication and computing resource constraints of the edge server. A deep Q-network is then introduced to obtain the optimal offloading strategy for the UAVs. To improve the accuracy of the offloading strategy, an ϵ-greedy search strategy is added, and SGD iterative updates are used to solve the optimization problem and minimize the total service time of the UAVs. The simulation results show that, compared with the local offloading, cloud offloading, and random offloading strategies, the DRL offloading strategy saves 54%, 37%, and 26% of the task completion time, respectively. It effectively reduces the UAV inspection delay and improves transmission line inspection efficiency.