Resource Allocation and Offloading Strategy for UAV-Assisted LEO Satellite Edge Computing

Abstract: In emergency situations such as earthquakes, landslides and other natural disasters, the terrestrial communications infrastructure is severely disrupted and cannot serve terrestrial IoT devices. Tasks in emergency scenarios often demand more computing power and energy than devices can supply locally, so they cannot be processed quickly enough on the device and require computational offloading. Offloading tasks to server-equipped edge base stations may not be feasible either, because the infrastructure is missing or too far away. Since Low Earth Orbit (LEO) satellites have abundant computing resources and Unmanned Aerial Vehicles (UAVs) can be deployed flexibly, offloading tasks to LEO satellite edge servers via UAVs becomes a practical way to provide computing services to ground-based devices. This paper therefore investigates computational task offloading and resource allocation in a UAV-assisted multi-layer LEO satellite network, taking into account satellite computing resources and device task volumes. To minimise the weighted sum of energy consumption and delay in the system, the problem is formulated as a constrained optimisation problem, which is then transformed into a Markov decision process (MDP). We propose a UAV-assisted air-space integrated network architecture and a task offloading and resource allocation algorithm based on Deep Deterministic Policy Gradient and Long Short-Term Memory (DDPG-LSTM) to solve the problem. Simulation results demonstrate that the solution outperforms the baseline approach and that our framework and algorithm have the potential to provide reliable communication services in emergency situations.


Introduction
As 5G networks and the IoT develop rapidly, many promising applications and services are emerging, including High Definition (HD) livestreaming, autonomous driving, industrial automation and virtual reality. These will take advantage of the benefits that 5G networks offer, including extremely high data rates, reduced latency, enhanced reliability and large-scale connectivity [1]. The number of IoT connections is expected to reach 25 billion by 2025. However, some IoT devices struggle to handle the large number of tasks because of their limited computational resources. The emergence of mobile edge computing provides strong support for task offloading and execution.
In traditional edge computing, servers are usually deployed on terrestrial communication infrastructure, which is susceptible to severe damage and loss of service capacity in natural disasters such as earthquakes and landslides. In emergency scenarios, where finding safe routes and field rescue are critical, tasks are often highly latency sensitive and computationally intensive [2] and must be offloaded to resource-rich locations for processing. Because of limits on transmission distance, bandwidth and energy, and because the surrounding infrastructure is often damaged in a disaster, offloading tasks to nearby areas for execution is not always feasible. To reduce energy consumption and delay, tasks are therefore offloaded to edge servers with more computing power [3]. The fast growth of LEO satellite networks has made solutions to these problems possible.
In the last few years, great progress has been made in the study of task offloading for LEO satellite networks. K. Jaiswal et al. investigated a task offloading scheme for LEO satellites that minimises task processing time by jointly optimising offloading decisions for IoT devices [4][5][6]. For the task offloading problem in edge cloud computing systems based on space-air-ground integrated networks (SAGIN), M. D. Nguyen et al. reduced the energy consumed while adhering to the task's maximum latency constraint [7]. However, none of the existing satellite task offloading efforts consider that, in emergency situations, power-limited IoT devices cannot communicate directly with satellites for task offloading once the ground network infrastructure is disrupted.
In this paper, we address these issues. First, UAVs, with their high mobility and flexible deployment, can act as aerial base stations [8][9][10]. We propose a satellite-UAV-IoT device network architecture in which multiple LEO satellites collaborate to process offloaded computational tasks. The UAV acts as a relay between LEO satellites and IoT devices, observing satellite resource information and device task information in order to offload ground tasks to LEO satellites. Next, the joint task offloading of IoT devices and resource allocation of LEO satellites is modelled as a constrained optimisation problem that is non-deterministic polynomial (NP)-hard. This optimisation problem is then described as a Markov decision process (MDP), and a task offloading and resource allocation algorithm based on Deep Deterministic Policy Gradient and Long Short-Term Memory (DDPG-LSTM) is designed to solve it. Finally, a simulation environment for the algorithm was created, and the results show that the proposed approach reduces the weighted sum of energy and latency by an average of 64.5% compared with the benchmark algorithm. The key contributions are the following.

• A UAV-assisted air-space integrated task offloading architecture is proposed for emergency scenarios, which jointly considers resource allocation and offloading schemes when ground computing resources are lacking;
• A multi-satellite joint task offloading scheme is proposed, which takes full advantage of satellite computing resources to complete tasks with low delay and energy consumption;
• A Deep Reinforcement Learning (DRL) algorithm is proposed, and simulation experiments demonstrate its effectiveness: it reduces the weighted sum of energy consumption and delay by an average of 64.5%.
The remainder of this paper is organised as follows. Section 2 reviews related work, including LEO satellite task offloading and UAV-assisted task offloading. Section 3 presents the model, problem description and optimisation objective. The satellite selection and the DDPG-LSTM algorithm are presented in Section 4. Section 5 analyses the experimental findings. Section 6 concludes the paper.

Related Work
Task offloading supported by LEO satellite edge computing is a promising way to serve IoT devices whose energy and computational resources are constrained. Among prior works that aim to reduce the energy consumption or latency of ground-based devices, Wang et al. studied the offloading problem in scenarios with multiple IoT devices and proposed a resource allocation and offloading strategy that significantly reduced the average system cost [11,12]. Tan et al. introduced a multi-stage offloading scheme to obtain the most appropriate offloading strategy and reduce the average request response latency and request cost [13]. Wang et al. proposed a strategy that minimises energy and latency while considering the differences in terminal tasks and computing capabilities [14,15]. For hybrid task offloading, [16,17] proposed hybrid cloud and edge computing LEO satellite networks with a three-tier computing architecture, jointly considering cooperation between mobile users, LEO satellites and cloud servers. Wei et al. considered cooperative offloading between LEO satellite networks, jointly optimising offloading and resource allocation strategies to minimise the weighted sum of user latency and energy consumption. Tang et al. considered the LEO coverage time and computing power to minimise the energy consumption of users by optimising offloading decisions. [18] investigated cooperative offloading strategies for satellite edge computing systems and terrestrial base stations, considering satellite orbit characteristics and optimising offloading strategies to reduce energy consumption and latency. To make full use of the computing power of cloud servers, Tang et al. proposed an LEO-assisted terrestrial satellite network architecture for collaborative computing offloading at the cloud edge to minimise system energy consumption within the constraints of latency and other Quality of Service (QoS) requirements [19].
Some work has focused on UAV-assisted task offloading, where mobile and flexible UAVs can improve task offloading efficiency. To maximise computational efficiency and task queue stability, Ding et al. propose a DRL-based scheme to optimise offloading and resource allocation [20,21]. Considering the limited computational resources of UAVs, Chen et al. propose a strategy for offloading to ground base station servers that optimises transmit power and base station selection under practical constraints on task completion latency and power consumption [22]. In a UAV-enabled mobile edge computing system based on device-to-device communication, [23] maximises the overall energy efficiency by optimising the UAV and node transmit power and the scheduling strategy, improving the balance between different types of nodes. To handle channel uncertainty during offloading, [24] minimises the energy consumption under constraints such as user quality of service by optimising the CPU frequency and user transmit power. To effectively support the communication and computation of unmanned surface vehicles, [25] jointly considers the UAV flight speed and offloading decision to minimise the energy consumption of the UAV swarm while guaranteeing the delay constraint. Wang et al. addressed the energy limitation of UAVs and proposed offloading tasks to ground base stations; by optimising communication scheduling and resource allocation, and considering the locations of the ground base stations and energy efficiency, they aimed to minimise the total energy consumption [26].
Several efforts have focused on joint UAV and LEO satellite task offloading. Chai et al. consider the allocation of computing and communication resources to build a resource allocation and task scheduling system in which UAVs are responsible for collecting tasks and satellites provide edge computing services. A scheme for joint multitask offloading and resource allocation in satellites is proposed, significantly reducing the offloading cost [27]. To cope with the uncertainty of the air environment, Liu et al. present an integrated air-ground network architecture and design an adaptive joint deep reinforcement learning offloading scheme that selects the most suitable LEO satellite or UAV for task offloading based on energy and computational capability, which improves energy and computational efficiency. Taking into account energy dynamics and the constraints on UAV on-board computing resources and energy, [28] proposes that IoT devices either process tasks locally or transfer them to servers, improving the task success rate.
The above analysis shows that few past works have focused on UAV-assisted LEO edge computing for task offloading in emergency scenarios, where tasks from ground-based IoT devices are offloaded to LEO satellites for processing with the assistance of UAVs. In addition, tasks in emergency scenarios are computationally intensive and latency sensitive. Therefore, this paper considers multiple satellites for collaborative task processing, reducing energy consumption and task processing delays by optimising resource allocation and offloading strategies.

Network Scenario
Consider emergency scenarios for 6G, in which natural disasters such as earthquakes severely damage ground infrastructure and prevent it from providing computing services to IoT devices. Because UAVs are easy to deploy, we consider providing computation offloading services for ground devices with the help of UAVs to meet the execution needs of computationally intensive and latency-sensitive tasks.
The UAVs collect tasks from ground-based IoT devices and communicate with multiple LEO satellites simultaneously. To minimise execution latency, multiple LEO satellites with relatively good channel states are selected to share the computational load. Selecting too few LEO satellites may leave them unable to carry the entire task load, leading to high computational latency and energy consumption. Conversely, selecting too many LEO satellites occupies satellite resources, so that other IoT devices are allocated fewer satellite resources and overall system performance drops.
The ground-based IoT devices are indexed by $i \in \{1, \dots, N\}$, UAVs by $m \in \{1, \dots, M\}$ and LEO satellites by $j \in \{1, \dots, V\}$. The locations of IoT devices, drones and satellites are represented using 3D coordinates. The access bandwidth of the satellite is divided into $K$ sub-channels, each with bandwidth $B$, and the storage capacity of satellite $j$ is $C_j$. $S_i$ denotes the size of the computational task of device $i$, $S_{ij}$ denotes the portion of that task allocated to satellite $j$, and $\beta$ denotes the number of computing cycles needed to process one bit of the task.

Architecture
Current terrestrial communication networks are prone to interruptions in the event of serious natural disasters. LEO satellites, by contrast, can guarantee communication for emergency response to natural disasters and for post-disaster relief. As communication networks continue to evolve, integrating satellite and terrestrial networks so that the benefits of satellite networks can support emergency scenarios is becoming an important topic.
For emergency scenarios in which the ground-based communication facilities are damaged and unable to provide network services, this paper proposes a multi-layer integrated architecture consisting of LEO satellites, UAVs and ground devices. IoT devices (search and rescue tools, rescue vehicles, etc.) are evenly distributed on the ground, and UAVs hover over the affected area. It is assumed that the UAV has no on-board processing capability and only provides relay services to ground equipment to assist in task offloading. The ground equipment communicates with the UAV via a wireless channel and offloads its task to the UAV. The ultra-dense multi-layer LEO topology ensures that multiple satellites can provide seamless service coverage for ground devices. The UAV selects service satellites based on satellite resources and computational tasks, forwards the tasks to the LEO satellites for processing, and thereby provides computational services to ground devices. The proposed task offloading architecture fully exploits the available resources, combines the advantages of UAVs and LEO satellites to meet the needs of ground emergency response, and provides a new solution for offloading ground computing tasks in emergency scenarios. The specific scenario is shown in Figure 1.

Channel Model

IoT Device-UAV Channel
The coordinates of UAV $m$ are denoted as $(X_m, Y_m, H_m)$ and the coordinates of IoT device $i$ as $(X_i, Y_i, 0)$; the horizontal distance between them is then
$$d_{i,m} = \sqrt{(X_m - X_i)^2 + (Y_m - Y_i)^2}.$$
It is assumed that each small affected area is covered by exactly one drone. If IoT device $i$ sends a task to drone $m$, the device has to be inside the drone's coverage [5]:
$$d_{i,m} \le d_{\max},$$
where $d_{\max}$ denotes the maximum coverage radius of the UAV.
The transmission from the IoT device to the drone takes place over a wireless channel [29]; to prevent significant co-channel interference, IoT devices offload their computing tasks to drones using frequency division multiple access (FDMA) [30]. Because the drone flies at a low altitude when communicating with an IoT device, the channel is considered line-of-sight (LoS) [31] and the small-scale fading of the channel is negligible [32]. The uplink data rate can be expressed as
$$r_{i,m} = \omega_u \log_2\left(1 + \frac{p_i\, g_{i,m}}{\sigma^2}\right),$$
where $\omega_u$ denotes the channel bandwidth from the IoT device to the UAV, $g_{i,m}$ is the uplink channel gain, $\sigma^2$ is the additive white Gaussian noise (AWGN) power and $p_i$ is the transmission power of the IoT device [33]. The channel gain is
$$g_{i,m} = \frac{g_0}{d_{i,m}^2 + H_m^2},$$
where $g_0$ denotes the reference channel gain and $d_{i,m}^2 + H_m^2$ is the squared Euclidean distance from the IoT device to the UAV.
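As a minimal sketch of the IoT device-UAV link described above, the following Python snippet computes the horizontal distance, applies the coverage check and evaluates a Shannon-type uplink rate. The function names, the argument layout and the numeric values are illustrative assumptions rather than the authors' simulation code, and the rate expression assumes a LoS gain that falls with the squared device-UAV distance.

```python
import math


def horizontal_distance(device_xy, uav_xy):
    """Horizontal distance between a ground device and the UAV's ground projection."""
    return math.hypot(uav_xy[0] - device_xy[0], uav_xy[1] - device_xy[1])


def uplink_rate(device_xy, uav_xyz, bandwidth_hz, tx_power_w, noise_power_w,
                ref_gain, d_max):
    """Device-to-UAV rate in bit/s, or None when the device is outside the coverage radius."""
    d = horizontal_distance(device_xy, uav_xyz[:2])
    if d > d_max:
        return None  # coverage constraint violated
    # Assumed LoS model: reference gain divided by the squared 3D distance.
    gain = ref_gain / (d ** 2 + uav_xyz[2] ** 2)
    snr = tx_power_w * gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


# Illustrative values only (not taken from the paper's parameter tables).
rate = uplink_rate((300.0, 400.0), (0.0, 0.0, 100.0),
                   bandwidth_hz=1e6, tx_power_w=0.1, noise_power_w=1e-13,
                   ref_gain=1e-5, d_max=600.0)
print(f"uplink rate: {rate:.3e} bit/s")
```

Returning `None` for an out-of-coverage device keeps the coverage constraint explicit, so a caller cannot silently use a rate for a link that the model forbids.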

UAV-LEO Channel
Considering only the geometric relationship between the UAV and the LEO satellite, the satellite enters the communication window, and data can be transmitted, only when the angle $\alpha$ between the UAV's horizontal plane and the satellite satisfies $\alpha \ge 20°$. The coordinates of satellite $j$ are denoted as $(X_j, Y_j, H_j)$. The geocentric angle $\theta_{mj}$ between the UAV and the satellite can be expressed as [16]
$$\theta_{mj} = \arccos\left(\frac{R + H_m}{R + H_j}\cos\alpha_j\right) - \alpha_j,$$
where $R$ denotes the Earth's radius, $H_m$ represents the UAV's height, $H_j$ denotes the satellite altitude, and $\alpha_j$ indicates the horizontal angle between the UAV and the satellite. The maximum value of the communication window $\theta$ is obtained when $\alpha = 20°$. The distance is given by
$$d_{mj} = \sqrt{(R + H_m)^2 + (R + H_j)^2 - 2(R + H_m)(R + H_j)\cos\theta_{mj}}. \quad (6)$$
According to 3GPP Release 15, an additional Doppler shift due to satellite motion should be taken into account; it is determined by the satellite speed $v_{sat}$, the speed of light $c$, and the carrier frequency $f_{m,j}$ at the transmitter. The transmitting antenna gain of the drone and the receiving antenna gain of the satellite are given by [34]
$$G_m = \varphi\left(\frac{2\pi\,\Omega_m f_{m,j}}{c}\right)^2, \qquad G_j = \varphi\left(\frac{2\pi\,\Omega_j f_{m,j}}{c}\right)^2,$$
where $\varphi$ denotes the efficiency of the antenna, and $\Omega_m$ and $\Omega_j$ are the antenna radii on the reflective surfaces of the UAV and satellite, respectively.
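The UAV-satellite geometry can be illustrated with a short Python sketch. It assumes a spherical Earth with a mean radius of 6371 km, treats $\alpha$ as an elevation angle in degrees, and uses the standard relation between elevation and geocentric angle followed by the law of cosines for the distance. The function names and the 700 km example altitude are hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def in_window(elevation_deg, min_elevation_deg=20.0):
    """True when the satellite is high enough above the horizon to communicate."""
    return elevation_deg >= min_elevation_deg


def geocentric_angle(uav_alt_m, sat_alt_m, elevation_deg):
    """Angle at the Earth's centre between the UAV and the satellite, in radians."""
    alpha = math.radians(elevation_deg)
    ratio = (EARTH_RADIUS_M + uav_alt_m) / (EARTH_RADIUS_M + sat_alt_m)
    return math.acos(ratio * math.cos(alpha)) - alpha


def slant_range(uav_alt_m, sat_alt_m, elevation_deg):
    """UAV-to-satellite distance in metres from the law of cosines."""
    r_u = EARTH_RADIUS_M + uav_alt_m
    r_s = EARTH_RADIUS_M + sat_alt_m
    theta = geocentric_angle(uav_alt_m, sat_alt_m, elevation_deg)
    return math.sqrt(r_u ** 2 + r_s ** 2 - 2.0 * r_u * r_s * math.cos(theta))


# A UAV at 100 m and a satellite at 700 km, at the edge of the window and overhead.
for elevation in (20.0, 90.0):
    print(elevation, in_window(elevation), round(slant_range(100.0, 700e3, elevation)))
```

At 90 degrees the distance reduces to the altitude difference, which gives a quick sanity check on the geometry; the distance grows as the satellite approaches the 20 degree edge of the window.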
Following prior work [30], the channel coefficient of the UAV-LEO channel is modelled as
$$h_{mj} = u_{mj}\, l_{mj}, \quad (10)$$
where $u_{mj}$ and $l_{mj}$ represent the path loss factor and the small-scale fading, respectively. In particular, the path loss factor depends on the UAV-satellite distance and on the carrier wavelength $\lambda_{m,j} = c / f_{m,j}$. The small-scale fading is given by
$$l_{mj} = \sqrt{\frac{Q}{Q+1}}\,\bar{l}_{mj} + \sqrt{\frac{1}{Q+1}}\,\tilde{l}_{mj},$$
where $Q$ is the Rician fading factor, $\bar{l}_{mj}$ denotes the LoS component satisfying $|\bar{l}_{mj}| = 1$, and $\tilde{l}_{mj}$ denotes the non-line-of-sight (NLoS) component following $\tilde{l}_{mj} \sim \mathcal{CN}(0, 1)$. According to the Shannon formula, the data rate of the uplink is
$$r_{mj} = B \log_2\left(1 + \frac{P_{mj}\,|h_{mj}|^2}{B N_0}\right),$$
where $B$ refers to the bandwidth of the link between the UAV and the satellite, $P_{mj}$ denotes the UAV uplink transmission power, and $N_0$ denotes the noise power spectral density.
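A minimal Python sketch of the fading and rate model follows. It assumes the usual Rician decomposition into a unit-modulus LoS term and a circularly symmetric complex Gaussian scattered term, and a Shannon rate whose noise power is the spectral density multiplied by the bandwidth. The path loss factor is passed in as a plain number because its closed form is not restated here; all names and values are hypothetical.

```python
import math
import random


def rician_coefficient(q_factor, rng):
    """One small-scale fading sample with Rician factor q_factor."""
    los = complex(1.0, 0.0)  # LoS component with unit modulus
    sigma = math.sqrt(0.5)   # per-dimension std so the scattered part is CN(0, 1)
    nlos = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return (math.sqrt(q_factor / (q_factor + 1.0)) * los
            + math.sqrt(1.0 / (q_factor + 1.0)) * nlos)


def uav_to_leo_rate(bandwidth_hz, tx_power_w, path_loss, fading, noise_psd):
    """Uplink rate in bit/s for channel coefficient h = path_loss * fading."""
    h = path_loss * fading
    snr = tx_power_w * abs(h) ** 2 / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
fading = rician_coefficient(10.0, rng)
# Illustrative numbers only: 10 MHz, 5 W, and an assumed path loss factor.
print(uav_to_leo_rate(10e6, 5.0, 1e-7, fading, 4e-21))
```

The two square-root weights sum to one in power, so the average fading power stays at one regardless of the Rician factor; a larger factor only shifts power from the scattered part to the LoS part.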

Task Offloading and Computing
The ground device offloads its task to the UAV, which then forwards the task block $S_i$ to satellite $\mathrm{leo}_j$ for processing. We use $a_{ij} = 1$ to indicate that the task is processed by satellite $\mathrm{leo}_j$, and $a_{ij} = 0$ to indicate that task $S_i$ is not offloaded to $\mathrm{leo}_j$ for processing.
In the IoT device-UAV channel, the delay for offloading task $S_i$ from the IoT device to the UAV consists of the transmission delay $T^{tran}_{im}$ and the propagation delay $T^{prop}_{im}$, which are given by [35]
$$T^{tran}_{im} = \frac{S_i}{r_{i,m}}, \qquad T^{prop}_{im} = \frac{\sqrt{d_{i,m}^2 + H_m^2}}{c},$$
where $r_{i,m}$ is the device-to-UAV uplink rate, $\sqrt{d_{i,m}^2 + H_m^2}$ is the device-to-UAV distance and $c$ represents the speed of light. The delay between the IoT device and the drone is
$$T_{im} = T^{tran}_{im} + T^{prop}_{im}.$$
The energy consumed by the device to transmit to the drone is calculated as
$$E_{im} = p_i\, T^{tran}_{im},$$
where $p_i$ is the transmission power of the device. In the UAV-LEO channel, the delay in offloading task $S_i$ from the UAV to $\mathrm{leo}_j$ includes a transmission delay $T^{tran}_{mj}$, a propagation delay $T^{prop}_{mj}$ and a computation delay $T_j$, which can be expressed separately as
$$T^{tran}_{mj} = \frac{S_{ij}}{r_{mj}}, \qquad T^{prop}_{mj} = \frac{d_{mj}}{c}, \qquad T_j = \frac{\beta S_{ij}}{f_{ij}},$$
where $S_{ij}$ is the task volume transferred from $S_i$ to $\mathrm{leo}_j$, $r_{mj}$ is the UAV-to-satellite uplink rate, $d_{mj}$ is the UAV-to-satellite distance, $\beta$ represents the CPU cycles needed to process one bit of the task volume, and $f_{ij}$ indicates the computing frequency allocated to $S_{ij}$ by $\mathrm{leo}_j$. The drone-to-satellite delay is
$$T_{mj} = T^{tran}_{mj} + T^{prop}_{mj} + T_j.$$
The energy used consists of the link transmission energy and the energy needed for computation in the LEO satellite. The transmission energy consumption is [36]
$$E^{tran}_{mj} = P_{mj}\, T^{tran}_{mj},$$
where $P_{mj}$ is the uplink power of UAV $m$. The computation energy consumption of the satellite is
$$E^{comp}_{j} = k\,\beta\, S_{ij}\, f_{ij}^2,$$
where $k$ is the energy factor. The total delay $T$ and total energy consumption $E$ can be expressed as
$$T = T^{prop} + T^{tran} + T_j, \qquad E = E_{im} + E^{tran}_{mj} + E^{comp}_{j},$$
where $T^{prop} = T^{prop}_{im} + T^{prop}_{mj}$ is the total propagation delay and $T^{tran} = T^{tran}_{im} + T^{tran}_{mj}$ is the total transmission delay.
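The delay and energy terms of this section can be collected in one small Python function. This is a sketch under stated assumptions: transmission energy is taken as power multiplied by transmission time, and the satellite's computation energy follows the common model in which energy per cycle scales with the square of the CPU frequency. The function signature and the sample numbers are hypothetical.

```python
from dataclasses import dataclass

LIGHT_SPEED = 3.0e8  # m/s


@dataclass
class OffloadCost:
    delay_s: float
    energy_j: float


def offload_cost(task_bits, share_bits, rate_im, rate_mj, dist_im, dist_mj,
                 p_device, p_uav, beta, f_alloc, k):
    """Delay and energy for sending task_bits to the UAV and share_bits on to one satellite."""
    t_tran = task_bits / rate_im + share_bits / rate_mj  # both transmission hops
    t_prop = (dist_im + dist_mj) / LIGHT_SPEED           # both propagation hops
    t_comp = beta * share_bits / f_alloc                 # cycles needed / cycles per second
    e_device = p_device * task_bits / rate_im            # device transmit energy
    e_uav = p_uav * share_bits / rate_mj                 # UAV relay transmit energy
    e_comp = k * beta * share_bits * f_alloc ** 2        # assumed frequency-squared model
    return OffloadCost(t_tran + t_prop + t_comp, e_device + e_uav + e_comp)


# Illustrative values: a 1 Mbit task fully forwarded to a single satellite.
cost = offload_cost(1e6, 1e6, rate_im=5e6, rate_mj=2e7, dist_im=500.0,
                    dist_mj=1.5e6, p_device=0.1, p_uav=5.0, beta=1000.0,
                    f_alloc=3e9, k=1e-28)
print(cost)
```

Raising the allocated frequency shortens the computation delay linearly but raises the computation energy quadratically, which is the trade-off that the weighted objective has to balance.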

Problem Definition
In terrestrial satellite networks, the management of available computing resources is crucial. One of the critical aspects of resource management is allocating tasks from IoT devices to satellite nodes for processing, where different offloading decisions lead to different costs, affecting system performance and energy consumption.
Based on the system model and assumptions discussed above, the primary objective is to minimise the weighted sum of system latency and energy consumption by jointly optimising the offloading decisions and the resource allocation. The system must satisfy the storage requirements of all tasks simultaneously, given the available bandwidth and computational resources, and the problem can be expressed mathematically as follows.
$$\min\ \zeta T + \eta E$$
subject to
$$\begin{aligned}
C1:&\ a_{ij} S_{ij} \le X_j, \\
C2:&\ \textstyle\sum_i a_{ij} S_{ij} \le C_j, \\
C3:&\ N_j \le K, \\
C4:&\ \textstyle\sum_i a_{ij} f_{ij} \le f^*_j, \\
C5:&\ \textstyle\sum_j a_{ij} S_{ij} = S_i, \\
C6:&\ P_{mj} \le P^*_m,
\end{aligned}$$
where $X_j$ denotes the size of the remaining storage resources of $\mathrm{leo}_j$, $f_{ij}$ is the computing resource allocated to $S_{ij}$ by $\mathrm{leo}_j$, $f^*_j$ is the highest CPU frequency of satellite $j$, $P^*_m$ is the maximum uplink transmission power of the UAV, $C_j$ is the total storage space, $N_j$ is the number of connections to an individual satellite, $K$ is the maximum number of channels, and $S_{ij}$ is the size of the allocated task block. $\zeta$ and $\eta$ are the weights of delay and energy consumption, respectively, with $\zeta, \eta \in [1, 10]$.
C1 states that the free storage capacity of the LEO satellite to which the device is connected is not less than the device's task size; C2 states that the storage space already used by the satellite does not exceed the total storage space; C3 states that the number of connections to an individual satellite does not exceed the maximum number of channels; and C4 constrains the satellite computing resources so that the CPU resources allocated to IoT device tasks do not exceed the total CPU computing resources. C5 requires that the task portions assigned by $S_i$ to different satellites sum to the whole task size, and C6 requires that the UAV uplink transmission power does not exceed the maximum UAV power.
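The objective and the six constraints can be checked for a single device with a few lines of Python. The sketch below is one possible reading of C1 to C6 from the verbal description above; the `Satellite` record, its field names and the per-device view of the shared constraints are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Satellite:
    total_storage: float  # total storage space
    used_storage: float   # storage already occupied
    f_max: float          # highest CPU frequency
    f_in_use: float       # CPU frequency already granted to other tasks
    connections: int      # devices currently connected


def weighted_cost(delay_s, energy_j, zeta, eta):
    """Weighted sum of delay and energy that the optimisation minimises."""
    return zeta * delay_s + eta * energy_j


def is_feasible(task_bits, shares, freqs, sats, uav_power, p_max, k_channels):
    """Check C1-C6 for one device whose task is split across the satellites in shares."""
    for j, share in shares.items():
        sat = sats[j]
        if share > sat.total_storage - sat.used_storage:  # C1: free storage
            return False
        if sat.used_storage > sat.total_storage:          # C2: used <= total
            return False
        if sat.connections + 1 > k_channels:              # C3: channel limit
            return False
        if sat.f_in_use + freqs[j] > sat.f_max:           # C4: CPU budget
            return False
    if abs(sum(shares.values()) - task_bits) > 1e-9 * max(task_bits, 1.0):  # C5
        return False
    return uav_power <= p_max                             # C6: UAV power limit


sats = {1: Satellite(10.0, 2.0, 5.0, 1.0, 1), 2: Satellite(10.0, 9.0, 5.0, 0.0, 0)}
print(is_feasible(6.0, {1: 6.0}, {1: 3.0}, sats, uav_power=1.0, p_max=2.0, k_channels=4))
print(weighted_cost(2.0, 3.0, zeta=1.0, eta=2.0))
```

A feasibility check of this kind is useful as a filter before evaluating the weighted cost, because an infeasible allocation should never be compared against a feasible one on cost alone.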
The coupling among the optimisation variables increases the complexity of the problem. In addition, the proposed optimisation problem is a mixed integer nonlinear problem, since the objective function and constraints contain binary variables. As the number of IoT devices rises, the complexity grows exponentially. To reduce the complexity, the original problem is decoupled into two sub-problems: satellite selection, and task volume and computational resource allocation.
Transforming the optimisation objective accordingly yields the two sub-problems. The satellite selection sub-problem must satisfy constraints C1, C2 and C3, and the task and computational resource allocation sub-problem must satisfy constraints C4, C5 and C6.

Algorithm Design
In this section, the above sub-problems are analysed. Two algorithms are proposed to solve them, based on task separability and resource separability, respectively. To make the proposed solutions easier to follow, we briefly introduce the algorithmic process and explain the concepts related to Monte Carlo methods, Markov decision processes, reinforcement learning, and the mathematical definitions of the value and reward functions.

Satellite Selection
As the objective function and constraints show, the satellite selection problem is a mixed integer programming problem; a Monte Carlo random sampling method is therefore used to solve it. A Monte Carlo-based satellite selection algorithm is proposed for matching satellites and tasks, where satellites move continuously along predetermined orbits. IoT devices generate computational tasks, and the algorithm obtains the optimal satellite combination for offloading each IoT device's task.

Monte Carlo-Based Satellite Selection Algorithm
Monte Carlo methods, also known as random sampling or statistical test methods, are computational methods based on probability and statistics rather than on deterministic numerical procedures. They can solve problems that are difficult to solve by conventional numerical methods, which is why they are increasingly used in many applications.
The UAV collects the current satellite storage resource information $(C_1, \dots, C_j)$ and the IoT device tasks $(S_1, \dots, S_i)$. Based on the collected satellite resource and task information, the satellite storage resources are randomly sampled by the Monte Carlo method to find satellite subsets whose combined storage approximates the task size $S_i$. For each candidate subset, the propagation delay from UAV $m$ to each satellite and the maximum propagation delay over the subset are evaluated. The optimal combination of offloading satellites is obtained by minimising the maximum propagation delay of the IoT devices, as shown in Algorithm 1.
The algorithm takes the task sizes to be processed and the satellite resources as input. By minimising the propagation delay, it obtains the optimal offloading decision for the IoT device tasks, which are transmitted via the link between the UAV and the LEO satellite.
[Algorithm 1: Monte Carlo-based satellite selection. Its loop computes the distance between each satellite and the UAV according to (6).]
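The sampling idea can be sketched in Python as follows. This is not a transcription of Algorithm 1; it is a hedged illustration in which random satellite subsets are drawn, subsets whose free storage cannot hold the task are discarded, and the subset with the smallest worst-case propagation delay is kept. The tie-break in favour of fewer satellites, the sample count and all names are assumptions.

```python
import random


def select_satellites(task_bits, free_storage, prop_delay, n_samples=2000, rng=None):
    """Return (satellite ids, worst propagation delay) for the best sampled subset."""
    rng = rng or random.Random()
    ids = list(free_storage)
    best, best_delay = None, float("inf")
    if not ids:
        return best, best_delay
    for _ in range(n_samples):
        subset = rng.sample(ids, rng.randint(1, len(ids)))  # random non-empty subset
        if sum(free_storage[j] for j in subset) < task_bits:
            continue  # combined storage cannot hold the task
        worst = max(prop_delay[j] for j in subset)
        fewer = best is not None and worst == best_delay and len(subset) < len(best)
        if worst < best_delay or fewer:
            best, best_delay = sorted(subset), worst
    return best, best_delay


# Hypothetical example: storage in Mbit and one-way propagation delay in seconds.
free_storage = {1: 4.0, 2: 4.0, 3: 10.0}
prop_delay = {1: 0.003, 2: 0.004, 3: 0.009}
print(select_satellites(8.0, free_storage, prop_delay, rng=random.Random(1)))
```

Preferring the smaller subset on ties reflects the earlier observation that selecting too many satellites ties up resources that other devices could use.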

Task and Computing Resource Allocation Strategy
Because several constraints are coupled, a number of factors have to be taken into account when offloading tasks to the satellite edge, including the state of the satellite edge server and the communication environment. To reduce the difficulty of the problem, the optimisation problem is therefore converted into a DRL-based offloading decision problem and modelled as a Markov decision process. In the DRL architecture, an agent interacts with the environment and is trained to find the policy that maximises the cumulative reward [37].

DDPG-Based Task Offloading and Resource Allocation Algorithm
Through iterative trials, reinforcement learning optimises action selection in different situations based on a given reward function. The agent perceives the state of the environment and performs actions that change it. In each iteration, the agent observes the state as input and selects an action to perform. Executing the action produces a reward, and the agent judges the quality of the action by observing that reward. The agent's action selection tends to increase the long-term total reward, that is, to maximise the reward function.
The DDPG algorithm is one of the most popular methods for RL problems. The problem is described by the tuple $(S, a, r_t, \gamma)$, where $S$ is the state, $a$ denotes the action, $r_t$ is the immediate reward of the time slot, and $\gamma \in (0, 1)$ is the discount factor. The expected long-term discounted reward is defined as
$$R_t = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k}\, r(s_{t+k}, a_{t+k})\right].$$
Here, at time $t$, the state and action are $s_t$ and $a_t$, respectively, and $r(s_t, a_t)$ is the immediate reward. Because we have to deal with continuous actions, we follow a deterministic policy $\mu$ and write the value function as
$$Q^{\mu}(s_t, a_t) = \mathbb{E}\left[r(s_t, a_t) + \gamma\, Q^{\mu}\big(s_{t+1}, \mu(s_{t+1})\big)\right].$$
To maximise the expected long-term discounted reward, in each time slot we use the temporal-difference method, learning from the experience of previous slots, to update the action-value function.
The DDPG algorithm approximates the action-value function by $Q(s, a|\theta^{Q})$. The actor uses the policy function $\mu(s|\theta^{\mu})$ to decide on an action, the critic uses the value function to evaluate the policy, and the value network and the policy network are updated according to the critic's output. The loss function has the form
$$L(\theta^{Q}) = \mathbb{E}\left[\big(y_t - Q(s_t, a_t|\theta^{Q})\big)^2\right], \qquad y_t = r(s_t, a_t) + \gamma\, Q'\big(s_{t+1}, \mu'(s_{t+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big),$$
where the primes denote the target networks. The critic network is refined by minimising $L(\theta^{Q})$; the target term in $y_t$ is the maximum future payoff that will be earned by following the policy $\mu'$ from the state reached after action $a_t$. The actor network adjusts its parameters in the direction in which a larger $Q$ is more probable.
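Instead of copying all parameters at once, a fixed target Q-network stabilises learning by updating only a fraction of the parameters at each step. Let $\theta^{\mu}$ and $\theta^{Q}$ be the parameters of the online actor and critic networks, and $\theta^{\mu'}$ and $\theta^{Q'}$ those of the corresponding target networks. With soft update coefficient $\tau$, the target parameters are updated as
$$\theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1 - \tau)\,\theta^{\mu'}, \qquad \theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1 - \tau)\,\theta^{Q'}.$$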
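The three update rules of the DDPG family discussed here (bootstrapped target, critic loss and soft target update) can be written in plain Python as a sketch. Parameters are represented as flat lists of floats purely for illustration; a real implementation would use a deep learning framework, and the helper names are hypothetical.

```python
def td_target(reward, next_q, gamma):
    """Bootstrapped target: immediate reward plus discounted target-critic estimate."""
    return reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and their targets."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, online_params, tau):
    """Move every target parameter a fraction tau towards its online counterpart."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# The target weight approaches a fixed online weight of 1.0 gradually.
target = [0.0]
for _ in range(1000):
    target = soft_update(target, [1.0], tau=0.005)
print(target[0])
```

With a small coefficient the target network lags the online network by many steps, which is what keeps the bootstrapped targets stable while the critic is being trained.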
Because changes in task offloading are continuous in time and the offloading decision taken in the previous time slot affects the current observation, we use an LSTM to capture the correlation between previous and current observations and to extract further latent information by learning from a series of past experiences. By adding this recurrent mechanism, the algorithm overcomes the inability of the Deep Deterministic Policy Gradient (DDPG) to handle partial observability and history-dependent decisions. The DDPG-LSTM is therefore proposed, and its architecture is shown in Figure 2. The DDPG-LSTM is built mainly on the actor-critic model. The agent's actor is responsible for generating actions and contains two components: the online actor network $\mu_{\theta}$ and the target actor network $\mu_{\theta'}$, where $\theta$ and $\theta'$ are the network parameters. The agent's critic is responsible for evaluating actions and contains two components: the online critic network $Q_{\phi}$ and the target critic network $Q_{\phi'}$, where $\phi$ and $\phi'$ are the network parameters.
The DDPG-LSTM algorithm has three main elements.

• State
The state consists of the IoT device task information and the satellite resource information:
$$Z_t = \{S_1, \dots, S_N, f_1, \dots, f_V\},$$
where $S_i$ denotes the computational task size of ground-based IoT device $i$ and $f_j$ denotes the computational resources of LEO satellite $j$.
• Action
The action is composed of the task allocation vector $S_{ij}$ and the computational resource allocation vector $F_{ij}$, which together form the action space $A$ of the system:
$$A_t = \{S_{ij}, F_{ij}\}.$$
• Reward
A reward function is introduced to describe the change in system cost when action $A_t$ is taken in system state $Z_t$. It is expressed as
$$R_t = U_t - U_{t+1}.$$
Here, $U_t$ and $U_{t+1}$ denote the latency and energy consumption cost at time point $t$ and at the next time point. The magnitude of the difference represents the cost reduction achieved by $A_t$, that is, the system benefit of taking action $A_t$.
The MDP aims to maximise the expected cumulative reward and can therefore be formulated as
$$\max_{\mu}\ \mathbb{E}\left[\sum_{t} \gamma^{t} R_t\right].$$
The process of DDPG-LSTM is shown in Algorithm 2. First, the weight parameters and the replay buffer are initialised. In each training round, the agent is given a state $Z_t$, decides on an action $a_t$, performs the action and receives a reward $R_t$. The replay buffer then stores the experience transition, and a mini-batch of the chosen batch size is sampled from it. Finally, the networks are updated.
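To make the reward and the objective concrete, the sketch below computes a cost-reduction reward and the discounted sum of a reward sequence. It assumes that the reward is the drop in weighted system cost between consecutive time points; the cost values are made-up numbers and the function names are hypothetical.

```python
def reward(cost_now, cost_next):
    """Cost reduction achieved by the action: positive when the system cost falls."""
    return cost_now - cost_next


def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated backwards for numerical simplicity."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical trajectory of weighted system costs over four time points.
costs = [10.0, 7.0, 6.0, 6.5]
rewards = [reward(a, b) for a, b in zip(costs, costs[1:])]
print(rewards, discounted_return(rewards, gamma=0.99))
```

Because each reward is a difference of consecutive costs, an action that raises the cost is penalised with a negative reward, as in the last step of the example.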

Algorithm 2 DDPG-LSTM-based task and resource allocation algorithm
Input: task $S_i$, computing resource $f_j$
Output: task allocation $S_{ij}$, computing resource $f_{ij}$
Randomly initialise the actor and critic weight parameters $\omega$, $\theta$, $\omega'$, $\theta'$;
Initialise the replay buffer M, mini-batch size B and training episode threshold C;
for m = 1, Ep do
    Initialise the LSTM states in the network;
    Receive the initial state $Z_t$;
    for each time slot t do
        Choose $a_t$ with the online network;
        Execute $a_t$, obtain $R_t$ and move to state $Z_{t+1}$;
        Store the transition $(Z_t, a_t, R_t, Z_{t+1})$ in the replay buffer;
        Sample a random mini-batch of transitions $(Z_i, a_i, R_i, Z_{i+1})$ from the replay buffer;
        Update the online critic and actor parameters using the loss function and the policy gradient;
        Softly update the target network parameters;
    end for
end for
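The replay buffer used in Algorithm 2 can be sketched with the Python standard library alone. The class below is an illustrative assumption, not the authors' implementation: it keeps a fixed number of the most recent transitions and returns uniformly sampled mini-batches.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample batch_size distinct transitions."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0.1 * step, reward=-float(step), next_state=step + 1)
print(len(buffer), buffer.sample(2))
```

In a training loop, updates would typically start only once the buffer holds at least one mini-batch of transitions, so that `sample` is never asked for more items than are stored.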

Performance Evaluation
In this section, we examine the behaviour of the proposed DDPG-LSTM-based scheme. First, we present the simulation scenario. Then, simulation experiments are performed, and the results are compared and analysed. Lastly, we verify the training efficiency under different parameters to find the optimal parameter settings, and compare the performance of various task offloading schemes to further demonstrate the method's effectiveness.

Simulation Environment and Parameters
A UAV-assisted air-space integrated network scenario is considered, and Python is used to simulate and evaluate the proposed algorithm. In our simulations, we consider a geographical area of 1 km × 1 km in which IoT devices are randomly deployed [38]. LEO satellites fly at altitudes in the range [700, 1000] km and UAVs fly at 100 m [39]; the LEO satellites provide seamless coverage of the geographical area. We assume that the antenna efficiencies of the satellite, IoT device and UAV are 0.6, 0.55 and 0.6, respectively [34]. The maximum available computing resources of the satellites are uniformly distributed in the interval [3, 8] GHz. The detailed simulation parameters are given in Table 1 [40][41][42]. The DDPG-LSTM network, which combines an LSTM layer with a neural network [43], uses the ReLU, tanh and sigmoid functions as activation functions, while the output of the actor network uses softmax to restrict the actions. Some critical parameters are analysed to explore their impact on the algorithm. For each parameter studied, we provide several candidate values. The weighted sum of energy consumption and delay is used as the evaluation criterion. The detailed algorithm parameters are given in Table 2.
Figure 3 shows the performance of the algorithm at different learning rates. The size of the update step affects the convergence speed: when the learning rate is too low, the algorithm converges slowly, and when it is too high, the optimum may be overshot because the update step is too large. The graph shows that the algorithm performs best with δ_a = 0.0001 and δ_c = 0.001. In addition, the network performs better when δ_c is greater than δ_a, because the actor network needs guidance from the critic network to learn. When the critic network learns faster than the actor network, it can better guide the update direction of the actor network.

Figure 4 shows the algorithm's performance under different soft update rates. Compared with the hard update strategy, the soft update strategy shortens the update interval of the target network. It ensures that the target network is updated in every iteration, which increases the update frequency and helps to reduce the time the algorithm takes to converge. The smaller the soft update coefficient, the more stable the algorithm and the less the parameters of the target network change, which makes convergence too slow. If the soft update coefficient is too large, the algorithm becomes unstable. An appropriate soft update coefficient therefore makes the algorithm both stable and fast. The figure shows that the best performance is achieved with τ = 0.005.

Figure 5 shows the algorithm's performance under varying batch sizes. Mini-batch learning is used to speed up model training and reduce the cost per iteration. Training with small batches converges faster than training on the full data set, but it can lead to poor performance because the data stored in the buffer early on is over-utilised [44], which reduces the importance of data collected at a later stage. Large batches cause the network to update too slowly and may also perform poorly. The graph depicts the behaviour of the algorithm for a range of batch sizes and shows that it achieves better results with a batch size of 256. With batch sizes of 64 and 128, the system loss is higher and the curve fluctuates more, which makes it difficult for the algorithm to converge quickly and reduces its performance.

Figure 6 shows the performance of the algorithm under different discount factors. The actor network chooses actions that receive a high rating from the critic network, and the rating calculation uses the discount factor. To reflect the continuity of the decision, the actor network is expected to consider both the immediate reward and future rewards when choosing action a_t. A discount factor that is too small prevents the critic network from anticipating the future and degrades the algorithm's performance. Conversely, a discount factor that is too large may reduce the critic network's prediction accuracy. The figure shows that the method performs better with a discount factor of 0.99.
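The roles of the soft update coefficient and the discount factor discussed above can be made concrete with a short sketch. It uses the standard DDPG soft update θ′ ← τθ + (1 − τ)θ′ and the one-step critic target R + γQ′. The flat parameter lists and the function names `soft_update` and `td_target` are assumptions for illustration, not the authors' code.

```python
def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau towards its online counterpart."""
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_params, target_params)
    ]


def td_target(reward, next_q, gamma):
    """One-step critic target: immediate reward plus discounted target-critic estimate."""
    return reward + gamma * next_q


TAU = 0.005    # soft update coefficient reported as best in the text
GAMMA = 0.99   # discount factor reported as best in the text

online = [1.0, -2.0, 0.5]   # placeholder online-network weights
target = [0.0, 0.0, 0.0]    # placeholder target-network weights

# With a small tau the target network tracks the online network slowly,
# which is the stability versus convergence-speed trade-off described above.
for _ in range(1000):
    target = soft_update(target, online, TAU)

print(target)
print(td_target(reward=1.0, next_q=10.0, gamma=GAMMA))
```

A hard update would correspond to `tau = 1.0` applied only every few iterations, which copies the online weights in one step instead of blending them gradually.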

Performance Comparison
In the simulation experiments, the DDPG-LSTM algorithm proposed in this paper is compared with random offloading (RO) [45], Twin Delayed DDPG (TD3) [46][47][48] and local computing (Local), using the weighted sum of latency and energy consumption as the evaluation criterion [36,49]. The performance of the algorithms is then compared under different computational resources and task volumes to validate the advantages of the proposed algorithm.
Assume that a batch of tasks arrives at the MEC server in each time slot and that each device generates one task per slot. The task execution cost is used to compare the performance of the different policies. The offloading cost is the sum of the time taken by the devices to complete their individual tasks within the time slot.
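The evaluation criterion, a weighted sum of delay and energy consumption accumulated over the tasks of a slot, can be sketched as below. The weight value of 0.5, the function names and the sample numbers are assumptions for illustration; the paper's actual weights are not given in this section.

```python
def weighted_cost(delay_s, energy_j, delay_weight=0.5):
    """Weighted sum of delay and energy for one task (weight is an assumed value)."""
    if not 0.0 <= delay_weight <= 1.0:
        raise ValueError("delay_weight must lie in [0, 1]")
    return delay_weight * delay_s + (1.0 - delay_weight) * energy_j


def slot_cost(tasks, delay_weight=0.5):
    """Total cost of one time slot: sum of per-task weighted costs.

    `tasks` is an iterable of (delay_s, energy_j) pairs, one per device task.
    """
    return sum(weighted_cost(d, e, delay_weight) for d, e in tasks)


# Two hypothetical tasks with made-up delay and energy values.
tasks = [(2.0, 4.0), (1.0, 1.0)]
print(slot_cost(tasks))                    # equal weighting of delay and energy
print(slot_cost(tasks, delay_weight=1.0))  # delay-only cost
```

In practice delay and energy are on different scales, so the two terms would normally be normalised before weighting; that step is left out of this sketch.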
Figure 7 shows that the weighted sum of energy consumption and delay for executing tasks with the DDPG-LSTM method is lower than with the other three strategies. The performance of the algorithm continues to improve as training progresses. DDPG-LSTM takes the satellite status information into account to ensure that resources are fully utilised, and it continuously optimises the resource allocation strategy so that tasks are completed with the lowest possible latency. In addition, the figure shows that the cost of local task execution is much higher than that of the other strategies because IoT devices lack computing resources. Compared with the TD3 algorithm, DDPG converges faster, and the LSTM captures temporal information more easily. The DDPG-LSTM scheme therefore has a memory function that stores valuable historical data, which yields better performance and validates the effectiveness of the LSTM for task offloading strategies.

Figure 8 shows the cumulative cost for different data sizes. As the task data becomes larger, the server needs additional time and energy to handle the tasks, and the average total system cost trends upwards. The DDPG-LSTM algorithm shows the smallest upward trend and the best performance among the compared algorithms. As the task size grows, the cost of local computing increases the fastest.

Figure 9 shows how the cost changes as the computing frequency of the LEO satellite server rises within [3, 8] GHz. It is apparent from the figure that the combined expense of the three strategies tends to be greater as the processing frequency of the server becomes higher. As the processing frequency of the LEO satellite server rises, the cost of the DDPG-LSTM scheme proposed in this paper remains the lowest.

Figure 10 illustrates the distribution of offloaded tasks across several LEO satellites for four IoT devices, with three satellites providing edge computing capability. The altitude and computing resources of the satellites affect task offloading, and the allocation of computing resources is in turn affected by the offloaded tasks. For satellite communication, transmission costs increase with altitude, and higher altitudes increase the latency and energy consumption incurred in the space segment during computation offloading. In satellite edge computing, fewer computational resources and larger offloaded tasks increase the processing latency and energy consumption of a task.

Conclusions
In this paper, we put forward an integrated air-space network architecture for UAV-assisted task offloading to provide more computational resources for ground devices and to meet computational requirements in emergency scenarios. To minimise the delay and energy consumed by offloading tasks, we formulate the problem as an MDP and develop an algorithm to solve it. The solution enables the UAV controller to determine the best offloading decision, including the task offloading scheme and the resource allocation strategy, based on dynamic channel conditions and the satellite position. Finally, a series of experiments is conducted to validate the effectiveness and superiority of the proposed offloading scheme.
In future work, we will further consider UAV-assisted access of mobile IoT devices to the LEO satellite edge network. In some real-world situations, IoT devices move at high speed, and the approach we have proposed may not be appropriate for such scenarios. For mobile IoT devices, satellite handover could be used to overcome this problem. In addition, in this paper IoT devices can only offload their tasks as a whole. When the number of IoT devices or tasks increases, this offloading strategy puts considerable bandwidth pressure on the satellite network and increases the energy consumption of task transmission. A partial offloading strategy can therefore be explored in future work on air-based edge computing.

Figure 1. An illustration of the task offloading scenario.

5: Compute the propagation delay between satellite and UAV, D = {D_{m,1}, ..., D_{m,j}}, according to D_{m,n} = H_{m,n} / c;
6: end for
7: for u = 1, t do
8: if a_u C_T ≥ ∑ S then
9: Compute the propagation delay T_u, according to T_u = max(a_u × D);
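The two propagation delay steps in this fragment can be sketched as follows. The sketch assumes that H_{m,n} is the satellite-to-UAV distance in metres, that c is the speed of light, and that a_u is a binary selection vector over satellites; these interpretations, the function names and the sample distances are assumptions, since the fragment does not define them.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # c, in metres per second


def propagation_delays(distances_m):
    """D_{m,n} = H_{m,n} / c for each satellite-UAV distance."""
    return [h / SPEED_OF_LIGHT_M_S for h in distances_m]


def offload_propagation_delay(selection, delays):
    """T_u = max(a_u x D): the slowest link among the selected satellites.

    `selection` is assumed to be a 0/1 vector marking which satellites
    receive part of the offloaded work.
    """
    return max(a * d for a, d in zip(selection, delays))


# Hypothetical distances spanning the 700 to 1000 km altitude range.
distances_m = [700e3, 850e3, 1000e3]
delays = propagation_delays(distances_m)

# Offloading to the first and third satellites: the farthest one dominates.
t_u = offload_propagation_delay([1, 0, 1], delays)
print(f"T_u = {t_u * 1e3:.3f} ms")
```

Because the delay is a maximum over the selected links, adding a more distant satellite to the selection can only keep the propagation delay the same or increase it.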

Figure 3. Training process with different learning rate settings.

Figure 4. Training process with different soft update rate settings.

Figure 5. Training process with different batch size settings.

Figure 6. Training process with different discount factor settings.

Figure 8. Relationship between cost and S_i.

Figure 9. Relationship between cost and f_j.

Figure 10. Distributions of offloading tasks and computing resources.