Article

SAC-MS: Joint Slice Resource Allocation, User Association and UAV Trajectory Optimization with No-Fly Zone Constraints

1
College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China
2
College of Electrical Engineering and Automation, Shandong University of Science and Technology, Qingdao 266590, China
*
Author to whom correspondence should be addressed.
Sensors 2025, 25(18), 5833; https://doi.org/10.3390/s25185833
Submission received: 15 August 2025 / Revised: 9 September 2025 / Accepted: 16 September 2025 / Published: 18 September 2025

Abstract

With the rapid growth of user service demands, space–air–ground integrated networks (SAGINs) face challenges such as limited resources, complex connectivity, diverse service requirements, and no-fly zone (NFZ) constraints. To address these issues, this paper proposes a joint optimization approach under NFZ constraints, maximizing system utility by simultaneously optimizing user association, unmanned aerial vehicle (UAV) trajectory, and slice resource allocation. Due to the problem’s non-convexity, it is decomposed into three subproblems: user association, UAV trajectory optimization, and slice resource allocation. To solve them efficiently, we design the iterative SAC-MS algorithm, which combines matching game theory for user association, sequential convex approximation (SCA) for UAV trajectory, and soft actor–critic (SAC) reinforcement learning for slice resource allocation. Simulation results show that SAC-MS outperforms TD3-MS, DDPG-MS, DQN-MS, and hard slicing, improving system utility by 10.53%, 13.17%, 31.25%, and 45.38%, respectively.

1. Introduction

With the advent of the sixth-generation (6G) wireless communication era, the paradigm of ubiquitous connectivity and support for diverse vertical industries has placed unprecedented demands on network capabilities. To meet these heterogeneous service demands, network slicing has emerged as a promising solution. It enables the partitioning of a shared physical network into multiple virtual networks based on different service types, with each network independently optimizing its resources and control strategies [1,2]. Driven by the rapid proliferation of high-bandwidth applications such as Internet of Things (IoT) devices and intelligent terminals, global mobile data traffic has increased dramatically. Traditional terrestrial networks are increasingly unable to meet the needs of wide-area coverage, high-speed access, and ultra-low latency, especially in remote or disaster-stricken regions. In these areas, terrestrial base stations are limited by geographical constraints and high deployment costs, resulting in significant coverage gaps. Furthermore, in urban hotspots or large-scale event scenarios, severe network congestion can significantly degrade the quality of service.
To address these challenges, the space–air–ground integrated network (SAGIN) has emerged as a promising architecture for next-generation wireless networks [3,4,5]. Specifically, low earth orbit (LEO) satellites can offer seamless connectivity in remote areas; however, the long communication distances between satellites and users hinder their ability to meet low-latency requirements [6]. In contrast, unmanned aerial vehicles (UAVs), acting as aerial base stations, offer flexible and rapid deployment and benefit from line-of-sight (LoS) communication, making them well-suited to complement terrestrial infrastructures [7,8]. By integrating satellites, UAVs, and terrestrial networks, SAGIN enhances edge computing capabilities and improves network resilience, thereby alleviating the burden on traditional networks. This architecture is particularly well-suited for scenarios involving emergency response, coverage of remote areas, and dynamic service demands.
In practical deployment scenarios, UAVs, acting as mobile aerial base stations, provide communication and computing services to ground users. The design of UAV flight trajectories plays a critical role in determining the overall system performance. To achieve wide-area coverage and efficient service provisioning for ground users, it is essential to plan UAV flight paths carefully. A substantial body of research has been devoted to UAV trajectory optimization in multiple access systems [9,10,11], multi-UAV cooperative networks [12], and relay-based communication systems [13], yielding valuable insights.
However, in real-world environments, UAV trajectories are strictly constrained by regulatory and geographical factors. High-risk areas such as airports, power transmission lines, and military zones are often designated as no-fly zones (NFZs) [14], which must be explicitly considered in UAV trajectory planning. In recent years, several studies have begun incorporating NFZ constraints into trajectory optimization frameworks to enhance the safety and feasibility of UAV operations [15,16]. On the other hand, within the architecture of space–air–ground integrated networks, the way ground users select the most suitable access point—whether a terrestrial base station, UAV, or satellite—significantly impacts overall network performance. Moreover, given the limited communication and computing resources available at each access node, efficiently allocating these resources to meet heterogeneous service demands remains a key challenge requiring urgent attention.
Given the aforementioned challenges, this study develops a space–air–ground integrated network architecture with NFZ constraints. The proposed framework jointly optimizes user association, slice resource allocation, and UAV trajectory planning to meet the diverse service requirements of ground users. The main contributions of this work are summarized as follows:
  • To meet the customized demands of diverse services under resource-constrained conditions, this paper introduces a dynamic radio access network slicing mechanism within the architecture of a SAGIN, subject to NFZ constraints. This design aims to enable on-demand allocation and efficient management of communication resources.
  • We formulate a joint optimization problem that integrates user association, slice resource allocation, and UAV trajectory optimization, aiming to maximize the system utility, defined as the difference between the total system gains and the system costs. It is then decomposed into three subproblems: user association, UAV trajectory optimization and slice resource allocation.
  • We propose the SAC-MS algorithm to solve the joint optimization problem. Specifically, a many-to-one matching game is adopted to achieve stable user–base station association, the SCA method is employed to transform the non-convex UAV trajectory optimization problem into convex subproblems, and a deep reinforcement learning-based algorithm is introduced for adaptive slice resource allocation in dynamic networks.
  • Simulation results show that the proposed algorithm improves system utility by 10.53%, 13.17%, 31.25%, and 45.38% compared to the benchmark algorithms (TD3-MS, DDPG-MS, DQN-MS, and Hard_slicing), respectively.
The remainder of this paper is organized as follows. Section 2 reviews related works on SAGIN, UAV trajectory optimization, user association, and slice resource allocation. Section 3 describes the system model, including the network model, UAV mobility model, NFZ model, communication model, and computation model. Section 4 formulates the joint optimization problem. Section 5 presents the proposed joint algorithm for user association, UAV trajectory optimization, and resource allocation. Section 6 analyzes the simulation results. Finally, Section 7 concludes the paper. For clarity, all symbols and their definitions are summarized in Table 1, and all acronyms are explained in Table 2.

2. Related Works

To meet the growing demand for wireless services, the space–air–ground integrated network (SAGIN) has emerged as a pivotal architecture for enhancing connectivity and service quality by integrating satellites, UAVs, and terrestrial infrastructures. While extensive research has been conducted on its individual components, a significant gap remains in holistically integrating these elements under practical constraints [17].
Regarding SAGIN architectures, studies have made progress in supporting seamless coverage and massive connectivity. For instance, Mao et al. [18] introduced a space–aerial-assisted hybrid cloud–edge computing framework to minimize computation delay, while Cheng et al. [19] developed a SAGIN-based computing architecture for offloading. Others have focused on service function chaining [20] and dynamic resource allocation [21]. However, a common limitation across these works is their treatment of the physical network as a monolithic entity. They struggle to meet the diverse and customized needs of different users simultaneously. Reference [22] analyzed how dynamic network slice allocation delays affect latency, bandwidth, and service continuity, showing that reducing allocation time enhances performance for mobile users in 5G networks. While RAN slicing is a promising solution to this problem, its application in the complex SAGIN context remains strikingly underexplored, creating a clear research opportunity.
In the domain of UAV trajectory optimization, the focus has been on leveraging UAVs’ flexibility and LoS links to improve performance. Works like [23] jointly optimized trajectory and power control, and [24,25] aimed to minimize delay through integrated trajectory and resource planning. Ref. [26] applied the Successive Convex Approximation (SCA) method to solve the subproblems of computation resource and bandwidth allocation as well as UAV trajectory optimization. Notwithstanding these contributions, the vast majority of these studies operate in idealized environments. A critical oversight is the prevalent neglect of No-Fly Zone (NFZ) constraints, which are non-negotiable for real-world UAV deployment around airports, power lines, or military zones. Consequently, many proposed trajectories are academically interesting but practically infeasible, highlighting the need for trajectory designs that explicitly incorporate such safety-critical constraints.
On the front of user association and slice resource allocation, References [27,28] investigated the user association problem using evolutionary game theory and the Lagrangian dual method, respectively. The joint optimization of user association and resource allocation has been extensively studied in [29,30,31]. Ref. [29] used a D3QN-based method for resource allocation, and [31] developed a MADDPG framework for joint user association and allocation. Ref. [32] proposed a fractional programming approach using quadratic transform, weighted MMSE, and SCA for beamforming, offloading, and resource allocation. Ref. [33] proposed an energy-efficient user selection algorithm with dynamic caching. In addition, ref. [34] introduced a multi-agent deep reinforcement learning-based strategy for resource allocation in vehicular networks. Despite their sophistication, these approaches are often myopic. They predominantly focus on single-dimension optimization within a static or ground-based network topology. Crucially, they fail to account for the dynamic network topology introduced by mobile UAVs and the intricate coupling between association decisions, resource slicing policies, and UAV trajectories. Furthermore, many DRL solutions rely on algorithms like DQN, DDPG, or TD3, which are known to suffer from training instability and inefficiency in high-dimensional continuous action spaces, precisely the nature of the resource allocation problem here.
Therefore, in contrast to the existing literature, our work proposes a comprehensive solution that bridges these gaps. We introduce a joint optimization framework that (1) incorporates RAN slicing into SAGIN for customized service provisioning, (2) rigorously embeds NFZ constraints into UAV trajectory planning to ensure practicality, and (3) simultaneously solves user association, slice resource allocation, and trajectory optimization. We employ the Soft Actor–Critic (SAC) algorithm, which is specifically designed for stability and efficiency in high-dimensional continuous action spaces, thereby addressing the limitations of prior DRL approaches used in this domain.
To clearly highlight the differences between this study and existing literature and emphasize its novelty, we compared it with previous works, as shown in Table 3.

3. System Model

3.1. Network Model

As shown in Figure 1, we study a SAGIN consisting of $B$ terrestrial base stations (TBSs, including macro base stations (MBSs) and small base stations (SBSs)), $I$ ground users, $U$ UAVs, and a constellation of low earth orbit (LEO) satellites. The coverage area of each MBS contains multiple SBSs. UAVs act as aerial base stations and fly from a starting point $l_0$ to a destination point $l_e$ at a constant power to provide services to ground users. The UAV service area is denoted as the region $x_{\max} \times y_{\max}$, within which several no-fly zones (NFZs) are defined to prohibit UAV traversal. Because terrestrial base stations have limited coverage and may be damaged in disaster areas, which can disrupt communications, UAVs and LEO satellites are introduced to achieve seamless coverage across the entire cell. This ensures continuous communication services for ground users. Each user can establish a communication link with terrestrial base stations, UAVs, or LEO satellites. We assume that the sets $\mathcal{W} = \{W_0, \ldots, W_{B+U}\}$ and $\mathcal{F} = \{F_0, \ldots, F_{B+U}\}$ represent the total virtual bandwidth and computing resources of all base stations, respectively. The bandwidth and computing resources of each terrestrial base station and UAV are partitioned into three service slices: SLS, SLT, and SR, which correspond to latency-sensitive, latency-tolerant, and high-data-rate tasks, respectively. Accordingly, the set of all slices is denoted by $\mathcal{N} = \{SLS, SLT, SR\}$, with $n \in \mathcal{N}$. This resource slicing scheme can be applied to any number of service slices [35].
In addition, we define the sets of LEO satellites, TBSs, and UAVs as $\mathcal{L} = \{1, 2, \ldots, L\}$, $\mathcal{B} = \{1, 2, \ldots, B\}$, and $\mathcal{U} = \{1, 2, \ldots, U\}$, respectively. The set of all base stations is denoted as $\mathcal{J} = \mathcal{L} \cup \mathcal{B} \cup \mathcal{U}$. The set of users associated with slice $n$ is denoted as $\mathcal{I}_n = \{1, \ldots, M_1, \ldots, M_1 + M_2, \ldots, M_1 + M_2 + M_3\}$, while $\mathcal{I}_{M_1} = \{1, 2, \ldots, M_1\}$, $\mathcal{I}_{M_2} = \{1, 2, \ldots, M_2\}$, and $\mathcal{I}_{M_3} = \{1, 2, \ldots, M_3\}$ represent the sets of users served by the SLS, SLT, and SR slices, respectively. We also define a binary association variable $x_{n_i,j}(t)$ to indicate whether user $i$ on slice $n$ is associated with base station $j$, where $x_{n_i,j}(t) = 1$ if associated and $x_{n_i,j}(t) = 0$ otherwise. At each time slot $t$, each user $i$ on slice $n$ can be associated with only one base station, i.e., $\sum_{j \in \mathcal{J}} x_{n_i,j}(t) = 1$. The bandwidth and computing resources allocated by the $j$th base station to slice $n$ are denoted by $W_n = b_{n,j}(t) W_j$ and $F_n = f_{n,j}(t) F_j$, respectively, where $b_{n,j}(t) \in [0, 1]$ and $f_{n,j}(t) \in [0, 1]$ are the allocation ratios of bandwidth and computing resources assigned to slice $n$.

3.2. UAV Mobility Model

To simplify the modeling, we assume that all UAVs fly at a fixed altitude $H$. Based on the 3D Cartesian coordinate system, the position of the $u$th UAV at time slot $t$ is denoted by $\mathbf{l}_u(t) = (x_u^{UAV}(t), y_u^{UAV}(t), H)$, and its projection onto the horizontal plane is denoted by $l_u(t) = (x_u^{UAV}(t), y_u^{UAV}(t))$. The UAV mobility is subject to a set of constraints, including its initial and final positions, which are expressed as follows.
$$l_u(0) = l_0, \quad l_u(T) = l_e$$
In addition, the UAV’s flight velocity is denoted as $V_u(t) = \frac{l_u(t+1) - l_u(t)}{\tau}$ [36]. Furthermore, considering the constraints of the service area, as well as the UAV’s minimum and maximum speed requirements, denoted by $V_{\min}$ and $V_{\max}$, respectively, the UAV’s trajectory and flight speed should satisfy the following conditions.
$$0 \le x_u^{UAV}(t) \le x_{\max}$$
$$0 \le y_u^{UAV}(t) \le y_{\max}$$
$$V_{\min}^2 \le \|V_u(t)\|^2 \le V_{\max}^2$$
Due to limitations in horizontal flight speed, the UAV’s flight distance is also constrained and must satisfy the following condition.
$$\Delta l_u(t) = \|l_u(t+1) - l_u(t)\| \le L_{\max}^{x}$$
where $\Delta l_u(t)$ denotes the UAV’s horizontal travel distance in one time slot, and $L_{\max}^{x}$ represents the maximum allowable horizontal distance.
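The mobility constraints above can be checked mechanically on a candidate trajectory. The following is a minimal sketch, not the authors' implementation: the function name `trajectory_is_feasible`, the list-of-waypoints representation, and the treatment of `tau` as the slot duration are all assumptions made for illustration.

```python
import math

def trajectory_is_feasible(path, start, end, x_max, y_max,
                           v_min, v_max, l_max, tau):
    """Check the Section 3.2 constraints on a horizontal trajectory.

    path  : list of (x, y) waypoints, one per time slot (hypothetical layout)
    tau   : assumed duration of one time slot
    l_max : maximum horizontal distance per slot
    """
    # Endpoint constraints: l_u(0) = l_0 and l_u(T) = l_e.
    if path[0] != start or path[-1] != end:
        return False
    # Service-area constraints on every waypoint.
    for x, y in path:
        if not (0.0 <= x <= x_max and 0.0 <= y <= y_max):
            return False
    # Per-slot speed and distance constraints.
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        if step > l_max:
            return False
        if not (v_min <= step / tau <= v_max):
            return False
    return True
```

A check like this is useful as a post-hoc validation of a trajectory returned by an optimizer; it does not itself produce a feasible path.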

3.3. NFZ Model

In certain areas, there are $K$ randomly distributed but non-overlapping NFZs that prohibit UAVs from flying through [37]. We model each NFZ as a cylinder of infinite height [14], which is a common simplification for conservative trajectory planning. This implies that a UAV violates the NFZ constraint if its horizontal projection enters the circular area of the NFZ, regardless of its altitude. Each NFZ has radius $r_{NFZ}$. The set of NFZs is denoted by $\mathcal{K} = \{1, 2, \ldots, K\}$, and the center of the $k$th NFZ is represented by $p_k = (\bar{x}_k, \bar{y}_k)$, as shown in Figure 2. The NFZ constraint is formulated solely based on the horizontal distance.
$$\|l_u(t) - p_k\|^2 \ge r_{NFZ}^2, \quad \forall u \in \mathcal{U}, \; \forall k \in \mathcal{K}$$
The radius $r_{NFZ}$ is usually set proportionally to the size of the simulation scenario, reflecting a typical safety margin around sensitive areas. The assumption of infinite height ($H_{NFZ} \to \infty$) is valid as long as the UAV’s operational altitude $H$ is significantly lower than any realistic NFZ ceiling, which holds true for our scenario.
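Because the NFZ constraint depends only on horizontal distance, it reduces to a point-in-circle test. A minimal sketch follows; the function names `violates_nfz` and `first_violation` are hypothetical, and the code only tests waypoints, which is an assumption on our part rather than something stated in the model.

```python
def violates_nfz(position, nfz_centers, r_nfz):
    """Return True if a horizontal UAV position lies strictly inside any NFZ."""
    x, y = position
    return any(
        (x - cx) ** 2 + (y - cy) ** 2 < r_nfz ** 2
        for cx, cy in nfz_centers
    )

def first_violation(path, nfz_centers, r_nfz):
    """Return the index of the first waypoint inside an NFZ, or None."""
    for t, position in enumerate(path):
        if violates_nfz(position, nfz_centers, r_nfz):
            return t
    return None
```

Note that a waypoint-only test can miss a straight segment that clips an NFZ between two slots; a stricter check would compare the segment-to-center distance against the radius.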

3.4. Communication Model

In wireless communication, the distance between the sender and the receiver is an important factor to consider. For convenience, we assume the geographical position of LEO satellite $l$ is $\mathbf{l}_l(t) = (x_l^{LEO}(t), y_l^{LEO}(t), z_l^{LEO}(t))$, the location of the $b$th TBS is $\mathbf{l}_b(t) = (x_b^{TBS}(t), y_b^{TBS}(t), z_b^{TBS}(t))$, and the coordinates of user $i$ on slice $n$ are $\mathbf{l}_{n_i}(t) = (x_{n_i}(t), y_{n_i}(t), 0)$. Considering a 3D Cartesian coordinate system, at any time $t$, the projected positions of LEO $l$, TBS $b$, and user $i$ on the horizontal plane are denoted by $l_l(t) = (x_l^{LEO}(t), y_l^{LEO}(t))$, $l_b(t) = (x_b^{TBS}(t), y_b^{TBS}(t))$, and $l_{n_i}(t) = (x_{n_i}(t), y_{n_i}(t))$, respectively. Accordingly, the distances from LEO $l$, TBS $b$, and UAV $u$ to user $i$ on slice $n$ are $d_{n_i,l}(t) = \sqrt{z_l^{LEO}(t)^2 + \|l_l(t) - l_{n_i}(t)\|^2}$, $d_{n_i,b}(t) = \sqrt{z_b^{TBS}(t)^2 + \|l_b(t) - l_{n_i}(t)\|^2}$, and $d_{n_i,u}(t) = \sqrt{H^2 + \|l_u(t) - l_{n_i}(t)\|^2}$, respectively.

3.4.1. LEO-User Communication Model

With the aid of LEO satellite downlinks, seamless coverage can be provided to users. Since the distance between the satellite and the user is relatively large, the impact of user mobility on the channel gain can be neglected. The channel coefficient between user i on slice n and LEO l is denoted as
$$g_{n_i,l}^{down}(t) = h_{n_i,l}(t)\, d_{n_i,l}(t)^{-\alpha} = \left(\frac{c}{4 \pi f_c}\right)^2 d_{n_i,l}(t)^{-\alpha}$$
where $h_{n_i,l}(t)$ denotes the unit radio propagation loss of the satellite link due to free-space path loss [12], $c$ represents the speed of light, $f_c$ denotes the carrier frequency, $d_{n_i,l}(t)$ is the distance between LEO $l$ and user $i$, and $\alpha$ is the path loss exponent. Therefore, at time slot $t$, the downlink transmission rate of user $i$ on slice $n$ associated with LEO $l$ is
$$r_{n_i,l}^{down}(t) = x_{n_i,l}(t)\, y_{l,n_i}(t)\, W_l \log_2\left(1 + \frac{p_{l,n_i}^{down}(t)\, g_{n_i,l}^{down}(t)}{\sigma^2}\right)$$
where $p_{l,n_i}^{down}(t)$ is the transmission power of LEO $l$ to user $i$ on slice $n$, $x_{n_i,l}(t)$ is a binary variable indicating whether user $i$ is associated with LEO $l$, $\sigma^2$ represents the noise variance, and $y_{l,n_i}(t)$ denotes the proportion of bandwidth resources allocated to user $i$ on slice $n$ associated with LEO $l$.
For users on both latency-tolerant and latency-sensitive slices, computing tasks need to be offloaded to MEC servers for processing. Accordingly, the uplink data transmission rate of user $i$ on slice $n$ when offloading tasks to the MEC server deployed on LEO satellite $l$ is
$$r_{n_i,l}^{up}(t) = x_{n_i,l}(t)\, y_{l,n_i}(t)\, W_l \log_2\left(1 + \frac{p_{n_i,l}^{up}(t)\, g_{n_i,l}^{up}(t)}{\sigma^2}\right)$$
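To make the LEO link model concrete, the sketch below evaluates the free-space channel gain and the resulting interference-free rate. It is illustrative only: the function names `leo_channel_gain` and `link_rate` are hypothetical, a base-2 logarithm is assumed, and units (Hz, W, m) are our choice rather than the paper's.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s

def leo_channel_gain(distance_m, carrier_hz, alpha):
    """Free-space gain (c / (4*pi*f_c))^2 * d^(-alpha) of the LEO-user link."""
    return (C_LIGHT / (4 * math.pi * carrier_hz)) ** 2 * distance_m ** (-alpha)

def link_rate(associated, bw_share, bandwidth_hz, tx_power_w, gain, noise_w):
    """Rate x * y * W * log2(1 + p * g / sigma^2) of an interference-free link.

    associated : binary association variable x (0 or 1)
    bw_share   : fraction y of the bandwidth W given to this user
    """
    snr = tx_power_w * gain / noise_w
    return associated * bw_share * bandwidth_hz * math.log2(1 + snr)
```

For example, `link_rate(1, 0.5, 1e6, 1.0, 1.0, 1.0)` corresponds to an SNR of 1 on half of a 1 MHz channel; an unassociated user (`associated = 0`) always gets a zero rate.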

3.4.2. UAV–User Communication Model

In the UAV–user communication model, the UAV acts as an aerial base station providing different types of services to users via the downlink. Similarly, at time slot t , the channel gain between UAV u and user i on slice n can be expressed as
$$g_{n_i,u}^{down}(t) = h_{n_i,u}(t)\, d_{n_i,u}(t)^{-\alpha}$$
It is assumed that the small-scale fading component $h_{n_i,u}(t)$ follows a Rician fading channel model and is given by $h_{n_i,u}(t) = h_0 \left( \sqrt{\frac{R}{R+1}}\, \hat{h}_{n_i,u}(t) + \sqrt{\frac{1}{R+1}}\, \tilde{h}_{n_i,u}(t) \right)$, where $h_0$ denotes the reference channel gain at a distance of 1 m, $R$ represents the Rician factor, $\hat{h}_{n_i,u}(t)$ is the Line-of-Sight (LoS) component, $\tilde{h}_{n_i,u}(t)$ is the Non-Line-of-Sight (NLoS) component, and $\tilde{h}_{n_i,u}(t) \sim \mathcal{CN}(0, 1)$. Accordingly, the data transmission rate between UAV $u$ and user $i$ on slice $n$ at time slot $t$ can be expressed as follows.
$$r_{n_i,u}^{down}(t) = x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log_2\left(1 + \frac{p_{u,n_i}^{down}(t)\, g_{n_i,u}^{down}(t)}{\sum_{u' \in \mathcal{U} \setminus \{u\}} p_{u',n_i}^{down}(t)\, g_{n_i,u'}^{down}(t) + \sigma^2}\right)$$
Similarly, the uplink data transmission rate for user $i$ offloading tasks to the MEC server equipped on UAV $u$ is
$$r_{n_i,u}^{up}(t) = x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log_2\left(1 + \frac{p_{n_i,u}^{up}(t)\, g_{n_i,u}^{up}(t)}{\sum_{u' \in \mathcal{U} \setminus \{u\}} p_{n_i,u'}^{up}(t)\, g_{n_i,u'}^{up}(t) + \sigma^2}\right)$$
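Unlike the LEO link, the UAV link is interference-limited: the signals of all other UAVs appear in the denominator of the SINR. A minimal sketch of the downlink case, assuming a base-2 logarithm and an associated user (the association variable is taken as 1); the function name and the list-based indexing of UAVs are hypothetical.

```python
import math

def uav_downlink_rate(u, powers, gains, bw_share, bandwidth_hz, noise_w):
    """Downlink rate from UAV u to one user, with co-channel UAV interference.

    powers[k], gains[k] : transmit power of UAV k and its channel gain to
                          this user (hypothetical list layout, k = 0..U-1)
    """
    signal = powers[u] * gains[u]
    # Sum over all other UAVs u' in U \ {u}.
    interference = sum(
        p * g for k, (p, g) in enumerate(zip(powers, gains)) if k != u
    )
    sinr = signal / (interference + noise_w)
    return bw_share * bandwidth_hz * math.log2(1 + sinr)
```

With a single UAV the interference sum is empty and the expression collapses to the SNR-based rate used for the LEO link.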

3.4.3. TBS-User Communication Model

In the communication model between TBSs and users, each TBS provides customized services to multiple users. At time slot $t$, the channel gain between user $i$ and TBS $b$ is defined as $g_{n_i,b}^{down}(t) = h_{n_i,b}(t)\, d_{n_i,b}(t)^{-\alpha}$, where $h_{n_i,b}(t)$ follows an exponential distribution with unit mean [5]. Therefore, at time slot $t$, the downlink data transmission rate between terrestrial base station $b$ and user $i$ on slice $n$ is
$$r_{n_i,b}^{down}(t) = x_{n_i,b}(t)\, y_{b,n_i}(t)\, W_b \log_2\left(1 + \frac{p_{b,n_i}^{down}(t)\, g_{n_i,b}^{down}(t)}{\sum_{b' \in \mathcal{B} \setminus \{b\}} p_{b',n_i}^{down}(t)\, g_{n_i,b'}^{down}(t) + \sigma^2}\right)$$
Similarly to the satellite and UAV cases, when user $i$ offloads a task to the MEC server equipped at terrestrial base station $b$, the uplink data transmission rate is
$$r_{n_i,b}^{up}(t) = x_{n_i,b}(t)\, y_{b,n_i}(t)\, W_b \log_2\left(1 + \frac{p_{n_i,b}^{up}(t)\, g_{n_i,b}^{up}(t)}{\sum_{b' \in \mathcal{B} \setminus \{b\}} p_{n_i,b'}^{up}(t)\, g_{n_i,b'}^{up}(t) + \sigma^2}\right)$$

3.4.4. Computation Model

Each user $i$ is required to process a computation task at each time slot $t$, which can be represented by a tuple $\Omega_{n_i}(t) = \{L_{n_i}(t), C_{n_i}(t), \varphi_n^{comp}\}$. $L_{n_i}(t)$ denotes the data size, $C_{n_i}(t)$ is the required number of CPU cycles per bit, and $\varphi_n^{comp}$ represents the maximum tolerable delay threshold allowed for slice $n$. Considering that the amount of data generated by computation is relatively small compared to the total task data, the download time can be neglected. Therefore, the total service delay experienced by the user mainly consists of two components: transmission delay and computation delay. The transmission delay can be expressed as
$$T_{n_i,j}^{up}(t) = \frac{L_{n_i}(t)}{r_{n_i,j}^{up}(t)}$$
Let $f_{n_i,j}(t)$ (CPU cycles per second) denote the amount of computing resources allocated to user $i$ on slice $n$. Then, the computation delay for processing the task on slice $n$ can be expressed as
$$T_{n_i,j}^{com}(t) = \frac{C_{n_i}(t)}{f_{n_i,j}(t)}$$
Therefore, for user $i$ associated with base station $j$, the total delay can be expressed as
$$T_{n_i,j}(t) = T_{n_i,j}^{up}(t) + T_{n_i,j}^{com}(t)$$
In the above expression, the two terms represent the transmission delay and the processing delay of the task, respectively. Moreover, the computational capabilities vary across different types of base stations. The SLT slice is designed for applications that can tolerate a certain degree of delay but require high reliability and carry substantial data volume. Therefore, for a user $i$ associated with the SLT slice, the QoS constraint is defined by a maximum tolerable latency threshold $\varphi_{SLT}^{comp}$ and is formalized as:
$$T_{n_i,j}(t) \le \varphi_{SLT}^{comp}, \quad n = SLT$$
For users in the SLS slice, which caters to ultra-reliable low-latency communication (URLLC) services such as autonomous driving and industrial control, the QoS requirement is the most stringent. The total service delay must not exceed a very small maximum tolerable latency threshold, and the service must be highly reliable. Therefore, the QoS constraint for a user $i$ associated with the SLS slice is defined as follows:
$$T_{n_i,j}(t) \le \varphi_{SLS}^{comp}, \quad n = SLS$$
where $\varphi_{SLS}^{comp}$ is the maximum end-to-end latency threshold for the SLS slice, and its value is typically much smaller than that of the SLT slice (i.e., $\varphi_{SLS}^{comp} \ll \varphi_{SLT}^{comp}$).
For the SR slice, the primary concern is the transmission rate of users. Therefore, the transmission rate of a user assigned to the SR slice should exceed the minimum rate threshold defined for the slice, i.e.,
$$r_{n_i,j}(t) \ge R_e$$
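The delay model and the three per-slice QoS conditions can be summarized in a few lines. This is a sketch under stated assumptions: the function names and the dictionary of delay limits are hypothetical, and the computation delay follows the formula above literally, so `cpu_cycles` is treated as the total cycle count of the task.

```python
def total_delay(data_bits, cpu_cycles, uplink_rate_bps, cpu_hz):
    """Total delay = transmission delay L / r_up + computation delay C / f."""
    return data_bits / uplink_rate_bps + cpu_cycles / cpu_hz

def meets_qos(slice_name, delay_s, rate_bps, delay_limits, min_rate_bps):
    """Check the slice-specific QoS condition.

    delay_limits : maps the delay-constrained slices ("SLS", "SLT") to their
                   maximum tolerable delay; any other slice (here "SR") is
                   checked against the minimum rate threshold instead.
    """
    if slice_name in delay_limits:
        return delay_s <= delay_limits[slice_name]
    return rate_bps >= min_rate_bps
```

Separating the delay computation from the per-slice check keeps the slice thresholds configurable, which matters because the SLS limit is expected to be much tighter than the SLT limit.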

4. Problem Formulation

In this section, our objective is to maximize the system utility, defined as the system revenue minus the system cost, subject to the service requirements of users across different network slices.

4.1. System Cost

The total system cost of network slicing consists of the operational cost and the reconfiguration cost of the slices.

4.1.1. Operational Cost

The operational cost of a slice depends on the bandwidth and computational resources allocated to it by the base stations. Thus, at time slot t , the operational cost can be expressed as
$$U_o(t) = \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{I}_n} \sum_{j \in \mathcal{J}} \left( \zeta_{o,j,b}\, b_{n,j}(t)\, W_j + \zeta_{o,j,f}\, f_{n,j}(t)\, F_j \right)$$
where $\zeta_{o,j,b}$ and $\zeta_{o,j,f}$ denote the unit costs of bandwidth and computational resources at base station $j$, respectively.

4.1.2. Reconfiguration Cost

At different time slots t , the service requests of user i may vary dynamically, necessitating adjustments in resource allocation strategies. Moreover, the reconfiguration cost of a slice is closely related to the type of user service. The reconfiguration cost at time slot t is defined as follows
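$$U_r(t) = \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{I}_n} \sum_{j \in \mathcal{J}} \left( \zeta_{r,j,b}\, C_1 + \zeta_{r,j,f}\, C_2 \right)$$
For users in the SR slice:
$$C_1 = \left[ b_{n,j}(t) W_j - b_{n,j}(t-1) W_j \right]^{+} = \begin{cases} b_{n,j}(t) W_j - b_{n,j}(t-1) W_j, & \text{if } b_{n,j}(t) W_j > b_{n,j}(t-1) W_j \\ 0, & \text{otherwise} \end{cases}$$
For users in computation offloading slices:
$$C_2 = \left[ f_{n,j}(t) F_j - f_{n,j}(t-1) F_j \right]^{+} = \begin{cases} f_{n,j}(t) F_j - f_{n,j}(t-1) F_j, & \text{if } f_{n,j}(t) F_j > f_{n,j}(t-1) F_j \\ 0, & \text{otherwise} \end{cases}$$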
Based on the above analysis, the total system cost of network slicing can be expressed as follows
$$U_{cost}(t) = U_o(t) + U_r(t)$$
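The cost model charges for the resources a slice currently holds and, in addition, for any increase relative to the previous slot. The sketch below evaluates one (slice, base station) term of the cost; it deliberately omits the sums over users, slices, and base stations and the distinction between SR and offloading slices, and all function and parameter names are hypothetical.

```python
def positive_part(x):
    """[x]^+ : return x if it is positive, else 0."""
    return x if x > 0 else 0.0

def slice_cost(b_now, b_prev, f_now, f_prev, bandwidth, cpu,
               unit_op_bw, unit_op_cpu, unit_re_bw, unit_re_cpu):
    """Operational plus reconfiguration cost of one slice at one base station.

    b_*, f_*       : bandwidth / computing allocation ratios at slots t and t-1
    bandwidth, cpu : total resources W_j and F_j of the base station
    unit_*         : unit costs (zeta) for operation and reconfiguration
    """
    operational = unit_op_bw * b_now * bandwidth + unit_op_cpu * f_now * cpu
    reconfiguration = (
        unit_re_bw * positive_part((b_now - b_prev) * bandwidth)
        + unit_re_cpu * positive_part((f_now - f_prev) * cpu)
    )
    return operational + reconfiguration
```

Because only increases are charged, shrinking a slice is free under this model, which discourages oscillating allocations that repeatedly grow the same slice.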

4.2. System Revenue

In this paper, the system revenue is defined as consisting of two components: the communication rate utility and the computational efficiency utility. Therefore, the total system revenue can be expressed as follows
$$U_{renu}(t) = \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{I}_n} \sum_{j \in \mathcal{J}} \left( \omega_1\, r_{n_i,j}(t) + \omega_2\, \frac{C_{n_i}(t)}{T_{n_i,j}^{com}(t)} \right)$$
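As a quick check on the computational efficiency term, and assuming it is the ratio of the required CPU cycles to the computation delay, substituting the computation delay from Section 3.4.4 suggests that the term reduces to the CPU frequency allocated to the user:

```latex
\frac{C_{n_i}(t)}{T_{n_i,j}^{com}(t)}
  = \frac{C_{n_i}(t)}{C_{n_i}(t) / f_{n_i,j}(t)}
  = f_{n_i,j}(t)
\quad \Longrightarrow \quad
U_{renu}(t) = \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{I}_n} \sum_{j \in \mathcal{J}}
  \left( \omega_1\, r_{n_i,j}(t) + \omega_2\, f_{n_i,j}(t) \right)
```

Under this reading, the weights $\omega_1$ and $\omega_2$ trade off bits per second against CPU cycles per second, so their relative scale matters when the two terms are combined.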
The joint problem of user association, slice resource allocation, and UAV trajectory optimization is formulated to maximize the overall system utility, i.e., the system revenue minus the total system cost. Therefore, the system optimization objective is defined as follows
$$\begin{aligned}
P1: \; & \max_{x, b, f, l} \sum_{t \in T} \left( U_{renu}(t) - U_{cost}(t) \right) \\
\text{s.t.} \;\; & C1: 0 \le x_u^{UAV}(t) \le x_{\max}, \; \forall u \in \mathcal{U} \\
& C2: 0 \le y_u^{UAV}(t) \le y_{\max}, \; \forall u \in \mathcal{U} \\
& C3: V_{\min}^2 \le \|V_u(t)\|^2 \le V_{\max}^2, \; \forall u \in \mathcal{U} \\
& C4: \Delta l_u(t) = \|l_u(t+1) - l_u(t)\| \le L_{\max}^{x}, \; \forall u \in \mathcal{U} \\
& C5: \|l_u(t) - p_k\|^2 \ge r_{NFZ}^2, \; \forall u \in \mathcal{U}, \forall k \in \mathcal{K} \\
& C6: T_{n_i,j}(t) \le \varphi_n^{comp}, \; \forall n \in \{SLS, SLT\}, \forall i \in \mathcal{I}_n, \forall j \in \mathcal{J} \\
& C7: r_{n_i,j}(t) \ge R_e, \; \forall n \in \mathcal{N}, \forall i \in \mathcal{I}, \forall j \in \mathcal{J} \\
& C8: b_{n,j}(t) \in [0, 1], \; \forall n \in \mathcal{N}, \forall j \in \mathcal{J} \\
& C9: f_{n,j}(t) \in [0, 1], \; \forall n \in \mathcal{N}, \forall j \in \mathcal{J} \\
& C10: \sum_{n \in \mathcal{N}} b_{n,j}(t) = 1, \; \forall j \in \mathcal{J} \\
& C11: \sum_{n \in \mathcal{N}} f_{n,j}(t) = 1, \; \forall j \in \mathcal{J} \\
& C12: x_{n_i,j}(t) \in \{0, 1\}, \; \forall n \in \mathcal{N}, \forall i \in \mathcal{I}_n, \forall j \in \mathcal{J} \\
& C13: \sum_{j \in \mathcal{J}} x_{n_i,j}(t) \le 1, \; \forall n \in \mathcal{N}, \forall i \in \mathcal{I}_n \\
& C14: \sum_{n \in \mathcal{N}} W_n \le W_j, \; \forall j \in \mathcal{J} \\
& C15: \sum_{n \in \mathcal{N}} F_n \le F_j, \; \forall j \in \mathcal{J}
\end{aligned}$$
where $x = \{x_{n_i,j}(t)\}$ denotes the set of user association variables, $b = \{b_{n,j}(t)\}$ and $f = \{f_{n,j}(t)\}$ represent the sets of bandwidth and computing resource allocation ratios, respectively, and $l = \{l_u(t)\}$ denotes the set of UAV trajectories. Constraints C1–C4 define the limitations on UAV flight trajectory and velocity; C5 represents the NFZ constraint; C6–C7 impose requirements on the service delay and data transmission rate for slice $n$; C8–C9 specify the proportions of bandwidth and computing resources that base station $j$ allocates to slice $n$; C10–C11 indicate that the total available bandwidth and computing resources of each base station are fully distributed among the slices; C12 defines the user association as a binary variable, where $x_{n_i,j}(t) = 1$ implies that user $i$ on slice $n$ is associated with base station $j$ and $x_{n_i,j}(t) = 0$ otherwise; C13 ensures that each user can connect to at most one base station; and C14 and C15 represent the total inter-slice constraints on bandwidth and computing resources, respectively.
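Constraints C8–C11 on the allocation ratios are simple to validate for a candidate solution at one base station. A minimal sketch, assuming the ratios are stored in a dictionary keyed by slice name; the function name and the numerical tolerance are hypothetical choices.

```python
def allocation_is_valid(ratios, tol=1e-9):
    """Check that per-slice ratios lie in [0, 1] (C8/C9) and sum to 1 (C10/C11).

    ratios : maps slice name -> allocation ratio at one base station
    tol    : tolerance for the floating-point sum (an assumed value)
    """
    in_range = all(0.0 <= r <= 1.0 for r in ratios.values())
    sums_to_one = abs(sum(ratios.values()) - 1.0) <= tol
    return in_range and sums_to_one
```

The same check applies unchanged to the bandwidth ratios and to the computing ratios, since both satisfy the same pair of constraints.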
Solving the original optimization problem is challenging for the following reasons. First, the convexity of the objective function is not guaranteed. Second, the user association variable $x_{n_i,j}(t)$ is binary, which introduces integer constraints, and constraint C12 is non-convex. Therefore, P1 is a mixed-integer non-convex optimization problem, which is generally difficult to solve optimally. To address the computational complexity brought by the mixed-integer non-convex nature of the original problem, we relax the binary user association variables and further decompose the problem into three subproblems: user association, resource allocation, and UAV trajectory optimization, as illustrated in Figure 3.
These three subproblems are not solved in a single sequential pass but are instead iteratively optimized within an alternating optimization framework. The workflow of the proposed SAC-MS algorithm is as follows: in each iteration, the user association (UA) subproblem is first solved while keeping the resource allocation and trajectory fixed. Then, using the updated association strategy and the current resource allocation scheme, the trajectory optimization (TO) subproblem is solved. Finally, the resource allocation (RA) subproblem is addressed based on the updated association strategy and trajectory. This alternating process is repeated until convergence.
Specifically, given the bandwidth allocation ratio b n , j t , the computation resource allocation ratio f n , j t , and the UAV trajectory optimization strategy l u t , the user association problem can be formulated as follows
s u b U A : max x t T U renu ( t ) U cost ( t )
s . t .   C 6 C 7 ,   C 12 C 13
Given the bandwidth allocation b n , j t , computing resource allocation f n , j t , and user association strategy x n i , j t in P1, the UAV positions can be optimized by solving the following problem.
s u b T O : max l t T n N i I n j J ω 1 r n i , j ( t )
s . t .   C 1 C 7
Given the user association strategy $x_{n_i,j}(t)$ and the UAV position $l_u(t)$, the problem of bandwidth and computation resource allocation can be formulated as follows.
$$\text{sub-RA}: \ \max_{b,\,f} \ \sum_{t\in\mathcal{T}} \left[ U_{\mathrm{renu}}(t) - U_{\mathrm{cost}}(t) \right]$$
$$\text{s.t.} \quad \text{C6–C11}, \ \text{C14–C15}$$
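The alternating framework over the three subproblems can be sketched as a generic block-coordinate loop. This is only an illustration under stated assumptions: the three solver callables, the scalar stand-ins for association, trajectory, and resources, the stopping tolerance, and the toy concave utility are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, Tuple

State = Tuple[float, float, float]  # scalar stand-ins for (association, trajectory, resources)


def alternating_optimization(
    solve_ua: Callable[[State], float],
    solve_to: Callable[[State], float],
    solve_ra: Callable[[State], float],
    utility: Callable[[State], float],
    init: State,
    tol: float = 1e-9,
    max_iters: int = 100,
) -> Tuple[State, int]:
    """Update one block at a time, holding the other two fixed, until the utility stalls."""
    x, l, r = init
    prev = utility((x, l, r))
    for k in range(1, max_iters + 1):
        x = solve_ua((x, l, r))  # UA with trajectory and resources fixed
        l = solve_to((x, l, r))  # TO with the updated association
        r = solve_ra((x, l, r))  # RA with the updated association and trajectory
        cur = utility((x, l, r))
        if abs(cur - prev) < tol:
            return (x, l, r), k
        prev = cur
    return (x, l, r), max_iters


def toy_utility(s: State) -> float:
    """Hypothetical strictly concave utility used only to exercise the loop."""
    x, l, r = s
    return -(x - 1) ** 2 - (l - 2) ** 2 - (r - 3) ** 2 - 0.5 * x * l - 0.5 * l * r


# Each lambda is the exact maximizer of toy_utility in one block with the others fixed.
state, iters = alternating_optimization(
    solve_ua=lambda s: 1 - 0.25 * s[1],
    solve_to=lambda s: 2 - 0.25 * (s[0] + s[2]),
    solve_ra=lambda s: 3 - 0.25 * s[1],
    utility=toy_utility,
    init=(0.0, 0.0, 0.0),
)
```

Because each block update cannot decrease the utility, the sequence of utility values is non-decreasing in this toy setting. In the actual SAC-MS algorithm the blocks are solved by matching, SCA, and a learned policy, so such a monotonicity guarantee is not implied.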

5. Proposed Algorithm

In this section, we propose a joint user association, UAV trajectory optimization, and resource allocation algorithm (SAC-MS) to iteratively solve the three subproblems mentioned above. First, a matching game approach is adopted for the user association problem. For the UAV trajectory optimization problem, auxiliary variables are introduced, and the SCA method is employed to solve it. Finally, the resource allocation problem is addressed using a reinforcement learning algorithm to learn the optimal strategy for bandwidth and computation resource allocation.

5.1. User Association (UA)

To solve the user association subproblem with reduced computational complexity, we apply many-to-one matching theory to obtain a stable association strategy. In this model, each user can associate with only one base station, while each base station can serve multiple users. Therefore, as illustrated in Figure 4, the relationship between users and base stations is modeled as a many-to-one matching problem.
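Formally, the user association problem can be modeled as a many-to-one matching game, represented by a tuple $(\mathcal{J}, \mathcal{I}, \succ_{\mathcal{J}}, \succ_{\mathcal{I}})$, where $\succ_{\mathcal{J}} = \{\succ_j\}_{j\in\mathcal{J}}$ and $\succ_{\mathcal{I}} = \{\succ_i\}_{i\in\mathcal{I}}$ denote the preference sets of base stations and users, respectively. The user association matching is denoted by $\theta$, and the many-to-one matching between the base station set $\mathcal{J}$ and the user set $\mathcal{I}$ must satisfy the following conditions [38].
  • $\theta(i) \in \mathcal{J}$, $|\theta(i)| \le 1$, $\forall i \in \mathcal{I}_n$
  • $\theta(j) \subseteq \mathcal{I}_n$, $|\theta(j)| \le I_j^{\max}$, $\forall j \in \mathcal{J}$
  • $\theta(i) = j \Leftrightarrow i \in \theta(j)$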
The first property restricts each user to be paired with only one base station. The second property limits the number of users that can be matched with each base station. The third property requires that a user can only be associated with a base station if the base station agrees to provide service to that user.
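The three properties can be checked mechanically for any candidate association. The following is a minimal sketch, assuming the matching is stored as two dictionaries (user to BS, and BS to its set of users) and that the quota $I_j^{\max}$ is given per BS; the function name and data layout are illustrative choices, not part of the original algorithm.

```python
from typing import Dict, Optional, Set


def is_valid_matching(
    user_to_bs: Dict[str, Optional[str]],
    bs_to_users: Dict[str, Set[str]],
    quota: Dict[str, int],
) -> bool:
    """Check the three many-to-one matching properties on a candidate association."""
    # Property 1: each user is paired with at most one known BS (None means unmatched).
    for i, j in user_to_bs.items():
        if j is not None and j not in quota:
            return False
    # Property 2: each BS serves only known users and never exceeds its quota.
    for j, users in bs_to_users.items():
        if not users <= set(user_to_bs) or len(users) > quota.get(j, 0):
            return False
    # Property 3: theta(i) = j if and only if i is in theta(j).
    for i, j in user_to_bs.items():
        if j is not None and i not in bs_to_users.get(j, set()):
            return False
    for j, users in bs_to_users.items():
        if any(user_to_bs.get(i) != j for i in users):
            return False
    return True
```

A checker of this kind is useful as a post-condition on the output of the matching procedure, since a violated quota or a one-sided pairing indicates a bookkeeping error rather than an unstable match.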
Let $S_j(i)$ and $S_i(j)$ denote the utility functions of base station $j$ for user $i$, and of user $i$ for base station $j$, respectively. If user $i$ prefers base station $j$ over base station $j'$, i.e., $S_i(j) > S_i(j')$, where $j, j' \in \mathcal{J}$, this preference relationship is denoted as $j \succ_i j'$. Conversely, if base station $j$ prefers user $i$ over user $i'$, i.e., $S_j(i) > S_j(i')$, where $i, i' \in \mathcal{I}$, this preference relationship is denoted as $i \succ_j i'$ [39].
Utility Function of Users: Considering resource efficiency, each user tends to select the base station that can allocate more resources. Based on this, the user’s utility function is modeled as the throughput per unit of energy consumption, i.e.,
$$S_i(j) = \frac{x_{n_i,j}(t)\, r_{n_i,j}(t)}{p_{n_i,j}(t)}$$
Utility Function of Base Stations: To enhance overall system performance, each base station tends to serve users with the strongest received signal strength. Accordingly, the utility function of a base station is defined based on the signal quality of its associated users, aiming to maximize system utility, i.e.,
$$S_j(i) = p_{n_i,j}(t)\, g_{n_i,j}(t)$$
The user utility function and the base station utility function are not independent optimization objectives but preference indicators in the many-to-one matching game. Although stable matching does not strictly maximize system utility, it eliminates all blocking pairs and achieves a Pareto-efficient equilibrium, thus effectively enhancing system performance. We provide the pseudocode for addressing the user association problem using a many-to-one bilateral stable matching game in Algorithm 1 and analyze the computational complexity of the algorithm.
Algorithm 1: Association Algorithm Based on Stable Matching Game.
1: Input: Preference matrices for BSs and users; utility functions of BSs and users, used to calculate preferences
2: Output: Stable matching between BSs and users
3: Initialize:
4:   (a) Set all users as “free” (not yet matched);
5:   (b) Set an empty connected-user list for each BS;
6:   (c) Set an empty rejection list for each BS;
7:   (d) Set an empty request list for each user;
8: Composition of preference lists:
9:   (a) BSs and users exchange their listing information;
10:  (b) Each user constructs its preference list from its utility function $S_i(j) = x_{n_i,j}(t)\, r_{n_i,j}(t) / p_{n_i,j}(t)$ and ranks BSs in descending order of preference;
11:  (c) Each BS constructs its preference list from its utility function $S_j(i) = p_{n_i,j}(t)\, g_{n_i,j}(t)$ and ranks users in descending order of preference;
12: Match process:
13: Repeat:
14:   For each user $i \in \mathcal{I}$ that is free:
15:     $i$ applies to the highest-ranked BS $j$ in its preference list;
16:   For each BS $j \in \mathcal{J}$:
17:     (a) $j$ receives applications from users;
18:     (b) Sort users in the application list according to the preference list of BS $j$;
19:     If the number of applicants for $j$ exceeds the quota:
20:       (a) Select the top $I_j^{\max}$ users based on preference;
21:       (b) Reject the remaining users and add them to the rejection list of $j$;
22:     else:
23:       (a) Accept all applicants and add them to the waiting list of BS $j$;
24:       (b) Add the accepted users to the matched-user list of $j$;
25:   For each user $i \in \mathcal{I}$:
26:     If $i$ is rejected by a BS:
27:       (a) $i$ re-applies to the next most preferred BS;
28:       (b) Update the application list of $i$ and record the BSs to which it has already applied;
29:     end if
30:   end for
31:   For each BS $j \in \mathcal{J}$:
32:     (a) Update the waiting list based on the newly accepted users;
33:     (b) Combine the original waiting list with the newly accepted users;
34:   end for
35: Until no user is rejected or every rejected user has applied to all BSs
The computational complexity of the user association matching game consists of the complexity of constructing the participants’ preference lists and the complexity of executing the game. Each user ranks the base stations in descending order according to its own preferences, thereby generating its preference list. This ranking step results in a computational complexity of $O(J \log J)$ for each user, and thus $O(JI \log J)$ for $I$ users. Similarly, the computational complexity for $J$ base stations is $O(JI \log I)$. Therefore, the total complexity of constructing preference lists for all users and base stations is $O(JI \log J) + O(JI \log I) = O(JI \log(JI))$. During the execution of the game, each user sends an association request to a base station, and the iteration coefficient of the game determines its execution complexity. Accordingly, the computational complexity of executing the game is $O(I)$. However, the complexity of constructing the preference lists dominates that of executing the game. Hence, the overall computational complexity of Algorithm 1 is $O(JI \log(JI))$.
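The matching procedure can be illustrated with a compact user-proposing deferred-acceptance routine. This is a sketch of the standard technique rather than a transcription of Algorithm 1: the function names, the dictionary layout, and the utility values for three users and two BSs are hypothetical, and the utilities stand in for $S_i(j)$ and $S_j(i)$.

```python
from typing import Dict, List


def rank_by_utility(utility: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Turn utility[a][b] into a preference list for each a, best partner first."""
    return {a: sorted(row, key=row.get, reverse=True) for a, row in utility.items()}


def many_to_one_matching(
    user_prefs: Dict[str, List[str]],
    bs_utility: Dict[str, Dict[str, float]],
    quota: Dict[str, int],
) -> Dict[str, List[str]]:
    """User-proposing deferred acceptance with a per-BS quota."""
    next_choice = {i: 0 for i in user_prefs}  # index of the next BS each user will try
    accepted: Dict[str, List[str]] = {j: [] for j in quota}
    free = list(user_prefs)  # users without a tentative match
    while free:
        i = free.pop()
        if next_choice[i] >= len(user_prefs[i]):
            continue  # preference list exhausted: the user stays unmatched
        j = user_prefs[i][next_choice[i]]
        next_choice[i] += 1
        accepted[j].append(i)
        # Keep the BS's tentative list sorted by its own utility, best first.
        accepted[j].sort(key=lambda u: bs_utility[j][u], reverse=True)
        if len(accepted[j]) > quota[j]:
            free.append(accepted[j].pop())  # over quota: reject the least preferred user
    return accepted


# Hypothetical utilities: user_utility[i][j] plays the role of S_i(j), bs_utility[j][i] of S_j(i).
user_utility = {
    "u1": {"A": 3.0, "B": 1.0},
    "u2": {"A": 2.5, "B": 2.0},
    "u3": {"A": 1.0, "B": 2.0},
}
bs_utility = {
    "A": {"u1": 1.0, "u2": 5.0, "u3": 2.0},
    "B": {"u1": 1.0, "u2": 2.0, "u3": 3.0},
}
matching = many_to_one_matching(rank_by_utility(user_utility), bs_utility, {"A": 1, "B": 2})
```

In this toy instance both u1 and u2 first apply to BS A, which has a quota of one and prefers u2, so u1 is rejected and falls back to BS B. Sorting inside the loop keeps the sketch short; an implementation tuned for large $I$ would precompute each BS's ranking once, which matches the preference-list cost analyzed above.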

5.2. UAV Trajectory Optimization (TO)

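After obtaining the user association strategy $x_{n_i,j}(t)$, we incorporate it into the UAV trajectory optimization subproblem, which results in the following formulation.
$$
\begin{aligned}
\text{sub-TO}: \ \max_{l} \ & \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} r_{n_i,u}(t) \\
= \ & \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log\left(1 + \frac{p_{u,n_i}(t)\, g_{n_i,u}(t)}{\sum_{u'\in\mathcal{U}\setminus\{u\}} p_{u',n_i}(t)\, g_{n_i,u'}(t) + \sigma^2}\right) \\
= \ & \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log\left(1 + \frac{p_{u,n_i}(t)\, h_{n_i,u}(t)}{\sum_{u'\in\mathcal{U}\setminus\{u\}} p_{u',n_i}(t)\, h_{n_i,u'}(t) + \sigma^2 d_{n_i,u}^{\alpha}(t)}\right)
\end{aligned}
$$
$$\text{s.t.} \quad \text{C1–C7},$$
$$\text{C16}: \ \log\left(1 + \frac{B_{u,n_i}(t)}{\Gamma_{n_i,u}(t)}\right) \ge R_{th}$$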
where $B_{u,n_i}(t) = p_{u,n_i}(t)\, h_{n_i,u}(t)$ and $\Gamma_{n_i,u}(t) = \sum_{u'\in\mathcal{U}\setminus\{u\}} p_{u',n_i}(t)\, h_{n_i,u'}(t) + \sigma^2 d_{n_i,u}^{\alpha}(t)$. Note that, due to the non-convexity of constraints C5 and C7, the UAV trajectory optimization problem is neither a concave optimization problem nor a quasi-concave maximization problem. As a result, it is generally difficult to obtain the global optimum using existing methods, and no efficient solution method currently exists. To address this issue, we adopt the SCA technique. Specifically, we first introduce an auxiliary variable set $\mathcal{H} = \{h_{n_i}(t), \forall i \in \mathcal{I}\}$ to approximate and reformulate the original problem for iterative solution, i.e.,
$$\text{C17}: \ h_{n_i}(t) \le \frac{1}{\left(\left\| l_u(t) - l_{n_i}(t) \right\|^2 + H^2\right)^{\alpha/2}}, \quad \forall i \in \mathcal{I}$$
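Therefore, the original UAV trajectory optimization problem, denoted as sub-TO, can be reformulated as the following problem.
$$\text{sub-TO1}: \ \max_{l,\,h} \ \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} r_{n_i,u}(t) = \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log\left(1 + B_{u,n_i}(t)\, h_{n_i}(t)\right)$$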
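Since the above optimization problem is also non-convex, we employ the SCA method to obtain a linear lower-bound approximation. Let $f(l_u(t)) = \left(\left\| l_u(t) - l_{n_i}(t) \right\|^2 + H^2\right)^{-\alpha/2}$. This function is neither globally convex nor concave; however, it can be approximated near a given point $l_u^{(\varpi)}(t)$ using a first-order Taylor expansion, as shown below.
$$f(l_u(t)) \approx f\left(l_u^{(\varpi)}(t)\right) + \nabla f\left(l_u^{(\varpi)}(t)\right)^{T} \left(l_u(t) - l_u^{(\varpi)}(t)\right)$$
where $f\left(l_u^{(\varpi)}(t)\right) = \left(\left\| l_u^{(\varpi)}(t) - l_{n_i}(t) \right\|^2 + H^2\right)^{-\alpha/2}$ and $\nabla f\left(l_u^{(\varpi)}(t)\right) = -\dfrac{\alpha \left(l_u^{(\varpi)}(t) - l_{n_i}(t)\right)}{\left(\left\| l_u^{(\varpi)}(t) - l_{n_i}(t) \right\|^2 + H^2\right)^{\alpha/2 + 1}}$. Therefore, the original constraint can be approximated by its lower bound, i.e.,
$$\text{C18}: \ h_{n_i}(t) \le f\left(l_u^{(\varpi)}(t)\right) + \nabla f\left(l_u^{(\varpi)}(t)\right)^{T} \left(l_u(t) - l_u^{(\varpi)}(t)\right), \quad \forall i \in \mathcal{I}$$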
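The gradient used in the Taylor expansion can be sanity-checked numerically. The sketch below assumes a 2D horizontal UAV position, a fixed altitude $H$, and a path-loss exponent $\alpha$; the function names and the numeric values used to exercise them are illustrative assumptions only.

```python
from typing import Tuple

Point = Tuple[float, float]


def inverse_path_gain(l: Point, user: Point, H: float, alpha: float) -> float:
    """f(l) = (||l - user||^2 + H^2)^(-alpha/2) for a UAV at horizontal position l."""
    d2 = (l[0] - user[0]) ** 2 + (l[1] - user[1]) ** 2 + H ** 2
    return d2 ** (-alpha / 2)


def grad_inverse_path_gain(l: Point, user: Point, H: float, alpha: float) -> Point:
    """Analytic gradient: -alpha * (l - user) / (||l - user||^2 + H^2)^(alpha/2 + 1)."""
    d2 = (l[0] - user[0]) ** 2 + (l[1] - user[1]) ** 2 + H ** 2
    c = -alpha * d2 ** (-alpha / 2 - 1)
    return (c * (l[0] - user[0]), c * (l[1] - user[1]))


def taylor_surrogate(l: Point, l_ref: Point, user: Point, H: float, alpha: float) -> float:
    """First-order expansion of f around the reference point l_ref."""
    g = grad_inverse_path_gain(l_ref, user, H, alpha)
    f_ref = inverse_path_gain(l_ref, user, H, alpha)
    return f_ref + g[0] * (l[0] - l_ref[0]) + g[1] * (l[1] - l_ref[1])
```

Comparing `grad_inverse_path_gain` against a central finite difference of `inverse_path_gain` is a quick way to catch a sign or exponent slip before the surrogate is handed to a convex solver.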
Similarly, constraint C5 is non-convex. To address this, we define $L^{\varpi} = \{l_u^{(\varpi)}(t)\}$ as the trajectory obtained at the $\varpi$-th iteration. Then, by applying the first-order Taylor expansion, constraint C5 can be relaxed and approximated as follows.
$$\left\| l_u(t) - p_k \right\|^2 \ge -2 \left(l_u^{(\varpi)}(t) - p_k\right)^{T} \left(l_u^{(\varpi)}(t) - l_u(t)\right) + \left\| p_k - l_u^{(\varpi)}(t) \right\|^2 \triangleq C_{u,k}^{\varpi}(t)$$
Therefore, constraint C5 is transformed into constraint C19: $C_{u,k}^{\varpi}(t) \ge r_{\mathrm{NFZ}}^2$, which is linear in $l_u(t)$ and therefore convex. The equation $C_{u,k}^{\varpi}(t) = r_{\mathrm{NFZ}}^2$ defines a line (a hyperplane in 2D) perpendicular to the segment connecting $p_k$ and $l_u^{(\varpi)}(t)$; it is tangent to the NFZ circle when the reference point lies on the circle boundary and lies outside the circle otherwise. The inequality $C_{u,k}^{\varpi}(t) \ge r_{\mathrm{NFZ}}^2$ then specifies the half-space that excludes the NFZ, effectively replacing the circular keep-out zone with a linear keep-out boundary at each iteration. Since the first-order Taylor expansion is a global lower bound for the convex quadratic function $\left\| l_u(t) - p_k \right\|^2$, any point that satisfies C19 also satisfies the original NFZ constraint, and a feasible reference point $l_u^{(\varpi)}(t)$ guarantees that C19 itself can be satisfied. The UAV trajectory optimization subproblem sub-TO1 can thus be transformed into problem sub-TO2.
$$\text{sub-TO2}: \ \max_{l,\,h} \ \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} r_{n_i,u}(t) = \sum_{t\in\mathcal{T}} \sum_{n\in\mathcal{N}} \sum_{i\in\mathcal{I}_n} \sum_{u\in\mathcal{U}} x_{n_i,u}(t)\, y_{u,n_i}(t)\, W_u \log\left(1 + B_{u,n_i}(t)\, h_{n_i}(t)\right)$$
$$\text{s.t.} \quad \text{C1–C4}, \ \text{C6–C7}, \ \text{C19}$$
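The lower-bound property behind the linearized NFZ constraint can be verified directly: the gap between the squared distance and its first-order expansion around a reference point equals the squared step length, which is never negative. The sketch below assumes 2D positions and a circular NFZ with center `p_k` and radius `r_nfz`; the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def linearized_nfz(l: Point, l_ref: Point, p_k: Point) -> float:
    """First-order expansion of ||l - p_k||^2 around l_ref (linear in l)."""
    ax, ay = l_ref[0] - p_k[0], l_ref[1] - p_k[1]
    return ax * ax + ay * ay + 2.0 * (ax * (l[0] - l_ref[0]) + ay * (l[1] - l_ref[1]))


def satisfies_linearized_nfz(l: Point, l_ref: Point, p_k: Point, r_nfz: float) -> bool:
    """Linear surrogate constraint: implies ||l - p_k|| >= r_nfz whenever it holds."""
    return linearized_nfz(l, l_ref, p_k) >= r_nfz ** 2
```

Since `squared_distance(l, p_k) - linearized_nfz(l, l_ref, p_k)` equals `squared_distance(l, l_ref)`, the surrogate is conservative: it may exclude some safe positions far from the reference point, which is the usual price of SCA and shrinks as the iterates converge.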
Through the above transformations and approximations, the UAV trajectory optimization problem s u b T O 2 is converted into a convex optimization problem that can be efficiently solved using standard convex optimization solvers such as CVX [40]. In Algorithm 2, we present the pseudocode for UAV trajectory optimization based on SCA and analyze its computational complexity.
Algorithm 2: UAV Trajectory Optimization Algorithm Based on SCA.
1: Input: User association strategy $x_{n_i,j}(t)$, slice resource allocation ratios $b_n(t)$, $f_n(t)$, initial trajectory $L^{0} = \{l_u^{(0)}(t)\}$, NFZ information $p_k$ and $r_{\mathrm{NFZ}}$, maximum iterations $\varpi_{\max}$;
2: Output: Optimized UAV trajectory $L^{*} = \{l_u^{*}(t)\}$
3: Initialize iteration index $\varpi \leftarrow 0$;
4: Repeat:
5:   For the $\varpi$-th iteration, given the reference trajectory $L^{(\varpi)}$:
6:     Solve the convex optimization problem (sub-TO2);
7:     Let $L^{\varpi+1} = \{l_u^{\varpi+1}(t)\}$ be the optimal solution of the above problem;
8:     Update $\varpi \leftarrow \varpi + 1$;
9: Until $\varpi > \varpi_{\max}$
10: Return $L^{*} = L^{\varpi}$
In this study, we optimize the UAV trajectory using SCA and CVX. According to [24], the computational complexity of CVX is $O(U \varpi^{3.5})$.

5.3. Resource Allocation (RA)

We propose a reinforcement learning algorithm based on Soft Actor–Critic (SAC) to solve the resource allocation problem. SAC is a deep reinforcement learning algorithm that is well-suited for continuous action spaces. It is built upon the maximum entropy reinforcement learning framework and aims to maximize the expected cumulative reward by introducing an entropy term as a regularizer. This encourages exploration in the policy space to find the optimal policy $\mu$. In the SAC algorithm, the Markov Decision Process (MDP) is defined by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{R})$ [41], where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are continuous, and $\mathcal{R}$ denotes the reward function. The three key components of the MDP are defined as follows.
(1) State Space: In this system, the allocation decisions for bandwidth and computing resources are influenced by the number and size of data packets within each slice. Therefore, the system state at each time slot is defined as the number of data packets that each slice needs to transmit, i.e., $s(t) = \{\mathrm{packet}_n(t), \forall n \in \mathcal{N}\}$.
(2) Action Space: The action space includes two optimization variables, defined as $a(t) = \{b_n(t), f_n(t), \forall n \in \mathcal{N}\}$. At time slot $t$, the agent selects an action $a(t)$ from the action space to make a decision.
(3) Reward: The reward is a function of the state and action, reflecting the quality of an action taken in a given state. To effectively solve the optimization problem, we define the reward function as follows.
$$r(t) = \xi_1 \left[ U_{\mathrm{renu}}(t) - U_{\mathrm{cost}}(t) \right] - \xi_2 \sum_{i\in\mathcal{I}} \max\left( R_e - r_{n_i,j}(t),\, 0 \right) - \xi_3 \max\left( T_{n_i,j}(t) - \varphi_n^{\mathrm{comp}}(t),\, 0 \right)$$
The purpose of designing this reward function is to maximize system utility while ensuring a minimum rate threshold and reducing computational bottlenecks. The function consists of three components: the first term represents the difference between system revenue and cost; the second term is a penalty for insufficient data rate, ensuring that each user’s rate requirements are met; and the third term reflects the deviation between actual latency and the slice’s tolerable delay, serving as a penalty for delay violations. These three components are balanced using weighting factors ξ 1 , ξ 2 , and ξ 3 to comprehensively account for different optimization objectives.
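The three-part structure of the reward can be written as a short function. This is a minimal sketch, assuming scalar revenue, cost, and delay values and a list of per-user rates; the argument names and the default weights are hypothetical and carry no tuning from the paper.

```python
from typing import Sequence, Tuple


def slice_reward(
    revenue: float,
    cost: float,
    user_rates: Sequence[float],
    rate_threshold: float,
    delay: float,
    tolerable_delay: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Net utility minus hinge penalties for rate shortfall and delay violation."""
    xi1, xi2, xi3 = weights
    # Rate penalty: only users below the threshold contribute, by the size of the shortfall.
    rate_penalty = sum(max(rate_threshold - r, 0.0) for r in user_rates)
    # Delay penalty: zero unless the actual delay exceeds the slice's tolerable delay.
    delay_penalty = max(delay - tolerable_delay, 0.0)
    return xi1 * (revenue - cost) - xi2 * rate_penalty - xi3 * delay_penalty
```

The hinge form `max(., 0)` means that an action meeting every rate and delay requirement is scored purely on net utility, so the penalty weights only shape behavior in infeasible regions.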
The goal of SAC is not solely to maximize the expected return, but to maximize the expected return while also maximizing the entropy of the policy. To address the problem of resource allocation in network slicing, the resource allocation algorithm based on Soft Actor–Critic (SAC-RA) defines a maximum entropy objective formulated as follows.
$$J(\mu) = \sum_{t=1}^{T} \mathbb{E}_{(s(t),\,a(t)) \sim \rho_{\mu}} \left[ r\big(s(t), a(t)\big) - \tilde{\alpha} \log \mu\big(\cdot \mid s(t)\big) \right]$$
where $\tilde{\alpha}$ is the temperature parameter that balances the trade-off between reward maximization and entropy maximization, and $\rho_{\mu}$ represents the state-action distribution under the current policy $\mu$. In SAC-RA, the parameters $\beta_1$ and $\beta_2$ of the Q-function are updated at fixed time intervals by minimizing the soft Bellman residual [42].
$$J_Q(\beta_i) = \mathbb{E}_{(s(t),\,a(t)) \sim \mathcal{D}} \left[ \frac{1}{2} \Big( Q_{\beta_i}\big(s(t), a(t)\big) - \hat{y}(t) \Big)^2 \right]$$
where $\hat{y}(t) = r(t) + \gamma \left( \min_{\kappa=1,2} Q_{\bar{\beta}_{\kappa}}\big(s(t+1), a(t+1)\big) - \tilde{\alpha} \log \mu_{\varphi}\big(a(t+1) \mid s(t+1)\big) \right)$. The parameter $\varphi$ of the policy function is updated by minimizing
$$J_{\mu}(\varphi) = \mathbb{E}_{s(t) \sim \mathcal{D},\ \varepsilon(t) \sim \mathcal{T}} \left[ \tilde{\alpha} \log \mu_{\varphi}\big( f_{\varphi}(\varepsilon(t); s(t)) \mid s(t) \big) - \min_{\kappa=1,2} Q_{\beta_{\kappa}}\big( s(t), f_{\varphi}(\varepsilon(t); s(t)) \big) \right]$$
In the above expression, $f_{\varphi}(\varepsilon(t); s(t))$ is a reparameterization function [43], which maps the state $s(t)$ and noise $\varepsilon(t)$ to the action $a(t)$. Specifically, given a state $s(t)$, the action is computed through $a(t) = f_{\varphi}(\varepsilon(t); s(t))$. The overall framework of SAC-RA is illustrated in Figure 5, and the resource allocation algorithm SAC-RA is shown in Algorithm 3.
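The per-sample quantities in the SAC updates can be written out in a few lines of plain Python. This is a sketch for a single scalar transition, not a training implementation; in particular, the tanh-squashed Gaussian used for the reparameterization is a common choice in SAC implementations and is an assumption here, since the text does not specify the form of $f_{\varphi}$.

```python
import math
from typing import Tuple


def soft_bellman_target(
    reward: float, gamma: float, q1_next: float, q2_next: float,
    log_prob_next: float, alpha: float,
) -> float:
    """y = r + gamma * (min(Q1', Q2') - alpha * log mu(a'|s')), using the target critics."""
    return reward + gamma * (min(q1_next, q2_next) - alpha * log_prob_next)


def critic_loss(q_pred: float, target: float) -> float:
    """Per-sample soft Bellman residual 0.5 * (Q - y)^2."""
    return 0.5 * (q_pred - target) ** 2


def actor_loss(log_prob: float, q1: float, q2: float, alpha: float) -> float:
    """Per-sample policy objective alpha * log mu(a|s) - min(Q1, Q2), to be minimized."""
    return alpha * log_prob - min(q1, q2)


def reparameterize(mean: float, log_std: float, eps: float) -> Tuple[float, float]:
    """Assumed form of f(eps; s): a = tanh(mean + std * eps), with its log-probability."""
    u = mean + math.exp(log_std) * eps
    action = math.tanh(u)
    # Gaussian log-density of u, then the change-of-variables correction for tanh.
    log_prob = -0.5 * eps ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)  # small constant avoids log(0)
    return action, log_prob
```

Taking the minimum over the two critics in both the target and the actor objective is what counters value overestimation, while the `-alpha * log_prob` term rewards the policy for keeping its action distribution spread out.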
Algorithm 3: SAC-RA Algorithm.
1: Initialize: policy network parameters $\varphi$, Q-function parameters $\beta_1$ and $\beta_2$, target network parameters $\bar{\beta}_1 \leftarrow \beta_1$, $\bar{\beta}_2 \leftarrow \beta_2$, replay buffer $\mathcal{D}$, learning rate $\lambda$, discount factor $\gamma$, entropy coefficient $\tilde{\alpha}$.
2: for each episode do
3:   Initialize state $s(t)$;
4:   for step $t = 1$ to $T$ do
5:     Select action $a(t)$ according to $\mu(a(t) \mid s(t))$;
6:     Execute action $a(t)$, and observe the immediate reward $r(t)$ and the next state $s(t+1)$;
7:     Store $(s(t), a(t), r(t), s(t+1))$ in the experience replay buffer $\mathcal{D}$;
8:     if replay buffer size $\ge$ batch size then
9:       Randomly sample a batch $(s(t), a(t), r(t), s(t+1))$ from buffer $\mathcal{D}$;
10:      Calculate the target Q-value, i.e., Equation (43);
11:      Update Q network: $\beta_i \leftarrow \beta_i - \lambda \nabla_{\beta_i} J_Q(\beta_i)$;
12:      Update policy-function parameter, i.e., Equation (44): $\varphi \leftarrow \varphi - \lambda \nabla_{\varphi} J_{\mu}(\varphi)$;
13:      Update the target network using soft updates: $\bar{\beta}_i \leftarrow \tau \beta_i + (1 - \tau) \bar{\beta}_i$;
14:    end if
15:  end for
16: end for
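Two supporting pieces of the training loop, the experience replay buffer and the soft target update, can be sketched independently of any deep learning library. The class and function names, the flat-list representation of network parameters, and the capacity value are illustrative assumptions.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

Transition = Tuple[float, float, float, float]  # (s, a, r, s_next) as scalars for brevity


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)

    def push(self, s: float, a: float, r: float, s_next: float) -> None:
        self._data.append((s, a, r, s_next))  # the oldest entry is evicted when full

    def ready(self, batch_size: int) -> bool:
        return len(self._data) >= batch_size

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target, element-wise."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

A small `tau` makes the target critics trail the online critics slowly, which stabilizes the bootstrapped target; the `ready` check mirrors the condition that updates begin only once the buffer holds at least one full batch.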
We assume that the actor and critic networks in SAC are $Z$-layer fully connected neural networks, with $Y_z$ denoting the number of neurons in the $z$-th layer. Thus, the forward/backward propagation cost of the neural networks is $\sum_{z=0}^{Z-1} Y_z \times Y_{z+1}$. Let $N_b$ be the batch size and $N_{\mathrm{Episode}}$ the total number of training episodes. Accordingly, the computational complexity of the SAC-based slice resource allocation algorithm is $O\left( \sum_{z=0}^{Z-1} Y_z \times Y_{z+1} \times N_b \times N_{\mathrm{Episode}} \right)$. Based on the computational complexities of each algorithm discussed above, the overall computational complexity of the proposed SAC-MS algorithm is $O\left( K \left( O(JI \log(JI)) + O(U \varpi^{3.5}) + O\left( \sum_{z=0}^{Z-1} Y_z \times Y_{z+1} \times N_b \times N_{\mathrm{Episode}} \right) \right) \right)$.
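The network term in the complexity expression is a sum of products of adjacent layer widths, which is easy to evaluate for a concrete architecture. In the sketch below the layer widths are hypothetical (the text does not report them), and treating the reported 6000 training iterations as the number of episodes is likewise an assumption made only for illustration.

```python
from typing import Sequence


def mlp_multiply_adds(layer_sizes: Sequence[int]) -> int:
    """Sum of Y_z * Y_{z+1} over adjacent layers of a fully connected network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def sac_training_cost(layer_sizes: Sequence[int], batch_size: int, episodes: int) -> int:
    """Order-of-magnitude operation count: per-sample cost x batch size x episodes."""
    return mlp_multiply_adds(layer_sizes) * batch_size * episodes


# Hypothetical architecture: 3 inputs, two hidden layers of 256 units, 6 outputs.
per_sample = mlp_multiply_adds([3, 256, 256, 6])
total = sac_training_cost([3, 256, 256, 6], batch_size=128, episodes=6000)
```

For wide hidden layers the hidden-to-hidden product dominates the sum, so the cost grows roughly quadratically in the hidden width and only linearly in the state and action dimensions.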

6. Simulation Results and Analysis

In this section, we first configure the key parameters for the experiments. Then, through a series of simulation experiments, we systematically evaluate the proposed algorithms, including hyperparameter analysis, environment parameter analysis, and comparisons with benchmark algorithms.

6.1. Simulation Setup

We verify the performance of the proposed SAC-MS algorithm through simulations implemented using Python 3.11.9 and PyTorch 2.3.0. The network topology is configured over a 600 m × 600 m area, including one macro base station (MBS), three small base stations (SBSs), 100 ground users, and two no-fly zones. The shadow fading for the MBS and SBS is set to 8 dB and 10 dB, respectively [44]. The additive white Gaussian noise power $\sigma^2$ is set to −174 dBm, and the carrier frequency $f_c$ is 2 GHz. The UAV operates at a fixed altitude of 80 m, with a maximum speed of 15 m/s and a minimum speed of 6 m/s. For the reinforcement learning setup, the learning rates of the actor and critic networks are set to 0.001 and 0.0015, respectively. The batch size is 128, the temperature coefficient is 0.001, and the total number of training iterations is 6000. The training parameters are summarized in Table 4.

6.2. Parameter Analysis

6.2.1. Analysis of Hyperparameters

In reinforcement learning, the choice of hyperparameters significantly affects the convergence behavior, learning stability, and overall performance of the algorithm. To ensure the reliability and robustness of the proposed SAC-MS algorithm, we conduct a systematic analysis of key hyperparameters, including the learning rates of the actor and critic networks, the batch size, and the temperature coefficient.
We conduct simulations using different learning rates. As shown in Figure 6, setting the actor and critic learning rates to 0.1 and 0.15 leads to strong reward oscillations, indicating unstable updates and poor convergence. When reduced to 0.01 and 0.015, the algorithm achieves faster initial learning but converges to a lower reward. In contrast, smaller rates (0.001 and 0.0015) result in slower initial progress but lead to more stable convergence and better final performance. This highlights the trade-off between convergence speed and stability in learning rate selection.
Figure 7 compares the reward convergence under different batch sizes: 64, 128, and 256. All settings show rapid reward growth after initial exploration. A batch size of 256 achieves faster early convergence but suffers from a noticeable performance drop around iteration 5400, indicating potential instability due to over-smoothed gradients. A batch size of 64 yields more stable but slower learning due to high variance in updates. In contrast, a batch size of 128 provides the best trade-off, ensuring fast and stable convergence without significant fluctuations. This demonstrates that a moderate batch size (128) leads to more reliable training dynamics in SAC-MS.
As shown in Figure 8, different entropy temperature learning rates (alpha-lr) exhibit distinct convergence behaviors. A large alpha-lr (e.g., 0.1 and 0.01) results in rapid initial learning but introduces significant instability during training due to excessive entropy adjustments. Conversely, smaller values (e.g., 0.001 and 0.0001) provide smoother and more stable learning curves, ensuring consistent policy improvement. However, when the value is too small (e.g., 0.0001), the convergence speed becomes relatively slow.

6.2.2. Analysis of Environmental Parameters

From Figure 9 and Figure 10, it can be observed that as bandwidth and computational resources increase, the system utility generally rises monotonically and gradually exhibits diminishing marginal returns. When resources are scarce (e.g., in the 70–100 range), system utility grows rapidly; as resources become sufficient (e.g., in the 115–130 range), the growth rate decreases, and system performance tends to saturate. It should be noted that in certain intervals (e.g., 100–115), the increase appears relatively larger. This is because the simulation results are based on discrete sampling points, and the additional resources in this interval happen to satisfy the service requirements of some users, resulting in a temporary improvement. Such local fluctuations do not alter the overall trend: as resources increase, the growth of system utility gradually slows and approaches saturation.
As shown in Figure 11, the system utility of all algorithms increases with the number of users. However, the superior efficiency of the proposed SAC-MS algorithm allows it to fully utilize the available resources, causing its growth to significantly slow down as it approaches the system capacity bottleneck at around 100 users. In contrast, the less efficient DDPG-MS and TD3-MS algorithms exhibit a more linear growth trend within the observed range, indicating that they have not yet fully stressed the system resources. This further demonstrates that SAC-MS can achieve higher resource utilization efficiency.
Figure 12 illustrates the optimal UAV trajectory planning results under no-fly zone constraints. As shown in the figure, the UAV not only successfully avoids the designated NFZs, ensuring flight safety and regulatory compliance, but also navigates as close as possible to user-dense areas. Such a trajectory design significantly shortens the communication distance between the UAV and ground users, thereby improving the signal-to-noise ratio (SNR), enhancing data transmission rates, and reducing communication latency. Moreover, flying near regions with high user demand facilitates more efficient task execution and real-time resource allocation.

6.2.3. Performance Comparison

Figure 13 illustrates the comparison of system utility achieved by different algorithms in the resource allocation task. As shown in the figure, the SAC algorithm converges rapidly after approximately 1000 training iterations and consistently maintains the highest utility throughout the entire training process, demonstrating its superior policy representation capability and efficient exploration performance in continuous action spaces. TD3 and DDPG follow in terms of performance, with TD3 showing better convergence stability than DDPG due to its delayed update mechanism and double Q-network structure. In contrast, DQN adopts a discrete action space, resulting in coarse-grained resource allocation and poor adaptability to dynamic environments, leading to significant fluctuations in system utility. Moreover, the hard slicing strategy does not incorporate any learning mechanism and thus cannot dynamically optimize according to environmental changes, resulting in the lowest utility level. These results further highlight the advantages of continuous action space-based deep reinforcement learning methods such as SAC in complex and high-dimensional resource allocation tasks.
As shown in Figure 14, the performance of TD3 under different hyperparameter settings is illustrated. Compared to TD3, SAC employs a maximum entropy framework, which enhances exploration efficiency and helps avoid local optima. In terms of system utility, SAC achieves approximately a 2.2% improvement over TD3’s best hyperparameter configuration (lr = 0.0001, batch_size = 256).
As shown in Figure 15, the performance of the DDPG-based algorithm under different hyperparameters is illustrated. Compared to DDPG, SAC demonstrates faster convergence, addresses the Q-value overestimation problem in DDPG, and is better suited for high-dimensional tasks. Relative to DDPG’s best hyperparameter combination (lr = 0.001, batch_size = 256), SAC achieves a 7.8% improvement in system utility.
Figure 16 compares DQN variants under different hyperparameter settings. DQN is suitable for discrete action spaces; however, when applied to continuous control tasks, it typically suffers from slow convergence, large fluctuations, and difficulty in achieving optimal performance. As illustrated in the figure, SAC outperforms DQN’s best hyperparameter configuration (lr = 0.001, batch_size = 128) by achieving a 31.25% improvement in system utility.
Figure 17 illustrates the trend of system utility over training iterations under different UAV flight strategies. It can be observed that when the UAV flight strategy accounts for no-fly zone constraints, the system utility is significantly higher compared to the case without such consideration. This is because no-fly zones act as hard constraints that limit the feasible flight region of the UAV. If these zones are not effectively avoided during path planning, the UAV may traverse illegal areas, resulting in communication interruptions, invalid trajectories, or resource waste. In contrast, flight strategies that consider no-fly zones can proactively avoid restricted areas and plan feasible paths, thereby ensuring communication link continuity and service quality, ultimately improving resource allocation efficiency and overall system performance.
To evaluate the impact of different user association strategies on system performance, Figure 18 compares three schemes: matching game-based association (SAC-MS), nearest-distance association (SAC-NS), and random association (SAC-RS). The figure illustrates the trend of utility values over training iterations. The results show that SAC-MS consistently achieves the highest utility, eventually stabilizing around 189, demonstrating superior system performance. This is mainly attributed to the matching game’s two-sided preference model, which dynamically matches users with base stations by considering both channel conditions and base station capacity, effectively avoiding base station overload and resource wastage, thereby enhancing overall system utility. In contrast, SAC-NS converges quickly in the early stages but considers only distance while ignoring interference and load. SAC-RS, which randomly selects base stations without considering channel state or user demand, shows poor convergence and large fluctuations, with a final utility of only around 150, indicating the weakest performance.

7. Discussion and Future Direction

The proposed SAC-MS framework provides a theoretical foundation for joint resource optimization in space–air–ground integrated networks (SAGINs) and demonstrates significant performance improvements in simulations. However, several challenges and limitations remain when moving from theoretical modeling to practical deployment, which also point to promising future research directions.
First, the limited battery capacity of UAVs constrains flight endurance and service coverage, suggesting that incorporating an energy consumption model into trajectory optimization is a valuable extension. Second, communication delays and signaling overhead in real SAGINs may weaken the timeliness of resource allocation, highlighting the need for delay-tolerant mechanisms and lightweight coordination protocols. Third, this study primarily focuses on a single-UAV scenario; although the proposed method can be extended to multi-UAV networks, the computational complexity of SCA-based trajectory optimization will significantly increase with the number of UAVs. Future work may explore distributed architectures and multi-agent reinforcement learning to enhance scalability and practicality. Fourth, the communication model relies on the assumption of perfect and instantaneous CSI, which is highly challenging to obtain in real environments with high mobility and severe fading. Developing robust optimization frameworks that can withstand CSI uncertainty and estimation errors will be an important direction. Fifth, although the proposed method has the potential to be extended to account for satellite orbital dynamics, in this study the satellite position is treated as fixed or only slowly varying over time. This limitation also provides new insights and directions for future research. Finally, the NFZs in this study are idealized as static cylindrical regions, whereas in reality they may be dynamic, irregularly shaped, or partially unknown, and may also involve multi-layer altitude restrictions. Future research could integrate real-time data from UAV traffic management (UTM) systems to enable trajectory planning under more complex and dynamic NFZ conditions.

8. Conclusions

To address challenges such as resource limitations caused by the exponential growth in user service demands, coverage blind spots in remote areas, and restrictions imposed by no-fly zones (NFZs), this paper investigated the joint optimization problem of user association, UAV trajectory planning, and network slicing resource allocation in space–air–ground integrated networks (SAGINs) under NFZ constraints. To facilitate the solution process, the original problem was decomposed into three subproblems: user association was solved using a many-to-one matching game; UAV trajectory was optimized via the sequential convex approximation (SCA) algorithm; and dynamic slice resource allocation was achieved through a deep reinforcement learning (DRL) framework (SAC). Simulation results demonstrated that, compared with benchmark algorithms, the proposed SAC-MS method more effectively satisfied users’ quality of service (QoS) requirements, avoided various no-fly zones, and achieved better resource allocation balance across network slices, thereby enhancing the overall system utility.

Author Contributions

Conceptualization, G.C.; Data curation, F.S.; Formal analysis, G.C. and F.S.; Methodology, T.P.; Validation, G.J.; Writing—original draft, G.C.; Writing—review and editing, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under Grant No. 61701284, 52574256, 52374221, the Natural Science Foundation of Shandong Province of China under Grant No. ZR2022MF226, ZR2022MF288 and ZR2023MF097, the Talented Young Teachers Training Program of Shandong University of Science and Technology under Grant No.BJ20221101, the Innovative Research Foundation of Qingdao under Grant No. 19-6-2-1-cg, the Elite Plan Project of Shandong University of Science and Technology under Grant No. skr21-3-B-048, the Taishan Scholar Program of Shandong Province under Grant No. tstp20250506.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors are grateful to the anonymous reviewers for providing us with so many valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hossain, A.R.; Ansari, N. Priority-based downlink wireless resource provisioning for radio access network slicing. IEEE Trans. Veh. Technol. 2021, 70, 9273–9281.
  2. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Niyato, D.; Dobre, O.; Poor, H.V. 6G internet of things: A comprehensive survey. IEEE Internet Things J. 2021, 9, 359–383.
  3. You, X.; Wang, C.-X.; Huang, J.; Gao, X.; Zhang, Z.; Wang, M.; Huang, Y.; Zhang, C.; Jiang, Y.; Wang, J.; et al. Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts. Sci. China Inf. Sci. 2021, 64, 110301.
  4. Liu, J.; Du, X.; Cui, J.; Pan, M.; Wei, D. Task-oriented intelligent networking architecture for the space–air–ground–aqua integrated network. IEEE Internet Things J. 2020, 7, 5345–5358.
  5. Zhou, G.; Zhao, L.; Zheng, G.; Song, S.; Zhang, J.; Hanzo, L. Multi objective optimization of space–air–ground-integrated network slicing relying on a pair of central and distributed learning algorithms. IEEE Internet Things J. 2023, 11, 8327–8344.
  6. Hu, Z.; Zeng, F.; Xiao, Z.; Fu, B.; Jiang, H.; Xiong, H.; Zhu, Y.; Alazab, M. Joint resources allocation and 3D trajectory optimization for UAV-enabled space-air-ground integrated networks. IEEE Trans. Veh. Technol. 2023, 72, 14214–14229.
  7. Wang, Q.; Chen, Z.; Li, H.; Li, S. Joint power and trajectory design for physical-layer secrecy in the UAV-aided mobile relaying system. IEEE Access 2018, 6, 62849–62855.
  8. Yi, W.; Liu, Y.; Bodanese, E.; Nallanathan, A.; Karagiannidis, G.K. A unified spatial framework for UAV-aided mmWave networks. IEEE Trans. Commun. 2019, 67, 8801–8817.
  9. Wu, Q.; Zeng, Y.; Zhang, R. Joint trajectory and communication design for UAV-enabled multiple access. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6.
  10. Sohail, M.F.; Leow, C.Y.; Won, S. Non-orthogonal multiple access for unmanned aerial vehicle assisted communication. IEEE Access 2018, 6, 22716–22727.
  11. Wang, J.; Liu, M.; Sun, J.; Gui, G.; Gacanin, H.; Sari, H.; Adachi, F. Multiple unmanned-aerial-vehicles deployment and user pairing for nonorthogonal multiple access schemes. IEEE Internet Things J. 2020, 8, 1883–1895.
  12. Liu, X.; Liu, Y.; Chen, Y.; Hanzo, L. Trajectory design and power control for multi-UAV assisted wireless networks: A machine learning approach. IEEE Trans. Veh. Technol. 2019, 68, 7957–7969.
  13. Zhai, D.; Li, H.; Tang, X.; Zhang, R.; Ding, Z.; Yu, F.R. Height optimization and resource allocation for NOMA enhanced UAV-aided relay networks. IEEE Trans. Commun. 2020, 69, 962–975.
  14. Wu, P.; Yuan, X.; Hu, Y.; Schmeink, A. Trajectory and user assignment design for UAV communication network with no-fly zone. IEEE Trans. Veh. Technol. 2024, 73, 15820–15825.
  15. Li, R.; Wei, Z.; Yang, L.; Ng, D.W.K.; Yuan, J.; An, J. Resource allocation for secure multi-UAV communication systems with multi-eavesdropper. IEEE Trans. Commun. 2020, 68, 4490–4506.
  16. Gao, Y.; Tang, H.; Li, B.; Yuan, X. Joint trajectory and power design for UAV-enabled secure communications with no-fly zone constraints. IEEE Access 2019, 7, 44459–44470.
  17. Lyu, F.; Yang, P.; Wu, H.; Zhou, C.; Ren, J.; Zhang, Y.; Shen, X. Service-oriented dynamic resource slicing and optimization for space-air-ground integrated vehicular networks. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7469–7483.
  18. Mao, S.; He, S.; Wu, J. Joint UAV position optimization and resource scheduling in space-air-ground integrated networks with mixed cloud-edge computing. IEEE Syst. J. 2020, 15, 3992–4002.
  19. Cheng, N.; Lyu, F.; Quan, W.; Zhou, C.; He, H.; Shi, W.; Shen, X. Space/aerial-assisted computing offloading for IoT applications: A learning-based approach. IEEE J. Sel. Areas Commun. 2019, 37, 1117–1129.
  20. Zhang, P.; Yang, P.; Kumar, N.; Guizani, M. Space-air-ground integrated network resource allocation based on service function chain. IEEE Trans. Veh. Technol. 2022, 71, 7730–7738.
  21. Jia, H.; Wang, Y.; Wu, W. Dynamic resource allocation for remote IoT data collection in SAGIN. IEEE Internet Things J. 2024, 11, 20575–20589.
  22. Gonçalves, D.M.; Bittencourt, L.F.; Madeira, E.R. Overhead and performance of dynamic network slice allocation for mobile users. Future Gener. Comput. Syst. 2024, 160, 739–751.
  23. Wu, Q.; Zeng, Y.; Zhang, R. Joint trajectory and communication design for multi-UAV enabled wireless networks. IEEE Trans. Wirel. Commun. 2018, 17, 2109–2121.
  24. Qin, P.; Wu, X.; Fu, M.; Ding, R.; Fu, Y. Latency minimization resource allocation and trajectory optimization for UAV-assisted cache-computing network with energy recharging. IEEE Trans. Commun. 2025, 73, 5715–5728.
  25. Zhang, Y.; Gang, Y.; Wu, P.; Fan, G.; Xu, W.; Ai, B.; Wu, Q. Integrated sensing, communication, and computation in SAGIN: Joint beamforming and resource allocation. IEEE Trans. Cogn. Commun. Netw. 2025.
  26. Qin, P.; Wu, X.; Cai, Z.; Zhao, X.; Fu, Y.; Wang, M.; Geng, S. Joint trajectory plan and resource allocation for UAV-enabled C-NOMA in air-ground integrated 6G heterogeneous network. IEEE Trans. Netw. Sci. Eng. 2023, 10, 3421–3434.
  27. Skouroumounis, C.; Krikidis, I. An evolutionary game for mobile user access mode selection in sub-6 GHz/mmWave cellular networks. IEEE Trans. Wirel. Commun. 2022, 21, 5644–5657.
  28. Zhang, T.; Wang, Y.; Liu, Y.; Xu, W.; Nallanathan, A. Cache-enabling UAV communications: Network deployment and resource allocation. IEEE Trans. Wirel. Commun. 2020, 19, 7470–7483.
  29. Zhao, N.; Liang, Y.-C.; Niyato, D.; Pei, Y.; Wu, M.; Jiang, Y. Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks. IEEE Trans. Wirel. Commun. 2019, 18, 5141–5152.
  30. Han, Q.; Yang, B.; Miao, G.; Chen, C.; Wang, X.; Guan, X. Backhaul-aware user association and resource allocation for energy-constrained HetNets. IEEE Trans. Veh. Technol. 2016, 66, 580–593.
  31. Nguyen, M.D.; Le, L.B.; Girard, A. Integrated computation offloading, UAV trajectory control, edge-cloud and radio resource allocation in SAGIN. IEEE Trans. Cloud Comput. 2023, 12, 100–115.
  32. Wei, Q.; Chen, Y.; Jia, Z.; Bai, W.; Pei, T.; Wu, Q. Energy-efficient caching and user selection for resource-limited SAGINs in emergency communications. IEEE Trans. Commun. 2024, 73, 4121–4136.
  33. Chen, G.; Sun, F.; Liang, H.; Zeng, Q.; Zhang, Y.-D. MADDPG-M&L: UAV-assisted joint user association and slicing resource allocation in HetNets. IEEE Trans. Netw. Sci. Eng. 2025, 12, 2878–2894.
  34. Cui, Y.; Shi, H.; Wang, R.; He, P.; Wu, D.; Huang, X. Multi-agent reinforcement learning for slicing resource allocation in vehicular networks. IEEE Trans. Intell. Transp. Syst. 2023, 25, 2005–2016.
  35. Chen, G.; Qi, S.; Shen, F.; Zeng, Q.; Zhang, Y.-D. Information-aware driven dynamic LEO-RAN slicing algorithm joint with communication, computing, and caching. IEEE J. Sel. Areas Commun. 2024, 42, 1044–1062.
  36. Zhang, T.; Xu, Y.; Loo, J.; Yang, D.; Xiao, L. Joint computation and communication design for UAV-assisted mobile edge computing in IoT. IEEE Trans. Ind. Inform. 2019, 16, 5505–5516.
  37. Li, R.; Wei, Z.; Yang, L.; Ng, D.W.K.; Yang, N.; Yuan, J.; An, J. Joint trajectory and resource allocation design for UAV communication systems. In Proceedings of the 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6.
  38. Lhazmir, S.; Oualhaj, O.A.; Kobbane, A.; Ben-Othman, J. Matching game with no-regret learning for IoT energy-efficient associations with UAV. IEEE Trans. Green Commun. Netw. 2020, 4, 973–981.
  39. LeAnh, T.; Tran, N.H.; Saad, W.; Le, L.B.; Niyato, D.; Ho, T.M.; Hong, C.S. Matching theory for distributed user association and resource allocation in cognitive femtocell networks. IEEE Trans. Veh. Technol. 2017, 66, 8413–8428.
  40. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, Version 2.2. March 2014. Available online: http://cvxr.com/cvx (accessed on 15 September 2025).
  41. Tian, J.; Liu, Q.; Zhang, H.; Wu, D. Multiagent deep-reinforcement-learning-based resource allocation for heterogeneous QoS guarantees for vehicular networks. IEEE Internet Things J. 2021, 9, 1683–1695.
  42. Zhou, X.; Huang, L.; Ye, T.; Sun, W. Computation bits maximization in UAV-assisted MEC networks with fairness constraint. IEEE Internet Things J. 2022, 9, 20997–21009.
  43. Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
  44. Li, Y.; Sheng, M.; Sun, Y.; Shi, Y. Joint optimization of BS operation, user association, subcarrier assignment, and power allocation for energy-efficient HetNets. IEEE J. Sel. Areas Commun. 2016, 34, 3339–3353.
Figure 1. System scenario of network slicing in a space–air–ground integrated network.
Figure 2. Communication system with No-Fly Zones.
Figure 3. Flowchart of the Proposed Algorithm.
Figure 4. Many-to-one matching game between users and BSs.
Figure 5. Sketch of SAC-RA.
Figure 6. Convergence of different learning rates.
Figure 7. Convergence of different batch sizes.
Figure 8. Convergence of different temperature coefficients.
Figure 9. Impact of bandwidth on system utility. The utility of all algorithms increases with more bandwidth but exhibits diminishing returns. Once bandwidth becomes sufficient to meet user demands, performance saturates. The proposed SAC-MS algorithm consistently outperforms benchmarks across all bandwidth values.
Figure 10. Impact of computing resource on system utility. Similarly to bandwidth, increased computing resources improve utility, but the gains diminish after a point due to resource saturation. SAC-MS achieves the highest utility by more efficiently allocating these resources to meet heterogeneous task demands.
Figure 11. Impact of the number of users on the system utility. System utility grows with the number of users since more users lead to more efficient utilization of bandwidth and computing resources. However, beyond a certain point, resources become a bottleneck: interference increases and scheduling complexity rises, causing the growth rate of system utility to slow down. This illustrates the trade-off between multi-user diversity gain and resource limitation.
Figure 12. UAV trajectory diagram under the no-fly zone constraint. The optimized UAV trajectory avoids NFZs while flying close to dense user areas. This reduces communication distance, enhances SNR, and improves throughput while ensuring safety and regulatory compliance. It highlights how NFZ-aware planning balances safety with communication efficiency.
Figure 13. Impact of different algorithms on system utility. The proposed SAC-MS algorithm converges faster and to a higher average utility than the TD3-MS, DDPG-MS, and DQN-MS benchmarks. SAC’s superior performance is attributed to its maximum entropy framework, which encourages more explorative and stable policy learning in high-dimensional continuous action spaces, avoiding the overestimation bias and training instability common in DDPG and TD3.
Figure 14. Convergence performance of the TD3 algorithm based on different hyperparameters. Although careful tuning (e.g., lr = 0.0001, batch_size = 256) stabilizes TD3, SAC still outperforms TD3 by ~2.2%. The advantage arises from SAC’s entropy-regularized objective, which avoids premature convergence to suboptimal policies and encourages diverse exploration.
Figure 15. Convergence performance of the DDPG algorithm based on different hyperparameters. Even under its best hyperparameter settings (lr = 0.001, batch_size = 256), DDPG lags behind SAC. SAC converges faster and improves utility by ~7.8%, largely because it mitigates Q-value overestimation and handles high-dimensional continuous tasks more robustly.
Figure 16. Convergence performance of the DQN algorithm based on different hyperparameters. DQN shows unstable convergence and poor adaptability because its discrete action space limits fine-grained resource allocation. Compared with DQN’s best case, SAC improves system utility by ~31.25%, highlighting the importance of continuous action learning.
Figure 17. System utility comparison with and without NFZs. SAC-MS generates smoother and shorter UAV trajectories that simultaneously avoid NFZs and minimize communication distance. Baseline methods either detour excessively or approach NFZ boundaries too closely, demonstrating SAC-MS’s ability to balance safety and efficiency.
Figure 18. Comparison of different user association methods. The proposed matching game-based method (SAC-MS) significantly outperforms both nearest-station (SAC-NS) and random association (SAC-RS) strategies. This demonstrates that an intelligent association strategy, which dynamically balances user-channel quality and base-station load, is crucial for maximizing overall system performance, rather than simply connecting to the nearest node or making random choices.
Table 1. Summary of main notations.
| Notation | Definition |
| --- | --- |
| $L$ | The set of LEO satellites |
| $B$ | The set of terrestrial base stations |
| $U$ | The set of unmanned aerial vehicles |
| $I_n$ | The set of users associated with slice $n$ |
| $J$ | The set of all base stations |
| $N$ | The set of network slices |
| $K$ | The set of no-fly zones |
| $W$ | The set of bandwidth resources |
| $F$ | The set of computing resources |
| $x_{n_i,j}^{t}$ | Association variable between user $i$ and base station $j$ on slice $n$ |
| $b_{n,j}^{t}$ | Bandwidth allocation ratio for slice $n$ |
| $f_{n,j}^{t}$ | Computation resource allocation ratio for slice $n$ |
| $y_{j,n_i}^{t}$ | Bandwidth allocation ratio from base station $j$ to user $i$ on slice $n$ |
| $\xi_1, \xi_2, \xi_3$ | Weight coefficients |
| $r_{NFZ}$ | No-fly zone radius |
| $\alpha$ | Path loss exponent |
| $\tilde{\alpha}$ | Temperature coefficient |
| $\varphi_n^{comp}$ | Latency tolerance threshold of slice $n$ |
| $R_e$ | Minimum transmission rate threshold |
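To make the role of the no-fly zone radius in Table 1 concrete, the sketch below checks a waypoint against a circular NFZ and evaluates the first-order lower bound on the squared distance that a sequential convex approximation step typically uses. This is only an illustration under stated assumptions: the circular zone shape, the `center` argument, the 2-D waypoint `q`, and both function names are hypothetical, and the exact constraint form used in the paper is not reproduced here.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def violates_nfz(q: Point, center: Point, r_nfz: float) -> bool:
    """True if waypoint q lies strictly inside the circular no-fly zone."""
    return math.dist(q, center) < r_nfz


def linearized_sq_distance(q: Point, q_ref: Point, center: Point) -> float:
    """First-order Taylor expansion of ||q - center||^2 around q_ref.

    The squared distance is convex in q, so this value never exceeds the true
    squared distance. Requiring it to be >= r_nfz**2 is therefore a safe,
    convex surrogate for the non-convex constraint ||q - center|| >= r_nfz.
    """
    dx, dy = q_ref[0] - center[0], q_ref[1] - center[1]
    base = dx * dx + dy * dy
    return base + 2.0 * (dx * (q[0] - q_ref[0]) + dy * (q[1] - q_ref[1]))
```

Because the surrogate is a global under-estimate, any waypoint that satisfies the linearized constraint also satisfies the original one, so each iterate stays feasible while the reference point `q_ref` is updated between iterations.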
Table 2. Summary of Acronyms.
| Acronym | Definition |
| --- | --- |
| SAGIN | Space–Air–Ground Integrated Network |
| UAV | Unmanned Aerial Vehicle |
| LEO | Low Earth Orbit |
| TBS | Terrestrial Base Station |
| MBS | Macro Base Station |
| SBS | Small Base Station |
| NFZ | No-Fly Zone |
| SLS | Latency-Sensitive Slice |
| SLT | Latency-Tolerant Slice |
| SR | High-Data-Rate Slice |
| UA | User Association |
| TO | Trajectory Optimization |
| RA | Resource Allocation |
| SAC | Soft Actor–Critic |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient |
| DDPG | Deep Deterministic Policy Gradient |
| DQN | Deep Q-Network |
| SAC-MS | Matching game, Sequential Convex Approximation, and Soft Actor–Critic-based Multi-Slice optimization algorithm |
Table 3. Literature comparison.
Related Paper | Objective | UAV | Network Slicing | NFZ | UAV Trajectory Optimization | User Association | Resource Allocation
[18] Minimize the maximum computation delay ×××
[19] Minimize the total system cost ××××
[21] Maximize the long-term network utility ×××××
[23] Maximize the minimum throughput ×××
[24] Minimize the system latency ×××
[25] Maximize the system EE ×××
[26] Minimize the system's weighted energy consumption ××××
[27] SINR, average rate, and mobility-induced time overhead ×××××
[28] Maximize the QoE of users ××
[29] Maximize the long-term overall network utility ××××
[30] Maximize the network utility reflecting proportional fairness ××××
[31] Weighting of maximized throughput, SINR and minimized delay ××
[32] Minimize the weighted energy consumption ××
[33] Maximize the residual energy of the satellite ××××
Our work Maximize the long-term overall system utility
Table 4. Simulation parameters for system.
| Training Parameters | Value |
| --- | --- |
| Number of terrestrial base stations | 3 |
| Number of UAVs | 1 |
| Number of ground users | 100 |
| UAV flight altitude (m) | 80 |
| Noise power $\sigma^2$ (dBm) | −174 |
| Learning rate of actor network LR_a | 0.001 |
| Learning rate of critic network LR_c | 0.0015 |
| Discount factor $\gamma$ | 0.99 |
| Entropy coefficient $\tilde{\alpha}$ | 0.001 |
| Target network soft update coefficient $\tau$ | 0.005 |
| Replay buffer capacity $D$ | 100,000 |
| Batch size | 128 |
| Max episodes | 6000 |
| Convergence threshold | 0.001 |
| Activation function | Tanh |
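The training settings in Table 4 could be collected in a small configuration object, as in the Python sketch below. The values are copied from the table; the class name `SACConfig`, the field names, and the `soft_update` helper are illustrative assumptions rather than the authors' code. The helper shows the usual Polyak averaging rule that a soft update coefficient such as $\tau$ refers to, applied here to plain lists of floats instead of real network weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SACConfig:
    """Values copied from Table 4; the field names are illustrative."""
    lr_actor: float = 0.001
    lr_critic: float = 0.0015
    gamma: float = 0.99            # discount factor
    alpha: float = 0.001           # entropy (temperature) coefficient
    tau: float = 0.005             # target network soft update coefficient
    buffer_capacity: int = 100_000
    batch_size: int = 128
    max_episodes: int = 6000


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

With $\tau = 0.005$, each update moves the target parameters only 0.5% of the way toward the online parameters, which is what keeps the critic's bootstrap targets slowly varying during training.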
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, G.; Sun, F.; Jing, G.; Pang, T. SAC-MS: Joint Slice Resource Allocation, User Association and UAV Trajectory Optimization with No-Fly Zone Constraints. Sensors 2025, 25, 5833. https://doi.org/10.3390/s25185833
