Joint Model Partitioning and Bandwidth Allocation for UAV-Assisted Space–Air–Ground–Sea Integrated Network: A Hybrid A3C-PPO Approach
Abstract
1. Introduction
2. Related Work
- A UAV-assisted mobile edge computing (MEC) architecture is proposed to minimize task execution latency while maximizing task completion rates in the space–air–ground–sea integrated network (SAGSIN). The framework addresses the resource constraints of intelligent terminals by offloading computation to UAV-mounted servers. It decouples the non-convex joint optimization problem into two tractable subproblems: bandwidth allocation for UAVs and DNN model partitioning for terminals.
- A hybrid optimization framework integrating A3C and PPO is proposed for SAGSIN. The framework explicitly recognizes that bandwidth allocation and DNN partitioning differ in their temporal dynamics and structural requirements. The asynchronous architecture of A3C handles high-frequency adaptation to channel fluctuations, while the clipped objective function of PPO governs stable, topology-aware discrete partitioning decisions. Together, they achieve rapid convergence in dynamic environments and robust generalization across model topologies. This scenario-driven assignment of each algorithm to the subproblem it suits distinguishes the approach from generic decomposition strategies.
- Building on this framework, a scheduling algorithm is proposed that quantifies task priority from computation time and remaining available time. Priorities are recomputed dynamically so that urgent tasks are guaranteed execution resources and short tasks are favored under the Short-Job-First principle, which prevents long-duration tasks from monopolizing resources and causing service timeouts.
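
The priority rule in the last contribution can be illustrated with a small sketch. The exact priority formula is not reproduced in this outline, so the scoring function below is an assumption: it combines a Short-Job-First term (shorter computation time scores higher) with an urgency term (less slack before the deadline scores higher). The names `Task`, `priority`, `schedule`, `w_short`, and `w_urgent` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    compute_time: float    # estimated computation time (s)
    remaining_time: float  # time left before the deadline (s)


def priority(task: Task, w_short: float = 0.5, w_urgent: float = 0.5) -> float:
    """Hypothetical score: a higher value means the task should run earlier."""
    # Slack is the spare time left if the task started right now.
    slack = max(task.remaining_time - task.compute_time, 0.0)
    short_term = 1.0 / (1.0 + task.compute_time)  # Short-Job-First term
    urgent_term = 1.0 / (1.0 + slack)             # urgency term
    return w_short * short_term + w_urgent * urgent_term


def schedule(tasks):
    """Return the tasks ordered from highest to lowest priority."""
    return sorted(tasks, key=priority, reverse=True)
```

With this scoring, a task close to its deadline is served before a short task with ample slack, and both are served before a long task with ample slack, which matches the qualitative behavior described above.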
3. System Modeling and Problem Formulation
3.1. Network Model
3.2. DNN Task Latency Analysis
3.3. Task Offloading
3.4. Problem Formulation
4. Joint Model Segmentation and Resource Allocation Based on Deep Reinforcement Learning (DRL)
4.1. MDP Formulation
4.2. Scheduling Algorithm Based on Computation Delay Weighted Remaining Time Priority
4.3. Joint Optimization Method for Bandwidth Allocation and DNN Slicing Based on Weighted Priority Scheduling and A3C-PPO
4.4. Algorithm Design
Algorithm 1. Training of the Joint Optimization Strategy for Bandwidth Allocation and DNN Task Partitioning Based on A3C-PPO (Worker Side)
5. Simulation Configuration
5.1. Simulation Parameter Settings
5.2. Algorithm Performance Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Parameters | Meaning |
|---|---|
| | Number of UAVs |
| | Number of intelligent terminals |
| | Set of UAVs |
| | Set of intelligent terminals |
| | DNN inference task running on intelligent terminal |
| | Total number of layers in the DNN model for task |
| | Total execution latency of task at partitioning point k |
| | Execution latency of the first k layers on the ED (including queuing latency) |
| | Pure computation latency of the first k layers on the ED |
| | Queuing waiting latency on the ED |
| | Transmission latency of the k-th layer data from ED to UAV |
| | Execution latency of the remaining layers on the UAV |
| | Distance between UAV and intelligent terminal |
| | Channel power gain between UAV and intelligent terminal |
| | Bandwidth allocated by UAV to intelligent terminal |
| | Total available bandwidth of UAV |
| | Transmission power from intelligent terminal to UAV |
| | Noise power spectral density at the UAV |
| | Data size offloaded at the k-th layer of task |
| | Maximum number of intelligent terminals that UAV can serve |
| | Maximum tolerable latency for task |

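
The latency terms listed in the notation table compose additively: device-side computation and queuing, uplink transmission of the intermediate data, and UAV-side execution of the remaining layers. The sketch below shows one way to evaluate them. It assumes a Shannon-capacity uplink and a free-space line-of-sight channel with path-loss exponent 2; neither formula is reproduced in this outline, and all function and parameter names are hypothetical.

```python
import math


def channel_gain(gain_at_1m: float, distance_m: float) -> float:
    """Channel power gain under an assumed free-space model (exponent 2)."""
    return gain_at_1m / distance_m ** 2


def uplink_rate(bandwidth_hz: float, tx_power_w: float,
                gain: float, noise_psd_w_per_hz: float) -> float:
    """Assumed Shannon rate (bit/s) from terminal to UAV."""
    snr = tx_power_w * gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def total_latency(t_compute_ed: float, t_queue_ed: float,
                  data_bits: float, rate_bps: float, t_uav: float) -> float:
    """Total latency at one partition point (all times in seconds)."""
    # Nothing is transmitted when the whole model runs on the terminal.
    t_tx = data_bits / rate_bps if data_bits > 0 else 0.0
    return t_compute_ed + t_queue_ed + t_tx + t_uav
```

Under these assumptions, a larger bandwidth share raises the uplink rate and shortens only the transmission term, which is why bandwidth allocation and the choice of partition point interact.
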

| Parameter Description | Setting |
|---|---|
| A3C discount factor | 0.9 |
| A3C network learning rate | , , |
| Parameter synchronization steps between global neural network and worker neural network | 3 |
| PPO discount factor | 0.9 |
| PPO GAE (Generalized Advantage Estimation) decay coefficient | 0.95 |
| PPO clipping parameter (clip_epsilon) | 0.2 |
| Weighting coefficients of the weighted priority scheduling algorithm | 0.5, 0.25, 0.25 |
| Decay coefficient of the weighted priority scheduling algorithm | 2 |
| Total available bandwidth of UAV | 20–60 MHz |
| Noise power spectral density between UAVs | dBm/Hz |
| Channel power gain at 1-m distance | dB |

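
To make the PPO settings in the table concrete, the sketch below implements the standard Generalized Advantage Estimation recursion and the standard per-sample PPO clipped surrogate term, using the tabulated discount factor (0.9), GAE decay coefficient (0.95), and clipping parameter (0.2) as defaults. It is a plain-Python illustration of the textbook formulas, not the authors' training code; the function names are hypothetical.

```python
def gae(rewards, values, gamma=0.9, lam=0.95):
    """Generalized Advantage Estimation.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_term(ratio, advantage, clip_epsilon=0.2):
    """Per-sample PPO clipped surrogate (to be maximized)."""
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes any incentive to push the probability ratio beyond 1 ± 0.2 in the direction that would improve the objective, which is the mechanism behind the stable discrete partitioning decisions attributed to PPO.
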
| Position | DNN Part Executed by Intelligent Terminal | DNN Part Executed by UAV | Data Size (bit) |
|---|---|---|---|
| 1 | None | All | 1,048,576 |
| 2 | Conv + Batch Normalization + ReLU | MaxPool + Conv2 to Conv5 + AvgPool + FC Layer | 2,097,152 |
| 3 | Conv + Batch Normalization + ReLU + MaxPool | Conv2 to Conv5 + AvgPool + FC Layer | 524,288 |
| 4 | Conv + Batch Normalization + ReLU + MaxPool + Conv2 | Conv3 + Conv4 + Conv5 + AvgPool + FC Layer | 524,288 |
| 5 | Conv + Batch Normalization + ReLU + MaxPool + Conv2 + Conv3 | Conv4 + Conv5 + AvgPool + FC Layer | 262,144 |
| 6 | Conv + Batch Normalization + ReLU + MaxPool + Conv2 + Conv3 + Conv4 | Conv5 + AvgPool + FC Layer | 131,072 |
| 7 | Conv + Batch Normalization + ReLU + MaxPool + Conv2 to Conv5 | AvgPool + FC Layer | 65,536 |
| 8 | Conv + Batch Normalization + ReLU + MaxPool + Conv2 to Conv5 + AvgPool | FC Layer | 16,384 |
| 9 | All | None | 0 |
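
The trade-off encoded in the partition table can be explored with a brute-force search over the nine positions. The data sizes below are copied from the table; the per-position device and UAV latencies are not given in this outline, so they are passed in by the caller as hypothetical inputs. The function name `best_partition` is likewise an assumption.

```python
# Intermediate data size (bits) per partition position, from the table above.
DATA_BITS = {
    1: 1_048_576, 2: 2_097_152, 3: 524_288,
    4: 524_288, 5: 262_144, 6: 131_072,
    7: 65_536, 8: 16_384, 9: 0,
}


def best_partition(ed_latency, uav_latency, rate_bps):
    """Return (position, total latency) with the lowest end-to-end latency.

    `ed_latency` and `uav_latency` map each position to the execution time
    (s) of the terminal-side and UAV-side parts; both are caller-supplied.
    """
    best_pos, best_total = None, float("inf")
    for pos, bits in DATA_BITS.items():
        total = ed_latency[pos] + bits / rate_bps + uav_latency[pos]
        if total < best_total:
            best_pos, best_total = pos, total
    return best_pos, best_total
```

With a fast uplink the search tends toward early positions (offload nearly everything), and with a slow uplink it tends toward late positions (compute locally), since the transmitted data shrinks from position 2 onward. An exhaustive search is feasible here because there are only nine candidates; the learned policy is needed because bandwidth and queues change over time.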
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lin, Y.; Han, Y.; Wu, M.; Lin, S.; Zhang, X.; Xu, Z. Joint Model Partitioning and Bandwidth Allocation for UAV-Assisted Space–Air–Ground–Sea Integrated Network: A Hybrid A3C-PPO Approach. Entropy 2026, 28, 337. https://doi.org/10.3390/e28030337
