Reinforcement Learning-Based Cloud-Aware HAPS Trajectory Optimization in Soft-Switching Hybrid FSO/RF Cooperative Transmission System
Abstract
1. Introduction
- The cloud-aware HAPS trajectory optimization problem in soft-switching hybrid FSO/RF systems is formulated and solved with a deep reinforcement learning (DRL) approach based on proximal policy optimization (PPO), under a stochastic moving occluding cloud (SMOC) model derived from the ERA5 dataset.
- A potential-based reward-shaping mechanism is developed within the PPO framework to mitigate the sparse decoding feedback of rateless codes (RCs), yielding faster convergence and better performance than the threshold-based hard-switching scheme (HS-PPO).
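The reward-shaping idea in the second contribution can be illustrated with the general potential-based form, in which a term $\gamma\Phi(s') - \Phi(s)$ is added to the environment reward. The sketch below is only an illustration under stated assumptions: the potential is taken to be the fraction of rateless-coded symbols collected toward a decoding threshold, and the names `decoding_potential`, `shaped_reward`, and `symbols_needed` are hypothetical and do not come from the paper.

```python
def decoding_potential(symbols_received: float, symbols_needed: float) -> float:
    """Hypothetical potential: fraction of the decoding threshold reached, capped at 1."""
    return min(symbols_received / symbols_needed, 1.0)


def shaped_reward(base_reward: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99) -> float:
    """Add the potential-based shaping term gamma * phi(s') - phi(s) to the base reward."""
    return base_reward + gamma * phi_next - phi_s


# Example: the sparse base reward is 0 while the receiver is still collecting symbols,
# yet the shaped reward is positive because decoding progress was made in this step.
phi_before = decoding_potential(400, 1000)
phi_after = decoding_potential(550, 1000)
r = shaped_reward(0.0, phi_before, phi_after)
```

Because the shaping term telescopes along a trajectory, it gives the agent a dense signal before decoding completes without changing which policy is optimal for the underlying task.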
2. System Model
2.1. Space–Air–Ground Architecture
2.2. Channel Model
2.2.1. FSO Channel Model
2.2.2. RF Channel Model
2.3. Hybrid FSO/RF Systems
2.3.1. Hard Switching
2.3.2. Rateless Coding
3. Trajectory Optimization
3.1. Proximal Policy Optimization
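| | Algorithm 1: PPO for HAPS Trajectory Optimization |
|---|---|
| Require: | Policy network $\pi_\theta$ and value network $V_\phi$; hyperparameters $\gamma$, $\epsilon$, $K$ |
| 1: | for each training iteration do |
| 2: | Collect trajectory $\{(s_t, a_t, r_t)\}$ using $\pi_\theta$ |
| 3: | Compute returns $G_t$ |
| 4: | Compute TD residuals $\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)$ |
| 5: | Compute advantages $\hat{A}_t$ |
| 6: | Normalize advantages: $\hat{A}_t \leftarrow (\hat{A}_t - \mu_{\hat{A}}) / \sigma_{\hat{A}}$ |
| 7: | Store old policy probabilities $\pi_{\theta_{\text{old}}}(a_t \mid s_t)$ |
| 8: | for epoch $= 1, \dots, K$ do |
| 9: | for each mini-batch do |
| 10: | Compute the clipped objective $L(\theta)$ |
| 11: | Update $\theta$ via gradient ascent on $L(\theta)$ |
| 12: | end for |
| 13: | end for |
| 14: | end for |
| 15: | return Optimized policy $\pi_\theta$ |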
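Steps 4 to 6 and step 10 of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes the advantages are computed with generalized advantage estimation (GAE), which the algorithm listing does not state, uses an assumed smoothing factor `lam`, and ignores episode termination inside the rollout. All function names are hypothetical.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Advantages from TD residuals via GAE (lam is an assumed value, not from the paper)."""
    rewards = np.asarray(rewards, dtype=float)
    values_ext = np.append(np.asarray(values, dtype=float), last_value)
    # TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values_ext[1:] - values_ext[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv


def normalize(adv, eps=1e-8):
    """Zero-mean, unit-variance advantages; eps guards against division by zero."""
    return (adv - adv.mean()) / (adv.std() + eps)


def clipped_surrogate(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over the batch."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return np.mean(np.minimum(unclipped, clipped))


# Example on a three-step rollout with made-up numbers.
adv = gae_advantages([1.0, 0.0, 2.0], [0.5, 0.4, 0.3], last_value=0.0)
objective = clipped_surrogate(
    logp_new=np.log(np.array([1.5, 0.5, 1.0])),
    logp_old=np.zeros(3),
    adv=normalize(adv),
)
```

In a full training loop, the entropy bonus and value loss would be combined with this objective before the gradient step; they are omitted here to keep the sketch focused on the clipping mechanism.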
3.2. Trajectory Optimization with PPO-Based DRL
3.2.1. RC-PPO
3.2.2. HS-PPO
4. Simulation and Results
4.1. Cloud Field Generation
4.2. PPO Training Configuration
4.3. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest



| Parameter | Value |
|---|---|
| HAPS altitude | 20 km |
| Cloud base altitude | 1 km |
| Cloud maximum altitude | 10 km |
| Receiver aperture diameter (D) | 1 m |
| Responsivity of PD (R) | 0.8 |
| Variance of background noise | 250 W |
| Noise power spectral density | −100 dB/MHz |
| Optical transmit power | 1 W |
| RF transmit power | 1 W |
| Telescope gain of transmitter, receiver | 70 dB |
| Antenna gain of transmitter, receiver | 50 dB |
| Optical wavelength | 1550 nm |
| RF frequency | 30 GHz |
| Optical bandwidth | 10 GHz |
| RF bandwidth | 500 MHz |
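For orientation, the RF entries in the table above are enough for a rough free-space link budget. The sketch below uses the standard Friis free-space path loss and is not the paper's channel model: it assumes the HAPS is directly above the ground station (so the slant range equals the 20 km altitude), that the 50 dB antenna gain applies to each end of the link, and that cloud, rain, and gaseous attenuation are ignored. Function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fspl_db(distance_m: float, frequency_hz: float) -> float:
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / C)


def received_power_dbm(tx_power_w: float, tx_gain_db: float,
                       rx_gain_db: float, path_loss_db: float) -> float:
    """Friis link budget in dB units; transmit power is converted from W to dBm."""
    tx_power_dbm = 10.0 * math.log10(tx_power_w * 1e3)
    return tx_power_dbm + tx_gain_db + rx_gain_db - path_loss_db


# Table values: 20 km altitude, 30 GHz carrier, 1 W transmit power, 50 dB gains.
loss = fspl_db(20e3, 30e9)
p_rx = received_power_dbm(1.0, 50.0, 50.0, loss)
```

Under these assumptions the path loss comes to roughly 148 dB and the clear-sky received power to roughly −18 dBm; any cloud attenuation along the path would reduce this further.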

| Parameters | Values |
|---|---|
| Feature layers | [128, 64, 32] (ReLU) |
| Policy head init gain | 0.01 |
| Learning rate | |
| Discount factor | 0.99 |
| Clip ratio | 0.2 |
| Entropy coeff./Value coeff. | 0.01/0.5 |
| Gradient clip | 0.5 |
| Mini-batch size | 64 |
| Update epochs per batch (K) | 10 |
| Total environment steps | |
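The training configuration above maps naturally onto a small, typed configuration object. The sketch below is one possible way to organize it, not the authors' code: the class name, field names, and the `updates_per_rollout` helper are hypothetical. The learning rate and total environment steps are not legible in the table, so they are required arguments with no default, and the values passed in the example are placeholders only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PPOConfig:
    # Values missing from the table must be supplied by the caller.
    learning_rate: float
    total_env_steps: int
    # Values taken from the table.
    feature_layers: Tuple[int, ...] = (128, 64, 32)  # hidden sizes, ReLU activations
    policy_head_init_gain: float = 0.01
    gamma: float = 0.99
    clip_ratio: float = 0.2
    entropy_coef: float = 0.01
    value_coef: float = 0.5
    max_grad_norm: float = 0.5
    minibatch_size: int = 64
    update_epochs: int = 10

    def updates_per_rollout(self, rollout_steps: int) -> int:
        """Gradient steps per collected batch: epochs times number of mini-batches."""
        n_minibatches = -(-rollout_steps // self.minibatch_size)  # ceiling division
        return self.update_epochs * n_minibatches


# Placeholder values for the two missing entries; the rollout length is also assumed.
cfg = PPOConfig(learning_rate=3e-4, total_env_steps=1_000_000)
steps = cfg.updates_per_rollout(rollout_steps=2048)
```

Freezing the dataclass keeps the hyperparameters immutable during a run, which makes it easier to log and reproduce a given configuration.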

| Weight | Value | Role and Tuning Rationale |
|---|---|---|
| Throughput | 1 | Fixed to establish the reward scale; directly reflects the objective of maximizing data rate. |
| Decoding Progress | 5 | Provides dense gradients for decoding progress. Value chosen via grid search. |
| Heading Penalty | 0.35 | Penalizes abrupt heading changes. Initially small; increased if trajectories exhibited excessive jitter. |
| Distance Penalty | 0.1 | Encourages the agent to maintain proximity to the GS, accelerating learning convergence. Initially small; increased if the agent strayed too far. |
| Terminal Reward (q) | 100 | Task completion signal. Large scalar reward that reinforces successful mission completion. |
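To make the role of each weight concrete, the sketch below combines the table's terms into a single per-step reward. The functional form is an assumption: a linear weighted sum, with the heading penalty applied to the absolute heading change and all inputs taken to be pre-normalized, dimensionless quantities. The paper's exact reward expression is not reproduced here, and the names `RewardWeights` and `step_reward` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    throughput: float = 1.0
    decoding_progress: float = 5.0
    heading_penalty: float = 0.35
    distance_penalty: float = 0.1
    terminal_reward: float = 100.0


def step_reward(w: RewardWeights, throughput: float, progress_delta: float,
                heading_change: float, distance: float, done: bool) -> float:
    """Assumed linear combination of the table's terms for one environment step."""
    r = (w.throughput * throughput
         + w.decoding_progress * progress_delta
         - w.heading_penalty * abs(heading_change)
         - w.distance_penalty * distance)
    if done:
        r += w.terminal_reward  # one-off bonus when the mission completes
    return r


# Example with made-up normalized inputs.
w = RewardWeights()
r_mid = step_reward(w, throughput=0.8, progress_delta=0.1,
                    heading_change=-0.2, distance=0.5, done=False)
r_end = step_reward(w, throughput=0.8, progress_delta=0.1,
                    heading_change=-0.2, distance=0.5, done=True)
```

With these weights, a step that advances decoding by 10% contributes 0.5 to the reward, comparable to the throughput term, while the terminal bonus dominates any single step.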
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cui, B.; Cai, S.; Wang, L.; Zhang, Z.; Wang, F. Reinforcement Learning-Based Cloud-Aware HAPS Trajectory Optimization in Soft-Switching Hybrid FSO/RF Cooperative Transmission System. Sensors 2026, 26, 948. https://doi.org/10.3390/s26030948

