A New Method for Optimizing Low-Earth-Orbit Satellite Communication Links Based on Deep Reinforcement Learning
Abstract
1. Introduction
2. Dynamic Channel Modeling for LEO Satellite Communication Links
2.1. Free-Space Path Loss Model
2.2. Doppler Frequency Offset Model
2.3. Rain Attenuation Model
2.4. Other Loss Models
3. Performance Index Model of Communication Link
3.1. Normalized Throughput
3.2. Bit Error Rate (BER)
3.3. Transmission Delay
3.4. Power Efficiency
4. Optimization Algorithm of LEO Satellite Communication Link Based on DRL
4.1. Design of State Space
4.2. Design of Mixed Action Space
4.3. Design of Hybrid DRL Algorithm
- Forward propagation: At each time step, the feature extraction network processes the raw state and outputs a feature vector.
- Action generation: The PPO actor network samples a continuous action from the current policy, and the DQN Q-network selects the discrete action with the highest Q-value.
- Environment interaction: Execute the composite action (the continuous and discrete parts together), then obtain the reward and the next state from the environment (the LEO satellite communication link simulator).
- Experience storage: Store the resulting transition sample in the experience replay buffer.
- Network update: Sample a batch of data from the buffer. To update the DQN branch, calculate the Q-value loss and perform backpropagation. To update the PPO branch, use the sampled data to calculate the advantage function, update the actor network by maximizing the clipped objective function, and update the critic network by minimizing the value function error.
- Offline training phase (ground-assisted): All DRL models are trained at ground stations equipped with high-performance computing clusters. A large amount of diverse scenario data is generated with the joint STK and NS-3 simulation platform, covering different orbital altitudes, weather conditions, and business loads. After training, a lightweight inference model is generated.
- Online inference phase (satellite deployment): The lightweight inference model is deployed on the onboard processor of the LEO satellite to make real-time decisions locally.
- Continuous learning mechanism: The ground station updates the model regularly. When the satellite passes overhead of a ground station, it receives the updated model parameters through the high-speed link.
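
The action-generation and network-update steps listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian actor, the epsilon-greedy option, the clipping range `clip_eps=0.2`, and all function names are assumptions made for illustration. Only the discount factor 0.99 comes from the hyperparameter table.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_hybrid_action(mean, log_std, q_values, epsilon=0.0):
    """Return (continuous_action, discrete_action) for one time step."""
    # PPO-style actor (assumed diagonal Gaussian): sample around the policy mean.
    continuous = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
    # DQN-style head: greedy over Q-values, with optional epsilon exploration.
    if rng.random() < epsilon:
        discrete = int(rng.integers(len(q_values)))
    else:
        discrete = int(np.argmax(q_values))
    return continuous, discrete

def dqn_td_loss(q_taken, reward, next_q_values, done, gamma=0.99):
    """Mean squared TD error against the target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * (1.0 - done) * next_q_values.max(axis=1)
    return float(np.mean((q_taken - target) ** 2))

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective; the actor update maximizes this value."""
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))
```

In this sketch the composite action is simply the pair returned by `select_hybrid_action`. How the two branches share the feature extraction network and how their losses are weighted are not specified here, so they are left out.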
5. Results and Discussion
5.1. Design of Random Training Environment
5.2. Training Process and Hyperparameters of DRL
- Initialization: Randomly initialize the satellite's DRL network parameters.
- Scenario loop: Each training episode simulates one complete pass of the satellite over a ground station (approximately 10–15 min of simulated time).
- Step loop: At each time step (for example, 10 ms), the agent observes the state from the environment, outputs an action through the PPO and DQN networks, and receives the reward and the new state. It stores the experience tuple in the experience replay buffer, then periodically samples from the buffer and updates the network parameters. The simulation convergence results during training are shown in Figure 4.
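
The episode and step loops above can be sketched as a short training skeleton. Everything named here is hypothetical: `ToyLinkEnv` is a placeholder that stands in for the link simulator and has no relation to the STK and NS-3 platform, its reward is arbitrary, and `batch_size` and `update_every` are assumed values. Only the buffer capacity of 1 × 10⁶ is taken from the hyperparameter table.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""
    def __init__(self, capacity=1_000_000):
        self.storage = deque(maxlen=capacity)

    def push(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)

class ToyLinkEnv:
    """Hypothetical stand-in for the link simulator, for illustration only."""
    def __init__(self, steps_per_pass=50):
        self.steps_per_pass = steps_per_pass
        self.t = 0

    def reset(self):
        self.t = 0
        return (0.0,)

    def step(self, action):
        power, mode_index = action
        self.t += 1
        reward = mode_index - 0.1 * power  # placeholder reward, not the paper's
        done = self.t >= self.steps_per_pass
        return (self.t / self.steps_per_pass,), reward, done

def train(env, policy, update, episodes=3, batch_size=32, update_every=10):
    buffer = ReplayBuffer()
    step_count = 0
    n_updates = 0
    for _ in range(episodes):              # one episode = one satellite pass
        state, done = env.reset(), False
        while not done:                     # one iteration = one time step
            action = policy(state)
            next_state, reward, done = env.step(action)
            buffer.push((state, action, reward, next_state, done))
            state = next_state
            step_count += 1
            # Periodic update from a random batch of stored experience.
            if step_count % update_every == 0 and len(buffer) >= batch_size:
                update(buffer.sample(batch_size))
                n_updates += 1
    return buffer, n_updates
```

A call such as `train(ToyLinkEnv(), lambda s: (35.0, 1), lambda batch: None)` runs the loop with a constant policy and a no-op update. For scale, if the 10 ms example step is used, a 10–15 min pass corresponds to 60,000–90,000 steps per episode.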
5.3. Simulation Results and Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Parameter | Value Range/Distribution |
|---|---|
| Initial elevation angle | 10°–80° |
| Orbital altitude | 500–1200 km |
| Rainfall rate | 0–50 mm/h (exponential distribution) |
| Number of interference sources | 0–4 (Poisson distribution) |
| Interference source power | −20 to 0 dBm (uniform distribution) |
| Rician factor | 5–15 dB (uniform distribution) |
| Packet arrival rate | 0.1–1.0 Mbps (uniform distribution) |
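
As a rough illustration of how one randomized training scenario could be drawn from the ranges in the table above, the sketch below uses NumPy's random generator. Several choices are assumptions rather than details from the source: elevation and altitude are drawn uniformly because the table gives no distribution for them, the exponential scale (`rain_scale`) and Poisson mean (`interferer_mean`) are placeholder values, out-of-range draws are clipped to the table bounds, and one power value is drawn per interference source.

```python
import numpy as np

def sample_scenario(rng, rain_scale=10.0, interferer_mean=1.0):
    """Draw one training scenario; scale and mean values are assumed."""
    # Exponential rain rate, clipped to the 0-50 mm/h range of the table.
    rain = min(float(rng.exponential(rain_scale)), 50.0)
    # Poisson count of interference sources, clipped to at most 4.
    n_int = min(int(rng.poisson(interferer_mean)), 4)
    return {
        "elevation_deg": float(rng.uniform(10.0, 80.0)),    # assumed uniform
        "altitude_km": float(rng.uniform(500.0, 1200.0)),   # assumed uniform
        "rain_rate_mm_h": rain,
        "num_interferers": n_int,
        "interferer_power_dbm": rng.uniform(-20.0, 0.0, size=n_int),
        "rician_k_db": float(rng.uniform(5.0, 15.0)),
        "packet_rate_mbps": float(rng.uniform(0.1, 1.0)),
    }

scenario = sample_scenario(np.random.default_rng(42))
```

Passing an explicitly seeded generator keeps each drawn scenario reproducible, which helps when comparing methods on identical conditions.
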

| Hyperparameter | Value |
|---|---|
| PPO learning rate | 3 × 10⁻⁴ |
| DQN learning rate | 1 × 10⁻³ |
| Discount factor | 0.99 |
| Experience replay buffer size | 1 × 10⁶ |

| Scenario | Altitude | Weather Conditions | Initial Elevation | Business Load |
|---|---|---|---|---|
| S1 | 550 km | sunny | 60° | Mild |
| S2 | 550 km | light rain (5 mm/h) | 30° | Moderate |
| S3 | 550 km | moderate rain (15 mm/h) | 10° | Heavy |
| S4 | 975 km | sunny | 75° | Moderate |
| S5 | 975 km | moderate rain (15 mm/h) | 45° | Mild |
| S6 | 975 km | rainstorm (25 mm/h) | 30° | Moderate |
| S7 | 975 km | rainstorm (25 mm/h) | 60° | Heavy |
| S8 | 1200 km | sunny | 45° | Moderate |
| S9 | 1200 km | light rain (5 mm/h) | 30° | Mild |
| S10 | 1200 km | moderate rain (15 mm/h) | 60° | Heavy |
| S11 | 1200 km | rainstorm (25 mm/h) | 45° | Moderate |
| S12 | 975 km | rainstorm (25 mm/h) | 15° | Heavy |

| Scenario | Method | Throughput (Mbps) | BER (×10⁻⁶) | Delay (ms) | Power (dBm) |
|---|---|---|---|---|---|
| S1 | Fixed strategy | 86.5 | 0.3 | 10.2 | 40.0 |
| S1 | ACM | 93.2 | 0.4 | 9.8 | 40.0 |
| S1 | PPO | 95.8 | 0.3 | 9.5 | 36.5 |
| S1 | DQN | 92.1 | 0.5 | 10.0 | 35.8 |
| S1 | New method | 98.4 | 0.2 | 9.2 | 34.2 |
| S2 | Fixed strategy | 72.3 | 1.2 | 13.5 | 40.0 |
| S2 | ACM | 80.5 | 1.5 | 12.8 | 40.0 |
| S2 | PPO | 83.6 | 1.1 | 12.2 | 37.1 |
| S2 | DQN | 79.8 | 1.4 | 12.6 | 36.3 |
| S2 | New method | 86.2 | 0.8 | 11.8 | 35.0 |
| S3 | Fixed strategy | 48.5 | 8.5 | 20.8 | 40.0 |
| S3 | ACM | 55.2 | 3.8 | 18.5 | 40.0 |
| S3 | PPO | 58.7 | 2.6 | 17.3 | 34.8 |
| S3 | DQN | 54.3 | 3.2 | 18.0 | 33.9 |
| S3 | New method | 62.1 | 1.5 | 16.2 | 32.5 |
| S4 | Fixed strategy | 79.8 | 0.4 | 11.8 | 40.0 |
| S4 | ACM | 86.4 | 0.5 | 11.2 | 40.0 |
| S4 | PPO | 88.9 | 0.4 | 10.8 | 37.2 |
| S4 | DQN | 85.2 | 0.6 | 11.1 | 36.4 |
| S4 | New method | 91.5 | 0.3 | 10.5 | 34.8 |
| S5 | Fixed strategy | 63.2 | 2.8 | 16.2 | 40.0 |
| S5 | ACM | 70.1 | 2.2 | 15.1 | 40.0 |
| S5 | PPO | 73.5 | 1.6 | 14.3 | 35.2 |
| S5 | DQN | 69.8 | 1.9 | 14.8 | 34.5 |
| S5 | New method | 76.8 | 1.0 | 13.7 | 33.1 |
| S6 | Fixed strategy | 45.2 | 28.0 | 22.5 | 40.0 |
| S6 | ACM | 58.7 | 5.2 | 18.9 | 40.0 |
| S6 | PPO | 62.3 | 2.8 | 17.2 | 33.6 |
| S6 | DQN | 55.6 | 3.5 | 18.1 | 32.4 |
| S6 | New method | 69.4 | 1.2 | 15.8 | 31.5 |
| S7 | Fixed strategy | 38.7 | 35.0 | 24.8 | 40.0 |
| S7 | ACM | 50.2 | 6.8 | 20.5 | 40.0 |
| S7 | PPO | 54.5 | 3.5 | 18.6 | 34.2 |
| S7 | DQN | 48.9 | 4.2 | 19.3 | 33.1 |
| S7 | New method | 61.2 | 1.8 | 17.0 | 32.0 |
| S8 | Fixed strategy | 68.5 | 0.5 | 14.5 | 40.0 |
| S8 | ACM | 75.2 | 0.6 | 13.8 | 40.0 |
| S8 | PPO | 77.8 | 0.5 | 13.2 | 37.8 |
| S8 | DQN | 74.3 | 0.7 | 13.6 | 36.9 |
| S8 | New method | 80.1 | 0.4 | 12.9 | 35.5 |
| S9 | Fixed strategy | 60.2 | 1.8 | 17.2 | 40.0 |
| S9 | ACM | 67.5 | 2.0 | 16.1 | 40.0 |
| S9 | PPO | 70.3 | 1.4 | 15.3 | 36.8 |
| S9 | DQN | 66.8 | 1.7 | 15.8 | 36.0 |
| S9 | New method | 73.0 | 0.9 | 14.8 | 34.6 |
| S10 | Fixed strategy | 48.8 | 4.2 | 20.5 | 40.0 |
| S10 | ACM | 55.9 | 2.9 | 18.8 | 40.0 |
| S10 | PPO | 59.4 | 2.0 | 17.5 | 35.8 |
| S10 | DQN | 55.1 | 2.4 | 18.2 | 35.0 |
| S10 | New method | 62.7 | 1.3 | 16.5 | 33.8 |
| S11 | Fixed strategy | 35.6 | 42.0 | 26.5 | 40.0 |
| S11 | ACM | 46.8 | 8.5 | 22.3 | 40.0 |
| S11 | PPO | 51.2 | 4.2 | 20.1 | 35.5 |
| S11 | DQN | 45.5 | 5.1 | 21.0 | 34.2 |
| S11 | New method | 57.5 | 2.1 | 18.8 | 33.2 |
| S12 | Fixed strategy | 28.5 | 95.0 | 28.7 | 40.0 |
| S12 | ACM | 42.1 | 12.8 | 22.3 | 40.0 |
| S12 | PPO | 47.6 | 6.5 | 20.1 | 35.2 |
| S12 | DQN | 41.3 | 7.8 | 21.5 | 34.1 |
| S12 | New method | 53.8 | 3.2 | 18.6 | 33.0 |
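
To make the comparison in the results table easier to read, the sketch below computes relative gains for scenario S12 using values copied from the table. The helper names are illustrative, and the conversion from dBm to milliwatts is the standard definition rather than something stated in the source.

```python
def relative_gain_percent(new, baseline):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline

def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

# S12 rows copied from the results table.
s12 = {
    "Fixed strategy": {"throughput": 28.5, "delay": 28.7, "power_dbm": 40.0},
    "PPO": {"throughput": 47.6, "delay": 20.1, "power_dbm": 35.2},
    "New method": {"throughput": 53.8, "delay": 18.6, "power_dbm": 33.0},
}

new, fixed, ppo = s12["New method"], s12["Fixed strategy"], s12["PPO"]

gain_vs_fixed = relative_gain_percent(new["throughput"], fixed["throughput"])
gain_vs_ppo = relative_gain_percent(new["throughput"], ppo["throughput"])
delay_cut_vs_fixed = -relative_gain_percent(new["delay"], fixed["delay"])
# Power saving in linear units: a 7 dB reduction from 40.0 to 33.0 dBm.
power_saving = 100.0 * (1.0 - dbm_to_mw(new["power_dbm"]) / dbm_to_mw(fixed["power_dbm"]))

print(f"Throughput gain vs fixed strategy: {gain_vs_fixed:.1f}%")
print(f"Throughput gain vs PPO: {gain_vs_ppo:.1f}%")
print(f"Delay reduction vs fixed strategy: {delay_cut_vs_fixed:.1f}%")
print(f"Transmit power saving vs fixed strategy: {power_saving:.1f}%")
```

For S12, the new method's throughput is about 89% higher than the fixed strategy and about 13% higher than PPO alone, while the 7 dB lower transmit power corresponds to roughly an 80% reduction in linear power.
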
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yu, H.; Li, S.; Wu, J.; Sun, Y.; Wang, L. A New Method for Optimizing Low-Earth-Orbit Satellite Communication Links Based on Deep Reinforcement Learning. Aerospace 2026, 13, 285. https://doi.org/10.3390/aerospace13030285

