Spacecraft Safe Proximity Policy Based on Graph Neural Network Safe Reinforcement Learning
Abstract
1. Introduction
2. Preliminaries and Modeling
2.1. Problem Formulation
2.2. Dynamics of Spacecraft Safe Proximity Scenarios
2.3. Graph Structure of Spacecraft Safe Proximity Missions
2.4. Constrained Markov Decision Process
2.4.1. State and Action
2.4.2. Cost
2.4.3. Reward
- (1) Terminal reward
- (2) Process reward
- (3) Fuel penalty
- (4) Obstacle collision penalty
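The four reward components listed above can be sketched as one composite reward function. This is a minimal illustration only: all weights and thresholds below are assumptions for the sketch, not the paper's values (the goal point, tolerances, and safety distance echo the constraint settings in Section 4.1).

```python
import numpy as np

def proximity_reward(pos, vel, fuel_used, obstacle_dists,
                     goal=np.array([0.0, 20.0]), tol_dist=1.0,
                     tol_vel=0.1, safe_dist=5.0):
    """Composite reward from the four terms above (illustrative weights)."""
    dist_to_goal = np.linalg.norm(pos - goal)
    speed = np.linalg.norm(vel)

    # (1) Terminal reward: bonus when the service spacecraft reaches the
    #     expected point within position and velocity tolerances.
    terminal = 100.0 if (dist_to_goal < tol_dist and speed < tol_vel) else 0.0

    # (2) Process reward: dense shaping that decreases with goal distance.
    process = -0.1 * dist_to_goal

    # (3) Fuel penalty: proportional to the delta-v expended this step.
    fuel = -1.0 * fuel_used

    # (4) Obstacle collision penalty: large penalty if any obstacle is
    #     inside the safety distance.
    collision = -100.0 if np.min(obstacle_dists) < safe_dist else 0.0

    return terminal + process + fuel + collision
```

A call such as `proximity_reward(np.array([0.0, 20.0]), np.zeros(2), 0.0, np.array([10.0]))` hits the terminal condition and returns the terminal bonus alone.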
3. Spacecraft Safe Proximity Policy Based on Safe Reinforcement Learning
3.1. Principle of Graph Neural Network
3.2. Soft Actor Critic–Lagrangian Algorithm
| Algorithm 1. SACL Algorithm |
|---|
| ►Initialize parameters |
| ►Initialize target network weights |
| ►Initialize an empty replay pool |
| for each iteration do |
| for each environment step do |
| ►Sample action from the policy |
| ►Sample transition from the environment |
| ►Store the transition in the replay pool |
| end for |
| for each gradient step do |
| ►Update the reward Q-function parameters |
| ►Update the cost Q-function parameters |
| ►Update policy weights |
| ►Adjust the Lagrange multiplier |
| ►Adjust the temperature parameter |
| ►Update reward target network weights |
| ►Update cost target network weights |
| end for |
| end for |
| ►Output optimized parameters |
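The distinctive step in Algorithm 1 relative to plain SAC is the Lagrange-multiplier adjustment: the dual variable grows when episode cost exceeds the constraint limit and shrinks otherwise, scaling the cost Q-term in the actor loss. A minimal sketch of that dual update (the softplus parameterization, learning rate, and loss signs are illustrative assumptions, not the paper's exact formulation):

```python
import math

class LagrangeMultiplier:
    """Non-negative dual variable for the cost constraint in SAC-Lagrangian.

    Parameterized as lambda = softplus(raw) so it cannot go negative; raw
    is updated by gradient ascent on the dual objective
    lambda * (episode_cost - cost_limit). Values here are illustrative.
    """

    def __init__(self, cost_limit: float, lr: float = 3e-4):
        self.raw = 0.0
        self.cost_limit = cost_limit
        self.lr = lr

    @property
    def value(self) -> float:
        return math.log1p(math.exp(self.raw))  # softplus(raw) >= 0

    def update(self, episode_cost: float) -> float:
        # d(lambda)/d(raw) = sigmoid(raw); ascend when cost exceeds limit
        grad = (episode_cost - self.cost_limit) / (1.0 + math.exp(-self.raw))
        self.raw += self.lr * grad
        return self.value

def actor_objective(q_reward, q_cost, log_prob, alpha, lam):
    """Per-sample actor loss: entropy term minus reward Q plus lambda-weighted
    cost Q (to be minimized), as in the policy-update step of Algorithm 1."""
    return alpha * log_prob - q_reward + lam * q_cost
```

After each batch of episodes, `update()` is called with the measured episode cost, and `lam.value` feeds back into `actor_objective` at the next gradient step.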
3.3. GAT-SACL Algorithm
4. Simulation and Analysis
4.1. Scenario Settings
4.2. Test Result and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, W.-J.; Cheng, D.-Y.; Liu, X.-G.; Wang, Y.-B.; Shi, W.-H.; Tang, Z.-X.; Gao, F.; Zeng, F.-M.; Chai, H.-Y.; Luo, W.-B.; et al. On-Orbit Service (OOS) of Spacecraft: A Review of Engineering Developments. Prog. Aerosp. Sci. 2019, 108, 32–120. [Google Scholar] [CrossRef]
- Chen, R.; Chen, Z.; Bai, Y.; Zhao, Y.; Yao, W.; Wang, Y. Ground Experiment of Safe Proximity Control for Complex-Shaped Spacecraft. IEEE Trans. Ind. Electron. 2023, 70, 11535–11543. [Google Scholar] [CrossRef]
- Zhang, J.; Chu, X.; Zhang, Y.; Hu, Q.; Zhai, G.; Li, Y. Safe-Trajectory Optimization and Tracking Control in Ultra-Close Proximity to a Failed Satellite. Acta Astronaut. 2018, 144, 339–352. [Google Scholar] [CrossRef]
- Morgan, D.; Chung, S.-J.; Hadaegh, F.Y. Model Predictive Control of Swarms of Spacecraft Using Sequential Convex Programming. J. Guid. Control Dyn. 2014, 37, 1725–1740. [Google Scholar] [CrossRef]
- Wang, Y.; Chen, X.; Ran, D.; Zhao, Y.; Chen, Y.; Bai, Y. Spacecraft Formation Reconfiguration with Multi-Obstacle Avoidance under Navigation and Control Uncertainties Using Adaptive Artificial Potential Function Method. Astrodynamics 2020, 4, 41–56. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
- Zhou, M.; Luo, J.; Villela, J.; Yang, Y.; Rusu, D.; Miao, J.; Zhang, W.; Alban, M.; Fadakar, I.; Chen, Z.; et al. SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving. arXiv 2020, arXiv:2010.09776. [Google Scholar]
- Tipaldi, M.; Iervolino, R.; Massenio, P.R. Reinforcement Learning in Spacecraft Control Applications: Advances, Prospects, and Challenges. Annu. Rev. Control 2022, 54, 1–23. [Google Scholar] [CrossRef]
- Scorsoglio, A.; Furfaro, R.; Linares, R.; Massari, M. Relative Motion Guidance for Near-Rectilinear Lunar Orbits with Path Constraints via Actor-Critic Reinforcement Learning. Adv. Space Res. 2023, 71, 316–335. [Google Scholar] [CrossRef]
- Federici, L.; Scorsoglio, A.; Zavoli, A.; Furfaro, R. Meta-Reinforcement Learning for Adaptive Spacecraft Guidance during Finite-Thrust Rendezvous Missions. Acta Astronaut. 2022, 201, 129–141. [Google Scholar] [CrossRef]
- Hovell, K.; Ulrich, S. Deep Reinforcement Learning for Spacecraft Proximity Operations Guidance. J. Spacecr. Rocket. 2021, 58, 254–264. [Google Scholar] [CrossRef]
- Yang, L.; Wang, J.; Jiang, J.; Bai, X.; Xu, M. Low-Orbit Space Debris Warning and Autonomous Collision Avoidance for Space Environment Governance. J. Phys. Conf. Ser. 2025, 3015, 012005. [Google Scholar] [CrossRef]
- Zhang, J.; Zhang, K.; Zhang, Y.; Shi, H.; Tang, L.; Li, M. Near-Optimal Interception Strategy for Orbital Pursuit-Evasion Using Deep Reinforcement Learning. Acta Astronaut. 2022, 198, 9–25. [Google Scholar] [CrossRef]
- Li, X.; Wang, X. Online Solution for Orbital Pursuit-Evasion Game via Heterogeneous Proximal Policy Optimization. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 12044–12058. [Google Scholar] [CrossRef]
- Qu, Q.; Liu, K.; Wang, W.; Lu, J. Spacecraft Proximity Maneuvering and Rendezvous with Collision Avoidance Based on Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 5823–5834. [Google Scholar] [CrossRef]
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D. Deterministic Policy Gradient Algorithms. In Proceedings of the International Conference on Machine Learning, ICML, Beijing, China, 21–26 June 2014; JMLR: Cambridge, MA, USA, 2014; pp. 387–395. [Google Scholar]
- Sharma, K.P.; Kumar, I.; Singh, P.P.; Anbazhagan, K.; Albarakati, H.M.; Bhatt, M.W.; Ziyadullayevich, A.A.; Rana, A.; A, S.S. Advancing Spacecraft Rendezvous and Docking through Safety Reinforcement Learning and Ubiquitous Learning Principles. Comput. Hum. Behav. 2024, 153, 108110. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, Q.; Shen, L.; Yuan, B.; Wang, X.; Tao, D. Evaluating Model-Free Reinforcement Learning toward Safety-Critical Tasks. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI: Washington, DC, USA, 2023; Volume 37, pp. 15313–15321. [Google Scholar]
- Gu, S.; Yang, L.; Du, Y.; Chen, G.; Walter, F.; Wang, J.; Knoll, A. A Review of Safe Reinforcement Learning: Methods, Theories, and Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 11216–11235. [Google Scholar] [CrossRef] [PubMed]
- García, J.; Fernández, F. A Comprehensive Survey on Safe Reinforcement Learning. J. Mach. Learn. Res. 2015, 16, 1437–1480. [Google Scholar]
- Ha, S.; Xu, P.; Tan, Z.; Levine, S.; Tan, J. Learning to Walk in the Real World with Minimal Human Effort. arXiv 2020, arXiv:2002.08550. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A. Soft Actor-Critic Algorithms and Applications. arXiv 2018, arXiv:1812.05905. [Google Scholar]
- Ray, A.; Achiam, J.; Amodei, D. Benchmarking Safe Exploration in Deep Reinforcement Learning. arXiv 2019, arXiv:1910.01708. [Google Scholar]
- Mu, C.; Liu, S.; Lu, M.; Liu, Z.; Cui, L.; Wang, K. Autonomous Spacecraft Collision Avoidance with a Variable Number of Space Debris Based on Safe Reinforcement Learning. Aerosp. Sci. Technol. 2024, 149, 109131. [Google Scholar] [CrossRef]
- Zhang, L.; Shen, L.; Yang, L.; Chen, S.; Yuan, B.; Wang, X.; Tao, D. Penalized Proximal Policy Optimization for Safe Reinforcement Learning. arXiv 2022, arXiv:2205.11814. [Google Scholar] [CrossRef]
- Xue, X.; Yue, X.; Yuan, J. Connectivity Preservation and Collision Avoidance Control for Spacecraft Formation Flying in the Presence of Multiple Obstacles. Adv. Space Res. 2021, 67, 3504–3514. [Google Scholar] [CrossRef]
- Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
- Khemani, B.; Patil, S.; Kotecha, K.; Tanwar, S. A Review of Graph Neural Networks: Concepts, Architectures, Techniques, Challenges, Datasets, Applications, and Future Directions. J. Big Data 2024, 11, 18. [Google Scholar] [CrossRef]
- Munikoti, S.; Agarwal, D.; Das, L.; Halappanavar, M.; Natarajan, B. Challenges and Opportunities in Deep Reinforcement Learning with Graph Neural Networks: A Comprehensive Review of Algorithms and Applications. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 15051–15071. [Google Scholar] [CrossRef]
- Zhao, B.; Huo, M.; Li, Z.; Yu, Z.; Qi, N. Graph-Based Multi-Agent Reinforcement Learning for Large-Scale UAVs Swarm System Control. Aerosp. Sci. Technol. 2024, 150, 109166. [Google Scholar] [CrossRef]
- Zhao, B.; Huo, M.; Li, Z.; Feng, W.; Yu, Z.; Qi, N.; Wang, S. Graph-Based Multi-Agent Reinforcement Learning for Collaborative Search and Tracking of Multiple UAVs. Chin. J. Aeronaut. 2025, 38, 103214. [Google Scholar] [CrossRef]
- Yang, M.; Liu, G.; Zhou, Z.; Wang, J. Partially Observable Mean Field Multi-Agent Reinforcement Learning Based on Graph Attention Network for UAV Swarms. Drones 2023, 7, 476. [Google Scholar] [CrossRef]
- Lai, Y.; Zhu, Y.; Li, L.; Lan, Q.; Zuo, Y. STGLR: A Spacecraft Anomaly Detection Method Based on Spatio-Temporal Graph Learning. Sensors 2025, 25, 310. [Google Scholar] [CrossRef]
- Jacquet, A.; Infantes, G.; Meuleau, N.; Benazera, E.; Roussel, S.; Baudoui, V.; Guerra, J. Earth Observation Satellite Scheduling with Graph Neural Networks. arXiv 2024, arXiv:2408.15041. [Google Scholar] [CrossRef]
- Clohessy, W.H.; Wiltshire, R.S. Terminal Guidance System for Satellite Rendezvous. J. Aerosp. Sci. 1960, 27, 653–658. [Google Scholar] [CrossRef]
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Zaheer, M.; Kottur, S.; Ravanbakhsh, S.; Poczos, B.; Salakhutdinov, R.; Smola, A. Deep Sets. arXiv 2017, arXiv:1703.06114. [Google Scholar]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Yang, Q.; Simão, T.D.; Tindemans, S.H.; Spaan, M.T.J. WCSAC: Worst-Case Soft Actor-Critic for Safety-Constrained Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; AAAI: Washington, DC, USA, 2021; Volume 35, pp. 10639–10646. [Google Scholar]
| Type | Parameters | Value |
|---|---|---|
| Scenario Setting | Orbit Semi-Major Axis of Target Spacecraft (km) | 6889.577 |
| | Observation Space X (m) | [−20, 100] |
| | Observation Space Y (m) | [−20, 100] |
| | Maximum Distance (m) | 100 |
| | Maximum Velocity (m/s) | 10 |
| | Integral Step Size (s) | 0.1 |
| | Observation Range of Service Spacecraft (m) | 25 |
| Initial Setting of Service Spacecraft | Center of Initial Position Area (m) | (90, 90) |
| | Radius of Initial Position Area (m) | 10 |
| | Initial Velocity Range | [−0.001, 0.001] |
| Initial Setting of Obstacles | Radius of Target Spacecraft (m) | 5 |
| | Position of Target Spacecraft (m) | (0, 0) |
| | Radius of Space Debris 1 (m) | 6 |
| | Initial Position of Space Debris 1 (m) | (60, 65) |
| | Radius of Space Debris 2 (m) | 6 |
| | Initial Position of Space Debris 2 (m) | (30, 40) |
| | Radius of Space Debris 3 (m) | 5 |
| | Initial Position of Space Debris 3 (m) | (30, 80) |
| | Initial Velocity Range | [−0.01, 0.02] |
| Constraint | Expected Point Position (m) | (0, 20) |
| | Tolerant Distance (m) | 1 |
| | Tolerant Velocity (m/s) | 0.1 |
| | Lower Limit of Safety Distance (m) | 5 |
| Parameters | Value |
|---|---|
| Max Train Steps | 2 × 10⁶ |
| Max Steps per Episode | 1000 |
| Discount Factor | 0.99 |
| Actor Learning Rate | 0.0003 |
| Critic Learning Rate | 0.0003 |
| Lagrange Multiplier Learning Rate | 0.0003 |
| Soft Update Factor Initial Value | 0.12 |
| Replay Buffer Size | 1 × 10⁶ |
| Batch Size | 256 |
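The training hyperparameters above can be collected into a single configuration object for an implementation. The field names are paraphrases of the table rows, and labeling the unlabeled 0.99 entry as the discount factor is an assumption based on standard SAC practice:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Training hyperparameters from the table above (names paraphrased)."""
    max_train_steps: int = 2_000_000      # Max Train Steps, 2 x 10^6
    max_episode_steps: int = 1000         # Max Steps per Episode
    discount: float = 0.99                # assumed discount factor
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    lagrange_lr: float = 3e-4
    soft_update_tau: float = 0.12         # soft update factor initial value
    replay_buffer_size: int = 1_000_000   # 1 x 10^6
    batch_size: int = 256
```

Grouping the values this way makes it straightforward to log and reproduce a run from one object.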
| Policy | Episode Reward | Episode Cost | Success | Collision | Fuel Consumption (m/s) | Episode Time (s) |
|---|---|---|---|---|---|---|
| GAT-SACL (conservative) | −102.796 [−105.150, −100.489] | 0 | 100% | 0% | 22.855 [22.753, 22.972] | 43.068 [42.942, 43.194] |
| GAT-SACL (radical) | −124.074 [−128.096, −120.363] | 0 | 100% | 0% | 28.640 [28.396, 28.909] | 52.778 [52.519, 53.047] |
| GAT-SAC (conservative) | −119.164 [−122.224, −116.348] | 0 | 100% | 0% | 28.910 [28.795, 29.013] | 64.353 [64.173, 64.542] |
| GAT-SAC (radical) | −123.202 [−126.138, −120.461] | 0 | 100% | 0% | 27.077 [26.996, 27.164] | 56.591 [56.430, 56.757] |
| GPOPS | - | - | 100% | 0% | 13.279 [12.566, 14.096] | 50 |
| Number | Name | Parameters | Value |
|---|---|---|---|
| Case 1 | Random Initial Position of Obstacles | Radius of Initial Position Area (m) | 3 |
| Case 2 | 4 Pieces of Space Debris | Initial Position of Space Debris 4 (m) | [45, 50] |
| | | Radius of Space Debris 4 (m) | 5 |
| Case 3 | 6 Pieces of Space Debris | Initial Position of Space Debris 5 (m) | [20, 60] |
| | | Radius of Space Debris 5 (m) | 5 |
| | | Initial Position of Space Debris 6 (m) | [50, 100] |
| | | Radius of Space Debris 6 (m) | 5 |
| Case 4 | Reduced Sensing Radius | Observation Range of Service Spacecraft (m) | 20 |
| Case 5 | Extended Sensing Radius | Observation Range of Service Spacecraft (m) | 30 |
| Case | Policy | Episode Reward | Episode Cost | Success | Collision | Fuel Consumption (m/s) | Episode Time (s) |
|---|---|---|---|---|---|---|---|
| Case 1 | GAT-SACL (conservative) | −105.406 [−107.851, −103.098] | 5.22 [3.14, 8.09] | 82% [72%, 88%] | 18% [10%, 25%] | 22.855 [22.753, 22.972] | 43.068 [42.942, 43.194] |
| Case 1 | GAT-SACL (radical) | −132.162 [−136.994, −127.749] | 16.19 [11.80, 21.856] | 62% [51%, 70%] | 38% [28%, 47%] | 28.642 [28.397, 28.909] | 52.766 [52.507, 53.034] |
| Case 2 | GAT-SACL (conservative) | −103.166 [−105.646, −100.765] | 0.74 [0.20, 1.84] | 95% [88%, 98%] | 5% [1%, 9%] | 22.863 [22.761, 22.979] | 43.063 [42.938, 43.188] |
| Case 2 | GAT-SACL (radical) | −126.693 [−130.737, −122.823] | 5.24 [3.40, 7.67] | 77% [67%, 84%] | 23% [15%, 31%] | 28.640 [28.395, 28.908] | 52.778 [52.519, 53.047] |
| Case 3 | GAT-SACL (conservative) | −106.133 [−108.835, −103.563] | 6.68 [4.664, 9.03] | 68% [58%, 76%] | 32% [22%, 40%] | 22.859 [22.757, 22.975] | 43.068 [42.942, 43.194] |
| Case 3 | GAT-SACL (radical) | −126.704 [−130.757, −122.834] | 5.25 [3.41, 7.67] | 77% [67%, 84%] | 23% [15%, 31%] | 28.643 [28.399, 28.911] | 52.777 [52.518, 53.046] |
| Case 4 | GAT-SACL (conservative) | −102.800 [−105.152, −100.493] | 0 | 100% | 0% | 22.864 [22.761, 22.979] | 43.068 [42.943, 43.193] |
| Case 4 | GAT-SACL (radical) | −124.078 [−128.083, −120.364] | 0 | 100% | 0% | 28.652 [28.407, 28.920] | 52.757 [52.496, 53.025] |
| Case 5 | GAT-SACL (conservative) | −102.797 [−105.150, −100.489] | 0 | 100% | 0% | 22.863 [22.762, 22.979] | 43.063 [42.937, 43.189] |
| Case 5 | GAT-SACL (radical) | −124.067 [−128.083, −120.354] | 0 | 100% | 0% | 28.594 [28.349, 28.861] | 52.779 [52.521, 53.049] |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zhou, H.; Wang, J.; Dong, M.; Zhao, Y.; Bai, Y.; Chen, R. Spacecraft Safe Proximity Policy Based on Graph Neural Network Safe Reinforcement Learning. Aerospace 2026, 13, 210. https://doi.org/10.3390/aerospace13030210
