Robust Collision Avoidance for ASVs Using Deep Reinforcement Learning with Sim2Real Methods in Static Obstacle Environments
Abstract
1. Introduction
2. Simulation Environment and Learning Architecture
2.1. Simulation Environment
2.2. Learning Architecture
3. Markov Decision Process and Sim2Real Methods
3.1. Markov Decision Process
3.1.1. State Space
3.1.2. Action Space
3.1.3. Reward Function
3.2. Sim2Real Methods
3.2.1. Domain Randomization
3.2.2. Curriculum Learning
4. Hyperparameters, Simulation Scenario, and Results
4.1. Hyperparameter Settings
4.2. Simulation Scenario
4.2.1. Evaluation Methodology
4.2.2. Task and Environment Configuration
4.2.3. Sim2Real Test Conditions
4.3. Training Stability and Behavioral Analysis
4.4. Stage 2 Results
4.5. Stage 3 Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
DRL | Deep Reinforcement Learning
ASV | Autonomous Surface Vessel
Sim2Real | Simulation-to-Real
TD3 | Twin Delayed Deep Deterministic Policy Gradient
VO | Velocity Obstacle
PID | Proportional–Integral–Derivative
ROS | Robot Operating System
MDP | Markov Decision Process
PER | Prioritized Experience Replay
References
Category | Parameter | Range | Rationale
---|---|---|---
Sensor | LiDAR standard deviation (σ) | σ ~ Uniform[0.01, 0.05] (m) | LiDAR range noise
Position | Position noise (x) | ϵ ~ Uniform[0, 0.1] (m) | Agent position noise
Orientation | Heading angle (ψ) | ϵ ~ Gaussian(0, 0.03²) (rad) | Agent heading noise
Control | Action noise (a) | ϵ ~ Gaussian(0, 0.05²) (−) | Actuator noise
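The randomization ranges above translate directly into sampling code. The following is a minimal Python sketch of how these perturbations could be drawn and applied; the function names, and the choice of which terms are sampled per episode versus per step, are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng()

def randomize_observation(lidar_ranges, position_xy, heading_rad):
    """Apply the table's observation noise (names are illustrative)."""
    # LiDAR range noise: std drawn from Uniform[0.01, 0.05] m
    sigma = rng.uniform(0.01, 0.05)
    noisy_ranges = lidar_ranges + rng.normal(0.0, sigma, size=len(lidar_ranges))

    # Position noise: Uniform[0, 0.1] m offset (applied per axis here;
    # the table lists the x component)
    noisy_position = position_xy + rng.uniform(0.0, 0.1, size=2)

    # Heading noise: zero-mean Gaussian, sigma = 0.03 rad
    noisy_heading = heading_rad + rng.normal(0.0, 0.03)
    return noisy_ranges, noisy_position, noisy_heading

def randomize_action(action):
    """Actuator noise: zero-mean Gaussian, sigma = 0.05, on normalized commands."""
    return np.clip(action + rng.normal(0.0, 0.05, size=np.shape(action)), -1.0, 1.0)
```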
Parameter | Value
---|---
Discount factor | 0.99 |
Learning rate | 0.0001 |
Soft update factor τ | 0.005 |
Target policy noise σ | 0.02 |
Policy noise clip | 0.05 |
Actor update delay | 2 |
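For context, these values map onto the standard TD3 update of Fujimoto et al. (2018). The PyTorch sketch below shows where each table entry enters the algorithm; the network classes, optimizers, and replay buffer are omitted, and all function and variable names are illustrative assumptions.

```python
import torch

# TD3 hyperparameters from the table above
GAMMA = 0.99          # discount factor
LEARNING_RATE = 1e-4  # actor and critic learning rate
TAU = 0.005           # soft (Polyak) target-update factor
POLICY_NOISE = 0.02   # std of target policy smoothing noise
NOISE_CLIP = 0.05     # clipping bound for that noise
POLICY_DELAY = 2      # update the actor once per 2 critic updates

def td3_target(actor_t, critic1_t, critic2_t, next_state, reward, not_done):
    """Clipped double-Q target with target policy smoothing."""
    with torch.no_grad():
        next_action = actor_t(next_state)
        noise = (torch.randn_like(next_action) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        next_action = (next_action + noise).clamp(-1.0, 1.0)
        q_min = torch.min(critic1_t(next_state, next_action),
                          critic2_t(next_state, next_action))
        return reward + not_done * GAMMA * q_min

def soft_update(online_net, target_net, tau=TAU):
    """Polyak-average target parameters toward the online network."""
    for p, p_t in zip(online_net.parameters(), target_net.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```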
Parameter | Value
---|---
 | 0.5
 | 5.0
 | 0.2
 | 8.0
 | −5.0
 | −4.0
 | −1.0
 | −2.0
 | 1500
 | −1000
 | −500
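The table above lists reward values without their parameter symbols, so the mapping of values to specific terms is unknown here. As a rough illustration of how such weights typically enter a waypoint-tracking reward with collision avoidance, the sketch below combines progress, heading, clearance, and terminal terms; every name, and any pairing of a table value with a term, is a hypothetical assumption rather than the paper's formulation.

```python
def step_reward(progress, heading_err, min_clearance, reached_goal, collided,
                w_progress=0.5, w_heading=-0.2, w_clearance=-1.0,
                r_goal=1500.0, r_collision=-1000.0):
    """Hypothetical shaped reward; weight names and the assignment of
    table values to terms are illustrative guesses."""
    reward = (w_progress * progress                       # reward distance made good
              + w_heading * abs(heading_err)              # penalize heading error (rad)
              + w_clearance / max(min_clearance, 1e-3))   # penalize small obstacle clearance
    if reached_goal:
        reward += r_goal        # large terminal bonus
    if collided:
        reward += r_collision   # large terminal penalty
    return reward
```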
Parameter | Value (m)
---|---
Cylinder radius | 2, 3, 4, 4.5, 5 |
Cube width × length | 3 × 6, 4 × 7, 5 × 6, 5 × 5, 7 × 7 |
Sphere radius | 2, 2.5, 3, 3.5, 4 |
Tetrapod height | 4.5, 5.0, 5.5, 6.0, 6.5 |
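One straightforward way to use the size options above is to treat each row as a discrete choice set when populating a map. The Python sketch below draws random static obstacles accordingly; the map extent and uniform placement are assumptions, since the paper's exact generator is not given here.

```python
import random

# Obstacle size options from the table (metres)
CYLINDER_RADII = [2, 3, 4, 4.5, 5]
CUBE_FOOTPRINTS = [(3, 6), (4, 7), (5, 6), (5, 5), (7, 7)]   # width x length
SPHERE_RADII = [2, 2.5, 3, 3.5, 4]
TETRAPOD_HEIGHTS = [4.5, 5.0, 5.5, 6.0, 6.5]

def sample_obstacle(map_half_extent=100.0):
    """Draw one random obstacle: a shape, a size from its option list,
    and a uniform position (map_half_extent is an assumed value)."""
    shape = random.choice(["cylinder", "cube", "sphere", "tetrapod"])
    size = {
        "cylinder": random.choice(CYLINDER_RADII),
        "cube": random.choice(CUBE_FOOTPRINTS),
        "sphere": random.choice(SPHERE_RADII),
        "tetrapod": random.choice(TETRAPOD_HEIGHTS),
    }[shape]
    x = random.uniform(-map_half_extent, map_half_extent)
    y = random.uniform(-map_half_extent, map_half_extent)
    return {"shape": shape, "size": size, "position": (x, y)}

# e.g. 60 obstacles for the training map, per the train/test table
training_obstacles = [sample_obstacle() for _ in range(60)]
```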
Category | Train | Test
---|---|---
Map | Stage 1 | Stages 2–3
Obstacle Count | 60 | 80 (Stage 2), 105 (Stage 3) |
Waypoint Distance [m] | 50 | 50 (Stage 2), 80 (Stage 3) |
Current | None | ≤0.2 m/s (random direction) |
Wind | None | ≤5 m/s (random direction) |
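The train/test split above can be captured as a small per-stage configuration object. The sketch below encodes the table and samples the bounded random-direction current and wind applied in the test stages; the class layout and all names are illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class StageConfig:
    """Per-stage scenario settings from the train/test table.
    Disturbance fields are upper bounds; direction is sampled uniformly."""
    obstacle_count: int
    waypoint_distance_m: float
    max_current_mps: float
    max_wind_mps: float

STAGES = {
    "train":  StageConfig(60, 50.0, 0.0, 0.0),    # Stage 1 map, no disturbance
    "stage2": StageConfig(80, 50.0, 0.2, 5.0),
    "stage3": StageConfig(105, 80.0, 0.2, 5.0),
}

def sample_disturbance(cfg: StageConfig):
    """Draw current and wind vectors with random direction and magnitude
    up to the stage's bounds (<=0.2 m/s current, <=5 m/s wind in tests)."""
    def polar(max_speed):
        speed = random.uniform(0.0, max_speed)
        theta = random.uniform(0.0, 2.0 * math.pi)
        return speed * math.cos(theta), speed * math.sin(theta)
    return {"current": polar(cfg.max_current_mps), "wind": polar(cfg.max_wind_mps)}
```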
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).