Global Path Planning for Land–Air Amphibious Biomimetic Robot Based on Improved PPO
Abstract
1. Introduction
2. Land–Air Amphibious Biomimetic Robot Platform
2.1. Land–Air Amphibious Biomimetic Robot Hardware Platform
2.2. Kinematic Consistency and Energy Analysis
3. The PPO Algorithm
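For reference, the clipped surrogate objective at the core of PPO can be sketched in its standard form (following Schulman et al.; the notation below is the conventional one and is not reproduced from this paper's own numbered equations):

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[
\min\!\left( r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right)
\right]
```

Here $\hat{A}_t$ is the advantage estimate (computed with GAE in this paper) and $\epsilon$ is the clipping range (0.2 in the experiments below).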
4. Improved PPO Algorithm for Land–Air Environments
4.1. Markov Decision Process of Land–Air Robot
4.1.1. State Space
4.1.2. Action Space
4.1.3. Reward Function
1. Distance reward. The distance reward encourages the land–air amphibious biomimetic robot to approach its target by measuring the change in the Euclidean distance between the robot and the target point from the previous timestep to the current one, thereby rewarding progress made during path planning. Its mathematical expression is:
2. Altitude penalty. The land–air amphibious biomimetic robot has two modes, flight and ground travel, and the energy consumption of the flight mode is significantly higher than that of the ground mode. To reduce the robot's energy consumption and encourage ground travel, this paper introduces an altitude penalty. Its mathematical expression is:
3. Collision penalty. To ensure the safety of path planning in complex environments, this paper introduces a collision penalty that discourages collisions between the robot and obstacles. Its mathematical expression is:
4. Time penalty. To encourage the robot to complete path planning efficiently, this paper introduces a time penalty that prompts the robot to reach the target point quickly while minimizing unnecessary movements, thereby reducing mission execution time. Its mathematical expression is:
5. Smoothness reward. In continuous control tasks, excessively abrupt action variations may lead to system instability, increased energy consumption, and trajectory oscillations. To ensure the smoothness of the robot's motions, this paper introduces a smoothness reward. Its mathematical expression is:
6. Obstacle traversal reward. To address scenarios where the robot encounters impassable obstacles in ground mode, this paper introduces an obstacle traversal reward that incentivizes the robot to switch to flight mode when necessary for obstacle clearance. Its mathematical expression is:
7. Terminal reward. When the robot successfully reaches the target point, a large positive reward is provided to incentivize task completion. Its mathematical expression is:
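The reward terms above can be sketched as a single step-reward function. This is a minimal illustration only: the weights, thresholds, and function names below are hypothetical, not the paper's values, and the obstacle traversal reward is omitted for brevity.

```python
# Hypothetical sketch of the composite reward described above.
# All weights (w_dist, w_alt, w_time, w_smooth) and the collision/goal
# rewards are illustrative defaults, not the paper's settings.
def composite_reward(prev_dist, dist, altitude, collided, reached, action_delta,
                     w_dist=1.0, w_alt=0.2, w_time=0.05, w_smooth=0.1,
                     r_collision=-50.0, r_goal=100.0):
    r = w_dist * (prev_dist - dist)                   # distance reward: progress toward goal
    r -= w_alt * max(altitude, 0.0)                   # altitude penalty: discourage flight mode
    r -= w_time                                       # time penalty: fixed per-step cost
    r -= w_smooth * sum(d * d for d in action_delta)  # smoothness reward: penalize abrupt actions
    if collided:
        r += r_collision                              # collision penalty
    if reached:
        r += r_goal                                   # terminal reward
    return r
```

A step that moves 1 m closer to the goal on the ground yields a small positive reward, while a collision dominates everything else, which matches the intent of the terms above.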
4.2. Improved Strategy Network with GRU
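For context, the standard GRU update equations (the generic formulation, not this paper's numbered formulas) are:

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

where $z_t$ is the update gate, $r_t$ the reset gate, and $\odot$ denotes element-wise multiplication; the hidden state $h_t$ lets the policy network retain a memory of past observations.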
4.3. OU Random Noise
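A minimal sketch of an Ornstein–Uhlenbeck noise process for temporally correlated action exploration follows. The parameters `theta`, `sigma`, `mu`, and `dt` are illustrative defaults, not the paper's settings.

```python
import random

class OUNoise:
    """Ornstein-Uhlenbeck process via an Euler-Maruyama discretization:
    dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=0.01, seed=0):
        self.dim, self.mu, self.theta, self.sigma, self.dt = dim, mu, theta, sigma, dt
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.reset()

    def reset(self):
        # Start each episode at the long-run mean mu.
        self.state = [self.mu] * self.dim

    def sample(self):
        # One Euler-Maruyama step: mean reversion plus Gaussian diffusion.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
              + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

Unlike i.i.d. Gaussian noise, consecutive OU samples are correlated, which produces smoother exploratory action sequences for a physical robot.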
4.4. Improved Value Network with Self-Attention Mechanism
- Linear transformation: Assume the input sequence is $X \in \mathbb{R}^{N \times D}$, where $N$ is the length of the input sequence and $D$ is the feature dimension. For each attention head $i$, different linear transformation matrices $W_i^Q$, $W_i^K$, and $W_i^V \in \mathbb{R}^{D \times d_k}$ yield the query matrix $Q_i = X W_i^Q$, key matrix $K_i = X W_i^K$, and value matrix $V_i = X W_i^V$. The calculation formula is shown in Formula (27).
- Attention weights: For each attention head, the dot product between the query matrix and the transpose of the key matrix is computed, divided by the scaling factor $\sqrt{d_k}$, and normalized by the softmax function to obtain the attention weight matrix $A_i = \mathrm{softmax}\!\left(Q_i K_i^\top / \sqrt{d_k}\right)$. The calculation formula is shown in Formula (28).
- Weighted summation: The attention weight matrix is multiplied by the value matrix to obtain the output of each head, $\mathrm{head}_i = A_i V_i$. The calculation formula is shown in Formula (29).
- Concatenation and fusion: The outputs of all $h$ attention heads are concatenated and then transformed through a linear projection matrix $W^O$ to produce the final output, $\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O$. The calculation formula is shown in Formula (30).
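The four steps above can be sketched as a single forward pass. This is an illustrative NumPy implementation under assumed shapes (weights passed in explicitly, one matrix per head), not the paper's network code:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """X: (N, D); Wq/Wk/Wv: lists of h matrices of shape (D, d_k); Wo: (h*d_k, D)."""
    heads = []
    for i in range(h):
        # Step 1: linear transformation into queries, keys, values.
        Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]
        d_k = Q.shape[-1]
        # Step 2: scaled dot-product scores, then row-wise softmax.
        scores = Q @ K.T / np.sqrt(d_k)
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A = A / A.sum(axis=-1, keepdims=True)   # attention weight matrix A_i
        # Step 3: weighted summation over the values.
        heads.append(A @ V)
    # Step 4: concatenate all heads and fuse with the output projection.
    return np.concatenate(heads, axis=-1) @ Wo
```

The output has the same shape as the input sequence, so the block can be dropped into the value network between existing layers.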
Algorithm 1: Improved Proximal Policy Optimization (IPPO) for Amphibious Path Planning (pseudocode figure omitted).
5. Experimental Results and Analysis
5.1. Environment and Parameter Configuration
5.2. Ablation Experiment
5.3. Controlled Experiment
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PPO | Proximal Policy Optimization |
| IPPO | Improved Proximal Policy Optimization |
| GRU | Gated Recurrent Unit |
References
- Kim, C.; Lee, K.; Ryu, S.; Seo, T. Amphibious robot with self-rotating paddle-wheel mechanism. IEEE/ASME Trans. Mechatron. 2023, 28, 1836–1843.
- Zhang, H.; Zhu, Y.; Yang, J.; Zhao, J. A bioinspired paddle-wheeled leg amphibious robot with environment-adaptive autonomously. IEEE/ASME Trans. Mechatron. 2024, 30, 15–26.
- Chen, L.; Cui, R.; Yan, W.; Xu, H.; Zhang, S.; Yu, H. Terrain-adaptive locomotion control for an underwater hexapod robot: Sensing leg–terrain interaction with proprioceptive sensors. IEEE Robot. Autom. Mag. 2023, 31, 41–52.
- Speciale, C.; Milana, S.; Carcaterra, A.; Concilio, A. A Review of Bio-Inspired Perching Mechanisms for Flapping-Wing Robots. Biomimetics 2025, 10, 666.
- Zhang, D.; Xu, M.; Zhu, P.; Guo, C.; Zhong, Z.; Lu, H.; Zheng, Z. The development of a novel terrestrial/aerial robot: Autonomous quadrotor tilting hybrid robot. Robotica 2024, 42, 118–138.
- Alexander, A.; Venkatesan, K.; Mounsef, J.; Ramanujam, K. A Comprehensive Survey of Path Planning Algorithms for Autonomous Systems and Mobile Robots: Traditional and Modern Approaches. IEEE Access 2025, 13, 176287–176326.
- Qin, H.; Shao, S.; Wang, T.; Yu, X.; Jiang, Y.; Cao, Z. Review of autonomous path planning algorithms for mobile robots. Drones 2023, 7, 211.
- Fu, X.; Huang, Z.; Zhang, G.; Wang, W.; Wang, J. Research on path planning of mobile robots based on improved A* algorithm. PeerJ Comput. Sci. 2025, 11, e2691.
- Wang, X.; Xu, H.; Miao, P.; Petrovic, B.; Rodic, A.; Wang, Z. Path Planning for Mobile Robots Based on Improved A* Algorithm. In Proceedings of the International Conference on Robotics, Automation and Intelligent Control (ICRAIC), Zhangjiajie, China, 22–24 December 2023; pp. 382–387.
- He, X.; Zhou, Y.; Liu, H.; Shang, W. Improved RRT*-Connect Manipulator Path Planning in a Multi-Obstacle Narrow Environment. Sensors 2025, 25, 2364.
- Yan, P.; Yan, Z.; Zheng, H.; Guo, J. Real Time Robot Path Planning Method Based on Improved Artificial Potential Field Method. In Proceedings of the Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 4814–4820.
- Li, Y.; Zhu, Q. Local path planning based on improved Dynamic window approach. In Proceedings of the Chinese Control Conference (CCC), Shanghai, China, 26–28 July 2021; pp. 4291–4295.
- Xu, X.; Zeng, J.; Zhao, Y.; Lü, X. Research on global path planning algorithm for mobile robots based on improved A. Expert Syst. Appl. 2024, 243, 122922.
- Bai, Z.; Pang, H.; He, Z.; Zhao, B.; Wang, T. Path Planning of Autonomous Mobile Robot in Comprehensive Unknown Environment Using Deep Reinforcement Learning. IEEE Internet Things J. 2024, 11, 22153–22166.
- Zhang, Y.; Chen, P. Path Planning of a Mobile Robot for a Dynamic Indoor Environment Based on an SAC-LSTM Algorithm. Sensors 2023, 23, 9802.
- Nie, J.; Zhang, G.; Lu, X.; Wang, H.; Sheng, C.; Sun, L. Reinforcement learning method based on sample regularization and adaptive learning rate for AGV path planning. Neurocomputing 2025, 614, 128820.
- Fei, W.; Xiaoping, Z.; Zhou, Z.; Yang, T. Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments. Chin. J. Aeronaut. 2024, 37, 237–257.
- Qi, C.; Wu, C.; Lei, L.; Li, X.; Cong, P. UAV path planning based on the improved PPO algorithm. In Proceedings of the Asia Conference on Advanced Robotics, Automation, and Control Engineering (ARACE), Qingdao, China, 26–28 August 2022.
- Chen, S.; Mo, Y.; Wu, X.; Xiao, J.; Liu, Q. Reinforcement Learning-Based Energy-Saving Path Planning for UAVs in Turbulent Wind. Electronics 2024, 13, 3190.
- Cao, X.; Zhang, J.; Xiang, Y.; Yan, Z. SAC-based path planning for amphibious UAVs: A maximum entropy deep reinforcement learning approach. In Proceedings of the International Conference on Image Processing, Intelligent Control and Computer Engineering, Hangzhou, China, 19–25 October 2025.
- Mondal, M.S.; Ramasamy, S.; Pranav, A. An Attention-Aware Deep Reinforcement Learning Framework for Cooperative UAV-UGV Routing. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 13687–13693.
- Zhou, X.; Zhong, H.; Zhang, H.; He, W.; Hua, H.; Wang, Y. Current Status, Challenges, and Prospects for New Types of Aerial Robots (NTARs). Engineering 2024, 41, 19–34.
- Mandralis, I.; Nemovi, R.; Ramezani, A.; Murray, R.M.; Gharib, M. ATMO: An aerially transforming morphobot for dynamic ground-aerial transition. Commun. Eng. 2025, 4, 74.
- Fang, F.; Zhou, J.; Zhang, Y.; Yi, Y.; Huang, Z.; Feng, Y.; Tao, K.; Li, W.; Zhang, W. A Multimodal Amphibious Robot Driven by Soft Electrohydraulic Flippers. Cyborg Bionic Syst. 2025, 6, 0253.
- Chen, L.; Xiao, J.; Teo, C.W.R.; Li, J.; Feroskhan, M. Air–Ground Collaborative Control for Angle-Specified Heterogeneous Formations. IEEE Trans. Intell. Veh. 2025, 10, 1483–1497.
- Liang, D.; Huang, X.; Xue, Z.; Li, P. Path planning for amphibious unmanned ground vehicles under cross-domain constraints. Intel. Serv. Robot. 2025, 18, 1381–1416.
- Singh, B.; Patel, S.; Vijayvargiya, A.; Kumar, R. Data-driven gait model for bipedal locomotion over continuous changing speeds and inclines. Auton. Robot. 2023, 47, 753–769.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
| Category | Parameter | Value/Description |
|---|---|---|
| Simulation Platform | Physics Engine | SOFA |
| Global Environment | Map Dimensions | 50 m × 50 m × 20 m |
| | Obstacle Distribution | Spatially dispersed, locally clustered |
| | Obstacle Density | 3–5% |
| | Obstacle Geometry | Cylinders: radius 0.5–2.0 m; cuboids: edge length 1.0–3.0 m; walls and irregular shapes |
| Computational Setup | Hardware | CPU: Intel Core i7-14700KF; GPU: NVIDIA GeForce RTX 4060 Ti; RAM: 32 GB |
| | Software Stack | Ubuntu 20.04; Python 3.8; PyTorch 1.7.0 |
| Parameter | Value |
|---|---|
| Training episodes | 4100 |
| Discount factor | 0.995 |
| Learning rate | 0.00025 |
| Clipping range | 0.2 |
| GAE factor | 0.95 |
| Batch size | 256 |
| Maximum steps per episode | 800 |
| Optimizer | Adam |
| | (1.0, 5.0, 0.2, 100) |
| | (0.1, 0.1, 50, 0.5) |
| Method | GRU | OU | Self-Attention |
|---|---|---|---|
| PPO | ✗ | ✗ | ✗ |
| PPO_ATTEN | ✗ | ✗ | ✓ |
| PPO_OU | ✗ | ✓ | ✗ |
| PPO_GRU | ✓ | ✗ | ✗ |
| IPPO (Ours) | ✓ | ✓ | ✓ |
| Env | Algorithm | Average Path Length/m | Average Flight Path Length/m | Average Ground Path Length/m | Average Energy Consumption/kJ |
|---|---|---|---|---|---|
| Env 1 | DDPG | 74.049 | 69.748 | 4.301 | 19.052 |
| | PPO | 76.041 | 72.536 | 3.505 | 19.769 |
| | IPPO (Ours) | 79.164 | 24.702 | 54.462 | 9.204 |
| Env 2 | DDPG | 83.927 | 80.974 | 2.953 | 22.024 |
| | PPO | 72.087 | 69.675 | 2.412 | 18.945 |
| | IPPO (Ours) | 81.280 | 39.705 | 41.575 | 12.661 |
| Env 3 | DDPG | 71.816 | 68.985 | 2.831 | 18.778 |
| | PPO | 73.086 | 62.639 | 10.447 | 17.416 |
| | IPPO (Ours) | 75.181 | 28.550 | 46.631 | 9.881 |
| Env | Algorithm | Average Reward | Average Success Rate (%) |
|---|---|---|---|
| Env 1 | DDPG | 122.4 | 82.6 |
| | PPO | 120.6 | 84.0 |
| | IPPO (Ours) | 162.2 | 93.0 |
| Env 2 | DDPG | 112.6 | 70.5 |
| | PPO | 117.5 | 73.3 |
| | IPPO (Ours) | 154.1 | 89.0 |
| Env 3 | DDPG | 103.6 | 66.0 |
| | PPO | 113.2 | 70.8 |
| | IPPO (Ours) | 152.3 | 85.7 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Jiang, W.; Liu, J.; Wang, W.; Wang, Y. Global Path Planning for Land–Air Amphibious Biomimetic Robot Based on Improved PPO. Biomimetics 2026, 11, 25. https://doi.org/10.3390/biomimetics11010025