Lightweight Obstacle Avoidance for Fixed-Wing UAVs Using Entropy-Aware PPO
Abstract
1. Introduction
- Accounting for the flight stability and dynamic constraints of fixed-wing UAVs, we formulate obstacle avoidance as an entropy-aware PPO learning problem whose reward function balances target approach against path maintenance, yielding smooth and efficient collision-avoidance trajectories.
- We introduce an entropy-aware strategy-updating mechanism that counters PPO's tendency to converge to local optima when it relies heavily on historical data during training, steering the algorithm toward obstacle-avoidance strategies with higher success rates.
- Through software-in-the-loop and hardware-in-the-loop experiments, we show that the proposed framework outperforms competing methods in obstacle-avoidance efficiency and flight-path smoothness, and we confirm that the algorithm runs feasibly on edge devices.
2. Related Work
2.1. Fixed-Wing UAV Collision Avoidance
2.2. DRL for Visual Navigation
3. Methodology
3.1. Problem Formulation
3.1.1. State Space
3.1.2. Action Space
3.1.3. Reward Function
4. Lightweight Obstacle Avoidance Using Entropy-Aware PPO
4.1. Overview
4.2. Strategy Selection Mechanism
4.2.1. Balance Exploration and Exploitation
4.2.2. Lowering Sensitivity to Prior Knowledge
4.3. Entropy-Aware PPO Method
Algorithm 1 Entropy-Aware PPO for UAV Obstacle Avoidance
Initialization: Initialize actor network π_θ and critic network V_ϕ.
Initialize high-quality experience buffer D.
Define environment and parameters: max_episodes, max_timesteps.
Define hyperparameters: learning rate α, discount factor γ, PPO clip ϵ, reward weights w1–w4, value-loss coefficient c_v, entropy coefficient β_max, guidance coefficient λ.
Load pre-trained lightweight depth model f_d and visual encoder f_e.
for episode = 1 to max_episodes do
    Reset environment and UAV position; clear episode buffer B.
    for timestep t = 1 to max_timesteps do
        Capture RGB image I_t; compute depth map M_t = f_d(I_t).
        Encode visual features z_t = f_e(M_t).
        Get relative distance d_t and angle α_t to the target; form state s_t = (z_t, d_t, α_t).
        Get action probability and value: a_t ∼ π_θ(·|s_t), v_t = V_ϕ(s_t).
        Execute a_t; observe s_{t+1}, collision status c_t, goal status g_t.
        Compute rewards:
            r_goal = w1 if the target is reached, else 0;
            r_coll = w2 if a collision occurs, else 0;
            r_path = w3 · (target-approach term) + w4 · (path-maintenance term);
        Total reward: r_t = r_goal + r_coll + r_path
        Store (s_t, a_t, r_t, s_{t+1}, log π_θ(a_t|s_t)) in B
        if r_t is high or the transition is successful then
            Add the transition to D
        end if
        if collision, target reached, or timestep limit then
            break
        end if
    end for
    for epoch = 1 to K do
        Compute advantage estimates Â_t using GAE
        Compute adaptive entropy coefficient β_t from the current policy entropy H(π_θ(·|s_t))
        PPO clipped loss: L_clip(θ) = −E_t[ min( ρ_t(θ) Â_t, clip(ρ_t(θ), 1−ϵ, 1+ϵ) Â_t ) ], with ρ_t(θ) = π_θ(a_t|s_t) / π_θ_old(a_t|s_t)
        Value function loss: L_v(ϕ) = E_t[ (V_ϕ(s_t) − R_t)^2 ]
        Entropy loss: L_H(θ) = −β_t E_t[ H(π_θ(·|s_t)) ]
        Replay buffer imitation loss: L_D(θ) = −E_{(s,a)∼D}[ log π_θ(a|s) ]
        Total loss: L(θ, ϕ) = L_clip(θ) + c_v L_v(ϕ) + L_H(θ) + λ L_D(θ)
        Update θ and ϕ using Adam on L(θ, ϕ)
    end for
    Clear episode buffer B
end for
Output: Trained actor policy π_θ
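To make the update step concrete, the following is a minimal PyTorch sketch of the per-epoch loss in Algorithm 1, assuming a discrete action space with a Categorical policy head. The function name, the argument layout, and the specific form of the adaptive coefficient β are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def entropy_aware_ppo_loss(actor, critic, batch, hq_batch,
                           clip_eps=0.3, c_v=0.5, beta_max=0.1, lam=0.1):
    """Combined loss of Algorithm 1: clipped surrogate + value regression
    + adaptive entropy bonus + imitation term on high-quality transitions."""
    states, actions, old_log_probs, returns, advantages = batch

    # Evaluate the current policy on the collected on-policy batch.
    logits = actor(states)                                   # (B, action_dim)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)              # rho_t(theta)

    # PPO clipped surrogate objective, negated so it acts as a loss.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value-function regression loss.
    value_loss = F.mse_loss(critic(states).squeeze(-1), returns)

    # Adaptive entropy coefficient: grow the exploration bonus as the policy
    # entropy drops toward zero (one plausible form of the paper's adjustment).
    entropy = dist.entropy().mean()
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    beta = (beta_max * (1.0 - entropy / max_entropy)).clamp(0.0, beta_max).detach()
    entropy_loss = -beta * entropy

    # Imitation (log-likelihood) loss on high-quality replay transitions.
    hq_states, hq_actions = hq_batch
    hq_dist = torch.distributions.Categorical(logits=actor(hq_states))
    imitation_loss = -hq_dist.log_prob(hq_actions).mean()

    return policy_loss + c_v * value_loss + entropy_loss + lam * imitation_loss
```

In use, this loss would be recomputed for each of the K update epochs on the episode buffer, followed by `loss.backward()` and an Adam step on the actor and critic parameters.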
5. Experimental Validation
5.1. Training Settings
5.2. Ablation Studies
5.2.1. Inferred Reward Function
5.2.2. Adaptive Entropy
5.3. Policy Comparison
5.4. Hardware-in-the-Loop Simulation
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
| Parameter | Value |
|---|---|
| Air Speed (m/s) | 30 |
| Depth Map Size (pixels) | 224 × 224 |
| Reward Term Weights (w1, w2, w3, w4) | 30, −30, 0.5, 1.0 |
| Flying Distance Cap (m) | 1300 |
| Learning Rate (α) | 0.0003 |
| Discount Factor (γ) | 0.95 |
| Clip Range (ϵ) | 0.3 |
| K Epochs | 2 |
| Batch Size | 2048 |
| Value Loss Coefficient (c_v) | 0.5 |
| Entropy Loss Coefficient (β_max) | 0.1 |
| Max Timesteps per Episode | 60 |
| Max Episodes | 3000 |
| State Dimension | 256 |
| Action Dimension | 8 |
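As a reference for how these settings plug into training, the following is a minimal sketch that collects them into a single Python configuration object; the dataclass and its field names are illustrative assumptions, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Training settings mirroring the table above (illustrative names)."""
    air_speed_mps: float = 30.0
    depth_map_size: tuple = (224, 224)
    reward_weights: tuple = (30.0, -30.0, 0.5, 1.0)  # w1..w4
    flying_distance_cap_m: float = 1300.0
    learning_rate: float = 3e-4
    gamma: float = 0.95            # discount factor
    clip_range: float = 0.3        # PPO clip epsilon
    k_epochs: int = 2
    batch_size: int = 2048
    value_loss_coef: float = 0.5
    entropy_loss_coef: float = 0.1
    max_timesteps_per_episode: int = 60
    max_episodes: int = 3000
    state_dim: int = 256
    action_dim: int = 8
```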
| Scheme | Entropy-Aware Loss | Inferred Reward | Adaptive Entropy | Collision Rate | Failure-to-Reach Rate | Episode Length |
|---|---|---|---|---|---|---|
| A (baseline) | – | – | – | 73% | 16% | 69.8 |
| B | ✓ | ✓ | – | 17% | 15% | 151.4 |
| C | ✓ | – | – | 13% | 21% | 173.7 |
| D | – | ✓ | – | 9% | 11% | 130.4 |
| E (complete) | ✓ | ✓ | ✓ | 5% | 9% | 121.6 |
| Success Rate (%, ↑) | Scene I: City | Scene II: Line-Cruising | Scene III: Valley |
|---|---|---|---|
| Proposed | 86.0 | 80.0 | 74.0 |
| PPO | 82.0 | 76.0 | 69.0 |
| TRPO | 80.0 | 74.0 | 68.0 |
| A3C | 78.0 | 72.0 | 66.0 |
| DQN | 77.0 | 70.0 | 64.0 |
| DDPG | 76.0 | 68.0 | 62.0 |
| Indicator | Scene I | Scene II | Scene III |
|---|---|---|---|
| Success Rate (Proposed) | 83% | 80% | 74% |
| Success Rate (Baseline) | 72% | 86% | – |
| Avg. Path Length (m) (Proposed) | 952.3 | 1980.2 | 1727.8 |
| Avg. Path Length (m) (Baseline) | 1080.6 | 1806.3 | – |