Temporally-Aware Deep Reinforcement Learning for Dynamic Obstacle Avoidance in UAVs
Highlights
- A compact two-frame multi-layer light detection and ranging (LiDAR) representation and a convolutional neural network–long short-term memory (CNN-LSTM) recurrent proximal policy optimization (Recurrent PPO) policy are developed to extract local geometric structures and short-term dynamic cues for unmanned aerial vehicle (UAV) obstacle avoidance.
- A velocity-projection action shield corrects high-risk policy outputs during both training and execution. In simulation, the framework achieves an average success rate of 91.88% and an average online computation time of 0.78 ms across four test configurations.
- Lightweight temporal LiDAR observations can improve local navigation and short-term collision awareness without explicit obstacle tracking or trajectory prediction.
- The proposed framework provides a computationally efficient obstacle-avoidance strategy that reduces collision risk for small UAVs in mixed static–dynamic environments.
Abstract
1. Introduction
- A lightweight temporal LiDAR representation is designed for UAV dynamic obstacle avoidance. By combining sector-wise minimum pooling, two-frame stacking, and low-dimensional navigation features, the proposed representation preserves local obstacle geometry and short-term dynamic cues while maintaining a compact input dimension.
- A CNN-LSTM-based recurrent policy is trained using Recurrent PPO for UAV velocity control in partially observable dynamic environments. The two-frame observation provides explicit short-term variation information, and the LSTM hidden state further encodes historical observation dependencies, thereby alleviating the myopic behavior of single-frame feedforward policies.
- A velocity-projection action shield is incorporated into both training and execution. The shield smoothly corrects high-risk actions generated by the policy, and a shield-correction term is added to the reward function. Together, these reduce collision risk in the tested simulation scenarios and encourage the policy to produce fewer high-risk actions.
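The first contribution above (sector-wise minimum pooling, two-frame stacking, and low-dimensional navigation features) can be pictured with a minimal sketch. The array shapes, the sector count, the normalization by a maximum range, and the function names `sector_min_pool` and `build_observation` are illustrative assumptions; they are not taken from the paper.

```python
import numpy as np


def sector_min_pool(ranges, n_sectors, max_range):
    """Pool each LiDAR layer into n_sectors by keeping the nearest return.

    ranges: array of shape (n_layers, n_beams), in metres (assumed layout).
    Returns an array of shape (n_layers, n_sectors) scaled to [0, 1].
    """
    ranges = np.asarray(ranges, dtype=np.float32)
    # Treat missing or infinite returns as "nothing within max_range".
    ranges = np.nan_to_num(ranges, nan=max_range, posinf=max_range)
    ranges = np.clip(ranges, 0.0, max_range)
    n_layers, n_beams = ranges.shape
    if n_beams % n_sectors != 0:
        raise ValueError("n_beams must be divisible by n_sectors")
    beams_per_sector = n_beams // n_sectors
    pooled = ranges.reshape(n_layers, n_sectors, beams_per_sector).min(axis=2)
    return pooled / max_range


def build_observation(prev_pooled, curr_pooled, nav_features):
    """Stack two pooled frames and attach the navigation feature vector."""
    lidar = np.stack([prev_pooled, curr_pooled], axis=0)  # (2, layers, sectors)
    nav = np.asarray(nav_features, dtype=np.float32)
    return {"lidar": lidar, "nav": nav}


# Hypothetical example: 2 layers, 8 beams, 4 sectors, 5 m maximum range.
prev_scan = [[5, 2, 5, 5, 3, 5, 5, 5], [5, 5, 5, 4, 5, 5, 5, 5]]
curr_scan = [[5, 1, 5, 5, 2, 5, 5, 5], [5, 5, 5, 3, 5, 5, 5, 4]]
prev_pooled = sector_min_pool(prev_scan, n_sectors=4, max_range=5.0)
curr_pooled = sector_min_pool(curr_scan, n_sectors=4, max_range=5.0)
obs = build_observation(prev_pooled, curr_pooled, nav_features=[0.5, 0.0, 0.1])
# A negative frame difference marks a sector whose nearest return got closer.
closing = obs["lidar"][1] - obs["lidar"][0]
```

Minimum pooling keeps the nearest return in each sector, so a thin obstacle is not averaged away, and the difference between the two stacked frames gives a crude closing-rate cue without any explicit tracking.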
2. Related Work
2.1. Conventional Obstacle-Avoidance Methods
2.2. Learning-Based Obstacle-Avoidance Methods
2.3. Temporal Modeling and Action Filtering in Reinforcement Learning
3. Materials and Methods
3.1. Problem Formulation
3.2. LiDAR State Representation
3.3. Overall Framework
3.4. Recurrent PPO
3.5. Curriculum Learning and Goal-Guided Action Fusion
3.6. Velocity-Projection Safety Shield
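The contribution list describes the shield as smoothly correcting high-risk velocity commands and feeding the size of that correction back into the reward. One plausible reading of "velocity projection" is sketched below: the component of the commanded velocity that points toward a nearby obstacle is removed, with a weight that grows as the obstacle gets closer. The blending rule, the thresholds `d_safe` and `d_min`, and the function name are assumptions made for illustration; only the 0.6 m/s speed limit comes from the simulation parameter table.

```python
import numpy as np


def velocity_projection_shield(v_cmd, rel_obstacles, d_safe=1.0, d_min=0.3,
                               v_max=0.6):
    """Attenuate the part of v_cmd that approaches nearby obstacles.

    v_cmd: commanded velocity (m/s).
    rel_obstacles: obstacle positions relative to the UAV (m), one per row.
    d_safe, d_min: hypothetical distances where correction starts / is full.
    Returns the corrected velocity and the magnitude of the total change.
    """
    v_in = np.asarray(v_cmd, dtype=float)
    v = v_in.copy()
    for p in np.atleast_2d(np.asarray(rel_obstacles, dtype=float)):
        dist = np.linalg.norm(p)
        if dist >= d_safe or dist < 1e-9:
            continue  # too far to matter, or no usable direction
        n = p / dist  # unit vector from UAV toward the obstacle
        approach = float(v @ n)
        if approach <= 0.0:
            continue  # already moving away or tangentially
        # Weight rises linearly from 0 at d_safe to 1 at d_min.
        w = np.clip((d_safe - dist) / (d_safe - d_min), 0.0, 1.0)
        v -= w * approach * n
    speed = np.linalg.norm(v)
    if speed > v_max:
        v *= v_max / speed  # respect the speed limit
    correction = float(np.linalg.norm(v - v_in))
    return v, correction


# Hypothetical usage: flying straight at an obstacle 0.65 m ahead.
v_safe, corr = velocity_projection_shield([0.6, 0.0], [[0.65, 0.0]])
# A shield-correction penalty could then be formed as, e.g., -k * corr,
# where k is a tuning weight that is not specified here.
```

Because only the approaching component is scaled down, motion tangential to an obstacle passes through unchanged, which is one way a shield can intervene without freezing the vehicle.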
3.7. Reward Function
4. Results
4.1. Simulation Environment and Parameters
4.2. Curriculum Learning and Training Procedure
4.3. Ablation Experiments
4.4. Baseline Comparison
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Wang, T.; Yang, L.; Chang, Y.; Huang, Z.; Jiang, H.; Zheng, Y. A review of dynamic obstacle avoidance for unmanned aerial vehicles (UAVs). In Proceedings of the 2024 7th International Symposium on Autonomous Systems (ISAS), Chongqing, China, 7–9 May 2024; pp. 274–279.
2. Merei, A.; Mcheick, H.; Ghaddar, A.; Rebaine, D. A survey on obstacle detection and avoidance methods for UAVs. Drones 2025, 9, 203.
3. Xia, W.; Song, F.; Peng, Z. Dynamic obstacle perception technology for UAVs based on LiDAR. Drones 2025, 9, 540.
4. Xu, Z.; Jin, H.; Han, X.; Shen, H.; Shimada, K. Intent prediction-driven model predictive control for UAV planning and navigation in dynamic environments. IEEE Robot. Autom. Lett. 2025, 10, 4946–4953.
5. Memlikai, G.; Tsintotas, K.A. Reinforcement learning for UAV control: From algorithms to deployment readiness. Machines 2026, 14, 177.
6. Zhou, X.; Wang, Z.; Ye, H.; Xu, C.; Gao, F. EGO-Planner: An ESDF-free gradient-based local planner for quadrotors. IEEE Robot. Autom. Lett. 2021, 6, 478–485.
7. Xu, Z.; Xiu, Y.; Zhan, X.; Chen, B.; Shimada, K. Vision-aided UAV navigation and dynamic obstacle avoidance using gradient-based B-spline trajectory optimization. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 1214–1220.
8. Lu, M.; Fan, X.; Chen, H.; Lu, P. FAPP: Fast and adaptive perception and planning for UAVs in dynamic cluttered environments. IEEE Trans. Robot. 2025, 41, 871–886.
9. Tordesillas, J.; Lopez, B.T.; How, J.P. FASTER: Fast and safe trajectory planner for flights in unknown environments. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1934–1940.
10. Xu, Z.; Han, X.; Shen, H.; Jin, H.; Shimada, K. NavRL: Learning safe flight in dynamic environments. IEEE Robot. Autom. Lett. 2025, 10, 3668–3675.
11. Fan, X.; Lu, M.; Xu, B.; Lu, P. Flying in highly dynamic environments with end-to-end learning approach. IEEE Robot. Autom. Lett. 2025, 10, 3851–3858.
12. Liu, J.; Luo, W.; Zhang, G.; Li, R. Unmanned aerial vehicle path planning in complex dynamic environments based on deep reinforcement learning. Machines 2025, 13, 162.
13. Miera, P.; Szolc, H.; Kryjak, T. LiDAR-based drone navigation with reinforcement learning. arXiv 2023, arXiv:2307.14313.
14. Kaufmann, E.; Bauersfeld, L.; Loquercio, A.; Müller, M.; Koltun, V.; Scaramuzza, D. Champion-level drone racing using deep reinforcement learning. Nature 2023, 620, 982–987.
15. Loquercio, A.; Kaufmann, E.; Ranftl, R.; Müller, M.; Koltun, V.; Scaramuzza, D. Learning high-speed flight in the wild. Sci. Robot. 2021, 6, eabg5810.
16. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
17. Song, X. Design and application of an intelligent decision-making system for unmanned aerial vehicles based on deep reinforcement learning. IEEE Access 2025, 13, 171435–171441.
18. de Heuvel, J.; Zeng, X.; Shi, W.; Sethuraman, T.; Bennewitz, M. Spatiotemporal attention enhances lidar-based robot navigation in dynamic environments. IEEE Robot. Autom. Lett. 2024, 9, 4202–4209.
19. Hausknecht, M.; Stone, P. Deep recurrent Q-learning for partially observable MDPs. arXiv 2015, arXiv:1507.06527.
20. Singla, A.; Padakandla, S.; Bhatnagar, S. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge. IEEE Trans. Intell. Transp. Syst. 2021, 22, 107–118.
21. Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107.
22. LaValle, S.M. Rapidly-Exploring Random Trees: A New Tool for Path Planning; Technical Report TR 98-11; Department of Computer Science, Iowa State University: Ames, IA, USA, 1998.
23. Khatib, O. Real-time obstacle avoidance for manipulators and mobile robots. Int. J. Robot. Res. 1986, 5, 90–98.
24. Fiorini, P.; Shiller, Z. Motion planning in dynamic environments using velocity obstacles. Int. J. Robot. Res. 1998, 17, 760–772.
25. Xu, G.; Wu, T.; Wang, Z.; Wang, Q.; Gao, F. Flying on point clouds with reinforcement learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 19–25 October 2025; pp. 7231–7238.
26. Xie, Z.; Dames, P. DRL-VO: Learning to navigate through crowded dynamic scenes using velocity obstacles. IEEE Trans. Robot. 2023, 39, 2700–2719.
27. Xu, B.; Yan, Z.; Lu, M.; Fan, X.; Luo, Y.; Lin, Y.; Chen, Z.; Chen, Y.; Qiao, Q.; Lu, P. Flow-aided flight through dynamic clutters from point to motion. IEEE Robot. Autom. Lett. 2026, 11, 218–225.
28. Luo, W.; Wang, X.; Han, F.; Zhou, Z.; Cai, J.; Zeng, L.; Chen, H.; Chen, J.; Zhou, X. Research on LSTM-PPO obstacle avoidance algorithm and training environment for unmanned surface vehicles. J. Mar. Sci. Eng. 2025, 13, 479.
29. Song, S. LSTM-DDPG-based dynamic obstacle avoidance for UAVs in power distribution networks using velocity obstacle modeling. Informatica 2025, 49, 65–74.
30. Dalal, G.; Dvijotham, K.; Vecerik, M.; Hester, T.; Paduraru, C.; Tassa, Y. Safe exploration in continuous action spaces. arXiv 2018, arXiv:1801.08757.
31. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
32. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232.
33. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
34. Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum learning for reinforcement learning domains: A framework and survey. J. Mach. Learn. Res. 2020, 21, 1–50.
35. Panerati, J.; Zheng, H.; Zhou, S.; Xu, J.; Prorok, A.; Schoellig, A.P. Learning to fly—A gym environment with PyBullet physics for reinforcement learning of multi-agent quadcopter control. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 7512–7519.
36. Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics; Hutter, M., Siegwart, R., Eds.; Springer: Cham, Switzerland, 2018; pp. 621–635.
37. Song, Y.; Naji, S.; Kaufmann, E.; Loquercio, A.; Scaramuzza, D. Flightmare: A flexible quadrotor simulator. In Proceedings of the Conference on Robot Learning (CoRL), Cambridge, MA, USA, 16–18 November 2020; pp. 1147–1157.
38. Nguyen-Duong-Hoang, P.; Phan-Van, T.; Pham-Ngoc, S.; Dang-Le-Bao, C.; Le-Trung, Q. IsaacLab vs. gym-pybullet-drones: A comparative study of UAV simulators for reinforcement learning. In Proceedings of the RIVF International Conference on Computing and Communication Technologies (RIVF), Ho Chi Minh City, Vietnam, 18–20 December 2025; pp. 179–184.
39. Kilic, K.I.; Desoeuvres, A.; Pedersen, C.B.; Vasegaard, A.E.; Nielsen, P. Adaptive artificial potential field method for small autonomous vehicles. Robot. Auton. Syst. 2026, 198, 105364.
40. Han, Q.; Ma, X.; Liu, J.; Liu, H.; Yan, Y.; Yang, Q. A hybrid RRT-DWA path planning framework for UAVs in dynamic environments. Sci. Rep. 2026, 16, 3089.
41. McNemar, Q. Note on the Sampling Error of the Difference between Correlated Proportions or Percentages. Psychometrika 1947, 12, 153–157.
42. Nie, Y.; Zhao, C.; Shi, K.; Chen, Y. Novel Stability Analysis for Time-Delay Systems via Two New Lemmas and Genetic Algorithm. Chaos Solitons Fractals 2026, 208, 118232.
43. Fan, H.; Shi, K.; Guo, Z.; Zhou, A.; Cai, J. Finite-Time Synchronization and Mittag–Leffler Synchronization for Uncertain Fractional-Order Delayed Cellular Neural Networks with Fuzzy Operators via Nonlinear Adaptive Control. Fractal Fract. 2025, 9, 634.
44. Fan, H.; Shi, K.; Zhou, A.; Meng, F.; Jiang, L. Exploring Fixed-Time Synchronization of Fractional-Order Fuzzy Cellular Neural Networks with Information Interactions and Time-Varying Delays via Adaptive Multi-Module Control. Fractal Fract. 2026, 10, 253.













| Parameter | Value |
|---|---|
| Physical simulation frequency | 240 Hz |
| Policy control frequency | 30 Hz |
| Core operating region | 8 m × 8 m |
| Flight/start altitude range | 1.9–2.1 m |
| Goal altitude | 2.0 m |
| Obstacle height | 4.0 m |
| Obstacle radius | 0.2–0.3 m |
| Maximum UAV speed | 0.6 m/s |
| Dynamic obstacle speed limit | 0.15/0.35 m/s |
| Maximum allowed steps per episode | 1000 steps |

| Parameter | Value |
|---|---|
| Discount factor | 0.99 |
| Generalized advantage estimation (GAE) parameter | 0.95 |
| Clipping range | 0.2 |
| Rollout steps per update | 19,200 |
| Learning rate (initial) | |
| Entropy coefficient | 0.015 |
| Batch size | 256 |
| Value-function coefficient | 0.5 |
| Maximum gradient norm | 0.5 |
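To make the roles of the discount factor, the GAE parameter, and the clipping range in the table above concrete, the sketch below shows the textbook forms of generalized advantage estimation and the PPO clipped surrogate, with the tabulated values (0.99, 0.95, 0.2) as defaults. It is a generic NumPy illustration, not the authors' training code; the function names and the toy rollout are assumptions.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, dones: 1-D sequences of equal length.
    last_value: value estimate for the state after the final step.
    dones[t] = 1 means the episode ended at step t (no bootstrapping).
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    n = len(rewards)
    advantages = np.zeros(n)
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values
    return advantages, returns


def clipped_surrogate(ratio, advantages, clip_range=0.2):
    """PPO clipped objective (to be maximized), averaged over samples."""
    ratio = np.asarray(ratio, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    return float(np.minimum(unclipped, clipped).mean())


# Toy rollout: three steps of reward 1, zero value estimates, terminal at end.
adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [0, 0, 1])
```

With a GAE parameter of 0.95, each advantage blends many future temporal-difference errors, trading a little bias for lower variance; the 0.2 clipping range caps how far a single update can move the action probabilities.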

| Scenario | Method | Success (%) | Collision (%) | Timeout (%) | Path Length (m) | Online Computation Time (ms) |
|---|---|---|---|---|---|---|
| 6 S/18 D | Proposed | 95.60 ± 1.20 | 4.40 ± 1.20 | 0.00 ± 0.00 | 8.13 ± 0.22 | 0.82 ± 0.05 |
| 6 S/18 D | PPO [33] | 85.80 ± 0.92 | 13.87 ± 0.64 | 0.33 ± 0.31 | 8.77 ± 0.10 | 0.49 ± 0.03 |
| 6 S/18 D | EGO-Planner [6] | 72.40 ± 2.03 | 27.60 ± 2.03 | 0.00 ± 0.00 | 7.97 ± 0.09 | 0.85 ± 0.01 |
| 6 S/18 D | Adaptive APF [39] | 80.73 ± 0.23 | 13.87 ± 0.92 | 5.40 ± 1.06 | 11.46 ± 0.13 | 0.51 ± 0.01 |
| 6 S/18 D | RRT-DWA [40] | 95.93 ± 0.64 | 1.93 ± 0.58 | 2.13 ± 0.12 | 9.15 ± 0.40 | 1.30 ± 0.02 |
| 25 S/8 D | Proposed | 97.33 ± 0.23 | 2.53 ± 0.23 | 0.13 ± 0.23 | 8.17 ± 0.23 | 0.75 ± 0.02 |
| 25 S/8 D | PPO [33] | 81.00 ± 1.71 | 16.87 ± 2.30 | 2.13 ± 0.61 | 9.03 ± 0.20 | 0.50 ± 0.03 |
| 25 S/8 D | EGO-Planner [6] | 85.87 ± 1.92 | 14.13 ± 1.92 | 0.00 ± 0.00 | 7.96 ± 0.19 | 0.57 ± 0.01 |
| 25 S/8 D | Adaptive APF [39] | 41.33 ± 0.61 | 5.67 ± 0.76 | 53.00 ± 0.20 | 12.49 ± 0.31 | 0.46 ± 0.01 |
| 25 S/8 D | RRT-DWA [40] | 82.93 ± 1.50 | 0.67 ± 0.42 | 16.40 ± 1.91 | 9.89 ± 0.34 | 1.36 ± 0.02 |
| 16 S/17 D | Proposed | 93.53 ± 0.12 | 6.13 ± 0.42 | 0.33 ± 0.31 | 8.62 ± 0.37 | 0.76 ± 0.01 |
| 16 S/17 D | PPO [33] | 74.33 ± 0.23 | 24.73 ± 0.70 | 0.93 ± 0.76 | 8.78 ± 0.35 | 0.49 ± 0.02 |
| 16 S/17 D | EGO-Planner [6] | 66.47 ± 1.75 | 33.53 ± 1.75 | 0.00 ± 0.00 | 8.03 ± 0.12 | 0.75 ± 0.09 |
| 16 S/17 D | Adaptive APF [39] | 39.80 ± 0.60 | 6.40 ± 1.00 | 53.80 ± 0.87 | 12.67 ± 0.30 | 0.47 ± 0.01 |
| 16 S/17 D | RRT-DWA [40] | 88.40 ± 1.83 | 3.80 ± 0.87 | 7.80 ± 1.11 | 10.28 ± 0.27 | 1.36 ± 0.02 |
| 8 S/25 D high-speed | Proposed | 81.07 ± 0.90 | 18.93 ± 0.90 | 0.00 ± 0.00 | 8.88 ± 0.20 | 0.77 ± 0.01 |
| 8 S/25 D high-speed | PPO [33] | 58.53 ± 1.30 | 41.47 ± 1.30 | 0.00 ± 0.00 | 9.13 ± 0.13 | 0.53 ± 0.01 |
| 8 S/25 D high-speed | EGO-Planner [6] | 37.07 ± 0.70 | 62.93 ± 0.70 | 0.00 ± 0.00 | 7.93 ± 0.48 | 0.84 ± 0.04 |
| 8 S/25 D high-speed | Adaptive APF [39] | 16.93 ± 1.63 | 82.00 ± 1.06 | 1.07 ± 0.58 | 10.36 ± 0.28 | 0.60 ± 0.01 |
| 8 S/25 D high-speed | RRT-DWA [40] | 78.47 ± 1.22 | 20.67 ± 1.42 | 0.87 ± 0.50 | 10.75 ± 0.14 | 1.43 ± 0.03 |
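As a quick consistency check, the headline figures in the highlights can be recovered from the "Proposed" rows of the comparison table. The snippet below averages the four per-scenario means; the dictionary layout and variable names are illustrative.

```python
from statistics import fmean

# (success %, online computation time in ms) for the proposed method,
# copied from the four "Proposed" rows of the comparison table.
PROPOSED = {
    "6 S/18 D": (95.60, 0.82),
    "25 S/8 D": (97.33, 0.75),
    "16 S/17 D": (93.53, 0.76),
    "8 S/25 D high-speed": (81.07, 0.77),
}

mean_success = fmean([success for success, _ in PROPOSED.values()])
mean_time_ms = fmean([time_ms for _, time_ms in PROPOSED.values()])

print(f"mean success: {mean_success:.4f} %")
print(f"mean online computation time: {mean_time_ms:.4f} ms")
```

The unweighted means come to 91.8825% and 0.775 ms, which agree with the 91.88% and 0.78 ms quoted in the highlights after rounding to two decimals.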
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Liu, C.; Wang, S. Temporally-Aware Deep Reinforcement Learning for Dynamic Obstacle Avoidance in UAVs. Drones 2026, 10, 505. https://doi.org/10.3390/drones10070505
