Unmanned Aerial Vehicle Path Planning in Complex Dynamic Environments Based on Deep Reinforcement Learning
Abstract
1. Introduction
- Considering the negative impact of dynamic obstacles, the reward function is designed with an artificial potential field that repels the UAV from obstacles and attracts it toward the target, guiding the UAV to an optimal path.
- We introduce a data storage mechanism that alters the distribution of the experience types in the replay memory, thereby accelerating the convergence of the proposed method.
- Based on these scenarios, we apply the Yolov8-StrongSort model so that dynamic obstacle trajectories and the background of the model's image output are taken into account, thereby reducing the difficulty of training the UAV.
- For efficient target tracking, we choose OSNet, a lightweight yet powerful network architecture with exceptional target reidentification capabilities, ensuring stable performance even in complex environments.
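The artificial-potential-field reward mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gains `k_att`, `k_rep`, and the influence radius `d0` are illustrative values, and positions are assumed to be 2D or 3D coordinates.

```python
import numpy as np

def apf_reward(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Sketch of an artificial-potential-field reward: attraction toward
    the goal plus repulsion from obstacles inside an influence radius d0.
    All gains (k_att, k_rep, d0) are illustrative, not from the paper."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    # Attractive term: the reward increases as the UAV nears the goal.
    r = -k_att * np.linalg.norm(goal - pos)
    # Repulsive term: penalize proximity to each obstacle within d0.
    for obs in obstacles:
        d = np.linalg.norm(np.asarray(obs, float) - pos)
        if d < d0:
            r -= k_rep * (1.0 / max(d, 1e-6) - 1.0 / d0) ** 2
    return r
```

A state close to the goal with no nearby obstacle then scores higher than one far away or near an obstacle, which is the behavior the contribution describes.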
2. Reinforcement Learning for the UAV ANCA Problem
2.1. States and Actions
2.2. Reward Function Design Based on Artificial Potential Field Method
2.3. Markov Decision Process
2.4. Examples
3. Methods
3.1. System Framework
3.2. Obstacle Detection and Tracking
3.2.1. Yolov8
3.2.2. Optimization of StrongSort
- NSA Kalman (Noise-Scale Adaptive Kalman Algorithm)
- Target Reidentification
3.3. Path Planning
3.3.1. DQN
3.3.2. Playback Memory Data Storage Mechanism
3.4. Mathematical Analysis of Collision Avoidance
3.4.1. Security Assurance in the Framework of the Markov Decision Process
3.4.2. Safe Path Planning Based on a Lyapunov Stability Analysis
3.4.3. Worst-Case Scenario Assessment
4. Results and Discussion
4.1. Experimental Environment
4.2. Comparative Experiments with Yolov8-StrongSort
4.3. UAV Path Planning
4.3.1. Experimental Setup
4.3.2. UAV Indoor Obstacle Avoidance
4.3.3. Algorithm Performance with Different Obstacle Densities
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
- Algorithm A1.
Algorithm A1. Replay memory data storage mechanism
1. function ExperienceStorage(rm)
2. Initialize the storage flag Fstorage ← false; set the experience deposit ratios pRE, pDE, pSE
3. Generate a random value p′, p′ ∈ [0, 1]
4. Type(rm) ← determine the type of experience rm
5. if Type(rm) is RE then
6.   if p′ < pRE then
7.     Fstorage ← true
8.   end if
9. else if Type(rm) is DE then
10.   if p′ < pDE then
11.     Fstorage ← true
12.   end if
13. else if Type(rm) is SE then
14.   if p′ < pSE then
15.     Fstorage ← true
16.   end if
17. end if
18. return Fstorage
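Algorithm A1 can be turned into a short executable sketch. The type names RE/DE/SE and the deposit ratios follow the pseudocode; the default ratio values and the `classify` hook are illustrative assumptions, not taken from the paper.

```python
import random

def experience_storage(rm, p_re=0.3, p_de=0.6, p_se=0.9, classify=None):
    """Sketch of Algorithm A1: decide whether experience `rm` enters the
    replay memory. Each experience type (RE, DE, SE) is admitted with its
    own deposit ratio; the default ratios here are illustrative.
    `classify` maps an experience to its type string."""
    f_storage = False                      # storage flag, initially false
    p = random.random()                    # p' ~ U[0, 1)
    exp_type = classify(rm) if classify else rm.get("type")
    if exp_type == "RE" and p < p_re:
        f_storage = True
    elif exp_type == "DE" and p < p_de:
        f_storage = True
    elif exp_type == "SE" and p < p_se:
        f_storage = True
    return f_storage
```

Over many calls, each experience type is stored at roughly its deposit ratio, which is how the mechanism reshapes the distribution of experience types in the replay memory.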
- Certification
References
- Chen, G.; Sun, D.; Dong, W.; Sheng, X.; Zhu, X.; Ding, H. Computationally efficient trajectory planning for high speed obstacle avoidance of a quadrotor with active sensing. IEEE Robot. Autom. Lett. 2021, 6, 3365–3372. [Google Scholar] [CrossRef]
- Falanga, D.; Kim, S.; Scaramuzza, D. How fast is too fast? the role of perception latency in high-speed sense and avoid. IEEE Robot. Autom. Lett. 2019, 4, 1884–1891. [Google Scholar] [CrossRef]
- Gao, F.; Wu, W.; Gao, W.; Shen, S. Flying on point clouds: Online trajectory generation and autonomous navigation for quadrotors in cluttered environments. J. Field Robot. 2019, 36, 710–733. [Google Scholar] [CrossRef]
- Tordesillas, J.; Lopez, B.T.; How, J.P. Faster: Fast and safe trajectory planner for flights in unknown environments. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 1934–1940. [Google Scholar]
- Zhou, X.; Wang, Z.; Ye, H.; Xu, C.; Gao, F. Ego-planner: An esdf-free gradient-based local planner for quadrotors. IEEE Robot. Autom. Lett. 2020, 6, 478–485. [Google Scholar] [CrossRef]
- Chen, G.; Dong, W.; Sheng, X.; Zhu, X.; Ding, H. An active sense and avoid system for flying robots in dynamic environments. IEEE/ASME Trans. Mechatron. 2021, 26, 668–678. [Google Scholar] [CrossRef]
- Chen, G.; Peng, P.; Zhang, P.; Dong, W. Risk-aware trajectory sampling for quadrotor obstacle avoidance in dynamic environments. IEEE Trans. Ind. Electron. 2023, 70, 12606–12615. [Google Scholar] [CrossRef]
- Lin, J.; Zhu, H.; Alonso-Mora, J. Robust vision-based obstacle avoidance for micro aerial vehicles in dynamic environments. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 2682–2688. [Google Scholar]
- Wang, Y.; Ji, J.; Wang, Q.; Xu, C.; Gao, F. Autonomous flights in dynamic environments with onboard vision. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1966–1973. [Google Scholar]
- Zhu, H.; Alonso-Mora, J. Chance-constrained collision avoidance for mavs in dynamic environments. IEEE Robot. Autom. Lett. 2019, 4, 776–783. [Google Scholar] [CrossRef]
- Wang, C.C.; Thorpe, C.; Thrun, S.; Hebert, M.; Durrant-Whyte, H. Simultaneous localization, mapping and moving object tracking. Int. J. Robot. Res. 2007, 26, 889–916. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, J.; Chen, Y. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1472–1481. [Google Scholar]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multi-object tracking. arXiv 2016, arXiv:1603.00831. [Google Scholar]
- Kang, K.; Li, H.; Yan, J.; Zeng, X.; Yang, B.; Xiao, T.; Zhang, C.; Wang, Z.; Wang, R.; Wang, X.; et al. T-cnn: Tubelets with convolutional neural networks for object detection from videos. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2896–2907. [Google Scholar] [CrossRef]
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021, 51, 6400–6429. [Google Scholar] [CrossRef]
- Sahu, C.K.; Young, C.; Rai, R. Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: A review. Int. J. Prod. Res. 2021, 59, 4903–4959. [Google Scholar] [CrossRef]
- Dendorfer, P.; Osep, A.; Milan, A.; Schindler, K.; Cremers, D.; Reid, I.; Roth, S.; Leal-Taixé, L. Motchallenge: A benchmark for single-camera multiple target tracking. Int. J. Comput. Vis. 2021, 129, 845–881. [Google Scholar] [CrossRef]
- Yang, B.; Nevatia, R. An online learned CRF model for multi-object tracking. In Proceedings of the CVPR 2011, Providence, RI, USA, 16–21 June 2011; pp. 1397–1404. [Google Scholar]
- Wang, M.; Liu, Y. Learning a neural solver for multiple object tracking. In Proceedings of the CVPR 2019, Long Beach, CA, USA, 15–20 June 2019; pp. 7304–7312. [Google Scholar]
- Wang, D.; Fang, W.; Chen, W.; Sun, T.; Chen, T. Model update strategies about object tracking: A state of the art review. Electronics 2019, 8, 1207. [Google Scholar] [CrossRef]
- Fiaz, M.; Mahmood, A.; Jung, S.K. Tracking noisy targets: A review of recent object tracking approaches. arXiv 2018, arXiv:1802.03098. [Google Scholar]
- Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the ICIP 2016, Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
- Pang, J.; Qiu, L.; Li, X.; Chen, H.; Li, Q.; Yu, F. Quasi-dense similarity learning for multiple object tracking. In Proceedings of the CVPR 2021, Nashville, TN, USA, 20–25 June 2021; pp. 164–173. [Google Scholar]
- Wang, Q.; Zheng, Y.; Pan, P.; Xu, Y. Multiple object tracking with correlation learning. In Proceedings of the CVPR 2021, Nashville, TN, USA, 20–25 June 2021; pp. 3876–3886. [Google Scholar]
- Bergmann, P.; Meinhardt, T.; Leal-Taixe, L. Tracking without bells and whistles. In Proceedings of the ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 941–951. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the NIPS 2015, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
- Zhang, J.; Zhou, S.; Chang, X.; Wan, F.; Wang, J.; Wu, Y.; Huang, D. Multiple object tracking by flowing and fusing. arXiv 2020, arXiv:2001.11180. [Google Scholar]
- Zhou, X.; Koltun, V.; Krähenbühl, P. Tracking objects as points. In Proceedings of the ECCV 2020, Glasgow, UK, 23–28 August 2020; pp. 474–490. [Google Scholar]
- Wu, Z.; Dong, S.; Yuan, M.; Cui, J.; Zhao, L.; Tong, C. Rotate artificial potential field algorithm toward 3D real-time path planning for unmanned aerial vehicle. Proc. Inst. Mech. Eng. G J. Aerosp. Eng. 2023, 237, 940–955. [Google Scholar] [CrossRef]
- Yang, K.; Gan, S.K.; Sukkarieh, S. A Gaussian process-based RRT planner for the exploration of an unknown and cluttered environment with a UAV. Adv. Robot. 2013, 27, 431–443. [Google Scholar] [CrossRef]
- Liang, H.; Bai, H.; Sun, R.; Sun, R.; Li, C. Three-dimensional path planning based on DEM. In Proceedings of the 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 5980–5987. [Google Scholar]
- Baek, J.; Han, S.I.; Han, Y. Energy-efficient UAV routing for wireless sensor networks. IEEE Trans. Veh. Technol. 2020, 69, 1741–1750. [Google Scholar] [CrossRef]
- Li, W.; Wang, L.; Zou, A.; Cai, J.; He, H.; Tan, T. Path planning for UAV based on improved PRM. Energies 2022, 15, 7267. [Google Scholar] [CrossRef]
- Roberge, V.; Tarbouchi, M.; Labonte, G. Comparison of parallel genetic algorithm and particle swarm optimization for real-time UAV path planning. IEEE Trans. Ind. Inform. 2013, 9, 132–141. [Google Scholar] [CrossRef]
- Geng, Q.; Zhao, Z. A kind of route planning method for UAV based on improved PSO algorithm. In Proceedings of the 25th Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013; pp. 2328–2331. [Google Scholar]
- Qu, C.; Gai, W.; Zhong, M.; Zhang, J. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning. Appl. Soft Comput. 2020, 89, 106099. [Google Scholar] [CrossRef]
- Chai, X.; Zheng, Z.; Xiao, J.; Yan, L.; Qu, B.; Wen, P.; Wang, H.; Zhou, Y.; Sun, H. Multi-strategy fusion differential evolution algorithm for UAV path planning in complex environment. Aerosp. Sci. Technol. 2022, 121, 107287. [Google Scholar] [CrossRef]
- Singla, A.; Padakandla, S.; Bhatnagar, S. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge. IEEE Trans. Intell. Transp. Syst. 2021, 22, 107–118. [Google Scholar] [CrossRef]
- Feng, S.; Shu, H.; Xie, B. 3D environment path planning based on improved deep reinforcement learning. Comput. Appl. Softw. 2021, 38, 250–255. [Google Scholar]
- Ruan, X.G.; Ren, D.Q.; Zhu, X.Q.; Huang, J. Mobile robot navigation based on deep reinforcement learning. In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019; IEEE Press: Piscataway, NJ, USA, 2019; pp. 6174–6178. [Google Scholar]
- Zhang, W.; Zhang, W.; Song, F.; Long, L. Monocular vision obstacle avoidance method for quadcopter based on deep learning. J. Comput. Appl. 2019, 39, 1001. [Google Scholar]
- Taghibakhshi, A.; Ogden, N.; West, M. Local navigation and docking of an autonomous robot mower using reinforcement learning and computer vision. In Proceedings of the 2021 13th International Conference on Computer and Automation Engineering (ICCAE), Melbourne, Australia, 20–22 March 2021; pp. 10–14. [Google Scholar]
- Lai, Y.-C.; Huang, Z.-Y. Detection of a moving UAV based on deep learning-based distance estimation. Remote Sens. 2020, 12, 3035. [Google Scholar] [CrossRef]
- Zhang, S.; Sutton, R.S. A deeper look at experience replay. arXiv 2017, arXiv:1712.01275. [Google Scholar]
- Li, X.J.; Liu, H.; Li, J.Q.; Li, Y. Deep deterministic policy gradient algorithm for crowd-evacuation path planning. Comput. Ind. Eng. 2021, 161, 107621. [Google Scholar] [CrossRef]
- Jafari, O.H.; Mitzel, D.; Leibe, B. Real-time RGB-D based people detection and tracking for mobile robots and head-worn cameras. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 5636–5643. [Google Scholar]
- Oleynikova, H.; Honegger, D.; Pollefeys, M. Reactive avoidance using embedded stereo vision for MAV flight. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 50–56. [Google Scholar]
- Pfeiffer, M.; Paolo, G.; Sommer, H.; Nieto, J.I.; Siegwart, R.; Cadena, C. A data-driven model for interaction-aware pedestrian motion prediction in object cluttered environments. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 5921–5928. [Google Scholar]
- Wang, C.; Wang, Y.; Xu, M.; Crandall, D.J. Stepwise goal-driven networks for trajectory prediction. IEEE Robot. Autom. Lett. 2022, 7, 2716–2723. [Google Scholar] [CrossRef]
- Wulfmeier, M.; Rao, D.; Wang, D.Z.; Ondruska, P.; Posner, I. Large-scale cost function learning for path planning using deep inverse reinforcement learning. Int. J. Robot. Res. 2017, 36, 1073–1087. [Google Scholar] [CrossRef]
- Eppenberger, T.; Cesari, G.; Dymczyk, M.; Siegwart, R.; Dubé, R. Leveraging stereo-camera data for real-time dynamic obstacle detection and tracking. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10528–10535. [Google Scholar]
- Kalman, R.E. A new approach to linear filtering and prediction problems. Trans. ASME–J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
- Evangelidis, G.D.; Psarakis, E.Z. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1858–1865. [Google Scholar] [CrossRef]
- Baker, S.; Matthews, I. Equivalence and efficiency of image alignment algorithms. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, HI, USA, 8–14 December 2001. [Google Scholar]
- Du, Y.; Wan, J.; Zhao, Y.; Zhang, B.; Tong, Z.; Dong, J. Giaotracker: A comprehensive framework for mcmot with global information and optimizing strategies in visdrone 2021. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2809–2819. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. Strongsort: Make deepsort great again. IEEE Trans. Multimed. 2023, 25, 8725–8737. [Google Scholar] [CrossRef]
- Hu, Y.; Tang, H.; Pan, G. Spiking deep residual networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 5200–5205. [Google Scholar] [CrossRef]
- Sun, H.; Demanet, L. Beyond correlations: Deep learning for seismic interferometry. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 3385–3396. [Google Scholar] [CrossRef]
- Qiao, S.; Chen, L.C.; Yuille, A. Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10213–10224. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- Jiang, W.; Bao, C.; Xu, G.; Wang, Y. Research on autonomous obstacle avoidance and target tracking of UAV based on improved dueling DQN algorithm. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 5110–5115. [Google Scholar]
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Vieillard, N.; Pietquin, O.; Geist, M. Munchausen reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 4235–4246. [Google Scholar]
- Tu, G.-T.; Juang, J.-G. UAV path planning and obstacle avoidance based on reinforcement learning in 3d environments. Actuators 2023, 12, 57. [Google Scholar] [CrossRef]
- Choi, D.; Kim, D.; Lee, K. Enhanced potential field-based collision avoidance in cluttered three-dimensional urban environments. Appl. Sci. 2021, 11, 11003. [Google Scholar] [CrossRef]
- Sung, M.; Karumanchi, S.H.; Gahlawat, A.; Hovakimyan, N. Robust model based reinforcement learning using L1 adaptive control. In Proceedings of the 12th International Conference on Learning Representations (ICLR 2024), Vienna, Austria, 7–11 May 2024. [Google Scholar]
Parameters | Meanings of Parameters | Yes | No
---|---|---|---
—— | Obstacles detected or not | 1 | 0
—— | Collision or not | 1 | 0
—— | Out of bounds or not | 1 | 0
—— | Arrival at destination or not | 1 | 0
Models | IDF1 ↑ | IDP ↑ | IDR ↑ | FP ↓ | IDSW ↓ | MOTA ↑ | Time (ms) ↓
---|---|---|---|---|---|---|---
Deep-SORT | 63.7 | 63.8 | 63.7 | 593 | 54 | 87.2 | 33.2
ByteTrack | 83.5 | 83.3 | 83.2 | 488 | 27 | 90.4 | 13.2
StrongSort | 81.2 | 81.4 | 81.3 | 483 | 24 | 90.5 | 43.5
YL-SS | 88.2 | 88.6 | 86.3 | 321 | 12 | 90.6 | 28.4
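The identity-aware metrics in the table follow the standard MOTChallenge definitions. As a quick reference, the two headline scores can be computed from raw counts like so (the counts passed in below are hypothetical, chosen only to illustrate the formulas):

```python
def idf1(idtp, idfp, idfn):
    """IDF1 (%): harmonic mean of ID precision and ID recall,
    computed from identity true/false positives and false negatives."""
    return 100.0 * 2 * idtp / (2 * idtp + idfp + idfn)

def mota(fn, fp, idsw, gt):
    """MOTA (%): 1 minus the rate of misses, false positives,
    and identity switches over the number of ground-truth objects."""
    return 100.0 * (1.0 - (fn + fp + idsw) / gt)
```

The arrows in the header follow from these formulas: IDF1, IDP, IDR, and MOTA improve as they rise, while FP, IDSW, and runtime improve as they fall.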
Parameters | Values |
---|---|
Learning rate | 0.01 |
Discount factor | 0.9 |
Pre-training steps | 800 |
Mini-batch size | 128 |
Replay memory size | 1000 |
Network update frequency | 50 |
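The training hyperparameters above can be gathered into a configuration sketch, together with a replay memory of the listed capacity. The network itself is omitted; only the buffer mechanics are shown, and the class and variable names are illustrative.

```python
import random
from collections import deque

# Hyperparameters taken from the table above.
CONFIG = dict(lr=0.01, gamma=0.9, pretrain_steps=800,
              batch_size=128, replay_size=1000, target_update=50)

class ReplayMemory:
    """Fixed-capacity replay memory; the oldest transitions are evicted
    once the table's capacity of 1000 is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # A transition is a (state, action, reward, next_state) tuple.
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

memory = ReplayMemory(CONFIG["replay_size"])
```

During training, a mini-batch of 128 transitions would be drawn from this buffer at each step, with the target network refreshed every 50 updates as listed.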
Model | Average Moving Steps | Number of Successes | Success Rate (%)
---|---|---|---
DQN [65] | 29.65 | 163 | 54.33
Dueling DQN [66] | 28.25 | 167 | 55.67
M-DQN [67] | 27.96 | 171 | 57.00
Improved Q-learning [68] | 26.47 | 176 | 58.67
DDM-DQN | 25.98 | 182 | 60.67
EPF [69] | —— | 195 | 65.00
APF-DQN | 23.08 | 209 | 69.67
L1-MBRL [70] | —— | 217 | 72.33
YS-DADQN | 20.71 | 233 | 77.67
Obstacle Density (%) | Average Moving Steps | Success Rate (%)
---|---|---
10 (low density) | 18.2 | 85
30 (medium density) | 20.4 | 78
50 (high density) | 22.9 | 72
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, J.; Luo, W.; Zhang, G.; Li, R. Unmanned Aerial Vehicle Path Planning in Complex Dynamic Environments Based on Deep Reinforcement Learning. Machines 2025, 13, 162. https://doi.org/10.3390/machines13020162