An Improved Deep Reinforcement Learning-Based UAV Area Coverage Algorithm for an Unknown Dynamic Environment
Abstract
1. Introduction
- (1) A DAM-DDQN algorithm is proposed for dynamic obstacle avoidance in UAV area coverage. It not only avoids unknown dynamic obstacles but also improves coverage efficiency by reducing the repeated coverage rate.
- (2) A DAM is designed to fuse local obstacle information with full-area coverage information of the flight environment; a sketch of one possible fusion step is given after this list.
- (3) An adaptive exploration decay DDQN algorithm is proposed, in which the decay strategy is driven by the real-time coverage rate. A reward function is also designed to mitigate the sparse-reward problem. Together, these improvements accelerate convergence; the decay rule and the double-DQN update are sketched below.
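This version of the text does not reproduce the DAM equations, so the following is a minimal sketch of one plausible form of the fusion step, assuming dot-product attention and a fixed blend weight (the parameter table lists a DAM feature weight factor of 0.6). All function names and tensor shapes here are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dam_fuse(local_obstacle_feats, coverage_feat, w=0.6):
    """Blend attention-pooled local obstacle features with a global
    coverage feature.

    local_obstacle_feats: (N, D) features of N sensed obstacle cells.
    coverage_feat:        (D,)  feature summarizing area coverage so far.
    w:                    DAM feature weight factor (0.6 in the paper's table).
    """
    # Dot-product attention: score each local feature against the coverage
    # feature, then pool the local features into one vector. (Assumed form.)
    scores = local_obstacle_feats @ coverage_feat   # (N,)
    attn = softmax(scores)                          # (N,)
    local_summary = attn @ local_obstacle_feats     # (D,)

    # Weighted fusion of local and global information. (Assumed form.)
    return w * local_summary + (1.0 - w) * coverage_feat
```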
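The double-DQN target itself is standard (the online network selects the greedy next action; the target network evaluates it), while the exploration decay is only described qualitatively as coverage-driven, so the linear schedule below is an assumed form; `eps_min`, `eps_max`, and the network interfaces are illustrative.

```python
import numpy as np

def epsilon_from_coverage(coverage_rate, eps_max=1.0, eps_min=0.05):
    """Adaptive exploration decay driven by the real-time coverage rate.
    Linear schedule (assumed form): explore widely while coverage is low,
    shift toward exploitation as the area fills up.
    """
    return eps_max - (eps_max - eps_min) * coverage_rate

def ddqn_target(reward, next_state, done, q_online, q_target, gamma=0.95):
    """Standard double-DQN target: the online network picks the action,
    the target network evaluates it (gamma = 0.95 per the parameter table).
    q_online / q_target are callables returning per-action value arrays.
    """
    a_star = int(np.argmax(q_online(next_state)))   # action selection
    bootstrap = q_target(next_state)[a_star]        # action evaluation
    return reward + gamma * bootstrap * (1.0 - float(done))
```

The intuition behind the schedule is that a low coverage rate keeps epsilon high, while near-complete coverage pushes the policy toward exploitation.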
2. Problem Formulation
3. Area Coverage Algorithm Based on DAM-DDQN
3.1. UAV State Space
3.2. UAV Action Space
3.3. Reward Function Design
3.4. DAM-DDQN Area Coverage Algorithm
3.4.1. DAM Algorithm for Unknown Dynamic Environments
- (1) Traditional AM algorithm
- (2) Improved DAM algorithm
3.4.2. Improved DDQN Algorithm
3.4.3. DAM-DDQN Algorithm Implementation
Algorithm 1. DAM-DDQN
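The pseudocode body of Algorithm 1 did not survive extraction. As a placeholder, here is a minimal Python-style sketch of how the pieces described above could fit together: DAM feature fusion on each observation, coverage-driven epsilon-greedy action selection (reusing `epsilon_from_coverage` from the earlier sketch), experience replay, and periodic target-network synchronization (the parameter table lists an update frequency of 10). Every interface here (`env.reset`, `env.step`, the `agent` methods) is an assumption, not the authors' implementation.

```python
import random
from collections import deque

def train_dam_ddqn(env, agent, n_episode=15_000, n_time_step=150,
                   memory_size=20_000, batch_size=128, target_update_freq=10):
    """Illustrative DAM-DDQN training loop; all interfaces are assumed."""
    memory = deque(maxlen=memory_size)              # experience replay buffer
    for episode in range(n_episode):
        state = env.reset()
        for _ in range(n_time_step):
            fused = agent.dam_fuse(state)           # DAM feature fusion
            eps = epsilon_from_coverage(env.coverage_rate())
            action = agent.act(fused, eps)          # epsilon-greedy action
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            if len(memory) >= batch_size:
                batch = random.sample(memory, batch_size)
                agent.learn(batch)                  # double-DQN update (above)
            state = next_state
            if done:                                # coverage complete
                break
        if episode % target_update_freq == 0:
            agent.sync_target()                     # periodic target update
```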
4. Simulation Analysis
4.1. Simulation Platform
4.2. Parameter Setting
4.3. Evaluation Criteria
- (1) Coverage rate
- (2) Repeated coverage rate
- (3) Coverage total steps (formal definitions of these criteria are sketched after this list)
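The paper's exact formulas are not reproduced in this version; the expressions below are standard grid-based definitions consistent with the criteria named above, with symbols introduced here purely for illustration.

```latex
% Assumed grid-based definitions (symbols are illustrative):
%   N_cov  = number of distinct grid cells covered
%   N_free = number of free (coverable) grid cells
%   N_vis  = total number of cell visits along the UAV trajectory
\text{Coverage rate: } C = \frac{N_{\mathrm{cov}}}{N_{\mathrm{free}}},
\qquad
\text{Repeated coverage rate: } R = \frac{N_{\mathrm{vis}} - N_{\mathrm{cov}}}{N_{\mathrm{vis}}}
```

Coverage total steps is then simply the number of actions the UAV executes before C reaches 100%.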
4.4. Simulation and Analysis
- (1) Simulation of an environment without dynamic obstacles
- (2) Simulation of an environment with dynamic obstacles
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Symbol | Value | Description
---|---|---
λ | 0.001 | Learning rate
γ | 0.95 | Discount factor
 | 0.3 | Coverage adjustment parameter
 | 0.7 | Coverage adjustment parameter
 | 0.6 | DAM feature weight factor
n_episode | 15,000 | Maximum number of training episodes
n_time_step | 150 | Maximum number of coverage steps per episode
memory_size | 20,000 | Capacity of the experience replay memory
batch_size | 128 | Batch size of training samples
frequency | 10 | Frequency of target network updates
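For convenience, these settings might be gathered into a single configuration object. The dictionary below simply mirrors the table; the `coverage_adjust_*` and `dam_feature_weight` key names are invented here, since the original symbols for those three parameters did not survive extraction.

```python
# Hyperparameters from the parameter table. The three starred key names
# are illustrative, as the original symbols are not recoverable.
DAM_DDQN_CONFIG = {
    "learning_rate": 0.001,         # lambda
    "discount_factor": 0.95,        # gamma
    "coverage_adjust_a": 0.3,       # * coverage adjustment parameter
    "coverage_adjust_b": 0.7,       # * coverage adjustment parameter
    "dam_feature_weight": 0.6,      # * DAM feature weight factor
    "n_episode": 15_000,            # maximum training episodes
    "n_time_step": 150,             # maximum coverage steps per episode
    "memory_size": 20_000,          # replay memory capacity
    "batch_size": 128,              # training batch size
    "target_update_frequency": 10,  # target network update frequency
}
```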
Algorithm | Coverage Rate | Repeated Coverage Rate | Coverage Total Steps | Average Training Episodes
---|---|---|---|---
SAC | 100% | 23.10% | 339 | 7500
DDQN | 100% | 21.04% | 335 | 8100
AM-DDQN | 100% | 15.89% | 321 | 5800
DAM-DDQN | 100% | 8.90% | 298 | 4300
Algorithm | Coverage Rate | Repeated Coverage Rate | Coverage Total Steps | Average Training Episodes
---|---|---|---|---
SAC | 100% | 29.03% | 367 | 10,000
DDQN | 100% | 28.90% | 359 | 11,000
AM-DDQN | 100% | 19.01% | 335 | 7800
DAM-DDQN | 100% | 11.30% | 307 | 6200
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).