Article

Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO

1 Beijing Institute of Technology, School of Mechatronics Engineering, Beijing 100081, China
2 Institute of Advanced Interdisciplinary Technology, Shenzhen MSU-BIT University, Shenzhen 518116, China
3 Yangtze River Delta Research Institute of BIT, Jiaxing 314000, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(11), 770; https://doi.org/10.3390/drones9110770
Submission received: 29 August 2025 / Revised: 5 November 2025 / Accepted: 6 November 2025 / Published: 7 November 2025

Abstract

In cooperative dynamic target search by multiple UAVs, target uncertainty and system complexity pose significant challenges to cooperative decision-making. Multi-agent reinforcement learning (MARL) can be used to optimize cooperative policies, but in sparse-reward settings such as dynamic target search it suffers from convergence difficulties and low policy quality. To address this issue, this paper proposes a Multi-Potential-Field Fusion Reward Shaping MAPPO (MPRS-MAPPO) algorithm. First, three potential field functions are constructed for reward shaping: a probability edge potential field, a maximum probability potential field, and a coverage probability sum potential field. Second, an adaptive fusion weight mechanism adjusts the fusion weights based on the correlation between potential field values and advantage values. Third, a warm-up phase is introduced to improve training stability. Extensive experiments, including multi-scale and physical tests, demonstrate that MPRS-MAPPO significantly improves convergence speed, detection rate, and stability compared with MAPPO, MASAC, QMIX, and Scanline: detection rates increase by 7.87–29.76% and training uncertainty decreases by 7.43–56.36%, supporting the algorithm’s robustness, scalability, and real-world applicability.
Keywords: multi-UAV collaboration; dynamic target search; reinforcement learning; reward shaping; multi-potential field fusion
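
The abstract names three ingredients (fused potential fields, correlation-driven fusion weights, and a warm-up phase) without giving formulas. The sketch below shows one way they could fit together. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the standard potential-based shaping form `gamma * Phi(s') - Phi(s)`, the Pearson correlation as the correlation measure, the softmax mapping from correlations to weights, uniform weights during warm-up, and the names `shaped_reward`, `fusion_weights`, `warmup_steps`, and `temperature`. The three potential field values are taken as given inputs, since their definitions are not stated in the abstract.

```python
import numpy as np


def shaped_reward(phi_s, phi_next, weights, gamma=0.99):
    """Shaping term gamma * Phi(s') - Phi(s), where Phi = sum_i w_i * Phi_i.

    phi_s, phi_next: length-K vectors of potential field values at s and s'.
    The potential-based form is an assumption, not taken from the paper.
    """
    phi_s = np.asarray(phi_s, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return gamma * (phi_next @ weights) - (phi_s @ weights)


def fusion_weights(potentials, advantages, step, warmup_steps=10, temperature=1.0):
    """Hypothetical adaptive fusion weights for K potential fields.

    potentials: array of shape (T, K), one column per potential field.
    advantages: array of shape (T,), advantage estimates for the same samples.
    During warm-up (step < warmup_steps) the weights stay uniform (assumption).
    Afterwards, weights are a softmax over per-field Pearson correlations
    with the advantages (assumption).
    """
    potentials = np.asarray(potentials, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    k = potentials.shape[1]
    if step < warmup_steps:
        return np.full(k, 1.0 / k)

    p_centered = potentials - potentials.mean(axis=0)
    a_centered = advantages - advantages.mean()
    numer = (p_centered * a_centered[:, None]).sum(axis=0)
    denom = np.sqrt((p_centered ** 2).sum(axis=0) * (a_centered ** 2).sum())
    # A constant field or constant advantages has no defined correlation; use 0.
    safe_denom = np.where(denom > 0, denom, 1.0)
    corr = np.where(denom > 0, numer / safe_denom, 0.0)

    logits = corr / temperature
    logits = logits - logits.max()  # numerical stability for the softmax
    w = np.exp(logits)
    return w / w.sum()
```

In this sketch, a field whose values rise and fall with the advantage estimates receives a larger weight, a field that moves against them receives a smaller one, and the weights always sum to one. Other mappings from correlation to weight (for example, clipping negative correlations to zero) would be equally plausible readings of the abstract.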

Share and Cite

MDPI and ACS Style

Hong, X.; Wang, Z.; Wang, Y.; Xue, C.; Gao, Y. Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO. Drones 2025, 9, 770. https://doi.org/10.3390/drones9110770

AMA Style

Hong X, Wang Z, Wang Y, Xue C, Gao Y. Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO. Drones. 2025; 9(11):770. https://doi.org/10.3390/drones9110770

Chicago/Turabian Style

Hong, Xiaotong, Zhengjie Wang, Yue Wang, Chao Xue, and Yang Gao. 2025. "Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO" Drones 9, no. 11: 770. https://doi.org/10.3390/drones9110770

APA Style

Hong, X., Wang, Z., Wang, Y., Xue, C., & Gao, Y. (2025). Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO. Drones, 9(11), 770. https://doi.org/10.3390/drones9110770
