Article

UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Drones 2024, 8(6), 214; https://doi.org/10.3390/drones8060214
Submission received: 29 April 2024 / Revised: 19 May 2024 / Accepted: 20 May 2024 / Published: 22 May 2024

Abstract

Unmanned aerial vehicles (UAVs) are commonly employed in pursuit and rescue missions, where the target’s trajectory is unknown. Traditional methods, such as evolutionary algorithms and ant colony optimization, can generate a search route for a given scenario, but the solution must be recalculated whenever the scene changes. In contrast, deep reinforcement learning methods can train an agent that can be applied directly to similar tasks without recalculation. Nevertheless, learning to search for unknown dynamic targets poses several challenges. The rewards in this search task are random and sparse, which makes learning difficult. In addition, because the agent must adapt to various scenario settings, more interactions between the agent and the environment are required than in typical reinforcement learning tasks. To address these issues, we propose the OC-MAPPO method, which combines optimal control (OC) and Multi-Agent Proximal Policy Optimization (MAPPO) with GPU parallelization. The optimal control model provides the agent with continuous and stable rewards, and the parallelized models allow the agent to interact with the environment and collect data more rapidly. Experimental results demonstrate that the proposed method helps the agent learn faster and achieves a 26.97% increase in success rate compared to genetic algorithms.
Keywords: UAVs; optimal control; dynamic target search; multi-agents; MAPPO

Share and Cite

MDPI and ACS Style

Wei, D.; Zhang, L.; Liu, Q.; Chen, H.; Huang, J. UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method. Drones 2024, 8, 214. https://doi.org/10.3390/drones8060214

AMA Style

Wei D, Zhang L, Liu Q, Chen H, Huang J. UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method. Drones. 2024; 8(6):214. https://doi.org/10.3390/drones8060214

Chicago/Turabian Style

Wei, Dexing, Lun Zhang, Quan Liu, Hao Chen, and Jian Huang. 2024. "UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method" Drones 8, no. 6: 214. https://doi.org/10.3390/drones8060214
