Multi-AUV Cooperative Search for Moving Targets Based on Multi-Agent Reinforcement Learning
Abstract
1. Introduction
- The problem of multi-AUV cooperative search for moving targets is formulated mathematically, including the environment model, the AUV model, and the information update and fusion model. In the multi-AUV cluster system, each AUV obtains environmental cognitive information from the other AUVs within its communication range, processes it through the information update and fusion model, and uses the fused information to guide its actions. The information update and fusion model not only enhances the environmental perception capability of each individual AUV but also strengthens cooperation among the members of the multi-AUV cluster system. The cooperative search task is then reformulated as a multi-objective optimization problem whose objectives are the uncertainty of the mission area, the exploration rate, and the number of detected moving targets.
- A convolutional multi-agent deep deterministic policy gradient method with prioritized experience replay (PER-CMADDPG) is proposed to solve the multi-AUV cooperative search problem for moving targets. The method employs a CNN to extract spatial features from states and observations (an illustrative network sketch follows this list), enabling more effective information processing, and adopts a prioritized experience replay mechanism to accelerate policy convergence.
- To evaluate the performance of the proposed PER-CMADDPG method, simulation experiments are conducted, and the results are compared with those of MADDPG and the convolutional multi-agent deep deterministic policy gradient (CMADDPG). The comparative analysis verifies the effectiveness and advantages of the proposed method. In addition, the influence of factors such as the scale of the multi-AUV cluster system, the AUV speed, and the sonar detection radius on the performance of PER-CMADDPG is analyzed.
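As a rough illustration of the convolutional feature extraction mentioned in the second contribution, the sketch below shows one possible convolutional actor that maps stacked map-like observations to a bounded continuous action. The number of input channels, the grid size, and the convolutional widths are assumptions loosely matched to the hyperparameters in Section 5 (3×3 kernels with stride 1, 64 hidden units); this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvActor(nn.Module):
    """Illustrative convolutional actor: stacked observation maps -> continuous action."""

    def __init__(self, in_channels=4, grid=20, hidden=64, action_dim=1):
        super().__init__()
        # Two 3x3 convolutions with stride 1, matching the reported kernel/stride settings.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy = nn.Sequential(
            nn.Linear(32 * grid * grid, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded output, e.g. a heading-angle change
        )

    def forward(self, obs_maps):
        # obs_maps: (batch, channels, grid, grid) stacked probability/uncertainty/search/position maps
        return self.policy(self.features(obs_maps))

# Example: actions = ConvActor()(torch.rand(8, 4, 20, 20))  -> shape (8, 1), values in (-1, 1)
```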
2. Related Works
2.1. Target Search
2.2. Multi-Agent Reinforcement Learning
3. Problem Formulation
3.1. Multi-AUV Cooperative Search for Moving Targets
1. Both the multi-AUV cluster system and the targets operate at fixed but different depths, which reduces their motion from three-dimensional space to a two-dimensional plane. Because the AUVs and the targets move at different depths, only collisions among the AUVs themselves need to be considered; the vertical separation between the AUVs and the targets remains constant.
2. Each individual in the multi-AUV cluster system has a fixed initial position and moves at a constant speed. Each AUV is equipped with detection sonar; when a target enters the sonar's detection range, the AUV is able to detect it. However, because of the sonar's inherent detection probability and false alarm probability, the result of a single detection is unreliable, so the multi-AUV cluster system must cooperate to perform repeated detections to improve the reliability of the detection results (a worked Bayes-update example illustrating this effect follows the list). In addition, each AUV can share detection information only with neighboring AUVs within its communication range.
3. The targets are initially randomly distributed within the mission area and move randomly at a constant speed within that area. The targets are assumed to employ evasion strategies to avoid detection by the multi-AUV cluster system and to prevent collisions with other targets.
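To make the role of repeated detections in assumption 2 concrete, here is a minimal sketch of a Bayesian update of a single grid cell's target-existence probability from one binary sonar return; the function name and the numeric values chosen for the detection probability and false-alarm probability are illustrative assumptions, not values from the paper.

```python
def update_cell_probability(p_prior, detected, p_d=0.9, p_f=0.1):
    """Bayes update of the probability that a grid cell contains a target.

    p_prior  : prior probability that the cell contains a target
    detected : True if the sonar reported a contact in this cell at this step
    p_d      : detection probability given a target is present (assumed value)
    p_f      : false-alarm probability given no target is present (assumed value)
    """
    if detected:
        like_target, like_empty = p_d, p_f
    else:
        like_target, like_empty = 1.0 - p_d, 1.0 - p_f
    num = like_target * p_prior
    return num / (num + like_empty * (1.0 - p_prior))

# Two consecutive contacts raise a 0.5 prior to 0.9 and then to about 0.988,
# which is why cooperative, repeated detections make the result reliable.
p = update_cell_probability(0.5, True)   # 0.9
p = update_cell_probability(p, True)     # ~0.988
```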
3.2. Mathematical Model
3.2.1. Environment Model
3.2.2. AUV Model
3.2.3. Information Update and Fusion Model
3.2.4. Mathematical Formulation of the Problem
4. Reformulation and Algorithm
4.1. Dec-POMDP
- $N$ represents the number of agents;
- $S$ is the global state space, and $s \in S$ is the current state of the environment;
- $A = A_1 \times \cdots \times A_N$ is the joint action space of all agents, and $a_i \in A_i$ is the action of the $i$-th agent;
- $R(s, a)$ is the joint reward obtained by executing the joint action $a$ in the state $s$;
- $O = O_1 \times \cdots \times O_N$ is the joint observation space of all agents, and $o_i \in O_i$ is the local observation of the $i$-th agent;
- $P(s' \mid s, a)$ is the state transition probability function;
- $Z(o \mid s', a)$ is the local observation probability function;
- $\gamma$ is the discount factor (a minimal code container mirroring this tuple is sketched after the list).
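For readability, the tuple above can also be written as a small container; the field names and types below are illustrative only and simply mirror the standard Dec-POMDP components listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class DecPOMDP:
    """Container mirroring the Dec-POMDP tuple described above (names are illustrative)."""
    n_agents: int                                               # N, number of agents
    state_space: Any                                            # global state space
    joint_action_space: Sequence[Any]                           # per-agent action spaces
    reward: Callable[[Any, Sequence[Any]], float]               # joint reward R(s, a)
    joint_observation_space: Sequence[Any]                      # per-agent observation spaces
    transition: Callable[[Any, Sequence[Any], Any], float]      # P(s' | s, a)
    observation_fn: Callable[[Any, Sequence[Any], Any], float]  # probability of each local observation
    gamma: float = 0.99                                         # discount factor
```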
4.2. State Space
4.3. Observation Space
- The target probability channel of the observation is obtained from the target probability map, assuming that the target existence probability outside the mission area is 0, which means that no targets exist outside the mission area.
- The uncertainty channel is obtained from the uncertainty map. The uncertainty outside the mission area is assumed to be 0, which indicates that the AUV has complete knowledge of the region outside the mission area.
- The search channel is obtained from the environment search map, normalized by dividing by $t$. The environment search information outside the mission area is assumed to be 1, which means that there is no need to explore that region.
- The position channel is extracted from the AUV position map, and the value outside the mission area is assumed to be 0, which means that no AUV is present there (a sketch of stacking these four channels into one observation tensor follows this list).
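One way to assemble these four channels into a single observation tensor is sketched below, assuming a local window is cropped from each global map around the AUV and padded with the boundary values stated above (0 for the target probability, 0 for the uncertainty, 1 for the normalized search information, and 0 for the AUV position). The array names, window size, and cropping scheme are assumptions for illustration, not the authors' exact construction.

```python
import numpy as np

def build_observation(prob_map, uncertainty_map, search_map, position_map,
                      center_rc, window=9, t=1):
    """Stack the four map channels into one observation tensor around an AUV.

    Cells outside the mission area take the boundary values described above:
    0 (no target), 0 (fully known), 1 (no need to explore), 0 (no AUV).
    """
    pads = (0.0, 0.0, 1.0, 0.0)
    maps = (prob_map, uncertainty_map, search_map / max(t, 1), position_map)
    half = window // 2
    channels = []
    for m, pad in zip(maps, pads):
        padded = np.pad(m, half, mode="constant", constant_values=pad)
        r, c = center_rc[0] + half, center_rc[1] + half
        channels.append(padded[r - half:r + half + 1, c - half:c + half + 1])
    return np.stack(channels)  # shape: (4, window, window)

# Example: obs = build_observation(np.zeros((20, 20)), np.ones((20, 20)),
#                                  np.zeros((20, 20)), np.zeros((20, 20)),
#                                  center_rc=(0, 0), t=10)
```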
4.4. Action Space
4.5. Reward Function
4.6. Multi-AUV Cooperative Search for Moving Targets Method
4.6.1. MADDPG
4.6.2. Actor–Critic Networks Based on CNN
4.6.3. Prioritized Experience Replay
4.6.4. PER-CMADDPG
Algorithm 1. The PER-CMADDPG Algorithm.
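As a companion to Algorithm 1, the following is a minimal sketch of proportional prioritized experience replay in the style of Schaul et al. (cited in the references): transitions are sampled with probability proportional to their priority raised to a power alpha, and the sampling bias is corrected with normalized importance-sampling weights. The class name and the default values of alpha and beta are illustrative assumptions rather than settings reported in the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay: sample ~ priority^alpha, correct with IS weights."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = self.priorities[:len(self.data)] ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()  # normalized importance-sampling weights
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priorities are refreshed with the absolute TD errors of the sampled batch.
        self.priorities[idx] = np.abs(td_errors) + eps
```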
5. Experimental Results and Analysis
5.1. Simulation Settings
5.2. Result Analysis
5.2.1. Performance Analysis of the PER-CMADDPG Method
5.2.2. Influence of Environmental Parameters
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Torabi, P.; Hemmati, A. A synchronized vessel and autonomous vehicle model for environmental monitoring: Mixed integer linear programming model and adaptive matheuristic. Comput. Oper. Res. 2025, 183, 107188. [Google Scholar] [CrossRef]
- Pervukhin, D.; Kotov, D.; Trushnikov, V. Development of a Conceptual Model for the Information and Control System of an Autonomous Underwater Vehicle for Solving Problems in the Mineral and Raw Materials Complex. Energies 2024, 17, 5916. [Google Scholar] [CrossRef]
- Richmond, K.; Haden, T.; Siegel, V.; Alexander, M.; Gulley, J.; Adame, T.; Heaton, T.; Monteleone, K.; Worl, R. Field Demonstrations of Precision Mapping and Control by the SUNFISH® AUV in Support of Marine Archaeology. In Proceedings of the OCEANS 2023-MTS/IEEE US Gulf Coast, Biloxi, MS, USA, 25–28 September 2023; pp. 1–8. [Google Scholar]
- Qin, H.; Zhou, N.; Han, S.; Xue, Y. An environment information-driven online Bi-level path planning algorithm for underwater search and rescue AUV. Ocean Eng. 2024, 296, 116949. [Google Scholar] [CrossRef]
- Gan, W.; Qiao, L. Many-Versus-Many UUV Attack-Defense Game in 3D Scenarios Using Hierarchical Multi-Agent Reinforcement Learning. IEEE Internet Things J. 2025, 12, 23479–23494. [Google Scholar] [CrossRef]
- Wang, T.; Peng, X.; Hu, H.; Xu, D. Maritime Manned/unmanned Collaborative Systems and Key Technologies: A Survey. Acta Armamentarii 2024, 45, 3317. [Google Scholar]
- Wang, Y.; Liu, K.; Geng, L.; Zhang, S. Knowledge hierarchy-based dynamic multi-objective optimization method for AUV path planning in cooperative search missions. Ocean Eng. 2024, 312, 119267. [Google Scholar] [CrossRef]
- Li, Y.; Huang, Y.; Zou, Z.; Yu, Q.; Zhang, Z.; Sun, Q. Multi-AUV underwater static target search method based on consensus-based bundle algorithm and improved Glasius bio-inspired neural network. Inf. Sci. 2024, 673, 120684. [Google Scholar] [CrossRef]
- You, Y.; Xing, W.; Xie, F.; Yao, Y. Multi-AUV Static Target Search Based on Improved PSO. In Proceedings of the 2023 8th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS), Singapore, 16–18 June 2023; pp. 84–90. [Google Scholar]
- Wang, G.; Wei, F.; Jiang, Y.; Zhao, M.; Wang, K.; Qi, H. A multi-AUV maritime target search method for moving and invisible objects based on multi-agent deep reinforcement learning. Sensors 2022, 22, 8562. [Google Scholar] [CrossRef]
- Karthikeyan, R.; Rani, B.S. An innovative approach for obstacle avoidance and path planning of mobile robot using adaptive deep reinforcement learning for indoor environment. Knowl.-Based Syst. 2025, 326, 114058. [Google Scholar] [CrossRef]
- Kuo, P.H.; Chen, K.L.; Lin, Y.S.; Chiu, Y.C.; Peng, C.C. Deep reinforcement learning–based collision avoidance strategy for multiple unmanned aerial vehicles. Eng. Appl. Artif. Intell. 2025, 160, 111862. [Google Scholar] [CrossRef]
- Zhang, S.; Zeng, Q. Online Unmanned Ground Vehicle Path Planning Based on Multi-Attribute Intelligent Reinforcement Learning for Mine Search and Rescue. Appl. Sci. 2024, 14, 9127. [Google Scholar] [CrossRef]
- Liu, T.; Hu, Y.; Xu, H. Deep reinforcement learning for vectored thruster autonomous underwater vehicle control. Complexity 2021, 2021, 6649625. [Google Scholar] [CrossRef]
- Ren, H.; Gao, J.; Gao, L.; Wang, J.; He, J.; Li, S. Reinforcement Learning based Hovering Control of a Buoyancy Driven Unmanned Underwater Vehicle with Discrete Inputs. In Proceedings of the 2025 10th International Conference on Control and Robotics Engineering (ICCRE), Nagoya, Japan, 9–11 May 2025; pp. 165–170. [Google Scholar]
- Lin, Y.H.; Chiang, C.H.; Yu, C.M.; Huang, J.Y.T. Intelligent docking control of autonomous underwater vehicles using deep reinforcement learning and a digital twin system. Expert Syst. Appl. 2026, 296, 129085. [Google Scholar] [CrossRef]
- Hu, J.; Xie, L.; Lum, K.Y.; Xu, J. Multiagent information fusion and cooperative control in target search. IEEE Trans. Control Syst. Technol. 2012, 21, 1223–1235. [Google Scholar] [CrossRef]
- Ji, H.; Yao, J.; Pei, C.; Liang, H. Collaborative target search for multiple UAVs based on collaborative particle swarm optimization genetic algorithm. In International Conference on Autonomous Unmanned Systems; Springer: Berlin/Heidelberg, Germany, 2021; pp. 900–909. [Google Scholar]
- Lu, J.; Jiang, J.; Han, B.; Liu, J.; Lu, X. Dynamic target search of UAV swarm based on improved pigeon-inspired optimization. In 2021 5th Chinese Conference on Swarm Intelligence and Cooperative Control; Springer: Berlin/Heidelberg, Germany, 2022; pp. 361–371. [Google Scholar]
- Yue, W.; Xin, H.; Lin, B.; Liu, Z.C.; Li, L.L. Path planning of MAUV cooperative search for multi-intelligent targets. Kongzhi Lilun Yu Yingyong/Control Theory Appl. 2022, 39, 2065–2073. [Google Scholar] [CrossRef]
- Jiang, Z.; Sun, X.; Wang, W.; Zhou, S.; Li, Q.; Da, L. Path planning method for maritime dynamic target search based on improved GBNN. Complex Intell. Syst. 2025, 11, 296. [Google Scholar] [CrossRef]
- Zhang, B.; Lin, X.; Zhu, Y.; Tian, J.; Zhu, Z. Enhancing multi-UAV reconnaissance and search through double critic DDPG with belief probability maps. IEEE Trans. Intell. Veh. 2024, 9, 3827–3842. [Google Scholar] [CrossRef]
- Wu, J.; Luo, J.; Jiang, C.; Gao, L. A multi-agent deep reinforcement learning approach for multi-UAV cooperative search in multi-layered aerial computing networks. IEEE Internet Things J. 2024, 12, 5807–5821. [Google Scholar] [CrossRef]
- Song, R.; Gao, S.; Li, Y. A novel approach to multi-USV cooperative search in unknown dynamic marine environment using reinforcement learning. Neural Comput. Appl. 2024, 37, 16055–16070. [Google Scholar] [CrossRef]
- Li, Y.; Ma, M.; Cao, J.; Luo, G.; Wang, D.; Chen, W. A method for multi-AUV cooperative area search in unknown environment based on reinforcement learning. J. Mar. Sci. Eng. 2024, 12, 1194. [Google Scholar] [CrossRef]
- Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA, 27–29 June 1993; pp. 330–337. [Google Scholar]
- De Witt, C.S.; Gupta, T.; Makoviichuk, D.; Makoviychuk, V.; Torr, P.H.; Sun, M.; Whiteson, S. Is independent learning all you need in the StarCraft multi-agent challenge? arXiv 2020, arXiv:2011.09533. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6380–6391. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Su, K.; Qian, F. Multi-UAV cooperative searching and tracking for moving targets based on multi-agent reinforcement learning. Appl. Sci. 2023, 13, 11905. [Google Scholar] [CrossRef]
- Yang, T.; Chi, Q.; Xu, N.; Bai, J.; Wei, W.; Chen, H.; Wu, H.; Yao, Z.; Chen, W.; Lin, Y. Integrating deep Q-networks, convolutional neural networks, and artificial potential fields for enhanced search path planning of unmanned surface vessels. Ocean Eng. 2025, 335, 121338. [Google Scholar] [CrossRef]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]








| Notation | Explanation |
|---|---|
| | The mission area, the index of the grid area (x,y), and the coordinate of the grid area. |
| | The set of targets, the index of target i, the position of the target at time step t, and the grid area coordinate corresponding to the position of the target. |
| | The multi-AUV cluster system, the index of AUV j, the position of the AUV at time step t, and the grid area coordinate corresponding to the position of the AUV. |
| | The length and width of the mission area, and the length and width of a grid area. |
| | The number of grid areas in the rows and columns of the mission area. |
| | The number of targets and the number of individuals in the multi-AUV cluster system. |
| | The unit time, the current time, and the total mission time. |
| | The target presence state of the grid area. |
| | The detection radius of the sonar and the communication radius. |
| | The detection area of the AUV at time step t and the group of neighbors of the AUV at time step t. |
| | The detection result of the AUV at time step t for the grid area. |
| | The true detection probability and the false alarm probability. |
| | The heading angle of the AUV at time step t and the change in heading angle per unit time. |
| | The target probability map, uncertainty map, environment search map, and AUV position map of the AUV at time step t. |
| | The nonlinear conversion. |
| | The information decaying factor and the predetermined threshold. |
| | The safe distance between individuals in the multi-AUV cluster system. |
| | The global state space, the joint action space, the joint reward, and the joint observation. |
| | The weight coefficients of the reward function. |
| | The importance sampling weights. |
| Parameters | Value | Parameters | Value |
|---|---|---|---|
| Training episodes | 1000 | Actor network learning rate | 0.001 |
| Maximum training steps | 500 | Critic network hidden layer units | 64 |
| PER buffer capacity (B) | | Critic network learning rate | 0.001 |
| Batch size | 256 | Kernel size 1 | 3 |
| Discount factor (γ) | 0.99 | Stride 1 | 1 |
| Soft update factor (τ) | 0.01 | Kernel size 2 | 3 |
| Actor network hidden layer units | 64 | Stride 2 | 1 |
| Number of Individuals in the Multi-AUV Cluster System | Average Number of Detected Targets | Average Exploration Rate | Average Uncertainty |
|---|---|---|---|
| 3 | 1.8 | 0.8246 | 55.0767 |
| 6 | 6.0 | 0.8932 | 39.0417 |
| 9 | 6.0 | 0.9420 | 33.2131 |
| AUV Speed | Average Number of Detected Targets | Average Exploration Rate | Average Uncertainty |
|---|---|---|---|
| 1 m/s | 1.4 | 0.5467 | 54.5474 |
| 2 m/s | 6.0 | 0.8932 | 39.0417 |
| 3 m/s | 6.0 | 0.9520 | 39.3437 |
| Sonar Detection Radius | Average Number of Detected Targets | Average Exploration Rate | Average Uncertainty |
|---|---|---|---|
| 100 m | 2.8 | 0.6754 | 66.5433 |
| 200 m | 6.0 | 0.8932 | 39.0417 |
| 300 m | 5.8 | 0.9205 | 44.0516 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).