Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning
Abstract
1. Introduction
- A system model for multi-UAV cooperative searching and tracking of moving targets is constructed as an extension of the decentralized partially observable Markov decision process (Dec-POMDP). By optimizing both the average observation rate of moving targets and the average exploration rate of the mission area, the UAVs can keep perceived targets under continuous observation while exploring the unknown environment;
- A novel information update and fusion mechanism is proposed to enhance the environment perception ability of the multi-UAV system. In our model, each UAV maintains its own cognitive information about the mission region, which guides its actions under fully distributed decision-making. Through information update and fusion, the UAVs gain a better understanding of the environment and cooperate more effectively;
- A distributed MARL method is proposed to learn cooperative searching and tracking policies for the multi-UAV system. The reward function and observation space are newly designed to account for both target searching and region exploration. The effectiveness of the method is validated through simulation analysis.
2. Related Works
2.1. Target Searching and Tracking
2.2. Multi-Agent Reinforcement Learning
3. Problem Formulation
3.1. Scenario Description
- We assume that multiple homogeneous fixed-wing UAVs are used to search for multiple unknown moving targets. Each UAV flies in a two-dimensional plane at a fixed altitude, and collisions between UAVs are prevented by assigning them to different altitude layers. Targets wander randomly within the mission region;
- As shown in Figure 1, each UAV uses its sensor to detect targets below it. When a target falls within the detection range of the sensor, the UAV can detect it. However, UAVs cannot distinguish the identities or indices of targets;
- Each UAV in our model has adequate memory and processing capacity. The UAVs cooperate in a distributed framework: each UAV shares information with neighboring UAVs within its communication range and makes decisions independently based on its own cognitive information.
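The sensor described above is imperfect: the simulation parameter table lists a detection probability of 0.9 and a false alarm probability of 0.1. One standard way to fold such binary readings into a per-cell target existence probability is Bayes' rule, sketched below for a single cell. This is an assumed update shown for illustration, not necessarily the exact rule used for the target probability map in Section 4.1; the function name `bayes_update` is hypothetical.

```python
def bayes_update(prior: float, detected: bool,
                 p_d: float = 0.9, p_f: float = 0.1) -> float:
    """Posterior probability that a target occupies a cell after one sensor reading.

    prior    -- probability of target presence before the reading
    detected -- True if the sensor reported a target in this cell
    p_d      -- detection probability, P(report | target present)
    p_f      -- false alarm probability, P(report | target absent)
    """
    if detected:
        num = p_d * prior
        den = p_d * prior + p_f * (1.0 - prior)
    else:
        # A miss is either a missed detection or a correct "no target" reading.
        num = (1.0 - p_d) * prior
        den = (1.0 - p_d) * prior + (1.0 - p_f) * (1.0 - prior)
    return num / den


if __name__ == "__main__":
    p = 0.5
    for reading in (True, True, False):
        p = bayes_update(p, reading)
        print(f"reading={reading}: P(target) = {p:.4f}")
```

Repeated detections push the probability toward 1 and repeated misses push it toward 0, at a rate set by `p_d` and `p_f`.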
3.2. System Model
4. Information Update and Fusion
4.1. Target Probability Map
4.2. Environment Search Status Map
4.3. UAV Location Map
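The information update and fusion mechanism summarized in the Introduction relies on each UAV exchanging its maps with neighbors inside its communication range (Section 3.1). As a hypothetical illustration of that sharing step, the sketch below finds the UAVs within the communication range (6 in the simulation parameter table) and fuses environment search status maps by keeping, for each cell, the most recent search time step. The Euclidean distance test, the assumption that each cell stores the time step at which it was last searched, the max-fusion rule and the function names are all assumptions, not the authors' stated method.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[int]]


def neighbors(positions: Dict[int, Tuple[float, float]], uav: int,
              comm_range: float = 6.0) -> List[int]:
    """IDs of the other UAVs within Euclidean distance `comm_range` of `uav`."""
    x0, y0 = positions[uav]
    return sorted(
        j for j, (x, y) in positions.items()
        if j != uav and math.hypot(x - x0, y - y0) <= comm_range
    )


def fuse_search_status(own: Grid, received: List[Grid]) -> Grid:
    """Cell-wise maximum: keep the most recent search time step known for each cell."""
    fused = [row[:] for row in own]  # copy so the UAV's own map is not modified
    for other in received:
        for r, row in enumerate(other):
            for c, value in enumerate(row):
                fused[r][c] = max(fused[r][c], value)
    return fused


if __name__ == "__main__":
    pos = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (10.0, 0.0)}
    print(neighbors(pos, 0))
    print(fuse_search_status([[0, 2], [1, 0]], [[[3, 0], [0, 0]]]))
```

Because the maximum is commutative and idempotent, a UAV obtains the same fused map regardless of the order in which neighbors' maps arrive, and receiving the same map twice changes nothing.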
5. Proposed Method
5.1. Preliminaries
5.1.1. Decentralized Partially Observable Markov Decision Process
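- $N = \{1, \dots, n\}$ denotes the set of $n$ agents;
- $S$ denotes the global state space and $s \in S$ denotes the current state;
- $A = A_1 \times \cdots \times A_n$ denotes the joint action space of all agents and $a_i \in A_i$ is the action of the $i$-th agent;
- $P(s' \mid s, \mathbf{a})$ is the state transition probability function from state $s$ to the next state $s'$ given the joint action $\mathbf{a} = (a_1, \dots, a_n)$;
- $R(s, \mathbf{a})$ denotes the joint reward obtained by executing the joint action $\mathbf{a}$ in state $s$;
- $\Omega = \Omega_1 \times \cdots \times \Omega_n$ represents the joint observation space of all agents and $o_i \in \Omega_i$ denotes the local observation of the $i$-th agent;
- $O_i(o_i \mid s)$ is the local observation function of the $i$-th agent given the global state $s$;
- $\gamma \in [0, 1)$ denotes the constant discount factor.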
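In the Dec-POMDP above, all agents share the joint reward and jointly aim to maximize the expected discounted return, i.e. the sum of rewards weighted by powers of the discount factor. The helper below computes the discounted return from every time step of one recorded episode with the backward recursion G_t = r_t + gamma * G_{t+1}. The default of 0.99 is the discount factor listed in the training hyperparameter table; the function name is illustrative.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t = sum_{k >= t} gamma**(k - t) * r_k for every step t."""
    out = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


if __name__ == "__main__":
    print(returns_to_go([1.0, 1.0, 1.0], gamma=0.5))
```

The backward pass costs one multiply-add per step, which matters little for a 200-step episode but keeps the computation numerically identical to the definition.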
5.1.2. Multi-Agent Proximal Policy Optimization
5.2. Observation and Action Space Representation
- Target Probability Information: extracted from the target probability map; the target existence probability outside the mission region is assumed to be 0;
- Environment Search Status Information: extracted from the environment search status map and divided by the current time step $t$ for normalization. Cells outside the mission region are assigned the value 1, meaning that they do not need to be explored;
- UAV Location Information: extracted from the UAV location map. Cells outside the mission region are assigned the value 0, meaning that no UAV is present there.
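The three channels listed above can be assembled into one local observation per UAV. The sketch assumes that each map is a 2-D grid indexed as `grid[row][col]`, that the observation is a square window of half-width `half` around the UAV's cell (how the "range of observation field" of 7 in the parameter table maps onto this window is an assumption), and that `t >= 1`. The padding values follow the text: 0, 1 and 0 for cells outside the mission region. Function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def crop(grid: Grid, center: Tuple[int, int], half: int, pad: float) -> Grid:
    """Return the (2*half+1) x (2*half+1) window centered on `center`.

    Cells that fall outside the mission region are filled with `pad`.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = center
    return [
        [grid[r][c] if 0 <= r < rows and 0 <= c < cols else pad
         for c in range(c0 - half, c0 + half + 1)]
        for r in range(r0 - half, r0 + half + 1)
    ]


def local_observation(target_prob: Grid, search_status: Grid, uav_loc: Grid,
                      center: Tuple[int, int], half: int, t: int) -> List[Grid]:
    """Build the three observation channels around the UAV at `center`."""
    # Normalize the search status map by the current time step t (t >= 1 assumed).
    normalized = [[value / t for value in row] for row in search_status]
    return [
        crop(target_prob, center, half, pad=0.0),  # no target outside the region
        crop(normalized, center, half, pad=1.0),   # nothing left to explore outside
        crop(uav_loc, center, half, pad=0.0),      # no UAV outside the region
    ]


if __name__ == "__main__":
    obs = local_observation([[0.5, 0.2], [0.1, 0.3]], [[2, 0], [1, 2]],
                            [[1, 0], [0, 0]], center=(0, 0), half=1, t=2)
    for channel in obs:
        print(channel)
```

Padding outside cells as "already explored" means the normalized search status channel never attracts a UAV toward the region boundary.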
5.3. Reward Function Design
5.4. Training Framework Based on MAPPO
Algorithm 1: MAPPO
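MAPPO trains each agent's policy with PPO's clipped surrogate objective, using advantages estimated with a centralized value function. The sketch below computes only the clipped objective for a batch of samples, with the clip factor of 0.2 from the training hyperparameter table as its default. The function name and the plain-list inputs are illustrative assumptions; a practical implementation would maximize this quantity (or minimize its negative) with an automatic differentiation framework and the Adam optimizer.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    r = pi_new(a|o) / pi_old(a|o) is recovered from the log-probabilities.
    """
    if not advantages or not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from pushing the ratio
        # beyond [1 - eps, 1 + eps] in the direction favored by the advantage.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

With a positive advantage the objective saturates once the ratio exceeds 1.2; with a negative advantage the unclipped, more pessimistic term is kept.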
6. Simulation Results
6.1. Setting Up
6.2. Convergence Analysis
6.3. Performance Analysis
6.4. Comparison with Other Methods
- A-CMOMMT: A-CMOMMT [16] is a traditional method to solve the target searching and tracking problem. In this method, control of multiple agents is based on a combination of force vectors that are attractive for nearby targets and repulsive for nearby agents;
- ACO: Ant colony optimization (ACO), a swarm intelligence algorithm, has also been applied to the target searching and tracking problem [21,43]. In this approach, all cells are initialized with the same pheromone level, and the pheromone of cells visited by UAVs evaporates at every time step. The pheromone map, the target existence probability and the UAVs' locations are included in the heuristic information that guides UAV decisions;
- Random: At each time step, each agent randomly selects an action from its candidate actions.
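The A-CMOMMT baseline combines attractive and repulsive force vectors. The simplified sketch below sums unit vectors pointing toward targets within `target_range` and away from other agents within `agent_range`. The unit magnitudes, the weights and both default ranges (borrowed from the sensing range of 3 and the communication range of 6 in the simulation parameter table) are assumptions; the original formulation [16] uses distance-dependent force magnitudes and additional target weighting.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def resultant_force(me: Vec, targets: List[Vec], agents: List[Vec],
                    target_range: float = 3.0, agent_range: float = 6.0,
                    w_attract: float = 1.0, w_repel: float = 1.0) -> Vec:
    """Sum of attractive (toward targets) and repulsive (away from agents) unit vectors."""
    fx = fy = 0.0
    for tx, ty in targets:
        d = math.hypot(tx - me[0], ty - me[1])
        if 0.0 < d <= target_range:
            # Attraction: unit vector from this UAV toward the target.
            fx += w_attract * (tx - me[0]) / d
            fy += w_attract * (ty - me[1]) / d
    for ax, ay in agents:
        d = math.hypot(ax - me[0], ay - me[1])
        if 0.0 < d <= agent_range:
            # Repulsion: unit vector from the other agent toward this UAV.
            fx -= w_repel * (ax - me[0]) / d
            fy -= w_repel * (ay - me[1]) / d
    return fx, fy


if __name__ == "__main__":
    print(resultant_force((0.0, 0.0), [(2.0, 0.0)], [(0.0, 1.0)]))
```

A UAV restricted to discrete moves could then pick the candidate heading with the largest dot product with this vector; that mapping is likewise an assumption.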
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Michael, N.; Mellinger, D.; Lindsey, Q.; Kumar, V. The GRASP multiple micro-UAV testbed. IEEE Robot. Autom. Mag. 2010, 17, 56–65.
2. Kumar, V.; Michael, N. Opportunities and challenges with autonomous micro aerial vehicles. Int. J. Robot. Res. 2012, 31, 1279–1291.
3. How, J.P.; Fraser, C.; Kulling, K.C.; Bertuccelli, L.F.; Toupet, O.; Brunet, L.; Bachrach, A.; Roy, N. Increasing autonomy of UAVs. IEEE Robot. Autom. Mag. 2009, 16, 43–51.
4. Raj, J.; Raghuwaiya, K.; Vanualailai, J. Collision avoidance of 3D rectangular planes by multiple cooperating autonomous agents. J. Adv. Transp. 2020, 2020, 4723687.
5. Qi, J.; Song, D.; Shang, H.; Wang, N.; Hua, C.; Wu, C.; Qi, X.; Han, J. Search and rescue rotary-wing UAV and its application to the Lushan Ms 7.0 earthquake. J. Field Robot. 2016, 33, 290–321.
6. Ablavsky, V.; Snorrason, M. Optimal search for a moving target—A geometric approach. In Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, Denver, CO, USA, 14–17 August 2000; p. 4060.
7. Jung, B.; Sukhatme, G.S. A region-based approach for cooperative multi-target tracking in a structured environment. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; Volume 3, pp. 2764–2769.
8. Tang, Z.; Ozguner, U. Motion planning for multitarget surveillance with mobile sensor agents. IEEE Trans. Robot. 2005, 21, 898–908.
9. Lanillos, P.; Gan, S.K.; Besada-Portas, E.; Pajares, G.; Sukkarieh, S. Multi-UAV target search using decentralized gradient-based negotiation with expected observation. Inf. Sci. 2014, 282, 92–110.
10. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
11. Garaffa, L.C.; Basso, M.; Konzen, A.A.; de Freitas, E.P. Reinforcement learning for mobile robotics exploration: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 3796–3810.
12. Crandall, J.W.; Goodrich, M.A. Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning. Mach. Learn. 2011, 82, 281–314.
13. Koopman, B.O. The theory of search. I. Kinematic bases. Oper. Res. 1956, 4, 324–346.
14. Raap, M.; Preuß, M.; Meyer-Nieberg, S. Moving target search optimization—A literature review. Comput. Oper. Res. 2019, 105, 132–140.
15. Bertuccelli, L.F.; How, J.P. Robust UAV search for environments with imprecise probability maps. In Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain, 15 December 2005; pp. 5680–5685.
16. Parker, L.E. Distributed algorithms for multi-robot observation of multiple moving targets. Auton. Robot. 2002, 12, 231–255.
17. Ding, Y.; Zhu, M.; He, Y.; Jiang, J. P-CMOMMT algorithm for the cooperative multi-robot observation of multiple moving targets. In Proceedings of the 2006 6th World Congress on Intelligent Control and Automation, Dalian, China, 21–23 June 2006; Volume 2, pp. 9267–9271.
18. Jilkov, V.P.; Li, X.R. On fusion of multiple objectives for UAV search & track path optimization. J. Adv. Inf. Fusion 2009, 4, 27–39.
19. Booth, K.E.C.; Piacentini, C.; Bernardini, S.; Beck, J.C. Target Search on Road Networks with Range-Constrained UAVs and Ground-Based Mobile Recharging Vehicles. IEEE Robot. Autom. Lett. 2020, 5, 6702–6709.
20. Phung, M.D.; Ha, Q.P. Motion-encoded particle swarm optimization for moving target search using UAVs. Appl. Soft Comput. 2020, 97, 106705.
21. Zhen, Z.; Chen, Y.; Wen, L.; Han, B. An intelligent cooperative mission planning scheme of UAV swarm in uncertain dynamic environment. Aerosp. Sci. Technol. 2020, 100, 105826.
22. Duan, H.; Zhao, J.; Deng, Y.; Shi, Y.; Ding, X. Dynamic discrete pigeon-inspired optimization for multi-UAV cooperative search-attack mission planning. IEEE Trans. Aerosp. Electron. Syst. 2020, 57, 706–720.
23. Hayat, S.; Yanmaz, E.; Brown, T.X.; Bettstetter, C. Multi-objective UAV path planning for search and rescue. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 5569–5574.
24. Wang, T.; Qin, R.; Chen, Y.; Snoussi, H.; Choi, C. A reinforcement learning approach for UAV target searching and tracking. Multimed. Tools Appl. 2019, 78, 4347–4364.
25. Yan, P.; Jia, T.; Bai, C. Searching and tracking an unknown number of targets: A learning-based method enhanced with maps merging. Sensors 2021, 21, 1076.
26. Zhou, W.; Li, J.; Liu, Z.; Shen, L. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning. Chin. J. Aeronaut. 2022, 35, 100–112.
27. Shen, G.; Lei, L.; Zhang, X.; Li, Z.; Cai, S.; Zhang, L. Multi-UAV Cooperative Search Based on Reinforcement Learning with a Digital Twin Driven Training Framework. IEEE Trans. Veh. Technol. 2023, 72, 8354–8368.
28. Oroojlooy, A.; Hajinezhad, D. A review of cooperative multi-agent deep reinforcement learning. Appl. Intell. 2023, 53, 13677–13722.
29. Tan, M. Multi-agent reinforcement learning: Independent vs. cooperative learning. In Readings in Agents; Morgan Kaufmann: Burlington, MA, USA, 1997; pp. 487–494.
30. Foerster, J.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
31. De Witt, C.S.; Gupta, T.; Makoviichuk, D.; Makoviychuk, V.; Torr, P.H.; Sun, M.; Whiteson, S. Is independent learning all you need in the StarCraft multi-agent challenge? arXiv 2020, arXiv:2011.09533.
32. Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 1–12.
33. Oliehoek, F.A.; Spaan, M.T.; Vlassis, N. Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 2008, 32, 289–353.
34. Iqbal, S.; Sha, F. Actor-attention-critic for multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2961–2970.
35. Yu, C.; Velu, A.; Vinitsky, E.; Gao, J.; Wang, Y.; Bayen, A.; Wu, Y. The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inf. Process. Syst. 2022, 35, 24611–24624.
36. Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296.
37. Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 7234–7284.
38. Su, J.; Adams, S.; Beling, P. Value-decomposition multi-agent actor-critics. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 11352–11360.
39. Millet, T.; Casbeer, D.; Mercker, T.; Bishop, J. Multi-agent decentralized search of a probability map with communication constraints. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, Toronto, ON, Canada, 2–5 August 2010; p. 8424.
40. Zhong, M.; Cassandras, C.G. Distributed coverage control and data collection with mobile sensor networks. IEEE Trans. Autom. Control 2011, 56, 2445–2455.
41. Hu, J.; Xie, L.; Lum, K.Y.; Xu, J. Multiagent information fusion and cooperative control in target search. IEEE Trans. Control Syst. Technol. 2012, 21, 1223–1235.
42. Dibangoye, J.S.; Amato, C.; Buffet, O.; Charpillet, F. Optimally solving Dec-POMDPs as continuous-state MDPs. J. Artif. Intell. Res. 2016, 55, 443–497.
43. Zhen, Z.; Xing, D.; Gao, C. Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm. Aerosp. Sci. Technol. 2018, 76, 402–411.

| Reference | Method | Sensor Model | Communication Range | Target | 
|---|---|---|---|---|
| [16] | A-CMOMMT | Deterministic | Limited | Moving | 
| [17] | P-CMOMMT | Deterministic | Limited | Moving | 
| [18] | MOO | Probabilistic | Unlimited | Moving | 
| [19] | CP | Deterministic | Unlimited | Moving | 
| [20] | PSO | Probabilistic | Unlimited | Moving | 
| [21] | ACO | Probabilistic | Limited | Moving | 
| [22] | PIO | Probabilistic | Limited | Static | 
| [23] | GA | Deterministic | Limited | Static | 
| [24] | RL | Probabilistic | Unlimited | Moving | 
| [25] | DRL | Deterministic | Limited | Moving | 
| [26] | MARL | Deterministic | Limited | Moving | 
| [27] | MARL | Probabilistic | Limited | Static | 

| Parameters | Value |
|---|---|
| Total mission time steps (T) | 200 |
| Detection probability | 0.9 |
| False alarm probability | 0.1 |
| Sensing range | 3 |
| Communication range | 6 |
| Range of observation field | 7 |
| Information decaying factor | 0.1 |
|  | 2 |
|  | 1 |
|  | 0.5 |

| Parameters | Value |
|---|---|
| Number of steps to execute (E) |  |
| Batch size (B) | 16 |
| Learning rate |  |
| Discount factor | 0.99 |
| Clip factor | 0.2 |
| Optimizer | Adam |

| Method | 4 UAVs | 6 UAVs | 8 UAVs | 10 UAVs |
|---|---|---|---|---|
| MAPPO | 0.143 | 0.174 | 0.176 | 0.185 |
| A-CMOMMT | 0.228 | 0.237 | 0.231 | 0.239 |
| ACO | 0.226 | 0.254 | 0.220 | 0.193 |
| Random | 0.119 | 0.152 | 0.148 | 0.168 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Su, K.; Qian, F. Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning. Appl. Sci. 2023, 13, 11905. https://doi.org/10.3390/app132111905
Su K, Qian F. Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning. Applied Sciences. 2023; 13(21):11905. https://doi.org/10.3390/app132111905
Chicago/Turabian Style: Su, Kai, and Feng Qian. 2023. "Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning" Applied Sciences 13, no. 21: 11905. https://doi.org/10.3390/app132111905
APA Style: Su, K., & Qian, F. (2023). Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning. Applied Sciences, 13(21), 11905. https://doi.org/10.3390/app132111905