Gemini: A Cascaded Dual-Agent DRL Framework for Task Chain Planning in UAV-UGV Collaborative Disaster Rescue
Abstract
1. Introduction
1.1. Background
1.2. Challenges
- How can task chains be planned to account simultaneously for the communication and platform resources constrained by disaster conditions, so that more rescue missions are served? Service capability enhancement methods pool resources by allocating resource loads to UAVs and UGVs according to multi-dimensional mission requirements in disaster zones; network resource load balancing methods pool resources by connecting the UAVs and UGVs tasked with serving the same disaster response mission to improve capability.
- How can task chains be planned so that UAVs and UGVs cooperate effectively in disaster-altered environments to serve complex rescue missions? Complex missions often demand that UAV and UGV platforms cooperate effectively to dispatch resources, yet current methods rarely consider collaborative service policies for UAV-UGV collaboration under dynamic disaster disturbances.
- How can task chains be planned so that feedback is returned in time for large-scale disaster scenarios? Current methods, based on mathematical programming and rule-based heuristic algorithms, are inefficient and cannot address large-scale disaster scenarios in real time.
1.3. Contributions
- Task Chain Planning Model: This paper comprehensively considers the limits on communication and platform resources in UAV-UGV collaboration, then formalizes the task chain planning problem as an integer programming model, which decouples the problem into a joint optimization of chain selection and resource allocation. This ensures that the planned task chains account for both the limited resources and the services that missions require (Addresses Challenge 1).
- Cooperation Policy: This paper proposes a cooperation policy for task chain planning to compensate for the low robustness of a single task chain. By planning multiple task chains for each rescue mission, the cooperation policy enhances the platforms’ coordination capabilities, enabling UAVs to cooperate with UGVs to serve more complex rescue missions (Addresses Challenge 2).
- Cascaded DRL Framework: Building on the proposed integer programming model, this paper proposes Gemini, a cascaded dual-agent DRL framework that handles chain selection and resource allocation simultaneously. Gemini independently trains two agents, DRL-P and DRL-R, which collaboratively optimize the overall performance and dynamically plan task chains according to the HSN state and resource requirements. Once trained, Gemini can generate task chains quickly in large-scale disaster scenarios (Addresses Challenge 3).
1.4. Organization
2. Related Work
3. Preliminaries
3.1. HSN and Task Chain
- Sensing entity: Responsible for environmental monitoring and data collection. Equipped with sensors and reconnaissance modules, sensing entities generate dynamic situational awareness by aggregating mission intelligence.
- Deciding entity: Makes decisions through embedded reasoning algorithms. Deciding entities analyze inputs from sensing entities, evaluate the benefit of executable actions, and generate the best decision.
- Influencing entity: Implements actions based on received directives. Influencing entities are configured with limited loads to perform missions.
- The head node is a sensing entity.
- The tail node is an influencing entity.
- There is an intermediate node which is a deciding entity.
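The three structural conditions above can be sketched as a small validity check. This is a minimal illustration, not the paper's implementation: the `EntityType` enum and the `is_valid_task_chain` function are hypothetical names, and a chain is assumed to be given simply as an ordered sequence of entity types.

```python
from enum import Enum
from typing import Sequence


class EntityType(Enum):
    """Hypothetical labels for the three entity roles described above."""
    SENSING = "sensing"
    DECIDING = "deciding"
    INFLUENCING = "influencing"


def is_valid_task_chain(chain: Sequence[EntityType]) -> bool:
    """Return True if the chain meets the three structural conditions."""
    if len(chain) < 3:
        # A head, at least one intermediate node, and a tail are required.
        return False
    head, *middle, tail = chain
    return (
        head is EntityType.SENSING
        and tail is EntityType.INFLUENCING
        and any(node is EntityType.DECIDING for node in middle)
    )


if __name__ == "__main__":
    chain = [EntityType.SENSING, EntityType.DECIDING, EntityType.INFLUENCING]
    print(is_valid_task_chain(chain))
```

The check only covers node roles; connectivity and resource feasibility are outside this sketch.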
3.2. Problem Description
4. Task Chain Planning Model
4.1. Assumptions
- All missions are detected simultaneously and remain stationary while the HSN plans task chains.
- Entities can quickly share intelligence with all other entities through communication protocols. Every connection in a task chain has the same communication requirement.
- Task chains are assumed to execute successfully once they are planned. Because a mission takes a relatively long time to execute, the resources consumed during execution cannot be released.
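One consequence of the last assumption is that resource consumption only accumulates during planning. The sketch below illustrates that bookkeeping under stated assumptions: the `ResourceLedger` class, its fields, and the entity identifiers are hypothetical, and each entity's resources are reduced to a single scalar for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceLedger:
    """Tracks remaining resources per entity (hypothetical structure)."""
    remaining: Dict[str, float]
    committed: List[Dict[str, float]] = field(default_factory=list)

    def try_commit(self, demand: Dict[str, float]) -> bool:
        """Commit a planned chain's demand if every entity can cover it."""
        if any(self.remaining.get(e, 0.0) < d for e, d in demand.items()):
            # Reject the whole chain; nothing is partially consumed.
            return False
        for entity, amount in demand.items():
            # Consumed resources are never released while planning continues.
            self.remaining[entity] -= amount
        self.committed.append(dict(demand))
        return True


if __name__ == "__main__":
    ledger = ResourceLedger({"uav1": 5.0, "ugv1": 3.0})
    print(ledger.try_commit({"uav1": 2.0, "ugv1": 3.0}))
    print(ledger.try_commit({"uav1": 1.0, "ugv1": 1.0}))
```

Because there is no release step, the order in which chains are committed affects how many later missions can still be served.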
4.2. Entity Model
4.3. Resource Allocation
4.4. Task Chain Planning
4.5. Optimization Objective
4.6. Complexity Analysis
5. Gemini Design
5.1. General Overview
5.2. Elements of Gemini
5.2.1. Elements of DRL-R
5.2.2. Elements of DRL-P
5.3. Training Process
Algorithm 1: Training Process of DRL-R.
Algorithm 2: Training Process of DRL-P.
6. Performance Evaluation
6.1. Settings for Numerical Simulation
6.1.1. Data Preparation
6.1.2. Device Configuration
6.1.3. Metrics
6.1.4. Comparison Method
6.2. Training Performance of Gemini
6.3. Numerical Simulation Result
6.3.1. Simulation in Task Chain Planning
6.3.2. Simulation in the Service Benefit
6.3.3. Simulation in Different Proportions of Entities
6.3.4. Sensitivity Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Category | Hyperparameter | DRL-R | DRL-P |
---|---|---|---|
Model | Static input size | 55 | 8 |
Model | Hidden size | 128 | 128 |
Model | Number of layers | 1 | 1 |
Model | Dropout rate | 0.1 | 0.1 |
Training | Actor learning rate | | |
Training | Critic learning rate | | |
Training | Batch size | 32 | 6 |
Training | Max gradient norm | 2.0 | 2.0 |
Training | Training set size | 2048 | 5000 |
Training | Validation set size | 10 | 20 |
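For readers reproducing the setup, the hyperparameters in the table above could be captured in a typed configuration object. This is a minimal sketch: the `AgentHyperparameters` class and its field names are hypothetical, and the actor and critic learning rates are left as `None` because their values do not appear in the table as extracted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentHyperparameters:
    """Hypothetical container for one agent's settings from the table."""
    static_input_size: int
    hidden_size: int
    num_layers: int
    dropout_rate: float
    batch_size: int
    max_grad_norm: float
    training_set_size: int
    validation_set_size: int
    # Learning-rate values are missing from the table, so they stay unset.
    actor_lr: Optional[float] = None
    critic_lr: Optional[float] = None


DRL_R = AgentHyperparameters(
    static_input_size=55, hidden_size=128, num_layers=1, dropout_rate=0.1,
    batch_size=32, max_grad_norm=2.0,
    training_set_size=2048, validation_set_size=10,
)

DRL_P = AgentHyperparameters(
    static_input_size=8, hidden_size=128, num_layers=1, dropout_rate=0.1,
    batch_size=6, max_grad_norm=2.0,
    training_set_size=5000, validation_set_size=20,
)
```

Freezing the dataclass keeps the two agents' settings immutable, which makes accidental cross-contamination between the DRL-R and DRL-P configurations less likely.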
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wen, M.; Guo, Y.; Qiu, C.; Ren, B.; Zhang, M.; Luo, X. Gemini: A Cascaded Dual-Agent DRL Framework for Task Chain Planning in UAV-UGV Collaborative Disaster Rescue. Drones 2025, 9, 492. https://doi.org/10.3390/drones9070492