Long-Endurance Collaborative Search and Rescue Based on Maritime Unmanned Systems and Deep-Reinforcement Learning †
Abstract
1. Introduction
- We propose a multi-UAV cooperative search (ACS) algorithm that leverages MARL and a probability map for the first-phase search. We then design a second-phase further search (SFS) algorithm for the multi-USV fleet, which refines the probability map produced by the first-phase UAV search.
- To deal with the energy constraints of UAVs and sustain long-endurance cooperative maritime search operations, we design a multi-USV charging scheduling (SCS) algorithm based on MADDPG, utilizing multiple USVs as mobile charging stations to prolong the flight time of the UAVs.
- We conduct extensive simulations to evaluate the feasibility and effectiveness of the proposed scheme.
2. Related Work
3. System Model
3.1. Probability Map Model
3.2. Energy Consumption Model
3.3. Wireless Charging Model
4. The Proposed Scheme
4.1. General Description
4.2. Search Algorithms of Multi-UAV and Multi-USV
4.2.1. ACS Algorithm
- State space $\mathcal{S}$: To search the task area, each UAV searches for the target and observes the state information of the other UAVs, where $\mathcal{S}_i$ is the state space of UAV $i$ and $s_i^t \in \mathcal{S}_i$ denotes the state of UAV $i$ at time $t$.
- Observation space $\mathcal{O}$: $\mathcal{O} = \{o_1^t, o_2^t, \ldots, o_N^t\}$ is the set of observed values of all UAVs, where $o_i^t$ represents the observation of UAV $i$, including the observed position of UAV $i$, the positions of the other UAVs, and the energy consumption of the UAVs. Thus, the observation information can be expressed as $o_i^t = \{p_i^t, \{p_j^t\}_{j \neq i}, e_i^t\}$.
- Action space $\mathcal{A}$: $\mathcal{A}$ includes all actions that the UAVs may undertake. In the search activities, the action taken by a UAV comprises a moving direction and a moving distance, so the action taken by UAV $i$ at time $t$ is expressed as $a_i^t = \{\theta_i^t, l_i^t\}$, where $\theta_i^t$ is the heading direction and $l_i^t$ the distance moved.
- State transition probability $\mathcal{P}$: $\mathcal{P}(s^{t+1} \mid s^t, a^t)$ represents the probability of reaching the next state $s^{t+1}$ after executing action $a^t$ in state $s^t$.
- Reward function $\mathcal{R}$: An appropriate reward function can help the UAVs explore better actions. The main objectives pursued by each UAV are to cover the unexplored area as soon as possible, to minimize the energy consumption of the UAVs, and to avoid collisions with other UAVs. Therefore, the reward function is composed of the following terms (a minimal code sketch of the composite reward follows this list).
  - Target reward: This term encourages the UAV to find the target as soon as possible and mark its location. When the probability-map value of a cell satisfies $P_m^t \geq P_{th}$, where $P_{th}$ is a threshold, the UAV is considered to have determined the target location and obtains the target reward $r_{tar}$.
  - Coverage reward: This term guides the UAVs to quickly cover the mission area with fewer repetitive searches, rewarding coverage of as much unexplored area as possible with the search reward $r_{cov}$.
  - Collision penalty: A penalty term $r_{col}$ guides each UAV not to collide with the other UAVs [39].

In summary, the whole reward function is the sum of these terms, $r_i^t = r_{tar} + r_{cov} + r_{col}$.
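As a concrete illustration of how these three terms combine, the following Python sketch computes the ACS reward for one UAV at a single time step. All weights, the threshold value, and the function name are illustrative assumptions rather than values from the paper; only the 3 m safe distance between UAVs is taken from the simulation setup.

```python
# Minimal sketch of the ACS reward, combining the three terms above.
# All weights and the threshold P_TH are assumed values, not the paper's.
P_TH = 0.9          # probability-map threshold for declaring a target (assumed)
R_TARGET = 10.0     # bonus for marking a target cell (assumed)
W_NEW = 1.0         # weight for newly covered cells (assumed)
W_REPEAT = 0.2      # penalty weight for revisited cells (assumed)
R_COLLISION = -5.0  # collision penalty (assumed)
D_SAFE_UAV = 3.0    # safe distance between UAVs, 3 m from the setup table

def acs_reward(cell_prob, new_cells, revisited_cells, dists_to_other_uavs):
    """Reward of one UAV at step t: target + coverage + collision terms."""
    r_tar = R_TARGET if cell_prob >= P_TH else 0.0
    r_cov = W_NEW * new_cells - W_REPEAT * revisited_cells
    r_col = R_COLLISION if any(d < D_SAFE_UAV for d in dists_to_other_uavs) else 0.0
    return r_tar + r_cov + r_col
```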
Algorithm 1. The proposed ACS Algorithm.
4.2.2. SFS Algorithm
Algorithm 2. The proposed SFS Algorithm.
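Although the SFS pseudocode box is not reproduced above, the refinement of the probability map handed over by the UAVs can be illustrated with a standard recursive Bayes update per grid cell, using the detection probability d = 0.9 and false-alarm probability f = 0.1 listed in the simulation setup. This is a minimal sketch under those assumptions, not necessarily the paper's exact update rule.

```python
def bayes_map_update(p, sensor_fired, d=0.9, f=0.1):
    """Recursive Bayes update of one probability-map cell.

    p: prior probability that the cell contains a target.
    sensor_fired: whether the sensor reported a detection on this pass.
    d: detection probability; f: false-alarm probability (setup values).
    """
    if sensor_fired:
        # Detection: posterior rises toward 1.
        return d * p / (d * p + f * (1.0 - p))
    # No detection: posterior falls toward 0.
    return (1.0 - d) * p / ((1.0 - d) * p + (1.0 - f) * (1.0 - p))
```

With these parameters, a single detection raises a cell's probability from the uninformative prior 0.5 to 0.9, while a silent pass lowers it to 0.1, so a handful of passes is enough for the USVs to confirm or rule out a candidate cell.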
4.3. SCS Algorithm
- State space $\mathcal{S}$: $\mathcal{S}$ is the global environment information of the system, including the position coordinates of the USVs and the UAV, the energy level of each USV, and its current working state, which is represented as a binary variable $c_i^t$. If $c_i^t$ is set to 1, it indicates that the USV is engaged in the charging process; otherwise, $c_i^t$ equals 0. We use $s_i^t$ to represent the state of USV $i$ at time slot $t$, i.e., $s_i^t = \{p_i^t, e_i^t, c_i^t\}$.
- Observation space $\mathcal{O}$: $\mathcal{O}$ is the set of observations of all USVs. In a multi-agent system, each USV determines its action based on its own current state as well as the current states of its nearby USVs. The observed values include the location information of the USV itself and of its neighbors, together with the location information and charging urgency of the UAV. The observation of USV $i$ is $o_i^t = \{p_i^t, \{p_j^t\}_{j \in \mathcal{N}_i}, p_{uav}^t, u^t\}$, where $\mathcal{N}_i$ denotes the neighbors of USV $i$ and $u^t$ the charging urgency of the UAV.
- Action space $\mathcal{A}$: $\mathcal{A}$ contains all actions that the USVs may take during the course of exploration, including the direction and distance of movement. The action of USV $i$ at time slot $t$ is represented as $a_i^t = \{\theta_i^t, l_i^t\}$.
- State transition probability $\mathcal{P}$: This describes the probability $\mathcal{P}(s^{t+1} \mid s^t, a^t)$ that the system transits to state $s^{t+1}$ after performing action $a^t$ in state $s^t$.
- Reward function: $r_i^t$ denotes the reward of USV $i$ at time slot $t$, given that the agent observes a system state $s_i^t$ and takes an action $a_i^t$. The objective of the USVs is to learn the optimal strategy $\pi^*$ that maximizes the cumulative reward while interacting with the environment. Therefore, we design a reward function based on the local information of each agent as well as collaborative information, to incentivize the USVs to search for the target and maintain the UAV battery level while minimizing the energy consumed by movement. The reward function of USV $i$ at time $t$ is the sum of the following components (a minimal code sketch follows this list).
  - Energy consumption: The energy consumed by each USV is determined by the distance it travels, and our objective is to devise an optimal path that minimizes this energy; the term $r_e$ therefore penalizes movement energy.
  - Charging profit: The term $r_c$ indicates the profit obtained by successfully charging a UAV at time slot $t$.
  - Charging-failure penalty: The term $r_f$ represents a penalty imposed on a USV if it fails to charge a UAV in time and the UAV's remaining battery falls below the threshold $E_{th}$.
  - Collision penalty: If there is a collision between USVs, a penalty term $r_{col}$ is applied.
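As with the ACS reward, the sketch below assembles the SCS reward components for one USV at slot t. The 5 m safe distance between USVs and the battery threshold (20% of the 97.58 Wh capacity) come from the simulation setup; every weight and helper name is an illustrative assumption.

```python
E_MAX = 97.58        # maximum UAV energy capacity in Wh (setup table)
E_TH = 0.2 * E_MAX   # battery energy threshold: 20% of capacity
D_SAFE_USV = 5.0     # safe distance between USVs, 5 m (setup table)

def scs_reward(search_gain, move_energy, charged_uav, uav_energy,
               dists_to_other_usvs, w_e=0.1, r_charge=8.0,
               r_fail=-10.0, r_col=-5.0):
    """Reward of one USV at slot t (all weights are assumed values)."""
    r = search_gain - w_e * move_energy          # search term minus r_e
    if charged_uav:
        r += r_charge                            # charging profit r_c
    elif uav_energy < E_TH:
        r += r_fail                              # charging-failure penalty r_f
    if any(dd < D_SAFE_USV for dd in dists_to_other_usvs):
        r += r_col                               # collision penalty
    return r
```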
Algorithm 3. The proposed SCS Algorithm.
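Since the SCS algorithm is built on MADDPG (Lowe et al.), the PyTorch sketch below shows its core idea: each USV keeps a decentralized actor, while its critic is trained on the joint observations and actions of all USVs. The discount factor (0.95), learning rate (0.01), and soft target-update rate (0.01) follow the simulation setup; the network sizes, batch layout, and function names are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, TAU, LR = 0.95, 0.01, 0.01   # values from the simulation setup

def mlp(n_in, n_out):
    # Hidden width of 128 is an assumed choice for this sketch.
    return nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, n_out))

class MADDPGAgent:
    """One USV: decentralized actor, centralized critic over all agents."""
    def __init__(self, obs_dim, act_dim, joint_obs_dim, joint_act_dim):
        self.actor = mlp(obs_dim, act_dim)
        self.critic = mlp(joint_obs_dim + joint_act_dim, 1)
        self.actor_tgt = mlp(obs_dim, act_dim)
        self.critic_tgt = mlp(joint_obs_dim + joint_act_dim, 1)
        self.actor_tgt.load_state_dict(self.actor.state_dict())
        self.critic_tgt.load_state_dict(self.critic.state_dict())
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=LR)
        self.opt_c = torch.optim.Adam(self.critic.parameters(), lr=LR)

def critic_step(agent, batch, tgt_actors):
    """One TD update of agent i's centralized critic on a replay minibatch.

    batch: per-agent lists of (B, dim) tensors plus (B, 1) reward/done tensors.
    """
    obs, acts, rew_i, next_obs, done = batch
    with torch.no_grad():
        next_acts = torch.cat([pi(o) for pi, o in zip(tgt_actors, next_obs)], -1)
        q_next = agent.critic_tgt(torch.cat([torch.cat(next_obs, -1), next_acts], -1))
        y = rew_i + GAMMA * (1.0 - done) * q_next
    q = agent.critic(torch.cat([torch.cat(obs, -1), torch.cat(acts, -1)], -1))
    loss = F.mse_loss(q, y)
    agent.opt_c.zero_grad(); loss.backward(); agent.opt_c.step()

def soft_update(tgt, src):
    """Polyak-average a target network toward the learned one."""
    for tp, p in zip(tgt.parameters(), src.parameters()):
        tp.data.mul_(1.0 - TAU).add_(TAU * p.data)
```

The actor update (ascending the centralized critic's value with respect to the agent's own action) and the replay buffer are omitted for brevity; they follow the standard MADDPG recipe.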
5. Experiments
5.1. Simulation Setup
5.2. The Effectiveness of the Proposed Scheme
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, J.; Zhang, G.; Jiang, C.; Zhang, W. A survey of maritime unmanned search system: Theory, applications and future directions. Ocean Eng. 2023, 285, 115359–115371. [Google Scholar] [CrossRef]
- Chen, L.; Yu, S.; Chen, Q.; Li, S.; Chen, X.; Zhao, Y. 5S: Design and in-orbit demonstration of a multifunctional integrated satellite-based Internet of Things payload. IEEE Internet Things J. 2024, 11, 12864–12873. [Google Scholar] [CrossRef]
- Luo, H.; Ma, S.; Tao, H.; Ruby, R.; Zhou, J.; Wu, K. DRL-optimized optical communication for a reliable UAV-based maritime data transmission. IEEE Internet Things J. 2024, 11, 18768–18781. [Google Scholar] [CrossRef]
- Wang, J.; Luo, H.; Ruby, R.; Liu, J.; Wang, S.; Wu, K. Enabling reliable water–air direct optical wireless communication for uncrewed vehicular networks: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2024, 73, 11470–11486. [Google Scholar] [CrossRef]
- Luo, H.; Wang, J.; Bu, F.; Ruby, R.; Wu, K.; Guo, Z. Recent progress of air/water cross-boundary communications for underwater sensor networks: A review. IEEE Sens. J. 2022, 22, 8360–8382. [Google Scholar] [CrossRef]
- Peng, X.; Lan, X.; Chen, Q. Age of task-aware AAV-based mobile edge computing techniques in emergency rescue applications. IEEE Internet Things J. 2025, 12, 8909–8930. [Google Scholar] [CrossRef]
- Queralta, J.P.; Taipalmaa, J.; Pullinen, B.C.; Sarker, V.K.; Gia, T.N.; Tenhunen, H.; Gabbouj, M.; Raitoharju, J.; Westerlund, T. Collaborative multi-robot search and rescue: Planning, coordination, perception, and active vision. IEEE Access 2020, 8, 191617–191643. [Google Scholar] [CrossRef]
- Wang, Y.; Liu, W.; Liu, J.; Sun, C. Cooperative USV–UAV marine search and rescue with visual navigation and reinforcement learning-based control. ISA Trans. 2023, 137, 222–235. [Google Scholar] [CrossRef]
- Yang, T.; Jiang, Z.; Sun, R. Maritime search and rescue based on group mobile computing for UAVs and USVs. IEEE Trans. Ind. Inform. 2020, 99, 1–8. [Google Scholar] [CrossRef]
- Krishna, C.L.; Cao, M.; Murphy, R.R. Autonomous observation of multiple USVs from UAV while prioritizing camera tilt and yaw over UAV motion. In Proceedings of the 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), Shanghai, China, 11–13 October 2017; pp. 141–146. [Google Scholar]
- Xiao, X.; Dufek, J.; Woodbury, T.; Murphy, R. UAV assisted USV visual navigation for marine mass casualty incident response. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 6105–6110. [Google Scholar]
- Wang, Y.; Su, Z.; Xu, Q.; Li, R.; Luan, T.H.; Wang, P. A secure and intelligent data sharing scheme for UAV-assisted disaster rescue. IEEE/ACM Trans. Netw. 2023, 31, 2422–2438. [Google Scholar] [CrossRef]
- Liu, X.; Ansari, N.; Sha, Q.; Jia, Y. Efficient green energy far-field wireless charging for Internet of Things. IEEE Internet Things J. 2022, 9, 23047–23057. [Google Scholar] [CrossRef]
- Ma, X.; Liu, X.; Ansari, N. Green laser-powered UAV far-field wireless charging and data backhauling for a large-scale sensor network. IEEE Internet Things J. 2024, 11, 31932–31946. [Google Scholar] [CrossRef]
- Mondal, M.S.; Ramasamy, S.; Humann, J.D.; Reddinger, J.-P.F.; Dotterweich, J.M.; Childers, M.A.; Bhounsule, P. Optimizing fuel-constrained UAV-UGV routes for large scale coverage: Bilevel planning in heterogeneous multi-agent systems. In Proceedings of the 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), Boston, MA, USA, 4–5 December 2023; pp. 114–120. [Google Scholar]
- Yu, K.; Budhiraja, A.K.; Tokekar, P. Algorithms for routing of unmanned aerial vehicles with mobile recharging stations. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 5720–5725. [Google Scholar]
- Wang, Y.; Su, Z. An envy-free online UAV charging scheme with vehicle-mounted mobile wireless chargers. China Commun. 2023, 20, 89–102. [Google Scholar] [CrossRef]
- Dong, P.; Liu, J.; Tao, H.; Ruby, R.; Jian, M.; Luo, H. An optimized scheduling scheme for UAV-USV cooperative search via multi-agent reinforcement learning approach. In Proceedings of the 20th International Conference on Mobility, Sensing and Networking (MSN 2024), Harbin, China, 20–22 December 2024; pp. 172–179. [Google Scholar] [CrossRef]
- Dufek, J.; Murphy, R. Visual pose estimation of USV from UAV to assist drowning victims recovery. In Proceedings of the 2016 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Lausanne, Switzerland, 23–27 October 2016; pp. 147–153. [Google Scholar]
- Zhang, J.; Xiong, J.; Zhang, G.; Gu, F.; He, Y. Flooding disaster oriented USV & UAV system development & demonstration. In Proceedings of the OCEANS 2016-Shanghai, Shanghai, China, 10–13 April 2016; pp. 1–4. [Google Scholar]
- Wang, Y.; Su, Z.; Zhang, N.; Li, R. Mobile wireless rechargeable UAV networks: Challenges and solutions. IEEE Commun. Mag. 2022, 60, 33–39. [Google Scholar] [CrossRef]
- Mahbub, I.; Patwary, A.B.; Mahin, R.; Roy, S. Far-field wireless power beaming to mobile receivers using distributed, coherent phased arrays: A review of the critical components of a distributed wireless power beaming system. IEEE Microw. Mag. 2024, 25, 72–94. [Google Scholar] [CrossRef]
- Chen, W.; Zhao, S.; Shi, Q.; Zhang, R. Resonant beam charging-powered UAV-assisted sensing data collection. IEEE Trans. Veh. Technol. 2019, 69, 1086–1090. [Google Scholar] [CrossRef]
- Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control; Springer: Cham, Switzerland, 2021; pp. 321–384. [Google Scholar]
- Yi, Z.; Xiang, C.; Huaguang, S.; Zhanqi, J.; Nianwen, N.; Fuqiang, L. Multi-objective coordinated optimization for UAV charging scheduling in intelligent aerial-ground perception networks. Chin. J. Electron. 2023, 32, 1203–1217. [Google Scholar] [CrossRef]
- Zhu, K.; Yang, J.; Zhang, Y.; Nie, J.; Lim, W.Y.B.; Zhang, H.; Xiong, Z. Aerial refueling: Scheduling wireless energy charging for UAV enabled data collection. IEEE Trans. Green Commun. Netw. 2022, 6, 1494–1510. [Google Scholar] [CrossRef]
- Messaoudi, K.; Oubbati, O.S.; Rachedi, A.; Bendouma, T. UAV-UGV-based system for AoI minimization in IoT networks. In Proceedings of the ICC 2023—IEEE International Conference on Communications, Rome, Italy, 28 May–1 June 2023; pp. 4743–4748. [Google Scholar]
- Zhao, M.; Shi, Q.; Zhao, M.-J. Efficiency maximization for UAV-enabled mobile relaying systems with laser charging. IEEE Trans. Wirel. Commun. 2020, 19, 3257–3272. [Google Scholar] [CrossRef]
- Shin, M.; Kim, J.; Levorato, M. Auction-based charging scheduling with deep learning framework for multi-drone networks. IEEE Trans. Veh. Technol. 2019, 68, 4235–4248. [Google Scholar] [CrossRef]
- Jiang, S. Fostering marine internet with advanced maritime radio system using spectrums of cellular networks. In Proceedings of the 2016 IEEE International Conference on Communication Systems (ICCS), Shenzhen, China, 14–16 December 2016; pp. 1–6. [Google Scholar]
- Yao, P.; Gao, Z. UAV/USV cooperative trajectory optimization based on reinforcement learning. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; pp. 4711–4715. [Google Scholar]
- Liu, Y.; Peng, Y.; Wang, M.; Xie, J.; Zhou, R. Multi-USV system cooperative underwater target search based on reinforcement learning and probability map. Math. Probl. Eng. 2020, 2020, 7842768–7842780. [Google Scholar] [CrossRef]
- Schneider, M.; Stenger, A.; Hof, J. An adaptive VNS algorithm for vehicle routing problems with intermediate stops. OR Spectr. 2015, 37, 353–387. [Google Scholar] [CrossRef]
- Zeng, Y.; Xu, J.; Zhang, R. Energy minimization for wireless communication with rotary-wing UAV. IEEE Trans. Wirel. Commun. 2019, 18, 2329–2345. [Google Scholar] [CrossRef]
- Han, Y.; Ma, W. Automatic monitoring of water pollution based on the combination of UAV and USV. In Proceedings of the 2021 IEEE 4th International Conference on Electronic Information and Communication Technology (ICEICT), Xi’an, China, 18–20 August 2021; pp. 420–424. [Google Scholar]
- Wang, Y.; Luan, H.T.; Su, Z.; Zhang, N.; Benslimane, A. A secure and efficient wireless charging scheme for electric vehicles in vehicular energy networks. IEEE Trans. Veh. Technol. 2022, 71, 1491–1508. [Google Scholar] [CrossRef]
- Shen, G.; Lei, L.; Zhang, X.; Li, Z.; Cai, S.; Zhang, L. Multi-UAV cooperative search based on reinforcement learning with a digital twin driven training framework. IEEE Trans. Veh. Technol. 2023, 72, 8354–8368. [Google Scholar] [CrossRef]
- Iqbal, S.; Sha, F. Actor-attention-critic for multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2961–2970. [Google Scholar]
- Qu, X.; Gan, W.; Song, D.; Zhou, L. Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment. Ocean Eng. 2023, 273, 114016. [Google Scholar] [CrossRef]
- Lambora, A.; Gupta, K.; Chopra, K. Genetic algorithm—A literature review. In Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 380–384. [Google Scholar]
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Abbeel, O.P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. 2017, 30, 6379–6390. [Google Scholar]
- Lyu, L.; Chu, Z.; Lin, B. Joint association and power optimization for multi-UAV assisted cooperative transmission in marine IoT networks. Peer Peer Netw. Appl. 2021, 14, 3307–3318. [Google Scholar] [CrossRef]
- Simolin, T.; Rauma, K.; Viri, R.; Mäkinen, J.; Rautiainen, A.; Järventausta, P. Charging powers of the electric vehicle fleet: Evolution and implications at commercial charging sites. Appl. Energy 2021, 303, 117651–117663. [Google Scholar] [CrossRef]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
Param | Description | Value (unit)
---|---|---
– | Communication distance of USVs | 100 m
– | Safe distance between UAVs | 3 m
– | Safe distance between USVs | 5 m
– | Maximum energy capacity | 97.58 Wh
– | Battery energy threshold | 20%
v | UAV level speed | 40 km/h
– | Air density | 1.225 kg/m³
– | Total area of rotor disks | 0.18 m²
– | Fuselage drag ratio | 0.6
– | Rotor solidity | 0.05
d | Detection probability | 0.9
f | False-alarm probability | 0.1
E | Number of episodes | 20,000
– | Discount factor | 0.95
– | Learning rate | 0.01
– | Target network update speed | 0.01
S | Batch size | 1024
– | Buffer length | –
d \ f | 0.1 | 0.2 | 0.3 | 0.4 | 0.5
---|---|---|---|---|---
0.5 | 0.832 | 0.860 | 0.944 | 0.984 | 1.000
0.6 | 0.690 | 0.776 | 0.866 | 0.950 | 0.984
0.7 | 0.583 | 0.650 | 0.786 | 0.866 | 0.929
0.8 | 0.451 | 0.531 | 0.668 | 0.736 | 0.854
0.9 | 0.305 | 0.390 | 0.507 | 0.587 | 0.664