Energy-Efficient, Multi-Agent Deep Reinforcement Learning Approach for Adaptive Beacon Selection in AUV-Based Underwater Localization
Abstract
1. Introduction
- We develop a fully adaptive and energy-aware underwater localization framework based on deep reinforcement learning, enabling AUVs to dynamically optimize beacon selection and transmit power allocation under highly variable acoustic conditions.
- We formulate the underwater localization problem as a Markov Decision Process (MDP) with a rich state representation that incorporates signal features, depth, ranging uncertainty, and beacon availability, enabling environment-aware decision making.
- We design a multi-objective reward function that jointly balances localization accuracy, energy consumption, and ranging reliability through a risk metric derived from the mean ranging error.
- We integrate multiple DRL algorithms (TD3, SAC, MADDPG, and D2DPG), each addressing a different challenge: overestimation bias, entropy-driven exploration, multi-agent cooperation, and scalable decentralized learning, respectively.
- We incorporate a round-trip-time (RTT)-based geometric localization model with an isogradient sound-speed profile, capturing depth-dependent acoustic propagation effects for more accurate position estimation.
- We derive the Cramér–Rao lower bound (CRLB) for the proposed framework to analytically characterize the theoretical limits of localization accuracy under noisy acoustic measurements and beacon uncertainties.
- We provide a detailed computational complexity analysis demonstrating that the proposed hierarchical DRL framework scales effectively with the number of beacons, transmit-power levels, and learning iterations, making it practical for large-scale underwater acoustic sensor network (UASN) deployments.
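The multi-objective reward named in the contributions can be illustrated with a small sketch. The exact functional form is not given in this part of the text, so the weighted linear sum, the weight values, and all function and parameter names below are assumptions made for illustration only; the risk term is taken as the mean absolute ranging error, following the description of a risk metric derived from the mean ranging error.

```python
def transmit_energy_joules(powers_w, tx_duration_s):
    """Energy spent on one ranging round: total transmit power times duration."""
    return sum(powers_w) * tx_duration_s


def reward(loc_error_m, powers_w, tx_duration_s, ranging_errors_m,
           w_acc=1.0, w_energy=0.1, w_risk=0.5):
    """Hypothetical multi-objective reward (higher is better).

    Assumed form: negative weighted sum of localization error, energy use,
    and a risk term equal to the mean absolute ranging error.
    The weights are illustrative, not values from the paper.
    """
    energy = transmit_energy_joules(powers_w, tx_duration_s)
    if ranging_errors_m:
        risk = sum(abs(e) for e in ranging_errors_m) / len(ranging_errors_m)
    else:
        risk = 0.0  # no beacon selected, so no ranging error is observed
    return -(w_acc * loc_error_m + w_energy * energy + w_risk * risk)


# Two selected beacons, transmitting at 1.0 W and 0.5 W for 0.2 s each.
r = reward(loc_error_m=2.0, powers_w=[1.0, 0.5], tx_duration_s=0.2,
           ranging_errors_m=[0.4, -0.6])
```

With this form, raising `w_energy` pushes the agent toward fewer beacons or lower power levels, while raising `w_risk` penalizes beacon sets whose ranges are unreliable.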
2. Related Work
3. System Model
3.1. AUV Beacon Communication Model
3.2. Signal Propagation and Attenuation Model
3.3. Sound-Speed Profile and Range Estimation
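As a rough illustration of range estimation under an isogradient profile, the sketch below converts an RTT measurement into a slant range when the sound speed varies linearly with depth, c(z) = c_surface + gradient · z. It assumes a straight-line acoustic path (ray bending is ignored), which gives an effective speed equal to the logarithmic mean of the sound speeds at the two endpoint depths. The function names, the processing-delay argument, and the numeric values are assumptions for illustration and are not taken from the paper.

```python
import math


def sound_speed(depth_m, c_surface, gradient):
    """Isogradient profile: sound speed grows linearly with depth."""
    return c_surface + gradient * depth_m


def rtt_to_slant_range(rtt_s, proc_delay_s, z_auv_m, z_beacon_m,
                       c_surface, gradient):
    """Slant range from a round-trip time, straight-path approximation.

    One-way travel time along a straight path of length L between depths
    z1 and z2 is t = L * ln(c2 / c1) / (c2 - c1), so L = t * c_eff with
    c_eff = (c2 - c1) / ln(c2 / c1).
    """
    one_way_s = 0.5 * (rtt_s - proc_delay_s)
    c1 = sound_speed(z_auv_m, c_surface, gradient)
    c2 = sound_speed(z_beacon_m, c_surface, gradient)
    if abs(c2 - c1) < 1e-9:
        return c1 * one_way_s  # same depth: sound speed is constant on the path
    c_eff = (c2 - c1) / math.log(c2 / c1)
    return c_eff * one_way_s


# Illustrative values only: 1500 m/s at the surface, 0.017 1/s gradient.
d = rtt_to_slant_range(rtt_s=2.0, proc_delay_s=0.0, z_auv_m=0.0,
                       z_beacon_m=1000.0, c_surface=1500.0, gradient=0.017)
```

Using a single surface sound speed instead of the effective speed would bias the range estimate by a depth-dependent amount, which is the effect the isogradient model is meant to capture.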
3.4. AUV Position Estimation
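The CRLB mentioned in the contributions can be illustrated for the simplest range-based case. The sketch below assumes independent zero-mean Gaussian range errors and perfectly known beacon positions, so it omits the beacon uncertainties that the paper's derivation accounts for. Under these assumptions the Fisher information matrix is the sum of u uᵀ / σ² over the beacons, where u is the unit vector from a beacon to the AUV, and the bound on position RMSE is the square root of the trace of its inverse. All names are hypothetical.

```python
import math


def crlb_position_rmse(auv, beacons, sigmas):
    """Lower bound on 3D position RMSE for range-only measurements.

    Assumes independent Gaussian range noise with standard deviation
    sigmas[i] for beacon i and exactly known beacon positions.
    """
    # Fisher information matrix F = sum_i u_i u_i^T / sigma_i^2.
    F = [[0.0] * 3 for _ in range(3)]
    for b, s in zip(beacons, sigmas):
        diff = [auv[k] - b[k] for k in range(3)]
        dist = math.sqrt(sum(d * d for d in diff))
        u = [d / dist for d in diff]
        for i in range(3):
            for j in range(3):
                F[i][j] += u[i] * u[j] / (s * s)
    # For a 3x3 matrix, trace(F^-1) = (sum of principal 2x2 minors) / det(F).
    m11 = F[1][1] * F[2][2] - F[1][2] * F[2][1]
    m22 = F[0][0] * F[2][2] - F[0][2] * F[2][0]
    m33 = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    det = (F[0][0] * m11
           - F[0][1] * (F[1][0] * F[2][2] - F[1][2] * F[2][0])
           + F[0][2] * (F[1][0] * F[2][1] - F[1][1] * F[2][0]))
    if abs(det) < 1e-12:
        raise ValueError("degenerate beacon geometry: position not observable")
    return math.sqrt((m11 + m22 + m33) / det)


# Six beacons placed symmetrically around the AUV, 1 m range noise each.
ring = [(100, 0, 0), (-100, 0, 0), (0, 100, 0),
        (0, -100, 0), (0, 0, 100), (0, 0, -100)]
bound = crlb_position_rmse((0, 0, 0), ring, [1.0] * 6)
```

A bound of this kind is useful as a reference curve: a learned beacon-selection policy cannot beat it, and the gap between the achieved error and the bound indicates how much accuracy is lost to the chosen beacon subset or power levels.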
3.5. The Energy Consumption Model
4. Proposed Method
4.1. Localization Problem Based on RL
4.2. Hierarchical Deep Reinforcement Learning Framework
| Algorithm 1 RL-based AUV localization framework | |
| 1: | Initialize: actor–critic networks and replay buffer |
| 2: | Set the learning rates, discount factor, and exploration policy |
| 3: | for episode = 1 to E do |
| 4: | Initialize AUV position and select initial beacon set |
| 5: | for time step = 1 to K do |
| 6: | Observe current state |
| 7: | Select action according to the exploration policy |
| 8: | Execute the selected action |
| 9: | Measure RTT and RSS, update estimated position |
| 10: | Compute reward |
| 11: | Store transition in replay buffer |
| 12: | Sample mini-batch from replay buffer and update actor–critic networks via TD3/SAC |
| 13: | if multi-agent mode then |
| 14: | Synchronize critic networks among agents (MADDPG) |
| 15: | else if distributed mode then |
| 16: | Exchange parameters with neighboring agents (D2DPG) |
| 17: | end if |
| 18: | end for |
| 19: | Optional: decay exploration parameter or learning rates |
| 20: | end for |
| 21: | Return: trained policy and value network |
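The network update in step 12 can be made concrete for the TD3 case. The sketch below computes the clipped double-Q target with target-policy smoothing as described by Fujimoto et al.; it is a generic illustration, not the authors' implementation. The critics are passed in as plain callables, and the discount factor and action bounds are illustrative defaults rather than values from the paper (the noise settings of 0.2 and 0.5 are the defaults from the TD3 paper).

```python
import random


def td3_target(reward, done, next_action, q1_target, q2_target, next_state,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """TD3 critic target: r + gamma * (1 - done) * min(Q1', Q2').

    next_action is the target actor's output for next_state; clipped
    Gaussian noise is added to it before the target critics are queried.
    """
    smoothed = []
    for a in next_action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(act_low, min(act_high, a + eps)))
    # Taking the minimum of the two target critics counters overestimation.
    q_min = min(q1_target(next_state, smoothed),
                q2_target(next_state, smoothed))
    return reward + gamma * (1.0 - float(done)) * q_min


# Stub critics standing in for the two target networks (hypothetical).
y = td3_target(1.0, False, [0.0], lambda s, a: 5.0, lambda s, a: 3.0,
               next_state=None, gamma=0.9)
```

Both critics are then regressed toward the same target `y`, and the actor is updated less frequently than the critics, which is the delayed-update part of TD3.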
4.3. Complexity and Convergence
5. Results and Performance Evaluation
5.1. Experimental Setup
5.2. Baseline Schemes
5.3. Localization Accuracy Analysis
5.4. Energy Consumption Evaluation
5.5. Utility Function Performance
5.6. Convergence Behavior of Learning Algorithms
5.7. Sensitivity to Number of Beacons
5.8. Computational Complexity and Resource Analysis
5.9. Adaptivity Analysis Under Dynamic and Adverse Conditions
5.9.1. Adaptivity to Time-Varying Noise Statistics
5.9.2. Adaptivity to Sudden Beacon Failure
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Essaky, S.; Raja, G.; Dev, K.; Niyato, D. ARReSVG: Intelligent Multi-UAV Navigation in Partially Observable Spaces Using Adaptive Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2025, 74, 15429–15440.
2. Ioannou, G.; Forti, N.; Millefiori, L.M.; Carniel, S.; Renga, A.; Tomasicchio, G.; Binda, S.; Braca, P. Underwater inspection and monitoring: Technologies for autonomous operations. IEEE Aerosp. Electron. Syst. Mag. 2024, 39, 4–16.
3. Zhang, Y.; Zhu, J.; Wang, H.; Shen, X.; Wang, B.; Dong, Y. Deep reinforcement learning-based adaptive modulation for underwater acoustic communication with outdated channel state information. Remote Sens. 2022, 14, 3947.
4. Gang, Q.; Muhammad, A.; Khan, Z.U.; Khan, M.S.; Ahmed, F.; Ahmad, J. Machine learning-based prediction of node localization accuracy in IIoT-based MI-UWSNs and design of a TD coil for omnidirectional communication. Sustainability 2022, 14, 9683.
5. Yan, J.; Yi, M.; Yang, X.; Luo, X.; Guan, X. Broad-learning-based localization for underwater sensor networks with stratification compensation. IEEE Internet Things J. 2023, 10, 13123–13137.
6. Campo-Valera, M.; Diego-Tortosa, D.; Asorey-Cacheda, R. Signal Processing for Estimating the Time of Arrival and Amplitude of Nonlinear Underwater Acoustic Waves. In Research and Applications of Digital Signal Processing; IntechOpen: London, UK, 2025.
7. Yan, J.; Guan, X.; Yang, X.; Chen, C.; Luo, X. A Survey on Integration Design of Localization, Communication and Control for Underwater Acoustic Sensor Networks. IEEE Internet Things J. 2025, 12, 6300–6324.
8. Muhammad, A.; Li, F.; Khan, Z.U.; Khan, F.; Khan, J.; Khan, S.U. Exploration of contemporary modernization in UWSNs in the context of localization including opportunities for future research in machine learning and deep learning. Sci. Rep. 2025, 15, 5672.
9. Nain, M.; Goyal, N.; Dhurandher, S.K.; Dave, M.; Verma, A.K.; Malik, A. A survey on node localization technologies in UWSNs: Potential solutions, recent advancements, and future directions. Int. J. Commun. Syst. 2024, 37, e5915.
10. Kim, Y.; Erol-Kantarci, M.; Noh, Y.; Kim, K. Range-free localization with a mobile beacon via motion compensation in underwater sensor networks. IEEE Wireless Commun. Lett. 2020, 10, 6–10.
11. Muhammad, A.; Li, F.; Mohsan, S.A.; Khan, Z.U.; Khan, W.; Khan, S.U.; Han, Z.; Khan, F. Magneto Inductive (MI) Channel Variables Prediction through Machine Learning Linear regression method, for Underwater and Underground WSNs. IEEE Access 2025, 13, 33124–33137.
12. Khan, S.U.; Khan, Z.U.; Alkhowaiter, M.; Khan, J.; Ullah, S. Energy-efficient routing protocols for UWSNs: A comprehensive review of taxonomy, challenges, opportunities, future research directions, and machine learning perspectives. J. King Saud Univ. Comput. Inf. Sci. 2024, 36, 102128.
13. Tian, X.; Du, X.; Liu, X.; Wang, L.; Zhao, L. A Low-Delay Source-Location-Privacy Protection Scheme with Multi-AUV Collaboration for Underwater Acoustic Sensor Networks. IEEE Sens. J. 2025, 25, 12236–12252.
14. Yue, Y.; Pan, Z.; Li, S.; Su, W.; Han, J. Reinforcement Learning Based Smart UUV-IoUT Localization in Underwater Acoustic Topology Network. IEEE Internet Things J. 2025, 12, 16637–16652.
15. Su, R.; Gong, Z.; Li, C.; Han, S. High accuracy AUV-aided underwater Localization: Far-field information fusion perspective. IEEE Trans. Signal Process. 2024, 72, 1877–1891.
16. Chen, C.; Mei, X. Brief Literature Survey on Localization in Ocean Wireless Sensor Networks. J. Phys. Conf. Ser. 2025, 3055, 012045.
17. Liu, C.; Lv, Z.; Xiao, L.; Su, W.; Ye, L.; Yang, H.; You, X.; Han, S. Efficient beacon-aided AUV localization: A reinforcement learning based approach. IEEE Trans. Veh. Technol. 2024, 73, 7799–7811.
18. Li, Y.; Yu, W.; Xu, H.; Guan, X. Robust Multiple Autonomous Underwater Vehicle Cooperative Localization Based on the Principle of Maximum Entropy. IEEE Trans. Autom. Sci. Eng. 2025, 22, 12960–12974.
19. Hong, J.; Fulton, M.; Orpen, K.; Barthelemy, K.; Berlin, K.; Sattar, J. A Quantitative Evaluation of Bathymetry-Based Bayesian Localization Methods for Autonomous Underwater Robots. IEEE J. Ocean. Eng. 2025, 50, 985–1000.
20. Li, Y.; Yu, W.; Guan, X. Trajectory Planning-Aided Cooperative Localization for Multi-AUV Networks Under Harsh Communication Conditions: A Co-Designed Approach. IEEE Trans. Netw. 2025, 33, 3088–3103.
21. Wang, Y.; Song, S.; Liu, J.; Guo, X.; Cui, J. Efficient AUV-aided localization for large-scale underwater acoustic sensor networks. IEEE Internet Things J. 2024, 11, 31776–31790.
22. Huang, P.; Li, Y.; Wang, Y.; Guan, X. Information-entropy-based trajectory planning for AUV-aided network localization: A reinforcement learning approach. IEEE Internet Things J. 2024, 12, 2122–2134.
23. Li, Y.; Liu, M.; Zhang, S.; Zheng, R.; Lan, J. Node dynamic localization and prediction algorithm for internet of underwater things. IEEE Internet Things J. 2021, 9, 5380–5390.
24. Yan, J.; Zhao, H.; Luo, X.; Wang, Y.; Chen, C.; Guan, X. Asynchronous localization of underwater target using consensus-based unscented Kalman filtering. IEEE J. Ocean. Eng. 2019, 45, 1466–1481.
25. Yan, J.; Guo, D.; Luo, X.; Guan, X. AUV-aided localization for underwater acoustic sensor networks with current field estimation. IEEE Trans. Veh. Technol. 2020, 69, 8855–8870.
26. Fan, R.; Boukerche, A.; Pan, P.; Jin, Z.; Su, Y.; Dou, F. Secure Localization for Underwater Wireless Sensor Networks via AUV Cooperative Beamforming with Reinforcement Learning. IEEE Trans. Mobile Comput. 2024, 24, 924–938.
27. Middelkoop, J.M.; Celi, F.; Faggiani, A.; Hummel, H.; Bhulai, S.; Tesei, A.; Been, R.; Ferri, G. Optimizing Source Localization via Reinforcement Learning in Multi-Agent Underwater Networks. In Proceedings of the OCEANS 2025 Brest, Brest, France, 16–19 June 2025; pp. 1–10.
28. Li, Y.; Cai, K.; Zhang, Y.; Tang, Z.; Jiang, T. Localization and tracking for AUVs in marine information networks: Research directions, recent advances, and challenges. IEEE Netw. 2019, 33, 78–85.
29. Liu, J.; Wang, Z.; Cui, J.-H.; Zhou, S.; Yang, B. A joint time synchronization and localization design for mobile underwater sensor networks. IEEE Trans. Mobile Comput. 2015, 15, 530–543.
30. Liu, B.; Chen, H.; Zhong, Z.; Poor, H.V. Asymmetrical round trip based synchronization-free localization in large-scale underwater sensor networks. IEEE Trans. Wireless Commun. 2010, 9, 3532–3542.
31. Luo, H.; Wu, K.; Gong, Y.-J.; Ni, L.M. Localization for drifting restricted floating ocean sensor networks. IEEE Trans. Veh. Technol. 2016, 65, 9968–9981.
32. Liu, X.; Han, F.; Ji, W.; Liu, Y.; Xie, Y. A novel range-free localization scheme based on anchor pairs condition decision in wireless sensor networks. IEEE Trans. Commun. 2020, 68, 7882–7895.
33. Zhang, L.; Zhang, T.; Shin, H.-S.; Xu, X. Efficient underwater acoustical localization method based on time difference and bearing measurements. IEEE Trans. Instrum. Meas. 2020, 70, 8501316.
34. Gong, Z.; Li, C.; Jiang, F.; Zheng, J. AUV-aided localization of underwater acoustic devices based on Doppler shift measurements. IEEE Trans. Wireless Commun. 2020, 19, 2226–2239.
35. Pinheiro, B.C.; Moreno, U.F.; de Sousa, J.T.; Rodríguez, O.C. Kernel-function-based models for acoustic localization of underwater vehicles. IEEE J. Ocean. Eng. 2016, 42, 603–618.
36. Wang, Q.; He, B.; Zhang, Y.; Yu, F.; Huang, X.; Yang, R. An autonomous cooperative system of multi-AUV for underwater targets detection and localization. Eng. Appl. Artif. Intell. 2023, 121, 105907.
37. Wang, Z.; Sui, Y.; Qin, H.; Lu, H. State super sampling soft actor–critic algorithm for multi-AUV hunting in 3D underwater environment. J. Mar. Sci. Eng. 2023, 11, 1257.
38. Fujimoto, S.; Van Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning (ICML); PMLR: Brookline, MA, USA, 2018; pp. 1587–1596.
39. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor–critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning (ICML); PMLR: Brookline, MA, USA, 2018; pp. 1861–1870.
40. Liu, C.; Chen, Y.; Xiao, L.; Yang, H.; Su, W.; You, X. Reinforcement learning-based AUV localization in underwater acoustic sensor networks. In 2023 IEEE/CIC International Conference on Communications in China (ICCC); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.

| Ref./Year | Proposed Method | Key Focus | Addressed Challenges | Key Findings | Limitations |
|---|---|---|---|---|---|
| Li, Y. et al. (2025) [18] | Robust cooperative localization for multi-AUVs | Cooperative localization for multi-AUVs in UWSNs | Uncertainties in UWSNs, model mismatches, measurement errors, communication channel variability | Enhances accuracy, scalability, and robustness of multi-AUV localization by addressing uncertainty (maximum entropy + belief propagation). | Real-time uncertainty handling can be complex; potential performance drops in extreme environments; high computational cost. |
| Hong, J. et al. (2025) [19] | New probabilistic technique for AUV bathymetry localization | Bayesian AUV bathymetry localization | Visual landmark localization in UWSNs, finding true location in UWSNs scenarios | Compares EKF, UKF, Particle Filter, and MPF; MPF is most accurate under various conditions. | Simulations may not fully replicate real underwater complexities; performance may vary in non-ideal conditions. |
| Li, Y. et al. (2025) [20] | Integration of LCP for AUVs | Integrated method for LCP in AUV networks | Independent operation challenges, communication and planning conflicts, conflicting demands between communication and planning | Cooperative localization guided by trajectory planning; mutual enhancement under different communication conditions. | Potential real-time computational complexity; limited scalability for large networks; dependency on precise communication conditions. |
| Wang, Y. et al. (2024) [21] | Efficient AUV-based localization approach | Large-scale UAWSNs localization using multiple AUVs | Complex path planning for multiple AUVs, harsh underwater conditions | Unified framework integrating path planning and localization. | Challenges in real-time implementation; sensitivity to environmental conditions (e.g., stratification); computational complexity for large-scale networks. |
| Huang, P. et al. (2024) [22] | Trajectory planning localization for AUVs using RL | Anchor-based AUV localization for UAWSNs | Harsh submerged environment, high network maintenance cost, reliance on fixed anchors (buoys) | RL-based trajectory planning reduces entropy and shortens AUV trajectory while maintaining localization accuracy. | Limited scalability for large UAWSNs; computational complexity of RL algorithms. |
| Li, Y. et al. (2021) [23] | Dynamic node localization and prediction | Dynamic localization for IoT | Environmental changes, dynamic node behavior | Improves localization accuracy in changing environments. | Requires real-time environmental monitoring. |
| Yan, J. et al. (2020) [24] | DRL for privacy-preserving localization | Privacy-preserving localization in underwater networks | Security, privacy concerns | Provides a secure, privacy-preserving localization technique. | High computational cost for real-time applications. |
| Fan, R. et al. (2024) [26] | Secure localization in UWSNs using cooperative beamforming | Secure localization in UWSNs with cooperative beamforming | Privacy leaks, eavesdropping risks, multi-path complexities, stratification effects | Multi-anchor, multi-objective dual joint optimization improves security and energy performance; solved via MADDPG; validated in simulations and field experiments. | Complex and computationally intensive optimization; potential scalability issues for large deployments. |
| Middelkoop, J. et al. (2025) [27] | Multi-agent RL for underwater source localization | Trajectory optimization for underwater source localization | Non-stationary multi-agent environments, communication losses, trajectory optimization challenges | Shared-parameter MARL optimizes two-AUV trajectories to maximize detection probability. | Relies on simplified simulations; scalability limits for larger teams; sensitivity to real-world underwater dynamics. |
| Li, Y. et al. (2019) [28] | AUV-based localization | Tracking and localization of AUVs | Low accuracy in real-time tracking, interference in UAWSN environments | Summarizes recent advancements and future research directions. | Not suitable for large-scale IoT networks in UAWSNs. |
| Liu, J. et al. (2015) [29] | Joint time synchronization and localization | Localization for mobile UWSNs | Time synchronization in mobile networks | Combines time synchronization and localization to improve accuracy. | Computational complexity for real-time applications. |
| Liu, B. et al. (2010) [30] | Synchronization-free localization | Localization in large-scale UWSNs | Synchronization issues in large-scale networks | Proposes synchronization-free localization methods. | Limited scalability in real-world deployments. |
| Luo, H. et al. (2016) [31] | Localization of floating sensor networks | Drifting/floating localization for UAWSNs | Sensor node mobility, lack of fixed infrastructure | Reports advances for localization under node mobility. | Mobility modeling and system design assumptions may not fully reflect complex real deployments. |
| Liu, X. et al. (2020) [32] | Anchor-paired range-free localization | Range-free localization for WSNs | Limited coverage, limited communication range | Introduces a decision-making approach for node localization. | High complexity when handling large networks. |
| Zhang, L. et al. (2020) [33] | Bearing measurement and time difference | Localization in UAWSNs environments | Time synchronization, environmental interference | Efficient localization approach for UAWSNs. | Sensitive to environmental changes. |
| Gong, Z. et al. (2020) [34] | Doppler shift-based AUV localization | AUV-based localization | Time synchronization errors, Doppler effects | Outperforms prior Doppler-based localization methods. | Dependency on AUV motion/conditions and external factors. |
| Pinheiro, B.C. et al. (2016) [35] | Kernel-function-based models | Localization of underwater vehicles | High error rates in mobile underwater vehicles | Improves localization accuracy for mobile vehicles. | Performance may degrade for large-scale networks. |
| Wang, Q. et al. (2023) [36] | Cooperative online target detection using multi-AUVs with SSS | Real-time target detection and positioning using multiple AUVs | Severe noise and geometric deformation (SSS), high false alarm rates, real-time computational constraints on AUVs | MSCNet for threshold segmentation and LWBlock for feature extraction. | Evaluated mainly in simulation/sea trials; potential scalability issues for larger detection networks. |
| Our Work | Multi-agent DRL-based adaptive localization (TD3, SAC, MADDPG, D2DPG) | Energy-efficient AUV localization with adaptive beacon selection | Dynamic underwater channels, energy constraints, scalability, non-stationarity | Achieves lower localization error, reduced energy consumption, faster convergence, and robustness under dynamic conditions. | Higher training complexity; requires offline training and sufficient exploration data. |
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Simulation area | | Number of beacons (N) | 4–15 |
| AUV motion model | Random waypoint/hovering | Beacon deployment | Random uniform |
| Carrier frequency | | Reference distance | |
| Geometric spreading factor | | Absorption coefficient | |
| Maximum transmit power | | Power quantization levels (M) | 5 |
| Sound speed at surface | | Sound speed gradient | |
| RTT noise variance | | RSS noise variance | |
| Transmission duration | | Discount factor | |
| Learning rate | | Replay buffer size | samples |
| Mini-batch size | 256 | Exploration strategy | ε-greedy/entropy |
| Training episodes | 3000 | Steps per episode | 150 |
| Monte Carlo runs | 50 | GPU used | NVIDIA RTX 5070 |
| Algorithm | FLOPs/Episode | Avg. Time (ms) | Training Time (h) | Memory (MB) |
|---|---|---|---|---|
| FLA | | 4.5 | 0.4 | 35 |
| SRLUWL | | 9.2 | 1.6 | 120 |
| SDRLUWL | | 11.0 | 2.0 | 150 |
| SAC | | 16.5 | 3.8 | 420 |
| TD3 | | 14.8 | 3.4 | 390 |
| MADDPG | | 18.2 | 4.5 | 480 |
| D2DPG | | 19.6 | 5.0 | 520 |
| Algorithm | Complexity Growth |
|---|---|
| FLA | |
| SRLUWL | |
| SDRLUWL | |
| SAC | |
| TD3 | |
| MADDPG | |
| D2DPG | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Khan, Z.U.; Gao, H.; Kulsoom, F.; Mohsan, S.A.H.; Muhammad, A.; Chaudry, H.N. Energy-Efficient, Multi-Agent Deep Reinforcement Learning Approach for Adaptive Beacon Selection in AUV-Based Underwater Localization. J. Mar. Sci. Eng. 2026, 14, 262. https://doi.org/10.3390/jmse14030262