A Q-Learning-Based Link-Aware Routing Protocol for Underwater Wireless Sensor Networks
Abstract
1. Introduction
- Link-aware routing: Node mobility is predicted and the link expiration time is calculated; the neighbor maintenance frequency is then adjusted dynamically so that each node stays aware of link changes.
- Dynamic weight allocation: The link expiration time serves as one of the decision metrics, and the entropy weight method assigns the weights of the decision metrics dynamically according to the network state.
- A network of fully mobile nodes: Existing routing protocols are primarily designed for networks with a fixed, stable topology. This paper instead investigates a network of mobile underwater sensor nodes that all generate data traffic concurrently, and provides a novel approach to adaptive routing in dynamic-topology networks.
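The dynamic weight allocation named in the second contribution builds on the entropy weight method [40]. The following is a minimal sketch of the textbook form of that method, not the paper's Equation (10), which may differ in normalization. The function name `entropy_weights` and the sample matrix are illustrative assumptions: each row is a candidate neighbor and each column is a non-negative decision metric.

```python
import math

def entropy_weights(matrix):
    """Textbook entropy weight method (illustrative, not the paper's exact formula).

    matrix: list of rows (one per candidate neighbor), each a list of
    non-negative metric values (one per decision metric).
    Returns one weight per metric; the weights sum to 1.
    """
    n = len(matrix)      # number of candidates
    m = len(matrix[0])   # number of metrics
    if n < 2:
        # Entropy is not informative for a single candidate: use equal weights.
        return [1.0 / m] * m

    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            # An all-zero column carries no information.
            divergences.append(0.0)
            continue
        # Share of each candidate in metric j.
        shares = [x / total for x in column]
        # Normalized Shannon entropy in [0, 1]; 0 * log(0) is taken as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread == 0:
        return [1.0 / m] * m
    return [d / spread for d in divergences]

# Hypothetical metric values for three candidate neighbors.
metrics = [
    [0.9, 0.5, 0.2],
    [0.8, 0.5, 0.9],
    [0.7, 0.5, 0.4],
]
weights = entropy_weights(metrics)
```

In this example the second metric is identical for all candidates, so it receives a weight of essentially zero, while the third metric, which varies most across candidates, receives the largest weight. This is the property that lets the weights follow the network state instead of being fixed in advance.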
2. Related Work
3. System Model
3.1. Motivation Scenario
3.2. Network Model and Assumptions
- Each node is assigned a unique ID for identification.
- All sensor nodes have the same initial energy and the same communication range. Energy consumption accounts for forwarding and receiving data packets; the energy spent on node movement is not counted.
- The sink node remains stationary and has an unlimited energy supply.
- All sensor nodes have the same buffer size, which is quantified as the total number of data packets that can be stored.
- The communication links between nodes are symmetrical and reliable.
3.3. Q-Learning Framework for Routing in UWSNs
4. QLAR Algorithms
4.1. Decision Metrics of QLAR
4.1.1. Energy Metric
4.1.2. Distance Metric
4.1.3. Link Metric
4.2. Reward Function
4.3. Q-Value Initialization
5. Routing Protocol Design
5.1. Packet Structure
5.2. Neighbor Discovery Phase
5.3. Link Awareness Phase
| Algorithm 1 Neighbor discovery and link awareness |
|---|
| Input: Hello packet, Graph |
| Output: Neighbor tables |
| 1: for do |
| 2: for do |
| 3: // Broadcast Hello packet |
| 4: if Hello timer is expired then |
| 5: Broadcast Hello packet |
| 6: Calculate Hello interval using Equation (14) |
| 7: Reset Hello timer using |
| 8: end if |
| 9: // Neighbor discovery |
| 10: if receives Hello packet from then |
| 11: Get sender from Hello packet |
| 12: if in then |
| 13: Update neighbor record |
| 14: Perform update step of EKPM |
| 15: else |
| 16: Add a new record for in |
| 17: Calculate the initial Q-value using Equation (13) |
| 18: Initialize a predict model for |
| 19: end if |
| 20: end if |
| 21: // Link awareness |
| 22: for do |
| 23: Perform predict step of EKPM |
| 24: Estimate using Equation (6) |
| 25: if then |
| 26: Remove from |
| 27: else |
| 28: Update neighbor table |
| 29: end if |
| 30: end for |
| 31: end for |
| 32: end for |
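The link awareness loop in steps 21 to 30 of Algorithm 1 can be illustrated with a small sketch. It makes two loud assumptions that are not taken from the paper: the EKPM predict step is replaced by a plain constant-velocity model, and the link expiration time is the geometric time until two nodes drift farther apart than the communication range, which may differ from Equation (6). The names `link_expiration_time`, `prune_neighbors`, and `min_let` are hypothetical.

```python
import math

def link_expiration_time(p_i, v_i, p_j, v_j, comm_range):
    """Time (s) until nodes i and j drift out of range, assuming constant velocities.

    Positions are in m and velocities in m/s (3-tuples). Returns 0.0 if the
    nodes are already out of range and math.inf if they never separate.
    """
    dp = [a - b for a, b in zip(p_j, p_i)]   # relative position
    dv = [a - b for a, b in zip(v_j, v_i)]   # relative velocity
    dist_sq = sum(x * x for x in dp)
    if dist_sq > comm_range ** 2:
        return 0.0
    speed_sq = sum(x * x for x in dv)
    if speed_sq == 0.0:
        return math.inf
    # Solve |dp + dv * t| = comm_range for the positive root t.
    b = sum(x * y for x, y in zip(dp, dv))
    disc = b * b - speed_sq * (dist_sq - comm_range ** 2)
    return (-b + math.sqrt(disc)) / speed_sq

def prune_neighbors(own_pos, own_vel, neighbors, comm_range, min_let=0.0):
    """Keep only neighbors whose estimated link lifetime exceeds min_let.

    neighbors: {node_id: (position, velocity)}
    Returns {node_id: estimated link expiration time}.
    The removal threshold min_let is an assumption of this sketch.
    """
    kept = {}
    for node_id, (pos, vel) in neighbors.items():
        let = link_expiration_time(own_pos, own_vel, pos, vel, comm_range)
        if let > min_let:
            kept[node_id] = let
    return kept

table = prune_neighbors(
    (0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
    {
        "n1": ((100.0, 0.0, 0.0), (2.0, 0.0, 0.0)),  # moving away at 2 m/s
        "n2": ((250.0, 0.0, 0.0), (0.0, 0.0, 0.0)),  # already beyond 200 m
    },
    comm_range=200.0,
)
```

Here neighbor `n1` has 100 m of margin left and recedes at 2 m/s, so its link is estimated to last 50 s, while `n2` is already out of range and is dropped from the table.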
5.4. Data Transmission Phase
| Algorithm 2 Data transmission |
|---|
| Input: The Data packet to be transmitted, current node , |
| Output: The next-hop for the Data packet to be transmitted |
| 1: // Routing decision using Q-learning |
| 2: while node receives a Data packet do |
| 3: Extract Next-hop ID, Sender ID and Previous-hop ID fields from packet header |
| 4: if node is not the next-hop then |
| 5: Discard the Data packet |
| 6: else |
| 7: for do |
| 8: Derive decision metrics using Equations (3)–(5) |
| 9: Derive decision weights using Equation (10) |
| 10: Calculate the reward using Equation (12) |
| 11: Update the Q-value using Equation (2) |
| 12: end for |
| 13: Select next-hop with the maximum Q-value |
| 14: if routing hole then |
| 15: Cache the Data packet |
| 16: else |
| 17: Forward the Data packet to next-hop |
| 18: end if |
| 19: end if |
| 20: end while |
| 21: // Retransmission mechanism |
| 22: while packet transmission fails do |
| 23: if the maximum retransmission limit is not reached then |
| 24: Forward the Data packet |
| 25: else |
| 26: Discard the Data packet |
| 27: end if |
| 28: end while |
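Steps 7 to 13 of Algorithm 2 update a Q-value for every candidate neighbor and then pick the neighbor with the largest value. The sketch below uses the standard Q-learning update of Watkins and Dayan [16] together with the learning rate (0.9) and discount factor (0.3) listed in the simulation settings. The reward is passed in as a plain number because Equations (2), (10), and (12) are not reproduced here. The function names, and the idea that each neighbor advertises its own best Q-value, are assumptions of this sketch.

```python
ALPHA = 0.9   # learning rate, from the simulation settings
GAMMA = 0.3   # discount factor, from the simulation settings

def update_q(q_old, reward, neighbor_best_q, alpha=ALPHA, gamma=GAMMA):
    """Standard Q-learning update: blend the old estimate with the new target."""
    return (1.0 - alpha) * q_old + alpha * (reward + gamma * neighbor_best_q)

def select_next_hop(q_table, rewards, neighbor_best_q):
    """Update the Q-value of every candidate, then return the best candidate.

    q_table:         {neighbor_id: current Q-value}, updated in place
    rewards:         {neighbor_id: reward for forwarding via that neighbor}
    neighbor_best_q: {neighbor_id: best Q-value known for that neighbor} (assumed input)
    Returns None when there is no candidate, which this sketch treats as a routing hole.
    """
    if not q_table:
        return None
    for nid in q_table:
        q_table[nid] = update_q(q_table[nid], rewards[nid], neighbor_best_q[nid])
    return max(q_table, key=q_table.get)

# Hypothetical values for two candidate neighbors.
q = {"a": 0.0, "b": 0.5}
hop = select_next_hop(q, {"a": 1.0, "b": 0.2}, {"a": 1.0, "b": 0.0})
```

With these numbers neighbor `a` ends at 0.9 × (1.0 + 0.3 × 1.0) = 1.17 and neighbor `b` at 0.1 × 0.5 + 0.9 × 0.2 = 0.23, so `a` is selected. A high learning rate such as 0.9 makes the estimate track recent rewards closely, which suits a topology that changes quickly.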
6. Numerical Results and Discussion
6.1. PDR Performance
6.2. Average E2ED Performance
6.3. Energy Consumption
6.4. Collision Performance
6.5. Impact of Hello Broadcast Interval
6.6. Impact of Parameters
6.6.1. Broadcast Interval Related Parameter
6.6.2. The Link Metric
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Domingo, M.C. An overview of the internet of underwater things. J. Netw. Comput. Appl. 2012, 35, 1879–1890.
2. Kao, C.C.; Lin, Y.S.; Wu, G.D.; Huang, C.J. A comprehensive study on the internet of underwater things: Applications, challenges, and channel models. Sensors 2017, 17, 1477.
3. Lin, J.; Yu, W.; Zhang, N.; Yang, X.; Zhang, H.; Zhao, W. A survey on internet of things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet Things J. 2017, 4, 1125–1142.
4. Luo, J.; Chen, Y.; Wu, M.; Yang, Y. A survey of routing protocols for underwater wireless sensor networks. IEEE Commun. Surv. Tutor. 2021, 23, 137–160.
5. Caruso, A.; Paparella, F.; Vieira, L.F.M.; Erol, M.; Gerla, M. The meandering current mobility model and its impact on underwater mobile sensor networks. In Proceedings of the IEEE INFOCOM 2008-The 27th Conference on Computer Communications, Phoenix, AZ, USA, 13–18 April 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 221–225.
6. Coutinho, R.W.; Boukerche, A.; Vieira, L.F.; Loureiro, A.A. Performance modeling and analysis of void-handling methodologies in Underwater Wireless Sensor Networks. Comput. Netw. 2017, 126, 1–14.
7. Sandeep, D.; Kumar, V. Review on clustering, coverage and connectivity in underwater wireless sensor networks: A communication techniques perspective. IEEE Access 2017, 5, 11176–11199.
8. Souiki, S.; Feham, M.; Feham, M.; Labraoui, N. Geographic routing protocols for Underwater Wireless Sensor Networks: A survey. arXiv 2014, arXiv:1403.3779.
9. Coutinho, R.W.; Boukerche, A.; Vieira, L.F.; Loureiro, A.A. Geographic and opportunistic routing for underwater sensor networks. IEEE Trans. Comput. 2015, 65, 548–561.
10. Darehshoorzadeh, A.; Boukerche, A. Underwater sensor networks: A new challenge for opportunistic routing protocols. IEEE Commun. Mag. 2015, 53, 98–107.
11. Coutinho, R.W.; Boukerche, A.; Vieira, L.F.; Loureiro, A.A. Design guidelines for opportunistic routing in underwater networks. IEEE Commun. Mag. 2016, 54, 40–48.
12. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285.
13. Rodoshi, R.T.; Song, Y.; Choi, W. Reinforcement learning-based routing protocol for underwater wireless sensor networks: A comparative survey. IEEE Access 2021, 9, 154578–154599.
14. Patil, S.D.; Patil, P.S. A Hybrid PSO-GSA Approach for Cluster Head Selection and Fuzzy Logic Data Aggregation in DEEC-based WSNs. Int. J. Comput. Netw. Inf. Secur. 2025, 17, 48–70.
15. Juwaied, A.; Jackowska-Strumillo, L.; Sierszeń, A. Enhancing Clustering Efficiency in Heterogeneous Wireless Sensor Network Protocols Using the K-Nearest Neighbours Algorithm. Sensors 2025, 25, 1029.
16. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
17. Wang, Z.; Du, J.; Hou, X.; Wang, J.; Jiang, C.; Zhang, X.P.; Ren, Y. Toward communication optimization for future underwater networking: A survey of reinforcement learning-based approaches. IEEE Commun. Surv. Tutor. 2024, 27, 2765–2793.
18. Hu, T.; Fei, Y. QELAR: A machine-learning-based adaptive routing protocol for energy-efficient and lifetime-extended underwater sensor networks. IEEE Trans. Mob. Comput. 2010, 9, 796–809.
19. He, J.; Tian, J.; Pu, Z.; Wang, W.; Huang, H. Cross-Layer Routing Protocol Based on Channel Quality for Underwater Acoustic Communication Networks. Appl. Sci. 2024, 14, 9778.
20. Ahmed, M.; Salleh, M.; Channa, M.I. Routing protocols based on node mobility for Underwater Wireless Sensor Network (UWSN): A survey. J. Netw. Comput. Appl. 2017, 78, 242–252.
21. Ismail, A.; Wang, X.; Hawbani, A.; Alsamhi, S.; Abdel Aziz, S. Routing protocols classification for underwater wireless sensor networks based on localization and mobility. Wirel. Netw. 2022, 28, 797–826.
22. Xie, P.; Cui, J.H.; Lao, L. VBF: Vector-based forwarding protocol for underwater sensor networks. In NETWORKING’06: Proceedings of the 5th International IFIP-TC6 Conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications Systems, Coimbra, Portugal, 15–19 May 2006; Proceedings 5; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1216–1221.
23. Nicolaou, N.; See, A.; Xie, P.; Cui, J.H.; Maggiorini, D. Improving the robustness of location-based routing for underwater sensor networks. In Proceedings of the Oceans 2007-Europe, Aberdeen, Scotland, 18–21 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–6.
24. Yan, H.; Shi, Z.J.; Cui, J.H. DBR: Depth-based routing for underwater sensor networks. In Proceedings of the NETWORKING 2008 Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet: 7th International IFIP-TC6 Networking Conference, Singapore, 5–9 May 2008; Proceedings 7. Springer: Berlin/Heidelberg, Germany, 2008; pp. 72–86.
25. Wahid, A.; Kim, D. An energy efficient localization-free routing protocol for underwater wireless sensor networks. Int. J. Distrib. Sens. Netw. 2012, 8, 307246.
26. Yu, H.; Yao, N.; Wang, T.; Li, G.; Gao, Z.; Tan, G. WDFAD-DBR: Weighting depth and forwarding area division DBR routing protocol for UASNs. Ad Hoc Netw. 2016, 37, 256–282.
27. Noh, Y.; Lee, U.; Wang, P.; Choi, B.S.C.; Gerla, M. VAPR: Void-aware pressure routing for underwater sensor networks. IEEE Trans. Mob. Comput. 2012, 12, 895–908.
28. Wang, Q.; Li, J.; Qi, Q.; Zhou, P.; Wu, D.O. An adaptive-location-based routing protocol for 3-D underwater acoustic sensor networks. IEEE Internet Things J. 2020, 8, 6853–6864.
29. Zhou, Y.; Cao, T.; Xiang, W. Anypath routing protocol design via Q-learning for underwater sensor networks. IEEE Internet Things J. 2020, 8, 8173–8190.
30. Jin, Z.; Ma, Y.; Su, Y.; Li, S.; Fu, X. A Q-learning-based delay-aware routing algorithm to extend the lifetime of underwater sensor networks. Sensors 2017, 17, 1660.
31. Zhang, Y.; Zhang, Z.; Chen, L.; Wang, X. Reinforcement learning-based opportunistic routing protocol for underwater acoustic sensor networks. IEEE Trans. Veh. Technol. 2021, 70, 2756–2770.
32. Zhu, R.; Jiang, Q.; Huang, X.; Li, D.; Yang, Q. A reinforcement-learning-based opportunistic routing protocol for energy-efficient and void-avoided UASNs. IEEE Sensors J. 2022, 22, 13589–13601.
33. Mahajan, P.; Balamurugan, P.; Kumar, A.; Chalapathi, G.; Chamola, V.; Khabbaz, M. Multi-Objective MDP-based Routing In UAV Networks For Search-based Operations. IEEE Trans. Veh. Technol. 2024, 73, 13777–13789.
34. Jung, W.S.; Yim, J.; Ko, Y.B. QGeo: Q-learning-based geographic ad hoc routing protocol for unmanned robotic networks. IEEE Commun. Lett. 2017, 21, 2258–2261.
35. Dorri, A.; Kanhere, S.S.; Jurdak, R. Multi-agent systems: A survey. IEEE Access 2018, 6, 28573–28593.
36. Wang, P.; Wang, T. Adaptive routing for sensor networks using reinforcement learning. In Proceedings of the Sixth IEEE International Conference on Computer and Information Technology (CIT’06), Seoul, Republic of Korea, 20–22 September 2006; IEEE: Piscataway, NJ, USA, 2006; p. 219.
37. Mammeri, Z. Reinforcement learning based routing in networks: Review and classification of approaches. IEEE Access 2019, 7, 55916–55950.
38. Cui, Y.; Zhang, Q.; Feng, Z.; Wei, Z.; Shi, C.; Yang, H. Topology-aware resilient routing protocol for FANETs: An adaptive Q-learning approach. IEEE Internet Things J. 2022, 9, 18632–18649.
39. Su, W.; Lee, S.J.; Gerla, M. Mobility prediction and routing in ad hoc wireless networks. Int. J. Netw. Manag. 2001, 11, 3–30.
40. Zhu, Y.; Tian, D.; Yan, F. Effectiveness of entropy weight method in decision-making. Math. Probl. Eng. 2020, 2020, 3564835.
41. Ishizaka, A.; Nemery, P. Multi-Criteria Decision Analysis: Methods and Software; John Wiley & Sons: Hoboken, NJ, USA, 2013.
42. Gray, R.M. Entropy and Information Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
43. Bin, W.; Kerong, B.; Yixue, H.; Mingjiu, Z. SQMCR: Stackelberg Q-learning based Multi-hop Cooperative Routing Algorithm for Underwater Wireless Sensor Networks. IEEE Access 2024, 12, 56179–56195.
44. Hernandez-Cons, N.; Kasahara, S.; Takahashi, Y. Dynamic hello/timeout timer adjustment in routing protocols for reducing overhead in MANETs. Comput. Commun. 2010, 33, 1864–1878.
45. Alsaqour, R.; Abdelhaq, M.; Saeed, R.; Uddin, M.; Alsukour, O.; Al-Hubaishi, M.; Alahdal, T. Dynamic packet beaconing for GPSR mobile ad hoc position-based routing protocol using fuzzy logic. J. Netw. Comput. Appl. 2015, 47, 32–46.
46. Arafat, M.Y.; Moh, S. A Q-learning-based topology-aware routing protocol for flying ad hoc networks. IEEE Internet Things J. 2021, 9, 1985–2000.
47. Camp, T.; Boleng, J.; Davies, V. A survey of mobility models for ad hoc network research. Wirel. Commun. Mob. Comput. 2002, 2, 483–502.
48. Mohemed, R.E.; Saleh, A.I.; Abdelrazzak, M.; Samra, A.S. Energy-efficient routing protocols for solving energy hole problem in wireless sensor networks. Comput. Netw. 2017, 114, 51–66.
49. Khasawneh, A.; Latiff, M.S.B.A.; Kaiwartya, O.; Chizari, H. A reliable energy-efficient pressure-based routing protocol for underwater wireless sensor network. Wirel. Netw. 2018, 24, 2061–2075.
50. Riley, G.F.; Henderson, T.R. The ns-3 network simulator. In Modeling and Tools for Network Simulation; Springer: Berlin/Heidelberg, Germany, 2010; pp. 15–34.
51. Martin, R.; Rajasekaran, S.; Peng, Z. Aqua-Sim Next generation: An NS-3 based underwater sensor network simulator. In Proceedings of the 12th International Conference on Underwater Networks & Systems, Halifax, NS, Canada, 6–8 November 2017; pp. 1–8.
52. Hu, S.; Wang, G.; Liu, Z.; Gao, X. Application of Graph Signal Sampling in Underwater Distributed Cooperative Detection. In Proceedings of the 2025 5th International Conference on Sensors and Information Technology, Nanjing, China, 21–23 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 785–789.
53. Ventura, G.; Ardizzon, F.; Tomasin, S. Authentication by location tracking in underwater acoustic networks. arXiv 2024, arXiv:2410.03511.
54. Morozs, N.; Gorma, W.; Henson, B.T.; Shen, L.; Mitchell, P.D.; Zakharov, Y.V. Channel modeling for underwater acoustic network simulation. IEEE Access 2020, 8, 136151–136175.

| Protocols | Features | Challenges |
|---|---|---|
| VBF [22] | Robust and scalable | Unsuitable for sparse networks |
| DBR [24] | Improves packet delivery ratio | Higher energy consumption and redundant forwarding |
| ALRP [28] | Adaptive forwarding area | Restricted to node distribution |
| QELAR [18] | Q-learning-based, single metric (energy efficiency) | Restricted to a single optimization objective |
| QDAR [30] | Q-learning-based, energy efficiency and latency | Fixed metric weights, centralized decision-making |
| QLFR [29], ROEVA [32], RLOR [31], DROR [31] | Q-learning combined with opportunistic routing to overcome routing holes, multiple decision metrics | Additional holding time, fixed metric weights |

| Parameters | Value |
|---|---|
| Simulator | NS-3 |
| Deployment area | 500 m × 500 m × 500 m |
| Number of sensor nodes | [20, 30, 40, 50, 60] |
| Number of sinks | 1 |
| Channel model | Binary range-based model in [54] |
| Communication range | 200 m |
| The speed of sensor nodes | [1, 2, 3, 4, 5] m/s |
| Average packet generation interval | [10, 20, 30, 40, 50] s |
| Packet size | 80 bytes |
| Transmission power | 2 W |
| Idle power | 0.008 W |
| Receiving power | 0.75 W |
| Mobility model | Gauss–Markov mobility model, tuning parameter 0.85, standard deviation 0.05, correlation time 30 s |
| Energy model | Energy model in [28] |
| Packet generation model | Poisson distribution |
| Acoustic speed | 1500 m/s |
| Antenna | Omni-directional |
| Learning rate | 0.9 |
| Discount factor | 0.3 |
| factor | 0.3 |
| HH-VBF | Radius of virtual pipeline 160 m |
| ALRP | k = 7, s |
| QLFR | The difference of holding time 0.2 s |
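
The simulation settings list a Gauss–Markov mobility model with tuning parameter 0.85 and standard deviation 0.05. As a rough illustration of what those two parameters control, the sketch below applies the commonly cited Gauss–Markov update (see the mobility-model survey [47]) independently to each velocity component. This is a simplification: the simulator's own implementation may update speed and direction instead, and the mean velocity and starting velocity used here are hypothetical values, not settings from the paper.

```python
import math
import random

def gauss_markov_step(velocity, mean_velocity, alpha, sigma, rng):
    """One Gauss-Markov update applied independently to each velocity component.

    alpha: tuning parameter in [0, 1]; higher values keep more memory of the past velocity
    sigma: standard deviation of the Gaussian random term
    rng:   a random.Random instance, so runs can be made repeatable
    """
    noise_scale = math.sqrt(1.0 - alpha * alpha)
    return tuple(
        alpha * v + (1.0 - alpha) * m + noise_scale * rng.gauss(0.0, sigma)
        for v, m in zip(velocity, mean_velocity)
    )

rng = random.Random(1)            # fixed seed so the run is repeatable
velocity = (3.0, 0.0, 0.0)        # hypothetical start: 3 m/s along x
mean_velocity = (1.0, 0.0, 0.0)   # hypothetical long-run mean velocity
for _ in range(200):
    velocity = gauss_markov_step(velocity, mean_velocity, 0.85, 0.05, rng)
```

With a tuning parameter of 0.85, each step keeps 85% of the previous velocity, so motion is smooth and short-term prediction is feasible, while the velocity still relaxes toward the mean over a few tens of steps.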

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, X.; Wu, Y.; Zhu, M.; Ren, J. A Q-Learning-Based Link-Aware Routing Protocol for Underwater Wireless Sensor Networks. J. Mar. Sci. Eng. 2025, 13, 2374. https://doi.org/10.3390/jmse13122374

