A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks

Jin, Zhigang; Ma, Yingying; Su, Yishan; Li, Shuo; Fu, Xiaomei

doi:10.3390/s17071660

Open AccessArticle

A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks

by

Zhigang Jin

¹

,

Yingying Ma

¹,

Yishan Su

^1,*,

Shuo Li

^1,* and

Xiaomei Fu

²

¹

School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

²

School of Marine Science and Technology, Tianjin University, Tianjin 300072, China

^*

Authors to whom correspondence should be addressed.

Sensors 2017, 17(7), 1660; https://doi.org/10.3390/s17071660

Submission received: 7 June 2017 / Revised: 12 July 2017 / Accepted: 14 July 2017 / Published: 19 July 2017

(This article belongs to the Section Sensor Networks)

Download

Browse Figures

Versions Notes

Abstract

:

Underwater sensor networks (UWSNs) have become a hot research topic because of their various aquatic applications. As the underwater sensor nodes are powered by built-in batteries which are difficult to replace, extending the network lifetime is a most urgent need. Due to the low and variable transmission speed of sound, the design of reliable routing algorithms for UWSNs is challenging. In this paper, we propose a Q-learning based delay-aware routing (QDAR) algorithm to extend the lifetime of underwater sensor networks. In QDAR, a data collection phase is designed to adapt to the dynamic environment. With the application of the Q-learning technique, QDAR can determine a global optimal next hop rather than a greedy one. We define an action-utility function in which residual energy and propagation delay are both considered for adequate routing decisions. Thus, the QDAR algorithm can extend the network lifetime by uniformly distributing the residual energy and provide lower end-to-end delay. The simulation results show that our protocol can yield nearly the same network lifetime, and can reduce the end-to-end delay by 20–25% compared with a classic lifetime-extended routing protocol (QELAR).

Keywords:

underwater sensor networks; routing protocol; lifetime-extended; delay-aware; Q-learning technique

1. Introduction

For decades, underwater wireless sensor networks (UWSNs) have attracted significant interest. Many applications of UWSNs, including commercial exploitation, marine mammal studies and oceanography data collection [1,2] allow humans to sense the vast underwater domain and motivate research on UWSN design.

However, because of the harsh environment and limited spectrum source, communications in UWSNs are much more difficult than those in terrestrial sensor networks. One of the reasons is that the radiowaves employed in terrestrial sensor networks is not feasible in the underwater environment because of their rapid attenuation. For example, Berkeley Mica 2 motes have been reported to have only 120 cm communication range in an underwater environment at 433 MHz [3]. Currently, the only appropriate method for long distance communications is acoustic communication. The speed of sound in water is about 1500 m/s, five-orders slower than the speed of radiowaves, thus there is a long propagation delay in UWSNs [4]. Moreover, the sensor nodes are deployed under the sea, therefore, it is difficult to recharge their batteries [5]. Since the sensors are powered by batteries, the limited energy restricts the network lifetime of UWSNs. Network lifetime is the time span from the deployment to the instant when the network is considered nonfunctional [6]. In this paper, the network lifetime is defined as the time span from the deployment to the instant the energy of the first node is exhausted. All these different characteristics make the algorithms, especially the routing algorithms used in terrestrial networks, unfeasible for UWSNs [7,8].

Typical routing algorithms employ shortest path algorithms for routing decisions. Thus, nodes chosen frequently on the shortest paths drain more quickly than other nodes, leading to a shorter network lifetime. To prolong the network lifetime, many routing algorithms are proposed. The energy efficient algorithm in [9] pays attention to coverage. It can preserve k-coverage and achieve maximal coverage for an area with the least energy consumption. However, low energy consumption does not necessarily lead to a long network lifetime. The distribution of residual energy also affects the network lifetime. In [10], the prolong stable election protocol (P-SEP) exploits the heterogeneity of energy thresholds to avoid low-energy nodes to nominate cluster heads and avoid continuous selection of a node. P-SEP has features like aliveness, fairness, full distribution in cluster head selection and can prolong network lifetime remarkably. However, like P-SEP, many routing algorithms employ greedy approaches to determine the next hop or the cluster head on the path, without considering the long-term rewards. That is to say, greedy routing algorithms only choose the node with the highest direct reward, even if the packet transmission thereafter needs more hops. Thus, the optimal next hop for the current node determined by these algorithms may not be the global optimal one for the whole routing path.

Moreover, the algorithms mentioned above pay no attention to the end-to-end delay, which is an important indicator in UWSNs. The fog-supported learning automata adaptive probabilistic search (FLAPS) algorithm in [11] is a delay-efficient distributed route-discovering algorithm. It can forward messages at the minimum bandwidth cost and latency. Although the synchronization functions in FLAPS are not suitable in UWSNs because of the long propagation delay in UWSNs, the application of Q-learning algorithm improves the performance remarkably. Nodes can process in an adaptive and distributed way using the reward-penalty mechanism of the Q-learning algorithm. With moderate improvement, the Q-learning algorithm can be implemented in UWSNs. The Min-delay routing in [12] based on the Dijkstra algorithm can minimize delay, reduce link interruption and improve reliability. However, it is a multipath routing method, which means that there is more than one path from the source node to the sink node. Thus, Min-delay routing may increase the energy consumption. Above all, in order to prolong the network lifetime, nodes with more residual energy should be chosen as relay nodes even though they are far from the sink node, while in order to minimize the end-to-end delay, nodes near to the sink node should be chosen. Network life and end-to-end delay are both important in UWSNs. Therefore, it is necessary to introduce significant compromises at the routing design stage.

To cater for these issues, we propose a Q-learning-based delay-aware routing (QDAR) mechanism to extend the network lifetime for UWSNs. In QDAR, nodes only need to know about their residual energy and the delay from neighbors. As the action-utility function (Q-value) of Q-learning technique takes both direct reward and discounted long-term reward into account, Q-learning-based protocols can determine the global optimal next hop instead of a greedy one. The main contributions of QDAR can be summarized as follows: (1) it defines a data collection phase and designs the packet structure before routing decisions to quickly adapt to the dynamic underwater environment; (2) it takes both delay and residual energy into consideration by defining two kinds of cost functions: delay-related cost and energy-related cost; (3) it uses an adaptive mechanism to ensure a longer network lifetime and a relatively shorter delays: when the residual energy is enough, the end-to-end delay is restricted, while the residual energy of some nodes is lower than the threshold, an adequate path consisting of nodes with longer delays but more remaining energy is determined; (4) QDAR is easily extendible: energy consumption, channel capacity, communication reliability and many other metrics can be integrated into the action-utility functions in future research for different targets.

The QDAR algorithm can work adaptively and distributively through trade-offs between the network lifetime and end-to-end delays. The simulation results show that our algorithm achieves nearly the same network lifetime extension as the existing lifetime-extended protocol, and reduces end-to-end delay by 20–25%.

The rest of this paper is organized as follows: in Section 2, related works on underwater routing protocols are discussed briefly. In Section 3, the basic Q-learning technique is introduced and adopted into our system model. In Section 4 and Section 5, the QDAR algorithm is described in detail. The simulation results are shown and discussed in Section 6. Finally, we conclude this paper in Section 7.

2. Related Work

Underwater routing techniques are a hot research topic for UWSNs nowadays. There are several kinds of routing protocols that aim to improve energy efficiency, reduce end-to-end delay and prolong network lifetime [13,14]. In this section, we provide a review on research works that have been done on this topic.

Most energy-efficient routing protocols aim to reduce energy consumption and prolong network lifetime. A hierarchical routing algorithm called queen-bee evolution algorithm (QEGA) [15] works better in terms of energy consumption. QEGA has a high rate which results in premature convergence. Thus, the algorithm can find the optimal solution more quickly. However, QEGA does not consider the residual energy, which is important to extend network lifetime. The energy-saving vector-based forwarding (ES-VBF) protocol [16] defines a desirableness factor based on residual energy and location information. In the routing pipe, nodes with more residual energy are more possible to forward packets. Although the algorithm prolongs the network lifetime, it needs the location information of all the nodes, which is still a challenge to be solved.

The adaptive power controlled routing (APCR) [17] is an energy efficient routing schema that does not require any location information. In APCR, nodes are assigned to concentric layers according to the signal power of a received INTEREST packet broadcasted by sink nodes. Then, routing paths are decided based on layer numbers and residual energy. To improve the energy efficiency, nodes are able to adjust their transmission power to a set of values according to the information received during packet transmission. If forwarding nodes are found at multiple layers, the power is decreased. If no neighbor is found, the power is increased. Thus, APCR can achieve a high delivery ratio, but the number of forwarding nodes at each layer is not limited properly. If multiple nodes forward the same packet, the total energy consumption is increased.

The Q-learning-based adaptive routing (QELAR) protocol is proposed [18] and Q-learning is proved to perform well in UWSNs in several aspects. QELAR defines the reward function based on the residual energy of the sensor nodes. In this protocol, sensor nodes choose the node with more residual energy as the next hop, so that the network lifetime of the network can be extended. However, in QELAR protocol, each node takes the responsibility to learn the environment by metadata exchanging and decide the next hop, leading to a higher energy consumption for each node. Moreover, the protocol does not restrict end-to-end delay. When the number of the sensor nodes increases, the routing will detour with more and more nodes, then the end-to-end delay is prolonged. Thus, QELAR works inefficiently in some situations because of the long delay.

Many research works point out that the problems of latency in UWSNs are serious, especially for time-critical applications. In [19] the authors employ a probability model to describe the propagation delay of a link and select the next hop with lower delay. In [20], an underwater opportunistic routing (UWOR) is proposed. The forwarding set in which nodes can hear each other and prevent packet duplication is established. Each node in the forwarding set is assigned a relay priority which is related to the probability of successful transmission. The node with the highest priority and limited end-to-end delay can be chosen as the relay node. The simulation results show that UWOR can maximize good put while satisfying end-to-end latency requirements. However, it disables retransmission mechanisms, leading to a lower delivery ratio.

Moreover, there are protocols that can jointly reduce energy consumption and end-to-end delay. Modified energy weight routing (MEWR) protocol [3] is energy efficiency guaranteed using a minimum algorithm. In order to determine an optimal path with a low end-to-end delay as well as low energy consumption, the cost of a link is formulated as a mixture of energy weight and delay weight. In the path discovery phase, a node employs a greedy approach to find all its neighbors and determine an optimal one with the lowest cost. However, low energy consumption does not lead to long network lifetime effectively. Since MEWR does not take the residual energy of sensor nodes into account, it cannot optimize the energy distribution, which is crucial for network lifetime extension.

3. Q-learning Based Model

In this section, we give a brief introduction to the basic Q-learning technique, which is the fundamental theory of our algorithm. Next, we explain our Q-learning based system model.

3.1. The Basic Q-Learning Technique

Q-learning is a model-free reinforcement learning method. It provides agents with the capability of learning to act optimally. In this technique, an agent chooses an action at a particular state according to the reinforcement it receives. The reinforcement is composed of the direct reward or penalty and the future consequence the agent estimates. With the reinforcement, the agent can evaluate how good an action is in the current situation. The task of the Q-learning is to determine an optimal policy to get a higher reward. Once the agent receives the highest reward, it will stop at the state which is called absorbing goal. An agent in state x_n at step n can go to the next state x_n+1 by choosing action a_n under the policy

π

. The probability of shifting to state x_n+1 according to the law:

P_{x_{n} x_{n + 1}}^{a_{n}} = p r o b {x = x_{n + 1} | x_{n}, a_{n}}

(1)

The agent receives rewards according both the values of direct reward and the future reward. A direct reward r_n in state x_n is received immediately after the agent performs action a_n, only depending upon the state and action. A future reward is the total reward that the agent expects in the new state after action a_n. The action-utility function

Q^{π} (x, a)

, which is used to describe the expected return of action a in state x under policy π, is critical to Q-learning:

Q^{π} (x_{n}, a_{n}) = r_{n} + γ \sum_{x_{n + 1} \in X} P_{x_{n} x_{n + 1}}^{a_{n}} Q^{π} (x_{n + 1}, a)

(2)

where:

r_{n} = \sum_{x_{n + 1} \in X} P_{x_{n} x_{n + 1}}^{a_{n}} R_{x_{n} x_{n + 1}}^{a_{n}}

(3)

R_{x_{n} x_{n + 1}}^{a_{n}}

is the reward of action a_n, which can make the agent go into state

x_{n + 1}

from state

x_{n}

.

γ

(

0 \leq γ < 1

) is the discount factor. It means the reward received current step hence is worth less than that received now. Typically, to balance the direct and future reward, the value of

γ

is within (0.5, 0.99). If an optimal policy is performed thereafter, we can derive the optimal

Q

value. It is proved that at least one optimal stationary policy

π^{*}

exists [21]. Under the optimal policy,

Q^{*}

can be described as:

Q^{*} (x_{n}) = {m a x}_{a} (r_{n} + γ \sum_{x_{n + 1} \in X} P_{x_{n} x_{n + 1}}^{a_{n}} Q^{*} (x_{n + 1}, a))

(4)

Thus, in order to get an optimal Q, we have:

Q (x_{n}, a_{n}) = r_{n} + γ \sum_{x_{n + 1} \in X} P_{x_{n} x_{n + 1}}^{a_{n}} Q^{*} (x_{n + 1})

(5)

and:

a_{n} = \arg \max Q (x_{n}, a_{n})

(6)

where a_n is the optimal action to get the optimal Q.

3.2. Q-Learning Based System Model

In the UWSNs, the sink nodes are on the surface while the source nodes are deployed underwater. The sink nodes can receive data from multiple distributed source nodes [22]. In our QDAR mechanism, when a source node has a packet to send to the sink node, it broadcasts to the sink node to request communication as well as collect information. This mechanism is explained in the next section. With the collected information, the sink node can lay out a virtual topology from the source node to the sink node and decide the routing path thereafter.

Each packet in the networks can be seen as an agent in the Q-learning technique. The Q-learning state is related to the node which holds the packet. When a packet is at node i, the state of the agent is x_i. Packet forwarding from node i to node j is action a_i. If the transmission is successful, the packet (agent) state shifts from x_i to x_j. The routing path is the policy

π

, directing the packet (agent) to take the proper action. Obviously, it is costly for a node to perform all actions by sending packets to all the neighbor nodes and get all these Q values. Therefore, in our mechanism, the sink assumes a virtual packet and sends it in the virtual topology. By doing this virtual experiment, the sink node can perform QDAR algorithm and determine the routing path because the sink holds the nodes information, such as the delay and the residual energy.

4. QDAR Mechanism Overview

4.1. The QDAR Mechanism

The QDAR mechanism is designed for the overall routing process. There are five phases in QDAR mechanism: data_ready phase, routing decision, interest phase, packet forwarding and acknowledgement. The data_ready phase is responsible for information collection, after which the sink node can get the fundamental information for routing decision. Then, the sink node determines the path according to QDAR algorithm. During the interest phase, the sink node sends a packet to the source node along the determined path. Thus, the path can be constructed. Packets are sent along the path thereafter as long as the acknowledgement phase is successful. Otherwise, the whole mechanism restarts. The main communication procedure is briefly depicted as Figure 1.

4.2. Assumptions

The following assumptions apply:

Nodes hold their own depth information and can embed it in the packets;
Nodes in UWSNs implement Source_initiated Query [23];
The sink node keeps the successful and failed communication record of the nodes.

In the Source_initiated Query, if a source node has a packet to send to the sink, it first broadcasts a DATA_READY packet for both communication request and information collection. This phase is defined as the data_ready phase in this paper. During the data_ready phase, a node only forwards the packet if its depth is smaller than that of the previous node.

Once the sink node receives this packet, it can decide if it is interested in such data. If so, the sink node decides the routing path according to the QDAR algorithm and sends out an INTEREST packet to the source node along the decided path for path construction. This phase is defined as the interest phase. In this phase, each node records its previous hop and takes it as the next hop in the packet forwarding phase. Only nodes selected by QDAR algorithm join in the INTEREST packet forwarding, while the others turn into sleep state thereafter or take responsibilities for another communication task to save energy or improve network capacity.

After the INTEREST packet reaches the source node, the path is constructed. Then, the source node transmits the data packet through the constructed path. Finally, the sink node responds acknowledgement (ACK) packet after the successful data reception. If the source node receives ACK successfully, this communication episode concludes. The coming packets from the same source node are sent along the same path without repeated data collection and routing decision phases until transmission failure occurs.

4.3. The Packet Structures

Packet structures for DATA_READY packet and INTEREST packet, which can help establish the routing track, are designed as follows. DATA_READY packet contains the type and ID of the packet, source node address (SNA), time stamp (TS), current node depth (CND) and two arrays C and RES in which two costs and residual energy of this node (RE) are saved respectively. SNA is permanent during the network lifetime of the packet. Elements of C are calculation results of two cost functions: delay-related cost and energy-related cost. Denoting the residual energy of node i as

e_{r e s}^{i}

, we define the energy-related cost function

c e (e_{r e s}^{i})

as:

c e (e_{r e s}^{i}) = 1 - \frac{e_{r e s}^{i}}{e_{i n i}^{i}}

(7)

where

e_{i n i}^{i}

is the initial energy of node i. The less energy a node remains, the more it costs to forward packets. A node with higher cost is more reluctant to communicate. A delay-related cost function

c t (t_{i j})

is also defined:

c t (t_{i j}) = 1 - \frac{1}{t_{i j} + 1}

(8)

where

t_{i j}

represents the delay of sending packet between node i and node j. Clearly, the longer delay, the higher cost. On receiving a DATA_READY packet, the nodes have two tasks:

Calculate. The nodes extract RE and TS. With these information, the nodes can compute their energy-related costs and delay-related costs depending on the delay and residual energy.
Update and relay. The nodes update TS, C and RES packet fields with their own information or calculation results and relay the DATA_READY packet to neighbor nodes until the packet arrives at the sink node.

When the DATA_READY packet reaches the sink node, the sink node holds the relevant data and begins routing decision if it is interested in the packet. After the routing decision phase, the sink node creates an INTEREST packet composed of type and packet ID, destination node address (DNA), current node depth (CND) and path direction (PD). DNA is the source node address (SNA) of DATA_READY packet. The structures of both packets and their relationship are shown in Figure 2.

5. QDAR Algorithm

In this section, we describe the proposed QDAR algorithm in details. The important notations are listed in Table 1.

In order to design a Q-learning-based delay-aware routing protocol to extend the network lifetime of UWSNs, we define an action-utility function whose value is

Q

. The sink node has a matrix in which

Q

values of all the nodes are stored. These

Q

values are future rewards of packet forwarding and are used in the routing decisions. In our protocol, node i and packet forwarding from node i to node j are seen as state i and action j of Q-learning technique, respectively.

Firstly, we define a reward function

R_{s_{i j}}^{a_{j}}

for action j, which is related to both propagation delay and residual energy. If this transmission is successful, the reward of the action for node i is:

R_{x_{i j}}^{a_{j}} = - β_{0} - β_{1} C_{i j} = - β_{0} - β_{1} [c e (e_{r e s}^{i}) + {φ_{i j}}^{*} c t (t_{i j}) + c e (e_{r e s}^{j})]

(9)

where

a_{j}

is action j. Because forwarding packet occupies channel bandwidth and disturbs other nodes, a constant cost

β_{0}

is added into the function.

β_{1}

is the weight of the sum of delay-related cost and energy-related cost.

φ_{i j}

is the delay sensitivity. A higher

φ_{i j}

means the delay is more repellent.

If the transmission fails, node i will resend the packet, which means the node should pay double energy-related cost and more delay-related cost. Thus, the reward function becomes:

R_{x_{i i}}^{a_{j}} = - β_{0} - β_{2} C_{i i} = - β_{0} - β_{2} [2 \times c e (e_{r e s}^{i}) + {φ_{i j}}^{*} c t (t_{i j}^{'} + t_{i j}) + c e (e_{r e s}^{j})]

(10)

where

t_{i j}^{'}

is the time that node i spends in the failed transmission.

β_{2}

is the weight of the sum of delay-related cost and energy-related cost, in the same position as

β_{1}

.

To further prolong the network lifetime, we design an adaptive detouring path strategy. We define a set

N_{i}

for node i and an energy warning threshold

e_{wth}

. The elements of

N_{i}

are neighbor nodes of node i. When the residual energy of the next hop of node i, for example, node j, is lower than

e_{wth}

, or the energy-related cost

c e (e_{r e s}^{j}) > c e (e_{wth})

, the sink node modifies the

φ_{i h} |_{h \in N, h \neq j}

values to

φ_{i h}^{'} |_{h \in N, h \neq j}

, where

φ_{i h}^{'} |_{h \in N, h \neq j} = φ_{0} φ_{i h} |_{h \in N, h \neq j}

,

0 < φ_{0} < 1

. In this way, the weight of delay-related cost of communication with node j is higher than those of the other nodes. With

φ_{i h}^{'}

, the sink node can determine a detouring path by choosing nodes with more residual energy as the next hop of node i.

To calculate the direct reward, sink node keeps the communication record so as to estimate the state transition probabilities:

P_{x_{i j}}^{a_{j}}

and

P_{x_{i i}}^{a_{j}}

of each node.

P_{x_{i j}}^{a_{j}}

and

P_{x_{i i}}^{a_{j}}

are the probabilities of successful and failed packet forwarding respectively:

P_{x_{i j}}^{a_{j}} = \frac{m_{t}}{M_{t}},

(11)

P_{x_{i i}}^{a_{j}} = 1 - P_{x_{i j}}^{a_{j}} .

(12)

Suppose there have been

M_{t}

instances of communication up to time t and

m_{t}

is the frequency of successful packet forwarding, we can define the direct reward function as:

r_{i} = R_{x_{i j}}^{a_{j}} P_{x_{i j}}^{a_{j}} + R_{x_{i i}}^{a_{j}} P_{x_{i i}}^{a_{j}} = R_{x_{i j}}^{a_{j}} \frac{m_{t}}{M_{t}} + R_{x_{i i}}^{a_{j}} (1 - \frac{m_{t}}{M_{t}})

(13)

The highest

Q

value among all the actions is described as:

Q^{*} (x) = \max_{a} Q (x, a)

(14)

If the next hop is the sink node, the

Q

of this node is much higher than the others. According to Equations (5) and (14), we can define the action-utility function for each neighbor node of node i as:

Q (x_{i}, a_{j}) = r + γ (\frac{m_{t}}{M_{t}} Q^{*} (x_{j}) + (1 - \frac{m_{t}}{M_{t}}) Q^{*} (x_{i}))

(15)

Then, we choose the neighbor node with the highest

Q

as the next hop and update the previous

Q

storied with newly chosen

Q^{*}

. Initially, if the next hop is not the destination node, Q values in the matrix are set to 0. Otherwise,

Q

values are set to 1. Algorithm 1 for the routing mechanism is conducted as below.

Algorithm 1: The routing mechanism.

Initialize

Q

();

While x_i. next_hop ! = source node

for x_j in N_i do

calculate ce, ct, P;

nodes satisfy

c e (e_{wth}) < c e (e_{r e s}^{j})

are saved in set

N_{i 0}

;

end for

if

N_{i 0}! = \emptyset

then

set

φ_{i h}

to

φ_{i h}^{'}

,

h \in N_{i} / (N_{i 0})

;

calculate the direct reward r;

select the node x_j with maximum

Q

value in set N_i;

calculate

Q

(x_i, a_h),

h \in N_{i} / (N_{i 0})

;

a_j = argmax(

Q

(x_i, a_h));

else a_j = argmax(

Q

(x_i, a_h))

end if

x_i = x_j

The QDAR mechanism can adapt to the dynamic underwater environment quickly. An energy-related cost function and a delay-related cost function are defined in the data_ready phase. After that, the sink node has the collected information and performs QDAR algorithm. Firstly, it defines two reward functions with the cost functions for both successful and failed transmission. In the reward functions, there is an alterable parameter. Based on this parameter, an adaptive detouring path strategy is designed. Thus, QDAR algorithm can work adaptively with different residual energy. Then, with the reward functions and the corresponding probability functions, the action-utility function (Q-value) is determined.

Finally, the sink node chooses the global optimal next hop and then determines the routing path. In the next section, the simulation results prove that the adaptive solution in QDAR ensures a longer network lifetime and a relatively shorter delay.

6. Performance Evaluation

In this section, we evaluate the performance of the QDAR algorithm. Firstly, the performance of the QDAR algorithm with different parameters is shown. Then, we compare QDAR with the Q-learning-based lifetime-aware (QELAR) routing protocol and the vector-based forwarding (VBF) routing protocol. Finally, the performances of QDAR and QELAR are compared with different underlying MAC protocols.

6.1. Experimental Framework

In our simulation, 80 sensor nodes are randomly deployed in a 5000 m × 5000 m × 1500 m three-dimensional space. These sensor nodes are identical in every feature, such as the initial energy, communication power, transmission range and so on. Five sink nodes are randomly deployed on the water surface. As it is convenient to maintain these sink nodes, we assume that these sink nodes have infinite energy and can communicate with each other by radio. Thus, packets can be sent to any sink node. We place a source node at the bottom layer of the network. The transmission speed is 1500 m/s. At the source node, packets are generated and prepared to send to the sink node following an independent Poisson process with a rate λ (packets/s). The initial energy, energy warning threshold are set to 150 J and 75 J. Because

β_{1}

and

β_{2}

are in equal positions, they are set to the same value:

β_{1} = β_{2}

. The constant cost

β_{0} = 1

and

φ_{i j} = 1

. We use

φ_{0}

to restrict the detour routing. The underlying MAC protocol is the underwater power control (UPC) protocol [24]. Moreover, the preamble signal length should be considered in the end-to-end delay as it is long in UWSNs. For example, the preamble of the Aqua-sent OFDM is 0.49 s, significantly increasing the end-to-end delay. In accordance with the hardware specifications of underwater OFDM modems [25], the simulation parameters are listed in Table 2.

6.2. Evaluation with Different Parameters

To evaluate the influence of

φ_{0}

,

β_{1}

and

β_{2}

, we simulate the variance of residual energy and the average latency of each packet as these values change.

β_{1}

and

β_{2}

are the weights of the sum of delay-related cost and energy-related cost for successful and failed transmission respectively.

φ_{i j}

is the delay sensitivity.

φ_{0}

is the coefficient to modify

φ_{i j}

. Higher

β_{1}

and

β_{2}

can distribute energy more evenly because the residual energy is more important in routing decision. Higher

φ_{0}

means delay is more important so that the detouring routing is restricted.

In the simulation,

φ_{0}

,

β_{1}

and

β_{2}

vary between 0.1 and 0.9 with a step of 0.2. Comparing the variance of residual energy in Figure 3 with varying parameters, we can conclude that with larger

β_{1}

,

β_{2}

and smaller

φ_{0}

, the residual energy distributes more uniformly. When

β_{1} = β_{2} = 0.9

and

φ_{0} = 0.1

, the residual energy variance is about 18, less than 1/3 of that with

β_{1} = β_{2} = 0.1

and

φ_{0} = 0.9

. That is to say less nodes will drain the energy early. Thus, the network lifetime can be prolonged.

However, as shown in Figure 4, the end-to-end delay is increased. When

β_{1} = β_{2} = 0.9, φ_{0} = 0.1

, the delay is 42 s, about 25% longer than that with

β_{1} = β_{2} = 0.1, φ_{0} = 0.9

. This is because that the path detours to avoid nodes close to the sink node but with relatively lower energy. On one hand, higher

β_{1}

and

β_{2}

mean higher weight of the energy-related cost. Nodes with more residual energy cost less to communicate. Therefore, they are more favorable to packet forwarding, making the variance of residual energy lower. On the other hand, as

0 < φ_{0} < 1

,

φ_{0}

indicates that the sink node attaches less importance to delay during the routing decision. Although a node is far away from the sink node, it can be the next hop as long as the residual energy is high enough. As a result, detour routing with more residual energy and longer latency is chosen as the routing path. Furthermore, as

φ_{0}

increases, the rates of the incremental variance and decreasing average latency reduce. The reason is that, when

φ_{0}

increases to a certain degree, the sink node determines routing path with the lowest latency thereafter, paying little attention to the residual energy. However, when

φ_{0}

is too small, the average latency increases significantly because of the detour routing. Thus,

φ_{0}

can not increase or decrease infinitely. In the following evaluation, we set

φ_{0}

to 0.3, 0.5 and 0.7 respectively and set

β_{1}

,

β_{2}

to 0.7.

6.3. Comparison with QELAR and VBF

Figure 5 depicts the average latency of each package with different

φ_{0}

and the comparison with the QELAR protocol, a Q-learning-based lifetime-extended protocol. As λ increases, more packets need to be forwarded and the average latency is prolonged. For example, when

φ_{0} = 0.3

, the latency increases about 23% with λ grows from 0.02 to 0.1. This is because both QDAR and QELAR choose detour routing to distribute the residual energy evenly. When there are more packets sent in UWSNs, nodes closer to the sink node have relatively less residual energy. Thus, the nodes far away from the sink node but with more residual energy are chosen to relay packets.

It is also observed that, in UWSNs with a lower λ, QDAR can reduce latency more significantly. Especially when

φ_{0} = 0.7

, the latency is reduced by nearly 25%. The reason is that at the initial stage of communication, although the residual energy reduces, it is still sufficient and does not worth detour routing. However, QELAR detours from the outset at the initial stage, while QDAR waits until the residual energy of some nodes is lower than a threshold. The higher the

φ_{0}

is, the more attention is paid to latency. Another reason is that, in QDAR, it is the sink node that takes responsibility to construct routing path. Whereas in QELAR, every node in the network should learn the environment and choose an optimal action, leading to a longer delay. In this way, QDAR outperforms QELAR in the average latency by 20–25% as shown in Figure 5.

Since the network lifetime is another critical performance of network, we compare the network lifetime of QDAR, QELAR and VBF protocol [23]. The network lifetime is defined as the total performing time until the first node drains its energy. The ratio of QDAR network lifetime and QELAR network lifetime as well as the ratio of QDAR network lifetime and VBF network lifetime is shown in Figure 6. As depicted, the ratio of QDAR network lifetime and VBF network lifetime is about 1.2 with

φ_{0} = 0.7

. When

φ_{0} = 0.3

, the ratio increases to 1.3. This is because in VBF, all the nodes in the virtual pipe between a source and a destination have chances to relay packets. As a result, many packets are sent repeatedly by different nodes, making them drain energy more quickly. With λ growing from 0.02 to 0.1, more packets are sent and the nodes within the pipe die earlier. On the contrary, QDAR takes the residual energy into account and extends the network lifetime by avoiding to choose nodes with relative less energy. The ratio of QDAR network lifetime and QELAR network lifetime is smaller than 1. The reason is that QELAR can distribute routing traffic to nearly every node in UWSNs, extending the lifetime greatly. While QDAE also take delay into account, avoiding choosing nodes that are too far away from the sink node to forward packets. Thus, some nodes may exhaust energy earlier than that in QELAR. However, even for the worst case, the network lifetime can still achieve 82% of that in QELAR. The factor attributes to the result is our adaptive detouring path strategy: if the residual energy of a node is less than

e_{wth}

, an adequate path with nodes remaining more energy is chosen to relay packets, to postpone energy depletion of the node.

6.4. Evaluation with Different MAC Protocols

We next examine how the underlying MAC protocol layer affects the performance of QDAR and QELAR in terms of total energy consumption. The slotted floor acquisition multiple access (SFAMA) MAC protocol [26] and underwater power control (UPC) MAC protocol [24] are used for QDAR and QELAR, respectively. The results are shown in Figure 7, where the total energy consumption is normalized to (0, 1). As QDAR restricts routing decision with end-to-end delay, some packets may detour with more hops in QELAR than that of QDAR. More nodes forwarding packets means more total energy consumption. Thus, QELAR consumes more energy than QDAR regardless of the MAC protocol used. Besides, we can see that routing algorithms with UPC can achieve lower energy consumption than those with SFAMA. The reason is that UPC can reduce the transmission power level to achieve a better energy efficiency as well as smaller interference range. Moreover, as UPC can deal with the noise, interference and spatial channel reuse, QDAR do not need to pay much attention to these problems. Therefore, using UPC as the underlying MAC protocol, QDAR can prolong the network lifetime more efficiently.

7. Conclusions

In this paper, we have designed a novel delay-aware routing (QDAR) mechanism based on the Q-learning technique to extend the network lifetime of underwater sensor networks. The QDAR mechanism can reduce average latency as well as extend the network lifetime. The data_ready phase and the packet structures are designed for data collection. Then, the sink node applies the QDAR algorithm to determine the routing path. In QDAR, we extend the action-utility function with both residual energy-related cost and delay-related cost. In order to have a better tradeoff between residual energy of nodes and delay, an adaptive detouring path strategy is designed. When the residual energy is sufficient, a path with shorter delay is chosen. When the residual energy of a node is lower than a threshold, the weight of the delay-related cost is decreased so as to construct an adequate path avoiding nodes with relatively less energy, even though these nodes may be nearer to the sink node. Thus, QDAR can distribute the residual energy more evenly, which is crucial to extend the network lifetime. Moreover, as QDAR takes both direct rewards and future rewards into account, it can choose a global optimal next hop, whereas greedy algorithms only pay attention to the direct reward. After a routing decision, the path is constructed during the interest phase. Then, packets are forwarded and the communication ends with an acknowledgement. The QDAR mechanism can work adaptively and distributively in the dynamic underwater environment. We evaluate the performance of QDAR with different parameters and compare it with QELAR and VBF. The simulation results show that QDAR reduces the total energy consumption effectively and decreases the average latency significantly by 20–25% at the cost of only a little reduction in network lifetime. Therefore, QDAR is more adequate for time-critical applications.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61571318), (61571323), and the Guangxi Science and Technology Project AC16380094.

Author Contributions

Z.J. and X.F. conceived and designed the experiments; Y.M. performed the experiments; S.L. and Y.S. analyzed the data; X.F. contributed analysis tools; Y.M. wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Felemban, E.; Shaikh, F.K.; Qureshi, U.M.; Sheikh, A.A.; Qaisar, S.B. Underwater sensor network applications: A comprehensive survey. Int. J. Distrib. Sens. Netw. 2015, 11, 1–14. [Google Scholar] [CrossRef]
Sheikh, A.A.; Felemban, E.; Felemban, M.; Qaisar, S.B. Challenges and opportunities for underwater sensor networks. In Proceedings of the 12th IEEE International Conference on Innovations in Information Technology (IIT), Al Ain, United Arab Emirates, 28–30 November 2016; pp. 1–6. [Google Scholar]
Zhang, S.; Wang, Z.; Liu, M.; Qiu, M. Energy-aware routing for delay-sensitive underwater wireless sensor networks. Sci. China Inf. Sci. 2014, 57, 1–14. [Google Scholar] [CrossRef]
Li, N.; Martínez, J.F.; Meneses Chaus, J.M.; Eckert, M. A survey on underwater acoustic sensor network routing protocols. Sensors 2016, 16, 414. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Qian, L.; Zhang, S.; Liu, M.; Zhang, Q. A MACA-Based Power Control MAC Protocol for Underwater Wireless Sensor Networks. In Proceedings of the IEEE/OES Ocean Acoustics (COA), Harbin, China, 9–11 January 2016; pp. 1–8. [Google Scholar]
Kacimi, R.; Dhaou, R.; Beylot, A.L. Load balancing techniques for lifetime maximizing in wireless sensor networks. Ad Hoc Netw. 2013, 11, 2172–2186. [Google Scholar] [CrossRef] [Green Version]
Darehshoorzadeh, A.; Boukerche, A. Underwater sensor networks: A new challenge for opportunistic routing protocols. IEEE Commun. Mag. 2015, 53, 98–107. [Google Scholar] [CrossRef]
Han, G.; Jiang, J.; Bao, N.; Wan, L.; Guizani, M. Routing protocols for underwater wireless sensor networks. IEEE Commun. Mag. 2015, 53, 72–78. [Google Scholar] [CrossRef]
Ahmadi, A.; Shojafar, M.; Hajeforosh, S.F.; Dehghan, M.; Singhal, M. An efficient routing algorithm to preserve k-coverage in wireless sensor networks. J. Supercomput. 2014, 68, 599–623. [Google Scholar] [CrossRef]
Naranjo, P.G.V.; Shojafar, M.; Mostafaei, H.; Pooranian, Z.; Baccarelli, E. P-SEP: A prolong stable election routing algorithm for energy-limited heterogeneous fog-supported wireless sensor networks. J. Supercomput. 2016, 73, 1–23. [Google Scholar] [CrossRef]
Shojafar, M.; Pooranian, Z.; Naranjo, P.G.V.; Baccarelli, E. FLAPS: Bandwidth and Delay-Efficient Distributed Data Searching in Fog-Supported P2P Content Delivery Networks. J. Supercomput. 2017, 1–22. [Google Scholar] [CrossRef]
Bai, W.; Wang, H.; Shen, X.; Zhao, R.; Zhang, Y. Minimum delay multipath routing based on TDMA for underwater acoustic sensor network. Int. J. Distrib. Sens. Netw. 2016, 2016. [Google Scholar] [CrossRef]
Al Salti, F.; Alzeidi, N.; Arafeh, B.R. EMGGR: An energy-efficient multipath grid-based geographic routing protocol for underwater wireless sensor networks. Wirel. Netw. 2016, 23, 1301–1314. [Google Scholar] [CrossRef]
Ali, T.; Jung, L.T.; Faye, I. End-to-end delay and energy efficient routing protocol for underwater wireless sensor networks. Wirel. Pers. Commun. 2014, 79, 339–361. [Google Scholar] [CrossRef]
Pooranian, Z.; Barati, A.; Movaghar, A. Queen-bee algorithm for energy efficient clusters in wireless sensor networks. World Acad. Sci. Eng. Technol. 2011, 73, 1080–1083. [Google Scholar]
Wei, B.; Luo, Y.M.; Jin, Z.; Wei, J.; Su, Y. ES-VBF: An energy saving routing protocol. In Proceedings of the 2012 International Conference on Information Technology and Software Engineering, Beijing, China, 8−10 December 2012; pp. 87–97. [Google Scholar]
Al-Bzoor, M.; Zhu, Y.; Liu, J.; Reda, A.; Cui, J.H.; Rajasekaran, S. Adaptive power controlled routing for underwater sensor networks. In Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Huangshan, China, 8–10 August 2012; pp. 549–560. [Google Scholar]
Hu, T.; Fei, Y. QELAR: A machine-learning-based adaptive routing protocol for energy-efficient and lifetime-extended underwater sensor networks. IEEE Trans. Mob. Comput. 2010, 9, 796–809. [Google Scholar]
Pompili, D.; Melodia, T.; Akyildiz, I.F. Distributed routing algorithms for underwater acoustic sensor networks. IEEE Trans. Wirel. Commun. 2010, 9, 2934–2944. [Google Scholar] [CrossRef]
Hsu, C.C.; Liu, H.H.; Gómez, J.L.G.; Chou, C.F. Delay-sensitive opportunistic routing for underwater sensor networks. IEEE Sens. J. 2015, 15, 6584–6591. [Google Scholar] [CrossRef]
Nowé, A.; Brys, T. A Gentle Introduction to Reinforcement Learning. Scalable Uncertainty Management; Springer International Publishing: Cham, Switzerland, 2016. [Google Scholar]
Zhang, Y.; Chen, Y.; Zhou, S.; Xu, X.; Shen, X.; Wang, H. Dynamic node cooperation in an underwater data collection network. IEEE Sens. J. 2016, 16, 4127–4136. [Google Scholar] [CrossRef]
Xie, P.; Cui, J.H.; Lao, L. VBF: Vector-Based Forwarding Protocol for Underwater Sensor Networks. In Proceedings of the International Conference on Research in Networking, Coimbra, Portugal, 15–19 May 2006; pp. 1216–1221. [Google Scholar]
Su, Y.; Zhu, Y.; Mo, H.; Cui, J.H.; Jin, Z. UPC-MAC: A Power Control MAC Protocol for Underwater Sensor Networks. In Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Zhangjiajie, China, 7–10 August 2013; pp. 377–390. [Google Scholar]
Yan, H.; Zhou, S.; Shi, Z.J.; Li, B. A DSP implementation of OFDM acoustic modem. In Proceedings of the Second Workshop on Underwater Networks, Montreal, QC, Canada, 14 September 2007; pp. 89–92. [Google Scholar]
Molins, M.; Stojanovic, M. Slotted FAMA: A MAC protocol for underwater acoustic networks. In Proceedings of the IEEE Oceans, Singapore, 16–19 May 2007; pp. 1–7. [Google Scholar]

Figure 1. Q-learning based delay-aware routing mechanism.

Figure 2. Packet structures of DATA_READY packet in data_ready phase and INTEREST packet in interest phase and their relationship.

Figure 3. Variance of the residual energy with different values of

φ_{0}

,

β_{1}

and

β_{2}

.

Figure 3. Variance of the residual energy with different values of

φ_{0}

,

β_{1}

and

β_{2}

.

Figure 4. Average latency with different values of

φ_{0}

,

β_{1}

and

β_{2}

.

Figure 4. Average latency with different values of

φ_{0}

,

β_{1}

and

β_{2}

.

Figure 5. Average latency comparison between QDAR and QELAR with different packet generation rate λ (packets/s).

Figure 6. Lifetime ratios of QDAR and different routing protocols (QELAR and VBF) with different packet generation rate λ (packets/s).

Figure 7. Total energy consumption between QDAR and QELAR with different MAC protocols.

Table 1. Notations.

Name	Description
$a_{j}$	Packet forwarding from node i to node j
$β_{0}$	Constant cost
$β_{1}$ , $β_{2}$	Weight of two costs
$N_{i}$	The neighbor node set of i
$φ_{i j}$	Delay sensitivity of communication between node i and j
$φ_{i j}^{'}$	The modified $φ_{i j}$
$t_{i j}$	Delay of communication between node i and j
$t_{i j}^{'}$	The time of failed communication between node i and j
$c e (e_{r e s}^{i})$	The energy-related cost of node i
$c t (t_{i j})$	The delay-related cost of node i
$γ$	The discount factor of future reward

Table 2. Simulation parameters.

Name	Values
Transmission power	10 W
Receiving power	3 W
Idle power	30 mW
Data packet size	300 B
Transmission rate	3 kbps
Transmission range	500 m
Preamble signal length	0.49 s
Simulation time	10⁴ s

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jin, Z.; Ma, Y.; Su, Y.; Li, S.; Fu, X. A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks. Sensors 2017, 17, 1660. https://doi.org/10.3390/s17071660

AMA Style

Jin Z, Ma Y, Su Y, Li S, Fu X. A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks. Sensors. 2017; 17(7):1660. https://doi.org/10.3390/s17071660

Chicago/Turabian Style

Jin, Zhigang, Yingying Ma, Yishan Su, Shuo Li, and Xiaomei Fu. 2017. "A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks" Sensors 17, no. 7: 1660. https://doi.org/10.3390/s17071660

APA Style

Jin, Z., Ma, Y., Su, Y., Li, S., & Fu, X. (2017). A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks. Sensors, 17(7), 1660. https://doi.org/10.3390/s17071660

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Q-Learning-Based Delay-Aware Routing Algorithm to Extend the Lifetime of Underwater Sensor Networks

Abstract

1. Introduction

2. Related Work

3. Q-learning Based Model

3.1. The Basic Q-Learning Technique

3.2. Q-Learning Based System Model

4. QDAR Mechanism Overview

4.1. The QDAR Mechanism

4.2. Assumptions

4.3. The Packet Structures

5. QDAR Algorithm

6. Performance Evaluation

6.1. Experimental Framework

6.2. Evaluation with Different Parameters

6.3. Comparison with QELAR and VBF

6.4. Evaluation with Different MAC Protocols

7. Conclusions

Acknowledgments

Author Contributions

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI