HAS4: A Heuristic Adaptive Sink Sensor Set Selection for Underwater AUV-Aid Data Gathering Algorithm

In this paper, we address the data gathering problem in underwater wireless sensor networks (UWSNs). In many underwater applications, sensed data need not be retrieved quickly, which gives us the opportunity to use mobile autonomous underwater vehicles (AUVs) as data mules that periodically collect them. In each round of data gathering, the AUV visits a subset of the sensors and communicates with them through a novel high-speed magnetic-induction communication system. The remaining sensors acoustically transmit their sensed data to the AUV-visited sensors. This paper proposes the HAS4 (Heuristic Adaptive Sink Sensor Set Selection) algorithm to select the AUV-visited sensors so as to save energy, reduce AUV cost and prolong the network lifetime. Experimental results comparing HAS4 with two benchmark selection methods show that our algorithm achieves better performance.


Introduction
Ocean Big Data (OBD) is an emerging research area that benefits ocean environmental monitoring, offshore exploration, disaster prevention, and military surveillance. Oceans, rivers and lakes cover 71% of our planet, and the traditional "deploy, sense, retrieve, and post-process" routine for these sensing activities is difficult and costly for humans in large-area applications. Deploying an underwater sensor network is an efficient way to monitor physical factors (e.g., temperature, salinity, light) through a multitude of sensing modalities. Since it is impractical to connect every underwater sensor by wire, wireless communication is currently the dominant data delivery technique for numerous underwater applications. Current underwater communication falls into three categories: acoustic, electromagnetic and optical [1]. Benefits and drawbacks of these methods are compared in Table 1. Due to serious attenuation, the effective communication range of every method except acoustic waves is no more than 100 m in sea water, which makes acoustic communication the dominant choice. However, acoustic communication is also the main obstacle to the realization of these networks because of its limited bandwidth (currently a few hundred bits per second), long propagation delay and batteries that cannot be replaced [2][3][4][5].
Fortunately, in underwater environments, magnetic-induction (MI) communication offers an alternative wireless transmission mode that provides high-bandwidth connectivity (up to 10 Mbps) with an energy per bit orders of magnitude lower than that of acoustic communication [6]. However, MI allows nodes to communicate robustly only when they are a few meters from each other (usually less than 10 m) [7]. Since mobile autonomous underwater vehicles (AUVs) can hover close enough to a sensor node [8,9], using an AUV and multi-modal communications (e.g., acoustic and MI) can enhance the performance of UWSNs and enable many critical applications. In this scenario, shown in Figure 1, which leverages the advantages of different wireless underwater transmission modes and the mobility of AUVs, AUVs could be made aware of the sensing task and visit a subset of the sensors (sink nodes) for data gathering, while the rest of the sensors acoustically forward their sensing data to the sink nodes. Although AUVs can act as data mules and travel from node to node across a sparsely deployed sensor network to collect data [10], their slow mobility and limited energy mean that they can only visit a very small number of the sensors. Thus, much of the sensing data must be acoustically forwarded sensor by sensor. In such a procedure, packets may pass through multiple relay nodes before reaching a sink node, which means that an enormous amount of energy is consumed on data forwarding along the path [11]. In addition, as the number of acoustic transmissions increases, the probability of packet collision and the transmission energy consumption could grow notably, which reduces the network performance.
Moreover, reducing the total energy consumption and the number of acoustic transmissions is not enough to prolong the network lifetime, because some heavily used sensors, e.g., the neighbours of a sink, may run out of energy faster than the others, which causes an energy consumption imbalance. Furthermore, due to channel interference and the limited capacity of the acoustic channel, a single sink node may not be able to receive all of the sensing data. Hence, multiple sink nodes are required to collect all of the sensing data. Because the AUV's energy is limited, sink nodes should be carefully selected.
This paper addresses all of the problems mentioned above to solve the underwater data gathering problem. Aiming to reduce the total energy consumed by acoustic communication and to balance the energy consumption across the network, we propose HAS4, a Heuristic Adaptive Sink Sensor Set Selection algorithm for underwater AUV-aid data gathering. HAS4 has both centralized and distributed versions. Both achieve energy efficiency by decreasing the amount of acoustic communication or reducing the maximum acoustic link length. HAS4 prolongs the network lifetime by adaptively selecting short-lifetime sensors as sink nodes, which eliminates their energy consumption on acoustic data transmission.
The contributions of this paper are as follows:
1. We formulate the underwater AUV-aid data gathering problem as a mixed integer linear program (MILP). In the formulation, we take the energy consumption on acoustic communication, the AUV traveling distance, and energy balance into account to prolong the network lifetime and reduce the AUV traveling cost.
2. We propose HAS4 to solve the MILP. HAS4 is independent of the network topology and has parameters for trading off AUV traveling cost against network lifetime. By providing both centralized and distributed versions, HAS4 widens the applicable scenarios.
3. We conduct simulations to verify the efficiency of HAS4. Experimental results show that both centralized and distributed HAS4 prolong the network lifetime with low AUV traveling cost.
The remainder of this paper is organized as follows. Section 2 discusses related work. In Section 3, we introduce the AUV-aid underwater data gathering problem. Centralized and distributed algorithms are proposed in Section 4 to solve the problem. Section 5 reports simulation results. Finally, Section 6 concludes the paper.

Related Work
Generally, underwater data gathering systems fall into two categories: multi-hop and AUV-aid approaches.

Multi-Hop Approaches
In these approaches, underwater sensing data are forwarded hop-by-hop along routing paths. Thus, the routing technique plays the key role in these kinds of approaches. Unlike in terrestrial wireless sensor networks (WSNs), underwater nodes move involuntarily with water flows [12][13][14][15][16], so routing protocols designed for WSNs cannot be directly applied to UWSNs. Thus, the research focus in this area is to design energy-efficient and reliable routing protocols.
Based on the information required by the protocols, most underwater routing can be classified as location-based or depth-based. Location-based routing protocols, such as vector-based forwarding [17], hop-by-hop vector-based forwarding [18] and depth-controlled routing [19], assume that each node knows its location and that each packet contains the position of the sender. Depth-based routing protocols, such as depth-based routing [20], directional depth-based routing [21] and depth-based multi-hop routing [22], use depth information to route the packets instead of the full location coordinates.
Because underwater acoustic communication is energy-intensive and battery power is limited, multi-hop approaches reduce the lifetime of the network.

AUV-Aid Approaches
In these approaches, AUVs act as data mules to deliver underwater data. Vasilescu et al. illustrated the feasibility of leveraging AUVs for underwater data collection [10]. Tekdas et al. demonstrated the increase of the network lifetime via AUV gathering data from sensor nodes [23].
Research on AUV-aid data gathering has focused primarily on how to reduce the length of the AUV's traveling path [24]. Basagni et al. take into account the Value of Information (VoI), i.e., the fact that the value of sensing data decreases over time, and propose a Greedy and Adaptive AUV Path-planning (GAAP) algorithm [25]. Moreover, the cooperation among AUVs has also been studied in 3D underwater environments [26]. However, these works do not use MI communication to improve reliability and throughput.
In this paper, we leverage the advantages of different wireless underwater transmission modes and the mobility of AUVs to provide a cyber interconnection scheme that enables distributed and efficient data delivery from underwater sensors to the surface stations.

AUV-Aid Underwater Data Gathering Problem
In this section, we formulate the AUV-aid data gathering problem with the objectives of minimizing the AUV travel distance, minimizing the total energy consumption and maximizing the network lifetime.

Motivation and Overview
Leveraging the sub-sea MI communication model in [27], as shown in Figure 2, the MI channel capacity can exceed $5 \times 10^6$ bps. Moreover, we have built a sub-sea MI testbed, as shown in Figure 3. A data transmission based on the Quadrature Phase Shift Keying (QPSK) modulation scheme with a symbol rate of 100 kHz is demonstrated in Figure 4. Limited by the size of the water tank, we were only able to separate the transmitter and receiver by a distance of 0.7 m. The test results show that the data transmission is 100% successful. Thus, by leveraging the advantages of reliable and high-speed MI communication, long-distance acoustic communication and the mobility of AUVs, UWSNs can dramatically reduce the energy cost of data gathering with high reliability.
Consider an AUV and many sensors that form an underwater wireless sensor network. Each sensor is anchored in the target area for ocean monitoring/sensing applications. Every sensor is equipped with both a magnetic-induction (MI) module and an acoustic communication module, for short-distance high-speed communication and long-distance low-speed communication, respectively. We assume that the sensed data are periodically gathered and delivered to the designated surface station. During each period, the AUV departs from the surface station, dives into the water, visits a subset of the sensors for data collection, brings the collected data back to the surface station, and prepares for the next round of data collection. Since the MI communication range is quite short, the AUV must travel to the exact position of a sensor for MI communication. The sensors that are not visited by the AUV transmit their sensing data to the chosen sensors via acoustic channels.
Suppose there are $n$ sensors $S = \{s_1, s_2, \cdots, s_n\}$, and the $i$-th sensor $s_i$ is anchored at the coordinate $c_i$. Let $\{m_1, m_2, \cdots, m_k\}$ denote the acoustic sub-channel set. The transmission range on acoustic sub-channel $m$ is $m_t$, and the interference range of this channel is $m_i$. The neighbor sensor set of each sensor can be obtained from the sensor coordinates and the communication range. For simplicity, in this paper, we assume that the AUV's speed is constant at $V$ and that the MI communication rate is constant at $r_{mi}$.
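As a minimal sketch of how the neighbor sets could be derived from coordinates and a communication range, the following assumes a single range shared by all sensors and symmetric links. The function name `neighbor_sets`, the sample coordinates and the 500 m range are illustrative assumptions, not values from the paper.

```python
import math

def neighbor_sets(coords, tx_range):
    """Map each sensor index to the set of sensors within tx_range.

    Assumption: one common transmission range and symmetric links.
    """
    n = len(coords)
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the anchored positions c_i and c_j
            if math.dist(coords[i], coords[j]) <= tx_range:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs

# Hypothetical coordinates in metres
coords = [(0, 0, 0), (300, 0, 0), (300, 400, 0), (2000, 0, 0)]
nbrs = neighbor_sets(coords, tx_range=500.0)
```

Here sensors 0, 1 and 2 are mutual neighbors, while sensor 3 is isolated and would have to be visited by the AUV.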

Problem Formulation
As mentioned above, we have three objectives for this problem. The first objective is to minimize the AUV traveling distance:

$$\min \sum_{i} \sum_{j} d_{ij} x_{ij} \qquad (1)$$

where $d_{ij}$ is the distance between sensor $i$ and sensor $j$, and $x_{ij}$ is a binary variable that indicates the AUV travel path: $x_{ij} = 1$ if the AUV travels from sensor $i$ to sensor $j$, and $x_{ij} = 0$ otherwise.
The second objective is to minimize the total energy consumption, which is expressed in terms of the following quantities: $D_{ij}$ is the data amount from sensor $i$ to sensor $j$; $p_i^{out}$ and $p_i^{in}$ denote the energy consumption of sensor $i$ on transmitting and receiving through acoustic channels, respectively; and $N_i$ is the neighbour sensor set of sensor $i$.
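To make the first objective concrete, the sketch below evaluates the traveling distance of one candidate closed tour, which corresponds to summing $d_{ij}$ over the legs where $x_{ij} = 1$. The helper name `tour_length` and the sample coordinates are assumptions for illustration only.

```python
import math

def tour_length(coords, tour):
    """Total length of a closed tour visiting the given sensor indices in order.

    Each consecutive pair (i, j) in the tour is a leg with x_ij = 1;
    the last sensor connects back to the first one.
    """
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(math.dist(coords[i], coords[j]) for i, j in legs)

# Hypothetical positions in metres forming a 3-4-5 right triangle
coords = [(0, 0, 0), (3, 0, 0), (3, 4, 0)]
length = tour_length(coords, [0, 1, 2])
```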
The final objective is to maximize the network lifetime. We define the network lifetime as the sensing round in which the first sensor runs out of energy. Therefore, maximizing the network lifetime can be seen as maximizing the minimum sensor lifetime in the network. This objective involves the following quantities: $p_i^0$ is the energy consumption of this round's predefined sensing task, $e_i$ is the residual energy of sensor $i$, and $p_i^{out}(mi)$ is the energy consumption on transmitting its own data through MI.
To achieve the above objectives, the formulation imposes a set of constraints. Notations used in the above equations are listed in Table 2. Equations (5) and (6) are acoustic-channel interference constraints. Because of the interference in acoustic communication, each sensor can only transmit to or receive from one other sensor through a specific acoustic channel at a time (Equation (5)). We use a binary variable $\chi_{ij}^m(t)$ to indicate the sub-channel state. If node $i$ transmits data to node $j$ on sub-channel $m$ at time $t$, $\chi_{ij}^m(t) = 1$. Otherwise, $\chi_{ij}^m(t) = 0$. In Equation (6), $I_j^m$ is the sensor set within the $j$-th sensor's interference range on channel $m$. If sensor $j$ is within the interference range of sensor $k$ on sub-channel $m$, and sensor $k$ is transmitting data to its neighbor $l$ via sub-channel $m$, then sensor $j$'s neighbor $i$ will fail to transmit to sensor $j$ through sub-channel $m$ at this time because of signal interference (Equation (6)).
Table 2. Notations.

Notation    Description
$d_{ij}$    The distance between node $i$ and node $j$.
$c_i$    The coordinate of node $i$.
$\chi_{ij}^m(t)$    A binary variable that equals 1 if node $i$ transmits data to node $j$ on sub-channel $m$ at time $t$.
$N_i$    The neighbor node set of node $i$.
$D_{ik}$    The data amount from node $i$ to node $k$.
$c_{ij}^m$    The channel capacity between node $i$ and node $j$ on sub-channel $m$.
$y_i$    An indicator that equals 1 if node $i$ is an $S_A$ node.
$S_A$    The node set that will be visited by the AUV.
$S_U$    The node set that will not be visited by the AUV.
$S_C$    The node set for which it still needs to be confirmed whether it will be visited by the AUV.
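The interference constraints described before Table 2 can be illustrated with a small feasibility check for one time slot. This is only a sketch under stated assumptions: every node takes part in at most one link per slot, and the hypothetical `interferers` map lists, for each (receiver, sub-channel) pair, the nodes whose transmissions would disturb that receiver. Neither the function name nor the data layout comes from the paper.

```python
def schedule_is_valid(transmissions, interferers):
    """Check one time slot of (sender, receiver, channel) links.

    interferers[(j, m)] is the set of nodes whose transmissions on
    sub-channel m reach receiver j (an assumed encoding of I_j^m).
    """
    busy = set()
    for sender, receiver, _ in transmissions:
        # A node may take part in at most one link at a time.
        if sender in busy or receiver in busy:
            return False
        busy.update((sender, receiver))
    for link in transmissions:
        _, receiver, channel = link
        for other in transmissions:
            if other is link or other[2] != channel:
                continue
            # Another sender on the same sub-channel reaches this receiver.
            if other[0] in interferers.get((receiver, channel), set()):
                return False
    return True

# Hypothetical case: node 2 is inside node 1's interference range on channel 0
interferers = {(1, 0): {2}}
same_channel = schedule_is_valid([(0, 1, 0), (2, 3, 0)], interferers)
other_channel = schedule_is_valid([(0, 1, 0), (2, 3, 1)], interferers)
```

Moving the second link to a different sub-channel removes the conflict, which is the reason multiple sub-channels are modeled.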
Equations (8) to (14) are data flow constraints. We define $S_A$ and $S_U$ to represent the sensor sets that will be visited by the AUV and will not be visited by the AUV, respectively. Then there is no data flow to $S_U$ sensors (Equations (8) and (9)) and no data flow from $S_A$ sensors (Equations (10) and (11)). Moreover, the data flow from an $S_U$ sensor equals the data it sensed (Equation (12)), and the outgoing data of each sensor is the sum of the incoming data and its sensing data (Equation (13)). Variable $v_i$ in Equation (12) is the sensing rate of sensor $i$. In addition, the data flow on each sub-channel should not exceed the link capacity (Equation (14)). Variable $c_{ij}^m$ in Equation (14) is the channel capacity between sensor $i$ and sensor $j$ on sub-channel $m$.
Equation (15) means that, during the data gathering process, no node may spend more energy than its remaining energy. Variable $d_i$ in Equation (16) is the buffer size of the $i$-th sensor, and Equation (16) requires that the data amount stored on an $S_A$ sensor not exceed the buffer size of this sensor. A binary variable $y_i$ is used to indicate the sensor type. If sensor $i$ is an $S_A$ sensor, then $y_i = 1$. Otherwise, $y_i = 0$. Equations (18) and (19) are the travel tour constraints, which require that each chosen sensor be visited exactly once. In addition, the formulation specifies the AUV stop time at each sensor for data gathering.
Since there is a polynomial-time reduction from the traditional traveling salesman problem to the above problem, obtained by setting $p_i^{in}$ and $p_i^{out}$ to zero, the above problem is a non-deterministic polynomial-time hard (NP-hard) problem. We implemented the problem using MATLAB + YALMIP + Gurobi and found that, even for a small scenario with no more than eight nodes, it takes more than an hour to find the optimal solution. Because the problem is NP-hard, the execution time is expected to grow exponentially with the network size, so we propose a heuristic algorithm, HAS4, to solve this problem in the following section.

Centralized and Distributed Algorithms
In this section, we present HAS4 to solve the MILP, and give a centralized and a distributed version of HAS4.

Description of HAS4
Since the path planning problem has been well studied as the traditional travelling salesman problem (TSP) in previous works [28], in this paper, we focus on the sink sensor selection problem. In HAS4, we define $S_C$ to represent the set of sensors for which it still needs to be confirmed whether they will be visited by the AUV. At first, all sensors are in $S_C$, and then HAS4 determines which sensors fall into $S_A$ and which fall into $S_U$. During this process, HAS4 greedily moves sensors with less energy into $S_A$. This is because acoustic transmission consumes more energy than acoustic reception and MI communication. Selecting these sensors into $S_A$ lets them spend less energy in this round of data gathering, which balances the network energy consumption and prolongs the network lifetime.
To minimize the AUV cost, the mobile AUV should visit as few sensors as possible. To minimize the network energy consumption, the network should reduce acoustic communication, which means that the AUV should visit as many sensors as it can. These two objectives clearly conflict with each other. In HAS4, we define a variable to make a trade-off between them. We provide both a centralized and a distributed version of HAS4. Centralized HAS4 can leverage all sensors' information about residual energy, data sensing rate, channel capacity, etc. to produce a sub-optimal solution for the problem in Section 3.2. Although the surface station or the AUV could be the centralized decision maker, the long propagation delay of acoustic communication and the slow speed of the AUV mean that centralized HAS4 may not be scalable to large underwater networks. Distributed HAS4 uses only the information from the sensor itself and its one-hop neighbours to decide which set the sensor belongs to. Thus, it is practical for both small and large networks.

Centralized HAS4
In centralized HAS4, the trade-off factor $l$ falls in $[0, 1]$, and $1 - l$ is the proportion of $S_U$ candidates among all sensors, which means that at least $n \times l$ sensors will be visited by the mobile AUV. The whole procedure is shown in Algorithm 1.
Firstly, HAS4 estimates the lifetime of each sensor in order to sort the sensors heuristically. The sensor lifetime is defined as the residual energy divided by the energy consumption in this period. Energy is consumed on data sensing and transmission, and the sensing consumption is almost constant for the preassigned task. Since the energy spent on transmission varies between modes and acoustic transmission is much costlier than MI, without loss of generality, we use the acoustic transmission cost as the communication energy consumption. In addition, because the data flow is not yet known at this point, we average all possible values to obtain the consumption. After sorting the sensors, a lifetime threshold is set to the lifetime of the $nl$-th sensor, and sensors with smaller lifetimes cannot be chosen into $S_U$.
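The lifetime estimate and the threshold can be sketched as follows. This is a minimal illustration under assumptions that the text leaves open: the per-round consumption is taken as sensing cost plus an averaged acoustic transmission cost, and the index $nl$ is rounded up and clamped to at least 1. The function names and sample numbers are hypothetical.

```python
import math

def estimate_lifetimes(residual, sensing_cost, acoustic_cost):
    """Lifetime in rounds: residual energy / estimated per-round consumption."""
    return [e / (p0 + pa)
            for e, p0, pa in zip(residual, sensing_cost, acoustic_cost)]

def lifetime_threshold(lifetimes, l):
    """Lifetime of the (n*l)-th sensor in ascending order.

    Assumption: n*l is rounded up and is at least 1.
    """
    k = max(1, math.ceil(len(lifetimes) * l))
    return sorted(lifetimes)[k - 1]

# Hypothetical energies (arbitrary units)
lifetimes = estimate_lifetimes(
    residual=[100, 50, 80, 20],
    sensing_cost=[1, 1, 1, 1],
    acoustic_cost=[4, 4, 4, 4],
)
threshold = lifetime_threshold(lifetimes, l=0.5)
```

With these numbers the lifetimes are 20, 10, 16 and 4 rounds, and the threshold for $l = 0.5$ is 10 rounds, so the sensor with a lifetime of 4 rounds could not become an $S_U$ sensor.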
Secondly, HAS4 greedily moves the minimum-lifetime sensor from $S_C$ into $S_A$, and then adds all its neighbours to $U$. We denote by $U$ the set of tuples that pair each candidate sensor of $S_U$ with its upper-level sensor. The upper-level sensor is the sensor to which the sensing data would be transmitted.
Thirdly, HAS4 heuristically chooses the sensor with the largest lifetime from $U$ and verifies whether it can be added to $S_U$. In the verification, HAS4 compares its lifetime with the threshold and checks all constraints in Section 3.2. If the sensor satisfies all conditions, HAS4 adds it to $S_U$, updates the channel indicators and adds its neighbours in $S_C$ to $U$. Otherwise, it removes this sensor from $U$. HAS4 repeats this process until $U$ is empty and then returns to the second step.
When $S_C$ is empty, a short tour among all $S_A$ sensors can be obtained by approximately solving a TSP.
Since the outer while loop executes at most $n$ times, where $n$ is the number of nodes, and the inner while loop executes at most $|U|$ times, centralized HAS4 is an $O(mn)$ time algorithm, where $m$ is the number of neighbours of the node with the most neighbours.
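The text does not say which TSP approximation is used, so the following nearest-neighbour heuristic is only one possible stand-in, shown as a sketch. The function name, the choice of the surface station as starting point and the sample coordinates are assumptions.

```python
import math

def nearest_neighbor_tour(start, points):
    """Visit order produced by always moving to the closest unvisited point.

    A simple TSP heuristic; it gives a feasible tour, not an optimal one.
    """
    tour, current, remaining = [], start, list(points)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    return tour

# Hypothetical surface station and S_A sensor positions (metres)
station = (0.0, 0.0, 0.0)
sinks = [(5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
order = nearest_neighbor_tour(station, sinks)
```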

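Algorithm 1. Centralized HAS4.
Input: An underwater sensor network; a trade-off factor $l$ between delay and lifetime; the residual energy $e_i$ of each sensor.
Output: The set $S_A$ of sensors that will be visited by the AUV; the data forwarding links; a traveling tour.
Add all sensors to $S_C$;
Sort $S_C$ in ascending order of lifetime (computed from $e_i$, $p_i^{out}$ and $p_i^{out}(mi)$);
Define the threshold lifetime $T_l$ to be the lifetime of the $nl$-th sensor in $S_C$;
while $S_C$ is not empty do
    $u \leftarrow S_C(1)$;
    Move $u$ into $S_A$ and add its neighbours in $S_C$ to $U$;
    while $U$ is not empty do
        Take the sensor with the largest lifetime from $U$;
        if its lifetime is not below $T_l$ and all constraints in Section 3.2 hold then add it to $S_U$, update the channel indicators and add its neighbours in $S_C$ to $U$;
        otherwise remove it from $U$;
    end
end
Find an approximate tour among $S_A$;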
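The selection loop of centralized HAS4 can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: lifetimes and the threshold are supplied as inputs, and the channel, capacity, buffer and energy constraints of Section 3.2 are hidden behind a hypothetical `feasible` callback that accepts everything by default.

```python
def centralized_has4(lifetimes, neighbors, threshold,
                     feasible=lambda node, upper: True):
    """Return (S_A, parent) where parent maps each S_U sensor to its upper-level sensor."""
    undecided = set(lifetimes)      # S_C
    visited, parent = set(), {}     # S_A and the forwarding links of S_U sensors
    while undecided:
        # Second step: the minimum-lifetime sensor becomes an S_A sensor.
        u = min(undecided, key=lambda s: (lifetimes[s], s))
        undecided.remove(u)
        visited.add(u)
        candidates = {v: u for v in neighbors[u] if v in undecided}  # U
        # Third step: try the largest-lifetime candidate first.
        while candidates:
            v = max(candidates, key=lambda s: (lifetimes[s], s))
            upper = candidates.pop(v)
            if lifetimes[v] >= threshold and feasible(v, upper):
                undecided.remove(v)
                parent[v] = upper
                for w in neighbors[v]:
                    if w in undecided and w not in candidates:
                        candidates[w] = v
            # A rejected sensor stays in S_C and may become an S_A sensor later.
    return visited, parent

# Hypothetical chain topology 0 - 1 - 2 - 3 with lifetimes in rounds
lifetimes = {0: 5.0, 1: 20.0, 2: 30.0, 3: 8.0}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
visited, parent = centralized_has4(lifetimes, neighbors, threshold=10.0)
```

In this example the two short-lifetime sensors 0 and 3 end up in $S_A$, while sensors 1 and 2 forward their data toward sensor 0.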

Distributed HAS4
Distributed HAS4 is a multi-round procedure. Each round consists of the following stages: self-elect inquiry, self-elect affirm, root select, root affirm, channel-use broadcast and conflict handler.
Sensors serving different roles participate in different stages and behave differently in each stage. Moreover, in distributed HAS4, the trade-off factor $l$ indicates the maximum acoustic path length from any $S_U$ sensor to its $S_A$ sensor.
In distributed HAS4, each $S_C$ node changes itself into an $S_A$ node with probability 1/lifetime. Once there are some $S_A$ nodes, the relationship between $S_A$ and $S_C$ nodes becomes a matching problem. Each $S_C$ node uses its distance to each $S_A$ node and the lifetime of each $S_A$ node as the preference for choosing a root node. Each $S_A$ node accepts some $S_C$ nodes with the aim of maximizing its group. Once an $S_C$ node joins a group, it becomes an $S_U$ node. When there are no $S_C$ nodes left, this round of the matching problem is solved.
The execution of distributed HAS4 on $S_C$ nodes is shown in Algorithm 2. In the self-elect inquiry stage, each $S_C$ node temporarily changes itself into an $S_A$ node with probability 1/lifetime and, if it does so, broadcasts a self-elect-inquiry message to its neighbours. If it does not receive any self-elect-inquiry message from its neighbours, or its lifetime is less than that of all its temporary $S_A$ neighbours, it successfully becomes an $S_A$ node and broadcasts a self-elect-affirm message. Otherwise, it changes back to $S_C$. Once an $S_C$ node receives a self-elect-affirm message, it adds the sender to its root candidate set $U$. In the root select stage, an $S_C$ node chooses the maximum-lifetime node from $U$ as its root and broadcasts a root-inquire message. If an $S_C$ node receives a positive message in the root affirm stage, it temporarily becomes an $S_U$ node and broadcasts a channel-occupation message based on the positive message. Once an $S_C$ node receives a channel-occupation message, it updates its available channels. If a temporary $S_U$ node does not receive any channel-occupation message, or it has a longer lifetime or is closer to the root, it successfully becomes an $S_U$ node. Otherwise, it changes back to $S_C$ and broadcasts a channel-collision message.
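The self-elect inquiry and self-elect affirm stages can be simulated centrally as a sketch. The function below is an illustration only: message exchange is replaced by set operations, the election probability is capped at 1, and the names `self_elect`, `lifetimes` and `neighbors` are assumptions rather than identifiers from the paper.

```python
import random

def self_elect(lifetimes, neighbors, rng):
    """Return the S_C nodes that confirm themselves as S_A nodes in one round."""
    # Self-elect inquiry: each node volunteers with probability 1/lifetime.
    tentative = {v for v in lifetimes
                 if rng.random() < min(1.0, 1.0 / lifetimes[v])}
    confirmed = set()
    for v in tentative:
        rivals = neighbors[v] & tentative
        # Self-elect affirm: no rival heard, or v has the smallest lifetime among them.
        if all(lifetimes[v] < lifetimes[r] for r in rivals):
            confirmed.add(v)
    return confirmed

# Hypothetical chain 0 - 1 - 2; lifetimes <= 1 make every node volunteer
lifetimes = {0: 1.0, 1: 0.5, 2: 0.8}
neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
elected = self_elect(lifetimes, neighbors, random.Random(0))
```

Because a node only confirms when its lifetime is strictly smaller than that of every volunteering neighbour, two adjacent nodes never both confirm in the same round.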
Algorithm 2. Distributed HAS4 on an $S_C$ node $v$.
Change $v$ to $S_A$ with probability 1/lifetime;
if changed to $S_A$ then
    broadcast a self-elect-inquiry message;
end
Receiving self-elect-affirm messages;
On triggering timer 1;
$U \leftarrow U \cup \{M_i\}$;
Sort $U$ in descending order;
Choose the first node $u$ from $U$ as root;
On triggering timer 2

The execution of distributed HAS4 on $S_A$ nodes is shown in Algorithm 3. Each $S_A$ node receives root-inquire messages, takes some appropriate $S_C$ nodes into its group according to all the constraints in Section 3.2, and broadcasts this decision. Once an $S_A$ node receives a channel-occupation message or a channel-collision message, it updates the available channels. The execution of distributed HAS4 on $S_U$ nodes is shown in Algorithm 4. In distributed HAS4, $S_U$ nodes are in charge of forwarding the messages between $S_A$ and $S_C$ nodes.
In the root select, root affirm and conflict handler stages, messages from $S_C$ ($S_A$) sensors must be forwarded hop-by-hop to $S_A$ ($S_C$) sensors, while, in the other stages, sensors just broadcast messages independently. Thus, the root select, root affirm and conflict handler stages have the same duration, which depends on the maximum hop count in the network and equals $l$ time slots. The self-elect inquiry, self-elect affirm and channel-use broadcast stages are much shorter and last one time slot each. Thus, there are $3l + 3$ time slots in each round of distributed HAS4.
Since the messages transmitted in a distributed algorithm are not known in advance, the transmission time within each time slot should be chosen randomly in distributed HAS4 to reduce the collision probability. The execution of distributed HAS4 exits on a sensor when neither the sensor itself nor any of its neighbors is in $S_C$.
In the worst-case execution of distributed HAS4, there is only one $S_A$ node and all other nodes join its group sequentially. The $S_A$ node then has to answer $n - 1$ messages, and the number of messages for each of the remaining nodes does not exceed $2n$. Thus, distributed HAS4 is an $O(n^2)$ message algorithm.
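As a short worked example of the slot count, with an illustrative maximum path length of $l = 4$ hops (a value chosen here only for the arithmetic):

```latex
\begin{align*}
T_{\text{round}} &= \underbrace{3 \cdot l}_{\text{root select, root affirm, conflict handler}}
                  + \underbrace{3 \cdot 1}_{\text{self-elect inquiry, self-elect affirm, channel-use broadcast}}
                  = 3l + 3, \\
T_{\text{round}}\big|_{l=4} &= 3 \cdot 4 + 3 = 15 \ \text{time slots}.
\end{align*}
```

A smaller $l$ therefore shortens each round, at the price of more $S_A$ nodes for the AUV to visit.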

Simulation Setting
In this simulation, sensors are randomly deployed in a 10 km × 10 km × 1 km area, and the results are averaged over 1000 runs. The initial energy of each sensor is 40 kW·h. The transmission power of acoustic communication is 190 dB re 1 µPa (approximately 6.67 W). Since MI communication can be powered by the AUV, we do not take MI communication energy into account. Data are gathered once every 10 h. In addition, the data sensing rate follows the normal distribution $N(10, 2)$ kbit/s. The benchmark in this simulation statically reuses the result of the first round of HAS4's execution.
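To give a feeling for the scale of these settings, the sketch below generates one random deployment and works out two derived quantities. It is an illustration under assumptions: positions are drawn uniformly, $N(10, 2)$ is read as mean 10 kbit/s with standard deviation 2 kbit/s, and the helper names are hypothetical.

```python
import random

AREA_M = (10_000.0, 10_000.0, 1_000.0)   # 10 km x 10 km x 1 km, in metres
PERIOD_S = 10 * 3600                      # one gathering round (10 h) in seconds

def deploy(n, rng):
    """Uniformly random sensor positions inside the deployment volume (assumed)."""
    return [tuple(rng.uniform(0.0, side) for side in AREA_M) for _ in range(n)]

def round_data_bits(rng):
    """Data sensed by one sensor in one round, in bits.

    Assumption: the second parameter of N(10, 2) is a standard deviation.
    """
    rate_bps = max(0.0, rng.gauss(10.0, 2.0)) * 1000.0
    return rate_bps * PERIOD_S

rng = random.Random(1)
positions = deploy(50, rng)

mean_bits = 10_000 * PERIOD_S          # data per round at the mean rate
budget_j = 40 * 3.6e6                  # 40 kW.h expressed in joules
tx_hours = budget_j / 6.67 / 3600      # hours of continuous acoustic transmission
```

At the mean rate a sensor accumulates 360 Mbit per round, and the initial energy corresponds to roughly 6000 hours of continuous acoustic transmission at 6.67 W.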

Results and Analysis
As shown in Figure 5, the network lifetime decreases as the network size increases. The reason is that, in a large network, relay nodes forward much more data than in a small network and therefore consume more energy on data forwarding. We can also observe that the centralized algorithms achieve better performance than the distributed algorithms. This is because centralized algorithms can leverage all sensors' information to produce a sub-optimal solution. Due to the limited channel capacity, more $S_A$ sensors are needed in a larger network. Thus, as shown in Figure 6, the AUV travels farther in a larger network than in a small one.
In centralized HAS4, the network lifetime grows with the trade-off factor. This is because the larger the trade-off factor is, the fewer $S_U$ nodes exist, which reduces the amount of acoustic data communication. In distributed HAS4, the trade-off factor is the longest source-to-end path. Thus, with a small trade-off factor, there are more $S_A$ nodes, which reduces the amount of acoustic communication.
As designed in centralized HAS4, the trade-off factor sets the minimum proportion of $S_A$ nodes: the larger the factor is, the more $S_A$ nodes there will be. Thus, the AUV tour in Figure 9 grows with the trade-off factor because the number of $S_A$ nodes increases correspondingly, which lengthens the average AUV tour. However, the threshold of distributed HAS4 is designed as the maximum acoustic path from any $S_U$ sensor to an $S_A$ sensor. As a result, as shown in Figure 10, the number of $S_A$ nodes decreases as the threshold increases.

Conclusions
In this paper, we solve the underwater data gathering problem in an efficient scenario in which an AUV visits a subset of the sensors and collects data via high-speed MI communication, while the remaining nodes acoustically route their data to the ones that will be visited by the AUV. We propose both distributed and centralized versions of the HAS4 algorithm to select sink sensor sets in such AUV-aid data gathering applications. By setting the trade-off factor, HAS4 can achieve both energy and AUV efficiency.
In data gathering scenarios, the cooperation among multiple AUVs and the partition of the gathering task are much more challenging in large-scale UWSNs, where acoustic communication suffers from high delay, unreliability and low bandwidth. Thus, in the future, we will further examine underwater data collection, including multi-AUV cooperation, task partition and online path planning.

Conflicts of Interest:
The authors declare no conflict of interest.