Data Uploading Strategy for Underwater Wireless Sensor Networks

Underwater wireless sensor networks (UWSNs) have become a popular research topic due to the challenges of underwater communication. The existing mechanisms for collecting data from UWSNs focus on reducing the data redundancy and communication energy consumption, while ignoring the problem of energy-saving transmission after compression. In order to improve the efficiency of data collection, we propose a data uploading decision-making strategy based on the high similarity of the collected data and the energy consumption of the high similarity data compression. This decision-making strategy efficiently optimizes the energy consumption of the networks. By analyzing the data similarity, the quality of network communication, and uploading energy consumption, the decision-making strategy provides an energy-efficient data upload strategy for underwater nodes, which reduces the energy consumption in various network settings. The simulation results show that compared with several existing data compression and uploading methods, the proposed data upload methods has better energy saving effect in different network scenarios.


Figure 1. Underwater wireless sensor networks (UWSNs).
For large-scale underwater monitoring applications, nodes in UWSNs forward the collected data to the sink node through multi-hop routing. In the actual application environment, many raw data have spatial-temporal redundancy [2]. Therefore, the redundant network processing can reduce the energy consumption of network communication by minimizing the amount of data transmitted, and on the premise of preserving the information required by the application. In the energy-optimized protocol design of UWSNs, data compression is an important aspect of data processing within the network.
Compression algorithms applied to wireless sensor networks have put forward the basic requirements in terms of complexity and compression effect. In the article [3], compression precision and compression ratio are used as the performance indicators to evaluate the compression effect of a data compression algorithm based on data correlation applied in the wireless sensor network (WSN), including the two-step data compression algorithm based on sequence correlation (TSC-SC) [4], adaptive multiple-modality data compression algorithm based on lifting wavelet and adaptive polynomial fitting (AMLP) [5], and data compression algorithm based on piece-wise linear regression (PWLR) [6]. The computational complexity of these algorithms be satisfied by the UWSNs node hardware configuration can satisfy the computational complexity of these algorithms. These algorithms are all based on a time-domain compression algorithm, and the compressed data is sent after the accumulation of local compressed data. The compression accuracy is adjustable. These features ensure algorithm is suitable for the most sensor network applications, including UWSNs. As shown in the simulation results [3], under the same compression accuracy requirements, the AMLP algorithm produces a lower compression ratio, followed by TSC-SC, while the PWLR algorithm produces the highest compression ratio. To further analyze the time complexity of each algorithm, we used ATMEL AVR Studio 4 [7] to compile the algorithms to get the time cost of each algorithm, as shown in Figure 2.
Considering the data collected by a single node, the PWLR has significantly less compression time than other algorithms due to fewer execution steps and computations. However, its compression ratio is significantly lower than the AWLP, which is based on the lifting wavelet and the polynomial fitting. Shuang et al. [4] analyzed the compression energy consumption of the TSC-SC algorithm based on the polynomial fitting. When the fitting number is three, the algorithm shows good performance and the communication energy consumption is further reduced. However, its energy-saving effect is unsatisfactory, which means the computational energy consumption of the algorithm cannot be ignored. The communication energy consumption saved by data compression may not compensate for the additional computational energy consumption introduced, resulting in an increase in total energy consumption. For large-scale underwater monitoring applications, nodes in UWSNs forward the collected data to the sink node through multi-hop routing. In the actual application environment, many raw data have spatial-temporal redundancy [2]. Therefore, the redundant network processing can reduce the energy consumption of network communication by minimizing the amount of data transmitted, and on the premise of preserving the information required by the application. In the energy-optimized protocol design of UWSNs, data compression is an important aspect of data processing within the network.
Compression algorithms applied to wireless sensor networks have put forward the basic requirements in terms of complexity and compression effect. In the article [3], compression precision and compression ratio are used as the performance indicators to evaluate the compression effect of a data compression algorithm based on data correlation applied in the wireless sensor network (WSN), including the two-step data compression algorithm based on sequence correlation (TSC-SC) [4], adaptive multiple-modality data compression algorithm based on lifting wavelet and adaptive polynomial fitting (AMLP) [5], and data compression algorithm based on piece-wise linear regression (PWLR) [6]. The computational complexity of these algorithms be satisfied by the UWSNs node hardware configuration can satisfy the computational complexity of these algorithms. These algorithms are all based on a time-domain compression algorithm, and the compressed data is sent after the accumulation of local compressed data. The compression accuracy is adjustable. These features ensure algorithm is suitable for the most sensor network applications, including UWSNs. As shown in the simulation results [3], under the same compression accuracy requirements, the AMLP algorithm produces a lower compression ratio, followed by TSC-SC, while the PWLR algorithm produces the highest compression ratio. To further analyze the time complexity of each algorithm, we used ATMEL AVR Studio 4 [7] to compile the algorithms to get the time cost of each algorithm, as shown in Figure 2. As shown in Table 1, the underwater nodes are not only limited in energy, but also the computing power. The time it takes for a node to send single-byte data is about 0.8-8 ms [1]; When calculating a more complex implementation of the compression algorithm, the millisecond-level single-byte compression time makes the computational energy consumption close to or even more than the communication energy consumption of sending the raw data. Therefore, performing a more  Considering the data collected by a single node, the PWLR has significantly less compression time than other algorithms due to fewer execution steps and computations. However, its compression ratio is significantly lower than the AWLP, which is based on the lifting wavelet and the polynomial fitting. Shuang et al. [4] analyzed the compression energy consumption of the TSC-SC algorithm based on the polynomial fitting. When the fitting number is three, the algorithm shows good performance and the communication energy consumption is further reduced. However, its energy-saving effect is unsatisfactory, which means the computational energy consumption of the algorithm cannot be ignored. The communication energy consumption saved by data compression may not compensate for the additional computational energy consumption introduced, resulting in an increase in total energy consumption.
As shown in Table 1, the underwater nodes are not only limited in energy, but also the computing power. The time it takes for a node to send single-byte data is about 0.8-8 ms [1]; When calculating a more complex implementation of the compression algorithm, the millisecond-level single-byte compression time makes the computational energy consumption close to or even more than the communication energy consumption of sending the raw data. Therefore, performing a more advanced compression algorithm requires a lot of time, which may worsen the energy saving effect of compression algorithm. As shown in Table 1, compared with the milliwatt-level transmission power of terrestrial sensor nodes [8], the power consumption of UWSNs sensor nodes is very high [1], although UWSNs nodes can adjust their transmission power like terrestrial sensor nodes. For short-range communication (<1000 m), the transmission power ensures that the data can be received as low as 5 W [1], which further reduces the communication energy consumption of nodes, whereas the computing power of nodes is up to 0.8 W. In this case, the length of the compression execution time further reduces the energy saving effect.
Based on the above analysis, we conclude that the existing compression algorithm may not save energy in underwater applications, and only by continuously optimizing the compression algorithm can the probability of this problem be reduced. Energy optimization can be achieved through two aspects of data compression: reducing computational complexity [2] or reducing transmission power to lower the computational energy consumption occurring in local nodes. We need to consider the communication environment of the nodes to reduce the communication energy consumption of the link. Therefore, from the perspective of the whole network, applying the same algorithm to different nodes (which located at different positions), may result in different energy savings. Most of the existing energy optimization mechanisms adopt the unified compression algorithm for the whole network. It is only for specific applications and lacks adaptive ability.
For the accuracy requirements of various compression algorithms for different data applications and the different energy saving effects of network settings, such as network communication environment, and node location, etc. We propose a data upload decision-making mechanism to solve the potential compression energy saving problems. When the sensor nodes (such as the source nodes) need to send monitoring data to the sink nodes, a compression decision-making mechanism will introduced into the system under the premise of ensuring the application requirements. Nodes can choose the optimal compression strategy from the pool of compression algorithms through the decision-making mechanism to ensure the nodes transmit data packets with near-optimal energy loss and improve energy optimization.
The rest of this paper is organized as follows: In Section 2, we will briefly introduce the related research on energy optimization. In Section 3 we will show our novel data uploading strategy. Then, we proposed an evaluation of the performance of our strategy in Section 4. Finally, we present our conclusions in Section 5.

Data Reduction Strategies in UWSNs
In the context of continuous monitoring, most of the data change slowly, resulting in a large amount of data redundancy in spatial and temporal. As a result, in order to limit energy consumption, reducing the amount of data without affecting data quality has become one of the most popular strategies for using data aggregation, data compression, and combinations in UWSNs [9]. Compared with terrestrial WSN, the deployment of sensor nodes in UWSNs is sparser and the typical method of reducing UWSNs data is to use sparse sampling. Compressed sensing (CS) has been thoroughly studied [9], which allows sparse analog signals to be represented by fewer samples than the Nyquist sampling theorem [10] and is essential for reducing the samples in information acquisition and sparse signal recovery [11]. CS has been adopted as the representative data reduction technique in UWSNs. Lin et al. [12] proposed an energy-efficient compression framework for UWSNs; in this framework, compressed sensing theory is applied to reduce the number of sampling nodes. Wang et al. [13] applied a cluster-based method to exploit spatial and temporal correlations for data aggregation, in which the data collected by the sensors were transmitted to the data fusion center through a hierarchy of cluster heads that used wavelet decomposition to build the wavelet coefficients of the data. Jing et al. [14] used a dynamic programming approach to decide the energy allocation ratio. Gonglaing et al. [15] proposed interleave division multiple-access based compressed sensing for information acquisition with sensor networks to ensure the data transmission reliability of information map reconstruction. Some optimization strategies were used to improve the estimation accuracy of the CS framework [16]. The optimization of the dictionary requires training samples constructed by the raw signal, but the requirement limits the recovery of the compressed signal, because the recorded raw signal recorded is unknown at the receiving terminal.
However, to date, the CS methods are mainly applied to data aggregation and processing data compression at the sink node; yet, in many cases, most of the CS methods demand accurate time synchronization and frequent network communication, resulting in huge energy overhead. This often requires powerful sensor node communication and computation capabilities, which is inefficient and impractical [12] for underwater sensor nodes, as these conditions are hard to be satisfy in common UWSNs. CS cannot be directly used for compressing raw acoustic communication data since the data are non-sparse in the temporal domain [16]. Although some works have been published about the application of CS in underwater sensor networks, few studies have focused on the problem of energy-saving transmission after compression. For the strictly computation/communication/energy limitation, the energy consumption occurring in all sensor nodes should be considered, rather than just the sink node.
Theoretically, most data reduction strategies that can be processed in the terrestrial WSNs could also be applied to UWSNs if the computation complexity satisfies the hardware conditions of UWSNs sensor node. Next, we introduce the research on the data reduction strategies in WSNs, especially the database management techniques. Finally, we introduce three data compression algorithms, which are widely applied in WSNs, including those based on temporal correlation, spatial correlation, or spatial-temporal correlation.

Data Reduction Strategies in WSNs
Given the great limited sensor resources and large amount of sensed data, data management and transmission becomes more difficult, it is mandatory to introduce ways to reduce the volume of data to be transmitted. As largely detailed in [17], data reduction strategies in WSNs can generally divided into three classes: local processing algorithms [18], distributed source coding [19] and in-network data storage and query processing [20]. In-network Data Storage and Query Processing, with data being processed and stored on local node, only the end result of data analysis needs to be sent to the sink, sensor nodes in this scenario essentially form a distributed database [20,21]. In order to manage large amount of sensed data in an energy-efficient way, a database-oriented approach [20] of WSNs has proven to be useful. Bonnet et al. [22] have considered the WSNs as a relational database, in which sensor nodes represent a virtual table, having the capacity to generate continuous data streams, and which can be queried using SQL-Like queries. [23] introduced a game theoretic reward-based mechanism to balance the work load among network nodes, in which the database-oriented approach was applied to gain credit scores for nodes' relaying service, however, since energy is highly wasted due to cooperation for data forwarding, this is inefficient and impractical for underwater sensor nodes. Haifeng et al. [24] proposed a distributed similarity search approach for retrieving the similar high dimensional sensed data in WSNs, in which a distributed LSH-based model and a distributed approximate similarity search algorithm that can be all classed inside the category of database-oriented approach were applied to compute the similarity score in the cluster head of WSNs. Note that the study design also includes how to use the large image datasets to simulate the large-scale WSNs scenario, this is inefficient and impractical for underwater scenario in which the multimedia data are hard to be gathered and processed.
Since UWSNs are application-specific and the choice of the appropriate compression methods depends on processing capability of sensor nodes and the network communication condition, the design of a distributed database-oriented in-network processing [20] mechanism adaptable to various UWSNs conditions and data similarity is necessary.

Temporal Domain Compression Algorithms
The temporal domain compression algorithms are typical compression algorithms based on the temporal correlation of the data. The temporal correlation indicates that the difference is small in the data that are collected by a single node over a long period. Typical temporal domain compression algorithms include the piecewise approximation algorithm [25], the predictive coding algorithm [26], and wavelet transformation [27]. The basic principle of piecewise approximation is to sequentially read the average values of the data. If the difference between the average value of the current data and that of the next sampling data exceeds a given threshold, the average value and duration of the segment are output. The prediction coding algorithm uses a certain mathematical model to obtain the predicted value of the next sampling point. If the error between the predicted value and the true value is within the predefined range, the predicted value can be substituted for the true value. Exemplary prediction coding algorithms include the autoregressive prediction algorithm and the moving average prediction algorithm. A compression technique based on curve simulation (CODST) algorithm [28] applies the regression model and curve fitting technology to realize spatial-temporal data compression. A piecewise linear approximation algorithm was proposed [29] that uses a straight line to approximate the data points that have appeared but have not been compressed under a given error bound until a new data point breaks through the bound. Similarly, starting from this new data point, new lines are used to approximate subsequent arrivals. The enabling approximate querying (EAQ) algorithm was proposed [30] that first converts the original time series into a special time series description multi-version array (MVA). With this MVA prefix, the approximate version of the original time series with certain errors can be recovered. With increasing prefix, the error decreases gradually. The time series approximation with variable error is realized. Time correlation algorithms based on time series include the discrete Fourier transform [31] and discrete wavelet transform [32].

Spatial Compression Algorithm
The spatial compression algorithms are based on the spatial correlation of the data. The spatial correlation reflects the similarity of the nodes with similar geographical locations at the same time. Compared with the temporal domain compression algorithm, the spatial domain compression algorithm is more complex. The cooperation between nodes should also be considered, which means that more data communication between nodes is needed. Typical spatial compression algorithms include distributed source coding DSC [33] and the Huffman algorithm [34]. Kolo et al. [33] considered data compression and routing together to eliminate the redundancy of data space as much as possible.

Spatial-Temporal Domain Compression Algorithm
This kind of algorithm is a combination of the temporal domain compression algorithm and the spatial domain compression algorithm. They are designed for situations in which both time and spatial correlations exist in the network data. Typical algorithms include the wavelet algorithm [35,36] and the compressed sensing algorithm [37]. The spatial-temporal wavelet transform algorithm removes the redundancy of the raw data using wavelet transform on the rows and columns of the abstract raw data matrix and achieves better compression performance with lower operational costs. An adaptive optimal zero-elimination compression algorithm was proposed that uses the spatial-temporal correlation of data to remove redundancy [38]. Based on compressed sensing theory, a data collection method with a long life cycle was designed to achieve data compression [39]. The data matrix was obtained by matching the functional relationship between the data in the dynamic time warping (DTW) path and the asynchronous data, and then compressing the data using the correlation between the data [35]. The adaptive multi-mode data compression algorithm based on wavelet transform [36] does not consider the effects of energy consumption and storage capacity of nodes, although multi-dimensional correlation is considered.
From the above analysis, most of the traditional compression algorithms are hard to directly apply to UWSNs because the existing compression algorithms used in sensor networks tend to focus on the compression performance and transmission energy consumption of the algorithms, but neglect the computational energy consumption and network performance. This poses a barrier given the limited computation and communication capacity of UWSNs nodes. Most of the existing energy optimization mechanisms adopt the unified compression algorithm for the whole network for specific applications and lack adaptive ability.
Compression algorithms vary widely, and it manages the different data types. We wanted to choose an algorithm with low complexity and good compression performance. However, the complexity and compression performance of the algorithm is compromised in design. Most of the algorithms with better compression performance are highly complex. Lower-degree algorithms usually only produce general compression performance. At present, UWSNs data acquisition mostly involves text data. Based on the redundancy characteristics of UWSNs data acquisition and the limited computing, storage, and communication capabilities of sensor nodes. Using the prediction coding, the piecewise approximation algorithm, and the wavelet transform algorithm can mine the redundancy of data time dependence, and satisfied the computation complexity. Therefore, these algorithms can be considered as an alternative compression algorithms for UWSNs.

Data Uploading Strategy
In this section, we introduce the intelligent network data uploading decision-making mechanism for application to UWSNs gateways. Based on the similarity between the data to be uploaded and the data uploaded in the previous round, the mechanism first makes a decision whether to upload the data or not. If the decision-making results need to be uploaded, the total energy that is needed by the sensor nodes to upload the data after compression needs to be further analyzed, and then the optimal upload execution strategy is selected. This strategy performs the compression operation and then directly uploads the compressed data or raw data. The goal of the proposed data upload decision-making mechanism is to ensure that the data that are uploaded in different UWSNs settings can maximize their energy savings, thereby improving the efficiency of the UWSNs energy consumption and prolonging the network's lifetime. Data upload decisions include not uploading data, running a compression algorithm and then uploading compressed data, or uploading raw data without compression.

Relevant Concepts
In this section, we define the relevant concepts before a detailed description of our algorithm.

Compression Ratio (CR)
The CR is defined as the compressed data size over the raw data size: where D c and D o are the size of compressed data and the raw data, respectively. The smaller the compression ratio, the better the compression performance.

Data Similarity
Similarity [40,41] is used to describe and compare the similarity of data objects, and the value must be between 0 and 1. The higher the similarity of the objects, the larger the value. To represent the overlapping relationship between two N-dimensional datasets, the concept of data similarity is introduced to describe the degree of association between the two datasets. We calculated and compared the similarity of the current data to be uploaded and the previous data. The lower the value, the larger the discrepancy. The compression accuracy, which is set by all selected compression algorithms, is related to the compressed data accuracy. Then, a higher compression accuracy can be set to obtain accurate current data that have larger discrepancies from the uploaded data. To compute the energy consumption that results from the compression computation, we set the data similarity to be the compression accuracy of the selected compression algorithm, and finally we obtained the CR through the calculation.
The data similarity in our algorithm is only related to the data value, including the intersecting dimensionality between data sets and the corresponding overlapping proportion in each dimension. Its characteristics are as follows: (1) When the overlapping ratio in each dimension remains unchanged, the greater the intersecting dimensionality, the higher the similarity. (2) When the intersecting dimensionality between data sets remains unchanged, the larger the overlapping ratio in each dimension, the higher the similarity.
With the discussion above, we define the similarity of two data sets in dimension i as: where y 1 and y i2 denote the thresholds of the unions of the i-th attribute in the two data sets, which indicate the size of the union proportion; x i1 and x i2 denote the thresholds of the intersection of the ith attribute in the two data sets, which indicate the size of the intersecting proportion.
To compare the similarity between the two datasets and eliminate the influence of dimensions, dos i is normalized as: where n denotes the total number of dimensions. After dos i is derived, we define the similarity of the two data sets as the average of the similarities in all dimensions:

Joint Power Control and Rate Adaptation Algorithm
In this section, we propose an algorithm that combines the power control and the rate adjustment. The mechanism is based on the channel gain information and allows as much parallel transmission as possible by using a game theory-based Nash equilibrium solution [42]. To effectively protect marine animals, the mechanism introduces a dual-mode access control strategy based on the location results [43]. When sharing the spectrum with the marine biological system, excessive transmission power is avoided. Through the above power control algorithm, by implementing the power control scheme, we allow for as many concurrent transmissions as possible. However, the Signal to Interference plus Noise Ratio (SINR) at the receiver side changes significantly after the transmission power is adjusted. To manage the signal variations, the transmission rate should be tuned accordingly. In other words, modems with a higher SINR can choose a more aggressive mode with a higher data rate; otherwise, modems can choose a more conservative mode with a lower data rate to achieve lower bit error rate (BER) performance. Finally, to ensure a certain bit error rate requirement, we increase the transmission rate to reduce the network's energy consumption as much as possible. This mechanism provides the optimal transmission power and the transmission rate for the data upload decision-making mechanism in this paper. The details are provided below.

Power Control Algorithm
Considering the presence of marine animals or other acoustic systems around the current network, the sending node avoids impacting on these systems when calculating the transmit power. Therefore, the algorithm adopts a dual-mode power control mechanism that includes a sharing mode and an exclusive mode.

Sharing Mode
If the network has detected some marine animals, the sensor nodes in the network switch to the sharing mode to control their transmission power. In this mode, when a node request sending a packet, its transmission power should not be higher than the threshold that can affect the behavior of marine animals. In this method, we adopt the behavioral interruption threshold (160 dB re µPa) as the threshold. In networks, each node is considered selfish but rational, which means that they want to maximize their own interests. Therefore, the transmission power allocation problem can be treated as a game that can be solved by using the game theory. To fulfill the goals, a utility function that contains both a utility term and a pricing term is defined: where p i is the transmission power on the link i, h i is the channel gain in channel i, p −i is the transmission power on other links except link i, N is the total number of the UWSNs links, α i (bit/watt) is the bit price for one unit of transmission power, and σ 2 denotes the noise power (variance). We assumed that every node shares the same bandwidth B. Based on this assumption, B was omitted when we constructed the utility function. The first part of the utility function denoting the relationship between the level of sending power and the corresponding link's capacity is based on Shannon theory [44]. The second part of the utility function reflects the price of consuming a certain amount of power. Finally, the power allocation problem and the substitution are formulated, as shown in Equations (6) and (7), respectively: where p max max is the maximum sending power that is limited by the hardware's design, SINR th is the decoding threshold of the acoustic modem, T is the behavioral interruption threshold, and h mi is the channel gain between the sender and marine mammals. The third constraint in Equation (7) indicates that the receiving power from all senders should not be larger than the behavioral interruption threshold (160 dB re µPa) [45]. From this discussion, the sensor's power allocation problem is an optimization problem; therefore, the optimal power allocation problem can be solved by using the Nash equilibrium (NE) [42].

Exclusive Mode
If the network does not detect any marine animals, the sensor nodes will transmit their packets in exclusive mode. In this mode, the aim of environmentally friendly power control (EFPC) is to maximize the network throughput while reducing the energy consumption. Thus, we formulated the optimization problem (Equation (8)) and that with two limitations (Equation (9)).
The only difference compared with the share mode is that the third constraint, which is used for limiting the impacts on marine animals, is deleted. Through a similar deduction, the optimal power allocation problem can also be solved by using the NE.

Rate Adaptation Algorithm
Since an Orthogonal Frequency Division Multiplex (OFDM) modem is capable of working in five modes, the working modes [46] of OFDM modems are essentially decided by the current SINR, which reflects the current channel condition. In other words, with a higher SINR [1], modems can choose a more aggressive mode with a higher data rate. Otherwise, modems will choose a more conservative mode with a lower data rate. Inspired by this conclusion, combined with the bit error rate (BER) requirement at the receiver side, we propose a rate adaptation method for an underwater OFDM modem: where R t is the adopted transmission mode, M k is one of the five modes of the OFDM modems, and β k ·(β k−1 ) is the upper (lower) threshold for setting the modem mode to M k .

Data Upload Decision-Making Mechanism
When the UWSNs node needs to upload the collected data to the gateway, the data upload decision-making mechanism is introduced into the gateway system, and the gateway executes the data upload decision-making mechanism. The architecture is shown in Figure 3. and ( −1 ) is the upper (lower) threshold for setting the modem mode to .

Data Upload Decision-Making Mechanism
When the UWSNs node needs to upload the collected data to the gateway, the data upload decision-making mechanism is introduced into the gateway system, and the gateway executes the data upload decision-making mechanism. The architecture is shown in Figure 3.  As seen in Figure 3, the information that is required by the gateway to perform the data uploading decisions including: the similarity between the current data to be uploaded and the previous data; the compression ratio obtained by the alternative compression algorithm; and the transmission power, reception power, calculation power, data transmission rate, data retransmission rate, and source node location of the related nodes. This information is mainly provided by the three modules of sensor nodes. The node transmission power and its data transmission rate are obtained by performing the joint power control and rate adjustment algorithm that was described in Section 3.2. The algorithm can select the appropriate transmission power and data transmission rate according to the existence of animals and the channel conditions. The data retransmission rate is obtained by calculating the average retransmission rate of the data that were uploaded in the previous i-th rounds. The data similarity is obtained by calculating the similarity between the data to be uploaded in this round and the data from the previous round using the gateway processing unit, which was set as the precision requirement of each alternative compression algorithm, and the corresponding compression ratio and compression time are calculated. The receiving power and computing power are determined by the node's hardware. Based on the above information, the gateway makes the UWSNs data upload decisions with the goal of maximizing the energy savings of the integrated network. If the decision-making results no need to be uploaded, the decision-making process is completed; if the decision-making results need to be uploaded, the total energy As seen in Figure 3, the information that is required by the gateway to perform the data uploading decisions including: the similarity between the current data to be uploaded and the previous data; the compression ratio obtained by the alternative compression algorithm; and the transmission power, reception power, calculation power, data transmission rate, data retransmission rate, and source node location of the related nodes. This information is mainly provided by the three modules of sensor nodes. The node transmission power and its data transmission rate are obtained by performing the joint power control and rate adjustment algorithm that was described in Section 3.2. The algorithm can select the appropriate transmission power and data transmission rate according to the existence of animals and the channel conditions. The data retransmission rate is obtained by calculating the average retransmission rate of the data that were uploaded in the previous i-th rounds. The data similarity is obtained by calculating the similarity between the data to be uploaded in this round and the data from the previous round using the gateway processing unit, which was set as the precision requirement of each alternative compression algorithm, and the corresponding compression ratio and compression time are calculated. The receiving power and computing power are determined by the node's hardware. Based on the above information, the gateway makes the UWSNs data upload decisions with the goal of maximizing the energy savings of the integrated network. If the decision-making results no need to be uploaded, the decision-making process is completed; if the decision-making results need to be uploaded, the total energy consumption of the uploaded compressed data, including the node compression calculation's energy consumption and the total energy consumption of directly uploading the raw data, need to be compared. The sensor node executes an alternative compression algorithm according to the accuracy requirement or uploads the raw data directly without compression. Finally, the decision results are fed back to the source nodes.
To acquire historical information of the data similarity and compression algorithms for the data uploading decision-making mechanism, a two-dimension table was established to store the compression information of each underwater node, which is listed as {node number, data similarity, compression algorithms, average CR, average compression time, retransmission}. The proposed mechanism needs to compute the energy consumption of the data compression and transmission for each node. Figure 4 shows the communication process and describes the mechanism in steps.
To acquire historical information of the data similarity and compression algorithms for the data uploading decision-making mechanism, a two-dimension table was established to store the compression information of each underwater node, which is listed as {node number, data similarity, compression algorithms, average CR, average compression time, retransmission}. The proposed mechanism needs to compute the energy consumption of the data compression and transmission for each node. Figure 4 shows the communication process and describes the mechanism in steps.  Step 1: Carry out similarity calculation. Calculate the similarity of the 500 current data to be uploaded and the 500 previous data, which were sampled chronologically and expressed as , where ( , ) denotes the sampled data at one time. Next, calculate the similarity of the current sampled data and the previous sampled data. If the value is greater than 0.95, decide that the current data do not need to be uploaded due to the high data similarity between the current data and the previous data. Then, proceed to the step 8 and end this upload decision-making; otherwise, the mechanism continues to step 2.
Step 2: In this step, the source node number and data similarity query in the compression information table are completed. If the corresponding record cannot be found, then continue to step 3; otherwise, proceed to step 4.
Step 3: The prediction of the CR and compression time. In this step, the CR and the compression time for one byte of data ( ) that can meet the precision requirement of each alternative compression algorithm are calculate. Then, this calculation result is written into the compression information table. Step 1: Carry out similarity calculation. Calculate the similarity of the 500 current data to be uploaded and the 500 previous data, which were sampled chronologically and expressed as ds = (t 1 , d 1 ), (t 2 , d 2 ), . . . (t 500 , d 500 ) , where (t i , d i ) denotes the sampled data at one time. Next, calculate the similarity of the current sampled data and the previous sampled data. If the value is greater than 0.95, decide that the current data do not need to be uploaded due to the high data similarity between the current data and the previous data. Then, proceed to the step 8 and end this upload decision-making; otherwise, the mechanism continues to step 2.
Step 2: In this step, the source node number and data similarity query in the compression information table are completed. If the corresponding record cannot be found, then continue to step 3; otherwise, proceed to step 4.
Step 3: The prediction of the CR and compression time. In this step, the CR and the compression time for one byte of data (T nc ) that can meet the precision requirement of each alternative compression algorithm are calculate. Then, this calculation result is written into the compression information table.
Step 4: Perform the joint power control and rate adjustment algorithm that are described in Section 3.2 to determine the proper sending power and data transmission rate.
Step 5: The gateway obtains the relevant information of the nodes that are needed for the data upload decision-making mechanism ("Predictive information", Figure 3).
Step 6: The total energy consumption of the uploaded compressed data is calculated and compared, including the node compression calculation's energy consumption and the total energy consumption to directly uploading the raw data. Then, select the optimal upload execution strategy with the lowest total energy consumption. Finally, broadcast this selected strategy to the source node.
The total energy consumption of the compression and uploaded compressed data includes two parts: the compression calculation's energy consumption and the transmission's energy consumption including the energy consumption of sending and the energy consumption of receiving. The total energy consumption of directly uploading the raw data only includes the transmission's energy consumption. The compression calculation's energy consumption is formulated using Equation (11), the energy consumption of sending is calculated using Equation (12), and the energy consumption of receiving is calculated using Equation (13): As such, the total energy consumption of the uploaded compressed data after compression can be calculated using: Then, the total energy consumption of directly uploading the raw data without compression includes the energy consumption of sending (E ns−un ) and the energy consumption of receiving (E nr−un ), which can be calculated using Equations (15) and (16), respectively: After E ns−un and E nr−un are derived, the total energy consumption of directly uploading the raw data can be calculated using: When h = 1, E cp and E uncp can be calculated using Equations (18) and (19), respectively: E cp = P nc * L * T nc (dos) + P ns * L * CR(dos) * T n * (1 + rs) E uncp = P ns * L * T n * (1 + rs) where h, P nc , and L denote the total number of hops from the source node to the gateway, the compression calculation power, and the raw data size, respectively; dos represents the data similarity requirements; T nc is the average time cost for one byte of data compression under dos; P ns−i and P nr−i denote the sending power of the sending node and the receiving power of the receiving node at the ith hop of the transmission path, respectively; CR is the compression ratio that is obtained by the alternative compression algorithm under dos·T n−i , which is the average time cost for transmitting one byte of data, and determined by the transmission rate at the i-th hop; and rs i denotes the data retransmission rate of the sending node at the i-th hop of the transmission path. We did not consider the energy consumption of receiving of the gateway. Therefore, the transmission's energy consumption does not include the energy consumption of receiving when h = 1. Sep 7: Select the minimum values of E cp and E uncp and the corresponding strategy as the data upload decision. This decision could provide better energy saving results for the UWSNs.
Step 8: End the upload decision-making process.

Simulation and Evaluation
In this section, we verify the energy optimization function of the proposed data upload decision-making mechanism for a UWSNs. The network topology is shown in Figure 5, where A1-A3 are the three anchor nodes for locating sensor nodes and animals. Among them, A1 and A3 are deployed on the water surface, A2 is set to 250 m below the water surface. D1and D2 are the gateway nodes (sink nodes), deployed on the water surface, that receive the data from the sensor nodes and are responsible for receiving and collecting data. Suppose nodes S1-S3 are the source nodes of the data transmission and R1-R9 are the relay nodes, among which S1, S2, R1, R2, R3, and R4 are set to 1500 m below the water surface; and S3, R5, R6, R7, R8, and R9 are set to 500 m below the water surface. They are deployed in the range of 2000 × 3000 × 2000 m, and the range is divided into 12 1000 × 1000 × 1000 m grids. Each sensor node is placed in the center of a 1000 × 1000 × 1000 m three-dimensional cube space; the communication range is set to 1000 m. We use M to represent an animal that is detected traveling at a speed of V (1,0,0). We selected 10 groups of data in chronological order from the measured data of the temperature, salinity, and dissolved oxygen in one week near an island located in Sanya, Hainan province, China. Each group contained 50 sample data, the packet length was 100 B, and all data were normalized to [0, 1]. upload decision. This decision could provide better energy saving results for the UWSNs.
Step 8: End the upload decision-making process.

Simulation and Evaluation
In this section, we verify the energy optimization function of the proposed data upload decision-making mechanism for a UWSNs. The network topology is shown in Figure 5, where A1-A3 are the three anchor nodes for locating sensor nodes and animals. Among them, A1 and A3 are deployed on the water surface, A2 is set to 250 m below the water surface. D1and D2 are the gateway nodes (sink nodes), deployed on the water surface, that receive the data from the sensor nodes and are responsible for receiving and collecting data. Suppose nodes S1-S3 are the source nodes of the data transmission and R1-R9 are the relay nodes, among which S1, S2, R1, R2, R3, and R4 are set to 1500 m below the water surface; and S3, R5, R6, R7, R8, and R9 are set to 500 m below the water surface. They are deployed in the range of 2000 × 3000 × 2000 m, and the range is divided into 12 1000 × 1000 × 1000 m grids. Each sensor node is placed in the center of a 1000 × 1000 × 1000 m three-dimensional cube space; the communication range is set to 1000 m. We use M to represent an animal that is detected traveling at a speed of V (1,0,0). We selected 10 groups of data in chronological order from the measured data of the temperature, salinity, and dissolved oxygen in one week near an island located in Sanya, Hainan province, China. Each group contained 50 sample data, the packet length was 100 B, and all data were normalized to [0,1].

Figure 5. Network topology.
Three temporal domain compression algorithms, greedy disconnected piecewise linear approximation (GDPLA) [47], the first-order autoregression [48], and the 5/3 wavelets [49], were selected as the compression algorithm options for the data upload decision-making mechanism. The Three temporal domain compression algorithms, greedy disconnected piecewise linear approximation (GDPLA) [47], the first-order autoregression [48], and the 5/3 wavelets [49], were selected as the compression algorithm options for the data upload decision-making mechanism. The data upload decision-making mechanism that is proposed in this section is not limited to specific data types and the alternative compression algorithms mentioned here. The following simulation sets the data upload decision result to be uploaded (data similarity is less than 0.95). The compression ratios of different alternative compression algorithms under different data similarity requirements are shown in Table 2. To intuitively evaluate the compression performance of each alternative compression algorithm, Figure 6 depicts the CR with varying data similarities. This figure shows that the accuracy requirement that is represented by the data can similarity lowers the CR. When monitoring the data of a single node, there is such a relationship: the higher the data similarity, the lower the CR, and the better the compression performance. As shown in Figure 6, the GDPLA and first-order autoregressive prediction that the higher algorithmic complexities can produce better compression performance. algorithm, Figure 6 depicts the CR with varying data similarities. This figure shows that the accuracy requirement that is represented by the data can similarity lowers the CR. When monitoring the data of a single node, there is such a relationship: the higher the data similarity, the lower the CR, and the better the compression performance. As shown in Figure 6, the GDPLA and first-order autoregressive prediction that the higher algorithmic complexities can produce better compression performance. To evaluate the energy saving effect of different data upload decision-making strategies, we calculated the deviation between the total energy consumption of different upload strategies and the optimal energy consumption [50] of different UWSNs nodes in different network settings. The simulation results are listed at Table 3, and Figure 7 shows the energy saving deviation. To evaluate the energy saving effect of different data upload decision-making strategies, we calculated the deviation between the total energy consumption of different upload strategies and the optimal energy consumption [50] of different UWSNs nodes in different network settings. The simulation results are listed at Table 3, and Figure 7 shows the energy saving deviation. Table 3. Deviation between the energy consumption of the upload strategies and the optimal energy consumption.  Figure 7. Deviation between the energy consumption of the upload strategies and the optimal energy consumption. Table 3. Deviation between the energy consumption of the upload strategies and the optimal energy consumption.

Accuracy
No. As shown in Table 3, the network settings of our simulation include three data similarities (compression accuracy requirement), two numbers of hops, and two data retransmission ratios, which represent different compression accuracy requirements, distances nearer or farther from the gateway, and better or worse underwater acoustic channel quality, respectively. For example, the parameter setting in the first row shows the data similarity is 0.09, the number of hops away from the gateway is 2, and the data retransmission is 90% respectively.

Hops
In Figure 7, the parameter setting on x-axis d0.09-h2-r90% shows that the data similarity (d) is 0.09, the number of hops away from the gateway (h) is 2, and the data retransmission ratio (r) is 90%.
We evaluated the energy consumption performance of the method that applies our proposed data upload decision-making strategy. For comparison, we also evaluated the energy consumption performance of other alternative compression algorithms. As shown in Figure 7, all nodes can upload data with optimal energy consumption performance in different network settings. We have further analyzed the impact of the data similarity and transmission distance on data uploading energy consumption. Table 3 shows that due to the time-varying characteristics of the underwater channel, uploading compressed data does not necessarily save more energy than uploading data without compression. Determining whether to run an alternative compression algorithm and then uploading the compressed data is also related to the link quality of the transmission path and the data similarity. In detail, when the data similarity is low, the energy consumption of compression calculation is large, resulting in the total energy consumption of the uploaded compressed data being higher than the total energy consumption of the uploaded uncompressed raw data. Therefore, the source node close to the gateway will choose the compression strategy with no compression or low complexity. With the increase in the distance, the energy consumption of communication increases exponentially. At this time, even with the energy consumption of the compression calculation, the total energy consumption of data upload after compression is still lower than that of the raw data upload without compression. Therefore, the nodes choose the algorithm with a better compression effect to compress and upload data. When the data similarity is higher, which means significantly lower energy consumption and better compression performance, the source nodes will choose to perform compression and upload the compressed data after the operation of decision-making.
We compared the network energy efficiency gain when using the unified optimized compression algorithm with better compression performance, against when the data are not compressed or our proposed data upload decision-making mechanism is used. The simulation results are shown in Figure 8. Figure 8a shows that when the data similarity is lower or the source nodes are closer to the gateway, the optimized compression can obtain negative energy efficiency gain, mainly because the higher compression computation energy consumption could result in the total energy consumption required for uploading the compressed data exceeding the total energy consumption required for directly uploading the data without performing any compression. Instead, as shown in Figure 8b, our proposed data upload mechanism, which selects an adaptive upload strategy according to the quality of network communication and the requirement of compressed precision, can obtain higher energy efficiency gain against when the unified and optimized compression algorithm is used. For all the network settings above, we know that our proposed data upload decision-making strategy can upload the data with relatively lower energy consumption.
choose to perform compression and upload the compressed data after the operation of decision-making.
We compared the network energy efficiency gain when using the unified optimized compression algorithm with better compression performance, against when the data are not compressed or our proposed data upload decision-making mechanism is used. The simulation results are shown in Figure 8.   Figure 8a shows that when the data similarity is lower or the source nodes are closer to the gateway, the optimized compression can obtain negative energy efficiency gain, mainly because the

Conclusions
A large similarity problem exists in underwater sensor network data acquisition, with considerable amounts of redundant data in the network. Due to the quality of the network communication, the data compression algorithm that is used in the UWSNs may not save energy. Considering the limited energy available in underwater networks, to avoid wasting energy due to uploading compressing duplicate and redundant data, a data upload decision-making mechanism for underwater networks with the goal of optimizing energy consumption was proposed in view of the high similarity redundancy of the collected data. By analyzing the data similarity and the uploading energy consumption, the mechanism provides the underwater node with the best decision-making strategy for data upload. Compared with the single compression algorithm, the data upload decision-making mechanism proposed in this paper can achieve better energy savings.