Two-Tier PSO Based Data Routing Employing Bayesian Compressive Sensing in Underwater Sensor Networks

Underwater acoustic sensor networks play an important role in assisting humans to explore information under the sea. In this work, we consider the combination of sensor selection and data routing in three dimensional underwater wireless sensor networks based on Bayesian compressive sensing and particle swarm optimization. The algorithm we proposed is a two-tier PSO approach. In the first tier, a PSO-based clustering protocol is proposed to synthetically consider the energy consumption and uniformity of cluster head distribution. Then in the second tier, a PSO-based routing protocol is proposed to implement inner-cluster one-hop routing and outer-cluster multi-hop routing. The nodes selected to constitute i-th effective routing path decide which positions in the i-th row of the measurement matrix are nonzero. As a result, in this tier the protocol comprehensively considers energy efficiency, network balance and data recovery quality. The Bayesian Cramér-Rao Bound (BCRB) in such a case is analyzed and added in the fitness function to monitor the mean square error of the reconstructed signal. The experimental results validate that our algorithm maintains a longer life time and postpones the appearance of the first dead node while keeps the reconstruction error lower compared with the cutting-edge algorithms which are also based on distributed multi-hop compressive sensing approaches.

As maritime rights and interests are paid more and more attention, underwater wireless sensor networks (UWSNs), an important extension of WSNs into ocean, is of great value in various application fields, such as, oceanographic information collection, hydrological and environmental monitoring, resources exploration, disaster forecast, underwater navigation, military defense, and etc. [7]. For UWSNs, there exist two kinds of topological structures, two dimensional network and three dimensional network. In this work, we study the three dimensional UWSNs, with which all nodes are deployed at different depths in the ocean.
Generally, in WSNs, including UWSNs, the information from all sensor nodes, which are equipped with limited power battery, are gathered to the sink node. When the power of the node is used up, the

Related Work
CS, which unifies sampling and compression, has been enthusiastically promoted and studied [12,13]. It states that when the original signal is sparse itself or sparse on some basis, we can reconstruct it from compressed measurements. Making use of its advantages on simple encoding and efficient compressing, many researchers apply CS in WSN to achieve advanced compression compared with conventional compression algorithms [14][15][16]. CS could also be utilized for sensor selection; that is, only part of nodes will be chosen to transmit the product of its data with a random coefficient to the sink [17][18][19]. In the context that all nodes transmitted their data directly to the sink node in one hop, Hwang et al. [20] propose both centralized and decentralized algorithms for sensor selection under multi-variated noise condition. In [21], a tree-based algorithm named CDG is designed to reduce the payload falling on nodes close to the BS. In [22,23], a WSN is partitioned into clusters. Sensor readings are sent to CHs and the CHs send the received data to the BS. However, the measurement matrices they adopt are all full Gaussian ones which consume more power than the sparse measurement matrix used in our work. In the other word, they do not combine the routing and nodes selection with the help of CS.
PSO is a computational method that can be used to improve the optimization of candidate solutions through iteration. The way to adopt PSO for data routing [24] has been studied in several literatures [25][26][27]. Ref. [25] proposes a novel bi-velocity discrete PSO approach which extends PSO from the continuous version to the binary or discrete domain to find routing paths for WSN. Tukisi et al. [26] presents a PSO based method to find routings for the network with energy harvesting. Targeting on the problem that some of the nodes would be left out during cluster formation, ref. [27] introduces to apply the concepts of PSO and gravitational search algorithm to prevent residual nodes. All of these studies consider the application of PSO either in the clustering or in the routing, i.e., only one tier. And they do not consider the application of CS.
To the best of our knowledge, there are few works besides the following literatures that have considered the application of CS into combination of data routing and nodes selection in WSNs, especially in UWSNs. In [28], the network is partitioned into clusters. Then each CH collects the sensor readings within its cluster and generates CS measurements to be forwarded directly to the base station. As a result, the overall CS measurement matrix at the base station is a block diagonal matrix. Later on, in [29], an improved method is introduced which allows the multi-hop routing of generated CS measurements through intermediate CHs. The authors generalize the two methods in [30]. Compared to our work, they obtain CS measurements in each CH and then forward to sink node while we postpone this operation to be done in the sink node to reduce the burden in CHs. On the other hand, while choosing multi-hop routing paths from the CH to the sink node, they use traditional tree-based algorithm while we design a PSO-optimization based scheme.Specifically targeting on three dimensional UWSNs as well, distributed multi-hop CS has been proposed by Gong et al. in [31]. This algorithm randomly chooses a subset with size M of N nodes in the first place, then each chosen node finds a tour to the sink node. During the traveling of the message through the tour, each node computes the product of its sensing data and a random weighted coefficient and adds the product to the intermediate result it received. Finally the sink node obtains M measurements and reconstructs the whole data.Compared to our work, they ignore the noise and select the nodes which participate in sensing by Bernoulli generator. To conclude, though these works use sparse measurement matrices similar to what we do, they only consider energy efficiency while choosing appropriate routs without considering the sensing quality. Furthermore, they do not consider the optimization of clustering.
In this work, we introduce a two-tier PSO protocol for both clustering and routing. Specifically, our contributions are listed as below. (1) In the first tier, a PSO-based clustering protocol is proposed to find appropriate CHs with the comprehensive consideration of energy consumption efficiency and CH distribution uniformity. (2) In the second tier, a PSO-based routing protocol is introduced to find appropriate routs. A routing path consists of inner-cluster one-hop routing and outer-cluster multi-hop routing and each path corresponds to one cluster. Hence the number of paths equals to the number of clusters. It is worth mention that the multi-hop routing in our scheme is not restricted to consisting of CHs only. Note that these paths correspond to the rows of the measurement matrix for CS. In this way, we implement the combination of routing and nodes selection based on CS. Hence, the fitness function of PSO-based routing protocol comprises of several factors, including the criterions for energy efficiency, network balance, and data recovery qualities. To measure network balance, we divide the nodes in the 3-D model into different layers in accordance with their horizontal distance to the sink node. As we adopt Bayesian CS (BCS), Bayesian Cramér-Rao Bound (BCRB) is added in the fitness function to reflect the recovery performance.
With optimization in CH and routing chosen progress, the network could survive longer with relatively lower measurement error as demonstrated by the simulation results. What's more, taking energy-balancing into consideration contributes to form a more balanced energy-consuming network every round cycle, and also prolongs the network lifetime and reduces the sensing error.
The remainder of this paper is organized as follows. In Section 2 we present preliminary knowledge. In Section 3, we introduce our system model for UWSN scenario. The detailed descriptions of the clustering and routing algorithms based on CS and PSO are given in Section 4. The simulation results are presented in Section 5. Finally, Section 6 concludes the paper, and introduces the future work.
The notation used in this paper is according to the convention. Symbols for matrices and vectors are in boldface.

Compressive Sensing
CS theory proves that if a signal z of dimension N is sufficiently sparse in a certain domain; that is z = Ψx and x 0 = K, K N, where Ψ ∈ R N×N is an orthogonal basis and x ∈ R N , it is efficient to recover z from a random measurement vector that is obtained by y = Φz. M, the length of y is much smaller than N. Φ is named as a measurement matrix or sensing matrix. In the sequel, the random non-zero coefficients in Φ obey Gaussian distributions.
In general, it is an ill-posed problem to recover signal z from y since y has much smaller dimensions than z. We have to minimize the solution's 0 norm to find the sparsest result. However, it is an NP-hard problem, which is not capable to be represented by mathematical formulas. Candés et al. [13] state that as long as a measurement matrix satisfies the restricted isometry property (RIP) condition, the 0 norm can be replaced by 1 norm. A measurement matrix satisfies RIP of order of K when for all x 0 ≤ 2K and some 0 < < 1. Afterwards, many methods for such a convex optimization problem were introduced to solve the 1 norm optimization problem [32][33][34]. Later on, iterative-greedy-pursuit recovery algorithms [35][36][37] are proposed to reduce the time complexity.

Bayesian Estimation
In the presence of noise, the noisy measurement vector y is y = ΦΨx + n = Θx + n. ( where n ∈ R M denotes the noise. In the underwater acoustic channel, the overall noise at the receiver contains both the ambient noise and residual intersymbol interference (ISI) after equalization. Though the ambient noise is not AWGN in the underwater environment, the use of Gaussian approximation is often convenient and adequate [38]. In the presence of Gaussian ambient noise, according to [39] and [40], the overall noise can be treated as Gaussian, although it is not independent with regard to time step, since the signal after CS transform, i.e., transmitted signal is treated as Gaussian distributed. Hence, in the sequel, n is approximated as Gaussian distributed.
In [41], BCS is proposed to reconstruct a sparse signal from noisy measurements, if a statistical characterization of the signal is available. The computation time of BCS is comparable to or even faster than other classical greedy iteration algorithms. In this work, we adopt bayesian learning to recover the signal in the sink node. Therefore in this part, we make a brief review of BCS method. n i indicates the i-th element of n in Equation (2), and is approximated as a zero mean Gaussian variable with variance σ 2 . From the Bayesian point of view, to obtain the restoration of the original signal is to seek a full posterior probability function for x. Usually, a Laplacian distribution is a widely used prior for sparse signals. But considering it is not conjugate to Gaussian likelihood, in BCS, a hierarchical prior has been defined for x; that is, a zero-mean Gaussian prior is defined on each element of x where α i is the inverse-variance of x i and α satisfies a Gamma prior with parameters a and b with p(α) = ∏ N i=1 Γ(α i |a, b). By marginalizing over the hyperparameter α, the overall prior is given by Likewise, the hyperparameter α 0 = 1/σ 2 satisfies a Gamma prior Γ(α 0 |c, d). Note that a, b, c, and d are all set to zero to obtain a uniform hyperprior.
The posterior of x is with the mean matrix µ and the covariance matrix Σ, For the case of uniform hyperpriors, Finally, we can estimate all unknown parameters by maximizing the posterior in Equation (5). To find a approximate solution, we use EM algorithm to achieve the solution iteratively. In the algorithm, α and α 0 are re-estimated by every iteration with the form where µ i is the ith posterior mean weight from µ and γ i is defined as γ i 1 − α i Σ ii , with Σ ii the i-th diagonal element of the posterior covariance weight from Σ. Then new µ and Σ are obtained. The iterations terminate until the values of α and α 0 converges. Finally,x MAP = µ.

Particle Swarm Optimization
PSO was inspired by the social behavior of bird flocking or fish schooling. PSO obtains a set of candidate solutions according to the mathematical formulas, which are based on the positions and velocities of the particles. These particles are iterated in the search space to solve the problem. The motion of each particle is not only affected by its local best known position, but is also guided toward the best known position in the search-space, which are updated by better positions found by other particles. PSO is a meta-heuristic algorithm (Algorithm 1) because it makes little or no assumptions about the problem being optimized and is able to search within a very large candidate solutions space.
Initially, each particle is randomly assigned with a position vector x l = [x l,1 , x l,2 , ..., x l,D ] as well as a velocity vector v l = [v l,1 , v l,2 , ..., v l,D ], where D is the dimension of a particle and l ∈ (1, 2, ..., S) with S representing the total particles number. Then each particle keeps track of its personal best position P l and the global best one G in the whole search space. After finding the two best fitness values for the l-th particle, it updates the position and velocity by the fourmulas where m represents the current number of iteration, and r 1,l , r 2,l are random variables between [0, 1]. a 1 and a 2 are the learning factors while ω is a weight factor that controls the velocity of the particle.

Algorithm 1 PSO algorithm
for each particle do initialize particle end for while target fitness or maximum epoch is not attained do for each particle do calculate fitness if current fitness value better than (pbest) then pbest=current fitness end if end for set gbest to the best one among all pbest for each particle do update velocity update position end for end while

Proposed System Model
We consider the scenario that sensor nodes are randomly deployed in a three dimensional undersea area and sink node (base station) is placed on top of the area as shown in Figure 1. As introduced before, CS can only be used when the signal has sparsity in some domain. In the next, we would prove that the monitored oceanic data such as temperature, hardness, and salinity are indeed sparse in Fourier transform domain because of their spatial correlation. Take the ocean underwater temperature data rendered by National Aeronautic and Space Administration (NASA) (https://www.nasa.gov/specials/ocean-worlds) for example, we transform the data into its fast Fourier transform (FFT) domain and the results validate the sparsity as show in Figure 2. Another property of oceanic data is that its sparsity varies in different depth. As a result, we divide the network into different segments in line with its depth. Note that data with higher sparsity needs more information for recovery. Therefore, segments with higher sparsity are given higher cluster-heads chosen probability so that for these segments, more measurements are obtained. In our model, the data in different segments are transmitted to the sink node and reconstructed separately, hence every segment can be considered as a sub-model. The segment-divided network model is presented as Figure 1. In reality, the data is collected in a 3-dimensional network. We need to ransform them into one-dimensional data in accordance with its horizontal position and depth first. Hence the data in the i-th segment is represented as where N is the total number of nodes in the i-th segment. However, for simplicity, in the sequel, we take every sub-model as an integral model and the data are regarded as z = [z 1 , z 2 , ..., z N ]. Its sparse form is denoted as z = Ψx, where x is a sparse vector and Ψ is the FFT transform matrix.
In each segment, we use CS to reduce the transmission burden to the sink node. Therefore, energy is saved and the network lifetime is prolonged. Clustering is also applied in our model to further economize on energy. The noisy measurement vector collected in the sink node is y = Φz + n = ΦΨx + n. For the purpose of making network energy-consuming balanced and easier for a node to find the next relay node, we divide all nodes into different layers according to their distances to the sink node. In the following work, we divided them into four distinct layers, those who are closer to the sink node are given smaller layer number, which is depicted in Figure 3.

Proposed Algorithm
The overall algorithm contains three main steps illustrated as follows.
Firstly, we use PSO algorithm to choose M CHs for each segment and divide the nodes in each segment into M clusters. The second step is to select one routing path for each cluster using PSO algorithm, hence we obtain M routs to compose an integral routing matrix Φ ∈ R M×N . Figure 4 presents the flowchart of the whole procedure. In the flowchart, E represents the remaining energy of the whole network and f 1 , f 2 are the fitness functions of clustering and routing steps respectively. f 1 considers energy consumption and uniformity of cluster head distribution while f 2 focuses on energy consumption state and BCRB.
The detailed description about how the CHs and routing paths, i.e., the measurement matrix are chosen will be presented in the next subsections. After we get the measurement matrix, the sink node receives a M dimensional measurement y and its measurement matrix Φ. Finally, at the sink node, we use BCS algorithm to reconstruct the sparse signal x and then transform it to the original temperature signal z. After obtaining signals for all the segments one by one, the base station merges them together to recover the complete original signal.  To design the final object, there are three important factors to be taken into consideration: the signal reconstruction accuracy, the remaining energy of the network, and the balance state of the network.
Usually, mean squared error (MSE) is used to measure the reconstruction quality of the signals. If x(y) denotes the recovered signal based on noisy measurements y represented by Equation (2). The MSE in estimation of the vector x is given by However, the calculation of MSE requires the actual estimator which is impossible for us to get before choosing the measurement matrix. To solve this problem, we introduce Bayesian Cramér-Rao Bound (BCRB) to measure the lower bound of MSE; that is, MSE ≥ BCRB. Hence, adding BCRB into the fitness function is beneficial for us to find the solution with lower MSE, since minimizing BCRB is equal to obtaining minimum acceptable MSE. Subsequently, we will derive the formula of BCRB for our problem.

BCRB Derivation
The BCRB is computed as BCRB = Tr{J −1 B } with J B ∈ R N×N being the Bayesian Fisher information matrix (FIM) [42].
. Then J ij can be calculated as Therefore, the Bayesian FIM is given by Firstly, we compute the second part of Equation (12). To solve L(x), we must obtain the density function of signal x in the first place. In our model, the sparse signal x follows the Gaussian distribution with a mean vector µ and a covariance matrix Σ. Σ = diag(γ 1 , ..., γ N ) is a diagonal matrix with γ i denoting the variance of x i . As a result, the density of x is As the previous part is irrelevant to x, we abbreviate it asg. Then the formula in matrix form can be written as As a result ∂x∂x T = Σ −1 . Secondly, we turn to deal with the first part of FIM. To obtain −E (y,x) { ∂ 2 L(y|x) ∂x∂x T }, we need to know p(y|x). Since y = ΦΨx + n, the probability density function of y is related to the noise term n given the information of x. The noise can be the Gaussian distribution; that is, Subsequently, Finally, we put the two parts together as the description of FIM, Therefore, in our model, BCRB is represented by Tr{ 1 σ 2 Ψ T Φ T ΦΨ + Γ −1 }.

Underwater Energy Consumption Model
Ref. [43] gives an in-depth analysis of the energy consumptions of underwater acoustic networks according to the conceptions of underwater acoustic channel attenuation model, noise model and bandwidth model. The total energy consumption associated to one hop (which includes both the transmit and the receive energy at the two ends of the link), is determined by the power P r , acoustic electric conversion power P el t (l) which is the power required to transform acoustic signal to electric signal and single hop transmission delay t hop (L, l). Specifically, the energy consumed by each single hop, E hop is calculated as: The single hop transmission delay t hop (L, l) is determined by package length L and the distance l between two transmitting nodes as well as channel effective bandwidth αB(l) where α means channel utilization rate. So the formula becomes Herein, 10 −17.2 is the conversion factor from acoustic power in dBreµPa to electrical power in Watt, η represents electronic circuit conversion efficiency and P t (l) is the transmitting power of underwater nodes, which is calculated as N( f (l)) and A(l, f (l)) are noise parameter and attenuation coefficient, respectively. AN = A l, f (l) N f (l) is the channel parameter and can be plotted as a curve in terms of f for given l. Therefore, for certain l, we can obtain its best frequency f 0 (l) and then find its corresponding AN product, N f 0 (l) × A l, f 0 (l) from the AN curve.
SNR tgt denotes the target signal noise rate for the terminal to receive signals correctly. B(l) is the usable bandwidth given SNR tgt , which is B(l) = b × l −β . The positive parameters b, β depend on the target SNR as well. Unlike the transmit power, the receive power P r is independent of distance, and rather depends on the complexity of the receive operations. So it could be set as a constant value. More details can be found in [44].
In conclusion, the energy consumed for each hop in the terms of package length L and distance l is calculated as: Based on such a model, we can calculate the energy consumed by each node which participates the routing.

PSO for Clustering
First we use PSO to find appropriate nodes to be CHs. Because we divided the network into different segments, the percentage of CHs within each segment is highly related to the sparsity of the segment. In general, we set the CH ratio q to be 0.4, only in some segments which have relative low sparsity, the value of q i increases slightly in a trend like q i = 0.4 + 0.05 × s i − 0.08 where s i is the sparsity of the ith segment.
The particle's dimension is the same as the number of CHs. To initialize the particle, we generate M random integer numbers within [1, N]. For example, the i-th particle P i is M dimensional and P i,m represents the m-th value of it. If P i,m = 25, it means that in this particle, we choose node 25 to be the m-th CH. As a node cannot be chosen repeatedly in one particle, if a particle has duplicated numbers, it will be assigned a penalty fitness value −1 to be excluded during later progress.
After initializing all the particles, we calculate the fitness value for each particle. According to their fitness value, we choose the local best and the global best particles, update the particle position which corresponds to the coordinates of the nodes in this particle and velocity to get new fitness values.
In this part, we mainly focus on energy consumption and uniformity of cluster head distribution, hence the fitness function f does not include the measurements for signal recovery qualities in current stage. The fitness function f 1 comprises of three parts as below: • E p , represents the energy evaluation factor and E p is given by which measures the remaining energy of the chosen CHs.
Note that E init i is the initial energy of the i-th node while E res m is the residual energy of the m-th node and equals to E ini m − E c m , and We tend to choose nodes with higher residual energy to be CHs due to the fact that cluster heads consume more energy than normal nodes. • E c indicates the evaluation factor for the intra-cluster compactness and measures the average distance between nodes and their cluster heads. .., d j,N , i = j and d j,i measures the distance between node j and node i. In an unevenly distributed network, the sum of D must be higher than relatively evenly distributed network. As a consequence, we add this value E e = ∑ M j=1 d j /|C m | on our fitness function.
To sum up, the final form of fitness function f 1 of our cluster choosing algorithm is given by: where and w i is the weighting factor for the i-th item. To think about the problem in a balanced way, w 1 and w 2 are randomly chosen from the range [0.2, 0.4] while w 3 = 1 − w 1 − w 2 . When a particle contains duplicated nodes, we set a penalty factor to it to exclude it from iterations. This iteration keeps going on until either the global best fitness value is lower than the threshold we set or the maximal iterative times is reached. The global best particle obtained by the last iteration is the output of the cluster head selection scheme.

PSO for Choosing Routing Paths
The final step of our algorithm is to find the routing paths, then the measurement matrix. PSO is used in this stage as well and herein each particle contains M sub-particles, where M is the number of CHs. A sub-particle includes the nodeID information while sub-particle position includes priorities corresponding to all the nodes denoted in the sub-particle.
Each sub-particle determines a single rout from each cluster to the base station. Since we have already chosen all the CHs, there are two steps left for each CH to fulfill the corresponding routing path: (1) Each CH chooses the nodes in its cluster which transmit the messages to it in one hop; (2) After collecting messages in its cluster, each CH starts a multi-hop rout to the base station.
In one word, a complete routing path contains two parts, the one-hop inner cluster part and the multi-hop outer part. Therefore we divide the sub-particle into two parts, a and b. Part a determines which nodes from the inner cluster transmit their data to the CH while part b finds the routing path from the CH to the base station. It is difficult to jointly optimize these two part. Instead, we construct inner cluster part and outer cluster part separately.
The dimension of part a of sub-particle i equals to K i , where K i is the node numbers of each cluster except the i-th CH. The sub-particle i selects × K i nodes with highest priorities from part a to transmit data to the i-th CH in one hop. is the ratio controlling parameter which we would optimize in the simulation section. Figure 5 gives an example on how to choose nodes within one cluster and how to map part a of a sub-particle into real measurement matrix. The measurement matrix with dimension M × N is initially set to all zero matrix, where M is the number of CHs and N is the total number of nodes. Once a node is chosen into i-th rout, its corresponding position in the i-th row of measurement matrix is set to a randomly distributed coefficient which obeys Gaussian distribution.  The algorithm to find a multi-hop routing path from the i-th CH to the base station from part b is more sophisticated and will be illustrated in detail in the following.

i. Choosing candidate nodes
Regarding that if we take all nodes into consideration for each rout, the dimension of a sub-particle would be too large. Given the fact that when a node try to find its rout, those nodes in the opposite area from sink node may never be included. Therefore, before the initialization, we calculate the distances of all nodes to the sink node Dis s and the distance of all nodes to the CHs Dis c . Then choose an appropriate group of candidate nodes for each CH first. Dis s = d s,1 , ..., d s,N represents nodes' distances to the sink node while Dis c = d c,1 , ..., d c,N represents nodes' distances to the CH. dis = (x s − x c ) 2 + (y s − y c ) 2 + (z s − z c ) 2 is the distance between the CH and sink node where x s , y s , z s represent the coordinates of the sink node and x c , y c , z c are the coordinates of the CH. Note that these coordinates values could be obtained by range-based or range-free localizations before the data transmission. Only when a node k satisfies d s,k ≤ dis and d c,k ≤ dis, it would be added to the candidate group. Once the candidate groups are chosen, they will not be altered during the whole iteration period. Suppose the size of candidate group for the i-th CH is N i , then the dimension of part b of the sub-particle i equals to N i .

ii. Initialization
To find part b, we make use of the network's layered structure as we need to take directions into consideration. The nodes that are in the closer layer to the base station ought to have higher priority. This layer-based initialization method can control the routing path's direction towards the sink node as well as avoid trapping into a dead cycle and accelerate iterative process.
The priorities of the nodes in the farthest layer are initialized by randomly generating numbers between 0 to 1. In our layered model, the fourth layer is the farthest layer, so the priorities of the nodes in the fourth layer are set from 0 to 1, while the priorities of the nodes in the third layer are from 1 to 2. In one word, when the number of the layer decreases by 1, the initial priorities of the nodes increase by 1. Hence the priority is not initialized completely randomly but related to the layer the node belongs to. Figure 6 shows an example of our initialization. In this example, for the CH, there are altogether 12 candidate nodes dispersed in 4 different layers. It can be easily observed that the nodes in the higher layers have lower initialized priorities. That is how we achieve hierarchical initialization and obtain initialized multi-hop routing path from the CH to the sink node.

iii. Iteration
After initializing the particles, in each iteration, we are going to calculate particles' fitness values in terms of measurements matrices and select local best and global best ones. Then the particle positions are updated.
We have already introduced how to map part a of the sub-particle i to the corresponding row. Here we introduce how we map part b of sub-particle i to the i-th row of the measurement matrix.
Firstly we set the corresponding position of the i-th CH in the i-th row of Φ to a random Gaussian distributed coefficient. Then we find the next relay node by two steps iteratively until the complete routing path is found.
Step 1: If the previous node's distance to the sink node is smaller than its transmission range R c , stop searching since we already obtain a single rout from the i-th CH to the sink node. Otherwise, we go to step 2.
Step 2: Find all nodes that are not only in the candidate group but also within the previous node's transmission range, compare their priorities and find the largest one as the next relay node. Meanwhile, we set the corresponding position of chosen relay node in the i-th row of Φ to a random Gaussian distributed coefficient.
After mapping M sub-particles to M rows, the measurement matrix Φ is determined. The generated Φ is then used to compute the fitness value of corresponding particle. The fitness function includes two main items, the measurement of reconstruction quality, B r , which is related to BCRB and the measurement of energy consumed, E s .
As we calculated before, the formula of the BCRB is given by As BCRB implies a theoretical lower bound of reconstruction's mean square error, we wish it to be as small as possible.
The energy consumption of the routings we chose is measured by E s . It consists of three factors The first item , where E c k is the energy consumed at this round by node k and E res k is the rest energy of this node. It measures the energy cost ratio of nodes and should be small considering the durability of a network.
, where E init k is the initial energy of the k-th node.
On top of that, the balance of this network also needs to be considered. A well-known fact is that the nodes near the sink node are prone to consume more energy than the nodes far away. Those nodes burden heavier transmission tasks and are easier to convey their data to the sink repeatedly because they are more likely to be relay nodes. So to obtain a balanced network, we introduce a factor E b to describe a energy consumption ratio of different layer. E b = E 1 /E 4 + E 2 /E 3 , where E i represents the remaining energy of the ith layer. E b should be as large as possible as the lower layer tends to remain more energy in a balanced network than in an uneven network.
In a conclusion, the final fitness function consists of two parts as below, where W i is a weighting factor and W 1 + W 2 = 1, W i ≥ 0, i ∈ {1, 2}. In the simulation, W 1 is chosen randomly from the range [0.4, 0.6] while W 2 = 1 − W 1 .

Simulation Results
We use the real ocean temperature data provided by NASA (https://www.nasa.gov/specials/ ocean-worlds) to test our algorithm against the two cutting-edge methods. Note that the sensor network they deploy to monitor the environment is huge and we only choose 1000 nodes from this network which are deployed in a 3-D undersea region divided into 10 × 10 × 10 grids. The size of the area is 18.5 km × 14.5 km × 0.8 km. We compare our method with other two cutting edge method which also applies CS into routing. The method introduced in [31] for 3D UWSN is named as "DRMCS" which is the abbreviation of "distributed random multi-hop compressive sensing". The other method is called "ICCS" following its name in [29] and later on in [30]. We extend it to 3D scenario by changing the clustering and routing methods to 3D protocol through replacing the 2D distance (x, y) to 3D distance (x, y, z). The parameters we set for the energy consumption model in Section 4.1 are listed in Table 1 according to [43]. Before the whole simulations, we firstly select the appropriate value of parameter . It is impossible for us to obtain the signal z before the algorithm is run. To solve this problem, we do some preparatory experiments to decide an appropriate . In the first five rounds, we set = 0.9, run the experiments once, and then we obtain the reconstructed signalẑ. The reconstructed signalẑ is regarded as the approximation of the original signal z in order to compute the reconstruction error in the following simulations. That is, afterwards we ran the first five rounds with different ranging from 0.2 to 0.8, respectively. The average consuming energy and the reconstruction error are compared. For example, the results of round 1, 3 and 5 are shown in Table 2. Considering the reconstruction error and energy consumption state integrally, we found that in our algorithm = 0.4 is the most suitable. As a result, for the rest rounds, is set to be 0.4.
All experiments were performed on MATLAB 2018b and a computer with intel(R) Core(TM) i7-8700K 2.50 GHz CPU, 16.00 GB RAM and the Windows 10 system. In Figures 7 and 8, we compare the total energy consumption and the remaining energy for each round while the measurement ratio M/N is set to 0.4. Obviously our algorithm survives longer than the other two state-of-the-art methods.The first tier utilizes PSO to find the best cluster heads group while the second tier utilizes PSO to find the best routing matrix so that the whole network has the lowest average consuming energy per cycle while maintaining the sensing error in an acceptable range. Note that in the second tier, the optimization of the routing path can not only reduce the average energy cost per round but also help to make the network more balanced. In other words, besides the fact that our PSO-based routing scheme consumes less energy each round, another important reason for longer network lifetime is that our scheme obtains more balanced energy-consumption condition. The more balanced condition and less energy-consuming phenomenon of our scheme can also be proved in Figure 9, which shows the remaining living nodes of the network in each round. From Figure 9, we can observe that by our algorithm, the first dead node appears much later. The proposed algorithm has its first dead node at round more than 100 while the other algorithms has its first dead node at about round 70. Such a fact indicates that the proposed two-tier protocol consumes less energy each round as well as consumes energy more equally for each node of the whole network. The improvement attributes to the fact that in order to better balance the whole network, we add a factor named "network-balancing" in the particles' fitness value calculation. Considering that the nodes closer to the sink burden heavier transmission load and hence are easier to die, we put energy consumption ratios of inner and outer layers into consideration. Reducing the ratio can avoid repeatedly choosing inner layer nodes and balance the network energy consumption better. In order to observe the impact of node density on the network performance, we fix the total node number as 1000 and vary the volume of the area where these nodes are deployed. As shown in Figure 10, the average energy consumption in each round decreases with the increase of node density. Since the node number is fixed, the initial energy is fixed. Hence lower average energy consumption in each round means larger round cycles. This is due to the fact that we shrink the volume so that the average distances from CHs to sink node are shortened. From Figure 10, it can also be observed that compared with other two schemes, our algorithm consumes less average energy each round in the whole range of node density. To measure the delay of relevant schemes, we calculate the average hop numbers in the case of two different node densities. Average hop number is obtained by where hop i measures the hop numbers from the i-th CH to the sink node. Obviously, since the total node number is fixed and the volume is decreased, average hop numbers, i.e., delay is lower with higher node density as shown in Figure 11. On the other hand, our algorithm always presents the lowest delay compared with relevant schemes. Another performance we compare is the sensing quality which is measured by MSE between the original signal and reconstructed error. It can be observed from Figure 12 jointly with Figure 8 that our proposed algorithm tends to build a matrix Φ that leads to the smallest sensing error and meanwhile the energy consumed by our algorithm is the least in each round. Besides the previous experiments, we vary the measurements ratio M/N as well to observe the MSE with varying ratio. We test the average MSE for first 20 rounds. It can be observed that the MSE decreases with the increase of measurement ratio for each algorithm. Figure 13 validates that our algorithm presents significant advantage when the measurement ratio is less than 0.5. The reason is that we add the factor B r , which considers the reconstruction quality, into the fitness function while performing PSO for routing. Our proposed algorithm works well to reduce the MSE when the measurement ratio is low. While the measurement ratio goes up, the superiority of our proposed algorithm becomes less significant.

Conclusions and Future Work
In this paper, we proposed a two-tier Particle Swarm Optimization (PSO) algorithm in the clustering and routing stages for distributed multi-hop compressed sensing in 3-D UWSN. We divided the whole network into different segments according to the vertical distances of the nodes. On top of that, we also divided the whole network into different layers according to the horizontal distances from the nodes to the sink and the PSO algorithm we apply in the network is firmly associated with the layers. Dividing the layers help to define the energy consumption in different layers to measure energy-consuming balance and to initialize the particle hierarchically so as to accelerate the particle convergence speed. And a factor to forecast the signal recovery performance is added on the fitness function of the PSO algorithm for routing to control the MSE of the reconstructed signal while maintaining a relatively long lifetime. Our proposed algorithm is compared with recently proposed algorithms for distributed multi-hop compressive sensing routing problem using real marine temperature data rendered by NASA. The results validate that our proposed algorithm has a longer life time with lower measurement error and postpones the round that has the first dead node.
For the future work, we will focus on the synthesized consideration of the clustering and routing stage while doing optimization. In the current study, the two stages are considered separately. On the further study, we will conduct the two steps consistently to improve the performance a step further.