A Practical Data-Gathering Algorithm for Lossy Wireless Sensor Networks Employing Distributed Data Storage and Compressive Sensing

Reliability and energy efficiency are two key considerations when designing a compressive sensing (CS)-based data-gathering scheme. Most researchers assume that no packets are lost and therefore focus only on reducing energy consumption in wireless sensor networks (WSNs), setting reliability concerns aside. To balance the performance–energy trade-off in lossy WSNs, a distributed data storage (DDS) and gathering scheme based on CS (CS-DDSG) is introduced, which combines CS and DDS. CS-DDSG exploits the broadcast property of wireless channels to resist packet loss. Neighboring nodes receive packets under reception constraints that decrease the volume of both transmissions and receptions. The mobile sink randomly queries nodes and constructs a measurement matrix from the received data, which avoids measuring lossy nodes. Additionally, we demonstrate how this measurement matrix satisfies the restricted isometry property. To analyze the efficiency of the proposed scheme, an expression for the total number of transmissions and receptions is formulated via random geometric graph theory. Simulation results indicate that our scheme achieves high precision over unreliable links and reduces the number of transmissions, receptions and fusions. Thus, the proposed CS-DDSG approach effectively balances energy consumption and reconstruction accuracy.


Introduction
As the perceptual layer of the Internet of Things (IoT) [1,2], wireless sensor networks (WSNs) [3] are widely deployed for purposes such as environment monitoring [4], industry automation [5] and military reconnaissance [6]. WSNs consist of many sensors and play a key role in sensing and gathering data from the surrounding environment. Because of harsh environments and energy-limited nodes, there are two key considerations in WSN design: reliability and energy efficiency. In addition, nodes that are closer to the sink perform more forwarding tasks than others, resulting in higher energy consumption as well as a reduction in the lifetime of the entire network.
Compressive sensing (CS) theory [7,8] provides a new method for reducing communication energy consumption. CS theory shows that, for the compressible signals in WSNs, a small collection of linear projections is sufficient to achieve near-perfect reconstruction, which reduces energy consumption and prolongs network lifetime. Thus, a considerable amount of research has been conducted on ways to utilize CS to gather data in WSNs. The CS-based data-gathering schemes in [9][10][11] obtain the member node readings utilizing fixed routing, in which ordinary nodes forward compressed data to the static sink node over multiple hops. Lou et al. [9] and Lou et al. [10] combined CS and routing protocols to reduce the number of transmissions. In [11][12][13][14][15], the use of sparse measurement matrices is investigated to reduce the number of nodes involved in data gathering. Introducing CS effectively reduces the energy required for communication and distributes energy consumption loads more evenly. However, if a parent node (which holds a combination of child node readings) loses its packet, then all the information from the child nodes is also lost. Hence, unreliable links have a serious impact on data gathering and make it difficult to gather data reliably through a centralized sink node. Additionally, Kong et al. [16] reported that unreliable links are widespread in WSNs, where the average packet loss rate is 40-50%. Thus, assuming completely reliable links is unrealistic and oversimplifies the problem.
To resolve this problem, distributed data storage (DDS) [17][18][19] is proposed to enable reliable data gathering by employing redundancy. In contrast to a centralized sink, a mobile sink collects data from a small subset of the total nodes to recover all the data. It is worth mentioning that DDS effectively reduces the impact of packet loss on data gathering because there is no static routing, although few researchers have focused on this advantage. However, DDS requires a large number of transmission tasks to ensure sufficient redundancy, which is potentially catastrophic for nodes with energy limitations. Thus, it is imperative to investigate effective ways to apply DDS for data gathering with the dual purposes of resisting packet loss and reducing the number of transmissions.
Many studies have addressed this problem. In [20][21][22], CS is combined with DDS to exploit the advantages of both technologies. The goal of Talari et al. [20] was to reduce the number of transmissions by exploiting the spatial correlations of nodes based on CS with the broadcast properties of wireless channels. In this scheme, the nodes store received data and broadcast the data with a given probability. The performance of data reconstruction was further improved in [21]. Yang et al. [21] found that the number of receptions was higher than the number of transmissions and hence focused on reducing the total number of both transmissions and receptions simultaneously. In [22], both the spatial and temporal correlations of nodes are exploited to reduce the number of transmissions. All the above studies take advantage of broadcast routing and consider how to reduce the transmission energy cost. However, compared with fixed routing, such as tree routing and cluster routing, broadcasting data consumes more reception energy because neighboring nodes receive broadcast data whether they need it or not. For example, in [20][21][22], the neighboring nodes first receive the broadcast data and then determine whether to merge the data based on certain conditions. Consequently, broadcasting consumes a large amount of reception energy, although the received data are rarely merged. Furthermore, none of these studies consider the problem of packet loss; instead, they make the unrealistic assumption that the wireless links are completely reliable.
To address the considerations above, two challenges must be resolved. The first is how to effectively reduce the quantity of data disseminated (transmissions and receptions), especially the number of receptions rather than the number of fusions. The second is how to reduce the impact of lossy links (namely, the packet loss rate) on data reconstruction. To meet these two challenges, a distributed data storage and gathering algorithm based on compressive sensing (CS-DDSG) is proposed, utilizing CS and DDS. Relying on collected data, the mobile sink generates a sparse measurement matrix aimed at reducing communication energy consumption. Furthermore, it is proven that the measurement matrix satisfies the restricted isometry property (RIP) [23]. Based on random geometric graph theory, an expression for the total number of transmissions and receptions is formulated to analyze the energy consumption of CS-DDSG.
The remainder of this paper is organized as follows. In Section 2, we review CS theory and introduce the network model. In Section 3, we present the proposed CS-DDSG algorithm, describe the formulation of the measurement matrix and provide a proof that this matrix can satisfy RIP. Based on the proposed scheme, we formulate the expression of the total number of transmissions and receptions in Section 4. We present our simulations and their results and investigate the performance of CS-DDSG in Section 5. Finally, concluding remarks are provided in Section 6.

Preliminaries and Network Model
In this section, we introduce CS theory and then describe the network model and our motivation.

Compressed Sensing
In WSNs, assume that N sensor readings are denoted by $X = (x_1, \cdots, x_N)^T$, where $x_i$, $i \in [1, N]$, denotes the reading of node i, and that X has a K-sparse representation in a basis $\Psi \in \mathbb{R}^{N \times N}$:

$$X = \Psi\theta, \qquad (1)$$

where $\theta \in \mathbb{R}^N$ is the coefficient vector corresponding to the sparse basis $\Psi$. X is K-sparse and compressible if the vector $\theta$ has at most K ($K \le N$) nonzero coefficients or if its $(N - K)$ smallest coefficients can be ignored. We assume the measurement matrix is $\Phi \in \mathbb{R}^{M \times N}$ and is uncorrelated with the basis $\Psi$; then the CS measurements of X can be expressed as follows:

$$Y = \Phi X = \Phi\Psi\theta = \Theta\theta, \qquad (2)$$

where $M \ll N$ and $\Theta = \Phi\Psi$ is the sensing matrix. The original signal X can be reconstructed with overwhelming probability from the M measurements by $l_1$-norm minimization as follows:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_1 \quad \text{s.t.} \quad Y = \Phi\Psi\theta, \qquad \hat{X} = \Psi\hat{\theta}, \qquad (3)$$

where $\hat{X}$ denotes the reconstructed signal of X.
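The measurement and recovery steps above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it swaps the $l_1$ minimization of Equation (3) for orthogonal matching pursuit (OMP), the greedy method the paper adopts in its simulations, and it uses a random orthonormal basis and a Gaussian measurement matrix as stand-ins. The function name `omp` and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def omp(Theta, y, max_atoms, tol=1e-10):
    """Greedy OMP sketch: find a sparse theta with y ~= Theta @ theta."""
    n = Theta.shape[1]
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) < tol:
            break  # measurements are already explained
        # Column most correlated with the current residual.
        j = int(np.argmax(np.abs(Theta.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Least-squares fit on the columns selected so far.
        coef, *_ = np.linalg.lstsq(Theta[:, support], y, rcond=None)
        residual = y - Theta[:, support] @ coef
    theta_hat = np.zeros(n)
    theta_hat[support] = coef
    return theta_hat

rng = np.random.default_rng(0)
N, M, K = 64, 40, 3                                  # illustrative sizes (assumption)
Psi = np.linalg.qr(rng.standard_normal((N, N)))[0]   # stand-in orthonormal basis
theta = np.zeros(N)
theta[rng.choice(N, K, replace=False)] = rng.uniform(1.0, 2.0, K)
X = Psi @ theta                                      # Equation (1)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)       # stand-in measurement matrix
Y = Phi @ X                                          # Equation (2)
X_hat = Psi @ omp(Phi @ Psi, Y, max_atoms=10)        # recovery from M < N measurements
print(np.max(np.abs(X_hat - X)))
```

Allowing a few more atoms than K lets the least-squares step zero out an occasional wrong pick, which keeps the sketch robust without changing the idea.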

Network Model
We consider a single-sink WSN consisting of N battery-powered sensors. The sensors are deployed in a square area with a boundary length of 1. We assume all nodes have an identical transmission radius $r_t$, and that any two nodes can communicate with each other if their Euclidean distance d satisfies $d \le r_t$. To guarantee network connectivity, $r_t$ should also satisfy the following condition [24]:

$$r_t > \sqrt{\frac{S \ln N}{\pi N}}, \qquad (5)$$

where S denotes the deployment area and S = 1 × 1. Let $X_{N \times 1} = (x_1, \cdots, x_N)^T$ denote the N node readings. Since the readings are spatiotemporally correlated with each other, X can be compressed on an orthogonal basis $\Psi = (\varphi_{i,j})_{N \times N}$. The fast Fourier transform (FFT) orthonormal basis is adopted as the sparse representation basis in this paper. Let $\Phi = (\phi_{i,j})_{M \times N}$ denote the measurement matrix. The measurement vector $Y \in \mathbb{R}^{M \times 1}$ can be computed with Equation (2). The expression of $\Phi$ is introduced in Section 3. Thus, the CS-DDSG network model coincides with the CS model. In addition, we use the normalized mean absolute error (NMAE), defined in Equation (6), to evaluate the accuracy of reconstruction. The smaller the NMAE is, the better the performance the algorithm achieves.
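As a rough illustration of the accuracy metric, the sketch below computes one common form of NMAE: the sum of absolute reconstruction errors divided by the sum of absolute readings. This normalization is an assumption made for the example and may differ from the exact form of Equation (6); the function name `nmae` is likewise hypothetical.

```python
import numpy as np

def nmae(x, x_hat):
    """Assumed NMAE form: sum |x - x_hat| / sum |x| (may differ from Equation (6))."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.abs(x - x_hat).sum() / np.abs(x).sum()

perfect = nmae([20.0, 21.0, 19.0], [20.0, 21.0, 19.0])   # exact reconstruction
off_by_one = nmae([20.0, 20.0], [19.0, 21.0])            # each reading off by 1
print(perfect, off_by_one)
```

Under this assumed form, a perfect reconstruction scores 0 and larger values indicate worse recovery, matching the reading of the metric given in the text.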

Motivation
In this subsection, we investigate the impact of packet loss on CS recovery performance under fixed routing. Figure 1 presents the performance of the CDG [9] algorithm with a cluster topology over unreliable links. In this scheme, there are 100 nodes and the member nodes forward their packets to the cluster head via a one-hop route. When the packet loss rate is 10%, the recovery accuracy is worse than that over ideal links. Furthermore, increasing the number of measurements cannot improve the algorithm's performance. For M = 50 measurements, Figure 2 indicates that the accuracy declines as the packet loss rate increases.


We consider one of the clusters containing $N_1$ nodes. For the CDG algorithm with fixed routing, the cluster head receives the data vector $X_{N_1 \times 1} = (x_1, \cdots, x_i, \cdots, x_{N_1})^T$ over reliable links. The measurements Y can be represented as

$$Y = \Phi X_{N_1 \times 1}, \qquad y_k = \sum_{j=1}^{N_1} \phi_{k,j} x_j, \quad k = 1, \cdots, M. \qquad (7)$$

If the packet of node i is missing due to unreliable links, then its cluster head receives a vector $X'_{N_1 \times 1}$ in which the reading $x_i$ is absent, and the measurement $Y'$ can be represented as

$$Y' = \Phi X'_{N_1 \times 1}, \qquad y'_k = \sum_{j \ne i} \phi_{k,j} x_j. \qquad (8)$$

According to Equations (7) and (8), one missing packet affects every element of the measurement vector. The sink nevertheless recovers all the data X using $Y'$ and $\Phi$, which leads to an imprecise or invalid reconstruction. Furthermore, the accuracy is even worse under tree-based routing, because if one packet of a parent node is missing, then all the information from its child nodes is lost too. Additionally, simply increasing the number of measurements or the number of retransmissions does not help much in improving the recovery accuracy. Therefore, the CS-based algorithm is sensitive to packet loss. In the next section, we investigate how to resist unreliable links while using fewer transmissions and receptions by utilizing broadcasting properties.
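The sensitivity described above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: it assumes a dense ±1 measurement matrix and models a lost packet as a reading that contributes nothing to the projections. The cluster size, readings and index of the lost node are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N1, M = 8, 4                                   # hypothetical cluster size and measurements
x = rng.uniform(15.0, 25.0, N1)                # hypothetical readings
Phi = rng.choice([-1.0, 1.0], size=(M, N1))    # dense +/-1 matrix (assumption)

y = Phi @ x                                    # measurements over reliable links

i = 2                                          # node whose packet is lost
x_lost = x.copy()
x_lost[i] = 0.0                                # assumption: lost reading contributes nothing
y_lost = Phi @ x_lost                          # measurements actually formed

# Every measurement is shifted by phi[k, i] * x[i]; none is left intact.
changed = np.abs(y - y_lost) > 0
print(changed)
```

Because every row of a dense matrix weights node i, a single loss perturbs all M measurements, whereas the sink still decodes with the unmodified matrix.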

Procedures of CS-DDSG
Based on the network model, we propose CS-DDSG to cope with packet loss and reduce the total number of transmissions and receptions, as presented in Figure 3. The procedures involved in CS-DDSG are detailed below.
Stage 1. Initialization. The proposed scheme requires precise timing so that nodes can cooperate with each other. We assume the network is synchronized and slotted based on Reference Broadcast Synchronization (RBS) [25], which achieves high accuracy and energy efficiency. At the beginning of data gathering, each node senses a reading $x_i$ and generates a coefficient $\phi_i = 1$. Then, each node i forms an initial packet, denoted by S(i), which has two components:

$$S(i).id = \{i\}, \qquad S(i).data = \phi_i x_i. \qquad (9)$$

The component S(i).id stores node IDs and S(i).data stores the readings.
Stage 2. Broadcasting. After a fixed and sufficiently long period for synchronization and initialization, $N_s$ ($N_s < N$) nodes are randomly selected as source nodes with a probability $p_1$. The source nodes broadcast their own packets and do not receive any packets. If an ordinary node m ($m \in [1, N]$) is located within the communication range of a source node n ($n \in [1, N]$) and has not received a packet before, then node m receives the data broadcast by node n and updates its packet as follows:

$$S(m).id = S(m).id \cup S(n).id, \qquad S(m).data = S(m).data + S(n).data. \qquad (10)$$

If node m has already received any other broadcast data, then it stops receiving data; in other words, each node receives only one broadcast packet.
Stage 3. Forwarding. Only the receiving nodes from Stage 2 continue to broadcast their updated packets to neighboring nodes, each with probability $p_2$. Similarly, the neighboring nodes around the forwarding nodes receive a packet only if they have not received any prior packet. These new receiving nodes broadcast their updated packets as described above. In fact, Stage 2 and Stage 3 can start simultaneously: nodes obtain the packets of source nodes in Stage 2 and then decide whether to broadcast immediately. Thus, the neighboring nodes of those forwarding nodes can update their packets relying on the packets of either source nodes or forwarding nodes. The forwarding operation continues until there are no new receiving nodes. Because of the reception condition and the small probability $p_2$, in practice the forwarding process stops after only a few rounds, which is analyzed in detail in Section 5.
Stage 4. Visiting. The mobile sink starts the visiting phase after a fixed and sufficiently long period, which can be preset according to the number of nodes N. M nodes are randomly queried by the mobile sink to extract the corresponding information, i.e., the measurement vector Y and the measurement matrix Φ. Finally, the entire network's readings X can be reconstructed from Y and Φ based on Equation (3). The entire pseudocode of CS-DDSG is presented in Algorithms 1 and 2.
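The four stages can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the paper's Algorithms 1 and 2: packet loss is modeled as an independent drop with probability `loss` on every link, rounds are processed sequentially, plain Euclidean distance is used, and the function names `cs_ddsg_disseminate` and `sink_query` as well as the readings are hypothetical.

```python
import numpy as np

def cs_ddsg_disseminate(pos, readings, r_t, p1, p2, loss, rng):
    """Stages 1-3 (sketch). Returns ID sets, data sums, and tx/rx counts."""
    n = len(pos)
    ids = [{i} for i in range(n)]            # Stage 1: packet holds own ID ...
    data = readings.astype(float).copy()     # ... and own reading (coefficient 1)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    neighbors = (dist <= r_t) & ~np.eye(n, dtype=bool)

    is_source = rng.random(n) < p1           # Stage 2: pick sources with probability p1
    received = is_source.copy()              # sources never receive
    senders = list(np.flatnonzero(is_source))
    n_tx = n_rx = 0
    while senders:
        new_receivers = []
        for s in senders:
            n_tx += 1
            # Only neighbors that have not yet accepted a packet listen.
            for m in np.flatnonzero(neighbors[s] & ~received):
                if rng.random() < loss:
                    continue                 # packet lost on this link (assumed model)
                ids[m] = ids[m] | ids[s]     # merge IDs and data on first reception
                data[m] += data[s]
                received[m] = True
                n_rx += 1
                new_receivers.append(m)
        # Stage 3: each new receiver forwards with probability p2.
        senders = [m for m in new_receivers if rng.random() < p2]
    return ids, data, n_tx, n_rx

def sink_query(ids, data, m, rng):
    """Stage 4 (sketch): query m random nodes and build Phi and Y from their packets."""
    n = len(ids)
    queried = np.sort(rng.choice(n, size=m, replace=False))
    Phi = np.zeros((m, n))
    for k, node in enumerate(queried):
        Phi[k, list(ids[node])] = 1.0
    return Phi, data[queried]

rng = np.random.default_rng(0)
N = 400
pos = rng.random((N, 2))                     # unit-square deployment
x = rng.uniform(15.0, 25.0, N)               # hypothetical readings
ids, data, n_tx, n_rx = cs_ddsg_disseminate(pos, x, 0.075, 0.2, 0.32, 0.3, rng)
Phi, Y = sink_query(ids, data, 50, rng)
print(n_tx, n_rx, np.allclose(Phi @ x, Y))
```

In this sketch a lost packet simply leaves the would-be receiver untouched, so every stored sum matches the IDs stored with it; this is why the final check holds regardless of the loss rate.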

Selection of Parameters
In this subsection, we investigate the values of the parameters $r_t$ and $p_2$. We consider a network with N = 400 nodes, which are randomly deployed over an area of size S = 1 × 1. As described in Section 2, to ensure network connectivity, $r_t$ must satisfy the condition in Equation (5). Thus, $r_t > 0.069$; we set $r_t = 0.075$.
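The threshold of 0.069 can be checked with a one-line calculation. The sketch below assumes that the connectivity condition takes the common form $r_t > \sqrt{S \ln N / (\pi N)}$, which reproduces the stated value for N = 400 and S = 1; the function name `min_radius` is hypothetical.

```python
import math

def min_radius(n_nodes, area=1.0):
    # Assumed form of the connectivity bound: sqrt(S * ln N / (pi * N)).
    return math.sqrt(area * math.log(n_nodes) / (math.pi * n_nodes))

bound = min_radius(400)
print(round(bound, 3))   # 0.069
print(0.075 > bound)     # True: the chosen r_t satisfies the bound
```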
In Stage 3 of CS-DDSG, nodes forward their updated packets with a probability $p_2$ and all neighboring nodes can receive these data. To find a value of $p_2$ that keeps the number of transmissions $N_t$ low while keeping the proportion of receiving nodes $P_r$ high, we simulate $N_t$ and $P_r$ versus $p_2$ with $p_1 = 0.2$ and $r_t = 0.075$, as shown in Figure 4, where all normal nodes stop receiving data after merging one packet. As Figure 4 shows, as $p_2$ increases, the values of $N_t$ and $P_r$ both increase. Furthermore, $P_r$ increases almost linearly with $p_2$. Thus, when $p_2 = 0.32$, 98% of the nodes receive a broadcast packet. Beyond $p_2 = 0.32$, $P_r$ has little room left to grow, while $N_t$ keeps increasing. Therefore, the appropriate value for $p_2$ is 0.32, because that value provides a balanced trade-off between the number of transmissions and the percentage of receiving nodes.



Measurement Matrix Formulation
In this subsection, we present the formulation procedure for the measurement matrix. As introduced above, in Stage 4 the mobile sink queries M nodes, denoted by $n_{i_1}, n_{i_2}, \cdots, n_{i_k}, \cdots, n_{i_M}$, with $i_1 < i_2 < \cdots < i_M$ and $i_k \in [1, N]$, and the measurement matrix $\Phi$ is constructed from the M packets. Let $\Omega_k$ be the set of node IDs stored in the k-th queried packet:

$$\Omega_k = S(n_{i_k}).id. \qquad (11)$$

Initially, $\Phi$ is an all-zero M × N matrix; it is then filled in according to Equation (12):

$$\phi_{k,j} = \begin{cases} 1, & j \in \Omega_k, \\ 0, & \text{otherwise}. \end{cases} \qquad (12)$$

For example, assume there are five nodes in the network (i.e., N = 5). If the mobile sink queries two nodes (i.e., M = 2), then $\Phi$ can initially be expressed as follows:

$$\Phi = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}. \qquad (13)$$

Suppose that nodes 2 and 4 are selected by the sink, and their packet components are as follows:

$$S(2).id = \{2, 5\}, \quad S(2).data = x_2 + x_5; \qquad S(4).id = \{1, 4\}, \quad S(4).data = x_1 + x_4; \qquad (14)$$

then $\phi_{1,2} = \phi_{1,5} = 1$ and $\phi_{2,1} = \phi_{2,4} = 1$. Finally, the matrix $\Phi$ becomes:

$$\Phi = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (15)$$

Moreover, the measurement vector Y is expressed as follows:

$$Y = \begin{pmatrix} x_2 + x_5 \\ x_1 + x_4 \end{pmatrix}. \qquad (16)$$

Clearly, $\Phi$ is a sparse matrix whose sparsity degree is influenced by the packet loss rate p and by $p_1$ and $p_2$. Furthermore, Equation (12) indicates that $\Phi$ is constructed from the gathered data alone, which precludes the need to measure lost data. Thus, Y is not influenced by lost packets. Therefore, CS-DDSG is resistant to packet loss.
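The five-node example can be replayed in code. The sketch below builds $\Phi$ from the ID sets of the two queried packets and checks that $\Phi X$ reproduces the stored sums; the helper name `build_phi` and the numeric readings are hypothetical.

```python
import numpy as np

def build_phi(id_sets, n_nodes):
    """Row k gets a 1 in every column whose (1-based) node ID is in the k-th packet."""
    phi = np.zeros((len(id_sets), n_nodes), dtype=int)
    for k, omega in enumerate(id_sets):
        for j in omega:
            phi[k, j - 1] = 1
    return phi

# ID sets of the packets held by nodes 2 and 4 in the example.
packets = [{2, 5}, {1, 4}]
phi = build_phi(packets, 5)
x = np.array([10, 20, 30, 40, 50])   # hypothetical readings x_1..x_5
y = phi @ x                          # (x_2 + x_5, x_1 + x_4)
print(phi)
print(y)                             # [70 50]
```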

Does the Measurement Matrix Satisfy RIP?
The structure of the measurement matrix $\Phi$ is random and depends on the receiving nodes. Thus, CS-DDSG avoids measuring the lost nodes and is not affected by the lost packets. The question is whether $\Phi$ obeys RIP so that CS theory can be applied. Unfortunately, verifying the RIP of a given matrix is an NP-hard problem. However, Yang et al. [21] reported that recovery performance can be guaranteed with high probability when the rows of the measurement matrix are linearly independent. We investigate this proposition below.
The rows of Φ are linearly dependent when one of the following two situations occurs.

Case 1.
Any row $\phi_k$ can be expressed as a linear combination of other rows.
Proof. The measurement coefficient is 1; thus, if $\phi_k$ can be expressed as a linear combination of rows $\phi_{k_1}, \cdots, \phi_{k_q}$, $q = 2, \cdots, N - 1$, they satisfy the following:

$$\phi_k = \sum_{i=1}^{q} \phi_{k_i}. \qquad (17)$$

Let $I_k = \{j \mid \phi_{k,j} \ne 0\}$ and $I_{k_i} = \{j \mid \phi_{k_i,j} \ne 0\}$; if the condition of Equation (17) is satisfied, then $I_k = \cup_{i=1}^{q} I_{k_i}$ and we can obtain

$$|I_k| = \left| \cup_{i=1}^{q} I_{k_i} \right|, \qquad (18)$$

where $|\cdot|$ denotes the number of elements in the set. Equation (17) could be satisfied only in one of the following two situations. The first situation would occur if node k were to receive packets from nodes $k_1, \cdots, k_q$ and merge their packets. However, this contradicts the reception condition under which each node receives only one packet. Thus, this situation cannot occur. The second situation would occur when node $k_2$ receives a packet from node $k_1$, node $k_3$ receives a packet from node $k_2$, and so on, so that node $k_q$ receives a packet from node $k_{q-1}$ and, finally, node k receives the packet from node $k_q$. According to the condition in Equation (10), $I_{k_q}$ satisfies the following:

$$I_{k_q} = \{k_q\} \cup I_{k_{q-1}}. \qquad (19)$$

After node k updates its packet, $I_k$ satisfies:

$$I_k = \{k\} \cup I_{k_q}. \qquad (20)$$

Clearly, $k \in I_k$ but $k \notin I_{k_q}$. Thus, $I_k = \{k\} \cup \left( \cup_{i=1}^{q} I_{k_i} \right) \ne \cup_{i=1}^{q} I_{k_i}$, and Equation (17) is false. Consequently, it can be concluded that no row can be linearly expressed by other rows.

Case 2.
Any two rows $\phi_i$ and $\phi_j$ are linearly dependent.
Proof. Because every nonzero coefficient equals 1, $\phi_i$ and $\phi_j$ are linearly dependent if and only if they are precisely the same. However, according to the reception condition, each node receives only one packet and merges it with its own unique packet. Therefore, although node i and node j may receive the same broadcast packet from a common neighboring node, their packets will still be different. Therefore, no two rows are linearly dependent.
In conclusion, the rows of the measurement matrix $\Phi$ are linearly independent; consequently, in CS-DDSG, X can be reconstructed from Y with a very high probability.
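The independence argument can be illustrated on a small, hypothetical reception history. In the sketch below, source node 1 is heard by nodes 2 and 4, and node 2 is later heard by node 3; the rows are the ID sets those nodes would hold under the one-packet reception rule. One way to read the result: ordering rows by reception time puts each node's own ID in a column that no earlier row uses, so the matrix is triangular with a unit diagonal.

```python
import numpy as np

# Hypothetical ID sets after dissemination (1-based node IDs), ordered by reception time.
id_sets = [{1}, {1, 2}, {1, 2, 3}, {1, 4}]
n_nodes = 4

Phi = np.zeros((len(id_sets), n_nodes))
for k, ids in enumerate(id_sets):
    Phi[k, [j - 1 for j in ids]] = 1.0

rank = np.linalg.matrix_rank(Phi)
print(rank == len(id_sets))   # True: the rows are linearly independent
```

Nodes 2 and 4 both merged the same source packet, yet their rows differ in the column of their own ID, which is the situation discussed in Case 2.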

Formulating the Expression of the Total Number of Transmissions and Receptions
Compared with the mainstream algorithms [15,20,21], the proposed scheme CS-DDSG reduces the number of transmissions and receptions rather than the number of fusions. In this section, we formulate the total number of transmissions $N_{Ttot}$ and receptions $N_{Rtot}$ based on the random geometric graph (RGG) model [26] and the torus convention [27] to investigate the efficiency in reducing $N_{Ttot}$ and $N_{Rtot}$.
According to Section 3, $N_{Ttot}$ and $N_{Rtot}$ can be expressed as follows:

$$N_{Ttot} = N_t^P + \sum_{q=1}^{q^*} N_t^q, \qquad (21)$$

$$N_{Rtot} = N_r^P + \sum_{q=1}^{q^*} N_r^q, \qquad (22)$$

where $N_t^P$ and $N_r^P$ denote the number of transmitting and receiving nodes in Stage 2, respectively, and $N_t^q$ and $N_r^q$ denote the corresponding numbers in the q-th forwarding of Stage 3.
When $N_t^{q^*} = N_r^{q^*-1} \cdot p_2 \le 0$, no node forwards packets and the forwarding process is completed.
Additionally, $N_r^0 = N_r^P$ and $N_t^0 = N_s$. Next, we formulate the expressions of $N_r^P$, $N_t^q$ and $N_r^q$.
Formulating $N_r^P$
Proposition 1. The number of receptions in Stage 2, $N_r^P$, is given by Equation (23).
Proof. According to the procedures of Stage 2, $N_r^P$ equals the number of neighboring nodes around all the source nodes, $N_{s,nei}$, minus the number of nodes $N_{r2}$ located in the overlapping communication region of two source nodes. The subtraction is needed because each node receives just one packet, whereas the nodes in an overlapping region are counted twice in $N_{s,nei}$. Thus, $N_r^P$ can be represented as follows:

$$N_r^P = N_{s,nei} - N_{r2}. \qquad (24)$$

The average number of neighboring nodes for all source nodes, $N_{s,nei}$, is given by Equation (25). In Figure 5, the red circle denotes the communication region and $S_2$ represents the shaded area jointly covered by the two source nodes; A and B are the two intersection points. When the distance between two source nodes $d(O, O')$ satisfies $0 < d(O, O') \le 2r_t$, $N_{r2}$ exists. The probability $p_L$ that two source nodes satisfy this condition is given by Equation (26). Among the $N_s$ source nodes, an average of $N_L$ node pairs satisfy the condition in Equation (26) (i.e., the communication regions of $N_L$ source node pairs overlap). The expressions for $N_L$ and $N_{r2}$ are given by Equations (27) and (28), respectively. Because the nodes are uniformly distributed and $0 < d(O, O') \le 2r_t$, the probability $p\{d \le x\}$ is equal to

$$p\{d \le x\} = \frac{\pi x^2}{\pi (2r_t)^2} = \frac{x^2}{4r_t^2}. \qquad (29)$$

Thus, the probability density function (PDF) $f_1(x)$ is

$$f_1(x) = \frac{x}{2r_t^2}, \quad 0 < x \le 2r_t. \qquad (30)$$

In this case, the area $S_2/2$ equals the area of sector OAB minus the area of triangle OAB:

$$\frac{S_2}{2} = r_t^2 \arccos\frac{x}{2r_t} - \frac{x}{2}\sqrt{r_t^2 - \frac{x^2}{4}}. \qquad (31)$$

Thus, the expected area of $S_2$ is calculated as follows:

$$\bar{S}_2 = \int_0^{2r_t} S_2(x) f_1(x)\,dx. \qquad (32)$$

Combining Equations (27), (28) and (32), $N_{r2}$ can be formulated as in Equation (33). Finally, we substitute Equations (25) and (33) into Equation (24) to obtain the representation of $N_r^P$ in Equation (34).
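The geometric step of this proof can be checked numerically. The sketch below evaluates the overlap area of two circles of radius $r_t$ as twice the sector-minus-triangle area, and averages it over the distance between the centers. It assumes the density $f_1(x) = x / (2r_t^2)$ on $(0, 2r_t]$, which is what a uniform node distribution implies for that range; under this assumption the average works out to $\pi r_t^2 / 4$. The function names are hypothetical.

```python
import math

def lens_area(x, r):
    """Overlap area of two radius-r circles whose centers are x apart (0 <= x <= 2r)."""
    sector = r * r * math.acos(x / (2.0 * r))
    triangle = 0.5 * x * math.sqrt(r * r - x * x / 4.0)
    return 2.0 * (sector - triangle)

def expected_lens_area(r, steps=20000):
    """Midpoint-rule integral of lens_area(x) * f1(x), with assumed f1(x) = x / (2 r^2)."""
    h = 2.0 * r / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += lens_area(x, r) * x / (2.0 * r * r) * h
    return total

r_t = 0.075
numeric = expected_lens_area(r_t)
closed_form = math.pi * r_t ** 2 / 4.0
print(numeric, closed_form)
```

With coincident centers the overlap equals the full disc area $\pi r_t^2$, and it shrinks to zero as the distance approaches $2r_t$, which is a quick sanity check on the sector-minus-triangle expression.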

Formulating $N_r^q$
Next, we formulate the expression of $N_r^q$, the number of receptions for Stage 3 in the q-th forwarding, which is expressed in Equation (35) in terms of the components derived below. As shown in Figure 6, the nodes in the shaded area $S_3$ receive the packet. Thus, $N_{r1}^q$ is calculated as in Equation (36), where $S_3$ is expressed by Equation (37). Because the nodes are uniformly distributed and $0 < d(O, O') \le r_t$, the probability $p\{d \le x\}$ equals

$$p\{d \le x\} = \frac{\pi x^2}{\pi r_t^2} = \frac{x^2}{r_t^2}. \qquad (38)$$

Thus, the PDF $f_2(x)$ is

$$f_2(x) = \frac{2x}{r_t^2}, \quad 0 < x \le r_t. \qquad (39)$$

Combining Equations (31), (37) and (39), we obtain Equation (40) and, from it, Equation (41).

Calculating $N_{r2}^q$
Next, we formulate the expression of $N_{r2}^q$. As presented in Figure 7a, the value of $N_{r2}^q$ is the number of receiving nodes in area $S_5$, as expressed in Equation (42), where $S_4$ denotes the expected area of the black region, $S_5$ denotes the expected area of the shaded region, and $p_L$ denotes the probability that the distance between two nodes satisfies $0 \le d(O, O') \le r_t$. The corresponding expressions are given in Equations (43) and (44). According to the method in [28], we can obtain the approximate value of $S_5$, i.e., $S_5 \approx S_{5,\max} = \pi r_t^2 / 6$. Finally, combining Equations (43), (44) and (47), we obtain the expression of $N_{r2}^q$.

Calculating $N_{r3}^q$
The expression of $N_{r3}^q$ is similar to that of $N_{r2}^q$ and is given in Equation (49). As shown in Figure 7b, because $r_t \le d(O, O') \le 2r_t$, $S_6$ is calculated as in Equation (50), where the integral term in $x \arcsin\frac{x}{2r_t}$ evaluates to $\frac{\pi}{24} r_t^2 - \frac{\sqrt{3}}{8} r_t^2$, and $S_7$ is expressed in terms of $S_{12}$, the intersection area of circle O and circle O′ when $r_t \le d(O, O') \le 2r_t$. Combining Equations (49), (50) and (53), we obtain the expression of $N_{r3}^q$.
As illustrated in Figure 8, the two black circles denote the communication ranges of two transmitting nodes, $n_{t1}^q$ and $n_{t2}^q$, in the q-th forwarding, and the red circle denotes a communication region. Here, $S_8$ and $S_{10}$ denote the areas of the black regions and $S_9$ and $S_{11}$ denote the areas of the shaded regions. Comparing Figures 7 and 8 yields the relations between these areas, from which the expressions of $N_{r4}^q$ and $N_{r5}^q$ follow (Equation (59)). In conclusion, by combining Equation (35) with the component expressions above, we obtain the final expression of $N_r^q$ in Equation (60).

The Formulation of $N_{Ttot}$ and $N_{Rtot}$
Theorem 1. Assume that all N sensor nodes are deployed randomly and uniformly in a WSN over a square area with a boundary length of 1, and that each node has a transmission range of $r_t$. If we gather data based on the CS-DDSG scheme, then $N_{Ttot}$ and $N_{Rtot}$ are given by Equations (61) and (62), respectively. The expression for $N_r^P$ is given in Equation (34).
Proof. As presented in the above derivation, we can obviously obtain Equation (61) based on Equation (21)

Performance Evaluation and Analysis
To evaluate the effectiveness of CS-DDSG, we ran simulations in MATLAB 2012b. The simulation parameters were set as shown in Table 1. Furthermore, we adopted the FFT orthonormal basis and the orthogonal matching pursuit (OMP) method for the reconstruction algorithm. We used real sensor readings extracted from the GreenOrbs [29] system.

Table 1. Simulation parameters.
Parameter    Description                               Value
p_2          Probability of forwarding in Stage 3      0.32
r_t          Communication radius                      0.075

In this paper, we compare the performance of CS-DDSG, Compressive Sensing Data storage (CStorage) [20], Improved CStorage (ICStorage) [21], Compressed Network Coding based Distributed data Storage (CNCDS) [21] and Direct Cluster-Based Compressive Sensing Data Collection (DCCS) [15] on unreliable links. The first four schemes all combine DDS and CS to gather data. CStorage, ICStorage and CNCDS are concerned with reducing the number of transmissions and fusions. In CStorage, intermediate nodes accept a broadcast packet the first time they receive it and then forward the received packet with a given probability. The intermediate nodes in ICStorage forward their own readings rather than the received source node readings. In the CNCDS scheme, the intermediate nodes accept broadcast packets only if the receiving node does not share any node IDs with the corresponding transmitting node. We also analyze the numbers of transmissions, receptions and fusions involved in the first four algorithms. DCCS combines CS and a cluster topology to reduce the total power consumption without considering the packet loss rate. All member nodes gather data and transmit them to cluster heads, where the CS measurements and measurement matrices are generated and sent to the sink directly. Additionally, we discuss the impact of the packet loss rate, the number of measurements and the proportion of source nodes on the performance of CS-DDSG. The simulation results shown are the average values from 1000 runs.
First, we evaluate the performance on unreliable links when $p_1 = 0.3$ and M = 50, as shown in Figure 9. It can be seen that: (1) As p increases, the reconstruction accuracy of all the algorithms decreases in Figure 9a. When $p \le 0.6$, the NMAEs of the four algorithms are stable and increase only gradually, which indicates that CS-DDSG is effective at resisting packet loss. Although the packet loss rate affects the nodes receiving broadcast packets, the sink still gathers enough packets to recover the data. In addition, the sink constructs the measurement matrix based on received packets, which avoids the need to measure the lost nodes and reduces the impact of unreliable links on the measurement vector Y. However, the performance of DCCS deteriorates as p increases: the sink cannot identify the lossy nodes and still reconstructs data based on the original measurement matrices. Thus, DCCS is sensitive to p. (2) CS-DDSG outperforms the other algorithms. This improved performance occurs because, in CS-DDSG, each node receives only one packet, which is broadcast by one of its neighboring nodes. Thus, the measurement vectors have a strong spatial correlation, which is utilized by CS to recover the data. However, in the other algorithms, nodes fuse packets from distant nodes as long as the reception condition is satisfied, which leads to a weak spatial correlation of the measurement vectors.
We present the total number of transmissions, receptions and fusions of the four algorithms in Figure 10. Furthermore, we investigate the proportion of receptions that result in a fusion. As presented in Figure 11, only 41% of the receiving nodes in CNCDS merge the received packets, and the authors of [21] consider that only these 41% of nodes consume energy. In fact, the remaining 59% of the receiving nodes also consume energy, because a node must first receive the broadcast packet and then determine whether the CNCDS condition is satisfied; the received packet is merged only if it satisfies the condition. Thus, energy is consumed even when the received packet is not fused. However, [21] reports the number of receptions as equal to the number of fusions, which undercounts the receptions. Similarly, 46% and 48% of the receiving nodes in CStorage and ICStorage merge the packets, respectively. In CS-DDSG, all received packets are fused and no redundancy occurs because the nodes receive packets only once. Thus, the energy consumption of the receiving nodes in CS-DDSG is much smaller than that in the other algorithms. In conclusion, CS-DDSG effectively reduces the number of both transmissions and receptions.
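As a quick worked check of the fusion proportions quoted above, the following Python sketch converts each proportion into the number of receptions paid for per fusion. The percentages are taken from the text; the helper name `receptions_per_fusion` is only illustrative, and the calculation assumes every reception costs the same energy whether or not it ends in a fusion.

```python
# Share of receptions that end in a fusion, as quoted for Figure 11.
fusion_share = {
    "CNCDS": 0.41,
    "CStorage": 0.46,
    "ICStorage": 0.48,
    "CS-DDSG": 1.00,
}

def receptions_per_fusion(share):
    """Receptions a scheme performs for every packet it fuses."""
    return 1.0 / share

for name, share in fusion_share.items():
    ratio = receptions_per_fusion(share)
    print(f"{name}: {ratio:.2f} receptions per fusion, "
          f"{1 - share:.0%} of receptions not fused")
```

Under that assumption, counting only fusions would understate the reception cost of CNCDS by a factor of about 2.4, while for CS-DDSG the two counts coincide.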
Figure 12 presents the number of fusions and receiving nodes during each forwarding round when p1 = 0.15. The forwarding process of CS-DDSG repeats five times, until no node remains to accept the broadcast packets, while CNCDS, CStorage and ICStorage repeat six, nine and twelve times, respectively. CS-DDSG therefore converges fastest, which follows from its reception condition being the strictest. Moreover, most of the data fusion occurs during Stage 2, and the number of fusions subsequently decreases rapidly in Stage 3, except in ICStorage.
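The round-by-round behavior described above can be mimicked with a toy simulation on a random geometric graph. This is a sketch under stated assumptions, not the authors' simulator: nodes are placed uniformly in a unit square, the node count, forwarding probability `p2` and loss rate `loss` are hypothetical, a node that has already sent or accepted a packet ignores later ones, and the order in which senders are processed stands in for arrival order. The radius 0.075 and p1 = 0.15 are taken from the text.

```python
import numpy as np

def simulate_rounds(n=500, radius=0.075, p1=0.15, p2=0.3, loss=0.2, seed=0):
    """Return per-round (transmissions, receptions) and per-node receive counts."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n, 2))                       # nodes in the unit square
    diff = pos[:, None, :] - pos[None, :, :]
    neighbors = np.linalg.norm(diff, axis=2) <= radius
    np.fill_diagonal(neighbors, False)

    senders = rng.random(n) < p1                   # source nodes
    done = senders.copy()                          # already sent or accepted
    recv_count = np.zeros(n, dtype=int)
    history = []
    while senders.any():
        accepted = np.zeros(n, dtype=bool)
        for s in np.flatnonzero(senders):
            # Only neighbors that have not yet accepted a packet listen.
            open_nodes = np.flatnonzero(neighbors[s] & ~done & ~accepted)
            got = open_nodes[rng.random(open_nodes.size) >= loss]
            accepted[got] = True
            recv_count[got] += 1
        history.append((int(senders.sum()), int(accepted.sum())))
        done |= accepted
        # Each new receiver forwards its fused packet with probability p2.
        senders = accepted & (rng.random(n) < p2)
    return history, recv_count

history, recv_count = simulate_rounds()
for round_no, (tx, rx) in enumerate(history, start=1):
    print(f"round {round_no}: {tx} transmissions, {rx} receptions (all fused)")
```

Because every accepted packet is fused and each node accepts at most once, receptions and fusions coincide in this model, and the loop stops as soon as a round produces no new forwarders.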
In Figure 13, we investigate the recovery performance of the algorithms when p1 = 0.3 and the number of measurements M queried by the mobile sink ranges from 15 to 150. As M increases, the recovery accuracy of ICStorage, CStorage, CNCDS and CS-DDSG improves and is roughly equivalent, while CS-DDSG becomes slightly better when M ≥ 100. The improvement occurs because the more information that is gathered, the better the reconstruction accuracy. According to Equation (12), the sink constructs the measurement matrix Φ based on the packets fused by the forwarding nodes. The forwarding nodes of CS-DDSG receive only one packet, so Φ is sparser than in the other algorithms. Consequently, less information is gathered and fewer nodes contribute to data recovery for CS-DDSG. However, as M increases, more information is gathered and the gaps separating the four algorithms shrink. When M > 100, CS-DDSG outperforms the other DDS-based algorithms due to the strong spatial correlation of the measurement vector. Moreover, the reconstruction accuracy of DCCS is the best when M is large enough and there is no packet loss, because all nodes in DCCS participate in gathering data and DCCS adopts dense measurement matrices within clusters, so more information is gathered. In addition, the performance tends to stabilize as M increases.
Figure 14 shows the performance of CS-DDSG under different packet loss ratios p and probabilities p1 when M = 40. As p increases, the value of NMAE remains stable, i.e., NMAE ≈ 0.014. This result indicates that CS-DDSG effectively resists packet loss and maintains high reconstruction accuracy even when unreliable links exist. Additionally, its accuracy is not influenced by p1 due to the very sparse measurement matrix.
Finally, we investigate how the proportion of source nodes p1 affects the recovery accuracy in Figure 15.
The simulation results show the following: (1) When p1 = 0.4, the value of NMAE decreases as M increases because more nodes participate in data reconstruction. (2) When M is fixed, the performance of CS-DDSG improves as p1 varies from 0 to 0.6, and the NMAE curves for the different values of p1 follow very similar trends. This effect occurs because, when there are more source nodes, more nodes receive broadcast packets before the sink obtains the data. Hence, the amount of information used for reconstruction increases. However, due to the reception condition, the measurement matrix Φ is sparse, so the gain in information is limited. As a result, the NMAE trends are close to one another for the different values of p1.
Figure 15. Performance of CS-DDSG with different values of p1 and M.

Conclusions
In this paper, we investigate the data-gathering problem in lossy WSNs and propose CS-DDSG, a simple but efficient algorithm that combines CS theory and DDS. Compared with other related mainstream strategies, CS-DDSG balances energy consumption and reconstruction performance effectively. In our proposed algorithm, nodes are selected as source nodes with probability p1 and broadcast their packets. The nodes neighboring the source nodes receive the broadcast packets and update their own packets, which they then broadcast with probability p2. Subsequently, all receiving nodes forward their updated packets with probability p2. The process is repeated a few times until there are no receiving nodes left. Each receiving node receives only one packet. In this way, the numbers of transmissions and fusions are reduced while the CS reconstruction accuracy is maintained. Moreover, an expression for the total number of transmissions and receptions is formulated via random geometric graph (RGG) theory. The simulation results and analysis show that CS-DDSG outperforms the other algorithms on unreliable links.
In addition, we investigate how the number of measurements M, the packet loss rate p and the probability p1 influence the performance of CS-DDSG. In future research, we plan to explore the temporal correlations of node readings. Another potential extension of this work is to demonstrate more rigorously that the measurement matrix satisfies the RIP.