This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In many wireless sensor network applications, exceptions occur only rarely, so under normal conditions the data obtained by a node at successive time points are temporally correlated, while the data obtained at the same time by adjacent nodes may be spatially correlated. Transmitting data that contain this temporal and spatial redundancy wastes a great deal of node energy. Therefore, this paper proposes a data compression algorithm for wireless sensor networks based on optimal order estimation and distributed coding. By exploiting the temporal and spatial redundancy in the sensed data, the sink obtains correlation parameters through optimal order estimation. The sink then restores all data from these temporal and spatial correlation parameters, so the nodes need to transmit only a small amount of necessary data. Because redundancy is reduced, the average energy cost per node decreases and the lifetime of the wireless sensor network is extended.

Wireless sensor networks can be broadly applied in various areas such as environmental monitoring, medical care, intelligent homes, transportation, military fields,

Many new algorithms have been proposed recently, such as the self-based regression algorithm proposed by Deligiannakis, which first splits the recorded series into intervals of variable length [

Zhou and Lin have proposed a distributed spatial-temporal data compression algorithm based on a ring topology wavelet transformation [

This paper proposes a cluster optimal order estimation distributed structure tree depression algorithm (abbr. COOE-DSTD). The main idea is to bring the optimal order estimation model into data compression for wireless sensor networks, which to our knowledge has not been attempted before. The simulation results demonstrate that combining optimal order estimation with distributed coding effectively improves data compression in wireless sensor networks. Optimal order estimation determines how many groups of data the nodes must transmit for the sink to obtain the relativity parameters, so redundant parameters are avoided. At the same time, the wireless sensor network is divided into clusters, which not only makes the sink more efficient at processing data but also strengthens its ability to locate exceptions, so the monitoring ability of the whole system is improved. The second section introduces distributed coding theory and the optimal order estimation model used by COOE-DSTD. The third section gives the basic structure and flow charts of the DSTD and COOE-DSTD algorithms. The fourth section compares the performance of COOE-DSTD with that of DSTD in terms of average energy cost per node, signal-to-noise ratio and the ratio of these two quantities, and presents and analyzes the corresponding simulations. The last section offers conclusions and prospects for future work.

Data obtained by nodes in wireless sensor networks exhibit spatial-temporal correlation, which is the basis for using distributed coding in the DSTD and COOE-DSTD algorithms. COOE-DSTD additionally uses optimal order estimation. Both concepts are explained below.

Distributed coding is one kind of asymmetric coding. First let us take the situation of two nodes as an example, as

In the same way, the current data from B can be restored from the code sent by B and the data already restored from A. If the data obtained by the nodes are discrete, independent and identically distributed sequences, nodes A and B can encode separately, without knowing the relativity parameters between A and B, and their original data can still be restored as well as if they did know them. When extending from two nodes to N nodes, the sink obtains partial original data and codes from the nodes and computes the relativity parameters. Combined with these relativity parameters, the sink can then restore all original data. Distributed coding derives from the correlated data source coding theory proposed by Slepian and Wolf [

In _{H}(X, Y)_{H}(m,n)

In this paper, the acquisition of data by nodes in a wireless sensor network is treated as a stationary stochastic process. Optimal order estimation can be explained as follows. The performance of a given autoregressive model depends on the actual process, the number of samples, the estimation algorithm and the order selection criterion. The finite sample criterion gives an empirical estimate based on the statistical average of the residual energy and on the prediction variance of an autoregressive estimation algorithm, which makes its performance dependent on the adopted estimation method. Two important criteria, the finite sample information criterion (abbr. FSIC) and the combined information criterion (abbr. CIC), both take into account the behaviour of the residual energy as the model order increases [

Now we first illustrate FSIC and CIC in detail. For a practical stochastic process

We set the model of

In _{n}_{g}_{g}

If the

As the variance of ɛ̂_{n},
_{n}_{d}_{d}_{n}

Replacing

Compared with other criteria which just modify

If the optimal order of model is low, the CIC can avoid optimal order being estimated too low through a punishment factor
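To illustrate how an order selection criterion of this kind is applied, the following Python sketch fits autoregressive models of increasing order with Burg's method and picks the order that minimizes a CIC-style criterion. It is only a sketch under stated assumptions: the function names are hypothetical, the finite-sample variance coefficients are taken as v_i = 1/(N + 1 - i) (the form usually quoted for Burg estimates), and the penalty is written as the larger of the FSIC term and 3 times the sum of the v_i, which is the commonly cited form of CIC and may differ in detail from the expressions used in this paper.

```python
import math


def burg_residual_variances(samples, max_order):
    """Residual variance of AR models of order 0..max_order (Burg's method)."""
    mean = sum(samples) / len(samples)
    forward = [s - mean for s in samples]   # forward prediction errors
    backward = list(forward)                # backward prediction errors
    res = [sum(v * v for v in forward) / len(forward)]
    for _ in range(max_order):
        ff, bb = forward[1:], backward[:-1]
        num = sum(x * y for x, y in zip(ff, bb))
        den = sum(x * x for x in ff) + sum(y * y for y in bb)
        k = -2.0 * num / den if den > 0 else 0.0   # reflection coefficient
        forward = [x + k * y for x, y in zip(ff, bb)]
        backward = [y + k * x for x, y in zip(ff, bb)]
        res.append(res[-1] * (1.0 - k * k))
    return res


def select_order_cic(samples, max_order):
    """Return the order minimizing an assumed CIC-style criterion.

    Assumes every residual variance is strictly positive (noisy data).
    """
    n = len(samples)
    res = burg_residual_variances(samples, max_order)
    best_order, best_value = 0, float("inf")
    product, total = 1.0, 0.0
    for order, variance in enumerate(res):
        v = 1.0 / (n + 1 - order)            # assumed finite-sample coefficient
        product *= (1.0 + v) / (1.0 - v)     # FSIC-type penalty term
        total += v
        value = math.log(variance) + max(product - 1.0, 3.0 * total)
        if value < best_value:
            best_order, best_value = order, value
    return best_order
```

In the setting of this paper, `samples` would correspond to the groups of original data received so far by the sink, and the selected order would decide when enough groups have been transmitted.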

The basic theory used by the proposed algorithm has been described briefly above; the algorithm itself is now described in detail. Based on distributed coding theory, DSTD first lets the nodes transmit N/3 groups of original data to the sink; the sink then constructs the structure tree and computes the prediction relativity parameters and number ^{i}

Both the DSTD algorithm and COOE-DSTD algorithm adopt the same theory to get predicted correlation coefficients and the same way to establish the structure tree for getting the coding instruction

The

Because of the spatial-temporal correlation in the original data, a predictive model can be constructed in the sink; that is, the original data for the next moment can be estimated from the data already obtained. Note that in the COOE-DSTD algorithm the node clustering operation is performed after the optimal order of the estimation model has been obtained. Both the spatial and the temporal correlation of the original data obtained from nodes are related to the optimal order, so the sink can divide all nodes into clusters based on the value of the optimal order. First, every node broadcasts its residual energy to nearby nodes. Second, every node compares its own residual energy with the energy messages received from nearby nodes. Third, if a node detects that its energy is larger than that of a certain number of nodes (the number is determined by the value of the optimal order), it announces to those nearby nodes whose remaining energy is lower than its own that it is their cluster head. Some border nodes may remain after clustering; these nodes join a nearby cluster at random. In this way, nodes that are spatially correlated are naturally clustered together. The sink then computes the compression instruction for every node within a cluster. By contrast, in the DSTD algorithm the sink computes predicted correlation coefficients and compression instructions over the whole sensor network, which is more complex and costs more time and energy.
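The three clustering steps described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name `form_clusters`, the dictionary-based data layout, the threshold parameter `k` (standing in for the number derived from the optimal order) and the tie-breaking rule (a node claimed by several heads keeps the head with the highest energy) are all assumptions.

```python
import random


def form_clusters(energy, neighbors, k, seed=0):
    """Map every node to a cluster head.

    energy:    dict node -> residual energy (what each node broadcasts)
    neighbors: dict node -> list of nearby nodes that hear the broadcast
    k:         number of lower-energy neighbours needed to become a head
               (assumed to be derived from the optimal order)
    """
    rng = random.Random(seed)
    # Steps 1 and 2: each node compares its energy with its neighbours'.
    heads = [n for n in energy
             if sum(energy[m] < energy[n] for m in neighbors[n]) >= k]
    head_of = {h: h for h in heads}
    # Step 3: a head claims its lower-energy neighbours. Heads with more
    # energy claim first (assumed tie-breaking rule).
    for h in sorted(heads, key=lambda n: energy[n], reverse=True):
        for m in neighbors[h]:
            if m not in head_of and energy[m] < energy[h]:
                head_of[m] = h
    # Border nodes join a nearby cluster at random; an isolated node
    # becomes its own head.
    for n in energy:
        if n not in head_of:
            nearby = sorted({head_of[m] for m in neighbors[n] if m in head_of})
            head_of[n] = rng.choice(nearby) if nearby else n
    return head_of
```

For a chain of five nodes with energies 5, 9, 4, 8 and 3 and `k = 2`, nodes 1 and 3 become cluster heads, nodes 0 and 2 join node 1, and node 4 joins node 3.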

Supposing node j obtains some original data

In _{k}_{t}_{j})^{2}] as the difference of

Suppose discrete data

In

We can get:

In order to get the predictive coefficients that minimize E[(N_{j})^{2}], we can differentiate E[(N_{j})^{2}] with respect to the coefficients

According to the above analysis, if we get certain groups of original data (for example N/3 discussed in DSTD), we can get predictive coefficients _{j}_{j}_{j}_{j})^{2}] to realize the adjustment of _{j}
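As an illustration of adjusting predictive coefficients so as to reduce the mean squared prediction error, the sketch below uses the standard least-mean-squares (LMS) stochastic gradient update. This is a swapped-in textbook technique chosen for illustration, and it may differ from the updating formula derived in this paper; the function name `lms_update` and the step size are assumptions.

```python
def lms_update(weights, inputs, target, step=0.1):
    """One LMS step for a linear predictor.

    weights: current predictive coefficients
    inputs:  values used for prediction (for example, already restored
             readings of other nodes and past readings of this node)
    target:  the true reading, known to the sink when raw data arrive
    Returns the adjusted coefficients and the prediction error.
    """
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = target - prediction
    # Move each coefficient against the gradient of the squared error.
    new_weights = [w + step * error * x for w, x in zip(weights, inputs)]
    return new_weights, error
```

The step size must be small relative to the energy of the inputs for the update to remain stable, so in practice the readings would be normalized first.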

We can obtain
_{j}

Set:

We get:

Combining with the following two equations [

We can get:

So the updating formula of _{j}

Up to now, we can estimate
_{j}^{i}^{−1} _{n,j}_{n,j}^{i}^{−1} _{n,j}

Then we can get:

For a given probability

According to the above analysis, we can describe the DSTD algorithm as follows:

(1) The sink obtains N/3 groups of original data transmitted from all nodes and then computes the initial predicted coefficient _{j}

(2) The sink allocates a node in turn to transmit original data and computes the value of

(3) The node allocated in (2) transmits original data to the sink. Other nodes get compression order

(4) Combining the predictive coefficient _{j}_{j}

(5) If the original data of all nodes have been estimated, the sink computes the compression order for the next moment and transmits it to the nodes. If the number of groups of original data obtained by the nodes has reached N, the algorithm returns to Step (1); otherwise it returns to Step (2).

The flow chart of the DSTD algorithm is shown in

(1) All nodes transmit original data to the sink, and after every transmission cycle the sink uses CIC to judge whether the order is optimal. If the order is not optimal and the number of rounds is smaller than N/3, the nodes continue to transmit original data; otherwise the sink computes the initial predictive coefficient _{m}

(2) The value of the optimal order is related to the predicted estimation coefficients, which are associated with the spatial correlation of the original data obtained from the nodes, so the sink divides all nodes into clusters based on the value of the optimal order. The sink then allocates a cluster head for every cluster and computes the compression instruction for every node in the cluster.

(3) Nodes within a cluster apply the mod operation to the original data, based on the compression order, to obtain the compression code. The cluster head then transmits all compression codes and its own original data to the sink.

(4) Within a cluster, the sink combines the predictive coefficient _{m}_{m}

(5) If the original data of all nodes have been estimated, the sink computes the compression order for the next moment and transmits it to the clusters. If the number of groups of original data obtained by the nodes has reached N, the algorithm returns to Step (1); otherwise it returns to Step (2).
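The mod-based compression used in the steps above can be sketched as follows, assuming integer readings and a compression order of `bits` bits, so that a node sends only the remainder of its reading modulo 2^bits. The function names are hypothetical, and the rule for choosing the number of bits from a bound on the prediction error is an assumption made for illustration; the paper's own rule may differ.

```python
def bits_for_error(max_abs_error):
    """Smallest number of bits i such that 2**(i - 1) > max_abs_error."""
    return max_abs_error.bit_length() + 1


def encode(reading, bits):
    """Node side: keep only the remainder modulo 2**bits."""
    return reading % (1 << bits)


def decode(code, prediction, bits):
    """Sink side: pick the value congruent to `code` closest to the prediction.

    Decoding is exact whenever |reading - prediction| < 2**(bits - 1).
    """
    modulus = 1 << bits
    prediction = round(prediction)
    # Largest value not above the prediction with the right remainder.
    below = prediction - ((prediction - code) % modulus)
    above = below + modulus
    return min((below, above), key=lambda c: abs(c - prediction))
```

For example, with a prediction of 100 and 3 bits, a reading of 103 is sent as the code 7 and restored exactly, because the prediction error of 3 is smaller than 2^2 = 4.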

Compared with the DSTD algorithm, the COOE-DSTD algorithm introduces optimal order estimation and clustering, which decrease the dimension of the data processed in the sink. The reason is that the DSTD algorithm treats all nodes as correlated with the node to be estimated. If the wireless sensor network is large enough, remote nodes that have little correlation with the node to be estimated are included unnecessarily, which not only increases the computational complexity but also decreases the accuracy of the algorithm.

In contrast, the COOE-DSTD algorithm uses optimal order estimation to consider only those nodes that are spatially and temporally correlated with the node to be estimated, which not only decreases the computational complexity and the time delay but also increases the accuracy of the restored data. Meanwhile, the average energy cost per node is reduced because the amount of data actually transmitted from the nodes to the sink is decreased. Because COOE-DSTD divides the nodes into clusters, the sink can use the cluster structure to locate where exceptions have taken place. Because the predicted coefficients are constructed per cluster, and the data obtained by nodes in the same cluster are naturally spatially and temporally correlated, the real-time ability and expandability of the whole monitoring system are improved when the COOE-DSTD algorithm is used.

The theoretical analysis of the performance and basic principles of the proposed algorithm has been presented above; here we verify that analysis through simulation examples. As the flow chart shows, the proposed algorithm introduces optimal order estimation and the clustering operation. Compared with the DSTD algorithm, the main merits of the COOE-DSTD algorithm are as follows:

Most node energy is spent on transmitting original data. Optimal order estimation reduces the number of groups of original data that must be sent to the sink, so the average energy cost per node is reduced.

Computation is concentrated on those nodes that are correlated to some extent with the node to be estimated, and nodes that have little correlation are ignored, so the dimension of the computation is decreased and the accuracy of the data can be improved.

If the wireless sensor network is very large, then after clustering the sink can locate where exceptions have taken place through the cluster structure. Compared with searching node by node, clustering reduces time delays and improves the real-time ability of the system.

We now compare the DSTD algorithm and the COOE-DSTD algorithm through simulation examples. First we set the criteria for evaluating the performance of the algorithms. Reducing the average energy cost per node, and thereby extending the service life of the whole system, is one of the most important goals of the proposed algorithm. In addition, the peak signal-to-noise ratio is used in the field of data compression to evaluate the quality of the restored data. The COOE-DSTD algorithm aims not only to reduce the communication load, and hence the energy cost of the nodes, but also to guarantee the accuracy of the restored data, so we take the average energy cost per node, the signal-to-noise ratio and their ratio as the performance evaluation criteria. Following Wang's description, in which the Strong ARM SA-1100A is taken as an example, the total energy cost of a node can be expressed as follows [

In _{lp}_{lt}_{lr}_{rt}

In the above equation, E_{elec} = 50 nJ/bit, ε_{amp} = 100 pJ/b/m^{2}, _{rt} with

The meaning of the parameters in

In

For an ordinary processor, if the distance from node to sink is vastly larger than the distance from node to node, the energy cost of executing instructions can be ignored, so the formula of the energy cost can be simplified as follows:

Both the COOE-DSTD algorithm and the DSTD algorithm need to transmit some groups of original data to the sink in the initialization stage and only afterwards transmit compression codes, so the energy cost of a node is larger at the beginning of the algorithm than later on. It is therefore reasonable to use the average energy cost per node as a performance criterion.

The peak signal-to-noise ratio is given by:

In

Here we suppose nodes are distributed evenly in a surveillance area. The distance from node to neighbor node is 1 meter, and the distance from node to sink is 1,000 meters.
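With the parameter values quoted above (E_{elec} = 50 nJ/bit, ε_{amp} = 100 pJ/b/m^{2}) and a node-to-sink distance of 1,000 meters, the transmission energy can be estimated with the sketch below. It assumes the familiar first-order radio model, in which sending k bits over a distance d costs E_{elec}·k + ε_{amp}·k·d^{2}; this form, the function names, and the 16-bit word and 4-bit code lengths used in the example are assumptions, not values taken from the paper.

```python
E_ELEC = 50e-9      # J/bit, electronics energy quoted in the text
EPS_AMP = 100e-12   # J/bit/m^2, amplifier energy quoted in the text


def tx_energy(bits, distance_m):
    """Energy in joules to transmit `bits` over `distance_m` (assumed model)."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def average_energy_per_round(raw_rounds, coded_rounds,
                             word_bits, code_bits, distance_m):
    """Average per-round energy of a node that first sends raw words and
    afterwards sends shorter compression codes."""
    total = (raw_rounds * tx_energy(word_bits, distance_m)
             + coded_rounds * tx_energy(code_bits, distance_m))
    return total / (raw_rounds + coded_rounds)


# Hypothetical comparison over 90 rounds with 16-bit words and 4-bit codes:
# 30 raw rounds (N/3) versus 10 raw rounds after early order selection.
many_raw = average_energy_per_round(30, 60, 16, 4, 1000.0)
few_raw = average_energy_per_round(10, 80, 16, 4, 1000.0)
```

Under these assumptions, `few_raw` is smaller than `many_raw`, which is the effect the average energy cost criterion is meant to capture: fewer rounds of raw data lower the average cost per node.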

From

The reason is similar to the factors analyzed above. As the number of nodes increases, the DSTD algorithm brings in more and more data with little spatial correlation to estimate a new node value. In the COOE-DSTD algorithm, by contrast, data with little spatial correlation are discarded through optimal order estimation, so the SNR of the COOE-DSTD algorithm is always larger than that of the DSTD algorithm.
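For reference, the quality of restored data can be measured with a peak signal-to-noise ratio as in the sketch below. It uses the conventional definition, 10·log10(peak² / MSE), with the peak taken as the largest absolute original value unless given explicitly; both choices are assumptions, since the exact expression used in this paper is not reproduced here.

```python
import math


def psnr(original, restored, peak=None):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if not original or len(original) != len(restored):
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, restored)) / len(original)
    if peak is None:
        peak = max(abs(o) for o in original)  # assumed peak definition
    if mse == 0:
        return math.inf                        # perfect restoration
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the data restored by the sink are closer to the original readings.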

From

The reason is that the DSTD algorithm treats all other nodes as correlated nodes when the sink computes the compression order, so the compression code in DSTD is longer than in COOE-DSTD, which makes the energy cost per node larger in DSTD than in COOE-DSTD. In addition, the average energy cost per node shows a local minimum when the sensor network has 120 nodes. That is, when the size of a sensor network in an application is close to 120, a local optimum can be obtained by setting the number of nodes to 120 when applying the COOE-DSTD algorithm.

From

The simulation result for an increasing number of nodes is similar to the situation in

With the features of wireless sensor networks in mind, we have proposed the COOE-DSTD algorithm, which introduces optimal order estimation and clustering. Compared with the DSTD algorithm, the actual amount of computation is reduced because the optimal order is obtained, so the accuracy of the restored data is improved and the average energy cost is reduced. Clustering enables the sink to quickly locate nodes that have detected exceptions, so the time delay of the system is reduced and its scalability is improved. However, the algorithm proposed in this paper mainly deals with one-dimensional data. In future work we will try to extend the algorithm to two-dimensional data.

This paper is supported by the National Natural Science Foundation of China (NSFC60974012), the Natural Science Foundation of ZheJiang Province (Y1100054), the Key Science and Technology Plan Program of Science and Technology Department of ZheJiang Province (2008C23097), and the Science and Technology Plan Program of Science and Technology Department of Hangzhou (20091133B03).

Distributed coding.

Structure tree.

(a) The flow chart of DSTD; (b) The flow chart of COOE-DSTD.

The change of SNR with the increase of N.

The change of SNR with the increase of nodes J.

The change of AEC as N increases.

The change of AEC with the increase of nodes J.

The change of AEC/SNR with the increase of N.

The change of AEC/SNR with the increase of nodes J.