
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (

Energy is an important consideration in wireless sensor networks. Current compression evaluations still rely on traditional indices, while energy efficiency is often neglected. Moreover, various evaluation biases significantly affect the final results. Together, these factors make the evaluation subjective. In this paper, a new criterion is proposed and a series of tunable compression algorithms are reevaluated. The results show that the new criterion makes the evaluation more objective. It also identifies the situations in which compression is unnecessary. Based on the evaluation results, a new adaptive compression arbitration system is proposed, which improves the performance of compression algorithms.

Wireless sensor networks (WSNs), a new network structure, have received continuous attention over the past ten years. Sensor networks emerged in the 1990s as a fundamentally new tool for military monitoring; they are now widely used in application fields such as agriculture, ecosystems, medical care and smart homes, especially in regions that are inaccessible or unattended. By virtue of their essential role in data collection, WSNs connect the physical environment with human beings [

Generally, each sensor node transmits monitoring data over its corresponding path to the sink. Since the nodes are battery-operated and no fixed infrastructure exists, energy becomes the primary concern in such networks. Moreover, the number of nodes in a WSN can be extremely large, so it is prohibitively difficult to replace or recharge them to extend the operational lifetime of the network. Thus, energy efficiency is considered the major metric, and it significantly impacts network performance. Many advances have been made with the purpose of enhancing network lifetime [

Among different applications, continuous data collection for environmental monitoring is relatively popular [

Many compression methods have been designed specifically for sensor networks. However, it is difficult to get proper advice about which one is more suitable for a given application. The lack of research on data compression evaluation and the corresponding criteria makes it hard to provide efficient guidelines for both algorithm design and application. Besides, various kinds of evaluation bias tend to produce inaccurate conclusions, which in turn lead to wrong choices.

In this paper, we study current compression algorithms for WSNs, and propose a novel evaluation criterion which is more applicable for them. The main contributions of our work are threefold:

First, a new evaluation criterion is presented that focuses on the energy efficiency of compression implemented in the sensor nodes. Since energy consumption is one of the most important design metrics in WSNs, this criterion is well suited to such compression evaluation and provides useful suggestions during both design and application.

Second, current tunable compression algorithms aimed at WSNs are reevaluated in depth at the node level and the network level. Various kinds of real datasets are adopted, which cover almost all types of environmental data. Evaluation results based on our criterion and on several traditional indices are compared while avoiding the different kinds of evaluation bias.

Third, based on the results, a novel compression arbitration system is proposed to enhance the performance of compression algorithms by avoiding unnecessary energy losses. Furthermore, several design considerations for compression are discussed. We suggest that the design concept of compression algorithms should be changed because of the particular nature of WSNs.

The remainder of the paper is structured as follows. Section 2 discusses the related work on both compression algorithms and evaluation methods. Several aspects that impact evaluation results are analyzed. Section 3 presents the principle of evaluation and defines the new criterion. Experiment setup and the methodology are described in Section 4 with the results and corresponding discussions given in Section 5. A new compression arbitration system is presented in Section 6 and Section 7 offers a summary to conclude the paper.

Data compression is a traditional technology used in digital communication, broadcasting, storage, and multimedia systems. When applied to WSNs, compression faces new challenges. Although a number of algorithms have been proposed for WSNs to date [

Traditional compression is used with the purpose of improving the performances in communication time, transmission bandwidth and storage space. Various evaluation indices are defined. Some of them are extended for use in WSNs:

(1) Compression ratio

Compression ratio is one of the most important design indices in data compression. It directly describes the compression effect of an algorithm, and is defined as the ratio of the volume of the compressed data to that of the raw data. Based on it, the improvements in communication time, transmission bandwidth and storage space can be quantitatively measured.

In WSNs, compression ratio is also considered as one of the major evaluation criteria. Since it can indicate the reduction of communication energy costs, researchers prefer to show the exciting results produced by their new algorithms [

(2) Compression error

Compression error is another important criterion with various expression forms, such as RMS (Root Mean Square) error, peak error, SNR (Signal to Noise Ratio), and so on. It describes the degree of information loss after compressing.

In WSNs, lossy compression is much more popular due to the better compression ratio. Thus, compression error is unavoidable [

(3) Compression complexity

Compression complexity includes space complexity and time complexity, which represent the costs of hardware resources and execution time in data compressing, respectively. Lower space complexity means less memory occupation required; lower time complexity incurs shorter delays.

Nevertheless, compression complexity has not been seriously considered in WSNs. It is accepted that algorithms with high complexity are unsuitable for sensor nodes with restricted capabilities. Therefore, complexity is treated more like a qualitative criterion. Users pay more attention to the feasibility of the algorithm than to the real costs of storage and time. Only in some specific applications has compression complexity been quantitatively investigated [

In short, researchers still prefer to use traditional standards for data compression evaluation in WSNs. Compression ratio is the main criterion for choosing a more satisfactory algorithm. However, as mentioned above, saving energy is the fundamental purpose in sensor networks. Each criterion listed above only partially reflects energy information. Thus, a new criterion is urgently needed for WSNs, even though the existing ones do well in traditional compression evaluation.

During compression evaluation, several kinds of bias directly influence the results. Among them, data bias and execution bias are the two main ones.

Data bias appears when non-uniform experimental datasets are used for the comparison of algorithms. It is well known that datasets with different characteristics produce greatly different test results. For instance, data with higher redundancy tend to yield a lower compression ratio. So, it is difficult to distinguish which factor improves the observed compression performance: the test data or the algorithm itself.

Unfortunately, data bias is ubiquitous in compression evaluation. Designers use their own datasets [

We list a series of compression algorithms and their related information in

In the literature which is most closely related to our own [

Five different types of compression methods were summarized in [

Our work aims to establish a relatively objective environment for data compression evaluation in WSNs. Thus, compression algorithms designed specifically for sensor nodes are selected, and their performance is assessed with a focus on energy consumption. To the best of our knowledge, this is the first time data compression has been evaluated systematically and objectively from the point of view of energy efficiency in WSNs. The introduction of energy information into the evaluation is the biggest difference between our work and previous ones. It is advisable to take our evaluation results into account before designing new algorithms or choosing existing ones.

In this section, the selected compression algorithms are introduced briefly and the new evaluation criterion is proposed.

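Two basic concepts are used in this paper: compression ratio and peak error. Compression ratio (the symbol with subscript c) is the ratio of the volume of the compressed data to that of the raw data, so a smaller value indicates a better compression effect. Peak error (the symbol with subscript P) indicates the maximum difference between the raw data and the data recovered after compression.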

As mentioned in Section 2, there are several forms of compression error representation. Although RMS error and SNR seem more common in traditional compression methods, we think that peak error is more appropriate for use in WSNs. Given the nodes’ limited computational capability, a compression error defined as RMS error or SNR is impractical: besides the high complexity and large energy losses of the error computation itself, the compressed data would first have to be reconstructed, which incurs further energy waste. Since the error requirement is generally given beforehand by the application as an upper bound, more and more algorithms [
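The two indices can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the paper. The peak error is computed here offline from a reconstructed series, purely for checking; an error-bounded algorithm running on a node would instead enforce the bound sample by sample.

```python
from typing import Sequence


def compression_ratio(compressed_bytes: int, raw_bytes: int) -> float:
    """Volume of compressed data divided by volume of raw data (smaller is better)."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return compressed_bytes / raw_bytes


def peak_error(raw: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Maximum absolute difference between raw and reconstructed samples."""
    if len(raw) != len(reconstructed):
        raise ValueError("series must have the same length")
    return max((abs(a - b) for a, b in zip(raw, reconstructed)), default=0.0)


def within_bound(raw: Sequence[float], reconstructed: Sequence[float],
                 error_bound: float) -> bool:
    """True if the reconstruction respects the application's peak-error bound."""
    return peak_error(raw, reconstructed) <= error_bound
```

For example, 25 compressed bytes out of 100 raw bytes give a compression ratio of 0.25.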

We introduce off-the-shelf compression algorithms designed for sensor nodes in this subsection. They share three characteristics:

First, peak error is defined as the maximum data deviation accepted by each application. It is predetermined and informed to the sensor nodes

Second, compression methods are tunable with respect to data accuracy. Changing _{P}

Third, the algorithms perform online compression; no training is needed.

(1) Predictive compression

In WSNs, environmental data show strong inter-relationships with each other in both temporal and spatial domains. Thus, various prediction models are established which predict current sample values in terms of the previous ones. An actual sample which is close to the predicted one will be removed from the raw data stream. Only the rest need to be transmitted. That becomes the basic principle of predictive compression.
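The principle can be sketched in Python as follows. This is an illustrative sketch only: it assumes a simple moving-average predictor with a hypothetical `window` parameter, and it assumes that the sink mirrors the node's predictor so that suppressed samples can be regenerated. It is not the implementation evaluated in this paper.

```python
from collections import deque


def predictive_compress(samples, error_bound, window=3):
    """Return the (index, value) pairs that must be transmitted."""
    history = deque(maxlen=window)  # values the sink will also hold
    sent = []
    for i, x in enumerate(samples):
        prediction = sum(history) / len(history) if history else None
        if prediction is not None and abs(x - prediction) <= error_bound:
            # Close enough: suppress the sample; both sides keep the prediction.
            history.append(prediction)
        else:
            # Prediction failed (or no history yet): transmit the real sample.
            sent.append((i, x))
            history.append(x)
    return sent


def predictive_reconstruct(sent, length, window=3):
    """Rebuild the series at the sink from the transmitted samples."""
    received = dict(sent)
    history = deque(maxlen=window)
    out = []
    for i in range(length):
        value = received[i] if i in received else sum(history) / len(history)
        history.append(value)
        out.append(value)
    return out
```

Because the node updates its history with the same values the sink will compute, every reconstructed sample stays within `error_bound` of the raw one.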

Prediction based data compression was proposed well in [

(2) Wavelet transformation

Wavelet transformation based on the lifting scheme is widely used in WSNs, owing to its low implementation complexity. A 5/3 wavelet presented in [

(3) Data fitting

Because environmental data vary continuously, a data stream can be replaced with line segments to decrease the total number of bits needed to represent it. In WSN applications, several algorithms have been put forward based on this idea. We merge them into one group and call it data fitting. Methods we select in this paper are LAA (Linear Approximation Algorithm) [
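As an illustration of this group, the following Python sketch approximates a stream with piecewise-constant segments using the midrange of each segment, in the spirit of PMC-MR, one of the algorithms evaluated later. The segment encoding as `(last_index, value)` pairs and the function names are assumptions made for this sketch, not details taken from the paper.

```python
def pmc_mr(samples, error_bound):
    """Greedy midrange segmentation: each segment spans at most 2 * error_bound."""
    segments = []
    if not samples:
        return segments
    lo = hi = samples[0]
    for i in range(1, len(samples)):
        x = samples[i]
        new_lo, new_hi = min(lo, x), max(hi, x)
        if new_hi - new_lo > 2 * error_bound:
            # Adding x would break the bound: close the segment before x.
            segments.append((i - 1, (lo + hi) / 2))
            lo = hi = x
        else:
            lo, hi = new_lo, new_hi
    segments.append((len(samples) - 1, (lo + hi) / 2))
    return segments


def pmc_reconstruct(segments):
    """Expand (last_index, value) segments back into a sample series."""
    out = []
    start = 0
    for end, value in segments:
        out.extend([value] * (end - start + 1))
        start = end + 1
    return out
```

Since the midrange lies within half the segment's range of every sample in it, the peak error of each segment is bounded by `error_bound`.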

To make an objective compression evaluation in WSNs, a proper criterion is needed, which focuses on the energy efficiency of each algorithm. We name it ESB (Energy-Saving Benefit) and denote it by

According to the various topologies, we describe ESB with two levels: node level and network level. The biggest difference between them is the consideration of energy costs in data receiving. At the node level, ESB is formulated as:

So,

At the network level, ESB is expressed as:

So,

Meanings of the symbols mentioned are listed in

In the communication part, _{TX}

In the computational part, _{MCU}_{MCU}_{c}_{P}_{MCU}_{c}

From (6) and (9), we can see that ESB explicitly includes the information of both compression ratio and time complexity. It is evident that neither compression ratio nor time complexity alone is adequate for evaluating compression algorithms fairly from the energy point of view.
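A minimal Python sketch shows how the two quantities combine. It assumes that ESB is the relative energy saving, that is, the energy without compression minus the energy with compression, divided by the energy without compression, and that each energy is a power multiplied by a time. For the network level it further assumes that a path of `hops` hops costs `hops` transmissions and `hops - 1` receptions per byte, the reception at the sink not being counted. These forms and all function and parameter names are assumptions of the sketch and do not reproduce the paper's equations; the default values are those listed in the table of parameter values.

```python
def esb_node(rc, t_mcu_us, p_tx_mw=57.42, t_tran_us=32.0, p_mcu_mw=26.4):
    """Assumed node-level ESB per raw byte (single hop, reception ignored)."""
    e_uncomp = p_tx_mw * t_tran_us                    # nJ: send one raw byte
    e_comp = p_mcu_mw * t_mcu_us + rc * e_uncomp      # compress, then send rc bytes
    return (e_uncomp - e_comp) / e_uncomp


def esb_network(rc, t_mcu_us, hops, p_tx_mw=57.42, p_rx_mw=62.04,
                t_tran_us=32.0, p_mcu_mw=26.4):
    """Assumed network-level ESB: hops transmissions and hops - 1 receptions."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    e_uncomp = (hops * p_tx_mw + (hops - 1) * p_rx_mw) * t_tran_us
    e_comp = p_mcu_mw * t_mcu_us + rc * e_uncomp      # compression happens once
    return (e_uncomp - e_comp) / e_uncomp
```

Under these assumptions, a compression ratio of 0.5 with 20 μs of computation per byte gives a node-level ESB of roughly 0.21, and the value approaches 1 minus the compression ratio as the hop count grows, because the one-off computational cost is amortized over more hops.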

In addition, compression error is also included by ESB. Its effect works on compression ratio and time overhead, which impacts

Thus, the new evaluation criterion includes almost all the main metrics for evaluating compression, and reveals their internal relations by the way of energy evaluation. Besides, ESB additionally provides important information on whether data compression can bring energy savings or not. Just like our research presented in [_{uncomp}

WSNs have been universally used in environmental monitoring, including oceanography, atmospheric sciences, seismology, and so on. To guarantee an objective evaluation and remove bias in data selection, we choose actual and open datasets which are collected by sensor nodes and cover almost all common types and characteristics of environmental data. The datasets used in the test are summarized in

We choose a MicaZ node as the test platform for compression evaluation. It is commonly used in WSNs. The processor is an 8-bit Atmel ATmega128L microcontroller. To be fairer, processor speed is fixed at 8 MHz. As the results shown in [_{MCU}

At the node level, the network topology is assumed to be a simple single-hop network in which source nodes send data directly to a powerful sink. In that case, the energy cost of data reception need not be considered. At the network level, the topology is a multi-hop network, and compression affects the energy consumption of both transmission and reception. All compression algorithms are reimplemented and recompiled to avoid execution bias. _{MCU}

To demonstrate the difference between the new criterion and the traditional ones, we show the evaluation results of all of them. For clearness, we summarize compression algorithms in

(1) Preferences in predictive compression

In Groups 1 and 2, _{P}

In

In addition, compression ratio differences will be enlarged as the error bound increases. In large error bounds, more data can be eliminated from the raw data stream.

In

The biggest difference between single exponential smoothing and the other two is that trend variation cannot be shown in the single one. As a result, in the other two methods, higher forecast accuracy is obtained due to the additional information. Meanwhile,

(2) Compression ratio comparison

Compression ratios (_{c}

In the figure, PMC-MR obtains the best compression effect of them all, while wavelet transformation is slightly worse than the others. In Group 1, autoregressive forecasting is better than the other three; in Group 2, single exponential smoothing is the best. This suggests that a simple model is adequate for the test data. The wavelet transformation we use is a one-level 5/3 wavelet. In this case, only half of the data (namely the high-frequency part) is compressed, which evidently limits its compression effect.
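For reference, one level of the integer 5/3 lifting transform can be sketched in Python as below. The sketch assumes even-length integer blocks and symmetric boundary extension, which are choices made here for illustration; how the detail coefficients are thresholded against the error bound is not shown, since the source does not specify it.

```python
def lifting53_forward(x):
    """One-level integer 5/3 lifting: returns (approximation, detail) halves."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("block length must be even and at least 2")
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    detail = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]   # mirror at the edge
        detail.append(odd[i] - ((even[i] + right) >> 1))   # predict step
    approx = []
    for i in range(half):
        left = detail[i - 1] if i > 0 else detail[0]       # mirror at the edge
        approx.append(even[i] + ((left + detail[i] + 2) >> 2))  # update step
    return approx, detail


def lifting53_inverse(approx, detail):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    half = len(approx)
    even = []
    for i in range(half):
        left = detail[i - 1] if i > 0 else detail[0]
        even.append(approx[i] - ((left + detail[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(detail[i] + ((even[i] + right) >> 1))
    return x
```

Only the detail (high-frequency) half is a candidate for elimination after one level, which matches the observation that a one-level transform compresses at most half of the data.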

(1) Preferences in predictive compression

_{MCU}

In the figures, _{MCU}_{c}_{MCU}

In

In _{MCU}

(2) Compression complexity comparison

Because of the high time overheads in Group 2, we eliminate them from the time comparison. In

(1) Preferences in predictive compression

Due to the high time overhead in Group 2, it is hard to save energy by compression in common cases. Thus, we eliminate them from the ESB evaluation. ESB at the node and network level in Group 1 is presented in

(2) ESB comparison

Except for the three exponential smoothing forecasting methods, the remaining algorithms are evaluated based on ESB at the node and network level. The results are shown in

It is clear that the new comparison results differ from those based on compression ratio and time complexity. Mainly owing to its excellent compression ratio and relatively low computational complexity, PMC-MR achieves the best energy-saving benefit among all the algorithms listed. At the node level, it provides average energy savings of 30%, and the highest savings reach 70%. The probability that PMC-MR reduces the total energy is higher than 75%. At the network level, ESB rises to 50% as the hop count increases.

It is worth mentioning that ESB of LAA is second only to PMC-MR at the node level. According to

On the other hand, the algorithms may introduce additional energy consumption, especially at the node level. This mainly occurs at small error bounds, where compressing data cannot save enough communication energy to offset the additional computational cost, which makes compression unnecessary.

As shown in Section 5, ESB is not always positive. In other words, data compression in WSNs is not always beneficial to energy conservation due to the additional computational energy dissipations. Thus, a low overhead method is needed as an assistant mechanism to avoid unnecessary losses in compression.

An adaptive compression arbitration system is proposed with its framework shown in

Prediction modeling

Before the arbitration, two models are established on-line to predict the compression ratio and the compression time. Information about the compression ratio and execution time for various datasets and application requirements is recorded for each prediction model. Since the modeling is done on-line, only a few samples are used in order to save energy.

Compression evaluation

After the modeling, the compression arbitration calculates a probable compression ratio for the given accuracy requirement and the corresponding time overhead based on the models. Then, the balance point between loss and benefit is estimated in the form of a compression ratio. By comparing the two compression ratios, the “comparison and judgment” sub-module concludes whether compression will produce energy savings or not. The result is subsequently fed back to control the behavior of data processing (compression before transmission or direct transmission).
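The comparison step can be sketched in Python for the single-hop case. The sketch assumes that compression pays off when the computational energy per raw byte is smaller than the transmission energy saved per raw byte, so that the balance point is one minus the ratio of the two per-byte energies. The predicted compression ratio and time are taken as inputs because the prediction models themselves are not specified here. All names are hypothetical, and the defaults come from the table of parameter values.

```python
def balance_ratio(t_mcu_us, p_tx_mw=57.42, t_tran_us=32.0, p_mcu_mw=26.4):
    """Compression ratio at which computational loss equals transmission benefit."""
    return 1.0 - (p_mcu_mw * t_mcu_us) / (p_tx_mw * t_tran_us)


def should_compress(predicted_rc, predicted_t_mcu_us, **radio):
    """Arbitration decision: compress only if the predicted ratio beats the balance point."""
    return predicted_rc < balance_ratio(predicted_t_mcu_us, **radio)
```

Passing a different `p_tx_mw` models another RF power level: the lower the transmit power, the lower the balance point and the less often compression is worthwhile.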

Adaptive modification

In this step, several samples are randomly selected for the verification of judgment accuracy. Once the target sample is given, its actual compression ratio and time overhead are measured for evaluating whether data compression is beneficial for energy savings. If the evaluation result is different from that of arbitration system, parameter modification is realized

The adaptive compression arbitration system is evaluated in a single-hop network with LTC as the test algorithm. Since the ultimate purpose of the arbitration system is reducing the total energy costs, we test the final energy savings provided by the new system under the different error bound levels and RF power levels. To show the efficiency of the system, two reference objects are used, which are the total energy costs for directly transmitting the raw data and the costs of compressing the data all along and then transmitting.

Energy consumptions for all three cases are presented in

In this paper, many of the current tunable compression algorithms designed for WSNs are reevaluated based on a new criterion. Since all the algorithms are intended for use in WSNs, where energy consumption is the first design consideration, the new criterion ESB reveals the performance of the algorithms more objectively.

Although several indices proposed before do well in traditional compression evaluation, they are probably not well suited to WSNs. According to the comparison results, compression ratio and time complexity cannot express the energy performance of the compression algorithms well. Compression ratio only indicates the reduction in the data amount, which translates numerically into communication savings; time complexity only affects the additional computational energy consumed by compression. That is to say, neither of these two indices can reveal the complete energy information about compression.

Besides the impartiality in algorithm evaluation, ESB can also be used to detect the case when compression wastes energy. It will probably happen if increased computational energy cannot be compensated by the decreased communication energy consumption. This information is much more important in both design and application. However, it seems hard to obtain from the other criteria.

Therefore, several design considerations are discussed based on the evaluation results:

First, the computational energy required by data compression is not always negligible. Compression may cost much more energy than it saves, even if it has a satisfactory compression ratio. So, a lower compression ratio alone does not mean that an algorithm is the proper one for WSNs.

Second, different types of instructions have greatly different effects on the performance of an algorithm. The division instruction in particular needs more execution time, which rapidly deteriorates the energy efficiency of compression. This is clearly shown by exponential smoothing forecasting. So, the division instruction should be avoided in sensor nodes. We suggest replacing it with shift operations as much as possible.
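The suggestion can be illustrated with integer exponential smoothing in Python. The sketch assumes that the smoothing factor is restricted to a power of two, 1/2^k, so that the division reduces to a right shift; this restriction and the function names are assumptions of the sketch rather than the method used in the evaluation.

```python
def smooth_with_division(samples, k):
    """Integer exponential smoothing with factor 1 / 2**k, using division."""
    s = samples[0]
    out = [s]
    for x in samples[1:]:
        s += (x - s) // (1 << k)   # floor division by 2**k
        out.append(s)
    return out


def smooth_with_shift(samples, k):
    """Same recurrence, with the division replaced by a right shift."""
    s = samples[0]
    out = [s]
    for x in samples[1:]:
        s += (x - s) >> k          # arithmetic shift floors, like // above
        out.append(s)
    return out
```

In Python both versions return identical results because `>>` on integers floors toward negative infinity, as `//` does. On a microcontroller the shift form avoids the division routine altogether, although the rounding of negative differences would need to be checked for the compiler in use.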

Last but not least, an adaptive compression arbitration system is proposed, inspired by the evaluation results. The system enhances the performance of compression algorithms by avoiding unnecessary energy losses. With this arbitration system, the greatest energy savings are 33.4% relative to directly transmitting the data and 39.2% relative to always compressing the data.

This work was supported in part by the NSFC under grant #60976032, National Science and Technology Major Project under contract #2010ZX03006-003-01, and “863” Program under contract #2009AA01Z130.

The authors would like to thank the TAO Project Office of NOAA/PMEL, the ShakeMap Working Group, and the NWISWeb Support Team for their valuable datasets.

Compression ratio-Error bound curves of Group 1 under different parameters

Compression ratio-Error bound curves of Group 2 with different parameters

Compression algorithms comparison in compression ratio.

Execution time per byte-Error bound curves of Group 1 under different parameters

Execution time per byte-Error bound curves of Group 2 under different parameters

Compression algorithms comparison in execution time per byte.

ESB-Error bound curves of Group 1 under different parameters

ESB-Error bound curves of Group 1 under different parameters

Compression algorithms comparison in ESB at the node level.

Compression algorithms comparison in ESB at the network level.

Framework for adaptive compression arbitration system.

Energy efficiency of the adaptive compression arbitration system.

Summary of compression algorithms.

DISCUS [ | lossless, spatial domain | distortion, probability of decoding error | data created by a Gaussian model | N/A |
PINCO [ | lossless, time domain | latency, transmission cost | temperature map created | NS-2 |
PMC-MR, PMC-MEAN [ | lossy, time domain | compression ratio | random walk data generated by a model; sea surface temperature, salinity, shortwave radiation from TAO | N/A |
LTC [ | lossy, time domain | bytes savings, memory usage | environmental data from Continuous Monitoring System | Mote node |
Distributed Wavelet [ | lossy, spatial domain | SNR, bit rate, energy cost (ignore computational part) | data created by a second order AR model [; temperature data on the Great Duck Island [ | StrongARM SA-1100 |
RACE [ | lossy, time domain | compression ratio, compression error | environmental data from TAO | N/A |
DPCM [ | lossless, time & spatial domain | coding gains | autoregressive source, acoustic source, weather data from NCDC | N/A |
S-LZW [ | lossless, time domain | compression ratio, execution time, energy consumption | real data from SensorScope, Great Duck Island, ZebraNet, Calgary Corpus Geo | Tmote Sky, ZebraNet node |
LAA [ | lossy, time domain | compression ratio, mean square error | real temperature data from Australian Bureau of Meteorology | N/A |
Forecast-based Compression [ | lossy, time domain | successful rate of prediction, energy cost | real temperature data | PowerTOSSIM |
bzip2, zlib, LZW, Wavelet, ADPCM [ | lossless, time domain | compression ratio, block size | acceleration data from MTx sensor | laptop |
Gzip [ | lossless, time domain | compression ratio, energy cost, execution time | six software modules | Tmote Sky |
Top-down piecewise linear approximation [ | lossy, time domain | compression ratio, time complexity | time series data collected from one sensor node | C++ |
LZO, bzip2, Gzip, rar, rzip [ | lossless, time domain | buffer size, compression ratio, time delay | trace-file from University of Crete | Intel Core2 Duo |
Swinging Door [ | lossy, time domain | compression ratio, mean square error, energy cost (communication part) | real data from gas injection monitoring | NS-2, CC2520 |
COPE & DISCUS [ | lossless, spatial domain | compression ratio | N/A | N/A |
2D-DCT [ | lossless, time & spatial domain | amount of data transmission, average error | indoor temperatures collected by sensor nodes | MicaZ Mote |

Types of predictive compression.

Autoregression | AR Model | Autoregressive Forecasting |
Moving Average | MA Model | Single Moving Average Forecasting |
Moving Average | MA Model | Double Moving Average Forecasting |
Moving Average | MA Model | Triple Moving Average Forecasting |
Exponential Smoothing | ARMA Model | Single Exponential Smoothing Forecasting |
Exponential Smoothing | ARMA Model | Double Exponential Smoothing Forecasting |
Exponential Smoothing | ARMA Model | Triple Exponential Smoothing Forecasting |

List of symbol representation.

_{uncomp} | Total energy costs without compression |
_{comp} | Total energy costs with compression |
_{P} | Error tolerance |
 | Communication distance |
 | Volume of raw data |
 | Hop count |
_{c} | Compression ratio |
_{tran} | Time overhead on transmitting one byte |
_{MCU} | Time overhead on compressing one byte |
_{TX} | Transmit power |
_{RX} | Receive power |
_{MCU} | Computation power |

Types of datasets.

Air Temperature | AT | Tropical Atmosphere Ocean Project [ | °C |
Sea Level Pressure | SLP | Tropical Atmosphere Ocean Project [ | hPa |
Relative Humidity | RH | Tropical Atmosphere Ocean Project [ | % |
Spectral Acceleration | PSA | The Pacific Northwest Seismograph Network [ | pctg |
Gage Height | GH | NWIS web water data [ | feet |
Farm Temperature | FT | The Georgia Automated Environmental Monitoring Network [ | °F |

Values of parameters.

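Communication distance | 60–100 m | N/A |
_{TX} (transmit power) | 57.42 mW | PA_Level=31 |
_{RX} (receive power) | 62.04 mW | N/A |
_{tran} (time overhead on transmitting one byte) | 32 μs | 250 kbps data rate |
_{MCU} (computation power) | 26.4 mW | 8 mA current draw |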

Summary of algorithms evaluated.

Group 1 | Autoregressive Forecasting | |
Group 1 | Single Moving Average Forecasting | |
Group 1 | Double Moving Average Forecasting | |
Group 1 | Triple Moving Average Forecasting | |
Group 2 | Single Exponential Smoothing Forecasting | |
Group 2 | Double Exponential Smoothing Forecasting | |
Group 2 | Triple Exponential Smoothing Forecasting | |
Group 3 | 5/3 Wavelet | N/A |
Group 3 | LAA | |
Group 3 | PMC-MR | |
Group 3 | PMC-MEAN | |
Group 3 | LTC | |