
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This work presents a data-centric strategy to meet deadlines in soft real-time applications in wireless sensor networks. This strategy considers three main aspects: (i) the design of real-time applications to obtain the minimum deadlines; (ii) an analytic model to estimate the ideal sample size used by data-reduction algorithms; and (iii) two data-centric stream-based sampling algorithms to perform data reduction whenever necessary. Simulation results show that our data-centric strategies meet deadlines without losing data representativeness.

Despite their potential application, wireless sensor networks (WSNs) [

By considering the real-time applications in WSNs, we can identify some related work. In general, current contributions consider architectures and mathematical models for general applications [

In WSNs applications, physical variables, such as temperature and luminosity, can be monitored continuously along the network operation. The data set representing these physical variables can be referred to as

Before introducing our data-centric strategy, allow us to comment on data-stream related work. The data-stream contributions usually focus either on improving stream algorithms [

Considering real-time requirements and sensor-stream characteristics, we propose a data-centric strategy capable of reducing the data during data routing. In this case, the routing elements consider some application aspects, such as data type and deadline information. Our strategy considers: (1) the design of real-time applications to obtain the minimum deadlines; (2) an analytic model to estimate the ideal sample size used by the reduction algorithms; and (3) two stream-based sampling algorithms to perform data reduction when necessary during the routing task.

To validate our data-centric strategy, we use specific scenarios in which application deadlines cannot be met without data reduction. In our simulation, we use a naive tree routing scheme based on a shortest-path tree in a flat network. Application information is fed to relay nodes during the tree build and rebuild phases. To identify the stream item delay, we assume that the clocks of the nodes are exactly synchronized. Thus, the time synchronization problem in WSNs [

Regarding data reduction strategies for WSNs, current research uses data fusion, aggregation, compression or correlation techniques [

Our contribution lies in the analytical model, the application design, the different stream sampling algorithms, and an evaluation that considers three more realistic real-time scenarios. The general contributions of our strategy are the following:

This work is organized as follows. Section 2 presents the data-centric real-time reduction problem. Section 3 shows how to design real-time sensor network applications using stream-based data reduction. Section 4 discusses the formal formulation used to determine the ideal sample size. Section 5 describes the sampling stream reduction algorithms. Simulation results are presented in Section 6, and Section 7 presents our conclusions and outlook.

The problem we address in this work is the use of sensor-stream reduction algorithms as a data-centric mechanism to meet deadlines in real-time applications. We consider the data-stream sampling technique to perform data reduction [

Let us consider a WSN monitoring physical or environmental conditions, such as temperature, sound, vibration, pressure, motion or pollutants, at different locations. Such a system is represented by the diagram [

This diagram illustrates the following behavior:

The ideal behavior denoted by

The sensed behavior is denoted by _{l}_{s}_{i}_{U}_{s}

The reduced behavior is denoted by

Based on these behaviors, the problem addressed in this work can be stated as follows:

To address the data-centric reduction problem in real-time applications, we consider the following assumptions:

_{1}_{n}

_{i}_{1},_{n}

_{dst}_{val}_{dst}_{val}_{low}_{hig}_{g}

These assumptions are considered in the whole paper. For instance, the routing algorithm is shortest path tree, the stream item is the set V = {Vi,…, V_{n}

The first task of our data-centric strategy is the design of the real-time application. The objectives of this design are: characterizing the stream flow as it passes through each sensor node; identifying the software components required by real-time applications at each sensor node; and identifying the hardware resources required at each sensor node. These aspects are illustrated in

Basically, we have three steps to characterize the stream flow in each node: received data, data classification, and data processing. Considering the received data,

In

Finally, the hardware resources necessary must be identified considering the

The second task of our data-centric strategy considers the

In the _{s}_{dst}_{dst}

In some cases, ^{1}_{f}^{j}_{f}_{a}_{f}_{gen}_{src}

Every relay node computes the new local deadline (_{l}

This deadline accounts for the route between the relay and the sink node, and it is defined as
_{src}^{j}

Let us consider _{src}^{1} to travel from source at relay node. Then, ^{2} will arrive in _{src}/h_{src}

This consideration is necessary, because the information in the relay node is only about _{src}

In a similar way, let us consider _{dst}^{1} to travel from relay node at sink. Then, ^{2} will arrive in _{dst}/h_{dst}

Thus, the estimated time to deliver

The first term of the sum is considered in _{del}^{1} has not arrived yet. Remember that _{dst}_{dst}_{rec}_{del}

Thus, |

The delay can be expressed as follows:

Thus, to compute |V′|, used to meet the application deadline, we consider the inequality
_{f} =_{s}

Meanwhile, considering the |

To distinguish the two formulations in the simulation study (Section 6), we will use the terms _{s}
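To make the role of the sample size concrete, the following is a minimal sketch of how a relay node might bound the number of items it forwards. It is not the paper's formulation: it assumes a simple pipelined delay model in which the first item needs `first_item_time` seconds to cross `hops` hops and every later item arrives `first_item_time / hops` seconds after the previous one. The function name and all parameter names are hypothetical.

```python
import math


def max_sample_size(deadline, elapsed, first_item_time, hops, stream_size):
    """Largest number of items that can still reach the sink before the deadline.

    Assumption (illustrative, not taken from the paper): the first item takes
    `first_item_time` seconds to cross `hops` hops, and each subsequent item
    arrives `first_item_time / hops` seconds after the previous one.
    """
    if hops < 1 or first_item_time <= 0:
        raise ValueError("hops must be >= 1 and first_item_time must be > 0")
    remaining = deadline - elapsed          # time budget left at this relay
    if remaining < first_item_time:
        return 0                            # not even one item can arrive in time
    per_item = first_item_time / hops       # spacing between consecutive arrivals
    extra_items = math.floor((remaining - first_item_time) / per_item)
    return min(stream_size, 1 + extra_items)
```

Under this model, a relay would forward the whole stream when the returned value equals `stream_size` and would otherwise hand the returned value to the sampling algorithm as the target sample size.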

Finally, the third aspect of our data-centric strategy considers the sensor-stream reduction algorithm (

The in-network (data-centric) reduction algorithm is integrated into a shortest-path routing tree. The routing tree is built, based on application requirements, from the sink (root) to the source nodes by using a flooding strategy. In this flooding, _{dst}_{dst}^{1}^{nf}

In this forward process, when a relay node receives ^{1}, it checks the stream reduction criterion, in our case if ^{1} is stored and the node waits to receive and store {^{2}^{…}^{nf}

^{j}

^{j}

_{7}

^{j}

_{f}

^{j}
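The tree-building step described above (flooding from the sink so that each node keeps the neighbor on its shortest path) can be sketched as a breadth-first traversal. This is an illustrative sketch under the assumption of an unweighted, connected topology given as an adjacency map; the function names and the `neighbors` structure are hypothetical and do not come from the paper or from NS-2.

```python
from collections import deque


def build_routing_tree(neighbors, sink):
    """Return (parent, hops) maps for a shortest-path tree rooted at the sink.

    `neighbors` maps each node id to an iterable of node ids in radio range.
    """
    parent = {sink: None}
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:             # the first copy of the flood wins
                parent[nxt] = node
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return parent, hops


def route_to_sink(parent, source):
    """Follow parent pointers from a source node up to the sink."""
    path = [source]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path
```

The `hops` map gives each relay its distance to the sink, which is the kind of per-node information a relay would need when estimating the remaining delivery time.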

When the reduction is able, a histogram of

To distinguish the two algorithms in the simulation study (Section 6), we use _{central} and _{random} to represent the central and random choice of elements, respectively. The _{central} sample reduction process is presented in Algorithm 2.

Analyzing Algorithm 2, we have:

Executes in

Define the inner loop that determines the number of elements at each histogram class of the resulting sample, considering _{cn}_{coli}_{cn}

Define the outer loop in which the input data is read and the sample elements are chosen. Because the inner loop is executed only when condition in line 8 is satisfied, the overall complexity of the outer loop is _{col}_{cn}_{coli}_{cn}_{coli.}_{cn}_{cn}_{cn}_{cn}

Re-sorts the sample in

Thus, the overall complexity is

since |

Algorithm 2. _{central} sampling reduction.
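As an illustration of histogram-based sampling with a central or random choice of elements, the sketch below sorts the stream, splits it into equal-width histogram classes, gives each class a share of the sample proportional to its frequency, and then picks either the middle elements or random elements of each class. It is a minimal sketch and not the authors' Algorithm 2: the number of classes, the largest-remainder rounding, and all names are assumptions made for the example.

```python
import random


def histogram_sample(values, sample_size, num_classes=10, choice="central", rng=None):
    """Reduce `values` to `sample_size` items while keeping the histogram shape."""
    data = sorted(values)
    n = len(data)
    if sample_size >= n:
        return data
    if sample_size <= 0:
        return []
    rng = rng or random.Random()
    lo, hi = data[0], data[-1]
    width = (hi - lo) / num_classes or 1.0  # avoid zero width when all values are equal
    classes = [[] for _ in range(num_classes)]
    for v in data:
        idx = min(int((v - lo) / width), num_classes - 1)
        classes[idx].append(v)
    # Proportional quota per class; leftover slots go to the largest remainders.
    quotas = [len(c) * sample_size // n for c in classes]
    by_remainder = sorted(range(num_classes),
                          key=lambda i: (len(classes[i]) * sample_size) % n,
                          reverse=True)
    for i in by_remainder[: sample_size - sum(quotas)]:
        quotas[i] += 1
    sample = []
    for cls, quota in zip(classes, quotas):
        if choice == "central":
            start = (len(cls) - quota) // 2  # take the middle of the sorted class
            sample.extend(cls[start:start + quota])
        else:
            sample.extend(rng.sample(cls, quota))
    return sorted(sample)
```

Passing a seeded `random.Random` instance as `rng` makes the random variant reproducible, which is convenient when repeating simulation runs.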

This section presents the simulation study of our data-centric strategy in specific scenarios. We perform our evaluation by using the NS-2 (Network Simulator 2), version 2.33

To assess the network behavior, we vary the number of nodes and the stream size (|

To evaluate the delay in real-time scenarios, it is important to determine the minimum deadline (_{mm}_{min}_{min}

It is important to highlight that if either application has a deadline smaller than the one shown in

In the problem scope defined in Section 2., we discuss the impact of the solutions regarding the data quality, which is considered as our decision _{dst}_{val}

It is possible to apply these rules, because we consider the reduction of only one source, _{A}-Node_{B}_{A}_{B}

The deadlines we consider for the real-time scenarios are: (i) 50% of the minimum deadlines, with and without concurrent traffic; and (ii) the minimum deadlines with a delay caused by relay nodes on each transmitted packet, with and without concurrent traffic. These studies are discussed in the next subsections. For all scenarios, we evaluate the simplified and complex formulations, both using _{central} or _{random}. We use a Monte Carlo simulation [

The first scenario considers half of minimum deadlines (_{a} = d_{min}/2_{a}

In this case, the _{a}

The reason is that, in the simplified formulation, the reduction is more aggressive and less data is forwarded. When the number of nodes is 1024, the simplified formulation delivers 19% of the data, while the complex formulation delivers 25%. This indicates that, considering only deadline achievement, the simplified formulation is more appropriate. However, this comes at the cost of a greater reduction ratio.

Regarding the data error evaluation, _{random}, in both formulations, has a smaller ϒ-error because the random choice improves data dispersion, and the simplified formulation has a smaller ϒ-error.

_{central}, in both formulation, has a smaller _{random} may be used when _{dst}_{central} should be used.

The partial conclusion, considering this critical real-time scenario, is that the simplified formulation is more appropriate, because the deadlines are usually met while keeping data representativeness. Considering the sampling algorithm, _{random} or _{central} can be used when data application decisions are related, _{dst}_{val}

The second scenario, considers _{a} = d_{min}^{j})per _{a}/_{a}_{a}

_{a}_{a}, i.e._{a} > d_{min}

_{random}, in both formulations, has a smaller ϒ-error, because the random choice improves data dispersion. However, the simplified formulation has a smaller ϒ-error. In _{central}, in both formulations, has a smaller

Error evaluations suggest that the simplified formulation is more appropriate for small networks and the complex formulation is more scalable. In general, _{random} is preferable when the _{dst}_{val}_{central} should be chosen. However, the complex formulation is more appropriate for large-scale networks. The partial conclusion, considering more realistic real-time scenarios, is that the complex formulation is more appropriate, because the deadlines are met in all cases and the data representativeness is kept.

This scenario considers 50% of minimum deadlines (_{a}_{min}/

In _{a}

_{random} has a smaller ϒ-error. However, in the complex formulation, _{central} presents a smaller ϒ-error when there is less data traffic (16% and 20%). The reason is that the complex formulation executes fewer consecutive

The data error evaluation suggests that the simplified formulation is slightly better than the complex one. However, the partial conclusion, considering this critical and realistic real-time scenario, is that the complex formulation is more appropriate, because deadlines are met and data representativeness is kept. Considering the sampling algorithms, the behavior is kept in both

The last scenario considers _{a}_{min}_{a}

The objective of this scenario is to identify the best strategy when the relay nodes have extra high-priority tasks and the network traffic gradually increases.

_{a}_{a}_{a} ≥ d_{min}

_{central} and _{random}. _{central}, in complex formulation, has a smaller ϒ-error. The reason is that the complex formulation performs the maximum Ψ-reduction sooner (_{central}

_{central} presents _{central} has a smaller

The error evaluations suggest that the complex formulation is more appropriate with the _{central} strategy. The partial conclusion, considering this scenario, is that the complex formulation is more appropriate, because the deadlines are met in all cases while keeping data representativeness. Considering the sampling algorithm, the _{central} strategy with the complex formulation is always recommended.

In real-time applications of wireless sensor networks, the time used to deliver sensor-streams from source to sink nodes is a major concern. The amount of data in transit through these constrained networks has a great impact on the delay. In this work, we presented a data-centric strategy to meet deadlines in soft real-time applications for wireless sensor networks. This work shows how to deal with time constraints at lower network levels in a data-centric way.

With our data-centric strategy, we met application deadlines in several scenarios. In addition, we showed how to design real-time sensor-stream reduction applications and presented an analytical model used to find the ideal sample size. Results showed the efficiency of the strategy in reducing the delay without losing data representativeness. If the application is not strongly dependent on data accuracy, or the network operates in an exceptional situation (e.g., few resources remaining or urgent situation detection), then data reduction algorithms are a powerful tool for real-time applications in resource-constrained networks.

As future work, we intend to match the proposed application-level solution with lower-level ones, for example, by considering some real-time-enabled signal processing method. In this case, not only is data from a single source reduced, but similar data from different sources is also reduced, resulting in a more efficient solution. Another direction for future work is to use feedback information to enable the source nodes to perform the reduction sooner. We also intend to improve the complexity of the central sampling algorithm to

This work is partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq) under the grant numbers 477292/2008-9 and 474194/2007-8.

Build tree process.

Data-centric reduction design in WSNs real-time application, the sensor view.

Reduction algorithms.

Minimum deadlines.

Delay considering the half of deadlines without concurrent traffic.

ϒ-error considering the half of deadlines without concurrent traffic.

Φ-error considering the half of deadlines without concurrent traffic.

Delay considering the delay caused by relay nodes without concurrent traffic.

ϒ-error considering the delay caused by relay nodes without concurrent traffic.

Delay considering the half of deadlines with concurrent traffic.

ϒ-error considering the half of deadlines with concurrent traffic.

ϒ-error considering the half of deadlines with concurrent traffic.

Delay considering the delay caused by relay nodes with concurrent traffic.

ϒ-error considering the delay caused by relay nodes with concurrent traffic.

Simulation parameters.

| Parameter | Value |
|---|---|
| Network size | Varied with density |
| Queue size | Varied with stream |
| Simulation time (seconds) | 1100 |
| Stream periodicity (seconds) | 10 |
| Radio range (meters) | 50 |
| Bandwidth (kbps) | 250 |
| Initial energy (Joules) | 1000 |