This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSNs). Predicting data that are not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, such prediction may not be very accurate. Simulations involving simple linear regression and multiple linear regression functions were carried out to assess the performance of the proposed method. The results show that the gathered variables are more strongly correlated with each other than with time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate. In addition, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, this is the first work to address prediction based on multivariate correlation for WSN data reduction.

Wireless Sensor Networks (WSNs) consist of a few to many resource-constrained sensor nodes. Some sensor nodes gather data from external environments and send information such as temperature, humidity and light to the sink. The information is sent hop by hop (through intermediate nodes) until the sink is reached. However, data traffic is a problem in WSNs due to high energy consumption [

These sensors can be used in many applications such as event detection, location, monitoring and control [

In this scenario, the sensor nodes frequently send the same data gathered from a specific area. The overlapping of information sent to the sink causes waste of energy, which decreases the network lifetime. The problem is even worse when the number of deployed nodes increases (scalability), because data communication is responsible for most of the energy consumption in WSN [

An energy efficient communication protocol helps improve the deployment of this type of network in environments such as vegetation and weather monitoring. The correlation between the data gathered by a sensor node and its neighbors, as well as the correlation between the data gathered by the sensor node itself over a given time [

The purpose of data prediction is to reduce data traffic to the sink. It has been adopted in several papers in the literature [

That approach usually takes into account the correlation of only one variable to be predicted (named the dependent or response variable, e.g., temperature) and only one variable used to predict it (named the independent or explanatory variable, e.g., time/epoch). However, time is not the variable most correlated with other variables such as temperature, humidity and light.

Thus, the prediction adopted by current solutions is sometimes not accurate. Consequently, the questions we address here are: “can we use the correlation between the variables gathered by the same sensor node to improve prediction accuracy?” and “is multivariate prediction more accurate than published methods?”

We propose a method that performs prediction of data based on multivariate correlation. In our method, we take into account the correlation between two readings of data gathered by the sensor node and also the time/epoch variable (

In our approach we use a tree-based routing protocol to forward the data traffic from sensor nodes to the sink node, an approach similar to the one adopted by Li

In this paper, simulations with simple and multiple linear regression functions are carried out to evaluate the prediction solution. For our solution, initially the correlation degree of the variables gathered by the sensor node is measured to decide which variable will be the independent one. Here in this paper, the Pearson’s coefficient (r) [

A baseline application for data collection, without any prediction mechanism, was developed. This application emulates a real gathering of temperature, humidity and light data. The original version of this application is then compared to three enhanced versions, two of which use simple linear regression and one of which uses multiple linear regression. Prediction accuracy is evaluated by means of the Residual Sum of Squares (SSerr) and the coefficient of determination (R^{2}).

Goel and Imielinski [

After PREMON, some works [

Xu and Lee [

Matos

Seo

Silva

Multivariate spatial and temporal correlation is key to solving prediction accuracy problems and improving energy savings through data reduction techniques. The papers found in the literature have addressed prediction accuracy only superficially, although it is an essential issue in WSNs.

This paper has the advantage (

Our work is inspired by these techniques and concepts (spatial and temporal correlation, data reduction and prediction), which are already known in the literature for addressing energy saving issues in WSNs. However, we focus on the challenge of improving the prediction accuracy of WSN data based on a multivariate correlation method.

Several techniques have been defined to optimize energy consumption in applications for reducing data sent to the sink. The most common are compression, aggregation and fusion [

This section describes two concepts used by current works found in the literature, which we used in the design of our solution. To the best of our knowledge, no other paper uses multiple linear regression to perform prediction and Euclidean distance to check the correlation between the readings of neighbor sensor nodes, but we found papers such as that of Skordylis

Pearson’s coefficient [ r_{X1, X2} represents the relationship between two one-dimensional vectors X_{1} and X_{2}, to be compared in terms of their correlation. They contain the sample windows of two variables, X_{1} = x_{11}, …, x_{1i} and X_{2} = x_{21}, …, x_{2i}, where

The coefficient

Therefore, when coefficient (r) is close to the highest or lowest value (1 or −1), then the correlation between two vectors is high. Thus, we can calculate the spatial and temporal correlation of the readings of just one variable between two neighbor sensor nodes [

In addition, we can build a table which determines how much one variable is related to another. The correlation table for variables from real data trace is shown in the next section. Coefficient
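As a minimal sketch of how such a correlation table could be built, the Python snippet below computes Pearson's coefficient for every pair of variables in a sample window. The function names (`pearson_r`, `correlation_table`) and the sample values are illustrative assumptions; they are not taken from the trace or the nesC code used in this paper.

```python
import math

def pearson_r(x1, x2):
    """Pearson's coefficient between two equal-length sample windows."""
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("windows must have the same length (at least 2)")
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    var1 = sum((a - mean1) ** 2 for a in x1)
    var2 = sum((b - mean2) ** 2 for b in x2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a constant window has no defined correlation")
    return cov / math.sqrt(var1 * var2)

def correlation_table(windows):
    """Map every ordered pair of variable names to its Pearson coefficient."""
    return {(a, b): pearson_r(windows[a], windows[b])
            for a in windows for b in windows}

# Hypothetical sample window (values invented for illustration only).
window = {
    "epoch": [1, 2, 3, 4, 5, 6],
    "temperature": [19.2, 19.6, 19.5, 20.3, 20.1, 20.9],
    "humidity": [38.9, 38.1, 38.4, 36.8, 37.3, 35.6],
}
table = correlation_table(window)
for (a, b), r in sorted(table.items()):
    if a < b:  # print each unordered pair once
        print(f"r({a}, {b}) = {r:+.3f}")
```

Under this sketch, the variable whose coefficients with the others are closest to 1 or −1 would be the natural candidate for the independent variable.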

The current solutions of data reduction by means of linear regression are performed by using simple linear regression based on the least squares [

Two application versions based on simple linear regression (as in the current solutions) were developed as a baseline for evaluating our solution. Both use prediction based on univariate correlation (simple linear regression based on least squares). One application version is also used by Matos _{1}, … ,_{i}_{i}

Coefficients β and α are calculated by each sensor node and, when they arrive at the sink, they are used for data recovery, according to _{qi} and _{pi} represent one-dimensional vectors, which respectively contain the values of the predictions made by one dependent variable _{qi} = _{q1}, …,_{qi} and _{pi} = _{p1}, … ,_{pi}, where

This approach is used in current solutions, but we propose the use of multiple linear regression instead of simple linear regression, because prediction based on multivariate correlation is more accurate. In the next section, we describe how to calculate the β and α coefficients used by our method.
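The least-squares fit behind this univariate baseline can be sketched as follows. This is a generic Python illustration rather than the nesC code used in the simulations: `fit_simple` and `predict_simple` are hypothetical names, the readings are invented, and the convention that α is the intercept and β the slope (y = α + βx) is assumed.

```python
def fit_simple(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two windows of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("independent variable is constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def predict_simple(alpha, beta, x):
    """Recover the readings at the sink from the two coefficients."""
    return [alpha + beta * xi for xi in x]

# Node side: fit a window of humidity readings against a counter (time).
counter = [1, 2, 3, 4, 5, 6]
humidity = [38.9, 38.1, 38.4, 36.8, 37.3, 35.6]  # invented readings
alpha, beta = fit_simple(counter, humidity)

# Sink side: only alpha and beta are needed, since the counter is implicit.
recovered = predict_simple(alpha, beta, counter)
```

When the independent variable is a counter, the sink can regenerate it locally, which is why this baseline needs to transmit only the two coefficients per dependent variable.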

The purpose of our approach is to improve prediction accuracy in the WSN data reduction. We use multivariate correlation to decrease prediction errors by means of multiple linear regression as follows:

multivariate temporal correlation is applied to perform prediction of consecutive readings by means of multiple linear regression in each sensor node;

each sensor node calculates its β and α coefficients and sends them to the sink, instead of sending all field readings;

multivariate spatial correlation is used to detect data overlapping by means of Euclidean distance, which prevents the same information from being sent by several neighbor sensor nodes; and

the missing data can be generated by the sink.

The main contributions of this paper are: (1) a discussion of prediction accuracy in environmental monitoring, which includes the correlation between gathered variables such as temperature, humidity and light; (2) evidence that more accurate prediction solutions are possible through the multivariate correlation method; and (3) a presentation of the challenges and a detailed description of the steps required to use this solution for data reduction based on prediction by multiple linear regression.

Our proposed solution consists of eight steps. The following premises are assumed: a neighbor coefficients table is created in each sensor node when it starts; a coefficients table is created in the sink; every sensor node remains in promiscuous mode and stores the coefficients of its neighbors; and the sampling window must fit the maximum packet size and is defined in advance by the developer.

Step #1: the sensor node stores a fixed number of samples of gathered readings from all the variables in each cycle.

Step #2: each sensor node calculates coefficients β and α of the multiple linear regression function when the sampling window reaches the maximum storage threshold previously defined.

Step #3: before sending its β and α coefficients to the sink, the sensor node looks for a duplicate entry in its neighbor coefficients table. The entries in this table are coefficients received by broadcast from its neighbor sensor nodes.

Step #4: if the values generated by the sensor node have already been sent by a neighbor sensor node, the sensor node drops its β and α coefficients. Then, it sends a special packet of reduced size, named the correlation packet. This packet advertises that the sensor node is correlated to another neighbor sensor node.

Step #5: if coefficients β and α have not been sent yet by another neighbor sensor node, the sensor node sends them to its parent node until the sink is reached.

Step #6: the sensor node also sends the sequence of variable readings which is used as independent variable. It is worth mentioning that this variable is calculated by using Pearson’s coefficient [

Step #7: when coefficients β and α reach the sink, they are used in the multiple linear regression function to predict the readings which have not been sent. Moreover, these coefficients are stored for later use by the correlation packets (Step #4).

Step #8: if a correlation packet reaches the sink instead of the coefficients, the sink looks for entries from the correlated node in its coefficients table (Step #7). Then, the previously stored β and α coefficients are used to predict the readings.
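The decision logic of Steps #3 to #8 can be sketched in Python as below. This is only an illustration under stated assumptions: the class names, the dictionary-based packets and the explicit `kind` and `correlated_with` fields are hypothetical (the paper states that the sink distinguishes packets by size), and duplicates are detected here by exact equality of the coefficient tuples, which a distance-based check could replace.

```python
class SensorNode:
    def __init__(self, node_id):
        self.node_id = node_id
        # Neighbor coefficients table, filled while in promiscuous mode.
        self.neighbor_coeffs = {}

    def overhear(self, neighbor_id, coeffs):
        """Store coefficients broadcast by a neighbor."""
        self.neighbor_coeffs[neighbor_id] = tuple(coeffs)

    def build_packet(self, coeffs, independent_samples):
        coeffs = tuple(coeffs)
        # Step 3: look for a duplicate entry in the neighbor table.
        for neighbor_id, seen in self.neighbor_coeffs.items():
            if seen == coeffs:
                # Step 4: drop the coefficients, send a small correlation packet.
                return {"kind": "correlation", "src": self.node_id,
                        "correlated_with": neighbor_id}
        # Steps 5 and 6: send the coefficients plus the independent-variable samples.
        return {"kind": "coefficients", "src": self.node_id,
                "coeffs": coeffs, "samples": list(independent_samples)}

class Sink:
    def __init__(self):
        self.coeff_table = {}  # node id -> last coefficients received

    def receive(self, packet):
        """Return the coefficients to use for predicting the sender's readings."""
        if packet["kind"] == "coefficients":
            # Step 7: store the coefficients for later correlation packets.
            self.coeff_table[packet["src"]] = packet["coeffs"]
            return packet["coeffs"]
        # Step 8: reuse the coefficients of the correlated neighbor (None if unknown).
        return self.coeff_table.get(packet["correlated_with"])

# Two neighbors that computed the same (invented) coefficients.
sink = Sink()
node_a, node_b = SensorNode(1), SensorNode(2)
pkt_a = node_a.build_packet((12.5, 0.01, -0.8), [20.1, 20.3])
node_b.overhear(1, pkt_a["coeffs"])
pkt_b = node_b.build_packet((12.5, 0.01, -0.8), [20.1, 20.3])
sink.receive(pkt_a)
reused = sink.receive(pkt_b)
```

In this example the second node sends only the reduced correlation packet, and the sink falls back on the coefficients it stored for the first node.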

WSNs consist of multiple nodes spread in a redundant way. Thus, dense networks provide a fault-tolerant system. On the other hand, these networks are usually composed of resource-constrained devices. Energy is supplied by batteries, and energy consumption can be better managed when the correlations present in monitoring applications are taken into account. Therefore, we can develop solutions which reduce data traffic in the network. The spatial correlation can be exploited to optimize data communication to the sink and between neighbor sensor nodes [

The spatial correlation happens due to similarities in the data being sent to the sink by several sources in a high-density network [ X_{N} = x_{N1}, …, x_{Nj} and X_{V} = x_{V1}, …, x_{Vj}. In our case, the Euclidean distance between X_{N} and X_{V} represents the correlation between two multidimensional vectors of dimension

The smaller the Euclidean distance, the greater the correlation between two vectors. Thus, we can compare the β and α coefficients of the multiple linear regression generated from consecutive readings gathered by a sensor node to the β and α coefficients from its neighbor sensor nodes at a given time. The sensor node checks whether there is correlation between itself and its neighbor sensor nodes (Step #3) before sending a packet containing the β and α coefficients of the multiple linear regression function. If the Euclidean distance is close to 0 (zero), a packet with the same content was previously sent by another neighbor sensor node (Step #4).

In our proposed solution, the sensor node detects if there is multivariate spatial correlation between itself and its neighbor node by tree-based routing. This is similar to the compression mechanism adopted by Li _{XN, XV} [

The sensor node does not send the β and α coefficients of the current readings to the sink if the Euclidean distance is 0 (zero). This eliminates the overlapping of information between neighbor sensor nodes. Thus, some sensor nodes do not send data packets at a given time, which reduces both the broadcast between neighbor sensor nodes and the data forwarded by the relays.
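A minimal Python sketch of this check is given below. The helper names and the `tolerance` parameter are assumptions introduced for illustration: with the default of 0 the check matches the zero-distance rule described above, while a small positive tolerance would correspond to a distance that is merely close to zero.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two coefficient vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_spatially_correlated(own_coeffs, neighbor_coeffs, tolerance=0.0):
    """True when the neighbor already advertised (nearly) the same coefficients."""
    return euclidean_distance(own_coeffs, neighbor_coeffs) <= tolerance

# Invented coefficient vectors (beta_0 = alpha, beta_1, beta_2).
own = (12.5, 0.01, -0.8)
neighbor_same = (12.5, 0.01, -0.8)
neighbor_other = (12.5, 0.01, -0.3)

suppress = is_spatially_correlated(own, neighbor_same)    # identical vectors
send = not is_spatially_correlated(own, neighbor_other)   # distance of about 0.5
```

How large a tolerance is acceptable depends on the application, since reusing a neighbor's coefficients trades some prediction accuracy for fewer transmitted packets.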

The temporal correlation happens due to the fact that the sensor node gathers correlated data from one or more variables at a given time. This type of correlation is observed due to the nature of physical phenomena [

Our data reduction solution works in a distributed way, where each sensor node calculates the β and α coefficients of the multiple linear regression function (Step #2). Then, it only sends β and α if there is no multivariate spatial correlation with another neighbor sensor node.

Coefficients β and α are not calculated as in the simple linear regression, because the number of independent variables is greater. The multiple linear regression is described below:
where β_{0} = α, for simplicity and compatibility with the β and α coefficients of the simple linear regression.

The sink receives either the β and α coefficients or the correlation packet for data recovery by means of prediction. It distinguishes between the two based on packet size. Thereafter, the predictor calculates the values of the missing readings based on the β and α coefficients of the multiple linear regression function [

In our approach, we decided to adopt a statistical technique as the predictor for two main reasons: (1) we are initially developing studies to assess the effects of multivariate correlation and its advantages over univariate correlation; and (2) we intend to adopt computational intelligence techniques in future work to identify their benefits over statistical techniques.

The prediction of variables using multiple linear regression is calculated according to _{qij} represents a one-dimensional vector, which contains the values of the predictions made for one dependent variable _{pij} represents the multidimensional vector, which contains the history of the samples from more than one independent variable _{qij} = _{qi1}, … ,_{qij} and _{pij} = _{pi1}, …, _{pij}, with _{pij}. β and α respectively represent the coefficients calculated using β_{0} = α due to compatibility with the notation of β and α coefficients used in this paper.
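To make the multivariate case concrete, the sketch below fits and applies a multiple linear regression with two independent variables (a counter and temperature) in plain Python. It solves the standard least-squares normal equations by Gaussian elimination; this is one common way to obtain the coefficients and is an assumption here, not necessarily the procedure used in the paper's nesC implementation. All names and readings are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_multiple(predictors, y):
    """Return [beta_0 (= alpha), beta_1, ...] for y = beta_0 + sum(beta_k * x_k)."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict_multiple(betas, predictors):
    """Recover the dependent variable from the coefficients and predictor samples."""
    n = len(predictors[0])
    return [betas[0] + sum(b * col[i] for b, col in zip(betas[1:], predictors))
            for i in range(n)]

# Invented window: counter and temperature predict humidity.
epoch = [1, 2, 3, 4, 5, 6]
temperature = [20.0, 21.5, 20.5, 22.0, 23.5, 22.5]
humidity = [41.2, 39.6, 40.9, 39.1, 37.8, 38.7]
betas = fit_multiple([epoch, temperature], humidity)
recovered = predict_multiple(betas, [epoch, temperature])
```

Note that the sink can regenerate the counter by itself, but it needs the temperature samples to evaluate the regression, which is why the independent variable gathered by the node is transmitted alongside the coefficients.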

The prediction by simple linear regression is calculated by

We used simulation to evaluate the performance of our solution. The simulation tool adopted was Tossim (

All the code was written in nesC for TinyOS 2.x. It can be embedded within the sensor nodes of the Tossim simulator and also within real sensor nodes. This ensures that the same code used to simulate the experiments can be used to perform tests in real scenarios in the future.

The simulation scenarios involve different network densities, application data values (gathered variables, correlated or not) and node deployment strategies. Thus, we examine possible real-world scenarios by simulation.

Application versions were created to check the improvement of our solution. The first version is the baseline for comparing energy consumption. The aim of this version is to measure the energy consumption without prediction and to check how much energy each prediction solution consumes when data reduction is performed by simple or multiple linear regression.

The second version is a version adopted by Matos

The third version is a way to check whether it is possible to improve prediction accuracy by changing only the independent variable. We used the temperature variable instead of the time variable, because it is more correlated with the other variables. The best way to improve prediction accuracy is by decreasing prediction errors while using the same amount of energy as the second version, but there is a trade-off between prediction accuracy and energy consumption.

The last version is our solution, which uses the time and temperature variables together in the prediction. The gathered variables are more correlated with each other than with the time variable, so we expect the prediction error to decrease, even though more energy is consumed. Each application version has a different packet length, which determines how much energy is spent in data communication,

The performance evaluation was done with four application versions, which we used to simulate and compare multiple linear regression to simple linear regression and to the original version of a monitoring application. This monitoring application simulates the gathering of three variables from the environment: temperature, humidity and light. The application versions used in the simulations are:

First version: original application version, which sends temperature, humidity and light readings periodically, every 1,024 clock ticks of the sensor node, without performing prediction. This version was created to serve as a reference application against which to compare the energy consumption of the later versions, which use prediction for data reduction.

Second version: enhanced version of the original application through a simple linear regression model. It sends only the β and α coefficients for each dependent variable. It uses a counter (time variable) as the independent variable to predict temperature, humidity and light. Because the counter is used as the time variable, this version does not send any variable samples to the sink. It was designed to verify the energy consumption when simple linear regression is used to reduce the data sent to the sink, and to calculate SSerr and R^{2} for comparison with the next versions. This version is based on the method proposed by current works such as Matos

Third version: enhanced version of the original application through a simple linear regression function, but using temperature as the independent variable instead of the time variable. It sends reading samples of the temperature variable and the β and α coefficients for each dependent variable (except temperature) to predict the dependent variables humidity and light. This version was designed to verify the impact of this model on energy consumption when simple linear regression sends an independent variable to reduce data communication. It was also created to check SSerr and R^{2} compared to the second and fourth versions. The temperature was chosen as the independent variable due to the results obtained from coefficient

Fourth version: enhanced version of the original application through a multiple linear regression function, using counter and temperature as independent variables. It sends reading samples of temperature and β and α coefficients for each dependent variable (except temperature) with (β_{0}, β_{1}, β_{2}), where β_{0} = α. It predicts the dependent variables light and humidity. This version was designed to verify SSerr and R^{2} compared to the second and third versions. Our proposed method is based on this version.

For each application version, we used different types of packets according to each situation. By default, TinyOS 2.x provides packets of up to 28 bytes to be sent by WSN applications, of which only 20 bytes can be used for user data and route information. Therefore, we designed the application messages of each version to fit this maximum size. The features of each application version are:

First version: for this version there is only one type of application packet of 14 bytes (

Second version: we created two types of application packets: one packet of 20 bytes (

Third version: three types of application packets were created in this version: one packet of 16 bytes (

Fourth version: three types of application packets were created in this version: one packet of 20 bytes (β_{0}, β_{1}, β_{2}), where β_{0} = α; one reduced size packet of 10 bytes (

Implemented applications have been run in Tossim. We have used traces (Intel Berkeley Research Lab on

We embedded all four application versions within the sensor nodes in Tossim. Then, the prediction accuracy of the different applications was measured. The energy consumption of data communication in the original application version was also tracked and compared with that of the three enhanced versions, two using simple linear regression and one using multiple linear regression (our proposed solution).

The two parameters used to reveal the overperformance or underperformance of prediction accuracy of our solution compared to current works are the Residual Sum of Squares (SSerr) and coefficient of determination (R^{2}). SSerr [^{2} [

Let
_{qi} represents a one-dimensional vector, which contains the values of the predictions made by one dependent variable _{qi} = _{q1}, …,_{qi}, where _{i}
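Both accuracy measures can be computed as in the following Python sketch. It assumes the standard definitions: SSerr as the sum of squared differences between observed and predicted values, and R^{2} = 1 − SSerr/SStot, where SStot is the total sum of squares around the mean of the observed values. The function names and sample values are illustrative.

```python
def ss_err(observed, predicted):
    """Residual sum of squares between observed and predicted readings."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SSerr / SStot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_err(observed, predicted) / ss_tot

# Invented humidity readings and two hypothetical predictions of them.
observed = [38.9, 38.1, 38.4, 36.8, 37.3, 35.6]
prediction_a = [38.8, 38.3, 38.2, 37.0, 37.1, 35.8]
prediction_b = [38.0, 38.0, 38.0, 37.0, 37.0, 37.0]

for name, pred in (("a", prediction_a), ("b", prediction_b)):
    print(name, round(ss_err(observed, pred), 3), round(r_squared(observed, pred), 3))
```

A lower SSerr and an R^{2} closer to 1 both indicate that the values recovered at the sink follow the gathered readings more closely.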

The performance of our solution was also measured while varying the number of samples. This shows how much our solution is affected by the trade-off between prediction accuracy and energy consumption. To check the behavior of our solution, we repeated the scenario that had the best results among the simulated scenarios.

The evaluation metrics adopted for this work are: (1) energy consumption efficiency metrics; and (2) predictor efficiency metrics. The energy consumption efficiency metrics are: the total average energy consumption in the network, in joules, from the transmission of application packets (E_{trans}); the total average energy consumption in the network, in joules, from the reception of application packets broadcast by neighbor sensor nodes, i.e., gossiped packets (E_{recp}); the number of times that the multivariate spatial correlation was detected by sensor nodes (C_{spatial}); and the percentage of energy saved by the versions with linear regression (versions 2 to 4) relative to the original version (E_{saved}). The predictor efficiency metrics are: the prediction error rate (SSerr); and the predictor improvement based on the coefficient of determination (R^{2}).

Energy waste in data communication is addressed by the energy consumption metrics. The packet length is smaller in the initial application versions and larger in the final ones. Thus, the energy consumption tends to be higher in the final version.

The spatial correlation is measured by the number of times it is detected, showing how much energy an application version saved by not sending a large data packet. No significant differences are expected between the application versions, since this mechanism has not been modified, only adapted to each version.

SSerr shows how large the prediction error of each application version is relative to the others. The initial versions probably have a higher prediction error than the last versions, because the use of correlated variables in prediction leads to fewer errors.

Coefficient of determination measures the improvement of predictor in relation to its error. Unlike SSerr, the improvement tends to be better in final versions.

Our work aims to improve prediction accuracy rather than to save more energy than current solutions. Nevertheless, we have checked the energy impact of our solution against current solutions to measure how feasible it is in a WSN.

Three characteristics are important to set up scenarios in our simulation. The first one is the behavior of the light variable. Sometimes, the light variable changes easily and leads to different results in the prediction, due to the variation of correlation between gathered variables. It can be presented in two forms, constant and not constant. Temperature and humidity variables are usually correlated,

The second one is the topology which can increase the energy consumption in random deployments. Usually, all application versions suffer the same effects on energy consumption, since the topology will not affect the prediction.

The last one is the network density which also influences the energy consumption, but does not affect the prediction. When the network density is high,

Then, in order to explain the simulation scenarios, we summarize the characteristics in

The Link Layer Model tool of TinyOS 2.x was used to create the grid and random topologies. In each scenario several nodes densities are used and summarized in

The coefficient

Given this, the temperature variable was used as independent variable for the application versions 3 and 4. Application version 2 uses only the time variable as independent variable and application version 3 uses only the temperature variable as independent variable, instead of the time variable. On the other hand, application version 4 uses the time variable and temperature variable as independent variables.

The main goal of our proposed solution is not to reduce energy consumption compared to the existing approaches based on simple linear regression, but rather to find the best trade-off between energy consumption and prediction accuracy. In our method we use samples of the temperature variable to predict the humidity and light variables. While we slightly increase energy consumption compared to simple linear regression, we improve on the prediction accuracy it achieves.

_{trans}) and reception (E_{recp}) of data by sensor nodes. We observed the impact of our method by comparing the energy consumption of the multiple linear regression (our solution) to the simple linear regression (current works).

Under all conditions, the energy consumption is greater in the application versions that use simple or multiple linear regression based on the temperature variable instead of the time variable. This happens because when using the independent variable gathered by the sensor nodes, their reading samples have to be sent to the sink. Hence, they consume more energy than the application version that uses time (the counter) as independent variable.

The energy consumption due to message exchanges between sensor nodes in scenarios #1, #2, #5 and #6 is presented in _{trans} relation between the approaches remains constant, even when scalability changes, and the approaches which use gathered variables consume twice as much E_{trans} as the approaches which do not use them. The relation between the E_{trans} of the original application and the approaches with a gathered variable is about 0.17, and with the current approach it is about 0.08. In scenarios #3 and #4, the communication failure affected the energy consumption [

We checked that the energy consumption of the data sent by sensor nodes in the second application version (a.k.a. SimpleCount) is the lowest [

Nevertheless, we can also see that the energy consumption of the third and fourth application versions (a.k.a. SimpleTemperature and Multiple, respectively) is much closer to that of SimpleCount than to that of the first application version (a.k.a. Original). Thus, as stated before, our solution uses double the energy of the current solutions, but its energy consumption is still low when compared to the version without prediction (original version).

The amount of energy spent to receive messages (E_{recp}) from application broadcast on the transmission of neighbor sensor nodes (routing gossip) is observed in _{recp} of our approach is about three times smaller than that of the original application, although it is still higher than that of the current approach. More details of the percentage of energy saved by the three application versions that use simple or multiple linear regression, relative to the original application version, can be seen in

The results of spatial correlation (C_{spatial}) showed no differences between our approach and current approaches, but they indicate that this mechanism is essential to save energy. The number of times that the correlation was detected is greater in the scenarios where there is a fixed density of 0.25 sensor nodes per m^{2},

The SSerr and R^{2} results from prediction of humidity [

The SSerr and R^{2} results from prediction of the light [

We also observed that there are different behaviors in the results [

Therefore, we suggest that by using prediction based on multiple linear regression, the sensor node checks the improvements in an adaptive way, as in Jiang ^{2}.

After the results above, we decided to repeat the simulation to evaluate the energy consumption and prediction accuracy performance and analyze the behavior of our solution. The trade-off between these two performances is intrinsic because, in order to increase prediction accuracy, our solution sends samples gathered from a variable. Therefore, our solution consumes more energy than current solutions.

The relationship between energy consumption and prediction accuracy does not depend on the number of sensor nodes, because prediction is done in a distributed and localized way. Instead, it depends on the number of samples: when we increase the number of samples, energy consumption decreases, SSerr increases and R^{2} decreases. Since a WSN cannot spend much energy, scenario #6 was simulated again, because it had better performance results than the other scenarios.

The number of samples was set to 6, 8 and 10, which we respectively named Scenario #6C, Scenario #6B and Scenario #6A. The energy consumption results for messages sent by the sensor nodes show that, when the number of samples decreases from 10 (Scenario #6A with 100 sensor nodes) to 6 (Scenario #6C with 100 sensor nodes), the E_{trans} of the network increases from 1,834.32 μJ to 2,465.70 μJ. This happens because, by reducing the number of samples, more packets are sent. The E_{recp} results show that the energy consumption increased from 489,567.40 μJ (Scenario #6A with 100 sensor nodes) to 578,866.80 μJ (Scenario #6C with 100 sensor nodes).

The prediction improvement of humidity for the application version 4 (multiple linear regression) decreased from 0.995868 to 0.978811 [

The results for light level prediction are slightly different from the results for humidity, but they display the same behavior. The improvement of the light level prediction for application version 4 (multiple linear regression) decreased from 0.999752 to 0.974384 [

The results obtained from light variable prediction were different from the results obtained from humidity variable prediction. We therefore checked the behavior of the three gathered variables and used them in our performance evaluation.

Several sensor boards are able to monitor more than one variable (multisensor), adding new challenges, such as increasing precision by reducing prediction error. In this paper, we propose a method to improve prediction accuracy in WSN data reduction by applying multivariate spatial and temporal correlations.

Prediction accuracy of correlation mechanisms depends on the correlation analysis to determine which variable is highly correlated. The current approaches are not focused on the analysis of correlations and hence the prediction errors tend to be higher. The correlation analysis results of

Related works use simple linear regression with time as the independent variable, which makes them more susceptible to errors than our proposal. Although multiple linear regression spends more energy than simple linear regression, it may be the best choice, especially for accuracy-sensitive applications (e.g., precision agriculture).

We conducted simulations involving simple and multiple linear regression functions (application versions 2 to 4) to assess our prediction solution. The values of the residual sum of squares (SSerr) and the coefficient of determination (R^{2}) show that prediction accuracy can be lowest when simple linear regression uses time as the explanatory variable. These results also show that the best prediction accuracy is obtained when multiple linear regression is used. The multivariate correlation method outperforms some current methods by about 50% in humidity prediction and 21% in light prediction.
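The comparison between the two regression types can be illustrated with a minimal sketch. It uses the standard definitions SSerr = Σ(y − ŷ)^{2} and R^{2} = 1 − SSerr/SStot, fits humidity once against time alone and once against temperature and light together, and reports both metrics. The readings are synthetic values invented for this illustration, not data from the trace, and the choice of predictors and the function names `solve`, `fit` and `ss_err_and_r2` are assumptions rather than the simulated application code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares fit of y = b0 + b1*x1 + ... using the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def ss_err_and_r2(columns, y, coef):
    """Return (SSerr, R^2) of the fitted model on the given samples."""
    pred = [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(y))]
    ss_err = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_err, 1.0 - ss_err / ss_tot


# Synthetic window of ten samples (hypothetical values, for illustration only).
time = [float(t) for t in range(10)]
temperature = [20.0, 21.5, 23.0, 22.0, 20.5, 19.0, 20.0, 22.5, 24.0, 21.0]
light = [100.0, 180.0, 260.0, 220.0, 130.0, 60.0, 110.0, 240.0, 300.0, 150.0]
humidity = [49.1, 45.85, 43.0, 44.7, 48.05, 50.8, 49.0, 43.75, 41.1, 46.9]

# Simple linear regression: humidity explained by time only.
ss_time, r2_time = ss_err_and_r2([time], humidity, fit([time], humidity))

# Multiple linear regression: humidity explained by temperature and light.
predictors = [temperature, light]
ss_multi, r2_multi = ss_err_and_r2(predictors, humidity, fit(predictors, humidity))

print(f"time only:           SSerr={ss_time:.3f}  R^2={r2_time:.3f}")
print(f"temperature + light: SSerr={ss_multi:.3f}  R^2={r2_multi:.3f}")
```

In this synthetic window humidity follows temperature and light rather than elapsed time, so the time-based fit leaves a large residual while the multivariate fit does not. Real traces will not separate as cleanly.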

Finally, we have carried out other work aimed at improving WSN solutions [

The authors would like to thank the Brazilian funding agencies FAPEPI (Ph.D. Scholarship) and CNPq for their financial support.

Operation of the monitoring system.

Operation of the monitoring system based on prediction proposed by current authors (simple linear regression).

Operation of the monitoring system based on prediction proposed in this paper (multiple linear regression).

Proposed mechanism diagram.

Readings packet length (version 1).

Coefficients packet length (version 2).

Correlation packet length (version 2).

Coefficients packet length (version 3).

Correlation packet length (version 3).

Readings packet length (version 3).

Coefficients packet length (version 4).

Correlation packet length (version 4).

Readings packet length (version 4).

Average energy of the radio consumed by messages sent to the sink:

Average energy of the radio consumed by messages received for gossip routing:

Performance evaluation of the prediction accuracy over one day of the trace for the application versions that use linear regression (app v2 to app v4):

Improvement and SSerr of the prediction performed by the application versions for the humidity variable with a varying number of samples (Scenario #6A: ten samples, Scenario #6B: eight samples and Scenario #6C: six samples):

Improvement and SSerr of the prediction performed by the application versions for the light variable with a varying number of samples (Scenario #6A: ten samples, Scenario #6B: eight samples and Scenario #6C: six samples):

Epochs from a collection day in which the light variable is less correlated with the temperature and humidity variables.

Comparison of the main characteristics of solutions.

Goel and Imielinski [ | Centralized | Yes | No | MPEG Standard-like | No | No |
Xu and Lee [ | Localized | Yes | Yes | Dual prediction | No | No |
Matos | Distributed | No | Yes | Simple Linear Regression | No | No |
Silva | Distributed | No | Yes | Principal Component Analysis | No | No |
Our solution | Distributed | Yes | Yes | Multiple Linear Regression | Yes | Yes |

Characteristics of the simulation scenarios.

| |||||||
---|---|---|---|---|---|---|---|

| |||||||

1 | X | X | X | ||||

2 | X | X | X | ||||

3 | X | X | X | ||||

4 | X | X | X | ||||

5 | X | X | X | ||||

6 | X | X | X |

Network density in the simulation scenarios.

^{2}) by scenarios | ||||||
---|---|---|---|---|---|---|

4 | 0.1600 | 0.1600 | 0.2500 | 0.2500 | 0.2500 | 0.2500 |

9 | 0.0900 | 0.0900 | 0.1111 | 0.1111 | 0.2500 | 0.2500 |

16 | 0.0711 | 0.0711 | 0.0625 | 0.0625 | 0.2500 | 0.2500 |

25 | 0.0625 | 0.0625 | 0.0400 | 0.0400 | 0.2500 | 0.2500 |

36 | 0.0576 | 0.0576 | 0.0278 | 0.0278 | 0.2500 | 0.2500 |

49 | 0.0544 | 0.0544 | 0.0204 | 0.0204 | 0.2500 | 0.2500 |

64 | 0.0522 | 0.0522 | 0.0156 | 0.0156 | 0.2500 | 0.2500 |

81 | 0.0506 | 0.0506 | 0.0123 | 0.0123 | 0.2500 | 0.2500 |

100 | 0.0494 | 0.0494 | 0.0100 | 0.0100 | 0.2500 | 0.2500 |

Results of the correlation analysis.

1.0000 | −0.7987 | 0.4550 | −0.2681 | |

−0.7987 | 1.0000 | −0.2489 | 0.1987 | |

0.4550 | −0.2489 | 1.0000 | −0.1807 | |

−0.2681 | 0.1987 | −0.1807 | 1.0000 |
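A matrix of this kind can be produced, and used to pick the explanatory variable, with a few lines of code. The sketch below assumes the entries are Pearson correlation coefficients, which this section does not state explicitly. The readings are synthetic values invented for illustration, and the names `pearson`, `readings` and `target` are ours.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic window of ten samples (hypothetical values, for illustration only).
readings = {
    "time": [float(t) for t in range(10)],
    "temperature": [20.0, 21.5, 23.0, 22.0, 20.5, 19.0, 20.0, 22.5, 24.0, 21.0],
    "humidity": [50.2, 46.8, 44.2, 45.8, 49.2, 51.8, 50.2, 44.8, 42.2, 47.8],
    "light": [100.0, 300.0, 150.0, 400.0, 120.0, 80.0, 350.0, 200.0, 260.0, 90.0],
}

target = "humidity"

# Correlate every other variable with the target and rank by absolute value,
# since a strong negative correlation is as useful for prediction as a positive one.
scores = {name: pearson(column, readings[target])
          for name, column in readings.items() if name != target}
best = max(scores, key=lambda name: abs(scores[name]))

for name, r in sorted(scores.items(), key=lambda item: -abs(item[1])):
    print(f"{name:12s} r = {r:+.4f}")
print("most correlated with", target, "->", best)
```

Ranking by absolute value matters here because the strongest entry in the matrix above is negative.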

Percentage of energy saved in sending and receiving data relative to the original application version.

| ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

2 | 0.93 | 0.93 | 0.93 | 0.93 | 0.93 | 0.89 | 0.93 | 0.92 | 0.92 | 0.87 | 0.92 | 0.87 |

3 | 0.87 | 0.87 | 0.87 | 0.87 | 0.87 | 0.82 | 0.87 | 0.85 | 0.86 | 0.82 | 0.86 | 0.82 |

4 | 0.86 | 0.86 | 0.86 | 0.86 | 0.86 | 0.81 | 0.86 | 0.84 | 0.85 | 0.79 | 0.85 | 0.80 |

Performance results of the SSerr and R^{2} from all versions in scenarios #1, #4 and #6.

Variable | Version 2 SSerr | Version 2 R^{2} | Version 3 SSerr | Version 3 R^{2} | Version 4 SSerr | Version 4 R^{2} |
---|---|---|---|---|---|---|
Temperature | 0.210300 | 0.296891 | − | − | − | − |
Humidity | 9.355700 | 0.025813 | 2.033940 | 0.788210 | 0.203488 | 0.978811 |
Light | 2.121380 | 0.000000 | 0.073135 | 0.965525 | 0.054342 | 0.974384 |

Performance results of the SSerr and R^{2} from all versions in scenarios #2, #3 and #5.

Variable | Version 2 SSerr | Version 2 R^{2} | Version 3 SSerr | Version 3 R^{2} | Version 4 SSerr | Version 4 R^{2} |
---|---|---|---|---|---|---|
Temperature | 10.321800 | 0.290535 | − | − | − | − |
Humidity | 4.964100 | 0.476813 | 8.583820 | 0.095316 | 0.185308 | 0.980470 |
Light | 140.150060 | 0.869629 | 794.135000 | 0.261311 | 1075.060000 | 0.000000 |