# Virtual Sensing and Sensors Selection for Efficient Temperature Monitoring in Indoor Environments

## Abstract

## 1. Introduction

## 2. Related Work

## 3. The Considered Setting

## 4. Descriptive Analysis

## 5. Data Pre-Processing

#### 5.1. Distance Metrics

- **Euclidean distance**: the length of the line segment between the two points $p_i$, $p_j$, i.e., the $L_2$-norm $\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}$;
- **Manhattan distance**: the $L_1$-norm of the distance, defined as $|x_i-x_j|+|y_i-y_j|$;
- **Chebyshev distance**: the $L_\infty$-norm of the distance, defined as $\max\{|x_i-x_j|,\,|y_i-y_j|\}$;
- **Genetic Programming distance**: a combination of the previous three distances obtained by means of a genetic programming algorithm. The algorithm generates a computation tree whose leaves may contain the three aforementioned distance values or a randomly generated constant, and whose internal nodes are the scalar/vector operations defined as a set of primitives;
- **Pearson correlation**: it expresses a possible linear relationship between the statistical variables given by the temperature values of the two sensors, and it is defined as
$$\frac{\sum_{k=1}^{n}\left(t_i^k-\overline{t_i}\right)\left(t_j^k-\overline{t_j}\right)}{\sqrt{\sum_{k=1}^{n}\left(t_i^k-\overline{t_i}\right)^2}\,\sqrt{\sum_{k=1}^{n}\left(t_j^k-\overline{t_j}\right)^2}};$$
- **Kendall correlation**: it expresses a possible ordinal (non-linear) association between the statistical variables given by the temperature values of the two sensors, and it is defined as
$$\frac{2}{n(n-1)}\sum_{k<l}\operatorname{sgn}\left(t_i^k-t_i^l\right)\operatorname{sgn}\left(t_j^k-t_j^l\right);$$
- **SHAP distance**: SHAP is a game-theoretic method that allows one to evaluate the contributions to the final result of the different predictors used in a machine learning model, with the relevance of the contribution of a predictor to the model being proportional to its SHAP value [26]. In our case, SHAP values for a generic sensor $\mathit{sensor}_i$ are obtained from an XGBoostRegressor model [27] that predicts the temperature value of $\mathit{sensor}_i$ on the basis of the temperature values of the other sensors. It is worth noticing that such a metric is not symmetric;
- **SAX-CBD distance**: SAX-CBD is a Compression-Based Dissimilarity measure [28] based on the assumption that the size of the compressed file of the concatenation of two discrete time series is inversely proportional to the number of patterns that they share. As a preliminary step, the temperature values obtained from $\mathit{sensor}_i$ and $\mathit{sensor}_j$ are discretized by means of Symbolic Aggregate approXimation (SAX) [29]; then, the value of the distance is computed as
$$\frac{\mathit{size\_of}\left(\mathit{compress}(d_i \mathbin{+\!\!+} d_j)\right)}{\mathit{size\_of}\left(\mathit{compress}(d_i)\right)+\mathit{size\_of}\left(\mathit{compress}(d_j)\right)},$$
where $d_i$ and $d_j$ denote the discretized series of the two sensors and $\mathbin{+\!\!+}$ denotes concatenation.
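
Most of the metrics listed above can be sketched in a few lines of standard-library Python. The block below is only an illustration under stated assumptions: the function names are hypothetical, the SAX step uses a four-symbol alphabet with the standard Gaussian quartile breakpoints and skips the piecewise aggregation step, and zlib (DEFLATE) stands in for the compressor. The Genetic Programming and SHAP distances are omitted because they require a trained model.

```python
import math
import zlib
from statistics import mean, pstdev


def euclidean(p, q):
    # L2-norm between two sensor positions (x, y)
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    # L1-norm between two sensor positions
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chebyshev(p, q):
    # L-infinity norm between two sensor positions
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def pearson(a, b):
    # Linear correlation between two equally long temperature series
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a)) * math.sqrt(
        sum((y - mb) ** 2 for y in b)
    )
    return num / den


def _sgn(v):
    return (v > 0) - (v < 0)


def kendall(a, b):
    # Rank correlation as in the formula above (no correction for ties)
    n = len(a)
    s = sum(
        _sgn(a[k] - a[l]) * _sgn(b[k] - b[l])
        for k in range(n)
        for l in range(k + 1, n)
    )
    return 2 * s / (n * (n - 1))


def sax(series, breakpoints=(-0.6745, 0.0, 0.6745), alphabet="abcd"):
    # Assumption: z-normalise, then map each value to one of four symbols;
    # the piecewise aggregate approximation of full SAX is skipped here.
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma for v in series]
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in z)


def cbd(d_i, d_j):
    # Compression-based dissimilarity of two symbol strings (zlib assumed)
    def size(s):
        return len(zlib.compress(s.encode("ascii")))

    return size(d_i + d_j) / (size(d_i) + size(d_j))
```

Series that share many patterns compress well once concatenated, so `cbd` returns a smaller value for them than for unrelated series.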

- For each sensor $\mathit{sensor}_i$, the rank $\mathit{rank}_{d,i}$ of the other sensors according to the metric $d$ was considered;
- Then, we proceeded in an iterative way: for $k\in \{0,\dots ,10\}$, $k$ sensors among the worst ones in $\mathit{rank}_{d,i}$ were discarded and a regression model was built. In more detail:
  - The sensors whose temperature values were to be used as predictors were determined considering the set of all sensors, except $\mathit{sensor}_i$ and the $k$ sensors located in the last $k$ positions of $\mathit{rank}_{d,i}$;
  - Exploiting the training set data, a linear regression model was built to predict the temperature of $\mathit{sensor}_i$ using as input the temperature values of the sensors selected in the previous step and the features moy_sin, moy_cos, dow_sin, dow_cos, seconds_from_midnight_sin, and seconds_from_midnight_cos;
  - The resulting model was evaluated on the test set.
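
The iterative procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `evaluate_discarding`, the dictionary-based data layout, the use of NumPy least squares for the linear regression, and the mean absolute error as evaluation measure are all assumptions made for the example.

```python
import numpy as np


def evaluate_discarding(train, test, target, ranking, time_features, max_k=10):
    """Return {k: test error} after dropping the k worst-ranked sensors.

    train, test   -- dict mapping column name -> 1-D array (hypothetical layout)
    target        -- name of the sensor to predict
    ranking       -- names of the other sensors, best first, for one metric d
    time_features -- names of the cyclic time features to always include
    """
    errors = {}
    for k in range(max_k + 1):
        # Keep every ranked sensor except those in the last k positions
        kept = ranking[: len(ranking) - k]
        cols = list(kept) + list(time_features)

        def design(data):
            # Predictor columns plus a column of ones for the intercept
            n = len(data[target])
            return np.column_stack([data[c] for c in cols] + [np.ones(n)])

        # Ordinary least squares fit on the training set
        coef, *_ = np.linalg.lstsq(design(train), train[target], rcond=None)

        # Evaluation on the test set (mean absolute error, assumed here)
        pred = design(test) @ coef
        errors[k] = float(np.mean(np.abs(pred - test[target])))
    return errors
```

Plotting the returned errors against $k$ for each metric gives a curve of the kind used to compare how well the different rankings identify uninformative sensors.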

#### 5.2. Sensors Selection

#### 5.3. Feature Selection

## 6. Predictive Analysis

#### 6.1. Baseline Methods

#### 6.2. Particle Filters

#### 6.3. Machine Learning Approaches

#### 6.4. Deep Learning Approach

- The first (temporal) part (LSTM on the upper left side of Figure 14) takes a history of the four (standardized) reference temperatures as input. The history is processed by a unidirectional LSTM layer of 128 units, of which only the last output is retained, followed by LayerNormalization;
- The second (atemporal) part (FCNN_1 on the bottom left side of Figure 14) takes as input the seven remaining (standardized) attributes resulting from the feature selection process (Section 5). These attributes do not have any significant history, but are still important to generate the final output, since they allow the model to pinpoint the prediction in space and time. The seven features are passed to a Dense layer of 64 neurons, followed by a ReLU activation function and a BatchNormalization layer;
- The third part (FCNN_2 on the right side of Figure 14) concatenates the outputs of the first two parts, generating a tensor of size 192. BatchNormalization and Dropout with a rate of 0.1 are then applied to the result of the concatenation. Next, the data go through a Dense layer of 128 units, followed by a ReLU activation function and a BatchNormalization layer. The final output is produced by a single-unit Dense layer with a linear activation function.
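
The data flow through the three parts can be summarized in formulas. This is only a notational sketch: the symbols $X$ (the history of the four reference temperatures over $T$ time steps, with $T$ left unspecified here), $u$ (the seven atemporal attributes), the weights $W_1, W_2, w_3$, and the biases $b_1, b_2, b_3$ are introduced for illustration and do not appear in the description above.

$$\begin{aligned}
h &= \mathrm{LayerNorm}\big(\mathrm{LSTM}_{128}(X)_{T}\big) \in \mathbb{R}^{128}, \qquad X \in \mathbb{R}^{T\times 4},\\
a &= \mathrm{BatchNorm}\big(\mathrm{ReLU}(W_1 u + b_1)\big) \in \mathbb{R}^{64}, \qquad u \in \mathbb{R}^{7},\\
z &= \mathrm{Dropout}_{0.1}\big(\mathrm{BatchNorm}([h;\, a])\big) \in \mathbb{R}^{192},\\
g &= \mathrm{BatchNorm}\big(\mathrm{ReLU}(W_2 z + b_2)\big) \in \mathbb{R}^{128},\\
\hat{t} &= w_3^{\top} g + b_3 \in \mathbb{R}.
\end{aligned}$$

The size of the concatenated tensor follows directly from the two branches: $128 + 64 = 192$.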

#### 6.5. Prediction Intervals Analysis

#### 6.6. Errors per Single Sensor

#### 6.7. Effects of Reducing the Training Data Size

#### 6.8. Comparison with a Brute Force Approach

## 7. Discussion

## 8. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Liu, L.; Kuo, S.M.; Zhou, M. Virtual Sensing Techniques and their Applications. In Proceedings of the 2009 International Conference on Networking, Sensing and Control (ICNSC), Okayama, Japan, 26–29 March 2009; pp. 31–36.
2. Li, H.; Yu, D.; Braun, J.E. A Review of Virtual Sensing Technology and Application in Building Systems. HVAC&R Res. **2011**, 17, 619–645.
3. Saheba, R.; Rotea, M.; Wasynczuk, O.; Pekarek, S.; Jordan, B. Virtual Thermal Sensing for Electric Machines. IEEE Control Syst. Mag. **2010**, 30, 42–56.
4. Oktavia, E.; Mustika, I.W. Inverse Distance Weighting and Kriging Spatial Interpolation for Data Center Thermal Monitoring. In Proceedings of the 1st International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, Indonesia, 23–24 August 2016; pp. 69–74.
5. Doucet, A.; De Freitas, N.; Gordon, N. An Introduction to Sequential Monte Carlo Methods. In Sequential Monte Carlo Methods in Practice; Springer: Berlin, Germany, 2001; pp. 3–14.
6. Wang, J.; Zheng, Y.; Wang, P.; Gao, R.X. A Virtual Sensing Based Augmented Particle Filter for Tool Condition Prognosis. J. Manuf. Process. **2017**, 28, 472–478.
7. Montzka, C.; Moradkhani, H.; Weihermüller, L.; Franssen, H.J.H. Hydraulic Parameter Estimation by Remotely-sensed Top Soil Moisture Observations with the Particle Filter. J. Hydrol. **2011**, 399, 410–421.
8. Wang, J.; Xie, J.; Zhao, R.; Zhang, L.; Duan, L. Multisensory Fusion based Virtual Tool Wear Sensing for Ubiquitous Manufacturing. Robot. Comput. Integr. Manuf. **2017**, 45, 47–58.
9. Lary, D.J.; Faruque, F.S.; Malakar, N.; Moore, A. Estimating the Global Abundance of Ground Level Presence of Particulate Matter (PM2.5). Geospat. Health **2014**, 611–630.
10. Lary, D.J.; Alavi, A.H.; Gandomi, A.H.; Walker, A.L. Machine Learning in Geosciences and Remote Sensing. Geosci. Front. **2016**, 7, 3–10.
11. Rigol, J.P.; Jarvis, C.H.; Stuart, N. Artificial Neural Networks as a Tool for Spatial Interpolation. Int. J. Geogr. Inf. Sci. **2001**, 15, 323–343.
12. Snell, S.E.; Gopal, S.; Kaufmann, R.K. Spatial Interpolation of Surface Air Temperatures using Artificial Neural Networks: Evaluating their use for Downscaling GCMs. J. Clim. **2000**, 13, 886–895.
13. Xu, C.; Chen, H.; Wang, J.; Guo, Y.; Yuan, Y. Improving Prediction Performance for Indoor Temperature in Public Buildings based on a Novel Deep Learning Method. Build. Environ. **2019**, 148, 128–135.
14. Xue, H.; Jiang, W.; Miao, C.; Yuan, Y.; Ma, F.; Ma, X.; Wang, Y.; Yao, S.; Xu, W.; Zhang, A.; et al. DeepFusion: A Deep Learning Framework for the Fusion of Heterogeneous Sensory Data. In Proceedings of the 20th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Catania, Italy, 2–5 July 2019; pp. 151–160.
15. Ma, J.; Ding, Y.; Cheng, J.C.; Jiang, F.; Wan, Z. A Temporal-spatial Interpolation and Extrapolation Method based on Geographic Long Short-Term Memory Neural Network for PM2.5. J. Clean. Prod. **2019**, 237, 117729.
16. Ma, R.; Liu, N.; Xu, X.; Wang, Y.; Noh, H.Y.; Zhang, P.; Zhang, L. Fine-Grained Air Pollution Inference with Mobile Sensing Systems: A Weather-Related Deep Autoencoder Model. Proc. ACM Interact. Mobile Wearable Ubiquitous Technol. (IMWUT) **2020**, 4, 1–21.
17. Najjar, N.; Gupta, S.; Hare, J.; Kandil, S.; Walthall, R. Optimal Sensor Selection and Fusion for Heat Exchanger Fouling Diagnosis in Aerospace Systems. IEEE Sens. J. **2016**, 16, 4866–4881.
18. Palmer, K.A.; Bollas, G.M. Sensor Selection Embedded in Active Fault Diagnosis Algorithms. In IEEE Transactions on Control Systems Technology; IEEE: Piscataway, NJ, USA, 2019.
19. Jin, H.; Su, L.; Chen, D.; Nahrstedt, K.; Xu, J. Quality of Information Aware Incentive Mechanisms for Mobile Crowd Sensing Systems. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Hangzhou, China, 22–25 June 2015; pp. 167–176.
20. Jawaid, S.T.; Smith, S.L. Submodularity and Greedy Algorithms in Sensor Scheduling for Linear Dynamical Systems. Automatica **2015**, 61, 282–288.
21. Vitus, M.P.; Zhang, W.; Abate, A.; Hu, J.; Tomlin, C.J. On Efficient Sensor Scheduling for Linear Dynamical Systems. Automatica **2012**, 48, 2482–2493.
22. Gupta, V.; Chung, T.H.; Hassibi, B.; Murray, R.M. On a Stochastic Sensor Selection Algorithm with Applications in Sensor Scheduling and Sensor Coverage. Automatica **2006**, 42, 251–260.
23. Wu, J.; Jia, Q.S.; Johansson, K.H.; Shi, L. Event-based Sensor Data Scheduling: Trade-off between Communication Rate and Estimation Quality. IEEE Trans. Autom. Control **2012**, 58, 1041–1046.
24. Mo, Y.; Ambrosino, R.; Sinopoli, B. Sensor Selection Strategies for State Estimation in Energy Constrained Wireless Sensor Networks. Automatica **2011**, 47, 1330–1338.
25. Hare, J.Z.; Gupta, S.; Wettergren, T.A. POSE: Prediction-based Opportunistic Sensing for Energy Efficiency in Sensor Networks using Distributed Supervisors. IEEE Trans. Cybern. **2017**, 48, 2114–2127.
26. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
27. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
28. Keogh, E.; Lonardi, S.; Ratanamahatana, C.A.; Wei, L.; Lee, S.H.; Handley, J. Compression-based Data Mining of Sequential Data. Data Min. Knowl. Discov. **2007**, 14, 99–129.
29. Lin, J.; Keogh, E.; Wei, L.; Lonardi, S. Experiencing SAX: A Novel Symbolic Representation of Time Series. Data Min. Knowl. Discov. **2007**, 15, 107–144.
30. Oswal, S.; Singh, A.; Kumari, K. Deflate Compression Algorithm. Int. J. Eng. Res. Gen. Sci. **2016**, 4, 430–436.
31. Fortin, F.A.; De Rainville, F.M.; Gardner, M.A.G.; Parizeau, M.; Gagné, C. DEAP: Evolutionary Algorithms Made Easy. J. Mach. Learn. Res. **2012**, 13, 2171–2175.
32. Luke, S.; Panait, L. Fighting Bloat with Nonparametric Parsimony Pressure. In Proceedings of the 7th International Conference on Parallel Problem Solving from Nature (PPSN), Granada, Spain, 7–11 September 2002; pp. 411–421.
33. Borda, J.D. Mémoire sur les élections au scrutin. In Histoire de l’Académie Royale des Sciences pour 1781; Gallica: Paris, France, 1784.
34. Satopaa, V.; Albrecht, J.; Irwin, D.; Raghavan, B. Finding a “kneedle” in a haystack: Detecting knee points in system behavior. In Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS) Workshops, Minneapolis, MN, USA, 20–24 June 2011; pp. 166–171.
35. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
36. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An Imperative Style, High-performance Deep Learning Library. Adv. Neural Inform. Proces. Syst. **2019**, 32, 8026–8037.
37. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. **2001**, 29, 1189–1232.
38. Bergstra, J.; Yamins, D.; Cox, D. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013; pp. 115–123.
39. Koenker, R.; Hallock, K.F. Quantile Regression. J. Econ. Perspect. **2001**, 15, 143–156.

**Figure 1.** Location of the sensors in the considered premises. The blue cells represent the reference sensors that will be selected during the sensors selection phase.

**Figure 3.** Pearson (lower triangular part) and Kendall (upper triangular part) correlation values among the recorded temperatures.

**Figure 5.** Values assigned to (dow_sin, dow_cos) after a trigonometric transformation of the dow feature, where dow ranges from 0 (Monday) to 6 (Sunday).

**Figure 6.** Values assigned to (moy_sin, moy_cos) after a trigonometric transformation of the moy feature, where moy ranges from 0 (January) to 11 (December).

**Figure 7.** Values assigned to (sec_from_midnight_sin, sec_from_midnight_cos) after a trigonometric transformation of the sec_from_midnight feature, ranging from 0 to 86,399.
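
A trigonometric transformation of this kind is commonly implemented by mapping each value onto the unit circle. The sketch below assumes the usual form $\sin(2\pi v/P)$, $\cos(2\pi v/P)$ with period $P$ equal to 7, 12, and 86,400 for the three features; the exact formula used for the figures is not given in the captions, and the function name `cyclic_encode` is hypothetical.

```python
import math


def cyclic_encode(value, period):
    # Map a periodic value onto the unit circle so that the last and the
    # first value of a cycle (e.g., Sunday and Monday) end up close together.
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


dow_sin, dow_cos = cyclic_encode(0, 7)          # Monday
moy_sin, moy_cos = cyclic_encode(11, 12)        # December
sec_sin, sec_cos = cyclic_encode(43200, 86400)  # noon
```

With this encoding, the distance between the points for Sunday and Monday equals the distance between those for Monday and Tuesday, which a plain integer encoding of the day of the week would not guarantee.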

**Figure 9.** Performance of linear regression, evaluated discarding sensors based on training set ranks. The dashed vertical line represents the elbow of the Pearson error graph.

**Figure 10.** Weighted Borda count vote for each sensor. The vertical line represents the elbow of the graph, and it separates the selected sensors from the discarded ones.

**Figure 11.** Boxplots of the 95th percentile of the error provided by the XGBoost models built on the 5 spatial distances considered in this work.

**Figure 12.** SHAP values of the attributes considered in the second step of the feature selection process.

**Figure 15.** Linear regression prediction intervals related to sensor 3 test data with $\gamma =0.025$ and $\gamma =0.975$.

**Figure 16.** Gradient boosting regression prediction intervals related to sensor 3 test data with $\gamma =0.025$ and $\gamma =0.975$.

**Figure 18.** Sensor raspihat07 temperature values related to the prediction error outliers compared with the three nearest-neighbour sensors.

**Figure 20.** Results obtained from XGBoost on all possible combinations of $k$ reference sensors, with $k\in \{1,\dots ,4\}$. For each value of $k$, the vertical line represents the extent of the errors given by the different combinations, while the dots represent the median error. The red dashed horizontal line represents the error obtained by the subset of reference sensors selected by our approach.

| Parameter Name | $\mathit{max\_depth}$ | $\mathit{learning\_rate}$ | $\mathit{n\_estimators}$ | $\mathit{reg\_alpha}$ | $\mathit{reg\_lambda}$ | $\mathit{gamma}$ | $\mathit{subsample}$ | $\mathit{colsample\_bytree}$ | $\mathit{min\_child\_weight}$ |
|---|---|---|---|---|---|---|---|---|---|
| Value | 16 | 0.015 | 350 | 78.87396 | 0.50044 | 5.95353 | 0.66425 | 0.65694 | 1.0 |
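
For convenience, the values in the table can be written as a Python dictionary. The sketch assumes that they are meant for an XGBoost regressor, whose scikit-learn style constructor accepts keyword arguments with these names; the helper `build_model` is hypothetical and imports xgboost lazily, so the dictionary can be used even where the library is not installed.

```python
# Hyperparameter values copied from the table above
xgb_params = {
    "max_depth": 16,
    "learning_rate": 0.015,
    "n_estimators": 350,
    "reg_alpha": 78.87396,
    "reg_lambda": 0.50044,
    "gamma": 5.95353,
    "subsample": 0.66425,
    "colsample_bytree": 0.65694,
    "min_child_weight": 1.0,
}


def build_model(params=None):
    # Assumption: the table refers to xgboost.XGBRegressor
    from xgboost import XGBRegressor

    return XGBRegressor(**(xgb_params if params is None else params))
```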

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Brunello, A.; Urgolo, A.; Pittino, F.; Montvay, A.; Montanari, A.
Virtual Sensing and Sensors Selection for Efficient Temperature Monitoring in Indoor Environments. *Sensors* **2021**, *21*, 2728.
https://doi.org/10.3390/s21082728

**AMA Style**

Brunello A, Urgolo A, Pittino F, Montvay A, Montanari A.
Virtual Sensing and Sensors Selection for Efficient Temperature Monitoring in Indoor Environments. *Sensors*. 2021; 21(8):2728.
https://doi.org/10.3390/s21082728

**Chicago/Turabian Style**

Brunello, Andrea, Andrea Urgolo, Federico Pittino, András Montvay, and Angelo Montanari.
2021. "Virtual Sensing and Sensors Selection for Efficient Temperature Monitoring in Indoor Environments" *Sensors* 21, no. 8: 2728.
https://doi.org/10.3390/s21082728