Temporal and Spatial Nearest Neighbor Values Based Missing Data Imputation in Wireless Sensor Networks
Abstract
1. Introduction
2. Related Work
- Linear interpolation model (LIN) Algorithm [11].
- K-nearest temporal neighbors (TKNN) Algorithm.
- K-nearest spatial neighbors (SKNN) Algorithm.
- Applying K-nearest neighbor estimation (AKE) Algorithm [14].
- Data estimation using statistical model (DESM) Algorithm [15].
- Existing work considers the temporal and spatial correlations either separately or simultaneously. LIN and TKNN use only data in the time dimension, while SKNN and AKE use only data in the space dimension. DESM combines temporal and spatial information and balances the two with a simple correlation coefficient. However, a method that exploits the temporal and spatial information more fully is still needed to improve imputation performance;
- Most research focuses on improving the accuracy of the imputation, but little of it considers how to increase the percentage of cases in which a missing value can be estimated;
- Little research has studied scenarios in which more than one sensor on a node loses data during operation.
- Temporal and spatial nearest neighbor values are defined from two perspectives, geometrical distance and data distance, which makes it possible to exploit the temporal and spatial correlations hidden in the data more fully. With these values as inputs, linear regression, the tool applied in the algorithm, produces better estimates for missing values, which improves the accuracy of the imputation;
- Four temporal and spatial nearest neighbor values are combined in the algorithm, which lets it adapt to situations in which part of the temporal or spatial information is unavailable. As a result, the percentage of cases in which a missing value can be estimated is improved;
- Three scenarios in which a WSN node carries more than one sensor are considered in the algorithm; they are consistent with the actual conditions under which sensor data are lost;
- To evaluate the algorithm more reasonably, and unlike other research work, the raw datasets are used directly in most of our experiments, without preprocessing steps such as mean imputation. In addition, nodes are chosen randomly from an indoor and an outdoor WSN dataset, and performance is evaluated on two measures: the percentage of cases in which a missing value can be estimated and the accuracy of the estimates for missing values.
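The two evaluation measures named in the last point can be illustrated with a minimal sketch. The exact formulas used in the experiments are not given in this part of the text, so the code below assumes the conventional definitions: the share of missing values that receive any estimate, and the root-mean-square error over the values that were estimated. The function names `pce` and `rmse` and the sample numbers are hypothetical.

```python
import math
from typing import Optional, Sequence


def pce(estimates: Sequence[Optional[float]]) -> float:
    """Percentage of missing values for which an estimate was produced.

    `estimates` has one entry per missing value; None means the
    imputation method could not produce an estimate for that case.
    """
    if not estimates:
        return 0.0
    produced = sum(1 for e in estimates if e is not None)
    return 100.0 * produced / len(estimates)


def rmse(estimates: Sequence[Optional[float]], truths: Sequence[float]) -> float:
    """Root-mean-square error over the cases that were estimated.

    Assumption: cases without an estimate are skipped rather than penalized.
    """
    pairs = [(e, t) for e, t in zip(estimates, truths) if e is not None]
    if not pairs:
        return float("nan")
    return math.sqrt(sum((e - t) ** 2 for e, t in pairs) / len(pairs))


# Hypothetical example: four missing values, one of which could not be estimated.
estimates = [20.0, None, 22.0, 19.0]
truths = [21.0, 20.5, 22.0, 17.0]
coverage = pce(estimates)
error = rmse(estimates, truths)
```

Reporting both numbers together matters because a method can lower its error simply by declining to estimate the hard cases; the coverage figure makes that trade-off visible.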
3. Materials and Methods
3.1. Missing Data in WSNs
- The communication unit on the node fails at some time points, and the data cannot be sent to the data processing center, so the measurements of both sensors (temperature and humidity) are missing at these time points;
- The communication unit on the node works properly, but because of a failure of the data processing center or a fault in the data transmission, the data coming from the node cannot be received by the data processing center; likewise, the temperature and humidity data are missing at these time points;
- The communication unit on the node works properly, but because of a fault in the sensors or in the node itself (for instance, battery capacity fade), one or both measurements become abnormal, for example a spike [17]. These data are sent to the data processing center but are removed because they are judged to be abnormal; therefore, the temperature data, the humidity data, or both are missing at this time point.
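The three cases above lead to two missing-data patterns: both measurements lost at a time point, or only one of them. A minimal sketch of how such patterns could be produced on a toy series follows. The helper names (`drop_both`, `drop_abnormal`), the sample readings, and the simple range check used to flag abnormal values are all assumptions for illustration; the text does not specify how abnormal readings are detected.

```python
from typing import List, Optional, Sequence, Tuple

# One reading per time point: (temperature, humidity); None marks a missing value.
Reading = Tuple[Optional[float], Optional[float]]
Bounds = Tuple[float, float]


def drop_both(readings: Sequence[Reading], lost_steps: Sequence[int]) -> List[Reading]:
    """First two cases: nothing from the node arrives at the given time points,
    so temperature and humidity are both missing there."""
    lost = set(lost_steps)
    return [(None, None) if i in lost else r for i, r in enumerate(readings)]


def in_range(value: Optional[float], bounds: Bounds) -> bool:
    """True if the value is present and inside the plausible range (assumed check)."""
    low, high = bounds
    return value is not None and low <= value <= high


def drop_abnormal(readings: Sequence[Reading],
                  temp_bounds: Bounds,
                  hum_bounds: Bounds) -> List[Reading]:
    """Third case: data arrive, but values judged abnormal are removed,
    so one or both measurements can be missing at a time point."""
    cleaned = []
    for temp, hum in readings:
        cleaned.append((temp if in_range(temp, temp_bounds) else None,
                        hum if in_range(hum, hum_bounds) else None))
    return cleaned


# Hypothetical readings: a temperature spike at index 2, a bad humidity value at index 3.
readings = [(21.5, 40.0), (21.6, 40.2), (122.0, 40.1), (21.7, -4.0)]
after_link_loss = drop_both(readings, [1])
after_cleaning = drop_abnormal(after_link_loss, (-10.0, 60.0), (0.0, 100.0))
```

Keeping the two measurements in one tuple per time point makes it easy to tell apart the case where a companion measurement is still available from the case where the whole reading is gone.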
3.2. Temporal and Spatial Nearest Neighbor Values for a Node in WSNs
3.2.1. Definition and Computation of Temporal and Spatial Nearest Neighbor Values
3.2.2. Correlations between a Node’s Raw Value and Its Spatial (Temporal) Nearest Neighbor Values
3.3. TSNN Imputation Algorithm
3.3.1. The Methods to Calculate the Imputation Value
Algorithm 1: TSNN algorithm. |
Input: sensor node ; temperature dataset , humidity dataset |
initial number of spatial nearest neighbors |
initial number of temporal nearest neighbors |
the spatial–temporal coefficient |
Output: |
1. Get and |
2. |
3. which do not contain missing values for node |
4. |
5. |
6. |
7. for j from 1 to do |
8. On get |
9. |
10. if then |
11. |
12. else if then |
13. |
14. else if then |
15. |
16. else if then |
17. |
18. end if |
19. end for |
20. On get and |
21. if and then |
22. |
23. return |
24. else if then |
25. Construct the estimation equation |
26. Using A and B, regress the coefficients of and compute |
27. else if then |
28. Construct the estimation equation |
29. Using A and C, regress the coefficients of and compute |
30. else if then |
31. Construct the estimation equation |
32. Using A and D, regress the coefficients of and compute |
33. else if then |
34. Construct the estimation equation |
35. Using A and E, regress the coefficients of and compute |
36. end if |
37. Compute the |
38. Compute the |
39. Compute the using and |
40. return |
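The regression steps in Algorithm 1 lost their symbols in this copy, so the sketch below does not reproduce them. It only illustrates the general idea under stated assumptions: fit a one-variable least-squares line between a node's past values and one neighbor value over rows where both are observed, use the neighbor's current value to estimate the missing one, then blend a spatial and a temporal estimate. The weighted-average form of the blend, the parameter `alpha`, and all function names and sample numbers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        # Constant predictor: fall back to the mean of y.
        return mean_y, 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regress_estimate(node_history: Sequence[float],
                     neighbor_history: Sequence[float],
                     neighbor_now: float) -> float:
    """Estimate the node's missing value from one neighbor value."""
    intercept, slope = fit_line(neighbor_history, node_history)
    return intercept + slope * neighbor_now


def combine(spatial: Optional[float], temporal: Optional[float],
            alpha: float) -> Optional[float]:
    """Blend the two estimates; if only one exists, use it (assumed rule)."""
    if spatial is None:
        return temporal
    if temporal is None:
        return spatial
    return alpha * spatial + (1.0 - alpha) * temporal


# Hypothetical complete rows for the node and its two kinds of neighbor values.
node_history = [20.0, 21.0, 22.0, 23.0]
spatial_history = [19.0, 20.0, 21.0, 22.0]
temporal_history = [10.0, 10.5, 11.0, 11.5]

spatial_est = regress_estimate(node_history, spatial_history, 23.0)
temporal_est = regress_estimate(node_history, temporal_history, 11.9)
final_est = combine(spatial_est, temporal_est, alpha=0.5)
```

Returning the single available estimate when the other is missing reflects the stated goal of still producing a value when part of the temporal or spatial information is unavailable; a regression on several neighbor values at once would need a multi-variable least-squares solver instead of `fit_line`.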
3.3.2. The Spatial–Temporal Coefficient
3.3.3. The Best for Spatial Nearest Neighbors and for Temporal Nearest Neighbors
4. Experimental Results
4.1. Evaluation Platform Used in Experiments
4.2. Evaluation Methods Used in Experiments
4.3. Experimental Datasets
4.3.1. Intel Lab Dataset
4.3.2. GreenOrbs Dataset
4.4. Evaluation of PCE
4.5. Evaluation of RMSE
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2002.
2. Hossain, T.; Inoue, S. A Comparative Study on Missing Data Handling Using Machine Learning for Human Activity Recognition. In Proceedings of the 2019 Joint 8th International Conference on Informatics, Electronics & Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Spokane, WA, USA, 30 May–2 June 2019; pp. 124–129.
3. Conroy, B.; Eshelman, L.; Potes, C.; Xu-Wilson, M. A dynamic ensemble approach to robust classification in the presence of missing data. Mach. Learn. 2016, 102, 443–463.
4. Gorodetsky, V.; Karsaev, O.; Samoilov, V. Direct Mining of Rules from Data with Missing Values. In Foundations of Data Mining and Knowledge Discovery; Studies in Computational Intelligence; Young Lin, T., Ohsuga, S., Liau, C.J., Hu, X., Tsumoto, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 6, pp. 233–264.
5. Tolle, G.; Polastre, J.; Szewczyk, R.; Culler, D.; Turner, N.; Tu, K.; Burgess, S.; Dawson, T.; Buonadonna, P.; Gay, D. A macroscope in the redwoods. In Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems (SenSys ’05), Association for Computing Machinery, New York, NY, USA, 2–4 November 2005; pp. 51–63.
6. Lin, W.C.; Tsai, C.F. Missing value imputation: A review and analysis of the literature (2006–2017). Artif. Intell. Rev. 2020, 53, 1487–1509.
7. Tkachenko, R.; Izonin, I.; Kryvinska, N.; Dronyuk, I.; Zub, K. An Approach towards Increasing Prediction Accuracy for the Recovery of Missing IoT Data based on the GRNN-SGTM Ensemble. Sensors 2020, 20, 2625.
8. Jiang, N. A Data Imputation Model in Sensor Databases. In High Performance Computing and Communications: Third International Conference, HPCC 2007, Houston, USA, September 2007 Proceedings; Perrott, R., Chapman, B.M., Subhlok, J., de Mello, R.F., Yang, L.T., Eds.; Lecture Notes in Computer Science; 2007; Volume 4782, pp. 26–28.
9. Pan, L.; Gao, H.; Li, J.; Gao, H.; Guo, X. CIAM: An adaptive 2-in-1 missing data estimation algorithm in wireless sensor networks. In Proceedings of the 2013 19th IEEE International Conference on Networks (ICON), Singapore, 11–13 December 2013; pp. 1–6.
10. Ren, X.; Sug, H.; Lee, H. A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis. J. Inf. Commun. Converg. Eng. 2015, 13, 105–112.
11. Pan, L.; Gao, H.; Gao, H.; Liu, Y. A Spatial Correlation Based Adaptive Missing Data Estimation Algorithm in Wireless Sensor Networks. Int. J. Wirel. Inf. Netw. 2014, 21, 280–289.
12. Tutz, G.; Ramzan, S. Improved methods for the imputation of missing data by nearest neighbor methods. Comput. Stat. Data Anal. 2015, 90, 84–99.
13. Troyanskaya, O.; Cantor, M.; Sherlock, G.; Brown, P.; Hastie, T.; Tibshirani, R.; Botstein, D.; Altman, R.B. Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17, 520–525.
14. Pan, L.; Li, J. K-Nearest Neighbor Based Missing Data Estimation Algorithm in Wireless Sensor Networks. Wirel. Sens. Netw. 2010, 2, 115–122.
15. Li, Y.; Ai, C.; Deshmukh, W.P.; Wu, Y. Data Estimation in Sensor Networks Using Physical and Statistical Methodologies. In Proceedings of the 28th International Conference on Distributed Computing Systems, Beijing, China, 17–20 June 2008; pp. 538–545.
16. Madden, S. Intel Lab Data. Available online: http://db.csail.mit.edu/labdata/labdata.html (accessed on 25 February 2021).
17. Ni, K.; Ramanathan, N.; Chehade, M.N.H.; Balzano, L.; Nair, S.; Zahedi, S.; Kohler, E.; Pottie, G.; Hansen, M.; Srivastava, M. Sensor network data fault types. ACM Trans. Sen. Netw. 2009, 5, 25.
18. GreenOrbs. Available online: http://www.greenorbs.org/ (accessed on 25 February 2021).
19. Rubin, D.B. Inference and Missing Data. Biometrika 1976, 63, 581–592.
20. Bo, C.; Ren, D.; Tang, S.; Li, X.Y.; Mao, X.; Huang, Q.; Mo, L.; Jiang, Z.; Sun, Y.; Liu, Y. Locating sensors in the forest: A case study in GreenOrbs. In Proceedings of the 2012 IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 1026–1034.
21. Garson, G.D. Missing Values Analysis and Data Imputation; Statistical Associates Publishers: Asheboro, NC, USA, 2015.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Deng, Y.; Han, C.; Guo, J.; Sun, L. Temporal and Spatial Nearest Neighbor Values Based Missing Data Imputation in Wireless Sensor Networks. Sensors 2021, 21, 1782. https://doi.org/10.3390/s21051782