Improved Interpolation and Anomaly Detection for Personal PM2.5 Measurement
Abstract
1. Introduction
2. Methods
2.1. Proposed Algorithm
2.2. Optimal Bandwidth Selection Based on Leave One Out Cross-Validation (LOOCV)
2.3. Kernel Regression-Based Interpolation Using Linear Interpolation
2.4. Context-Aware Anomaly Detection
3. Experimental Tests
3.1. Bootstrap Simulation on Real Dataset
3.2. Optimal Bandwidth Selection
3.3. Interpolation and Anomaly Detection with Real-World Personal Data
3.3.1. Application of Interpolation and Anomaly Detection Method with Real-World Personal Dataset 1
3.3.2. Application of Interpolation and Anomaly Detection Method with Real-World Personal Dataset 2
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Data Pattern | Interpolation Method | 40 Missing | 60 Missing | 80 Missing | 100 Missing
---|---|---|---|---|---
Up slope | Proposed | 26.919 | 26.426 | 25.733 | 28.599
Up slope | Spline | 37.052 | 36.679 | 35.682 | 37.908
Up slope | LOCF | 36.604 | 36.909 | 35.707 | 38.393
Up slope | Agg | 656.741 | 657.476 | 659.759 | 658.761
Down slope | Proposed | 24.687 | 25.704 | 27.840 | 28.945
Down slope | Spline | 26.741 | 27.817 | 30.233 | 31.834
Down slope | LOCF | 34.223 | 35.888 | 38.719 | 39.818
Down slope | Agg | 283.824 | 278.758 | 280.763 | 280.666
Flat 1 | Proposed | 4.551 | 3.981 | 4.341 | 4.376
Flat 1 | Spline | 3.871 | 3.326 | 3.555 | 3.535
Flat 1 | LOCF | 5.215 | 4.803 | 4.874 | 5.403
Flat 1 | Agg | 4.426 | 3.918 | 4.052 | 4.071
Flat 2 | Proposed | 3.633 | 3.751 | 3.694 | 3.656
Flat 2 | Spline | 3.505 | 3.610 | 3.514 | 3.493
Flat 2 | LOCF | 4.554 | 4.817 | 4.893 | 4.783
Flat 2 | Agg | 4.031 | 4.129 | 4.062 | 4.062
Flat 3 | Proposed | 2.335 | 2.310 | 2.429 | 2.325
Flat 3 | Spline | 2.229 | 2.080 | 2.237 | 2.187
Flat 3 | LOCF | 2.991 | 3.027 | 3.226 | 2.875
Flat 3 | Agg | 2.271 | 2.125 | 2.293 | 2.233
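The comparison in the table above can be read off programmatically. The following is a minimal sketch, not part of the original study: it assumes the tabulated values are error scores for which lower is better (the table itself does not name the metric), and the helper names `best_method` and `gap_vs_proposed` are hypothetical, introduced only for illustration.

```python
# Values copied verbatim from the table above; list positions correspond to
# 40, 60, 80 and 100 missing data points.
MISSING_COUNTS = [40, 60, 80, 100]

TABLE = {
    "Up slope": {
        "Proposed": [26.919, 26.426, 25.733, 28.599],
        "Spline": [37.052, 36.679, 35.682, 37.908],
        "LOCF": [36.604, 36.909, 35.707, 38.393],
        "Agg": [656.741, 657.476, 659.759, 658.761],
    },
    "Down slope": {
        "Proposed": [24.687, 25.704, 27.840, 28.945],
        "Spline": [26.741, 27.817, 30.233, 31.834],
        "LOCF": [34.223, 35.888, 38.719, 39.818],
        "Agg": [283.824, 278.758, 280.763, 280.666],
    },
    "Flat 1": {
        "Proposed": [4.551, 3.981, 4.341, 4.376],
        "Spline": [3.871, 3.326, 3.555, 3.535],
        "LOCF": [5.215, 4.803, 4.874, 5.403],
        "Agg": [4.426, 3.918, 4.052, 4.071],
    },
    "Flat 2": {
        "Proposed": [3.633, 3.751, 3.694, 3.656],
        "Spline": [3.505, 3.610, 3.514, 3.493],
        "LOCF": [4.554, 4.817, 4.893, 4.783],
        "Agg": [4.031, 4.129, 4.062, 4.062],
    },
    "Flat 3": {
        "Proposed": [2.335, 2.310, 2.429, 2.325],
        "Spline": [2.229, 2.080, 2.237, 2.187],
        "LOCF": [2.991, 3.027, 3.226, 2.875],
        "Agg": [2.271, 2.125, 2.293, 2.233],
    },
}


def best_method(pattern: str, n_missing: int) -> str:
    """Method with the smallest tabulated value.

    ASSUMPTION: lower values are better; the table does not state its metric.
    """
    col = MISSING_COUNTS.index(n_missing)
    scores = TABLE[pattern]
    return min(scores, key=lambda method: scores[method][col])


def gap_vs_proposed(pattern: str, method: str, n_missing: int) -> float:
    """Fraction by which the 'Proposed' value lies below `method`'s value."""
    col = MISSING_COUNTS.index(n_missing)
    proposed = TABLE[pattern]["Proposed"][col]
    other = TABLE[pattern][method][col]
    return (other - proposed) / other


if __name__ == "__main__":
    # Print, for each data pattern, which method has the smallest value
    # at each number of missing data points.
    for pattern in TABLE:
        winners = [best_method(pattern, n) for n in MISSING_COUNTS]
        print(pattern, dict(zip(MISSING_COUNTS, winners)))
```

Under that assumption, the smallest value in every up-slope and down-slope column belongs to the proposed method, while in the three flat patterns it belongs to spline interpolation.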
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, J.; Kim, S. Improved Interpolation and Anomaly Detection for Personal PM2.5 Measurement. Appl. Sci. 2020, 10, 543. https://doi.org/10.3390/app10020543