# Binned Data Provide Better Imputation of Missing Time Series Data from Wearables


## Abstract

## 1. Introduction

## 2. Materials and Methods

_{0}, and data were recorded at time interval $a$. In this study, we focused on time series in which data are missing continuously and are represented as NaN (not a number) values, as shown here:
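As a rough illustration only (this is not the authors' code, and the original example series is not reproduced here), a continuous run of missing values can be sketched as follows. The helper name `make_gap`, the sample heart-rate values, and the gap position are all hypothetical.

```python
import math


def make_gap(series, start, length):
    """Return a copy of `series` with `length` consecutive values,
    beginning at index `start`, replaced by NaN."""
    if start < 0 or length < 0 or start + length > len(series):
        raise ValueError("gap must lie inside the series")
    gapped = list(series)  # copy so the original readings stay intact
    gapped[start:start + length] = [math.nan] * length
    return gapped


# Hypothetical heart-rate readings, one per sampling interval.
heart_rate = [72.0, 75.0, 71.0, 78.0, 80.0, 77.0, 74.0, 73.0]

# Blank out three consecutive readings to mimic a continuous gap.
gapped = make_gap(heart_rate, start=2, length=3)

# Indices of the missing points, found with math.isnan
# (NaN never compares equal to itself, so `== nan` would not work).
missing_idx = [i for i, v in enumerate(gapped) if math.isnan(v)]
print(gapped)
print(missing_idx)
```

Keeping the untouched `heart_rate` list alongside `gapped` retains the true values at the blanked positions, so any later imputation can be compared against them.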

#### 2.1. Data

#### 2.2. Missing Value Generation

#### 2.3. Data Binning

#### 2.4. Data Imputation

#### 2.5. Performance Evaluation of Binning

Taking $A_i$ and $I_i$ as the actual value and the imputed value for data point $i$, respectively, where $N$ is the number of missing values, the RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - I_i\right)^{2}}$$
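A minimal sketch of this metric, assuming the standard root-mean-square error definition (the square root of the mean squared difference between actual and imputed values over the N missing points). The function name `rmse` and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, imputed):
    """Root-mean-square error between actual and imputed values
    at the N positions that were missing."""
    if len(actual) != len(imputed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - i) ** 2 for a, i in zip(actual, imputed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical true readings and the values an imputation method filled in.
actual_values = [70.0, 72.0, 75.0, 71.0]
imputed_values = [71.0, 70.0, 75.0, 74.0]

score = rmse(actual_values, imputed_values)
print(round(score, 4))
```

A lower RMSE means the imputed values lie closer to the actual ones, so scores obtained with different bin sizes can be compared directly for the same gap.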

## 3. Results

#### 3.1. Imputation of 1 h of Missing Data

#### 3.2. Imputation of 15 min of Missing Data

#### 3.3. Quantitative Analysis

#### 3.4. Optimal Bin Size

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Continuous missing values in heart rate data. Data missing for 1 h from (**a**) 3 pm (15:00 h) to 4 pm (16:00 h) and (**b**) for 15 min from 3 pm (15:00 h) to 3:15 pm (15:15 h) are indicated by red lines.

**Figure 2.** Imputation of 1 h of missing data. Variations in RMSE when data of different bin sizes were used for imputing missing data of ‘inactive’ period of 3–4 am using (**a**) EM, (**b**) II, (**c**) kNN, (**d**) RF, and (**e**) SI methods. Variations in RMSE when data of different bin sizes were used for imputing missing data of ‘active’ period of 3–4 pm using (**f**) EM, (**g**) II, (**h**) kNN, (**i**) RF, and (**j**) SI methods.

**Figure 3.** Actual heart rate data and imputed heart rate data using EM methods for 1 h of missing data from 3 pm to 4 pm. The bin sizes are (**a**) 1 h, (**b**) 2 h, (**c**) 3 h, (**d**) 4 h, (**e**) 5 h, (**f**) 6 h, and (**g**) entire data. (**h**) Heart rate data for entire day along with imputed values obtained from EM method using data with 1 h bin size when data were missing from 3 pm to 4 pm.

**Figure 4.** Imputation of 15 min of missing data. Variations in RMSE when data with different bin sizes were used for imputing missing data of ‘inactive’ period of 3–3:15 am using (**a**) EM, (**b**) II, (**c**) kNN, (**d**) RF, and (**e**) SI methods. Variations in RMSE when data with different bin sizes were used for imputing missing data of ‘active’ period of 3–3:15 pm using (**f**) EM, (**g**) II, (**h**) kNN, (**i**) RF, and (**j**) SI methods.

**Figure 5.** The success rates of each imputation method: (**a**,**b**) show the success rates for volunteer V1, (**c**,**d**) show the success rates for volunteer V2, (**e**,**f**) show the success rates for volunteer V3, and (**g**,**h**) show the success rates for volunteer V4.

**Figure 6.** Dominance of the optimal bin size for active and inactive periods. For 1 h and 15 min of missing data for volunteers V1 (**a**,**b**), V2 (**c**,**d**), V3 (**e**,**f**), and V4 (**g**,**h**).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chakrabarti, S.; Biswas, N.; Karnani, K.; Padul, V.; Jones, L.D.; Kesari, S.; Ashili, S.
Binned Data Provide Better Imputation of Missing Time Series Data from Wearables. *Sensors* **2023**, *23*, 1454.
https://doi.org/10.3390/s23031454
