# Binned Data Provide Better Imputation of Missing Time Series Data from Wearables

## Abstract


## 1. Introduction

## 2. Materials and Methods

_{0}, and data were recorded at time interval $a$. In this study, we focused on a time series in which data are missing continuously and are represented as NaN (not a number) values, as shown here:
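As an illustration of what a continuous run of missing values looks like in practice, the following minimal sketch builds a short synthetic series and blanks out a contiguous block with NaN. All values, the 15 min recording interval, and the helper names `mask_block` and `missing_runs` are assumptions made for this example; none of them come from the study's data or code.

```python
import math

# Hypothetical recording: start time 0 min, one sample every 15 min.
start, interval = 0, 15
values = [62.0, 64.0, 63.0, 70.0, 72.0, 71.0, 69.0, 66.0]
times = [start + k * interval for k in range(len(values))]


def mask_block(series, first, length):
    """Return a copy with `length` consecutive points from index `first` set to NaN."""
    out = list(series)
    for k in range(first, min(first + length, len(out))):
        out[k] = math.nan
    return out


def missing_runs(series):
    """Return (start_index, run_length) for each run of consecutive NaN values."""
    runs, run_start = [], None
    for k, v in enumerate(series):
        if math.isnan(v):
            if run_start is None:
                run_start = k
        elif run_start is not None:
            runs.append((run_start, k - run_start))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(series) - run_start))
    return runs


# Four consecutive samples at a 15 min interval correspond to a 1 h gap.
gapped = mask_block(values, first=3, length=4)
print(missing_runs(gapped))
```

Locating the gap as a (start, length) pair makes it straightforward to keep the original values aside and compare them with imputed values later.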

#### 2.1. Data

#### 2.2. Missing Value Generation

#### 2.3. Data Binning

#### 2.4. Data Imputation

#### 2.5. Performance Evaluation of Binning

With $A_{i}$ and $I_{i}$ as the actual value and the imputed value for data point $i$, respectively, and $N$ as the number of missing values, the RMSE is calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_{i} - I_{i}\right)^{2}}$$
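The RMSE over the missing points can be computed as in the sketch below, which uses the standard root-mean-square-error definition restricted to the imputed positions. The function name `rmse_over_missing` and the sample numbers are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import math


def rmse_over_missing(actual, imputed, missing_idx):
    """RMSE between actual and imputed values at the missing positions only.

    `missing_idx` lists the indices that were masked before imputation,
    so its length plays the role of N (the number of missing values).
    """
    if not missing_idx:
        raise ValueError("at least one missing position is required")
    squared_error = sum((actual[i] - imputed[i]) ** 2 for i in missing_idx)
    return math.sqrt(squared_error / len(missing_idx))


# Made-up heart rate values; positions 1 and 2 were masked and then imputed.
actual = [60.0, 62.0, 64.0, 66.0]
imputed = [60.0, 63.0, 62.0, 66.0]
print(rmse_over_missing(actual, imputed, [1, 2]))
```

Restricting the sum to the masked positions keeps the observed points, which are reproduced exactly, from diluting the error.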

## 3. Results

#### 3.1. Imputation of 1 h of Missing Data

#### 3.2. Imputation of 15 min of Missing Data

#### 3.3. Quantitative Analysis

#### 3.4. Optimal Bin Size

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

**Figure 1.** Continuous missing values in heart rate data. Data missing (**a**) for 1 h from 3 pm (15:00 h) to 4 pm (16:00 h) and (**b**) for 15 min from 3 pm (15:00 h) to 3:15 pm (15:15 h) are indicated by red lines.

**Figure 2.** Imputation of 1 h of missing data. Variations in RMSE when data of different bin sizes were used for imputing missing data of the ‘inactive’ period of 3–4 am using (**a**) EM, (**b**) II, (**c**) kNN, (**d**) RF, and (**e**) SI methods. Variations in RMSE when data of different bin sizes were used for imputing missing data of the ‘active’ period of 3–4 pm using (**f**) EM, (**g**) II, (**h**) kNN, (**i**) RF, and (**j**) SI methods.

**Figure 3.** Actual heart rate data and heart rate data imputed using the EM method for 1 h of missing data from 3 pm to 4 pm. The bin sizes are (**a**) 1 h, (**b**) 2 h, (**c**) 3 h, (**d**) 4 h, (**e**) 5 h, (**f**) 6 h, and (**g**) the entire data. (**h**) Heart rate data for the entire day along with imputed values obtained from the EM method using data with a 1 h bin size when data were missing from 3 pm to 4 pm.

**Figure 4.** Imputation of 15 min of missing data. Variations in RMSE when data with different bin sizes were used for imputing missing data of the ‘inactive’ period of 3–3:15 am using (**a**) EM, (**b**) II, (**c**) kNN, (**d**) RF, and (**e**) SI methods. Variations in RMSE when data with different bin sizes were used for imputing missing data of the ‘active’ period of 3–3:15 pm using (**f**) EM, (**g**) II, (**h**) kNN, (**i**) RF, and (**j**) SI methods.

**Figure 5.** The success rates of each imputation method: (**a**,**b**) show the success rates for volunteer V1, (**c**,**d**) show the success rates for volunteer V2, (**e**,**f**) show the success rates for volunteer V3, and (**g**,**h**) show the success rates for volunteer V4.

**Figure 6.** Dominance of the optimal bin size for active and inactive periods. For 1 h and 15 min of missing data for volunteers V1 (**a**,**b**), V2 (**c**,**d**), V3 (**e**,**f**), and V4 (**g**,**h**).

