Error Estimation of Ultra-Short Heart Rate Variability Parameters: Effect of Missing Data Caused by Motion Artifacts

Application of ultra–short Heart Rate Variability (HRV) is desirable in order to increase the applicability of HRV features to wrist-worn wearable devices equipped with heart rate sensors that are nowadays becoming more and more popular in people’s daily life. This study is focused in particular on the the two most used HRV parameters, i.e., the standard deviation of inter-beat intervals (SDNN) and the root Mean Squared error of successive inter-beat intervals differences (rMSSD). The huge problem of extracting these HRV parameters from wrist-worn devices is that their data are affected by the motion artifacts. For this reason, estimating the error caused by this huge quantity of missing values is fundamental to obtain reliable HRV parameters from these devices. To this aim, we simulate missing values induced by motion artifacts (from 0 to 70%) in an ultra-short time window (i.e., from 4 min to 30 s) by the random walk Gilbert burst model in 22 young healthy subjects. In addition, 30 s and 2 min ultra-short time windows are required to estimate rMSSD and SDNN, respectively. Moreover, due to the fact that ultra-short time window does not permit assessing very low frequencies, and the SDNN is highly affected by these frequencies, the bias for estimating SDNN continues to increase as the time window length decreases. On the contrary, a small error is detected in rMSSD up to 30 s due to the fact that it is highly affected by high frequencies which are possible to be evaluated even if the time window length decreases. Finally, the missing values have a small effect on rMSSD and SDNN estimation. As a matter of fact, the HRV parameter errors increase slightly as the percentage of missing values increase.


Introduction
Heart Rate Variability (HRV) is a physiological phenomenon of fluctuation between adjacent Inter-Beat Intervals (IBI) [1,2]. The heart beats (HR) average and HRV are regulated by two main mechanisms. First, they are controlled by Autonomic Nervous Systems (ANS), which affect the activity of the sinoatrial node (SAN) [3][4][5]. In particular, ANS controls SAN by two signaling pathways that permit increasing a decrease in HR by sympathetic and parasympathetic nervous systems stimulations, respectively [2]. The second mechanism that regulates HR and HRV is the coupled-clock system which drives the automaticity of human SAN pacemaker cells even without neural input [3,6]. Aging and clinical condition (e.g., cardiovascular disease) affect both ANS and SAN responses inducing an alteration in HR and HRV [2,3]. For this reason, monitoring HR and HRV response permits assessing possible deterioration of cardiovascular responses. In particular, abnormal

Data
In this paper, we used the Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset [29][30][31] providing 24 h of continuous Inter-Beat Intervals data (IBI), triaxial accelerometer data, sleep quality, physical activity, and psychological characteristics (i.e., anxiety status, stress events, and emotions) for 22 healthy young males (age = 27.29 ± 4.21 yrs; height = 179.91 ± 8.22 cm; weight = 75.05 ± 12.79 kg). Participants' anthropomorphic characteristics (i.e., age, height, and weight) were recorded at the start of the data recording. Moreover, participants filled in questionnaires that provide information about participants' psychological status, i.e., chronotype, anxiety status, and sleep quality. During the 24 h of the data recording, participants wore two devices that continuously recorded heart response (Polar H7 heart rate monitor-Polar Electro Inc., Bethpage, NY, USA) and the actigraphy data (ActiGraph wGT3X-BT-ActiGraph LLC, Pensacola, FL, USA). Moreover, the perceived moods were recorded at different times of the day and the daily stress was provided before sleeping in order to summarize the individual's stressful events of the day. Finally, twice a day (i.e., before going to bed and when they woke up), the subjects collected saliva samples at home in appropriate vials in order to assess Melatonin and Corisol saliva concentration. More details about the experimental setup of the MMASH dataset are provided in the data descriptor paper written by Rossi et al. and published on MDPI Data journal in 2020 [29].
In this study, we used only IBI data continuously recorded over 24 h on 22 healthy young males by a Polar H7 chest strap (Polar Electro Inc., Bethpage, NY, USA). This device is a Bluetooth low energy chest strap with an ECG sensor that provides valid information of inter-beat intervals (IBI) [32]. Participants wore the Polar H7 chest strap for 24 h (between 9:00 a.m. and 9:00 p.m. on the next day), and were instructed to wear the chest straps during both day (during physical activities too) and night. For the aim of this study, the IBI timeseries of each participants were split into 5 min time windows in order to compute gold standard HRV parameters. Moreover, to assess the effect of length of the time window on HRV estimation, ultra-short time windows (from 4 min to 30 s) were selected in each 5 min time window, and the HRV parameters were computed. Finally, we introduce missing values to assess their effect on HRV features. More details about the dataset preprocessing and the HRV estimation were provided in Sections 2.2 and 2.3, respectively. Finally, the data analysis details are provided in Section 2.4.
In accordance with the Helsinki Declaration as revised in 2013, the study was approved by the Ethical Committee of the University of Pisa (#0077455/2018).

Data Preprocessing
First of all, by using the Python hrv-analysis library (https://pypi.org/project/hrv-analysis), 2.18 ± 1.29% of RR-intervals in the dataset were detected as ectopic beats (i.e., disturbance of the cardiac rhythm frequently related to the electrical conduction system of the heart) or missing values induced by motion artifacts. These missing values were reconstructed via quadratic interpolation applied on the time domain, i.e., the heartbeats timestamp, instead of the duration domain, i.e., the duration of the heartbeats, as suggested by Morelli et al. [23].
In order to compute gold standard SDNN and rMSSD HRV features, the participants' 24 h RR-intervals time series were split into 5 min windows (6130 5 min time windows were extracted). Moreover, to detect the influence of ultra-short HRV features in each 5 min window, ultra-short windows (4 min, 3 min, 2 min, 1 min and 30 s) were randomly selected.
Finally, in order to simulate motion artifacts observable during IBI recording by wrist-worn devices equipped with PPG sensors, artificial missing values (i.e., 5%, 10%, 15%, 30%, 50%, and 70%) were created for each 5 min and ultra-short time windows in accordance with a random walk Gilbert burst model which simulates burst-error with a two-state Markov chain (i.e., good as 0 and bed as 1) [33] as showed in Morelli at al.'s paper [23].

HRV Parameters
HRV is usually calculated from 5 min IBI time series [34]. For the aim of our study, we focused on the two most popular time-domain HRV measurements: • SDNN refers to the standard deviation of IBI. It estimates overall power spectrum of IBI timeseries. The SDNN is defined as the "gold standard" to assess both morbidity and mortality in the population [2]. • rMSSD is the root mean square of the successive IBI differences estimates short-term components of HRV [35].
These two HRV metrics are computed in each 5 min window and in each ultra-short window with and without missing values.
Moreover, to better investigate the effect of missing values in the ultra-short HRV parameters, the power spectrum of the IBI time series was computed. The power spectrum S xx ( f ) of a time series x(t) describes the distribution of power into frequency components (ω) composing that signal. The minimal frequency that is possible to observe for each ultra-short time series was defined as a ratio between 1 and its length expressed in seconds (1/s), while the maximum is set at 0.4 in accordance with the literature. In order to derive the Power Spectrum Density (PSD) of the time series x(t), the Lomb-Scargle Periodogram was used instead of Fourier transformation because the inter-beats intervals (IBI) are not uniformly distributed [36,37]. VLF (power in very low-frequency ranges, i.e., ≤0.04 Hz), LF (power in low-frequency ranges, i.e., 0.04-0.15 Hz), HF (Power in high-frequency ranges, i.e., 0.15-0.4 Hz), and total power (Power in all the frequency ranges, i.e., ≤0.4) were obtained by the integration of the power in the relevant frequency range in the spectrum as showed in Equation (1). Finally, the ratio between LF and HF (LF/HF) was also computed:

Data Analysis
The relationship between "gold standard" HRV parameters computed in a 5 min window and ultra-short HRV parameters with and without missing values are assessed by Pearson's correlation coefficient. Values higher than 0.8 reflect a strong correlation. The estimation error between HRV parameters compute on 5 min and in ultra-short time window with and without missing values were computed by the Root Mean Squared Error (RMSE). The magnitude of the difference was assessed by Cohen's d effect size (ES). It is computed as the ratio between the mean difference between HRV gold standard parameters and ultra-short ones, and the pooled standard deviation as showed in Equation (2). ES < 0.2, (0.2-0.5], (0.5-0.8], >0.8 are considered trivial, small, medium, and large effect size. For example, if the ES is equal to 0.04, the two groups' means differ by 0.04 standard deviations, and this difference should be considered trivial: Bland and Altman plot (BA)was assessed in order to quantify the agreement between two quantitatives. Due to the fact that a huge number of Bland-Altman plots are required to describe all the comparisons, we decided to not insert all of these plots. We have resumed the results of BA plots by Bias, i.e., mean ± standard deviation of the differences, and systematic error, i.e., Pearson's correlation coefficient between difference and means, of ultra-short HRV parameters with and without missing values compared to "gold standard" HRV parameters.
Finally, Spearman's rank correlation coefficient was computed in order to assess the relationship between HRV features.
All the data preprocessing, feature extraction and data analyses are conducted by using the Python 3 programming language.

Results
Descriptive statistics of all the participants' psycho-physiological features provided in MMASH dataset are provided in Rossi et al.'s data descriptor paper [29].
Tables 1 and 2 report the descriptive statistics and statistical results of SDNN and rMSSD for all of the windows and missing values percentage, respectively. Small ES and moderate correlation (i.e., ES < 0.38, r > 0.69; see Table 1) are detected for ultra-short SDNN values with a time window longer than 2 min showing a bias lower than 10.70 ms (RMSE < 26.84 ms) without a statistically significant systematic error. On the contrary, moderate ES (i.e., ES > 0.45) is detected for time windows shorter than 1 min. On the contrary, rMSSD shows a trivial ES and moderate/strong correlation (i.e., ES < 0.20, r > 0.75; see Table 2) for almost all of the time windows showing a bias lower than 4.83 ms (RMSE < 21.47 ms) with a small systematic error (not statistically significant). In particular, only 70% of the missing values in 1 min and 30 s windows show small ES and moderate correlation (i.e., ES > 0.20, r < 0.62) with a bias higher than 5.37 ms (RMSE > 24.01 ms), while other time series lengths with a different percentage of missing values show trivial ES. Table 1. Descriptive statistics and statistical analysis of SDNN for all of the time window and missing values. The statistical analysis is focused to compare 5 min HRV parameters to ultra-short ones. * refers to small ES of ultra-short HRV features compared to 5 min ones, respectively. r, RMSE, and ES refer to the correlation, the Root Mean Square Error, and the effect size between the 'gold standard' and the ultra-short with and without missing values on SDNN, respectively. Bias refers to mean ± standard deviation of the difference of ultra-short HRV parameters compared to 'gold standard' HRV parameters (i.e., 5 min time window), while systematic error is the Pearson's correlation coefficient between means and differences between 5 min and ultra-short HRV estimation.  Table 2. Descriptive statistics (mean ± standard deviation) and statistical analysis of rMSSD for all of the time window and missing values. The statistical analysis is focused to compare 5 min HRV parameters to ultra-short ones. ** and * refer to the trivial and small ES of ultra-short HRV features compared to 5 min ones, respectively. r, RMSE, and ES refer to the correlation, the Root Mean Square Error and the effect size between the 'gold standard' and the ultra-short with and without missing values on rMSSD, respectively. Bias refers to mean ± standard deviation of the difference of ultra-short HRV parameters compared to 'gold standard' HRV parameters (i.e., 5 min time window), while systematic error is the Pearson's correlation coefficient between means and differences between 5 min and ultra-short HRV estimation.  Figure 1 compares the power spectrum density (PSD) of the 5 min time series and the ultra-short ones with missing values. The lower the time series length, the lower the very low frequencies (VLF) that are possible to evaluate. In particular, frequencies lower than 0.004 (1/240 s), 0.006 (1/180 s), 0.008 (1/120 s), 0.017 (1/60 s), and 0.03 (1/30 s) Hz are not possible to evaluate for 4 min, 3 min, 2 min, 1 min, and 30 s, respectively. However, even if the time series decrease, it is possible to assess both LF and HF. Due to the fact that rMSSD reflects the HF (a strong correlation was detected between rMSSD and HF in 5 min time series- Table 3), it is possible to accurately estimate rMSSD even if the length of the time series decreases due to the fact that all HF could be assessed. On the contrary, missing very low frequencies negatively affect SDNN estimation as the length of the time series decreases. In particular, the moderate correlation detected between SDNN and Total power (Table 3) demonstrates that short time series do not permit assessing all the frequencies in the power spectrum negatively affecting the SDNN estimation.  In line with the results found in Tables 1 and 2 for SDNN and rMSSD, respectively, Table 4 shows trivial and small ES for LF and HF for all of the ultra-short time series with almost all of the missing values (except for 30 s time series with 70% of missing value). Moreover, trivial and small ES for time series longer than 1 min and of 4 min were detected for total power and VLF, respectively. In particular, only total power of 4 min time series with all of the missing values percentage and 3 min time series with 0% of missing values show a trivial ES. Additionally, Table 4 and Figure 1 show that VLF and LF are underestimated as the percentage of missing values increases, while HF is overestimated. For these reasons, the length of the time series and the missing values strongly affect LF/HF HRV parameters. As a matter of fact, it is possible to estimate LF/HF values only in 4 min time windows without missing values where LF and HF are accurately detected. Table 4. Descriptive statistics (mean ± standard error of the mean) of VLF, LF, HF, LF/HF , and Total power of the power spectrum density for all of the time window and missing values. The statistical analysis is focused to compare 5 min HRV parameters to ultra-short ones. ** and * refer to the trivial and small ES of ultra-short HRV features compared to 5 min ones, respectively.

Discussion
HRV is a widely used tool in many research areas, e.g., cardiovascular, preventive medicine, and sport [38][39][40][41][42][43][44], due to its non-invasive and reliable characteristics that permit assessing ANS and SAN activation. Nowadays, long-and short-term recordings (i.e., 24 h and 5 min, respectively) are considered the more reasonable options for HRV analysis. However, in real-world applications, shorter IBI recordings than 5-min are required which permit the mobile devices to instantaneously display results avoiding any possible misleading insight induced by motion artifacts. Therefore, representative studies are required in literature to provide reference values for short-term HRV analysis with missing values. The main finding of this study is that our results are in line with the ones reported for other previous studies [25][26][27][28]. As a matter of fact, we found that rMSSD requires at least 30 s to obtain reliable values (trivial effect size; Table 2). On the contrary, SDNN requires IBI timeseries longer than 2 min to have reliable results (small effect size; Table 1). Moreover, the novel results provided in this study are that rMSSD is more highly affected by missing values compared to SDNN (i.e., 0.12 ± 0.04 and 0.04 ± 0.03, respectively) showing, however, a small increment in effect size from all of the time series lengths as their percentage of missing values increase.
To truly estimate SDNN, we need precise information from the whole power spectrum, i.e., from VLF to HF [45]. As a matter of fact, we have found a moderate/strong relationship (r = 0.73) between total power and SDNN obtained from 5 min time series. In particular, SDNN is more correlated with VLF than with LF and HF (Table 3), and the power of VLF (49.92%) is larger than the power of LF (31.88%) and HF (18.20%) by several orders of magnitude (Table 4) resulting in a smaller contribution of HF and LF to SDNN than VLF [45]. Hence, the small ES and RMSE higher than 22.88 ms between SDNN obtained from 5 min time series and ultra-short ones are induced by the fact that it is not possible to evaluate all the VLF required to estimate SDNN (Figure 1). For example, in 2 min IBI time series, it is possible to evaluate the power of frequencies higher than 0.008 losing the information of frequencies between 0.0033 and 0.008. For this reason, SDNN shows a bias of about −9.51 ± 23.28 ( RMSE = 25.15) with a small effect size (ES = 0.34). For these reasons, SAN is lower reflected in ultra-short SDNN estimation compared to 5-min time window ones. Moreover, a very small change in ES (0.04 ± 0.03) between 0 to 70% of missing values in all the time windows demonstrated that SDNN is accurate even if motion artifacts affect the IBI time series. Hence, SDNN is more affected by the length of the time window than the missing values that could be detected throughout the IBI time series.
On the contrary, rMSSD is more sensitive to high-frequency (HF) fluctuations [13,45] compared to LF, VLF, and total power. As a matter of fact, strong correlation is detected between rMSSD and HF obtained in 5 min IBI time series (r = 0.80, Table 3). Hence, even if the length of the IBI time series decreases, it still possible to accurately assess the HF. As a matter of fact, Beak et al. [26] have found that rMSSD and in particular HF calculated using ultra-short IBI time series could be a good surrogate to those calculated using standard 5-min intervals. For this reason and as shown in Table 2, rMSSD is found to be accurate for all the IBI timeseries without missing values (ES < 0.11, RMSE < 15.13 ms, r < 0.79 and bias < −2.34 ms). For all of the time windows, ES shows a small increment with the missing values up to 50% (0.05 ± 0.02), while an increase of about 0.15 ± 0.03 for rMSSD estimated with 70% of missing values. However, trivial ES was observed for almost all the time windows with 70% of missing values. As a matter of fact, Table 4 and Figure 1 show trivial ES in HF for up to 50% of missing values for all the time windows, while a small ES was detected for time windows with 70% of missing values shorter than 1 min. On the contrary, missing values and the length of the time windows strongly affect the ratio between HF and LF. In particular, only a 4 min time window without missing values permits accurately estimating LF/HF HRV parameters.
These results highlight the fact that approaches to interpolating missing values are not required for rMSSD, but they are needed for SDNN estimation as also suggested by Morelli et al. [23]. As a matter of fact, rMSSD captures fast changes in heart activity (HF), while SDNN is an index of the entire power spectrum and in particular of VLF (Tables 3 and 4). For this reason, interpolation methods that act as low pass filters affecting the power spectrum density of IBI time series have a negative impact on rMSSD, while a small effect on SDNN [23]. On the contrary, no interpolation approach introduced white errors, thus minimizing the impact on IBI successive differences that were the first computation step of rMSSD. Moreover, due to the fact that SDNN can be estimated from the entire power spectrum [45], the estimation of missing frequencies is required to improve accuracy of SDNN. Future works are scheduled in order to assess the effect of interpolation approaches and missing frequencies' estimation from known ones. Moreover, due to the fact that low cost wearable devices equipped with heart rate sensors are becoming more and more accurate in IBI recording, the reduction of noisy data could be the right choice to improve the quality of HRV parameters' estimation. However, this paper suggests that, even if the IBI time series contain a large amount of noise, a small error is expected in the estimation of both SDNN and rMSSD. In conclusion, wrist-worn low-cost devices could be a good surrogate of a good quality simple ECG even if they could have a huge quantity of missing values.
A limitation of this study is that the participants are all young males. Future works could be performed in order to detect if individual characteristics such as gender, age, and individual's habits could affect the estimation of ultra-short SDNN and rMSSD with and without missing values. Moreover, due to the fact that SDNN24 is an important HRV parameter to detect people's health status, an investigation is required to assess the effect of missing values on this HRV parameter.

Conclusions
As showed in previous studies, it is possible to accurately estimate rMSSD and SDNN by using 30 s and 2 min time windows, respectively. In particular, in this paper, we highlight the fact that ultra-short IBI time series do not permit assessing all the VLF increasing the bias on SDNN. Moreover, to the best of our knowledge, this is the first study that evaluates the effect of missing values on ultra-short HRV data. We found that the missing values have a low effect on both SDNN estimations, while having a moderate effect on rMSSD. In particular, the higher the missing values, the higher the bias in both rMSSD and SDNN showing trivial and small magnitudes of the differences, respectively.