Artifact Correction in Short-Term HRV during Strenuous Physical Exercise

Heart rate variability (HRV) analysis can be a useful tool to detect underlying heart or even general health problems. Currently, such analysis is usually performed in controlled or semi-controlled conditions. Since many of the typical HRV measures are sensitive to data quality, manual artifact correction is common in the literature, both as an exclusive method and in addition to various filters. With the proliferation of Personal Monitoring Devices (PMDs) with continuous HRV analysis, an opportunity opens for HRV analysis in a new setting. However, current artifact correction approaches have several limitations that hamper the analysis of real-life HRV data. To address this issue, we propose an algorithm for automated artifact correction that has a minimal impact on HRV measures but can handle more artifacts than existing solutions. We verify this algorithm on two datasets: one collected during a recreational bicycle race and another in a laboratory, both using a PMD in the form of a GPS watch. The data include direct measurement of electrical myocardial signals using chest straps and, in the case of the race dataset, direct measurements of power using a crank sensor, both paired with the watch. Early results suggest that the algorithm can correct more artifacts than existing solutions without a need for manual support or parameter tuning. At the same time, the error introduced to HRV measures for peak correction and shorter gaps is similar to the best existing solution (Kubios-inspired threshold-based cubic interpolation) and better than the commonly used median filter. For longer gaps, cubic interpolation can in some cases result in a lower error in HRV measures, but the shape of the curve it generates matches the ground truth worse than our algorithm does. This suggests that further development of the proposed algorithm may also improve these results.


Introduction
Heart rate variability (HRV) analysis focuses on studying changes in the intervals between heart beats. This analysis can be a useful tool to detect not only underlying heart issues, but also general health problems of both physical and psychological nature, including sports performance [1], cardiovascular events [2], stroke [3], and depression [4]. Presently, HRV parameters are used in exercise training to set optimal training loads and improve the performance of the person [5,6]. The most commonly used parameters are the low-frequency (LF) and high-frequency (HF) modulation of R-R interval changes. A comparison study by Giles and Draper considered artifacts ranging from discrepancies of 20 ms at a single interval, through a single extra-long or extra-short interval, to multiple missing intervals. Artifacts were corrected by deletion and by linear, cubic, or spline interpolation. The artifact correction implemented in the Kubios HRV software was also evaluated. Kubios HRV [39] is a heart rate variability analysis software package with built-in artifact detection and correction. It is commonly used for HRV analysis, in particular in the medical and sports literature.
In their comparison study, Giles and Draper underlined that discarding sections with artifacts leads to bias in further analysis and even the smallest artifacts should be corrected to ensure correct calculation of HRV measures. At the same time, approaches to artifact correction vary greatly and sometimes are not even clearly specified in existing literature. The lack of standard practice in collection and processing of HRV data makes comparison between studies difficult and sometimes impossible.
To compare the approaches to artifact correction, the authors calculated a set of HRV measures, including SDNN (standard deviation of N-N intervals), RMSSD (root mean square of successive R-R interval differences), SD1 and SD2 (standard deviations of the Poincaré scattergram), and the low- to high-frequency ratio. Exercise intensity, in particular above 60% of VO2max, proved to have a significant impact on HRV measures. Linear interpolation often resulted in the lowest bias in HRV measures, but required manual detection of artifacts, like the other methods. Kubios was able to automatically detect most artifacts, but did not reduce the bias in HRV measures. Based on this study, no conclusion was reached regarding the maximum number of artifacts that can be considered acceptable, but the authors suggested that numbers exceeding four should be treated with particular caution.
Several studies have confirmed that the deletion technique gives the best results for time-domain analysis; however, it significantly influences frequency-domain parameters [40][41][42]. Interpolation methods of HRV artifact correction cause an increase in the LF and VLF components [42,43]. Research conducted by Soler et al. [40] suggests that the method with the smallest influence on time- and frequency-domain parameters of HRV is nonlinear predictive interpolation (NPI).
Alcantara et al. [21] in 2020 evaluated the impact of different artifact-correction threshold levels on the quantification of HRV. Processing of the HRV data was performed in Kubios. They discovered that the choice of threshold level for the applied filters can lead to clinically relevant differences in statistical analysis in both the time and frequency domain. Moreover, the optimal filter choice might also depend on the age of the study subjects and the intensity of the activity: younger subjects generally require weaker filters, while higher-intensity activities require stronger filters. As a result, filter choice becomes especially difficult in large studies with a variety of participants and dynamic (at-will) activity levels.

NEEDED Project
The basis for this work is the 2018 North Sea Race Endurance Exercise Study (NEEDED). It collected a broad spectrum of data from 59 participants during a recreational 91-km-long bike race. The mean age of the participants was 50, and 46 participants were male. A CT scan was performed and showed early stages of atherosclerosis in 25 participants. All participants used the same continuous HRM (Garmin Forerunner 935, paired with a Garmin HRM chest strap), and 40 of them had power meters (Stages™). All data were downloaded in raw format and processed using in-house software. This process, including details on handling typical data issues, was described by Wiktorski et al. [44]. The dataset is private.
One of the unique aspects of this study is the possibility to use natural variations in the course of the race as a basis for an analysis of artifact correction at different workloads and under external conditions introducing mechanical noise. In the NEEDED study, all participants follow the same course, but they choose their own pace. A steep hill after roughly two thirds of the course was selected for in-depth analysis; it is visualized in Figure 1. Due to the distance already covered, each participant experiences a fair level of exhaustion when approaching the hill. At the same time, the hill is steep enough to elicit maximal or close-to-maximal effort. It is followed by a short flat period and a descent. These provide additional information, as participants perform differently depending on their strategies, health, and fitness.

Figure 1. Altitude profile (gray-shaded area), heart rate, speed, and power for an example participant approaching and descending the hill analyzed in this paper.

Typical Filter Setup
For the purpose of this work, we inspected HRV data of 23 participants of the NEEDED project who had complete recordings of race data without major data quality problems. Such problems could include data missing for more than 10 s, unexplained stops, or inconsistent distance registration. Two specific issues were frequently present: sudden spikes in beat-to-beat interval duration (usually exceeding 300 ms over the surrounding average level) and missing data points where empty values were recorded. These two issues are consistent with the artifact classification in existing literature. It is important to note that, while peaks and empty periods occurred in the HRV recordings, the HR data remained continuous and unaffected, as seen in Figure 1. There are two likely reasons for this discrepancy. Firstly, the HRM automatically corrects HR data to produce a clean recording suitable for further analysis without preprocessing. Secondly, HRV data are stored separately from all other data (speed, distance, HR, etc.) in the FIT file (https://www.thisisant.com/developer/ant/licensing/flexible-and-interoperable-data-transfer-fit-protocol-license), which could lead to some unintended misalignment.
HRV analysis libraries, e.g., Python's HRV (https://pypi.org/project/hrv/) or R's RHRV (https://cran.r-project.org/web/packages/RHRV/vignettes/RHRV-quickstart.html), include a set of basic and advanced filters to address common problems in HRV data. Collection of HRV data during a sports event might naturally lead to more frequent and larger artifacts due to movement of the HR sensor. We found that it was usually necessary to apply two or more filters in combination and to manually experiment with filter parameters to remove artifacts effectively.
In most cases, a combination of a median filter and Kubios-inspired threshold-based cubic interpolation was effective at removing the artifacts while preserving, at least visually, the general shape of the beat-to-beat interval curve. Single spaced-out artifacts (not exceeding 1 artifact per 50 s of recording), such as Points 1, 2, and 3 in Figure 2, were removed by a median filter of width 3 or 5. When multiple artifacts were clustered together, such as Points 4 and 5, a combination of a median filter of width 5 (or in some cases 7) and Kubios-like interpolation with threshold 5 was necessary.
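For illustration, the two reference filters can be sketched as follows. This is a minimal sketch, not the exact Kubios algorithm: the helper names are ours, and the threshold is expressed as a fraction of the local median rather than as the absolute thresholds Kubios uses.

```python
import numpy as np
from scipy.signal import medfilt
from scipy.interpolate import CubicSpline

def median_filter_rr(rr, width=5):
    """Replace each beat-to-beat value with the median of its neighborhood.
    `width` must be odd; scipy zero-pads the edges."""
    return medfilt(np.asarray(rr, dtype=float), kernel_size=width)

def threshold_cubic(rr, threshold=0.05, local_width=11):
    """Kubios-like correction sketch: flag beats deviating from the local
    median by more than `threshold` (as a fraction of that median) and
    re-estimate the flagged beats with a cubic spline over the good beats."""
    rr = np.asarray(rr, dtype=float)
    local = medfilt(rr, kernel_size=local_width)
    bad = np.abs(rr - local) > threshold * local
    good_idx = np.flatnonzero(~bad)
    spline = CubicSpline(good_idx, rr[good_idx])
    corrected = rr.copy()
    corrected[bad] = spline(np.flatnonzero(bad))
    return corrected
```

A single 1600 ms spike in an otherwise steady 800 ms series is flagged (it deviates from the local median by far more than 5%) and pulled back to the level of its neighbors by the spline.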
However, such a combination of filters was not enough to remove artifacts towards the beginning or end of the recording, such as Point 6 in that figure. Moreover, longer gaps, such as Points 1, 2, 3, and 4 in Figure 3, remained after filtering, while peaks were removed. Such gaps should not be filled with simple interpolation due to the dynamics of heart rate. In our experience and considering the existing literature, it is currently necessary to perform individual inspection and manual experimentation with filter types and parameters. This becomes a key problem with the growing amount of data coming from HRMs with continuous beat-to-beat data recording, and in an end-user scenario where an analytic algorithm is to be implemented on a smart watch or in a smartphone application.

Filters' Impact on HRV Measures
An additional, sometimes overlooked, issue is that the application of filters usually has a noticeable, and occasionally unpredictable, impact on HRV measures. It may be particularly visible when filters are applied as a blanket solution in cases where manual inspection is impractical or infeasible.
To demonstrate this, we chose a set of participants without visible artifacts; applied typical filters to a short period of HRV data; and then compared typical HRV measures before and after filtering. This way the ground truth is known and can be used to measure changes filters introduce. Should we choose data with pre-existing artifacts, we would not have the necessary information to perform such an evaluation.
Based on the inspection of data from hill ascent and descent we observed that the ascent part is prone to many artifacts. It is not surprising since it is one of the most strenuous periods of the race. During the rest on the top of the hill, there were the fewest artifacts, with many participants without any. However, it is not representative of a typical active exercise condition. The descent part still exhibits many artifacts, but it was possible to identify several participants without artifacts. At the same time, this part can be considered representative of an active exercise in a sense that an activity is present and is not fully predictable.
In Tables 1-3, we present values for RMSSD (root mean square of successive R-R interval differences), SDNN (standard deviation of N-N intervals), SDSD (standard deviation of successive R-R interval differences), Total Power (over all frequency bands), LF (low-frequency modulation of R-R interval changes), HF (high-frequency modulation of R-R interval changes), and SD1 and SD2 (standard deviations of the Poincaré scattergram). These cover a mix of time-domain, frequency-domain, and non-linear analysis of HRV. Typically, RMSSD, SDSD, and SD1 are reduced by a factor of 3 once a median filter of width 7 is applied, and this factor increases approximately linearly with the growing width of the filter. However, for some participants who otherwise seem similar in terms of measure values, population data, etc., we see a reduction of just 25%. The more advanced Kubios-inspired threshold filter with cubic interpolation does not cause such changes, but it cannot rectify many artifacts: many gaps and sudden peaks due to missing beats remain, and in particular longer gaps are not filled at all.
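For reference, the time-domain measures listed above can be computed directly from the beat-to-beat series; the sketch below follows the standard definitions (with SD1 of the Poincaré scattergram recoverable as SDSD/√2 when population standard deviations are used).

```python
import numpy as np

def rmssd(rr):
    """Root mean square of successive R-R interval differences (ms)."""
    d = np.diff(np.asarray(rr, dtype=float))
    return float(np.sqrt(np.mean(d ** 2)))

def sdnn(rr):
    """Standard deviation of N-N intervals (ms)."""
    return float(np.std(np.asarray(rr, dtype=float)))

def sdsd(rr):
    """Standard deviation of successive R-R interval differences (ms)."""
    return float(np.std(np.diff(np.asarray(rr, dtype=float))))

def sd1(rr):
    """Short-axis SD of the Poincare scattergram, equal to SDSD / sqrt(2)
    under population standard deviations."""
    return sdsd(rr) / np.sqrt(2)
```

Note that conventions differ on whether sample (ddof=1) or population (ddof=0) standard deviations are used; this is one of the sources of cross-study inconsistency discussed in Section 1.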
This means that adding a filter might affect the interpretation of HRV information even when the same filter is applied consistently to potentially similar participants. This problem is not new, and a need for more research and standardization in filter application is mentioned in the review works discussed in Section 1.

Proposed Method
In this section, we introduce two algorithms that address the preservation of HRV measures during typical HRV artifact correction procedures: peak correction and gap filling.
In Algorithm 1, we present an approach for correcting single peaks in HRV. In principle, the approach can be applied to peaks of any size; however, the available literature suggests that peaks exceeding the local average beat-to-beat duration by a factor of 3 or 4 should be treated with additional care. How the average value is obtained is often omitted in publications or not given in enough detail to allow replication. Usually, it is calculated by either: (1) a global mean or median for very short HRV samples; or (2) local sliding-window mean or median filtering for longer samples. The window length is a user-specified parameter provided as an input to the filtering process.
Input to the algorithms consists of HRV data in the form of a list of beat-to-beat durations of length N; values are usually recorded in milliseconds. The result is a corrected list of beat-to-beat durations of length N + total_new, which will be longer than the input list. The actual change in length depends on the number and size of detected peaks and is reflected in the variable total_new.
On Line 1, existing beat-to-beat durations are transferred from the input to the output variable, for now without any corrections. On Line 2, we detect peaks, obtaining a list of tuples with the location of each peak and its estimated magnitude with respect to the surrounding data. A description of the peak detection algorithm detect_peaks is omitted, as it can be implemented simply as threshold-based or value discontinuity-based identification of local extrema. Such algorithms are available in common data analytics libraries, for example in SciPy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.argrelextrema.html).
On Line 3, the total_new variable is reset; it will hold the total number of beats added when correcting the detected peaks. On Line 4, we start iterating over each detected peak. On Line 5, we first check whether the estimated height of the peak, est_beats, exceeds the surrounding values by more than a factor of 3; if it does, the area is changed into a gap on Line 13, to be filled by a more advanced algorithm, and the total number of beats added is updated on Line 14.

Algorithm 1: Correction of single peaks in HRV beat-to-beat data.
 1: HRV_BB_corr := HRV_BB
 2: peaks := detect_peaks(HRV_BB)
 3: total_new := 0
 4: for peak in peaks do
 5:     if peak.est_beats <= 3 then
 6:         mean_val := (HRV_BB[peak.loc - 1] + HRV_BB[peak.loc + 1]) / 2
 7:         HRV_BB_before := HRV_BB[peak.loc - 2 * peak.est_beats : peak.loc)
 8:         HRV_BB_after := HRV_BB(peak.loc : peak.loc + 2 * peak.est_beats]
 9:         sd := SD(HRV_BB_before ∪ HRV_BB_after)
10:         HRV_BB_corr[peak.loc + total_new : peak.loc + total_new + peak.est_beats - 1) := normal(mean_val, sd, peak.est_beats)
11:         total_new := total_new + peak.est_beats - 1
12:     else
13:         HRV_BB_corr[peak.loc + total_new : peak.loc + total_new + peak.est_beats) := NaN
14:         total_new := total_new + peak.est_beats - 1
15:     end if
16: end for

Further, on Line 6, the expected beat-to-beat duration is estimated from the surrounding area; the loc parameter refers to the location of the peak in the list HRV_BB. On Lines 7-9, the estimated standard deviation is calculated from the periods before and after the detected peak, each spanning two multiples of the estimated peak height. Finally, on Line 10, beat-to-beat durations are generated from a normal distribution; the number of generated values is based on the output of the peak detection algorithm (effectively an integer multiple of the average of the surrounding beat-to-beat durations), with the mean value and standard deviation as calculated on Lines 6-9. It is important to note that on Line 10 the indices in the result variable HRV_BB_corr are adjusted for the changing length with each detected peak; the variable total_new that maintains this correction sum is updated on Line 11.
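The peak-correction procedure can be rendered compactly in Python. This is a sketch of Algorithm 1 under the assumption that the peak detector returns (loc, est_beats) tuples, with each single oversized sample standing in for est_beats merged beats; the detector itself is omitted, as in the listing above.

```python
import numpy as np

def correct_peaks(rr, peaks, rng=None):
    """Sketch of Algorithm 1: each detected peak (one long sample standing in
    for `est_beats` merged beats) is replaced either by `est_beats` values
    drawn from a normal distribution (est_beats <= 3) or by a NaN gap left
    for the gap-filling algorithm. `peaks` is a list of (loc, est_beats)."""
    rng = rng or np.random.default_rng(0)
    rr = np.asarray(rr, dtype=float)
    out = list(rr)
    total_new = 0
    for loc, est_beats in peaks:
        if est_beats <= 3:
            mean_val = (rr[loc - 1] + rr[loc + 1]) / 2
            before = rr[max(0, loc - 2 * est_beats):loc]
            after = rr[loc + 1:loc + 1 + 2 * est_beats]
            sd = np.concatenate([before, after]).std()
            new_vals = list(rng.normal(mean_val, sd, est_beats))
        else:
            new_vals = [np.nan] * est_beats
        # one peak sample is replaced by est_beats values, so the output
        # grows by est_beats - 1 with each corrected peak
        out[loc + total_new:loc + total_new + 1] = new_vals
        total_new += est_beats - 1
    return out
```

For a steady 800 ms series with a single 2400 ms sample (three merged beats) at index 10, the corrected list is two samples longer and the spike is replaced by three beats at the surrounding level; a taller peak would instead become a NaN gap.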
In Algorithm 2, we present an approach to filling gaps that occur either naturally or as a byproduct of removing larger peaks, as in Algorithm 1. Input to the algorithm consists of HRV data in the form of a list of beat-to-beat durations, usually in milliseconds, and HR data in the form of a time-ordered list of equisampled, instantaneous pulse values, usually in beats per minute. Such HR data are recorded independently of HRV by major continuous HRMs. An important observation is that the sum of all the values of HRV_BB is equal to the total time t_HR for which HR is recorded, as represented in Equation (1):

sum_{i=1}^{N} HRV_BB[i] = t_HR. (1)

In principle, these two values should be identical, but, in practice, small misalignments might occur. These misalignments do not accumulate with time, so any error will be minor and independent of the length of the input. The result of the algorithm is a corrected list of beat-to-beat durations, which in this case has the same length as the input. On Line 1, existing beat-to-beat durations are transferred from input to output, for now without any corrections. On Line 2, we detect gaps, obtaining a list of tuples with the location of each gap and its width. We omit a description of the gap detection algorithm detect_miss, as it effectively amounts to identification of non-number values in the HRV_BB list. Such detection is usually a built-in feature of most data analysis frameworks, such as Pandas or R.
On Line 4, we start iterating over each gap, and, on Lines 4-7, the standard deviation and beat-to-beat durations are calculated for the periods before and after the gap; a time period double the gap size is taken into account on each side. On Lines 8-13, reference points for locating the relevant periods in the HR list are calculated, translating indices from HRV beat-to-beat durations to HR time recordings based on the dependence in Equation (1).
Having determined the reference points, the missing beat-to-beat durations can be filled in on Lines 14-22. For each beat, we calculate how far it is from the beginning and the end of the gap, on Lines 15 and 16, respectively. Then, the estimated beat-to-beat duration and standard deviation are calculated using basic interpolation on Lines 17 and 18. The estimated beat-to-beat duration allows us to identify an estimated new location in the HR input list and retrieve the respective value.
This HR value is then used, on Line 19, to calculate a corrective scale factor that adjusts the estimated beat-to-beat duration for any non-linear changes in heart rate. Heart rate slowing down or accelerating is an important factor that influences HRV beat-to-beat durations, in particular over longer gaps. Should the heart rate simply change linearly between the beginning and end of the gap, this factor will be equal to one. On Line 20, a preliminary beat-to-beat duration value is generated using a normal distribution based on the interpolated values, and, on Line 21, it is adjusted with the corrective scale factor.
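The gap-filling logic can be sketched as follows. This is our illustrative reading of Algorithm 2, not a verbatim transcription of the listing: the linear-HR baseline `hr_lin`, the position weight `w`, and the fixed HR sampling period `hr_dt` are simplifying assumptions standing in for Lines 14-22.

```python
import numpy as np

def fill_gap(rr, hr, gap_start, gap_len, hr_dt=1.0, rng=None):
    """Fill `gap_len` NaN beats starting at `gap_start`.
    rr: beat-to-beat durations in ms (NaN inside the gap).
    hr: instantaneous heart rate samples (bpm), one every `hr_dt` seconds.
    Durations and SDs are interpolated across the gap, and each generated
    beat is rescaled by the recorded HR versus a linear-HR baseline."""
    rng = rng or np.random.default_rng(0)
    rr = np.asarray(rr, dtype=float)
    before = rr[max(0, gap_start - 2 * gap_len):gap_start]
    after = rr[gap_start + gap_len:gap_start + 3 * gap_len]
    mean_b, mean_a = before.mean(), after.mean()
    sd_b, sd_a = before.std(), after.std()
    # Equation (1): cumulative beat durations give the elapsed HR time
    t = np.nansum(rr[:gap_start]) / 1000.0   # seconds up to the gap
    filled = rr.copy()
    for i in range(gap_len):
        w = (i + 1) / (gap_len + 1)              # relative position in gap
        est = (1 - w) * mean_b + w * mean_a      # interpolated duration (ms)
        sd = (1 - w) * sd_b + w * sd_a           # interpolated SD (ms)
        t += est / 1000.0
        hr_here = hr[min(int(t / hr_dt), len(hr) - 1)]
        hr_lin = (1 - w) * (60000.0 / mean_b) + w * (60000.0 / mean_a)
        scale = hr_lin / hr_here                 # 1.0 if HR changes linearly
        filled[gap_start + i] = rng.normal(est, sd) * scale
    return filled
```

With a constant heart rate the scale factor stays at one and the gap is filled at the surrounding level; if the recorded HR dips or rises inside the gap, the generated durations are stretched or compressed accordingly.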

Results
We implemented Algorithms 1 and 2, presented in the previous section, in Python using Pandas and SciPy, and then performed proof-of-concept tests on the examples presented in Section 3. For this purpose, we also extended the cubic spline interpolation implementation. We demonstrated earlier that the existing implementation did not manage to fill in longer gaps, even though no such limitation should theoretically exist. It is necessary to underline that the presented results are an indication of the potential usefulness of this concept rather than an ultimate proof. The main reason is the relatively small sample used for testing and the shortage of standardized open datasets; we discuss this problem further in Section 6.
The algorithms were tested on real data. To verify the performance of the proposed algorithms, five types of artifacts were manually introduced to raw HRV signals (Figure 4): (1) a peak of double the mean value of the signal and the width of a single sample; (2) a peak of triple the mean value of the signal and the width of a single sample; and (3-5) gaps of 3, 5, and 7 undefined samples in the signal. These artifacts were introduced in random parts of the raw signal. Parameters of the HRV signal from the time and frequency domain were calculated for the raw signals and compared with the parameters calculated for signals initially contaminated with the manually introduced artifacts and filtered using the reference and proposed methods. Data were acquired from two databases: recordings from the NEEDED project and signals from the PhysioNet simultaneous physiological measurements [45,46]. From the NEEDED project, the HRV and heart rate signals recorded using a Garmin Forerunner 935, paired with a Garmin HRM chest strap, were used in the testing process. From the PhysioNet database, the HRV and heart rate signals recorded using a Polar RS800 Multi, paired with a chest strap with flat electrodes made of conductive rubber, were used. The part of the signal covering 5 min of uphill walking on a treadmill (15% track inclination, 1.2 m/s) was extracted for processing by Algorithms 1 and 2. Algorithm 1 was applied to eight recordings from the NEEDED database that did not contain spike artifacts in the original signal and to 13 signals from the PhysioNet database. In Tables 4-6, we present the results of correction of randomly introduced single peaks for the NEEDED database. Results for the PhysioNet database are given in Tables 7 and 8. Peaks of approximately two and three times the surrounding average value are considered. We compare our approach from Algorithm 1 with the previously used median filter of width 3 and Kubios-like interpolation with threshold value 5.
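The artifact-injection step can be reproduced along these lines (a simplified sketch; in the experiments, the locations were chosen randomly):

```python
import numpy as np

def inject_peak(rr, loc, factor):
    """Replace the beat at `loc` with a spike of `factor` times the signal
    mean, emulating `factor` merged beats."""
    out = np.asarray(rr, dtype=float).copy()
    out[loc] = factor * out.mean()
    return out

def inject_gap(rr, loc, width):
    """Mark `width` consecutive beats as missing (NaN)."""
    out = np.asarray(rr, dtype=float).copy()
    out[loc:loc + width] = np.nan
    return out
```

HRV measures computed on the clean signal then serve as the ground truth against which each correction method is scored after injection and filtering.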
Tables 9 and 10 present a summary of the mean error calculated for the selected HRV parameters for all analyzed signals. Our approach provides good results, with an error at the level of 2%, much better than the median filter (16% on average) and on par with the interpolation approach. However, the interpolation approach is not able to remove larger peaks (with an amplitude four or five times the surrounding average value) or series of smaller ones. In contrast, our approach also works for larger spikes, with values above three times the mean of the signal, if necessary. Since our method is nondeterministic, all experiments were repeated 50 times and average values are presented in Tables 4-13. Standard deviation values ranged from 2 × 10^-7 to 1.3 × 10^-1 for the NEEDED database. In the case of the PhysioNet data, the SD values vary from 0.003 for time-domain and nonlinear parameters up to 11.51 for frequency-domain parameters, as the latter reach values above 2000. We compare our approach with cubic spline interpolation (which is commonly used for this purpose, e.g., by Kubios). The values of the mean error are presented in Tables 16 and 17. The results are fairly similar. Our method usually introduces a smaller error for small gaps, but can introduce larger differences for longer gaps, especially for frequency-based HRV measures. However, if we inspect the generated data visually, the shape of the HRV beat-to-beat durations reconstructed using the proposed algorithm is more similar to the original than in the case of spline interpolation. Several examples are presented in Figure 5 for a gap of three beats, Figure 6 for a gap of five beats, and Figure 7 for a gap of seven beats for the NEEDED project, and in Figures 8 and 9 for the PhysioNet database. The plots for the 23 participants of the NEEDED project are presented in Appendix A, and the plots generated for signals from the PhysioNet database are presented in Appendix B.
For all signals from NEEDED project, the curves generated for filling the gaps by Algorithm 2 overlapped as the SD of the signal was very low, at the level of 0.03. In the case of signals from PhysioNet database, there are significant fluctuations visible in Figures A9-A21, which is caused by much larger values of SD for these signals, at the level of 20. We can notice that sometimes there is a small shift in time in the data generated by the proposed approach, which might be a source of increased error.

Discussion and Conclusions
Many HRV measures are sensitive to data quality, and filtering is usually applied to combat the problem. At the same time, there is a consensus in the existing literature that HRV measures are influenced by the applied filtering to an extent that produces clinically relevant differences in data interpretation. There is also a lack of standardization and openness about the types of filters applied and the parameters used.
The existing manual approaches to filter selection and artifact correction in HRV data are not sustainable with the proliferation of Personal Monitoring Devices capable of continuous recording of beat-to-beat HRV data. To address these problems, we propose two algorithms for automated artifact correction, designed to have a minimal impact on HRV measures and to handle more artifacts than existing solutions.
In this paper, we present the algorithms and the studies validating their initial versions using two datasets: one collected during a recreational bicycle race and another in a laboratory setting. Our findings suggest that the proposed algorithms can correct more artifacts than existing methods without a need for manual support or parameter tuning. At the same time, the error introduced to HRV measures for peak correction and shorter gaps is similar to the best existing solution (Kubios-inspired threshold-based cubic interpolation) and better than the commonly used median filter. For longer gaps, cubic interpolation can in some cases result in a lower error in HRV measures; at the same time, we observe that the shape of the curve reconstructed by our algorithm matches the ground truth better than cubic interpolation. This suggests that further development of the algorithm may also improve these results.
One of the challenges in HRV research is a shortage of standardized and open datasets. Most of the open HRV datasets are short in duration, collected during rest, and using ECG equipment. A search for heart rate variability data in PhysioNet [46] reveals only five results, of which after further inspection only two are to some extent relevant. The first one contains a computer model to generate data. The second one [45] contains real data collected with a consumer-level device and became available only during the review stages of this paper. Validation based on this dataset is included and consistent with earlier results. The problem of lack of open datasets and general standardization has been noticed by other authors, as mentioned in Section 2.
We plan further development of the algorithm to improve the results in filling longer gaps. The current version of the algorithm uses beat-to-beat distances and heart rate as inputs. Newer PMDs are now frequently capable of recording breath rate, which has an impact on HRV. We plan to investigate how these data could be used to improve the accuracy of the algorithm; they might have a particular impact for long gaps. We also plan an extended verification by identifying new relevant subsections of the NEEDED dataset and more participants from the new dataset made available recently on PhysioNet.

Figure A14. Participant 6 with multiple gaps in HRV beat-to-beat duration before filtering and after filtering (PhysioNet database).

Figure A20. Participant 12 with multiple gaps in HRV beat-to-beat duration before filtering and after filtering (PhysioNet database).