1. Introduction
Heart rate variability (HRV) is the physiological phenomenon of variation in the time interval between heartbeats [
1]. Typically, HRV is estimated using electrocardiography (ECG). R peaks, the upward deflections in ventricular depolarization complexes in ECG [
2], are detected, and the distances between consecutive R peaks, called RR intervals, are analyzed. A substantial body of research detects R peaks in ECG using various signal processing methods [
3], as well as time series classification tools such as interval feature transformation [
4].
Photoplethysmography (PPG) is a non-invasive and low-cost optical measurement technique that provides important information about the cardiovascular system [
5,
6]. PPG tracks blood volume changes in peripheral blood vessels by illuminating the skin and measuring changes in light absorption. In practice, PPG signals are often collected using wrist-worn devices with optical sensors, such as smartwatches [
7,
8], or using a smartphone camera attached to a user’s finger [
9,
10] (we will refer to such signals as smartphone PPG). PPG signals are used to estimate heart rate [
7,
8,
10,
11], as well as blood oxygen saturation, blood pressure [
6,
10], etc.
Research [
12,
13,
14,
15] shows that, if a PPG signal is collected from a healthy subject at rest, then the intervals between consecutive major peaks in PPG can serve as a substitute for RR intervals in ECG for various heart rate variability metrics. At the same time, PPG signals collected during or right after exercise or from a patient with cardiovascular disease can have peak-to-peak intervals that differ significantly from RR intervals in ECG and cannot be used for HRV estimation [
12,
15].
In practice, PPG signals collected in an uncontrolled environment often suffer from measurement artifacts that can corrupt sufficiently long parts of the signals and make these parts unsuitable for reliable peak detection.
Figure 1 shows three common types of smartphone PPG signals that we encounter in practice:
While it is easy to perform well on signals like
Figure 1a, a reliable algorithm must also distinguish corrupted parts like
Figure 1c to exclude them from the analysis. For the signal parts such as
Figure 1b, a reliable algorithm should choose peaks corresponding to cardiac cycles rather than the noise in the signal.
Several algorithms were developed for peak-to-peak interval detection in PPG using slope analysis [
16,
17], automatic multiscale peak detection [
18,
19], neural networks [
20], and adaptive threshold peak detection [
21,
22]. Existing algorithms filter out erroneous intervals using outlier detection based solely on the lengths of detected intervals. In our algorithm, we propose to analyze the signal during the detected intervals to estimate their reliability. A real-time algorithm for peak detection and signal quality estimation in smartphone PPG was proposed in [
9]. As a real-time algorithm, it was restricted in computational complexity and in the variety of methods it could use. Therefore, we propose a new offline algorithm that uses the continuous wavelet transform (CWT).
CWT is a tool that provides a representation of a signal in the time-scale domain. It is used for time-frequency localization [
23] and pattern matching [
24]. CWT was successfully used for the analysis of ECG signals [
25], electroencephalogram (EEG), and other time series data [
26]. Peak detection using continuous or discrete wavelet transform was performed in various contexts in [
24,
27,
28,
29,
30,
31,
32].
The main features of CWT that we use in the proposed algorithm are the ridge lines, i.e., curves consisting of points of local maxima at fixed scales in the time-scale domain. Using the correspondence between peaks in the signal and the ridge lines in CWT with the Mexican hat mother wavelet (see
Section 2.3), our algorithm uses the ridge line lengths to detect the peaks in PPG that correspond to heartbeats. The algorithm also uses the shape of the ridge lines and signal self-similarity characteristics as features to identify corrupted parts in PPG signals. We describe the proposed algorithm in detail in
Section 3.
We evaluate the accuracy of the proposed algorithm on three different publicly available datasets: Welltory-PPG-dataset, TROIKA [
7] and PPG-DaLiA [
8]. Note that, since PPG signals during exercise cannot be used for HRV estimation according to [
12,
15], for validation using TROIKA [
7] and PPG-DaLiA [
8] we only used parts of the signals that were collected before any labeled physical activity. Welltory-PPG-dataset is a new publicly available dataset that we introduce in this paper. It consists of smartphone PPG measurements with simultaneously collected ground truth RR intervals from the Polar H10 chest strap device. We introduce this dataset, since most publicly available PPG datasets (such as TROIKA [
7] and PPG-DaLiA [
8]) are collected using wrist-worn devices and are aimed at heart rate detection during physical activity, and the existing smartphone PPG dataset [
33] consists of 10-second signals which are too short for a reliable HRV assessment. We describe the dataset structure and data collection process in
Section 2.1.1.
While CWT is used for various signal types, including ECG and EEG, it has not been used before for peak detection in PPG. In contrast with existing PPG analysis algorithms, our algorithm filters out unreliable peak-to-peak intervals by detecting corrupted parts in the signal. Our algorithm demonstrated good accuracy in basic HRV metrics on three publicly available datasets containing PPG from different sources. Thus, we conclude that the algorithm should generalize well to PPG signals of different origins, be robust to signal corruption, and be usable for reliable HRV estimation from PPG signals collected in an uncontrolled environment.
The paper is organized as follows. The datasets used for algorithm validation are described in
Section 2.1. HRV metrics used for accuracy estimation are described in
Section 2.2.
Section 2.3 contains a discussion about CWT and the ridge lines. A detailed description of the proposed algorithm is given in
Section 3.
Section 4 contains tables with HRV metrics estimation errors on the used validation datasets.
Section 5 provides additional discussion. It contains a justification of the methods used in the proposed algorithm, its limitations, and its comparison with previously developed algorithms. Moreover, it contains a justification of the choice of the ground truth labels in Welltory-PPG-dataset.
Section 6 contains concluding remarks.
3. Proposed Algorithm
In this section, we describe the proposed algorithm. In the rest of the paper, we refer to peaks in PPG that correspond to heartbeats as R peaks for brevity, and to peak-to-peak intervals as RR intervals. Our main tool is the CWT of the PPG signal with the Mexican hat function as the mother wavelet.
In the first step of the algorithm, we perform signal preprocessing that addresses the two most common problems: abrupt shifts (steps) in the signal, and stretches where the signal becomes almost constant due to hardware issues. A detailed description of the preprocessing steps is given in
Section 3.1. Then we perform R peak detection. To detect R peaks in the signal, we will identify their corresponding ridge lines in the CWT. Our algorithm for R peak detection is based on two principles that must hold for non-corrupted PPG signals collected from healthy subjects at rest:
Therefore, we use a 2-step process for R peak detection. First, we estimate the heart rate from the PPG signal using the short-time Fourier transform spectrogram of the signal. Then, the algorithm uses the estimated heart rate as additional information to choose ridge lines corresponding to R peaks. Informally speaking, the algorithm chooses the most persistent ridge lines that arise in the CWT at a frequency corresponding to the estimated heart rate. Finally, the locations of R peaks are predicted as positions of the chosen ridge lines at the smallest scale. A detailed description of the R peak detection algorithm is given in
Section 3.3.
After the R peaks are detected, we consider the corresponding RR intervals. We evaluate the quality of detected RR intervals to identify and discard RR intervals found in corrupted parts of the signal. Since a non-corrupted PPG signal is almost periodic, the quality estimation is based on two considerations:
A signal must have similar shapes inside detected RR intervals;
The ridge lines in CWT that define the edges of an RR interval must have similar shapes. In particular, the distances between them should be approximately the same on different scale levels.
We assign to each RR interval its quality score based on the principles above and then discard the RR intervals with quality scores below an automatically determined threshold. We give a detailed description of the quality score calculation and the choice of the quality threshold in
Section 3.4.
Figure 3 shows the proposed algorithm workflow for a single channel PPG:
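Once the algorithm finishes, we estimate the overall quality of the signal as the following ratio, which we call the discarded ratio:
$$\text{discarded ratio} = \frac{N_{\text{discarded}}}{N_{\text{detected}}}, \tag{6}$$
where $N_{\text{detected}}$ is the total number of RR intervals detected by the R peak detection part of the algorithm and $N_{\text{discarded}}$ is the number of detected RR intervals that were discarded by the filtration part of the algorithm.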
In some cases, valleys in the PPG signals can be easier to locate than peaks. Thus, as a final step, we apply our algorithm to both the PPG signal and the negative of the PPG signal and obtain two sets of detected RR intervals. Then we choose the result with the smaller discarded ratio as the algorithm output.
For a multi-channel PPG signal (e.g., smartphone PPG containing values in the red, green, and blue channels of the video frames captured by a smartphone camera), we choose the best channel as the one that has the smallest discarded ratio. Note that research [
34] suggests that channels with shorter wavelengths should have the best signal-to-motion ratio. We do not make an a priori choice of the channel to avoid issues with device-specific color representations.
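As an illustration only, the selection rule described above (run the algorithm on each channel and on its negative, then keep the result with the smallest discarded ratio) can be sketched as follows. The names `discarded_ratio`, `select_best`, and the callback `run_algorithm` are hypothetical, and treating a signal with no detected intervals as fully unreliable is our assumption for this sketch.

```python
import numpy as np


def discarded_ratio(n_detected, n_discarded):
    """Share of detected RR intervals rejected by the filtration step."""
    if n_detected == 0:
        return 1.0  # assumption: nothing detected counts as fully unreliable
    return n_discarded / n_detected


def select_best(channels, run_algorithm):
    """Pick the channel and polarity with the smallest discarded ratio.

    channels: dict mapping a channel name to a 1-D signal.
    run_algorithm: hypothetical callback returning
        (n_detected, n_discarded, rr_intervals) for one signal.
    """
    best = None
    for name, signal in channels.items():
        for sign in (1.0, -1.0):  # original signal and its negative
            n_det, n_disc, rr = run_algorithm(sign * np.asarray(signal, float))
            ratio = discarded_ratio(n_det, n_disc)
            if best is None or ratio < best[0]:
                best = (ratio, name, sign, rr)
    return best
```

Ties are resolved in favor of the first candidate examined, which is a design choice of the sketch rather than something stated in the text.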
3.1. Signal Preprocessing
Most smartphone cameras provide frames at a rate of 30 frames per second. To increase accuracy, we interpolate the signal to a uniformly sampled sequence with a sampling frequency
Hz, as the upsampling is necessary for accurate HRV estimation [
17]. Most cameras produce red, green, and blue channel values in the standard range (0, 255). If the PPG signal is given in another range, we rescale it to the standard range. Signal preprocessing aims to identify the following common issues with the signal:
Step detection. In smartphone PPG signals, removing the finger from the camera or reapplying it results in abrupt steps in the signal. To detect such steps, we compute the running amplitude with a window length of 1 s. If the running amplitude exceeds 4 times its median, a step in the signal is detected. The threshold value 4 was chosen empirically by examining a number of examples.
Constant signal detection. Sometimes the signal becomes constant if there are issues with color rendering in the frame or there is no finger over the camera at all. Analysis of examples shows that is a reliable threshold value for running signal amplitude to detect a constant signal.
Parts of the signal labeled as containing steps or a constant signal are set to zero. Then, we split the signal into continuous chunks between the labeled parts and set the chunks shorter than 2 s to zero. For every remaining chunk, we remove the trend by subtracting the running average in 2 s windows and apply a low-pass filter with a 10 Hz cutoff frequency to avoid aliasing and remove high-frequency noise.
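The preprocessing steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the function names are hypothetical, the constant-signal threshold `const_amp` is a placeholder value, the running amplitude is taken as the running maximum minus the running minimum, and a fourth-order Butterworth filter stands in for the unspecified low-pass filter.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d, minimum_filter1d, uniform_filter1d
from scipy.signal import butter, sosfiltfilt


def running_amplitude(x, fs, window_s=1.0):
    """Running max minus running min over a centered window."""
    size = max(1, int(round(window_s * fs)))
    return maximum_filter1d(x, size) - minimum_filter1d(x, size)


def flag_bad_samples(x, fs, step_factor=4.0, const_amp=0.1):
    """Mark samples affected by steps or by an almost constant signal."""
    amp = running_amplitude(x, fs)
    steps = amp > step_factor * np.median(amp)
    constant = amp < const_amp  # assumption: placeholder threshold
    return steps | constant


def clean_chunk(x, fs, trend_s=2.0, cutoff_hz=10.0):
    """Subtract the running average and low-pass filter one chunk."""
    size = max(1, int(round(trend_s * fs)))
    detrended = x - uniform_filter1d(x, size, mode="nearest")
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, detrended)


def preprocess(x, fs, min_chunk_s=2.0):
    """Zero out flagged parts and short chunks, clean the remaining chunks."""
    x = np.asarray(x, float)
    bad = flag_bad_samples(x, fs)
    out = np.zeros_like(x)
    d = np.diff(np.r_[True, bad, True].astype(int))
    starts = np.flatnonzero(d == -1)  # first good sample of each chunk
    ends = np.flatnonzero(d == 1)     # one past the last good sample
    for s, e in zip(starts, ends):
        if (e - s) / fs >= min_chunk_s:
            out[s:e] = clean_chunk(x[s:e], fs)
    return out
```

The sketch requires a sampling frequency above 20 Hz so that the 10 Hz cutoff lies below the Nyquist frequency.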
3.2. Heart Rate Estimation
In this subsection, we describe the algorithm for heart rate estimation from a PPG signal. Our approach is similar to TROIKA [
7] in that we estimate the heart rate by constructing a continuous curve consisting of peaks in the spectrogram columns. Our context differs from the one in [
7] in two aspects. First, we are interested in PPG collected at rest, so we expect motion artifacts associated with occasional movement rather than with the intensive physical exercise considered in [
7]. Second, we do not collect accelerometer data to estimate the subject’s movements. Therefore, we use the following approach:
First, we filter the spectrogram using a convolution with a 2D filter that highlights curves with a bounded rate of change;
We consider local maxima in the columns of the filtered spectrogram;
We find a rough estimate of heart rate frequency and construct a continuous curve consisting of local maxima that are closest to the estimate.
We describe our approach in detail below. Suppose that
is a discrete signal obtained by a uniform sampling with frequency
f from a function
with a discrete step
:
3.2.1. Sliding Window Spectrogram
As a feature, we use the signal spectrogram computed with a sliding Dirichlet window of 5 s in length. The spectrogram is computed as the squared magnitude of the short-time Fourier transform (STFT) for frequencies
,
uniformly sampled between
and
Hz, with stride =
s:
While peaks in a spectrum of a single window may be associated with measurement artifacts, the heart rate corresponds to a continuous curve in the spectrogram. An example of the spectrogram is given in
Figure 4.
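A minimal sketch of such a spectrogram is given below. It evaluates the discrete Fourier sum directly on a user-supplied frequency grid with a rectangular (Dirichlet) window of 5 s. The function name, the frequency grid `freqs`, and the default stride of 1 s are assumptions made for illustration.

```python
import numpy as np


def sliding_spectrogram(x, fs, freqs, window_s=5.0, stride_s=1.0):
    """Squared STFT magnitude with a rectangular sliding window.

    Returns an array of shape (len(freqs), n_windows) and the start
    time of each window in seconds.
    """
    x = np.asarray(x, float)
    win = int(round(window_s * fs))
    hop = int(round(stride_s * fs))
    n = np.arange(win) / fs
    # One complex exponential per requested frequency: shape (n_freqs, win)
    basis = np.exp(-2j * np.pi * np.outer(freqs, n))
    starts = np.arange(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] for s in starts], axis=1)
    return np.abs(basis @ frames) ** 2, starts / fs
```

Evaluating the sum on an explicit grid, rather than using an FFT, makes it easy to sample the frequency axis uniformly between two chosen bounds.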
As we can see in
Figure 4, the heart rate frequency in that example gradually grows from
Hz at the beginning to
Hz at the end. The curve is clearly visible for the first 30 s, after which it becomes smudged between 30 and 50 s. To simplify the curve detection, we convolve the spectrogram with the 2D filter shown in
Figure 5, which highlights continuously evolving curves on the spectrogram:
3.2.2. Local Maxima in the Spectrogram Columns
After the filtration, we consider the points of local maxima in the columns of the spectrogram. We construct the curve associated with the heart rate frequency by following these points of local maxima in a continuous manner. To find the initial point on the curve, we need to choose between the local maxima in the first column of the spectrogram. The highest local maximum does not always correspond to the actual heart rate. Therefore, we find a rough estimate of the heart rate and then choose the peak that is closest to this estimate. To find a rough estimate, we apply a bandpass filter with a narrow bandwidth to the PPG signal and count the number of peaks and the number of zero crossings in the filtered signal. In more detail, we use Algorithm 1. It takes
i, the index of a spectrogram column, as input and returns the frequency of a peak in the
i-th spectrogram column, such that this frequency is the closest to the rough heart rate estimate.
Algorithm 1 Rough estimate of heart rate frequency in the i-th spectrogram column
1: -th column in spectrogram
2: ▹ start time of the i-th sliding window with stride s
3:
4: part of the signal between and seconds
5: frequency corresponding to the maximum value of s
6: if then
7:
8: else if then
9:
10: else
11:
12: end if
13: bandpass filtration of x with cutoff frequencies and
14: number of times crosses the zero level
15: number of local maxima in
16:
17:
18:
19: frequencies corresponding to 3 highest peaks in s
20: return
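The idea behind Algorithm 1 can be sketched as follows. The exact band limits and the way the two counts are combined are not recoverable from the listing above, so this sketch makes its own assumptions: the passband is the dominant spectral frequency plus or minus a hypothetical `half_band_hz`, and the rough estimate is the average of the frequency implied by zero crossings and the frequency implied by the number of local maxima. All names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, find_peaks, sosfiltfilt


def rough_hr_frequency(segment, fs, column, freqs, half_band_hz=0.5):
    """Return the spectral peak closest to a rough heart rate estimate.

    segment: signal samples inside the sliding window.
    column:  spectrogram column for that window, sampled at freqs (Hz).
    """
    segment = np.asarray(segment, float)
    column = np.asarray(column, float)
    f_max = freqs[np.argmax(column)]
    lo = max(0.1, f_max - half_band_hz)  # assumption: band around f_max
    hi = f_max + half_band_hz
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, segment - segment.mean())

    duration = len(segment) / fs
    zero_crossings = np.count_nonzero(np.diff(np.signbit(filtered)))
    n_maxima = len(find_peaks(filtered)[0])
    # Two crossings per cycle, one maximum per cycle; average both estimates.
    f_est = 0.5 * (zero_crossings / (2.0 * duration) + n_maxima / duration)

    peak_idx = find_peaks(column)[0]
    if len(peak_idx) == 0:
        return float(f_max)
    top = peak_idx[np.argsort(column[peak_idx])[-3:]]  # 3 highest peaks
    return float(freqs[top[np.argmin(np.abs(freqs[top] - f_est))]])
```

The upper band edge must stay below half the sampling frequency for the filter design to succeed.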
Now we construct a curve in the spectrogram plane showing the heart rate frequency during the measurement. First, we choose the initial frequency using Algorithm 1 for the 0-th column of the spectrogram. Then we construct the continuous curve by following local maxima in the columns of the spectrogram. If at some step we cannot proceed continuously, we run Algorithm 1 again to find the local maximum in the spectrogram column that is closest to the rough estimate. Along the way, we apply exponential smoothing to the obtained curve for a more robust estimate. In more detail, we use Algorithm 2:
Algorithm 2 Construction of the heart rate frequency curve
1: number of columns in spectrogram
2: output of Algorithm 1 for the 0-th column
3: for i in range(1, n) do
4: frequencies corresponding to 3 largest local maxima in the i-th column of spectrogram
5:
6: if then
7:
8: else
9: output of Algorithm 1 for the i-th column
10:
11: end if
12: end for
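A possible reading of Algorithm 2 is sketched below. The continuity tolerance `max_jump_hz` and the smoothing factor `alpha` are hypothetical values, since the corresponding expressions are missing from the listing, and `rough_estimate` is a hypothetical callback that plays the role of Algorithm 1.

```python
import numpy as np
from scipy.signal import find_peaks


def top_peak_freqs(column, freqs, k=3):
    """Frequencies of the k largest local maxima of one spectrogram column."""
    idx = find_peaks(column)[0]
    if len(idx) == 0:
        idx = np.array([int(np.argmax(column))])
    idx = idx[np.argsort(column[idx])[-k:]]
    return freqs[idx]


def track_heart_rate(spec, freqs, rough_estimate, max_jump_hz=0.15, alpha=0.5):
    """Follow local maxima across columns with exponential smoothing.

    rough_estimate(i) is a hypothetical stand-in for Algorithm 1: it
    returns a frequency for the i-th column.
    """
    n = spec.shape[1]
    curve = np.empty(n)
    curve[0] = rough_estimate(0)
    for i in range(1, n):
        candidates = top_peak_freqs(spec[:, i], freqs)
        nearest = candidates[np.argmin(np.abs(candidates - curve[i - 1]))]
        if abs(nearest - curve[i - 1]) <= max_jump_hz:
            # Continuous step: smooth towards the nearest local maximum.
            curve[i] = alpha * nearest + (1.0 - alpha) * curve[i - 1]
        else:
            # Cannot proceed continuously: re-anchor on the rough estimate.
            curve[i] = rough_estimate(i)
    return curve
```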
Now let us demonstrate how Algorithm 2 works.
Figure 6 shows the filtered version of the spectrogram of
Figure 4, the curves consisting of local maxima in the columns, and the heart rate frequency curve constructed by Algorithm 2. It correctly identifies the heart rate frequency change during the measurement.
3.3. R Peak Detection
3.3.1. Signal Scalogram and Ridge Lines
Recall our notation for the signal
that is uniformly sampled from function
with frequency
f:
where variable
t represents time measured in seconds. Let
denote the Mexican hat wavelet function given by Equation (
5). Consider scales
evenly spaced on a logarithmic scale:
Define the scalogram of the signal
as the finite sum approximation to the integral that defines CWT of the function
in Equation (
4):
The chosen scales range from to This range was chosen empirically to accurately reflect the position of R peaks on the smallest scale and provide enough smoothing to remove noisy peaks on the large scale.
The scalogram is thus a discretization of the CWT, so its rows represent the similarity of the signal to a typical peak shape at the corresponding scales. The ridge lines in the CWT correspond to ridge lines in the
plane consisting of points of local maxima in the scalogram rows
Note that convolution with the scaled wavelet performs bandpass filtration, with wider bandwidth for smaller scales and narrower bandwidth for larger scales, as we can see in
Figure 7. Thus, we can consider the rows of the scalogram as filtered, or smoothed, versions of the initial signal, and the ridge lines indicate the locations of peaks in the signal at different levels of smoothing. The R peaks in the signal are the more persistent peaks, which exist at different levels of smoothing; therefore, R peaks are the tips of longer ridge lines that are present over a larger scale range.
Figure 8 shows a typical example that illustrates this idea:
As we can see in
Figure 8, there are more ridge lines in the upper part of the scalogram, corresponding to noisy peaks in the signal. We can nevertheless distinguish the ridge lines that correspond to R-peaks by choosing the longer lines and using our previous heart rate estimate as a heuristic that suggests how often the actual ridge lines are expected to appear on the scalogram plane. In the next section, we discuss in detail Algorithm 3, which chooses the ridge lines that correspond to R peaks.
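To make the construction concrete, the following sketch computes a scalogram with the standard unit-energy Mexican hat wavelet and links row-wise local maxima into ridge lines. It is a minimal sketch: the normalization, the truncation of the wavelet at five scale widths, the linking tolerance `max_shift`, and all function names are assumptions, and each scale must be short enough that the wavelet kernel is shorter than the signal.

```python
import numpy as np
from scipy.signal import find_peaks


def mexican_hat(t):
    """Mexican hat (Ricker) wavelet with unit energy."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)


def scalogram(x, fs, scales_s):
    """Finite-sum approximation of the CWT; one row per scale (in seconds)."""
    x = np.asarray(x, float)
    rows = []
    for a in scales_s:
        half = int(np.ceil(5 * a * fs))  # assumption: truncate at +-5a
        t = np.arange(-half, half + 1) / fs
        kernel = mexican_hat(t / a) / np.sqrt(a) / fs
        rows.append(np.convolve(x, kernel, mode="same"))
    return np.vstack(rows)


def ridge_lines(scalo, max_shift):
    """Link local maxima from the coarsest row down to the finest row.

    Each ridge line is a list of (row, column) pairs; a line that
    reaches row 0 ends at a candidate peak position in the signal.
    """
    lines, active = [], []
    for row in range(scalo.shape[0] - 1, -1, -1):
        maxima = find_peaks(scalo[row])[0]
        taken, survivors = set(), []
        for line in active:
            if len(maxima) == 0:
                break
            j = int(np.argmin(np.abs(maxima - line[-1][1])))
            if j not in taken and abs(int(maxima[j]) - line[-1][1]) <= max_shift:
                line.append((row, int(maxima[j])))
                taken.add(j)
                survivors.append(line)
        for j, col in enumerate(maxima):
            if j not in taken:  # unmatched maximum starts a new ridge line
                new_line = [(row, int(col))]
                lines.append(new_line)
                survivors.append(new_line)
        active = survivors
    return lines
```

In this sketch, the length of a ridge line is the number of rows it spans, so the most persistent peaks are those whose lines start at the coarsest scale and reach row 0.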
3.3.2. Choosing Ridge Lines Associated to R-Peaks
Recall that we defined scalogram in Equation (
10) as a matrix of shape
For a ridge line
and a scale
denote by
the time coordinate of the point on
with scale level
(if such a point exists):
For a given set of ridge lines
sorted according to their top points
define their RR intervals as the distances between the top points:
We will use these RR intervals to determine which curves to choose. First, we choose all the ridge lines that stretch from the bottom to the top of the scalogram. These are the curves corresponding to the most persistent peaks in the signal that are likely to be R-peaks. Some of the remaining curves are added to the set of chosen curves based on the likelihood of observing the corresponding set of RR intervals given the previously estimated heart rate frequency.
We estimate the likelihood of a set of potential RR intervals as follows. According to [
35], the length of RR intervals can be modeled by an inverse Gaussian distribution. In practice, the inverse Gaussian distribution can be approximated by a log-normal distribution [
36]. We may thus assume that the logarithms of RR intervals are normally distributed, so the average log-likelihood of a set of RR intervals given the heart rate frequency is proportional to
where
and
the expected heart rate frequency during interval
, and
is the expected length of the RR interval. We use this likelihood estimate to choose the ridge lines associated with R-peaks, as described in Algorithm 3:
Algorithm 3 Choosing ridge lines in the scalogram
1: estimated heart rate frequency given by Algorithm 2
2: ridge lines with lowest point below the scale level and highest point over the scale level
3: ridge lines with lowest point between the scale and and highest point over the scale level .
4: Sort by the height of the lowest point in the increasing order
5: for in do
6:
7: set of intervals between curves from
8: = set of intervals between curves from
9: =
10: =
11: if then
12:
13: end if
14: end for
15: return
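The selection principle of Algorithm 3 can be illustrated with the sketch below. Under the log-normal assumption, it scores a set of peak times by the negative mean squared difference between the logarithms of the observed RR intervals and of the expected RR interval (the reciprocal of the estimated heart rate frequency), and it adds a candidate peak only if the score improves. The exact likelihood expression and acceptance rule of the paper are not reproduced here; the scoring formula, the greedy loop, and all names are assumptions.

```python
import numpy as np


def avg_log_likelihood(peak_times, hr_freq_at):
    """Score proportional to the average log-normal log-likelihood.

    peak_times: peak positions in seconds (at least two).
    hr_freq_at: hypothetical callable mapping times (s) to the
        estimated heart rate frequency (Hz).
    """
    peak_times = np.sort(np.asarray(peak_times, float))
    rr = np.diff(peak_times)
    midpoints = 0.5 * (peak_times[:-1] + peak_times[1:])
    expected_rr = 1.0 / hr_freq_at(midpoints)
    return -np.mean((np.log(rr) - np.log(expected_rr)) ** 2)


def choose_peaks(base_times, candidate_times, hr_freq_at):
    """Greedily add candidate peaks that increase the score."""
    chosen = list(base_times)
    for t in candidate_times:
        if avg_log_likelihood(chosen + [t], hr_freq_at) > avg_log_likelihood(chosen, hr_freq_at):
            chosen.append(t)
    return sorted(chosen)
```

For example, with peaks at 0, 1, 3, and 4 s and an estimated heart rate of 1 Hz, a candidate at 2 s fills the gap and is accepted, whereas a candidate at 3.5 s would create two half-length intervals and is rejected.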
3.4. RR Intervals Filtration
PPG signals often contain corrupted parts where no accurate R-peak detection is possible, so these parts must be discarded to ensure robust HRV estimation. To identify such parts we use the following method. After we have detected RR intervals, we assign to each detected interval a number between 0 and 1 (its quality) that reflects the likelihood that the found interval is accurate and reliable. Then a quality threshold is determined and the RR intervals with the quality below the threshold are discarded.
To develop a quality estimate, we use the following principles:
Parts of the signal inside neighboring intervals must have similar shapes and amplitudes;
The distance between the ridge lines defining neighboring R-peaks must be approximately the same at different resolution levels.
Let us consider the example shown in
Figure 9 to illustrate the principles above:
As we can see, in the corrupted part of the signal starting from 35 s, the signal parts inside RR intervals are often weakly correlated and their amplitudes can differ significantly. The corresponding ridge lines also do not repeat the same shape: they bend in different ways, so the distance between them varies across scale levels more than it does for ridge lines in the non-corrupted part of the signal. In the following discussion, we describe our algorithm that filters out the RR intervals in the corrupted parts of the signal. For the signal shown in
Figure 9 it produces the result shown in
Figure 10:
To define the quality of the RR intervals, we introduce two auxiliary quality measures: similarity-based quality and CWT-based quality. In the following discussion, we denote by the sigmoid function.
3.4.1. Similarity-Based Quality
Let
and
be two signal chunks. Let
and
denote
x and
y resampled to vectors of size 50. Define the correlation similarity between
x and
y as the correlation coefficient between
and
, limited by 0 from below:
Let
and
denote the amplitudes of signals
x and
y. Suppose that
is greater than
and denote their quotient by
. Define the amplitude similarity as
Formula (
14) is chosen empirically to assign low values to pairs of signals whose amplitudes differ by a factor of 4 or more. Finally, define the similarity score between
x and
y as the geometric mean of their correlation similarity and amplitude similarity:
Now, if
is a sequence of detected RR intervals with R-peaks located at indices
, then the similarity quality of the
i-th RR interval
is defined as the geometric mean of its similarities with the neighbor RR intervals (using a single neighbor for
or
):
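The similarity-based quality can be sketched as follows. The correlation similarity and the geometric means follow the description above. The amplitude similarity, however, is a hypothetical stand-in: a linear ramp that equals 1 for equal amplitudes and reaches 0 when the amplitudes differ by a factor of 4, which mimics the stated behavior of Formula (14) without reproducing it. Function names are likewise assumptions.

```python
import numpy as np


def resample(x, size=50):
    """Linearly resample a chunk to a fixed number of points."""
    x = np.asarray(x, float)
    return np.interp(np.linspace(0, len(x) - 1, size), np.arange(len(x)), x)


def correlation_similarity(x, y):
    """Correlation coefficient of the resampled chunks, bounded below by 0."""
    r = np.corrcoef(resample(x), resample(y))[0, 1]
    return max(0.0, float(r))


def amplitude_similarity(x, y):
    """Assumption: linear ramp from 1 (equal amplitudes) to 0 (ratio >= 4)."""
    ax, ay = float(np.ptp(x)), float(np.ptp(y))
    if min(ax, ay) == 0.0:
        return 0.0
    q = max(ax, ay) / min(ax, ay)
    return float(np.clip((4.0 - q) / 3.0, 0.0, 1.0))


def similarity(x, y):
    """Geometric mean of correlation and amplitude similarity."""
    return float(np.sqrt(correlation_similarity(x, y) * amplitude_similarity(x, y)))


def similarity_quality(signal, peaks):
    """Quality of each RR interval from its similarity with its neighbors."""
    chunks = [signal[a:b] for a, b in zip(peaks[:-1], peaks[1:])]
    quality = []
    for i, chunk in enumerate(chunks):
        sims = [similarity(chunk, chunks[j]) for j in (i - 1, i + 1) if 0 <= j < len(chunks)]
        # Geometric mean over one or two neighbors; 0 if there is no neighbor.
        quality.append(float(np.prod(sims) ** (1.0 / len(sims))) if sims else 0.0)
    return quality
```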
3.4.2. CWT-Based Quality
Suppose that
and
are two ridge lines. Consider a sequence
for all
where both
and
are defined. Let
be the standard deviation of distances between points on
and
measured in milliseconds. Now suppose that ends of some
intervals were detected using curves
and
in the scalogram. Then we define the CWT-based quality of the interval
as
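As a rough illustration, the sketch below computes the standard deviation, in milliseconds, of the distances between two ridge lines over their common scale levels and maps it to a score between 0 and 1. The mapping is an assumption made for this sketch (twice the sigmoid of the negated, rescaled deviation, with a hypothetical constant `tau_ms`), not the formula of the paper. Ridge lines are represented as dictionaries from scale index to sample index, which is also our choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ridge_distance_std_ms(ridge_a, ridge_b, fs):
    """Std of the distances between two ridge lines, in milliseconds.

    ridge_a, ridge_b: dicts mapping scale index to sample index.
    """
    common = sorted(set(ridge_a) & set(ridge_b))
    if len(common) < 2:
        return float("inf")
    distances = np.array([ridge_b[k] - ridge_a[k] for k in common], float)
    return float(np.std(distances) * 1000.0 / fs)


def cwt_quality(ridge_a, ridge_b, fs, tau_ms=20.0):
    """Assumption: 1 for parallel ridge lines, decaying to 0 as they diverge."""
    s = ridge_distance_std_ms(ridge_a, ridge_b, fs)
    return float(2.0 * sigmoid(-s / tau_ms))
```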
3.4.3. Filtration
Suppose
is the sequence of detected RR intervals. Define the quality of each of the RR intervals as the product of similarity- and CWT-based quality:
Now we filter intervals by quality using an automatically determined threshold as follows:
The initial rigid threshold of is chosen empirically based on a number of signals examined. The final choice of the threshold value allows us to cut off the intervals that have substantially smaller quality than the rest of the intervals while trying to maximize the number of remaining intervals.
3.4.4. Outlier Detection
In addition to quality-based filtration, we apply outlier detection in the sequence of RR intervals based on their lengths. We use the following algorithm, similar to the one introduced in [
37]. Suppose that
is the sequence of detected RR interval lengths. We then mark certain RR intervals as outliers and discard them from the final result using Algorithm 4:
Algorithm 4 Algorithm to check if the i-th interval in is an outlier
1: = 10-, 50-, and 90-percentile of
2:
3:
4: if
5: and
6: and then
7: mark and as outliers
8: end if
9: if and and then
10: mark and as outliers
11: end if
12: if then
13: mark as an outlier
14: end if
15: if then
16: mark as an outlier
17: end if
4. Results
We tested our algorithm’s accuracy on three datasets: Welltory-PPG-dataset, TROIKA, and PPG-DaLiA, using two HRV metrics: SDNN defined by Equation (
1) and RMSSD defined by Equation (
2). For each PPG signal and a sequence of reference RR intervals, we compute the following quantities:
Discarded ratio: ratio of the number of RR intervals that were discarded by the filtration part of the algorithm to the total number of RR intervals detected by the algorithm peak detection part, cf. Equation (
6).
SDNN ae (RMSSD ae): the absolute error of estimation of SDNN (RMSSD), i.e., the difference between SDNN (RMSSD) derived from the sequence of intervals detected by the algorithm in PPG, and SDNN (RMSSD) derived from the sequence of reference RR intervals.
RR mae: the mean absolute error in RR interval detection, i.e., the mean absolute difference between an interval detected by the algorithm and the corresponding reference RR interval.
The values of SDNN ae, RMSSD ae, RR mae are given in milliseconds.
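For reference, the error quantities above can be computed as in the following sketch. It uses the common definitions of SDNN (standard deviation of the RR intervals) and RMSSD (root mean square of successive differences). The choice of the population standard deviation (`ddof=0`) and the assumption that detected and reference intervals are already aligned one-to-one are ours, and the function names are hypothetical.

```python
import numpy as np


def sdnn(rr_ms, ddof=0):
    """Standard deviation of RR intervals (ms)."""
    return float(np.std(np.asarray(rr_ms, float), ddof=ddof))


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences (ms)."""
    return float(np.sqrt(np.mean(np.diff(np.asarray(rr_ms, float)) ** 2)))


def hrv_errors(detected_ms, reference_ms):
    """Absolute SDNN and RMSSD errors and mean absolute RR error (ms).

    Assumes the two sequences are aligned interval by interval.
    """
    detected = np.asarray(detected_ms, float)
    reference = np.asarray(reference_ms, float)
    return {
        "sdnn_ae": abs(sdnn(detected) - sdnn(reference)),
        "rmssd_ae": abs(rmssd(detected) - rmssd(reference)),
        "rr_mae": float(np.mean(np.abs(detected - reference))),
    }
```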
Table 1 shows the error values for samples in Welltory-PPG-dataset:
Recall that the TROIKA dataset contains PPG from subjects during treadmill exercises with simultaneous ECG. To verify the accuracy of our algorithm, we chose parts of the PPG data collected in the first 30 s of each PPG signal when the subjects were at rest according to the dataset description. Reference RR intervals were derived from ECG signals using a simple algorithm detecting local maxima of a given minimum height and manually verified for accuracy.
Table 2 shows the error values for samples in the TROIKA dataset:
The PPG-DaLiA dataset contains simultaneous PPG and ECG signals during annotated physical activity. For our analysis, we used parts of the signals with activity marked as “sitting”. This gives a signal of approximately 10 min for each subject. As in the previous dataset, reference RR intervals were derived from ECG signals using a simple algorithm detecting local maxima of a given minimum height and manually verified for accuracy. For each subject, we cut the signal into non-overlapping 100-second segments. For each segment, we applied our algorithm and calculated the discarded ratio, SDNN ae, RMSSD ae, and RR mae metrics. The mean values of the calculated errors per subject are presented in
Table 3. Note that the ECG readings for subjects 8 and 12 contain ectopic beats. Those beats were excluded from calculations of reference SDNN and RMSSD values.