Dual Wavelength Photoplethysmography Framework for Heart Rate Calculation

The quality of heart rate (HR) measurements extracted from human photoplethysmography (PPG) signals are known to deteriorate under appreciable human motion. Auxiliary signals, such as accelerometer readings, are usually employed to detect and suppress motion artifacts. A 2019 study by Yifan Zhang and his coinvestigatorsused the noise components extracted from an infrared PPG signal to denoise a green PPG signal from which HR was extracted. Until now, this approach was only tested on “micro-motion” such as finger tapping. In this study, we extend this technique to allow accurate calculation of HR under high-intensity full-body repetitive “macro-motion”. Our Dual Wavelength (DWL) framework was tested on PPG data collected from 14 human participants while running on a treadmill. The DWL method showed the following attributes: (1) it performed well under high-intensity full-body repetitive “macro-motion”, exhibiting high accuracy in the presence of motion artifacts (as compared to the leading accelerometer-dependent HR calculation techniques TROIKA and JOSS); (2) it used only PPG signals; auxiliary signals such as accelerometer signals were not needed; and (3) it was computationally efficient, hence implementable in wearable devices. DWL yielded a Mean Absolute Error (MAE) of 1.22|0.57 BPM, Mean Absolute Error Percentage (MAEP) of 0.95|0.38%, and performance index (PI) (which is the frequency, in percent, of obtaining an HR estimate that is within ±5 BPM of the HR ground truth) of 95.88|4.9%. Moreover, DWL yielded a short computation period of 3.0|0.3 s to process a 360-second-long run.


Introduction
Multi-diagnostic wearable devices are of ongoing interest due to their ability to store and transmit information about the wearer inexpensively and efficiently. Many wearable sensors employ photoplethysmography (PPG), a low-cost optical technique used to detect blood volume changes in the microvascular bed of tissues [1]. This technique enables noninvasive detection of the cardiovascular pulse wave generated by the elastic nature of the peripheral vascular arteries excited by the quasi-periodic contractions of the heart [2][3][4]. PPG signals are used in pulse oximeters-devices that measure the light absorbed by functional hemoglobin (oxygenated and deoxygenated hemoglobin) and produce vital signs such as heart rate (HR) or peripheral capillary oxygen saturation (SpO 2 ) (an estimate of the arterial oxygen saturation (SaO 2 ) [5]). In order to obtain a PPG signal, light is typically shone through the skin and its reflection is captured by a photo-detector. In this study, we serially collect the reflection of light at two different wavelengths, namely, green and infrared (IR).
In the presence of substantial human motion, the quality of the measured PPG signal deteriorates [6]. Much effort has been exerted to suppress motion artifacts in order to extract high-quality vital signs from noise-contaminated PPG signals [3,[7][8][9][10]. This study contributes to this effort.
There are two main sources of motion artifacts that could contaminate a PPG signal collected from a human in motion [9]. The first source of noise is the sensor displacement relative to its original point of contact with the skin. This displacement could alter the path of light, and hence modify the signal collected by the photo-detector [11]. The second source of noise is skin and tissue deformations caused by the sensor's movement.
Zhang et al. [9] proposed an HR calculation method that uses a dual-wavelength sensor that comprises an IR and a green PPG signal. The IR PPG signal was employed to develop a noise source that was used to denoise the green PPG signal from which an HR level was extracted.
The HR calculation algorithm presented in [9] was tested on "micromotion artifacts" such as "finger tapping" and "fist opening and closing". In the current study, we examined the applicability of a related approach for more substantial movements and dynamic scenarios. Motivated by the sensor architecture proposed in [9], we expanded the HR calculation technique to high-intensity full-body repetitive "macro-motion" exercise data. The resulting Dual Wavelength (DWL) method collects green and IR PPG data from a dual-wavelength wrist unit and processes them to estimate the participant's heart rate. The performance of DWL was documented in an extensive motion experiment involving fourteen (14) human participants. There were three separate experiments. In the first (SNR experiment), we used all fourteen (14) participants. In the second experiment (wrist-based heart rate calculation), we used eleven (11) participants due to sensor failure on three of the participants. In the third experiment (palm-based heart rate calculation), we used twelve (12) participants due to sensor failure on two of the participants. Figure 1 shows the essentials of the DWL method. It consists of five (5) stages; 1. Preprocessing, 2. Motion-artifact detection, 3. Motion-artifact frequency components identification, 4. Denoising, and 5. Heart rate estimation. The inputs to the DWL system are green and IR PPG channels measured from a wrist-unit constructed for this study (see Section 2.1). The output is an HR level. First, the green and IR PPG signals are normalized by dividing the signal's AC component by its DC component. We then check whether significant motion noise is present in the PPG signals (Section 2.3.2). If the signals appear noise-free, the normalized green PPG signal is directly used to calculate an HR value. If the signals appear noise contaminated, we then extract the noise components from the IR PPG signal. These noise components are removed from the noisy green PPG signal. We employ a Cascading Adaptive Noise Cancellation (C-ANC) architecture that uses a QR-decomposition-based least-squares lattice (QRD-LSL) algorithm [12] to denoise the green PPG signal before it is used for HR calculation. A separate decision mechanism validates the HR estimate, and corrects it when noise levels are excessively high to produce a meaningful estimate. The rest of this paper is organized as follows. In Section 2, we present the materials and methods we employ in this study. Section 2.1 describes the experimental settings along with the sensors' suite. In Section 2.2, we use experimental data to present the rationale for choosing the IR PPG signal as noise reference signal. Section 2.3 introduces the DWL framework; a method for (1) denoising the green PPG using the noise components extracted from an IR PPG signal, and (2) computing HR levels. Lastly, in Section 2.4, we review alternative HR calculation methods that use auxiliary sensors as a noise source, namely accelerometers. These methods are TROIKA [7] and JOSS [8]. Section 3 presents the results of the DWL framework. In Section 3.1, we define the performance metrics used to compare the performance of the DWL method to that of our implementations of TROIKA and JOSS. In Section 3.2, we compare the actual performance of the DWL method to that of our implementations of TROIKA and JOSS. The comparison is made with respect to (1) the heart rate ground truth computed from an electrocardiography (ECG) signal, and (2) the heart rate levels obtained using TROIKA and JOSS. Section 3.3 validates the DWL framework by testing its performance on experimental data collected from the palms (instead of wrists) of the same participants during a second run (validation run). In Section 4, we conclude that the DWL method provides several desirable features, including the following: (1) the DWL framework uses only PPG signals; auxiliary signals (such as accelerometers used by TROIKA and JOSS) are not needed and (2) the DWL framework appears to exhibit high accuracy and lower computational burden in the presence of motion artifacts as compared to TROIKA and JOSS.

Materials and Methods
In this section, the materials and methods employed in this work are presented. In Section 2.1, we present the sensors used for data collection. We also describe the exercise protocol followed during data collection. In our framework, noise components are extracted from an IR PPG signal. In Section 2.2, we show using experimental data, the rationale behind the choice of IR PPG signal as a reference noise source. In Section 2.3, the DWL framework is introduced and described in detail (along with its five (5) stages, namely, pre-processing, Motion-artifact detection, Motion-artifact frequency components identification, Denoising, and Heart rate estimation). We compare the performance of the DWL method to alternative HR calculation methods that use auxiliary sensors as a noise source, namely accelerometers. These methods are TROIKA [7] and JOSS [8]. In Section 2.4, we present the framework of these two alternative HR calculation methods.

Experimental Protocol and Sensors Suite
We conducted a high-intensity full-body exercise experiment where we collected PPG, electrocardiography (ECG), and tri-axial accelerometer data. Accelerometers measured accelerations in three orthogonal directions X, Y, and Z, simultaneously [13]. Readings were obtained from fourteen (14) human participants while they were standing or running on a split-belt instrumented treadmill (Bertec Corp., Columbus, OH) [14]. First, a multiwavelength wrist oximeter unit was strapped around the participant's wrist. The wrist unit encloses two green LEDs (of wavelength λ G = 520 nm) and two IR LEDs (of wavelength λ IR = 940 nm), as well as a photo-detector. Additionally, a tri-axial accelerometer sensor was placed on the participant's arm (right above the PPG wrist-unit) and secured in place using athletic tape. Lastly, an ECG sensor was mounted onto the participant's chest using adhesive electrodes. Athletic tape was wrapped around each participant's chest to ensure the sensor's stability and good skin contact. Table 1 shows all the instruments and sensors used in the experiment. Both ECG and accelerometer data were recorded using the Delsys EMGworks Software. Multi-wavelength PPG wrist-unit data were recorded using an Arduino UNO. All signals were sampled at 100 Hz. Raw data were processed using MATLAB 2022b (Mathworks, Natick, MA) [15]. All raw data are available through the Github repository in [16].
The ECG signal was used to calculate the HR "ground truth" values. We manually labeled the R peaks for all ECG signals. The HR ground truth at time step l, HR GT (l), is obtained using the relationship where δ R−R (l) is the average time difference between each two consecutive R peaks present within the 8-second-long window, at time step l. The experimental protocol we followed during data collection was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of the New Jersey Institute of Technology (protocol code 2108010504; approved on 14 September 2021). All participants were physically fit, healthy, and athletic volunteers. Each participant was asked to run on a treadmill following an exercise profile that comprises six (6) stages. At each stage, the treadmill speed-hence the exercise intensity-was varied as follows: •

Infrared PPG Signal as Noise Reference Signal
According to [9], IR PPG signals are more affected by motion artifacts than green PPG signals. To verify this behavior in our experiment, we calculated the signal-to-noise (SNR) ratios for both the green and IR PPG signals. The SNR is defined as SNR(in dB) = 10 log 10 P desired signal P noise , where P desired signal and P noise are the power of the participant's heart rate component and motion artifact components, respectively. In order to calculate an SNR value, the desired and noise signal components should be identified and separated. At this stage, we used the participant's HR ground truth (obtained from an ECG signal, collected simultaneously with the PPG signals) using Equation (1), in order to determine the desired signal component.
The desired signal and noise components were obtained, respectively, from the green and IR PPG signals. First, the green and IR signals were normalized by dividing their AC component by their DC component. The desired signal component (the component that contains heart rate information) of the normalized PPG signal was obtained by applying two bandpass filters centered at the participant's HR frequency (fundamental frequency) and its second harmonic [9]. During this step the participant's HR was obtained from the ECG signal. The noise component was obtained by subtracting the desired signal component from the normalized signal.
We calculated the SNR values of the green and IR PPG signals for all fourteen (14) participants in the following manner. Every 2 s, the preceding 8-second-long PPG segment was used to obtain an SNR value. In total, each participant had between 175 and 177 SNR values for each PPG signal (green and IR signals). The first and last minute of the collected PPG data were omitted since these data segments were noise-free. SNR values for all participants were grouped together and their distribution is presented in Figure 2 as boxplots [22].
In our experiment, the SNR mean value of the IR PPG, µ IR SNR = −8.5 dB (black dot in Figure 2), was less than the SNR mean value of the green PPG signal, µ G SNR = −4.8 dB (green dot in Figure 2). These results are statistically significant for a level of significance α = 0.01. This difference supports the choice of IR PPG as a noise reference signal using experimental data. Figure 2. SNR values of IR and green PPG signals, respectively, calculated from all fourteen (14) participants. The dots represent the mean value of SNR. The red bars represent the median value of SNR. The red '+' signs represent outliers.

DWL Framework
The proposed DWL framework consists of the following stages (Figure 1), A. Preprocessing, B. Motion-artifact Detection, C. Motion-artifact Frequency Components Identification, D. Denoising, and E. Heart Rate Estimation. The inputs to the system are raw green and IR PPG signals measured using the dual-wavelength PPG wrist-unit sensor (described in Section 2.1). The output is an HR estimate,ĤR(l) at time step l (the initial time step is l = 1). We refer to the average of the latest Z estimates of the heart rate asĤR Figure 3 is a block diagram of the DWL method. The system produced a new estimate of HR at every time step (ĤR(l) at time step l). The time between two subsequent windows in our study was 2 s. In addition, the system produces three search ranges. They are; the "narrow search range", ∆ n (l + 1); the "medium search range", ∆ m (l + 1); and the "wide search range", ∆ w (l + 1). Ranges ∆ m (l + 1) and ∆ n (l + 1), which are used in the motionartifact frequency components identification process of Section 2.3.3, are centered atĤR(l). The range ∆ w (l + 1), which is used in the heart rate estimation process of Section 2.3.5, is centered atĤR (6) (l), the average of the 6 previous heart rate estimates. The ranges satisfy ∆ n (l + 1) < ∆ m (l + 1) < ∆ w (l + 1). Moreover, ∆ n (l + 1) = ∆ m (l + 1) 2 (for details on how we calculated ∆ w (l + 1) and ∆ m (l + 1), see Equations (A2) and (A4) in Appendix A, respectively). Lastly, we calculate a short-term 3-point-average heart rate,ĤR (3) (l), that we provide to the users and employ in Section 3.2 for assessing the performance of DWL.   Figure 4 is an illustration of a typical IR PPG spectrum. The magenta dashed line in Figure 4a is the heart rate estimated at time step l,ĤR(l). The black dotted line in Figure 4b is the average of the 6 previous heart rate estimates at time step l,ĤR (6) (l). In this example,ĤR(l) is 1.5 Hz andĤR (6) (l) is 1.45 Hz. Additionally, we present in Figure 4 the "wide search range", ∆ w (l + 1), as a green dashed rectangle, the "medium search range", ∆ m (l + 1), as a red dashed rectangle, and the "narrow search range", ∆ n (l + 1), as a blue dashed rectangle.

Pre-Processing
First, both green and IR PPG signals are normalized (block A of Figure 3). Normalization is done by dividing the signal's AC component by its DC component [23]. The AC component is obtained by passing the raw PPG signal through a Chebyshev Type II bandpass filter of order 5 and bandpass frequency range from 0.5 to 10 Hz. The DC component is obtained by passing the raw signal through a Chebyshev Type II low-pass filter of order 5 and passband frequency of 0.5 Hz.  ∆ w (l + 1) is illustrated as a green dashed rectangle. The wide search range is centered atĤR (6) (l) (black dotted line) and is used to search, at time step l + 1, forĤR(l + 1).

Motion-Artifact Detection
Motion artifact detection is used to determine whether the PPG signals are contaminated by motion noise (if they are not, we can bypass unnecessary noise suppression operations). The PPG signals go through the following three (3) local detectors to determine if appreciable levels of noise motion are present (block B of Figure 3): Local Detector 1 (D 1 )-Number of Peaks: The number of dominant peaks (whose magnitude exceeds 30% of the maximum peak for this example) in the frequency spectrum of the green PPG signal, denoted N p , is calculated. If N p exceeds two (2), D 1 indicates that the signal is contaminated with motion noise. If N p is 1 or 2, then we conclude that no appreciable motion noise is present, since the frequency of the heart rate and sometimes its second harmonic component are typically observed in the spectrum of a clean PPG signal.
Local Detector 2 (D 2 )-Power of Green Signal: The power of the green PPG signal calculated at the beginning of the experiment (when the participant is at rest) is considered the reference power, denoted P re f . At each time step l, the power of the green PPG, P G (l), is calculated and compared to the reference power P re f . If P G (l) is more than (1 + κ)P re f , D 2 indicates that the green PPG signal is contaminated with motion noise. The amplitude of the PPG signal might change over time [24]. Therefore, the reference power P re f is updated whenever no motion is detected in the system for five (5) consecutive time steps (global detector D 0 return '1'). In this case, the updated value of P re f is set to the power of the green PPG signal calculated at the current time step, l. In this study, we used κ = 0.2.
Local Detector 3 (D 3 )-Pearson Correlation between Green and IR PPG Signals: The correlation between the green and IR PPG signals is also used to assess noise contamination in the green signal. If the correlation between the green and IR PPG signals, ρ green, IR , is below a certain threshold (we used 0.8), then D 3 will decide that the green PPG signal is contaminated with motion noise.
Global Detector-Noise Detector: The decisions of the three local detectors are fed into a global detector that will decide whether the signal is noise contaminated. The global detector is shown as: where "∨" represents the OR logic operator.

Motion-Artifact Frequency Components Identification
If motion artifacts are detected in the normalized green PPG signal, we use the normalized IR signal to build the motion noise component set N noise (block C of Figure 3). N noise can be written as N noise = { f n i |1 ≤ i ≤ N n } where f n i is the i th discrete noise frequency component and N n is the number of elements in the set N noise . The set N noise , which contains all the noise frequency components that we aim to remove from the normalized green PPG signal, is obtained using the following five (5) steps in sequence. The first three steps capture noise with relatively high intensity, usually harmonically related frequency pairs that contaminate the PPG signals. The last two steps compare the IR and green signal spectra to discover additional noise components of reduced-intensity presence in the IR spectrum.
Step 1-Identification of dominant frequency components. First, we capture the dominant frequency components in the spectrum of the normalized IR PPG signal. Those are the frequencies (between 0.5 and 4 Hz) whose magnitude exceeds 50% of the highest peak in the IR PPG spectrum. Figure 5, which is an image that was created for illustration purposes, depicts how we capture dominant peaks from a typical IR signal. In this scenario, the highest peak (which actually corresponds to the participant's HR) is F 1 . Two other dominant peaks are shown as red circles (F 2 and F 3 ). Typically, the peaks captured in step 1 include the frequency of the participant's HR, as well as the frequencies of dominant noise components. We add all of them (F 1 , F 2 , and F 3 in our example) to N noise with the understanding that one of them may correspond to the participant's HR and may therefore need to be removed from N noise later. Step 2-Identification of harmonic frequency components. Noise components created by repetitive motion (e.g., when the participant is walking or running) typically occur in harmonically related pairs [25]. It is possible, however, that the PPG signal contains pairs of harmonically related noise components whose magnitude is smaller than the 50% threshold used in step 1 to identify dominant frequencies.
Step 2 is used to capture pairs of fundamental frequencies and their second harmonics present in the spectrum of the normalized IR PPG signal. Here, we look at all peaks whose magnitudes are above 30% of the highest peak in the IR PPG spectrum. For each such peak, we search for a harmonic at double its frequency. If a pair of harmonically related frequencies is thus discovered, its component(s) that were not flagged in step 1 are added to the noise frequency set N noise . Again, N noise may still contain at this stage a component that corresponds to the participant's true HR. Figure 6 uses the same spectrum shown in Figure 5 to illustrate how a pair of harmonically related components (F A , F B = F 3 ) was discovered. Of this pair, F B was known to us already from step 1 (it is the same as F 3 in Figure 5), and F A , discovered by step 2, is added to N noise . So now, N noise = {F 1 , F 2 , F 3 , F A }. Step 3-Removal of the heart rate from noise set. As mentioned in our setting in Section 2.3, our system creates a new estimate of the heart rate,ĤR(l) at every time step l. A new time step starts every 2 s when l is incremented by 1. Moreover, in step l + 1 we calculate ∆ w (l + 1) (the "wide search range") which is where we search forĤR(l + 1).
Next, frequency components in N noise which we captured during steps 1 and 2, and are close to the heart rate estimated at time step l (ĤR(l)) are removed from N noise , as we suspect they do not represent noise but rather represent the participant's HR. To be precise, at time step l + 1, we remove from N noise all the noise components in the "medium search range" ∆ m (l + 1). Figure 7 continues the examples of Figures 5 and 6 to illustrate step 3. In Figure 7a,b, we show the estimate of the participant's HR at time step l, denotedĤR(l). We also show ∆ m (l + 1), the "medium search range", [ĤR(l) − ∆ m (l + 1)/2,ĤR(l) + ∆ m (l + 1)/2], from which we remove dominant frequencies deposited earlier into N noise . The red squares in Figure 7a represent the frequency components that we obtained from steps 1 and 2 all of which are currently in N noise = {F 1 , F 2 , F 3 , F A }. We now discard the frequency around 1.2 Hz (labeled F 1 ) since it falls in ∆ m (l + 1), the "medium search range" (region represented by a red dashed rectangle in 7). Figure 7b shows (in red squares) the noise frequency components that are left in the noise set N noise = {F 2 , F 3 , F A }. N noise no longer contains the participant's HR.
The next two steps seek additional noise components, often attributed to repetitive movements by the participant, through comparison of the IR and green spectra.
Step 4: Step 4 focuses on instances where the noise set N noise , after step 3, has only one noise component, f n 1 . In this case, we look at the green spectrum. If we find a component at half f n 1 ( f n 1 /2) or twice f n 1 (2 × f n 1 ) in the green spectrum, we add this component to N noise . The only exception is if the component we seek to add falls into the narrow search range, ∆ n (l + 1), aroundĤR(l), [ĤR(l) − ∆ n (l + 1)/2,ĤR(l) + ∆ n (l + 1)/2]; in this case, we refrain from adding it to set N noise .
Step 5: This step addresses spectra that are dominated by vigorous limb swinging by the participant, which may cause displacement of the sensor. In this scenario, the green PPG signal is typically dominated by two high intensity harmonically related noise frequencies which may dwarf the component at the heart rate frequency. If these frequency components are not already placed in N noise after steps 1-3, they are added to N noise at this step. This step is automatically triggered when all the following conditions are met, namely; (a) the IR spectrum contains only one significant frequency component that dominates the spectrum; (b) the green spectrum contains only one pair of significant harmonically related frequencies; and (c) the dominant frequency component present in the IR spectrum matches one of the harmonically related frequencies discovered in the green spectrum.  Figure 7. Frequency spectrum of a typical IR PPG signal.ĤR(l) is the heart-rate estimate at time step l. ∆ m (l + 1) is the "medium search range" represented by a red dashed rectangle. The frequency components we obtained from step 1 and 2, namely, F 1 , F A , F 2 , and F 3 = F B , are represented by red squares. In (a) frequency F 1 falls within ∆ m (l + 1). In (b) we discard the frequency F 1 since it falls within ∆ m (l + 1) and leave the rest in N noise (F A , F 2 , and F 3 = F B ). Figure 8 is a real-life example that illustrates this scenario (signals were collected from participant 10 in our experiment, around time 136 s). We show the spectrum of participant 10's IR signal in Figure 8a and green signal in Figure 8b. We show in magenta the heart rate estimate at time step l,ĤR(l). The green signal captures the high-intensity harmonically related frequency pairs F 1 and F 2 of Figure 8b. The IR spectrum (Figure 8a) is dominated by the frequency F A that is equal to frequency F 2 from the green spectrum, but does not capture a noise component at F 1 . Here, frequencies F 1 and F A = F 2 are put into N noise .
At the end of this stage, the set N noise will contain N n elements that correspond to the noise frequencies we wish to remove from the normalized green PPG signal.

Denoising
Adaptive Noise Cancellation (ANC) filters are often employed to eliminate in-band motion artifacts [26,27]. In-band noise in our case occurs when the spectra of motion artifacts overlap significantly with that of the PPG signal [28]. An ANC filter for our environment would use as inputs (1) a noise contaminated signal, and (2) a noise reference signal. The ANC filter seeks to eliminate the noise components (measured by the reference signal) from the input noise contaminated signal and provide a noise-free version of the input signal.
Motivated by the architecture in [29], we employ a Cascading Adaptive Noise Cancellation (C-ANC) architecture to remove all the elements of the set N noise = { f n i |1 ≤ i ≤ N n } (developed in Section 2.3.3) from the green PPG signal, one element at the time. The block diagram of the proposed C-ANC is shown in Figure 9. We show the frequency spectrum of the input signal in Figure 9 (spectrum A). This is the green signal collected from participant 3 around time 66 s. The spectrum contains three noise frequency components that we wish to eliminate from the signal. The signal collected at the output of the C-ANC (spectrum D in Figure 9) does not contain any of the noise components; only the HR frequency component remained in the spectrum. A total of N n C-ANC were used to remove the noise components of N noise from the green PPG signal. At the i th stage (1 ≤ i ≤ N n ), the noise reference signal is a pure sinusoid of frequency f n i . For instance, the first ANC filter block shown in Figure 9 removes the first noise frequency component f n 1 from the normalized green PPG signal (see spectrum B of Figure 9). The output of the first block is denoted G PPG, 1 . G PPG, 1 is fed to the next block where the second noise frequency component f n 2 is removed (see spectrum C of Figure 9). The process is repeated until all noise components are removed from the normalized green PPG signal. The final output, G PPG, N n , is a noise-free version of the green PPG signal. In the proposed method, the QR-decomposition-based least-squares lattice (QRD-LSL) adaptive filter algorithm was used to remove noise components from the green PPG signal [30]. The method incorporates the desirable features of recursive least-square estimation (fast convergence rate), QR-decomposition (numerical stability), and lattice structure (computational efficiency) [12]. The implementation of the QRD-LSL filter in our study used the built-in MATLAB function "AdaptiveLatticeFilter" [31] with 10 filter taps and forgetting factor of 0.99. Figure 9. Cascading Adaptive Noise Canceler (C-ANC) block diagram. Spectrum A is of the noisecontaminated green PPG signal which is fed to the C-ANC. Spectrum B represents the green PPG signal's frequency spectrum after removal of the first noise frequency component, f n 1 . Spectrum C is obtained after removing a second noise frequency component, f n 2 . At this stage, f n 1 and f n 2 are removed from the input signal. Spectrum D is of the clean green PPG signal. It is obtained at the output of the C-ANC after all noise frequency components were eliminated.

Heart Rate Estimation
In this stage (see block E of Figure 3), the green PPG signal is used to compute an HR value. If no noise was detected in the green PPG (D 0 = 0), then the normalized green PPG is used for heart rate calculation. When noise was detected in the green PPG signal, a HR value is obtained from the denoised green signal (obtained at the output of block D in Figure 3, also shown in Figure 9). The "Heart Rate Estimation" stage comprises two steps, namely, "Initialization" and "Heart Rate Calculation".
Initialization (block E1 of Figure 3). This is a process of capturing a baseline HR at rest. In our experiment, it was a one-minute phase during which participants were asked to remain steady in order to capture noise-free green and IR PPG signals. To calculate the initial HR estimate,ĤR(1) at time step l = 1, we used the frequency spectrum of the normalized green PPG signal.ĤR(1) corresponds to the highest peak within the initial search range 0.5 to 3 Hz (which corresponds to 30 to 180 BPM).
Heart Rate Calculation (block E2 of Figure 3). At time step l + 1, the heart rate calculation method we propose employs the following variables in order to generate an HR estimate,ĤR(l + 1): 1.
The heart rate estimated from the previous time step l,ĤR(l).

2.
A heart rate candidate HR cand (l + 1) which is obtained from the spectrum of the green PPG signal.

3.
A heart rate prediction, HR pred (l + 1) which is obtained from the long-term (LT) trend of the past six (6) HR estimates. The LT trend is obtained using STL, the Seasonal-Trend decomposition using LOESS (locally estimated scatterplot smoothing) [32].
In this study, we used the MATLAB implementation, trenddecomp.
In this case, we follow the procedure recommended in [7] to consider at most three dominant peaks in the green spectrum, whose magnitude exceed 50% of the maximum peak. Here, HR cand (l + 1) is obtained by averaging all the peaks that we considered. The estimated heart rate,ĤR(l + 1) is calculated asĤ where β is a constant we set to 0.9.
The heart rate calculation process we used requires the availability of the previous six HR estimates in order to generate an HR prediction, HR pred (l + 1) at time step l + 1. Therefore, from time steps l = 2 to l = 6, the HR estimatesĤR(2) throughĤR (6) corresponds to the highest peak in the green spectrum, within the wide search range ∆ w (l + 1) (ĤR(l + 1) ∈ [ĤR (6) (l) ± ∆ w (l + 1)/2]). If no such peak is detected, we increment ∆ w (l + 1) by 0.02 Hz (or 1.2 BPM) and we search again for a peak. This process repeats until a peak is found.ĤR (6) (l) is the average of all the previously calculated HR estimates (see Equation (3)).

Alternative HR Calculation Methods
In most studies involving PPG signals collected from humans in motion, suitable reference signals, representing motion artifacts, were obtained through additional hardware [28]. For example, when the PPG sensor is mounted on the wrist of a running participant, accelerometer sensors mounted on the participant's wrist are often used as noise reference signals [33][34][35].
TROIKA is an HR calculation framework proposed by Zhang et al. [7]. TROIKA is based on Singular Spectrum Analysis (SSA) [36] followed by Sparse Signal Reconstruction (SSR) [37] to eliminate the noise dominant components present in PPG signals. The inputs to TROIKA are a green PPG signal and X, Y, and Z accelerometer data. The output is an HR estimate. In our implementation of TROIKA, the noise components were obtained from a tri-axial accelerometer. In [7], TROIKA was tested on data collected from a wrist-worn sensor (that encloses a green PPG channel and X, Y, and Z accelerometer data) from twelve (12) participants, during fast running at peak speed of 15 km/h. The heart rate average absolute error of TROIKA in this test was 2.34 beat per minutes (BPM).
A related method is based on Zhang's Joint Sparse Spectrum Reconstruction (JOSS). It was shown in [8] to exhibit a heart rate average absolute error as small as 1.28 BPM when tested on the same twelve (12) participants used in Zhang's TROIKA study [7]. In JOSS, the input signals are a green PPG signal and X, Y, and Z accelerometer data. The accelerometer data are considered the noise signals. The output is an HR estimate. Compared to TROIKA where PPG and accelerometer signals were sampled at 125 Hz, JOSS's low-sampling rate, namely 25 Hz, is an attractive feature that gives JOSS the potential to be implemented in Very Large-Scale Integration (VLSI) or Field Programmable Gate Array (FPGA) in wearable devices [8].
The HR calculation mechanism of the DWL method was inspired by that of TROIKA and JOSS. We compare the quality of HR calculated by the DWL method which does not require accelerometers, to our implementation of the accelerometer-dependent TROIKA and JOSS. The TROIKA and JOSS experimental results were obtained from the same participants that we employed in the analysis of the DWL method.

Results
In this section, the results of the HR values calculated using the DWL framework of Section 2.3 are computed and analyzed. First, we define (in Section 3.1) the performance metrics used to compare the performance of the DWL method to that of our implementation of TROIKA and JOSS. In Section 3.2, we assess the performance of the DWL method (using the performance metrics of Section 3.1) on data collected from the participants' wrists. This comparison is made with respect to (1) the HR ground truth computed from an ECG signal, and (2) the HR levels obtained using TROIKA and JOSS. Lastly, in Section 3.3, we validate the DWL framework by comparing its performance on experimental data collected from the palms (instead of the wrists) of the same participants during a second run (validation run).

Performance Metrics
To assess, evaluate, and compare the HR estimation performance of DWL method to TROIKA and JOSS, we used four metrics, namely; Mean Absolute Error (MAE) (Equation (9)); Mean Absolute Error Percentage (MAEP) (Equation (10)); a specific performance index (PI) [38] (Equation (11)) which is the frequency, in percent, of obtaining an HR estimate that is within ±5 BPM of the HR ground truth; and computation time (CT). We defined CT to be the total time duration (in seconds) that an algorithm takes to generate heart rate levels from the entire 360-second-long off-line data that has already been collected during the experimental run. We compare the HR values calculated by the three tested methods to ground truth values obtained from an ECG signal that is simultaneously recorded, hence synced, with the green and IR PPG waveforms and the X, Y, and Z accelerometer data. All R peaks in the ECG signal were manually labeled. The ground truth HR was obtained using Equation (1). The relevant definitions are: In Equations (9)-(11), ∆(l) is defined as where . is the absolute value. BPM HR method (l) is the HR in beat per minutes (BPM) calculated using each one of the tested methods (DWL, TROIKA, and JOSS) at time step l. BPM GT (l) is the HR ground truth value in BPM obtained as BPM GT (l) = HR GT (l) × 60, where HR GT (l) is calculated using Equation (1). In Equation (11) (3)).

DWL Performance on Wrist Data
Data were collected from fourteen (14) participants while standing, walking, and running on the treadmill, following the experimental protocol described in Section 2.1. In this section, we analyze data collected from participants 1 to 11. Data from participants 12, 13, and 14 are not included in our analysis since for these participants, the system suffered from physical malfunction (intermittent readings due to loss of sensor contact). However, we still provide the data for these participants in the repository in [16]. Every 2 s, the preceding 8-second-long green and IR PPG data were used to generate a short-term

3-point-average HR estimate,ĤR
(3) (l), using the DWL method. HR levels obtained using DWL are compared to those of TROIKA and JOSS.
For the TROIKA implementation, we used a sampling rate of 100 Hz. We recreated the TROIKA code using MATLAB. Our code was tested on the same dataset of the TROIKA paper and compared to the results presented in [7]. The results using our code are very close to the results presented in the TROIKA paper.
For the JOSS implementation, we used a sampling rate of 25 Hz, as suggested in the JOSS paper [8]. We recreated the JOSS code using MATLAB. Our code was tested on the dataset used in JOSS paper and compared to the results presented in [8]. The results using our code are very close to the results presented in the JOSS paper.
As examples, we show in Figure 10 the HR calculated for the whole experimental run for two participants, participant 3 ( Figure 10a) and participant 10 ( Figure 10b). We use red circles, green squares, and blue triangles to represent the HR values calculated using DWL, TROIKA, and JOSS, respectively. The ground truth HR is the solid black line. In Figure 10a all three methods generate accurate HR estimates (the magnitude of the noise level present in the signals of participant 3 was small). For participant 10 (see Figure 10b), however, TROIKA lost track of the correct heart rate from 120 to 175 s and from 250 to 325 s. This phenomenon (losing track of the correct HR) is referred to as Lock Loss. Similary, JOSS suffered from a Lock Loss from 225 s until the end of the experimental run. During these intervals, the DWL method was still able to estimate the participant's HRs accurately (see red circles of Figure 10b).
We calculate MAE, MAEP, PI, and CT for all eleven (11) experimental participants and present them in Tables 2-5, respectively. In Table 2, we show the MAE for DWL, TROIKA, and JOSS. We calculate and report the MAE mean and standard deviation for each method in the second to last row of  Table 3 summarizes the MAEP for DWL, TROIKA, and JOSS. We calculate and report the MAEP mean and standard deviation for each method in the last row of Table 3. Moreover, we calculate PI for DWL, TROIKA, and JOSS, and report it in Table 4. In the last row of Table 4, we calculate the PI mean and standard deviation of all participants.   Table 2 the average MAE for all eleven participants using the DWL method is MAE of 1.22|0.57 BPM ("mean|standard deviation") (see Table 2), which is smaller than average MAE of TROIKA (3.24|2.82 BPM) and JOSS (11.98|25.79 BPM), respectively. When we exclude participants who suffer from Lock Loss (shown in the last row of Table 2), DWL (with the same average MAE = 1.22|0.57 BPM) still yields a smaller average MAE than that of TROIKA (with average MAE = 2.05|1.03 BPM) and JOSS (with average MAE = 2.11|1.24 BPM). Note that the MAE calculated using DWL method did not exceed 5 BPM for any of the participants. However, this was not the case for TROIKA and JOSS method. Participant 10 presents an example where the MAE of TROIKA (9.34 BPM) and JOSS (21.8 BPM) exceeds 5 BPM, whereas the MAE of the DWL method is 0.85 BPM. Table 3. MAEP in % for all eleven (11) experimental participants, using DWL, TROIKA, and JOSS (ideal MAEP is 0%). The last row shows the MAEP average of all eleven (11) participants shown as "mean|standard deviation".  In addition to MAE, we calculate average MAEP of all three methods for all eleven participants. Average MAEP of DWL method of 0.95|0.38% is smaller than average MAEP of TROIKA (2.58|2.19%) and JOSS (8.68|17.6%) (see Table 3). Table 4 summarizes the PI values for all eleven (11) participants. The PI of DWL method is larger than the PI of TROIKA and JOSS. For instance, on average, the PI of DWL method is 95.88|4.9% that is greater than that of TROIKA with 83.87|12.75% and JOSS with 78.62|26.16%.

Participant
The CT is an indication of the algorithm's computational complexity. In order to be implement in wearable devices, the algorithm should be able to run in real-time and be energy efficient. A desirable algorithm should have a small CT. Table 5 shows the CT of DWL, TROIKA, and JOSS for participants 1 to 11. The average CT of DWL is smaller than that of TROIKA and JOSS. For instance, the average CT of DWL is 3.0|0.3 s is smaller than the average CT of TROIKA with 247.7|43.8 s and JOSS with 8.5|0.24 s. Table 5. CT in seconds for all eleven (11) experimental participants, using DWL, TROIKA, and JOSS. The last row shows the CT average of all eleven (11) participants shown as "mean|standard deviation". Additionally, we show the Bland-Altman plot (Figure 11a) of the HR values computed using DWL method for participants one (1) through eleven (11). The Bland-Altman plot describes the agreement between two quantitative measurements (A and B) by constructing the Limits of Agreements (LOA). These statistical limits are calculated by using the mean and the standard deviation of the differences between the two measurements. The resulting graph is a scatter plot, in which the y-axis shows the difference between the two paired measurements (A − B) and the x-axis represents the average of these measures ((A + B)/2) [39]. The LOA we use is [µ − 1.96 × σ , µ + 1.96 × σ] (1.96 × σ corresponds to 95% confidence level) where µ is the average difference between each HR estimate and the associated ground-truth HR against their average, and σ is the standard deviation [7]. The LOA in Figure 11a is [−4.9, 4.8] BPM. Moreover, we construct the scatter plot of the HR estimated using DWL method versus the associated ground truth HR for participants one (1) through eleven (11). The scatter plot is shown in Figure 11b. We construct a linear regression for the data points of Figure 11b. The fitted line is y = x − 0.2 (R 2 = 0.99), where x is the ground truth HR and y is the HR estimated using DWL method. The Pearson correlation between the HR estimated using DWL method and ground truth HR is also calculated and found to be 0.99. The high R 2 value and Pearson correlation indicate that DWL method is able to compute accurate HR levels.  Figure 11. (a) Bland-Altman plot of HR estimated using DWL method and the ground truth HR for participants one (1) to eleven (11). The LOA = [−4.9, 4.8] BPM. (b) Scatter plot of HR estimated using DWL method (on the y-axis) vs. the ground truth HR (x-axis) for participants one (1) to eleven (11). The linear regression line that fits the data is shown in black. The line is y = x − 0.2 (R 2 = 0.99). The Pearson correlation is found to be 0.99.

Validation of the DWL Method on Palm Data
In order to validate the performance of the DWL framework, we ran a second experiment (validation run). During the second experiment, we asked the same volunteers who participated in our previous experiment to run on the treadmill again, following the experimental protocol described in Section 2.1. We reused the same ECG, accelerometer, and PPG sensors. The only difference was that we mount the dual wavelength sensor onto the participant's palm (instead of wrist). Data for all participants are provided in the repository in [16].
Both wrist and palm experiments took place on the same day. There was a break of approximately 15 min between the first and the second run during which the dual-wavelength PPG sensor was relocated from the wrist to the palm of the participant. Participants 5 and 13 deviated from the data collection protocol by interfering with the sensor during collection. Their measurements were excluded from the analysis we provide (but are available in the repository in [16]).
MAE, MAEP, and PI were calculated from the twelve (12) participants of the "palm run" for DWL, TROIKA, and JOSS. We show in Table 6, the summary of the performance metrics (MAE, MAEP, and PI) obtained for the first run (the "wrist run"), and the second run (the "palm run"). The results are presented as "mean|standard deviation". Table 6 shows that the DWL method performs as well when the measurements were taken from the wrist as when they were taken from the palm. Table 6. Summary of performance metrics for run 1 (wrist run) and run 2 (validation palm run). For run 1, we showed the average performance of eleven (11) participants. For run 2, we showed the average performance of twelve (12) participants. Results are represented as "mean|standard deviation".

Discussion
We presented a framework for heart rate (HR) calculation under motion using a dualwavelength (green and IR) PPG sensor. We used PPG data collected from 14 individuals engaged in high-intensity full-body exercise. Analysis of green and IR PPG signals indicates that the IR PPG signal is a good noise reference signal. We employed this observation to develop a motion-resistant HR calculation method derived from [9] that measures noise components from the IR PPG signal. Afterwards, a green PPG signal is denoised and used for HR calculation. The proposed method, Dual Wavelength (DWL), was tested on experimental data collected from participants' wrists while the participants were standing, walking, and running on a treadmill. The performance of the method, using several measures of accuracy and computational effort, was then compared to popular methods in the literature that use data from a tri-axial accelerometer for denoising, namely TROIKA and JOSS. Using the experimental wrist-data we collected, we showed that the DWL method exhibits good performance in the face of motion artifacts. For instance, DWL yielded a Mean Absolute Error (MAE) of 1.22|0.57 BPM, Mean Absolute Error Percentage (MAEP) of 0.95|0.38%, and performance index (PI) (which is the frequency in percent of the event that we obtain an HR estimate that is within ±5 BPM of the HR ground truth) of 95.88|4.9%. Moreover, DWL yielded a short computation period of 3.0|0.3 s to process a 360-second-long run. We validated the performance of the DWL method by testing it on data collected from the participants' palms, obtaining similar behavior. The DWL method is desirable since (1) it performed well under high-intensity full-body repetitive "macromotion", exhibiting high accuracy in the presence of motion artifacts (as compared to the leading accelerometer-dependent HR calculation techniques TROIKA and JOSS); (2) it used only PPG signals; auxiliary signals such as accelerometer signals were not needed; and (3) it was computationally efficient, hence implementable in wearable devices.