A Robust Dynamic Heart-Rate Detection Algorithm Framework During Intense Physical Activities Using Photoplethysmographic Signals

Dynamic accurate heart-rate (HR) estimation using a photoplethysmogram (PPG) during intense physical activities is always challenging due to corruption by motion artifacts (MAs). It is difficult to reconstruct a clean signal and extract HR from contaminated PPG. This paper proposes a robust HR-estimation algorithm framework that uses one-channel PPG and tri-axis acceleration data to reconstruct the PPG and calculate the HR based on features of the PPG and spectral analysis. Firstly, the signal is judged by the presence of MAs. Then, the spectral peaks corresponding to acceleration data are filtered from the periodogram of the PPG when MAs exist. Different signal-processing methods are applied based on the amount of remaining PPG spectral peaks. The main MA-removal algorithm (NFEEMD) includes the repeated single-notch filter and ensemble empirical mode decomposition. Finally, HR calibration is designed to ensure the accuracy of HR tracking. The NFEEMD algorithm was performed on the 23 datasets from the 2015 IEEE Signal Processing Cup Database. The average estimation errors were 1.12 BPM (12 training datasets), 2.63 BPM (10 testing datasets) and 1.87 BPM (all 23 datasets), respectively. The Pearson correlation was 0.992. The experiment results illustrate that the proposed algorithm is not only suitable for HR estimation during continuous activities, like slow running (13 training datasets), but also for intense physical activities with acceleration, like arm exercise (10 testing datasets).


Introduction
Photoplethysmography (PPG) is a kind of popular optical measurement technique that can be used to detect blood volume changes in the micro-vascular bed of tissue [1]. Because of its simplicity and low-cost advantages, PPG has become the new technique for wearable measurement instead of the conventional electrocardiography (ECG) technique. The pulsatile "AC" physiological waveform can be obtained due to cardiac synchronous changes in blood volume with the heartbeat. Due to this property, PPG can be a source of real-time heart rate (HR) information calculation [2]. Ear, finger and wrist are all optional PPG measurement sites. However, in measurement sites, noise interference produced by motion artifacts (MAs) and cardiac arrhythmia is inevitable. Due to human movement, relative motion may occur between the sensor and skin so that the principle component of true HR information is weakened. The quality of the PPG sensor signal is especially susceptible to motion artifacts. In other words, the accuracy of heart rate estimation depends on the quality of the PPG. precise binary-decision algorithm. When there are no significant MAs, HR can be obtained from the pre-processed signal spectrum peaks. When there are significant MAs, then stage 2 appears to work. Stage 2 removes the corresponding MA spectrum peaks from pre-processed signal spectrum peaks and then goes through the next step combined with the single notch filter and the improved EEMD by a ternary-decision algorithm. Every step has one kind of HR tracking method. With the purpose of avoiding spectrum peak loss to keep tracking valid, stage 3 will calibrate the value of HR.
Sensors 2017, 17,2450 3 of 17 estimate the HR by a precise binary-decision algorithm. When there are no significant MAs, HR can be obtained from the pre-processed signal spectrum peaks. When there are significant MAs, then stage 2 appears to work. Stage 2 removes the corresponding MA spectrum peaks from pre-processed signal spectrum peaks and then goes through the next step combined with the single notch filter and the improved EEMD by a ternary-decision algorithm. Every step has one kind of HR tracking method. With the purpose of avoiding spectrum peak loss to keep tracking valid, stage 3 will calibrate the value of HR.   The main contribution of this algorithm has two aspects: (1) The different signal processing methods with high estimation accuracy are proposed according to different PPG features instead of using the sequential execution mode for every sliding window. (2) The running time in real-time HR estimation is greatly reduced, due to reduced computational complexity. This paper is organized as follows: Section 2 discusses various conditions where there is very strong MAs so that HR estimation is difficult. Section 3 introduces the proposed algorithm framework to deal with the problem mentioned above. Section 4 illustrates the experimental results using 23 datasets and compares these with other framework algorithms in this field, with corresponding discussions. Moreover, all datasets used in this paper will be available at Zhilin Zhang's homepage: https://sites.google.com/site/researchbyzhang/ and the related paper is [11].

Materials and Methods
In this section, the complete NFEEMD algorithm framework will be mentioned and the corresponding details of the proposed algorithm will be given. The overall flowchart of NFEEMD could be shown in Figure 1.
The framework of HR estimation during intense physical exercise is presented in Figure 1, then the signal-processing method in every case will be illustrated. For this algorithm, one-channel raw PPG and tri-axis acceleration data are needed. This framework includes two binary decisions and two ternary decisions.
Initialization: In the initialization stage, the system can enter into a stable heart-rate tracking state if HR can be estimated by choosing only one spectrum peak from the raw PPG periodogram. Before the stable heart-rate tracking state, the estimated HR values can be discarded, since the contaminated PPG with many spectral peaks may provide wrong prior HR information initially. It is known that the HR value cannot change rapidly in a short time [16], therefore correct prior HR information is essential to maintain continuous HR tracking. Once stable tracking starts, the first HR value is obtained. Then the main algorithm uses the previous HR estimation bpm_prev as its base. It should be noted that we identify the spectral peaks if they are larger than 30% of the maximum amplitude in the periodogram.
Pre-processing: It is necessary for raw PPG to be pre-processed in order to highlight the HR information. The repeated moving average filter [3] is applied to obtain the signal component of 0.5-3 Hz. The pre-processing signal is used in two cases: (1) in the HR tracking when there are no MAs; (2) in the first ternary decision for finding the number of PPG spectral peaks when there are MAs.
Binary Decision 1: This step aims to determine if there are MAs through the amplitude of the acceleration periodogram. MAs exist when the spectral amplitude of every axis acceleration data is greater than 0.1 (power spectral density). Of course, there are other algorithms for determining the existence of MAs, such as the correlation coefficient (CC) [17]. Through the comparison of three methods, the resulting judgement can be described in Figure 2. The main contribution of this algorithm has two aspects: (1) The different signal processing methods with high estimation accuracy are proposed according to different PPG features instead of using the sequential execution mode for every sliding window. (2) The running time in real-time HR estimation is greatly reduced, due to reduced computational complexity. This paper is organized as follows: Section 2 discusses various conditions where there is very strong MAs so that HR estimation is difficult. Section 3 introduces the proposed algorithm framework to deal with the problem mentioned above. Section 4 illustrates the experimental results using 23 datasets and compares these with other framework algorithms in this field, with corresponding discussions. Moreover, all datasets used in this paper will be available at Zhilin Zhang's homepage: https://sites.google.com/site/researchbyzhang/ and the related paper is [11].

Materials and Methods
In this section, the complete NFEEMD algorithm framework will be mentioned and the corresponding details of the proposed algorithm will be given. The overall flowchart of NFEEMD could be shown in Figure 1.
The framework of HR estimation during intense physical exercise is presented in Figure 1, then the signal-processing method in every case will be illustrated. For this algorithm, one-channel raw PPG and tri-axis acceleration data are needed. This framework includes two binary decisions and two ternary decisions.
Initialization: In the initialization stage, the system can enter into a stable heart-rate tracking state if HR can be estimated by choosing only one spectrum peak from the raw PPG periodogram. Before the stable heart-rate tracking state, the estimated HR values can be discarded, since the contaminated PPG with many spectral peaks may provide wrong prior HR information initially. It is known that the HR value cannot change rapidly in a short time [16], therefore correct prior HR information is essential to maintain continuous HR tracking. Once stable tracking starts, the first HR value is obtained. Then the main algorithm uses the previous HR estimation bpm_prev as its base. It should be noted that we identify the spectral peaks if they are larger than 30% of the maximum amplitude in the periodogram.
Pre-processing: It is necessary for raw PPG to be pre-processed in order to highlight the HR information. The repeated moving average filter [3] is applied to obtain the signal component of 0.5-3 Hz. The pre-processing signal is used in two cases: (1) in the HR tracking when there are no MAs; (2) in the first ternary decision for finding the number of PPG spectral peaks when there are MAs.
Binary Decision 1: This step aims to determine if there are MAs through the amplitude of the acceleration periodogram. MAs exist when the spectral amplitude of every axis acceleration data is greater than 0.1 (power spectral density). Of course, there are other algorithms for determining the existence of MAs, such as the correlation coefficient (CC) [17]. Through the comparison of three methods, the resulting judgement can be described in Figure 2.  It is known that there must be motion artifacts in the running state. Moreover, PPG and MAs may overlap when MAs are very strong. From Figure 2, it is obvious that our method is more suitable for the experiment than the CC algorithm.
Case 1: When MAs do not exist, an HR-estimation process is proposed. bpm_track (the final estimated HR value) depends on the highest spectral peak of pre-processed PPG. If |bpm_track − bpm_prev| > th_pass (a constant threshold), bpm_track is equal to bpm_prev.
HR Calibration 1: According to the calculation rules above, an HR-calibration mechanism is designed in order to prevent the correct spectral peak being lost due to over-reliance on the past heart rate values. If two consecutive heart-rate values are dependent on the past, the HR Calibration 1 algorithm will start when MAs do not exist. In this condition, bpm_track will be recalculated through the original PPG signal. Meanwhile, if |bpm_track − bpm_prev| ≤ th_pass, the value of bpm_track will be replaced by the bpm_prev's.
Binary Decision 2: The purpose of this step is to determine whether the accelerometer signal is reliable and whether the frequency component associated with motion artifacts can be accurately provided during strenuous exercise. It is known that the spectrum of tri-axis acceleration data is too messy to accurately extract the MA component by exploring the acceleration data. If the number of spectral peaks in any axis accelerometer data exceeds th_acc_peaks (a constant), then we consider the acceleration data of this window to be unreliable.
Case 2: At this time, bpm_track will be replaced by past HR value.
After Binary Decision 2, if the acceleration data is reliable, then we will remove the spectral peaks (absolute value less than or equal to 8) associated with the tri-axis acceleration from the spectral peaks of pre-processed PPG. The number of remaining spectral peak is then counted. Ternary Decision 1 divides the situation into three categories: the remaining zero (the peak corresponding to HR cannot be detected), the remaining one, leaving more than one. For the second case, a Ternary Decision 2 was designed by counting the peak number with a difference of th_pass BPM compared to bpm_prev. This ternary decision also divides the situation into three categories: the remaining zero, the remaining one, leaving more than one. As shown in Figure 1, the same situation undergoes the same treatment.
Case 3: (the remaining zero): This is the main part of the algorithm used to remove motion artifacts. The repeated single notch filter and improved EEMD algorithm are adopted herein.
(1) Single notch filter A second-order IIR filter is used to implement the single notch filter here, whose amplitude response is zero at a certain frequency. It is usually used to eliminate a particular frequency component, such as the removal of 50 Hz power-line interference. In this paper, the single notch filter is applied to the motion artifacts' removal from the PPG signal spectrum peaks based on the tri-axis acceleration signal spectrum peaks. The system function of the single notch filter is defined in Equation (1) as follows: where w 0 = 2π f 0 / f s is the notch digital frequency (rad); f 0 is notch frequency (Hz); f s is sampling frequency; and r is a constant (r = 0.96). The single notch filter can be applied when the MAs of the original signal exist after the binary decision. According to the tri-axis acceleration data, several frequencies corresponding to MAs are identified through the periodogram. Then, the related frequencies in the original PPG signal are removed, and the reconstructed, clean signal is obtained. Figure 3 shows a flowchart depicting the removal of MAs to reconstruct the PPG signal for accurate HR evaluation. It should be noted that |bpm_track_NF − bpm_prev| > th_NF_1 (a constant threshold) and |bpm_track_EEMD − bpm_prev| > th_EEMD_1 after the Ternary Decision 1, |bpm_track_NF − bpm_prev| > th_NF_2 (a constant threshold) and |bpm_track_EEMD − bpm_prev| > th_EEMD_2 after the Ternary Decision 2. th_NF and th_EEMD are the empirical values obtained from the first 12 experiments. Original PPG

Repeated Notch Filter
Reconstructed Signal: It should be noted that the usage number of the repeated single notch filter is different in a different signal window. On the one hand, this number depends on all spectral peaks detected in the tri-axis acceleration signal spectrum. If there is a repeated value, it is recorded as one. On the other hand, if the number does not exceed three, then this number is the times the single notch filter is applied, and the frequencies of the notch are the detected peaks. If the number exceeds three, then a spectral peak value is found from the spectrum of each axis acceleration data; that is, the number of times the single notch filter is used is up to three. The purpose of limiting the number of application times is to retain the component corresponding to heart rate of the original signal as much as possible. Figure 4 shows an example of case 3.
From Figure 4, it is known that the final heart rate estimation after the NFEEMD algorithm is 164.8 BPM. The true HR value by ECG is 163.2 BPM. The absolute error is 1.6 BPM. In fact, we can find the only spectral peak, 172.1 BPM, in the periodogram of the original PPG signal. However, this value is quite different from the true HR. It is the effect of strong motion artifacts that results in a large error in HR estimation. It is worth noting that MAs are superimposed on the PPG signal closely in this example, so that the true HR information is covered by MAs. Although MAs conceal the true HR information, the tri-axis motion acceleration data accurately reflects the frequency components of motion artifacts. After removing the MAs by the single notch filter, the estimated HR value calculated from the final reconstructed signal is 164.8 BPM, which is very close to the true heart rate value. It is shown that using the periodogram directly may lead to a large error and lost HR tracking. This example also demonstrates that the repeated use of the single notch filter can accurately remove the MA components without affecting the nearby real HR components, and the HR-value resolution is high. It should be noted that the usage number of the repeated single notch filter is different in a different signal window. On the one hand, this number depends on all spectral peaks detected in the tri-axis acceleration signal spectrum. If there is a repeated value, it is recorded as one. On the other hand, if the number does not exceed three, then this number is the times the single notch filter is applied, and the frequencies of the notch are the detected peaks. If the number exceeds three, then a spectral peak value is found from the spectrum of each axis acceleration data; that is, the number of times the single notch filter is used is up to three. The purpose of limiting the number of application times is to retain the component corresponding to heart rate of the original signal as much as possible. Figure 4 shows an example of case 3.
From Figure 4, it is known that the final heart rate estimation after the NFEEMD algorithm is 164.8 BPM. The true HR value by ECG is 163.2 BPM. The absolute error is 1.6 BPM. In fact, we can find the only spectral peak, 172.1 BPM, in the periodogram of the original PPG signal. However, this value is quite different from the true HR. It is the effect of strong motion artifacts that results in a large error in HR estimation. It is worth noting that MAs are superimposed on the PPG signal closely in this example, so that the true HR information is covered by MAs. Although MAs conceal the true HR information, the tri-axis motion acceleration data accurately reflects the frequency components of motion artifacts. After removing the MAs by the single notch filter, the estimated HR value calculated from the final reconstructed signal is 164.8 BPM, which is very close to the true heart rate value. It is shown that using the periodogram directly may lead to a large error and lost HR tracking. This example also demonstrates that the repeated use of the single notch filter can accurately remove the MA components without affecting the nearby real HR components, and the HR-value resolution is high. (2) Improved Ensemble Empirical Mode Decomposition Empirical mode decomposition (EMD) is a kind of non-linear and non-stationary signal decomposition algorithm proposed by Professor Huang, which aims to decompose any complicated time-series signal into a number of "intrinsic mode functions" [18]. However, its drawbacks are obvious. End effect and mode mixing are the primary factors that can limit the development of EMD. On the basis of EMD, Wu et al. proposed a noise-assisted data analysis method, Ensemble Empirical Mode Decomposition (EEMD) [19]. In this algorithm, mirror continuation is used to remove end effect, and its key idea is that the mode-mixing phenomenon can be weakened by adding white Gaussian noise. Zhang et al. [17] and Khan et al. [20] have proven that EMD or the EEMD algorithm performance well in the analysis of biomedical signals.
In the EMD approach, the given original signal ) ( n x is decomposed into the IMF components and a residue. Each IMF component should satisfy two conditions: (1) the number of extrema and the number of zero-crossings must either be the same or differ at most by 1; and (2) the mean value of the upper envelope and lower envelope must be zero at any positions in the signal. EMD uses a "sifting process" to obtain the local zero-mean until an IMF and residue are found. The "sifting process" includes several steps as described follows: (1)    (2) Improved Ensemble Empirical Mode Decomposition Empirical mode decomposition (EMD) is a kind of non-linear and non-stationary signal decomposition algorithm proposed by Professor Huang, which aims to decompose any complicated time-series signal into a number of "intrinsic mode functions" [18]. However, its drawbacks are obvious. End effect and mode mixing are the primary factors that can limit the development of EMD. On the basis of EMD, Wu et al. proposed a noise-assisted data analysis method, Ensemble Empirical Mode Decomposition (EEMD) [19]. In this algorithm, mirror continuation is used to remove end effect, and its key idea is that the mode-mixing phenomenon can be weakened by adding white Gaussian noise. Zhang et al. [17] and Khan et al. [20] have proven that EMD or the EEMD algorithm performance well in the analysis of biomedical signals.
In the EMD approach, the given original signal x(n) is decomposed into the IMF components and a residue. Each IMF component should satisfy two conditions: (1) the number of extrema and the number of zero-crossings must either be the same or differ at most by 1; and (2) the mean value of the upper envelope and lower envelope must be zero at any positions in the signal. EMD uses a "sifting process" to obtain the local zero-mean until an IMF and residue are found. The "sifting process" includes several steps as described follows: (1) identifying all extremes of x(n); (2) interpolating the upper envelope of maxima and the lower envelop of minima; (3) computing the mean of upper and lower envelope m(n); (4) extracting the detail c(n) = x(n) − m(n) and the residual; (5) iterating on the residual until c(n) satisfies the IMF conditions and then c(n) will be an IMF component; (6) repeating on the residual using sifting process (1)-(5) until the residual component is a monotonic function. The final decomposition result is described as follows: The updated EEMD algorithm adds an ensemble of N e signals into the given signal x(n) by adding white Gaussian noise w p (n)(p = 1, 2 . . . N e ) which has the same variance. The equation can be described as follows: Then, the EMD algorithm is applied to each of the new signals with noise, and the decomposition results can be described as follows: Then, the optimum choice of IMFs is obtained by computing the average of an ensemble of N e signals of the EMD algorithm: The basic principle of EEMD based on noise-assisted analysis is that the time-frequency space is composed of different scale components divided by the filter group when the additional white noise is evenly distributed over the entire time-frequency space. Of course, EEMD cannot eliminate the effect of mode mixing completely due to the additional noise, especially when the parameter N e is small.
To make the reconstructed signal derived from EEMD cleaner, singular spectrum analysis (SSA) [11] is combined with EEMD to extract the usable reconstructed signal. SSA generates a trajectory matrix from the original signal by a sliding window of length L. The trajectory matrix is approximated using singular value decomposition (SVD). The last step is to reconstruct the series.
In this paper, SSA is applied to each IMF component after EEMD, and then the signal is reconstructed by only the first eigenvalue which corresponds to the most principal component over the whole signal. Finally, the only IMF whose frequency is closest to the previous heart rate frequency is chosen as the reconstructed signal.
Selecting which IMF belongs to the constructed signal is crucial. In this paper, the cycle of each IMF is first estimated. Then the IMF signal whose cycle is closest to the bpm_prev is chosen as the constructed signal.
In the example above, although the PPG signal is contaminated by MAs, the HR information is still obtained by the single notch filter. However, it is almost impossible to extract the HR information by the single notch filter when the PPG coincides with the motion artifacts (the spectral peak of the PPG is completely overlapped by the MA component.) (Figure 5e,f). So EEMD is another proposed method in this case. Figure 5 gives an example of HR estimation using EEMD instead of using the single notch filter. The final HR estimation value is 153.8 BPM, showing that EEMD is an effective method. It should be noted that the signal which has high signal quality in Figure 5g is the reconstructed one after the Sensors 2017, 17, 2450 9 of 17 EEMD algorithm. From Figure 5, compared to the bpm_prev, the estimated HR by EEMD is closer to the true value, which shows the effectiveness of EEMD. On the other hand, there is a high-frequency resolution of estimation using EEMD, for example, Figure 6. The final estimated HR is 159.3 BPM, which is different from the spectral peak of the original PPG, 161.1 BPM. the true value, which shows the effectiveness of EEMD. On the other hand, there is a high-frequency resolution of estimation using EEMD, for example, Figure 6. The final estimated HR is 159.3 BPM, which is different from the spectral peak of the original PPG, 161.1 BPM. Case 4: (the remaining one): The reason why this situation is carried out alone is that in this case the remaining frequency component is the most reliable. Therefore, if the absolute value of the heart-rate value corresponding to the remaining one and bpm_prev is less than th_pass, this value is considered acceptable. In other words, bpm_track = bpm_prev. According to experience, it is not necessary to use the above complex algorithm due to the low probability of it not being accepted in this case.
Case 5: (leaving more than one): In this circumstance, the heart rate (bpm_track) closest to the absolute value of the past is found. If |bpm_track − bpm_prev| < th_pass, then the value is accepted. Or bpm_track = bpm_prev.
HR Calibration 2: This is similar to the HR Calibration 1 mechanism. In fact, there is often a lack of tracking as the heart rate continues to rise. Therefore, HR calibration is employed mainly to solve this problem. On the one hand, the heart-rate value prediction of the current window is obtained through gray system prediction (GM (1, 1)) of the first 30 values when MAs exist. If the absolute value of the predicted value and the past value is greater than 12, then the next condition will be checked.
On the other hand, three consecutive heart-rate values are dependent on the past. As long as these two conditions occur, the HR Calibration 1 algorithm will start when MAs exist to avoid losing   Case 4: (the remaining one): The reason why this situation is carried out alone is that in this case the remaining frequency component is the most reliable. Therefore, if the absolute value of the heart-rate value corresponding to the remaining one and bpm_prev is less than th_pass, this value is considered acceptable. In other words, bpm_track = bpm_prev. According to experience, it is not necessary to use the above complex algorithm due to the low probability of it not being accepted in this case.
Case 5: (leaving more than one): In this circumstance, the heart rate (bpm_track) closest to the absolute value of the past is found. If |bpm_track − bpm_prev| < th_pass, then the value is accepted. Or bpm_track = bpm_prev.
HR Calibration 2: This is similar to the HR Calibration 1 mechanism. In fact, there is often a lack of tracking as the heart rate continues to rise. Therefore, HR calibration is employed mainly to solve this problem. On the one hand, the heart-rate value prediction of the current window is obtained through gray system prediction (GM (1, 1)) of the first 30 values when MAs exist. If the absolute value of the predicted value and the past value is greater than 12, then the next condition will be checked.
On the other hand, three consecutive heart-rate values are dependent on the past. As long as these two conditions occur, the HR Calibration 1 algorithm will start when MAs exist to avoid losing track of spectral peaks. Then, a new bpm_track is calculated by the repeated single notch filter. If |bpm_track − bpm_prev| < 40, it is considered that bpm_track can be accepted. In order to better verify the effectiveness of the HR Calibration 2 model, we have done a set of comparative experiments to perform our algorithm in conditions using both HR Calibration 2 and not using it. Figure 7 illustrates the corresponding result, which shows that the HR-tracking trajectory (the blue curve) deviates from normal orbit when there is no calibration model. In contrast, the estimated HR (the black curve) can reflect the correct heart beat changes. track of spectral peaks. Then, a new bpm_track is calculated by the repeated single notch filter. If |bpm_track − bpm_prev| < 40, it is considered that bpm_track can be accepted. In order to better verify the effectiveness of the HR Calibration 2 model, we have done a set of comparative experiments to perform our algorithm in conditions using both HR Calibration 2 and not using it. Figure 7 illustrates the corresponding result, which shows that the HR-tracking trajectory (the blue curve) deviates from normal orbit when there is no calibration model. In contrast, the estimated HR (the black curve) can reflect the correct heart beat changes.    Additionally, the final output HR value of the current window is the average value of the previous windows by the cyclic moving average filter [16]. By virtue of the moving average filter with different windows, the final output result will be smooth.

Database Introduction
To promote the development of HR estimation using PPG signals corrupted by intense motion artifacts, the IEEE Signal Processing Society has already organized an algorithm contest (IEEE Signal Processing Cup). Their datasets in the contest are also used in [10][11][12][13][14][15]17,21] and our NFEEMD algorithm also applies the same datasets. There are 23 datasets in total. It should be noted that the first 12 datasets are considered as the training data, and the last 11 datasets are considered as the test data in the following experiments.
Each of the first 12 training datasets (running on the treadmill) records two-channel PPG signals, three-axis acceleration signals, and one-channel ECG signals from subjects aged from 18 to 35. The last 11 test datasets (including wrist-intense exercise) record subjects aged from 19 to 58. For each subject, the PPG signals were recorded from the wrist by two pulse oximeters with green LEDs (wavelength: 515 nm). Their distance (from center to center) was 2 cm. The acceleration signal was also recorded from the wrist by a three-axis accelerometer. Both the pulse oximeter and the accelerometer were embedded in a wristband, which was comfortably worn. The ECG signal was recorded simultaneously from the chest using wet ECG sensors. All signals were sampled at 125 Hz and sent to a nearby computer via Bluetooth. Three types of activities were used in the experiments. T0 requires each subject to run on a treadmill with changing speeds. For datasets with names containing 'TYPE01', the running speeds changed as follows: For the exercise type T1, the subject performed many actions including various forearm and upper arm exercises (e.g., shake hands, stretch, push, and so on, which are common in arm rehabilitation exercise), running, jumps, and push-ups. For the exercise type T2, the subject mainly performed intense forearm and upper arm movements (e.g., boxing). In each of this kind of datasets, heart rate is estimated in each time window of 8 s. Two successive time windows overlap by 6 s. Detailed experimental conditions follow in Table 1. Activity type 1 : T0 including resting/walking/running on a treadmill; T1 including arm rehabilitation exercise; T2 including intense arm movements (e.g., boxing); Unhealthy 2 : Unhealthy means that the subject has abnormal heart rhythm and blood pressure.

Performance Metrics
In this paper, the ground-truth HR from the simultaneous recorded ECG signal given by the database in each time window is chosen for comparison with the algorithm results. To evaluate the performance of the NFEEMD algorithm, two measurement indices which were also used in [10,11] are computed.
• AAE (Average Absolute Error) where BPM est (i) denotes the ground-truth HR value in the i-th time window; and BPM true (i) denotes the estimated HR using the NFEEMD algorithm. • AEP (Absolute Error Percentage) In addition, the Bland-Altman plot is given to show the agreement between the ground-truth HR values and the estimated HR values. LOA (Limit of Agreement) is also calculated, which is defined by [µ − 1.96σ, µ + 1.96σ]. The last evaluation index is the Pearson correlation coefficient between the ground-truth HR and estimation.

Parameter-Setting and Experimental Results
In the NFEEMD algorithm, we set the number of FFT points of the periodogram to 4096, and apply a rectangle window. The sampling rate of PPG and ACC is 125 Hz. The ratio of the standard deviation of the added noise in EEMD is 0.05, and N e is 5. The window length of SSA is 150. th_acc_peaks is 5. th_NF_1 is 15 and th_EEMD_1 is 15 after Ternary Decision 1. th_NF_2 is 10 and th_EEMD_2 is 10 after ternary decision 2. th_off (a constant threshold) is 30 in case 3. th_pass is 8 in this paper. These thresholds are empirical values derived from the first 12 datasets.

Discussion
From Table 2, the NFEEMD algorithm performs better compared to the others. For the first 12 of 23 datasets, the average absolute error (AAE) is 1.12 + 0.51 (mean ± standard deviation) BPM, and AAE is 2.68 + 2.19 BPM for the remaining 11 datasets. For all 23 datasets, an average absolute error of 1.87 BPM and standard deviation of 1.79 BPM are recorded using the NFEEMD framework under intense physical activities.
It should be noted that the most obvious difference between the first 12 datasets and the last 11 datasets is the severity of motion. The activities of sample set T0 on the treadmill have a certain regularity, and the activities of sample set T1 and sample set T2 including arm movements are intense and random. In Table 2, the average absolute error of the last 11 datasets (2.68 BPM) by using NFEEMD is significantly larger than the first 12 datasets (1.12 BPM). This result is consistent with the severity of the state of motion, so the intenser the movements, the larger the HR-estimation error obtained. Although the errors are a bit larger for the last 11 datasets, HR estimates do not get derailed, which indicates that the direction of the HR estimates is correct. If the HR estimates get derailed for some reason in the process of estimation, our HR-calibration section could work (Figure 7, for example). As the results of comparisons in Table 2 show, the NFEEMD algorithm could obtain the most accurate results on HR estimates for the last 11 datasets, as well as the second most accurate results on HR estimates for the first 12 datasets. Compared with the SPECTRAP algorithm, the accuracy of the NFEEMD algorithm is slightly lower than the SPECTRAP for 22 datasets (except dataset 13). In a word, the results in Table 2 indicate that the NFEEMD algorithm can adapt to more intense circumstances like boxing in the last 11 recordings and our algorithm is more robust.
The Bland-Altman plot is given in Figure 8 to test agreement between the ground-truth HR values and the estimation HR values, in order to show our algorithm's better performance. From Figure 8, we can see that more than 95% of estimated HR values are incorporated in the limit of agreement (LOA) expressed by [µ − 1.96σ, µ + 1.96σ] where the absolute value of mean µ = 0.02 BPM and standard deviation σ = 3.79 BPM. The difference error is large when the real HR values are between 50 and 80, which indicates that several HR estimations using NFEEMD do not work well when the subject is in a stationary state except for intense arm exercise like boxing. The algorithm works well when HR is increasing continually, which indicates that the MA-removal method and the heart-rate tracking method are valid when the subject is in a sustained exercise state like running.
In addition, Figure 9 gives the scatter plot of 23 recordings between the ground-truth HR values and the estimated HR values, where the Pearson correlation coefficient is 0.992 and the fitted line is y = 1.0101x − 1.2454 (x is the ground-truth HR values and y is the estimated HR value). In Figure 9, the high Pearson coefficient and the small absolute value of mean indicate the NFEEMD algorithm's better performance.  In order to verify the performance of the NFEEMD algorithm on HR tracking during intense physical exercise, we applied the proposed algorithm along with the same set of parameters to a new dataset where the subject performed intensive running and upper arm movements. The sampling frequency is 100 Hz. The results are given in Figure 10. It should be noted that the true HR values are obtained by the heart-rate monitor made by Decathlon and there would be errors in data obtained  In order to verify the performance of the NFEEMD algorithm on HR tracking during intense physical exercise, we applied the proposed algorithm along with the same set of parameters to a new dataset where the subject performed intensive running and upper arm movements. The sampling frequency is 100 Hz. The results are given in Figure 10. It should be noted that the true HR values are obtained by the heart-rate monitor made by Decathlon and there would be errors in data obtained between this monitor and the ECG. From Figure 10, the average absolute error is 3.72 BPM, which In order to verify the performance of the NFEEMD algorithm on HR tracking during intense physical exercise, we applied the proposed algorithm along with the same set of parameters to a new dataset where the subject performed intensive running and upper arm movements. The sampling frequency is 100 Hz. The results are given in Figure 10. It should be noted that the true HR values are obtained by the heart-rate monitor made by Decathlon and there would be errors in data obtained between this monitor and the ECG. From Figure 10, the average absolute error is 3.72 BPM, which indicates that the proposed algorithm performs well during intensive physical activities and the algorithm could be available for PPG derived from other platforms. To examine in more detail the performance of the NFEEMD algorithm with a change in sampling frequency, we experimented in 25 Hz sampling frequency using the same algorithm for one channel PPG and three-channel ACC. The corresponding AAE results for all datasets are listed in Table 3, which demonstrates that the NFEEMD algorithm performs better in 125 Hz sampling frequency than in 25 Hz. In other words, more detailed information can be recorded at a high sampling frequency so that the HR-estimation accuracy can be improved. Moreover, the signal sparsification technique through the M-FOCUSS in TROIKA and JOSS is applied on the HR estimation algorithm, which involves extensive computational complexity. For example, for the sampling frequency of 125 Hz, TROIKA takes about 3.5 h to estimate HR for the first 12 datasets on a computer equipped with Intel Core-i7 4790 at 3.6 GHz, 8-GB RAM, Windows 7 64 bit, and MATLAB 2013a. The algorithm proposed by Khan [20] takes 668 s on the same computer. The NFEEMD algorithm takes 229 s for calculation of the first 12 datasets and 476 s for all 23 datasets using the same computer configuration. In addition, our proposed algorithm takes 86 s for the first 12 datasets and 191 s for all 23 datasets when the sampling frequency is 25 Hz. JOSS takes 300 s for all the datasets at 25 Hz sampling frequency. It is obvious that our algorithm has the advantage of low computational complexity and short running time.
Of course, our algorithm also needs to be improved. On the one hand, Figure 8 shows that the difference error is large when the real HR values are between 50 and 80. The subjects tend to be in the rest state, or a critical state between rest and movement, when this circumstance occurs. In fact, HR estimation during running may be easier sometimes than in the rest or critical states, since in the latter process, there are large MA peaks and these peaks are actually scattered all over the spectrum. At this moment, there is no effective method to solve this problem. What is more, two-channel compared PPG signals can improve the algorithm performance and the correct HR peak can be determined with more confidence. On the other hand, high-frequency resolution and accuracy are obtained after using the repeated single notch filter advanced in this paper. Of course, an adaptive filter is also an option, although computation is increased. This research will be continued in future To examine in more detail the performance of the NFEEMD algorithm with a change in sampling frequency, we experimented in 25 Hz sampling frequency using the same algorithm for one channel PPG and three-channel ACC. The corresponding AAE results for all datasets are listed in Table 3, which demonstrates that the NFEEMD algorithm performs better in 125 Hz sampling frequency than in 25 Hz. In other words, more detailed information can be recorded at a high sampling frequency so that the HR-estimation accuracy can be improved. Moreover, the signal sparsification technique through the M-FOCUSS in TROIKA and JOSS is applied on the HR estimation algorithm, which involves extensive computational complexity. For example, for the sampling frequency of 125 Hz, TROIKA takes about 3.5 h to estimate HR for the first 12 datasets on a computer equipped with Intel Core-i7 4790 at 3.6 GHz, 8-GB RAM, Windows 7 64 bit, and MATLAB 2013a. The algorithm proposed by Khan [20] takes 668 s on the same computer. The NFEEMD algorithm takes 229 s for calculation of the first 12 datasets and 476 s for all 23 datasets using the same computer configuration. In addition, our proposed algorithm takes 86 s for the first 12 datasets and 191 s for all 23 datasets when the sampling frequency is 25 Hz. JOSS takes 300 s for all the datasets at 25 Hz sampling frequency. It is obvious that our algorithm has the advantage of low computational complexity and short running time.
Of course, our algorithm also needs to be improved. On the one hand, Figure 8 shows that the difference error is large when the real HR values are between 50 and 80. The subjects tend to be in the rest state, or a critical state between rest and movement, when this circumstance occurs.
In fact, HR estimation during running may be easier sometimes than in the rest or critical states, since in the latter process, there are large MA peaks and these peaks are actually scattered all over the spectrum. At this moment, there is no effective method to solve this problem. What is more, two-channel compared PPG signals can improve the algorithm performance and the correct HR peak can be determined with more confidence. On the other hand, high-frequency resolution and accuracy are obtained after using the repeated single notch filter advanced in this paper. Of course, an adaptive filter is also an option, although computation is increased. This research will be continued in future work.

Conclusions
This paper has proposed the NFEEMD HR estimation algorithm for intense physical exercises. One-channel PPG signal and tri-axis acceleration data are combined to generate a complex HR-estimation method. When MAs do not exist, only the spectrum of PPG is used to find the peak corresponding to HR information. However, the single notch filter and EEMD are applied to track the true HR value when MAs are strong. Finally, it is necessary to use the HR-calibration algorithm to avoid HR values heading into a runaway situation, so that the off-track errors can be decreased to a minimum. Tests on all of the 23 datasets of IEEE Signal Processing Society showed that the NFEEMD algorithm could comprehensively obtain higher accuracy than others, especially during intense physical activities. The measurement metrics illustrate that the NFEEMD algorithm is more robust and stable. This dynamic HR-estimation algorithm has many potential uses in wearable devices. People can monitor the real-time heart rate at home with wearable devices, or monitoring of long-term HR-tracking values can be undertaken for disease diagnosis.