1. Introduction
In recent years, there has been a rapid increase in the use of smart wearable devices, such as smartwatches and smart rings, which allow individuals to monitor various physiological parameters in real time [1,2,3]. In particular, photoplethysmography (PPG) has gained considerable attention over conventional electrocardiography (ECG) due to its non-invasive and cost-effective nature. Whereas an ECG device measures the heart’s electrical activity through electrodes placed on the body, a PPG device typically uses a light-emitting diode (LED) to shine light into the skin while a photodetector measures the amount of light that is either reflected or transmitted through the tissue. Due to the pulsatile nature of the circulatory system, changes in blood volume are reflected in the absorbed light, which can be used to derive health metrics, including heart rate (HR) [4,5].
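This principle can be illustrated with a short Python sketch that recovers HR from a clean PPG window as the dominant spectral peak. The function name, sampling rate, and plausible HR band below are assumptions chosen for illustration, not part of any published method.

```python
import numpy as np

def estimate_hr_fft(ppg, fs, hr_band=(0.7, 3.5)):
    """Estimate heart rate (bpm) from a clean PPG window.

    The dominant spectral peak within a plausible heart rate band
    (here 0.7-3.5 Hz, i.e. 42-210 bpm) is taken as the pulse frequency.
    """
    ppg = ppg - np.mean(ppg)                       # remove DC component
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    power = np.abs(np.fft.rfft(ppg)) ** 2
    band = (freqs >= hr_band[0]) & (freqs <= hr_band[1])
    peak_freq = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_freq                        # Hz -> beats per minute

# Synthetic 8-second window: a 1.5 Hz pulse (90 bpm) sampled at 64 Hz
fs = 64
t = np.arange(0, 8, 1.0 / fs)
ppg = np.sin(2 * np.pi * 1.5 * t)
print(round(estimate_hr_fft(ppg, fs)))  # -> 90
```

On an artifact-free signal like this, the spectral peak coincides with the pulse rate; the motion artifacts discussed next are precisely what breaks this simple picture.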
In practice, PPG signals are often collected from sensors placed on the wrist or fingers, which can lead to motion artifacts (MAs), especially during physical activity. MAs can compromise the reliability of the measured PPG signal, leading to inaccuracies in heart rate detection and other derived metrics [6,7]. An accurate measurement of heart rate during exercise is especially useful as it provides valuable insights into intensity, recovery, and training effectiveness [8,9]. Beyond fitness monitoring, accurate heart rate measurement during physical movement has applications in clinical contexts, where it can assist in the monitoring of patients with cardiovascular conditions during daily activities [10,11,12].
Despite the growing interest in PPG-based monitoring, the development of algorithms to process PPG signals and mitigate the effects of motion artifacts remains challenging. Spectrum-based approaches form the basis of many of these algorithms, leveraging time-frequency spectra derived from both PPG and acceleration signals to identify and remove periodic components caused by motion artifacts and isolate the frequencies associated with the heart rate [13]. Notable algorithms include SpaMa [14] and its enhanced version, SpaMaPlus [15], which rely on spectral analysis and focus on identifying and removing peaks in the acceleration spectrum from the PPG spectrum to mitigate motion artifacts. The highest remaining peak in the PPG spectrum is then used to calculate the heart rate. SpaMaPlus improves upon SpaMa by incorporating a mean filter over recent heart rate estimates and implementing a heart rate tracking step and a reset mechanism to handle abrupt changes in heart rate. More advanced algorithms include TROIKA [16] and JOSS [17], which leverage sparsity-based spectrum estimation and spectral peak tracking techniques to estimate heart rate during intense physical activity, such as treadmill running. TROIKA employs singular spectrum analysis (SSA) for signal decomposition and reconstruction, discarding the SSA components whose frequencies closely match those of motion artifacts detected from accelerometer signals. JOSS extends this approach by formulating a joint sparse signal recovery model, enabling improved spectrum estimation using both PPG and accelerometer data. The spectral peak tracking mechanism reinforces the heart rate estimation by assuming that the spectral peak corresponding to the heart rate remains constant or shifts minimally between overlapping time windows.
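The spectral-subtraction idea behind SpaMa can be sketched in a few lines. This is an illustrative simplification, not the published implementation: the function names, the HR band, the peak-count limit, and the frequency-separation threshold are assumptions chosen for the example.

```python
import numpy as np

def top_peaks(power, band, n):
    """Indices of the n strongest local maxima of `power` inside `band`,
    ignoring peaks below 1% of the in-band maximum."""
    thresh = 0.01 * power[band].max()
    idx = [i for i in range(1, len(power) - 1)
           if band[i] and power[i] > thresh
           and power[i] > power[i - 1] and power[i] > power[i + 1]]
    idx.sort(key=lambda i: power[i], reverse=True)
    return idx[:n]

def spama_style_hr(ppg, acc, fs, n_peaks=3, min_sep=0.2):
    """SpaMa-style sketch: discard PPG spectral peaks that coincide with
    acceleration peaks, then read HR from the strongest survivor.
    n_peaks and min_sep (Hz) mirror the tunable parameters mentioned in
    the text, but the values here are illustrative only."""
    freqs = np.fft.rfftfreq(len(ppg), 1.0 / fs)
    p_ppg = np.abs(np.fft.rfft(ppg - ppg.mean())) ** 2
    p_acc = np.abs(np.fft.rfft(acc - acc.mean())) ** 2
    band = (freqs >= 0.7) & (freqs <= 3.5)        # plausible HR band
    ppg_pk = top_peaks(p_ppg, band, n_peaks)
    acc_pk = top_peaks(p_acc, band, n_peaks)
    # Keep PPG peaks farther than min_sep Hz from every acceleration peak
    keep = [i for i in ppg_pk
            if all(abs(freqs[i] - freqs[j]) > min_sep for j in acc_pk)]
    if not keep:                 # every candidate overlaps motion: fall back
        keep = ppg_pk
    best = max(keep, key=lambda i: p_ppg[i])
    return 60.0 * freqs[best]

# Synthetic example: a 90 bpm pulse plus a 2.5 Hz arm-swing artifact
fs = 64
t = np.arange(0, 8, 1.0 / fs)
ppg = np.sin(2 * np.pi * 1.5 * t) + 0.8 * np.sin(2 * np.pi * 2.5 * t)
acc = np.sin(2 * np.pi * 2.5 * t)
print(round(spama_style_hr(ppg, acc, fs)))  # -> 90
```

Without the subtraction step, the 2.5 Hz artifact peak could be mistaken for the pulse; removing peaks shared with the accelerometer spectrum leaves the true 1.5 Hz component as the strongest candidate.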
Many existing algorithms rely on subject-specific tuning, where several adjustable parameters are tuned to each session specifically, which limits their generalization to daily life, where ground truth data are unavailable for calibration [18]. Moreover, these algorithms are often evaluated on short-duration datasets, which do not account for the variability in and complexity of longer-term recordings. The use of longer datasets is critical for ensuring real-world applicability and robust performance across diverse scenarios. In particular, many landmark studies have relied heavily on the 2015 IEEE Signal Processing Cup datasets [16,17], which were collected in controlled laboratory settings over short durations with limited types of physical activities.
More recently, deep learning methods have been proposed as powerful alternatives to classical signal processing methods for PPG-based heart rate estimation [15,18,19,20]. Several studies have demonstrated that deep learning models, such as convolutional neural networks (CNNs) [21] and recurrent neural networks (RNNs) [22], can outperform traditional algorithms in mitigating motion artifacts and providing accurate heart rate measurements. Biswas et al. combined CNN and LSTM architectures to estimate heart rate and perform biometric identification using pre-processed PPG signals [20]. Similarly, Shen et al. [23] and Shashikumar et al. [24] employed a 50-layer ResNeXt CNN and a wavelet transform followed by a CNN, respectively, to detect atrial fibrillation from PPG signals. Reiss et al. [18] introduced an end-to-end deep learning framework for heart rate estimation, leveraging convolutional neural networks to process the time-frequency spectra of synchronized PPG and accelerometer signals, which significantly outperformed classical approaches.
Unfortunately, deep learning approaches often come with significant computational costs, requiring high processing power and memory resources. Such requirements pose challenges for their integration into smart wearable devices, which are constrained by limited computational capabilities and battery life [25,26].
In this work, we demonstrate that it is possible to achieve performance comparable to state-of-the-art deep learning models without relying on machine learning. Our proposed method highlights that, in certain applications, effective signal processing and algorithmic innovations can bridge the gap traditionally addressed by deep learning. This is particularly relevant for scenarios where machine learning may not be feasible due to computational constraints, limited access to annotated training datasets, or preferences for simpler, interpretable solutions. By advancing non-machine learning approaches, we provide a valuable alternative that expands the toolbox of techniques available for wearable applications, supporting the development of lightweight and scalable solutions.
3. Results and Discussion
The results are presented as the mean absolute error (MAE) in beats per minute (bpm) for the current method without any subject-specific or dataset-specific tuning, with the same parameters applied uniformly across all the datasets. The MAE results for the publicly available datasets are compared to those of the classical methods (SpaMa, SpaMaPlus, and Schaeck2017) as well as the average and ensemble CNN models presented in [18], which employed leave-one-session-out cross-validation.
The results obtained demonstrate that non-machine learning solutions can serve as a viable alternative for heart rate estimation. Specifically, our proposed method reduced the MAE for the PPG-DaLiA dataset by 1.45 bpm, as shown in Table 2. While the CNN ensemble previously achieved the lowest MAE of 7.65 ± 4.2 bpm, our method obtained an MAE of 6.2 ± 2.0 bpm. Likewise, the MAEs for the WESAD dataset showed less than 1 bpm difference between the CNN ensemble and our proposed method, where MAEs of 7.47 ± 3.3 bpm and 8.1 ± 2.6 bpm were obtained, respectively. For the WESAD dataset, our method outperformed classical approaches (SpaMa, SpaMaPlus, and Schaeck2017) as well as the CNN average by achieving lower MAE values, as detailed in Table 3. For the IEEE_Test dataset, our method also demonstrated a lower MAE (10.8 ± 9.6 bpm) compared to all the other methods, except for SpaMa (9.2 ± 11.4 bpm).
However, our algorithm did not perform equally well across all the datasets. Namely, on the IEEE_Training dataset, our method yielded an MAE of 6.5 ± 3.6 bpm, which was higher than that of all the other evaluated methods except SpaMa. For instance, the CNN ensemble achieved a lower MAE of 4 ± 5.4 bpm. Detailed results for the IEEE_Training and IEEE_Test datasets are shown in Table 4 and Table 5, respectively.
One objective of collecting the UTOKYO dataset was to explore the use of finger PPG and accelerometer signals, as all the other datasets were recorded using wrist-worn devices. A major challenge when analyzing signals from a smart ring is the high level of motion artifacts. These elevated noise levels result from the smart ring being prone to rotating and shifting around the finger during movement, unlike wrist-worn devices, such as smartwatches, which usually maintain a fixed orientation. Additionally, 35 running sessions from the dataset were performed outdoors and exhibited greater motion artifacts than those recorded indoors under laboratory conditions, potentially due to increased exposure to ambient light and greater variability in movements.
To quantify the influence of motion artifacts on the PPG signals of each dataset, we calculated three signal quality indices (SQIs) previously defined by Song et al. [28]. The P index indicates the presence of high-frequency noise by measuring the reduction in local extrema after smoothing. The Q index reflects the influence of baseline wander, and the R index assesses motion artifact contamination based on the variability in the available peak and valley points. Higher values across all three indices indicate better signal quality. To facilitate a comparison between datasets, we also report the relative signal quality index (rSQI), which expresses the signal quality of each dataset relative to the UTOKYO dataset. The SQIs are presented in Table 6. All the rSQI values are positive, indicating that the UTOKYO dataset contains the noisiest PPG signals among all the datasets evaluated.
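The intuition behind the P index can be sketched as follows. This is a simplified analogue of the index described by Song et al., not their exact formulation: the smoothing window and the normalization are assumptions chosen for the example.

```python
import numpy as np

def count_extrema(x):
    """Number of strict local maxima and minima in a 1-D signal."""
    d = np.diff(x)
    return int(np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0))

def p_index(ppg, win=5):
    """P-like quality index: the fraction of local extrema that survive
    moving-average smoothing. A clean pulse wave keeps nearly all of its
    extrema; high-frequency noise loses many, so higher values indicate
    better quality. The published formulation may differ in detail."""
    kernel = np.ones(win) / win
    smooth = np.convolve(ppg, kernel, mode="same")
    n_raw, n_smooth = count_extrema(ppg), count_extrema(smooth)
    return n_smooth / max(n_raw, 1)   # 1.0 = no extrema removed

# Clean vs. noise-corrupted pulse: the clean signal scores higher
rng = np.random.default_rng(0)
t = np.arange(0, 8, 1.0 / 64)
clean = np.sin(2 * np.pi * 1.5 * t)
noisy = clean + 0.5 * rng.normal(size=t.size)
print(p_index(clean) > p_index(noisy))
```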
The three classical methods and the current method were evaluated on the UTOKYO dataset. For the classical methods, the results were obtained with session-specific tuning. Previous findings show that these methods are highly sensitive to parameter settings [18], which motivated the use of session-specific tuning to allow for a comparison between the lowest achievable MAEs and those of our method, which maintained fixed and unchanged parameters. The adjusted parameters for the SpaMa methods included the number of PPG and acceleration peaks considered in the spectral analysis, as well as the minimum frequency difference required to remove overlapping peaks [18]. The adjusted parameters for the Schaeck2017 algorithm included the maximum allowable difference between two consecutive heart rates, the standard deviation used in the Gaussian band-stop filter, and the size of the correlation window [29].
The proposed method evaluated on the UTOKYO dataset achieved an overall MAE of 7.9 ± 8.2 bpm, whereas the SpaMa and SpaMaPlus methods yielded MAEs of 37.6 ± 26.2 bpm and 32.3 ± 26.0 bpm, respectively. The Schaeck2017 algorithm obtained an MAE of 14.1 ± 15.7 bpm. The results are summarized in Table 7.
Figure 6 compares the reconstructed heart rates for two sessions from the UTOKYO dataset using the three classical methods and the current method. Figure 6a,c correspond to an indoor walking session recorded under low-noise conditions. The SpaMa and SpaMaPlus methods both yielded MAEs of 2.19 bpm, while the proposed algorithm achieved an MAE of 1.33 bpm, and the Schaeck2017 method obtained an MAE of 4.0 bpm. In contrast, Figure 6b,d present an outdoor running session with substantially higher noise levels. Under these conditions, the performance of SpaMa and SpaMaPlus deteriorated significantly, with MAEs of 33.0 and 20.4 bpm, respectively. The current method remained more robust in this high-noise environment, achieving an MAE of 2.6 bpm, while the Schaeck2017 method yielded an MAE of 8.9 bpm. These results suggest that while SpaMa and SpaMaPlus can be effective for signals with fewer motion artifacts, their application may be limited in real-world scenarios.
Although the performance of deep learning models is compared to our proposed method for the publicly available datasets, a comparison was not conducted for the UTOKYO dataset. This decision was based on the complexity involved in implementing and validating deep learning models within the scope of this work. Thus, further investigation into the application of deep learning methods for finger-based PPG and accelerometer data is warranted.
A common issue with many tracking algorithms is the accumulation of errors over time. Our method tries to mitigate this issue by not relying solely on the previous heart rate value but instead considering multiple values. As such, even in cases where the algorithm makes an incorrect prediction, the error propagation can be limited. However, this improvement comes at the cost of requiring a longer time window, which may be a limitation for real-time applications that demand immediate feedback.
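The principle of validating each new estimate against several recent values, rather than only the last one, can be sketched as follows. The history length, jump threshold, and clamping rule below are illustrative assumptions, not the parameters of our method.

```python
from collections import deque
from statistics import median

def make_tracker(history_len=5, max_jump=10.0):
    """Validate each new HR estimate against the median of several
    recent values, so a single bad estimate cannot drag the track away.
    All names and thresholds here are hypothetical."""
    history = deque(maxlen=history_len)

    def track(candidate_bpm):
        if not history:
            history.append(candidate_bpm)
            return candidate_bpm
        ref = median(history)
        # Clamp implausible jumps toward the recent consensus
        if abs(candidate_bpm - ref) > max_jump:
            candidate_bpm = ref + max_jump * (1 if candidate_bpm > ref else -1)
        history.append(candidate_bpm)
        return candidate_bpm

    return track

track = make_tracker()
estimates = [80, 81, 150, 82, 83]     # 150 is an artifact-driven outlier
print([track(e) for e in estimates])  # -> [80, 81, 90.5, 82, 83]
```

Note how the outlier is clamped rather than accepted, and how the subsequent correct estimates are still accepted because the median of the history remains anchored near the true rate.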
In terms of computational efficiency, our method performed comparably to the three classical approaches. Specifically, across all five datasets, our method was on average 28% faster, albeit using 6% more memory, than the Schaeck2017 method, which was specifically designed for embedded applications and reported to be up to 80 times faster than the JOSS algorithm [29]. By contrast, our method was approximately 2% slower than SpaMa and 8% slower than SpaMaPlus, while requiring 52% and 49% less memory, respectively. Regarding deep learning-based methods, the CNN architecture presented in [18] utilizes 8.5 million parameters and requires 69.5 million computations per heart rate estimation, making it unsuitable for deployment on resource-constrained devices. However, the authors also introduce a resource-optimized CNN model with only 26 K parameters, designed to operate within a 32 KB memory footprint, which increases the MAEs for the PPG-DaLiA and WESAD datasets to 9.99 ± 5.9 bpm and 8.2 ± 3.6 bpm, respectively. As noted previously, we did not implement this CNN model in the current study, and further investigation is warranted for a complete assessment of its computational efficiency.
Furthermore, the final post-processing step of our algorithm includes a moving average filter. We chose to incorporate this step because commercial PPG- and ECG-based wearables commonly apply similar post-processing to enhance the readability of the displayed heart rate signal for users. However, depending on the target application, particularly those requiring a faster response time, it may be desirable to reduce the filter’s window size. Thus, to evaluate the influence of the window size, we repeated the analysis using a shorter 10-second window. The resulting changes in the MAE were minimal, with variations remaining within ±1.3 bpm across all the datasets. Specifically, the MAE increased for PPG-DaLiA (+0.5 bpm), UTOKYO (+0.6 bpm), and WESAD (+1.2 bpm), while it decreased for IEEE_Test (–0.4 bpm) and IEEE_Training (–1.3 bpm).
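The trade-off between window size and responsiveness can be sketched with a moving average over a stream of HR estimates. The 30 s default window and the 2 s estimate interval below are assumptions for illustration; the 10 s window corresponds to the faster-responding variant evaluated above.

```python
import numpy as np

def smooth_hr(estimates, window_s=30, step_s=2):
    """Moving-average post-filter over HR estimates produced every
    `step_s` seconds; `window_s` sets the averaging window."""
    n = max(1, int(round(window_s / step_s)))
    kernel = np.ones(n) / n
    # Zero-padded at the edges (np.convolve 'same'); a deployed filter
    # would instead average only the estimates collected so far.
    return np.convolve(estimates, kernel, mode="same")

hr = np.array([80.0] * 30 + [120.0] * 30)   # step change at t = 60 s
wide, narrow = smooth_hr(hr, 30), smooth_hr(hr, 10)
# Ten seconds after the change, the 10 s window has settled at 120 bpm
# while the 30 s window is still catching up
print(round(narrow[35]), round(wide[35]))  # -> 120 115
```

The wider window lags further behind abrupt changes, which is the responsiveness cost discussed above; the MAE impact of shortening the window, however, remained within ±1.3 bpm in our evaluation.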
An additional drawback of the current method is its reliance on the presence of detectable PSD peaks. In scenarios where the input signal is entirely corrupted or absent, such as when the sensor loses consistent contact with the skin, this method may be less reliable than machine learning or deep learning models that can leverage other data sources and health trends to generate HR estimates.