Article

A Unified Benchmarking Framework for Classical Machine Learning Based Heart Rate Estimation from RGB and NIR rPPG

by Sahar Qaadan 1,*, Ghassan Al Jayyousi 1,* and Adam Alkhalaileh 2,*
1 Mechatronics Engineering Department, German Jordanian University, Madaba Street, Amman 11180, Jordan
2 Mechanical and Maintenance Engineering Department, German Jordanian University, Madaba Street, Amman 11180, Jordan
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(1), 218; https://doi.org/10.3390/electronics15010218
Submission received: 21 November 2025 / Revised: 23 December 2025 / Accepted: 30 December 2025 / Published: 2 January 2026
(This article belongs to the Special Issue Image Processing and Analysis)

Abstract

This work presents a unified benchmarking framework for evaluating classical machine-learning–based heart-rate estimation from remote photoplethysmography (rPPG) across both RGB and near-infrared (NIR) modalities. Despite extensive research on algorithmic rPPG methods, their relative robustness across datasets, illumination conditions, and sensor types remains inconsistently reported. To address this gap, we standardize ROI extraction, signal preprocessing, rPPG computation, handcrafted feature generation, and label formation across four publicly available datasets: UBFC-rPPG Part 1, UBFC-rPPG Part 2, VicarPPG-2, and IMVIA-NIR. We benchmark five rPPG extraction methods (Green, POS, CHROM, PBV, PCA/ICA) combined with four classical regressors using MAE, RMSE, and R², complemented by permutation feature importance for interpretability. Results show that CHROM is consistently the most reliable algorithm across all RGB datasets, providing the lowest error and highest stability, particularly when paired with tree-based models. For NIR recordings, PCA with spatial patch decomposition substantially outperforms ICA, highlighting the importance of spatial redundancy when color cues are absent. While handcrafted features and classical regressors offer interpretable baselines, their generalization is limited by small-sample datasets and the absence of temporal modeling. The proposed pipeline establishes robust cross-dataset baselines and offers a standardized foundation for future deep-learning architectures, hybrid algorithmic–learned models, and multimodal sensor-fusion approaches in remote physiological monitoring.

1. Introduction

Remote photoplethysmography (rPPG) enables non-contact estimation of physiological signals—most notably heart rate—using only standard video cameras. By tracking subtle color or reflectance fluctuations on the skin surface caused by blood-volume changes, rPPG provides an unobtrusive alternative to contact-based sensors and has become increasingly relevant for health monitoring, human–computer interaction, and affective computing. Numerous signal-extraction algorithms have been proposed, including chrominance-based methods (CHROM [1], POS [2]), projection-based techniques (PBV [3]), and blind source separation approaches such as PCA and ICA [4]. The green-channel baseline remains a foundational method in color-based rPPG [5]. However, their reported performance varies widely across datasets, sensor modalities, illumination conditions, and subject motion, making it challenging to assess their true robustness.
Classical machine-learning regressors—such as Random Forest [6], XGBoost [7], Support Vector Regression [8], and Linear Regression [9,10]—remain commonly used in rPPG research due to their simplicity, interpretability, and suitability for small datasets. Yet the interaction between rPPG extraction methods, handcrafted features, and model choice is still insufficiently explored. Many prior studies evaluate only a single dataset or a narrow selection of algorithms, leaving open questions regarding cross-dataset reliability, feature relevance, and the generalizability of classical pipelines for physiological estimation.
To address these gaps, this study presents a unified benchmarking framework that standardizes ROI extraction, signal preprocessing, rPPG estimation, feature computation, and label generation across four publicly available datasets: UBFC-rPPG Part 1 and Part 2 [11], VicarPPG-2 [12], and IMVIA-NIR [13]. These datasets collectively span low-cost RGB recordings, high-frame-rate RGB videos with controlled motion and stress conditions, and near-infrared (NIR) facial recordings designed to evaluate rPPG performance under monochromatic illumination where color cues are absent. Within this unified pipeline, we benchmark five rPPG extraction methods (Green [5], POS [2], CHROM [1], PBV [3], and PCA/ICA [4] for NIR) and four machine-learning regressors using MAE, RMSE, and R², complemented by permutation feature importance for model interpretability.
By comparing algorithmic rPPG methods and regression models under consistent conditions across both RGB and NIR modalities, this work establishes reliable baselines, identifies the most robust combinations of features and regressors, and highlights limitations inherent to small-sample classical machine-learning pipelines. The results provide a foundation for future research, including deep-learning architectures, hybrid algorithmic–learned models, and multimodal sensor-fusion approaches for remote physiological monitoring.

2. Benchmarking Datasets

2.1. UBFC-rPPG Part 1 Dataset

The UBFC-rPPG Part 1 dataset is a publicly available facial video collection designed for evaluating remote photoplethysmography (rPPG) methods [11]. It includes 6 subjects with 7 recordings captured at 30 fps using a Logitech webcam. Each recording is approximately two minutes long, synchronized with fingertip pulse oximeter measurements to provide ground-truth heart rate [11].

2.2. UBFC-rPPG Part 2 Dataset

The UBFC-rPPG Part 2 dataset extends the original collection to 42 subjects and 42 recordings, each about two minutes long [11]. Videos were captured with a low-cost Logitech C920 HD webcam at 30 fps and 640 × 480 RGB resolution. Ground-truth heart-rate signals were obtained using a fingertip pulse oximeter (CMS50E, Contec Medical Systems Co., Ltd., Qinhuangdao, China). During recordings, subjects performed a fast, mentally engaging math task to emulate realistic computer-usage scenarios. This dataset provides tightly synchronized video and physiological signals, making it suitable for benchmarking ROI-selection strategies, rPPG signal-extraction algorithms, and heart-rate estimation methods [2].

2.3. VicarPPG-2 Dataset

The VicarPPG-2 dataset is a high-frame-rate, multi-modal benchmark for evaluating heart-rate and short-term HRV estimation [12]. It contains 10 subjects recorded at 60 fps using a Logitech Brio webcam (Logitech International S.A., Lausanne, Switzerland) under controlled indoor lighting, with simultaneous ECG (250 Hz) and finger-PPG (60 Hz) ground truth. Each participant completed four 5-min conditions—baseline resting, structured head-movement tasks, a Stroop-based stress game, and post-workout recovery—totaling 200 min of synchronized video and physiological signals. The dataset is notable for long continuous recordings, realistic motion and stress variations, and dual-sensor ground truth, making it ideal for benchmarking modern rPPG and HRV algorithms [12].

2.4. IMVIA-NIR

The IMVIA-NIR dataset is a specialized public resource specifically curated for advancing research and benchmarking methods in remote photoplethysmography (rPPG) [13]. Designed to address the scarcity of publicly available resources in the Near-Infrared (NIR) vision domain, the dataset features 20 videos collected from 10 diverse subjects in an indoor setting. Data acquisition was performed using an IDS UI-3240ML-NIR-GL camera (IDS Imaging Development Systems GmbH, Obersulm, Germany) and an 850 nm NIR LED light source (OSRAM Opto Semiconductors GmbH, Regensburg, Germany), with videos recorded at 20 frames per second at a 1280 × 1024 resolution. A key feature of the dataset is its structure into two challenging subsets: one where subjects were static (‘still’) and another where subjects were speaking (‘talking’), the latter introducing significant motion artifacts. For ground-truth physiological data, the dataset provides synchronized Blood Volume Pulse (BVP) signals recorded at 64 Hz using an Empatica E4 watch (Empatica Inc., Boston, MA, USA), making it a valuable tool for training and evaluating robust rPPG algorithms under motion and NIR imaging conditions [13].
The characteristics and specifications of the data used for evaluation are summarized in Table 1, which provides an overview of the benchmark datasets used in this study.

3. Methodology

The proposed processing pipeline unifies ROI extraction, signal construction, preprocessing, rPPG estimation, feature extraction, and label generation across all RGB and NIR datasets [11,12,13], while maintaining dataset-specific training and evaluation in order to enable fair and comparable benchmarking rather than cross-dataset generalization. For the RGB recordings, facial regions were detected using a Haar cascade, and the mean red, green, and blue intensities were extracted from each frame to form three temporal color traces. In addition to the chrominance-based rPPG algorithms, the green channel was also included as a standalone method due to its well-established high signal-to-noise ratio for pulsatile analysis [5]. In contrast, the NIR dataset provides single-channel grayscale videos; therefore, each detected facial ROI was divided into a fixed 3 × 3 grid, producing nine spatially distinct intensity signals per frame. These patch-based signals were later decomposed using PCA and ICA to obtain candidate pulsatile components. All signals, whether derived from RGB channels, the green channel, or NIR patches, were subsequently scaled, bandpass-filtered, and segmented with 50% overlap to ensure a unified preprocessing pipeline. rPPG estimation was performed using algorithmic chrominance-based methods (POS, CHROM, PBV), the standalone green-channel trace, and blind source separation methods (PCA/ICA) for the NIR recordings. The resulting waveforms were used to compute a standardized set of handcrafted temporal, spectral, and nonlinear features, while synchronized PPG signals provided per-segment heart-rate labels via peak counting. Finally, four machine-learning models (Random Forest, XGBoost, Support Vector Regression, and Linear Regression) were trained on these features, with permutation feature importance (PFI) used to assess and refine the contribution of each feature to the regression performance.
To clarify the scope of evaluation, all models were trained and tested within each dataset independently, and no cross-dataset or cross-modality train–test transfer experiments (e.g., training on UBFC and testing on Vicar) were performed. Accordingly, the reported results reflect method performance under standardized processing conditions across datasets, rather than model generalization or transferability across domains or sensing modalities.

3.1. ROI Extraction and Signal Construction

3.1.1. RGB Datasets

For each video, facial regions were detected using a Haar cascade classifier for frontal faces [14]. To ensure temporal consistency of the region of interest (ROI) and to reduce frame-to-frame jitter, the detected face bounding boxes were stabilized using a median-based approach: bounding box coordinates (position and size) were aggregated across frames, and the median bounding box was selected and applied consistently to all frames of the video. Videos were retained only if successful face detection was achieved in all but at most five frames, ensuring reliable ROI extraction; occasional missed detections within this tolerance were considered negligible and did not introduce meaningful discontinuities in the extracted signals. From each frame within the stabilized ROI, the mean intensity values of the red, green, and blue channels were computed, forming three temporal traces corresponding to R, G, and B. These traces were stored as .npy files to ensure uniform processing across datasets.
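The median-based bounding-box stabilization and per-frame channel averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions (function names and the `(x, y, w, h)` box convention are ours, not from the paper's code); face detection itself is assumed to have already produced one box per frame.

```python
import numpy as np

def stabilize_bboxes(bboxes):
    """Median-stabilize per-frame Haar-cascade detections.

    `bboxes` is a list of (x, y, w, h) tuples, one per frame where a face
    was found; the coordinate-wise median box is then applied to all frames.
    """
    arr = np.asarray(bboxes, dtype=float)            # shape (n_frames, 4)
    x, y, w, h = np.median(arr, axis=0).astype(int)
    return x, y, w, h

def mean_rgb_trace(frames, bbox):
    """Mean R, G, B inside the fixed ROI for every frame -> (T, 3) array."""
    x, y, w, h = bbox
    return np.array([f[y:y + h, x:x + w].reshape(-1, 3).mean(axis=0)
                     for f in frames])
```

The resulting (T, 3) array corresponds to the three temporal traces that are stored as .npy files.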

3.1.2. NIR Dataset

For the NIR recordings, the videos are provided in single-channel grayscale format [13]. A frontal-face Haar cascade classifier [14] was then applied to locate the facial region in each frame. Whenever a face was detected, the corresponding bounding box was extracted and used as the region of interest.
Unlike the RGB datasets, where a single mean value per color channel was computed, the grayscale face ROI in the NIR dataset was divided into a fixed 3 × 3 grid, producing nine equally sized spatial patches. For each patch, the mean pixel intensity was computed, resulting in a nine-dimensional feature vector that captures coarse spatial variations across the face. This patch-based design provides richer information in the absence of color channels and helps preserve local reflectance differences that are relevant for rPPG estimation.
For every video, all nine-patch intensity vectors were concatenated over time, yielding a matrix of shape (T, 9), where T denotes the number of frames. These matrices were saved as .npy files to maintain consistency with the storage format used for the RGB datasets and to enable uniform downstream processing.
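A minimal sketch of the 3 × 3 patch decomposition, assuming a grayscale frame and an `(x, y, w, h)` bounding box (helper names are illustrative; patch sizes use integer division, so trailing pixels are dropped, consistent with equally sized patches):

```python
import numpy as np

def patch_means_3x3(gray_frame, bbox):
    """Split a grayscale face ROI into a 3x3 grid; return the 9 patch means."""
    x, y, w, h = bbox
    roi = gray_frame[y:y + h, x:x + w]
    ph, pw = roi.shape[0] // 3, roi.shape[1] // 3    # identical patch sizes
    return np.array([roi[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].mean()
                     for i in range(3) for j in range(3)])

def nir_signal_matrix(frames, bbox):
    """Stack per-frame patch vectors into a (T, 9) matrix."""
    return np.vstack([patch_means_3x3(f, bbox) for f in frames])
```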

3.2. Signal Preprocessing

3.2.1. RGB Datasets

Each color channel was scaled from the original range of [0, 255] to [0, 1] to reduce variability caused by illumination differences and to standardize subsequent filtering operations. Each channel was then processed using a third-order Butterworth bandpass filter with cutoff frequencies of 0.7–3.0 Hz, covering the physiological heart-rate range of approximately 42–180 bpm. This filtering step suppresses slow illumination drift and high-frequency noise while preserving the dominant cardiac oscillations.
To increase the number of training samples and improve robustness, the filtered RGB signals were segmented using a sliding window with 50% overlap. This ensures equal-length temporal samples and provides more data points for training the regression models.
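The scaling, filtering, and 50%-overlap segmentation steps can be sketched as below (a simplified illustration; the window length `win_s` is a parameter here, whereas the paper uses dataset-specific segment durations):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(trace, fs, win_s=10.0):
    """Scale, bandpass 0.7-3.0 Hz (3rd-order Butterworth, zero-phase),
    and segment with a 50%-overlap sliding window."""
    x = trace / 255.0                                 # [0, 255] -> [0, 1]
    b, a = butter(3, [0.7, 3.0], btype="band", fs=fs)
    x = filtfilt(b, a, x)
    win = int(win_s * fs)
    hop = win // 2                                    # 50 % overlap
    return np.array([x[i:i + win]
                     for i in range(0, len(x) - win + 1, hop)])
```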

3.2.2. NIR Dataset

For the NIR recordings, the raw grayscale facial signals were first decomposed using PCA and ICA, as described in the following section. After extracting the principal and independent components, the resulting waveforms underwent the same preprocessing steps as the RGB datasets: amplitude scaling to the range [0, 1], Butterworth bandpass filtering [15] (0.7–3.0 Hz), and segmentation using a sliding window with 50% overlap. This ensures that all datasets, regardless of modality, follow a consistent preprocessing pipeline.

3.3. rPPG Method Computation

3.3.1. RGB Datasets

The rPPG windowing procedure is independent of the segmentation length. Specifically, within each segment, all rPPG methods employed a fixed internal window length of 1.6 s, corresponding to 48 samples at 30 Hz (UBFC-rPPG Part 1/2) and 96 samples at 60 Hz (VicarPPG-2). Adjacent windows overlapped by 50%, and the resulting window-level pulse estimates were combined using Hann-weighted overlap-add reconstruction to form a continuous waveform within each segment.
For each window, RGB values are mean-normalized and the local covariance matrix Σ is computed with ridge regularization to ensure numerical stability. Specifically, a small constant ε = 10⁻⁶ is added to the diagonal of Σ prior to inversion.
POS Method
The POS algorithm follows the projection-based formulation proposed in [16] and relies on lightweight temporal normalization without variance equalization across channels. For each window, the RGB segment is first mean-normalized by dividing each channel by its temporal mean and subtracting 1. The normalized segment C(t) is then projected onto two chrominance directions,
X = 3R − 2G,  Y = 1.5R + G − 1.5B    (1)
A balancing coefficient,
α = σ(X) / σ(Y)    (2)
is computed per window, and the instantaneous pulse estimate is formed as
S(t) = X(t) − αY(t)    (3)
Each windowed waveform is subsequently zero-centered and standardized, multiplied by a Hann window, and accumulated via overlap-add. After reconstruction, the POS signal is globally mean-centered and normalized to unit variance.
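The per-window POS computation of Equations (1)–(3) can be sketched as a single function (illustrative name `pos_window`; assumes an (N, 3) array of mean RGB values per window, with the Hann weighting and overlap-add handled separately):

```python
import numpy as np

def pos_window(rgb, eps=1e-12):
    """One POS window: temporal mean normalization, chrominance
    projection, and alpha-weighted combination."""
    c = rgb / (rgb.mean(axis=0) + eps) - 1.0   # divide by temporal mean, subtract 1
    r, g, b = c[:, 0], c[:, 1], c[:, 2]
    x = 3.0 * r - 2.0 * g                      # Eq. (1)
    y = 1.5 * r + g - 1.5 * b
    alpha = x.std() / (y.std() + eps)          # Eq. (2)
    s = x - alpha * y                          # Eq. (3)
    return (s - s.mean()) / (s.std() + eps)    # zero-center, unit variance
```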
CHROM Method
The CHROM algorithm is implemented following the chrominance-based formulation introduced in [1]. Although POS and CHROM are historically presented with different projection definitions, both methods can be expressed within a unified linear chrominance-projection framework. In this work, a common projection notation is adopted to emphasize implementation consistency across rPPG methods rather than to imply algorithmic equivalence.
Unlike POS, which applies only per-window mean normalization prior to projection, CHROM introduces stronger illumination compensation. The RGB recording is first globally mean-normalized once over the entire signal. Subsequently, for each window, the RGB channels are zero-centered and standardized to equalize variances across channels, thereby improving robustness to illumination drift and large-scale color imbalance.
After applying the CHROM-specific chrominance projection and linear combination—expressed here using the unified notation of Equations (1)–(3)—each window is standardized, Hann-weighted, and reconstructed via overlap-add. The final waveform is then globally mean-centered and normalized to unit variance. While the mathematical expressions are written using the same symbols as POS for clarity, the defining distinction between the two methods lies in their normalization strategies, consistent with the original CHROM formulation.
PBV Method
The PBV algorithm constructs a data-driven projection based on a predefined skin-tone direction [3]. For each window, RGB values are mean-normalized and the local covariance matrix Σ is computed with ridge regularization to ensure numerical stability. Specifically, a small constant ε = 10⁻⁶ is added to the diagonal of Σ prior to inversion. Let u = [0.33, 0.77, 0.53]ᵀ denote the normalized empirical skin-tone vector. The PBV projection vector is obtained as
z = Σ⁻¹u / ‖Σ⁻¹u‖    (4)
The rPPG waveform is then computed by projecting the normalized RGB segment onto z:
S(t) = C(t)z    (5)
Each window is standardized, Hann-weighted, and combined using overlap-add. A third-order Butterworth bandpass filter (0.7–3 Hz) is applied to the reconstructed signal, which is then globally standardized.
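A per-window PBV sketch under the same assumptions as above (illustrative function name; the final Butterworth filtering of the reconstructed signal is omitted here since it operates on the overlap-added waveform, not on single windows):

```python
import numpy as np

def pbv_window(rgb, eps=1e-6):
    """One PBV window: ridge-regularized covariance inversion and
    projection onto the normalized skin-tone direction u."""
    c = rgb / rgb.mean(axis=0) - 1.0            # mean-normalized window
    u = np.array([0.33, 0.77, 0.53])
    u = u / np.linalg.norm(u)
    sigma = np.cov(c.T) + eps * np.eye(3)       # ridge regularization on diagonal
    w = np.linalg.solve(sigma, u)               # Sigma^{-1} u
    z = w / np.linalg.norm(w)                   # normalized projection vector
    s = c @ z                                   # project the window onto z
    return (s - s.mean()) / (s.std() + 1e-12)
```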
Overlap-Add Reconstruction
All three methods employ the same reconstruction strategy. Let w[n] denote the Hann window and S_k[n] the normalized pulse estimate for window k. The final waveform is given by
x[n] = (∑_k S_k[n] w[n]) / (∑_k w[n] + ε)    (6)
where ε is a small constant used to avoid division by zero in samples not fully covered by overlapping windows. The reconstructed rPPG signals are globally normalized prior to feature extraction.
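The Hann-weighted overlap-add reconstruction above can be sketched as follows (a simplified version assuming a fixed hop size in samples and equal-length windows; names are illustrative):

```python
import numpy as np

def overlap_add(windows, hop, eps=1e-12):
    """Hann-weighted overlap-add of per-window pulse estimates.

    `windows` is a (K, N) array of standardized window waveforms placed
    at hop-sample offsets; the output is globally re-standardized.
    """
    k, n = windows.shape
    hann = np.hanning(n)
    total = hop * (k - 1) + n
    num = np.zeros(total)
    den = np.zeros(total)
    for i, seg in enumerate(windows):
        start = i * hop
        num[start:start + n] += seg * hann       # weighted accumulation
        den[start:start + n] += hann             # window-weight normalizer
    x = num / (den + eps)                        # eps avoids division by zero
    return (x - x.mean()) / (x.std() + eps)
```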
Green Channel
The green channel is widely used in rPPG because hemoglobin absorbs light most strongly in the green region of the visible spectrum [5]. As a result, pulsatile blood-volume changes produce the highest signal-to-noise ratio (SNR) in the green channel compared to the red or blue channels. This makes the green trace particularly effective for extracting cardiac-related oscillations, even under varying illumination. Therefore, in addition to the three rPPG methods, the extracted and filtered green-channel segments were also included as input to the machine-learning models.

3.3.2. NIR Dataset

For the NIR recordings, the videos are provided in single-channel grayscale format [13]. A frontal-face Haar cascade classifier [14] was applied to locate the facial region in each frame. Whenever a face was detected, the corresponding bounding box was extracted and used as the region of interest (ROI).
Unlike the RGB datasets, where a single mean value per color channel was computed, the grayscale face ROI in the NIR dataset was uniformly partitioned into a fixed 3 × 3 grid in pixel space, producing nine equally sized rectangular patches after ROI cropping. The patch dimensions were determined by integer division of the ROI height and width, ensuring that all patches have identical pixel size. For each patch, the mean pixel intensity was computed, resulting in a nine-dimensional feature vector that captures coarse spatial intensity variations across the face.
For every video, the nine-patch intensity vectors were concatenated over time, yielding a matrix of shape (T, 9), where T denotes the number of frames. No normalization was applied at the pixel or patch level prior to extraction. Instead, standardization was performed at the feature level as part of the preprocessing pipeline before applying PCA or ICA, ensuring comparable scaling across patch-based temporal signals while preserving their relative spatial intensity relationships.
These matrices were saved as .npy files to maintain consistency with the storage format used for the RGB datasets and to enable uniform downstream processing.
PCA-Based rPPG Extraction
For PCA, the (T, 9) patch matrix was projected into a decorrelated component space [17]. All nine principal components were retained. For each component, a power spectral density estimate was computed using the periodogram. The component exhibiting the strongest spectral peak within the physiological heart-rate band (0.7–3 Hz) was selected as the rPPG estimate. This “maximum-HR-peak” criterion is widely used in BSS-based rPPG methods to identify the source dominated by the pulsatile rhythm.
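The PCA decomposition with maximum-HR-peak component selection can be sketched as below (illustrative function name; feature-level standardization before PCA follows the description in the previous subsection):

```python
import numpy as np
from scipy.signal import periodogram
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_rppg(patch_matrix, fs, band=(0.7, 3.0)):
    """PCA over the (T, 9) NIR patch matrix; keep the component whose
    periodogram peak inside the heart-rate band is strongest."""
    scaled = StandardScaler().fit_transform(patch_matrix)
    comps = PCA(n_components=9).fit_transform(scaled)        # (T, 9)
    best, best_power = None, -np.inf
    for k in range(comps.shape[1]):
        f, pxx = periodogram(comps[:, k], fs=fs)
        mask = (f >= band[0]) & (f <= band[1])
        peak = pxx[mask].max() if mask.any() else -np.inf    # max-HR-peak criterion
        if peak > best_power:
            best, best_power = comps[:, k], peak
    return best
```

The ICA variant follows the same selection loop, with the PCA step replaced by an independent component decomposition.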
ICA-Based rPPG Extraction
The ICA-based method followed an analogous pipeline, but used independent component decomposition to recover statistically independent latent sources [18]. The (T, 9) patch matrix was whitened and separated into nine ICA components. For each component, its periodogram was computed, and the one with the strongest peak in the heart-rate band was chosen as the recovered rPPG signal. As with PCA, the resulting component was filtered to produce the final waveform.
Post-Processing
As described in the Signal Preprocessing section, the selected PCA- or ICA-derived waveform was then subjected to the same preprocessing steps used for the RGB datasets. The signal was first scaled to the range [0, 1] and filtered using a third-order Butterworth bandpass filter (0.7–3 Hz) to isolate the cardiac frequency band. The filtered waveform was subsequently segmented into fixed-length windows using a 50% overlapping sliding window, ensuring consistency with the RGB preprocessing pipeline. This uniform post-processing procedure allows the NIR-derived rPPG signals to be directly comparable to those extracted from the RGB datasets.

3.4. Feature Extraction

For each segmented rPPG waveform, a total of fifteen handcrafted features were extracted. These features were computed independently for the green channel, CHROM, POS, and PBV signals, as well as the PCA- and ICA-derived NIR signals. The feature set included five time-domain descriptors, five frequency-domain descriptors, and five nonlinear dynamical features, described as follows.
(1)
Time-Domain Features
Five standard statistical features were computed directly from the amplitude distribution of the signal segment:
  • Mean of the waveform.
  • Variance as a measure of amplitude dispersion.
  • Skewness, characterizing waveform asymmetry.
  • Kurtosis, describing the heaviness of the signal tails.
  • Lag-1 autocorrelation, computed by normalizing the autocorrelation of the zero-mean signal at a one-sample lag.
These metrics capture the fundamental statistical structure of the temporal waveform and its linear dependencies.
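The five time-domain descriptors can be sketched as a single feature function (illustrative names; the lag-1 autocorrelation normalizes the zero-mean signal as described above):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def time_domain_features(x):
    """Mean, variance, skewness, kurtosis, and lag-1 autocorrelation."""
    z = x - x.mean()                                        # zero-mean signal
    ac1 = np.sum(z[:-1] * z[1:]) / (np.sum(z * z) + 1e-12)  # lag-1 autocorrelation
    return {
        "mean": x.mean(),
        "variance": x.var(),
        "skewness": skew(x),
        "kurtosis": kurtosis(x),
        "autocorr_lag1": ac1,
    }
```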
(2)
Frequency-Domain Features
Frequency-domain characteristics were computed using the periodogram power spectral density (PSD). After normalizing the PSD to unit total power, the following features were extracted:
  • Dominant frequency, corresponding to the PSD peak.
  • Dominant power, i.e., the PSD magnitude at the dominant frequency.
  • Spectral centroid, the power-weighted mean frequency.
  • Spectral entropy, quantifying spectral flatness.
  • Spectral bandwidth, computed as the standard deviation of the spectrum around the centroid.
These features capture oscillatory behavior relevant to heart-rate estimation, including periodicity strength and spectral complexity.
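A sketch of the five spectral descriptors, computed from the unit-power periodogram PSD as described above (function name is illustrative):

```python
import numpy as np
from scipy.signal import periodogram

def frequency_domain_features(x, fs):
    """Dominant frequency/power, spectral centroid, entropy, and bandwidth."""
    f, pxx = periodogram(x, fs=fs)
    p = pxx / (pxx.sum() + 1e-12)              # normalize PSD to unit total power
    centroid = np.sum(f * p)                   # power-weighted mean frequency
    return {
        "dominant_freq": f[np.argmax(p)],
        "dominant_power": p.max(),
        "spectral_centroid": centroid,
        "spectral_entropy": -np.sum(p * np.log(p + 1e-12)),
        "spectral_bandwidth": np.sqrt(np.sum(((f - centroid) ** 2) * p)),
    }
```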
(3)
Nonlinear Features
To characterize dynamical and complexity-related properties of the rPPG signal, five nonlinear features were extracted:
  • Hjorth activity, equivalent to the signal variance [19].
  • Hjorth mobility, describing the mean frequency of the signal based on first derivatives [19].
  • Hjorth complexity, measuring the change in frequency content over time [19].
  • Sample entropy, quantifying the irregularity and unpredictability of the waveform [20].
  • Permutation entropy, a complexity measure based on ordinal pattern statistics [21].
These nonlinear descriptors provide sensitivity to subtle changes in waveform morphology and dynamical structure beyond linear measures.
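The three Hjorth descriptors can be sketched from first and second differences (a standard formulation [19]; sample and permutation entropy are omitted here for brevity, as they are typically taken from dedicated libraries):

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx = np.diff(x)                     # first derivative (per-sample difference)
    ddx = np.diff(dx)                   # second derivative
    activity = np.var(x)                # signal variance
    mobility = np.sqrt(np.var(dx) / (activity + 1e-12))
    complexity = np.sqrt(np.var(ddx) / (np.var(dx) + 1e-12)) / (mobility + 1e-12)
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, since the frequency content does not change over time.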
(4)
Standardization
For machine-learning models requiring scale-invariant inputs (SVR and Linear Regression), all features were standardized using z-score normalization computed from the training set only. The z-score of a feature value x was computed as
z = (x − μ) / σ    (7)
where μ and σ denote the mean and standard deviation of the feature within the training set. Feature scaling was not applied to tree-based models (Random Forest and XGBoost), which are inherently insensitive to feature magnitude.

3.5. Heart-Rate Label Construction

For each segmented video sample, a corresponding ground-truth heart-rate label was computed from the synchronized PPG signal. Because the four datasets provide ground-truth PPG in different formats, two approaches were used.
(1)
Peak-Based Labeling for UBFC-rPPG Part 1, UBFC-rPPG Part 2, and IMVIA-NIR
For the first three datasets, the raw PPG waveforms do not include explicit annotations for systolic peaks. Therefore, peak detection was performed using the NeuroKit2 library [22], which provides a validated implementation of the Elgendi peak detection pipeline. Prior to peak detection, each PPG segment was band-pass filtered using a third-order zero-phase IIR Butterworth filter with cut-off frequencies of 0.5–8 Hz to suppress baseline wander and high-frequency noise while preserving cardiac pulsations. Systolic peaks were then identified using the Elgendi method, which employs adaptive thresholding based on moving-average envelopes of the squared signal, thereby reducing sensitivity to amplitude fluctuations and motion artifacts. Segments with insufficient or failed peak detections were retained, as peak-count–based estimation inherently reflects signal quality degradation; however, short detection gaps (on the order of a few frames) were negligible relative to the segment duration and did not materially affect heart-rate estimation.
Let N_peaks denote the number of PPG peaks in a segment and T the segment duration in minutes. The heart rate (in beats per minute) for that segment was computed as:
HR = N_peaks / T    (8)
This procedure yields an effective heart-rate estimate aligned with the same temporal window used for rPPG feature extraction.
It is worth noting that for short temporal windows, particularly the 10 s segments used in the UBFC datasets, peak-count–based heart-rate estimation is subject to quantization effects. In this setting, a single missed or spurious peak corresponds to a discrete HR change of approximately ±6 BPM, which can manifest as isolated error spikes. While alternative labeling strategies—such as computing HR from the average RR (or PP) interval—can reduce this quantization error when reliable inter-beat annotations are available, peak counting was intentionally adopted in this work to maintain methodological consistency across datasets and to reflect realistic short-window rPPG operating conditions. The impact of this quantization effect is therefore acknowledged and considered when interpreting results on short segments.
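A simplified sketch of the peak-count labeling step: note that the paper uses the NeuroKit2 implementation of the Elgendi detector, whereas this illustration substitutes SciPy's generic `find_peaks` (with a minimum peak spacing corresponding to roughly 180 bpm) as a stand-in.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def hr_label(ppg, fs):
    """Peak-count heart-rate label (Eq. (8)) for one PPG segment."""
    b, a = butter(3, [0.5, 8.0], btype="band", fs=fs)   # 0.5-8 Hz, zero-phase
    filt = filtfilt(b, a, ppg)
    # Generic stand-in for the Elgendi detector used in the paper:
    peaks, _ = find_peaks(filt, distance=int(0.33 * fs))
    minutes = len(ppg) / fs / 60.0
    return len(peaks) / minutes
```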
(2)
Labeling for the VicarPPG-2 Dataset
The VicarPPG-2 dataset includes PPG signals with pre-annotated systolic peak indicators provided for every sample [12]. In this case, peak detection was not required. Instead, the number of peaks inside each segmented PPG window was directly counted using the supplied annotations. The heart-rate label for each segment was then computed identically to the previous datasets using Equation (8).
(3)
Summary
In all datasets, ground-truth labels were constructed by peak counting over the exact temporal extent of each segment, ensuring temporal alignment between the rPPG features and the reference heart-rate values. This peak-counting approach avoids reliance on precomputed dataset-level heart-rate values and allows per-segment labeling consistent with the segmentation scheme used throughout the pipeline.

3.6. Data Augmentation

Given the limited size of the UBFC-rPPG Part 1 and IMVIA-NIR datasets, data augmentation was performed by adding zero-mean Gaussian noise to each extracted rPPG segment [23]. The noise standard deviation was fixed to 10⁻³ and was applied uniformly across all segments, independent of the underlying signal variance. Feature extraction was then applied to the augmented segments using the same procedure as for the original data. The ground-truth labels remained unchanged, as they were derived directly from the reference sensor recordings.
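The augmentation step amounts to duplicating each segment with additive Gaussian noise of fixed standard deviation, as sketched below (function name and the fixed seed are illustrative):

```python
import numpy as np

def augment_segments(segments, sigma=1e-3, seed=42):
    """Append one noisy copy of each rPPG segment.

    `sigma` matches the fixed noise standard deviation reported above;
    the labels of the noisy copies are inherited unchanged.
    """
    rng = np.random.default_rng(seed)
    noisy = segments + rng.normal(0.0, sigma, size=segments.shape)
    return np.concatenate([segments, noisy], axis=0)
```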

3.7. Machine-Learning Regression and Evaluation

The resulting feature set was divided into training and testing subsets using an 80/20 split. This split was implemented such that video segments originating from the same recording never appeared in both the training and testing sets, thereby preventing temporal leakage and ensuring a realistic evaluation of generalization performance. Four regression models were evaluated: Random Forest Regressor [6], XGBoost Regressor [7], Support Vector Regression (SVR) [8], and Linear Regression [9,10].
To ensure full repeatability of the benchmarking framework, all model hyperparameters, random seeds, and software versions were explicitly fixed and reported. The evaluated models were configured as follows:
  • Random Forest Regressor: n_estimators = 200, max_depth = None, random_state = 42, n_jobs = −1.
  • Support Vector Regression (SVR): RBF kernel with C = 10 and epsilon = 0.1. All remaining parameters were kept at scikit-learn default values, including gamma = “scale”.
  • Linear Regression: Default scikit-learn configuration.
  • XGBoost Regressor: n_estimators = 300, learning_rate = 0.05, max_depth = 5, random_state = 42, n_jobs = −1. All other parameters were left at XGBoost default values.
The random seed (random_state = 42) was fixed for all stochastic models to ensure deterministic behavior across runs.
Each model was trained independently for every rPPG extraction method, and performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²).
To assess feature relevance, permutation feature importance (PFI) was computed for each trained model using repeated random permutations on a held-out test set. Feature importance was quantified as the mean change in prediction error (negative mean squared error) over multiple permutation repeats. Features exhibiting consistently negative importance—indicating that their permutation led to slight performance improvements—were interpreted as weak, noisy, or redundant and were removed in a controlled manner, after which models were retrained to assess performance changes.
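The evaluation loop for one model can be sketched as below, using the reported Random Forest configuration and scikit-learn's `permutation_importance` with negative-MSE scoring on a held-out set. The feature matrix and labels here are synthetic stand-ins (the real inputs are the handcrafted features and peak-count HR labels); the `keep` mask illustrates the removal of negative-PFI features before retraining.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the 15-feature matrix and heart-rate labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 15))
y = 70 + 10 * X[:, 0] + rng.normal(scale=1.0, size=200)  # HR driven by feature 0

X_tr, X_te, y_tr, y_te = X[:160], X[160:], y[:160], y[160:]

model = RandomForestRegressor(n_estimators=200, max_depth=None,
                              random_state=42, n_jobs=-1).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))

# Permutation feature importance on the held-out set (neg. MSE scoring).
pfi = permutation_importance(model, X_te, y_te, n_repeats=10,
                             scoring="neg_mean_squared_error", random_state=42)
keep = pfi.importances_mean > 0        # drop consistently negative-PFI features
```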
All experiments were conducted using the following software environment:
  • Python: 3.13.5
  • NumPy: 2.1.3
  • SciPy: 1.15.3
  • pandas: 2.2.3
  • scikit-learn: 1.6.1
  • XGBoost: 3.1.1

4. Results

The performance of all rPPG extraction methods, feature sets, and regression models was evaluated across the four datasets using MAE, RMSE, and R². To ensure fair and comparable evaluation, each dataset was processed using its native segmentation duration, while differences in frame rate were explicitly accounted for by reporting the corresponding segment lengths in samples. Specifically, 10-s segments were used for UBFC-rPPG Part 1 and Part 2, corresponding to 300 samples each. For the IMVIA-NIR dataset, 20-s segments were used, corresponding to 400 samples. For the VicarPPG-2 dataset, both 20-s and 25-s segments were evaluated, corresponding to 1200 and 1500 samples, respectively.
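The segment lengths in samples follow directly from duration times frame rate, as this quick check of the reported values shows:

```python
# Segment length in samples = segment duration (s) * frame rate (fps),
# using the per-dataset values reported in the text.
segment_spec = {
    "UBFC-rPPG Part 1": (10, 30),
    "UBFC-rPPG Part 2": (10, 30),
    "IMVIA-NIR": (20, 20),
    "VicarPPG-2 (20 s)": (20, 60),
    "VicarPPG-2 (25 s)": (25, 60),
}
segment_samples = {k: sec * fps for k, (sec, fps) in segment_spec.items()}
```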
Table 2, Table 3, Table 4, Table 5 and Table 6 summarize the results obtained using both the full 15-feature set and the PFI-reduced subsets. Immediately after each table, we list the positive-mean PFI features in descending order for the CHROM method under the two best-performing models overall—Random Forest and XGBoost. These ranked feature lists correspond exactly to the “PFI-Reduced Features” columns shown in the tables and provide a transparent view of which features contributed most strongly to the improved performance obtained with permutation-based feature selection. Across datasets, clear performance differences emerged between rPPG methods and regression models, while permutation-based feature reduction influenced performance to varying degrees. The following tables present these quantitative comparisons in detail.
  • UBFC-rPPG Part 1—Top Features by Positive Mean PFI for CHROM
    • RF: Dominant Frequency, Skewness, Sample Entropy
    • XGB: Dominant Frequency, Skewness, Mean, Sample Entropy, Autocorrelation (lag = 1), Spectral Bandwidth, Dominant Power, Spectral Centroid
  • UBFC-rPPG Part 2—Top Features by Positive Mean PFI for CHROM
    • RF: Dominant Frequency, Dominant Power, Sample Entropy, Permutation Entropy, Spectral Entropy, Skewness, Mean, Kurtosis, Hjorth Activity
    • XGB: Dominant Frequency, Dominant Power, Spectral Centroid, Autocorrelation (lag = 1), Sample Entropy, Spectral Entropy, Variance, Hjorth Complexity, Skewness
  • IMVIA-NIR—Top Features by Positive Mean PFI for PCA
    • RF: Skewness, Permutation Entropy, Mean, Sample Entropy, Variance
    • XGB: Skewness, Variance, Mean, Kurtosis, Dominant Power, Spectral Bandwidth, Sample Entropy
  • VicarPPG-2 (25 s)—Top Features by Positive Mean PFI for CHROM
    • RF: Dominant Frequency, Spectral Centroid, Dominant Power, Kurtosis, Skewness, Hjorth Activity, Variance, Autocorrelation (lag = 1)
    • XGB: Dominant Frequency, Spectral Centroid, Skewness, Variance, Kurtosis, Dominant Power, Autocorrelation (lag = 1)
  • VicarPPG-2 (20 s)—Top Features by Positive Mean PFI for CHROM
    • RF: Dominant Frequency, Spectral Centroid, Sample Entropy, Skewness, Autocorrelation (lag = 1), Hjorth Activity, Hjorth Mobility, Variance, Hjorth Complexity
    • XGB: Dominant Frequency, Spectral Centroid, Variance, Autocorrelation (lag = 1), Skewness, Spectral Entropy, Sample Entropy

5. Discussion

The results across all four datasets consistently demonstrate that the CHROM algorithm is the most reliable and robust rPPG extraction method in the context of classical machine-learning–based heart-rate regression. Across UBFC-rPPG Part 1, UBFC-rPPG Part 2, and VicarPPG-2, CHROM outperformed POS, PBV, and the standalone green-channel signal for nearly every model and evaluation metric. This is particularly evident for the tree-based regressors (Random Forest and XGBoost), where CHROM achieved the lowest MAE and RMSE and the highest R² values. The superior performance of CHROM can be attributed to its stronger illumination compensation and variance-normalization procedures, which stabilize the chrominance components under varying lighting and skin-tone conditions. These results are consistent with previous findings that attribute CHROM’s robustness to its built-in color-balance normalization, making it particularly effective on RGB recordings with realistic illumination variation.
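For concreteness, the core CHROM projection (after de Haan and Jeanne [1]) can be sketched as below. This is a simplified version that omits the bandpass filtering and windowed overlap-add of full implementations, and the synthetic input is illustrative only:

```python
import numpy as np

def chrom_pulse(rgb):
    """CHROM chrominance projection (after de Haan & Jeanne [1]).

    rgb: (T, 3) array of per-frame mean R, G, B values of the skin ROI.
    Bandpass filtering and overlap-add windowing are omitted for brevity.
    """
    norm = rgb / rgb.mean(axis=0)        # temporal (DC) normalization
    r, g, b = norm[:, 0], norm[:, 1], norm[:, 2]
    x = 3.0 * r - 2.0 * g                # chrominance projection 1
    y = 1.5 * r + g - 1.5 * b            # chrominance projection 2
    alpha = x.std() / y.std()            # variance normalization ("alpha tuning")
    return x - alpha * y

# Synthetic check: a 1.2 Hz (72 BPM) pulsatile modulation on a constant color.
t = np.arange(300) / 30.0                # 10 s at 30 fps
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * t)
rgb = np.stack([150 * (1 + 0.2 * pulse),
                100 * (1 + 0.5 * pulse),
                80 * (1 + 0.1 * pulse)], axis=1)
s = chrom_pulse(rgb)

freqs = np.fft.rfftfreq(len(s), d=1 / 30.0)
peak_hz = freqs[np.abs(np.fft.rfft(s - s.mean())).argmax()]
```

The fixed projections cancel intensity changes common to all three channels, which is the illumination-compensation property discussed above.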
In contrast to the RGB datasets, the IMVIA-NIR recordings required a different strategy for rPPG extraction due to their single-channel grayscale nature. Here, the PCA-based approach proved substantially more effective than ICA across all models. PCA combined with spatial patch decomposition leveraged the structured variation present in the 3 × 3 grid of facial patches, consistently recovering components with strong cardiac periodicity. ICA, while theoretically capable of isolating independent sources, tended to introduce instability and noise amplification due to the limited dimensionality and the absence of color information. The clear superiority of PCA highlights the importance of spatial redundancy when working with NIR data, where color cues are absent and the pulsatile variations must instead be extracted from subtle spatial reflectance modulations.
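A minimal sketch of this PCA-over-patches strategy, assuming the frames have already been cropped to the face region; the patch grid follows the 3 × 3 layout described above, while the synthetic clip is illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_pulse_from_patches(frames, grid=3):
    """PCA over a grid x grid spatial patch decomposition of a grayscale ROI.

    frames: (T, H, W) array with the cropped NIR face region. Returns the
    first principal component of the patch-mean time series, which in this
    sketch is assumed to carry the strongest shared (cardiac) variation.
    """
    T, H, W = frames.shape
    hs, ws = H // grid, W // grid
    # One spatial mean per patch per frame -> (T, grid*grid) matrix.
    traces = np.stack([
        frames[:, i * hs:(i + 1) * hs, j * ws:(j + 1) * ws].mean(axis=(1, 2))
        for i in range(grid) for j in range(grid)
    ], axis=1)
    traces = traces - traces.mean(axis=0)          # remove per-patch DC level
    return PCA(n_components=1).fit_transform(traces)[:, 0]

# Synthetic NIR-like clip: shared pulsatile brightness plus per-pixel noise.
rng = np.random.default_rng(0)
t = np.arange(400) / 20.0                          # 20 s at 20 fps
pulse = np.sin(2 * np.pi * 1.0 * t)                # 60 BPM
frames = 120 + 0.5 * pulse[:, None, None] + rng.normal(0, 0.2, (400, 36, 36))
component = pca_pulse_from_patches(frames)
```

Spatial averaging within each patch suppresses independent pixel noise, while PCA extracts the brightness variation shared across patches, which is exactly the spatial redundancy argument made above.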
Negative R² values indicate that a model performs worse than a constant predictor equal to the mean ground-truth heart rate, reflecting cases where the extracted rPPG signal contains insufficient physiological information for reliable estimation. The strongly negative R² values observed for some classical methods (e.g., Green, PBV, and POS on VicarPPG-2) are primarily caused by severe motion and illumination artifacts that degrade signal quality rather than by deficiencies in the regression models themselves. Since all methods were evaluated under identical conditions, relative performance comparisons remain meaningful even when absolute R² values are negative, and simple baselines (mean-HR and dominant-frequency HR) help contextualize when learning-based models provide benefit beyond trivial estimators.
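The two trivial baselines named above can be sketched as follows; the function names are ours, and the 0.7–4 Hz cardiac band (42–240 BPM) is a common choice rather than a value taken from the paper:

```python
import numpy as np

def mean_hr_baseline(y_train, n_test):
    """Constant predictor: the mean training heart rate (BPM)."""
    return np.full(n_test, np.mean(y_train))

def dominant_freq_hr(signal, fs):
    """HR from the dominant spectral peak of an rPPG segment, restricted
    to a plausible cardiac band (0.7-4 Hz, i.e. 42-240 BPM)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec = np.abs(np.fft.rfft(signal - np.mean(signal)))
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spec[band])]

# A clean 72 BPM sinusoid sampled at 30 fps for 10 s.
t = np.arange(300) / 30.0
hr = dominant_freq_hr(np.sin(2 * np.pi * 1.2 * t), fs=30.0)
```

A learned regressor only adds value when it beats both of these estimators on held-out recordings.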
Despite these encouraging results, several challenges inherent to the datasets must be acknowledged. Most recordings—particularly UBFC-rPPG Part 1 and IMVIA-NIR—contain a very limited number of subjects and only a few minutes of video per participant. Even with segment-level augmentation based on Gaussian noise, the overall diversity of physiological states, lighting conditions, and motion patterns remains limited. This restricts the capacity of classical machine-learning models to generalize beyond the specific subjects and recording configurations present in each dataset. Furthermore, because we ensured strict separation of training and test segments by enforcing that no segments from the same video appear in both sets, cross-segment temporal leakage was eliminated at the cost of increased regression difficulty. This conservative split strategy likely contributed to the relatively low or sometimes negative R² values observed in several models, especially those constrained by linear assumptions (e.g., Linear Regression).
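A sketch of segment-level Gaussian-noise augmentation of the kind described above; the noise level and number of copies are illustrative, not the values used in the study:

```python
import numpy as np

def augment_segments(segments, noise_std=0.01, copies=2, seed=42):
    """Enlarge a small training set by adding Gaussian noise to each segment.

    segments: (N, L) array of rPPG segments. Returns (N * (copies + 1), L),
    with the originals kept in front of the noisy copies.
    """
    rng = np.random.default_rng(seed)
    out = [segments]
    for _ in range(copies):
        out.append(segments + rng.normal(0.0, noise_std, segments.shape))
    return np.concatenate(out, axis=0)

segments = np.random.default_rng(0).normal(size=(5, 300))
augmented = augment_segments(segments)
```

Noise augmentation multiplies the sample count but, as noted above, cannot add new physiological states, lighting conditions, or motion patterns.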
The limitations of classical machine-learning models themselves also played a significant role. Although Random Forest and XGBoost captured nonlinear dependencies effectively, they lack temporal modeling capabilities and must rely entirely on handcrafted features. Physiological signals such as rPPG waveforms contain rich temporal structure and nonlinear oscillatory patterns that cannot be fully represented by summary statistics alone, regardless of how carefully the features are curated. This limitation is particularly evident in datasets with more motion or illumination variability, where handcrafted features can fail to encode subtle temporal cues essential for stable pulse extraction.
Permutation Feature Importance (PFI) provided additional insight into which features most strongly supported accurate predictions. Across datasets, the dominant frequency, skewness, sample entropy, spectral centroid, and autocorrelation emerged repeatedly among the highest-ranked features for the CHROM method. These results suggest that both frequency-domain periodicity (e.g., dominant frequency, dominant power) and nonlinear dynamical descriptors (e.g., entropy measures, Hjorth parameters) play a central role in capturing the quality of the pulsatile signal extracted by CHROM and PCA. Moreover, removing negatively contributing features via PFI-based selection often improved performance, indicating that some handcrafted descriptors introduced noise rather than providing discriminative value.
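Two of the recurring descriptors, the Hjorth parameters [19] and the spectral centroid, can be computed as below. These are minimal sketches of the standard definitions; the paper's exact implementations may differ:

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity [19] of a 1-D signal."""
    dx, ddx = np.diff(x), np.diff(x, n=2)
    activity = np.var(x)                                  # signal power
    mobility = np.sqrt(np.var(dx) / np.var(x))            # mean frequency proxy
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def spectral_centroid(x, fs):
    """Amplitude-weighted mean frequency of the one-sided spectrum (Hz)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec = np.abs(np.fft.rfft(x - np.mean(x)))
    return float(np.sum(freqs * spec) / np.sum(spec))

# A pure 1.2 Hz tone: complexity near 1 (a sinusoid is maximally "simple"),
# and the spectral centroid sits at the tone frequency.
t = np.arange(300) / 30.0
tone = np.sin(2 * np.pi * 1.2 * t)
act, mob, comp = hjorth_parameters(tone)
centroid = spectral_centroid(tone, fs=30.0)
```

For a clean pulsatile signal, the centroid converges on the dominant cardiac frequency, which is consistent with these two descriptors ranking highly together in the PFI analysis.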

6. Conclusions

This study demonstrates that, despite their simplicity and interpretability, classical machine-learning pipelines exhibit inherent limitations when applied to physiological waveform regression—particularly under the small-sample conditions that characterize current rPPG datasets. Nevertheless, two consistent trends emerged across all evaluated datasets. In the RGB modality, CHROM consistently achieved the strongest performance, confirming the effectiveness of chrominance-based normalization for mitigating illumination variability and recovering robust pulse signals. In the NIR modality, the PCA-based approach substantially outperformed ICA, indicating that spatial patch decomposition combined with principal-component selection is a reliable strategy for extracting pulsatile information from grayscale recordings.
Beyond classical machine learning, the proposed framework provides a structured foundation for future methodological extensions. By standardizing ROI extraction, signal preprocessing, segmentation, and label generation across modalities, the pipeline can be directly extended to deep-learning architectures or hybrid algorithmic–learning approaches. In particular, algorithmically extracted rPPG signals (e.g., CHROM or PCA outputs) may serve as structured inputs to learning-based models, while permutation feature-importance analysis can guide feature selection or architectural design in data-limited settings.
Several promising directions emerge for future work. First, the NIR modality could be enriched by incorporating additional grayscale-specific rPPG extraction strategies, such as local reflectance-based or motion-aware methods. Second, more advanced data-augmentation techniques—spanning temporal perturbations and scenario-level variations—could be explored to improve generalization in small-sample regimes. Third, explicit temporal modeling, including lightweight autoregressive models or learned temporal encoders, may further exploit the sequential structure of rPPG signals beyond fixed-window feature extraction. Finally, multimodal fusion strategies combining RGB and NIR information at the feature or decision level represent a natural extension of the unimodal baselines established in this work.
The strict separation of training and testing segments adopted in this study further highlights the need for larger, more diverse rPPG datasets to support robust generalization for both classical and learning-based models. Overall, the results confirm that algorithmic rPPG methods combined with structured preprocessing constitute a strong and interpretable baseline for advancing future research in remote physiological monitoring.
Finally, the proposed benchmarking framework supports sustainability by improving the reliability and reproducibility of camera-based heart-rate monitoring. By reducing dependence on specialized medical hardware and enabling scalable, non-contact sensing, such standardized evaluation pipelines contribute to accessible health-monitoring solutions and broader deployment of early physiological risk-detection systems.
The practical utility of this framework is exemplified in two primary domains. First, in telehealth and remote patient monitoring, standardized evaluation ensures that rPPG-derived vitals achieve the clinical consistency necessary for diagnostic support, allowing healthcare providers to monitor chronic conditions via existing consumer devices. This software-centric approach reduces the environmental footprint associated with the manufacturing and disposal of short-lifecycle medical wearables. Second, in occupational safety and automotive monitoring, the framework facilitates the deployment of non-intrusive systems capable of detecting physiological stress or fatigue. By providing a validated, reproducible pathway for these technologies, the framework supports a sustainable health infrastructure that prioritizes proactive risk detection through pervasive, low-power sensing rather than resource-intensive clinical interventions.

Author Contributions

Conceptualization, S.Q., G.A.J. and A.A.; Methodology, S.Q., G.A.J. and A.A.; Software, G.A.J. and A.A.; Validation, G.A.J. and A.A.; Formal analysis, S.Q., G.A.J. and A.A.; Investigation, S.Q., G.A.J. and A.A.; Resources, S.Q., G.A.J. and A.A.; Data curation, S.Q., G.A.J. and A.A.; Writing—original draft, S.Q., G.A.J. and A.A.; Writing—review & editing, S.Q., G.A.J. and A.A.; Visualization, S.Q., G.A.J. and A.A.; Supervision, S.Q.; Project administration, S.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in our study are available at the following links: 1. UBFC-rPPG Part 1 and Part 2 Datasets: https://sites.google.com/view/ybenezeth/ubfcrppg#h.p_7WE4Ard4uPlU (accessed on 12 December 2025); 2. IMVIA-NIR Dataset: https://sites.google.com/view/ybenezeth/imvia-nir?authuser=0 (accessed on 12 December 2025); 3. VicarPPG-2 Dataset: https://docs.google.com/forms/d/e/1FAIpQLScwnW_D5M4JVovPzpxA0Bf1ZCTaG5vh7sYu48I0MVSpgltvdw/viewform (accessed on 12 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. de Haan, G.; Jeanne, V. Robust Pulse Rate from Chrominance-Based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886.
  2. Premkumar, S.; Hemanth, D.J. Intelligent Remote Photoplethysmography-Based Methods for Heart Rate Estimation from Face Videos: A Survey. Informatics 2022, 9, 57.
  3. de Haan, G.; van Leest, A. Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiol. Meas. 2014, 35, 1913.
  4. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774.
  5. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445.
  6. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  7. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794.
  8. Moguerza, J.M.; Muñoz, A. Support vector machines with applications. Stat. Sci. 2006, 21, 322–336.
  9. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009.
  10. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013.
  11. Bobbia, S.; Macwan, R.; Benezeth, Y.; Mansouri, A.; Dubois, J. Unsupervised skin tissue segmentation for remote photoplethysmography. Pattern Recognit. Lett. 2019, 124, 82–90.
  12. Gudi, A.; Bittner, M.; van Gemert, J. Real-Time Webcam Heart-Rate and Variability Estimation with Clean Ground Truth for Evaluation. Appl. Sci. 2020, 10, 8630.
  13. Benezeth, Y.; Krishnamoorthy, D.; Botina Monsalve, D.J.; Nakamura, K.; Gomez, R.; Mitéran, J. Video-based heart rate estimation from challenging scenarios using synthetic video generation. Biomed. Signal Process. Control 2024, 96, 106598.
  14. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, pp. 511–518.
  15. Proakis, J.G.; Manolakis, D.G. Digital Signal Processing: Principles, Algorithms, and Applications, 4th ed.; Pearson Prentice Hall: London, UK, 2006.
  16. Wang, W.; den Brinker, A.C.; Stuijk, S.; de Haan, G. Algorithmic Principles of Remote PPG. IEEE Trans. Biomed. Eng. 2017, 64, 1479–1491.
  17. Lewandowska, M.; Rumiński, J.; Kocejko, T.; Nowak, J. Measuring Pulse Rate with a Webcam—A Non-contact Method for Evaluating Cardiac Activity. In Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS), Szczecin, Poland, 18–21 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 405–410.
  18. Poh, M.Z.; McDuff, D.; Picard, R. Advancements in Noncontact, Multiparameter Physiological Measurements Using a Webcam. IEEE Trans. Biomed. Eng. 2010, 58, 7–11.
  19. Hjorth, B. EEG analysis based on time domain properties. Electroencephalogr. Clin. Neurophysiol. 1970, 29, 306–310.
  20. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol.-Heart Circ. Physiol. 2000, 278, H2039–H2049.
  21. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102.
  22. Makowski, D.; Pham, T.; Lau, Z.J.; Brammer, J.C.; Lespinasse, F.; Pham, H.; Schölzel, C.; Chen, S.H.A. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav. Res. Methods 2021, 53, 1689–1696.
  23. Kim, G.I.; Chung, K. Extraction of Features for Time Series Classification Using Noise Injection. Sensors 2024, 24, 6402.
Table 1. Overview of the benchmark datasets used in this study.

Dataset            Subjects  Recordings  Frame Rate (fps)  Sensor Rate (Hz)  Total Length (min)
UBFC-rPPG Part 1   6         7           30                30                8.4
UBFC-rPPG Part 2   42        42          30                60                45.2
VicarPPG-2         10        40          60                60                212.3
IMVIA-NIR          10        20          20                64                24.2
Table 2. Evaluation Results of Models across Feature Sets on UBFC Part-1 Dataset.

                All Features             PFI-Reduced Features
Method  Model   MAE     RMSE    R²       MAE     RMSE    R²
Green   RF      6.21    8.78    −0.245   6.14    8.67    −0.213
        SVR     7.12    8.66    −0.212   5.71    7.28    0.144
        LR      10.20   16.52   −3.402   10.27   16.04   −3.152
        XGB     6.55    8.96    −0.295   6.29    8.77    −0.241
Chrom   RF      4.09    4.85    0.621    3.84    4.74    0.638
        SVR     6.47    7.85    0.004    3.88    4.65    0.651
        LR      5.96    7.62    0.063    5.82    7.37    0.123
        XGB     4.60    5.66    0.484    4.54    5.62    0.491
PBV     RF      5.65    7.85    0.006    5.46    7.70    0.044
        SVR     9.63    11.78   −1.239   8.55    10.39   −0.741
        LR      7.76    9.90    −0.582   7.78    9.93    −0.591
        XGB     6.10    8.17    −0.077   5.86    7.72    0.038
POS     RF      8.47    10.75   −0.865   8.21    10.30   −0.711
        SVR     8.38    10.71   −0.853   8.09    10.44   −0.761
        LR      8.48    10.45   −0.764   8.24    10.21   −0.682
        XGB     10.88   12.93   −1.698   9.44    11.67   −1.198
Note: MAE: Mean Absolute Error; RMSE: Root Mean Square Error; R²: Coefficient of Determination; RF: Random Forest; SVR: Support Vector Regression; LR: Linear Regression; XGB: Extreme Gradient Boosting. Red colored values indicate the best performance (lowest MAE/RMSE, highest R²) for a given Feature Set and Condition.
Table 3. Evaluation Results of Models across Feature Sets on UBFC Part-2 Dataset.

                All Features             PFI-Reduced Features
Method  Model   MAE     RMSE    R²       MAE     RMSE    R²
Green   RF      5.49    7.35    0.398    5.85    7.68    0.343
        SVR     7.31    9.19    0.058    5.40    6.77    0.488
        LR      7.56    9.39    0.016    7.34    8.95    0.107
        XGB     5.91    7.74    0.332    6.00    7.73    0.333
Chrom   RF      4.73    6.02    0.596    4.74    5.97    0.602
        SVR     6.30    8.11    0.267    5.14    6.47    0.533
        LR      6.60    8.96    0.104    6.62    8.94    0.108
        XGB     5.24    6.22    0.568    4.60    5.57    0.654
PBV     RF      8.40    10.24   −0.169   8.07    9.85    −0.082
        SVR     8.84    11.12   −0.379   8.51    10.28   −0.178
        LR      7.43    9.33    0.030    7.43    9.33    0.030
        XGB     9.64    11.37   −0.443   8.95    11.20   −0.399
POS     RF      7.55    9.29    0.038    7.60    9.33    0.029
        SVR     7.10    9.09    0.077    6.81    8.38    0.217
        LR      6.76    8.75    0.145    6.93    9.01    0.095
        XGB     8.02    10.14   −0.147   8.79    10.52   −0.236
Note: MAE: Mean Absolute Error; RMSE: Root Mean Square Error; R²: Coefficient of Determination; RF: Random Forest; SVR: Support Vector Regression; LR: Linear Regression; XGB: Extreme Gradient Boosting. Red colored values indicate the best performance (lowest MAE/RMSE, highest R²) for a given Feature Set and Condition.
Table 4. Evaluation Results of Models across Feature Sets on IMVIA-NIR Dataset.

                All Features             PFI-Reduced Features
Method  Model   MAE     RMSE    R²       MAE     RMSE    R²
PCA     RF      3.25    4.25    −0.040   2.77    3.45    0.315
        SVR     3.76    6.02    −1.087   4.66    5.72    −0.885
        LR      3.95    5.90    −1.002   4.21    6.14    −1.171
        XGB     2.65    3.75    0.190    2.30    3.26    0.388
ICA     RF      4.92    5.89    −0.996   4.42    5.65    −0.842
        SVR     4.57    5.71    −0.876   4.11    5.11    −0.506
        LR      4.56    5.32    −0.631   4.56    5.49    −0.735
        XGB     4.98    6.48    −1.421   4.71    5.56    −0.782
Note: MAE: Mean Absolute Error; RMSE: Root Mean Square Error; R²: Coefficient of Determination; RF: Random Forest; SVR: Support Vector Regression; LR: Linear Regression; XGB: Extreme Gradient Boosting. Red colored values indicate the best performance (lowest MAE/RMSE, highest R²) for a given Feature Set and Condition.
Table 5. Evaluation Results of Models across Feature Sets on the VicarPPG-2 Dataset (25-s segment length).

                All Features             PFI-Reduced Features
Method  Model   MAE     RMSE    R²       MAE     RMSE    R²
Green   RF      12.10   15.59   −0.651   12.10   15.59   −0.651
        SVR     12.16   16.32   −0.809   11.74   14.30   −0.388
        LR      11.45   15.47   −0.624   11.35   15.06   −0.539
        XGB     12.27   16.30   −0.803   12.75   16.36   −0.818
Chrom   RF      6.20    9.76    0.353    6.00    9.26    0.418
        SVR     8.01    11.43   0.113    6.78    9.85    0.341
        LR      9.38    12.91   −0.131   9.40    12.90   −0.130
        XGB     6.62    10.17   0.298    6.27    9.41    0.399
PBV     RF      11.00   14.41   −0.410   10.89   14.17   −0.362
        SVR     10.51   14.26   −0.381   10.97   14.22   −0.372
        LR      10.77   14.49   −0.425   10.52   13.95   −0.320
        XGB     11.90   15.58   −0.648   10.67   14.36   −0.400
POS     RF      11.05   14.13   −0.355   10.43   13.38   −0.215
        SVR     11.26   13.95   −0.321   11.43   14.33   −0.393
        LR      10.59   13.57   −0.250   10.40   13.18   −0.179
        XGB     11.69   15.06   −0.539   10.49   13.32   −0.204
Note: MAE: Mean Absolute Error; RMSE: Root Mean Square Error; R²: Coefficient of Determination; RF: Random Forest; SVR: Support Vector Regression; LR: Linear Regression; XGB: Extreme Gradient Boosting. Red colored values indicate the best performance (lowest MAE/RMSE, highest R²) for a given Feature Set and Condition.
Table 6. Evaluation Results of Models across Feature Sets on the VicarPPG-2 Dataset (20-s segment length).

                All Features             PFI-Reduced Features
Method  Model   MAE     RMSE    R²       MAE     RMSE    R²
Green   RF      12.14   15.79   −0.597   12.66   16.24   −0.690
        SVR     12.09   15.63   −0.564   11.22   14.98   −0.438
        LR      11.92   15.39   −0.517   11.54   14.99   −0.439
        XGB     12.50   16.44   −0.732   12.79   16.06   −0.652
Chrom   RF      6.63    10.93   0.235    6.76    10.47   0.297
        SVR     7.82    11.64   0.132    7.11    10.83   0.249
        LR      8.57    11.69   0.125    8.24    11.54   0.147
        XGB     7.33    11.51   0.152    6.73    10.81   0.251
PBV     RF      11.62   15.88   −0.615   10.64   14.19   −0.290
        SVR     12.88   16.86   −0.820   12.88   16.86   −0.820
        LR      11.61   15.90   −0.619   11.21   15.28   −0.495
        XGB     12.27   16.33   −0.708   11.57   15.13   −0.466
POS     RF      11.05   14.07   −0.268   10.84   13.94   −0.245
        SVR     10.57   13.71   −0.205   10.97   13.77   −0.215
        LR      10.47   13.40   −0.151   10.32   13.16   −0.109
        XGB     11.11   14.46   −0.339   11.44   14.97   −0.435
Note: MAE: Mean Absolute Error; RMSE: Root Mean Square Error; R²: Coefficient of Determination; RF: Random Forest; SVR: Support Vector Regression; LR: Linear Regression; XGB: Extreme Gradient Boosting. Red colored values indicate the best performance (lowest MAE/RMSE, highest R²) for a given Feature Set and Condition.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
