1. Introduction
Remote photoplethysmography (rPPG) is a non-contact technique that estimates heart rate by analyzing subtle blood-volume-induced variations in skin color captured in camera videos. Because it does not require wearing sensors, rPPG can be deployed in scenarios where wearable devices are impractical or restricted, and it has therefore been actively studied for healthcare monitoring, driver-state assessment, and remote physiological sensing applications [1, 2].
Despite its potential, rPPG is highly sensitive to recording conditions due to its reliance on optical reflectance. In particular, motion-induced artifacts caused by subtle subject movements, illumination imbalance between the camera and the skin, and changes in ambient lighting are major factors that degrade rPPG signal quality. These factors have long been recognized as key challenges that hinder the reliable use of rPPG in real-world environments. Motion artifacts are especially difficult to address because they comprise multiple irregular and aperiodic components, making them hard to quantify with a single signal model or to remove with a single unified strategy [3, 4].
In contrast, illumination variations are relatively more amenable to physical and mathematical modeling, as the illumination component can be separated from the reflectance or chromatic components. Accordingly, a broad range of studies has explored illumination handling, including formula-driven approaches based on Retinex theory that jointly correct color and intensity [5, 6] and model-based illumination normalization methods that normalize brightness or mitigate lighting effects [7, 8]. More recently, incorporating illumination correction into rPPG pipelines has also been considered to improve robustness under changing lighting conditions.
However, most prior illumination-normalization-based rPPG studies primarily emphasize visual quality enhancement or color restoration accuracy. Comparatively less attention has been paid to whether these procedures preserve or enhance the temporally subtle chromatic fluctuations driven by blood perfusion, which are the core information source for rPPG. While normalization can stabilize global color distributions, it may simultaneously attenuate or distort the minute color variations associated with pulsatile blood flow [9].
To overcome these limitations, recent efforts have also explored applying deep learning-based illumination normalization or image restoration models to rPPG. Although learning-based approaches can be robust under complex lighting, their high computational cost and long inference latency often limit their practicality for rPPG applications that require real-time processing. This limitation is particularly critical for mobile or embedded deployment, where model compactness and latency are constraints as important as estimation accuracy [10, 11].
Motivated by these considerations, this study proposes a lightweight illumination normalization method specifically designed for rPPG, with the goal of improving robustness to illumination changes while maintaining real-time feasibility. Conventional illumination normalization methods mainly target perceptual enhancement or global color restoration and treat inter-channel ratio preservation as a byproduct; the proposed method instead enforces preservation of the inter-channel RGB ratio structure associated with pulsatile blood-flow signals as a primary design objective. This distinction matters because rPPG relies on subtle inter-channel temporal variations rather than absolute intensity, so preserving this structure directly affects physiological signal fidelity. By mathematically separating chromatic information from illumination and applying normalization only to the brightness component, the proposed method suppresses lighting-induced luminance fluctuations while preserving the temporal chromatic stability that is critical for physiological signal extraction. Because it does not require training and is computationally efficient, it can serve as a practical preprocessing strategy for improving the temporal stability and reliability of rPPG signals across diverse lighting environments.
3. Method
Figure 1 presents the overall framework of the proposed method, describing the end-to-end pipeline designed for robust rPPG extraction under diverse real-world illumination conditions.
3.1. Dataset
To evaluate the robustness of the proposed rPPG signal extraction method against illumination variations, this study employs two publicly available datasets constructed under distinct environmental conditions: DLCN [21] and MR-NIRP [22]. The DLCN dataset was collected in a controlled environment with systematically varying lighting conditions, whereas the MR-NIRP dataset was acquired in an uncontrolled driving environment. These contrasting settings enable a comprehensive assessment of the generalization capability of the proposed method under diverse illumination scenarios. A comparison of the key characteristics of the two datasets is summarized in Table 2. All facial images presented in this study were obtained from these publicly available datasets and were used within the scope of the datasets’ original ethical approval and informed consent procedures.
3.1.1. DLCN
The DLCN dataset is a recent publicly available benchmark constructed to quantitatively evaluate the performance of rPPG signal extraction under complex and diverse nighttime illumination conditions. It consists of RGB facial videos collected from a total of 98 participants, with all videos recorded at a resolution of 640 × 480 pixels and a frame rate of 30 fps. A distinctive characteristic of the DLCN dataset is its session-based structure, in which both illumination conditions and physiological states are systematically controlled.
For each subject, a total of eight sessions are provided. Sessions 1–4 are recorded under resting conditions, while Sessions 5–8 are captured after physical exercise, thereby incorporating physiological variability. Each session is further divided into four illumination scenarios defined by combinations of light intensity and light position, and the same scenarios are repeated under identical conditions before and after exercise. The detailed illumination scenarios are as follows:
Fixed Intensity & Fixed Position (FI & FP): A stable illumination condition in which both the light intensity and position remain constant.
Variable Intensity & Fixed Position (VI & FP): A condition where the light position is fixed while the illumination intensity varies over time, inducing brightness fluctuations.
Fixed Intensity & Variable Position (FI & VP): A condition in which the light intensity remains constant but the light position changes, resulting in continuously varying incident angles on the face.
Variable Intensity & Variable Position (VI & VP): The most complex scenario, where both the illumination intensity and position vary simultaneously, closely approximating real-world illumination conditions.
By independently or jointly varying illumination intensity and position across these four scenarios, the DLCN dataset systematically models diverse environments that induce abrupt changes in facial reflectance characteristics. Since each session is accompanied by ground-truth heart rate measurements, the dataset enables quantitative analysis of how video-based rPPG algorithms are affected by different levels of illumination variation. Owing to this well-controlled and comprehensive design, DLCN serves as a highly suitable benchmark for validating the robustness of rPPG algorithms against illumination changes.
Examples of DLCN frames under different scenarios over time are illustrated in Figure 2.
3.1.2. MR-NIRP
The MR-NIRP dataset is a publicly available benchmark consisting of videos captured in real-world driving environments along with synchronized ground-truth physiological signals, providing an important basis for evaluating rPPG performance under uncontrolled conditions. The dataset includes simultaneously recorded RGB videos and near-infrared (NIR) videos in the 940 nm and 975 nm bands from a total of 18 participants, with all videos captured at a resolution of 640 × 640 pixels and a frame rate of 30 fps.
The recording environments are broadly categorized into Driving and Garage settings. The Garage environment represents a relatively stable condition with a single illumination setup. Since this study focuses on analyzing the impact of illumination variations on rPPG signal extraction, the Garage environment—where illumination changes are minimal—is excluded, and only the Driving environment is utilized.
The Driving data are further divided into three scenarios based on motion intensity: Drive-small, Drive-still, and Drive-large. To independently assess the effects of illumination changes, this study uses only the Drive-small and Drive-still scenarios, which involve relatively limited motion. The Drive-large scenario is excluded from the analysis because it contains substantial head movements and posture changes, leading to a mixture of illumination effects and motion-induced artifacts.
Consequently, the MR-NIRP dataset used in this study comprises driving videos recorded under diverse and mixed illumination conditions, including non-uniform lighting, saturation, daytime, and nighttime scenarios. This configuration enables the validation of the robustness of the proposed method under irregular and dynamically changing illumination conditions encountered in real-world driving environments.
Examples of the MR-NIRP dataset under various environmental conditions are shown in Figure 3.
3.2. Illumination Normalization
In this study, we propose a preprocessing method that preserves color components while normalizing only the illumination component in RGB frames, thereby reducing the influence of illumination variations and stably emphasizing color changes. Specifically, each RGB frame is decomposed into an illumination component and a color component, the illumination component is independently normalized, and the color information is subsequently re-integrated. This formulation facilitates robust rPPG signal extraction under varying lighting conditions.
First, the mean RGB vector of frame $t$ is computed as follows:
$$\mathbf{v}_t = [\bar{R}_t, \bar{G}_t, \bar{B}_t]^{\top} \qquad (1)$$
where $\bar{R}_t$, $\bar{G}_t$, and $\bar{B}_t$ denote the average values of the red, green, and blue channels, respectively.
The illumination magnitude is defined as the maximum component of the mean RGB vector in Equation (1), corresponding to the dominant channel intensity. This value is used as a global brightness scale of the frame. The illumination component $L_t$ is computed as follows:
$$L_t = \max(\bar{R}_t, \bar{G}_t, \bar{B}_t) \qquad (2)$$
Using the illumination magnitude defined in Equation (2), the color component can be independently extracted by removing the illumination scale. The illumination magnitude is defined as the maximum component of the frame-mean RGB vector, rather than the channel average or an $L_p$-norm-based magnitude, because the aim of the proposed formulation is not to reconstruct physical illumination itself but to isolate a global brightness scale while preserving the inter-channel RGB ratio structure that is essential for chrominance-based rPPG. Scaling with respect to the dominant channel helps preserve the original color direction of the RGB vector. In contrast, channel averaging or $L_p$ norms bring contributions from all channels into the scaling term, which may introduce cross-channel mixing effects and distort the subtle inter-channel relationships in which pulsatile signals are encoded. The color component $\mathbf{C}_t$ is computed as follows:
$$\mathbf{C}_t = \frac{\mathbf{v}_t}{L_t + \epsilon} \qquad (3)$$
where $\epsilon$ is a small constant introduced to avoid division by zero. As shown in Equation (3), the resulting color component represents a normalized color direction in which the maximum channel value is scaled to one. Accordingly, by using $L_t$ as the scale, the normalized color component preserves the color direction of the original RGB vector while reducing illumination-driven magnitude variation, thereby minimizing distortion of subtle blood-flow-induced chromatic changes.
Figure 4 illustrates example images obtained by separating the illumination and color components from an input frame using Equations (2) and (3). The color component $\mathbf{C}_t$ represents inter-channel RGB ratios with the illumination magnitude removed. Since it encodes color direction rather than perceptual color, the resulting image may appear biased toward a single hue. In particular, human skin regions tend to appear reddish due to their inherently higher red-channel contribution.
After independently separating the color and illumination components, illumination normalization is applied. A general illumination scaling model is expressed as follows:
$$L'_t = \alpha_t L_t \qquad (4)$$
where $\alpha_t$ denotes an illumination adjustment factor.
Given a target illumination level $L_{\mathrm{target}}$, the scaling factor $\alpha_t$ can be defined as follows:
$$\alpha_t = \frac{L_{\mathrm{target}}}{L_t} \qquad (5)$$
As shown in Equation (5), the illumination magnitude $L'_t$ is forced to a fixed reference value $L_{\mathrm{target}}$, regardless of the original frame brightness.
The normalized illumination value is then recombined with the original color direction to generate the final color vector. Importantly, the illumination component is not completely discarded, since illumination variations still carry informative cues for rPPG extraction. The final RGB vector is obtained by combining the normalized illumination magnitude with the original color component as follows:
$$\mathbf{v}'_t = L'_t \, \mathbf{C}_t \qquad (6)$$
where $\mathbf{v}'_t$ represents an RGB vector with normalized illumination applied.
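The decomposition and recombination steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `illumination_normalize`, the target level of 0.5, the value of `eps`, and the choice to apply the single frame-level scale to every pixel are all assumptions made for the example.

```python
import numpy as np

def illumination_normalize(frame, target=0.5, eps=1e-6):
    """Rescale one RGB frame so its illumination magnitude is close to `target`.

    frame: float array of shape (H, W, 3), values in [0, 1].
    target, eps: illustrative values, not taken from the paper.
    """
    mean_rgb = frame.reshape(-1, 3).mean(axis=0)  # frame-mean RGB vector
    illum = mean_rgb.max()                        # illumination magnitude (dominant channel)
    color = mean_rgb / (illum + eps)              # color direction, largest channel close to 1
    scale = target / (illum + eps)                # illumination adjustment factor
    # One scalar multiplies all three channels, so inter-channel ratios are unchanged.
    normalized = frame * scale
    return normalized, color, illum

# Two frames with the same skin tone but a fourfold brightness difference.
base = np.array([0.8, 0.6, 0.4])
bright = np.ones((4, 4, 3)) * base
dim = bright * 0.25
norm_bright, color_bright, _ = illumination_normalize(bright)
norm_dim, color_dim, _ = illumination_normalize(dim)
```

Under these assumptions both frames map to nearly the same normalized output and the same color direction, which is the behavior the formulation is meant to provide. Clipping to the valid pixel range is omitted here because it would break ratio preservation for saturated pixels.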
Compared to existing methods that decompose illumination and color components, the proposed approach is intended to be more suitable for rPPG tasks. Conventional techniques such as Retinex, Histogram Equalization, Gamma Correction, and Color Constancy introduce nonlinear transformations across RGB channels, which can distort or attenuate the subtle color variations that are critical for rPPG signal extraction. In contrast, the proposed method preserves the original color direction while linearly normalizing only the global illumination component. As a result, it improves robustness to illumination changes while helping preserve rPPG-related color variations, which may contribute to higher signal-to-noise ratio (SNR) and more stable heart rate estimation. A comparison with existing illumination normalization methods is summarized in Table 3.
3.3. rPPG Extraction
The rPPG processing pipeline proposed in this study consists of illumination normalization (I-norm) for obtaining color signals robust to illumination variations, facial and skin region extraction, algorithm-based rPPG signal extraction, SNR-based signal quality assessment, and heart rate (BPM) estimation. In particular, the proposed method applies I-norm to suppress brightness drift caused by illumination changes. Subsequently, an SNR-based signal quality evaluation is performed to select reliable rPPG signals, enabling stable heart rate estimation even under low-light conditions. The overall workflow of the proposed pipeline is illustrated in Figure 5.
After applying I-norm to each frame, facial preprocessing was conducted. Since the DLCN dataset is provided with pre-cropped facial regions, no additional facial preprocessing was applied. In contrast, for the MR-NIRP dataset, faces were detected and cropped from the original frames. For all frames, facial landmarks were extracted using a 3D Dense Face Alignment-based method [23], which were subsequently used to define skin regions and generate skin masks.
The skin mask was generated by constructing a facial boundary polygon based on jaw-to-brow landmarks to remove background regions. In addition, the eye, nose, and mouth regions were further masked to suppress motion-related noise. By excluding dynamic facial components such as eye blinking and mouth movements, the proposed pipeline aims to extract more reliable rPPG signals.
To analyze the effect of the proposed I-norm preprocessing on various color-based rPPG algorithms, several widely used methods were employed in this study, including POS [3], ICA [24], CHROM [4], GREEN [25], LGI [26], and PBV [27].
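The Plane-Orthogonal-to-Skin (POS) method projects temporally normalized RGB signals $(R_n, G_n, B_n)$ onto two orthogonal chrominance components and constructs the rPPG signal as follows:
$$S_1(t) = G_n(t) - B_n(t), \qquad S_2(t) = G_n(t) + B_n(t) - 2R_n(t), \qquad h(t) = S_1(t) + \frac{\sigma(S_1)}{\sigma(S_2)}\, S_2(t)$$
where $\sigma(S_1)$ and $\sigma(S_2)$ are the standard deviations of the two projections within a temporal window, whose ratio serves as the scaling coefficient.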
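As a rough illustration of the POS projection, the following NumPy sketch uses the commonly published form of the algorithm with sliding windows and overlap-add. The function name `pos_rppg`, the 1.6 s window length, and the small stabilizing constant are assumptions for the example and are not stated in the text.

```python
import numpy as np

def pos_rppg(rgb, fs=30.0, win_sec=1.6):
    """POS pulse extraction sketch.

    rgb: array of shape (T, 3) holding the skin-mean R, G, B value per frame.
    win_sec: window length in seconds (assumed value for illustration).
    """
    n_frames = rgb.shape[0]
    win = int(round(win_sec * fs))
    proj = np.array([[0.0, 1.0, -1.0],    # S1 = G - B
                     [-2.0, 1.0, 1.0]])   # S2 = G + B - 2R
    pulse = np.zeros(n_frames)
    for start in range(0, n_frames - win + 1):
        block = rgb[start:start + win]
        normed = block / block.mean(axis=0)        # temporal normalization per channel
        s1, s2 = proj @ normed.T                   # two projected signals, each of length win
        alpha = s1.std() / (s2.std() + 1e-12)      # standard-deviation ratio
        h = s1 + alpha * s2
        pulse[start:start + win] += h - h.mean()   # overlap-add of zero-mean segments
    return pulse

# Synthetic 10 s trace: a 1.2 Hz pulse with channel-dependent amplitude.
t = np.arange(300) / 30.0
p = np.sin(2 * np.pi * 1.2 * t)
rgb_trace = np.array([0.6, 0.45, 0.35]) * (1 + 0.01 * np.outer(p, [0.33, 0.77, 0.53]))
pos_pulse = pos_rppg(rgb_trace)
```

Because both projection rows sum to zero, a brightness change that scales all three normalized channels equally is removed to first order, which is why POS is usually regarded as robust to intensity variation.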
In Independent Component Analysis (ICA)-based rPPG, the RGB signal is assumed to be a linear mixture of statistically independent source components, and periodic blood-volume-induced variations are separated through independent component decomposition. The RGB signal vector $\mathbf{x}(t)$ is decomposed into independent components as follows:
$$\mathbf{s}(t) = \mathbf{W}\,\mathbf{x}(t)$$
where $\mathbf{W}$ denotes the unmixing matrix. Among the resulting independent components, the one exhibiting the largest spectral energy within the heart rate frequency band is selected as the rPPG signal.
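The Chrominance-based rPPG (CHROM) method constructs an illumination-robust rPPG signal by combining two chrominance difference signals. With $R_n$, $G_n$, and $B_n$ denoting the mean-normalized color channels, the chrominance components are defined as follows:
$$X(t) = 3R_n(t) - 2G_n(t), \qquad Y(t) = 1.5R_n(t) + G_n(t) - 1.5B_n(t)$$
and the final rPPG signal is obtained by combining the two components using their standard deviation ratio as follows:
$$S(t) = X(t) - \frac{\sigma_X}{\sigma_Y}\, Y(t)$$
where $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$ within a temporal window.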
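The CHROM combination can be sketched for a single temporal window as follows. The coefficients follow the commonly published form of the method; the function name `chrom_rppg` is hypothetical, and the band-pass filtering that is normally applied to the two chrominance signals before combining them is omitted for brevity.

```python
import numpy as np

def chrom_rppg(rgb):
    """Single-window CHROM sketch.

    rgb: array of shape (T, 3) holding the skin-mean R, G, B value per frame.
    """
    normed = rgb / rgb.mean(axis=0)          # mean-normalize each channel
    r, g, b = normed.T
    x = 3.0 * r - 2.0 * g                    # first chrominance signal
    y = 1.5 * r + g - 1.5 * b                # second chrominance signal
    alpha = x.std() / (y.std() + 1e-12)      # standard-deviation ratio
    s = x - alpha * y
    return s - s.mean()

t = np.arange(300) / 30.0
skin = np.array([0.6, 0.45, 0.35])

# Pure brightness flicker: every channel is scaled by the same factor.
flicker = 1 + 0.2 * np.sin(2 * np.pi * 0.3 * t)
flicker_only = chrom_rppg(skin * flicker[:, None])

# Pulse with channel-dependent amplitude and no flicker.
p = np.sin(2 * np.pi * 1.2 * t)
pulse_only = chrom_rppg(skin * (1 + 0.01 * np.outer(p, [0.33, 0.77, 0.53])))
```

In this toy setting the flicker-only trace collapses to approximately zero while the pulse trace does not, which illustrates why common-mode brightness changes are suppressed and channel-dependent changes survive.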
The GREEN method is a simple approach that extracts the rPPG signal using only the green channel, which is known to be most sensitive to blood volume variations. The rPPG signal at frame $t$ is defined as the mean green-channel value of the skin region,
$$S(t) = \bar{G}_t$$
followed by band-pass filtering to emphasize the heart-rate frequency band. Although computationally efficient and suitable for real-time processing, the GREEN method is relatively vulnerable to illumination changes.
Local Group Invariance (LGI) exploits instantaneous proportional changes in color signals to achieve robustness against illumination variations. Specifically, the temporal derivatives of the RGB signals are normalized, and the three components of the resulting signal are subsequently reduced to a single rPPG signal using either principal component analysis (PCA) [28] or a weighted summation scheme.
The Pulse Blood Volume (PBV) method assumes that the relative ratios among color channels induced by blood volume changes remain constant. Based on this assumption, the rPPG signal is extracted by projecting the normalized RGB signal $\mathbf{C}_n(t)$ onto a specific direction as follows:
$$S(t) = \mathbf{w}^{\top}\mathbf{C}_n(t)$$
where $\mathbf{w}$ is a projection vector designed to maximize color responses associated with blood volume variations.
The raw rPPG signals obtained from each algorithm are first detrended to remove baseline drift, and a Butterworth band-pass filter with a frequency range of 0.7–3.0 Hz is applied to retain only the heart-rate band. Subsequently, an SNR (Signal-to-Noise Ratio)-based signal quality evaluation stage is introduced to assess the reliability of the rPPG signal.
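The detrending and band-pass step can be sketched with SciPy as shown below. The text specifies a Butterworth filter with a 0.7–3.0 Hz passband; the filter order, the use of linear detrending, and zero-phase filtering via `filtfilt` are assumptions made for this example, and `bandpass_rppg` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt

def bandpass_rppg(raw, fs=30.0, low=0.7, high=3.0, order=3):
    """Remove a linear trend, then apply a zero-phase Butterworth band-pass.

    order=3 and linear detrending are illustrative choices, not values from the paper.
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, detrend(raw))

# Synthetic 30 s trace: 1.2 Hz pulse + linear drift + slow 0.1 Hz sway + 5 Hz interference.
t = np.arange(900) / 30.0
true_pulse = np.sin(2 * np.pi * 1.2 * t)
raw_trace = (true_pulse
             + 0.5 * t / 30.0
             + 0.8 * np.sin(2 * np.pi * 0.1 * t)
             + 0.5 * np.sin(2 * np.pi * 5.0 * t))
filtered = bandpass_rppg(raw_trace)
```

Zero-phase filtering avoids shifting the waveform in time, which keeps the filtered rPPG trace aligned with a reference signal; a causal filter would be needed instead for strictly online processing.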
The SNR quantifies how strongly heart-rate-related components appear in the rPPG signal relative to surrounding noise and is used as an indicator of signal quality. Based on the power spectral density (PSD) $P(f)$, a frequency band of ±0.1 Hz around the dominant frequency $f_{\mathrm{peak}}$ is defined as the signal band, while the remaining frequencies within the physiological range (0.7–3.0 Hz) are treated as the noise band. The SNR is computed as follows:
$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sum_{|f - f_{\mathrm{peak}}| \le 0.1} P(f)}{\sum_{f \in [0.7,\,3.0],\; |f - f_{\mathrm{peak}}| > 0.1} P(f)}\right)$$
A higher SNR value indicates that heart-rate-related frequency components are more dominant than noise components, implying higher rPPG signal quality. Therefore, in this study, the SNR value is used as a signal quality indicator to minimize the influence of noisy signal segments.
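A spectral SNR of this kind can be sketched with a plain FFT periodogram. The band limits and the ±0.1 Hz signal band follow the description above; expressing the ratio in decibels, the unnormalized periodogram as the PSD estimate, and the function name `spectral_snr_db` are assumptions for illustration.

```python
import numpy as np

def spectral_snr_db(signal, fs=30.0, band=(0.7, 3.0), half_width=0.1):
    """Ratio of power near the dominant in-band frequency to the rest of the band, in dB."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2                        # unnormalized periodogram
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak = freqs[in_band][np.argmax(psd[in_band])]           # dominant in-band frequency
    sig = in_band & (np.abs(freqs - peak) <= half_width)     # signal band
    noise = in_band & ~sig                                   # remaining in-band frequencies
    return 10.0 * np.log10(psd[sig].sum() / (psd[noise].sum() + 1e-12))

t = np.arange(900) / 30.0
rng = np.random.default_rng(0)
clean_trace = np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.standard_normal(t.size)
noise_trace = rng.standard_normal(t.size)
snr_clean = spectral_snr_db(clean_trace)
snr_noise = spectral_snr_db(noise_trace)
```

With this definition a strongly periodic trace yields a positive value and a noise-only trace yields a negative one, because the narrow signal band then holds less power than the rest of the physiological band.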
For heart rate estimation, both contact photoplethysmography (cPPG) and rPPG signals are resampled to 30 Hz and temporally aligned to have the same signal length. The analysis is then performed using non-overlapping 30-s windows. Within each window, the fast Fourier transform (FFT) is applied to obtain the frequency spectrum $X(f)$. The dominant frequency is determined as follows:
$$f_{\mathrm{peak}} = \arg\max_{0.7 \le f \le 3.0} |X(f)|$$
and the heart rate (BPM) is computed as follows:
$$\mathrm{BPM} = 60 \times f_{\mathrm{peak}}$$
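The window-wise estimate can be sketched as follows. The 30 Hz rate and non-overlapping 30-s windows follow the text; restricting the peak search to 0.7–3.0 Hz and the names `estimate_bpm` and `windowed_bpm` are assumptions for the example.

```python
import numpy as np

def estimate_bpm(window, fs=30.0, band=(0.7, 3.0)):
    """Heart rate from the dominant FFT peak inside the heart-rate band."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mag = np.abs(np.fft.rfft(x))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    f_peak = freqs[in_band][np.argmax(mag[in_band])]
    return 60.0 * f_peak

def windowed_bpm(signal, fs=30.0, win_sec=30.0):
    """One BPM estimate per non-overlapping window."""
    n = int(win_sec * fs)
    return [estimate_bpm(signal[i:i + n], fs) for i in range(0, len(signal) - n + 1, n)]

# 60 s of a 1.2 Hz (72 BPM) sinusoid gives two windows.
t = np.arange(1800) / 30.0
bpm_per_window = windowed_bpm(np.sin(2 * np.pi * 1.2 * t))
```

Without zero-padding, a 30-s window gives a frequency spacing of 1/30 Hz, so the estimate is quantized in steps of 2 BPM.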
The difference between rPPG-based BPM and cPPG-based BPM across all windows is evaluated using the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the percentage of estimates within a ±6 BPM tolerance (PTE6).
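The four evaluation metrics can be computed from paired window-level estimates as sketched below. The function name `hr_metrics` is hypothetical, and treating an error of exactly 6 BPM as within tolerance is an assumption.

```python
import numpy as np

def hr_metrics(pred_bpm, ref_bpm, tol=6.0):
    """MAE, RMSE, MAPE (%), and PTE6 (%) between rPPG-based and cPPG-based BPM values."""
    pred = np.asarray(pred_bpm, dtype=float)
    ref = np.asarray(ref_bpm, dtype=float)
    err = pred - ref
    abs_err = np.abs(err)
    return {
        "MAE": abs_err.mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAPE": 100.0 * (abs_err / ref).mean(),
        "PTE6": 100.0 * (abs_err <= tol).mean(),  # share of windows within the tolerance
    }

# Four windows with errors of -2, 0, 10, and 0 BPM.
metrics = hr_metrics([70, 80, 90, 100], [72, 80, 80, 100])
```

For this toy input the MAE is 3 BPM, the RMSE is the square root of 26 (about 5.1 BPM), and three of the four windows fall within the ±6 BPM tolerance.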
Through this pipeline structure, illumination-robust rPPG signals can be generated, and the influence of noise can be reduced through SNR-based signal quality assessment, enabling more stable heart rate estimation.
4. Results
4.1. Signal-Level Validation of Illumination Normalization
To evaluate the effectiveness of the proposed illumination normalization method on the DLCN dataset, the AC/DC ratio was analyzed.
Figure 6 illustrates the distribution of AC/DC ratios over the entire dataset, where the yellow distribution corresponds to the original signals and the blue distribution represents the normalized results. Compared to the original signals, the AC/DC ratio distribution after normalization is shifted toward lower values, suggesting that the proposed method suppresses illumination-induced fluctuations relative to the DC level while maintaining pulsatile information.
In addition, frame-level changes after applying illumination normalization to the DLCN dataset are illustrated in Figure 7.
In addition to overall illumination stabilization, several analyses were conducted to verify the preservation of color information. Specifically, skin-region signals of the R, G, B, Cb, and Cr channels were compared with and without illumination normalization. The comparison results are summarized in Table 4.
An increase was observed in all channels except for the Cb channel. Overall, the standard deviations exhibited minimal variation, suggesting that the color components were largely preserved after normalization.
Figure 8 illustrates a comparison of the R, G, B, Cb, and Cr signals before and after illumination normalization.
To evaluate the degree to which illumination variations were suppressed, channel-wise temporal stability was analyzed in addition to color variation. The changes in the stability ratio for each channel are summarized in Table 5.
More than a twofold improvement in temporal stability was observed across all channels, suggesting that the proposed illumination normalization helps suppress temporal flicker. Beyond color preservation at the frame level, inter-channel correlations are also critical for rPPG signal extraction. To examine this aspect, changes in the correlations among the RGB channels and between the Cb and Cr channels were analyzed. The correlation changes between channels are summarized in Table 6.
As the correlation coefficients among the RGB channels increase overall, the color direction appears to become more stable, suggesting that illumination-induced common-mode variations are more consistently aligned. This stabilization may facilitate more reliable directional color changes, which are essential for chrominance-based rPPG methods. Meanwhile, the Cb and Cr correlation increases from −0.431 to −0.322 after illumination normalization. Although the correlation value itself increases, its absolute magnitude decreases, indicating a weakened negative correlation between the chrominance channels. Since the Cb and Cr components primarily represent color-difference information that is more sensitive to illumination and color bias than to blood perfusion, this change suggests that illumination-related chrominance coupling is partially suppressed rather than directly enhancing blood-flow-induced information.
Figure 9 compares the baseline rPPG signals, the signals obtained after illumination normalization (I-norm), and the reference cPPG signals for several representative methods. In the time domain, the I-norm signals generally exhibit waveform patterns that are more consistent with the reference cPPG signals than the baseline signals. In the frequency domain, the normalized PSDs show that, after illumination normalization, the dominant spectral peaks tend to move closer to the cPPG peak or become more clearly concentrated around the physiological frequency band. This tendency is particularly evident in methods such as CHROM, ICA, PBV, and POS, where the baseline signals exhibit broader or less consistent spectral distributions, whereas the I-norm signals show improved peak concentration and reduced spectral dispersion. These observations suggest that the proposed normalization reduces illumination-induced fluctuations and may help preserve pulse-related periodic components more effectively than the baseline input.
Before applying illumination normalization, the average frame processing time was 1.18 ms per frame. After incorporating the normalization process, the processing time increased to 1.34 ms per frame, corresponding to an average overhead of approximately 0.16 ms per frame. Both values are far below the 33.3 ms frame interval of 30 fps video, so the additional computational cost is marginal and does not affect real-time processing performance.
4.2. Effect of Illumination Normalization on rPPG Extraction Performance
The effect of the proposed illumination normalization (I-norm) on rPPG signal extraction performance was analyzed. Heart rate estimation performance before (baseline) and after applying I-norm was compared on the DLCN dataset using the POS, ICA, and CHROM algorithms. The analysis was conducted using a window size of 30 s without overlap, and BPM values were computed for each window. The results are presented in Table 7.
As shown in Table 7, after applying I-norm, most algorithms exhibited reduced MAE and RMSE values and an increased PTE6 ratio. These results indicate that I-norm alone already provides consistent performance improvements over the baseline across multiple state- and scenario-based conditions, even before any signal-quality filtering is applied. In particular, the CHROM algorithm demonstrated a substantial improvement in the Rest condition, where the MAE decreased from 11.33 BPM to 5.84 BPM.
Table 8 shows that I-norm generally achieved the best performance among the compared image enhancement methods across both state- and scenario-based evaluations. In particular, I-norm yielded the lowest errors in most conditions for POS and CHROM, and maintained more stable performance than Retinex, Gamma, and Lab as illumination conditions became more challenging. Although the error increased under scenarios involving illumination position changes, I-norm still showed the most favorable overall results among the enhancement methods compared under the evaluation settings considered in this study.
To evaluate the feasibility of real-time application, BPM estimation performance was further compared by varying the window size to 5, 10, 20, and 30 s with a 1-s estimation interval. The results are summarized in Table 9. The analysis indicates that the highest accuracy was achieved with a 30-s window. As the window size decreased, the estimation accuracy tended to decline because the periodic characteristics of the rPPG signal were insufficiently captured. In particular, a marked increase in MAE was observed across all algorithms when a 5-s window was used. Therefore, considering both accuracy and stability, a 30-s window was adopted as the default configuration in this study.
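The sliding-window configuration used in this comparison can be sketched as follows. The window lengths and the 1-s estimation interval come from the text; the helper names `sliding_windows` and `bpm_resolution` are hypothetical, and the resolution figure assumes a plain FFT without zero-padding or peak interpolation.

```python
def sliding_windows(n_samples, fs=30, win_sec=30, step_sec=1):
    """Return (start, end) sample indices of windows advanced by step_sec seconds."""
    win = int(win_sec * fs)
    step = int(step_sec * fs)
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]

def bpm_resolution(win_sec):
    """Spacing of FFT bins in BPM for a window of win_sec seconds (no zero-padding)."""
    return 60.0 / win_sec

# A 60 s recording at 30 fps, analyzed with each window length from the comparison.
for win_sec in (5, 10, 20, 30):
    n_windows = len(sliding_windows(1800, win_sec=win_sec))
    print(win_sec, n_windows, bpm_resolution(win_sec))
```

Under these assumptions a 5-s window resolves heart rate only in 12 BPM steps, whereas a 30-s window resolves it in 2 BPM steps, which is consistent with the sharp rise in error reported for the shortest window.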
The same analysis was also conducted on the MR-NIRP dataset, and the results are presented in Table 10. A general trend of MAE reduction after applying I-norm was observed across most algorithms. For example, in the PBV algorithm, the MAE decreased from 13.98 to 10.27 under the Driving Small Motion condition. Similarly, the CHROM algorithm showed a substantial improvement in the Driving Still condition, where the MAE decreased from 11.16 to 7.21.
4.3. Effect of SNR-Based Signal Quality Filtering on rPPG Estimation
To further analyze the effect of I-norm, an additional signal quality evaluation based on the signal-to-noise ratio (SNR) was applied to assess the incremental effect of filtering low-quality segments after illumination normalization. On the DLCN dataset, Baseline + SNR and I-norm + SNR were compared, and the results are presented in Table 11.
When I-norm + SNR was applied, most algorithms showed substantial reductions in MAE, RMSE, and MAPE, along with an increase in the PTE6 ratio. In the Rest condition, the MAE of the POS algorithm decreased from 8.57 in Baseline + SNR to 2.91 after applying I-norm + SNR. Similarly, for the CHROM-based rPPG method, the MAE in the Rest condition decreased substantially from 7.70 to 2.36. A similar trend was observed in the Exercise condition, where the MAE of the CHROM algorithm decreased from 6.96 to 3.59. The SNR-based filtering step further improved performance by excluding low-quality segments. Importantly, I-norm alone already provides substantial performance improvement prior to any SNR-based filtering. For example, the CHROM MAE in the Rest condition decreased from 11.33 BPM to 5.84 BPM before filtering was applied (Table 7), indicating that the primary performance gain originates from illumination normalization rather than post hoc selection. Moreover, when comparing Baseline + SNR with I-norm + SNR, the I-norm-based pipeline generally achieved better overall performance, suggesting that illumination normalization contributed beyond the filtering effect.
The same analysis was also conducted on the MR-NIRP dataset, and the results are presented in Table 12. In the Driving Small Motion condition, the MAE of the POS algorithm decreased from 9.17 BPM to 4.15 BPM. In the Driving Still condition, the MAE of the ICA algorithm decreased from 7.38 BPM to 1.43 BPM.
The experimental results indicate that the proposed illumination normalization did not yield uniform improvements across all conditions, but produced more pronounced performance gains in scenarios with substantial illumination changes and in color-based rPPG algorithms. In particular, on the DLCN dataset, the improvement was more evident under complex conditions where both illumination intensity and position varied simultaneously, and the MAE of the CHROM algorithm was reduced from approximately 18–19 BPM to 4.87 BPM. Moreover, an overall reduction in error was also observed in the Drive-small and Drive-still conditions of the MR-NIRP dataset, suggesting that the proposed method can operate effectively not only in controlled illumination environments but also in more realistic conditions considered in this study. These findings suggest that the proposed method does not simply correct brightness, but also reduces frame-to-frame brightness fluctuations in a manner that is consistent with preserving the inter-channel color ratio information used for rPPG extraction.
4.4. SNR-Based Quality Filtering Strategy
The signal-to-noise ratio (SNR) is a metric that represents the ratio between the energy of the signal and that of the noise. When the SNR is greater than 0, the signal energy exceeds the noise energy, meaning that periodic patterns or dominant frequency components tend to appear more clearly. In physiological signals such as rPPG, this condition increases the likelihood that the frequency component corresponding to the heart rate can be observed in a stable manner.
In contrast, when the SNR is below 0 dB, the noise energy dominates the signal energy, making it more likely that the frequency components of the true physiological signal are masked or distorted by noise. SNR = 0 dB therefore marks the boundary at which the signal energy equals the noise energy, and it can serve as a practical reference point for evaluating signal quality. An example of signals categorized by the SNR = 0 dB threshold is illustrated in Figure 10.
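The thresholding rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the pulse band (0.7–4.0 Hz), the ±0.1 Hz window around the spectral peak and its first harmonic, the use of the dominant in-band peak as the pulse frequency, and the function names `snr_db` and `keep_segment` are all assumptions made for the example.

```python
import numpy as np

def snr_db(signal, fs, band=(0.7, 4.0), half_width_hz=0.1):
    """Spectral SNR in dB of one rPPG segment (illustrative definition)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2             # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Assumption: the dominant in-band peak is treated as the pulse frequency.
    peak_hz = freqs[in_band][np.argmax(power[in_band])]

    near_peak = np.abs(freqs - peak_hz) <= half_width_hz
    near_harmonic = np.abs(freqs - 2.0 * peak_hz) <= half_width_hz
    signal_mask = in_band & (near_peak | near_harmonic)
    noise_mask = in_band & ~signal_mask

    eps = 1e-12                                     # avoids log(0) and 0/0
    ratio = (power[signal_mask].sum() + eps) / (power[noise_mask].sum() + eps)
    return float(10.0 * np.log10(ratio))

def keep_segment(signal, fs, threshold_db=0.0):
    """True when the segment reaches the SNR threshold (0 dB by default)."""
    return bool(snr_db(signal, fs) >= threshold_db)
```

Whether a segment lying exactly on the 0 dB boundary is kept or discarded is a design choice; the sketch keeps it, following the reading that the signal energy is then at least equal to the noise energy.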
With the SNR threshold (th) set to 0, the data loss rate observed on the DLCN dataset is summarized in Table 13. The data retention rate remains high (roughly 93–97% or higher), indicating that the performance improvement is not achieved by discarding large portions of the data.
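For reference, retention and loss rates of this kind can be computed from per-segment SNR values as in the short sketch below. The function name `retention_and_loss` and the example values are hypothetical, and the rule that a segment at exactly the threshold is retained is an assumption.

```python
import numpy as np

def retention_and_loss(snr_values_db, threshold_db=0.0):
    """Return (retention %, loss %) for a list of per-segment SNR values."""
    snr = np.asarray(snr_values_db, dtype=float)
    kept = snr >= threshold_db          # segments that pass the SNR filter
    retention = 100.0 * kept.mean()     # share of segments that are kept
    return float(retention), float(100.0 - retention)

# Hypothetical SNR values (dB) for four segments
retention, loss = retention_and_loss([3.2, -1.0, 0.5, 4.1])
```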
Similarly, the data loss rate for the MR-NIRP dataset is presented in Table 14.
4.5. Comparison with Existing Benchmark Studies
The performance of the proposed I-norm + SNR-based rPPG processing pipeline was compared with previously reported results.
Table 15 presents a comparison between the algorithm-based baseline results and deep learning-based model performances reported in the original DLCN study and the results obtained in this study.
The proposed method achieved lower MAE than conventional algorithm-based approaches under most illumination conditions. For example, in the FI&FP condition, the MAE of the POS algorithm decreased from 8.04 BPM in the previous study to 0.85 BPM with the proposed method. The results also show competitive performance compared with deep learning-based models.
A similar comparison was conducted on the MR-NIRP dataset, and the results are summarized in Table 16. In the Driving Small Motion condition, the proposed method achieved an MAE of 4.15 BPM using the POS algorithm, which is lower than the error reported by the baseline methods. In the Driving Still condition, the MAE reached 1.43 BPM when the ICA algorithm was applied, outperforming the existing algorithm-based approaches. The proposed method also remained competitive with deep learning-based rPPG models. For instance, while PhysNet reported an MAE of 4.37 BPM in the Driving Still condition, the proposed method achieved a substantially lower MAE of 1.43 BPM.
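The error metrics used in these comparisons can be illustrated with a short sketch. The MAE is the mean absolute difference between estimated and reference heart rates. For PTE6, the sketch assumes a tolerance of 6 BPM and counts an estimate as within tolerance when its absolute error is at most that value; both the tolerance and the inclusive comparison are assumptions, and the function names and sample values are hypothetical.

```python
import numpy as np

def mae_bpm(estimated, reference):
    """Mean absolute error between estimated and reference heart rates (BPM)."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(est - ref)))

def pte(estimated, reference, tol_bpm=6.0):
    """Percentage of estimates whose absolute error is within tol_bpm."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return float(100.0 * np.mean(np.abs(est - ref) <= tol_bpm))

# Hypothetical estimates and reference values for four segments
est = [70.0, 82.0, 65.0, 90.0]
ref = [72.0, 80.0, 75.0, 90.0]
mae = mae_bpm(est, ref)    # absolute errors: 2, 2, 10, 0
pte6 = pte(est, ref)       # three of four errors are within 6 BPM
```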
An SNR value lower than 0 indicates that the noise component is greater than the pulse-related signal within the target frequency band. In the DLCN analysis, such failure segments were most frequent under VP conditions, which accounted for approximately 33 ± 5% of failure segments, followed by VI at approximately 26 ± 3%, FP at approximately 22 ± 4%, and FI at approximately 19 ± 4%. A likely explanation is that, when the illumination position changes (VP), variations in facial shading, reflection, and local brightness imbalance become more pronounced, causing non-periodic optical changes to dominate the periodic components of the rPPG signal. Changes in illumination intensity (VI) also led to SNR degradation, but their effect was smaller than that of position changes. This tendency was consistent with the actual performance results, where the VI & VP condition, involving simultaneous changes in illumination intensity and position, emerged as the most challenging scenario. Therefore, the SNR-based selection was more effective under relatively stable illumination, but it also revealed a limitation: failure segments occurred more frequently in complex environments involving illumination position changes.
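A breakdown of this kind can be sketched as follows, assuming that the reported percentages denote each condition's share of all failure segments. The function name `failure_shares`, the `(condition, snr_db)` tuple layout, and the sample values are hypothetical.

```python
from collections import Counter

def failure_shares(segments, threshold_db=0.0):
    """Share (%) of failure segments per illumination condition.

    segments: iterable of (condition_label, snr_db) pairs.
    A segment counts as a failure when its SNR is below threshold_db.
    """
    failed = Counter(label for label, snr in segments if snr < threshold_db)
    total = sum(failed.values())
    if total == 0:
        return {}                       # no failure segments at all
    return {label: 100.0 * n / total for label, n in failed.items()}

# Hypothetical segments: (condition, SNR in dB)
shares = failure_shares([
    ("VP", -2.0), ("VP", -0.5), ("VI", -1.0), ("FP", 1.5), ("FI", -3.0),
])
```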
5. Discussion
In this study, only the luminance component was normalized while preserving color information as much as possible. To verify the effect of the proposed normalization, the differences in the mean values of the R, G, B, Cb, and Cr color channels before and after normalization were analyzed.
Figure 11 visually illustrates the magnitude of these changes.
After applying the proposed brightness normalization technique, the mean values of all RGB and CbCr channels changed only within an approximately ±6–12% range, indicating that the channel-wise DC components were not excessively distorted and that the proposed normalization primarily compensated for global illumination bias. More importantly, these limited channel-wise shifts suggest that the method does not substantially alter the inter-channel RGB ratio structure, which is critical for preserving blood-flow-related chromatic information in rPPG. In particular, the increase in the mean values of the R, G, and B channels can be interpreted not as an artificial amplification of the absolute frame brightness, but as a slight upward adjustment of the effective signal level that includes skin reflectance components, which may enhance the responsiveness of channels sensitive to blood flow variations. In contrast, the decrease in the mean value of the Cb channel can be attributed to a shift in the blue-difference component toward a more neutral reference, mitigating illumination-induced color bias, while the increase in the Cr channel emphasizes redness variations associated with blood perfusion. Notably, these mean value changes were accompanied by negligible variations in standard deviation, suggesting that the normalization primarily adjusted the DC level while largely preserving the temporal rPPG waveform structure. This supports the interpretation that the proposed method functions not merely as a perceptual brightness correction, but as an rPPG-oriented normalization strategy that stabilizes illumination while preserving temporally subtle chromatic variations relevant to physiological signal extraction. At the same time, the dynamic range across all channels was expanded by approximately 2.4–2.6×, allowing subtle AC-component-based color variations to become more prominent and thereby contributing to an improvement in the SNR of the rPPG signal.
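The channel-wise comparison described above can be reproduced in outline with the sketch below. It is not the analysis code of this study: the BT.601 full-range conversion for Cb and Cr, the use of per-frame spatial means as channel traces, and the definition of dynamic range as the maximum minus the minimum of each trace are assumptions, as are the function names.

```python
import numpy as np

def channel_traces(frames):
    """Per-frame mean of R, G, B, Cb, Cr for frames shaped (T, H, W, 3), RGB."""
    f = np.asarray(frames, dtype=float)
    r, g, b = f[..., 0], f[..., 1], f[..., 2]
    # BT.601 full-range chroma components (assumed conversion)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    channels = {"R": r, "G": g, "B": b, "Cb": cb, "Cr": cr}
    return {name: ch.mean(axis=(1, 2)) for name, ch in channels.items()}

def compare_channels(before, after):
    """Map channel name -> (mean change in %, dynamic-range ratio after/before)."""
    tb, ta = channel_traces(before), channel_traces(after)
    result = {}
    for name in tb:
        mean_change = 100.0 * (ta[name].mean() - tb[name].mean()) / tb[name].mean()
        range_before = tb[name].max() - tb[name].min()
        range_after = ta[name].max() - ta[name].min()
        # A constant "before" trace has zero range, so the ratio is undefined there.
        result[name] = (float(mean_change), float(range_after / range_before))
    return result
```

Under these definitions, a mean change within ±6–12% together with a range ratio of about 2.4–2.6 would correspond to the behaviour reported above.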
In addition to analyzing changes in color components, the Temporal Stability metric was evaluated to quantify how steady the brightness remains over time. The results show that Temporal Stability improved approximately twofold across all channels, suggesting that the proposed illumination normalization helps suppress temporal brightness instability and flicker components. Temporal flicker can induce low-frequency brightness variations that may overlap with the frequency band of rPPG signals, potentially degrading heart rate estimation accuracy. Therefore, the proposed normalization process can be interpreted as providing a more stable illumination environment for rPPG signal extraction by suppressing such flicker components.
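The exact definition of Temporal Stability is not restated in this section, so the sketch below uses one plausible stand-in: the inverse of the standard deviation of frame-to-frame differences in a channel's mean trace. This definition, the function name `temporal_stability`, and the sample traces are assumptions for illustration only.

```python
import numpy as np

def temporal_stability(trace):
    """Inverse of the std of frame-to-frame differences (assumed definition)."""
    diffs = np.diff(np.asarray(trace, dtype=float))
    return float(1.0 / (np.std(diffs) + 1e-12))   # eps avoids division by zero

# Hypothetical brightness traces: the second flickers with half the amplitude
flicker = np.array([1.0, -1.0] * 5)
original = 100.0 + 4.0 * flicker
normalized = 100.0 + 2.0 * flicker
gain = temporal_stability(normalized) / temporal_stability(original)
```

With this stand-in, halving the flicker amplitude doubles the stability score, which matches the kind of twofold improvement described above.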
To further verify rPPG performance under stabilized illumination conditions, inter-channel correlations were analyzed. Compared to the original video, the correlations among the R–G–B channels increased on average, while a relatively larger improvement was observed for the Cb–Cr channel pair. This can be interpreted as a result of the corrected luminance component being consistently reflected across the RGB channels during brightness normalization, leading to improved color balance. In the chrominance domain, brightness normalization reduced mutual interference between the Cb and Cr channels and yielded a more normalized color-difference distribution, indicating reduced illumination-induced color distortion and enhanced preservation of independent chromatic information. These properties are particularly important for color-based physiological signal analysis such as rPPG, as they enable more stable separation of chromatic components and improve the overall reliability and robustness of signal extraction.
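An inter-channel correlation analysis of this kind can be sketched with Pearson correlation coefficients between per-frame channel means. The choice of Pearson correlation, the function name `pairwise_correlations`, and the dictionary-of-traces input are assumptions for the example.

```python
import numpy as np

def pairwise_correlations(traces):
    """Pearson correlation for every pair of channel traces.

    traces: dict mapping channel name -> 1-D array of per-frame means.
    Returns a dict mapping (name_a, name_b) -> correlation coefficient.
    """
    names = list(traces)
    stacked = np.vstack([np.asarray(traces[n], dtype=float) for n in names])
    corr = np.corrcoef(stacked)         # rows are treated as variables
    return {
        (a, b): float(corr[i, j])
        for i, a in enumerate(names)
        for j, b in enumerate(names)
        if i < j
    }
```

Running the function on traces extracted before and after normalization, and differencing the results, gives the change in correlation for each channel pair.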
The final performance reported in this study was obtained by applying SNR-based signal selection after brightness normalization. However, the observed gains were not solely attributable to the filtering step. As shown in the comparative results, I-norm alone already yielded overall improvements over the baseline across multiple conditions, indicating that illumination normalization itself played a major role in the performance gain. The SNR-based filtering step provided additional improvement by excluding low-quality segments, thereby further enhancing the stability of BPM estimation. On the DLCN dataset, the MAE of the CHROM method under the most complex illumination condition (VI & VP) was reduced from approximately 18–19 BPM, as reported in prior benchmark studies, to 4.87 BPM, corresponding to an error reduction of more than 70%, while PTE6 increased substantially to 82.48%. Although performance degradation was observed in some scenarios compared to the deep-learning-based CSTR-Net, this gap can be attributed to the proposed approach being algorithm-based rather than learning-based. Nevertheless, achieving illumination-robust performance without any additional training highlights the practical significance of the proposed method for real-time and lightweight systems. Similar trends were observed on the MR-NIRP dataset, where the MAE of the POS and ICA methods in the Driving Small Motion scenario decreased relative to the baseline, demonstrating competitive performance compared to prior MR-NIRP-based studies and some deep-learning-based models. These results suggest that the proposed brightness normalization and SNR-based post-processing generalize not only to controlled environments but also to the driving conditions considered here, indicating the potential to extend the practical limits of algorithm-based rPPG.
The findings of this study suggest that the proposed method is effective in improving robustness to illumination changes; however, its benefit varied depending on the algorithm and scenario. Larger improvements were observed for algorithms that are sensitive to inter-channel color changes, such as CHROM and POS, whereas methods such as ICA still exhibited relatively large errors under some complex conditions. This may be because the proposed method primarily stabilizes the brightness component and does not address other performance-degrading factors, including motion, changes in reflection characteristics, and ROI tracking errors. Motion-intensive conditions were deliberately excluded from the MR-NIRP evaluation to isolate illumination effects, since large motion would introduce confounding factors and make it difficult to attribute performance changes specifically to illumination normalization. As a consequence, the present results have limited generalizability to real-world environments involving substantial motion. Broader evaluation against additional classical normalization methods and direct comparison with alternative quality-aware selection strategies were also not included in the present study. Future work should therefore explore the integration of illumination normalization with motion artifact suppression and ROI stabilization techniques, and evaluate the method against other normalization and quality-aware selection approaches.
This study adopts an algorithm-based approach focused on compensating for illumination variations, and therefore performance limitations may arise in environments involving large motion or in scenarios requiring learning-based representations. Although the proposed method showed strong BPM estimation performance, the temporal waveform remained unstable under some low-light conditions, which may limit its applicability to waveform-based physiological indicators such as HRV (Heart Rate Variability). However, waveform fidelity itself was not quantitatively analyzed in this study, and thus the severity of this instability could not be specified numerically. In addition, direct physiological validation of hemodynamic information preservation was not provided, although improved temporal stability, preserved color-direction structure, and consistent BPM gains across multiple rPPG algorithms provide indirect evidence that the method does not severely distort blood-flow-related color variations. Moreover, while embedded hardware implementation was not directly evaluated, the proposed method consists only of lightweight, deterministic normalization operations, suggesting its feasibility as a compact preprocessing block for embedded rPPG pipelines. Future work will therefore include waveform-level validation, integration with motion compensation, and extension to deep-learning-based and embedded rPPG systems.
6. Conclusions
In this study, we proposed an illumination normalization technique and an SNR-based signal selection strategy to mitigate the degradation of rPPG signal quality caused by brightness variations. The proposed illumination normalization selectively compensates for the luminance component while preserving the relative relationships among color components. The experimental results showed that after brightness normalization, the mean variation in the color channels remained within ±6–12%, indicating that the structural characteristics of the color information were preserved. In addition, the normalization process helped suppress global brightness fluctuations and flicker components caused by illumination changes by expanding the dynamic range and improving temporal stability.
This signal-level stabilization led to substantial improvements in heart rate estimation performance. On the DLCN dataset, under the most challenging illumination condition (VI&VP), the MAE of the CHROM method decreased from approximately 18–19 BPM reported in previous studies to 4.87 BPM, while the proportion of estimates within the tolerance range (PTE6) increased to 82.48%. Similar improvements were observed on the MR-NIRP dataset. In the Driving Small Motion scenario, the MAE of the POS and ICA algorithms decreased to 4.15 BPM and 1.43 BPM, respectively, showing performance gains under the driving conditions considered in this study.
Although a performance gap remains in some scenarios compared with learning-based models, the proposed approach improves robustness to illumination variations without requiring additional training or increased model complexity. Therefore, the proposed method can serve as an effective preprocessing strategy for existing rPPG algorithms and has strong potential for real-time, lightweight physiological signal measurement systems.