Contactless Cardiovascular Assessment by Imaging Photoplethysmography: A Comparison with Wearable Monitoring

Despite notable recent developments in the field of remote photoplethysmography (rPPG), extracting a reliable pulse rate variability (PRV) signal remains a challenge. In this study, eight image-based photoplethysmography (iPPG) extraction methods (GRD, AGRD, PCA, ICA, LE, SPE, CHROM, and POS) were compared in terms of pulse rate (PR) and PRV features. The algorithms were made robust to motion and illumination artifacts by using ad hoc pre- and postprocessing steps. Then, they were systematically tested on the public dataset UBFC-RPPG, containing data from 42 subjects sitting in front of a webcam (30 fps) while playing a time-sensitive mathematical game. The performances of the algorithms were evaluated by statistically comparing iPPG-based and finger-PPG-based PR and PRV features in terms of Spearman’s correlation coefficient, normalized root mean square error (NRMSE), and Bland–Altman analysis. The study revealed POS and CHROM techniques to be the most robust for PR estimation and the assessment of overall autonomic nervous system (ANS) dynamics by using PRV features in time and frequency domains. Furthermore, we demonstrated that a reliable characterization of the vagal tone is made possible by computing the Poincaré map of PRV series derived from the POS and CHROM methods. This study supports the use of iPPG systems as promising tools to obtain clinically useful and specific information about ANS dynamics.


Introduction
Heart rate (HR) and heart rate variability (HRV) are crucial markers of an individual's psychophysical status. The HR is the mean number of heartbeats in a time interval, generally one minute, and is expressed in beats per minute (BPM), whereas HRV refers to the fluctuations in the duration of the time intervals between consecutive heartbeats and reflects the ability of the autonomic nervous system (ANS) to maintain homeostasis. Alterations in HR are completely nonspecific and can be observed in many physiological and pathological conditions, including sleep, physical exercise, stress, anxiety, illness, and drug intake [1]. On the other hand, the analysis of HRV represents a useful tool for the overall characterization of autonomic dynamics and can provide useful information about alterations in vagal activity [2].
The state-of-the-art method to extract HRV is based on the acquisition of electrocardiographic (ECG) signals, placing multiple electrodes on specific anatomical body positions. Wearable monitoring devices based on photoplethysmography (PPG) do not impose this constraint. PPG techniques rely on noninvasive optical sensors applied in contact with the skin to measure the change in the cardiac blood volume pulse (BVP) (for example, from the finger, wrist, or ear of the subject) [3]. The BVP signal is obtained by measuring the variations of reflected light in the peripheral skin tissues: the amount of light reflected after penetrating the skin provides an indication of the changes in blood flow [4]. The pulsatile component of the PPG signal oscillates with the heart cycle period, and thus, the functional characteristics of pulse rate (PR) and pulse rate variability (PRV) extracted from the PPG signal are comparable to the HR and HRV extracted from the ECG [3,4]. Although PPG sensors have several benefits compared with ECG sensors (ease of use, low cost, convenience, etc.), they still need to be applied directly to the skin, constraining their feasibility in situations where free mobility is required, in the presence of skin damage (burn/ulcer/trauma), or when sensors are too obtrusive, for example, during the monitoring of preterm babies. Noncontact measurement of PR and PRV would be a potential solution for these practical issues, simplifying data acquisition in nonclinical scenarios (e.g., human-machine interaction [5], driver monitoring [6]), and allowing effective remote monitoring of patients [7]. Studies on contactless measurements of PR and PRV include thermal imaging [8], microwave Doppler effect [9], PR from speech [10], and image-based photoplethysmography (iPPG) [11]. In this study, we focused on PR and PRV extraction by making use of the remote-camera-based iPPG method.
In addition to the fact that the use of wearable sensors can be avoided using iPPG, the technique is easily scalable, since digital cameras are inexpensive and widely available. The iPPG method is based on similar principles as the PPG optical sensors. The pulse wave travels through the arterial vascular system, reaching various parts of the body and producing a short-term change in blood volume in the observed skin region, where the intensity of the absorbed light depends on the blood volume [12]. The BVP can therefore be estimated from the changes in the RGB channel intensities. For the iPPG signal acquisition, video sequences are usually taken from the subject's face due to the high blood supply and imaging simplicity.
In the recent scientific literature, several image processing methods have been proposed for the extraction of BVP signals from face videos [13][14][15]. The continuous and rapid development of iPPG extraction methods emphasizes the importance of comparing different methods for iPPG-based PR and PRV estimation [16].

iPPG Extraction Methods Short Description
- Green-Red Difference (GRD): the iPPG signal is estimated from the green channel, while the red channel is considered as containing artifacts [17].
- Adaptive Green-Red Difference (AGRD): an adaptive color difference operation between the green and red channels is applied to reduce motion artifacts [18].
- Principal Component Analysis (PCA): the most relevant information in the RGB data is expressed as a set of new orthogonal variables, called principal components [19].
- Independent Component Analysis (ICA): the RGB signals are decomposed by means of blind source separation, and the component with the most prominent peak in the PR bandwidth is chosen according to [22].
- Chrominance-Based (CHROM): a model of PPG-induced variations in color intensities is employed to improve motion robustness [23].
- Plane-Orthogonal-to-Skin (POS): an improved version of the CHROM method, which uses a projection plane orthogonal to the skin tone for pulse extraction [13].
- Laplacian Eigenmap (LE): unfolds the data distribution in a hyperdimensional space in order to reduce dimensionality [20].
- Stochastic Proximity Embedding (SPE): generates a one-dimensional Euclidean embedding of the RGB data in which the similarities between related observations are preserved [21].

Theoretical comparison can be complicated because iPPG extraction algorithms consist of multiple steps that are often nonuniformly described. To overcome this complexity, the proposed main steps of iPPG-based PR estimation according to [15] were applied. In addition to these main steps, a stepwise overview of PRV extraction was included.
The publicly available UBFC-RPPG dataset designed for iPPG analysis was used to test the performances of the eight algorithms [24]. Considering that the UBFC-RPPG dataset mimics realistic computer interaction, including rigid and nonrigid face movements and illumination variations, we aimed at optimizing the algorithms by applying ad hoc pre- and postprocessing steps to provide robustness to movements and illumination changes, and thus make image-based PPG applicable for daily use.
To summarize, the general objective of this study is to verify if the iPPG system can successfully provide clinically useful information on ANS regulation, in particular in the field of vagal activity quantification, where the study of HRV (and therefore of PRV) is considered crucial. To this end, we evaluated iPPG reliability in terms of PRV metrics in time, frequency, and nonlinear domains, in addition to the PR mean, which is less sensitive to errors due to its averaging effect and less specific regarding the quantification of ANS modulation. Considering that several different algorithms can be used for iPPG signal extraction, we compared eight methods (see Table 1) to investigate which algorithm shows the best performance in terms of PRV features in realistic human-machine interaction scenarios. Among these methods, SPE was tested for the first time in the field of iPPG signal processing. The iPPG-based PR and PRV metrics were compared with the state-of-the-art wearable technique, the finger clip PPG.

Dataset Description
The UBFC-RPPG dataset [24] consists of 42 videos acquired from 42 subjects. The dataset was created by making use of a custom C++ application for video acquisition, filming with a simple low-cost webcam (Logitech C920 HD Pro) at a resolution of 640 × 480 in uncompressed 8-bit RGB format at 30 fps. All experiments were conducted indoors with a varying amount of sunlight and indoor illumination. The subjects were seated one meter away from the camera while playing a time-sensitive mathematical game that aimed at increasing the heart rate while emulating a normal human-computer interaction scenario. Since the mathematical task required the full focus of the subjects, they kept looking at the screen during the experiment; therefore, the face always remained visible in the webcam-recorded video. To obtain the ground truth data, a pulse oximeter finger clip sensor (Contec Medical CMS50E) was used to acquire PPG signals synchronized with each video. The experimental setup and an exemplary frame of one video clip can be seen in Figure 1. The duration of the video recordings in the UBFC-RPPG dataset varies from 45.6 s (1368 frames) to 68.4 s (2052 frames).

iPPG Signal Processing
The proposed algorithmic framework for iPPG signal extraction makes use of eight different image-based extraction algorithms (GRD, AGRD, PCA, LE, SPE, ICA, CHROM, and POS), whose performances are optimized through ad hoc signal pre- and postprocessing steps. A sequence of RGB frames is taken as an input. Each frame consists of pixels with coordinates i,j, which can be described by vectors c_{i,j}(t) = (r_{i,j}(t), g_{i,j}(t), b_{i,j}(t))^T, i.e., the transposed red r_{i,j}(t), green g_{i,j}(t), and blue b_{i,j}(t) channels. The iPPG extraction algorithm comprises four substeps, as presented in the workflow image in Figure 2. To retrieve the final iPPG signal for accurate comparison with the finger-clip-extracted PPG signal, four substeps need to be completed: (1) ROI selection and RGB extraction, where the changing RGB signal in the skin pixels can be measured due to the passing blood pulse; (2) preprocessing, where the raw RGB signal is detrended and filtered to ensure blood pulse measurements in a physiological range; (3) iPPG extraction, where the filtered RGB signal is translated into the raw iPPG signal (in this study, this step was executed testing eight different algorithms to allow a performance comparison); and (4) postprocessing, where noise is removed from the raw iPPG signal. After this final step, the iPPG signal is ready for PR and PRV analysis. Concerning the second and the last substeps, the selection of the filtering processes is based on methods reported in the literature. These methods were experimentally tested on the UBFC-RPPG dataset, and only the pre- and postprocessing combinations with the best performances were applied. The combination of the SPA detrending with the MA filter and bandpass filter showed the most promising results for the preprocessing of the RGB signal.
On the other hand, the combination of the wavelet, bandpass, and EMD filters and the outlier suppression was found to be the most promising for the postprocessing of the initial iPPG signal. Each substep is extensively discussed below.

Figure 2. Flow chart of the four steps of blood volume pulse signal extraction using imaging photoplethysmography. A frame taken from one of the UBFC-RPPG videos is shown [24]; the result of face detection is shown in green.

Region of Interest (ROI) Selection and RGB Extraction
The purpose of this algorithmic step is to constitute the temporal RGB signals that are later used to estimate the iPPG signal. These raw signals are directly extracted from the region of interest (ROI). Selecting a suitable ROI is therefore a crucial and challenging step, as it has a direct impact on the reliability and accuracy of the overall algorithm [27]. The raw RGB signal extraction consists of four substeps: face detection, face tracking, skin masking, and raw RGB signal extraction.

1. Face detection: There are three main popular approaches for the ROI choice, namely taking a rectangular ROI encompassing the whole face [11,23,28,29], considering the forehead and cheekbone regions [30,31], or considering only the forehead [32]. Since, in the UBFC-RPPG dataset, hair covers the forehead region for several subjects, our approach considers the whole face region as the ROI. To detect the face and select the ROI, a facial rectangle is located in the first frame, employing a cascade classifier constructed using the Viola-Jones algorithm [33].

2. Face tracking: To eliminate the problem of rigid head motions, where the subject moves their head outside the defined ROI bounding box, a face tracking system is desirable. Given the high computational cost of the Viola-Jones algorithm, we use the Lucas-Kanade approach (KLT tracker) [34], tracking specific features of the face over time.

3. Skin masking: Only skin pixels contribute to the PR-related information. Therefore, skin masking is performed on every frame to filter out the nonskin pixels. The RGB-H-CbCr skin color model [35] is applied: a skin color map is determined based on the skin color distribution and applied to the chrominance segments of the input frames to identify pixels that appear to be skin.

4. Raw RGB signal extraction: The time-variant raw RGB signals c_0(t) = (r_0(t), g_0(t), b_0(t)) are produced by calculating the average pixel value of the skin pixels within the ROI for all frames over time:

c_0(t) = (1/|ROI|) Σ_{(i,j) ∈ ROI} c_{i,j}(t),

where |ROI| denotes the number of skin pixels in the ROI.
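As an illustration of the skin masking and averaging steps above, the sketch below (not the paper's implementation) uses a simplified daylight RGB skin rule as a stand-in for the full RGB-H-CbCr model; the thresholds are common illustrative values, not the paper's:

```python
import numpy as np

def skin_mask_rgb(frame):
    """Crude daylight RGB skin rule (a simplification of the RGB-H-CbCr model)."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20)
            & (frame.max(-1).astype(int) - frame.min(-1).astype(int) > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def raw_rgb_signal(frames):
    """c_0(t): mean RGB over the skin pixels of each frame (list of H x W x 3 uint8)."""
    out = np.empty((len(frames), 3))
    for t, frame in enumerate(frames):
        mask = skin_mask_rgb(frame)
        # fall back to the whole frame if no pixel passes the rule
        pixels = frame[mask] if mask.any() else frame.reshape(-1, 3)
        out[t] = pixels.mean(axis=0)
    return out
```

In practice, the mask would be applied inside the tracked face bounding box rather than on the full frame.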

Preprocessing of RGB Signals
Noise and artifacts are removed from the raw RGB signals, taking care not to suppress frequency components in the human heart rate bandwidth. The three preprocessing steps applied in this study are detrending, moving average (MA) filtering, and bandpass filtering [36]. Detrending of the raw RGB signal is a substantial step, as the pulsatile component of the iPPG signal has a much lower amplitude than the slowly varying baseline [17]. In this study, the smoothness priors approach (SPA) is applied according to [11,37]. The great advantage of this method over a simple mean-centering detrending is that the trend can be removed without affecting the PR bandwidth. Applying the moving average (MA) filter, smoothed signals c(t) = (r(t), g(t), b(t)) are obtained, in which the high-frequency noise is suppressed. The bandpass filter suppresses frequency components outside the PR bandwidth (0.65-4 Hz) [11,28].
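A minimal sketch of this preprocessing chain, assuming a Tarvainen-style smoothness-priors detrend (the regularization λ and MA length are illustrative, not the paper's values) followed by a third-order Butterworth bandpass on the PR band:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def spa_detrend(z, lam=300):
    """Smoothness priors detrending: estimate and subtract the low-frequency trend."""
    n = len(z)
    I = np.eye(n)
    D2 = np.diff(I, n=2, axis=0)          # second-order difference operator, (n-2) x n
    trend = np.linalg.solve(I + lam**2 * D2.T @ D2, z)
    return z - trend

def preprocess_channel(x, fps=30.0, ma_len=5, band=(0.65, 4.0)):
    """Detrend, moving-average smooth, then bandpass to the PR band (0.65-4 Hz)."""
    x = spa_detrend(np.asarray(x, float))
    x = np.convolve(x, np.ones(ma_len) / ma_len, mode="same")
    b, a = butter(3, [band[0] / (fps / 2), band[1] / (fps / 2)], btype="band")
    return filtfilt(b, a, x)              # zero-phase filtering preserves pulse timing
```

The dense solve in `spa_detrend` is fine for minute-long clips; a sparse formulation would be preferred for longer recordings.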

iPPG Signal Extraction
The aim of this step in the algorithm is to extract the raw iPPG signal, i.e., iPPG_0, as a weighted combination of the preprocessed RGB color signals:

iPPG_0(t) = w_r r(t) + w_g g(t) + w_b b(t),

where w_r, w_g, and w_b are, respectively, the weights of the R, G, and B color channel signals [15]. To compute these weights, different methods can be deployed. In this framework, eight different methods are compared: GRD, AGRD, PCA, ICA, LE, SPE, CHROM, and POS.
GRD: In the GRD method [17], the green channel is used for the extraction of iPPG, since it generally has the highest signal amplitude [38], whereas the red channel is considered to be an artifact reference. Thus, the iPPG signal can be computed as:

iPPG_0(t) = g(t) - r(t).

AGRD: The AGRD method [18] applies an adaptive color difference operation between the green and red channels, including an adaptive bandpass filter with the aim of removing residual motion artifacts within the iPPG signal. It should be noted that preprocessing is an essential step for the AGRD method because, otherwise, g(t) = g_0(t) and r(t) = r_0(t) will result in a zero iPPG signal.

PCA: The PCA procedure [19,28,39] is a linear dimensionality reduction technique that identifies patterns in the RGB signals in order to capture intensity variations due to blood pulses. In the PCA, a set of observed signals from correlated variables is projected onto a linearly uncorrelated orthogonal basis, called principal components:

p_i(t) = f_i^T X(t),

where X is the set of observed signals and f_i are the corresponding eigenvectors of the covariance matrix C = E(XX^T). The number of principal components is usually lower than the number of observed multivariate signals.

ICA: The ICA technique [11,22,40] is the most popular blind source separation technique for iPPG computation. It is used to separate unknown source signals S(t) from a set of observed mixed signals given by X(t) = AS(t), where A is the mixing matrix [14]. The approximated source signals can be found as Ŝ(t) = WX(t), where W is the separation matrix that approximates the inverse of A. ICA assumes that the components are statistically independent and non-Gaussian; the component with the most prominent peak in the PR bandwidth is then chosen.

CHROM: The CHROM method, as proposed by [23], aims for robustness to subject motion by employing a model of PPG-induced variations in color intensity.
For this technique, the iPPG signal is defined as:

iPPG_0(t) = x_1(t) - (σ_1(t, L) / σ_2(t, L)) x_2(t),

where the chrominance signals are x_1(t) = 3r(t) - 2g(t) and x_2(t) = 1.5r(t) + g(t) - 1.5b(t), with σ_1(t, L) and σ_2(t, L) being the L-point running standard deviations of x_1(t) and x_2(t), respectively. In the algorithm used for this framework, L = 1.6 s, as suggested in [23].

POS: The POS method [13] is quite similar to the CHROM method and can be considered its simplified and improved version. The iPPG signal is calculated as follows:

iPPG_0(t) = x_1(t) + (σ_1(t, L) / σ_2(t, L)) x_2(t).

Here, σ_1(t, L) and σ_2(t, L) are again the L-point running standard deviations of x_1(t) and x_2(t), respectively. However, the projected signals are now defined as x_1(t) = g(t) - b(t) and x_2(t) = g(t) + b(t) - 2r(t). Like in the CHROM method, L corresponds to 1.6 s, as suggested in [13].

LE: The LE method [20] is a technique aimed at unfolding a nonlinear data distribution in a hyperdimensional space, in order to reduce its dimensionality. When the approach is applied as the iPPG extraction method, it should increase the accuracy in the separation of the iPPG signal from residual sources of fluctuations in light. The LE algorithm maps the averaged RGB signals for the ith frame into a three-dimensional (R-G-B) space, and the final goal is to map their distribution onto a one-dimensional space, preserving the local relationship between data points. Firstly, the adjacency graph G is constructed by computing the Euclidean distance between the data points.
Nodes i and j of the graph G are considered adjacent if i is among the k-nearest neighbors of j and vice versa. Then, manifold learning is used to solve the following optimization problem:

min Σ_{i,j} (y_i - y_j)^2 W_{ij},

where y_i and y_j are two points which are nearest neighbors in the low-dimensional space, and W_{ij} is the weight that measures the closeness of the points x_i and x_j in the higher-dimensional space. For the extraction of the iPPG signal, the parameter k is usually set at 12 [20]. In our study, we tested the LE algorithm using different distance metrics (Euclidean, city-block, Chebyshev, Minkowski, and Mahalanobis) and changing the value of the parameter k. According to the experimental results on the UBFC-RPPG dataset, the Mahalanobis distance with k = 9 proved the most promising.

SPE: SPE [21] is a self-organizing algorithm used to produce low-dimensional embeddings that preserve similarities between a set of related observations. In this study, the SPE approach is applied for the first time to RGB data. In our case, the similarities are the fluctuations in the RGB signal intensities due to blood pulsation, from which the iPPG signal can be estimated. The method starts with an initial configuration and iteratively refines it by randomly selecting two points x_a and x_b and adjusting their coordinates so that their Euclidean distance d_{a,b} on the map matches the corresponding proximity r_{a,b} more closely. To avoid oscillatory behavior, the magnitude of the adjustments is controlled by a learning rate parameter, λ, which decreases during the refinement. The coordinates are updated by:

x_a ← x_a + (λ/2) (r_{a,b} - d_{a,b}) / (d_{a,b} + ε) (x_a - x_b)

and

x_b ← x_b + (λ/2) (r_{a,b} - d_{a,b}) / (d_{a,b} + ε) (x_b - x_a),

where ε is a small value to avoid division by zero.
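The CHROM and POS projections can be sketched as follows. For brevity, a single global standard deviation replaces the L-point running standard deviation of the original methods, and the input is assumed to be the already preprocessed (zero-mean) channel signals, so this is a simplified illustration rather than the paper's implementation:

```python
import numpy as np

def chrom(rgb):
    """CHROM chrominance projection; global std in place of the L-point running std."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    x1 = 3 * r - 2 * g
    x2 = 1.5 * r + g - 1.5 * b
    return x1 - (x1.std() / x2.std()) * x2

def pos(rgb):
    """POS projection (plane orthogonal to skin tone); same simplification."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    x1 = g - b
    x2 = g + b - 2 * r
    return x1 + (x1.std() / x2.std()) * x2
```

Both functions take a T x 3 array of per-frame mean RGB values and return a length-T pulse signal; restoring the running standard deviations recovers the adaptive behavior of the published methods.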

Postprocessing of iPPG Signals
Postprocessing of the photoplethysmographic signal iPPG_0, extracted from the RGB channels, is the fourth and last algorithmic step. Postprocessing improves the quality of the signal and can be especially useful when artifacts and noise were not (completely) removed at the preprocessing step. Within this framework, wavelet-based bandpass filtering, the EMD procedure, and outlier suppression are considered.

1. Wavelet filtering: An adaptive two-step wavelet filtering [13,17,18,41] is applied, assuming that the frequency components of the iPPG_0 signal related to noise have weaker power than the components related to the PR. The first step of the method is to perform a continuous wavelet transform of the iPPG_0 signal. Here, the wavelet coefficients within a wide Gaussian window, centered at the scale corresponding to the maximum of the squared wavelet coefficients, are averaged over a 15 s temporal running window. Secondly, a general Gaussian filter is applied. To reconstruct the iPPG signal, the inverse continuous wavelet transform is performed [41].

2. Empirical mode decomposition: The purpose of empirical mode decomposition (EMD) [42] is to split the iPPG_0 signal into a noise component and a PR-related component. The EMD technique decomposes the signal into several unique intrinsic mode functions (IMFs) and one residue function according to:

iPPG_0(t) = Σ_{i=1}^{n} IMF_i(t) + R_n(t),

where IMF_i(t) is the ith IMF at time step t, R_n(t) is the residue function at time step t, and n is the number of EMD iterations. The extracted IMF signal is the filtered main signal, making the peak frequency more distinct and, thereby, the iPPG signal more reliable for PR and PRV metric extraction.

3. Outlier suppression: An MA filter was again applied to smooth the signal and suppress the high-frequency peaks that correspond to noise.

Figure 3 shows the iPPG signal extracted from the video clip of the first subject of the UBFC-RPPG dataset by using the POS method.

Pulse Rate and Pulse Rate Variability Analysis
In this study, we compared cardiovascular metrics extracted from iPPG and finger PPG through the study of PR and PRV. In order to extract such information, the interbeat interval series (IBI) was extracted from both PPG signals, estimating the time intervals between the successive systolic peaks [43,44]. Ectopic beats were corrected by using the Kubios HRV tool, obtaining the series of normal-to-normal (NN) intervals [45]. Each IBI corresponds to a cardiac cycle, and thus, the PR is equal to the inverse of the IBI duration. The average PR is the mean value of PR in the whole signal duration. In order to obtain a comprehensive characterization of the performances of the two techniques (contactless and wearable) for the assessment of cardiovascular dynamics, we analyzed not only the average PR but also the features extracted from the PRV in the time and frequency domains. Concerning the time domain, two statistical indexes were computed: the root mean square of the successive differences (RMSSD) and the standard deviation of NN intervals (SDNN). In addition, we computed two geometrical parameters based on the study of the probability density distribution of NN intervals: the triangular index (TI) and the triangular interpolation of the NN intervals (TINN).
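The statistical time-domain indexes above can be computed directly from the NN series; a minimal sketch (NN intervals assumed in milliseconds):

```python
import numpy as np

def time_domain_prv(nn_ms):
    """SDNN, RMSSD, and average PR from a series of NN intervals in milliseconds."""
    nn = np.asarray(nn_ms, dtype=float)
    sdnn = nn.std(ddof=1)                         # overall variability (ms)
    rmssd = np.sqrt(np.mean(np.diff(nn) ** 2))    # short-term, vagally mediated (ms)
    mean_pr = 60000.0 / nn.mean()                 # average pulse rate (BPM)
    return {"SDNN": sdnn, "RMSSD": rmssd, "PR": mean_pr}
```

The geometrical indexes (TI, TINN) additionally require the histogram of the NN distribution and are omitted here.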
As regards the frequency domain, we considered the power in three main bandwidths: very low frequency (VLF, below 0.04 Hz), low frequency (LF, between 0.04 Hz and 0.15 Hz), and high frequency (HF, between 0.15 and 0.4 Hz). The normalized values of the power in the LF and HF bands were also computed (LFnu = LF/(LF + HF), HFnu = HF/(LF + HF)), together with the LF/HF ratio.
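A sketch of the band-power computation: the unevenly sampled NN series is interpolated onto a uniform 4 Hz grid (a common, though not unique, choice; the paper's exact estimator is not specified here) and a Welch periodogram is integrated over each band:

```python
import numpy as np
from scipy.signal import welch
from scipy.interpolate import interp1d

def band_powers(nn_ms, fs=4.0):
    """VLF/LF/HF powers from NN intervals, resampled to a uniform grid for Welch PSD."""
    nn = np.asarray(nn_ms, dtype=float)
    t = np.cumsum(nn) / 1000.0                       # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs)
    x = interp1d(t, nn)(grid)                        # evenly resampled tachogram
    f, pxx = welch(x, fs=fs, nperseg=min(256, len(x)))
    df = f[1] - f[0]
    def power(lo, hi):
        sel = (f >= lo) & (f < hi)
        return pxx[sel].sum() * df                   # rectangle-rule band integral
    vlf, lf, hf = power(0.0033, 0.04), power(0.04, 0.15), power(0.15, 0.4)
    return {"VLF": vlf, "LF": lf, "HF": hf,
            "LFnu": lf / (lf + hf), "HFnu": hf / (lf + hf), "LF/HF": lf / hf}
```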
Applying chaos theory methodologies to characterize PRV nonlinear dynamics, we studied the Poincaré first return map of the NN interval series. We quantified the point dispersion in the Poincaré map by using the ellipse fitting method [46]. The features SD1 and SD2 represent the standard deviation of the points along the short and long axis of the ellipse that best fits the data in the Poincaré map, and SD1/SD2 is their ratio [46,47].
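SD1 and SD2 can be obtained from the Poincaré pairs through the standard rotated-axes identities, without an explicit ellipse fit:

```python
import numpy as np

def poincare(nn_ms):
    """SD1/SD2 from the Poincare map of successive NN intervals."""
    nn = np.asarray(nn_ms, dtype=float)
    x, y = nn[:-1], nn[1:]                 # (NN_i, NN_{i+1}) pairs
    sd1 = ((y - x) / np.sqrt(2)).std(ddof=1)   # dispersion along the short axis
    sd2 = ((y + x) / np.sqrt(2)).std(ddof=1)   # dispersion along the long axis
    return sd1, sd2, sd1 / sd2
```

Note that SD1 coincides with the standard deviation of successive differences scaled by 1/√2, which ties it to the short-term (vagal) variability captured by RMSSD.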
An overview of the PRV metrics is given in Table 2.

Quality Metrics
To assess the reliability of the estimated PRV features extracted from iPPG, we compared them with the related finger clip PPG reference values. The following statistical analyses were applied: Spearman's correlation analysis, Bland-Altman analysis, and normalized root mean square error (NRMSE) computation. The Spearman correlation analysis was applied to quantify the monotonic relationship between the estimated (iPPG-based) and reference (finger clip PPG-based) PR and PRV metrics. The use of such a nonparametric test is justified by the non-Gaussian distribution of the samples (p < 0.05 for the null hypothesis of Gaussianity in the Kolmogorov-Smirnov test [48]). The Spearman correlation coefficient ρ and the related p-values were extracted from each correlation test.
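A toy example of the correlation test with hypothetical feature values (the numbers below are illustrative, not from the dataset); `scipy.stats.spearmanr` returns both ρ and its p-value:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-subject values of one PRV feature (e.g., SDNN in ms)
ref = np.array([45.0, 52.0, 38.0, 61.0, 49.0, 57.0])   # finger clip PPG reference
est = np.array([44.0, 55.0, 36.0, 60.0, 50.0, 59.0])   # iPPG-based estimate

rho, pval = spearmanr(ref, est)   # monotonic association, no Gaussian assumption
```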
To measure the quality of fit between the estimated (iPPG-based) and the reference (finger clip PPG-based) metrics, the normalized root mean square error was calculated [49]. The RMSE between the estimated values x_iPPG,i and the reference values x_PPG,i over the N subjects is

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (x_iPPG,i - x_PPG,i)^2 ).

The normalization of the RMSE facilitated the performance comparison between the different extraction methods.
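A sketch of the error computation; here the RMSE is normalized by the range of the reference values, which is one common convention (the paper's exact normalizer is an assumption in this sketch):

```python
import numpy as np

def nrmse(est, ref):
    """RMSE normalized by the range of the reference values."""
    est, ref = np.asarray(est, float), np.asarray(ref, float)
    rmse = np.sqrt(np.mean((est - ref) ** 2))
    return rmse / (ref.max() - ref.min())
```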
The Bland-Altman analysis was used to assess the agreement between the feature values obtained from the iPPG and finger clip PPG methods. Each point in the Bland-Altman plot has, on the x-axis, the mean of the two measurements for a subject, (x_iPPG + x_PPG)/2, and, on the y-axis, the difference between them, d = x_iPPG - x_PPG. The analysis quantifies the bias between the two methods through the mean difference and the 95% limits of agreement (LoA), calculated as:

LoA(95%) = d̄ ± 1.96SD,

where d̄ is the mean difference between the iPPG-based estimation and the finger clip PPG-based reference value, and SD is the standard deviation of the differences [50].
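The bias and limits of agreement reduce to a few lines:

```python
import numpy as np

def bland_altman(est, ref):
    """Mean bias and 95% limits of agreement between paired measurements."""
    d = np.asarray(est, float) - np.asarray(ref, float)
    bias = d.mean()
    half_width = 1.96 * d.std(ddof=1)
    return bias, (bias - half_width, bias + half_width)
```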

Results
For all the different iPPG signal extraction methods, including GRD, AGRD, PCA, LE, SPE, ICA, CHROM, and POS (see Section 2.2.3), we extracted the PRV features listed in Table 2. For each of these methods, we employed the Spearman correlation coefficient (ρ) with its p-value, the NRMSE, and the Bland-Altman analysis, including the mean bias and its 95% confidence intervals (expressed as d̄ ± 1.96SD). Table 3 summarizes the results of the Spearman correlation analysis on the PRV features, showing the correlation coefficients (ρ) and their statistical significance (p-value) for each iPPG extraction method. Considering the averaged PR, Spearman's correlation coefficient was higher than 0.9 in five of eight methods (all except PCA, LE, and SPE). We also obtained strong monotonic relationships (ρ > 0.7) with statistical significance (p < 0.05) for SDNN in the time domain, VLF-power and LF-power in the frequency domain, and SD2 and SD1/SD2 in the nonlinear domain. The CHROM and POS methods correlated strongly over all the above-mentioned features with, respectively, ρ = 0.972 and ρ = 0.994 for PR, ρ = 0.768 and ρ = 0.818 for SDNN, ρ = 0.935 and ρ = 0.939 for VLF-power, ρ = 0.831 and ρ = 0.903 for LF-power, ρ = 0.882 and ρ = 0.936 for SD2, and finally, ρ = 0.734 and ρ = 0.698 for SD1/SD2 (p-value < 0.001 for all). Table 4 summarizes the calculated values of NRMSE per iPPG extraction method for all subjects. When the NRMSE approaches zero, the iPPG-based extraction method fits the ground truth. A method was considered a good fit when its mean NRMSE was below 0.20. For all features, high performance was obtained, except for the HF-power and the LF/HF ratio in the frequency domain.

Normalized Root Mean Square Error
Concerning the time-domain analysis, the POS approach performed best, returning the lowest NRMSEs for all the features except the RMSSD values, which were better fitted by using the GRD method. Moving to the frequency domain, the NRMSEs were higher than in the time domain. Also in this case, the most promising results were obtained through the POS approach, with NRMSE < 0.20 for VLF-power, LF-power, HFnu, and LFnu. Interestingly, observing the results of the Poincaré map features (SD1, SD2, and SD1/SD2), a good performance of the GRD and POS methods can be noticed.

Bland-Altman Analysis
Bland-Altman analysis was applied to quantify the agreement between the PRV parameters derived from iPPG and finger clip PPG [51]. In Table 5, the Bland-Altman mean differences, d̄, and the confidence intervals, [d̄ ± 1.96SD], are reported. Table 5 reveals a close agreement between the iPPG- and PPG-based PRV feature extractions, especially in the time domain and for nonlinear features. Considering the PR, the LE method showed the smallest mean difference (d̄ = −0.0409 BPM), but the corresponding 95% limits reported a large variation between the estimations. On the other hand, the POS method showed more robustness, with a mean difference of −0.186 BPM and a corresponding 95% confidence interval from −4.05 to 3.68 BPM (the corresponding Bland-Altman plot is reported in Figure 4). Regarding RMSSD, the GRD method showed the smallest mean bias and 95% confidence interval. Considering SDNN, TI, and TINN, the CHROM method showed the smallest mean biases; however, the POS method, even if presenting higher d̄ values, reduced the corresponding variations in the 95% limits. Based on the results of the Bland-Altman analysis, the POS and CHROM methods were also found to be the best for frequency features, except for VLF-power, where the lowest mean difference and confidence interval were found by applying the ICA method. Concerning the nonlinear techniques, the GRD method showed the highest agreement for SD1, with a mean bias of −2.37 ms and a corresponding 95% confidence interval from −31.6 to 26.8 ms. Regarding SD2, the POS method showed the highest accuracy, with a mean bias of −2.47 ms and 95% limits of agreement from −19.1 to 14.2 ms. As for the ratio of SD1 to SD2, the smallest biases were found using the GRD and POS methods (−0.163 and −0.169, respectively), but the narrowest confidence interval was obtained with the CHROM approach (from −0.491 to 0.136), implying more robustness; the CHROM method showed a corresponding mean bias of −0.177 for SD1/SD2.

Discussion and Conclusions
We evaluated contactless iPPG performance, comparing it with finger clip PPG, which is considered the ground truth for wearable monitoring [52,53]. The publicly available UBFC-RPPG dataset, including 42 subjects, was used to facilitate the replication of the results. The realistic human-machine interaction mimicked in the UBFC-RPPG dataset is a big step toward daily use of contactless iPPG, as face motion (rigid and nonrigid), illumination variations, and subjects with make-up are included. The previous literature in the field of remote PPG usually validates the PPG signal extraction methods considering only the average of the PR. This procedure turns out to be incomplete and misleading, since, in this way, the artifacts are averaged out, resulting in an overall better agreement. A recent study compared the PRV extracted from iPPG with the HRV obtained from ECG during a specific experimental protocol in a controlled setting, looking at the reliability of the PRV features [50]. In this study, we extracted the PRV signals from the UBFC-RPPG videos, and we investigated the reliability of PRV features by comparing them with the corresponding wearable PPG values. We analyzed the PRV signals in both the time and frequency domains [2], together with the parameters extracted through the Poincaré first return map, a technique taken from the theory of nonlinear time series analysis [46,47].
To investigate the feasibility of contactless iPPG-based PRV feature extraction in realistic daily settings, the iPPG extraction procedure was made more robust against motion and illumination artifacts by applying ad hoc pre- and postprocessing steps, as described in Sections 2.2.2 and 2.2.4. Eight different methods for iPPG signal extraction were used (see Section 2.2.3 for details): GRD, AGRD, PCA, LE, SPE, ICA, CHROM, and POS. These approaches included both linear (e.g., GRD) and nonlinear (e.g., LE) techniques already proposed in the past literature, as well as methodologies used here for the first time to process RGB images (i.e., SPE).
To evaluate the agreement between the contactless iPPG- and wearable PPG-based PRV features, we performed three statistical analyses: the Spearman correlation test, the NRMSE calculation, and the Bland–Altman analysis. Considering the results obtained from the three statistical analyses (see Tables 3–5), we were able to reach high performances in the computation of the mean PR, especially with the GRD, AGRD, ICA, CHROM, and POS methods. Looking at the standard PRV features, the POS method proved to be the most reliable in both the time and frequency domains, yielding the largest number of features with the highest correlation coefficients, the lowest NRMSEs, and the narrowest confidence intervals in the Bland–Altman analysis. It showed the best results for the features related to overall autonomic activity (e.g., SDNN, TINN, VLF-power, and LF-power), which have no specific physiological correlates in terms of sympathetic and parasympathetic modulation.
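The two non-graphical agreement metrics can be sketched as follows; the SDNN values are hypothetical, and the NRMSE here is assumed to be normalized by the range of the reference series (other normalizations, e.g., by the mean, are also common):

```python
import numpy as np
from scipy.stats import spearmanr

def nrmse(reference, estimate):
    """RMSE normalized by the range of the reference values."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    rmse = np.sqrt(np.mean((estimate - reference) ** 2))
    return rmse / (reference.max() - reference.min())

# Hypothetical SDNN values (ms): finger PPG vs. iPPG
ppg_sdnn  = [45.0, 62.3, 38.1, 70.4, 55.0]
ippg_sdnn = [47.2, 60.0, 40.5, 68.1, 53.9]
rho, p_value = spearmanr(ppg_sdnn, ippg_sdnn)  # rank correlation
err = nrmse(ppg_sdnn, ippg_sdnn)
```

Because Spearman's coefficient operates on ranks, a perfectly monotone relationship yields rho = 1 even when absolute values differ, which is why the NRMSE and Bland–Altman analyses are needed as complements.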
However, the real power of HRV and PRV as tools for cardiovascular assessment lies in the strong association between vagal activity and specific features related to the short-term variability of such time series, for example, RMSSD and the HF-power-related features.
Analyzing the values of the normalized HF-power (HF-power nu), we noticed an acceptable level of agreement (NRMSE = 0.19), a low bias (d = −1.26), and a significant correlation between iPPG and wearable PPG, albeit with a correlation coefficient of only 0.49.
The study of the Poincaré map allowed us to broaden the investigation of the reliability of iPPG as a technique for the assessment of autonomic regulation. In fact, by adding to the list of standard features those calculated from the shape of the Poincaré plot using the ellipse fitting method, it is possible to evaluate two other parameters traditionally linked to vagal activity: SD1 and SD1/SD2 [54,55]. As for SD1, the most promising findings were obtained by applying the GRD method, but the Spearman correlation coefficient was only 0.4. The CHROM and POS methods outperformed the other methods for the computation of SD2 and SD1/SD2. The iPPG-derived SD1/SD2 proved to be the vagal-correlated parameter with the highest reliability, showing Spearman correlation coefficients of 0.7 and 0.73 when the POS and CHROM methods were used, respectively.
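The ellipse-fitting descriptors of the Poincaré plot can be computed directly from the inter-beat interval series, as in this minimal sketch (the NN series is hypothetical):

```python
import numpy as np

def poincare_sd(nn_intervals_ms):
    """SD1, SD2, and SD1/SD2 from the Poincaré first return map
    of successive inter-beat intervals (ellipse-fitting formulation)."""
    x = np.asarray(nn_intervals_ms, dtype=float)
    # Project each point (x[n], x[n+1]) onto the axes of the fitted ellipse:
    d1 = (x[1:] - x[:-1]) / np.sqrt(2)   # perpendicular to the identity line
    d2 = (x[1:] + x[:-1]) / np.sqrt(2)   # along the identity line
    sd1 = d1.std(ddof=1)                 # short-term variability (vagally mediated)
    sd2 = d2.std(ddof=1)                 # long-term variability
    return sd1, sd2, sd1 / sd2

# Hypothetical NN series (ms)
nn = [812, 790, 805, 833, 798, 820, 809]
sd1, sd2, ratio = poincare_sd(nn)
```

SD1 captures beat-to-beat dispersion and SD2 the dispersion along the identity line; their ratio is the dimensionless shape descriptor whose agreement is discussed above.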
Our results suggest the use of the POS and CHROM techniques for the extraction of the PPG signal from ultrashort videos, since the related PRV features are the most reliable when compared with those obtained with wearable devices. In fact, using these approaches, we showed that it is possible to characterize both overall autonomic dynamics (by computing SDNN or LF-power) and specific vagal dynamics (by extracting SD1/SD2). Approaches based on nonlinear techniques for dimensionality reduction (LE and SPE) led to less promising results. This poor performance could be due to the short duration of the RGB signals (approximately 1 min 8 s).
To conclude, we can state that an iPPG system can successfully provide clinically useful information about autonomic dynamics, even when ultrashort video clips (less than 5 min [53,56]) are recorded. The possibility of extracting reliable information about changes in vagal activity through the Poincaré analysis of PRV signals obtained from iPPG techniques would make it possible to monitor the progress of many pathologies in a completely contactless manner. It is known, in fact, that reduced vagal activity correlates with various cardiovascular diseases, including heart failure, ischemic injury, arrhythmia, and hypertension [57], as well as with metabolic [58] and mental disorders [59]. The remarkable agreement found for the SD1/SD2 feature might justify and promote the use of nonlinear methods, and especially the Poincaré map, as a reliable tool for iPPG analysis and the investigation of vagal activity. Cardiovascular assessment through common, cheap, and contactless devices would significantly improve the possibilities and conditions of monitoring, for example, in home monitoring applications, in periods when physical contact is not recommended (possible pandemics), and in subjects for whom even wearable devices are too obtrusive (infants and newborns).

Limitations and Future Work
The main limitations of our study concern the RGB signal extraction. In our study, the whole face was chosen as the ROI. To automate the ROI detection, the Viola–Jones (face detection) and KLT tracker (face tracking) algorithms were applied. The downside of this approach is that the face must always be visible; otherwise, the algorithm fails to find the ROI. Moreover, to create the skin mask, a simple thresholding method based on an empirical skin color distribution (RGB-H-CbCr model) was applied. However, when, for example, hair falls in the same color range, it is not excluded from the skin mask, disturbing the PPG extraction. In future works, making use of adaptive skin segmentation might benefit the quality of the RGB signal [27,60].
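The thresholding limitation can be illustrated with a simplified sketch that keeps only the Cb/Cr part of the decision; the bounds and pixel values below are illustrative assumptions, not the model actually used in the study:

```python
import numpy as np

def skin_mask_cbcr(rgb):
    """Boolean skin mask from fixed Cb/Cr thresholds, in the spirit of
    empirical skin-color models (the bounds here are illustrative only)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 RGB -> Cb/Cr conversion (full-range, offset 128)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

# Toy 1x2 "image": a skin-like pixel and a dark, hair-like brown pixel
img = np.array([[[200, 140, 120], [60, 45, 35]]], dtype=np.uint8)
mask = skin_mask_cbcr(img)
```

With these bounds, the hair-like brown pixel also falls inside the Cb/Cr box and is wrongly kept in the mask, which is exactly the failure mode described above and the motivation for adaptive skin segmentation.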
Our future research in the field of rPPG will also be directed towards the investigation of the influence of the recording length on the performance of nonlinear embedding dimensionality reduction techniques, such as LE or the SPE method proposed here. Indeed, considering that in previous works the LE method was reported to be the most promising technique [14], we hypothesize that the recording length could be a probable cause of the poor performance we found. On the other hand, considering that the CHROM and POS methods performed best on the UBFC-RPPG dataset, our future analyses will be oriented towards the application of these methods to other datasets with different light conditions and durations, as well as towards the optimization of these algorithms, to test whether the introduction of specific changes could further improve their performance. Moreover, in future studies, we will always test the reliability of existing and new algorithms as a function of the quality of the videos, using the algorithms for image quality assessment present in the literature [25,26,61].

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

AGRD    Adaptive green-red difference
ANS     Autonomic nervous system
CHROM   Chrominance-based
GRD     Green-red difference