Assessment of ROI Selection for Facial Video-Based rPPG

In general, facial image-based remote photoplethysmography (rPPG) methods use color-based and patch-based region-of-interest (ROI) selection methods to estimate the blood volume pulse (BVP) and beats per minute (BPM). Anatomically, the thickness of the skin is not uniform across the face, so the same diffuse reflection information cannot be obtained in each area. In recent years, various studies have presented experimental results for their ROIs but did not provide a valid rationale for the proposed regions. In this paper, to examine the effect of skin thickness on the accuracy of rPPG algorithms, we conducted an experiment on 39 anatomically divided facial regions. Experiments were performed with seven algorithms (CHROM, GREEN, ICA, PBV, POS, SSR, and LGI) on the UBFC-rPPG and LGI-PPGI datasets, considering 29 selected regions and two adjusted regions out of the 39 anatomically classified regions. We propose a BVP similarity evaluation metric to find regions with high accuracy. We conducted additional experiments on the TOP-5 and BOT-5 regions and present the validity of the proposed ROIs. The TOP-5 regions showed relatively high accuracy compared with the ROIs of previous algorithms, suggesting that the anatomical characteristics of the ROI should be considered when developing a facial image-based rPPG algorithm.


Introduction
Cardiovascular disease (CVD) is a disease that affects the heart and the body's vascular system. Most cardiovascular diseases are long-lasting chronic conditions, and there is a lack of appropriate measures to continuously monitor and prevent them [1]. To prevent CVD, vital signs such as the electrocardiogram, heartbeat, and blood pressure must be continuously monitored, and professional instruments, such as an IR-UWB heart rate monitor or an invasive blood pressure monitor, are required to measure them. However, these devices are intended for professional use, are expensive, and are not suitable for home use. In addition to professional measuring instruments, vital signs such as heart rate and blood pressure can be inferred from an electrocardiogram (ECG). Although electrocardiography is the most accurate method, photoplethysmography (PPG) has been developed as an inexpensive and simple way to infer the heartbeat. PPG is 98% similar to ECG and is an optical technology that requires a single sensor [2]. PPG has become common in recent years and is widely used in wearable vital sign measuring devices, such as smartwatches.
Recently, research on noncontact technology has been progressing beyond contact-type devices, such as wearable devices and heart rate monitors. PPG measurement using a facial image is called remote PPG (rPPG) or face PPG (fPPG); rPPG can be measured with only an RGB video camera. Research on rPPG technology has built on the PPG technology of the oximeter. PPG noninvasively acquires the pulse waveform of blood vessels by exploiting the optical properties of blood vessel changes under the skin, and it is used to determine the state of the heartbeat. According to Beer-Lambert's law [3], the absorbance of a single compound is proportional to its concentration. Hemoglobin absorbs strongly at green wavelengths (around 532 nm), and PPG exploits the fact that biological tissue reflects and transmits part of the light that passes through the body. The rPPG measurement method using an RGB camera is based on the fact that the value extracted from the ROI in each frame is similar to the PPG waveform [4]. Figure 1 shows the light absorption of deoxyhemoglobin (HHb), oxyhemoglobin (O2Hb), and carbaminohemoglobin (COHb), which are the most abundant in blood. The amount of light absorbed depends on the wavelength of the light: absorption is greatest at 400-440 nm, and hemoglobin also absorbs strongly in the green band captured by the camera's green channel. This wavelength-dependent absorption changes the diffuse reflection, which in turn changes the information received by the RGB camera.
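As a reference for the Beer-Lambert relation cited above, the law is commonly written as follows. The symbols A (absorbance), ε (molar absorption coefficient), c (concentration), and l (optical path length) are standard notation assumed here; they are not taken from the paper.

```latex
A(\lambda) = \varepsilon(\lambda)\, c\, l
```

Under this relation, for a fixed wavelength and path length, a change in hemoglobin concentration changes the absorbance proportionally.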
Representative rPPG methods include ICA [5], GREEN [6][7][8][9], CHROM [10], POS [11], SSR [12], PBV [13], and LGI [14]. ROI selection is largely divided into color-based skin detection and designation of a chosen area, and neither approach has a clear rationale. In this paper, seven representative rPPG methods are evaluated with pyVHR [15], and the accuracy obtained with the ROI proposed by each method is compared with that of the proposed ROIs. Experiments are conducted on publicly available datasets, LGI [14] and UBFC [16], and suggest that the proposed ROIs yield higher accuracy.
The main contributions of this work are:
• Proposal of 31 ROIs, selected on an anatomical basis, that can be used in rPPG methods.
• Proposal of a relative BVP similarity (rBS) metric for performance evaluation across ROIs.
• Performance evaluation of combinations of the TOP-5 and BOT-5 ROIs ranked by rBS.
The software is available on GitHub (https://github.com/TVS-AI/Pytorch_rppgs, accessed on 26 November 2021) for experimentation. This paper is organized as follows. Section 2 describes the rPPG methods. Section 3 describes the ROIs of the algorithms in Section 2 and the proposed regions of interest. Section 4 presents the experimental results, and Section 5 provides a conclusion.

rPPG (Remote Photoplethysmography)
The pixels extracted from a face image taken with an RGB camera contain facial appearance information, noise, and BVP values. Various methods have been studied for extracting the BVP by analyzing this raw signal, in which several kinds of information are combined. Table 1 summarizes representative rPPG methods. The POS and CHROM methods show a relatively small spread of MAE and PCC values and can produce highly accurate results [15].

Table 1. Representative rPPG methods (method: characteristic).
GREEN [6][7][8][9]: The green channel is preferred for BVP extraction because it carries more diffuse reflection information from hemoglobin than the other channels. In [17], an attempt was made to visualize the pulse change by maximizing the amount of change in the green channel.
ICA [5]: A method of splitting a multidimensional signal into multiple components. The whitening matrix was obtained using Jacobi rotations, and the original signal was separated by multiplying the whitening matrix by the mixed signal. In [5], the mixed signal was separated into four independent components using the JADE method, and empirically, the second signal was used as the PPG signal.
CHROM [10]: The CHROM method removes noise caused by light reflection through color difference channel normalization.
SSR [12]: The SSR method is based on the absorbance of hemoglobin. Its use of subspace rotation and temporal rotation extends the pulse amplitude and reduces the distortion caused by light reflection.
POS [11]: The POS ("plane orthogonal to skin") method aims to reduce the specular noise problem of CHROM. A PPG signal is generated by projecting the temporally normalized RGB signal onto the plane orthogonal to the skin tone.
PBV [13]: It proposes a pulse blood vector that distinguishes pulse-induced color changes from motion noise in the RGB source.
LGI [14]: It proposes an algorithm that is robust in various environments by using differentiable local transformations.
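The projection described for POS in Table 1 can be sketched in plain Python. This follows the commonly published formulation of POS (temporal normalization, projection with the matrix [[0, 1, -1], [-2, 1, 1]], alpha tuning, and overlap-adding), not the pyVHR implementation used in this paper. The function name `pos_bvp`, the 1.6 s default window, and the input layout (one spatially averaged RGB triple per frame) are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def pos_bvp(rgb, fps, window_sec=1.6):
    """Estimate a BVP trace from per-frame mean RGB values of an ROI.

    rgb: list of (r, g, b) tuples with positive values, one per frame.
    fps: frame rate of the video.
    Returns a list with one BVP sample per frame.
    """
    n = len(rgb)
    win = max(2, math.ceil(window_sec * fps))
    bvp = [0.0] * n
    for start in range(0, n - win + 1):
        block = rgb[start:start + win]
        # Temporal normalization: divide each channel by its mean in the window.
        means = [mean(frame[c] for frame in block) for c in range(3)]
        norm = [[frame[c] / means[c] for c in range(3)] for frame in block]
        # Projection onto the plane orthogonal to the skin tone.
        s1 = [g - b for _, g, b in norm]
        s2 = [-2.0 * r + g + b for r, g, b in norm]
        # Alpha tuning: weight the second projection by the ratio of spreads.
        sd2 = pstdev(s2)
        alpha = pstdev(s1) / sd2 if sd2 > 0 else 0.0
        h = [a + alpha * b for a, b in zip(s1, s2)]
        # Overlap-add the zero-mean window result into the output trace.
        h_mean = mean(h)
        for i, value in enumerate(h):
            bvp[start + i] += value - h_mean
    return bvp
```

The input would typically be the per-frame average color of one ROI mask, so the same function can be applied to each candidate region in turn.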

Typical ROI Methods
A facial image-based rPPG algorithm must first find the face region and then select an ROI within it for efficient signal extraction. Two main methods are used to detect the face area. The most widely used is (1) the Viola-Jones face detection method, which detects a face using Haar features [18]. As an alternative to feature-based face detection, there is (2) a skin region detection method [19]. Earlier ROI selection used the face area detected by the Viola-Jones algorithm directly [20]. This approach had the problem of including background near the face border in the ROI. In another study, the forehead, cheeks, and other proposed regions were selected as ROIs using single or additional coordinates within the face area [21]. Table 2 shows the ROI selection method of each representative rPPG method mentioned in Section 2. These methods try to use as much of the face area as possible without focusing on a specific ROI. GREEN and ICA crop the facial image, whereas CHROM, SSR, POS, PBV, and LGI generate the rPPG signal from pixels with specific skin colors only.

ROI Analysis Studies
A previous study noted that the ROI affects signal quality and computational load in rPPG methods [21]. Studies have also raised the problem of designating the entire face as an ROI. Assuming that the blood vessel distribution is more prominent in some parts of the face, the accuracy was evaluated for the forehead, left and right cheeks, nose, mouth, nasal dorsum, and chin. As a result, the cheeks and forehead were selected as the best ROIs.
Sensors 2021, 21, 7923

Thickness of Human Face Skin
rPPG relies on the contrast between the specular reflection and the diffuse reflection that occur when light hits the skin. Specular reflections are pure light reflections from the skin, while diffuse reflections result from absorption and scattering in skin tissue and depend on changes in blood volume [22]. Figure 2 shows the principle by which the camera receives BVP (blood volume pulse) information. When the light source hits the skin, some of the light is absorbed by the skin and blood vessels, and the remaining diffuse reflection information is received by the camera. The reflection information can differ depending on the thickness of the skin. Although blood vessels decrease reflectance and transmittance, diffuse reflection depends sensitively on the depth of the blood vessels, that is, on the thickness of the skin [23]. The amount of light absorbed decreases depending on the thickness of the skin, which produces a large difference between the specular reflection and diffuse reflection information. The thickness of the dermis and epidermis at 39 anatomical sites of 10 cadavers was measured in [24]. Table 3 lists the 39 areas used in [24], the relative thickness of the dermis and the epidermis, and the relative thickness of the skin calculated from this information.

Note to Table 3: values marked with * are normalized ratios calculated by dividing each thickness by the thinnest value in each category: (1) the relative thickness of the epidermis, (2) the relative thickness of the dermis, and (3) the relative thickness of the skin.

Proposed ROI
To conduct rPPG experiments on the anatomical regions mentioned in Section 3.3.1, we selected the experimental regions. When selecting ROI candidates, the scalp area (temporal scalp, anterior scalp, posterior scalp), ear area (preauricular, upper helix, mid helix, conchal bowl, earlobe, rear ear), and neck area (anterior neck, lateral neck) were excluded. In addition, the area around the eyes (upper medial eyelid, upper lateral eyelid, lower eyelid, and tear trough) was integrated into one region because the size of the region was small. Finally, the symmetrical parts such as the nasolabial fold and marionette fold were divided into two areas, left and right. Table 4 shows the proposed 31 regions and skin thickness.

Assessment Metric of Proposed ROI
We used three measurement methods used in rPPG to evaluate the performance of the proposed ROIs. In addition, we propose a relative BVP similarity (rBS) method for evaluating the relative superiority of each ROI.
• MAE (Mean Absolute Error): MAE was used to see the accuracy of the estimated waveform for each rPPG method.
• RMSE (Root Mean Square Error): RMSE was used to measure the spread of the estimation error.
• PCC (Pearson's Correlation Coefficient): PCC is a method for interpreting the linear relationship between two given signals. The closer the absolute value of the PCC result to 1, the more linear it is.
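The three metrics above have standard definitions. A sketch in the notation of this section follows, assuming that N is the number of samples in the compared signals and that these definitions correspond to the equations numbered (1) to (3). N and the subscripts on µ are notation introduced here for illustration.

```latex
\begin{align}
\mathrm{MAE} &= \frac{1}{N}\sum_{t=1}^{N}\left|S(t)-\hat{S}(t)\right| \tag{1}\\
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(S(t)-\hat{S}(t)\right)^{2}} \tag{2}\\
\mathrm{PCC} &= \frac{\sum_{t=1}^{N}\left(S(t)-\mu_{S}\right)\left(\hat{S}(t)-\mu_{\hat{S}}\right)}{\sqrt{\sum_{t=1}^{N}\left(S(t)-\mu_{S}\right)^{2}}\,\sqrt{\sum_{t=1}^{N}\left(\hat{S}(t)-\mu_{\hat{S}}\right)^{2}}} \tag{3}
\end{align}
```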
S(t) is the ground truth, and Ŝ(t) is the result of the rPPG method. In addition, µ_S is the average value of S(t), and µ_Ŝ is the average value of Ŝ(t). The results of the three metrics above were combined to generate the relative BVP similarity (rBS), which is the final evaluation metric:
\[ \mathrm{rBS} = \bigl(\log(\max(\mathrm{MAE}) - \mathrm{MAE} + e) + \log(\max(\mathrm{RMSE}) - \mathrm{RMSE} + e)\bigr) \cdot |\mathrm{PCC}| \quad (4) \]
In the rPPG method, the MAE measures the absolute difference from the actual BVP waveform, and the RMSE measures the variance of that difference. The PCC measures the linear relationship between the measured value and the original value. An accurate BVP waveform is important for extracting the ultralow frequency (ULF), very low frequency (VLF), low frequency (LF), and high frequency (HF) components well. The disease information carried by each band is shown in Table 5. The smaller the MAE and RMSE values, the more similar the estimate is to the actual data, so a region with smaller values is more effective. Because each frequency band carries different information, rBS was designed so that the linearity of the waveform, expressed by the PCC, has a large impact on the score.
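The rBS computation in Equation (4) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that `e` is Euler's number, that `log` is the natural logarithm, and that max(MAE) and max(RMSE) are taken over the set of regions being compared. The function names and the dictionary layout are hypothetical.

```python
import math


def mae(truth, estimate):
    """Mean absolute error between two equal-length signals."""
    return sum(abs(s - h) for s, h in zip(truth, estimate)) / len(truth)


def rmse(truth, estimate):
    """Root mean square error between two equal-length signals."""
    return math.sqrt(sum((s - h) ** 2 for s, h in zip(truth, estimate)) / len(truth))


def pcc(truth, estimate):
    """Pearson's correlation coefficient (undefined for constant signals)."""
    mu_s = sum(truth) / len(truth)
    mu_h = sum(estimate) / len(estimate)
    cov = sum((s - mu_s) * (h - mu_h) for s, h in zip(truth, estimate))
    var_s = sum((s - mu_s) ** 2 for s in truth)
    var_h = sum((h - mu_h) ** 2 for h in estimate)
    return cov / math.sqrt(var_s * var_h)


def rbs_scores(truth, estimates_by_region):
    """Return {region: rBS} for a dict {region: estimated BVP signal}.

    Assumption: max(MAE) and max(RMSE) are the maxima over the given regions.
    """
    stats = {
        region: (mae(truth, est), rmse(truth, est), pcc(truth, est))
        for region, est in estimates_by_region.items()
    }
    max_mae = max(m for m, _, _ in stats.values())
    max_rmse = max(r for _, r, _ in stats.values())
    return {
        region: (math.log(max_mae - m + math.e)
                 + math.log(max_rmse - r + math.e)) * abs(p)
        for region, (m, r, p) in stats.items()
    }
```

Under these assumptions, the region with the largest MAE and RMSE contributes log(e) = 1 for each error term, so its score reduces to 2·|PCC|, and regions with smaller errors score higher for the same correlation.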

ROI Assessment Procedure
In order to set the ROIs suggested in Section 3.3.2, three procedures were performed: Face Mesh Generation, ROI Candidate Setting, and ROI Selection. Figure 3 shows the procedure for assessing the proposed ROIs. The ROI setting was carried out in the preprocessing step of rPPG, and each ROI was created from the landmarks produced by the face mesh method. Face mesh extraction methods can be divided into cascaded regression-based and deep learning-based methods. Figure 4b shows the face mesh provided by the deep learning-based MediaPipe project, one of the open-source face mesh projects [25]. Among the cascaded regression-based methods, the representative OpenFace project creates a face mesh with 68 key points and is available in Dlib. As a deep learning-based method, Google's MediaPipe project creates a face mesh with 468 key points [26]. In [27], a comparison was conducted on the SAMM dataset, which is composed of videos of human faces showing various emotions, and MediaPipe showed higher performance by a slight margin. Therefore, in this paper, face landmarks were created using MediaPipe, which gives excellent results for generating various ROIs, and ROI candidates were created by combining landmarks.
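The step of turning a set of landmarks into an ROI can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the paper's implementation: landmarks are taken to be normalized (x, y) pairs in [0, 1], as face mesh models commonly return them, the frame is a nested list of RGB tuples, and the landmark indices that outline a region are hypothetical inputs.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Count crossings of a horizontal ray going right from (x, y).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def roi_mean_rgb(frame, landmarks, roi_indices):
    """Average the RGB values of the pixels inside one landmark-defined ROI.

    frame: H x W nested lists of (r, g, b) tuples.
    landmarks: list of normalized (x, y) pairs.
    roi_indices: landmark indices outlining the region, in polygon order.
    Returns ((mean_r, mean_g, mean_b), pixel_count), or None if the ROI is empty.
    """
    height, width = len(frame), len(frame[0])
    polygon = [(landmarks[i][0] * width, landmarks[i][1] * height) for i in roi_indices]
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Only scan the bounding box of the polygon, clipped to the frame.
    x0, x1 = max(0, int(min(xs))), min(width - 1, int(max(xs)))
    y0, y1 = max(0, int(min(ys))), min(height - 1, int(max(ys)))
    sums = [0.0, 0.0, 0.0]
    count = 0
    for row in range(y0, y1 + 1):
        for col in range(x0, x1 + 1):
            # Test the pixel center against the polygon.
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                for c in range(3):
                    sums[c] += frame[row][col][c]
                count += 1
    if count == 0:
        return None
    return tuple(s / count for s in sums), count
```

Applying this to every frame yields the per-frame RGB trace of a region, and the returned pixel count is the quantity that the later analysis correlates with the rBS rank.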

Data and Statistical Analysis
The rPPG method is affected by video encoding, light uniformity, and skin color. When the video is encoded, the rPPG information is quantized, and the complete information may not be preserved [28]. If the lighting is not uniform, the face may not be detected properly [29]. The darker the skin color, the lower the amount of diffuse reflection, because the melanin content is higher [30].

In this paper, the UBFC and LGI-PPGI datasets, which are least affected by the three factors listed above, were selected to verify the validity of the proposed ROIs [14,16]. The UBFC and LGI-PPGI datasets are composed of raw video data and have uniform lighting. Figure 6 shows the Fitzpatrick skin color types: Type I is pale white skin, Type II is fair skin, Type III is darker white skin, Type IV is light brown skin, Type V is brown skin, and Type VI is dark brown or black skin. In this paper, experiments were conducted with the light skin colors of Types I and II among the six skin colors classified on the Fitzpatrick scale. A proposed ROI mask was generated for the two datasets, POS and CHROM were applied to the masked images, and superiority was verified using the proposed metric.

Benchmark Dataset
• UBFC [16]: It consists of 42 videos, each with heart rate and a label in which the heart waveform is recorded. The participants looked directly at a camera installed at a distance of 1 m and were filmed while solving a given quiz.
• LGI-PPGI [14]: Videos were recorded of 6 subjects under four conditions: no motion, motion, vigorous motion, and dialogue.

Assessment of Proposed ROI
The results of the experiments with POS and CHROM on the UBFC and LGI-PPGI datasets are as follows. Figures 4 and 5 show the results of applying the seven methods to the 31 regions on the UBFC and LGI-PPGI datasets. The MAE and RMSE values of regions 0, 1, 3, and 27 are excellent regardless of the method. Figure 6 shows the PCC results, in which regions 0, 10, 27, and 28 give values close to 1. Figure 7 shows the results of the MAE, RMSE, and PCC metrics on the UBFC data. The yellow boxes mark the TOP-5 scores, and the blue boxes mark the BOT-5 scores for each metric. Regions 0 and 10 are included in the TOP-5 for all metrics.
Figure 8 shows the results of the MAE, RMSE, and PCC metrics on the LGI-PPGI data. Regions 0, 10, and 27 are included in the TOP-5 for all metrics. Regions 0 and 10 were found to be the best regions in both datasets.
Figure 10 is a visualization of the results of Table 5. The yellow areas are the TOP-5 regions, and the blue areas are the BOT-5 regions. The white regions are the remaining 21 regions.
Table 7 analyzes the correlation among the rBS rank of the ROIs, the thickness of the skin, and the number of pixels in the region. By Pearson's correlation, the correlation between skin thickness and rBS rank was 0.50, a moderate positive linear relationship, and the correlation between the number of pixels in each region and rBS rank was −0.53, a moderate negative linear relationship. That is, the thinner the skin and the larger the region, the better the results. According to the results of Table 6, there was a correlation with the thickness of the skin and the number of pixels in the region. However, the average number of pixels in the proposed TOP-5 regions is 696, which is very different from the 25,000 pixels used by existing methods for the entire face.
The smaller the region, the more easily it is affected by noise, such as light distortion or movement. To address this problem, combinations of regions were proposed and evaluated experimentally. Table 8 shows the evaluation results of the proposed region combinations and of the existing ROI methods. The TOP-5 combination has an average thickness of 1191.11 and 2431 pixels. The BOT-5 combination has an average thickness of 1581.39 and 1030 pixels. As a result, combining regions improved the results: the proposed TOP-5 combination showed higher accuracy than the Face + Skin method, and BOT-5 showed lower accuracy.
Table 7 (excerpt). Columns: region number, region, thickness, number of pixels, rBS rank.
… 955, 5
29, Right Lower Cheek, 1335.91, 840, 10
30, Left Lower Cheek, 1335.91, 1174, 13
Correlation coefficient: (thickness, rBS rank) 0.50; (# of pixels, rBS rank) −0.53
Figure 11 shows the BVP extracted from the proposed ROIs using the POS method. Yellow is the BVP extracted from the TOP-5 ROI. Compared with the blue BOT-5 BVP, the yellow waveform is more similar to the green ground truth. In particular, it has less variability and less noise than the blue waveform.
Figure 11. rPPG signal extracted with the proposed ROI using POS (yellow: TOP-5, green: ground truth, blue: BOT-5).

Conclusions
In summary, this paper has proposed:
• ROI candidates among 31 facial regions, derived from skin thickness and anatomical analysis.
• A metric called rBS that can be used to assess the quality of each ROI.
In conclusion, ROI selection in the rPPG method is as important as the signal extraction method. As rPPG uses diffuse reflection information, it has been demonstrated that the thickness of the skin affects the result. To examine the validity of skin thickness-based ROI selection, 31 masks and the rBS metric were proposed. CHROM, GREEN, ICA, PBV, POS, SSR, and LGI were experimentally verified on the UBFC and LGI-PPGI datasets. In addition, using the proposed rBS metric, experiments were conducted on 31 areas of the face. The right malar, left malar, glabella, lower medial forehead, and upper medial forehead showed the best results for BVP and BPM extraction. Each of these areas showed a strong correlation with the actual signal, and the PCC result in particular was excellent.
Lastly, as the information that can be obtained from a single proposed ROI is limited, experiments were conducted on the TOP-5 combination, the entire face, and the BOT-5 combination, and the TOP-5 combination was shown to be superior. This work should therefore help guide ROI selection in future facial image-based rPPG methods, and effective ROI selection is expected to improve their reliability and accuracy.
Existing rPPG methods have focused on how well noise can be removed from the color information extracted from the ROIs. In this study, the superiority of the proposed ROIs was verified using the existing rPPG methods, and it was found that the ROI affects the accuracy of the rPPG method. The rPPG methods developed so far lack research on the correlation among ROIs. In a future study, we intend to develop an rPPG algorithm that learns to represent the correlation between regions using a GNN (Graph Neural Network).