Article

Exploring the Feasibility of Vision-Based Non-Contact Oxygen Saturation Estimation: Considering Critical Color Components and Individual Differences

1 Department of AI & Informatics, Graduate School, Sangmyung University, Seoul 03016, Republic of Korea
2 Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(11), 4374; https://doi.org/10.3390/app14114374
Submission received: 11 March 2024 / Revised: 16 May 2024 / Accepted: 18 May 2024 / Published: 22 May 2024
(This article belongs to the Special Issue State-of-the-Art of Computer Vision and Pattern Recognition)

Abstract

Blood oxygen saturation, the ratio of oxygenated hemoglobin to total hemoglobin in the blood, is closely related to one's health status. It is typically measured with a pulse oximeter. However, this contact-based method can cause skin irritation, and in situations where infectious diseases pose a risk, sharing such contact-based measurement devices can increase the risk of infection. Methods for estimating oxygen saturation from facial or hand images have therefore recently been proposed. In this paper, we propose a method for estimating oxygen saturation from facial images based on a convolutional neural network (CNN). In particular, instead of arbitrarily calculating the AC and DC components that are conventionally used to measure oxygen saturation, we trained the model directly on signals obtained from facial images. Moreover, because accurately measuring oxygen saturation takes time, we varied the length of the model inputs. For 10 s inputs, the Pearson correlation coefficient was 0.570, the mean absolute error was 1.755%, the root mean square error was 2.284%, and the intraclass correlation coefficient was 0.574. For 20 s inputs, these metrics were 0.630, 1.720%, 2.219%, and 0.681, respectively; for 30 s inputs, they were 0.663, 2.142%, 2.612%, and 0.646, respectively. This confirms that oxygen saturation can be estimated without calculating the AC and DC components, which heavily influence the prediction results. Furthermore, we analyzed how the trained model predicted oxygen saturation using SHapley Additive exPlanations (SHAP) and found significant variations in feature contributions among participants. This indicates that more accurate prediction of oxygen saturation may require individually selecting appropriate color channels for each participant.

1. Introduction

Oxygen saturation indicates how much of the hemoglobin in the blood is bound to oxygen. It is generally expressed as a percentage and is an important indicator for evaluating the amount of oxygen supplied to cells and tissues. A sufficient oxygen supply is essential for maintaining cell metabolism and function, and oxygen deficiency can lead to serious physiological problems. Therefore, measuring oxygen saturation helps to evaluate the oxygen supply status of the body and to identify abnormalities in blood circulation and respiratory function [1,2]. The importance of oxygen saturation has become especially prominent in infectious disease situations such as COVID-19: blood oxygen saturation can decrease even before symptoms of COVID-19 infection develop, which aids early detection and diagnosis. Measuring oxygen saturation therefore plays an important role in the early diagnosis and treatment of respiratory diseases [3].
One of the main ways to measure oxygen saturation is a fingertip pulse oximeter. A pulse oximeter measures blood oxygen saturation non-invasively, allowing a patient's condition to be monitored in real time [4]. However, pulse oximeters may not provide accurate measurements in some environments. For example, cold hands can cause vasoconstriction, which reduces blood flow and can affect the accuracy of the reading. For fingers with nail polish, the polish can affect the amount of transmitted light, resulting in inaccurate measurements [5]. Especially in infectious disease situations such as COVID-19, using a device that others have also used can increase the risk of infection. As these problems and the risk of infection have become more prominent, non-contact oxygen saturation measurement methods have been actively studied in response [6,7].
In general, non-contact oxygen saturation studies are based on the operating principle of the pulse oximeter, which exploits the fact that the hemoglobin species absorb light of different wavelengths differently. A pulse oximeter transmits infrared and red light through the finger. As shown in Figure 1, hemoglobin with and without bound oxygen absorb infrared and red light differently: hemoglobin that is not bound to oxygen absorbs more red light, while hemoglobin that is bound to oxygen absorbs more infrared light.
The pulse oximeter measures and analyzes the amount of light reaching its sensor and determines the blood oxygen saturation from the difference in absorption of infrared and red light [9]. In this measurement, the alternating current (AC) and direct current (DC) components form an important part of the measurement principle. By sampling the transmitted light hundreds of times per second, the pulse oximeter can distinguish the AC component, the pulsating component of arterial blood, from the DC component, the static component produced by tissue, venous blood, and non-pulsatile arterial blood. The static component can then be removed from the signal so that the pulsating component, which typically makes up 1% to 5% of the total signal, can be separated. As shown in Equation (1), dividing the AC level by the DC level at each wavelength also compensates for changes in the incident light intensity, removing other confounding variables from the equation. Doing this at each wavelength isolates the absorption attributable to arterial blood and separates the relative absorbance of oxygenated Hb. The ratio of the red signal ($AC_{Red}/DC_{Red}$) to the infrared signal ($AC_{IR}/DC_{IR}$) reflects the proportion of oxygenated Hb and can be converted to oxygen saturation through Equation (2) [10].
$$\mathrm{RoR} = \frac{AC_{Red}/DC_{Red}}{AC_{IR}/DC_{IR}} \qquad (1)$$

$$SpO_2 = A \cdot \mathrm{RoR} + B \qquad (2)$$
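As a minimal sketch of this computation in Python, the snippet below evaluates Equations (1) and (2) from red and infrared PPG windows. The AC/DC conventions (peak-to-peak amplitude for AC, mean for DC) are only one of the several conventions discussed in Section 2, and the calibration constants a and b are hypothetical placeholders; real devices use empirically calibrated values.

```python
import numpy as np

def ratio_of_ratios(red: np.ndarray, ir: np.ndarray) -> float:
    """Ratio of ratios (Equation (1)) from red and infrared PPG windows.

    Here AC is taken as the peak-to-peak amplitude and DC as the mean,
    one common convention among several used in the literature.
    """
    ac_red, dc_red = np.ptp(red), np.mean(red)
    ac_ir, dc_ir = np.ptp(ir), np.mean(ir)
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def spo2_from_ror(ror: float, a: float = -25.0, b: float = 110.0) -> float:
    """Linear calibration SpO2 = A * RoR + B (Equation (2)).
    a and b are device-specific constants; the defaults are placeholders."""
    return a * ror + b
```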
As the review of existing studies in Section 2 shows, the AC and DC components are also calculated in non-contact oxygen saturation measurement. However, since RGB cameras do not capture IR wavelengths, a visible wavelength band with light absorption characteristics similar to IR is used instead, as shown in Figure 1. Unlike pulse oximetry, this approach is highly sensitive to factors such as skin color and ambient lighting conditions. Moreover, each existing paper calculates the AC and DC components differently, and this diversity can affect the accuracy and reliability of the results. In practice, existing studies have not produced satisfactory results.
Therefore, in this study, oxygen saturation was estimated using the signals obtained from video directly as model input, without calculating the AC and DC components. Specifically, we separated the signals obtained from videos into 10 color channels and predicted oxygen saturation using a ResNet-based deep learning model. We investigated whether various color components, rather than the color components of specific wavelengths, can reflect oxygen saturation values. Considering the time required to accurately measure oxygen saturation, we compared model inputs of 10 s, 20 s, and 30 s. Additionally, we analyzed the color channels contributing to oxygen saturation prediction on a per-subject basis using SHAP (SHapley Additive exPlanations).

2. Related Works

Recently, methods have been proposed to measure oxygen saturation non-invasively using facial or hand images. These methods utilize specialized cameras with filters capable of capturing specific wavelength bands [11] or commonly available RGB cameras such as webcams or smartphone cameras. While specialized cameras can capture wavelength bands suitable for measuring oxygen saturation, they are not widely used. Therefore, there is a need for methods that can measure oxygen saturation using RGB cameras.
Previous studies that utilize RGB cameras for measuring oxygen saturation have primarily employed the ratio of ratios (RoRs) method. Tarassenko et al. [12] and Rahman et al. [13] estimated the AC and DC components for red and blue wavelengths from the facial region of the RGB image and calculated the RoRs to measure oxygen saturation. Sun et al. [14] estimated the AC and DC components for red and green wavelengths from the dorsum of the hand region in the RGB image and calculated the RoRs to measure oxygen saturation.
Additionally, some methods go beyond using two wavelength bands and utilize all three channels of the RGB image. Wei et al. [15] estimated the AC and DC components by using all three wavelength bands from the facial region of the RGB image, while Tian et al. [16] estimated the AC and DC components by using all three wavelength bands from the palm region of the RGB image and calculated the RoRs to measure oxygen saturation.
Furthermore, methods that use convolutional neural networks (CNNs) to measure oxygen saturation have been proposed, which differ from previous regression-based methods. Akamatsu et al. [17] calculated the AC and DC components using the red, green, and blue wavelengths from the facial region of the RGB image and fed the resulting spatiotemporal map into a CNN model to measure oxygen saturation.
Various papers have proposed methods using RGB cameras to measure oxygen saturation, most of them based on the AC and DC components. However, although the relationship between RoRs and oxygen saturation is clear, the methods for estimating the AC and DC components from the photoplethysmography (PPG) signal vary among the papers.
In the examined papers, Tarassenko et al. [12] calculated the AC component as the average of the peaks and valleys in the PPG signal and the DC component as the mean of the PPG signal. Rahman et al. [13] calculated the AC component as the standard deviation of the PPG signal and the DC component as the mean of the PPG signal. Sun et al. [14] applied band-pass filtering with a bandwidth of 0.5–5 Hz to obtain the AC component and low-pass filtering with a cut-off frequency of 0.3 Hz to obtain the DC component. Wei et al. [15] performed band-pass filtering with a bandwidth of 0.6–3 Hz to obtain the AC component and low-pass filtering with a cut-off frequency of 3 Hz to obtain the DC component. Tian et al. [16] used an eighth-order Butterworth filter with a bandwidth of 0.1 Hz centered around the estimated heart rate to obtain the AC component as the average of the peaks and valleys in the filtered signal and a second-order low-pass Butterworth filter with a cut-off frequency of 0.1 Hz to obtain the DC component as the median of the filtered signal. Akamatsu et al. [17] applied band-pass filtering with a bandwidth of 0.75–2.5 Hz to obtain the AC component and low-pass filtering with a cut-off frequency of 0.3 Hz to obtain the DC component.
While extracting the AC and DC components is essential to the principle of estimating oxygen saturation, the performance of oxygen saturation prediction varies greatly depending on how these components are extracted [16]. Therefore, in this paper, we propose a method for estimating oxygen saturation with RGB cameras that uses the time-series data obtained from RGB images directly, without arbitrarily extracting the AC and DC components. Table 1 summarizes the related works.

3. Method

In this study, oxygen saturation was estimated from the acquired images. To do this, the skin area in each image was extracted using RGB values, and the RGB signals in the skin area were obtained. The RGB signals were then converted to the YCrCgCb and HSV color spaces, and these signals were used to generate 10-dimensional feature data. Finally, oxygen saturation was estimated using a ResNet-based CNN model. Figure 2 shows the overall process.

3.1. Dataset

In this study, an experiment was conducted while controlling factors that could affect oxygen saturation prediction, such as race, distance, and lighting. A total of 14 Korean participants (eight males and six females) between the ages of 24 and 32 took part in the experiment. Table 2 shows the demographic characteristics and the average values of the Cb and Cr color channels of the subjects. Figure 3 displays the distribution of skin colors for all the data; since the study was limited to Koreans, the points cluster in one area. During data collection, the participants were instructed to sit comfortably and minimize movement as much as possible. They were also asked to hold their breath to lower their oxygen saturation levels and to resume breathing if they experienced discomfort. Each experiment lasted 5 min and 30 s, during which breath-holding was repeated three times. Each participant took part in the experiment twice.
The experiment was conducted in an indoor environment. To acquire the facial images, a conventional RGB webcam (Logitech C920e) [18] was used. Figure 4 compares the signals obtained from the Logitech C920e webcam, an iPhone 14, and a Galaxy Z Flip4. Because smartphones apply their own color correction and other adjustments, accurately assessing color changes with them can be difficult; for this reason, only the webcam was used in the experiments to obtain more accurate and reliable results. So that participants could sit comfortably in a chair during the experiment, the camera was placed 60 cm from the participants and captured facial images at 30 frames per second with a resolution of 640 × 480 pixels. Simultaneously, a CMS-50E pulse oximeter [19] was attached to each participant's finger to measure oxygen saturation; this device was synchronized with the recorded images using timestamps. Prior to the experiment, consent was obtained from all the participants. The Cb and Cr channel averages in Table 2 were calculated as the mean values over the two experimental sessions of each participant.

3.2. Data Preprocessing

The facial images obtained by the camera contained not only the face but also clothing and background. To isolate the face, a single-shot multibox detector (SSD) face detector was used to detect faces in the images. However, the detected face regions still included the eyebrows, hair, and other elements. To remove these unwanted components, the image was converted from the RGB color space to the YCbCr color space, and only the skin region was selected. The selected range for the skin region was 0 ≤ Y ≤ 235, 77 ≤ Cb ≤ 127, and 133 ≤ Cr ≤ 173 [20].
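A minimal sketch of this skin selection step with OpenCV is shown below, assuming the face crop from the SSD detector is already available; only the thresholds come from the text above, the rest is an illustrative implementation.

```python
import cv2
import numpy as np

def skin_mask_ycbcr(face_bgr: np.ndarray) -> np.ndarray:
    """Select skin pixels in a detected face crop using the YCbCr thresholds
    from Chai and Ngan [20]: 0 <= Y <= 235, 77 <= Cb <= 127, 133 <= Cr <= 173."""
    ycrcb = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)  # OpenCV order: Y, Cr, Cb
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([235, 173, 127], dtype=np.uint8)
    return cv2.inRange(ycrcb, lower, upper)  # 255 where skin, 0 elsewhere

# Per-frame mean color over the skin region (face_bgr comes from the SSD face detector):
# mask = skin_mask_ycbcr(face_bgr)
# mean_b, mean_g, mean_r = cv2.mean(face_bgr, mask=mask)[:3]
```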
After this process, the selected skin region was still in the RGB color space. In this paper, a time series of 10 channels (R, G, B, Y, Cr, Cg, Cb, H, S, and V) over the selected skin region of interest (ROI) was used as the data, so the RGB signals were transformed into YCrCgCb and HSV signals. To do this, the average values of R, G, and B per frame were calculated within the selected ROI and normalized to the range 0 to 1 by dividing by the maximum pixel value of 255. We denote the normalized R, G, and B signals as R′, G′, and B′, respectively. Using Equations (3)–(6) [21] and Equations (7)–(12) [22], the RGB signals are transformed into the YCrCgCb and HSV signals, respectively.
$$Y = 16 + 65.481\,R' + 128.553\,G' + 24.966\,B' \qquad (3)$$

$$C_r = 128 + 112\,R' - 93.786\,G' - 18.214\,B' \qquad (4)$$

$$C_g = 128 - 81.085\,R' + 112\,G' - 30.915\,B' \qquad (5)$$

$$C_b = 128 - 37.797\,R' - 74.203\,G' + 112\,B' \qquad (6)$$

$$C_{max} = \max(R', G', B') \qquad (7)$$

$$C_{min} = \min(R', G', B') \qquad (8)$$

$$\Delta = C_{max} - C_{min} \qquad (9)$$

$$H = \begin{cases} 60^\circ \times \left(\dfrac{G' - B'}{\Delta} \bmod 6\right), & \text{if } C_{max} = R' \\[4pt] 60^\circ \times \left(\dfrac{B' - R'}{\Delta} + 2\right), & \text{if } C_{max} = G' \\[4pt] 60^\circ \times \left(\dfrac{R' - G'}{\Delta} + 4\right), & \text{if } C_{max} = B' \end{cases} \qquad (10)$$

$$S = \begin{cases} 0, & \text{if } C_{max} = 0 \\[4pt] \dfrac{\Delta}{C_{max}}, & \text{if } C_{max} \neq 0 \end{cases} \qquad (11)$$

$$V = C_{max} \qquad (12)$$
Using the RGB signals and the transformed YCrCgCb and HSV signals, a dataset of size N × 10 was generated, where N represents the number of frames in the video. In this paper, considering that it takes a minimum of 30 s to successfully measure oxygen saturation [23], signals from 10, 20, and 30 s were used, resulting in dataset lengths of 300, 600, and 900, respectively.
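The per-frame feature construction can be sketched in Python as follows; this is an illustrative transcription of Equations (3)–(12) under the assumption that the inputs are the normalized per-frame mean values R′, G′, B′ in [0, 1], with one 10-dimensional vector produced per frame and stacked into the N × 10 input.

```python
import numpy as np

def rgb_to_10_channels(r: float, g: float, b: float) -> np.ndarray:
    """Map normalized per-frame mean RGB (R', G', B' in [0, 1]) to the
    10-channel feature vector (R, G, B, Y, Cr, Cg, Cb, H, S, V),
    following Equations (3)-(12)."""
    y  = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    cg = 128 - 81.085 * r + 112.0 * g - 30.915 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    cmax, cmin = max(r, g, b), min(r, g, b)
    delta = cmax - cmin
    if delta == 0:                       # achromatic pixel: hue undefined, set to 0
        h = 0.0
    elif cmax == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif cmax == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    s = 0.0 if cmax == 0 else delta / cmax
    v = cmax
    return np.array([r, g, b, y, cr, cg, cb, h, s, v])

# Stacking one vector per frame yields the N x 10 input (N = 300, 600, or 900).
```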
As mentioned in Section 3.1, the oxygen saturation values were recorded at a sampling rate of 1 Hz during image acquisition, so the ground-truth label for each window had to be reduced to a single value. For example, when using signals from a 10 s period, one value must be chosen from the 10 oxygen saturation readings measured during that time. In this paper, to best represent the decreasing trend of oxygen saturation while preserving the original values, the average of the median values was used: the n oxygen saturation values (n = 10, 20, or 30) were sorted, divided into five equal intervals, and the mean of the third (middle) interval was taken. Figure 5 shows, for one participant, the original signal and the oxygen saturation labels for 10, 20, and 30 s windows computed with this average-of-median-values method.
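A short sketch of this labeling rule, assuming n is divisible by five as in the three window lengths used here:

```python
import numpy as np

def middle_interval_mean(spo2_values) -> float:
    """Ground-truth label for one window: sort the n SpO2 readings
    (n = 10, 20, or 30), split them into five equal intervals, and
    average the third (middle) interval."""
    v = np.sort(np.asarray(spo2_values, dtype=float))
    n = len(v)
    lo, hi = 2 * n // 5, 3 * n // 5  # bounds of the third quintile
    return float(v[lo:hi].mean())

# Example: for 10 readings, this averages the 5th and 6th sorted values.
```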

3.3. Model Training Method

In this paper, a CNN model based on ResNet [24] was built to estimate oxygen saturation from the N × 10 dataset generated in Section 3.2. Figure 6 illustrates the structure of the model. As in ResNet, skip connections combine the input of each layer with its output and pass the result to the next layer.
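A minimal PyTorch sketch of such a model is given below. The exact structure is specified in Figure 6; the channel width, depth, and kernel sizes here are assumptions chosen only to illustrate the 1D residual design over a (batch, 10, N) input.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """Conv1d block with a skip connection, in the spirit of ResNet [24]."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm1d(channels), nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection: add input to output

class SpO2Net(nn.Module):
    """Input: (batch, 10, N) color-channel time series; output: one SpO2 value."""
    def __init__(self, width: int = 64, depth: int = 4):
        super().__init__()
        self.stem = nn.Conv1d(10, width, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(*[ResidualBlock1D(width) for _ in range(depth)])
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        h = self.blocks(self.stem(x))
        h = h.mean(dim=-1)          # global average pooling over time
        return self.head(h).squeeze(-1)

# model = SpO2Net(); y_hat = model(torch.randn(32, 10, 300))  # 10 s windows
```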
The model was trained three times, using input data of size 300 × 10, 600 × 10, and 900 × 10, respectively. Of the two videos obtained from each participant, one was used for model training and the other for testing. The mean squared error was used as the loss function, and the Adam optimizer [25] was employed. The batch size was set to 32, and the maximum number of epochs was set to 4000. For testing, the model from the epoch with the lowest loss was used, and sliding windows with a stride of 1 s were applied to obtain continuous outputs.
In the initial layers of ResNet, low-level features are extracted; these mainly capture basic information about changes in each color space, such as average color variations and relationships between colors. In the later layers, more complex patterns are recognized based on these low-level features, and information relevant to oxygen saturation is derived. By integrating features across channels and time frames, the model learns the complex patterns needed to draw meaningful conclusions about oxygen saturation from changes in the color space. In this study, half of the 28 videos acquired from the 14 participants were used to train the model, while the remaining videos were used for testing.

4. Results

To quantitatively evaluate the trained model's oxygen saturation predictions, we used the Pearson correlation coefficient, the mean absolute error, and the root mean square error. The Pearson correlation coefficient measures the linear correlation between two variables; an absolute value close to 1 indicates a strong correlation. The mean absolute error is the average error between the predicted and actual values; smaller values indicate better predictive performance. The root mean square error is the square root of the average squared difference between the predicted and actual values and corresponds to the standard deviation of the prediction errors. Like the mean absolute error, smaller values indicate better performance, but because squaring amplifies large errors, the root mean square error is more sensitive to outliers.
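These three metrics can be computed directly; a small sketch using NumPy and SciPy (assuming y_true and y_pred are the measured and predicted SpO2 series, in %):

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """The three evaluation metrics used in Section 4."""
    corr, _ = pearsonr(y_true, y_pred)               # Pearson correlation coefficient
    mae = float(np.mean(np.abs(y_true - y_pred)))    # mean absolute error (%)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))  # root mean square error (%)
    return {"corr": corr, "MAE": mae, "RMSE": rmse}
```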

4.1. Results by Subject

The experimental results are as follows: Figure 7, Figure 8 and Figure 9 depict a comparison between the oxygen saturation estimated using the model trained with data from 10, 20, and 30 s, respectively, and the oxygen saturation measured using the contact-based sensor. The complete results for all participants are summarized in Table 3, Table 4 and Table 5.
The results indicate that the trained model did not predict oxygen saturation as well as previous studies. However, previous studies on non-contact estimation of oxygen saturation evaluated their results by averaging the individual training and testing results for each participant. Taking this into consideration, our study also conducted training and testing on a participant-specific basis. Figure 10, Figure 11 and Figure 12 show selected graphs comparing the oxygen saturation estimated by the models trained with 10, 20, and 30 s of data, respectively, with the oxygen saturation measured by the contact-based sensor. The complete results for all participants are summarized in Table 6, Table 7 and Table 8. The intraclass correlation coefficient (ICC), which measures the agreement between two sets of measurements, ranges between 0 and 1.
Upon examining the results, the individual models performed better than or comparably to previous studies, and their accuracy differed significantly from that of the integrated model. In Section 4.2, the reasons for the discrepancies between the outputs of the individual and integrated models are analyzed.

4.2. Feature Analysis

To analyze the features the trained models used for prediction, given the differences between the individual and integrated models observed in Section 4.1, SHAP (SHapley Additive exPlanations) was employed [26]. SHAP is based on Shapley values from cooperative game theory, which measure the worth of each player in a cooperative game. Applied to feature importance in machine learning models, SHAP values quantify the impact of each feature on the model's predictions.
The range of SHAP values varies depending on the characteristics of the features and the model’s structure. The absolute magnitude of a SHAP value represents the contribution of each feature, indicating the extent to which it influences the model’s prediction. A larger SHAP value indicates a greater impact of the corresponding feature on the model’s prediction.
When calculating the SHAP values for each feature, all combinations of features are considered to evaluate the model’s predictions. In other words, the difference between the model’s predictions when a feature is excluded and the original predictions is computed, and this difference represents the importance (SHAP value) of that feature.
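The paper does not specify which SHAP explainer was used; as a hedged sketch, the gradient-based explainer from the shap library can be applied to the PyTorch model above, with per-channel contributions obtained by averaging absolute SHAP values over samples and time, as plotted in Figures 13–18. The variable names (model, x_train, x_test) are assumptions.

```python
import numpy as np
import shap

# model: trained SpO2Net; x_train, x_test: torch tensors of shape (batch, 10, N)
background = x_train[:100]                      # reference samples for the explainer
explainer = shap.GradientExplainer(model, background)
sv = explainer.shap_values(x_test)              # attribution with the shape of x_test
sv = sv[0] if isinstance(sv, list) else sv      # single-output model: take the array

# Aggregate over samples and time to get one contribution per color channel.
channels = ["R", "G", "B", "Y", "Cr", "Cg", "Cb", "H", "S", "V"]
contrib = np.abs(np.asarray(sv)).mean(axis=(0, 2))
for name, c in zip(channels, contrib):
    print(f"{name}: {c:.4f}")
```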
Figure 13, Figure 14 and Figure 15 depict the SHAP values and feature contributions for the individual models trained and tested on the data used in Figure 10, Figure 11 and Figure 12. These visualizations were used to confirm whether the features with dominant contributions significantly impacted each individual participant.
Figure 13, Figure 14 and Figure 15 indicate significant variations in feature importance among subjects. Because these graphs show the contributions on the training data, we verified whether the predominant feature for each subject remained important by examining the contributions on the testing data. Figure 16, Figure 17 and Figure 18 show these results, displaying the SHAP values for the models trained on 10 s, 20 s, and 30 s inputs.
For each subject in Figure 16, the Cr and B channels contributed the most. In Figure 17, the Y and Cg channels were the primary contributors for the respective subjects, and in Figure 18, the Cg channel had the largest contribution. Examining the individual models for all subjects, most subjects had the same predominant features on both the training and testing data. Considering not only the differences in feature importance but also the fact that the most influential features vary from person to person, this supports the idea that personalized models predict oxygen saturation better than a unified model. It also suggests that future research aimed at improving the accuracy of oxygen saturation prediction may need to select appropriate color channels individually for each subject, rather than relying solely on the conventional approach of using specific color channels. Feature analysis graphs for all subjects are included in Appendix A.

5. Discussion

This paper proposed a method for estimating oxygen saturation using convolutional neural networks and images acquired with an RGB camera. Because the AC and DC components conventionally calculated in oxygen saturation measurement are sensitive to the choice of calculation method, the time-series data from the images were used directly. Furthermore, to account for the significant time required for a successful oxygen saturation measurement, experiments were conducted with varying input lengths. Table 9 summarizes the oxygen saturation prediction results according to input data length.
First, we trained a unified model using half of the videos acquired from the subjects and tested it on the rest. When using 10 s of data, the Pearson correlation coefficient was 0.414, with an MAE (mean absolute error) of 2.592% and an RMSE (root mean square error) of 3.108%. When using 20 s of data, the Pearson correlation coefficient was 0.473, with an MAE of 3.869% and an RMSE of 4.797%. When using 30 s of data, the Pearson correlation coefficient was 0.475, with an MAE of 3.025% and an RMSE of 3.547%. Compared to previous research, this does not constitute accurate prediction of oxygen saturation. However, studies that measured oxygen saturation without contact have generally trained and tested models individually for each subject. Therefore, in this study, similar to previous research, we also conducted training and testing for each subject separately. As a result, for a 10 s input, the Pearson correlation coefficient was 0.570, with an MAE of 1.755%, an RMSE of 2.284%, and an ICC (intraclass correlation coefficient) of 0.574. For a 20 s input, the Pearson correlation coefficient was 0.630, with an MAE of 1.720%, an RMSE of 2.219%, and an ICC of 0.681. For a 30 s input, the Pearson correlation coefficient was 0.663, with an MAE of 2.142%, an RMSE of 2.612%, and an ICC of 0.646. The previous studies [15,17] that estimated oxygen saturation from the facial region of interest (ROI) reported average correlation coefficients of 0.68 and 0.49, respectively, indicating that the proposed method performs comparably. Furthermore, the method proposed in this paper is more efficient, as it uses the signals obtained from images without the need to calculate AC and DC components.
In this study, with variables such as illumination, distance, and skin tone controlled, we investigated the factors influencing oxygen saturation estimation through SHAP (SHapley Additive exPlanations) value analysis. In particular, the SHAP analysis revealed significant variations in feature importance across subjects when the trained model predicted oxygen saturation. This finding suggests that the set of features exerting the greatest influence on predictions may differ for each individual. Therefore, to improve the accuracy of oxygen saturation predictions, it may be necessary to move beyond the conventional approach of using specific color channels and instead employ different color channels for each subject.
However, this study did not utilize a more diverse dataset encompassing various demographics such as gender, age, and skin tone. This limitation could impact the generalizability of the research findings. For instance, certain gender, age, or skin tone groups may have different color channels that affect oxygen saturation. Future research should analyze the color channels contributing to oxygen saturation prediction in greater detail by utilizing more diverse datasets, thereby enhancing our understanding of the influences of features and improving the predictive power of the model across different environments.
Furthermore, the database used in this study only included oxygen saturation values ranging from 85% to 100%, obtained by instructing participants to hold their breath. Models trained on this database may therefore struggle to predict oxygen saturation values lower than those seen in training. To address this limitation, databases reflecting a wider range of oxygen saturation values are needed. Considering that the International Organization for Standardization (ISO) verifies the accuracy of pulse oximeters through hypoxia-induced clinical trials in which oxygen saturation varies from 70% to 100%, datasets with a more varied range of oxygen saturation changes should be used for predicting oxygen saturation.
Therefore, future research will predict oxygen saturation using more diverse datasets that consider gender, age, skin color, and wider ranges of oxygen saturation. This will involve analyzing the color channels contributing to oxygen saturation prediction and diversifying prediction models by grouping individuals with similar contributions from each color channel, rather than creating fully individualized models.

Author Contributions

Conceptualization, E.C.L.; methodology, H.A.S. and C.L.S.; data collection, H.A.S.; software, H.A.S. and C.L.S.; validation, H.A.S.; formal analysis, H.A.S. and C.L.S.; investigation, H.A.S. and C.L.S.; writing—original draft preparation, H.A.S. and C.L.S.; writing—review and editing, E.C.L.; visualization, H.A.S.; supervision, E.C.L.; project administration, E.C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the Field-oriented Technology Development Project for Customs Administration through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT and Korea Customs Service (2022M3I1A1095155).

Institutional Review Board Statement

Based on Article 13-1-3 of the Enforcement Regulations of the Act on Bioethics and Safety of the Republic of Korea, ethical review and approval were waived for this study by the Sangmyung University Institutional Review Board (IRB-SMU-C-2023-1-008), because this study uses only simple contact measuring equipment or observation equipment that does not involve physical changes.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. SHAP values of the personalized model used in Figure 10 (all subjects). The x-axis represents the features and the y-axis shows the contributions according to the features. Among them, the orange bar is the feature with the greatest contribution.
Figure A2. SHAP values of the personalized model used in Figure 11 (all subjects).
Figure A3. SHAP values of the personalized model used in Figure 12 (all subjects).
Figure A4. SHAP values for the training and testing data of a model trained with 10 s of signals as the input data (all subjects).
Figure A5. SHAP values for the training and testing data of a model trained with 20 s of signals as the input data (all subjects).
Figure A6. SHAP values for the training and testing data of a model trained with 30 s of signals as the input data (all subjects).

References

  1. Lewis, C.A.; Fergusson, W.; Eaton, T.; Zeng, I.; Kolbe, J. Isolated nocturnal desaturation in COPD: Prevalence and impact on quality of life and sleep. Thorax 2009, 64, 133–138. [Google Scholar] [CrossRef] [PubMed]
  2. American Academy of Sleep Medicine Task Force. Sleep-related breathing disorders in adults: Recommendations for syndrome definition and measurement techniques in clinical research. The Report of an American Academy of Sleep Medicine Task Force. Sleep 1999, 22, 667. [Google Scholar] [CrossRef]
  3. Starr, N.; Rebollo, D.; Asemu, Y.M.; Akalu, L.; Mohammed, H.A.; Menchamo, M.W.; Melese, E.; Bitew, S.; Wilson, I.; Tadesse, M.; et al. Pulse oximetry in low-resource settings during the COVID-19 pandemic. Lancet Glob. Health 2020, 8, e1121–e1122. [Google Scholar] [CrossRef] [PubMed]
  4. Severinghaus, J.W.; Honda, Y. History of blood gas analysis. VII. Pulse oximetry. J. Clin. Monit. 1987, 3, 135–138. [Google Scholar] [CrossRef] [PubMed]
  5. DeMeulenaere, S. Pulse oximetry: Uses and limitations. J. Nurse Pract. 2007, 3, 312–317. [Google Scholar] [CrossRef]
  6. Cheng, J.C.; Pan, T.S.; Hsiao, W.C.; Lin, W.H.; Liu, Y.L.; Su, T.J.; Wang, S.M. Using Contactless Facial Image Recognition Technology to Detect Blood Oxygen Saturation. Bioengineering 2023, 10, 524. [Google Scholar] [CrossRef]
  7. Sasaki, S.; Sugita, N.; Terai, T.; Yoshizawa, M. Non-Contact Measurement of Blood Oxygen Saturation Using Facial Video Without Reference Values. IEEE J. Transl. Eng. Health Med. 2023, 12, 76–83. [Google Scholar] [CrossRef]
  8. Prahl, S.A. Tabulated Molar Extinction Coefficient for Hemoglobin in Water. 1999. Available online: https://omlc.org/spectra/hemoglobin/summary.html (accessed on 18 January 2023).
  9. Chan, E.D.; Chan, M.M.; Chan, M.M. Pulse oximetry: Understanding its basic principles facilitates appreciation of its limitations. Respir. Med. 2013, 107, 789–799. [Google Scholar] [CrossRef] [PubMed]
  10. Sinex, J.E. Pulse oximetry: Principles and limitations. Am. J. Emerg. Med. 1999, 17, 59–66. [Google Scholar] [CrossRef] [PubMed]
  11. Humphreys, K.; Ward, T.; Markham, C. A CMOS camera-based pulse oximetry imaging system. In Proceedings of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, 17–18 January 2006. [Google Scholar]
  12. Tarassenko, L.; Villarroel, M.; Guazzi, A.; Jorge, J.; Clifton, D.A.; Pugh, C. Non-contact video-based vital sign monitoring using ambient light and auto-regressive models. Physiol. Meas. 2014, 35, 807–831. [Google Scholar] [CrossRef] [PubMed]
  13. Rahman, H.; Ahmed, M.U.; Begum, S. Non-contact physiological parameters extraction using facial video considering illumination, motion, movement and vibration. IEEE Trans. Biomed. Eng. 2019, 67, 88–98. [Google Scholar] [CrossRef] [PubMed]
  14. Sun, Z.; He, Q.; Li, Y.; Wang, W.; Wang, R.K. Robust non-contact peripheral oxygenation saturation measurement using smartphone-enabled imaging photoplethysmography. Biomed. Opt. Express 2021, 12, 1746–1760. [Google Scholar] [CrossRef] [PubMed]
  15. Wei, B.; Wu, X.; Zhang, C.; Lv, Z. Analysis and improvement of non-contact SpO2 extraction using an RGB webcam. Biomed. Opt. Express 2021, 12, 5227–5245. [Google Scholar] [CrossRef] [PubMed]
  16. Tian, X.; Wong, C.W.; Ranadive, S.M.; Wu, M. A Multi-Channel Ratio-of-Ratios Method for Noncontact Hand Video Based SpO2 Monitoring Using Smartphone Cameras. IEEE J. Sel. Top. Signal Process. 2022, 16, 197–207. [Google Scholar] [CrossRef]
  17. Akamatsu, Y.; Onishi, Y.; Imaoka, H. Blood Oxygen Saturation Estimation from Facial Video via DC and AC components of Spatio-temporal Map. arXiv 2022, arXiv:2212.07116. [Google Scholar]
  18. Logitech. Available online: https://www.logitech.com/ko-kr/products/webcams/c920e-business-webcam.960-001360.html (accessed on 12 March 2023).
  19. CMS50E Fingertip Pulse Oximeter. Available online: https://www.pulseoximeter.org/cms50e.html (accessed on 23 March 2023).
  20. Chai, D.; Ngan, K. Face segmentation using skin-color map in videophone applications. IEEE Trans. Circuits Syst. Video Technol. 1999, 9, 551–564. [Google Scholar] [CrossRef]
  21. De Dios, J.J.; Garcia, N. Face detection based on a new color space YCgCr. In Proceedings of the 2003 International Conference on Image Processing, Barcelona, Spain, 14–17 September 2003. [Google Scholar]
  22. Saravanan, G.; Yamuna, G.; Nandhini, S. Real time implementation of RGB to HSV/HSI/HSL and its reverse color space models. In Proceedings of the 2016 International Conference on Communication and Signal Processing, Melmaruvathur, India, 6–8 April 2016. [Google Scholar]
  23. Rahman, A.E.; Ameen, S.; Hossain, A.T.; Jabeen, S.; Majid, T.; Afm, A.U.; Tanwi, T.S.; Banik, G.; Shaikh, M.Z.; Islam, M.J.; et al. Success and time implications of SpO2 measurement through pulse oximetry among hospitalised children in rural Bangladesh: Variability by various device-, provider-and patient-related factors. J. Glob. Health 2022, 12, 04036. [Google Scholar] [CrossRef] [PubMed]
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  25. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  26. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
Figure 1. Absorption coefficients of oxygenated and deoxygenated hemoglobin as a function of wavelength [8].
Figure 2. Entire process of the oxygen saturation prediction model.
Figure 3. The distribution of skin colors for all the data.
Figure 4. Comparison of the signals obtained from the Logitech C920e webcam, iPhone 14, and Galaxy Z Flip4.
Figure 5. The original oxygen saturation signal measured with a pulse oximeter and the labels computed with the average-of-median-values method.
Figure 6. CNN model structure based on ResNet.
Figure 7. Predicted oxygen saturation and actual oxygen saturation of a model trained using 10 s of signals as the input data (integrated model). For each subject, three indices were measured: corr, MAE, and RMSE: "corr" stands for the Pearson correlation coefficient, MAE stands for the "mean absolute error", and RMSE stands for the "root mean square error".
Figure 8. Predicted oxygen saturation and actual oxygen saturation of a model trained using 20 s of signals as the input data (integrated model).
Figure 9. Predicted oxygen saturation and actual oxygen saturation of a model trained using 30 s of signals as the input data (integrated model).
Figure 10. Predicted oxygen saturation and actual oxygen saturation of a model trained using 10 s of signals as the input data (personalized model).
Figure 11. Predicted oxygen saturation and actual oxygen saturation of a model trained using 20 s of signals as the input data (personalized model).
Figure 12. Predicted oxygen saturation and actual oxygen saturation of a model trained using 30 s of signals as the input data (personalized model).
Figure 13. SHAP values of the personalized model used in Figure 10. The x-axis represents the features and the y-axis shows the contribution according to the features. Among them, the orange bar is the feature with the greatest contribution.
Figure 14. SHAP values of the personalized model used in Figure 11.
Figure 15. SHAP values of the personalized model used in Figure 12.
Figure 16. SHAP values for the training and testing data of a model trained with 10 s of signals as the input data.
Figure 17. SHAP values for the training and testing data of a model trained with 20 s of signals as the input data.
Figure 18. SHAP values for the training and testing data of a model trained with 30 s of signals as the input data.
Table 1. Summary of the related works.

| Method | ROI | Description | Author | Metric | Value |
|---|---|---|---|---|---|
| Two wavelengths | Face | RoR from AC and DC components estimated for red and blue wavelengths | Tarassenko et al. [12] | Corrcoef | −0.8 |
| Two wavelengths | Face | RoR from AC and DC components estimated for red and blue wavelengths | Rahman et al. [13] | Corrcoef | 0.95 |
| Two wavelengths | Hand | RoR from AC and DC components estimated for red and green wavelengths | Sun et al. [14] | R² | 0.87 |
| Three wavelengths | Face | AC and DC components calculated from all red, green, and blue wavelengths | Wei et al. [15] | Corrcoef | 0.68 |
| Three wavelengths | Hand | AC and DC components calculated from all red, green, and blue wavelengths | Tian et al. [16] | MAE | 1.26 |
| CNN | Face | Spatiotemporal map built from the AC and DC components of the red, green, and blue wavelengths, fed into a CNN model | Akamatsu et al. [17] | Corrcoef | 0.496 |
Table 2. Demographic characteristics and the average values of the Cb and Cr color channels for the participants.

| ID | Gender | Age | Nationality | Cb Channel Average | Cr Channel Average |
|---|---|---|---|---|---|
| 1 | Male | 25 | Korean | 116.0 | 145.6 |
| 2 | Female | 25 | Korean | 115.0 | 143.5 |
| 3 | Male | 27 | Korean | 118.0 | 143.3 |
| 4 | Male | 27 | Korean | 117.4 | 142.8 |
| 5 | Female | 26 | Korean | 116.9 | 142.7 |
| 6 | Female | 30 | Korean | 117.3 | 142.8 |
| 7 | Male | 28 | Korean | 117.5 | 143.1 |
| 8 | Female | 24 | Korean | 117.6 | 144.0 |
| 9 | Female | 24 | Korean | 117.6 | 143.6 |
| 10 | Female | 25 | Korean | 120.1 | 140.6 |
| 11 | Male | 26 | Korean | 118.0 | 142.5 |
| 12 | Male | 28 | Korean | 114.9 | 143.9 |
| 13 | Male | 27 | Korean | 115.0 | 144.4 |
| 14 | Male | 32 | Korean | 114.0 | 146.3 |
Table 3. Statistical results of a model trained using 10 s of signals as the input data (integrated model). For each subject, three indices were measured: the correlation coefficient, MAE, and RMSE: "correlation coefficient" stands for the Pearson correlation coefficient, MAE stands for the "mean absolute error", and RMSE stands for the "root mean square error".

| Subject | Correlation Coefficient | MAE | RMSE |
|---|---|---|---|
| 1 | 0.430 | 1.738 | 2.087 |
| 2 | 0.468 | 2.728 | 3.181 |
| 3 | 0.559 | 2.059 | 2.667 |
| 4 | 0.397 | 2.148 | 2.807 |
| 5 | 0.502 | 3.208 | 3.649 |
| 6 | 0.417 | 3.482 | 3.782 |
| 7 | 0.500 | 3.987 | 4.763 |
| 8 | 0.569 | 2.812 | 3.201 |
| 9 | 0.337 | 3.358 | 3.979 |
| 10 | 0.476 | 3.021 | 3.623 |
| 11 | 0.294 | 1.550 | 2.035 |
| 12 | 0.200 | 1.358 | 1.691 |
| 13 | 0.357 | 2.375 | 2.980 |
| 14 | 0.294 | 2.463 | 3.060 |
| Mean | 0.414 | 2.592 | 3.108 |
Table 4. Statistical results of a model trained using 20 s of signals as the input data (integrated model).

| Subject | Correlation Coefficient | MAE | RMSE |
|---|---|---|---|
| 1 | 0.586 | 0.825 | 6.338 |
| 2 | 0.512 | 3.062 | 3.816 |
| 3 | 0.696 | 1.760 | 2.373 |
| 4 | 0.672 | 5.840 | 6.343 |
| 5 | 0.496 | 2.461 | 2.978 |
| 6 | 0.582 | 6.102 | 6.692 |
| 7 | 0.512 | 3.320 | 3.678 |
| 8 | 0.377 | 3.746 | 4.259 |
| 9 | 0.241 | 5.033 | 5.852 |
| 10 | 0.468 | 7.690 | 8.289 |
| 11 | 0.594 | 4.260 | 4.737 |
| 12 | 0.378 | 2.413 | 2.903 |
| 13 | 0.056 | 3.762 | 4.459 |
| 14 | 0.448 | 3.888 | 4.437 |
| Mean | 0.473 | 3.869 | 4.797 |
Table 5. Statistical results of a model trained using 30 s of signals as the input data (integrated model).

| Subject | Correlation Coefficient | MAE | RMSE |
|---|---|---|---|
| 1 | 0.372 | 2.635 | 3.249 |
| 2 | 0.506 | 3.772 | 4.355 |
| 3 | 0.664 | 2.049 | 2.370 |
| 4 | 0.748 | 2.066 | 2.676 |
| 5 | 0.529 | 1.855 | 2.264 |
| 6 | 0.584 | 3.628 | 4.050 |
| 7 | 0.492 | 2.993 | 3.360 |
| 8 | 0.475 | 3.190 | 3.788 |
| 9 | 0.395 | 5.662 | 6.489 |
| 10 | 0.401 | 5.686 | 6.168 |
| 11 | 0.570 | 1.720 | 2.390 |
| 12 | 0.199 | 1.785 | 2.079 |
| 13 | 0.238 | 2.714 | 3.438 |
| 14 | 0.481 | 2.598 | 2.981 |
| Mean | 0.475 | 3.025 | 3.547 |
Table 6. Statistical results of a model trained using 10 s of signals as the input data (personalized model).

| Subject | Correlation Coefficient | MAE | RMSE | ICC | 95% Confidence Interval |
|---|---|---|---|---|---|
| 1 | 0.498 | 1.487 | 1.874 | 0.592 | 0.487~0.675 |
| 2 | 0.513 | 1.774 | 3.186 | 0.562 | 0.369~0.686 |
| 3 | 0.693 | 2.212 | 2.684 | 0.763 | 0.524~0.862 |
| 4 | 0.759 | 3.990 | 4.510 | 0.601 | −0.213~0.852 |
| 5 | 0.451 | 1.343 | 1.922 | 0.615 | 0.517~0.693 |
| 6 | 0.653 | 1.741 | 1.973 | 0.434 | −0.196~0.751 |
| 7 | 0.509 | 2.378 | 2.828 | 0.375 | −0.184~0.662 |
| 8 | 0.620 | 1.718 | 2.279 | 0.677 | 0.597~0.742 |
| 9 | 0.411 | 0.872 | 1.374 | 0.297 | 0.005~0.494 |
| 10 | 0.576 | 0.357 | 0.694 | 0.714 | 0.639~0.773 |
| 11 | 0.713 | 1.082 | 1.444 | 0.770 | 0.710~0.817 |
| 12 | 0.403 | 2.735 | 3.238 | 0.353 | 0.004~0.564 |
| 13 | 0.496 | 1.267 | 1.709 | 0.538 | 0.418~0.632 |
| 14 | 0.684 | 1.618 | 2.255 | 0.743 | 0.631~0.815 |
| Mean | 0.570 | 1.755 | 2.284 | 0.574 | - |
Table 7. Statistical results of a model trained using 20 s of signals as the input data (personalized model).

| Subject | Correlation Coefficient | MAE | RMSE | ICC | 95% Confidence Interval |
|---|---|---|---|---|---|
| 1 | 0.579 | 1.558 | 1.922 | 0.664 | 0.577~0.733 |
| 2 | 0.669 | 2.125 | 3.199 | 0.750 | 0.682~0.803 |
| 3 | 0.516 | 2.980 | 3.638 | 0.636 | 0.467~0.742 |
| 4 | 0.817 | 3.014 | 3.465 | 0.725 | −0.189~0.907 |
| 5 | 0.512 | 1.935 | 2.555 | 0.586 | 0.293~0.737 |
| 6 | 0.767 | 1.044 | 1.257 | 0.678 | −0.195~0.882 |
| 7 | 0.572 | 1.490 | 2.040 | 0.620 | 0.519~0.700 |
| 8 | 0.688 | 2.214 | 2.934 | 0.603 | 0.389~0.729 |
| 9 | 0.269 | 1.165 | 1.570 | 0.369 | 0.174~0.513 |
| 10 | 0.723 | 0.630 | 0.858 | 0.816 | 0.756~0.859 |
| 11 | 0.756 | 1.157 | 1.586 | 0.827 | 0.713~0.887 |
| 12 | 0.577 | 1.224 | 1.560 | 0.687 | 0.567~0.768 |
| 13 | 0.671 | 1.308 | 1.706 | 0.787 | 0.709~0.841 |
| 14 | 0.698 | 2.241 | 2.772 | 0.783 | 0.612~0.865 |
| Mean | 0.630 | 1.720 | 2.219 | 0.681 | - |
Table 8. Statistical results of a model trained using 30 s of signals as the input data (personalized model).

| Subject | Correlation Coefficient | MAE | RMSE | ICC | 95% Confidence Interval |
|---|---|---|---|---|---|
| 1 | 0.727 | 1.812 | 2.313 | 0.613 | 0.112~0.797 |
| 2 | 0.231 | 4.715 | 6.120 | 0.363 | 0.191~0.498 |
| 3 | 0.626 | 2.853 | 3.200 | 0.590 | 0.085~0.783 |
| 4 | 0.845 | 2.343 | 2.939 | 0.854 | 0.697~0.917 |
| 5 | 0.821 | 2.837 | 3.336 | 0.706 | 0.376~0.836 |
| 6 | 0.440 | 1.775 | 2.210 | 0.428 | 0.105~0.617 |
| 7 | 0.562 | 1.074 | 1.445 | 0.686 | 0.589~0.758 |
| 8 | 0.700 | 2.412 | 3.105 | 0.614 | 0.423~0.731 |
| 9 | 0.759 | 0.857 | 1.152 | 0.840 | 0.785~0.879 |
| 10 | 0.636 | 2.376 | 2.638 | 0.385 | −0.196~0.707 |
| 11 | 0.688 | 1.428 | 1.766 | 0.775 | 0.681~0.837 |
| 12 | 0.797 | 0.676 | 0.855 | 0.850 | 0.790~0.891 |
| 13 | 0.618 | 3.025 | 3.392 | 0.505 | −0.215~0.782 |
| 14 | 0.828 | 1.804 | 2.092 | 0.831 | 0.785~0.867 |
| Mean | 0.663 | 2.142 | 2.612 | 0.646 | - |
Table 9. Summary of the oxygen saturation prediction results according to the input data lengths.

| Model | Input | Correlation Coefficient | MAE | RMSE |
|---|---|---|---|---|
| Integrated model | 10 s | 0.414 | 2.592 | 3.108 |
| | 20 s | 0.473 | 3.869 | 4.797 |
| | 30 s | 0.475 | 3.025 | 3.547 |
| Personalized model | 10 s | 0.570 | 1.755 | 2.284 |
| | 20 s | 0.630 | 1.720 | 2.219 |
| | 30 s | 0.663 | 2.142 | 2.612 |
| Previous research [15] | - | 0.680 | - | 1.819 |
| Previous research [17] | - | 0.496 | 1.18 | 1.4 |

