Improving Remote Photoplethysmography Performance through Deep-Learning-Based Real-Time Skin Segmentation Network
Abstract
1. Introduction
- Existing studies on rPPG have emphasized the importance of skin segmentation, yet few have attempted to improve rPPG through it. This study confirms that Skin-SegNet reduces noise and improves rPPG quality, and that the improvement is largest in noisy conditions caused by talking or motion artifacts. Skin-SegNet improves the MAPE by 20% on average over existing threshold-based skin segmentation methods (YCbCr [4], HSV [5]), and the average success rate of heart rate estimation within 5 bpm in a talking environment improves by 9.5% (see the metric sketch following this list).
- Image processing typically involves a trade-off between accuracy and processing speed, yet Skin-SegNet achieves state-of-the-art (SOTA) performance on both. Skin-SegNet-based rPPG measurement runs roughly 10 times faster than existing deep-learning ROI-selection methods with no significant loss in accuracy. Its inference time of 15 ms supports real-time processing at more than 30 frames per second (FPS).
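For concreteness, the accuracy figures used throughout the evaluation are MAPE (mean absolute percentage error of heart-rate estimates) and CoverageN (the share of measurement windows whose heart-rate error stays within N bpm). A minimal NumPy sketch of both metrics follows; the function and variable names are ours, not from the paper:

```python
import numpy as np

def mape(hr_est, hr_ref):
    """Mean absolute percentage error (%) of estimated vs. reference heart rate."""
    hr_est = np.asarray(hr_est, dtype=float)
    hr_ref = np.asarray(hr_ref, dtype=float)
    return 100.0 * np.mean(np.abs(hr_est - hr_ref) / hr_ref)

def coverage(hr_est, hr_ref, tol_bpm=5.0):
    """Share (%) of windows whose absolute HR error is within tol_bpm.

    tol_bpm=5.0 gives Coverage5; tol_bpm=3.0 gives Coverage3.
    """
    err = np.abs(np.asarray(hr_est, dtype=float) - np.asarray(hr_ref, dtype=float))
    return 100.0 * np.mean(err <= tol_bpm)
```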
2. Related Work
3. Method
3.1. Information Blocking Decoder
3.2. Spatial Squeeze Module
3.3. Real-Time Skin Segmentation Network (Skin-SegNet)
4. Results
4.1. Datasets
4.2. Evaluation
4.3. Evaluation Result
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Gil, E.; Orini, M.; Bailon, R.; Vergara, J.M.; Mainardi, L.; Laguna, P. Photoplethysmography pulse rate variability as a surrogate measurement of heart rate variability during non-stationary conditions. Physiol. Meas. 2010, 31, 1271.
2. Wieringa, F.P.; Mastik, F.; Steen, A.V.D. Contactless multiple wavelength photoplethysmographic imaging: A first step toward “SpO2 camera” technology. Ann. Biomed. Eng. 2005, 33, 1034–1041.
3. Humphreys, K.; Ward, T.; Markham, C. Noncontact simultaneous dual wavelength photoplethysmography: A further step toward noncontact pulse oximetry. Rev. Sci. Instrum. 2007, 78, 044304.
4. Phung, S.L.; Bouzerdoum, A.; Chai, D. A novel skin color model in YCbCr color space and its application to human face detection. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002.
5. Dahmani, D.; Cheref, M.; Larabi, S. Zero-sum game theory model for segmenting skin regions. Image Vis. Comput. 2020, 99, 103925.
6. Lewandowska, M.; Rumiński, J.; Kocejko, T.; Nowak, J. Measuring pulse rate with a webcam—A non-contact method for evaluating cardiac activity. In Proceedings of the 2011 Federated Conference on Computer Science and Information Systems (FedCSIS), Szczecin, Poland, 18–21 September 2011; pp. 405–410.
7. De Haan, G.; Jeanne, V. Robust pulse rate from chrominance-based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886.
8. Wang, W.; Den Brinker, A.C.; Stuijk, S.; De Haan, G. Algorithmic principles of remote PPG. IEEE Trans. Biomed. Eng. 2016, 64, 1479–1491.
9. Casado, C.A.; López, M.B. Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces. arXiv 2022, arXiv:2202.04101.
10. Scherpf, M.; Ernst, H.; Misera, L.; Malberg, H.; Schmidt, M. Skin segmentation for imaging photoplethysmography using a specialized deep learning approach. In Proceedings of the 2021 Computing in Cardiology (CinC), Brno, Czech Republic, 13–15 September 2021; pp. 1–4.
11. Verkruysse, W.; Svaasand, L.O.; Nelson, J.S. Remote plethysmographic imaging using ambient light. Opt. Express 2008, 16, 21434–21445.
12. Bobbia, S.; Benezeth, Y.; Dubois, J. Remote photoplethysmography based on implicit living skin tissue segmentation. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 361–365.
13. Bobbia, S.; Luguern, D.; Benezeth, Y.; Nakamura, K.; Gomez, R.; Dubois, J. Real-time temporal superpixels for unsupervised remote photoplethysmography. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1341–1348.
14. Nikolskaia, K.; Ezhova, N.; Sinkov, A.; Medvedev, M. Skin detection technique based on HSV color model and SLIC segmentation method. In Proceedings of the 4th Ural Workshop on Parallel, Distributed, and Cloud Computing for Young Scientists (Ural-PDC 2018), CEUR Workshop Proceedings, Yekaterinburg, Russia, 15 November 2018; pp. 123–135.
15. Tran, Q.V.; Su, S.F.; Sun, W.; Tran, M.Q. Adaptive pulsatile plane for robust noncontact heart rate monitoring. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 5587–5599.
16. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
17. Park, H.; Sjosund, L.; Yoo, Y.; Monet, N.; Bang, J.; Kwak, N. SINet: Extreme lightweight portrait segmentation networks with spatial squeeze module and information blocking decoder. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 2066–2074.
18. Lee, K.; You, H.; Oh, J.; Lee, E.C. Extremely lightweight skin segmentation networks to improve remote photoplethysmography measurement. In Proceedings of the International Conference on Intelligent Human Computer Interaction, Tashkent, Uzbekistan, 20–22 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 454–459.
19. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 552–568.
20. Park, H.; Yoo, Y.; Seo, G.; Han, D.; Yun, S.; Kwak, N. C3: Concentrated-comprehensive convolution and its application to semantic segmentation. arXiv 2018, arXiv:1812.04920.
21. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
22. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
23. Mehta, S.; Rastegari, M.; Shapiro, L.; Hajishirzi, H. ESPNetv2: A light-weight, power efficient, and general purpose convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9190–9200.
24. Lee, C.H.; Liu, Z.; Wu, L.; Luo, P. MaskGAN: Towards diverse and interactive facial image manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5549–5558.
25. Stricker, R.; Müller, S.; Gross, H.M. Non-contact video-based pulse rate measurement on a mobile service robot. In Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication, Edinburgh, UK, 25–29 August 2014; pp. 1056–1062.
26. Li, X.; Chen, J.; Zhao, G.; Pietikainen, M. Remote heart rate measurement from face videos under realistic situations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
27. Tulyakov, S.; Alameda-Pineda, X.; Ricci, E.; Yin, L.; Cohn, J.F.; Sebe, N. Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
28. Poh, M.Z.; McDuff, D.J.; Picard, R.W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 2010, 18, 10762–10774.
29. Toisoul, A.; Kossaifi, J.; Bulat, A.; Tzimiropoulos, G.; Pantic, M. Estimation of continuous valence and arousal levels from faces in naturalistic conditions. Nat. Mach. Intell. 2021, 3, 42–50.
30. Du, W.; Wang, Y.; Qiao, Y. Recurrent spatial-temporal attention network for action recognition in videos. IEEE Trans. Image Process. 2017, 27, 1347–1360.
31. Hwang, H.; Lee, K.; Lee, E.C. A real-time remote respiration measurement method with improved robustness based on a CNN model. Appl. Sci. 2022, 12, 11603.
| # | Input | Operation | Output | k, p / Notes |
|---|---|---|---|---|
| 1 | 3 × 224 × 224 | SE block | 12 × 112 × 112 | Downsampling |
| 2 | 12 × 112 × 112 | SE block | 16 × 56 × 56 | Downsampling |
| 3 | 16 × 56 × 56 | DS + SE block | 16 × 28 × 28 | Downsampling |
| 4 | 16 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 3, p = 1], [k = 5, p = 1] |
| 5 | 32 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 5, p = 1], [k = 3, p = 2] |
| 6 | 32 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 5, p = 2], [k = 3, p = 4] |
| 7 | 32 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 5, p = 1], [k = 5, p = 1] |
| 8 | 32 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 3, p = 2], [k = 3, p = 4] |
| 9 | 32 × 28 × 28 | S2 module | 32 × 28 × 28 | [k = 3, p = 1], [k = 5, p = 2] |
| 10 | 48 × 28 × 28 | Concatenation, Conv2d | 2 × 28 × 28 | Encoder output |
| 11 | 2 × 28 × 28 | Bilinear2d | 2 × 56 × 56 | Upsampling |
| 12 | 16 × 56 × 56 | Conv2d | 2 × 56 × 56 | Shortcut |
| 13 | 2 × 56 × 56 | Gate function | 2 × 56 × 56 | Information blocking operation |
| 14 | 2 × 56 × 56 | Bilinear2d | 2 × 112 × 112 | Upsampling |
| 15 | 2 × 112 × 112 | Bilinear2d, Conv2d | 2 × 224 × 224 | Upsampling |
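Rows 11–13 of the table realize the information-blocking step of Section 3.1: the coarse encoder prediction is bilinearly upsampled, a 1 × 1 convolution projects the low-level shortcut, and a gate suppresses the shortcut wherever the coarse prediction is already confident. The PyTorch sketch below follows the SINet decoder [17] on which Skin-SegNet builds; the exact gating details of the authors' implementation may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InformationBlockingGate(nn.Module):
    """Fuse an upsampled coarse prediction with a gated low-level shortcut."""

    def __init__(self, low_channels: int = 16, num_classes: int = 2):
        super().__init__()
        self.shortcut = nn.Conv2d(low_channels, num_classes, kernel_size=1)  # row 12

    def forward(self, enc_logits, low_feat):
        # Row 11: bilinear upsampling, 2 x 28 x 28 -> 2 x 56 x 56.
        up = F.interpolate(enc_logits, scale_factor=2, mode="bilinear", align_corners=False)
        # Per-pixel confidence of the coarse prediction (max class probability).
        confidence = torch.softmax(up, dim=1).max(dim=1, keepdim=True).values
        # Row 13: block low-level information where the prediction is already confident.
        return up + self.shortcut(low_feat) * (1.0 - confidence)

# Example with the shapes from rows 10-13 of the table:
gate = InformationBlockingGate()
out = gate(torch.randn(1, 2, 28, 28), torch.randn(1, 16, 56, 56))  # -> (1, 2, 56, 56)
```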
| Methods | MAPE (%) | Coverage5 | Coverage3 | SNR | Pearson r |
|---|---|---|---|---|---|
| YCbCr [4] | 5.58 | 80.25% | 66.00% | 0.9290 | 0.83 |
| HSV [5] | 9.53 | 73.25% | 60.50% | 0.8782 | 0.75 |
| ELSNet [18] | 4.48 | 85.25% | 71.50% | 1.0004 | 0.88 |
| Deeplabv3+ Mobile [16] | 2.01 | 96.78% | 95.17% | 1.5479 | 0.88 |
| Deeplabv3+ HR [16] | 1.99 | 96.80% | 95.11% | 1.5647 | 0.88 |
| Skin-SegNet (ours) | 1.81 | 96.78% | 95.17% | 1.6077 | 0.89 |
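The SNR column is conventionally computed as the ratio of spectral power concentrated around the heart-rate fundamental and its first harmonic to the remaining power in the physiological band, following de Haan and Jeanne [7]. A minimal sketch assuming that variant (the band half-width and the 0.5–4 Hz range are our choices, not confirmed by the paper):

```python
import numpy as np

def rppg_snr(signal, fs, hr_bpm, half_band_hz=0.2):
    """Ratio of in-band to out-of-band spectral power of an rPPG trace."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    f0 = hr_bpm / 60.0  # fundamental frequency in Hz
    near_peak = (np.abs(freqs - f0) <= half_band_hz) | (np.abs(freqs - 2 * f0) <= half_band_hz)
    in_range = (freqs >= 0.5) & (freqs <= 4.0)  # plausible heart-rate band
    return power[near_peak & in_range].sum() / power[~near_peak & in_range].sum()
```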
| Methods | Mean (ms) | Max. (ms) | MACs (G) | Parameters (M) |
|---|---|---|---|---|
| Deeplabv3+ Mobile [16] | 124 | 130 | 35.66 | 5.22 |
| Deeplabv3+ HR [16] | 324 | 359 | 6.03 | 71.71 |
| ELSNet [18] * | 5 | 7 | 0.023 | 0.01 |
| Skin-SegNet (ours) * | 12 | 15 | 0.047 | 0.019 |
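A benchmark along the lines below can approximately reproduce the mean/max latency, MACs, and parameter counts in the table; the input resolution, warm-up count, and the use of the thop package as MAC counter are our assumptions, not details taken from the paper:

```python
import time
import torch
from thop import profile  # pip install thop; a common MAC counter

@torch.no_grad()
def benchmark(model, input_size=(1, 3, 224, 224), runs=100, device="cuda"):
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    macs, params = profile(model, inputs=(x,), verbose=False)
    for _ in range(10):  # warm-up iterations, excluded from timing
        model(x)
    times_ms = []
    for _ in range(runs):
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the GPU before stopping the clock
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return {"mean_ms": sum(times_ms) / runs, "max_ms": max(times_ms),
            "MACs_G": macs / 1e9, "params_M": params / 1e6}
```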
| rPPG Method | Skin Segmentation Method | MAPE (%) | Coverage5 | Coverage3 |
|---|---|---|---|---|
| CHROM [7] | YCbCr [4] * | 9.0 | 63.41% | 49.57% |
| | HSV [5] * | 11.7 | 54.45% | 39.49% |
| | ELSNet [18] * | 6.7 | 71.22% | 51.00% |
| | Deeplabv3+ Mobile [16] | 1.9 | 93.83% | 90.51% |
| | Deeplabv3+ HR [16] | 1.9 | 94.47% | 91.48% |
| | Skin-SegNet (ours) * | 2.3 | 93.65% | 91.18% |
| OMIT [9] | YCbCr [4] * | 10.2 | 60.45% | 48.67% |
| | HSV [5] * | 11.2 | 58.76% | 42.33% |
| | ELSNet [18] * | 6.9 | 67.09% | 55.39% |
| | Deeplabv3+ Mobile [16] | 1.7 | 95.52% | 94.29% |
| | Deeplabv3+ HR [16] | 1.7 | 95.46% | 94.03% |
| | Skin-SegNet (ours) * | 1.8 | 95.19% | 93.31% |
| PCA [6] | YCbCr [4] * | 8.7 | 65.49% | 49.18% |
| | HSV [5] * | 20.9 | 42.32% | 32.43% |
| | ELSNet [18] * | 6.7 | 68.87% | 54.11% |
| | Deeplabv3+ Mobile [16] | 1.6 | 95.92% | 94.60% |
| | Deeplabv3+ HR [16] | 1.5 | 96.67% | 94.58% |
| | Skin-SegNet (ours) * | 2.1 | 94.81% | 93.35% |
| POS [8] | YCbCr [4] * | 14.7 | 44.38% | 34.47% |
| | HSV [5] * | 16.0 | 43.80% | 30.75% |
| | ELSNet [18] * | 14.0 | 46.66% | 35.12% |
| | Deeplabv3+ Mobile [16] | 1.9 | 96.72% | 92.30% |
| | Deeplabv3+ HR [16] | 1.8 | 96.47% | 91.74% |
| | Skin-SegNet (ours) * | 2.4 | 95.81% | 93.05% |
| rPPG Method | Skin Segmentation Method | MAPE (%) | Coverage5 | Coverage3 |
|---|---|---|---|---|
| CHROM [7] | YCbCr [4] * | 9.5 | 62.84% | 48.58% |
| | HSV [5] * | 12.9 | 60.48% | 45.11% |
| | ELSNet [18] * | 7.4 | 69.17% | 54.41% |
| | Deeplabv3+ Mobile [16] | 1.1 | 98.03% | 96.36% |
| | Deeplabv3+ HR [16] | 1.1 | 98.61% | 97.60% |
| | Skin-SegNet (ours) * | 1.2 | 98.63% | 97.76% |
| OMIT [9] | YCbCr [4] * | 10.6 | 62.14% | 47.23% |
| | HSV [5] * | 12.2 | 62.42% | 49.41% |
| | ELSNet [18] * | 7.6 | 67.97% | 55.08% |
| | Deeplabv3+ Mobile [16] | 1.0 | 98.56% | 97.39% |
| | Deeplabv3+ HR [16] | 1.2 | 98.18% | 97.69% |
| | Skin-SegNet (ours) * | 1.2 | 98.59% | 97.39% |
| PCA [6] | YCbCr [4] * | 9.8 | 63.32% | 49.70% |
| | HSV [5] * | 26.4 | 60.29% | 45.40% |
| | ELSNet [18] * | 7.4 | 68.31% | 55.37% |
| | Deeplabv3+ Mobile [16] | 1.0 | 98.39% | 97.85% |
| | Deeplabv3+ HR [16] | 1.2 | 98.13% | 97.60% |
| | Skin-SegNet (ours) * | 1.4 | 98.19% | 97.10% |
| POS [8] | YCbCr [4] * | 14.6 | 53.46% | 39.70% |
| | HSV [5] * | 18.2 | 55.47% | 42.08% |
| | ELSNet [18] * | 13.4 | 56.27% | 41.83% |
| | Deeplabv3+ Mobile [16] | 1.4 | 98.68% | 96.36% |
| | Deeplabv3+ HR [16] | 1.4 | 98.34% | 96.91% |
| | Skin-SegNet (ours) * | 1.0 | 98.42% | 97.80% |
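In both tables above, the rPPG back-ends (CHROM, OMIT, PCA, POS) are held fixed; each segmentation method only changes which pixels enter the per-frame spatial average that those back-ends consume. A minimal sketch of that shared front-end, with the array shapes being our assumption rather than taken from the authors' code:

```python
import numpy as np

def masked_rgb_trace(frames, masks):
    """Average the skin pixels of each frame into one RGB sample.

    frames: (T, H, W, 3) video; masks: (T, H, W) boolean skin masks.
    Returns a (T, 3) trace fed to CHROM/OMIT/PCA/POS.
    """
    trace = np.full((len(frames), 3), np.nan, dtype=float)
    for t, (frame, mask) in enumerate(zip(frames, masks)):
        skin = frame[mask]              # (N_skin, 3) pixels under the mask
        if skin.size:                   # leave NaN if the mask is empty
            trace[t] = skin.mean(axis=0)
    return trace
```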
| MAPE (%) | Number of Frames | Time (s) |
|---|---|---|
| 2.59 | 14,737 | 491 |
| 1.54 | 24,886 | 830 |
| 1.21 | 20,433 | 681 |
| 0.51 | 26,184 | 873 |
| 26.9 | 4596 | 153 |
| 5.24 | 663 | 22 |
| 0.75 | 800 | 27 |
| 0.3 | 512 | 17 |
| 0.5 | 5483 | 183 |
| 1.59 | 3979 | 133 |
| 0.52 | 263 | 9 |