HLNet: A Unified Framework for Real-Time Segmentation and Facial Skin Tones Evaluation
Abstract
1. Introduction
- (1) We propose an efficient hair and face segmentation network whose newly designed modules achieve real-time inference without sacrificing accuracy.
- (2) We introduce the InteractionModule, which exploits multi-dimensional feature interactions to mitigate the loss of spatial information as the network deepens.
- (3) We propose a novel skin tone level evaluation algorithm that obtains accurate results on a manually labeled dataset.
- (4) Our method achieves superior results on multiple benchmark datasets.
2. Related Works
3. Methodology
3.1. High-To-Low Dimension Fusion Network
3.2. Facial Skin Tone Classification
Algorithm 1: Segmentation-based inference algorithm for smoothed facial region extraction
4. Experiments
4.1. Implementation Details
4.2. Datasets
4.2.1. Face and Hair Segmentation Datasets
4.2.2. Manually Annotated Dataset
4.3. Evaluation Metrics
5. Results and Discussion
5.1. Segmentation Results
5.1.1. Overall Comparison
5.1.2. Comparison with SOTA Lightweight Networks
5.1.3. Ablation Study
5.2. Facial Skin Tone Classification Results
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Rousset, C.; Coulon, P.Y. Frequential and color analysis for hair mask segmentation. In Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA, 12–15 October 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 2276–2279.
- Shen, Y.; Peng, Z.; Zhang, Y. Image based hair segmentation algorithm for the application of automatic facial caricature synthesis. Sci. World J. 2014, 2014, 748634.
- Abbas, Q.; Garcia, I.F.; Emre Celebi, M.; Ahmad, W. A feature-preserving hair removal algorithm for dermoscopy images. Skin Res. Technol. 2013, 19, e27–e36.
- Borza, D.; Ileni, T.; Darabant, A. A deep learning approach to hair segmentation and color extraction from facial images. In Proceedings of the International Conference on Advanced Concepts for Intelligent Vision Systems, Poitiers, France, 24–27 September 2018; pp. 438–449.
- Wen, S.; Dong, M.; Yang, Y.; Zhou, P.; Huang, T.; Chen, Y. End-to-end detection-segmentation system for face labeling. IEEE Trans. Emerg. Top. Comput. Intell. 2019.
- Luo, L.; Xue, D.; Feng, X. EHANet: An Effective Hierarchical Aggregation Network for Face Parsing. Appl. Sci. 2020, 10, 3135.
- Muhammad, U.R.; Svanera, M.; Leonardi, R.; Benini, S. Hair detection, segmentation, and hairstyle classification in the wild. Image Vis. Comput. 2018, 71, 25–37.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. 2015. Available online: https://arxiv.org/abs/1511.07122 (accessed on 30 April 2016).
- Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1529–1537.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 325–341.
- Stricker, M.A.; Orengo, M. Similarity of color images. In Proceedings of the International Society for Optics and Photonics, San Jose, CA, USA, 23 March 1995; pp. 381–392.
- Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. 2017. Available online: https://arxiv.xilesou.top/abs/1706.05587 (accessed on 5 December 2017).
- Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–27 July 2017; pp. 11–19.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934.
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. Enet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. 2016. Available online: https://arxiv.xilesou.top/abs/1606.02147 (accessed on 7 June 2016).
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 405–420.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 2017. Available online: https://arxiv.xilesou.top/abs/1704.04861 (accessed on 17 April 2017).
- Poudel, R.P.; Liwicki, S.; Cipolla, R. Fast-SCNN: Fast Semantic Segmentation Network. 2019. Available online: https://arxiv.xilesou.top/abs/1902.04502 (accessed on 12 February 2019).
- Li, H.; Xiong, P.; Fan, H.; Sun, J. Dfanet: Deep feature aggregation for real-time semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9522–9531.
- Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation. 2019. Available online: https://arxiv.xilesou.top/abs/1905.02423 (accessed on 13 May 2019).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Zhang, H.; Dana, K.; Shi, J.; Zhang, Z.; Wang, X.; Tyagi, A.; Agrawal, A. Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–21 June 2018; pp. 7151–7160.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3146–3154.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected Crfs. 2014. Available online: https://arxiv.xilesou.top/abs/1412.7062 (accessed on 7 June 2016).
- Levinshtein, A.; Chang, C.; Phung, E.; Kezele, I.; Guo, W.; Aarabi, P. Real-time deep hair matting on mobile devices. In Proceedings of the Conference on Computer and Robot Vision, Toronto, ON, Canada, 8–10 May 2018; pp. 1–7.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. 2019. Available online: https://arxiv.xilesou.top/abs/1902.09212 (accessed on 25 February 2019).
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 19–21 June 2018; pp. 4510–4520.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. 2015. Available online: https://arxiv.xilesou.top/abs/1502.03167 (accessed on 2 March 2015).
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. 2019. Available online: https://arxiv.xilesou.top/abs/1904.01355 (accessed on 20 August 2019).
- Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Cardoso, M.J. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Québec City, QC, Canada, 14 September 2017; pp. 240–248.
- He, K.; Sun, J.; Tang, X. Guided image filtering. In Proceedings of the European Conference on Computer Vision, Crete, Greece, 5–11 September 2010; pp. 1–14.
- He, K.; Sun, J. Fast Guided Filter. 2015. Available online: https://arxiv.xilesou.top/abs/1505.00996 (accessed on 5 May 2015).
- Levin, A.; Lischinski, D.; Weiss, Y. A closed-form solution to natural image matting. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 30, 228–242.
- Kae, A.; Sohn, K.; Lee, H.; Learned-Miller, E. Augmenting CRFs with Boltzmann machine shape priors for image labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2019–2026.
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. From facial parts responses to face detection: A deep learning approach. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 3676–3684.
- Svanera, M.; Muhammad, U.R.; Leonardi, R.; Benini, S. Figaro, hair detection and segmentation in the wild. In Proceedings of the IEEE International Conference on Image Processing, Phoenix, AZ, USA, 25–28 September 2016; pp. 3676–3684.
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
Stage | Type | Output Size |
---|---|---|
Encoder | - | |
Encoder | Conv2D | |
Encoder | DwConv2D | |
Encoder | DwConv2D | |
Encoder | InteractionModule | |
Encoder | FFM | |
Encoder | DilatedGroup | |
Decoder | UpSample2D | |
Decoder | Conv2D | |
Decoder | SoftMax | |
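
The encoder above lists DwConv2D layers next to a regular Conv2D. As a rough illustration of why such layers suit a real-time network, the sketch below compares weight counts for a standard convolution and a depthwise convolution followed by a 1 x 1 pointwise convolution. It assumes that DwConv2D denotes a depthwise (separable) convolution, which the table does not spell out, and the kernel size and channel widths (3, 32, 64) are hypothetical values chosen for the arithmetic, not taken from HLNet.

```python
def conv2d_params(k, c_in, c_out, bias=False):
    """Weight count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(k, c_in, c_out, bias=False):
    """Weight count of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer shape: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = conv2d_params(3, 32, 64)          # 3*3*32*64 = 18432
separable = dw_separable_params(3, 32, 64)   # 3*3*32 + 32*64 = 2336
print(standard, separable, round(standard / separable, 2))
```

For this hypothetical shape the separable variant needs roughly one eighth of the weights; the actual saving in HLNet depends on its real layer widths.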
Metric | LFW (OC = 3), U-Net | LFW (OC = 3), HLNet | CelebHair (OC = 3), U-Net | CelebHair (OC = 3), HLNet | Figaro1k (OC = 2), U-Net | Figaro1k (OC = 2), HLNet |
---|---|---|---|---|---|---|
mIoU | 83.46 | 83.81 | 88.56 | 89.55 | 77.75 | 78.39 |
fwIoU | 92.75 | 90.28 | 91.79 | 91.98 | 83.01 | 83.12 |
pixelAcc | 95.83 | 94.69 | 95.54 | 96.08 | 90.28 | 90.73 |
mPixelAcc | 88.84 | 90.35 | 93.61 | 94.49 | 84.72 | 84.93 |
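
The four metrics in the table above can all be derived from a class confusion matrix. The following is a minimal sketch using the standard definitions of pixel accuracy, mean pixel accuracy, mean IoU, and frequency-weighted IoU; that these match the paper's exact formulas is an assumption, since the formulas are not reproduced here. The function name `segmentation_metrics` and the toy matrix are illustrative only, and values are returned as fractions rather than percentages.

```python
def segmentation_metrics(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    tp = [conf[i][i] for i in range(n)]
    gt = [sum(conf[i]) for i in range(n)]                          # pixels per true class
    pred = [sum(conf[i][j] for i in range(n)) for j in range(n)]   # pixels per predicted class

    # IoU per class; classes absent from both ground truth and prediction are skipped.
    iou = {i: tp[i] / (gt[i] + pred[i] - tp[i])
           for i in range(n) if gt[i] + pred[i] > 0}
    # Per-class accuracy is only defined for classes present in the ground truth.
    acc = {i: tp[i] / gt[i] for i in range(n) if gt[i] > 0}

    return {
        "pixelAcc": sum(tp) / total,
        "mPixelAcc": sum(acc.values()) / len(acc),
        "mIoU": sum(iou.values()) / len(iou),
        "fwIoU": sum(gt[i] / total * iou[i] for i in acc),
    }


# Toy two-class confusion matrix (illustrative numbers, not from the paper).
metrics = segmentation_metrics([[8, 2], [1, 9]])
print({name: round(value, 4) for name, value in metrics.items()})
```

Because fwIoU weights each class by its pixel frequency, it can rank two models differently from mIoU when the background dominates, which is consistent with the LFW columns above where HLNet leads on mIoU but not on fwIoU.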
Model | #Param (M) | FPS | FLOPs (G) | mIoU (%) |
---|---|---|---|---|
ENet [19] | 0.36 | 8.24 | 0.94 | 89.97 |
LEDNet [24] | 2.3 | 6.44 | 3.28 | 88.63 |
Fast-SCNN [22] | 1.6 | 20.35 | 0.41 | 87.14 |
MobileNet(0.5) + UNet [21] | 0.37 | 5.80 | 0.75 | 86.08 |
DFANet [23] | 0.42 | 17.72 | 0.08 | 82.88 |
HLNet (ours) | 1.2 | 12.23 | 0.94 | 90.32 |
HLNet (ours) † | 1.2 | 12.23 | 0.94 | 90.98 |
Method | RGB | HSV | YCrCb |
---|---|---|---|
Histogram (8 bins) | 75% | 78% | 73% |
Histogram with PCA (256 bins) | 77% | - | - |
Color Moment | 73% | 77% | 80% |
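
The table above compares histogram and color moment features across color spaces. As a hedged sketch of what such features look like, the code below computes the first three color moments per channel (mean, standard deviation, and the signed cube root of the third central moment, following Stricker and Orengo) and a normalized 8-bin histogram per channel. The function names are hypothetical, the input is assumed to be a list of three-channel pixel tuples already taken from the segmented face region, and whether the paper's 8 bins are per channel is also an assumption. Color space conversion and the downstream classifier are left out.

```python
import math


def color_moments(pixels):
    """Return 9 features: mean, std and skewness for each of 3 channels."""
    n = len(pixels)
    features = []
    for ch in range(3):
        values = [p[ch] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(var)
        # Signed cube root of the third central moment.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, std, skew])
    return features


def channel_histogram(pixels, bins=8, max_value=256):
    """Return a normalized per-channel histogram with 3 * bins entries."""
    n = len(pixels)
    width = max_value / bins
    hist = [0.0] * (3 * bins)
    for p in pixels:
        for ch in range(3):
            idx = min(int(p[ch] / width), bins - 1)
            hist[ch * bins + idx] += 1.0 / n
    return hist


# Three illustrative skin-like pixels (made-up values).
face_pixels = [(200, 150, 120), (210, 160, 130), (190, 140, 110)]
print(len(color_moments(face_pixels)), len(channel_histogram(face_pixels)))
```

Color moments give a fixed 9-dimensional descriptor regardless of bin count, while the histogram grows with the number of bins, which is one reason the 256-bin variant in the table is paired with PCA.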
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Feng, X.; Gao, X.; Luo, L. HLNet: A Unified Framework for Real-Time Segmentation and Facial Skin Tones Evaluation. Symmetry 2020, 12, 1812. https://doi.org/10.3390/sym12111812