A Standardized Approach for Skin Detection: Analysis of the Literature and Case Studies
Abstract
1. Introduction
- An exhaustive review of the literature on skin-color-detection approaches, with a detailed description of the freely available methods.
- Collection and study of virtually every real skin-detection dataset available in the literature.
- A testing protocol for comparing different approaches for skin detection.
- Four different deep-learning architectures have been trained for skin detection. The proposed ensemble obtains state-of-the-art performance (the code is publicly available at https://github.com/LorisNanni (accessed on 26 November 2022)).
2. Methods for Skin Detection
- Age, ethnicity, and other human characteristics: human skin ranges in color from white to dark brown across racial groupings, and the transition from young to old skin produces significant variation in tone.
- Shooting conditions: the characteristics of the acquisition device and lighting variations have a large effect on the appearance of skin. In general, changes in lighting level or light-source distribution produce shadows and changes in skin color.
- Skin paint: Tattoos and makeup affect the appearance of the skin.
- Complex background: The presence of skin-colored objects in the background can fool the skin detector.
- The formulation of the problem, which is based on either segmenting the image into human skin regions or classifying each pixel as skin or non-skin regardless of its neighbors. There are only a few region-based skin-color-detection methods [31,32,33,34], including some recent ones (e.g., [35,36]) based on convolutional neural networks.
- The type of approach [37]: Rule-based methods define explicit rules for determining skin color in an appropriate color space; machine-learning approaches use nonparametric or parametric learning to estimate the skin-color distribution from the training data.
- According to other taxonomies from the field of machine learning [38] that consider the classification step, statistical methods include parametric methods based on Bayes' rule or mixture models [39] applied at the pixel level. Diffusion-based methods [40,41] extend the analysis to adjacent pixels to improve classification performance. Neural-network models [42,43] take into account both color and texture information. Adaptive techniques [44] adapt the model to specific conditions (e.g., lighting, skin color, and background); this adaptation often provides performance benefits but increases computation time. Support Vector Machine (SVM)-based systems are parametric models based on SVM classifiers; when the SVM classifier is trained by active learning, this class overlaps with the adaptive methods [14]. Blending methods combine different machine-learning approaches [45]. Finally, hyperspectral models [46] rely on acquisition instruments with hyperspectral capabilities. Despite the benefits of the availability of spectral information, these approaches are not included in this survey, as they only apply to ad hoc datasets.
- Deep-learning methods have shown outstanding potential in dermatology for skin-lesion detection and identification [6]; however, they usually require annotations beforehand and can only classify lesion classes seen in the training set. Moreover, large-scale, open-sourced medical datasets normally have far fewer annotated classes than in real life, further aggravating the problem.
- GMM [39] is a simple skin-detection approach based on the Gaussian mixture model that is trained to classify non-skin and skin pixels in the RGB color space.
- Bayes [39] is a fast method based on a Bayesian classifier that is trained to classify skin and non-skin pixels in the RGB color space. The training set is composed of the first 2000 images from the ECU dataset.
- SPL [55] is a pixel-based skin-detection approach that uses a look-up table (LUT) to determine skin probabilities in the RGB domain. For a test image, the LUT assigns a skin probability to each pixel, x, and a threshold, τ, is applied to decide whether the pixel is skin or non-skin (a minimal sketch of this LUT-and-threshold scheme is given at the end of this list).
- Cheddad [56] is a fast pixel-based method that converts the RGB color space into a 1D space by separating the grayscale map from its non-red encoded counterpart. The classification process uses the skin probability to define the lower and upper bounds of the skin cluster, and a classification threshold, τ, determines the outcome.
- Chen [43] is a statistical skin-color method designed to be implemented in hardware. The skin region is delineated in a transformed space, the 3D skin cube, whose axes are the differences of two color channels: sR = R − G, sG = G − B, and sB = R − B.
- SA1 [57], SA2 [44], and SA3 [58] are three skin-detection methods based on spatial analysis. Starting from the skin-probability map obtained with a pixel-color detector, the first step of the spatial analysis is to select high-probability pixels as skin seeds. The second step propagates skin labels from each seed to the individual pixels along shortest paths; pixels that are not reached by this propagation are marked as non-skin. SA2 [44] is an evolution of this approach that uses both color and textural features to determine the presence of skin: it extracts the textural features from the skin-probability maps rather than from the luminance channel. SA3 [58] is a further evolution of the previous spatial-analysis approaches that combines probabilistic mapping and local skin-color patterns to describe skin regions.
- DYC [59] is a skin-detection approach which takes into account the lighting conditions. The approach is based on the dynamic definition of the skin cluster range in the YCb and YCr subspaces of YCbCr color space and on the definition of correlation rules between the skin color clusters.
- In [1,60], several deep-learning segmentation approaches are compared: SegNet, U-Net, DeepLabv3+, HarDNet-MSEG (Harmonic Densely Connected Network) (https://github.com/james128333/HarDNet-MSEG, last accessed on 5 November 2022) [61], and Polyp-PVT [62], a deep-learning segmentation model based on a transformer encoder, i.e., PVT (Pyramid Vision Transformer) (https://github.com/DengPingFan/Polyp-PVT, last accessed on 5 November 2022).
- ALDS [63] is a framework based on a probabilistic approach that first uses active contours and a merged watershed mask to segment the mole; SVM and neural classifiers are then applied to classify the segmented mole.
- DNF-OOD [6] applies a non-parametric, deep-forest-based approach to the problem of out-of-distribution (OOD) detection of skin-lesion images.
- SANet [64] contains two sub-modules: superpixel average pooling and superpixel attention module. The authors introduce a superpixel average pooling to reformulate the superpixel classification problem as a superpixel segmentation problem, and a superpixel attention module is utilized to focus on discriminative superpixel regions and feature channels.
- OR-Skip-Net [65] is a deep convolutional neural network built around outer residual skip connections; it was designed to deal with skin segmentation in challenging environments, irrespective of skin color, and to eliminate the cost of preprocessing.
- In [29], a new approach for skin detection is proposed that performs color-based data augmentation to enrich the dataset with artificial images mimicking alternative representations of the original. Data augmentation is performed in the HSV (hue, saturation, and value) space; for each image in a dataset, this approach creates fifteen new images.
- In [30], a different color space is proposed whose goal is to better represent the information in images by introducing a linear and nonlinear conversion of the RGB color space through a conversion matrix (W matrix). The W-matrix values are optimized to meet two conditions: first, maximizing the distance between the centers of the skin and non-skin classes; and, second, minimizing the entropy of each class. The classification step is performed with neural networks and an adaptive network-based fuzzy inference system (ANFIS).
- SSS-Net [66] captures multi-scale contextual information and refines the segmentation results, especially along object boundaries, while also reducing the cost of preprocessing.
- SCMUU [67] stands for skin-color-model updating units; it performs skin detection by exploiting the similarity of adjacent frames in a video. The method is based on the assumption that the face and other parts of the body have a similar skin color: the skin-color distribution is modeled in the chrominance components of the YCbCr color space by referring to facial landmarks.
- SKINNY [68] is a U-Net-based model with additional depth levels; it uses wider convolutional kernels in the expansive path and employs inception modules alongside dense blocks to strengthen feature propagation, thereby increasing the multi-scale analysis range.
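Several of the pixel-based detectors described above (e.g., Bayes [39] and SPL [55]) reduce to building a skin-probability look-up table (LUT) over quantized RGB colors and thresholding it. The following Python/NumPy sketch illustrates that idea under stated assumptions: the 32-bin quantization, the function names, and the default threshold are illustrative choices, not the settings used in the cited works.

```python
import numpy as np

BINS = 32  # illustrative quantization; the cited methods use their own settings

def build_skin_lut(skin_pixels, nonskin_pixels):
    """Estimate P(skin | RGB) as a BINS^3 look-up table from labeled training pixels.

    skin_pixels, nonskin_pixels: arrays of shape (N, 3) holding uint8 RGB values.
    """
    def hist(pixels):
        idx = (pixels // (256 // BINS)).astype(int)
        h = np.zeros((BINS, BINS, BINS))
        np.add.at(h, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
        return h

    h_skin, h_nonskin = hist(skin_pixels), hist(nonskin_pixels)
    # Fraction of training pixels of each quantized color that are skin
    return h_skin / np.maximum(h_skin + h_nonskin, 1)

def detect_skin(image, lut, tau=0.5):
    """Threshold the per-pixel skin probability of an HxWx3 uint8 image."""
    idx = (image // (256 // BINS)).astype(int)
    prob = lut[idx[..., 0], idx[..., 1], idx[..., 2]]
    return prob > tau  # boolean skin mask
```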
Hand Segmentation
- Refined U-net [19]: The authors propose a refinement of U-Net that uses few parameters and increases inference speed while achieving high accuracy in hand segmentation.
- CA-FPN [69] stands for Context Attention Feature Pyramid Network and is a model designed for human hand detection. In this method, a novel Context Attention Module (CAM) is inserted into the feature pyramid networks. The CAM is designed to capture relative contextual information for hands and build long-range dependencies around hands.
3. Materials and Methods
3.1. Deep Learning for Semantic Image Segmentation
- Initial learning rate = 0.01;
- Number of epochs = 10 when using the simple data-augmentation approach, DA1 (see Section 3.3), or 15 when using the more complex data-augmentation approach, DA2 (see Section 3.3), owing to the slower convergence on the larger augmented training set;
- Momentum = 0.9;
- L2 Regularization = 0.005;
- Learning Rate Drop Period = 5;
- Learning Rate Drop Factor = 0.2;
- Shuffle training images every epoch;
- Optimizer = SGD (stochastic gradient descent).
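Assuming a PyTorch reimplementation (the sketch below is not the authors' training script), the hyperparameters listed above map directly onto a standard SGD optimizer with a step learning-rate schedule:

```python
import torch

def training_setup(model, use_da2=False):
    """Optimizer, LR schedule and epoch count matching the hyperparameters above."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.01,             # initial learning rate
        momentum=0.9,
        weight_decay=0.005,  # L2 regularization
    )
    # Drop the learning rate by a factor of 0.2 every 5 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.2)
    num_epochs = 15 if use_da2 else 10  # DA2 needs more epochs (slower convergence)
    # Training images are reshuffled every epoch (e.g., DataLoader(..., shuffle=True)).
    return optimizer, scheduler, num_epochs
```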
3.2. Loss Functions
- Dice Loss is a commonly used measure for semantic-segmentation models. It is derived from the Sørensen–Dice coefficient, which measures how similar two images are; its value range is [0, 1]. (A minimal sketch of this and the Tversky-based losses is given after this list.)
- Tversky Loss [81] addresses a common problem in machine learning and image segmentation: unbalanced classes in a dataset, meaning that one class dominates the others.
- Focal Tversky Loss: The cross-entropy (CE) function is designed to measure the discrepancy between two probability distributions. Several variants of CE have been proposed in the literature, including focal loss [82] and binary cross-entropy. The former uses a modulating factor γ > 0 to make the model focus on hard examples rather than on correctly classified ones; the latter is the adaptation of CE to a binary classification problem (i.e., a problem with only two classes).
- Focal Generalized Dice Loss allows the model to focus on a limited ROI by reducing the weight of easy samples; this is achieved by regulating the modulating factor.
- Log-Cosh-Type Loss is a combination of Dice Loss and log-cosh; the log-cosh function is commonly applied to smooth the loss curve in regression applications.
- Cross-entropy: The cross-entropy (CE) loss measures the difference between two probability distributions, and the aim is to minimize this difference. Deviations between small and large areas can be problematic when working with unbalanced datasets, so a weighted cross-entropy loss was introduced to obtain better-balanced classification in unbalanced scenarios [85]. The weighted binary cross-entropy formula is given in (14).
- Intersection-over-Union (IoU) loss is another well-known loss function, which was introduced for the first time in [86].
- Structure Loss is based on the combination of weighted Intersection-over-Union and weighted binary cross-entropy. In Table 2, Formula (19) refers to structure loss, while Formula (20) is a simple variation that gives more weight to the binary cross-entropy term.
- Boundary Enhancement Loss, proposed in [87], explicitly focuses on the boundary areas during training. It performs very well and requires neither pre- or postprocessing of the image nor a particular network architecture. In [60], the authors propose combining it with Dice Loss and weighted cross-entropy loss.
- Contour-aware loss was first proposed in [88]. It consists of a weighted binary cross-entropy loss whose weights are designed to give more importance to the borders of the image: a morphological gradient edge detector is used, i.e., the difference between the dilated and the eroded label maps is computed, and a Gaussian blur is then applied for smoothing.
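As a concrete reference for the Dice- and Tversky-based losses above, the following NumPy sketch computes soft Dice, Tversky, and Focal Tversky losses on probability maps. The α, β, and γ values shown are illustrative placeholders, not the values used in the experiments.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between probability maps in [0, 1] of the same shape."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def tversky_index(pred, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Tversky index; here alpha weights false negatives and beta false positives.
    With alpha = beta = 0.5 the index reduces to the Dice coefficient."""
    tp = np.sum(pred * target)
    fn = np.sum((1.0 - pred) * target)
    fp = np.sum(pred * (1.0 - target))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def tversky_loss(pred, target, alpha=0.7, beta=0.3):
    # alpha > beta puts more attention on false negatives (values are illustrative).
    return 1.0 - tversky_index(pred, target, alpha, beta)

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=1.5):
    # The exponent gamma focuses training on harder (low Tversky index) examples.
    return (1.0 - tversky_index(pred, target, alpha, beta)) ** gamma
```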
3.3. Data Augmentation
- DA1: base data augmentation consisting of horizontal and vertical flips and 90° rotation (a minimal sketch is given after this list).
- DA2: this technique applies a set of operations to the original images in order to derive new ones; these operations include shadowing, color mapping, vertical or horizontal flipping, and others.
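The DA1 operations can be sketched in a few lines of Python/NumPy, as shown below; the DA2 operations (shadowing, color mapping, etc.) are not reproduced here because their exact parameters are those of the cited implementation.

```python
import numpy as np

def augment_da1(image, mask):
    """Yield the original image/mask pair plus its horizontal flip, vertical flip
    and 90-degree rotation, applying the same transform to image and mask."""
    yield image, mask
    yield np.fliplr(image), np.fliplr(mask)   # horizontal flip
    yield np.flipud(image), np.flipud(mask)   # vertical flip
    yield np.rot90(image), np.rot90(mask)     # 90-degree rotation
```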
4. Performance Evaluation
4.1. Performance Indicators
4.2. Skin Detection Evaluation: Datasets
- Compaq [39] is one of the first and most widely used large-scale skin datasets, consisting of images collected from web browsing. The original dataset was composed of 9731 images containing skin pixels and 8965 images with no skin pixels; however, only 4675 of the skin images come with a ground truth.
- TDSD [93] contains 555 images with highly imprecise annotations produced with automatic labeling.
- UChile [94] contains 103 images with different lighting conditions and complex backgrounds; the ground truth was manually annotated with moderate accuracy.
- The ECU skin dataset [95] is a collection of 4000 color images with relatively high-quality ground-truth annotations. It is particularly challenging because the images contain a wide variety of lighting conditions, background scenes, and skin types.
- Schmugge [96] is a collection of 845 images with accurate annotations over three classes (skin/non-skin/unrelated). The dataset includes images coming from different face datasets (i.e., the University of Chile database, the UOPB dataset, and the AR face dataset).
- Feeval [15] is a low-quality dataset composed of 8991 frames extracted from 25 online videos. The image quality is very low, as is the precision of the annotations.
- The MCG skin database [97] contains 1000 images selected from the Internet, featuring blurred backgrounds, varying ambient light, and a variety of subjects. Ground truths were obtained by hand marking, but they are not accurate: sometimes eyes, eyebrows, and even wrists are marked as skin.
- The VMD [98] contains 285 images and is typically used for human-activity recognition. The images cover a wide range of lighting levels and conditions.
- The SFA dataset [99] contains 1118 manually labeled images (with moderate accuracy).
- Pratheepan [100] contains 78 images randomly downloaded from Google.
- The HGR [58] contains 1558 images representing Polish and American Sign Language gestures with controlled and uncontrolled backgrounds.
- The SDD [101] contains 21,000 images, some images taken from a video and some others taken from a popular face dataset with different lighting conditions and with different skin colors of people around the world.
- VT-AAST [102] is a color-image database for benchmarking face detection and includes 66 images with precise ground truth.
- The Abdominal Skin Dataset [18] consists of 1400 abdominal images collected by using Google image search and then manually segmented. The dataset preserves the diversity of different ethnic groups and helps avoid racial bias in segmentation algorithms: 700 images represent dark-skinned people, and 700 images represent light-skinned people. Additionally, 400 images represent individuals with a high body mass index (BMI), evenly distributed between light and dark skins. The dataset also takes into account other inter-individual variation, such as hair and tattoo coverage, and external variation, such as shadows.
Name (Abbr.) | Ref. | Images | Ground Truth | Download | Year |
---|---|---|---|---|---|
Compaq (CMQ) | [39] | 4675 | Semi-supervised | currently not available | 2002 |
TDSD | [93] | 555 | Imprecise | http://lbmedia.ece.ucsb.edu/research/skin/skin.htm (accessed on 26 November 2022) | 2004 |
UChile (UC) | [94] | 103 | Medium Precision | http://agami.die.uchile.cl/skindiff/ (accessed on 26 November 2022) | 2004 |
ECU | [95] | 4000 | Precise | http://www.uow.edu.au/~phung/download.html (currently not available) (accessed on 26 November 2022) | 2005 |
VT-AAST (VT) | [102] | 66 | Precise | ask to the authors | 2007 |
Schmugge (SCH) | [96] | 845 | Precise (3 classes) | https://www.researchgate.net/publication/257620282_skin_image_Data_set_with_ground_truth (accessed on 26 November 2022) | 2007 |
Feeval | [15] | 8991 | Low quality, imprecise | http://www.feeval.org/Data-sets/Skin_Colors.html (accessed on 26 November 2022) | 2009 |
MCG | [97] | 1000 | Imprecise | http://mcg.ict.ac.cn/result_data_02mcg_skin.html (ask the authors) (accessed on 26 November 2022) | 2011 |
Pratheepan (PRAT) | [100] | 78 | Precise | http://web.fsktm.um.edu.my/~cschan/downloads_skin_dataset.html (accessed on 26 November 2022) | 2012 |
VMD | [98] | 285 | Precise | http://www-vpu.eps.uam.es/publications/SkinDetDM/ (accessed on 26 November 2022) | 2013 |
SFA | [99] | 1118 | Medium Precision | http://www1.sel.eesc.usp.br/sfa/ (accessed on 26 November 2022) | 2013 |
HGR | [44,58] | 1558 | Precise | http://sun.aei.polsl.pl/~mkawulok/gestures/ (accessed on 26 November 2022) | 2014 |
SDD | [101] | 21,000 | Precise | Not available | 2015 |
Abdominal Skin Dataset | [18] | 1400 | Precise | https://github.com/MRE-Lab-UMD/abd-skin-segmentation (accessed on 26 November 2022) | 2019 |
4.3. Hand-Detection Evaluation: Datasets
- EgoYouTubeHands (EYTH) [70] dataset: It comprises images extracted from YouTube videos. Specifically, the authors downloaded three videos with an egocentric point of view and annotated one frame in every five. The users in the videos interact with other people and perform several activities. The dataset has 1290 frames with pixel-level hand annotations, where the environment, number of participants, hand sizes, and other factors vary across images.
- GeorgiaTech Egocentric Activity dataset (GTEA) [103]: The dataset contains images from videos of four different subjects performing seven daily activities; it was originally built for activity recognition in a single environment. The original dataset has 663 images with pixel-level hand annotations that extend from the hand to the arm; arm regions were removed for fair training, as in previous works (e.g., [70]).
5. Experimental Results
- PVT(2), sum rule between PVT combined with DA1 and PVT combined with DA2;
- HSN(2) is similar to PVT(2), i.e., sum rule between one HSN combined with DA1 and one HSN combined with DA2;
- FH(2), sum rule among two H_S (one combined with DA1, the other with DA2) and two H_A (one combined with DA1, the other with DA2);
- FH(4) computes FH(2) twice, and the output is aggregated by using the sum rule.
- FH(2) + 2 × PVT(2), weighted sum rule between PVT(2) and FH(2); the weight of PVT(2) is assigned so that its importance in the ensemble is the same as that of FH(2) (notice that FH(2) consists of four networks, while PVT(2) is built from only two networks).
- FH(4) + 4 × PVT(2), weighted sum rule between PVT(2) and FH(4); the weight of PVT(2) is assigned so that its importance in the ensemble is the same as that of FH(4).
- AllM = ELossMix2(10) + (10/4) × FH(2) + (10/2) × PVT(2), weighted sum rule among ELossMix2(10), FH(2), and PVT(2); as in the previous ensembles, the weights are assigned so that each component ensemble has the same importance (a sketch of this weighted sum-rule fusion is given after this list). ELossMix2(10) is an ensemble, combined by sum rule, of ten stand-alone DeepLabV3+ segmentators with a ResNet101 backbone (pretrained as detailed before, using VOC); the ten networks are obtained by coupling five losses (LDiceBES, Comb1, Comb2, Comb3, and a fifth loss defined in Table 2) one time with DA1 and another time with DA2.
- AllM_H = ELossMix2(10) + (10/4) × FH(2) + (10/2) × PVT(2) + (10/2) × HSN(2), similar to the previous one but with the addition of HSN(2).
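All of the ensembles above are built with the (weighted) sum rule over per-pixel probability maps. The sketch below illustrates this fusion; the function names and the AllM-style weighting example are assumptions that merely follow the weighting rule stated above (each component ensemble receives the same total importance), not the released code.

```python
import numpy as np

def sum_rule(prob_maps, weights=None):
    """Weighted sum rule over a list of HxW probability maps; returns the fused map."""
    weights = np.ones(len(prob_maps)) if weights is None else np.asarray(weights, float)
    fused = sum(w * p for w, p in zip(weights, prob_maps))
    return fused / weights.sum()

def fuse_allm(elossmix2_maps, fh2_maps, pvt2_maps):
    """Hypothetical AllM-style fusion: ELossMix2(10) + (10/4) x FH(2) + (10/2) x PVT(2).

    Each of the 10 ELossMix2 networks gets weight 1, each of the 4 FH(2) networks
    weight 10/4, and each of the 2 PVT(2) networks weight 10/2, so every component
    ensemble contributes the same total importance."""
    maps = list(elossmix2_maps) + list(fh2_maps) + list(pvt2_maps)
    weights = ([1.0] * len(elossmix2_maps)
               + [10 / 4] * len(fh2_maps)
               + [10 / 2] * len(pvt2_maps))
    return sum_rule(maps, weights)
```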
5.1. Skin Segmentation
5.2. Hand Segmentation
- RN18: a stand-alone DeepLabV3+ segmentator with a ResNet18 backbone (pretrained on ImageNet);
- ERN18(N): an ensemble of N RN18 networks (pretrained on ImageNet);
- RN50: a stand-alone DeepLabV3+ segmentator with a ResNet50 backbone (pretrained on ImageNet);
- ERN50(N): an ensemble of N RN50 networks;
- RN101: a stand-alone DeepLabV3+ segmentator with a ResNet101 backbone (pretrained as detailed before, using VOC);
- ERN101(N): an ensemble of N RN101 networks.
- ELoss101(10) is an ensemble, combined by sum rule, of 10 RN101 networks, each coupled with data augmentation DA1 and a given loss function; two different RN101 networks are trained for each of five loss functions (Comb1, Comb2, Comb3, and two further losses defined in Table 2), and the final fusion is the sum of all ten.
- ELossMix(10) is an ensemble similar to the previous one, but here data augmentation is used to increase diversity: the networks coupled with the losses used in ELoss101(10) are trained one time with DA1 and another time with DA2 (i.e., five networks, each trained two times, giving an ensemble of 10 networks);
- ELossMix2(10) is similar to the previous ensemble, but it uses LDiceBES in place of one of the previous losses.
- Some approaches adopt ad hoc pretraining for hand segmentation, so the performance improves, but it becomes difficult to tell whether the improvement is related to model choice or better pretraining;
- Others use additional training images, making performance comparison unfair.
6. Conclusions and Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Lumini, A.; Nanni, L. Fair comparison of skin detection approaches on publicly available datasets. Expert Syst. Appl. 2020, 160, 113677. [Google Scholar] [CrossRef]
- Han, S.S.; Park, I.; Eun Chang, S.; Lim, W.; Kim, M.S.; Park, G.H.; Chae, J.B.; Huh, C.H.; Na, J.I. Augmented Intelligence Dermatology: Deep Neural Networks Empower Medical Professionals in Diagnosing Skin Cancer and Predicting Treatment Options for 134 Skin Disorders. J. Investig. Dermatol. 2020, 140, 1753–1761. [Google Scholar] [CrossRef] [PubMed]
- Lee, J.R.H.; Pavlova, M.; Famouri, M.; Wong, A. Cancer-Net SCa: Tailored deep neural network designs for detection of skin cancer from dermoscopy images. BMC Med. Imaging 2022, 22, 143. [Google Scholar] [CrossRef] [PubMed]
- Maniraju, M.; Adithya, R.; Srilekha, G. Recognition of Type of Skin Disease Using CNN. In Proceedings of the 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR), Hyderabad, India, 10–12 March 2022; pp. 1–4. [Google Scholar]
- Zhao, M.; Kawahara, J.; Abhishek, K.; Shamanian, S.; Hamarneh, G. Skin3D: Detection and longitudinal tracking of pigmented skin lesions in 3D total-body textured meshes. Med. Image Anal. 2022, 77, 102329. [Google Scholar] [CrossRef]
- Li, X.; Desrosiers, C.; Liu, X. Deep Neural Forest for Out-of-Distribution Detection of Skin Lesion Images. IEEE J. Biomed. Health Inform. 2022, 27, 157–165. [Google Scholar] [CrossRef] [PubMed]
- Pfeifer, L.M.; Valdenegro-Toro, M. Automatic Detection and Classification of Tick-borne Skin Lesions using Deep Learning. arXiv 2020. [Google Scholar] [CrossRef]
- Hsu, R.L.; Abdel-Mottaleb, M.; Jain, A.K. Face detection in color images. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 696–706. [Google Scholar]
- Argyros, A.A.; Lourakis, M.I.A. Real-time tracking of multiple skin-colored objects with a possibly moving camera. In Computer Vision—ECCV 2004; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
- Roy, K.; Mohanty, A.; Sahay, R.R. Deep Learning Based Hand Detection in Cluttered Environment Using Skin Segmentation. In Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 640–649. [Google Scholar]
- Sang, H.; Ma, Y.; Huang, J. Robust Palmprint Recognition Base on Touch-Less Color Palmprint Images Acquired. J. Signal Inf. Process. 2013, 4, 134–139. [Google Scholar] [CrossRef]
- De-La-Torre, M.; Granger, E.; Radtke, P.V.W.; Sabourin, R.; Gorodnichy, D.O. Partially-supervised learning from facial trajectories for face recognition in video surveillance. Inf. Fusion 2015, 24, 31–53. [Google Scholar] [CrossRef]
- Lee, J.-S.; Kuo, Y.-M.; Chung, P.-C.; Chen, E.-L. Naked image detection based on adaptive and extensible skin color model. Pattern Recognit. 2007, 40, 2261–2270. [Google Scholar] [CrossRef]
- Han, J.; Award, G.M.; Sutherland, A.; Wu, H. Automatic skin segmentation for gesture recognition combining region and support vector machine active learning. In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, Southampton, UK, 10–12 April 2006; pp. 237–242. [Google Scholar]
- Stöttinger, J.; Hanbury, A.; Liensberger, C.; Khan, R. Skin paths for contextual flagging adult videos. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Las Vegas, NV, USA, 30 November–2 December 2009; Volume 5876 LNCS, pp. 303–314. [Google Scholar]
- Kong, S.G.; Heo, J.; Abidi, B.R.; Paik, J.; Abidi, M.A. Recent advances in visual and infrared face recognition-A review. Comput. Vis. Image Underst. 2005, 97, 103–135. [Google Scholar] [CrossRef]
- Healey, G.; Prasad, M.; Tromberg, B. Face recognition in hyperspectral images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1552–1560. [Google Scholar]
- Topiwala, A.; Al-Zogbi, L.; Fleiter, T.; Krieger, A. Adaptation and Evaluation of Deep Learning Techniques for Skin Segmentation on Novel Abdominal Dataset. In Proceedings of the 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), Athens, Greece, 28–30 October 2019; pp. 752–759. [Google Scholar]
- Tsai, T.H.; Huang, S.A. Refined U-net: A new semantic technique on hand segmentation. Neurocomputing 2022, 495, 1–10. [Google Scholar] [CrossRef]
- Goceri, E. Automated Skin Cancer Detection: Where We Are and The Way to The Future. In Proceedings of the 2021 44th International Conference on Telecommunications and Signal Processing (TSP), Online, 26–28 July 2021; pp. 48–51. [Google Scholar]
- Rawat, V.; Singh, D.P.; Singh, N.; Kumar, P.; Goyal, T. A Comparative Study of various Skin Cancer using Deep Learning Techniques. In Proceedings of the 2022 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 20–21 May 2022; pp. 505–511. [Google Scholar]
- Afroz, A.; Zia, R.; Garcia, A.O.; Khan, M.U.; Jilani, U.; Ahmed, K.M. Skin lesion classification using machine learning approach: A survey. In Proceedings of the 2022 Global Conference on Wireless and Optical Technologies (GCWOT), Malaga, Spain, 14–17 February 2022; pp. 1–8. [Google Scholar]
- Jones, O.T.; Matin, R.N.; van der Schaar, M.; Prathivadi Bhayankaram, K.; Ranmuthu, C.K.I.; Islam, M.S.; Behiyat, D.; Boscott, R.; Calanzani, N.; Emery, J.; et al. Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: A systematic review. Lancet Digit. Health 2022, 4, e466–e476. [Google Scholar] [CrossRef] [PubMed]
- Wen, D.; Khan, S.M.; Xu, A.J.; Ibrahim, H.; Smith, L.; Caballero, J.; Zepeda, L.; de Blas Perez, C.; Denniston, A.K.; Liu, X.; et al. Characteristics of publicly available skin cancer image datasets: A systematic review. Lancet Digit. Health 2022, 4, e64–e74. [Google Scholar] [CrossRef]
- Kakumanu, P.; Makrogiannis, S.; Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognit. 2007, 40, 1106–1122. [Google Scholar] [CrossRef]
- Zarit, B.D.; Super, B.J.; Quek, F.K.H. Comparison of five color models in skin pixel classification. In Proceedings of the International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (in conjunction with ICCV'99), Corfu, Greece, 26–27 September 1999; pp. 58–63. [Google Scholar]
- Ibrahim, N.B.; Selim, M.M.; Zayed, H.H. A Dynamic Skin Detector Based on Face Skin Tone Color. In Proceedings of the 8th International Conference on Informatics and Systems (INFOS), Giza, Egypt, 14–16 May 2012; pp. 1–5. [Google Scholar]
- Naji, S.; Jalab, H.A.; Kareem, S.A. A survey on skin detection in colored images. Artif. Intell. Rev. 2018, 52, 1041–1087. [Google Scholar] [CrossRef]
- Xu, H.; Sarkar, A.; Abbott, A.L. Color Invariant Skin Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 21–24 June 2022; pp. 2906–2915. [Google Scholar]
- Nazari, K.; Mazaheri, S.; Bigham, B.S. Creating A New Color Space utilizing PSO and FCM to Perform Skin Detection by using Neural Network and ANFIS. arXiv 2021. [Google Scholar] [CrossRef]
- Chen, W.C.; Wang, M.S. Region-based and content adaptive skin detection in color images. Int. J. Pattern Recognit. Artif. Intell. 2007, 21, 831–853. [Google Scholar] [CrossRef]
- Poudel, R.P.K.; Zhang, J.J.; Liu, D.; Nait-Charif, H. Skin Color Detection Using Region-Based Approach. Int. J. Image Process. 2013, 7, 385. [Google Scholar]
- Kruppa, H.; Bauer, M.A.; Schiele, B. Skin Patch Detection in Real-World Images. In Proceedings of the Annual Symposium for Pattern Recognition of the DAGM, Zurich, Switzerland, 16–18 September 2002; p. 109f. [Google Scholar]
- Sebe, N.; Cohen, I.; Huang, T.S.; Gevers, T. Skin detection: A Bayesian network approach. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26–26 August 2004; Volume 2, pp. 2–5. [Google Scholar]
- Kim, Y.; Hwang, I.; Cho, N.I. Convolutional neural networks and training strategies for skin detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3919–3923. [Google Scholar]
- Zuo, H.; Fan, H.; Blasch, E.; Ling, H. Combining Convolutional and Recurrent Neural Networks for Human Skin Detection. IEEE Signal Process. Lett. 2017, 24, 289–293. [Google Scholar] [CrossRef]
- Kumar, A.; Malhotra, S. Pixel-Based Skin Color Classifier: A Review. Int. J. Signal Process. Image Process. Pattern Recognit. 2015, 8, 283–290. [Google Scholar] [CrossRef]
- Mahmoodi, M.R.; Sayedi, S.M. A Comprehensive Survey on Human Skin Detection. Int. J. Image Graph. Signal Process. 2016, 8, 1–35. [Google Scholar] [CrossRef]
- Jones, M.J.; Rehg, J.M. Statistical color models with application to skin detection. Int. J. Comput. Vis. 2002, 46, 81–96. [Google Scholar] [CrossRef]
- Mahmoodi, M.R.; Sayedi, S.M. Leveraging spatial analysis on homogonous regions of color images for skin classification. In Proceedings of the 4th International Conference on Computer and Knowledge Engineering (ICCKE), Ferdowsi, Iran, 29–30 October 2014; pp. 209–214. [Google Scholar]
- Nidhu, R.; Thomas, M.G. Real Time Segmentation Algorithm for Complex Outdoor Conditions. Int. J. Sci. Technoledge 2014, 2, 71. [Google Scholar]
- Chen, L.; Zhou, J.; Liu, Z.; Chen, W.; Xiong, G. A skin detector based on neural network. In Proceedings of the Communications, Circuits and Systems and West Sino Expositions, Chengdu, China, 29 June–1 July 2002; Volume 1, pp. 615–619. [Google Scholar]
- Chen, Y.H.; Hu, K.T.; Ruan, S.J. Statistical skin color detection method without color transformation for real-time surveillance systems. Eng. Appl. Artif. Intell. 2012, 25, 1331–1337. [Google Scholar] [CrossRef]
- Kawulok, M.; Kawulok, J.; Nalepa, J. Spatial-based skin detection using discriminative skin-presence features. Pattern Recognit. Lett. 2014, 41, 3–13. [Google Scholar] [CrossRef]
- Jiang, Z.; Yao, M.; Jiang, W. Skin Detection Using Color, Texture and Space Information. In Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Hainan, China, 24–27 August 2007; pp. 366–370. [Google Scholar]
- Nunez, A.S.; Mendenhall, M.J. Detection of Human Skin in Near Infrared Hyperspectral Imagery. Int. Geosci. Remote Sens. Symp. 2008, 2, 621–624. [Google Scholar]
- Sandnes, F.E.; Neyse, L.; Huang, Y.-P. Simple and practical skin detection with static RGB-color lookup tables: A visualization-based study. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 2370–2375. [Google Scholar]
- Song, W.; Wu, D.; Xi, Y.; Park, Y.W.; Cho, K. Motion-based skin region of interest detection with a real-time connected component labeling algorithm. Multimed. Tools Appl. 2016, 76, 11199–11214. [Google Scholar] [CrossRef]
- Jairath, S.; Bharadwaj, S.; Vatsa, M.; Singh, R. Adaptive Skin Color Model to Improve Video Face Detection. In Machine Intelligence and Signal Processing; Springer: Berlin/Heidelberg, Germany, 2016; pp. 131–142. [Google Scholar]
- Gupta, A.; Chaudhary, A. Robust skin segmentation using color space switching. Pattern Recognit. Image Anal. 2016, 26, 61–68. [Google Scholar] [CrossRef]
- Oghaz, M.M.; Maarof, M.A.; Zainal, A.; Rohani, M.F.; Yaghoubyan, S.H. A hybrid Color space for skin detection using genetic algorithm heuristic search and principal component analysis technique. PLoS ONE 2015, 10, e0134828. [Google Scholar]
- Xu, T.; Zhang, Z.; Wang, Y. Patch-wise skin segmentation of human body parts via deep neural networks. J. Electron. Imaging 2015, 24, 043009. [Google Scholar] [CrossRef]
- Ma, C.; Shih, H. Human Skin Segmentation Using Fully Convolutional Neural Networks. In Proceedings of the IEEE 7th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 9–12 October 2018; pp. 168–170. [Google Scholar]
- Dourado, A.; Guth, F.; de Campos, T.E.; Li, W. Domain adaptation for holistic skin detection. arXiv 2019. [Google Scholar] [CrossRef]
- Conaire, C.Ó.; O’Connor, N.E.; Smeaton, A.F. Detector adaptation by maximising agreement between independent data sources. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar]
- Cheddad, A.; Condell, J.; Curran, K.; Mc Kevitt, P. A skin tone detection algorithm for an adaptive approach to steganography. Signal Process. 2009, 89, 2465–2478. [Google Scholar] [CrossRef]
- Kawulok, M. Fast propagation-based skin regions segmentation in color images. In Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, Shanghai, China, 22–26 April 2013. [Google Scholar]
- Kawulok, M.; Kawulok, J.; Nalepa, J.; Smolka, B. Self-adaptive algorithm for segmenting skin regions. EURASIP J. Adv. Signal Process. 2014, 2014, 170. [Google Scholar] [CrossRef]
- Brancati, N.; De Pietro, G.; Frucci, M.; Gallo, L. Human skin detection through correlation rules between the YCb and YCr subspaces based on dynamic color clustering. Comput. Vis. Image Underst. 2017, 155, 33–42. [Google Scholar] [CrossRef]
- Nanni, L.; Lumini, A.; Loreggia, A.; Formaggio, A.; Cuza, D. An Empirical Study on Ensemble of Segmentation Approaches. Signals 2022, 3, 341–358. [Google Scholar] [CrossRef]
- Huang, C.-H.; Wu, H.-Y.; Lin, Y.-L. HarDNet-MSEG: A Simple Encoder-Decoder Polyp Segmentation Neural Network that Achieves over 0.9 Mean Dice and 86 FPS. arXiv 2021. [Google Scholar] [CrossRef]
- Dong, B.; Wang, W.; Li, J.; Fan, D.-P. Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. arXiv 2021. [Google Scholar] [CrossRef]
- Farooq, M.A.; Azhar, M.A.M.; Raza, R.H. Automatic Lesion Detection System (ALDS) for Skin Cancer Classification Using SVM and Neural Classifiers. In Proceedings of the 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), Taichung, Taiwan, 31 October–2 November 2016; pp. 301–308. [Google Scholar]
- He, X.; Lei, B.; Wang, T. SANet:Superpixel Attention Network for Skin Lesion Attributes Detection. arXiv 2019. [Google Scholar] [CrossRef]
- Arsalan, M.; Kim, D.S.; Owais, M.; Park, K.R. OR-Skip-Net: Outer residual skip network for skin segmentation in non-ideal situations. Expert Syst. Appl. 2020, 141, 112922. [Google Scholar] [CrossRef]
- Minhas, K.; Khan, T.M.; Arsalan, M.; Naqvi, S.S.; Ahmed, M.; Khan, H.A.; Haider, M.A.; Haseeb, A. Accurate Pixel-Wise Skin Segmentation Using Shallow Fully Convolutional Neural Network. IEEE Access 2020, 8, 156314–156327. [Google Scholar] [CrossRef]
- Zhang, K.; Wang, Y.; Li, W.; Li, C.; Lei, Z. Real-time adaptive skin detection using skin color model updating unit in videos. J. Real-Time Image Process. 2022, 19, 303–315. [Google Scholar] [CrossRef]
- Tarasiewicz, T.; Nalepa, J.; Kawulok, M. Skinny: A Lightweight U-Net for Skin Detection and Segmentation. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2386–2390. [Google Scholar]
- Xie, Z.; Wang, S.; Zhao, W.; Guo, Z. A robust context attention network for human hand detection. Expert Syst. Appl. 2022, 208, 118132. [Google Scholar] [CrossRef]
- Khan, A.U.; Borji, A. Analysis of Hand Segmentation in the Wild. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4710–4719. [Google Scholar]
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
- Zhang, W.; Fu, C.; Zheng, Y.; Zhang, F.; Zhao, Y.; Sham, C.W. HSNet: A hybrid semantic network for polyp segmentation. Comput. Biol. Med. 2022, 150, 106173. [Google Scholar] [CrossRef] [PubMed]
- Jadon, S. A survey of loss functions for semantic segmentation. In Proceedings of the 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2020, Virtual, 27–29 October 2020; pp. 1–7. [Google Scholar]
- Salehi, S.S.M.; Erdogmus, D.; Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Quebec City, QC, Canada, 10 September 2017; Volume 10541. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 42, 2980–2988. [Google Scholar]
- Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. Basnet: Boundary-aware salient object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7479–7489. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Aurelio, Y.S.; de Almeida, G.M.; de Castro, C.L.; Braga, A.P. Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function. Neural Process. Lett. 2019, 50, 1937–1949. [Google Scholar] [CrossRef]
- Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Las Vegas, NV, USA, 12–14 December 2016; Volume 10072 LNCS. [Google Scholar]
- Yang, D.; Roth, H.; Wang, X.; Xu, Z.; Myronenko, A.; Xu, D. Enhancing Foreground Boundaries for Medical Image Segmentation. arXiv 2020, arXiv:2005.14355. [Google Scholar]
- Chen, Z.; Zhou, H.; Lai, J.; Yang, L.; Xie, X. Contour-Aware Loss: Boundary-Aware Learning for Salient Object Segmentation. IEEE Trans. Image Process. 2021, 30, 431–443. [Google Scholar] [CrossRef]
- Nanni, L.; Cuza, D.; Lumini, A.; Loreggia, A.; Brahnam, S. Deep ensembles in bioimage segmentation. arXiv 2021, arXiv:2112.12955. [Google Scholar]
- Nanni, L.; Brahnam, S.; Paci, M.; Ghidoni, S. Comparison of Different Convolutional Neural Network Activation Functions and Methods for Building Ensembles for Small to Midsize Medical Data Sets. Sensors 2022, 22, 6129. [Google Scholar] [CrossRef]
- Nanni, L.; Cuza, D.; Lumini, A.; Loreggia, A.; Brahman, S. Polyp Segmentation with Deep Ensembles and Data Augmentation. In Artificial Intelligence and Machine Learning for Healthcare; Springer: Cham, Switzerland, 2023; pp. 133–153. [Google Scholar]
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure To Roc, Informedness, Markedness & Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
- Zhu, Q.; Wu, C.-T.; Cheng, K.; Wu, Y. An adaptive skin model and its application to objectionable image filtering. In Proceedings of the 12th Annual ACM International Conference on Multimedia, New York, NY, USA, 10–16 October 2004; p. 56. [Google Scholar]
- Ruiz-Del-Solar, J.; Verschae, R. Skin detection using neighborhood information. In Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Republic of Korea, 17–19 May 2004; pp. 463–468. [Google Scholar]
- Phung, S.L.; Bouzerdoum, A.; Chai, D. Skin segmentation using color pixel classification: Analysis and comparison. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 148–154. [Google Scholar] [CrossRef] [PubMed]
- Abdallah, A.S.; El-Nasr, M.A.; Abbott, A.L. A new color image database for benchmarking of automatic face detection and human skin segmentation techniques. Int. J. Comput. Inf. Eng. 2007, 20, 353–357. [Google Scholar]
- Schmugge, S.J.; Jayaram, S.; Shin, M.C.; Tsap, L.V. Objective evaluation of approaches of skin detection using ROC analysis. Comput. Vis. Image Underst. 2007, 108, 41–51. [Google Scholar] [CrossRef]
- Huang, L.; Xia, T.; Zhang, Y.; Lin, S. Human skin detection in images by MSER analysis. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 1257–1260. [Google Scholar]
- Sanmiguel, J.C.; Suja, S. Skin detection by dual maximization of detectors agreement for video monitoring. Pattern Recognit. Lett. 2013, 34, 2102–2109. [Google Scholar] [CrossRef]
- Casati, J.P.B.; Moraes, D.R.; Rodrigues, E.L.L. SFA: A human skin image database based on FERET and AR facial images. In Proceedings of the IX workshop de Visao Computational, Rio de Janeiro, Brazil, 3–5 June 2013. [Google Scholar]
- Tan, W.R.; Chan, C.S.; Yogarajah, P.; Condell, J. A Fusion Approach for Efficient Human Skin Detection. IEEE Trans. Ind. Inform. 2012, 8, 138–147. [Google Scholar] [CrossRef]
- Mahmoodi, M.R.; Sayedi, S.M.; Karimi, F.; Fahimi, Z.; Rezai, V.; Mannani, Z. SDD: A skin detection dataset for training and assessment of human skin classifiers. In Proceedings of the Knowledge-Based Engineering and Innovation (KBEI), Tehran, Iran, 5–6 November 2015; pp. 71–77. [Google Scholar]
- Li, Y.; Ye, Z.; Rehg, J.M. Delving Into Egocentric Actions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 287–295. [Google Scholar]
- Wang, W.; Yu, K.; Hugonot, J.; Fua, P.; Salzmann, M. Recurrent U-Net for Resource-Constrained Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019; pp. 2142–2151. [Google Scholar]
 | GMM | Bayes | SPL | Cheddad, Chen | SA1, SA2, SA3 | DYC | SegNet, U-Net, DeepLab, HardNet | PVT, HSN |
---|---|---|---|---|---|---|---|---|
Preprocessing steps | ||||||||
None | x | x | x | x | x | x | ||
Dynamic adaptation | x | x | ||||||
Color space | ||||||||
Basic color spaces | x | x | x | x | x | |||
Perceptual color spaces | x | |||||||
Orthogonal color spaces | x | |||||||
Other (e.g., color ratio) | x | |||||||
Problem formulation | ||||||||
Segmentation based | x | x | x | |||||
Pixel based | x | x | x | x | x | |||
Type of pixel classification | ||||||||
Rule based | x | x | ||||||
Machine learning: parametric | x | x | ||||||
Machine learning: non-parametric | x | |||||||
Type of classifier | ||||||||
Statistical | x | x | ||||||
Mixture techniques | x | |||||||
Adaptive methods | x | |||||||
CNN | x | |||||||
Transformer | x |
Name | Formula | Parameters Description |
---|---|---|
Dice Loss | | The weight, wk, helps the network focus on a limited area (it is inversely proportional to the frequency of class k).
Tversky Index | | α and β are two weighting factors used to balance false negatives and false positives; n is the negative class, and p is the positive class. In the special case α = β = 0.5, the Tversky index reduces to the equivalent Dice coefficient.
Tversky Loss | | The values of α and β were fixed so as to put more attention on false negatives.
Focal Tversky Loss | | A fixed value was chosen for the focusing exponent.
Focal Generalized Dice Loss | | A fixed value was chosen for the modulating factor.
Log-Cosh Generalized Dice Loss | |
Log-Cosh Focal Tversky Loss | |
SSIM Index | | Here, μx and μy are the local means, σx and σy the standard deviations, and σxy the cross-covariance of images x and y, while C1 and C2 are regularization constants.
SSIM Loss | | LMS(Y, T) is defined as LS, but instead of SSIM, the multiscale structural similarity (MS-SSIM) index is used.
Different Functions Combined Loss | |
Weighted Cross-Entropy Loss | | The weight given to the i-th pixel of the image for class k was calculated by using average pooling over the mask with a 31 × 31 kernel and a stride of 1 in order to also consider nonmaximal activations.
Intersection over Union | |
Weighted Intersect-over-Union Loss | | The weights are calculated as aforementioned.
Dice Boundary Enhancement Loss | | ‖·‖ denotes the norm used in the boundary term.
Contour-Aware Loss | | Dilation and erosion operations use a 5 × 5 kernel. K is a hyperparameter assigning a high value to contour pixels and was set to 5 empirically; 1 denotes the matrix with 1 in every position.
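As a concrete illustration of the contour-aware weighting in the last row above, the following sketch builds a per-pixel weight map from the difference between the dilated and eroded label maps, smoothed with a Gaussian blur. The 5 × 5 kernel and K = 5 follow the description in Section 3.2, while the SciPy-based implementation and the exact form of the final weight map are assumptions.

```python
import numpy as np
from scipy import ndimage

def contour_weight_map(mask, k=5.0, kernel_size=5, sigma=1.0):
    """Per-pixel weights emphasizing object contours in a binary label map.

    The contour band is the morphological gradient (dilation minus erosion),
    computed with a kernel_size x kernel_size structuring element and then
    smoothed with a Gaussian blur; weights grow up to roughly 1 + k on contours
    (the additive form is an assumption of this sketch)."""
    structure = np.ones((kernel_size, kernel_size), dtype=bool)
    dilated = ndimage.binary_dilation(mask, structure=structure).astype(float)
    eroded = ndimage.binary_erosion(mask, structure=structure).astype(float)
    edge = ndimage.gaussian_filter(dilated - eroded, sigma=sigma)
    return 1.0 + k * edge
```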
Name | Formula |
---|---|
Precision | TP / (TP + FP) |
Recall/True-Positive Rate (TPR) | TP / (TP + FN) |
F1 Measure/Dice | 2TP / (2TP + FP + FN) |
IoU | TP / (TP + FP + FN) |
False-Positive Rate (FPR) | FP / (FP + TN) |
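All of the indicators above can be computed from the per-pixel confusion counts between a predicted binary mask and its ground truth; the following NumPy sketch is a straightforward (assumed) implementation.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Precision, recall (TPR), F1/Dice, IoU and FPR from two same-shape binary masks.

    Assumes the relevant denominators are non-zero (i.e., non-degenerate masks)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    tn = np.sum(~pred & ~target)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                              # true-positive rate
    f1 = 2 * precision * recall / (precision + recall)   # equals the Dice coefficient
    iou = tp / (tp + fp + fn)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall, "F1": f1, "IoU": iou, "FPR": fpr}
```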
Name | Ref. | Images | Ground Truth | Download | Year |
---|---|---|---|---|---|
EYTH | [70] | 1290 | Precise | https://github.com/aurooj/Hand-Segmentation-in-the-Wild (accessed on 26 November 2022) | 2018 |
GTEA | [103] | 663 | Precise | https://cbs.ic.gatech.edu/fpv/ (accessed on 26 November 2022) | 2015 |
Method | DA | PRAT | MCG | UC | CMQ | SFA | HGR | SCH | VMD | ECU | VT | AVG |
---|---|---|---|---|---|---|---|---|---|---|---|---|
H_S | DA1 | 0.903 | 0.880 | 0.903 | 0.838 | 0.947 | 0.964 | 0.793 | 0.744 | 0.941 | 0.810 | 0.872 |
H_S | DA2 | 0.911 | 0.884 | 0.903 | 0.844 | 0.950 | 0.968 | 0.776 | 0.683 | 0.943 | 0.835 | 0.870 |
H_A | DA1 | 0.913 | 0.880 | 0.900 | 0.809 | 0.951 | 0.967 | 0.792 | 0.717 | 0.945 | 0.799 | 0.867 |
H_A | DA2 | 0.909 | 0.886 | 0.893 | 0.848 | 0.951 | 0.968 | 0.775 | 0.707 | 0.944 | 0.832 | 0.871 |
FH(2) | DA1/DA2 | 0.920 | 0.892 | 0.913 | 0.859 | 0.953 | 0.971 | 0.793 | 0.746 | 0.951 | 0.839 | 0.884 |
FH(4) | DA1/DA2 | 0.920 | 0.892 | 0.916 | 0.862 | 0.954 | 0.971 | 0.795 | 0.765 | 0.951 | 0.831 | 0.886 |
PVT | DA1 | 0.920 | 0.888 | 0.925 | 0.851 | 0.951 | 0.966 | 0.792 | 0.709 | 0.951 | 0.828 | 0.878 |
PVT | DA2 | 0.923 | 0.892 | 0.908 | 0.863 | 0.951 | 0.968 | 0.776 | 0.709 | 0.952 | 0.848 | 0.879 |
PVT(2) | DA1/DA2 | 0.925 | 0.892 | 0.925 | 0.863 | 0.952 | 0.970 | 0.781 | 0.719 | 0.954 | 0.850 | 0.883 |
HSN | DA1 | 0.927 | 0.893 | 0.920 | 0.851 | 0.953 | 0.966 | 0.777 | 0.704 | 0.951 | 0.800 | 0.874 |
HSN | DA2 | 0.924 | 0.896 | 0.889 | 0.860 | 0.953 | 0.969 | 0.781 | 0.690 | 0.953 | 0.855 | 0.877 |
HSN(2) | DA1/DA2 | 0.928 | 0.897 | 0.915 | 0.860 | 0.955 | 0.970 | 0.775 | 0.671 | 0.953 | 0.860 | 0.879 |
FH(2) + 2 × PVT(2) | DA1/DA2 | 0.927 | 0.894 | 0.932 | 0.868 | 0.954 | 0.971 | 0.797 | 0.767 | 0.955 | 0.853 | 0.893 |
FH(4) + 4 × PVT(2) | DA1/DA2 | 0.926 | 0.894 | 0.933 | 0.869 | 0.954 | 0.971 | 0.798 | 0.768 | 0.955 | 0.847 | 0.892 |
ElossMix2(10) | DA1/DA2 | 0.924 | 0.893 | 0.929 | 0.850 | 0.956 | 0.970 | 0.789 | 0.739 | 0.952 | 0.829 | 0.883 |
AllM | DA1/DA2 | 0.929 | 0.895 | 0.939 | 0.868 | 0.956 | 0.972 | 0.800 | 0.770 | 0.956 | 0.846 | 0.893 |
AllM_H | DA1/DA2 | 0.931 | 0.897 | 0.941 | 0.869 | 0.956 | 0.972 | 0.799 | 0.773 | 0.957 | 0.854 | 0.895 |
Method | YEAR | PRAT | MCG | UC | CMQ | SFA | HGR | SCH | VMD | AVG |
---|---|---|---|---|---|---|---|---|---|---|
Bayes | 2002 | 0.631 | 0.694 | 0.661 | 0.599 | 0.760 | 0.871 | 0.569 | 0.252 | 0.630 |
SA3 | 2014 | 0.709 | 0.762 | 0.625 | 0.647 | 0.863 | 0.877 | 0.586 | 0.147 | 0.652 |
U-Net | 2015 | 0.787 | 0.779 | 0.713 | 0.686 | 0.848 | 0.836 | 0.671 | 0.332 | 0.706 |
SegNet | 2017 | 0.730 | 0.813 | 0.802 | 0.737 | 0.889 | 0.869 | 0.708 | 0.328 | 0.734 |
[67] | 2020 | 0.812 | 0.841 | 0.829 | 0.773 | 0.902 | 0.950 | 0.714 | 0.423 | 0.781 |
[83] | 2021 | 0.926 | 0.888 | 0.916 | 0.842 | 0.955 | 0.971 | 0.799 | 0.764 | 0.883 |
AllM_H | 2023 | 0.931 | 0.897 | 0.941 | 0.869 | 0.956 | 0.972 | 0.799 | 0.773 | 0.892 |
IoU | EYTH | GTEA |
---|---|---|
RN18 | 0.759 | 0.761 |
RN50 | 0.782 | 0.808 |
RN101 | 0.806 | 0.841 |
ERN18(10) | 0.778 | 0.777 |
ERN50(10) | 0.796 | 0.812 |
ERN101(10) | 0.821 | 0.841 |
IoU | LOSS | EYTH | GTEA |
---|---|---|---|
ERN101(10) | | 0.821 | 0.841 |
ELoss101(10) | Many losses | 0.821 | 0.849 |
ELossMix(10) | Many losses | 0.819 | 0.852 |
ELossMix2(10) | Many losses | 0.823 | 0.852 |
IoU | DA | EYTH | GTEA |
---|---|---|---|
H_S | DA1 | 0.745 | 0.757 |
H_S | DA2 | 0.760 | 0.769 |
H_A | DA1 | 0.802 | 0.831 |
H_A | DA2 | 0.802 | 0.826 |
FH(2) | DA1/DA2 | 0.810 | 0.826 |
FH(4) | DA1/DA2 | 0.810 | 0.826 |
PVT | DA1 | 0.799 | 0.819 |
PVT | DA2 | 0.814 | 0.830 |
PVT(2) | DA1/DA2 | 0.808 | 0.837 |
HSN | DA1 | 0.818 | 0.833 |
HSN | DA2 | 0.815 | 0.836 |
HSN(2) | DA1/DA2 | 0.812 | 0.843 |
FH(2) + 2 × PVT(2) | DA1/DA2 | 0.824 | 0.840 |
FH(4) + 4 × PVT(2) | DA1/DA2 | 0.824 | 0.840 |
ELossMix2(10) | DA1/DA2 | 0.823 | 0.852 |
AllM | DA1/DA2 | 0.831 | 0.847 |
AllM_H | DA1/DA2 | 0.834 | 0.848 |