Mathematics
  • Article
  • Open Access

26 April 2023

Efficient and Low Color Information Dependency Skin Segmentation Model

1 Department of AI & Informatics, Graduate School, Sangmyung University, Seoul 03016, Republic of Korea
2 Department of Computer Science, Graduate School, Sangmyung University, Seoul 03016, Republic of Korea
3 Department of Human-Centered Artificial Intelligence, Graduate School, Sangmyung University, Seoul 03016, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Application of Machine Learning in Image Processing and Computer Vision

Abstract

Skin segmentation involves segmenting the human skin region in an image. It is a preprocessing technique used in many applications, such as face detection, hand gesture recognition, and remote biosignal measurement. Because the performance of skin segmentation directly affects the performance of these applications, precise skin segmentation methods have been studied. However, previous skin segmentation methods are unsuitable for real-world environments because they rely heavily on color information. In addition, deep-learning-based skin segmentation methods incur high computational costs, even though skin segmentation is mainly used for preprocessing. This study proposes a lightweight, high-performance skin segmentation model. Additionally, we used data augmentation techniques that modify the hue, saturation, and value channels, allowing the model to better learn texture and contextual information without relying on color information. Our proposed model requires 1.09M parameters and 5.04 giga multiply-accumulate operations (GMACs). Through experiments, we demonstrated that our proposed model achieves a high F-score of 0.9492 and consistent performance even on modified images. Furthermore, it achieves a fast processing speed of approximately 68 fps on 3 × 512 × 512 images with an NVIDIA RTX 2080 Ti GPU (11 GB VRAM).

1. Introduction

Skin segmentation is the task of detecting human skin regions in an image. It is a preprocessing method commonly used in various applications and is especially important in the field of biological systems and medicine []. Its applications include face detection, hand gesture recognition, and biosignal measurements such as remote photoplethysmography (rPPG) [,,]. The performance of skin segmentation is important because it directly affects the performance of these applications, and it should be lightweight so that it does not dominate the processing time of the entire pipeline. For example, in rPPG measurement, information about cardiac activity is contained only in the skin pixels, and elements such as the background or the moving mouth, eyes, and hair across the face can contaminate the signal. Therefore, accurate skin segmentation is important for reliable measurement of rPPG signals [,].
Thresholding-based methods are the most commonly used skin segmentation approaches. This simple and fast technique segments the skin region by defining a limited range of skin colors within a specific color space such as YCbCr or HSV [,,,,]. However, several problems arise because segmentation relies only on pixel color information. The first is sensitivity to illumination: although the defined skin-color range is fixed, the apparent skin color shifts with lighting conditions. Such changes are far more frequent in the real world than in laboratory environments, which significantly degrades the performance of thresholding-based methods [,]. The second is the presence of non-skin pixels whose color falls within the skin range, which also degrades performance [,]. Finally, defining the range of skin colors perfectly is challenging because skin colors vary with race and individual differences. Therefore, thresholding-based methods are unsuitable for applications where preprocessing performance is important.
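To make the thresholding approach concrete, the following is a minimal sketch using one commonly cited YCbCr rule of thumb; the exact bounds vary widely across the studies cited above, so these values are illustrative only and do not reproduce any particular paper's method.

```python
# Minimal sketch of thresholding-based skin segmentation in YCbCr space.
# The bounds below are one commonly cited rule of thumb; published methods
# use a variety of ranges and color spaces, so treat them as illustrative.
import cv2
import numpy as np

def threshold_skin_mask(bgr_image: np.ndarray) -> np.ndarray:
    """Return a binary skin mask (255 = skin) for a BGR image."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Illustrative bounds: Y unrestricted, Cr in [133, 173], Cb in [77, 127].
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    return cv2.inRange(ycrcb, lower, upper)
```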
Recently, owing to improvements in computer performance, learning-based methods using machine or deep learning have attracted interest in various fields. In particular, deep-learning-based methods outperform traditional methods while still allowing real-time processing, so they are being actively studied in most computer vision fields, including skin segmentation. To the best of our knowledge, the best-performing deep-learning-based methods are those proposed by Tarasiewicz et al. and Salah et al. [,]. Tarasiewicz's method is based on U-Net and trained on the ECU dataset [,]; it achieved a high F-score of 0.9230. Salah's method classifies skin and non-skin pixels using a simple convolutional neural network (CNN); trained on the SFA dataset, it showed a high F-score of 0.9765 in experiments [].
Both Tarasiewicz's and Salah's methods performed strongly on their respective datasets; however, they also have problems, which are described in Section 2.2. In this study, we propose a high-performance method that solves these problems. The contributions of this study are as follows:
  • We propose a lightweight skin segmentation method that is more suitable than previous methods for real-time application preprocessing.
  • We used data augmentation techniques to reduce the color-information dependency of the model and demonstrated this experimentally.

3. Method

In this section, we describe the architecture of the proposed method, which is based on SINet and the mobile vision transformer (MobileViT) [,]. SINet is an extremely lightweight network used for portrait segmentation. One of its main contributions is information blocking, which reduces typical segmentation errors by providing additional information only to regions where the model is uncertain (Section 3.1). We combined the SINet architecture with MobileViT to improve model performance. MobileViT is a lightweight vision transformer that can encode local and global information simultaneously (Section 3.2). We therefore replaced the encoder of SINet with a MobileViT block. However, this change made the model heavier, so we applied Simplified Channel Attention (SCA) to lighten it (Section 3.3) []. Finally, to reduce the color-information dependence of the proposed model, we used the data augmentation technique proposed by Xu (Section 3.4) []. The overall architecture of the proposed model is illustrated in Figure 4.
Figure 4. The overall architecture of our proposed model.

3.1. Information Blocking

Information blocking was proposed to reduce errors in segmentation models []. In the encoder–decoder structure used in SINet, the encoder loses local detail while extracting feature maps; consequently, models of this structure have low certainty at the boundary between foreground and background. To compensate for this loss of detail, many studies use skip connections []. However, a skip connection passes along not only useful information but also unnecessary information that can act as noise. SINet therefore applies information blocking, which provides additional information only for uncertain regions.
The equation of information blocking is shown in Equation (7):
$$M = 1 - \max\left(\mathrm{softmax}(X_{\mathrm{low}})\right), \qquad I = X_{\mathrm{high}} \otimes M. \tag{7}$$
$X_{\mathrm{low}}$ is a feature map, resized to the same spatial size as the high-resolution feature map by applying pointwise convolution and bilinear upsampling to the final feature maps of the encoder. $X_{\mathrm{high}}$ is the high-resolution feature map, and $\otimes$ is the element-wise product. The maximum softmax value of the feature maps can be regarded as the model's confidence map for each pixel's class. By subtracting the confidence map from 1 and taking the element-wise product with the high-resolution features, additional information is provided only to low-confidence regions. This reduces the uncertainty of the model, thereby reducing typical segmentation errors.
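A minimal PyTorch sketch of this step is given below. It assumes that `x_low` holds the pointwise-projected encoder logits and `x_high` the high-resolution feature map; the names and shapes are ours, not taken from the SINet code.

```python
# A sketch of the information-blocking decoder step in Equation (7).
import torch
import torch.nn.functional as F

def information_blocking(x_low: torch.Tensor, x_high: torch.Tensor) -> torch.Tensor:
    """x_low: low-resolution class logits (B, num_classes, h, w), already
    projected by pointwise convolution; x_high: high-resolution features
    (B, C, H, W). Returns features gated toward low-confidence regions."""
    # Bilinearly upsample the logits to the high-resolution spatial size.
    logits = F.interpolate(x_low, size=x_high.shape[2:], mode="bilinear",
                           align_corners=False)
    # Confidence map: maximum softmax probability per pixel.
    confidence, _ = torch.softmax(logits, dim=1).max(dim=1, keepdim=True)
    # M = 1 - confidence; pass high-resolution detail only where uncertain.
    return x_high * (1.0 - confidence)
```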

3.2. MobileViT

MobileViT is a type of vision transformer (ViT) suitable for low-resource devices such as mobile phones []. ViT, proposed by Dosovitskiy et al., showed state-of-the-art (SOTA) performance by dividing images into patches and feeding them to a vanilla transformer []. However, ViT lacks the inductive biases of CNNs; it therefore depends on large-scale datasets and requires strong regularization. In contrast, MobileViT has convolution-like properties because it processes local information with a CNN and global information with a transformer. This gives it sufficient capacity to learn visual representations while remaining lighter and faster. MobileViT is thus well suited to segmentation tasks, which require handling local and global information simultaneously, so we replaced the encoder of SINet with the MobileViT block to improve skin segmentation performance. In addition, a gated depthwise-convolution feed-forward network (GDFN) was used instead of the simple feed-forward network of the MobileViT blocks []. The GDFN is useful for learning local image structure and allows hierarchical models to focus on fine details through its gating mechanism. The GDFN formula is given by Equation (8):
$$\hat{X} = W_p\,\mathrm{Gating}(X) + X, \qquad \mathrm{Gating}(X) = \phi\left(W_d W_p\,\mathrm{LN}(X)\right) \odot W_d W_p\,\mathrm{LN}(X). \tag{8}$$
$W_p$ denotes pointwise convolution, $W_d$ depthwise convolution, $\mathrm{LN}$ layer normalization [], $\odot$ the element-wise product, and $\phi$ the Gaussian error linear unit (GELU) []. In addition, the activation functions of MobileViT and SINet were replaced with GELU.
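The following is a sketch of a GDFN block consistent with Equation (8). Following the Restormer formulation, the two gating branches carry separate weights here (fused into one doubled-width convolution), and the expansion ratio is an illustrative choice, not the paper's setting.

```python
# A sketch of the gated depthwise-convolution feed-forward network (GDFN)
# in Equation (8). The two gating branches have separate weights, as in
# Restormer; channel sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GDFN(nn.Module):
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.norm = nn.GroupNorm(1, channels)  # LayerNorm over channels
        # W_p then W_d; both branches fused into single doubled-width convs.
        self.pw = nn.Conv2d(channels, hidden * 2, kernel_size=1)
        self.dw = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                            padding=1, groups=hidden * 2)
        self.out = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split into the GELU-gated branch and the value branch.
        gate, value = self.dw(self.pw(self.norm(x))).chunk(2, dim=1)
        return self.out(F.gelu(gate) * value) + x  # residual connection
```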

3.3. Simplified Channel Attention

In this study, SCA was applied to make MobileViT lighter. SCA is an attention mechanism that simplifies Channel Attention (CA) []. The SCA equation is shown in Equation (9):
$$\mathrm{SCA}(X) = X \otimes W\,\mathrm{pool}(X) \tag{9}$$
where $\otimes$ denotes the channel-wise product, $W$ a pointwise convolution, and $\mathrm{pool}$ global average pooling. Through experiments, the authors demonstrated no performance loss on the denoising task compared with CA. In addition, to apply attention while maintaining positional information, the original MobileViT must unfold $X \in \mathbb{R}^{H \times W \times C}$ into $X_U \in \mathbb{R}^{P \times N \times d}$ and then fold it back to the original dimensions, where $P = w \times h$ is the patch size and $N = \frac{HW}{P}$ is the number of patches. SCA does not require this, so we removed these unfold and fold steps. A comparison of the computational costs of SINet and MobileViT is presented in Table 2. The proposed model requires 5.04 GMACs and 1.09M parameters. The details of the model are listed in Table 3.
Table 2. The computational cost of SINet with MobileViT.
Table 3. The details of the model.
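For illustration, SCA can be written as a small PyTorch module directly from Equation (9); the module and parameter names below are ours.

```python
# A sketch of Simplified Channel Attention (Equation (9)): a global-average-
# pooled descriptor is projected by one pointwise convolution and used to
# rescale the channels, with no intermediate nonlinearity.
import torch
import torch.nn as nn

class SimplifiedChannelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                       # pool
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)  # W

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.proj(self.pool(x))  # channel-wise product
```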

3.4. Xu’s Data Augmentation

Existing skin segmentation methods rely on color information, which can degrade performance in real-world environments. To address this problem, Xu proposed a novel data augmentation technique. Xu noted that the color-information dependence of deep-learning-based methods stems from the skin-color bias of the datasets: because most datasets are biased toward bright skin tones, the proposed method counteracts this bias by modifying the hue, saturation, and value channels of the images. The hue channel was rotated in increments of 60°, the saturation channel was decayed by ratios of (0.8, 0.6, 0.4, 0.2, 0.0), and the value channel was scaled by ratios of (1.0, 0.8, 0.6, 0.4, 0.2). The authors demonstrated performance improvements across skin-type and race-group images through experiments with this method. We also demonstrate experimentally that color-information dependence is reduced compared with methods that do not use this augmentation. Examples of modified images are shown in Figure 5.
Figure 5. Example of Xu's method: (a) Example images of hue channel rotation; (b) Example images of saturation channel decay; (c) Example images of value channel change.
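A sketch of this augmentation as described above is given below, assuming OpenCV's HSV convention (hue is stored in [0, 180), so 60° corresponds to 30 units); the function name is ours, and details such as rounding may differ from Xu's implementation.

```python
# A sketch of Xu's HSV augmentation: five hue rotations, five saturation
# decays, and five value scalings per image (15 variants in total).
import cv2
import numpy as np

def xu_augment(bgr: np.ndarray) -> list[np.ndarray]:
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    variants = []
    for deg in (60, 120, 180, 240, 300):        # hue rotation
        out = hsv.copy()
        out[..., 0] = (out[..., 0] + deg / 2) % 180
        variants.append(out)
    for ratio in (0.8, 0.6, 0.4, 0.2, 0.0):     # saturation decay
        out = hsv.copy()
        out[..., 1] *= ratio
        variants.append(out)
    for ratio in (1.0, 0.8, 0.6, 0.4, 0.2):     # value change
        out = hsv.copy()
        out[..., 2] *= ratio
        variants.append(out)
    return [cv2.cvtColor(v.astype(np.uint8), cv2.COLOR_HSV2BGR)
            for v in variants]
```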

4. Experiments

4.1. Implementation Details

The training settings of the proposed model mostly follow those of MobileViT. As with SINet's training, only the encoder was first trained for 200 epochs with a batch size of 8, and then the whole model was trained for 100 epochs with a batch size of 4. The model weights were initialized using a truncated normal distribution []. Following the experimental results of the Skinny model, the loss function is the mean of the cross-entropy and DICE losses. To learn boundaries better, a term that computes the loss only on the boundary component was added, in the same way as in SINet. The loss function is shown in Equation (10):
$$\mathrm{Loss} = \frac{1}{2}\sum_{i}^{n}\left\{\mathrm{CE}(y_i, \hat{y}_i) + \mathrm{DICE}(y_i, \hat{y}_i)\right\} + \frac{\lambda}{2}\sum_{j}^{k}\left\{\mathrm{CE}(y^{b}_{j}, \hat{y}^{b}_{j}) + \mathrm{DICE}(y^{b}_{j}, \hat{y}^{b}_{j})\right\} \tag{10}$$
where $n$ is the number of pixels in the image; $y_i$ and $\hat{y}_i$ denote the label and predicted label of the $i$-th pixel, respectively; and $y^{b}_{j}$ and $\hat{y}^{b}_{j}$ denote the label and predicted label of the $j$-th pixel of the boundary-component image derived from the input image. $\lambda$ is a ratio that controls the weight of the boundary loss term. $\mathrm{CE}$ is the cross-entropy loss and $\mathrm{DICE}$ is the DICE loss based on the DICE coefficient. AdamW was used as the optimizer []. The initial learning rate was 0.0002; it was increased to 0.002 over five epochs when training only the encoder (ten epochs when training the entire model) and then lowered back to 0.0002 with a cosine annealing schedule []. Finally, an L2 weight decay of 0.01 was used. The model was implemented in PyTorch and trained on an NVIDIA RTX 2080 Ti GPU (11 GB VRAM).
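A sketch of the loss in Equation (10) is shown below, assuming a single-channel sigmoid output and float-valued binary masks; the extraction of the boundary-component images (which follows SINet) is not shown, and the default λ is a placeholder, not the paper's value.

```python
# A sketch of the loss in Equation (10) for binary skin masks.
import torch
import torch.nn.functional as F

def dice_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    inter = (probs * target).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)

def total_loss(logits, target, boundary_logits, boundary_target, lam=1.0):
    # Full-image term: mean of cross-entropy and DICE losses.
    probs = torch.sigmoid(logits)
    full = 0.5 * (F.binary_cross_entropy_with_logits(logits, target)
                  + dice_loss(probs, target))
    # Boundary-component term, weighted by lambda.
    b_probs = torch.sigmoid(boundary_logits)
    boundary = 0.5 * lam * (
        F.binary_cross_entropy_with_logits(boundary_logits, boundary_target)
        + dice_loss(b_probs, boundary_target))
    return full + boundary
```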

4.2. Datasets

We used the ECU dataset for training and evaluation. The ECU dataset was collected by Edith Cowan University for face detection and skin segmentation and contains 4000 color images. Of these, 1% were acquired with digital cameras, whereas the remainder were collected online between 2002 and 2003. The collectors sought diversity in several respects: the images cover various skin colors and include all exposed skin areas, such as the neck and arms, not only facial skin, and the illumination conditions span both indoor and outdoor environments. The data split is the same as that used by Tarasiewicz: 1750 images for training, 250 for validation, and 2000 for evaluation. With the data augmentation technique, the training set was expanded 15-fold to 26,250 images.
Additionally, we used the Pratheepan dataset for evaluation only. The Pratheepan dataset contains images for skin segmentation collected at random via Google. It consists of 32 images of faces with simple backgrounds and 46 images of multiple people with complex backgrounds, for a total of 78 images.

4.3. Performance for ECU and Pratheepan Datasets

Precision, recall, and F-score were used in all the experiments. The evaluation metrics were computed from the pixel-wise true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) accumulated over the test dataset images. Table 4 shows the performance of the models on the ECU and Pratheepan datasets, and Table 5 shows the confusion matrix of our proposed model. Salah's model has no public code or model weights; thus, no experimental results are available for it on the ECU dataset. On the ECU dataset, our proposed model outperformed Skinny, the previously best-performing model. On the Pratheepan dataset, Salah's model performed best and ours second best. Examples of skin mask images from the Pratheepan dataset are shown in Figure 6.
Table 4. Performance of the model for ECU dataset and Pratheepan dataset.
Table 5. Confusion matrix of our proposed model.
Figure 6. Examples of skin mask images for the Pratheepan dataset: (a) Input; (b) Skinny; (c) Salah's; (d) SINet; (e) Ours.
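One way to realize the pixel-wise evaluation described above is sketched below, accumulating counts over all test images before computing the metrics; the exact averaging in the paper may differ.

```python
# A sketch of pixel-wise precision, recall, and F-score over a test set.
import numpy as np

def prf(pred_masks: list[np.ndarray], gt_masks: list[np.ndarray]):
    tp = fp = fn = 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        tp += np.logical_and(pred, gt).sum()    # skin predicted as skin
        fp += np.logical_and(pred, ~gt).sum()   # non-skin predicted as skin
        fn += np.logical_and(~pred, gt).sum()   # skin predicted as non-skin
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```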

4.4. Performance on Images Modified by Xu's Method

Subsequent experiments used only the ECU dataset and compared the proposed model with Skinny and SINet. Table 6 shows the results when the test dataset was modified using Xu's method. Skinny's F-score dropped by approximately 34% relative to the unmodified images, whereas the proposed model's F-score dropped by only approximately 2%. SINet, trained with the same augmented dataset, likewise showed no significant degradation. Skinny performed especially poorly on images with modified hues. In contrast, our proposed model performed consistently under every modification, demonstrating its low color-information dependency owing to Xu's data augmentation technique. Examples of the experimental results are shown in Figure 7.
Table 6. Performance of the model for modified images.
Figure 7. Examples of result images for images modified by Xu's method. The first, second, and third columns show results for images modified in hue, saturation, and value, respectively (red: FP, blue: FN): (a) Input; (b) Skinny; (c) SINet; (d) Ours.

4.5. Performance on Grayscale Images

The performance on grayscale images is shown in Table 7. The models used in this experiment were not trained on grayscale images. Here, too, our proposed model outperformed the other models. Example images from the experiment are shown in Figure 8.
Table 7. Performance of the model for grayscale images.
Figure 8. Examples of result images for grayscale images: (a) Input; (b) Skinny; (c) SINet; (d) Ours.
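For reference, the three-channel grayscale inputs used in this kind of experiment can be produced as sketched below; we assume here that the models expect three-channel input, so the gray channel is replicated.

```python
# A sketch of producing 3-channel grayscale test inputs from BGR images.
import cv2
import numpy as np

def to_gray3(bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # replicate channel
```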

5. Discussion

In the experiments, the proposed model exhibited the best performance on the ECU dataset at a computational cost 77% lower than that of the previous best-performing model, Skinny. On the Pratheepan dataset, our model performed slightly worse than Salah's model; however, it is more efficient, as its computational cost is approximately 44% lower while its performance is only 7% lower. Figure 9 shows the relationship between computational cost and F-score; the size of each circle is proportional to the number of parameters of the model.
Figure 9. F-score and GMACs of the models for the Pratheepan dataset. The size of each circle is proportional to the number of parameters.
The proposed model is efficient, with high performance, low computational cost, and few parameters. This makes it useful in applications on resource-limited devices such as embedded or mobile devices. It can also be used as preprocessing in applications that require real-time processing, as it achieves a fast processing speed of approximately 68 fps on 3 × 512 × 512 images with an NVIDIA RTX 2080 Ti GPU (11 GB VRAM).
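A throughput figure of this kind can be measured as sketched below, with CUDA synchronization so that asynchronous GPU execution is timed correctly; `model` stands for the trained network, and the iteration counts are arbitrary choices.

```python
# A sketch of measuring inference throughput (fps) on a 3 x 512 x 512 input.
import time
import torch

@torch.no_grad()
def measure_fps(model, device="cuda", warmup=10, iters=100):
    model.eval().to(device)
    x = torch.randn(1, 3, 512, 512, device=device)
    for _ in range(warmup):          # warm up kernels and caches
        model(x)
    torch.cuda.synchronize()         # flush pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # wait for all forward passes to finish
    return iters / (time.perf_counter() - start)
```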

6. Conclusions

In this study, we proposed an efficient MobileViT-based skin segmentation model with low color dependency. The proposed model shows high performance in experiments on the ECU and Pratheepan datasets while requiring a lower computational cost and fewer parameters than existing models. In addition, we demonstrated that our proposed model is less dependent on color information, showing no significant performance degradation even on hue-, saturation-, or value-modified images and on grayscale images.
The model proposed in this study has a lower computational cost than existing models. However, owing to the architectural changes made to SINet, it is considerably heavier than SINet itself without a commensurate performance improvement. In future work, we will study ways to maintain performance while making the model as lightweight as SINet.

Author Contributions

Conceptualization, E.C.L. and H.Y.; methodology, H.Y.; software, H.Y.; validation, J.O. and K.L.; formal analysis, H.Y. and J.O.; investigation, K.L.; data curation, H.Y. and J.O.; writing—original draft preparation, H.Y.; writing—review and editing, E.C.L. and K.L.; visualization, H.Y.; supervision, E.C.L.; project administration, E.C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Field-oriented Technology Development Project for Customs Administration through National Research Foundation of Korea (NRF) funded by the Ministry of Science & ICT and Korea Customs Service (2022M3I1A1095155).

Data Availability Statement

Our study used publicly available datasets, which can be accessed through the websites that provide them.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Harsha, B.K. Skin Detection in Images Based on Pattern Matching Algorithms: A Review. In Proceedings of the International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–28 February 2020. [Google Scholar]
  2. Pujol, F.A.; Pujol, M.; Jimeno-Morenilla, A.; Pujol, M.J. Face detection based on skin color segmentation using fuzzy entropy. Entropy 2017, 19, 26. [Google Scholar] [CrossRef]
3. Jalab, H.A.; Omer, H.K. Human computer interface using hand gesture recognition based on neural network. In Proceedings of the National Symposium on Information Technology (NSITNSW), Riyadh, Saudi Arabia, 17–19 February 2015. [Google Scholar]
  4. Casado, C.A.; López, M.B. Face2PPG: An unsupervised pipeline for blood volume pulse extraction from faces. arXiv 2022, arXiv:2202.04101. [Google Scholar]
5. Scherpf, M.; Ernst, H.; Misera, L.; Schmidt, M. Skin Segmentation for Imaging Photoplethysmography Using a Specialized Deep Learning Approach. In Proceedings of the Computing in Cardiology (CinC), Brno, Czech Republic, 13–15 September 2021. [Google Scholar]
  6. De Haan, G.; Jeanne, V. Robust pulse rate from chrominance-based rPPG. IEEE Trans. Biomed. Eng. 2013, 60, 2878–2886. [Google Scholar] [CrossRef] [PubMed]
  7. Naji, S.; Jalab, H.A.; Kareem, S.A. A survey on skin detection in colored images. Artif. Intell. Rev. 2019, 52, 1041–1087. [Google Scholar] [CrossRef]
8. Phung, S.L.; Bouzerdoum, A.; Chai, D. A novel skin color model in YCbCr color space and its application to human face detection. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002. [Google Scholar]
  9. Hajraoui, A.; Sabri, M. Face detection algorithm based on skin detection, watershed method and gabor filters. Int. J. Comput. Appl. 2014, 94, 33–39. [Google Scholar] [CrossRef]
  10. Tao, L. An FPGA-based parallel architecture for face detection using mixed color models. arXiv 2014, arXiv:1405.7032. [Google Scholar]
  11. Kolkur, S.; Kalbande, D.; Shimpi, P.; Bapat, C.; Jatakia, J. Human skin detection using RGB, HSV and YCbCr color models. arXiv 2017, arXiv:1708.02694. [Google Scholar]
  12. Störring, M. Computer Vision and Human Skin Colour. Ph.D. Thesis, Aalborg University, Aalborg, Denmark, 2004. [Google Scholar]
  13. Kakumanu, P.; Makrogiannis, S.; Bourbakis, N. A survey of skin-color modeling and detection methods. Pattern Recognit. 2007, 40, 1106–1122. [Google Scholar] [CrossRef]
14. Tarasiewicz, T.; Nalepa, J.; Kawulok, M. Skinny: A lightweight U-Net for skin detection and segmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020. [Google Scholar]
  15. Salah, K.B.; Othmani, M.; Kherallah, M. A novel approach for human skin detection using convolutional neural network. Vis. Comput. 2022, 38, 1833–1843. [Google Scholar] [CrossRef]
  16. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015. [Google Scholar]
  17. Phung, S.L.; Bouzerdoum, A.; Chai, D. Skin segmentation using color pixel classification: Analysis and comparison. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 148–154. [Google Scholar] [CrossRef] [PubMed]
  18. Casati, J.P.B.; Moraes, D.R.; Rodrigues, E.L.L. SFA: A human skin image database based on FERET and AR facial images. In Proceedings of the IX Workshop on Computational Vision—WVC 2013, Rio de Janeiro, Brazil, 3–5 June 2013. [Google Scholar]
19. Kim, Y.; Hwang, I.; Cho, N.I. Convolutional neural networks and training strategies for skin detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017. [Google Scholar]
  20. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
21. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  22. Tan, W.R.; Chan, C.S.; Yogarajah, P.; Condell, J. A fusion approach for efficient human skin detection. IEEE Trans. Ind. Inform. 2011, 8, 138–147. [Google Scholar] [CrossRef]
  23. Abdallah, A.S.; Bou El-Nasr, M.A.; Abbott, A.L. A new color image database for benchmarking of automatic face detection and human skin segmentation techniques. Int. J. Comput. Inf. Eng. 2007, 1, 3782–3786. [Google Scholar]
24. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  25. Kawulok, M.; Kawulok, J.; Nalepa, J.; Smolka, B. Self-adaptive algorithm for segmenting skin region. EURASIP J. Adv. Signal Process. 2014, 170. [Google Scholar] [CrossRef]
26. Park, H.; Sjösund, L.L.; Yoo, Y.; Monet, N.; Bang, J.; Kwak, N. SINet: Extreme lightweight portrait segmentation networks with spatial squeeze module and information blocking decoder. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020. [Google Scholar]
  27. Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
  28. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple baselines for image restoration. In Proceedings of the Conference on Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
  29. Xu, H.; Sarkar, A.; Abbott, A.L. Color Invariant Skin Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2017; p. 30. [Google Scholar]
  32. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
  33. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
  34. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
  35. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  36. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  37. Hanin, B.; Rolnick, D. How to start training: The effect of initialization and architecture. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018; p. 31. [Google Scholar]
  38. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  39. Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
