Machine Learning Methods in Skin Disease Recognition: A Systematic Review

Skin lesions affect millions of people worldwide. They can be easily recognized based on their typically abnormal texture and color but are difficult to diagnose due to similar symptoms among certain types of lesions. The motivation for this study is to collate and analyze machine learning (ML) applications in skin lesion research, with the goal of encouraging the development of automated systems for skin disease diagnosis. To assist dermatologists in their clinical diagnosis, several skin image datasets have been developed and published online. Such efforts have motivated researchers and medical staff to develop automatic skin diagnosis systems using image segmentation and classification processes. This paper summarizes the fundamental steps in skin lesion diagnosis based on papers mainly published since 2013. The applications of ML methods (including traditional ML and deep learning (DL)) in skin disease recognition are reviewed based on their contributions, methods, and achieved results. Such technical analysis is beneficial to the continuing development of reliable and effective computer-aided skin disease diagnosis systems. We believe that more research efforts will lead to the current automatic skin diagnosis studies being used in real clinical settings in the near future.


Introduction
The skin protects the human body from outside hazardous substances and is the largest organ in the human body. Skin diseases can be induced by various causes, such as fungal infections, bacterial infections, allergies, or viruses [1]. Skin lesions, as one of the most common diseases, have affected a large population worldwide.
A skin lesion can be easily recognized based on its abnormal texture and/or color, but it can be difficult to accurately diagnose due to similar symptoms occurring with different types of lesions. Figure 1a,b shows that both contact dermatitis and eczema present similarly with redness, swelling, and chapping on a hand. Similarly, as seen in Figure 1c,d, measles and keratosis show red dots that are sporadically distributed on the skin. These similarities lower diagnosis accuracy, particularly for dermatologists with less than two years of clinical experience. To administer appropriate treatment, it is essential to identify skin lesions correctly and as early as possible. Early diagnosis usually leads to a better prognosis and increases the chance of full recovery in most instances. The training of a dermatologist requires several years of clinical experience as well as high education costs. The average tuition fee in medical school was USD 218,792 in 2021, having risen by around USD 1500 each year since 2013 [2]. An automated solution for skin disease diagnosis is becoming increasingly urgent. This is especially true in developing countries, which often lack advanced medical equipment and expertise.
Artificial intelligence (AI) has made rapid progress in image recognition over the last decade, using a combination of methods such as machine learning (ML) and deep learning (DL). Segmentation and classification models can be developed using traditional ML and DL methods. Traditional ML methods, which use extracted and selected features as inputs, have simpler structures than DL methods [4]. DL methods can automatically discover underlying patterns and identify the most descriptive and salient features in image recognition and processing tasks. This progress has significantly motivated medical workers to explore the potential for the application of AI methods in disease diagnosis, particularly for skin disease diagnosis. Moreover, DL has already demonstrated its capability in this field by achieving diagnostic accuracy similar to that of dermatologists with 5 years of clinical experience [5]. Therefore, DL is considered a potential tool for cost-effective skin health diagnosis.
The success of AI applications heavily relies on the support of big data to ensure reliable performance and generalization ability. To speed up the progress of AI application in skin disease diagnosis, it is essential to establish reliable databases. Researchers from multiple organizations worked together to create the International Skin Imaging Collaboration (ISIC) Dataset for skin disease research and organized challenges from 2016 to 2020 [6]. The most common research task is to detect melanoma, a life-threatening skin cancer. According to the World Health Organization (WHO), the number of melanoma cases is expected to increase to 466,914 with associated deaths increasing to 105,904 by 2040 [7].
Several additional datasets with macroscopic or dermoscopic images are publicly accessible on the internet, such as the PH2 dataset [8], the Human Against Machine with 10,000 training images (HAM 10,000) dataset [9], the BCN 20,000 dataset [10], the Interactive Atlas of Dermoscopy (EDRA) dataset [11], and the Med-Node dataset [12]. Such datasets can be used individually, partially, or combined based on the specific tasks under investigation.
Macroscopic or close-up images of skin lesions on the human body are often taken using common digital cameras or smartphone cameras. Dermoscopic images, on the other hand, are collected using a standard procedure in preliminary diagnosis and produce images containing fewer artifacts and more detailed features, which can visualize the skin's inner layer (invisible to the naked eye). Although images may vary with shooting distance, illumination conditions, and camera resolution, dermoscopic images are more likely to achieve higher performance in segmentation and classification than macroscopic images [13].
To conduct a rigorous and thorough review of skin lesion detection studies, we utilized a systematic search strategy that involved the PubMed database, an authoritative, widely used resource for accessing biomedical and life sciences literature. Our analysis primarily focused on studies published after 2013 that explored the potential applications of ML in skin lesion diagnosis. To facilitate this analysis, we sourced high-quality datasets from reputable educational and medical organizations. These datasets served as valuable tools not only for developing and testing new diagnostic algorithms but also for educating the broader public on skin health and the importance of early detection. By utilizing these datasets in our research, we gained a deeper understanding of the current state of the art in skin lesion detection and identified areas for future research and innovation.
We discuss the most relevant skin recognition studies, including datasets, image preprocessing, up-to-date traditional ML and DL applications in segmentation and classification, and challenges based on the original research or review papers from journals and scientific conferences in ScienceDirect, IEEE, and SpringerLink databases. The common schematic diagram of the automated skin image diagnosis procedure is shown in Figure 2.

Skin Lesion Datasets
To develop skin diagnosis models, various datasets of different sizes and skin lesion types have been created by educational institutions and medical organizations. These datasets can serve as platforms to educate the general public and as tools to test newly developed diagnosis algorithms.
The first dermoscopic image dataset, PH2, had 200 carefully selected images with segmented lesions, as well as clinical and histological diagnosis records. The Med-Node dataset contains a total of 170 macroscopic images, including 70 melanoma and 100 nevus images, collected by the Department of Dermatology, University Medical Center Groningen. Due to the limited image quantity and classes, these labeled datasets are often used together with others in the development of diagnosis models.
The most popular skin disease detection dataset, ISIC Archive, was created through the collaboration of 30 academic centers and companies worldwide with the aim of improving early melanoma diagnosis and reducing related deaths and unnecessary biopsies. This dataset contained 71,066 images with 24 skin disease classes [3], but only 11 of the 24 classes had over 100 images. The images were organized by diagnostic attributes, such as benign, malignant, and disease classes, and clinical attributes, such as patient and skin lesion information, and image type. Due to the diversity and quantity of images contained, this dataset was used to implement the ISIC challenges from 2016-2020 and dramatically contributed to the development of automatic lesion segmentation, lesion attribute detection, and disease classification [3].
The first ISIC 2016 dataset had 900 images for training and 350 images for testing under 2 classes: melanoma and benign. The dataset gradually increased to cover more disease classes from 2017 to 2020. Images in ISIC 2016-2017 were fully paired with the disease annotation from experts, as well as the ground truth of the skin lesion in the form of binary masks. The unique information included in the ISIC dataset is the diameter of each skin lesion, which can help to clarify the stage of melanoma. In the ISIC 2018 challenge, the labeled HAM 10,000 dataset served as the training set, which included 10,015 training images with uneven distribution from 7 classes. The BCN 20,000 dataset, which consisted of 19,424 dermoscopic images from 9 classes, was utilized as labeled samples in the ISIC 2019 and ISIC 2020 challenges. It is worth noting that the ISIC 2018-2020 datasets were not completely labeled.
To promote research into diverse skin diseases, images from more disease categories have been (and continue to be) collected. Dermofit has 1300 macroscopic skin lesion images with corresponding segmented masks over 10 classes of diseases, which were captured by a camera under standardized conditions for quality control [14]. DermNet covers 23 classes of skin disease with 21,844 clinical images [11]. For precise diagnosis, melanoma and nevus can be further refined into several subtypes. For example, the EDRA dataset contains only 1011 dermoscopic images from 20 specific categories, including 8 categories of nevus, 6 categories of melanoma, and 6 other skin diseases. The images under melanoma were further divided into melanoma, melanoma (in situ), melanoma (less than 0.76 mm), melanoma (0.76 to 1.5 mm), melanoma (more than 1.5 mm), and melanoma metastasis.
Table 1 summarizes these datasets according to image number, disease categories, number of labeled images, and binary segmentation mask inclusion. It can be seen that the ISIC Archive is the largest public repository with expert annotation from 25 types of diseases, and PH2 and Med-Node are smaller datasets with a focus on distinguishing between nevus and melanoma. Developing a representative dataset for accurate skin diagnosis requires tremendous manpower and time. Hence, it is common to see that one dataset is utilized for multiple diagnosis tasks. For example, the ISIC datasets can be used to identify melanoma or classify diverse skin lesions, while some types of skin lesions have more representative samples than others. Such an uneven data distribution may hinder the ability of developed diagnostic models to handle a desired variety of tasks.
There are no clear guidelines or criteria on the dataset selection for a specific diagnosis task; moreover, there is little discussion on the required size and distribution of a dataset. A small dataset may lead to insufficient model learning, and a large dataset may result in a high computational workload. The trade-off between the size and distribution of the dataset and the accuracy of the diagnosis model should be addressed. In addition, it is hard for a diagnosis model to identify unseen diseases or diseases with similar symptoms. These intricate factors may restrict the accuracy and, hence, adoption of diagnosis models developed using publicly available datasets in clinical diagnoses.

Image Preprocessing
Hair, illumination, circles, and black frames can often be found in images in the ISIC datasets, as seen in Figure 3. Their presence lowers image quality and, therefore, hinders subsequent analysis. To alleviate these factors, image pre-processing is utilized, which can involve any combination or all of the following: resizing, grayscale conversion, noise removal, and contrast enhancement. These methods generally improve the diagnosis efficiency and effectiveness, as well as segmentation and classification processes.
Image resizing is a general go-to solution for managing images with varied sizes. With this method, most images can be effectively resized to similar dimensions. Moreover, DL methods can handle images with a broad range of sizes. For example, CNNs have translation invariance, which means that the trained model can accommodate varied image sizes without sacrificing performance [20]. Thus, researchers can confidently tackle skin classification and segmentation tasks under diverse image scales. Resizing images involves cropping irrelevant parts of images and standardizing the pixel count before extracting the region of interest (ROI). This can shorten the computational time and simplify the following diagnosis tasks. The resized images are often converted to grayscale images to minimize the influence caused by different skin colors [21]. To eliminate various noise artifacts, as shown in Figure 3, various filters can be introduced [22]. A homomorphic filter can normalize image brightness and increase contrast for ROI determination. Hair removal filters are selected according to hair characteristics. A Gaussian filter, average filter, and median filter can be utilized to remove thin hairs, while an inpainting method can be used for thick hairs and a Dull Razor method for denser hair [23,24].
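The grayscale conversion and thin-hair removal steps described above can be illustrated with a minimal NumPy sketch. It is not the pipeline of any cited study: the function names `to_grayscale` and `median_filter`, the BT.601 luma weights, and the 3 x 3 window are assumptions chosen for illustration only.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale (assumed BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.asarray(rgb)[..., :3].astype(np.float64) @ weights

def median_filter(gray, size=3):
    """Apply a size x size median filter with edge-replicating padding."""
    gray = np.asarray(gray, dtype=np.float64)
    pad = size // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    # Collect every shifted view of the image, then take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w]
               for dy in range(size) for dx in range(size)]
    return np.median(np.stack(windows, axis=0), axis=0)

# Usage: a one-pixel-wide dark "hair" on a bright background is suppressed.
image = np.full((9, 9), 200.0)
image[:, 4] = 20.0
cleaned = median_filter(image, size=3)
```

A median filter of this size only removes structures narrower than half the window, which is consistent with the text reserving inpainting for thick hairs.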
Some ISIC images have colored circles, as shown in Figure 3c, which can be detected and then refilled with healthy skin. The black framing seen in the corners of Figure 3d is often replaced using morphological region filling. Following noise removal, contrast enhancement is achieved for the ROI determination using a bottom-hat filter and contrast-limited adaptive histogram equalization. Face parsing is another commonly used technique to remove the eyes, eyebrows, nose, and mouth as part of pre-processing prior to facial skin lesion analysis [25].
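As a rough illustration of the bottom-hat filter mentioned above, the sketch below computes the morphological closing of a grayscale image minus the image itself, which highlights small dark structures on a brighter background. The square structuring element and the helper names (`dilate`, `erode`, `bottom_hat`) are assumptions for this example; contrast-limited adaptive histogram equalization is not covered here.

```python
import numpy as np

def _sliding_stack(img, size, pad_value):
    """Stack all size x size shifted views of img, padding with pad_value."""
    img = np.asarray(img, dtype=np.float64)
    pad = size // 2
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(size) for dx in range(size)])

def dilate(img, size=3):
    """Grayscale dilation: local maximum over a square window."""
    return _sliding_stack(img, size, -np.inf).max(axis=0)

def erode(img, size=3):
    """Grayscale erosion: local minimum over a square window."""
    return _sliding_stack(img, size, np.inf).min(axis=0)

def bottom_hat(img, size=3):
    """Bottom-hat transform: closing(img) - img, non-negative everywhere."""
    img = np.asarray(img, dtype=np.float64)
    closing = erode(dilate(img, size), size)
    return closing - img

# Usage: a single dark pixel on a uniform background is the only response.
patch = np.full((7, 7), 100.0)
patch[3, 3] = 10.0
response = bottom_hat(patch, size=3)
```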

Segmentation and Classification Evaluation Metrics
The performance of skin lesion segmentation and classification can be evaluated by precision, recall (sensitivity), specificity, accuracy, and F1-score, which are given in Equations (1) to (5) [26]:
\[ \text{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \tag{1} \]
\[ \text{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{2} \]
\[ \text{Specificity} = \frac{N_{TN}}{N_{TN} + N_{FP}} \tag{3} \]
\[ \text{Accuracy} = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}} \tag{4} \]
\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5} \]
where $N_{TP}$, $N_{TN}$, $N_{FP}$, and $N_{FN}$ are defined as the number of true-positive samples under each class, true-negative samples under each class, false-positive samples under each class (pixels with wrongly detected class), and false-negative samples (pixels wrongly detected as other classes), respectively. Other metrics for classification performance measurements include AUC and IoU. AUC stands for the area under the ROC (receiver operating characteristic) curve, and a higher AUC refers to a better distinguishing capability. Intersection over union (IoU) is a good metric for measuring the overlap between two bounding boxes or masks [27]. Table 2 provides a summary of the segmentation and classification tasks, DL methods, and the corresponding evaluation metrics.
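The count-based metrics in this section can be computed directly from a pair of binary label arrays or masks. The following is a minimal sketch using the standard definitions; the function names and the convention of returning 0.0 when a denominator is zero are assumptions made for this example, and AUC is omitted because it requires continuous scores rather than hard labels.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels or masks of equal shape."""
    y_true = np.asarray(y_true, dtype=bool).ravel()
    y_pred = np.asarray(y_pred, dtype=bool).ravel()
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn

def _safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0

def binary_metrics(y_true, y_pred):
    """Precision, recall, specificity, accuracy, F1-score, and IoU."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": _safe_div(tn, tn + fp),
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "iou": _safe_div(tp, tp + fp + fn),
    }

# Usage with TP = 2, FN = 1, FP = 1, TN = 4.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                        [1, 1, 0, 1, 0, 0, 0, 0])
```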

Skin Lesion Segmentation Methods
Segmenting the ROI is a crucial step in the diagnosis of skin lesions, directly impacting the informative feature extraction process and accurate classification.

Traditional Segmentation Methods
Traditional image segmentation methods use pixel, region, and edge-based approaches to extract skin lesions from images. Pixel-based segmentation methods, such as binary or Otsu thresholding, assign each pixel to one of two categories (i.e., healthy skin or skin lesion). However, these methods can generate discontinuous results, particularly from dermoscopic images, mainly due to low contrast and smooth transitions between lesions and healthy skin. Region-based segmentation can identify and combine adjacent pixels to form skin lesion regions by merging or growing regions. Merging combines adjacent pixels with similar intensity, while growing starts from a point and checks nearby pixels to expand region coverage [37,38]. The difficulties faced in implementing these methods lie in the variety of colors and textures of individual skin lesions. Edge-based segmentation methods, such as the watershed algorithm [39], utilize intensity changes between adjacent pixels to outline the boundary of a skin lesion. These methods are susceptible to noise, such as hair, skin texture, and air bubbles, which can cause the boundary to converge around noisy points and produce erroneous segmentation results. In general, traditional segmentation methods often struggle to achieve accurate results when segmenting images with noise, low contrast, and varied color and texture.
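To make the pixel-based approach concrete, the sketch below implements Otsu thresholding in NumPy by maximizing the between-class variance of the gray-level histogram. It assumes an 8-bit grayscale image and, for the hypothetical helper `segment_dark_lesion`, that the lesion is darker than the surrounding skin; neither assumption comes from the cited studies.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Return the gray level that maximizes the between-class variance."""
    values = np.asarray(gray, dtype=np.float64).ravel()
    hist, _ = np.histogram(values, bins=bins, range=(0, 256))
    prob = hist / hist.sum()
    levels = np.arange(bins)
    omega = np.cumsum(prob)            # probability of the class at or below each level
    mu = np.cumsum(prob * levels)      # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

def segment_dark_lesion(gray):
    """Binary mask of pixels at or below the Otsu threshold (lesion assumed darker)."""
    return np.asarray(gray) <= otsu_threshold(gray)

# Usage: a toy image whose top half is dark "lesion" and bottom half bright "skin".
toy = np.array([[50] * 4] * 2 + [[200] * 4] * 2)
threshold = otsu_threshold(toy)
mask = segment_dark_lesion(toy)
```

On real dermoscopic images the two intensity modes overlap, which is the source of the discontinuous results noted in the text.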
Neural networks (NNs), evolutionary computation, and fuzzy logic have been proposed as methods to segment the ROI based on learning, natural evolution, and human reasoning. These methods can be used individually or in combination to achieve better performance [22]. For example, the fuzzy method has been applied with both splitting and merging techniques to segment dermoscopic images [40]. This combination generated unsupervised perceptual segmentation using fused color and texture features. Later, Devi developed an automatic skin lesion segmentation system using fuzzy c-means (FCM) clustering together with histogram properties [41]. The histogram property was used to select the number of clusters, and the color channel for FCM clustering (hue, saturation, value) was determined based on individual entropy. This system can effectively segment the lesion regions from normal skin automatically with an accuracy of 95.69% when compared with traditional methods. Nevertheless, there is still much room for improvement in terms of accuracy.
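The fuzzy c-means step can be sketched in a few lines of NumPy using the standard alternating update of cluster centers and memberships. This is a generic FCM illustration, not the system of [41]: the histogram-based choice of cluster number and the entropy-based channel selection are not reproduced, and the function name, fuzzifier `m = 2`, and random initialization are assumptions.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Cluster samples with standard fuzzy c-means.

    x: array of shape (n_samples,) or (n_samples, n_features).
    Returns (centers, memberships); each membership row sums to 1.
    """
    x = np.asarray(x, dtype=np.float64)
    x = x.reshape(x.shape[0], -1)
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Centers are membership-weighted means of the samples.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)          # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(new_u - u)) < tol
        u = new_u
        if converged:
            break
    return centers, u

# Usage: pixel intensities from a dark region and a bright region.
pixels = np.array([10.0, 12.0, 11.0, 200.0, 198.0, 202.0])
centers, memberships = fuzzy_c_means(pixels, n_clusters=2)
labels = memberships.argmax(axis=1)
```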

DL Skin Lesion Segmentation Methods
DL approaches are capable of learning hierarchical features and extracting high-level and powerful characteristics from images. Both supervised and unsupervised DL approaches are used for effective computational segmentation. For example, Vesal proposed a supervised method called SkinNet, a modified version of U-Net, to segment skin lesions in the ISBI 2017 dataset, which is a subset of the ISIC archive that contains a larger number of images from seborrheic keratosis, benign nevus, and melanoma [28]. Al-Masni proposed a full-resolution convolutional network (FrCN) for skin lesion segmentation [29]. This method can directly learn the full-resolution features of each individual pixel of the input data without pre- or post-processing. Segmented skin lesions with finer boundaries were reported when compared with the results from other DL approaches, such as the fully convolutional network, U-Net, and SegNet, under the same conditions and with the same datasets.
Multiple DL methods can be combined to search for better solutions. Ünver combined the YOLO and GrabCut algorithms to detect lesion locations using the PH2 and ISBI 2017 datasets [30]. This method was capable of processing dimension-independent images, leading to high-resolution segmentation results. A hybrid DL approach named the Grasshopper Optimization Algorithm (GOA) based on K-means was used to extract the ROI from dermoscopic images following hair removal and contrast enhancement [31]. This hybrid approach achieved better segmentation accuracy when compared with traditional K-means for the ISIC 2017, ISIC 2018, and PH2 datasets.
Due to the tedious process required to obtain pixel-level annotations in skin lesion images, Ali compared the performance of the supervised U-Net and unsupervised methods using F1 scores and IoU [32]. As expected, the supervised method showed much better segmentation accuracy, while the unsupervised method was reported as a good approach when facing a shortage of image annotations.
To balance segmentation performance and annotation workload, a semi-supervised segmentation method was developed. It applied a combined CNN/Transformer model followed by a decoder module to learn a semantic segmentation map and enrich the encoder block by learning an auxiliary task during training [20]. The model outperformed the unsupervised methods in segmenting images from the ISIC 2017, ISIC 2018, and PH2 datasets.

Skin Lesion Classification
Generally, classification tasks in skin lesion analysis can be categorized into binary classification, which aims to identify melanoma for early treatment, and multi-class classification, which is used to identify a variety of skin diseases. Different skin lesions may exhibit similar sizes, textures, colors, and shapes, and there can be significant levels of correlation between skin lesions, which often poses challenges in classification. Both traditional ML methods and DL methods can be used; however, the former requires feature extraction and selection.

Feature Extraction and Selection
Feature extraction and selection aim to identify an optimal feature set with excellent discrimination capability, in this case for classifying skin lesion images. Expert knowledge and clinical experience are preferred, which may effectively and efficiently guide feature extraction and selection. Image processing algorithms can generate features useful for classification, but the effectiveness of extracted features relies on the disease symptoms and diagnosis tasks at hand. To yield accurate melanoma diagnosis, various methods are explored to extract color, texture, edge, and shape features. In the popular ABCD rule, features are selected based on the knowledge that melanoma has an asymmetrical shape, an irregular border, multiple colors and shapes, and a diameter larger than 6 mm. Specifically, A stands for asymmetry, B stands for border, C stands for color, and D stands for diameter [42]. Similar features are extracted using the CASH mnemonic, i.e., color, architecture, symmetry, and homogeneity [43]. In addition, a three-point checklist method was proposed based on the asymmetry of color and structure, atypical network, and blue-white structures observed in melanoma diagnosis [44]. This method was further extended to a seven-point checklist by adding the size of the lesion, itch, or altered sensation of the lesions [45].
The color features can be extracted from the HSV space for hue, saturation, and value; the YCbCr space for luminance and the blue-difference and red-difference chroma; and the grayscale space. Texture features extracted from grayscale images can reflect the surface roughness of the lesions, such as through the gray-level co-occurrence matrix (GLCM), which includes contrast, correlation, energy, and homogeneity [46], and the neighborhood gray-tone difference matrix, which includes coarseness, busyness, complexity, and texture strength [47]. Additionally, the histogram of oriented gradients (HOG) can be adopted to judge the shape and edge of skin lesions using uniformity or regularity [48].
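As an illustration of the GLCM texture features named above, the following NumPy sketch builds a symmetric, normalized co-occurrence matrix for one pixel offset and derives contrast, correlation, energy, and homogeneity. Definitions vary between toolkits; here energy is taken as the sum of squared matrix entries, correlation is set to 1.0 for a perfectly uniform image, and the quantization to a small number of gray levels assumes 8-bit input. These conventions and the function names are assumptions for this example.

```python
import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dy, dx)."""
    gray = np.asarray(gray)
    q = (gray.astype(np.int64) * levels) // 256   # quantize 8-bit values
    h, w = q.shape
    src = q[0:h - dy, 0:w - dx]
    dst = q[dy:h, dx:w]
    mat = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(mat, (src.ravel(), dst.ravel()), 1.0)
    mat = mat + mat.T                              # count both directions
    return mat / mat.sum()

def glcm_features(p):
    """Contrast, correlation, energy, and homogeneity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i > 0 and sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0                          # uniform image (assumed convention)
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "correlation": float(correlation),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }

# Usage: a checkerboard has maximal horizontal contrast for two gray levels.
board = (np.indices((4, 4)).sum(axis=0) % 2) * 255
board_features = glcm_features(glcm(board, levels=2))
flat_features = glcm_features(glcm(np.full((4, 4), 128), levels=8))
```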
From the above discussion, it can be seen that many image features to be extracted are determined from expert knowledge and practical experience, but they may not be all-encompassing in representing skin lesions. The same feature set may not perform well under different tasks, and such features should be carefully chosen for each specific task [49]. For example, the features extracted using the ABCD rule may yield high accuracy in melanoma detection but not in multiple skin disease classification.
Moreover, the existence of redundant or noisy information can produce irrelevant features that degrade overall diagnostic accuracy. Therefore, it is necessary to select representative feature subsets based on the research tasks and available data and eventually build a classification model with fast training, better performance, and lower complexity.

DL-Based Feature Extraction and Selection Methods
To reduce feature dimensions while preserving the original feature space information, researchers have attempted to explore DL models for feature extraction, such as VGG16, VGG19, and Inception V3 [33]. The extracted features are used as inputs of NNs to classify moles as benign or malignant. The features extracted by the Inception V3 model gave better accuracy than those from the two VGG networks for the same classification method. DL model-based feature extraction has also been used in multi-class classification. For example, Kassem used the original fully connected layers with GoogLeNet architecture to extract features and fed them into an SVM for an eight-category classification [34].
DL has also been used in feature selection, which integrates deep feature information to generate the most discriminant feature vector for the classification task. For example, the pre-trained ResNet-50 and ResNet-101 were utilized for feature extraction and then the kurtosis-controlled maximal feature (KcMF) approach was utilized for prominent feature selection [35]. The top 70% of features from the two ResNet models were fused and fed into an SVM for melanoma diagnosis using three datasets: HAM 10,000, ISBI 2017, and ISBI 2016. To recognize skin cancer, Khan combined a DenseNet pre-trained convolutional neural network (CNN) model for deep feature extraction and an iteration-controlled Newton-Raphson (IcNR) method for feature selection [36]. For effective feature selection, a framework with the entropy-controlled neighborhood component analysis was proposed to select principal features and discard redundant features [50]. The effectiveness of this framework was validated on four benchmark datasets, including PH2, ISIC MSK, ISIC UDA, and ISBI-2017. Similarly, Amin fused deep features extracted from the pre-trained AlexNet and VGG16 and then applied principal component analysis (PCA) for optimal feature selection when classifying skin images into benign and malignant [51].
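The fuse-then-reduce pattern described above can be sketched with plain NumPy: two deep feature matrices are concatenated per image and then projected onto their leading principal components. This is a generic PCA illustration rather than the method of [51]; the feature matrices below are random stand-ins for network activations, and the function name `fuse_and_reduce` is hypothetical.

```python
import numpy as np

def fuse_and_reduce(feats_a, feats_b, n_components):
    """Concatenate two feature sets and project onto the top principal components.

    feats_a, feats_b: arrays of shape (n_samples, d_a) and (n_samples, d_b).
    Returns (projected features, explained variance ratio per kept component).
    """
    fused = np.concatenate([feats_a, feats_b], axis=1)
    centered = fused - fused.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ components.T, explained[:n_components]

# Usage with random stand-ins for deep features from two backbones.
rng = np.random.default_rng(0)
features_a = rng.normal(size=(20, 5))
features_b = rng.normal(size=(20, 3))
reduced, ratio = fuse_and_reduce(features_a, features_b, n_components=4)
```

The reduced matrix could then be passed to any of the classifiers discussed in this review, such as an SVM.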
The classification results vary with combinations of feature sets and classifiers. Hegde combined color and texture features in three ways and fed them into four classification models [52]: the artificial neural network (ANN), linear discriminant analysis (LDA), SVM, and Naïve Bayes. The extracted features were derived from the ABCD rule, HOG, and GLCM, which were then compared individually and in a hybrid way. The LDA classifier using color features and the SVM classifier using texture features gave the best accuracy in both binary and multi-class classification. The features obtained from the application of the ABCD rule were more informative than the others when applied individually. Among the combined texture and color features, the combination of HOG and GLCM was the best choice when classifying using either LDA or SVM [52].
With very limited training data, more representative features were generated using a deep CNN and feature encoding strategy [53]. The generated features could deal with large variations within melanoma classes, as well as small variations between melanoma and non-melanoma classes in this classification task.

Traditional ML Models for Skin Disease Classification
The traditional ML methods shown in Figure 4 use extracted and selected features as inputs for classification tasks. After feature selection, traditional ML models such as the SVM, Naïve Bayes (NB) classifier, and K-nearest neighbor (KNN) are widely used to classify skin diseases or distinguish melanoma from pigmentation. These traditional ML algorithms with standard structures can generally produce good results in terms of both precision and accuracy [54]. The SVM can categorize skin lesions by creating a decision plane to separate different classes based on extracted color, texture, shape, and edge features. The KNN differentiates normal skin and skin lesions by comparing the similarity of the input features. Tree-structured classification methods, such as decision tree (DT) and random forest (RF), can handle non-numeric features from the ABCD rule, such as an asymmetrical shape and an irregular border. The NB can classify skin lesion images into their corresponding disease categories based on the highest probability [54].
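To show how a similarity-based classifier such as KNN operates on extracted features, the sketch below classifies query feature vectors by majority vote among their nearest training samples. The two-dimensional toy features, the Euclidean distance, and the function name `knn_predict` are assumptions for illustration and do not correspond to any dataset discussed here.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Predict labels by majority vote among the k nearest training samples."""
    train_x = np.asarray(train_x, dtype=np.float64)
    query_x = np.asarray(query_x, dtype=np.float64)
    train_y = np.asarray(train_y)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

# Usage: hypothetical 2-D features for two classes (0 = benign, 1 = lesion).
train_features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = [0, 0, 0, 1, 1, 1]
predicted = knn_predict(train_features, train_labels,
                        [[0.2, 0.1], [5.5, 5.5]], k=3)
```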
To compare multi-class categorization performance, Hameed used two strategies to classify skin images [55]. One is a three-category classification: healthy, inflammatory diseases (acne, psoriasis, and eczema), and non-inflammatory diseases (benign and malignant). The other is a six-category classification: healthy, acne, eczema, psoriasis, benign, and malignant. DT, SVM, KNN, and ensemble classifiers with different kernels were applied to the two classification strategies. The result showed that the classification accuracy decreased for all classifiers when the number of categorized classes increased, and quadratic SVM achieved the highest classification accuracy in both strategies. For the same identification task, Hameed also compared the performance of multi-level multi-class classification and single-level multi-class classification [23]. The multi-level multi-class classification system, implemented using both traditional ML and DL, achieved better classification accuracy.
In summary, the research activities in traditional ML classification models focus on the relevant parameter optimization and input feature selection. Researchers continue to explore the combination of ML algorithms, parameters, and extracted features. Although these activities have been addressed in several studies and successful applications have been made, there is potential to develop new methodologies and improve classification performance.

Deep Learning Models for Skin Disease Classification
DL has become more sophisticated in its use for image recognition and has achieved increasingly good outcomes [56-59]. As a subcategory of ML, it can automatically discover underlying patterns and identify the most descriptive and salient features for image binary and multi-class classifications, as described by Figure 5. DL methods are able to process raw image data directly, without the need for a feature preparation step, although they require higher computational costs. Researchers have compared the performance of various DL methods to balance computational cost and classification performance. For example, Ali et al. reported that the DCNN model achieved better performance with less computation time in skin lesion recognition using the HAM 10,000 dataset, compared with the ResNet, AlexNet, MobileNet, VGG-16, and DenseNet models [60]. Additionally, modified DL model architectures have been adopted to enhance multi-class skin lesion classification [34]. One method is to replace the last three layers of the GoogLeNet architecture with a fully-connected SoftMax and a classification output layer for the eight-class classification task. The same task can also be achieved by using the original fully connected layers of the GoogLeNet architecture to extract features and feed them into an SVM. The former achieved better multi-class classification performance for the imbalanced data distribution of the ISIC 2019 dataset.
To take advantage of multiple DL methods, hybrid networks are proposed to diagnose skin diseases. Ahmed combined two CNN architectures, i.e., ResNet and Inception V3 [61], and classified skin lesions into seven types. This ensemble network achieved better accuracy and precision compared to an individual network for the ISIC 2018 dataset. To detect different perspectives of skin lesions, multiple classifiers are used and each classifier focuses on learning one specific aspect. The outputs of these classifiers can be combined to determine the final decision. For example, a two-level system with multiple classifiers was used to exploit the different characteristics of melanoma [62]. The first layer included five individual classifiers: a perceptron combined with color local binary patterns, a perceptron combined with color HOG, one GAN for segmentation coupled with the classic ABCD rule, and two end-to-end individual CNNs (ResNet and AlexNet). A final perceptron was designed to combine these classifiers for melanoma diagnosis. The result exceeded the performance of the individual classifiers when testing with the ISIC 2019 and PH2 databases.
Different loss functions have been explored for the precise detection of melanoma. Since the size of the skin lesion is critical to identifying the disease stage, in two-stage and melanoma cancer classifications, Patil combined the CNN architecture with various loss functions, including hinge loss, KL loss, MSE loss, cosine loss, cross-entropy loss, and similarity measure for text processing (SMTP) [63]. Eventually, an improved CNN architecture with the SMTP loss function was chosen due to its better classification performance and efficiency.
The choice of loss function can significantly impact the performance of DL models. In image segmentation tasks, commonly used loss functions include (a) Dice loss, which measures the overlap between the predicted and ground truth segmentations [64,65], (b) Jaccard loss, which is based on the ratio of the intersection to the union of the two sets [66], (c) binary cross-entropy loss, which measures the pixel-wise discrepancy between the predicted and true binary masks in binary segmentation [67], and (d) categorical cross-entropy loss for multi-class segmentation with several object classes [65]. In classification tasks, common loss functions include (1) binary cross-entropy loss, which measures the discrepancy between the predicted and true probability distributions over the two classes [68], (2) categorical cross-entropy loss for multi-class classification [68], (3) focal loss, which addresses class imbalance by down-weighting easy, well-classified examples so that training focuses on hard, misclassified examples, which often belong to the minority class, and (4) label smoothing loss, which regularizes the output of the network by softening the one-hot ground truth labels. Overall, the loss function should be selected based on the nature of the task and the characteristics of the datasets, and this selection is often an important aspect of designing and training DL models.
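To make these definitions concrete, the following minimal sketch implements several of the losses on flat lists of per-pixel (or per-sample) probabilities. It uses the common textbook formulations; the smoothing constant `smooth`, the focal parameters `gamma` and `alpha`, and the label smoothing factor `epsilon` are conventional default choices assumed here, not values taken from the reviewed papers.

```python
import math

EPS = 1e-7  # clamp to avoid log(0)


def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; pred in [0, 1], target in {0, 1}."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(target) + smooth)


def jaccard_loss(pred, target, smooth=1.0):
    """1 - intersection over union (soft version)."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


def binary_cross_entropy(pred, target):
    """Mean binary cross-entropy over all elements."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def focal_loss(pred, target, gamma=2.0, alpha=0.25):
    """Binary focal loss: easy examples are down-weighted by (1 - p_t)^gamma."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1.0 - EPS)
        p_t = p if t == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total -= alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)


def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot target: (1 - eps) * y + eps / K."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]
```

Setting `gamma=0` and `alpha=0.5` reduces the focal loss to half of the binary cross-entropy, which is a convenient sanity check when experimenting with these parameters.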
As discussed in Section 2.1, some skin lesion datasets are unlabeled, which requires more effective learning rules. A convolutional spiking NN with an unsupervised spike timing-dependent plasticity (STDP) learning rule was developed to distinguish melanoma from melanocytic nevi [69]. As an unsupervised process, this method can learn effectively from small training datasets without an obvious negative influence from overfitting.
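For readers unfamiliar with STDP, the sketch below shows the generic pair-based form of the rule: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise, with an exponential dependence on the time difference. This is a textbook formulation given for illustration; the network in [69] may use a different (for example, simplified) variant, and the constants `a_plus`, `a_minus`, and `tau` are assumed values.

```python
import math


def stdp_weight_change(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP weight change for one pre/post spike pair.

    t_pre, t_post : spike times in milliseconds
    a_plus        : learning rate for potentiation (assumed value)
    a_minus       : learning rate for depression (assumed value)
    tau           : time constant of the STDP window in ms (assumed value)
    """
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: strengthen the synapse
        return a_plus * math.exp(-dt / tau)
    if dt < 0:   # post fires before pre: weaken the synapse
        return -a_minus * math.exp(dt / tau)
    return 0.0


def update_weight(w, t_pre, t_post, w_min=0.0, w_max=1.0):
    """Apply the STDP change and keep the weight inside [w_min, w_max]."""
    return min(max(w + stdp_weight_change(t_pre, t_post), w_min), w_max)
```

Because the update depends only on spike timing and not on labels, the rule can be applied to unlabeled images once they are encoded as spike trains.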
In short, research on DL-based skin lesion classification is moving towards exploring available methods, developing new ones, and combining different DL techniques. These efforts include network structure design, layer connection, loss function application, and activation function design. For datasets with large numbers of images per class, DL performs better than traditional ML, and even for datasets with few images, DL can mitigate this limitation through data augmentation.
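As a simple illustration of data augmentation, the sketch below generates flipped and rotated copies of an image stored as a nested list of pixel values. The function names are hypothetical, and real pipelines typically add further transformations (random crops, color jitter, scaling) through an image library; only label-preserving geometric transforms are shown here.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Return the original image plus five geometric variants."""
    variants = [img, flip_horizontal(img), flip_vertical(img)]
    rotated = img
    for _ in range(3):  # 90, 180, and 270 degree rotations
        rotated = rotate_90(rotated)
        variants.append(rotated)
    return variants


# A toy 2 x 2 "image" yields six training samples with the same label.
toy_image = [[1, 2],
             [3, 4]]
augmented = augment(toy_image)
```

Skin lesions have no canonical orientation, which is why flips and rotations are generally safe choices for this kind of data.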
There is no solid evidence that DL methods always outperform traditional ML methods in skin disease diagnosis. ML can outperform DL models when representative features are well extracted for the recognition task. For example, Alsaade et al. reported that the performance of an ANN model was superior to that of the AlexNet and ResNet50 models when tested on the ISIC 2018 and PH2 datasets [70]. In this study, 216 features were extracted using the local binary pattern and GLCM, and then fed into the ANN algorithm to detect skin diseases.
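The two handcrafted descriptors mentioned above can be sketched as follows. This is a minimal illustration of the basic 8-neighbor local binary pattern and of a gray-level co-occurrence matrix (GLCM) with one derived statistic (contrast); it is not the 216-feature pipeline of [70], and the function names, neighbor ordering, and pixel offset are assumptions made for the example.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbor LBP code of the pixel at (r, c).

    Neighbors are visited clockwise from the top-left; a neighbor that is
    greater than or equal to the center contributes one bit to the code.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def glcm(img, levels, dr=0, dc=1):
    """Count co-occurrences of gray levels for pixel pairs offset by (dr, dc)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[img[r][c]][img[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: co-occurrence-weighted mean of squared gray-level differences."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    n = len(matrix)
    return sum(matrix[i][j] * (i - j) ** 2
               for i in range(n) for j in range(n)) / total


# Toy 3 x 3 patch with four gray levels (0 to 3).
patch = [[0, 0, 1],
         [1, 2, 3],
         [2, 3, 3]]
center_code = lbp_code(patch, 1, 1)      # 120 for this patch
cooccurrence = glcm(patch, levels=4)     # horizontal neighbor pairs
contrast = glcm_contrast(cooccurrence)
```

In a full pipeline, a histogram of LBP codes over the lesion region and several GLCM statistics (contrast, energy, homogeneity, correlation) would be concatenated into the feature vector passed to the classifier.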

Current Status, Challenges, and Outlook
Automatic skin lesion image segmentation and classification is a well-established and continuously expanding area of research. Its aim is to provide systematic solutions for quantitatively analyzing images, rapidly detecting ROIs, and reliably diagnosing diseases. With the advancement of image processing methods, these studies offer the potential for developing online diagnosis platforms. However, several challenges and concerns must be addressed before these methods can be exploited further.

Current Research Publication Status
To explain the current research status in skin lesion detection, we conducted a search on 7 February 2022 of the PubMed database, a free resource that supports the search and retrieval of biomedical and life sciences literature; the search results can be found in the supplementary material. The research papers were searched using the keywords "skin lesion classification", "skin lesion segmentation", "skin lesion detection", and "skin lesion recognition", while removing papers without "skin" in the title. The search results, shown in Table 3, indicate a dramatic increase in skin diagnosis publications from 2020, with the number of publications reaching 300 in 2022. Melanoma and skin cancer are identified as the principal areas of focus in skin lesion recognition and have been discussed in 246 papers. From the collected publications, ML and DL applications in skin disease recognition were discussed in 266 papers, as shown in the supplementary document. Out of these, 224 papers mentioned DL applications in skin lesion diagnosis, while the remaining 42 research papers used ML methods. The other 28 papers discussed AI applications in skin diagnosis. Thus, in descending order of preference, the methodologies adopted by researchers are DL, ML, and other AI methods. The most popular DL method reported is the CNN, which appeared in 43 papers. This coincides with the contribution of CNNs to image recognition tasks. An interesting observation is the significant number of transfer learning methods found in recent publications, indicating a research trend in skin image diagnosis [71]. As expected, dermoscopic images were the most popular image type used in melanoma and skin lesion identification.

Macroscopic Images with Robust Diagnosis
The different characteristics of macroscopic and dermoscopic images require particular rules and methods in skin lesion diagnosis, and a direct inference between the two types of images has not yet been established. From the published datasets, one may conclude that dermoscopic images are more commonly collected and, therefore, used more often in computational diagnosis, with better segmentation and classification performance. The diagnosis performance on macroscopic or close-up images is less robust due to the limited number of available images and inconsistent image quality. However, the growing usage of smartphones makes captured macroscopic images more convenient for online skin lesion diagnosis. To advance recognition model development in skin diagnosis, images collected using smartphones should match the clinician's view during skin lesion examination, and the related features from those images should comprehensively support the diagnosis. Moreover, ethical issues in the dataset collection process should be taken into consideration, especially regarding facial images. Finally, recognition model development should not only target well-known diseases but also progressive or unknown diseases.

Racial and Geographical Biases in Public Datasets
Advances in automatic diagnostics are largely driven by the use of datasets with large repositories of digital images. Public datasets are compilations of data accessible through an online platform, typically made available by government agencies, academic institutions, or private organizations. Due to the challenges associated with gathering skin lesion images, utilizing publicly available datasets is frequently a more practical alternative to collaborating with medical institutions to obtain such datasets. Public datasets such as the ISIC contain skin disease images mostly from subjects with light-colored skin, collected in the USA, Europe, and Australia. The current recognition models are trained using these unevenly distributed datasets. While such recognition models may have outstanding capabilities to recognize lesions on subjects with light-colored skin, their effectiveness is in question when dealing with skin lesion images from subjects from other geographical regions. Moreover, the prevalence and characteristics of skin disease vary among racial and ethnic groups. Traditional ML and DL models can fail to offer correct recognition when a test image is from an under-represented skin color group and/or lesion type. More balanced datasets are needed, with clinical data covering gender, age, skin type, and race. This is critical when using AI diagnosis to improve healthcare in rural areas and increase global access to specialist expertise.
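One simple way to expose such bias is to report accuracy separately for each subgroup instead of a single overall figure. The sketch below assumes that each test image carries a group annotation (for example, a skin tone category); the function name, labels, and toy values are made up for illustration and do not come from any of the reviewed datasets.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group to its classification accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Made-up toy predictions for six test images.
y_true = ["mel", "nev", "mel", "nev", "mel", "nev"]
y_pred = ["mel", "nev", "mel", "mel", "nev", "nev"]
groups = ["light", "light", "light", "dark", "dark", "dark"]

per_group = accuracy_by_group(y_true, y_pred, groups)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the overall accuracy hides a large gap between the two groups, which is the kind of discrepancy that an evaluation on a balanced dataset is meant to reveal.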

Dataset Characteristics and DL Methods
Along with the rapid development of DL methods, skin lesion diagnosis is moving from an expertise-oriented approach towards a computational intelligence-based approach. Most DL methods applied to skin lesion analysis are supervised, which presumes correctly labeled training datasets. However, the number of labeled images is not always sufficient for building a classification model. Thus, semi-supervised and weakly-supervised DL methods may be good alternative choices, although they may not achieve performance comparable to that of supervised methods because the skin lesion datasets are incompletely, inexactly, or inaccurately labeled. Nevertheless, semi-supervised DL methods can be trained using incompletely labeled datasets, and weakly-supervised methods can handle inaccurately and inexactly labeled datasets. Furthermore, it may be possible to adopt unsupervised DL methods in the case of unlabeled skin disease images. For example, GAN-based unsupervised methods can learn relatively consistent patterns from large-scale data without expensive annotation and personal bias [58,72]. Some efforts have been reported on the use of unsupervised methods in biomedical image processing [4].
Meanwhile, unsupervised domain adaptation (UDA) is also a good alternative for reducing the annotation requirement by transferring knowledge from labeled data to unlabeled data. Few studies have been reported using fully unlabeled datasets in skin disease recognition, which is likely due to the expected task difficulty and low model performance [71].
Most researchers working on diagnosis method development come from computer science backgrounds and do not have a fundamental clinical understanding of the diagnosis domain. As such, the current ML and DL models are developed using publicly accessible datasets, which may not be suitable for analyzing real clinical data collected from daily consultations. In other words, the effectiveness of the developed diagnosis models needs to be verified against further independent data that are not currently available. In addition, future diagnosis frameworks should consider user requirements from the perspectives of both doctors and patients.

Conclusions
AI-based skin lesion diagnosis is an increasingly attractive research area, which has been largely driven by the availability of appropriate methods and abundant, continually updated datasets. Although relevant topics have been addressed over the last decade, there are still many aspects for investigation and room for improvement.
This paper reviews public skin lesion datasets, the applied image preprocessing methods, and the subsequent skin lesion segmentation and classification methods. The current status, challenges, and outlook in ML-driven skin disease diagnosis are also discussed. Such studies can empower the development of advanced concepts and methodologies. In conclusion, future trends regarding image segmentation and classification of skin lesions require the development of more comprehensive datasets, investigation of more robust models, particularly for macroscopic image recognition, and methods for increasingly reliable automated diagnosis.

Figure 2. Schematic diagram of skin image diagnosis.

Figure 3. Images with various types of noise from the ISIC archive.

Table 1. Summary of publicly available skin lesion datasets.

Table 3. Number of publications for skin diagnosis.