Machine Learning and Deep Learning Methods for Skin Lesion Classification and Diagnosis: A Systematic Review

Computer-aided systems for skin lesion diagnosis are a growing area of research, and researchers have recently shown increasing interest in developing them. This paper aims to review, synthesize and evaluate the quality of evidence for the diagnostic accuracy of computer-aided systems. This study discusses the papers published in the last five years in ScienceDirect, IEEE, and SpringerLink databases. It includes 53 articles using traditional machine learning methods and 49 articles using deep learning methods. The studies are compared based on their contributions, the methods used and the achieved results. The work identified the main challenges of evaluating skin lesion segmentation and classification methods such as small datasets, ad hoc image selection and racial bias.


Introduction
The annual incidence of melanoma cases has increased by 53%. This is due in part to increased ultraviolet (UV) exposure [1]. Despite the fact that melanoma is one of the deadliest types of skin cancer, early identification can lead to a high chance of survival.
Cancer develops when cells in the body begin to proliferate uncontrollably. Metastasizing means that cancerous cells may form in practically any place of the body and spread [2]. In this regard, the uncontrolled proliferation of abnormal skin cells is referred to as skin cancer. Uncorrected DNA damage to skin cells, most typically produced by UV radiation from the sun or tanning beds, creates mutations, or genetic flaws, that cause skin cells to reproduce rapidly and produce malignant tumors.
There are several varieties of benign and malignant skin lesions, which makes their diagnosis complex. Squamous Cell Carcinoma (SCC), Basal Cell Carcinoma (BCC), and melanoma are the major forms of irregular skin cells seen in clinical practice [3]. Further, the Skin Cancer Foundation (SCF) [4] distinguishes three less common types of abnormal cells, namely Merkel cell carcinoma, Actinic Keratosis (AKIEC), and atypical moles. The six forms of skin lesions are depicted in Figure 1. After melanoma, atypical moles are the second most harmful. According to SCF [4], the following are the distinctions between abnormal tissues:

1. Actinic Keratosis (AKIEC) or solar keratosis: This is a form of keratosis that occurs on the skin. It is a crusty, scaly growth on the skin. It is classified as pre-cancer because it has the potential to turn into skin cancer if left untreated. The dominating cause of such skin cancer forms is skin tissue damage caused by UV radiation [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21].

A dermatologist's visual examination is a common clinical procedure for melanoma diagnosis [18]. The precision of clinical diagnosis alone can be misleading [19]. Dermoscopy is a non-invasive diagnostic method that links clinical dermatology with dermatopathology by revealing morphological characteristics that are not discernible in a naked-eye examination. The visualization of morphological details can be significantly improved with different techniques such as solar scans [20], epiluminescence microscopy (ELM), cross-polarization epiluminescence (XLM), and side transillumination [21][22][23][24]. The dermatologist thereby obtains further diagnostic criteria. Dermoscopy improves diagnostic performance by 10-30% relative to naked-eye examination [25][26][27][28]. Nevertheless, [29][30][31] reported that the diagnostic accuracy of dermoscopy was lower for novice dermatologists than for expert dermatologists because the technique requires a great deal of experience to identify lesions [32].
According to ref. [33], professional dermatologists achieved 90% sensitivity and 59% specificity in the identification of skin lesions. By contrast, the figures for less experienced physicians were considerably lower, at about 62-63% for general practitioners.
A visual inspection of the suspicious skin region by a dermatologist is the first step in diagnosing a malignant lesion. A correct diagnosis is essential because certain types of lesions have similarities; moreover, the accuracy of Computer-Aided Diagnosis (CAD) systems is close to that of an experienced dermatologist [34][35][36]. Without the use of technology, dermatologists can diagnose melanoma with a 65-80% accuracy rate [37]. For suspect cases, a dermatoscopic image is taken using a very high-resolution camera to complete the visual examination. The lighting is controlled and a filter is used during the recording to reduce skin reflections and thus visualize the deeper layers of skin. This technical assistance led to a 49% improvement in skin lesion diagnosis [31]. Ultimately, the combination of a visual examination and dermatoscopy images led to an absolute accuracy of 75-84% for melanoma detection [38,39].
Classifying lesions of the skin has been an aim of the machine learning community for some time. Automated classification of lesions is used in clinical examination to help physicians and allow rapid and affordable access to lifesaving diagnoses [40], and outside of the hospital environment, smartphone apps have been used [41]. Before 2016, most research adopted the traditional machine learning workflow of preprocessing (enhancement), segmentation, feature extraction, and classification [41][42][43]. These phases are explained as follows:

1. Image enhancement: This phase aims to eliminate all noise and artifacts such as hair and blood vessels in dermoscopic images;
2. Segmentation: Segmenting the Region of Interest (ROI) is a crucial step in CAD systems. The process of segmenting skin cancer images is made more complex by the large number of different skin lesions. It quickly became one of the most complex and tedious tasks in the CAD system;
3. Feature extraction: After defining the ROI, the goal of the feature extraction step is to identify the best set of features that have high discrimination capability to classify the dataset into two or more classes;
4. Classification and detection: The proposed system is evaluated according to its capability to classify the dataset into different classes. Hence, the choice of classifier is critical for better performance. However, it depends on the set of extracted features and the required number of classes. The classification performance measures are accuracy, specificity, sensitivity, precision, and Receiver Operating Characteristic (ROC).
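As an illustration of the performance measures listed above, the following minimal sketch derives accuracy, sensitivity, specificity, and precision from the four cells of a binary confusion matrix. The class name `ConfusionCounts`, the function name `diagnostic_metrics`, and the example counts are hypothetical and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive = malignant)."""
    tp: int  # malignant lesions correctly flagged
    fp: int  # benign lesions wrongly flagged
    tn: int  # benign lesions correctly cleared
    fn: int  # malignant lesions missed


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard threshold-based performance measures."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "precision": c.tp / (c.tp + c.fp),    # positive predictive value
    }


# Illustrative counts only (assumed, not from the paper).
counts = ConfusionCounts(tp=45, fp=20, tn=80, fn=5)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

An ROC curve is obtained by recomputing sensitivity and specificity while sweeping the decision threshold of the classifier, which is why it is reported separately from the single-threshold measures above.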
High-performance CAD systems are essential for lesion detection and diagnosis. Feature selection is a crucial task in CAD system development, and choosing appropriate features has been a long-standing problem since the automatic recognition of pigmented skin lesion images was addressed in 1987 [44]. In the same manner, errors and data loss have a significant influence on the classification rate. For example, an inaccurate segmentation result often leads to poor feature extraction and, thus, low classification accuracy. Machine vision and computer analysis are becoming more critical to produce a successful automatic melanoma diagnosis system [45][46][47][48][49][50]. An accurate CAD system would help doctors and dermatologists to make better and more dependable diagnoses.
Many CAD systems have been proposed using different border detection, feature extraction, feature selection, and classification algorithms. Some studies [51][52][53][54][55][56][57][58] have analyzed image processing techniques to diagnose skin cancer; moreover, they compared Artificial Intelligence (AI) and CAD system performance against the diagnostic accuracy of experienced dermatologists. However, further work is required to define and reduce ambiguity in automated decision support systems to enhance diagnostic accuracy. There is no comprehensive and up-to-date review of automatic skin lesion diagnostic models. The constant development, in recent years, of new classification algorithms and techniques for dermoscopic images would benefit from such a study.

Systematic Review
We searched for systematic reviews and original research papers written in English in the ScienceDirect, IEEE, and SpringerLink databases. In this analysis, only papers published in journals and in properly recorded scientific proceedings were considered.
Papers were included based on the following inclusion criteria: (i) binary or multi-class classification or segmentation of skin lesions, (ii) traditional machine learning method, (iii) deep learning models, (iv) digital image modality, (v) papers published in well-defined journals, and (vi) published in English.
Irrelevant studies were excluded based on the following exclusion criteria: (i) review articles, (ii) papers published in a language other than English, (iii) conference papers, (iv) books, and (v) book chapters.
The PRISMA flow diagram in Figure 2 shows the selection procedure [59]. The initial search identified 111,701 literature sources satisfying the search criteria. These sources were supplemented with 5757 records identified using other methods (forward and backward snowballing). After the removal of duplicate records, 106,398 records remained. After applying the inclusion criteria, 801 full-text articles were identified, which were further inspected by applying the exclusion criteria. Finally, 53 articles using traditional machine learning methods and 49 articles using deep learning were selected. The selected articles were further analyzed and their results are discussed in this study. In addition, for studies that tested several models, we listed only the model with the best score on each sample.
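The counts reported for the selection procedure can be cross-checked with simple arithmetic. The sketch below derives the numbers implied at each stage; the derived quantities (duplicates removed, records excluded at each step) are not stated in the text and are computed here only as a consistency check, under the assumption that the stages are strictly sequential.

```python
# Counts stated in the selection procedure.
identified_by_search = 111_701
identified_by_other_methods = 5_757
after_deduplication = 106_398
full_text_assessed = 801
included_ml = 53
included_dl = 49

# Quantities implied by the stated counts (derived, not reported).
total_identified = identified_by_search + identified_by_other_methods
duplicates_removed = total_identified - after_deduplication
excluded_by_inclusion_criteria = after_deduplication - full_text_assessed
total_included = included_ml + included_dl
excluded_by_exclusion_criteria = full_text_assessed - total_included

print("total identified:", total_identified)
print("duplicates removed:", duplicates_removed)
print("excluded by inclusion criteria:", excluded_by_inclusion_criteria)
print("excluded by exclusion criteria:", excluded_by_exclusion_criteria)
print("included in the review:", total_included)
```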

Datasets
The performance of melanoma diagnosis has improved with the dermoscopic method [21]. Dermoscopy is a non-invasive skin imaging technique that captures illuminated and magnified pictures of skin lesions to improve the clarity of the spots. The visibility of deeper skin layers can be improved by removing the reflection from the skin surface [60]. Automatic identification of melanoma in dermoscopic images is a challenging task due to many factors: firstly, intraclass variability in lesions such as texture, scale, and color; secondly, the high resemblance between melanoma and non-melanoma lesions; finally, artifacts surrounding the lesion, including hair, veins, color calibration charts, and ruler marks.
In this section, the most used datasets in this area of research are described. A broad range of datasets was used, both freely available online, such as MED-NODE, DermIS, DermQuest, ISIC 2016, 2017, 2018, and 2019, and PH2, and paid, such as Dermofit.
MED-NODE consists of 170 images, of which 70 are melanoma and 100 are nevus. The dataset came from the Digital Image Archive of the University of Medicine in the Netherlands, Department of Dermatology. A system for detecting skin cancer from these images was developed and tested on it [61]. DermIS [62] is the Dermatology Information System skin dataset. This dataset is divided into two classes, nevus and melanoma. It contains 69 images: 26 nevus and 43 melanoma. DermQuest is a dataset consisting of 137 images. These images are divided into two classes, melanoma and nevus, which have 76 and 61 images, respectively [63].
The PH2 [64] database was created in collaboration with Porto University, Technical University of Lisbon, and Hospital Pedro Hispano in Matosinhos. It is comprised of 200 RGB color images with a 768 × 560 pixel resolution. This dataset has three groups of images: melanoma, normal nevus, and atypical nevus, with 40, 80, and 80 images in each category, respectively.
The skin colors represented in the PH2 database range from white to cream white. As illustrated in Figure 3, the images were carefully selected, taking into account their resolutions, quality, and dermoscopic features.
The International Skin Imaging Collaboration (ISIC) 2016 [65] dataset, which is referred to as the 2016 ISIC-ISBI challenge, provides 900 training images. Participants can produce and submit automated results using a separate test dataset (350 images). The training dataset consists of two classes, melanoma and benign, which contain 173 and 727 dermoscopic images, respectively.
The International Skin Imaging Collaboration (ISIC) 2017 dataset [66] is also referred to as the 2017 ISBI Challenge on Skin Lesion Analysis Towards Melanoma Detection. This challenge provides training data (2000 images), a separate validation dataset (150 images), and a blind held-out test dataset (600 images). The training dataset consists of three classes divided into 374, 254, and 1372 dermoscopic images for melanoma, seborrheic keratosis, and nevus, respectively.
The International Skin Imaging Collaboration (ISIC) 2018 dataset [67,68], which is also referred to as the HAM10000 ("Human Against Machine with 10,000 training images") dataset, was divided into a training dataset, consisting of 10,015 images, and a test dataset, consisting of 1512 images. This dataset was compiled using a variety of dermatoscopy techniques on all anatomic sites (except mucosa and nails) from a retrospective sample of patients who had undergone skin cancer screening at multiple institutions. The training dataset consists of seven classes: AKIEC, BCC, Benign Keratosis (BKL), Dermatofibroma (DF), Melanocytic nevus (NV), Melanoma (MEL), and Vascular lesion (VASC). The number of images varies between classes: MEL has 1113, NV has 6705, BCC has 514, AKIEC has 327, BKL has 1099, DF has 115, and VASC has 142. Classifying the images of this dataset into seven groups is one of the most difficult challenges.
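The class counts reported for the ISIC 2018 (HAM10000) training set make the imbalance easy to quantify. The sketch below computes the ratio between the largest and smallest class and a set of inverse-frequency class weights. This weighting is one common heuristic for imbalanced training, shown here as an assumption; it is not a method attributed to the dataset authors or to any specific reviewed study.

```python
# Training-set class counts as reported for ISIC 2018 (HAM10000).
ham10000_counts = {
    "MEL": 1113, "NV": 6705, "BCC": 514, "AKIEC": 327,
    "BKL": 1099, "DF": 115, "VASC": 142,
}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight each class by total / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so every class contributes equally to a weighted loss.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total_images = sum(ham10000_counts.values())
imbalance_ratio = max(ham10000_counts.values()) / min(ham10000_counts.values())
weights = inverse_frequency_weights(ham10000_counts)

print("total images:", total_images)
print(f"largest / smallest class: {imbalance_ratio:.1f}")
for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:6s} weight = {weight:.3f}")
```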
Another dataset from the International Skin Imaging Collaboration, ISIC 2019 (BCN_20000) [69], consists of eight known classes and one class for outlier images. The known classes are MEL, NV, BCC, AKIEC, BKL, DF, VASC, and SCC. ISIC 2019 consists of 25,331 images, where AKIEC has 867, BKL has 2624, BCC has 3323, DF has 239, NV has 12,875, MEL has 4522, SCC has 628, and VASC has 253. Figure 5 depicts the several types of skin cancer. This dataset is one of the hardest to categorize into eight classes because of the uneven number of images in each class. The hardest challenge is to detect outliers, that is, "out of distribution" images that belong to none of the known classes.
The Dermofit Image Library dataset is composed of 1300 skin images with corresponding class labels and lesion segmentations. There are ten lesion categories in this dataset: AKIEC, BCC, hemangioma, intraepithelial carcinoma, nevus, SCC, pyogenic granuloma, seborrhoeic keratosis, DF, and MEL. These classes have 331, 76, 257, 239, 65, 45, 97, 88, 78, and 24 images for nevus, MEL, seborrhoeic keratosis, BCC, DF, AKIEC, hemangioma, SCC, intraepithelial carcinoma, and pyogenic granuloma, respectively [70].
The ISIC challenge 2020 dataset consists of 33,126 dermoscopic images. These images were acquired from over 2000 patients. Images in the dataset are divided into nine classes in addition to an unknown image class [72]. Table 1 summarizes the total number of images and the number of images in each class for all of these datasets.
For more than 30 years, skin cancer detection by CAD systems has been a hot topic of research [73]. For example, several methods for melanoma identification, classification, and segmentation have been developed and tested. The following section reviews research efforts in the state of the art, using journal papers published in ScienceDirect, IEEE, and SpringerLink databases during the last five years only.

Traditional Machine Learning
The asymmetry, border, color, diameter (ABCD) rule was used by the authors in [100] to analyze the border and color of skin lesions. They classified the features using a multilayer perceptron network (MLP) based on backpropagation training. In [101], a Gabor filter and geodesic active contours were used to enhance the image and remove hair. Then, the ABCD scoring method was used to extract features. Finally, a combination of existing methods was used to classify lesions. In [102], the authors classified melanoma based on the thickness of the lesion. Two classification schemata were used: the first classified lesions into thin or thick and the second classified lesions into thin, intermediate, and thick. They combined logistic regression with artificial neural networks for classification. In [103], lesions were enhanced using a median filter applied separately to each channel of the RGB space. Then, these lesions were segmented based on a deformable model. A segmentation method based on the Chan-Vese model was proposed by [104]. Images were first enhanced using an isotropic diffusion filter, and then ABCD was used to extract the features from the segmented regions. These features were classified using a support vector machine (SVM). In [105], the authors proposed a classification system for BCC and MEL using paraconsistent logic (PL) with two-value annotation. They extracted the degrees of evidence, formation pattern, and diagnosis comparison.
The spectra values that were used to differentiate between normal, BCC and MEL were 30, 96, and 19, respectively. A Delaunay Triangulation was used in [106] to extract the binary mask of ROIs. The authors of [107] segmented the histopathological images by extracting the granular layer boundary, and then the intensity profiles were used to classify two lesions only.
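Several of the systems above extract ABCD-style descriptors from a segmented lesion before classification. The following minimal sketch shows one simple way to turn a binary lesion mask and an RGB image into four such descriptors. The specific formulas (mirror overlap for asymmetry, compactness for border irregularity, mean per-channel standard deviation for color, equivalent diameter in pixels) are illustrative assumptions and are not the definitions used in [100], [101], or [104].

```python
import math
from statistics import pstdev


def abcd_descriptors(mask, image):
    """Return (asymmetry, border, color, diameter) for one lesion.

    mask  : 2-D list of 0/1 values, 1 marks lesion pixels.
    image : 2-D list of (R, G, B) tuples with the same shape as mask.
    """
    pixels = {(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v}
    area = len(pixels)

    # A: mirror the lesion about the vertical axis of its bounding box and
    # measure the non-overlapping fraction (0 for a symmetric lesion).
    c_min = min(c for _, c in pixels)
    c_max = max(c for _, c in pixels)
    mirrored = {(r, c_min + c_max - c) for r, c in pixels}
    asymmetry = len(pixels ^ mirrored) / area

    # B: compactness perimeter^2 / (4 * pi * area); the perimeter is
    # approximated by lesion pixels that touch the background (4-neighbours).
    def on_border(r, c):
        return any(n not in pixels
                   for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))

    perimeter = sum(on_border(r, c) for r, c in pixels)
    border = perimeter ** 2 / (4 * math.pi * area)

    # C: color variegation as the mean per-channel standard deviation.
    color = sum(pstdev([image[r][c][ch] for r, c in pixels])
                for ch in range(3)) / 3

    # D: diameter of the circle with the same area, in pixels.
    diameter = 2 * math.sqrt(area / math.pi)
    return asymmetry, border, color, diameter


# Synthetic example: a uniformly colored disk of radius 15 pixels.
size, centre, radius = 41, 20, 15
disk = [[1 if (r - centre) ** 2 + (c - centre) ** 2 <= radius ** 2 else 0
         for c in range(size)] for r in range(size)]
flat_image = [[(0.5, 0.3, 0.2)] * size for _ in range(size)]
disk_features = abcd_descriptors(disk, flat_image)
print(disk_features)
```

The resulting four-element vector could then be passed to any of the classifiers mentioned in this section, such as an MLP or an SVM.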
Four classification methods used for skin lesion diagnosis were compared in [108] to determine the best method. Based on the histopathology of BCC, the authors of [109] combined two or three Fourier transform features to form one Z-transform feature. A CAD system using principal component analysis (PCA) and SVM was proposed in [110] to classify skin psoriasis. A framework for the segmentation of BCC was proposed in [111]. The hemoglobin component was clustered using k-means. In [112], the classification system for skin cancer was based on deep lesions using 3D reconstruction. They utilized adaptive snake, stereo vision, structure from motion, and depth from focus for segmentation and classification.
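Clustering with k-means, as mentioned above for the hemoglobin component, can be sketched on one-dimensional pixel values. The example below is a minimal, deterministic version of the standard algorithm. The function name `kmeans_1d`, the quantile-based initialisation, and the sample values are assumptions for illustration and do not reproduce the pipeline of [111].

```python
def kmeans_1d(values, k=2, iterations=50):
    """Cluster scalar values into k groups with standard k-means.

    Returns (centers, labels). Centres are initialised at evenly spaced
    quantiles so that the result is deterministic.
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Update step: move each centre to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers

    labels = [nearest(v) for v in values]
    return centers, labels


# Hypothetical component values: low = background skin, high = lesion.
component = [0.10, 0.12, 0.15, 0.11, 0.80, 0.85, 0.82, 0.90]
centers, labels = kmeans_1d(component, k=2)
print(centers, labels)
```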
In [113], the lesion was extracted using a self-generating neural network. Then, the descriptive border, texture, and color features were extracted. Finally, an ensemble classifier network that combined fuzzy neural networks with a backpropagation (BP) neural network was used to classify the lesions based on the extracted features. A fixed grid wavelet network was used by [114] to enhance and segment skin lesions. Then, the resulting features were classified by D-optimality orthogonal matching pursuit. Based on a chaotic time series analysis of the boundary, Khodadadi et al. [115] analyzed the irregular boundary of an infected skin lesion using the Lyapunov exponent and Kolmogorov-Sinai entropy.
In [116], a segmentation technique for skin lesions was proposed based on ant colony. Three types of features were extracted from lesions: texture, relative colors, and geometrical properties. Finally, these features were classified by two classifiers: artificial neural network (ANN) and K-nearest neighbour (KNN). Based on shape and color, the authors of [117] combined some features after segmentation using ABCD. Finally, these features were classified and tested individually and after being combined. The Histogram of Gradients (HOG) and the Histogram of Lines (HL) were used in [118] to create a separate bag of features for each. The bag of features was used to extract texture and color features for skin lesion detection. Color features were extracted using third-order Zernike moments.
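The K-nearest neighbour classifier mentioned above can be illustrated in a few lines. This is a minimal sketch of the generic algorithm with Euclidean distance and majority voting; the two-dimensional feature vectors and the labels are made up for the example and do not correspond to the features of [116].

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (texture, color) feature vectors, scaled to [0, 1].
train_features = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
                  (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_labels = ["nevus", "nevus", "nevus",
                "melanoma", "melanoma", "melanoma"]

print(knn_predict(train_features, train_labels, (0.2, 0.2)))
print(knn_predict(train_features, train_labels, (0.8, 0.8)))
```

Because KNN relies on raw distances, features with different ranges are usually rescaled before use so that no single feature dominates the vote.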
Roberta et al. [119] proposed a skin lesion diagnosis based on the ensemble model for feature manipulation. Przystalski et al. [120] proposed multispectral lesion analysis by fractal methods. Jaisakthi et al. [121] proposed a segmentation method for skin lesions using the Grab-Cut algorithm and k-means. Do et al. [122] introduced a melanoma detection system using a smartphone. Images were acquired using a smartphone camera. Then, they searched for the processing method best suited to smartphones. This method started with a hierarchical segmentation approach and used numerical features to classify a skin lesion. Adjed et al. [123] proposed a feature extraction method using wavelet transform, curvelet transform, and local binary pattern (LBP). Finally, the extracted features were classified using an SVM. Hosseinzade [124] segmented lesion images based on texture characteristics. The texture characteristics were described by the Gabor filter and then classified by k-means.
Akram et al. [125] proposed an automatic skin lesion segmentation and classification system. They used several diverse techniques: ABCD, fuzzy C-means, pair threshold binary decomposition, HOG, and linear discriminant analysis (LDA). Jamil et al. [126] proposed a technique for skin lesion detection. They classified lesions based on color, shape, Gabor wavelet, and gray intensity features. Khan et al. [127] combined Bhattacharyya distance and variance as an entropy-based method for skin lesion detection and classification.
Tan et al. [128] enhanced Particle Swarm Optimization (PSO) for skin lesion feature optimization. The authors modified two PSO models for discriminative feature selection. The first performed a global search by combining lesion features and an in-depth local search by separating lesions into specific areas. The second modified PSO used random acceleration coefficients. Subsequently, these features were classified using different classification methods. Tajeddin et al. [129] classified skin melanoma based on highly discriminative features. They started with contour propagation for lesion segmentation. To extract features from the peripheral area, lesions were mapped to log-polar space using Daugman's transformation. Finally, different classifiers were compared.
To classify skin lesions, the authors in [130] utilized the structural co-occurrence matrix of frequencies extracted from dermoscopic images. Peñaranda et al. [131] classified skin lesions by analyzing skin cells using Fourier transform infrared. Finally, they studied how perturbations influenced the results. Wahba et al. [132] proposed a skin lesion classification system based on the gray-level difference method. They tried to discriminate between four lesions by extracting features using ABCD and cumulative level-difference mean. Finally, these features were classified using an SVM. Zakeri et al. [133] proposed a CAD system to differentiate between melanoma and dysplastic lesions. They enhanced the grey-level co-occurrence matrix to extract features. Finally, these features were classified using an SVM. Pathan et al. [134] proposed a detection system for pigment networks and differentiated between typical and atypical network patterns. In [135], laser-induced breakdown spectroscopy was used with a combination of statistical methods to distinguish between soft tissues of the skin.
Chatterjee et al. [136] utilized non-invasive images of skin lesions to distinguish melanoma from nevi. They obtained the texture pattern of the skin using 2D wavelet packet decomposition. Qasim et al. [137] proposed a skin lesion classification system. KNN was applied to images enhanced by a Gaussian filter to extract the ROI. Finally, the segmented ROI was classified using an SVM. Madooei et al. [138] utilized the blue-whitish structure to differentiate melanoma from nevi lesions. Saez et al. [139] utilized the color of lesions to classify these lesions as melanoma or nevi. The lesions were classified by the color itself and their neighborhood color values. Navarro et al. [140] proposed a segmentation system for skin lesions. They classified the segmented lesions into melanoma and nevi. Riaz et al. [141] proposed a CAD system for skin lesions. Their system started with lesion segmentation to extract the ROI. They utilized Kullback-Leibler divergence to detect lesion boundaries.
Sabbaghi et al. [142] presented a QuadTree based on the perception of lesion color. They found that the three most common colors of melanoma were blue-grey, black, and pink. Finally, they used different classifiers, such as SVM, ANN, LDA, and random forests (RFs). Murugan et al. [143] utilized watershed segmentation to extract the ROI. Features were extracted using ABCD and the Gray-Level Co-occurrence Matrix (GLCM). Finally, these features were classified using KNN, RF, and SVM. Khalid et al. [144] suggested a segmentation method for dermoscopic skin lesion images using a combination of wavelet transform with morphological operations. Majumder et al. [145] proposed three features that were used in melanoma classification based on the ABCD rule. Chatterjee et al. [146] introduced fractal-based regional texture analysis (FRTA) to extract shape, fractal dimension, texture, and color features to classify three lesions using SVM with RBF.
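The Gray-Level Co-occurrence Matrix used as a texture descriptor in several of the studies above can be computed directly from its definition. The sketch below builds a normalised GLCM for a single pixel offset and derives two common texture statistics from it. The function names, the chosen offset, and the small test image are illustrative assumptions and are not taken from [133] or [143].

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for the offset (dy, dx).

    image  : 2-D list of integer gray levels in range(levels).
    Entry [i][j] is the relative frequency with which a pixel of level i
    has a neighbour of level j at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights co-occurrences by squared gray-level difference."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))


def homogeneity(p):
    """Rewards co-occurrences close to the matrix diagonal."""
    return sum(p[i][j] / (1 + (i - j) ** 2)
               for i in range(len(p)) for j in range(len(p)))


# Small 4-level test image; the offset (0, 1) pairs each pixel with
# its right-hand neighbour.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(patch, levels=4)
print("contrast:", contrast(p))
print("homogeneity:", homogeneity(p))
```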
Chatterjee et al. [147] proposed a classification system for four kinds of lesions. Features were extracted using cross-correlation techniques based on frequency domain analysis. Their system differentiated between benign and malignant lesions of the epidermal and melanocytic classes in a binary manner. Upadhyay et al. [148] extracted color, orientation histogram, and gradient location features of skin lesions. These features were fused and classified as benign or malignant using an SVM. Pathana et al. [149] proposed a skin lesion CAD system. Garcia-Arroyo et al. [150] proposed a skin lesion detection system. Lesions were segmented using fuzzy histogram thresholding. Hu et al. [151] suggested a skin lesion classification approach. To measure the similarity between features, they introduced codewords for a bag of features. Moradi et al. [152] suggested a skin lesion segmentation and classification system based on sparse kernel representation. Pereira et al. [153] proposed a skin lesion classification system based on characteristics of lesion borders, in addition to combining LBP with gradients.
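The local binary pattern (LBP) descriptor mentioned in this section encodes the local texture around each pixel as an 8-bit number. The following minimal sketch implements the basic 3 × 3 variant and collects the codes into a histogram that can serve as a feature vector. The neighbour ordering and bit assignment are one common convention chosen for the example; the cited studies may use other variants, such as rotation-invariant or multi-radius LBP.

```python
# Clockwise neighbour offsets starting at the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of the pixel at (r, c).

    Bit k is set when the k-th neighbour is at least as bright as the centre.
    """
    centre = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, r, c)] += 1
    return histogram


# Hypothetical 3 x 3 gray-level patch with centre value 6.
patch = [[5, 9, 1],
         [3, 6, 7],
         [2, 8, 4]]
print(lbp_code(patch, 1, 1))
```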

Deep Learning
Kawahara et al. [154] proposed a skin lesion classification system using a convolutional neural network (CNN). The modified pre-trained CNN was able to classify images with different resolutions. Yu et al. [155] proposed a novel CNN for skin lesion segmentation and classification. The proposed CNN was based on residual learning and consisted of 50 layers. Codella et al. [156] proposed a deep residual network for skin lesion classification using the benchmark dataset ISIC2016. Bozorgtabar et al. [157] proposed a decision support system that localized skin cancer automatically using deep convolution learning for pixel-wise image segmentation. Yuan et al. [158] proposed a skin lesion segmentation system by leveraging CNN. The network consisted of 19 layers. Instead of using the traditional loss function, they utilized Jaccard Distance as a loss function.
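A loss based on the Jaccard distance, as mentioned above for [158], directly penalises poor overlap between the predicted and the reference lesion mask. The sketch below shows a commonly used soft formulation that also works for probabilistic predictions; the exact form, the smoothing constant `eps`, and the function name are assumptions for illustration and may differ from the loss used in [158].

```python
def jaccard_distance_loss(target, prediction, eps=1e-7):
    """Soft Jaccard distance between two masks of equal shape.

    target     : 2-D list of 0/1 ground-truth labels.
    prediction : 2-D list of predicted lesion probabilities in [0, 1].
    Returns a value near 0 for perfect overlap and 1 for no overlap.
    """
    t = [v for row in target for v in row]
    p = [v for row in prediction for v in row]
    intersection = sum(a * b for a, b in zip(t, p))
    union = sum(a * a for a in t) + sum(b * b for b in p) - intersection
    return 1.0 - intersection / (union + eps)


reference = [[1, 1], [0, 0]]
print(jaccard_distance_loss(reference, [[1, 1], [0, 0]]))  # identical masks
print(jaccard_distance_loss(reference, [[1, 0], [0, 0]]))  # half of the lesion found
print(jaccard_distance_loss(reference, [[0, 0], [1, 1]]))  # disjoint masks
```

Unlike pixel-wise cross-entropy, this quantity is insensitive to the number of background pixels, which is useful when the lesion occupies only a small part of the image.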
Sultana et al. [159] proposed a skin lesion detection system using deep residual learning with a regularized framework. Rundo et al. [160] utilized the ABCD rule to analyze skin lesions. Then, ad hoc clustering was performed using a pre-trained deep Levenberg-Marquardt neural network. Creswell et al. [161] proposed denoising adversarial autoencoders to classify limited and imbalanced skin lesion images. Harangi [162] ensembled four different CNNs to investigate the impact on performance. Guo et al. [163] utilized an ensemble of multiple ResNets to analyze skin lesions. The training images for each ResNet were pre-processed in different ways while the labels were left unchanged.
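Ensembling several networks, as in the studies above, is often done by averaging the class probabilities that each model produces for the same image. The sketch below shows this simple soft-voting scheme. It is a generic illustration under the assumption of equal model weights; the fusion strategies actually used in [162] and [163] may differ, and the probability values are made up.

```python
def ensemble_average(probabilities_per_model):
    """Average class probabilities over models and pick the top class.

    probabilities_per_model : list with one probability vector per model,
                              all of the same length.
    Returns (averaged_probabilities, predicted_class_index).
    """
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [sum(model[k] for model in probabilities_per_model) / n_models
                for k in range(n_classes)]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted


# Hypothetical outputs of four CNNs for one image and three classes.
model_outputs = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.50, 0.40, 0.10],
    [0.40, 0.35, 0.25],
]
averaged, predicted = ensemble_average(model_outputs)
print(averaged, predicted)
```

In this example the second model alone would pick class 1, but the averaged vote selects class 0, which illustrates how an ensemble can smooth out the errors of individual members.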
Monedero et al. [164] utilized the thickness of lesions to detect melanoma using the Breslow index. The extracted texture, shape, pigment network, and color features of lesions were classified using GoogleNet to classify lesions into five types. Hagerty et al. [165] utilized deep learning and conventional image processing to extract different skin lesion features. The extracted features from deep learning and traditional processing of images were combined and fused. Finally, the newly generated features were used to classify lesions. Polap [166] proposed a skin lesion classification based on IoT. He used deep learning over IoT. The proposed model classified images in a short time, but with a low classification rate. Such a system could be used in a smart home as a part of an intelligent monitoring system [167].
Sarkar et al. [168] proposed a depth-wise separable residual deep convolutional network to classify skin cancer. The non-local means filter was followed by contrast-limited adaptive histogram equalization (CLAHE) over the discrete wavelet transform (DWT) algorithm. Zhang et al. [169] proposed a CNN model with attention residual learning to classify skin lesions into three classes. The proposed deep model has four residual blocks with a total of 50 layers. Albahar [170] introduced a skin lesion detection system for binary classification of malignant or benign. He proposed a CNN model consisting of seven layers. He also proposed a regularization method to control the complexity of the classifier using the standard deviation of the weight matrix.
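A regularizer based on the standard deviation of the weight matrix can be sketched as an extra penalty term added to the data loss. The following example is only one plausible reading of that idea: it sums the standard deviation of each row of a weight matrix and scales the sum by a coefficient `lam`. The row-wise formulation, the function names, and all numeric values are assumptions and should not be taken as the exact method of [170].

```python
import math
from statistics import pstdev


def std_regularizer(weight_matrix, lam=0.01):
    """Penalty proportional to the spread of the weights in each row.

    weight_matrix : 2-D list, one row per output unit (assumed layout).
    lam           : regularization strength (hypothetical default).
    """
    return lam * sum(pstdev(row) for row in weight_matrix)


def regularized_loss(data_loss, weight_matrix, lam=0.01):
    """Total objective: data term plus the standard-deviation penalty."""
    return data_loss + std_regularizer(weight_matrix, lam)


weights = [[1.0, 1.0, 1.0],   # no spread, contributes nothing
           [0.0, 2.0, 4.0]]   # wide spread, is penalised
penalty = std_regularizer(weights, lam=0.1)
print(penalty)
print(regularized_loss(0.35, weights, lam=0.1))
```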
González-Díaz [171] introduced a skin lesion CAD system using a CNN called DermaKNet. The proposed CNN was based on ResNet50, but the author started by applying a modulation block over the convolutional res5c layer outputs. Two pooling layers (AVG and Polar AVG) worked in parallel. Three fully connected layers were used at the end of the CNN. However, because melanoma grows differently, the third fully connected layer precedes the asymmetry block. The asymmetry block was used to detect the different ways in which melanoma grows.
Kawahara et al. [172] proposed a skin lesion classification system using a CNN that simultaneously worked on multiple tasks. The proposed CNN is able to classify the seven-point melanoma checklist criteria. The proposed CNN classified skin lesion images together with patient meta-data. Yu et al. [173] proposed a skin lesion classification system using a CNN and the local descriptor encoding method. The lesion features were extracted from images using ResNet50 and ResNet101. Then, a Fisher vector (FV) was used to build a global image representation using the extracted features of ResNet. Finally, an SVM was used with a Chi-squared kernel for classification. Dorj et al. [174] proposed a skin cancer classification system using a CNN. The pre-trained AlexNet was used to extract features while the Error-Correcting Output Codes (ECOC) SVM was used to classify the extracted features.
Gavrilov et al. [175] proposed a skin neoplasm (cancer) classification system using a CNN. They applied transfer learning to Inception V3 (GoogLeNet). Furthermore, web and mobile applications were developed to allow patients to assess their lesions and obtain a preliminary diagnosis using images captured by themselves. Chen et al. [176] proposed a classification system for facial skin diseases. They used three CNN models with transfer learning to classify five skin diseases of the face. The proposed model ran on a cloud platform. Mahbod et al. [177] utilized four different CNN models (AlexNet, VGG, ResNet-18, and ResNet-101) to classify three skin lesions. They used SVM, RF, and MLP to classify the features extracted from the CNNs. The different classification results were ensembled to generate a single classification for the input lesions. Brinker et al. [178] proposed a skin lesion classification system using a pre-trained model. The modified pre-trained ResNet50 model outperformed expert dermatologists in classifying lesions into melanoma and nevus.
Tan et al. [179] utilized particle swarm optimization (PSO) for skin lesion segmentation. They tried to improve PSO using different methods, such as the Firefly Algorithm (FA), spiral search action, probability distributions, crossover, and mutation. K-Means was used to enhance lesion segmentation. The hybrid learning PSO (HLPSO) was used in the development of the CNN. The classification system could classify lesions into melanoma and nevus. Khan et al. [180] used a custom ten-layer CNN for image segmentation and a deep pre-trained CNN model for feature extraction. Then, an improved moth flame optimization (IMFO) algorithm was used for feature selection. The selected features were fused using multiset maximum correlation analysis and classified using the Kernel Extreme Learning Machine (KELM) classifier.
Tschandl et al. [181] proposed combining and expanding different CNNs for the segmentation and classification of skin lesions. They used three well-known benchmark datasets. They found that post-processing with a small, noisy dataset decreased the Jaccard loss. Vasconcelos et al. [182] proposed a skin lesion segmentation system using morphological geodesic active contours. Different CNN models were used, such as full resolution convolutional networks (FrCNs) and deep class-specific learning with probability-based step-wise integration (DCL-PSI). The proposed model was able to classify skin lesions into melanoma and nevus.
Burlina et al. [183] proposed a CNN for detecting acute Lyme disease from erythema migrans images, even under varying acquisition and quality conditions. They replaced and fine-tuned the final layers of ResNet50. The proposed model was able to classify four different lesions. Maron et al. [184] proposed a system that classified five types of skin lesions. They applied transfer learning to ResNet50 and fine-tuned the CNN weights. They compared the classification rate of the CNN against that of 112 expert dermatologists.
Goyal et al. [185] proposed a skin lesion segmentation system that ensembled the segmentation outputs of Mask R-CNN and DeeplabV3+. Albert [186] proposed Predict-Evaluate-Correct K-fold (PECK), which trains CNNs from limited data. In that work, PECK was applied to skin lesion classification together with Synthesis and Convergence of Intermediate Decaying Omnigradients (SCIDOG), which was used to detect the contour of lesions. Finally, the segmented lesions were classified using a CNN with SVM and RF. Ahmad et al. [187] utilized three loss functions while fine-tuning the layers of ResNet152 and InceptionResNet-V2. The L-2 distance between images was computed in Euclidean space and then used to adapt the weights for image classification.
Kwasigroch et al. [188] proposed a skin lesion classification system using a CNN with hill climbing to explore the search space. This process led to increasing the size of the network, which reduced the computational cost. Adegun et al. [189] proposed an encoder-decoder network whose subnetworks were connected using skip connections. The proposed CNN was used for skin lesion segmentation and pixel-wise classification. Song et al. [190] suggested that CNNs could segment, detect, and classify skin lesions. To handle imbalanced datasets, they utilized a loss function based on the Jaccard distance and the focal loss. Wei et al. [191] proposed a skin lesion recognition system based on fine-grained classification to learn discriminative features. A different lightweight CNN was utilized for segmentation and classification.
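The two loss terms mentioned for [190] can be sketched on flattened binary masks as follows. This is a minimal illustration under stated assumptions rather than the implementation of [190]: the soft Jaccard formulation, the focal loss defaults (gamma = 2, alpha = 0.25), and the weighted sum in the hypothetical `combined_loss` are choices made here for illustration.

```python
import math


def jaccard_distance_loss(pred, target, eps=1e-7):
    """Soft Jaccard distance: 1 - |A and B| / |A or B| on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss; easy, well-classified pixels are down-weighted."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
        pt = p if t == 1 else 1.0 - p        # probability of the true class
        at = alpha if t == 1 else 1.0 - alpha
        total += -at * (1.0 - pt) ** gamma * math.log(pt)
    return total / len(pred)


def combined_loss(pred, target, weight=0.5):
    """Assumed weighted sum of both terms; the weight is hypothetical."""
    return (weight * jaccard_distance_loss(pred, target)
            + (1.0 - weight) * focal_loss(pred, target))


# Four pixels: predicted lesion probabilities against the ground-truth mask.
print(combined_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
```

The Jaccard term depends on the overlap ratio rather than on the absolute number of lesion pixels, while the focal term reduces the contribution of the many easy background pixels; both properties are commonly cited as helpful when lesion pixels are a minority.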
Gong et al. [192] proposed a dermoscopic skin image classification system using deep learning models. The authors enhanced skin images using StyleGANs, and the enhanced images were classified using 43 CNNs. The CNNs were divided into three groups with different fusion methods. Finally, the classification decision was generated using the max voting technique.
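Max voting over the outputs of several classifiers can be sketched as follows. The function name and the tie-breaking rule (the label that appears first among the tied ones wins) are assumptions made for illustration and are not described in [192].

```python
from collections import Counter


def max_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Assumed tie-break: the first label in input order with the top count.
    for label in predictions:
        if counts[label] == best:
            return label


# Each entry is the label predicted by one CNN for the same image.
print(max_vote(["melanoma", "nevus", "melanoma", "nevus", "melanoma"]))
```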
Nasiri et al. [193] proposed a skin lesion classification system based on deep learning with case-based reasoning (CBR). Öztürk et al. [194] proposed a segmentation system for skin lesions. Hosny et al. [195] proposed a new deep CNN classification system for skin lesions. The authors performed three different experiments with three datasets. They compared the accuracy between using traditional machine learning classifiers and the emerging technology with CNN. They found that the conventional machine learning classifier led to a lower classification rate.
Amin et al. [196] proposed a skin lesion classification system. First, they enhanced the images; then, biorthogonal 2-D wavelet transforms and the Otsu algorithm were used to segment lesions. Finally, two pre-trained models were fused serially to extract features for classification using PCA. Mahbod et al. [197] investigated the effect of different skin lesion image sizes when using transfer learning with pre-trained models.
Hameed et al. [198] proposed a skin lesion classification system based on a multiclass multilevel algorithm. Traditional machine learning and deep learning methods were used with the proposed model. Zhang et al. [199] proposed an optimization algorithm for optimal weight selection to minimize the network output error for skin lesion classification. Hasan et al. [200] proposed a semantic segmentation network for skin lesions called DSNet. They utilized depth-wise separable convolution to reduce the number of parameters, which produced a lightweight network. Al-masni et al. [201] proposed a diagnostic framework for skin lesion classification that combined segmentation of lesion boundaries with multiple classification stages. The proposed system segmented the lesions using a full resolution convolutional network (FrCN) and used four CNNs for classification. This system was evaluated and tested using three benchmark datasets.
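The parameter saving offered by depth-wise separable convolution, which [200] used to obtain a lightweight network, can be illustrated with a short calculation. The layer sizes below are hypothetical and are not taken from DSNet; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, round(std / sep, 2))
```

For this hypothetical layer, the standard convolution needs 73,728 weights and the depth-wise separable version needs 8,768, roughly an 8.4-fold reduction.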
Pour et al. [202] proposed a segmentation method for skin lesions using a CNN. The CNN was trained from scratch using a small dataset with augmentation. Olusola et al. [203] utilized image augmentation (a variant of SMOTE). Then, they classified skin lesions into benign and malignant using SqueezeNet. Hosny et al. [204] utilized AlexNet with transfer learning to classify the challenging ISIC2018 dataset. They used different approaches for lesion segmentation. Hosny et al. [205] proposed a CAD system for skin lesions using the challenging ISIC2019 dataset. That dataset poses several challenges, such as imbalanced classes and unknown images. The authors utilized a bootstrap weighted classifier with a multiclass SVM. This classifier changed the weights according to the image class. Finally, the authors dealt with unknown images in two different ways. The first was to train GoogleNet with a new class containing a different number of unknown images collected from various sources. The second was a similarity score: if the highest similarity score of the tested image with the eight known classes was less than 0.5, the tested image was identified as an unknown or out-of-distribution image. The number of classes used for skin disease recognition in the analyzed works is summarized in Figure 6. Generally, the vast majority of works used two-class recognition only, while only one study [172] used 10 classes for recognition.
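The similarity-score rule described for [205] can be sketched as a simple threshold on the best-matching known class. The 0.5 threshold comes from the description above; the function name, the class abbreviations, and the example scores are illustrative assumptions.

```python
def classify_with_unknown(scores, threshold=0.5):
    """Return the best-scoring known class, or "unknown" if it is too weak.

    `scores` maps each known class to a similarity score in [0, 1].
    """
    label, best = max(scores.items(), key=lambda item: item[1])
    return label if best >= threshold else "unknown"


# Illustrative scores for eight known classes (values are made up).
confident = {"MEL": 0.81, "NV": 0.07, "BCC": 0.03, "AK": 0.02,
             "BKL": 0.03, "DF": 0.01, "VASC": 0.01, "SCC": 0.02}
ambiguous = {"MEL": 0.21, "NV": 0.19, "BCC": 0.15, "AK": 0.10,
             "BKL": 0.12, "DF": 0.08, "VASC": 0.07, "SCC": 0.08}

print(classify_with_unknown(confident))
print(classify_with_unknown(ambiguous))
```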

Discussion and Conclusions
This paper is an analytical survey of the literature on skin lesion image classification and disease recognition. It is a comprehensive review of the methods and algorithms used in the processing, segmentation, and classification of skin lesion images. Both classical machine learning methods and deep convolutional neural network models were explored. The available and well-known datasets were discussed and compared. At the end of the systematic survey, a table was used to compare state-of-the-art methods in a novel way. The column "simple" in Table 2 indicates that the proposed method is uncomplicated, easy to apply, and does not require specific hardware, whereas the column "Contribution Achieved" indicates whether the research paper achieved its stated aims.
The comparison of classification methods for skin lesions shows that the problem formulation varies slightly from work to work. An efficient melanoma detection process has five core elements: data acquisition (collection), fine-tuning, feature selection, deep learning, and final model development. The first step is data acquisition, in which data from publicly available benchmarks as well as non-listed and non-public databases, such as melanoma images collected from the internet, are used for detecting skin cancer. Among the deep learning methods, several were based on transfer learning, others were based on ensemble approaches, and some employed neural networks and hybrid techniques with fully convolutional neural networks. The pre-trained deep learning models and handcrafted methods that were based on a deep-learning approach have already shown promising results for highly accurate melanoma detection.
Only a limited range of images is available for training, testing, and comparison, as most of the datasets are small. With small datasets, the proposed methods do well but are prone to over-fitting, and their performance becomes unpredictable when they are tested on a large image set. For example, the PH2 dataset includes just 200 images. The problem of training with a small dataset could be mitigated by data augmentation, image generation with a generative adversarial network (GAN), and transfer learning. Some researchers use non-public databases and internet images. This makes it more difficult to replicate the findings and results because the dataset is not available, and the selection of images from the internet may be biased.
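Data augmentation, one of the mitigations for small datasets noted above, can be illustrated with simple geometric transformations. The following minimal sketch represents an image as a nested list of pixel values and generates its rotations and mirror images; the function names are hypothetical, and practical pipelines typically add further transformations such as scaling or color changes.

```python
def hflip(img):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the four rotations of an image and their mirrored versions."""
    variants = []
    current = img
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants


# A tiny 2 x 2 "image" yields eight training samples from a single original.
tiny = [[1, 2],
        [3, 4]]
for variant in augment(tiny):
    print(variant)
```

Such flips and rotations are a natural choice for skin lesion images, since the diagnosis of a lesion does not depend on its orientation in the image.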
Another major task in this research field is the creation of large public image datasets that are as representative of the world's population as possible, in order to avoid racial bias [206]. Gender and racial bias in AI means that models and algorithms fail to give optimal results for people of an under-represented gender or ethnicity.
Current datasets mostly contain skin lesions on light-colored skin. For example, the ISIC dataset images were mostly obtained in the USA, Europe, and Australia. In addition, CNNs try to extract the skin color to achieve a proper classification for dark-skinned people. This can only happen if the training dataset contains sufficient images of dark-skinned people. The size of the lesion is also of significant importance. If the lesion is smaller than 6 mm, melanoma cannot be detected easily.
The addition of clinical data, such as race, age, gender, and skin type, as inputs for classifiers may help to increase classification accuracy. This supplemental data could also be beneficial for dermatologists' decision making. These aspects should be included in future work. Finally, from the authors' perspective, if the dataset contains a large number of images per class, deep learning is better than traditional machine learning. Even with datasets containing few images, deep learning can overcome this issue by using different methods of augmentation. Deep learning makes intelligent decisions on its own with a higher accuracy rate. The pre-trained deep learning models and handcrafted methods that were based on a deep learning approach have already shown promising results for highly accurate melanoma detection.