Analysis of Artificial Intelligence-Based Approaches Applied to Non-Invasive Imaging for Early Detection of Melanoma: A Systematic Review

Simple Summary Melanoma is the most dangerous type of skin cancer worldwide. Early detection of melanoma is crucial for better outcomes, but this often can be challenging. This research explores the use of artificial intelligence (AI) techniques combined with non-invasive imaging methods to improve melanoma detection. The authors aim to evaluate the current state of AI-based techniques using tools including dermoscopy, optical coherence tomography (OCT), and reflectance confocal microscopy (RCM). The findings demonstrate that several AI algorithms perform as well as or better than dermatologists in detecting melanoma, particularly in the analysis of dermoscopy images. This research highlights the potential of AI to enhance diagnostic accuracy, leading to improved patient outcomes. Further studies are needed to address limitations and ensure the reliability and effectiveness of AI-based techniques. Abstract Background: Melanoma, the deadliest form of skin cancer, poses a significant public health challenge worldwide. Early detection is crucial for improved patient outcomes. Non-invasive skin imaging techniques allow for improved diagnostic accuracy; however, their use is often limited due to the need for skilled practitioners trained to interpret images in a standardized fashion. Recent innovations in artificial intelligence (AI)-based techniques for skin lesion image interpretation show potential for the use of AI in the early detection of melanoma. Objective: The aim of this study was to evaluate the current state of AI-based techniques used in combination with non-invasive diagnostic imaging modalities including reflectance confocal microscopy (RCM), optical coherence tomography (OCT), and dermoscopy. We also aimed to determine whether the application of AI-based techniques can lead to improved diagnostic accuracy of melanoma. Methods: A systematic search was conducted via the Medline/PubMed, Cochrane, and Embase databases for eligible publications between 2018 and 2022. Screening methods adhered to the 2020 version of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Included studies utilized AI-based algorithms for melanoma detection and directly addressed the review objectives. Results: We retrieved 40 papers amongst the three databases. All studies directly comparing the performance of AI-based techniques with dermatologists reported the superior or equivalent performance of AI-based techniques in improving the detection of melanoma. In studies directly comparing algorithm performance on dermoscopy images to dermatologists, AI-based algorithms achieved a higher ROC (>80%) in the detection of melanoma. In these comparative studies using dermoscopic images, the mean algorithm sensitivity was 83.01% and the mean algorithm specificity was 85.58%. Studies evaluating machine learning in conjunction with OCT boasted accuracy of 95%, while studies evaluating RCM reported a mean accuracy rate of 82.72%. Conclusions: Our results demonstrate the robust potential of AI-based techniques to improve diagnostic accuracy and patient outcomes through the early identification of melanoma. Further studies are needed to assess the generalizability of these AI-based techniques across different populations and skin types, improve standardization in image processing, and further compare the performance of AI-based techniques with board-certified dermatologists to evaluate clinical applicability.


Introduction
It has been well established that the detection of late-stage melanoma is associated with poorer, potentially fatal outcomes.Early detection of melanoma is crucial to prevent mortality and reduce unnecessary invasive procedures including surgical biopsy [1].The use of in vivo diagnostic imaging modalities to aid in the earlier detection of this deadly skin cancer has shown significant efficacy, particularly with techniques such as reflectance confocal microscopy (RCM), optical coherence tomography (OCT), and dermoscopy [2].RCM uses a diode laser to provide high-resolution images in horizontal sections at the cellular level as deep as the papillary dermis, while OCT uses near-infrared light to capture microscopic images up to 2 mm below the skin's surface (Figures 1 and 2) [2,3].Dermoscopy utilizes a dermatoscope with polarized or non-polarized light to visualize patterns and microstructures in the epidermis and superficial dermis (Figure 3) [4].

Introduction
It has been well established that the detection of late-stage melanoma is as with poorer, potentially fatal outcomes.Early detection of melanoma is crucial to mortality and reduce unnecessary invasive procedures including surgical biopsy use of in vivo diagnostic imaging modalities to aid in the earlier detection of thi skin cancer has shown significant efficacy, particularly with techniques such as refl confocal microscopy (RCM), optical coherence tomography (OCT), and dermosc RCM uses a diode laser to provide high-resolution images in horizontal section cellular level as deep as the papillary dermis, while OCT uses near-infrared ligh ture microscopic images up to 2 mm below the skin's surface (Figures 1 and 2) [2 moscopy utilizes a dermatoscope with polarized or non-polarized light to visua terns and microstructures in the epidermis and superficial dermis (Figure 3) [4].Despite the advantages of these non-invasive imaging modalities, the appli them requires substantial training and expertise, thus leading to variable results nostic accuracy.Further, the quality of images may affect the interpretation and di time, calling for an unbiased and automated recognition process [5].Artificial inte (AI)-based techniques have been developed to analyze images obtained by these modalities, with the aim of automating the diagnostic process while providing ob and consistency in results.The use of AI-based techniques for the analysis of nondermatologic imaging modalities has the potential to revolutionize the way in wh cancers are detected and diagnosed.Studies have demonstrated that machine algorithms have the potential to minimize the number of artifacts that need t viewed, speed up the diagnostic process, and reduce the number of clinic visit tients [6].While the use of AI in medical image recognition has grown tremendou the past several years, its potential in the dermatological evaluation of skin can mains largely unrealized.High-quality and accurate algorithms rely on a balanced with a high diversity of inputs to allow for precise and accurate diagnoses [7].T ferent studies exploring the use of machine learning-based technologies in the an non-invasive dermatologic imaging modalities may show variations in results b the differences in algorithm development and image quality.Despite the advantages of these non-invasive imaging modalities, the application of them requires substantial training and expertise, thus leading to variable results in diagnostic accuracy.Further, the quality of images may affect the interpretation and diagnostic time, calling for an unbiased and automated recognition process [5].Artificial intelligence (AI)-based techniques have been developed to analyze images obtained by these imaging modalities, with the aim of automating the diagnostic process while providing objectivity and consistency in results.The use of AI-based techniques for the analysis of non-invasive dermatologic imaging modalities has the potential to revolutionize the way in which skin cancers are detected and diagnosed.Studies have demonstrated that machine learning algorithms have the potential to minimize the number of artifacts that need to be reviewed, speed up the diagnostic process, and reduce the number of clinic visits for patients [6].While the use of AI in medical image recognition has grown tremendously over the past several years, its potential in the dermatological evaluation of skin cancers remains largely unrealized.High-quality and accurate algorithms rely on a balanced dataset with a high diversity of inputs to allow for precise and accurate diagnoses [7].Thus, different studies exploring the use of machine learning-based technologies in the analysis of non-invasive dermatologic imaging modalities may show variations in results based on the differences in algorithm development and image quality.
Furthermore, the multitude of AI-based techniques used to analyze dermatological images can complicate clinical decision-making without clear guidelines.It can be challenging to determine the feasibility of AI in clinical practice, as well as the role of machine learning, convolutional neural networks, and deep learning in these techniques.Additionally, it is important to consider the limitations of AI.While it has the potential to revolutionize the field, it can also be biased depending on the datasets and training process.For instance, datasets used to train AI models may be primarily composed of fairer skin tones, which can result in less accurate diagnoses for patients with darker skin tones [8].Skin tones can affect the results and output based on AI model training, making it crucial to ensure that the datasets used to train these models are diverse and representative of the patient population [9].It is also crucial to compare the diagnostic accuracy of different models trained on diverse datasets to accurately assess their potential for clinical use.An imbalanced dataset during neural network training can lead to uneven performance in the analysis of non-invasive dermatologic imaging modalities using machine learning algorithms [10,11].Therefore, ethical considerations and guidance from the Food and Drug Administration (FDA) are necessary to ensure the safe and effective implementation of AI-based techniques in clinical settings.These considerations highlight the importance of comprehensively evaluating the current literature on the use of AI-based techniques to thoroughly understand the potential for AI in the non-invasive diagnosis of melanoma.This review aims to address these concerns and provides a better understanding of the current state of the field, with a goal of guiding the development of future research supporting the implementation of these techniques into clinical tools.
In this review, we seek to analyze and describe the current state of AI in non-invasive skin cancer detection and summarize the outcomes of various algorithms.This review provides a comprehensive analysis of the current state of AI in non-invasive skin cancer detection, highlights the clinical implications of various algorithms, and identifies areas of future research.Given the potential of these techniques to improve patient outcomes and reduce unnecessary invasive procedures, understanding the current state of the field is of paramount importance.

Literature Search Strategy
A literature search of PubMed/Medline, Embase, and Cochrane was used to search for papers published from 2018 to 2023, analyzing the use of AI-based techniques applied to images of malignant melanoma produced from reflectance confocal microscopy, optical coherence tomography, and dermoscopy.The PRISMA guidelines were followed during data extraction, analysis, and reporting.The following search terms were used: "melanoma", "neural network", "diagnosis or detection", "carcinoma", "lesion or growth or cancer or neoplasm or tumor or malignant or metastatic", "computer systems", "skin cancer detection", "digital pathology", "machine or deep learning", "algorithms", "artificial intelligence", "skin cancers", "diagnostic techniques and procedures", "dermoscopy", "reflectance confocal microscopy", and "optical coherence tomography".

Study Eligibility and Selection
The study followed the Preferred Reporting Items for Systematic Reviews and Metaanalyses (PRISMA) guidelines, the protocol was registered in the OSF database (registration number: osf-registrations-z8tve-v1), and the search results were sorted, screened, and manually assessed by two independent authors (R.H.P. and E.F.) Any discrepancies in study selection and inclusion criteria were resolved through the input of a third reviewer (J.L.)Only original, peer-reviewed research articles in the English language were selected for review.Studies that directly compared human experts or histopathology with the AIbased evaluation of non-invasive imaging for melanoma detection (dermoscopy, reflectance confocal microscopy, and optical tomography) were part of the inclusion criteria.Studies that did not directly diagnose melanoma, did not evaluate the use of AI-based approaches with RCM, OCT, and dermoscopy, involved commentary/editorials on the subject of AIbased approaches, and discussed models for lesion segmentation without classification were excluded from the review.Finally, studies that did not evaluate diagnostic accuracy, AUC, or sensitivity/specificity rates for the early detection of melanoma using AI-based techniques were excluded (Figure 4).
From the selected articles, the following information was collected and analyzed: general information regarding the paper (authors, publication year, and origin), the purpose/objective of the work, non-invasive imaging modality, evaluation metric (diagnostic accuracy, area under the curve, sensitivity and specificity rates), and AI methodology.

Study Analysis and Performance Metrics
The evaluation of the performance of AI-based approaches applied in combination with reflectance confocal microscopy, optical coherence tomography, and dermoscopy for the detection of melanoma was a critical aspect of this systematic review.The selection and extraction of appropriate performance metrics from the reviewed papers was crucial to accurately assess the diagnostic accuracy of the evaluated approaches.The three primary performance metrics chosen for this study were accuracy, sensitivity, and specificity.Accuracy is a widely used classification performance measure that indicates the percentage of accurately classified skin lesions, which is the ratio between the total number of correctly classified lesions and the overall number of examined lesions.However, accuracy may not be an appropriate metric when there is an imbalance in the distribution of target classes in the dataset.Therefore, sensitivity and specificity, which are unaffected by class imbalance, were also considered in this study.Sensitivity is a measure of the proportion of true positive cases, i.e., the number of cases correctly identified as melanoma, in relation to the total number of actual melanoma cases in the dataset.Specificity, on the other hand, measures the proportion of true negative cases, i.e., the number of cases correctly identified as non-melanoma, in relation to the total number of actual non-melanoma cases in the dataset.The mean sensitivity and specificity of papers comparing AI-based techniques with dermatologist performance was calculated in this study.For studies evaluating OCT and RCM, accuracy rates were studied due to the mutual availability of this metric in the reviewed papers.From the selected articles, the following information was collected and analyzed: general information regarding the paper (authors, publication year, and origin), the purpose/objective of the work, non-invasive imaging modality, evaluation metric (diagnostic accuracy, area under the curve, sensitivity and specificity rates), and AI methodology.

Study Analysis and Performance Metrics
The evaluation of the performance of AI-based approaches applied in combination with reflectance confocal microscopy, optical coherence tomography, and dermoscopy for the detection of melanoma was a critical aspect of this systematic review.The selection and extraction of appropriate performance metrics from the reviewed papers was crucial to accurately assess the diagnostic accuracy of the evaluated approaches.The three primary performance metrics chosen for this study were accuracy, sensitivity, and specificity.Accuracy is a widely used classification performance measure that indicates the percentage of accurately classified skin lesions, which is the ratio between the total number of correctly classified lesions and the overall number of examined lesions.However, accuracy may not be an appropriate metric when there is an imbalance in the distribution of target classes in the dataset.Therefore, sensitivity and specificity, which are unaffected by class imbalance, were also considered in this study.Sensitivity is a measure of the proportion of true positive cases, i.e., the number of cases correctly identified as melanoma, in Other performance metrics, such as the area under the curve (AUC), positive predictive value (PPV), and negative predictive value (NPV), were also reported and discussed, when available.AUC is a commonly used performance metric that provides a comprehensive evaluation of the performance of a classification model across all possible decision thresholds.PPV and NPV are metrics that indicate the proportion of positive and negative cases, respectively, that are correctly identified by the model.The performance metrics were used to analyze and compare the diagnostic accuracy of the evaluated AI-based approaches.The analysis of these metrics allowed for a comprehensive evaluation of the effectiveness of AI-based approaches in the detection of melanoma, as well as the identification of potential areas for improvement and future research.

Results
After initial screening, a total of 287 articles were assessed for eligibility.The PRISMA flow diagram summarizing the identification of studies is shown in Figure 1.Following a careful evaluation of their titles and abstracts, 40 articles were ultimately included in this systematic review.Most of the articles applied AI-based approaches to dermoscopic images (n = 37), followed by optical coherence tomography images (n = 1) and reflectance confocal microscopy images (n = 2) (Tables 1-3).

Artificial Intelligence-Based Approaches Applied to Dermoscopic Images
Deep learning has gained significant attention in the field of dermatology, particularly in the analysis of dermoscopy images.Dermoscopic images of melanoma have been widely used in the development and evaluation of deep learning-based models.The majority of articles in our review trained their algorithms on publicly available datasets of dermoscopic images of melanoma that were confirmed by either histopathology, follow-up examination, expert consensus, or confirmation by in vivo confocal microscopy.A summary of these publicly available datasets is shown in Table 4. Foahom Gouabou et al. presented a deep learning ensemble method to obtain the accurate computer-assisted diagnosis (CAD) of melanoma [12].This was undertaken through the evaluation of 1113 dermoscopic images of melanoma lesions extracted from a public dataset.The proposed framework achieved an area under the receiver operating curve (AU-ROC) of 0.93 for melanoma detection and outperformed similar existing methods within this task.Furthermore, this study also evaluated the decision-making process between the algorithm versus a trained dermatologist in distinguishing between benign keratosis and melanoma.The algorithm demonstrated enhanced efficiency and accuracy in diagnosing challenging-to-classify pigmented lesions for the task (p = 0.90 for melanoma), while also offering a transparent and unbiased decision-making process.Marchetti et al. presented one of the earliest studies, in 2016, comparing the diagnostic accuracy of dermoscopic melanoma images using deep learning algorithms with dermatologists [13].This study uniquely used five individual methods to combine individual automated predictions into "fusion" algorithms.Notably, the top fusion model outperformed eight experienced dermatologists.The average sensitivity and specificity of dermatologists in accurate classification (benign lesions vs. malignant) was 82% and 59%, respectively.The study found that the best fusion computer algorithm achieved a remarkable ROC area of 0.86, demonstrating a significant improvement over the mean ROC area of 0.71 observed among the eight readers in classification (p = 0.001).
Subsequently, in 2020, Marchetti et al. proposed a similar study in which the best performing algorithm considerably outperformed eight dermatologists and nine dermatology trainees (p < 0.001) [14].Xia et al. proposed a two-stage approach that identifies all lesions in an image, estimates their likelihood of malignancy, and generates an image-level likelihood for high-level screening [15].This strategy led to an AUC of 0.959 based on dermoscopic images, which was augmented to an AUC of 0.961 using a publicly available dataset from ISIC 2018.This two-stage model demonstrated satisfactory performance and could be used in a PCP triage for dermatology at scale for images concerning malignancy as a complete end-to-end system.Xin et al. proposed a novel transformer network, SkinTrans, which demonstrated 94.1% accuracy on a clinical dataset of 1113 melanoma dermoscopic images, performing better than traditional CNN models [16].
In a larger study utilizing several publicly available datasets, Singh et al. applied a segmentation model to four public datasets, PH2, ISIC 2017, ISIC 2018, and ISIC 2019, yielding an accuracy score of 99.50%, 99.33%, 98.56%, and 98.04%, respectively [17].The proposed method had a significantly higher score in sensitivity and specificity in the field of the diagnosis of melanoma lesions.In comparison, Kaur et al. proposed an automated melanoma classifier based on a deep convolutional neural network to accurately classify benign vs. malignant melanoma.Images were obtained from the ISIC 2016, ISIC 2017, and ISIC 2020 datasets, for which the proposed DCNN classifier achieved accuracy rates of 81.41%, 88.23%, and 90.42%, respectively [24].This model demonstrated favorable performance in comparison with other neural networks, while offering an advanced framework for the automation of the diagnostic process.
Naeem et al. proposed a deep learning-based framework for the multiclassification of skin cancer using dermoscopic images [18].For melanoma, the training set included 3166 images with a validation set of 452 images and a testing set of 904 images.This system achieved better performance as compared to four pre-trained classifiers, with accuracy of 92.18% for the classification of melanoma.It achieved an AUC of 0.9833, recall of 99.9%, precision of 92.21%, an f1-score of 91.37%, and accuracy of 92.21%.Lee et al. introduced a suite of deep neural network designs tailored to melanoma detection from dermoscopy images [19].This study produced sensitivity of 92.8%, a PPV of 78.5%, and an NPV of 91.2%.This model allows for the use of a pre-screening tool in the diagnostic process of melanoma while providing a fine balance between computational efficiency and accuracy.Fraiwan et al. utilized the HAM1000 dataset of dermoscopic images to classify melanoma vs. non-melanoma skin cancers [20].The melanoma class was detected with 71% sensitivity (i.e., recall) but 43.1% precision.The highest reported accuracy rate from this study was 76.7%.
Further, Martin-Gonzalez et al. developed a deep learning-based tool to differentiate between benign skin lesions and melanoma in the hospital setting [22].This was based on a dataset of 232 dermoscopic images, which were analyzed by the system.The nevus group had a significantly lower mean diagnostic threshold (27.12 ± 35.44%) compared to the melanoma group (72.50 ± 34.03%), with a p-value of less than 0.001.The area under the ROC curve was 0.813.Sensitivity of 0.691, specificity of 0.802, and accuracy of 0.776 were achieved at a diagnostic threshold of 67.33%.Of note, this study did not incorporate the use of public databases.Lu et al. proposed a deep learning-based classification system for melanoma detection using dermoscopic images from the HAM10000 dataset [23].This method offered an accuracy rate of 100% for the detection of melanoma, a sensitivity rate of 94.05%, and a precision rate of 97.07%.Vaiyapuri et al. developed a novel computational intelligence-based melanoma detection and classification approach using dermoscopic images with maximum accuracy of 97.50% [21].Arshad et al. similarly used the HAM10000 database to apply an automated framework for multiclass skin lesion classification, obtaining an accuracy rate of 91.7% [25].
Xing et al. proposed a novel Zoom-in Attention and Metadata Embedding (ZooME) melanoma detection network and applied this to the ISIC 2020 dataset composed of 33,126 dermoscopy images [26].This model achieved a 92.23%AUC score, 84.59% accuracy, 85.95% sensitivity, and 84.63% specificity.Thus, this model uniquely demonstrates a benefit in the extraction and utilization of unique pathological information from dermatoscopic images and in embedding various patient demographics for better prediction.Pham et al. introduced a novel technique that enhances melanoma prediction on an imbalanced dataset through a reconstructed CNN architecture and optimized algorithms [27].The training dataset, consisting of 17,302 melanoma and nevus images, is the largest dataset utilized in this context.A comprehensive comparison was conducted between the model's performance and that of 157 dermatologists from 12 university hospitals in Germany, all based on the same dataset.The findings revealed that the proposed approach outperformed all 157 dermatologists and demonstrated superior performance to the state-of-the-art method, achieving an impressive area under the curve of 94.4%, sensitivity of 85.0%, and specificity of 95.0%.
Another deep learning-based approach, proposed by Nawaz et al., evaluated on three standard datasets (ISBI 2016, ISIC 2017, and PH2), demonstrated that the presented method outperformed current approaches [29].This method achieved average accuracy of 95.40, 93.1, and 95.6% on the ISIC 2016, ISIC 2017, and PH2 datasets, respectively, highlighting its robustness in skin lesion recognition and segmentation.This approach combines faster region-based convolutional neural networks (RCNN) with fuzzy k-means clustering (FKM), allowing for considerable improvements in melanoma detection even in the presence of image artifacts such as hair, blood vessels, lighting variations, and noise.In their work, Kim et al. introduced a publicly available deep learning algorithm for the classification of malignant melanoma.The proposed approach leveraged skin lesion images and expert labeling outcomes obtained from convolutional neural networks [28].The U-Net model employed achieved a notable Dice similarity coefficient of 81.1% when compared to the expert labeling results.Moreover, the algorithm demonstrated high classification accuracy of 80.06% for malignant melanoma cases.These findings highlight the effectiveness of the proposed approach in accurately identifying and classifying malignant melanoma.
To overcome the problem of class imbalance in datasets, Sayed et al. proposed a new model for the classification of skin lesions using ISIC 2020 [30].This study utilized a hybrid version of a convolutional neural network architecture and bald eagle search (BES) optimization to solve this issue.This deep network model for the prediction of melanoma skin cancer used fewer parameters and achieved overall accuracy of 98.37%, with specificity of 96.47%, sensitivity of 100%, an f-score of 98.40%, and an area under the curve of 99%.These findings demonstrated that the proposed model was both robust and efficient, outperforming VGG19, GoogleNet, and ResNet50.
Moreover, Foahom-Gouabou et al. proposed an ensemble of convolutional neural networks (CNNs) with a directed acyclic graph to aggregate binary CNNs, resulting in the best balanced accuracy (76.6%) among multiclass CNNs and other related works on the ISIC 2018 public dataset [31].This method is noteworthy for its hierarchical workflow, which promotes transparency in the decision-making process and, as a result, simplifies the interpretation for dermatologists.Alsaade et al. aimed to develop a system for the diagnosis of skin cancer using deep learning and traditional AI machine learning algorithms [32].
The system was evaluated on dermoscopy images from two datasets, PH2 and ISIC 2018.The proposed method outperformed state-of-the-art methods for both datasets, with the artificial neural network (ANN) model achieving the highest accuracy of 97.50% for PH2 and 98.35% for ISIC 2018 compared to the convolutional neural network (CNN) model.
Two papers developed similar models based on the ISIC 2017 dataset.Iqbal et al. developed a deep learning model for the automated multiclass classification of skin lesions using dermoscopic images from ISIC databases [33].The proposed deep convolutional neural network (DCNN) approach outperformed state-of-the-art algorithms, achieving 94% precision, 93% sensitivity, and 91% specificity in ISIC-2017 with a 0.964 AUROC.This proposed approach provides a feasible way to automate and expedite the skin lesion classification task, potentially saving effort, time, and human lives.Similarly, Acosta et al. developed a deep learning-based approach that involves a two-stage process using mask and region-based convolutional neural networks (CNNs) and a ResNet152 structure for lesion classification [34].The model achieved an accuracy and balanced accuracy increase of 3.66% and 9.96%, respectively, compared to the best results reported in the 2017 International Symposium on Biomedical Imaging Challenge dataset.This model boasts an excellent balance between overall accuracy (0.904), sensitivity (0.820), and specificity (0.925).
In 2017, Tognetti et al. aimed to develop a deep convolutional neural network, called iDCNN_aMSL, to support dermatologists in the differentiation of early melanoma from atypical nevi using dermoscopic images and clinical data [35].The model was compared to the intuitive diagnoses of dermatologists with different experience levels and achieved the best accuracy, reducing the ratio of inappropriate excision.This model achieved an area under the curve of 90.3%, sensitivity of 86.5%, and specificity of 73.6% compared to the intuitive diagnoses of dermatologists (sensitivity of 77% and specificity of 61.4%), and it can provide valuable assistance to dermatologists in making informed medical decisions that can help to reduce the number of unnecessary excisions.Guo et al. proposed a deep convolutional neural network trained with both cross-entropy and covariance discriminant loss [37].This approach improves the model outputs and extracted features simultaneously, and a new embedding loss is designed to separate the features of melanoma and nonmelanoma images more effectively.The proposed method achieved sensitivity of 0.942 and 0.917 on the ISBI 2018 Skin Lesion Analysis dataset, demonstrating its efficacy in melanoma recognition.R K et al. also presented a deep convolutional neural network for automated melanoma detection that can adapt to different hardware and software limitations [38].The network was trained on dermoscopic skin images from open sources and achieved high average values for accuracy, sensitivity, and specificity of 82.95%, 82.99%, and 83.89%, respectively, when tested on a dataset of 2150 images.Minagawa et al. compared the performance of 30 Japanese dermatologists with a deep neural network (DNN) for the dermoscopic diagnosis of skin tumors in different datasets [39].Interestingly, the study found that the dermatologists' sensitivity for malignancy prediction was significantly higher for the Shinshu dataset (Japanese only) compared to the ISIC dataset (predominately fair-skinned), and the DNN had higher specificity than the human readers.The study suggests that a DNN may help to improve the diagnostic performance for skin tumors in patients with different skin types and may be more effective at identifying malignant features after initial screening by dermatologists.
Further, Nasiri et al. presented an approach for the early detection of melanoma using deep learning in a case-based reasoning (CBR) system [40].The system, called DePicT Melanoma Deep-CLASS, utilizes a convolutional neural network (CNN) composed of sixteen layers to classify skin lesions as benign or malignant melanoma.The efficiency of this system was demonstrated using the ISIC Archive dataset and the proposed method was shown to be effective in malignancy detection, with validation on 1796 dermoscopy images.Winkler et al. aimed to investigate the diagnostic performance of a deep learning convolutional neural network (CNN) for melanoma diagnosis across different melanoma localizations and subtypes [41].Each set included 30 melanomas and 100 benign lesions of related localizations and morphology.The study found that the CNN showed high-level performance in superficial spreading melanomas, nodular melanomas, and lentigo maligna melanomas and facial solar lentigines/seborrhoeic keratoses/nevi.In acrolentiginous melanomas, the CNN's sensitivity was lower, but high specificity was noted.
Phillips et al. conducted a study with the aim of assessing the accuracy of a neural network called Deep Ensemble for Recognition of Melanoma (DERM) in detecting malignant melanoma from dermoscopic images of pigmented skin lesions [42].The DERM model was trained and tested using a dataset consisting of 7102 dermoscopic images encompassing both melanoma and benign lesions.The findings revealed that DERM achieved an impressive area under the ROC curve (AUC) of 0.93, along with sensitivity of 85.0% and specificity of 85.3%.Additionally, the study conducted a comprehensive meta-analysis that examined the accuracy of naked-eye examination, with or without dermoscopy, performed by specialist and general physicians.The meta-analysis demonstrated that primary care physicians achieved an AUC of 0.83, with sensitivity of 79.9% and specificity of 70.9%, while dermatologists achieved an AUC of 0.91, with sensitivity of 87.5% and specificity of 81.4%.The study suggests that DERM has the potential to be used as a decision support tool in primary care by providing dermatologist-grade recommendations on melanoma likelihood.Following their previous work, the same research group conducted a subsequent study to evaluate the accuracy of an artificial intelligence algorithm in detecting melanoma in dermoscopic images of skin lesions captured using smartphones and digital single-lens reflex (DSLR) cameras [51].The algorithm achieved an impressive area under the receiver operator characteristic (ROC) curve of 91.8% when compared to histopathological diagnosis.Notably, at 100% sensitivity, the algorithm achieved specificity of 64.8%, while clinicians achieved specificity of 69.9%.
Several studies have gone beyond the traditional binary perspective, presented by Brinker et al., Haenssle et al., and Marchetti et al., in an effort to undertake multiclass classification tasks [43,44,46].In 2019, Brinker et al. compared the diagnostic accuracy of a deep convolutional neural network (CNN) with that of dermatologists in the classification of melanoma and nevi images [43].A total of 804 dermoscopic images were presented to nine dermatologists, and the CNN achieved higher sensitivity and specificity in lesion classification than the dermatologists.The study concluded that automated dermoscopic melanoma image classification was significantly superior to both junior and board-certified dermatologists.Subsequently, Brinker et al. trained another deep learning algorithm on open-source images, which outperformed 136 of 157 dermatologists in classifying dermoscopic melanoma images [44].The algorithm was measured in terms of sensitivity, specificity, and receiver operating characteristics and achieved higher scores than dermatologists of all levels of experience.
Similarly, in a 2020 study by Marchetti et al., computer algorithms from an international melanoma detection challenge outperformed eight dermatologists and nine dermatology residents in diagnosing melanoma from dermoscopy images, with an area under the receiver operating characteristic curve of 0.87 compared to 0.74 and 0.66, respectively [14].The algorithm had superior specificity at the dermatologists' overall sensitivity, and the imputation of algorithm classifications increased the dermatologists' sensitivity and specificity.Although the study's artificial setting lacked the full spectrum of skin lesions and clinical metadata, the results suggest that deep neural networks have the potential to improve human performance in skin lesion classification.
In one of the earliest studies that uniquely provided additional clinical information to dermatologists within the reader study, Haenssle et al. compared the diagnostic performance of a deep learning convolutional neural network (CNN) for dermoscopic melanoma recognition to that of 58 dermatologists, including 30 experts [46].The CNN had higher specificity than the dermatologists at their respective sensitivity and achieved a greater area under the curve (AUC) for receiver operating characteristics (ROC) than the dermatol-ogists.The study suggests that dermatologists may benefit from assistance by a CNN's image classification.
Hagerty et al. proposed a unique fusion approach of conventional image processing and deep learning to diagnose melanoma dermoscopy images [45].The fusion technique combines three handcrafted, biologically inspired image processing modules and one clinical information module with a ResNet50 deep learning network.Using logistic regression to ensemble the scores from both processing arms, the fusion technique achieved classification accuracy of 0.94, compared to 0.87 for the deep learning classifier alone and 0.90 for the conventional image processing classifier alone.The study suggests the further investigation and development of fusion techniques for melanoma diagnosis.Li et al. proposed two separate deep learning methods to address the challenges in the accurate recognition of melanoma in dermoscopy images [47].The methods included lesion segmentation, lesion dermoscopic feature extraction, and lesion classification tasks.The proposed deep learning frameworks achieved promising accuracy of 0.753 for task 1, 0.848 for task 2, and 0.912 for task 3 on the ISIC 2017 dataset.This study suggests that the reliable automatic detection of skin tumors using deep learning networks can increase the accuracy and efficiency of pathologists.
In contrast with the majority of studies in this section, Gareau et al. developed a transparent machine learning technology to differentiate melanoma from nevi in dermoscopy images, and an interface for sensory cue integration [36].The interpretable machine learning algorithm outperformed the leading deep learning approach 75% of the time.In lieu of using a deep learning-based approach, this study aimed to provide a transparent approach that results in greater medical accountability and confidence than a CNN.

Artificial Intelligence-Based Approaches to Analysis of Reflectance Confocal Microscopy (RCM) Images
Reflectance confocal microscopy (RCM) has emerged as a valuable tool in dermatology for the diagnosis of skin tumors, including melanoma and non-melanoma skin cancers, as well as for the interpretation and management of inflammatory and infectious skin diseases [52].This technique allows for the real-time in vivo visualization of the epidermis to the papillary dermis, with comparable resolution to histology [53].By allowing the dermatologist to essentially perform a "virtual biopsy" of skin lesions, it can aid in reducing unnecessary invasive procedures and skin biopsies [54].RCM can increase the specificity and sensitivity of diagnosis in most skin cancers, guide biopsy in suspicious lesions, and aid in mapping the margins of large tumors for proper surgical excision [52].It can also be used to follow the clinical and therapeutic evolution of skin diseases.
Despite its benefits, the subjectivity of this method and the reliance on user interpretation create the potential for variations in diagnosis [52].This opens up an opportunity for AI in allowing for the better delineation of the dermal-epidermal junction in RCM images, while also expediting the diagnostic process and aiding the novice, untrained user.By providing an objective analysis, AI applied to RCM images can allow for a reduction in human error, improvement in diagnostic accuracy, and enhancement in the efficiency and reliability of diagnoses.In 2019, Wodzinski et al. proposed a convolutional neural network approach to classify skin lesions using reflectance confocal microscopy (RCM) mosaics [49].The dataset consisted of 429 RCM mosaics and was divided into three classes: melanoma, basal cell carcinoma, and benign naevi.The test set classification achieved higher accuracy as compared to medical, confocal users.This classification system provides a useful tool for the early, non-invasive detection of melanoma.In 2021, D' Alonzo presented a weakly supervised machine learning model for the segmentation of RCM mosaics into "benign" and "aspecific" regions [50].By using a deep neural network, the trained model achieved an average area under the curve of 0.969 and a Dice coefficient of 0.778, showing the feasibility of the spatial localization of aspecific regions in RCM images, making the diagnostics decision model more interpretable to clinicians.

Artificial Intelligence-Based Approaches Applied to Optical Coherence Tomography (OCT) Images
Optical coherence tomography (OCT) is an interferometric imaging method, originally used for optical imaging, that has since been translated to cutaneous evaluations.This imaging method provides real-time views of the superficial layers of the skin [55].It uses an infrared broadband light source to provide a view into the skin up to a depth of 1-2 mm, with a resolution between 15 and 3 µm depending on the system used [55].OCT can be used to evaluate non-melanoma skin cancers, inflammatory diseases, and parasitic infestations, as well as for the investigation of nails [56].OCT provides a quick and useful diagnostic imaging technique that can be used alone or in combination with other noninvasive imaging tools like dermoscopy, high-frequency ultrasound, and confocal laser scan microscopy [57].
While OCT has been utilized as an imaging modality for the diagnosis of basal cell carcinoma and other keratinocyte carcinomas, its use in the diagnosis of malignant melanoma remains under investigation.Initial studies have shown that OCT can detect the diagnostic features of melanoma, such as epidermal psoriasiform hyperplasia, melanocytic nests, and vertical icicle-shaped structures, and may be useful in the preoperative risk stratification of patients with melanoma [58].However, while OCT shows promise in the diagnosis of malignant melanoma, its sensitivity and specificity as a standalone diagnostic tool are not yet convincing [59].Angiographic OCT shows potential in the diagnosis and staging of melanoma, as it can detect early changes in vessel morphology from dysplastic nevi to melanoma [60].Although traditional OCT has not yet been applied to AI-assisted diagnosis for melanoma, a similar protocol using vibrational OCT was described in one paper in our systematic search.While traditional OCT uses near-infrared light to create detailed, highresolution images of biological tissues, vibrational OCT combines traditional OCT with a mechanical vibration stimulus to measure the vibrational properties of tissue structures, providing additional information about tissue composition and function.This technique can be used to image and differentiate between certain tissue features, such as the presence of collagen or elastin.This added feature can be useful in distinguishing melanoma from benign moles.Thus, while traditional OCT provides high-resolution images and an evaluation of the thickness and structure of melanoma lesions, vibrational OCT may offer additional information that can aid in the differentiation between melanoma and benign nevi.
In 2022, Silver et al. explored the use of vibrational optical coherence tomography (VOCT) along with machine learning to differentiate between normal skin and different skin cancers non-invasively [48].The results demonstrated that machine learning, along with the height and location of the VOCT mechanovibrational peaks, can be used to differentiate between normal skin and different cancerous lesions, including melanoma, with sensitivity and specificity rates of 83.3% and 77.8% [48].

Discussion
To the best of our knowledge, this systematic review obtained the largest number of recent studies (n = 40) published regarding the application of AI-based models to the non-invasive imaging of melanoma.The vast majority of these studies applied a deep learning-based model on dermoscopic images, as research involving AI-based OCT and RCM for the early detection of melanoma continues to develop.These non-invasive modalities differ significantly in the diagnostic information that they offer and the technical requirements for their handling, including the type of features, image size, and necessary preprocessing.Current state-of-the-art work in the field of AI-based melanoma detection primarily involves the application of deep learning models to dermoscopic images.The unique contribution of our review lies in its comprehensive analysis of recent studies and its emphasis on the potential of AI for early melanoma detection using three of the major non-invasive imaging modalities.
The main question addressed by our review is whether the application of AI-based models to the non-invasive imaging of melanoma is feasible, efficient, and beneficial compared to modern day standards, and whether this application provides for the early detection of melanoma.Consequently, this systematic review sought to address this gap by emphasizing the potential of deep learning techniques for early melanoma detection and the importance of further research efforts in this area.Since the development of these deep learning models and the datasets used for their training differ, a direct comparison between them was impractical.Hence, our discussion focused on the clinical utility of these results and established the groundwork for future studies of AI development for the evaluation of melanoma.

Clinical Utility and Perceptions of AI in Dermatology
Among the 40 studies included within this systematic review, all showed robust performance in melanoma identification through AI-based algorithms.The majority of the studies were based on dermoscopic image sets as OCT and RCM datasets have not yet been widely applied to AI due to the lack of many public datasets for these categories.Studies within the dermoscopic image category primarily utilized deep learning; however, some papers applied a fusion method or conventional machine learning to datasets [13,45].In studies directly comparing algorithm performance on dermoscopy images to dermatologists, AI-based algorithms achieved a higher ROC (>80%) in the detection of melanoma [27,39,43,44,46].In these comparative studies using dermoscopic images, the mean algorithm sensitivity was 83.01%and the mean algorithm specificity was 85.58%.Thus, these comparative studies indicate the great potential that AI holds in diagnosis based on dermoscopic images of melanoma.While AI-based clinical OCT has shown promising results in the diagnosis of keratinocyte carcinomas, its use in the diagnosis of malignant melanoma is still in its infancy.AI-based algorithms have not yet been applied to traditional OCT; however, the potential of using vibrational optical coherence tomography (VOCT) with machine learning to improve the accuracy of melanoma diagnosis remains promising [48].Studies evaluating the use of AI on RCM images of melanoma have also shown promising results in improving the accuracy and efficiency of diagnosis and allowing for the potential reduction of unnecessary biopsies [49,50].AI also has the potential to increase the accuracy and reproducibility of results.These advances can have significant implications for patients, as well as clinicians, by providing earlier, non-invasive detection of melanoma and other skin cancers, leading to better outcomes and reduced healthcare costs [61].
The use of AI within the medical field continues to rapidly grow.Within the field of dermatology, where the early diagnosis and treatment of melanoma is key, AI can be greatly beneficial in addressing various diagnostic barriers.In one study seeking to explore how patients view AI and perceive its use in skin cancer screening, patients appeared to be receptive to the use of AI for skin cancer screening under the condition that the physician-patient relationship was preserved [62].Given the need for heterogenous clinical and photographic data to expand the potential for AI, understanding patients' perceptions of this tool is essential.
In our review, most papers reviewing dermoscopic images for AI-assisted diagnosis of melanoma utilized the International Skin Imaging Collaboration (ISIC) publicly available dataset.Although this robust dataset has a large number of dermoscopic images, it mainly encompasses data on lesions from light-skinned patients in the western world (United States, Europe, and Australia), with reduced representation of skin lesions from Asian or other darker-skinned populations [11].Thus, the future development of public datasets or clinical collection of dermoscopic images should focus on expanding datasets to include images from all skin tones in order to develop more powerful algorithms that can be reliably applied to all ethnicities.Achieving diverse datasets that are representative of skin lesions seen in clinical practice across the world can also help to realize the full potential of AI in countries that may not have easy access to dermatological care.
The future integration of AI into clinical platforms can aid in reaching populations such as rural communities, which already suffer from disparities in melanoma incidence and higher mortality in certain parts of the U.S. [63,64].By combining these techniques with dermoscopy and visual inspection, the diagnosis of melanoma can be made more efficient and accessible in areas where dermatologists are not readily available.

Ethical Implications
The integration of AI in dermatology necessitates the consideration of ethical, legal, and patient privacy issues [65].Although the use of AI could potentially be helpful in various areas of a traditional dermatology practice, ranging from triaging patients to allowing for the use of non-invasive diagnosis or aiding in the early detection of skin cancer, its potential flaws must be recognized and taken into account.As many AI models have been trained mainly on European or East Asian populations, the relative lack of training on darker skin pigmentation may limit the overall diagnostic accuracy [8].Imbalanced datasets may thus be susceptible to outputting incorrect results and could result in theoretical consequences for both the physician and patient if treatment or skin surgery is undertaken without a supplemental diagnosis [27].On the contrary, AI-based algorithms may also be susceptible to false negatives, as demonstrated through our review, where the sensitivity and specificity are not 100%.Thus, it may be important to notify and educate patients about AI and its use in their diagnosis if undertaken.The potential flaws of AI also emphasize its role in serving as a supplement or diagnostic aid in the clinic, and not as a replacement for board-certified dermatologists.As AI continues to grow beyond medicine, it may be necessary to launch future public health educational campaigns to educate patients about its use and impact on their medical care.

Limitations
As many of the papers evaluated AI-based algorithms on different datasets, a direct comparison of the efficacy of the various methods was not feasible.Further, a meta-analysis of these studies was not plausible due to the variability in performance metrics among studies.The QUADAS-2 tool, which is typically used to analyze diagnostic accuracy studies, could not be applied due to incomplete information about certain domains in the reviewed papers.Additionally, while deep learning was the primary algorithm used, differences in image processing and reporting standards can make it difficult to draw definitive conclusions.The consideration of methodological improvements for future studies should include establishing standardized reporting and evaluation methods and considering diverse datasets.Various factors can enhance the reproducibility for future datasets.This includes the sharing of code, models, and image datasets.The challenges of new and emerging data can be addressed by continuously updating and expanding the datasets used for training and validation, including diverse images representing various skin types and populations.
Moreover, the generalizability of these algorithms poses an important issue.The majority of studies focused on patients with fairer skin types; thus, the translation of the studies' findings to darker skin or patients with other skin variations may be a challenge.To overcome this issue, future studies should aim to include diverse datasets representing a wide range of skin tones and ethnicities.Artifacts such as the presence of hair in the image or skin texture differences can impact the accuracy of AI-based algorithms and need to be considered when implementing them in clinical practice.Image quality is another important factor to consider when implementing AI-based algorithms in clinical practice.Differences in image acquisition and quality, such as zoom level, focus, lighting, or surgical ink markings, can introduce additional bias and impact the accuracy of AI-based algorithms.These issues must be addressed and overcome to realize the full potential of AI-based techniques for the early detection of melanoma.By expanding the diversity of datasets, the power and reliability of AI-based algorithms can be improved, ensuring their applicability across different populations and clinical settings.
It is also important to note that our systematic review may be subject to publication bias.Studies that report favorable results for AI-based algorithms are more likely to be published, while studies with lower accuracy rates or higher false positives or negatives may be less likely to be reported.Lastly, studies directly comparing AI-based algorithms to dermatologists or comparing OCT and RCM with AI application were found to be limited due to the early stages of these techniques.In summary, while our review highlights the potential of AI in dermatology, it is important to recognize and address these limitations before implementing AI-based algorithms in clinical practice.Further research is needed to better understand the risks of potential biases and limitations of these techniques and to develop standardized reporting and evaluation methods for consistent and reliable results.

Conclusions
Our systematic review of 40 recent studies highlights the great potential of AI-based models for the non-invasive imaging of melanoma.The vast majority of the studies utilized deep learning models on dermoscopic images, demonstrating either the equal or superior performance of AI-based algorithms in detecting melanoma when compared to existing methods.Comparative studies of algorithm performance on dermoscopy images versus dermatologists revealed a higher ROC (>80%) for AI-based algorithms, with mean sensitivity of 83.01% and mean specificity of 85.58%.While the use of AI in clinical OCT for malignant melanoma diagnosis is still in its infancy, studies evaluating the use of AI on RCM images of melanoma have shown promising results in improving diagnosis accuracy and efficiency, allowing for the potential reduction of unnecessary biopsies.Studies evaluating machine learning in conjunction with OCT yielded accuracy of 95%, while studies evaluating RCM demonstrated a mean accuracy rate of 82.72%.The integration of AI into clinical platforms can have a significant impact on areas with disparities in melanoma incidence and higher mortality rates, such as rural communities or developing countries.However, ethical, legal, and patient privacy issues related to the use of AI must be considered before clinical integration.To date, this is the first review to synthesize the outcomes of the AI-trained analysis of melanoma detection using non-invasive imaging modalities.The information synthesized herein will aid in the future generation of AI-based diagnostic tools for melanoma while allowing for algorithm optimization.Further research should expand the datasets to include images from all skin tones and ethnicities, to develop more powerful, reliable, and clinically applicable algorithms.Our findings underscore the importance of continued efforts in the development and evaluation of AI-based techniques for the early detection of melanoma.

Figure 1 .
Figure 1.Reflectance confocal microscopy mosaic of melanoma in situ taken at the dep dermal-epidermal junction.Findings include an atypical ringed pattern with non-edged and a non-specific pattern.There is the presence of pleomorphic atypical melanocytes (ro dendritic types) with visible dark nuclei and bright cytoplasm, as well as bundles of atypi nocytes characteristic for melanoma.Image size: 1 mm × 1 mm.

Figure 1 .
Figure 1.Reflectance confocal microscopy mosaic of melanoma in situ taken at the depth of the dermal-epidermal junction.Findings include an atypical ringed pattern with non-edged papillae and a non-specific pattern.There is the presence of pleomorphic atypical melanocytes (round and dendritic types) with visible dark nuclei and bright cytoplasm, as well as bundles of atypical melanocytes characteristic for melanoma.Image size: 1 mm × 1 mm.

Figure 2 .
Figure 2. Reflectance confocal microscopy mosaic of melanoma in situ taken at depth of th mis.There is an irregular cobblestone pattern, the presence of folliculotropism, and ple pagetoid cells (dendritic and round types) with numerosity greater than 10 per square m Image size: 1 mm × 1 mm.

Figure 2 .Figure 3 .
Figure 2. Reflectance confocal microscopy mosaic of melanoma in situ taken at depth of the epidermis.There is an irregular cobblestone pattern, the presence of folliculotropism, and pleomorphic pagetoid cells (dendritic and round types) with numerosity greater than 10 per square millimeter.Image size: 1 mm × 1 mm.Cancers 2023, 15, x FOR PEER REVIEW

Figure 3 .
Figure 3. Dermoscopy of melanoma in situ demonstrating an asymmetric reticular pattern, atypical thickened network, and blue-white structures.Due to the anatomical location of the lesion (on the temple), there is the presence of perifollicular hyperpigmentation, rhomboid structures, and effacement.Scale bar in the image represents 1 cm.

Figure 4 .
Figure 4. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 Flow Diagram.

Figure 4 .
Figure 4. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 Flow Diagram.

Table 1 .
Review of AI-based techniques for dermoscopy image analysis.

Table 2 .
Review of AI-based techniques for optical coherence tomography image analysis.

Table 3 .
Review of AI-based techniques for reflectance confocal microscopy image analysis.

Table 4 .
Summary of publicly available datasets for dermoscopic images of melanoma.