Review

Artificial Intelligence in Dermatology: A Review of Methods, Clinical Applications, and Perspectives

by Agnieszka M. Zbrzezny 1,2,* and Tomasz Krzywicki 1
1 Faculty of Mathematics and Computer Science, University of Warmia and Mazury, 10-710 Olsztyn, Poland
2 Faculty of Design, SWPS University, Chodakowska 19/31, 03-815 Warsaw, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7856; https://doi.org/10.3390/app15147856
Submission received: 31 May 2025 / Revised: 10 July 2025 / Accepted: 11 July 2025 / Published: 14 July 2025
(This article belongs to the Special Issue Machine Learning in Biomedical Sciences)

Abstract

The use of artificial intelligence (AI) in dermatology is growing rapidly, but a comprehensive overview integrating regulatory, ethical, validation, and clinical issues is lacking. This work aims to review current research, map applicable legal regulations, identify ethical challenges and methods of verifying AI models in dermatology, assess publication trends, compare the most popular neural network architectures and datasets, and identify good practices in creating AI-based applications for dermatological use. A systematic literature review is conducted in accordance with the PRISMA guidelines, utilising Google Scholar, PubMed, Scopus, and Web of Science and employing bibliometric analysis. The analysis shows exponential growth in deep learning research in dermatology since 2016 and reveals gaps in EU and US regulations, as well as significant differences in model performance across datasets. The decision-making process in clinical dermatology is analysed, focusing on how AI augments skin imaging techniques such as dermatoscopy and histology. We further demonstrate how AI supports dermatologists by automatically analysing skin images, enabling faster diagnosis and the more accurate identification of skin lesions. These advances enhance the precision and efficiency of dermatological care, showcasing the potential of AI to accelerate diagnosis in modern dermatology. We then discuss the regulatory framework for AI in medicine, as well as the ethical issues that may arise. Additionally, this article addresses the critical challenge of ensuring the safety and trustworthiness of AI in dermatology, presenting classic examples of safety issues that can arise during its implementation. The review provides recommendations for regulatory harmonisation, the standardisation of validation metrics, and further research on data explainability and representativeness, which can accelerate the safe implementation of AI in dermatological practice.

1. Introduction

Artificial intelligence (AI) aims to simulate human cognitive function, representing a paradigm shift in healthcare. In the future, the increasing complexity and growth of healthcare data will necessitate the development of more AI applications. Healthcare providers and biomedical companies are utilising multiple AI types [1]. AI dates back to the 1950s, when British mathematician Alan Turing questioned whether machines could think. Over a decade later, AI was integrated into the life sciences [2], and, in the 1970s, it began to enter the healthcare industry [3].
AI increased its presence in clinical settings in the 1980s and beyond through artificial neural networks, Bayesian networks, and hybrid intelligent systems. Healthcare only recently became the dominant industrial application of AI in aggregate equity funding [4]. Physical AI (e.g., surgical robots in the real world) and virtual AI (e.g., intelligent applications in a virtual environment) are now utilised to support patients and healthcare professionals.
Dermatology relies on morphological characteristics, with the majority of diagnoses based on the recognition of visual patterns, making it ideally suited to assisted diagnosis using AI image recognition capabilities. Dermoscopy, very high-frequency (VHF) ultrasound, and reflectance confocal microscopy (RCM) are examples of skin imaging technologies. The integration of AI into dermatology has gained significant attention recently, revolutionising the field and enhancing patient care [5]. For example, AI has enabled patients with skin lesions to access applications for self-checking (e.g., [6]). The cited application reports 943,569 AI Dermatologist users, 2,935,707 online checks conducted, and 35,755 skin diseases identified.
The history of AI in dermatology dates back to the early 2000s, with numerous advancements made since then. The field has seen significant breakthroughs thanks to AI, which has transformed how skin problems are identified, treated, and managed [7,8]; for example, AI systems have outperformed dermatologists at diagnosing skin cancer in controlled studies and help to improve doctors' work in cooperation with the healthcare system [9]. Applying AI technology in dermatology can enhance diagnostic precision, improve clinical throughput, and yield better outcomes for patients.
In the early 2000s, researchers began exploring the application of machine learning algorithms in dermatology. Artificial neural networks (ANNs) have been developed over recent decades for various medical applications, although their adoption within dermatology still leaves room for improvement. The primary application of ANNs in dermatology is in distinguishing between benign and malignant pigmented lesions in vivo [10]. In 2006, the concept of "deep learning" and its training method were introduced [11]. Deep learning enables computational models with multiple processing layers to discover data representations with multiple levels of abstraction. Deep learning, a subfield of machine learning, has shown significant success in dermatology over the past few years. Such models, including convolutional neural networks (CNNs), can directly learn intricate patterns and characteristics from vast datasets.
The review aims to address several key research questions related to the application of artificial intelligence (AI) in dermatology.
We present the types of AI methods and algorithms used in dermatology. We examine the basic AI methods employed in dermatological diagnostics, with a particular focus on deep learning and neural networks. We emphasise that deep neural networks (e.g., convolutional neural networks) have become key for the advanced analysis of skin images and the automatic interpretation of skin lesions. We also discuss other machine learning approaches (e.g., supervised, unsupervised, and reinforcement learning methods) as a context for the development of AI in dermatology.
We also explore clinical areas of dermatology where AI is utilised and examine its main achievements. We examine how AI improves the decision-making process in clinical dermatology. Additionally, mobile and telemedicine applications are discussed, which, thanks to AI, allow patients to perform initial assessments of skin lesions. Such solutions demonstrate how AI enhances the accessibility of dermatological care, supporting both patients and doctors in clinical practice.
We focus on the ethical challenges that arise from implementing AI in dermatology. We explore the ethical issues associated with the use of AI in skin diagnostics. The issue of data privacy and informed patient consent is addressed. Dermatological images contain sensitive data; therefore, it is necessary to obtain explicit patient consent for the use of their images and ensure the anonymisation and security of these data. Algorithm bias and access fairness are also discussed: models trained primarily on images of people with light skin may achieve lower diagnostic accuracy in individuals with darker skin, thereby deepening inequalities in care. Another dilemma is legal responsibility and accountability—it is not clear who is responsible in the event of an incorrect diagnosis generated by AI (whether the fault lies with the software manufacturer, the doctor, or the medical facility), which gives rise to the need to clarify the legal rules in this area. We emphasise that, to maintain the trust of patients and doctors in new technologies, the transparency of algorithms (explainability) is necessary, as well as considering these ethical aspects from the outset, at the stage of designing and implementing AI systems.
We also consider how AI models are verified in dermatology and the difficulties that they present. This work analyses methods of validating AI models and the challenges associated with testing them before clinical use. The quality of training data significantly impacts the credibility of models—limited diversity in skin images can lead to the overtraining of models and a decrease in their effectiveness outside laboratory conditions. The lack of full transparency (“black box”) in neural network operation makes it challenging to understand the basis of algorithms’ decisions, thereby complicating their reliable verification in clinical conditions. This article emphasises the need for the thorough testing of models, including testing their resistance to adversarial attacks (intentional image distortions) and assessing the explainability of their decisions, before they are implemented in routine diagnostics. However, note that the lack of uniform standards and legal requirements makes it difficult to enforce such detailed validation; legislation has not kept pace with the rapid development of AI, which means that manufacturers are not always obliged to test their models fully. In response, recommendations are proposed, including standardised model evaluation measures.
The objectives of this paper are divided into two parts due to its cross-sectional and review nature. The main contribution is based on the methodology (Section 4) and focuses on a literature review on the aspects of (1) legal regulations, (2) verification, and (3) ethical issues regarding AI in dermatology, as well as (4) a bibliometric analysis of publications on AI in dermatology over the last 15 years, (5) an ongoing review of clinical applications, including mobile applications, and (6) an analysis of the neural network architectures and datasets used in the most frequently cited studies on skin diagnostics. This work is complemented by (7) a presentation of the main achievements in dermoscopic and histopathological diagnostics and (8) the formulation of recommendations for future research and technological development in dermatology.
Section 2 describes the main machine and deep learning methods, including convolutional neural networks and transfer learning techniques, used to analyse skin images. Section 3 presents the process of modelling decision systems in clinical cases. Applications in diagnostic imaging are discussed, beginning with dermoscopy and then histopathological microscopy, with measures for model evaluation presented. Section 4 describes the methodology of the systematic review, including the PRISMA protocol, the inclusion and exclusion criteria, the literature search strategy (Google Scholar, PubMed, Scopus, and WoS), the bibliometric analysis, and a comparison of the most popular neural network architectures and datasets used. Section 5 assesses the main challenges, ranging from model transparency issues to data security concerns, EU/FDA compliance, and ethical considerations. Section 6 presents a bibliometric analysis of articles on AI in dermatology. Section 7 provides an overview of AI applications in dermatology, its application in clinical practice, and the main directions of development. Section 8 presents a comparative analysis of the neural network architectures and datasets used in skin diagnostics, highlighting key quality metrics (accuracy, AUC, sensitivity, specificity) of the selected, most cited models, as well as the characteristics of the main image databases. Section 9 summarises the most important observations from the conducted review and formulates specific recommendations for future research and the implementation of AI technologies in dermatology. In Section 10, we summarise the other reviews’ perspectives.

2. AI Methods

Artificial intelligence (AI) refers to the reproduction by digital machines of behaviours characteristic of humans, such as recognising images, generating text, or determining routes in unfamiliar areas. Over the past few years, various methods have been employed for this purpose, including rough sets [12], rough mereology [13], fuzzy sets [14], and machine learning [15]. Current AI achievements [16] are made possible by robust mathematical structures, such as artificial neural networks (ANNs) [17] and deep learning [18], which are subsets of methods from the machine learning (ML) canon. Thanks to their ability to match complex hidden patterns in data, ANNs are currently the most widely used method for achieving artificial intelligence in both science and the IT industry. Machine learning and deep learning (DL) are mainstream in the world of learning systems.

2.1. Machine Learning

At the core of this approach are mathematical methods from optimisation and statistics, which form the basis of heuristics for training mathematical models that provide simplified, approximate descriptions of reality.
We can distinguish three main streams in machine learning: supervised learning [19], unsupervised learning [20], and reinforcement learning [21] (Table 1).
The training of mathematical models can be summarised as the repeated processing of samples in the training data, which, in dermatology, are typically images. In the case of supervised learning, training samples must be given labels in the form of decisions assigned by a human domain expert. In this setting, the model is trained by minimising an error that quantifies the differences between the decisions it assigns and those assigned by the expert (e.g., lesion type). In the case of unsupervised learning, samples are processed without assigned labels, and the training process involves searching for patterns and relationships within the data, such as grouping similar skin lesions without predefined categories. Reinforcement learning differs from both of its predecessors: it involves teaching an agent to navigate an abstract environment by earning rewards or punishments for correctly and incorrectly executed moves. This approach may have particular potential in systems supporting diagnosis and treatment over time.
All of the above types of machine learning involve repeating the training process many times until convergence is achieved. In supervised learning, convergence manifests as tiny differences in the minimised error over successive training iterations; in unsupervised learning, as minor shifts in the relationships between samples in the metric space; and in reinforcement learning, as small differences in the agent's movements in the environment.
Given the nature of supervised learning, it is the most widely used approach in developing learning systems for clinical medicine [28]. In practice, applying it means gathering training data (e.g., images of skin lesions) to which a dermatologist assigns decisions, such as whether a lesion is melanoma. Because ML is a type of heuristic, no rule [29] can indicate in advance which algorithm, which variant, and what number of training examples will produce the best model. Consequently, creating learning systems requires a great deal of computational power and time. A large number of training examples is also necessary: at a minimum, several thousand (e.g., images), and generally, the more samples used, the better the outcome.
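As a minimal illustration of supervised learning, the sketch below trains a classifier on labelled feature vectors that stand in for measurements extracted from lesion images; the data, feature count, and labelling rule are synthetic placeholders, not a real dermatological dataset.

```python
# Minimal supervised-learning sketch: a classifier fitted to expert-labelled
# feature vectors. All data here are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 8))            # e.g., colour/shape/texture features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # 0 = benign, 1 = malignant (toy rule)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)               # minimise error against expert labels
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```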

2.2. Deep Learning

Deep learning is a set of methods used to train artificial neural networks (ANNs). Due to the high complexity and abundance of the solutions employed in this process, DL constitutes a distinct subset within the canon of ML methods. At the core of DL are ANNs, extensible mathematical structures inspired by biological neural networks, whose essential feature is their ability to learn to solve complex problems. ANNs owe their extensibility to layers, their primary building blocks for abstraction.
In dermatology, DL and neural networks have demonstrated state-of-the-art performance in tasks such as the classification of skin lesions, the segmentation of affected regions, and the detection of melanoma in dermoscopic or clinical photographs. A neural network consists of an input layer that accepts data, a stack of hidden layers, and an output layer that returns a result (e.g., a classification decision or a regression value). In a fully connected network, all neurons in a given layer are connected to all neurons of the following layer, and each connection between neurons carries a numerical value known as its weight. The stack of hidden layers is responsible for building the internal knowledge representation of the ANN: the more hidden layers in the stack, the greater the ANN's ability to learn complex patterns. ANNs with many layers are considered deep neural networks (DNNs), which can learn hierarchical feature representations directly from, e.g., raw images. There is no rigid rule designating the required number of layers, although ANNs containing more than five hidden layers are commonly classified as DNNs and the rest as shallow neural networks. In practice, deep neural networks can consist of hundreds or even thousands of layers.
Training an ANN involves repeatedly adjusting the weights that define the strength of the connections between neurons in adjacent layers. In practice, DNNs usually have hundreds of thousands or even billions of parameters, and the training process runs for several hundred or even several thousand iterations. Training ANNs is therefore time-consuming and requires substantial hardware resources. Graphics processing units (GPUs) [30] are often used to parallelise some of the calculations performed during training, thereby speeding up the entire operation. The training process often takes many days, depending on the problem's complexity, the volume and richness of the training data, the complexity of the ANN, the hardware resources, and the final assumptions made about the learned model.
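The sketch below illustrates the structure and training loop described above: a stack of fully connected layers whose weights are repeatedly adjusted to minimise the error against expert-assigned labels. Layer sizes, iteration count, and data are illustrative placeholders.

```python
# Minimal sketch of a fully connected network and its training loop in PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(               # input -> hidden stack -> output
    nn.Linear(64, 128), nn.ReLU(),   # hidden layer 1
    nn.Linear(128, 128), nn.ReLU(),  # hidden layer 2
    nn.Linear(128, 2),               # output layer: 2 decision classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 64)              # a batch of 32 feature vectors
y = torch.randint(0, 2, (32,))       # expert-assigned labels

for _ in range(100):                 # repeated weight adjustment until convergence
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                  # gradients w.r.t. connection weights
    optimizer.step()
```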

2.3. Applications of Neural Networks

Thanks to their non-trivial properties, ANNs have a wide range of applications. Over the years, architectures and variants have been developed that are tailored to working with specific data types and solving specific problems.
Computer vision (CV) is a foundational application of artificial neural networks, for which convolutional neural networks (CNNs) [31] were explicitly designed. CNNs differ from generic fully connected ANNs in that they process data in the form of two- or three-dimensional tensors, primarily images, using the discrete convolution operation, which makes them particularly effective for computer vision problems. The primary task of CV is image classification, which is particularly appreciated in clinical applications [32]. Over the years, other CV applications have also been described, such as object detection [33] (Figure 1), image segmentation [34] (Figure 2), and new image generation [35]. Specialised ANN architectures based on CNNs have been developed for these purposes, such as YOLO [36], SSD [37], UNet [38], and GAN [39], and have seen significant improvements over the years [40,41].
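A minimal CNN sketch, with illustrative shapes, shows three-dimensional (RGB) image tensors processed through discrete convolutions into a classification output; it is a toy model, not one of the cited architectures.

```python
# Minimal CNN sketch for image classification in PyTorch. Shapes are illustrative.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 224 -> 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                       # 112 -> 56
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),            # e.g., benign vs. malignant
)

images = torch.randn(8, 3, 224, 224)       # batch of RGB images
logits = cnn(images)                       # shape: (8, 2)
```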
Another typical application of ANNs is forecasting sequential data in the form of time series. For this purpose, the recurrent neural network (RNN) was developed [44]. Among time series forecasting applications, forecasting stock market prices [45], analysing market trends [46], and predicting sports results [47] are widespread.
Among the applications of ANNs, it is also possible to mention natural language processing (NLP) [48], which involves analysing text as a sequence of numbers. For this reason, NLP is often performed using RNNs or more complex architectures, such as Transformers [49]. Popular applications within NLP include sentiment analysis [50], named entity extraction in text [51], text summary generation [52], machine translation [53], and chatbots [54]. Several specialised ANN architectures have been developed to implement these applications, such as Seq2Seq [55] or BERT [56], which have seen improvements over the years [57,58,59].
In recent years, generative adversarial networks (GANs) have gained importance in dermatology, enabling the synthesis and translation of skin lesion images for data augmentation and domain unification purposes (Table 2). In particular, StyleGAN2 allows the generation of high-quality, diverse samples of melanocytic nevi, while CycleGAN provides an unpaired mapping between clinical and dermatoscopic images, which improves the generalisability of classification and segmentation models.
The use of StyleGAN2 in dermatology rests on its ability to precisely reproduce the morphological features of melanocytic nevi by injecting a layered style vector, which allows the generation of diverse, realistic samples to balance and extend dermatoscopic sets [60]. CycleGAN's ability to translate clinical images into the dermatoscopic style (and vice versa) removes domain mismatches between different acquisition modalities, significantly improving the generalisability of segmentation and classification models trained on unpaired data [61]. The two methods synergistically enrich preprocessing and augmentation, providing networks with both high-quality synthetic examples and uniform post-translation samples, which translates into better training stability and robustness to clinical variability.

2.4. New Trends in 2023–2025

The latest publications from 2023 to 2025 expand the spectrum of AI applications in dermatology, presenting both improvements to classical models and entirely new approaches. Ref. [62] proposed a modified segmentation model that combines UNet with an EfficientNet-B3 encoder, raising the average segmentation accuracy on histopathological images from 79% to 83% (with the combined accuracy increasing from 85% to 94%). In turn, [63] integrated the ResUNet architecture with the ant colony optimisation (ACO) algorithm for automatic hyperparameter tuning, achieving 95.8% accuracy (Dice coefficient: 93.1%) and outperforming previous methods in skin lesion segmentation and classification. Hybrid CNN–Transformer architectures have also been developed; for example, a model combining ResNet50 with a Transformer component (ViT) and a focal loss function successfully addressed the imbalanced ISIC 2018 data and significantly outperformed previous classifiers of dermoscopic skin images [64]. Generative methods have also found application: Efficient-GAN (2023) generates skin lesion segmentation masks within a generative adversarial (GAN) system, achieving a record Dice coefficient of ~90% on the ISIC set while offering a lightweight mobile version with comparable performance [65]. In addition, attention has been paid to the resistance of models to adversarial attacks: Ref. [66] demonstrated the vulnerability of mobile skin cancer detection applications to physical adversarial attacks, in which invisible perturbations (e.g., a transparent sticker on a camera lens) can lead to a false diagnosis with a 50–80% success rate, revealing serious security risks in such systems.

2.5. Limitations of AI Methods in Dermatology

Despite the impressive results of artificial intelligence in dermatological diagnostics, it is essential to acknowledge the significant limitations that often go unnoticed amid the initial enthusiasm for new technologies. One key problem is the dependence of algorithm quality on the type and amount of data on which it is trained. Many diagnostic systems are built on sets of images with perfectly lit and clearly visible skin lesions, which rarely correspond to the actual conditions in dermatological offices or to photos taken by patients themselves [67]. In clinical practice, images are often of poor quality, blurred, or ambiguous in their features, which significantly degrades the results of the algorithms [68].
In addition, a significant challenge is the so-called overfitting problem, which is particularly prevalent in systems based on deep machine learning. Models can achieve excellent results on the datasets on which they were trained while exhibiting poor generalisation, i.e., struggling with images that differ from those available during training [69,70]. Most algorithms have been tested only on the sets on which they were trained, and only 5.7% of the reviewed publications were prospective studies, which limits the generalisability of the results to everyday clinical practice [71].
Another important limitation is the lack of transparency of deep learning models. Dermatologists using AI tools are often unable to obtain a clear explanation of the decisions made by the algorithm, which significantly complicates the verification of diagnoses and can lead to misinterpretations [72]. This raises ethical and practical concerns, as physicians must make clinical decisions based on trust in a tool whose decision-making mechanism remains largely opaque.
Additionally, it is worth noting the problems associated with diagnosing patients of different skin types. The predominance of light-skinned individuals in the available databases suggests that AI-based diagnostics may have limited accuracy for darker-skinned individuals, potentially leading to biases and health inequalities [73,74].
All these issues should be consciously addressed during the design, implementation, and evaluation of AI-based systems in dermatology to minimise the risk of diagnostic errors and improve their effectiveness in practical applications.

3. Decision-Making Modeling in Clinical Issues

In dermatological diagnosis, skin imaging plays a significant role. When artificial intelligence is used for clinical diagnosis, the skin image must be captured digitally. Dermatoscopic (Figure 1 and Figure 2) and dermatopathologic (Figure 3) photographs are the most common. Consequently, computer vision (CV) finds excellent application in this setting.
Figure 4 depicts the general flowchart of the automated decision-making process in dermatology. The process begins with acquiring a digital image of the skin lesion, followed by a preprocessing step that varies depending on the type of examination being conducted. Next, the processed images undergo an inference stage, typically implemented by CNNs, and return decisions. Adding object detection or segmentation operations to the image at this stage can positively contribute to more accurate results in the final step.

3.1. Digital Images

Objects in the real world are represented by images created by interacting with a light-sensitive surface. This is achieved by a charge-coupled device (CCD) or complementary metal–oxide–semiconductor (CMOS) [76] matrix in modern digital cameras. Digital images can be represented in two dimensions (monochrome) or three dimensions (colour).
Formally speaking, a digital monochrome image is a function f(r, c) represented by a matrix, where each value represents the signal value at a specific pair of coordinates (r, c) corresponding to a pixel. In contrast to analogue images, each pixel in a digital image is assigned a discrete value from a finite range. A digital image comprises a fixed number of pixels, each with a specific position and value. The colours in colour images are encoded using colour models, in which each layer represents a component such as colour saturation, luminance, and others. RGB [77] is the most popular colour model, created by combining red, green, and blue, and is utilised by default in the most popular operating systems. Other colour models, such as CIELab [78] and HSV [79], which are used to identify abstract and hidden patterns in images, can also be applied in clinical settings. This is essential because, in dermatology, the colours of skin lesions are of great importance.
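The colour model conversions mentioned above can be performed, for instance, with OpenCV; the sketch below assumes a hypothetical image file and shows an image converted into CIELab and HSV layers.

```python
# Converting a skin image into alternative colour models with OpenCV.
# The file path is a hypothetical placeholder.
import cv2

bgr = cv2.imread("lesion.jpg")               # OpenCV loads images as BGR
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)   # CIELab: L*, a*, b* layers
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # HSV: hue, saturation, value
L, a, b = cv2.split(lab)                     # individual layers can expose
h, s, v = cv2.split(hsv)                     # patterns hidden in plain RGB
```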

3.2. Dermoscopy

Dermoscopy [80] is an examination used to evaluate skin lesions in greater detail than is possible with the naked eye. It uses a dermatoscope, a device that captures magnified digital images of skin lesions. Several solutions analyse dermatoscopic images for lesions using ANNs [81,82]. Such images may be affected by abnormal lighting or low contrast, and hair may obscure skin lesions, making the contents impossible to interpret and degrading image quality. Accordingly, assessing the quality of dermoscopic images is one of the first steps in preprocessing, a task that can be successfully implemented with ANNs [83,84].
In image processing, there are generic and specialised approaches: generic approaches apply to most types of images, while specialised approaches are particularly effective for specific image types. The preprocessing of dermatoscopic photographs begins with the elimination of variable lighting effects. Among the generic approaches, gamma correction (changing light intensity) can be used [85]; there is also a wide range of algorithms specialised for this task on dermatoscopic images [86,87,88]. In the next step, the images are transformed into other colour models, from which the specific layers on which skin lesions present most clearly are selected (Figure 5). The images then undergo contrast correction in a generic [89] or specialised [90,91] manner. The final step is specific to dermatoscopic photographs and involves eliminating hairs that may partially obscure skin lesions or introduce additional noise into the image [92,93,94].
Image preprocessing aims to transform the original image into a form that highlights the features to be detected. The correct sequence of these operations is significant for the subsequent stage: the automatic analysis of the images thus prepared.
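As an example of a generic preprocessing step, the sketch below implements gamma correction via a lookup table; the image path and gamma value are illustrative placeholders.

```python
# Generic gamma correction sketch (NumPy/OpenCV): rescales pixel intensities
# to compensate for variable lighting. gamma < 1 brightens, gamma > 1 darkens.
import cv2
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    # Precompute the mapping for all 256 intensity levels.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(image, table)             # apply per-pixel lookup table

corrected = gamma_correct(cv2.imread("dermoscopy.jpg"), gamma=0.8)
```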

3.3. Dermatopathology

Dermatopathology [95] is an examination that involves taking a biopsy of the skin. Digitised histologic slides are the digital representation of this examination and are the most detailed images showing the skin and its lesions. Thanks to their structured form, characteristic patterns are easy to find in them, and they do not require preprocessing to further highlight lesions. Several solutions analyse histologic slide images for lesions using ANNs [96,97].

3.4. Inference and Decision

The inference stage is the last in the decision modelling process. One of the steps often performed during this stage is postprocessing, which involves preparing the image for accompanying CV tasks (e.g., object detection, lesion segmentation) or generating new features based on the image content [98,99] to further support the assignment of final decisions. Typical postprocessing begins with region merging, which eliminates false boundaries and lesion regions by merging neighbouring regions belonging to the same object; well-established methods perform best at this stage [100,101]. For the established regions in the images, generic morphological operations are performed [102], such as opening and closing, which remove noise within and around the regions. The next step is to detect and remove peninsulas [103], narrow protrusions within a detected region that can interfere with the segmentation or object detection process.
The last step is to measure the lesions found through lesion segmentation or object detection. These measurements can be related to the structure of the lesions [104,105,106]. The lesions’ colour can also be considered [107], as well as their geometric dimensions (perimeter, area, diagonals).
Several methods, including those from ML and DL, can be used to assign a final clinical decision. The segmented and measured skin lesions can be converted into a vector and passed as input to an ANN [108], a support vector machine (SVM) [109], or a decision tree [110]. Images showing skin lesions can also be passed directly as input to a CNN [111], which outputs a numerical result representing a decision; the models presented in [108,109,110] are simplified forms of modern decision-making models and are now rarely used. Each approach can be considered a separate model, and its decisions can be treated independently, or a joint decision can be determined by combining all models, thus creating an ensemble classifier [112,113].
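A minimal sketch of the soft-voting variant of such an ensemble is given below: per-class probabilities from several models are averaged into a joint decision. The probability values are hypothetical placeholders for the outputs of, e.g., an ANN, an SVM, and a CNN.

```python
# Soft-voting ensemble sketch: average the per-class probabilities of several
# independently trained models, then pick the most probable class.
import numpy as np

def ensemble_decision(probabilities: list) -> np.ndarray:
    """Average class probabilities across models and select a class."""
    mean_probs = np.mean(probabilities, axis=0)   # shape: (n_samples, n_classes)
    return np.argmax(mean_probs, axis=1)

# Hypothetical outputs of three models for 4 lesions and 2 classes:
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]])
p2 = np.array([[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.6, 0.4]])
p3 = np.array([[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.8, 0.2]])
print(ensemble_decision([p1, p2, p3]))            # joint decision per lesion
```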

3.5. Classification Quality Measures

Decision models can return results in binary form (e.g., positive/negative patient) or multiclass form (e.g., degree of disease on an n-point scale). To assess the quality of the model, it is necessary to have test data that were not used during the training process and have labels assigned by medical specialists. The test data are then evaluated by the decision model and compared with the labels of specialists, determining a confusion matrix consisting of case counts:
  • true positives (TP)—classified positively by both the model and specialists;
  • false positives (FP)—classified positively by the model but negatively by specialists;
  • false negatives (FN)—classified negatively by the model but positively by specialists;
  • true negatives (TN)—classified negatively by both the model and specialists.
The quality measures of binary decision models, such as the sensitivity (Formula (1)), specificity (Formula (2)), positive predictive value (PPV, Formula (3)), and negative predictive value (NPV, Formula (4)), are used to evaluate clinical models. The evaluation of clinical models can also be enriched with measures used in decision modelling in general, such as the accuracy (Acc, Formula (5)), balanced accuracy (BA, Formula (6)), F1-score (F1, Formula (7)), and area under the ROC curve (AUC, Formula (8), Figure 6).
\[ \text{sensitivity} = \frac{TP}{TP + FN} \tag{1} \]
\[ \text{specificity} = \frac{TN}{TN + FP} \tag{2} \]
\[ \mathrm{PPV} = \frac{TP}{TP + FP} \tag{3} \]
\[ \mathrm{NPV} = \frac{TN}{TN + FN} \tag{4} \]
\[ \mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5} \]
\[ \mathrm{BA} = \frac{\text{sensitivity} + \text{specificity}}{2} \tag{6} \]
\[ \mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN} \tag{7} \]
\[ \mathrm{AUC} = \int_{0}^{1} \text{sensitivity}\left(\mathrm{FPR}^{-1}(t)\right) dt \tag{8} \]
where FPR (false positive rate) denotes the proportion of FPs among ground truth negatives:
\[ \mathrm{FPR} = \frac{FP}{FP + TN} \tag{9} \]
The quality measures for multiclass decision models can be generalised by averaging the scores for each class i over all K decision classes using a one-versus-all approach [114] (Formulas (10)–(16)). The exception is the AUC, which requires the consideration of every pair of classes (i, j) (Formula (17)).
\[ \text{Sensitivity}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FN_i} \tag{10} \]
\[ \text{Specificity}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{TN_i}{TN_i + FP_i} \tag{11} \]
\[ \mathrm{PPV}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FP_i} \tag{12} \]
\[ \mathrm{NPV}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{TN_i}{TN_i + FN_i} \tag{13} \]
\[ \text{Accuracy}_{\text{Macro}} = \frac{\sum_{i=1}^{K} TP_i}{\sum_{i=1}^{K} \left( TP_i + FP_i + FN_i + TN_i \right)} \tag{14} \]
\[ \text{Balanced Accuracy}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{2} \left( \frac{TP_i}{TP_i + FN_i} + \frac{TN_i}{TN_i + FP_i} \right) \tag{15} \]
\[ \mathrm{F1}_{\text{Macro}} = \frac{1}{K} \sum_{i=1}^{K} \frac{2\,TP_i}{2\,TP_i + FP_i + FN_i} \tag{16} \]
\[ \mathrm{AUC}_{\text{Macro}} = \frac{1}{K(K-1)} \sum_{i=1}^{K} \sum_{j \neq i} \mathrm{AUC}_{ij} \tag{17} \]
Multiclass decisions regarding the degree of advancement of pathological changes can be binarised at a later stage, e.g., to determine the validity of clinical procedures or a specific group of symptoms and threats [115,116,117,118]. Such binarisation is correct only if a linear order is maintained between the decision classes and the corresponding degrees of advancement of the changes.
Correctly understanding the meaning of these measures is crucial in interpreting the results of decision model evaluations. Sensitivity indicates how effectively a model identifies positive cases, while specificity indicates how effectively it identifies negative cases. PPV and NPV are the percentages of truly positive or truly negative objects, respectively, among all objects classified as positive or negative. Accuracy is the percentage of all objects that receive a correct decision. When the number of objects within the decision classes (positive/negative, or each grade of a given disease) varies, a more reliable measure is BA, which averages the accuracy within each decision class. A good alternative to BA is the F1-score, which also assesses the quality of the model within the decision classes. Finally, the AUC is the area under the ROC curve (Figure 6): the closer the curve lies to the upper-left corner of the plot, the larger the area beneath it and the better the model.
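For concreteness, the sketch below computes the binary measures in Formulas (1)–(7) directly from confusion-matrix counts; the counts themselves are illustrative.

```python
# Binary quality measures of Section 3.5 computed from confusion-matrix counts.
TP, FP, FN, TN = 80, 10, 15, 95        # illustrative counts

sensitivity = TP / (TP + FN)           # Formula (1)
specificity = TN / (TN + FP)           # Formula (2)
ppv = TP / (TP + FP)                   # Formula (3)
npv = TN / (TN + FN)                   # Formula (4)
acc = (TP + TN) / (TP + FP + TN + FN)  # Formula (5)
ba = (sensitivity + specificity) / 2   # Formula (6)
f1 = 2 * TP / (2 * TP + FP + FN)       # Formula (7)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"PPV={ppv:.3f} NPV={npv:.3f} Acc={acc:.3f} BA={ba:.3f} F1={f1:.3f}")
```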

4. Methods

The presented analysis aims to evaluate and synthesise scientific research in the areas of dermatological regulations, model validation, ethical issues, bibliographic analysis, and clinical implementations combined with AI. The methodology meets the PRISMA requirements to ensure rigour in selecting the scientific literature that informs the conclusions drawn from the analyses (Figure 7).

4.1. Protocol and Registration

While a structured protocol was carefully devised to inform the methodology of this systematic review, it was not formally registered in any publicly accessible registry prior to commencement. This lack of prospective registration is recognised as a limitation concerning transparency. Nevertheless, all methodological decisions were predetermined before data extraction began and were consistently followed throughout the entire review process.

4.2. Inclusion and Exclusion Criteria

Specific inclusion criteria were adopted to select articles that met high standards of relevance and originality. Only articles presenting the current state of knowledge in the thematic areas covered by this analysis were considered. Accordingly, both original research articles and review papers aggregating knowledge of non-technical aspects of AI, along with additional comments from the authors, were included.
To ensure that the cited works were up to date, the database searches covered 1 January 2019 to 30 April 2025. In exceptional cases, works published before 2019 that constituted a significant contribution to the analysed area were also taken into account; in such cases, the criterion was a citation count exceeding 20. Only papers written in English were considered. All other articles that did not concern AI applications in dermatology, covering implementation as well as ethical, regulatory, and verification aspects, were excluded from the analyses. We also excluded works other than original scientific research and review papers.
For the analysis of the most popular neural network architectures and the datasets used (Section 8), the Google Scholar database was used to identify the five most cited works from 2019 to 2025 that additionally reported statistical results for the classification quality measures: accuracy, AUC, sensitivity, and specificity. Since the vast majority of works did not report all of these measures, works reporting at least two of the four were used. The language of the articles was likewise limited to English.

4.3. Search Strategy

To analyse the current state of the scientific literature in the areas of clinical implementation, regulation, verification, and ethical issues, the Google Scholar database was used.
Section 5 and Section 7 contain references to the works that showed the strongest agreement with the keywords for the aspect being analysed. Agreement was assessed by two researchers who independently reviewed the titles and abstracts.
For the bibliometric analysis, three popular databases were selected: PubMed, Scopus, and Web of Science (WoS). These databases were chosen for their widespread use in the scientific community and their comprehensive coverage of the subject matter under analysis. We used the available application programming interfaces (APIs) for these databases, performing automatic queries with dedicated software to determine the total number of papers published in each of the calendar years 2009–2024. The analysis was performed for the following pairs of keywords: "artificial intelligence" and "dermatology"; "machine learning" and "dermatology"; "deep learning" and "dermatology". To determine trends and the growth rate in the number of publications, the obtained results were compared with data on papers published for the single keywords "artificial intelligence", "machine learning", "deep learning", and "dermatology". This approach enabled us to determine the extent to which the development of AI methods influences dermatology research and to identify the key trends in this area. The results enabled an assessment of the research intensity at the interface of the two fields and the identification of potential areas for further development.
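As an illustration of such automated queries, the sketch below counts PubMed records per year through the public NCBI E-utilities API. It is a simplified reconstruction of the kind of query used, not the exact software employed in this study; Scopus and WoS expose their own APIs with different endpoints.

```python
# Sketch of an automated per-year publication count against PubMed's public
# E-utilities API. Query terms mirror the keyword pairs described above.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term: str, year: int) -> int:
    params = {
        "db": "pubmed", "term": term, "retmode": "json", "retmax": 0,
        "datetype": "pdat", "mindate": str(year), "maxdate": str(year),
    }
    response = requests.get(ESEARCH, params=params, timeout=30)
    return int(response.json()["esearchresult"]["count"])

for year in range(2009, 2025):
    print(year, pubmed_count('"deep learning" AND "dermatology"', year))
```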

5. AI Regulations, Verification, and Ethical Problems

In light of the dynamic development of artificial intelligence in dermatology, legal and regulatory frameworks are becoming increasingly important to ensure patient safety and compliance with applicable standards (AI Act in the EU, FDA guidelines). At the same time, algorithm validation and ethical issues—such as model transparency, data privacy protection, and minimising biases—reflect key aspects that determine clinicians’ and patients’ trust in new technologies.

5.1. Regulations on AI in Medicine

In the European Union [119], diagnostic systems based on AI are classified as “high risk”. This means that they must undergo detailed evaluation procedures and have specific documentation and test results confirming their effectiveness and safety before implementation.
In the US, AI used in diagnostics is regulated by the FDA [120] under the 510(k) procedure. This means that manufacturers of such systems must demonstrate how they update their models and verify their safety and effectiveness (as part of the “Software as a Medical Device” approach). Regular algorithm updates must meet FDA requirements, which affects the pace of corrections and improvements to diagnostic systems.
Although ethical and regulatory issues related to AI in dermatology are widely discussed in the literature, practical problems resulting from its use still require specific examples. Cases related to breaches of patient confidentiality are significant. In the UK, in 2021, a case involving a teledermatology app that shared patient imaging data without sufficient user consent gained high-profile attention. It led to an investigation by the British data protection authority, the Information Commissioner’s Office (ICO), and resulted in its suspension for several months, significantly limiting patients’ access to remote dermatological consultations [121].
In Germany, on the other hand, the implementation of the EU Regulation on Artificial Intelligence (AI Act) requirements necessitated the thorough verification of diagnostic models based on deep learning. As a result, some AI solutions used in dermatological diagnostics were temporarily withdrawn from clinics, pending detailed audits confirming their safety and compliance with European regulations [122].
Additionally, it is worth noting the significant ethical issues related to the use of AI in dermatological diagnostics in the United States, where the FDA has imposed specific requirements for the transparency of diagnostic algorithms. For example, an app using AI to assess patients’ risks of melanoma was subject to additional scrutiny because it did not meet the requirements for clear explanations of the basis for the system’s decisions. This led to delays in the implementation of the app in clinics and reduced trust in the technology by both physicians and patients [123].
An example of regulatory challenges outside Europe and the US can be found in India, where the lack of adequately formulated regulations on AI and patient data protection has resulted in many dermatological AI apps currently operating without clear standards for data protection or diagnostic transparency. This has led to growing concerns about the quality of diagnoses and the potential for abuse by commercial providers of these technologies [124].
In January 2024, the FDA made a landmark decision to approve the first AI skin cancer detection device (DermaSensor) for primary care use [125]. It represents a new regulatory precedent—the first time that an AI dermatology device has been approved for use by non-specialists. The authors state that this approval validates AI’s potential to improve access to care but also raises questions about the need for ongoing evidence gathering and regulatory adaptation to rapidly evolving AI technologies.
All these examples demonstrate the importance of considering ethical and regulatory aspects in detail and in context when implementing AI solutions in dermatology in order to avoid not only formal legal issues but also harm to patients and a decline in public trust in modern technologies.

5.2. Verification of the Model

One of the most significant issues associated with using artificial intelligence-based methods is their susceptibility to various types of attacks. Typically, users not involved in IT are unfamiliar with testing their models for susceptibility to various types of disturbance and do not know how to interpret the models' decisions. The laws and regulations of various countries still fail to keep pace with the development of artificial intelligence-based methods, so it is challenging to compel the designers of these methods to thoroughly check their models, a more advanced but also more cost- and time-consuming process than the standard evaluation presented in Section 3.5. Two significant aspects of securing models, which are regulated, among others, by the laws of the European Union [119] and the USA [120], are robustness to adversarial attacks (Figure 8) [126,127,128] and explainability [129,130,131].

5.2.1. Adversarial Attacks

Adversarial attacks on AI models in dermatology involve deliberately manipulating skin images (e.g., photos of moles or cancerous lesions) in a way that changes their classification by the AI model while remaining almost imperceptible to the human eye. Technically, they add slight, deliberate noise to the input image, chosen to affect the model's prediction without changing how a human perceives the image.
The most popular adversarial attack methods include the fast gradient sign method (FGSM) [132], which adds a small perturbation in the gradient direction that maximises the model error; projected gradient descent (PGD) [132], an improved version of the FGSM that adds perturbations iteratively; and the Carlini and Wagner attack (CW) [133], which generates minimal perturbations that effectively change the model's classification. Implementations of these attacks are readily available in programming libraries such as CleverHans [134], the Adversarial Robustness Toolbox [135], Foolbox [136], and AdvBox [137], making it easier for modellers to prepare attacks and further train their models using adversarial training [138,139,140,141,142]. However, there is a great need for new adversarial attack generation algorithms specialised for dermatological images [143].
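A minimal FGSM sketch is shown below; `model` stands for any trained classifier, and the epsilon value is illustrative.

```python
# Minimal FGSM sketch in PyTorch: perturb an input image one epsilon step
# along the sign of the loss gradient, maximising the model error.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, image: torch.Tensor,
                label: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()                                    # gradient w.r.t. pixels
    adversarial = image + epsilon * image.grad.sign()  # one step up the loss
    return adversarial.clamp(0, 1).detach()            # keep a valid image
```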
Suppose that an AI model classifies images of moles as benign or malignant (melanoma) (Figure 8). An adversarial attack can modify the image so that a malignant mole is classified as benign (false negative diagnosis) or a benign mole is classified as malignant (false positive diagnosis).
As a consequence, this can lead to incorrect medical decisions, such as delays in skin cancer diagnosis or unnecessary biopsies and treatments.
These attacks can be used for both malicious and testing purposes. Unique attack generation can be used to test the robustness of the model, increasing its security against other attacks. It is intended to counter cyberattacks, which can be used to sabotage AI models used in hospitals or telemedicine [144].
Fortunately, several defences exist: adversarial training, in which the model is trained on data containing examples of attacks; anomaly detection, which identifies unnatural perturbations in images; and image processing filters that remove interference invisible to the eye.
Adversarial attacks pose a significant threat to AI systems in dermatology, as they can lead to incorrect diagnoses and potentially detrimental outcomes. Studying and counteracting them, in addition to the standard verification of model quality using the measures presented in Section 3.5, is crucial for patient safety and the effectiveness of artificial intelligence in medicine.

5.2.2. Explainability

Explainability [145] is a relatively new term that lacks a single, definitive definition. In [146], the following definition is presented: “Explainability is associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans”. In [147], the definition of explainability is as follows: “An AI system is explainable if the task model is intrinsically interpretable”.
The explainability of AI models in dermatology refers to the ability to understand and interpret the behaviour of algorithms that analyse skin images and make diagnostic decisions [129,148,149,150,151]. This is a key aspect in medical applications, as it allows doctors and patients to trust AI’s decisions and understand why a model produces a given diagnosis.
Model explainability, especially in medicine, is crucial because it increases doctors' and patients' trust in automatic diagnoses. AI models can be a powerful tool to support doctors, but their decisions must be understandable and verifiable in order to be accepted in clinical practice. As shown in the previous section, AI models can make mistakes; if a doctor understands the basis for a model's decision, they can detect potential errors more quickly. Moreover, meeting legal requirements is a key aspect in the context of explainability: the FDA [120] and the EU's AI Act [119] require explainable models in medicine to ensure compliance with ethics and patient safety. Another phenomenon that can be addressed using explainability methods is bias: models can be susceptible to biases resulting from the uneven representation of patient groups. Explainability helps to detect and eliminate all of the above-mentioned problems.
There are several popular methods of explaining AI decisions in dermatology. These include, among others, attention heatmaps (saliency maps [152], Grad-CAM [153], LIME [72], SHAP [154]), which highlight the areas of the skin image that were crucial for the model’s decision. They help doctors to understand whether the model focused on a mole or, for example, on the skin background. Another method is a comparison with similar cases. Models can explain their decisions by indicating similar cases from a database, e.g., “This mole is similar to others that turned out to be melanoma”. In feature-based models (e.g., classic ML algorithms), one can check which features (colour, shape, texture) had the most significant impact on the diagnosis. This method is called model feature analysis. Counterfactual explanations demonstrate how a minor change in the image (e.g., a change in the colour of a mole) can impact the model’s decision, which helps to clarify its operation.
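As a simple illustration of the gradient-based end of this spectrum, the sketch below computes a basic saliency map from input gradients; it is a simplification of the saliency-map technique (not Grad-CAM, LIME, or SHAP), and `model` stands for any trained classifier.

```python
# Gradient-based saliency map sketch: the magnitude of the score gradient per
# pixel indicates which image regions drove the model's prediction.
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    image = image.clone().detach().requires_grad_(True)
    score = model(image).max(dim=1).values.sum()   # top-class scores
    score.backward()                               # gradients w.r.t. pixels
    return image.grad.abs().max(dim=1).values      # per-pixel importance map
```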
A clinical trial (reader study) [155] involving 76 dermatologists evaluated the impact of explainable AI (XAI) on melanoma diagnosis. Dermatologists diagnosed dermoscopic images of skin lesions with the support of an XAI model that provided explanations for its decisions, and their activity was tracked using eye tracking. The explainable AI model improved dermatologists' balanced diagnostic accuracy by 2.8 percentage points compared to the standard black-box model. Furthermore, the presence of XAI explanations increased doctors' confidence in AI decisions and revealed patterns of higher cognitive load (more eye fixations) in cases of diagnostic discrepancy.
However, explainability methods are costly. The cost is influenced by, among other factors, the complexity of the neural networks: modern models such as CNNs and Transformers are challenging to interpret, even for experts. There are still no standards for explainability, and different methods can provide different explanations, complicating their interpretation. Another problem is the limited availability of data, as models are often trained on small datasets, which can lead to misinterpretations.

5.3. Ethical Implications

The introduction of AI into dermatological practice is associated with important ethical challenges that must be clearly addressed to ensure patient safety and accountability for the methods used.

5.3.1. Informed Consent and Data Privacy

Dermatological images often contain sensitive information about patients, which is why it is essential to obtain informed consent from them for the use of their data. The patient should know precisely how their images will be stored and processed, and with whom they will be shared. Good practices include compliance with GDPR, transparent communication with patients, and effective methods of data anonymisation. It is emphasised that clear rules in this area are crucial to building patient trust and compliance with data protection laws [156,157].
In July 2024, a comprehensive review of ethical issues in AI dermatology was published, highlighting the insufficient protection of patient data privacy and the lack of uniform procedures for obtaining informed consent in mobile applications. The authors emphasised that only 12% of systems had built-in image pseudonymisation mechanisms, which exposed them to the risk of leaking sensitive medical data [158].
A study evaluating the use of ChatGPT (versions 3.5 and 4o) in dermatology [159] found that as many as 15% of case reviews led to potentially erroneous treatment recommendations and that the systems did not provide sufficient safeguards against unauthorised access to patient data.

5.3.2. Bias and Fairness

A serious concern is the risk of inequality resulting from the fact that AI algorithms are often trained mainly on images of people with light skin. This results in lower diagnostic accuracy in people with darker skin, deepening inequalities in access to effective dermatological care. It is, therefore, necessary to make databases more diverse and to actively detect and reduce existing biases in them [160,161].

5.3.3. Legal Liability and Accountability

Determining legal liability for possible diagnostic errors caused by AI systems also remains an issue. Dermatologists can rely on the recommendations of an AI system when making clinical decisions. However, it is unclear who should be held liable in the event of an error: the software manufacturer, the medical facility, or the doctor. Therefore, detailed legal guidelines are needed, clearly specifying how such liability should be distributed. Solutions are proposed that emphasise shared responsibility and clearly define the scope of responsibilities for individual parties [122,162].

5.4. Human–AI Interaction

A systematic review and meta-analysis [163] compared the performance of AI algorithms with that of dermatologists in diagnosing skin cancer. This analysis of 53 studies found that CNN algorithms achieved comparable or higher sensitivity (87% vs. 79.8%) and specificity (77% vs. 73.6%) relative to physicians, and that AI support improved diagnostic accuracy, especially among less experienced clinicians. The authors emphasise the need for further research into the application of AI in real-world clinical settings.

6. Bibliometric Analysis of Articles on AI Applications in Dermatology

The number of papers in particular years, categorised according to tags in specific databases, is presented in Table 3 and Table 4. There has been a dynamic increase in the number of publications on the application of artificial intelligence in dermatology over the last 15 years. The significant increase in the number of works since 2016 is evident, which can be attributed to the growing popularity and availability of advanced machine learning methods, including neural networks and deep learning. In 2024, the number of publications in the three analysed databases reached a record level in all categories, indicating growing interest in this topic in the scientific literature.
The highest growth was noted for the combination of the tags “deep learning” and “dermatology”, which indicates the potential of this technology in skin image analysis and dermatological diagnostics. The number of papers with the single tag “deep learning” also increased significantly compared to the initial period of analysis. Publications with the tags “artificial intelligence” and “machine learning” also show stable growth, although their growth is lower than in the case of “deep learning”.
A comparison of the number of papers for the tag “dermatology” with the remaining combinations indicates that artificial intelligence is becoming a significant tool in dermatological research. These results confirm that the interdisciplinarity and integration of modern technologies with traditional fields of medicine bring tangible benefits, demonstrated by the increasing number of publications and possibilities for practical applications.

7. Overview of the Major Applications of AI in Dermatology

In recent years, artificial intelligence has revolutionised dermatology, providing advanced tools for the automatic detection and classification of skin lesions at the stages of dermoscopy, histopathology, and dermatoscopic photography. Technologies based on deep neural networks are increasingly used in both mobile applications for initial patient assessment and clinical decision support systems, contributing to improvements in diagnosis speeds and the optimisation of therapeutic paths.

7.1. Core Applications

In recent years, the incorporation of artificial intelligence (AI) has substantially transformed various fields, including dermatology, in terms of diagnostic accuracy and the development of new treatments. The intersection of AI and dermatology has yielded numerous innovative applications with the potential to revolutionise the diagnosis, treatment, and management of skin conditions. From detecting skin cancer to recommending personalised treatments, AI has demonstrated its ability to improve precision, efficiency, and patient outcomes.
AI in dermatology has been developing for a long time thanks to the advancement of digitisation and technology [164,165,166,167], primarily driven by new image processing algorithms and the development of convolutional neural networks [168]. Advancements in hardware development have also facilitated this progress.
Regarding treatment, AI can select the optimal treatment for patients, determine the number of treatments needed, and assess the effectiveness of treatments for those with skin diseases [169,170]. In addition, AI-based surgical robotic systems can help to reduce human resource consumption, eliminate human fatigue and potential errors, and dramatically reduce surgery times, thereby improving surgical treatment [171,172].
The detection of skin cancer using artificial intelligence (AI) has emerged as a promising application, enabling dermatologists to make early diagnoses and improving patient outcomes. AI algorithms, particularly deep learning models, have been developed to analyse images of skin lesions and detect signs of skin cancer. Below is an overview of how AI is being used to detect skin cancer.
Images of skin lesions can be acquired via clinical photography or confocal microscopy and then analysed by AI algorithms. AI systems can assist in triaging patients by evaluating the risk levels of cutaneous lesions. These systems can classify lesions into various risk categories based on their characteristics, assisting dermatologists with case prioritisation and resource allocation. Artificial intelligence can also provide dermatologists with decision support, offering a second opinion or additional information during the diagnostic process.
Recent studies have compared CNNs and humans in categorising dermoscopic images as melanoma or non-melanoma [173,174,175,176]. A systematic review found eleven studies comparing CNNs to human experts in the dermoscopic image classification of pigmented lesions [177]. AI can categorise pigmented skin lesions in humans regardless of the dataset, methodology, or outcome measure. AI classified dermoscopic images as nevi, melanoma, or seborrheic keratoses with an AUC of 0.87, compared to 0.74 for dermatologists; at 76% sensitivity, the AI algorithm achieved 85% specificity, compared to 72.6% for dermatologists. In a separate comparison of AI versus 58 dermatologists using dermoscopic images to detect melanoma, the dermatologists achieved sensitivity of 86.6% and specificity of 71.3%, whereas the CNN, at the same sensitivity, achieved specificity of 82.5%. When given clinical information alongside the images, the dermatologists improved their sensitivity (88.9%) and specificity (75.7%) but still fell short of the CNN. The CNN’s AUC was 0.86, while the dermatologists’ was 0.79 [174].
Several studies have demonstrated how AI can enhance the dermoscopic evaluation of skin lesions. Marchetti et al. used AI to classify lesions for which physicians had low diagnostic confidence [178]. Notably, 26.6% of dermatologists and 51% of dermatology residents had low confidence. AI helped dermatologists (n = 905) and dermatology residents (n = 981) to correctly classify lesion evaluations, increasing the rates from 73.4% to 75.4% and 69.4% to 72.6%, respectively [178].
To better understand the lesion morphology, a human examiner employs variable lighting and perspectives (e.g., head-on view, side view), whereas AI algorithms work from static images. Maron et al. trained three CNNs to distinguish dermoscopic images of melanoma and benign nevi and tested how minor image perturbations might compromise an AI pigmented lesion classification algorithm [179]. Two test sets contained multiple images of the same lesions: the first introduced “artificial” changes (zooms and rotations), while the second contained “natural” changes—slightly different photographs of the same lesion. The artificial changes increased the probability of a diagnostic class change from nevus to melanoma from 3.5% to 12.2%. These findings suggest that AI is sensitive to image variables that the practitioner may overlook.
CNNs can classify dermoscopic images of pigmented lesions on acral and mucosal surfaces. Winkler et al. tested a CNN’s ability to distinguish acral, mucosa, and nail unit melanomas from benign lesions with similar localisation and morphology, such as solar lentigo and LMM. The CNN had AUCs of 0.926 (LMM), 0.928 (acral), and 0.989 (SSM) for melanoma subtypes. Mucosal and nail unit melanomas had AUCs of 0.754 and 0.621, respectively, indicating poor diagnostic accuracy [180].
Yu et al. [181] compared the diagnostic accuracy of CNNs to that of two dermatologists with five or more years of experience in dermoscopy (expert group). Seven hundred and twenty-four images (350 acral melanoma, 374 benign nevi, histologically confirmed) were randomly divided into sets (A and B). Both datasets showed similar diagnostic accuracy between the CNN and the expert group (83.51% vs. 81.08%, 80.23%, and 81.64%). Both datasets had CNN and expert AUCs above 0.8. “Non-expert” general physicians had significantly lower diagnostic accuracy (62.71%) and AUCs (0.63–0.66).
Lee et al. [182] examined the decision-making accuracy of 60 physicians when presented with 100 dermoscopic images of acral pigmented lesions to demonstrate how AI can improve clinical assessment. Clinicians could biopsy or follow up on the lesion. The CNN’s (ALMnet) diagnosis of acral nevus or lentiginous melanoma improved clinicians’ decision-making compared to using history, dermoscopic images, and dermoscopic images alone (accuracy 86.9% vs. 79% vs. 74.7%). The CNN improved clinician concordance and reduced performance gaps within the group.
The study presented by Tri-Cong Pham et al. [183] proposed a method for enhanced melanoma prediction on an unbalanced dataset by reconstructing an appropriate CNN architecture and optimising the algorithms. The contributions consisted of three essential elements: a custom loss function, custom mini-batch logic, and reformatted fully connected layers. The training dataset comprised 17,302 images of melanoma and nevus. Based on the same dataset, the model’s performance was compared to that of 157 dermatologists from 12 German university hospitals. The experimental results demonstrated that the proposed method outperformed all 157 dermatologists and the state-of-the-art method, with an area under the curve of 94.4%, sensitivity of 85%, and specificity of 95.5%. In addition, utilising the optimal threshold yielded the most balanced measurements compared to other studies (sensitivity of 90% and specificity of 93.88%), giving the approach promising application potential in medical diagnosis.
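The exact custom loss function and mini-batch logic of [183] are specific to that work; the following PyTorch sketch shows only the standard building blocks for the same goal—class-weighted cross-entropy and balanced mini-batch sampling on a skewed melanoma/nevus dataset (all tensors are synthetic placeholders):

```python
# Standard rebalancing building blocks in PyTorch: a sampler that draws
# roughly class-balanced mini-batches, plus class-weighted cross-entropy.
# Features/labels are synthetic placeholders with a 9:1 nevus:melanoma skew.
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

features = torch.randn(1000, 32)
labels = torch.cat([torch.zeros(900), torch.ones(100)]).long()
dataset = TensorDataset(features, labels)

class_counts = torch.bincount(labels).float()          # [900., 100.]
sample_weights = (1.0 / class_counts)[labels]          # inverse-frequency weights
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Errors on the rare (melanoma) class cost proportionally more.
loss_fn = torch.nn.CrossEntropyLoss(weight=class_counts.sum() / class_counts)
```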
The paper by Bhuvaneshwari Shetty et al. [184] presents the classification of images of skin lesions using machine learning and CNN techniques. Experiments were conducted using the HAM10000 dataset. Following the experiments, the machine learning and customised CNN techniques were evaluated based on the accuracy, precision, recall, and F1-score. The images were preprocessed before the training and testing phases, separated into feature and target values, and augmented with additional data. The results indicated that the customised CNN achieved a higher level of accuracy than the machine learning algorithms, at 95.1%. The proposed CNN performed better in classifying the HAM10000 dataset. Comparing the proposed method to existing recent work on the same dataset revealed that the proposed method achieved greater accuracy with minimal loss and errors.
In [185], the authors propose hybrid systems that leverage the benefits of fused CNN models. CNN models receive dermoscopy images from the ISIC 2019 dataset after the Geometric Active Contour (GAC) algorithm has segmented the lesion area and isolated it from healthy skin. An ANN and random forest (RF) receive CNN features that have been fused and accurately classify them. Using hybrid models, CNN-ANN and CNN-RF, the first methodology involves analysing the areas of skin lesions and identifying their types early on. CNN models (AlexNet, GoogLeNet, and VGG16) receive only the lesion area and generate maps with a considerable depth of features. Then, PCA is used to reduce the depth of the feature maps, which are then classified by the ANN and RF. The second method involves analysing the areas of skin lesions and obtaining an early diagnosis of their type using hybrid CNN-ANN and CNN-RF models based on the characteristics of fused CNN models. Notably, the features of the CNN models were serially integrated after principal component analysis (PCA) reduced their dimensions. Hybrid models based on fused CNN features demonstrated promising results in diagnosing dermatoscopic images from the ISIC 2019 dataset and differentiating skin cancer from other skin lesions. AlexNet-GoogleNetVGG16-ANN achieved an AUC of 94.41%, sensitivity of 88.90%, accuracy of 96.10%, precision of 88.69%, and specificity of 99.44%.
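To make the fused-features idea concrete, the sketch below follows the general recipe of [185]—concatenate deep features from several CNNs, reduce them with PCA, and classify with a random forest—while stubbing out the CNN feature extraction with random arrays; the architectures, feature dimensions, and PCA settings here are illustrative assumptions:

```python
# Serial feature fusion + PCA + random forest, with CNN feature extraction
# stubbed out by random arrays (dimensions are illustrative assumptions).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

n_images = 500
feats_a = np.random.rand(n_images, 4096)    # stand-in for AlexNet features
feats_b = np.random.rand(n_images, 1024)    # stand-in for GoogLeNet features
feats_c = np.random.rand(n_images, 4096)    # stand-in for VGG16 features
labels = np.random.randint(0, 8, size=n_images)  # 8 ISIC 2019 classes

fused = np.concatenate([feats_a, feats_b, feats_c], axis=1)  # serial fusion
reduced = PCA(n_components=256).fit_transform(fused)         # depth reduction

clf = RandomForestClassifier(n_estimators=200).fit(reduced, labels)
```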
Although AI algorithms show promise in detecting skin cancer, they should be viewed as supplementary tools rather than replacements for clinical expertise. The interpretation and judgment of dermatologists, combined with AI assistance, can result in improved patient outcomes and the more efficient management of skin cancer.
AI systems can analyse patient symptoms and medical records to provide automated diagnoses and treatment recommendations for dermatological conditions. For example, scientists have created and trained a deep neural network algorithm capable of differentiating eczema from other infectious skin diseases, including classifying rare skin lesions with direct clinical implications. This study’s diagnostic capabilities informed treatment recommendations, such as the use of topical corticosteroids versus antiseptics [186]. It demonstrates the potential of AI for individualised treatment planning for eczema patients. These systems utilise machine learning techniques to learn from vast amounts of data, thereby improving the accuracy over time.
Research from 2018 [187] identified 43 skin cancer smartphone apps. Most involved remote consultation, monitoring, information, education, skin self-examination, or the prevention of UV radiation exposure.
Several studies have proposed DNN-based smartphone apps for skin disease diagnosis [188,189,190,191,192]. The study in [189] used the MobileNet architecture, which has fewer parameters than other networks: depth-wise separable convolutions reduce first-layer computations, making the architecture suitable for CPU-only applications. The model in [189] combined the second version of MobileNet with transfer learning to diagnose early-stage skin cancer, achieving 91.33% accuracy. This model was chosen because it offered 90.1% top-5 ImageNet accuracy in a compact footprint; it is straightforward to train, requires only a few GPUs, and can be customised for low-spec smartphones. The proposed application was shown to classify lesions accurately as either benign or malignant. The authors used dermoscopy images, smartphone photos, and histology slides (microscope images), sourced from the DermNet datasets, the ISIC Archive, and the Dermofit Image Library. Photographs vary with factors such as the lighting, angle, zoom, and region of interest, which makes classification difficult and classifier generalisation crucial; the authors therefore augmented the images via mirroring, colour shifting, rotation, and similar operations. In [192], a conditional generative adversarial neural network [193] segmented lesions, reduced noise, and extracted features (shapes, colours, textures), with an SVM performing the final classification. The app in [192] requires smartphone photos that are in focus and free of shadows over the lesion, and it outputs a binary risk rating (high or low). It is suggested for self-assessment rather than diagnosis; high-risk users receive image-based expert advice. Two studies [194,195] also examined the diagnostic accuracy of this application, which has over 900,000 users and can detect skin cancer with 95% sensitivity. The application in [190] diagnoses skin cancer from dermoscopy images, using the MobileNet architecture with fully connected layers, preprocessing augmentation, and upsampling. CNNs have also been proposed for skin cancer diagnosis in [188]. Another mobile app runs on a server with a cloud-based database: the phone uploads images to the remote server, so diagnosis requires internet access. A recent smartphone app can also diagnose dermatological diseases other than skin cancer.
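A minimal transfer-learning sketch in the spirit of the MobileNetV2-based apps described above is shown below: ImageNet weights are reused, the convolutional backbone is frozen, and only a small benign/malignant head is retrained. The class count and training details are illustrative assumptions, not the configuration of any cited app.

```python
# Transfer learning with MobileNetV2: freeze the ImageNet backbone and
# retrain a small benign/malignant head (class count is an assumption).
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1")
for p in model.features.parameters():
    p.requires_grad = False               # freeze the convolutional backbone

model.classifier[1] = torch.nn.Linear(model.last_channel, 2)  # new 2-class head

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
# ...standard training loop over smartphone/dermoscopy images goes here...
```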
The authors of [196] analysed the accuracy of commercially available smartphone applications for the detection of melanoma. The study identified 43 applications, but only 25 claimed to identify melanoma and were found to be functional.

7.2. Applications in Clinical Practice

In dermatology, the most important aspect of AI is its use in supporting the diagnosis of skin cancer, ulcers, and psoriasis [197]. Due to the significant subjectivity and difficulty in quantitatively assessing skin lesions during traditional examinations, the use of AI can significantly support the accuracy of such examinations [198].
Aysa [199] is a symptom-checking mobile application created by VisualDx (New York, NY, USA). It integrates a clinically focused search engine with a carefully curated database of over 120,000 medical images covering around 200 dermatological conditions. Additionally, the app adapts its suggestions based on the user’s health history, offering a more tailored experience. In [200], the authors conducted clinical studies on the efficacy of Aysa in a semi-urban town in India, testing the performance of the app using measures such as the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, F1-score, and χ² test. The analysed decision model, the core of the Aysa application, showed promising results. Notably, the sensitivity of the most probable decision (top-1) was 71% (95% CI: 61.5–74.3%).
Yan et al. [201] developed the PanDerm neural network model, which was trained on over two million images across four imaging modalities. The authors showed that PanDerm outperformed clinicians by 10.2% in early-stage melanoma detection. The authors also demonstrated that the model supported physicians’ work by improving their overall diagnostic accuracy, based on dermoscopic images, by 11%. Notably, this model also demonstrated strong results across various demographic groups.
Mehta et al. [202] developed a HOT (Hierarchical-Out-of-Distribution Clinical Triage) neural network model, trained on a dataset of approximately 208,000 lesion images, that generates three levels of diagnostic results: the classification of skin lesions, warnings for unknown cases, and recommendations for dermatoscopy. The authors showed that the sensitivity of the most general level of diagnosis (benign/malignant) was 88.14% (95% CI: 87.42–88.51%). At the most detailed level of analysis, encompassing 44 skin lesion classes, the model achieved a sensitivity score of 63.90% (95% CI: 62.27–65.61%). It is worth noting that this model has not yet received official clinical test results. However, this method of classifying skin lesions is innovative, i.e., utilising different levels of detail, which may significantly contribute to promising human–AI cooperation in clinical settings.
An interesting aspect in clinical applications of AI in dermatology is the accuracy of diagnoses made by doctors and intelligent systems. In [203], the authors compared an intelligent image classification system for skin lesions with an expert dermatologist and a non-expert. The AI-based system demonstrated sensitivity of 93.59% and specificity of 70.42% in identifying malignant lesions. The AI-based system showed comparable sensitivity and significantly greater specificity than both the expert dermatologist and the non-expert.

7.3. Future Research Directions

Although the field of AI in dermatology has already seen significant advances in terms of diagnostic accuracy and workflow efficiency, significant gaps remain in validation, implementation, and ethical oversight. Addressing these challenges will require a coordinated effort across technical, clinical, and regulatory domains. Below, we outline the most promising research directions that could help to close these gaps and guide future innovations in AI-based dermatology.
To help readers to quickly identify key research areas, we first outline the main future research directions.
  • Federated Learning: Training models on distributed datasets without sharing raw images [204,205].
  • Multimodal AI: Combining imaging data with clinical metadata (e.g., patient history, lab results) [206,207].
  • Generative Augmentation: Using GANs to expand datasets for rare dermatologic conditions synthetically [60,208].
  • Explainable AI (XAI): Developing transparent diagnostic explanations (e.g., Grad-CAM, LIME) [149,153,209]; a minimal Grad-CAM sketch follows this list.
  • Real-World Evidence (RWE): The prospective validation of algorithms in clinical settings and EHR integration [210,211,212].
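As a minimal illustration of the XAI direction, the sketch below implements the core of Grad-CAM by hand: activations and gradients of the last convolutional block are captured with hooks, and the gradient-weighted activation map highlights the image regions driving the prediction. The torchvision ResNet-18 and the random input tensor are assumptions for demonstration; libraries such as Captum package the same idea with less plumbing.

```python
# Minimal Grad-CAM sketch for a torchvision ResNet-18 (an assumption for
# demonstration; any CNN with a final convolutional block works similarly).
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
acts, grads = {}, {}
layer = model.layer4                      # last convolutional block

layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)           # placeholder for a lesion image
model(x)[0].max().backward()              # backprop the top-class score

weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # channel importance
cam = torch.relu((weights * acts["a"]).sum(dim=1))   # weighted activation map
cam = cam / cam.max().clamp(min=1e-8)     # normalise to [0, 1] for overlay
```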

8. Comparative Analysis of Neural Network Architectures and Datasets in Skin Diagnostics

The results obtained by the five most frequently cited skin lesion classification methods, as well as the used neural network architectures, the datasets, and typical statistical parameters illustrating the classification quality, are presented in Table 5.

8.1. Comparison of Neural Network Architectures

Transformer-based architectures [218] (e.g., Separable Vision Transformer, Swin Transformer) have recorded the highest accuracy, the most general measure of a decision model’s quality. However, this accuracy comes at a cost in computational resources: neural networks using Transformers are currently among the most resource-hungry models, in terms of both memory and training time.
The highest AUC was obtained by the DSCC_Net architecture; however, this was achieved through a modified training process, in which the authors employed a specific approach to balance the number of images across the decision classes.
It is also worth noting the high accuracy and specificity obtained by the GoogLeNet-TL architecture, which is based on the original GoogLeNet architecture [219]. The authors used a classical approach based on transfer learning and dataset augmentation.
In the case of the SkinNet architecture, the authors showed that a simple fusion of the results of three lightweight architectures (MobileNetV2 [220], ResNet18 [221], VGG11 [222]) with additional fine-tuning increased the AUC to 0.96, which places it close to the best solutions of this type.
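A simple way to realise such a fusion is late fusion of predicted probabilities. The sketch below averages the softmax outputs of three lightweight torchvision backbones; it approximates the idea only, since SkinNet’s exact fusion and fine-tuning scheme may differ, and in practice each backbone’s head would first be retrained on the lesion classes.

```python
# Late fusion of predicted class probabilities from three lightweight
# backbones. Heads here still output 1000 ImageNet classes; in a real
# system each head would first be retrained on the lesion classes.
import torch
import torchvision

def probs(model, x):
    with torch.no_grad():
        return torch.softmax(model(x), dim=1)

x = torch.randn(1, 3, 224, 224)           # placeholder lesion image
models = [
    torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval(),
    torchvision.models.resnet18(weights="IMAGENET1K_V1").eval(),
    torchvision.models.vgg11(weights="IMAGENET1K_V1").eval(),
]
fused = torch.stack([probs(m, x) for m in models]).mean(dim=0)
prediction = fused.argmax(dim=1)          # class with the highest mean probability
```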

8.2. Comparison of Datasets

In the literature on dermoscopic skin images, three large datasets are often used: ISIC, HAM10000, and BCN20000. Below, we compare these datasets in terms of the number of images, the number of skin disease classes, the most commonly used evaluation metrics, and the main limitations of each, based on information from publications describing these datasets.
ISIC (e.g., ISIC 2019 Challenge Dataset). This dataset is the largest of those compared—it contains 25,331 training images (plus 8239 test images) grouped into eight diagnostic classes [223]. These classes are melanoma, melanocytic nevus, basal cell carcinoma (BCC), actinic keratosis (AK), benign keratosis (BKL), dermatofibroma (DF), vascular lesions (VASC), and squamous cell carcinoma (SCC). In addition, a ninth class of out-of-distribution images (images that do not fit into any of the eight categories) was added to the test set [223,224]. The balanced multiclass accuracy is the most commonly reported metric when evaluating model performance on ISIC data, and it was the primary metric in ISIC competitions [225].
The ISIC dataset suffers from a strong class imbalance, e.g., the vast majority of images are nevus lesions, while some classes (e.g., melanomas and dermatofibromas) make up a small percentage of the data [43,223]. This can cause models to underperform for rare categories if appropriate balancing techniques are not applied [223]. Another challenge is the limited variety of skin tones captured in these data—reviews have shown that public dermoscopic image collections are underrepresentative of darker skin types [226]. This results in reduced AI model performance for darker skin types, as models are trained primarily on light skin types. Another limitation is the partial lack of histopathological confirmation for some images in ISIC [223].
HAM10000. This is a set of 10,015 dermoscopic images initially provided as a training set in the ISIC 2018 competition [43,223]. It includes seven classes of skin lesions most frequently encountered in dermatological practice, i.e., melanoma, melanocytic nevus, basal cell carcinoma (BCC), actinic keratosis/Bowen’s disease (AKIEC), benign keratotic lesions (BKL), dermatofibroma, and vascular lesions [43]. The most common metric in assessing effectiveness here is the balanced classification accuracy. Due to the uneven number of individual categories, it is recommended to weight the results so that each class has an equal impact [225]. The HAM10000 set, like ISIC, has a class imbalance—images of nevi dominate, while, for example, dermatofibromas or vascular lesions constitute a minuscule fraction [43]. The authors deliberately limited themselves to pigmented skin lesions—there are no completely unpigmented lesions or images from the nail or mucosa region in the dataset [43]. Although this makes the data more homogeneous, it translates into narrow clinical heterogeneity, which may make it challenging to apply the models to cases outside this range (e.g., subungual or amelanotic melanomas are not represented). Another limitation is incomplete histopathological verification: although more than half of the cases in HAM10000 are confirmed by histopathological examination, for the remaining cases, the “ground truth” is based on long-term observation of the lesions, expert consensus, or confocal microscopy rather than biopsy [43]. Finally, like most datasets of this time, HAM10000 does not contain information on the skin phototypes of the patients—the data come from Austria and Australia, which implies a predominance of lighter phototypes and a potential bias against individuals with darker skin.
BCN20000. The latest set, published in 2024, contains almost 19,000 dermoscopic images (exactly 18,946 images) from 2010 to 2016, registered at the University Hospital of Barcelona [224]. It includes eight classes of dermatoscopic diagnoses: melanoma, nevus, basal cell carcinoma (BCC), actinic keratosis (AK), squamous cell carcinoma (SCC), benign keratotic lesions (lentigo/seborrhoeic keratosis, BKL), dermatofibroma (DF), and vascular lesions (hemangioma, etc., VASC) [224]. Additionally, a ninth class, OOD, was extracted from the test set for lesions that do not fit into the above categories [224]. As in previous datasets, the balanced accuracy is used in model evaluations as the primary measure of the multiclass classification performance. BCN20000 was designed to address some of the shortcomings of its predecessors—it includes images of lesions in difficult locations (nails, mucosal areas) and lightly pigmented lesions, which were missing in the more cleaned HAM10000-type datasets [224]. However, some limitations remain. The dataset is from a single centre (Spain), so the population and skin type diversity may be limited (as with other datasets, there is a lack of broader representation of dark skin tones). Class imbalance also occurs: although BCN20000 is the largest multiclass dataset (providing the most images for each category), some diagnoses (e.g., dermatofibroma or vascular lesions) are still much less common than, e.g., nevus or BCC. The advantage of BCN20000 is the quality of its annotations: all melanomas and other malignancies are confirmed by histopathology, and benign lesions are diagnosed based on histopathology, confocal microscopy, or a consensus opinion from two dermatologists, along with observations of changes over time [224]. This makes the annotations more reliable, reducing the risk of mislabeling. In summary, BCN20000 mitigates some of the biases present in ISIC/HAM (e.g., a wider range of lesion types and locations). However, it does not entirely eliminate the issues of skin-tone bias or class size disparities. Additionally, its homogeneous geographic origin may affect the generalisability of the models.
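Because the balanced multiclass accuracy recurs as the primary metric for all three datasets, it is worth spelling out: it is the unweighted mean of per-class recalls, so a rare class such as dermatofibroma counts as much as the dominant nevus class. A short sketch with synthetic labels (the label layout is an assumption for demonstration):

```python
# Balanced multiclass accuracy = unweighted mean of per-class recalls.
# Synthetic labels: 25 examples of each of 8 classes (an assumption).
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
y_true = np.repeat(np.arange(8), 25)              # every class represented
y_pred = rng.integers(0, 8, size=y_true.size)     # random "predictions"

cm = confusion_matrix(y_true, y_pred)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall.mean())                    # manual balanced accuracy
print(balanced_accuracy_score(y_true, y_pred))    # sklearn equivalent
```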
The following Table 6 presents a tabular summary of the key features of these three datasets, along with their corresponding bibliographic sources.

9. Key Observations and Recommendations

In this subsection, we collect and synthesise the most important results of the conducted literature analysis and our research. The aim is to identify key observations regarding the use of artificial intelligence techniques in dermatology, to indicate dominant trends, and to identify the main challenges and research gaps. Based on them, we formulate specific recommendations that can serve as a practical guide for future research and implementation. This is complemented by a summary of the advantages and disadvantages of individual approaches, which enables the reader to compare the available methods easily.

9.1. Key Observations

Based on the literature review of the period from 2009 to 2024 and the collected empirical data, key trends and patterns in the application of AI in dermatology can be identified. The following observations highlight the dominant approaches, the most commonly used techniques, and the primary challenges that researchers encounter when designing and evaluating models that learn from skin images. This will provide the reader with a synthetic overview of the most important findings, which are the foundation for further recommendations.
  • Dominance of CNNs in diagnostic imaging. Most works (∼65%) use convolutional neural networks, with transfer learning architectures (e.g., ResNet, DenseNet) being the most common.
  • Emergence of Transformer-based models. Recent studies have introduced Vision Transformer architectures (e.g., Swin Transformer, ViT variants) that achieve state-of-the-art accuracy in skin lesion classification, albeit with substantially higher computational and data requirements. This marks a new trend that challenges the CNN’s dominance.
  • Adoption of ensemble learning for performance gains. A few works combine multiple models to boost the accuracy—for example, ensembling lightweight CNNs (as in SkinNet) significantly improved the classification metrics (achieving AUC ≈ 0.96). Such ensembles leverage complementary strengths at the cost of greater complexity.
  • Growing role of data augmentation. The use of GAN-generated images and augmented imaging (rotation, scaling, colour shifts) has become common, improving the classification accuracy, on average, by 3–7%. It helps to address data scarcity and imbalances in training sets.
  • Persistent class imbalance challenges. Skewed datasets (e.g., few melanomas among many nevi) remain problematic. Specialised strategies to rebalance training data (for instance, the approach used in DSCC_Net to equalise class representation) have been shown to markedly enhance the model AUC and sensitivity, underlining the importance of addressing rare lesion classes.
  • Insufficient standardisation of evaluations. A lack of uniform protocols for training/test splits and performance metrics (such as cross-validation vs. random splits and varying datasets) makes it difficult to compare results across studies. This variability hampers the objective benchmarking of different AI models in dermatology.
  • Limited interpretability of models. Few publications consider model interpretability (e.g., saliency maps like Grad-CAM or LIME), which raises concerns about clinical trust. The black-box nature of many AI models remains a significant obstacle to their acceptance in dermatological practice.

9.2. Key Recommendations

Based on the presented observations, a set of recommendations has been developed to help future researchers and practitioners to select the optimal tools and procedures for skin image analysis. These proposals take into account both the technical aspects of model design and the requirements for experimental reliability and the interpretability of results.
  • Selection of the appropriate architecture
    Melanoma classification: We recommend DenseNet-121 as a base in the transfer learning approach, which provides a good compromise between network depth and limiting overfitting.
    Skin lesion segmentation: We recommend using the UNet model enriched with an attention module (Attention UNet), which enables the more precise extraction of lesion boundaries, especially with limited data.
    Large-scale lesion classification: If a large quantity of data (>10k images) and computing resources are available, consider advanced architectures like Vision Transformers (e.g., Swin Transformer) for potentially higher accuracy. Please note that these Transformer models are resource-intensive and require longer training times.
    Maximising diagnostic performance: For critical applications requiring the highest accuracy, an ensemble of models can be used. Combining multiple complementary CNN architectures (as in SkinNet, which fused MobileNetV2, ResNet18, and VGG11) has demonstrated improved performance (AUC ∼ 0.96), albeit with increased training and deployment complexity.
  • Standardisation of experimental protocols
    Use K-fold CV instead of a single random split, which provides a more robust assessment of model generalisation.
    Use publicly available, standardised reference sets such as ISIC or HAM10000 to increase the comparability between studies.
  • Augmentation and generative models for data enrichment
    Use traditional techniques (rotation, scaling, brightness changes) in parallel with StyleGAN2 or CycleGAN to generate synthetic images of rare lesion types; a sketch of the traditional pipeline follows this list.
    Regularly verify the impact of newly generated samples on model metrics, with the goal of increasing the sensitivity with a minimal increase in false alarms.
  • Ensure interpretability and clinical confidence
    Publish saliency maps (e.g., Grad-CAM, LIME) for each new model, which allows verification that the network focuses on symptomatic parts of the lesion.
    Consider integrating tools such as SHAP to analyse the impact of individual image features on the classifier’s decision.
  • Regular bibliometric and statistical analysis
    Update publication trend graphs every six months, taking into account new techniques (e.g., Transformers in computer vision).
    When comparing performance across architectures, use significance tests (e.g., ANOVA) to determine whether differences in accuracy are statistically significant.
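To illustrate the augmentation recommendation above, the following torchvision sketch combines the listed traditional transformations; GAN-based generation (StyleGAN2/CycleGAN) would be a separate offline step that produces additional training images rather than a transform in this pipeline. The parameter values are illustrative assumptions:

```python
# Traditional augmentation pipeline; applied to the training set only.
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomRotation(degrees=30),                 # rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # scaling/cropping
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness changes
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
# Validation/test sets should use deterministic resizing and normalisation
# so that reported metrics reflect the model, not the augmentation.
```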
Each recommendation can be tailored to the specific project requirements and available computational resources. Implementing these practices will increase research’s reliability, the comparability of results, and the broader acceptance of AI solutions in clinical applications.

9.3. Method Comparison Table

Table 7 below summarises the most important architectures and approaches discussed in the article, including their key advantages, potential limitations, and recommended applications in the imaging diagnostics of skin lesions.
Figure 9 presents the flowchart applied in selecting the best architectures based on the machine learning task, dataset size, and background complexity.
The presented Table 7 and Figure 9 enable the easy identification of the most suitable methods, depending on the nature of the task, computational requirements, and data availability, which facilitates planning for further work on dermatological diagnostic support systems.

9.4. Machine Learning Flowchart

Figure 10 illustrates a comprehensive workflow for dermatological image analysis. It starts with task selection (classification or segmentation) and initial data preparation, including image loading, artefact correction (hair and shadow removal), ROI detection and cropping, and resizing and conversion to tensor format. This is followed by an augmentation stage (geometric and colour transformations, elastic deformations, generation of new examples using StyleGAN2 and CycleGAN, and class balancing), after which the images undergo pixel normalisation and standardisation. Subsequent phases include division into training, validation, and test sets (including cross-validation); model architecture selection (transfer learning, fine-tuning); regularisation; and the monitoring of performance measures, as well as robustness testing (noise simulation, deformation, OOD) and adversarial training (FGSM, PGD). Finally, the framework encompasses model calibration (reliability diagrams, temperature scaling), explainability methods (Grad-CAM, LIME, SHAP), and clinical validation and ethical safeguards prior to implementation.
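Of the stages in this workflow, model calibration is the one most often omitted in practice, so a minimal temperature-scaling sketch may be useful: a single scalar T is fitted on held-out validation logits so that the softmax of logits/T better matches observed accuracy. The logits and labels below are random placeholders, not outputs of any cited model.

```python
# Temperature scaling: fit a single scalar T on validation logits so that
# softmax(logits / T) is better calibrated. Logits/labels are placeholders.
import torch

val_logits = torch.randn(500, 8)                  # held-out model outputs
val_labels = torch.randint(0, 8, (500,))

log_t = torch.zeros(1, requires_grad=True)        # optimise log T (keeps T > 0)
optim = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

def closure():
    optim.zero_grad()
    loss = torch.nn.functional.cross_entropy(val_logits / log_t.exp(), val_labels)
    loss.backward()
    return loss

optim.step(closure)
temperature = log_t.exp().item()                  # divide test logits by this T
```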

10. Other Review Perspectives

In Table 8 below, we present an analysis of five review publications in terms of their thematic scope, main contributions, and limitations. The analysis of these publications suggests that AI has significant potential in dermatology, particularly in the fields of diagnostics and enhancing care efficiency. At the same time, the authors unanimously emphasise the need to address significant challenges before these technologies are widely adopted. It is necessary to ensure the reliable validation of models (on diverse data and in clinical conditions), to ensure the transparency and fairness of algorithms (to avoid bias in terms of, for example, skin phototypes), and to develop clear legal regulations and ethical standards. In practice, AI should serve as a support system for dermatologists and not a replacement. If properly implemented, it can enhance clinical decisions and expand access to specialist care, while maintaining the primary role of the doctor. However, all of this requires further research and close cooperation between clinicians, algorithm developers, and regulators, so that artificial intelligence in dermatology is implemented safely, effectively, and ethically.

11. Conclusions

This article is a cross-sectional review, the main contribution of which is focused on the methodology and bibliography analysis of AI applications in dermatology. It contains a literature review covering legal regulations, verification methods, and ethical issues related to AI. A bibliometric analysis of publications from the last 15 years and an up-to-date review of clinical applications, including mobile applications, are also included. Additionally, the study includes an analysis of the neural network architectures and datasets used in the most frequently cited studies on skin disease diagnostics. This is complemented by a discussion of the basics of AI methods used in dermatology and a presentation of the main achievements in dermoscopic and histopathological diagnostics, as well as the formulation of recommendations for future directions in research and technological development in this field.
Modern dermatology is gaining tremendous support from AI applications, especially in areas related to skin image analysis. This article focuses on this dynamic aspect, exploring various approaches to using AI in dermatology practice. In particular, we emphasise the significance of deep learning and neural networks, which are essential in achieving advanced image analysis and the automatic interpretation of skin lesions. As AI evolves, deep learning methods are becoming increasingly important. Deep learning-based models enable computer systems to detect patterns and subtle differences in skin images, leading to faster and more accurate diagnoses. The decision modelling process based on imaging data is becoming increasingly reliable, improving the efficiency of dermatological care. One of the most significant advances in AI in dermatology is its ability to identify complex skin lesions that are often difficult for the human eye to detect. Algorithms learn from vast collections of skin images, enabling them to identify the characteristics of skin lesions accurately. This, in turn, translates into earlier diagnosis and a more effective therapeutic approach. Although AI offers promising prospects, it is worth emphasising that human expertise is still irreplaceable. Collaboration between dermatologists and AI-based tools can significantly enhance the diagnostic and therapeutic process. Ultimately, the application of AI in dermatology represents a step toward more personalised, efficient, and accessible care for patients with various skin conditions.
Despite promising results in the imaging diagnostics of skin lesions—in particular, thanks to deep convolutional networks enabling faster and more accurate pathology recognition—there are still a number of significant limitations. Firstly, the quality and diversity of training data still determine the risk of overfitting among models, and the lack of full transparency in algorithms can lead to unintended biases, making their verification in clinical conditions difficult. At the same time, each AI solution must meet legal and ethical requirements, including the assumptions of the European AI Act and FDA guidelines, which require the implementation of explainability mechanisms and safeguards against adversarial attacks. A bibliometric analysis of the literature from 2009 to 2024 reveals a rapid increase in the number of publications in this field, which indicates growing interest from both academia and industry. Based on the identified trends, we recommend further development in five key areas—federated learning to protect data privacy, multimodal models integrating different information sources, generative augmentation, deep model explainability (XAI), and research based on real-world evidence—which together will contribute to the safer and more effective implementation of AI in dermatology.

Author Contributions

Conceptualisation, A.M.Z. and T.K.; methodology, A.M.Z. and T.K.; software, T.K.; validation, A.M.Z.; investigation, A.M.Z. and T.K.; resources, A.M.Z. and T.K.; data curation, A.M.Z. and T.K.; writing—original draft preparation, A.M.Z. and T.K.; writing—review and editing, A.M.Z. and T.K.; visualisation, T.K.; supervision, A.M.Z.; project administration, A.M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161. [Google Scholar] [CrossRef] [PubMed]
  2. O’Regan, G. History of Artificial Intelligence. In Introduction to the History of Computing; Springer International Publishing: New York, NY, USA, 2016; pp. 249–273. [Google Scholar] [CrossRef]
  3. Buchanan, B.G.; Shortliffe, E.H. Rule Based Expert Systems: The Mycin Experiments of the Stanford Heuristic Programming Project; The Addison-Wesley Series in Artificial Intelligence; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1984. [Google Scholar]
  4. AI Funding in Healthcare Through the Years. Available online: https://www.beckershospitalreview.com/digital-health/ai-funding-in-healthcare-through-the-years.html (accessed on 7 November 2024).
  5. Stypińska, J.; Franke, A. AI revolution in healthcare and medicine and the (re-)emergence of inequalities and disadvantages for ageing population. Front. Sociol. 2023, 7, 1038854. [Google Scholar] [CrossRef] [PubMed]
  6. Dermatologist, A. AI Dermatologist Skin Scanner. Available online: https://ai-derm.com/ai (accessed on 30 May 2025).
  7. Liopyris, K.; Gregoriou, S.; Dias, J.; Stratigos, A.J. Artificial Intelligence in Dermatology: Challenges and Perspectives. Dermatol. Ther. 2022, 12, 2637–2651. [Google Scholar] [CrossRef] [PubMed]
  8. De, A.; Sarda, A.; Gupta, S.; Das, S. Use of artificial intelligence in dermatology. Indian J. Dermatol. 2020, 65, 352. [Google Scholar] [CrossRef]
  9. 2025. Available online: https://skin-analytics.com/ (accessed on 7 July 2025).
  10. Renders, J.M.; Simonart, T. Role of Artificial Neural Networks in Dermatology. Dermatology 2009, 219, 102–104. [Google Scholar] [CrossRef] [PubMed]
  11. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
  12. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
  13. Polkowski, L.; Skowron, A. Rough mereology: A new paradigm for approximate reasoning. Int. J. Approx. Reason. 1996, 15, 333–365. [Google Scholar] [CrossRef]
  14. Zadeh, L. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  15. Solomonoff, R.J. An inductive inference machine. In Proceedings of the IRE Convention Record, Section on Information Theory; The Institute of Radio Engineers: New York, NY, USA, 1957. [Google Scholar]
  16. Alwakid, G.; Gouda, W.; Humayun, M.; Sama, N.U. Melanoma Detection Using Deep Learning-Based Classifications. Healthcare 2022, 10, 2481. [Google Scholar] [CrossRef]
  17. Ivakhnenko, A.; Lapa, V. Cybernetics and Forecasting Techniques. In Modern Analytic and Computational Methods in Science and Mathematics; American Elsevier Publishing Company: New York, NY, USA, 1967. [Google Scholar]
  18. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  19. Liu, Q.; Wu, Y. Supervised Learning. In Encyclopedia of the Sciences of Learning; Springer: New York, NY, USA, 2012; pp. 3243–3245. [Google Scholar] [CrossRef]
  20. Barlow, H. Unsupervised Learning. Neural Comput. 1989, 1, 295–311. [Google Scholar] [CrossRef]
  21. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement Learning: A Survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
  22. Zhou, J.; Wu, Z.; Jiang, Z.; Huang, K.; Guo, K.; Zhao, S. Background selection schema on deep learning-based classification of dermatological disease. Comput. Biol. Med. 2022, 149, 105966. [Google Scholar] [CrossRef]
  23. Inthiyaz, S.; Altahan, B.R.; Ahammad, S.H.; Rajesh, V.; Kalangi, R.R.; Smirani, L.K.; Hossain, M.A.; Rashed, A.N.Z. Skin disease detection using deep learning. Adv. Eng. Softw. 2023, 175, 103361. [Google Scholar] [CrossRef]
  24. Bettuzzi, T.; Hua, C.; Diaz, E.; Colin, A.; Wolkenstein, P.; de Prost, N.; Ingen-Housz-Oro, S. Epidermal necrolysis: Characterization of different phenotypes using an unsupervised clustering analysis. Br. J. Dermatol. 2022, 186, 1037–1039. [Google Scholar] [CrossRef]
  25. Garcia, J.B.; Tanadini-Lang, S.; Andratschke, N.; Gassner, M.; Braun, R. Suspicious Skin Lesion Detection in Wide-Field Body Images using Deep Learning Outlier Detection. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, UK, 11–15 July 2022; pp. 2928–2932. [Google Scholar] [CrossRef]
  26. Usmani, U.A.; Watada, J.; Jaafar, J.; Aziz, I.A.; Roy, A. A Reinforcement Learning Algorithm for Automated Detection of Skin Lesions. Appl. Sci. 2021, 11, 9367. [Google Scholar] [CrossRef]
  27. Barata, C.; Rotemberg, V.; Codella, N.C.F.; Tschandl, P.; Rinner, C.; Akay, B.N.; Apalla, Z.; Argenziano, G.; Halpern, A.; Lallas, A.; et al. A reinforcement learning model for AI-based decision support in skin cancer. Nat. Med. 2023, 29, 1941–1946. [Google Scholar] [CrossRef]
  28. Piccialli, F.; Somma, V.D.; Giampaolo, F.; Cuomo, S.; Fortino, G. A survey on deep learning in medicine: Why, how and when? Inf. Fusion 2021, 66, 111–137. [Google Scholar] [CrossRef]
  29. Wolpert, D.; Macready, W. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef]
  30. Eklund, A.; Dufort, P.; Forsberg, D.; LaConte, S.M. Medical image processing on the GPU—Past, present and future. Med. Image Anal. 2013, 17, 1073–1094. [Google Scholar] [CrossRef] [PubMed]
  31. Cun, Y.L.; Boser, B.; Denker, J.S.; Howard, R.E.; Habbard, W.; Jackel, L.D.; Henderson, D. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1990; pp. 396–404. [Google Scholar]
  32. Sarvamangala, D.R.; Kulkarni, R.V. Convolutional neural networks in medical image understanding: A survey. Evol. Intell. 2021, 15, 1–22. [Google Scholar] [CrossRef] [PubMed]
  33. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  34. Minaee, S.; Boykov, Y.Y.; Porikli, F.; Plaza, A.J.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
  35. Chen, Y.; Yang, X.H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative Adversarial Networks in Medical Image augmentation: A review. Comput. Biol. Med. 2022, 144, 105382. [Google Scholar] [CrossRef]
  36. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  37. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Springer International Publishing: New York, NY, USA, 2016; pp. 21–37. [Google Scholar] [CrossRef]
  38. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer International Publishing: New York, NY, USA, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  39. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  40. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar] [CrossRef]
  41. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv 2017, arXiv:1703.10593. [Google Scholar] [CrossRef]
  42. Codella, N.; Rotemberg, V.; Tschandl, P.; Celebi, M.E.; Dusza, S.; Gutman, D.; Helba, B.; Kalloo, A.; Liopyris, K.; Marchetti, M.; et al. Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC). arXiv 2019, arXiv:1902.03368. [Google Scholar] [CrossRef]
  43. Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 2018, 5, 180161. [Google Scholar] [CrossRef]
  44. Rumelhart, D.E.; McClelland, J.L. Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations; MIT Press: Cambridge, MA, USA, 1987; pp. 318–362. [Google Scholar]
  45. Goyal, A.; Choudhary, A.; Malik, D.; Baliyan, M.S.; Rani, S. Implementing and Analysis of RNN LSTM Model for Stock Market Prediction. In Advances in Data and Information Sciences; Springer: Singapore, 2022; pp. 241–248. [Google Scholar] [CrossRef]
  46. Zhao, J.; Zeng, D.; Liang, S.; Kang, H.; Liu, Q. Prediction model for stock price trend based on recurrent neural network. J. Ambient Intell. Humaniz. Comput. 2020, 12, 745–753. [Google Scholar] [CrossRef]
  47. Hou, J.; Tian, Z. Application of recurrent neural network in predicting athletes’ sports achievement. J. Supercomput. 2021, 78, 5507–5525. [Google Scholar] [CrossRef]
  48. Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 2022, 82, 3713–3744. [Google Scholar] [CrossRef]
  49. Patwardhan, N.; Marrone, S.; Sansone, C. Transformers in the Real World: A Survey on NLP Applications. Information 2023, 14, 242. [Google Scholar] [CrossRef]
  50. Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780. [Google Scholar] [CrossRef]
  51. Bose, P.; Srinivasan, S.; Sleeman, W.C.; Palta, J.; Kapoor, R.; Ghosh, P. A Survey on Recent Named Entity Recognition and Relationship Extraction Techniques on Clinical Texts. Appl. Sci. 2021, 11, 8319. [Google Scholar] [CrossRef]
  52. Widyassari, A.P.; Rustad, S.; Shidik, G.F.; Noersasongko, E.; Syukur, A.; Affandy, A.; Setiadi, D.R.I.M. Review of automatic text summarization techniques & methods. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1029–1046. [Google Scholar] [CrossRef]
  53. Popel, M.; Tomkova, M.; Tomek, J.; Kaiser, L.; Uszkoreit, J.; Bojar, O.; Žabokrtský, Z. Transforming machine translation: A deep learning system reaches news translation quality comparable to human professionals. Nat. Commun. 2020, 11, 4381. [Google Scholar] [CrossRef]
  54. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
  55. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. arXiv 2014, arXiv:1409.3215. [Google Scholar] [CrossRef]
  56. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019. [Google Scholar]
  57. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar] [CrossRef]
  58. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2025, arXiv:2303.18223. [Google Scholar] [CrossRef]
  59. Matin, R.N.; Linos, E.; Rajan, N. Leveraging large language models in dermatology. Br. J. Dermatol. 2023, 189, 253–254. [Google Scholar] [CrossRef] [PubMed]
  60. Rashid, H.; Tanveer, M.A.; Aqeel Khan, H. Skin Lesion Classification Using GAN based Data Augmentation. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 916–919. [Google Scholar] [CrossRef]
  61. Chen, Y.; Zhu, Y.; Chang, Y. CycleGAN Based Data Augmentation For Melanoma images Classification. In Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition, AIPR ’20, New York, NY, USA, 26–28 June 2020; pp. 115–119. [Google Scholar] [CrossRef]
  62. Asaf, M.Z.; Rasul, H.; Akram, M.U.; Hina, T.; Rashid, T.; Shaukat, A. A Modified Deep Semantic Segmentation Model for Analysis of Whole Slide Skin Images. Sci. Rep. 2024, 14, 23489. [Google Scholar] [CrossRef]
  63. Sarwar, N.; Irshad, A.; Naith, Q.H.; Alsufiani, K.D.; Almalki, F.A. Skin lesion segmentation using deep learning algorithm with ant colony optimization. BMC Med. Inform. Decis. Mak. 2024, 24, 265. [Google Scholar] [CrossRef] [PubMed]
  64. Nie, Y.; Sommella, P.; Carratù, M.; O’Nils, M.; Lundgren, J. A Deep CNN Transformer Hybrid Model for Skin Lesion Classification of Dermoscopic Images Using Focal Loss. Diagnostics 2023, 13, 72. [Google Scholar] [CrossRef]
  65. Innani, S.; Dutande, P.; Baid, U.; Pokuri, V.; Bakas, S.; Talbar, S.; Baheti, B.; Guntuku, S.C. Generative adversarial networks based skin lesion segmentation. Sci. Rep. 2023, 13, 13467. [Google Scholar] [CrossRef] [PubMed]
  66. Oda, J.; Takemoto, K. Mobile applications for skin cancer detection are vulnerable to physical camera-based adversarial attacks. Sci. Rep. 2025, 15, 18119. [Google Scholar] [CrossRef]
  67. Esteva, A.; Chou, K.; Yeung, S.; Naik, N.; Madani, A.; Mottaghi, A.; Liu, Y.; Topol, E.; Dean, J.; Socher, R. Deep learning-enabled medical computer vision. Npj Digit. Med. 2021, 4, 5. [Google Scholar] [CrossRef]
  68. Phillips, M.; Marsden, H.; Jaffe, W.; Matin, R.N.; Wali, G.N.; Greenhalgh, J.; McGrath, E.; James, R.; Ladoyanni, E.; Bewley, A.; et al. Assessment of Accuracy of an Artificial Intelligence Algorithm to Detect Melanoma in Images of Skin Lesions. JAMA Netw. Open 2019, 2, e1913436. [Google Scholar] [CrossRef]
  69. Tschandl, P.; Rinner, C.; Apalla, Z.; Argenziano, G.; Codella, N.; Halpern, A.; Janda, M.; Lallas, A.; Longo, C.; Malvehy, J.; et al. Human-computer collaboration for skin cancer recognition. Nat. Med. 2020, 26, 1229–1234. [Google Scholar] [CrossRef] [PubMed]
  70. Yang, Y.; Zhang, H.; Gichoya, J.W.; Katabi, D.; Ghassemi, M. The limits of fair medical imaging AI in real-world generalization. Nat. Med. 2024, 30, 2838–2848. [Google Scholar] [CrossRef]
  71. Salinas, M.P.; Sepúlveda, J.; Hidalgo, L.; Peirano, D.; Morel, M.; Uribe, P.; Rotemberg, V.; Briones, J.; Mery, D.; Navarrete-Dechent, C. A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis. Npj Digit. Med. 2024, 7, 125. [Google Scholar] [CrossRef] [PubMed]
  72. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016. [Google Scholar] [CrossRef]
  73. Daneshjou, R.; Vodrahalli, K.; Novoa, R.A.; Jenkins, M.; Liang, W.; Rotemberg, V.; Ko, J.; Swetter, S.M.; Bailey, E.E.; Gevaert, O.; et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci. Adv. 2022, 8, 32. [Google Scholar] [CrossRef]
  74. Montoya, L.N.; Roberts, J.S.; Hidalgo, B.S. Towards Fairness in AI for Melanoma Detection: Systemic Review and Recommendations. arXiv 2024, arXiv:2411.12846. [Google Scholar] [CrossRef]
  75. Mahbod, A.; Bancher, B.; Ellinger, I. CryoNuSeg. 2021. Available online: https://www.kaggle.com/dsv/1900145 (accessed on 10 May 2025). [CrossRef]
  76. Kainth, K.; Singh, B. Analysis of CCD and CMOS Sensor Based Images from Technical and Photographic Aspects. In Proceedings of the International Conference of Advance Research & Innovation (ICARI), 19 January 2020. Available online: https://ssrn.com/abstract=3559236 (accessed on 10 July 2025).
  77. Süsstrunk, S.; Buckley, R.; Swen, S. Standard RGB Color Spaces. Color Imaging Conf. 1999, 7, 127–134. [Google Scholar] [CrossRef]
  78. Fairchild, M.D. Refinement of the RLAB color space. Color Res. Appl. 1996, 21, 338–346. [Google Scholar] [CrossRef]
  79. Shaik, K.B.; Ganesan, P.; Kalist, V.; Sathish, B.; Jenitha, J.M.M. Comparative Study of Skin Color Detection and Segmentation in HSV and YCbCr Color Space. Procedia Comput. Sci. 2015, 57, 41–48. [Google Scholar] [CrossRef]
  80. Campos-do-Carmo, G.; Ramos-e-Silva, M. Dermoscopy: Basic concepts. Int. J. Dermatol. 2008, 47, 712–719. [Google Scholar] [CrossRef]
  81. Alwakid, G.; Gouda, W.; Humayun, M.; Jhanjhi, N.Z. Diagnosing Melanomas in Dermoscopy Images Using Deep Learning. Diagnostics 2023, 13, 1815. [Google Scholar] [CrossRef]
  82. Deena, D.G.; Prakash, M.B.; Sreekar, C.S.; Rohith, S. A New Approach for Diagnosis of Melanoma from Dermoscopy Images. J. Surv. Fish. Sci. 2023, 10–15. [Google Scholar] [CrossRef]
  83. Dugonik, B.; Dugonik, A.; Marovt, M.; Golob, M. Image Quality Assessment of Digital Image Capturing Devices for Melanoma Detection. Appl. Sci. 2020, 10, 2876. [Google Scholar] [CrossRef]
  84. Alves, J.; Moreira, D.; Alves, P.; Rosado, L.; Vasconcelos, M. Automatic Focus Assessment on Dermoscopic Images Acquired with Smartphones. Sensors 2019, 19, 4957. [Google Scholar] [CrossRef]
  85. Kubinger, W.; Vincze, M.; Ayromlou, M. The role of gamma correction in colour image processing. In Proceedings of the 9th European Signal Processing Conference (EUSIPCO 1998), Rhodes Island, Greece, 8–11 September 1998; pp. 1–4. [Google Scholar]
  86. Cavalcanti, P.G.; Scharcanski, J.; Lopes, C.B.O. Shading Attenuation in Human Skin Color Images. In Advances in Visual Computing; Springer: Berlin/Heidelberg, Germany, 2010; pp. 190–198. [Google Scholar] [CrossRef]
  87. Quintana, J.; Garcia, R.; Neumann, L. A novel method for color correction in epiluminescence microscopy. Comput. Med. Imaging Graph. 2011, 35, 646–652. [Google Scholar] [CrossRef] [PubMed]
  88. Glaister, J.; Amelard, R.; Wong, A.; Clausi, D.A. MSIM: Multistage Illumination Modeling of Dermatological Photographs for Illumination-Corrected Skin Lesion Analysis. IEEE Trans. Biomed. Eng. 2013, 60, 1873–1883. [Google Scholar] [CrossRef] [PubMed]
  89. Pizer, S.; Johnston, R.; Ericksen, J.; Yankaskas, B.; Muller, K. Contrast-limited adaptive histogram equalization: Speed and effectiveness. In Proceedings of the First Conference on Visualization in Biomedical Computing, Atlanta, GA, USA, 22–25 May 1990; pp. 337–345. [Google Scholar] [CrossRef]
  90. Celebi, M.E.; Iyatomi, H.; Schaefer, G. Contrast enhancement in dermoscopy images by maximizing a histogram bimodality measure. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 2601–2604. [Google Scholar] [CrossRef]
  91. Abbas, Q.; Garcia, I.F.; Emre Celebi, M.; Ahmad, W.; Mushtaq, Q. A perceptually oriented method for contrast enhancement and segmentation of dermoscopy images. Skin Res. Technol. 2012, 19, e490–e497. [Google Scholar] [CrossRef]
  92. Nguyen, N.H.; Lee, T.K.; Atkins, M.S. Segmentation of light and dark hair in dermoscopic images: A hybrid approach using a universal kernel. In Proceedings of the Medical Imaging 2010: Image Processing; Dawant, B.M., Haynor, D.R., Eds.; SPIE: Bellingham, WA, USA, 2010; Volume 7623, p. 76234N. [Google Scholar] [CrossRef]
  93. Abbas, Q.; Celebi, M.; García, I.F. Hair removal methods: A comparative study for dermoscopy images. Biomed. Signal Process. Control 2011, 6, 395–404. [Google Scholar] [CrossRef]
  94. Abbas, Q.; Garcia, I.F.; Emre Celebi, M.; Ahmad, W. A Feature-Preserving Hair Removal Algorithm for Dermoscopy Images. Skin Res. Technol. 2011, 19, e27–e36. [Google Scholar] [CrossRef]
  95. Kempf, W.; Hantschke, M.; Kutzner, H. Dermatopathology; Springer International Publishing: New York, NY, USA, 2022. [Google Scholar] [CrossRef]
  96. Olsen, T.G.; Jackson, B.H.; Feeser, T.A.; Kent, M.N.; Moad, J.C.; Krishnamurthy, S.; Lunsford, D.D.; Soans, R.E. Diagnostic Performance of Deep Learning Algorithms Applied to Three Common Diagnoses in Dermatopathology. J. Pathol. Inform. 2018, 9, 32. [Google Scholar] [CrossRef]
  97. Zhang, J.; Zhang, X.; Qu, D.; Xue, Y.; Bi, X.; Chen, Z. A Deep Learning Approach for Basal Cell Carcinomas and Bowen’s Disease Recognition in Dermatopathology Image. J. Biomater. Tissue Eng. 2022, 12, 879–887. [Google Scholar] [CrossRef]
  98. Dalal, A.; Moss, R.H.; Stanley, R.J.; Stoecker, W.V.; Gupta, K.; Calcara, D.A.; Xu, J.; Shrestha, B.; Drugge, R.; Malters, J.M.; et al. Concentric decile segmentation of white and hypopigmented areas in dermoscopy images of skin lesions allows discrimination of malignant melanoma. Comput. Med. Imaging Graph. 2011, 35, 148–154. [Google Scholar] [CrossRef]
  99. Kaur, R.; Albano, P.P.; Cole, J.G.; Hagerty, J.; LeAnder, R.W.; Moss, R.H.; Stoecker, W.V. Real-time supervised detection of pink areas in dermoscopic images of melanoma: Importance of color shades, texture and location. Skin Res. Technol. 2015, 21, 466–473. [Google Scholar] [CrossRef]
  100. Emre Celebi, M.; Kingravi, H.A.; Iyatomi, H.; Alp Aslandogan, Y.; Stoecker, W.V.; Moss, R.H.; Malters, J.M.; Grichnik, J.M.; Marghoob, A.A.; Rabinovitz, H.S.; et al. Border detection in dermoscopy images using statistical region merging. Skin Res. Technol. 2008, 14, 347–353. [Google Scholar] [CrossRef] [PubMed]
  101. Wong, A.; Scharcanski, J.; Fieguth, P. Automatic Skin Lesion Segmentation via Iterative Stochastic Region Merging. IEEE Trans. Inf. Technol. Biomed. 2011, 15, 929–936. [Google Scholar] [CrossRef] [PubMed]
  102. Mat Said, K.A.; Jambek, A.; Sulaiman, N. A study of image processing using morphological opening and closing processes. Int. J. Control Theory Appl. 2016, 9, 15–21. [Google Scholar]
  103. Kasmi, R.; Mokrani, K.; Rader, R.K.; Cole, J.G.; Stoecker, W.V. Biologically inspired skin lesion segmentation using a geodesic active contour technique. Skin Res. Technol. 2015, 22, 208–222. [Google Scholar] [CrossRef] [PubMed]
  104. Serrano, C.; Acha, B. Pattern analysis of dermoscopic images based on Markov random fields. Pattern Recognit. 2009, 42, 1052–1057. [Google Scholar] [CrossRef]
  105. Celebi, M.E.; Mendonca, T.; Marques, J.S. (Eds.) Dermoscopy Image Analysis; A Bioinspired Color Representation for Dermoscopy Image Analysis; CRC Press: Boca Raton, FL, USA, 2015; p. 44. [Google Scholar] [CrossRef]
  106. Celebi, M.E.; Mendonca, T.; Marques, J.S. (Eds.) Dermoscopy Image Analysis; Where’s the Lesion?: Variability in Human and Automated Segmentation of Dermoscopy Images of Melanocytic Skin Lesions; CRC Press: Boca Raton, FL, USA, 2015; p. 30. [Google Scholar] [CrossRef]
  107. Barata, C.; Ruela, M.; Francisco, M.; Mendonca, T.; Marques, J.S. Two Systems for the Detection of Melanomas in Dermoscopy Images Using Texture and Color Features. IEEE Syst. J. 2014, 8, 965–979. [Google Scholar] [CrossRef]
  108. Ercal, F.; Chawla, A.; Stoecker, W.; Lee, H.C.; Moss, R. Neural network diagnosis of malignant melanoma from color images. IEEE Trans. Biomed. Eng. 1994, 41, 837–845. [Google Scholar] [CrossRef]
  109. Celebi, M.E.; Kingravi, H.A.; Uddin, B.; Iyatomi, H.; Aslandogan, Y.A.; Stoecker, W.V.; Moss, R.H. A methodological approach to the classification of dermoscopy images. Comput. Med. Imaging Graph. 2007, 31, 362–373. [Google Scholar] [CrossRef]
  110. Kahofer, P.; Hofmann-Wellenhof, R.; Smolle, J. Tissue counter analysis of dermatoscopic images of melanocytic skin tumours: Preliminary findings. Melanoma Res. 2002, 12, 71–75. [Google Scholar] [CrossRef] [PubMed]
  111. Cheng, X.; Kadry, S.; Meqdad, M.N.; Crespo, R.G. CNN supported framework for automatic extraction and evaluation of dermoscopy images. J. Supercomput. 2022, 78, 17114–17131. [Google Scholar] [CrossRef]
  112. Schaefer, G.; Krawczyk, B.; Celebi, M.E.; Iyatomi, H. An ensemble classification approach for melanoma diagnosis. Memetic Comput. 2014, 6, 233–240. [Google Scholar] [CrossRef]
  113. Marques, J. Streak Detection in Dermoscopic Color Images Using Localized Radial Flux of Principal Intensity Curvature. In Dermoscopy Image Analysis; CRC Press: Boca Raton, FL, USA, 2015; pp. 227–246. [Google Scholar] [CrossRef]
  114. Rifkin, R.; Klautau, A. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 2004, 5, 101–141. [Google Scholar]
  115. Senan, E.M.; Jadhav, M.E. Analysis of dermoscopy images by using ABCD rule for early detection of skin cancer. Glob. Transit. Proc. 2021, 2, 1–7. [Google Scholar] [CrossRef]
  116. Almattar, W.; Luqman, H.; Khan, F.A. Diabetic retinopathy grading review: Current techniques and future directions. Image Vis. Comput. 2023, 139, 104821. [Google Scholar] [CrossRef]
  117. Grzybowski, A.; Peeters, F.; Barão, R.C.; Brona, P.; Rommes, S.; Krzywicki, T.; Stalmans, I.; Jacob, J. Evaluating the efficacy of AI systems in diabetic retinopathy detection: A comparative analysis of Mona DR and IDx-DR. Acta Ophthalmol. 2024, 103, 388–395. [Google Scholar] [CrossRef]
  118. Grzybowski, A.; Brona, P.; Krzywicki, T.; Ruamviboonsuk, P. Diagnostic Accuracy of Automated Diabetic Retinopathy Image Assessment Software: IDx-DR and RetCAD. Ophthalmol. Ther. 2024, 14, 73–84. [Google Scholar] [CrossRef]
  119. European Commission. Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts. 2021. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206&from=EN (accessed on 11 May 2025).
  120. U.S. Food and Drug Administration (FDA). Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan. 2021. Available online: https://www.fda.gov/media/145022/download (accessed on 11 May 2025).
  121. Powles, J.; Hodson, H. Google DeepMind and healthcare in an age of algorithms. Health Technol. 2017, 7, 351–367. [Google Scholar] [CrossRef]
  122. Gerke, S.; Minssen, T.; Cohen, G. Ethical and legal challenges of artificial intelligence-driven healthcare. In Artificial Intelligence in Healthcare; Elsevier: Amsterdam, The Netherlands, 2020; pp. 295–336. [Google Scholar] [CrossRef]
  123. Cohen, I.G.; Amarasingham, R.; Shah, A.; Xie, B.; Lo, B. The Legal And Ethical Concerns That Arise From Using Complex Predictive Analytics In Health Care. Health Aff. 2014, 33, 1139–1147. [Google Scholar] [CrossRef]
  124. Eapen, B. Artificial intelligence in dermatology: A practical introduction to a paradigm shift. Indian Dermatol. Online J. 2020, 11, 881. [Google Scholar] [CrossRef]
  125. Venkatesh, K.P.; Kadakia, K.T.; Gilbert, S. Learnings from the first AI-enabled skin cancer device for primary care authorized by FDA. Npj Digit. Med. 2024, 7, 156. [Google Scholar] [CrossRef] [PubMed]
  126. Purohit, J.; Shivhare, I.; Jogani, V.; Attari, S.; Surtkar, S. Adversarial Attacks and Defences for Skin Cancer Classification. In Proceedings of the 2023 International Conference for Advancement in Technology (ICONAT), Goa, India, 24–26 January 2023; pp. 1–6. [Google Scholar] [CrossRef]
  127. Huq, A.; Pervin, M.T. Analysis of Adversarial Attacks on Skin Cancer Recognition. In Proceedings of the 2020 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 5–6 August 2020; pp. 1–4. [Google Scholar] [CrossRef]
  128. Ma, X.; Niu, Y.; Gu, L.; Wang, Y.; Zhao, Y.; Bailey, J.; Lu, F. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognit. 2021, 110, 107332. [Google Scholar] [CrossRef]
  129. Chanda, T.; Hauser, K.; Hobelsberger, S.; Bucher, T.C.; Garcia, C.; Wies, C.; Kittler, H.; Tschandl, P.; Navarrete-Dechent, C.; Podlipnik, S.; et al. Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. arXiv 2023, arXiv:2303.12806. [Google Scholar] [CrossRef]
  130. Hauser, K.; Kurz, A.; Haggenmüller, S.; Maron, R.C.; von Kalle, C.; Utikal, J.S.; Meier, F.; Hobelsberger, S.; Gellrich, F.F.; Sergon, M.; et al. Explainable artificial intelligence in skin cancer recognition: A systematic review. Eur. J. Cancer 2022, 167, 54–69. [Google Scholar] [CrossRef]
  131. Lucieri, A.; Bajwa, M.N.; Braun, S.A.; Malik, M.I.; Dengel, A.; Ahmed, S. ExAID: A multimodal explanation framework for computer-aided diagnosis of skin lesions. Comput. Methods Programs Biomed. 2022, 215, 106620. [Google Scholar] [CrossRef]
  132. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. Conference Track Proceedings; OpenReview.net, 2018. [Google Scholar]
  133. Carlini, N.; Wagner, D. Towards Evaluating the Robustness of Neural Networks. arXiv 2017, arXiv:1608.04644. [Google Scholar] [CrossRef]
  134. Google Inc.; OpenAI; Pennsylvania State University. CleverHans. 2016. Available online: https://github.com/cleverhans-lab/cleverhans (accessed on 14 August 2024).
  135. LF AI & Data. Adversarial Robustness Toolbox (ART). 2018. Available online: https://github.com/Trusted-AI/adversarial-robustness-toolbox (accessed on 14 August 2024).
  136. Bethge Lab. Foolbox. 2017. Available online: https://github.com/bethgelab/foolbox (accessed on 14 August 2024).
  137. Baidu X-Lab. AdvBox. 2018. Available online: https://github.com/advboxes/AdvBox (accessed on 14 August 2024).
  138. Zhao, M.; Zhang, L.; Ye, J.; Lu, H.; Yin, B.; Wang, X. Adversarial Training: A Survey. arXiv 2024, arXiv:2410.15042. [Google Scholar] [CrossRef]
  139. Li, S.; Ma, X.; Jiang, S.; Meng, L.; Zhang, S. Adaptive Perturbation-Driven Adversarial Training. In Proceedings of the 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 19–21 April 2024; pp. 993–997. [Google Scholar] [CrossRef]
  140. Tu, W.; Liu, X.; Hu, W.; Pan, Z. Dense-Residual Network with Adversarial Learning for Skin Lesion Segmentation. IEEE Access 2019, 7, 77037–77051. [Google Scholar] [CrossRef]
  141. Li, X.; Cui, Z.; Wu, Y.; Gu, L.; Harada, T. Estimating and Improving Fairness with Adversarial Learning. arXiv 2021, arXiv:2103.04243. [Google Scholar] [CrossRef]
  142. Zunair, H.; Ben Hamza, A. Melanoma detection using adversarial training and deep transfer learning. Phys. Med. Biol. 2020, 65, 135005. [Google Scholar] [CrossRef] [PubMed]
  143. Kanca, E.; Ayas, S.; Kablan, E.B.; Ekinci, M. Implementation of Fast Gradient Sign Adversarial Attack on Vision Transformer Model and Development of Defense Mechanism in Classification of Dermoscopy Images. In Proceedings of the 2023 31st Signal Processing and Communications Applications Conference (SIU), Istanbul, Turkey, 5–8 July 2023; pp. 1–4. [Google Scholar] [CrossRef]
  144. Finlayson, S.G.; Kohane, I.S.; Beam, A.L. Adversarial Attacks Against Medical Deep Learning Systems. arXiv 2018, arXiv:1804.05296. [Google Scholar] [CrossRef]
  145. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
  146. Arrieta, A.B.; Díaz-Rodríguez, N.; Ser, J.D.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2019, 58, 82–115. [Google Scholar] [CrossRef]
  147. Markus, A.F.; Kors, J.A.; Rijnbeek, P.R. The role of explainability in creating trustworthy artificial intelligence for health care: A comprehensive survey of the terminology, design choices, and evaluation strategies. J. Biomed. Inform. 2021, 113, 103655. [Google Scholar] [CrossRef]
  148. Nigar, N.; Umar, M.; Shahzad, M.K.; Islam, S.; Abalo, D. A Deep Learning Approach Based on Explainable Artificial Intelligence for Skin Lesion Classification. IEEE Access 2022, 10, 113715–113725. [Google Scholar] [CrossRef]
  149. Giavina-Bianchi, M.; Vitor, W.G.; Fornasiero de Paiva, V.; Okita, A.L.; Sousa, R.M.; Machado, B. Explainability agreement between dermatologists and five visual explanations techniques in deep neural networks for melanoma AI classification. Front. Med. 2023, 10, 1241484. [Google Scholar] [CrossRef] [PubMed]
  150. Jalaboi, R.; Faye, F.; Orbes-Arteaga, M.; Jørgensen, D.; Winther, O.; Galimzianova, A. DermX: An end-to-end framework for explainable automated dermatological diagnosis. Med. Image Anal. 2023, 83, 102647. [Google Scholar] [CrossRef]
  151. Jalaboi, R.; Winther, O.; Galimzianova, A. Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks. arXiv 2023, arXiv:2302.12084. [Google Scholar] [CrossRef]
  152. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity Checks for Saliency Maps. arXiv 2020, arXiv:1810.03292. [Google Scholar] [CrossRef]
  153. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2019, 128, 336–359. [Google Scholar] [CrossRef]
  154. Lundberg, S.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar] [CrossRef]
  155. Chanda, T.; Haggenmueller, S.; Bucher, T.C.; Holland-Letz, T.; Kittler, H.; Tschandl, P.; Heppt, M.V.; Berking, C.; Utikal, J.S.; Schilling, B.; et al. Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: Eye-tracking study. Nat. Commun. 2025, 16, 4739. [Google Scholar] [CrossRef]
  156. Char, D.S.; Shah, N.H.; Magnus, D. Implementing Machine Learning in Health Care—Addressing Ethical Challenges. N. Engl. J. Med. 2018, 378, 981–983. [Google Scholar] [CrossRef]
  157. Mittelstadt, B. Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 2019, 1, 501–507. [Google Scholar] [CrossRef]
  158. Gordon, E.R.; Trager, M.H.; Kontos, D.; Weng, C.; Geskin, L.J.; Dugdale, L.S.; Samie, F.H. Ethical considerations for artificial intelligence in dermatology: A scoping review. Br. J. Dermatol. 2024, 190, 789–797. [Google Scholar] [CrossRef]
  159. Goktas, P.; Grzybowski, A. Assessing the Impact of ChatGPT in Dermatology: A Comprehensive Rapid Review. J. Clin. Med. 2024, 13, 5909. [Google Scholar] [CrossRef]
  160. Adamson, A.S.; Smith, A. Machine Learning and Health Care Disparities in Dermatology. JAMA Dermatol. 2018, 154, 1247. [Google Scholar] [CrossRef]
  161. Daneshjou, R.; Smith, M.P.; Sun, M.D.; Rotemberg, V.; Zou, J. Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review. JAMA Dermatol. 2021, 157, 1362. [Google Scholar] [CrossRef]
  162. Froomkin, A.M.; Kerr, I.; Pineau, J. When AIs outperform doctors: Confronting the challenges of a tort-induced over-reliance on machine learning. Ariz. L. Rev. 2019, 61, 33. [Google Scholar]
  163. Krakowski, I.; Kim, J.; Cai, Z.R.; Daneshjou, R.; Lapins, J.; Eriksson, H.; Lykou, A.; Linos, E. Human-AI interaction in skin cancer diagnosis: A systematic review and meta-analysis. Npj Digit. Med. 2024, 7, 78. [Google Scholar] [CrossRef] [PubMed]
  164. von Itzstein, M.S.; Hullings, M.; Mayo, H.; Beg, M.S.; Williams, E.L.; Gerber, D.E. Application of Information Technology to Clinical Trial Evaluation and Enrollment: A Review. JAMA Oncol. 2021, 7, 1559. [Google Scholar] [CrossRef] [PubMed]
  165. Xu, Y.; Su, G.H.; Ma, D.; Xiao, Y.; Shao, Z.M.; Jiang, Y.Z. Technological advances in cancer immunity: From immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct. Target. Ther. 2021, 6, 312. [Google Scholar] [CrossRef] [PubMed]
  166. Barisoni, L.; Lafata, K.J.; Hewitt, S.M.; Madabhushi, A.; Balis, U.G.J. Digital pathology and computational image analysis in nephropathology. Nat. Rev. Nephrol. 2020, 16, 669–685. [Google Scholar] [CrossRef]
  167. Ho, D.; Quake, S.R.; McCabe, E.R.; Chng, W.J.; Chow, E.K.; Ding, X.; Gelb, B.D.; Ginsburg, G.S.; Hassenstab, J.; Ho, C.M.; et al. Enabling Technologies for Personalized and Precision Medicine. Trends Biotechnol. 2020, 38, 497–518. [Google Scholar] [CrossRef]
  168. Li, Z.; Koban, K.C.; Schenck, T.L.; Giunta, R.E.; Li, Q.; Sun, Y. Artificial Intelligence in Dermatology Image Analysis: Current Developments and Future Trends. J. Clin. Med. 2022, 11, 6826. [Google Scholar] [CrossRef]
  169. Khozeimeh, F.; Alizadehsani, R.; Roshanzamir, M.; Khosravi, A.; Layegh, P.; Nahavandi, S. An expert system for selecting wart treatment method. Comput. Biol. Med. 2017, 81, 167–175. [Google Scholar] [CrossRef] [PubMed]
  170. Han, S.S.; Park, I.; Eun Chang, S.; Lim, W.; Kim, M.S.; Park, G.H.; Chae, J.B.; Huh, C.H.; Na, J.I. Augmented Intelligence Dermatology: Deep Neural Networks Empower Medical Professionals in Diagnosing Skin Cancer and Predicting Treatment Options for 134 Skin Disorders. J. Investig. Dermatol. 2020, 140, 1753–1761. [Google Scholar] [CrossRef]
  171. Avram, M.R.; Watkins, S.A. Robotic Follicular Unit Extraction in Hair Transplantation. Dermatol. Surg. 2014, 40, 1319–1327. [Google Scholar] [CrossRef]
  172. Cianci, S.; Arcieri, M.; Vizzielli, G.; Martinelli, C.; Granese, R.; La Verde, M.; Fagotti, A.; Fanfani, F.; Scambia, G.; Ercoli, A. Robotic Pelvic Exenteration for Gynecologic Malignancies, Anatomic Landmarks, and Surgical Steps: A Systematic Review. Front. Surg. 2021, 8, 79015. [Google Scholar] [CrossRef]
  173. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef] [PubMed]
  174. Haenssle, H.; Fink, C.; Schneiderbauer, R.; Toberer, F.; Buhl, T.; Blum, A.; Kalloo, A.; Hassen, A.B.H.; Thomas, L.; Enk, A.; et al. Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 2018, 29, 1836–1842. [Google Scholar] [CrossRef]
  175. Marchetti, M.A.; Codella, N.C.; Dusza, S.W.; Gutman, D.A.; Helba, B.; Kalloo, A.; Mishra, N.; Carrera, C.; Celebi, M.E.; DeFazio, J.L.; et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J. Am. Acad. Dermatol. 2018, 78, 270–277.e1. [Google Scholar] [CrossRef] [PubMed]
  176. Tschandl, P.; Codella, N.; Akay, B.N.; Argenziano, G.; Braun, R.P.; Cabo, H.; Gutman, D.; Halpern, A.; Helba, B.; Hofmann-Wellenhof, R.; et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: An open, web-based, international, diagnostic study. Lancet Oncol. 2019, 20, 938–947. [Google Scholar] [CrossRef]
  177. Haggenmüller, S.; Maron, R.C.; Hekler, A.; Utikal, J.S.; Barata, C.; Barnhill, R.L.; Beltraminelli, H.; Berking, C.; Betz-Stablein, B.; Blum, A.; et al. Skin cancer classification via convolutional neural networks: Systematic review of studies involving human experts. Eur. J. Cancer 2021, 156, 202–216. [Google Scholar] [CrossRef]
  178. Marchetti, M.A.; Liopyris, K.; Dusza, S.W.; Codella, N.C.; Gutman, D.A.; Helba, B.; Kalloo, A.; Halpern, A.C.; Soyer, H.P.; Curiel-Lewandrowski, C.; et al. Computer algorithms show potential for improving dermatologists’ accuracy to diagnose cutaneous melanoma: Results of the International Skin Imaging Collaboration 2017. J. Am. Acad. Dermatol. 2020, 82, 622–627. [Google Scholar] [CrossRef]
  179. Maron, R.C.; Haggenmüller, S.; von Kalle, C.; Utikal, J.S.; Meier, F.; Gellrich, F.F.; Hauschild, A.; French, L.E.; Schlaak, M.; Ghoreschi, K.; et al. Robustness of convolutional neural networks in recognition of pigmented skin lesions. Eur. J. Cancer 2021, 145, 81–91. [Google Scholar] [CrossRef]
  180. Winkler, J.K.; Sies, K.; Fink, C.; Toberer, F.; Enk, A.; Deinlein, T.; Hofmann-Wellenhof, R.; Thomas, L.; Lallas, A.; Blum, A.; et al. Melanoma recognition by a deep learning convolutional neural network—Performance in different melanoma subtypes and localisations. Eur. J. Cancer 2020, 127, 21–29. [Google Scholar] [CrossRef] [PubMed]
  181. Yu, C.; Yang, S.; Kim, W.; Jung, J.; Chung, K.Y.; Lee, S.W.; Oh, B. Acral melanoma detection using a convolutional neural network for dermoscopy images. PLoS ONE 2018, 13, e0193321. [Google Scholar] [CrossRef]
  182. Lee, S.; Chu, Y.; Yoo, S.; Choi, S.; Choe, S.; Koh, S.; Chung, K.; Xing, L.; Oh, B.; Yang, S. Augmented decision-making for acral lentiginous melanoma detection using deep convolutional neural networks. J. Eur. Acad. Dermatol. Venereol. 2020, 34, 1842–1850. [Google Scholar] [CrossRef]
  183. Pham, T.C.; Luong, C.M.; Hoang, V.D.; Doucet, A. AI outperformed every dermatologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN architecture with custom mini-batch logic and loss function. Sci. Rep. 2021, 11, 17485. [Google Scholar] [CrossRef] [PubMed]
  184. Shetty, B.; Fernandes, R.; Rodrigues, A.P.; Chengoden, R.; Bhattacharya, S.; Lakshmanna, K. Skin lesion classification of dermoscopic images using machine learning and convolutional neural network. Sci. Rep. 2022, 12, 18134. [Google Scholar] [CrossRef]
  185. Olayah, F.; Senan, E.M.; Ahmed, I.A.; Awaji, B. AI Techniques of Dermoscopy Image Analysis for the Early Detection of Skin Lesions Based on Combined CNN Features. Diagnostics 2023, 13, 1314. [Google Scholar] [CrossRef]
  186. De Guzman, L.C.; Maglaque, R.P.C.; Torres, V.M.B.; Zapido, S.P.A.; Cordel, M.O. Design and Evaluation of a Multi-model, Multi-level Artificial Neural Network for Eczema Skin Lesion Detection. In Proceedings of the 2015 3rd International Conference on Artificial Intelligence, Modelling and Simulation (AIMS), Kota Kinabalu, Malaysia, 2–4 December 2015; pp. 42–47. [Google Scholar] [CrossRef]
  187. Ngoo, A.; Finnane, A.; McMeniman, E.; Soyer, H.P.; Janda, M. Fighting Melanoma with Smartphones: A Snapshot of Where We are a Decade after App Stores Opened Their Doors. Int. J. Med. Inform. 2018, 118, 99–112. [Google Scholar] [CrossRef] [PubMed]
  188. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  189. Ech-Cherif, A.; Misbhauddin, M.; Ech-Cherif, M. Deep Neural Network Based Mobile Dermoscopy Application for Triaging Skin Cancer Detection. In Proceedings of the 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, 1–3 May 2019; pp. 1–6. [Google Scholar] [CrossRef]
  190. Sae-Lim, W.; Wettayaprasit, W.; Aiyarak, P. Convolutional Neural Networks Using MobileNet for Skin Lesion Classification. In Proceedings of the 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), Chonburi, Thailand, 10–12 July 2019; pp. 242–247. [Google Scholar] [CrossRef]
  191. Velasco, J. A Smartphone-Based Skin Disease Classification Using MobileNet CNN. Int. J. Adv. Trends Comput. Sci. Eng. 2019, 8, 2632–2637. [Google Scholar] [CrossRef]
  192. SkinVision. Skin Cancer Melanoma Detection App. Available online: https://www.skinvision.com (accessed on 8 November 2024).
  193. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. arXiv 2016, arXiv:1611.07004. [Google Scholar] [CrossRef]
  194. Maier, T.; Kulichova, D.; Schotten, K.; Astrid, R.; Ruzicka, T.; Berking, C.; Udrea, A. Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological result. J. Eur. Acad. Dermatol. Venereol. 2014, 29, 663–667. [Google Scholar] [CrossRef]
  195. Thissen, M.; Udrea, A.; Hacking, M.; von Braunmuehl, T.; Ruzicka, T. mHealth App for Risk Assessment of Pigmented and Nonpigmented Skin Lesions—A Study on Sensitivity and Specificity in Detecting Malignancy. Telemed. E-Health 2017, 23, 948–954. [Google Scholar] [CrossRef]
  196. Sun, M.; Kentley, J.; Mehta, P.; Dusza, S.; Halpern, A.; Rotemberg, V. Accuracy of commercially available smartphone applications for the detection of melanoma. Br. J. Dermatol. 2022, 186, 744–746. [Google Scholar] [CrossRef]
  197. Koka, S.S.A.; Burkhart, C.G. Artificial Intelligence in Dermatology: Current Uses, Shortfalls, and Potential Opportunities for Further Implementation in Diagnostics and Care. Open Dermatol. J. 2023, 17, e187437222304140. [Google Scholar] [CrossRef]
  198. Kania, B.; Montecinos, K.; Goldberg, D.J. Artificial intelligence in cosmetic dermatology. J. Cosmet. Dermatol. 2024, 23, 3305–3311. [Google Scholar] [CrossRef]
  199. VisualDx. Aysa. 2025. Available online: https://www.visualdx.com/blog/visualdx-launches-aysa-for-consumers-to-check-skin-conditions-using-ai/ (accessed on 17 May 2025).
  200. Marri, S.S.; Albadri, W.; Hyder, M.S.; Janagond, A.B.; Inamadar, A.C. Efficacy of an Artificial Intelligence App (Aysa) in Dermatological Diagnosis: Cross-Sectional Analysis. JMIR Dermatol. 2024, 7, e48811. [Google Scholar] [CrossRef] [PubMed]
  201. Yan, S.; Yu, Z.; Primiero, C.; Vico-Alonso, C.; Wang, Z.; Yang, L.; Tschandl, P.; Hu, M.; Ju, L.; Tan, G.; et al. A Multimodal Vision Foundation Model for Clinical Dermatology. arXiv 2024, arXiv:2410.15038. [Google Scholar] [CrossRef] [PubMed]
  202. Mehta, D.; Primiero, C.; Betz-Stablein, B.; Nguyen, T.D.; Gal, Y.; Bowling, A.; Haskett, M.; Sashindranath, M.; Bonnington, P.; Mar, V.; et al. Multi-task AI models in dermatology: Overcoming critical clinical translation challenges for enhanced skin lesion diagnosis. J. Eur. Acad. Dermatol. Venereol. 2025. [Google Scholar] [CrossRef] [PubMed]
  203. Mevorach, L.; Farcomeni, A.; Pellacani, G.; Cantisani, C. A Comparison of Skin Lesions’ Diagnoses Between AI-Based Image Classification, an Expert Dermatologist, and a Non-Expert. Diagnostics 2025, 15, 1115. [Google Scholar] [CrossRef]
  204. Riaz, S.; Naeem, A.; Malik, H.; Naqvi, R.A.; Loh, W.K. Federated and Transfer Learning Methods for the Classification of Melanoma and Nonmelanoma Skin Cancers: A Prospective Study. Sensors 2023, 23, 8457. [Google Scholar] [CrossRef]
  205. Haggenmüller, S.; Schmitt, M.; Krieghoff-Henning, E.; Hekler, A.; Maron, R.C.; Wies, C.; Utikal, J.S.; Meier, F.; Hobelsberger, S.; Gellrich, F.F.; et al. Federated Learning for Decentralized Artificial Intelligence in Melanoma Diagnostics. JAMA Dermatol. 2024, 160, 303–311. [Google Scholar] [CrossRef]
  206. Cheslerean-Boghiu, T.; Fleischmann, M.E.; Willem, T.; Lasser, T. Transformer-based interpretable multi-modal data fusion for skin lesion classification. arXiv 2023, arXiv:2304.14505. [Google Scholar] [CrossRef]
  207. Ou, C.; Zhou, S.; Yang, R.; Jiang, W.; He, H.; Gan, W.; Chen, W.; Qin, X.; Luo, W.; Pi, X.; et al. A deep learning based multimodal fusion model for skin lesion diagnosis using smartphone collected clinical images and metadata. Front. Surg. 2022, 9, 1029991. [Google Scholar] [CrossRef]
  208. Qin, Z.; Liu, Z.; Zhu, P.; Xue, Y. A GAN-based image synthesis method for skin lesion classification. Comput. Methods Programs Biomed. 2020, 195, 105568. [Google Scholar] [CrossRef] [PubMed]
  209. Shah, S.A.H.; Shah, S.T.H.; Khaled, R.; Buccoliero, A.; Shah, S.B.H.; Di Terlizzi, A.; Di Benedetto, G.; Deriu, M.A. Explainable AI-Based Skin Cancer Detection Using CNN, Particle Swarm Optimization and Machine Learning. J. Imaging 2024, 10, 332. [Google Scholar] [CrossRef] [PubMed]
  210. Mohsen, F.; Ali, H.; El Hajj, N.; Shah, Z. Artificial intelligence-based methods for fusion of electronic health records and imaging data. Sci. Rep. 2022, 12, 17981. [Google Scholar] [CrossRef] [PubMed]
  211. Heinlein, L.; Maron, R.C.; Hekler, A.; Haggenmüller, S.; Wies, C.; Utikal, J.S.; Meier, F.; Hobelsberger, S.; Gellrich, F.F.; Sergon, M.; et al. Prospective multicenter study using artificial intelligence to improve dermoscopic melanoma diagnosis in patient care. Commun. Med. 2024, 4, 177. [Google Scholar] [CrossRef] [PubMed]
  212. Marchetti, M.A.; Cowen, E.A.; Kurtansky, N.R.; Weber, J.; Dauscher, M.; DeFazio, J.; Deng, L.; Dusza, S.W.; Haliasos, H.; Halpern, A.C.; et al. Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study). Npj Digit. Med. 2023, 6, 127. [Google Scholar] [CrossRef]
  213. Abbas, Q.; Daadaa, Y.; Rashid, U.; Ibrahim, M. Assist-Dermo: A Lightweight Separable Vision Transformer Model for Multiclass Skin Lesion Classification. Diagnostics 2023, 13, 2531. [Google Scholar] [CrossRef]
  214. Tahir, M.; Naeem, A.; Malik, H.; Tanveer, J.; Naqvi, R.A.; Lee, S.W. DSCC_Net: Multi-Classification Deep Learning Models for Diagnosing of Skin Cancer Using Dermoscopic Images. Cancers 2023, 15, 2179. [Google Scholar] [CrossRef]
  215. Ayas, S. Multiclass skin lesion classification in dermoscopic images using swin transformer model. Neural Comput. Appl. 2022, 35, 6713–6722. [Google Scholar] [CrossRef]
  216. Kassem, M.A.; Hosny, K.M.; Fouad, M.M. Skin Lesions Classification Into Eight Classes for ISIC 2019 Using Deep Convolutional Neural Network and Transfer Learning. IEEE Access 2020, 8, 114822–114832. [Google Scholar] [CrossRef]
  217. Liu, X.; Yu, Z.; Tan, L.; Yan, Y.; Shi, G. Enhancing Skin Lesion Diagnosis with Ensemble Learning. arXiv 2024, arXiv:2409.04381. [Google Scholar] [CrossRef]
  218. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  219. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842. [Google Scholar] [CrossRef]
  220. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2018, arXiv:1801.04381. [Google Scholar] [CrossRef]
  221. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
  222. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar] [CrossRef]
  223. Ray, A.; Sarkar, S.; Schwenker, F.; Sarkar, R. Decoding skin cancer classification: Perspectives, insights, and advances through researchers’ lens. Sci. Rep. 2024, 14, 30542. [Google Scholar] [CrossRef] [PubMed]
  224. Hernández-Pérez, C.; Combalia, M.; Podlipnik, S.; Codella, N.C.F.; Rotemberg, V.; Halpern, A.C.; Reiter, O.; Carrera, C.; Barreiro, A.; Helba, B.; et al. BCN20000: Dermoscopic Lesions in the Wild. Sci. Data 2024, 11, 641. [Google Scholar] [CrossRef]
  225. Cassidy, B.; Kendrick, C.; Brodzicki, A.; Jaworek-Korjakowska, J.; Yap, M.H. Analysis of the ISIC image datasets: Usage, benchmarks and recommendations. Med. Image Anal. 2022, 75, 102305. [Google Scholar] [CrossRef]
  226. Alipour, N.; Burke, T.; Courtney, J. Skin Type Diversity in Skin Lesion Datasets: A Review. Curr. Dermatol. Rep. 2024, 13, 198–210. [Google Scholar] [CrossRef]
  227. Gomolin, A.; Netchiporouk, E.; Gniadecki, R.; Litvinov, I.V. Artificial Intelligence Applications in Dermatology: Where Do We Stand? Front. Med. 2020, 7, 100. [Google Scholar] [CrossRef]
  228. Patel, S.; Wang, J.V.; Motaparthi, K.; Lee, J.B. Artificial intelligence in dermatology for the clinician. Clin. Dermatol. 2021, 39, 667–672. [Google Scholar] [CrossRef] [PubMed]
  229. Jeong, H.K.; Park, C.; Henao, R.; Kheterpal, M. Deep Learning in Dermatology: A Systematic Review of Current Approaches, Outcomes, and Limitations. JID Innov. 2023, 3, 100150. [Google Scholar] [CrossRef] [PubMed]
  230. Biswas, S.; Achar, U.; Hakim, B.; Achar, A. Artificial Intelligence in Dermatology: A Systematized Review. Int. J. Dermatol. Venereol. 2024, 8, 33–39. [Google Scholar] [CrossRef]
  231. Martínez-Vargas, E.; Mora-Jiménez, J.; Arguedas-Chacón, S.; Hernández-López, J.; Zavaleta-Monestel, E. The Emerging Role of Artificial Intelligence in Dermatology: A Systematic Review of Its Clinical Applications. Dermato 2025, 5, 9. [Google Scholar] [CrossRef]
  232. Nahm, W.J.; Sohail, N.; Burshtein, J.; Goldust, M.; Tsoukas, M. Artificial Intelligence in Dermatology: A Comprehensive Review of Approved Applications, Clinical Implementation, and Future Directions. Int. J. Dermatol. 2025. Online ahead of print. [Google Scholar] [CrossRef]
Figure 1. Example of object detection in a skin lesion photograph. Source: HAM10000 dataset [42,43].
Figure 2. Example of object segmentation in a skin lesion photograph. Source: HAM10000 dataset [42,43].
Figure 3. Example of a histology photograph. Source: CryoNuSeg dataset [75].
Figure 4. General flowchart of the automated decision-making process. Source: own work and HAM10000 dataset [42,43].
Figure 5. Layers of the dermatoscopic photograph in colour models: RGB—(a–c), HSL—(d–f), L*a*b*—(g–i). (a) Layer representing the red colour. (b) Layer representing the green colour. (c) Layer representing the blue colour. (d) Layer representing the hue. (e) Layer representing the saturation. (f) Layer representing the lightness. (g) Layer representing the lightness. (h) Layer representing the green–red opponent colours. (i) Layer representing the blue–yellow opponent colours. Source: own work and HAM10000 dataset [42,43].
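To make the decomposition in Figure 5 concrete, the following minimal Python sketch (OpenCV assumed; the input filename is hypothetical) extracts the same channel layers from a dermatoscopic photograph:

```python
import cv2

bgr = cv2.imread("lesion.jpg")              # hypothetical dermatoscopic photograph; OpenCV loads BGR
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

r, g, b = cv2.split(rgb)                    # panels (a-c): red, green, blue layers

hls = cv2.cvtColor(rgb, cv2.COLOR_RGB2HLS)  # OpenCV stores HSL channels in H, L, S order
h, l, s = cv2.split(hls)                    # panels (d-f): hue (h), saturation (s), lightness (l)

lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB)  # CIE L*a*b*
L, a, b_opp = cv2.split(lab)                # panels (g-i): lightness, green-red, blue-yellow opponents
```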
Figure 6. Diagram showing the ROC curve. The AUC is the area under the ROC curve.
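For illustration, a minimal scikit-learn sketch (toy labels and scores, not study data) that computes the ROC curve and its AUC as depicted in Figure 6:

```python
from sklearn.metrics import roc_curve, auc

y_true = [0, 0, 1, 1, 0, 1]                      # toy ground truth: 1 = malignant, 0 = benign
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]   # toy classifier probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # operating points of the ROC curve
roc_auc = auc(fpr, tpr)                            # area under the ROC curve
print(f"AUC = {roc_auc:.2f}")
```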
Figure 7. The PRISMA flowchart.
Figure 8. The left subfigure simulates a skin image with a dark birthmark (mole) in the centre. The middle subfigure shows the noise that is added to the left image. The right subfigure illustrates an adversarial attack: the same image with slight added noise, which is imperceptible to the eye but can change the classification produced by machine learning models.
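The perturbation illustrated in Figure 8 can be generated, for instance, with the fast gradient sign method (FGSM) [143]. The following minimal PyTorch sketch uses a one-layer stand-in classifier and a random "skin image"; it is illustrative only, not an attack on any deployed system:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))  # stand-in lesion classifier
loss_fn = nn.CrossEntropyLoss()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # simulated skin image in [0, 1]
label = torch.tensor([1])                                # toy target class

loss_fn(model(image), label).backward()                  # gradient of the loss w.r.t. the pixels

epsilon = 0.01                                           # small, visually imperceptible budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)
```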
Figure 9. The flowchart for the best architecture selection.
Figure 10. A flowchart of the best practices for selecting a method and procedures for skin image analysis.
Table 1. Some examples of supervised, unsupervised, and reinforcement learning in dermatology.
Type of ML | Examples of Solutions
Supervised
  • Background selection schema for the deep learning-based classification of dermatological diseases [22]
  • Skin disease detection using deep learning [23]
Unsupervised
  • Epidermal necrolysis: characterisation of different phenotypes using unsupervised clustering analysis [24]
  • Suspicious skin lesion detection in wide-field body images using deep learning outlier detection [25]
Reinforcement
  • A reinforcement learning model for AI-based decision support in skin cancer [26]
  • A reinforcement learning algorithm for the automated detection of skin lesions [27]
Table 2. Applications of StyleGAN2 and CycleGAN in dermatological images.
Method | Input | Key Innovation | Application in Dermatology
StyleGAN2 | Random hidden vector + mapping network | AdaIN for style control at different network levels | Generation of synthetic images of melanocytic nevi to augment the dermatoscopic set [60]
CycleGAN | Dermatoscopic images ↔ clinical images | Cycle-consistency loss | Translation of clinical images into dermatoscopic style (and vice versa) to unify domains and augment training data [61]
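As a minimal illustration of the cycle-consistency loss named in Table 2, the following PyTorch sketch uses two one-layer stand-in generators in place of the CycleGAN networks of [61]; a round trip through both generators should reproduce the input image:

```python
import torch
import torch.nn as nn

G_cd = nn.Conv2d(3, 3, kernel_size=1)   # stand-in generator: clinical -> dermatoscopic
G_dc = nn.Conv2d(3, 3, kernel_size=1)   # stand-in generator: dermatoscopic -> clinical
l1 = nn.L1Loss()

clinical = torch.rand(1, 3, 256, 256)        # toy image batches
dermatoscopic = torch.rand(1, 3, 256, 256)

# Penalise the difference between each image and its round-trip reconstruction.
cycle_loss = (l1(G_dc(G_cd(clinical)), clinical)
              + l1(G_cd(G_dc(dermatoscopic)), dermatoscopic))
```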
Table 3. Number of articles by combined tags published in each year.
Tags/Year | Artificial Intelligence AND Dermatology (PubMed/Scopus/WoS) | Machine Learning AND Dermatology (PubMed/Scopus/WoS) | Deep Learning AND Dermatology (PubMed/Scopus/WoS)
2009862053000
2010682360000
201171021112100
20126101183000
20138100131000
20142680981000
201513212121310010
20162630011183050
201741332233294233
201877411453491226579
201913550209595264415019
20202357153152137278820735
202133099701961494212530141
2022310102731911994211838538
20234081951152473296315765469
20246443092243202888514452687
Table 4. Number of articles by single tag published in each year.
Tag/Year | Artificial Intelligence (PubMed/Scopus/WoS) | Machine Learning (PubMed/Scopus/WoS) | Deep Learning (PubMed/Scopus/WoS) | Dermatology (PubMed/Scopus/WoS)
2009464011,15014206044622274210612744704010971159
2010447312,51813997184669272910716274735811451099
2011528014,7371397114846853075118186100828112931336
2012554915,5031440149849413453146239136853613091462
2013681114,740157519315723438820733122710,50415861635
2014689416,064186324267336594326567948114,09416491556
2015677318,33221483292925180443841250120016,14518491701
2016680421,6272615392112,04810,5086432872258817,98719041727
2017824021,4303602528417,05214,69113538134637019,73122841929
201811,24523,1546393833125,89623,017308517,08813,61820,79024952070
201916,42223,08211,37012,67645,28134,673561332,98723,60222,45228402494
202022,69230,02016,73118,24056,63644,872932846,75031,78927,51535803029
202131,47634,08023,89925,66471,72359,81914,52263,36845,0962961342503223
202239,13236,19030,87131,42688,91671,35119,64082,53057,79028,31743743332
202338,69547,35536,70833,865109,76674,38621,22299,45859,01427,13148913254
202450,97060,46954,47041,594130,26390,99523,11299,74269,07227,91251833788
Table 5. Comparison of the quality of selected neural network architectures in the diagnosis of skin diseases (2019–2025).
Architecture | Source | Dataset | Accuracy [%] | AUC | Sensitivity [%] | Specificity [%]
Separable Vision Transformer | [213] (2023) | PH2 + ISBI-2017 + HAM10000 + ISIC (9 classes) | 95.6 | 0.95 | 96.7 | 95.00
DSCC_Net | [214] (2023) | ISIC-2020 + HAM10000 + DermIS (4 classes) | 94.17 | 0.9943 | 93.76 | —
Swin Transformer | [215] (2022) | ISIC-2019 (8 classes) | 97.20 | — | 82.30 | 97.90
GoogleNet-TL | [216] (2020) | ISIC-2019 (8 classes) | 94.92 | — | 79.80 | 97.00
SkinNet | [217] (2024) | HAM10000 (7 classes) | 86.70 | 0.96 | — | —
Table 6. Comparison of sets of dermoscopic skin images: ISIC 2019, HAM10000, BCN20000. In BCN20000, the number of training images is given; the test set (20% of the total) contains an additional OOD class. The balanced accuracy measure denotes the arithmetic mean of the sensitivity (TPR) obtained separately for each class, which prevents the dominant class from inflating the overall score. Limitations include dataset bias (e.g., dominance of light skin phototypes) and annotation quality (e.g., lack of routine histopathological confirmation of each diagnosis).
Dataset | Number of Images | Number of Classes | Evaluation Measure | Main Limitations
ISIC (2019) | 25,331 train + 8239 test | 8 (+OOD) | Balanced accuracy | Class imbalance; limited phototype diversity; partial lack of histopathology
HAM10000 | 10,015 | 7 | Balanced accuracy | Class imbalance; no images outside the pigmented lesion range; ∼50% without histopathological confirmation
BCN20000 | 18,946 (train) | 8 (+OOD) | Balanced accuracy | Class imbalance; one site (geographic/skin bias); limited phototype data
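As a worked example of the balanced accuracy measure defined in the Table 6 caption, the following scikit-learn sketch uses toy, deliberately imbalanced labels:

```python
from sklearn.metrics import balanced_accuracy_score

y_true = ["nv", "nv", "nv", "nv", "mel", "mel", "bcc"]  # toy imbalanced ground truth
y_pred = ["nv", "nv", "nv", "nv", "nv",  "mel", "bcc"]  # toy predictions

# Per-class sensitivity: nv 4/4, mel 1/2, bcc 1/1 -> mean = 0.8333...
print(balanced_accuracy_score(y_true, y_pred))
```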
Table 7. Comparison of methods and architectures for skin lesion analysis.
Method/Architecture | Advantages | Disadvantages | Recommended Application
ResNet-50 (transfer)
  • Fast convergence due to residual blocks
  • Rich feature representations
  • Large number of parameters
  • Prone to overfitting on small datasets
  • Classification of major lesion types
VGG-16
  • Simple, transparent architecture
  • Well documented in literature
  • Very large model size
  • Slow training
  • Rapid prototyping and baseline comparisons
GoogLeNet (transfer learning)
  • Well-established CNN with efficient Inception modules
  • Effective when used with transfer learning and augmentation
  • Complex multibranch design
  • Outperformed by newer, deeper models
  • Multiclass skin lesion classification using pretrained weights (ISIC 8-class tasks)
DenseNet-121
  • Efficient gradient flow
  • Reduced overfitting risk
  • Slower training than ResNet
  • Melanoma classification (transfer learning)
EfficientNet-B0
  • Optimal balance of parameters and accuracy
  • Automatic network scaling
  • More complex hyperparameter tuning
  • GPU-limited applications
DSCC_Net
  • Very high AUC achieved on dermoscopic images
  • Class-balancing strategy improves rare lesion detection
  • Specialized architecture for skin cancer
  • Requires complex training (data rebalancing)
  • Dermoscopic skin cancer diagnosis with imbalanced data
SkinNet (ensemble learning)
  • Improves accuracy by ensembling multiple CNNs
  • Leverages complementary features of lightweight models
  • Increased training and inference complexity
  • Requires coordination of multiple models
  • Ensemble-based skin lesion classification for maximum accuracy
Swin Transformer
  • High accuracy via hierarchical self-attention
  • Captures local and global image context effectively
  • Memory- and computation-intensive
  • Requires large training dataset for best performance
  • Large-scale skin lesion image classification (state-of-the-art model)
Separable Vision Transformer
  • Lightweight ViT design optimised for dermatology
  • Achieves state-of-the-art classification accuracy
  • Requires very large training datasets to generalise
  • Still computationally demanding (albeit optimised)
  • High-precision lesion classification across diverse image sets (with sufficient data)
UNet
  • Excellent segmentation precision
  • Easy to implement
  • Requires detailed pixel-level annotation
  • Boundary segmentation of skin lesions
Attention UNet
  • Better focus on critical image regions
  • Improved segmentation quality
  • Higher computational cost
  • Segmentation in complex backgrounds and shapes
StyleGAN2 (augmentation)
  • Generates highly realistic synthetic images
  • Helps to balance class distribution
  • Difficult to stabilize training
  • Possible artefact generation
  • Augmentation of rare lesion types; training on small datasets
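As a minimal sketch of the transfer-learning setup in the ResNet-50 row of Table 7 (torchvision ≥ 0.13 assumed; the seven-class head reflects a hypothetical HAM10000-style task):

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the backbone to limit overfitting on small dermatological datasets.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new trainable head for 7 lesion classes.
model.fc = nn.Linear(model.fc.in_features, 7)
```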
Table 8. Comparison of review papers on AI in dermatology (2020–2025).
Authors | Year | Scope | Main Contributions | Limitations
Gomolin et al. [227] | 2020 | Narrative review of AI applications in dermatology | Overview of AI use in melanoma, ulcers, inflammatory dermatoses; identifies barriers to clinical use (e.g., black-box models) | Non-systematic; reflects early-2020 state of the art; no quantitative synthesis
Patel et al. [228] | 2021 | Educational overview for clinicians on AI in dermatology | Summarises diagnostic performance of AI (∼67–99% accuracy); stresses improved access and faster diagnosis | Narrative form; limited coverage of regulation, bias, and external validation
Jeong et al. [229] | 2022 | Systematic review of CNNs in dermatology imaging | Comprehensive summary of CNN-based classification; dataset comparison; discusses regulatory paths | Focuses only on CNN image models; no clinical outcomes or ethical analysis
Biswas et al. [230] | 2025 | Review of diagnostic performance of AI across skin diseases | Reports high sensitivity/accuracy for melanoma, psoriasis, acne, tinea; stresses health access benefits | Lacks technical/regulatory depth; semi-systematic review only
Martínez-Vargas et al. [231] | 2025 | Systematic review of clinical outcomes with AI in dermatology | Shows reduced missed melanoma cases (58.8% to 4.1%), improved triage; identifies heterogeneity and bias risks | Limited external validation; scope excludes technical/ethical discussions
Nahm et al. [232] | 2025 | Review of FDA/regulated AI tools in dermatology | Identifies 15 approved AI tools; highlights clinical integration in screening and education | Only covers approved systems; omits experimental and technical details
This work | 2025 | Broad review: AI methods, clinical use, ethics, regulation, validation, bibliometrics | Combines technical, legal, and ethical dimensions; bibliometric analysis 2009–2024; regulatory gaps; architecture/dataset comparison; recommendations for standardisation and explainability | Very broad scope; limited depth in specific subdomains; only English-language sources
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
