1. Introduction
Skin cancer incidence has risen in recent decades, making it a significant public health issue [1]. Among the different types of skin cancer, melanoma is one of the deadliest, accounting for 80% of deaths from skin cancer [2]. In the past decade (2012–2022), the number of new invasive melanoma cases diagnosed annually increased by 31% [3]. Early detection is vital for effective treatment: the estimated five-year survival rate for patients whose melanoma is detected early is about 99%. This rate falls to 68% when the disease reaches the lymph nodes and to 30% when it metastasizes to distant organs [3]. Given these figures, it is imperative to identify skin cancer, and melanoma in particular, as early as possible.
Pigmented skin lesions (PSLs) represent a diverse spectrum of dermatological conditions characterized by an anomaly on the skin, presenting itself as a discolored spot due to the presence of various pigments. These lesions are of significant clinical importance, as they encompass both benign entities, such as moles and freckles, and malignant conditions, such as melanoma and non-melanoma skin cancer [4,5].
A common method for diagnosing PSLs is dermoscopy, a non-invasive technique that uses a magnifying lens (a dermoscope) and liquid immersion to magnify submacroscopic structures [6]. It enhances the sensitivity of naked-eye examination in clinical practice [7]. However, an early-stage case of skin cancer may only receive an opinion from a non-specialist (e.g., a physician who is not trained in dermatology) with only a standard camera as the imaging method at hand. In such cases, an image of the lesion can be captured and sent to a dermatologist for examination. This method has proven to be as effective as in-person diagnosis while being much faster [8]. In a study by Brinker et al. [9], a group of 157 dermatologists (including 60% dermatology residents and junior physicians) performed significantly worse at detecting melanoma on dermoscopic images than on clinical (macroscopic) images of various skin lesions. Consequently, melanoma detection performance depends strongly on the imaging type.
In the past decades, computational methods have been developed to help dermatologists diagnose skin cancer early. Computerized analysis of pigmented skin lesions is a growing field of research whose main goal is to develop reliable automated tools for recognizing skin cancer from images. Studies have shown that automated systems are capable of diagnosing melanoma under experimental conditions [10]. Moreover, computer-aided diagnosis (CAD) systems have the potential to serve as a backup for specialist diagnosis, reducing the risk of missed melanomas in highly selected patient populations [11]. Machine learning (ML) has evolved considerably over the past decade due to the availability of larger image databases and improvements in computer architecture. Advances in deep neural networks have also been a critical factor in deep learning gradually supplanting conventional machine learning models for the detection of skin cancer. The conventional procedure for automated skin cancer detection involves a sequence of steps: acquiring the image data, pre-processing it, segmenting the pre-processed image, extracting the relevant features, and classifying the image based on the extracted features, as depicted in Figure 1. The final step is the evaluation of the trained classifier using proper metrics. It should be noted that the segmentation and feature extraction steps may be skipped depending on the ML method employed.
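For illustration, the conventional pipeline can be sketched end-to-end in a few lines. The example below runs on synthetic data (random arrays standing in for clinical images), and the threshold, features, and classifier are placeholder choices, not a method from any reviewed paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# 1. Acquisition: stand-in for loading N grayscale 32x32 clinical images.
images = rng.random((200, 32, 32))
labels = rng.integers(0, 2, 200)          # 0 = benign, 1 = malignant (synthetic)

# 2. Pre-processing: per-image intensity centering.
images = images - images.mean(axis=(1, 2), keepdims=True)

# 3. Segmentation: crude global threshold producing a lesion mask.
masks = images > 0

# 4. Feature extraction: toy descriptors (mask area fraction, intensity spread).
features = np.stack([masks.mean(axis=(1, 2)),
                     images.std(axis=(1, 2))], axis=1)

# 5. Classification and evaluation on a held-out test set.
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3,
                                          random_state=0)
clf = SVC().fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Because the labels here are random, the reported accuracy is meaningless; only the sequence of steps is the point of the sketch.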
Motivation and Contribution
Melanoma is one of the most fatal cancers. It has a high death rate and is typically detected at advanced stages, but research from the previous decade indicates that early diagnosis of malignant lesions can drastically lower patient mortality and improve survival rates. Many researchers have applied different imaging methods and artificial intelligence techniques to detect and diagnose these malignancies. To address the issues related to the identification and diagnosis of these cancers, researchers have also put forth a number of novel and modified conventional methodologies.
However, while we found numerous reviews on the identification of skin cancer using artificial intelligence, we were unable to locate a thorough analysis of skin cancer diagnosis using clinical (macroscopic) images and machine learning methods. Furthermore, we could not find any work that presented all the available clinical datasets in a comprehensive manner. We compare the presented survey with other recent reviews in Table 1. The comparison is based on criteria such as year scope, imaging modality type, and coverage, as well as the major tasks of the automated skin cancer detection pipeline depicted in Figure 1. Furthermore, we assessed whether the papers elaborated on technical details or focused solely on results, and examined whether all available clinical datasets had been reviewed thoroughly.
Stiff et al. [12] comprehensively reviewed the literature on clinical images, with a primary focus on the application of convolutional neural networks (CNNs) for image classification. However, their review lacked a complete overview of the available clinical datasets. In contrast, Wen et al. [13] provided a thorough survey of all available datasets, encompassing various image modalities, including clinical data; nevertheless, their work focused only on the datasets themselves. Manhas et al. [14] conducted a general review of work on automated skin cancer detection, but their research did not summarize the available clinical datasets and instead focused mostly on the challenges researchers face in the field. Similarly, Bhatt et al. [15] and Jones and Rehg [16] conducted general surveys of state-of-the-art techniques for skin cancer detection, with a main emphasis on machine learning methods for classification; consequently, they did not present a detailed review of work done on clinical datasets. Haggenmüller et al. [17] concentrated solely on melanoma detection using convolutional neural networks, and their review did not include a comprehensive evaluation of segmentation and feature extraction tasks. Finally, Dildar et al. [18] mainly centered their survey on neural network-based skin cancer detection studies.
To analyze the reviewed papers in terms of the pipelines they implemented and the results they reported, we present the most significant research articles that have used clinical image data for the detection, segmentation, and classification of skin cancers. The contribution of this study is a critical and thorough analysis of several artificial intelligence techniques and their use in the diagnosis of cancerous skin lesions from clinical data, based on performance evaluation metrics. Our objective is to provide a review article that serves as a comprehensive reference for researchers interested in acquiring knowledge and conducting systematic skin cancer detection using clinical skin images.
In this paper, we review the research on automatic skin cancer detection approaches using machine learning in the past decade, with a special focus on clinical images. It is important to acknowledge that melanoma, being the most aggressive form of skin cancer, was the primary focus of the majority of the papers we reviewed; consequently, greater emphasis is given to the diagnosis and detection of melanoma throughout our article. In the following section, we briefly describe skin cancer types and their causes, as well as the different imaging modalities used to monitor skin lesions. Section 3 presents a detailed discussion of the search scope used for selecting the papers, followed by the various datasets used in the state-of-the-art papers. In Section 4, we review the selected articles in terms of pre-processing, image segmentation, feature extraction, change detection, and other diagnostic and classification methods. Lastly, Section 5 highlights the key findings of this review.
3. Search Criteria
This section describes how the scope of the reviewed articles was identified. The scope is determined by the imaging technique, the machine learning models employed, the pre-processing procedure, the segmentation techniques utilized, the features extracted, and the performance evaluation metrics.
In this review paper, we aim to select different studies on skin cancer and its diagnosis using machine learning techniques. The publications in this review were acquired from the following databases: IEEE Xplore, ScienceDirect (Elsevier), Springer Link, PubMed, arXiv, and Google Scholar.
Regarding the methodology, we began our search using relevant keywords such as skin cancer, automated skin cancer detection, artificial intelligence in dermatology, melanoma detection, skin cancer detection with machine learning, and machine learning. We gathered articles from journals, conferences, and books focused on automated skin lesion detection tasks from 2011 to 2022, excluding earlier publications. From the papers published in 2022, only those available by the time of compilation of this paper are included. We also removed articles that used datasets that were not entirely clinical, and we excluded articles that failed to produce acceptable results compared to other research in the same scope, i.e., papers that neither improved the outcomes of prior research nor introduced novel methodologies. Following a thorough examination, only 51 papers were chosen based on our research criteria, as illustrated in Figure 3.
5. Conclusions and Discussion
Computational methods for automated PSL detection can greatly assist dermatologists in the early diagnosis of skin cancer. Among these methods, machine learning has proven particularly effective in helping general practitioners spot high-risk moles when standard, off-the-shelf cameras are used. In this paper, we reviewed 51 studies that attempted to detect melanoma using machine learning methods over the past decade, with a focus on clinical datasets, i.e., datasets of standard camera images, as opposed to those acquired with more specific tools (dermoscopes) for evaluating suspicious lesions.
Firstly, all the clinical datasets used by the authors were presented and analyzed. The majority of the clinical datasets in the reviewed state-of-the-art papers were unbalanced, relatively small, or unavailable for public use. This can negatively affect the performance of PSL classifiers, since all datasets contain a small percentage of melanomas and numerous benign lesions, while a large number of articles use plain accuracy as their quantitative performance metric.
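The problem with plain accuracy on such unbalanced data is easy to demonstrate. In the hypothetical example below, a degenerate classifier that labels every lesion "benign" scores 95% accuracy on a 95:5 benign-to-melanoma test set while detecting no melanomas at all, which class-aware metrics such as sensitivity and balanced accuracy expose:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             recall_score)

# Hypothetical test set mimicking a typical unbalanced clinical dataset:
# 95 benign lesions (0) and only 5 melanomas (1).
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate classifier that always predicts "benign".
y_pred = np.zeros_like(y_true)

print("accuracy:         ", accuracy_score(y_true, y_pred))           # 0.95
print("sensitivity:      ", recall_score(y_true, y_pred))             # 0.0
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.5
```

This is why metrics such as sensitivity, specificity, and balanced accuracy are more informative than raw accuracy for melanoma classifiers.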
In addition, when describing their experiments, most papers did not divide the dataset into further subsets or did not provide any information about their test sets (i.e., whether they differed from the validation set). Surprisingly, in many cases, all of the data are used for training, and the same data are then used to produce an accuracy estimate that is assumed to reflect the performance of the system. This is poor machine learning practice and does not measure the real performance of the model on data other than that on which it was trained. To prevent over-fitting, good research practice should include separate training, validation, and test sets; this is essential for understanding how well the ML model generalizes. Every classifier tends to memorize the training set, all the more so when the amount of training data is small. For this reason, it is important to assess how the classifier generalizes to new data, which is not possible with only a training set and a validation set: every time a researcher tunes the classifier's parameters to improve its performance (hyper-parameter tuning), they provide the classifier with information about the validation set, so after several experiments the validation data effectively bleed into the training data. A possible remedy is to set aside additional annotated data (a test set, in addition to the validation set) that is hidden during the training process and never examined until a final decision has been made about the tuning of the classifier; the test set can then be used to measure the actual error and the real performance of the model. We therefore conclude that this flaw in the reviewed articles may distort performance comparisons between different models.
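A protocol along these lines can be implemented with two successive stratified splits, for example with scikit-learn (the array sizes and split ratios below are arbitrary placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 16))     # placeholder feature vectors
y = rng.integers(0, 2, 1000)   # placeholder labels

# First carve off a test set that stays untouched until the very end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Then split the remainder into training and validation sets;
# the validation set drives hyper-parameter tuning, the test set does not.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 600 200 200
```

Stratifying both splits keeps the melanoma-to-benign ratio consistent across the three subsets, which matters precisely because clinical datasets are unbalanced.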
Secondly, we reviewed the implementation of automated skin lesion detection step by step and explained each subprocess in detail. The first step in building an automated machine learning model is pre-processing the images. We divided the pre-processing approaches used in the reviewed papers into four categories: illumination correction, artifact removal, image cropping, and data augmentation. We observed that artifact removal approaches were not effective in all cases. We have argued that artifact removal is not absolutely necessary and can be avoided in some cases, depending on the nature of the dataset; in other cases, an artifact removal method can benefit the overall performance of the model. However, the available artifact removal approaches still have flaws and should therefore be used with caution. Illumination correction and image cropping can be applied where needed, while data augmentation is essential when dealing with small and unbalanced datasets.
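As a simple illustration of data augmentation, flips and 90-degree rotations alone turn one lesion image into six training samples. This is a minimal sketch; real augmentation pipelines typically add random cropping, scaling, and color jitter as well:

```python
import numpy as np

def augment(image):
    """Generate simple geometric variants of one lesion image:
    the original, a horizontal flip, a vertical flip, and the
    three non-trivial 90-degree rotations."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants

# A single 64x64 synthetic "lesion" image expands to 6 training samples.
img = np.random.default_rng(0).random((64, 64))
augmented = augment(img)
print(len(augmented))  # 6
```

Geometric transforms are attractive for skin lesions because a lesion's diagnosis is invariant to its orientation in the frame, so the augmented labels remain valid.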
Segmentation is often the second step in developing automated computational systems for diagnosing skin lesions, and it is also one of the most challenging parts of the process. Among the segmentation methods applied in the state-of-the-art articles, Otsu's threshold-based method and the K-means (with K = 2) clustering algorithm were the most popular. Some authors proposed new segmentation approaches, mostly based on pre-existing segmentation methods; however, since most of them did not report a trustworthy evaluation metric, we were not able to provide a quantitative comparison.
In the third step, skin lesion features can be extracted (from the raw images or segmented regions) to obtain information for classification. The reviewed papers extracted features either manually (based on the ABCD rule plus texture criteria) or automatically (using CNNs). The authors extracted various combinations of hand-crafted attributes such as asymmetry, border, color, and texture features. Most of the reviewed articles extracted feature descriptors manually based on the ABCD rule; however, hand-crafted features usually require a feature selection and normalization step to improve the performance of the model. Papers that applied CNNs, on the other hand, leave feature extraction to be performed automatically by the network. In recent years, CNNs have grown in popularity as a means of automated feature extraction, and as can be seen in Section 4.4, papers that used automatic methods demonstrated very good performance. Automated features also take less time and effort to obtain, which makes them more convenient to use. Moreover, for skin lesion diagnosis, the skin area surrounding the lesion can provide further information about the type of mole; in CNNs, these skin features are automatically taken into account, while hand-crafted features are usually extracted from segmented masks without considering the tissue around the lesion.
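To make the hand-crafted route concrete, the toy sketch below computes three simplified descriptors loosely inspired by the ABCD rule: mirror-based asymmetry, border compactness, and color variegation. These formulas are illustrative simplifications of our own, not the clinical ABCD scoring scheme or a method from any reviewed paper:

```python
import numpy as np

def abcd_features(gray, mask):
    """Toy descriptors loosely inspired by the ABCD rule (a hypothetical
    simplification, not a clinical implementation)."""
    # A -- asymmetry: mismatch between the mask and its left-right mirror.
    asymmetry = np.logical_xor(mask, np.fliplr(mask)).mean()
    # B -- border irregularity: compactness perimeter^2 / (4*pi*area),
    # roughly 1.0 for a circle and larger for ragged borders.
    interior = (mask & np.roll(mask, 1, 0) & np.roll(mask, -1, 0)
                     & np.roll(mask, 1, 1) & np.roll(mask, -1, 1))
    perimeter = (mask & ~interior).sum()
    border = perimeter ** 2 / (4 * np.pi * mask.sum())
    # C -- color variegation: intensity spread inside the lesion.
    color_var = gray[mask].std()
    return {"asymmetry": asymmetry, "border": border, "color_var": color_var}

# Synthetic example: a dark, mottled disk (lesion) on uniform bright skin.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:101, :101]
mask = (yy - 50) ** 2 + (xx - 50) ** 2 < 400
gray = np.where(mask, 0.3 + 0.05 * rng.random(mask.shape), 0.8)
feats = abcd_features(gray, mask)
print(feats)
```

On this symmetric circular lesion the asymmetry score is zero and the compactness is close to 1; an irregular melanoma-like mask would push both values up, which is exactly the intuition the ABCD rule encodes.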
The last step in developing an automated PSL diagnosis system is classification. CNNs and SVMs were the most commonly used classifiers in the reviewed papers and achieved better results than the other methods. Papers that extracted features manually trained SVMs to classify the lesions based on their hand-crafted features, while the other studies applied CNNs directly to the dataset images. During the past few years, CNNs have often been preferred over SVMs, owing to their ease of use and their precision in learning features from the data, and the reviewed state-of-the-art articles that trained CNNs for classification showed slightly better performance than other methods. Since deep models are currently progressing rapidly, more trustworthy models with better performance can be expected in the future. With the rapid appearance of skin lesion databases, we also expect to see more deep models with multi-class classification abilities, providing accurate risk scores and lesion assessments for different types of skin cancer. However, we must mention that CNNs only perform well when trained on a corpus of images large enough to yield sufficient samples for all classes. Because the number of melanoma samples is usually limited, researchers working on melanoma classification may still prefer SVMs with hand-crafted features over CNNs, since they generalize better with limited data.
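A typical hand-crafted-features-plus-SVM pipeline, including the normalization step discussed above, might look as follows. The feature vectors here are synthetic stand-ins (shifted Gaussians, not real lesion descriptors), and `class_weight="balanced"` is one common way to mitigate class imbalance:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical hand-crafted feature vectors (e.g., asymmetry, border,
# color, texture descriptors) for 120 benign and 30 malignant lesions;
# the class means are shifted so the toy problem is separable.
X = np.vstack([rng.normal(0.0, 1.0, (120, 6)),
               rng.normal(1.5, 1.0, (30, 6))])
y = np.array([0] * 120 + [1] * 30)

# Feature scaling matters for SVMs, so normalize before the RBF kernel;
# class_weight="balanced" reweights the minority (melanoma) class.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", class_weight="balanced"))
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print("balanced accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```

Cross-validated balanced accuracy is reported instead of plain accuracy for the reasons discussed earlier in this section.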
In dermatological examinations, skin lesions are usually evaluated in comparison with neighboring lesions to determine whether further examination or biopsy is necessary. Therefore, to decide whether a lesion is malignant or benign, it is important to examine the patient's other lesions as well. In addition, malignant lesions often grow and change over time, so tracking lesion changes is also a crucial index in diagnosing PSLs. However, very little work has been conducted on detecting PSLs using clinical regional images.
Computer-aided diagnosis systems for skin lesions have improved noticeably during the past decade. With the progress of deep neural networks and the appearance of large dermoscopic datasets, CAD systems are now able to diagnose PSLs with high reliability. However, these models are still not capable of replacing professional dermatologists because, first of all, they do not cover all lesion diagnosis criteria, and secondly, there are still some limitations when it comes to imaging the lesions.
As we discussed in Section 4.5, malignant lesions usually grow and evolve over time. Therefore, dermatologists track suspicious lesions over time through regular examinations. To our knowledge, there is no reliable work on the automatic diagnosis of skin cancer that takes change detection into account. Moreover, a change in a lesion may be too small to be detected by the naked eye, so an automated change detection system could also support dermatologists in the early detection of skin cancer and melanoma.
Another important step in skin lesion diagnosis is full-body examination. The suspiciousness of a lesion can be ranked relative to the other lesions present on the patient's body: a lesion may be considered malignant in one patient and benign in another based on the overall type and nature of the patient's other lesions. Currently, the majority of available CAD systems are trained on single-lesion images, and to date there is no publicly available dataset that contains wide-field images. Having such datasets at hand could result in further progress in automated PSL diagnosis.
One of the most important limitations of skin lesion imaging is the presence of hair and other artifacts on the lesions or the surrounding skin. As we reviewed in Section 4.2, there are currently no pre-processing methods that can remove these artifacts effectively; as a result, intelligent classifiers are still unable to diagnose such images properly. Another limitation of work on clinical data is the lack of public datasets with a sufficient number of images and enough class diversity to train a reliable classifier able to diagnose and differentiate all types of skin cancer.
We believe that overcoming the obstacles mentioned above would bring great progress to the field of automated PSL diagnosis and to the development of smart devices for the early detection of melanoma.