Article

Two-Stage Input-Space Image Augmentation and Interpretable Technique for Accurate and Explainable Skin Cancer Diagnosis

1 Faculty of Computer Science, Universitas Dian Nuswantoro, Semarang 50131, Indonesia
2 Dinus Research Group for AI in Medical Science (DREAMS), Universitas Dian Nuswantoro, Semarang 50131, Indonesia
3 Department of Health Information Management, Universitas Indonesia Maju, Jakarta 12610, Indonesia
* Author to whom correspondence should be addressed.
Computation 2023, 11(12), 246; https://doi.org/10.3390/computation11120246
Submission received: 28 October 2023 / Revised: 25 November 2023 / Accepted: 25 November 2023 / Published: 5 December 2023
(This article belongs to the Special Issue Computational Medical Image Analysis)

Abstract
This research paper presents a deep-learning approach to early detection of skin cancer using image augmentation techniques. We introduce a two-stage image augmentation process utilizing geometric augmentation and a generative adversarial network (GAN) to differentiate skin cancer categories. The public HAM10000 dataset was used to evaluate the proposed model. Various pre-trained convolutional neural network (CNN) models, including Xception, Inceptionv3, Resnet152v2, EfficientnetB7, InceptionresnetV2, and VGG19, were employed. Our approach demonstrates an accuracy of 96.90%, precision of 97.07%, recall of 96.87%, and F1-score of 96.97%, surpassing the performance of other state-of-the-art methods. The paper also discusses the use of Shapley Additive Explanations (SHAP), an interpretable technique for skin cancer diagnosis, which can help clinicians understand the reasoning behind a diagnosis and improve trust in the system. Overall, the proposed method presents a promising approach to automated skin cancer detection that could improve patient outcomes and reduce healthcare costs.

1. Introduction

Skin cancer is one of the most prevalent and potentially life-threatening forms of cancer worldwide. Early detection and diagnosis are essential for more effective therapy and better patient recovery [1,2]. In recent years, medical image analysis has been transformed by convolutional neural networks (CNNs), which outperform earlier supervised and unsupervised machine learning models such as k-nearest neighbor (KNN) and support vector machine (SVM) and offer a promising approach for the automated detection of skin cancer [3]. CNNs have proven extremely effective at extracting complex patterns and characteristics from medical images, making them an ideal tool for automating skin cancer detection. This technology has the potential to assist dermatologists and healthcare professionals in identifying skin lesions and distinguishing between benign and malignant tumors.
HAM10000 and the International Skin Imaging Collaboration (ISIC) are two datasets widely utilized in skin cancer detection studies. HAM10000 is a comprehensive dataset containing diverse dermoscopic images of pigmented skin lesions, a common category of skin cancer [4]. An advantage of the HAM10000 dataset lies in its relatively smaller size compared to the expansive ISIC dataset, which can benefit researchers with limited computational resources or those who want to focus on a specific subset of skin lesions. The ISIC dataset, however, has its own advantages, including a larger scale and the inclusion of additional metadata such as lesion location and patient age. The ISIC datasets have also been used for segmentation tasks, although delineated segmentation masks are available for far fewer images than classification labels [5]. The choice of dataset often depends on the specific research question and the resources available for the study.
A common issue with skin cancer detection datasets is the imbalance in the number of samples across classes. This imbalance is observed in both the HAM10000 and ISIC 2017–2020 datasets. In the HAM10000 dataset, which includes a total of 10,015 images, the largest class is melanocytic nevi (nv), with 6705 images, while the smallest is dermatofibroma, with 115 images [6]. Meanwhile, in the ISIC 2020 dataset, encompassing a total of 33,126 images, the most abundant samples are found in the unknown (benign) category, which comprises 27,126 images, whereas the solar lentigo category contains the fewest samples, with only 7 images [7]. Data imbalance can bias classification results because the model becomes more likely to predict the overrepresented class.
Several approaches can be employed to address data imbalance, such as geometric-transformation-based augmentation, feature-space augmentation, and GAN-based augmentation [8]. Geometric data augmentation is a technique used in machine learning and computer vision to increase the variability of a dataset by applying geometric transformations to the original data. The technique changes the geometric configuration of an image by moving the positions of individual pixels without modifying their values. These transformations alter the position, orientation, or scale of the data while preserving their inherent characteristics. Geometric data augmentation is particularly useful for image data and is often applied to improve the performance of deep learning models. Common geometric augmentations include rotation, scaling, translation, shearing, flipping, cropping, and zooming.
In feature-space data augmentation, there are two approaches: namely, the undersampling and oversampling approaches. In the undersampling approach, the number of samples from the majority class is reduced to create a more balanced distribution between the classes. By reducing the number of majority class samples, undersampling can help prevent the model from being biased towards the majority class and can improve its ability to recognize the minority class. However, undersampling may result in a loss of potentially valuable information, so it should be applied carefully. In the oversampling approach, additional samples from the minority class are generated to create a more balanced distribution between the classes. The goal is to increase the representation of the minority class to match the number of samples in the majority class, making the dataset more balanced. There are several oversampling methods, with one of the most commonly used techniques being SMOTE (Synthetic Minority Over-sampling Technique). In order to generate synthetic samples for the minority class, SMOTE [9] interpolates between existing data points. This helps to improve the model’s ability to learn from the minority class and can lead to better classification results. However, it is important to be cautious with oversampling, as generating too many synthetic samples can lead to overfitting and reduced model generalization.
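As a concrete illustration of feature-space oversampling, the sketch below applies SMOTE from the imbalanced-learn package to a toy imbalanced feature matrix. The array sizes and feature dimension are hypothetical; SMOTE operates on extracted feature vectors rather than raw images.

```python
# Minimal sketch of feature-space oversampling with SMOTE (imbalanced-learn).
# The feature matrix, class sizes, and feature dimension are illustrative.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(550, 128))          # 550 feature vectors of length 128
y = np.array([0] * 500 + [1] * 50)       # heavily imbalanced labels

smote = SMOTE(random_state=42, k_neighbors=5)
X_res, y_res = smote.fit_resample(X, y)  # interpolates new minority samples

print(Counter(y))      # Counter({0: 500, 1: 50})
print(Counter(y_res))  # Counter({0: 500, 1: 500})
```

Oversampling of this kind is normally applied to the training split only, so that no synthetic samples leak into the validation or test data.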
The concept of GAN-based augmentation refers to the use of generative adversarial networks (GANs) for the purpose of producing synthetic data samples that can be used to augment an existing dataset [10]. This technique is particularly useful in cases where the original dataset is small or imbalanced, as it can help to increase the size of the dataset and balance the class distribution. Augmentation by GAN has been implemented effectively in numerous domains, including medical imaging, natural language processing, and computer vision. Some examples of GAN augmentation in medical imaging include the generation of synthetic CT scans, MRI images, and X-ray images to aid in disease diagnosis and treatment.
In general, resampling approaches can be divided into two categories: input-space data augmentation and feature-space data augmentation. Input-space resampling manipulates the original data instances themselves, before any feature extraction, whereas feature-space resampling is applied after feature extraction. Geometric transformations and GAN-based augmentation are categorized as input-space data augmentation, whereas SMOTE is classified as feature-space data augmentation. The benefit of input-space data augmentation is its independence from the feature extraction method, providing greater flexibility in choosing feature extraction methods. Therefore, in this study, we propose a two-stage augmentation, combining geometric and GAN-based augmentation, for early detection of skin cancer. The main contributions of this research article are:
  • The integration of geometric and GAN-based augmentation for skin cancer detection;
  • The provision of explainable AI using SHAP to show how the model arrives at its decisions or predictions.

2. Related Works

The related studies in this research are categorized into three groups: studies using feature-space augmentation, geometric augmentation, and GAN-based augmentation. Augmentation or oversampling is employed to address the problems of limited and imbalanced data, both of which reduce the accuracy of a detection model; this is also observed in skin cancer detection. Several studies have applied augmentation or oversampling to skin cancer detection. Abayomi-Alli et al. [11] proposed a data augmentation strategy that entails creating a new skin melanoma dataset using dermoscopic images from the publicly available PH2 dataset. The study adopted SMOTE-Cov [12], a variant of SMOTE that utilizes a covariance matrix to detect relationships among attributes and generate synthetic instances. The SqueezeNet deep learning network was then trained on these modified images, achieving an accuracy of 92.18% in the binary classification scenario and 89.2% in the multiclass scenario.
SMOTE is an oversampling method used to balance the number of samples between the majority and minority classes in a dataset. SMOTE randomly selects samples from the minority class and creates new synthetic samples by interpolating between them and their nearest neighbors. This helps improve classification performance on imbalanced datasets, but SMOTE tends to introduce noise, which can degrade classification performance. K-means SMOTE [13] was therefore developed to address this limitation: it uses k-means clustering to group samples and generates synthetic samples only within clusters containing a sufficiently high proportion of minority class instances. Chang et al. [14] adopted K-means SMOTE to address class imbalance in the ISIC 2018 and ISIC 2019 datasets. Five pre-trained models (VGG16, MELA-CNN, InceptionResNetV2, Inception V3, and a dermatologist handcrafted method) were used to extract features. The minority class data were oversampled using K-means SMOTE and then classified with the Extreme Gradient Boosting (XGB) classifier. The research yielded an accuracy of 96.5%, precision of 97.4%, recall of 87.8%, AUC (Area Under the Curve) of 98.1%, and F1-score of 90.5%.
A deep-learning-based skin cancer classification network (DSCC_Net) was proposed by Tahir et al. [15]; the study developed a multi-class deep learning model for identifying skin cancer from dermoscopic images. The model was trained and evaluated on three public datasets (HAM10000, ISIC2020, and DermIS), and the results showed that DSCC_Net outperformed other state-of-the-art models in terms of accuracy, sensitivity, specificity, and F1-score. The SMOTE Tomek [16] technique was used to balance the dataset by generating synthetic samples for the minority class and removing noisy and borderline examples from both the minority and majority classes. The DSCC_Net model demonstrated a notable level of performance, achieving an accuracy of 94.17%, recall of 93.76%, F1-score of 93.93%, precision of 94.28%, and AUC of 99.42%.
An alternative approach is demonstrated by Alam et al. [17], who proposed geometric augmentation for skin cancer detection. Data augmentation involved cropping the images to 256 × 256, horizontal flipping, and rotation at various angles. The study utilized the HAM10000 dataset, which initially consisted of 10,015 samples but was increased to over 30,000 images through data augmentation. Feature extraction was performed using AlexNet, InceptionV3, and RegNetY-320. The proposed method achieved accuracy, F1, and ROC values of 91%, 88.1%, and 0.95, respectively. A similar approach was taken by Sae-Lim et al. [18], who proposed geometric augmentation techniques including rotation, zooming, shifting, and flipping. Experiments with MobileNet on the HAM10000 dataset yielded an accuracy of 83.23%, specificity of 87%, sensitivity of 85%, and F1-score of 82%.
Alsaidi et al. [19] demonstrated various augmentation techniques in skin cancer detection. Their research proposed the use of a GAN to address imbalanced data. Several pre-trained models, including EfficientNet-B0, ResNet50, ViT, and ConvNeXT, were employed. GAN-based augmentation with EfficientNet-B0 on the HAM10000 dataset yielded an accuracy of 96.8%, precision of 96.8%, recall of 96.9%, and F1-score of 96.8%. The development of GAN models for data augmentation was also explored by Qin et al. [20], who proposed style-based GANs; tested on the ISIC 2018 dataset, the method achieved an accuracy of 95.2%. Using the same dataset, Ali et al. [21] proposed a progressive generative adversarial network (PGAN) and achieved an accuracy of 70.1%.

3. Materials and Methods

3.1. Dataset

HAM10000 (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T, accessed on 2 July 2023) is a dataset containing clinical images of various pigmented skin lesions, including both malignant (cancerous) and benign cases. The dataset consists of 10,015 dermatoscopic images of various skin lesions. These images vary in their types and characteristics. The data are categorized into seven categories based on the type of skin lesion. These categories include melanocytic nevi (nv), melanoma (mel), benign-keratosis-like lesions (bkl), basal cell carcinoma (bcc), actinic keratoses and intraepithelial carcinoma (akiec), vascular lesions (vasc), and dermatofibroma (df). Figure 1 and Figure 2 show the number and image samples of each category, respectively. Every image in the collection is accompanied by clinical metadata that includes information such as patient age and gender and the location of the skin lesion. Dermatology experts have provided annotations and diagnoses for each image in this dataset. These annotations include information about the type of lesion (whether it is malignant or benign) and its characteristics. The images in the HAM10000 dataset are of high resolution and good quality, making them suitable for in-depth analysis and diagnosis. HAM10000 is widely used by researchers and machine learning practitioners to develop and evaluate algorithms for skin cancer diagnosis. These data have played a crucial role in advancing the field of computer-aided skin cancer diagnosis. HAM10000 is a publicly available dataset, allowing researchers and developers to access and use it for non-commercial purposes.

3.2. The Proposed Skin Cancer Detection Method

Transfer learning is especially beneficial when there is minimal data for the new task or when building a deep model from scratch would be computationally expensive and time-consuming. In this study, skin cancer classification employs six pre-trained CNN models, which include Xception, Inceptionv3, Resnet152v2, EfficientnetB7, InceptionresnetV2, and VGG19. In order to build a robust model, we apply augmentation techniques to categories that have a limited number of images. Two-stage input-space augmentations—namely, geometric and GAN augmentations—are proposed. Figure 3 shows the flow of skin cancer detection with the proposed augmentation.
Geometric augmentation is one of the data augmentation techniques used in computer image processing, particularly in the context of deep learning and pattern recognition. The goal of geometric augmentation is to enhance the diversity of training data by altering the geometry of the original image without changing the associated labels or class information related to that image. In this way, machine learning models can learn more general patterns and are not overly dependent on specific poses, orientations, or geometric transformations.
Some commonly used geometric augmentation techniques in deep learning include:
  • Rotation: images can be rotated by a certain angle, either clockwise or counterclockwise.
  • Translation: images can be shifted in various directions, both horizontally and vertically.
  • Scaling: images can be resized to become larger or smaller.
  • Shearing: images can undergo linear distortions that skew their angles.
  • Flipping: images can be flipped horizontally or vertically.
  • Cropping: parts of the image can be cut out to create variations.
  • Perspective Distortion: images can undergo perspective distortions to change the viewpoint.
By applying these geometric augmentation techniques, training data can be enriched with geometric variations, which helps machine learning models become more robust to variations in real-world images. This allows the model to perform better in pattern recognition tasks, such as object classification, object detection, or image segmentation, even when objects appear in different orientations or poses.
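As an illustration, the following sketch performs input-space geometric augmentation with Keras' ImageDataGenerator, using the parameter ranges that Section 3.3 and Table 1 report for this study. The 224 × 224 image size and the random stand-in image are assumptions made purely for the example, not the authors' exact pipeline.

```python
# Minimal sketch of input-space geometric augmentation with Keras'
# ImageDataGenerator, using the parameter values listed in Table 1.
# The 224x224 image size and the random stand-in image are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,            # rotate up to +/- 20 degrees
    width_shift_range=0.2,        # horizontal translation
    height_shift_range=0.2,       # vertical translation
    shear_range=0.2,              # shear distortion
    zoom_range=0.2,               # random zoom in/out
    horizontal_flip=True,         # mirror images horizontally
    brightness_range=(0.8, 1.2),  # brightness jitter
)

image = np.random.rand(1, 224, 224, 3)  # stand-in for one dermoscopic image
# Draw five augmented variants of the same lesion image.
augmented = [next(augmenter.flow(image, batch_size=1))[0] for _ in range(5)]
print(len(augmented), augmented[0].shape)  # 5 (224, 224, 3)
```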
GAN [22] augmentation refers to the use of generative adversarial networks (GANs) as one of the data augmentation techniques in the context of machine learning, especially in image processing. GAN is an artificial neural network architecture consisting of two models, the generator and the discriminator, that compete in a game to improve their capabilities [23]. In the context of data augmentation, GAN augmentation involves using the GAN generator to create additional data that are similar to the existing training data. The GAN generator tries to create images that appear authentic, while the GAN discriminator attempts to distinguish between images generated by the generator and real images.
By combining the images generated by the GAN generator with the training data, the dataset can be enhanced with image variations that appear realistic. GAN augmentation has been proven effective at improving the performance of machine learning models, especially in image recognition tasks such as object classification, object detection, or image segmentation, as it can create more diverse and relevant image variations.
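Below is a compact DCGAN-style sketch of the generator and discriminator layer types described in Table 2: stacked Conv2D + LeakyReLU layers, Flatten, Dropout, and a sigmoid Dense output in the discriminator; a Dense + LeakyReLU block followed by Conv2DTranspose layers and a Tanh output in the generator. The 64 × 64 image size, filter counts, latent dimension, and the added Reshape layer are assumptions for illustration, not the authors' exact configuration.

```python
# DCGAN-style sketch mirroring the layer types in Table 2. Image size,
# filter counts, and latent dimension are illustrative assumptions.
from tensorflow.keras import layers, models, optimizers

LATENT_DIM = 100

def build_discriminator(img_shape=(64, 64, 3)):
    m = models.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", input_shape=img_shape),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(256, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(512, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Flatten(),
        layers.Dropout(0.2),                    # dropout probability from Section 3.3
        layers.Dense(1, activation="sigmoid"),  # real vs. generated
    ])
    m.compile(optimizer=optimizers.Adam(learning_rate=0.0002),
              loss="binary_crossentropy")
    return m

def build_generator(latent_dim=LATENT_DIM):
    return models.Sequential([
        layers.Dense(8 * 8 * 256, input_dim=latent_dim),
        layers.LeakyReLU(alpha=0.2),
        layers.Reshape((8, 8, 256)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.Conv2D(3, 3, padding="same", activation="tanh"),  # synthetic image in [-1, 1]
    ])

discriminator = build_discriminator()
generator = build_generator()
print(generator.output_shape, discriminator.output_shape)  # (None, 64, 64, 3) (None, 1)
```

In this scheme, random latent vectors fed to the generator yield synthetic minority-class images, which are then mixed with the real training images before classifier training.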

3.3. Design of Experiments

In the experiment, 20% of the 10,015 images (2003 images) are used for testing, while the remaining 8012 images are split into 90% (7210) for training and 10% (802) for validation. Three methods are used to oversample the data: geometric, GAN, and geometric+GAN augmentation. Experiments were carried out in Python 3.11.5 on an Nvidia DGX Station A100 with a 40 GB GPU, a 64-core CPU, and 512 GB of DDR4 RAM.
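The split described above can be reproduced with a two-step hold-out, sketched below. The placeholder arrays, stratification by class, and the random seed are assumptions added here for illustration.

```python
# Two-step hold-out matching the 80/20 test split and the 90/10
# train/validation split of the remaining images. The placeholder arrays,
# stratification, and random seed are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10015).reshape(-1, 1)                       # stands in for the images
y = np.random.default_rng(0).integers(0, 7, size=10015)   # stands in for the 7 labels

X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.10, stratify=y_trainval, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 7210 802 2003
```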
Several experimental schemes are established to achieve the best performance. In the first scheme, skin cancer detection is conducted using the original data (without augmentation). In the second, third, and fourth schemes, the original data are augmented using geometric augmentation, GAN augmentation, and geometric+GAN augmentation, respectively. This study uses rotation, shift, shear, zoom, flip, and brightness adjustments for geometric augmentation, with detailed parameter values shown in Table 1. For GAN-based augmentation, a total of 1000 epochs are run with a batch size of 64. Table 2 shows the structure of the discriminator and generator networks. In the discriminator, we employ the Adam optimizer with a learning rate of 0.0002 and binary cross-entropy as the loss function; LeakyReLU with α = 0.2 is applied as the activation function for all layers except the last one, where a sigmoid activation function is used, and a dropout with a probability of 0.2 is applied. Similarly, in the generator network, each layer uses LeakyReLU with α = 0.2 except for the final layer, which employs Tanh as the activation function. The parameter values of the training model, such as optimizer, learning rate, and epochs, are shown in Table 3. In this experiment, we also conduct trials with a custom FC layer configuration, shown in Table 4, consisting of dense layers with 64, 32, and 7 neurons [24].
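For illustration, the sketch below assembles one of the transfer-learning configurations described above: an EfficientNetB7 backbone with the custom FC head of Table 4 (dense layers with 64, 32, and 7 neurons), compiled with the training parameters of Table 3. The input size, pooling choice, early-stopping patience, and the decision to freeze the backbone are assumptions made for the example.

```python
# Transfer-learning sketch: EfficientNetB7 backbone plus the custom FC head
# of Table 4, compiled with the Adam settings of Table 3. Input size, pooling,
# patience, and backbone freezing are illustrative assumptions.
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.applications import EfficientNetB7

base = EfficientNetB7(include_top=False, weights="imagenet",
                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # keep the pre-trained convolutional features fixed

model = models.Sequential([
    base,
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(7, activation="softmax"),  # seven HAM10000 lesion classes
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
# model.fit(train_generator, validation_data=val_generator,
#           epochs=100, callbacks=[early_stop])
```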
This study also uses SHAP, a technique that applies the concept of Shapley values to explain the contribution of each pixel or feature in an image to the model's predictions. CNNs are frequently referred to as black boxes because their decision-making processes are difficult to decipher; SHAP improves the transparency and interpretability of a CNN's decisions, which helps in understanding model behavior and building trust. In SHAP, Shapley values are used to measure the influence of each pixel in the image on the model's output or prediction. This technique is valuable for interpreting machine learning models, including the convolutional neural network (CNN) models frequently used for image-based tasks. Positive SHAP values signify that the presence of a pixel had a positive impact on the prediction (red pixels), whereas negative values indicate the contrary (blue pixels) [25].
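A hedged sketch of how such pixel-level attributions can be produced with the shap package is given below. The tiny stand-in classifier and random images are assumptions for the example; in practice they would be replaced by the trained model and preprocessed HAM10000 test images.

```python
# Pixel-level attribution sketch with SHAP's GradientExplainer. The tiny
# stand-in classifier and random images are placeholders for the trained
# model and preprocessed test images.
import numpy as np
import shap
from tensorflow.keras import layers, models

class_names = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]

model = models.Sequential([
    layers.Conv2D(8, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),
])
x_background = np.random.rand(20, 64, 64, 3).astype("float32")  # reference batch
x_test = np.random.rand(4, 64, 64, 3).astype("float32")         # images to explain

explainer = shap.GradientExplainer(model, x_background)
shap_values = explainer.shap_values(x_test)  # per-class attribution maps

# Red pixels (positive SHAP values) push the prediction towards a class;
# blue pixels (negative values) push it away.
shap.image_plot(shap_values, x_test)
```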

3.4. Performance Metrics

The evaluation of performance was conducted using seven metrics: accuracy (Acc), precision (Prec), recall (Rec), F1-score, SpecificityAtSensitivity, SensitivityAtSpecificity, and G-mean. Accuracy assesses the proportion of true positives and true negatives among all the images. Precision quantifies the accuracy of a model's positive predictions and is calculated by dividing the number of correct positive predictions by the total number of positive predictions. Recall, also known as sensitivity, measures the ratio of true positives to all relevant elements, i.e., the true positives in the dataset. Specificity assesses the model's capability to accurately recognize instances that do not belong to the positive class. The F1-score is the harmonic mean of recall and precision, providing an indication of classification accuracy on imbalanced datasets. G-mean, short for geometric mean, is used to assess the effectiveness of classification models, particularly on imbalanced datasets. Equations (1)–(6) define these metrics.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$\mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (4)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (5)$$

$$\text{G-mean} = \sqrt{\mathrm{Sensitivity} \times \mathrm{Specificity}} \qquad (6)$$
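For reference, a small helper that evaluates Equations (1)–(6) from the entries of a binary confusion matrix is sketched below; the multi-class results reported later would aggregate such per-class values, and the example counts are hypothetical.

```python
# Evaluate Equations (1)-(6) from a binary confusion matrix.
# The example TP/TN/FP/FN counts are hypothetical.
import math

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # = sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity, "g_mean": g_mean}

print(classification_metrics(tp=90, tn=85, fp=15, fn=10))
```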

4. Results and Discussion

Before the prediction process, data augmentation is performed on the training, validation, and testing data of the HAM10000 dataset using geometric augmentation, GAN augmentation, and geometric+GAN augmentation. Classes with few images are augmented so that their counts approach that of the largest class (nv). The number of images in each class before and after augmentation is shown in Table 5.
Table 6 shows the performance comparison of several pre-trained models with the proposed augmentation method. Using the original data, Resnet152v2 performed best in terms of accuracy (84.12%), precision (84.77%), recall (83.67%), and F1-score (84.22%). However, when considering sensitivity, specificity, and G-mean, EfficientnetB7 achieved the best values, with 99.49%, 94.91%, and 97.17%, respectively. With the proposed augmentation schemes, the accuracy of skin cancer detection improves to between 96% and 97.95%. Overall, geometric augmentation produced the best accuracy, precision, and F1-score, while geometric+GAN yielded the best sensitivity, specificity, and G-mean values. SensitivityAtSpecificity, SpecificityAtSensitivity, and G-mean all approach 100% when geometric+GAN is employed with the tested pre-trained models. Table 7 shows that replacing the FC layer raises the accuracy to 98.07% when EfficientnetB7 is combined with geometric augmentation.
Figure 4 shows sample accuracy curves from the training and validation of EfficientnetB7 on the original dataset and with the proposed augmentation. On the original dataset, the gap between training and validation accuracy indicates overfitting (Figure 4a). Geometric augmentation improves validation accuracy (Figure 4b), thereby reducing overfitting, while GAN and geometric+GAN enhance training accuracy (Figure 4c,d).
The sample confusion matrices generated from the original dataset and the best-proposed model are shown in Figure 5 and Figure 6, respectively; both were generated using EfficientnetB7. In Figure 5, many classes are still predicted inaccurately because of the imbalanced data. In Figure 6, skin cancer images in the df and vasc classes are classified with no errors. Only 3 of the 1319 akiec images were misclassified as bcc, and six of the 1360 bcc samples were mispredicted. Fifty-two of the 1316 bkl samples and 70 of the 1303 mel samples were predicted incorrectly. Similarly, 50 of the 1316 nv samples were misclassified.
We performed a comparative analysis to evaluate our model against earlier research using the HAM10000 dataset, as shown in Table 8. Our proposed approach outperforms earlier findings on a number of metrics. Its main limitation is accuracy when compared to Gomathi et al. [4]; the accuracy rate still needs improvement, and we plan to explore other deep-learning architectures to enhance skin cancer detection. However, in terms of recall, precision, and F1, our approach outperforms the previous research. The standard deviations of accuracy, precision, and recall for our proposed methods are also low, suggesting consistent performance across these metrics. Figure 7 shows the SHAP explanations of akiec, bcc, bkl, df, mel, nv, and vasc samples. The explanations are displayed on a clear grey background, with the testing images on the left.

5. Conclusions

This study provides valuable insights into a deep-learning approach for the early detection of skin cancer using image augmentation techniques. The proposed two-stage image augmentation technique, involving both geometric augmentation and GAN augmentation, demonstrated high performance: the proposed model achieves an accuracy of 96.90%, precision of 97.07%, recall of 96.87%, and F1-score of 96.97%. The other metrics, such as sensitivity, specificity, and G-mean, also improve over the results obtained with the original dataset. The use of an interpretable technique for skin cancer diagnosis is also a significant contribution to the field, as it can help clinicians understand the reasoning behind the diagnosis and improve trust in the system. Overall, this research presents a promising approach to automated skin cancer detection that could have a significant impact on patient outcomes and healthcare costs. In future work, we will include another dataset, ISIC 2020, to further validate our results.

Author Contributions

Conceptualization, C.S.; Data curation, A.S. and J.Z.; Formal analysis, A.W.; Investigation, C.S. and A.W.; Methodology, C.S.; Resources, A.S.; Software, A.S. and J.Z.; Supervision, A.W.; Validation, C.S. and A.W.; Writing—original draft, C.S.; Writing—review and editing, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DRTPM-DIKTI: 065/A38-04/UDN-09/VII/2023 for research funding in 2023.

Data Availability Statement

Dataset is publicly available at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T (accessed on 2 July 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bozkurt, F. Skin lesion classification on dermatoscopic images using effective data augmentation and pre-trained deep learning approach. Multimed. Tools Appl. 2023, 82, 18985–19003. [Google Scholar] [CrossRef]
  2. Kalpana, B.; Reshmy, A.; Senthil Pandi, S.; Dhanasekaran, S. OESV-KRF: Optimal ensemble support vector kernel random forest based early detection and classification of skin diseases. Biomed. Signal Process. Control 2023, 85, 104779. [Google Scholar] [CrossRef]
  3. Girdhar, N.; Sinha, A.; Gupta, S. DenseNet-II: An improved deep convolutional neural network for melanoma cancer detection. Soft Comput. 2023, 27, 13285–13304. [Google Scholar] [CrossRef]
  4. Gomathi, E.; Jayasheela, M.; Thamarai, M.; Geetha, M. Skin cancer detection using dual optimization based deep learning network. Biomed. Signal Process. Control 2023, 84, 104968. [Google Scholar] [CrossRef]
  5. Cassidy, B.; Kendrick, C.; Brodzicki, A.; Jaworek-Korjakowska, J.; Yap, M.H. Analysis of the ISIC image datasets: Usage, benchmarks and recommendations. Med. Image Anal. 2022, 75, 102305. [Google Scholar] [CrossRef] [PubMed]
  6. Chaturvedi, S.S.; Tembhurne, J.V.; Diwan, T. A multi-class skin Cancer classification using deep convolutional neural networks. Multimed. Tools Appl. 2020, 79, 28477–28498. [Google Scholar] [CrossRef]
  7. Alsahafi, Y.S.; Kassem, M.A.; Hosny, K.M. Skin-Net: A novel deep residual network for skin lesions classification using multilevel feature extraction and cross-channel correlation with detection of outlier. J. Big Data 2023, 10, 105. [Google Scholar] [CrossRef]
  8. Mumuni, A.; Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 2022, 16, 100258. [Google Scholar] [CrossRef]
  9. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Wang, Z.; Zhang, Z.; Liu, J.; Feng, Y.; Wee, L.; Dekker, A.; Chen, Q.; Traverso, A. GAN-based one dimensional medical data augmentation. Soft Comput. 2023, 27, 10481–10491. [Google Scholar] [CrossRef]
  11. Abayomi-Alli, O.O.; Damaševičius, R.; Misra, S.; Maskeliūnas, R.; Abayomi-Alli, A. Malignant skin melanoma detection using image augmentation by oversampling in nonlinear lower-dimensional embedding manifold. Turk. J. Electr. Eng. Comput. Sci. 2021, 29, 2600–2614. [Google Scholar] [CrossRef]
  12. Leguen-deVarona, I.; Madera, J.; Martínez-López, Y.; Hernández-Nieto, J.C. SMOTE-Cov: A New Oversampling Method Based on the Covariance Matrix. In Data Analysis and Optimization for Engineering and Computing Problems: Proceedings of the 3rd EAI International Conference on Computer Science and Engineering and Health Services, Mexico City, Mexico, 28–29 November 2019; Vasant, P., Litvinchev, I., Marmolejo-Saucedo, J.A., Rodriguez-Aguilar, R., Martinez-Rios, F., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 207–215. [Google Scholar]
  13. Douzas, G.; Bacao, F.; Last, F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf. Sci. 2018, 465, 1–20. [Google Scholar] [CrossRef]
  14. Chang, C.C.; Li, Y.Z.; Wu, H.C.; Tseng, M.H. Melanoma Detection Using XGB Classifier Combined with Feature Extraction and K-Means SMOTE Techniques. Diagnostics 2022, 12, 1747. [Google Scholar] [CrossRef] [PubMed]
  15. Tahir, M.; Naeem, A.; Malik, H.; Tanveer, J.; Naqvi, R.A.; Lee, S.W. DSCC_Net: Multi-Classification Deep Learning Models for Diagnosing of Skin Cancer Using Dermoscopic Images. Cancers 2023, 15, 2179. [Google Scholar] [CrossRef] [PubMed]
  16. Batista, G.E.A.P.A.; Bazzan, A.L.C.; Monard, M.C. Balancing Training Data for Automated Annotation of Keywords: A Case Study. WOB 2003, 3, 10–18. [Google Scholar]
  17. Alam, T.M.; Shaukat, K.; Khan, W.A.; Hameed, I.A.; Almuqren, L.A.; Raza, M.A.; Aslam, M.; Luo, S. An Efficient Deep Learning-Based Skin Cancer Classifier for an Imbalanced Dataset. Diagnostics 2022, 12, 2115. [Google Scholar] [CrossRef] [PubMed]
  18. Sae-Lim, W.; Wettayaprasit, W.; Aiyarak, P. Convolutional Neural Networks Using MobileNet for Skin Lesion Classification. In Proceedings of the 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE), Chonburi, Thailand, 10–12 July 2019; pp. 242–247. [Google Scholar] [CrossRef]
  19. Alsaidi, M.; Jan, M.T.; Altaher, A.; Zhuang, H.; Zhu, X. Tackling the class imbalanced dermoscopic image classification using data augmentation and GAN. Multimed. Tools Appl. 2023. [CrossRef]
  20. Qin, Z.; Liu, Z.; Zhu, P.; Xue, Y. A GAN-based image synthesis method for skin lesion classification. Comput. Methods Programs Biomed. 2020, 195, 105568. [Google Scholar] [CrossRef]
  21. Ali, I.S.; Mohamed, M.F.; Mahdy, Y.B. Data Augmentation for Skin Lesion using Self-Attention based Progressive Generative Adversarial Network. arXiv 2019, arXiv:1910.11960. [Google Scholar]
  22. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27. [Google Scholar]
  23. Shamsolmoali, P.; Zareapoor, M.; Shen, L.; Sadka, A.H.; Yang, J. Imbalanced data learning by minority class augmentation using capsule adversarial networks. Neurocomputing 2021, 459, 481–493. [Google Scholar] [CrossRef]
  24. Shahin, M.; Chen, F.F.; Hosseinzadeh, A.; Khodadadi Koodiani, H.; Shahin, A.; Ali Nafi, O. A smartphone-based application for an early skin disease prognosis: Towards a lean healthcare system via computer-based vision. Adv. Eng. Inform. 2023, 57, 102036. [Google Scholar] [CrossRef]
  25. Bhandari, M.; Shahi, T.B.; Neupane, A. Evaluating Retinal Disease Diagnosis with an Interpretable Lightweight CNN Model Resistant to Adversarial Attacks. J. Imaging 2023, 9, 219. [Google Scholar] [CrossRef] [PubMed]
  26. Shan, P.; Chen, J.; Fu, C.; Cao, L.; Tie, M.; Sham, C.W. Automatic skin lesion classification using a novel densely connected convolutional network integrated with an attention module. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 8943–8956. [Google Scholar] [CrossRef]
  27. Alwakid, G.; Gouda, W.; Humayun, M.; Jhanjhi, N.Z. Diagnosing Melanomas in Dermoscopy Images Using Deep Learning. Diagnostics 2023, 13, 1815. [Google Scholar] [CrossRef] [PubMed]
  28. Ameri, A. A Deep Learning Approach to Skin Cancer Detection in Dermoscopy Images. J. Biomed. Phys. Eng. 2020, 10, 801–806. [Google Scholar] [CrossRef] [PubMed]
  29. Ali, M.S.; Miah, M.S.; Haque, J.; Rahman, M.M.; Islam, M.K. An enhanced technique of skin cancer classification using deep convolutional neural network with transfer learning models. Mach. Learn. Appl. 2021, 5, 100036. [Google Scholar] [CrossRef]
  30. Sevli, O. A deep convolutional neural network-based pigmented skin lesion classification application and experts evaluation. Neural Comput. Appl. 2021, 33, 12039–12050. [Google Scholar] [CrossRef]
  31. Fraiwan, M.; Faouri, E. On the Automatic Detection and Classification of Skin Cancer Using Deep Transfer Learning. Sensors 2022, 22, 4963. [Google Scholar] [CrossRef]
  32. Balambigai, S.; Elavarasi, K.; Abarna, M.; Abinaya, R.; Vignesh, N.A. Detection and optimization of skin cancer using deep learning. J. Phys. Conf. Ser. 2022, 2318, 012040. [Google Scholar] [CrossRef]
  33. Shaheen, H.; Singh, M.P. Multiclass skin cancer classification using particle swarm optimization and convolutional neural network with information security. J. Electron. Imaging 2022, 32, 042102. [Google Scholar] [CrossRef]
Figure 1. Category distribution of HAM10000 dataset.
Figure 2. Example image of each class in HAM10000 dataset. From top to bottom: akiec, bcc, bkl, df, nv, mel, and vasc.
Figure 3. The proposed skin cancer detection method.
Figure 4. The samples of training and validation accuracy on EfficientnetB7.
Figure 5. Confusion matrix of EfficientnetB7 on original dataset.
Figure 6. Confusion matrix of the best performance model (EfficientnetB7+Custom FC using geometric augmentation).
Figure 7. The results of SHAP explanation on InceptionResnetV2 using Geometric+GAN augmentation. The sample images are correctly classified as akiec, bcc, bkl, df, mel, nv, and vasc since the high concentrations of red pixels are located in the second, third, fourth, fifth, sixth, seventh, and eighth explanation column images, respectively.
Table 1. Summary of geometric augmentation parameters.

Parameter          | Value
rotation_range     | 20
width_shift_range  | 0.2
height_shift_range | 0.2
shear_range        | 0.2
zoom_range         | 0.2
horizontal_flip    | True
brightness_range   | (0.8, 1.2)
Table 2. Summary of GAN-based augmentation parameters.

Network       | Layer           | Activation
Discriminator | Conv2D          | LeakyReLU
              | Conv2D          | LeakyReLU
              | Conv2D          | LeakyReLU
              | Conv2D          | LeakyReLU
              | Flatten         |
              | Dropout         |
              | Dense           | Sigmoid
Generator     | Dense           | LeakyReLU
              | Conv2DTranspose | LeakyReLU
              | Conv2DTranspose | LeakyReLU
              | Conv2DTranspose | LeakyReLU
              | Conv2D          | Tanh
Table 3. Parameters of training model.

Parameter            | Value
Optimizer            | Adam
Learning rate        | 0.0001
Optimizer parameters | beta_1 = 0.9, beta_2 = 0.999
Epochs               | 100 (with early stopping)
Table 4. Summary of custom FC layers.

Layer | Output Shape | Activation
Dense | (None, 64)   | Relu
Dense | (None, 32)   | Relu
Dense | (None, 7)    | Softmax
Table 5. Distribution of each skin cancer category for each augmentation scheme (Train/Test/Val).

Category     | Original          | Geometric Aug.       | GAN                  | Geometric Aug.+GAN
vasc         | 110 / 26 / 6      | 4801 / 1350 / 554    | 4843 / 1359 / 503    | 4805 / 1371 / 529
nv           | 4822 / 1347 / 536 | 4826 / 1316 / 563    | 4854 / 1302 / 549    | 4856 / 1300 / 549
mel          | 792 / 222 / 99    | 4877 / 1303 / 525    | 4858 / 1319 / 528    | 4836 / 1361 / 508
df           | 83 / 25 / 7       | 4775 / 1423 / 507    | 4831 / 1325 / 549    | 4831 / 1340 / 534
bkl          | 785 / 224 / 90    | 4887 / 1316 / 502    | 4813 / 1329 / 563    | 4831 / 1337 / 537
bcc          | 370 / 101 / 43    | 4783 / 1360 / 562    | 4792 / 1387 / 526    | 4798 / 1394 / 513
akiec        | 248 / 58 / 21     | 4844 / 1319 / 542    | 4802 / 1366 / 537    | 4836 / 1284 / 585
Num. images  | 7210 / 2003 / 802 | 33,793 / 9387 / 3755 | 33,793 / 9387 / 3755 | 33,793 / 9387 / 3755
Total images | 10,015            | 46,935               | 46,935               | 46,935
Table 6. Performance of the proposed augmentation method on several pre-trained models.

Augmentation Method | Pre-Trained Model | Acc   | Prec  | Rec   | F1    | SensitivityAtSpecificity | SpecificityAtSensitivity | G-Mean | Epoch
Original Data       | Xception          | 79.93 | 80.70 | 79.53 | 80.11 | 99.16  | 91.71  | 95.36 | 12
Original Data       | Inceptionv3       | 78.88 | 79.51 | 78.48 | 78.99 | 99.31  | 92.31  | 95.75 | 11
Original Data       | Resnet152v2       | 84.12 | 84.77 | 83.67 | 84.22 | 99.33  | 93.56  | 96.40 | 18
Original Data       | EfficientnetB7    | 78.03 | 79.63 | 77.28 | 78.44 | 99.49  | 94.91  | 97.17 | 11
Original Data       | InceptionresnetV2 | 79.63 | 80.20 | 78.68 | 79.44 | 99.28  | 91.76  | 95.45 | 19
Original Data       | VGG19             | 81.73 | 81.83 | 81.63 | 81.73 | 99.12  | 91.26  | 95.11 | 28
Geometric           | Xception          | 97.05 | 97.06 | 97.01 | 97.03 | 99.87  | 99.12  | 99.49 | 19
Geometric           | Inceptionv3       | 97.38 | 97.48 | 97.35 | 97.41 | 99.90  | 99.20  | 99.55 | 31
Geometric           | Resnet152v2       | 96.90 | 96.95 | 96.86 | 96.90 | 99.85  | 98.93  | 99.39 | 28
Geometric           | EfficientnetB7    | 97.95 | 98.00 | 97.90 | 97.95 | 99.91  | 99.41  | 99.66 | 19
Geometric           | InceptionresnetV2 | 97.40 | 97.46 | 97.36 | 97.41 | 99.89  | 99.20  | 99.55 | 28
Geometric           | VGG19             | 97.22 | 97.24 | 97.20 | 97.22 | 99.83  | 98.84  | 99.33 | 32
GAN                 | Xception          | 96.08 | 96.35 | 95.96 | 96.16 | 99.86  | 98.70  | 99.28 | 10
GAN                 | Inceptionv3       | 96.50 | 96.62 | 96.45 | 96.53 | 99.86  | 98.64  | 99.25 | 16
GAN                 | Resnet152v2       | 96.30 | 96.47 | 96.23 | 96.35 | 99.79  | 98.37  | 99.08 | 20
GAN                 | EfficientnetB7    | 96.48 | 96.59 | 96.44 | 96.51 | 99.79  | 98.25  | 99.02 | 20
GAN                 | InceptionresnetV2 | 96.22 | 96.32 | 96.20 | 96.26 | 99.82  | 98.44  | 99.13 | 18
GAN                 | VGG19             | 96.22 | 96.26 | 96.20 | 96.23 | 99.70  | 100.00 | 99.85 | 38
Geometric + GAN     | Xception          | 96.21 | 96.51 | 96.04 | 96.27 | 99.94  | 99.22  | 99.58 | 9
Geometric + GAN     | Inceptionv3       | 96.45 | 96.56 | 96.39 | 96.48 | 99.86  | 98.56  | 99.21 | 20
Geometric + GAN     | Resnet152v2       | 96.59 | 96.75 | 96.45 | 96.60 | 99.86  | 98.87  | 99.36 | 14
Geometric + GAN     | EfficientnetB7    | 96.50 | 96.61 | 96.43 | 96.52 | 99.89  | 98.92  | 99.40 | 14
Geometric + GAN     | InceptionresnetV2 | 96.71 | 96.82 | 96.67 | 96.74 | 99.85  | 98.62  | 99.23 | 21
Geometric + GAN     | VGG19             | 95.39 | 97.36 | 93.89 | 95.59 | 100.00 | 99.93  | 99.96 | 17
Table 7. Performance of the proposed augmentation method on the custom FC layer (three dense layers with 64 neurons, 32 neurons, and 7 neurons, respectively).

Augmentation Method | Pre-Trained Model | Acc   | Prec  | Rec   | F1    | SensitivityAtSpecificity | SpecificityAtSensitivity | G-Mean | Epoch
Geometric           | EfficientnetB7    | 98.07 | 98.10 | 98.06 | 98.08 | 99.92 | 99.46 | 99.69 | 20
GAN                 | Inceptionv3       | 96.48 | 96.63 | 96.44 | 96.53 | 99.83 | 98.54 | 99.18 | 17
Geometric + GAN     | InceptionresnetV2 | 96.90 | 97.07 | 96.87 | 96.97 | 99.86 | 98.90 | 99.38 | 22
Table 8. A comparative analysis of performance with the latest models.

Ref.                   | Method                                    | Acc   | Prec  | Rec   | F1    | Stdev
Alam et al. [17]       | AlexNet, InceptionV3, and RegNetY-320     | 91    | -     | -     | 88.1  | -
Kalpana et al. [2]     | ESVMKRF-HEAO                              | 97.4  | 96.3  | 95.9  | 97.4  | 0.7767
Shan et al. [26]       | AttDenseNet-121                           | 98    | 91.8  | 85.4  | 85.6  | 6.3003
Gomathi et al. [4]     | DODL net                                  | 98.76 | 96.02 | 95.37 | 94.32 | 1.7992
Alwakid et al. [27]    | InceptionResnet-V2                        | 91.26 | 91    | 91    | 91    | 0.1501
Sae-Lim et al. [18]    | Modified MobileNet                        | 83.23 | -     | 85    | 82    | -
Ameri [28]             | AlexNet                                   | 84    | -     | -     | -     | -
Chaturvedi et al. [6]  | ResNeXt101                                | 93.2  | 88    | 88    | -     | 3.0022
Shahin Ali et al. [29] | DCNN                                      | 91.43 | 96.57 | 93.66 | 95.09 | 2.5775
Sevli et al. [30]      | Custom CNN architecture                   | 91.51 | -     | -     | -     | -
Fraiwan et al. [31]    | DenseNet201                               | 82.9  | 78.5  | 73.6  | 74.4  | 4.6522
Balambigai et al. [32] | Grid search ensemble                      | 77.17 | -     | -     | -     | -
Shaheen et al. [33]    | PSOCNN                                    | 97.82 | -     | -     | 98    | -
This study             | Geometric+EfficientnetB7+Custom FC        | 98.07 | 98.10 | 98.06 | 98.08 | 0.0002
This study             | GAN+InceptionV3                           | 96.50 | 96.62 | 96.45 | 96.53 | 0.0009
This study             | Geometric+GAN+InceptionresnetV2+Custom FC | 96.90 | 97.07 | 96.87 | 96.97 | 0.0011
