Skin Cancer Classification Framework Using Enhanced Super Resolution Generative Adversarial Network and Custom Convolutional Neural Network

Sufiyan Bashir Mukadam; Hemprasad Yashwant Patil

doi:10.3390/app13021210

and

School of Electronics Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India

^*

Author to whom correspondence should be addressed.

Appl. Sci.2023, 13(2), 1210;https://doi.org/10.3390/app13021210

This article belongs to the Special Issue Artificial Intelligence in Medical Imaging: The Beginning of a New Era

Version Notes

Order Reprints

Simple Summary

Skin cancer is one of the most fatal diseases for mankind. The early detection of skin cancer will facilitate its overall treatment and contribute towards lowering the mortalities. This paper presents the deep learning-based algorithm along with pre-processing for the classification of skin cancer images. The image resolution of publicly available HAM10000 data after resizing is low and hence, when we pre-process the data to enhance the image resolution and then subject it to the deep neural network, overall performance metrics namely accuracy, is typically competitive.

Abstract

Melanin skin lesions are most commonly spotted as small patches on the skin. It is nothing but overgrowth caused by melanocyte cells. Skin melanoma is caused due to the abnormal surge of melanocytes. The number of patients suffering from skin cancer is observably rising globally. Timely and precise identification of skin cancer is crucial for lowering mortality rates. An expert dermatologist is required to handle the cases of skin cancer using dermoscopy images. Improper diagnosis can cause fatality to the patient if it is not detected accurately. Some of the classes come under the category of benign while the rest are malignant, causing severe issues if not diagnosed at an early stage. To overcome these issues, Computer-Aided Design (CAD) systems are proposed which help to reduce the burden on the dermatologist by giving them accurate and precise diagnosis of skin images. There are several deep learning techniques that are implemented for cancer classification. In this experimental study, we have implemented a custom Convolution Neural Network (CNN) on a Human-against-Machine (HAM10000) database which is publicly accessible through the Kaggle website. The designed CNN model classifies the seven different classes present in HAM10000 database. The proposed experimental model achieves an accuracy metric of 98.77%, 98.36%, and 98.89% for protocol-I, protocol-II, and protocol-III, respectively, for skin cancer classification. Results of our proposed models are also assimilated with several different models in the literature and were found to be superior than most of them. To enhance the performance metrics, the database is initially pre-processed using an Enhanced Super Resolution Generative Adversarial Network (ESRGAN) which gives a better image resolution for images of smaller size.

Keywords:

benign; malignant; skin cancer; ESRGAN; CAD

1. Introduction

Skin melanoma occurs due to fast procreation of aberrant skin cells in human anatomy. The count of skin malignancy cases has significantly increased over the past years []. As the skin is comprised of three lamina, the topmost lamina is the Epidermis, the middle lamina is the Dermis, and the deepest lamina is the Hypodermis, which is for the formation of fat and fibrous connective tissue. As skin is the outer most organ of human anatomy, it is most likely to be affected by fungal growth and bacteria which can be identified under microscopic examination. It results in varying textures and colours of the skin []. Skin cancer is classified under two sub-classifications, namely non-melanoma and malignant melanoma cancer. Non-melanoma cancer is less hazardous and occurs due to repeated exposure to UV radiation. The most common reason for skin cancer-related mortality is malignant melanoma. According to the survey by WHO, one out of three patients who are diagnosed with cancer have a skin cancer specifically. There are nearly 2–3 million non-malignant patients and 1.32 lakh malignant melanoma patients []. Melanoma is caused due to an imbalance of melanocytes in skin cells. The diagnosis of skin lesions are difficult due to the lack of standard guidelines for the detection of skin cancer. In addition to this, skin lesion classification is more challenging due to obscure boundaries, and the involvement of obstacles like veins, hairs, and moles []. The dermatologists who work on different skin diseases face limitations in visualising the dermoscopic images manually. Due to the similarity in skin lesions (inter-class similarity of skin diseases) leads to a degree of subjectivity and thus, human error []. There are further issues presented by clinical examinations: they are costlier and require highly skilled medical experts to operate the specialized medical diagnostic tools []. In recent years, researchers have developed various techniques, namely via a computer-aided diagnosis (CAD) system in an effort to lessen the workload of medical professionals by supporting them in providing an accurate diagnosis of cancer []. The CAD systems can categorize the lesion images into the melanoma and non-melanoma cancer []. In this proposed work, we implemented a Custom Convolution Neural Network (CCNN) which helps us to categorize the seven distinct classes of skin cancer stated in the database Human Against Machine (HAM10000) []. The HAM10000 database, consisting of 10,015 images of dermoscopic skin lesions, is used in this proposed work. The pre-processing of the HAM10000 database is carried out using an Enhanced Super Resolution Generative Adversarial Network (ESRGAN) which enhanced the quality of dermoscopic images to acquire better results compared to existing models. The proposed model was implemented on the HAM10000 dataset which is split into two subsets stated as the training and testing datasets in an 80:20 ratio (protocol-I), as well as the train:val:test split as per protocol-II and protocol-III. The paper is organized as follows. Section 2 presents the related published works on skin cancer classification. The description of the HAM10000 dataset is mentioned in Section 3. The proposed methodology including preprocessing techniques, and the design and building of the custom CNN model is indicated in Section 4. The results of the proposed framework are presented in Section 5. Section 6 concludes the work and discusses the future scope for further enhancement of performance metrics.

2. Related Work

The majority of research on the classification of melanomas focuses on the use of the dermoscopic data, which provides more visual information and is frequently employed by professional dermatologists. Recent research on the CAD system for skin lesion categorization employs deep learning-based approaches. In most of the approaches, it is seen that the model requires more training time due to larger image size. Furthermore, the presently available public databases for skin lesion classification are mostly imbalanced, which hinders the performance of the model. To classify skin lesions, a study was performed by Aladhadh et al. [] in which they employed a deep learning method based on vision transformers. A two-layer architecture is used in this work to accurately classify skin cancer. The transformer splits the augmented data into different patches and feeds the input to a multi-layer perceptron classifier to define its class with an accuracy of 96.14%. The study carried out by Bansal et al. [] have pre-processed the HAM10000 database using different morphological operations. The handcrafted method for feature extraction is used to retrieve the features. The two-transfer learning models named EfficientNet-B0 and ResNet50V2 are used for skin lesion classification and obtained an accuracy of 94.9%. A research study by Basak et al. [] worked on the HAM10000 database by employing a multi-focus segmentation network (MFS-Net) based on a deep learning algorithm. The retrieval of deep features is performed using the parallel partial decoder technique to produce a segmentation map. Finally, two different attention modules are implemented to obtain a segmentation output. The authors achieved a dice score of 90.6% using the prescribed algorithm. Nakai et al. [] used a transformer model based on a deep bottleneck. This model integrates the self-attention block to form a model with deep extracted features. It helped them to enhance the performance of overall categorization. The model accomplished a total accuracy of 96.1%. In the research work by Popescu et al. [], a collective intelligence-based transfer learning system was presented. This system comprises of nine different transfer learning models. This individual model is trained using the HAM10000 database and the outputs of the individual network are combined using a decision-level fusion module. It helped them to boost their overall performance by 3%. This approach yields an accuracy of 86.71%. Qian et al. [] used multi-scale attention blocks which are a deep learning-based approach. This technique was implemented on the HAM10000 database to retrieve special features which will focus on skin lesion area. It has also adopted a loss weighting which helped to solve the issue of imbalanced data per class. The performance of this model gains an accuracy of 91.6%. The study stated in [] is utilized for the categorization of skin lesions using the HAM10000 dataset. Multi-Scale Multi-CNN (MSM-CNN), a DL model built on a three-tier ensemble approach, was employed in this work. The proposed model results are then compared to the pre-trained CNN models such as EfficientNetB0, SeResNeXt-50, and EfficientNetB1. The MSM-CNN achieves the highest accuracy of 96.3% compared to other models. Panthakkan et al. [] used a concatenation of Xception and ResNet50 models on the HAM10000 database. A sliding window method is implemented for the purpose of training as well as testing the system. The presented approach yields a good accuracy of 97.8% on testing data. In the study article [], a classification of skin lesions is performed using the fusion of handcrafted and DL-based features and is further classified using ML classifiers to achieve an accuracy of 92.4%.

Through these related work studies we have identified some shortcomings which have been overcome by our proposed model. We can clearly see that there is a scope to improvise the performance metrics in terms of accuracy. Furthermore, it is identified that complexion in the model leads to the maximum execution time in training the model.

3. Materials and Methods

This section presents a detailed description of the HAM10000 dataset and also explains the seven different classes present in it.

3.1. HAM10000 Dataset

To train any neural network for obtaining good classification results, a huge dataset is required. The datasets used for the classification of skin pigmented lesions were small and inadequate for training. To overcome this issue, Tschandl and his team released the Human against Machine (HAM10000) dataset []. The dataset consists of 10,000 skin pigmented lesions of seven different important classes that can be used for the diagnosis of skin cancer. Due to the diverse population of dermoscopic images, data organization, cleaning, and defining a workflow to train a neural network is required. The final database version consists of 10,015 images and was released for academic research purpose and is made available on ISIC archive []. The ground truth of the database was confirmed by the expert pathologist in the field of dermoscopy. The seven important diagnosis classes are the following.

3.1.1. Actinic Keratosis (akiec)

Actinic Keratosis is the most common and non-obtrusive carcinoma. It is a sub-variant of squamous cell carcinoma which is cured locally without any surgical operation. It is said that akiec is an early sign of cell carcinoma and not a real carcinoma. This akiec lesion may grow into an intrusive squamous cell carcinoma []. Actinic Keratosis mostly appears on the face of the human body and is induced due to excessive exposure to UV light [].

3.1.2. Basal Cell Carcinoma (bcc)

Basal cell carcinoma is a specific class of melanoma which arises in basal melanocytes that make new cells rather than shedding old ones. It is the most prevalent kind of melanoma []. It is more likely to appear in areas that are susceptible to direct sun light, such as the neck and head of human body []. It generally occurs in the form of pink growths, recurrent sores, and red patches on the skin. These lesions develop gradually and hardly disseminate [].

3.1.3. Benign Keratosis-Like Lesions (bkl)

The bkl category in the database has three distinct classes of lesions that lacked cancerous traits. These sorts of lesions include Lichenoid Keratosis, Solar Lentigo, and Seborrheic Keratosis []. A benign skin condition known as lichenoid keratosis often manifests as a tiny, single, grey-brown lesion on the chest and upper limbs []. Solar Lentigo is a kind of macular hyper-pigmented infection that may differ in size, ranging from a few millimetres to more than one centimetre []. Seborrheic Keratosis is a benign condition that does not necessitate in-depth treatment. It is reddish-brown or greyish brown in color and often appears on the back, collar, scalp, and chest [].

3.1.4. Dermatofibroma (df)

Dermatofibroma is a relatively common dermatological condition that mostly impacts adolescent or elderly humans, with little women preponderance []. Clinically speaking, dermatofibroma presents as stiff soles, or many hard pustules, patches, or lumps, with a soft surface and a color that may range from pale brown to darkly brown, purplish-red, or yellow []. These benign skin lesions often appear on the upper arm, upper back, and lower leg [].

3.1.5. Melanocytic Nevi (nv)

The list of seven subclasses includes all of the innocuous melanocyte malignancies known as melanocytic nevi, which may have numerous variations []. They are skin tumours brought on by the expansion of melanocytes (the skin’s pigment-producing cells). It is mainly induced due to UV rays emitted from the sun at the early childhood age [].

3.1.6. Vascular Lesions (vasc)

The majority of vasc are inherited; however, they may arise later in life and are seldom malignant. They are sores of various appearances that form on the epidermis and surrounding tissues and are often referred to as birthmarks [].

3.1.7. Melanoma (mel)

Malignant melanocytes give rise to melanoma, a cancer that may manifest in many different forms. If removed at a preliminary phase, it is curable with simple surgical intervention. Melanomas may be either intrusive or harmless []. It is particularly apparent on sun-exposed body parts that include the face, trunk, hands, collar, and legs. Melanoma may be identified by patches that have an irregular shape, uneven borders, and distinct colours, are larger than 6 mm, and tend to expand. It might disseminate to different organs of the body and can cause fatality if it remains untreated [].

The HAM10000 dataset comprises seven different classes as described above and the class-wise categorization for the number of images is stated in Table 1. The distribution of images seems to be imbalanced. To make it balanced, data is augmented which is elaborated in the pre-processing section.

Table 1. Class-wise images present in HAM10000 dataset.

4. Proposed Methodology

This section elaborates on two different pre-processing techniques implemented in this proposed work. In addition to pre-processing, we have also discussed the custom convolutional neural network and we built a CNN model from scratch.

4.1. Pre-Processing

It is one of the most important steps while working with clinical image data []. It is primarily applied on the raw database, before feeding them for training the Convolutional Neural Network based system []. The pre-processing algorithm provides enhancement in images, which helps to boost the inclusive performance metrics pertaining to the model. One of the substantive contributions of the proposed research work is to enhance the quality of HAM10000 data using the ESRGAN algorithm which indeed leads to better extraction of features from the clinical image by the model. There are two different pre-processing techniques that are implemented in this study, namely ESRGAN and data augmentation which are discussed in detail in the following sub-sections.

4.1.1. Enhanced Super-Resolution Generative Adversarial Network (ESRGAN)

Pre-processing is a crucial step for the enhancement of images which helps in achieving superior performance metrics []. The Super-Resolution Generative Adversarial Network (SRGAN) is a foundational technique which can generate photorealistic patterns while super-resolving a single picture. The reckoning of a high resolved image from a low-resolution image is termed a super-resolution. The major optimization focus of super-resolution is to cut back the mean square error from the obtained highly resolved image and original image. GANs offer a potent framework for creating realistic pictures that seem believable and have excellent perceptual quality []. The visual hallucinated features, though, are very often associated with undesirable effects []. The Enhanced SRGAN is the adaptive technique which mainly addresses the three shortcomings of SRGAN that are Adversarial loss, Network design, and Perceptual loss. It also facilitates maintenance of a better ocular peculiarity with more pragmatic and natural-looking colors than SRGAN. To achieve the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), Wang et al. introduced an additional Residual layer in Residual Dense Block (RDB) in [] by removing the Batch Normalization (BN) layer. The Residual in Residual Dense Network (RRDN) comprises four different blocks, namely Dense Feature Fusion, Residual Dense Blocks, Shallow Feature Extraction, and up-sampling net []. The Local Feature Fusion layer and the Local Residual Learning layer are the two dense layers that form the RRDB.

Local Feature Fusion (LFF): It is an adaptive state derived from RRDB and a convolution layer in a new RRDB and is given by Equation (1).

f_{d, L F} = h_{L F F}^{D} (\{f_{d - 1}, f_{d, 1}, \dots \dots, f_{d, c}, \dots . ., f_{d, C}\})

(1)

where

h_{L F F}^{D}

indicates the convolution layer of size 1 × 1 in the dth RRDB block and

f_{d - 1}, f_{d, 1}

, etc., are the input and output of dth RRDB correspondingly.

b.: Local Residual Learning (LRL): It is implemented for the improvement of overall information flow. It also helps to get the final output of dth RRDB as shown in Equation (2).

f_{d} = (\{f_{d - 1} + f_{d, L F}\})

(2)

Other than the improvement of visual qualities using RRDB, Wang et al. also calculated different loss functions which gave the overall performance of the generator. The different loss functions are stated as [].

(i): Discriminator loss: It is the loss calculated during misclassification of real and fake instances. Some of the fake instances are obtained from the generator by expanding the equation given in Equation (3).

l_{D}^{R a} = - E_{x_{r}} (l o g \{D_{R a} \{x_{r}, x_{f}\}\}) - E_{x_{f}} (l o g \{1 - D_{R a} \{x_{f}, x_{r}\}\})

(3)

where

l o g \{D_{R a} \{x_{r}, x_{f}\}\}

is the probability of classification by generator correctly and

l o g \{1 - D_{R a} \{x_{f}, x_{r}\}\}

helps to accurately label the fake images from the generator.

(ii): Generator loss: The generator loss is calculated if the discriminator misclassifies the fake images which helps the discriminator to improvise. It is given by Equation (4)

l_{G}^{R a} = - E_{x_{r}} (l o g \{1 - D_{R a} \{x_{r}, x_{f}\}\}) - E_{x_{f}} (l o g \{D_{R a} \{x_{f}, x_{r}\}\})

(4)

It is observed that the generator can achieve better results from both real and generated data in adversarial training.

(iii): Perpetual Loss: In ESRGAN, the perpetual loss is also improved by confining the features prior to activation, as compared to features after activation in SRGAN. The perpetual loss function is given by Equation (5).

l_{G} = l_{p e r c e p} + λ l_{G}^{R a} + η l_{1}

(5)

where the terms

λ

and

η

are the factors to equalize various loss functions and

l_{G}^{R a}

is the generator loss function.

(iv): Content Loss: The element wise Mean Square Error (MSE). It is most broadly used in targeting the super resolved image and is given by Equation (6)

L_{M S E}^{S R} = \frac{1}{r^{2} w h} \sum_{x = 1}^{r w} \sum_{y = 1}^{r h} {(i_{x, y}^{H R} - g_{θ_{g}} {(i^{L R})}_{x, y})}^{2}

(6)

where

g_{θ_{g}} (i^{L R})

is the reformed image and

i_{x, y}^{H R}

is the down sampled operation with a factor r [].

Figure 1 presents the comparison of sample images with their respective ESRGAN-enhanced images.

Figure 1. Comparison of sample images with their respective ESRGAN enhanced images.

4.1.2. Data Augmentation

In order to train the CNN model with multiple variations of the dermoscopic images, a data augmentation method is included in our research work. Minority oversampling is the most widely implemented method in restoring the model’s robustness and reducing the dataset’s bias when there is a significant imbalance in classes []. The deep learning model performs well when it is feed with a huge training dataset. The HAM10000 dataset used for our proposed work is imbalanced, as seen in Table 1. Data augmentation helps the network from overfitting issues caused due to imbalanced data. The main reason for augmenting the data is that there are only 8012 images in the training dataset. The different augmentation methods are implemented such as rescaling, rotating the image, zooming with factor of 0.1, and height and width shift with range factor 0.1. It makes the dataset more balanced and improves overall performance of the model.

4.2. Custom Convolutional Neural Network

CNN is a category of deep-learning system which detects and extracts features from images automatically []. It has acquired significance in medical image analysis, as it has in many other fields as a result of its higher performance. The layers of a standard CNN include convolution layer, dropout layer, activation function, fully connected layer, and pooling layer []. The image pixels need to be processed and are given as an input to the CNN. The original input pixels are subjected to detecting feature vectors, also termed as filters, in the convolution layer in order to extract a collection of features []. CNN’s primary function, convolution, allows automated feature extraction []. During the step of pooling, a dimensionality reduction process is conducted by applying filters to an input vector []. The reduction technique is carried out by taking the minimum, maximum, or median of the values in the filtering window, which is strung across the initial input vector [].

In neural network models, overfitting problems can arise, especially when the manifold training samples is insufficient. With a view to address this issue, a dropout operation was used, which increased the network’s capacity to alter distinct environments by arbitrarily deactivating a fraction of its neurons during training. The fully connected layer helps the process go on to the categorization stage. The output matrix is flattened before being sent on to the classifier after the feature extraction and pooling procedures. The proposed algorithm is shown in Figure 2.

Figure 2. Overview of the proposed ESRGAN-based CNN algorithm.

The dataset has two fundamental aspects. The first component aspect is a metadata file that contains specific data for cancer lesion images. The skin lesion’s location, the patient’s age and gender, the lesion’s diagnosis, and the skin lesion directory are all included in the metadata file. The second and primary section of the collection is comprised of visual files.

The objective of this study is to categorize skin lesions only based on digital images. Thus, the data file was reorganized to simply include the lesion type and the image file directory. Each lesion’s textual labelling was transformed into digital values between 0 and 6. Each subtype labelling codes are shown in Table 2.

Table 2. Notations for each class in the HAM10000 database.

The original dermoscopic images are of 600 × 400 pixels resolution and are saved in the RGB format. It was observed that the processing burden increases proportionally with picture size. Hence, image size reduction increases processing speed. Therefore, all samples in the collection are downsized to 24 × 24 pixels. Since the colour is a distinguishing factor in diagnosing the kind of lesion, the original colours of the photographs were maintained. The sharpening filters are implemented to enhance the contrast of every applied image.

4.3. Building a Custom CNN Model

When dealing with a large dataset, deep learning is typically regarded as an effective algorithm []. Conventionally, deep learning techniques demand a significant amount of computing time and large storage space []. Figure 3 depicts the customized CNN network model for classifying skin lesions. The custom CNN model is comprised of 4 × 2 layers. RGB input image of size 28 × 28 was utilized. The convolution operation is performed on the first two layers in each of these layers and 3 × 3 sized 32 filters are applied with a ReLu activation function. It is followed by the implementation of max pooling 2D layer with pool size of 2 × 2 and a batch normalization layer. In the second layer, the same convolution operation is performed with change in the parameters. In this layer 3 × 3 sized 64 filters are used with a ReLu activation function. After ReLu function, a max pooling 2D layer of 2 × 2 size and a batch normalization layer are employed. As a part of the third layer, the same convolution operation is performed with alteration in the parameters. In this layer, 3 × 3 sized 128 neurons are implemented with a ReLu activation function. It is then followed by the max pooling 2D layer of 2 × 2 size and a batch normalization layer. The fourth layer contains the similar convolution operation is performed with another set of parameters. In this layer, 3 × 3 sized 256 filters are used which are then followed by max 2D pooling of size 2 × 2, batch normalization layer, a dropout layer of 20%, and a flattening layer. In the final stage, the classifier receives the output of the flattening layer. Table 3 and Table 4 show the summary and hyper parameters used for designing the model, respectively. Proposed method for classification of skin lesions is illustrated in Algorithm 1.

Algorithm 1: Proposed algorithm for classification of skin lesions

Step 1: Pre-processing

a.: Raw input images are first pre-processed using the ESRGAN generator model.
b.: The images are then resized to 28 × 28 for faster classification using the CNN model.
c.: The imbalanced dataset is balanced using the data augmentation processes.
d.: The augmented data is first split up into training data and testing data.

Step 2: Training custom CNN model

a.

Feature map Fmap are extracted from the input images

b.

Set Fc = 2D Conv (Fmap, size(32));

c.

Set Fr = ReLu (Fc);

d.

Set Fp = MaxPooling2D (Fr);

e.

Set Fb = BatchNormalization (Fp);

f.

size₁ = [64,128,256]

for i = 0 to 2:

: Set Fc₁ = 2D Conv (Fmap, size₁(i));
: Set Fr₁ = ReLu (Fc₁);
: Set Fc₂ = 2D Conv (Fmap, size₁(i));
: Set Fr₂ = ReLu (Fc₂);
: Set Fp₁ = MaxPooling2D (Fr₂);
: Set Fb₁ = BatchNormalization (Fp₁);

end for

g.

Set Ff = Flattening (Fb₁);

h.

Set F∂ = Dropout (Ff);

i.

size₂ = [256,128,64,32]

for j = 0 to 3
Set Fd = Dense (F∂,size2(j));
Set Fb = BatchNormalization (Fd);
end j

j.

Set Foc = OutputClassifier (Fb);

Figure 3. Layered architecture of proposed CNN model.

Table 3. CNN Model Summary.

Table 4. Hyper parameters for training the model.

5. Results and Discussion

In this section, we discuss the model’s performance over a range of metrics and present a comparative study that illustrates how the suggested technique outperforms the current melanoma detection algorithms.

5.1. Performance Metrics

To assess the efficiency of the presented model, we used performance metrics such as Accuracy, F1-Score, Recall, and Precision. Performance metrics shown in Table 5 are calculated from a confusion matrix and are given by Equations (7), (8), (9) and (10), respectively. Performance measurement of the deep learning model comprises the following terms: (a) True Positive (Tp), (b) True Negative (Tn), (c) False Positive (Fp), and (d) False Negative (Fn) [].

Table 5. Performance metrics and their formulas.

5.2. Protocol-I (Train:Test = 80:20 Ratio)

After performing data augmentation, the entire dataset is split into two partitions, namely, train and test with the ratio of 80:20. For protocol-I, for this augmented data, the total number of training images is 37,548 and of test images is 9387. The details of class-wise training and test images are depicted in Table 6.

Table 6. Class-wise images present in HAM10000 dataset with protocol-I.

The model was trained for 25 epochs on the Google Colaboratory Pro platform with 12 GB RAM and Python 3 Google compute backend engine GPU Accelerator. We interrupt the model’s continuing execution using the early stopping method and record the model’s best-performing parameters, such as its maximum accuracy and minimum cross-entropy loss. Every time the model fails to reach an accuracy greater than those acquired in the previous two epochs, we decreased the learning rate of the model to prevent additional stalling in the learning phase. The training and testing accuracies and loss graphs are displayed in Figure 4 and Figure 5, respectively. The highest testing accuracy was obtained as 98.77% on the 25th epoch. Accuracy is one of the important metrics to characterize the achievement of the model if the dataset is proportionate. To get the different evaluation scores, we have used the confusion matrix which gives the exact classifications as shown in Figure 6. In this experiment, a confusion matrix is incurred for seven classes as mentioned in the dataset. The confusion matrix scores are computed to examine the performance of the model for different classes. From Table 7, it can be observed that the model works very well in classifying class 0, class 3, and class 5. The scores obtained for class 4 are slightly low.

Figure 4. Accuracy graph for training and testing for protocol-I.

Figure 5. Training and testing losses for protocol-I.

Figure 6. Confusion matrix for protocol-I (80:20 Train:Test split).

Table 7. Class-wise performance measures of the model.

Various Approaches That Follow Protocol-I

Table 8 specifies accuracies in the context of current research performed on the HAM10000 database. In the Agyenta et al. [], the authors carried out research work on the HAM10000 database. Transfer learning techniques like InceptionV3, ResNet50, DenseNet201, and comparative study is accomplished on the HAM1000 database and achieved accuracies of 85.80%, 86.69%, and 86.91%, respectively. The authors have reached the highest accuracy for the DenseNet201 model. In another work by Onur et al. in [], the presented approach included a custom CNN model and experiments with an image size of 75 × 100. An accuracy of 91.51% was achieved in this study. Qian et al. [] presented an experimental study using the CNN model concatenated with the Grouping Of Multi-Scale Attention Blocks (GMAB) technique. This study achieved an accuracy of 91.6 %. Shetty et al. [] developed a CNN model along with a k-fold cross-validation method. The accuracy of this model was 95.18%. The study carried out by Panthakkan et al. [], and was based on the Concatenated Xception-ResNet50 model for the diagnosis of skin cancer. This model yields competitive results with an accuracy of 97.8%. The proposed work presents a custom CNN model and implements it on pre-processed data using the ESRGAN algorithm to achieve an accuracy of 98.77% which is much higher when compared to other literature studies carried out on the HAM10000 database.

Table 8. Highest Accuracy for protocol-I.

5.3. Protocol II

For the purpose of parameter tuning with more test images, the following protocol-II is chosen where the dataset is split into the following ratio ((Train + Val):Test) = ((90 + 10):20). It indicates that at first, the dataset is divided into 80: 20 Train: Test split. Subsequently, the training set is subdivided into 90% for training and 10% for validation. For this augmented data the number of training images is 33,793, validation images is 3755 as well as 9387 test images. Class-wise samples for this experimentation are depicted in Table 9. The model was trained using a machine with 12 GB RAM and GPU attached to it. Training and validation accuracies and losses are indicated in Figure 7 and Figure 8, respectively. The confusion matrix for protocol-II is indicated in Figure 9.

Table 9. Class-wise images present in the HAM10000 dataset with protocol-II.

Figure 7. Accuracy graph for training and testing for protocol-II.

Figure 8. Training and testing losses for protocol-II.

Figure 9. Confusion matrix for protocol-II.

Various Approaches That Follow Protocol-II

Table 10 specifies accuracies in the context of current research performed on the HAM10000 database. In Sevli et al. [], a deep convolutional neural network was implemented for the classification of skin lesions. This study accomplished an accuracy of 91.51%. Saarela et al. [] worked on the HAM10000 dataset for skin-lesion classification. In this study, the robustness, stability, and fidelity studies of the deep convolutional neural network are carried out. Their model gives a classification accuracy of 80%. The proposed method for protocol-II gives a better accuracy of 98.36% as indicated in Table 10.

Table 10. Highest accuracy for protocol-II.

5.4. Protocol III

For the purpose of parameter tuning with fewer numbers of images for testing, the following protocol-III is implemented where the dataset is split into the following ratio ((Train + Val):Test) = ((90 + 10):10). It indicates that at first, the dataset is divided into 90: 10 Train: Test split. Subsequently, the training set is subdivided into 90% for training and 10% for validation. For this augmented data, the number of training images is 38,017, the number of validation images is 4224, and the number of test images is 4694. Class-wise samples for this experimentation are depicted in Table 11.

Table 11. Class-wise images present in HAM10000 dataset with protocol-III.

The CNN model was trained using a machine with 12 GB RAM and GPU attached to it. Training and validation accuracies and losses are indicated in Figure 10 and Figure 11, respectively. The confusion matrix for protocol-III is indicated in Figure 12.

Figure 10. Accuracy graph for training and testing for protocol-III.

Figure 11. Training and testing losses for Protocol-III.

Figure 12. Confusion matrix for Protocol-III.

Various Approaches That Follow Protocol-III

Table 12 specifies accuracies, in the context of current research performed on the HAM10000 database. The research article presented by Aldhyani et al. [] focused on kernel-based CNN. In this study, a lightweight dynamic kernel deep-learning-based convolutional neural network is implemented. This algorithm achieved an accuracy of 97.8% when the model was tested on the HAM10000 database. Alam et al. [] presented an approach that works on the segmentation-based sub-network. In this study, the S2C-DeLeNet algorithm was applied on skin cancer data. This algorithm has obtained an accuracy of 90.58%. The proposed Custom CNN using ESRGAN technique achieved an accuracy of 98.89%.

Table 12. Highest accuracy for Protocol-III.

6. Conclusions and Future Scope

Melanocyte cells are responsible for the formation of pigmented lesions. Skin malignancies such as melanoma are caused by the unregulated division of melanocyte cells, which may have a damaging effect on the human body. The dermatologists with extensive training interpret dermoscopic images. Due to the insufficiency of educated specialists and the need to minimize human-induced mistakes, the use of computer-assisted systems is emphasized. Convolution neural network, a method for deep learning that retrieves features from images, achieved huge success in the domain of computer vision. The pre-processing with ESRGAN helps us to reduce the size of images with better resolution and overall execution time for the experiment. The complexion in the model in terms of image shape that it considers as an input leads to maximum execution time in training the model. Hence, in this work, we have used images with a resolution of 28 × 28 pixels. Before resampling the images, the original images were enhanced using the ESRGAN dataset which helps to preserve the eminent features in the input images after down sampling. In this experimental analysis, we have implemented a HAM10000 dataset having 10,015 images of skin lesions and are categorized into seven different classes using a custom CNN model. The experimental model achieved accuracies of 98.77%, 98.36%, and 98.89% for protocol-I, protocol-II, and protocol-III, respectively, and it is seen to be competitively high as compared to the pretrained models presented by different researchers.

In the future, our aim is to work on the diagnosis of real-time skin lesions with improvement in the testing accuracy. We also hope to implement our proposed model to work on larger datasets if available for skin-cancer image categorization. It will in turn help us to enhance the performance metric scores. It is anticipated that the proposed work will help the dermatologist to examine and classify the class of skin cancer in lesser time duration and with more precision. Additionally, it will assist in reducing the total costs associated with skin cancer diagnosis. There is a scope for further enhancement in performance metrics such as accuracy, precision, and recall.

Author Contributions

Conceptualization, S.B.M. and H.Y.P.; methodology, S.B.M.; software, S.B.M.; validation, S.B.M.; formal analysis, S.B.M.; investigation, S.B.M.; resources, S.B.M.; data curation, S.B.M.; writing—original draft preparation, S.B.M.; writing—review and editing, S.B.M.; visualization, S.B.M.; supervision, H.Y.P.; project administration, H.Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is publicly available.

Conflicts of Interest

The author declares no conflict of interest.

References

Afza, F.; Sharif, M.; Khan, M.A.; Tariq, U.; Yong, H.S.; Cha, J. Multiclass Skin Lesion Classification Using Hybrid Deep Features Selection and Extreme Learning Machine. Sensors 2022, 22, 799. [Google Scholar] [CrossRef] [PubMed]
Aldhyani, T.H.H.; Verma, A.; Al-Adhaileh, M.H.; Koundal, D. Multi-Class Skin Lesion Classification Using a Lightweight Dynamic Kernel Deep-Learning-Based Convolutional Neural Network. Diagnostics 2022, 12, 2048. [Google Scholar] [CrossRef] [PubMed]
World Health Organization. Radiation: Ultraviolet (UV) Radiation and Skin Cancer—How Common Is Skin Cancer. Available online: https://www.who.int/news-room/questions-and-answers/item/radiation-ultraviolet-(uv)-radiation-and-skin-cancer (accessed on 12 October 2022).
Jeyakumar, J.P.; Jude, A.; Priya Henry, A.G.; Hemanth, J. Comparative Analysis of Melanoma Classification Using Deep Learning Techniques on Dermoscopy Images. Electronics 2022, 11, 2918. [Google Scholar] [CrossRef]
Ali, K.; Shaikh, Z.A.; Khan, A.A.; Laghari, A.A. Multiclass Skin Cancer Classification Using EfficientNets—A First Step towards Preventing Skin Cancer. Neurosci. Inform. 2022, 2, 100034. [Google Scholar] [CrossRef]
Hebbar, N.; Patil, H.Y.; Agarwal, K. Web Powered CT Scan Diagnosis for Brain Hemorrhage Using Deep Learning. In Proceedings of the 2020 IEEE 4th Conference on Information & Communication Technology (CICT), Chennai, India, 3 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
Aladhadh, S.; Alsanea, M.; Aloraini, M.; Khan, T.; Habib, S.; Islam, M. An Effective Skin Cancer Classification Mechanism via Medical Vision Transformer. Sensors 2022, 22, 4008. [Google Scholar] [CrossRef]
Shetty, B.; Fernandes, R.; Rodrigues, A.P. Skin Lesion Classiication of Dermoscopic Images Using Machine Learning and Convolutional Neural Network. Sci. Rep. 2022, 12, 18134. [Google Scholar] [CrossRef]
Tschandl, P.; Rosendahl, C.; Kittler, H. The HAM10000 Dataset, a Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions. Sci. Data 2018, 5, 180161. [Google Scholar] [CrossRef] [PubMed]
Bansal, P.; Garg, R.; Soni, P. Detection of Melanoma in Dermoscopic Images by Integrating Features Extracted Using Handcrafted and Deep Learning Models. Comput. Ind. Eng. 2022, 168, 108060. [Google Scholar] [CrossRef]
Basak, H.; Kundu, R.; Sarkar, R. MFSNet: A Multi Focus Segmentation Network for Skin Lesion Segmentation. Pattern Recognit. 2022, 128, 108673. [Google Scholar] [CrossRef]
Nakai, K.; Chen, Y.W.; Han, X.H. Enhanced Deep Bottleneck Transformer Model for Skin Lesion Classification. Biomed. Signal Process. Control 2022, 78, 103997. [Google Scholar] [CrossRef]
Popescu, D.; El-Khatib, M.; Ichim, L. Skin Lesion Classification Using Collective Intelligence of Multiple Neural Networks. Sensors 2022, 22, 4399. [Google Scholar] [CrossRef] [PubMed]
Qian, S.; Ren, K.; Zhang, W.; Ning, H. Skin Lesion Classification Using CNNs with Grouping of Multi-Scale Attention and Class-Specific Loss Weighting. Comput. Methods Programs Biomed. 2022, 226, 107166. [Google Scholar] [CrossRef] [PubMed]
Mahbod, A.; Schaefer, G.; Wang, C.; Dorffner, G.; Ecker, R.; Ellinger, I. Transfer Learning Using a Multi-Scale and Multi-Network Ensemble for Skin Lesion Classification. Comput. Methods Programs Biomed. 2020, 193, 105475. [Google Scholar] [CrossRef]
Panthakkan, A.; Anzar, S.M.; Jamal, S.; Mansoor, W. Concatenated Xception-ResNet50—A Novel Hybrid Approach for Accurate Skin Cancer Prediction. Comput. Biol. Med. 2022, 150, 106170. [Google Scholar] [CrossRef]
Almaraz-Damian, J.A.; Ponomaryov, V.; Sadovnychiy, S.; Castillejos-Fernandez, H. Melanoma and Nevus Skin Lesion Classification Using Handcraft and Deep Learning Feature Fusion via Mutual Information Measures. Entropy 2020, 22, 484. [Google Scholar] [CrossRef]
Zalaudek, I.; Giacomel, J.; Schmid, K.; Bondino, S.; Rosendahl, C.; Cavicchini, S.; Tourlaki, A.; Gasparini, S.; Bourne, P.; Keir, J.; et al. Dermatoscopy of Facial Actinic Keratosis, Intraepidermal Carcinoma, and Invasive Squamous Cell Carcinoma: A Progression Model. J. Am. Acad. Dermatol. 2012, 66, 589–597. [Google Scholar] [CrossRef] [PubMed]
Sevli, O. A Deep Convolutional Neural Network-Based Pigmented Skin Lesion Classification Application and Experts Evaluation. Neural Comput. Appl. 2021, 33, 12039–12050. [Google Scholar] [CrossRef]
Lallas, A.; Apalla, Z.; Argenziano, G.; Longo, C.; Moscarella, E.; Specchio, F.; Raucci, M.; Zalaudek, I. The Dermatoscopic Universe of Basal Cell Carcinoma. Dermatol. Pract. Concept. 2014, 4, 11–24. [Google Scholar] [CrossRef]
BinJadeed, H.; Aljomah, N.; Alsubait, N.; Alsaif, F.; AlHumidi, A. Lichenoid Keratosis Successfully Treated with Topical Imiquimod. JAAD Case Rep. 2020, 6, 1353–1355. [Google Scholar] [CrossRef]
Ortonne, J.P.; Pandya, A.G.; Lui, H.; Hexsel, D. Treatment of Solar Lentigines. J. Am. Acad. Dermatol. 2006, 54, 262–271. [Google Scholar] [CrossRef] [PubMed]
Zaballos, P.; Salsench, E.; Serrano, P.; Cuellar, F.; Puig, S.; Malvehy, J. Studying Regression of Seborrheic Keratosis in Lichenoid Keratosis with Sequential Dermoscopy Imaging. Dermatology 2010, 220, 103–109. [Google Scholar] [CrossRef]
Zaballos, P.; Puig, S.; Llambrich, A.; Malvehy, J. Dermoscopy of Dermatofibromas. Arch. Dermatol. 2008, 144, 75–83. [Google Scholar] [CrossRef] [PubMed]
Sarkar, R.; Chatterjee, C.C.; Hazra, A. Diagnosis of Melanoma from Dermoscopic Images Using a Deep Depthwise Separable Residual Convolutional Network. IET Image Process 2019, 13, 2130–2142. [Google Scholar] [CrossRef]
Teja, K.U.V.R.; Reddy, B.P.V.; Likith Preetham, A.; Patil, H.Y.; Poorna Chandra, T. Prediction of Diabetes at Early Stage with Supplementary Polynomial Features. In Proceedings of the 2021 Smart Technologies, Communication and Robotics (STCR)STCR, Sathyamangalam, India, 9–10 October 2021; pp. 7–11. [Google Scholar] [CrossRef]
Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.P.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. Cvpr 2017, 2, 4. [Google Scholar]
Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Computer Vision – ECCV 2018 Workshops; Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI); Springer: Cham, Switzerland, 2019; Volume 11133, pp. 63–79. [Google Scholar] [CrossRef]
Le-Tien, T.; Nguyen-Thanh, T.; Xuan, H.P.; Nguyen-Truong, G.; Ta-Quoc, V. Deep Learning Based Approach Implemented to Image Super-Resolution. J. Adv. Inf. Technol. 2020, 11, 209–216. [Google Scholar] [CrossRef]
Milton, M.A.A. Automated Skin Lesion Classification Using Ensemble of Deep Neural Networks in ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection Challenge. arXiv 2019, arXiv:1901.10802. [Google Scholar]
Naeem, A.; Farooq, M.S.; Khelifi, A.; Abid, A. Malignant Melanoma Classification Using Deep Learning: Datasets, Performance Measurements, Challenges and Opportunities. IEEE Access 2020, 8, 110575–110597. [Google Scholar] [CrossRef]
Hu, Z.; Tang, J.; Wang, Z.; Zhang, K.; Zhang, L.; Sun, Q. Deep Learning for Image-Based Cancer Detection and diagnosis − A Survey. Pattern Recognit. 2018, 83, 134–149. [Google Scholar] [CrossRef]
Srivastava, V.; Kumar, D.; Roy, S. A Median Based Quadrilateral Local Quantized Ternary Pattern Technique for the Classification of Dermatoscopic Images of Skin Cancer. Comput. Electr. Eng. 2022, 102, 108259. [Google Scholar] [CrossRef]
Patil, P.; Ranganathan, M.; Patil, H. Ship Image Classification Using Deep Learning Method BT—Applied Computer Vision and Image Processing; Iyer, B., Rajurkar, A.M., Gudivada, V., Eds.; Springer: Singapore, 2020; pp. 220–227. [Google Scholar]
Barua, S.; Patil, H.; Desai, P.; Manoharan, A. Deep Learning-Based Smart Colored Fabric Defect Detection System; Springer: Berlin/Heidelberg, Germany, 2020; pp. 212–219. ISBN 978-981-15-4028-8. [Google Scholar]
Sarkar, A.; Maniruzzaman, M.; Ahsan, M.S.; Ahmad, M.; Kadir, M.I.; Taohidul Islam, S.M. Identification and Classification of Brain Tumor from MRI with Feature Extraction by Support Vector Machine. In Proceedings of the 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 5–7 June 2020; Volume 2, pp. 9–12. [Google Scholar] [CrossRef]
Agyenta, C.; Akanzawon, M. Skin Lesion Classification Based on Convolutional Neural Network. J. Appl. Sci. Technol. Trends 2022, 3, 14–19. [Google Scholar] [CrossRef]
Saarela, M.; Geogieva, L. Robustness, Stability, and Fidelity of Explanations for a Deep Skin Cancer Classification Model. Appl. Sci. 2022, 12, 9545. [Google Scholar] [CrossRef]
Alam, M.J.; Mohammad, M.S.; Hossain, M.A.F.; Showmik, I.A.; Raihan, M.S.; Ahmed, S.; Mahmud, T.I. S2C-DeLeNet: A Parameter Transfer Based Segmentation-Classification Integration for Detecting Skin Cancer Lesions from Dermoscopic Images. Comput. Biol. Med. 2022, 150, 106148. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Comparison of sample images with their respective ESRGAN enhanced images.

Figure 2. Overview of the proposed ESRGAN-based CNN algorithm.

Figure 3. Layered architecture of proposed CNN model.

Figure 4. Accuracy graph for training and testing for protocol-I.

Figure 5. Training and testing losses for protocol-I.

Figure 6. Confusion matrix for protocol-I (80:20 Train:Test split).

Figure 7. Accuracy graph for training and testing for protocol-II.

Figure 8. Training and testing losses for protocol-II.

Figure 9. Confusion matrix for protocol-II.

Figure 10. Accuracy graph for training and testing for protocol-III.

Figure 11. Training and testing losses for Protocol-III.

Figure 12. Confusion matrix for Protocol-III.

Table 1. Class-wise images present in HAM10000 dataset.

Class	akiec	bcc	bkl	df	nv	vasc	mel
Images	327	514	1099	115	6705	142	1113

Table 2. Notations for each class in the HAM10000 database.

Class	akiec	bcc	bkl	df	nv	vasc	mel
Label	0	1	2	3	4	5	6

Table 3. CNN Model Summary.

Layer	Output Shape	Parameters
Input Layer	[(None, 28, 28, 3)]	0
Convolution 2D_1	(None, 28, 28, 32)	896
MaxPooling2D_1	(None, 14, 14, 32)	0
Batch Normalization_1	(None, 14, 14, 32)	128
Convolution 2D_2	(None, 14, 14, 64)	18,496
Convolution 2D_3	(None, 14, 14, 64)	36,928
MaxPooling2D_2	(None, 7, 7, 64)	0
Batch Normalization_2	(None, 7, 7, 64)	256
Convolution 2D_4	(None, 7, 7, 128)	73,856
Convolution 2D_5	(None, 7, 7, 128)	147,584
MaxPooling2D_3	(None, 3, 3, 128)	0
Batch Normalization_3	(None, 3, 3, 128)	512
Convolution 2D_6	(None, 3, 3, 256)	295,168
Convolution 2D_7	(None, 3, 3, 256)	590,080
Batch Normalization_4	(None, 1, 1, 256)	0
Flatten	(None, 256)	0
Dropout	(None, 256)	0
Dense_1	(None, 256)	65,792
Batch Normalization_5	(None, 256)	1024
Dense_2	(None, 128)	32,896
Batch Normalization_6	(None, 128)	512
Dense_3	(None, 64)	8256
Batch Normalization_7	(None, 64)	256
Dense_4	(None, 32)	2080
Batch Normalization_8	(None, 32)	128
Classifier	(None, 7)	231

Table 4. Hyper parameters for training the model.

Parameter	Value
Batch size	128
Number of epochs	25
Number of iterations	294
Optimizer	Adam
Optimizer parameters	Lr = 0.00001

Table 5. Performance metrics and their formulas.

Performance Metrics	Formula	Equation
Accuracy	$\frac{(T_{n} + T_{p})}{(T_{p} + F_{p} + F_{n} + T_{n})}$	(7)
F1-Score	$\frac{(2 * P r e c i s i o n * R e c a l l)}{(P r e c i s i o n + R e c a l l)}$	(8)
Recall	$\frac{T_{p}}{(T_{p} + F_{n})}$	(9)
Precision	$\frac{T_{p}}{T_{p} + F_{p}}$	(10)

Table 6. Class-wise images present in HAM10000 dataset with protocol-I.

Class	akiec	bcc	bkl	df	nv	vasc	mel
Training Samples	5383	5352	5408	5417	5325	5341	5322
Testing Samples	1322	1353	1297	1288	1380	1364	1383

Table 7. Class-wise performance measures of the model.

Lesion Class	Precision	Recall	F1-Score
0-akiec	1.00	1.00	1.00
1-bcc	0.99	1.00	0.99
2-bkl	0.97	1.00	0.99
3-df	1.00	1.00	1.00
4-nv	1.00	0.92	0.96
5-vasc	1.00	1.00	1.00
6-mel	0.96	1.00	0.98

Table 8. Highest Accuracy for protocol-I.

Sr. No.	Work	Data Augmentation/Balancing? (Yes/No). Total Number of Images after Data Augmentation/Balancing	Methodology	Accuracy (%)
1	Agyenta et al. []	Yes, 7283	InceptionV3	85.80%
			ResNet50	86.69%
			DenseNet201	86.91%
2	Qian et al. []	Yes, Not mentioned	Grouping of Multi-scale Attention Blocks (GMAB)	91.6%
3	Shetty et al. []	Yes, 1400	Convolutional neural network (CNN)	95.18%
4	Panthakkan et al. []	No	Concatenated Xception-ResNet50 -	97.8%
5	Proposed algorithm	Yes, 46,935	ESRGAN-CNN	98.77%

Table 9. Class-wise images present in the HAM10000 dataset with protocol-II.

Class	akiec	bcc	bkl	df	nv	vasc	mel
Training Samples	4845	4817	4867	4875	4792	4807	4790
Validation Samples	538	535	541	542	533	534	532
Testing Samples	1322	1353	1297	1288	1380	1364	1383

Table 10. Highest accuracy for protocol-II.

Sr. No.	Work	Data Augmentation/Balancing? (Yes/No). Total Number of Images after Data Augmentation/Balancing	Methodology	Accuracy (%)
1	Onur et al. []	Yes, Not Mentioned	Convolutional neural network (CNN)	91.51%
2	Saarela et al. []	No	Deep Convolutional neural network (CNN)	80%
3	Proposed algorithm	Yes, 46,935	ESRGAN-CNN	98.36%

Table 11. Class-wise images present in HAM10000 dataset with protocol-III.

Class	akiec	bcc	bkl	df	nv	vasc	mel
Training Samples	5346	5557	5338	5184	5500	5513	5579
Validation Samples	594	617	593	576	611	613	620
Testing Samples	660	686	659	640	679	681	689

Table 12. Highest accuracy for Protocol-III.

Sr. No.	Work	Data Augmentation/Balancing? (Yes/No). Total Number of Images after Data Augmentation/Balancing	Methodology	Accuracy (%)
1	Aldhyani et al. []	Yes, 54,907	Lightweight Dynamic Kernel Deep-Learning-Based Convolutional Neural Network	97.8%
2	Alam et al. []	No	S2C-DeLeNet	90.58%
3	Proposed algorithm	Yes, 46,935	ESRGAN-CNN	98.89%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Skin Cancer Classification Framework Using Enhanced Super Resolution Generative Adversarial Network and Custom Convolutional Neural Network

Simple Summary

Abstract

1. Introduction

2. Related Work

3. Materials and Methods

3.1. HAM10000 Dataset

3.1.1. Actinic Keratosis (akiec)

3.1.2. Basal Cell Carcinoma (bcc)

3.1.3. Benign Keratosis-Like Lesions (bkl)

3.1.4. Dermatofibroma (df)

3.1.5. Melanocytic Nevi (nv)

3.1.6. Vascular Lesions (vasc)

3.1.7. Melanoma (mel)

4. Proposed Methodology

4.1. Pre-Processing

4.1.1. Enhanced Super-Resolution Generative Adversarial Network (ESRGAN)

4.1.2. Data Augmentation

4.2. Custom Convolutional Neural Network

4.3. Building a Custom CNN Model

5. Results and Discussion

5.1. Performance Metrics

5.2. Protocol-I (Train:Test = 80:20 Ratio)

Various Approaches That Follow Protocol-I

5.3. Protocol II

Various Approaches That Follow Protocol-II

5.4. Protocol III

Various Approaches That Follow Protocol-III

6. Conclusions and Future Scope

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics