SkinNet-INIO: Multiclass Skin Lesion Localization and Classification Using Fusion-Assisted Deep Neural Networks and Improved Nature-Inspired Optimization Algorithm

Background: Using artificial intelligence (AI) with the concept of a deep learning-based automated computer-aided diagnosis (CAD) system has shown improved performance for skin lesion classification. Although deep convolutional neural networks (DCNNs) have significantly improved many image classification tasks, it is still difficult to accurately classify skin lesions because of a lack of training data, inter-class similarity, intra-class variation, and the inability to concentrate on semantically significant lesion parts. Innovations: To address these issues, we propose an automated deep learning and best feature selection framework for multiclass skin lesion classification in dermoscopy images. The proposed framework begins with a preprocessing step for contrast enhancement using a new technique based on dark channel haze reduction and top-bottom filtering. In the next step, three pre-trained deep learning models are fine-tuned and trained using the transfer learning concept. In the fine-tuning process, we added and removed a few layers to reduce the number of parameters and later selected the hyperparameters using a genetic algorithm (GA) instead of manual assignment; the purpose of hyperparameter selection using the GA is to improve the learning performance. After that, a deeper layer is selected for each network and deep features are extracted. The extracted deep features are fused using a novel serial correlation-based approach. This technique reduces the feature vector length compared with the simple serial-based approach, but a little redundant information remains. To address this issue, we propose an improved Antlion optimization algorithm for best feature selection. The selected features are finally classified using machine learning algorithms. Main Results: The experimental process was conducted using two publicly available datasets, ISIC2018 and ISIC2019. Employing these datasets, we obtained accuracies of 96.1% and 99.9%, respectively.
A comparison with state-of-the-art techniques was also conducted and shows that the proposed framework improves accuracy. Conclusions: The proposed framework successfully enhances the contrast of the cancer region. Moreover, the selection of hyperparameters using automated techniques improved the learning process of the proposed framework. The proposed fusion and improved selection process maintain the best accuracy and shorten the computational time.

In medical imaging, the convolutional neural network (CNN) shows improved recognition performance [15]. By employing the deep backbone of a CNN, a deeper layer is selected for deep feature extraction [21]. Much research has been conducted in this domain in the last couple of years incorporating deep learning methods [22]. Despite this, many challenges still exist in this domain, including low-contrast infected lesions, variations in the shape of lesions, similarities in the colors of different skin lesion classes, imbalanced skin classes, and a few more. Based on these challenges, there is room to improve lesion detection and multiclass classification accuracy. Hence, in this article, the following challenges are addressed: (i) imbalanced skin classes bias predictions toward the classes with a higher number of images, which impacts the prediction performance of the other classes; (ii) low-contrast skin lesions reduce the lesion localization accuracy; (iii) variations in lesion shape and texture may lead to segmentation of an incorrect region, from which irrelevant features (incorrect-region features, healthy-region features, and extra features that are not required for classification) are later extracted. In addition, multiclass skin lesions have a high similarity in shape, color, and appearance; therefore, it is also difficult to correctly recognize the true class.
Major Contributions: Our major contributions are as follows:
• Proposal of a hybrid contrast enhancement technique using the fusion of top-bottom filtering and a haze reduction technique.
• Fine-tuning of three pre-trained CNN architectures and training using transfer learning. For the training of the deep learning models, a genetic algorithm is employed for the selection of hyperparameters instead of manual selection.
• Proposal of a serial-controlled positive correlation approach for the fusion of the trained networks' features.
• Development of an improved optimization algorithm, named Antlion, for feature selection.
The manuscript is organized so that Section 2 describes the related work on skin lesion approaches. Section 3 describes the proposed methodology, followed by Section 4, which elaborates on and discusses the experimental setup, results, and comparisons with existing methods. Finally, the conclusion is given in Section 5.


Related Work
It has been extensively investigated how to automatically diagnose skin cancer [23,24]. Deep learning algorithms show significant success in the area of medical imaging, especially for the identification of skin cancer [25]. The main components of traditional automated skin cancer diagnosis approaches are developing handcrafted features and using machine learning classifiers for classification [26]. A CAD system consists of a few important steps, such as preprocessing of original dermoscopy images, lesion detection using segmentation techniques, handcrafted feature extraction, feature selection, and classification using machine learning classifiers. Recently, CNNs that can learn hierarchical features have had considerable success with medical image processing, especially for skin cancer recognition [27].
Kassem et al. [28] discussed the importance of deep learning techniques for the classification of skin cancer. They discussed extensively the importance of deep learning for better skin lesion classification, the complexity of deep learning techniques, and the most current stage of development. Hauser et al. [29] presented an explainable AI framework for skin lesion diagnosis. Zhang et al. [30] presented an attention-mechanism CNN model for skin lesion recognition, in which each attention block jointly used residual learning to improve representation learning. The experiments were conducted on the ISIC2017 dataset and showed improved recognition accuracy. Anand et al. [31] presented a fusion of U-NET and CNN architectures for skin lesion detection and classification. They used the U-NET architecture to detect lesions from the input dermoscopy images, whereas a CNN architecture was employed for the classification. The HAM10000 dataset was employed to validate the proposed framework, which obtained an accuracy above 97%. Fayadh et al. [32] introduced a wavelet transform and CNN-based architecture to diagnose skin lesions. The unwanted information was removed by employing the concepts of wavelets and max pooling. Then, a residual neural network was proposed and features were extracted by employing the concept of transfer learning. The extracted features were classified using an ELM classifier, obtaining improved accuracy on the ISIC2017 and HAM10000 datasets.
Simon et al. [33] provided an interpretable deep learning framework for skin lesion segmentation and classification. The main strength of this work was categorizing the tissues into 12 dermatological classes. After that, they trained a deep CNN using these characteristics for final classification. They tested the introduced framework on dermatoscopy images and compared it with clinical accuracy. During the comparison phase, the clinical method achieved an accuracy of 93.6%, whereas the computerized method attained 97.9%. This shows that the computerized methods perform better than the clinical techniques. Javeria et al. [34] introduced an integrated model of preprocessing, segmentation, feature extraction, and deep feature fusion. Firstly, they resized the images and converted RGB into a luminance channel; then they used the Otsu algorithm and biorthogonal 2-D wavelet transform to segment the affected part of the skin. After that, pre-trained AlexNet and VGG16 were used to extract the deep features. Then, the optimal feature set was obtained using PCA for further classification. Al-Masni et al. [35] devised an integrated diagnostic paradigm encompassing skin lesion segmentation and classification. Inception-v3, ResNet-50, Inception-ResNet-v2, and DenseNet-201 were deployed in the DL FRCN framework using dermatoscopic images to segment regions of interest, followed by classification over the segmentation results. The proposed integrated DL model works acceptably on different types of skin lesions. The model was evaluated on a balanced, segmented, and augmented dataset, including the International Skin Imaging Collaboration (ISIC) and its variants in 2016, 2017, and 2018.
The overall weighted prediction accuracies for the Inception-v3, ResNet-50, Inception-ResNet-v2, and DenseNet-201 classifiers are 77.04%, 79.95%, 81.79%, and 81.27% for the two ISIC2016 classes; 81.29%, 81.57%, 81.34%, and 73.44% for the three ISIC2017 classes; and 88.05%, 89.28%, 87.74%, and 88.70% for the four ISIC2018 classes. Pacheco et al. [36] used the thirteen best deep learning networks and concluded that the SENet convolutional neural network with Adam optimization was the best architecture among all the neural networks; the proposed model obtained 91% performance on the ISIC2019 dataset. Farooq et al. [37] introduced a model to enhance the classification performance by up to 86% by incorporating MobileNet and InceptionNet. For these models, Kaggle's updated dataset of skin cancer was utilized to check their performance. Esteva et al. [38] conducted a pioneering CNN-based research work to detect and classify skin lesion datasets. Lui et al. [39] defined a deep learning model with DenseNet and ResNet using the MFL module. The proposed work achieved an effective accuracy of 87% on the ISIC2017 database for skin lesion classification. Pedro et al. [40] proposed a Feedforward Neural Network (FNN) classification model and a linear SVM on the Dermofit dataset. Their setup produced an accuracy level of 90% on the selected dataset. Milton et al. [41] depicted a comprehensive study of multiple deep learning techniques for skin cancer. They conducted the experiments on the publicly available ISIC2018 dataset, fed to multiple neural networks, including Inception-ResNet-V2, PNASNet-5, SENet-154, and Inception-V4. The PNASNet-5 model was the best performer, at a 76% accuracy level.
Khatib et al. [42] presented a ResNet-101 architecture for skin lesion classification. They fine-tuned the architecture by employing transfer learning (TL) to differentiate the various forms of skin lesions and achieved an accuracy level of 90% on the well-known PH2 database. Alizadeh et al. [43] deployed the VGG19 NN model using kernel principal component analysis (KPCA) and attained 85.2% accuracy on the ISIC2016 dataset. Almaraz et al. [44] used the ABCD rule-based technique after extracting handcrafted color, shape, and texture features. These features were then given to a MobileNetV2 neural network for melanoma categorization. The proposed technique achieved 92.4% accuracy using the HAM10000 dataset. Reis et al. [45] employed a DL approach for skin lesion identification and segmentation. The suggested technique was investigated on three widely accessible datasets, ISIC2018, ISIC2019, and ISIC2020, where the prediction accuracy was enhanced to 90.1%, 90.2%, and 91.3%, respectively. Khan et al. [8] presented an improved subdivision combinatorial architecture (IMFO) consisting of moth+flame optimization and DL classification for skin lesion classification. Furthermore, they extended the model to minimize the time taken in diagnosing skin cancer. The IMFO architecture was tested on the PH2, ISBI 2016, 2017, and 2018 datasets and obtained accuracy levels of 98.70%, 95.38%, 95.79%, and 92.69%, respectively. The architecture was also tested on the HAM10000 dataset, where it reflected a precision level of 90.67%, which represents an improvement. Khan et al. [46] presented another intelligent system based on deep neural networks for complex skin cancer categories. The authors suggested a two-stream DNN information fusion framework for classifying multiclass skin cancer. Firstly, a fusion-based contrast enhancement technique was suggested, in which the enhanced images were fed to the pre-trained DenseNet201 architecture.
These features were modified utilizing the skewness-controlled moth+flame optimization approach. After that, stream deep features were captured and down-sampled using fine-tuned MobileNetV2 pre-trained systems and a proposed feature selection structure. The proposed technique was tested on three unbalanced datasets, named HAM10000, ISBI2018, and ISIC2019, producing accuracy levels of 96.5%, 98%, and 89%, respectively. These discussed methods focused on detection and classification using deep learning and machine learning classifiers. They did not focus on the fusion of features from different sources. They also ignored the process of best feature selection, which can help reduce the computational time. To address these important challenges, a new AI-based fully automated framework is proposed for skin lesion classification.

Proposed Methodology
The proposed methodology is illustrated in Figure 2. Figure 2 reflects that, firstly, the dataset is preprocessed, and the enhanced dataset is then fed to the fine-tuned DL models for training based on transfer learning to extract deep features. Secondly, the extracted features are passed through the feature fusion process. Finally, an updated Antlion optimization approach is employed to obtain an optimized feature vector.

Datasets Description
This paper uses two variants of ISIC datasets, including 2018 and 2019, for the experimental process.
ISIC2018: This dataset was generated in the year 2018 by ISIC. It is a collection of 10,014 training images and 55,834 testing images. Dermoscopy technology is employed for capturing the RGB images. This dataset has seven classes: Akiec, Bcc, Bkl, Df, Mel, Nv, and Vasc. Table 1 summarizes and highlights the overall class distribution within the dataset.
ISIC2019: This dataset was generated in the year 2019 by ISIC. It is a collection of 20,685 training images and 47,514 testing images. Dermoscopy technology is employed for capturing the RGB images. This dataset has seven classes: AK, BCC, BKL, DF, MEL, NV, and VASC. Table 2 summarizes and highlights the overall class distribution within the dataset.


Novelty 1: Lesion Enhancement
In this work, a hybrid technique is employed for contrast enhancement. In the first step, a haze reduction technique is employed, where the input image is refined, followed by applying top-bottom filtering to improve local and global contrast [47]. The step-wise haze reduction process is given below.
Step 1: The haze image model is given below:

I(x, y) = J(x, y) T(x, y) + L (1 − T(x, y)),

where I, J, L, and T represent the intensity, scene radiance, atmospheric light, and transmission map, respectively. The scene radiance J is recovered using the algorithm of [48] from the estimated atmospheric light and the computed transmission map.
Step 2: Consider λ(x, y) an input image of dimension N × M × K, where N = M = 256 and K = 3. Let λ̃_nz(x, y) denote the haze-reduced image of the same dimensions. Top-hat and bottom-hat filtering are then applied and combined as

λ_enh(x, y) = λ̃_nz(x, y) + T_hat(λ̃_nz) − B_hat(λ̃_nz),

where T_hat(λ̃_nz) = λ̃_nz − (λ̃_nz ∘ S) is the top-hat transform (opening ∘ with structuring element S) and B_hat(λ̃_nz) = (λ̃_nz • S) − λ̃_nz is the bottom-hat transform (closing •). Adding the bright top-hat detail and subtracting the dark bottom-hat detail improves the local contrast. The visual output of this process is illustrated in Figure 3.
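The two steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: the patch size, ω, clipping floor t0, and structuring-element size are illustrative assumptions, as are the helper names (`dehaze`, `top_bottom_enhance`).

```python
# Sketch of the hybrid enhancement: dark-channel-prior haze removal
# (I = J*T + L*(1 - T)) followed by top-hat/bottom-hat contrast enhancement.
import numpy as np
from scipy.ndimage import minimum_filter, grey_opening, grey_closing

def dehaze(img, omega=0.95, patch=15, t0=0.1):
    """Recover scene radiance J from a hazy RGB image in [0, 1]."""
    dark = minimum_filter(img.min(axis=2), size=patch)          # dark channel
    # Atmospheric light L: mean color of the brightest dark-channel pixels.
    L = img.reshape(-1, 3)[np.argsort(dark.ravel())[-10:]].mean(axis=0)
    # Transmission map T, clipped away from zero for numerical stability.
    T = 1.0 - omega * minimum_filter((img / L).min(axis=2), size=patch)
    T = np.clip(T, t0, 1.0)[..., None]
    return np.clip((img - L) / T + L, 0.0, 1.0)                 # J = (I - L)/T + L

def top_bottom_enhance(img, size=9):
    """Add top-hat (bright detail), subtract bottom-hat (dark detail)."""
    opened = grey_opening(img, size=(size, size, 1))
    closed = grey_closing(img, size=(size, size, 1))
    return np.clip(img + (img - opened) - (closed - img), 0.0, 1.0)

def enhance(img):
    return top_bottom_enhance(dehaze(img))
```

Both stages operate per image, so the sketch can be mapped over a dataset before augmentation and training.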



Data Augmentation
This is a process in which the data/data points are artificially increased using the existing data for better training, identification, and classification in the later stages. The advantage of data augmentation is that it improves model learning by providing a large amount of data. Also, the cost of operations related to data collection is reduced. The details given in Tables 2 and 3 show that the total number of original images is 20,685. Before data augmentation, the contrast of the real images is improved using the proposed contrast-enhancement technique. After applying augmentation, the selected datasets were updated and are shown in Tables 3 and 4. A few sample augmented images are illustrated in Figure 4.
Table 3. Updated ISIC2018 Skin dataset images after data augmentation.
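A minimal sketch of such an augmentation pass follows. The paper does not list its exact operations; the flips and 90-degree rotations below are common choices for dermoscopy images and are used here only for illustration.

```python
# Illustrative augmentation: each image yields itself plus four transformed
# copies (assumed operations, not necessarily the paper's exact set).
import numpy as np

def augment(img):
    """Return the original image plus flipped/rotated copies."""
    return [
        img,
        np.fliplr(img),      # horizontal flip
        np.flipud(img),      # vertical flip
        np.rot90(img, k=1),  # 90-degree rotation
        np.rot90(img, k=3),  # 270-degree rotation
    ]

def augment_dataset(images):
    out = []
    for img in images:
        out.extend(augment(img))
    return out
```

Applied after the contrast-enhancement step, this multiplies the per-class image counts, which is how the "before/after augmentation" columns in Tables 3 and 4 arise.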



Modified Models
In this work, different DL models were fine-tuned to obtain high-performance accuracy. These are explained in detail below.
Fine-Tuned DarkNet19: The model is fine-tuned by removing the final four layers (average-pool, fully connected, softmax, and classification), since the original model is pre-trained on the 1000 classes of the ImageNet dataset. The original model is shown in Figure 5. During the fine-tuning process, four new layers are added: an average-pooling 2D layer, a fully connected layer, a softmax layer, and a classification layer. The DarkNet19 model is then trained through transfer learning. In the training process, several hyperparameters are adjusted: the learning rate is 0.001, the mini-batch size is 20, the momentum is 0.07, the optimizer is stochastic gradient descent, and the maximum number of epochs is 100. Finally, the trained model extracts features from the global average pooling (GAP) layer.
Fine-Tuned ResNet18: The ResNet18 DL model consists of 18 layers. The architecture is a fully connected combination of convolutional, pooling, softmax, and classification layers, and uses a pooling layer named 'pool5' for feature extraction. The pre-trained version of the network is trained on more than a million images from the ImageNet dataset. The architecture of ResNet18 is depicted in Figure 6. During the fine-tuning phase, the last four layers (average-pool, fully connected, softmax, and classification) are deleted; the previous fully connected layer had been trained on the 1000 ImageNet classes. Four new layers are then added: an average-pooling 2D layer, a fully connected layer, a softmax layer, and a classification layer. The ResNet18 model is trained using TL with the same hyperparameter settings (learning rate 0.001, mini-batch size 20, momentum 0.07, stochastic gradient descent optimizer, and a maximum of 100 epochs). Finally, the trained model extracts features from the pool5 layer.
Fine-Tuned InceptionV3: The InceptionV3 DL model consists of 48 layers. The architecture contains a fully connected combination of convolutional, pooling, softmax, and classification layers, and a pooling layer named 'avg-pool' is used for feature extraction. This model was previously trained on over one million photos from the ImageNet dataset; it is mostly used for image recognition and has a 78.1% accuracy rate. The architecture of the fine-tuned InceptionV3 is depicted in Figure 7. During the fine-tuning phase, the last four layers (average-pool, fully connected, softmax, and classification) are deleted; the previous fully connected layer had been trained on the 1000 ImageNet classes. Next, four new layers are added: an average-pooling 2D layer, a fully connected layer, a softmax layer, and a classification layer, followed by TL to train the InceptionV3 model. The same hyperparameters are initialized and adjusted during training: learning rate 0.001, mini-batch size 20, momentum 0.07, stochastic gradient descent optimizer, and a maximum of 100 epochs. Finally, the trained model is used to extract features from the avg-pool layer.
Transfer Learning: In this section, TL [50] is discussed for this work. The domain, denoted by F = {Z, R(Z)}, is made up of two parts, i.e., a feature space Z and a marginal probability distribution R(Z), where Z = {z | z_i ∈ Z, i = 1, ..., M} and M is the number of instances in the dataset.
After that, the task is defined: when presented with a particular domain F, the task is represented as T = {W, f(.)}, including two factors, a label space W and a mapping function f(.), where W = {w | w_i ∈ W, i = 1, ..., M} is the label set for the relevant instances in F. The mapping function f(.), generally written as f(z) = R(w|z), is a nonlinear implicit function that bridges the gap between the anticipated judgment derived from the proposed datasets and the input instance. The label spaces between these tasks also allow for the specification of different goals; different fault classes and categories might be conceived of as distinct tasks.
Transfer learning, supplied with a source domain F_s = {Z_s, R_s(Z_s)} with the source task T_s = {W_s, f_s(.)} and a target domain F_T = {Z_T, R_T(Z_T)} with the target task T_T = {W_T, f_T(.)}, looks for a better mapping function f_T(.) for the target task T_T by utilizing transferable knowledge from the source domain F_s and task T_s. Unlike traditional ML and DL, where the domain and task of the source and target situations are identical, i.e., F_s = F_T and T_s = T_T, TL solves challenges where the source and target domains and/or tasks diverge, i.e., F_s ≠ F_T and/or T_s ≠ T_T.
Deep TL may be defined as follows based on the above concept: given a transfer learning challenge, deep TL aims to learn the mapping function f_{S→T}(.) by leveraging a sophisticated DL model, i.e., a DNN. Proposed Work Process: The process of transfer learning for feature extraction in this work is depicted in Figure 8. All three selected fine-tuned models are trained on the skin datasets using the concept of transfer learning. Deep features are extracted from the global average pooling layer of each model, yielding feature vectors of different dimensions. During the training of the deep models, hyperparameters such as the learning rate, momentum, L2 regularization factor, and mini-batch size are selected through the GA. The resultant values are given in the section above. The extracted features are further fused using a novel fusion technique (presented in Section 3.5).
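The GA-based hyperparameter selection can be sketched as follows. This is a generic genetic algorithm over a discrete search space, not the paper's exact configuration: the candidate value grids, population size, number of generations, and mutation rate are all illustrative assumptions, and `fitness` stands in for the validation accuracy of a trained model.

```python
# Hedged sketch: GA over hyperparameter tuples (values and GA settings assumed).
import random

SEARCH_SPACE = {                      # hypothetical candidate values
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "momentum": [0.05, 0.07, 0.9],
    "l2_factor": [1e-5, 1e-4, 1e-3],
    "batch_size": [16, 20, 32],
}

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in SEARCH_SPACE}

def mutate(ind, rng, rate=0.2):
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def ga_select(fitness, pop_size=10, generations=15, seed=0):
    """fitness(ind) -> validation accuracy; higher is better."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]              # truncation selection, elitist
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)
```

In the paper's pipeline, `fitness` would train a fine-tuned network briefly with the candidate hyperparameters and return its validation accuracy; elitism ensures the best configuration found is never lost.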

Novelty: Feature Fusion and Optimization
Deep extracted features are fused using a serial correlation-based approach in this work. The main purpose of this approach is to first serially fuse all the features and then compute correlations over feature pairs. A total of four steps are performed in this fusion approach:
• Serially fuse all vectors, as shown in Figure 3.
• Obtain a combined vector of dimension N × K.
• Find the correlation of each row feature vector and keep the most highly correlated features.
• Check the fitness of each row using the Fine-KNN classifier.
In the end, the positively correlated and the weakly correlated features are again serially fused into separate vectors. Both vectors are evaluated with the fitness function, and the one with better accuracy is retained. This complete process is defined in Algorithm 1: Step 1: Fuse all vectors in a serial fashion. Step 2: Form sets of φ 4 using a 2 × 2 window size.
Step 3: Find the correlation of each set using the correlation equation. Step 4: Place features with positive correlation in a feature vector φ 5k and features with weak correlation in φ 6k. Step 7: Fuse φ 5k and φ 6k separately into two new feature vectors and find the fitness of each.
Step 8: Based on the fitness, take the feature set with the highest accuracy for further processing. Output: positive correlation vector (the higher accuracy value in this work) ← φ 5k. The fused feature vector is further refined using an improved nature-inspired algorithm, antlion optimization with mean deviation (ALO-MD).
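A minimal numpy sketch of the serial fusion and correlation split described above. The pairing here is a simplified stand-in for the 2 × 2 windowing, and the Fine-KNN fitness check is omitted; function names are illustrative:

```python
import numpy as np

def serial_fuse(*vectors):
    """Step 1: concatenate the per-model feature vectors end to end."""
    return np.concatenate(vectors)

def split_by_correlation(features, window=2):
    """Steps 2-4 (simplified): slide a window over the fused vector,
    correlate adjacent windows, and separate positively correlated
    features from weakly/negatively correlated ones."""
    pos, weak = [], []
    for i in range(0, len(features) - window, window):
        a = features[i:i + window]
        b = features[i + window:i + 2 * window]
        r = np.corrcoef(a, b)[0, 1]
        (pos if r > 0 else weak).extend(a)
    return np.array(pos), np.array(weak)

f1, f2, f3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([2.0, 1.0])
fused = serial_fuse(f1, f2, f3)
pos, weak = split_by_correlation(fused)
```

In the full method, both resulting vectors would then be scored with the Fine-KNN fitness function and the better one kept (Steps 7-8).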
Mirjalili [36] developed a novel nature-inspired optimization approach called antlion optimization (ALO). The ALO algorithm is constructed around the inherent hunting mechanism of antlions.
Motivation: Antlions (doodlebugs) belong to the family Myrmeleontidae in the order Neuroptera [51]. They hunt mainly as larvae, while the adult stage is devoted to reproduction. Digging into the sand, antlion larvae move in a circular motion and throw out sand with their massive jaws. After excavating the cone-shaped trap, the larva hides beneath its bottom, waiting for insects, particularly ants, to fall in. The antlion attempts to seize any prey it discovers in the snare.
Insects, on the other hand, try to escape and are not always captured immediately. The antlion then expertly throws sand toward the edge of the pit, causing the prey to slide to the bottom. Prey caught in the jaws is consumed beneath the sand. After devouring the victim, the antlion flings the remains out of the pit and prepares it for the next hunt. A further interesting aspect of antlion behavior is the relationship between trap size and two variables: hunger level and moon shape.

Artificial Antlion
Using the preceding description of antlions, Mirjalili devised the following rules for the optimization process:
• Ants, as prey, wander across the search space using various random walks.
• Antlion traps influence the ants' random walks.
• Antlions dig holes in proportion to their fitness: the greater the fitness, the larger the hole.
• Antlions with wider holes are more likely to capture ants.
• The antlion with the highest fitness in each iteration can catch any ant.
• The span of the random walk is adaptively reduced to simulate ants sliding toward antlions.
Input: a search space, a fitness function, the number of ants, the number of antlions, and the maximum number of iterations (T).
Output: the fitness of the elite antlion.
1. Generate a random population of n ant positions and n antlion positions.
2. Determine the fitness of each ant and antlion.
3. while the stopping criterion is not met, do
4.   for each ant i, do
5.     Select an antlion using a roulette wheel (trap building).
6.     Build and normalize a random walk for ant i; see Equations (5) and (6) for trapping, Equation (7) for the random walk, and Equation (9) for walk normalization.
7.   end for
8.   If an antlion becomes fitter than the elite, update the elite.
9. end while

Method 1: Antlion Optimization Algorithm (ALO)
• If an ant becomes fitter than an antlion, the antlion grabs it and drags it beneath the sand.
• After each hunt, an antlion repositions itself at the most recently caught prey and digs a hole to maximize its chances of catching new prey.
Under the conditions above, an antlion optimizer can be built as in Method 1.

Building trap:
The hunting skill of antlions is modeled using a roulette wheel. Each ant is assumed to be restricted to a single antlion. Throughout the optimization, the ALO algorithm selects antlions depending on their fitness using a roulette-wheel operator. This technique increases the likelihood of fitter antlions catching ants.
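The roulette-wheel operator can be sketched as follows; this is a minimal numpy version assuming non-negative fitness values on a maximization problem:

```python
import numpy as np

def roulette_wheel(fitness, rng):
    """Select an antlion index with probability proportional to its
    fitness (assumes non-negative fitness on a maximization problem)."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()                      # normalize to a distribution
    return rng.choice(len(p), p=p)       # fitter antlions win more often

rng = np.random.default_rng(0)
# Three antlions with fitness 1, 3, and 6: the fittest should be
# selected roughly 60% of the time over many draws.
picks = [roulette_wheel([1.0, 3.0, 6.0], rng) for _ in range(1000)]
```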
Catching prey and re-building the hole: in the final step of the hunt, the antlion consumes the ant. An ant is considered caught when it becomes fitter than its corresponding antlion, i.e., when it slides into the sand and the antlion seizes it. The antlion then updates its position to the latest position of the hunted ant, maximizing its potential for catching new prey. In this sense, Equation (1) is proposed:

Antlion_j^t = Ant_i^t if f(Ant_i^t) > f(Antlion_j^t), (1)

where t indicates the current iteration, Antlion_j^t represents the position of the selected j-th antlion at the t-th iteration, and Ant_i^t represents the position of the i-th ant at the t-th iteration. According to the algorithm, the antlion optimizer performs the following stages for each ant:
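A hedged sketch of this replacement rule, assuming a maximization fitness; the toy fitness function and names here are illustrative:

```python
import numpy as np

def catch_and_rebuild(antlion_pos, ant_pos, fitness):
    """Eq. (1): if the ant becomes fitter than its antlion, the antlion
    relocates to the ant's position to improve its next hunt."""
    if fitness(ant_pos) > fitness(antlion_pos):   # maximization assumed
        return ant_pos.copy()
    return antlion_pos

f = lambda x: -np.sum(x ** 2)        # toy fitness: closer to the origin is fitter
lion = np.array([2.0, 2.0])
ant = np.array([0.5, -0.5])
new_lion = catch_and_rebuild(lion, ant, f)   # ant is fitter, so the lion moves
```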

Sliding ants towards Antlion:
Sand is thrown from the center of the pit when an antlion finds an ant inside the trap. This action hinders the trapped ant's attempt to escape. To model this behavior numerically, the radius of the ants' random-walk hypersphere is reduced adaptively; see Equations (8)-(10):

a^s = a^s / I, (8)
b^s = b^s / I, (9)

where a^s is the minimum of all variables at the s-th iteration, b^s is the maximum of all variables at the s-th iteration, and I is a ratio defined as

I = 10^u · s / S, (10)

where s is the current iteration, S is the maximum number of iterations, and u is a constant set by the current iteration: u = 2 when s > 0.1S, u = 3 when s > 0.5S, u = 4 when s > 0.75S, u = 5 when s > 0.9S, and u = 6 when s > 0.95S. Essentially, the constant u controls the precision of exploitation.

Trapping in Antlion's holes
The sliding of the ant is captured by simulating the movement of food toward the targeted antlion's pit. In other words, the position of the selected antlion now bounds how far the ant can travel. Adjusting the range of the ant's random walk to the antlion's position can be expressed using Equations (11) and (12):

a_i^s = Antlion_j^s + a^s, (11)
b_i^s = Antlion_j^s + b^s, (12)

where a^s is the minimum of all variables at the s-th iteration, b^s is the vector containing the maximum of all variables at the s-th iteration, a_i^s is the minimum of all variables for the i-th ant, b_i^s is the maximum of all variables for the i-th ant, and Antlion_j^s shows the position of the selected j-th antlion at the s-th iteration.
The random walk itself is built as a cumulative sum of unit steps, where each step is +1 if rand > 0.5 and −1 otherwise, and rand is a random number drawn from a uniform distribution in the range [0, 1]; see Equation (15). The walk is then min-max normalized into the current bounds:

X_i^s = (X_i^s − α_i)(b_i^s − a_i^s) / (β_i − α_i) + a_i^s,

where α_i is the minimum of the random walk in the i-th variable, β_i is the maximum of the random walk in the i-th variable, a_i^s is the minimum of the i-th variable at the s-th iteration, and b_i^s is the maximum of the i-th variable at the s-th iteration.
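A sketch of the walk construction and its min-max normalization; the bounds c and d stand for the per-ant limits around the chosen antlion, and the step rule follows the description above:

```python
import numpy as np

def random_walk(steps, rng):
    """Cumulative sum of +/-1 steps, starting at 0 (Eq. (15)-style walk)."""
    r = rng.random(steps)
    return np.concatenate(([0.0], np.cumsum(np.where(r > 0.5, 1.0, -1.0))))

def normalize_walk(X, c, d):
    """Min-max rescale the raw walk into the per-iteration bounds [c, d]."""
    a, b = X.min(), X.max()
    return (X - a) * (d - c) / (b - a) + c

rng = np.random.default_rng(1)
X = random_walk(50, rng)
Xn = normalize_walk(X, c=-2.0, d=2.0)   # walk now confined to [-2, 2]
```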

Elitism
The best solution(s) are maintained across iterations through elitism. In this scheme, the antlion selected by the roulette wheel and the elite antlion jointly guide each ant's random walk; therefore, moving a given ant takes the average of both random walks; see Equation (16):

Ant_i^s = (R_A^s + R_E^s) / 2, (16)

where R_A^s is the random walk around the antlion selected by the roulette wheel, and R_E^s is the random walk around the elite antlion.
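Since the elitism step reduces to a simple average of the two guiding walks, it can be sketched in one line (names are illustrative):

```python
import numpy as np

def move_ant(walk_roulette, walk_elite):
    """Eq. (16): an ant's new position is the average of the random walk
    around the roulette-selected antlion and the walk around the elite."""
    return (np.asarray(walk_roulette) + np.asarray(walk_elite)) / 2.0

pos = move_ant([1.0, 3.0], [3.0, 1.0])   # midpoint of the two walks
```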

Results and Discussion
This section analyzes and presents the findings based on various performance indicators, covering the test design, data collection, recall values, quantitative results, graphical representations, and tables, with an emphasis on the relative inefficiency of the other classifiers.

Experimental Setup
All computations used 10-fold cross-validation on the dataset. The learning rate is set to 0.05, the mini-batch size is restricted to 32, and 100 iterations are used for CNN architecture learning. The best model is validated using performance measures such as accuracy, time taken, sensitivity rate, precision rate, number of observations, FNR, Fowlkes-Mallows index, and F1 score. Several classifiers are used to validate the suggested approach in terms of the greatest accuracy and minimum time consumed. MATLAB 2022a was employed to execute the simulation studies on a desktop PC with a Core i7 CPU, 16 GB of memory, and an 8 GB graphics card.

Table 5 contains the classification outcomes for the ISIC2018 dataset using the DarkNet19 deep model. The fine-tuned model was trained using the enhanced dataset, which was also used to extract features from the second-to-last feature layer. Several classifiers were used, but Quadratic SVM outperformed them with an accuracy of 86.3%, a recall rate of 87.27%, a precision rate of 87.2%, an F1 score of 87.24%, and an AUC value of 0.98. Each classifier's computational time is also calculated, as shown in Table 5. The Fine Tree classifier's lowest recorded time is 108.46 s, while the Medium Neural Network's highest recorded time is 2978.1 s.

The classification outcomes of the ISIC2018 dataset for the ResNet18 deep model are shown in Table 5 (second half). Numerous classifiers were used, but Quadratic SVM performed best, achieving an accuracy of 88.3%, a recall rate of 89.39%, a precision rate of 89.13%, an F1 score of 89.26%, and an AUC value of 0.98. The computational time is also computed for each classifier, as shown in Table 5. Compared with experiment 1 (Table 5), the maximum accuracy for this experiment is 88.3%, whereas for the first experiment it was 86.3%. Hence, the fine-tuned ResNet18 model gives better accuracy. The lowest noted time is 52.616 s for the Fine Tree classifier, whereas the maximum observed time is 1112.6 s for the Medium Neural Network.
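The scalar measures reported in these tables follow the standard confusion-matrix definitions; a small illustrative sketch for the binary case, with hypothetical counts:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, recall (sensitivity), precision, F1 score, and FNR
    computed from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    rec = tp / (tp + fn)                 # sensitivity / recall rate
    prec = tp / (tp + fp)                # precision rate
    f1 = 2 * prec * rec / (prec + rec)   # harmonic mean of the two
    fnr = fn / (fn + tp)                 # false negative rate
    return acc, rec, prec, f1, fnr

acc, rec, prec, f1, fnr = metrics(tp=90, fp=10, fn=10, tn=90)
```

For the multiclass results here, such measures would be averaged over the lesion classes.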

ISIC2018 Dataset Results:
The classification outcomes for the ISIC2018 dataset for the InceptionV3 deep model are shown in Table 5 (third section). Although several classifiers were used, Quadratic SVM outperformed them all with an accuracy of 90.9%, a recall rate of 92.63%, a precision rate of 92.03%, an F1 score of 92.32%, and an AUC value of 0.99. Each classifier's computing time is also calculated, as shown in Table 5. The maximum accuracy for this experiment is 90.9%, compared to 86.3% for the first experiment and 88.3% for the second. Hence, the fine-tuned InceptionV3 model provides better accuracy. The minimum noted time is 129.68 s for the Fine Tree classifier, whereas the maximum observed time is 4601.7 s for the Medium Neural Network.
The classification outcomes of the proposed fusion technique on the enhanced ISIC2018 skin dataset are given in Table 6. Many classifiers were used; however, Quadratic SVM outperformed them all with an accuracy of 96.1%, a recall rate of 96.93%, a precision rate of 96.33%, an F1 score of 96.62%, and an AUC value of 0.98. Each classifier's computational time is also calculated, as shown in Table 6. Compared with the previous experiments (Table 5), the maximum accuracy for this experiment is 96.1%, whereas the maximum accuracies for the first, second, and third experiments were 86.3%, 88.3%, and 90.9%, respectively. Hence, the fusion process increases accuracy over the individual deep models. The minimum noted time is 290.756 s for the Fine Tree classifier, whereas the maximum observed time is 8693 s for the Medium Neural Network. The confusion matrix of this experiment is shown in Figure 9.

Table 7 shows the results of the proposed feature selection technique on the enhanced ISIC2018 dataset. The Quadratic SVM classifier outperformed the rest with an accuracy of 96.0%, a recall rate of 96.86%, a precision rate of 96.3%, an F1 score of 96.56%, and an AUC value of 0.99. Each classifier's computational time is also calculated, as shown in Table 7. Moreover, the confusion matrix is illustrated in Figure 10, which shows the correct prediction rate for each class. The highest accuracy for this experiment is 96.0%, compared with experiments 1, 2, and 3 (Table 5): the maximum accuracies for the first, second, third, and fourth experiments were 86.3%, 88.3%, 90.9%, and 96.1%, respectively.
In conclusion, comparing with Table 5 shows that the optimization step maintains accuracy while decreasing computing time: the accuracy changed only a little, but the computational time dropped significantly compared to the previous experiment. Hence, overall, the proposed framework and the optimization process show improvement. The lowest noted time is 130.94 s for the Fine Tree classifier, whereas the maximum observed time is 2525.7 s for Medium KNN.

ISIC2019 Dataset Results:
The classification outcomes of the ISIC2019 dataset using the DarkNet19 deep model are shown in Table 8. The fine-tuned model was trained using the augmented dataset, which was also used to extract features from the second-to-last feature layer. Weighted KNN outperformed the other classifiers, achieving an accuracy of 99.7%, a recall rate of 99.73%, a precision rate of 99.71%, an F1 score of 99.72%, and an AUC value of 1.00. Each classifier's computing time is also calculated, as shown in Table 8. The Fine Tree classifier's minimum noted time is 245.51 s, whereas the bi-layer neural network's highest recorded time is 2123.6 s.
The classification outcomes of the ISIC2019 dataset for the ResNet18 deep model are shown in Table 8 (second half). Several classifiers were used; however, Weighted KNN outperformed them all with an accuracy of 99.5%, a recall rate of 99.53%, a precision rate of 99.59%, an F1 score of 99.56%, and an AUC value of 1.00. Each classifier's processing time is also calculated. This experiment obtained a maximum accuracy of 99.5%.
The classification outcomes of the ISIC2019 dataset for the InceptionV3 deep model are shown in Table 8 (third section). Many classifiers were used; however, Weighted KNN outperformed them all with an accuracy of 99.7%, a recall rate of 99.66%, a precision rate of 99.69%, an F1 score of 99.36%, and an AUC value of 1.00. Overall, this experiment's performance is better than that of the previous experiments.

The classification outcomes of the fusion process on the enhanced ISIC2019 skin dataset are given in Table 9. Many classifiers were used; however, Medium KNN outperformed them all with an accuracy of 99.9%, a recall rate of 99.86%, a precision rate of 99.88%, an F1 score of 99.88%, and an AUC value of 1.00. Each classifier's processing time is also calculated, as shown in Table 9. Moreover, Figure 11 shows the Medium KNN confusion matrix, which verifies the correct prediction rate. Compared with the previous three experiments, the accuracy of the proposed fusion process is significantly improved. After the fusion process, we employed the proposed feature selection technique. Several classifiers were used; however, Weighted KNN outperformed them all with an accuracy of 99.9%, a recall rate of 99.89%, a precision rate of 99.89%, an F1 score of 99.88%, and an AUC value of 1.00.
Each classifier's computing time is also calculated, as shown in Table 10. Moreover, Figure 12 shows the Weighted KNN confusion matrix, from which the correct prediction rate of each cancer class can be verified. Compared with Experiments 1-4 (Tables 8 and 9), the maximum accuracy for this experiment is 99.9%, whereas the maximum accuracies for the first, second, third, and fourth experiments were 99.7%, 99.5%, 99.7%, and 99.9%, respectively. Overall, the accuracy of the fusion process is maintained, while computational time is significantly reduced by the feature selection technique.
In the end, a time-based comparison of the intermediate steps is conducted on the selected datasets. Table 11 presents the computational time comparison on the ISIC2018 dataset. This table shows that the time noted for the ResNet18 model is less than that of DarkNet19 and InceptionV3, except for the Bagged Tree classifier. However, after the fusion process, the time jumped to almost double, which is a drawback of this framework. This drawback was resolved through the proposed optimization approach, which maintains accuracy while reducing computational time significantly compared to the fusion process. For DarkNet19, the minimum time is 108.46 s.

Finally, the proposed framework's accuracy is compared with several recent studies, as presented in Table 13. Based on this table, the accuracy of the proposed framework is significantly improved. In addition, a few publicly available AI-based dermatoscopy techniques are compared with the proposed method. In [52], the authors obtained AUC values of 0.970 on ISIC2019 and 0.932 on ISIC2018 using the ADAE technique, whereas our method obtained 0.99. In [53], the authors obtained an accuracy of 96.10%, whereas the proposed method obtained 99.8%. Table 14 presents a summary of all the best results based on additional performance measures such as the Fowlkes-Mallows index, MCC, and Kappa. Overall, the proposed method shows improved accuracy.

Conclusions
Today, the death of patients due to the late or incorrect diagnosis of cancer is a serious issue. Early diagnosis of cancer cases using a CAD system can help reduce the death rate. An appropriate CAD system can complement the work of dermatologists in classifying skin lesions (benign or melanoma). This work proposes a deep learning- and optimization-based end-to-end framework for multiclass skin lesion classification. Initially, a contrast enhancement technique was proposed based on dark channel haze and top-bottom filtering, which improved image quality and the strength of the deep features. Hyperparameters of the fine-tuned models were initialized using a genetic algorithm instead of manual initialization. After that, deep features were extracted and fused using a serial correlation-based approach. The fusion process improved the accuracy, but computational time increased. A selection technique called improved antlion optimization was developed to make the framework more time-efficient. The best features are selected using this approach and classified using machine learning classifiers. The experimental process was conducted on two publicly available datasets, ISIC2018 and ISIC2019, obtaining improved accuracies of 96.1% and 99.9%, respectively.

Limitations
- A detailed analysis is required of the max-pooling operation with sizes 2 × 2, 3 × 3, and 4 × 4 in the weight preprocessing process.
- The augmentation process improved the accuracy, but it also significantly increased the number of redundant features.
- KNN classifiers drop the classification accuracy, which needs proper analysis.
- The fusion process improved the accuracy, but computational time also increased due to the enlarged number of predictors.

Future Directions
A residual block-based attention network will be designed in the future, and more layers will be added based on the Grad-CAM approach. This will allow the max-pooling layer weights to be analyzed, helping to improve the proposed model. In addition, the experimental process will be conducted on the ISIC2020 dataset.