Skin Lesion Detection Using Hand-Crafted and DL-Based Features Fusion and LSTM

The abnormal growth of cells in the skin causes two types of tumor: benign and malignant. Various methods, such as imaging and biopsies, are used by oncologists to assess the presence of skin cancer, but these are time-consuming and require extra human effort. However, some automated methods have been developed by researchers based on hand-crafted feature extraction from skin images. Nevertheless, these methods may fail to detect skin cancers at an early stage if they are tested on unseen data. Therefore, in this study, a novel and robust skin cancer detection model was proposed based on features fusion. First, our proposed model pre-processed the images using a GF filter to remove the noise. Second, the features were manually extracted by employing local binary patterns (LBP), and Inception V3 for automatic feature extraction. Aside from this, an Adam optimizer was utilized for the adjustments of learning rate. In the end, LSTM network was utilized on fused features for the classification of skin cancer into malignant and benign. Our proposed system employs the benefits of both ML- and DL-based algorithms. We utilized the skin lesion DermIS dataset, which is available on the Kaggle website and consists of 1000 images, out of which 500 belong to the benign class and 500 to the malignant class. The proposed methodology attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% F-score. We compared the performance of our features fusion-based method with existing segmentation-based and DL-based techniques. Additionally, we cross-validated the performance of our proposed model using 1000 images from International Skin Image Collection (ISIC), attaining 98.4% detection accuracy. The results show that our method provides significant results compared to existing techniques and outperforms them.


Introduction
Skin melanoma is a life-threatening disease that appears on areas of the body more exposed to sunlight. To analyze the abnormal growth of cells on skin, biomedical imaging plays a vital role. Moles are assessed through imaging techniques, which aid in the early detection and treatment of chronic diseases [1]. Cells and biological tissues are examined to investigate illness of the human organs. These organs are scanned with the help of medical imaging modalities such as X-ray, immunohistochemistry images, ECG, and MRI [2]. CT (SPECT), PET scan, X-ray and MRI are some of the medical imaging modalities that are = frequently utilized by medical experts and physicians. The physiological functions and biological structure of various organs can be analyzed by using these imaging modalities [3]. Moreover, numerous types of tumor, i.e., brain, blood, skin, and lungs are mostly analyzed and detected using CT scans. Dysplasia is clinically visualized with autofluorescence imaging (AFI), high-resolution endoscopy (HRE), narrow-band imaging (NBI), and confocal laser endo-microscopy (CLE) and optical coherence tomography (OCT) [4]. Tumor tissues, adipose tissues, and breast cancer can be easily diagnosed by X-ray mammography [5,6]. X-ray mammography is painful; therefore, advanced microwave imaging techniques are

•
To propose a novel features fusion-based technique for the early detection of skin cancer. First, images were pre-processed using GF to remove the noise. Second, we extracted features from the images using LBP and Inception V3. Then, we fused these features and employed an LSTM network for the binary classification into malignant and benign. Additionally, we used an Adam optimizer to adjust the learning rate for Inception V3. • Our proposed model is an efficient technique due to its hybrid architecture that extracts most representative features and employs Long Short-Term Memory (LSTM) for the classification.

•
We trained our classifier on 75% dataset and performed various experiments for the assessment of the proposed system, demonstrating its efficacy • We cross-validated our proposed model, and the experiments showed that it significantly outperformed the existing techniques.

•
Our proposed features fusion-based model is simple and easy to execute.
The remaining paper is organized as follows: Section 2 presents related work, Section 3 explains our proposed method, Section 4 demonstrates various experiments, and Section 5 is the conclusion.

Related Work
Initially, traditional machine learning-based techniques were introduced for melanoma detection and classification. Codella et al. developed a traditional ML-based features extraction method employing the color and edge histogram along with local binary pattern (LBP). The shades of gray algorithm is used for image pre-processing in [1]. Moreover, Mask R-CNN, a deep learning approach, is used for the segmentation of skin lesions. Morphological operations were performed to remove noise in the images. The detection of skin lesions from dermoscopy images consisting of three steps was performed in [22], such as: (1) image pre-processing to improve the performance of the model dividing them into negative and positive classes, (2) image augmentation applied on data to protect the model from overfitting, (3) the use of Densnet-121 to extract the features and the proposal of a U-net architecture-based lightweight CNN model for skin lesion detection. The results of the proposed method are not mentioned in terms of confusion matrix and accuracy.
H. A. Hasan proposed a framework, i.e., hybrid detection techniques for skin cancer detection, in which images were converted from RGB to grayscale. The list of arrays was converted into NumPy array, and benign images were labeled as 0 and malignant images as 1. The dataset was divided into train and test samples, and the CNN model was trained with K-fold validation. The Xception model achieved 85.303% accuracy [23], whereas other tested models such as MobileNet-v2, Resnet 50, and VGG19 achieved lower accuracy, with MobileNet-v2 producing the lowest at 54.54%. The lower accuracy was due to using a dataset with low-quality images. In [24], skin cancer detection was performed by employing a machine learning model, i.e., Support Vector Machine (SVM). Features were extracted by the gray level co-occurrence matrix (GLCM) method and fed to SVM for the detection of skin lesions. The method achieved 95% accuracy; however, it could be improved by utilizing some image pre-processing methods on the dataset to remove noise and improve the training process.
In [25], M. R. Ibraheem performed the contrast-limited adaptive histogram equalization (CLAHE) technique to enhance the images of lesions, employing bilinear interpolation and threshold equalization methods. Moreover, the pixel-based method was applied for the segmentation of lesions and feature extraction. The classes were named as 0 to 2, where 0 represented a background object, 1 indicated the benign lesion, and 2 represented melanoma. Gradient Boosted Tree (GBT) gave 97.5% accuracy. In [26], Rahajeng, M. Nuh used different techniques, such as median filter, threshold and automatic cropping as image pre-processing operations. Furthermore, active contour and Sobel filters were applied for skin lesion segmentation based on shape, color, and texture features, which were obtained using the GLCM method. SVM was employed as a classifier to identify the type of skin lesion, achieving 85% accuracy. The performance of the system was lacking in high precision and accuracy.
G. S. Jayalakshmi and V. S. Kumar initially resized images into 400 × 400 pixels and fed these into an image input layer. A convolution layer with 32 filters was used for feature extraction, changing the parameter of input distribution, and adaptive re-parameterization was applied using a batch normalization layer to overcome the internal covariant shift [27]. ReLU was used as an activation function after max-pooling and fully connected layers. Accuracy, recall, F1-score, and precision were calculated as performance metrics. The learning rate was set as 0.001, and 89.3% accuracy with a loss factor of 0.2633 was attained; however, the accuracy can be improved by improving the training options and customizing the layers of the proposed method. The authors of [28] used CNN to classify an image as benign or malignant. The International Skin Collaboration (ISIC) 2016 dataset was used for training, which consisted of images with dimensions of 1024 × 767 pixels. These images comprised three types, i.e., melanoma was classified as malignant, whereas nevus and seborrhea keratosis were classified as benign. The generalized Gaussian distribution method was used for image segmentation with a convolution neural network employing ReLU as an activation function to classify the images. The performance of the model was calculated in terms of accuracy, specificity, and sensitivity. The accuracy of the proposed method was 98.32%; however, the model is not validated on other datasets. Y. Filali, H. El Khoukhi focused on classification of a skin lesion by decomposing the images into object component and texture. Segmentation was applied on objects to obtain the desired area; then, the segmented area and texture were combined. A convolutional layer was used for feature extraction to learn the hidden patterns [29] requiring high computational resources. A pooling layer was used after the convolution layer to reduce the spatial size, and fully connected layers were added to use the extracted features for classification of the skin lesion. Softmax was used as an activation function to classify melanoma as malignant and nevus as benign. The classification results were evaluated through accuracy, specificity, and sensitivity. The proposed method achieved an accuracy of 93.50%, which can further be increased. N. Rezaoana classified nine classes-NV, DF, MEL, VASC, BCC, AKIEC, BKL, squamous carcinoma, and seborrhea keratosis-using the CNN model. Data augmentation on images such as flip, rotation, shear, etc., were performed to increase the volume of data for training. Convolutional layers were used for feature extraction, the max pooling layer was utilized to reduce the dimensionality, and softmax was used after the pair of fully connected layers as an activation function [30]. Precision, F1-score, recall, and accuracy were used as the performance measurements of the model. VGG-16 and VGG-19 were also analyzed and achieved an accuracy of 69.57% and 71.19%, respectively, whereas the proposed model achieved an accuracy of about 79.45%; however, this accuracy could be increased by improving the model architecture. Various existing techniques are reported in Table 1.

Materials and Methods
In Figure 1, the workflow of the proposed system is presented. The figure demonstrates that the proposed model consisted of pre-processing utilizing the Gaussian Filtering method to minimize the noise present in the images. Then, features extraction was performed using LBP and the deep learning model, i.e., the Inception V3 algorithm. Moreover, an Adam optimizer was exploited to optimize the rate of learning for the Inception technique. Ultimately, Long Short-Term Memory (LSTM) was utilized to detect and classify the skin images into melanoma and benign. and BOF [32]  High computational resources

Materials and Methods
In Figure 1, the workflow of the proposed system is presented. The figure demonstrates that the proposed model consisted of pre-processing utilizing the Gaussian Filtering method to minimize the noise present in the images. Then, features extraction was performed using LBP and the deep learning model, i.e., the Inception V3 algorithm. Moreover, an Adam optimizer was exploited to optimize the rate of learning for the Inception technique. Ultimately, Long Short-Term Memory (LSTM) was utilized to detect and classify the skin images into melanoma and benign.

Gaussian Filtering:
To improve the images and remove the noise, 2D Gaussian filters were employed. These required high computational resources; however, they provide a new area to conduct research. In this sub-stage, Gaussian operators represent convolutional operators, and smoothing is recommended by convolution. The one-dimensional Gaussian operator is given below as: Diagnostics 2022, 12, x FOR PEER REVIEW Figure 1. Architecture of the proposed skin cancer detector.

Gaussian Filtering:
To improve the images and remove the noise, 2D Gaussian fil These required high computational resources; however, they provid duct research. In this sub-stage, Gaussian operators represent conv and smoothing is recommended by convolution. The one-dimension is given below as: The significant filter for the smoothing of images passes throug frequency and spatial domains, whereas the relation of uncertainty i below: Two-dimensional operator for Gaussian filter is given below: Here, sigma (ϭ) is the standard deviation for a Gaussian function imum, the smoothing will be highest, whereas x and y exhibit the C of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the dat algorithms, i.e., local binary patterns (LBP) and a histogram of o (HOG) with the Inception v3 method. Moreover, we utilized the lea along with the Adam optimizer to improve the detection performanc

Gaussian Filtering:
To improve the images and remove the noise, 2D Gaussian filters were These required high computational resources; however, they provide a new a duct research. In this sub-stage, Gaussian operators represent convolutiona and smoothing is recommended by convolution. The one-dimensional Gaussi is given below as: The significant filter for the smoothing of images passes through localiz frequency and spatial domains, whereas the relation of uncertainty is execute below: Two-dimensional operator for Gaussian filter is given below: Here, sigma (ϭ) is the standard deviation for a Gaussian function. If the va imum, the smoothing will be highest, whereas x and y exhibit the Cartesian c of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the data using th algorithms, i.e., local binary patterns (LBP) and a histogram of orientation (HOG) with the Inception v3 method. Moreover, we utilized the learning rat along with the Adam optimizer to improve the detection performance.

Local Binary Patterns:
LBP has been utilized in various domains, such as knee disease detectio The significant filter for the smoothing of images passes through localization in the frequency and spatial domains, whereas the relation of uncertainty is executed as shown below: Two-dimensional operator for Gaussian filter is given below: duct research. In this sub-stage, Gaussian operators represent conv and smoothing is recommended by convolution. The one-dimensiona is given below as: The significant filter for the smoothing of images passes throug frequency and spatial domains, whereas the relation of uncertainty is below: Two-dimensional operator for Gaussian filter is given below: Here, sigma (ϭ) is the standard deviation for a Gaussian function imum, the smoothing will be highest, whereas x and y exhibit the Ca of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the data algorithms, i.e., local binary patterns (LBP) and a histogram of or (HOG) with the Inception v3 method. Moreover, we utilized the lear along with the Adam optimizer to improve the detection performanc

Local Binary Patterns:
LBP has been utilized in various domains, such as knee disease disease detection [41], etc. As a separate vector, the histograms were namely as a pattern vector. The LBP texture features were integrate organizing map (SOM), which finds an alternative use in assessing the posed technique. LBP is a method for texture description depending u symptoms from central and neighboring pixels. A binary code was att the threshold technique on all pixel values utilizing the mid pixel. known as the binary pattern. The value of the neighbor pixel was set a of the pixel became greater than the value of the threshold. This was value of the pixel was less than the value of the threshold. Succeedin was utilized to compute the occurrence of BP, and each pattern ex value for the pattern in an image.
The LBP module uses the intermediate pixel's value as a threshol 3 × 3 pixel. The threshold was employed by deploying a binary patt texture features. The LBP procedure is defined in equation form below

Gaussian Filtering:
To improve the images and remove the noise, 2D Gaussian filters wer These required high computational resources; however, they provide a new duct research. In this sub-stage, Gaussian operators represent convolution and smoothing is recommended by convolution. The one-dimensional Gauss is given below as: The significant filter for the smoothing of images passes through locali frequency and spatial domains, whereas the relation of uncertainty is execut below: Two-dimensional operator for Gaussian filter is given below: Here, sigma (ϭ) is the standard deviation for a Gaussian function. If the v imum, the smoothing will be highest, whereas x and y exhibit the Cartesian of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the data using algorithms, i.e., local binary patterns (LBP) and a histogram of orientati (HOG) with the Inception v3 method. Moreover, we utilized the learning ra along with the Adam optimizer to improve the detection performance.

Local Binary Patterns:
LBP has been utilized in various domains, such as knee disease detect disease detection [41], etc. As a separate vector, the histograms were integr namely as a pattern vector. The LBP texture features were integrated along organizing map (SOM), which finds an alternative use in assessing the efficac posed technique. LBP is a method for texture description depending upon th symptoms from central and neighboring pixels. A binary code was attained b the threshold technique on all pixel values utilizing the mid pixel. The bi known as the binary pattern. The value of the neighbor pixel was set as 1 wh of the pixel became greater than the value of the threshold. This was set as value of the pixel was less than the value of the threshold. Succeeding this, was utilized to compute the occurrence of BP, and each pattern exhibited value for the pattern in an image.
The LBP module uses the intermediate pixel's value as a threshold to the 3 × 3 pixel. The threshold was employed by deploying a binary pattern tha texture features. The LBP procedure is defined in equation form below: To improve the images and remove the noise, 2D Gaussian filters were employed. These required high computational resources; however, they provide a new area to conduct research. In this sub-stage, Gaussian operators represent convolutional operators, and smoothing is recommended by convolution. The one-dimensional Gaussian operator is given below as: The significant filter for the smoothing of images passes through localization in the frequency and spatial domains, whereas the relation of uncertainty is executed as shown below: Two-dimensional operator for Gaussian filter is given below: Here, sigma (ϭ) is the standard deviation for a Gaussian function. If the value is maximum, the smoothing will be highest, whereas x and y exhibit the Cartesian coordinates of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the data using the fusion of algorithms, i.e., local binary patterns (LBP) and a histogram of orientation gradients (HOG) with the Inception v3 method. Moreover, we utilized the learning rate scheduler along with the Adam optimizer to improve the detection performance.

Local Binary Patterns:
LBP has been utilized in various domains, such as knee disease detection [40], eye disease detection [41], etc. As a separate vector, the histograms were integrated in LBP, namely as a pattern vector. The LBP texture features were integrated along with a selforganizing map (SOM), which finds an alternative use in assessing the efficacy of the proposed technique. LBP is a method for texture description depending upon the differential symptoms from central and neighboring pixels. A binary code was attained by employing the threshold technique on all pixel values utilizing the mid pixel. The binary code is known as the binary pattern. The value of the neighbor pixel was set as 1 where the value of the pixel became greater than the value of the threshold. This was set as 0, where the value of the pixel was less than the value of the threshold. Succeeding this, a histogram was utilized to compute the occurrence of BP, and each pattern exhibited the possible value for the pattern in an image.
The LBP module uses the intermediate pixel's value as a threshold to the neighboring 3 × 3 pixel. The threshold was employed by deploying a binary pattern that referred to texture features. The LBP procedure is defined in equation form below: represents the central and neighboring pixel values, whereas index i demonstrates the index of the neighbor pixel as for v < 0 the g(v) = 0 and g(v) = 1 for v ≥ 0. Consequently, the nearest ) is the standard deviation for a Gaussian function. If the value is maximum, the smoothing will be highest, whereas x and y exhibit the Cartesian coordinates of the image, demonstrating the window dimensions.

Features Extraction (FE)
Features extraction was performed after pre-processing the data using the fusion of algorithms, i.e., local binary patterns (LBP) and a histogram of orientation gradients (HOG) with the Inception v3 method. Moreover, we utilized the learning rate scheduler along with the Adam optimizer to improve the detection performance.

Local Binary Patterns:
LBP has been utilized in various domains, such as knee disease detection [40], eye disease detection [41], etc. As a separate vector, the histograms were integrated in LBP, namely as a pattern vector. The LBP texture features were integrated along with a selforganizing map (SOM), which finds an alternative use in assessing the efficacy of the proposed technique. LBP is a method for texture description depending upon the differential symptoms from central and neighboring pixels. A binary code was attained by employing the threshold technique on all pixel values utilizing the mid pixel. The binary code is known as the binary pattern. The value of the neighbor pixel was set as 1 where the value of the pixel became greater than the value of the threshold. This was set as 0, where the value of the pixel was less than the value of the threshold. Succeeding this, a histogram was utilized to compute the occurrence of BP, and each pattern exhibited the possible value for the pattern in an image.
The LBP module uses the intermediate pixel's value as a threshold to the neighboring 3 × 3 pixel. The threshold was employed by deploying a binary pattern that referred to texture features. The LBP procedure is defined in equation form below: LBP (v c ,u c ) exhibits the LBP value for the middle pixel (v c , u c ). I(v c , u c ) and I i represents the central and neighboring pixel values, whereas index i demonstrates the index of the neighbor pixel as for v < 0 the g(v) = 0 and g(v) = 1 for v ≥ 0. Consequently, the nearest pixels may be 0, if the value of the score is less than the threshold. On the other side, it may be set to 1, where the value of the pixel is greater than the value of the threshold. LBP is computed through scalar multiplication among weight and binary matrices. In the end, the results are used to exhibit the value of LBP.

Inception V3 Using Adam Optimization
Convolutional neural networks (CNN) consist of five layers: the input layer, convolution layer, pooling layer, fully connected layer, and classification layer, which is known as the output layer. Google Net is a network installed in Google. It utilizes the Inception model as it bounds the attributes of layers and improves the depth. Consequently, it has been widely used for classification purposes. The general attributes of CNN were explained in [42]. Convolutional Layer: Convolutional layers may have variations when all pixels are not linked to previous layers through parameters and bias. The image is split into small regions and then parameters and biases are utilized. These biases and parameters are known as filters which are convolved with the tiny regions on the input image and provide feature maps. The filters are known as simple features that may be analyzed in images fed into the input layer, in that specific layer. Moreover, the number of parameters is important during convolutional operation and can be small, as the same filter is employed for a complete image against a single feature. The hyper-parameters of a convolutional layer include the size of the local region, the number of filters, padding, and stride. To attain optimal outcomes, the hyper-parameters are fine-tuned depending upon the size of the input image.
Pooling Layer (PL): PL is utilized to minimize the spatial dimensions of images, the number of parameters, and the complexity. It employs a constant method for an input without parameters. There are various types of PL, such as max, average, and stochastic pooling. The most common type is max pooling, which is applied when i x i slides through and reduces the input, according to stride 's'. The size of the input becomes limited and the maximum value in the i × i region is utilized. It provides translational invariance when a tiny difference in a position is used to assess an input image. Therefore, the location disappears while minimizing the size.
Fully Connected Layer: In this layer, the result from the final PL is provided as input. It works as a convolutional neural network with all neurons connected to the present layer. Therefore, the convolutional layer contains the maximum number of parameters. Then, the fully connected layer is attached to the final layer, which is known as the classification layer.
Activation Function (AF): Various AFs are employed for various types of CNN. The optimal output is attained through non-linear activation functions than tangent or sigmoid functions. These functions are employed to increase the speed of training. Therefore, different activation functions are employed, whereas ReLU demonstrates better performance than other models. Convolutional neural network (CNN) utilizes vector computation and the chain rule. Let us suppose that x is a scalar as x cR, and y cRh as a vector, where x is a function of y, representing the partial derivative of x in the context of y, which is mathematically described as: More specifically, ∂x ∂Y represents the vector comprising an equal size to Y, and the ith number component is represented by ∂x ∂Y i . It is noteworthy that Additionally, z cR w represents a different vector, whereas Y is a function of z. Moreover, the fractional derivative of Y in terms of z is represented by: In the fractional derivative h × w matrix, it is accessed at the interval of the i and j row and column, respectively, such as (∂Y_i)/(∂z_i). It is clearly shown that x is a task of z in chain arguments. Thus, one method maps z to Y, while another method maps Y to x. The chain method was employed for the computation presented below. ∂x ∂z t , as The loss computation function is used to compute the difference among the predicted value of the Convolutional Neural Network x l and the goal g, z 1 →w 1 , z 2 →, . . . . . . ,z l →w l = Diagnostics 2022, 12, 2974 8 of 19 x. The loss function is simple, as x = ||target-z l || 2 . The predicted output is represented as argmax i z i l . Therefore, a convolutional operation is computed as below: where f represents the filter with the size (h × w × d l ). Therefore, the conv. layer maintains the size (h l -h + 1) × (w l -w + 1), comprising d slices that represent Y(z l+1 ) in R h l+1 × w l+1 × w l+1 , h l+1 = h l − h + 1, w l + 1 = w l − w+1, and d l + 1 = d.
The likelihood of all labels k c{1, . . . ,k} is employed for the training of instance that is computed through P ; here, x refers to the non-normalized log possibility.
The ground truth is s(k|z), which is normalized in a way that ∑ k s(k|z) = 1, where loss is computed using cross-entropy and is represented as below: The value of cross-entropy is based on differentiation in the context of x k , and it is computed as the gradient training of deep functions as it has the easiest form ∂l ∂x k = p(k) − s(k), which varies from −1 to 1. The cross-entropy decreases due to the possibility of a maximum value of probability for an accurate label. Inception version 3 is referred to as mutual for the labels that are free of trained instances v(k) with a parameter € representing a training sample; then, the share label s(k|z) is easily returned as: In another case, the cross-entropy is computed as below: Therefore, the regularization for label smoothing is similar for the implication of cross-entropy loss h(s,p) and losses as h(s,p) and h(v,p).
Google Net is known as an Inception network due to its aim of acting similarly [43]. It comprises various versions, such as V1, V2, V3, V4, and Incep-ResNet. Moreover, the Inception network consists of three varying sizes of max-pooling and convolutional layers. Various channels pass through the network layers and, after convolutional operation, the non-linear fusion method is employed. In Figure 2, the basic architecture of the Inception network is shown. Version 3 of the Inception network employed by Keras is pre-trained on ImageNet. The size required for images is 299 × 299 with three channels. Comparatively, Inception V3 utilizes a convolutional kernel method to split integrals into the tiniest convolution. In any instance, 3 × 3 conv. is split into 1 × 3 and 3 × 1 convolutions. The number of attributes is limited; therefore, the network speed is enhanced while effectively extracting spatial features. The three different grid sizes are 8 × 8, 17 × 17, and 35 × 35. The structure of the basic Inception model is presented in Figure 3.

Learning Rate Scheduler
During the deep learning training stage, limiting the learning rate ∇ is suggested when the training accuracy increases. The number of parameters is improved during training, which is referred to as the learning rate (LR) or step size. More particularly, LR adjusts the hyper-parameter, which is employed to train the neural network, utilizing values from 0 to 1. The minimum values for the learning rate require more epochs for training, and the weights alterations are reduced. On the other hand, the more variations there are in LR, the less training epochs that are required by the neural network. The maximum LR creates a divergent training procedure, whereas minimum LR causes a slow convergence process. The procedure used to schedule LR is known as learning rate scheduling. There are various general LR schedulers, such as step, time, and exponential decay. An Adam optimizer is an estimator for moments that employ a method based on first-order gradient. It also relates to the adaptive prediction of low-order moments. Where refers to gradients, represents the weight at time T, and are 0 and 1, and represents LR.

Fusion Process
Features fusion is utilized in various machine learning and computer vision applications, such as medical imaging [40,41]. It provides a vital process that incorporates most

Learning Rate Scheduler
During the deep learning training stage, limiting the learning rate ∇ is suggested when the training accuracy increases. The number of parameters is improved during training, which is referred to as the learning rate (LR) or step size. More particularly, LR adjusts the hyper-parameter, which is employed to train the neural network, utilizing values from 0 to 1. The minimum values for the learning rate require more epochs for training, and the weights alterations are reduced. On the other hand, the more variations there are in LR, the less training epochs that are required by the neural network. The maximum LR creates a divergent training procedure, whereas minimum LR causes a slow convergence process. The procedure used to schedule LR is known as learning rate scheduling. There are various general LR schedulers, such as step, time, and exponential decay. An Adam optimizer is an estimator for moments that employ a method based on first-order gradient. It also relates to the adaptive prediction of low-order moments. Where refers to gradients, represents the weight at time T, and are 0 and 1, and represents LR.

Fusion Process
Features fusion is utilized in various machine learning and computer vision applications, such as medical imaging [40,41]. It provides a vital process that incorporates most

Learning Rate Scheduler
During the deep learning training stage, limiting the learning rate ∇ t is suggested when the training accuracy increases. The number of parameters is improved during training, which is referred to as the learning rate (LR) or step size. More particularly, LR adjusts the hyper-parameter, which is employed to train the neural network, utilizing values from 0 to 1. The minimum values for the learning rate require more epochs for training, and the weights alterations are reduced. On the other hand, the more variations there are in LR, the less training epochs that are required by the neural network. The maximum LR creates a divergent training procedure, whereas minimum LR causes a slow convergence process. The procedure used to schedule LR is known as learning rate scheduling. There are various general LR schedulers, such as step, time, and exponential decay. An Adam optimizer is an estimator for moments that employ a method based on first-order gradient. It also relates to the adaptive prediction of low-order moments. Where g T refers to gradients, ϑ T represents the weight at time T, β i and β j are 0 and 1, and Q represents LR. g 2 T represents the square of g T × g T . The initial settings were Q = 0.001, β i = 0.98, β j = 0.999, and c= 10 −7 .

Fusion Process
Features fusion is utilized in various machine learning and computer vision applications, such as medical imaging [40,41]. It provides a vital process that incorporates most of the features' maps. The proposed method is based on entropy for features fusion. Addition-ally, the attained features are merged to form a single vector. We computed three vectors here, as below: Features fusion was applied as below: Here, f denotes a fused feature vector. Afterwards, an entropy is computed for selective features based on the value, as shown below.
Here, p presents the probability of features and L he refers to an entropy. In the end, the selected features are fed to the classification network to distinguish the affected images.

Classification Using LSTM
A convolutional neural network is one of the most popular artificial neural networks (ANN) that performs mathematical linear operations on feature vectors known as convolutional [44]. CNN operates in two phases, i.e., the forward phase and the back propagation phase during training. Input and weights are multiplied with a matrix of the filter; then, a convolutional operation is performed to calculate the output, and this output is used for error computation in the forwarding phase. The parameters are adjusted during the back propagation phase to overcome the final prediction errors. Ground truth and output are compared to find errors using the cost function [45]. The gradient of the parameter is computed, and parameters are updated to minimize the error.
There are multiple layers in convolutional neural networks, such as the convolutional layer, pooling layer, normalization layer, fully connected layer, activation layer, and classification layer. CNN performs significantly in the problems related to image data. The layers' detail is shown in Table 2.

Experimental Evaluation
In this section, we discuss the environmental setup, the metrics employed for the assessment, and the various experiments used to analyze the performance of the proposed model.

Dataset
The dataset was collected from the skin lesion DermIS dataset, which is available on the Kaggle website [46]. There are two classes in the dataset: benign and malignant. Seventyfive percent of the images of each class were used to train the model, and the remaining 25% were used for validation purposes. There were 1000 images in total; 500 images belonged to the benign class and 500 to the malignant class, with dimensions of 600 × 450 pixels. Image pre-processing was performed to resize the images to dimensions of 227 × 227, and the resized images were provided as input to the image input layer. The images of the benign and malignant classes used to train our model are provided in Figure 4. The employed LSTM classifier's approach is shown in Figure 5.

Experimental Evaluation
In this section, we discuss the environmental setup, the metrics employed for the assessment, and the various experiments used to analyze the performance of the proposed model.

Dataset
The dataset was collected from the skin lesion DermIS dataset, which is available on the Kaggle website [46]. There are two classes in the dataset: benign and malignant. Seventy-five percent of the images of each class were used to train the model, and the remaining 25% were used for validation purposes. There were 1000 images in total; 500 images belonged to the benign class and 500 to the malignant class, with dimensions of 600 × 450 pixels. Image pre-processing was performed to resize the images to dimensions of 227 × 227, and the resized images were provided as input to the image input layer. The images of the benign and malignant classes used to train our model are provided in Figure 4. The employed LSTM classifier's approach is shown in Figure 5.

Metrics
To assess the proposed model, we utilized various metrics, such as precision, recall, accuracy, and F1-score. These metrics relied on true-positive (TP), false-positive (FP), truenegative (TN), and false-negative (FN). TP refers to the number of correctly classified im-

Metrics
To assess the proposed model, we utilized various metrics, such as precision, recall, accuracy, and F1-score. These metrics relied on true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN). TP refers to the number of correctly classified images by our proposed model; FP refers to the number of images that were incorrectly classified as other malignant images; FN denotes the number of diseased images that were incorrectly classified as normal; and TN refers to the number of images that were correctly classified as a negative class, such as normal or benign. Furthermore, precision refers to the fraction of TP over the total images classified as positive. The mathematical equation is given below.
The accuracy of the system indicates the correctly classified images by the proposed system. The equation is presented below.
The recall is the fraction of the classified positive class images to all images of the positive class, whether they were classified as a negative class by the system. The recall value closer to 1 refers to the better model. The recall equation is given below.
Another metric used for the proposed system was the F1-score. This is defined as a measure of the accuracy of the proposed model over the dataset. It is employed for binary classification models. The equation of the F1-score is given below.

Environmental Setup
We performed the experiments using a GPU NVIDIA card, i.e., GEFORCE GTX with 4 GB memory. The details of the employed hardware are shown in Table 3. The operating system was Windows 10 with a 16 GB RAM. The experiment was performed on the Anaconda framework.

Results
In this section, we discuss the results achieved by our proposed model during the testing phase. As shown in Figure 6, the ROC curve exhibits the significant performance of the proposed model. We attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% F-score. More specifically, we employed 750 images to train the proposed technique, including 375 images from the malignant class and 375 images from the benign class. As we employed transfer learning-based algorithms, we achieved considerable results while only using a small number of training samples. Then, we tested our proposed model over the remaining 30% of test data, i.e., 250 images belonging to the malignant and benign classes, respectively. We achieved 99.4% accuracy over the testing data, which demonstrates the tremendous performance of the proposed system. The number of TPs were 123, and TNs were also 123 by our proposed system. In particular, we believe that our proposed model has a significant ability to detect skin cancer in images and classify it into the malignant class. Furthermore, it also classifies non-affected images as benign images. of the proposed model. We attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% F-score. More specifically, we employed 750 images to train the proposed technique, including 375 images from the malignant class and 375 images from the benign class. As we employed transfer learning-based algorithms, we achieved considerable results while only using a small number of training samples. Then, we tested our proposed model over the remaining 30% of test data, i.e., 250 images belonging to the malignant and benign classes, respectively. We achieved 99.4% accuracy over the testing data, which demonstrates the tremendous performance of the proposed system. The number of TPs were 123, and TNs were also 123 by our proposed system. In particular, we believe that our proposed model has a significant ability to detect skin cancer in images and classify it into the malignant class. Furthermore, it also classifies non-affected images as benign images.

Comparison with Segmentation-Based Methods
The aim of [47] was the detection of skin cancer from images using segmentation and feature extraction methods. Image pre-processing was employed to improve the quality of images, and a median filter was applied to reduce the noise. A median filter of a 5 × 5 windows size was used after morphological operations erosion, and dilation was performed to remove extra artifacts such as hair and skin color. Binary mask and threshold techniques were applied to obtain the desired area of the image. The asymmetry, border irregularity, color, and diameter of the area were selected for the detection of skin cancer. The proposed method detected skin cancer with 90.83% accuracy. Manu Gofal [48] performed lesion boundary segmentation from dermoscopic images with the deep learning method. The PH2 and ISIC 2017 datasets were used for the segmentation of skin lesions, and image preprocessing was performed to reduce the computational cost and improve performance. The performance measures were sensitivity, specificity, and accuracy, achieving: 98%, 92%, and 93% respectively. The goal of [49] was the automatic melanoma class segmentation of skin cancer using deep learning. The U-Net architecture of the convolutional neural network was applied for better segmentation of skin lesions. Holes, disconnected regions, and loose objects were still present after U-Net segmentation; therefore, post-processing was performed by applying morphological operations dilation. Noise and extra artifacts such as hair and skin color were removed from images for better segmentation. The proposed methodology achieved 96% accuracy, 98% specificity, and 93% sensitivity, whereas our proposed model attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% Fscore. A comparison of the proposed CNN with existing segmentation-based and other techniques is presented in Figure 7. The figure clearly shows that our proposed model achieves remarkable results compared to existing segmentation-based techniques.
Noise and extra artifacts such as hair and skin color were removed from images for better segmentation. The proposed methodology achieved 96% accuracy, 98% specificity, and 93% sensitivity, whereas our proposed model attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% F-score. A comparison of the proposed CNN with existing segmentation-based and other techniques is presented in Figure 7. The figure clearly shows that our proposed model achieves remarkable results compared to existing segmentationbased techniques.

Comparison with DL-Based Methods
In this section, we compare our proposed features fusion method for skin cancer detection with existing DL-based methods. N. Rezaoana classified nine classes-NV, DF, MEL, VASC, BCC, AKIEC, BKL, squamous carcinoma, and seborrhea keratosis-using

Comparison with DL-Based Methods
In this section, we compare our proposed features fusion method for skin cancer detection with existing DL-based methods. N. Rezaoana classified nine classes-NV, DF, MEL, VASC, BCC, AKIEC, BKL, squamous carcinoma, and seborrhea keratosis-using the CNN model. Data augmentation on images, such as flip, rotation, shear, etc., were performed to increase the volume of data for training. Convolutional layers were used for feature extraction, a max pooling layer was utilized to reduce the dimensionality, and softmax was employed after pairing of the fully connected layer as an activation function [30]. Precision, F1-score, recall, and accuracy were used as performance measurements of the model. VGG-16 and VGG-19 were also analyzed and achieved an accuracy of 69.57% and 71.19%, respectively, whereas the proposed model achieved an accuracy of about 79.45%. The objective of [50] was the detection of skin lesions from skin images using an artificial neural network. The PH2 dataset was utilized, containing 40 images from the melanoma class and 160 images from the benign class. Image pre-processing was performed for the removal of noise and accurate segmentation. Moreover, area, shape, and centroid features were extracted from the images, and these features were fed to the artificial neural network for the detection of skin cancer, attaining 98% accuracy. Y. Filali and H. El Khoukhi focused on the classification of a skin lesion by decomposing the images into object components and texture. Segmentation was applied to objects to obtain the desired area, then the segmented area and texture were combined. A convolutional layer was used for feature extraction to learn the hidden patterns [29]. A pooling layer was used after the convolution layer to reduce the spatial size. Furthermore, fully connected layers were added to use the extracted features for the classification of skin lesions. Softmax was employed as an activation function to classify melanoma as malignant and nevus as benign. The classification results were evaluated through accuracy, specificity, and sensitivity. The proposed method achieved an accuracy of 93.50%. A comparison of accuracy with the existing DL-based and ML-based models is shown in Table 4. The plot of the comparison is shown in Figure 7.

Cross-Validation
In this section, we performed cross-validation on our proposed system for skin lesion detection. We employed a different dataset from Kaggle for this purpose, namely International Skin Image Collection (ISIC). The dataset consists of 1800 images from the malignant class and 1497 images from the benign class. We utilized 500 images from each class to analyze the performance. Our proposed techniques classified 493 images from the malignant class and 491 images from the benign class. The confusion matrix is shown in Figure 8. The figure clearly shows that our proposed model achieved 98.4% accuracy over cross-validation. The results show that our proposed model is a robust technique that can detect skin cancer in images at an early stage and with high precision. Therefore, it is concluded that our proposed model outperforms the existing methods for skin disease detection. detect skin cancer in images at an early stage and with high precision. Therefore, it is concluded that our proposed model outperforms the existing methods for skin disease detection.

Discussion
The above-mentioned experimental results indicate that our proposed skin lesion detector is a more effective system than existing techniques. In [53], the authors employed the fast random forest (FRF) algorithm to identify the area affected by malignant melanoma. They achieved 17% precision discordance with the pathologist's results. Contrastingly, in [54], the swarm optimization technique was utilized to extract the region of interest from the dermoscopy images. Then, speeded-up robust features were extracted

Discussion
The above-mentioned experimental results indicate that our proposed skin lesion detector is a more effective system than existing techniques. In [53], the authors employed the fast random forest (FRF) algorithm to identify the area affected by malignant melanoma. They achieved 17% precision discordance with the pathologist's results. Contrastingly, in [54], the swarm optimization technique was utilized to extract the region of interest from the dermoscopy images. Then, speeded-up robust features were extracted and images were classified by employing CNN. The classification accuracy was 98.42% and the precision was 97.73%. Both methods achieved significant results; however, they failed in the case of unseen tiny skin lesions. Moreover, our proposed model attained 99.4% detection accuracy and 5% discordance with the pathologist's results. We believe that the hybrid nature of our proposed system for feature extraction-i.e., ML and DL-based methods-make our system capable of extracting the most representative features from skin lesions. We used an Adam optimizer to adjust the learning rate that optimized the training phase of the proposed classifier. Furthermore, the LSTM classifier reduces the complexity by minimizing the requirement to update the weights rather than convolution-based neural networks. Thus, our proposed model is an efficient skin lesion detector achieving 98.7% precision, 98.66% recall, and a 98% F-score. The proposed system identifies skin lesions with significant accuracy and outperforms the existing skin lesion detection methods.

Conclusions
In this study, a novel and robust skin cancer detection model was proposed based on features fusion. In the first stage, our proposed model pre-processed the images using a GF filter to remove the noise from the skin images. Then, features were extracted by employing LBP for manual features extraction and Inception V3 for automatic features extraction. Aside from this, an Adam optimizer was utilized for the adjustments of the learning rate. In the end, an LSTM network was utilized on fused features for the classification of skin cancer into two classes: malignant and benign. We utilized the skin lesion DermIS dataset available on the Kaggle website, consisting of 1000 images, out of which 500 belong to the benign class and 500 to the malignant class. The proposed methodology attained 99.4% accuracy, 98.7% precision, 98.66% recall, and a 98% F-score. We evaluated our proposed model and compared the performance with existing segmentation-based and DL-based techniques. The results show that our method provided significant results compared to existing techniques. In the future, we aim to employ our proposed model for multi-classification for skin cancer detection, such as squamous cell carcinoma, basal cell carcinoma, Kaposi's sarcoma, Merkel cell carcinoma, etc.

Data Availability Statement:
The data used for this study are publicly available. Further, any queries related to the dataset may be directed to the authors.