Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM

Deep learning models are efficient at learning the features that help in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease using deep learning based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved efficient, with good accuracy, and can run on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used to assess the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. The HAM10000 dataset is used, and the proposed method outperformed the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, which results in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.


Introduction
The skin is the largest organ in the human body, consisting of the epidermis, dermis, subcutaneous tissues, blood vessels, lymphatic vessels, nerves, and muscles. Skin can prevent lipid deterioration in the epidermis with liquid such that the skin barrier feature can be improved. Skin diseases can arise because of fungal development over the skin, hidden bacteria, allergic reactions, or microbes that affect the skin's texture or create pigment [1]. Skin illnesses are chronic and may occasionally grow into malignant tissues. To minimize their development and proliferation, skin diseases must be treated immediately [2]. Research on procedures to identify the effects of diverse skin diseases based on imaging technology is now in high demand. Several skin diseases develop for months before they are diagnosed, so treating such patients can take considerable effort.
[...] the data with the available arbitrary memory, unlike most of the neural network models that need an auxiliary memory for processing. However, RNN is comparatively slow due to heavy computational needs, and FRNN requires a tremendous effort in classifying the patterns from the image data and consumes noticeable computational time [6].
The image can also be classified based on intensity through a statistical approach, namely the Gray Level Co-occurrence Matrix (GLCM), which extracts the features that appear in the acquired image, usually texture-based parameters [41]. GLCM tabulates how often particular combinations of intensity values occur in an image. However, GLCM needs considerable computational effort, and its characteristics are not invariant to rotation and texture changes [42].
Bayesian classification is among the approaches used in skin disease classification [43]. The approach classifies an image against the various trained disease image datasets. Still, Naïve Bayes classification fails when the predictors are not independent, and the zero-probability problem makes it challenging to implement in a multi-objective domain. Naïve Bayes classifiers are also not suitable for unsupervised data classification [44]. The Decision Tree [45] algorithm is a widely used approach for skin disease classification and for the prediction of lower limb ulcers and cervical cancer. The Decision Tree model needs a tremendous amount of training to reach a considerable accuracy level. A small change in the input data can result in an exponential change in the outcome, which makes the model unstable. Additionally, the model needs comparatively more memory, and as a result it needs more computational time [46].
K-Nearest Neighbor (KNN) [47] is a classification model widely used in forecasting and predictive modeling. The model does not need a training phase. Moreover, the accuracy of the KNN model is considerably high [48]. However, KNN models are not appropriate for larger data sets, as producing the predictions may take significant time. In addition, the model performs poorly on high-dimensional data with inappropriate feature information, which impacts the accuracy of its predictions [49] and makes it inappropriate for skin disease classification.
Skin disease classification through ensemble models [50] yields more accurate outcomes by combining multiple prediction models. However, ensemble models have an overfitting issue, and they fail to work with unknown discrepancies between the considered sample and the population [51,52]. Deep Neural Network-based skin disease classification [53,54] has exhibited notable performance in classifying skin diseases. Still, experimental studies have shown that the model is not suitable for multi-lesion images. Deep Neural Network models also need considerable training to attain a reasonable accuracy, which requires more computational time.
A cross-correlation-based model has been used for feature extraction and classification [55], where both the spatial and the frequency features are considered for feature selection using visual coherency. Cross-correlation models are robust against background fluctuations, and as a result the predictions are more accurate. However, working in the frequency domain needs considerable effort in creating the experimental setup and obtaining the results.
The proposed model is associated with a mobile application, and many other such experimental applications have been designed to ease the assessment of diseases. Lee, H.Y. et al. [56] presented the influence of text messaging on the benefits of human papillomavirus (HPV) vaccination and noticed a sharp rise in HPV vaccine uptake in targeted communities. In another study, Weaver et al. [57] reported that cancer screening services have also used text messages to address screening intake. Ijaz et al. [58] proposed an IoT-based healthcare model that lets patients remotely access healthcare gadgets to analyze and monitor their health through bio-medical signals and to intimate healthcare professionals in case of an emergency. Table 1 summarizes the various machine and deep learning approaches for image classification.

Table 1. Machine and deep learning approaches for image classification.

| Reference | Approach | Objective | Challenges of the Approach |
|---|---|---|---|
| [22] | Morphological Operations | Morphological operations involve dilation and erosion, which are efficient in identifying the image features that help determine the abnormality. They work through the structuring element. | Identifying the optimal threshold is crucial, and the approach is not suitable for analyzing the disease region's growth through morphology operations. Applying the structuring elements for skin disease classification does not yield an accurate result. |
| [48] | K-Nearest Neighborhood | The KNN-based model classifies data without a training phase, through feature selection and similarity matching. It uses a distance measure to identify the correlation among the selected features. | The accuracy of the outcome depends directly on the quality of the underlying data. Additionally, for a larger sample size, the prediction time might be significantly high. The KNN model is sensitive to inappropriate features in the data. |
| [20,24,59] | Genetic Algorithm | The genetic algorithm relies on a probabilistic approach by randomly selecting the initial population. It performs the crossover and mutation operations simultaneously until it reaches a suitable number of segments. | The Genetic Algorithm does not guarantee the global best solution and takes too much time to converge. |
| [28,60] | Support Vector Machine | Support Vector Machine is efficient in handling high-dimensional data with minimal memory consumption. | The Support Vector Machine approach is not appropriate for noisy image data, and identifying the feature-based parameters is a challenging task. |
| [31,35] | Artificial Neural Networks | Artificial Neural Networks are efficient in recognizing non-linear associations among the dependent and independent parameters by storing the data across the network nodes. | Artificial Neural Network models are efficient in handling contexts like inadequate understanding of the problem. However, there is a chance of missing the image's spatial features, and vanishing and exploding gradients are a significant concern. |
| [32 | Fine-Tuned Neural Networks (FTNN) | | In the FTNN approach, when the elements are fed with new weights, the previously associated weights are forgotten, which may impact the outcome. |
| | Gray Level Co-occurrence Matrix (GLCM) | GLCM is a statistical approach that classifies objects by analyzing the spatial association among the pixels based on the pixel texture. | The GLCM approach needs considerable computational effort, and its characteristics are not invariant to rotation and texture changes. |
| [43,44] | Bayesian classification | The Bayesian classification-based approach efficiently handles discrete and continuous data by ignoring the inappropriate features for both binary and multi-class classification. | The Bayesian classifier is not suitable for unsupervised data classification, fails in independent predictors, and is widely known as an inappropriate probabilistic model. |
| [45,46] | Decision Tree | Decision Tree-based models handle both stable and discrete data and perform the prediction through a rule-based approach. They are proven to be productive in managing non-linear parameters. | In Decision Tree models, a small change in the input data can result in an exponential change in the outcome, which makes the model unstable. Overfitting is the other issue associated with decision tree-based models. |
| [50-52] | Ensemble models | Ensemble models are proven to be better prediction models that combine various robust algorithms. They are efficient in analyzing both linear and complex data patterns by combining two or more complex models. | Ensemble models have an overfitting issue and fail to work with unknown discrepancies. The combination also reduces the understandability of the approach. |
| [53,54] | Deep Neural Networks | Deep Neural Network-based models can work with structured and unstructured data. The models can also work with unlabeled data and can yield a better outcome. | Models like Inception V3 [62,63] are used in classifying skin disease. On experimentation, the authors found that the model is not suitable for diseases with multiple lesions. |

Methodology
In this section, the integration of LSTM with MobileNet V2 is explained with an architecture diagram. MobileNet V2 is used to classify the type of skin disease, and LSTM is used to enhance the performance of the model by maintaining the state information of the features encountered in the previous generation of the image classification.

MobileNet Architecture Model for Image Classification
As opposed to MobileNet V2 [63], MobileNet [4] is a CNN-based model that is extensively used to classify images. The main advantage of the MobileNet architecture is that it needs comparatively less computational effort than the conventional CNN model, which makes it suitable for mobile devices and for computers with lower computational capabilities [64][65][66]. The MobileNet model is a simplified structure that incorporates a convolution layer for distinguishing detail and relies on two manageable features that effectively trade off accuracy against latency. The MobileNet model is also advantageous in reducing the network size [67].
The MobileNet [68] architecture is equally efficient with a minimum number of features, as in Palmprint Recognition [17]. The architecture of MobileNet is based on depth-wise convolutions [69]. The fundamental structure is built from abstraction layers in which a regular convolution is factorized into a depth-wise convolution and a 1 × 1 convolution, called the point-wise convolution. The depth-wise and point-wise layers are each followed by a standard rectified linear unit (ReLU). The resolution multiplier variable ω is added to reduce the dimensionality of the input image and of each layer's internal representation with the same variable.
The feature vector map is of size $F_m \times F_m$ and the filter is of size $F_s \times F_s$; the input variable is denoted by $p$, and the output variable by $q$. For the core abstract layers of the architecture, the overall computational effort is represented by the variable $c_e$ and may be assessed through Equation (1).
The multiplier value is context-specific, and for the experimental analysis in skin disease classification, the value of the multiplier $\omega$ is considered to be in the range 1 to $n$. The value of the resolution multiplier, identified by $\alpha$, is deemed to be 1. The computational effort recognized through the variable $cost_e$ can be assessed through Equation (2).
The proposed model incorporates depth-wise and point-wise convolutions that are bounded by the depletion variable $d$, which is approximated through Equation (3).
The two hyper-features, the width multiplier and the resolution multiplier, help adjust the optimal size window for accurate prediction based on the context [70]. In the proposed model, the input size of the image is 224 × 224 × 3. The first two values (224 × 224) indicate the height and width of the image. These values should always be greater than 32. The third value indicates that the image has 3 input channels. The proposed architecture has 32 filters, and the filter size is 3 × 3 × 3 × 32 [71].
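The cost comparison behind the depth-wise and point-wise factorization can be illustrated with a short sketch. It follows the widely used MobileNet cost formulation and is not necessarily identical to the authors' Equations (1)-(3): it assumes that $p$ and $q$ are the numbers of input and output channels, that both multipliers are set to 1, and the function names are illustrative only.

```python
def separable_conv_cost(f_s, f_m, p, q):
    """Multiply-adds of a depth-wise (f_s x f_s) plus point-wise (1 x 1) convolution.

    Assumption: p and q are the input and output channel counts,
    f_m is the feature map side, f_s is the filter side.
    """
    depthwise = f_s * f_s * p * f_m * f_m   # one f_s x f_s filter per input channel
    pointwise = p * q * f_m * f_m           # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


def standard_conv_cost(f_s, f_m, p, q):
    """Multiply-adds of an ordinary f_s x f_s convolution with the same shapes."""
    return f_s * f_s * p * q * f_m * f_m


def reduction_ratio(f_s, f_m, p, q):
    """Separable cost divided by standard cost; equals 1/q + 1/f_s**2."""
    return separable_conv_cost(f_s, f_m, p, q) / standard_conv_cost(f_s, f_m, p, q)


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 filter, 112 x 112 feature map, 32 -> 64 channels.
    print(reduction_ratio(3, 112, 32, 64))
```

With a 3 × 3 filter the ratio is dominated by the 1/9 term, which is why the factorized layers need only a fraction of the computations of an ordinary convolution.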
The principle underlying the MobileNet architectures is to substitute complicated convolutional layers with blocks in which each layer comprises a convolutional layer of size 3 × 3 that filters the input data, followed by a point-wise convolutional layer of size 1 × 1 that combines these filtered parameters to build a new component, as shown in Figure 1. The aim is to simplify the model and make it faster than the ordinary convolutional model.

Design Model MobileNet
The MobileNet V2 architecture comprises the residual layer with a stride of 1 and the downsizing layer with a stride of 2, alongside the ReLU component. The architecture is represented in Figure 1.
Both the residual and the downsizing layers encompass 3 sub-layers each.
• The 1 × 1 convolution with ReLU6 is the first layer.
• The depth-wise convolution is the second layer in the architecture. The depth-wise layer adds a single convolutional layer that performs a lightweight filtering process.
• The 1 × 1 convolution layer without non-linearity is the third layer in the proposed architecture. In the third layer, the ReLU6 component is used in the output domain.
• ReLU6 is used to ensure robustness in low-precision situations and to improve the randomness of the model.
• All the layers have the same quantity of output channels within that overall sequence.
• The filter of size 3 × 3 is common for contemporary architecture models, and dropout and batch normalization are used during the training phase.
• There is a residual component to support the gradient flow across the network through batch processing and ReLU6 as the activation component.
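The ReLU6 activation named in the list above clips its input to the range from 0 to 6, which keeps activations bounded in low-precision arithmetic. A minimal sketch in plain Python follows; the function name is illustrative and not taken from the authors' code.

```python
def relu6(x):
    """ReLU6 activation: min(max(0, x), 6)."""
    return min(max(0.0, x), 6.0)


if __name__ == "__main__":
    # Negative inputs map to 0, large inputs saturate at 6.
    for value in (-2.0, 3.5, 10.0):
        print(value, relu6(value))
```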
In Figure 2, the symbol $\sigma$ represents the sigmoid layer, and the hyperbolic tangent (tanh) is the non-linearity layer. $cs_{t-1}$ designates the current cell state, and $cs_t$ the next cell state. $\gamma_{t-1}$ designates the present hidden component, and $\gamma_t$ represents the next hidden state. The symbol × designates the scaling of the data, and the symbol + is for summation of the data.

MobileNet V2 with LSTM
LSTM [16] is a component that is extensively used in recurrent neural network architectures. It can learn order dependence in sequence prediction problems. Memory blocks are managed by memory cells that comprise an input gate, an output gate, a forget gate, and a window connection, all encompassed in the abstract LSTM layer module. The calculations describe the activation function for the persistent abstract LSTM memory module. The LSTM module encompasses memory. The state is interpreted as $P_t$ at time $t$ over the hidden state vector $v_t$ of the input. Equations (4)-(8) define, in order, the input gate, output gate, forget gate, cell state gate, and LSTM outcome. In Equations (4)-(8), the variable $i_t$ is the input to the LSTM block at time $t$. The weights $W_{i\alpha}$, $W_{i\beta}$, $W_{if}$, and $W_{ics}$ are associated with the input gate, output gate, forget gate, and cell state gate, respectively. $W_{\gamma\alpha}$, $W_{\gamma\beta}$, and $W_{\gamma f}$ are the weights associated with the hidden recurrent layer. The integration model is shown in Figure 3.
Figure 3 presents the overall architecture of MobileNet V2 with the LSTM model: a combination of a set of convolution and max pooling layers, with the LSTM component attached to the flattening layer of the model. The fully connected layer correlates the identified features with the pre-existing data through training. Finally, the softmax layer determines the probabilities of the various classes of diseases.
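To make the roles of the gates concrete, the following is a minimal scalar sketch of one step of a standard LSTM cell. It is an illustration under stated assumptions, not the authors' exact Equations (4)-(8): it omits any window connection, adds bias terms, and assumes a recurrent weight for the cell candidate that the text does not list. The dictionary keys and function names are hypothetical.

```python
import math

GATES = ("input", "output", "forget", "cell")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(i_t, h_prev, cs_prev, w_in, w_rec, bias):
    """One scalar LSTM step.

    i_t     : input to the LSTM block at time t
    h_prev  : previous hidden state (gamma_{t-1} in the text)
    cs_prev : previous cell state (cs_{t-1} in the text)
    w_in, w_rec, bias : dicts keyed by GATES (assumed layout)
    Returns (h_t, cs_t).
    """
    pre = {g: w_in[g] * i_t + w_rec[g] * h_prev + bias[g] for g in GATES}
    alpha_t = sigmoid(pre["input"])    # input gate
    beta_t = sigmoid(pre["output"])    # output gate
    f_t = sigmoid(pre["forget"])       # forget gate
    candidate = math.tanh(pre["cell"])  # candidate cell content
    cs_t = f_t * cs_prev + alpha_t * candidate  # new cell state
    h_t = beta_t * math.tanh(cs_t)              # LSTM outcome (new hidden state)
    return h_t, cs_t


if __name__ == "__main__":
    zeros = {g: 0.0 for g in GATES}
    # With all weights at zero every gate is half open.
    print(lstm_step(1.0, 0.0, 1.0, zeros, zeros, zeros))
```

The forget gate scales the previous cell state while the input gate scales the new candidate, which is how the block maintains state information across steps.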

Grey-Level Co-occurrence Matrix
One strategy for texture attribute extraction is the Grey-Level Co-occurrence Matrix (GLCM) [72] approach, based on the recurring sequence of the localized intensity coefficients. GLCM gives the spatial distribution structure of the color and intensity of the pixels, which is determined by the distribution of intensity levels within the window. GLCM focuses on tabulating a histogram of the combinations of pixel intensity values in an image. The association between two pixels, the reference pixel and the neighbor pixel, is captured by the GLCM model through Equation (9). The variable $O_m$ designates the occurrence matrix of dimension $m \times m$, where $m$ represents the image's grey levels:

$$O_m[i, j] = m_{ij} \qquad (9)$$

In Equation (9), the variable $m_{ij}$ denotes the histogram of the intensity value $(i, j)$ at the dimension $m$ of the image. The components of the occurrence matrix are normalized through Equation (10). After normalization, the matrix components lie on a scale from 0 to 1 and can be treated as a function of likelihood. With $(k, m)$ representing the number of elements and the dimensions of the feature vector, the feature vector can be assessed through Equation (11). The GLCM approach is used to approximate the disease growth based on the obtained texture-based information. The GLCM is used in evaluating the skin disease in the proposed model.
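The tabulation and normalization described here can be sketched in a few lines of plain Python. This is a generic co-occurrence count under stated assumptions, not the authors' implementation: the neighbor is taken at a fixed offset (one pixel to the right by default), and normalization divides each count by the total number of pairs, which is the usual way to obtain values between 0 and 1. Function and parameter names are illustrative.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i (reference) co-occurs with grey level j (neighbor).

    image  : 2D list of integer grey levels in range(levels)
    dx, dy : offset of the neighbor pixel relative to the reference pixel
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
    return counts


def normalize(counts):
    """Scale the counts so that all entries sum to 1 (assumed normalization)."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0 for _ in row] for row in counts]
    return [[value / total for value in row] for row in counts]


if __name__ == "__main__":
    tiny_image = [[0, 0, 1],
                  [1, 1, 0]]  # hypothetical 2-level image
    print(normalize(glcm(tiny_image, levels=2)))
```

Texture descriptors are then computed from the normalized matrix; which descriptors enter the feature vector of Equation (11) is not recoverable from the text.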

Implementation Platform
This experiment was performed on the online platform Kaggle [73], with an Intel Core™ i7-8550U CPU @ 1.99 GHz accelerated by Radeon™ 530 Graphics and 8 GB of memory. When the model is trained with a tremendous amount of data for better accuracy, an ordinary CPU might take considerable execution time. To overcome that, a GPU accelerator is used to build the model, which saves a large amount of time. The deep learning approach presented in our paper is built using the PyTorch deep learning framework [74].

Libraries
The libraries used in our model are NumPy, pandas, os, matplotlib.pyplot, shutil, seaborn, and torchvision, as stated by Declan V. [75]. The matplotlib.pyplot and seaborn libraries are used for image operations and for plotting graphs, charts, and tables. The shutil and os libraries offer path and directory operations on files and collections of files. For model building and evaluation, such as the classification report, ROC curve, and confusion matrix, we import the torchvision and seaborn libraries. NumPy and pandas are the most popularly used libraries for array processing and data analysis (series and data frames).

Dataset Description
The dataset plays a crucial role in the training of our proposed neural networks for automated diagnosis. HAM10000 is a skin disease dataset obtained from Kaggle that has served as a benchmark database [76]. The dataset comes with metadata in a comma-separated values (.CSV) file that includes age, gender, and cell type. It contains more than 10,000 dermatoscopic images collected from different people around the world. The dataset also provides additional tips and tricks to overcome certain challenges such as overfitting and limited data, which help in increasing the model's accuracy and performance. The dataset covers seven different types of skin problems, namely Melanocytic Nevi (NV), Benign Keratosis-like Lesions (BKL), Dermatofibroma (DF), Vascular Lesions (VASC), Actinic Keratoses and Intraepithelial Carcinoma (AKIEC), Basal Cell Carcinoma (BCC), and Melanoma (MEL). There is an imbalance in the number of images for each type of lesion in the dataset. To address this imbalance, we applied data augmentation techniques to bring all types of lesions to the same range of images. The dataset is divided into three parts, training data, validation data, and testing data, of 85%, 5%, and 10%, respectively, to enhance our model's generalization. The model is evaluated against the ground truth associated with the training dataset. The target size of the images for our proposed model is 224 × 224. This research aims to determine the accuracy in diagnosing skin cancer on dermatoscopic images using our proposed approach.
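The 85%, 5%, and 10% partition can be sketched as a shuffled index split. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the image count in the usage example are assumptions, and the split here is purely random rather than stratified by lesion type.

```python
import random


def split_indices(n_items, train_pct=85, val_pct=5, seed=0):
    """Shuffle item indices and cut them into train, validation, and test parts.

    The test part receives whatever remains after the train and validation cuts.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # 10,000 is a placeholder count, not the exact size of HAM10000.
    train, val, test = split_indices(10_000)
    print(len(train), len(val), len(test))
```

Integer arithmetic is used for the cut points so that the part sizes do not depend on floating-point rounding.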

Results and Discussion
In this section, the results of the proposed model are discussed in detail. The performance of the proposed MobileNet V2 with LSTM is evaluated through measures such as training and validation loss, which determine the proposed model's capabilities. The proposed model's learning rate at various training levels is also discussed. The performance is compared with other existing approaches in terms of Sensitivity, Specificity, Accuracy, Jaccard Similarity Index (JSI), and Matthews Correlation Coefficient (MCC). The proposed model's computational time is evaluated as part of the performance evaluation and compared against the existing approaches performing the classification over similar data.

Performance Evaluation of Proposed Model
The experiment is carried out on the dataset discussed in Section 3. The proposed model's results are analyzed statistically through various performance evaluation metrics that include, but are not limited to, accuracy measures, which determine how often the proposed MobileNet V2 with LSTM model successfully classifies the skin disease.
To make a reasonable comparison among the various approaches with respect to the implementation configurations, the authors decided to standardize pivotal parameters throughout all the studies. Table 2 presents the parameters that are considered in the implementation of the proposed model. At first, the experiment was performed over several images, and the type of disease was assessed through the proposed MobileNet V2 with LSTM approach. The outcome of the experiment is shown in Figure 4. The charts next to the skin images in Figure 5 represent the percentage of confidence with which the disease observed in the corresponding image is assigned to a particular class of disease trained previously. The actual type of disease based on the ground truth is also presented. For the akiec, bcc, and mel classes, the result appears to be precise. The predicted confidence is on par with the ground truth. The akiec class holds a confidence of 74.32%, 55.2% more than the peer classes. On the other hand, both the mel and bcc class instances are ideally classified, with 84.12% and 96.63% confidence, respectively.
The graphs in Figure 6 are obtained from the initial trained model, where the training loss is better than the validation loss. The left graph shows the number of batches processed versus the loss obtained during the training and validation phases.
The batch size in the initial model is 100, which is used to speed up training. The training and validation loss alongside the learning rate is presented in Figure 7, and they are significant in determining the overfitting and underfitting of the proposed model. When the validation loss is ahead of the training loss, the model may end up overfitting, and when they are almost equal, it would be an under-fitting problem. We observed that the accuracy in predicting the input skin images is slightly distorted. The right graph represents the learning rate versus the loss obtained. This non-linear graph reaches lower values only at specific points, which is challenging, leading to more epochs and increased time complexity. Figure 6, with its graphs and outputs, is observed from the trained model before improvements of the training data, and Figure 7 presents the results obtained from the trained model after slight improvements in terms of epochs, batch size, and data augmentation values. The batch size is reduced from 100 to 50 in order to reduce the computational time and to overcome the lower generalization results and higher loss values. The number of epochs was increased by 20 to gain more accuracy.
Data augmentation is also performed to reduce overfitting during training and to minimize the error rate. In the previous model, the batch size was kept larger to speed up training, which resulted in lower generalization. Its graph of loss versus batches processed shows higher loss values than that of the improved model. The learning rate of the previous model is also low compared with that of the final model. The learning rate is the hyper-parameter that controls how strongly the network weights are adjusted at each update. If the learning rate is too low, training becomes slow and the process can get stuck.
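The effect of a learning rate that is too low can be illustrated with a toy example. The following is only a minimal sketch, not the training procedure used in this study: it runs plain gradient descent on the one-dimensional quadratic loss (w - target)^2, and the function name, starting point, tolerance, and step budget are all hypothetical choices made for illustration.

```python
def steps_to_converge(learning_rate, start=0.0, target=3.0,
                      tol=1e-3, max_steps=10_000):
    """Count gradient-descent steps needed to bring w within tol of target.

    Minimizes the toy loss (w - target)^2. Returns None when the step
    budget is exhausted, i.e. the process is effectively stuck.
    """
    w = start
    for step in range(1, max_steps + 1):
        grad = 2.0 * (w - target)      # derivative of (w - target)^2
        w -= learning_rate * grad      # weight update scaled by the learning rate
        if abs(w - target) < tol:
            return step
    return None


if __name__ == "__main__":
    for lr in (0.1, 0.001, 1e-6):
        print(lr, steps_to_converge(lr))
```

In this toy setting, a moderate learning rate converges in far fewer steps than a small one, and a very small one does not converge within the step budget at all, which mirrors the behaviour described above.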
To overcome the drawbacks mentioned above, we reduced the batch size to obtain faster convergence and better-optimized results. We also increased the learning rate, which yielded better outputs with fewer training epochs. Figure 8 presents the training and validation loss over the processed batches alongside the model's learning rate after the training was improved. The graphs show that the model's performance has improved considerably. The proposed model is assessed through performance evaluation metrics such as Sensitivity, Specificity, Accuracy, the Jaccard Similarity Index (JSI), and the Matthews Correlation Coefficient (MCC). These metrics are computed from the True Positive, True Negative, False Positive, and False Negative counts obtained through repeated experimentation with the proposed approach. A True Positive is a precise identification of the region of disease; a True Negative is a precise identification of a non-disease region in the captured image. A False Positive is a case in which the proposed model misinterprets a non-disease region as a disease region, and a False Negative is a case in which the proposed approach fails to recognize the disease.
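The five metrics named above can be computed directly from the four confusion counts. The sketch below uses the standard textbook definitions of Sensitivity, Specificity, Accuracy, JSI, and MCC; it is assumed, not confirmed, that these match the equations used elsewhere in the paper, and the function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # true positive rate
    specificity = tn / (tn + fp)                   # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    jsi = tp / (tp + fp + fn)                      # Jaccard similarity index
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "jsi": jsi,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(classification_metrics(tp=80, tn=90, fp=10, fn=20))
```

For a multi-class problem such as the one considered here, these counts would typically be taken per class in a one-versus-rest fashion and then averaged; that aggregation step is likewise an assumption.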
Figures 6 and 8 are the resultant hyperparameter graphs obtained on executing the proposed model. In both graphs, the training and validation loss curves are close to each other, which indicates a well-fitted classification of the skin disease. The learning curve shows that the model learns at a reasonable rate.
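The way the loss curves are read in this section can be written down as a simple rule of thumb. The following is a minimal sketch under stated assumptions: the function name and both thresholds (`gap_tol`, `high_loss`) are hypothetical, and the rule that underfitting requires both losses to remain high is a common convention rather than something taken from the reported experiments.

```python
def diagnose_fit(train_loss, val_loss, gap_tol=0.1, high_loss=1.0):
    """Label a (training loss, validation loss) pair with a coarse diagnosis.

    gap_tol and high_loss are illustrative thresholds, not tuned values.
    """
    gap = val_loss - train_loss
    if gap > gap_tol:
        # Validation loss well above training loss.
        return "possible overfitting"
    if train_loss > high_loss and val_loss > high_loss:
        # Curves are close, but both remain high.
        return "possible underfitting"
    # Curves are close and at least one loss is low.
    return "reasonable fit"


if __name__ == "__main__":
    print(diagnose_fit(0.2, 0.9))
    print(diagnose_fit(1.5, 1.55))
    print(diagnose_fit(0.3, 0.35))
```

Such a check is only a coarse aid; in practice the whole curve over the processed batches would be inspected, as in the figures discussed above.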

Comparison with Past Studies
The values are evaluated on repeated execution of the proposed model with varied levels of training. The performance of the proposed model is compared against a Heuristic Approach for Real-Time Image Segmentation (HARIS) [25], a Fine-Tuned Neural Networks (FTNN) approach [77], a Convolutional Neural Network (CNN) [32], the VGG-19 model [78], and MobileNet models [72,79].
In evaluating the proposed model's performance, the experimentation is repeatedly executed on an auxiliary computer. The evaluation counts the number of times the proposed model accurately classifies the skin disorder (True Positive) and the number of times it correctly identifies that the image does not belong to that particular skin category (True Negative). The number of times the proposed model wrongly assigns an image to the disease category is the False Positive, and the number of times the model fails to recognize the skin disease is the False Negative.
Table 3 reports the performance of our proposed approach and other related approaches in terms of Sensitivity, Specificity, Accuracy, JSI, and MCC. The MobileNet-based models exhibited better performance in classifying the region of interest with minimal computational effort; MobileNet V2 exhibited an optimal efficiency in disease classification [70]. The MobileNet V2 model is combined with an LSTM, whose crucial parameters, such as the learning rate and the input and output gates, contribute to a better outcome. Plotting the results of Table 3 in Figure 9 shows that the proposed MobileNet V2-LSTM approach outperformed the other state-of-the-art models on almost all performance measures.
The performance of the proposed model is also compared, in terms of Accuracy, Sensitivity, and Specificity, against approaches such as Decision Tree and Random Forest, the Lesion Index Calculation Unit (LICU), a Fuzzy Support Vector Machine with probabilistic boosting of the segmentation, a Compact Deep Neural Network, the SegNet model, and the U-Net model [81][82][83][84][85]. Figure 10 plots the values of Table 4.
The proposed model outperformed the various existing approaches. All of those approaches are examined against five classes of skin diseases.
The proposed model is implemented against seven classes of skin disease, as presented in Tables 1 and 2. The proposed model's performance rises steeply when the number of classes considered for comparison is reduced. Another strength of the proposed model is that the computational effort needed to classify the skin disease is low compared with the rest of the methods considered for evaluation. Further experimentation is performed to assess the progress of the skin disease through texture-based information [24,86]. Table 5 presents the progress of the disease through the following metrics: the Disease Core (DC) is the actual region of the tumor; the Enhanced Disease (ED) is the region recently affected by the disease, approximated through the texture of the skin around the disease core; and the Whole Disease (WD) is the entire region covering both the disease core and the enhanced disease. The experimental study is efficient in assessing the impact of the treatment on the disease. The estimate of disease progress is likely to be more accurate when examined against the ground truth, and it would help in choosing the most suitable medication for controlling the disease. The confidence of the obtained outcome is assessed through Equation (17).
The confidence mean in Table 5 is the mean of the confidence values observed over repeated experimentation. The robustness of the proposed approach can be determined from this mean. The digits after the decimal point represent the deviation of the approximation from the ground truth. This deviation is almost negligible for the proposed approach compared with the other methods considered in the paper. Figure 11 plots the values of Table 5, illustrating the progress of disease growth, which would support better treatment for the patients.
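The relationship between the three regions and the averaging of the confidence values can be illustrated as follows. This is a minimal sketch with hypothetical data: regions are represented as sets of pixel coordinates, WD is taken as the union of DC and ED as described above, and the confidence values are invented for illustration. Equation (17) itself is not reproduced here; only the averaging step over repeated runs is shown.

```python
from statistics import mean


def whole_disease(core, enhanced):
    """Return the Whole Disease (WD) region as the union of DC and ED."""
    return core | enhanced


# Hypothetical (row, col) pixel coordinates for the two regions.
core = {(2, 2), (2, 3), (3, 2), (3, 3)}        # Disease Core (DC)
enhanced = {(1, 2), (1, 3), (2, 4), (3, 4)}    # Enhanced Disease (ED)
whole = whole_disease(core, enhanced)          # Whole Disease (WD)

# Hypothetical confidence values observed over repeated experiments.
confidences = [0.91, 0.89, 0.93]
confidence_mean = mean(confidences)

if __name__ == "__main__":
    print(len(core), len(enhanced), len(whole), confidence_mean)
```

Comparing the sizes of the ED and WD regions across successive images is one possible way to track growth over time; the representation used in the study may differ.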
The model is efficient in assessing the progress of diseased growth. The confidence value gives the average confidence level with which the model determines the enhanced region of the disease. The proposed model approximates the class of the disease more precisely with minimal computational effort.
Figure 11. The progress of the disease growth.
The incorporation of the LSTM component has enhanced the accuracy of the proposed approach. Table 3 shows that the proposed MobileNet V2 with LSTM model outperformed the other approaches, namely HARIS, FTNN, CNN, VGG19, and the conventional MobileNet V1 and MobileNet V2 models, in terms of Sensitivity, Specificity, and Accuracy, as well as MCC and JSI [87][88][89]. The proposed model is also better than LICU, SegNet, U-Net, and Yuan in terms of Sensitivity, Specificity, and Accuracy, as presented in Table 4.
Training loss and validation loss are two significant indicators of the preciseness of the proposed model. The training accuracy and the validation accuracy of the proposed model are evaluated against the corresponding values of the other models considered in this study. Table 6 presents the training and validation accuracy of the various approaches [87][88][89][90][91]. Figure 12 plots the values of Table 6.

Execution Time
The execution time of the validation phase of the proposed model is presented in Table 7 and Figure 13 alongside that of existing studies. The proposed model took approximately 1134 seconds to train over 20 epochs. The computational time of MobileNet V2 with LSTM is not drastically lower than that of MobileNet V2 [92,93]. Still, MobileNet V2 with LSTM exhibited better prediction accuracy in terms of other performance evaluation measures such as Sensitivity, Specificity, and Accuracy.

Algorithm            Execution Time (s)
CNN [32]             151.23
VGG19 [78]           128.51
MobileNet V1 [71]    126.98
MobileNet V2 [80]    105.92
MobileNet V2-LSTM    101.87

The computational time of the proposed MobileNet V2 with LSTM is reasonably good, as shown in Table 7, which makes it feasible to run the technology on computationally lightweight devices. Incorporating the LSTM module assists in faster convergence by remembering the significant features necessary for the more rapid and accurate classification of the lesion images.
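The execution times reported in Table 7 can be put into relative terms with a short calculation. The sketch below only restates the table values and computes the percentage reduction of the proposed model with respect to each baseline; the dictionary and function names are hypothetical.

```python
# Execution times in seconds, copied from Table 7.
execution_time_s = {
    "CNN": 151.23,
    "VGG19": 128.51,
    "MobileNet V1": 126.98,
    "MobileNet V2": 105.92,
    "MobileNet V2-LSTM": 101.87,
}

PROPOSED = "MobileNet V2-LSTM"


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Reduction of the proposed model against every other algorithm.
reductions = {
    name: round(relative_reduction(t, execution_time_s[PROPOSED]), 2)
    for name, t in execution_time_s.items()
    if name != PROPOSED
}

if __name__ == "__main__":
    for name, pct in reductions.items():
        print(f"{name}: {pct}% slower baseline")
```

On these figures, the reduction relative to the plain MobileNet V2 is a few percent, while the reduction relative to the CNN is roughly a third, which is consistent with the observation that the gain over MobileNet V2 is not drastic.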

Practical Implications
The proposed model based on MobileNet V2 with LSTM is integrated with a mobile application so that patients and doctors can easily classify diseases from an input image, as shown in one such application [94]. Figure 14 presents the architecture of the proposed model. The mobile app is designed to acquire the image of the affected region and uses a representational state transfer (REST) API to store the data securely on a remote server. The NoSQL database MongoDB is used to handle massive user-related data.
The proposed model is helpful for both patients and doctors in classifying the type of skin disease. The image captured using the mobile device is fed as the input to the interface, which then uses MobileNet V2 with LSTM to process the data. The MobileNet V2 can be implemented on the iOS platform through netscope and netron architecture. The information can either be transferred as XML/JSON, or the model can be implemented on the iOS platform without separate space for the model.

A Flask framework can be used for web/mobile-based data access with its set of available libraries. The LSTM layer can be imported from the Keras libraries for incorporation into the model; integrating the LSTM is almost the same as integrating a Recurrent Neural Network architecture.
The proposed framework for the practical implementation involves multiple phases in the process of classifying the type of skin disease, as presented in Figure 15. In the initial phase, the data are acquired and assessed by professionals and practitioners for the type of disease so that the model can be trained accurately. The second phase of the framework concerns the app integration of the proposed MobileNet V2 with LSTM model. In this phase, the image of the affected region is captured and fed as the input to the model, and the features of the input image are identified and correlated with the trained data for prediction. The probabilities of the particular types of disease are approximated in this phase to determine the class of the disease. In the third phase, the classification outcome and the evaluation of the model are produced. The disease classification probability determines the class of the disease, the outcome of the predictions is evaluated against the various evaluation metrics, and the information is updated in the database for the feature perception [95].
Figure 15. The mobile framework on incorporating MobileNet V2 with LSTM.
Figure 16 shows the screens acquired from the prospective model, including the user's information, namely the name, date of birth, gender, email, and the data related to current health conditions such as diabetes and hypertension, entered by the user. The type of disease is selected on the home page, which redirects users to the appropriate page, where the user can upload the image of the affected region, as shown in the second image of Figure 14, along with data such as the number of days since the region was affected. Upon recognizing the suitable type of skin disease, the application returns the disease's details and the symptoms associated with the disease, as shown in Figure 14. The details provided will help the physician, radiologist, and the patient in the preliminary assessment of the disease.
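The step in which class probabilities determine the disease class can be sketched as follows. This is an illustration under stated assumptions, not the deployed application code: it assumes the model ends in a softmax layer over the seven HAM10000 diagnostic categories, and the function names and the example scores are hypothetical.

```python
import math

# The seven HAM10000 diagnostic category abbreviations.
HAM10000_CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores, labels=HAM10000_CLASSES):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Hypothetical raw model outputs for one image.
    label, confidence = predict_class([0.1, 3.0, 0.2, 0.0, 1.0, 0.5, 0.3])
    print(label, round(confidence, 4))
```

The returned probability corresponds to the per-class confidence that the application would display next to the predicted disease type.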
The performance of the proposed MobileNet V2 with LSTM is evaluated through various assessment metrics, and the results are presented along with the graphs of the hyperparameters. The results show that the proposed model performs reasonably well in lesion classification while requiring less computational time than the other approaches. The proposed model needs considerably less computational effort to classify the images, which makes it suitable for deployment on mobile devices. The prospective application that works with the proposed model can precisely identify the skin disease in the captured image.

Conclusions
The proposed model based on the MobileNet V2 and LSTM approach proved efficient for skin disease classification and detection with minimal computational power and effort. The outcome is promising, with an accuracy of 85.34% when evaluated and compared with other methods over the real-time images acquired from Kaggle [11]. The MobileNet V2 architecture is designed to work on portable devices with a stride 2 mechanism. The model is computationally effective, and the use of the LSTM module with MobileNet V2 enhances the prediction accuracy by maintaining the data of previous timestamps. The information related to the current state, obtained through weight optimization, makes the model robust. The model is also compared against various other conventional models such as CNN, FTNN, and HARIS. The proposed model outperformed them in classification and in analyzing the progress of tumor growth based on texture-based information, as presented in the Results and Discussion section. A bidirectional LSTM may further improve the performance of the model. In the practical implementation of the proposed model, integrating the front end designed through Android Studio/SSDLite/DeepLabv3+ with the business model built over Kaggle took tremendous effort. However, at present there is a range of shortcomings that must be resolved in future work. The model's precision decreases dramatically, to just below 80 percent, when checked on a series of photographs captured in poor illumination conditions distinct from those used during testing. Finally, the proposed approach is not designed to replace but rather to supplement existing disease-diagnostic solutions. Laboratory test results are always more trustworthy than diagnoses based solely on visual symptoms, and early diagnosis by visual inspection alone is often challenging.

Future Works
The proposed model is computationally efficient, as it is designed to work on devices with lightweight capabilities. The proposed MobileNet V2 with LSTM model needs a larger number of parameters for better accuracy. The considered input image and the resultant outputs of the MobileNet V2 with LSTM model have no significant randomness to explore all possible patterns in the assessment process. Alongside the bottleneck in residual connections in the proposed architecture, the model yields higher accuracy with minimal effort. The model can be further improved by incorporating self-learning capability and knowledge acquisition from its previous experiences, so that the effort of training the model can be considerably reduced. However, the model must be mechanized to assess the impact of the features extracted for each strategy, and the incorporation of randomizing components is necessary. The researchers recommend that future research examine feature extraction based on biomarkers, even though there is ample data, depending on the specific findings. Biomarkers effectively identify the disease from supplementary data such as genomic, protein sequence, and pathological data in addition to the imaging data. It is also recommended to consider lightweight security when transmitting physiological and biological data in health networks, together with a user-friendly smart device app that can display alarms and let patients and physicians communicate in eHealth and telehealth environments to securely exchange and transmit data [96,97].
Data Availability Statement: The HAM10000 data from Kaggle is considered for the experimental study in the paper. The data set consists of 10,000 dermatoscopic images of various individuals worldwide, with divergent types of skin diseases. The data is openly available from the link https://kaggle.com/kmader/skin-cancer-mnist-ham10000 (accessed on 17 April 2021).