Modified U-NET Architecture for Segmentation of Skin Lesion

Dermoscopy images can be classified more accurately if skin lesions or nodules are segmented. Because of their fuzzy borders, irregular boundaries, and inter- and intra-class variances, nodule segmentation is a difficult task. Several algorithms have been developed for the segmentation of skin lesions from dermoscopic images. However, their accuracy lags well behind the industry standard. In this paper, a modified U-Net architecture is proposed, in which the feature map's dimension is changed to enable accurate and automatic segmentation of dermoscopic images. In addition, adding more kernels to the feature map allows for a more precise extraction of the nodule. We evaluated the effectiveness of the proposed model by considering several hyperparameters such as epochs, batch size, and the type of optimizer, and by testing it with augmentation techniques implemented to increase the number of images available in the PH2 dataset. The best performance of the proposed model is achieved with the Adam optimizer, a batch size of 8, and 75 epochs.


Introduction
All kinds of microorganisms can cause a variety of skin infections, which can range from moderate to serious. An early diagnosis of a skin illness improves the chances of a successful treatment. Dark patches or birthmarks may appear on the skin as a result of skin illness [1]. The modality used for the diagnosis of skin disease is dermoscopy images. Dermoscopy is a process in which a dermatologist observes a position on the skin with a special microscope or magnifying lens. The dermatologist uses a device called a dermatoscope, which contains a high-magnification glass, so that a clear picture of the nodule can be seen. In medical imaging, a variety of technologies are employed for the diagnosis of skin disease, such as Machine Learning (ML), Deep Learning (DL), Transfer Learning (TL), and Ensemble Learning (EL). In ML, a machine is made to learn the tasks, whereas in DL, TL, and EL it learns features directly from the data provided. Improvements in deep learning Convolutional Neural Networks (CNN) have shown promising results in recent years, and they have also emerged as a challenging research topic for categorization in medical image processing [2,3].
Automatically segmenting skin lesions is still a difficult task. Some skin lesions with light pigment have a very similar color and visual patterns in the pigment patches and the surrounding skin regions, making skin lesion segmentation difficult. In addition, the original dermoscopic images have a high resolution, which means that processing them obtained an accuracy value of 86.54. Khan et al. [37] proposed a model using pre-processing, segmentation and classification and obtained a segmentation accuracy of 96.8% and 92.1% for the ISIC and PH2 datasets, respectively. Moreover, they obtained a classification accuracy of 97% on the ISIC dataset. Long et al. [38] used the concept of fine tuning with the classification networks AlexNet, GoogleNet, and VGGNet. The authors presented a novel architecture that produces accurate and detailed results by combining semantic information from a deep, coarse layer with appearance information from a shallow, fine layer's segmentations. Chen et al. [39] combined the methods from deep convolutional neural networks and graphical models for addressing semantic image segmentation. The authors combined the final layers with a fully connected Conditional Random Field. They obtained an IOU accuracy value of 71.6% in the test set. Noh et al. [40] combined a deep deconvolution network, and the proposed technique overcomes the constraints of previous methods based on fully convolutional networks, and the segmentation method frequently finds complex structures and handles objects at many sizes. They achieved an accuracy of 72.5% through ensembling with the fully convolutional network. Wang et al. [41] presented non-local U-Nets that are used with flexible global aggregation blocks. These blocks are used for preserving size in upsampling and downsampling layers. Ibethaz et al. [42] proposed a MultiResUNet architecture, by replacing it with the two convolutional layers. 
A parameter is assigned for the layers that controls the number of filters of the convolutional layers. They used 97 images ranging from 1349 × 1030 to 1344 × 1024 and resized to 256 × 256, and used the ISIC-2018 dataset. Christ et al. [43] presented a method for the automatic segmentation of lesions in CT abdomen images using cascaded fully convolutional neural networks. They used a two-fold cross validation on the images and obtained a dice score of over 94%. Lin et al. [44] presented a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable a high-resolution prediction using long-range residual connections. They achieved an intersection-over-union score of 83.4 on the challenging PASCAL VOC 2012 dataset. Novikov et al. [45] used a multi-class configuration, and the ground-truth masks were trained and tested using the publicly accessible JSRT database, which contains 247 X-ray images and can be found in the SCR database. They obtained a Jaccard index value of 95.0%.
The literature shows that performance improves after segmentation: when segmented images are used for classification, the classification accuracy increases.
The major contributions of the study are as follows:
• A modified U-Net architecture has been proposed for the segmentation of lesions from skin disease using dermoscopy images.
• The data augmentation technique has been performed to increase the randomness of images for better stability.
• The proposed model is validated with different optimizers, batch sizes, and epochs for better accuracy.
• The proposed model has been analyzed with various performance parameters such as Jaccard Index, Dice Coefficient, Precision, Recall, Accuracy and Loss.
The rest of the paper is structured as follows: materials and methods are given in Section 2, followed by results and discussions in Section 3, and Section 4 shows the conclusion and future scope.

Materials and Methods
The proposed model exploits the U-Net architecture for lesion segmentation from skin disease dermoscopy images. The proposed model has been evaluated on the PH2 [46] dataset consisting of 200 skin disease dermoscopy images.

Dataset
The PH2 dataset contains 200 dermoscopy images (160 non-melanomas and 40 melanomas) obtained with the Tuebinger Mole Analyzer system using a 20-fold magnification. All images have an approximate size of 192 × 256 pixels. Figure 1 shows the original skin disease images, and Figure 2 shows the ground truth masks for the respective original images.

Data Augmentation
As the available training dermoscopy images in the dataset are few, offline data augmentation techniques have been implemented to increase the number of sample images. The data augmentation [47] on images is done using different techniques such as flipping and rotation, as shown in Figure 3. The corresponding masks of the augmented images are also shown in Figure 3.
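The flipping and rotation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of a 180-degree rotation, and the use of plain nested lists instead of an image library are all assumptions made to keep the example self-contained. The point it shows is that each transform is applied identically to an image and to its mask, so the pair stays pixel-aligned.

```python
# Minimal sketch of offline augmentation by flipping and rotation.
# Assumption: images and masks are nested lists (rows of pixel values);
# all function names here are hypothetical, not taken from the paper.

def flip_horizontal(grid):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]

def flip_vertical(grid):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return [row[:] for row in grid[::-1]]

def rotate_180(grid):
    """Rotate by 180 degrees, which preserves the height x width shape."""
    return [row[::-1] for row in grid[::-1]]

def augment_pair(image, mask):
    """Return the original pair plus one transformed copy per transform.

    The same transform is applied to the image and to the mask so that
    every augmented image keeps a pixel-aligned ground-truth mask.
    """
    transforms = [flip_horizontal, flip_vertical, rotate_180]
    pairs = [(image, mask)]
    for transform in transforms:
        pairs.append((transform(image), transform(mask)))
    return pairs

# Tiny 2 x 3 stand-in for a dermoscopy image and its binary mask.
image = [[10, 20, 30],
         [40, 50, 60]]
mask = [[0, 1, 1],
        [0, 0, 1]]

augmented = augment_pair(image, mask)
print(len(augmented))  # original pair plus three transformed pairs
```

With three transforms, each source pair yields four training pairs; the actual set of transforms and the resulting dataset size used in the study are not stated in this section.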

Modified U-Net Architecture
An enhanced version of the Convolutional Neural Network (CNN) was developed for dealing with biomedical images in which the purpose is not only to categorize whether or not an infection exists but also to identify the infected area [48]. The U-Net architecture consists of two paths. The first one is the contraction path, also known as the encoder, and the second one is the symmetric expanding path, also known as the decoder. The encoder is used to capture the image context, whereas the decoder uses transposed convolutions to enable precise localization. The proposed modified U-Net architecture is shown in Figure 4.

In the contraction path, each step consists of two convolutional layers. In the first step, the number of channels changes from 1 to 64. The blue arrow pointing down shows the max pooling layer, which halves the image from 192 × 256 to 96 × 128. This process is repeated three more times until the bottom of the architecture is reached. At the bottom there are two convolutional layers without a max pooling layer. At this point, the image has been reduced to 12 × 16 × 1024.
In the expanding path, the image is upsized back to its original size. The upsampling technique that expands the size of the images is known as transposed convolution. The image is first upsized from 12 × 16 to 24 × 32. After that, it is concatenated with the corresponding image from the contracting path. The reason for this is to combine the information from the earlier layers to get a more accurate prediction. The proposed modified U-Net architecture uses rectangular feature maps, starting from 192 × 256 in the first layer and 96 × 128 in the second layer. The feature map is downsized again to 48 × 64 in the third layer, then to 24 × 32 in the fourth layer, and, finally, to 12 × 16 in the last layer. Afterwards, the feature map size increases in the expansion path, with 24 × 32 in the first layer from the bottom. It is upsized to 48 × 64 in the second layer and to 96 × 128 in the third layer. Finally, the feature map size changes to 192 × 256 in the topmost layer.
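The size progression along the two paths can be checked with a short calculation. The sketch below assumes four 2 × 2 max pooling steps in the encoder and four 2 × 2 upsampling steps in the decoder, which is what the 192 × 256 input and the 12 × 16 × 1024 bottleneck stated earlier imply; the function name is hypothetical.

```python
def unet_feature_map_sizes(height, width, num_poolings):
    """Return (encoder_sizes, decoder_sizes) as lists of (height, width).

    Each 2 x 2 max pooling halves both dimensions; each 2 x 2 upsampling
    doubles them again. Channel counts are not modelled here.
    """
    encoder = [(height, width)]
    for _ in range(num_poolings):
        h, w = encoder[-1]
        if h % 2 or w % 2:
            raise ValueError("dimensions must be even before each pooling")
        encoder.append((h // 2, w // 2))

    decoder = []
    h, w = encoder[-1]  # start from the bottleneck size
    for _ in range(num_poolings):
        h, w = h * 2, w * 2
        decoder.append((h, w))
    return encoder, decoder

encoder, decoder = unet_feature_map_sizes(192, 256, num_poolings=4)
print(encoder)  # [(192, 256), (96, 128), (48, 64), (24, 32), (12, 16)]
print(decoder)  # [(24, 32), (48, 64), (96, 128), (192, 256)]
```

Because every decoder size matches an encoder size, the concatenation between the contracting and expanding paths needs no cropping in this configuration.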
After the contraction and expansion process, the architecture reaches the upper level and reshapes the image; the last layer is a convolution layer. Table 1 shows the parameters of the proposed model, which consist of the different convolution layers, input and output image size, filter size, number of filters, and activation function. The total number of parameters of the proposed model is 33,393,669, of which 33,377,795 are trainable and 15,874 are non-trainable. The proposed architecture localizes and distinguishes borders by classifying every pixel; therefore, input and output share the same size. In the encoder part, the convolution layer and the max-pooling layer are applied. In the decoder part, the transposed convolution layer and the simple convolution layer are applied.
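The reported parameter totals can be sanity-checked with simple arithmetic, and the standard formula for a convolution layer shows where such counts come from. In the sketch below, the per-layer example uses the first convolution as described in the text (3 × 3 kernel, 1 input channel, 64 filters); the presence of a bias term is an assumption, and the helper name is hypothetical.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Standard parameter count of a 2D convolution layer.

    Each filter has kernel_h * kernel_w * in_channels weights, plus one
    bias per filter when bias=True (assumed here, not stated in the paper).
    """
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

# Totals as reported in the text.
trainable = 33_377_795
non_trainable = 15_874
total = trainable + non_trainable
print(total)  # 33393669, matching the reported total

# First 3 x 3 convolution, 1 input channel to 64 filters.
first_conv = conv2d_param_count(3, 3, in_channels=1, out_channels=64)
print(first_conv)  # 640
```

Summing this formula over all layers listed in Table 1 would be one way to cross-check the trainable total, provided the exact layer list is available.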
During the simulation phase, the input image undergoes a multilevel decomposition in the encoder path, and the feature maps are reduced with the help of a max pooling layer. The layers are shown in Figure 4 as arrows with different colors: the yellow arrows show the convolutional layer of size 3 × 3, the ReLU (Rectified Linear Unit) activation function, and the dropout layer; the red arrows show the convolutional layer of size 3 × 3 and the ReLU activation function; the blue arrows show the max-pooling layer; the green arrows show the upsampling with 2 × 2 size; the black arrows show the concatenation of images from the contracting and expanding paths; and, finally, the brown arrows show the final convolutional layer with size 1 × 1.

Results and Discussion
This section includes all the results attained by using the modified U-Net model. The model is evaluated on the PH2 dataset. An experimental analysis has been done, from which training accuracy and loss curves are obtained. A detailed description of the performed visual analysis of segmented images and the analysis of confusion matrix parameters is given below.

Result Analysis Based on Different Optimizers
This section includes all the results obtained by using Adam, Adadelta, and SGD optimizers with a batch size of 18 and 100 epochs.

Analysis of Training Loss and Accuracy
The results are taken using different optimizers with a batch size of 18 and 100 epochs. Figure 5 shows the curves of training loss and training accuracy. It is worth noticing that the value of accuracy increases with the number of epochs, and the loss value decreases. The color red shows the training loss, and the color blue shows the training accuracy. Figure 5a shows the training loss with the SGD optimizer; the maximum loss value is 0.7, which decreases with the number of epochs. Figure 5b shows the corresponding training accuracy, in which the maximum accuracy is greater than 0.95 at the 100th epoch. Figure 5c shows the training loss with the Adam optimizer; the maximum loss value is lower than that of the SGD optimizer. Figure 5d shows the training accuracy, in which the maximum accuracy is greater than 0.975 at the 100th epoch. The accuracy of the Adam optimizer thus exceeds that of the SGD optimizer. Figure 5e shows the training loss with the Adadelta optimizer, whose maximum value is 0.75, which is higher than for the SGD and Adam optimizers. Figure 5f shows the training accuracy, which reaches only 0.90. Overall, Figure 5 shows that the Adam optimizer outperforms the SGD and Adadelta optimizers in terms of training loss and training accuracy.

Visual Analysis of Segmented Images
Figure 6 shows the segmented images using the Adam, Adadelta and SGD optimizers with a batch size of 18 and 100 epochs. Figure 6a,c shows the ground truth masks of the original images, and Figure 6b,d shows the original images. Figure 6e,g shows the predicted masks of original images 1 and 2 with the Adam optimizer, whereas Figure 6f,h shows the segmented outputs of original images 1 and 2 with the Adam optimizer. Similarly, Figure 6i-p shows the predicted masks and segmented outputs for the Adadelta and SGD optimizers, respectively. From the visual analysis of these figures, it can be seen that the Adam and SGD optimizers show almost similar results with a batch size of 18 and 100 epochs, whereas the Adadelta optimizer does not follow the profile of the skin lesion; instead, it extracts the complete skin region. So, the Adadelta optimizer cannot be recommended for skin lesion segmentation. To select the best performing optimizer between Adam and SGD, an analysis of these two optimizers is done in Section 3.1.3 using confusion matrix parameters.
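The relationship between a predicted mask and a segmented output can be sketched as follows. The paper does not state how the segmented outputs are produced, so this is only one plausible reading: the network's per-pixel probabilities are thresholded (the 0.5 cutoff is an assumption) and the resulting binary mask is used to keep lesion pixels and blank out the rest. Function names are hypothetical.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn per-pixel probabilities into a binary mask (1 = lesion)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probabilities]

def apply_mask(image, mask, background=0):
    """Keep image pixels where the mask is 1; set the rest to background."""
    same_shape = len(image) == len(mask) and all(
        len(img_row) == len(mask_row) for img_row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]

# Tiny 2 x 2 stand-ins for a network output and a grayscale image.
probabilities = [[0.1, 0.8],
                 [0.6, 0.3]]
image = [[5, 7],
         [9, 11]]

predicted_mask = threshold(probabilities)
segmented = apply_mask(image, predicted_mask)
print(predicted_mask)  # [[0, 1], [1, 0]]
print(segmented)       # [[0, 7], [9, 0]]
```

Under this reading, a mask that covers the whole skin region, as reported for Adadelta, would return the image almost unchanged, which is why such a result is of little use for lesion extraction.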

Analysis of Confusion Matrix Parameters
In Section 3.1.2, a visual analysis of segmented images is shown, indicating that the Adam and SGD optimizers give the best results. Now, to identify the best performing optimizer, confusion matrix parameters are analyzed. Table 2 shows the comparison of the Jaccard Index, Dice Coefficient, Precision, Recall, Accuracy, and Loss for the modified U-Net model architecture by using the Adam, Adadelta, and SGD optimizers. The validation dataset results, also shown in Figure 7, show that the SGD optimizer reaches the best performance in terms of Precision, with a value of 91.23%, although the Adam optimizer outperforms the SGD optimizer with a 94.74% value of Jaccard Index, 86.13% value of Dice Coefficient, 97.14% value of Recall, 95.01% value of Accuracy, and a loss value of 16.24. In the case of the Adadelta optimizer, the obtained results are the worst. Therefore, from these results we can affirm that the Adam optimizer has shown the best results on the validation dataset, as it has outperformed the SGD and Adadelta optimizers on almost all parameters. Figure 7 shows the analysis of the confusion matrix parameters on the Adam, Adadelta, and SGD optimizers using the validation dataset. From this figure, it can be seen that the Adam optimizer performs best on almost all the parameters, such as Jaccard Index, Dice Coefficient, Precision, Recall, Accuracy, and Loss. The value of loss is much lower in the case of the Adam optimizer in comparison to the SGD and Adadelta optimizers.
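For reference, the confusion matrix parameters named above can be computed from a predicted and a ground-truth binary mask as in the sketch below. It uses the standard pixel-wise definitions; the paper does not list its exact formulas, so treating these as the ones used in the study is an assumption, and the function names are hypothetical.

```python
def confusion_counts(predicted, truth):
    """Count true/false positives and negatives over all pixels."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def segmentation_metrics(predicted, truth):
    """Standard pixel-wise metrics for binary segmentation masks."""
    tp, fp, fn, tn = confusion_counts(predicted, truth)

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator instead of raising.
        return numerator / denominator if denominator else 0.0

    return {
        "jaccard": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }

# Tiny 2 x 4 example masks (1 = lesion pixel).
truth = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
predicted = [[1, 0, 0, 0],
             [1, 1, 0, 0]]

metrics = segmentation_metrics(predicted, truth)
print(metrics["jaccard"], metrics["accuracy"])  # 0.5 0.75
```

In this example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so precision and recall are both 2/3 and the Dice Coefficient is 4/6.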

Result Analysis Based on Different Batch Sizes
From Section 3.1, it is seen that the Adam optimizer has outperformed the SGD and Adadelta optimizers with a batch size of 18. Therefore, in this section, the results are calculated using the Adam optimizer on different batch sizes. However, it is possible that the Adadelta and SGD optimizers may provide better results on different combinations of batch size and epochs. In future, these two optimizers can be analyzed for different batch size and epoch combinations. Here, the batch sizes used for analyzing the Adam optimizer are 8, 18, and 32 on 100 epochs.

Analysis of Training Loss and Accuracy
The results are taken using different batch sizes with the Adam optimizer on 100 epochs. Figure 8 shows the curves of training loss and training accuracy, and from the curves it can be concluded that the value of accuracy increases with the number of epochs, and the loss value decreases. The color red shows the training loss, and the color blue shows the training accuracy. Figure 8a,c,e shows the training loss on batch sizes 8, 18, and 32, and the loss value is 0.5. Figure 8b,d shows the training accuracy on batch sizes 8 and 18, and the value of accuracy is greater than approximately 0.975. Figure 8f shows the training accuracy, and the value of accuracy is only 0.95 with a batch size of 32.

Visual Analysis of Segmented Images
Figure 9 includes the predicted masks and segmented outputs of original images 1 and 2 on batch size 8. Similarly, Figure 9i-p shows the predicted masks and segmented outputs on batch sizes 18 and 32, respectively. From the visual analysis of the figures, it can be seen that batch sizes 8 and 18 show almost similar results with the Adam optimizer and 100 epochs, whereas batch size 32 does not perform well, since it extracts not only the lesion part but also the outer part. Therefore, batch size 32 cannot be recommended for skin lesion segmentation. To identify the best performing batch size between 8 and 18, the confusion matrix parameters of these two batch sizes are evaluated in Section 3.2.3.

Analysis of Confusion Matrix Parameters
In Section 3.2.2, a visual analysis of segmented images is done on different batch sizes, from which batch size 8 and 18 have shown the best results. Now, to see the best performing batch size, the confusion matrix parameters are analyzed. Table 3 shows the analysis of the U-Net model architecture using batch sizes 8, 18, and 32.
In the case of the validation dataset, as also shown in Figure 10, the batch size of 18 has performed best on Recall with a value of 97.14%, although batch size 8 has outperformed it on the other parameters, with a 95.68% value of Jaccard Index, 87.49% value of Dice Coefficient, 93.42% value of Precision, 95.51% value of Accuracy, and a lower loss value of 13.72. In the case of batch size 32, as already observed with the visual analysis, the performance is lower with respect to the other batch sizes, showing a loss of 19.19. Therefore, from these results it can be seen that batch size 8 has shown the best results on the validation dataset. Figure 10 shows the analysis of confusion matrix parameters on batch sizes 8, 18, and 32. From the figure it can be seen that batch size 8 performs best on almost all the parameters, such as Jaccard Index, Dice Coefficient, Precision, Recall, Accuracy, and Loss. The value of loss is much lower in the case of batch size 8 in comparison to batch sizes 18 and 32.

Result Analysis Based on Different Epochs with the Adam Optimizer and Batch Size 8
From Section 3.2, it was seen that batch size 8 has outperformed batch sizes 18 and 32 for the Adam optimizer. Therefore, in this section, the results are calculated using batch size 8 with different epochs. However, it is possible that batch sizes 18 and 32 may provide better results on different combinations of epochs. In future, these two batch sizes can be further analyzed with different epochs. Here, the epoch values used for analyzing the Adam optimizer with batch size 8 are 25, 50, 75, and 100.

Analysis of Training Loss and Accuracy
The results are taken using the Adam optimizer on batch size 8 with 25, 50, 75, and 100 epochs. Figure 11 shows the curves of training loss and training accuracy, and from the curves it is concluded that the value of accuracy increases with the number of epochs, and the loss value decreases. Figure 11a,c,e,g shows the training loss with 25, 50, 75, and 100 epochs, and the loss value is 0.5. Figure 11b shows the training accuracy on 25 epochs, and the value of accuracy is greater than approximately 0.94. Figure 11f,h shows the training accuracy on 75 and 100 epochs, where the value of accuracy reaches 0.975.

Analysis of Confusion Matrix Parameters
In Section 3.3.2, a visual analysis of segmented images is done on different epoch values, of 25, 50, 75, and 100. Now, to see the best performing epochs, the confusion matrix parameters are analyzed. Table 4 shows the analysis of the U-Net model architecture using 25, 50, 75, and 100 epochs size and a batch size of 8 with the Adam optimizer.

Result Analysis Based on Different Epochs with the Adam Optimizer and Batch Size 8
From Section 3.2, it was seen that batch size 8 has outperformed in comparison to batch sizes 18 and 32 for the Adam optimizer. Therefore, in this section, the results are calculated using batch size 8 with different epochs. However, it is possible that batch sizes 18 and 32 may provide better results on different combinations of epochs. In future, these two batch sizes can be further analyzed with different epochs. Here, the value of epochs used for analyzing the Adam optimizer with batch size 8 are 25, 50, 75, and 100.

Analysis of Training Loss and Accuracy
The results are obtained using the Adam optimizer with batch size 8 and 25, 50, 75, and 100 epochs. Figure 11 shows the curves of training loss and training accuracy; from the curves it is concluded that the accuracy increases with the number of epochs while the loss decreases. Figure 11a,c,e,g shows the training loss for 25, 50, 75, and 100 epochs, where the loss value is 0.5. Figure 11b shows the training accuracy for 25 epochs, where the accuracy is above approximately 0.94. Figure 11f,h shows the training accuracy for 75 and 100 epochs, where the accuracy is only 0.975.

Figure 12 shows the segmented images obtained using the Adam optimizer and batch size 8 for different numbers of epochs. Figure 12a,c shows the ground truth masks of original images 1 and 2, and Figure 12b,d shows the original images. Figure 12e,g shows the predicted masks of original images 1 and 2 for 25 epochs, whereas Figure 12f,h shows the corresponding segmented outputs. Similarly, Figure 12i-t shows the predicted masks and segmented outputs for 50, 75, and 100 epochs, respectively.
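The relation between a predicted mask and a segmented output can be illustrated with a short sketch. The text does not state how the segmented outputs are produced, so this assumes that the network's per-pixel output is thresholded at 0.5 and that the resulting binary mask is multiplied element-wise with the original image; the function names and the toy arrays are hypothetical.

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel probability map into a 0/1 mask (threshold is an assumption)."""
    return (np.asarray(prob_map) >= threshold).astype(np.uint8)

def apply_mask(image, mask):
    """Keep image pixels inside the mask and set all other pixels to zero.

    image: array of shape (H, W, 3); mask: array of shape (H, W) holding 0 or 1.
    """
    image = np.asarray(image)
    mask = np.asarray(mask)
    # Add a channel axis so the mask broadcasts over the colour channels.
    return image * mask[..., np.newaxis].astype(image.dtype)

# Hypothetical 2x2 RGB image and network output (not PH2 data).
image = np.full((2, 2, 3), 200, dtype=np.uint8)
prob_map = np.array([[0.9, 0.2],
                     [0.6, 0.4]])
mask = binarize(prob_map)
segmented = apply_mask(image, mask)
```

In this toy case the left column of the image is kept and the right column is blacked out, which mirrors the pairing of predicted mask and segmented output in the figure.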

Visual Analysis of Segmented Images
From the visual analysis of these figures, it can be seen that 25, 50, and 75 epochs give almost similar results with the Adam optimizer and batch size 8, whereas 100 epochs does not give good results. To identify the best-performing setting among 25, 50, and 75 epochs, these three settings are analyzed in Section 3.3.3 using confusion matrix parameters.

From Table 4, for the validation dataset, the loss on 25 epochs is 28.17, which is very high, followed by a loss of 23.37 on 50 epochs, whereas on 75 epochs the loss drops to 11.56. Moreover, the values of the Jaccard Index, Dice Coefficient, and Accuracy increase. This indicates that the model was underfitted at 25 and 50 epochs, which explains the poorer values of the performance parameters, whereas at 75 epochs the model is properly trained and the parameter values improve. If the model is trained further, up to 100 epochs, the loss increases to 165.86. Hence, the proposed model performs best at 75 epochs.

Figure 13 shows the confusion matrix parameters for 25, 50, 75, and 100 epochs. The results are reported for the Jaccard Index, Dice Coefficient, Precision, Recall, Accuracy, and Loss. The best accuracy is obtained at 75 epochs, together with a much lower loss.
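The selection of 75 epochs can be reproduced from the validation loss values quoted above. This is a small sketch that uses only the four loss values stated in the text; the other Table 4 columns are not quoted here and are therefore left out, and the variable names are illustrative.

```python
# Validation loss per number of epochs, as quoted in the text from Table 4.
val_loss = {25: 28.17, 50: 23.37, 75: 11.56, 100: 165.86}

# Epoch setting with the lowest validation loss.
best_epochs = min(val_loss, key=val_loss.get)

# All settings ordered from lowest to highest validation loss.
ranking = sorted(val_loss, key=val_loss.get)
```

Ranking by loss alone places 75 epochs first and 100 epochs last, which agrees with the conclusion drawn from the table.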

Comparison with State-of-the-Art Techniques
The proposed scheme has been compared with current state-of-the-art methods on dermoscopy images in terms of both the Jaccard Coefficient and accuracy. Table 5 provides a breakdown of both measures for each method. This analysis shows that the proposed framework achieves a superior overall accuracy compared to the state-of-the-art approaches. The Jaccard Coefficient and accuracy differ from one study to the next because the studies employed different datasets (ISBI-2016, ISBI-2017, and PH2). Yuan et al. [10,11] report a Jaccard Coefficient of 0.963 on the ISBI-2016 dataset and 0.78 on the ISBI-2017 dataset when employing convolutional neural networks.

Conclusions and Future Scope
Medical image analysis is a challenging task that requires various computational techniques across the hierarchy of imaging applications, so different types of analysis, including image pre-processing, classification, segmentation, compression, and security, must be taken into account. In the literature, various authors have worked on the segmentation of skin lesions. This study proposed a modified U-Net model architecture for the segmentation of skin lesions in dermoscopy images so that skin disease can be classified accurately. The dermoscopy images are taken from the PH2 dataset, which contains 200 images. The proposed model has been analyzed with batch sizes of 8, 18, and 32, the Adam, Adadelta, and SGD optimizers, and 25, 50, 75, and 100 epochs. It works best with a batch size of 8, the Adam optimizer, and 75 epochs, giving an accuracy of 96.27%, a Jaccard Index of 96.35%, and a Dice Coefficient of 89.01%. Hence, there is still scope for improving the Dice Coefficient and the Precision of the modified U-Net architecture. Moreover, in future, the segmented images can be used for classification to improve classification accuracy.

Data Availability Statement: Not applicable as the study did not require ethical approval. The data (PH2 dataset) is available in a publicly accessible repository.