Article

A Collaborative Learning Model for Skin Lesion Segmentation and Classification

Ying Wang, Jie Su, Qiuyu Xu and Yixin Zhong
1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China
2 Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan 250022, China
3 Artificial Intelligence Research Institute, University of Jinan, Jinan 250022, China
* Author to whom correspondence should be addressed.
Diagnostics 2023, 13(5), 912; https://doi.org/10.3390/diagnostics13050912
Submission received: 31 January 2023 / Revised: 19 February 2023 / Accepted: 24 February 2023 / Published: 28 February 2023

Abstract
The automatic segmentation and classification of skin lesions are two essential tasks in computer-aided skin cancer diagnosis. Segmentation aims to detect the location and boundary of the skin lesion, while classification evaluates the type of skin lesion. The location and contour information of lesions provided by segmentation is essential for the classification of skin lesions, while the classification of skin diseases helps generate target localization maps that assist the segmentation task. Although segmentation and classification are studied independently in most cases, we find that meaningful information can be extracted from the correlation between the two tasks, especially when sample data are insufficient. In this paper, we propose a collaborative learning deep convolutional neural network (CL-DCNN) model based on teacher–student learning for dermatological segmentation and classification. To generate high-quality pseudo-labels, we provide a self-training method in which the segmentation network is selectively retrained on pseudo-labels screened by a classification network. Specifically, we obtain high-quality pseudo-labels for the segmentation network by introducing a reliability measure. We also employ class activation maps to improve the localization ability of the segmentation network. Furthermore, we provide lesion contour information via segmentation masks to improve the recognition ability of the classification network. Experiments are carried out on the ISIC 2017 and ISIC Archive datasets. The CL-DCNN model achieves a Jaccard index of 79.1% on the skin lesion segmentation task and an average AUC of 93.7% on the skin disease classification task, which is superior to advanced skin lesion segmentation and classification methods.

1. Introduction

Skin cancer is one of the most common and deadly cancers. The American Cancer Society estimated that there would be approximately 97,920 new cases of melanoma in 2022 [1]. Early diagnosis and treatment of skin cancer are critical: apart from early surgical excision, skin cancer lacks effective treatment options and has a poor prognosis. Therefore, the computer-aided diagnosis of skin diseases has been increasingly investigated to assist dermatologists in improving diagnostic accuracy, efficiency, and objectivity.
Accurate detection of a skin lesion’s boundary can help pathologists mitigate noise interference and obtain contour information [2]. With a large amount of labeled data, deep learning has achieved advanced performance in image processing. However, obtaining pixel-level annotations for segmentation is often expensive for dermoscopic images, as generating accurate annotations requires specialized skills [3]. Many semi-supervised and weakly supervised learning methods have been proposed for segmentation when only a small quantity of pixel-level labeled data is available; these methods use unlabeled or weakly labeled data to realize accurate segmentation. Self-training is a semi-supervised method that uses a teacher model, trained on labeled data, to create synthetic labels for unlabeled examples [4]; a student model can then be trained with the pseudo-labels generated by the teacher model [5]. Weakly supervised learning is an umbrella term covering a variety of studies that attempt to construct predictive models by learning with weak supervision [6]. In weakly supervised methods, image-level labeled data can be used to train classification networks to generate class activation maps (CAMs) [7], and the pseudo-labels derived from the CAMs are then employed to train the segmentation network and improve its performance.
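To make the self-training idea concrete, the following minimal PyTorch-style sketch shows a teacher model labeling unlabeled images and a student model training on the union of labeled and pseudo-labeled data. The model, loader, and criterion names are illustrative assumptions, not the paper’s implementation.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_loader, threshold=0.5):
    """Label unlabeled images with the trained teacher (hard pseudo-masks)."""
    teacher.eval()
    pseudo_pairs = []
    for images in unlabeled_loader:
        probs = torch.sigmoid(teacher(images))   # (B, 1, H, W) lesion probability
        pseudo_pairs.append((images, (probs > threshold).float()))
    return pseudo_pairs

def train_student(student, labeled_pairs, pseudo_pairs, criterion, optimizer):
    """One epoch over labeled data plus teacher-generated pseudo-labels."""
    student.train()
    for images, masks in list(labeled_pairs) + pseudo_pairs:
        optimizer.zero_grad()
        loss = criterion(student(images), masks)
        loss.backward()
        optimizer.step()
```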
The classification of skin diseases (melanoma, nevus, and seborrheic keratosis) is essential to assist physicians in diagnosing skin cancer. The dermatological classification task is challenging for four reasons: (1) the low contrast between a lesion and its surrounding skin tissue results in fuzzy lesion boundaries; (2) lesions of different types may share visual similarities, while lesions of the same type may differ visually; (3) skin lesions vary significantly in visual appearance, which may be corrupted by artifacts such as hair, blood vessels, and air bubbles; and (4) labeling skin disease types on dermoscopic images requires specialized knowledge, resulting in a small amount of image-level labeled training data. Segmentation can help remove distractions from dermoscopic images and is thus highly beneficial for improving the accuracy of lesion classification [2]. In many medical image classification methods, accurate segmentation is considered the first step of the classification task, and many researchers [8,9,10,11] focus on mask-based methods to improve classification performance. These approaches exploit segmentation masks either to crop skin lesion images, to remove the surrounding background, or as an additional input channel for model training [11]. However, inaccurate masks may disturb the judgment of the classification network, so the importance of mask accuracy should be emphasized when masks are used to improve classification performance.
In most cases, the segmentation and classification of skin lesions are studied independently. As Figure 1 shows, to improve skin lesion segmentation and classification performance under limited annotation data, we explore the correlation between the two tasks so that each can learn more helpful information from the other.
This paper proposes a deep convolutional neural network model, termed CL-DCNN, for the collaborative learning of dermatological classification and segmentation. The contributions of this work are three-fold:
(1)
We propose a CL-DCNN model for accurate skin lesion segmentation and classification. Different from the methods dedicated to segmentation or classification, the model tries to leverage the intrinsic correlation in segmentation and classification tasks, improving segmentation and classification performance with limited annotation data.
(2)
We provide a self-training method for segmentation by generating high-quality pseudo-labels. Specifically, to alleviate the potential segmentation performance degradation incurred by incorrect pseudo-labels, we screen reliable pseudo-labels based on the similarity between pseudo-labels and ground truth for selective retraining.
(3)
We employ class activation maps to improve the localization ability of the segmentation network and apply lesion masks to improve the recognition ability of the classification network.

2. Related Work

2.1. Segmentation and Classification of Skin Lesions

In medical image processing, automatic disease diagnosis has been widely explored and applied to various practical computer-aided diagnosis and treatment systems [12]. Classification and segmentation are two fundamental tasks in dermatoscopy image processing. Classification can predict the type or severity of skin disease, and segmentation aims to identify pixel-level fine-grained lesion regions.
The shape information of the lesion area is essential for skin disease discrimination, and existing works have explored skin lesion segmentation methods to assist dermatologists in diagnosing diseases. Lei et al. [13] proposed a generative adversarial network that enhances the decision making of the discriminative module through joint learning. Wang et al. [14] introduced a knowledge-aware deep framework that incorporates clinical knowledge into the task of skin lesion segmentation. Wang et al. [15] integrated a novel boundary attention gate into a transformer, enabling the network to model global long-range dependencies and capture more local details. Bi et al. [16] fused the extracted user input and image features in multiple stages to alleviate information loss. Mirikharaji et al. [17] encoded a star-shape prior into the loss function, penalizing non-star-shape segments in FCN prediction maps to guarantee a global structure in the segmentation results. Wang et al. [18] designed a novel bi-directional dermoscopic feature learning framework that models the complex correlation between skin lesions and their informative context.
Automatic skin lesion classification in dermoscopic images is critical to improving diagnostic accuracy and reducing melanoma mortality [19]. Li et al. [20] proposed a difficulty-aware meta-optimization scheme for the classification of rare diseases, optimized by dynamically down-weighting easy tasks and emphasizing complex ones. Yu et al. [21] used sequential dermoscopic images for early melanoma diagnosis, reducing the misdiagnosis of borderline cases caused by lesions’ temporal and morphological changes. Zhang et al. [19] designed an attention residual learning block that jointly uses residual learning and a novel attention learning mechanism to improve the classification network’s ability for discriminative representation. Zhang et al. [22] used dual DCNNs with a synergic network, enabling the two networks to learn from each other to address the challenges caused by intra-class variation and inter-class similarity in skin lesion classification.

2.2. Segmentation and Classification Collaborative Learning

Segmentation can provide the location and contour information of the skin lesion for classification. The benefits of segmentation to classification motivate researchers to solve problems through collaborative learning of multiple tasks [10]. Yu et al. [8] designed classification networks to use segmentation results to learn more representative and specific features, alleviating the shortage of training data. Shen et al. [9] proposed a mixed-supervision guided method and a residual-aided classification U-Net model for joint segmentation and benign–malignant classification. Xie et al. [10] used multi-task generative adversarial networks to generate accurate masks to improve classification performance. Mahbod et al. [11] studied the effect of using segmentation masks in different ways on the performance of dermatological classification.
The potential benefit of classification results to the lesion segmentation task can be achieved using the weakly supervised learning strategy [23]. This method is usually implemented by CAMs [7] to locate objects of interest in images to train the segmentation network. Zhang et al. [24] leveraged an image classification branch to generate CAMs for the annotated categories, which are further pruned into confident yet tiny object/background regions. Jo et al. [25] proposed the Puzzle-CAM algorithm to narrow the supervision gap between fully supervised semantic segmentation and weakly supervised semantic segmentation using image-level labels. Wei et al. [26] used classifiers to activate hard-to-discriminate regions to improve segmentation performance. Qin et al. [27] designed the spotlight branch and compensation branch to obtain weighted CAMs to provide supervisory signals for recalibration. Yuan et al. [28] reported a gated recurrent network with dual classification assistance for semantic segmentation to solve the blurred boundaries problem.
Many methods use the potential correlation between segmentation and classification tasks, which are tasks that can learn from each other. Zhou et al. [12] jointly improved the performance of disease grading and lesion segmentation through a semi-supervised collaborative learning method with an attention mechanism. Xie et al. [23] proposed a mutual bootstrapping model for automated skin lesion segmentation and classification. Jin et al. [29] designed a cascaded knowledge diffusion network to transfer and aggregate the knowledge learned from different tasks.

2.3. Self-Training for Segmentation

To fully use unlabeled data to improve segmentation performance, Yang et al. [5] performed selective retraining by ranking the reliability of unlabeled images based on overall prediction-level stability. Wang et al. [30] separated reliable and unreliable pixels via the entropy of predictions, pushed each unreliable pixel into a category-wise queue of negative samples, and trained the segmentation model with all candidate pixels. Zheng et al. [31] explicitly estimated segmentation prediction uncertainty with the assistance of an auxiliary classifier and ignored unreliable pixels during self-training to improve segmentation performance.
Despite the impressive results obtained by the above methods, they do not pay enough attention to the correlation between segmentation and classification tasks. We therefore design a CL-DCNN model based on the relationship between the tasks: it attends to the correlation between classification and segmentation by filtering reliable pseudo-labels, generating masks, and generating class activation maps, so that the two tasks can collaboratively learn more information under limited labeled data.

3. Method

3.1. Problem Definition

We propose a CL-DCNN model for accurate dermatological segmentation and classification, which consists of four networks: the teacher segmentation network, the pseudo-label quality evaluation network, the skin disease classification network, and the student segmentation network. In this model, some terms and definitions are shown in Table 1, and the pipeline is summarized in Figure 2.

3.2. Generating Reliable Pseudo-Labels

In the self-training scheme [32], unlabeled data can be used to generate pseudo-labels that help the segmentation network learn more image information under limited labeled data. However, some pseudo-labels generated by the trained teacher segmentation network are of poor quality, and if pseudo-labels of uneven quality are directly employed to train the student segmentation network, it can easily overfit the noise. We want the CL-DCNN model to obtain reliable pseudo-labels automatically. To realize this, we need to solve three problems: (1) how to generate pseudo-labels, (2) how to screen reliable pseudo-labels, and (3) how to obtain reliable pseudo-labels. We therefore design a reliable pseudo-label generation method based on the similarity between pseudo-labels and ground truth for selective retraining. This method realizes the automatic screening of reliable pseudo-labels by training a pseudo-label quality evaluation classification network.

3.2.1. Generating Pseudo-Labels

To generate pseudo-labels, we build a teacher segmentation network, teacher-SN, based on Deeplabv3+ [33], which is pre-trained on the MS-COCO [34] and PASCAL VOC 2012 [35] datasets. To adapt the Deeplabv3+ network to the skin lesion segmentation task, we remove its last convolutional layer and add a new convolutional layer with a single output channel for prediction. The weights of the new layer are randomly initialized, and the activation function of the last layer is set to the sigmoid function. Pixels at the edges of lesions are usually difficult to classify, so we employ a rank loss [23] to encourage the segmentation network to focus on hard pixels and learn more discriminative representations. As Figure 3a shows, the teacher-SN is trained with the segmentation training set $D_l^P$. Pseudo-labels can then be produced by inputting unlabeled images into the trained teacher-SN. However, incorrect predictions on some hard examples may negatively impact the subsequent self-training process.
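As an illustration of this head replacement, the sketch below adapts torchvision’s DeepLabv3 (a close, publicly available relative of the DeepLabv3+ used in the paper, pre-trained on a COCO subset with VOC classes) to single-channel lesion prediction; the rank loss is omitted for brevity, and this is an assumed approximation rather than the authors’ code.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

# Pre-trained segmentation backbone (COCO subset with VOC classes).
teacher_sn = deeplabv3_resnet50(weights="DEFAULT")

# Swap the final 21-class convolution for a randomly initialized
# single-channel lesion head, as described above.
teacher_sn.classifier[4] = nn.Conv2d(256, 1, kernel_size=1)

@torch.no_grad()
def predict_mask(model, image, threshold=0.5):
    """image: (B, 3, H, W) tensor; returns a binary lesion mask."""
    model.eval()
    logits = model(image)["out"]              # (B, 1, H, W)
    return (torch.sigmoid(logits) > threshold).float()
```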

3.2.2. Screening Reliable Pseudo-Labels

To realize the automatic screening of reliable pseudo-labels, we build a pseudo-label quality evaluation network, quality-CN, and then generate a classification training set that records the quality grade of each pseudo-label for training quality-CN.
The quality-CN is built upon the advanced Xception network [36], which is pre-trained on the ImageNet dataset [37]. After global average pooling, the features are input to a fully connected layer of C randomly initialized neurons followed by a softmax activation function. The quality-CN takes the original image concatenated with its pseudo-label as input and classifies the pseudo-label’s reliability (reliable or unreliable); therefore, C is set to 2. We optimize quality-CN by minimizing the cross-entropy loss.
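A possible construction of quality-CN with timm is sketched below; `in_chans=4` accommodates the RGB image concatenated with the single-channel pseudo-label (timm adapts the pretrained stem convolution), and the model name may be "xception" or "legacy_xception" depending on the timm version. Note that `nn.CrossEntropyLoss` applies the softmax internally.

```python
import timm
import torch
import torch.nn as nn

# ImageNet-pre-trained Xception; in_chans=4 lets timm adapt the stem
# convolution so the pseudo-label rides along as a fourth channel.
quality_cn = timm.create_model("xception", pretrained=True,
                               in_chans=4, num_classes=2)
criterion = nn.CrossEntropyLoss()  # applies the softmax internally

def quality_step(image, pseudo_mask, quality_label, optimizer):
    """image: (B, 3, H, W); pseudo_mask: (B, 1, H, W); quality_label: (B,) in {0, 1}."""
    x = torch.cat([image, pseudo_mask], dim=1)   # (B, 4, H, W)
    optimizer.zero_grad()
    loss = criterion(quality_cn(x), quality_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```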
Inspired by ST++ [5], we generate an image-level pseudo-label quality grade training set $D_{pseudo}^I$ based on prediction-level stability over the entire training course for the training of quality-CN. $D_{pseudo}^I$ consists of $kN$ dermoscopy images $\{X_l^P\}_{i=1}^{kN}$, pseudo-labels $\{Y_{pseudo}^P\}_{i=1}^{kN}$, and the image-level quality labels $\{Y_{pseudo}^I\}_{i=1}^{kN}$ of the pseudo-labels. A pseudo-label $Y_{pseudo}^P$ is generated by inputting each dermoscopy image $X_l^P$ from dataset $D_l^P$ into a checkpoint of teacher-SN. The quality label $Y_{pseudo}^I$ represents the reliability of the pseudo-label and is obtained by computing its similarity with the ground truth $Y_l^P$ ($Y_l^P \in D_l^P$). The overall process of generating the pseudo-label quality grade training set $D_{pseudo}^I$ is shown in Figure 3b. Since the training model gradually converges and achieves different performances at intermediate training stages, we input each image $X_l^P$ from $D_l^P$ into $k$ checkpoints of teacher-SN to generate $k$ pseudo-labels $\{Y_{pseudo}^P\}_{i=1}^{k}$ of different qualities. Checkpoints are intermediate models that have not fully converged and are commonly used to save parameters. Then, to measure the reliability of each pseudo-label $Y_{pseudo}^P$, we compute the Jaccard score $s$ between the pseudo-label $Y_{pseudo}^P$ and the ground truth $Y_l^P$:
$$s = \mathrm{Jaccard}(Y_l^P, Y_{pseudo}^P) = \frac{|Y_l^P \cap Y_{pseudo}^P|}{|Y_l^P \cup Y_{pseudo}^P|} \tag{1}$$
The Jaccard score serves as a measurement of stability and further reflects the reliability of $Y_{pseudo}^P$. Based on the Jaccard scores, the pseudo-labels are classified into high-quality ($Y_{pseudo}^I = 1$) and low-quality ($Y_{pseudo}^I = 0$) ones:
$$Y_{pseudo}^I = \begin{cases} 1, & s \ge t \\ 0, & s < t \end{cases} \tag{2}$$
where $t$ is an empirically chosen threshold. $D_{pseudo}^I$ is generated based on the quality grades of the pseudo-labels.
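In code, Equations (1) and (2) reduce to a few lines; the threshold value below is only a placeholder, since the paper sets $t$ empirically.

```python
def jaccard(gt_mask, pseudo_mask, eps=1e-7):
    """Equation (1): Jaccard (IoU) score between two binary masks."""
    inter = (gt_mask * pseudo_mask).sum()
    union = gt_mask.sum() + pseudo_mask.sum() - inter
    return float(inter / (union + eps))

def quality_label(gt_mask, pseudo_mask, t=0.8):
    """Equation (2): 1 (reliable) if the Jaccard score reaches t, else 0.
    t = 0.8 is illustrative; the paper chooses t empirically."""
    return int(jaccard(gt_mask, pseudo_mask) >= t)
```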
After obtaining the classification training set $D_{pseudo}^I$, the quality-CN is trained to evaluate the quality of the pseudo-labels. As shown in Figure 3c, each image $X_l^P$ and its corresponding pseudo-label $Y_{pseudo}^P$ are concatenated along the channel dimension and input into quality-CN, which is trained according to the category labels $Y_{pseudo}^I$ in $D_{pseudo}^I$.

3.2.3. Obtaining Reliable Pseudo-Labels

$D_u$ is an unlabeled segmentation training set containing $n$ unlabeled images $\{X_u\}_{i=1}^{n}$. We input each unlabeled image $X_u$ from $D_u$ into teacher-SN to generate a pseudo-label $Y_u^P$. The $n$ unlabeled images $\{X_u\}_{i=1}^{n}$ and their corresponding pseudo-labels $\{Y_u^P\}_{i=1}^{n}$ are then concatenated along the channel dimension and input into the trained quality-CN, which screens $n'$ ($n' < n$) reliable pseudo-labels. The screened reliable pseudo-label dataset is denoted by $D_{pseudo}^P = \{(X_u, Y_u^P)\}_{i=1}^{n'}$. The pseudocode for generating reliable pseudo-labels is illustrated in Algorithm 1, which serves as a strong baseline for our self-training method.

3.3. Segmentation in Weakly Labeled Data

In addition to unlabeled data, image-level labeled data can also be used to train segmentation networks through weak supervision. To allow the segmentation network to learn more information under limited annotation data, we use both unlabeled and image-level labeled data to train the student-SN in the form of pseudo-labels and class activation maps. The uniqueness of this method lies in mining the potential benefits of classification for segmentation from two aspects to alleviate the shortage of pixel-level labeled data. On the one hand, we employ quality-CN to evaluate the quality level of pseudo-labels and provide reliable pseudo-labels to student-SN for self-training. On the other hand, we use disease-CN to generate accurate CAMs that transfer a localization prior into student-SN. We introduced the generation of reliable pseudo-labels in Section 3.2; next, we focus on the production and employment of CAMs.

3.3.1. Generating CAMs

CAMs were first proposed in [7] and are obtained through global average pooling. A CAM for a particular category indicates the discriminative image regions used by the CNN to identify that category. The CAM approach can localize objects from a classification model [38] and is widely used in weakly supervised semantic segmentation. However, in most circumstances, the CAMs directly generated by the classification network are not precise enough. The masks generated by the segmentation network possess the location and contour information of the skin lesion. Therefore, we employ masks to help disease-CN generate precise CAMs.
We use the classification training set $D_l^I$ and the masks generated by the teacher-SN to train the skin disease classification network disease-CN. Each classification training image and its corresponding lesion mask are concatenated as the input to disease-CN, which enhances disease-CN’s localization ability so that it produces accurate CAMs.
Algorithm 1: Generating reliable pseudo-labels

Input: pixel-level labeled dataset $D_l^P = \{(X_l^P, Y_l^P)\}_{i=1}^{N}$, unlabeled dataset $D_u = \{X_u\}_{i=1}^{n}$, teacher-SN $T$, quality-CN $Q$
Output: reliable pseudo-labels and their corresponding images

// Train T to generate pseudo-labels
Train $T$ on $D_l^P$ and save $k$ checkpoints $\{T_j\}_{j=1}^{k}$
// Train Q to screen reliable pseudo-labels
for each $X_l^P \in D_l^P$ do
    for each $T_j \in \{T_j\}_{j=1}^{k}$ do
        Generate pseudo-label $Y_{pseudo}^P = T_j(X_l^P)$
        Compute the Jaccard score $s$ between $Y_l^P$ and $Y_{pseudo}^P$ with Equation (1)
        Set the category $Y_{pseudo}^I$ of $Y_{pseudo}^P$ according to $s$ by Equation (2)
Denote the pseudo-label quality grade training set as $D_{pseudo}^I = \{((X_l^P, Y_{pseudo}^P), Y_{pseudo}^I)\}_{i=1}^{kN}$
Train $Q$ on $D_{pseudo}^I$
// Obtain reliable pseudo-labels from $D_u$
$D_{pseudo}^P = \{\}$; $n' = 0$
for each $X_u \in D_u$ do
    Generate pseudo-label $T(X_u)$
    if $Q(X_u, T(X_u)) = 1$ then
        $D_{pseudo}^P = D_{pseudo}^P \cup \{(X_u, T(X_u))\}$; $n' = n' + 1$
return $D_{pseudo}^P = \{(X_u, T(X_u))\}_{i=1}^{n'}$
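For readers who prefer runnable code, a condensed Python rendition of Algorithm 1 could look as follows. It reuses `predict_mask` and `quality_label` from the earlier sketches, assumes single-image batches, and `train_quality_cn` stands in for a standard cross-entropy training loop (an assumed helper, not the paper’s code).

```python
import torch

def generate_reliable_pseudo_labels(teacher_ckpts, quality_cn,
                                    labeled_set, unlabeled_set):
    """teacher_ckpts: k saved teacher-SN models; labeled_set: [(x, y)];
    unlabeled_set: [x]; tensors carry a leading batch dimension of 1."""
    # Stage 1: build the quality-grade training set from the k checkpoints.
    quality_train = []
    for x, y in labeled_set:
        for ckpt in teacher_ckpts:
            y_pseudo = predict_mask(ckpt, x)            # defined earlier
            quality_train.append((x, y_pseudo, quality_label(y, y_pseudo)))
    train_quality_cn(quality_cn, quality_train)         # assumed CE training helper

    # Stage 2: screen pseudo-labels for the unlabeled images.
    teacher = teacher_ckpts[-1]                         # final teacher model
    reliable = []
    with torch.no_grad():
        for x in unlabeled_set:
            y_pseudo = predict_mask(teacher, x)
            logits = quality_cn(torch.cat([x, y_pseudo], dim=1))
            if logits.argmax(dim=1).item() == 1:        # classified as reliable
                reliable.append((x, y_pseudo))
    return reliable
```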

3.3.2. Refining the Segmentation

As shown in Figure 4, images from the segmentation training set $D_l^P \cup D_{pseudo}^P$ and their corresponding masks are concatenated along the channel dimension and input into the trained disease-CN. We weight the feature maps produced by the last convolutional layer of disease-CN using the class-specific weights of the output layer. Then, all channels of the weighted feature maps are summed to generate the CAMs.
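This weighting step corresponds to the classic CAM computation [7]; a sketch follows, under the assumption that `features` are taken from disease-CN’s last convolutional layer and `fc_weight` is its output-layer weight matrix.

```python
import torch
import torch.nn.functional as F

def compute_cam(features, fc_weight, class_idx):
    """Classic CAM [7]: weight the last-layer feature maps with the output-layer
    weights of one class, sum over channels, then normalize to [0, 1].
    features: (B, C, h, w); fc_weight: (num_classes, C)."""
    w = fc_weight[class_idx].view(1, -1, 1, 1)          # (1, C, 1, 1)
    cam = F.relu((features * w).sum(dim=1, keepdim=True))
    cam = cam - cam.amin(dim=(2, 3), keepdim=True)
    return cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-7)
```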
The backbone network of the student segmentation network student-SN is the same as that of teacher-SN. To migrate the lesion location information from the CAMs into the student-SN, we add a fusion layer after the encoder of the student-SN. The feature maps extracted by the encoder are stitched with the CAMs along the channel dimension, and the fusion layer fuses the spliced information using a convolutional layer followed by batch normalization (BN) and a ReLU activation. The fused feature maps are then fed into the decoder to refine the segmentation. The enhanced CAMs serve as a prior that helps the student-SN learn the location information of lesions and reduces the need for dense pixel-level annotations. In addition, the student-SN is trained with the pixel-level labeled dataset $D_l^P$ and the pseudo-label dataset $D_{pseudo}^P$ to learn more image features.
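A minimal fusion-layer module consistent with this description might look as follows; the 1x1 kernel size is an assumption, as the paper does not state it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAMFusion(nn.Module):
    """Fusion layer: concatenate encoder features with the CAM prior,
    then convolve with BN and ReLU, as described above."""
    def __init__(self, feat_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=1),
            nn.BatchNorm2d(feat_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats, cam):
        # Resize the CAM to the encoder's spatial resolution before stitching.
        cam = F.interpolate(cam, size=feats.shape[2:],
                            mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([feats, cam], dim=1))
```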

3.4. Utilizing Masks to Classify

In clinical environments, pathologists generally diagnose melanoma according to lesion border information. Pigmented nevi are generally symmetrical in shape, mostly round, with well-defined margins, whereas melanoma is asymmetrical in shape, with irregular and indistinct margins. The contour information of the lesion is therefore crucial for the diagnosis of melanoma. In addition, noise in dermoscopic images (such as hairs and bubbles) may interfere with the discrimination of disease-CN. With the assistance of quality-CN and disease-CN, the student-SN’s segmentation performance can be improved by obtaining image-feature and lesion-position information from pseudo-labels and CAMs. We therefore employ the segmentation masks generated by the student-SN to provide contour information and help disease-CN focus on the areas of skin lesions that are most meaningful for diagnosis, reducing the impact of noise and relatively unimportant background areas on category determination.
The disease-CN’s structure is roughly the same as quality-CN’s. The difference is that, after global average pooling, the features are input to a randomly initialized fully connected layer with three neurons followed by a softmax activation function. We use the skin disease classification training set $D_l^I$ to train disease-CN. During the training phase, the images $X_l^I$ and masks $T(X_l^I)$ are concatenated along the channel dimension as the input to disease-CN, aiming to improve its diagnostic performance for skin diseases.
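Continuing the earlier timm-based sketch (and reusing its imports), disease-CN can be assembled analogously with a three-class head; the helper below is illustrative, not the authors’ implementation.

```python
# Disease-CN mirrors quality-CN but ends in a three-class head
# (melanoma, nevus, seborrheic keratosis); in_chans=4 again absorbs the mask.
disease_cn = timm.create_model("xception", pretrained=True,
                               in_chans=4, num_classes=3)

def disease_step(image, lesion_mask, disease_label, optimizer):
    """lesion_mask: binary mask from the segmentation network."""
    x = torch.cat([image, lesion_mask], dim=1)   # (B, 4, H, W)
    optimizer.zero_grad()
    loss = nn.CrossEntropyLoss()(disease_cn(x), disease_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```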

4. Experiments

4.1. Dataset

We evaluate the proposed CL-DCNN model on two dermoscopic image datasets.
(1)
ISIC 2017: ISIC 2017 is a skin lesion segmentation and classification dataset provided by the International Skin Imaging Collaboration. The dataset includes 2000 dermoscopic images for training, 150 for validation, and 600 for testing. Each dermoscopic image has a corresponding pixel-level expert annotation for segmentation and a gold-standard diagnosis of the lesion (melanoma, nevus, or seborrheic keratosis) for classification. We use the pixel-level labeled data of ISIC 2017 to train the teacher-SN and student-SN to segment skin lesions, and we use the image-level labeled data to train the disease-CN to diagnose the type of skin disease.
(2)
ISIC Archive: ISIC Archive is a skin lesion classification dataset that contains 1320 image-level annotated dermoscopic images: 466 cases diagnosed as melanoma, 32 as seborrheic keratosis, and 822 as nevus. We use the ISIC Archive to expand the disease-CN’s training data and treat its images as unlabeled data for generating pseudo-labels. Details of the two datasets are given in Table 2.

4.2. Evaluation Metrics

(1)
Segmentation evaluation metrics: we use five indicators to evaluate the segmentation performance: the Jaccard index (JA), the Dice coefficient (DI), pixel-wise accuracy (pixel-AC), pixel-wise sensitivity (pixel-SE), and pixel-wise specificity (pixel-SP).
(2)
Classification evaluation metrics: we use four indicators to evaluate the classification performance: the area under the receiver operating characteristic curve (AUC), accuracy (AC), sensitivity (SE), and specificity (SP).
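For reference, the five segmentation indicators can be computed from pixel-wise confusion counts as in this sketch (flattened binary numpy masks assumed); the classification AUC can be obtained with scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for flattened binary numpy masks in {0, 1}."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    return {
        "JA": tp / (tp + fp + fn),               # Jaccard index
        "DI": 2 * tp / (2 * tp + fp + fn),       # Dice coefficient
        "pixel-AC": (tp + tn) / (tp + tn + fp + fn),
        "pixel-SE": tp / (tp + fn),              # sensitivity (recall)
        "pixel-SP": tn / (tn + fp),              # specificity
    }

# Classification AUC, e.g. one-vs-rest per disease type:
# auc = roc_auc_score(y_true_binary, y_score)
```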

4.3. Experimental Details

The teacher-SN is trained on the pixel-level labeled ISIC 2017 dataset. We input the images from ISIC 2017 into $k$ (empirically set to 5) different checkpoints of the trained teacher-SN to generate pseudo-labels. Based on the similarity between the pseudo-labels and the ISIC 2017 ground truth, a classification training set $D_{pseudo}^I$ recording the quality level of the pseudo-labels is generated, and the quality-CN is trained on $D_{pseudo}^I$. The images without pixel-level annotations from the ISIC Archive are input into the trained teacher-SN to generate pseudo-labels, whose quality is evaluated by quality-CN to obtain a reliable pixel-level pseudo-label training set $D_{pseudo}^P$. The disease-CN is trained on the image-level labeled ISIC 2017 and ISIC Archive datasets. The student-SN is trained on the pixel-level labeled ISIC 2017 dataset and $D_{pseudo}^P$.
Before training, the images are preprocessed with random affine transformations, vertical flips, horizontal flips, and other data augmentation operations to increase the data’s diversity and prevent overfitting. We use the Adam algorithm to optimize the networks; the initial learning rates are set to 0.0001, and the maximum iteration period is 500. The ISIC 2017 validation set is used to monitor the CL-DCNN model’s convergence and to terminate the training process if the model starts to overfit. In the testing phase, the trained CL-DCNN model is directly applied to the ISIC 2017 testing set to evaluate the skin lesion segmentation and classification performance.
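A sketch of the preprocessing and optimizer setup described above, reusing `teacher_sn` from the earlier sketch; the augmentation parameters are illustrative assumptions (the paper does not report them), and for segmentation training the same geometric transforms would need to be applied jointly to image and mask.

```python
import torch
from torchvision import transforms

# Augmentations named above; parameter values are illustrative only.
train_transform = transforms.Compose([
    transforms.RandomAffine(degrees=20, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.RandomVerticalFlip(),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Adam with the stated initial learning rate, one optimizer per sub-network.
optimizer = torch.optim.Adam(teacher_sn.parameters(), lr=1e-4)
```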

4.4. Experimental Results

4.4.1. Segmentation Results

We compare the segmentation performance of the CL-DCNN model with other skin disease segmentation methods on the ISIC 2017 testing set: FCN [39], U-Net [40], the generative adversarial network with dual discriminators DAGAN [13], the edge and neighborhood guidance network ENGNet [41], the neighborhood context refinement network NCRNet [42], AG-Net [43], and MultiResUNet [44]. Table 3 shows that CL-DCNN achieves superior segmentation results on three indicators, with a JA of 79.1%, a DI of 86.7%, and a pixel-AC of 94.1%. In particular, its JA of 79.1% is 0.5% higher than that of the second-best model, NCRNet. To demonstrate the performance of our method, we visualize the segmentation results at each stage of the CL-DCNN model in Figure 5; the second-stage segmentation results are more accurate and closer to the ground truth.

4.4.2. Classification Results

Table 4 shows the average classification performance of the CL-DCNN model compared with several classification methods: Xception [36], the advanced semi-supervised adversarial classification model SSAC [45], the attention residual learning convolutional neural network ARL-CNN [19], the synergic deep learning model SDL [46], MWNL-CLS [47], and the mutual bootstrapping deep convolutional neural network MBDCNN [23]. The CL-DCNN model obtains the highest AC, SP, and AUC among the compared approaches, achieving an AUC of 93.7%, an SP of 94.7%, and an AC of 90.7%, which improve on the next-best results by 0.9%, 0.4%, and 0.1%, respectively. These performance gains over the base model and five recent solutions indicate the superiority of the proposed CL-DCNN model.

4.4.3. Advantages of CAMs and Pseudo-Labels

The uniqueness of the proposed skin lesion self-training segmentation method is that the student-SN can learn from both CAMs and pseudo-labels (PLs). We transfer high-quality lesion-area activation maps to the student-SN to improve its localization ability. Moreover, we provide a reliable pseudo-label generation method based on the similarity between pseudo-labels and ground truth for selective retraining.
CAMs build a generic localizable deep representation that exposes the implicit attention of CNNs on an image [7]. To exhibit the effectiveness of CAMs, we visualize the segmentation results with and without CAMs in Figure 6. Figure 6c shows that CAMs activate the lesion area: the closer to the lesion center, the higher the network response. In addition, the location of the CAMs is close to the ground truth. Therefore, we use CAMs to assist the segmentation network in obtaining lesion location information. As shown in Figure 6d, with the help of CAMs, the segmentation results generated by our method are more consistent with the ground truth than those produced without CAMs. Consequently, we believe that CAMs help CL-DCNN better locate the lesion area and achieve better segmentation performance.
Figure 7 shows the reliable pseudo-labels and unreliable pseudo-labels evaluated by quality-CN. The contour of the reliable pseudo-labels is consistent with the area of skin lesions. The screened reliable pseudo-labels show the potential to help the student-SN to reduce the need for pixel-level labeled data and learn from image features.
To verify the validity of the CAMs and pseudo-labels, we carried out the ablation experiments shown in Table 5. The results show that using CAMs and reliable pseudo-labels improves the segmentation performance in terms of JA, DI, and pixel-SE. Compared to the base model, our model improves the average JA by 0.8%, DI by 0.7%, and pixel-SE by 2.6%; notably, the JA improves from 78.3% to 79.1%. These results demonstrate that CAMs and pseudo-labels can improve segmentation performance.

4.4.4. Advantages of Masks

To illustrate the impact of masks on the classification network’s performance, we visualize the network’s attention in the form of CAMs in Figure 8. CAMs visualize the predicted class scores on a given image, highlighting the discriminative object parts detected by the CNN. After using the masks generated by teacher-SN, disease-CN pays more attention to the lesion area, and the information in the lesion area plays an essential role in skin disease judgment.
Table 6 shows the average classification performance for melanoma and seborrheic keratosis when no mask is provided to disease-CN and when the masks generated by teacher-SN or student-SN are provided. With the masks generated by teacher-SN, the classification performance improves by 0.2% in AC, 1.2% in SP, and 0.7% in AUC; with the masks generated by student-SN, it improves by 0.9% in AC, 0.7% in SE, and 0.9% in AUC. Masks are thus useful for improving classification performance, and the more accurate masks generated by the student-SN assist disease-CN in obtaining better classification results. We conclude that masks provide the location and contour information of skin lesions, from which disease-CN can extract discriminative features; the more accurate the masks, the more they promote accurate classification.

5. Conclusions

In this paper, we proposed a CL-DCNN model for the collaborative learning of dermatological segmentation and classification. The model fully exploits the correlation between the two tasks under limited annotation data, allowing the segmentation and classification networks to learn more information. The experimental results show that skin lesion segmentation performance can be improved by using the reliable pseudo-labels screened by the classification network and the target localization maps it generates. In addition, the accurate masks produced by the segmentation network help improve the discriminative ability of the classification network. The main limitation of this method is the model’s generalization, which makes it challenging to apply in clinical practice: we have tried applying the model to other data, but the results were not ideal. A potential direction is therefore to explore how to make predictions on data of other modalities accurate. In the future, we plan to extend this framework to domain adaptation to improve the model’s generalization ability.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W. and J.S.; software, Y.W.; validation, Y.W.; formal analysis, Y.W. and J.S.; investigation, Y.W. and J.S.; resources, Y.Z.; data curation, Y.W.; writing—original draft preparation, Y.W. and Q.X.; writing—review and editing, Y.W.; visualization, Y.W.; supervision, Y.W. and J.S.; project administration, Y.W.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Nos. 52001039 and 52171310), the Shandong Natural Science Foundation (No. ZR2019LZH005), the research fund of the Science and Technology on Underwater Vehicle Technology Laboratory (No. 2021JCJQ-SYSJJ-LB06903), the University Innovation Team Project of Jinan (No. 2019GXRC015), and the Science and Technology Improvement Project for Small and Medium-Sized Enterprises in Shandong Province (No. 2021TSGC1012).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 2021, 71, 7–33. [Google Scholar] [PubMed]
  2. Xie, Y.; Zhang, J.; Lu, H.; Shen, C.; Xia, Y. SESV: Accurate medical image segmentation by predicting and correcting errors. IEEE Trans. Med. Imaging 2020, 40, 286–296. [Google Scholar] [PubMed]
  3. Luo, X.; Chen, J.; Song, T.; Wang, G. Semi-supervised medical image segmentation through dual-task consistency. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 8801–8809. [Google Scholar]
  4. Scudder, H. Probability of error of some adaptive pattern-recognition machines. IEEE Trans. Inf. Theory 1965, 11, 363–371. [Google Scholar]
  5. Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. St++: Make self-training work better for semi-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4268–4277. [Google Scholar]
  6. Zhou, Z.H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 2018, 5, 44–53. [Google Scholar]
  7. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
  8. Yu, L.; Chen, H.; Dou, Q.; Qin, J.; Heng, P.A. Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans. Med. Imaging 2016, 36, 994–1004. [Google Scholar] [CrossRef]
  9. Shen, T.; Gou, C.; Wang, J.; Wang, F.Y. Simultaneous segmentation and classification of mass region from mammograms using a mixed-supervision guided deep model. IEEE Signal Process. Lett. 2019, 27, 196–200. [Google Scholar] [CrossRef]
  10. Xie, H.; He, Y.; Xu, D.; Kuo, J.Y.; Lei, H.; Lei, B. Joint segmentation and classification task via adversarial network: Application to HEp-2 cell images. Appl. Soft Comput. 2022, 114, 108156. [Google Scholar]
  11. Mahbod, A.; Tschandl, P.; Langs, G.; Ecker, R.; Ellinger, I. The effects of skin lesion segmentation on the performance of dermatoscopic image classification. Comput. Methods Programs Biomed. 2020, 197, 105725. [Google Scholar]
  12. Zhou, Y.; He, X.; Huang, L.; Liu, L.; Zhu, F.; Cui, S.; Shao, L. Collaborative learning of semi-supervised segmentation and classification for medical images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2079–2088. [Google Scholar]
  13. Lei, B.; Xia, Z.; Jiang, F.; Jiang, X.; Ge, Z.; Xu, Y.; Qin, J.; Chen, S.; Wang, T.; Wang, S. Skin lesion segmentation via generative adversarial networks with dual discriminators. Med. Image Anal. 2020, 64, 101716. [Google Scholar]
  14. Wang, X.; Jiang, X.; Ding, H.; Zhao, Y.; Liu, J. Knowledge-aware deep framework for collaborative skin lesion segmentation and melanoma recognition. Pattern Recognit. 2021, 120, 108075. [Google Scholar]
  15. Wang, J.; Wei, L.; Wang, L.; Zhou, Q.; Zhu, L.; Qin, J. Boundary-aware transformers for skin lesion segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021; Springer: Cham, Switzerland, 2021; pp. 206–216. [Google Scholar]
  16. Bi, L.; Fulham, M.; Kim, J. Hyper-fusion network for semi-automatic segmentation of skin lesions. Med. Image Anal. 2022, 76, 102334. [Google Scholar] [PubMed]
  17. Mirikharaji, Z.; Hamarneh, G. Star shape prior in fully convolutional networks for skin lesion segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Cham, Switzerland, 2018; pp. 737–745. [Google Scholar]
  18. Wang, X.; Jiang, X.; Ding, H.; Liu, J. Bi-directional dermoscopic feature learning and multi-scale consistent decision fusion for skin lesion segmentation. IEEE Trans. Image Process. 2019, 29, 3039–3051. [Google Scholar]
  19. Zhang, J.; Xie, Y.; Xia, Y.; Shen, C. Attention residual learning for skin lesion classification. IEEE Trans. Med. Imaging 2019, 38, 2092–2103. [Google Scholar] [PubMed]
  20. Li, X.; Yu, L.; Jin, Y.; Fu, C.W.; Xing, L.; Heng, P.A. Difficulty-aware meta-learning for rare disease diagnosis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; Springer: Cham, Switzerland, 2020; pp. 357–366. [Google Scholar]
  21. Yu, Z.; Nguyen, J.; Nguyen, T.D.; Kelly, J.; Mclean, C.; Bonnington, P.; Zhang, L.; Mar, V.; Ge, Z. Early Melanoma Diagnosis with Sequential Dermoscopic Images. IEEE Trans. Med. Imaging 2021, 41, 633–646. [Google Scholar]
  22. Zhang, J.; Xie, Y.; Wu, Q.; Xia, Y. Skin lesion classification in dermoscopy images using synergic deep learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16–20 September 2018; Springer: Cham, Switzerland, 2018; pp. 12–20. [Google Scholar]
  23. Xie, Y.; Zhang, J.; Xia, Y.; Shen, C. A mutual bootstrapping model for automated skin lesion segmentation and classification. IEEE Trans. Med. Imaging 2020, 39, 2482–2493. [Google Scholar] [CrossRef] [Green Version]
  24. Zhang, B.; Xiao, J.; Wei, Y.; Sun, M.; Huang, K. Reliability does matter: An end-to-end weakly supervised semantic segmentation approach. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12765–12772. [Google Scholar]
  25. Jo, S.; Yu, I.J. Puzzle-CAM: Improved localization via matching partial and full features. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 639–643. [Google Scholar]
  26. Wei, Y.; Feng, J.; Liang, X.; Cheng, M.M.; Zhao, Y.; Yan, S. Object region mining with adversarial erasing: A simple classification to semantic segmentation approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1568–1576. [Google Scholar]
  27. Qin, J.; Wu, J.; Xiao, X.; Li, L.; Wang, X. Activation modulation and recalibration scheme for weakly supervised semantic segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 2117–2125. [Google Scholar]
  28. Yuan, F.; Zhang, L.; Xia, X.; Huang, Q.; Li, X. A gated recurrent network with dual classification assistance for smoke semantic segmentation. IEEE Trans. Image Process. 2021, 30, 4409–4422. [Google Scholar]
  29. Jin, Q.; Cui, H.; Sun, C.; Meng, Z.; Su, R. Cascade knowledge diffusion network for skin lesion diagnosis and segmentation. Appl. Soft Comput. 2021, 99, 106881. [Google Scholar]
  30. Wang, Y.; Wang, H.; Shen, Y.; Fei, J.; Li, W.; Jin, G.; Wu, L.; Zhao, R.; Le, X. Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 4248–4257. [Google Scholar]
  31. Zheng, Z.; Yang, Y. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. Int. J. Comput. Vis. 2021, 129, 1106–1120. [Google Scholar]
  32. Lee, D.-H. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the Workshop on Challenges in Representation Learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013; Volume 3, p. 896. [Google Scholar]
  33. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  34. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  35. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar]
  36. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  37. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  38. Liu, S.A.; Xie, H.; Xu, H.; Zhang, Y.; Tian, Q. Partial Class Activation Attention for Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 16836–16845. [Google Scholar]
  39. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  40. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  41. Cao, W.; Zheng, J.; Xiang, D.; Ding, S.; Sun, H.; Yang, X.; Liu, Z.; Dai, Y. Edge and neighborhood guidance network for 2D medical image segmentation. Biomed. Signal Process. Control 2021, 69, 102856. [Google Scholar]
  42. Liu, Q.; Wang, J.; Zuo, M.; Cao, W.; Zheng, J.; Zhao, H.; Xie, J. NCRNet: Neighborhood context refinement network for skin lesion segmentation. Comput. Biol. Med. 2022, 146, 105545. [Google Scholar] [PubMed]
  43. Schlemper, J.; Oktay, O.; Schaap, M.; Heinrich, M.; Kainz, B.; Glocker, B.; Rueckert, D. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Anal. 2019, 53, 197–207. [Google Scholar] [PubMed]
  44. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87. [Google Scholar] [PubMed]
  45. Xie, Y.; Zhang, J.; Xia, Y. Semi-supervised adversarial model for benign–malignant lung nodule classification on chest CT. Med. Image Anal. 2019, 57, 237–248. [Google Scholar]
  46. Zhang, J.; Xie, Y.; Wu, Q.; Xia, Y. Medical image classification using synergic deep learning. Med. Image Anal. 2019, 54, 10–19. [Google Scholar]
  47. Yao, P.; Shen, S.; Xu, M.; Liu, P.; Zhang, F.; Xing, J.; Shao, P.; Kaffenberger, B.; Xu, R.X. Single model deep learning on imbalanced small datasets for skin lesion classification. IEEE Trans. Med. Imaging 2021, 41, 1242–1254. [Google Scholar]
Figure 1. The correlation between skin lesion segmentation and classification tasks. Segmentation can provide the contour information of lesions for classification. Classification can generate class activation maps to provide the location information of lesions for segmentation. Classification can also be used to screen pseudo-labels for segmentation.
Figure 2. Structure of the CL-DCNN model for skin lesion segmentation and classification. The teacher-SN is constructed to generate pseudo-labels, which are concatenated with the original images as the input to train the quality-CN to screen reliable pseudo-labels for self-training. Then, the masks generated by teacher-SN are employed to provide disease-CN with information about the lesions and to promote disease-CN to generate accurate CAMs. Following that, we take advantage of the CAMs and reliable pseudo-labels to improve the student-SN’s skin lesion segmentation performance. In the end, the masks generated by student-SN are employed to improve disease-CN’s skin disease identification ability.
Figure 3. Generating reliable pseudo-labels: (a) generating pseudo-labels, (b) generating pseudo-label quality level training set, and (c) screening reliable pseudo-labels.
Figure 4. Student-SN learning in weakly labeled data.
Figure 5. Segmentation results generated by the CL-DCNN model at each stage: (a) dermoscopy images, (b) segmentation results generated by teacher-SN, (c) CAMs generated by disease-CN, (d) segmentation results generated by student-SN, and (e) ground truth.
Figure 6. Comparison of the segmentation results obtained by segmentation network with or without using the CAMs: (a) dermoscopy images, (b) segmentation results obtained when not using CAMs, (c) CAMs, (d) segmentation results obtained when using CAMs, and (e) ground truth.
Figure 7. Comparison between reliable pseudo-labels and unreliable pseudo-labels: (a) reliable pseudo-labels, corresponding images, and (b) unreliable pseudo-labels, corresponding images.
Figure 8. Comparison of the CAMs obtained by disease-CN with or without using the lesion masks: (a) dermoscopy images, (b) CAMs obtained when not using lesion masks, (c) CAMs obtained when using lesion masks, and (d) ground truth.
Table 1. Terms and corresponding definitions.

| Terms | Definitions |
| --- | --- |
| Teacher-SN | A teacher segmentation network used to generate pseudo-labels and masks. |
| Disease-CN | A disease classification network used to diagnose skin disease types. |
| Quality-CN | A pseudo-label quality evaluation network for screening reliable pseudo-labels. |
| Student-SN | A student segmentation network for the fine segmentation of skin lesion regions. |
| $D_l^P = \{(X_l^P, Y_l^P)\}_{i=1}^{N}$ | A segmentation training set containing $N$ dermoscopy images $\{X_l^P\}_{i=1}^{N}$ and corresponding pixel-level labels $\{Y_l^P\}_{i=1}^{N}$. Some pixels in $Y_l^P$ belong to the skin lesion area; the others belong to normal skin. |
| $D_l^I = \{(X_l^I, Y_l^I)\}_{i=1}^{M}$ | A classification training set containing $M$ dermoscopy images $\{X_l^I\}_{i=1}^{M}$ and corresponding image-level labels $\{Y_l^I\}_{i=1}^{M}$. $Y_l^I$ represents the type of skin disease (melanoma, nevus, or seborrheic keratosis). |
| $D_u = \{X_u\}_{i=1}^{n}$ | An unlabeled segmentation training set containing $n$ dermoscopy images $\{X_u\}_{i=1}^{n}$. |
Table 2. Details of the ISIC 2017 and ISIC Archive datasets.

| Dataset | Format | Label | Training | Validation | Testing |
| --- | --- | --- | --- | --- | --- |
| ISIC 2017 | Png | Pixel-level and image-level | 2000 | 150 | 600 |
| ISIC Archive | – | Image-level | 1320 | 0 | 0 |
Table 3. Segmentation performance of the CL-DCNN model and other skin lesion segmentation methods on the ISIC 2017 testing set. The highest results are shown in bold font for easy observation and analysis.

| Method | JA | DI | Pixel-AC | Pixel-SE | Pixel-SP |
| --- | --- | --- | --- | --- | --- |
| FCN [39] | 75.2 | 84.1 | 93.9 | 82.2 | 97.0 |
| U-Net [40] | 76.5 | 85.2 | 93.3 | 84.5 | 97.3 |
| DAGAN [13] | 77.1 | 85.9 | 93.5 | 83.5 | 97.6 |
| ENGNet [41] | 77.1 | 85.3 | 93.2 | 82.7 | **97.8** |
| NCRNet [42] | 78.6 | 86.6 | 94.0 | **86.9** | 95.9 |
| AG-Net [43] | 76.9 | 85.3 | 93.5 | 83.5 | 97.4 |
| MultiResUNet [44] | 76.8 | 85.2 | 93.6 | 83.9 | 96.8 |
| Ours | **79.1** | **86.7** | **94.1** | 86.5 | 95.9 |
Table 4. Comparison of the average classification performance between the CL-DCNN model and other skin lesion classification methods on the ISIC 2017 testing set. The highest results are shown in bold font for easy observation and analysis.

| Method | AC | SE | SP | AUC |
| --- | --- | --- | --- | --- |
| Xception [36] | 89.8 | 70.1 | 94.3 | 92.8 |
| ARL-CNN [19] | 86.4 | 76.3 | 88.2 | 91.7 |
| SSAC [45] | 86.2 | 73.6 | 91.0 | 91.6 |
| SDL [46] | 90.6 | – | – | 91.3 |
| MWNL-CLS [47] | 76.3 | 56.4 | 76.0 | 91.7 |
| MBDCNN [23] | 90.4 | **78.5** | 93.0 | – |
| Ours | **90.7** | 70.8 | **94.7** | **93.7** |
Table 5. Segmentation performance of the CL-DCNN model on the ISIC 2017 testing set after training with CAMs and pseudo-labels. The highest results are shown in bold font for easy observation and analysis.

| CAMs | PLs | JA | DI | Pixel-AC | Pixel-SE | Pixel-SP |
| --- | --- | --- | --- | --- | --- | --- |
|  |  | 78.3 | 86.0 | 94.1 | 83.9 | **97.7** |
| ✓ |  | 78.9 | 86.5 | **94.3** | 85.0 | 96.7 |
| ✓ | ✓ | **79.1** | **86.7** | 94.1 | **86.5** | 95.9 |
Table 6. Comparison of average classification performance with or without masks. The highest results are shown in bold font for easy observation and analysis.

| Teacher-SN’s Mask | Student-SN’s Mask | AC | SE | SP | AUC |
| --- | --- | --- | --- | --- | --- |
|  |  | 89.8 | 70.1 | 94.3 | 92.8 |
| ✓ |  | 90.0 | 65.2 | **95.5** | 93.5 |
|  | ✓ | **90.7** | **70.8** | 94.7 | **93.7** |

