A Dual-Headed Teacher–Student Framework with an Uncertainty-Guided Mechanism for Semi-Supervised Skin Lesion Segmentation

Zou, Changman; Jeon, Wang-Su; Ju, Hye-Rim; Rhee, Sang-Yong

doi:10.3390/electronics14050984

¹ College of Computer Science and Technology, Beihua University, Jilin 132013, China
² Department of IT Convergence Engineering, Kyungnam University, Changwon 51767, Republic of Korea
³ Department of Computer Engineering, Kyungnam University, Changwon 51767, Republic of Korea
* Author to whom correspondence should be addressed.

Electronics 2025, 14(5), 984; https://doi.org/10.3390/electronics14050984

Submission received: 18 January 2025 / Revised: 24 February 2025 / Accepted: 27 February 2025 / Published: 28 February 2025

Abstract

Medical image segmentation is a challenging task due to limited annotated data, complex lesion boundaries, and the inherent variability in medical images. These challenges make accurate and robust segmentation crucial for clinical applications. In this study, we propose the Uncertainty-Driven Auxiliary Mean Teacher (UDAMT) model, a novel semi-supervised framework specifically designed for skin lesion segmentation. Our approach employs a dual-headed teacher–student architecture with an uncertainty-guided mechanism, enhancing feature learning and boundary precision. Extensive experiments on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets demonstrate that UDAMT achieves significant improvements over state-of-the-art methods, with increases of 1.17 percentage points in the Dice coefficient and 1.31 percentage points in mean Intersection over Union (mIoU) under low-label settings (5% labeled data). Furthermore, UDAMT requires 12.9 M parameters, which is slightly higher than the baseline model (12.5 M) but significantly lower than MT (14.8 M) and UAMT (15.2 M). It also achieves an inference time of 25.7 ms per image, ensuring computational efficiency. Ablation studies validate the contributions of each component, and cross-dataset evaluations on the PH2 benchmark confirm robustness to small lesions. This work provides a scalable and efficient solution for semi-supervised medical image segmentation, balancing accuracy, efficiency, and clinical applicability.

Keywords: skin lesion segmentation; semi-supervised learning; teacher–student framework; uncertainty-guided pseudo-labeling; dual-headed architecture; medical image analysis

1. Introduction

Skin diseases, including basal-cell carcinoma (BCC), melanoma, squamous-cell carcinoma (SCC), and epithelial carcinoma, affect millions globally. Melanoma, the deadliest, accounts for about 75% of skin cancer-related deaths due to its aggressive metastatic potential [1]. Early symptoms appear as small moles or spots that may darken or grow. Early-stage melanoma can be treated with minor surgery [2]. Early detection is crucial for successful treatment, and dermatologists play a key role in identifying subtle changes in skin lesions [3]. Accurate segmentation of the lesion from the surrounding healthy tissue is vital for effective diagnosis and treatment planning.

Recent advances in artificial intelligence (AI) have revolutionized dermatological imaging, offering significant potential to assist physicians in diagnosis. AI integration improves diagnostic efficiency, reduces misdiagnosis rates, and enhances patient outcomes. Deep learning, particularly convolutional neural networks (CNNs), has become the leading method for image segmentation, automating the identification of cancerous lesions and reducing the burden on medical professionals [4,5,6,7]. However, these methods typically rely on large volumes of labeled training data, which can be resource-intensive.

The dependency on high-quality annotated data for fully supervised learning is a major limitation, as it can be costly and time-consuming to acquire, especially in the medical field [8]. Medical image annotations require a high level of expertise, as incorrect annotations can directly impact the quality and reliability of segmentation models. Alternative learning paradigms such as weakly supervised learning [9,10,11], unsupervised learning [12,13,14], and semi-supervised learning (SSL) have been explored by researchers to overcome the challenges associated with obtaining labeled data [15,16,17]. The aim of these approaches is to decrease dependence on large amounts of labeled data by incorporating different levels of supervision or using unlabeled data more effectively.

Recent advancements in SSL have demonstrated its potential to bridge the gap between fully supervised and unsupervised learning paradigms. SSL presents an attractive solution by enabling the model to learn from a small amount of labeled data along with a large amount of unlabeled data. This approach has significant implications for real-world clinical applications, as it reduces the burden of data annotation while maintaining high performance. Pseudo-labeling is a common strategy to exploit unlabeled data, in which pseudo-labels are assigned to unlabeled images, and both labeled and pseudo-labeled data are utilized to train the segmentation model [18,19,20]. A major drawback of this approach is that noise in the pseudo-labels can degrade both the learning process and the segmentation quality. This issue is particularly critical in medical imaging, where precision is essential. The difficulty of segmenting skin disease images is shown in Figure 1.

To address the challenges associated with noisy pseudo-labels, researchers have proposed various methods aimed at improving the quality of pseudo-labels and enhancing the learning process. Recent advancements in semi-supervised medical image segmentation have been made possible by the incorporation of consistency regularization and unsupervised loss functions. The Mean Teacher (MT) model, in particular, has gained considerable attention due to its ability to enhance model stability and performance by enforcing consistency between the outputs of a teacher model and a student model when subjected to different perturbations [21]. The parameters of the teacher model are updated using the Exponential Moving Average (EMA) of the student model’s weights, which helps stabilize the learning process and reduce the effects of noisy labels. Building on this foundation, subsequent research has focused on developing SSL algorithms that leverage consistency learning to further improve segmentation performance [22].

Despite these advancements, existing state-of-the-art (SOTA) methods, such as Uncertainty-Aware Mean Teacher (UAMT) [23], Dual Fixmatch Cross Pseudo Supervision (DFCPS) [24], Multi-Task Mean Teacher (MTMT) [25], and Attention U-Net [26], have yet to fully overcome the challenges of noisy pseudo-labels and the limited utilization of complementary learning during training. The need for innovative approaches to improve segmentation accuracy and robustness, particularly in low-label settings, is underscored by these limitations.

This paper proposes a novel end-to-end semi-supervised segmentation framework called UDAMT. UDAMT builds on the Mean Teacher model with key innovations that improve segmentation performance in SSL scenarios. We design a dual-headed segmentation network that includes an auxiliary segmentation head in the student model. With this dual-headed architecture, the model can extract complementary information during training, which improves its ability to learn from both limited labeled data and pseudo-labeled data.

To address the challenges of noisy pseudo-labels, UDAMT incorporates an uncertainty-guided mechanism that identifies high-uncertainty regions within the pseudo-labeled data and excludes them from the training process. The mechanism uses uncertainty maps to guide the model towards the most reliable regions of the pseudo-labeled images. By suppressing noisy labels and preserving precise boundary information, the proposed approach enhances the model’s overall segmentation performance.

In summary, our work focuses on the segmentation of skin lesions, targeting the precise delineation of the lesion region from the surrounding healthy skin and thereby addressing a critical need in the early diagnosis and treatment of skin cancer.

This work has the following contributions:

We propose a novel dual-headed teacher–student framework for semi-supervised skin lesion segmentation, enhancing the student model’s feature extraction and resilience to noisy pseudo-labels through an auxiliary segmentation head.
UDAMT introduces a pseudo-labeling mechanism based on uncertainty, efficiently identifying and excluding unreliable regions in pseudo-labeled data, improving the learning process’s quality and stability.
Extensive experiments on the ISIC 2016, 2017, and 2018 datasets demonstrate that UDAMT outperforms state-of-the-art methods, achieving an improvement of up to 1.17 percentage points in Dice coefficient under low-label settings, showing its potential for real-world clinical applications.

The paper is organized as follows: Section 2 reviews related work in medical image segmentation and SSL. Section 3 details the UDAMT framework and key components. Section 4 presents the experimental setup and evaluation metrics. Section 5 compares experimental results with current methods. Section 6 concludes the paper and outlines future research directions.

2. Related Works

2.1. Medical Image Segmentation

Medical image segmentation plays a crucial role in clinical diagnosis by identifying distinct tissues or organs within images. Traditional methods, such as threshold-based segmentation and edge detection [27,28], initially provided simple but limited solutions. The threshold-based method [29,30] classifies pixels based on intensity but struggles with complex images and noise. Edge detection-based approaches [31,32,33] enhance boundaries but face challenges with blurred edges and substantial noise.

The advent of deep learning has revolutionized segmentation, with CNNs becoming the dominant approach. U-Net [34] introduced a U-shaped architecture with skip connections, significantly improving segmentation accuracy. Variants such as Attention U-Net, MultiResUNet [35], UNet++ [36], and TransUNet [37] further refined feature extraction through attention mechanisms and transformer-based architectures [38].

Recent advances have focused on enhancing segmentation accuracy, robustness, and computational efficiency. Methods such as UAMT and Cross Pseudo Supervision (CPS) [39] introduced uncertainty-guided learning and dual-headed architectures, improving resilience against noisy labels. Multi-task models like MTMT incorporate auxiliary tasks to enhance segmentation performance. Self-supervised approaches, including PseudoSeg [40], utilize unlabeled data more effectively, reducing annotation dependency.

Additionally, attention mechanisms, including Attention U-Net and TransUNet, enable refined feature selection. Emerging paradigms like uncertainty-aware learning [41,42,43] and self-supervised pretraining [44] further improve segmentation accuracy by focusing on high-confidence regions.

These advancements highlight the transformative role of deep learning in medical image segmentation. By integrating teacher–student architectures, uncertainty-guided learning, and self-supervised techniques, SOTA methods continue to push the boundaries of automation and diagnostic precision in clinical practice.

2.2. Semi-Supervised Medical Image Segmentation

SSL has emerged as a promising solution to mitigate the need for large-scale labeled datasets in medical image segmentation. Unlike fully supervised methods, SSL leverages a small amount of labeled data along with abundant unlabeled data, making it more scalable and cost-effective [45,46].

A key SSL technique is pseudo-labeling, where a model assigns labels to unlabeled data based on its predictions [47]. However, noisy pseudo-labels can degrade performance. To address this, consistency regularization enforces stability in model predictions under various perturbations, improving segmentation robustness. FixMatch, for example, refines pseudo-labeling by applying strong augmentations and confidence thresholding.

Recent works incorporate uncertainty estimation to enhance pseudo-label reliability. By quantifying uncertainty, models can prioritize high-confidence regions while filtering out noisy labels. Approaches such as Monte Carlo Dropout [48] and Uncertainty-Guided Collaborative Mean Teacher (UCMT) [49] have successfully integrated uncertainty awareness, leading to improved segmentation performance.

SSL has been widely adopted in various medical imaging tasks, including brain Magnetic Resonance Imaging (MRI) and cardiac MRI segmentation, demonstrating performance comparable to fully supervised models despite using fewer labeled samples [50]. The Mean Teacher framework, introduced by Tarvainen et al., remains foundational in SSL, with models such as UAMT extending its capabilities through uncertainty estimation and pseudo-label refinement.

These advancements demonstrate the effectiveness of SSL in bridging the gap between fully supervised and unsupervised learning. By integrating pseudo-labeling, consistency regularization, and uncertainty-guided mechanisms, modern SSL techniques offer scalable and robust solutions for real-world medical imaging applications.

2.3. Teacher–Student Framework in SSL

The teacher–student framework has become a cornerstone in SSL, particularly in medical image segmentation, due to its ability to effectively leverage unlabeled data while maintaining robust learning performance. In this paradigm, the teacher model generates pseudo-labels for unlabeled data, and the student model learns from both labeled data and these pseudo-labeled samples. This framework promotes the utilization of a large amount of easily obtainable unlabeled data, thereby reducing the dependency on expensive, expert-annotated medical images.

The original concept of the teacher–student model was popularized by the MT model proposed by Tarvainen and Valpola. In this model, the teacher model generates pseudo-labels for unlabeled data, and the student model is trained using a combination of these pseudo-labels and labeled data. A critical aspect of the Mean Teacher approach is that the teacher model’s parameters are updated through an EMA of the student model’s weights, rather than being learned directly from labeled data. This ensures that the teacher model provides stable and reliable pseudo-labels, improving the consistency of the learning process.

One major advantage of the teacher–student framework is its ability to enforce consistency between the teacher and student models. By applying different augmentations or perturbations to the input data and ensuring that both the teacher and student produce consistent predictions, the model’s robustness is improved. This consistency regularization serves as an unsupervised loss term that guides the student model toward more reliable segmentation, even in the presence of noisy labels. Consistency regularization has been widely adopted in SSL due to its effectiveness in reducing the adverse impact of noisy pseudo-labels.

Recent advancements within the teacher–student framework include the integration of uncertainty estimation. Uncertainty-aware teacher–student models aim to identify regions of high uncertainty in pseudo-labels and exclude these regions from the training process, thereby reducing the effect of noisy or incorrect labels. For instance, the UAMT model proposed by Yu et al. employs Monte Carlo Dropout to estimate uncertainty in the network’s predictions and uses these estimates to refine the pseudo-labels generated by the teacher model. This approach ensures that the student model focuses on more reliable information, leading to improved segmentation performance.

Another significant development is the CPS method introduced by Chen et al., which utilizes two student models with different initializations. Each student model generates pseudo-labels for the other, and both models learn from each other’s pseudo-labels through a cross-supervision strategy. This method enhances the robustness of the models by leveraging complementary information and reducing the reliance on a single-teacher model.

In recent developments, the Collaborative Mean Teacher (CMT) [15] framework extends the traditional teacher–student approach by introducing multiple student models that collaboratively generate pseudo-labels. This approach encourages diversity in the generated pseudo-labels, leading to more comprehensive learning for each student model. Furthermore, the introduction of Uncertainty-Guided Mixup (UMIX) [51] allows the model to manipulate input images based on uncertainty, enhancing the overall quality of the generated pseudo-labels.

Our proposed dual-headed teacher–student framework builds upon these advancements. We introduce an auxiliary segmentation head to the student model, enabling it to learn complementary features during training. This dual-headed setup allows the student model to cross-reference its segmentation outputs, providing an additional layer of consistency that enhances learning stability. Furthermore, by incorporating an uncertainty-guided mechanism, we ensure that only reliable pseudo-labels are used during training, thereby mitigating the negative effects of noisy labels. This combined approach leverages the strengths of the teacher–student paradigm (effective use of unlabeled data, consistency regularization, and uncertainty estimation) to achieve superior segmentation performance in skin lesion analysis.

The teacher–student framework, particularly with these recent innovations, remains a powerful tool for advancing medical image segmentation. By harnessing the synergy between labeled and unlabeled data, and by leveraging techniques such as consistency regularization and uncertainty awareness, the framework not only reduces the need for costly labeled data but also enhances the overall quality and reliability of medical image analysis.

3. Method

3.1. Overall Architecture

The proposed method adopts a dual-headed segmentation network within a teacher–student framework to leverage both labeled and unlabeled data for enhanced segmentation performance. The architecture comprises a teacher and student model working collaboratively to improve generalization and robustness. The student model benefits from an auxiliary segmentation head, which introduces multiple learning pathways during training and ensures better adaptability. The framework diagram is shown in Figure 2.

At the core of the design, the dual-headed segmentation network strengthens the student model’s ability to extract meaningful features from diverse data sources. Two segmentation heads are employed: the main segmentation head, which generates segmentation maps, and the auxiliary segmentation head, which guides the learning process. By combining outputs from these two heads, the model achieves richer feature representation and improved resilience to noise in pseudo-labels.

An auxiliary perspective introduced by the secondary head ensures consistent predictions across network depths. The auxiliary loss, calculated as the cross-entropy between the outputs of the main and auxiliary heads, drives the model towards improved accuracy while addressing challenges posed by noisy or uncertain labels. This dual-headed approach significantly reduces the risk of overfitting to erroneous pseudo-labels and enhances the overall generalization of the student model.

A key component of the framework, the teacher–student interaction facilitates semi-supervised segmentation. The teacher model, implemented as an EMA of the student model, evolves steadily over time. It generates pseudo-labels for unlabeled data, which the student model utilizes to refine its learning. This iterative process enables the student to effectively combine knowledge from labeled and pseudo-labeled data, resulting in improved segmentation performance.

Stability in pseudo-label generation is ensured by the EMA mechanism in the teacher model. Meanwhile, the student network’s dual-headed structure introduces diversified learning pathways, mitigating the effects of noisy labels. Together, these elements contribute to a robust and generalizable segmentation framework.

Parameter sharing between the teacher and student models is facilitated by using the EMA strategy, where the teacher model’s parameters are updated as a moving average of the student model’s weights. This approach stabilizes the training process, ensuring that the teacher model provides high-quality pseudo-labels throughout the learning process. The EMA update is given by

θ_{teacher}^{(t)} = α \cdot θ_{teacher}^{(t-1)} + (1 - α) \cdot θ_{student}^{(t)}    (1)

where θ_{teacher}^{(t)} represents the teacher model parameters at iteration t, θ_{student}^{(t)} represents the student model parameters at iteration t, and α is the decay rate, typically close to 1 (e.g., 0.99), ensuring a gradual change in the teacher model’s parameters. This strategy ensures that the teacher model retains the beneficial knowledge from previous training steps while also incorporating new information learned by the student model.
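The update in Equation (1) can be illustrated with a minimal sketch. It assumes that model parameters are stored as plain Python lists of floats keyed by name; the function name `ema_update` and the toy parameter values are hypothetical and stand in for real network weights.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher parameters after one EMA step, as in Equation (1).

    teacher, student: dicts mapping a parameter name to a list of floats
    (an assumed stand-in for real network weights).
    alpha: decay rate; 0.99 is the example value quoted in the text.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    updated = {}
    for name, t_values in teacher.items():
        s_values = student[name]
        # theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
        updated[name] = [
            alpha * t + (1.0 - alpha) * s for t, s in zip(t_values, s_values)
        ]
    return updated


# Toy example with a single two-element parameter (hypothetical values).
teacher = {"w": [1.0, 0.0]}
student = {"w": [0.0, 1.0]}
teacher = ema_update(teacher, student, alpha=0.99)
```

With α close to 1, each step moves the teacher only slightly toward the student, which is what makes the teacher's pseudo-labels change smoothly between iterations.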

The training mechanism involves two main phases: supervised training with labeled data and semi-supervised training with both labeled and pseudo-labeled data. During supervised training, the student model learns directly from the ground truth annotations. During semi-supervised training, the teacher model generates pseudo-labels for the unlabeled data, which are used alongside labeled data to train the student. Specifically, the input image first passes through the teacher model to generate pseudo-labels, while Monte Carlo Dropout is applied to produce uncertainty maps. These uncertainty maps are then used to filter reliable pseudo-label regions, guiding the loss calculation for the student model. Thus, the core operations between them are uncertainty estimation and pseudo-label region filtering. Cross-entropy loss is computed for both the main and auxiliary heads, with an uncertainty-guided mechanism excluding highly uncertain regions from contributing to the loss calculation.

By leveraging both labeled and unlabeled data in a unified framework, and by incorporating dual-headed segmentation, our approach effectively addresses the challenges associated with limited labeled data and noisy pseudo-labels, resulting in a more accurate and robust segmentation model.

The flowchart in Figure 3 illustrates the pseudo-labeling process in the UDAMT model for semi-supervised skin lesion segmentation. The model follows a teacher–student framework, where both the teacher and student models share a ResNet50 backbone. The teacher model, updated via EMA, generates pseudo-labels for unlabeled data. To improve reliability, Monte Carlo Dropout is applied for uncertainty estimation, filtering out high-uncertainty regions based on a threshold (0.2). The student model incorporates a dual-head structure, comprising a primary Fully Convolutional Network (FCN) and an auxiliary DeepLab segmentation head, to support consistency learning. Input data undergoes strong and weak augmentations before training. The total loss function integrates supervised loss on labeled data, pseudo-label loss on filtered regions, and auxiliary consistency loss. The iterative process continues with the student model being optimized via backpropagation and the teacher model being updated through EMA, progressively enhancing segmentation quality.

3.2. Uncertainty Map Generation

In the SSL framework, uncertainty estimation plays a key role in identifying which pseudo-labels are reliable and which may contain errors. In our approach, uncertainty maps are generated to identify high-uncertainty regions, which are subsequently used to enhance the training process by focusing only on reliable regions.

To generate uncertainty maps, we employ Monte Carlo Dropout during inference. Specifically, the student model is run multiple times with dropout enabled, resulting in a set of predictions for each pixel. The uncertainty of each pixel is then quantified based on the variance of the predictions across multiple runs. The variance σ^{2}(p) of a pixel p can be computed as

σ^{2}(p) = \frac{1}{T} \sum_{t=1}^{T} (p_{t} - \bar{p})^{2}    (2)

where T is the number of stochastic forward passes, p_{t} is the predicted probability of pixel p in the t-th forward pass, and \bar{p} is the average predicted probability over all passes. A high value of σ^{2}(p) indicates high uncertainty, allowing us to identify regions that are potentially mislabeled.
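As a rough illustration of Equation (2), the sketch below computes the per-pixel mean and variance from a set of stochastic predictions. It assumes the T dropout-enabled forward passes have already been run and that each pass is given as a flat list of per-pixel probabilities; the function name `pixel_variance` and the sample values are hypothetical.

```python
def pixel_variance(passes):
    """Per-pixel mean and variance over T stochastic passes (Equation (2)).

    passes: list of T probability maps, each a flat list of per-pixel
    probabilities in [0, 1] (assumed to come from dropout-enabled inference).
    """
    num_passes = len(passes)
    if num_passes == 0:
        raise ValueError("need at least one forward pass")
    num_pixels = len(passes[0])
    # Average predicted probability per pixel (p-bar in Equation (2)).
    means = [sum(p[i] for p in passes) / num_passes for i in range(num_pixels)]
    # Population variance (divides by T, as in Equation (2)).
    variances = [
        sum((p[i] - means[i]) ** 2 for p in passes) / num_passes
        for i in range(num_pixels)
    ]
    return means, variances


# Three hypothetical passes over a two-pixel image: the first pixel is
# predicted consistently, the second one fluctuates between passes.
passes = [[0.9, 0.2], [0.9, 0.8], [0.9, 0.5]]
mean_map, uncertainty_map = pixel_variance(passes)
```

In this toy case the first pixel has a variance of essentially zero while the second has a clearly larger one, so only the second would be a candidate for exclusion.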

Once the uncertainty map is generated, we employ a masking strategy to reduce the impact of noisy pseudo-labels. High-uncertainty regions are excluded from the training process by masking them out, allowing the model to focus on low-uncertainty, reliable areas. Specifically, the pseudo-labels for uncertain regions are not included in the loss calculation during training. The objective function for the segmentation task is modified as follows:

L_{pseudo} = - \sum_{i \in μ} (y_{i}^{pseudo} \cdot \log(\hat{y}_{i}) + (1 - y_{i}^{pseudo}) \cdot \log(1 - \hat{y}_{i}))    (3)

where μ represents the set of pixels with low uncertainty, y_{i}^{pseudo} is the ground truth or pseudo-label, and \hat{y}_{i} is the model’s prediction. By excluding high-uncertainty regions from the loss calculation, we effectively reduce the noise in the training process and ensure that the student model learns from more reliable data.
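The masked loss in Equation (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the set μ is taken to be all pixels whose uncertainty falls below the threshold of 0.2 mentioned in the flowchart description, the exact quantity that the threshold is applied to is assumed, and the clipping constant `eps` is added only to keep the logarithm finite.

```python
import math


def masked_pseudo_loss(pseudo_labels, predictions, uncertainty,
                       threshold=0.2, eps=1e-7):
    """Binary cross-entropy summed over low-uncertainty pixels (Equation (3)).

    pseudo_labels: per-pixel pseudo-labels (0 or 1).
    predictions:   per-pixel predicted probabilities from the student.
    uncertainty:   per-pixel uncertainty values.
    threshold:     pixels at or above this value are masked out (assumed rule).
    Returns the loss and the number of pixels that contributed to it.
    """
    loss = 0.0
    kept = 0
    for y, y_hat, u in zip(pseudo_labels, predictions, uncertainty):
        if u >= threshold:
            continue  # high-uncertainty pixel: excluded from the sum
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # avoid log(0)
        loss -= y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat)
        kept += 1
    return loss, kept


# Hypothetical three-pixel example: the middle pixel is too uncertain.
loss, kept = masked_pseudo_loss(
    pseudo_labels=[1, 0, 1],
    predictions=[0.9, 0.4, 0.8],
    uncertainty=[0.01, 0.3, 0.05],
)
```

Returning the number of retained pixels alongside the loss makes it easy to normalize by the mask size if a mean rather than a sum is preferred.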

This strategy is particularly effective in reducing the negative impact of erroneous pseudo-labels, leading to a more stable training process and better overall performance. Furthermore, we introduce a patching approach, wherein regions of high uncertainty are replaced with patches from regions of low uncertainty, thereby providing additional training examples that are reliable. This augmentation technique helps the model generalize better and enhances its robustness to noisy labels.

The combination of uncertainty map generation, noise reduction through masking, and patch-based augmentation ensures that our semi-supervised model learns effectively from the available data, even when the amount of labeled data is limited.

3.3. Loss Function

In our proposed semi-supervised segmentation model, the loss function plays a crucial role in guiding the model to learn from both labeled data and pseudo-labeled data effectively. The dual-headed architecture in the student model utilizes a cross-entropy loss function to enforce consistency between the segmentation heads and to enhance feature representation through complementary learning.

The dual-headed student model consists of a main segmentation head and an auxiliary segmentation head. To promote consistent learning across the two segmentation heads, we define a cross-entropy loss between them, encouraging the heads to produce similar segmentation outputs. This consistency regularization reduces divergence in feature extraction and prevents the network from overfitting to noisy pseudo-labels.

The cross-entropy loss between the main head and the auxiliary head is formulated as follows:

L_{aux} = - \sum_{i \in I} (y_{i}^{aux} \cdot \log(y_{i}^{main}) + (1 - y_{i}^{aux}) \cdot \log(1 - y_{i}^{main}))    (4)

where I represents the set of all pixels in the image and y_{i}^{main} and y_{i}^{aux} are the predicted probabilities for pixel i from the main and auxiliary heads, respectively. This loss encourages the two heads to agree on their predictions for each pixel, thereby improving consistency.
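A minimal sketch of the head-to-head cross-entropy in Equation (4) is given below. It assumes both heads output per-pixel foreground probabilities as flat lists; the function name `aux_consistency_loss` and the clipping constant `eps` are illustrative additions.

```python
import math


def aux_consistency_loss(main_probs, aux_probs, eps=1e-7):
    """Cross-entropy between auxiliary and main head outputs (Equation (4)).

    main_probs: per-pixel probabilities from the main head.
    aux_probs:  per-pixel probabilities from the auxiliary head, which play
                the role of the target in Equation (4).
    """
    total = 0.0
    for y_main, y_aux in zip(main_probs, aux_probs):
        y_main = min(max(y_main, eps), 1.0 - eps)  # avoid log(0)
        total -= y_aux * math.log(y_main) + (1.0 - y_aux) * math.log(1.0 - y_main)
    return total


# Hypothetical two-pixel outputs: matching heads versus an undecided main head.
agree = aux_consistency_loss([0.9, 0.1], [0.9, 0.1])
disagree = aux_consistency_loss([0.5, 0.5], [0.9, 0.1])
```

For a fixed auxiliary output, this quantity is smallest when the main head reproduces it, so `agree` comes out lower than `disagree`. Because the targets are soft probabilities, the minimum is the entropy of the auxiliary prediction rather than zero.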

The overall loss function for the student model combines multiple components to ensure effective learning. The total loss L_{total} is defined as

L_{total} = λ_{sup} \cdot L_{sup} + λ_{aux} \cdot L_{aux} + λ_{pseudo} \cdot L_{pseudo}    (5)

where L_{sup} is the supervised loss on labeled data, calculated as

L_{sup} = - \sum_{i \in L} (y_{i}^{true} \cdot \log(\hat{y}_{i}) + (1 - y_{i}^{true}) \cdot \log(1 - \hat{y}_{i}))    (6)

with L representing the set of labeled pixels, y_{i}^{true} being the ground truth label for pixel i, and \hat{y}_{i} being the prediction. L_{aux} is the consistency loss between the main and auxiliary heads. L_{pseudo} is the unsupervised loss computed on pseudo-labeled data, using uncertainty-guided masking. λ_{sup}, λ_{aux}, and λ_{pseudo} are weighting factors that control the contribution of each component in the total loss.

By integrating the consistency loss between the two segmentation heads, our model ensures robust feature extraction and reduces the risk of overfitting to noisy labels. This multi-component loss function allows the student model to effectively learn from both labeled and pseudo-labeled data, resulting in improved segmentation accuracy and generalization capabilities.

To dynamically optimize the loss weight under low-label proportions, we introduce an adaptive weighting mechanism that adjusts the contribution of the unsupervised loss component based on the uncertainty map. Specifically, the unsupervised loss weight is computed as

λ_{μ} = 1 - U(x)    (7)

where U(x) represents the uncertainty measure for a given sample. This approach ensures that pseudo-labels with lower uncertainty have a greater impact on training while reducing the influence of noisy pseudo-labels. By dynamically adjusting λ_{pseudo} based on the uncertainty map, the model can focus on more reliable regions during training, further enhancing the robustness and accuracy of the segmentation process.

To explicitly incorporate this into the total loss function, we redefine λ_{pseudo} as

λ_{pseudo}(x) = (1 - U(x)) \cdot λ_{pseudo}^{max}    (8)

where λ_{pseudo}^{max} is the upper bound for the unsupervised loss weight. Substituting this into Equation (5), the updated total loss function becomes

L_{total} = λ_{sup} \cdot L_{sup} + λ_{aux} \cdot L_{aux} + (1 - U(x)) \cdot λ_{pseudo}^{max} \cdot L_{pseudo}    (9)

This dynamic adjustment allows the model to focus on more reliable regions during training, further enhancing robustness and accuracy.
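The weighting in Equations (8) and (9) can be sketched as a small helper. The sketch assumes that U(x) is a single scalar in [0, 1] per sample and that the three loss terms have already been computed; the default weights of 1.0 are placeholders, not values reported in the paper.

```python
def total_loss(l_sup, l_aux, l_pseudo, uncertainty,
               lam_sup=1.0, lam_aux=1.0, lam_pseudo_max=1.0):
    """Combine the loss terms with an uncertainty-scaled pseudo-label weight.

    uncertainty: scalar U(x) in [0, 1] for the current sample (assumed form).
    lam_*:       weighting factors; the defaults are illustrative placeholders.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    # Equation (8): lambda_pseudo(x) = (1 - U(x)) * lambda_pseudo_max
    lam_pseudo = (1.0 - uncertainty) * lam_pseudo_max
    # Equation (9): weighted sum of the three components
    return lam_sup * l_sup + lam_aux * l_aux + lam_pseudo * l_pseudo


# Hypothetical loss values: the same sample at three uncertainty levels.
confident = total_loss(1.0, 0.5, 2.0, uncertainty=0.0)
partial = total_loss(1.0, 0.5, 2.0, uncertainty=0.25)
unsure = total_loss(1.0, 0.5, 2.0, uncertainty=1.0)
```

As the uncertainty rises from 0 to 1, the pseudo-label term fades out completely while the supervised and auxiliary terms are unaffected.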

4. Experimental Setup

4.1. Dataset and Evaluation Metrics

To validate the generalization and effectiveness of the proposed semi-supervised segmentation model, experiments were conducted on three publicly available datasets: ISIC 2016, ISIC 2017, and ISIC 2018. These datasets, maintained by the International Skin Imaging Collaboration (ISIC), contain dermoscopic images with pixel-level segmentation masks, distinguishing lesion from non-lesion regions. This allows for a precise evaluation of segmentation models.

The ISIC 2016, ISIC 2017, and ISIC 2018 datasets include various skin lesion types, primarily melanoma and non-melanoma lesions (such as nevus and keratosis). The distribution of lesion types varies across the datasets and is not consistently documented. Demographic details such as age, gender, and ethnicity are also not consistently provided. Specific details regarding imaging tools and acquisition procedures are not included in the dataset documentation.

The ISIC 2016 dataset contains 900 images, serving as a benchmark for testing models in data-limited scenarios. The ISIC 2017 dataset expands to 2000 images, providing a more diverse set for evaluating model generalization. The ISIC 2018 dataset, the largest of the three, contains 2594 images, making it ideal for evaluating segmentation methods in more complex scenarios. A summary of the dataset composition and splits is provided in Table 1.

The original image resolutions of the ISIC 2016, 2017, and 2018 datasets are 768 × 512, 1024 × 1024, and 1024 × 1024 pixels, respectively. In the experiments, all images were uniformly scaled to 224 × 224 to balance computational efficiency and detail preservation.

This input resolution was chosen empirically. At 224 × 224 pixels, the model yields strong performance in metrics like Dice and Sensitivity. Increasing the resolution to 512 × 512 pixels provided only a marginal improvement (+0.5% in Dice) at the cost of a 2× increase in computation time, whereas reducing it to 128 × 128 pixels resulted in a 2.3% drop in Dice due to loss of boundary detail.

For each dataset, the data were split into 80% for training and 20% for testing. Within the training set, either 5% or 10% of the images were treated as labeled and the remainder as unlabeled, simulating real-world scenarios with limited labeled data. The testing set, consisting of images with corresponding ground truth masks, was used exclusively for model evaluation, ensuring unbiased performance assessment.
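The split described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name `split_dataset`, the fixed random seed, and the rounding of subset sizes are all hypothetical, and the labeled fraction is taken relative to the training set.

```python
import random


def split_dataset(image_ids, labeled_fraction, test_fraction=0.2, seed=0):
    """Shuffle image ids, hold out a test set, then split training ids
    into labeled and unlabeled subsets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle

    n_test = round(len(ids) * test_fraction)  # 20% held out for testing
    test, train = ids[:n_test], ids[n_test:]

    # Labeled fraction (0.05 or 0.10) is applied to the training set only.
    n_labeled = round(len(train) * labeled_fraction)
    return {
        "labeled": train[:n_labeled],
        "unlabeled": train[n_labeled:],
        "test": test,
    }
```

Under these rounding assumptions, the 2594 ISIC 2018 images would yield 519 test images, 104 labeled training images, and 1971 unlabeled training images at the 5% setting. Table 1 remains the authoritative description of the actual splits.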

Preprocessing steps were standardized across datasets as follows:

Image Resizing: All images were resized to 224 × 224 pixels using bilinear interpolation.
Normalization: Pixel intensity values were scaled to [0, 1] to stabilize model training.
Data Augmentation: Random augmentations included ±15° rotations, horizontal/vertical flips, and brightness/contrast adjustments.
Mask Binarization: All segmentation masks were standardized into binary format, distinguishing lesion from non-lesion areas.
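Three of the preprocessing steps listed above can be illustrated on plain nested lists. This is a sketch under assumptions: images are taken to be 8-bit grayscale grids, the binarization threshold of 127 is hypothetical, and resizing, rotation, and brightness/contrast changes are omitted because they need an imaging library. The point it illustrates is that geometric augmentations have to be applied identically to an image and its mask.

```python
import random


def normalize(image):
    """Scale 8-bit pixel intensities to the range [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def binarize(mask, threshold=127):
    """Map mask values to 1 (lesion) or 0 (non-lesion); threshold is assumed."""
    return [[1 if px > threshold else 0 for px in row] for row in mask]


def random_flip(image, mask, rng):
    """Apply the same random horizontal/vertical flips to image and mask."""
    if rng.random() < 0.5:                    # horizontal flip
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]
    if rng.random() < 0.5:                    # vertical flip
        image, mask = image[::-1], mask[::-1]
    return image, mask
```

Drawing the flip decisions once and applying them to both arrays keeps every lesion pixel aligned with its label after augmentation.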

This study focuses on binary segmentation of lesion and non-lesion regions. The primary evaluation metric is the Dice Similarity Coefficient (DSC), calculated as

D = \frac{2 |X \cap Y|}{|X| + |Y|}

(10)

where X and Y represent the predicted and ground truth lesion regions, respectively. A Dice value closer to 1 indicates high segmentation accuracy.
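Equation (10) can be computed directly from two binary masks. The sketch below assumes flattened masks of 0/1 values. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something the paper specifies.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|X ∩ Y| / (|X| + |Y|) for flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p & t for p, t in zip(pred, truth))
    size_sum = sum(pred) + sum(truth)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum
```

For example, a prediction covering two pixels, one of which overlaps a one-pixel ground truth, gives 2 · 1 / (2 + 1) ≈ 0.667.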

Other metrics include the following:

mIoU: Measures overlap between predicted and ground-truth regions. Higher mIoU values indicate better segmentation quality.
Sensitivity: Reflects the true positive rate, with higher values showing better detection of lesions.
Specificity: Measures the true-negative rate, indicating the model’s ability to correctly identify non-lesion areas.
Overall Accuracy: Provides a holistic measure of segmentation performance.
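The remaining metrics all follow from pixel-level confusion counts, as the sketch below shows. It assumes flattened binary masks and uses hypothetical function names. Because this section does not state how mIoU is averaged, only the lesion-class IoU is computed here.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over flat binary masks."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def segmentation_metrics(pred, truth):
    """Lesion IoU, sensitivity, specificity, and overall accuracy."""
    tp, fp, tn, fn = confusion_counts(pred, truth)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when den == 0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),          # true-positive rate
        "specificity": ratio(tn, tn + fp),          # true-negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```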

These metrics collectively evaluate segmentation accuracy, boundary delineation, and the balance between detecting lesions and avoiding false positives. Taken together, they highlight both the model’s strengths in accurately segmenting skin lesions and its potential weaknesses across scenarios and datasets.

4.2. Setup Details

The experiments were conducted on a high-performance computing setup equipped with an NVIDIA Tesla V100 GPU (32 GB memory), running Ubuntu 20.04, CUDA 11.1, PyTorch 1.8.1, and Python 3.8. This setup ensured efficient handling of extensive training iterations while providing real-time feedback for fine-tuning hyperparameters. The training process consisted of two phases—supervised training and semi-supervised training—designed to progressively enhance model performance.

Key hyperparameters were selected based on preliminary experiments and best practices in medical image segmentation. The batch size was set to 8, optimized for GPU memory constraints. A learning rate of 0.001 was used, following a cosine annealing scheduler for gradual decay. To improve generalization, a weight decay of 0.0005 was applied. The EMA decay rate (α) was set to 0.99, ensuring stable updates in the teacher model for reliable pseudo-labeling. An uncertainty threshold of 0.2 was introduced to filter out high-uncertainty regions, focusing training on reliable pseudo-labels. Additionally, the loss weighting factors were defined as λ_{sup} = 1.0 (supervised loss), λ_{aux} = 0.5 (auxiliary consistency loss), and λ_{pseudo} = 0.7 (unsupervised loss), ensuring a balanced optimization between supervised and semi-supervised learning objectives.
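The cosine annealing schedule mentioned above can be sketched in closed form. The function name, the total number of epochs, and the minimum learning rate of zero are assumptions, since this section reports only the starting learning rate of 0.001.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, base_lr=1e-3, min_lr=0.0):
    """Learning rate after `epoch` epochs of cosine decay from base_lr to min_lr."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)
```

With a hypothetical 100-epoch run, the rate starts at 0.001, passes through 0.0005 at epoch 50, and reaches the minimum at epoch 100. The decay is slow at both ends and fastest in the middle of training.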

In the initial phase, we trained the student model using only labeled data to establish a robust baseline. During this phase, the cross-entropy loss was computed solely on labeled samples, allowing the model to learn basic lesion segmentation patterns without interference from potentially noisy pseudo-labels. The supervised training phase typically lasted for the first 20 epochs to ensure that the model achieved stable performance on labeled data before introducing pseudo-labeled data.

After the initial supervised phase, we transitioned to the semi-supervised training phase, in which the teacher–student framework was applied. Each training iteration in this phase proceeded as follows:

The teacher model generated pseudo-labels for the unlabeled data, refining them using an uncertainty-guided mechanism. High-uncertainty regions, identified via Monte Carlo Dropout, were excluded from the training loss calculation.

The student model learned from both the labeled and pseudo-labeled data. The dual-headed segmentation structure in the student model facilitated the learning of complementary information through the main and auxiliary heads, which were aligned using the auxiliary consistency loss.

The teacher model parameters were updated using the exponential moving average (EMA) of the student model’s weights, ensuring stable pseudo-label generation over time.
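The three steps above can be sketched on flat lists of per-pixel lesion probabilities. Several choices here are assumptions made for illustration, not the authors' implementation: uncertainty is taken to be the binary predictive entropy of the mean over the Monte Carlo Dropout passes, normalized to [0, 1]; the pseudo-label loss is taken to be binary cross-entropy; and model weights are represented as a dictionary of floats. All function names are hypothetical. The threshold of 0.2 and the EMA decay of 0.99 are the values reported in this section.

```python
import math


def normalized_entropy(prob_samples):
    """Per-pixel entropy of the mean lesion probability over T stochastic
    passes, divided by ln 2 so that values lie in [0, 1]."""
    n_passes = len(prob_samples)
    uncertainties = []
    for i in range(len(prob_samples[0])):
        p = sum(sample[i] for sample in prob_samples) / n_passes
        h = 0.0
        for q in (p, 1.0 - p):
            if q > 0.0:
                h -= q * math.log(q)
        uncertainties.append(h / math.log(2.0))
    return uncertainties


def reliable_mask(uncertainty, threshold=0.2):
    """1 where the teacher is confident enough to supervise the student."""
    return [1 if u < threshold else 0 for u in uncertainty]


def masked_pseudo_loss(student_probs, pseudo_labels, mask, eps=1e-7):
    """Mean binary cross-entropy over reliable pixels only (assumed loss)."""
    total, count = 0.0, 0
    for p, y, m in zip(student_probs, pseudo_labels, mask):
        if m:
            p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


def ema_update(teacher, student, alpha=0.99):
    """Teacher weights become an exponential moving average of the student's."""
    for name, value in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * value
    return teacher
```

With α = 0.99, each update moves the teacher only 1% of the way toward the student, which is what keeps the pseudo-labels stable from one iteration to the next.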

To further enhance the model’s generalization ability, data augmentation techniques were applied to both labeled and unlabeled images. These augmentations included random rotations, flips, scaling, and color jittering, which helped the model become more robust to variations in lesion appearance and imaging conditions.

Throughout training, we evaluated the model’s performance on a validation set after each epoch, monitoring metrics such as the Dice coefficient to track segmentation accuracy. The model with the highest Dice coefficient on the validation set was saved as the final checkpoint for testing.

This experimental setup and training protocol enabled our model to efficiently leverage both labeled and unlabeled data, achieving high segmentation accuracy while maintaining robustness to noisy pseudo-labels. The proposed framework, with its uncertainty-guided mechanism and dual-headed architecture, was particularly effective in addressing the challenges of limited labeled data and noisy pseudo-labeling in medical image segmentation tasks.

5. Results and Discussion

In this section, we provide a comprehensive evaluation of the proposed UDAMT framework for semi-supervised skin lesion segmentation. The experimental results demonstrate that UDAMT, by integrating a dual-headed architecture and an uncertainty-driven mechanism, significantly enhances segmentation performance under limited annotation conditions. Specifically, ablation experiments reveal that removing the auxiliary segmentation head results in an approximate 2.1 percentage point decrease in the Dice coefficient and a 1.7 percentage point drop in mIoU, while disabling the uncertainty mechanism causes about a 2.8 percentage point decline in Dice. When both modules are employed, the model achieves improvements of 1.17%p and 1.31%p in Dice under 5% and 10% labeled data settings, respectively, underscoring their complementary roles in suppressing noisy pseudo-labels and enhancing feature representation.

In the comparative analysis, UDAMT attains Dice coefficients of 87.84% and 88.73% under 5% and 10% labeled data, respectively, outperforming traditional supervised models (such as U-Net and TransUNet) as well as other semi-supervised approaches (including MT, UAMT, CMT, and FixMatch). Furthermore, paired t-tests on the Dice and mIoU metrics (p < 0.05) confirm that these improvements are statistically significant.

Regarding computational resources, UDAMT requires only 12.9 M parameters and achieves an inference time of 25.7 ms per image, resulting in an overall computational cost that is considerably lower than that of MT and UAMT. Additionally, experiments on the PH2 dataset demonstrate excellent generalization to small, irregular lesions. Overall, UDAMT strikes a favorable balance between segmentation accuracy, resource consumption, and inference speed, supporting its feasibility and advantages for real-world clinical deployment.

The remainder of this section presents detailed results from the ablation studies, comparative analyses, and computational efficiency evaluations, and discusses the UDAMT framework’s performance across multiple metrics and its potential for practical applications.

5.1. Ablation Studies

To evaluate the contributions of each component in our framework, we conducted ablation studies by adding or removing modules and analyzing their impact on segmentation performance. We used the ISIC 2018 dataset with 5% and 10% labeled data, and the performance was measured using the Dice coefficient. Table 2 provides the results of these ablation experiments.

Experiment 1 (baseline) uses a single segmentation head and is trained only on labeled data. This serves as the performance benchmark.

Experiment 2 adds the auxiliary segmentation head, improving the Dice coefficient by 0.99%p (5% labeled) and 0.91%p (10% labeled).

Experiment 3 incorporates the uncertainty map, which filters high-uncertainty regions, resulting in Dice improvements of 0.65%p (5% labeled) and 0.93%p (10% labeled).

Experiment 4 combines both the auxiliary head and uncertainty map, achieving the best performance with improvements of 1.17%p (5% labeled) and 1.31%p (10% labeled).

The ablation results show that each module contributes positively to the model’s performance when added individually, and that the highest performance is achieved when both are combined, underscoring the synergy between the dual-headed architecture and uncertainty-guided training.

These findings confirm that the proposed framework, with its auxiliary segmentation head and uncertainty-guided pseudo-labeling, is effective for semi-supervised skin lesion segmentation, particularly under limited labeled data conditions.

To validate the independent contributions of the dual-headed architecture and the uncertainty-driven mechanism, ablation studies were conducted on the ISIC 2018 dataset. The results are summarized in Table 3.

The ablation results show that both modules independently contribute to performance improvement, with the full model achieving the best results.

5.2. Comparative Analysis

We evaluate the performance of our proposed UDAMT model against the baseline model and several state-of-the-art semi-supervised skin lesion segmentation methods on the ISIC 2018 dataset with 5% and 10% labeled data. The evaluation is based on segmentation accuracy, measured by the Dice coefficient, and robustness to noisy labels, which is crucial in semi-supervised settings where pseudo-labels can introduce noise.

The baseline model is a single-headed student model trained on labeled data only, providing a performance benchmark. UDAMT is compared to the following models:

MT: Teacher–student framework with EMA updates for the teacher model.
UAMT: This includes an uncertainty-aware mechanism to exclude uncertain regions.
CMT: This uses multiple teacher models and a region mixing strategy based on uncertainty.
FixMatch: This combines strong and weak data augmentations to generate pseudo-labels.
Uncertainty-Guided: Similar to UAMT, but with enhanced uncertainty estimation.

The Dice coefficients for each model under 5% and 10% labeled data are shown in Table 4. UDAMT outperforms the baseline by 1.17%p (5% labeled) and 1.31%p (10% labeled) and consistently performs better than other methods, including MT, UAMT, and FixMatch.

Table 4 compares the Dice coefficients of various methods on the ISIC2018 dataset under 5% and 10% labeled data conditions. The baseline model, a single-headed student trained solely on labeled data, achieves 86.67% and 87.42% Dice, respectively. In contrast, UDAMT attains 87.84% and 88.73%, reflecting improvements of 1.17%p and 1.31%p over the baseline.

Compared to other semi-supervised approaches—MT, UAMT, CMT, UCMT, FixMatch, and an Uncertainty-Guided Framework—UDAMT consistently shows higher or competitive Dice scores. For example, UDAMT outperforms MT by approximately 0.74–0.75%p and UAMT by around 0.39%p, while offering a simpler, single-teacher design that reduces computational overhead compared to more complex methods like UCMT. Overall, the results indicate that the dual-headed architecture and uncertainty-guided mechanism in UDAMT effectively enhance segmentation accuracy and robustness under limited labeled data conditions.

The UDAMT framework demonstrates notable robustness to noisy labels, a common challenge in SSL. By incorporating the auxiliary head and using an uncertainty map, UDAMT effectively mitigates the impact of noisy pseudo-labels. The auxiliary head enforces consistency in feature representation, helping the model learn robust segmentation boundaries even in the presence of label noise. Meanwhile, the uncertainty-guided masking strategy selectively excludes unreliable regions from contributing to the training loss, further enhancing stability and accuracy. This dual mechanism enables UDAMT to maintain high segmentation performance across different levels of labeled data. The visualization effect of image segmentation is shown in Figure 4.

Figure 4 provides a visual comparison of segmentation results, demonstrating the superior performance of UDAMT in accurately segmenting lesion regions. The analysis highlights UDAMT’s effectiveness in handling noisy pseudo-labels through its uncertainty-guided masking strategy and auxiliary segmentation head. These mechanisms help reduce segmentation errors by excluding unreliable regions and ensuring consistent feature representation, which improves the model’s ability to handle label noise.

The dual-headed architecture of UDAMT enhances its ability to delineate lesion boundaries, even in complex cases like blurry edges, severe occlusion, and color inconsistencies. Figure 4 shows that UDAMT’s segmentation closely matches the ground truth, outperforming baseline models in accurately detecting boundaries, which is crucial for clinical use.

UDAMT also demonstrates strong generalization across various lesion types and challenging scenarios, even with limited labeled data. The model’s robustness, as shown in Figure 4, makes it well-suited for real-world applications where annotated medical data are scarce.

The comparison in Figure 4 further emphasizes UDAMT’s ability to minimize false positives and negatives, leading to more reliable and clinically relevant segmentation. Its enhanced boundary detection aids dermatologists in accurate diagnosis and treatment planning, while its performance in low-label data settings makes it scalable for resource-constrained environments like telemedicine and low-resource healthcare systems.

In conclusion, UDAMT’s dual-headed architecture and uncertainty-guided training provide state-of-the-art performance in semi-supervised segmentation, particularly in challenging scenarios with noisy labels and limited annotations. This robust and efficient design positions UDAMT as a valuable tool for reliable segmentation in medical imaging.

5.3. Performance Analysis Across Evaluation Metrics

To give a comprehensive assessment of the proposed UDAMT framework, we carried out a thorough analysis of the most important segmentation metrics: mIoU, DSC, SE, SP, and ACC. The metrics selected were meant to evaluate various aspects of the model’s performance, such as segmentation accuracy, detection of lesion regions, minimizing false positives, and overall prediction reliability. The ISIC 2016, ISIC 2017, and ISIC 2018 datasets were evaluated using the 5% and 10% labeled data settings. Comparative analyses were conducted to compare traditional supervised models like U-Net and TransUNet with state-of-the-art semi-supervised methods like MT, UAMT, and CMT.

Note that all experiments were performed by training and testing each model separately on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets to ensure that the evaluation metrics accurately reflect performance on each individual dataset.

The experimental results, presented in Table 5, demonstrate the consistent superiority of the UDAMT framework across all metrics and datasets. The model exhibited robust performance, particularly in handling limited labeled data scenarios, effectively leveraging pseudo-labeled data to enhance segmentation quality.

Table 5 presents key insights into the UDAMT framework’s performance compared to traditional supervised models and state-of-the-art semi-supervised approaches. UDAMT outperforms traditional models like U-Net and TransUNet, with improvements of 1.30%p and 1.72%p in Dice, respectively, highlighting the advantage of leveraging unlabeled data. It also surpasses the Mean Teacher model by 1.86%p in Dice, thanks to its uncertainty-guided masking and dual-headed design. When compared to UAMT and CMT, UDAMT achieves further gains (0.39%p over UAMT and 0.22%p over CMT, with a 0.54%p increase in accuracy).

UDAMT achieves the highest Dice coefficient (87.84% with 5% labeled data and 88.73% with 10% labeled data), as well as the highest sensitivity (86.67%) and accuracy (97.89%) across the ISIC 2016, 2017, and 2018 datasets, demonstrating its robustness in semi-supervised segmentation tasks.

To further validate the robustness of our results, we conducted statistical significance testing using paired t-tests. The t-tests were applied to the Dice coefficient and mIoU metrics across different methods. As shown in Table 6, the performance improvements of UDAMT over baseline methods are statistically significant, with p-values less than 0.05 in all cases, indicating a meaningful difference rather than random variation.
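For reference, the paired t statistic behind such a test can be sketched as follows. The function name is hypothetical, and the pairing unit (for example, per-image or per-run scores of two methods on the same test data) is an assumption, since it is not stated in this section.

```python
import math


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the paired differences.
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean / math.sqrt(variance / n), n - 1
```

The p-value is then read from Student's t distribution with n − 1 degrees of freedom. In practice a library routine such as SciPy's `scipy.stats.ttest_rel` returns both the statistic and the p-value.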

To mitigate feature coupling between the two heads and encourage complementary learning, we incorporate an auxiliary consistency loss, defined as

L_{div} = ‖P_{1} - P_{2}‖^{2}

(11)

where P_{1} and P_{2} represent the predictions of the main and auxiliary heads, respectively. Additionally, Figure 5 presents a visual comparison of the attention maps generated by each segmentation head, highlighting their distinct feature focus and verifying that the dual-headed architecture improves feature diversity. The heatmaps generated illustrate how each head emphasizes different lesion regions, contributing to improved segmentation performance. The main head predominantly captures larger structural details, while the auxiliary head focuses on fine-grained features, enhancing robustness and diversity in feature representation.
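Equation (11) is a squared L2 distance between the two heads' outputs, which can be sketched on flattened probability maps as follows. The function name is hypothetical, and whether the authors sum or average over pixels is not stated, so the plain sum is an assumption.

```python
def head_consistency_loss(p_main, p_aux):
    """Squared L2 distance between main-head and auxiliary-head predictions."""
    if len(p_main) != len(p_aux):
        raise ValueError("both heads must predict the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(p_main, p_aux))
```

The loss is zero only when both heads produce identical predictions, and it grows quadratically with their disagreement at each pixel.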

The proposed UDAMT framework outperformed both traditional supervised models and advanced semi-supervised approaches across all evaluated metrics. Its superior segmentation accuracy and robustness to noisy pseudo-labels demonstrate the effectiveness of its architectural innovations. These results underscore the potential of UDAMT for real-world medical imaging applications, where labeled data are scarce and robustness to label noise is critical. Future work will focus on extending the UDAMT framework to other medical imaging domains, optimizing its computational efficiency, and exploring its deployment in clinical settings.

5.4. Knowledge Distillation and Parameter Minimization Comparison

To evaluate the practicality and computational efficiency of the proposed UDAMT framework, a detailed comparison of parameter scale and inference speed was conducted against existing frameworks, particularly MT and other knowledge distillation-based models. Table 7 provides an overview of the parameter count and inference time per image for each model, tested on the ISIC 2018 dataset using a single NVIDIA Tesla V100 GPU.

The UDAMT framework requires 12.9 M parameters, which is slightly higher than the baseline model (12.5 M) but significantly lower than MT (14.8 M) and UAMT (15.2 M). This reduction in parameter count is attributed to the use of the auxiliary segmentation head and the single-teacher architecture, which avoids the computational overhead associated with multiple teacher models.

The inference time of UDAMT averages 25.7 ms/image, closely matching the baseline model (25.4 ms) and outperforming MT (27.6 ms) and UAMT (28.1 ms). This demonstrates that UDAMT achieves high computational efficiency while maintaining superior segmentation performance.

UDAMT achieves the highest Dice coefficients across both 5% and 10% labeled data settings, showcasing a balance between performance and resource efficiency. This makes UDAMT suitable for real-world applications where computational resources are often limited.

From Table 7, it can be observed that although the baseline method exhibits a slightly lower computational cost (defined as the product of parameter count and inference time) than UDAMT (317.5 M-ms vs. 331.5 M-ms, respectively), UDAMT achieves a significant improvement in segmentation performance (for example, the Dice coefficient under the 5% labeled data setting rises from 86.67% to 87.84%). Moreover, when compared to other advanced methods such as MT, UAMT, and CMT, UDAMT maintains a lower parameter count and faster inference while delivering superior overall performance. In summary, UDAMT demonstrates an excellent balance between performance and resource consumption, enabling efficient and accurate medical image segmentation even in environments with limited computational resources, which strongly supports its practical deployment in clinical settings.
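The cost figures quoted above follow directly from the parameter counts and inference times reported in this section, as the short calculation below reproduces. The dictionary layout and function name are illustrative only.

```python
# (parameters in millions, inference time in ms per image), as reported above.
MODELS = {
    "Baseline": (12.5, 25.4),
    "UDAMT": (12.9, 25.7),
    "MT": (14.8, 27.6),
    "UAMT": (15.2, 28.1),
}


def compute_costs(models=MODELS):
    """Computational cost = parameter count x inference time, in M-ms."""
    return {name: params * ms for name, (params, ms) in models.items()}


if __name__ == "__main__":
    for name, cost in sorted(compute_costs().items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {cost:6.1f} M-ms")
```

On these figures, UDAMT's cost is roughly 4% above the baseline and roughly 19% below MT.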

To assess the generalization of UDAMT beyond the ISIC datasets, we conducted additional experiments on the PH2 dataset, which contains small and irregularly shaped lesions. Our model achieved an 85.4% Dice coefficient for small lesion segmentation, demonstrating its robustness. Additionally, Table 8 presents inference speed comparisons across different hardware platforms, confirming UDAMT’s computational efficiency.

Practical Applications and Resource-Constrained Environments.

The results highlight the practical advantages of UDAMT in resource-constrained environments, such as medical imaging systems deployed on portable devices or edge computing platforms. The following aspects underline its applicability:

Low Parameter Overhead: With fewer parameters compared to other semi-supervised frameworks, UDAMT is particularly well suited for memory-constrained environments, such as mobile healthcare devices and embedded systems.
Efficient Inference: UDAMT’s fast inference speed enables its use in time-sensitive medical applications, including real-time lesion segmentation during dermatological examinations.
Scalability to Other Tasks: UDAMT’s computational efficiency makes it scalable to other medical imaging tasks, such as organ segmentation in CT scans or tumor detection in MRIs, especially in scenarios requiring SSL due to limited labeled data availability.

The computational efficiency of UDAMT, combined with its robust segmentation accuracy, positions it as a highly practical solution for real-world medical imaging tasks. Unlike models such as UAMT and FixMatch, which incur additional overhead due to their reliance on complex architectures or augmentation strategies, UDAMT achieves comparable or better performance with a streamlined design.

Future research will explore optimizing UDAMT further by integrating lightweight architectures, such as MobileNet-based backbones, and hardware-specific accelerations, such as TensorRT or FPGA implementations, to enhance its applicability in low-power environments.

This analysis reaffirms UDAMT’s value in achieving state-of-the-art performance while addressing the practical constraints of deploying machine learning models in healthcare scenarios.

6. Conclusions and Future Work

In this study, we proposed UDAMT, a novel SSL framework for skin lesion segmentation, designed to address the critical challenges posed by limited labeled data and noisy annotations in medical image analysis. Our approach effectively combines a dual-headed teacher–student architecture with an uncertainty-guided mechanism, significantly enhancing feature learning and boundary precision. Through extensive experiments on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets, UDAMT achieved notable improvements in segmentation performance, including up to 1.17%p increase in Dice coefficient and 1.31%p in mIoU under low-label settings (5% labeled data). These results validate the robustness and effectiveness of UDAMT, particularly in scenarios where labeled data are scarce.

Furthermore, UDAMT demonstrates superior computational efficiency. The model requires only 12.9 M parameters, which is slightly higher than the baseline model (12.5 M) but significantly lower than methods like MT (14.8 M) and UAMT (15.2 M). Additionally, UDAMT achieves an inference time of 25.7 ms per image, making it computationally feasible for real-time clinical applications. These advantages underscore UDAMT’s potential for deployment in clinical environments with limited computational resources.

Ablation studies provided further insights into the contributions of key components. The dual-headed segmentation network and the uncertainty-guided mechanism were both essential in enhancing segmentation quality and robustness. The dual-headed architecture allowed for complementary learning, providing richer feature representations, while the uncertainty-guided mechanism effectively filtered out unreliable pseudo-labels, ensuring more accurate model predictions. These findings highlight the importance of combining these elements for optimal performance in semi-supervised medical image segmentation.

While the current implementation of UDAMT employs Monte Carlo Dropout for uncertainty estimation—introducing some additional computational cost—the overall overhead remains manageable, and the model’s performance is not compromised. In future work, we aim to explore more efficient uncertainty estimation techniques, such as Bayesian neural networks or learned uncertainty modules, which could further reduce computational costs while maintaining the model’s effectiveness in real-time applications.

Although UDAMT is currently optimized for binary skin lesion segmentation, we believe that the framework can be easily extended to multi-class and multi-organ segmentation tasks. By modifying the network to accommodate diverse anatomical structures and varying noise characteristics across different imaging modalities (e.g., MRI, CT), UDAMT’s applicability in clinical practice can be significantly broadened.

UDAMT’s potential for clinical deployment is substantial. By reducing the reliance on large annotated datasets and delivering high segmentation accuracy in data-constrained environments, UDAMT can alleviate the annotation burden on dermatologists and other medical professionals. Its efficiency and effectiveness make it well suited for deployment in resource-limited settings, such as mobile health devices and telemedicine platforms, where rapid, reliable skin lesion diagnosis is essential.

In conclusion, the UDAMT framework presents a meaningful advance in semi-supervised medical image segmentation, addressing both the technical and practical challenges in clinical settings. The framework’s ability to balance performance, computational efficiency, and scalability provides a solid foundation for future work. We plan to continue optimizing UDAMT to further enhance its robustness and expand its clinical applications, ultimately improving diagnostic accuracy and patient outcomes in real-world healthcare environments.

Author Contributions

Conceptualization, C.Z., W.-S.J. and S.-Y.R.; Methodology, C.Z., W.-S.J. and H.-R.J.; Software, C.Z. and W.-S.J.; Validation, C.Z. and H.-R.J.; Investigation, W.-S.J. and H.-R.J.; Resources, S.-Y.R.; Data curation, C.Z. and W.-S.J.; Writing—original draft preparation, C.Z.; Writing—review and editing, C.Z. and S.-Y.R.; Visualization, C.Z. and W.-S.J.; Supervision, S.-Y.R.; Project administration, H.-R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Innovative Human Resource Development for Local Intellectualization program grant funded by the Korean government (MSIT) (IITP-2025-RS-2024-00436773). This work was supported by the “Development and Demonstration of AI Services for Manufacturing Industry Specialization” grant funded by the Korean government (the Ministry of Trade, Industry and Energy) (Project No.: SG20240201).

Data Availability Statement

We confirm that the original contributions proposed in this study are fully included in the article. Should there be any further questions, please feel free to contact the corresponding author for additional clarification.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
2. Garbe, C.; Peris, K.; Hauschild, A.; Saiag, P.; Middleton, M.; Bastholt, L.; Grob, J.-J.; Malvehy, J.; Newton-Bishop, J.; Stratigos, A.J.; et al. Diagnosis and treatment of melanoma. European consensus-based interdisciplinary guideline—Update 2016. Eur. J. Cancer 2016, 63, 201–217.
3. Vestergaard, M.; Macaskill, P.; Holt, P.; Menzies, S. Dermoscopy compared with naked eye examination for the diagnosis of primary melanoma: A meta-analysis of studies performed in a clinical setting. Br. J. Dermatol. 2008, 159, 669–676.
4. Thomas, S.M.; Lefevre, J.G.; Baxter, G.; Hamilton, N.A. Interpretable deep learning systems for multi-class segmentation and classification of non-melanoma skin cancer. Med. Image Anal. 2020, 68, 101915.
5. Reis, H.C.; Turk, V.; Khoshelham, K.; Kaya, S. InSiNet: A deep convolutional approach to skin cancer detection and segmentation. Med. Biol. Eng. Comput. 2022, 60, 643–662.
6. Kaur, R.; GholamHosseini, H.; Sinha, R.; Lindén, M. Automatic lesion segmentation using atrous convolutional deep neural networks in dermoscopic skin cancer images. BMC Med. Imaging 2022, 22, 103.
7. Bibi, A.; Khan, M.A.; Javed, M.Y.; Tariq, U.; Kang, B.-G.; Nam, Y.; Mostafa, R.R.; Sakr, R.H. Skin Lesion Segmentation and Classification Using Conventional and Deep Learning Based Framework. Comput. Mater. Contin. 2022, 71, 2477–2495.
8. Li, H.; Pan, Y.; Zhao, J.; Zhang, L. Skin disease diagnosis with deep learning: A review. Neurocomputing 2021, 464, 364–393.
9. Wei, Z.; Li, Q.; Song, H. Dual attention based network for skin lesion classification with auxiliary learning. Biomed. Signal Process. Control 2022, 74, 103549.
10. del Amor, R.; Launet, L.; Colomer, A.; Moscardó, A.; Mosquera-Zamudio, A.; Monteagudo, C.; Naranjo, V. An attention-based weakly supervised framework for spitzoid melanocytic lesion diagnosis in whole slide images. Artif. Intell. Med. 2021, 121, 102197.
11. Godson, L.; Alemi, N.; Nsengimana, J.; Cook, G.P.; Clarke, E.L.; Treanor, D.; Bishop, D.T.; Newton-Bishop, J.; Gooya, A. Weakly-supervised learning for image-based classification of primary melanomas into genomic immune subgroups. arXiv 2022, arXiv:2202.11524.
12. Chen, C.; Dou, Q.; Chen, H.; Qin, J.; Heng, P.A. Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 2494–2505.
13. Xie, Q.; Li, Y.; He, N.; Ning, M.; Ma, K.; Wang, G.; Lian, Y.; Zheng, Y. Unsupervised Domain Adaptation for Medical Image Segmentation by Disentanglement Learning and Self-Training. IEEE Trans. Med. Imaging 2022, 43, 4–14.
14. Feng, W.; Ju, L.; Wang, L.; Song, K.; Zhao, X.; Ge, Z. Unsupervised Domain Adaptation for Medical Image Segmentation by Selective Entropy Constraints and Adaptive Semantic Alignment. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI Press: Washington, DC, USA, 2023; Volume 37, pp. 623–631.
15. Luo, X.; Chen, J.; Song, T.; Wang, G. Semi-Supervised Medical Image Segmentation Through Dual-Task Consistency. In AAAI-21/IAAI-21/EAAI-21 Proceedings of the AAAI Conference on Artificial Intelligence, virtual, 2–9 February 2021; AAAI Press: Washington, DC, USA, 2021; Volume 35, pp. 8801–8809.
16. Chaitanya, K.; Karani, N.; Baumgartner, C.F.; Erdil, E.; Becker, A.; Donati, O.; Konukoglu, E. Semi-supervised task-driven data augmentation for medical image segmentation. Med. Image Anal. 2021, 68, 101934.
17. Wu, Y.; Ge, Z.; Zhang, D.; Xu, M.; Zhang, L.; Xia, Y.; Cai, J. Mutual consistency learning for semi-supervised medical image segmentation. Med. Image Anal. 2022, 81, 102530.
18. Li, X.; Wu, Y.; Dai, S. Semi-supervised medical imaging segmentation with soft pseudo-label fusion. Appl. Intell. 2023, 53, 20753–20765.
19. Zhang, S.; Zhang, J.; Tian, B.; Lukasiewicz, T.; Xu, Z. Multi-modal contrastive mutual learning and pseudo-label re-learning for semi-supervised medical image segmentation. Med. Image Anal. 2022, 83, 102656.
20. Zheng, Z.; Lv, L.; Ni, B. Uncertainty-Inspired Credible Pseudo-Labeling in Semi-Supervised Medical Image Segmentation. In Pattern Recognition and Computer Vision, Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; Springer Nature: Singapore, 2024; pp. 90–104.
21. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv 2017, arXiv:1703.01780.
22. Sohn, K.; Berthelot, D.; Li, C.L.; Zhang, Z.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Zhang, H.; Raffel, C. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. arXiv 2020, arXiv:2001.07685.
23. Yu, L.; Wang, S.; Li, X.; Fu, C.W.; Heng, P.A. Uncertainty-Aware Self-Ensembling Model for Semi-Supervised 3D Left Atrium Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2019, 22nd International Conference, Shenzhen, China, 13–17 October 2019; pp. 605–613.
24. Chen, Y.; Zhang, C.; Ke, Y.; Huang, Y.; Dai, X.; Qin, F.; Zhang, Y.; Zhang, X.; Wang, C. Semi-Supervised Medical Image Segmentation Method Based on Cross-Pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies. In Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI), Athens, Greece, 27–30 May 2024.
25. Chen, Z.; Zhu, L.; Wan, L.; Wang, S.; Feng, W.; Heng, P.A. A Multi-task Mean Teacher for Semi-Supervised Shadow Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 5611–5620.
26. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; Kainz, B.; Zuluaga, M.A.; Wang, L.; et al. Attention U-Net: Learning Where to Look for the Pancreas. In Proceedings of the Conference on Medical Imaging with Deep Learning (MIDL 2018), Amsterdam, The Netherlands, 4–6 July 2018.
27. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
28. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
29. Huang, M.; Yu, W.; Zhu, D. An Improved Image Segmentation Algorithm Based on the Otsu Method. In Proceedings of the 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2012), Kyoto, Japan, 8–10 August 2012; pp. 135–139.
30. Upadhyay, P.; Chhabra, J.K. Kapur’s entropy based optimal multilevel image segmentation using Crow Search Algorithm. Appl. Soft Comput. 2020, 97, 105522.
31. Xu, H.; Xu, X.; Zuo, Y. Applying morphology to improve Canny operator’s image segmentation method. J. Eng. 2019, 23, 8816–8819.
32. Wang, C.; Dai, D.; Xia, S.; Liu, Y.; Wang, G. One-Stage Deep Edge Detection Based on Dense-Scale Feature Fusion and Pixel-Level Imbalance Learning. IEEE Trans. Artif. Intell. 2022, 5, 70–79.
33. Lin, Y.; Zhang, D.; Fang, X.; Chen, Y.; Cheng, K.T.; Chen, H. Rethinking Boundary Detection in Deep Learning Models for Medical Image Segmentation. In Proceedings of the International Conference on Information Processing in Medical Imaging (IPMI 2023), San Carlos de Bariloche, Argentina, 18–23 June 2023; Springer Nature: Cham, Switzerland, 2023; pp. 730–742.
34. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; pp. 234–241.
35. Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020, 121, 74–87.
36. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Proceedings of the 4th International Workshop, DLMIA 2018 and the 8th International Workshop on ML-CDS 2018, Granada, Spain, 20 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
37. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
39. Chen, X.; Yuan, Y.; Zeng, G.; Wang, J. Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), Nashville, TN, USA, 20–25 June 2021; pp. 2613–2622.
40. Zou, Y.; Zhang, Z.; Zhang, H.; Li, C.L.; Bian, X.; Huang, J.B.; Pfister, T. PseudoSeg: Designing pseudo labels for semantic segmentation. arXiv 2020, arXiv:2010.09713.
41. Ran, L.; Li, Y.; Liang, G.; Zhang, Y. Pseudo Labeling Methods for Semi-Supervised Semantic Segmentation: A Review and Future Perspectives. IEEE Trans. Circuits Syst. Video Technol. 2024; Early Access.
42. Rizve, M.N.; Duarte, K.; Rawat, Y.S.; Shah, M. In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for SSL. arXiv 2021, arXiv:2101.06329.
43. Chen, B.; Ye, Z.; Liu, Y.; Zhang, Z.; Pan, J.; Zeng, B.; Lu, G. Combating Medical Label Noise via Robust Semi-Supervised Contrastive Learning. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023), Vancouver, BC, Canada, 8–12 October 2023; Springer Nature: Cham, Switzerland, 2023; pp. 562–572.
44. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738.
45. Pieropan, A.; Azizpour, H.; Maki, A. Dense FixMatch: A simple SSL method for pixel-wise prediction tasks. arXiv 2022, arXiv:2210.09919.
46. Miao, J.; Chen, C.; Zhang, K.; Chuai, J.; Li, Q.; Heng, P.A. Cross Prompting Consistency with Segment Anything Model for Semi-Supervised Medical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2024), Marrakesh, Morocco, 6–10 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 167–177.
47. Min, Z.; Ge, Q.; Tai, C. Why the pseudo label based semi-supervised learning algorithm is effective? arXiv 2022, arXiv:2211.10039.
48. Wang, K.; Zhan, B.; Zu, C.; Wu, X.; Zhou, J.; Zhou, L.; Wang, Y. Tripled-Uncertainty Guided Mean Teacher Model for Semi-Supervised Medical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2021), Strasbourg, France, 27 September–1 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 24, pp. 450–460.
49. Shen, Z.; Cao, P.; Yang, H.; Liu, X.; Yang, J.; Zaiane, O.R. Co-Training with High-Confidence Pseudo Labels for Semi-Supervised Medical Image Segmentation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2023), Macao, China, 19–25 August 2023; pp. 4199–4207.
50. Chen, Y.; Wang, T.; Tang, H.; Zhao, L.; Zong, R.; Chen, S.; Tan, T.; Zhang, X.; Tong, T. Dual-decoder consistency via pseudo-labels guided data augmentation for semi-supervised medical image segmentation. arXiv 2023, arXiv:2308.16573.
51. Ma, N.; Bu, J.; Zhang, Z.; Zhou, S. Uncertainty-guided mixup for semi-supervised domain adaptation without source data. arXiv 2021, arXiv:2107.06707.

Figure 1. Difficulties in dermoscopic image segmentation: (a) the lesion area is small; (b) the lesion area is large; (c) the lesion color is light; (d) the image contains reflections or bubbles; (e) the lesion color is inconsistent; (f) the lesion area is occluded by hair; (g) the lesion boundary is blurred; (h) image appearance differs between acquisition devices.

Figure 2. Overview of the UDAMT framework.

Figure 3. Flowchart for UDAMT’s pseudo-labeling algorithm.

Figure 4. Comparison of segmentation of each model under the 5% labeled data setting. Green regions indicate areas where UDAMT achieves accurate segmentation while the baseline fails, and red regions highlight errors or missed detections by the baseline that UDAMT successfully corrects.

Figure 5. Visual comparison of attention maps from each segmentation head.

Table 1. Dataset composition and partitioning.

Dataset	Training	Validation	Testing	Total
ISIC 2016	720	90	180	900
ISIC 2017	1600	200	400	2000
ISIC 2018	2075	259	520	2594

Table 2. Ablation study results for segmentation performance on the ISIC 2018 dataset (DSC in %, with 5% and 10% labeled data; gains over Experiment 1 in parentheses).

Experiment	Auxiliary Head	Uncertainty Map	DSC (5%)	DSC (10%)
1	✗	✗	86.67	87.42
2	✓	✗	87.66 (+0.99)	88.33 (+0.91)
3	✗	✓	87.32 (+0.65)	88.35 (+0.93)
4	✓	✓	87.84 (+1.17)	88.73 (+1.31)
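
The parenthesized gains in Table 2 can be re-derived from the DSC columns. The following is a minimal sketch, assuming the gains are simple differences with respect to Experiment 1 (neither component enabled); the names `dsc` and `gains` are illustrative and do not come from the paper.

```python
# DSC values (in %) copied from Table 2; keys are the experiment numbers.
# Each tuple is (DSC at 5% labeled data, DSC at 10% labeled data).
dsc = {
    1: (86.67, 87.42),  # no auxiliary head, no uncertainty map
    2: (87.66, 88.33),  # auxiliary head only
    3: (87.32, 88.35),  # uncertainty map only
    4: (87.84, 88.73),  # both components
}


def gains(table, reference=1):
    """Return the DSC gain of every experiment over the reference row."""
    base5, base10 = table[reference]
    return {
        exp: (round(d5 - base5, 2), round(d10 - base10, 2))
        for exp, (d5, d10) in table.items()
        if exp != reference
    }


result = gains(dsc)
for exp, (g5, g10) in sorted(result.items()):
    print(f"Experiment {exp}: +{g5:.2f} (5%), +{g10:.2f} (10%)")
```

Under that assumption the differences match the values printed in the table, including +1.17 and +1.31 for the full model.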

Table 3. Ablation study results for model performance on the ISIC 2018 dataset (all values in %).

Configuration	DSC	mIoU	SE	SP
Baseline (Teacher–Student Only)	85.12	82.34	84.78	94.56
+Dual-Headed Architecture	87.22	84.04	86.31	95.02
+Uncertainty-Driven Mechanism	88.34	85.17	87.56	95.65
Full Model (Proposed UDAMT)	88.73	85.67	87.89	96.12

Table 4. Comparative analysis of Dice coefficients (DSC, in %) for UDAMT, baseline, and state-of-the-art models under limited-label settings (5% and 10% labeled data) using the ISIC 2018 dataset.

Model	DSC (5%)	DSC (10%)
Baseline	86.67	87.42
MT [21]	87.10	87.98
UAMT [23]	87.45	88.21
CMT [15]	87.62	88.35
UCMT [49]	88.22	88.46
FixMatch [22]	87.55	88.42
Uncertainty-Guided [50]	87.69	88.56
UDAMT	87.84	88.73

Table 5. Performance comparison across datasets under 5% and 10% labeled data settings.

Dataset	Model	mIoU (5%)	mIoU (10%)	DSC (5%)	DSC (10%)	SE (5%)	SE (10%)	SP (5%)	SP (10%)	ACC (5%)	ACC (10%)
ISIC 2016	U-Net	76.96	77.86	83.12	84.89	81.23	82.78	95.23	95.89	92.78	93.45
	TransUNet	77.21	78.13	84.01	85.67	82.45	83.56	95.67	96.23	93.23	94.01
	MT	76.78	78.21	85.12	86.45	83.87	85.32	96.03	96.37	94.21	94.63
	UAMT	77.59	78.78	86.45	87.32	85.21	86.45	96.34	96.75	94.85	95.22
	CMT	77.88	78.62	86.73	87.54	85.45	86.89	96.59	96.93	95.01	95.36
	UDAMT	77.96	78.84	87.12	87.89	86.01	87.12	96.87	97.05	95.34	95.73
ISIC 2017	U-Net	76.98	77.86	83.12	84.67	81.45	82.89	94.98	95.45	92.78	93.45
	TransUNet	77.21	78.13	84.01	85.23	82.67	83.98	95.34	95.89	93.12	94.01
	MT	76.78	78.22	86.12	87.23	84.56	85.98	95.98	96.42	94.12	94.71
	UAMT	77.59	78.78	86.98	87.65	85.45	86.87	96.34	96.82	94.87	95.34
	CMT	77.88	78.62	87.15	87.89	85.89	87.12	96.68	97.01	95.23	95.62
	UDAMT	77.97	78.84	87.78	88.34	86.34	87.54	96.92	97.23	95.65	96.02
ISIC 2018	U-Net	77.86	78.42	84.67	85.89	82.34	83.78	95.23	95.76	93.12	93.89
	TransUNet	77.89	78.98	85.12	86.45	83.01	84.56	95.67	96.12	93.78	94.34
	MT	78.13	78.34	87.10	87.98	85.56	86.45	96.87	97.12	94.67	95.12
	UAMT	78.21	78.65	87.45	88.21	86.01	86.87	97.01	97.45	95.01	95.65
	CMT	78.35	78.74	87.62	88.35	86.23	87.12	97.23	97.65	95.23	95.87
	UDAMT	78.42	78.84	87.84	88.73	86.67	87.45	97.34	97.89	95.67	96.12
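
For readers who want to reproduce the metric columns of Table 5, the sketch below computes DSC, IoU, SE, SP, and ACC from a pair of binary masks using their standard confusion-matrix definitions. It is an illustration under stated assumptions, not the authors' evaluation code: masks are flattened lists of 0/1 values, the function names are hypothetical, and only the foreground IoU of a single image is computed, whereas the paper's mIoU may be averaged differently.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, target):
    """Return the standard overlap and classification metrics as fractions."""
    tp, fp, fn, tn = confusion_counts(pred, target)

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),  # Dice coefficient
        "IoU": ratio(tp, tp + fp + fn),          # foreground IoU
        "SE": ratio(tp, tp + fn),                # sensitivity (recall)
        "SP": ratio(tn, tn + fp),                # specificity
        "ACC": ratio(tp + tn, tp + fp + fn + tn),
    }


# Toy 8-pixel example (made-up values, not data from the paper).
pred = [1, 1, 1, 0, 0, 0, 0, 0]
target = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = segmentation_metrics(pred, target)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

In the toy example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so the specificity is 4/5 and the accuracy is 6/8.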

Table 6. Statistical significance analysis of performance improvements across different methods (p-values from paired t-tests).

Model	DSC (5%)	DSC (10%)	mIoU (5%)	mIoU (10%)	p-Value (DSC)	p-Value (mIoU)
Baseline	86.67%	87.42%	82.34%	83.21%	-	-
MT	87.10%	87.98%	83.12%	84.00%	0.034	0.028
UAMT	87.45%	88.21%	83.45%	84.35%	0.027	0.021
CMT	87.62%	88.35%	83.78%	84.52%	0.018	0.015
UDAMT	87.84%	88.73%	85.67%	86.12%	0.009	0.006
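
Table 6 reports p-values from paired t-tests. As a rough illustration of how such a test statistic is formed, the sketch below computes the paired t statistic and its degrees of freedom from per-image scores of two models. The per-image Dice values are made up for the example (the paper does not list them), the function name is hypothetical, and converting the statistic to a p-value would additionally require the t-distribution, for instance from a statistics library.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on two score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-image Dice scores for two models on the same five images.
model_a = [0.88, 0.90, 0.85, 0.91, 0.87]
model_b = [0.86, 0.89, 0.84, 0.88, 0.86]
t_value, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The test is paired because both models are evaluated on the same images, so only the per-image differences enter the statistic.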

Table 7. Comparison of parameter count, inference time, and segmentation performance across models.

Model	Parameters (M)	Inference Time (ms/image)	Computational Cost (M-ms)	DSC (5%)	DSC (10%)
Baseline	12.5	25.4	317.5	86.67	87.42
MT	14.8	27.6	408.5	87.10	87.98
UAMT	15.2	28.1	427.1	87.45	88.21
CMT	16.5	30.2	498.3	87.62	88.35
FixMatch	13.9	26.8	372.5	87.55	88.42
Uncertainty-Guided	14.5	28.0	406.0	87.69	88.56
UDAMT	12.9	25.7	331.5	87.84	88.73
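
The "Computational Cost (M-ms)" column of Table 7 appears to be the product of the parameter count (in millions) and the per-image inference time (in ms). The sketch below checks that reading against the tabulated values; the interpretation is an assumption inferred from the numbers, and the names `rows` and `cost` are illustrative.

```python
# (parameters in M, inference time in ms/image, reported cost in M-ms),
# copied from Table 7.
rows = {
    "Baseline": (12.5, 25.4, 317.5),
    "MT": (14.8, 27.6, 408.5),
    "UAMT": (15.2, 28.1, 427.1),
    "CMT": (16.5, 30.2, 498.3),
    "FixMatch": (13.9, 26.8, 372.5),
    "Uncertainty-Guided": (14.5, 28.0, 406.0),
    "UDAMT": (12.9, 25.7, 331.5),
}


def cost(params_m, time_ms):
    """Parameter count times inference time, rounded to one decimal place."""
    return round(params_m * time_ms, 1)


computed = {name: cost(p, t) for name, (p, t, _) in rows.items()}
for name, (p, t, reported) in rows.items():
    print(f"{name}: computed {computed[name]:.1f}, reported {reported:.1f}")
```

With this reading, every computed product agrees with the reported column to one decimal place, and UDAMT has the lowest cost among the semi-supervised methods listed.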

Table 8. Inference speed comparison across hardware platforms.

Hardware Platform	Inference Speed (ms/image)
NVIDIA Tesla V100	25.7
NVIDIA Jetson AGX Xavier	42.3
Intel Core i7-9700K	78.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zou, C.; Jeon, W.-S.; Ju, H.-R.; Rhee, S.-Y. A Dual-Headed Teacher–Student Framework with an Uncertainty-Guided Mechanism for Semi-Supervised Skin Lesion Segmentation. Electronics 2025, 14, 984. https://doi.org/10.3390/electronics14050984

