Article

SemiSeg-CAW: Semi-Supervised Segmentation of Ultrasound Images by Leveraging Class-Level Information and an Adaptive Multi-Loss Function

Department of Electrical and Computer Engineering, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
*
Authors to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2025, 7(4), 124; https://doi.org/10.3390/make7040124
Submission received: 31 August 2025 / Revised: 3 October 2025 / Accepted: 17 October 2025 / Published: 20 October 2025

Abstract

The limited availability of pixel-level annotated medical images complicates training supervised segmentation models, as these models require large datasets. To deal with this issue, SemiSeg-CAW, a semi-supervised segmentation framework that leverages class-level information and an adaptive multi-loss function, is proposed to reduce dependency on extensive annotations. The model combines segmentation and classification tasks in a multitask architecture that includes segmentation, classification, weight generation, and ClassElevateSeg modules. In this framework, the ClassElevateSeg module is initially pre-trained and then fine-tuned jointly with the main model to produce auxiliary feature maps that support the main model, while the adaptive weighting strategy computes a dynamic combination of classification and segmentation losses using trainable weights. The proposed approach enables effective use of both labeled and unlabeled images with class-level information by compensating for the shortage of pixel-level labels. Experimental evaluation on two public ultrasound datasets demonstrates that SemiSeg-CAW consistently outperforms fully supervised segmentation models when trained with equal or fewer labeled samples. The results suggest that incorporating class-level information with adaptive loss weighting provides an effective strategy for semi-supervised medical image segmentation and can improve the segmentation performance in situations with limited annotations.

1. Introduction

Medical image segmentation is the fundamental task of identifying different regions or tissues in an image. Segmentation is a preliminary step in disease diagnosis, providing insight for detecting abnormalities in medical images. Advances in deep learning have produced segmentation techniques that outperform classical machine learning methods [1]. However, the effectiveness of deep learning-based segmentation models depends on the availability of large amounts of pixel-level annotated data. The central challenge of medical image segmentation is the limited amount of labeled data, owing to the difficulty and high cost of manual pixel annotation. This limitation is particularly pronounced in medical ultrasound images, where pixel-level annotation is complex [2] and the images exhibit low contrast, class imbalance, and variable lesion shapes. Segmentation models such as UNet [3] are designed for small medical datasets; however, they often struggle to generalize under pixel imbalance, where target pixels are noticeably fewer than background pixels. Some methods use data augmentation to increase the number of samples, but augmentation may not add sufficient diversity to the dataset. Transfer learning is often ineffective as well, since models trained on natural images may not adapt to the distinct characteristics of ultrasound images [4].
Some approaches generate pseudo-labels, weak labels, or synthetic labels to overcome the shortage of labeled data. However, low-quality or inaccurate pseudo-labels lack the confidence and reliability required for effective training [5]. Recent models enhance segmentation by exploiting unlabeled data through a mix of supervised and unsupervised loss functions [6]. These models must balance the supervised and unsupervised losses, which is challenging in practice. These issues highlight the importance of approaches that incorporate supervisory information beyond pixel-level annotations.
Class-level information provides a promising complementary source of supervision to enhance segmentation [7]. Classification and segmentation are related tasks capable of providing valuable information to each other [8]. Classification can provide valuable insights to guide segmentation, while segmentation obtains fine-grained details that support classification. Most models train classification and segmentation separately and use the trained classifier to provide information for the segmentation, as in [9], or to generate pseudo masks through class activation maps (CAMs) [10]. CAMs can highlight what regions of the image are crucial for predicting the class labels. However, using CAMs directly as pseudo-labels degrades the segmentation quality because CAMs often misalign with the actual lesion region [11]. This highlights the need for a dedicated module that can produce more reliable auxiliary maps.
A unified multitask segmentation and classification model offers mutual benefits. However, the challenge in multitask learning is how to combine multiple loss functions effectively when these functions have different ranges and characteristics. The performance of tasks in a multitask model relies on the contribution of each loss function to the total loss calculation, especially when task priorities differ. Equal or manually tuned weights can bias training toward certain tasks and require trial and error to achieve acceptable performance. Although some methods propose dynamic weighting strategies, they often neglect the heterogeneous nature of individual losses and fail to achieve a balanced optimization across different tasks. Therefore, developing an adaptive weighting strategy that assigns distinct trainable weights to each loss function is important to ensure balanced optimization across objectives.
To address these challenges, this paper presents SemiSeg-CAW, a semi-supervised segmentation and classification model that utilizes class-level information and an adaptive weighting strategy for loss computation. SemiSeg-CAW has a modular structure comprising segmentation, classification, auxiliary segmentation map generator (ClassElevateSeg), and weight generation (WGM) modules. ClassElevateSeg is an auxiliary module that generates refined segmentation maps to guide training, and the weight generation module adaptively assigns trainable weights to each loss function. SemiSeg-CAW provides a robust solution when annotated data for segmentation are limited by effectively using both labeled and unlabeled images and avoiding dependence on manual loss tuning. The key contributions are as follows:
  • Proposing SemiSeg-CAW, a semi-supervised segmentation and classification model that utilizes class-level information to compensate for the shortage of sufficient pixel-level annotated data.
  • Proposing ClassElevateSeg, an auxiliary module to produce refined segmentation maps under multitask supervision, providing stable auxiliary features to enhance training.
  • Proposing an adaptive weighting strategy to generate distinct, trainable weights for multiple loss functions, ensuring balanced and effective multitask learning.
The proposed method is trained using two publicly available ultrasound medical image datasets, the Breast Ultrasound Images dataset (BUSI) [12] and the digital database of Thyroid Ultrasound Images (DDTI) [13,14]. Evaluations indicate that SemiSeg-CAW outperforms supervised models when trained on the same or fewer labeled samples. This confirms the effectiveness of employing class-level information and adaptive loss weighting in a semi-supervised framework. The following sections present the related works performed on medical image segmentation in Section 2, the materials and methods in Section 3, and the evaluation results and discussions in Section 4. Finally, Section 5 covers the conclusion of this paper.

2. Related Works

A semi-supervised approach improves segmentation models by using labeled and unlabeled data through different strategies, such as training with a combination of real and pseudo-labels or integrating pixel-level and class-level labels. In [15], a GAN model is proposed to generate segmentation masks, with an adversarial network trained to discriminate between real and synthesized masks. In [16], a GAN inversion-based model uses a pre-trained inversion generator to synthesize more varied samples from the unlabeled data for semi-supervised training. Although these approaches address the shortage of labeled samples by generating pseudo-segmentation masks, low-quality pseudo-labels limit their reliability for medical imaging applications where precision is critical. A second line of work improves a segmentation model by exploiting class-level and pixel-level information within a unified multitask framework or through interacting tasks trained separately. In [9], a multitask segmentation and classification model leverages mutual knowledge transfer between tasks that are trained separately. Despite improving segmentation, the separate training of tasks results in poor collaboration.
Class-level supervision provides another way to compensate for the lack of pixel-level annotations. The class activation map (CAM) [10] is a form of class-level information generated by classification models. CAMs emphasize the discriminative regions crucial for predicting class labels [10]. Some models enhance segmentation performance by combining CAMs with segmentation feature maps, often through concatenation [9], or by using them to create pseudo-segmentation masks. Segmentation and classification models have distinct objectives: classification categorizes the entire image into classes, while segmentation assigns a label to every pixel. Using CAMs as pseudo-labels is not optimal, as the pseudo masks generated by CAMs usually differ in size from real segmentation masks [11]. Accordingly, some models propose refinement strategies to improve the quality of the pseudo masks generated by CAMs. For instance, in [17], CAMs are refined using causal inference techniques and anatomical priors to generate pseudo-segmentation masks with clear shapes and boundaries. In a related direction, in [18], CAMs are refined through a cycle-consistent generative adversarial network to produce pseudo masks for weakly supervised segmentation.
Some multitask models employ a shared encoder for feature extraction, followed by task-specific branches that exploit the correlation between tasks, as in [19]. For example, MTANet [20] proposes a unified encoder with task-specific attention modules for segmentation and classification branches trained jointly using a simple sum of task losses. However, these models lack the flexibility to prioritize one task over others beyond manual loss weighting and do not address the limited amount of labeled data. Multitask learning with a weighted combination of multiple losses is an alternative way to enhance segmentation performance. Combining task-specific loss functions in a multitask model is critical because their contributions directly affect overall task performance. Employing equal or manually tuned loss weights is simple, but it can bias training toward certain losses and miss the optimal solution by disregarding the different characteristics and ranges of the loss functions. Manual weight adjustment requires trial and error and often yields unreliable outcomes. Although a random weighting strategy may improve the chances of finding better solutions, it does not guarantee an optimal one. Some methods therefore adjust the weights automatically during training. In [7], the weights of multiple loss functions are generated during training by an update equation that gives more importance to losses with higher values. However, neglecting the variation of the loss values in the weight equation affects the balance between tasks. Ref. [21] provides balanced learning across multiple tasks by dynamically adjusting the weights using the exponential scaling presented in Equation (1).
$g(s) = e^{s} \cdot L(\theta) - s$  (1)
where $s$ is the weighting parameter, $L(\theta)$ represents the task losses, and $g(s)$ is the scaled loss function. This strategy aims to balance tasks without considering individual loss characteristics. It uses a single weight for the combined losses, which is trained as a loss parameter through backpropagation. When task priorities and the loss values vary, using unique weights for each loss can enhance flexibility in their contribution.
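For illustration only (this is not the authors' implementation), the scaling of Equation (1) can be sketched in PyTorch as follows, assuming a single trainable parameter $s$ applied to the summed task losses, as described above; the names and loss values are placeholders.

```python
import torch

def imtl_l_scaled_loss(task_losses: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    # Equation (1): g(s) = e^s * L(theta) - s, applied here with a single
    # trainable weight s shared by the summed task losses.
    return torch.exp(s) * task_losses.sum() - s

# Illustrative usage:
s = torch.zeros(1, requires_grad=True)        # learned through backpropagation
seg_loss = torch.tensor(0.8)
cls_loss = torch.tensor(0.3)
total = imtl_l_scaled_loss(torch.stack([seg_loss, cls_loss]), s)
total.backward()                              # updates s alongside the model parameters
```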
Overall, semi-supervised segmentation methods generally follow three main approaches: using pseudo-labels, leveraging class-level information, or employing multitask frameworks. Pseudo-labels offer limited reliability since they often suffer from instability and poor mask quality. CAM-based information provides class-level guidance, but using CAMs as pseudo segmentation masks is not optimal since they mostly misalign with true lesion boundaries. Multitask frameworks can strengthen the correlation between classification and segmentation, yet their performance is highly sensitive to how the different losses are combined in the final loss computation. Addressing these limitations is important in medical image segmentation, where obtaining pixel-level annotated data is costly and challenging. Accordingly, it is essential to propose approaches that integrate more reliable class-level supervision with adaptive, task-specific loss weighting to enable robust segmentation in situations with limited pixel-level annotations.

3. Materials and Methods

The proposed SemiSeg-CAW framework is a semi-supervised segmentation and classification model designed to improve segmentation performance when pixel-level annotations are limited. SemiSeg-CAW combines class-level information with adaptive multitask optimization. SemiSeg-CAW operates in three stages as illustrated in Figure 1:
  • Stage 1 (Auxiliary Feature Extraction): A pre-trained module, called ClassElevateSeg, processes input images to generate auxiliary segmentation maps. The auxiliary maps provide additional class-level guidance that compensates for the lack of reliable pixel-level annotations.
  • Stage 2 (Core Model Training and Inference): The main network integrates segmentation and classification modules, which share a significant portion of their structure. Segmentation is the main objective, and classification provides complementary global information. Both the segmentation and classification tasks are guided by the auxiliary maps from Stage 1.
  • Stage 3 (Adaptive Loss Computation): The weight generation module (WGM) dynamically assigns trainable weights to segmentation and classification losses during training to balance the contributions of the tasks without manual tuning.
The complete architecture of the SemiSeg-CAW model, shown in Figure 2, contains four main modules: segmentation, classification, ClassElevateSeg, and WGM. The proposed modular structure enables semi-supervised training using both labeled and unlabeled data and provides the flexibility to integrate the modules into existing segmentation models.
The SemiSeg-CAW architecture integrates class-level information with adaptive multitask learning to improve segmentation under limited availability of pixel-level annotated data. As shown in Figure 2, the input image is first processed by the ClassElevateSeg module to produce auxiliary segmentation maps. The auxiliary maps provide additional supervisory information to both the segmentation and classification modules, strengthening their interaction.
The segmentation module is the primary component, responsible for generating the final segmentation masks. A significant portion of the segmentation module is shared with the classification module, enabling both global and local features to be learned jointly. WGM transfers information from the segmentation to the classification module, improving the connection between these two tasks and generating trainable weights for the losses. Incorporating trainable weights during training helps to balance the optimization process, ensuring that no single task dominates the others.
This design enables training segmentation with or without ground truth, while benefiting from class-level supervision. By integrating auxiliary maps and adaptive weighting, SemiSeg-CAW provides a principled approach for semi-supervised segmentation. The ClassElevateSeg and weight generation modules are designed for easy integration into existing segmentation models to improve their performance. The segmentation module can be replaced with alternative models within the proposed structure for further optimization.

3.1. Auxiliary Feature Extraction (ClassElevateSeg Module)

The ClassElevateSeg module is designed to provide reliable auxiliary segmentation maps for SemiSeg-CAW that incorporate both class-level semantics and spatial details. Unlike simple class activation map supervision, which often misaligns with lesion boundaries, ClassElevateSeg explicitly integrates classification and segmentation objectives in a multitask setting to generate higher-quality intermediate maps that guide SemiSeg-CAW training. ClassElevateSeg is initially trained separately from the other modules. Pre-training ensures that these auxiliary maps are optimized and stable before integration into the segmentation and classification modules of the SemiSeg-CAW model. ClassElevateSeg is further fine-tuned with the main model to enhance the quality of the auxiliary maps.
ClassElevateSeg is an encoder–decoder structure with a pre-trained ResNet18 [23] classifier as the encoder to extract hierarchical feature maps and compute classification scores. The decoder refines multi-scale features from three levels of the encoder to generate auxiliary segmentation maps that include class information and spatial details. The model improves the accuracy of the auxiliary segmentation maps through iterative refinement using feedback from segmentation and classification data. The maps serve as intermediate representations for improving the final segmentation quality.
In the decoder, features are fused in three steps: a lower-level feature map is transformed and concatenated with a higher-level feature map from the encoder, and the combined features are then refined by a convolutional layer. An attention mechanism emphasizes important feature maps through a 1 × 1 convolution followed by a sigmoid activation, producing a channel-wise attention map. Lastly, the output channels are adjusted by a convolutional layer with a sigmoid activation to normalize the output and produce a probability map for segmentation. The proposed structure enables effective multi-scale feature fusion and improves segmentation performance by leveraging high-level information and fine-grained spatial details. ClassElevateSeg is trained with a weighted sum of classification and segmentation losses, shown in Equation (2). The classification loss is a cross-entropy loss, and the segmentation loss is a weighted sum of three functions.
$\mathrm{Loss}_{\mathrm{ClassElevateSeg}} = w \cdot L_{\mathrm{class}} + (1 - w) \cdot L_{\mathrm{seg}}$  (2)
where $w$ is the weight for the classification loss, $L_{\mathrm{class}}$ is the cross-entropy loss, and $L_{\mathrm{seg}}$ is the segmentation loss shown in Equation (3). $w$ is empirically set to 0.6 based on validation performance, balancing class discrimination and segmentation quality.
$L_{\mathrm{seg}} = 0.4\, L_{\mathrm{Dice}} + 0.4\, L_{\mathrm{Ftv}} + 0.2\, L_{\mathrm{BCEwithLogits}}$  (3)
where $L_{\mathrm{Dice}}$ is the Dice loss, $L_{\mathrm{Ftv}}$ is the Focal Tversky (Ftv) loss, and $L_{\mathrm{BCEwithLogits}}$ is the binary cross-entropy with logits loss. The loss weights (0.4, 0.4, 0.2) are selected to exploit the complementary aspects of the different losses, giving higher priority to overlap and class-imbalance handling while retaining pixel-level accuracy. Unlike SemiSeg-CAW, ClassElevateSeg uses static loss weights instead of adaptive weighting to simplify optimization, reduce the risk of overfitting, and allow explicit control over the prioritization of classification versus segmentation during auxiliary map generation.
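A minimal PyTorch sketch of this static weighting is shown below; the Dice and Focal Tversky formulations and their hyperparameters are common defaults assumed for illustration, not values taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dice_loss(prob, target, eps=1e-6):
    # Soft Dice loss over predicted probabilities and a binary target mask.
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def focal_tversky_loss(prob, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    # Focal Tversky loss; alpha, beta, and gamma are common defaults,
    # not hyperparameters reported in the paper.
    tp = (prob * target).sum()
    fn = ((1 - prob) * target).sum()
    fp = (prob * (1 - target)).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma

def class_elevate_seg_loss(seg_logits, seg_gt, cls_logits, cls_gt, w=0.6):
    # Equations (2)-(3): static weighted combination of classification and
    # segmentation losses used to train ClassElevateSeg.
    prob = torch.sigmoid(seg_logits)
    l_seg = (0.4 * dice_loss(prob, seg_gt)
             + 0.4 * focal_tversky_loss(prob, seg_gt)
             + 0.2 * F.binary_cross_entropy_with_logits(seg_logits, seg_gt.float()))
    l_cls = F.cross_entropy(cls_logits, cls_gt)
    return w * l_cls + (1 - w) * l_seg
```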

3.2. Core Model Training and Inference

The core model includes segmentation, classification, and weight generation modules.

3.2.1. Segmentation

The segmentation module in SemiSeg-CAW is a modified version of the MALUNet model [22]. MALUNet is a lightweight encoder–decoder structure that has multiple attention modules within a six-stage U-shape structure specifically designed for medical image segmentation [22]. MALUNet was selected as the base model for the segmentation module because it achieves competitive performance on small medical datasets and maintains low computational complexity. Thus, MALUNet is well-suited for semi-supervised learning where labeled data and resources are limited. The modified MALUNet has three main modifications. First, the features from the bottleneck of the encoder–decoder structure are concatenated with the auxiliary segmentation maps generated by the ClassElevateSeg module, allowing the segmentation branch to benefit from additional class-level guidance. Second, several decoder layers are connected to both the WGM and classification modules, enabling stronger interaction between the segmentation and classification tasks. Last, the training of the segmentation module is based on a combination of segmentation and classification losses in a semi-supervised manner, allowing optimization even in the absence of pixel-level annotations.
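As a rough sketch of the first modification (the exact fusion code is not given in the paper), concatenating the auxiliary maps with the bottleneck features could look like the following; the bilinear resize for mismatched spatial sizes is an assumption.

```python
import torch
import torch.nn.functional as F

def fuse_auxiliary_maps(bottleneck_feat, aux_maps):
    # Concatenate ClassElevateSeg auxiliary maps with the bottleneck features
    # of the segmentation backbone along the channel dimension.
    if aux_maps.shape[-2:] != bottleneck_feat.shape[-2:]:
        aux_maps = F.interpolate(aux_maps, size=bottleneck_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
    return torch.cat([bottleneck_feat, aux_maps], dim=1)
```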
To evaluate the generality of the proposed SemiSeg-CAW framework, UNeXt [24] is implemented as the segmentation module. UNeXt is an encoder–decoder segmentation model incorporating three convolutional blocks and two tokenized Convolutional Multilayer Perceptron blocks in both the encoder and decoder [24]. Similar adjustments were applied to the UNeXt model, indicating that SemiSeg-CAW can integrate alternative backbones while maintaining its effectiveness.

3.2.2. Classification

The classification module provides class-level supervision that complements the segmentation task in SemiSeg-CAW. Its primary role is to enhance segmentation learning by providing global semantic information, which is especially important when the segmentation ground truth is limited. To achieve this, the classification module shares a significant portion of its structure with the segmentation module, allowing the two tasks to be jointly optimized and benefiting from their correlation. This design is inspired by the classification branch in U-NetSC [7], but includes several modifications to strengthen task connectivity and enrich classification features.
Multiple connections are defined between the segmentation and classification modules to maximize the influence of classification loss on segmentation training. Specifically, the model has two paths from segmentation to classification, utilizing multiple high-level feature maps. First, the decoder feature maps from stage 2, scaled by WGM, are connected to the classification module. Second, the decoder feature maps from stage 1 are directly fed to the classification. These connections enable the segmentation parameters to be trained not only by the error propagation of the segmentation loss but also through the backpropagation of the classification loss.
The classification module consists of three main blocks: target area extraction, feature extraction, and class label detection. The target area extraction removes irrelevant regions from the input image using a segmentation mask derived from the decoder’s last layer before the final output. Non-target regions in the input are set to zero based on a thresholded combination of the input and a scaled segmentation mask. The threshold value of 0.2 is empirically chosen on a validation set to ensure consistent performance.
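A minimal sketch of this masking step is shown below; the elementwise product of the input and the scaled mask is an assumption about how the "thresholded combination" is formed.

```python
import torch

def extract_target_area(image, seg_mask, threshold=0.2):
    # Zero out non-target regions: keep pixels where the combined response of
    # the input and the scaled segmentation mask exceeds the threshold
    # (0.2 in the paper).
    keep = (image * seg_mask) > threshold
    return image * keep.float()
```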
The feature extraction block contains sequences of convolutional layers and a residual block across two extraction passes. One pass processes the output from the target extraction block, while the other carries the decoder feature maps from stage 2 of the segmentation module. These two passes are concatenated, and the result is further combined with the auxiliary segmentation maps from ClassElevateSeg to enrich the feature representation.
Finally, the class label detection block includes flatten, dense, and batch normalization layers followed by a softmax activation function to produce class probabilities. By incorporating segmentation features and auxiliary feature maps with the classification feature, and using adaptive weighting, the classification module predicts image-level labels and also strengthens the overall segmentation performance of SemiSeg-CAW.

3.2.3. Weight Generation Module (WGM)

WGM is designed to generate trainable loss weights and adaptively balance the contributions of segmentation and classification losses of the SemiSeg-CAW model. Unlike fixed or manually adjusted loss weights, which can cause bias toward one task during training, WGM generates trainable weights that allow the framework to adjust dynamically during optimization. In addition, WGM forwards informative segmentation features to the classification module, strengthening the interaction between the two tasks.
WGM is implemented as a sequence of convolutional layers, batch normalization, and nonlinear activations that process feature maps from the segmentation decoder’s lower layers to capture rich local patterns while remaining computationally efficient. The feature map dimensionality is reduced to match the number of loss functions used in computing the final loss function. The WGM output provides the raw weights for the losses, which are updated during the backpropagation process. The weights of the losses are computed by taking the maximum value of the WGM output in each epoch to avoid zero values. The mean of these loss weights acts as a scaling factor, which is then multiplied by the chosen segmentation layer and passed to the classification as high-level segmentation feature maps.
Through this design, WGM achieves two objectives simultaneously: (i) it adaptively learns distinct weights for multiple loss functions, ensuring balanced multitask optimization, and (ii) it enhances classification by injecting scaled segmentation features, without changing the primary segmentation pathway. This integration allows SemiSeg-CAW to jointly optimize both tasks more effectively under limited supervision.
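The following sketch illustrates one possible realization of WGM under the description above; the layer widths and the use of a batch-and-spatial maximum are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class WeightGenerationModule(nn.Module):
    # Sketch of WGM: a small convolution/BN/ReLU stack that reduces a decoder
    # feature map to as many channels as there are loss terms.
    def __init__(self, in_channels, num_losses=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, num_losses, kernel_size=1),
        )

    def forward(self, decoder_feat):
        raw = self.net(decoder_feat)               # (B, num_losses, H, W)
        # One raw weight per loss; taking the maximum avoids all-zero weights.
        weights = raw.amax(dim=(0, 2, 3))          # (num_losses,)
        # The mean of the weights scales the decoder features passed to classification.
        scaled_feat = decoder_feat * weights.mean()
        return weights, scaled_feat
```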

3.3. Adaptive Loss Computation

The final loss function of SemiSeg-CAW is a weighted average of segmentation loss and classification loss. Adaptive weighting is essential because the characteristics, scales, and dynamic ranges of the losses for each task are different. Segmentation involves pixel-wise predictions, while classification includes categorical predictions over multiple classes. Using fixed or manually tuned weights can bias training toward one task, which limits performance in multitask settings. To address this, SemiSeg-CAW integrates the weight generation module (WGM) to produce trainable loss weights that are dynamically updated during training. The classification loss is a cross-entropy loss function. For a fair comparison, the segmentation loss follows MALUNet [22], combining binary cross-entropy and dice loss, as shown in Equation (4).
$\mathrm{Loss}_{S} = L_{\mathrm{BCE}} + L_{\mathrm{Dice}}$  (4)
where $L_{\mathrm{Dice}}$ is the Dice loss and $L_{\mathrm{BCE}}$ is the binary cross-entropy loss. The final loss function is computed based on Equation (5), inspired by the loss balancing technique proposed in [21].
$\mathrm{Final\ loss} = \dfrac{\left(e^{w_1} \cdot \mathrm{Loss}_{S} - w_1\right) + \left(e^{w_2} \cdot \mathrm{Loss}_{C} - w_2\right)}{2}$  (5)
where $w_1$ and $w_2$ are task-specific weights generated by WGM for the segmentation and classification losses, respectively. Each loss is scaled and regularized by a unique weight generated by WGM. Two main differences exist between the proposed weighting strategy and the IMTL-L method [21]. IMTL-L uses a single scaling parameter for the total loss summation and learns this weight as a parameter of the loss function. In contrast, this paper assigns a unique scaling parameter to each loss, and the weights are generated by a model layer rather than being parameters of the loss function. The proposed design accounts for the varied scales and dynamic ranges of the different losses and allows independent scaling and regularization.
The proposed adaptive strategy is particularly beneficial for SemiSeg-CAW since the task priorities differ and segmentation is the primary objective. Fine-tuning the loss weights during training helps balance the impact of the different losses, ultimately improving segmentation performance by dynamically adjusting the importance of each task. The final loss is computed under two conditions. When both the segmentation and classification ground truths are present, it is calculated using Equation (5). When a sample lacks the segmentation ground truth, a small constant value is assigned to the segmentation loss, enabling semi-supervised segmentation training. The constant value is determined empirically on the validation set for each dataset; values are tested in decreasing order (e.g., 0.1, 0.01, 0.001) until stable results are obtained, ensuring that unlabeled samples contribute to training without overwhelming the optimization.
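A compact sketch of this adaptive loss, including the semi-supervised fallback, might look as follows; `w1` and `w2` are assumed to be scalar tensors produced by WGM, and the constant is the empirically chosen value described above.

```python
import torch

def final_loss(seg_loss, cls_loss, w1, w2, seg_gt_available=True, const_seg_loss=0.01):
    # Equation (5) with the semi-supervised fallback: when no pixel-level
    # ground truth exists, a small constant replaces the segmentation loss
    # (0.01 for BUSI, 0.001 for DDTI in the paper).
    if not seg_gt_available:
        seg_loss = torch.tensor(const_seg_loss, device=cls_loss.device)
    scaled_seg = torch.exp(w1) * seg_loss - w1
    scaled_cls = torch.exp(w2) * cls_loss - w2
    return (scaled_seg + scaled_cls) / 2
```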

4. Experimental Results

This section provides information about datasets, implementation details, and results.

4.1. Dataset

The evaluations are conducted using the Breast Ultrasound Images dataset (BUSI) [12] and the digital database of Thyroid Ultrasound Images (DDTI) [13,14]. The images are randomly divided into training, validation, and test sets, ensuring a consistent class distribution across the sets. The BUSI dataset has 568 training, 122 validation, and 90 test images with three class labels. The DDTI dataset includes 328 training, 64 validation, and 43 test images with two class labels. Before training, all images are resized to 256 × 256 pixels, and the pixel intensities (originally in the range 0–255) are normalized to zero mean and unit standard deviation. MALUNet is trained in two scenarios: one with the full dataset and the other with a reduced number of samples, where samples are randomly removed from each class to maintain the class distribution. For SemiSeg-CAW training, the removed images are treated as lacking segmentation ground truth; the total number of input training samples remains the same as in the full dataset, but the number of samples with segmentation ground truth decreases. The SemiSeg-CAW model is therefore trained using samples that have both segmentation and classification ground truth, as well as samples that contain only classification labels.
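A minimal preprocessing sketch consistent with this description is given below; the specific library calls (OpenCV) and per-image standardization are assumptions, as the paper does not state them explicitly.

```python
import cv2
import numpy as np

def preprocess(image_path, size=256):
    # Resize to 256 x 256 and standardize the 0-255 intensities to zero mean
    # and unit standard deviation.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    img = cv2.resize(img, (size, size))
    img = (img - img.mean()) / (img.std() + 1e-8)
    return img[None, ...]   # add a channel dimension for the network input
```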

4.2. Implementation Details

The models are trained and evaluated on an NVIDIA GeForce GTX 1070 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 8 GB RAM, using PyTorch 1.8.1+cu101. All settings match those of MALUNet [22]. Both SemiSeg-CAW and MALUNet use the AdamW optimizer, a learning rate of 0.001, Cosine Annealing Learning Rate (CosineAnnealingLR) scheduling for dynamic learning rate adjustment, and a batch size of 8. The same data augmentation techniques are applied to both models, similar to those provided in [22]. When a training sample lacks the segmentation ground truth, a constant value for the segmentation loss must be defined. This value depends on the dataset characteristics and therefore requires a unique definition per dataset. The constant loss value was empirically determined on a validation set for each dataset to ensure stable optimization: 0.01 for BUSI and 0.001 for DDTI. In the SemiSeg-CAW model, the optimal weights are selected based on the lowest segmentation loss on the validation set to prioritize segmentation performance. Segmentation performance is evaluated using the Intersection Over Union (IOU), Dice coefficient, recall, precision, specificity, and accuracy metrics.
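For reference, the reported segmentation metrics can be computed from a binary confusion matrix as in the following sketch (standard definitions, not code from the paper).

```python
import numpy as np

def segmentation_metrics(pred, gt):
    # Standard binary-mask metrics; pred and gt are arrays of 0/1 values of equal shape.
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    eps = 1e-8
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "specificity": tn / (tn + fp + eps),
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
    }
```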

4.3. Segmentation Results

The SemiSeg-CAW is compared with the MALUNet [22] and UNeXt [24] models in Table 1. Similar to MALUNet, the UNeXt [24] model is integrated into the segmentation module. MALUNet is implemented with the code from [25], and UNeXt is implemented with the code from [26]. For SemiSeg-CAW, the total number of input images matches the full-size training set, but the number of samples with both segmentation and classification ground truth corresponds to the reduced training set. Additional samples beyond the reduced set contain only classification labels.
Evaluations on the BUSI dataset indicate that SemiSeg-CAW surpasses the supervised MALUNet trained on the full-size training set by about 1% across all metrics except recall. SemiSeg-CAW performs better than MALUNet even without access to the segmentation ground truth of 100 images. With the same number of labeled samples, SemiSeg-CAW also outperforms MALUNet, achieving increases of 2%, 3%, 11%, 3%, and 1% in Dice, IOU, precision, specificity, and accuracy, respectively. The proposed method thus achieves better segmentation performance, with higher similarity between predicted and actual masks. However, SemiSeg-CAW has a lower recall, indicating that MALUNet generates larger masks at the cost of similarity between the predicted and actual masks. Similar results are observed on the DDTI dataset. SemiSeg-CAW outperforms MALUNet trained on both the full-size and reduced-size training sets, with the exception of recall relative to the full-size model, and it surpasses the supervised MALUNet even without access to the segmentation ground truth of 144 samples. With the same number of labeled samples, SemiSeg-CAW outperforms MALUNet with gains of 5%, 5%, 2%, 5%, 2%, and 1% in Dice, IOU, recall, precision, specificity, and accuracy, respectively.
Figure 3 shows how reducing the number of labeled training samples affects the performance of the MALUNet and SemiSeg-CAW models. Both models are trained on a full-size set of 328 images and on reduced sets of 280, 232, 184, and 136 images. The graphs in Figure 3 highlight that the SemiSeg-CAW model outperforms a supervised model with the same number of labeled samples by exploiting additional unlabeled data and class-level information. SemiSeg-CAW, trained with 136 labeled and 192 unlabeled samples, significantly outperforms MALUNet trained on 136 labeled samples.

Table 2 presents an ablation study of the ClassElevateSeg and WGM modules on the DDTI dataset (136 labeled and 192 unlabeled samples), along with the corresponding training time per epoch. First, the ClassElevateSeg module is removed to evaluate the impact of the auxiliary segmentation maps. Removing ClassElevateSeg decreases segmentation performance, with larger portions of the region of interest missed, while reducing training time. Second, replacing WGM with equal static weights for the losses significantly reduces segmentation performance and slightly increases training time. Third, using a single trainable weight for the summation of the losses, as suggested in [21], decreases performance while maintaining a similar training time, highlighting the importance of assigning distinct weights for improved flexibility and balance. Fourth, removing the ClassElevateSeg and WGM modules and employing the weighting strategy from [7] reduces segmentation performance but yields a lower training time than most variants; compared with equal loss weights, the adaptive strategy of [7] helps maintain performance slightly better. Finally, ClassElevateSeg is replaced with LayerCAM [27], derived from a ResNet18 model pre-trained on the DDTI dataset, to provide a class activation map (CAM) as a form of class-relevant information. CAMs represent a direct form of class information used to improve segmentation in various methods; however, they lack sufficient discriminative detail and fail to produce feature maps that resemble segmentation masks. The results demonstrate that SemiSeg-CAW achieves the best trade-off between segmentation accuracy and computational cost.
The training times of different SemiSeg-CAW variants are comparable, with fluctuations of 1 to 2 s per epoch. The results indicate that the ClassElevateSeg and WGM modules provide clear performance improvements without introducing significant time overhead. A detailed comparison of computational complexity between SemiSeg-CAW and the baseline models (MALUNet and UNeXt) is presented in Table 3.
The UNeXt [24] model is used in the segmentation module to illustrate how SemiSeg-CAW improves segmentation performance across different models, as shown in Table 4.
The results in Table 4 indicate that the proposed method effectively improves segmentation performance across various models by integrating them into the proposed structure.
To further evaluate computational complexity, Table 3 reports the training time per epoch and inference efficiency of SemiSeg-CAW compared to the baseline MALUNet and UNeXt models. Both the BUSI and DDTI datasets were used to evaluate runtime performance.
The results show that SemiSeg-CAW increases the number of parameters compared to the base models, which results in slightly longer training and inference times. However, the additional overhead is moderate, and the performance improvements of SemiSeg-CAW are achieved with a minor increase in computational costs. With only 14–16 million trainable parameters, SemiSeg-CAW balances efficiency and accuracy, which makes it suitable for the medical ultrasound segmentation task where pixel-level annotated data is limited. In comparison, state-of-the-art segmentation models such as GLIMS [28] (approximately 47 million trainable parameters), and foundation-style approaches like MedSAM [29] (approximately 94 million trainable parameters in adapted components) demand far greater computational resources. These large-scale models achieve strong performance, but they rely on different training protocols (e.g., prompt-based supervision and extensive large-scale annotations). As a result, direct reproduction on the BUSI and DDTI datasets is outside the scope of this work. These comparisons highlight that SemiSeg-CAW offers a lightweight yet effective alternative, with practical feasibility for deployment in resource-constrained medical imaging settings.
Figure 4 and Figure 5 present qualitative comparisons among the SemiSeg-CAW, MALUNet, and UNeXt models using contour overlays on enhanced ultrasound images for better visualization. The contrast and brightness were adjusted in Figure 4 and Figure 5 only for visualization purposes; all models were trained and evaluated on the original images. For reference, the corresponding binary masks and the original input images without enhancements are provided in Appendix A, Figure A1 and Figure A2.
Figure 4 shows that SemiSeg-CAW generates segmentation contours, derived from the predicted masks that align more closely with the ground truth than those generated by MALUNet and UNeXt on the BUSI dataset. Figure 5 presents the results on the DDTI dataset, where the proposed SemiSeg-CAW demonstrates better alignment with the ground truth compared to the baseline models. The results indicate that integrating a single-task segmentation model into SemiSeg-CAW improves the quality of the masks and makes them more similar to the ground truth. Figure 6 shows the progress of loss weights and segmentation loss during training in both datasets.
In Figure 6, the raw weights are exponentiated before loss computation and adjusted dynamically during training. Exponentiating the raw weights prevents trivial zero-weight solutions and ensures meaningful learning. This mechanism systematically explores combinations of $w_1$ and $w_2$ to find an effective balance without manual intervention. In the BUSI dataset, $w_1$ initially decreases before stabilizing, followed by oscillations in the later stages of training, while $w_2$ remains near zero. In the DDTI dataset, $w_2$ remains nonzero, while $w_1$ fluctuates after an initial decrease. These trends confirm that the model dynamically adapts its focus, supporting its reliability over manually set weights. In the BUSI dataset, the best weights are $w_1 = 1.292$ and $w_2 = 0.08$, which become 3.64 and 1.08, respectively, after applying the exponential function. These loss weights indicate that the segmentation task is dominant, consistent with the difficulty of delineating lesion boundaries. For the DDTI dataset, the best weights are $w_1 = 0$ and $w_2 = 0.45$, which become 1 and 1.57, respectively, suggesting that classification requires stronger optimization on this dataset. The distinct behaviors of $w_1$ and $w_2$ on the BUSI and DDTI datasets highlight the model's ability to adapt its weighting strategy to the characteristics of each dataset. As shown in Table 2, this approach surpasses three alternative weighting strategies. The proposed weighting approach shows consistent performance across diverse datasets, as reflected in the steady decline of the segmentation loss curves.

5. Conclusions

This paper introduced SemiSeg-CAW, a semi-supervised segmentation and classification framework designed to address the scarcity of pixel-level annotations in medical image segmentation. The framework leverages class-level information in two ways: it enables semi-supervised learning of segmentation when pixel-level annotations are unavailable, and it enriches the model's feature maps through auxiliary features. The auxiliary features are generated by ClassElevateSeg, which is initially pre-trained under joint classification–segmentation supervision and then fine-tuned jointly with the main model to provide auxiliary segmentation maps. The paper also proposes an adaptive weighting strategy that dynamically generates trainable weights for the losses based on task difficulty and dataset characteristics. These components form a unified multitask structure that exploits the correlation between classification and segmentation to improve training under limited annotation conditions.
The experimental evaluations on BUSI and DDTI ultrasound datasets show that SemiSeg-CAW outperforms fully supervised MALUNet [22] and UNeXt [24] models when trained with the same or fewer labeled samples. The results validate the effectiveness of combining auxiliary supervision with adaptive weighting in a unified multitask framework in enhancing segmentation performance regardless of the segmentation model. Beyond outperforming specific segmentation baselines, the proposed framework is modular and generalizable. ClassElevateSeg can be integrated into other architectures to provide auxiliary feature maps, while WGM offers a principled approach for balancing different loss functions in multitask learning.
The segmentation masks generated by SemiSeg-CAW can help produce more consistent and reliable lesion delineations in ultrasound images. Accurate delineations enhance consistent measurement of lesion size, enable monitoring of changes across follow-up scans, and provide clearer visual guidance for clinicians. Because SemiSeg-CAW reduces the reliance on pixel-level annotations while maintaining modest computational requirements, it has the potential to make segmentation tools more practical and accessible in real workflows, including integration as overlay guidance during examinations to assist radiologists.
In summary, this work shows that integrating class-level supervision with adaptive multitask weighting can reduce reliance on pixel-level annotations and lays the foundation for more generalizable semi-supervised segmentation models. Future work will focus on extending SemiSeg-CAW to other modalities like MRI and CT, exploring its integration with weakly or self-supervised approaches, and examining scalability to larger datasets and more complex backbones. These directions aim to further enhance the generalizability and applicability of SemiSeg-CAW in both medical and broader computer vision tasks.

Author Contributions

S.B. conducted the research, performed the experiments, analyzed the results, and wrote the manuscript. N.K. supervised the project and reviewed each stage of the research and manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by NSERC funding (Discovery RGPIN-2020-05471).

Data Availability Statement

The datasets used in this study are publicly available. The BUSI dataset can be accessed at [https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset] (accessed on 16 October 2025), and the DDTI dataset can be accessed at [https://www.kaggle.com/datasets/eiraoi/thyroidultrasound/data] (accessed on 16 October 2025). Any preprocessing steps applied to these datasets are described in the Dataset section.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
SemiSeg-CAW: Semi-supervised Segmentation and Classification with Adaptive Weighting
CAM: Class Activation Map
WGM: Weight Generation Module
IoU: Intersection over Union
Dice: Dice Coefficient
BCE: Binary Cross-Entropy
BCEwithLogits: Binary Cross-Entropy with Logits
Ftv: Focal Tversky
CE: Cross-Entropy
AdamW: Adaptive Moment Estimation with Weight Decay Optimizer
LR: Learning Rate
CosineAnnealingLR: Cosine Annealing Learning Rate Scheduler
BN: Batch Normalization
ReLU: Rectified Linear Unit
BUSI: Breast Ultrasound Images Dataset
DDTI: Digital Database of Thyroid Ultrasound Images
UNet: U-shaped Convolutional Neural Network

Appendix A

To complement the contour overlays shown in Figure 4 and Figure 5, the corresponding binary segmentation masks along with the original (unenhanced) ultrasound images are presented in Figure A1 and Figure A2.
Figure A1. Segmentation results on the BUSI dataset, (a) using MALUNet as the baseline segmentation module, and (b) using UNeXt as the baseline segmentation module. In each panel, every row shows an original input image, the corresponding segmentation mask generated by the baseline model, the mask generated by the proposed SemiSeg-CAW model, and the ground truth mask.
Figure A2. Segmentation results on the DDTI dataset, (a) using MALUNet as the baseline segmentation module, and (b) using UNeXt as the baseline segmentation module. In each panel, every row shows an original input image, the corresponding segmentation mask generated by the baseline model, the mask generated by the proposed SemiSeg-CAW model, and the ground truth mask.

References

  1. Liu, X.; Song, L.; Liu, S.; Zhang, Y. A review of deep-learning-based medical image segmentation methods. Sustainability 2021, 13, 1224. [Google Scholar] [CrossRef]
  2. Lyu, Y.; Xu, Y.; Jiang, X.; Liu, J.; Zhao, X.; Zhu, X. AMS-PAN: Breast ultrasound image segmentation model combining attention mechanism and multi-scale features. Biomed. Signal Process. Control 2023, 81, 104425. [Google Scholar] [CrossRef]
  3. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  4. Liu, S.; Wang, Y.; Yang, X.; Lei, B.; Liu, L.; Li, S.X.; Ni, D.; Wang, T. Deep learning in medical ultrasound analysis: A review. Engineering 2019, 5, 261–275. [Google Scholar] [CrossRef]
  5. Su, J.; Luo, Z.; Lian, S.; Lin, D.; Li, S. Mutual learning with reliable pseudo label for semi-supervised medical image segmentation. Med. Image Anal. 2024, 94, 103111. [Google Scholar] [CrossRef] [PubMed]
  6. Li, X.; Yu, L.; Chen, H.; Fu, C.W.; Xing, L.; Heng, P.A. Transformation-consistent self-ensembling model for semisupervised medical image segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 523–534. [Google Scholar] [CrossRef] [PubMed]
  7. Barzegar, S.; Khan, N. Skin lesion segmentation using a semi-supervised U-NetSC model with an adaptive loss function. In Proceedings of the 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, Scotland, 11–15 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 3776–3780. [Google Scholar]
  8. Yuan, Y. Automatic skin lesion segmentation with fully convolutional-deconvolutional networks. arXiv 2017, arXiv:1703.05165. [Google Scholar] [CrossRef]
  9. Xie, Y.; Zhang, J.; Xia, Y.; Shen, C. A mutual bootstrapping model for automated skin lesion segmentation and classification. IEEE Trans. Med. Imaging 2020, 39, 2482–2493. [Google Scholar] [CrossRef] [PubMed]
  10. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
  11. Feng, J.; Li, C.; Wang, J. CAM-TMIL: A Weakly-Supervised Segmentation Framework for Histopathology based on CAMs and MIL. In Proceedings of the 4th International Conference on Computing and Data Science, Macau, China, 16–25 July 2022; Volume 2547, p. 012014. [Google Scholar]
  12. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863. [Google Scholar] [CrossRef] [PubMed]
  13. Pedraza, L.; Vargas, C.; Narváez, F.; Durán, O.; Muñoz, E.; Romero, E. An open access thyroid ultrasound image database. In Proceedings of the 10th International Symposium on Medical Information Processing and Analysis, Cartagena, Colombia, 14–16 October 2014; SPIE: Bellingham, WA, USA, 2015; Volume 9287, pp. 188–193. [Google Scholar]
  14. Eiraoi. Thyroid Ultrasound Data. Kaggle, 2024. [Online]. Available online: https://www.kaggle.com/datasets/eiraoi/thyroidultrasound/data (accessed on 17 June 2024).
  15. Singh, V.K.; Rashwan, H.A.; Romani, S.; Akram, F.; Pandey, N.; Sarker, M.M.K.; Saleh, A.; Arenas, M.; Arquez, M.; Puig, D.; et al. Breast tumor segmentation and shape classification in mammograms using generative adversarial and convolutional neural network. Expert Syst. Appl. 2020, 139, 112855. [Google Scholar] [CrossRef]
  16. Feng, X.; Lin, J.; Feng, C.M.; Lu, G. GAN inversion-based semi-supervised learning for medical image segmentation. Biomed. Signal Process. Control 2024, 88, 105536. [Google Scholar] [CrossRef]
  17. Chen, Z.; Tian, Z.; Zhu, J.; Li, C.; Du, S. C-cam: Causal cam for weakly supervised semantic segmentation on medical image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11676–11685. [Google Scholar]
  18. Wang, J.; Liu, B.; Li, Y.; Li, J.; Pei, Y. CAM-CycleGAN: A Weakly Supervised Segmentation Method for Medical Images Based on Cycle Consistensy Generative Adversarial Networks. In Proceedings of the 2024 IEEE 6th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 24–26 May 2024; IEEE: Piscataway, NJ, USA, 2024; Volume 6, pp. 1293–1296. [Google Scholar]
  19. Yang, C.L.; Harjoseputro, Y.; Chen, Y.Y. A hybrid approach of simultaneous segmentation and classification for medical image analysis. Multimed. Tools Appl. 2024, 84, 21805–21827. [Google Scholar]
  20. Ling, Y.; Wang, Y.; Dai, W.; Yu, J.; Liang, P.; Kong, D. Mtanet: Multi-task attention network for automatic medical image segmentation and classification. IEEE Trans. Med. Imaging 2023, 43, 674–685. [Google Scholar] [CrossRef] [PubMed]
  21. Liu, L.; Li, Y.; Kuang, Z.; Xue, J.; Chen, Y.; Yang, W.; Liao, Q.; Zhang, W. Towards impartial multi-task learning. In Proceedings of the ICLR 2021 International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
  22. Ruan, J.; Xiang, S.; Xie, M.; Liu, T.; Fu, Y. MALUNet: A multi-attention and light-weight unet for skin lesion segmentation. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1150–1156. [Google Scholar]
  23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  24. Valanarasu, J.M.J.; Patel, V.M. Unext: Mlp-based rapid medical image segmentation network. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 23–33. [Google Scholar]
  25. Ruan, J. MALUNet: A Multi-Attention and Light-Weight UNet for Skin Lesion Segmentation. 2023. Available online: https://github.com/JCruan519/MALUNet (accessed on 1 November 2023).
  26. Valanarasu, J.M.J. UNeXt–Pytorch Implementation. 2022. Available online: https://github.com/jeya-maria-jose/UNeXt-pytorch (accessed on 16 November 2024).
  27. Jiang, P.T.; Zhang, C.B.; Hou, Q.; Cheng, M.M.; Wei, Y. Layercam: Exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 2021, 30, 5875–5888. [Google Scholar] [CrossRef] [PubMed]
  28. Yazıcı, Z.A.; Öksüz, İ.; Ekenel, H.K. GLIMS: Attention-guided lightweight multi-scale hybrid network for volumetric semantic segmentation. Image Vis. Comput. 2024, 146, 105055. [Google Scholar] [CrossRef]
  29. Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Three stages of the SemiSeg-CAW framework: (1) Auxiliary Feature Extraction using the ClassElevateSeg module, (2) Core Model Training and Inference with shared segmentation and classification representations guided by auxiliary maps, and (3) Adaptive Loss Computation, where the weight generation module (WGM) assigns dynamic weights to losses.
Figure 2. Detailed structure of the SemiSeg-CAW model. The architecture includes segmentation, classification, ClassElevateSeg, and WGM modules. The white block represents the MALUNet segmentation backbone adapted from [22].
Figure 3. Impact of reducing the number of training samples on segmentation performance (DDTI dataset). (a) Dice scores and (b) IoU scores are shown for SemiSeg-CAW (gray line), MALUNet trained on a reduced set (orange line), and MALUNet trained on the full set (blue dots).
Figure 4. Segmentation results on the BUSI dataset, (a) using MALUNet as the baseline segmentation module, and (b) using UNeXt as the baseline segmentation module. In each panel, every row shows an input image (enhanced for visualization), the segmentation boundary generated by the baseline model (red contour), the boundary generated by the proposed SemiSeg-CAW model (blue contour), and the ground truth boundary (green contour).
Figure 5. Segmentation results on the DDTI dataset. (a) Using MALUNet as the baseline segmentation module. (b) Using UNeXt as the baseline segmentation module. In each panel, every row shows an input image (enhanced for visualization), the segmentation boundary generated by the baseline model (red contour), the boundary generated by the proposed SemiSeg-CAW model (blue contour), and the ground truth boundary (green contour).
Figure 6. Segmentation loss and loss-weight progression during training for BUSI (left) and DDTI (right) datasets. The top panels show segmentation loss versus training epochs, and the bottom panels show the corresponding loss weights generated by WGM across epochs.
Table 1. Segmentation results on the BUSI and DDTI test sets. The best results are shown in bold.
Dataset | Method | Total | Class Only | Both Labels | Dice | IOU | Recall | Precision | Specificity | Accuracy
BUSI | MALUNet—Full data | 572 | - | 572 | 0.672 | 0.506 | 0.671 | 0.673 | 0.970 | 0.944
BUSI | MALUNet—Reduced | 472 | - | 472 | 0.662 | 0.494 | 0.737 | 0.601 | 0.954 | 0.936
BUSI | SemiSeg-CAW | 568 | 100 | 468 | 0.684 | 0.520 | 0.659 | 0.712 | 0.975 | 0.948
DDTI | MALUNet—Full data | 328 | - | 328 | 0.628 | 0.457 | 0.831 | 0.504 | 0.905 | 0.897
DDTI | MALUNet—Reduced | 280 | - | 280 | 0.614 | 0.443 | 0.770 | 0.511 | 0.914 | 0.899
DDTI | SemiSeg-CAW | 328 | 48 | 280 | 0.656 | 0.488 | 0.792 | 0.560 | 0.928 | 0.913
DDTI | SemiSeg-CAW | 328 | 96 | 232 | 0.640 | 0.470 | 0.795 | 0.535 | 0.920 | 0.907
DDTI | SemiSeg-CAW | 328 | 144 | 184 | 0.633 | 0.463 | 0.817 | 0.517 | 0.911 | 0.901
Table 2. Ablation study: the effectiveness of the ClassElevateSeg and WGM modules on the DDTI test set. The best results are shown in bold.
Changes to SemiSeg-CAW | Dice | IOU | Training Time/Epoch [s]
Removing ClassElevateSeg | 0.571 | 0.400 | 10.70 ± 0.95
Static weights instead of WGM | 0.572 | 0.401 | 12.60 ± 0.97
Strategy of [21] instead of WGM | 0.537 | 0.367 | 12.90 ± 0.83
Removing ClassElevateSeg and WGM [7] | 0.601 | 0.429 | 10.30 ± 0.95
LayerCAM [27] instead of ClassElevateSeg | 0.583 | 0.411 | 12.60 ± 0.97
SemiSeg-CAW (Full model) | 0.610 | 0.439 | 12.50 ± 1.27
Table 3. Training and inference efficiency of SemiSeg-CAW compared to its base models (MALUNet and UNeXt). Inference values represent mean end-to-end latency (Host→Device + forward) with warm-up and synchronization; batch size = 8, input = 256 × 256, AMP off. All models were evaluated on the same GPU/hardware.
Backbone | Model | Params [M] | Training/Epoch [s] (BUSI) | Training/Epoch [s] (DDTI) | Inference [ms/img] (BUSI) | Inference [ms/img] (DDTI)
MALUNet | MALUNet (base) | 0.18 | 47.52 ± 2.83 | 9.83 ± 1.19 | 2.96 | 2.84
MALUNet | SemiSeg-CAW | 14.50 | 53.70 ± 1.95 | 13.23 ± 1.25 | 5.26 | 4.98
UNeXt | UNeXt (base) | 1.47 | 15.29 ± 0.95 | 11.45 ± 0.77 | 1.57 | 2.28
UNeXt | SemiSeg-CAW | 15.85 | 55.88 ± 0.86 | 21.32 ± 0.86 | 3.75 | 4.41
Table 4. Segmentation results with UNeXt as the segmentation module. The best results are in bold.
Dataset | Method | Dice | IOU | Recall | Precision
BUSI | UNeXt | 0.710 | 0.595 | 0.688 | 0.722
BUSI | SemiSeg-CAW | 0.813 | 0.704 | 0.673 | 0.752
DDTI | UNeXt | 0.678 | 0.519 | 0.813 | 0.598
DDTI | SemiSeg-CAW | 0.710 | 0.552 | 0.737 | 0.699
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
