Review

Advancements in Semi-Supervised Deep Learning for Brain Tumor Segmentation in MRI: A Literature Review

1 School of Electrical and Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Penang, Malaysia
2 School of Electrical and Control Engineering, Ningxia Vocational Technical University of Industry and Commerce, Yinchuan 750030, China
3 Centre of Global Sustainability Studies, Universiti Sains Malaysia, Minden 11800, Penang, Malaysia
* Authors to whom correspondence should be addressed.
AI 2025, 6(7), 153; https://doi.org/10.3390/ai6070153
Submission received: 29 May 2025 / Revised: 21 June 2025 / Accepted: 7 July 2025 / Published: 11 July 2025
(This article belongs to the Section Medical & Healthcare AI)

Abstract

For automatic tumor segmentation in magnetic resonance imaging (MRI), deep learning provides powerful technical support and has achieved significant results. However, the success of supervised learning depends strongly on the quantity and accuracy of labeled training data, which are challenging to acquire in MRI. Semi-supervised learning approaches have arisen to tackle this difficulty, yielding comparable brain tumor segmentation outcomes with fewer labeled samples. This literature review explores key semi-supervised learning techniques for medical image segmentation, including pseudo-labeling, consistency regularization, generative adversarial networks, contrastive learning, and holistic methods, and specifically examines their application to brain tumor MRI segmentation. Our findings suggest that semi-supervised learning can outperform traditional supervised methods by providing more effective guidance, thereby enhancing the potential for clinical computer-aided diagnosis. This review serves as a comprehensive introduction to semi-supervised learning in tumor MRI segmentation, including glioma segmentation, offering valuable insights and a comparative analysis of current methods for researchers in the field.

1. Introduction

Glioma, the most prevalent primary intracranial tumor, comprises about 28% of all brain tumors and 80% of malignant brain tumors [1]. The most aggressive type is glioblastoma (GBM), with a mere 6.8% five-year survival rate [2]. Maximizing the safe extent of tumor resection (EOR) is an important clinical factor that can improve the survival rate of glioma patients [3], especially when combined with other treatments such as radiation and chemotherapy. Currently, neuroimaging is used to capture phenotypic information of the whole tumor volume [4], giving surgeons a more objective benchmark for glioma surgical excision [3]. One of the most effective imaging techniques for identifying brain tumors is magnetic resonance imaging (MRI) [5], which can accurately depict the anatomical structure of the brain and distinguish various soft tissues [6]. There are four common MRI modalities: the T1-weighted sequence (T1), the T2-weighted sequence (T2), the contrast-enhanced T1-weighted sequence using gadolinium contrast agents (T1Gd or T1ce), and the fluid-attenuated inversion recovery (FLAIR) sequence [7].
Due to high heterogeneity, the glioma and sub-regions show different appearances, shapes, and signal intensities in MRI [8]. Generally, there are four important regions used for diagnosis: the normal tissue, enhanced tumor (ET), tumor core (TC), and whole tumor (WT) [8]. Furthermore, segmenting brain tumors at the sub-region level is the core task, which involves identifying pathological regions, such as enhancing tumor tissue, edema, and necrosis, and distinguishing them from normal structures like white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) [9].
Nevertheless, the accurate delineation of gliomas poses substantial challenges in medical imaging. These tumors tend to infiltrate surrounding brain tissue, causing the edges to appear blurred and poorly defined in MRI [1]. Furthermore, the anatomical structure and shape of brain tumors are quite complicated and flexible, with significant differences in location and morphology among patients [10], creating difficulty for experts to segment them. In addition, since MRI provides 3D medical data, presented as a sequence of 2D slices, manually segmenting an enormous amount of MRI slices is both costly and not practical [11]. Hence, researchers have begun to explore automatic MRI segmentation of brain tumors using computer technology.
The application of deep learning (DL) to computer-aided diagnosis (CAD) has opened new possibilities in brain tumor segmentation, contributing to more accurate diagnoses, earlier identification, better classification, and higher patient survival rates [5]. Particularly in developing nations, where experts are in short supply, reliance on CAD for early diagnosis can lower mortality [5].
Glioma MRI segmentation using DL has relied extensively on the Brain Tumor Segmentation (BraTS) dataset [12], which is regarded as a standard benchmark. Since its inception in 2012, the annual BraTS Challenge has been organized by the Medical Image Computing and Computer-Assisted Intervention (MICCAI) society [12,13]. It has become a consistent platform for comparing cutting-edge brain tumor segmentation algorithms and has evolved over time through successive versions of the multimodal BraTS challenges. The releases of BraTS 2018, 2019, 2020, 2021, and 2023 have been particularly influential. BraTS 2021 currently represents the largest dataset in the series, with 1251 training samples and 219 validation cases featuring concealed ground-truth labels, while the most recent release, BraTS 2023, matches this sample size but introduces modifications in label processing. Quantitative analysis reveals a positive correlation between training-set size and segmentation accuracy, with the most pronounced improvements occurring in the ET and TC regions [14,15].
As of right now, U-Net [16] and Transformer [17] are the most widely employed methods for glioma segmentation. Both have exhibited strong capabilities in delineating various tumor sub-regions. However, to train neural networks for segmentation, these methods require huge labeled datasets [4]. Such datasets require experienced and highly trained experts to complete—a process that is often time-consuming and expensive. Thus, exploring strategies that enable accurate segmentation with less labeled data has become an important research area for scholars.
In contrast to fully supervised methods, semi-supervised learning (SSL) incorporates both labeled and unlabeled data to enhance model performance while reducing annotation dependency, and it has become a very active topic in medical image segmentation. State-of-the-art SSL methods emphasize three core strategies: leveraging limited ground-truth annotations for supervised learning, developing robust feature extraction frameworks for unlabeled data, and designing innovative mechanisms to synergistically combine both information sources during model optimization [18]. Some semi-supervised methods achieve segmentation performance on par with, or even surpassing, fully supervised baselines [19]. This means that, for brain tumor segmentation tasks, SSL training can generate more effective guidance than purely supervised learning while dramatically reducing annotation requirements and facilitating the clinical deployment of CAD solutions.
While certain supervised DL methods deliver accurate segmentation with comparatively simple model structures and faster convergence, these benefits remain fundamentally constrained by the need for large-scale annotated data [20]. Unlike these methods, SSL approaches exploit latent representations and feature correlations in unlabeled data to enhance learning efficiency. Notably, the effectiveness of SSL is grounded in three fundamental theoretical assumptions: the cluster assumption (similar instances belong to the same class), low-density separation (decision boundaries should lie in low-density regions), and the manifold assumption (data resides on a lower-dimensional manifold) [21,22,23]. These principles enable effective generalization to unseen examples from limited labeled data by exploiting the underlying data information.
Specifically, in medical imaging, the cluster assumption implies that pixels or voxels with similar intensity patterns or spatial features, i.e., occupying the same high-density region in the feature space, should have the same semantic labels (e.g., tumor vs. healthy tissue). This principle enables algorithms to propagate annotations from limited labeled data to anatomically similar unlabeled regions. The low-density separation assumption places tumor boundaries in anatomically plausible transition areas (e.g., tumor infiltration zones), which correspond to low-density areas of the feature space; this facilitates the refinement of tumor boundary delineation through the incorporation of abundant unlabeled data. Meanwhile, the manifold assumption reflects the local smoothness of the decision surface, ensuring that neighboring pixels or voxels in the feature space yield consistent predictions. It maintains spatially coherent predictions, which is critical for clinical interpretability.
The main paradigms in SSL include pseudo-labeling and consistency regularization. Beyond these, Generative Adversarial Networks (GANs), contrastive learning, and the hybrid models of the above methods also show increasing promise. Due to the scarcity of annotated medical data, SSL algorithms have gained increasing attention in medical image analysis. While several existing reviews have summarized the application of SSL models in tumor segmentation across various organs [24], this review specifically focuses on their utilization for glioma segmentation. We present research articles published between 2022 and 2025 that have employed BraTS datasets, which provide multimodal glioma MRI scans. In the following sections, we present a detailed comparison and discussion of the selected SSL methods.

2. Pseudo-Labeling Methods

Pseudo-labeling is a relatively straightforward method in SSL. It generates pseudo-labels for unlabeled images by training an initial model on the small set of available labeled examples. The pseudo-labeled samples are then added to the initially labeled examples to iteratively improve the model [25,26]. This method mimics a supervised learning loop with the addition of the unlabeled samples [26]. However, due to the scarcity of labeled data in 3D medical imaging, pseudo-labels are often noisy, as they are generated from imperfect early-stage predictions. Inaccurate segmentation masks generated by the model may therefore misguide the semi-supervised training process, leading it to learn from incorrect targets instead of real anatomical boundaries [27]. Hence, designing effective voxel-wise pseudo-label selection mechanisms is a vital challenge for these training paradigms [26,28].
Researchers have adopted self-training, co-training, and tri-training to generate pseudo-labels, with the latter two belonging to multi-view learning. Moreover, pseudo-labeling techniques have been extensively employed in federated semi-supervised learning in the medical imaging field to address challenges such as data privacy and annotation scarcity [29].

2.1. Self-Training

Self-training [30] adopts an iterative strategy to enhance limited labeled datasets through the generation of pseudo-labels. As illustrated in Figure 1, the process begins with the training of a baseline encoder–decoder model on annotated medical images to generate initial segmentation masks. Subsequently, the same model is applied to unlabeled data, assigning pseudo-labels based on the maximum class probabilities, as depicted in Figure 1a. These pseudo-labels are blended with the initial labeled data to create an augmented training set. This set is used to retrain the model successively until the model satisfies the segmentation evaluation criterion, as shown in Figure 1b.
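To make the loop concrete, the following is a minimal sketch of one self-training round under assumed PyTorch-style model and data-loader interfaces (all names here are illustrative, not taken from the reviewed papers); pseudo-labels come from the maximum class probability, and a threshold tau implements a simple voxel-wise selection mechanism:

```python
# One self-training round (sketch): supervise, pseudo-label, retrain.
import torch
import torch.nn.functional as F

def self_training_round(model, optimizer, labeled_loader, unlabeled_loader, tau=0.9):
    # (1) Supervised pass over the labeled set.
    model.train()
    for images, masks in labeled_loader:
        optimizer.zero_grad()
        F.cross_entropy(model(images), masks).backward()
        optimizer.step()

    # (2) Pseudo-label unlabeled volumes from the maximum class probability,
    #     keeping only voxels whose confidence exceeds tau.
    pseudo_set = []
    model.eval()
    with torch.no_grad():
        for images in unlabeled_loader:
            probs = F.softmax(model(images), dim=1)   # (B, C, H, W)
            conf, labels = probs.max(dim=1)           # voxel-wise confidence, labels
            pseudo_set.append((images, labels, (conf > tau).float()))

    # (3) Retrain on the pseudo-labeled batches (in practice interleaved
    #     with the labeled set), masking out low-confidence voxels.
    model.train()
    for images, labels, mask in pseudo_set:
        optimizer.zero_grad()
        loss_map = F.cross_entropy(model(images), labels, reduction="none")
        ((loss_map * mask).sum() / mask.sum().clamp(min=1.0)).backward()
        optimizer.step()
```

In practice, steps (2) and (3) are repeated until the model satisfies the segmentation evaluation criterion, as described above.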
Nevertheless, the effectiveness of this approach is constrained by the data distribution and the available samples. If labeled data is scarce and uneven, roughly generated pseudo-labels are often noisy, and the resulting erroneous pseudo-labels can greatly hinder the supervised training process [31]. There exists a substantial volume disparity between healthy tissues and tumors in complex medical imaging scenarios. Furthermore, variations in tumor grade distribution and dataset size across the BraTS challenges also significantly influence segmentation performance. For instance, BraTS2020 includes 259 high-grade glioma (HGG) and 110 low-grade glioma (LGG) cases; the latter accounts for 29.8%, the highest proportion among all BraTS releases. Liu et al. [32] reported a 0.4% improvement in Dice score on BraTS2020 compared to BraTS2019 with the same model, highlighting the impact of data distribution. BraTS2021 further expands the dataset to 1470 cases, in contrast to only 351 in BraTS2018. Models reported in [14,15] showed considerable improvements on BraTS2021, particularly in the segmentation of ET and TC, likely due to the increased diversity and volume of training data.
Such imbalances in the training data may result in inaccurate annotations, which may persist throughout optimization [33,34,35]. Specifically, in these cases, the algorithm tends to learn the dominant class characteristics more effectively, often at the expense of under-represented classes. This imbalance may degrade segmentation performance for minority regions, accompanied by increases in false positives and misclassifications [36]. Consequently, significant confirmation bias is introduced by overfitting to these erroneous labels: the model reinforces its existing hypothesis by favoring consistent information and disregarding contradictory cues, ultimately impairing its generalization performance. Therefore, researchers have proposed many confidence estimation and uncertainty-aware approaches to obtain higher-quality pseudo-labels.
Huang et al. [37] developed a semi-supervised evidence fusion framework (SEFNet) with image information, aiming at decreasing segmentation uncertainty. The approach computes two complementary evidence sources: probability functions and mass functions. The former is produced from a SoftMax layer and the latter from an evidential neural module. To fuse these evidence components, the Dempster–Shafer Theory (DST) [38] is applied as a decision fusion mechanism. For unlabeled data, the framework incorporates information constraints via image transformations to guide model training.
Isocitrate dehydrogenase (IDH) genotyping is a critical biomarker for both diagnosis and prognosis in glioma patients [39]. Cheng et al. [40] adopted SSL to deal with the expensive acquisition of IDH mutation data. Despite selecting high-confidence predictions, traditional pseudo-labeling may remain susceptible to inaccuracies due to inadequate model calibration, thereby introducing noisy training. Hence, the authors proposed uncertainty-aware pseudo-label selection (UPS), which extracts a more reliable subset of pseudo-labels to enhance the accuracy of IDH genotyping.
Furthermore, Xu et al. [41] proposed Bayesian pseudo-labeling, which generates pseudo-labels with Bayes' theorem. A stochastic training mechanism was also introduced to dynamically learn pseudo-label thresholds, enabling fully automatic generation of high-quality labels. The method outperforms both semi-supervised and supervised baselines in brain tumor segmentation tasks. To mitigate domain shift across labeled and unlabeled images, Qin et al. [42] introduced a technique called Uncertainty-Based Region Clipping (URCA). Rather than randomly cropping and flipping images, this approach generates new unlabeled images that contain uncertainty estimates derived from dual-branch predictions. This facilitates the generation of high-confidence pseudo-labels, thereby enabling the model to achieve superior segmentation performance with a smaller amount of labeled data.
Although these methods improve pseudo-label quality by filtering for high-confidence predictions, such filtering inevitably discards potentially informative but uncertain data, reducing the volume of usable training samples [43]. Notably, even unreliable pseudo-labels may contain valuable information. Hence, in addition to preserving confident pixel-level pseudo-labels as ground truths, Rahmati et al. [43] refined less reliable labels through an active contour adaptation model. This method maximizes the utility of all unlabeled data, thereby enhancing the extraction of informative features from medical images.

2.2. Co-Training

Co-training [44] represents another pseudo-labeling strategy where multiple models collaboratively learn by exchanging complementary information derived from one another. In essence, co-training is a multiple-view learning approach, with each view corresponding to a feature set and a classifier. Based on the consensus principle, the mutual consistency of different views is alternately maximized to improve the accuracy of pseudo-labels from unlabeled data [45]. Peiris et al. [45] designed Co-BioNet, a co-training framework that integrates two segmentation networks with a pair of critic networks. Compared to conventional co-training approaches [46], this approach allows both models to learn from the same set of labeled and unlabeled data while achieving inter-model consensus to identify reliable prediction regions. The framework further distinguishes itself from dual-task consistency (DTC) approaches [47] by leveraging dual critics to evaluate uncertainty across different views. Experiments show that Co-BioNet performs well on four publicly available datasets, highlighting the benefits of its unified dual-view design with integrated uncertainty guidance.
Additionally, Thompson et al. [48] integrated super-pixels into SSL, using features and edges of a super-pixel map, which represents valuable neighboring pixel clusters to enhance pseudo-label quality during model training. With this strategy, the WT and TC Dice scores can reach 0.824 and 0.707 when only 5 annotations are available. To deal with class imbalance, Wang and Li [49] employed a Dual-debiased Heterogeneous Co-training (DHC) framework, which incorporates two complementary dynamic weighting mechanisms. Specifically, distribution-aware debiased weighting (DistDW) focuses on minority classes, and difficulty-aware debiased weighting (DiffDW) reduces learning bias by slowing down the learning speed for easier classes. DHC improves medical image segmentation performance by jointly optimizing two complementary sub-models through collaborative learning.
To mitigate the dependence on pseudo-label quality, Zhao et al. [35] suggested a co-training strategy with cross-consistency regularization. It exhibits a strong representational capacity for complicated structures and fine-grained segmentation details with an adaptive weight-balancing mechanism. The strategy employs a truncated Gaussian weighting mechanism according to the marginal probability distribution, which simultaneously improves pseudo-label reliability and guarantees high utilization efficiency. Furthermore, to address class-imbalanced pseudo-labels caused by varying learning difficulties across classes, the authors presented a unified alignment method to optimize label distribution and model robustness.

2.3. Tri-Training

As an extension of the co-training framework, Tri-training [50] employs a bootstrap-based ensemble learning method. Specifically, it resamples the labeled data into three subsets, each of which is used to train an independent base classifier. When two classifiers agree on the classification of an unlabeled sample, the agreed label is employed as a pseudo-label for the third classifier, with no need for explicit confidence estimation. This consensus-based pseudo-labeling approach is the core contribution of Tri-training, providing a cost-effective alternative to traditional confidence-based methods, and it makes Tri-training applicable to semi-supervised image segmentation with limited labeled data. Classical ensemble methods likewise combine multiple models, improving performance by aggregating their outputs to reduce variance and bias; however, they typically require large amounts of annotated data, making them more appropriate for fully supervised tasks.
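A minimal sketch of this consensus rule (the helper name and the use of voxel-wise label maps are illustrative assumptions):

```python
# Tri-training-style consensus pseudo-labeling (sketch): where two base
# classifiers agree on an unlabeled voxel, their shared label becomes a
# pseudo-label for the third classifier; disagreements are masked out.
import numpy as np

def consensus_pseudo_labels(pred_a, pred_b, ignore_index=-1):
    agree = pred_a == pred_b                        # no confidence scores needed
    return np.where(agree, pred_a, ignore_index)    # ignore_index voxels skip the loss
```

The third classifier is then trained on these label maps with the ignore_index voxels excluded from the loss, and the three classifiers rotate roles each round.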
Xing et al. [51] incorporated fuzzy rough-set theory into SSL and introduced a novel Tri-training framework based on multi-view data representation. In contrast to traditional Tri-training, it constructs three complementary data views, namely the original view, a principal component analysis (PCA)-transformed view, and a discretized granular view. This multi-view learning strategy enables a more comprehensive representation of semi-supervised data. By enhancing the diversity among base classifiers, the fuzzy rough set-based Tri-training model more effectively leverages unlabeled data, thereby improving overall learning performance. To address label noise in learning tasks, Zhou et al. [52] developed Tri-correcting, an enhanced ensemble method based on Tri-training. The framework works as follows: an initial classifier is trained on noise-filtered support subsets, followed by iterative refinement via consensus voting, a dynamic instance allocation strategy, and classifier aggregation. Tri-correcting has been demonstrated to improve classification accuracy under conditions of mild label noise.
These pseudo-labeling strategies show distinct strengths for medical image segmentation. Self-training has been widely applied in the field, owing to its simple single-classifier structure and efficient implementation. While its noise sensitivity may lead to error propagation during iterative learning, scholars have explored numerous approaches to optimize the confidence of pseudo-labels [41,42]. Compared with self-training, co-training performs dual-view learning with two classifiers. The view diversity introduces useful data perturbation [53] into the model, rendering it more robust, and co-training has shown promising performance in glioma MRI segmentation. Nonetheless, it depends heavily on the view-independence assumption, which may not hold in real applications; when views are redundant or coupled with each other, segmentation accuracy can degrade significantly [23]. Correspondingly, Tri-training also enhances model diversity by training three base classifiers with bootstrap resampling, which decreases the prediction variance among ensemble estimates [53]. Nevertheless, we found that Tri-training has been applied relatively less in medical image segmentation than self-training and co-training.

3. Consistency Regularization

Consistency regularization enforces prediction consistency under various perturbations on labeled data and extends this constraint to unlabeled data. It exhibits superior generalizability by enforcing cross-view, cross-model, and cross-task consistency constraints [54]. According to the smoothness assumption, if x and x′ are close in the input space, their labels y and y′ should be the same [55]. Hence, consistency regularization leverages unlabeled data in an unsupervised way by enforcing predictions that are consistent across various perturbed versions of the same input [26].
To enforce prediction consistency, researchers have adopted perturbations at the data level and the network level. The former encourages a data-invariant representation by introducing different perturbations (e.g., data augmentation and dropout) to input samples, while network-level consistency enforces perturbations across multiple model architectures. When labeled data is scarce, extracting valuable information from other relevant tasks is also an effective alternative for enhancing the performance of the main task. A minimal sketch of the shared objective behind these variants follows.
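In its simplest form, the unsupervised term penalizes disagreement between two stochastically perturbed forward passes over the same unlabeled input; the following sketch uses assumed names (perturb stands for any data- or network-level perturbation):

```python
# Generic consistency-regularization term (sketch): two stochastic views of the
# same unlabeled batch should yield similar class-probability maps.
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, perturb):
    p1 = F.softmax(model(perturb(x_unlabeled)), dim=1)   # first perturbed view
    p2 = F.softmax(model(perturb(x_unlabeled)), dim=1)   # second perturbed view
    return F.mse_loss(p1, p2)
```

The total objective is then L = L_s + λ(t)·L_u, where the unsupervised weight λ(t) is typically ramped up over training so that early, unreliable predictions do not dominate.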

3.1. Network-Level Consistency Regularization

No matter what changes are made to the model for the same image, the network is expected to maintain consistent outputs; this is network-level consistency regularization [24]. Popular methods of this kind include the Π model [56], temporal ensembling [56], and the mean teacher [57,58]. According to the research reported in [24], these models can be divided into single-network, double-network, double-decoder, and multi-decoder models according to the number of networks, as shown in Figure 2a. Here, X_l denotes labeled data and X_u unlabeled data; T(X_u) and T⁻¹ denote the transformation of unlabeled data and its inverse; L_s and L_u refer to the supervised and unsupervised losses, respectively; and η and η′ represent stochastic perturbations (e.g., Gaussian noise).

3.1.1. Single Model

The Π model and temporal ensembling both use a single network, as illustrated in Figure 2(a-i), encouraging consistency between the outputs obtained under two perturbations of the same input. Laine and Aila [56] proposed the Π model, whose structure is shown in Figure 2b. For each sample x_i, different data augmentation and dropout are applied under the same input stimulus, and a consistency regularization loss constrains the variance between the two prediction outputs z_i and z̃_i, aiming to minimize it. However, since each sample has to be evaluated twice, training efficiency is low, and the method may introduce noise. Hence, Laine and Aila [56] employed temporal ensembling to simplify and extend the Π model, whose structure is illustrated in Figure 2c. Aggregating past predictions into an exponential moving average (EMA) target z̃_i allows the network to be evaluated only once per sample during training, achieving roughly twice the speed of the Π model.
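The accumulation step reported in [56] can be stated compactly as follows (α is the EMA momentum and t the epoch index; the denominator corrects the startup bias caused by the zero initialization of the ensemble):

```python
# Temporal ensembling target update, per sample (sketch).
def update_ensemble_target(Z, z, alpha, t):
    Z = alpha * Z + (1.0 - alpha) * z     # running ensemble of past predictions
    z_target = Z / (1.0 - alpha ** t)     # bias-corrected training target
    return Z, z_target
```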

3.1.2. Dual Model

Because temporal ensembling must maintain an EMA prediction for every training sample, the training process runs slowly with a heavy memory burden, and online training is hard to realize. Therefore, the mean teacher (MT) model [57] was introduced, which applies the EMA to the model weights rather than to the label predictions, producing a teacher that supervises the student model. MT is a typical dual network, as shown in Figure 2(a-ii).
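A minimal sketch of this weight-level EMA update (PyTorch-style; the function name and smoothing coefficient tau are illustrative):

```python
# Mean-teacher update (sketch): the teacher's weights are an EMA of the
# student's weights, so no gradients flow through the teacher.
import torch

@torch.no_grad()
def update_teacher(teacher, student, tau=0.99):
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(tau).add_(s_p, alpha=1.0 - tau)   # t_p <- tau*t_p + (1-tau)*s_p
```

In practice, buffers such as batch-norm statistics are usually copied or averaged as well.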
The tight coupling between the EMA teacher and student models limits the teacher’s ability to provide additional valuable knowledge. When the student model produces biased predictions for certain samples, the EMA teacher is prone to propagating these errors and enforcing them on the student, making the misclassification irreversible [59]. Given the inherent coupling of the EMA teacher constraining the performance ceiling of existing teacher–student paradigms, Ke et al. [59] introduced the double student approach, which substitutes another student for the teacher, and put forward stable samples and stability constraints for the new model, achieving significant improvements in SSL classification challenges.
Nevertheless, the proposed model brings additional architectural complexity, since two independent networks have to be trained at the same time, increasing implementation complexity and computational demands. In contrast, the MT framework remains a relatively simple design, as the teacher model is not updated with gradient descent but instead tracks the student's parameters through EMA. This mechanism delivers stable supervision while keeping the overall training process efficient. Since dense features map better to real labels [60], Long et al. [31] utilized dual network modules to obtain the feature layer and segmentation layer of unlabeled data, which are then passed to a feature similarity module (FSM) to enforce constraints, correcting error-prone guidance and indirectly leveraging unlabeled data.

3.1.3. Dual Decoder and Multiple Decoders

However, the above SSL algorithms underestimate the importance of challenging areas in medical images, such as adhesive edges or thin branches. Wu et al. [61] argued that these unlabeled, ambiguous regions may contain more critical information. Deep models trained with limited annotations are prone to generating highly uncertain and erroneous predictions in such areas, so it is essential to focus on and reinforce the learning of these challenging samples. The mutual consistency network (MC-Net) [61] was proposed for semi-supervised 3D left atrium segmentation. This framework is composed of a common encoder followed by two slightly distinct decoders, as shown in Figure 2(a-iii). The model leverages the difference between the two outputs to estimate prediction uncertainty. Furthermore, a cyclic pseudo-label scheme is employed to enforce mutual consistency across predictions. This approach guides the model to progressively learn robust features from unlabeled complex regions.
In order to further leverage these challenging samples and improve segmentation efficiency, Wu et al. [62] introduced MC-Net+, a mutual consistency framework featuring a shared encoder paired with several decoders of diverse designs (i.e., with different up-sampling strategies), as shown in Figure 2(a-iv). The highly uncertain area, i.e., the unlabeled hard region (pixels that are hard to classify correctly, including low-contrast pixels and those representing irregularly shaped lesion tissue or tumors), is identified by calculating the statistical discrepancy among the multiple decoders' outputs. A mutual consistency constraint is enforced between one decoder's probabilistic predictions and the others' soft pseudo-labels, minimizing output discrepancy and encouraging the model to generate stable predictions in uncertain regions.
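A sketch of this constraint under assumed names: each decoder's probability map is matched against sharpened (soft pseudo-label) versions of the other decoders' outputs, which are detached so that they act as fixed targets:

```python
# MC-Net+-style mutual consistency across decoders (sketch).
import torch.nn.functional as F

def sharpen(p, T=0.5):
    """Sharpen a probability map into a soft pseudo-label."""
    p_t = p ** (1.0 / T)
    return p_t / p_t.sum(dim=1, keepdim=True)

def mutual_consistency_loss(probs):   # probs: list of (B, C, ...) decoder outputs
    loss = 0.0
    for i, p_i in enumerate(probs):
        for j, p_j in enumerate(probs):
            if i != j:
                loss = loss + F.mse_loss(p_i, sharpen(p_j).detach())
    return loss / (len(probs) * (len(probs) - 1))
```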
However, performance inconsistency among the MC-Net+ sub-models may degrade overall effectiveness. For example, if one sub-model underperforms, it may produce unreliable pseudo-labels that differ greatly from those of the other sub-models, leading to unstable supervision for unlabeled data [63]. Therefore, researchers have undertaken new attempts on this basis [63,64,65].
Du et al. [63] adopted a structure similar to the dual decoder with U-Net as the backbone and added a fine-grained segmentation head to the original decoder output, while the original output was used as a coarse-grained segmentation head. The model achieved the consistency constraint of the coarse–fine segmentation prediction, wherein the fine segmentation head is not affected by the coarse head, offering a more stable target for unlabeled samples.
Nevertheless, Li et al. [65] found that these models still tend to struggle with boundary prediction errors, particularly when training data is limited. Hence, they introduced a contour-aware consistency network consisting of a shared encoder, a vanilla primary decoder, and an auxiliary decoder enhanced with contour information. The latter refines the corresponding features so that the predictions resemble the true anatomy as closely as feasible. Additionally, pseudo-labels are generated by fusing outputs from both decoders, and a self-contrast strategy is employed to optimize segmentation performance in ambiguous areas.
Furthermore, although both global image-level information (e.g., geometric structures) and local pixel-level details are crucial for accurate segmentation, the latter are typically not fully utilized. To this end, Du et al. [63] proposed a patch-wise loss strategy that splits the feature map into multiple regions. Specifically, patches near object boundaries are used to calculate the Intra-Patch Ranked Loss (Intra-PRL) for boundary refinement, and those far from the boundary are used for the Inter-Patch Ranked Loss (Inter-PRL), which promotes robustness in low-contrast pathological regions. This spatially adaptive strategy enhances the model's ability to produce fine-grained and reliable segmentation, particularly in challenging areas with ambiguous boundaries or weak contrast. Zhao et al. [66] argued that it is necessary to consider not only overall sample consistency but also local stability at the voxel or pixel level. Therefore, an information exchange paradigm that identifies and utilizes stable voxel features from 3D imaging data was introduced to guide training. This mechanism replaces the traditional EMA approach commonly used in MT frameworks with a more robust inter-model constraint strategy between dual student models.

3.1.4. Uncertainty-Aware Approaches

Uncertainty estimation is an important component of SSL for improving the model's learning ability. Without ground truth for unlabeled samples, the predictions generated by the teacher model may exhibit significant noise and reliability issues, potentially undermining model performance. Yu et al. [67] designed the uncertainty-aware mean teacher (UA-MT) framework to improve SSL by guiding the student model with reliable predictions: the teacher model utilizes Monte Carlo (MC) sampling [68] to estimate prediction uncertainty for each target, filters out high-uncertainty outputs, and retains only confident ones. This ensures that the student learns from stable and trustworthy targets, enhancing training stability and segmentation performance. Wang et al. [69] constructed a tripled-uncertainty guided framework, wherein a student model performs three tasks under the guidance of a teacher model. Uncertainty-weighted integration (UWI) was proposed to calculate uncertainty, where pixels with higher weights are highly confident, allowing the teacher to focus on more trustworthy regions during knowledge transfer.
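A sketch of the MC-sampling step behind this kind of filtering (assumed PyTorch interfaces; T stochastic passes with dropout kept active, and the predictive entropy of the averaged prediction as the voxel-wise uncertainty):

```python
# Monte Carlo dropout uncertainty (sketch), in the spirit of UA-MT's filtering.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_uncertainty(teacher, x, T=8):
    teacher.train()                    # keep dropout active while sampling
    probs = torch.stack([F.softmax(teacher(x), dim=1) for _ in range(T)])
    mean_p = probs.mean(dim=0)         # averaged prediction, (B, C, ...)
    entropy = -(mean_p * torch.log(mean_p + 1e-6)).sum(dim=1)  # voxel-wise
    return mean_p, entropy
```

The consistency loss is then applied only where the entropy falls below a (typically ramped) threshold, so the student learns from the teacher's confident targets.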
Most methods estimate uncertainty from a single model, potentially ignoring inherent noise during early training when model predictions are unstable. Xu et al. [70] proposed a dual uncertainty strategy, where cross-model uncertainty estimation between two networks identifies high-confidence regions for segmentation while preventing error propagation. Similarly, to better employ unlabeled data, Lyu et al. [71] incorporated dual uncertainty-aware methods (i.e., segmentation uncertainty and anatomical information reconstruction uncertainty). Meanwhile, their model adopts a 3D attention mechanism, selectively transferring relevant features while suppressing noise, enabling adaptive multi-stage spatial information aggregation.
However, some limitations still restrict the overall performance of SSL frameworks, including the incomplete leveraging of unlabeled images and the inadequate exploitation of pseudo-labeled information. Consequently, Long et al. [31] employed a high-confidence network architecture augmented by a reliable region enhancement module (REM). By enabling two networks to interact and mutually refine each other's predictions, this design establishes a collaborative triple-network framework, as depicted in Figure 2d, wherein the light apricot background represents the dual network, f(θ_i) is the i-th network model, P_i denotes the predicted segmentation outputs of model i, and Z_i indicates the reliable pseudo-labels in the third model. The dual network maintains mutual constraints during training and learning, then loads the best model parameters opt(θ_1, θ_2) from the current iteration into the high-confidence model f(θ_3). Through further filtering, more precise pseudo-labels are generated to constrain f(θ_1) and f(θ_2), exhibiting superior quality compared to those generated by the dual network. This process allows full exploitation of pseudo-label information, reinforcing trustworthy regions via weighted emphasis and mitigating the adverse effects of low-confidence areas.
In UG-MCL [72], a dual-task architecture was proposed to maximize the utility of unlabeled data by jointly modeling region-level semantic structures and boundary-level spatial constraints. Uncertainty estimates are employed to guide intra-task and cross-task consistency learning to better capture geometric shape cues from diverse views. Xu et al. [72] found that voxel-averaged consistency losses, such as the mean squared error [57], typically reduce segmentation accuracy in blurry regions, even though these regions may contain key information about tumors. Hence, Xu et al. [26] developed the ambiguity-consensus mean-teacher (AC-MT) model, encouraging perturbation stability in structurally ambiguous yet semantically informative regions, such as slender branches or indistinct edges; an estimated ambiguity map is introduced to encourage prediction consistency between the two models in these regions.
Nearly all current approaches employ global uncertainty estimation to increase prediction confidence, neglecting local region-level uncertainty [66]. Anatomical regions with blurred boundaries remain difficult to segment accurately, as global uncertainty estimates are insufficient to constrain low-confidence boundary regions. Hence, Zhao et al. [66] proposed a Voxel Reliability Constraint (VRC), which applies a regional uncertainty constraint strategy based on reliable voxels to optimize low-confidence predictions.
In [42], a visual comparison of glioma segmentation results was conducted, with a focus on anatomically challenging regions. Their comparison shows that some structures are often prone to segmentation errors due to their subtle and irregular morphology. Notably, uncertainty-aware models have exhibited varying strengths in capturing these complex features. The MT model is more sensitive to the subtle peripheral structures, while AC-MT and URCA achieve more accurate delineation of internal folds. Among them, URCA produces segmentations most consistent with the overall tumor morphology, suggesting its superior capacity in capturing spatial details under uncertainty.

3.2. Data-Level Consistency Regularization

Data-level consistency assumes that model predictions should remain the same under various transformations or perturbations of the input. Consistency regularization essentially enforces this principle by applying augmentation or random noise to unlabeled data, which encourages the model to generate smoother and more robust predictions [24]. The most popular transformation method is data augmentation, and perturbations typically take the form of random noise. Data augmentation is a crucial step in strengthening the model's generalization ability at the unsupervised stage. However, if the perturbation added to unlabeled data is relatively weak, particularly during the initial training phase, the student model may quickly adapt to these simple variations. The learning process then plateaus shortly after reaching a local minimum, a phenomenon termed the Lazy Student [73]. More importantly, if the student is subjected to much stronger perturbations than the teacher, the performance gap between them widens, and the student eventually drags the teacher down via the EMA, ending the learning cycle.
Researchers have increasingly used mix-up methods to enhance inter-task dependencies [74,75,76,77]. Mixed samples, blended at varying rates, are presented to the teacher and student networks throughout unsupervised training. This design facilitates the learning of robustness under distribution shifts, enhancing the generalization performance of models.
The MixMatch algorithm [78] estimates low-entropy labels based on data-augmented unlabeled samples and adopts MixUp to mix labeled and unlabeled data. Another approach with a mix-up operation is FixMatch [79], which applies a dual forward mechanism. It first applies mild perturbations (e.g., Gaussian noise, random rotations, and translations) to generate pseudo-labels, which are then used to guide predictions under more aggressive augmentations (e.g., color jitter, CutMix [75], and context augmentation), thereby enhancing model robustness [41,79].
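A minimal sketch of this dual forward mechanism (assumed names: weak_aug and strong_aug stand for the mild and aggressive augmentations, and tau is the confidence threshold):

```python
# FixMatch-style unsupervised loss (sketch): the weak view makes pseudo-labels,
# and the strong view must match them wherever confidence exceeds tau.
import torch
import torch.nn.functional as F

def fixmatch_loss(model, x_unlabeled, weak_aug, strong_aug, tau=0.95):
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_unlabeled)), dim=1)
        conf, pseudo = probs.max(dim=1)          # pseudo-labels from the weak view
        mask = (conf > tau).float()              # keep only confident pixels
    logits_strong = model(strong_aug(x_unlabeled))
    loss_map = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss_map * mask).sum() / mask.sum().clamp(min=1.0)
```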
While FixMatch and its derived methods do exceptionally well in image classification, the clustering assumption breaks down at the pixel level, making them inapplicable to image segmentation tasks as-is. Therefore, Gaussian noise was added as the weak augmentation and RandAugment [80] as the strong augmentation to make FixMatch fit segmentation tasks [41]. By contrast, instead of using mixing techniques directly for data augmentation as in existing methods, Shu et al. [81] introduced a Cross-Mix teaching paradigm by untangling the mixing operation. Specifically, unlabeled data are mixed and fed into the student model, while the corresponding predictions from the teacher model are also fused to ensure consistency under data perturbation. This cross-mixing approach effectively prevents the Lazy Student phenomenon by treating the mixed student input as extra difficult assignments set by a rigorous teacher.
With a focus on clinical applications, Qu et al. [82] incorporated motion artifact simulation into the strong data augmentation technique to construct a motion artifact-augmented pseudo-label network for semi-supervised segmentation (MAPSS). When dealing with the same motion artifact, the Dice coefficient for brain tumor segmentation using this method was reduced by approximately 6%, whereas the MT and UA-MT methods had an average decrease of roughly 50%. To address the inconsistencies in feature distributions across source domains and alleviate domain shift, Hu and Meng [83] proposed AutoMixLayer. This module dynamically learns mixing parameters during training to concurrently integrate multiple domain-specific styles. By synthesizing a unified domain representation, it significantly improves the model’s generalization performance across different domains.

3.3. Task-Level Consistency Regularization

Beyond consistency learning based on data or network perturbations, an alternative research direction emphasizes task-level regularization [47,66,69,72]. This introduces auxiliary tasks to facilitate the extraction of informative representations from unlabeled data in segmentation scenarios. Multi-task learning serves as a form of task-level perturbation in SSL, and explicit modeling of regularization across auxiliary and main tasks is fundamental for enhancing semi-supervised segmentation performance [66].
Luo et al. [47] first constructed the task-consistency constraint for SSL, i.e., Dual-Task Consistency (DTC), aiming to minimize the discrepancy between a pixel-level classification task and a level-set regression task on unlabeled data; it accounts for the differences between the tasks and requires only one inference. Wang et al. [69] introduced two supplementary learning objectives to guide the model, apart from the segmentation task. The first is a foreground–background reconstruction task designed to enrich semantic understanding, while the second involves predicting signed distance fields (SDFs) to introduce shape constraints. Following the design of the MT framework, Zhang et al. [72] adopted an uncertainty-guided mutual consistency learning framework consisting of dual-task output branches, i.e., generating segmentation probability maps and signed distance maps, leveraging geometric shape information of images through intra-task and inter-task consistency constraints. Additionally, the framework utilizes the uncertainty estimated from the teacher model, filtering for the higher-confidence parts to guide the student model's learning.
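For reference, an SDF regression target can be derived from a binary mask with standard distance transforms; the following is a sketch (the sign convention, negative inside and positive outside, varies across papers):

```python
# Signed distance field from a binary mask (sketch).
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_field(mask):
    """Negative inside the object, positive outside, zero on the boundary."""
    mask = mask.astype(bool)
    if not mask.any():
        return np.zeros(mask.shape, dtype=np.float32)
    outside = distance_transform_edt(~mask)   # distance to the object, outside it
    inside = distance_transform_edt(mask)     # distance to the background, inside
    return (outside - inside).astype(np.float32)
```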
Some multi-task consistency regularization strategies tend to ignore the inherent uncertainty in each task. As a result, propagating uncertain outputs, especially mapping low-confidence predictions from auxiliary tasks into a predefined space, may misguide the learning of the main task. Therefore, Zhao et al. [66] incorporated optimized uncertainty estimation within auxiliary tasks, wherein the multi-task learning architecture is based on hard parameter sharing. This design further reinforces task-level consistency and improves both the accuracy and scalability of the semi-supervised segmentation model.

3.4. Other Consistency Regularization

In addition to consistency regularization at the data level, network level, and task level, researchers have also developed other consistency regularization methods. Chen et al. [54] introduced the Fusion-Guided Dual-View Consistency Training (FDCT) framework, which establishes a bidirectional interaction and fusion mechanism between dual-view representations to mitigate prediction uncertainty. Instead of enforcing direct consistency, FDCT generates fused pseudo-labels by selecting the most accurate prediction for each pixel after comparing the outputs from both views.
Xu et al. [70] found that many consistency regularization-based methods are typically applied at the 2D slice level within 3D volumetric data. This neglects the inherent voxel-wise contextual information embedded in the 3D structure, potentially resulting in boundary inaccuracies during segmentation and discontinuities across adjacent slices in volumetric reconstruction. Hence, the Mixing Volume Consistency Module (MVCM) was proposed, which leverages voxel-level spatial information and enforces consistency constraints between the mixed volumes before and after segmentation, thereby improving segmentation accuracy in 3D medical imaging tasks.
Most of these semi-supervised segmentation methods employ paired image augmentation strategies (i.e., two augmented versions of the same labeled sample), ignoring the potential of unpaired data (e.g., labeled and unlabeled images). To bridge this gap, Chen et al. [84] first developed a framework that facilitates knowledge transfer across labeled and unlabeled samples. They designed a special module to capture class-specific consistency by modeling multi-scale contextual and structural affinities between labeled and unlabeled samples, enhancing model performance without requiring rigorous image pairings.
Moreover, to deal with modality loss, Wu et al. [85] introduced a synthesis-driven strategy that reconstructs FLAIR images from other modalities. They employed a cascaded dual-task learning framework, incorporating multiple regularization strategies, including simultaneous synthesis and coarse segmentation, perceptibility regularization, and error prediction consistency. This approach yielded segmentation results that were on par with those achieved using FLAIR, demonstrating its robustness in scenarios with incomplete modality information.
Consistency-based methods have emerged as a prominent paradigm in semi-supervised medical image analysis due to their efficacy in leveraging limited annotations. In such approaches, supervised loss functions guide labeled data to influence their corresponding samples, whereas unlabeled data are processed through perturbation-induced consistency constraints that operate without direct label supervision. Based on prototypical networks, Xu et al. [86] introduced a cyclic prototype consistency learning (CPCL) framework that operates via two complementary consistency paths: a labeled-to-unlabeled (L2U) forward process, where class prototypes derived from labeled data guide segmentation of unlabeled images, and an unlabeled-to-labeled (U2L) backward process, where the flow is inverted by using prototypes from unlabeled images to segment labeled images. This bidirectional design converts traditional unsupervised consistency into a prototype-driven "supervised" consistency mechanism. By iteratively interacting between these complementary processes, the model learns more discriminative feature representations, ultimately improving segmentation accuracy and robustness.
Xie et al. [87] identified model prediction inconsistency as a key signal for assessing sample difficulty, which was then fused with uncertainty metrics to enhance selection criteria. Through the integration of active learning strategies into the consistency regularization framework, this method effectively targets ambiguous regions for annotation, thereby improving model performance while minimizing labeling costs. Another similar case has been used for glioma segmentation. Wang et al. [88] introduced a consistency-driven active annotation strategy that applies different perturbations to input data and evaluates prediction consistency. Samples with lower consistency scores—indicating higher uncertainty or difficulty—are prioritized for annotation as informative candidates. This approach effectively reduces labeling effort by selecting representative samples, thereby enhancing model robustness. Notably, this method has demonstrated potential for successful translation into clinical annotation pipelines.

4. Contrastive Learning

With the widespread attention to contrastive learning (CL) in computer vision, many researchers have introduced it into consistency learning. The core idea is to learn representations that are invariant to data augmentation: features of different views of the same sample (positive pairs) are aligned, while those from distinct samples (negative pairs) are separated. This process enables the network to capture discriminative and robust semantic features from both labeled and unlabeled data [69].
Contrastive loss has been effectively leveraged in SSL for medical image analysis due to its ability to maximize the mutual information between representations. Specifically, it enhances learning by increasing the similarity between positive pairs while simultaneously reducing that between negative pairs [19]. In regular applications, augmentation strategies, such as geometry transformation or intensity variation, are applied to the original images to generate varied views. The model is then trained to discriminate between similar and dissimilar patches, i.e., those extracted from the same sample, encouraging the learning of more discriminative and generalizable features, which ultimately benefits segmentation accuracy [24,65]. From the perspective of data, CL can be classified into image-level and pixel-level paradigms, each focusing on capturing consistency at various spatial granularities [24].
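The contrastive objective typically used in these frameworks is an InfoNCE-style loss; a minimal sketch over a batch of paired view embeddings (all names are assumptions):

```python
# InfoNCE loss (sketch): queries[i] and keys[i] embed two views of the same
# sample (positive pair); every other row in the batch acts as a negative.
import torch
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.1):
    q = F.normalize(queries, dim=1)                     # (N, D)
    k = F.normalize(keys, dim=1)                        # (N, D)
    logits = q @ k.t() / temperature                    # (N, N) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```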

4.1. Image-Level CL

At the image level, CL generally introduces different perturbations or augmentations to a single image to generate multiple views, which are utilized to construct positive and negative feature pairs [24]. These feature sets are then compared with one another through contrastive objectives to guide representation learning. Unlike conventional SSL approaches that typically rely on single-sample regularization per forward pass, CL explores inter-sample relationships, modeling a richer and more structured data distribution. To fully exploit this potential, Wang et al. [69] integrated a contrastive constraint into the encoder, applying an inner-product similarity measure to evaluate the correlation between queries and keys. In this design, features extracted by the student encoder serve as queries, while those from the teacher encoder act as keys. The objective is to pull together representations originating from positive pairs and push apart those from negative pairs, thereby facilitating the encoder in learning more discriminative and semantically coherent features.
To further enhance consistency regularization, Xu et al. [70] integrated a Contrastive Training Module (CTM) into the segmentation framework. Specifically, the module first utilizes a 3D average pooling layer followed by a multi-layer perceptron (MLP) to compress 3D feature maps into 1D feature vectors, enabling effective similarity measurement. By comparing original and augmented input representations, CTM encourages the clustering of similar semantic samples while separating less correlated ones. Thereby, it improves pathological feature learning by promoting augmentation invariance.

4.2. Pixel-Level CL

On the other side, to enhance dense prediction in applications like medical image segmentation, involving more delicate prediction at the pixel level, CL has been extended to operate at a more localized scale [19,65,89,90]. To address limitations such as over-locality and overfitting inherent in pixel-wise contrast, Liu et al. [89] presented the first local CL framework operating over multi-scale feature maps. This framework employs both CNN-based U-Net and Transformer-based U-Net networks under a cross-teaching paradigm, seamlessly unifying pseudo-labels and ground-truth labels. To ensure computational efficiency, contrastive pairs are sampled both within individual slices (intra-slice) and across adjacent slices (inter-slice) using multiple patches, allowing for more spatially-aware comparisons. Cross pseudo-supervision also encourages intra-class consistency and inter-class separation, benefiting segmentation of both labeled and unlabeled data.
Achieving accurate medical image segmentation with only one imaging modality during inference has significant clinical value, particularly in scenarios with limited imaging resources. To this end, Zhang et al. [19] designed low-coupling semi-supervised contrast mutual learning (Semi-CML), which incorporates area-similarity contrastive (ASC) loss to avoid the limitations of high-coupling architectures, such as weight sharing and overly complex feature fusion. The ASC loss performs pixel-level contrastive learning directly on the segmentation predictions, enabling modality-aware mutual learning by enforcing prediction consistency across different modalities. This design facilitates more efficient cross-modal knowledge transfer while avoiding excessive architectural entanglement. To mitigate inter-modal accuracy discrepancies, the authors introduced a Soft Pseudo-Label Relearning (PReL) mechanism that dynamically adjusts the pseudo-label quality to enhance supervision on unlabeled samples. Experiments have demonstrated that the Semi-CML framework with PReL not only approaches but occasionally even surpasses the performance of fully supervised methods trained on 100% labeled data, in addition to reducing annotation costs by up to 90%, offering a practical and scalable solution for real-world clinical segmentation problems.
To refine segmentation performance in regions with high uncertainty, Li et al. [65] introduced a self-contrastive learning strategy to contrast predictions from multiple decoders. In this framework, the same pixel predicted by two decoders is regarded as a positive pair, while all the other pixels are utilized as negative samples. This design encourages the model to reduce prediction discrepancies for corresponding pixels across decoders, thereby improving output consistency. Notably, this approach does not require extra data augmentation or negative samples, while a single original image is sufficient for direct contrastive training. To facilitate such a comparison, a projection head with two convolutional layers and a Parametric Rectified Linear Unit (PReLU) activation is attached to the backbone.
Additionally, by exchanging the positions of two pixels, Zhao et al. [90] calculated the linear summation of two voxel-wise contrastive losses to better learn class separability. In experiments on patients with traumatic brain injury in clinical practice, this approach boosted the segmentation Dice score of a common U-Net by more than 16%.

4.3. Bi-Level CL

Beyond these, some researchers have proposed bi-level CL strategies that simultaneously capture global semantics and local pixel-wise details. As image segmentation is a dense prediction task requiring a precise understanding of fine-grained pixel-level details as well as global semantic context, it is crucial to incorporate feature learning mechanisms across multiple granularities [91]. Compared with single-level CL methods, these bi-level strategies often yield notable performance gains, offering a more holistic and discriminative medical image representation [92].
Hard positive samples, i.e., samples belonging to the same semantic category as the anchor but with different feature representations, can offer more informative and discriminative knowledge. Tang et al. [93] first injected hard-positive strategies into a two-layer CL framework, which introduces hard positive samples at both the global and local levels through a two-stage pretraining pipeline. First, the model uses image-level hard positives to enhance global semantic alignment; then, pixel-level samples are employed to refine local feature discrimination. This hierarchical design not only enriches the model's ability to distinguish subtle variations within the same class but also significantly improves pixel-wise classification accuracy in semi-supervised segmentation tasks. During the second training stage in particular, dense pixel-level operations impose substantial computational and memory demands. Hence, region-based losses are calculated both within individual regions and across regions of varying scales (i.e., large and small patches). By constraining intra-region contrast to local areas and incorporating inter-region comparisons between coarse and fine granularity, the approach reduces computation while enhancing multi-scale pixel-level feature learning.
Zhao et al. [94] introduced a location-based clustering strategy that computes global and local contrastive losses to learn meaningful, discriminative features during pre-training. Notably, with as few as 10 training samples, it achieved a 4.7% improvement in Dice score over the baseline on a private cardiac segmentation dataset.
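To make the bi-level idea concrete, the sketch below combines a global InfoNCE term on pooled encoder features with a local term on sampled dense features. It is a schematic reading of [92,93,94] rather than any one paper’s loss; the weighting `lam` and the sampling size are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, tau=0.1):
    """Standard InfoNCE: row i of q should match row i of k."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / tau
    return F.cross_entropy(logits, torch.arange(q.size(0), device=q.device))

def bi_level_contrastive(feat_a, feat_b, lam=0.5, n_local=128, tau=0.1):
    """feat_a/feat_b: encoder features (B, C, H, W) of two augmented
    views. Global term: pooled image embeddings; local term: randomly
    sampled pixel embeddings at matching locations."""
    global_loss = info_nce(feat_a.mean(dim=(2, 3)), feat_b.mean(dim=(2, 3)), tau)
    b, c, h, w = feat_a.shape
    idx = torch.randperm(h * w, device=feat_a.device)[:n_local]
    fa = feat_a.flatten(2).transpose(1, 2)[:, idx].reshape(-1, c)
    fb = feat_b.flatten(2).transpose(1, 2)[:, idx].reshape(-1, c)
    return lam * global_loss + (1 - lam) * info_nce(fa, fb, tau)
```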

5. Adversarial Learning

Adversarial learning also shines in SSL segmentation, encouraging the predicted segmentations of unlabeled data to more closely resemble those of labeled data [94,95]. Typically, a generative adversarial network (GAN) comprises a generator that learns to synthesize images and a discriminator that distinguishes real from generated samples. Through adversarial training, the generator improves its ability to mimic the target data distribution, while the discriminator becomes more adept at telling the two apart. This dynamic interaction enhances segmentation consistency across labeled and unlabeled domains.
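In the segmentation setting, the discriminator usually operates on prediction maps rather than raw images. The PyTorch sketch below shows one common pattern; the discriminator architecture, class count, and loss arrangement are our assumptions rather than any specific paper’s design.

```python
import torch
import torch.nn as nn

num_classes = 4  # assumed: background plus three tumor sub-regions

# Small CNN discriminator judging whether a softmax map came from a
# labeled (ground-truth-supervised) or an unlabeled forward pass.
disc = nn.Sequential(
    nn.Conv2d(num_classes, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
)
bce = nn.BCEWithLogitsLoss()

def adversarial_losses(pred_l, pred_u):
    """pred_l/pred_u: softmax maps (B, C, H, W) from the labeled and
    unlabeled batches. Returns (discriminator loss, segmenter loss)."""
    ones_l = torch.ones(pred_l.size(0), 1)
    zeros_u = torch.zeros(pred_u.size(0), 1)
    d_loss = bce(disc(pred_l.detach()), ones_l) + \
             bce(disc(pred_u.detach()), zeros_u)
    # The segmenter is rewarded when its unlabeled predictions fool
    # the discriminator into reading them as "labeled".
    g_loss = bce(disc(pred_u), torch.ones(pred_u.size(0), 1))
    return d_loss, g_loss
```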
To improve semi-supervised volumetric medical image segmentation, Li et al. [95] introduced SASSNet, a shape-aware framework that adds the prediction of signed distance maps (SDMs) to the conventional segmentation task. Each SDM encodes the signed distance from a voxel to the nearest object boundary, enriching the feature representation with shape information. Shape consistency across the dataset is enforced by an adversarial loss between the SDMs of labeled and unlabeled data. In contrast to UA-MT, SASSNet better depicts the interior regions of the object, avoiding irregular results.
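For reference, an SDM can be computed from a binary mask with two Euclidean distance transforms. The sketch below follows the sign convention described above (negative inside the object, positive outside); normalization schemes vary between papers and are omitted here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask: np.ndarray) -> np.ndarray:
    """Signed distance map of a binary mask (works for 2D or 3D):
    negative inside the object, positive outside."""
    mask = mask.astype(bool)
    if not mask.any():               # no foreground: return zeros
        return np.zeros(mask.shape, dtype=np.float32)
    inside = distance_transform_edt(mask)    # distance to background
    outside = distance_transform_edt(~mask)  # distance to foreground
    return (outside - inside).astype(np.float32)
```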
Vorontsov et al. [96] designed a semi-supervised segmentation framework named GenSeg, which enables effective training of conventional encoder–decoder networks from few annotations plus common clinical data, such as the presence or absence of tumors in radiological scans. By leveraging image-to-image translation, GenSeg learns from unlabeled samples, including both healthy and diseased images. Its core idea is that image translation and segmentation share the same goal of separating pathological regions from the anatomical background. Through this shared objective, the model identifies tumors robustly, even with minimal supervision. In brain tumor segmentation, GenSeg outperforms classical semi-supervised methods such as autoencoders and MT, as well as certain fully supervised baselines, achieving 8–14% improvements in Dice score. This demonstrates that GenSeg effectively isolates pathological features from the surrounding anatomy, enabling accurate segmentation and clearer boundary delineation even in images affected by artifacts.
To extend SSL further to cross-modality scenarios, d’Assier et al. [97] introduced M-GenSeg for tumor segmentation on unpaired bi-modal datasets. By teaching the model cross-modality image translation, the approach uses pixel-level annotations in the source modality to guide segmentation in the target modality, where such annotations are unavailable. It consistently achieves higher Dice scores than state-of-the-art domain adaptation techniques on the unannotated target modality. Similarly, to enhance glioma representation, Zhang et al. [98] not only extracted modality-specific features but also introduced domain-sharing layers to learn cross-modal commonalities, enabling a more comprehensive integration of multi-modal information.
Certain segmentation models occasionally produce results that violate basic anatomical constraints, making them hard to trust and of little value in clinical settings. To address this problem, Wang et al. [99] were the first to introduce anatomical priors (i.e., connectivity, convexity, and symmetry) into SSL frameworks. They adopted a constrained adversarial training strategy in which the model is deliberately challenged with anatomically incorrect examples, encouraging it to learn to avoid such mistakes. This approach not only promotes more anatomically consistent segmentations but also offers a flexible way to inject structural knowledge into any segmentation model without altering its architecture.
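As a simple illustration of what a connectivity prior checks, the snippet below counts spurious connected components in a predicted mask. This is only a crude, non-differentiable diagnostic of our own; the constrained adversarial training of [99] enforces such priors through adversarial examples rather than an explicit penalty like this.

```python
import numpy as np
from scipy.ndimage import label

def extra_components(pred_mask: np.ndarray) -> int:
    """Number of connected components beyond the first in a binary
    prediction; a connected, anatomically plausible mask returns 0."""
    _, n_components = label(pred_mask.astype(np.uint8))
    return max(n_components - 1, 0)
```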
In addition to the standard training procedure, Xiang et al. [100] incorporated a semi-supervised adversarial learning strategy using an annotation discriminator to identify the source of each image. This approach effectively mitigates the adverse effects of inaccurate predictions on unlabeled samples, and it has demonstrated superior segmentation performance, particularly in detecting small lesions and low-contrast regions in clinical fundus images. Similarly, in clinical cases of cardiac myocardial infarction segmentation, Xu et al. [101] employed adversarial learning to enhance boundary-aware models. Notably, their approach achieved greater segmentation accuracy and stability than the baseline, even with only one-quarter of the training data annotated.

6. Holistic Approach

In brain tumor segmentation, the various SSL strategies exhibit distinct strengths and limitations, as summarized in Table 1. Pseudo-labeling methods are relatively simple to implement, requiring minimal modifications to common network structures, which makes them practical in real-world scenarios; however, their effectiveness depends heavily on the quality of the generated pseudo-labels, and low-quality labels may lead to poor segmentation and overfitting. Consistency regularization is one of the most widely adopted SSL paradigms owing to its robustness to perturbations, even in the presence of noise or motion artifacts, but this comes at the cost of model complexity and computational burden. Contrastive learning is simpler to train, as it does not involve heavy data augmentation or negative sampling, though it typically requires longer training to converge. Adversarial learning can be powerful, but training stability and convergence are often challenging in practice.
Integrating two or three of the above SSL paradigms has become an emerging trend in medical image segmentation research. Han et al. [24] found that consistency regularization effectively encourages the model to extract meaningful feature representations from unlabeled images through task consistency. Although the uncertainty guidance map helps eliminate teacher-model bias and enhances prediction reliability, it adds a large amount of computation, so algorithmic complexity must be taken into account.
Among consistency regularization applications, Aralikatti et al. [102] applied data-level perturbations during model pre-training and implicit network-level perturbations during fine-tuning. To further enrich feature representations throughout training, the network incorporates manifold learning techniques, allowing the model to capture both local and global semantic structure. This encourages a more robust and informative feature space, improving segmentation outcomes in medical imaging. Notably, this work represents one of the earliest efforts to jointly leverage multi-level perturbations in a semi-supervised segmentation framework with a pre-training strategy.
To address challenges in uncertainty estimation accuracy, process complexity, and pseudo-label quality, Lu et al. [64] designed an asymmetric network based on the MT architecture that combines uncertainty estimation and dual consistency regularization with pseudo-labeling. The student network contains two additional auxiliary decoders that use a cyclic loss to estimate uncertainty, which is then used to generate pseudo-labels.
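For orientation, the sketch below shows the generic MT machinery that such methods build on: an EMA-updated teacher and an uncertainty-masked consistency loss. The confidence threshold and the use of maximum softmax probability as the uncertainty proxy are our simplifications; Lu et al. [64] instead derive uncertainty from a cyclic loss over auxiliary decoders.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Exponential-moving-average teacher update used in MT frameworks."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)

def masked_consistency(student_p, teacher_p, threshold=0.75):
    """Mean-squared consistency between softmax maps (B, C, H, W),
    keeping only voxels where the teacher is confident."""
    conf, _ = teacher_p.max(dim=1, keepdim=True)   # (B, 1, H, W)
    mask = (conf > threshold).float()
    diff = (student_p - teacher_p.detach()) ** 2
    return (diff * mask).sum() / (mask.sum() * student_p.size(1) + 1e-6)
```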
Similarly, building on the MT framework, Tang et al. [103] introduced adversarial learning with a collaborative consistency learning mechanism to improve the quality of the supervision signal. The confidence map generated by the discriminator offers more stable and informative guidance to the student network, particularly in later training stages, and the auxiliary discriminator helps alleviate overfitting by imposing additional regularization. For lower-grade glioma segmentation, the model converges in fewer iterations than MT and attains a lower training loss. Representative semi-supervised algorithms for glioma segmentation are systematically compared in Table 2.

7. Summary of Semi-Supervised Approaches and Future Challenges

This review provides a comprehensive overview of prevalent SSL approaches for medical image segmentation, including pseudo-labeling, consistency regularization, generative adversarial learning, contrastive learning, and holistic approaches. Across the surveyed literature, several semi-supervised training algorithms provide more effective guidance in brain tumor segmentation tasks than plain label supervision. With the increasing integration of unlabeled data into medical imaging and the growing potential of deep learning technologies, semi-supervised segmentation methods are expected to maintain robust performance even under data imbalance, expanding the prospects for clinical deployment of CAD systems.
Despite recent advances, SSL still faces several critical issues in brain tumor segmentation. The most significant limitation is the scarcity of high-quality annotated data which, combined with domain shift (i.e., distributional discrepancies across datasets), can degrade model generalization and prediction performance [66,104,105]. Imbalance between labeled and unlabeled sample distributions, along with statistical divergence across medical institutions, further complicates training. Moreover, glioma segmentation commonly involves multi-modal imaging data, where each modality may follow a distinct distribution; these differences lead to noticeable fluctuations in training performance. In real-world clinical practice, models frequently rely on only one or two modalities. Hence, pursuing robust SSL performance under such modality constraints, particularly by aligning predictive accuracy across limited modalities, is of great practical significance [19].
Brain tumor scan images inherently possess rich anatomical priors, including regional continuity, convex shape characteristics, and bilateral symmetry. Effectively incorporating this prior knowledge into SSL frameworks may substantially improve segmentation accuracy and enhance the clinical plausibility of segmentation results, particularly in scenarios where annotated data is limited [23]. To reconcile the intrinsic complexity of DL models with the clinical need for transparency and interpretability, explainable artificial intelligence (XAI) has been increasingly applied to make the decision-making process of brain tumor segmentation more understandable and trustworthy [106,107,108]. The integration of XAI into SSL frameworks not only enhances the interpretability of model outputs but also strengthens the confidence of clinicians and researchers, which is critical for real-world clinical adoption.
Recently, the integration of large language models (LLMs) into the medical domain has emerged as a promising research direction. In the biomedical context, Han et al. [109] introduced a high-quality dataset of 160,000 medical text samples to support diverse tasks such as clinical note generation, information retrieval, and educational training for healthcare professionals and students. Similarly, Li et al. [110] highlighted the role of visual language models (VLMs) as a crucial bridge between LLMs and surgical workflow interpretation, enabling multimodal reasoning across textual and visual inputs. In another contribution, Da Costa Nascimento et al. [111] employed LLMs to automatically generate structured clinical reports, incorporating essential patient details such as age, estimated survival time, and tumor sub-region descriptions derived from medical imaging. These reports present key pre-diagnostic insights in accessible text, enhancing medical image diagnosis by linking visual features with clinical descriptions. Looking ahead, the integration of LLMs with SSL frameworks holds potential for improving interpretability and transparency, and interactive engagement with users could offer a more objective foundation for clinical decision-making and reinforce trust in AI-assisted diagnosis.
Compared with fully supervised methods in machine learning [112,113], SSL models are typically more complex because they utilize both labeled and unlabeled data. Consequently, the pursuit of lightweight yet high-performing SSL solutions has become a key research direction, aiming to balance computational efficiency with interpretability [42]. Collectively, these challenges and opportunities chart the future trajectory of research in brain tumor segmentation: achieving an optimal trade-off between model complexity, explainability, and clinical utility to better serve evolving healthcare demands.

Author Contributions

Conceptualization, C.J., T.F.N. and H.I.; data curation, C.J.; writing—original draft preparation, C.J.; writing—review and editing, T.F.N. and H.I.; supervision, H.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AC-MT: Ambiguity-consensus mean teacher
ASC: Area-similarity contrastive
CAD: Computer-aided diagnosis
CAT: Constrained adversarial training
CL: Contrastive learning
CSF: Cerebrospinal fluid
CTM: Contrastive training module
CPCL: Cyclic prototype consistency learning
DL: Deep learning
DHC: Dual-debiased heterogeneous co-training
DiffDW: Difficulty-aware debiased weighting
DistDW: Distribution-aware debiased weighting
DSC: Dice similarity coefficient
DST: Dempster-Shafer theory
DTC: Dual-task consistency
EMA: Exponential moving average
ET: Enhanced tumor
FDCT: Fusion-guided dual-view consistency training
FLAIR: Fluid-attenuated inversion recovery sequence
FSM: Feature similarity module
GANs: Generative adversarial networks
GBM: Glioblastoma
GenSeg: Generative segmentation
GM: Gray matter
IDH: Isocitrate dehydrogenase
Inter-PRL: Inter-patch ranked loss
Intra-PRL: Intra-patch ranked loss
L2U: Labeled-to-unlabeled
MC: Monte Carlo
MC-Net: Mutual consistency network
MetaSeg: Meta-learning-based semantic segmentation
MLP: Multi-layer perceptron
MT: Mean teacher
MRI: Magnetic resonance imaging
MVCM: Mixing volume consistency module
PReL: Pseudo-label relearning
PReLU: Parametric Rectified Linear Unit
REM: Region enhancement module
SASSNet: Shape-aware semi-supervised segmentation network for volumetric medical images
SDF: Signed distance field
SDM: Signed distance map
SEFNet: Semi-supervised evidence fusion framework
SegNet: Segmentation network
Semi-CML: Semi-supervised contrastive mutual learning
SSL: Semi-supervised learning
TC: Tumor core
U2L: Unlabeled-to-labeled
UA-MT: Uncertainty-aware mean teacher
UPS: Uncertainty-aware pseudo-label selection
URCA: Uncertainty-based region clipping algorithm
UWI: Uncertainty-weighted integration
WM: White matter
WT: Whole tumor
VRC: Voxel reliability constraint

References

  1. Wacker, J.; Ladeira, M.; Nascimento, J.E.V. Transfer Learning for Brain Tumor Segmentation. arXiv 2020, arXiv:1912.12452. [Google Scholar]
  2. Pellerino, A.; Caccese, M.; Padovan, M.; Cerretti, G.; Lombardi, G. Epidemiology, risk factors, and prognostic factors of gliomas. Clin. Transl. Imaging 2022, 10, 467–475. [Google Scholar] [CrossRef]
  3. De Feo, M.S.; Granese, G.M.; Conte, M.; Palumbo, B.; Panareo, S.; Frantellizzi, V.; De Vincentis, G.; Filippi, L. Immuno-PET for Glioma Imaging: An Update. Appl. Sci. 2024, 14, 1391. [Google Scholar] [CrossRef]
  4. Mitra, S. Deep Learning with Radiogenomics Towards Personalized Management of Gliomas. IEEE Rev. Biomed. Eng. 2023, 16, 579–593. [Google Scholar] [CrossRef]
  5. Ranjbarzadeh, R.; Caputo, A.; Tirkolaee, E.B.; Ghoushchi, S.J.; Bendechache, M. Brain tumor segmentation of MRI images: A comprehensive review on the application of artificial intelligence tools. Comput. Biol. Med. 2023, 152, 106405. [Google Scholar] [CrossRef] [PubMed]
  6. Fathimathul Rajeena, P.P.; Sivakumar, R. Brain Tumor Classification Using Image Fusion and EFPA-SVM Classifier. Intell. Autom. Soft Comput. 2023, 35, 2837–2855. [Google Scholar] [CrossRef]
  7. Drevelegas, A.; Papanikolaou, N. Imaging Modalities in Brain Tumors. In Imaging of Brain Tumors with Histological Correlations; Drevelegas, A., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 13–33. [Google Scholar] [CrossRef]
  8. Bakas, S.; Reyes, M.; Jakab, A.; Bauer, S.; Rempfler, M.; Crimi, A.; Shinohara, R.T.; Berger, C.; Ha, S.M.; Rozycki, M.; et al. Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge. arXiv 2019, arXiv:1811.02629. [Google Scholar]
  9. Mohammed, Y.M.A.; El Garouani, S.; Jellouli, I. A survey of methods for brain tumor segmentation-based MRI images. J. Comput. Des. Eng. 2023, 10, 266–293. [Google Scholar] [CrossRef]
  10. Angulakshmi, M.; Lakshmi Priya, G.G. Automated brain tumour segmentation techniques—A review. Int. J. Imaging Syst. Technol. 2017, 27, 66–77. [Google Scholar] [CrossRef]
  11. Baid, U.; Ghodasara, S.; Mohan, S.; Bilello, M.; Calabrese, E.; Colak, E.; Farahani, K.; Kalpathy-Cramer, J.; Kitamura, F.C.; Pati, S.; et al. The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification. arXiv 2021, arXiv:2107.02314. [Google Scholar]
  12. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2015, 34, 1993–2024. [Google Scholar] [CrossRef]
  13. Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.S.; Freymann, J.B.; Farahani, K.; Davatzikos, C. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 2017, 4, 170117. [Google Scholar] [CrossRef] [PubMed]
  14. Cai, Y.; Long, Y.; Han, Z.; Liu, M.; Zheng, Y.; Yang, W.; Chen, L. Swin Unet3D: A three-dimensional medical image segmentation network combining vision transformer and convolution. BMC Med. Inform. Decis. Mak. 2023, 23, 33. [Google Scholar] [CrossRef]
  15. Jia, Z.; Zhu, H.; Zhu, J.; Ma, P. Two-Branch network for brain tumor segmentation using attention mechanism and super-resolution reconstruction. Comput. Biol. Med. 2023, 157, 106751. [Google Scholar] [CrossRef] [PubMed]
  16. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
  17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. arXiv 2023, arXiv:1706.03762. [Google Scholar]
  18. Chaitanya, K.; Erdil, E.; Karani, N.; Konukoglu, E. Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation. Med. Image Anal. 2023, 87, 102792. [Google Scholar] [CrossRef]
  19. Zhang, S.; Zhang, J.; Tian, B.; Lukasiewicz, T.; Xu, Z. Multi-modal contrastive mutual learning and pseudo-label re-learning for semi-supervised medical image segmentation. Med. Image Anal. 2023, 83, 102656. [Google Scholar] [CrossRef]
  20. Gou, F.; Tang, X.; Liu, J.; Wu, J. Artificial intelligence multiprocessing scheme for pathology images based on transformer for nuclei segmentation. Complex Intell. Syst. 2024, 10, 5831–5849. [Google Scholar] [CrossRef]
  21. Chapelle, O.; Schölkopf, B.; Zien, A. Introduction to semi-supervised learning. In Semi-Supervised Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 1–12. [Google Scholar]
  22. Yang, X.; Song, Z.; King, I.; Xu, Z. A Survey on Deep Semi-supervised Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 8934–8954. [Google Scholar] [CrossRef]
  23. Jiao, R.; Zhang, Y.; Ding, L.; Xue, B.; Zhang, J.; Cai, R.; Jin, C. Learning with limited annotations: A survey on deep semi-supervised learning for medical image segmentation. Comput. Biol. Med. 2024, 169, 107840. [Google Scholar] [CrossRef]
  24. Han, K.; Sheng, V.S.; Song, Y.; Liu, Y.; Qiu, C.; Ma, S.; Liu, Z. Deep semi-supervised learning for medical image segmentation: A review. Expert Syst. Appl. 2024, 245, 123052. [Google Scholar] [CrossRef]
  25. Zhang, W.; Zhu, L.; Hallinan, J.; Zhang, S.; Makmur, A.; Cai, Q.; Ooi, B.C. BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 20634–20644. [Google Scholar] [CrossRef]
  26. Xu, Z.; Wang, Y.; Lu, D.; Luo, X.; Yan, J.; Zheng, Y.; Tong, R.K. Ambiguity-selective consistency regularization for mean-teacher semi-supervised medical image segmentation. Med. Image Anal. 2023, 88, 102880. [Google Scholar] [CrossRef] [PubMed]
  27. Vu, M.H.; Norman, G.; Nyholm, T.; Lofstedt, T. A Data-Adaptive Loss Function for Incomplete Data and Incremental Learning in Semantic Image Segmentation. IEEE Trans. Med. Imaging 2022, 41, 1320–1330. [Google Scholar] [CrossRef] [PubMed]
  28. Rizve, M.N.; Duarte, K.; Rawat, Y.S.; Shah, M. In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning. arXiv 2021, arXiv:2101.06329. [Google Scholar]
  29. Qiu, L.; Cheng, J.; Gao, H.; Xiong, W.; Ren, H. Federated Semi-Supervised Learning for Medical Image Segmentation via Pseudo-Label Denoising. IEEE J. Biomed. Health Inform. 2023, 27, 4672–4683. [Google Scholar] [CrossRef]
  30. Lee, D.-H. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. 2013. Available online: https://api.semanticscholar.org/CorpusID:18507866 (accessed on 1 January 2024).
  31. Long, J.; Yang, C.; Ren, Y.; Zeng, Z. Semi-supervised medical image segmentation via feature similarity and reliable-region enhancement. Comput. Biol. Med. 2023, 167, 107668. [Google Scholar] [CrossRef]
  32. Liu, C.; Cheng, Y.; Tamura, S. Key information-guided networks for medical image segmentation in medical systems. Expert Syst. Appl. 2024, 238, 121851. [Google Scholar] [CrossRef]
  33. Li, J.; Socher, R.; Hoi, S.C.H. DivideMix: Learning with Noisy Labels as Semi-supervised Learning. arXiv 2020, arXiv:2002.07394. [Google Scholar] [CrossRef]
  34. Liu, X.; Li, W.; Yuan, Y. DiffRect: Latent Diffusion Label Rectification for Semi-supervised Medical Image Segmentation. arXiv 2024, arXiv:2407.09918. [Google Scholar]
  35. Zhao, J.; Yao, L.; Cheng, W.; Yu, M.; Shi, W.; Liu, J.; Jiang, Z. Co-training semi-supervised medical image segmentation based on pseudo-label weight balancing. Med. Phys. 2025, 52, 3854–3876. [Google Scholar] [CrossRef]
  36. Saini, M.; Susan, S. Tackling class imbalance in computer vision: A contemporary review. Artif. Intell. Rev. 2023, 56 (Suppl. S1), 1279–1335. [Google Scholar] [CrossRef]
  37. Huang, L.; Ruan, S.; Denœux, T. Semi-supervised multiple evidence fusion for brain tumor segmentation. Neurocomputing 2023, 535, 40–52. [Google Scholar] [CrossRef]
  38. Dempster, A.P. Upper and lower probability inferences based on a sample from a finite univariate population. Biometrika 1967, 54, 515–528. [Google Scholar] [CrossRef] [PubMed]
  39. Louis, D.N.; Perry, A.; Wesseling, P.; Brat, D.J.; Cree, I.A.; Figarella-Branger, D.; Hawkins, C.; Ng, H.K.; Pfister, S.M.; Reifenberger, G.; et al. The 2021 WHO Classification of Tumors of the Central Nervous System: A summary. Neuro-Oncology 2021, 23, 1231–1251. [Google Scholar] [CrossRef]
  40. Cheng, J.; Liu, J.; Kuang, H.; Wang, J. A Fully Automated Multimodal MRI-Based Multi-Task Learning for Glioma Segmentation and IDH Genotyping. IEEE Trans. Med. Imaging 2022, 41, 1520–1532. [Google Scholar] [CrossRef]
  41. Xu, M.; Zhou, Y.; Jin, C.; de Groot, M.; Alexander, D.C.; Oxtoby, N.P.; Hu, Y.; Jacob, J. Expectation maximisation pseudo labels. Med. Image Anal. 2024, 94, 103125. [Google Scholar] [CrossRef]
  42. Qin, C.; Wang, Y.; Zhang, J. URCA: Uncertainty-based region clipping algorithm for semi-supervised medical image segmentation. Comput. Methods Programs Biomed. 2024, 254, 108278. [Google Scholar] [CrossRef]
  43. Rahmati, B.; Shirani, S.; Keshavarz-Motamed, Z. Semi-supervised segmentation of medical images focused on the pixels with unreliable predictions. Neurocomputing 2024, 610, 128532. [Google Scholar] [CrossRef]
  44. Blum, A.; Mitchell, T. Combining Labeled and Unlabeled Data with Co-Training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT’98), Madison, WI, USA, 24–26 July 1998; pp. 92–100. [Google Scholar]
  45. Peiris, H.; Hayat, M.; Chen, Z.; Egan, G.; Harandi, M. Uncertainty-guided dual-views for semi-supervised volumetric medical image segmentation. Nat. Mach. Intell. 2023, 5, 724–738. [Google Scholar] [CrossRef]
  46. Peng, J.; Estrada, G.; Pedersoli, M.; Desrosiers, C. Deep co-training for semi-supervised image segmentation. Pattern Recognit. 2020, 107, 107269. [Google Scholar] [CrossRef]
  47. Luo, X.; Chen, J.; Song, T.; Wang, G. Semi-supervised Medical Image Segmentation through Dual-task Consistency. AAAI 2021, 35, 8801–8809. [Google Scholar] [CrossRef]
  48. Thompson, B.H.; Caterina, G.D.; Voisey, J.P. Pseudo-Label Refinement Using Superpixels for Semi-Supervised Brain Tumour Segmentation. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022; IEEE: New York, NY, USA, 2022; pp. 1–5. [Google Scholar] [CrossRef]
  49. Wang, H.; Li, X. DHC: Dual-debiased Heterogeneous Co-training Framework for Class-imbalanced Semi-supervised Medical Image Segmentation. arXiv 2023, arXiv:2307.11960. [Google Scholar]
  50. Zhou, Z.H.; Li, M. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Trans. Knowl. Data Eng. 2005, 17, 1529–1541. [Google Scholar] [CrossRef]
  51. Xing, J.; Gao, C.; Zhou, J. Weighted fuzzy rough sets-based tri-training and its application to medical diagnosis. Appl. Soft Comput. 2022, 124, 109025. [Google Scholar] [CrossRef]
  52. Zhou, R.; Gan, W.; Wang, F.; Yang, Z.; Huang, Z.; Gan, H. Tri-correcting: Label noise correction via triple CNN ensemble for carotid plaque ultrasound image classification. Biomed. Signal Process. Control 2024, 91, 105981. [Google Scholar] [CrossRef]
  53. Zheng, Z.; Hayashi, Y.; Oda, M.; Kitasaka, T.; Mori, K. TriMix: A General Framework for Medical Image Segmentation from Limited Supervision. In Computer Vision—ACCV 2022; Wang, L., Gall, J., Chin, T.-J., Sato, I., Chellappa, R., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2023; Volume 13846, pp. 185–202. [Google Scholar] [CrossRef]
  54. Chen, Z.; Hou, Y.; Liu, H.; Ye, Z.; Zhao, R.; Shen, H. FDCT: Fusion-Guided Dual-View Consistency Training for semi-supervised tissue segmentation on MRI. Comput. Biol. Med. 2023, 160, 106908. [Google Scholar] [CrossRef]
  55. Engelen, J.E.V.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn 2020, 109, 373–440. [Google Scholar] [CrossRef]
  56. Laine, S.; Aila, T. Temporal Ensembling for Semi-Supervised Learning. arXiv 2017, arXiv:1610.02242. [Google Scholar]
  57. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 1195–1204. [Google Scholar]
  58. Cui, W.; Liu, Y.; Li, Y.; Guo, M.; Li, Y.; Li, X.; Wang, T.; Zeng, X.; Ye, C. Semi-supervised Brain Lesion Segmentation with an Adapted Mean Teacher Model. In Information Processing in Medical Imaging; Chung, A.C.S., Gee, J.C., Yushkevich, P.A., Bao, S., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 11492, pp. 554–565. [Google Scholar] [CrossRef]
  59. Ke, Z.; Wang, D.; Yan, Q.; Ren, J.; Lau, R.W.H. Dual Student: Breaking the Limits of the Teacher in Semi-supervised Learning. arXiv 2019, arXiv:1909.01804. [Google Scholar]
  60. Hamilton, M.; Zhang, Z.; Hariharan, B.; Snavely, N.; Freeman, W.T. Unsupervised Semantic Segmentation by Distilling Feature Correspondences. arXiv 2022, arXiv:2203.08414. [Google Scholar]
  61. Wu, Y.; Xu, M.; Ge, Z.; Cai, J.; Zhang, L. Semi-supervised Left Atrium Segmentation with Mutual Consistency Training. arXiv 2021, arXiv:2103.02911. [Google Scholar]
  62. Wu, Y.; Ge, Z.; Zhang, D.; Xu, M.; Zhang, L.; Xia, Y.; Cai, J. Mutual Consistency Learning for Semi-supervised Medical Image Segmentation. arXiv 2022, arXiv:2109.09960. [Google Scholar] [CrossRef] [PubMed]
  63. Du, J.; Zhang, X.; Liu, P.; Wang, T. Coarse-Refined Consistency Learning Using Pixel-Level Features for Semi-Supervised Medical Image Segmentation. IEEE J. Biomed. Health Inform. 2023, 27, 3970–3981. [Google Scholar] [CrossRef] [PubMed]
  64. Lu, S.; Zhang, Z.; Yan, Z.; Wang, Y.; Cheng, T.; Zhou, R.; Yang, G. Mutually aided uncertainty incorporated dual consistency regularization with pseudo label for semi-supervised medical image segmentation. Neurocomputing 2023, 548, 126411. [Google Scholar] [CrossRef]
  65. Li, L.; Lian, S.; Luo, Z.; Wang, B.; Li, S. Contour-aware consistency for semi-supervised medical image segmentation. Biomed. Signal Process. Control 2024, 89, 105694. [Google Scholar] [CrossRef]
  66. Zhao, Y.; Lu, K.; Xue, J.; Wang, S.; Lu, J. Semi-Supervised Medical Image Segmentation with Voxel Stability and Reliability Constraints. IEEE J. Biomed. Health Inform. 2023, 27, 3912–3923. [Google Scholar] [CrossRef]
  67. Yu, L.; Wang, S.; Li, X.; Fu, C.-W.; Heng, P.-A. Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation. arXiv 2019, arXiv:1907.07034. [Google Scholar]
  68. Kendall, A.; Gal, Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? arXiv 2017, arXiv:1703.04977. [Google Scholar]
  69. Wang, K.; Zhan, B.; Zu, C.; Wu, X.; Zhou, J.; Zhou, L.; Wang, Y. Semi-supervised medical image segmentation via a tripled-uncertainty guided mean teacher model with contrastive learning. Med. Image Anal. 2022, 79, 102447. [Google Scholar] [CrossRef]
  70. Xu, C.; Yang, Y.; Xia, Z.; Wang, B.; Zhang, D.; Zhang, Y.; Zhao, S. Dual Uncertainty-Guided Mixing Consistency for Semi-Supervised 3D Medical Image Segmentation. IEEE Trans. Big Data 2023, 9, 1156–1170. [Google Scholar] [CrossRef]
  71. Lyu, J.; Sui, B.; Wang, C.; Dou, Q.; Qin, J. Adaptive feature aggregation based multi-task learning for uncertainty-guided semi-supervised medical image segmentation. Expert Syst. Appl. 2023, 232, 120836. [Google Scholar] [CrossRef]
  72. Zhang, Y.; Jiao, R.; Liao, Q.; Li, D.; Zhang, J. Uncertainty-guided mutual consistency learning for semi-supervised medical image segmentation. Artif. Intell. Med. 2023, 138, 102476. [Google Scholar] [CrossRef] [PubMed]
  73. Huo, X.; Xie, L.; He, J.; Yang, Z.; Zhou, W.; Li, H.; Tian, Q. ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Image Segmentation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: New York, NY, USA, 2021; pp. 1235–1244. [Google Scholar] [CrossRef]
  74. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. arXiv 2018, arXiv:1710.09412. [Google Scholar]
  75. Yun, S.; Han, D.; Chun, S.; Oh, S.J.; Yoo, Y.; Choe, J. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 6022–6031. [Google Scholar] [CrossRef]
  76. Pang, T.; Xu, K.; Zhu, J. Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks. arXiv 2020, arXiv:1909.11515. [Google Scholar]
  77. Verma, V.; Kawaguchi, K.; Lamb, A.; Kannala, J.; Solin, A.; Bengio, Y.; Lopez-Paz, D. Interpolation consistency training for semi-supervised learning. Neural Netw. 2022, 145, 90–106. [Google Scholar] [CrossRef]
  78. Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; Raffel, C. MixMatch: A Holistic Approach to Semi-Supervised Learning. arXiv 2019, arXiv:1905.02249. [Google Scholar]
  79. Sohn, K.; Berthelot, D.; Li, C.-L.; Zhang, Z.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Zhang, H.; Raffel, C. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 596–608. Available online: https://proceedings.neurips.cc/paper/2020/hash/06964dce9addb1c5cb5d6e3d9838f733-Abstract.html (accessed on 12 March 2024).
  80. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. arXiv 2019, arXiv:1909.13719. [Google Scholar]
  81. Shu, Y.; Li, H.; Xiao, B.; Bi, X.; Li, W. Cross-Mix Monitoring for Medical Image Segmentation with Limited Supervision. IEEE Trans. Multimed. 2023, 25, 1700–1712. [Google Scholar] [CrossRef]
  82. Qu, G.; Lu, B.; Shi, J.; Wang, Z.; Yuan, Y.; Xia, Y.; Pan, Z.; Lin, Y. Motion-artifact-augmented pseudo-label network for semi-supervised brain tumor segmentation. Phys. Med. Biol. 2024, 69, 055023. [Google Scholar] [CrossRef]
  83. Hu, L.; Meng, Z. A self-adaptive framework of reducing domain bias under distribution shift for semi-supervised domain generalization. Appl. Soft Comput. 2025, 175, 113087. [Google Scholar] [CrossRef]
  84. Chen, J.; Zhang, J.; Debattista, K.; Han, J. Semi-Supervised Unpaired Medical Image Segmentation Through Task-Affinity Consistency. IEEE Trans. Med. Imaging 2023, 42, 594–605. [Google Scholar] [CrossRef]
  85. Wu, J.; Guo, D.; Wang, L.; Yang, S.; Zheng, Y.; Shapey, J.; Vercauteren, T.; Bisdas, S.; Bradford, R.; Saeed, S.; et al. TISS-net: Brain tumor image synthesis and segmentation using cascaded dual-task networks and error-prediction consistency. Neurocomputing 2023, 544, 126295. [Google Scholar] [CrossRef] [PubMed]
  86. Xu, Z.; Wang, Y.; Lu, D.; Yu, L.; Yan, J.; Luo, J.; Ma, K.; Zheng, Y.; Tong, R.K.-y. All-Around Real Label Supervision: Cyclic Prototype Consistency Learning for Semi-Supervised Medical Image Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 3174–3184. [Google Scholar] [CrossRef]
  87. Xie, M.; Geng, Y.; Zhang, W.; Li, S.; Dong, Y.; Wu, Y.; Tang, H.; Hong, L. Multi-resolution consistency semi-supervised active learning framework for histopathology image classification. Expert Syst. Appl. 2025, 259, 125266. [Google Scholar] [CrossRef]
  88. Wang, T.; Zhang, X.; Zhou, Y.; Chen, Y.; Zhao, L.; Tan, T.; Tong, T. PCDAL: A Perturbation Consistency-Driven Active Learning Approach for Medical Image Segmentation and Classification. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 1–16. [Google Scholar] [CrossRef]
  89. Liu, Q.; Gu, X.; Henderson, P.; Deligianni, F. Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation. arXiv 2023, arXiv:2306.14293. [Google Scholar]
  90. Chen, H.; Zendehdel, N.; Leu, M.C.; Yin, Z. Fine-grained activity classification in assembly based on multi-visual modalities. J. Intell. Manuf. 2024, 35, 2215–2233. [Google Scholar] [CrossRef]
  91. Zhao, X.; Qi, Z.; Wang, S.; Wang, Q.; Wu, X.; Mao, Y.; Zhang, L. RCPS: Rectified Contrastive Pseudo Supervision for Semi-Supervised Medical Image Segmentation. IEEE J. Biomed. Health Inform. 2024, 28, 251–261. [Google Scholar] [CrossRef]
  92. Chaitanya, K.; Erdil, E.; Karani, N.; Konukoglu, E. Contrastive learning of global and local features for medical image segmentation with limited annotations. arXiv 2020, arXiv:2006.10511. [Google Scholar]
  93. Tang, C.; Zeng, X.; Zhou, L.; Zhou, Q.; Wang, P.; Wu, X.; Ren, H.; Zhou, J.; Wang, Y. Semi-supervised medical image segmentation via hard positives oriented contrastive learning. Pattern Recognit. 2024, 146, 110020. [Google Scholar] [CrossRef]
  94. Zhao, X.; Wang, T.; Chen, J.; Jiang, B.; Li, H.; Zhang, N.; Yang, G.; Chai, S. GLRP: Global and local contrastive learning based on relative position for medical image segmentation on cardiac MRI. Int. J. Imaging Syst. Technol. 2024, 34, e22992. [Google Scholar] [CrossRef]
  95. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Available online: https://papers.nips.cc/paper_files/paper/2014/hash/f033ed80deb0234979a61f95710dbe25-Abstract.html (accessed on 13 March 2024).
  96. Li, S.; Zhang, C.; He, X. Shape-aware Semi-supervised 3D Semantic Segmentation for Medical Images. Med. Image Comput. Comput. Assist. Interv. 2020, 12261, 552–561. [Google Scholar] [CrossRef]
  97. Vorontsov, E.; Molchanov, P.; Gazda, M.; Beckham, C.; Kautz, J.; Kadoury, S. Towards annotation-efficient segmentation via image-to-image translation. Med. Image Anal. 2022, 82, 102624. [Google Scholar] [CrossRef]
  98. d’Assier, M.A.d.; Vorontsov, E.; Kadoury, S. M-GenSeg: Domain Adaptation For Target Modality Tumor Segmentation with Annotation-Efficient Supervision. arXiv 2023, arXiv:2212.07276. [Google Scholar]
  99. Zhang, J.; Zhang, S.; Shen, X.; Lukasiewicz, T.; Xu, Z. Multi-ConDoS: Multimodal Contrastive Domain Sharing Generative Adversarial Networks for Self-Supervised Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 76–95. [Google Scholar] [CrossRef]
  100. Wang, P.; Peng, J.; Pedersoli, M.; Zhou, Y.; Zhang, C.; Desrosiers, C. CAT: Constrained Adversarial Training for Anatomically-plausible Semi-supervised Segmentation. IEEE Trans. Med. Imaging 2023, 42, 2146–2161. [Google Scholar] [CrossRef]
  101. Xiang, D.; Yan, S.; Guan, Y.; Cai, M.; Li, Z.; Liu, H.; Chen, X.; Tian, B. Semi-Supervised Dual Stream Segmentation Network for Fundus Lesion Segmentation. IEEE Trans. Med. Imaging 2023, 42, 713–725. [Google Scholar] [CrossRef] [PubMed]
  102. Xu, C.; Wang, Y.; Zhang, D.; Han, L.; Zhang, Y.; Chen, J.; Li, S. BMAnet: Boundary Mining with Adversarial Learning for Semi-Supervised 2D Myocardial Infarction Segmentation. IEEE J. Biomed. Health Inform. 2023, 27, 87–96. [Google Scholar] [CrossRef]
  103. Aralikatti, R.C.; Pawan, S.J.; Rajan, J. A Dual-Stage Semi-Supervised Pre-Training Approach for Medical Image Segmentation. IEEE Trans. Artif. Intell. 2023, 5, 556–565. [Google Scholar] [CrossRef]
  104. Tang, Y.; Wang, S.; Qu, Y.; Cui, Z.; Zhang, W. Consistency and adversarial semi-supervised learning for medical image segmentation. Comput. Biol. Med. 2023, 161, 107018. [Google Scholar] [CrossRef] [PubMed]
  105. Upadhyay, A.K.; Bhandari, A.K. Advances in Deep Learning Models for Resolving Medical Image Segmentation Data Scarcity Problem: A Topical Review. Arch. Computat. Methods Eng. 2024, 31, 1701–1719. [Google Scholar] [CrossRef]
  106. Zhang, Y.; Weng, Y.; Lund, J. Applications of Explainable Artificial Intelligence in Diagnosis and Surgery. Diagnostics 2022, 12, 237. [Google Scholar] [CrossRef] [PubMed]
  107. Mangalathu, S.; Karthikeyan, K.; Feng, D.-C.; Jeon, J.-S. Machine-learning interpretability techniques for seismic performance assessment of infrastructure systems. Eng. Struct. 2022, 250, 112883. [Google Scholar] [CrossRef]
  108. Tehsin, S.; Nasir, I.M.; Damaševičius, R.; Maskeliūnas, R. DaSAM: Disease and Spatial Attention Module-Based Explainable Model for Brain Tumor Detection. BDCC 2024, 8, 97. [Google Scholar] [CrossRef]
  109. Han, T.; Adams, L.C.; Papaioannou, J.-M.; Grundmann, P.; Oberhauser, T.; Figueroa, A.; Löser, A.; Truhn, D.; Bressem, K.K. MedAlpaca: An Open-Source Collection of Medical Conversational AI Models and Training Data. arXiv 2025, arXiv:2304.08247. [Google Scholar]
  110. Li, Y.; Zhao, Z.; Li, R.; Li, F. Deep learning for surgical workflow analysis: A survey of progresses, limitations, and trends. Artif. Intell. Rev. 2024, 57, 291. [Google Scholar] [CrossRef]
  111. Nascimento, J.J.d.C.; Marques, A.G.; Souza, L.d.N.; Dourado Junior, C.M.J.d.M.; Barros, A.C.d.S.; de Albuquerque, V.H.C.; de Freitas Sousa, L.F. A novel generative model for brain tumor detection using magnetic resonance imaging. Comput. Med. Imaging Graph. 2025, 121, 102498. [Google Scholar] [CrossRef]
  112. Kiani, J.; Camp, C.; Pezeshk, S. On the application of machine learning techniques to derive seismic fragility curves. Comput. Struct. 2019, 218, 108–122. [Google Scholar] [CrossRef]
  113. Calò, M.; Ruggieri, S.; Buitrago, M.; Nettis, A.; Adam, J.M.; Uva, G. An ML-based framework for predicting prestressing force reduction in reinforced concrete box-girder bridges with unbonded tendons. Eng. Struct. 2025, 325, 119400. [Google Scholar] [CrossRef]
Figure 1. Self-training pipeline for medical segmentation. (a) Independent pseudo-label generation. (b) Hybrid iterative training.
Figure 2. Structures of different SSL models. (a) Consistency regularization model (Han et al., 2024 [24]). (b) Π model (Laine and Aila, 2017 [56]). (c) Temporal ensemble model (Laine and Aila, 2017 [56]). (d) Reliable-region enhancement network (Long et al., 2023 [31]).
Table 1. Performance comparison of different semi-supervised learning approaches in brain tumor segmentation.

Method | Foundational Theory | Advantages | Disadvantages
Pseudo-Labeling | Pseudo-label unannotated images and train iteratively alongside annotated data. | Simple model structure; requires few modifications. | High quality requirements for pseudo-labels; overfitting leads to information loss.
Consistency Regularization | Encourage prediction consistency under various perturbations. | Strong robustness, even under motion artifacts. | High computational cost for complex models; choosing appropriate hyperparameters is challenging.
Contrastive Learning | Maximize positive-pair affinity and minimize negative-pair correlation. | No need for additional data augmentation or negative sampling. | Longer training time.
Adversarial Learning | Encourage the predicted segmentation of unlabeled data to resemble that of labeled data via a generator and a discriminator. | Strengthens generalization and robustness to perturbations. | Convergence may be challenging.
Table 2. Representative semi-supervised methods for brain tumor segmentation.

Reference | Dataset | Backbone | Dice | Type | Highlights | Shortcomings
GenSeg [75] | BraTS 2017 | nn-UNet | 85.00 (1%) | GAN | Employs image-to-image translation to leverage unsegmented data. | Modalities are not considered.
SegPL [29] | BraTS 2018 | 3D U-Net | n/a | Pseudo-label | Generates Bayesian pseudo-labels by learning a threshold to select high-quality pseudo-labels. | The model may overfit, resulting in information loss and overconfidence.
UG-MCL [56] | BraTS 2019 | V-Net | 83.61 (20%) | Consistency regularization | Leverages intra- and cross-task consistency guided by uncertainty estimation. | Limited to single-class tasks on small-scale datasets.
AC-MT [17] | BraTS 2019 | 3D U-Net | 84.63 (20%) | Consistency regularization | Incorporates uncertain regions from unlabeled data into the consistency loss. | Limited compatibility with holistic approaches.
URCA [30] | BraTS 2019 | U-Net | 87.58 (20%) | Pseudo-label | Enhances pseudo-label reliability and mitigates label bias via uncertainty-guided region clipping. | Model complexity remains to be optimized.
SPPL [35] | BraTS 2020 | nn-UNet | 82.4 (1.9%) | Pseudo-label | Refines pseudo-labels using the features and edges of super-pixel maps. | Robustness needs to be enhanced.
DUMC [54] | BraTS 2020 | 3D U-Net | 86.67 (30%) | Consistency regularization and contrastive learning | Employs a dual uncertainty-guided mixing consistency model. | High computational costs.
M-GenSeg [76] | BraTS 2020 | nn-UNet | 86.1 (25%) | GAN | Cross-modality tumor segmentation on unpaired bi-modal datasets. | Uses 2D slices instead of full 3D volumes.
MAPSS [66] | BraTS 2020 | 3D U-Net | 85.33 (20%) | Consistency regularization | Enhances performance and robustness on limited labeled datasets affected by motion artifacts. | Incurs large computational costs.
Co-BioNet [32] | BraTS 2022 | V-Net | 80.30 (40%) | Co-training to generate pseudo-labels | Implements co-training with dual segmentation and critic networks under uncertainty guidance. | Increased computational cost over semi-supervised baselines.
Note: Segmentation results are shown in the Dice column, and % reflects the labeled data ratio.