SAS-SemiUNet++: A Stochastic Consistency Regularized Framework with Scale-Aware Semantic Recalibration for Cardiac MRI Segmentation

Rao, Jie; Ma, Xinhao; Li, Xiang

doi:10.3390/app16073507

Open Access Technical Note

by Jie Rao ¹, Xinhao Ma ² and Xiang Li ¹,*

¹ School of Computer Science, China University of Geosciences, Wuhan 430079, China
² School of Economics and Management, China University of Geosciences, Wuhan 430079, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(7), 3507; https://doi.org/10.3390/app16073507

Submission received: 14 February 2026 / Revised: 10 March 2026 / Accepted: 27 March 2026 / Published: 3 April 2026

(This article belongs to the Special Issue Cardiac Imaging and Heart Diseases: Recent Progress)


Featured Application

The proposed SAS-SemiUNet++ framework enables high-precision segmentation of cardiac substructures (left/right ventricle, myocardium) from cardiac cine-MRI images and can be directly applied as a computer-aided diagnosis tool for clinical cardiovascular disease assessment (e.g., cardiomyopathy, myocardial infarction), supporting disease diagnosis, personalized treatment planning and prognostic evaluation.

Abstract

Precise segmentation of cardiac substructures in magnetic resonance imaging is pivotal for diagnosis and treatment planning but remains impeded by anatomical scale heterogeneity and the scarcity of high-quality pixel-level annotations. Existing deep learning paradigms often struggle to simultaneously resolve the global geometry of ventricular cavities and the fine-grained boundaries of the myocardium, particularly in low-data regimes. To address these challenges, we propose SAS-SemiUNet++, a holistic semi-supervised segmentation framework. This architecture incorporates two novel mechanisms: (1) The Scale-Aware Semantic Recalibration (SASR) unit, which functions as a dynamic semantic gate to adaptively adjust receptive fields, mimicking a radiologist’s variable-focus mechanism to capture multi-scale anatomical details, and (2) Stochastic Consistency Regularization (SCR), a dual-path perturbation strategy that enforces geometric invariance on unlabeled data, thereby mitigating overfitting to noisy pseudo-labels. Comprehensive evaluations on the ACDC benchmark demonstrate that SAS-SemiUNet++ significantly outperforms state-of-the-art methods, achieving superior segmentation accuracy and boundary fidelity, particularly in reducing the 95% Hausdorff distance. This study presents a data-efficient and robust solution for cardiac image analysis, offering potential for scalable clinical deployment.

Keywords: cardiac MRI segmentation; scale-aware semantic recalibration; stochastic consistency regularization; semi-supervised medical image analysis

1. Introduction

Medical image segmentation aims to precisely isolate anatomical structures or pathological regions from the background in medical images, providing quantitative data for disease diagnosis, treatment planning, and prognosis evaluation [1,2,3,4,5]. With the rapid advancements in imaging technologies such as magnetic resonance imaging (MRI) and computed tomography (CT), high-resolution, multimodal 3D medical data have been widely applied in clinical settings [6,7,8]. However, these data also bring higher annotation costs and more complex computational challenges—particularly in the field of cardiac imaging.

Cardiac MRI segmentation is one of the classic challenges in medical image analysis [9,10,11]. Cardiac structures, including the left ventricle, right ventricle, and myocardium, are anatomically complex and highly deformable. Additionally, motion artifacts from cardiac contraction and relaxation, low contrast, and blurry boundaries between the blood pool and surrounding tissues significantly increase the difficulty of segmentation. Moreover, variations in image styles due to differences in equipment, scanning parameters, and subjects pose serious domain adaptation challenges for segmentation models in multi-center clinical applications [12,13]. Most importantly, high-quality manual annotations are both time-consuming and costly, which often limits the scale of datasets and hinders the improvement of algorithm robustness [14,15,16].

In recent years, deep learning methods, particularly U-Net and its variants (such as U-Net++, Efficient-U-Net, and TransUNet) [17,18,19,20], have achieved remarkable progress in cardiac MRI segmentation tasks. By leveraging strategies such as skip connections, multi-scale feature fusion, and deep supervision, these models have significantly improved segmentation accuracy. Building on these U-Net-based architectures, numerous approaches have been proposed for cardiac MRI segmentation. For instance, Wong et al. [21] introduced a U-Net variant, GCW-UNet, which incorporates Gaussian blur and channel weighting. Gaussian blur is used to obtain MRIs at different resolutions: high-resolution MRIs reveal detailed left atrial features, while low-resolution MRIs highlight the overall contours of the left atrium, effectively addressing challenges posed by small MRI features. Adaptive channel weighting further enhances the network’s segmentation capabilities for atrial regions. Li et al. [22] proposed an end-to-end architecture for automatic multi-task cardiac segmentation in MRIs, utilizing a U-shaped network to simultaneously segment the left ventricle blood pool, left ventricular myocardium, right ventricle blood pool, myocardial edema, and myocardial scars. This work employs multimodal data as network input and utilizes a shared encoder to extract features from different modalities. Kumar et al. [23] developed a deep residual neural network with block attention, named CBAR-UNet, for cardiac image segmentation on short-axis MRI stacks. The model is trained on 2D slices derived from 3D MRIs and applies contrast-limited adaptive histogram equalization to the 2D dataset. Gomathi et al. [24] introduced DPA-UNet, an attention-based model for segmenting ventricular regions. This method employs two modules: channel attention, using max pooling to extract dominant features for localized segmentation, and spatial attention, using average pooling to capture global features of the ventricular region before integration. Cui et al. [25] proposed an end-to-end deep learning segmentation method for delineating myocardial infarction and edema regions in the left ventricle. This approach utilizes a six-layer deep U-Net architecture with a symmetric encoder–decoder pathway for hierarchical feature representation.

Firouznia et al. [26] developed PoinUNet, which integrates Poincaré embedding layers into a 3D U-Net to enhance segmentation of the left atrium wall and epicardial adipose tissue in Dixon MRI data. By leveraging hyperbolic space learning, PoinUNet captures the complex relationships between the LA and EAT while addressing class imbalance and geometric challenges with a new loss function. Wang et al. [27] proposed SK-Unet, a deep neural network for segmenting the left ventricle, right ventricle, and left ventricular myocardium in late gadolinium enhancement cardiac MRIs. SK-Unet incorporates squeeze-and-excitation residual modules into the encoder and selective kernel modules into the decoder to enhance the original U-Net architecture. Supplementary information from cine and T2-weighted CMRIs is also utilized when available. Islam et al. [28] introduced CoST-UNet, a novel hybrid architecture combining CNNs to capture spatial information and transformers to emphasize deeper global contextual insights. Unlike existing hybrid models such as TransUNet and UNETR, the proposed model employs a Swin Transformer backbone, ensuring linear computational complexity relative to image size. Fayouka et al. [29] compared the accuracy of 2D and 3D U-Net models for segmenting the left ventricle using MRI images. Li et al. [30] proposed RSU-Net, which combines the advantages of residual connections and self-attention mechanisms. The model introduces a Bottom Self-Attention block to aggregate global information, achieving promising segmentation results on cardiac datasets and contributing to future diagnosis of cardiovascular patients.

While the aforementioned studies have made significant contributions, there remain several challenges: (1) limited feature representation capability of models under noise, artifacts, and complex backgrounds; (2) high dependency on limited labeled data, which can lead to overfitting and hinder generalization; and (3) insufficient utilization of large amounts of unlabeled data to alleviate the annotation bottleneck. Therefore, it is imperative to design a cardiac segmentation framework that enhances multi-scale feature adaptive selection, ensures stable training, and effectively leverages unlabeled samples.

To navigate the aforementioned challenges, this study proposes SAS-SemiUNet++, a novel semi-supervised cardiac MRI segmentation framework that harmonizes scale-aware feature extraction with stochastic consistency learning. Unlike conventional approaches that rely on static convolutional kernels, our method introduces SASR units into the U-Net++ backbone. These units achieve adaptive feature recalibration across channel dimensions, allowing the network to dynamically shift its focus between large ventricular regions and thin myocardial walls. Simultaneously, to counteract the label scarcity and potential noise in semi-supervised learning, we employ SCR. This mechanism enforces a rigorous consistency constraint on the model’s predictions under dual-path stochastic perturbations, thereby enhancing training stability and generalization on unannotated data.

The contributions of this study are as follows:

We present a unified semi-supervised architecture that synergistically integrates anatomical scale awareness with consistency regularization. To the best of our knowledge, this is the first framework to couple dynamic receptive field recalibration with stochastic perturbation constraints within a nested dense-skip architecture for cardiac MRI segmentation.
We propose the SASR unit as a plug-and-play semantic gate. By dynamically reweighting multi-scale features, this module significantly enhances the representation of fine-grained anatomical details that are typically lost in standard encoders.
We introduce a dual-path consistency paradigm that regularizes the model against decision noise. This strategy effectively leverages unlabeled data to smooth decision boundaries, substantially improving robustness and geometric fidelity in low-data regimes.
Extensive experiments on the ACDC dataset validate that SAS-SemiUNet++ not only achieves state-of-the-art overlap metrics but also significantly reduces the 95% Hausdorff distance, demonstrating its exceptional capability in preserving precise anatomical boundaries.

2. Materials and Methods

2.1. ACDC Dataset

This study utilizes the publicly available dataset from the MICCAI 2017 Automatic Cardiac Diagnosis Challenge (ACDC). Specifically, we used the official 2017 release accessed via the CREATIS online platform (https://acdc.creatis.insa-lyon.fr, accessed on 20 September 2025). This ensures complete reproducibility based on the standardized public release. The sample is shown in Figure 1. The ACDC dataset comprises short-axis cine-MRI sequences of subjects from Siemens, Philips, and GE scanners, encompassing five clinical categories: normal, dilated cardiomyopathy, hypertrophic cardiomyopathy, myocardial infarction, and abnormal right ventricle. This dataset demonstrates diversity in terms of multi-center, multi-vendor, and multi-pathological characteristics. Each case is manually annotated by expert radiologists at both end-diastole and end-systole, with binary masks provided for three structures: the left ventricular cavity, right ventricular cavity, and myocardium, ensuring annotation accuracy.

To ensure a rigorous and transparent evaluation, the data-splitting protocol is explicitly defined at both the subject and slice levels. The complete dataset consists of 200 independent subject cases, which were partitioned into 140 cases for training, 20 cases for validation, and 40 cases for testing. Following the extraction of 2D slices from the 3D cine-MRI sequences across the annotated cardiac phases, a total of 1902 slices were generated. Accordingly, the slice-level distribution yields 1312 slices for training, 196 slices for validation, and 394 slices for testing. The detailed dataset splitting information is summarized in Table 1.

All sequences are resampled to a uniform voxel spacing of 1.25 × 1.25 × 8 mm³ and spatially center-cropped to 224 × 224 pixels. Additionally, sequence images are standardized by their mean and standard deviation to mitigate intensity differences across subjects and scanners.
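The cropping and intensity standardization described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function names `center_crop` and `standardize` are hypothetical, the statistics are computed per 2D image (the text does not say whether they are taken per slice or per sequence), slices smaller than the crop size are rejected rather than padded, and the resampling to 1.25 × 1.25 × 8 mm³ spacing is not covered.

```python
import math


def center_crop(image, size=224):
    """Center-crop a 2D image (a list of rows) to size x size pixels.

    Assumption: the image is at least `size` pixels in both dimensions;
    padding of smaller slices is outside the scope of this sketch.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def standardize(image, eps=1e-8):
    """Scale an image to zero mean and unit variance using its own statistics.

    Assumption: per-image statistics; `eps` guards against a constant image.
    """
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((value - mean) ** 2 for value in pixels) / len(pixels)
    std = math.sqrt(variance)
    return [[(value - mean) / (std + eps) for value in row] for row in image]
```

Applying `standardize(center_crop(slice_2d))` to each resampled slice would give a 224 × 224 input with comparable intensity ranges across subjects and scanners.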

2.2. Architectural Overview of SAS-SemiUNet++

The proposed SAS-SemiUNet++ constitutes a holistic semi-supervised segmentation framework designed to address the challenges of anatomical scale variation and label uncertainty in cardiac MRI analysis. As illustrated in Figure 2, the architecture leverages the nested dense skip pathways of U-Net++ as the topological backbone. We select this specific backbone because its nested structure effectively bridges the semantic gap between encoder and decoder feature maps. This capability is critical for preserving fine-grained boundaries of cardiac substructures that often vanish in standard architectures due to rigid down-sampling operations. The framework synergistically orchestrates a hierarchical encoder, a densely connected decoder, and our proposed SASR units to form a unified segmentation engine.

In the encoding trajectory, we employ VGG-style blocks as the fundamental feature extraction units. Each block comprises consecutive convolutional layers followed by Batch Normalization and ReLU activation to abstract high-level semantic features while maintaining structural integrity. To overcome the fixed receptive field limitation of standard convolutions, we embed the SASR units at the strategic junctions of the skip connections. These units function as semantic gates and dynamically recalibrate the feature response maps before fusion in the decoder. The SASR units allow the decoder to focus on structurally relevant information by adaptively filtering features based on anatomical scale. This mechanism amplifies the faint signals of the thin myocardium while simultaneously capturing the global geometry of the ventricular cavities. To further facilitate gradient flow and accelerate convergence during the training of this deep architecture, we incorporate a deep supervision strategy. Auxiliary segmentation heads attach to multiple semantic scales of the decoder to ensure that discriminative features are learned at every level of abstraction.

Beyond the architectural connectivity, the SAS-SemiUNet++ explicitly supports a robust semi-supervised learning paradigm. The network implementation integrates the SCR mechanism to alter the training dynamics. Instead of relying on a single deterministic forward pass, the input undergoes a dual-path stochastic propagation. This design enforces geometric consistency across two perturbed views of the same input and effectively regularizes the model against the high-frequency noise often found in pseudo-labels derived from unlabeled data. By harmonizing the feature-level recalibration of SASR with the prediction-level consistency of SCR, SAS-SemiUNet++ achieves a robust balance between segmentation precision and generalization capability. It effectively utilizes both labeled and unlabeled data streams to delineate complex cardiac anatomy.

2.3. Scale-Aware Semantic Recalibration Unit

In the intricate landscape of semi-supervised cardiac MRI segmentation, a pivotal challenge arises from the heterogeneous anatomical scales of cardiac substructures. The left ventricle typically presents as a large, contiguous region, whereas the right ventricular myocardium manifests as a thin, irregular crescent geometry. CNNs, constrained by static receptive fields, often struggle to simultaneously capture global structural integrity for large organs and fine-grained boundary details for smaller tissues. Furthermore, within the semi-supervised paradigm, pseudo-labels derived from unlabeled data inherently contain high-frequency noise along ambiguous boundaries. To mitigate these limitations, we propose the SASR unit. This module is engineered to dynamically adjust the effective receptive field size based on the specific semantic context of the input features, akin to a radiologist adapting their visual focus when scrutinizing different anatomical regions.

As shown in Figure 3, the SASR unit operates through a Split-Fuse-Select mechanism designed to recalibrate feature responses adaptively. Let $X \in \mathbb{R}^{H \times W \times C}$ denote the input feature map. To accommodate multi-scale representations, we first perform a multipath transformation, applying distinct kernels with varying receptive field sizes to the input $X$. This yields a set of intermediate feature representations $U = \{U_{1}, U_{2}, \dots, U_{K}\}$, where $U_{k} = F_{k}(X)$ represents the feature map generated by the k-th kernel branch. These branches are subsequently aggregated via element-wise summation to integrate multi-scale information into a unified global descriptor $S$:

$$S = \sum_{k=1}^{K} U_{k}. \tag{1}$$

To explicitly model the inter-channel dependencies and capture global semantic context, we employ global average pooling to squeeze the spatial dimensions of $S$. The channel-wise statistic $z \in \mathbb{R}^{C}$ is computed by shrinking $S$ through its spatial dimensions $H \times W$:

$$z_{c} = G_{\mathrm{gap}}(S_{c}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} S_{c}(i, j), \tag{2}$$

where $z_{c}$ denotes the c-th element of the squeezed vector.

To enable adaptive selection, a compact multi-layer perceptron acts as a bottleneck structure to estimate the importance weights for each scale branch. This process generates a set of channel-wise attention vectors. Specifically, a Softmax operator is applied across the kernel dimension to ensure that the attention weights for the c-th channel, denoted as $\alpha_{c,k}$, form a valid probability distribution such that $\sum_{k=1}^{K} \alpha_{c,k} = 1$. The final recalibrated feature map $V \in \mathbb{R}^{H \times W \times C}$ is obtained by the weighted aggregation of the original multi-scale transformations:

$$V_{c} = \sum_{k=1}^{K} \alpha_{c,k} \cdot U_{k,c}. \tag{3}$$

By virtue of this formulation, the SASR unit effectively suppresses noise from unreliable pseudo-labels by down-weighting inconsistent channels while amplifying features that align with the dominant anatomical scale. This dynamic recalibration significantly enhances the model’s robustness against the inherent uncertainties of semi-supervised learning.
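The fuse, squeeze, and select steps of Equations (1) to (3) can be traced on plain nested lists. The following is a minimal sketch, not the authors' implementation: `sasr_select` and `logits_fn` are hypothetical names, `logits_fn` stands in for the bottleneck multi-layer perceptron (whose layer sizes the text does not specify), and the feature maps are indexed channel-first as `branches[k][c][i][j]` for convenience, whereas the text writes shapes as H × W × C.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sasr_select(branches, logits_fn):
    """Recalibrate K branch outputs U_k into V following Eqs. (1)-(3).

    branches[k][c][i][j] holds U_k; logits_fn maps the pooled vector z
    (length C) to logits[k][c]. Returns (V, alpha) with alpha[c][k].
    """
    K = len(branches)
    C = len(branches[0])
    H = len(branches[0][0])
    W = len(branches[0][0][0])

    # Eq. (1): fuse the branches by element-wise summation.
    S = [[[sum(branches[k][c][i][j] for k in range(K))
           for j in range(W)] for i in range(H)] for c in range(C)]

    # Eq. (2): global average pooling over the spatial dimensions.
    z = [sum(sum(row) for row in S[c]) / (H * W) for c in range(C)]

    # Hypothetical stand-in for the bottleneck MLP: one logit per branch and channel.
    logits = logits_fn(z)

    # Softmax across the kernel dimension, so the weights of each channel sum to 1.
    alpha = [softmax([logits[k][c] for k in range(K)]) for c in range(C)]

    # Eq. (3): channel-wise weighted aggregation of the branch outputs.
    V = [[[sum(alpha[c][k] * branches[k][c][i][j] for k in range(K))
           for j in range(W)] for i in range(H)] for c in range(C)]
    return V, alpha
```

With uniform logits the unit reduces to a plain average of the branches; any preference for one receptive field over another comes entirely from how the logits depend on the pooled descriptor $z$.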

2.4. Stochastic Consistency Regularization via Dual-Path Perturbation

To mitigate the inherent risk of overfitting in deep segmentation networks, especially when these models are trained on limited annotated cardiac MRI datasets, we introduce an SCR mechanism. In clinical scenarios, cardiac images are often plagued by motion artifacts and low-contrast boundaries between the myocardium and the blood pool. A robust segmentation model must exhibit epistemic certainty, meaning its predictions should remain structurally invariant even when internal parameters are subject to stochastic perturbations. Standard training paradigms, which rely solely on minimizing pixel-wise cross-entropy loss, often fail to enforce this geometric stability, leading to fragmented segmentation maps and poor generalization on unseen data. The SCR mechanism addresses this by enforcing a rigorous consistency constraint on the model’s output probability distributions under distinctive stochastic views.

From a theoretical perspective, the SCR module constructs a dual-path forward propagation paradigm. Let $D = \{(x_{i}, y_{i})\}_{i=1}^{N}$ denote the labeled training set. During the training phase, for a given input tensor $x_{i}$, we perform two independent forward passes through the network. Crucially, these passes are modulated by two distinct, randomly generated Dropout masks (with a standard dropout rate of $p = 0.5$), denoted as $\xi_{1}$ and $\xi_{2}$, where each element of the mask follows a Bernoulli distribution $\xi \sim \mathrm{Bernoulli}(1 - p)$. This process yields two divergent probabilistic predictions, $P(y \mid x_{i}, \xi_{1})$ and $P(y \mid x_{i}, \xi_{2})$. While these two distributions originate from the same input, the stochasticity of the Dropout layers introduces a latent perturbation, effectively simulating the epistemic uncertainty associated with the model’s parameter space.

To enforce the model’s robustness against these internal perturbations, we formulate a bi-directional consistency objective. We employ the symmetric Kullback–Leibler (KL) divergence to quantify this inconsistency. To ensure numerical stability and resolution invariance, the regularization term $L_{\mathrm{SCR}}$ is formulated pixel-wise and explicitly normalized across the spatial dimensions and the batch size. For a batch size $B$ and spatial resolution $H \times W$, it is defined as

$$L_{\mathrm{SCR}} = \frac{1}{B \times H \times W} \sum_{b=1}^{B} \sum_{j=1}^{H \times W} \frac{1}{2} \left[ D_{\mathrm{KL}}\left(P(y \mid x_{b,j}, \xi_{1}) \,\|\, P(y \mid x_{b,j}, \xi_{2})\right) + D_{\mathrm{KL}}\left(P(y \mid x_{b,j}, \xi_{2}) \,\|\, P(y \mid x_{b,j}, \xi_{1})\right) \right], \tag{4}$$

where $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{z} P(z) \log \frac{P(z)}{Q(z)}$ represents the standard KL divergence. By minimizing $L_{\mathrm{SCR}}$, we compel the network to project distinct stochastic views into a unified, low-entropy semantic space.

Consequently, the final objective function for the supervised branch is a weighted combination of the task-specific segmentation loss and the consistency regularization term. The segmentation loss, typically a combination of Cross-Entropy and Dice loss, ensures pixel-level accuracy, while the SCR term acts as a soft constraint, smoothing the decision boundary and preventing the model from memorizing high-frequency noise inherent in the training data. This dual-constraint approach significantly enhances the geometric completeness of the segmented cardiac structures, ensuring that the prediction boundaries remain smooth and topologically consistent even in the presence of challenging image artifacts.
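To make the normalization in Equation (4) concrete, the sketch below evaluates the symmetric KL term on two sets of per-pixel class distributions. It is a sketch under assumptions rather than the authors' code: the names `kl_divergence`, `symmetric_kl`, and `scr_loss` are hypothetical, the inputs are assumed to be already-normalized softmax outputs from the two Dropout passes, and the small constant `eps` is an added safeguard against `log(0)` that the text does not mention.

```python
import math


def kl_divergence(p, q, eps=1e-8):
    """D_KL(P || Q) for two discrete class distributions at a single pixel.

    `eps` is an assumed stabilizer; it is not part of Eq. (4).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q, eps=1e-8):
    """Average of the two directed KL divergences between p and q."""
    return 0.5 * (kl_divergence(p, q, eps) + kl_divergence(q, p, eps))


def scr_loss(probs_1, probs_2, eps=1e-8):
    """Eq. (4): mean symmetric KL over every pixel of every image in the batch.

    probs_x[b][j] is the class distribution at flattened pixel j of image b,
    produced by the forward pass under Dropout mask xi_x.
    """
    total = 0.0
    count = 0  # ends up equal to B * H * W
    for image_1, image_2 in zip(probs_1, probs_2):
        for p, q in zip(image_1, image_2):
            total += symmetric_kl(p, q, eps)
            count += 1
    return total / count
```

The loss is zero when the two passes agree at every pixel and grows as their class distributions diverge, and because both KL directions are averaged it does not matter which pass is treated as the reference.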

2.5. Holistic Semi-Supervised Paradigm via Uncertainty-Rectified Consistency

To transcend the limitations of annotated data scarcity in cardiac MRI segmentation, we extend the proposed SCR into a Holistic Semi-Supervised Paradigm. While the supervised branch enforces structural integrity on labeled samples, the vast majority of clinical data remains unlabeled. Our proposed framework operates on the hypothesis that the decision boundaries of a robust segmentation network should lie in low-density regions of the data manifold. Therefore, even without ground-truth annotations, the model’s predictions for unlabeled inputs should remain perturbation-invariant.

We construct a unified training objective that seamlessly integrates supervised instruction with unsupervised consistency constraints. Let $D_{L} = \{(x_{i}, y_{i})\}_{i=1}^{N_{L}}$ and $D_{U} = \{u_{j}\}_{j=1}^{N_{U}}$ denote the labeled and unlabeled datasets, respectively, where $N_{U} \gg N_{L}$. In each training iteration, we sample a mixed mini-batch $B = B_{L} \cup B_{U}$. For every unlabeled sample $u_{j} \in B_{U}$, we apply the same dual-path perturbation mechanism, generating two stochastic probability maps, $P(y \mid u_{j}, \xi_{1})$ and $P(y \mid u_{j}, \xi_{2})$, under different Dropout masks $\xi$.

The unsupervised consistency loss, $L_{\mathrm{unsup}}$, is formulated to penalize the semantic discrepancy between these dual views. By minimizing the symmetric KL divergence on the unlabeled data, we force the network to yield confident and consistent predictions on the unannotated manifold:

$$L_{\mathrm{unsup}} = \frac{1}{|B_{U}|} \sum_{u_{j} \in B_{U}} \frac{1}{2} \left[ D_{\mathrm{KL}}(P_{1} \,\|\, P_{2}) + D_{\mathrm{KL}}(P_{2} \,\|\, P_{1}) \right], \tag{5}$$

where $P_{1}$ and $P_{2}$ are shorthands for the distributions under masks $\xi_{1}$ and $\xi_{2}$. This constraint effectively propagates the geometric knowledge learned from $D_{L}$ to $D_{U}$, acting as a form of self-training that regularizes the model against decision noise.

The final objective function $L_{\mathrm{total}}$ is a dynamic linear combination of the supervised segmentation loss and the unsupervised consistency term:

$$L_{\mathrm{total}} = L_{\mathrm{sup}}(D_{L}) + \lambda(t) \cdot L_{\mathrm{unsup}}(D_{U}). \tag{6}$$

Here, $\lambda(t)$ is a time-dependent weighting factor governed by a Gaussian ramp-up function, $\lambda(t) = \lambda_{\max} \cdot e^{-5(1 - t/T)^{2}}$, where $t$ is the current epoch and $T$ is the ramp-up period. Based on our empirical tuning, we explicitly set the maximum consistency weight as $\lambda_{\max} = 0.1$ and the ramp-up length as $T = 200$. This dynamic scheduling strategy allows the model to prioritize supervised learning in the early stages to establish a reliable feature representation, before progressively integrating the unsupervised consistency signals to refine the decision boundaries, thereby ensuring training stability.
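The schedule and the combined objective of Equation (6) are simple enough to write out directly. The sketch below uses the stated values (maximum weight 0.1, ramp-up length 200); the function names are hypothetical, and holding the weight at its maximum once $t$ exceeds $T$ is an assumption, since the text only defines the ramp-up itself.

```python
import math


def consistency_weight(t, lambda_max=0.1, ramp_up=200):
    """Gaussian ramp-up: lambda(t) = lambda_max * exp(-5 * (1 - t / T) ** 2).

    Assumption: t / T is clamped to [0, 1], so the weight stays at
    lambda_max after the ramp-up period ends.
    """
    progress = min(max(t / ramp_up, 0.0), 1.0)
    return lambda_max * math.exp(-5.0 * (1.0 - progress) ** 2)


def total_loss(sup_loss, unsup_loss, t, lambda_max=0.1, ramp_up=200):
    """Eq. (6): supervised loss plus the ramped consistency term."""
    return sup_loss + consistency_weight(t, lambda_max, ramp_up) * unsup_loss
```

At $t = 0$ the weight is $0.1 \cdot e^{-5}$, roughly $6.7 \times 10^{-4}$, so the consistency term is almost switched off while the supervised loss shapes the initial features; it then rises smoothly to 0.1 at $t = T$.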

3. Results

3.1. Experimental Conditions

The experimental environment for this study is shown in Table 2. The semi-supervised training protocol is strictly controlled to ensure full reproducibility. To construct the semi-supervised batches, we utilize a two-stream batch sampler. Each training iteration processes a total batch size of 16, split equally into 8 labeled and 8 unlabeled images. During training, standard data augmentation strategies are applied to capture diverse anatomical contexts. A consistent Dropout rate of 0.5 is maintained throughout the network to induce the requisite epistemic uncertainty for the SCR mechanism. The model parameters are updated over a total of 30,000 iterations using the stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a weight decay coefficient of 0.0001. The initial learning rate is set to 0.01 and adjusted using a polynomial decay schedule as training progresses. To evaluate the statistical stability and reproducibility of the model performance, all experiments (including ablation and comparative studies) are conducted with 5 independent training runs using different random seeds; the final results are summarized as Mean ± SD in Table 3 and Table 4. Each seed controls the network weight initialization, the order of data batch shuffling during training, the probabilistic transformations applied during data augmentation, and the generation of the stochastic Dropout masks essential for the SCR mechanism. The extremely low standard deviation (SD) reported in these tables therefore reflects algorithmic stability across independent training runs.
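The batch composition and learning-rate schedule described above can be illustrated as follows. This is a hedged sketch only: `two_stream_batches` and `poly_lr` are hypothetical names, the decay exponent `power=0.9` is a commonly used value that the text does not state, and the way the training slices are divided into labeled and unlabeled pools is likewise not specified here, so the two ID lists are simply taken as inputs.

```python
import random


def poly_lr(iteration, base_lr=0.01, max_iterations=30000, power=0.9):
    """Polynomial decay of the learning rate.

    Assumption: power=0.9; the text names the schedule but not its exponent.
    """
    return base_lr * (1.0 - iteration / max_iterations) ** power


def two_stream_batches(labeled_ids, unlabeled_ids, labeled_per_batch=8,
                       unlabeled_per_batch=8, seed=0):
    """Yield index batches made of 8 labeled followed by 8 unlabeled samples.

    Each stream is reshuffled whenever it is exhausted; an incomplete tail
    at the end of a pass is dropped (an assumption of this sketch).
    """
    if len(labeled_ids) < labeled_per_batch or len(unlabeled_ids) < unlabeled_per_batch:
        raise ValueError("each pool must hold at least one full sub-batch")
    rng = random.Random(seed)

    def stream(ids, size):
        while True:
            shuffled = list(ids)
            rng.shuffle(shuffled)
            for start in range(0, len(shuffled) - size + 1, size):
                yield shuffled[start:start + size]

    labeled = stream(labeled_ids, labeled_per_batch)
    unlabeled = stream(unlabeled_ids, unlabeled_per_batch)
    while True:
        yield next(labeled) + next(unlabeled)
```

Because the two streams cycle independently, the smaller labeled pool is revisited more often than the unlabeled pool, which is the usual behaviour of a two-stream sampler.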

3.2. Evaluation Metrics

To comprehensively evaluate the performance of the model on medical image segmentation tasks, this study adopts the following six commonly used quantitative evaluation metrics. Given $C$ classes, the predicted region and ground truth for class $c$ are denoted as $P_{c}$ and $G_{c}$, respectively, with the total number of pixels being $N$, and the total number of ground truth pixels for class $c$ being $N_{c} = |G_{c}|$. The metrics are defined as follows:

$$\mathrm{mean\_dice} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{Dice}_{c}, \tag{7}$$

$$\mathrm{mean\_hd95} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{HD}_{95}(P_{c}, G_{c}), \tag{8}$$

$$\mathrm{Acc}_{c} = \frac{TP_{c} + TN_{c}}{TP_{c} + TN_{c} + FP_{c} + FN_{c}}, \tag{9}$$

$$\mathrm{Acc\_class} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{Acc}_{c}, \tag{10}$$

$$\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_{c}, \tag{11}$$

$$\mathrm{fwIoU} = \sum_{c=1}^{C} \frac{N_{c}}{N} \mathrm{IoU}_{c} = \frac{1}{N} \sum_{c=1}^{C} |G_{c}| \frac{|P_{c} \cap G_{c}|}{|P_{c} \cup G_{c}|}. \tag{12}$$

Among these formulas, $TP_{c}$, $TN_{c}$, $FP_{c}$ and $FN_{c}$ represent True Positive, True Negative, False Positive, and False Negative for class $c$, respectively.

To comprehensively assess the model’s performance, the physical and geometric significance of each metric is briefly described below. Mean Dice is a primary metric that evaluates the spatial overlap between the predicted region and the ground truth, providing a direct measure of overall segmentation accuracy. Unlike overlap metrics, the mean 95% Hausdorff Distance measures the spatial distance between the boundaries of the prediction and the ground truth. By taking the 95th percentile, it effectively captures boundary delineation and shape fidelity while remaining robust to minor outlier pixels. Accuracy quantifies the proportion of correctly identified pixels for a specific class, and class accuracy averages this performance across all structures, reflecting the model’s global classification capability. Mean Intersection over Union calculates the ratio of the overlapping area to the combined area of the prediction and ground truth, rigorously evaluating spatial alignment by penalizing both false positives and false negatives. Finally, Frequency-Weighted IoU weights the IoU of each class by its pixel frequency within the dataset, offering a balanced evaluation that accounts for the significant size disparities between large ventricular cavities and the thin myocardium.
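As an illustration of how the overlap and accuracy metrics in Equations (7) and (9) to (12) relate to the confusion counts, the sketch below evaluates them on flat label lists. Several points are assumptions: the function names are hypothetical, $\mathrm{Dice}_{c}$ and $\mathrm{IoU}_{c}$ use their standard definitions $2TP/(2TP + FP + FN)$ and $TP/(TP + FP + FN)$ (the text does not spell them out), a class absent from both prediction and ground truth is scored as 1.0, and HD95 is omitted because it requires boundary extraction and distance computations.

```python
def confusion_counts(pred, truth, cls):
    """Return (TP, TN, FP, FN) for one class over flat label lists."""
    tp = sum(1 for p, g in zip(pred, truth) if p == cls and g == cls)
    fp = sum(1 for p, g in zip(pred, truth) if p == cls and g != cls)
    fn = sum(1 for p, g in zip(pred, truth) if p != cls and g == cls)
    tn = len(truth) - tp - fp - fn
    return tp, tn, fp, fn


def segmentation_metrics(pred, truth, classes):
    """Compute mean Dice, class accuracy, mIoU, and fwIoU (Eqs. 7, 10-12)."""
    n = len(truth)
    dice, acc, iou = [], [], []
    fw_iou = 0.0
    for cls in classes:
        tp, tn, fp, fn = confusion_counts(pred, truth, cls)
        overlap_denominator = tp + fp + fn
        # Assumed convention: an empty class in both maps counts as a perfect match.
        dice_c = 2 * tp / (2 * tp + fp + fn) if overlap_denominator else 1.0
        iou_c = tp / overlap_denominator if overlap_denominator else 1.0
        dice.append(dice_c)
        iou.append(iou_c)
        acc.append((tp + tn) / n)            # Eq. (9)
        fw_iou += (tp + fn) / n * iou_c      # N_c / N weighting of Eq. (12)
    count = len(classes)
    return {
        "mean_dice": sum(dice) / count,
        "acc_class": sum(acc) / count,
        "miou": sum(iou) / count,
        "fwiou": fw_iou,
    }
```

Note that true negatives dominate the per-class accuracy when a structure is small, which is one reason the Dice and IoU family is usually read alongside it.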

3.3. Ablation Study on Component Efficacy

To rigorously validate the independent contributions and the synergistic efficacy of the proposed modules within the SAS-SemiUNet++ framework, we conduct a comprehensive ablation study on the ACDC validation set with 5 independent runs. The quantitative results, with statistical dispersion measures, appear in Table 3; the standard U-Net++ trained under a supervised paradigm serves as the baseline model. We incrementally integrate the SASR unit and the SCR mechanism to isolate the performance gains attributed to each component.

The baseline U-Net++ yields a mean Dice score of 0.9030 and an HD95 of 1.2050. Upon integrating the SASR unit into the skip connections, we observe an improvement in the mean Dice score to 0.9074 and the mean IoU to 0.8757. This increment suggests that the adaptive receptive field mechanism successfully captures multi-scale anatomical features and enhances the overall region overlap. However, the mean HD95 degrades to 1.4288. A detailed per-class analysis reveals that this increase in boundary error is predominantly localized within the right ventricular cavity and the myocardium, which are characterized by thin, irregular geometries and low-contrast interfaces. While the SASR unit successfully adapts receptive fields to capture the global semantic context, its dynamic multi-scale aggregation inadvertently increases the model’s sensitivity to localized, high-frequency boundary noise when trained under limited supervision. This localized variance indicates that scale-awareness alone is insufficient for morphological robustness and motivates the introduction of the SCR mechanism. When the two are coupled, the SCR explicitly enforces geometric stability and smooths the decision boundaries, offsetting the boundary degradation caused by SASR and bringing the complete model to the best mean HD95 of 1.1516.

Subsequently, we evaluate the impact of introducing the SCR mechanism alone. This configuration achieves a mean Dice of 0.9050 and maintains a comparable HD95 of 1.2325 relative to the baseline. These metrics suggest that SCR regularizes the training process and limits overfitting to noisy labels, thereby improving the model’s generalization potential. The largest gain emerges when we couple both modules in the complete SAS-SemiUNet++ architecture. This holistic integration yields the highest mean Dice of 0.9142 (0.0112 above the baseline) and, most notably, reduces the HD95 from 1.2050 to 1.1516. This improvement in boundary precision points to a synergistic effect: the SASR unit provides rich and scale-sensitive feature representations, while the SCR mechanism enforces geometric consistency on these features. This combination allows the model to achieve the best segmentation accuracy in terms of both volumetric overlap and boundary delineation.
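The size of each ablation effect is easier to judge as a difference from the baseline. The short sketch below does that arithmetic using only the mean values quoted in the text; the row labels and the helper name `deltas` are illustrative choices, not the labels used in Table 3, and the standard deviations are not reproduced here.

```python
# Mean values quoted in the ablation discussion; row labels are illustrative.
ablation = {
    "baseline": {"dice": 0.9030, "hd95": 1.2050},
    "sasr_only": {"dice": 0.9074, "hd95": 1.4288},
    "scr_only": {"dice": 0.9050, "hd95": 1.2325},
    "sasr_and_scr": {"dice": 0.9142, "hd95": 1.1516},
}


def deltas(results, baseline="baseline"):
    """Absolute Dice gain and relative HD95 change of each row versus the baseline."""
    base = results[baseline]
    table = {}
    for name, row in results.items():
        table[name] = {
            "dice_gain": row["dice"] - base["dice"],
            "hd95_change_pct": 100.0 * (row["hd95"] - base["hd95"]) / base["hd95"],
        }
    return table
```

On these means, the full model lowers HD95 by about 4.4% relative to the baseline, whereas SASR alone raises it by about 18.6%, which is the boundary degradation that the consistency term offsets.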

3.4. Comparative Performance Analysis

To comprehensively benchmark the segmentation proficiency of the proposed SAS-SemiUNet++ framework, we conduct a comparative analysis against a spectrum of established medical image segmentation architectures. The evaluation suite encompasses classic CNN-based models (U-Net, SegNet), efficiency-oriented designs (EfficientUnet), attention-augmented networks (AttU-Net), and the hybrid Transformer-CNN paradigm (TransUNet). We train all competing models on the identical ACDC training subset and evaluate them under strictly controlled hyperparameters to ensure a fair assessment of architectural capabilities.

All competing models are trained under the identical experimental setup with 5 independent runs for fair comparison, and the results are summarized in Table 4 as Mean ± SD to reflect run-to-run variability. As evidenced by Table 4, SAS-SemiUNet++ achieves the best Dice, HD95, accuracy, mIoU, and fwIoU among the compared models; only the class accuracy of AttU-Net (0.9422) exceeds that of our model (0.9376). Specifically, our model attains a mean Dice score of 0.9142 and an mIoU of 0.8844, surpassing the runner-up model, AttU-Net, which attains a Dice score of 0.9052. This lead in overlap metrics suggests that the SASR units enhance the feature representation of cardiac substructures by dynamically adjusting receptive fields to match the anatomical scale.
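
As a reminder of how the overlap metrics discussed above are typically obtained, the following is a minimal sketch of per-class Dice and IoU on 2D label maps. It assumes integer labels with 0 as background and 1 to 3 as the cardiac structures, which is a hypothetical encoding chosen for illustration; the function names and the handling of absent classes are likewise assumptions, not the authors' evaluation script.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[Sequence[int]],
                 target: Sequence[Sequence[int]],
                 label: int) -> Tuple[float, float]:
    """Dice and IoU for one class on two equally shaped 2D label maps."""
    inter = pred_n = tgt_n = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_hit, t_hit = (p == label), (t == label)
            inter += p_hit and t_hit  # True counts as 1
            pred_n += p_hit
            tgt_n += t_hit
    union = pred_n + tgt_n - inter
    if union == 0:
        # Class absent from both maps: scored as a perfect match (assumption).
        return 1.0, 1.0
    return 2.0 * inter / (pred_n + tgt_n), inter / union


def mean_dice(pred, target, labels=(1, 2, 3)) -> float:
    """Unweighted mean of per-class Dice over the foreground labels."""
    return sum(dice_and_iou(pred, target, k)[0] for k in labels) / len(labels)


if __name__ == "__main__":
    pred = [[0, 1, 1], [0, 1, 0]]
    target = [[0, 1, 0], [0, 1, 1]]
    print(dice_and_iou(pred, target, label=1))
```

For a single class, the two quantities are tied by IoU = Dice / (2 - Dice), so they rank predictions identically; they differ only once averaged across classes or images.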

A critical observation from the comparative data concerns the boundary delineation capability measured by the HD95. SAS-SemiUNet++ records the lowest HD95 of 1.1516, which indicates superior geometric fidelity along tissue interfaces. In contrast, the classic U-Net and EfficientUnet yield significantly higher HD95 values of 3.1687 and 2.2168, respectively. This disparity highlights the limitation of standard convolutional encoders in resolving ambiguous boundaries without explicit attention mechanisms or consistency regularization.
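
Because the boundary comparison rests on HD95, a small reference sketch of that metric may help. The version below extracts boundary pixels from two binary masks, computes directed nearest-boundary distances in both directions, and returns the larger of the two 95th percentiles. This is one common variant (some libraries instead pool both distance sets before taking the percentile), and the source does not state which variant or which units were used. The function names and the `spacing` parameter are hypothetical, and the brute-force search is only suitable for small masks.

```python
import math


def boundary(mask):
    """Foreground pixels touching background (4-neighbourhood) or the image edge."""
    h, w = len(mask), len(mask[0])
    pts = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    pts.append((r, c))
                    break
    return pts


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def directed_distances(src, dst, spacing):
    """Distance from each point in `src` to its nearest point in `dst`."""
    sy, sx = spacing
    return [
        min(math.hypot((a[0] - b[0]) * sy, (a[1] - b[1]) * sx) for b in dst)
        for a in src
    ]


def hd95(pred, target, spacing=(1.0, 1.0)):
    """Symmetric 95th-percentile Hausdorff distance between two binary masks."""
    a, b = boundary(pred), boundary(target)
    if not a or not b:
        raise ValueError("HD95 is undefined when either mask is empty")
    d_ab = directed_distances(a, b, spacing)
    d_ba = directed_distances(b, a, spacing)
    return max(percentile(d_ab, 95), percentile(d_ba, 95))
```

Taking the 95th percentile rather than the maximum makes the measure less sensitive to a handful of stray pixels, which is why it is commonly preferred over the plain Hausdorff distance for segmentation benchmarks.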

Notably, the Transformer-based TransUNet exhibits lower performance in this specific experimental setting, with a mean Dice of 0.8518 and a higher HD95 of 2.2018. A plausible explanation, as often discussed in the recent literature, is that Vision Transformers generally require large-scale datasets to fully leverage their global receptive fields, which can pose challenges in medical imaging scenarios where annotated data is limited. In the context of the ACDC dataset, the CNN-based architecture of SAS-SemiUNet++, augmented by the SCR mechanism, appears to offer a more data-efficient alternative, yielding more stable boundary delineations without relying on massive training cohorts.

While Table 4 demonstrates the overall algorithmic stability through run-to-run variations, it is also crucial to assess the clinical robustness of the model across heterogeneous patient anatomies. To this end, we further evaluated the patient-level variability on the test cohort. Table 5 presents the per-subject statistical distribution, including the Median, Interquartile Range (IQR), and 95% Bootstrap Confidence Intervals (CI) for the Mean Dice and HD95 metrics. Compared to the supervised U-Net++ baseline, SAS-SemiUNet++ achieves a higher median Dice (0.9180 vs. 0.9065) and a lower median HD95 (1.0500 vs. 1.1200), and also exhibits tighter confidence intervals and narrower IQRs. This indicates that the proposed SCR and SASR mechanisms reduce severe outlier predictions on challenging anatomical cases, offering more consistent and reliable segmentation performance at the individual patient level.
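
The patient-level statistics described above can be sketched as follows. The source states only that 1000 bootstrap resamples were drawn on the patient cohort; which statistic was resampled is not specified, so this sketch assumes a percentile bootstrap of the per-patient mean. The function names, the fixed seed, and the example scores are hypothetical and do not reproduce Table 5.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-patient scores (assumed statistic)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample patients with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def median_iqr(values):
    """Median and (25th, 75th) percentiles with linear interpolation."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)


if __name__ == "__main__":
    # Synthetic per-patient Dice scores, for illustration only.
    scores = [0.90, 0.91, 0.92, 0.88, 0.93, 0.89, 0.94, 0.87]
    print("median, IQR:", median_iqr(scores))
    print("95% CI of the mean:", bootstrap_ci(scores))
```

Resampling at the patient level, rather than at the slice level, keeps slices from the same subject together and therefore avoids understating the variability between patients.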

3.5. Qualitative Visualization and Perceptual Analysis

To intuitively corroborate the quantitative improvements reported in Tables 3 and 4, we present a qualitative assessment of the segmentation results in Figure 4. This visualization compares the geometric fidelity of the predictions generated by the proposed SAS-SemiUNet++ against the baseline models, including U-Net, EfficientUnet, and AttU-Net. The selected samples cover challenging cardiac phases where the anatomical structures exhibit complex deformations and low-contrast interfaces between the myocardium and the surrounding blood pool.

Observations from the baseline models reveal distinct failure modes. The standard U-Net and EfficientUnet architectures localize the primary region of interest reliably; however, they exhibit a marked tendency towards under-segmentation. As seen in the second and third rows of Figure 4, these models fail to capture the complete annular structure of the myocardium, often truncating the thinner segments of the tissue. This limitation likely stems from their fixed receptive fields, which lack the sensitivity required to delineate fine-grained anatomical margins. Conversely, the AttU-Net model mitigates this under-segmentation to some degree by leveraging attention mechanisms but introduces a new set of artifacts. Its predictions are prone to boundary leakage and spurious false positives, resulting in rough and anatomically implausible contours. This behavior suggests that while attention mechanisms enhance feature activation, they can become over-sensitive to image noise without adequate regularization.

In sharp contrast, the proposed SAS-SemiUNet++ demonstrates better perceptual quality and anatomical coherence. With the SASR units integrated, our model resolves the thin, crescent-shaped geometries of the myocardium that are missed by the baseline models. Furthermore, the boundaries produced by SAS-SemiUNet++ are smooth and adhere closely to the ground truth annotations. We attribute this topological stability to the SCR strategy, which suppresses high-frequency boundary noise and enforces a global shape constraint. Consequently, the visualizations are consistent with the quantitative finding that SAS-SemiUNet++ not only achieves higher pixel-level accuracy but also preserves the structural plausibility required for clinical applications.

4. Discussion

This study addresses the formidable challenges associated with the precise segmentation of cardiac substructures in MRI, a task historically constrained by the dual bottlenecks of anatomical heterogeneity and the scarcity of high-quality pixel-level annotations [36,37,38,39,40]. While deep learning paradigms have revolutionized medical image analysis, standard convolutional architectures often falter when tasked with simultaneously delineating large ventricular cavities and fine-grained myocardial boundaries due to their fixed receptive fields [41,42,43,44,45]. Furthermore, the inherent dependency of these models on massive annotated datasets creates a significant barrier to clinical deployment. Existing solutions, including recent Transformer-based approaches like TransUNet, often struggle with data efficiency and fail to generalize well in low-data regimes, as evidenced by the high Hausdorff distances observed in our comparative experiments [46,47,48].

To bridge this gap, we propose SAS-SemiUNet++, a holistic semi-supervised framework that integrates scale-aware feature recalibration with stochastic consistency regularization. The core innovation of this work lies not merely in the combination of modules but in the targeted resolution of specific physiological and learning constraints [49,50,51,52]. The SASR unit functions as a dynamic semantic gate: it allows the network to mimic the variable-focus mechanism of a radiologist and to capture both the global geometry of the left ventricle and the subtle topological details of the right ventricular myocardium. This mechanism directly addresses the limitation of one-size-fits-all convolution kernels. Simultaneously, the SCR strategy introduces a dual-path perturbation constraint during training, which forces the model to learn robust, perturbation-invariant representations from unlabeled data and mitigates the overfitting issues prevalent in semi-supervised learning scenarios.
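
The dual-path consistency idea can be illustrated with a small sketch. It assumes that two stochastically perturbed forward passes over the same unlabeled image produce per-pixel class logits, and it penalizes their disagreement with a Kullback–Leibler divergence averaged over pixels. The direction of the divergence, the absence of any uncertainty weighting, and all function names are assumptions made for illustration; this is not the authors' exact SCR loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Mean per-pixel KL(p_a || p_b) between two perturbed forward passes.

    `logits_a` and `logits_b` are lists of per-pixel logit vectors obtained
    from the same unlabeled input under two different perturbations.
    """
    total = 0.0
    for za, zb in zip(logits_a, logits_b):
        total += kl_divergence(softmax(za), softmax(zb))
    return total / len(logits_a)


if __name__ == "__main__":
    # Two pixels, four classes (background, RV, myocardium, LV): toy values.
    path_a = [[2.0, 0.0, 0.0, 0.0], [0.0, 1.5, 0.2, 0.0]]
    path_b = [[1.8, 0.1, 0.0, 0.0], [0.0, 0.3, 1.4, 0.0]]
    print(f"consistency loss: {consistency_loss(path_a, path_b):.4f}")
```

The loss is zero when both paths agree and grows as their predictions diverge, so minimizing it on unlabeled data pushes the network towards perturbation-invariant outputs without requiring annotations.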

Empirical validation on the ACDC benchmark supports the efficacy of the proposed methodology. SAS-SemiUNet++ achieves the best Dice, mIoU, and HD95 among the compared models, outperforming baselines including TransUNet and AttU-Net. Notably, the reduction in HD95 underscores the model’s capability in boundary preservation, which we attribute to the SCR mechanism’s ability to smooth decision boundaries against label noise. The ablation studies further indicate that the SASR and SCR modules are not redundant additions but complementary: the former ensures feature richness, while the latter promotes geometric consistency.

5. Conclusions

This study presents SAS-SemiUNet++ as a robust, data-efficient, and clinically relevant solution for cardiac MRI segmentation. By leveraging unlabeled data through a consistency regularization paradigm, we provide a viable path towards reducing the annotation burden in medical imaging. While the current investigation validates the framework on the ACDC cohort, future work will extend this architecture to multi-center datasets to evaluate its cross-domain generalization more rigorously. We also envision integrating 3D spatial consistency constraints to further enhance volumetric segmentation accuracy in clinical practice.

Author Contributions

Conceptualization, J.R. and X.L.; methodology, J.R.; software, J.R.; validation, J.R. and X.M.; formal analysis, J.R. and X.M.; investigation, J.R. and X.M.; resources, X.L.; data curation, J.R.; writing—original draft preparation, J.R. and X.M.; writing—review and editing, J.R.; visualization, J.R.; supervision, X.L.; project administration, X.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study come from the public dataset of the 2017 MICCAI Automatic Cardiac Diagnosis Challenge (ACDC), which is openly accessible via the official challenge website: https://acdc.creatis.insa-lyon.fr (accessed on 20 September 2025). Any use of the ACDC database must cite the original reference, as required by the dataset provider [53]. All preprocessing procedures applied to the original ACDC dataset, including voxel resampling to 1.25 × 1.25 × 8 mm³, spatial center-cropping to 224 × 224 pixels, and intensity standardization by mean and standard deviation, are described in detail in Section 2.1 of this manuscript, enabling full reproducibility of the processed data used in the experiments. No new original image datasets were generated in the course of this research. The source code for the proposed SAS-SemiUNet++ framework, including model implementation, training pipelines and evaluation scripts, will be publicly available on GitHub upon acceptance of this manuscript to ensure the reproducibility of all reported experimental results.

Acknowledgments

The authors would like to extend their sincere gratitude to the relevant researchers from the School of Computer Science, China University of Geosciences (Wuhan), for their support and assistance in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACDC	Automatic Cardiac Diagnosis Challenge
fwIoU	frequency-weighted Intersection over Union
HD95	95% Hausdorff Distance
KL	Kullback–Leibler
mIoU	mean Intersection over Union
MRI	Magnetic Resonance Imaging
SCR	Stochastic Consistency Regularization
SASR	Scale-Aware Semantic Recalibration
U-Net++	UNet Plus Plus
SD	Standard Deviation

References

1. Zhang, Y.C.; Shen, Z.R.; Jiao, R.S. Segment anything model for medical image segmentation: Current applications and future directions. Comput. Biol. Med. 2024, 171, 108238.
2. Rayed, M.E.; Islam, S.S.; Niha, S.I.; Jim, J.R.; Kabir, M.M.; Mridha, M.F. Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Inform. Med. Unlocked 2024, 47, 101504.
3. Azad, R.; Aghdam, E.K.; Rauland, A.; Jia, Y.; Avval, A.H.; Bozorgpour, A.; Karimijafarbigloo, S.; Cohen, J.P.; Adeli, E.; Merhof, D. Medical image segmentation review: The success of u-net. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10076–10095.
4. Xing, Z.H.; Tian, Y.; Yang, Y.J.; Guang, L.; Zhang, L. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2024.
5. Zhao, L.; Zhou, D.; Jin, X.; Zhu, W. Nn-TransUNet: An automatic deep learning pipeline for heart MRI segmentation. Life 2022, 12, 1570.
6. Liu, J.H.; Zhao, D.W.; Shen, J.Y.; Geng, P.T.; Zhang, Y.; Yang, J.F.; Zhang, Z.H. HRD-Net: High resolution segmentation network with adaptive learning ability of retinal vessel features. Comput. Biol. Med. 2024, 173, 108295.
7. Jiang, Q.; Jin, X.; Cui, X.; Yao, S.W.; Li, K.; Zhou, W. A lightweight multimode medical image fusion method using similarity measure between intuitionistic fuzzy sets joint laplacian pyramid. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 631–647.
8. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: Piscataway, NJ, USA, 2022; pp. 574–584.
9. Morales, M.A.; Manning, W.J.; Nezafat, R. Present and future innovations in AI and cardiac MRI. Radiology 2024, 310, e231269.
10. Zhang, D.; Lu, C.F.; Tan, T.; Dashtbozorg, B.; Long, X.T.; Xu, X.F.; Zhang, J.; Shan, C. BSANet: Boundary-aware and scale-aggregation networks for CMR image segmentation. Neurocomputing 2024, 599, 128125.
11. Zhang, H.; Cai, Z.W. ConvNextUNet: A small-region attentioned model for cardiac MRI segmentation. Comput. Biol. Med. 2024, 177, 108592.
12. Lyu, J.; Wang, G.; Wang, Z.; Dong, S.Y.; Ding, W.; Wang, C.Y. Diffusion-prior based implicit neural representation for arbitrary-scale cardiac cine MRI super-resolution. Inform. Fusion 2025, 126, 103510.
13. Yang, R.; Liu, K.; Liang, Y. A fusion-attention swin transformer for cardiac MRI image segmentation. IET Image Process. 2024, 18, 105–115.
14. Mastrodicasa, D.; van Assen, M.; Huisman, M.; Leiner, T.; Williamson, E.E.; Nicol, E.D.; Allen, B.D.; Saba, L.; Vliegenthart, R.; Hanneman, K. Use of AI in cardiac CT and MRI: A scientific statement from the ESCR, EuSoMII, NASCI, SCCT, SCMR, SIIM, and RSNA. Radiology 2025, 314, e240516.
15. Wang, Z.; Xiao, M.M.; Zhou, Y.; Wang, C.; Wu, N.; Li, Y.; Gong, Y.Y.; Chang, S.; Chen, Y.; Zhu, L.; et al. Deep separable spatiotemporal learning for fast dynamic cardiac MRI. IEEE Trans. Biomed. Eng. 2025, 72, 3642–3654.
16. Lehmann, D.H.; Gomes, B.; Vetter, N.; Braun, O.; Amr, A.; Hilbel, T.; Müller, J.; Köthe, U.; Reich, C.; Kayvanpour, E.; et al. Prediction of diagnosis and diastolic filling pressure by AI-enhanced cardiac MRI: A modelling study of hospital data. Lancet Digit. Health 2024, 6, e407–e417.
17. Ye, Y.; Chen, Y.W.; Wang, R.; Zhu, D.H.; Huang, Y.Y.; Huang, Y.; Liu, J.Y.; Chen, Y.X.; Shi, J.; Ding, B.; et al. Image segmentation using improved U-Net model and convolutional block attention module based on cardiac magnetic resonance imaging. J. Radiat. Res. Appl. Sci. 2024, 17, 100816.
18. Cui, H.; Li, Y.; Jiang, L.; Wang, Y.; Xia, Y.; Zhang, Y. Improving myocardial pathology segmentation with U-Net++ and EfficientSeg from multi-sequence cardiac magnetic resonance images. Comput. Biol. Med. 2022, 151, 106218.
19. Rahman, T. Fast Magnetic Resonance Image Reconstruction with Deep Learning Using an Efficientnet Encoder. Master’s Thesis, The University of Texas at El Paso, El Paso, TX, USA, 2021.
20. Abdeltawab, H.; Jani, M.R.; Hasan, M.R.; Yasser, I.; Khalifa, F. A hybrid adversarial-TransUnet architecture for improving LV segmentation using cardiac cine images. In Proceedings of the International Conference on Intelligent Systems, Blockchain, and Communication Technologies; Springer: Cham, Switzerland, 2024; pp. 237–247.
21. Wong, K.K.L.; Zhang, A.; Yang, K.; Wu, S.C.; Ghista, D.N. GCW-UNet segmentation of cardiac magnetic resonance images for evaluation of left atrial enlargement. Comput. Methods Programs Biomed. 2022, 221, 106915.
22. Li, W.; Wang, L.; Qin, S. Cms-unet: Cardiac multi-task segmentation in MRI with a u-shaped network. In Myocardial Pathology Segmentation Combining Multi-Sequence CMR Challenge; Springer: Cham, Switzerland, 2020; pp. 92–101.
23. Kumar, R.; Gupta, M.; Agarwal, A.; Nayyar, A. CBAR-UNet: A novel methodology for segmentation of cardiac magnetic resonance images using block attention-based deep residual neural network. Multimed. Tools Appl. 2024, 83, 85047–85063.
24. Gomathi, G.; Subha, V. DPA-UNet: Detail preserving attention UNet for cardiac MRI ventricle region segmentation. Int. J. Health Sci. 2022, 6, 11833–11852.
25. Cui, H.; Jiang, L.; Yuwen, C.; Xia, Y.; Zhang, Y. Deep U-Net architecture with curriculum learning for myocardial pathology segmentation in multi-sequence cardiac magnetic resonance images. Knowl.-Based Syst. 2022, 249, 108942.
26. Firouznia, M.; Ylipää, E.; Henningsson, M.; Carlhäll, C.J. Poincare guided geometric UNet for left atrial epicardial adipose tissue segmentation in Dixon MRI images. Sci. Rep. 2025, 15, 25549.
27. Wang, X.Y.; Yang, S.X.; Fang, Y.; Wei, Y.M.; Wang, M.J.; Zhang, J.; Han, X. SK-UNet: An improved U-Net model with selective kernel for the segmentation of LGE cardiac MR images. IEEE Sens. J. 2021, 21, 11643–11653.
28. Islam, M.R.; Qaraqe, M.; Serpedin, E. CoST-UNet: Convolution and swin transformer based deep learning architecture for cardiac segmentation. Biomed. Signal Process. Control 2024, 96, 106633.
29. Fayouka, A.; Benameur, N.; Mahmoudi, R.; Masmoudi, I.; Deriche, M. Cardiac segmentation: A comparative study between 3D UNet and 2D UNet performances. In Proceedings of the 2024 IEEE/ACS 21st International Conference on Computer Systems and Applications (AICCSA); IEEE: Piscataway, NJ, USA, 2024; pp. 1–7.
30. Li, Y.Z.; Wang, Y.; Huang, Y.H.; Xiang, P.; Liu, W.X.; Lai, Q.Q.; Gao, Y.Y.; Xu, M.S.; Guo, Y.F. RSU-Net: U-net based on residual and self-attention mechanism in the segmentation of cardiac magnetic resonance images. Comput. Methods Programs Biomed. 2023, 231, 107437.
31. Li, Y.; Chouzenoux, E.; Charmettant, B.; Benatsou, B.; Lamarque, J.P.; Lassau, N. Lightweight U-Net for lesion segmentation in ultrasound images. In Proceedings of the 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice, France, 13–16 April 2021; pp. 611–615.
32. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
33. Chen, J.N.; Lu, Y.; Yu, Q.H.; Luo, X.D.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
34. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning where to look for the pancreas. Med. Image Anal. 2018, 50, 108–118.
35. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241.
36. Heckel, R.; Jacob, M.; Chaudhari, A.; Perlman, O.; Shimron, E. Deep learning for accelerated and robust MRI reconstruction. Magn. Reson. Mater. Phys. Biol. Med. 2024, 37, 335–368.
37. Benjamin, M.M.; Rabbat, M.G. Artificial intelligence in transcatheter aortic valve replacement: Its current role and ongoing challenges. Diagnostics 2024, 14, 261.
38. Jin, Y.; Pepe, A.; Li, J.; Gsaxner, C.; Chen, Y.X.; Puladi, B.; Zhao, F.H.; Pomykala, K.; Kleesiek, J.; Frangi, A.F.; et al. Aortic vessel tree segmentation for cardiovascular diseases treatment: Status quo. ACM Comput. Surv. 2025, 57, 1–35.
39. Priyadarshi, R.; Ranjan, R.; Vishwakarma, A.K.; Yang, T.; Rathore, R.S. Exploring the frontiers of unsupervised learning techniques for diagnosis of cardiovascular disorder: A systematic review. IEEE Access 2024, 12, 139253–139272.
40. Meier, C.; Eisenblätter, M.; Gielen, S. Myocardial late gadolinium enhancement (LGE) in cardiac magnetic resonance imaging (CMR)—An important risk marker for cardiac disease. J. Cardiovasc. Dev. Dis. 2024, 11, 40.
41. Moradi, A.; Olanisa, O.O.; Nzeako, T.; Shahrokhi, M.; Esfahani, E.; Fakher, N.; Khazeei Tabari, M.A. Revolutionizing cardiac imaging: A scoping review of artificial intelligence in echocardiography, CTA, and cardiac MRI. J. Imaging 2024, 10, 193.
42. Cai, Y.Q.; Gong, D.X.; Tang, L.Y.; Cai, Y.; Li, H.J.; Jing, T.C.; Gong, M.; Hu, W.; Zhang, Z.W.; Zhang, X.; et al. Pitfalls in developing machine learning models for predicting cardiovascular diseases: Challenge and solutions. J. Med. Internet Res. 2024, 26, e47645.
43. Baeßler, B.; Engelhardt, S.; Hekalo, A.; Hennemuth, A.; Hüllebrand, M.; Laube, A.; Scherer, C.; Tölle, M.; Wech, T. Perfect match: Radiomics and artificial intelligence in cardiac imaging. Circ. Cardiovasc. Imaging 2024, 17, e015490.
44. Singh, V.; Sathio, A.A.; Anwar, S.; Vavekanand, R.; Danish, R. A Medical Imaging Approach for Recognising Mitral Regurgitation Through Machine Learning Methods in Cardiac Imaging. Cardiol. Vasc. Res. 2024, 2, 1–7.
45. Liu, Z.L.; Kainth, K.; Zhou, A.; Deyer, T.W.; Fayad, Z.A.; Greenspan, H.; Mei, X. A review of self-supervised, generative, and few-shot deep learning methods for data-limited magnetic resonance imaging segmentation. NMR Biomed. 2024, 37, e5143.
46. Carrabba, N.; Amico, M.A.; Guaricci, A.I.; Carella, M.C.; Maestrini, V.; Monosilio, S.; Pedrotti, P.; Ricci, F.; Monti, L.; Figliozzi, S.; et al. CMR mapping: The 4th-era revolution in cardiac imaging. J. Clin. Med. 2024, 13, 337.
47. Rashid, I.; Cruz, G.; Seiberlich, N.; Hamilton, J.I. Cardiac MR fingerprinting: Overview, technical developments, and applications. J. Magn. Reson. Imaging 2024, 60, 1753–1773.
48. Dishner, K.A.; McRae-Posani, B.; Bhowmik, A.; Jochelson, M.S.; Holodny, A.; Pinker, K.; Eskreis-Winkler, S.; Stember, J.N. A survey of publicly available MRI datasets for potential use in artificial intelligence research. J. Magn. Reson. Imaging 2024, 59, 450–480.
49. Tolu-Akinnawo, O.Z.; Ezekwueme, F.; Omolayo, O.; Batheja, S.; Awoyemi, T. Advancements in artificial intelligence in noninvasive cardiac imaging: A comprehensive review. Clin. Cardiol. 2025, 48, e70087.
50. Altara, R.; Basson, C.J.; Biondi-Zoccai, G.; Booz, G.W. Exploring the promise and challenges of artificial intelligence in biomedical research and clinical practice. J. Cardiovasc. Pharmacol. 2024, 83, 403–409.
51. Pasdeloup, D.; Østvik, A.; Olaisen, S.; Skogvoll, E.; Dalen, H.; Lovstakken, L. Challenges and strategies for deep learning in cardiovascular imaging: Ejection fraction and heart failure management. Cardiovasc. Imaging 2025, 18, 751–764.
52. Milosevic, M.; Jin, Q.; Singh, A.; Amal, S. Applications of AI in multi-modal imaging for cardiovascular disease. Front. Radiol. 2024, 3, 1294068.
53. Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X.; Heng, P.-A.; Cetin, I.; Lekadir, K.; Camara, O.; Ballester, M.A.G.; et al. Deep Learning Techniques for Automatic MRI Cardiac Multi-structures Segmentation and Diagnosis: Is the Problem Solved? IEEE Trans. Med. Imaging 2018, 37, 2514–2525.

Figure 1. Sample of ACDC dataset. The colored overlays represent the expert pixel-level annotations for the cardiac substructures: the left ventricular cavity is shown in blue, the myocardium in green, and the right ventricular cavity in red.

Figure 2. Structure of SAS-SemiUNet++. The diagram illustrates the nested dense skip pathways of the U-Net++ backbone, integrated with SASR units to dynamically adjust multi-scale feature representations. Additionally, it depicts the dual-path SCR mechanism, which computes the Kullback–Leibler divergence on unlabeled data to enforce prediction stability.

Figure 3. Structure of SASR.

Figure 4. Visualization of segmentation effect.

Table 1. Dataset splitting protocol detailing the distribution of subjects and 2D slices.

Data Split	Number of Subjects	Number of 2D Slices
Training	140	1312
Validation	20	196
Testing	40	394
Total	200	1902

Table 2. Experimental environment.

Name	Related Configurations
CPU	Intel(R) Xeon(R) Gold 5220R CPU × 2
RAM	DDR4 2400 MHz 256 GB
Accelerator	CUDA 11.1, cuDNN 8.0.4
GPU	RTX 4090 × 4
Operating system	Ubuntu 18.04
Framework	PyTorch 1.9.0
Python version	3.9

Table 3. Ablation study results evaluated on the ACDC validation set. (↑ indicates that higher values are better for the metric, and ↓ indicates that lower values are better. Bold text highlights the best result for each metric). All results are reported as Mean ± Standard Deviation (SD) over 5 independent runs with different random seeds.

Method	Mean_Dice (↑)	Mean_hd95 (↓)	Acc (↑)	Acc_Class (↑)	mIoU (↑)	fwIoU (↑)
Unet++_supervised (baseline)	0.9030 ± 0.0003	1.2050 ± 0.0314	0.9947 ± 0.0001	0.9292 ± 0.0003	0.8715 ± 0.0003	0.9901 ± 0.0001
Unet++	0.9043 ± 0.0003	1.2381 ± 0.0336	0.9947 ± 0.0001	0.9352 ± 0.0004	0.8727 ± 0.0003	0.9902 ± 0.0001
Unet++ w/SASR	0.9074 ± 0.0004	1.4288 ± 0.0427	0.9949 ± 0.0002	0.9351 ± 0.0004	0.8757 ± 0.0004	0.9905 ± 0.0002
Unet++ w/SCR	0.9050 ± 0.0003	1.2325 ± 0.0329	0.9950 ± 0.0002	0.9270 ± 0.0003	0.8739 ± 0.0003	0.9907 ± 0.0003
SAS-SemiUNet++ (Ours)	0.9142 ± 0.0004	1.1516 ± 0.0215	0.9954 ± 0.0002	0.9376 ± 0.0003	0.8844 ± 0.0003	0.9914 ± 0.0002

Table 4. Performance comparison of different models on the validation set. (↑ indicates that higher values are better for the metric, and ↓ indicates that lower values are better. Bold text highlights the best result for each metric). All results are reported as Mean ± Standard Deviation (SD) over 5 independent runs with different random seeds.

Method	Mean_Dice (↑)	Mean_hd95 (↓)	Acc (↑)	Acc_Class (↑)	mIoU (↑)	fwIoU (↑)
EfficientUnet [31]	0.9003 ± 0.0004	2.2168 ± 0.0392	0.9947 ± 0.0001	0.9249 ± 0.0003	0.8672 ± 0.0004	0.9901 ± 0.0001
Segnet [32]	0.8823 ± 0.0005	1.4197 ± 0.0287	0.9938 ± 0.0002	0.9108 ± 0.0004	0.8460 ± 0.0005	0.9885 ± 0.0002
Transunet [33]	0.8518 ± 0.0004	2.2018 ± 0.0415	0.9909 ± 0.0002	0.8941 ± 0.0003	0.8129 ± 0.0004	0.9838 ± 0.0001
AttU_Net [34]	0.9052 ± 0.0003	1.1615 ± 0.0218	0.9948 ± 0.0001	0.9422 ± 0.0003	0.8731 ± 0.0002	0.9904 ± 0.0001
Unet [35]	0.9007 ± 0.0002	3.1687 ± 0.0486	0.9945 ± 0.0001	0.9134 ± 0.0004	0.8682 ± 0.0003	0.9896 ± 0.0001
SAS-SemiUNet++ (Ours)	0.9142 ± 0.0003	1.1516 ± 0.0195	0.9954 ± 0.0002	0.9376 ± 0.0003	0.8844 ± 0.0004	0.9914 ± 0.0002

Table 5. Patient-level statistical distribution of segmentation performance. Metrics are calculated per subject to illustrate clinical variability. The 95% CI values were generated using 1000 bootstrap resamples on the patient cohort.

Method	Metric	Mean	Patient-Level SD	Median	IQR (25th–75th)	95% Bootstrap CI
U-Net++ (Baseline)	Mean Dice	0.9030	0.0352	0.9065	[0.8810, 0.9240]	[0.8320, 0.9480]
U-Net++ (Baseline)	HD95	1.2050	0.5840	1.1200	[0.9100, 1.6200]	[0.7600, 2.6800]
SAS-SemiUNet++ (Ours)	Mean Dice	0.9142	0.0281	0.9180	[0.8960, 0.9320]	[0.8540, 0.9580]
SAS-SemiUNet++ (Ours)	HD95	1.1516	0.4120	1.0500	[0.8600, 1.3800]	[0.6800, 2.1500]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rao, J.; Ma, X.; Li, X. SAS-SemiUNet++: A Stochastic Consistency Regularized Framework with Scale-Aware Semantic Recalibration for Cardiac MRI Segmentation. Appl. Sci. 2026, 16, 3507. https://doi.org/10.3390/app16073507
