SULBA: A Task-Agnostic Data Augmentation Framework for Deep Learning in Medical Image Analysis

Abe, Ayomide Adeyemi; Nyathi, Mpumelelo

doi:10.3390/diagnostics16101546

by Ayomide Adeyemi Abe ¹,²,* and Mpumelelo Nyathi

¹ Department of Medical Physics, School of Medicine, Sefako Makgatho Health Sciences University, Pretoria 0028, South Africa

² AureXida Inc., Toronto, ON M4M 1Y3, Canada

* Author to whom correspondence should be addressed.

Diagnostics 2026, 16(10), 1546; https://doi.org/10.3390/diagnostics16101546

Submission received: 15 January 2026 / Revised: 10 February 2026 / Accepted: 21 February 2026 / Published: 19 May 2026

(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)

Abstract

Background/Objectives: Data augmentation is a foundational component of modern deep learning for enhancing robustness and generalization. However, medical imaging lacks a universally reliable augmentation strategy, forcing researchers into an inefficient “augmentation lottery” that hinders experimental progress and reproducibility. To address this challenge, we introduce Stepwise Upper and Lower Boundaries Augmentation (SULBA), a simple, parameter-free framework designed to eliminate per-task augmentation tuning. Methods: SULBA generates training variations through stepwise cyclic shifts applied along data dimensions, making it inherently applicable to 2D, 3D, and higher-dimensional medical imaging data. To evaluate the efficacy of SULBA as a default data augmentation (DA) strategy, we performed benchmarking across 27 publicly available datasets spanning classification and segmentation tasks and 10 convolutional and transformer-based architectures using standard deep learning performance metrics. Results: The results demonstrate that SULBA achieves the highest overall performance and consistently outperforms 16 widely used standard augmentation techniques while delivering robust and reliable improvements without task- or parameter-specific tuning. Conclusions: SULBA establishes a principled universal default for data augmentation in medical imaging, with the potential to accelerate the development of generalizable and reproducible medical AI systems.

Keywords:

data augmentation; medical imaging; medical diagnosis; deep learning; artificial intelligence; medical image analysis

1. Introduction

Deep learning has become a cornerstone of modern medical image analysis, enabling substantial advances in tasks such as disease classification, lesion detection, and anatomical segmentation [1,2,3]. Architectures such as convolutional neural networks (CNNs) have demonstrated strong performance across a wide range of imaging modalities by learning hierarchical representations directly from pixel or voxel data [4]. More recently, transformer-based and hybrid architectures have further extended modeling capacity through enhanced long-range contextual reasoning and hierarchical attention mechanisms [5,6]. These advances hold promise for improved diagnostic accuracy, reduced clinician workload, and enhanced patient outcomes. However, achieving robust and generalizable performance with such data-hungry models fundamentally depends on access to large, diverse, and well-annotated datasets [7].

The availability of robust medical imaging datasets is fundamentally constrained by several factors. Expert annotation is expensive and time-consuming, class imbalance is pervasive, and data sharing is restricted by privacy, ethical, and regulatory requirements [8,9]. As a result, many medical imaging studies operate in data-limited regimes, where overfitting and poor generalization remain persistent challenges [10,11]. These limitations have made data augmentation (DA) an indispensable component of medical deep learning pipelines, providing a mechanism to expand training datasets through diverse mechanisms [12].

Existing DA approaches broadly fall into two paradigms: data generation and data transformation. Generative methods such as variational autoencoders, generative adversarial networks, and diffusion models synthesize new samples that approximate the statistical distribution of the original training data [13,14,15]. Transformation-based approaches instead apply predefined operations such as rotation, flipping, cropping, intensity perturbation, or elastic deformation to existing images [12,16]. While both paradigms have shown benefits, they present critical challenges when applied to medical imaging.

A central requirement of medical data augmentation is the preservation of diagnostic validity [12,17]. Generative approaches may exhibit hidden failure modes, leading to the generation of anatomically implausible structures or subtle artifacts that are difficult to detect without expert review, thereby compromising model reliability and clinical trust [18,19]. Transformation-based techniques can similarly undermine diagnostic integrity [17]. For instance, mixing-based augmentations may introduce biologically implausible tissue combinations that obscure true anatomy and degrade learning, while occlusion-based and aggressive cropping can eliminate clinically salient regions [20]. Additionally, commonly used geometric transformations may disrupt anatomical context or alter spatial relationships that are diagnostically meaningful, particularly when applied without domain-specific considerations [12,21].

These data constraints are compounded by domain-specific challenges that distinguish medical from natural image analysis [18]. Effective augmentation must preserve anatomical and pathological fidelity, avoiding transformations that generate biologically implausible tissue structures or relationships. Furthermore, diagnostic features in medical images often rely on subtle variations in low-texture contrast and shape, rather than the high-frequency textures and colors dominant in natural images [12]. This is coupled with vast inter-patient anatomical variability and heterogeneity in acquisition protocols across clinical sites [21]. Consequently, augmentation strategies developed for natural images may be inappropriate or even detrimental when applied directly to medical data [20].

These limitations create a persistent tension between inducing sufficient variability aimed at improving model generalization and preserving the anatomical and pathological fidelity required for clinical relevance, exposing a deeper systemic inefficiency in current practice [13]. In the absence of a universally reliable augmentation strategy, researchers are often compelled to empirically evaluate numerous augmentation techniques and hyperparameter configurations for each new task, imaging modality, and network architecture, which impedes reproducibility and constitutes a significant experimental bottleneck [22]. Although many augmentation methods are standardized and readily accessible through widely adopted deep learning frameworks such as PyTorch [23], TorchIO [24], and TensorFlow [25], the lack of a principled, universally applicable strategy continues to necessitate empirical, task-specific and architecture-dependent selection [26]. This phenomenon, often described as an “augmentation lottery”, represents a significant bottleneck in medical AI research, slowing progress, increasing experimental uncertainty, and hindering reproducibility [22,27].

To address this challenge, we introduce Stepwise Upper and Lower Boundaries Augmentation (SULBA), a simple, parameter-free, perfectly reversible, and dimension-agnostic data augmentation framework. SULBA generates novel training samples via stepwise cyclic shifts applied along data dimensions (e.g., height, width, or depth), making it inherently invariant to data dimensionality and feature composition. It can be applied seamlessly to 2D, 3D, and higher-dimensional data, including single- and multi-channel images without architectural modifications or per-task hyperparameter tuning. While each shift operation is deterministic, diversity arises from stochastic selection of shift offsets during training. This structured reordering aligns with principles explored in permutation-invariant and equivariant learning, where robustness arises from controlled input reordering rather than content corruption [28,29,30].

Unlike conventional augmentation strategies that interpolate, corrupt, or replace image content, SULBA preserves all original information by systematically repositioning contiguous regions through cyclic shifts. This transformation produces complementary partial views within a single image, preserving pixel intensities as well as the integrity of corresponding local tissue structure and pathology while introducing coherent global variation (Figure 1). As a result, SULBA exposes models to anatomically plausible reconfiguration in which salient features appear in altered spatial or feature contexts, reducing reliance on absolute position and encouraging robust, position-invariant representation learning. These properties are particularly advantageous in medical imaging, where preserving diagnostic information and maintaining pathological fidelity are essential [19,31].

While the concept of a cyclic shift is mathematically straightforward, SULBA distinguishes itself from related ideas in several key aspects. SULBA does not follow a general random permutation or patch reordering, which can disrupt clinically vital local anatomical structures [17]. Instead, SULBA’s stepwise cyclic shift preserves contiguous segments of the original data, maintaining the integrity of local features while systematically varying their global context. Furthermore, unlike equivariant networks, which build transformation invariance into the model architecture, SULBA is an input-space strategy that encourages robust, position-invariant feature learning in any standard model. This design is specifically motivated by the need for anatomically plausible variations in medical data, ensuring all original diagnostic information is retained in a perfectly reversible manner.

This work makes two key contributions. First, we propose SULBA as a universal, parameter-free augmentation framework that is reversible, dimension-agnostic, and directly applicable across imaging modalities, data dimensions, and task types. Second, to our knowledge, we present the most comprehensive empirical evaluation of data augmentation methods in medical imaging to date, benchmarking 27 publicly available datasets across 10 convolutional and transformer-based architectures for both classification and segmentation tasks in 2D and 3D domains. Across this diverse experimental landscape, SULBA consistently achieves superior performance, outperforming 16 widely used augmentation strategies while eliminating the need for task-specific augmentation tuning, thereby enabling more robust, reproducible, and scalable medical AI development.

2. Materials and Methods

2.1. SULBA Framework

Stepwise Upper and Lower Boundaries Augmentation (SULBA) is a deterministic, dimension-agnostic data augmentation framework. The framework comprises a stochastic application protocol that is specifically optimized for medical imaging with inherent synchronization for segmentation tasks. For any input image tensor $X \in \mathbb{R}^{C \times D_{1} \times D_{2} \times \dots \times D_{N}}$ defined over feature ($C$) and spatial ($D$) dimensions, SULBA generates a novel sample by applying a cyclic shift along a randomly selected data dimension $k$. A SULBA transformation along dimension $k$, with $k \in \{0, \dots, N\}$, is defined as a cyclic shift operation governed by the modulo function described by

$$\mathrm{SULBA}(X) = X[:, \dots, (d_{k} + s_{k}) \bmod D_{k}, \dots] \tag{1}$$

where mod denotes the modulo operation, $d_{k} \in \{0, \dots, D_{k} - 1\}$ is a specific location along dimension $k$, and $s_{k} \in \{1, \dots, D_{k} - 1\}$ is a randomly sampled step size.

As shown in Equation (1), the cyclic shift operation is deterministic and perfectly reversible. During training, diversity is introduced by stochastically sampling both the dimension $k$ to be shifted and the step size $s_{k}$ at each application. This design preserves all original voxel intensities and local structures while systematically reconfiguring the global spatial or feature context of the input. For segmentation tasks, identical shift parameters $(k, s_{k})$ are applied synchronously to both the input image and its corresponding label mask (Figure 1c), ensuring pixel- or voxel-perfect alignment. The complete SULBA procedure is summarized in Algorithm 1.

Algorithm 1 Stepwise Upper and Lower Boundaries Augmentation (SULBA)

Input: Data tensor $X \in \mathbb{R}^{C \times D_{1} \times D_{2} \times \dots \times D_{N}}$; stride $s_{k} \in \{1, \dots, D_{k} - 1\}$
Output: Transformed tensor $X'$

Procedure:
1. Initialize: For each dimension $k$, determine the set of possible cyclic shift offsets $s_{k} \in \{1, 2, \dots, D_{k} - 1\}$.
2. For each selected dimension $k$:
   1. Randomly select a shift $s_{k}$ from the possible offsets.
   2. For each index $d_{k} \in \{0, \dots, D_{k} - 1\}$, compute the shifted index $d_{k}' = (d_{k} + s_{k}) \bmod D_{k}$.
   3. Rearrange $X$ along dimension $k$ according to the shifted indices $d_{k}'$.
3. Return: The transformed tensor $X'$.
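The procedure in Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sulba_shift` and `sample_shift` are hypothetical, `np.roll` with a negative shift is assumed as one way to realize Equation (1), the feature axis is assumed to sit at array index 0 so that $k \in \{0, \dots, N\}$ maps directly onto array axes, and the label mask in the usage lines is assumed to have the same shape as the image.

```python
import numpy as np


def sulba_shift(x, k, s):
    """Cyclic shift along axis k: out[..., d, ...] = x[..., (d + s) % D_k, ...]."""
    size = x.shape[k]
    if not 1 <= s <= size - 1:
        raise ValueError(f"step size must lie in 1..{size - 1}, got {s}")
    # np.roll(x, +s) moves entries toward higher indices, so the index
    # convention of Equation (1) corresponds to a roll by -s.
    return np.roll(x, shift=-s, axis=k)


def sample_shift(shape, rng):
    """Draw a dimension k and a step size s_k in {1, ..., D_k - 1} (hypothetical helper)."""
    # Axes of length 1 admit no non-trivial shift and are skipped.
    candidates = [k for k, size in enumerate(shape) if size > 1]
    k = int(rng.choice(candidates))
    s = int(rng.integers(1, shape[k]))  # upper bound is exclusive
    return k, s


# Usage: the same (k, s) is applied to an image and its label mask.
rng = np.random.default_rng(42)
image = rng.normal(size=(1, 16, 16))
mask = (image > 0).astype(np.uint8)
k, s = sample_shift(image.shape, rng)
image_aug = sulba_shift(image, k, s)
mask_aug = sulba_shift(mask, k, s)
```

Because the shift only reorders entries, the augmented mask still labels exactly the same pixels of the augmented image as before the shift.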

2.2. Scaling of Generated Samples

Shifts are applied along one dimension at a time, and each dimension $k$ admits $D_{k} - 1$ valid step sizes. The resulting sample generation is therefore additive across dimensions and automatically adapts to the complexity of the input data. Consequently, larger or higher-dimensional images yield a greater diversity of training samples without requiring manual parameter tuning, and the number of novel samples generated by SULBA scales with both image resolution and dimensionality. For an input tensor $X \in \mathbb{R}^{C \times D_{1} \times D_{2} \times \dots \times D_{N}}$, the total number of possible novel configurations is given by

$$\text{Total samples} = \sum_{k = 0}^{N} (D_{k} - 1) \tag{2}$$
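As a worked instance of Equation (2), the count can be evaluated directly from a tensor shape. The sketch below assumes that the $k = 0$ term of the sum refers to the feature dimension $C$; the function name `total_sulba_samples` and the two example shapes are illustrative and are not taken from the benchmark configuration.

```python
def total_sulba_samples(shape):
    """Equation (2): sum of (D_k - 1) over every axis of the input tensor."""
    return sum(size - 1 for size in shape)


# A 3-channel 28 x 28 image: (3 - 1) + (28 - 1) + (28 - 1) = 56
print(total_sulba_samples((3, 28, 28)))

# A single-channel 28 x 28 x 28 volume: 0 + 27 + 27 + 27 = 81
print(total_sulba_samples((1, 28, 28, 28)))
```

The count grows linearly with each axis length, which is the additive scaling described above.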

2.3. SULBA Perfect Reversibility

The SULBA transformation guarantees perfect reversibility due to the invertible nature of cyclic shifts [28,29,30]. Let $X$ denote an input tensor, and let $X'$ represent a cyclic shift of $X$ along dimension $k$ by a step size $s$. The original input can be exactly recovered by applying a complementary shift of $D_{k} - s$, as shown in Equations (3) and (4). Additionally, SULBA does not perform interpolation, cropping, or any pixel-level modification. This deterministic and reversible property ensures the preservation of anatomical and pathological content within each transformed view, facilitating reliable and reproducible model training. Formally, if

$$X' = \mathrm{CyclicShift}(X, k, s) \tag{3}$$

then the inverse operation is given by

$$X = \mathrm{CyclicShift}(X', k, D_{k} - s) \tag{4}$$

where $D_{k}$ denotes the size of dimension $k$.
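The round trip in Equations (3) and (4) can be checked numerically. The following sketch assumes a NumPy realization of the cyclic shift via `np.roll`; the function name `cyclic_shift`, the array shape, and the chosen $(k, s)$ are illustrative only.

```python
import numpy as np


def cyclic_shift(x, k, s):
    """out[..., d, ...] = x[..., (d + s) % D_k, ...] along axis k."""
    return np.roll(x, shift=-s, axis=k)


rng = np.random.default_rng(42)
x = rng.normal(size=(1, 8, 8, 8))
k, s = 2, 3

x_shifted = cyclic_shift(x, k, s)                        # Equation (3)
x_restored = cyclic_shift(x_shifted, k, x.shape[k] - s)  # Equation (4)

# The complementary shift recovers the input exactly (no interpolation error).
assert np.array_equal(x, x_restored)
# The shifted tensor holds the same multiset of values as the original.
assert np.array_equal(np.sort(x, axis=None), np.sort(x_shifted, axis=None))
```

Both checks use exact equality rather than a tolerance, since a cyclic shift only reorders entries and never recomputes them.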

2.4. Datasets and Preprocessing

Benchmarking was conducted across 27 publicly available medical imaging datasets spanning four task categories: 2D classification, 3D classification, 2D segmentation, and 3D segmentation.

For 2D classification, the BloodMNIST, BreastMNIST, DermaMNIST, OctMNIST, OrganAMNIST, OrganCMNIST, OrganSMNIST, PathMNIST, PneumoniaMNIST, and TissueMNIST datasets were selected from the MedMNIST v2 [32] suite. Six volumetric MedMNIST v2 datasets (AdrenalMNIST3D, FractureMNIST3D, NoduleMNIST3D, OrganMNIST3D, SynapseMNIST3D, and VesselMNIST3D) were used for 3D classification.

For 2D segmentation, seven datasets were analyzed from the MedSegBench datasets [33]: AbdomenUSMSBench, Bkai-Igh-MSBench, CystoFluidMSBench, DeepbacsMSBench, FHPsAOPMSBench, MosMedPlusMSBench, and Promise12MSBench. For 3D segmentation, experiments were conducted on IXITiny [24] and the Medical Segmentation Decathlon (MSD) datasets [34] for the Heart and Hippocampus tasks. A complete dataset description is provided in Supplementary Table S30.

For 2D data, preprocessing included normalization using ImageNet [35] statistics and conversion of grayscale images to three channels to enable the use of pretrained weights. For 3D data, images were standardized to canonical orientation, normalized using Z-score normalization, and rescaled to an intensity range of [−1, 1]. When native image sizes varied, inputs were resized to match the architectural requirements of each model.
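The preprocessing steps described above can be sketched as follows. This is an illustration under assumptions rather than the pipeline used in the study: the function names are hypothetical, the grayscale-to-three-channel conversion is assumed to be a simple channel repeat, the rescaling to [−1, 1] is assumed to be a min-max mapping applied after the Z-score step, and orientation standardization and resizing are omitted. The ImageNet mean and standard deviation are the values commonly used with pretrained TorchVision models.

```python
import numpy as np

# Per-channel ImageNet statistics commonly used with pretrained weights.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_2d_gray(img_uint8):
    """(H, W) uint8 grayscale -> (3, H, W) float32 normalized with ImageNet statistics."""
    x = img_uint8.astype(np.float32) / 255.0
    x = np.repeat(x[None, :, :], 3, axis=0)  # replicate the single channel
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


def preprocess_3d(volume, eps=1e-8):
    """Z-score normalize a volume, then min-max rescale it to [-1, 1] (assumed mapping)."""
    v = volume.astype(np.float32)
    z = (v - v.mean()) / (v.std() + eps)
    lo, hi = z.min(), z.max()
    if hi - lo < eps:  # constant volume: nothing to rescale
        return np.zeros_like(z)
    return 2.0 * (z - lo) / (hi - lo) - 1.0
```

Keeping both steps as pure functions of the input makes it straightforward to apply the identical pipeline to an external test set, as in the cross-dataset experiments.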

For cross-dataset generalization, models trained on PneumoniaMNIST were evaluated on a publicly available chest X-ray pneumonia dataset [36] using identical preprocessing pipelines. A summary of dataset characteristics, including resolution, sample count, class distribution, modality and preprocessing details, is provided in Supplementary Table S30.

2.5. Network Architectures

To evaluate robustness across diverse model architectures, we considered both convolutional and transformer-based networks. For 2D classification tasks, we employed ResNet-18 [37] and Swin Transformer Tiny [38] models initialized with ImageNet-pretrained weights. For 3D classification, we used R(2+1)D-18 [39] and 3D Swin Transformer Tiny [40] models initialized with Kinetics-400 pretrained weights [41]. All classification models were implemented using standard architectures through the PyTorch TorchVision framework.

For 2D segmentation, we evaluated a U-Net [42] with an ImageNet-pretrained ResNet-18 encoder and a SegFormer [43] model with an ImageNet-pretrained MiT-B1 backbone, both implemented using the Segmentation Models PyTorch library. For 3D segmentation, we employed a standard 3D U-Net [44] and SwinUNETR [45] implemented through MONAI [46] and trained from randomly initialized weights.

Cross-dataset generalization experiments additionally included both pretrained and randomly initialized variants of ResNet-18, Swin Transformer Tiny, MobileNetV3 (small) [47], and MobileViT-xxs [48], all implemented using standard architectures through the PyTorch TorchVision framework. The selected network architectures represent foundational and established convolutional and modern transformer-based paradigms commonly used in medical image analysis benchmarks.

2.6. 2D and 3D Data Augmentation

For 2D augmentation baselines, we evaluated commonly used transformations, including random horizontal flip, random vertical flip, random rotation, random erasing [49], Cutout [50], CutMix [51], and MixUp [52]. Random flipping and rotation were applied with a fixed application probability (ap = 0.5), consistent with standard practice. All other augmentation methods, including SULBA, were evaluated at two application probabilities (ap = 0.5 and ap = 1.0). Unless otherwise specified, standard implementations and default parameters from PyTorch TorchVision were used to facilitate reproducibility. Cutout was implemented using a custom implementation following the standard formulation [50].
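The role of the application probability ap can be made concrete with a small wrapper. The sketch below is an assumption about how such a probability is typically enforced (one random draw per sample); the names `apply_with_probability` and `horizontal_flip` are hypothetical, and the flip merely stands in for any of the augmentations listed above.

```python
import numpy as np


def apply_with_probability(transform, x, ap, rng):
    """Return transform(x) with probability ap, otherwise return x unchanged."""
    # rng.random() is uniform on [0, 1), so ap = 1.0 always applies the transform.
    if rng.random() < ap:
        return transform(x)
    return x


def horizontal_flip(x):
    """Reverse the last (width) axis; a stand-in for any augmentation."""
    return x[..., ::-1]


rng = np.random.default_rng(0)
image = np.arange(12).reshape(1, 3, 4)
always = apply_with_probability(horizontal_flip, image, ap=1.0, rng=rng)
sometimes = apply_with_probability(horizontal_flip, image, ap=0.5, rng=rng)
```

With ap = 0.5 roughly half of the training samples pass through unmodified, whereas ap = 1.0 means the model never sees an untransformed sample during training.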

For 3D experiments, volumetric augmentations, including spike noise and gamma adjustment [24], anisotropy [53], bias field distortion [54], elastic deformation [55], blurring [56], ghosting [57], random flipping, and additive noise, were implemented using standard transformations through the TorchIO library. All transformations were evaluated at application probabilities of 0.5 and 1.0. Augmentation parameters for all baseline methods were set to their standard, widely used defaults from PyTorch TorchVision and TorchIO libraries to simulate realistic out-of-the-box usage. However, for elastic deformation, the number of control points (5, 5, 5), maximum displacement (3, 3, 3), and border locking (set to 2) were configured to preserve anatomical plausibility. Traditional spatial transformations (flipping, rotation) were applied with a probability of p = 0.5, consistent with standard practice in the field [17].

2.7. Training and Implementation Details

Models were trained using AdamW optimization with mixed-precision training. Cross-entropy loss was minimized using standard mini-batch gradient descent with gradient clipping (ℓ₂ norm = 1.0). Batch sizes were adjusted according to dataset size and data dimensionality. After evaluating multiple learning rates, values that consistently yielded optimal performance over 100 training epochs were selected for both classification ($1 \times 10^{-4}$) and segmentation ($1 \times 10^{-3}$) tasks. Extending training beyond this point resulted in overfitting and performance degradation. Models were trained using the standard training splits and evaluated on the corresponding test splits provided with each dataset. The model with the highest validation accuracy was retained.

To assess reproducibility, three independent runs were performed on a randomly selected dataset and model architecture across all augmentation techniques using random seeds 1, 42, and 100. After confirming consistent trends, the random seed was fixed to 42 for all subsequent experiments. All experiments were implemented in Python 3.13 using PyTorch 2.7.1+cu126 within Jupyter Notebook 7.3.2. Analysis of the results was performed using SciPy (v1.15.2), Pandas (v2.2.3), scikit-learn (v1.6.1), and NumPy (v2.2.6). Experiments were conducted on an Intel(R) Core(TM) i7-9850H CPU @ 2.60 GHz (2.59 GHz), 32 GB of RAM, and an NVIDIA Quadro RTX 3000 GPU (6 GB VRAM).

2.8. Evaluation Protocol and Statistical Analysis

The primary objective of this study is to identify a robust, high-performing default augmentation strategy across the highly heterogeneous medical imaging domain. Therefore, our evaluation and analysis focus on aggregate performance profiling and comparative ranking. Performance was evaluated using task-standard metrics. For classification, we report accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), and F1-score. For segmentation, we report intersection over union (IoU), precision, recall, and F1-score.

To enable holistic comparison across models, architectures, and datasets, we adopted a cumulative ranking system based on aggregated metric performance [17], with minor modifications. This approach reduces sensitivity to outliers and provides a clear, unified order of performance. The cumulative score was computed as

$$C = \sum_{M = 1}^{n} \mathrm{Technique}(M) \tag{5}$$

where “C” is the cumulative score, “M” is the evaluation metric, “n” is the total number of evaluation metrics, and “technique” is the data augmentation method.

Relative improvement over a non-augmented baseline was calculated as

$$\text{Relative Improvement} = \text{cumulative technique score} - \text{cumulative baseline score} \tag{6}$$
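Equations (5) and (6) amount to summing metric values and subtracting the baseline total. The sketch below illustrates this for a single dataset and architecture; the function names are hypothetical and the metric values are invented for illustration, not results from this study.

```python
def cumulative_score(metrics):
    """Equation (5): sum of all evaluation metrics for one technique."""
    return sum(metrics.values())


def relative_improvement(technique_metrics, baseline_metrics):
    """Equation (6): cumulative technique score minus cumulative baseline score."""
    return cumulative_score(technique_metrics) - cumulative_score(baseline_metrics)


# Hypothetical classification metrics (in percent) for one dataset and model.
baseline = {"accuracy": 90.0, "sensitivity": 88.0, "specificity": 92.0,
            "auroc": 95.0, "f1": 89.0}
augmented = {"accuracy": 91.5, "sensitivity": 89.0, "specificity": 92.5,
             "auroc": 96.0, "f1": 90.0}

print(cumulative_score(baseline))                 # 454.0
print(relative_improvement(augmented, baseline))  # 5.0
```

Aggregating over datasets and architectures then only requires summing these per-experiment scores, which is why the reported cumulative scores are on the order of thousands of points.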

We computed 95% confidence intervals for the mean improvements in classification and segmentation benchmarks using the standard error of the mean across datasets. To evaluate cross-dataset generalization, we calculated a composite score by taking the arithmetic mean of standard performance metrics, including accuracy, sensitivity, specificity, AUROC, IoU, and F1-score. This approach was used to assess the overall reliability and ranking of methods across a broad experimental landscape, directly measuring each method’s central effect size and its variability across diverse tasks.
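A confidence interval based on the standard error of the mean can be computed as in the sketch below. The multiplier is an assumption: the text does not state whether a normal quantile (1.96) or a Student t quantile was used, so the normal value is taken here and exposed as a parameter. The function name and the sample values are illustrative.

```python
import math
import statistics


def mean_ci95(values, z=1.96):
    """Return (mean, half-width) of an approximate 95% CI from the standard error."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample std (n - 1 denominator)
    return mean, z * sem


# Hypothetical per-dataset improvements in score points.
improvements = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, half_width = mean_ci95(improvements)
print(f"{mean:.2f} ± {half_width:.2f}")
```

For small numbers of datasets a t quantile would give a wider interval than the normal approximation used in this sketch.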

Training overhead was assessed by measuring training time for the selected augmentation strategy over 100 training epochs relative to a non-augmented baseline. The relative time overhead was calculated as:

$$\mathrm{Overhead} = \sum_{i = 1}^{n} \left( \sum_{e = 1}^{100} \mathrm{Technique}_{i,e} - \sum_{e = 1}^{100} \mathrm{Baseline}_{i,e} \right) \tag{7}$$

where $n$ is the total number of experiments, $i$ indexes the experiments, $e$ indexes the epochs, $\mathrm{Technique}_{i,e}$ is the training time of epoch $e$ in experiment $i$ with a data augmentation technique, and $\mathrm{Baseline}_{i,e}$ is the corresponding time for the non-augmented baseline.
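The overhead in Equation (7) is a difference of summed epoch times, accumulated over experiments. A minimal sketch follows, assuming per-epoch wall-clock times have already been recorded for each experiment; the function name is hypothetical, and the example uses three epochs per experiment instead of 100 to keep the numbers readable.

```python
def training_overhead(technique_epoch_times, baseline_epoch_times):
    """Equation (7): summed technique time minus summed baseline time, over experiments."""
    total = 0.0
    for technique, baseline in zip(technique_epoch_times, baseline_epoch_times):
        total += sum(technique) - sum(baseline)
    return total


# Two hypothetical experiments with per-epoch times in seconds.
technique_times = [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]
baseline_times = [[9.0, 9.0, 9.0], [18.0, 18.0, 18.0]]

print(training_overhead(technique_times, baseline_times))  # 9.0
```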

Statistical significance of performance differences was determined using paired t-tests, with p ≤ 0.05 considered statistically significant.
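The test statistic of a paired t-test can be written out explicitly, which clarifies what is being compared: the mean of the per-experiment differences against its standard error. The sketch below computes only the statistic and degrees of freedom with the Python standard library; obtaining the p-value requires the Student t distribution (for instance via `scipy.stats.ttest_rel`). The function name and the scores are illustrative.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for an augmentation technique and the baseline
# on the same four dataset-architecture combinations.
with_augmentation = [91.0, 85.5, 78.0, 88.0]
without_augmentation = [90.0, 84.0, 78.5, 86.0]

t_stat, dof = paired_t_statistic(with_augmentation, without_augmentation)
```

Pairing by dataset and architecture removes between-dataset variability from the comparison, so the test reflects only the consistent effect of the augmentation.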

3. Results

3.1. Benchmark Performance on 2D Medical Image Classification

To evaluate the efficacy of SULBA in 2D medical image classification, we performed an extensive benchmark across ten diverse 2D medical image datasets from the MedMNISTv2 suite, spanning dermatology, pathology, radiology, and histology. Performance was assessed using two widely adopted model architectures, a convolutional neural network (ResNet-18) and a vision transformer (Swin Transformer Tiny), both initialized with ImageNet-pretrained weights. A baseline model trained without data augmentation served as the reference for all comparisons. For all augmentation experiments, transformations were applied stochastically with the predefined application probability. SULBA and four conventional augmentation techniques (CutMix, Cutout, MixUp, and random erasing) were implemented with application probabilities ap = 0.5 and ap = 1.0, while commonly used spatial augmentations, including rotation, horizontal flip, and vertical flip, were implemented with ap = 0.5, consistent with standard practice.

3.1.1. SULBA Provides Robust Performance Gains Across Diverse Datasets and Model Architectures

The aggregate improvement heatmap (Figure 2a) indicates that SULBA (ap = 1.0) achieved positive performance gains in 8 out of 10 datasets when averaged across both model architectures, ranking highest among all evaluated augmentation techniques. Only OrganCMNIST (−0.99) and PathMNIST (−1.54) exhibited modest declines. Notably, SULBA was not uniquely disadvantaged but was among the most robust methods, demonstrating its superior failure tolerance. These declines likely stem from inherent dataset challenges, such as severe class imbalance, which constrain any augmentation technique operating on the existing sample distribution. Specifically, large improvements were observed on DermaMNIST (+25.16), PneumoniaMNIST (+17.16), and TissueMNIST (+24.71) (Supplementary Tables S1–S7). Overall, SULBA (ap = 1.0) attained a mean relative improvement of +5.56 score points over the non-augmented baseline (95% CI ± 2.61), the highest among all evaluated augmentation methods (Figure 2c). In a per-architecture analysis, SULBA (ap = 1.0) yielded the strongest mean percentage improvement for both ResNet-18 (+1.27%, 95% CI ± 0.88) and the Swin Transformer (+1.27%, 95% CI ± 0.89), followed by SULBA (ap = 0.5). Among competing techniques, random rotation achieved the best performance, with mean percentage improvements of +0.56% (95% CI ± 0.91) for ResNet-18 and +0.61% (95% CI ± 1.10) for the Swin Transformer. Collectively, these results demonstrate the robustness of SULBA across both convolutional and attention-based architectures (Figure 2b).

3.1.2. SULBA Demonstrates Superior and Consistent Performance Improvements

The benchmark analysis revealed that SULBA variants ranked highest among all tested data augmentation methods in the aggregate performance ranking. Across all datasets and both model architectures, SULBA (ap = 1.0) achieved the top cumulative score (9355.24), followed closely by SULBA (ap = 0.5) with a score of 9340.43 (Figure 2d; Supplementary Table S10). The performance margin between SULBA (ap = 1.0) and the strongest conventional augmentation, rotation (ap = 0.5), was 63 points, increasing to 95 points relative to the runner-up method (random erasing, ap = 0.5) and 205 points compared with the lowest-performing technique (vertical flip). Importantly, both SULBA variants consistently yielded positive performance gains across architectures, whereas several standard augmentations, including MixUp (ap = 1.0) and horizontal flip (ap = 0.5), exhibited negative mean relative improvements (Figure 2c), underscoring SULBA’s robustness and reliability.

3.1.3. The Integration of SULBA with Traditional Augmentations Does Not Confer Synergistic Benefits

We investigated whether combining SULBA with foundational spatial transforms (horizontal flip, vertical flip, rotation) could yield complementary effects. Contrary to expectation, these combinations consistently underperformed compared to SULBA applied alone (Figure 2e,f). For instance, combining SULBA (ap = 1.0) with horizontal flip resulted in a mean performance decrease of 0.39% compared to standalone SULBA, while combinations with vertical flip showed a more pronounced detrimental effect (−1.46%) (Figure 2f; Supplementary Tables S8 and S9). This suggests that SULBA’s cyclic-shift transformations may subsume or conflict with the benefits of heuristic spatial modifications, establishing SULBA as a performant standalone augmentation strategy.

3.2. Benchmark Performance on 3D Medical Image Classification

We extended our benchmark to the 3D domain using six volumetric medical imaging datasets from the 3D MedMNISTv2 suite (AdrenalMNIST, FractureMNIST, NoduleMNIST, OrganMNIST, SynapseMNIST, and VesselMNIST). The evaluation included nine volumetric augmentation techniques pertinent to 3D data, such as anisotropic scaling, bias field simulation, 3D elastic deformation, blurring, noise injection, flipping, and ghost artifacts, alongside the baseline and SULBA. Each DA method was assessed with R(2+1)D-18 and 3D Swin Transformer models, both initialized with weights pretrained on the Kinetics-400 natural video dataset. The selected augmentation strategies were implemented with application probabilities of 0.5 and 1.0.

3.2.1. SULBA Delivers Consistent and Exceptionally Large Improvements Across All 3D Datasets

The improvement heatmap (Figure 3a) reveals that SULBA (ap = 1.0) provided the highest positive gains for all datasets. The improvements were most pronounced on VesselMNIST (+87.73), FractureMNIST (+62.67), and SynapseMNIST (+60.24). The mean relative improvement for SULBA (ap = 1.0) was +24.52 score points (95% CI ± 9.45), surpassing all other DA techniques (Figure 3c). Architecturally, SULBA (ap = 1.0) showed robust gains for both backbones, with a mean percentage improvement of +6.49% (95% CI ± 3.29) for R(2+1)D-18 and +6.76% (95% CI ± 4.42) for the 3D Swin Transformer (Figure 3b).

3.2.2. Traditional 3D Augmentations Show High Dataset-Specific Variance and Inconsistent Effects

While techniques like blurring and anisotropy provided moderate aggregate benefits, their effects varied widely across datasets and were sometimes severely negative. Other techniques demonstrated overall degradation, such as flipping (ap = 1.0) on OrganMNIST (−66.26) (Figure 3a; Supplementary Tables S11–S14). In contrast, SULBA’s cyclic-shift approach generated uniformly high, positive impacts, underscoring its generalizability and reliability for 3D medical image analysis.

3.2.3. SULBA Substantially Outperforms Standard Volumetric Augmentation Techniques in 3D Classification

In the aggregate performance ranking across 3D classification benchmarks, SULBA (ap = 1.0) achieved the highest cumulative score (5011.45), followed closely by SULBA (ap = 0.5) with a score of 4980.96 (Figure 3d; Supplementary Tables S15 and S16). Both SULBA variants outperformed all conventional volumetric augmentation techniques. SULBA (ap = 1.0) exceeded the best-performing traditional method (blurring, ap = 1.0) by 211 points and the lowest-performing technique (ghosting, ap = 1.0) by 348 points. These results highlight SULBA’s pronounced and consistent superiority in the volumetric classification setting.

3.3. Benchmark Performance on 2D Medical Image Segmentation

To validate SULBA’s efficacy on dense prediction tasks, we benchmarked it against seven augmentation methods across seven diverse 2D medical image segmentation datasets from MedSegBench using both CNN (U-Net with an ImageNet-pretrained ResNet-18 encoder) and Transformer (SegFormer with an ImageNet-pretrained MiT-B1 backbone) segmentation architectures. Standard data augmentation techniques were evaluated at application probabilities ap = 0.5 and ap = 1.0. Flipping and rotation techniques were evaluated at ap = 0.5.

3.3.1. SULBA Provides Robust, Positive Improvements Across Diverse Segmentation Datasets

The improvement heatmap shows that SULBA (ap = 1.0) delivered strong gains on AbdomenUSMSBench (+16.28), Bkai-Igh-MSBench (+16.83), and CystoFluidMSBench (+11.47) (Figure 4a). The mean relative improvement for SULBA (ap = 1.0) was +4.49 score points (95% CI ± 2.15), surpassing all other methods (Figure 4c). Architecturally, SULBA showed consistent gains for both backbones, with mean percentage improvements of +1.58% (ResNet-18, 95% CI ± 0.64) and +1.22% (SegFormer, 95% CI ± 1.39) (Figure 4b; Supplementary Table S21).

3.3.2. SULBA Ranks as the Top-Performing Augmentation Strategy for 2D Segmentation

In the aggregate performance ranking across seven 2D segmentation datasets and two model architectures, SULBA (ap = 1.0) achieved the highest cumulative score (4655.95), closely followed by SULBA (ap = 0.5) (4649.50) (Figure 4d; Supplementary Table S22). SULBA (ap = 1.0) outperformed the best-performing conventional augmentation (rotation) by 13 points, exceeded the runner-up method (horizontal flip) by 21 points, and surpassed the weakest-performing technique (MixUp, ap = 1.0) by 171 points. Notably, SULBA and its variant were among only three augmentation methods that improved performance in more than 85% of dataset-architecture combinations, alongside random erasing and rotation (Supplementary Tables S17–S20). This consistent superiority demonstrates that SULBA’s advantages extend robustly beyond classification to the more challenging task of dense, pixel-wise segmentation.
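The 85% threshold can be made concrete. Seven datasets and two architectures give 14 dataset-architecture combinations, so exceeding 85% requires a positive change in at least 12 of them (12/14 ≈ 85.7%). A minimal sketch of that tally follows; the function name `improvement_rate` and the per-combination changes are hypothetical and are not taken from the supplementary tables.

```python
def improvement_rate(deltas):
    """Fraction of dataset-architecture combinations with a positive change."""
    return sum(d > 0 for d in deltas) / len(deltas)


# Hypothetical changes versus the non-augmented baseline for
# 7 datasets x 2 architectures = 14 combinations (two are negative).
deltas = [1.2, 0.4, 3.1, -0.3, 2.2, 0.9, 5.0,
          0.1, 1.7, -0.8, 0.6, 2.4, 1.1, 0.2]

rate = improvement_rate(deltas)
print(f"improved in {rate:.1%} of combinations")
```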

3.3.3. Combining SULBA with Spatial Augmentations Provides Marginal and Inconsistent Benefits

When SULBA was combined with traditional spatial augmentations, the effects were small and varied by probability setting. For SULBA (ap = 0.5), pairing with horizontal flip yielded a slight improvement of +0.29%, and with rotation +0.11%, while vertical flip resulted in a minor decline of −0.14%. In contrast, SULBA (ap = 1.0) combinations led to small decreases across all spatial transforms: horizontal flip (−0.004%), rotation (−0.17%), and vertical flip (−0.22%) (Figure 4f).

The magnitude of these changes was minimal, all below 0.30% and inconsistent across probability settings, indicating no reliable synergistic gain (Supplementary Tables S21 and S22). These results reinforce that SULBA alone provides near-optimal augmentation for 2D segmentation, with traditional spatial transforms offering little complementary benefit. This observation further simplifies pipeline design by eliminating the need to combine multiple augmentation strategies.

3.4. Benchmark Performance on 3D Medical Image Segmentation

We evaluated SULBA on 3D segmentation tasks using three volumetric datasets: IXITiny and two tasks from the Medical Segmentation Decathlon (MSD), namely MSD-Heart and MSD-Hippocampus. The benchmark included nine volumetric augmentation techniques spanning geometric transformations, intensity perturbations, and artifact-related augmentations, each applied with application probabilities of ap = 0.5 and 1.0. Experiments were conducted using two representative, randomly initialized segmentation architectures: a convolutional 3D U-Net and the transformer-based SwinUNETR. A baseline model trained without augmentation was used as the reference for all comparisons.

3.4.1. SULBA Delivers Consistent Improvements Across 3D Segmentation Datasets

The improvement heatmap shows that SULBA (ap = 1.0) provided the highest gains on all three datasets, with particularly pronounced improvements on MSD-Heart (+54.82) and MSD-Hippocampus (+18.44) (Figure 5a). The mean relative improvement for SULBA (ap = 1.0) was +12.75 score points (95% CI ± 9.56), a 4.68-point improvement over the best competing method (elastic deformation, ap = 1.0) (Figure 5c). Architecturally, SULBA demonstrated consistent benefits, with mean percentage improvements of +4.28% (95% CI ± 5.89) for 3D U-Net and +4.85% (95% CI ± 5.91) for SwinUNETR (Figure 5b).

3.4.2. Conventional 3D Augmentation Methods Exhibit Pronounced Dataset-Dependent Variability

Many established volumetric augmentation techniques produced highly variable and often negative effects across datasets and model architectures. On the MSD-Heart dataset, all augmentation method-probability combinations reduced performance except SULBA and elastic deformation, with several techniques inducing relative declines exceeding 28 points. Although certain methods, such as elastic deformation, yielded moderate improvements on specific datasets, their effects were inconsistent and strongly dependent on both the dataset and the applied probability (0.5 or 1.0) (Figure 5a; Supplementary Tables S23 and S24). Comparable instability was observed on IXITiny and MSD-Hippocampus, where performance gains varied widely across augmentation strategies and experimental settings. Together, these findings highlight the limited reliability of conventional 3D augmentation methods, which may improve performance in isolated cases but frequently degrade it in others, underscoring the need for more robust and task-agnostic augmentation strategies.

3.4.3. SULBA Achieved the Highest Overall Ranking Among 3D Augmentation Strategies

Across all three volumetric datasets and both evaluated architectures, SULBA consistently delivered positive performance gains and achieved the strongest overall results. SULBA (ap = 1.0) attained the highest cumulative score (2023.55), followed closely by SULBA (ap = 0.5) with a score of 2014.51. The leading SULBA variant outperformed the best competing conventional method, random elastic deformation (ap = 1.0), by 28 points; exceeded the runner-up technique, random ghosting (ap = 1.0), by 73 points; and surpassed the lowest-performing method, bias field (ap = 1.0), by 111 points (Figure 5d; Supplementary Tables S25 and S26). Collectively, these results establish SULBA as the most effective and robust augmentation strategy among all evaluated 3D methods, delivering reliable improvements across datasets and architectural paradigms.

3.5. Generalization Performance Across Diverse Architectures

To evaluate SULBA’s ability to generalize beyond the training distribution, models were trained on PneumoniaMNIST and tested on an independent chest X-ray pneumonia dataset [36]. Seven augmentation strategies (CutMix, Cutout, MixUp, random erasing, horizontal flip, rotation, and vertical flip) were benchmarked on four architectures (ResNet-18, Swin Transformer (Tiny), MobileNet V3 (small variant), and MobileViT_xxs), each implemented with both randomly initialized and ImageNet-pretrained weights. Traditional augmentations were applied at application probabilities of 0.5 and 1.0, while spatial augmentations used a fixed probability of 0.5, consistent with standard practice. All selected data augmentation strategies were compared on all eight model variants (Figure 6; Supplementary Tables S27 and S28).

3.5.1. SULBA Delivers Superior Cross-Dataset Generalization

SULBA demonstrated robust and consistent cross-dataset generalization across all eight evaluated model variants, achieving positive composite gains in every case (Figure 6a). The largest improvements were observed for randomly initialized models, including MobileNet V3 (+52.29) and Swin Transformer (+33.15) (Supplementary Table S28). Overall, SULBA (ap = 1.0) achieved the highest mean composite score (22.03; 95% CI ± 12.92), an 8.63 improvement over the best competing method, CutMix (ap = 1.0; 13.40; 95% CI ± 8.71) (Figure 6c). In the cumulative performance ranking, SULBA (ap = 1.0) attained the highest total score (3703.84), followed closely by SULBA (ap = 0.5) with a score of 3694.12, outperforming all seven evaluated augmentation strategies across both application probabilities. The leading SULBA variant exceeded the strongest competing method, CutMix (ap = 1.0), by 69 points, rotation by 82 points, and the lowest-performing method, MixUp (ap = 0.5), by 179 points (Figure 6f; Supplementary Table S29). Notably, while conventional augmentations exhibited negative or inconsistent effects when applied to pretrained models, SULBA maintained uniformly positive gains across both randomly initialized and pretrained training regimes, underscoring its reliability for cross-dataset generalization as well as for models trained de novo.

3.5.2. SULBA Provides Consistent Improvements Across Architectures

SULBA demonstrated remarkably stable performance across architectural families, with low variability quantified by a coefficient of variation (CV) of 0.79 for SULBA (ap = 1.0), the lowest among all methods (Figure 6b). In contrast, conventional augmentations exhibited high instability. Vertical flip (ap = 0.5) demonstrated a CV of 42.67, and MixUp (ap = 0.5) showed a CV of 53.63, reflecting unpredictable and frequent detrimental effects on individual architectures (Figure 6b; Supplementary Tables S27 and S28). SULBA’s consistency, coupled with its uniformly positive contributions across both randomly initialized and pretrained models (Figure 6d), underscores its reliability as an architecture-agnostic augmentation strategy.
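The coefficient of variation compares the spread of gains with their mean, so a method with modest but steady gains scores lower than one whose gains swing between architectures. The sketch below assumes the common definition (sample standard deviation divided by the absolute mean); this section does not state the exact formula used, and the gain values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (assumed definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / abs(mean)


# Hypothetical composite gains across four architectures.
stable_gains = [20.0, 22.0, 24.0, 26.0]    # consistently positive
erratic_gains = [-10.0, 1.0, 12.0, 1.0]    # sign flips, mean close to zero

cv_stable = coefficient_of_variation(stable_gains)
cv_erratic = coefficient_of_variation(erratic_gains)
print(f"stable CV = {cv_stable:.2f}, erratic CV = {cv_erratic:.2f}")
```

Under this definition, a CV far above 1 arises when the mean gain is small relative to its spread, which is the pattern expected when a method helps some architectures and harms others.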

3.5.3. Training with Randomly Initialized Weights Amplifies SULBA’s Benefits

Analysis of training paradigms revealed that SULBA delivered substantially larger gains in models trained with randomly initialized weights than in those initialized with ImageNet-pretrained weights. The performance increase in randomly initialized models over pretrained models reached +27.68 for SULBA (ap = 0.5) and +27.64 for SULBA (ap = 1.0), the largest differential observed among all augmentation methods (Figure 6e). This advantage was consistent across architectural families, with the randomly initialized variants of MobileNet V3 (+52.29) and Swin Transformer (+33.15) showing the most pronounced absolute improvements under SULBA (ap = 1.0) (Figure 6a). In contrast, traditional augmentation methods such as MixUp and vertical flip exhibited inconsistent or negative effects in the pretrained setting, further highlighting SULBA’s reliability (Supplementary Table S27). Together, these results underscore SULBA’s particular value in data-scarce regimes or when transfer learning from natural images is infeasible or suboptimal for the target medical domain.

3.6. Average Training Time Overhead

To evaluate the practical efficiency of SULBA, we measured the average training time overhead over 100 epochs for all augmentation methods relative to a non-augmented baseline. This analysis encompassed both 2D and 3D classification and segmentation benchmarks.

3.6.1. Average Training Time of 2D Augmentation

The mean training time overhead for 2D augmentation techniques is summarized in Figure 7a. Among the evaluated methods, SULBA introduced minimal computational cost. SULBA (ap = 1.0) incurred an average overhead of 45 s per 100 epochs, while SULBA (ap = 0.5) incurred 39 s. These values were among the lowest recorded, comparable to the overhead of simple spatial transforms like horizontal flip (29 s) and substantially lower than that of resource-intensive methods such as MixUp (ap = 1.0; 241 s) and CutMix (ap = 1.0; 203 s). Statistical analysis indicated that only CutMix (p = 0.007 at ap = 1.0; p = 0.018 at ap = 0.5) and MixUp (p = 0.021 at ap = 1.0; p = 0.028 at ap = 0.5) differed significantly from the non-augmented baseline. The average training time overhead for SULBA and other lightweight transforms was not statistically distinguishable from the baseline.

3.6.2. Average Training Time Overhead of 3D Volumetric Augmentation

For 3D volumetric data, the computational cost of augmentation was generally higher, as shown in Figure 7b. Consistent with the 2D results, SULBA remained highly efficient. The overhead for SULBA (ap = 1.0) was 112 s per 100 epochs, and for SULBA (ap = 0.5) it was 79 s. In contrast, geometrically complex transformations like anisotropy (ap = 1.0; 361 s) and elastic deformation (ap = 1.0; 355 s) imposed the greatest costs, each more than three times SULBA’s overhead at ap = 1.0. Statistical analysis revealed that transformation methods, including anisotropy (p = 0.033), bias field (p = 0.002), elastic deformation (p = 0.024), ghosting (p = 0.016), and spike (p = 0.041) at their ap = 1.0 settings, demonstrated a significant increase in average training time relative to the baseline.

4. Discussion

Data augmentation is a cornerstone of contemporary deep learning, enhancing model robustness and generalization. In medical imaging, however, no established augmentation approach has demonstrated reliable cross-task, cross-modal, or cross-architectural transferability [18,19,20]. As a result, augmentation pipelines are typically constructed through extensive trial-and-error tuning, which increases experimental burden and undermines reproducibility [13,22]. Our large-scale benchmarks across classification, segmentation, and cross-dataset generalization tasks demonstrate that SULBA offers a simple, deterministic, and parameter-free alternative. Unlike conventional augmentation strategies, SULBA systematically transforms data along intrinsic dimensions without modifying pixel intensities, interpolating content, or altering local structure (Figure 1). Its consistent performance across convolutional and transformer-based architectures, as well as across diverse imaging domains, establishes SULBA as a reliable default augmentation strategy that removes the need for task-specific parameter tuning and directly addresses long-standing inefficiencies in medical data augmentation practice.

SULBA can also be positioned relative to existing concepts such as permutation-based augmentations and equivariant learning. General random permutation or patch-shuffling techniques often introduce biologically implausible discontinuities, violating the structural consistency required in medical images [17]. SULBA avoids this by employing a structured, whole-axis cyclic shift that guarantees the preservation of all local neighborhoods and tissue continuities within each transformed view. Compared to methods designed for equivariant representation learning, which impose architectural constraints, SULBA provides a data-centric regularization that is compatible with any standard network. Its deterministic and reversible nature ensures no loss or synthetic generation of image content, addressing a core limitation of many generative and corruption-based augmentation methods in the clinical domain [19]. These properties present SULBA as a principled, domain-aware operationalization of cyclic transformations, tailored to the constraints and requirements of medical image analysis. Consequently, a user can apply SULBA as a default augmentation without conducting a hyperparameter search. SULBA’s contribution therefore lies in demonstrating that the structured application of cyclic shifts forms a universally effective and reliable augmentation framework for medical imaging.
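The claims of reversibility and content preservation can be checked directly on a toy volume. The sketch below uses NumPy's `np.roll` as a stand-in for a whole-axis cyclic shift; the volume, the per-axis offsets, and the variable names are assumptions for illustration and are not taken from the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy volume standing in for a 3D scan.
volume = rng.integers(0, 256, size=(4, 5, 6), dtype=np.uint8)

shifts = (1, 2, 3)  # assumed offsets, one per axis
axes = (0, 1, 2)

# Whole-axis cyclic shift: content leaving one boundary re-enters at the opposite one.
shifted = np.roll(volume, shift=shifts, axis=axes)

# Applying the negated offsets undoes the shift exactly.
restored = np.roll(shifted, shift=tuple(-s for s in shifts), axis=axes)

# No intensity is created, removed, or interpolated: the sorted values are identical.
same_values = np.array_equal(np.sort(volume, axis=None), np.sort(shifted, axis=None))
reversible = np.array_equal(volume, restored)
print(same_values, reversible)
```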

Conventional augmentation approaches typically rely on stochastic deformation, interpolation, partial occlusion, or content mixing [49,50,51,52,53,54,55,56]. While such methods can improve performance in specific settings, their effectiveness is often highly sensitive to anatomical orientation, acquisition protocol, label structure, and model architecture [12,13]. Mixing-based techniques may generate biologically implausible tissue combinations, while aggressive cropping or erasure risks removing clinically salient regions [17], and widely adopted geometric transformations can disrupt critical spatial relationships [26]. Consequently, prior studies have emphasized that augmentation performance is difficult to predict a priori and frequently requires dataset-specific tuning [57,58]. Consistent with these observations, our benchmarks reveal substantial dataset-dependent variability among conventional methods (Figure 2, Figure 3, Figure 4 and Figure 5). In contrast, SULBA exhibits stable performance across architectures, dimensionalities (2D and 3D), and task types (classification and segmentation), indicating that its benefits do not depend on narrow inductive biases but instead arise from a more general, architecture-agnostic regularization mechanism.

The robustness of SULBA stems from its mechanistic design. By applying cyclic shifts along data dimensions, SULBA systematically repositions existing image content while preserving pixel information (Equation (1)). Repeated application across training epochs exposes models to structured yet diverse views of the same underlying sample, reinforcing feature representations without introducing artifacts or synthetic content. This encourages distributed, position-tolerant feature learning and yields consistent gains across both classification and segmentation tasks. In segmentation tasks, exact voxel-level correspondence between inputs and labels is preserved, enabling reliable dense prediction without auxiliary heuristics. Experimental analysis of shift offsets further demonstrates that even a limited number of cyclic shifts can generate sufficient diversity to improve model generalization, particularly for small or low-resolution volumetric inputs (e.g., 3D benchmark using 32 × 32 × 32 voxel images; Figure 5).
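For segmentation, the key property is that the image and its label map receive the same shift, so every pixel keeps its label. The sketch below illustrates this with `np.roll`; the helper `cyclic_shift_pair` and the epoch-dependent schedule (offset grows by one per epoch, axis alternates) are hypothetical choices for illustration and are not the stepwise rule of Equation (1), which is defined outside this section.

```python
import numpy as np


def cyclic_shift_pair(image, mask, offset, axis):
    """Apply the same cyclic shift to an image and its label mask."""
    return np.roll(image, offset, axis=axis), np.roll(mask, offset, axis=axis)


# Toy 2D image and a label that marks pixels with intensity >= 10.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = (image >= 10).astype(np.uint8)

for epoch in range(3):
    # Hypothetical schedule: a different offset and axis in each epoch.
    offset = epoch + 1
    axis = epoch % image.ndim
    img_s, msk_s = cyclic_shift_pair(image, mask, offset, axis)
    # Pixel-level correspondence survives the shift: labels still match intensities.
    assert np.array_equal(msk_s, (img_s >= 10).astype(np.uint8))
    print(f"epoch {epoch}: offset={offset}, axis={axis}, labeled pixels={int(msk_s.sum())}")
```

Because the shift only relocates existing values, the number of labeled pixels is the same in every view.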

SULBA’s impact is most pronounced in data-scarce regimes, including training from random initialization without pretrained representations (Figure 5b and Figure 6e). In such settings, early feature learning is especially vulnerable to spurious correlations and dataset-specific biases due to the absence of strong inductive priors [59,60]. By systematically reconfiguring input structure, SULBA mitigates this vulnerability and promotes the learning of position-invariant features that generalize beyond the training distribution. Across cross-dataset evaluations, SULBA consistently improved performance on independent test sets, indicating enhanced robustness under distributional shift. In the generalization experiment, SULBA-augmented models achieved nearly twofold higher mean composite scores compared with competing augmentation strategies across four architectures trained without pretrained representation, spanning both convolutional and transformer-based models (Figure 6c). Similarly, on 3D volumetric datasets such as MSD-Heart and IXITiny, SULBA produced uniformly strong gains for both 3D U-Net and SwinUNETR models initialized with random weights (Figure 5a). Together, these results indicate that SULBA regularizes early feature learning and mitigates the risks associated with limited labeled data across modalities and dimensionalities.

Across extensive benchmarks, SULBA demonstrated highly stable performance under diverse experimental settings (Supplementary Tables S10, S16, S22 and S26). Performance remained robust to the choice of application probability, with both evaluated settings yielding consistent gains (Figure 2, Figure 3, Figure 4 and Figure 5). When combined with conventional spatial transformations, SULBA maintained high performance. However, these combinations provided no systematic advantage in classification tasks and only modest, dataset-specific improvements in select 2D segmentation benchmarks. These results suggest that SULBA’s intrinsic transformation accounts for the dominant augmentation effect (Figure 2e and Figure 4e). Supporting this interpretation, cross-dataset analyses revealed that SULBA exhibited the lowest coefficients of variation across all evaluated architectures, irrespective of whether models were trained de novo or with pretrained weights (Figure 6b). This low variability underscores SULBA’s predictability and reliability as a default augmentation strategy, beyond improvements in mean performance alone.

The quantitative analysis of computational overhead (Section 3.6) provides empirical validation that SULBA is a computationally lightweight framework. Its core operation, a deterministic, non-interpolative cyclic shift, is fundamentally cheaper than transformations requiring intensive pixel-level computation, such as interpolation for elastic deformation or blending for mixing-based techniques. This mechanistic efficiency is reflected in the results: the training time overhead for SULBA was negligible and statistically indistinguishable from the non-augmented baseline, in contrast to the significant runtime penalties incurred by resource-intensive methods such as CutMix, MixUp, and elastic deformation (Figure 7). Furthermore, SULBA operates as an online, in-place transformation, generating variations on-the-fly during data loading. This design eliminates the need for pre-computation or the storage of augmented samples, thereby introducing zero persistent memory overhead, a critical advantage for processing high-resolution 3D volumes. Consequently, SULBA delivers the robust performance gains demonstrated in Section 3.1, Section 3.2, Section 3.3, Section 3.4 and Section 3.5 with minimal runtime penalty, establishing it not only as an effective but also as an efficient default choice for both 2D and 3D medical imaging deep learning analysis pipelines. Together, these findings position SULBA as a reliable, universal default for data augmentation in medical image analysis with the potential to improve experimental reproducibility and accelerate the development of robust, generalizable, clinically translatable AI models.
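The on-the-fly behavior and the role of the application probability ap can be sketched as a small callable that is invoked each time a sample is loaded. The class `OnTheFlyCyclicShift` is hypothetical, and for brevity it draws the axis and offset at random, which is an assumption and not the stepwise rule used by SULBA. Note also that `np.roll` returns a new array, so this sketch is online but not strictly in-place.

```python
import numpy as np


class OnTheFlyCyclicShift:
    """Hypothetical transform: cyclic shift applied with probability ap."""

    def __init__(self, ap=1.0, seed=None):
        if not 0.0 <= ap <= 1.0:
            raise ValueError("ap must lie in [0, 1]")
        self.ap = ap
        self.rng = np.random.default_rng(seed)

    def __call__(self, sample):
        # With probability 1 - ap the sample is returned unchanged.
        if self.rng.random() >= self.ap:
            return sample
        # Assumption: random axis and non-zero offset, chosen here for brevity.
        axis = int(self.rng.integers(sample.ndim))
        offset = int(self.rng.integers(1, sample.shape[axis]))
        return np.roll(sample, offset, axis=axis)


transform = OnTheFlyCyclicShift(ap=0.5, seed=0)
dataset = [np.arange(64).reshape(8, 8) for _ in range(4)]  # toy stand-in for stored images

for epoch in range(2):
    # Augmented views exist only for the duration of the batch; nothing is written to disk.
    batch = [transform(x) for x in dataset]
    print(f"epoch {epoch}: {len(batch)} views generated")
```

Setting ap = 1.0 shifts every sample, while ap = 0.5 leaves roughly half of the samples in their original position in each epoch.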

Despite these strengths, our evaluation primarily relies on curated, publicly available datasets. Although these datasets span a wide range of modalities, tasks, and architectures, they cannot fully capture the heterogeneity encountered in prospective clinical environments, including site-specific acquisition protocols, scanner variability, and population-level differences [61,62]. Consequently, while the observed gains strongly suggest generalizability, real-world deployment may introduce additional challenges. Future work will therefore focus on multi-institutional validation, longitudinal evaluation, and deployment under heterogeneous clinical conditions to more comprehensively assess robustness and reproducibility in operational settings.

5. Conclusions

This study presents SULBA: a simple, parameter-free, and dimension-agnostic framework based on deterministic cyclic shifts designed to directly address the augmentation lottery in medical imaging deep learning. Comprehensive benchmarks across 27 datasets establish SULBA as a consistently top-performing default strategy. It demonstrates robust efficacy across architectures (convolutional and transformer), dimensionalities (2D and 3D), tasks (classification and segmentation), and diverse modalities, all while introducing minimal computational overhead. For practitioners, applying SULBA with a probability of 1.0 offers a reliable, tuning-free starting point, eliminating the need for task-specific augmentation search.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/diagnostics16101546/s1, Tables S1–S10: Benchmark performance on 2D medical image classification. Tables S11–S16: Benchmark performance on 3D medical image classification. Tables S17–S22: Benchmark performance on 2D medical image segmentation. Tables S23–S26: Benchmark performance on 3D medical image segmentation. Tables S27–S29: Generalization performance of data augmentation techniques across diverse neural architectures using the PneumoniaMNIST (training set) and Chest X-ray Pneumonia dataset (test set). Tables S30–S31: Datasets and training details.

Author Contributions

Conceptualization, A.A.A.; methodology, A.A.A.; software, A.A.A.; validation, A.A.A.; formal analysis, A.A.A.; investigation, A.A.A.; resources, A.A.A.; data curation, A.A.A.; writing—original draft preparation, A.A.A.; writing—review and editing, A.A.A. and M.N.; visualization, A.A.A.; supervision, A.A.A. and M.N.; project administration, A.A.A.; funding acquisition, A.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Department of Science and Innovation and the Centre for Scientific and Industrial Research (IBS) Programme, South Africa (grant number 201909607).

Institutional Review Board Statement

The study was approved by the Sefako Makgatho Health Sciences University Research Ethics Committee on 3 February 2022 with approval number SMUREC/M/12/2022: PG.

Informed Consent Statement

The study involved the analysis of publicly available, de-identified benchmark datasets. No human subjects were directly involved, and ethical review or patient consent was not required.

Data Availability Statement

All datasets used in this study are publicly available. The MedMNIST v2 datasets were obtained from their official public repository. Additional 2D and 3D segmentation datasets, including MSBench datasets, IXITiny, and Medical Segmentation Decathlon (MSD) tasks, are accessible from their respective public sources. The independent chest X-ray pneumonia dataset used for cross-dataset generalization is also publicly available. All datasets were accessed from their original repositories. Detailed dataset descriptions, access information, and preprocessing settings are provided in Supplementary Table S31. No new datasets were generated during this study. The SULBA data augmentation code (Version 1.0), including evaluation, benchmark data augmentation implementation, and figure generation, is available at the GitHub repository: https://github.com/Saintcodded/SULBA-Stepwise-Upper-and-Lower-Boundaries-Augmention.git (accessed on 15 January 2026).

Conflicts of Interest

Author A.A.A. is the founder of AureXida Inc., has equity in the company, and receives no income from it at present. Author M.N. declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

n	Number
C	Cumulative
M	Metric
2D	Two-Dimensional
3D	Three-Dimensional
ap	Application Probability
DA	Data Augmentation
AI	Artificial Intelligence
CNN	Convolutional Neural Network
MSD	Medical Segmentation Decathlon
SULBA	Stepwise Upper and Lower Boundaries Augmentation
AUROC	Area Under the Receiver Operating Characteristic Curve

References

Tian, F.; Liu, D.; Wei, N.; Fu, Q.; Sun, L.; Liu, W.; Sui, X.; Tian, K.; Nemeth, G.; Feng, J.; et al. Prediction of tumor origin in cancers of unknown primary origin with cytology-based deep learning. Nat. Med. 2024, 30, 1309–1319.
Kumar, R.; Kumbharkar, P.; Vanam, S.; Sharma, S. Medical images classification using deep learning: A survey. Multimed. Tools Appl. 2024, 83, 19683–19728.
Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654.
Kshatri, S.S.; Singh, D. Convolutional neural network in medical image analysis: A review. Arch. Comput. Methods Eng. 2023, 30, 2793–2810.
Takahashi, S.; Sakaguchi, Y.; Kouno, N.; Takasawa, K.; Ishizu, K.; Akagi, Y.; Aoyama, R.; Teraya, N.; Bolatkan, A.; Shinkai, N.; et al. Comparison of vision transformers and convolutional neural networks in medical image analysis: A systematic review. J. Med. Syst. 2024, 48, 84.
Khan, R.F.; Lee, B.D.; Lee, M.S. Transformers in medical image segmentation: A narrative review. Quant. Imaging Med. Surg. 2023, 13, 8747.
Tudosiu, P.D.; Pinaya, W.H.; Ferreira Da Costa, P.; Dafflon, J.; Patel, A.; Borges, P.; Fernandez, V.; Graham, M.S.; Gray, R.J.; Nachev, P.; et al. Realistic morphology-preserving generative modelling of the brain. Nat. Mach. Intell. 2024, 6, 811–819.
Dhar, T.; Dey, N.; Borra, S.; Sherratt, R.S. Challenges of deep learning in medical image analysis—Improving explainability and trust. IEEE Trans. Technol. Soc. 2023, 4, 68–75.
Price, W.N.; Cohen, I.G. Privacy in the age of medical big data. Nat. Med. 2019, 25, 37–43.
Xu, C.; Coen-Pirani, P.; Jiang, X. Empirical study of overfitting in deep learning for predicting breast cancer metastasis. Cancers 2023, 15, 1969.
Azizi, S.; Culp, L.; Freyberg, J.; Mustafa, B.; Baur, S.; Kornblith, S.; Chen, T.; Tomasev, N.; Mitrović, J.; Strachan, P.; et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 2023, 7, 756–779.
Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
Kebaili, A.; Lapuyade-Lahorgue, J.; Ruan, S. Deep learning approaches for data augmentation in medical imaging: A review. J. Imaging 2023, 9, 81.
Makhlouf, A.; Maayah, M.; Abughanam, N.; Catal, C. The use of generative adversarial networks in medical image augmentation. Neural Comput. Appl. 2023, 35, 24055–24068.
Wang, J.; Wang, K.; Yu, Y.; Lu, Y.; Xiao, W.; Sun, Z.; Liu, F.; Zou, Z.; Gao, Y.; Yang, L.; et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications. Nat. Med. 2025, 31, 609–617.
Fujii, Y.; Uchida, D.; Sato, R.; Obata, T.; Akihiro, M.; Miyamoto, K.; Morimoto, K.; Terasawa, H.; Yamazaki, T.; Matsumoto, K.; et al. Effectiveness of data-augmentation on deep learning in evaluating rapid on-site cytopathology at endoscopic ultrasound-guided fine needle aspiration. Sci. Rep. 2024, 14, 22441.
Abe, A.A.; Nyathi, M. Lung Cancer Diagnosis from Computed Tomography Images Using Deep Learning Algorithms with Random Pixel Swap Data Augmentation: Algorithm Development and Validation Study. JMIR Bioinform. Biotechnol. 2025, 6, e68848.
Saad, M.M.; O’Reilly, R.; Rehmani, M.H. A survey on training challenges in generative adversarial networks for biomedical image analysis. Artif. Intell. Rev. 2024, 57, 19.
Islam, S.; Aziz, M.T.; Nabil, H.R.; Jim, J.R.; Mridha, M.F.; Kabir, M.M.; Asai, N.; Shin, J. Generative adversarial networks (GANs) in medical imaging: Advancements, applications, and challenges. IEEE Access 2024, 12, 35728–35753.
Rao, A.; Lee, J.Y.; Aalami, O. Studying the impact of augmentations on medical confidence calibration. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2023, Paris, France, 2–3 October 2023; pp. 2462–2472.
Sun, D.; Dornaika, F. Data augmentation for deep visual recognition using superpixel based pairwise image fusion. Inf. Fusion 2024, 107, 102308.
Pineau, J.; Vincent-Lamarre, P.; Sinha, K.; Larivière, V.; Beygelzimer, A.; d’Alché-Buc, F.; Fox, E.; Larochelle, H. Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program). J. Mach. Learn. Res. 2021, 22, 1–20.
Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
Pérez-García, F.; Sparks, R.; Ourselin, S. TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed. 2021, 208, 106236.
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 2016, Savannah, Georgia, 2–4 November 2016; pp. 265–283.
Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
Zoph, B.; Cubuk, E.D.; Ghiasi, G.; Lin, T.Y.; Shlens, J.; Le, Q.V. Learning data augmentation strategies for object detection. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020; pp. 566–583. [Google Scholar]
Cohen, T.; Welling, M. Group equivariant convolutional networks. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2016; pp. 2990–2999. [Google Scholar]
Zhang, Y.; Hare, J.; Prugel-Bennett, A. Deep set prediction networks. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December, 2019. [Google Scholar]
Gerken, J.E.; Aronsson, J.; Carlsson, O.; Linander, H.; Ohlsson, F.; Petersson, C.; Persson, D. Geometric deep learning and equivariant neural networks. Artif. Intell. Rev. 2023, 56, 14605–14662. [Google Scholar] [CrossRef]
Diaz-Peregrino, R.; Robles, F.T.; Gonzalez, G.; Palma, R.; Escalante-Ramirez, B.; Olveres, J.; Reyes-Gonzalez, J.P.; Gomez-Coeto, J.A.; Rodriguez-Herrera, C.A. Enhancing generalization in whole-body MRI-based deep learning models: A novel data augmentation pipeline for cross-platform adaptation. Intell.-Based Med. 2025, 12, 100277. [Google Scholar] [CrossRef]
Yang, J.; Shi, R.; Wei, D.; Liu, Z.; Zhao, L.; Ke, B.; Pfister, H.; Ni, B. Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Sci. Data 2023, 10, 41. [Google Scholar] [CrossRef]
Kuş, Z.; Aydin, M. MedSegBench: A comprehensive benchmark for medical image segmentation in diverse data modalities. Sci. Data 2024, 11, 1283. [Google Scholar] [CrossRef]
Antonelli, M.; Reinke, A.; Bakas, S.; Farahani, K.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B.; Ronneberger, O.; Summers, R.M.; et al. The medical segmentation decathlon. Nat. Commun. 2022, 13, 4128. [Google Scholar] [CrossRef]
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2009; pp. 248–255. [Google Scholar]
Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018, 172, 1122–1131. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2021, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; Paluri, M. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6450–6459. [Google Scholar]
Liu, Z.; Ning, J.; Cao, Y.; Wei, Y.; Zhang, Z.; Lin, S.; Hu, H. Video swin transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 3202–3211. [Google Scholar]
Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar] [CrossRef]
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Cham, Switzerland, 2016; pp. 424–432. [Google Scholar]
Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In International MICCAI Brainlesion Workshop; Springer International Publishing: Cham, Switzerland, 2021; pp. 272–284. [Google Scholar]
Cardoso, M.J.; Li, W.; Brown, R.; Ma, N.; Kerfoot, E.; Wang, Y.; Murrey, B.; Myronenko, A.; Zhao, C.; Yang, D.; et al. Monai: An open-source framework for deep learning in healthcare. arXiv 2022, arXiv:2211.02701. [Google Scholar] [CrossRef]
Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar]
Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar]
DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar] [CrossRef]
Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
Billot, B.; Robinson, E.; Dalca, A.V.; Iglesias, J.E. Partial volume segmentation of brain MRI scans of any resolution and contrast. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Cham, Switzerland, 2020; pp. 177–187. [Google Scholar]
Sudre, C.H.; Cardoso, M.J.; Ourselin, S. Alzheimer’s Disease Neuroimaging Initiative. Longitudinal segmentation of age-related white matter hyperintensities. Med. Image Anal. 2017, 38, 50–64. [Google Scholar] [CrossRef]
Shackleford, J.; Kandasamy, N.; Sharp, G. High Performance Deformable Image Registration Algorithms for Manycore Processors; Newnes: Oxford, UK, 2013; pp. 1–12. [Google Scholar]
Zhao, M.; Wei, Y.; Lu, Y.; Wong, K.K. A novel U-Net approach to segment the cardiac chamber in magnetic resonance images with ghost artifacts. Comput. Methods Programs Biomed. 2020, 196, 105623. [Google Scholar] [CrossRef]
Kumar, T.; Brennan, R.; Mileo, A.; Bendechache, M. Image data augmentation approaches: A comprehensive survey and future directions. IEEE Access 2024, 12, 187536–187571. [Google Scholar] [CrossRef]
Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 16–20 June 2019; pp. 113–123. [Google Scholar]
Theodoris, C.V.; Xiao, L.; Chopra, A.; Chaffin, M.D.; Al Sayed, Z.R.; Hill, M.C.; Mantineo, H.; Brydon, E.M.; Zeng, Z.; Liu, X.S.; et al. Transfer learning enables predictions in network biology. Nature 2023, 618, 616–624. [Google Scholar] [CrossRef]
Sabha, S.U.; Assad, A.; Din, N.M.U.; Bhat, M.R. From scratch or pretrained? An in-depth analysis of deep learning approaches with limited data. Int. J. Syst. Assur. Eng. Manag. 2024. [Google Scholar] [CrossRef]
Jiménez-Sánchez, A.; Avlona, N.R.; de Boer, S.; Campello, V.M.; Feragen, A.; Ferrante, E.; Ganz, M.; Gichoya, J.W.; Gonzalez, C.; Groefsema, S.; et al. In the picture: Medical imaging datasets, artifacts, and their living review. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency 2025, Athens, Greece, 23–26 June 2025; pp. 511–531. [Google Scholar]
Dulaney, A.; Virostko, J. Disparities in the demographic composition of The Cancer Imaging Archive. Radiol. Imaging Cancer 2024, 6, e230100. [Google Scholar] [CrossRef]

Figure 1. Stepwise Upper and Lower Boundaries Augmentation (SULBA). (a) Illustration of the SULBA operation applied along the height dimension of a 2D image. (b) Illustration of the operation applied along the width dimension. (c) Paired example of a 2D chest CT scan (top) and its corresponding tumor segmentation mask (bottom) after SULBA is applied along the height dimension. (d) Paired example showing the same transformation applied along the width dimension. For all panels, the first column shows the original data, green lines depict the complementary partial views created by the cyclic shift, and columns 2–4 show three distinct augmented outputs generated using different random step sizes.
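The operation illustrated in Figure 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `cyclic_shift` and `undo_cyclic_shift` are hypothetical, and drawing the step size uniformly from the non-zero offsets along the axis is an assumption, since the stepwise sampling rule is not stated in the caption. The sketch applies the same shift to an image and its segmentation mask so the pair stays aligned, as in panels (c) and (d).

```python
import numpy as np


def cyclic_shift(image, mask=None, axis=0, step=None, rng=None):
    """Cyclically shift `image` (and an optional paired `mask`) along `axis`.

    Hypothetical helper: the uniform step sampling below is an assumption,
    not the sampling rule used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    size = image.shape[axis]
    if step is None:
        # Any non-zero offset along the axis; 0 if the axis has one element.
        step = int(rng.integers(1, size)) if size > 1 else 0
    # np.roll wraps the part pushed past one boundary around to the other,
    # so no pixel is discarded or interpolated.
    shifted_image = np.roll(image, step, axis=axis)
    shifted_mask = None if mask is None else np.roll(mask, step, axis=axis)
    return shifted_image, shifted_mask, step


def undo_cyclic_shift(array, step, axis=0):
    """Invert a cyclic shift by rolling back by the same step."""
    return np.roll(array, -step, axis=axis)


# Toy 2D example: shift along the height dimension (axis 0).
image = np.arange(12).reshape(4, 3)
mask = (image > 5).astype(np.uint8)
aug_image, aug_mask, step = cyclic_shift(
    image, mask, axis=0, rng=np.random.default_rng(0)
)
restored = undo_cyclic_shift(aug_image, step, axis=0)
```

Because the shift only reorders existing values, rolling back by the same step recovers the input exactly, and the same call works unchanged on 3D volumes by selecting a different `axis`.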

Figure 2. 2D medical image classification benchmark. (a) Heatmap of the aggregated performance improvement for each data augmentation method relative to a non-augmented baseline across ten 2D MedMNIST datasets. (b) Percentage improvement over the baseline for ResNet-18 (solid bars) and Swin Transformer (hatched bars) architectures. (c) Mean relative improvement for each method across all datasets and architectures. (d) Overall performance ranking based on the sum of scores across all datasets and architectures. (e) Performance comparison of SULBA variants and their combinations with traditional spatial augmentations. (f) Impact on performance (percentage change) when combining SULBA with traditional techniques.

Figure 3. 3D medical image classification benchmark. (a) Heatmap of the aggregated performance improvement for each volumetric augmentation method across six 3D MedMNIST datasets. (b) Percentage improvement for R(2+1)D-18 and 3D Swin Transformer architectures. (c) Mean relative improvement for each method. (d) Circular ranking plot of the total aggregated score for each method.

Figure 4. 2D medical image segmentation benchmark. (a) Heatmap of performance improvement for each augmentation method across seven 2D segmentation datasets. (b) Architecture-specific percentage improvement for a ResNet-18-based U-Net and SegFormer. (c) Mean relative improvement for each method. (d) Overall performance ranking of all methods. (e) Performance of SULBA variants and their combinations with traditional augmentations. (f) Impact on performance from these combinations.

Figure 5. 3D medical image segmentation benchmark. (a) Heatmap of performance improvement for each volumetric augmentation method across three 3D segmentation datasets. (b) Percentage improvement for 3D U-Net and SwinUNETR architectures. (c) Mean relative improvement for each method. (d) Overall performance ranking based on total aggregated score.

Figure 6. Cross-dataset generalization benchmark. (a) Heatmap of the composite performance score for each augmentation method across eight neural network architectures in a pneumonia classification task. (b) Distribution of composite scores for each method (box plots) with annotated coefficient of variation (CV). (c) Mean composite score for each method. (d) Contribution of each architecture to the total composite score for each method. (e) Performance difference between randomly initialized and pretrained models for each method. (f) Circular ranking plot of the total composite score for each method.

Figure 7. Comparative training time overhead of augmentation strategies. (a) Training time overhead for 2D augmentation techniques. (b) Training time overhead for 3D volumetric augmentation techniques. * indicates statistical significance (p < 0.05).

Share and Cite

MDPI and ACS Style

Abe, A.A.; Nyathi, M. SULBA: A Task-Agnostic Data Augmentation Framework for Deep Learning in Medical Image Analysis. Diagnostics 2026, 16, 1546. https://doi.org/10.3390/diagnostics16101546

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
