Article

A Hybrid Particle Swarm–Genetic Algorithm Framework for U-Net Hyperparameter Optimization in High-Precision Brain Tumor MRI Segmentation

by
Shoffan Saifullah
1,2,*,
Rafał Dreżewski
1,3,*,
Anton Yudhana
4,
Radius Tanone
5 and
Andiko Putro Suryotomo
2
1
Faculty of Computer Science, AGH University of Krakow, 30-059 Krakow, Poland
2
Department of Informatics, Universitas Pembangunan Nasional Veteran Yogyakarta, Yogyakarta 55283, Indonesia
3
Artificial Intelligence Research Group (AIRG), Informatics Department, Faculty of Industrial Technology, Universitas Ahmad Dahlan, Yogyakarta 55181, Indonesia
4
Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta 55191, Indonesia
5
Artificial Intelligence Research Center, Universitas Kristen Satya Wacana, Salatiga 50711, Indonesia
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2026, 16(6), 3041; https://doi.org/10.3390/app16063041
Submission received: 4 February 2026 / Revised: 14 March 2026 / Accepted: 19 March 2026 / Published: 21 March 2026
(This article belongs to the Special Issue Advanced Techniques and Applications in Magnetic Resonance Imaging)

Abstract

Accurate and robust brain tumor segmentation remains a critical challenge in medical image analysis due to high inter-patient variability, complex tumor morphology, and modality-specific noise in MRI scans. This study proposes PSO-GA-U-Net, a novel hybrid deep learning framework that integrates Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs) to optimize the U-Net architecture, enhancing segmentation performance and generalization. PSO dynamically tunes the learning rate to accommodate modality-specific variations, while the GA adaptively regulates dropout to improve feature diversity and reduce overfitting. The model was evaluated on three benchmark datasets—FBTS, BraTS 2021, and BraTS 2018—using five-fold cross-validation. PSO-GA-U-Net achieves Dice Similarity Coefficients (DSC) of 0.9587, 0.9406, and 0.9480 and Jaccard Index (JI) scores of 0.9209, 0.8881, and 0.9024, respectively, consistently outperforming state-of-the-art models in both overlap accuracy and boundary delineation. Statistical tests confirm that these improvements are significant across folds ( p < 0.05 ). Visual heatmaps further illustrate the model’s ability to preserve structural integrity across tumor types and modalities. These results indicate that metaheuristic-guided deep learning offers a promising and clinically applicable solution for automatic tumor segmentation in radiological workflows.

1. Introduction

Brain tumors pose significant clinical and radiological challenges due to their aggressive progression [1], diverse histopathological characteristics [2], and heterogeneous presentation in magnetic resonance imaging (MRI) [3]. Accurate and timely segmentation of brain tumors is essential for treatment planning, surgical navigation, and post-operative assessment [4,5,6]. Manual delineation by expert radiologists, although precise, is labor-intensive and subject to intra- and inter-observer variability [7,8,9], particularly in multimodal MRI where subtle boundary differences and modality-specific artifacts frequently occur [2,10,11]. These challenges have propelled the development of automated segmentation systems, with deep learning methods—especially convolutional neural networks (CNNs)—emerging as the de facto standard for high-throughput tumor delineation [12,13].
Among CNN-based architectures, U-Net and its variants have demonstrated remarkable performance in biomedical segmentation tasks due to their encoder–decoder structure with skip connections that preserve spatial context [14,15,16,17]. However, despite their widespread adoption, U-Net models often exhibit limited generalization across datasets due to static architecture configurations, sensitivity to hyperparameter settings, and vulnerability to overfitting [18,19]—especially when exposed to heterogeneous tumor morphology or imbalanced multimodal inputs [20,21]. These challenges are further exacerbated in real-world clinical datasets where MR acquisitions may vary in contrast, noise levels, and tumor size.
To overcome these limitations, recent studies have explored metaheuristic optimization strategies, such as Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs), to tune deep network hyperparameters adaptively. PSO is known for its global convergence ability in continuous optimization problems [22,23,24], while GA [25,26,27,28] offers robust exploration through population-based dropout and structural tuning. However, these methods are often applied independently, without harnessing their complementary strengths. Moreover, prior work lacks an integrated framework that optimizes both learning dynamics and architectural regularization to improve boundary precision and robustness across diverse tumor types.
In this work, we propose a novel hybrid optimization-based deep learning framework, named PSO-GA-U-Net, which leverages PSO for dynamic learning rate adaptation and a GA for dropout regulation within the U-Net architecture. This dual-strategy optimization enhances the model’s ability to capture fine-grained tumor boundaries while preserving global contextual integrity. The model was extensively evaluated on three benchmark datasets—FBTS (Figshare Brain Tumor Segmentation), BraTS 2021, and BraTS 2018—across multiple tumor classes and MRI modalities. Experimental results demonstrate that PSO-GA-U-Net consistently outperforms state-of-the-art (SOTA) models in the Dice Similarity Coefficient (DSC), Jaccard Index (JI), and boundary preservation. Statistical significance tests confirm the reliability of improvements across five-fold cross-validation settings, while qualitative heatmaps further illustrate the structural fidelity of the model in complex tumor regions.
The main contributions of this study are summarized as follows:
  • We designed a novel hybrid metaheuristic optimization framework (PSO-GA-U-Net) that integrates PSO and a GA to jointly optimize the learning rate dynamics and dropout rates in U-Net for brain tumor segmentation.
  • We implemented a robust training pipeline validated across three challenging datasets (FBTS, BraTS 2021, BraTS 2018), ensuring generalization across tumor types, MRI modalities, and anatomical variations.
  • We conducted extensive performance evaluations using both quantitative metrics (DSC, JI, Hausdorff Distance (HD), and Average Symmetric Surface Distance (ASSD)) and qualitative heatmap analyses, and performed statistical validation with paired t-tests on cross-validation folds.
  • We present a comprehensive comparison with SOTA models, demonstrating superior performance in volumetric overlap, boundary precision, and robustness to modality-specific noise.
The remainder of the paper is structured as follows: Section 2 presents an overview of related works, highlighting recent developments in deep learning and metaheuristic optimization for medical image segmentation. Section 3 details the proposed PSO-GA-U-Net architecture, the hybrid optimization strategy, and the experimental setup. Section 4 presents the quantitative and qualitative results, comparative evaluations, and statistical analyses. Finally, Section 5 concludes the paper with a summary of the findings, a discussion of limitations, and directions for future research.

2. Related Works

The field of automated brain tumor segmentation has evolved significantly with the rise of deep learning methods, particularly convolutional neural networks (CNNs) [8,29]. Among these, the U-Net architecture has become a foundational backbone due to its encoder–decoder structure and skip connections, which enable multiscale feature fusion [30,31,32]. However, traditional U-Net variants often suffer from rigidity in parameter selection and limited adaptability to complex tumor morphologies and multimodal MRI inputs [15,18,33].
To enhance segmentation performance, numerous studies have focused on architectural modifications to U-Net. Attention-based models such as U-Net-AG [31,34,35,36] integrate spatial attention gates to guide the model’s focus toward salient tumor regions. Residual and dense connectivity strategies, as employed in Residual-Attention-U-Net [8,37,38,39,40], aim to improve gradient flow and multi-scale representation. However, these models are often manually tuned and sensitive to overfitting when confronted with high variability in tumor intensity and shape. Recent works, such as DeepLabV3+ [41,42] and SPPNet [43], introduce atrous spatial pyramid pooling to address multiscale features; however, they lack adaptive learning dynamics.
To address hyperparameter sensitivity, researchers have turned to bio-inspired optimization techniques [44]. Particle Swarm Optimization (PSO), inspired by collective behavior in swarms, has been applied to optimize learning rates [18,24], loss function weights [45,46], and even thresholding in preprocessing pipelines [24,44,47]. PSO has demonstrated success in efficiently tuning continuous-valued parameters, but can suffer from premature convergence in high-dimensional search spaces.
On the other hand, Genetic Algorithms (GAs) excel in exploring diverse architectural variants through population-based evolution strategies [48,49,50]. In segmentation tasks, GAs have been used to evolve filter configurations [51,52], dropout rates, and layer depths [53,54]. Despite its global search capability, GA performance can degrade when used alone due to lack of convergence guidance or redundancy in feature activation [55,56].
Several studies have explored hybrid optimization to mitigate these individual shortcomings. Works such as PSO-U-Net [24] and GA-U-Net [53,57] demonstrate that integrating optimization algorithms into deep learning pipelines can yield better generalization. However, most prior methods employ these techniques in isolation or apply them to static configurations, without a coordinated mechanism to optimize both training dynamics (e.g., learning rate schedules) and regularization factors (e.g., dropout schemes) simultaneously.
On benchmark datasets, models such as ViT-self-attention [58] and AWA-VGG-19 [59] have achieved competitive results on the BraTS dataset. However, transformer-based models often require extensive training data and struggle to refine local features, particularly in edematous or necrotic tumor subregions. Cascaded architectures such as 2C-U-Net [60] and U-Net-ASPP-EVO [61] improve multi-resolution learning, but are computationally intensive and sensitive to modality imbalance.
The proposed PSO-GA-U-Net differs from existing approaches in several key ways:
  • In contrast to previous PSO or GA models, it introduces a dual-level optimization pipeline where PSO governs adaptive learning rate scheduling during training. At the same time, the GA dynamically evolves the dropout rate to prevent overfitting and promote feature diversity.
  • The architecture remains U-Net-based for its proven spatial fidelity, but is optimized across cross-validation folds and datasets to ensure both performance and generalization.
  • The evaluation covers diverse tumor types (meningioma, glioma, pituitary, HGG, and LGG) and MRI modalities (T1CE, FLAIR, T1, and T2), with consistent superiority in DSC and JI compared to transformer-based, residual, and attention-enhanced models.
  • Statistical significance tests ( p < 0.05 ) were used to validate the robustness of improvements across datasets and tumor classes.
In summary, while various strategies have been employed to improve tumor segmentation—ranging from architectural innovations to standalone optimizers—our work uniquely combines adaptive learning and architectural regularization into a single, hybrid metaheuristic framework. This positions PSO-GA-U-Net as a robust and scalable solution for clinical-grade segmentation in MRI-based tumor analysis.

3. Methods

3.1. Dataset Description

To comprehensively evaluate the robustness, adaptability, and generalization capability of the proposed PSO-GA-U-Net framework, three publicly available brain tumor segmentation datasets were used: the Figshare Brain Tumor Segmentation (FBTS) dataset, the BraTS 2021 dataset, and the BraTS 2018 dataset. These datasets vary in tumor types, imaging modalities, and annotation granularity, providing a diverse and challenging experimental setup. All datasets were preprocessed to a uniform spatial resolution of 256 × 256 pixels and normalized to the range [0, 1] before training.

3.1.1. Figshare Brain Tumor Segmentation (FBTS) Dataset

The FBTS dataset consists of T1-weighted contrast-enhanced (T1CE) axial MRI slices sourced from a curated clinical dataset [62]. It contains manually annotated tumor masks for three tumor types: meningioma, glioma, and pituitary adenoma. Each slice is associated with a binary mask that delineates the tumor region, as illustrated in Figure 1.
  • Meningioma: 3060 slices;
  • Glioma: 708 slices;
  • Pituitary Tumor: 930 slices.
Figure 1. Representative examples from the FBTS dataset: meningioma, glioma, and pituitary tumors with their respective binary masks.
Tumor sizes and shapes vary significantly across classes, with gliomas exhibiting irregular, infiltrative structures, and pituitary tumors generally appearing as small, well-circumscribed masses. Each image is single-channel but was expanded to three channels during preprocessing to meet the deep learning model’s input requirements. The binary masks were thresholded and rescaled to ensure compatibility with the sigmoid output of the segmentation network.

3.1.2. BraTS 2021 Dataset

The BraTS 2021 dataset comprises axial MRI slices acquired using four distinct modalities: T1, T2, T1CE (post-contrast), and FLAIR [63]. Each sample (Figure 2) is annotated with a label representing the tumor region, which in our study was converted into a binary whole tumor mask. This dataset is particularly relevant for evaluating modality-specific performance as each MRI sequence highlights different pathological characteristics:
  • T1 emphasizes anatomical structures.
  • T2 highlights fluid-filled regions.
  • T1CE captures active tumor enhancement.
  • FLAIR is sensitive to edema and non-enhancing tumor cores.
Figure 2. Sample from BraTS 2021 showing all four MRI modalities (T1, T2, T1CE, and FLAIR), the colored multi-label mask, and its binarized whole tumor counterpart.
All samples were processed as independent 2D axial slices, preserving the integrity of the modality throughout evaluation. The ground truth masks were derived from voxel-level segmentations and combined to form whole tumor regions.

3.1.3. BraTS 2018 Dataset

To examine the performance of the proposed model across tumor grades, the BraTS 2018 dataset was utilized (Figure 3). This dataset includes separate cases for high-grade gliomas (HGGs) and low-grade gliomas (LGGs) [64,65,66]. The MRI modalities used are consistent with BraTS 2021 (T1, T2, T1CE, and FLAIR). Each slice is labeled with expert-annotated tumor masks focusing on the entire tumor structure. This dataset provides an essential test of the model’s ability to handle intra-tumoral variability and subtle morphological differences between tumor grades.
Each MRI slice is accompanied by a corresponding ground-truth mask indicating the tumor region. The masks are binary images in which tumor pixels are labeled 1 and background pixels 0.
An analysis of the pixel distribution across the FBTS segmentation masks reveals a strong class imbalance. Tumor pixels constitute approximately 1.69% of total pixels, while background pixels account for 98.31%. Across individual images, tumor regions occupy on average 1.69 ± 1.38% of pixels.

3.1.4. Preprocessing and Augmentation Pipeline

To ensure robust feature extraction and effective generalization across heterogeneous brain tumor images, a comprehensive preprocessing and augmentation pipeline was applied [67,68]. The preprocessing steps were designed to normalize intensity distributions [69], standardize spatial dimensions [70], and prepare data for hybrid optimization training [71]. The entire pipeline can be mathematically summarized through the following stages:
(1)
Intensity Normalization
All input MR slices I ( x , y ) were normalized to the [0, 1] range via min–max normalization (Equation (1)).
I_{\mathrm{norm}}(x, y) = \frac{I(x, y) - \min(I)}{\max(I) - \min(I)}
For the Figshare dataset, grayscale T1CE slices were replicated into three identical channels to form I norm ( 3 ) ( x , y ) . For the BraTS datasets, normalization was performed independently for each modality (T1, T2, FLAIR, and T1CE), preserving contrast-specific features.
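As a minimal illustrative sketch (not the authors' implementation), the min–max normalization of Equation (1) and the three-channel replication described above can be written in NumPy as follows; function names and the guard for constant slices are our own additions.

```python
import numpy as np

def min_max_normalize(img: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1], as in Equation (1)."""
    lo, hi = img.min(), img.max()
    if hi == lo:  # guard against constant slices (not specified in the paper)
        return np.zeros_like(img, dtype=np.float32)
    return ((img - lo) / (hi - lo)).astype(np.float32)

def to_three_channels(img2d: np.ndarray) -> np.ndarray:
    """Replicate a grayscale slice into three identical channels."""
    return np.stack([img2d] * 3, axis=-1)

slice_ = np.array([[0, 50], [100, 200]], dtype=np.float32)
norm = min_max_normalize(slice_)   # values now span [0, 1]
rgb = to_three_channels(norm)      # shape (2, 2, 3)
```

For the BraTS datasets, the same function would simply be applied per modality so that contrast-specific intensity statistics are preserved.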
(2)
Spatial Standardization and Binarization
Each image I norm and mask M was resized to a fixed dimension of 256 × 256 pixels using bilinear interpolation for the image and nearest-neighbor interpolation for the mask (Equation (2)).
I resized ( x , y ) = Interp bilinear ( I norm ( x , y ) ) , M resized ( x , y ) = Interp nn ( M ( x , y ) )
The resized masks were thresholded into binary format using Equation (3).
M_{\mathrm{bin}}(x, y) = \begin{cases} 1, & \text{if } M(x, y) \geq 0.5 \\ 0, & \text{otherwise} \end{cases}
This binarization ensures compatibility with the output layer of our segmentation network using a sigmoid activation.
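A compact sketch of Equations (2) and (3), assuming `scipy.ndimage.zoom` as the interpolation backend (the paper does not name a specific library): bilinear interpolation (order 1) for images, nearest-neighbor (order 0) for masks, followed by thresholding at 0.5.

```python
import numpy as np
from scipy.ndimage import zoom

TARGET = 256  # target spatial resolution used throughout the paper

def resize_pair(image: np.ndarray, mask: np.ndarray):
    """Bilinear resize for the image, nearest-neighbor for the mask (Equation (2))."""
    zy, zx = TARGET / image.shape[0], TARGET / image.shape[1]
    image_r = zoom(image, (zy, zx), order=1)  # bilinear
    mask_r = zoom(mask, (zy, zx), order=0)    # nearest-neighbor keeps labels crisp
    return image_r, mask_r

def binarize(mask: np.ndarray) -> np.ndarray:
    """Threshold at 0.5 (Equation (3)) for a sigmoid-compatible target."""
    return (mask >= 0.5).astype(np.uint8)

img = np.random.rand(128, 128).astype(np.float32)
msk = (np.random.rand(128, 128) > 0.98).astype(np.float32)
img_r, msk_r = resize_pair(img, msk)
msk_bin = binarize(msk_r)
```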
(3)
Data Augmentation Strategy
To introduce structural variability and mitigate overfitting, we applied a probabilistic augmentation pipeline A to the training set (Equation (4)).
I ˜ , M ˜ = A ( I resized , M bin ) ,
where A includes the following transformations:
  • Rotation: θ ∼ U(−15°, +15°);
  • Flipping: Pr_flip = 0.5 (horizontal and vertical);
  • Zoom: scaling factor s ∼ U(0.9, 1.1);
  • Translation: offset (Δx, Δy) ∼ U(−10, 10) pixels;
  • Elastic deformation: applied via a random displacement field ϕ(x, y) with Gaussian smoothing;
  • Contrast and brightness shift: I′ = αI + β, where α ∼ U(0.9, 1.1), β ∼ U(−0.1, 0.1).
All augmentation operations were applied synchronously to images and their corresponding masks to preserve spatial alignment.
Data augmentation was applied exclusively to the training subset, while the validation and test sets remained unchanged to ensure unbiased performance evaluation.
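The key implementation detail above is that one set of random parameters is drawn per sample and applied to both the image and its mask. A hedged sketch of a subset of the pipeline A (rotation, flips, and the contrast/brightness shift; elastic deformation omitted for brevity) might look as follows — `scipy.ndimage.rotate` and the fixed seed are our illustrative choices:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)  # fixed seed for reproducibility of this sketch

def augment_pair(image, mask):
    """Draw one set of random parameters and apply it to BOTH image and mask."""
    angle = rng.uniform(-15, 15)  # rotation in degrees, theta ~ U(-15, +15)
    image = rotate(image, angle, reshape=False, order=1, mode="nearest")
    mask = rotate(mask, angle, reshape=False, order=0, mode="nearest")
    if rng.random() < 0.5:  # horizontal flip with Pr = 0.5
        image, mask = np.fliplr(image), np.fliplr(mask)
    if rng.random() < 0.5:  # vertical flip with Pr = 0.5
        image, mask = np.flipud(image), np.flipud(mask)
    # contrast/brightness shift I' = alpha*I + beta (image only)
    alpha, beta = rng.uniform(0.9, 1.1), rng.uniform(-0.1, 0.1)
    image = np.clip(alpha * image + beta, 0.0, 1.0)
    return image, mask

img = np.random.rand(64, 64)
msk = (np.random.rand(64, 64) > 0.9).astype(np.uint8)
aug_img, aug_msk = augment_pair(img, msk)
```

Note that the geometric transforms are shared by image and mask, while the photometric shift touches the image only, preserving label semantics.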
(4)
Label Harmonization and Whole Tumor Mask Generation
In the BraTS 2021 and 2018 datasets, segmentation masks comprise three distinct regions: enhancing tumor ( E T ), non-enhancing tumor core ( N E T ), and peritumoral edema ( E D ). For binary segmentation, we generated a unified whole tumor mask M whole as in Equation (5).
M_{\mathrm{whole}}(x, y) = \mathbb{1}\{ ET(x, y) \cup NET(x, y) \cup ED(x, y) \},
where \mathbb{1}\{\cdot\} denotes the binary indicator function. This harmonization facilitates a consistent comparison across all datasets.
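Assuming the standard BraTS label encoding (1 = non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor; 0 = background — an assumption on our part, as the paper does not spell out the integer codes), the union in Equation (5) reduces to a one-line membership test:

```python
import numpy as np

# Assumed BraTS-style label encoding: 1 = NET, 2 = ED, 4 = ET, 0 = background.
TUMOR_LABELS = (1, 2, 4)

def whole_tumor_mask(seg: np.ndarray) -> np.ndarray:
    """Union of all tumor sub-regions into one binary mask (Equation (5))."""
    return np.isin(seg, TUMOR_LABELS).astype(np.uint8)

seg = np.array([[0, 1, 2],
                [4, 0, 2],
                [0, 0, 1]])
wt = whole_tumor_mask(seg)
```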
(5)
Dataset Partitioning and Cross-Validation
Each dataset was divided into 80% training, 10% validation, and 10% testing [72,73]. This configuration was selected to ensure sufficient data availability for learning complex spatial features while preserving dedicated subsets for optimization and unbiased evaluation. The 80% training portion allows the PSO-GA-driven U-Net to converge effectively on high-dimensional tumor patterns, especially important for diverse MRI modalities and inter-patient variability.
The 10% validation subset was used exclusively for fitness computation during the PSO-GA optimization process, ensuring convergence monitoring without leaking into the final test evaluation. The remaining 10% test set was strictly held out and used only for final performance assessment.
The PSO-GA optimization procedure was executed once using the predefined training-validation split to identify the optimal hyperparameter configuration. After determining the best-performing PSO-GA-U-Net model, the optimized configuration was evaluated using a five-fold cross-validation ( k = 5 ) procedure to assess stability and generalization capability. Hyperparameter tuning was not performed within the cross-validation folds, ensuring that the evaluation phase remained independent of the optimization process.
During cross-validation, subject-level separation was preserved, ensuring that all slices from the same patient remained within a single fold. All reported metrics—Dice Similarity Coefficient (DSC), Jaccard Index (JI), Hausdorff Distance (HD), and Average Symmetric Surface Distance (ASSD)—were averaged across folds. Statistical analysis using paired t-tests with a significance level of α = 0.05 was used to assess the consistency of the improvements. Because the statistical comparison is based on five cross-validation folds ( n = 5 ), the resulting p-values should be interpreted as indicative rather than definitive evidence and are complemented by consistent performance trends across folds.
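The paired t-test across folds can be reproduced with `scipy.stats.ttest_rel`; the per-fold DSC arrays below are hypothetical numbers for illustration only, not values from the paper.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-fold DSC scores (n = 5 folds), for illustration only.
dsc_ours = np.array([0.958, 0.955, 0.960, 0.957, 0.959])
dsc_base = np.array([0.941, 0.938, 0.944, 0.940, 0.943])

# Paired test: fold i of one model is matched with fold i of the other.
t_stat, p_value = ttest_rel(dsc_ours, dsc_base)
significant = p_value < 0.05
```

With only n = 5 pairs, such p-values should be read as indicative rather than conclusive, consistent with the caveat stated above.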
(6)
Consistency Checks
To ensure data integrity post-augmentation, all segmentation masks were subjected to consistency validation using connected component analysis [74]. Only masks exhibiting a single contiguous tumor region were retained, and corrupted or multi-labeled instances were automatically excluded [75]. This quality control step enforces anatomical plausibility and preserves label fidelity, thereby ensuring robust training signals for the PSO-GA-U-Net framework.
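The single-component check described here maps directly onto `scipy.ndimage.label`, which returns the number of connected components in a binary mask (4-connectivity by default in 2D); the helper name is ours.

```python
import numpy as np
from scipy.ndimage import label

def is_single_component(mask: np.ndarray) -> bool:
    """Keep only masks whose tumor region forms one connected component."""
    _, n_components = label(mask)
    return n_components == 1

ok = np.zeros((8, 8), dtype=np.uint8)
ok[2:5, 2:5] = 1            # a single contiguous blob: retained
bad = ok.copy()
bad[6, 6] = 1               # a second, disconnected blob: excluded
```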
(7)
Preprocessing and Augmentation Pipeline Summary
The complete preprocessing and augmentation procedure is illustrated in Figure 4. The pipeline began by extracting raw axial slices from the FBTS, BraTS 2021, and BraTS 2018 datasets, encompassing four MRI modalities: T1-weighted, T2-weighted, contrast-enhanced T1 (T1CE), and FLAIR. All images were converted to grayscale, resized to a uniform resolution of 256 × 256 pixels, and stacked into three-channel representations to ensure compatibility with convolutional operations.
The corresponding ground truth masks were also resized and binarized to facilitate binary segmentation. For training augmentation, we applied affine transformations such as horizontal flipping, random rotations (±15°), and zooming (scale factor in [0.9, 1.1]) to enhance model generalization and mitigate overfitting.
Subsequently, the dataset was split into training (80%), validation (10%), and testing (10%) subsets, as described in Section Dataset Partitioning and Cross-Validation, ensuring that class balance was preserved. This standardized pipeline unified heterogeneous data sources and prepared them for robust training of the PSO-GA-U-Net framework.
During the hyperparameter optimization phase, the PSO–GA framework was executed once using the training–validation subset derived from the initial 80/10/10 data partition. Candidate configurations were evaluated using the validation Dice Similarity Coefficient (DSC), while the test set remained completely isolated during the search process.
After identifying the optimal configuration, the model was retrained using the optimized parameters and evaluated through five-fold cross-validation to assess the stability of the learned configuration across different data partitions. Although nested cross-validation could provide a fully unbiased evaluation of hyperparameters, performing evolutionary optimization within each fold would significantly increase the computational cost. Therefore, cross-validation was used primarily to evaluate robustness rather than to perform nested hyperparameter selection.

3.2. PSO-GA Hybrid Framework

The PSO-GA optimization framework operates on the predefined training-validation split described in Section Dataset Partitioning and Cross-Validation. Candidate configurations are evaluated exclusively on the validation subset, and the optimization process is executed once to determine the best-performing hyperparameter configuration. The independent test set remains strictly untouched during this search phase.
To enhance the segmentation performance and generalization of the U-Net architecture, we propose a hybrid metaheuristic optimization framework that integrates Particle Swarm Optimization (PSO) with Genetic Algorithms (GAs). The PSO-GA-U-Net pipeline enables automated and adaptive tuning of critical hyperparameters, including the learning rate (10^{-4} to 10^{-1}), dropout rate (0.0 to 0.5), kernel size (3 × 3 or 5 × 5), batch size (4, 8, 16, 32, or 64), filter configurations ({32, 64, 128, 256, 512}), activation function (ReLU or LeakyReLU), and optimizer selection (SGD, Adam, or RMSprop). This dual-phase strategy capitalizes on the global exploration capability of PSO [76] and the adaptive refinement of GA to efficiently navigate complex search spaces [44].

3.2.1. Particle Swarm Optimization (PSO)

In PSO, a swarm of particles (candidate solutions) explores the hyperparameter space [77,78]. The position vector of each particle x i encodes a set of hyperparameters, and its movement is guided by personal best p i best and global best g best solutions [18,44]. The velocity update rule is expressed as Equation (6).
v_i^{t+1} = \omega v_i^t + c_1 r_1 (p_i^{\mathrm{best}} - x_i^t) + c_2 r_2 (g^{\mathrm{best}} - x_i^t),
where ω is the inertia weight, c_1 and c_2 are the cognitive and social coefficients, and r_1, r_2 ∼ U(0, 1) are random scalars that promote exploration.
The new position is computed as Equation (7).
x_i^{t+1} = x_i^t + v_i^{t+1},
with boundaries enforced to ensure feasible hyperparameter values. For categorical parameters (e.g., activation function or optimizer), we employ a velocity-guided stochastic mapping strategy. Each categorical option is encoded using either an index or a one-hot representation, and selections are made based on probabilities modulated by velocity magnitudes through a softmax-based probability transformation defined in Equation (8). This allows continuous updates to influence discrete choices, enabling consistent exploration of categorical search spaces.
To formalize this mapping, the probability of selecting the categorical option j among the k possible choices is computed using a softmax transformation of the associated velocity components (Equation (8)).
P_j = \frac{\exp(\beta v_j)}{\sum_{m=1}^{k} \exp(\beta v_m)}
where v j denotes the velocity component corresponding to option j, β is a temperature scaling factor controlling the sharpness of the probability distribution, and P j represents the probability of selecting that categorical option. The final hyperparameter value is sampled from this probability distribution using multinomial selection, allowing continuous PSO updates to guide exploration within discrete search spaces.
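A minimal sketch of one PSO step combining Equations (6)–(8): a clipped continuous update for numeric hyperparameters, plus the softmax-based multinomial selection for a categorical one. The coefficient values (ω = 0.7, c1 = c2 = 1.5, β = 2.0) and the two-dimensional search space are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(42)
OMEGA, C1, C2 = 0.7, 1.5, 1.5  # inertia, cognitive, social (assumed values)
BETA = 2.0                     # softmax temperature (assumed value)

def pso_step(x, v, p_best, g_best, bounds):
    """One continuous PSO update (Equations (6) and (7)) with boundary clipping."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = OMEGA * v + C1 * r1 * (p_best - x) + C2 * r2 * (g_best - x)
    x_new = np.clip(x + v_new, bounds[:, 0], bounds[:, 1])
    return x_new, v_new

def sample_categorical(v_cat):
    """Velocity-guided stochastic mapping for discrete options (Equation (8))."""
    logits = BETA * v_cat
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(v_cat), p=probs), probs

# Two continuous hyperparameters: log10(learning rate) and dropout rate.
bounds = np.array([[-4.0, -1.0], [0.0, 0.5]])
x, v = np.array([-2.0, 0.25]), np.zeros(2)
x, v = pso_step(x, v,
                p_best=np.array([-2.5, 0.2]),
                g_best=np.array([-3.0, 0.1]),
                bounds=bounds)

# Categorical choice among three optimizers (e.g., SGD, Adam, RMSprop).
idx, probs = sample_categorical(np.array([0.1, 0.8, 0.3]))
```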

3.2.2. Genetic Algorithm (GA) Phase

Following PSO updates, the top K particles (ranked by validation DSC) undergo GA operations to introduce diversity and refine promising solutions [53,79,80].
  • Selection: Elitist selection of the top K = 10 particles.
  • Crossover: Uniform crossover combines hyperparameter values from parent pairs to form offspring (Equation (9)).
    child[j] = \begin{cases} \mathrm{parent}_1[j], & \text{if } \mathrm{rand} < 0.5 \\ \mathrm{parent}_2[j], & \text{otherwise} \end{cases}
  • Mutation: Each gene in the child has the probability μ = 0.1 of being replaced by a random value from its domain.
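The three GA operators above can be sketched as follows; the gene domains are illustrative stand-ins (the paper's actual search space spans more hyperparameters), and the dict-based chromosome encoding is our own choice.

```python
import numpy as np

rng = np.random.default_rng(7)
MUTATION_RATE = 0.1  # mu = 0.1, as in the text

# Illustrative per-gene domains (a subset of the paper's search space).
DOMAINS = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "batch_size": [4, 8, 16, 32, 64],
}

def uniform_crossover(p1: dict, p2: dict) -> dict:
    """Each gene inherited from either parent with probability 0.5 (Equation (9))."""
    return {k: (p1[k] if rng.random() < 0.5 else p2[k]) for k in p1}

def mutate(child: dict) -> dict:
    """Each gene resampled from its domain with probability mu."""
    return {k: (rng.choice(DOMAINS[k]) if rng.random() < MUTATION_RATE else v)
            for k, v in child.items()}

parent1 = {"learning_rate": 1e-3, "dropout": 0.2, "batch_size": 16}
parent2 = {"learning_rate": 1e-2, "dropout": 0.4, "batch_size": 32}
child = mutate(uniform_crossover(parent1, parent2))
```

Elitist selection then simply sorts particles by fitness and retains the top K = 10 as crossover parents.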

3.2.3. Fitness Evaluation

Each particle is evaluated by training the U-Net model for E = 50 epochs using a given hyperparameter configuration. The objective is to maximize the validation DSC; hence, the fitness is defined in Equation (10).
\mathrm{Fitness}(x) = -\mathrm{DSC}_{\mathrm{val}}(x).
Since most metaheuristic optimizers, including PSO and GA, are formulated as minimizers, the DSC is negated so that configurations with higher segmentation accuracy correspond to lower fitness values.
Each candidate configuration is trained for a fixed budget of 50 epochs during both the optimization and evaluation stages. This consistent training budget ensures fair comparison across candidate solutions evaluated by the PSO–GA framework.

3.2.4. Termination Criteria

The hybrid PSO-GA process is iterated for G = 10 generations or until convergence, as defined in Equation (11).
\Delta(\mathrm{DSC}_{g^{\mathrm{best}}}) < 10^{-3} \quad \text{for } S = 5 \text{ consecutive generations}.
This criterion terminates the optimization process when improvements in the global best validation DSC become negligible across successive generations, thereby avoiding redundant search iterations and reducing computational overhead while maintaining solution stability.
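A small sketch of this early-stopping rule as a predicate over the history of global-best validation DSC values; the helper name and the example histories are ours.

```python
def has_converged(dsc_history, tol=1e-3, patience=5):
    """Stop when the global-best validation DSC improves by less than `tol`
    for `patience` consecutive generations (Equation (11))."""
    if len(dsc_history) <= patience:
        return False
    recent = dsc_history[-(patience + 1):]
    deltas = [recent[i + 1] - recent[i] for i in range(patience)]
    return all(d < tol for d in deltas)

# Illustrative histories of best validation DSC per generation.
stalled = [0.90, 0.9400, 0.9401, 0.9402, 0.9402, 0.9403, 0.9403]
improving = [0.90, 0.92, 0.94, 0.95, 0.96, 0.97, 0.985]
```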

3.2.5. Optimization Workflow

The overall hybrid optimization process is visualized in Figure 5. It begins with random population initialization, followed by iterative PSO updates, fitness evaluation, and GA-based refinement. This dual mechanism enables both efficient exploration and exploitation of the hyperparameter landscape, achieving a robust balance between convergence speed and model generalizability.

3.2.6. Pseudocode Implementation

The procedural implementation of the PSO-GA framework is presented in Algorithm 1, detailing the iterative optimization process and the convergence mechanism.
Algorithm 1 PSO-GA Hybrid Optimization for U-Net Segmentation
Require: Population size N, generations G, mutation rate μ, convergence threshold δ
  1: Initialize particles {x_1, …, x_N} randomly within the search space
  2: Initialize velocities {v_1, …, v_N} using a uniform distribution v_i ∼ U(−v_max, v_max)
  3: for each particle x_i do
  4:     Train U-Net with x_i, compute fitness f_i = −DSC_val(x_i)
  5:     Set personal best p_i^best ← x_i
  6: end for
  7: Set global best g^best = arg min_i f_i
  8: for generation g = 1 to G do
  9:     for each particle x_i do
 10:         Update velocity v_i via Equation (6)
 11:         Update position x_i via Equation (7)
 12:         Apply boundary constraints to x_i
 13:         Train U-Net, evaluate f_i = −DSC_val(x_i)
 14:         if f_i < f(p_i^best) then
 15:             p_i^best ← x_i
 16:         end if
 17:         if f_i < f(g^best) then
 18:             g^best ← x_i
 19:         end if
 20:     end for
 21:     Select top K particles
 22:     Apply crossover and mutation to generate offspring
 23:     Replace the worst particles with offspring
 24:     if convergence criterion (Equation (11)) is met then
 25:         break
 26:     end if
 27: end for
 28: return Best configuration g^best

3.3. U-Net Architecture and Training Configuration

The core segmentation framework utilized in this study is the U-Net architecture, renowned for its encoder–decoder topology and skip connections that effectively preserve spatial resolution during feature extraction and reconstruction [14,31,41]. To improve the adaptability and generalizability of the model, the architecture is dynamically constructed using hyperparameters optimized by the proposed PSO-GA framework. These include the learning rate, dropout rate, kernel size, encoder filter sizes, activation function (e.g., ReLU or LeakyReLU), and optimizer selection (e.g., Adam, SGD, or RMSprop).
Figure 6 illustrates the baseline U-Net configuration, composed of convolutional blocks with Conv2D layers, batch normalization, activation functions, and optional dropout. In contrast, Figure 7 presents the parameterized U-Net where architectural blocks remain consistent, but hyperparameters are dynamically tuned through the PSO-GA hybrid optimization process. This modular structure enables dataset-specific customization while preserving the architectural integrity of U-Net.

3.3.1. Encoder and Decoder Design

Each encoder block consists of two sequential convolutional layers followed by batch normalization, a non-linear activation function (either ReLU or LeakyReLU), and a Dropout layer if specified [81]. The spatial resolution is reduced after each block via a strided convolution, while the number of filters progressively increases according to the optimized configuration. Let F = [f_1, f_2, f_3] denote the list of filters per block; then each encoder level l ∈ {1, 2, 3} applies (Equation (12))
\mathrm{Conv2D}(f_l, k \times k) \rightarrow \mathrm{BatchNorm} \rightarrow \mathrm{Activation} \rightarrow \mathrm{Dropout},
followed by a strided convolution to downsample the feature map.
At the bottleneck, a deeper convolutional block with 2 × f_3 filters captures high-level semantic information before upsampling begins.
The decoder upsamples feature maps using bilinear interpolation or transposed convolutions, concatenates them with corresponding encoder features (via skip connections), and applies convolutional blocks similar to those in the encoder. This structure facilitates reconstruction of spatial detail lost during downsampling.
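As a framework-free illustration of the encoder design above, the sketch below only traces feature-map shapes: two same-padded convolutions per level preserve spatial size, a stride-2 convolution halves the resolution, and the bottleneck widens to 2 × f_3 filters. The function name and tuple layout are illustrative, not from the paper.

```python
def encoder_shapes(input_hw, filters, bottleneck_mult=2):
    """Trace (stage, height, width, channels) through the encoder of Equation (12)."""
    h, w = input_hw
    shapes = []
    for f in filters:
        # Conv2D -> BatchNorm -> Activation -> Dropout (same padding preserves h, w).
        shapes.append(("conv_block", h, w, f))
        # Strided convolution halves the spatial resolution.
        h, w = h // 2, w // 2
        shapes.append(("downsample", h, w, f))
    # Deeper bottleneck block with 2 * f3 filters.
    shapes.append(("bottleneck", h, w, bottleneck_mult * filters[-1]))
    return shapes

# Usage with the example filter list F = [64, 128, 512] on 256 x 256 inputs.
trace = encoder_shapes((256, 256), [64, 128, 512])
```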

3.3.2. Activation and Optimization Strategy

The activation function [82,83] is selected from relu or leaky_relu, the latter using a negative slope coefficient of 0.1. Optimizer candidates include SGD, Adam, and RMSprop, with learning rates sampled from a continuous range [10^{-4}, 10^{-1}]. The optimizer is dynamically instantiated using the selected method and the learning rate (Equation (13)).
\mathrm{optimizer} = \mathrm{Adam}(\eta) \text{ or } \mathrm{SGD}(\eta),
where η is the learning rate selected during the optimization process.
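The two concrete choices in this subsection can be sketched in a few lines, assuming a log-uniform draw over the stated range (the paper only specifies the interval, so log-scale sampling is an assumption) and the stated LeakyReLU slope of 0.1:

```python
import random

def leaky_relu(x, alpha=0.1):
    # LeakyReLU with the paper's negative-slope coefficient of 0.1.
    return x if x > 0 else alpha * x

def sample_learning_rate(rng):
    # Log-uniform draw from the stated range [1e-4, 1e-1]; the log scale
    # is an assumption, since the paper only gives the continuous interval.
    return 10.0 ** rng.uniform(-4, -1)

# Usage: draw one candidate learning rate eta for optimizer instantiation.
lr = sample_learning_rate(random.Random(0))
```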

3.3.3. Output Layer and Loss Function

The final output layer is a 1 × 1 convolution followed by a sigmoid activation, producing a binary segmentation mask [84]. The network is trained using the Binary Cross-Entropy loss function (Equation (14)).
L_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right],
where y_i and ŷ_i denote the ground truth and predicted probability at pixel i, respectively.
Although Dice-based losses directly optimize region overlap, they may produce unstable gradients when predicted foreground regions are very small or highly imbalanced, particularly during early training stages. Binary Cross-Entropy (BCE) provides smoother and more stable gradient behavior [85,86], which supports reliable convergence during network training. Therefore, BCE is used as the training loss to guide pixel-level learning.
In contrast, the Dice Similarity Coefficient (DSC) is used as the fitness metric within the PSO-GA optimization process because it directly measures region-level overlap between predicted and ground-truth tumor masks. This separation allows for stable gradient-based training while ensuring that the metaheuristic optimization remains aligned with the final segmentation objective.
Furthermore, since the PSO-GA framework evaluates configurations based on validation DSC rather than training loss alone, the optimization process remains aligned with the final segmentation objective.
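The separation of training loss and fitness metric can be illustrated with a small pure-Python sketch: BCE (Equation (14)) scores per-pixel probabilities for gradient-based training, while the DSC fitness (Equation (15)) thresholds the same predictions and measures region overlap. The helper names and the 0.5 threshold are illustrative assumptions, not the study's code.

```python
import math

def bce_loss(y_true, y_prob, eps=1e-7):
    # Binary Cross-Entropy (Equation (14)) over flattened pixel lists.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def dice_fitness(y_true, y_prob, thr=0.5):
    # Validation DSC (Equation (15)) on thresholded predictions; this is the
    # region-overlap quantity the PSO-GA search uses as its fitness.
    pred = [1 if p >= thr else 0 for p in y_prob]
    inter = sum(y * q for y, q in zip(y_true, pred))
    denom = sum(y_true) + sum(pred)
    return 2.0 * inter / denom if denom else 1.0

# Usage: a perfect thresholded mask can still have a nonzero BCE,
# which is why the two quantities play different roles.
gt = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.2]
```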

3.3.4. Evaluation Metrics

To assess segmentation quality, the following metrics are monitored [87,88]:
  • Dice Similarity Coefficient (DSC): Equation (15)
    \mathrm{DSC} = \frac{2\,|P \cap G|}{|P| + |G|}
  • Jaccard Index (JI): Equation (16)
    \mathrm{JI} = \frac{|P \cap G|}{|P \cup G|}
  • Accuracy (Acc): Equation (17)
    \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}
These metrics capture both overlap quality and classification reliability of the predicted masks.
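For reference, all three metrics can be computed from one pass over paired binary masks; DSC and JI are linked by the identity DSC = 2·JI/(1 + JI). This is a generic sketch, not the evaluation code used in the study.

```python
def segmentation_metrics(pred, gt):
    """DSC, JI, and Acc (Equations (15)-(17)) from flattened binary masks."""
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gt):
        if p == 1 and g == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif g == 1:
            fn += 1
        else:
            tn += 1
    union = tp + fp + fn  # |P ∪ G| in pixel counts
    dsc = 2.0 * tp / (2 * tp + fp + fn) if union else 1.0
    ji = tp / union if union else 1.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    return dsc, ji, acc

# Usage: one true-positive pixel, one false positive, two true negatives.
dsc, ji, acc = segmentation_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```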

3.3.5. Implementation and Hardware

All models were implemented using TensorFlow and Keras. Training was performed on NVIDIA A100 GPUs (40 GB memory), enabling efficient parallel computation. The input slices were resized to 256 × 256 × 3 and normalized to the [0, 1] range.
The optimized hyperparameter configurations are passed as Python dictionaries (Python 3.9.5) and dynamically applied at runtime, supporting modularity and reproducibility.

3.3.6. Example Instantiation

An example of a top-performing configuration derived from the PSO-GA optimization is shown below:
{
    "learning_rate": 0.0095,
    "dropout_rate": 0.4,
    "batch_size": 8,
    "kernel_size": (3, 3),
    "encoder_filters": [64, 128, 512],
    "activation": "leaky_relu",
    "optimizer": "sgd",
}
This configuration governs U-Net instantiation and compilation with the corresponding optimizer and metrics.

3.3.7. Integrated Optimization and Training Loop

The interaction between the U-Net model and the PSO-GA hybrid optimizer is depicted in Figure 8. The PSO phase facilitates global exploration of the search space, while the GA phase enhances diversity and refines candidate solutions. Fitness is evaluated using validation DSC, and the optimization process terminates upon convergence or reaching the maximum generation threshold.

3.4. Experimental Setup

To comprehensively evaluate the proposed PSO-GA-U-Net framework, experiments were conducted on three publicly available datasets: the Figshare Brain Tumor Segmentation (FBTS) dataset, the BraTS 2021 dataset, and the BraTS 2018 dataset (including HGG and LGG subgroups). These datasets include diverse anatomical structures and imaging modalities, enabling a robust evaluation of the generalizability of the model.
All MRI scans were preprocessed using the pipeline described in Section 3.1.4. For FBTS, T1CE axial slices were extracted, while for BraTS 2021 and BraTS 2018, multimodal volumes (FLAIR, T1, T1Gd, and T2) were used. Each sample was resized to 256 × 256 pixels, normalized to a [ 0 , 1 ] intensity range, and stacked into three channels. The ground truth masks were also resized and binarized according to the segmentation objective (e.g., whole tumor).
The complete dataset was randomly partitioned into training, validation, and test subsets using an 80:10:10 split. This configuration provides sufficient diversity for model training while preserving an independent test subset for unbiased evaluation, following established machine learning practice of balancing training generalization against unbiased testing.
The partitioning was performed at the patient level to ensure that slices originating from the same patient do not appear in multiple subsets, thereby preventing data leakage between training, validation, and testing sets.
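The patient-level partitioning can be sketched as a group-aware shuffle: patients, not slices, are shuffled and split 80:10:10, and every slice follows its patient into exactly one subset. Function and variable names below are illustrative; the actual splitting code is not given in the paper.

```python
import random

def patient_level_split(slice_ids, patient_of, ratios=(0.8, 0.1, 0.1), seed=42):
    """Leakage-free 80:10:10 split: shuffle patients, then assign their slices."""
    patients = sorted({patient_of[s] for s in slice_ids})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = int(ratios[0] * len(patients))
    n_val = int(ratios[1] * len(patients))
    train_p = set(patients[:n_train])
    val_p = set(patients[n_train:n_train + n_val])
    split = {"train": [], "val": [], "test": []}
    for s in slice_ids:
        p = patient_of[s]
        key = "train" if p in train_p else ("val" if p in val_p else "test")
        split[key].append(s)
    return split

# Toy example: 10 patients with 3 axial slices each (identifiers are illustrative).
patient_of = {f"p{i:02d}_s{j}": f"p{i:02d}" for i in range(10) for j in range(3)}
split = patient_level_split(sorted(patient_of), patient_of)
```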
To enhance model generalization and robustness, various data augmentation strategies were applied during training, including random rotations (±15°), horizontal and vertical flipping, and zooming within a range of 90–110%. These transformations were applied only to the training subset, preventing performance inflation and data leakage.
Model training was implemented in Python using the TensorFlow and Keras backends on an NVIDIA A100-SXM4 GPU (40 GB of memory). Each model instance was trained for a fixed number of 50 epochs under identical training conditions to ensure a fair comparison across all candidate configurations. Batch sizes were dynamically selected from the PSO-GA-optimized hyperparameter space, ranging from 4 to 64. The Adam, SGD, or RMSprop optimizers were instantiated at runtime using learning rates between 10^{-4} and 10^{-1}, as optimized by the hybrid metaheuristic search. The PSO–GA optimization procedure was executed once using the training–validation subset derived from the initial 80/10/10 partition, while the independent test set remained completely isolated during the search process.
The evaluation pipeline computed segmentation performance using the following metrics on both validation and test sets [89]:
  • Dice Similarity Coefficient (DSC) (Equation (15));
  • Jaccard Index (JI) (Equation (16));
  • Hausdorff Distance (HD) (Equation (19));
  • Average Symmetric Surface Distance (ASSD) (Equation (20));
  • ROC-AUC and Accuracy (Equation (17)).
After the optimal hyperparameter configuration was identified through the PSO–GA search on the training–validation subset, the resulting configuration was evaluated using five-fold cross-validation to assess the robustness and stability of the learned model across different data partitions. In this stage, the optimized hyperparameters remained fixed, and the model was trained independently within each fold. Each fold i produced a metric m i , and the final performance is reported as the mean across folds as defined in Equation (18).
\bar{m} = \frac{1}{5} \sum_{i=1}^{5} m_i
All random seeds were fixed to ensure reproducibility and stability across runs.
\mathrm{HD}(P, G) = \max\left\{ \sup_{p \in P} \inf_{g \in G} d(p, g),\; \sup_{g \in G} \inf_{p \in P} d(g, p) \right\}
\mathrm{ASSD}(P, G) = \frac{1}{|P| + |G|} \left( \sum_{p \in P} \min_{g \in G} d(p, g) + \sum_{g \in G} \min_{p \in P} d(g, p) \right)

Unit of Boundary Metrics

Hausdorff Distance (HD) and Average Symmetric Surface Distance (ASSD) were computed in pixel units based on resized 256 × 256 axial slices. Since all images were spatially normalized during preprocessing and the voxel spacing information was not preserved in the 2D slice representation, these boundary metrics reflect relative pixel-level distances rather than physical millimeter measurements. Therefore, the HD values reported in this study should be interpreted as comparative indicators of segmentation accuracy rather than direct clinical distance measurements.
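For small boundary point sets, HD and ASSD in pixel units (Equations (19) and (20)) reduce to nearest-neighbor distance aggregation. The brute-force sketch below is illustrative; production code for 256 × 256 masks would typically use distance transforms or KD-trees instead.

```python
def _min_dist(p, pts):
    # Euclidean distance from point p to its nearest neighbor in pts.
    return min(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 for q in pts)

def hausdorff(P, G):
    # Hausdorff Distance (Equation (19)): worst-case boundary disagreement.
    return max(max(_min_dist(p, G) for p in P),
               max(_min_dist(g, P) for g in G))

def assd(P, G):
    # Average Symmetric Surface Distance (Equation (20)): mean of the
    # nearest-neighbor distances taken symmetrically in both directions.
    total = (sum(_min_dist(p, G) for p in P)
             + sum(_min_dist(g, P) for g in G))
    return total / (len(P) + len(G))

# Usage on two toy boundary point sets in pixel coordinates.
pred_boundary = [(0, 0), (1, 0)]
gt_boundary = [(0, 0), (3, 0)]
```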

4. Results and Discussion

4.1. Optimization Results and Comparative Evaluation

4.1.1. PSO-GA-Driven Hyperparameter Tuning and Convergence Behavior

To enhance the performance and generalization of the U-Net segmentation model, we employed a hybrid Particle Swarm Optimization–Genetic Algorithm (PSO-GA) to perform global hyperparameter optimization. The objective metric guiding the search was the validation Dice Similarity Coefficient (DSC). Over 10 generations, the algorithm adaptively evolved key parameters, including learning rate, dropout, batch size, kernel size, encoder filter configuration, bottleneck size, activation function, and optimizer.
Figure 9 illustrates the convergence trend of PSO-GA using raw generation-wise best and mean DSC values from a primary hold-out validation run. The trend confirms that early exploration was efficient, with PSO exploiting local optima while the GA maintained diversity. The observed dip near Generation 7 suggests a phase of diversity injection, followed by recovery as the algorithm converged to high-performing regions.
The figure thus traces the DSC trajectory of a representative hold-out progression for a typical candidate configuration, highlighting how performance evolves over generations. A statistically richer view of the population-level behavior, summarizing all candidate configurations per generation, is provided in Section Comparative Performance of PSO, GA, and PSO-GA Metaheuristics.
To track optimization dynamics, the best-performing U-Net configuration for each generation is summarized in Table 1. These configurations reflect the evolving trade-offs between depth, regularization, and optimizer choice across generations. The highest validation DSC was achieved in Generation 10 (0.7662) with the following configuration: learning rate = 0.0095, dropout = 0.4, batch size = 32, encoder = [64, 128, 256, 512], leaky_relu activation, and the Adam optimizer. Earlier generations favored smaller batch sizes and different encoder filter arrangements, which converged on more stable, deeper hierarchies in later stages.
The correlation analysis was conducted across all candidate configurations evaluated during the PSO-GA optimization process (population size = 5 over 10 generations; total evaluations, n = 50 ). Pearson correlation coefficients were computed along with the corresponding p-values to assess statistical significance.
Further statistical evaluations provide deeper insights into the influence of individual hyperparameters. Figure 10 displays the Pearson correlation matrix between the numeric hyperparameters and the validation DSC. Among them, the learning rate exhibited a strong positive correlation with DSC (ρ ≈ 0.68, p < 0.001), while the dropout rate showed a moderate negative correlation (ρ ≈ −0.54, p < 0.01). This indicates that moderately high learning rates (close to 0.0095) and lower dropout rates in the range of 0.1–0.2 are consistently associated with better segmentation performance.
Since these configurations were generated through an evolutionary optimization process, the independence assumption required by Pearson correlation is not strictly satisfied. Therefore, the reported correlations should be interpreted as descriptive trend indicators rather than strict inferential statistics.
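The descriptive correlation analysis can be reproduced with a plain Pearson coefficient over the 50 evaluated configurations; the sketch below computes r only (the p-values reported in the paper would additionally require the t transform of r, as implemented by scipy.stats.pearsonr). Names are illustrative.

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation factors.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Usage: e.g. pearson_r(learning_rates, val_dscs) over all 50 configurations.
```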
To further examine parameter interdependencies beyond linear correlations, Figure 11 presents a scatter matrix that provides an exploratory view of parameter interdependencies. It confirms that configurations with lower dropout and higher bottleneck sizes tend to cluster around higher DSC values, suggesting potential synergy between network capacity and regularization. This relationship is non-linear and highlights the value of hybrid search over purely gradient-based tuning approaches.
Categorical parameters were evaluated using violin plots. Figure 12a shows that the leaky_relu activation function led to the most consistent and highest DSC values, followed by relu. Similarly, Figure 12b demonstrates that models trained with RMSprop performed better than those using Adam or SGD, suggesting that adaptive gradient strategies with dynamic momentum can enhance model convergence for this task. The effect of kernel size, shown in Figure 12c, revealed that 5 × 5 kernels led to a greater variance in performance, while 3 × 3 kernels showed more stable results. The evaluation of the encoder filters (Figure 12d) and the derived encoder depth (Figure 13) indicated that the deeper encoders (with 4 to 5 convolutional blocks) achieved higher validation DSC, supporting the importance of hierarchical feature learning in complex tumor boundaries.
Conducting such an extensive hyperparameter search required significant computational resources. The experiments were executed on a high-performance computing cluster with access to 8 × NVIDIA A100 GPUs, enabling parallel training of multiple U-Net variants, so that the full population of each generation could be evaluated simultaneously and convergence time was drastically reduced. Access to this infrastructure, supported by external research funding, was vital for completing the hyperparameter optimization within a practical timeframe, allowed comprehensive coverage of the search space, and helped uncover robust trends.
Overall, the results demonstrate that PSO-GA can effectively and efficiently navigate a high-dimensional, mixed-variable search space. Compared to baseline random or grid search, the hybrid approach yielded better performance with fewer evaluations. The convergence pattern, the consistency across generations, and the diversity of solutions emphasize the strength of the algorithm to balance global exploration and local refinement.

4.1.2. Comparative Performance of PSO, GA, and PSO-GA Metaheuristics

To investigate the effectiveness of different metaheuristic strategies for optimizing the U-Net segmentation model, we conducted a comparative analysis of standalone Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and the proposed hybrid PSO-GA approach (Table 2). Each method was evaluated based on identical datasets, segmentation tasks, and evaluation metrics: Dice Similarity Coefficient (DSC), Jaccard Index (IoU), validation loss, and overall accuracy.
The hybrid PSO-GA strategy outperformed both standalone methods, achieving the highest validation DSC of 0.7359, validation IoU of 0.5857, and validation accuracy of 0.9929 with a validation loss of 0.0321. The GA-based model achieved moderate results, while the PSO-based approach showed improvement but was still inferior to the hybrid configuration.
These improvements are attributed to the dual-mechanism advantage of the PSO-GA framework. While the GA promotes architectural diversity and robust exploration of the discrete hyperparameter space (e.g., activation functions and encoder filters), PSO fine-tunes continuous parameters (e.g., learning rate and dropout rate) within high-performing regions. This synergy effectively balances exploration and exploitation, reduces premature convergence, and yields more stable performance across training epochs.
In addition to the representative convergence in Figure 9, Figure 14 presents the population-level distribution of validation DSC values across ten generations using boxplots. Each box summarizes the median, interquartile range (IQR), and extremes of all candidate configurations evaluated in a given generation, while the red line shows the best-performing DSC per generation. This representation provides a more robust visualization of population diversity and distributional asymmetry than mean–standard deviation shading.
Figure 14 illustrates the convergence dynamics of the PSO-GA optimization process. In early generations (1–3), the distributions are relatively compact, indicating focused local refinement primarily guided by the PSO component. A moderate widening of the interquartile range is observed around Generations 4–6, reflecting increased exploration induced by the GA operations, which inject diversity through crossover and mutation.
A notable drop in performance occurs in Generation 7, where both the median and the best DSC decrease substantially. This behavior corresponds to a diversity-injection phase in which exploratory configurations temporarily reduce performance before refinement resumes. Importantly, the boxplot shows that this dip is not due to isolated outliers, but rather reflects a broader shift in the population distribution.
From Generation 8 onward, both the median and upper quartile steadily increase, while the interquartile range narrows. This indicates convergence toward high-performing regions of the search space. The final generation achieved the highest DSC (0.7662), showing the effectiveness of the hybrid algorithm in uncovering high-performance segmentation configurations. This value corresponds to the validation DSC obtained during the hyperparameter optimization phase and is used solely for model selection.
Overall, the boxplot-based visualization confirms that the hybrid PSO-GA strategy effectively balances exploration and exploitation. The early compact distributions highlight efficient local refinement, the mid-stage spread reflects diversity preservation, and the final narrowing indicates convergence toward robust, high-quality segmentation configurations.

4.2. Ablation Study: Evaluating Individual and Hybrid Metaheuristics

To ensure a fair comparison, GA, PSO, and PSO-GA were executed with identical optimization settings, including population size (5), number of generations (10), total fitness evaluations ( n = 50 ), and fixed training budget (50 epochs per configuration). Therefore, performance differences reflect intrinsic algorithmic characteristics rather than unequal computational allocation.
To systematically evaluate the contribution of each metaheuristic component, we performed an ablation study comparing four U-Net configurations (Table 3): the baseline model (with default parameters), U-Net optimized using a Genetic Algorithm (GA-U-Net), U-Net tuned with Particle Swarm Optimization (PSO-U-Net), and the hybrid PSO-GA-U-Net. All models were evaluated on identical datasets using four key metrics: Dice Similarity Coefficient (DSC), Jaccard Index (IoU), pixel-level accuracy, and validation loss.
The ablation results (Figure 15) highlight the complementary roles of the two metaheuristic components. PSO is particularly effective at optimizing continuous hyperparameters, such as learning rate and dropout rate, through velocity-guided exploration, enabling efficient convergence toward high-performing parameter regions. In contrast, GA promotes architectural diversity through crossover and mutation operations, which facilitate exploration of discrete structural parameters such as kernel size, filter configurations, and optimizer selection. The hybrid PSO-GA framework combines these advantages, enabling both fine-grained parameter tuning and broader architectural exploration. This indicates that PSO efficiently refines continuous hyperparameters, while GA maintains population diversity and prevents premature convergence, resulting in a more stable global search process.
To validate these performance improvements, two-tailed paired t-tests were conducted between each optimized model and the baseline U-Net over repeated trials. GA-U-Net exhibited marginal improvements that were not statistically significant in any metric. This suggests that the GA configuration space may have been insufficiently diverse or failed to effectively escape local optima.
However, PSO-U-Net demonstrated statistically significant gains in all metrics, with p-values of 0.004 (DSC), 0.006 (IoU), 0.012 (accuracy), and 0.008 (loss). These improvements reflect PSO’s strength in exploring continuous hyperparameter spaces, particularly learning rate and dropout, which strongly influence model convergence.
The hybrid PSO-GA-U-Net achieved the most substantial and consistent improvements, with p < 0.001 for DSC and IoU, p = 0.002 for accuracy, and p = 0.003 for loss. This validates the hypothesis that combining PSO’s fine-grained parameter tuning with the GA’s architectural diversity yields a more expressive and stable optimization pathway. In particular, IoU improved by 81.72%, highlighting better pixel-wise spatial agreement between predictions and tumor masks. The 51.27% improvement in DSC further supports the increased overlap between predicted and ground truth regions.
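The paired comparisons above rest on the standard paired t statistic. The sketch below computes t and the degrees of freedom from two matched score lists (e.g., fold-wise DSC of an optimized model versus the baseline); converting t to a two-tailed p-value requires the Student t distribution, as provided by scipy.stats.ttest_rel. The input values are illustrative, not the paper's data.

```python
def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a, b."""
    n = len(a)
    d = [x - y for x, y in zip(a, b)]       # per-trial differences
    mean_d = sum(d) / n
    # Sample standard deviation of the differences (Bessel-corrected).
    sd = (sum((x - mean_d) ** 2 for x in d) / (n - 1)) ** 0.5
    return mean_d / (sd / n ** 0.5), n - 1

# Usage: illustrative fold-wise DSC values for optimized vs. baseline models.
t_stat, df = paired_t([0.90, 0.92, 0.91], [0.80, 0.81, 0.82])
```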
To better interpret the accuracy metric, we examined the dataset’s pixel distribution. The dataset exhibits substantial class imbalance, with tumor pixels representing approximately 1.69% of the total pixels and background pixels accounting for 98.31%. Across individual images, tumor regions occupy on average 1.69 ± 1.38% of pixels. Under such imbalance conditions, pixel-wise accuracy may appear artificially high due to the dominance of background classification. Therefore, overlap-based metrics such as DSC and IoU provide more reliable indicators of segmentation performance.
Although accuracy showed only a modest improvement (+0.45%), it becomes less informative under strong class imbalance conditions in segmentation tasks. Therefore, its limited sensitivity is compensated by stronger DSC and IoU gains. The 8.29% loss reduction also points to better convergence behavior and a reduced risk of overfitting.
In summary, the ablation study confirms that PSO alone substantially improves segmentation performance over the baseline, and the hybrid PSO-GA strategy yields the most balanced and significant gains across all metrics. The complementary search behavior of the two algorithms enables robust exploration and stable convergence across both discrete and continuous hyperparameter dimensions—demonstrating their synergistic value in medical image segmentation. Additional evaluation details—including fold-wise metrics and metric boxplots—are provided in Appendix A. This offers full reproducibility while maintaining brevity in the main manuscript.

Computational Cost and Model Complexity

To assess the computational characteristics of the proposed optimization framework, we analyzed the parameter size and training time of each model configuration. Table 4 summarizes the parameter counts and average training times obtained during experimentation.
The baseline U-Net model contains approximately 34.5 million parameters and requires an average epoch time of 39.2 s. Optimization with the GA and PSO significantly reduced model complexity, producing architectures with 6.17 million and 7.77 million parameters, respectively. This reduction is mainly due to the adaptive exploration of filter configurations and kernel sizes within the search space.
The proposed PSO-GA-U-Net achieves a balanced architecture with approximately 17.0 million parameters while maintaining competitive training efficiency. Although the hybrid model has more parameters than the GA-U-Net and PSO-U-Net variants, it achieves superior segmentation performance while keeping computational requirements substantially lower than those of the original U-Net. These results demonstrate that the hybrid metaheuristic strategy can simultaneously improve segmentation accuracy and control architectural complexity.

4.3. Evaluation on FBTS: Augmentation and Cross-Validation Impact

To improve generalization and mitigate class-wise performance disparity, data augmentation was applied to the FBTS dataset. This step was especially important given the morphological variability and imbalance among tumor types. Segmentation performance was evaluated during training, validation, and testing for each class (meningioma, glioma, and pituitary), both with and without augmentation. The results are summarized in Table 5, and the improvements from augmentation are reported in Table 6.
Augmentation had the most profound impact on glioma segmentation, where DSC and JI increased by +21.84% and +22.37%, respectively. The morphological irregularities of gliomas make them more prone to overfitting; data augmentation helped diversify the training distribution, allowing the model to learn more generalizable boundaries. For meningioma and pituitary, which already performed strongly without augmentation, the improvements were smaller but still consistent, indicating better boundary refinement and regularization.
To evaluate the consistency and generalizability of the model, we performed a five-fold cross-validation on augmented data. Table 7 shows the fold-wise metrics for each tumor class. Meningioma and pituitary exhibited very low variance across folds, while glioma showed noticeable variance reductions compared to pre-augmentation.
Performance statistics across the five folds reveal class-specific differences in variability, as illustrated by the distributional plots in Appendix B (Figure A2). The interquartile ranges are narrow for all classes, particularly for meningioma and pituitary, indicating stable segmentation performance. Although glioma exhibits slightly higher variability, central tendencies remain strong, reflecting the effectiveness of the augmentation strategy in reducing performance fluctuations.
To validate the statistical relevance of these improvements, we performed paired t-tests and computed 95% confidence intervals for each class. The results, shown in Table 8, indicate that all classes exhibit a statistically significant improvement ( p < 0.05 ) in DSC. The confidence intervals are narrow and consistent with the observed medians.
Altogether, the results confirm that data augmentation significantly improves segmentation performance, especially for morphologically complex tumors such as gliomas. The improvement is not only quantitative but also consistent across repeated splits. By pairing augmentation with 5-fold cross-validation, the model achieves both high accuracy and reliable generalization. The consistent performance patterns further affirm the strategy’s readiness for extension to larger datasets like BraTS 2021 and BraTS 2018.

4.4. Evaluation on BraTS 2021: Multi-Modality Performance Analysis

Following the robust results obtained on FBTS, we extended our evaluation to the BraTS 2021 dataset, focusing on the whole tumor segmentation across four MRI modalities: FLAIR, T1, T2, and T1CE. Using the optimized model configuration derived from the previous 5-fold cross-validation study, we evaluated the model’s behavior across training, validation, and testing to assess consistency and generalizability.
Table 9 presents the per-modality segmentation results. The FLAIR modality achieved the highest DSC (0.9406) and JI (0.8881) during testing, indicating its strong sensitivity in delineating tumor boundaries. T2 also showed excellent segmentation performance (DSC = 0.9405), marginally lower than FLAIR, supporting its diagnostic value for tumor structure delineation. T1 and T1CE, while clinically useful for tissue contrast, yielded comparatively lower segmentation scores, with T1CE scoring the lowest among all (DSC = 0.9168).
To analyze statistical robustness, we examined the distribution of Dice and Jaccard scores across modalities using violin plots (Figure A3 in Appendix C). FLAIR and T2 modalities consistently exhibit higher medians and narrower interquartile ranges, indicating strong central tendencies and limited performance dispersion. In contrast, T1CE shows a wider distribution, as reflected in lower first-quartile values and greater variability in contrast-enhanced tumor delineation.
Inferential evaluation was conducted using paired t-tests to examine whether observed differences in DSC scores were statistically significant. Confidence intervals at 95% were also computed to assess the expected performance ranges. Table 10 reports the t-statistics, p-values, and confidence intervals for each modality. All modalities demonstrated significant p-values ( p < 0.05 ), indicating strong statistical confidence in the accuracy of the segmentation. FLAIR achieved the highest t-statistic (11.0254), aligning with its superior test performance and the lowest standard deviation of the DSC.
These results demonstrate the ability of the model to generalize across different MRI contrasts. Performance disparities across modalities reflect differences in anatomical visibility, resolution, and signal intensity variation. FLAIR and T2 are especially effective in capturing the lesion structure, leading to higher overlap scores and lower prediction uncertainty. T1CE’s comparatively lower performance may stem from variable enhancement patterns, which remain a challenge in segmentation learning.
The variability in the Dice and Jaccard scores across modalities reflects both the structural complexity and the contrast dependence of each MRI sequence. FLAIR and T2 consistently achieve higher segmentation fidelity, whereas T1CE exhibits greater dispersion.

4.5. Evaluation on BraTS 2018: HGG and LGG Segmentation Performance

To further evaluate the transferability of the model, the optimized segmentation framework was applied to the BraTS 2018 dataset, stratified into high-grade glioma (HGG) and low-grade glioma (LGG) cases. This evaluation enables in-depth analysis of segmentation quality across tumor grades and MRI modalities—FLAIR, T1, T2, and T1CE—without re-optimizing the architecture.

4.5.1. High-Grade Glioma (HGG) Evaluation

Table 11 summarizes the training, validation, and testing performance of the model for HGG cases across all four modalities. The best test-stage DSC and JI were achieved with the T2 modality (DSC = 0.9316, JI = 0.8720), closely followed by FLAIR (DSC = 0.9087, JI = 0.8330). These modalities are clinically valuable for capturing edema and heterogeneous tissue textures, consistent with their superior segmentation performance.
Performance statistics illustrated in Figure A4 (Appendix D) confirm that T2 and FLAIR maintain high medians with narrow interquartile ranges. The spread of DSC values is particularly tight for T2, highlighting the consistent spatial overlap across HGG samples. Meanwhile, JI values for FLAIR exhibit minimal dispersion, indicating reliable intersection quality between predicted and ground-truth tumor regions. The greater variability observed for T1CE may be attributable to heterogeneous enhancement patterns on post-contrast imaging.
The significance testing (Table 12) supports the observed trends. FLAIR and T2 exhibit the highest t-statistics and the tightest confidence intervals, reinforcing their reliability in segmenting complex infiltrative HGG lesions. The lower t-statistic of T1CE may be attributed to inconsistent enhancement patterns across the samples.

4.5.2. Low-Grade Glioma (LGG) Evaluation

Table 13 presents the LGG segmentation results. Compared to HGG, LGG segmentation performance is consistently higher across all modalities. The highest test-stage DSC and JI values were achieved by FLAIR (0.9770, 0.9550) and T1 (0.9758, 0.9528), indicating that the model captures the more homogeneous structures of low-grade tumors particularly well.
As illustrated in Figure A4 (Appendix D), the DSC and JI distributions for LGG samples are tightly clustered, particularly for T1 and T1CE modalities. These sequences exhibit minimal variance and high lower quartile values, indicating consistent tumor boundary identification across low-grade tumor morphologies.
Statistical testing (Table 14) confirms the consistency of performance. T1 achieved the highest t-statistic (14.1150) and the tightest confidence interval, confirming its exceptional consistency throughout LGG segmentation. All modalities reached p < 0.01 , validating the statistical significance of the observed performance levels.
The complete distributional analysis is presented in Figure A4 (Appendix D), where the violin plots capture the variability of segmentation performance for both HGG and LGG cases. The results indicate that T2 and FLAIR provide the most consistent segmentation for high-grade tumors, whereas T1 and T1CE provide stable, high-fidelity delineation of low-grade gliomas. This insight may inform modality prioritization in clinical workflows.

4.6. Qualitative Evaluation on FBTS: Visual and Interpretive Analysis

To complement the quantitative metrics, we conducted a detailed qualitative evaluation of the segmentation results using the FBTS dataset. Representative samples from the three tumor classes—meningioma, glioma, and pituitary—were selected to visually assess the predictive behavior of the model within the proposed PSO-GA-U-Net framework. For each sample, we present five panels: (1) the original MRI slice, (2) the ground-truth tumor boundary, (3) the predicted boundary, (4) an error heatmap overlay, and (5) an attention heatmap overlay. These visualizations are shown in Figure 16.
The model demonstrates strong boundary alignment across all classes (Table 15). For meningioma, the overlap between the predicted and actual boundaries is nearly perfect, as reflected in the extremely low HD and ASSD values (1.0000 and 0.0212, respectively). The corresponding heatmaps show minimal deviation from the ground truth, and the attention map is densely concentrated over the tumor region, suggesting focused spatial encoding during inference.
Glioma segmentation posed greater challenges due to irregular tumor morphology and lower contrast, yet the model achieved robust results (DSC = 0.9444, JI = 0.8947). The predicted boundary aligns closely with the annotated mask, though the error heatmap reveals subtle discrepancies around the peripheral edges. This is further supported by slightly elevated HD (3.6056) and ASSD (0.0824), indicating minor boundary variations. Nevertheless, the attention overlay effectively highlights the entire lesion structure, capturing both central and peripheral tumor features.
For pituitary tumors, the model maintains a strong balance of precision and spatial coherence, with a DSC of 0.9590 and an ASSD of 0.0405. Visual inspection confirms compact, smooth boundary detection, with the error heatmap showing almost negligible residuals. The attention map demonstrates tight localization and contour sensitivity, reflecting the model’s ability to focus on spatially distinct tumor areas accurately.
These qualitative results validate the effectiveness of PSO-GA-U-Net in preserving boundary precision and contextual integrity across tumor types. The combination of error and attention overlays reveals a strong alignment between the internal model focus and the final predictions. This fusion of high-resolution structural learning and robust hyperparameter tuning translates into clinically valuable segmentation performance—particularly for morphologically diverse tumors such as gliomas.
The visual overlays not only convey segmentation quality but also support interpretability. Regions of error concentration correlate with biologically ambiguous tumor margins, while the attention maps provide visual cues indicating where the network focuses during inference. This is especially informative in regions with subtle contrast gradients or non-uniform textures, where prediction confidence may naturally drop.
Together, these visualizations and metric correlations affirm the clinical and computational viability of our approach for precise brain tumor segmentation under challenging real-world conditions.

4.7. Qualitative Results on BraTS 2021: Visual and Metric-Based Analysis

To further validate the robustness of the PSO-GA-U-Net architecture in complex clinical data, qualitative visualizations and boundary-focused metrics were analyzed across four modalities of the BraTS 2021 dataset: FLAIR, T1, T2, and T1CE. Figure 17 illustrates representative segmentation results for each modality, displaying the original image, ground truth, and predicted boundaries, along with overlaid error and attention heatmaps.
The quantitative scores corresponding to these samples are listed in Table 16. The FLAIR modality achieved the highest DSC (0.9731) and JI (0.9476), along with the lowest HD (2.2361) and ASSD (0.0311), reflecting strong boundary alignment and minimal surface deviation. T2 followed closely, confirming its relevance in tissue structure localization. T1CE and T1, while still performing well, exhibited slightly elevated HD and ASSD values, suggesting greater spatial prediction error and boundary uncertainty in contrast-enhanced regions.
The error heatmaps reveal that most boundary inaccuracies occur along peripheral tumor zones, especially in modalities with lower signal-to-noise ratios. For example, in T1CE, false-positive regions emerge near enhancing tissue margins, likely due to local texture ambiguities and intensity overlap. In contrast, FLAIR and T2 heatmaps show sharper transitions and less dispersed error regions, underscoring their superior delineation capability.
Attention heatmaps further highlight the model’s internal focus. In FLAIR and T2, the attention modules strongly localize to active tumor regions, with dense feature activation centered on edematous zones. T1 and T1CE show broader, less focused attention spans, possibly influenced by tissue overlap and varying contrast behavior, especially in post-contrast cases.
Taken together, these qualitative findings affirm that PSO-GA-U-Net is effective in adapting to modality-specific structural variation, with FLAIR and T2 allowing stronger boundary capture. Attention behavior aligns with modality utility, reinforcing the model’s capacity to prioritize diagnostically salient regions in a biologically informed manner.

4.8. Qualitative Evaluation on BraTS 2018: Visual Analysis of HGG and LGG

To gain further insights into the boundary-aware behavior of the proposed PSO-GA-U-Net, we present a qualitative evaluation on the BraTS 2018 dataset for both high-grade glioma (HGG) and low-grade glioma (LGG) segmentation tasks. Figures for each modality (FLAIR, T1, T2, and T1CE) display a comprehensive five-panel layout: original image, ground truth boundary, predicted boundary, error heatmap overlay, and attention heatmap overlay. Performance is quantitatively supported by metrics: Dice Similarity Coefficient (DSC), Jaccard Index (JI), Hausdorff Distance (HD), and Average Symmetric Surface Distance (ASSD), as listed in Table 17.
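As a reference for how these four metrics are obtained from binary segmentation masks, the following NumPy sketch computes DSC, JI, HD, and ASSD. It assumes a simple 4-connectivity surface definition and Euclidean pixel distances, which may differ in minor details from the implementation used in this study:

```python
import numpy as np

def dice_jaccard(pred, gt):
    """Dice Similarity Coefficient and Jaccard Index for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum())
    ji = inter / np.logical_or(pred, gt).sum()
    return dsc, ji

def boundary(mask):
    """Coordinates of foreground pixels with at least one background 4-neighbour."""
    m = np.pad(mask.astype(bool), 1)
    core = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    inner = m[1:-1, 1:-1]
    return np.argwhere(inner & ~core)

def hd_assd(pred, gt):
    """Hausdorff Distance and Average Symmetric Surface Distance (in pixels)."""
    bp, bg = boundary(pred), boundary(gt)
    # Pairwise Euclidean distances between the two boundary point sets.
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    d_pg, d_gp = d.min(axis=1), d.min(axis=0)   # directed surface distances
    hd = max(d_pg.max(), d_gp.max())
    assd = (d_pg.sum() + d_gp.sum()) / (d_pg.size + d_gp.size)
    return hd, assd
```

For 3D volumes, the same logic extends with a 6-connectivity surface definition and voxel spacing folded into the distance computation.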

4.8.1. HGG: Boundary Sensitivity and Modality Reliability

In HGG, segmentation is hindered by infiltrative growth patterns and irregular morphology. Qualitative overlays (see figures for HGG) show that the FLAIR and T2 modalities exhibit more stable boundary capture, particularly in the regions of peritumoral edema. This is reflected in their higher DSC and JI values (T2: 0.9604/0.9237, FLAIR: 0.9554/0.9146) and a lower ASSD (T2: 0.0488).
T1CE, while effective at highlighting enhancing tumor cores, tends to underperform in border delineation, as evidenced by an elevated HD value (7.6158) corresponding to peripheral boundary inconsistencies in the error heatmaps. Attention overlays for T2 and FLAIR align closely with the true lesion boundaries, particularly in the medial and inferior tumor zones, underscoring the network’s focus on clinically salient regions.

4.8.2. LGG: Homogeneous Boundaries and Compact Attention Spread

In contrast, LGG samples typically exhibit more homogeneous tissue morphology and lower spatial complexity. This difference leads to tighter alignment between predicted and ground truth masks, as observed across all modalities. As shown in the attention heatmaps, the proposed method maintains a concentrated focus within the core lesion zones, reducing false positives at the margins.
T1 and T1CE yielded particularly high scores (DSC ≥ 0.984, JI ≥ 0.9685) and minimal ASSD (T1CE: 0.0166), showing near-perfect region match with marginal boundary offset. HD values remained low across T2 and T1CE (2.8284 and 2.0000), confirming that the model’s predictions rarely deviate sharply from true contours. However, a high HD value for T1 (65.146) in one sample reflects localized missegmentation in a poorly contrasted region, as evidenced by the brighter zones on the error heatmap.

4.8.3. Comparative Observations: HGG vs. LGG

Comparing the two tumor grades reveals that LGG segmentation benefits from consistent morphology and clearer boundaries, leading to stronger metric performance. Attention maps for LGG display sharper, less dispersed focus areas than the broader receptive fields required in HGG cases. This reflects the model’s adaptation to the spatial ambiguity and size variability inherent in HGG lesions.
FLAIR performs reliably across both grades, balancing spatial precision and structural sensitivity. T1CE, while clinically important, exhibits greater variance, particularly for HGG, due to variability in contrast enhancement. In contrast, for LGG, its structural clarity yields precise segmentation when coupled with attention-guided refinement.
These qualitative insights underline the robustness of the proposed PSO-GA-U-Net in both high- and low-grade tumor segmentation across multimodal MR imaging. The visual overlays in Figure 18 and Figure 19 provide detailed boundary-level evidence of the spatial consistency, clinical reliability, and adaptability of the model to tumor grade differences. In particular, the attention mechanisms demonstrate effective alignment with clinically salient regions, while error heatmaps help pinpoint residual discrepancies—offering potential cues for refinement in post-processing pipelines or architectural adaptations tailored to tumor heterogeneity.

4.9. Comparison with State-of-the-Art Methods

To evaluate the robustness and precision of the proposed PSO-GA-U-Net, we compare its segmentation performance against several state-of-the-art (SOTA) models across three benchmark datasets: FBTS, BraTS 2021, and BraTS 2018. The evaluation leverages widely accepted metrics—the Dice Similarity Coefficient (DSC) and the Jaccard Index (JI)—which collectively reflect volumetric overlap and boundary precision. Detailed results are provided in Table 18, Table 19 and Table 20, together with statistical analysis where applicable.

4.9.1. FBTS Dataset: Model Precision and Boundary Localization

The FBTS dataset presents distinct class-wise structural variability across meningioma, glioma, and pituitary tumors, captured in well-contrasted T1CE MRI slices. Table 18 compares the segmentation results of several SOTA methods with the proposed PSO-GA-U-Net, which achieves a DSC of 0.9587 and the highest JI (0.9209), outperforming most contemporary models in both global and class-wise segmentation accuracy and indicating sharp spatial conformity alongside strong volumetric alignment. While Self-Attention U-Net achieved a slightly lower DSC (0.9327), its substantially lower JI (0.7800) suggests inconsistent overlap across samples, likely due to spatial over-smoothing. This discrepancy highlights that high volumetric agreement does not guarantee structural fidelity in boundary regions.
In particular, the proposed method achieves balanced improvement across both metrics, indicating not only volumetric similarity but also precise boundary localization. The inclusion of PSO for adaptive learning rate tuning promotes stable convergence during feature learning. At the same time, GA-driven dropout regulation prevents overfitting—both mechanisms enabling better generalization across varying tumor textures. A paired t-test confirms that the improvement in JI over U-Net-AG and U-Net-T-PSO is statistically significant ( p < 0.01 ), highlighting the improved capability of the method to minimize false positives around complex tumor borders. Statistical differences were computed using paired t-tests over five-fold cross-validation results across all test samples.
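The paired t-test procedure used for these fold-wise comparisons can be sketched as follows. The fold scores below are hypothetical placeholders, not values reported in this paper; the stdlib implementation returns only the t-statistic, while a p-value lookup (as in scipy.stats.ttest_rel) additionally requires the t-distribution CDF:

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t-statistic and degrees of freedom for fold-wise metric scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical fold-wise JI scores for two models (illustration only).
model_a = [0.920, 0.922, 0.919, 0.921, 0.923]   # e.g., the proposed model
model_b = [0.900, 0.905, 0.898, 0.902, 0.904]   # e.g., a baseline U-Net
t, df = paired_t(model_a, model_b)
# With df = 4, |t| > 4.604 corresponds to p < 0.01 (two-sided).
```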

4.9.2. BraTS 2021 Dataset: Robustness on Complex Multimodal Structures

BraTS 2021 is characterized by multimodal MR inputs (FLAIR, T1, T1CE, and T2) and heterogeneous tumor subregions. The proposed method consistently achieves the best DSC (0.9406) and JI (0.8881), outperforming both traditional CNNs and advanced transformer-based architectures. The 1.53% DSC margin over AWA-VGG-19 and a 2.32% gain over U-Net-ASPP-EVO validate the model’s adaptability in handling high-dimensional, modality-fused feature spaces.
Interestingly, models incorporating transformer blocks, such as ViT-self-attention and ViT-24, while capable of capturing long-range dependencies, struggle with fine-grained tumor margins, particularly in post-contrast FLAIR modalities. This suggests that attention alone is insufficient without adaptive learning regulation, as offered by the PSO and GA modules. The dropout-guided feature pruning employed in our method reduces false positives near ventricular boundaries—a frequent issue in BraTS datasets due to edema spread and MRI noise.
These improvements stem from the hybrid PSO-GA approach, in which PSO dynamically tunes the learning rate to accommodate gradient variability across modalities. At the same time, GA prevents co-adaptation of filters by regulating neuron dropout. This dual adaptation improves the robustness of the model to modality imbalance and spatial noise. The significance test ( p < 0.05 ) confirms that overlap improvements (JI) are consistent across multiple patient cases and are not restricted to specific classes or modalities. This implies a reliable structure recovery even in post-contrast or edema-dominated slices.

4.9.3. BraTS 2018 Dataset: Performance in Mixed-Grade Tumor Cases

BraTS 2018 presents a nuanced challenge with a mix of high-grade gliomas (HGGs) and low-grade gliomas (LGGs), which often exhibit divergent spatial patterns and intensity profiles. The proposed PSO-GA-U-Net achieves a DSC of 0.9480 and a JI of 0.9024, outperforming IDSFCM, RMU-Net, and other fusion-enhanced baselines. Although IDSFCM reports a high JI (0.9287), its mismatch with the lower DSC (0.9418) may indicate inconsistent segmentation, possibly due to post-processing or thresholding.
The superior results of our method can be attributed to its robustness across grade variations. PSO’s ability to dynamically adapt learning rate schedules ensures effective representation learning across contrast-rich HGG cases and less-defined LGG tumors. The influence of GA on dropout encourages feature diversity, preventing overfitting on prominent subregions, and enabling accurate delineation of subtle tumor boundaries. A paired t-test reveals that the improvements over RMU-Net and U-Net-ResNet50 are significant at the p < 0.05 level, particularly in LGG scenarios where boundary clarity is limited.
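The DSC/JI mismatches noted in this subsection can be checked against a per-sample identity: for a single prediction–ground-truth pair, JI = DSC / (2 − DSC), so any reported pair that violates this relation must arise from averaging across samples, rounding, or differing evaluation protocols. A small sketch:

```python
def ji_from_dsc(dsc):
    """Per-sample identity: with intersection I and union U = |A| + |B| - I,
    DSC = 2I / (|A| + |B|) and JI = I / U imply JI = DSC / (2 - DSC)."""
    return dsc / (2.0 - dsc)

# For IDSFCM's reported DSC of 0.9418, a single-pair JI would be ~0.8900,
# well below the reported 0.9287 -- consistent with the mismatch noted above.
per_sample_ji = ji_from_dsc(0.9418)
```

The inverse relation, DSC = 2·JI / (1 + JI), can likewise be used to sanity-check aggregated result tables.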

4.9.4. Overall Insights

These findings are supported by consistent quantitative improvements across all datasets. PSO-GA-U-Net not only achieves state-of-the-art DSC and JI values but also maintains high boundary precision, especially in structurally ambiguous or modality-impaired slices. Unlike many SOTA models that prioritize either overlap or boundary detail, our framework achieves an equilibrium between the two, enabled by hybrid optimization. PSO modulates learning rates in a modality-sensitive manner, while the GA promotes feature generalization through adaptive dropout. Together, these dynamics reinforce the robustness of the model across tumor classes and MRI protocols.
Figure 16, Figure 17, Figure 18 and Figure 19 visually reinforce these quantitative results, demonstrating that PSO-GA-U-Net preserves tumor shape, minimizes boundary drift, and maintains structural fidelity under varying contrast conditions.
Compared to the best-performing SOTA models:
  • DSC: PSO-GA-U-Net ranks first across BraTS 2021, BraTS 2018, and FBTS.
  • JI: Achieves the highest across all datasets.
  • Boundary Handling: Outperforms transformer-based models in regions with irregular geometry.
Unlike prior PSO- or GA-only variants, our integrated framework uniquely combines learning rate modulation with population-guided dropout tuning, enabling convergence acceleration and robustness against overfitting across diverse anatomical structures and acquisition protocols.

4.10. Discussion of Findings and Limitations

The experimental results demonstrate that the proposed PSO-GA-U-Net framework consistently improves segmentation performance compared to the baseline U-Net and single-metaheuristic variants. This improvement can be attributed to the complementary characteristics of the two optimization strategies. Particle Swarm Optimization effectively explores continuous hyperparameter spaces through velocity-guided updates, enabling rapid convergence toward promising regions of the parameter space. Genetic Algorithms, in contrast, promote architectural diversity through crossover and mutation operations. The combination of these mechanisms allows the hybrid framework to balance exploration and exploitation more effectively than either method alone.
Previous studies have explored the use of metaheuristic optimization to improve deep learning segmentation models. PSO-based optimization has been used to tune training parameters and improve convergence behavior [76], while GA-based approaches have been applied to architecture search and feature selection [79]. However, most existing works rely on a single optimization strategy, which may limit their ability to explore complex search spaces that contain both continuous and discrete hyperparameters.
In contrast, the proposed PSO-GA framework integrates the exploration capability of PSO with the evolutionary diversity introduced by GA operations. This hybrid strategy enables simultaneous optimization of learning parameters, architectural configurations, and training settings within a unified search process. The ablation results indicate that this combination yields more stable and effective hyperparameter exploration, leading to improved Dice and IoU scores across the evaluated datasets.
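A minimal sketch of this hybrid search is given below. It optimizes a two-dimensional vector (learning rate, dropout) against a toy fitness surface standing in for validation Dice; the swarm coefficients, search bounds, GA operators, and the hypothetical optimum at lr = 1e-3, dropout = 0.3 are illustrative assumptions rather than the exact configuration used in our experiments:

```python
import math
import random

def fitness(lr, dropout):
    """Toy stand-in for validation Dice of a trained U-Net.
    Peaks at lr = 1e-3, dropout = 0.3 (hypothetical optimum)."""
    return (math.exp(-(math.log10(lr) + 3.0) ** 2)
            * math.exp(-10.0 * (dropout - 0.3) ** 2))

def hybrid_pso_ga(n_particles=10, iterations=30, seed=0):
    rng = random.Random(seed)
    # Each particle encodes (log10 learning rate, dropout probability).
    pos = [[rng.uniform(-5.0, -1.0), rng.uniform(0.0, 0.6)]
           for _ in range(n_particles)]
    vel = [0.0] * n_particles                  # velocity for the lr dimension
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(10.0 ** p[0], p[1]) for p in pos]
    g = max(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration weights
    for _ in range(iterations):
        # PSO step: velocity-guided update of the continuous lr dimension.
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i][0] - pos[i][0])
                      + c2 * r2 * (gbest[0] - pos[i][0]))
            pos[i][0] = min(-1.0, max(-5.0, pos[i][0] + vel[i]))
        # GA step: the weaker half inherits dropout via crossover + mutation.
        ranked = sorted(range(n_particles), key=pbest_f.__getitem__, reverse=True)
        elite = ranked[: n_particles // 2]
        for i in ranked[n_particles // 2:]:
            a, b = rng.sample(elite, 2)
            child = 0.5 * (pbest[a][1] + pbest[b][1])   # arithmetic crossover
            if rng.random() < 0.2:                      # Gaussian mutation
                child += rng.gauss(0.0, 0.05)
            pos[i][1] = min(0.6, max(0.0, child))
        # Evaluate and update personal / global bests.
        for i in range(n_particles):
            f = fitness(10.0 ** pos[i][0], pos[i][1])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return 10.0 ** gbest[0], gbest[1], gbest_f
```

In the full framework, each fitness evaluation corresponds to training and validating a U-Net configuration, which is why limiting the number of search iterations matters for computational cost.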
Figure 20a–c illustrate the distribution of evaluation metrics, namely, the AUC, DSC, Jaccard Index (JI), precision, recall, F1-score, and Matthews Correlation Coefficient (MCC), for each dataset. Across all three datasets, PSO-GA-U-Net maintains high median values and narrow interquartile ranges with minimal outlier influence, indicating stable and generalizable segmentation behavior across patient samples. In the FBTS dataset (Figure 20a), high DSC and JI values indicate strong spatial agreement across tumor classes. For BraTS 2021 (Figure 20b), the model demonstrates stable performance across multimodal inputs, reflected by high MCC and AUC values. Similarly, the BraTS 2018 results (Figure 20c) confirm the robustness of the framework for both high-grade and low-grade glioma segmentation, even when tumor boundaries are less distinct.
Limitations: Despite these strong outcomes, several limitations merit attention:
  • Domain Transferability: The model is primarily trained on public datasets with curated annotations. Generalizing to clinical real-world MRI images from various institutions (with scanner and protocol variations) may require domain adaptation techniques.
  • Computational Overhead: While the PSO-GA hybrid offers notable performance improvements, the added metaheuristic layers increase computational complexity. Real-time applications may require lightweight approximations or pruning strategies.
  • Class Imbalance and Rare Features: In BraTS, particularly for LGG or necrotic regions, infrequent class appearances can cause minor drops in recall. Incorporating focal loss or adaptive sampling might improve sensitivity to minority classes.
  • Hyperparameter Optimization Protocol: The PSO–GA search is executed once using the training–validation subset rather than within a fully nested cross-validation framework. While this design substantially reduces the computational cost associated with repeated evolutionary searches for deep segmentation models, it may introduce a mild bias toward the validation subset used during optimization. The cross-validation results are therefore interpreted primarily as an assessment of model robustness rather than a fully nested hyperparameter evaluation.
Future Work: Building on the current findings, several directions are envisioned:
  • Hybrid Transformer–CNN Integration: Future iterations could incorporate transformer encoders with evolutionary dropout tuning to explore long-range spatial dependencies without sacrificing convergence stability.
  • Multi-objective Evolutionary Optimization: Extending PSO-GA to handle trade-offs between accuracy, memory, and inference time using multi-objective fitness could yield deployment-ready segmentation models.
  • Clinical Deployment Studies: Evaluating the framework in longitudinal patient cohorts with clinical endpoint correlations (e.g. survival prediction and recurrence detection) can confirm real-world impact.
These findings highlight the advantage of integrating complementary metaheuristic search strategies for medical image segmentation, where both training dynamics and architectural configurations influence the final predictive performance.

5. Conclusions

In this study, we presented PSO-GA-U-Net, a hybrid deep learning framework that integrates Particle Swarm Optimization and Genetic Algorithms to enhance U-Net-based segmentation for brain tumor detection. By dynamically optimizing learning rates and dropout probabilities, the model demonstrates strong generalization capabilities and accurate boundary localization across three benchmark datasets—FBTS, BraTS 2021, and BraTS 2018. Quantitative metrics such as DSC, JI, HD, and ASSD consistently highlight the superiority of PSO-GA-U-Net over state-of-the-art methods. At the same time, qualitative results further confirm its structural precision, particularly in complex or ambiguous tumor regions.
Although the model excels in robustness and adaptability, the increased computational cost and modality-specific variation suggest areas for refinement. Future work will explore more efficient hybrid optimization strategies and the potential integration with transformer backbones to further enhance performance in real-world clinical settings. In general, PSO-GA-U-Net offers a promising direction for precision-oriented medical image segmentation through intelligent metaheuristic control.

Author Contributions

Conceptualization, S.S. and R.D.; methodology, S.S., A.Y. and R.D.; software, S.S.; validation, S.S.; formal analysis, S.S., R.T. and A.P.S.; investigation, S.S.; resources, S.S., A.Y. and R.D.; data curation, S.S.; writing—original draft preparation, S.S., A.Y. and R.D.; writing—review and editing, S.S., R.D., A.Y., R.T. and A.P.S.; visualization, S.S., R.T. and A.P.S.; supervision, A.Y. and R.D.; project administration, S.S. and R.D.; funding acquisition, S.S., R.D. and A.Y. All authors have read and agreed to the published version of the manuscript.

Funding

Research funding was provided by AGH University of Krakow (Program “Excellence initiative—research university”), ACK Cyfronet AGH (Grant no. PLG/2024/017503 and PLG/2025/018784), and Polish Ministry of Science and Higher Education funds assigned to AGH University of Krakow. We also thank Ahmad Dahlan University for its support, including the international internship (Grant no. U12/167/III/2025 and U12/1319/XII/2025).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available at (1) Figshare T1-CE MRI dataset https://figshare.com/articles/dataset/brain_tumor_dataset/1512427 (accessed on 6 March 2024), (2) BraTS 2021 https://www.cancerimagingarchive.net/analysis-result/rsna-asnr-miccai-brats-2021/ (accessed on 6 March 2024), and (3) BraTS 2018 https://www.med.upenn.edu/sbia/brats2018/data.html (accessed on 6 March 2024).

Acknowledgments

The authors would like to thank AGH University of Krakow, ACK Cyfronet AGH, Ahmad Dahlan University, the Polish Ministry of Science and Higher Education, and UPN Veteran Yogyakarta for their valuable support and contributions to this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Ablation Study Details

This appendix provides an expanded view of the ablation study in Section 4.2, including fold-wise results across five validation splits and visualizations of metric variability. This detailed breakdown offers greater insight into model stability, consistency, and inter-fold variance.

Appendix A.1. Fold-Wise Performance Metrics

Similar fold-wise tables were computed for IoU, accuracy, and loss and are available upon request.
Table A1. Fold-wise validation Dice Similarity Coefficient (DSC) for each model.
Model           Fold 1    Fold 2    Fold 3    Fold 4    Fold 5    Mean ± Std
U-Net           0.4853    0.4887    0.4871    0.4850    0.4873    0.4865 ± 0.0015
GA-U-Net        0.4861    0.4892    0.4883    0.4875    0.4860    0.4874 ± 0.0013
PSO-U-Net       0.6121    0.6178    0.6135    0.6170    0.6160    0.6153 ± 0.0022
PSO-GA-U-Net    0.7321    0.7383    0.7348    0.7362    0.7381    0.7359 ± 0.0023

Appendix A.2. Metric Variability Visualization

The boxplot in Figure A1 demonstrates the robustness and consistency of the PSO-GA-U-Net, with a higher median and narrower interquartile range compared to the other variants. This suggests that its performance gains are not only statistically significant but also reliable across different subsets of the dataset.
Figure A1. Boxplot showing the fold-wise distribution of DSC scores across all models. PSO-GA-U-Net shows the highest median and lowest interquartile range, reflecting both superior and stable performance.

Appendix A.3. Confidence Interval Estimation

To quantify the uncertainty of mean DSC estimates, 95% confidence intervals were calculated using the standard error across folds. The intervals are as follows:
  • U-Net: [ 0.4848 , 0.4882 ] ;
  • GA-U-Net: [ 0.4859 , 0.4889 ] ;
  • PSO-U-Net: [ 0.6124 , 0.6182 ] ;
  • PSO-GA-U-Net: [ 0.7328 , 0.7390 ] .
These intervals indicate a clear separation between the baseline and optimized models, with PSO-GA-U-Net showing the most pronounced performance margin. The substantial overlap between the U-Net and GA-U-Net intervals further supports the earlier observation that the GA alone yields only marginal improvements.
This appendix reinforces the reliability of the ablation study results presented in the main text. It highlights the advantage of hybrid metaheuristic optimization strategies for robust and consistent medical image segmentation.
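These intervals can be reproduced from the fold-wise DSC values in Table A1. A minimal sketch, assuming Student's t critical value for four degrees of freedom and the sample standard deviation across folds:

```python
import math
from statistics import mean, stdev

# Fold-wise validation DSC values for PSO-GA-U-Net (Table A1).
folds = [0.7321, 0.7383, 0.7348, 0.7362, 0.7381]

def ci95(scores, t_crit=2.776):
    """95% CI of the mean via the standard error across folds.
    t_crit = 2.776 is Student's t critical value for df = 4 (five folds)."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))
    return m - t_crit * se, m + t_crit * se

lo, hi = ci95(folds)   # ~[0.7327, 0.7391]
```

This agrees with the reported interval [0.7328, 0.7390] to within about 10⁻⁴; the residual difference depends on the exact critical value and variance estimator used.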

Appendix B. Distributional Analysis of Cross-Validation Metrics

To provide more in-depth insights into the variability and distribution of segmentation performance across different tumor types, we present violin plots for Dice Similarity Coefficient (DSC) and Jaccard Index (JI) based on five-fold cross-validation results. These visualizations help identify class-wise stability, spread, and potential outliers that may not be evident in summary statistics alone.
Figure A2 presents the violin distributions of DSC and JI for meningioma, glioma, and pituitary classes. The following can be observed:
  • Meningioma exhibits narrow, peaked distributions in both DSC and JI, reflecting high stability across folds and low inter-fold variance. The tight interquartile range suggests consistent performance.
  • Glioma shows a wider spread, particularly in JI. This distribution reflects greater variability due to the morphological complexity and heterogeneity of gliomas. Although the mean performance is satisfactory, some folds yielded lower values, indicating sensitivity to data partitioning.
  • Pituitary performance distributions are slightly broader than meningioma but more concentrated than glioma, demonstrating strong generalization and moderate variance.
These violin plots align with the statistical analysis in Section 4.3, in which gliomas showed higher standard deviations in both DSC and JI. This motivates the use of robust augmentation strategies and stratified folds to reduce fluctuations in performance.
Figure A2. Violin plots showing the distribution of Dice Similarity Coefficient (DSC) and Jaccard Index (JI) across five folds for meningioma, glioma, and pituitary tumor classes on the FBTS dataset. These plots visualize class-wise variability and robustness.

Appendix C. Distributional Analysis of BraTS 2021 Metrics

To further support the evaluation of model performance across MRI modalities on the BraTS 2021 dataset, we conducted a distributional analysis of segmentation scores. Figure A3 presents violin plots for the Dice Similarity Coefficient (DSC) and the Jaccard Index (JI) across all modalities—FLAIR, T1, T2, and T1CE.
The plots highlight the central tendencies and variability of each modality’s performance:
  • FLAIR shows a tight distribution with high median values for both DSC and JI, reflecting robust segmentation performance and minimal outlier spread.
  • T2 also maintains a high-performing distribution, similar to FLAIR, indicating strong boundary adherence and capture of tumor structures.
  • T1CE shows a wider spread across both metrics, with greater variability, particularly in JI. This suggests that the model experiences inconsistent segmentation accuracy when relying solely on contrast-enhanced T1-weighted input.
  • T1 is between FLAIR and T1CE, with a moderate median but slightly wider interquartile range.
These visualizations reinforce the quantitative statistics provided in the main text, clarifying how modality-specific structural features influence segmentation accuracy and stability.
Figure A3. Violin Plots of Dice Similarity Coefficient (DSC) and Jaccard Index (JI) Across MRI Modalities in the BraTS 2021 Dataset. Each plot shows the distributional characteristics (median, interquartile range, and density) of scores per modality.
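As an illustration of how per-modality violin plots of this kind are produced, the sketch below uses Matplotlib's built-in `violinplot`. The scores are synthetic, drawn from normal distributions with invented means and spreads, and are not the results reported in the paper:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Hypothetical per-fold DSC scores per modality (illustrative only).
modalities = ["FLAIR", "T1", "T2", "T1CE"]
scores = [rng.normal(loc, scale, 50).clip(0, 1)
          for loc, scale in [(0.95, 0.010), (0.92, 0.020),
                             (0.94, 0.012), (0.90, 0.030)]]

fig, ax = plt.subplots(figsize=(6, 4))
ax.violinplot(scores, showmedians=True)
ax.set_xticks([1, 2, 3, 4])
ax.set_xticklabels(modalities)
ax.set_ylabel("Dice Similarity Coefficient")
ax.set_title("Per-modality DSC distributions (synthetic data)")
fig.savefig("violin_dsc.png", dpi=150)
```

A tighter, taller violin (as described for FLAIR and T2) corresponds to a smaller spread parameter and a higher mean in this sketch.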

Appendix D. Violin Plot Visualizations of BraTS 2018 (HGG and LGG)

This appendix presents violin plots of the Dice Similarity Coefficient (DSC) and Jaccard Index (JI) scores for BraTS 2018, evaluated separately for high-grade glioma (HGG) and low-grade glioma (LGG) cases across all four MRI modalities (FLAIR, T1, T2, and T1CE). These plots highlight the distribution, variability, and density of model performance, offering deeper insight into class-wise segmentation consistency.
As shown in Figure A4, the FLAIR and T2 modalities consistently demonstrate narrower distributions and higher median values, suggesting stable segmentation performance in both the HGG and LGG groups. LGG cases, however, tend to show broader JI distributions, reflecting greater variability in model responses, likely due to the subtler, less delineated lesion morphologies typical of low-grade tumors. T1CE exhibits the most dispersed performance, particularly in LGG, underscoring the challenges posed by heterogeneous contrast enhancement.
These patterns, differentiated by modality and tumor grade, reinforce the clinical value of multimodal imaging and underscore the necessity for architecture robustness across tumor types and contrast characteristics.
Figure A4. Distribution of DSC and JI Scores Across Four MRI Modalities for HGG and LGG Tumors in the BraTS 2018 Dataset. Each subplot illustrates the performance distribution via violin plots—with inner quartiles and medians highlighted. The top row corresponds to HGG and the bottom row to LGG. FLAIR and T2 show narrower dispersion and higher central tendencies, while T1CE reflects broader spread.
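The grade-dependent spread described above can be quantified with the interquartile range, the quantity summarized by each violin's inner box. A NumPy-only sketch on synthetic JI scores (the means and standard deviations here are invented for illustration, with LGG given the larger spread, as in the text):

```python
import numpy as np

def iqr(x: np.ndarray) -> float:
    """Interquartile range: the spread captured by a violin plot's inner box."""
    q1, q3 = np.percentile(x, [25, 75])
    return float(q3 - q1)

rng = np.random.default_rng(0)
# Synthetic JI scores: LGG drawn with a larger standard deviation,
# mimicking the broader distributions described in the text.
hgg_ji = rng.normal(0.90, 0.015, 100).clip(0, 1)
lgg_ji = rng.normal(0.86, 0.040, 100).clip(0, 1)

print(f"HGG JI median={np.median(hgg_ji):.3f}, IQR={iqr(hgg_ji):.3f}")
print(f"LGG JI median={np.median(lgg_ji):.3f}, IQR={iqr(lgg_ji):.3f}")
```

On these synthetic draws the LGG IQR comes out larger, mirroring the wider LGG violins reported for T1CE.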

Figure 3. Representative axial slices from the BraTS 2018 dataset across both HGG and LGG cases, including modality-specific images and binary whole tumor masks.
Figure 4. Preprocessing and augmentation steps applied to MRI slices: grayscale conversion, resizing, three-channel stacking, binary mask generation, training-only augmentation, and final 80/10/10 dataset split.
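The steps listed in the Figure 4 caption can be sketched in dependency-free Python. The paper resizes slices (evaluation is reported on 256 × 256 slices) and uses an 80/10/10 split; the binarization threshold, nearest-neighbour interpolation, and the omission of shuffling below are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of the preprocessing pipeline in Figure 4.
# All concrete choices (interpolation, threshold, no shuffling) are assumptions.

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D grayscale image (list of lists)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)] for i in range(out_h)]

def stack_three_channels(img):
    """Replicate a grayscale slice into three identical channels."""
    return [img, img, img]

def binarize_mask(mask, threshold=0):
    """Whole-tumor binary mask: any label above the threshold becomes 1."""
    return [[1 if v > threshold else 0 for v in row] for row in mask]

def split_80_10_10(items):
    """Deterministic 80/10/10 train/validation/test split (shuffling omitted)."""
    n = len(items)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

Augmentation (applied to the training split only, per the caption) would be inserted between the split and training; it is omitted here for brevity.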
Figure 5. Flowchart of the PSO-GA Hybrid Optimization Process for U-Net Hyperparameter Tuning. The algorithm begins with a random initialization of particles that encode the learning rate, dropout, and architecture parameters. PSO updates guide exploration, while the GA refines the population through crossover and mutation. The process iterates until the convergence criteria are met, producing an optimized configuration for tumor segmentation.
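The loop described in the Figure 5 caption can be sketched in miniature. In the snippet below, a synthetic fitness function stands in for the validation DSC of a trained U-Net, and the bounds, coefficients, population size, and generation count are illustrative assumptions rather than the paper's settings.

```python
import random

random.seed(0)  # deterministic for illustration

BOUNDS = {"lr": (1e-4, 1e-1), "dropout": (0.1, 0.6)}  # illustrative search space

def fitness(p):
    # Hypothetical stand-in for a trained U-Net's validation DSC:
    # a smooth surrogate peaking near lr = 0.01, dropout = 0.4.
    return -(100 * (p["lr"] - 0.01) ** 2 + (p["dropout"] - 0.4) ** 2)

def clip(key, value):
    lo, hi = BOUNDS[key]
    return min(max(value, lo), hi)

# Random initialization of particles encoding the hyperparameters
swarm = [{k: random.uniform(*b) for k, b in BOUNDS.items()} for _ in range(8)]
velocity = [{k: 0.0 for k in BOUNDS} for _ in swarm]
pbest = [dict(p) for p in swarm]
gbest = dict(max(swarm, key=fitness))

for generation in range(10):
    # PSO update: inertia + cognitive (pbest) + social (gbest) terms
    for i, p in enumerate(swarm):
        for k in BOUNDS:
            velocity[i][k] = (0.7 * velocity[i][k]
                              + 1.5 * random.random() * (pbest[i][k] - p[k])
                              + 1.5 * random.random() * (gbest[k] - p[k]))
            p[k] = clip(k, p[k] + velocity[i][k])
        if fitness(p) > fitness(pbest[i]):
            pbest[i] = dict(p)
    # GA refinement: uniform crossover of the two best particles, then mutation
    ranked = sorted(swarm, key=fitness, reverse=True)
    child = {k: random.choice((ranked[0][k], ranked[1][k])) for k in BOUNDS}
    if random.random() < 0.3:
        gene = random.choice(list(BOUNDS))
        child[gene] = clip(gene, child[gene] * random.uniform(0.8, 1.2))
    worst = min(range(len(swarm)), key=lambda i: fitness(swarm[i]))
    swarm[worst] = child  # replace the weakest particle
    gbest = dict(max(swarm + [gbest], key=fitness))
```

In the actual framework, `fitness` would train a U-Net with the decoded hyperparameters and return its validation DSC, which is why each generation is expensive and the population is kept small.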
Figure 6. Baseline U-Net framework showing the encoder–decoder structure, convolutional layers, skip connections, and segmentation output.
Figure 7. Parameterized U-Net where architectural components are preserved but key hyperparameters (learning rate, kernel size, activation, dropout, and optimizer) are dynamically optimized via the PSO-GA framework.
Figure 8. PSO-GA Optimization Loop for U-Net Hyperparameter Tuning. Candidate hyperparameters are proposed by the PSO-GA framework, evaluated by training the U-Net model, and iteratively refined until convergence.
Figure 9. Convergence of Mean and Best Validation Dice Scores Over 10 Generations of PSO-GA. Rapid improvements in early generations indicate efficient exploration.
Figure 10. Heatmap of Pearson Correlation Coefficients Between Numeric Hyperparameters and Validation DSC. The learning and dropout rates show the strongest associations.
Figure 11. Pairwise Scatter Matrix of Selected Hyperparameters vs. Validation DSC. The interaction between dropout, bottleneck size, and learning rate influences final performance.
Figure 12. Violin plots illustrating the influence of architectural and training hyperparameters on validation Dice Similarity Coefficient (DSC): (a) activation functions, where leaky_relu achieves higher and more consistent performance; (b) optimizer-wise validation DSC, showing superior results with RMSprop; (c) effect of kernel size on validation DSC, where (5, 5) exhibits greater variance while (3, 3) provides more stable performance; (d) validation DSC as a function of encoder depth, indicating higher median scores for deeper encoder configurations.
Figure 13. DSC distribution across encoder depths in U-Net configurations discovered via PSO-GA.
Figure 14. Convergence curve of the PSO-GA optimization showing the mean validation DSC ± one standard deviation (shaded) across all candidate configurations in each generation. This complements Figure 9 by illustrating the population-level performance distribution rather than only the best-candidate trajectory.
Figure 15. Performance Comparison of U-Net Models for Different Optimization Strategies. Improvements over the baseline U-Net are shown in parentheses.
Figure 16. Qualitative Segmentation Results on the FBTS Dataset for (a) Meningioma, (b) Glioma, and (c) Pituitary Cases. Columns represent the original image, ground truth boundary (red), predicted boundary (yellow), error heatmap overlay, and attention heatmap overlay.
Figure 17. Qualitative Segmentation Results Obtained Using PSO-GA-U-Net on the BraTS 2021 Dataset. Each row corresponds to a different MRI modality: (a) FLAIR, (b) T1, (c) T2, and (d) T1CE. From left to right—columns show the original image, ground truth boundary (red), predicted boundary (yellow), error heatmap overlay, and attention heatmap overlay.
Figure 18. Qualitative Segmentation Results for BraTS 2018 High-Grade Glioma (HGG) Cases Across Four MRI Modalities: (a) FLAIR, (b) T1, (c) T2, and (d) T1CE. From left to right—each row shows the original image, ground truth boundary (red), predicted boundary (yellow), error heatmap overlay, and attention heatmap overlay.
Figure 19. Qualitative Segmentation Results for BraTS 2018 Low-Grade Glioma (LGG) Cases Across Four MRI Modalities: (a) FLAIR, (b) T1, (c) T2, and (d) T1CE. From left to right—each row shows the original image, ground truth boundary (red), predicted boundary (yellow), error heatmap overlay, and attention heatmap overlay.
Figure 20. Model performance metrics distribution on the (a) FBTS, (b) BraTS 2021, and (c) BraTS 2018 datasets.
Table 1. Best-performing U-Net configurations selected from each generation of the PSO-GA optimization process. Validation Dice Similarity Coefficient (DSC) is reported as the fitness value.
| Gen | LR | DO | BS | KS | Encoder Layers | BT | Activation | Optimizer | DSC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0725 | 0.4 | 8 | (3, 3) | [32, 64, 128, 256] | 128 | leaky_relu | rmsprop | 0.7524 |
| 2 | 0.0095 | 0.1 | 8 | (3, 3) | [64, 128, 256, 512] | 128 | leaky_relu | rmsprop | 0.7086 |
| 3 | 0.0095 | 0.4 | 8 | (3, 3) | [64, 128, 512] | 1024 | elu | rmsprop | 0.6925 |
| 4 | 0.0095 | 0.4 | 8 | (3, 3) | [64, 128, 512] | 128 | leaky_relu | rmsprop | 0.6728 |
| 5 | 0.0095 | 0.4 | 8 | (3, 3) | [64, 128, 512] | 128 | leaky_relu | rmsprop | 0.6955 |
| 6 | 0.0624 | 0.5 | 8 | (3, 3) | [32, 64, 128, 512] | 128 | leaky_relu | rmsprop | 0.7023 |
| 7 | 0.0095 | 0.4 | 8 | (3, 3) | [64, 128, 512] | 128 | elu | sgd | 0.1593 |
| 8 | 0.0095 | 0.2 | 32 | (3, 3) | [64, 128, 512] | 128 | leaky_relu | adam | 0.4187 |
| 9 | 0.0095 | 0.4 | 32 | (3, 3) | [64, 128, 512] | 128 | leaky_relu | adam | 0.6747 |
| 10 | 0.0095 | 0.4 | 32 | (3, 3) | [64, 128, 256, 512] | 128 | leaky_relu | adam | 0.7662 |
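Table 1 mixes continuous genes (learning rate, dropout) with categorical ones (batch size, kernel size, encoder widths, bottleneck, activation, optimizer). A common way to handle this in PSO is to keep every gene continuous in [0, 1) and decode categorical genes by binning. The mapping below is a hedged sketch of that idea using the value sets visible in Table 1; it is not the paper's actual encoding, and the log-uniform learning-rate range is an assumption.

```python
# Hypothetical decoding of a continuous particle position into the mixed
# hyperparameters of Table 1. Option lists and ranges are illustrative.

CHOICES = {
    "batch_size": [8, 16, 32],
    "kernel_size": [(3, 3), (5, 5)],
    "encoder_layers": [[32, 64, 128, 256], [64, 128, 256, 512], [64, 128, 512]],
    "bottleneck": [128, 256, 512, 1024],
    "activation": ["relu", "leaky_relu", "elu"],
    "optimizer": ["adam", "rmsprop", "sgd"],
}

def decode(position):
    """Map 8 genes in [0, 1) to one U-Net configuration."""
    lr_gene, do_gene, *cat_genes = position
    config = {
        # log-uniform learning rate in [1e-4, 1e-1] (assumed range)
        "learning_rate": 10 ** (-4 + 3 * lr_gene),
        # dropout in [0.1, 0.6] (assumed range)
        "dropout": 0.1 + 0.5 * do_gene,
    }
    for gene, (name, options) in zip(cat_genes, CHOICES.items()):
        # bin the continuous gene into one of the discrete options
        config[name] = options[min(int(gene * len(options)), len(options) - 1)]
    return config
```

This keeps the PSO velocity update purely continuous while the GA crossover and mutation can act on the same vector, with decoding applied only when a candidate is evaluated.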
Table 2. Comparison of final validation metrics across GA, PSO, and PSO-GA optimized U-Net models.
| Method | DSC | IoU | Accuracy | Loss |
|---|---|---|---|---|
| GA | 0.4874 | 0.3231 | 0.9886 | 0.0342 |
| PSO | 0.6153 | 0.4510 | 0.9912 | 0.0308 |
| PSO-GA (Hybrid) | 0.7359 | 0.5857 | 0.9929 | 0.0321 |
Table 3. Performance comparison of U-Net models for different optimization strategies. Improvements over the baseline U-Net are shown in parentheses. Statistical significance compared to U-Net was assessed using paired t-tests (p < 0.05 marked with *; not significant marked with n.s.).
| Model | DSC | IoU | Accuracy | Loss |
|---|---|---|---|---|
| U-Net | 0.4865 | 0.3223 | 0.9885 | 0.0350 |
| GA-U-Net | 0.4874 (+0.19%) n.s. | 0.3231 (+0.25%) n.s. | 0.9886 (+0.01%) n.s. | 0.0342 (−2.29%) n.s. |
| PSO-U-Net | 0.6153 (+26.47%) * | 0.4510 (+39.93%) * | 0.9912 (+0.27%) * | 0.0308 (−11.85%) * |
| PSO-GA-U-Net | 0.7359 (+51.27%) * | 0.5857 (+81.72%) * | 0.9929 (+0.45%) * | 0.0321 (−8.29%) * |
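The bracketed percentages in Table 3 are relative improvements over the baseline U-Net row and can be checked directly from the rounded metrics. Tiny discrepancies (e.g., 51.26% here vs. the published +51.27% for DSC) are expected, since the published values were presumably computed from unrounded metrics.

```python
# Reproducing the relative-improvement percentages of Table 3 from the
# rounded metric values reported in the table itself.

def pct_improvement(baseline, value):
    return (value - baseline) / baseline * 100

baseline_dsc, baseline_iou = 0.4865, 0.3223  # U-Net row of Table 3

print(round(pct_improvement(baseline_dsc, 0.6153), 2))  # PSO-U-Net DSC: 26.47
print(round(pct_improvement(baseline_iou, 0.4510), 2))  # PSO-U-Net IoU: 39.93
print(round(pct_improvement(baseline_dsc, 0.7359), 2))  # PSO-GA-U-Net DSC: 51.26
```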
Table 4. Computational complexity and training efficiency of the evaluated models.
| Model | Parameters | Epoch Time (s) | Training Time (50 Epochs) |
|---|---|---|---|
| U-Net | 34,513,410 | 39.2 | 32.7 min |
| GA-U-Net | 6,166,433 | 30.0 | 25.0 min |
| PSO-U-Net | 7,771,873 | 23.0 | 30.0 min |
| PSO-GA-U-Net | 17,000,193 | 37.0 | 30.83 min |
Table 5. Segmentation results with and without augmentation across the FBTS dataset.
| Class | Augment | Train Acc | Train Loss | Train DSC | Train JI | Val Acc | Val Loss | Val DSC | Val JI |
|---|---|---|---|---|---|---|---|---|---|
| Meningioma | No | 0.9971 | 0.0084 | 0.8624 | 0.7600 | 0.9966 | 0.0094 | 0.8728 | 0.7755 |
| Meningioma | Yes | 0.9985 | 0.0039 | 0.9327 | 0.8743 | 0.9979 | 0.0064 | 0.9257 | 0.8620 |
| Glioma | No | 0.9904 | 0.0247 | 0.6722 | 0.5096 | 0.9836 | 0.0537 | 0.4782 | 0.3207 |
| Glioma | Yes | 0.9935 | 0.0166 | 0.7857 | 0.6503 | 0.9908 | 0.0281 | 0.7132 | 0.5597 |
| Pituitary | No | 0.9983 | 0.0043 | 0.8462 | 0.7344 | 0.9978 | 0.0063 | 0.8279 | 0.7095 |
| Pituitary | Yes | 0.9986 | 0.0034 | 0.8778 | 0.7830 | 0.9981 | 0.0057 | 0.8573 | 0.7520 |
Table 6. Performance gains due to augmentation (Yes–No). Negative loss values indicate improvement.
| Class | Val ΔAcc | Val ΔLoss | Val ΔDSC | Val ΔJI | Test ΔAcc | Test ΔLoss | Test ΔDSC | Test ΔJI |
|---|---|---|---|---|---|---|---|---|
| Meningioma | +0.0013 | −0.0030 | +0.0529 | +0.0865 | +0.0010 | −0.0019 | +0.0394 | +0.0677 |
| Glioma | +0.0072 | −0.0256 | +0.2350 | +0.2390 | +0.0051 | −0.0211 | +0.2184 | +0.2237 |
| Pituitary | +0.0003 | −0.0006 | +0.0294 | +0.0425 | +0.0005 | −0.0013 | +0.0429 | +0.0644 |
Table 7. Five-fold cross-validation results on augmented FBTS dataset.
| Class | Fold | Accuracy | Loss | DSC | JI |
|---|---|---|---|---|---|
| Meningioma | F1–F5 Avg | 0.9992 | 0.0018 | 0.9698 | 0.9414 |
| Glioma | F1–F5 Avg | 0.9970 | 0.0081 | 0.9054 | 0.8280 |
| Pituitary | F1–F5 Avg | 0.9994 | 0.0013 | 0.9481 | 0.9014 |
Table 8. Paired t-test results and 95% confidence intervals for DSC values across 5 folds.
| Class | T-Stat | p-Value | 95% CI [Lower, Upper] |
|---|---|---|---|
| Meningioma | 8.0331 | 0.0013 | [0.9473, 0.9765] |
| Glioma | 7.9285 | 0.0014 | [0.7808, 0.9379] |
| Pituitary | 7.9872 | 0.0013 | [0.8870, 0.9572] |
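The statistics in Table 8 follow the standard paired-t recipe over the five folds. The sketch below uses synthetic fold-wise DSC values (not the paper's data) and hard-codes the two-sided 95% critical value t = 2.776 for 4 degrees of freedom; a library such as `scipy.stats` would normally supply both the p-value and the critical value.

```python
import math

# Paired t-test and 95% t-interval over five folds (synthetic data).

def paired_t(x, y):
    """t statistic of the paired differences, with n - 1 degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    return mean / (sd / math.sqrt(n))

def ci95(values, t_crit=2.776):
    """Two-sided 95% confidence interval for the mean (t_crit for 4 dof)."""
    n = len(values)
    mean = sum(values) / n
    sem = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) / math.sqrt(n)
    return mean - t_crit * sem, mean + t_crit * sem

proposed = [0.958, 0.961, 0.955, 0.960, 0.957]   # hypothetical fold DSCs
baseline = [0.948, 0.950, 0.946, 0.951, 0.947]

t = paired_t(proposed, baseline)
lo, hi = ci95(proposed)
```

A |t| above the critical value (equivalently, p < 0.05) is what the tables report as a significant fold-wise improvement.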
Table 9. Training, validation, and testing results on BraTS 2021 for each MRI modality.
| Modality | Train Acc | Train Loss | Train DSC | Train JI | Val Acc | Val Loss | Val DSC | Val JI | Test Acc | Test Loss | Test DSC | Test JI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLAIR | 0.9976 | 0.0057 | 0.9424 | 0.8911 | 0.9967 | 0.0081 | 0.9331 | 0.8747 | 0.9973 | 0.0063 | 0.9406 | 0.8881 |
| T1 | 0.9967 | 0.0081 | 0.9179 | 0.8485 | 0.9963 | 0.0088 | 0.9200 | 0.8520 | 0.9971 | 0.0069 | 0.9344 | 0.8770 |
| T2 | 0.9970 | 0.0073 | 0.9279 | 0.8656 | 0.9964 | 0.0090 | 0.9187 | 0.8498 | 0.9974 | 0.0061 | 0.9405 | 0.8877 |
| T1CE | 0.9962 | 0.0092 | 0.9090 | 0.8334 | 0.9954 | 0.0114 | 0.9021 | 0.8219 | 0.9958 | 0.0097 | 0.9168 | 0.8466 |
Table 10. Paired t-test results and 95% confidence intervals for DSC values across BraTS 2021 modalities.
| Modality | T-Stat | p-Value | 95% CI [Lower, Upper] |
|---|---|---|---|
| FLAIR | 11.0254 | 0.0004 | [0.8794, 0.9379] |
| T1 | 8.3864 | 0.0011 | [0.8148, 0.9350] |
| T2 | 8.2421 | 0.0012 | [0.8524, 0.9445] |
| T1CE | 10.5462 | 0.0005 | [0.8160, 0.9167] |
Table 11. BraTS 2018 HGG segmentation results by modality.
| Modality | Train Acc | Train Loss | Train DSC | Train JI | Val Acc | Val Loss | Val DSC | Val JI | Test Acc | Test Loss | Test DSC | Test JI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLAIR | 0.9973 | 0.0065 | 0.9114 | 0.8375 | 0.9966 | 0.0087 | 0.9269 | 0.8647 | 0.9964 | 0.0109 | 0.9087 | 0.8330 |
| T1 | 0.9968 | 0.0077 | 0.9007 | 0.8197 | 0.9970 | 0.0072 | 0.9218 | 0.8554 | 0.9979 | 0.0054 | 0.9270 | 0.8640 |
| T2 | 0.9971 | 0.0071 | 0.9098 | 0.8347 | 0.9970 | 0.0072 | 0.9266 | 0.8638 | 0.9980 | 0.0047 | 0.9316 | 0.8720 |
| T1CE | 0.9968 | 0.0077 | 0.9028 | 0.8230 | 0.9968 | 0.0076 | 0.9221 | 0.8563 | 0.9977 | 0.0055 | 0.9193 | 0.8507 |
Table 12. Paired t-test results and 95% confidence intervals for BraTS 2018 HGG segmentation (DSC).
| Modality | T-Stat | p-Value | 95% CI [Lower, Upper] |
|---|---|---|---|
| FLAIR | 17.1374 | 6.8011 × 10⁻⁵ | [0.8899, 0.9250] |
| T1 | 10.7844 | 0.0004 | [0.8553, 0.9277] |
| T2 | 13.5308 | 0.0002 | [0.8761, 0.9263] |
| T1CE | 9.8723 | 0.0006 | [0.8389, 0.9247] |
Table 13. BraTS 2018 LGG segmentation results by modality.
| Modality | Train Acc | Train Loss | Train DSC | Train JI | Val Acc | Val Loss | Val DSC | Val JI | Test Acc | Test Loss | Test DSC | Test JI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FLAIR | 0.9984 | 0.0037 | 0.9680 | 0.9379 | 0.9980 | 0.0047 | 0.9569 | 0.9173 | 0.9985 | 0.0035 | 0.9770 | 0.9550 |
| T1 | 0.9981 | 0.0047 | 0.9583 | 0.9199 | 0.9985 | 0.0035 | 0.9651 | 0.9325 | 0.9985 | 0.0036 | 0.9758 | 0.9528 |
| T2 | 0.9977 | 0.0056 | 0.9504 | 0.9056 | 0.9979 | 0.0055 | 0.9545 | 0.9130 | 0.9981 | 0.0045 | 0.9712 | 0.9441 |
| T1CE | 0.9977 | 0.0055 | 0.9495 | 0.9039 | 0.9981 | 0.0046 | 0.9568 | 0.9171 | 0.9983 | 0.0040 | 0.9730 | 0.9474 |
Table 14. Paired t-test results and 95% confidence intervals for BraTS 2018 LGG segmentation (DSC).
| Modality | T-Stat | p-Value | 95% CI [Lower, Upper] |
|---|---|---|---|
| FLAIR | 6.0138 | 0.0038 | [0.8836, 0.9672] |
| T1 | 14.1150 | 0.0001 | [0.9397, 0.9609] |
| T2 | 5.8641 | 0.0042 | [0.8693, 0.9658] |
| T1CE | 11.0614 | 0.0004 | [0.9202, 0.9553] |
Table 15. Per-class evaluation metrics on FBTS: Dice Similarity Coefficient (DSC), Jaccard Index (JI), Hausdorff Distance (HD), and Average Symmetric Surface Distance (ASSD). HD and ASSD values are expressed in pixel units based on resized 256 × 256 slices.
| Class | DSC | JI | HD | ASSD |
|---|---|---|---|---|
| Meningioma | 0.9788 | 0.9585 | 1.0000 | 0.0212 |
| Glioma | 0.9444 | 0.8947 | 3.6056 | 0.0824 |
| Pituitary | 0.9590 | 0.9212 | 1.4142 | 0.0405 |
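The overlap and boundary metrics in Table 15 have simple set-based definitions. The toy sketch below computes DSC, JI, and a brute-force symmetric Hausdorff distance on coordinate sets of foreground pixels; the published values were computed on 256 × 256 slices (with HD/ASSD typically evaluated on boundary pixels), and ASSD, the average of all surface-to-surface distances, is omitted here for brevity.

```python
import math

# Set-based overlap and boundary metrics on tiny example masks.

def dsc(a, b):
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard Index: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def hausdorff(a, b):
    """Symmetric Hausdorff distance via brute-force nearest-point search."""
    def directed(src, dst):
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

pred = {(1, 1), (1, 2), (2, 1), (2, 2)}   # predicted foreground pixels
truth = {(1, 1), (1, 2), (2, 1)}          # ground-truth foreground pixels
```

On these toy masks, DSC = 2·3/(4 + 3) = 6/7, JI = 3/4, and HD = 1.0 (the extra predicted pixel (2, 2) lies one pixel from the ground truth).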
Table 16. Qualitative evaluation metrics on BraTS 2021 test samples using PSO-GA-U-Net.
| Modality | DSC | JI | HD | ASSD |
|---|---|---|---|---|
| FLAIR | 0.9731 | 0.9476 | 2.2361 | 0.0311 |
| T1 | 0.9563 | 0.9162 | 3.6056 | 0.0603 |
| T2 | 0.9682 | 0.9384 | 3.6056 | 0.0417 |
| T1CE | 0.9541 | 0.9122 | 4.4721 | 0.0679 |
Table 17. Qualitative segmentation metrics (DSC, JI, HD, and ASSD) for BraTS 2018 HGG and LGG cases.
| Grade | Modality | DSC | JI | HD | ASSD |
|---|---|---|---|---|---|
| HGG | FLAIR | 0.9554 | 0.9146 | 5.3852 | 0.0545 |
| HGG | T1 | 0.9596 | 0.9223 | 8.5440 | 0.0511 |
| HGG | T2 | 0.9604 | 0.9237 | 8.2462 | 0.0488 |
| HGG | T1CE | 0.9581 | 0.9196 | 7.6158 | 0.0489 |
| LGG | FLAIR | 0.9739 | 0.9492 | 4.0000 | 0.0322 |
| LGG | T1 | 0.9840 | 0.9685 | 65.146 | 0.0849 |
| LGG | T2 | 0.9836 | 0.9677 | 2.8284 | 0.0179 |
| LGG | T1CE | 0.9842 | 0.9689 | 2.0000 | 0.0166 |
Table 18. Comparison of segmentation performance on FBTS dataset.
| Method | DSC | JI |
|---|---|---|
| Proposed (PSO-GA-U-Net) | 0.9587 | 0.9209 |
| U-Net-AG * [31] | 0.9521 | 0.9093 |
| ResUnet-TL [90] | 0.9194 | - |
| DeepLabV3+ ResNet18 [41] | 0.9124 | - |
| U-Net-T-PSO [18] | 0.9312 | 0.8722 |
| U-Net-ResNet50 [89] | 0.9553 | 0.9151 |
| EfficientNet-U-Net [91] | 0.9132 | - |
| Residual-Attention-U-Net [8] | 0.9110 | 0.8930 |
| EfficientNetB4 [92] | 0.9339 | 0.8795 |
| YOLO-U-Net [93] | 0.9273 | 0.8915 |
| YOLO-BT-UNetV2 [94] | 0.9260 | 0.8630 |
| Self-Attention U-Net [95] | 0.9327 | 0.7800 |
| YOLO-M-U-Net [93] | 0.8915 | 0.8833 |

* Statistically lower than the proposed method (p < 0.01).
Table 19. Comparison of segmentation performance on BraTS 2021 dataset.
| Method | DSC | JI |
|---|---|---|
| Proposed (PSO-GA-U-Net) | 0.9406 | 0.8881 |
| U-Net-AG [31] | 0.9095 | 0.8323 |
| E-CATBraTS [96] | 0.8510 | 0.7660 |
| AWA-VGG-19 [59] | 0.9273 | - |
| SPPNet-2 [43] | 0.9040 | - |
| ViT-self-attention [58] | 0.9174 | - |
| MS-Segnet [97] | 0.9200 | - |
| U-Net-ASPP-EVO [61] | 0.9251 | - |
| U-Net [98] | 0.8600 | 0.7807 |
| ViT-24 [99] | 0.8048 | - |
| ResU-Net [100] | 0.8841 | - |
| 2C-U-Net [60] | 0.8370 | - |
| UNCE-NODE [101] | 0.8949 | - |
Table 20. Comparison of segmentation performance on BraTS 2018 dataset.
| Method | DSC | JI |
|---|---|---|
| Proposed (PSO-GA-U-Net) | 0.9480 | 0.9024 |
| MCCNN-CRFs [102] | 0.8824 | - |
| MSFR-Net [103] | 0.8600 | - |
| IDSFCM [104] | 0.9418 | 0.9287 |
| RMU-Net [105] | 0.9080 | 0.8956 |
| U-Net-ResNet50 [89] | 0.9202 | 0.8536 |
| MFFM + SCFFM (Baseline) [106] | 0.8460 | - |
| OM-Net [107] | 0.9074 | - |
| U-Net-Prep [108] | 0.9000 | - |
| Cascaded Networks [109] | 0.8956 | - |
| Ensemble-Net [110] | 0.8824 | - |
| BrainSeg-Net [111] | 0.8940 | - |
| U-Net-FCN [112] | 0.8600 | - |
| AGResU-Net [34] | 0.8760 | - |