Article

Enhanced Segmentation of Glioma Subregions via Modality-Aware Encoding and Channel-Wise Attention in Multimodal MRI

by Annachiara Cariola, Elena Sibilano, Antonio Brunetti *, Domenico Buongiorno, Andrea Guerriero and Vitoantonio Bevilacqua
Department of Electrical and Information Engineering, Polytechnic University of Bari, Via Orabona 4, 70126 Bari, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2025, 15(14), 8061; https://doi.org/10.3390/app15148061
Submission received: 23 June 2025 / Revised: 15 July 2025 / Accepted: 18 July 2025 / Published: 20 July 2025
(This article belongs to the Special Issue The Role of Artificial Intelligence Technologies in Health)

Abstract

Accurate segmentation of key tumor subregions in adult gliomas from Magnetic Resonance Imaging (MRI) is of critical importance for brain tumor diagnosis, treatment planning, and prognosis. However, this task remains poorly investigated and highly challenging due to the considerable variability in shape and appearance of these areas across patients. This study proposes a novel Deep Learning architecture leveraging modality-specific encoding and attention-based refinement for the segmentation of glioma subregions, including peritumoral edema (ED), necrotic core (NCR), and enhancing tissue (ET). The model is trained and validated on the Brain Tumor Segmentation (BraTS) 2023 challenge dataset and benchmarked against a state-of-the-art transformer-based approach. Our architecture achieves promising results, with Dice scores of 0.78, 0.86, and 0.88 for NCR, ED, and ET, respectively, outperforming SegFormer3D while maintaining comparable model complexity. To ensure a comprehensive evaluation, performance was also assessed on standard composite tumor regions, i.e., tumor core (TC) and whole tumor (WT). The statistically significant improvements obtained on all regions highlight the effectiveness of integrating complementary modality-specific information and applying channel-wise feature recalibration in the proposed model.

1. Introduction

Primary brain tumors represent a heterogeneous group of neoplasms that originate from cells within the Central Nervous System (CNS). In particular, gliomas develop from glial cells and are the most common brain tumors among adults, representing 75% of malignant primary neoplasms [1]. The WHO 2021 classification of CNS tumors identifies three main categories of diffuse gliomas: pediatric-type diffuse low-grade gliomas, pediatric-type diffuse high-grade gliomas, and adult-type diffuse gliomas, further subdivided into astrocytomas (IDH-mutant), oligodendrogliomas (IDH-mutant and 1p/19q-codeleted), and glioblastomas (IDH-wildtype) [2]. Multiparametric brain Magnetic Resonance Imaging (MRI), including T2-weighted, T2-weighted fluid-attenuated inversion recovery (FLAIR), and pre- and post-contrast 3D T1-weighted (T1CE) sequences, is considered the diagnostic gold standard for the detection of brain tumors [3]. Each MRI modality provides complementary information and is employed to assess the location, size, margins, structure, and spread of the tumor, along with the treatment response. Specifically, T1CE imaging mainly highlights areas where the blood–brain barrier is disrupted, while peritumoral edema is generally visible on T2-weighted images and is most clearly visible on FLAIR sequences, in which the cerebrospinal fluid (CSF) signal is suppressed [4].
In this context, the segmentation of tumor structures in brain MRI is an essential step for diagnosis, quantitative assessment, and surgical planning. Although MRI allows for an informative description of brain structures, accurately and reproducibly segmenting brain tumors remains a challenging task due to poor spatial resolution, low contrast, ill-defined boundaries, intensity inhomogeneity, partial volume effects, noise, and other acquisition artifacts that can hinder the outcome [5]. Moreover, manual delineation of brain tumors in MRI images is a time-consuming and error-prone process because of inter- and intra-operator variability. This has led to the development of automatic segmentation approaches based mainly on Deep Learning (DL), which enable faster and more consistent delineation of tumor regions and have the potential to facilitate radiomic analyses and enhance clinical decision-making [6].
Among DL architectures, Convolutional Neural Networks (CNNs) are the most widely adopted models for brain tumor segmentation [7]. CNN-based models such as U-Net [8], its 3D extensions [9], and more recent frameworks like nnU-Net [10], have demonstrated state-of-the-art performance in brain tumor segmentation tasks. More recently, emerging paradigms based on the use of Vision Transformers (ViTs) and attention mechanisms have demonstrated superior performance due to their ability to capture global contextual information, which is often limited in CNN-based approaches [11].
However, higher computational costs and the need for large amounts of training data still limit their clinical adoption. Although numerous DL models have been proposed for glioma segmentation, most studies primarily focus on segmenting three tumor subregions: the enhancing tumor (ET), which is visible in T1CE images; the tumor core (TC), which includes both the enhancing and necrotic (NCR) areas; and the whole tumor (WT), which represents the overall extension of the tumor including the peritumoral edema (ED). This standardization is derived from large public datasets such as the Brain Tumor Segmentation (BraTS) challenge dataset [12]. However, from a clinical perspective, a finer and more accurate segmentation of individual tumor components is necessary for a better assessment of the neoplasm. Indeed, the WHO 2021 classification of CNS tumors highlighted the importance of evaluating necrotic regions in the diagnosis and prognosis of adult-type diffuse gliomas, underlining the diagnostic challenges behind their identification and the impact of this ambiguity on tumor-type differentiation [13]. In addition, several studies have identified peritumoral edema as a significant prognostic factor in glioblastomas, reporting that patients with extensive edema have shorter overall survival than patients with less extensive edema [14,15,16,17]. Together, this evidence highlights the need for advanced and clinically informed segmentation strategies that incorporate meaningful tumor substructures. Hence, the aim of this study is to perform accurate and fine-grained segmentation of key subregions in adult gliomas, i.e., ET, NCR, and ED, by employing a novel multi-encoder architecture that leverages a channel-wise attention mechanism while maintaining low computational complexity. The proposed model is specifically adapted to multimodal MRI input and incorporates Squeeze-and-Excitation (SE) blocks within a modular design that enables the network to capture complementary, modality-specific features. To ensure a comprehensive performance analysis of our model, the segmentation results are compared with those obtained by a benchmark ViT-based model, namely SegFormer3D [18], trained and tested on the same dataset. Furthermore, to allow direct comparison with state-of-the-art methods, segmentation results achieved on the TC and WT regions are also reported. The main contributions of our work are summarized as follows:
  • We introduce a multi-encoder architecture for fine-grained segmentation of glioma subregions, namely, ET, NCR, and ED, which are clinically relevant yet often underrepresented in standard segmentation protocols.
  • We leverage both MRI modality-specific encoding and modular attention-based refinement to effectively capture complementary information across modalities. This allows us to improve segmentation performance while keeping a moderate number of parameters, making the model suitable for clinical scenarios with constrained computational resources.
  • We benchmark the performance of our model against a state-of-the-art transformer-based approach, obtaining statistically significant improvements on all tumor subregions.
  • We test our model on two public external validation datasets, achieving competitive results in the segmentation of all tumor subregions and demonstrating the strong generalization capacity of our model.

2. Related Work

An extensive branch of the literature on DL-based brain tumor segmentation is related to the BraTS Challenge dataset [12], which is a standard benchmark for assessing automatic brain tumor segmentation techniques. Thanks to its high-quality annotations, standardized format, and broad availability, the BraTS dataset has been widely adopted in various studies aimed at developing and evaluating DL-based brain tumor segmentation methods.
Among CNN-based models, Kamnitsas et al. [19] proposed DeepMedic, one of the top-performing algorithms in the BraTS 2016 competition. DeepMedic features a parallel path structure with two resolution levels, enabling simultaneous capture of both the local details and broader global context of the image. In subsequent BraTS challenges, models based on U-Net [8] gained significant prominence. Specifically, in the BraTS 2017 challenge, 16 models among those presented drew inspiration from U-Net, while in BraTS 2018 [20] over 35 models combined U-Net with other CNN architectures. Notably, Sun et al. [21] tested an ensemble approach on the BraTS 2018 dataset that incorporated three advanced models: an anisotropic cascade CNN, a previously described 3D U-Net architecture [22], and a modified 3D version of the standard U-Net. By employing a majority voting mechanism to combine these independently trained networks, the ensemble approach achieved remarkable segmentation accuracy. Specifically, this strategy demonstrated improved segmentation for WT and ET, with Dice scores of 0.91 and 0.81, respectively, while the anisotropic cascade CNN excelled in segmenting TC, with a Dice score of 0.85. Henry et al. implemented a 3D U-Net model incorporating deep supervision techniques, with four downsampling stages located in the encoder, symmetrically replicated in the decoder. This approach ranked among the top 10 in the BraTS 2020 segmentation challenge, with Dice scores of 0.79 for ET, 0.89 for WT, and 0.84 for TC [23]. Another innovative approach was the MS-SegNet architecture by Sachdeva et al., which was tested on the BraTS 2020 and 2021 datasets. This model utilizes convolutional filters of varying sizes to extract both local and global features across multiple MRI modalities, thereby expanding the receptive field and enhancing robustness. Its encoder employs a multi-scale feature extraction block to capture both low-level visual details and high-level semantic features, leading to Dice scores for ET, WT, and TC of 0.81, 0.91, and 0.83 on BraTS 2020 and 0.86, 0.92, and 0.84 on BraTS 2021, respectively [24]. Further exploration was conducted by Ahuja et al., who employed DeepLabv3+ integrated with different backbone networks such as ResNet18 and MobileNetv2 along with customized loss functions to improve segmentation accuracy for various cancer regions in the BraTS 2020 dataset, achieving a very high average Dice score of 0.92 [25]. In 2020, Isensee et al. [26] ranked first in the challenge by applying the nnU-Net framework [10] to the segmentation task and integrating BraTS-specific optimizations such as batch Dice loss, extensive data augmentation, and targeted postprocessing to manage small ET regions. This approach achieved Dice scores of 0.89 for WT, 0.85 for TC, and 0.82 for ET. Starting from BraTS 2021, the introduction of the attention mechanism employed by transformers has provided new hybrid methods that account for previous limitations. Luu et al. [27] presented an extension of the nnU-Net model based on a deeper architecture along with the introduction of group normalization and axial attention mechanisms within the decoder. This approach led to a substantial performance improvement, with Dice scores of 0.93 for WT, 0.88 for TC, and 0.85 for ET, although some of these gains were likely driven by the expanded dataset available that year, which provided more training examples. 
In 2022, the top-performing model was an ensemble combining DeepSeg [28], an enhanced nnU-Net, with DeepSCAN architectures [29] and using the STAPLE algorithm [30] to fuse final predictions. This ensemble achieved Dice scores of 0.93 for WT, 0.88 for TC, and 0.88 for ET, further advancing the state-of-the-art in glioma subregion segmentation [31]. Ferreira et al. [32] proposed an ensemble comprising nnU-Net, Swin UNETR, and the BraTS 2021 winning model, as well as integrating advanced synthetic data augmentation techniques such as GANs and registration. This approach achieved impressive Dice score results of 0.90 for WT, 0.87 for TC, and 0.85 for ET on the BraTS 2023 dataset by combining the strengths of CNNs and transformers, albeit with increased computational cost. Liang et al. [33] proposed BTSwin-Unet, a 3D symmetric U-shaped segmentation network that integrates Swin Transformer modules into both the encoder and decoder pathways for volumetric brain tumor segmentation. Their model employs window-based self-attention to capture long-range dependencies, an overlapping patch embedding strategy to improve local feature representation, and a convolutional stem to stabilize training, reporting an average Dice score of 0.86 across ET, WT, and TC subregions. Another architecture tested for brain tumor segmentation is TSEUnet by Chen et al. [34]. Their model, inspired by nnU-Net, integrates a transformer module in the encoder path along with Squeeze-and-Excitation (SE) attention in the decoder to selectively enhance informative features. This strategy resulted in improved performance on the BraTS 2018 dataset, achieving Dice scores of 0.82 for ET, 0.91 for WT, and 0.87 for TC, outperforming the top-ranked method of that year by 0.62%, 0.37%, and 1.23%, respectively. Further investigating transformer-based architectures, Xing et al. [35] proposed NestedFormer. This model employs a multi-encoder/single-decoder structure with two key components: Modality-Sensitive Gating (MSG) for improved skip connections at lower scales, and the Nested Modality-aware Feature Aggregation (NMaFA) module for high-level fusion, achieving Dice scores of 0.92 for WT, 0.86 for TC, and 0.80 for ET on the BraTS 2020 dataset. Perera et al. [18] developed SegFormer3D for medical image segmentation to overcome the limitation of high computational resources typically associated with complex transformer architectures, enabling efficient training and deployment without compromising performance. This model succeeded in reducing the computational burden and capturing multi-scale information by encoding feature maps at various resolutions of the input volume, following the hierarchical structure introduced by Pyramid Vision Transformer [36]. Moreover, the efficient self-attention mechanism and overlapping patch embedding module employed in the encoder enable both local continuity and global context modeling. For the decoder, SegFormer3D uses the all-MLP approach presented by Xie et al. [37]. All of these strategies led to a significant reduction in model size and complexity, with only 4.5 M parameters while achieving competitive performance in terms of Dice scores, with 0.90, 0.74, and 0.82 for WT, ET, and TC, respectively, on the BraTS 2017 dataset. Nonetheless, a common limitation across existing works is their primary focus on the delineation of composite tumor regions as defined by the BraTS challenge protocol.
To the best of our knowledge, only a few works have investigated a more detailed segmentation of key brain tumor subregions [38,39], reporting very limited results, particularly in the delineation of necrotic areas and edema. Therefore, having assessed the important clinical significance of each tumor subregion, the aim of this work is to design and validate a novel DL architecture for accurate segmentation of the ET, NCR, and ED regions while maintaining moderate computational complexity.

3. Materials and Methods

To achieve more precise and fine-grained segmentation of brain tumor subregions, we present a novel multi-encoder network inspired by the U-Net architecture. This model features three parallel encoders, each dedicated to specific MRI modalities, and is enhanced with Squeeze-and-Excitation (SE) blocks to improve channel-wise feature recalibration.
In the following section, we describe the dataset and preprocessing steps, then outline the experimental setup used for model training and evaluation.

3.1. BraTS-2023 Dataset

The dataset employed in our experiments is the publicly available BraTS 2023 training dataset [12], which includes a multi-institutional cohort of 1250 multi-parametric MRI (mpMRI) scans of adult gliomas. For each patient, four MRI modalities are provided, including pre- and post-gadolinium T1-weighted, T2-weighted, and T2-weighted FLAIR sequences, all of which are co-registered to the same anatomical template, interpolated to the same isotropic resolution (1 mm³), and skull-stripped. Ground truth annotations of the tumor subregions, approved by expert neuroradiologists, are also available for the evaluation, as documented in the related paper [40].
The provided segmentation labels include:
  • Contrast-enhanced tumor (ET): Regions that appear strongly highlighted in MRI after contrast medium administration.
  • Necrotic tumor core (NCR): Tumor regions which appear hypointense in T1CE sequences.
  • Peritumoral edematous/invaded tissue (ED): Areas of diffuse hyperintensity in FLAIR sequences, which includes the infiltrative non-enhancing tumor as well as vasogenic edema in the peritumoral region.
As previously described, the evaluation of segmentation performance of the BraTS 2023 Adult Glioma Challenge participants was based on the three composite tumor subregions, i.e., WT, TC, and ET, each derived from specific combinations of underlying labels. In contrast, for the purpose of our study, we chose to retain the original individual labels and compute the evaluation metrics separately for each class. However, to ensure clarity we also report the results of our model on the composite tumor regions, i.e., TC and WT.
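For illustration, the composite regions can be derived from the individual labels with a few boolean operations. The sketch below assumes a BraTS-style integer encoding (1 = NCR, 2 = ED, 3 = ET); the exact label values should be checked against the dataset release.

```python
import numpy as np

def composite_regions(label_volume: np.ndarray) -> dict:
    """Derive composite tumor regions from individual subregion labels.

    Assumes the convention 1 = NCR, 2 = ED, 3 = ET; adjust the constants
    if the dataset encodes the enhancing tumor differently (e.g., 4).
    """
    ncr = label_volume == 1
    ed = label_volume == 2
    et = label_volume == 3
    return {
        "ET": et,             # enhancing tumor
        "TC": et | ncr,       # tumor core = enhancing tumor + necrosis
        "WT": et | ncr | ed,  # whole tumor = tumor core + edema
    }
```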

3.2. External Validation Datasets

To evaluate the generalizability of our model, we employed two publicly available external datasets: TCGA-GBM [41] and UPENN-GBM [42]. These datasets were both collected by The Cancer Imaging Archive (TCIA) and annotated using the same labeling protocol adopted in the BraTS 2023 challenge. The TCGA-GBM dataset comprises pre-operative multi-institutional MRI scans from The Cancer Genome Atlas (TCGA) Glioblastoma Multiforme (GBM) collection. Each case includes co-registered and skull-stripped multi-parametric MRI (T1, T1CE, T2, T2-FLAIR) volumes. Segmentation labels were initially generated using the top-performing method from BraTS 2015 and subsequently reviewed and corrected by expert neuroradiologists, resulting in high-quality gold-standard annotations consistent with the BraTS framework.
The UPENN-GBM dataset includes the same types of mpMRI scans, acquired from patients with glioblastoma multiforme at the University of Pennsylvania Health System. Tumor subregion labels were created using a semi-automated pipeline, manually reviewed or approved by neuroradiologists, and provided in co-registered NIfTI format.
To ensure an objective assessment of model performance, we selected all cases for which segmentation labels were available and used the official mapping file released by the BraTS 2021 organizers [40] to identify and exclude all subjects that overlapped with the BraTS 2023 training set. After filtering, we finally obtained a total of 92 TCGA-GBM cases and 144 UPENN-GBM cases for external validation. This ensured that the evaluation was performed on fully independent data that were unseen during model training or internal testing.

3.3. Preprocessing

The preprocessing steps included cropping, normalization, and data augmentation strategies applied to all volumes prior to training, as illustrated in Figure 1.
For the cropping phase, indices were determined through statistical analysis of the binary masks obtained by applying Otsu’s thresholding filter on the T2-weighted scans and a sequence of additional morphological operations, i.e., erosion (structuring element radius [1, 1, 1]), binary hole filling, and dilation (structuring element radius [4, 4, 4]). Specifically, for each subject, the largest non-zero region in the xy-plane was computed, averaging the indices across all slices; the resulting cropping bounds were [40, 196] for [x_min, x_max] and [30, 222] for [y_min, y_max]. Along the z-axis, ground truth masks were used to identify the last slice index containing a segmentation label. The volumes were then cropped while preserving the largest dimension between x and y across the volumes, resulting in a final size of 192 × 192 × 150.
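A simplified sketch of this cropping step is shown below, assuming SciPy and scikit-image are available and that the third array axis corresponds to z; function and variable names are illustrative and not taken from the authors' code.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def in_plane_crop_bounds(t2_volume: np.ndarray):
    """Estimate average in-plane cropping bounds from a T2-weighted volume.

    Otsu thresholding followed by erosion (radius 1), binary hole filling,
    and dilation (radius 4), then the per-slice bounding box of the brain
    mask averaged across slices, as described in the text.
    """
    mask = t2_volume > threshold_otsu(t2_volume)
    mask = ndimage.binary_erosion(mask, structure=np.ones((3, 3, 3)))   # radius [1, 1, 1]
    mask = ndimage.binary_fill_holes(mask)
    mask = ndimage.binary_dilation(mask, structure=np.ones((9, 9, 9)))  # radius [4, 4, 4]

    x_bounds, y_bounds = [], []
    for z in range(mask.shape[2]):
        rows, cols = np.nonzero(mask[:, :, z])
        if rows.size:
            x_bounds.append((rows.min(), rows.max()))
            y_bounds.append((cols.min(), cols.max()))
    x_min, x_max = np.mean(x_bounds, axis=0).astype(int)
    y_min, y_max = np.mean(y_bounds, axis=0).astype(int)
    return (x_min, x_max), (y_min, y_max)
```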
To ensure consistency, robustness, and generalizability during training, a series of transformations from the Medical Open Network for AI (MONAI) library [43] was applied on the imaging data, including orientation alignment, random flipping, intensity normalization, and intensity-based augmentations. Initially, all images and their corresponding segmentation masks were reoriented to the standard RAS coordinate system. Random flipping was then applied independently along each of the three spatial axes with a probability of 50%. Subsequently, intensity normalization was performed on non-zero voxels to standardize the signal intensity values across scans. To further improve model generalization to intensity variations commonly observed in MRI data, random intensity augmentations were applied, including random scaling and shifting of voxel intensities.
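A possible MONAI transform stack reflecting this description is sketched below; the augmentation magnitudes (0.1) and probabilities for the intensity augmentations are assumptions, as the exact values are not reported.

```python
from monai.transforms import (
    Compose, Orientationd, RandFlipd, NormalizeIntensityd,
    RandScaleIntensityd, RandShiftIntensityd,
)

# Dictionary-based transforms applied jointly to the stacked MRI volumes
# ("image") and the segmentation mask ("label").
train_transforms = Compose([
    Orientationd(keys=["image", "label"], axcodes="RAS"),          # reorient to RAS
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=0),  # random flips per axis
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=1),
    RandFlipd(keys=["image", "label"], prob=0.5, spatial_axis=2),
    NormalizeIntensityd(keys="image", nonzero=True, channel_wise=True),
    RandScaleIntensityd(keys="image", factors=0.1, prob=0.5),      # random intensity scaling
    RandShiftIntensityd(keys="image", offsets=0.1, prob=0.5),      # random intensity shifting
])
```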

3.4. Multi-Encoder Architecture

Our multi-encoder model is specifically designed for multimodal MRI input and for the accurate segmentation of small regions through the incorporation of modality-specific encoding and attention-based refinement.
The network is presented in Figure 2. It is characterized by three separate encoders designed to process T1-weighted, T2-weighted, and a combination of T1CE and FLAIR input volumes, respectively. Each encoder consists of two downsampling layers, each comprising residual blocks that double the number of channels and a max-pooling stage that halves the spatial dimensions. Each encoder ends with two residual blocks and a Squeeze-and-Excitation (SE) block for channel attention [44]. The encoder employed for the combined T1CE and T2 FLAIR modalities presents an additional bottleneck stage, where a further downsampling layer allows for fine-grained feature extraction. More specifically, the blocks composing the encoders are shown in Figure 3. Residual blocks are made of 3D convolutional layers with a kernel size of [3, 3, 3], stride 1, and padding 1, which preserve the spatial dimensions of the input while adjusting the channel depth. Instance normalization and a LeakyReLU activation function are then applied to introduce nonlinearity. Residual connections sum the input directly to the output, facilitating efficient gradient propagation and preserving low-level feature information. The SE block begins with a squeeze phase, where adaptive average pooling is applied to collapse the spatial dimensions and retain only channel-wise statistics. During the excitation phase, two fully connected layers first reduce and then restore the number of channels, with a Gaussian Error Linear Unit (GELU) activation and a dropout layer (20%) in between to enhance generalization and prevent overfitting. Finally, a sigmoid activation function is applied to produce normalized channel-wise attention weights, which are used to refine the original feature maps in order to selectively emphasize the most informative channels. The decoder mirrors the encoder structure, with three upsampling layers composed of transposed convolutions, which incrementally recover the spatial resolution, and residual blocks, which reduce the number of channels. Moreover, skip connections link each encoder and decoder stage at the corresponding depth, preserving both global and local features. The final 3D voxel-wise segmentation map is obtained through a convolutional layer with a kernel size of [1, 1, 1], stride 1, and three output channels that predict the class probabilities for the enhancing tumor, necrosis, and edema regions.
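The two building blocks described above can be summarized in PyTorch roughly as follows; the reduction ratio of the SE block is an assumption, since it is not reported in the text.

```python
import torch.nn as nn

class ResidualBlock3D(nn.Module):
    """3x3x3 convolutions with instance normalization and LeakyReLU,
    plus a shortcut that sums the input to the output."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.LeakyReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm3d(out_ch),
        )
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv3d(in_ch, out_ch, kernel_size=1)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class SEBlock3D(nn.Module):
    """Squeeze (global average pooling) and excitation (two fully connected
    layers with GELU and 20% dropout), followed by sigmoid channel gating."""
    def __init__(self, channels: int, reduction: int = 4):  # reduction ratio assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.GELU(),
            nn.Dropout(p=0.2),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * weights  # channel-wise recalibration of the feature maps
```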

3.5. Experimental Details

The dataset was split into a training set (80%) and a test set (20%). The training set was further divided using a 5-fold cross-validation strategy to avoid the risk of overfitting. In this scheme, the 1000 training subjects were partitioned into five equally sized folds: in each iteration, four folds were used as training data and the remaining one was used for the validation phase. This process was repeated five times, resulting in five models, each trained and validated on different subsets of the data. In the final evaluation phase, predictions from these five independently trained models were combined using a majority voting ensemble strategy to produce the final segmentation on the held-out test set. The segmentation pipeline was developed in Python (version 3.12) and executed through the Anaconda interpreter using the Spyder integrated development environment. The training process was performed on the LEONARDO supercomputer infrastructure utilizing an NVIDIA A100 GPU.
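As an illustration of the ensembling step, a voxel-wise majority vote over the five fold models can be implemented as follows (the label encoding of 0 = background plus the three tumor classes is an assumption).

```python
import numpy as np

def majority_vote(fold_predictions, num_classes=4):
    """Combine per-fold label maps (integer class indices per voxel) by
    voxel-wise majority voting across the five cross-validation models."""
    stacked = np.stack(fold_predictions, axis=0)        # shape: (folds, D, H, W)
    votes = np.stack([(stacked == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)                         # most frequent class per voxel
```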
The model was trained for up to 200 epochs, starting with an initial learning rate of 1 × 10⁻⁴. A Cosine Annealing scheduler was used to progressively reduce the learning rate, helping stabilize the training and improve generalization. Early stopping with a patience of 20 epochs was also implemented, ending training if no improvement in validation performance was observed, thereby preventing overfitting and unnecessary computational costs.
Training was carried out with a batch size of 1, which is a common choice in volumetric medical image segmentation due to memory constraints. The model was optimized using the AdamW optimizer with a weight decay of 1 × 10⁻⁵. AdamW decouples the weight decay from the gradient update step, allowing for better control over regularization and learning dynamics, which improves both convergence speed and generalization [45]. The loss function was the Dice loss, which is particularly suitable for segmentation tasks with class imbalance.
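A minimal sketch of this optimization setup, using the reported hyperparameters, is shown below; the placeholder model and the choice of MONAI's DiceLoss are assumptions made for illustration.

```python
import torch
from monai.losses import DiceLoss
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv3d(4, 3, kernel_size=1)  # stand-in for the multi-encoder network

max_epochs, patience = 200, 20
criterion = DiceLoss(sigmoid=True)            # Dice loss handles class imbalance
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
scheduler = CosineAnnealingLR(optimizer, T_max=max_epochs)
# During training: optimizer.step() per batch, scheduler.step() per epoch,
# and stop early if validation loss has not improved for `patience` epochs.
```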

3.6. Model Evaluation

The effectiveness of the proposed segmentation strategy was evaluated using standard metrics commonly adopted in medical image segmentation, including the Dice Similarity Coefficient (DSC) or Dice score (Equation (1)), Recall (Equation (2)), and Precision (Equation (3)), as recommended in the recent literature and domain-specific evaluation guidelines [46].
The Dice score, also known as the F1-score, is a widely adopted metric for evaluating the overlap between predicted and ground truth segmentations [47]. It is defined as follows:
DSC = \frac{2\,|X \cap Y|}{|X| + |Y|}    (1)
where X denotes the model’s predicted masks and Y denotes the ground-truth labels. The DSC ranges from 0 to 1, with a score of 1 indicating perfect agreement between prediction and ground truth.
Recall quantifies the model’s ability to correctly identify true positive predictions, and is computed as follows:
Recall = \frac{TP}{TP + FN}    (2)
where TP denotes true positives and FN denotes false negatives.
Precision evaluates the proportion of correctly predicted positive voxels among all voxels predicted as positive, and is defined as follows:
Precision = \frac{TP}{TP + FP}    (3)
where FP represents false positives. This metric reflects the model’s ability to avoid incorrectly labeling non-relevant regions. High precision implies a low false positive rate, indicating reliable specificity in segmentation predictions.
Furthermore, to provide a robust comparison between our model and SegFormer3D, as well as to objectively evaluate its performance, we conducted a statistical analysis on the subject-level metrics. Specifically, the non-parametric Wilcoxon signed-rank test was applied, which is suitable for assessing the statistical significance of the differences between two models evaluated on the same subjects. Statistical significance was considered at a threshold of p < 0.05.
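The paired test can be run with SciPy on the subject-level scores; the arrays below are hypothetical values used only to show the call.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired, subject-level Dice scores for the two models.
dice_multi_encoder = np.array([0.88, 0.82, 0.91, 0.79, 0.86, 0.90, 0.84, 0.87])
dice_segformer3d   = np.array([0.80, 0.75, 0.84, 0.70, 0.81, 0.83, 0.76, 0.79])

stat, p_value = wilcoxon(dice_multi_encoder, dice_segformer3d)
print(f"W = {stat:.1f}, p = {p_value:.4f}, significant: {p_value < 0.05}")
```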

4. Results

Comparative performance analysis on the BraTS dataset and results on the additional external validation datasets are presented in the following section.

4.1. Results on BraTS Dataset

Segmentation results in terms of average Dice score (Equation (1)), Recall (Equation (2)) and Precision (Equation (3)) on the test set for each tumor subregion are reported in Table 1. Our multi-encoder architecture showed strong segmentation performance overall, with some variability across the different subregions. The highest Dice score of 0.88 ± 0.15 was achieved on ET, followed closely by 0.86 ± 0.14 on ED, while NCR segmentation, which represents the most challenging task, yielded a score of 0.78 ± 0.28.
This internal variation reflects the common difficulties in exactly defining small or irregular structures such as necrosis, which are often less distinct and more variable across patients. On the other hand, the modality-specific encoding and SE-based attention mechanism led to both high precision (0.89 ± 0.12) and high recall (0.89 ± 0.15) for the ET region, which is better visualized in T1CE. The ED region also showed balanced performance on precision (0.87 ± 0.15) and recall (0.87 ± 0.14), highlighting the network’s ability to capture larger and more diffuse areas of abnormality.
NCR segmentation also exhibited slightly lower recall (0.79 ± 0.27) compared to ED and ET. Nonetheless, the model showed high precision (0.84 ± 0.24), suggesting reliable identification of necrotic tissue when predicted.
To evaluate the effectiveness of our results and ensure a robust comparative analysis, we trained and tested a SegFormer3D model [18]. This state-of-the-art transformer-based architecture provides a good tradeoff between efficacy and model complexity, making it a solid benchmark in the context of brain tumor segmentation. All experiments were carried out under the same experimental conditions to provide consistency, including the same dataset splits, training parameters, 5-fold cross-validation strategy, evaluation metrics, and computational environment. As shown in Table 1, the multi-encoder model consistently outperformed the SegFormer3D baseline across all metrics and classes.
In terms of Dice scores, the best SegFormer3D performance was observed for ET (0.79 ± 0.18), mirroring the trend of the multi-encoder, followed by ED (0.78 ± 0.18), while NCR was confirmed to be the most challenging region, with the lowest Dice score of 0.66 ± 0.29.
The precision and recall metrics reflected this trend as well, with ET achieving relatively high precision and recall of 0.80 ± 0.17 and 0.80 ± 0.19, respectively, indicating stable and consistent predictions. For ED, the model reached a precision of 0.77 ± 0.18 and recall of 0.82 ± 0.18, suggesting a trend of oversegmentation, whereas for NCR the precision (0.68 ± 0.28) and recall (0.74 ± 0.26) were both lower, showing both under- and over-segmentation tendencies.
The statistical analysis found significant differences between the two models across all tumor subregions and all metrics (p < 0.001). To provide a comprehensive evaluation of our model, performance metrics on the tumor core (TC) and whole tumor (WT) composite regions are also reported. In both cases, the improvements on all metrics were significant compared to SegFormer3D.

4.2. Results on External Validation Datasets

The results obtained on the external validation datasets are shown in Table 2. The proposed model exhibited consistent performance across both the UPENN and TCGA cohorts, confirming its generalization ability. Notably, both datasets confirmed the model’s strong performance on ET and WT regions, with ET Dice scores of 0.85 ± 0.11 on UPENN and 0.88 ± 0.12 on TCGA and WT scores of 0.90 ± 0.09 and 0.94 ± 0.10, respectively.
As observed in the internal evaluation, segmentation of the necrotic core (NCR) proved more challenging, with a slightly lower Dice score (0.76 ± 0.19 on UPENN and 0.74 ± 0.22 on TCGA) and reduced recall, likely due to its limited size and lower contrast. Nevertheless, the precision for NCR is relatively high on both datasets, suggesting that necrotic regions were accurately identified when predicted.

5. Discussion

Although the automatic segmentation of brain tumors, particularly gliomas, from MRI scans has been widely investigated in the literature, most studies have not addressed the need for fine-grained segmentation of key tumor subregions such as necrotic areas and peritumoral edemas. Nonetheless, these regions are associated with distinct pathological features that can guide the diagnosis and prognosis of adult-type diffuse gliomas [13].
Our segmentation results demonstrate that the proposed multi-encoder design, which processes modality-specific MRI information and applies channel-wise recalibration, performs particularly well on tumor regions with more distinct imaging characteristics while still handling more complex subregions, such as necrosis, with good accuracy. Our comparison with the benchmark SegFormer3D transformer-based model for medical image segmentation showed that this model struggles to capture finer details, while our attention-enhanced multi-encoder design is particularly effective in segmenting challenging and small-scale tumor components such as NCR and ED.
Figure 4 shows four representative cases sampled from the test set, illustrating a qualitative comparison between the segmentation results produced by Segformer3D and our multi-encoder model along with the corresponding ground-truth masks. Qualitative results confirm that both models are overall capable of identifying tumor subregions, although with visible differences in their precision and consistency. For instance, the lower precision for ED and ET achieved by SegFormer3D and highlighted in quantitative analysis is reflected by an overestimation of these regions.
This can be explained by considering that, despite its transformer-based hierarchical structure and lightweight design, SegFormer3D may not fare well when segmenting fine-grained details due to its reduced preservation of local continuity. In contrast, our architecture features separate encoding paths for each MRI modality, which allows for more effective use of complementary information. This modular design enables the network to learn modality-specific representations prior to feature fusion, improving the accuracy of subregion-level predictions. Additionally, the integration of Squeeze-and-Excitation blocks enhances the model’s ability to focus on salient features, particularly in smaller or less clearly defined regions such as necrotic tissue. The presence of modular attention blocks within each encoder further promotes focused refinement of features prior to integration, leading to more discriminative fused representations. Overall, the statistically significant improvements across all performance metrics and regions (p < 0.001) suggest that our customized architecture, leveraging both CNN- and attention-based characteristics, outperforms state-of-the-art segmentation models while maintaining a moderate parameter count of 11.2 M. Computational profiles of all the tested models are summarized in Table 3.
Despite the increased computational cost in terms of inference time and peak memory usage, the multi-encoder model achieves significantly better performance than SegFormer3D across all tumor subregions. As a result, the model offers a balanced tradeoff between computational demand and segmentation accuracy, making it particularly suitable for applications such as clinical decision support systems and radiomic analysis where precision and reliability are essential.
Considering recent results on the same segmentation regions, the Dice scores achieved by our model exceed those reported by Mejía et al. [38] on BraTS 2018 (0.32 for NCR, 0.62 for ED, and 0.70 for ET) and by Beser-Robles et al. [39] on BraTS 2021 (0.60 for NCR, 0.75 for ED, and 0.79 for ET). Moreover, the performance on ET (Dice score of 0.88 ± 0.15) is in line with that obtained by the top-ranked BraTS 2023 model on the test set (0.85) [32]. These results were further improved when evaluating the performance of our model on the WT and TC regions, yielding high mean Dice scores of 0.91 ± 0.13 on TC and 0.92 ± 0.08 on WT. It is worth noting that in addition to achieving higher overall metrics, our model also showed lower standard deviation values across all tumor subregions, suggesting a higher level of consistency and robustness against inter-patient variability.
Nonetheless, a direct comparative analysis relative to the top-ranking BraTS 2023 methods is not feasible due to dataset limitations. Specifically, our model was trained and tested on the training set portion of the challenge dataset, whereas the official results of the challenge were computed on a separate test set which was not released. Therefore, in order to prove the generalizability and robustness of our approach, we conducted an external validation on two publicly available datasets, UPENN-GBM and TCGA-GBM, while carefully excluding any subjects that were part of the BraTS 2023 training cohort. This allowed us to evaluate the model’s performance on unseen data collected from different institutions and with different acquisition protocols, providing a more realistic assessment of its clinical applicability. The external results support the strength of the proposed architecture, particularly on ET and WT regions, which are clinically relevant for treatment planning and response monitoring. NCR segmentation shows slightly lower Dice scores and higher standard deviation, reflecting the challenges previously raised due to its small size and ill-defined boundaries. Furthermore, the balance between precision and recall observed across all subregions suggests that the model avoids both oversegmentation and undersegmentation, an important property for clinical reliability.
Considering recent results on the same datasets for further comparison, Liu et al. [48] trained their BRAINNET framework on the UPENN-GBM dataset and tested it on a subset of 122 patients, obtaining Dice scores of 0.894 (TC), 0.891 (WT), and 0.812 (ET). Moreover, the authors also reported Dice scores of 0.766 (ET), 0.815 (TC), and 0.884 (WT) obtained with a 3D autoencoder [49], and of 0.845 (ET), 0.878 (TC), and 0.928 (WT) obtained with nnU-Net. Another 3D DL-based approach, presented by Guo et al. [50] and tested on 80 samples from the UPENN-GBM dataset, reached Dice scores of 0.45 (NCR), 0.75 (ED), and 0.74 (ET). Overall, our method either outperforms or matches these results, further underscoring the effectiveness of the proposed architecture compared to state-of-the-art models.
To better assess the model’s clinical relevance across different patients and tumor morphologies, future work will include further validation on additional cohorts, also involving private datasets. Future studies will also incorporate other state-of-the-art models such as nnU-Net or Swin-UNETR into the benchmarking pipeline in order to more comprehensively evaluate the tradeoff between performance and efficiency. Moreover, advanced strategies such as the use of different loss functions will be explored to further improve segmentation performance in ill-defined tumor regions such as the necrotic core.

6. Conclusions

In this paper, we have developed and evaluated a new DL-based paradigm to improve the segmentation of key glioma subregions in multimodal MRI scans, i.e., enhancing tumor (ET), necrosis (NCR), and peritumoral edema (ED). We propose a different input-level integration in a novel multi-encoder architecture enhanced with Squeeze-and-Excitation (SE) blocks used as a channel-wise attention mechanism. By leveraging modality-specific encoding, we obtained competitive results on the BraTS 2023 training dataset as well as on the TCGA-GBM and UPENN-GBM datasets employed for external validation. In particular, our model outperforms a model based on the state-of-the-art Vision Transformer architecture on all considered regions, showing strong efficacy in both standard and fine-grained segmentation of brain tumor regions while maintaining moderate model complexity. The specificity introduced by multiple encoders and the adaptive recalibration of channel-wise feature responses through the explicit modeling of inter-dependencies between channels prove to be a valid approach for our purpose. The improved segmentation results achieved by our model, along with its reduced variability across patients, can support more precise diagnosis, surgical planning, and longitudinal monitoring of brain tumors. Moreover, the model’s robust performance on external data without any domain-specific fine-tuning demonstrates its effectiveness and reliability.

Author Contributions

Conceptualization, A.C., E.S., A.B.; methodology, A.C., E.S., V.B.; software, A.C., E.S., A.B.; validation, A.B., D.B., V.B.; investigation, A.B., A.G., V.B.; writing—original draft preparation, A.C., E.S., V.B.; writing—review and editing, A.G., D.B., A.B.; visualization, D.B., A.G., V.B.; supervision, A.B., D.B., V.B. All authors have read and agreed to the published version of the manuscript.

Funding

The study was funded by the Italian Ministry of University and Research under the National Plan for Complementary Investments to the NRRP, project “D34H—Digital Driven Diagnostics, prognostics and therapeutics for sustainable health care” (project code: PNC0000001), Spoke 2 “Multilayer platform to support the generation of the patients’ digital twin” (CUP: B53C22006170001).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets analyzed in the current study are available at: https://www.synapse.org/Synapse:syn51156910/wiki/621282 (accessed on 20 May 2025).

Acknowledgments

This work was supported by the NRRP project “BRIEF—Biorobotics Research and Innovation Engineering Facilities”, Mission 4: “Istruzione e Ricerca”, Component 2: “Dalla ricerca all’impresa”, Investment 3.1: “Fondo per la realizzazione di un sistema integrato di infrastrutture di ricerca e innovazione”, CUP: J13C22000400007, funded by European Union—NextGenerationEU.

Conflicts of Interest

The authors declare no conflicts of interest; the funders had no role in the design of the study, in the collection, analysis, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

  1. Van den Bent, M.J.; Geurts, M.; French, P.J.; Smits, M.; Capper, D.; Bromberg, J.E.; Chang, S.M. Primary brain tumours in adults. Lancet 2023, 402, 1564–1579. [Google Scholar] [CrossRef] [PubMed]
  2. Osborn, A.; Louis, D.; Poussaint, T.; Linscott, L.; Salzman, K. The 2021 World Health Organization Classification of Tumors of the Central Nervous System: What Neuroradiologists Need to Know. Am. J. Neuroradiol. 2022, 43, 928–937. [Google Scholar] [CrossRef] [PubMed]
  3. Weller, M.; van den Bent, M.; Preusser, M.; Le Rhun, E.; Tonn, J.C.; Minniti, G.; Bendszus, M.; Balana, C.; Chinot, O.; Dirven, L.; et al. EANO guidelines on the diagnosis and treatment of diffuse gliomas of adulthood. Nat. Rev. Clin. Oncol. 2021, 18, 170–186. [Google Scholar] [CrossRef] [PubMed]
  4. Trinh, D.L.; Kim, S.H.; Yang, H.J.; Lee, G.S. The efficacy of shape radiomics and deep features for glioblastoma survival prediction by deep learning. Electronics 2022, 11, 1038. [Google Scholar] [CrossRef]
  5. Wadhwa, A.; Bhardwaj, A.; Verma, V.S. A review on brain tumor segmentation of MRI images. Magn. Reson. Imaging 2019, 61, 247–259. [Google Scholar] [CrossRef] [PubMed]
  6. Daimary, D.; Bora, M.B.; Amitab, K.; Kandar, D. Brain tumor segmentation from MRI images using hybrid convolutional neural networks. Procedia Comput. Sci. 2020, 167, 2419–2428. [Google Scholar] [CrossRef]
  7. Rayed, M.E.; Islam, S.S.; Niha, S.I.; Jim, J.R.; Kabir, M.M.; Mridha, M. Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Inform. Med. Unlocked 2024, 47, 101504. [Google Scholar] [CrossRef]
  8. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  9. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; pp. 424–432. [Google Scholar]
  10. Isensee, F.; Petersen, J.; Klein, A.; Zimmerer, D.; Jaeger, P.F.; Kohl, S.; Wasserthal, J.; Koehler, G.; Norajitra, T.; Wirkert, S.; et al. nnu-net: Self-adapting framework for u-net-based medical image segmentation. arXiv 2018, arXiv:1809.10486. [Google Scholar]
  11. Wang, P.; Yang, Q.; He, Z.; Yuan, Y. Vision transformers in multi-modal brain tumor MRI segmentation: A review. Meta Radiol. 2023, 1, 100004. [Google Scholar] [CrossRef]
  12. Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med Imaging 2014, 34, 1993–2024. [Google Scholar] [CrossRef] [PubMed]
  13. Ma, H.; Zeng, S.; Xie, D.; Zeng, W.; Huang, Y.; Mazu, L.; Zhu, N.; Yang, Z.; Chu, J.; Zhao, J. Looking through the imaging perspective: The importance of imaging necrosis in glioma diagnosis and prognostic prediction–single centre experience. Radiol. Oncol. 2024, 58, 23. [Google Scholar] [CrossRef] [PubMed]
  14. Fang, Z.; Shu, T.; Luo, P.; Shao, Y.; Lin, L.; Tu, Z.; Zhu, X.; Wu, L. The peritumoral edema index and related mechanisms influence the prognosis of GBM patients. Front. Oncol. 2024, 14, 1417208. [Google Scholar] [CrossRef] [PubMed]
  15. Liang, H.K.T.; Mizumoto, M.; Ishikawa, E.; Matsuda, M.; Tanaka, K.; Kohzuki, H.; Numajiri, H.; Oshiro, Y.; Okumura, T.; Matsumura, A.; et al. Peritumoral edema status of glioblastoma identifies patients reaching long-term disease control with specific progression patterns after tumor resection and high-dose proton boost. J. Cancer Res. Clin. Oncol. 2021, 147, 3503–3516. [Google Scholar] [CrossRef] [PubMed]
  16. Zhang, M.; Ye, F.; Su, M.; Cui, M.; Chen, H.; Ma, X. The prognostic role of peritumoral edema in patients with newly diagnosed glioblastoma: A retrospective analysis. J. Clin. Neurosci. 2021, 89, 249–257. [Google Scholar] [CrossRef] [PubMed]
  17. Schoenegger, K.; Oberndorfer, S.; Wuschitz, B.; Struhal, W.; Hainfellner, J.; Prayer, D.; Heinzl, H.; Lahrmann, H.; Marosi, C.; Grisold, W. Peritumoral edema on MRI at initial diagnosis: An independent prognostic factor for glioblastoma? Eur. J. Neurol. 2009, 16, 874–878. [Google Scholar] [CrossRef] [PubMed]
  18. Perera, S.; Navard, P.; Yilmaz, A. Segformer3d: An efficient transformer for 3d medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4981–4988. [Google Scholar]
  19. Kamnitsas, K.; Ferrante, E.; Parisot, S.; Ledig, C.; Nori, A.V.; Criminisi, A.; Rueckert, D.; Glocker, B. DeepMedic for brain tumor segmentation. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Second International Workshop, BrainLes 2016, with the Challenges on BRATS, ISLES and mTOP 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, 17 October 2016; pp. 138–149. [Google Scholar]
  20. Ghaffari, M.; Sowmya, A.; Oliver, R. Automated brain tumor segmentation using multimodal brain scans: A survey based on models submitted to the BraTS 2012–2018 challenges. IEEE Rev. Biomed. Eng. 2019, 13, 156–168. [Google Scholar] [CrossRef] [PubMed]
  21. Sun, L.; Zhang, S.; Chen, H.; Luo, L. Brain tumor segmentation and survival prediction using multimodal MRI scans with deep learning. Front. Neurosci. 2019, 13, 810. [Google Scholar] [CrossRef] [PubMed]
  22. Isensee, F.; Kickingereder, P.; Wick, W.; Bendszus, M.; Maier-Hein, K.H. Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. arXiv 2018, arXiv:1802.10508. [Google Scholar] [CrossRef]
  23. Henry, T.; Carré, A.; Lerousseau, M.; Estienne, T.; Robert, C.; Paragios, N.; Deutsch, E. Brain tumor segmentation with self-ensembled, deeply-supervised 3D U-net neural networks: A BraTS 2020 challenge solution. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 4 October 2020; pp. 327–339. [Google Scholar]
  24. Sachdeva, J.; Sharma, D.; Ahuja, C.K. Multiscale segmentation net for segregating heterogeneous brain tumors: Gliomas on multimodal MR images. Image Vis. Comput. 2024, 149, 105191. [Google Scholar] [CrossRef]
  25. Ahuja, S.; Panigrahi, B.; Gandhi, T.K. Fully automatic brain tumor segmentation using DeepLabv3+ with variable loss functions. In Proceedings of the 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 26–27 August 2021; pp. 522–526. [Google Scholar]
  26. Isensee, F.; Jäger, P.F.; Full, P.M.; Vollmuth, P.; Maier-Hein, K.H. nnU-Net for brain tumor segmentation. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 4 October 2020; pp. 118–132. [Google Scholar]
  27. Luu, H.M.; Park, S.H. Extending nn-UNet for brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual Event, 27 September 2021; pp. 173–186. [Google Scholar]
  28. Zeineldin, R.A.; Karar, M.E.; Coburger, J.; Wirtz, C.R.; Burgert, O. DeepSeg: Deep neural network framework for automatic brain tumor segmentation using magnetic resonance FLAIR images. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 909–920. [Google Scholar] [CrossRef] [PubMed]
  29. Gong, Q.; Chen, Y.; He, X.; Zhuang, Z.; Wang, T.; Huang, H.; Wang, X.; Fu, X. DeepScan: Exploiting deep learning for malicious account detection in location-based social networks. IEEE Commun. Mag. 2018, 56, 21–27. [Google Scholar] [CrossRef]
  30. Warfield, S.K.; Zou, K.H.; Wells, W.M. Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Trans. Med Imaging 2004, 23, 903–921. [Google Scholar] [CrossRef] [PubMed]
  31. Zeineldin, R.A.; Karar, M.E.; Burgert, O.; Mathis-Ullrich, F. Multimodal CNN networks for brain tumor segmentation in MRI: A BraTS 2022 challenge solution. In Proceedings of the International MICCAI Brainlesion Workshop, Singapore, 18 September 2022; pp. 127–137. [Google Scholar]
  32. Ferreira, A.; Solak, N.; Li, J.; Dammann, P.; Kleesiek, J.; Alves, V.; Egger, J. How we won brats 2023 adult glioma challenge? Just faking it! Enhanced synthetic data augmentation and model ensemble for brain tumour segmentation. arXiv 2024, arXiv:2402.17317. [Google Scholar] [CrossRef]
  33. Liang, J.; Yang, C.; Zhong, J.; Ye, X. BTSwin-Unet: 3D U-shaped symmetrical Swin transformer-based network for brain tumor segmentation with self-supervised pre-training. Neural Process. Lett. 2023, 55, 3695–3713. [Google Scholar] [CrossRef]
  34. Chen, Y.; Wang, J. TSEUnet: A 3D neural network with fused Transformer and SE-Attention for brain tumor segmentation. In Proceedings of the 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), Shenzhen, China, 21–23 July 2022; pp. 131–136. [Google Scholar]
  35. Xing, Z.; Yu, L.; Wan, L.; Han, T.; Zhu, L. NestedFormer: Nested modality-aware transformer for brain tumor segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; pp. 140–150. [Google Scholar]
  36. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578. [Google Scholar]
  37. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  38. Mejía, G.; Moreno, D.; Ruiz, D.; Aparicio, N. Hirni: Segmentation of Brain Tumors in Multi-parametric Magnetic Resonance Imaging Scans. In Proceedings of the 2021 IEEE 2nd International Congress of Biomedical Engineering and Bioengineering (CI-IB&BI), Bogotá, Colombia, 13–15 October 2021; pp. 1–4. [Google Scholar]
  39. Beser-Robles, M.; Castellá-Malonda, J.; Martínez-Gironés, P.M.; Galiana-Bordera, A.; Ferrer-Lozano, J.; Ribas-Despuig, G.; Teruel-Coll, R.; Cerdá-Alberich, L.; Martí-Bonmatí, L. Deep learning automatic semantic segmentation of glioblastoma multiforme regions on multimodal magnetic resonance images. Int. J. Comput. Assist. Radiol. Surg. 2024, 19, 1743–1751. [Google Scholar] [CrossRef] [PubMed]
  40. Baid, U.; Ghodasara, S.; Mohan, S.; Bilello, M.; Calabrese, E.; Colak, E.; Farahani, K.; Kalpathy-Cramer, J.; Kitamura, F.; Pati, S.; et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv 2021, arXiv:2107.02314. [Google Scholar]
  41. Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.; Freymann, J.; Farahani, K.; Davatzikos, C. Segmentation Labels for the Pre-Operative Scans of the TCGA-GBM Collection. 2017. [Data Set]. Available online: https://www.cancerimagingarchive.net/analysis-result/brats-tcga-gbm/ (accessed on 9 July 2025).
  42. Bakas, S.; Sako, C.; Akbari, H.; Bilello, M.; Sotiras, A.; Shukla, G.; Rudie, J.; Flores Santamaria, N.; Fathi Kazerooni, A.; Pati, S.; et al. Multi-parametric magnetic resonance imaging (mpMRI) scans for de novo Glioblastoma (GBM) patients from the University of Pennsylvania Health System (UPENN-GBM). Cancer Imaging Arch. 2021, 9, 453. [Google Scholar]
  43. Cardoso, M.J.; Li, W.; Brown, R.; Ma, N.; Kerfoot, E.; Wang, Y.; Murrey, B.; Myronenko, A.; Zhao, C.; Yang, D.; et al. MONAI: An open-source framework for deep learning in healthcare. arXiv 2022, arXiv:2211.02701. [Google Scholar] [CrossRef]
  44. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  45. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  46. Müller, D.; Soto-Rey, I.; Kramer, F. Towards a guideline for evaluation metrics in medical image segmentation. BMC Res. Notes 2022, 15, 210. [Google Scholar] [CrossRef] [PubMed]
  47. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 29. [Google Scholar] [CrossRef] [PubMed]
  48. Liu, H.; Dowdell, B.; Engelder, T.; Pulmano, Z.; Osa, N.; Barman, A. Glioblastoma tumor segmentation using an ensemble of vision transformers. In Proceedings of the Medical Imaging 2025: Computer-Aided Diagnosis, San Diego, CA, USA, 17–20 February 2025; Volume 13407, pp. 487–496. [Google Scholar]
  49. Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. arXiv 2018, arXiv:1810.11654. [Google Scholar] [CrossRef]
  50. Guo, X.; Zhang, B.; Peng, Y.; Chen, F.; Li, W. Segmentation of glioblastomas via 3D FusionNet. Front. Oncol. 2024, 14, 1488616. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Segmentation workflow.
Figure 2. Multi-encoder architecture.
Figure 3. Architectures of the blocks composing the encoder: (a) residual block structure and (b) squeeze-and-excitation block structure.
Figure 4. Visual comparison of segmentation results for some representative cases sampled from the test set.
Table 1. Model performance on the test set, expressed as mean ± std. Best results are in bold.

Subregion | SegFormer3D (Dice / Precision / Recall) | Multi-Encoder (Dice / Precision / Recall)
NCR * | 0.66 ± 0.29 / 0.68 ± 0.28 / 0.74 ± 0.26 | 0.78 ± 0.28 / 0.84 ± 0.24 / 0.79 ± 0.27
ED * | 0.78 ± 0.18 / 0.77 ± 0.18 / 0.82 ± 0.18 | 0.86 ± 0.14 / 0.87 ± 0.15 / 0.87 ± 0.14
ET * | 0.79 ± 0.18 / 0.80 ± 0.17 / 0.80 ± 0.19 | 0.88 ± 0.15 / 0.89 ± 0.12 / 0.89 ± 0.15
TC * | 0.81 ± 0.26 / 0.82 ± 0.29 / 0.87 ± 0.15 | 0.91 ± 0.13 / 0.93 ± 0.13 / 0.90 ± 0.16
WT * | 0.84 ± 0.19 / 0.84 ± 0.22 / 0.90 ± 0.11 | 0.92 ± 0.08 / 0.94 ± 0.07 / 0.92 ± 0.10
* Differences between models are statistically significant for all subregions (p < 0.001, Wilcoxon signed-rank test).
Table 2. Model performance on external validation datasets, expressed as mean ± std.

Subregion | UPENN-GBM (Dice / Precision / Recall) | TCGA-GBM (Dice / Precision / Recall)
NCR | 0.76 ± 0.19 / 0.86 ± 0.16 / 0.72 ± 0.21 | 0.74 ± 0.22 / 0.82 ± 0.22 / 0.73 ± 0.22
ED | 0.83 ± 0.13 / 0.83 ± 0.15 / 0.85 ± 0.12 | 0.86 ± 0.13 / 0.88 ± 0.12 / 0.86 ± 0.15
ET | 0.85 ± 0.11 / 0.83 ± 0.13 / 0.88 ± 0.12 | 0.88 ± 0.12 / 0.88 ± 0.15 / 0.91 ± 0.12
TC | 0.91 ± 0.11 / 0.93 ± 0.11 / 0.89 ± 0.12 | 0.92 ± 0.12 / 0.94 ± 0.13 / 0.91 ± 0.11
WT | 0.90 ± 0.09 / 0.91 ± 0.10 / 0.90 ± 0.08 | 0.94 ± 0.10 / 0.95 ± 0.10 / 0.92 ± 0.11
Table 3. Comparison of models in terms of performance and computational requirements.

Model | Mean Dice ET | Mean Dice NCR | Mean Dice ED | Inference Time (s) | Peak Memory (MB) | Parameters (M)
SegFormer3D | 0.79 | 0.66 | 0.78 | 0.65 | 1939.5 | ∼4.5
Multi-encoder | 0.88 | 0.78 | 0.86 | 1.41 | 5487.6 | ∼11.2