A Novel 3D U-Net–Vision Transformer Hybrid with Multi-Scale Fusion for Precision Multimodal Brain Tumor Segmentation in 3D MRI
Abstract
1. Introduction
- The integration of a 3D U-Net encoder–decoder with a native 3D Vision Transformer, which enables the model to capture fine-grained anatomical details via skip connections while simultaneously modeling long-range spatial dependencies through multi-head self-attention.
- The design of a flexible patch-based pipeline supporting optional overlap and lightweight 3D augmentations, whereby increased sample diversity through overlapping patches and random flips along all axes enhances generalization on limited medical datasets.
- The application of training strategies to improve efficiency and convergence.
- The proposal of a multimodal architecture that exploits the four MRI inputs (T1, T1-CE, T2, and FLAIR) through early stacking of volumes combined with inter-modal attention in the Vision Transformer branch, which allows the network to leverage complementary contrast information for more precise tumor boundary delineation and edema detection.
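The patch-based pipeline with optional overlap and random flips can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the axis length of 240, the overlap of 64 voxels, and the flip probability are all assumptions chosen for illustration; only the patch size of 128 comes from the text.

```python
import random


def patch_starts(length, patch, overlap):
    """Start indices of sliding-window patches along one axis."""
    if not 0 <= overlap < patch <= length:
        raise ValueError("need 0 <= overlap < patch <= length")
    stride = patch - overlap
    starts = list(range(0, length - patch + 1, stride))
    # Shift a final patch back so the end of the axis is always covered.
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def flip_volume(volume, axes):
    """Flip a nested-list volume indexed [z][y][x] along the given axes."""
    if 0 in axes:
        volume = volume[::-1]
    if 1 in axes:
        volume = [plane[::-1] for plane in volume]
    if 2 in axes:
        volume = [[row[::-1] for row in plane] for plane in volume]
    return volume


def random_flip(volume, rng, p=0.5):
    """Flip each spatial axis independently with probability p (assumed value)."""
    axes = [axis for axis in range(3) if rng.random() < p]
    return flip_volume(volume, axes), axes


# Hypothetical axis of 240 voxels, patches of 128 voxels, 64 voxels of overlap.
print(patch_starts(240, 128, 64))
```

A larger overlap produces more patches per volume, which is the source of the added sample diversity; the same flip must be applied to the image patch and to its label mask so that they stay aligned.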
2. Related Work
2.1. U-Net-Based Models
2.2. Vision Transformer-Based Models
| Model | Key Features | Advantages | Limitations |
|---|---|---|---|
| MAU-Net [18] | MDCon, Context Pyramid Module (CPM) | Better multi-scale feature and spatial context; precise segmentation | Increased complexity and computation |
| VGG-19 + Decoder [22] | U-Net with VGG-19 backbone | High accuracy and generalization; efficient on CPU | No GPU accelerator tested |
| Bu-Net [23] | RES block, WC block | Better multi-scale and context capture; improved segmentation | Slightly more complex than standard U-Net |
| U-Net + VGG-16 [26] | VGG-16 backbone | High segmentation accuracy | Slightly more complex than standard U-Net |
| FE-HU-Net [28] | Preprocessing, HU-Net, CNN, post-processing | Very high accuracy; improved boundary refinement | More complex pipeline; multiple variants |
| M-Unet [29] | Multi-encoder, single decoder, HAB, DCB | Multimodal feature fusion; improved multi-scale segmentation | More Dice score compared to some latest models |
| Hybrid Multihead Attentive U-Net [33] | Multihead attention | Improved boundary detection; better focus on informative regions | More complex than standard U-Net |
| TransBTS [37] | 3D CNN encoder, Transformer progressive decoder | Captures local and global features; accurate 3D segmentation | More complex architecture; higher computational cost |
| UNETR [38] | Transformer encoder; U-shaped decoder with skip connections | Captures global context; effective for volumetric segmentation | High computational cost; complex architecture |
| BRAINET [41] | Ensemble of MaskFormer-based ViTs, multi-plane prediction | Accurate segmentation of tumor subregions | High computational cost |
| 3D Brain-Former [42] | FHSA for multi-scale attention fusion; IDFM for deformable feature extraction | Outperforms SOTA across Dice, Sensitivity, PPV, HD95 | Complex design; computationally expensive |
| nnFormer [43] | Local and global self-attention; skip attention instead of standard skip connections | Superior Dice and HD95 | High computational complexity |
| SegFormer3D [44] | Hierarchical multiscale attention; lightweight MLP decoder | Competitive performance | May sacrifice some accuracy compared to heavier models |
| 3D MAT [45] | Axial attention + self-distillation | Compact and suitable for clinical use | Limited evaluation |
| Swin-Unet [46] | Swin Transformer blocks | Strong generalization; outperforms CNN and hybrid models | High computational demand |
| TransUNet [47] | Transformer encoder for global context; CNN decoder for local detail | Outperforms CNN and self-attention models; accurate segmentation | Still computationally heavy |
3. Proposed Model
3.1. Architecture Overview
3.1.1. Step 1: 3D U-Net Component
- Encoder. The input volume has dimensions 128 × 128 × 128 and is initially fed into the encoder. The encoder systematically reduces spatial resolution using a series of specialized blocks while preserving spatial features. Each block contains two 3D convolutional layers. To achieve stable and robust learning, batch normalization and a ReLU activation function are applied, together with a dropout rate of 0.1 to prevent overfitting. The number of filters doubles at each level, increasing from 64 to 128, then to 256, and finally to 512. After each level, a 3D max-pooling operation with stride 2 reduces spatial dimensions while retaining semantic information (downsampling).
- Bottleneck. The bottleneck, situated between the encoder and decoder, uses 1024 filters to capture the most abstract and high-level features from the input.
- Decoder. The decoder mirrors the encoder. Each upsampling stage employs a 3D transposed convolution layer with stride 2, followed by two 3D convolutional layers with batch normalization and ReLU activations. After each upsampling operation, the decoder concatenates detailed features from the corresponding encoder stage via skip connections.
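The encoder, bottleneck, and decoder described above can be checked with a small shape trace. This is a sketch under stated assumptions rather than the authors' code: the function name is hypothetical, and the convolutions are assumed to use "same" padding so that only pooling and transposed convolution change the spatial size.

```python
def unet3d_shapes(size=128, filters=(64, 128, 256, 512), bottleneck=1024):
    """Trace (depth, height, width, channels) through a 3D U-Net."""
    if size % (2 ** len(filters)) != 0:
        raise ValueError("size must be divisible by 2 ** number of levels")

    encoder, skips = [], []
    side = size
    for f in filters:
        # Two 3x3x3 convolutions (assumed 'same' padding) keep the resolution
        # and set the channel count; this feature map feeds the skip connection.
        skips.append((side, side, side, f))
        # Max-pooling with stride 2 halves every spatial dimension.
        side //= 2
        encoder.append((side, side, side, f))

    bottleneck_shape = (side, side, side, bottleneck)

    decoder = []
    for f, skip in zip(reversed(filters), reversed(skips)):
        # Transposed convolution with stride 2 doubles the resolution.
        side *= 2
        # The skip features must match spatially before concatenation.
        assert skip == (side, side, side, f)
        decoder.append((side, side, side, f))

    return encoder, bottleneck_shape, decoder


encoder, bottleneck_shape, decoder = unet3d_shapes()
for name, shape in zip(("enc1", "enc2", "enc3", "enc4"), encoder):
    print(name, shape)
print("bottleneck", bottleneck_shape)
```

With the default arguments the trace goes from 128 × 128 × 128 down to 8 × 8 × 8 at the bottleneck and back up to 128 × 128 × 128, which is why each decoder stage can concatenate the encoder features of the same resolution.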
3.1.2. Step 2: 3D ViT Component
3.1.3. Step 3: Feature Fusion and Segmentation Output
4. Experiments
4.1. Implementation Details
4.2. Dataset Description
4.3. Data Preparation
4.4. Evaluation Metrics
4.5. Loss Function
- $C$ refers to the total number of classes, $y_i$ refers to the true label for class $i$, and $\hat{y}_i$ refers to the predicted probability for class $i$.
5. Results
5.1. Training Analysis
5.2. Performance Evaluation
5.3. Qualitative Visualization
5.4. Comparison with State-of-the-Art
5.5. Impact of the Multimodalities
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88.
- Shen, D.; Wu, G.; Suk, H.-I. Deep Learning in Medical Image Analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248.
- Rayed, M.E.; Islam, S.S.; Niha, S.I.; Jim, J.R.; Kabir, M.M.; Mridha, M. Deep learning for medical image segmentation: State-of-the-art advancements and challenges. Inform. Med. Unlocked 2024, 47, 101504.
- Sun, W.; Song, C.; Tang, C.; Pan, C.; Xue, P.; Fan, J.; Qiao, Y. Performance of deep learning algorithms to distinguish high-grade glioma from low-grade glioma: A systematic review and meta-analysis. iScience 2023, 26, 106815.
- Hamdaoui, F.; Sakly, A. Automatic diagnostic system for segmentation of 3D/2D brain MRI images based on a hardware architecture. Microprocess. Microsyst. 2023, 98, 104814.
- Ghribi, F.; Hamdaoui, F. Innovative Deep Learning Architectures for Medical Image Diagnosis: A Comprehensive Review of Convolutional, Recurrent, and Transformer Models. Vis. Comput. 2025, 1–26.
- Xiao, H.; Li, L.; Liu, Q.; Zhu, X.; Zhang, Q. Transformers in medical image segmentation: A review. Biomed. Signal Process. Control 2023, 84, 104791.
- El-Taraboulsi, J.; Cabrera, C.P.; Roney, C.; Aung, N. Deep neural network architectures for cardiac image segmentation. Artif. Intell. Life Sci. 2023, 4, 100083.
- Anand, V.; Gupta, S.; Koundal, D.; Nayak, S.R.; Barsocchi, P.; Bhoi, A.K. Modified U-NET Architecture for Segmentation of Skin Lesion. Sensors 2022, 22, 867.
- Dong, X.; Lei, Y.; Wang, T.; Thomas, M.; Tang, L.; Curran, W.J.; Liu, T.; Yang, X. Automatic multiorgan segmentation in thorax CT images using U-net-GAN. Med. Phys. 2019, 46, 2157–2168.
- Azad, R.; Kazerouni, A.; Heidari, M.; Aghdam, E.K.; Molaei, A.; Jia, Y.; Jose, A.; Roy, R.; Merhof, D. Advances in medical image analysis with vision Transformers: A comprehensive review. Med. Image Anal. 2024, 91, 103000.
- Jiang, Y.; Dong, J.; Cheng, T.; Zhang, Y.; Lin, X.; Liang, J. iU-Net: A hybrid structured network with a novel feature fusion approach for medical image segmentation. BioData Min. 2023, 16, 5.
- Bayoudh, K.; Hamdaoui, F.; Mtibaa, A. An Attention-based Hybrid 2D/3D CNN-LSTM for Human Action Recognition. In Proceedings of the 2022 2nd International Conference on Computing and Information Technology, ICCIT, Tabuk, Saudi Arabia, 25–27 January 2022; pp. 97–103.
- Zhang, J.; Li, F.; Zhang, X.; Wang, H.; Hei, X. Automatic Medical Image Segmentation with Vision Transformer. Appl. Sci. 2024, 14, 2741.
- Al-hammuri, K.; Gebali, F.; Kanan, A.; Chelvan, I.E.T. Vision transformer architecture and applications in digital health: A tutorial and survey. Vis. Comput. Ind. Biomed. Art 2023, 6, 14.
- Azad, R.; Heidari, M.; Wu, Y.; Merhof, D. Contextual Attention Network: Transformer Meets U-Net. In Proceedings of the 13th International Workshop, MLMI 2022, Singapore, 18 September 2022.
- Chen, B.; He, T.; Wang, W.; Han, Y.; Zhang, J.; Bobek, S.; Zabukovsek, S.S. MRI Brain Tumour Segmentation Using Multiscale Attention U-Net. Informatica 2024, 35, 751–774.
- Tan, M.; Le, Q.V. MixConv: Mixed Depthwise Convolutional Kernels. arXiv 2019, arXiv:1907.09595.
- Feng, S.; Zhao, H.; Shi, F.; Cheng, X.; Wang, M.; Ma, Y.; Xiang, D.; Zhu, W.; Chen, X. CPFNet: Context Pyramid Fusion Network for Medical Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 3008–3018.
- Bousselham, W.; Thibault, G.; Pagano, L.; Machireddy, A.; Gray, J.; Chang, Y.H.; Song, X. Efficient Self-Ensemble for Semantic Segmentation. arXiv 2022, arXiv:2111.13280.
- Aboussaleh, I.; Riffi, J.; El Fazazy, K.; Mahraz, M.A.; Tairi, H. Efficient U-Net Architecture with Multiple Encoders and Attention Mechanism Decoders for Brain Tumor Segmentation. Diagnostics 2023, 13, 872.
- Rehman, M.U.; Cho, S.; Kim, J.H.; Chong, K.T. BU-Net: Brain Tumor Segmentation Using Modified U-Net Architecture. Electronics 2020, 9, 2203.
- Bakas, S.; Reyes, M.; Jakab, A.; Bauer, S.; Rempfler, M. Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge. arXiv 2018, arXiv:1811.02629.
- Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2014, 34, 1993–2024.
- Ghosh, S.; Chaki, A.; Santosh, K. Improved U-Net architecture with VGG-16 for brain tumor. Phys. Eng. Sci. Med. 2021, 44, 703–712.
- Mazurowski, M.A.; Clark, K.; Czarnek, N.M.; Shamsesfandabadi, P.; Peters, K.B.; Saha, A. Radiogenomics of lower-grade glioma: Algorithmically-assessed tumor shape is associated with tumor genomic subtypes and patient outcomes in a multi-institutional study with The Cancer Genome Atlas data. J. Neurooncol. 2017, 133, 27–35.
- Nizamani, A.H.; Chen, Z.; Nizamani, A.A.; Bhatti, U.A. Advance brain tumor segmentation using feature fusion methods with deep U-Net model with CNN for MRI data. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101793.
- Zhao, L.; Ma, J.; Shao, Y.; Jia, C.; Zhao, J.; Yuan, H. MM-UNet: A multimodality brain tumor segmentation network in MRI images. Front. Oncol. 2022, 12, 950706.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
- Kaur, A.; Singh, Y.; Chinagundi, B. ResUNet++: A comprehensive improved UNet++ framework for volumetric semantic segmentation of brain tumor MR image. Evol. Syst. 2024, 15, 1567–1585.
- Butt, M.A.; Jabbar, A.U. Hybrid Multihead Attentive Unet-3D for Brain Tumor Segmentation. arXiv 2024, arXiv:2405.13304.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Wang, W.; Chen, C.; Ding, M.; Yu, H.; Zha, S.; Li, J. TransBTS: Multimodal Brain Tumor Segmentation Using Transformer. In Proceedings of the Medical Image Computing and Computer Assisted Intervention, MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021.
- Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584.
- Landman, B.; Xu, Z.; Iglesias, J.; Styner, M.; Langerak, T.; Klein, A. Multi-Atlas Labeling Beyond the Cranial Vault: Workshop and Challenge. In Proceedings of the MICCAI, Munich, Germany, 5–9 October 2015.
- Simpson, A.L.; Antonelli, M.; Bakas, S.; Bilello, M.; Farahani, K.; van Ginneken, B.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B.; et al. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv 2019, arXiv:1902.09063.
- Liu, H.; Dowdell, B.; Engelder, T.; Pulmano, Z.; Osa, N.; Barman, A. Glioblastoma Tumor Segmentation using an Ensemble of Vision Transformers. In Proceedings of the MICAD 2023, Cambridge, UK, 9–10 December 2023.
- Bakas, S.; Sako, C.; Akbari, H.; Bilello, M.; Sotiras, A.; Shukla, G.; Rudie, J.D.; Santamaría, N.F.; Kazerooni, A.F.; Pati, S.; et al. The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: Advanced MRI, clinical, genomics, & radiomics. Sci. Data 2022, 9, 453.
- Nian, R.; Zhang, G.; Sui, Y.; Qian, Y.; Li, Q.; Zhao, M.; Li, J.; Gholipour, A.; Warfield, S.K. 3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation. arXiv 2023, arXiv:2304.14508.
- Zhou, H.-Y.; Guo, J.; Zhang, Y.; Han, X.; Wang, L.; Yu, Y. nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer. IEEE Trans. Image Process. 2023, 32, 4036–4045.
- Perera, S.; Navard, P.; Yilmaz, A. SegFormer3D: An Efficient Transformer for 3D Medical Image Segmentation. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024.
- Liu, C.; Kiryu, H. 3D Medical Axial Transformer: A Lightweight Transformer Model for 3D Brain Tumor Segmentation. In Proceedings of the Medical Imaging with Deep Learning 2023, Nashville, TN, USA, 10–12 July 2023; Volume 227, pp. 799–813.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021, arXiv:2105.05537v1.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.

| Stage | Encoder | Encoder Output Size | Decoder | Decoder Output Size |
|---|---|---|---|---|
| 0 | Input | 128 × 128 × 128 × 4 | Conv3D [output layer] [1 × 1 × 1] | 128 × 128 × 128 × 4 |
| 1 | (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2, MaxPooling3D [2 × 2 × 2] | 64 × 64 × 64 × 64 | Conv3DTranspose [64], (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2 | 128 × 128 × 128 × 64 |
| 2 | (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2, MaxPooling3D [2 × 2 × 2] | 32 × 32 × 32 × 128 | Conv3DTranspose [128], (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2 | 64 × 64 × 64 × 128 |
| 3 | (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2, MaxPooling3D [2 × 2 × 2] | 16 × 16 × 16 × 256 | Conv3DTranspose [256], (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2 | 32 × 32 × 32 × 256 |
| 4 | (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2, MaxPooling3D [2 × 2 × 2] | 8 × 8 × 8 × 512 | Conv3DTranspose [512], (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2 | 16 × 16 × 16 × 512 |
| 5 | (Conv3D [3 × 3 × 3, BatchNorm, ReLU] Dropout [dropout_rate = 0.1]) × 2 | 8 × 8 × 8 × 1024 | - | - |

| Stage | Operation | Output Size |
|---|---|---|
| 1 | Input (4 channels: T1, T1ce, T2, FLAIR) | 128 × 128 × 128 × 4 |
| | Patch splitting and embedding: Conv3D [filters = 128, kernel = 8, strides = 8] | 16 × 16 × 16 × 128 |
| | Flatten to tokens: reshape (batch, 4096, 128) | 4096 tokens of size 128 |
| 2 | Add positional embedding: learnable pos_emb of shape (1, 4096, 128) added to each token | (batch, 4096, 128) |
| 3 | Transformer encoder block (× 4): LayerNormalization → MultiHeadAttention → Dropout [dropout_rate = 0.1] → residual connection → LayerNormalization → FFN [Dense 512 → Dropout → Dense 128 → Dropout] → residual connection | (batch, 4096, 128) |
| 4 | Reconstruct 3D volume: reshape tokens back to (batch, 16, 16, 16, 128) | 16 × 16 × 16 × 128 |
| 5 | Segmentation head: Conv3D (filters = 4, kernel_size = 1, activation = 'softmax') | 16 × 16 × 16 × 4 |
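
The token count in the ViT branch follows directly from the patch size, and a short calculation makes the bookkeeping explicit. The sketch below is illustrative only: the function and key names are hypothetical, the parameter count assumes a strided convolution with a bias term, and the attention-matrix size is counted per head and per sample.

```python
def vit3d_token_shapes(size=128, patch=8, embed_dim=128, channels=4):
    """Shape bookkeeping for a 3D patch embedding done by a strided convolution."""
    if size % patch != 0:
        raise ValueError("size must be divisible by patch")
    grid = size // patch      # patches per axis
    tokens = grid ** 3        # sequence length seen by the Transformer
    return {
        "grid": (grid, grid, grid, embed_dim),
        "tokens": (tokens, embed_dim),
        "pos_emb": (1, tokens, embed_dim),
        # Kernel weights plus one bias per filter (bias is an assumption).
        "patch_embed_params": patch ** 3 * channels * embed_dim + embed_dim,
        # Self-attention compares every token with every other token.
        "attention_entries_per_head": tokens * tokens,
    }


shapes = vit3d_token_shapes()
print(shapes["grid"], shapes["tokens"])
```

Because the attention matrix grows with the square of the token count, halving the patch size to 4 would multiply the sequence length by 8 and the attention matrix by 64, which is one reason a relatively coarse patch of 8 voxels per axis keeps the branch tractable.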

| Actual (rows) / Predicted (columns) | Background (0) | ET (1) | TC (2) | WT (3) |
|---|---|---|---|---|
| Background (0) | 7,776,535 | 35 | 15,250 | 1176 |
| ET (1) | 1458 | 7494 | 1361 | 3033 |
| TC (2) | 8014 | 91 | 38,175 | 793 |
| WT (3) | 1027 | 1309 | 968 | 7601 |

| Metric | Value |
|---|---|
| Global Accuracy | 99.56% |
| Average DSC | 77.43% |
| STD DSC | 0.0727 |
| Average IoU | 71.69% |
| STD IoU | 0.0757 |
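
The global accuracy can be recomputed from the confusion matrix as the sum of the diagonal divided by the total voxel count. The sketch below does this and also derives per-class Dice and IoU from the same counts; the helper names are hypothetical, and the per-class values obtained from this single matrix are not expected to match the reported averages, which may have been computed with a different aggregation.

```python
# Voxel counts copied from the confusion matrix; rows and columns follow the
# class order Background, ET, TC, WT.
CONFUSION = [
    [7_776_535, 35, 15_250, 1_176],
    [1_458, 7_494, 1_361, 3_033],
    [8_014, 91, 38_175, 793],
    [1_027, 1_309, 968, 7_601],
]
LABELS = ["Background", "ET", "TC", "WT"]


def global_accuracy(cm):
    """Fraction of voxels on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_dice_iou(cm):
    """Dice and IoU per class from true positives and the two error counts."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        row_errors = sum(cm[k]) - tp                        # rest of row k
        col_errors = sum(cm[r][k] for r in range(n)) - tp   # rest of column k
        dice = 2 * tp / (2 * tp + row_errors + col_errors)
        iou = tp / (tp + row_errors + col_errors)
        scores.append((dice, iou))
    return scores


print(f"global accuracy: {100 * global_accuracy(CONFUSION):.2f}%")
for label, (dice, iou) in zip(LABELS, per_class_dice_iou(CONFUSION)):
    print(f"{label}: Dice={dice:.4f} IoU={iou:.4f}")
```

Both metrics are symmetric in the two error counts, so the result does not depend on whether rows are read as actual or predicted labels. The two are also tied by IoU = Dice / (2 − Dice), which is why IoU is always the lower of the pair.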

| Methods | DSC ET (%) | DSC TC (%) | DSC WT (%) | DSC Average (%) | IoU ET (%) | IoU TC (%) | IoU WT (%) | IoU Average (%) |
|---|---|---|---|---|---|---|---|---|
| U-Net [26] | 60.0 | 62.0 | 80.0 | 67.33 | 42.0 | 52.0 | 67.0 | 53.66 |
| VGG-19 + Decoder [22] | 81.32 | 75.13 | 61.66 | 72.70 | 44.57 | 60.17 | 68.53 | 57.75 |
| Swin-Unet [46] | 65.1 | 66.1 | 77.0 | 69.4 | 46.4 | 51.9 | 65.2 | 54.5 |
| UNETR [38] | 58.5 | 76.1 | 78.9 | 71.16 | 41.33 | 61.42 | 65.13 | 55.96 |
| M-Unet [29] | 66.00 | 72.00 | 87.00 | 75.00 | 49.25 | 56.25 | 76.99 | 60.83 |
| TransBTS [37] | 57.40 | 73.50 | 77.90 | 69.6 | 40.25 | 58.10 | 63.80 | 54.04 |
| TransUNet [47] | 54.20 | 68.40 | 70.60 | 64.4 | 37.17 | 52.01 | 54.55 | 47.91 |
| Our Method | 66.97 | 80.97 | 84.35 | 77.43 | 60.93 | 74.45 | 79.69 | 71.69 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ghribi, F.; Hamdaoui, F. A Novel 3D U-Net–Vision Transformer Hybrid with Multi-Scale Fusion for Precision Multimodal Brain Tumor Segmentation in 3D MRI. Electronics 2025, 14, 3604. https://doi.org/10.3390/electronics14183604