Bidirectional T1–T2 Brain MRI Synthesis Using a Fusion U-Net Transformer for Real-World Clinical Data
Abstract
1. Introduction
2. Related Work
2.1. Traditional Methods
2.2. Deep Learning-Based MRI Synthesis
2.3. Generative Adversarial Networks for MRI Synthesis
2.4. Vision Transformers and Hybrid Architectures for Medical Image Synthesis
3. Materials and Methods
3.1. Problem Definition and Overview
- Convolutional inductive biases for local texture modeling.
- Axial attention for efficient long-range dependency capture.
- Transformer-based global context reasoning at the bottleneck.
- Adversarial training with LSGAN for perceptual realism.
- Feature fusion refinement to mitigate skip-connection artifacts.
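The LSGAN objective named in the list above replaces the sigmoid cross-entropy of the original GAN formulation with squared errors on the discriminator scores. The following is a minimal sketch, not the authors' implementation. It assumes the common 0-1 target coding from Mao et al. (real target 1, fake target 0, generator target 1) and a discriminator that emits one score per image patch, as the conditional PatchGAN of Section 3.3 does. The helper names are hypothetical, and any loss weighting the authors apply is not reproduced here.

```python
# Minimal LSGAN loss sketch on a PatchGAN-style score map (pure Python).
# Assumption: 0-1 coding (real -> 1, fake -> 0, generator aims for 1);
# helper names are hypothetical, not taken from the paper's code.
from typing import List

PatchMap = List[List[float]]  # one discriminator score per image patch


def _mean_sq_err(scores: PatchMap, target: float) -> float:
    """Mean of (score - target)^2 over every patch in the map."""
    values = [s for row in scores for s in row]
    return sum((s - target) ** 2 for s in values) / len(values)


def lsgan_discriminator_loss(real_scores: PatchMap, fake_scores: PatchMap) -> float:
    # Push scores for (source, real target) pairs toward 1 and
    # scores for (source, synthesized target) pairs toward 0.
    return 0.5 * (_mean_sq_err(real_scores, 1.0) + _mean_sq_err(fake_scores, 0.0))


def lsgan_generator_loss(fake_scores: PatchMap) -> float:
    # The generator is rewarded when its outputs are scored as real (1).
    return 0.5 * _mean_sq_err(fake_scores, 1.0)


if __name__ == "__main__":
    real = [[0.9, 1.1], [1.0, 0.8]]
    fake = [[0.2, 0.0], [0.1, 0.3]]
    print(lsgan_discriminator_loss(real, fake))
    print(lsgan_generator_loss(fake))
```

Because the penalty grows quadratically with distance from the target, samples that are already on the "real" side of the decision boundary but far from it still produce gradients, which is the usual argument for LSGAN's more stable training.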
3.2. Generator Architecture: Fusion U-Net Transformer
3.2.1. Encoder Path with Hierarchical Feature Extraction
3.2.2. Axial Attention
3.2.3. Transformer Bottleneck
3.2.4. Decoder Path with Fusion Refinement Mechanism
3.2.5. Output Layer
3.3. Discriminator Architecture: Conditional PatchGAN
3.4. Loss Functions and Training Strategies
3.4.1. Adversarial Learning with LSGAN Objective
3.4.2. L1 Loss for Pixel-Level Accuracy
3.4.3. SSIM Loss for Structural Similarity
3.4.4. Hybrid Reconstruction Loss Function
4. Clinical Dataset and Preprocessing
4.1. Data Acquisition and Dataset Characteristics
4.2. Preprocessing Pipeline
5. Experiments
5.1. Experimental Setup
5.2. Training Details
5.3. Evaluation Metrics
5.4. Model Complexity Analysis
6. Results
6.1. Quantitative Results
6.2. Qualitative Results
6.3. Expert Radiological Evaluation
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CNN | Convolutional Neural Network |
| GAN | Generative Adversarial Network |
| LSGAN | Least Squares GAN |
| MRI | Magnetic Resonance Imaging |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity Index Measure |




| Layer | Operation (k = kernel size, s = stride) | Output Shape (B, C, H, W) |
|---|---|---|
| 1 | Conv (2 → 64, k = 4, s = 2) + LeakyReLU (0.2) | (B, 64, 128, 128) |
| 2 | Conv (64 → 128, k = 4, s = 2) + InstanceNorm + LeakyReLU | (B, 128, 64, 64) |
| 3 | Conv (128 → 256, k = 4, s = 2) + InstanceNorm + LeakyReLU | (B, 256, 32, 32) |
| 4 | Conv (256 → 512, k = 4, s = 1) + InstanceNorm + LeakyReLU | (B, 512, 31, 31) |
| 5 | Conv (512 → 1, k = 4, s = 1) | (B, 1, 30, 30) |
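
The layer table can be checked with the standard convolution output-size formula. The following is a minimal sketch under two assumptions: a 256 × 256 input, which the 128 × 128 output of layer 1 implies, and padding = 1 on every convolution, which the listed shapes imply but the table does not state. It also walks the layers backwards to obtain the receptive field of a single output score.

```python
# Recompute PatchGAN output sizes and receptive field from the layer table.
# Assumption: padding = 1 for every convolution (implied by the listed shapes).
from typing import List, Tuple

# (kernel, stride) for layers 1-5, as listed in the table
LAYERS: List[Tuple[int, int]] = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
PADDING = 1  # assumed, not stated in the table


def conv_out(size: int, kernel: int, stride: int, padding: int = PADDING) -> int:
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def output_sizes(input_size: int) -> List[int]:
    """Spatial size after each layer for a square input."""
    sizes = []
    size = input_size
    for kernel, stride in LAYERS:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes


def receptive_field() -> int:
    """Input extent (pixels) seen by one output score, walking layers backwards."""
    rf = 1
    for kernel, stride in reversed(LAYERS):
        rf = (rf - 1) * stride + kernel
    return rf


if __name__ == "__main__":
    print(output_sizes(256))  # [128, 64, 32, 31, 30]
    print(receptive_field())  # 70
```

Under these assumptions, each of the 30 × 30 output scores judges a 70 × 70 region of the concatenated source/target input, the same patch size as the classic pix2pix PatchGAN.
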

| Model | Direction | SSIM | PSNR (dB) |
|---|---|---|---|
| Simple U-Net | T1 → T2 | 0.5775 ± 0.010 | 20.53 ± 0.14 |
| | T2 → T1 | 0.6323 ± 0.005 | 23.71 ± 0.21 |
| CycleGAN | T1 → T2 | 0.6425 ± 0.005 | 17.77 ± 0.02 |
| | T2 → T1 | 0.6541 ± 0.002 | 17.83 ± 0.26 |
| Pix2pix | T1 → T2 | 0.6895 ± 0.075 | 19.36 ± 0.11 |
| | T2 → T1 | 0.7415 ± 0.018 | 22.10 ± 0.29 |
| ResViT | T1 → T2 | 0.7200 ± 0.021 | 20.14 ± 0.30 |
| | T2 → T1 | 0.7670 ± 0.004 | 22.50 ± 0.18 |
| Our Model | T1 → T2 | 0.7870 ± 0.015 | 21.06 ± 0.13 |
| | T2 → T1 | 0.8300 ± 0.005 | 24.03 ± 0.07 |

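
The PSNR column is reported in decibels. The following is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX_I² / MSE), assuming images are given as flat lists of intensities already normalized to [0, 1] so that MAX_I = 1.0. The paper's exact intensity normalization is not given in this section, and the function name is hypothetical.

```python
# Standard PSNR sketch. Assumption: intensities normalized to [0, 1] (MAX_I = 1.0).
import math
from typing import Sequence


def psnr(reference: Sequence[float], synthesized: Sequence[float], max_i: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX_I^2 / MSE)."""
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("images must be non-empty and have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_i ** 2 / mse)


if __name__ == "__main__":
    # MSE = 0.01 with MAX_I = 1 gives 10 * log10(100) = 20 dB
    print(psnr([0.5, 0.5], [0.6, 0.4]))
```

Because PSNR is logarithmic, the 3.29 dB T1 → T2 gap between Our Model (21.06 dB) and CycleGAN (17.77 dB) corresponds to roughly a 2.1-fold lower mean squared error, provided both were computed with the same MAX_I.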

| Ablation Variant | Direction | SSIM | PSNR (dB) |
|---|---|---|---|
| w/o SSIM loss | T1 → T2 | 0.7578 ± 0.0017 | 20.82 ± 0.11 |
| | T2 → T1 | 0.8100 ± 0.0030 | 23.98 ± 0.15 |
| w/o GAN loss | T1 → T2 | 0.7674 ± 0.0011 | 21.06 ± 0.06 |
| | T2 → T1 | 0.8254 ± 0.0019 | 24.05 ± 0.02 |
| w/o Axial Attention | T1 → T2 | 0.7885 ± 0.0190 | 20.87 ± 0.01 |
| | T2 → T1 | 0.8320 ± 0.0060 | 24.11 ± 0.05 |
| w/o Transformer | T1 → T2 | 0.7601 ± 0.0004 | 20.59 ± 0.10 |
| | T2 → T1 | 0.8148 ± 0.0035 | 23.73 ± 0.08 |
| w/o Fusion Refine Block | T1 → T2 | 0.7740 ± 0.0170 | 20.74 ± 0.08 |
| | T2 → T1 | 0.8212 ± 0.0014 | 23.93 ± 0.11 |

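
Ablation rows are easiest to read as changes relative to the full model ("Our Model" in the comparison table). The following is a minimal sketch that copies the mean SSIM and PSNR values from the two tables and reports each variant minus the full model. Standard deviations are ignored, so small deltas may fall within run-to-run variation. A positive delta means the variant scored higher than the full model on that mean. The dictionary keys and function name are hypothetical.

```python
# Ablation deltas relative to the full model, using mean values from the tables.
# Standard deviations are ignored; names below are illustrative only.
from typing import Dict, Tuple

Scores = Tuple[float, float]  # (SSIM, PSNR in dB)

FULL: Dict[str, Scores] = {
    "T1->T2": (0.7870, 21.06),  # "Our Model"
    "T2->T1": (0.8300, 24.03),
}

ABLATIONS: Dict[str, Dict[str, Scores]] = {
    "w/o SSIM loss": {"T1->T2": (0.7578, 20.82), "T2->T1": (0.8100, 23.98)},
    "w/o GAN loss": {"T1->T2": (0.7674, 21.06), "T2->T1": (0.8254, 24.05)},
    "w/o Axial Attention": {"T1->T2": (0.7885, 20.87), "T2->T1": (0.8320, 24.11)},
    "w/o Transformer": {"T1->T2": (0.7601, 20.59), "T2->T1": (0.8148, 23.73)},
    "w/o Fusion Refine Block": {"T1->T2": (0.7740, 20.74), "T2->T1": (0.8212, 23.93)},
}


def deltas(variant: str, direction: str) -> Scores:
    """(delta SSIM, delta PSNR in dB) of an ablated variant minus the full model."""
    ssim, psnr_db = ABLATIONS[variant][direction]
    full_ssim, full_psnr = FULL[direction]
    return round(ssim - full_ssim, 4), round(psnr_db - full_psnr, 2)


if __name__ == "__main__":
    for name in ABLATIONS:
        for direction in FULL:
            d_ssim, d_psnr = deltas(name, direction)
            print(f"{name:25s} {direction}: dSSIM={d_ssim:+.4f}  dPSNR={d_psnr:+.2f} dB")
```
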

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Cantemir, Z.; Karacan, H.; Cindil, E.; Kalafat, B. Bidirectional T1–T2 Brain MRI Synthesis Using a Fusion U-Net Transformer for Real-World Clinical Data. Appl. Sci. 2026, 16, 3674. https://doi.org/10.3390/app16083674

