3D TractFormer: 3D Direct Volumetric White Matter Tract Segmentation with Hybrid Channel-Wise Transformer
Abstract
1. Introduction
- We propose to deeply interleave convolutions and transformer blocks into a U-shaped network. This effectively integrates the respective strengths of both to extract spatial contextual features and global long-distance dependencies, thereby enhancing feature extraction for direct volumetric tract segmentation.
- We propose a novel memory- and computation-efficient channel-wise transformer that integrates depth-wise separable convolution with channel-wise attention computed on compressed contextual features. This design addresses the memory and computational challenges of 4D computation, facilitates the modeling of global dependencies among contextual features, and ensures that each hierarchical layer of the 3D TractFormer focuses on complementary features for tract segmentation.
- We propose to train a fully symmetric network with progressively sized volumetric patches, which addresses the scarcity of 3D training samples and further reduces memory and computational costs.
- We show experimentally that the proposed 3D TractFormer outperforms existing methods that have demonstrated state-of-the-art performance for white matter tract segmentation.
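The channel-wise attention summarized in the contributions above can be illustrated with a minimal numpy sketch. Here attention is computed between channels rather than between voxels, so the attention map is C×C instead of N×N (with N = D·H·W flattened voxels), which is what keeps 4D computation tractable. The projection weights `wq`, `wk`, `wv` and the (C, N) flattening are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_wise_attention(x, wq, wk, wv):
    """Transposed (channel-wise) self-attention over a flattened 3D feature map.

    x : (C, N) features, where N = D*H*W flattened voxels.
    wq, wk, wv : (C, C) pointwise projection weights (hypothetical).
    The attention map is (C, C), so cost scales with C**2, not N**2.
    """
    q, k, v = wq @ x, wk @ x, wv @ x
    # L2-normalise along the voxel axis so the CxC similarities stay bounded.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-6)
    attn = softmax(q @ k.T, axis=-1)   # (C, C) channel-affinity map
    return attn @ v                    # (C, N) re-weighted features

C, D, H, W = 8, 4, 4, 4
x = np.random.randn(C, D * H * W)
wq, wk, wv = (np.random.randn(C, C) * 0.1 for _ in range(3))
y = channel_wise_attention(x, wq, wk, wv)
print(y.shape)  # (8, 64)
```

In a real network the projections would be learned 1×1×1 (pointwise) convolutions, and the depth-wise separable convolutions mentioned above would supply local spatial context before this channel mixing.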
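The progressive patch-size training mentioned above can be sketched as a simple sampling schedule: many small patches first, then fewer larger ones, so a small set of 3D volumes yields many training samples while peak memory stays bounded by the current patch size. The `sizes` and `per_stage` values here are hypothetical hyperparameters, not the paper's settings.

```python
import numpy as np

def progressive_patches(volume, sizes, per_stage, seed=0):
    """Yield (size, patch) pairs with gradually increasing cubic patch size."""
    rng = np.random.default_rng(seed)
    for s in sizes:
        for _ in range(per_stage):
            # Random corner so the s**3 patch fits inside the volume.
            lo = [int(rng.integers(0, d - s + 1)) for d in volume.shape]
            yield s, volume[lo[0]:lo[0] + s, lo[1]:lo[1] + s, lo[2]:lo[2] + s]

vol = np.random.default_rng(1).standard_normal((64, 64, 64)).astype(np.float32)
shapes = [p.shape for _, p in progressive_patches(vol, sizes=(16, 32, 48), per_stage=2)]
print(shapes[0], shapes[-1])  # (16, 16, 16) (48, 48, 48)
```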
2. Related Works
2.1. White Matter Tract Segmentation
2.2. Vision Transformer
2.3. Hybrid Convolution and Transformer
3. Methods
3.1. Overall Pipeline
3.2. Multi-Head Convolution-Based Channel-Wise Attention
3.3. Context-Enhanced Feed-Forward Network
3.4. Network Configuration and Training Strategy
4. Experimental Results
4.1. Datasets
4.2. Experimental Settings
4.3. Evaluation Metrics
4.4. Performance Evaluation
4.4.1. Comparison with State-of-the-Art Methods
4.4.2. Effectiveness on Small-Scale and Challenging Tracts
4.4.3. Model Complexity and Computational Cost
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lawes, I.N.C.; Barrick, T.R.; Murugam, V.; Spierings, N.; Evans, D.R.; Song, M.; Clark, C.A. Atlas-based segmentation of white matter tracts of the human brain using diffusion tensor tractography and comparison with classical dissection. Neuroimage 2008, 39, 62–79.
- Van Essen, D.C.; Smith, S.M.; Barch, D.M.; Behrens, T.E.; Yacoub, E.; Ugurbil, K.; WU-Minn HCP Consortium. The WU-Minn human connectome project: An overview. Neuroimage 2013, 80, 62–79.
- Clayden, J.D.; Storkey, A.J.; Bastin, M.E. A probabilistic model-based approach to consistent white matter tract segmentation. IEEE Trans. Med. Imaging 2007, 26, 1555–1561.
- Wasserthal, J.; Neher, P.; Maier-Hein, K.H. TractSeg-Fast and accurate white matter tract segmentation. NeuroImage 2018, 183, 239–253.
- Zhang, F.; Daducci, A.; He, Y.; Schiavi, S.; Seguin, C.; Smith, R.E.; Yeh, C.H.; Zhao, T.; O’Donnell, L.J. Quantitative mapping of the brain’s structural connectivity using diffusion MRI tractography: A review. Neuroimage 2022, 249, 118870.
- Jonasson, L.; Bresson, X.; Hagmann, P.; Cuisenaire, O.; Meuli, R.; Thiran, J.P. White matter fiber tract segmentation in DT-MRI using geometric flows. Med. Image Anal. 2005, 9, 223–236.
- Bazin, P.L.; Ye, C.; Bogovic, J.A.; Shiee, N.; Reich, D.S.; Prince, J.L.; Pham, D.L. Direct segmentation of the major white matter tracts in diffusion tensor images. Neuroimage 2011, 58, 458–468.
- Ghazi, N.; Aarabi, M.H.; Soltanian-Zadeh, H. Deep Learning Methods for Identification of White Matter Fiber Tracts: Review of State-of-the-Art and Future Prospective. Neuroinformatics 2023, 21, 517–548.
- Anderson, A.W. Measurement of fiber orientation distributions using high angular resolution diffusion imaging. Magn. Reson. Med. 2005, 54, 1194–1206.
- Wasserthal, J.; Neher, P.F.; Hirjak, D.; Maier-Hein, K.H. Combined tract segmentation and orientation mapping for bundle-specific tractography. Med. Image Anal. 2019, 58, 101559.
- Li, B.; De Groot, M.; Steketee, R.M.; Meijboom, R.; Smits, M.; Vernooij, M.W.; Ikram, M.A.; Liu, J.; Niessen, W.J.; Bron, E.E. Neuro4Neuro: A neural network approach for neural tract segmentation using large-scale population-based diffusion imaging. Neuroimage 2020, 218, 116993.
- Nelkenbaum, I.; Tsarfaty, G.; Kiryati, N.; Konen, E.; Mayer, A. Automatic segmentation of white matter tracts using multiple brain MRI sequences. In Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 368–371.
- Zhang, F.; Karayumak, S.C.; Hoffmann, N.; Rathi, Y.; Golby, A.J.; O’Donnell, L.J. Deep white matter analysis (DeepWMA): Fast and consistent tractography segmentation. Med. Image Anal. 2020, 65, 101761.
- Liu, W.; Lu, Q.; Zhuo, Z.; Li, Y.; Duan, Y.; Yu, P.; Qu, L.; Ye, C.; Liu, Y. Volumetric segmentation of white matter tracts with label embedding. Neuroimage 2022, 250, 118934.
- Dong, X.; Yang, Z.; Peng, J.; Wu, X. Multimodality white matter tract segmentation using CNN. In Proceedings of the ACM Turing Celebration Conference-China, Chengdu, China, 17–19 May 2019; pp. 1–8.
- Lu, Q.; Li, Y.; Ye, C. Volumetric white matter tract segmentation with nested self-supervised learning using sequential pretext tasks. Med. Image Anal. 2021, 72, 102094.
- Lu, Q.; Liu, W.; Zhuo, Z.; Li, Y.; Duan, Y.; Yu, P.; Qu, L.; Ye, C.; Liu, Y. A transfer learning approach to few-shot segmentation of novel white matter tracts. Med. Image Anal. 2022, 79, 102454.
- Tchetchenian, A.; Zhu, Y.; Zhang, F.; O’Donnell, L.J.; Song, Y.; Meijering, E. A comparison of manual and automated neural architecture search for white matter tract segmentation. Sci. Rep. 2023, 13, 1617.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 12077–12090.
- Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 17683–17693.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. 2022, 54, 200.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
- Zhang, J.; Zhang, Y.; Gu, J.; Dong, J.; Kong, L.; Yang, X. Xformer: Hybrid X-Shaped Transformer for Image Denoising. arXiv 2023, arXiv:2303.06440.
- Chen, C.; Miao, J.; Wu, D.; Zhong, A.; Yan, Z.; Kim, S.; Hu, J.; Liu, Z.; Sun, L.; Li, X.; et al. Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation. Med. Image Anal. 2024, 98, 103310.
- Pang, Y.; Liang, J.; Huang, T.; Chen, H.; Li, Y.; Li, D.; Huang, L.; Wang, Q. Slim UNETR: Scale hybrid transformers to efficient 3D medical image segmentation under limited computational resources. IEEE Trans. Med. Imaging 2023, 43, 994–1005.
- Xing, Z.; Ye, T.; Yang, Y.; Cai, D.; Gai, B.; Wu, X.J.; Gao, F.; Zhu, L. Segmamba-v2: Long-range sequential modeling mamba for general 3d medical image segmentation. IEEE Trans. Med. Imaging 2025, 45, 4–15.
- Ruan, J.; Li, J.; Xiang, S. Vm-unet: Vision mamba unet for medical image segmentation. ACM Trans. Multimed. Comput. Commun. Appl. 2024.
- Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584.
- Chen, L.; Wan, L. CTUNet: Automatic pancreas segmentation using a channel-wise transformer and 3D U-Net. Vis. Comput. 2023, 39, 5229–5243.
- Zhou, H.Y.; Guo, J.; Zhang, Y.; Han, X.; Yu, L.; Wang, L.; Yu, Y. nnFormer: Volumetric medical image segmentation via a 3D transformer. IEEE Trans. Image Process. 2023, 32, 4036–4045.
- Kuang, H.; Wang, Y.; Tan, X.; Yang, J.; Sun, J.; Liu, J.; Qiu, W.; Zhang, J.; Zhang, J.; Yang, C.; et al. LW-CTrans: A lightweight hybrid network of CNN and Transformer for 3D medical image segmentation. Med. Image Anal. 2025, 102, 103545.
- Zhang, Z.; Ma, Q.; Zhang, T.; Chen, J.; Zheng, H.; Gao, W. Switch-UMamba: Dynamic scanning vision Mamba UNet for medical image segmentation. Med. Image Anal. 2025, 107, 103792.
- Li, F.; Xu, L.; Ma, Z.; Zhao, Y.; Li, X. APU-Net: A U-Net Enhanced Network with Dynamic Feature Fusion and Pyramid Cross-Attention Mechanism for Polyp Segmentation. Digit. Signal Process. 2026, 172, 105879.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415.
- Acharya, A.; Ortner, J. Progressive learning. Econometrica 2017, 85, 1965–1990.
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 41–48.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
- Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics: Methodology and Distribution; Springer: New York, NY, USA, 1992; pp. 196–202.
- Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. In Proceedings of the Medical Imaging with Deep Learning, Zurich, Switzerland, 6–8 July 2022.
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested U-net architecture for medical image segmentation. In Proceedings of the 4th International Workshop on Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain, 20 September 2018; pp. 3–11.
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.W.; Wu, J. Unet 3+: A full-scale connected unet for medical image segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.
- Zhu, Y.; Tchetchenian, A.; Yin, X.; Liew, A.; Song, Y.; Meijering, E. AC-UNet: Adaptive Connection UNet for White Matter Tract Segmentation Through Neural Architecture Search. In Proceedings of the 2024 IEEE International Symposium on Biomedical Imaging (ISBI), Athens, Greece, 27–30 May 2024; pp. 1–5.
- Lucena, O.; Borges, P.; Cardoso, J.; Ashkan, K.; Sparks, R.; Ourselin, S. Informative and reliable tract segmentation for preoperative planning. Front. Radiol. 2022, 2, 866974.
| Methods | 2D/3D | Dice Mean (STD) (↑) | Dice p-Value | RVD Mean (STD) (↓) | RVD p-Value |
|---|---|---|---|---|---|
| TractSeg [4] | 2D | 0.849 (±0.003) | N/A | 0.097 (±0.006) | N/A |
| Attention Unet [44] | 2D | 0.850 (±0.003) | <0.000 | 0.097 (±0.005) | 0.599 |
| DS-Unet [10] | 2D | 0.851 (±0.002) | 0.008 | 0.097 (±0.005) | 0.564 |
| Unet++ [45] | 2D | 0.852 (±0.003) | <0.000 | 0.096 (±0.005) | <0.000 |
| Unet3+ [46] | 2D | 0.853 (±0.003) | <0.000 | 0.097 (±0.005) | 0.001 |
| AC-UNet [47] | 2D | 0.856 (±0.003) | <0.000 | 0.096 (±0.005) | 0.001 |
| nnUNet [48] | 3D | 0.842 (±0.040) | N/A | 0.109 (±0.006) | N/A |
| m-Neuro4Neuro [11] | 3D | 0.844 (±0.004) | <0.000 | 0.101 (±0.006) | <0.000 |
| Uformer [11] | 3D | 0.852 (±0.003) | <0.000 | 0.097 (±0.005) | <0.000 |
| 3D TractFormer | 3D | 0.857 (±0.003) | <0.000 | 0.096 (±0.005) | <0.000 |
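The Dice and RVD metrics reported in the tables can be computed for binary tract masks as below. This uses the standard Dice coefficient and a common definition of relative volume difference (RVD); the paper's exact RVD formula may differ slightly.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice overlap between two boolean volumes: 2|P∩G| / (|P| + |G|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def rvd(pred, gt):
    """Relative volume difference: |vol(P) - vol(G)| / vol(G)."""
    return abs(int(pred.sum()) - int(gt.sum())) / gt.sum()

# Toy example: ground truth 4x4x4 cube (64 voxels), prediction slightly larger.
gt = np.zeros((8, 8, 8), dtype=bool); gt[2:6, 2:6, 2:6] = True
pred = np.zeros_like(gt);             pred[2:6, 2:6, 2:7] = True
print(round(dice_score(pred, gt), 3), round(float(rvd(pred, gt)), 2))  # 0.889 0.25
```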
| Tracts | Dice (↑): Unet3+ | Dice (↑): AC-UNet | Dice (↑): 3D TractFormer | RVD (↓): Unet3+ | RVD (↓): AC-UNet | RVD (↓): 3D TractFormer |
|---|---|---|---|---|---|---|
| CA | 0.704 | 0.706 | 0.726 | 0.482 | 0.482 | 0.402 |
| FX_left | 0.757 | 0.757 | 0.763 | 0.210 | 0.210 | 0.189 |
| FX_right | 0.715 | 0.714 | 0.730 | 0.271 | 0.271 | 0.256 |
| ILF_left | 0.822 | 0.823 | 0.829 | 0.105 | 0.104 | 0.106 |
| ILF_right | 0.806 | 0.808 | 0.819 | 0.115 | 0.115 | 0.121 |
| UF_left | 0.787 | 0.789 | 0.801 | 0.195 | 0.194 | 0.163 |
| UF_right | 0.819 | 0.820 | 0.829 | 0.133 | 0.133 | 0.098 |
| Methods | Parameters (M) | TMACs | TFLOPs |
|---|---|---|---|
| TractSeg [4] (2D) | 9.24 | 0.696 | 1.395 |
| Attention Unet [44] (2D) | 34.89 | 3.05 | 6.11 |
| DS-Unet [10] (2D) | 55.505 | 4.296 | 8.605 |
| Unet++ [45] (2D) | 9.05 | 1.369 | 2.742 |
| Unet3+ [46] (2D) | 27.179 | 8.848 | 17.695 |
| AC-UNet [47] (2D) | 22.03 | 17.26 | 34.86 |
| nnUNet [48] (3D) | 4.09 | 0.69 | 1.39 |
| m-Neuro4Neuro [11] (3D) | 51.79 | 2.55 | 5.1 |
| Uformer [11] (3D) | 47.45 | 0.672 | 1.361 |
| 3D TractFormer (3D) | 8.84 | 0.431 | 0.867 |
| Methods | Dice Mean (STD) (↑) | Dice p-Value | RVD Mean (STD) (↓) | RVD p-Value |
|---|---|---|---|---|
| 3D TractFormer | 0.857 (±0.003) | N/A | 0.096 (±0.005) | N/A |
| 3D TractFormer-xCFFN | 0.853 (±0.003) | <0.000 | 0.097 (±0.005) | <0.000 |
| 3D TractFormer-xMCCA | 0.852 (±0.003) | <0.000 | 0.097 (±0.005) | <0.000 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Gao, X.; Tian, H.; Yin, X.; Liew, A.W.-C. 3D TractFormer: 3D Direct Volumetric White Matter Tract Segmentation with Hybrid Channel-Wise Transformer. Sensors 2026, 26, 1068. https://doi.org/10.3390/s26031068