CCTrans: Improving Medical Image Segmentation with Contoured Convolutional Transformer Network
Abstract
1. Introduction
- A novel segmentation model, the contoured convolutional transformer network (CCTrans), is designed for accurate medical image segmentation; it uses gated modules and skip connections. Its dual convolutional (DC) transformer block and contour detection module are both designed to process the important information contained in medical images.
- The DC transformer blocks utilize convolutional kernels with different sizes to capture multi-scale information. Short-distance and long-distance attention mechanisms are combined to extract local features and capture long-range dependencies, thereby enhancing the model’s interpretability.
- The contour detection module employs traditional computer vision (CV) techniques, which help locate regions of interest and refine the segmentation of indistinct contours.
- Comprehensive experiments on two public datasets show that CCTrans outperforms other state-of-the-art medical image segmentation methods. The paper also presents a range of experimental results with illustrations.
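As a rough illustration of the contour detection idea in the contributions above, the following sketch applies a Sobel gradient filter and a threshold to a toy image. The Sobel operator, the threshold value, and the names `sobel_contour` and `toy_slice` are assumptions made for this example; the text above only states that traditional CV techniques are used, not which ones.

```python
from math import hypot

# Sobel kernels: an assumed stand-in for the "traditional CV techniques";
# the source does not name the operator used by the contour detection module.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_contour(image, threshold):
    """Return a binary contour map (1 = edge) for a 2D list of intensities.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    contour = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            # Mark the pixel when the gradient magnitude reaches the threshold.
            if hypot(gx, gy) >= threshold:
                contour[r][c] = 1
    return contour


# Toy 5x6 "slice": a dark background (0) next to a bright region (9).
toy_slice = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
contour_map = sobel_contour(toy_slice, threshold=18)
```

On this toy input, the interior pixels on either side of the intensity step are marked as contour. A real module would presumably operate on full CT or MRI slices and pass the contour map on to the network, but that wiring is not described in this text.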
2. Related Works
2.1. Convolutional Neural Network Methods for Medical Image Segmentation
2.2. Transformer-Based Architectures for Medical Image Segmentation
2.3. U-Shaped Architectures with Transformers for Medical Image Segmentation
3. Proposed Method
3.1. Overall Architecture
3.2. Dual Convolutional (DC) Transformer Block
3.3. Contour Detection Module
4. Datasets and Experiments
4.1. Experimental Datasets
- Synapse abdominal multi-organ (Synapse) dataset: Synapse contains thirty CT volumes covering eight abdominal organs (the aorta, gallbladder, left kidney, right kidney, liver, pancreas, spleen, and stomach), with a total of 3779 slices [28]. In the subsequent experiments, the training set consists of eighteen CT cases, while the testing set is composed of the remaining twelve cases.
- Automated cardiac diagnosis challenge (ACDC) dataset: The ACDC dataset contains one hundred MRI cases [29]. The MRI data cover three structures: the right ventricle, myocardium, and left ventricle. In the subsequent experiments, seventy cases form the training set, ten cases form the validation set, and the remaining twenty cases form the testing set.
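The splits above are defined at the case level, so all slices of one case stay in the same subset. A minimal sketch of such a split follows. Only the case counts come from the text; the seeded random shuffle, the helper name `split_cases`, and the case ID formats are hypothetical, and the actual assignment of cases to subsets is not stated here.

```python
import random


def split_cases(case_ids, sizes, seed=0):
    """Shuffle case IDs once and cut them into consecutive, disjoint subsets.

    `sizes` maps a subset name to its number of cases; the sizes must sum
    to the number of cases so that every case lands in exactly one subset.
    """
    if sum(sizes.values()) != len(case_ids):
        raise ValueError("subset sizes must sum to the number of cases")
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    subsets, start = {}, 0
    for name, size in sizes.items():
        subsets[name] = shuffled[start:start + size]
        start += size
    return subsets


# Case counts are taken from the text; the ID format and seed are hypothetical.
synapse = split_cases([f"case{i:04d}" for i in range(30)],
                      {"train": 18, "test": 12})
acdc = split_cases([f"patient{i:03d}" for i in range(100)],
                   {"train": 70, "val": 10, "test": 20})
```

Splitting by case rather than by slice keeps neighbouring slices of the same patient out of both training and testing data, which would otherwise inflate the reported scores.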
4.2. Experimental Settings
4.3. Experimental Results and Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Parker, J.R. Algorithms for Image Processing and Computer Vision; John Wiley & Sons: Hoboken, NJ, USA, 2010.
2. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
3. Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.-W.; Heng, P.-A. H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674.
4. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. Unet 3+: A Full-Scale Connected Unet for Medical Image Segmentation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1055–1059.
5. Xiao, X.; Lian, S.; Luo, Z.; Li, S. Weighted Res-Unet for High-Quality Retina Vessel Segmentation. In Proceedings of the 2018 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 327–331.
6. Cai, S.; Tian, Y.; Lui, H.; Zeng, H.; Wu, Y.; Chen, G. Dense-UNet: A novel multiphoton in vivo cellular image segmentation model based on a convolutional neural network. Quant. Imaging Med. Surg. 2020, 10, 1275.
7. Drozdzal, M.; Vorontsov, E.; Chartrand, G.; Kadoury, S.; Pal, C. The Importance of Skip Connections in Biomedical Image Segmentation. In Proceedings of the International Workshop on Deep Learning in Medical Image Analysis, International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, Shenzhen, China, 13–17 October 2019; Springer: Berlin/Heidelberg, Germany, 2016; pp. 179–187.
8. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; Association for Computational Linguistics: Cedarville, OH, USA, 2020; pp. 38–45.
9. Tetko, I.V.; Karpov, P.; Van Deursen, R.; Godin, G. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat. Commun. 2020, 11, 5575.
10. Gillioz, A.; Casas, J.; Mugellini, E.; Abou Khaled, O. Overview of the Transformer-based Models for NLP Tasks. In Proceedings of the 2020 15th Conference on Computer Science and Information Systems (FedCSIS), Sofia, Bulgaria, 6–9 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 179–183.
11. Sharma, N.; Aggarwal, L.M. Automated medical image segmentation techniques. J. Med. Phys. 2010, 35, 3–14.
12. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
13. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 2019, 39, 1856–1867.
14. Lian, S.; Luo, Z.; Zhong, Z.; Lin, X.; Su, S.; Li, S. Attention guided U-Net for accurate iris segmentation. J. Vis. Commun. Image Represent. 2018, 56, 296–304.
15. Isensee, F.; Jaeger, P.F.; Kohl, S.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
16. Wang, C.; MacGillivray, T.; Macnaught, G.; Yang, G.; Newby, D. A two-stage 3D Unet framework for multi-class segmentation on full resolution image. arXiv 2018, arXiv:1804.04341.
17. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 565–571.
18. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
19. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306.
20. Liu, Y.; Wang, H.; Chen, Z.; Huangliang, K.; Zhang, H. TransUNet+: Redesigning the skip connection to enhance features in medical image segmentation. Knowl. Based Syst. 2022, 256, 109859.
21. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the Computer Vision–ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part III. Springer: Berlin/Heidelberg, Germany, 2023; pp. 205–218.
22. Wang, J.; Zhao, H.; Liang, W.; Wang, S.; Zhang, Y. Cross-convolutional transformer for automated multi-organs segmentation in a variety of medical images. Phys. Med. Biol. 2023, 68, 035008.
23. Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the design of spatial attention in vision transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 9355–9366.
24. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
25. Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578.
26. Wang, W.; Yao, L.; Chen, L.; Lin, B.; Cai, D.; He, X.; Liu, W. CrossFormer: A versatile vision transformer hinging on cross-scale attention. arXiv 2021, arXiv:2108.00154.
27. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
28. Staffler, B.; Berning, M.; Boergens, K.M.; Gour, A.; Smagt, P.v.d.; Helmstaedter, M.J.E. SynEM, automated synapse detection for connectomics. Elife 2017, 6, e26414.
29. Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X.; Heng, P.-A.; Cetin, I.; Lekadir, K.; Camara, O.; Ballester, M.A.G. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans. Med. Imaging 2018, 37, 2514–2525.
30. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035. Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (accessed on 11 April 2023).
31. Thada, V.; Jaglan, V. Comparison of jaccard, dice, cosine similarity coefficient to find best fitness value for web retrieved documents using genetic algorithm. Int. J. Innov. Eng. Technol. 2013, 2, 202–205.
32. Heimann, T.; Meinzer, H.-P. Statistical shape models for 3D medical image segmentation: A review. Med. Image Anal. 2009, 13, 543–563.
33. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 29.






Segmentation results on the Synapse dataset (per-organ DSC, %):

| Methods | Mean DSC (%) ↑ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach | 
|---|---|---|---|---|---|---|---|---|---|
| Unet [12] | 76.53 | 89.32 | 68.83 | 77.15 | 67.95 | 93.47 | 52.75 | 87.18 | 75.64 | 
| Att-Unet [14] | 75.47 | 85.82 | 63.81 | 79.10 | 72.61 | 93.46 | 49.27 | 87.09 | 74.85 | 
| ViT [18] | 75.33 | 88.12 | 67.63 | 75.95 | 66.75 | 92.27 | 51.55 | 85.98 | 74.44 | 
| Unet++ [13] | 77.28 | 87.46 | 62.79 | 80.23 | 79.07 | 92.92 | 56.35 | 84.88 | 74.61 | 
| TransUnet [19] | 77.49 | 87.62 | 63.41 | 80.88 | 77.29 | 94.75 | 55.57 | 84.90 | 75.49 | 
| SwinUnet [21] | 78.83 | 85.21 | 65.72 | 82.84 | 79.14 | 94.67 | 56.41 | 90.09 | 76.57 | 
| TransUnet+ [20] | 81.12 | 88.53 | 66.80 | 82.12 | 81.44 | 93.91 | 65.28 | 90.19 | 80.71 | 
| nnUnet [15] | 82.02 | 90.33 | 64.79 | 81.02 | 77.64 | 95.10 | 69.85 | 91.50 | 85.96 | 
| C2Former [22] | 82.94 | 87.20 | 71.87 | 83.41 | 81.58 | 94.66 | 68.32 | 92.94 | 83.52 | 
| CCTrans (ours) | 83.97 | 89.98 | 73.38 | 83.33 | 82.72 | 94.72 | 69.43 | 93.87 | 84.32 | 

Segmentation results on the ACDC dataset (per-structure DSC, %):

| Methods | Mean DSC (%) ↑ | Ventricle (R) | Myocardium | Ventricle (L) | 
|---|---|---|---|---|
| Unet [12] | 87.37 | 87.12 | 80.29 | 94.71 | 
| Att-Unet [14] | 86.55 | 87.38 | 79.00 | 93.07 | 
| ViT [18] | 87.39 | 85.89 | 81.70 | 94.57 | 
| Unet++ [13] | 88.16 | 86.93 | 85.45 | 92.11 | 
| TransUnet [19] | 89.52 | 88.61 | 84.09 | 95.87 | 
| SwinUnet [21] | 89.73 | 88.76 | 85.38 | 95.05 | 
| TransUnet+ [20] | 90.47 | 89.13 | 87.96 | 94.31 | 
| nnUnet [15] | 91.20 | 89.55 | 90.23 | 93.81 | 
| C2Former [22] | 91.43 | 91.67 | 88.19 | 94.42 | 
| CCTrans (ours) | 92.15 | 91.28 | 89.81 | 95.35 | 
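
The tables above report the Dice similarity coefficient (DSC), and the first numeric column matches the unweighted mean of the per-class columns to within rounding. The sketch below shows how a per-class DSC, 2|A ∩ B| / (|A| + |B|), and its mean over classes can be computed on label maps. The function names, the toy masks, and the convention of returning 100 when a label is absent from both masks are assumptions for illustration; the exact evaluation protocol (for example, per slice or per volume) is not given in this text.

```python
def dice_score(pred, target, label):
    """Dice similarity coefficient (in %) for one class label.

    `pred` and `target` are equally sized 2D lists of integer class labels.
    """
    intersection = pred_count = target_count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count += p == label
            target_count += t == label
            intersection += p == label and t == label
    if pred_count + target_count == 0:
        return 100.0  # assumed convention: label absent from both masks
    return 100.0 * 2 * intersection / (pred_count + target_count)


def mean_dice(pred, target, labels):
    """Unweighted mean of the per-class Dice scores."""
    return sum(dice_score(pred, target, lb) for lb in labels) / len(labels)


# Toy 3x3 label maps with background (0) and two foreground classes (1, 2).
target = [[0, 1, 1],
          [0, 1, 2],
          [0, 2, 2]]
pred = [[0, 1, 1],
        [0, 2, 2],
        [0, 2, 2]]

class_1 = dice_score(pred, target, 1)  # 2 * 2 / (2 + 3) = 80%
class_2 = dice_score(pred, target, 2)  # 2 * 3 / (4 + 3), about 85.7%
average = mean_dice(pred, target, [1, 2])
```

Background is excluded from the mean here, which mirrors the tables listing only organs or cardiac structures; whether the original evaluation does the same is an assumption.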
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, J.; Zhang, H.; Yi, Z. CCTrans: Improving Medical Image Segmentation with Contoured Convolutional Transformer Network. Mathematics 2023, 11, 2082. https://doi.org/10.3390/math11092082