MSMHSA-DeepLab V3+: An Effective Multi-Scale, Multi-Head Self-Attention Network for Dual-Modality Cardiac Medical Image Segmentation
Abstract
1. Introduction
2. Related Works
2.1. Supervised Image Segmentation
2.2. Cardiac Medical Image Segmentation
2.3. Research and Application of AMs
3. Materials and Methods
3.1. MHSA Mechanisms
3.1.1. Self-Attention Mechanisms
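This subsection builds on the standard scaled dot-product self-attention of Vaswani et al. [24]; for reference, the core operation is

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

where Q, K, and V are the query, key, and value matrices obtained by learned linear projections of the same input sequence and d_k is the key dimension. Multi-head self-attention (MHSA) runs several such attentions in parallel over separate projections and concatenates the results.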
3.1.2. MHSA Mechanism Framework
1. Main framework
2. Patch embedding
3. Position embedding
4. Class token

(how these four components fit together is illustrated in the sketch below)
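A minimal PyTorch sketch of a ViT-style MHSA module combining the four components above. All sizes here (512×512 input, patch size 16, embedding dimension 768, 8 heads) are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

class MHSABlock(nn.Module):
    """ViT-style MHSA: patch embedding + class token + position embedding."""

    def __init__(self, img_size=512, patch_size=16, in_ch=3, dim=768, heads=8):
        super().__init__()
        n_patches = (img_size // patch_size) ** 2
        # 1. Patch embedding: non-overlapping patches -> token vectors
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch_size,
                                     stride=patch_size)
        # 2. Learnable class token prepended to the token sequence
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # 3. Learnable position embedding added to every token
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        # 4. Multi-head self-attention over the full token sequence
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                           # x: (B, C, H, W)
        tokens = self.patch_embed(x)                # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        h = self.norm(tokens)                       # pre-norm
        out, _ = self.attn(h, h, h)                 # self-attention: Q = K = V
        return tokens + out                         # residual connection
```

In a segmentation encoder the class token is commonly dropped afterwards and the remaining tokens are reshaped back into a 2-D feature map.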
3.2. MSMHSA-DeepLab V3+
3.2.1. Encoder
1. Xception
2. MSMHSA
3. ASPP

(ASPP is sketched below)
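Of these three components, ASPP (atrous spatial pyramid pooling) is the standard DeepLab V3+ context module: parallel dilated convolutions at several rates plus image-level pooling, concatenated and projected. The variant names in the result tables ("16×_res to MHSA", "8×_res to MHSA", "4×_res to MHSA") indicate which backbone resolution, 16×, 8×, or 4× downsampled, is additionally routed through an MHSA module like the one sketched in Section 3.1.2. A minimal ASPP sketch follows; the 6/12/18 atrous rates are the common DeepLab V3+ setting, assumed here rather than taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling as used in DeepLab V3+."""

    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus three dilated 3x3 branches at increasing rates
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        # Image-level pooling branch captures global context
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1,
                                 bias=False)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[2:],
                               mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```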
3.2.2. Decoder
3.2.3. Feature Engineering
4. Results and Discussions
4.1. Implementation Details
4.1.1. Datasets
4.1.2. Training
4.1.3. Evaluation Indexes
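The tables that follow report per-substructure scores, where Myo is the myocardium, LA/LV the left atrium/ventricle, RA/RV the right atrium/ventricle, PA the pulmonary artery, AA the ascending aorta, and MIoU/Avg the mean over the seven classes. A minimal NumPy sketch of the two overlap metrics used:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union for one binary class mask."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient: 2*|A & B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0
```

MIoU and average Dice are the unweighted means of these per-class scores.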
4.2. Comparisons with Other Advanced Methods
IoU | Myo | LA | LV | RA | RV | PA | AA | MIoU |
---|---|---|---|---|---|---|---|---|
Segmenter | 0.440 | 0.500 | 0.558 | 0.338 | 0.480 | 0.672 | 0.716 | 0.529 |
U-net | 0.774 | 0.878 | 0.864 | 0.792 | 0.804 | 0.927 | 0.804 | 0.835 |
DeepLabV3+ | 0.858 | 0.921 | 0.914 | 0.885 | 0.885 | 0.937 | 0.858 | 0.894 |
ConvFormer [27] | 0.857 | 0.885 | 0.914 | 0.835 | 0.836 | 0.740 | 0.957 | 0.861 |
16×_res to MHSA | 0.868 | 0.980 | 0.898 | 0.982 | 0.902 | 0.984 | 0.943 | 0.937 |
8×_res to MHSA | 0.873 | 0.969 | 0.901 | 0.982 | 0.905 | 0.968 | 0.906 | 0.929 |
4×_res to MHSA | 0.872 | 0.975 | 0.915 | 0.969 | 0.887 | 0.944 | 0.915 | 0.925 |
Dice | Myo | LA | LV | RA | RV | PA | AA | Avg |
---|---|---|---|---|---|---|---|---|
GUT | 0.881 | 0.929 | 0.918 | 0.888 | 0.909 | 0.840 | 0.933 | 0.899 |
KTH [28] | 0.856 | 0.930 | 0.923 | 0.971 | 0.857 | 0.835 | 0.894 | 0.881 |
CUHK1 [29] | 0.851 | 0.916 | 0.904 | 0.836 | 0.883 | 0.784 | 0.907 | 0.869 |
3D U-net | 0.791 | 0.853 | 0.813 | 0.909 | 0.816 | 0.763 | 0.717 | 0.809 |
Two-stage U-net | 0.729 | 0.904 | 0.799 | 0.786 | 0.793 | 0.648 | 0.873 | 0.790 |
CFUN | 0.822 | 0.832 | 0.879 | 0.902 | 0.844 | 0.821 | 0.940 | 0.859 |
SEG-CNN | 0.872 | 0.910 | 0.924 | 0.879 | 0.865 | 0.837 | 0.913 | 0.890 |
Swin-Unet | 0.856 | - | 0.958 | - | 0.886 | - | - | 0.900 |
MAUNet [30] | 0.893 | 0.910 | 0.925 | 0.928 | 0.886 | 0.866 | 0.925 | 0.906 |
ConvFormer | 0.926 | 0.943 | 0.955 | 0.914 | 0.910 | 0.844 | 0.980 | 0.925 |
16×_res to MHSA | 0.899 | 0.961 | 0.917 | 0.962 | 0.919 | 0.963 | 0.941 | 0.937 |
8×_res to MHSA | 0.901 | 0.955 | 0.918 | 0.962 | 0.921 | 0.954 | 0.922 | 0.933 |
4×_res to MHSA | 0.900 | 0.958 | 0.926 | 0.955 | 0.912 | 0.943 | 0.927 | 0.932 |
IoU | Myo | LA | LV | RA | RV | PA | AA | MIoU |
---|---|---|---|---|---|---|---|---|
U-net | 0.557 | 0.409 | 0.696 | 0.528 | 0.609 | 0.335 | 0.117 | 0.464 |
MascParallelUnet | 0.584 | 0.616 | 0.727 | 0.563 | 0.536 | 0.685 | 0.656 | 0.621 |
DeepLabV3+ | 0.768 | 0.813 | 0.882 | 0.840 | 0.863 | 0.799 | 0.730 | 0.814 |
ConvFormer | 0.857 | 0.885 | 0.914 | 0.835 | 0.836 | 0.740 | 0.957 | 0.861 |
16×_res to MHSA | 0.828 | 0.922 | 0.803 | 0.934 | 0.879 | 0.926 | 0.891 | 0.883 |
8×_res to MHSA | 0.827 | 0.914 | 0.815 | 0.911 | 0.850 | 0.905 | 0.852 | 0.868 |
4×_res to MHSA | 0.835 | 0.930 | 0.818 | 0.921 | 0.870 | 0.928 | 0.893 | 0.885 |
Dice | Myo | LA | LV | RA | RV | PA | AA | Avg |
---|---|---|---|---|---|---|---|---|
nnUnet [31] | 0.892 | - | 0.953 | - | 0.902 | - | - | 0.916 |
nnFormer [32] | 0.896 | - | 0.957 | - | 0.909 | - | - | 0.921 |
MMGL [33] | 0.900 | - | 0.961 | - | 0.909 | - | - | 0.923 |
MISSFormer [34] | 0.880 | - | 0.949 | - | 0.896 | - | - | 0.909 |
U-net | 0.716 | 0.580 | 0.821 | 0.691 | 0.757 | 0.502 | 0.209 | 0.611 |
MascParallelUnet | 0.737 | 0.762 | 0.842 | 0.720 | 0.698 | 0.813 | 0.792 | 0.766 |
DeepLab V3+ | 0.869 | 0.897 | 0.937 | 0.913 | 0.926 | 0.888 | 0.844 | 0.896 |
MAUNet | 0.893 | 0.910 | 0.925 | 0.928 | 0.886 | 0.866 | 0.925 | 0.906 |
ConvFormer | 0.926 | 0.943 | 0.955 | 0.914 | 0.910 | 0.844 | 0.980 | 0.925 |
16×_res to MHSA | 0.906 | 0.959 | 0.891 | 0.966 | 0.936 | 0.962 | 0.942 | 0.937 |
8×_res to MHSA | 0.905 | 0.955 | 0.898 | 0.953 | 0.919 | 0.950 | 0.920 | 0.929 |
4×_res to MHSA | 0.910 | 0.964 | 0.900 | 0.959 | 0.930 | 0.963 | 0.943 | 0.938 |
4.3. Comparison between the Models at Different Scales
4.4. Segmentation Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mendis, S.; Puska, P.; Norrving, B.E.; World Health Organization. Global Atlas on Cardiovascular Disease Prevention and Control; World Health Organization: Geneva, Switzerland, 2011. [Google Scholar]
- Chartrand-Lefebvre, C.; Cadrin-Chenevert, A.; Bordeleau, E.; Ugolini, P.; Ouellet, R.; Sablayrolles, J.L.; Prenovault, J. Coronary computed tomography angiography: Overview of technical aspects, current concepts, and perspectives. Can. Assoc. Radiol. J. 2007, 58, 92–108. [Google Scholar] [PubMed]
- Earls, J.P.; Ho, V.B.; Foo, T.K.; Castillo, E.; Flamm, S.D. Cardiac MRI: Recent progress and continued challenges. J. Magn. Reson. Imaging 2002, 16, 111–127. [Google Scholar] [CrossRef] [PubMed]
- Kang, D. Heart chambers and whole heart segmentation techniques: Review. J. Electron. Imaging 2012, 21, 010901. [Google Scholar] [CrossRef]
- Tang, C.; Hu, C.; Sun, J.; Sima, H. Deep learning techniques for medical images: Development from convolution to graph convolution. J. Image Graph. 2021, 26, 2078–2093. [Google Scholar]
- Zhang, Y.H.; Qiu, Z.F.; Yao, T.; Liu, D.; Mei, T. Fully Convolutional Adaptation Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6810–6818. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar] [CrossRef]
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar] [CrossRef]
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; pp. 3–11. [Google Scholar] [CrossRef]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed]
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Part VII; Volume 11211, pp. 833–851. [Google Scholar] [CrossRef]
- Tran, P.V. A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv 2016, arXiv:1604.00494. [Google Scholar]
- Yang, X.; Bian, C.; Yu, L.Q.; Ni, D.; Heng, P.A. Hybrid Loss Guided Convolutional Networks for Whole Heart Parsing. Lect. Notes Comput. Sci. 2018, 10663, 215–223. [Google Scholar] [CrossRef]
- Payer, C.; Štern, D.; Bischof, H.; Urschler, M. Multi-label whole heart segmentation using CNNs and anatomical label configurations. In Proceedings of the International Workshop on Statistical Atlases and Computational Models of the Heart, Quebec City, QC, Canada, 10–14 September 2017; pp. 190–198. [Google Scholar] [CrossRef]
- Xu, Z.; Wu, Z.; Feng, J. CFUN: Combining faster R-CNN and U-net network for efficient whole heart segmentation. arXiv 2018, arXiv:1812.04914. [Google Scholar]
- Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent Models of Visual Attention. arXiv 2014, arXiv:1406.6247. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Xu, K.; Ba, J.L.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.S.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proc. Mach. Learn. Res. 2015, 37, 2048–2057. [Google Scholar] [CrossRef]
- Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar] [CrossRef]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218. [Google Scholar] [CrossRef]
- Zhuang, X.; Li, L.; Payer, C.; Stern, D.; Urschler, M.; Heinrich, M.P.; Oster, J.; Wang, C.; Smedby, O.; Bian, C.; et al. Evaluation of algorithms for Multi-Modality Whole Heart Segmentation: An open-access grand challenge. Med. Image Anal. 2019, 58, 101537. [Google Scholar] [CrossRef] [PubMed]
- Gu, P.; Zhang, Y.; Wang, C.; Chen, D.Z. ConvFormer: Combining CNN and Transformer for Medical Image Segmentation. arXiv 2022, arXiv:2211.08564. [Google Scholar]
- Wang, C.L.; Smedby, Ö. Automatic Whole Heart Segmentation Using Deep Learning and Shape Context. Lect. Notes Comput. Sci. 2018, 10663, 242–249. [Google Scholar] [CrossRef]
- Yang, X.; Bian, C.; Yu, L.Q.; Ni, D.; Heng, P.A. 3D Convolutional Networks for Fully Automatic Fine-Grained Whole Heart Partition. Lect. Notes Comput. Sci. 2018, 10663, 181–189. [Google Scholar] [CrossRef]
- Ding, Y.; Mu, D.; Zhang, J.; Qin, Z.; You, L.; Qin, Z.; Guo, Y. A cascaded framework with cross-modality transfer learning for whole heart segmentation. Pattern Recognit. 2024, 147, 110088. [Google Scholar] [CrossRef]
- Isensee, F.; Petersen, J.; Klein, A.; Zimmerer, D.; Jaeger, P.F.; Kohl, S.; Wasserthal, J.; Koehler, G.; Norajitra, T.; Wirkert, S.; et al. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation. arXiv 2018, arXiv:1809.10486. [Google Scholar]
- Zhou, H.-Y.; Guo, J.; Zhang, Y.; Yu, L.; Wang, L.; Yu, Y. nnFormer: Interleaved Transformer for Volumetric Segmentation. arXiv 2022, arXiv:2109.03201. [Google Scholar]
- Zhao, Z.; Hu, J.; Zeng, Z.; Yang, X.; Qian, P.; Veeravalli, B.; Guan, C. MMGL: Multi-Scale Multi-View Global-Local Contrastive learning for Semi-supervised Cardiac Image Segmentation. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 401–405. [Google Scholar] [CrossRef]
- Huang, X.; Deng, Z.; Li, D.; Yuan, X. MISSFormer: An Effective Medical Image Segmentation Transformer. arXiv 2021, arXiv:2109.07162. [Google Scholar] [CrossRef] [PubMed]
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. arXiv 2020, arXiv:1912.05074. [Google Scholar] [CrossRef] [PubMed]
- Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. arXiv 2020, arXiv:2004.08790. [Google Scholar]
- Ibtehaz, N.; Rahman, M.S. MultiResUNet: Rethinking the U-Net Architecture for Multimodal Biomedical Image Segmentation. Neural Netw. 2020, 121, 74–87. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).