A Symmetric Multiscale Detail-Guided Attention Network for Cardiac MR Image Semantic Segmentation
Abstract
1. Introduction
- We propose a semantic segmentation architecture that effectively extracts multiscale context and detailed features.
- We introduce a detail-guided module in the encoder that extracts fine-grained detail and produces detail feature maps, which serve as detail guidance for the model.
- We introduce a series of multiscale upsampling blocks that utilize spatial and channel attention, a feature aggregation module, and image gradients to extract multiscale and multilevel context.
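To make the terms "channel attention" and "spatial attention" in the contributions above concrete, the following is a minimal, parameter-free sketch in plain Python. It is not the paper's upsampling attention block, whose layers are not described in this excerpt. The function names, the sigmoid gating, and the use of simple averages in place of learned layers are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Rescale each channel by a gate computed from its global average.

    feat is a list of C channels, each an H x W list of lists.
    Assumption: a real module would pass the pooled values through
    learned layers; here the gate is just sigmoid(mean).
    """
    out = []
    for ch in feat:
        count = sum(len(row) for row in ch)
        mean = sum(sum(row) for row in ch) / count  # global average pooling
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out


def spatial_attention(feat):
    """Rescale each pixel by a gate computed from the cross-channel mean."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


# Toy feature map (hypothetical values): 2 channels of size 2 x 2.
features = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
ch_out = channel_attention(features)  # one weight per channel
sp_out = spatial_attention(features)  # one weight per pixel location
```

Channel attention produces one weight per feature channel, while spatial attention produces one weight per pixel location. The two are complementary, which is presumably why the contribution above uses both.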
2. Related Work
2.1. Resolution Preservation
2.2. Context Embedding
2.3. Attention Aggregation
3. Proposed Method
3.1. CNN Architecture
3.2. Detail-Guided Module
3.3. Multiscale Upsampling Attention Block
3.4. Loss Function
4. Experiments
4.1. Implementation Details
4.2. Quantitative Results
4.3. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Lu, Z.; Li, J.; Liu, Z.; Cao, Q.; Tian, T.; Wang, X.; Huang, Z. Semi-Supervised Retinal Vessel Segmentation Based on Pseudo Label Filtering. Symmetry 2025, 17, 1462.
2. Zhu, Y.; Zhang, D.; Lin, Y.; Feng, Y.; Tang, J. Merging Context Clustering With Visual State Space Models for Medical Image Segmentation. IEEE Trans. Med. Imaging 2025, 44, 2131–2142.
3. Orbe-Trujillo, E.; Novillo, C.J.; Pérez-Ramírez, M.; Vazquez-Avila, J.L.; Pérez-Ramírez, A. Fast Treetops Counting Using Mathematical Image Symmetry, Segmentation, and Fast K-Means Classification Algorithms. Symmetry 2022, 14, 532.
4. Zhou, M.; Hu, H.M.; Zhang, Y. Region-based intra-frame rate-control scheme for high efficiency video coding. In Proceedings of the Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, Siem Reap, Cambodia, 9–12 December 2014; pp. 1–4.
5. Kheirkhah, F.M.; Mohammadi, H.R.S.; Shahverdi, A. Modified histogram-based segmentation and adaptive distance tracking of sperm cells image sequences. Comput. Methods Programs Biomed. 2018, 154, 173–182.
6. Long, J.; Feng, X.; Zhu, X.; Zhang, J.; Gou, G. Efficient superpixel-guided interactive image segmentation based on graph theory. Symmetry 2018, 10, 169.
7. Shen, W.; Zhou, M.; Luo, J.; Li, Z.; Kwong, S. Graph-represented distribution similarity index for full-reference image quality assessment. IEEE Trans. Image Process. 2024, 33, 3075–3089.
8. Dey, N.; Rajinikanth, V.; Ashour, A.S.; Tavares, J.M.R. Social group optimization supported segmentation and evaluation of skin melanoma images. Symmetry 2018, 10, 51.
9. Liu, R.; Liao, J.; Liu, X.; Liu, Y.; Chen, Y. LSRL-Net: A level set-guided re-learning network for semi-supervised cardiac and prostate segmentation. Biomed. Signal Process. Control 2025, 110, 108062.
10. Zhu, Z.; Hou, J.; Liu, H.; Zeng, H.; Hou, J. Learning efficient and effective trajectories for differential equation-based image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9150–9168.
11. Hasan, A.M.; Meziane, F.; Aspin, R.; Jalab, H.A. Segmentation of brain tumors in MRI images using three-dimensional active contour without edge. Symmetry 2016, 8, 132.
12. Pratondo, A.; Chui, C.K.; Ong, S.H. Robust edge-stop functions for edge-based active contour models in medical image segmentation. IEEE Signal Process. Lett. 2015, 23, 222–226.
13. Minaee, S.; Wang, Y. An ADMM approach to masked signal decomposition using subspace representation. IEEE Trans. Image Process. 2019, 28, 3192–3204.
14. Zhou, M.; Leng, H.; Fang, B.; Xiang, T.; Wei, X.; Jia, W. Low-light image enhancement via a frequency-based model with structure and texture decomposition. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 187.
15. Lang, S.; Liu, X.; Zhou, M.; Luo, J.; Pu, H.; Zhuang, X.; Wang, J.; Wei, X.; Zhang, T.; Feng, Y.; et al. A full-reference image quality assessment method via deep meta-learning and conformer. IEEE Trans. Broadcast. 2023, 70, 316–324.
16. Liao, X.; Wei, X.; Zhou, M.; Kwong, S. Full-reference image quality assessment: Addressing content misalignment issue by comparing order statistics of deep features. IEEE Trans. Broadcast. 2023, 70, 305–315.
17. Xian, W.; Zhou, M.; Fang, B.; Liao, X.; Ji, C.; Xiang, T.; Jia, W. Spatiotemporal feature hierarchy-based blind prediction of natural video quality via transfer learning. IEEE Trans. Broadcast. 2022, 69, 130–143.
18. Liao, X.; Wei, X.; Zhou, M.; Li, Z.; Kwong, S. Image quality assessment: Measuring perceptual degradation via distribution measures in deep feature spaces. IEEE Trans. Image Process. 2024, 33, 4044–4059.
19. Shen, W.; Zhou, M.; Chen, Y.; Wei, X.; Feng, Y.; Pu, H.; Jia, W. Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 17990–17999.
20. Zhou, Z.; Zhou, M.; Luo, J.; Pu, H.; U, L.H.; Wei, X.; Jia, W. VideoGNN: Video Representation Learning via Dynamic Graph Modelling. ACM Trans. Multimed. Comput. Commun. Appl. 2025.
21. Wei, X.; Li, J.; Zhou, M.; Wang, X. Contrastive distortion-level learning-based no-reference image-quality assessment. Int. J. Intell. Syst. 2022, 37, 8730–8746.
22. Xian, W.; Zhou, M.; Fang, B.; Kwong, S. A content-oriented no-reference perceptual video quality assessment method for computer graphics animation videos. Inf. Sci. 2022, 608, 1731–1746.
23. Zhou, M.; Wang, H.; Wei, X.; Feng, Y.; Luo, J.; Pu, H.; Zhao, J.; Wang, L.; Chu, Z.; Wang, X.; et al. HDIQA: A hyper debiasing framework for full reference image quality assessment. IEEE Trans. Broadcast. 2024, 70, 545–554.
24. Lan, X.; Xian, W.; Zhou, M.; Yan, J.; Wei, X.; Luo, J.; Jia, W.; Kwong, S. No-Reference Image Quality Assessment: Exploring Intrinsic Distortion Characteristics via Generative Noise Estimation with Mamba. IEEE Trans. Circuits Syst. Video Technol. 2025. Early Access.
25. Lang, S.; Zhou, M.; Wei, X.; Yan, J.; Feng, Y.; Jia, W. Image Quality Assessment: Exploring the Similarity of Deep Features via Covariance-Constrained Spectra. IEEE Trans. Broadcast. 2025. Early Access.
26. Gao, T.; Sheng, W.; Zhou, M.; Fang, B.; Zheng, L. MEMS inertial sensor fault diagnosis using a cnn-based data-driven method. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2059048.
27. Gao, T.; Sheng, W.; Zhou, M.; Fang, B.; Luo, F.; Li, J. Method for fault diagnosis of temperature-related mems inertial sensors by combining Hilbert–Huang transform and deep learning. Sensors 2020, 20, 5633.
28. Zhang, Y.; Hou, J.; Ren, S.; Wu, J.; Yuan, Y.; Shi, G. Self-supervised learning of lidar 3d point clouds via 2d-3d neural calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9201–9216.
29. Ren, S.; Hou, J.; Chen, X.; Xiong, H.; Wang, W. DDM: A Metric for Comparing 3D Shapes Using Directional Distance Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 6631–6646.
30. Zhang, Q.; Hou, J.; Qian, Y.; Zeng, Y.; Zhang, J.; He, Y. Flattening-net: Deep regular 2d representation for 3d point cloud analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9726–9742.
31. Zhou, M.; Wei, X.; Wang, S.; Kwong, S.; Fong, C.K.; Wong, P.H.; Yuen, W.Y. Global rate-distortion optimization-based rate control for HEVC HDR coding. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 4648–4662.
32. Zhou, M.; Wei, X.; Ji, C.; Xiang, T.; Fang, B. Optimum quality control algorithm for versatile video coding. IEEE Trans. Broadcast. 2022, 68, 582–593.
33. Wei, X.; Zhou, M.; Wang, H.; Yang, H.; Chen, L.; Kwong, S. Recent advances in rate control: From optimization to implementation and beyond. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 17–33.
34. Zhou, M.; Zhang, Y.; Li, B.; Hu, H.M. Complexity-based intra frame rate control by jointing inter-frame correlation for high efficiency video coding. J. Vis. Commun. Image Represent. 2017, 42, 46–64.
35. Zhou, M.; Zhang, Y.; Li, B.; Lin, X. Complexity correlation-based CTU-level rate control with direction selection for HEVC. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2017, 13, 53.
36. Wei, X.; Zhou, M.; Kwong, S.; Yuan, H.; Jia, W. A hybrid control scheme for 360-degree dynamic adaptive video streaming over mobile devices. IEEE Trans. Mob. Comput. 2021, 21, 3428–3442.
37. Wei, X.; Zhou, M.; Kwong, S.; Yuan, H.; Wang, S.; Zhu, G.; Cao, J. Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework. Inf. Sci. 2021, 569, 786–803.
38. Shen, Y.; Feng, Y.; Fang, B.; Zhou, M.; Kwong, S.; Qiang, B.h. DSRPH: Deep semantic-aware ranking preserving hashing for efficient multi-label image retrieval. Inf. Sci. 2020, 539, 145–156.
39. Zhao, L.; Shang, Z.; Tan, J.; Zhou, M.; Zhang, M.; Gu, D.; Zhang, T.; Tang, Y.Y. Siamese networks with an online reweighted example for imbalanced data learning. Pattern Recognit. 2022, 132, 108947.
40. Romaguera, L.V.; Romero, F.P.; Costa Filho, C.F.F.; Costa, M.G.F. Myocardial segmentation in cardiac magnetic resonance images using fully convolutional neural networks. Biomed. Signal Process. Control 2018, 44, 48–57.
41. Trinh, M.N.; Tran, T.T.; Pham, V.T.; Tran, T.T. A modified FCN-based method for Left Ventricle endocardium and epicardium segmentation with new block modules. In Proceedings of the 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), Hanoi, Vietnam, 21–22 December 2021; pp. 392–397.
42. Sun, J.; Darbehani, F.; Zaidi, M.; Wang, B. Saunet: Shape attentive u-net for interpretable medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; Springer: Cham, Switzerland, 2020; pp. 797–806.
43. Le, D.H.; Le, N.M.; Le, K.H.; Pham, V.T.; Tran, T.T. DR-Unet++: An Approach for Left Ventricle Segmentation from Magnetic Resonance Images. In Proceedings of the 2022 6th International Conference on Green Technology and Sustainable Development (GTSD), Nha Trang City, Vietnam, 29–30 July 2022; pp. 1048–1052.
44. Tran, T.T.; Tran, T.T.; Ninh, Q.C.; Bui, M.D.; Pham, V.T. Segmentation of left ventricle in short-axis MR images based on fully convolutional network and active contour model. In Computational Intelligence Methods for Green Technology and Sustainable Development, Proceedings of the International Conference GTSD2020 5, Da Nang City, Vietnam, 27–28 November 2020; Springer: Cham, Switzerland, 2021; pp. 49–59.
45. Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211.
46. Zhou, H.Y.; Guo, J.; Zhang, Y.; Han, X.; Yu, L.; Wang, L.; Yu, Y. nnFormer: Volumetric Medical Image Segmentation via a 3D Transformer. IEEE Trans. Image Process. 2023, 32, 4036–4045.
47. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2023; pp. 205–218.
48. Liu, J.; Yang, H.; Zhou, H.Y.; Xi, Y.; Yu, L.; Li, C.; Liang, Y.; Shi, G.; Yu, Y.; Zhang, S.; et al. Swin-umamba: Mamba-based unet with imagenet-based pretraining. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; Springer: Cham, Switzerland, 2024; pp. 615–625.
49. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
50. Noh, H.; Hong, S.; Han, B. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1520–1528.
51. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
52. Ghiasi, G.; Fowlkes, C.C. Laplacian pyramid reconstruction and refinement for semantic segmentation. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III 14. Springer: Cham, Switzerland, 2016; pp. 519–534.
53. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
54. Cheng, B.; Chen, L.C.; Wei, Y.; Zhu, Y.; Huang, Z.; Xiong, J.; Huang, T.S.; Hwu, W.M.; Shi, H. Spgnet: Semantic prediction guidance for scene parsing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5218–5228.
55. Cheng, B.; Xiao, B.; Wang, J.; Shi, H.; Huang, T.S.; Zhang, L. Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5386–5395.
56. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068.
57. Yan, J.; Zhang, B.; Zhou, M.; Kwok, H.F.; Siu, S.W. Multi-Branch-CNN: Classification of ion channel interacting peptides using multi-branch convolutional neural network. Comput. Biol. Med. 2022, 147, 105717.
58. Yan, J.; Zhang, B.; Zhou, M.; Campbell-Valois, F.X.; Siu, S.W. A deep learning method for predicting the minimum inhibitory concentration of antimicrobial peptides against Escherichia coli using Multi-Branch-CNN and Attention. mSystems 2023, 8, e00345-23.
59. Zhang, W.; Zhou, M.; Ji, C.; Sui, X.; Bai, J. Cross-frame transformer-based spatio-temporal video super-resolution. IEEE Trans. Broadcast. 2022, 68, 359–369.
60. Ding, H.; Jiang, X.; Shuai, B.; Liu, A.Q.; Wang, G. Context contrasted feature and gated multi-scale aggregation for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2393–2402.
61. He, J.; Deng, Z.; Qiao, Y. Dynamic multi-scale filters for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3562–3572.
62. He, J.; Deng, Z.; Zhou, L.; Wang, Y.; Qiao, Y. Adaptive pyramid context network for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7519–7528.
63. Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480.
64. Li, Z.; Peng, C.; Yu, G.; Zhang, X.; Deng, Y.; Sun, J. Detnet: Design backbone for object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 334–350.
65. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062.
66. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
67. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
68. Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. Denseaspp for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3684–3692.
69. Pan, H.; Hong, Y.; Sun, W.; Jia, Y. Deep Dual-Resolution Networks for Real-Time and Accurate Semantic Segmentation of Traffic Scenes. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3448–3460.
70. Zhao, H.; Zhang, Y.; Liu, S.; Shi, J.; Loy, C.C.; Lin, D.; Jia, J. Psanet: Point-wise spatial attention network for scene parsing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 267–283.
71. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
72. Jetley, S.; Lord, N.A.; Lee, N.; Torr, P.H. Learn to Pay Attention. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
73. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
74. Yuan, Y.; Huang, L.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. Ocnet: Object context network for scene parsing. arXiv 2018, arXiv:1809.00916.
75. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154.
76. Zhang, H.; Zhang, H.; Wang, C.; Xie, J. Co-occurrent features in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 548–557.
77. Huang, Z.; Wang, X.; Wei, Y.; Huang, L.; Shi, H.; Liu, W.; Huang, T.S. CCNet: Criss-Cross Attention for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6896–6908.
78. Zhou, M.; Han, S.; Luo, J.; Zhuang, X.; Mao, Q.; Li, Z. Transformer-Based and Structure-Aware Dual-Stream Network for Low-Light Image Enhancement. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 293.
79. Guo, Q.; Zhou, M. Progressive domain translation defogging network for real-world fog images. IEEE Trans. Broadcast. 2022, 68, 876–885.
80. Song, J.; Zhou, M.; Luo, J.; Pu, H.; Feng, Y.; Wei, X.; Jia, W. Boundary-aware feature fusion with dual-stream attention for remote sensing small object detection. IEEE Trans. Geosci. Remote Sens. 2024, 63, 5600213.
81. Zhou, M.; Li, Y.; Yang, G.; Wei, X.; Pu, H.; Luo, J.; Jia, W. COFNet: Contrastive Object-aware Fusion using Box-level Masks for Multispectral Object Detection. IEEE Trans. Multimed. 2025. Early Access.
82. Zheng, Z.; Zhou, M.; Shang, Z.; Wei, X.; Pu, H.; Luo, J.; Jia, W. GAANet: Graph Aggregation Alignment Feature Fusion for Multispectral Object Detection. IEEE Trans. Ind. Inform. 2025, 21, 8282–8292.
83. Zhou, M.; Zhao, X.; Luo, F.; Luo, J.; Pu, H.; Xiang, T. Robust rgb-t tracking via adaptive modality weight correlation filters and cross-modality learning. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 20, 95.
84. Li, Y.l.; Feng, Y.; Zhou, M.l.; Xiong, X.c.; Wang, Y.h.; Qiang, B.h. DMA-YOLO: Multi-scale object detection method with attention mechanism for aerial images. Vis. Comput. 2024, 40, 4505–4518.
85. Zhou, M.; Li, J.; Wei, X.; Luo, J.; Pu, H.; Wang, W.; He, J.; Shang, Z. AFES: Attention-Based Feature Excitation and Sorting for Action Recognition. IEEE Trans. Consum. Electron. 2025, 71, 5752–5760.
86. Cheng, S.; Song, J.; Zhou, M.; Wei, X.; Pu, H.; Luo, J.; Jia, W. Ef-detr: A lightweight transformer-based object detector with an encoder-free neck. IEEE Trans. Ind. Inform. 2024, 20, 12994–13002.
87. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
88. Radau, P.; Lu, Y.; Connelly, K.; Paul, G.; Dick, A.J.; Wright, G.A. Evaluation framework for algorithms segmenting short axis cardiac MRI. MIDAS J. 2009.



| Method | Endocardium DSC | Epicardium DSC |
|---|---|---|
| Romaguera et al. [40] | 0.9200 | 0.9300 |
| Trinh et al. [41] | 0.9196 | 0.9505 |
| SAUNet [42] | 0.9520 | 0.9620 |
| DR-Unet++ [43] | 0.9131 | 0.9315 |
| FCN-ACM [44] | 0.9400 | 0.9600 |
| nnU-Net [45] | 0.9527 | 0.9648 |
| nnFormer [46] | 0.9572 | 0.9695 |
| Swin-Unet [47] | 0.9548 | 0.9639 |
| Swin-UMamba [48] | 0.9531 | 0.9636 |
| Ours | 0.9586 | 0.9718 |

| Geometric Metric | End-Diastole Endocardium | End-Diastole Epicardium | End-Systole Endocardium | End-Systole Epicardium |
|---|---|---|---|---|
| Pixel Accuracy | 0.9606 | 0.9781 | 0.9612 | - |
| Jaccard | 0.9198 | 0.9451 | 0.9212 | - |
| DSC | 0.9582 | 0.9718 | 0.9590 | - |
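
The three geometric metrics reported above can be computed from a predicted mask and a ground-truth mask. The following is a minimal sketch assuming the standard binary-mask definitions. The paper's exact evaluation protocol (for example, how scores are averaged over slices) is not given in this excerpt, and the function names and toy masks are illustrative only.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two binary masks given as lists of rows."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def pixel_accuracy(pred, target):
    tp, fp, fn, tn = confusion_counts(pred, target)
    return (tp + tn) / (tp + fp + fn + tn)


def jaccard(pred, target):
    tp, fp, fn, _ = confusion_counts(pred, target)
    union = tp + fp + fn
    return tp / union if union else 1.0  # two empty masks agree perfectly


def dice(pred, target):
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 2 x 3 masks (hypothetical data): TP = 2, FP = 1, FN = 1, TN = 2.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
pa = pixel_accuracy(pred, target)  # 4/6
jac = jaccard(pred, target)        # 2/4 = 0.5
dsc = dice(pred, target)           # 4/6
```

For a single pair of masks the two overlap scores are linked by DSC = 2J / (1 + J). As a consistency check, J = 0.9198 gives a DSC of about 0.9582, which agrees with the End-Diastole endocardium column. Scores averaged over many images need not satisfy the identity exactly.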

| Method | Endocardium Jaccard | Endocardium DSC | Epicardium Jaccard | Epicardium DSC |
|---|---|---|---|---|
| Baseline | 0.8768 | 0.9344 | 0.8981 | 0.9463 |
| Baseline + DG | 0.8883 | 0.9408 | 0.9132 | 0.9546 |

| Method | Endocardium Jaccard | Endocardium DSC | Epicardium Jaccard | Epicardium DSC |
|---|---|---|---|---|
| Baseline | 0.8768 | 0.9344 | 0.8981 | 0.9463 |
| Baseline + DG + PPM | 0.8898 | 0.9417 | 0.9161 | 0.9562 |
| Baseline + DG + ASPP | 0.8922 | 0.9430 | 0.9190 | 0.9578 |

| Method | Endocardium Jaccard | Endocardium DSC | Epicardium Jaccard | Epicardium DSC |
|---|---|---|---|---|
| Baseline | 0.8768 | 0.9344 | 0.8981 | 0.9463 |
| Baseline + DG + ASPP + UA | 0.9113 | 0.9536 | 0.9367 | 0.9673 |

| Method | Endocardium Jaccard | Endocardium DSC | Epicardium Jaccard | Epicardium DSC |
|---|---|---|---|---|
| Baseline | 0.8768 | 0.9344 | 0.8981 | 0.9463 |
| Baseline + DG + ASPP + UA + Sobel | 0.9155 | 0.9559 | 0.9406 | 0.9694 |
| Baseline + DG + ASPP + UA + Canny | 0.9205 | 0.9586 | 0.9451 | 0.9718 |
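
The last ablation compares two sources of image-gradient information, Sobel and Canny. As a rough illustration of what a Sobel gradient map is, the sketch below computes the gradient magnitude of a grayscale image in plain Python. It is not the paper's pipeline: the function name, the zero-valued border handling, and the toy image are assumptions.

```python
import math

# Standard 3 x 3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the gradient magnitude of a grayscale image (list of rows).

    Assumption: border pixels are left at 0 instead of being padded.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0.0
            for a in range(3):
                for b in range(3):
                    v = img[i - 1 + a][j - 1 + b]
                    gx += SOBEL_X[a][b] * v
                    gy += SOBEL_Y[a][b] * v
            out[i][j] = math.hypot(gx, gy)
    return out


# Toy image (hypothetical): a vertical step edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_magnitude(image)
```

On this toy image the response is 4.0 at the two interior columns next to the step and 0.0 in the flat region to the right. A Canny detector starts from a similar gradient computation and then adds smoothing, non-maximum suppression, and hysteresis thresholding, so it yields thin binary edges where Sobel yields a continuous magnitude map.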
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, H.; Fang, B.; Duo, B.; Wei, X.; Yan, J.; Xian, W.; Li, D. A Symmetric Multiscale Detail-Guided Attention Network for Cardiac MR Image Semantic Segmentation. Symmetry 2025, 17, 1807. https://doi.org/10.3390/sym17111807

