Thangka Element Semantic Segmentation with an Integrated Multi-Scale Attention Mechanism
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. The Introduction of Attention Mechanisms
3.2. Lightweight Processing of the HRNet Architecture
3.3. Establishment of the MSAC Attention Mechanism
4. Experimental Setup and Analysis
4.1. Dataset Description
4.1.1. Image Cropping
4.1.2. Data Augmentation and Preprocessing
4.2. Experimental Setup
4.3. Experimental Evaluation Criteria
4.4. Quantitative Evaluation of Algorithm Performance
4.4.1. Comparison with Baseline Models
4.4.2. Ablation Studies
4.5. Public Dataset Experimental Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jackson, D.P.; Jackson, J.A. Tibetan Thangka Painting: Methods and Materials; Snow Lion Publications: Ithaca, NY, USA, 1988.
- Meng, J.; Hu, W.; Jia, L.; He, G.; Xue, P. A semantic segmentation model for headdresses in Thangka image based on line drawing augmentation and spatial prior knowledge. IEEE Sens. J. 2021, 21, 25161–25170.
- Fu, W.; Liu, Z.; Cai, C.; Xue, Y.; Ren, J. Damage Diagnosis of Frame Structure Based on Convolutional Neural Network with SE-Res2Net Module. Appl. Sci. 2023, 13, 2545.
- Wang, X.; Zheng, Z.; He, Y.; Yan, F.; Zeng, Z.; Yang, Y. Progressive local filter pruning for image retrieval acceleration. IEEE Trans. Multimed. 2023, 25, 9597–9607.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 13713–13722.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Part III, pp. 234–241.
- Zhou, L.; Wang, R.; Zhang, L. Accurate Robot Arm Attitude Estimation Based on Multi-View Images and Super-Resolution Keypoint Detection Networks. Sensors 2024, 24, 305.
- Zhao, L.; Chen, Z. CRNet: Context feature and refined network for multi-person pose estimation. J. Intell. Syst. 2022, 31, 780–794.
- Liu, M. Multi-scale oriented object detection based on improved RoI Transformer in remote sensing images. J. Appl. Opt. 2023, 44, 13–16.
- Sung, C.; Kim, W.; An, J.; Lee, W.; Lim, H.; Myung, H. Contextrast: Contextual contrastive learning for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–18 June 2024; pp. 3732–3742.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Hou, Q.; Li, K.; Xu, Z. Optimization of ICANet lightweight human pose estimation based on HRNet. J. Phys. Conf. Ser. 2023, 2595, 012005.
- Shi, J.; Ruan, S.; Zhu, Z.; Zhao, M.; An, H.; Xue, X.; Yan, B. Predictive accuracy-based active learning for medical image segmentation. arXiv 2024, arXiv:2405.00452.
- Huang, H.; Lan, Y.; Deng, J.; Yang, A.; Deng, X.; Zhang, L.; Wen, S. A semantic labeling approach for accurate weed mapping of high resolution UAV imagery. Sensors 2018, 18, 2113.
- Xie, X.; Xu, L.; Li, X.; Wang, B.; Wan, T. A high-effective multitask surface defect detection method based on CBAM and atrous convolution. J. Adv. Mech. Des. Syst. Manuf. 2022, 16, JAMDSM0063.
- Wang, H.; Liu, J.; Tan, H.; Lou, J.; Liu, X.; Zhou, W.; Liu, H. Blind image quality assessment via adaptive graph attention. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 10299–10309.
- Ates, G.C.; Mohan, P.; Celik, E. Dual cross-attention for medical image segmentation. Eng. Appl. Artif. Intell. 2023, 126, 107139.
- Luo, S.; Pan, L.; Jian, Y.; Lu, Y.; Luo, S. CTBANet: Convolution transformers and bidirectional attention for medical image segmentation. Alex. Eng. J. 2024, 88, 133–143.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Shi, L.; Wang, G.; Mo, L.; Yi, X.; Wu, X.; Wu, P. Automatic segmentation of standing trees from forest images based on deep learning. Sensors 2022, 22, 6663.
- Wu, Y.; Wu, Y. Application of Split Coordinate Channel Attention Embedding U2Net in Salient Object Detection. Algorithms 2024, 17, 109.
- Yuan, Y.; Cui, J.; Liu, Y.; Wu, B. A Multi-Step Fusion Network for Semantic Segmentation of High-Resolution Aerial Images. Sensors 2023, 23, 5323.
- Yan, G.; Jing, H.; Li, H.; Guo, H.; He, S. Enhancing building segmentation in remote sensing images: Advanced multi-scale boundary refinement with MBR-HRNet. Remote Sens. 2023, 15, 3766.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4015–4026.
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 603–612.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
- Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 19529–19539.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11534–11542.
- Xu, W.; Wan, Y. ELA: Efficient local attention for deep convolutional neural networks. arXiv 2024, arXiv:2403.01123.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 1971–1980.
- MMSegmentation Contributors. MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. Available online: https://github.com/open-mmlab/mmsegmentation (accessed on 3 April 2024).
Category | Number |
---|---|
Alms Bowl | 37 |
Vajra | 40 |
Ghanta | 30 |
Flaming Sword | 31 |
Lotus | 46 |
Kapala | 56 |
Eight-Spoked Dharma Wheel | 41 |
Kartika | 33 |
Vishvavajra | 38 |
Victory Banner (Dhvaja) | 45 |
Sutra | 45 |
Models | mIoU (%)↑ | mB-Fscore (%)↑ | Time (s)↓ | Params (M)↓ | FPS (img/s)↑ |
---|---|---|---|---|---|
PIDNet-L [29] | 73.99 | 40.91 | 0.0225 | 37.311 | 58.88 |
Swin-Transformer [30] | 74.87 | 39.41 | 0.1008 | 58.948 | 12.12 |
CCNet [25] | 75.94 | 43.82 | 0.0815 | 66.470 | 13.20 |
DeepLabv3+ [26] | 75.04 | 40.81 | 0.1250 | 65.922 | 13.09 |
PSPNet [27] | 75.91 | 41.93 | 0.0740 | 65.890 | 14.81 |
HRNet [5] | 80.33 | 49.35 | 0.0642 | 64.657 | 14.13 |
SegFormer(B4) [28] | 82.69 | 53.82 | 0.1063 | 62.854 | 10.05 |
Ours | 84.83 | 56.87 | 0.0574 | 44.136 | 18.62 |
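The mIoU column above is a mean of per-class intersection-over-union scores. As a minimal sketch of that standard definition (not the authors' evaluation code, which is not shown here), the following computes per-class IoU and mIoU from a confusion matrix. The helper names and the toy label sequences are hypothetical and chosen only for illustration.

```python
def confusion_matrix(pred, target, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); None if class c never appears."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, ground truth otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm):
    """Average IoU over the classes that occur in prediction or ground truth."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid)


# Toy example: flattened label maps for a 6-pixel image with 3 classes (made up).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(pred, target, num_classes=3)
ious = per_class_iou(cm)
miou = mean_iou(cm)
print([round(v, 3) for v in ious], round(miou, 3))
```

In practice the confusion matrix would be accumulated over every pixel of every test image before the per-class ratios are taken, so that small images do not dominate the average.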
Model Variants | Attention Mechanism | MSAC | Lightweight HRNet-W48 | mIoU (%)↑ | mB-Fscore (%)↑ | Time (s)↓ |
---|---|---|---|---|---|---|
1 | - | - | - | 80.33 | 49.35 | 0.0642 |
2 | √ | - | - | 83.88 | 55.42 | 0.0697 |
3 | - | √ | - | 83.72 | 54.42 | 0.0672 |
4 | √ | √ | - | 84.22 | 54.51 | 0.0734 |
5 | √ | - | √ | 83.86 | 55.21 | 0.0516 |
6 | √ | √ | √ | 84.83 | 56.87 | 0.0574 |
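To make the ablation easier to read, the sketch below restates each variant as a change relative to variant 1 (the unmodified HRNet baseline). The numbers are copied from the table above; the tuple layout and the `deltas` helper are illustrative assumptions, not part of the original work.

```python
# (variant, attention, MSAC, lightweight, mIoU %, mB-Fscore %, time s), from the table.
ablation = [
    (1, False, False, False, 80.33, 49.35, 0.0642),
    (2, True, False, False, 83.88, 55.42, 0.0697),
    (3, False, True, False, 83.72, 54.42, 0.0672),
    (4, True, True, False, 84.22, 54.51, 0.0734),
    (5, True, False, True, 83.86, 55.21, 0.0516),
    (6, True, True, True, 84.83, 56.87, 0.0574),
]

baseline = ablation[0]


def deltas(row, base=baseline):
    """Gain in percentage points for the two scores, relative change for time."""
    return {
        "variant": row[0],
        "miou_gain": round(row[4] - base[4], 2),
        "bf_gain": round(row[5] - base[5], 2),
        "time_change_pct": round(100 * (row[6] - base[6]) / base[6], 1),
    }


report = [deltas(r) for r in ablation[1:]]
for entry in report:
    print(entry)
```

Read this way, the full configuration (variant 6) gains 4.50 points of mIoU and 7.52 points of mB-Fscore over the baseline while its per-image time drops by roughly 10.6%, whereas the variants without the lightweight backbone (2 to 4) are all slower than the baseline.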
Attention Mechanism | ECA [34] | ELA [35] | CBAM [22] | Ours (ATT) |
---|---|---|---|---|
mIoU (%) | 81.75 | 81.93 | 82.27 | 83.88 |
Label | HRNet mIoU (%) | Ours mIoU (%) |
---|---|---|
Background | 98.04 | 98.33 |
Central Deity | 93.75 | 94.49 |
Alms Bowl | 91.89 | 94.27 |
Vajra | 67.69 | 75.42 |
Ghanta | 70.45 | 78.01 |
Flaming Sword | 74.46 | 74.10 |
Lotus | 75.53 | 85.73 |
Kapala | 81.19 | 82.86 |
Eight-Spoked Dharma Wheel | 91.01 | 91.73 |
Kartika | 57.06 | 71.84 |
Vishvavajra | 77.41 | 81.60 |
Victory Banner (Dhvaja) | 83.72 | 88.38 |
Sutra | 82.12 | 86.01 |
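As a quick consistency check, the per-class scores in the table above can be averaged and compared with the overall mIoU values reported for HRNet (80.33) and the proposed model (84.83). The sketch below does this and also lists the per-class change; the values are copied from the table, while the variable names are illustrative.

```python
# label: (HRNet score %, Ours score %), copied from the per-class table.
per_class = {
    "Background": (98.04, 98.33),
    "Central Deity": (93.75, 94.49),
    "Alms Bowl": (91.89, 94.27),
    "Vajra": (67.69, 75.42),
    "Ghanta": (70.45, 78.01),
    "Flaming Sword": (74.46, 74.10),
    "Lotus": (75.53, 85.73),
    "Kapala": (81.19, 82.86),
    "Eight-Spoked Dharma Wheel": (91.01, 91.73),
    "Kartika": (57.06, 71.84),
    "Vishvavajra": (77.41, 81.60),
    "Victory Banner (Dhvaja)": (83.72, 88.38),
    "Sutra": (82.12, 86.01),
}

# Unweighted means over the 13 labels.
hrnet_miou = sum(h for h, _ in per_class.values()) / len(per_class)
ours_miou = sum(o for _, o in per_class.values()) / len(per_class)

# Per-class change in percentage points (Ours minus HRNet).
gains = {label: round(o - h, 2) for label, (h, o) in per_class.items()}
best = max(gains, key=gains.get)
regressions = [label for label, g in gains.items() if g < 0]

print(round(hrnet_miou, 2), round(ours_miou, 2))
print(best, gains[best], regressions)
```

The unweighted means round to 80.33 and 84.83, matching the overall figures. The largest gain is on Kartika (+14.78 points), and Flaming Sword is the only label where the proposed model scores slightly lower (-0.36 points).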
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, T.; Wu, J.; Guo, X.; Duan, T.; Wei, Y.; Wu, C. Thangka Element Semantic Segmentation with an Integrated Multi-Scale Attention Mechanism. Electronics 2025, 14, 2533. https://doi.org/10.3390/electronics14132533