MythPose: Enhanced Detection of Complex Poses in Thangka Figures
Abstract
1. Introduction
2. Related Work
2.1. Pose Estimation
2.2. YOLOv11-Pose
2.3. Mamba
3. Methods
3.1. Advanced Scanning and Synthesis Block (ASSB)
3.2. Fusion Module
3.3. Loss Function
4. Experiments
4.1. Dataset
4.2. Experimental Setup
4.3. Processing Analysis
4.4. Comparative Experiment
4.5. Ablation Study
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
ASSB | Advanced Scanning and Synthesis Block
ASS | Advanced Scanning and Synthesis
DPA | Dual-Pool Attention
DW Conv | Depthwise Convolution
GAP | Global Average Pooling
GAN | Generative Adversarial Network
GMP | Global Max Pooling
HRNet | High-Resolution Network
KAL | Keypoint Association Loss
SAA | Spatial Axis Attention
SSM | Selective State Space Model
UASM | Unified Attention Synergy Module
References
1. Hu, W.; Ye, Y.; Zeng, F.; Meng, J. A new method of Thangka image inpainting quality assessment. J. Vis. Commun. Image Represent. 2019, 59, 292–299.
2. Ma, Y.; Liu, Y.; Xie, Q.; Xiong, W.; Bai, L.; Hu, A. A Tibetan Thangka data set and relative tasks. Image Vis. Comput. 2021, 108, 104125.
3. Xian, Y.; Shen, T.; Xiang, Y.; Danzeng, P.; Lee, Y. Region-Aware Style Transfer Between Thangka Images via Combined Segmentation and Adaptive Style Fusion. In Proceedings of the 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Compiegne, France, 5–7 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1788–1793.
4. Wang, H.; Hu, J.; Xue, R.; Liu, Y.; Pan, G. Thangka image segmentation method based on enhanced receptive field. IEEE Access 2022, 10, 89687–89695.
5. Shen, J.; Liu, N.; Sun, H.; Li, D.; Zhang, Y.; Han, L. An algorithm based on lightweight semantic features for ancient mural element object detection. NPJ Herit. Sci. 2025, 13, 70.
6. Xian, Y.; Lee, Y.; Shen, T.; Lan, P.; Zhao, Q.; Yan, L. Enhanced Object Detection in Thangka Images Using Gabor, Wavelet, and Color Feature Fusion. Sensors 2025, 25, 3565.
7. Li, Y.; Liu, X. Sketch based Thangka image retrieval. In Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 March 2021; pp. 2066–2070.
8. Xian, Y.; Xiang, Y.; Yang, X.; Zhao, Q.; Cairang, X. Thangka school image retrieval based on multi-attribute features. NPJ Herit. Sci. 2025, 13, 1–14.
9. Yang, Y.; Yang, Y.; Danzeng, X.; Zhao, Q.; Danzeng, P.; Li, X. Learning multi-granularity features for re-identifying figures in portrait Thangka images. In Proceedings of the International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022.
10. Yang, Y.; Fan, F. Ancient Thangka Buddha face recognition based on the Dlib machine learning library and comparison with secular aesthetics. Herit. Sci. 2023, 11, 137.
11. Hsieh, T.; Zhao, Q.; Pan, F.; Danzeng, P.; Gao, D.; Dorji, G. Text and Edge Guided Thangka Image Inpainting with Diffusion Model. In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15–19 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10.
12. Toshev, A.; Szegedy, C. DeepPose: Human Pose Estimation via Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 1653–1660.
13. Luvizon, D.C.; Tabia, H.; Picard, D. Human pose regression by combining indirect part detection and contextual information. Comput. Graph. 2019, 85, 15–22.
14. Mao, W.; Ge, Y.; Shen, C.; Tian, Z.; Wang, X.; Wang, Z. TFPose: Direct Human Pose Estimation with Transformers. arXiv 2021.
15. Li, K.; Wang, S.; Zhang, X.; Xu, Y.; Xu, W.; Tu, Z. Pose Recognition With Cascade Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 1944–1953.
16. Mao, W.; Ge, Y.; Shen, C.; Tian, Z.; Wang, X.; Wang, Z.; van den Hengel, A. Poseur: Direct Human Pose Regression with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
17. Sun, K.; Lan, C.; Xing, J.; Zeng, W.; Liu, D.; Wang, J. Human Pose Estimation Using Global and Local Normalization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5599–5607.
18. Marras, I.; Palasek, P.; Patras, I. Deep Globally Constrained MRFs for Human Pose Estimation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3466–3475.
19. Ke, L.; Chang, M.C.; Qi, H.; Lyu, S. Multi-Scale Structure-Aware Network for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 713–728.
20. Tang, W.; Yu, P.; Wu, Y. Deeply Learned Compositional Models for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 190–206.
21. Tang, W.; Wu, Y. Does Learning Specific Features for Related Parts Help Human Pose Estimation? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1107–1116.
22. Li, Y.; Yang, S.; Liu, P.; Zhang, S.; Wang, Y.; Wang, Z.; Yang, W.; Xia, S.T. SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; Volume 13666, pp. 89–106.
23. Li, J.; Bian, S.; Zeng, A.; Wang, C.; Pang, B.; Liu, W.; Lu, C. Human Pose Regression with Residual Log-Likelihood Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 11025–11034.
24. Ye, S.; Zhang, Y.; Hu, J.; Cao, L.; Zhang, S.; Shen, L.; Wang, J.; Ding, S.; Ji, R. DistilPose: Tokenized Pose Regression with Heatmap Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 2163–2172.
25. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
26. Su, K.; Yu, D.; Xu, Z.; Geng, X.; Wang, C. Multi-person Pose Estimation with Enhanced Channel-wise and Spatial Information. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5674–5682.
27. Yang, S.; Quan, Z.; Nie, M.; Yang, W. TransPose: Keypoint Localization via Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 11802–11812.
28. Li, Y.; Zhang, S.; Wang, Z.; Yang, S.; Yang, W.; Xia, S.T.; Zhou, E. TokenPose: Learning Keypoint Tokens for Human Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 11313–11322.
29. Yuan, Y.H.; Fu, R.; Huang, L.; Lin, W.H.; Zhang, C.; Chen, X.L.; Wang, J.D. HRFormer: High-Resolution Transformer for Dense Prediction. In Proceedings of the Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS), Online, 7 December 2021.
30. Xu, Y.; Zhang, J.; Zhang, Q.; Tao, D. ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation. Adv. Neural Inf. Process. Syst. (NeurIPS) 2022, 35, 38571–38584.
31. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
32. Kreiss, S.; Bertoni, L.; Alahi, A. PifPaf: Composite Fields for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11977–11986.
33. Li, J.; Wang, C.; Zhu, H.; Mao, Y.; Fang, H.S.; Lu, C. CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
34. Cheng, B.; Xiao, B.; Wang, J.; Shi, S.; Huang, T.S.; Zhang, L. HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
35. Geng, X.; Xiao, Y.; Li, H. Bottom-up human pose estimation via disentangled keypoint regression. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
36. Li, J.; Wang, Y.; Zhang, S. PolarPose: Single-Stage Multi-Person Pose Estimation in Polar Coordinates. IEEE Trans. Image Process. 2023, 32, 1108–1119.
37. Tian, Z.; Chen, H.; Shen, C. DirectPose: Direct End-to-End Multi-Person Pose Estimation. arXiv 2019, arXiv:1911.07451.
38. Shi, D.; Wei, X.; Yu, X.; Tan, W.; Ren, Y.; Pu, S. InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation. In Proceedings of the ACM International Conference on Multimedia (MM), Gold Coast, Australia, 1–3 December 2021; pp. 3079–3087.
39. Shi, D.; Wei, X.; Li, L.; Ren, Y.; Tan, W. End-to-End Multi-Person Pose Estimation with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
40. Miao, H.; Lin, J.; Cao, J.; He, X.; Su, Z.; Liu, R. SMPR: Single-stage multi-person pose regression. Pattern Recognit. 2023, 143, 109743.
41. Yang, J.; Zeng, A.; Liu, S.; Li, F.; Zhang, R.; Zhang, L. Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
42. Liu, H.; Chen, Q.; Tan, Z.; Liu, J.J.; Wang, J.; Su, X.; Li, X.; Yao, K.; Han, J.; Ding, E.; et al. Group Pose: A Simple Baseline for End-to-End Multi-Person Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 15029–15038.
43. Xu, L.; Jin, S.; Zeng, W.; Liu, W.; Qian, C.; Ouyang, W.; Luo, P.; Wang, X. Pose for Everything: Towards Category-Agnostic Pose Estimation. Comput. Vis. ECCV 2022, 13666, 123–139.
44. Shi, M.; Huang, Z.; Ma, X.; Hu, X.; Cao, Z. Matching is not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7308–7317.
45. Wang, H.; Han, K.; Guo, J.; Tang, Y. Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation. arXiv 2023, arXiv:2303.08912.
46. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023.
47. Tibetan Fine Arts. The Art of Painting. Thangka—Chamdo Volume (Tibetan-Chinese Version); Sichuan Minzu Publishing House: Chengdu, China, 2018.
48. China Thangka Culture Research Centre, Kham Kelsang Yixi. Chinese Thangka; Heritage Publishing House: Beijing, China, 2015.
49. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020.
50. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032.
51. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232.
52. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 13–15 May 2010; pp. 249–256.
53. Yang, Z.; Zeng, A.; Yuan, C.; Li, Y. Effective whole-body pose estimation with two-stages distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023; pp. 4210–4220.
54. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186.
55. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022.
56. Fang, H.S.; Li, J.; Tang, H.; Xu, C.; Zhu, H.; Xiu, Y.; Li, Y.L.; Lu, C. AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 7157–7173.
57. Xu, Y.; Zhang, J.; Zhang, Q.; Tao, D. ViTPose++: Vision transformer for generic body pose estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 1212–1230.
58. Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024.
59. Ju, X.; Zeng, A.; Wang, J.; Xu, Q.; Zhang, L. Human-Art: A versatile human-centric dataset bridging natural and artificial scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 618–629.
Model | mAP@0.5 | mAP@0.75 | mAP@0.5:0.95 | PCK@0.1 | OKS |
---|---|---|---|---|---|
DWPose [53] | 78.62% | 65.34% | 57.42% | 85.23% | 81.14% |
OpenPose [54] | 80.13% | 67.85% | 59.31% | 86.71% | 82.39% |
RTMDet [55] | 81.57% | 68.92% | 61.15% | 88.43% | 83.72% |
AlphaPose [56] | 82.02% | 69.10% | 61.00% | 88.72% | 84.02% |
TokenPose [28] | 81.89% | 69.85% | 61.58% | 89.01% | 83.87% |
ViTPose++ [57] | 82.31% | 69.97% | 62.22% | 88.94% | 84.26% |
YOLOv11-Pose [58] | 83.38% | 71.24% | 63.78% | 89.32% | 85.04% |
MythPose | 89.13% | 73.54% | 66.07% | 92.51% | 87.22% |
Model | mAP@0.5 | mAP@0.75 | mAP@0.5:0.95 | PCK@0.1 | OKS |
---|---|---|---|---|---|
DWPose [53] | 76.12% | 62.48% | 54.05% | 82.93% | 78.24% |
OpenPose [54] | 77.51% | 64.02% | 55.88% | 84.41% | 79.85% |
RTMDet [55] | 78.63% | 65.34% | 57.11% | 85.97% | 81.13% |
AlphaPose [56] | 78.22% | 64.91% | 56.35% | 85.41% | 80.52% |
TokenPose [28] | 78.75% | 65.02% | 56.60% | 85.47% | 81.08% |
ViTPose++ [57] | 78.81% | 65.17% | 56.73% | 85.67% | 81.34% |
YOLOv11-Pose [58] | 79.82% | 66.10% | 57.94% | 86.34% | 81.92% |
MythPose | 86.10% | 69.68% | 61.45% | 89.32% | 84.02% |
Model | mAP@0.5 | mAP@0.75 | mAP@0.5:0.95 | PCK@0.1 | OKS |
---|---|---|---|---|---|
DWPose [53] | 87.03% | 67.29% | 53.12% | 91.19% | 87.11% |
OpenPose [54] | 86.55% | 65.48% | 51.46% | 89.24% | 85.57% |
RTMDet [55] | 90.48% | 71.22% | 55.27% | 92.12% | 89.51% |
AlphaPose [56] | 87.54% | 66.46% | 52.08% | 90.13% | 86.77% |
TokenPose [28] | 92.34% | 74.57% | 59.39% | 92.49% | 90.53% |
ViTPose++ [57] | 92.67% | 75.53% | 60.44% | 93.32% | 91.02% |
YOLOv11-Pose [58] | 92.25% | 75.49% | 59.07% | 92.54% | 89.57% |
MythPose | 94.48% | 78.23% | 63.71% | 95.83% | 92.51% |
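The metric columns in the tables above (OKS and PCK@0.1) follow the standard pose-estimation definitions: OKS is a keypoint-wise Gaussian similarity normalized by object scale, and PCK@0.1 is the fraction of keypoints within 0.1 of a reference length. A minimal NumPy sketch of these two metrics is shown below; the function names and the per-keypoint falloff constants `kappa` are illustrative, not taken from the paper.

```python
import numpy as np

def oks(pred, gt, visible, area, kappa):
    """COCO-style Object Keypoint Similarity, averaged over visible keypoints.

    pred, gt : (K, 2) predicted / ground-truth keypoint coordinates
    visible  : (K,) boolean mask of labelled keypoints
    area     : object scale (e.g. bounding-box area)
    kappa    : (K,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)       # squared pixel distances
    e = d2 / (2.0 * area * kappa ** 2 + 1e-12)  # scale-normalized error
    return float(np.exp(-e)[visible].mean())

def pck(pred, gt, visible, ref_len, alpha=0.1):
    """PCK@alpha: share of visible keypoints within alpha * ref_len of truth."""
    d = np.linalg.norm(pred - gt, axis=1)
    return float((d[visible] <= alpha * ref_len).mean())
```

A perfect prediction yields OKS = PCK = 1.0, and both metrics decay as predicted keypoints drift from the ground truth, with OKS penalizing small objects more because of the `area` normalization.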
ASSB | Feature Fusion | KAL | Precision (%) | Recall (%) | mAP@0.5 (%) |
---|---|---|---|---|---|
- | - | - | 83.42 | 81.23 | 83.38 |
✓ | - | - | 84.53 | 82.14 | 86.21 |
- | ✓ | - | 84.87 | 82.47 | 85.52 |
- | - | ✓ | 85.04 | 82.73 | 85.83 |
✓ | ✓ | - | 85.32 | 83.06 | 87.13 |
✓ | - | ✓ | 85.65 | 83.34 | 87.42 |
- | ✓ | ✓ | 85.36 | 82.51 | 86.72 |
✓ | ✓ | ✓ | 86.07 | 84.22 | 89.13 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xian, Y.; Shen, T.; Lee, Y.; Lan, P.; Zhao, Q.; Yan, L. MythPose: Enhanced Detection of Complex Poses in Thangka Figures. Sensors 2025, 25, 4983. https://doi.org/10.3390/s25164983