Human Pose Intelligent Detection Algorithm Based on Spatiotemporal Hybrid Dilated Convolution Model
Abstract
1. Introduction
2. Background
2.1. Main Problems
2.2. Preparation of Knowledge
2.2.1. Model-Based 3D Human Pose Estimation Method
2.2.2. Mixed Dilated Convolution
3. Spatiotemporal Mixed Dilated Convolution Model for Human Pose Intelligent Detection Algorithm
3.1. System Model
3.2. Multi-Head Self-Attention Layer
3.3. Time-Domain Hybrid Dilated Convolutional Network
4. Experimental Results and Data Analysis
4.1. Experimental Setup
4.2. Comparison with Existing Technical Methods
4.3. Implementation of Visual 3D HPE on the Jetson TX2 Platform
4.4. Related Tests
4.4.1. Testing and Analysis of TensorRT Acceleration Effect
4.4.2. Ablation Experiment and Analysis of MSA Layer
4.4.3. Ablation Study and Analysis on Receptive Field Frames
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References

| Method | MPJPE (mm) | Method | MPJPE (mm) |
|---|---|---|---|
| Martinez et al. | 62.9 | Tekin et al. | 69.7 |
| Fang et al. | 60.4 | Yang et al. | 58.6 |
| GraphSH | 51.9 | ST-GCN | 48.8 |
| VPose | 46.8 | MGCN | 49.4 |
| DPoser-X | 39.6 | MHFormer | 43.0 |
| MixSTE | 41.5 | PoseFormer | 42.2 |
| P-STMO | 38.7 | STC-Former | 40.2 |
| SimBa | 37.8 | TSHDC (ours) | 48.4 |

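
The MPJPE values in the comparison above are mean per-joint position errors in millimetres. As a minimal sketch of how this metric is conventionally computed (the function name `mpjpe` and the toy coordinates are illustrative assumptions, not the authors' evaluation code, and the alignment protocol used in the paper is not stated here):

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, gt: equally long lists of (x, y, z) joint coordinates in millimetres.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Toy two-joint skeleton (hypothetical coordinates, not data from the paper).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error_mm = mpjpe(pred, gt)  # per-joint distances are 0 and 5, so the mean is 2.5
```

A lower value means the predicted joints lie closer to the ground truth on average.
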
| Precision Type | Inference Speed (FPS) | MPJPE (mm) |
|---|---|---|
| No acceleration | 20.8 | 49.4 |
| FP32 | 27.5 | 49.5 |
| FP16 | 36.1 | 49.9 |
| INT8 | 46.4 | 51.9 |

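
The speed and accuracy trade-off in the precision table above is easier to read as a ratio against the unaccelerated baseline. The following sketch only re-derives numbers from the table rows; the helper name `tradeoff` is a hypothetical choice made for illustration.

```python
# Rows copied from the table above: (inference speed in FPS, MPJPE in mm).
results = {
    "No acceleration": (20.8, 49.4),
    "FP32": (27.5, 49.5),
    "FP16": (36.1, 49.9),
    "INT8": (46.4, 51.9),
}

def tradeoff(results, baseline="No acceleration"):
    """Return (speedup factor, added MPJPE in mm) for each precision mode."""
    base_fps, base_err = results[baseline]
    return {
        name: (round(fps / base_fps, 2), round(err - base_err, 1))
        for name, (fps, err) in results.items()
    }

summary = tradeoff(results)
# summary["FP32"] == (1.32, 0.1)
# summary["FP16"] == (1.74, 0.5)
# summary["INT8"] == (2.23, 2.5)
```

Read this way, FP16 gives roughly a 1.7x speedup for 0.5 mm of added error, while INT8 reaches about 2.2x at a cost of 2.5 mm.
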
| Dilation Strategy | MPJPE (mm) | Time (ms) | RF (Frames) |
|---|---|---|---|
| Standard dilated | 53.7 | 8.6 | 21 |
| Hybrid (HDC) | 49.4 | 6.2 | 27 |

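
The RF column above counts the input frames spanned by the stacked temporal convolutions. For stride-1 dilated 1D convolutions with kernel size k and dilation rates d_i, the receptive field is 1 + Σ (k − 1)·d_i. The sketch below applies that formula and also counts which frame offsets are actually sampled, which is the "gridding" gap that hybrid dilation schemes aim to avoid. The dilation rates used here are hypothetical values chosen only because they reproduce spans of 27 and 21 frames; the rates used in the paper are not given in this section.

```python
from itertools import product

def receptive_field(kernel_size, dilations):
    """Span in frames of stacked stride-1 dilated 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def covered_offsets(kernel_size, dilations):
    """Set of input frame offsets that contribute to one output frame."""
    half = kernel_size // 2  # assumes an odd, centred kernel
    taps = range(-half, half + 1)
    return {
        sum(t * d for t, d in zip(combo, dilations))
        for combo in product(taps, repeat=len(dilations))
    }

# Hypothetical rate sets, not taken from the paper.
dense = (1, 3, 9)    # spans 27 frames and samples every one of them
gridded = (2, 4, 4)  # spans 21 frames but samples only even offsets

rf_dense = receptive_field(3, dense)            # 27
hit_dense = len(covered_offsets(3, dense))      # 27
rf_gridded = receptive_field(3, gridded)        # 21
hit_gridded = len(covered_offsets(3, gridded))  # 11
```

When the number of sampled offsets is smaller than the receptive field, some frames inside the window never influence the output, even though the nominal span looks large.
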
| Frames | MPJPE (mm) | Time (ms) | Memory (MB) |
|---|---|---|---|
| 9 | 55.2 | 4.1 | 642 |
| 27 | 49.4 | 6.2 | 902 |
| 81 | 48.6 | 18.7 | 1568 |
| 243 | 48.3 | 56.2 | 4728 |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, L.; Dai, S.; She, L.; Huo, S. Human Pose Intelligent Detection Algorithm Based on Spatiotemporal Hybrid Dilated Convolution Model. Electronics 2025, 14, 4798. https://doi.org/10.3390/electronics14244798
