Article

CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity

College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2025, 25(19), 6052; https://doi.org/10.3390/s25196052
Submission received: 29 August 2025 / Revised: 25 September 2025 / Accepted: 29 September 2025 / Published: 1 October 2025

Abstract

Transformer-based 2D-to-3D lifting methods have demonstrated outstanding performance in 3D human pose estimation from 2D pose sequences. However, they remain sensitive to low-quality 2D joint detections and incur substantial computational costs. In this paper, we propose CGFusionFormer to address these problems. We introduce a compact spatial representation (CSR) that robustly generates local spatial multi-hypothesis features from part of the 2D pose sequence. Specifically, CSR models spatial constraints based on body parts and incorporates 2D Gaussian filters and a nonparametric reduction to strengthen spatial features against low-quality 2D poses while lowering the computational cost of subsequent temporal encoding. We further design a residual-based Hybrid Adaptive Fusion module that combines the multi-hypothesis features with global frequency-domain features to estimate the 3D human pose accurately at minimal computational cost. We realize CGFusionFormer with a PoseFormer-like transformer backbone. Extensive experiments on the challenging Human3.6M and MPI-INF-3DHP benchmarks show that our method outperforms prior transformer-based variants under short receptive fields and achieves a superior accuracy–efficiency trade-off. On Human3.6M (sequence length 27, 3 input frames), it achieves a Mean Per Joint Position Error (MPJPE) of 47.6 mm at only 71.3 MFLOPs, a reduction of about 40% in computation compared with PoseFormerV2 while attaining better accuracy. On MPI-INF-3DHP (81-frame sequences), it reaches 97.9 Percentage of Correct Keypoints (PCK), 78.5 Area Under the Curve (AUC), and 27.2 mm MPJPE, matching the best PCK and achieving the lowest MPJPE among the compared methods under the same setting.
Keywords: 2D-to-3D lifting; human pose estimation; compact spatial representation; transformer
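
For readers less familiar with the metrics reported in the abstract, the sketch below shows how MPJPE and PCK are conventionally computed from predicted and ground-truth 3D joints. This is our illustration of the standard definitions, not code from the paper; the 150 mm threshold is the usual MPI-INF-3DHP convention, and AUC, the third metric, is the area under the PCK curve as the threshold sweeps from 0 to 150 mm.

import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per Joint Position Error in the units of the inputs (e.g., mm).

    pred, gt: (frames, joints, 3) predicted and ground-truth 3D joint
    positions, typically expressed relative to the root joint.
    """
    # Euclidean error per joint, averaged over all frames and joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred: np.ndarray, gt: np.ndarray, threshold_mm: float = 150.0) -> float:
    """Percentage of Correct Keypoints: share of joint predictions whose
    error falls below the threshold (150 mm on MPI-INF-3DHP)."""
    errors = np.linalg.norm(pred - gt, axis=-1)
    return float(100.0 * (errors < threshold_mm).mean())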

Share and Cite

MDPI and ACS Style

Lu, T.; Wang, H.; Xiao, D. CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity. Sensors 2025, 25, 6052. https://doi.org/10.3390/s25196052

AMA Style

Lu T, Wang H, Xiao D. CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity. Sensors. 2025; 25(19):6052. https://doi.org/10.3390/s25196052

Chicago/Turabian Style

Lu, Tao, Hongtao Wang, and Degui Xiao. 2025. "CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity" Sensors 25, no. 19: 6052. https://doi.org/10.3390/s25196052

APA Style

Lu, T., Wang, H., & Xiao, D. (2025). CGFusionFormer: Exploring Compact Spatial Representation for Robust 3D Human Pose Estimation with Low Computation Complexity. Sensors, 25(19), 6052. https://doi.org/10.3390/s25196052

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
