Temporal Probability-Guided Graph Topology Learning for Robust 3D Human Mesh Reconstruction
Abstract
1. Introduction
1. We pioneer the integration of temporal probability alignment with graph topology learning for 3D human pose and shape estimation. Our framework combines probabilistic feature modeling with anatomical prior knowledge encoded in graph structures, yielding improved reconstruction accuracy.
2. We introduce a Graph Topological Modeling (GTM) module that learns the inherent connectivity patterns of the human body through graph convolutional networks. GTM encodes per-frame latent representations that capture the topological organization of the body. We also propose a Hierarchical Human Loss (HHLoss) that progressively computes probability distribution errors across hierarchical levels of body part decomposition, enabling fine-grained supervision during training.
3. Comprehensive experiments demonstrate the efficacy of our approach for 3D human mesh reconstruction, with particularly strong performance in challenging scenarios featuring occlusions and motion blur. Among video-based methods on the 3DPW [27] benchmark, our method attains the lowest MPVPE and PA-MPJPE, while 4DHumans retains a lower MPJPE.
4. This paper extends our preliminary arXiv preprint [28] with detailed implementation specifications, a computational efficiency analysis against 4DHumans, additional qualitative results, and a discussion of future research directions.
2. Methods
2.1. Overview
2.2. Graph Topological Modeling (GTM)
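GTM is described as learning body connectivity through graph convolutional networks. As a rough illustration of that building block only, the sketch below applies one generic graph convolution (symmetric-normalized adjacency with self-loops, in the style of the standard GCN formulation) to a toy chain graph. It is not the authors' GTM module: the 5-node chain, the feature sizes, and the names `normalized_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    adj = np.eye(num_nodes)              # self-loops (the "+ I" term)
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    deg = adj.sum(axis=1)                # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(features, adj_norm, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

# Hypothetical 5-node chain standing in for a short kinematic chain.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj_norm = normalized_adjacency(edges, num_nodes=5)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))       # 8 input features per node (assumed)
weight = rng.normal(size=(8, 4))         # projects to 4 output features (assumed)
out = gcn_layer(features, adj_norm, weight)
print(out.shape)                         # (5, 4)
```

Each output row mixes a node's features with those of its graph neighbours, which is the mechanism by which a skeleton or mesh graph can inject anatomical structure into per-frame features.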
2.3. Temporally-Alignable Probability Distribution (TPDist)
2.4. Hierarchical Human Loss
- Level 1 (Coarse): Full Body. The entire 6890-vertex mesh is treated as a single entity to capture global pose consistency. This level ensures overall body structure alignment and prevents catastrophic misalignment of major body segments.
- Level 2 (Intermediate): Major Body Parts. The mesh is partitioned into six anatomical regions based on the SMPL skeleton kinematic tree: (1) head and neck (vertices 0–554), (2) torso (vertices 555–1946), (3) left arm (vertices 1947–3118), (4) right arm (vertices 3119–4290), (5) left leg (vertices 4291–5540), and (6) right leg (vertices 5541–6889). This decomposition follows the natural articulation boundaries of the human body, with vertex assignments determined by geodesic distance to skeleton joints.
- Level 3 (Fine): Joint-level Regions. Each major body part is further subdivided into finer regions centered on key joints (e.g., shoulder, elbow, and wrist for the arms). This produces 24 joint-centric regions, corresponding to the SMPL skeleton’s 24 joints. For each joint j, vertices within a geodesic radius of three edges are assigned to that joint’s region, enabling localized supervision of critical articulation points.
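The first two levels of this decomposition can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' HHLoss: it uses a plain per-vertex Euclidean error as a stand-in for the probability distribution error, the level weights are hypothetical placeholders, and Level 3 is omitted because it requires mesh connectivity to compute geodesic neighbourhoods. Only the vertex ranges are taken from the text.

```python
import numpy as np

NUM_VERTICES = 6890

# Level-2 regions as inclusive vertex ranges, copied from the description above.
LEVEL2_RANGES = {
    "head_neck": (0, 554),
    "torso": (555, 1946),
    "left_arm": (1947, 3118),
    "right_arm": (3119, 4290),
    "left_leg": (4291, 5540),
    "right_leg": (5541, 6889),
}

def per_vertex_error(pred, gt):
    """Euclidean distance per vertex; a stand-in for the paper's error term."""
    return np.linalg.norm(pred - gt, axis=-1)

def hierarchical_loss(pred, gt, level_weights=(1.0, 1.0)):
    """Weighted sum of a full-body term and a mean over body-part terms."""
    err = per_vertex_error(pred, gt)
    level1 = err.mean()                                   # Level 1: whole mesh
    level2 = np.mean([err[lo:hi + 1].mean()               # Level 2: one mean per part
                      for lo, hi in LEVEL2_RANGES.values()])
    w1, w2 = level_weights                                # hypothetical weights
    return w1 * level1 + w2 * level2

# Displace only the head and neck by 0.06 units along z.
gt = np.zeros((NUM_VERTICES, 3))
pred = gt.copy()
pred[:555] += np.array([0.0, 0.0, 0.06])
loss = hierarchical_loss(pred, gt)
```

In this toy case the Level 2 term (0.06 / 6 = 0.01) is larger than the Level 1 term (555 × 0.06 / 6890 ≈ 0.0048), which shows how averaging per region keeps a small body part from being diluted by the rest of the mesh.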
3. Results
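- MPJPE (Mean Per-Joint Position Error): Measures the average Euclidean distance between predicted and ground truth 3D joint positions across all joints: $$\mathrm{MPJPE} = \frac{1}{J}\sum_{j=1}^{J}\left\lVert \hat{\mathbf{p}}_j - \mathbf{p}_j \right\rVert_2,$$ where $J$ is the number of joints, $\mathbf{p}_j$ denotes the ground truth 3D position of joint $j$, and $\hat{\mathbf{p}}_j$ is the predicted position.
- PA-MPJPE (Procrustes-Aligned MPJPE): Computes MPJPE after aligning the predicted pose to ground truth using Procrustes analysis (accounting for rotation, translation, and scale), measuring pose accuracy independent of global alignment: $$\mathrm{PA\text{-}MPJPE} = \frac{1}{J}\sum_{j=1}^{J}\left\lVert \mathrm{PA}(\hat{\mathbf{p}})_j - \mathbf{p}_j \right\rVert_2,$$ where $\mathrm{PA}(\cdot)$ denotes the Procrustes alignment operation.
- MPVPE (Mean Per-Vertex Position Error): Analogous to MPJPE but evaluated over all mesh vertices rather than skeletal joints, providing a more comprehensive assessment of full mesh reconstruction accuracy: $$\mathrm{MPVPE} = \frac{1}{V}\sum_{i=1}^{V}\left\lVert \hat{\mathbf{v}}_i - \mathbf{v}_i \right\rVert_2,$$ where $V$ is the number of vertices (6890 for SMPL), $\mathbf{v}_i$ is the ground truth vertex position, and $\hat{\mathbf{v}}_i$ is the predicted vertex position.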
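The three metrics above can be sketched in NumPy as follows. This is an illustrative implementation rather than the evaluation code used in the paper: the function names are hypothetical, and the Procrustes step uses the standard SVD-based similarity alignment with a reflection guard.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding (N, 3) points."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def mpvpe(pred_vertices, gt_vertices):
    """Same formula as MPJPE, applied to mesh vertices instead of joints."""
    return mpjpe(pred_vertices, gt_vertices)

def procrustes_align(pred, gt):
    """Similarity-align pred to gt (rotation, translation, scale)."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_pred, gt - mu_gt
    u, sing, vt = np.linalg.svd(x.T @ y)
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:        # avoid returning a reflection
        d[-1] = -1.0
    rot = (u * d) @ vt                   # rotation for row vectors: x @ rot ~ y
    scale = (sing * d).sum() / (x ** 2).sum()
    return scale * x @ rot + mu_gt

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction."""
    return mpjpe(procrustes_align(pred, gt), gt)

# Toy check: a prediction that is a scaled, rotated, shifted copy of the target.
rng = np.random.default_rng(1)
gt_joints = rng.normal(size=(24, 3))
theta = 0.3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
pred_joints = 0.5 * gt_joints @ rot_z + np.array([1.0, -2.0, 0.5])

raw_error = mpjpe(pred_joints, gt_joints)
aligned_error = pa_mpjpe(pred_joints, gt_joints)
```

In the toy check `raw_error` is large because of the global transform, while `aligned_error` is numerically zero, which is the sense in which PA-MPJPE ignores global rotation, translation, and scale.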
3.1. Ablation Study
3.2. Comparative Study
3.3. Performance Analysis Under Varying Occlusion Levels
4. Discussion
4.1. Real-World Applications: Software-Defined Internet of Vehicles (SD-IoV)
- Pedestrian Behavior Analysis for Autonomous Vehicles. Vehicle-mounted cameras must track pedestrian poses in real-time under adverse conditions including occlusions by other vehicles, motion blur from vehicle speed, and intermittent visibility due to network packet loss during video streaming. Our temporal–topological framework can reconstruct pedestrian body poses even when frames are missing or corrupted, enabling more reliable collision avoidance and trajectory prediction systems.
- Driver Activity Monitoring in Connected Vehicles. In-cabin cameras can monitor driver behavior (e.g., detecting distracted driving, fatigue) by analyzing 3D pose. However, driver body parts are frequently occluded by steering wheels, seatbelts, and cabin structures. Our GTM-TPDist architecture leverages body topology priors to infer occluded limb positions, while HHLoss ensures anatomically plausible pose estimates crucial for safety-critical driver state assessment.
- Vehicle-to-Everything (V2X) Video Communication. Video compression artifacts and dropped frames are common when vehicles share camera footage over bandwidth-constrained V2X networks. Our probabilistic temporal modeling can “fill in” missing information from degraded video streams, thereby maintaining continuous 3D human tracking for traffic monitoring and smart city applications.
4.2. Limitations
- Children and adolescents with different limb-to-torso ratios.
- Individuals with atypical body shapes (e.g., obesity, muscular builds).
- People with physical disabilities or prosthetic limbs.
- Non-human subjects (animals, robots) that may benefit from similar pose estimation.
- Identity confusion: When two people overlap significantly (>50% mutual occlusion), the model may incorrectly associate limbs between individuals, producing chimeric reconstructions.
- Depth ambiguity: In scenes with multiple people at similar depths, the model cannot reliably determine which body parts belong to which person without explicit tracking.
- Prolonged occlusion: When a person is fully occluded for more than ten consecutive frames, temporal probability propagation becomes unreliable and the model may “hallucinate” implausible poses.
- Very low resolution (<128 × 128 pixels): Insufficient visual detail for reliable feature extraction.
- Severe motion blur (exposure >100 ms): Loss of edge information critical for body part segmentation.
- Extreme lighting conditions: Overexposure or underexposure causing loss of texture information.
- Cultural pose variations (e.g., traditional dances, martial arts).
- Occupational activities (e.g., construction workers, athletes in specialized sports).
- Clothing diversity (loose garments, costumes that significantly alter body silhouette).
5. Conclusions
6. Enhancements to the Preprint Version
- Comprehensive implementation details enabling full reproducibility (Section 6.1).
- Computational efficiency analysis with detailed comparison to state-of-the-art methods (Section 6.2).
- Extended experimental validation and qualitative analysis (Section 6.3).
- Refined presentation with improved clarity and additional technical specifications.
6.1. Implementation Specifications
6.2. Computational Efficiency Analysis
6.3. Additional Qualitative Analysis
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Lupión, M.; Polo-Rodríguez, A.; Medina-Quero, J.; Sanjuan, J.F.; Ortigosa, P.M. 3D Human Pose Estimation from multi-view thermal vision sensors. Inf. Fusion 2024, 104, 102154.
2. Du, S.; Yuan, Z.; Lai, P.; Ikenaga, T. JoyPose: Jointly learning evolutionary data augmentation and anatomy-aware global–local representation for 3D human pose estimation. Pattern Recognit. 2024, 147, 110116.
3. Yan, R.; Yin, Q.; Zhang, X.; Zhang, Q.; Zhang, G.; Ma, S. Pose-Driven Compression for Dynamic 3D Human via Human Prior Models. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5820–5834.
4. de Silva, C.W. Intelligent robotics-misconceptions, current trends, and opportunities. Intell. Robot. 2021, 1, 3–17.
5. Li, J.; Xu, Z.; Zhu, D.; Dong, K.; Yan, T.; Zeng, Z.; Yang, S.X. Bio-inspired intelligence with applications to robotics: A survey. Intell. Robot. 2021, 1, 58–83.
6. Qi, J.; Zhou, Q.; Lei, L.; Zheng, K. Federated reinforcement learning: Techniques, applications, and open challenges. Intell. Robot. 2021, 1, 18–57.
7. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
8. Mirjalili, S. Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl.-Based Syst. 2015, 89, 228–249.
9. Salehnia, T.; Fathi, A.; Azar, A.T. A MTIS method using a combined of whale and moth-flame optimization algorithms. In Handbook of Whale Optimization Algorithm; Academic Press: Cambridge, MA, USA, 2024; pp. 625–651.
10. Ye, C.; Che, K.; Yao, Y.; Ma, N.; Zhang, R.; Xu, Y.; Wang, J.; Meng, M.Q.H. A deep learning-based system for accurate detection of anatomical landmarks in colon environment. Intell. Robot. 2024, 4, 164–178.
11. Zhang, Y.; Pan, D.; Griensven, J.V.; Yang, S.X.; Gharabaghi, B. Intelligent flood forecasting and warning. Intell. Robot. 2023, 3, 190–212.
12. Xin, J.; Tao, G.; Tang, Q.; Zou, F.; Xiang, C. Structural damage identification method based on Swin Transformer and continuous wavelet transform. Intell. Robot. 2024, 4, 200–215.
13. Ni, J.; Chen, Y.; Tang, G.; Shi, J.; Cao, W.; Shi, P. Deep learning-based scene understanding for autonomous robots: A survey. Intell. Robot. 2023, 3, 374–401.
14. Zhang, D.; Xue, X.; Gao, P.; Jin, Z.; Hu, M.; Wu, Y.; Ying, X. A survey of datasets in medicine for large language models. Intell. Robot. 2024, 4, 457–478.
15. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on deep graph representation learning. Neural Netw. 2024, 171, 106207.
16. Cai, Y.; Ge, L.; Liu, J.; Cai, J.; Cham, T.J.; Yuan, J.; Thalmann, N.M. Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 2272–2281.
17. Li, J.; Yang, S.X. A novel feature learning-based bio-inspired neural network for real-time collision-free rescue of multi-robot systems. IEEE Trans. Ind. Electron. 2024, 71, 14420–14429.
18. Li, J.; Yang, S.X. Intelligent collective escape of swarm robots based on a novel fish-inspired self-adaptive approach with neurodynamic models. IEEE Trans. Ind. Electron. 2024, 71, 14460–14469.
19. Xu, Z.; Yan, T.; Yang, S.X.; Gadsden, S.A.; Biglarbegian, M. Distributed robust learning based formation control of mobile robots based on bioinspired neural dynamics. IEEE Trans. Intell. Veh. 2024, 10, 2608–2617.
20. Liu, Y.; Yuan, J.; Tu, Z. A discriminative multi-modal adaptation neural network model for video action recognition. Neural Netw. 2024, 180, 107114.
21. Cho, J.; Youwang, K.; Oh, T.H. Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2022.
22. Goel, S.; Pavlakos, G.; Rajasegaran, J.; Kanazawa, A.; Malik, J. Humans in 4D: Reconstructing and Tracking Humans with Transformers. In Proceedings of the International Conference on Computer Vision (ICCV), Paris, France, 2–3 October 2023.
23. Yan, T.; Xu, Z.; Yang, S.X. Distributed robust learning-based backstepping control aided with neurodynamics for consensus formation tracking of underwater vessels. IEEE Trans. Cybern. 2024, 54, 2434–2445.
24. Yan, T.; Xu, Z.; Yang, S.X. Consensus formation control for multiple AUV systems using distributed bioinspired sliding mode control. IEEE Trans. Intell. Veh. 2023, 8, 1081–1092.
25. Zhang, K.; Li, Y.; Liang, J.; Cao, J.; Zhang, Y.; Tang, H.; Timofte, R.; Van Gool, L. MPCNet: Compressed multi-view video restoration via motion-parallax complementation network. Neural Netw. 2023, 167, 108–121.
26. Sengupta, A.; Budvytis, I.; Cipolla, R. Hierarchical kinematic probability distributions for 3D human shape and pose estimation from images in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 11219–11229.
27. von Marcard, T.; Henschel, R.; Black, M.; Rosenhahn, B.; Pons-Moll, G. Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
28. Wang, H.; Yang, J.; Wan, X.; Zhang, Y.; Lin, F.; Wu, F. ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction. arXiv 2024, arXiv:2411.04399.
29. Berger, C.; Doherty, P.; Rudo, P.; Wzorek, M. Leveraging active queries in collaborative robotic mission planning. Intell. Robot. 2024, 4, 87–106.
30. Yang, P.; Yan, H.; Rao, K.; Yang, P.; Lv, Y. Distributed model predictive control for unmanned aerial vehicles and vehicle platoon systems: A review. Intell. Robot. 2024, 4, 293–317.
31. Huang, Y.; Zhang, H.; Shi, Y.; Wang, L.; Chen, S. Towards complex dynamic physics system simulation with graph neural ordinary equations. Neural Netw. 2024, 177, 106341.
32. Wu, Y.; Hu, X.; Zhang, Y.; Gong, M.; Ma, W.; Miao, Q. SACF-Net: Skip-Attention Based Correspondence Filtering Network for Point Cloud Registration. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 3585–3595.
33. Wu, Y.; Zhang, Y.; Fan, X.; Gong, M.; Miao, Q.; Ma, W. INENet: Inliers Estimation Network With Similarity Learning for Partial Overlapping Registration. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 1413–1426.
34. Yuan, Y.; Wu, Y.; Fan, X.; Gong, M.; Miao, Q.; Ma, W. Inlier Confidence Calibration for Point Cloud Registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
35. Wu, Y.; Hu, X.; Yuan, Y.; Fan, X.; Gong, M.; Li, H.; Zhang, M.; Miao, Q.; Ma, W. PointMC: Multi-instance Point Cloud Registration Based on Maximal Cliques. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
36. Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; Black, M.J. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graph. 2015, 34, 248.
37. Li, Z.; Gao, J.; Wang, X.; Zhang, Y. DyGraphformer: Transformer combining dynamic spatio-temporal graph network for multivariate time series forecasting. Neural Netw. 2025, 181, 106776.
38. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. ViViT: A Video Vision Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6836–6846.
39. Ni, J.; Shen, H.; Wang, X.; Zhang, J.; Zheng, Y. Multi-scale convolution enhanced transformer for multivariate long-term time series forecasting. Neural Netw. 2024, 180, 106745.
40. Shi, F.; Yang, S.X.; Mukherjee, M.; Jiang, H.; da Costa, D.B.; Wong, W.K. Parameter sharing-based average-consensus time synchronization in IoT networks. IEEE Internet Things J. 2023, 10, 8215–8227.
41. Lu, Y.; Yang, L.; Yang, S.X.; Hua, Q.; Sangaiah, A.K.; Guo, T.; Yu, K. An intelligent deterministic scheduling method for ultra-low latency communication in edge enabled industrial internet of things. IEEE Trans. Ind. Inform. 2023, 19, 1756–1767.
42. Qin, Z.; Zhou, S.; Wang, L.; Duan, J.; Hua, G.; Tang, W. MotionTrack: Learning motion predictor for multiple object tracking. Neural Netw. 2024, 179, 106539.
43. Chen, M.; Liu, Y.; Zhu, D.; Shen, A.; Wang, C.; Ji, K. Parameter identification of an open-frame underwater vehicle based on quantum particle swarm optimization. Intell. Robot. 2024, 4, 216–229.
44. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
45. Imani, M.; Ghoreishi, S.F. Empirical strategy for stretching probability distribution in neural-network-based regression. Neural Netw. 2021, 138, 82–95.
46. Sanchez-Cauce, R.; Paris, I.; Diez, F.J. A survey of sum–product networks structural learning. Neural Netw. 2023, 165, 345–364.
47. Chen, Y.; Wang, Z.; Yang, Z. Modeling Bellman-error with logistic distribution with applications in reinforcement learning. Neural Netw. 2024, 177, 106387.
48. Liu, X.; Liang, Y.; Huang, C.; Zheng, Y.; Hooi, B.; Zimmermann, R. Interpretable local flow attention for multi-step traffic flow prediction. Neural Netw. 2023, 161, 25–39.
49. Wang, X.; Zhang, Y.; Li, M. DAFA-BiLSTM: Deep Autoregression Feature Augmented Bidirectional LSTM network for time series prediction. Neural Netw. 2023, 157, 240–256.
50. Ni, J.; Shen, K.; Chen, Y.; Yang, S.X. An improved SSD-like deep network-based object detection method for indoor scenes. IEEE Trans. Instrum. Meas. 2023, 72, 5006915.
51. Li, Y.; Ren, T.; Liu, Q.; Chen, Y.; Yang, S.X.; Yuan, H.; Li, Y.; Yang, Y. Novel bionic soft robotic hand with dexterous deformation and reliable grasping. IEEE Trans. Instrum. Meas. 2023, 72, 7502110.
52. Wu, Y.; Sheng, J.; Ding, H.; Gong, P.; Li, H.; Gong, M.; Ma, W.; Miao, Q. Evolutionary Multitasking Descriptor Optimization for Point Cloud Registration. IEEE Trans. Evol. Comput. 2024, 29, 1239–1253.
53. Li, J.; Yang, S.X. Intelligent Fish-Inspired Foraging of Swarm Robots with Sub-Group Behaviors Based on Neurodynamic Models. Biomimetics 2024, 9, 16.
54. Han, F.; Reily, B.; Hoff, W.; Zhang, H. Space-time representation of people based on 3D skeletal data: A review. Comput. Vis. Image Underst. 2017, 158, 85–105.
55. Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
56. Liu, J.; Li, J.; Zhang, L.; Dai, F.; Zhang, Y.; Meng, X.; Shen, J. Software-defined internet of vehicles: Architecture, challenges and solutions. J. Commun. Inf. Netw. 2018, 3, 21–40.
57. Zhou, Z.; Guo, Y.; He, Y.; Zhao, X.; Bazzi, W.M. Software defined machine-to-machine communication for smart energy management. IEEE Commun. Mag. 2016, 54, 52–57.
58. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Xiao, B. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364.
59. Wang, M.; Xing, J.; Liu, Y. ActionCLIP: A New Paradigm for Video Action Recognition. arXiv 2021, arXiv:2109.08472.
60. Zhao, Q.; Zheng, C.; Liu, M.; Wang, P.; Chen, C. PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 8877–8886.

| Vertices | Reduction | MPJPE ↓ | PA-MPJPE ↓ | Relative FLOPs ↓ |
|---|---|---|---|---|
| 6890 (Full) | 0% | 71.5 | 43.2 | 1.00× |
| 1723 | 75.0% | 71.8 | 43.4 | 0.31× |
| 431 | 93.7% | 72.7 | 43.8 | 0.063× |
| 108 | 98.4% | 76.3 | 47.1 | 0.021× |

| Level | Description | Regions | Vertices | Weight |
|---|---|---|---|---|
| 1 | Full Body (Global) | 1 | 6890 | |
| 2 | Head & Neck | 6 | 0–554 | |
| | Torso | | 555–1946 | |
| | Left Arm | | 1947–3118 | |
| | Right Arm | | 3119–4290 | |
| | Left Leg | | 4291–5540 | |
| | Right Leg | | 5541–6889 | |
| 3 | Joint-centric Regions | 24 | ∼287 each | |

| TPDist | Loss | 3DPW MPVPE ↓ | 3DPW MPJPE ↓ | 3DPW PA-MPJPE ↓ | Human3.6M MPJPE ↓ | Human3.6M PA-MPJPE ↓ |
|---|---|---|---|---|---|---|
| × | × | 85.5 ± 0.8 | 73.9 ± 0.6 | 44.9 ± 0.4 | 51.9 ± 0.5 | 33.8 ± 0.3 |
| ✓ | × | 82.4 ± 0.7 ** | 72.8 ± 0.5 * | 44.1 ± 0.4 * | 49.9 ± 0.6 ** | 33.7 ± 0.3 |
| × | ✓ | 81.9 ± 0.6 *** | 72.8 ± 0.5 * | 44.0 ± 0.3 ** | 49.5 ± 0.4 *** | 33.2 ± 0.3 ** |
| ✓ | ✓ | 81.9 ± 0.5 *** | 72.7 ± 0.4 *** | 43.8 ± 0.3 *** | 49.1 ± 0.5 *** | 33.1 ± 0.2 *** |

| Method | Type | 3DPW MPVPE ↓ | 3DPW MPJPE ↓ | 3DPW PA-MPJPE ↓ | Human3.6M MPJPE ↓ | Human3.6M PA-MPJPE ↓ |
|---|---|---|---|---|---|---|
| Frame based | | | | | | |
| Graphormer | vert. | 87.7 | 74.7 | 45.6 | 51.2 | 34.5 |
| METRO | vert. | 88.2 | 77.1 | 47.9 | 54.0 | 36.7 |
| Hybrik-X | param. | 94.5 | 80.0 | 48.8 | - | - |
| Pymaf-X | param. | 110.1 | 92.8 | 58.9 | 57.7 | 40.5 |
| Potter | param. | 87.4 | 75.0 | 44.8 | 56.5 | 35.1 |
| Fastmetro | vert. | 84.1 | 73.5 | 44.6 | 52.2 | 33.7 |
| PointHMR | vert. | 85.5 | 73.9 | 44.9 | 48.3 | 32.9 |
| Video based | | | | | | |
| HMMR | param. | 139.3 | 116.5 | 72.6 | - | 56.9 |
| VIBE | param. | 99.1 | 82.9 | 51.9 | 65.6 | 41.4 |
| TCMR | param. | 111.3 | 95.0 | 55.8 | 62.3 | 41.1 |
| MAED | param. | 92.6 | 79.1 | 45.7 | 56.4 | 38.7 |
| MPS-Net | param. | 109.6 | 91.6 | 54.0 | 69.4 | 47.4 |
| GLoT | param. | 96.3 | 80.7 | 50.6 | 67.0 | 46.3 |
| 4DHumans | param. | 85.2 ± 2.1 | 70.0 ± 0.9 | 44.5 ± 0.5 | 44.8 ± 0.6 | 33.6 ± 0.4 |
| Ours | vert. | 81.9 ± 0.5 * | 72.7 ± 0.4 | 43.8 ± 0.3 * | 49.1 ± 0.5 | 33.1 ± 0.2 ** |

| Method | MPJPE (mm) ↓, Mild | MPJPE (mm) ↓, Moderate | MPJPE (mm) ↓, Heavy | PA-MPJPE (mm) ↓, Mild | PA-MPJPE (mm) ↓, Moderate | PA-MPJPE (mm) ↓, Heavy |
|---|---|---|---|---|---|---|
| Fastmetro | 69.2 | 75.8 | 86.3 | 42.1 | 45.7 | 52.4 |
| GLoT | 75.1 | 82.4 | 94.7 | 47.3 | 51.2 | 59.8 |
| 4DHumans | 65.8 | 71.3 | 81.5 | 41.9 | 45.6 | 52.1 |
| Ours | 68.1 | 73.7 | 79.8 | 41.2 | 44.3 | 49.6 |
| Improvement over 4DHumans (4DHumans − Ours; positive means lower error for ours) | −2.3 | −2.4 | +1.7 | +0.7 | +1.3 | +2.5 |

| Method | FLOPs ↓ | Params ↓ | 3DPW MPVPE ↓ | 3DPW PA-MPJPE ↓ | Human3.6M PA-MPJPE ↓ |
|---|---|---|---|---|---|
| 4DHumans | 122,590.2 M | 670.2 M | 85.22 | 44.50 | 33.6 |
| Ours | 42,737.4 M | 291.1 M | 82.12 | 43.82 | 33.1 |
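
The relative savings implied by the efficiency table can be checked with a few lines of arithmetic. The sketch below only restates the FLOPs and parameter counts from the table; the helper name `relative_reduction` is hypothetical.

```python
# FLOPs (in millions) and parameter counts (in millions), copied from the table.
flops = {"4DHumans": 122_590.2, "Ours": 42_737.4}
params = {"4DHumans": 670.2, "Ours": 291.1}

def relative_reduction(baseline, value):
    """Percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (1.0 - value / baseline)

flops_saving = relative_reduction(flops["4DHumans"], flops["Ours"])
params_saving = relative_reduction(params["4DHumans"], params["Ours"])

print(f"FLOPs reduction: {flops_saving:.1f}%")       # FLOPs reduction: 65.1%
print(f"Parameter reduction: {params_saving:.1f}%")  # Parameter reduction: 56.6%
```

By these figures the proposed model uses roughly one third of the FLOPs and under half of the parameters of 4DHumans.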
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, H.; Yang, J.; Lin, F.; Wu, F. Temporal Probability-Guided Graph Topology Learning for Robust 3D Human Mesh Reconstruction. Mathematics 2026, 14, 367. https://doi.org/10.3390/math14020367

