Adaptive Curve-Guided Convolution for Robust 3D Hand Pose Estimation from Corrupted Point Clouds
Abstract
1. Introduction
- We introduce adaptive sampling into 3D hand pose estimation, enabling the network to dynamically select informative points from corrupted point clouds. This strategy emphasizes structurally important regions while suppressing redundant points, which improves robustness to missing data and sensor corruption in real-world hand scans.
- We propose a convolution module that uses curve guidance to model the geometric continuity of hand structures and combines it with graph-based context aggregation. The curve-guided representation preserves fine-grained finger trajectories, while the graph convolution captures long-range dependencies across joints and surfaces. Together, the two components compensate for structural distortions and improve feature completion in corrupted hand point clouds.
- We propose an evaluation protocol that simulates real-world distortions by removing a fixed percentage of points from existing datasets, with missing-point ratios of 30% to 50%. Two metrics, Robustness Curve Area (RCA) and Robustness Gain (RG), quantify model robustness across distortion levels, and accuracy on intact point clouds is evaluated alongside them to confirm that it remains consistent.
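
The corruption step of the evaluation protocol can be illustrated with a minimal sketch. It assumes points are removed uniformly at random; this section does not state whether removal is uniform or region-based, so the function name `drop_points`, the synthetic cloud, and the intermediate ratio 0.4 are illustrative assumptions rather than the authors' implementation. Keeping the surviving indices sorted preserves the original point order, which makes corrupted and intact clouds easy to compare.

```python
import numpy as np

def drop_points(points, missing_ratio, rng=None):
    """Return a copy of `points` (N, 3) with a fixed fraction of points removed.

    Assumption: points are removed uniformly at random (not stated in the paper outline).
    """
    if not 0.0 <= missing_ratio < 1.0:
        raise ValueError("missing_ratio must be in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    n = points.shape[0]
    n_keep = n - int(round(n * missing_ratio))
    # Sample the surviving indices without replacement, then restore input order.
    keep_idx = rng.choice(n, size=n_keep, replace=False)
    return points[np.sort(keep_idx)]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))  # synthetic stand-in for a hand scan
for ratio in (0.3, 0.4, 0.5):       # the protocol spans 30% to 50% missing points
    corrupted = drop_points(cloud, ratio, rng)
    print(ratio, corrupted.shape)
```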
2. Related Work
2.1. 3D Hand Pose Estimation from Point Clouds
2.2. Adaptive Sampling for Point Cloud Learning
2.3. Curve Fitting, Structured Traversals, and Geometry-Aware Aggregation
3. Methodology
3.1. Network Architecture
3.2. Adaptive Sampling Module
3.3. Hand-Curve Guide Convolution
3.3.1. Curve Grouping Stage
3.3.2. Curve Aggregation Stage
3.4. Evaluation Metrics
4. Experimentation
4.1. Datasets, Setup, and Metrics
4.2. Pose Estimation Visualization
4.3. Robustness to Corrupted Point Clouds
4.4. Comparative Analysis
4.5. Key Modules Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AS | Adaptive Sampling |
| HCGC | Hand-Curve Guide Convolution |
| CIC | Curve Instance Convolution |
| FPS | Farthest Point Sampling |
| KNN | K-Nearest Neighbors |
| CG | Curve Grouping |
| CA | Curve Aggregation |
| AP | Attentive Pooling |
| MPJPE | Mean Per Joint Position Error |
| RCA | Robustness Curve Area |
| RG | Robustness Gain |

| Methodologies | MSRA (mm) | ICVL (mm) | NYU (mm) |
|---|---|---|---|
| DeepModel | - | 11.56 | 17.04 |
| DeepPrior | - | - | 20.75 |
| DeepPrior++ | - | - | 12.24 |
| 3DCNN | 9.58 | - | 14.11 |
| Ren-4x6x6 | - | 7.63 | 13.39 |
| HandPointNet | 8.51 | 6.94 | 10.54 |
| Ren-9x6x6 | 9.79 | 7.31 | 12.69 |
| Pose-REN | 8.65 | 6.79 | 11.81 |
| DenseReg | 7.23 | 7.24 | 10.21 |
| V2V-PoseNet | - | 6.28 | 8.42 |
| SHPR-Net | 7.76 | 7.22 | 10.78 |
| DeepHPS | - | - | 14.42 |
| Point-to-Point | 7.71 | 6.33 | 9.05 |
| CrossInfoNet | 7.86 | 6.73 | 10.08 |
| TriHorn-Net | 7.13 | 5.74 | 7.68 |
| Ours | 7.43 | 6.56 | 10.07 |

| Method | MSRA | ICVL | NYU | Average |
|---|---|---|---|---|
| 3DCNN | 50.2 | 40.5 | 45.8 | 45.5 |
| Ren-9x6x6 | 48.3 | 38.1 | 43.2 | 43.2 |
| Pose-REN | 45.6 | 36.4 | 40.1 | 40.7 |
| DenseReg | 40.8 | 35.2 | 38.5 | 38.2 |
| SHPR-Net | 42.7 | 34.9 | 37.3 | 38.3 |
| Point-to-Point | 38.9 | 33.6 | 36.4 | 36.3 |
| CrossInfoNet | 43.5 | 37.2 | 39.7 | 40.1 |
| TriHorn-Net | 33.2 | 31.5 | 34.3 | 33.0 |
| HandPointNet | 49.4 | 35.6 | 39.8 | 41.6 |
| Our Method | 28.7 | 30.7 | 33.1 | 30.8 |
| RG | 20.7 | 4.9 | 6.7 | 10.8 |

| Method | MSRA | ICVL | NYU | Average |
|---|---|---|---|---|
| 3DCNN | 24.6 | 21.8 | 23.1 | 23.2 |
| Ren-9x6x6 | 23.8 | 20.7 | 22.3 | 22.3 |
| Pose-REN | 22.5 | 19.9 | 21.2 | 21.2 |
| DenseReg | 20.7 | 19.1 | 20.0 | 19.9 |
| SHPR-Net | 21.1 | 18.6 | 19.5 | 19.7 |
| Point-to-Point | 19.8 | 18.3 | 19.0 | 19.0 |
| CrossInfoNet | 22.2 | 19.5 | 20.4 | 20.7 |
| TriHorn-Net | 18.9 | 17.6 | 18.3 | 18.3 |
| HandPointNet | 23.4 | 19.3 | 20.1 | 20.9 |
| Our Method | 18.4 | 18.1 | 17.9 | 18.1 |
| RG | 5.0 | 1.2 | 2.2 | 2.8 |
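
In both robustness tables, the RG row equals the HandPointNet row minus the Our Method row, column by column. The short check below reproduces that arithmetic from the table values. Reading RG as the gain over the HandPointNet baseline is an inference from these numbers, not a definition given in this section, and the helper name `matches_reported_gain` is hypothetical. The check prints `True` for both tables.

```python
# Rows copied from the two robustness tables; columns are MSRA, ICVL, NYU, Average.
ROBUSTNESS_TABLES = {
    "first table": {
        "HandPointNet": [49.4, 35.6, 39.8, 41.6],
        "Our Method": [28.7, 30.7, 33.1, 30.8],
        "RG": [20.7, 4.9, 6.7, 10.8],
    },
    "second table": {
        "HandPointNet": [23.4, 19.3, 20.1, 20.9],
        "Our Method": [18.4, 18.1, 17.9, 18.1],
        "RG": [5.0, 1.2, 2.2, 2.8],
    },
}

def matches_reported_gain(rows, tol=0.05):
    """True if baseline minus ours agrees with the reported RG in every column."""
    diffs = [b - o for b, o in zip(rows["HandPointNet"], rows["Our Method"])]
    return all(abs(d - r) < tol for d, r in zip(diffs, rows["RG"]))

for name, rows in ROBUSTNESS_TABLES.items():
    print(name, matches_reported_gain(rows))
```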

| Configuration | MSRA (mm) | ICVL (mm) | NYU (mm) |
|---|---|---|---|
| AS Module Ablation | | | |
| FPS Sampling | 8.08 | 7.25 | 11.62 |
| AS Module | 7.43 | 6.56 | 10.07 |
| HCGC Module Ablation | | | |
| Traditional Feature Extraction | 9.52 | 8.12 | 12.28 |
| Curve Grouping Only | 8.53 | 7.33 | 11.24 |
| Full HCGC Module | 7.43 | 6.56 | 10.07 |
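
The ablation results can also be read as relative error reductions. The sketch below computes, for each ablated configuration, the percentage by which the full model lowers the error on each dataset. The values are copied from the ablation table; the "AS Module" and "Full HCGC Module" rows are identical there, so they are merged under the hypothetical label "Full model", and `relative_reduction` is an illustrative helper, not part of the paper.

```python
# Errors in mm copied from the ablation table; columns are MSRA, ICVL, NYU.
ABLATION = {
    "FPS Sampling": [8.08, 7.25, 11.62],
    "Traditional Feature Extraction": [9.52, 8.12, 12.28],
    "Curve Grouping Only": [8.53, 7.33, 11.24],
    "Full model": [7.43, 6.56, 10.07],  # shared "AS Module" / "Full HCGC Module" row
}

def relative_reduction(baseline, full):
    """Percentage by which `full` lowers the error of `baseline`, per dataset."""
    return [100.0 * (b - f) / b for b, f in zip(baseline, full)]

for name, errors in ABLATION.items():
    if name == "Full model":
        continue
    pct = relative_reduction(errors, ABLATION["Full model"])
    print(name, [f"{p:.1f}%" for p in pct])
```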
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
She, L.; Sun, H.; Zou, H.; Liang, H.; Guo, X.; Chen, Y. Adaptive Curve-Guided Convolution for Robust 3D Hand Pose Estimation from Corrupted Point Clouds. Electronics 2025, 14, 4133. https://doi.org/10.3390/electronics14214133

