Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps
Abstract
:1. Introduction
- We proposed VFSM (VQGAN Feature Space Mapper), which is a novel representation learning method in the field of point cloud data. This method excels at capturing intricate features within images and can deduce highly precise 3D structures from a limited set of semantic information.
- In response to the aforementioned innovative aspects, we have introduced a novel loss function. This function aids the network in aligning its output with the ground truth of human posture estimation, thereby facilitating the generation of a more precise SMPL-X mesh.
- We propose IHRPN, a new method for 3D human reconstruction that effectively addresses the robustness issue with loose clothing.
2. Related Work
2.1. Explicit Representation
2.2. Implicit Representation
3. Method
3.1. Parametric Mesh
3.2. Implicit Human Reconstruction
4. Experiments and Results
4.1. Datasets
4.2. Metrics
4.3. Implementation Details
4.4. Evaluation
4.5. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Feng, Y.; Choutas, V.; Bolkart, T.; Tzionas, D.; Black, M.J. Collaborative regression of expressive bodies using moderation. In Proceedings of the 2021 International Conference on 3D Vision (3DV)(IEEE2021), London, UK, 1–3 December 2021; pp. 792–804. [Google Scholar]
- Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
- Varol, G.; Ceylan, D.; Russell, B.; Yang, J.; Yumer, E.; Laptev, I.; Schmid, C. Bodynet: Volumetric inference of 3d human body shapes. In Proceedings of the European Conference on Computer Vision (ECCV) (2018), Munich, Germany, 8–14 September 2018; pp. 20–36. [Google Scholar]
- Wang, W.; Ceylan, D.; Mech, R.; Neumann, U. 3dn: 3d deformation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), Long Beach, CA, USA, 15–20 June 2019; pp. 1038–1046. [Google Scholar]
- Saito, S.; Huang, Z.; Natsume, R.; Morishima, S.; Kanazawa, A.; Li, H. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2304–2314. [Google Scholar]
- Saito, S.; Simon, T.; Saragih, J.; Joo, H. Pifuhd: Multi-level pixel-aligned implicit function for high-resolution 3d human digitization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), Seattle, WA, USA, 13–19 June 2020; pp. 84–93. [Google Scholar]
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 165–174. [Google Scholar]
- Xiu, Y.; Yang, J.; Tzionas, D.; Black, M.J. Icon: Implicit clothed humans obtained from normals. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE2022), New Orleans, LA, USA, 18–24 June 2022; pp. 13286–13296. [Google Scholar]
- Zheng, Z.; Yu, T.; Liu, Y.; Dai, Q. Pamir: Parametric model-conditioned implicit representation for image-based human reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3170–3184. [Google Scholar] [CrossRef] [PubMed]
- Esser, P.; Rombach, R.; Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and pattern Recognition (2021), Nashville, TN, USA, 20–25 June 2021; pp. 12873–12883. [Google Scholar]
- Alldieck, T.; Pons-Moll, G.; Theobalt, C.; Magnor, M. Tex2shape: Detailed full human body geometry from a single image. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2293–2303. [Google Scholar]
- Fan, H.; Su, H.; Guibas, L.J. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), Honolulu, HI, USA, 21–26 July 2017; pp. 605–613. [Google Scholar]
- Kanazawa, A.; Black, M.J.; Jacobs, D.W.; Malik, J. End-to-end recovery of human shape and pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7122–7131. [Google Scholar]
- Jackson, A.S.; Manafas, C.; Tzimiropoulos, G. 3d human body reconstruction from a single image via volumetric regression. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops (2018), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Lin, C.-H.; Kong, C.; Lucey, S. Learning efficient point cloud generation for dense 3d object reconstruction. In Proceedings of the AAAI Conference on Artificial Intelligence (2018), New Orleans, LO, USA, 2–7 February 2018. [Google Scholar]
- Yang, G.; Huang, X.; Hao, Z.; Liu, M.-Y.; Belongie, S.; Hariharan, B. Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4541–4550. [Google Scholar]
- Li, J.; Bian, S.; Liu, Q.; Tang, J.; Wang, F.; Lu, C. NIKI: Neural inverse kinematics with invertible neural networks for 3d human pose and shape estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), Vancouver, BC, Canada, 17–24 June 2023; pp. 12933–12942. [Google Scholar]
- Li, J.; Xu, C.; Chen, Z.; Bian, S.; Yang, L.; Lu, C. Hybrik: A hybrid analytical-neural inverse kinematics solution for 3d human pose and shape estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), Nashville, TN, USA, 20–25 June 2021; pp. 3383–3393. [Google Scholar]
- Yi, H.; Liang, H.; Liu, Y.; Cao, Q.; Wen, Y.; Bolkart, T.; Tao, D.; Black, M.J. Generating holistic 3d human motion from speech. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), Vancouver, BC, Canada, 17–24 June 2023; pp. 469–480. [Google Scholar]
- Tripathi, S.; Müller, L.; Huang, C.-H.P.; Taheri, O.; Black, M.J.; Tzionas, D. 3D human pose estimation via intuitive physics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023), Vancouver, BC, Canada, 17–24 June 2023; pp. 4713–4725. [Google Scholar]
- Ma, Q.; Yang, J.; Ranjan, A.; Pujades, S.; Pons-Moll, G.; Tang, S.; Black, M.J. Learning to dress 3d people in generative clothing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), Seattle, WA, USA, 13–19 June 2020; pp. 6469–6478. [Google Scholar]
- Lazova, V.; Insafutdinov, E.; Pons-Moll, G. 360-degree textures of people in clothing from a single image. In Proceedings of the 2019 International Conference on 3D Vision (3DV) (IEEE 2019), Quebec City, QC, Canada, 16–19 September 2019; pp. 643–653. [Google Scholar]
- Jiang, B.; Zhang, J.; Hong, Y.; Luo, J.; Liu, L.; Bao, H. Bcnet: Learning body and cloth shape from a single image. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 18–35. [Google Scholar]
- Bhatnagar, B.L.; Tiwari, G.; Theobalt, C.; Pons-Moll, G. Multi-garment net: Learning to dress 3d people from images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5420–5430. [Google Scholar]
- He, T.; Xu, Y.; Saito, S.; Soatto, S.; Tung, T. Arch++: Animation-ready clothed human reconstruction revisited. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), Nashville, TN, USA, 20–25 June 2021; pp. 11046–11056. [Google Scholar]
- Bhatnagar, B.L.; Sminchisescu, C.; Theobalt, C.; Pons-Moll, G. Combining implicit function learning and parametric models for 3d human reconstruction. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 311–329. [Google Scholar]
- Chen, Z.; Zhang, H. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5939–5948. [Google Scholar]
- He, T.; Collomosse, J.; Jin, H.; Soatto, S. Geo-pifu: Geometry and pixel aligned implicit functions for single-view human reconstruction. Adv. Neural Inf. Process. Syst. 2020, 33, 9276–9287. [Google Scholar]
- Li, Z.; Yu, T.; Pan, C.; Zheng, Z.; Liu, Y. Robust 3d self-portraits in seconds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), Seattle, WA, USA, 13–19 June 2020; pp. 1344–1353. [Google Scholar]
- Dong, Z.; Guo, C.; Song, J.; Chen, X.; Geiger, A.; Hilliges, O. PINA: Learning a personalized implicit neural avatar from a single RGB-D video sequence. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), New Orleans, LA, USA, 18–24 June 2022; pp. 20470–20480. [Google Scholar]
- Yang, Z.; Wang, S.; Manivasagam, S.; Huang, Z.; Ma, W.-C.; Yan, X.; Yumer, E.; Urtasun, R. S3: Neural shape, skeleton, and skinning fields for 3d human modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), Nashville, TN, USA, 20–25 June 2021; pp. 13284–13293. [Google Scholar]
- Huang, Z.; Xu, Y.; Lassner, C.; Li, H.; Tung, T. Arch: Animatable reconstruction of clothed humans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), Seattle, WA, USA, 13–19 June 2020; pp. 3093–3102. [Google Scholar]
- Alldieck, T.; Xu, H.; Sminchisescu, C. imghum: Implicit generative models of 3d human shape and articulated pose. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), Nashville, TN, USA, 20–25 June 2021; pp. 5461–5470. [Google Scholar]
- Lorensen, W.E.; Cline, H.E. Marching cubes: A high resolution 3D surface construction algorithm. Semin. Graph. Pioneer. Efforts Shaped Field 1998, 347–353. [Google Scholar]
- Yu, T.; Zheng, Z.; Guo, K.; Liu, P.; Dai, Q.; Liu, Y. Function4d: Real-time human volumetric capture from very sparse consumer rgbd sensors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), Nashville, TN, USA, 20–25 June 2021; pp. 5746–5756. [Google Scholar]
- Patel, P.; Huang, C.-H.P.; Tesch, J.; Hoffmann, D.T.; Tripathi, S.; Black, M.J. AGORA: Avatars in geography optimized for regression analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), Nashville, TN, USA, 20–25 June 2021; pp. 13468–13478. [Google Scholar]
Dataset | CAPE | AGORA | Execution Time ↓ | ||||
---|---|---|---|---|---|---|---|
Method | Chamfer ↓ | P2S ↓ | Normals ↓ | Chamfer ↓ | P2S ↓ | Normals ↓ | |
Ours | 1.163 | 1.296 | 0.059 | 1.202 | 1.586 | 0.061 | 20.1 s |
PIFu | 3.587 | 3.652 | 0.119 | 3.379 | 3.452 | 0.092 | 33.7 s |
PIFuHD | 3.103 | 2.897 | 0.113 | 3.087 | 3.326 | 0.084 | 41.6 s |
ICON | 1.187 | 1.358 | 0.065 | 1.221 | 1.593 | 0.063 | 21.4 s |
wo/VFSM | 2.039 | 1.414 | 0.078 | 1.974 | 1.651 | 0.072 | 19.6 s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ren, Y.; Zhou, M.; Wang, Y.; Feng, L.; Zhu, Q.; Li, K.; Geng, G. Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps. J. Imaging 2024, 10, 133. https://doi.org/10.3390/jimaging10060133
Ren Y, Zhou M, Wang Y, Feng L, Zhu Q, Li K, Geng G. Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps. Journal of Imaging. 2024; 10(6):133. https://doi.org/10.3390/jimaging10060133
Chicago/Turabian StyleRen, Yong, Mingquan Zhou, Yifan Wang, Long Feng, Qiuquan Zhu, Kang Li, and Guohua Geng. 2024. "Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps" Journal of Imaging 10, no. 6: 133. https://doi.org/10.3390/jimaging10060133
APA StyleRen, Y., Zhou, M., Wang, Y., Feng, L., Zhu, Q., Li, K., & Geng, G. (2024). Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps. Journal of Imaging, 10(6), 133. https://doi.org/10.3390/jimaging10060133