GC-HG Gaussian Splatting Single-View 3D Reconstruction Method Based on Depth Prior and Pseudo-Triplane
Abstract
1. Introduction
- Depth prior-guided point cloud initialization: we introduce the pre-trained VGGT 3D geometry model as a depth prior and use it to extract a high-quality depth map. An initial point cloud is then generated through a confidence-based sampling strategy, giving 3DGS a structure-aware initialization. This prior improves the model's geometric representation ability and its cross-scene generalization (see the first sketch after this list).
- Pseudo-triplane representation for geometric consistency: we construct a pseudo-triplane representation by combining a learnable Z-plane token with the deep features extracted by the image encoder. This lifts 2D image information onto XZ and YZ feature planes and enables a dedicated fusion mechanism for joint modeling, improving geometric consistency and the preservation of structural detail (second sketch below).
- Parent–child hierarchical Gaussian renderer for depth hierarchy: within the feed-forward 3DGS framework, we design a parent–child hierarchical Gaussian renderer that models multi-level spatial structure by combining depth with 3D offsets. A multi-layer perceptron (MLP) learns the linear mapping between parent and child Gaussians, strengthening the model's ability to represent complex depth hierarchies (third sketch below).
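To make the first contribution concrete, the following is a minimal PyTorch sketch of confidence-based point cloud initialization from a monocular depth map. The function name, the quantile threshold, and the pinhole back-projection are illustrative assumptions; the paper's actual confidence sampling strategy over VGGT outputs may differ.

```python
import torch

def init_point_cloud(depth, conf, intrinsics, conf_quantile=0.5, n_points=4096):
    """Back-project a depth map into an initial point cloud, keeping only
    pixels whose confidence exceeds a quantile threshold (hypothetical
    stand-in for the paper's confidence sampling strategy)."""
    H, W = depth.shape
    fx, fy, cx, cy = intrinsics  # pinhole camera parameters
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # Keep only high-confidence pixels.
    mask = conf >= torch.quantile(conf.flatten(), conf_quantile)
    z = depth[mask]
    x = (u[mask].float() - cx) * z / fx
    y = (v[mask].float() - cy) * z / fy
    pts = torch.stack([x, y, z], dim=-1)           # (M, 3)
    # Uniformly subsample to a fixed point budget.
    idx = torch.randperm(pts.shape[0])[:n_points]
    return pts[idx]

# Example: a 256x256 depth/confidence pair with generic intrinsics.
pts = init_point_cloud(torch.rand(256, 256) * 5.0, torch.rand(256, 256),
                       intrinsics=(200.0, 200.0, 128.0, 128.0))
```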
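The pseudo-triplane can likewise be sketched. Here the encoder's 2D feature map serves as the XY plane, and a learnable Z-plane token is combined with axis-pooled image features to form the XZ and YZ planes. The mean-pooling and the 1×1-conv fusion are assumptions standing in for the paper's fusion mechanism, and the encoder channel count is assumed to equal `dim`.

```python
import torch
import torch.nn as nn

class PseudoTriplane(nn.Module):
    """Lift 2D encoder features (the XY plane) into XZ/YZ feature planes
    via a learnable Z-plane token (a sketch, not the paper's exact design)."""
    def __init__(self, dim=128, depth_bins=32):
        super().__init__()
        # Learnable Z-plane token: one feature column per depth bin.
        self.z_token = nn.Parameter(torch.randn(1, dim, depth_bins))
        # 1x1 conv fusing image-derived and Z-token features (assumption).
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, feat_xy):                    # feat_xy: (B, C, H, W), C == dim
        B, C, H, W = feat_xy.shape
        D = self.z_token.shape[-1]
        col = feat_xy.mean(dim=3)                  # (B, C, H): pool over W for the YZ plane
        row = feat_xy.mean(dim=2)                  # (B, C, W): pool over H for the XZ plane
        z = self.z_token.expand(B, -1, -1)         # (B, C, D)
        plane_yz = self.fuse(torch.cat(
            [col.unsqueeze(-1).expand(-1, -1, -1, D),
             z.unsqueeze(2).expand(-1, -1, H, -1)], dim=1))   # (B, C, H, D)
        plane_xz = self.fuse(torch.cat(
            [row.unsqueeze(-1).expand(-1, -1, -1, D),
             z.unsqueeze(2).expand(-1, -1, W, -1)], dim=1))   # (B, C, W, D)
        return feat_xy, plane_xz, plane_yz         # pseudo-triplane (XY, XZ, YZ)
```

Querying a 3D point then reduces to bilinearly sampling each of the three planes and aggregating the features, as in standard triplane decoders.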
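Finally, a hedged sketch of the parent–child hierarchy: an MLP maps each parent Gaussian's feature to K children, each defined by a depth shift plus a 3D offset from the parent's position. The layer widths, K, and the 4-value-per-child parameterization are assumptions; the full renderer would also predict opacity, scale, rotation, and color for each child.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParentChildGaussians(nn.Module):
    """Predict child Gaussian centers from parent centers and features."""
    def __init__(self, feat_dim=128, n_children=2):
        super().__init__()
        self.n_children = n_children
        # Per child: 1 depth shift along the view axis + a free 3D offset.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, n_children * 4),
        )

    def forward(self, parent_xyz, parent_feat):    # (N, 3), (N, feat_dim)
        N = parent_xyz.shape[0]
        out = self.mlp(parent_feat).view(N, self.n_children, 4)
        d_depth, offset = out[..., :1], out[..., 1:]   # (N, K, 1), (N, K, 3)
        # Pad the depth shift into a (0, 0, dz) vector so it moves each
        # child along the camera z-axis on top of the free 3D offset.
        dz = F.pad(d_depth, (2, 0))                    # (N, K, 3)
        child_xyz = parent_xyz.unsqueeze(1) + offset + dz
        return child_xyz.reshape(N * self.n_children, 3)
```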
2. Basic Principles
2.1. Feed-Forward 3DGS
2.2. The Proposed Algorithm
2.3. Depth Prior
2.4. Pseudo-Triplane Mechanism
2.5. Parent–Child Hierarchical Gaussian Renderer
3. Experimental Results and Analysis
3.1. Dataset and Parameter Settings
3.2. Ablation Experiment
3.2.1. Pseudo-Triplane Depth Prior Analysis
3.2.2. Parent–Child Hierarchical Gaussian Analysis
3.3. Comparative Experiment
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 165–174.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun. ACM 2021, 65, 99–106.
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 2023, 42, 1–14.
- Li, J.; Feng, Z.; She, Q.; Ding, H.; Wang, C.; Lee, G.H. MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 12578–12588.
- Szymanowicz, S.; Rupprecht, C.; Vedaldi, A. Splatter Image: Ultra-Fast Single-View 3D Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 10208–10217.
- Xu, H.; Peng, S.; Wang, F.; Blum, H.; Barath, D.; Geiger, A.; Pollefeys, M. DepthSplat: Connecting Gaussian Splatting and Depth. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 16453–16463.
- Chen, Y.; Xu, H.; Zheng, C.; Zhuang, B.; Pollefeys, M.; Geiger, A.; Cham, T.-J.; Cai, J. MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 370–386.
- Charatan, D.; Li, S.L.; Tagliasacchi, A.; Sitzmann, V. pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 19457–19467.
- Chan, E.R.; Lin, C.Z.; Chan, M.A.; Nagano, K.; Pan, B.; De Mello, S.; Gallo, O.; Guibas, L.; Tremblay, J.; Khamis, S.; et al. Efficient Geometry-aware 3D Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 16123–16133.
- Shue, J.R.; Chan, E.R.; Po, R.; Ankner, Z.; Wu, J.; Wetzstein, G. 3D Neural Field Generation using Triplane Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 20875–20886.
- Zou, Z.X.; Yu, Z.; Guo, Y.C.; Li, Y.; Liang, D.; Cao, Y.P.; Zhang, S.H. Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 10324–10335.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695.
- Yi, T.; Fang, J.; Wang, J.; Wu, G.; Xie, L.; Zhang, X.; Liu, W.; Tian, Q.; Wang, X. GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 6796–6807.
- Liu, R.; Wu, R.; Van Hoorick, B.; Tokmakov, P.; Zakharov, S.; Vondrick, C. Zero-1-to-3: Zero-shot One Image to 3D Object. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 9298–9309.
- Chan, E.R.; Nagano, K.; Chan, M.A.; Bergman, A.W.; Park, J.J.; Levy, A.; Aittala, M.; De Mello, S.; Karras, T.; Wetzstein, G. Generative Novel View Synthesis with 3D-Aware Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 4217–4229.
- Zhou, Z.; Tulsiani, S. SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 12588–12597.
- Ma, B.; Gao, H.; Deng, H.; Luo, Z.; Huang, T.; Tang, L.; Wang, X. You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 2016–2029.
- Gong, T.; Li, B.; Zhong, Y.; Wang, F. ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image. arXiv 2025, arXiv:2503.23881.
- Li, J.; Tan, H.; Zhang, K.; Xu, Z.; Luan, F.; Xu, Y.; Hong, Y.; Sunkavalli, K.; Shakhnarovich, G.; Bi, S. Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model. arXiv 2023, arXiv:2311.06214.
- Melas-Kyriazi, L.; Rupprecht, C.; Laina, I.; Vedaldi, A. RealFusion: 360° Reconstruction of Any Object from a Single Image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 8446–8455.
- Melas-Kyriazi, L.; Laina, I.; Rupprecht, C.; Neverova, N.; Vedaldi, A.; Gafni, O.; Kokkinos, F. IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation. arXiv 2024, arXiv:2402.08682.
- Shi, Y.; Wang, P.; Ye, J.; Long, M.; Li, K.; Yang, X. MVDream: Multi-view Diffusion for 3D Generation. arXiv 2023, arXiv:2308.16512.
- Zheng, C.; Vedaldi, A. Free3D: Consistent Novel View Synthesis without 3D Representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 9720–9731.
- Blattmann, A.; Dockhorn, T.; Kulal, S.; Mendelevitch, D.; Kilian, M.; Lorenz, D.; Levi, Y.; English, Z.; Voleti, V.; Letts, A.; et al. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv 2023, arXiv:2311.15127.
- Dai, X.; Hou, J.; Ma, C.Y.; Tsai, S.; Wang, J.; Wang, R.; Zhang, P.; Vandenhende, S.; Wang, X.; Dubey, A.; et al. Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack. arXiv 2023, arXiv:2309.15807.
- Girdhar, R.; Singh, M.; Brown, A.; Duval, Q.; Azadi, S.; Rambhatla, S.S.; Shah, A.; Yin, X.; Parikh, D.; Misra, I. Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 205–224.
- Podell, D.; English, Z.; Lacey, K.; Blattmann, A.; Dockhorn, T.; Müller, J.; Penna, J.; Rombach, R. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv 2023, arXiv:2307.01952.
- Rombach, R.; Esser, P.; Ommer, B. Geometry-Free View Synthesis: Transformers and no 3D Prior. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 14356–14366.
- Zwicker, M.; Pfister, H.; Van Baar, J.; Gross, M. EWA Splatting. IEEE Trans. Vis. Comput. Graph. 2002, 8, 223–238.
- Wang, J.; Chen, M.; Karaev, N.; Vedaldi, A.; Rupprecht, C.; Novotny, D. VGGT: Visual Geometry Grounded Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 5294–5306.
- Zhou, T.; Tucker, R.; Flynn, J.; Fyffe, G.; Snavely, N. Stereo Magnification: Learning View Synthesis using Multiplane Images. arXiv 2018, arXiv:1805.09817.
- Belghazi, M.I.; Baratin, A.; Rajeswar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, R.D. MINE: Mutual Information Neural Estimation. arXiv 2018, arXiv:1801.04062.
- Tucker, R.; Snavely, N. Single-View View Synthesis with Multiplane Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–20 June 2020; pp. 551–560.
- Szymanowicz, S.; Insafutdinov, E.; Zheng, C.; Campbell, D.; Henriques, J.F.; Rupprecht, C.; Vedaldi, A. Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image. arXiv 2024, arXiv:2406.04343.
Ablation results (Section 3.2). Percentages are relative changes versus the baseline; for LPIPS, lower is better, so a positive percentage denotes a reduction:

| Design | PSNR/dB↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| Baseline | 24.45 | 0.825 | 0.163 |
| w/ Pseudo-triplane depth prior | 24.86 (+1.67%) | 0.828 (+0.37%) | 0.160 (+1.84%) |
| w/ Parent–child hierarchical Gaussian | 24.01 (−1.79%) | 0.806 (−2.35%) | 0.176 (−7.97%) |
| GC-HG (both components) | 25.15 (+2.86%) | 0.833 (+0.99%) | 0.158 (+3.06%) |
Effect of the pseudo-triplane depth prior on sharpness statistics (Section 3.2.1):

| Configuration | Edge Density↑ | High-Frequency Energy↑ |
|---|---|---|
| w/o Pseudo-triplane depth prior | 0.0307 | 95.28 |
| w/ Pseudo-triplane depth prior | 0.0485 (+57.98%) | 100.96 (+5.96%) |
Comparative results under three target-view settings (Section 3.3):

| Model | PSNR/dB↑ (5 Frames) | SSIM↑ (5 Frames) | LPIPS↓ (5 Frames) | PSNR/dB↑ (10 Frames) | SSIM↑ (10 Frames) | LPIPS↓ (10 Frames) | PSNR/dB↑ (U[−30,30]) | SSIM↑ (U[−30,30]) | LPIPS↓ (U[−30,30]) |
|---|---|---|---|---|---|---|---|---|---|
| SV-MPI [33] | 27.10 | 0.870 | 0.129 | 24.40 | 0.812 | 0.170 | 23.52 | 0.785 | 0.830 |
| MINE [4] | 28.45 | 0.897 | 0.111 | 25.89 | 0.850 | 0.150 | 24.75 | 0.820 | 0.179 |
| Splatter Image [5] | 28.15 | 0.894 | 0.110 | 25.34 | 0.842 | 0.144 | 24.15 | 0.810 | 0.177 |
| Flash3D [34] | 28.46 | 0.899 | 0.100 | 25.94 | 0.857 | 0.133 | 24.93 | 0.833 | 0.160 |
| GC-HG | 28.79 | 0.899 | 0.097 | 26.07 | 0.857 | 0.132 | 25.15 | 0.833 | 0.158 |
Per-scene comparative results under the same three settings (Section 3.3):

| Scene | Method | PSNR/dB↑ (5 Frames) | SSIM↑ (5 Frames) | LPIPS↓ (5 Frames) | PSNR/dB↑ (10 Frames) | SSIM↑ (10 Frames) | LPIPS↓ (10 Frames) | PSNR/dB↑ (U[−30,30]) | SSIM↑ (U[−30,30]) | LPIPS↓ (U[−30,30]) |
|---|---|---|---|---|---|---|---|---|---|---|
| Estate | MINE [4] | 36.12 | 0.958 | 0.028 | 35.24 | 0.946 | 0.043 | 31.62 | 0.933 | 0.051 |
| Estate | Splatter Image [5] | 35.47 | 0.955 | 0.035 | 34.00 | 0.939 | 0.051 | 31.21 | 0.930 | 0.062 |
| Estate | Flash3D [34] | 36.25 | 0.957 | 0.033 | 34.89 | 0.942 | 0.049 | 31.25 | 0.929 | 0.066 |
| Estate | GC-HG | 37.53 | 0.959 | 0.031 | 35.91 | 0.946 | 0.046 | 32.09 | 0.940 | 0.052 |
| Loft | MINE [4] | 23.43 | 0.911 | 0.080 | 22.50 | 0.863 | 0.150 | 20.34 | 0.808 | 0.177 |
| Loft | Splatter Image [5] | 23.26 | 0.910 | 0.089 | 22.52 | 0.864 | 0.149 | 20.70 | 0.810 | 0.185 |
| Loft | Flash3D [34] | 23.33 | 0.908 | 0.092 | 22.83 | 0.861 | 0.144 | 20.09 | 0.798 | 0.187 |
| Loft | GC-HG | 23.79 | 0.916 | 0.074 | 22.96 | 0.870 | 0.147 | 20.90 | 0.821 | 0.173 |
| Kitchen | MINE [4] | 25.85 | 0.906 | 0.054 | 21.99 | 0.840 | 0.084 | 18.94 | 0.746 | 0.156 |
| Kitchen | Splatter Image [5] | 25.76 | 0.906 | 0.053 | 21.79 | 0.837 | 0.085 | 18.68 | 0.738 | 0.152 |
| Kitchen | Flash3D [34] | 25.61 | 0.901 | 0.057 | 21.33 | 0.820 | 0.089 | 18.09 | 0.699 | 0.159 |
| Kitchen | GC-HG | 26.36 | 0.913 | 0.046 | 22.26 | 0.851 | 0.074 | 19.23 | 0.767 | 0.141 |
| Balcony | MINE [4] | 31.67 | 0.944 | 0.027 | 28.88 | 0.922 | 0.044 | 19.86 | 0.800 | 0.203 |
| Balcony | Splatter Image [5] | 31.32 | 0.940 | 0.029 | 28.61 | 0.916 | 0.046 | 19.90 | 0.799 | 0.206 |
| Balcony | Flash3D [34] | 31.35 | 0.939 | 0.032 | 28.52 | 0.915 | 0.048 | 19.32 | 0.790 | 0.210 |
| Balcony | GC-HG | 32.07 | 0.946 | 0.028 | 29.14 | 0.926 | 0.044 | 19.98 | 0.799 | 0.201 |