From 2D to 3D: A Generative Model from Single Image to Digital 3D of Chinese Three Gorges Cultural Relics
Abstract
1. Introduction
- We present a generative single-image-to-3D reconstruction framework tailored for cultural relic digitization, enabling the recovery of structurally coherent 3D models from a single RGB image under unconstrained conditions.
- We adopt a transformer-based image-to-triplane representation that effectively captures both global structure and fine-grained visual details, and decode it into an implicit volumetric representation for high-quality geometry synthesis.
- We demonstrate the effectiveness of the proposed framework on a dataset of Chinese Three Gorges cultural relics, showing superior reconstruction accuracy, surface completeness, and visual consistency compared with existing single-image and multi-view baselines.
- This study provides a scalable and practical solution for heritage digitization from limited visual data, contributing to the broader application of generative 3D modeling techniques in cultural heritage preservation and digital restoration.
2. Related Work
2.1. Neural Implicit Representations for 3D Reconstruction
2.2. Single-Image 3D Reconstruction: From Optimization to Feed-Forward Models
2.3. Triplane and Hybrid 3D Representations
2.4. Vision Transformers and Self-Supervised Learning
2.5. Digital Documentation and Traditional Reconstruction
3. Methodology
3.1. Task Definition and Motivation
3.2. Implicit 3D Representation
3.3. Image Encoder
3.4. Image-to-Triplane Decoder
3.5. Triplane Feature Sampling
3.6. Radiance Field Prediction
3.7. Camera-Agnostic Rendering
3.8. Surface Extraction and Texture Generation
4. Experimental Results
4.1. Dataset and Setup
| Module | Parameter | Value |
|---|---|---|
| Image Tokenizer | image resolution | |
| | patch size | 16 |
| | attention layers | 12 |
| | feature channels | 768 |
| Triplane Tokenizer | tokens | |
| | channels | 16 |
| Backbone | channels | 1024 |
| | attention layers | 16 |
| | attention heads | 16 |
| | attention head dim | 64 |
| | cross attention dim | 768 |
| Triplane Upsampler | factor | 2 |
| | input channels | 1024 |
| | output channels | 40 |
| | output shape | |
| NeRF MLP | width | 64 |
| | layers | 10 |
| | activation | SiLU |
| Renderer | samples per ray | 128 |
| | radius | 0.87 |
| | density activation | exp |
| | density bias | −1.0 |
| Training | learning rate | |
| | optimizer | AdamW |
| | weight decay | 0.05 |
| | lr scheduler | CosineAnnealingLR |
| | warm-up steps | 2000 |
| | batch size | 64 |
| | total epochs | 40 |
| | Training Time | ∼46 h |
| | total params | ∼438 M |
| | FLOPs | ∼156 G |
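
The training rows of the table name a cosine-annealing schedule with 2000 warm-up steps. The following is a minimal sketch of such a schedule in plain Python, not the authors' training code. The linear shape of the warm-up is an assumption, and `PEAK_LR`, `TOTAL_STEPS`, and `MIN_LR` are hypothetical placeholders, since the learning-rate value and the total step count are not given in the recovered table.

```python
import math

WARMUP_STEPS = 2000     # from the table
PEAK_LR = 1e-4          # hypothetical: the table's learning-rate value was not recovered
TOTAL_STEPS = 100_000   # hypothetical: depends on dataset size, batch size, and epochs
MIN_LR = 0.0            # assumption: anneal all the way to zero


def learning_rate(step: int) -> float:
    """Linear warm-up to PEAK_LR, then cosine annealing down to MIN_LR."""
    if step < WARMUP_STEPS:
        # Ramp linearly so that the last warm-up step reaches PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the post-warm-up phase that has elapsed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch setup the same shape would typically come from combining a warm-up scheduler with `torch.optim.lr_scheduler.CosineAnnealingLR`; the standalone function here only makes the curve explicit.
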

Algorithm 1: From 2D to 3D: A Generative Model from Single Image to Digital 3D of Chinese Three Gorges Cultural Relics [algorithm listing: image not available]
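
The renderer rows of the hyperparameter table (128 samples per ray, `exp` density activation, density bias −1.0) can be read against the standard NeRF quadrature rule. The sketch below, in NumPy, is an illustration under that assumption rather than the authors' renderer: the function name `render_ray`, the way the bias is added before the activation, and the ray length of twice the radius 0.87 are all hypothetical choices.

```python
import numpy as np

SAMPLES_PER_RAY = 128   # from the table
DENSITY_BIAS = -1.0     # from the table
RADIUS = 0.87           # from the table


def render_ray(raw_density, rgb, t_vals):
    """Composite per-sample colours along one ray (standard NeRF quadrature).

    raw_density: (N,) unactivated density outputs of the MLP
    rgb:         (N, 3) per-sample colours in [0, 1]
    t_vals:      (N + 1,) increasing interval boundaries along the ray
    """
    # Assumption: the bias is added before the exp activation.
    sigma = np.exp(raw_density + DENSITY_BIAS)
    delta = np.diff(t_vals)                      # (N,) interval lengths
    alpha = 1.0 - np.exp(-sigma * delta)         # opacity of each interval
    # Transmittance: fraction of light that reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    colour = (weights[:, None] * rgb).sum(axis=0)
    return colour, weights


# Demo with random inputs; the ray is assumed to span the bounding sphere diameter.
rng = np.random.default_rng(0)
t_vals = np.linspace(0.0, 2.0 * RADIUS, SAMPLES_PER_RAY + 1)
colour, weights = render_ray(
    rng.normal(size=SAMPLES_PER_RAY),
    rng.uniform(size=(SAMPLES_PER_RAY, 3)),
    t_vals,
)
```

The weights are non-negative and sum to at most one, so the composited colour stays within the range of the per-sample colours.
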
4.2. Qualitative Evaluation
4.3. Quantitative Evaluation
4.4. Failure Case Analysis
- Case 1: The object is a pottery jar, but the photo was taken from above the jar’s mouth, so the model misinterpreted it as a pottery bowl.
- Case 2: The object is a four-legged pottery vessel, but the photo did not capture all four legs, so the model’s output was a discontinuous four-legged vessel.
- Case 3: The object is a clay figurine, but the photo was taken from a fully top-down, orthographic-like view that did not capture the figurine’s thickness. As a result, the model collapsed the shape into a figurine of uniform thickness.
- Case 4: The object is a bronze mirror, smooth on one side and decorated on the other. Because of the model’s limited texture perception, the decorative patterns on the back were lost and the smooth face was not reproduced.
- Case 5: The object is a jade buckle. The model reconstructed part of the outline and shape, but did not effectively convey the smooth texture of the jade.
4.5. Ablation Study
5. Discussion
5.1. Implications for Cultural Heritage Preservation
5.2. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Del Giudice, M.; Osello, A. BIM for cultural heritage. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, 40, 225–229.
2. Murphy, M.; McGovern, E.; Pavia, S. Historic building information modelling (HBIM). Struct. Surv. 2009, 27, 311–327.
3. Remondino, F.; Rizzi, A. Reality-based 3D documentation of natural and cultural heritage sites—techniques, problems, and examples. Appl. Geomat. 2010, 2, 85–100.
4. Luhmann, T.; Robson, S.; Kyle, S.; Boehm, J. Close-Range Photogrammetry and 3D Imaging; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2023.
5. Vosselman, G.; Maas, H.G. Airborne and Terrestrial Laser Scanning; Whittles Publishing: Dunbeath, UK, 2010.
6. Salvi, J.; Fernandez, S.; Pribanic, T.; Llado, X. A state of the art in structured light patterns for surface profilometry. Pattern Recognit. 2010, 43, 2666–2680.
7. Westoby, M.J.; Brasington, J.; Glasser, N.F.; Hambrey, M.J.; Reynolds, J.M. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 2012, 179, 300–314.
8. Kersten, T.P.; Lindstaedt, M. Image-based low-cost systems for automatic 3D recording and modelling of archaeological finds and objects. In Proceedings of the Euro-Mediterranean Conference; Springer: Berlin/Heidelberg, Germany, 2012; pp. 1–10.
9. Gallo, G.; Stanco, F.; Battiato, S. Digital Imaging for Cultural Heritage Preservation; CRC Press: Boca Raton, FL, USA, 2011.
10. Wang, N.; Zhang, Y.; Li, Z.; Fu, Y.; Liu, W.; Jiang, Y.G. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 52–67.
11. Kato, H.; Ushiku, Y.; Harada, T. Neural 3d mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 3907–3916.
12. Pintus, R.; Pal, K.; Yang, Y.; Weyrich, T.; Gobbetti, E.; Rushmeier, H. A survey of geometric analysis in cultural heritage. In Proceedings of the Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2016; Volume 35, pp. 4–31.
13. Barsanti, S.G.; Remondino, F.; Visintini, D. Photogrammetry and Laser Scanning for archaeological site 3D modeling–Some critical issues. In Proceedings of the 2nd Workshop on ‘The New Technologies for Aquileia’, Aquileia, Italy, 25 June 2012; Roberto, V., Fozzati, L., Eds.; Volume 1, pp. 1–10.
14. Chen, Z.; Zhang, H. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 5939–5948.
15. Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 165–174.
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
17. Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
18. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
19. Oechsle, M.; Peng, S.; Geiger, A. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 5589–5599.
20. Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 4460–4470.
21. Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 5855–5864.
22. Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 5470–5479.
23. Lin, C.H.; Ma, W.C.; Torralba, A.; Lucey, S. Barf: Bundle-adjusting neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 5741–5751.
24. Wang, Z.; Wu, S.; Xie, W.; Chen, M.; Prisacariu, V.A. NeRF--: Neural Radiance Fields Without Known Camera Parameters. arXiv 2021, arXiv:2102.07064.
25. Verbin, D.; Hedman, P.; Mildenhall, B.; Zickler, T.; Barron, J.T.; Srinivasan, P.P. Ref-nerf: Structured view-dependent appearance for neural radiance fields. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 9426–9437.
26. Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 7210–7219.
27. Tewari, A.; Fried, O.; Thies, J.; Sitzmann, V.; Lombardi, S.; Sunkavalli, K.; Martin-Brualla, R.; Simon, T.; Saragih, J.; Nießner, M.; et al. State of the art on neural rendering. In Proceedings of the Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2020; Volume 39, pp. 701–727.
28. Xiao, W.; Cruz, R.S.; Ahmedt-Aristizabal, D.; Salvado, O.; Fookes, C.; Lebrat, L. Nerf director: Revisiting view selection in neural volume rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 20742–20751.
29. Prados, E.; Faugeras, O. Shape from shading. In Handbook of Mathematical Models in Computer Vision; Springer: Berlin/Heidelberg, Germany, 2006; pp. 375–388.
30. Blanz, V.; Vetter, T. A morphable model for the synthesis of 3D faces. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2; Association for Computing Machinery: New York, NY, USA, 2023; pp. 157–164.
31. Anguelov, D.; Srinivasan, P.; Koller, D.; Thrun, S.; Rodgers, J.; Davis, J. Scape: Shape completion and animation of people. In ACM Siggraph 2005 Papers; Association for Computing Machinery: New York, NY, USA, 2005; pp. 408–416.
32. Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 628–644.
33. Fan, H.; Su, H.; Guibas, L.J. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2017; pp. 605–613.
34. Xie, H.; Yao, H.; Sun, X.; Zhou, S.; Zhang, S. Pix2vox: Context-aware 3d reconstruction from single and multi-view images. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2019; pp. 2690–2698.
35. Hong, Y.; Zhang, K.; Gu, J.; Bi, S.; Zhou, Y.; Liu, D.; Liu, F.; Sunkavalli, K.; Bui, T.; Tan, H. Lrm: Large reconstruction model for single image to 3d. arXiv 2023, arXiv:2311.04400.
36. Tochilkin, D.; Pankratz, D.; Liu, Z.; Huang, Z.; Letts, A.; Li, Y.; Liang, D.; Laforte, C.; Jampani, V.; Cao, Y.P. Triposr: Fast 3d object reconstruction from a single image. arXiv 2024, arXiv:2403.02151.
37. Li, J.; Tan, H.; Zhang, K.; Xu, Z.; Luan, F.; Xu, Y.; Hong, Y.; Sunkavalli, K.; Shakhnarovich, G.; Bi, S. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv 2023, arXiv:2311.06214.
38. Huang, Z.; Stojanov, S.; Thai, A.; Jampani, V.; Rehg, J.M. Zeroshape: Regression-based zero-shot shape reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 10061–10071.
39. Wang, Z.; Wang, Y.; Chen, Y.; Xiang, C.; Chen, S.; Yu, D.; Li, C.; Su, H.; Zhu, J. Crm: Single image to 3d textured mesh with convolutional reconstruction model. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 57–74.
40. Wei, X.; Zhang, K.; Bi, S.; Tan, H.; Luan, F.; Deschaintre, V.; Sunkavalli, K.; Su, H.; Xu, Z. Meshlrm: Large reconstruction model for high-quality meshes. arXiv 2024, arXiv:2404.12385.
41. Deitke, M.; Schwenk, D.; Salvador, J.; Weihs, L.; Michel, O.; VanderBilt, E.; Schmidt, L.; Ehsani, K.; Kembhavi, A.; Farhadi, A. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 13142–13153.
42. Collins, J.; Goel, S.; Deng, K.; Luthra, A.; Xu, L.; Gundogdu, E.; Zhang, X.; Vicente, T.F.Y.; Dideriksen, T.; Arora, H.; et al. Abo: Dataset and benchmarks for real-world 3d object understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 21126–21136.
43. Barrile, V.; Bilotta, G.; Lamari, D. 3D models of Cultural Heritage. Int. J. Math. Model. Methods Appl. Sci. 2017, 11, 1–8.
44. Tatarchenko, M.; Dosovitskiy, A.; Brox, T. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 2088–2096.
45. Chan, E.R.; Lin, C.Z.; Chan, M.A.; Nagano, K.; Pan, B.; De Mello, S.; Gallo, O.; Guibas, L.J.; Tremblay, J.; Khamis, S.; et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 16123–16133.
46. Gu, J.; Liu, L.; Wang, P.; Theobalt, C. Stylenerf: A style-based 3d-aware generator for high-resolution image synthesis. arXiv 2021, arXiv:2110.08985.
47. Zou, Z.X.; Yu, Z.; Guo, Y.C.; Li, Y.; Liang, D.; Cao, Y.P.; Zhang, S.H. Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 10324–10335.
48. Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139-1.
49. Kania, K.; Yi, K.M.; Kowalski, M.; Trzciński, T.; Tagliasacchi, A. Conerf: Controllable neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 18623–18632.
50. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 10012–10022.
51. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 12009–12019.
52. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255.
53. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 9650–9660.
54. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. Dinov2: Learning robust visual features without supervision. arXiv 2023, arXiv:2304.07193.
55. Siméoni, O.; Vo, H.V.; Seitzer, M.; Baldassarre, F.; Oquab, M.; Jose, C.; Khalidov, V.; Szafraniec, M.; Yi, S.; Ramamonjisoa, M.; et al. Dinov3. arXiv 2025, arXiv:2508.10104.
56. Wang, J.; Chen, M.; Karaev, N.; Vedaldi, A.; Rupprecht, C.; Novotny, D. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference; IEEE: Piscataway, NJ, USA, 2025; pp. 5294–5306.
57. Remondino, F. Heritage recording and 3D modeling with photogrammetry and 3D scanning. Remote Sens. 2011, 3, 1104–1138.
58. Levoy, M.; Pulli, K.; Curless, B.; Rusinkiewicz, S.; Koller, D.; Pereira, L.; Ginzton, M.; Anderson, S.; Davis, J.; Ginsberg, J.; et al. The digital Michelangelo project: 3D scanning of large statues. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques; ACM Press: New York, NY, USA, 2000; pp. 131–144.
59. Schonberger, J.L.; Frahm, J.M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 4104–4113.
60. Lorensen, W.E.; Cline, H.E. Marching cubes: A high resolution 3D surface construction algorithm. In Seminal Graphics: Pioneering Efforts that Shaped the Field; Association for Computing Machinery: New York, NY, USA, 1998; pp. 347–353.
61. Liu, M.; Xu, C.; Jin, H.; Chen, L.; Varma T, M.; Xu, Z.; Su, H. One-2-3-45: Any single image to 3d mesh in 45 s without per-shape optimization. Adv. Neural Inf. Process. Syst. 2023, 36, 22226–22246.
62. Tang, J.; Chen, Z.; Chen, X.; Wang, T.; Zeng, G.; Liu, Z. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 1–18.
63. Xu, J.; Cheng, W.; Gao, Y.; Wang, X.; Gao, S.; Shan, Y. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models. arXiv 2024, arXiv:2404.07191.
64. Kasula, V.K.; Yenugula, M.; Konda, B.; Yadulla, A.R.; Tumma, C.; Rakki, S.B. Federated learning with secure aggregation for privacy-preserving deep learning in IoT environments. In Proceedings of the 2025 IEEE Conference on Computer Applications (ICCA); IEEE: Piscataway, NJ, USA, 2025; pp. 1–7.






| Methods | CD↓ | FS@0.1↑ | FS@0.2↑ | FS@0.5↑ |
|---|---|---|---|---|
| One-2-3-45 [61] | 0.378 | 0.299 | 0.597 | 0.776 |
| ZeroShape [38] | 0.223 | 0.423 | 0.665 | 0.809 |
| TGS [47] | 0.188 | 0.579 | 0.731 | 0.826 |
| LGM [62] | 0.263 | 0.445 | 0.583 | 0.671 |
| InstantMesh [63] | 0.177 | 0.606 | 0.752 | 0.833 |
| CRM [39] | 0.252 | 0.561 | 0.701 | 0.787 |
| LRM [35] | 0.177 | 0.599 | 0.755 | 0.831 |
| VGGT [56] | 0.170 | 0.607 | 0.740 | 0.838 |
| Ours | 0.163 * | 0.622 * | 0.756 * | 0.848 * |
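
The columns of the table above report Chamfer distance (CD, lower is better) and F-score at three distance thresholds (FS@0.1, FS@0.2, FS@0.5, higher is better). As a rough illustration of how such metrics are commonly computed on sampled point clouds, here is a brute-force NumPy sketch. The exact convention used in the paper is not stated here, so the choices below (unsquared Euclidean distances, mean over points, sum of both directions) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def nearest_distances(a, b):
    """For each point in a (N, 3), distance to its nearest neighbour in b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]            # (N, M, 3) pairwise offsets
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean pred-to-gt plus mean gt-to-pred distance."""
    return nearest_distances(pred, gt).mean() + nearest_distances(gt, pred).mean()


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = (nearest_distances(pred, gt) < tau).mean()   # predicted points near GT
    recall = (nearest_distances(gt, pred) < tau).mean()      # GT points near prediction
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


# Tiny example: the prediction is the ground truth shifted by 0.3 along x.
gt = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
pred = gt + np.array([0.3, 0.0, 0.0])
cd = chamfer_distance(pred, gt)
fs_tight, fs_loose = f_score(pred, gt, 0.1), f_score(pred, gt, 0.5)
```

The pairwise distance matrix costs O(NM) memory, so for dense clouds a KD-tree lookup would be the usual replacement.
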

| Configuration | CD↓ | FS@0.1↑ | FS@0.2↑ | FS@0.5↑ |
|---|---|---|---|---|
| w/o DINO | 0.189 | 0.581 | 0.715 | 0.808 |
| w/o Camera Estimation | 0.215 | 0.542 | 0.678 | 0.789 |
| w/o Triplane Resolution | 0.295 | 0.575 | 0.705 | 0.801 |
| Full Model | 0.163 | 0.622 | 0.756 | 0.848 |

| Triplane Resolution | Inference Time (s) | Peak VRAM (GB) | CD↓ |
|---|---|---|---|
| 16 × 16 | 0.8 | 4.2 | 0.295 |
| 32 × 32 (Ours) | 1.6 | 6.4 | 0.163 |
| 64 × 64 | 3.5 | 10.8 | 0.153 |
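
To make the trade-off in the table above concrete, the short Python sketch below computes, for each doubling of the triplane resolution, the ratio of inference time and peak VRAM and the relative reduction in CD. The numbers are copied from the table; the helper name `step_costs` is hypothetical and the calculation is plain arithmetic, not part of the authors' pipeline.

```python
# (resolution, inference time in s, peak VRAM in GB, CD) copied from the table above
ROWS = [
    (16, 0.8, 4.2, 0.295),
    (32, 1.6, 6.4, 0.163),
    (64, 3.5, 10.8, 0.153),
]


def step_costs(rows):
    """Compare each resolution with the next one up."""
    out = []
    for (r0, t0, m0, c0), (r1, t1, m1, c1) in zip(rows, rows[1:]):
        out.append({
            "step": f"{r0} -> {r1}",
            "time_ratio": t1 / t0,                      # how much slower
            "vram_ratio": m1 / m0,                      # how much more memory
            "cd_reduction_pct": 100.0 * (c0 - c1) / c0, # relative CD improvement
        })
    return out


costs = step_costs(ROWS)
for row in costs:
    print(row)
```

Going from 16 × 16 to 32 × 32 doubles inference time and cuts CD by roughly 45%, whereas going from 32 × 32 to 64 × 64 costs about 2.2 times the inference time and 1.7 times the VRAM for a CD reduction of roughly 6%.
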
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, G.; Ge, M.; Wang, Y.; Chen, Y.; Liu, L. From 2D to 3D: A Generative Model from Single Image to Digital 3D of Chinese Three Gorges Cultural Relics. Appl. Sci. 2026, 16, 2678. https://doi.org/10.3390/app16062678


