Relaxing Accurate Initialization for Monocular Dynamic Scene Reconstruction with Gaussian Splatting
Abstract
1. Introduction
- We introduce an optimization approach that relaxes the requirement for an accurate initial point cloud in monocular dynamic scene reconstruction, making Gaussian Splatting usable in situations where obtaining accurate point clouds is challenging.
- We propose an error-based method that separates Gaussians representing static regions from those representing dynamic ones, enabling precise and differentiated control over the two groups.
- Extensive experiments on multiple datasets demonstrate the effectiveness of our method: trained from randomly initialized point clouds, it achieves results comparable to, or better than, methods trained with accurate point clouds.
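As a hedged illustration of the error-based separation idea in the second contribution (the paper's exact criterion is defined in Section 4.1.2; the per-Gaussian error accumulation and the 0.15 activation ratio below are illustrative assumptions, the ratio taken from the ablation in Section 5.4):

```python
import numpy as np

def activate_dynamic(per_gaussian_error: np.ndarray, ratio: float = 0.15) -> np.ndarray:
    """Mark the fraction `ratio` of Gaussians with the largest accumulated
    rendering error as dynamic; the remainder stay static.

    per_gaussian_error: photometric error attributed to each Gaussian,
    accumulated over training views (shape: [N]).
    Returns a boolean mask of shape [N] where True = dynamic.
    """
    n = per_gaussian_error.shape[0]
    k = max(1, int(round(ratio * n)))
    # Indices of the k largest-error Gaussians (unordered top-k selection).
    top_k = np.argpartition(per_gaussian_error, -k)[-k:]
    mask = np.zeros(n, dtype=bool)
    mask[top_k] = True
    return mask
```

Only the dynamic subset would then be passed through the deformation field, while static Gaussians keep fixed positions.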
2. Related Work
2.1. Dynamic Scene Reconstruction
2.2. Initialization for 3DGS
3. Preliminary
3.1. Three-Dimensional Gaussian Splatting
3.2. Deformation for Dynamics
4. Method
4.1. Dynamic Scene Modeling
4.1.1. Initial Scene Modeling as Static
4.1.2. Error-Based Dynamic Activation
4.2. Annealing Jitter Regularization
5. Results
5.1. Implementation Details
Datasets
5.2. Quantitative Results
5.3. Qualitative Results
5.4. Ablation Study
6. Discussion
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 2023, 42, 139.
- Chen, Z.; Wang, F.; Wang, Y.; Liu, H. Text-to-3d using gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21401–21412.
- Guédon, A.; Lepetit, V. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5354–5363.
- Tang, J.; Chen, Z.; Chen, X.; Wang, T.; Zeng, G.; Liu, Z. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2025; pp. 1–18.
- Yu, Z.; Chen, A.; Huang, B.; Sattler, T.; Geiger, A. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19447–19456.
- Park, K.; Sinha, U.; Hedman, P.; Barron, J.T.; Bouaziz, S.; Goldman, D.B.; Martin-Brualla, R.; Seitz, S.M. HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. 2021, 40, 1–12.
- Pumarola, A.; Corona, E.; Pons-Moll, G.; Moreno-Noguer, F. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10318–10327.
- Yan, Z.; Li, C.; Lee, G.H. Nerf-ds: Neural radiance fields for dynamic specular objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8285–8295.
- Snavely, N.; Seitz, S.M.; Szeliski, R. Photo tourism: Exploring photo collections in 3D. In ACM Siggraph 2006 Papers; Association for Computing Machinery: New York, NY, USA, 2006; pp. 835–846.
- Wu, G.; Yi, T.; Fang, J.; Xie, L.; Zhang, X.; Wei, W.; Liu, W.; Tian, Q.; Wang, X. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 20310–20320.
- Bian, W.; Wang, Z.; Li, K.; Bian, J.W.; Prisacariu, V.A. Nope-nerf: Optimising neural radiance field with no pose prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 4160–4169.
- Park, J.; Bui, M.Q.V.; Bello, J.L.G.; Moon, J.; Oh, J.; Kim, M. Splinegs: Robust motion-adaptive spline for real-time dynamic 3d gaussians from monocular video. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 26866–26875.
- Wang, S.; Yang, X.; Shen, Q.; Jiang, Z.; Wang, X. Gflow: Recovering 4d world from monocular video. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 7862–7870.
- Rota Bulò, S.; Porzi, L.; Kontschieder, P. Revising densification in gaussian splatting. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 347–362.
- Yang, Z.; Gao, X.; Zhou, W.; Jiao, S.; Zhang, Y.; Jin, X. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 20331–20341.
- Broxton, M.; Flynn, J.; Overbeck, R.; Erickson, D.; Hedman, P.; Duvall, M.; Dourgarian, J.; Busch, J.; Whalen, M.; Debevec, P. Immersive light field video with a layered mesh representation. ACM Trans. Graph. 2020, 39, 86.
- Collet, A.; Chuang, M.; Sweeney, P.; Gillett, D.; Evseev, D.; Calabrese, D.; Hoppe, H.; Kirk, A.; Sullivan, S. High-quality streamable free-viewpoint video. ACM Trans. Graph. 2015, 34, 69.
- Dou, M.; Davidson, P.; Fanello, S.R.; Khamis, S.; Kowdle, A.; Rhemann, C.; Tankovich, V.; Izadi, S. Motion2fusion: Real-time volumetric performance capture. ACM Trans. Graph. 2017, 36, 246.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
- Cao, A.; Johnson, J. Hexplane: A fast representation for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 130–141.
- Fang, J.; Yi, T.; Wang, X.; Xie, L.; Zhang, X.; Liu, W.; Nießner, M.; Tian, Q. Fast dynamic radiance fields with time-aware neural voxels. In Proceedings of the SIGGRAPH Asia 2022 Conference Papers, Daegu, Republic of Korea, 6–9 December 2022; pp. 1–9.
- Fridovich-Keil, S.; Meanti, G.; Warburg, F.R.; Recht, B.; Kanazawa, A. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12479–12488.
- Li, T.; Slavcheva, M.; Zollhoefer, M.; Green, S.; Lassner, C.; Kim, C.; Schmidt, T.; Lovegrove, S.; Goesele, M.; Newcombe, R.; et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5521–5531.
- Park, S.; Son, M.; Jang, S.; Ahn, Y.C.; Kim, J.Y.; Kang, N. Temporal interpolation is all you need for dynamic neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 4212–4221.
- Wang, F.; Tan, S.; Li, X.; Tian, Z.; Song, Y.; Liu, H. Mixed neural voxels for fast multi-view video synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 19706–19716.
- Attal, B.; Huang, J.B.; Richardt, C.; Zollhoefer, M.; Kopf, J.; O’Toole, M.; Kim, C. Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16610–16620.
- Park, K.; Sinha, U.; Barron, J.T.; Bouaziz, S.; Goldman, D.B.; Seitz, S.M.; Martin-Brualla, R. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5865–5874.
- Song, L.; Chen, A.; Li, Z.; Chen, Z.; Chen, L.; Yuan, J.; Xu, Y.; Geiger, A. Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2732–2742.
- Liang, Y.; Khan, N.; Li, Z.; Nguyen-Phuoc, T.; Lanman, D.; Tompkin, J.; Xiao, L. Gaufre: Gaussian deformation fields for real-time dynamic novel view synthesis. arXiv 2023, arXiv:2312.11458.
- Jung, J.; Han, J.; An, H.; Kang, J.; Park, S.; Kim, S. Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting. arXiv 2024, arXiv:2403.09413.





| Metric | TiNeuVox | HyperNeRF | NeRF-DS | DeformableGS (SfM 1) | DeformableGS (Random) | DeformableGS + Our Strategy (Random) | 4DGS (SfM 2) | 4DGS (SfM 1) | 4DGS (Random) | 4DGS + Our Strategy (Random) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSNR | 21.61 | 23.45 | 23.60 | 24.11 | 23.75 | 23.89 | 23.47 | 21.01 | 20.10 | 22.67 |
| LPIPS | 0.2766 | 0.1990 | 0.1816 | 0.1769 | 0.1859 | 0.1901 | 0.1651 | 0.2887 | 0.3417 | 0.2259 |
| SSIM | 0.8234 | 0.8488 | 0.8494 | 0.8524 | 0.8461 | 0.8491 | 0.8288 | 0.7243 | 0.6683 | 0.8203 |
| Method | Init. Points | PSNR (3D Printer) | LPIPS (3D Printer) | SSIM (3D Printer) | PSNR (Broom) | LPIPS (Broom) | SSIM (Broom) |
|---|---|---|---|---|---|---|---|
| DeformableGS | SfM 1 | 20.78 | 0.2846 | 0.6579 | 20.03 | 0.7006 | 0.2679 |
| DeformableGS | Random | 20.36 | 0.3871 | 0.6165 | 19.56 | 0.8325 | 0.2351 |
| DeformableGS + Our Strategy | Random | 20.43 | 0.3137 | 0.6341 | 20.29 | 0.4074 | 0.3742 |
| 4DGS | SfM 2 | 21.94 | 0.3227 | 0.7084 | 22.24 | 0.5451 | 0.3858 |
| 4DGS | SfM 1 | 21.75 | 0.3144 | 0.6792 | 21.13 | 0.5802 | 0.3393 |
| 4DGS | Random | 20.70 | 0.3915 | 0.6745 | 20.46 | 0.5821 | 0.3169 |
| 4DGS + Our Strategy | Random | 21.63 | 0.3203 | 0.7036 | 21.46 | 0.5693 | 0.3483 |

| Method | Init. Points | PSNR | LPIPS | SSIM | PSNR | LPIPS | SSIM |
|---|---|---|---|---|---|---|---|
| DeformableGS | SfM 1 | 23.16 | 0.2286 | 0.6273 | 25.96 | 0.1603 | 0.8502 |
| DeformableGS | Random | 23.13 | 0.2322 | 0.6279 | 21.10 | 0.2502 | 0.7462 |
| DeformableGS + Our Strategy | Random | 23.26 | 0.3063 | 0.6533 | 25.04 | 0.1751 | 0.8136 |
| 4DGS | SfM 2 | 28.83 | 0.2771 | 0.8158 | 28.69 | 0.1842 | 0.8666 |
| 4DGS | SfM 1 | 27.79 | 0.2566 | 0.7855 | 22.72 | 0.2805 | 0.7439 |
| 4DGS | Random | 27.11 | 0.3164 | 0.7152 | 24.52 | 0.2617 | 0.7721 |
| 4DGS + Our Strategy | Random | 27.50 | 0.2729 | 0.7732 | 26.54 | 0.2369 | 0.8620 |
| Activation Ratio | Activation Epoch | Reg. | PSNR | LPIPS | SSIM |
|---|---|---|---|---|---|
| 0.1 | 5000 | ✗ | 22.64 | 0.2212 | 0.8284 |
| 0.2 | 5000 | ✗ | 22.82 | 0.2285 | 0.8241 |
| 0.25 | 5000 | ✗ | 22.41 | 0.2351 | 0.8192 |
| 0.15 | 3000 | ✗ | 21.34 | 0.2563 | 0.7891 |
| 0.15 | 4000 | ✗ | 23.16 | 0.2170 | 0.8185 |
| 0.15 | 6000 | ✗ | 19.91 | 0.2922 | 0.7666 |
| 0.15 | 7000 | ✗ | 22.18 | 0.2291 | 0.8204 |
| 0.15 | 5000 | ✗ | 23.20 | 0.2175 | 0.8329 |
| 0.15 | 5000 | ✓ | 23.89 | 0.1901 | 0.8491 |
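The "Reg." column refers to the annealing jitter regularization of Section 4.2, which improves all three metrics at the best activation setting. As a hedged sketch of one natural reading of the name, position noise whose magnitude anneals to zero over training (the linear schedule and `init_scale` below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def annealed_jitter(positions: np.ndarray, step: int, total_steps: int,
                    init_scale: float = 0.01, rng=None) -> np.ndarray:
    """Perturb Gaussian centers with noise whose standard deviation decays
    linearly to zero, encouraging exploration early in training while
    leaving positions untouched by the final iterations."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Linear annealing: full jitter at step 0, none at total_steps.
    scale = init_scale * max(0.0, 1.0 - step / total_steps)
    return positions + rng.normal(0.0, scale, size=positions.shape)
```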
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, X.; Chen, J.; Xing, W.; Lin, H.; Zhao, L. Relaxing Accurate Initialization for Monocular Dynamic Scene Reconstruction with Gaussian Splatting. Appl. Sci. 2026, 16, 1321. https://doi.org/10.3390/app16031321

