Three-Dimensional Gaussian Style Transfer Method Based on Two-Dimensional Priors and Iterative Optimization
Abstract
1. Introduction
- Real-time 3D style transfer via 3D Gaussian representation: We adopt 3D Gaussians as the base representation for stylization, achieving fast rendering and interactive feedback that meets the demands of real-time applications.
- Cross-domain guidance using 2D style models and diffusion priors: By incorporating 2D neural style transfer models and Stable Diffusion guidance, we provide strong priors that enhance visual fidelity and stylistic coherence throughout the 3D scene.
- An iterative optimization framework for gradual and consistent stylization: We introduce a multi-stage training strategy that allows for progressive style injection under multi-view constraints, leading to improved 3D consistency and perceptual quality.
- A new direction for 3D stylization research: Our framework demonstrates that 2D priors and diffusion models can be effectively leveraged to enhance 3D scene stylization, opening new possibilities for integrating generative models with graphics pipelines.
2. Related Works
2.1. Two-Dimensional Image Style Transfer
2.2. Three-Dimensional Gaussian Scene Style Transfer
3. Method
3.1. Overview
3.2. Optimization of Three-Dimensional Gaussians
3.3. Diffusion Prior-Based 3D Gaussian Optimization
3.4. Loss Function
4. Experiment
4.1. Datasets
4.2. Implementation Details
- Randomize the order of camera views whose supervision images will be replaced.
- Render the 3D Gaussians from the selected views and apply 2D style transfer to the rendered images using the style image.
- Replace the corresponding supervision images with their stylized counterparts.
- Compute the loss between the rendered and supervision images and optimize the 3D Gaussians via backpropagation (see the first sketch after this list).
- Generate new camera poses and render novel views with the optimized 3D Gaussians.
- Add noise to these renderings and use a pre-trained diffusion model to predict it; the difference between the predicted and added noise further refines the 3D Gaussians (see the second sketch after this list).
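To make the first stage concrete, here is a minimal PyTorch-style sketch of one round of stylized-supervision optimization. The helpers `render_gaussians` (differentiable rasterization of the Gaussian scene from a camera) and `stylize_2d` (any 2D style transfer network) are assumed stand-ins, not the paper's actual code, and the L1 loss is a placeholder for the loss function of Section 3.4.

```python
# Sketch of one round of the iterative stylized-supervision loop
# (Section 4.2). `render_gaussians`, `stylize_2d`, and the structure of
# `gaussians` are hypothetical stand-ins for the paper's implementation.
import random
import torch
import torch.nn.functional as F

def stylization_round(gaussians, cameras, supervision_images, style_image,
                      render_gaussians, stylize_2d, optimizer,
                      views_per_round=4):
    """Re-stylize a few supervision views, then fit the Gaussians to them."""
    # 1. Randomize which camera views get their supervision image replaced.
    selected = random.sample(range(len(cameras)), k=views_per_round)
    for i in selected:
        with torch.no_grad():  # targets must not carry gradients
            # 2. Render from the selected view and stylize the rendering.
            rendered = render_gaussians(gaussians, cameras[i])
            # 3. Replace the supervision image with the stylized result.
            supervision_images[i] = stylize_2d(rendered, style_image)

    # 4. Optimize the Gaussians against the (partially stylized) targets.
    #    L1 here is a placeholder for the paper's actual loss.
    loss = 0.0
    for cam, target in zip(cameras, supervision_images):
        loss = loss + F.l1_loss(render_gaussians(gaussians, cam), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```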
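The diffusion refinement step follows the score-distillation pattern of DreamFusion: noise a novel-view rendering, let the pre-trained diffusion model predict that noise, and push the difference back into the Gaussians as a gradient. The sketch below assumes `unet`, `alphas_cumprod`, and `sample_novel_camera` as hypothetical names; a real Stable Diffusion backbone would additionally involve a latent encoder and text conditioning, omitted here for brevity.

```python
# Sketch of one SDS-style refinement step on a novel view. `unet(x, t)`,
# `alphas_cumprod`, and `sample_novel_camera` are assumed interfaces,
# not the paper's or any specific library's actual API.
import torch

def sds_refine_step(gaussians, render_gaussians, sample_novel_camera,
                    unet, alphas_cumprod, optimizer, guidance_scale=1.0):
    # Render a freshly sampled novel view with gradients enabled.
    cam = sample_novel_camera()
    image = render_gaussians(gaussians, cam)   # (1, 3, H, W) in [0, 1]
    x0 = image * 2.0 - 1.0                     # diffusion models expect [-1, 1]

    # Pick a random timestep and add the matching amount of noise.
    t = torch.randint(20, 980, (1,), device=x0.device)
    noise = torch.randn_like(x0)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * noise

    # Predicted-minus-added noise acts as the refinement gradient; it is
    # injected directly into x0, bypassing the UNet's backward pass.
    with torch.no_grad():
        noise_pred = unet(x_t, t)
    grad = guidance_scale * (1.0 - a_t) * (noise_pred - noise)

    optimizer.zero_grad()
    x0.backward(gradient=grad)  # gradients flow into the Gaussian parameters
    optimizer.step()
```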
4.3. Qualitative Comparison
4.4. Quantitative Comparison
4.5. User Study
4.6. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Johnson, J.; Alahi, A.; Li, F.-F. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 694–711.
- Park, D.Y.; Lee, K.H. Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5873–5881.
- Chen, J.; Xing, W.; Sun, J.; Chu, T.; Huang, Y.; Ji, B.; Zhao, L.; Lin, H.; Chen, H.; Wang, Z. PNeSM: Arbitrary 3D scene stylization via prompt-based neural style mapping. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 1091–1099.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 26th Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.H. Universal style transfer via feature transforms. In Proceedings of the 30th Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510.
- Zhang, Y.; Huang, N.; Tang, F.; Huang, H.; Ma, C.; Dong, W.; Xu, C. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 10146–10156.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: Cambridge, MA, USA, 2021; pp. 8748–8763.
- Wang, Z.; Zhao, L.; Xing, W. StyleDiffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 7677–7689.
- Huang, H.P.; Tseng, H.Y.; Saini, S.; Singh, M.; Yang, M.H. Learning to stylize novel views. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13869–13878.
- Chen, S.; Chen, X.; Pang, A.; Zeng, X.; Cheng, W.; Fu, Y.; Yin, F.; Wang, B.; Yu, J.; Yu, G.; et al. MeshXL: Neural coordinate field for generative 3D foundation models. Adv. Neural Inf. Process. Syst. 2024, 37, 97141–97166.
- Yang, G.; Huang, X.; Hao, Z.; Liu, M.Y.; Belongie, S.; Hariharan, B. PointFlow: 3D point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4541–4550.
- Liu, K.; Zhan, F.; Chen, Y.; Zhang, J.; Yu, Y.; El Saddik, A.; Lu, S.; Xing, E.P. StyleRF: Zero-shot 3D style transfer of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8338–8348.
- Zhang, Y.; He, Z.; Xing, J.; Yao, X.; Jia, J. Ref-NPR: Reference-based non-photorealistic radiance fields for controllable scene stylization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 4242–4251.
- Liu, K.; Zhan, F.; Xu, M.; Theobalt, C.; Shao, L.; Lu, S. StyleGaussian: Instant 3D style transfer with Gaussian splatting. In Proceedings of the SIGGRAPH Asia 2024 Technical Communications, Tokyo, Japan, 3–6 December 2024; pp. 1–4.
- Galerne, B.; Wang, J.; Raad, L.; Morel, J.M. SGSST: Scaling Gaussian splatting style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2025; pp. 26535–26544.
- Jain, S.; Kuthiala, A.; Sethi, P.S.; Saxena, P. StyleSplat: 3D object style transfer with Gaussian splatting. arXiv 2024, arXiv:2407.09473.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695.
- Xiong, H. SparseGS: Real-Time 360° Sparse View Synthesis Using Gaussian Splatting; University of California: Los Angeles, CA, USA, 2024.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Poole, B.; Jain, A.; Barron, J.T.; Mildenhall, B. DreamFusion: Text-to-3D using 2D diffusion. arXiv 2022, arXiv:2209.14988.
- Mildenhall, B.; Srinivasan, P.P.; Ortiz-Cayon, R.; Kalantari, N.K.; Ramamoorthi, R.; Ng, R.; Kar, A. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph. 2019, 38, 1–14.
- Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. 2017, 36, 1–13.
- Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5470–5479.
- Schönberger, J.L.; Zheng, E.; Frahm, J.M.; Pollefeys, M. Pixelwise view selection for unstructured multi-view stereo. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 501–518.
- Cheng, K.; Long, X.; Yang, K.; Yao, Y.; Yin, W.; Ma, Y.; Wang, W.; Chen, X. GaussianPro: 3D Gaussian splatting with progressive propagation. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Zhang, K.; Kolkin, N.; Bi, S.; Luan, F.; Xu, Z.; Shechtman, E.; Snavely, N. ARF: Artistic radiance fields. In Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 717–733.
- Huang, Y.H.; He, Y.; Yuan, Y.J.; Lai, Y.K.; Gao, L. StylizedNeRF: Consistent 3D scene stylization as stylized NeRF via 2D-3D mutual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18342–18352.
| Range | Metric | ARF | Stylized-NeRF | StyleRF | Ours |
|---|---|---|---|---|---|
| short-range | LPIPS | 0.086 | 0.035 | 0.056 | 0.052 |
| short-range | RMSE | 0.066 | 0.032 | 0.061 | 0.053 |
| long-range | LPIPS | 0.141 | 0.052 | 0.083 | 0.082 |
| long-range | RMSE | 0.089 | 0.044 | 0.083 | 0.079 |
| Range | Metric | ARF | LSNV | StyleRF | Style-Gaussian | Ours |
|---|---|---|---|---|---|---|
| short-range | LPIPS | 0.054 | 0.063 | 0.060 | 0.037 | 0.031 |
| short-range | RMSE | 0.064 | 0.053 | 0.048 | 0.037 | 0.029 |
| long-range | LPIPS | 0.249 | 0.084 | 0.223 | 0.091 | 0.056 |
| long-range | RMSE | 0.283 | 0.060 | 0.206 | 0.082 | 0.075 |
|  | ARF | Stylized-NeRF | LSNV | StyleRF | Style-Gaussian | Ours |
|---|---|---|---|---|---|---|
| Rendering speed (s/frame) | 0.55 | 11.33 | 2.59 | 15.51 | 0.017 | 0.017 |
| Comparison | User Preference Score |
|---|---|
| Ours vs. ARF | 88.2% |
| Ours vs. StyleRF | 83.7% |
| Ours vs. LSNV | 82.3% |
| Ours vs. Style-Gaussian | 76.4% |
| Ours vs. StylizedNeRF | 82.3% |
| Range | Metric | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|
| short-range | LPIPS | 0.061 | 0.046 | 0.063 | 0.056 | 0.069 |
| short-range | RMSE | 0.062 | 0.049 | 0.069 | 0.064 | 0.070 |
| long-range | LPIPS | 0.143 | 0.155 | 0.185 | 0.151 | 0.177 |
| long-range | RMSE | 0.190 | 0.195 | 0.215 | 0.196 | 0.201 |