DBDST-Net: Dual-Branch Decoupled Image Style Transfer Network
Abstract
1. Introduction
- To address the limited availability of training data, this work builds on Stable Diffusion and uses the LoRA technique for fine-tuning, producing a model that can consistently generate the required dataset for scenes featuring elements of Chinese culture (a LoRA sketch follows this list).
- To address the difficulty existing methods have in cleanly separating content and style features, a Dual-Branch Decoupled Image Style Transfer Network (DBDST-Net) is proposed. In the content feature decoupling branch, a Content Feature Attention Extractor module is designed to focus on the fine-grained details of the content image, enabling more accurate extraction of content features. In the style feature decoupling branch, the proposed Style Feature Attention Extractor module directs the model's attention toward the color and shape of the image, improving style feature extraction (an illustrative dual-branch sketch follows this list).
- To strengthen the decoupling capability of DBDST-Net, a dense-regressive loss is proposed. It measures the difference between the original content image and the content image regressed from the stylized result, effectively optimizing the decoupling performance of the dual-branch structure (a sketch of this loss follows the list).
- Extensive experiments show that DBDST-Net can effectively separate content and style features, generating high-quality stylized images.
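The dataset-generation contribution relies on LoRA fine-tuning of Stable Diffusion. As a minimal, hypothetical sketch of the low-rank adaptation idea (the actual training pipeline, ranks, and target layers are not specified in this section), the PyTorch snippet below wraps a frozen linear projection with a trainable low-rank update ΔW = (α/r)·BA:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # base projection plus the low-rank correction learned during fine-tuning
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Illustrative use: wrap an attention projection of a diffusion U-Net block (sizes are placeholders).
proj = nn.Linear(320, 320)
lora_proj = LoRALinear(proj, rank=4)
out = lora_proj(torch.randn(2, 77, 320))
print(out.shape)  # torch.Size([2, 77, 320])
```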
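To illustrate the dual-branch decoupling idea, the following sketch shows one way two parallel encoder branches, each gated by a lightweight attention module, could separately extract content and style features. The layer sizes and the attention design here are illustrative assumptions, not the paper's actual Content/Style Feature Attention Extractors:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Hypothetical attention block: re-weights feature maps so a branch focuses on salient regions."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.attn(x)  # per-pixel attention weights in [0, 1]

class DualBranchEncoder(nn.Module):
    """Sketch of a dual-branch decoupling encoder: one branch for content, one for style."""
    def __init__(self, channels=64):
        super().__init__()
        def stem():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.content_branch = nn.Sequential(stem(), SpatialAttention(channels))
        self.style_branch = nn.Sequential(stem(), SpatialAttention(channels))

    def forward(self, content_img, style_img):
        return self.content_branch(content_img), self.style_branch(style_img)

encoder = DualBranchEncoder()
f_c, f_s = encoder(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(f_c.shape, f_s.shape)  # both torch.Size([1, 64, 128, 128])
```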
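The dense-regressive loss compares the original content image with the content image regressed from the stylized result. A hedged sketch of such a cycle-style consistency term is shown below; the regressor `regress_fn` and the L1 norm are assumptions, since the exact formulation is given in Section 3.3.3:

```python
import torch
import torch.nn.functional as F

def dense_regressive_loss(content_img, stylized_img, regress_fn):
    """Illustrative sketch: penalize the distance between the original content image and the
    content image regressed back from the stylized output (a cycle-style consistency term)."""
    regressed_content = regress_fn(stylized_img)       # map the stylized result back to content space
    return F.l1_loss(regressed_content, content_img)   # L1 assumed; the paper may use a different norm

# Usage with placeholder tensors and an identity regressor standing in for the real decoder.
content = torch.rand(1, 3, 256, 256)
stylized = torch.rand(1, 3, 256, 256)
loss = dense_regressive_loss(content, stylized, regress_fn=lambda x: x)
print(loss.item())
```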
2. Related Work
2.1. Diffusion Models
2.2. Style Transfer
3. Proposed Method
3.1. Overall Architecture
3.2. Feature Decoupling Encoder
3.3. Loss Function
3.3.1. Style Loss Function
3.3.2. Content Loss Function
3.3.3. Dense-Regressive Loss Function
4. Dataset
5. Experiments
5.1. LoRA Experiments
5.2. Style Transfer Experiments
5.2.1. User Preference Result
5.2.2. Quantitative Comparison of Style Transfer
5.2.3. Inference Time
5.2.4. Ablation Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Meier, B.J. Painterly rendering for animation. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 4–9 August 1996; pp. 477–484. [Google Scholar]
- Hertzmann, A.; Jacobs, C.E.; Oliver, N.; Curless, B.; Salesin, D.H. Image analogies. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001; pp. 327–340. [Google Scholar]
- Efros, A.A.; Leung, T.K. Texture synthesis by non-parametric sampling. In Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra (Corfu), Greece, 20–27 September 1999; pp. 1033–1038. [Google Scholar]
- Wei, L.Y.; Levoy, M. Fast texture synthesis using tree-structured vector quantization. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 479–488. [Google Scholar]
- Han, C.; Risser, E.; Ramamoorthi, R.; Grinspun, E. Multiscale texture synthesis. In Proceedings of the ACM SIGGRAPH 2008, Los Angeles, CA, USA, 11–15 August 2008; pp. 1–8. [Google Scholar]
- Gatys, L.; Ecker, A.S.; Bethge, M. Texture synthesis using convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 7–12 December 2015; pp. 262–270. [Google Scholar]
- Johnson, J.; Alahi, A.; Li, F.-F. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 694–711. [Google Scholar]
- Ulyanov, D.; Lebedev, V.; Vedaldi, A.; Lempitsky, V. Texture networks: Feed-forward synthesis of textures and stylized images. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 1349–1357. [Google Scholar]
- Jing, Y.; Liu, X.; Ding, Y.; Wang, X.; Ding, E.; Song, M.; Wen, S. Dynamic instance normalization for arbitrary style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 4369–4376. [Google Scholar]
- Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. A unified arbitrary style transfer framework via adaptive contrastive learning. ACM Trans. Graph. 2023, 42, 1–16. [Google Scholar] [CrossRef]
- Deng, Y.; Tang, F.; Dong, W.; Ma, C.; Pan, X.; Wang, L.; Xu, C. StyTr²: Image Style Transfer with Transformers. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11316–11326. [Google Scholar]
- Chen, J.; Liu, G.; Chen, X. AnimeGAN: A novel lightweight GAN for photo animation. In Artificial Intelligence Algorithms and Applications: 11th International Symposium, ISICA 2019, Guangzhou, China, 16–17 November 2019; Revised Selected Papers 11; Springer: Singapore, 2020; pp. 242–256. [Google Scholar]
- Wang, Z.; Zhao, L.; Zuo, Z.; Li, A.; Chen, H.; Xing, W.; Lu, D. MicroAST: Towards super-fast ultra-resolution arbitrary style transfer. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2742–2750. [Google Scholar]
- Ruta, D.S.; Gilbert, A.; Collomosse, J.P.; Shechtman, E.; Kolkin, N. NeAT: Neural artistic tracing for beautiful style transfer. arXiv 2023, arXiv:2304.05139. [Google Scholar]
- Park, D.Y.; Lee, K.H. Arbitrary style transfer with style-attentional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–21 June 2019; pp. 5880–5888. [Google Scholar]
- Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. Domain enhanced arbitrary image style transfer via contrastive learning. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 8–11 August 2022; pp. 1–8. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR), 2022. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851. [Google Scholar]
- Wang, Z.; Zhao, L.; Xing, W. StyleDiffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 7677–7689. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with CLIP latents. arXiv 2022, arXiv:2204.06125. [Google Scholar]
- Kang, M.; Zhang, R.; Barnes, C.; Paris, S.; Kwak, S.; Park, J.; Shechtman, E.; Zhu, J.Y.; Park, T. Distilling diffusion models into conditional GANs. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2024; pp. 428–447. [Google Scholar]
- Ashikhmin, M. Synthesizing natural textures. In Proceedings of the 2001 Symposium on Interactive 3D Graphics, Research Triangle Park, NC, USA, 19–21 March 2001; pp. 217–226. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
- Chen, Y.; Lai, Y.K.; Liu, Y.J. CartoonGAN: Generative Adversarial Networks for Photo Cartoonization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018. [Google Scholar]
- Lin, T.; Ma, Z.; Li, F.; He, D.; Li, X.; Ding, E.; Wang, N.; Li, J.; Gao, X. Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021. [Google Scholar] [CrossRef]
- Li, S. Diffstyler: Diffusion-based localized image style transfer. arXiv 2024, arXiv:2403.18461. [Google Scholar]
- Trockman, A.; Kolter, J.Z. Patches are all you need? arXiv 2022, arXiv:2201.09792. [Google Scholar]
- Kolkin, N.; Salavon, J.; Shakhnarovich, G. Style transfer by relaxed optimal transport and self-similarity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–21 June 2019; pp. 10051–10060. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695. [Google Scholar]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- Yoo, J.; Uh, Y.; Chun, S.; Kang, B.; Ha, J.W. Photorealistic Style Transfer via Wavelet Transforms. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar] [CrossRef]
- Wen, L.; Gao, C.; Zou, C. CAP-VSTNet: Content affinity preserved versatile style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 18300–18309. [Google Scholar]
- Wang, Z.; Zhang, J.; Ji, Z.; Bai, J.; Shan, S. CCLAP: Controllable Chinese landscape painting generation via latent diffusion model. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, QLD, Australia, 10–14 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2117–2122. [Google Scholar]
Method | SSIM ↑ | Gram Loss ↓ | Content Similarity ↑ | Style Similarity ↑ |
---|---|---|---|---|
AdaIN [30] | 0.37 | 1.977 | 53.80 | 43.15 |
AnimeGANv2 [12] | 0.41 | 1.428 | 55.45 | 47.60 |
PhotoWCT2 [31] | 0.58 | 1.352 | 57.31 | 59.24 |
Lapstyle [25] | 0.59 | 0.766 | 60.60 | 59.16 |
CAP-VSTNet [32] | 0.65 | 0.750 | 66.24 | 63.23 |
CCLAP [33] | 0.62 | 0.799 | 64.85 | 60.25 |
DBDST-Net | 0.67 | 0.750 | 69.02 | 63.27 |
Method | Run Time (s) (FHD) | Run Time (s) (2K) | Run Time (s) (4K) |
---|---|---|---|
AdaIN [30] | 0.600 | Out | Out |
AnimeGANv2 [12] | 0.599 | Out | Out |
PhotoWCT2 [31] | 0.291 | 0.447 | 1.036 |
Lapstyle [25] | 0.277 | 0.401 | 1.058 |
CAP-VSTNet [32] | 0.120 | 0.174 | 0.199 |
DBDST-Net | 0.015 | 0.016 | 0.022 |
Dense-Regressive Loss | FEM | Decoupling Encoder | Style Loss ↓ | Content Loss ↓ |
---|---|---|---|---|
× | × | × | 5.854 | 6.324 |
✓ | × | × | 3.450 | 5.187 |
✓ | ✓ | × | 1.633 | 2.326 |
✓ | ✓ | ✓ | 0.480 | 1.037 |