StyleForge: Enhancing Text-to-Image Synthesis for Any Artistic Styles with Dual Binding
Abstract
1. Introduction
- Dual-binding personalization framework: we introduce StyleForge, a novel approach that binds a target style to both a unique style token and an auxiliary token, enabling controllable and robust style adaptation from limited data (a minimal training sketch follows this list of contributions).
- Auxiliary-guided human rendering: we leverage carefully curated auxiliary images containing human-centric style elements to guide the rendering of people, enhancing generalization and mitigating language drift.
- Multi-token decomposition for fine-grained control: we extend our base framework, Single-StyleForge, to Multi-StyleForge, which disentangles person and background characteristics into separate tokens to improve compositional alignment between the prompt and the image.
- Plug-and-play adaptation for any artistic style: StyleForge requires only 15–20 reference images, imposes no architectural changes, and can be easily integrated with existing text-to-image diffusion models for user-friendly personalization.
- Extensive evaluation: we conduct experiments across six distinct art styles and demonstrate superior style fidelity, text–image alignment, and robustness compared to state-of-the-art methods. Our main results are summarized in Figure 1.
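The sketch below illustrates the dual-binding idea in its simplest form: StyleRef images are paired with prompts containing the unique style token, auxiliary images with prompts containing the auxiliary token, and both sets are trained under one shared denoising objective. It is a minimal, self-contained sketch with toy stand-ins (ToyTextEncoder, ToyUNet, a linear noise schedule, random tensors in place of image latents); the actual method fine-tunes a Stable Diffusion backbone, whose components and hyperparameters are not reproduced here.

```python
# Minimal sketch of dual-binding fine-tuning: one denoising (epsilon-prediction)
# loss over a StyleRef batch ("[V] style"-token prompts) plus an auxiliary batch
# (auxiliary-token prompts). ToyTextEncoder/ToyUNet and the linear noise schedule
# are stand-ins, not the Stable Diffusion components used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTextEncoder(nn.Module):              # stand-in for the CLIP text encoder
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)
    def forward(self, token_ids):             # token_ids: (batch, seq_len) LongTensor
        return self.emb(token_ids)            # (batch, dim) prompt embedding

class ToyUNet(nn.Module):                     # stand-in for the denoising UNet
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 128), nn.ReLU(), nn.Linear(128, dim))
    def forward(self, noisy_latent, t, text_emb):
        x = torch.cat([noisy_latent, t[:, None].float(), text_emb], dim=-1)
        return self.net(x)                    # predicted noise

def dual_binding_step(unet, text_enc, styleref_batch, aux_batch, optimizer, T=1000):
    """One optimization step summing the denoising losses of both prompt-image sets."""
    optimizer.zero_grad()
    loss = 0.0
    for latents, token_ids in (styleref_batch, aux_batch):
        t = torch.randint(0, T, (latents.shape[0],))
        noise = torch.randn_like(latents)
        alpha = 1.0 - t[:, None].float() / T              # toy linear noise schedule
        noisy = alpha.sqrt() * latents + (1.0 - alpha).sqrt() * noise
        loss = loss + F.mse_loss(unet(noisy, t, text_enc(token_ids)), noise)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    unet, text_enc = ToyUNet(), ToyTextEncoder()
    opt = torch.optim.AdamW(list(unet.parameters()) + list(text_enc.parameters()), lr=1e-4)
    styleref = (torch.randn(4, 64), torch.randint(0, 1000, (4, 8)))  # "[V] style" prompts
    aux = (torch.randn(4, 64), torch.randint(0, 1000, (4, 8)))       # auxiliary-token prompts
    print(dual_binding_step(unet, text_enc, styleref, aux, opt))
```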
2. Related Work
2.1. Text-to-Image Synthesis
2.2. Style Transfer
2.3. Personalizing and Controlling Diffusion Models
2.4. Toward Stylized Personalization
3. Preliminaries
3.1. Diffusion Models
3.2. DreamBooth
4. Method: StyleForge
4.1. Single-StyleForge: Overall Architecture
Algorithm 1 Single-StyleForge.
4.2. Rationale Behind Auxiliary Images
4.2.1. Aiding in the Binding of the Target Style
4.2.2. Improving Text-to-Image Performance
4.2.3. Mitigating Language Drift
4.3. Multi-StyleForge
4.3.1. Multi-StyleRef Prompts Configuration
4.3.2. Training of Multi-StyleForge
Algorithm 2 Multi-StyleForge.
5. Experimental Results
5.1. Experimental Setup
- Realism focuses on an accurate and detailed representation of subjects.
- Midjourney is characterized by detailed rendering and dramatic, imaginative expression, reflecting the distinctive style of the Midjourney model [61].
- Anime refers to a Japanese animation style characterized by vibrant colors, exaggerated facial expressions, and dynamic movement.
- Romanticism prioritizes emotional expression, imagination, and the sublime, often portraying fantastical and emotional subjects with a focus on rich dark tones and extensive canvases.
- Cubism emphasizes representing visual experiences by depicting objects from multiple angles simultaneously, often in polygonal or fragmented forms.
- Pixel art involves creating images by breaking them down into small square pixels, adjusting their size and arrangement to form the overall image.
5.2. Implementation Details
5.2.1. Ours
5.2.2. Baseline Models
5.3. Analysis of StyleRef Images
- Only backgrounds: 20 landscape images in the target style.
- Only persons: 20 portraits and/or people images in the target style.
- Mixed backgrounds and persons: a mix of 10 landscape and 10 people images (a configuration sketch follows this list).
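As an illustration of how these three ablation settings could be assembled, the sketch below samples them from a per-style image folder. The directory layout ("landscapes/", "people/") and the PNG extension are hypothetical, not the dataset organization used in the paper.

```python
# Hypothetical folder layout: <style_dir>/landscapes/*.png and <style_dir>/people/*.png.
import random
from pathlib import Path

def build_styleref_sets(style_dir: str, seed: int = 0):
    """Return the three StyleRef configurations compared in this ablation."""
    rng = random.Random(seed)
    landscapes = sorted(Path(style_dir, "landscapes").glob("*.png"))
    people = sorted(Path(style_dir, "people").glob("*.png"))
    return {
        "only_backgrounds": rng.sample(landscapes, 20),                 # 20 landscape images
        "only_persons": rng.sample(people, 20),                         # 20 portrait/people images
        "mixed": rng.sample(landscapes, 10) + rng.sample(people, 10),   # 10 + 10 mix
    }
```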
5.4. Analysis of the Aux Images
5.4.1. Configuration of Aux Images
5.4.2. Auxiliary Binding
5.4.3. Comparison with DreamBooth
5.5. Multi-StyleForge: Improved Text–Image Alignment Method
5.6. Comparison
5.7. User Study
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Training Step
Appendix A.2. Training Strategy Comparison
| Strategy | Realism FID (↓) | Realism CLIP (↑) | Midjourney FID (↓) | Midjourney CLIP (↑) | Anime FID (↓) | Anime CLIP (↑) |
|---|---|---|---|---|---|---|
| Sequential training | 15.24 | 29.85 | 14.92 | 30.10 | 22.73 | 27.45 |
| Parallel training (ours) | 13.48 | 31.21 | 12.76 | 32.08 | 20.88 | 28.66 |
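For reference, the CLIP scores above are text–image alignment scores of the kind sketched below: the cosine similarity between CLIP image and text embeddings, averaged over prompt–image pairs and commonly reported ×100. The checkpoint name and the ×100 convention are assumptions about the evaluation setup, not details taken from the paper.

```python
# Hedged sketch of a text-image CLIP score; the CLIP checkpoint is an assumption.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image_paths, prompts):
    """Mean cosine similarity between each generated image and its prompt (x100)."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=list(prompts), images=images, return_tensors="pt", padding=True)
    out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img * txt).sum(dim=-1).mean().item()
```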
Appendix A.3. Auxiliary Image
Data Generation Details
Appendix A.4. CLIP-Based Analysis of Auxiliary Image Selection
| Auxiliary Set | Roman | Realism | Anime | Midjourney | Cubism | Pixel Art |
|---|---|---|---|---|---|---|
| Style token [14] | 0.532 | 0.520 | 0.527 | 0.535 | 0.517 | 0.523 |
| Human-drawn art | 0.498 | 0.486 | 0.490 | 0.493 | 0.482 | 0.489 |
| Ours | 0.561 | 0.546 | 0.555 | 0.568 | 0.544 | 0.552 |
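The table above reports CLIP-based scores relating each auxiliary set to the six target styles. Purely as an assumption about how such an image-set similarity could be computed (the paper's exact protocol is not reproduced here), the sketch below takes the mean pairwise cosine similarity between the CLIP image embeddings of the auxiliary images and the StyleRef images.

```python
# Hedged sketch: mean pairwise cosine similarity between two image sets in CLIP
# space. The checkpoint and the pairing with StyleRef images are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_images(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    pixel = proc(images=images, return_tensors="pt")["pixel_values"]
    feats = clip.get_image_features(pixel_values=pixel)
    return feats / feats.norm(dim=-1, keepdim=True)   # unit-norm CLIP image embeddings

@torch.no_grad()
def aux_style_similarity(aux_paths, styleref_paths):
    """Average cosine similarity between every aux image and every StyleRef image."""
    return (embed_images(aux_paths) @ embed_images(styleref_paths).T).mean().item()
```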
Appendix A.5. Qualitative Comparison with Baseline Methods
Appendix A.6. Applications
Appendix A.7. User Study Questionnaire
References
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695. [Google Scholar]
- Esser, P.; Kulal, S.; Blattmann, A.; Entezari, R.; Müller, J.; Saini, H.; Levi, Y.; Lorenz, D.; Sauer, A.; Boesel, F.; et al. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Ramesh, A.; Pavlov, M.; Goh, G.; Gray, S.; Voss, C.; Radford, A.; Chen, M.; Sutskever, I. Zero-shot text-to-image generation. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 8821–8831. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv. Neural Inf. Process. Syst. 2022, 35, 36479–36494. [Google Scholar]
- Sehwag, V.; Kong, X.; Li, J.; Spranger, M.; Lyu, L. Stretching each dollar: Diffusion training from scratch on a micro-budget. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 28596–28608. [Google Scholar]
- Jiang, D.; Song, G.; Wu, X.; Zhang, R.; Shen, D.; Zong, Z.; Liu, Y.; Li, H. Comat: Aligning text-to-image diffusion model with image-to-text concept matching. Adv. Neural Inf. Process. Syst. 2024, 37, 76177–76209. [Google Scholar]
- Liu, J.; Li, C.; Sun, Q.; Ming, J.; Fang, C.; Wang, J.; Zeng, B.; Liu, S. Ada-adapter: Fast few-shot style personlization of diffusion model with pre-trained image encoder. arXiv 2024, arXiv:2407.05552. [Google Scholar]
- Song, N.; Yang, X.; Yang, Z.; Lin, G. Towards lifelong few-shot customization of text-to-image diffusion. arXiv 2024, arXiv:2411.05544. [Google Scholar]
- Alshahrani, A. Bridging Cities and Citizens with Generative AI: Public Readiness and Trust in Urban Planning. Buildings 2025, 15, 2494. [Google Scholar] [CrossRef]
- Liu, Z.; He, Y.; Demian, P.; Osmani, M. Immersive technology and building information modeling (BIM) for sustainable smart cities. Buildings 2024, 14, 1765. [Google Scholar] [CrossRef]
- del Campo, G.; Saavedra, E.; Piovano, L.; Luque, F.; Santamaria, A. Virtual Reality and Internet of Things Based Digital Twin for Smart City Cross-Domain Interoperability. Appl. Sci. 2024, 14, 2747. [Google Scholar] [CrossRef]
- De Silva, D.; Mills, N.; Moraliyage, H.; Rathnayaka, P.; Wishart, S.; Jennings, A. Responsible artificial intelligence hyper-automation with generative AI agents for sustainable cities of the future. Smart Cities 2025, 8, 34. [Google Scholar] [CrossRef]
- Ruiz, N.; Li, Y.; Jampani, V.; Pritch, Y.; Rubinstein, M.; Aberman, K. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22500–22510. [Google Scholar]
- Gal, R.; Alaluf, Y.; Atzmon, Y.; Patashnik, O.; Bermano, A.H.; Chechik, G.; Cohen-Or, D. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv 2022, arXiv:2208.01618. [Google Scholar]
- Kim, M.; Yoo, J.; Kwon, S. Personalized text-to-image model enhancement strategies: Sod preprocessing and cnn local feature integration. Electronics 2023, 12, 4707. [Google Scholar] [CrossRef]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
- Chang, H.; Zhang, H.; Barber, J.; Maschinot, A.; Lezama, J.; Jiang, L.; Yang, M.H.; Murphy, K.; Freeman, W.T.; Rubinstein, M.; et al. Muse: Text-To-Image Generation via Masked Generative Transformers. arXiv 2023, arXiv:2301.00704. [Google Scholar]
- Yu, J.; Xu, Y.; Koh, J.Y.; Luong, T.; Baid, G.; Wang, Z.; Vasudevan, V.; Ku, A.; Yang, Y.; Ayan, B.K.; et al. Scaling autoregressive models for content-rich text-to-image generation. arXiv 2022, arXiv:2206.10789. [Google Scholar]
- Tewel, Y.; Kaduri, O.; Gal, R.; Kasten, Y.; Wolf, L.; Chechik, G.; Atzmon, Y. Training-free consistent text-to-image generation. ACM Trans. Graph. 2024, 43, 52. [Google Scholar] [CrossRef]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 4401–4410. [Google Scholar]
- Li, R. Image Style Transfer with Generative Adversarial Networks. In Proceedings of the 29th ACM International Conference on Multimedia, MM ’21, New York, NY, USA, 20–24 October 2021; pp. 2950–2954. [Google Scholar] [CrossRef]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Way, D.L.; Chang, W.C.; Shih, Z.C. Deep Learning for Anime Style Transfer. In Proceedings of the 2019 3rd International Conference on Advances in Image Processing, ICAIP ’19, Chengdu, China, 8–10 November 2019; pp. 139–143. [Google Scholar] [CrossRef]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- Liu, S.; Lin, T.; He, D.; Li, F.; Wang, M.; Li, X.; Sun, Z.; Li, Q.; Ding, E. Adaattn: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6649–6658. [Google Scholar]
- Ma, Y.; Zhao, C.; Huang, B.; Li, X.; Basu, A. RAST: Restorable Arbitrary Style Transfer. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 143. [Google Scholar] [CrossRef]
- Patashnik, O.; Wu, Z.; Shechtman, E.; Cohen-Or, D.; Lischinski, D. Styleclip: Text-driven manipulation of stylegan imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2085–2094. [Google Scholar]
- Chen, Y.; Zhou, H.; Chen, J.; Yang, N.; Zhao, J.; Chao, Y. Diffusion Model-Based Cartoon Style Transfer for Real-World 3D Scenes. ISPRS Int. J. Geo-Inf. 2025, 14, 303. [Google Scholar] [CrossRef]
- Han, X.; Wu, Y.; Wan, R. A method for style transfer from artistic images based on depth extraction generative adversarial network. Appl. Sci. 2023, 13, 867. [Google Scholar] [CrossRef]
- Su, N.; Wang, J.; Pan, Y. Multi-Scale Universal Style-Transfer Network Based on Diffusion Model. Algorithms 2025, 18, 481. [Google Scholar] [CrossRef]
- Xiang, Z.; Wan, X.; Xu, L.; Yu, X.; Mao, Y. A Training-Free Latent Diffusion Style Transfer Method. Information 2024, 15, 588. [Google Scholar] [CrossRef]
- Yang, H.; Yang, H.; Min, K. Artfusion: A Diffusion Model-Based Style Synthesis Framework for Portraits. Electronics 2024, 13, 509. [Google Scholar] [CrossRef]
- Wang, Z.; Zhao, L.; Xing, W. StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 7677–7689. [Google Scholar]
- Hamazaspyan, M.; Navasardyan, S. Diffusion-Enhanced PatchMatch: A Framework for Arbitrary Style Transfer With Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 797–805. [Google Scholar]
- Zhang, Y.; Huang, N.; Tang, F.; Huang, H.; Ma, C.; Dong, W.; Xu, C. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10146–10156. [Google Scholar]
- Ahn, N.; Lee, J.; Lee, C.; Kim, K.; Kim, D.; Nam, S.H.; Hong, K. DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models. arXiv 2023, arXiv:2309.06933. [Google Scholar] [CrossRef]
- Li, J.; Li, D.; Savarese, S.; Hoi, S. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv 2023, arXiv:2301.12597. [Google Scholar]
- Li, H.; Liu, Y.; Liu, C.; Pang, H.; Xu, K. A Few-Shot Steel Surface Defect Generation Method Based on Diffusion Models. Sensors 2025, 25, 3038. [Google Scholar] [CrossRef] [PubMed]
- Martini, L.; Iacono, S.; Zolezzi, D.; Vercelli, G.V. Advancing Persistent Character Generation: Comparative Analysis of Fine-Tuning Techniques for Diffusion Models. AI 2024, 5, 1779–1792. [Google Scholar] [CrossRef]
- Alaluf, Y.; Richardson, E.; Metzer, G.; Cohen-Or, D. A neural space-time representation for text-to-image personalization. ACM Trans. Graph. 2023, 42, 243. [Google Scholar] [CrossRef]
- Park, J.; Ko, B.; Jang, H. StyleBoost: A Study of Personalizing Text-to-Image Generation in Any Style using DreamBooth. In Proceedings of the 2023 14th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 11–14 October 2023; pp. 93–98. [Google Scholar] [CrossRef]
- Dong, Z.; Wei, P.; Lin, L. DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning. arXiv 2023, arXiv:2211.11337. [Google Scholar]
- Lu, H.; Tunanyan, H.; Wang, K.; Navasardyan, S.; Wang, Z.; Shi, H. Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14267–14276. [Google Scholar]
- Sohn, K.; Ruiz, N.; Lee, K.; Chin, D.C.; Blok, I.; Chang, H.; Barber, J.; Jiang, L.; Entis, G.; Li, Y.; et al. StyleDrop: Text-to-Image Generation in Any Style. arXiv 2023, arXiv:2306.00983. [Google Scholar]
- Ruiz, N.; Li, Y.; Jampani, V.; Wei, W.; Hou, T.; Pritch, Y.; Wadhwa, N.; Rubinstein, M.; Aberman, K. Hyperdreambooth: Hypernetworks for fast personalization of text-to-image models. arXiv 2023, arXiv:2307.06949. [Google Scholar]
- Gal, R.; Arar, M.; Atzmon, Y.; Bermano, A.H.; Chechik, G.; Cohen-Or, D. Encoder-based domain tuning for fast personalization of text-to-image models. ACM Trans. Graph. (TOG) 2023, 42, 150. [Google Scholar] [CrossRef]
- Arar, M.; Gal, R.; Atzmon, Y.; Chechik, G.; Cohen-Or, D.; Shamir, A.; Bermano, A.H. Domain-agnostic tuning-encoder for fast personalization of text-to-image models. In Proceedings of the SIGGRAPH Asia 2023 Conference Papers, Sydney, NSW, Australia, 12–15 December 2023; pp. 1–10. [Google Scholar]
- Kumari, N.; Zhang, B.; Zhang, R.; Shechtman, E.; Zhu, J.Y. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1931–1941. [Google Scholar]
- Han, L.; Li, Y.; Zhang, H.; Milanfar, P.; Metaxas, D.; Yang, F. SVDiff: Compact Parameter Space for Diffusion Fine-Tuning. arXiv 2023, arXiv:2303.11305. [Google Scholar] [CrossRef]
- Ma, J.; Liang, J.; Chen, C.; Lu, H. Subject-diffusion: Open domain personalized text-to-image generation without test-time fine-tuning. In Proceedings of the ACM SIGGRAPH 2024 Conference Papers, Denver, CO, USA, 28 July–1 August 2024; pp. 1–12. [Google Scholar]
- Tewel, Y.; Gal, R.; Chechik, G.; Atzmon, Y. Key-locked rank one editing for text-to-image personalization. In Proceedings of the ACM SIGGRAPH 2023 Conference, Los Angeles, CA, USA, 6–10 August 2023; pp. 1–11. [Google Scholar]
- Avrahami, O.; Aberman, K.; Fried, O.; Cohen-Or, D.; Lischinski, D. Break-a-scene: Extracting multiple concepts from a single image. In Proceedings of the SIGGRAPH Asia 2023 Conference Papers, Sydney, NSW, Australia, 12–15 December 2023; pp. 1–12. [Google Scholar]
- Zhang, L.; Agrawala, M. Adding conditional control to text-to-image diffusion models. arXiv 2023, arXiv:2302.05543. [Google Scholar]
- Tang, R.; Liu, L.; Pandey, A.; Jiang, Z.; Yang, G.; Kumar, K.; Stenetorp, P.; Lin, J.; Ture, F. What the daam: Interpreting stable diffusion using cross attention. arXiv 2022, arXiv:2210.04885. [Google Scholar] [CrossRef]
- Park, J.; Jang, H. I2AM: Interpreting Image-to-Image Latent Diffusion Models via Attribution Maps. arXiv 2024, arXiv:2407.12331. [Google Scholar]
- Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. arXiv 2015, arXiv:1503.03585. [Google Scholar] [CrossRef]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. arXiv 2022, arXiv:2010.02502. [Google Scholar] [CrossRef]
- MidJourney. Available online: https://www.midjourney.com/ (accessed on 20 September 2025).
- Hugging Face. Available online: https://huggingface.co (accessed on 20 September 2025).
- WikiArt. Available online: https://www.wikiart.org/ (accessed on 20 September 2025).
- Pixel-Art. Available online: https://www.kaggle.com/datasets (accessed on 20 September 2025).
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying MMD GANs. arXiv 2021, arXiv:1801.01401. [Google Scholar] [CrossRef]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Hertz, A.; Mokady, R.; Tenenbaum, J.; Aberman, K.; Pritch, Y.; Cohen-Or, D. Prompt-to-Prompt Image Editing with Cross Attention Control. arXiv 2022, arXiv:2208.01626. [Google Scholar]
- Meng, C.; He, Y.; Song, Y.; Song, J.; Wu, J.; Zhu, J.Y.; Ermon, S. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv 2021, arXiv:2108.01073. [Google Scholar]
- Jiang, D.; Wang, H.; Li, T.; Gouda, M.A.; Zhou, B. Real-time tracker of chicken for poultry based on attention mechanism-enhanced YOLO-Chicken algorithm. Comput. Electron. Agric. 2025, 237, 110640. [Google Scholar] [CrossRef]
| | DreamBooth | Textual Inversion | LoRA | Custom Diffusion | Single-StyleForge | Multi-StyleForge |
|---|---|---|---|---|---|---|
| Tuning method | Full | Partial | Partial | Partial | Full | Full |
| StyleRef image | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Aux image | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ |
FID (↓) and KID (↓) scores of Single-StyleForge trained with different StyleRef image configurations (only backgrounds, only persons, and a mix of backgrounds and persons) across the six target styles.
FID (↓) and KID (↓) scores of Single-StyleForge (ours) compared with alternative Aux image configurations (style token [14], illustration style token [14], human-drawn art, and target-style images) across the six target styles.
FID (↓), KID (↓), and CLIP (↑) scores of DreamBooth [14], Textual Inversion [15], LoRA [17], Custom Diffusion [50], Single-StyleForge (ours), and Multi-StyleForge (ours) across the six target styles.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).