Adaptive Stylized Image Generation for Traditional Miao Batik Using Style-Conditioned LCM-LoRA Enhanced Diffusion Models
Abstract
1. Introduction
- A structured semantic dataset, the “Chinese Miao Batik Pattern Dataset with 9 Categories” (CMBP-9), is constructed to support image generation and to improve text–image correspondence and semantic understanding during the generation process.
- We introduce the efficient LCM-LoRA method for Miao batik image generation, improving both fidelity and deployment efficiency without requiring additional training.
- We propose a Style-Conditioned Linear Fusion (SCLF) module that adaptively balances the LoRA and LCM components, overcoming the static-weighting limitation of the original LCM-LoRA framework and enabling semantically sensitive, flexible generation of Miao ethnic imagery (a minimal fusion sketch follows this list).
- SCLF combined with LCM-LoRA outperforms the individual components and baseline models in perceptual quality and generation speed, and an expert consistency evaluation confirms its stronger overall performance across diverse Miao batik pattern generation tasks.
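To make the fusion idea concrete, the sketch below is a hypothetical, minimal PyTorch rendering of a style-conditioned linear fusion, not the exact SCLF module from the paper: it assumes the per-layer update is a convex combination of the style LoRA delta and the LCM-LoRA acceleration delta, with the mixing coefficient predicted from a pooled prompt/style embedding. Names such as `SCLFusion`, `style_dim`, `style_delta`, and `lcm_delta` are illustrative.

```python
import torch
import torch.nn as nn

class SCLFusion(nn.Module):
    """Hypothetical style-conditioned linear fusion of two LoRA weight deltas.

    A small gating network maps a style embedding (e.g., pooled text-encoder
    features of the prompt) to a coefficient lambda in (0, 1) that blends the
    style LoRA delta with the LCM-LoRA acceleration delta, replacing the static
    scales used when the two adapters are merged with fixed weights.
    """

    def __init__(self, style_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(style_dim, style_dim // 2),
            nn.SiLU(),
            nn.Linear(style_dim // 2, 1),
            nn.Sigmoid(),  # lambda in (0, 1)
        )

    def forward(self, style_emb: torch.Tensor,
                style_delta: torch.Tensor,
                lcm_delta: torch.Tensor) -> torch.Tensor:
        # Broadcast lambda over the weight-delta dimensions of one target layer.
        lam = self.gate(style_emb).view(-1, *([1] * (style_delta.dim() - 1)))
        # Convex combination of the two low-rank weight updates.
        return lam * style_delta + (1.0 - lam) * lcm_delta


if __name__ == "__main__":
    fusion = SCLFusion(style_dim=768)
    style_emb = torch.randn(1, 768)           # pooled prompt embedding
    style_delta = torch.randn(1, 320, 320)    # B @ A from the style LoRA
    lcm_delta = torch.randn(1, 320, 320)      # B @ A from LCM-LoRA
    fused = fusion(style_emb, style_delta, lcm_delta)
    print(fused.shape)  # torch.Size([1, 320, 320])
```

A gated convex combination is the simplest way to replace a fixed merge scale with a prompt-dependent one; the conditioning signal and fusion granularity used in the paper may differ.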
2. Related Work
2.1. Challenges of Batik Revitalization and Innovation
2.2. Stable Diffusion Stylization in Cultural Heritage
3. Materials and Methods
3.1. CMBP-9 Dataset
3.2. LoRA Training of Traditional Chinese Miao Batik Patterns Based on Stable Diffusion
3.2.1. Model Architecture of Stable Diffusion
Forward Diffusion Process
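For reference, the forward process presumably follows the standard DDPM formulation, in which a clean latent $x_0$ is gradually perturbed by Gaussian noise under a variance schedule $\beta_t$:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s).
```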
Reverse Process
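Likewise, the reverse (denoising) process in the standard formulation is parameterized by a noise-prediction network $\epsilon_\theta$ trained with the simplified objective:

```latex
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \sigma_t^2 \mathbf{I}\right),
\qquad
\mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{x_0,\,\epsilon \sim \mathcal{N}(0,\mathbf{I}),\,t}
\left[\,\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right) \right\|^2\right].
```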
3.2.2. Low-Rank Adaptation
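As a reminder of the standard LoRA formulation (independent of the specific layers adapted in this work), each targeted weight matrix $W_0$ is kept frozen and receives a trainable low-rank update:

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\ \ A \in \mathbb{R}^{r \times k},\ \ r \ll \min(d, k),
```

so that only $A$ and $B$ (scaled by $\alpha/r$) are learned during fine-tuning.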
3.2.3. Fine-Tuning for Traditional Chinese Miao Batik Patterns
Algorithm 1: Efficient LoRA-based fine-tuning for Miao batik pattern modeling.
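The pseudocode itself is not reproduced here; the fragment below is only a generic, minimal PyTorch sketch of the kind of LoRA-style fine-tuning the algorithm describes, assuming the standard recipe of injecting trainable low-rank adapters into frozen linear layers. The class name `LoRALinear` and the toy training step are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        # Standard LoRA init: A small random, B zero, so the initial delta is zero.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W0^T + (alpha / r) * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Toy usage: adapt one projection layer and optimize only the LoRA parameters.
layer = LoRALinear(nn.Linear(320, 320), r=8, alpha=16.0)
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
x, target = torch.randn(4, 320), torch.randn(4, 320)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
optimizer.step()
```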
3.3. Batik-MPDM: Diffusion-Based Accelerated Generation for Traditional Miao Batik Patterns
3.3.1. Latent Consistency Models
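In the standard consistency-model formulation (which latent consistency models adapt to the latent space of Stable Diffusion), a consistency function $f_\theta$ maps any point on a probability-flow ODE trajectory back to its origin and is typically parameterized with a skip connection to satisfy the boundary condition:

```latex
f_\theta(x_t, t) \approx x_0 \ \ \text{for all } t \text{ on the same trajectory},
\qquad
f_\theta(x, t) = c_{\mathrm{skip}}(t)\, x + c_{\mathrm{out}}(t)\, F_\theta(x, t),
\qquad
f_\theta(x_\epsilon, \epsilon) = x_\epsilon .
```

This self-consistency property is what permits few-step sampling after distillation.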
3.3.2. Fast Pattern Generation Module Based on Style-Conditioned Linear Fusion (SCLF)
4. Experiment
4.1. Experimental Settings
4.2. Evaluation Indicator
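For reference, two of the reported metrics have standard closed forms (assuming the usual definitions are used): PSNR over per-pixel error, and FID as the Fréchet distance between Inception-feature Gaussians of real and generated images,

```latex
\mathrm{PSNR} = 10 \log_{10}\!\frac{\mathrm{MAX}^2}{\mathrm{MSE}},
\qquad
\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2
+ \mathrm{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right),
```

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the real and generated image sets.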
4.3. Ablation Experiment
4.4. Qualitative Ablation Study on Fusion Strategies
4.5. Robustness Under Multifaceted Prompt Conditions
4.6. Robustness Under Unseen Prompts
4.7. Qualitative Comparison with Style-Oriented Models
4.8. Comparative Analysis of Text-to-Image Models
4.9. Human Evaluation Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Table: CMBP-9 category distribution (number of images per category).

Category | Bird | Butterfly | Composite | Flowers & Plants | Geometric | Human | Fish | Drum | Dragon | Total |
---|---|---|---|---|---|---|---|---|---|---|
Count | 166 | 77 | 92 | 50 | 53 | 62 | 58 | 66 | 25 | 649 |
Table: Example structured annotation for one CMBP-9 sample (bird category).

Field | Value |
---|---|
Category | bird |
Elements | phoenix, floral motifs, leaves |
Description | A traditional Miao batik pattern depicting an elegant phoenix with elaborately detailed feather textures, gracefully interacting with stylized floral motifs and leaves, presented in black on a white background. |
Dominant Colors | white, black |
Symmetry | asymmetric |
Complexity | complex |
Traditional Symbols | phoenix |
Table: Quantitative ablation results for the generation methods.

Method | PSNR ↑ | SSIM ↑ | IS ↑ | FID ↓ | Time (s) ↓ | LPIPS ↓ | Params (M) |
---|---|---|---|---|---|---|---|
Stable Diffusion | 18.97 | 0.476 | 2.32 | 117.74 | 1.324 | 0.751 | 865 |
+LoRA | 29.32 | 0.822 | 5.88 | 15.80 | 1.663 | 0.633 | 866 |
+LCM | 23.68 | 0.691 | 4.46 | 22.65 | 0.242 | 0.709 | 931 |
+LCM-LoRA | 28.96 | 0.808 | 5.69 | 16.31 | 0.337 | 0.658 | 932 |
Batik-MPDM | 29.10 | 0.815 | 5.75 | 15.90 | 0.335 | 0.652 | 933 |
Table: ANOVA results for the human evaluation scores.

Source | Sum of Squares | Degrees of Freedom (df) | F-Value | p-Value |
---|---|---|---|---|
Model | 2463.62 | 4 | 647.45 | <0.001 |
Category | 5.78 | 3 | 2.03 | 0.108 |
Dimension | 2.07 | 3 | 0.73 | 0.536 |
Group | 0.55 | 2 | 0.29 | 0.748 |
Participant | 19.18 | 14 | 1.44 | 0.125 |
Table: Tukey HSD post hoc pairwise comparisons between models in the human evaluation (***: p < 0.001; n.s.: not significant).

Model A | Model B | Mean Diff | p-Value | 95% CI Lower | 95% CI Upper | Significance |
---|---|---|---|---|---|---|
Batik-MPDM | DALL·E 3 | −1.5825 | <0.001 | −1.6909 | −1.4741 | ***
Batik-MPDM | Doubao | −1.6150 | <0.001 | −1.7234 | −1.5066 | ***
Batik-MPDM | MidJourney | −1.5733 | <0.001 | −1.6817 | −1.4649 | ***
Batik-MPDM | Stable Diffusion | −1.6333 | <0.001 | −1.7417 | −1.5249 | ***
DALL·E 3 | Doubao | −0.0325 | 0.9252 | −0.1409 | 0.0759 | n.s.
DALL·E 3 | MidJourney | 0.0092 | 0.9994 | −0.0992 | 0.1176 | n.s.
DALL·E 3 | Stable Diffusion | −0.0508 | 0.7039 | −0.1592 | 0.0576 | n.s.
Doubao | MidJourney | 0.0417 | 0.8325 | −0.0667 | 0.1501 | n.s.
Doubao | Stable Diffusion | −0.0183 | 0.9907 | −0.1267 | 0.0901 | n.s.
MidJourney | Stable Diffusion | −0.0600 | 0.5558 | −0.1684 | 0.0484 | n.s.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation: Hu, Q.; Peng, Y.; Xu, J.; Shao, Z.; Tian, Z.; Chen, J. Adaptive Stylized Image Generation for Traditional Miao Batik Using Style-Conditioned LCM-LoRA Enhanced Diffusion Models. Mathematics 2025, 13, 1947. https://doi.org/10.3390/math13121947