Advancing Traditional Dunhuang Regional Pattern Design with Diffusion Adapter Networks and Cross-Entropy
Abstract
1. Introduction
- (1) We propose DANet, a new model that integrates textual input with sketch information to generate Dunhuang patterns efficiently. The multi-head attention mechanism in the attention adapter processes multiple subspaces in parallel, with each subspace focusing on different features, allowing the model to capture global structure and fine-grained detail simultaneously and improving the accuracy and precision of the generated patterns. (Illustrative code sketches of the components in (1)–(4) follow this list.)
- (2) To extract multiscale feature information from a Dunhuang image, the MSAM module considers features at different resolutions simultaneously, providing a more comprehensive understanding of the image: it recognizes large contours and shapes while also capturing small details and textures.
- (3) The ACM module adaptively adjusts the generation guidance coefficients of the different feature layers. This dynamic balancing of each layer's contribution improves both the accuracy and the efficiency of the generation process.
- (4) For the Dunhuang pattern generation task, cross-entropy loss is introduced as an auxiliary supervision signal to strengthen the semantic understanding of the attention adapter, compensating for the limited semantic learning caused by freezing the diffusion model's parameters. As a result, the generated Dunhuang images exhibit improved semantic consistency and accuracy.
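This section of the paper does not reproduce source code, so the block below is only a minimal PyTorch sketch of the kind of multi-head attention fusion described in contribution (1). The class name `TextSketchAttentionAdapter` and the sizes (`d_model=320`, `n_heads=8`, 77 text tokens) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TextSketchAttentionAdapter(nn.Module):
    """Hypothetical multi-head attention adapter: sketch features attend to text features.

    Each attention head operates on its own subspace, so some heads can track
    global layout while others track fine detail, mirroring the paper's description.
    """
    def __init__(self, d_model: int = 320, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, sketch_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # Query: sketch tokens; key/value: text tokens. The residual connection keeps
        # the sketch structure while injecting text semantics.
        fused, _ = self.attn(sketch_tokens, text_tokens, text_tokens)
        return self.norm(sketch_tokens + fused)

# Toy usage: batch of 2, 64 sketch tokens, 77 text tokens, width 320.
adapter = TextSketchAttentionAdapter()
out = adapter(torch.randn(2, 64, 320), torch.randn(2, 77, 320))
print(out.shape)  # torch.Size([2, 64, 320])
```

Using sketch tokens as queries and text tokens as keys/values keeps the spatial layout of the sketch as the backbone while conditioning each location on the prompt semantics.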
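With the same caveat, a pool-refine-upsample-fuse structure is one common way to realize the multiscale behavior attributed to MSAM in contribution (2); the class name, the scale set `(1, 2, 4)`, and the channel width below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAM(nn.Module):
    """Hypothetical multiscale module: pool the feature map to coarser grids,
    refine each scale independently, then fuse all scales back together."""
    def __init__(self, channels: int = 320, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.refine = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in scales]
        )
        self.fuse = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        branches = []
        for conv, s in zip(self.refine, self.scales):
            y = F.avg_pool2d(x, kernel_size=s) if s > 1 else x  # coarser view of the image
            y = conv(y)
            if s > 1:  # restore the original resolution before fusion
                y = F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False)
            branches.append(y)
        return self.fuse(torch.cat(branches, dim=1))

print(MSAM()(torch.randn(1, 320, 32, 32)).shape)  # torch.Size([1, 320, 32, 32])
```

The coarse branches see large contours and shapes; the full-resolution branch keeps small details and textures, matching the behavior the paper describes.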
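Contribution (3) describes ACM as adaptively adjusting the guidance coefficients of different feature layers. One plausible reading, sketched below, is a learnable, softmax-normalized coefficient per layer that scales the adapter features injected into the frozen diffusion backbone; this is an assumption, not the paper's verified design.

```python
import torch
import torch.nn as nn

class ACM(nn.Module):
    """Hypothetical adaptive coefficient module: one learnable guidance
    coefficient per feature layer, normalized so the layers compete for weight."""
    def __init__(self, n_layers: int = 4):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, adapter_feats: list) -> list:
        coeffs = torch.softmax(self.logits, dim=0)  # rebalanced at every training step
        return [c * f for c, f in zip(coeffs, adapter_feats)]

# Toy usage: three adapter feature maps, each rescaled by its learned coefficient.
acm = ACM(n_layers=3)
feats = [torch.randn(1, 320, 32, 32) for _ in range(3)]
scaled = acm(feats)
```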
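Contribution (4) pairs the diffusion model's standard denoising objective with an auxiliary cross-entropy term. A minimal sketch of such a combined loss follows; `aux_head`, `lambda_ce`, and the ten pattern categories are invented for illustration and do not come from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical combined objective: noise-prediction MSE plus a cross-entropy
# term from an auxiliary classifier head on the adapter features. Because the
# diffusion backbone is frozen, only the adapter and this head receive gradients.
aux_head = nn.Linear(320, 10)   # 10 hypothetical Dunhuang pattern categories
lambda_ce = 0.1                 # assumed weighting of the auxiliary term

def danet_loss(eps_pred, eps_true, adapter_feat, labels):
    diff_loss = F.mse_loss(eps_pred, eps_true)          # denoising objective
    logits = aux_head(adapter_feat.mean(dim=(2, 3)))    # global-average-pooled features
    ce_loss = F.cross_entropy(logits, labels)           # auxiliary semantic supervision
    return diff_loss + lambda_ce * ce_loss

# Toy usage with random tensors standing in for model outputs.
loss = danet_loss(
    torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32),
    torch.randn(2, 320, 32, 32), torch.tensor([3, 7]),
)
loss.backward()
```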
2. Related Work
2.1. Generative Adversarial Network (GAN)
2.2. Diffusion Model
2.3. Application of Artificial Intelligence in Traditional Culture Preservation and Design
3. Methodology
3.1. Overall Structure of DANet
3.2. Attention Adapter
3.3. Cross-Entropy Loss
4. Experiments
4.1. Dataset and Metrics
4.2. Ablation Experiments
4.3. Comparative Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, Z. (HTBNet) Arbitrary Shape Scene Text Detection with Binarization of Hyperbolic Tangent and Cross-Entropy. Entropy 2024, 26, 560.
- Chen, Z.; Yi, Y.; Gan, C.; Tang, Z.; Kong, D. Scene Chinese Recognition with Local and Global Attention. Pattern Recognit. 2025, 158, 111013.
- Zhu, J.Y.; Zhang, R.; Pathak, D.; Darrell, T.; Efros, A.A.; Wang, O.; Shechtman, E. Toward multimodal image-to-image translation. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Zhang, Z.; Xie, Y.; Yang, L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
- Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- DeVries, T.; Romero, A.; Pineda, L.; Taylor, G.W.; Drozdzal, M. On the evaluation of conditional GANs. arXiv 2019, arXiv:1907.08175.
- Shen, Y.; Liang, J.; Lin, M.C. GAN-based garment generation using sewing pattern images. In Computer Vision–ECCV 2020, Proceedings of the 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII; Springer International Publishing: Cham, Switzerland, 2020.
- Sha, S.; Wei, W.T.; Li, Q.; Li, B.; Tao, H.; Jiang, X.W. Textile image restoration of Chu tombs based on deep learning. J. Silk 2023, 60, 1–7.
- Chen, Z. Graph Adaptive Attention Network with Cross-Entropy. Entropy 2024, 26, 576.
- Orteu, J.-J.; Garcia, D.; Robert, L.; Bugarin, F. A speckle texture image generator. In Proceedings of Speckle06: Speckles, from Grains to Flowers, Nimes, France, 13–15 September 2006; Volume 6341.
- Adibah, N.; Noor, N.M.; Suaib, N.M. Facial Expression Transfer using Generative Adversarial Network: A Review. IOP Conf. Ser. Mater. Sci. Eng. 2020, 864, 012077.
- Yan, B.; Zhang, L.; Zhang, J.; Xu, Z. Image Generation Method for Adversarial Network Based on Residual Structure. Laser Optoelectron. Prog. 2020, 57, 181504.
- Denton, E.L.; Chintala, S.; Fergus, R. Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2015; Volume 1, pp. 1486–1494.
- Vega-Márquez, B.; Rubio-Escudero, C.; Nepomuceno-Chamorro, I. Generation of synthetic data with conditional generative adversarial networks. Log. J. IGPL 2022, 30, 252–262.
- Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv 2017, arXiv:1710.10196.
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv 2019, arXiv:1812.04948.
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096.
- Nichol, A.Q.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021.
- Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502.
- Nichol, A.; Dhariwal, P.; Ramesh, A.; Shyam, P.; Mishkin, P.; McGrew, B.; Sutskever, I.; Chen, M. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv 2021, arXiv:2112.10741.
- Chen, Z. Arbitrary Shape Text Detection with Discrete Cosine Transform and CLIP for Urban Scene Perception in ITS. IEEE Trans. Intell. Transp. Syst. 2025, early access.
- Avrahami, O.; Lischinski, D.; Fried, O. Blended diffusion for text-driven editing of natural images. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
- Voss, A.; Voss, J. Fast-dm: A free program for efficient diffusion model analysis. Behav. Res. Methods 2007, 39, 767–775.
- Wagenmakers, E.J.; Van Der Maas, H.L.; Grasman, R.P. An EZ-diffusion model for response time and accuracy. Psychon. Bull. Rev. 2007, 14, 3–22.
- Li, X.; Liu, Y.; Lian, L.; Yang, H.; Dong, Z.; Kang, D.; Zhang, S.; Keutzer, K. Q-Diffusion: Quantizing diffusion models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023.
- Borji, A. Generated faces in the wild: Quantitative comparison of Stable Diffusion, Midjourney and DALL-E 2. arXiv 2022, arXiv:2210.00586.
- Eyadah, H.; Tawfiqe, A.; Odaibat, A.A. A Forward-Looking Vision to Employ Artificial Intelligence to Preserve Cultural Heritage. Humanities 2024, 12, 109–114.
- Gaber, J.A.; Youssef, S.M.; Fathalla, K.M. The role of artificial intelligence and machine learning in preserving cultural heritage and art works via virtual restoration. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 185–190.
- Dandan, Z.; Lin, Z. Research on innovative applications of AI technology in the field of cultural heritage conservation. Acad. J. Humanit. Soc. Sci. 2024, 7, 111–120.
- Sun, D. Application of traditional culture in intelligent advertising design system in the internet era. Sci. Program. 2022, 2022, 7596991.
- Winiarti, S.W.; Sunardi, S.; Ahdiani, U.; Pranolo, A. Tradition Meets Modernity: Learning Traditional Building using Artificial Intelligence. Asian J. Univ. Educ. 2022, 18, 375–385.
- Wu, H. Innovation of Traditional Culture Development Model in Advertising Design Based on Artificial Intelligence. Available online: https://www.clausiuspress.com/article/3440.html (accessed on 14 May 2025).
- Hu, Y. Research on the Design Method of Traditional Decorative Patterns of Ethnic Minorities under the Trend of AIGC. J. Electron. Inf. Sci. 2023, 8, 58–62.
- Hang, W.; Alli, H.; Hawari, N.; Wang, W. Artificial Intelligence in Packaging Design: Integrating Traditional Chinese Cultural Elements for Cultural Preservation and Innovation. Int. J. Acad. Res. Bus. Soc. Sci. 2024, 1826–1836.
- Wu, S. Application of Chinese traditional elements in furniture design based on wireless communication and artificial intelligence decision. Wirel. Commun. Mob. Comput. 2022, 2022, 7113621.
Sketch | Text | Output
---|---|---
(sketch image) | Flying fairies with a blend of lotus flowers and auspicious clouds. | (generated image)
Sketch | Text | Statement of Design Intent
---|---|---
(sketch image) | A lotus in full bloom, blended with auspicious clouds and flames, symbolizing euphoria and prosperity. | The image tries to capture the “aesthetics of the bloom”, reflecting the dynamics of the blossom.
(sketch image) | A white deer with long antlers, surrounded by flowing clouds, symbolizing good luck and beauty. This image shows a mythical deer with an elegant stance and long antlers. | Trying to capture the “aesthetics of running”, reflecting the momentum of the leap.
(sketch image) | A lifelike lotus flower whose petals are partially fused with auspicious clouds, symbolizing good luck and good fortune. | The image attempts to capture the “dynamic aesthetics of the bloom”, reflecting the dynamism of the lotus bloom.
Combination | MHAM | MSAM | ACM | Cross-Entropy Loss | L1 Loss | CLIP Score ↑ | CLIP-I ↑
---|---|---|---|---|---|---|---
None | × | × | × | √ | × | 0.473 | 0.649
MHAM | √ | × | × | √ | × | 0.485 | 0.701
MSAM | × | √ | × | √ | × | 0.499 | 0.697
MSAM-ACM | × | √ | √ | √ | × | 0.505 | 0.699
MHAM-MSAM | √ | √ | × | √ | × | 0.514 | 0.758
DANet (L1 Loss) | √ | √ | √ | × | √ | 0.521 | 0.745
DANet | √ | √ | √ | √ | × | 0.532 | 0.774
Methods | Trainable Parameters | LPIPS-Alex ↓ | CLIP Score ↑ | CLIP-I ↑ | FPS
---|---|---|---|---|---
GAN | 23M | 0.532 | 0.398 | 0.649 | 12.5
LAPGAN | 28M | 0.568 | 0.431 | 0.656 | 8.7
StackGAN++ | 44M | 0.612 | 0.463 | 0.701 | 4.9
PGGAN | 37M | 0.633 | 0.482 | 0.698 | 5.6
StyleGAN | 62M | 0.691 | 0.501 | 0.675 | 3.4
BigGAN | 73M | 0.704 | 0.483 | 0.712 | 2.2
CGAN | 91M | 0.729 | 0.512 | 0.734 | 1.8
Diffusion | 845M | 0.834 | 0.479 | 0.672 | 0.52
LoRA-Baseline | 12M | 0.513 | 0.512 | 0.751 | 0.50
DANet (Ours) | 94M | 0.498 | 0.532 | 0.774 | 0.42
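For context on the two similarity metrics reported above: CLIP Score is conventionally the cosine similarity between CLIP embeddings of the generated image and its prompt, and CLIP-I the cosine similarity between CLIP embeddings of the generated and reference images. The sketch below uses the Hugging Face `transformers` CLIP implementation under those standard definitions; the paper's exact backbone and preprocessing may differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, prompt: str) -> float:
    # Image-text agreement: cosine similarity of the two CLIP embeddings.
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    return torch.cosine_similarity(img, txt).item()

@torch.no_grad()
def clip_i(generated: Image.Image, reference: Image.Image) -> float:
    # Image-image agreement: cosine similarity between the two image embeddings.
    inputs = processor(images=[generated, reference], return_tensors="pt")
    feats = model.get_image_features(pixel_values=inputs["pixel_values"])
    return torch.cosine_similarity(feats[0:1], feats[1:2]).item()
```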
Sketch | Text | PGGAN | LoRA-Baseline | DANet
---|---|---|---|---
(sketch image) | A Dunhuang-style lotus pattern with cloud motifs, viewed from the front. | (image) | (image) | (image)
(sketch image) | Two dancing fairies in Dunhuang mural style holding hands above a lotus flower. | (image) | (image) | (image)