A Study on the Generation and Evaluation of Illustrations for Chinese Idiom Allusions Based on AIGC
Abstract
1. Introduction
2. Related Work
2.1. Generative Models and Semantic Alignment
2.2. Sentiment Analysis
2.3. Aesthetic Evaluation
3. Methods
3.1. Prompt Generation
3.2. “Truth” Dimension: Cultural Symbols
3.3. “Goodness” Dimension: Affective Ontology
3.4. “Beauty” Dimension: Visual Aesthetics
3.4.1. Rule-Based Illustration Aesthetics
3.4.2. Deep-Learning-Based Assessment
3.5. System Implementation
4. Experiments
4.1. Illustration Generation
4.2. Determination of Optimal Thresholds
4.2.1. Quantitative Computation
4.2.2. Qualitative Analysis
4.2.3. Threshold and Model Selection
4.3. Validation of the Evaluation Framework
4.3.1. Validity, Objectivity, and Generalization Capability
4.3.2. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1
| Idiom | Model | Cultural Symbols: Macro-Level Semantic Consistency (%) | Cultural Symbols: Micro-Level Symbol Similarity (%) | Cultural Symbols: Overall Score (%) | Affective Ontology: First-Level Affective Polarity (%) | Affective Ontology: Second-Level Sub-Emotions (%) | Affective Ontology: Overall Score (%) | Visual Aesthetics: Rule-Based Composition Balance (%) | Visual Aesthetics: Rule-Based Color Harmony (%) | Visual Aesthetics: Rule-Based Line Expressiveness (%) | Visual Aesthetics: Deep-Learning-Based Assessment (%) | Visual Aesthetics: Overall Score (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hua she tian zu | Doubao | 82.6 | 72.8 | 77.7 | 90.4 | 51.2 | 70.8 | 67.4 | 84.0 | 72.8 | 67.5 | 71.1 |
| | GPT-4 | 81.7 | 70.5 | 76.1 | 88.6 | 52.2 | 70.4 | 62.3 | 81.0 | 96.0 | 72.1 | 75.9 |
| | Midjourney | 55.7 | 45.6 | 50.7 | 78.9 | 47.9 | 63.4 | 63.6 | 88.3 | 77.4 | 70.6 | 73.5 |
| jing di zhi wa | Doubao | 87.6 | 72.1 | 79.8 | 88.0 | 53.0 | 70.5 | 63.6 | 82.3 | 84.2 | 72.8 | 74.8 |
| | GPT-4 | 85.6 | 75.0 | 80.3 | 89.7 | 50.1 | 69.9 | 59.3 | 83.0 | 69.7 | 69.2 | 69.9 |
| | Midjourney | 93.8 | 70.1 | 82.0 | 85.2 | 52.0 | 68.6 | 64.8 | 81.7 | 87.3 | 70.7 | 74.3 |
| dui niu tan qin | Doubao | 93.9 | 68.9 | 81.4 | 93.2 | 50.0 | 71.6 | 63.9 | 82.9 | 89.5 | 67.7 | 73.2 |
| | GPT-4 | 87.3 | 73.8 | 80.6 | 95.9 | 48.9 | 72.4 | 57.1 | 79.9 | 94.2 | 78.6 | 77.8 |
| | Midjourney | 94.2 | 68.6 | 81.4 | 82.6 | 55.4 | 69.0 | 59.4 | 82.6 | 94.5 | 72.2 | 75.5 |
| shui zhong lao yue | Doubao | 100.0 | 69.7 | 84.9 | 89.7 | 44.7 | 67.2 | 69.6 | 81.2 | 86.2 | 66.3 | 72.7 |
| | GPT-4 | 93.4 | 70.0 | 81.7 | 86.9 | 48.5 | 67.7 | 58.1 | 75.3 | 91.2 | 63.9 | 69.4 |
| | Midjourney | 100.0 | 69.4 | 84.7 | 79.8 | 50.8 | 65.3 | 61.6 | 80.8 | 96.5 | 70.4 | 75.0 |
| shou zhu dai tu | Doubao | 98.6 | 62.9 | 80.7 | 93.5 | 45.5 | 69.5 | 63.8 | 76.5 | 70.4 | 74.8 | 72.5 |
| | GPT-4 | 97.0 | 63.0 | 80.0 | 92.1 | 47.5 | 69.8 | 55.8 | 85.9 | 98.3 | 69.1 | 74.6 |
| | Midjourney | 87.0 | 62.0 | 74.5 | 78.4 | 55.2 | 66.8 | 65.6 | 84.8 | 82.0 | 71.1 | 74.3 |
| zao bi tou guang | Doubao | 99.0 | 60.1 | 79.5 | 97.3 | 49.3 | 73.3 | 70.6 | 81.9 | 84.6 | 71.2 | 75.1 |
| | GPT-4 | 96.9 | 65.5 | 81.2 | 96.5 | 50.5 | 73.5 | 59.4 | 75.6 | 72.2 | 70.0 | 69.5 |
| | Midjourney | 88.9 | 56.3 | 72.6 | 85.7 | 52.9 | 69.3 | 62.1 | 82.4 | 70.4 | 71.3 | 71.5 |
| wen ji qi wu | Doubao | 91.7 | 56.5 | 74.1 | 94.9 | 46.3 | 70.6 | 59.0 | 87.1 | 72.4 | 74.4 | 73.6 |
| | GPT-4 | 87.5 | 60.7 | 74.1 | 92.3 | 49.9 | 71.1 | 61.1 | 82.5 | 99.3 | 76.7 | 78.8 |
| | Midjourney | 92.4 | 61.1 | 76.8 | 79.5 | 51.9 | 65.7 | 56.5 | 80.4 | 99.2 | 68.9 | 73.8 |
| zhi lu wei ma | Doubao | 92.8 | 64.6 | 78.7 | 95.4 | 46.0 | 70.7 | 73.8 | 82.4 | 80.9 | 62.5 | 70.8 |
| | GPT-4 | 97.1 | 70.0 | 83.5 | 95.9 | 47.9 | 71.9 | 60.7 | 82.2 | 99.3 | 70.0 | 75.4 |
| | Midjourney | 93.7 | 63.3 | 78.5 | 83.2 | 54.2 | 68.7 | 61.2 | 81.0 | 99.3 | 74.5 | 77.5 |
| fu jing qing zui | Doubao | 85.1 | 59.7 | 72.4 | 92.5 | 48.1 | 70.3 | 68.4 | 87.6 | 71.5 | 74.6 | 75.2 |
| | GPT-4 | 86.0 | 61.2 | 73.6 | 90.8 | 49.2 | 70.5 | 60.3 | 82.4 | 99.3 | 69.9 | 75.3 |
| | Midjourney | 64.0 | 49.8 | 56.9 | 76.3 | 51.9 | 64.1 | 59.7 | 84.3 | 81.4 | 76.2 | 75.7 |
| zhi shang tan bing | Doubao | 85.6 | 63.7 | 74.6 | 89.2 | 41.6 | 65.4 | 62.9 | 82.3 | 99.2 | 64.9 | 73.2 |
| | GPT-4 | 82.8 | 66.4 | 73.6 | 86.9 | 48.5 | 67.7 | 59.2 | 78.0 | 99.3 | 64.9 | 71.9 |
| | Midjourney | 85.5 | 58.4 | 72.0 | 78.9 | 50.3 | 64.6 | 61.3 | 82.0 | 99.3 | 71.3 | 76.1 |
| kua fu zhu ri | Doubao | 100.0 | 70.2 | 85.1 | 95.5 | 47.5 | 71.5 | 74.8 | 75.7 | 70.1 | 74.9 | 74.2 |
| | GPT-4 | 97.6 | 75.7 | 86.7 | 93.8 | 46.6 | 70.2 | 62.5 | 76.5 | 88.6 | 66.5 | 71.2 |
| | Midjourney | 96.5 | 70.6 | 83.6 | 82.1 | 54.5 | 68.3 | 65.0 | 80.8 | 95.6 | 68.6 | 74.5 |
| nv wa bu tian | Doubao | 90.3 | 71.0 | 80.6 | 94.2 | 46.2 | 70.2 | 75.8 | 79.6 | 84.9 | 62.2 | 71.2 |
| | GPT-4 | 86.7 | 79.3 | 83.0 | 92.4 | 48.6 | 70.5 | 58.4 | 76.8 | 95.7 | 69.6 | 73.3 |
| | Midjourney | 89.7 | 68.8 | 79.2 | 78.3 | 53.5 | 65.9 | 60.7 | 87.3 | 97.8 | 70.6 | 76.3 |
| hou yi she ri | Doubao | 84.1 | 66.2 | 75.1 | 93.6 | 47.2 | 70.4 | 61.9 | 85.6 | 80.3 | 76.2 | 76.1 |
| | GPT-4 | 91.0 | 78.2 | 84.6 | 90.9 | 48.7 | 69.8 | 58.2 | 74.3 | 99.1 | 69.2 | 73.2 |
| | Midjourney | 100.0 | 78.0 | 89.0 | 96.0 | 45.0 | 70.5 | 60.6 | 87.6 | 99.3 | 69.5 | 76.0 |
| jing wei tian hai | Doubao | 90.4 | 68.2 | 79.3 | 91.8 | 44.6 | 68.2 | 63.4 | 82.9 | 84.1 | 75.7 | 76.3 |
| | GPT-4 | 86.5 | 67.7 | 77.1 | 88.7 | 46.3 | 67.5 | 61.9 | 85.1 | 98.4 | 68.1 | 75.0 |
| | Midjourney | 94.2 | 63.7 | 78.9 | 82.9 | 53.9 | 68.4 | 53.4 | 81.9 | 88.2 | 67.1 | 70.8 |
| chang e ben yue | Doubao | 96.8 | 82.0 | 89.4 | 97.3 | 45.9 | 71.6 | 65.9 | 86.0 | 69.7 | 68.9 | 71.4 |
| | GPT-4 | 85.6 | 80.2 | 82.9 | 93.6 | 46.2 | 69.9 | 58.8 | 81.9 | 93.2 | 77.2 | 77.6 |
| | Midjourney | 89.5 | 78.7 | 84.1 | 81.7 | 53.3 | 67.5 | 62.7 | 87.0 | 75.9 | 74.2 | 74.7 |
| xiong you cheng zhu | Doubao | 100.0 | 61.8 | 80.9 | 95.2 | 46.6 | 70.9 | 67.1 | 83.0 | 96.7 | 62.6 | 72.4 |
| | GPT-4 | 87.6 | 66.9 | 77.3 | 90.5 | 47.9 | 69.2 | 60.2 | 80.4 | 99.3 | 67.5 | 73.7 |
| | Midjourney | 100.0 | 66.8 | 83.4 | 83.6 | 54.4 | 69.0 | 61.4 | 84.3 | 98.7 | 69.9 | 75.7 |
| zhuan xin zhi zhi | Doubao | 89.5 | 64.1 | 76.8 | 89.6 | 43.8 | 66.7 | 69.4 | 84.4 | 85.2 | 72.9 | 76.3 |
| | GPT-4 | 98.0 | 62.8 | 80.4 | 95.3 | 45.5 | 70.4 | 59.4 | 79.4 | 99.0 | 64.1 | 71.7 |
| | Midjourney | 90.2 | 60.1 | 75.1 | 77.2 | 50.4 | 63.8 | 63.0 | 85.3 | 98.6 | 70.3 | 76.3 |
| xue hai wu ya | Doubao | 98.0 | 68.2 | 83.1 | 94.9 | 45.5 | 70.2 | 60.1 | 78.3 | 92.0 | 69.4 | 73.1 |
| | GPT-4 | 88.1 | 73.4 | 80.8 | 95.5 | 47.5 | 71.5 | 56.9 | 84.8 | 99.3 | 69.6 | 75.0 |
| | Midjourney | 86.5 | 55.9 | 71.2 | 80.1 | 50.3 | 65.2 | 60.7 | 84.4 | 99.3 | 71.1 | 76.3 |
| bu chi xia wen | Doubao | 84.9 | 69.0 | 76.9 | 92.3 | 47.3 | 69.8 | 65.2 | 86.3 | 96.5 | 72.3 | 77.5 |
| | GPT-4 | 81.6 | 69.6 | 75.6 | 94.3 | 46.7 | 70.5 | 61.7 | 81.6 | 99.3 | 70.6 | 75.7 |
| | Midjourney | 81.4 | 68.4 | 74.9 | 81.6 | 50.6 | 66.1 | 62.3 | 85.3 | 99.3 | 67.9 | 75.1 |
| shu neng sheng qiao | Doubao | 95.8 | 75.7 | 85.8 | 91.5 | 45.3 | 68.4 | 70.9 | 84.4 | 83.2 | 63.4 | 71.5 |
| | GPT-4 | 81.6 | 69.6 | 80.8 | 89.8 | 47.4 | 68.6 | 63.0 | 80.7 | 98.3 | 70.1 | 75.4 |
| | Midjourney | 86.0 | 73.1 | 79.6 | 82.9 | 49.5 | 66.2 | 68.6 | 84.3 | 97.6 | 78.6 | 81.1 |
| Average | Doubao | | | 79.8 | | | 69.9 | | | | | 73.5 |
| | GPT-4 | | | 79.7 | | | 70.2 | | | | | 74.1 |
| | Midjourney | | | 76.5 | | | 66.8 | | | | | 75.2 |
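
The per-row overall scores in the table above look consistent with equal-weight averaging. For example, in the Doubao row for hua she tian zu, the Cultural Symbols and Affective Ontology overall scores equal the mean of their two sub-scores, and the Visual Aesthetics overall score equals the mean of the deep-learning score and the mean of the three rule-based scores. The Python sketch below illustrates that reading. The weighting is an assumption inferred from the tabulated numbers, not a rule stated in this section, and the function names are hypothetical.

```python
from statistics import mean


def dimension_overall(sub_scores):
    """Equal-weight mean of a dimension's sub-scores (assumed weighting)."""
    return mean(sub_scores)


def visual_aesthetics_overall(composition, color, line, deep_learning):
    """Average the three rule-based scores first, then average that result
    with the deep-learning score (assumed two-stage equal weighting)."""
    rule_based = mean([composition, color, line])
    return mean([rule_based, deep_learning])


# Doubao row for "hua she tian zu"; sub-scores copied from the table.
cultural = dimension_overall([82.6, 72.8])
affective = dimension_overall([90.4, 51.2])
visual = visual_aesthetics_overall(67.4, 84.0, 72.8, 67.5)

# Compare with the tabulated overall scores 77.7, 70.8, and 71.1.
print(f"{cultural:.1f} {affective:.1f} {visual:.1f}")
```

This check reproduces the rows tried here, but it does not confirm the weighting for every row, so it should be treated as a reading aid rather than the scoring rule itself.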
Appendix A.2
| Dimension | Item |
|---|---|
| “Truth” Dimension: Cultural Symbols | This image is consistent with the Chinese idiom/story as I understand it. |
| | I can discern elements of traditional Chinese culture in this image. |
| | This image helps deepen my understanding of the particular culture. |
| | The image communicates respect for, and a profound understanding of, the idiom’s background. |
| “Goodness” Dimension: Affective Ontology | This image evokes a strong emotional response in me. |
| | The image effectively conveys a distinct emotional tone. |
| | The image’s atmosphere evokes deep associations or personal memories. |
| | The image’s visual elements are congruent with its emotional theme. |
| “Beauty” Dimension: Visual Aesthetics | I find this image highly visually appealing. |
| | The color palette of the image is harmonious. |
| | The overall composition is balanced, and the visual focus is clear. |
| | The use of line in the image is fluid and expressive. |










| Category | Idioms (Chinese—English) |
|---|---|
| Fable | 画蛇添足 (hua she tian zu, “to draw a snake and add feet”); 井底之蛙 (jing di zhi wa, “a frog at the bottom of a well”); 对牛弹琴 (dui niu tan qin, “playing lute to a cow”); 水中捞月 (shui zhong lao yue, “fishing for the moon”); 守株待兔 (shou zhu dai tu, “waiting by a stump for a rabbit”) |
| Historical Anecdote | 凿壁偷光 (zao bi tou guang, “borrowing light through a hole in the wall”); 闻鸡起舞 (wen ji qi wu, “rise at rooster’s call to practise”); 指鹿为马 (zhi lu wei ma, “calling a deer a horse”); 负荆请罪 (fu jing qing zui, “carry thorned branches to ask forgiveness”); 纸上谈兵 (zhi shang tan bing, “battle on paper”) |
| Myth/Legend | 夸父逐日 (kua fu zhu ri, “Kuafu chasing the sun”); 女娲补天 (nv wa bu tian, “Nüwa repairing the sky”); 后羿射日 (hou yi she ri, “Houyi shooting the suns”); 精卫填海 (jing wei tian hai, “Jingwei filling the sea”); 嫦娥奔月 (chang e ben yue, “Chang’e ascending to the moon”) |
| Literary Classic | 胸有成竹 (xiong you cheng zhu, “to have a plan in mind”); 专心致志 (zhuan xin zhi zhi, “single-minded devotion”); 学海无涯 (xue hai wu ya, “boundless sea of learning”); 不耻下问 (bu chi xia wen, “not ashamed to ask subordinates”); 熟能生巧 (shu neng sheng qiao, “practice makes perfect”) |

| Model | Cultural Symbols | Affective Ontology | Visual Aesthetics |
|---|---|---|---|
| Doubao | 0.988 | 0.991 | 0.989 |
| GPT-4 | 0.987 | 0.987 | 0.991 |
| Midjourney | 0.990 | 0.989 | 0.989 |

| Evaluation Metric | Pearson r | Spearman ρ | Correlation with Expert Ratings |
|---|---|---|---|
| CLIPScore (Baseline) | 0.185 * | 0.172 * | Weak correlation |
| Cultural Symbols | 0.362 ** | 0.353 ** | Moderate correlation |
| Affective Ontology | 0.593 ** | 0.469 ** | Strong correlation |
| Visual Aesthetics | −0.087 ns | −0.060 ns | Weak correlation |
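
Correlation coefficients of the kind reported above can be computed as in the following Python sketch. Pearson r measures linear association, and Spearman ρ is Pearson r applied to the ranks of the values, with tied values sharing their mean rank. The scores and ratings below are hypothetical placeholders, not data from the study; the helper names are illustrative, and significance testing is omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values receive their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# HYPOTHETICAL values for illustration only (not taken from the study).
metric_scores = [70.0, 75.0, 50.0, 80.0, 85.0, 65.0]
expert_ratings = [3.5, 4.0, 2.0, 4.5, 4.0, 3.0]

print(f"Pearson r = {pearson_r(metric_scores, expert_ratings):.3f}")
print(f"Spearman rho = {spearman_rho(metric_scores, expert_ratings):.3f}")
```

Reporting both coefficients is useful because Spearman ρ depends only on rank order, so it is less sensitive than Pearson r to outliers and to monotonic but non-linear relationships.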
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, J.; Teng, Y.; Wang, W. A Study on the Generation and Evaluation of Illustrations for Chinese Idiom Allusions Based on AIGC. Information 2026, 17, 495. https://doi.org/10.3390/info17050495

