Semantic and Sketch-Guided Diffusion Model for Fine-Grained Restoration of Damaged Ancient Paintings
Abstract
1. Introduction
1. A paired dataset for ancient painting restoration (originals, semantic maps, and sketch maps) for training restoration and related networks.
2. A diffusion-based restoration network that uses regional semantics and local structures from semantic and depth-sensitive sketch maps to guide denoising.
3. A Semantic-Sketch-Attribute-Normalization (SSAN) block with multi-layer spatially adaptive normalization, integrating semantic layouts and sketch structures for high-quality decoding.
4. Integration of sketch guidance and class-space attention to capture intricate details and artistic essence, bridging AI art generation and traditional ancient painting.
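The SSAN block described above builds on spatially adaptive normalization in the style of SPADE [32]: features are first normalized without learned affine parameters, then re-modulated with per-pixel scale and shift maps derived from the guidance maps. The sketch below is a minimal NumPy illustration under assumed shapes; the real block would predict the modulation with convolutions over both the semantic and sketch maps rather than the simple per-class lookup tables used here.

```python
import numpy as np

def ssan_like_norm(feat, seg, gamma_table, beta_table, eps=1e-5):
    """SPADE-style spatially adaptive normalization (illustrative).

    feat:        (C, H, W) decoder feature map
    seg:         (H, W) integer semantic labels
    gamma_table: (num_classes, C) per-class scale parameters
    beta_table:  (num_classes, C) per-class shift parameters
    """
    # Parameter-free normalization over the spatial dimensions.
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mean) / (std + eps)
    # Spatially varying modulation looked up from the semantic map.
    gamma = gamma_table[seg].transpose(2, 0, 1)  # (C, H, W)
    beta = beta_table[seg].transpose(2, 0, 1)
    return normalized * (1 + gamma) + beta
```

In the actual SSAN block this modulation is applied at multiple decoder layers, which is what the "multi-layer" in the contribution refers to.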
2. Materials and Methods
2.1. State of the Art
2.2. Dataset
2.3. Problem Formalization
2.4. Proposed Framework
2.5. Classifier-Free Guidance
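Classifier-free guidance [49] trains a single network both with and without conditioning and, at sampling time, extrapolates from the unconditional noise prediction toward the conditional one. A minimal sketch of one common combination rule (guidance weight `w`; `w = 0` recovers the unconditional prediction and `w = 1` the conditional one):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one with weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Weights above 1 push samples harder toward the condition at some cost in diversity; the effective weight interacts with the condition-dropout rate used at training time.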
2.6. Attributes Clustering
Algorithm 1: Attributes Clustering for Ancient Paintings Dataset.
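The paper's exact attribute-clustering procedure is specified in Algorithm 1. As a generic stand-in, clustering of this kind is commonly built on k-means over feature vectors extracted by a pretrained backbone (e.g. a ResNet [45]); feature extraction is assumed to have happened upstream in this minimal NumPy sketch:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means: group attribute feature vectors into k clusters."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign every feature vector to its nearest center.
        d = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels, centers
```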
2.7. Depth-Sensitive Sketch Extraction
Algorithm 2: Depth-Sensitive Sketch Extraction.
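Algorithm 2's exact steps are specified in the paper; the following is only a plausible NumPy sketch of the general idea behind depth-sensitive edge extraction: gradient responses from the image are mixed with gradient responses from an estimated depth map, so contours that coincide with depth discontinuities dominate the thresholded sketch. The mixing weight `alpha` and the threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def depth_sensitive_sketch(gray, depth, alpha=0.5, thresh=0.3):
    """Hypothetical depth-sensitive sketch: image-gradient edges
    boosted where the depth map also changes."""
    def grad_mag(img):
        gy, gx = np.gradient(img.astype(float))
        return np.hypot(gx, gy)

    def norm01(m):
        # Scale a response map to [0, 1]; leave all-zero maps untouched.
        return m / m.max() if m.max() > 0 else m

    response = ((1 - alpha) * norm01(grad_mag(gray))
                + alpha * norm01(grad_mag(depth)))
    return (response > thresh).astype(np.uint8)  # binary sketch map
```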
2.8. Attention-Based Fusion
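Attention-based fusion of decoder features with condition features typically follows scaled dot-product cross-attention [42]. A minimal single-head NumPy sketch, omitting the learned query/key/value projections the actual module would include:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(query_feats, cond_feats):
    """Scaled dot-product cross-attention: decoder features attend
    to condition (semantic/sketch) features; the attended values
    are returned for fusion with the decoder stream."""
    d = query_feats.shape[-1]
    scores = query_feats @ cond_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ cond_feats
```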
3. Results
3.1. Evaluation Setup
3.2. Quantitative Results
3.3. Qualitative Results

3.4. Impact of Manual Map Adjustments
3.5. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yang, R.; Li, H.; Long, Y.; Wu, X.; He, S. Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19–23 October 2025; pp. 16545–16554.
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
- Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-based generative modeling through stochastic differential equations. arXiv 2020, arXiv:2011.13456.
- Yang, R.; Yang, H.; Zhao, L.; Lei, Q.; Dong, M.; Ota, K.; Wu, X. One-Shot Reference-based Structure-Aware Image to Sketch Synthesis. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 9238–9246.
- Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the Inherence of Convolution for Visual Recognition. arXiv 2021, arXiv:2103.06255.
- Ye, C.; Chen, W.; Hu, B.; Zhang, L.; Zhang, Y.; Mao, Z. Improving Video Summarization by Exploring the Coherence Between Corresponding Captions. IEEE Trans. Image Process. 2025, 34, 5369–5384.
- Lao, Q.; Javadi, S.; Mirzaei, M.; Green, B.; Isola, P.; Fisher, M. HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models. arXiv 2023, arXiv:2312.14091.
- Sharma, N.; Uhlig, S.; Ommer, B.; Akata, Z.; Sharma, S. Sketch-guided Image Inpainting with Partial Discrete Diffusion Process. arXiv 2024, arXiv:2404.11949.
- Liu, X.; Lin, Z.; Wang, X.; Li, S.; Wang, H.; Liu, Y.; Wang, Y.; Yang, S. S2Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control. arXiv 2025, arXiv:2507.04584.
- Chan, C.; Durand, F.; Isola, P. Learning to generate line drawings that convey geometry and semantics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 7915–7925.
- Yang, R.; Wu, X.; He, S. MixSA: Training-Free Reference-Based Sketch Extraction via Mixture-of-Self-Attention. IEEE Trans. Vis. Comput. Graph. 2025, 31, 6208–6222.
- Xue, A. End-to-end Chinese landscape painting creation using generative adversarial networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 9 January 2021; pp. 3863–3871.
- Lyu, Q.; Zhao, N.; Yang, Y.; Gong, Y.; Gao, J. A diffusion probabilistic model for traditional Chinese landscape painting super-resolution. Herit. Sci. 2024, 12, 4.
- Zhao, Y.; Li, H.; Zhang, Z.; Chen, Y.; Liu, Q.; Zhang, X. Regional Traditional Painting Generation Based on Controllable Disentanglement Model. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 6913–6925.
- Wang, Z.; Zhang, J.; Ji, Z.; Bai, J.; Shan, S. CCLAP: Controllable Chinese Landscape Painting Generation Via Latent Diffusion Model. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 2117–2122.
- Huang, W.; Zhang, R.; Li, X.; Wang, P. Digital Preservation and Analysis of Ancient Chinese Paintings Using Machine Learning. Pattern Recognit. 2025, 161, 111447.
- Gao, Y.; Wang, C.; Zhang, J.; Li, H. Analysis and Generation of Traditional Chinese Painting Using Deep Learning. Comput. Vis. Media 2023, 9, 1–15.
- Wang, L.; Zhang, S.; Liu, Y.; Chen, H. Digital Analysis of Traditional Chinese Painting Composition Based on Computer Vision. Appl. Soft Comput. 2024, 156, 111492.
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
- Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65.
- Liu, Y.; Qin, J.; Wang, S.; Wang, F. Generative Adversarial Networks for Chinese Calligraphy Synthesis: A Survey. Appl. Artif. Intell. 2021, 35, 1015–1034.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Papamakarios, G.; Nalisnick, E.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 2021, 22, 2617–2680.
- Nichol, A.Q.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8162–8171.
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 2256–2265.
- Kingma, D.; Salimans, T.; Poole, B.; Ho, J. Variational diffusion models. Adv. Neural Inf. Process. Syst. 2021, 34, 21696–21707.
- Dhariwal, P.; Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794.
- Ho, J.; Saharia, C.; Chan, W.; Fleet, D.J.; Norouzi, M.; Salimans, T. Cascaded Diffusion Models for High Fidelity Image Generation. J. Mach. Learn. Res. 2022, 23, 1–33.
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with CLIP latents. arXiv 2022, arXiv:2204.06125.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 1125–1134.
- Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2337–2346.
- Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8798–8807.
- Tang, H.; Xu, D.; Yan, Y.; Torr, P.H.; Sebe, N. Local class-specific and global image-level generative adversarial networks for semantic-guided scene generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7870–7879.
- Tang, H.; Qi, X.; Xu, D.; Torr, P.H.; Sebe, N. Edge guided GANs with semantic preserving for semantic image synthesis. arXiv 2020, arXiv:2003.13898.
- Wang, Y.; Qi, L.; Chen, Y.C.; Zhang, X.; Jia, J. Image synthesis via semantic composition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 13749–13758.
- Sushko, V.; Schönfeld, E.; Zhang, D.; Gall, J.; Schiele, B.; Khoreva, A. You only need adversarial supervision for semantic image synthesis. In Proceedings of the ICLR, Addis Ababa, Ethiopia, 30 April 2020.
- Wang, Y.; Li, M.; Liu, J.; Leng, Z.; Li, F.W.; Zhang, Z.; Liang, X. Fg-T2M++: LLMs-augmented fine-grained text driven human motion generation. Int. J. Comput. Vis. 2025, 133, 4277–4293.
- Zhu, P.; Abdal, R.; Qin, Y.; Wonka, P. SEAN: Image synthesis with semantic region-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5104–5113.
- Tan, Z.; Chu, Q.; Chai, M.; Chen, D.; Liao, J.; Liu, Q.; Liu, B.; Hua, G.; Yu, N. Semantic Probability Distribution Modeling for Diverse Semantic Image Synthesis. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6247–6264.
- Wang, W.; Bao, J.; Zhou, W.; Chen, D.; Chen, D.; Yuan, L.; Li, H. Semantic image synthesis via diffusion models. arXiv 2022, arXiv:2207.00050.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
- Yang, R.; Yang, H.; Zhao, M.; Jia, R.; Wu, X.; Zhang, Y. Special perceptual parsing for Chinese landscape painting scene understanding: A semantic segmentation approach. Neural Comput. Appl. 2024, 36, 5231–5249.
- Shi, Y.; Otto, C.; Jain, A.K. Face clustering: Representation and pairwise constraints. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1626–1640.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Kong, F.; Pu, Y.; Lee, I.; Nie, R.; Zhao, Z.; Xu, D.; Qian, W.; Liang, H. Unpaired Artistic Portrait Style Transfer via Asymmetric Double-Stream GAN. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 5427–5439.
- Huang, N.; Zhang, Y.; Tang, F.; Ma, C.; Huang, H.; Zhang, Y.; Dong, W.; Xu, C. DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 3370–3383.
- Tan, Z.; Chen, D.; Chu, Q.; Chai, M.; Liao, J.; He, M.; Yuan, L.; Hua, G.; Yu, N. Efficient semantic image synthesis via class-adaptive normalization. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4852–4866.
- Ho, J.; Salimans, T. Classifier-free diffusion guidance. arXiv 2022, arXiv:2207.12598.
Quantitative comparison (Fidelity: FID, mIoU; Diversity: KID; Quality: SSIM, PSNR):

| Method | FID ↓ | mIoU (%) ↑ | KID ↓ | SSIM ↑ | PSNR ↑ |
|---|---|---|---|---|---|
| SPADE [32] | 101.01 | 43.25 | 46.19 | 0.34 | 14.89 |
| CLADE [48] | 97.32 | 48.40 | 21.89 | 0.38 | 16.17 |
| OASIS [37] | 120.39 | 49.56 | 14.37 | 0.38 | 14.38 |
| SDM [41] | 150.57 | 52.67 | 3.09 | 0.37 | 11.70 |
| SSGR (Ours) | 108.93 | 53.30 | 3.24 | 0.42 | 13.11 |
| Method | Ed. | Att. | IS ↑ | PSNR ↑ | mIoU (%) ↑ | KID ↓ |
|---|---|---|---|---|---|---|
| Baseline | ✗ | ✗ | 3.09 | 11.70 | 52.67 | 31.98 |
| SSGR | ✓ | ✗ | 2.55 | 12.62 | 52.97 | 47.54 |
| SSGR | ✗ | ✓ | 3.17 | 12.04 | 52.42 | 33.45 |
| SSGR | ✓ | ✓ | 3.24 | 13.11 | 53.30 | 13.27 |
| Dropout Rate (%) | mIoU (%) ↑ | PSNR ↑ | KID ↓ |
|---|---|---|---|
| 10 | 52.1 | 12.5 | 15.6 |
| 20 | 52.8 | 12.9 | 8.4 |
| 30 | 53.3 | 13.1 | 3.2 |
| 40 | 52.5 | 12.7 | 7.9 |
| 50 | 51.9 | 12.3 | 16.2 |
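The dropout rate in the table above is the probability of nulling out the guidance maps during training, the standard recipe that makes classifier-free guidance possible at sampling time; the 30% setting gives the best trade-off across all three metrics. A minimal sketch, with zero maps as the assumed null condition:

```python
import numpy as np

def drop_conditions(semantic, sketch, p, rng):
    """With probability p, replace both guidance maps with null (zero)
    maps so the network also learns the unconditional distribution."""
    if rng.random() < p:
        return np.zeros_like(semantic), np.zeros_like(sketch)
    return semantic, sketch
```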
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, L.; Chen, Y.; Du, G.; Wu, X. Semantic and Sketch-Guided Diffusion Model for Fine-Grained Restoration of Damaged Ancient Paintings. Electronics 2025, 14, 4187. https://doi.org/10.3390/electronics14214187

