Image Restoration Based on Semantic Prior Aware Hierarchical Network and Multi-Scale Fusion Generator
Abstract
1. Introduction
- (1) We present a Semantic Prior Generator that combines a multi-label classification model with a U-Net to map multi-scale features into semantic priors. By mapping the multi-scale high-level semantic features of both the damaged and the intact image into multi-scale semantic priors, the model captures the high-level semantics of the damaged regions more accurately (a minimal sketch of this component follows the list).
- (2) We propose residual blocks with multi-scale fusion that refine low-level texture features and progressively integrate them with structural features, preserving image integrity. A series of residual blocks maps the multi-scale semantic priors into structural features at different scales; this strengthens the model's expressive power and better preserves the structural integrity of the image during restoration, improving the quality of the result. To address the limited interaction between high-level semantic features and low-level features, we design a multi-scale fusion module that refines low-level texture features and fuses them with structural features at different scales in a progressive manner, so that the model can perceive the multi-scale semantics of the high-level features.
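
As a concrete illustration of contribution (1), below is a minimal, hypothetical PyTorch sketch of a semantic prior generator of this kind. It is not the paper's architecture: the class name `SemanticPriorGenerator`, the channel widths, the 1×1 projection convolutions, and the global-pooling multi-label head are all illustrative assumptions.

```python
# Hypothetical sketch only: layer widths, the 1x1 projections, and the
# multi-label head are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticPriorGenerator(nn.Module):
    def __init__(self, in_ch=3, widths=(64, 128, 256), num_labels=20, prior_ch=64):
        super().__init__()
        # U-Net-style encoder: each stage halves resolution and widens channels.
        stages, c_prev = [], in_ch
        for c in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(c_prev, c, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True)))
            c_prev = c
        self.stages = nn.ModuleList(stages)
        # Multi-label classification head on the deepest features,
        # trained against image-level labels (e.g., with a BCE loss).
        self.cls_head = nn.Linear(widths[-1], num_labels)
        # 1x1 projections mapping each scale's features to a semantic prior map.
        self.to_prior = nn.ModuleList(nn.Conv2d(c, prior_ch, 1) for c in widths)

    def forward(self, x):
        feats, h = [], x
        for stage in self.stages:
            h = stage(h)
            feats.append(h)
        logits = self.cls_head(F.adaptive_avg_pool2d(h, 1).flatten(1))
        priors = [proj(f) for proj, f in zip(self.to_prior, feats)]
        return priors, logits  # multi-scale semantic priors + multi-label logits
```

In such a design, the multi-label head would plausibly be supervised on the intact image while the multi-scale priors feed the generator; this training split is likewise an assumption, not a detail taken from the paper.
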
2. Related Work
3. Approach
3.1. Multi-Scale Semantic-Driven Generator
- (1) Multi-scale feature fusion architecture: the MSF module takes two inputs: the texture features extracted by the encoder (from the downsampling path of the U-Net) and the multi-scale semantic structural priors (produced by the semantic prior mapper designed in this paper, rather than the single-scale semantic input of SPADE). Its output is an enhanced feature that integrates the multi-scale information, upsampled to a unified scale.
- (2) Dynamic computational allocation: the MSF module dynamically allocates different numbers of SPADE residual blocks to semantic priors at different input scales.
- (3) Integration of deformable convolution: deformable convolution replaces the standard convolution used in SPADE, enhancing the model's ability to model geometric transformations.
- (4) SPADE core retained: the MSF module retains and directly employs the core normalization and feature modulation algorithm of the original SPADE. A combined sketch of items (1)–(4) appears after this list.
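
To make items (1)–(4) concrete, here is a minimal PyTorch sketch, not the paper's implementation: the hidden widths, kernel sizes, per-scale block counts, and the way offsets are predicted from the prior are all assumptions. Only the overall pattern follows the description above: SPADE-style modulation, deformable convolutions (via `torchvision.ops.DeformConv2d`) in place of SPADE's standard convolutions, and per-scale stacks of SPADE residual blocks fused at a unified scale.

```python
# Hypothetical sketch of items (1)-(4): widths, kernel sizes, the per-scale
# block counts, and the offset predictor are assumptions; only the pattern
# (SPADE modulation, deformable convs, per-scale residual stacks) follows
# the description above.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformSPADE(nn.Module):
    """SPADE normalization, item (4), with its standard convolutions
    replaced by deformable convolutions, item (3)."""
    def __init__(self, feat_ch, prior_ch, hidden=128, k=3):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_ch, affine=False)  # parameter-free, as in SPADE
        # Offsets for the deformable conv are predicted from the prior itself.
        self.offset = nn.Conv2d(prior_ch, 2 * k * k, k, padding=k // 2)
        self.shared = DeformConv2d(prior_ch, hidden, k, padding=k // 2)
        self.gamma = nn.Conv2d(hidden, feat_ch, k, padding=k // 2)
        self.beta = nn.Conv2d(hidden, feat_ch, k, padding=k // 2)

    def forward(self, feat, prior):
        # Resize the semantic prior to the feature's spatial size.
        prior = F.interpolate(prior, size=feat.shape[-2:], mode='nearest')
        h = F.relu(self.shared(prior, self.offset(prior)))
        # SPADE's core modulation: normalize, then spatially scale and shift.
        return self.norm(feat) * (1 + self.gamma(h)) + self.beta(h)

class SPADEResBlock(nn.Module):
    """Residual block whose normalization layers are DeformSPADE."""
    def __init__(self, ch, prior_ch):
        super().__init__()
        self.n1 = DeformSPADE(ch, prior_ch)
        self.n2 = DeformSPADE(ch, prior_ch)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, prior):
        h = self.c1(F.leaky_relu(self.n1(x, prior), 0.2))
        h = self.c2(F.leaky_relu(self.n2(h, prior), 0.2))
        return x + h

class MSFModule(nn.Module):
    """Items (1)-(2): each prior scale gets its own stack of SPADE residual
    blocks (the counts here are a hypothetical allocation). All branch
    outputs end up at the texture feature's unified scale, since DeformSPADE
    resizes each prior, and are fused by a 1x1 convolution."""
    def __init__(self, ch, prior_ch, blocks_per_scale=(1, 2, 3)):
        super().__init__()
        self.stacks = nn.ModuleList(
            nn.ModuleList(SPADEResBlock(ch, prior_ch) for _ in range(n))
            for n in blocks_per_scale)
        self.fuse = nn.Conv2d(ch * len(blocks_per_scale), ch, 1)

    def forward(self, tex_feat, priors):
        # tex_feat: encoder texture features; priors: list of multi-scale priors.
        outs = []
        for stack, prior in zip(self.stacks, priors):
            h = tex_feat
            for blk in stack:
                h = blk(h, prior)
            outs.append(h)
        return self.fuse(torch.cat(outs, dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 64)
    priors = [torch.randn(1, 64, s, s) for s in (16, 32, 64)]
    print(MSFModule(64, 64)(feat, priors).shape)  # torch.Size([1, 64, 64, 64])
```

Allocating deeper stacks to some scales (here, more blocks for finer priors) is one plausible reading of the dynamic allocation in item (2); the paper's actual allocation rule is not reproduced here.
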
3.2. Mask-Guided Discriminator
3.3. Loss Function
4. Experiments
4.1. Datasets and Settings
4.2. Quantitative Comparison
4.3. Ablation Experiment
4.4. Qualitative Comparison
4.5. Analysis of Computational Efficiency
5. Conclusions and Future Works
5.1. Conclusions
5.2. Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Li, X.; Chen, H.; Qi, X.; Dou, Q.; Fu, C.W.; Heng, P.A. H-DenseUNet: Hybrid Densely Connected U-Net for Liver and Tumor Segmentation from CT Volumes. IEEE Trans. Med. Imaging 2018, 37, 2663–2674.
- Liu, R.; Li, Z.; Ma, J. A Deep Learning Framework for the Recovery of Missing Data in High-Resolution Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2021, 181, 34–47.
- Zheng, J.; Qin, M.; Xu, H.; Feng, Y.; Chen, P.; Chen, S. Tensor completion using patch-wise high order Hankelization and randomized tensor ring initialization. Eng. Appl. Artif. Intell. 2021, 106, 104472.
- Pathak, D.; Krähenbühl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. 2017, 36, 1–14.
- Xiang, H.; Zou, Q.; Nawaz, M.A.; Huang, X.; Zhang, F.; Yu, H. Deep learning for image inpainting: A survey. Pattern Recognit. 2023, 134, 109046.
- Li, J.; Ning, W.; Zhang, L.; Du, B.; Tao, D. Recurrent feature reasoning for image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7760–7768.
- Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V. Resolution-robust large mask inpainting with Fourier convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2149–2159.
- Wan, Z.; Zhang, J.; Chen, D.; Liao, J. High-Fidelity and Efficient Pluralistic Image Completion with Transformers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9612–9629.
- Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Van Gool, L. RePaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022.
- Wang, Y.; Yu, J.; Zhang, J. Zero-shot image restoration using denoising diffusion null-space model. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
- Wang, D.; Hu, L.; Li, Q.; Wang, G.; Li, H. Image Inpainting Based on Multi-Level Feature Aggregation Network for Future Internet. Electronics 2023, 12, 4065.
- Wang, Z.; Jiang, X.; Chen, C.; Li, Y. Lightweight Multi-Scales Feature Diffusion for Image Inpainting Towards Underwater Fish Monitoring. Sensors 2024, 24, 2178.
- Wang, S.; Guo, X.; Guo, W. MD-GAN: Multi-Scale Diversity GAN for Large Masks Inpainting. Appl. Sci. 2025, 25, 2218.
- Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.; Ebrahimi, M. EdgeConnect: Structure guided image inpainting using edge prediction. In Proceedings of the IEEE International Conference on Computer Vision Workshop, Seoul, Republic of Korea, 27–28 October 2019; pp. 3265–3274.
- Liu, H.; Jiang, B.; Song, Y.; Huang, W.; Yang, C. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 725–741.
- Xiong, W.; Yu, J.; Lin, Z.; Yang, J.; Lu, X.; Barnes, C.; Luo, J. Foreground-aware image inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5833–5841.
- Zhao, L.; Shen, L.; Hong, R. Survey on image inpainting research progress. Comput. Sci. 2021, 48, 14–26.
- Xu, H.; Zheng, J.; Yao, X.; Feng, Y.; Chen, S. Fast tensor nuclear norm for structured low-rank visual inpainting. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 538–552.
- Zheng, J.; Jiang, J.; Xu, H.; Liu, Z.; Gao, F. Manifold-based nonlocal second-order regularization for hyperspectral image inpainting. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 224–236.
- Hays, J.; Efros, A.A. Scene completion using millions of photographs. ACM Trans. Graph. 2007, 26, 4.
- Qin, J. Multi-scale attention network for image inpainting. Comput. Vis. Image Underst. 2021, 204, 103155.
- Jin, Y.; Wu, J.; Wang, W.; Yan, Y.; Jiang, J.; Zheng, J. Cascading blend network for image inpainting. ACM Trans. Multimedia Comput. Commun. Appl. 2023, 20, 1–21.
- Peng, J.; Liu, D.; Xu, S.; Li, H. Generating diverse structure for image inpainting with hierarchical VQ-VAE. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 10770–10779.
- Afsari, A.; Abbosh, A.M.; Rahmat-Samii, Y. A rapid medical microwave tomography based on partial differential equations. IEEE Trans. Antennas Propag. 2018, 66, 5521–5535.
- Yi, Z.; Tang, Q.; Azizi, S.; Jang, D.; Xu, Z. Contextual residual aggregation for ultra high-resolution image inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 7508–7517.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative image inpainting with contextual attention. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5505–5514.
- Liu, H.; Jiang, B.; Xiao, Y.; Yang, C. Coherent semantic attention for image inpainting. In Proceedings of the 17th IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 4169–4178.
- Sagong, M.-C.; Shin, Y.-G.; Kim, S.-W.; Park, S.; Ko, S.-J. PEPSI: Fast image inpainting with parallel decoding network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11352–11360.
- Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Learning pyramid-context encoder network for high-quality image inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 1486–1494.
- Song, Y.; Yang, C.; Shen, Y.; Wang, P.; Huang, Q.; Kuo, C.C.J. SPG-Net: Segmentation prediction and guidance network for image inpainting. In Proceedings of the British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
- Doersch, C.; Singh, S.; Gupta, A.; Sivic, J.; Efros, A.A. What makes Paris look like Paris? ACM Trans. Graph. 2012, 31, 1–9.
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the 15th IEEE International Conference on Computer Vision, Santiago, Chile, 13–16 December 2015; pp. 3730–3738.
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.-C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 89–105.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1452–1464.
- Guo, X.; Yang, H.; Huang, D. Image inpainting via conditional texture and structure dual generation. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14114–14123.
- Zhang, W.; Zhu, J.; Tai, Y.; Wang, Y.; Chu, W.; Ni, B.; Wang, C.; Yang, X. Context-aware image inpainting with learned semantic priors. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 19–27 August 2021; pp. 1323–1329.

| Metric | Mask Rate | EC | CTSDG | SPL | SPN | LaMa-Fourier | DDNM | Ours |
|---|---|---|---|---|---|---|---|---|
| PSNR ↑ | (0.2, 0.4] | 27.65 | 28.08 | 29.34 | 29.76 | 31.25 | 33.39 | 33.22 |
| | (0.4, 0.6] | 22.81 | 23.39 | 24.47 | 24.67 | 28.76 | 30.18 | 29.03 |
| | All | 27.5 | 28.11 | 31.38 | 29.69 | 32.85 | 34.76 | 34.99 |
| SSIM ↑ | (0.2, 0.4] | 0.86 | 0.888 | 0.911 | 0.916 | 0.891 | 0.901 | 0.904 |
| | (0.4, 0.6] | 0.74 | 0.746 | 0.79 | 0.795 | 0.825 | 0.833 | 0.821 |
| | All | 0.829 | 0.855 | 0.882 | 0.886 | 0.816 | 0.865 | 0.883 |
| MAE (×10⁻¹) ↓ | (0.2, 0.4] | 0.358 | 0.382 | 0.12 | 0.409 | 0.28 | 0.28 | 0.35 |
| | (0.4, 0.6] | 0.664 | 0.458 | 0.292 | 0.618 | 0.57 | 0.49 | 0.58 |
| | All | 0.37 | 0.42 | 0.16 | 0.432 | 0.35 | 0.32 | 0.36 |
| FID ↓ | (0.2, 0.4] | 13.56 | 11.89 | 9.161 | 11.54 | 12.28 | 12.43 | 10.22 |
| | (0.4, 0.6] | 14.98 | 18.52 | 13.41 | 24 | 13.18 | 14.22 | 13.16 |
| | All | 14.3 | 13.76 | 12.46 | 14.76 | 12.35 | 11.89 | 11.56 |

| Metric | Mask Rate | EC | CTSDG | SPL | SPN | LaMa-Fourier | DDNM | Ours |
|---|---|---|---|---|---|---|---|---|
| PSNR ↑ | (0.2, 0.4] | 29.47 | 29.48 | 32.17 | 32.39 | 32.56 | 33.92 | 35.88 |
| | (0.4, 0.6] | 24.08 | 24.21 | 26.43 | 26.54 | 30.3 | 31.54 | 30.51 |
| | All | 30.34 | 30.62 | 33.23 | 33.52 | 33.82 | 36.12 | 35.36 |
| SSIM ↑ | (0.2, 0.4] | 0.802 | 0.829 | 0.916 | 0.842 | 0.831 | 0.852 | 0.904 |
| | (0.4, 0.6] | 0.788 | 0.776 | 0.859 | 0.8011 | 0.792 | 0.801 | 0.838 |
| | All | 0.825 | 0.807 | 0.901 | 0.847 | 0.855 | 0.867 | 0.89 |
| MAE (×10⁻¹) ↓ | (0.2, 0.4] | 0.111 | 0.109 | 0.8 | 0.77 | 0.32 | 0.176 | 0.18 |
| | (0.4, 0.6] | 0.274 | 0.267 | 0.203 | 0.63 | 0.58 | 0.2 | 0.25 |
| | All | 0.14 | 0.135 | 0.102 | 0.99 | 0.34 | 0.189 | 0.21 |
| FID ↓ | (0.2, 0.4] | 10.41 | 11.32 | 10.27 | 10.16 | 7.26 | 10.16 | 6.85 |
| | (0.4, 0.6] | 14.4 | 17.57 | 12.94 | 12.38 | 7.35 | 12.38 | 6.97 |
| | All | 7.24 | 8.14 | 6.88 | 6.77 | 6.96 | 6.77 | 6.82 |

| Metric | Mask Rate | EC | CTSDG | SPL | SPN | LaMa-Fourier | DDNM | Ours |
|---|---|---|---|---|---|---|---|---|
| PSNR ↑ | (0.2, 0.4] | 26.17 | 24.16 | 26.47 | 26.65 | 27.33 | 32.65 | 32.28 |
| | (0.4, 0.6] | 22.32 | 20.36 | 22.05 | 22.18 | 24.68 | 30.18 | 29.21 |
| | All | 25.49 | 25.54 | 27.55 | 27.72 | 26.67 | 31.72 | 32.03 |
| SSIM ↑ | (0.2, 0.4] | 0.847 | 0.848 | 0.863 | 0.867 | 0.882 | 0.854 | 0.879 |
| | (0.4, 0.6] | 0.695 | 0.696 | 0.789 | 0.763 | 0.765 | 0.763 | 0.799 |
| | All | 0.832 | 0.837 | 0.871 | 0.877 | 0.856 | 0.862 | 0.881 |
| MAE (×10⁻¹) ↓ | (0.2, 0.4] | 0.221 | 0.22 | 0.164 | 0.158 | 0.2 | 0.16 | 0.286 |
| | (0.4, 0.6] | 0.456 | 0.452 | 0.363 | 0.351 | 0.39 | 0.35 | 0.47 |
| | All | 0.248 | 0.237 | 0.191 | 0.185 | 0.258 | 0.19 | 0.278 |
| FID ↓ | (0.2, 0.4] | 9.63 | 15.88 | 9.3 | 7.68 | 17.3 | 15.68 | 7.57 |
| | (0.4, 0.6] | 24.04 | 22.19 | 13.98 | 20.73 | 18.55 | 20.73 | 7.52 |
| | All | 8.08 | 12.36 | 15.56 | 4.73 | 16.24 | 14.73 | 7.61 |

| Metric | Without U-Net | Without Multi-Scale Semantic Fusion | Ours |
|---|---|---|---|
| PSNR ↑ | 22.62 | 23.92 | 32.24 |
| SSIM ↑ | 0.740 | 0.764 | 0.841 |
| MAE (×10⁻¹) ↓ | 0.390 | 0.314 | 0.167 |
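
For reference, the PSNR, SSIM, and MAE values reported in these tables can be reproduced with standard routines. Below is a minimal sketch, assuming the restored and ground-truth images are float arrays in [0, 1]; FID is omitted because it compares Inception-feature statistics over whole image sets rather than single image pairs.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(gt, pred):
    """gt, pred: float arrays in [0, 1] with shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    mae = np.abs(gt - pred).mean()  # the tables report MAE in units of 1e-1
    return psnr, ssim, mae
```
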
Feng, Y.; Tang, Y.; Zhong, H. Image Restoration Based on Semantic Prior Aware Hierarchical Network and Multi-Scale Fusion Generator. Technologies 2025, 13, 521. https://doi.org/10.3390/technologies13110521