Image Inpainting Methods: A Review of Deep Learning Approaches
Abstract
1. Introduction
- An integrative perspective on the field’s methodological evolution is provided, identifying unified design principles (e.g., symmetry) that underlie technical progress and the evolving solutions to core challenges.
- An alignment analysis framework is constructed that precisely links inpainting challenges to model mechanisms, enhancing the review’s practical utility.
- A more diagnostic, open-world-oriented evaluation perspective is advocated and preliminarily constructed, promoting the field’s progress toward robust, practical applications.
2. Deep Learning-Based Inpainting Methods
2.1. CNN-Based Methods
2.1.1. Encoder–Decoder Architectures
2.1.2. FCN-Based Methods
2.1.3. U-Net-Based Methods
2.1.4. Summary of CNN-Based Methods
2.2. Generative Model-Based Inpainting Methods
2.2.1. RNN-Based Methods
2.2.2. VAE-Based Methods
2.2.3. GAN-Based Methods
2.2.4. Diffusion-Based Methods
2.2.5. Summary of Generative Model-Based Methods
2.3. Transformer-Based Methods
2.4. Comparative Analysis of Representative Methods
3. Datasets
3.1. Mask Datasets
3.2. Natural Image Datasets
3.3. Artistic Image Datasets
3.4. Scientific and Detection Image Datasets
| Type | Representative Dataset | Primary Contributions and Generality | Core Limitations | Directions for Improvement |
|---|---|---|---|---|
| Masks | NVIDIA Irregular Mask [48] | Established the benchmark for irregular masks; large-scale. | Diverges from real-world damage (e.g., tears, corrosion) in morphology and degradation patterns. | Develop physically simulated masks and semantic-guided mask generation. |
| Scene Images | Places2 [116] | Rich in scene categories, beneficial for scene understanding. | High intra-scene homogeneity; aesthetically filtered, lacking cluttered, unstructured real-world scenes. | Supplement with unstructured scene data. |
| Face Images | CelebA [117], CelebA-HQ [118] | Large-scale with rich annotations; de facto standard for face inpainting. | Primarily composed of celebrity facial images; lacks diversity in race, age, expression, illumination, and pose. | Cross-validate using diverse sets (e.g., FFHQ) and report biases. |
| Artistic Images | WikiArt | Covers diverse art styles, promoting digital art restoration and stylized generation. | Strong style–content coupling requires style consistency; lacks damage annotations and professional evaluation. | Establish sub-benchmarks for artistic styles with expert ratings and damage-paired data. |
| | Dunhuang Murals [123], Thangka [124] | Important cultural heritage, providing domain-specific data. | Highly specialized; limited data; difficult to acquire; features unique damage types. | Expand scale via high-fidelity synthetic data; explore domain adaptation methods. |
| Medical Images | NIH Chest X-ray | Relatively large-scale public medical imaging dataset. | Data lacks fine-grained annotations; suffers from class imbalance and bias; presents issues with image quality and consistency. | Expand the dataset with clinically balanced categories and employ expert evaluation for annotation. |
| Remote Sensing | NWPU VHR-10 | High-resolution remote sensing imagery; high-quality annotations. | Its small scale impairs the model’s adaptability to diverse real-world scenarios. | Supplement with real multi-temporal and multispectral data; introduce spectral fidelity metrics during evaluation. |
| Large-Scale Multimodal | LAION-5B [113] | Unprecedentedly large-scale image–text paired dataset, supporting guided inpainting. | High noise and bias; risk of data contamination. | Requires rigorous cleaning, deduplication, and auditing. |
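The irregular-mask limitations noted in the table can be made concrete with a minimal sketch of a free-form "brush stroke" mask generator in the spirit of irregular-mask benchmarks. The function name, parameters, and stroke heuristics below are illustrative assumptions, not the actual generation procedure of the NVIDIA Irregular Mask dataset.

```python
import numpy as np

def random_irregular_mask(h=256, w=256, num_strokes=5, max_len=40,
                          brush_radius=8, seed=None):
    """Draw random 'brush stroke' holes and return a binary mask
    (1 = missing region to inpaint, 0 = known pixels)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((h, w), dtype=np.uint8)
    yy, xx = np.mgrid[0:h, 0:w]  # pixel coordinate grids
    for _ in range(num_strokes):
        # Start each stroke at a random point with a random heading.
        y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
        angle = rng.uniform(0.0, 2.0 * np.pi)
        for _ in range(int(rng.integers(4, 12))):
            step = int(rng.integers(5, max_len))
            y2 = int(np.clip(y + step * np.sin(angle), 0, h - 1))
            x2 = int(np.clip(x + step * np.cos(angle), 0, w - 1))
            # Rasterize the segment as overlapping discs of brush_radius.
            for t in np.linspace(0.0, 1.0, step):
                cy, cx = y + t * (y2 - y), x + t * (x2 - x)
                mask[(yy - cy) ** 2 + (xx - cx) ** 2 <= brush_radius ** 2] = 1
            y, x, angle = y2, x2, angle + rng.uniform(-np.pi / 4, np.pi / 4)
    return mask
```

Physically simulated masks (tears, corrosion) and semantic-guided masks, as suggested in the table, would replace the random walk above with damage models or segmentation-derived regions.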
4. Image Quality Assessment (IQA)
- (a) PSNR
- (b) SSIM
- (c) FID
- (d) IS
- (e) L1 Loss
- (f) IFC
- (g) VSNR
- (h) FSIM
- (i) MS-SSIM
- (j) NIQE
- (k) LPIPS
- (l) PIQUE
- (m) TOPIQ
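Several of the metrics listed above reduce to short closed-form computations. The sketch below gives illustrative reference implementations of PSNR and L1 loss, a simplified single-window SSIM (the standard metric averages over Gaussian-weighted sliding windows), and a diagonal-covariance approximation of the Fréchet distance underlying FID (the real metric uses Inception-v3 features and full covariance matrices); all function names are assumptions for illustration.

```python
import numpy as np

def psnr(ref, out, data_range=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher is better."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def l1_loss(ref, out):
    """Mean absolute pixel error; lower is better."""
    return float(np.mean(np.abs(ref.astype(np.float64) - out.astype(np.float64))))

def ssim_global(ref, out, data_range=255.0):
    """SSIM computed over the whole image as a single window
    (a simplification of the windowed metric)."""
    x, y = ref.astype(np.float64), out.astype(np.float64)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def fid_diagonal(feats_a, feats_b):
    """Frechet distance between two feature sets under a diagonal-covariance
    Gaussian assumption; full FID additionally needs a matrix square root."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    var_a, var_b = feats_a.var(axis=0), feats_b.var(axis=0)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.sum(var_a + var_b - 2.0 * np.sqrt(var_a * var_b)))
```

Note that PSNR, L1, and SSIM compare against a pristine reference (FR-IQA), whereas FID compares feature distributions of result sets and therefore needs no per-image ground truth; no-reference metrics such as NIQE and PIQUE require neither.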
5. Challenges and Future Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| CNN | Convolutional Neural Network |
| PDE | Partial Differential Equation |
| MLP | Multilayer Perceptron |
| RNN | Recurrent Neural Network |
| GAN | Generative Adversarial Network |
| FCN | Fully Convolutional Network |
| E-D | Encoder–Decoder |
| CE | Context Encoder |
| GLCIC | Globally and Locally Consistent Image Completion |
| GMCNN | Generative Multi-column Convolutional Neural Network |
| MC-CNN | Multi-Column Convolutional Neural Network |
| PEPSI | Parallel Extended-Decoder Path for Semantic Inpainting |
| MAE | Masked Autoencoder |
| MSDN | Multi-Stage Decoding Network |
| SPG-Net | Segmentation Prediction and Guidance Network |
| HPA-FCN | High-Pass Filter Attention Fully Convolutional Network |
| RLPF | Residual Low-Pass Filter |
| PEN-Net | Pyramid-context Encoder Network |
| DFNet | Deep Fusion Network |
| ISFRNet | Identity and Structure Feature Refinement Network |
| FFC | Fast Fourier Convolution |
| CycleRDM | Cycle Reconstruction Diffusion Model |
| VAE | Variational Autoencoder |
| LSTM | Long Short-Term Memory |
| KL | Kullback–Leibler |
| VQ-VAE | Vector-Quantized Variational Autoencoder |
| DCGAN | Deep Convolutional Generative Adversarial Network |
| UCTGAN | Unsupervised Cross-space Translation Generative Adversarial Network |
| CT | Computed Tomography |
| MRI | Magnetic Resonance Imaging |
| EG-GAN | Edge-Guided Generative Adversarial Network |
| PD-GAN | Probabilistic Diverse Generative Adversarial Network |
| NLP | Natural Language Processing |
| FT-TDR | Frequency-guided Transformer and Top-Down Refinement |
| ViT | Vision Transformer |
| MAT | Mask-Aware Transformer |
| DIV2K | DIVerse 2K resolution image dataset |
| LAION-5B | Large-scale Artificial Intelligence Open Network 5-Billion |
| CelebA | CelebFaces Attributes Dataset |
| CelebA-HQ | CelebFaces Attributes Dataset—High Quality |
| SVHN | Street View House Numbers |
| DTD | Describable Textures Dataset |
| DRIVE | Digital Retinal Images for Vessel Extraction |
| SCR | Segmentation in Chest Radiographs |
| NIH | National Institutes of Health Chest X-ray Dataset |
| NWPU | Northwestern Polytechnical University VHR-10 Dataset |
| VHR-10 | Very High Resolution 10-class |
| RSOD | Remote Sensing Object Detection |
| DIOR | Dataset for Object Detection in Optical Remote Sensing Images |
| DOTA | Dataset for Object Detection in Aerial Images |
| HRRSD | High-Resolution Remote Sensing Detection |
| IQA | Image Quality Assessment |
| MOS | Mean Opinion Score |
| NR-IQA | No-Reference Image Quality Assessment |
| FR-IQA | Full-Reference Image Quality Assessment |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity Index Measure |
| FID | Fréchet Inception Distance |
| IS | Inception Score |
| IFC | Information Fidelity Criterion |
| VSNR | Visual Signal-to-Noise Ratio |
| FSIM | Feature Similarity Index Measure |
| MS-SSIM | Multi-Scale Structural Similarity Index Measure |
| NIQE | Natural Image Quality Evaluator |
| LPIPS | Learned Perceptual Image Patch Similarity |
| PIQUE | Perception-based Image Quality Evaluator |
| TOPIQ | Top-down approach for Image Quality assessment |
References
- Kachkine, A. Physical restoration of a painting with a digitally constructed mask. Nature 2025, 642, 343–350. [Google Scholar] [CrossRef]
- Wang, Y.; Cao, C.; Yu, J.; Fan, K.; Xue, X.; Fu, Y. Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 23237–23248. [Google Scholar]
- Batool, I.; Imran, M. A dual residual dense network for image denoising. Eng. Appl. Artif. Intell. 2025, 147, 110275. [Google Scholar] [CrossRef]
- Zhang, S.; Zhang, X.; Shen, L.; Wan, S.; Ren, W. Wavelet-based physically guided normalization network for real-time traffic dehazing. Pattern Recognit. 2025, 172, 112451. [Google Scholar] [CrossRef]
- Guo, B.; Ping, P.; Liu, F.; Xu, F. Robust Reversible Watermarking with Invisible Distortion Against VAE Watermark Removal. IEEE Trans. Image Process. 2025, 34, 6386–6401. [Google Scholar] [CrossRef]
- Zhang, L.; Zheng, W.; Hao, Q.; Xiao, Y. Super-Pixel Blocks Based Fast Fourier Convolution Model for Image Restoration. In Proceedings of the International Conference on Neural Information Processing (ICONIP), Auckland, New Zealand, 2–6 December 2024; pp. 290–304. [Google Scholar]
- Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 417–424. [Google Scholar]
- Efros, A.A.; Leung, T.K. Texture synthesis by non-parametric sampling. In Proceedings of the Seventh IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1033–1038. [Google Scholar]
- Guillemot, C.; Le Meur, O. Image inpainting: Overview and recent advances. IEEE Signal Process. Mag. 2014, 31, 127–144. [Google Scholar] [CrossRef]
- Haritha, L.; Prajith, C.A. Image inpainting using deep learning techniques: A review. In Proceedings of the International Conference on Control, Communication and Computing (ICCC), 19–21 May 2023; pp. 1–6. [Google Scholar]
- Elharrouss, O.; Damseh, R.; Belkacem, A.N.; Badidi, E.; Lakas, A. Transformer-based image and video inpainting: Current challenges and future directions. Artif. Intell. Rev. 2025, 58, 124. [Google Scholar] [CrossRef]
- Xu, Z.; Zhang, X.; Chen, W.; Yao, M.; Liu, J.; Xu, T.; Wang, Z. A review of image inpainting methods based on deep learning. Appl. Sci. 2023, 13, 11189. [Google Scholar] [CrossRef]
- Zhang, X.; Zhai, D.; Li, T.; Zhou, Y.; Yang, L. Image inpainting based on deep learning: A review. Inf. Fusion 2023, 90, 74–94. [Google Scholar] [CrossRef]
- Quan, W.; Chen, J.; Liu, Y.; Yan, D.M.; Wonka, P. Deep learning-based image and video inpainting: A survey. Int. J. Comput. Vis. 2024, 132, 2367–2400. [Google Scholar] [CrossRef]
- Yang, J.; Ruhaiyem, N.I.R. Review of deep learning-based image inpainting techniques. IEEE Access 2024, 12, 138441–138482. [Google Scholar] [CrossRef]
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
- Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386–408. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Internal Representations by Error Propagation; Tech. Rep. No. ICS-8506; Institute for Cognitive Science, University of California, San Diego: La Jolla, CA, USA, 1985. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544. [Google Scholar]
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. 2017, 36, 1–14. [Google Scholar] [CrossRef]
- Yeh, R.A.; Chen, C.; Lim, T.Y.; Schwing, A.G.; Hasegawa-Johnson, M.; Do, M.N. Semantic image inpainting with deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5485–5493. [Google Scholar]
- Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D.N. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5907–5915. [Google Scholar]
- Wang, Y.; Tao, X.; Qi, X.; Shen, X.; Jia, J. Image inpainting via generative multi-column convolutional neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 329–338. [Google Scholar]
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4471–4480. [Google Scholar]
- Sagong, M.C.; Shin, Y.G.; Kim, S.W.; Park, S.; Ko, S.J. PEPSI: Fast image inpainting with parallel decoding network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11360–11368. [Google Scholar]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
- Li, X.; Guo, Q.; Lin, D.; Li, P.; Feng, W.; Wang, S. MISF: Multi-level interactive siamese filtering for high-fidelity image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1869–1878. [Google Scholar]
- Zhao, H.; Gu, Z.; Zheng, B.; Zheng, H. TransCNN-HAE: Transformer-CNN hybrid autoencoder for blind image inpainting. In Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), Lisboa, Portugal, 10–14 October 2022; pp. 6813–6821. [Google Scholar]
- Liu, W.; Cun, X.; Pun, C.M.; Xia, M.; Zhang, Y.; Wang, J. CoordFill: Efficient high-resolution image inpainting via parameterized coordinate querying. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 1746–1754. [Google Scholar]
- Kumar, N.; Meenpal, T. Encoder–decoder-based CNN model for detection of object removal by image inpainting. J. Electron. Imaging 2023, 32, 042110. [Google Scholar] [CrossRef]
- Lian, J.; Zhang, J.; Liu, J.; Dong, Z.; Zhang, H. Guiding image inpainting via structure and texture features with dual encoder. Vis. Comput. 2024, 40, 4303–4317. [Google Scholar] [CrossRef]
- Zhang, S.; Chen, Y. ATM-DEN: Image inpainting via attention transfer module and decoder-encoder network. Signal Process. Image Commun. 2025, 133, 117268. [Google Scholar] [CrossRef]
- Hu, S.; Ma, J.; Wan, J.; Min, W.; Jing, Y.; Zhang, L.; Tao, D. ClusIR: Towards Cluster-Guided All-in-One Image Restoration. arXiv 2025, arXiv:2512.10948. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Chaudhury, S.; Roy, H. Can fully convolutional networks perform well for general image restoration problems? In Proceedings of the 15th IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 254–257. [Google Scholar]
- Yang, C.; Lu, X.; Lin, Z.; Shechtman, E.; Wang, O.; Li, H. High-resolution image inpainting using multi-scale neural patch synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6721–6729. [Google Scholar]
- Godard, C.; Matzen, K.; Uyttendaele, M. Deep burst denoising. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 538–554. [Google Scholar]
- Song, Y.; Yang, C.; Shen, Y.; Wang, P.; Huang, Q.; Kuo, C.C.J. SPG-Net: Segmentation prediction and guidance network for image inpainting. arXiv 2018, arXiv:1805.03356. [Google Scholar] [CrossRef]
- Xiao, C.; Li, F.; Zhang, D.; Huang, P.; Ding, X.; Sheng, V.S. Image inpainting detection based on high-pass filter attention network. Comput. Syst. Sci. Eng. 2022, 43, 1145–1159. [Google Scholar] [CrossRef]
- Dong, J.; Pan, J.; Yang, Z.; Tang, J. Multi-scale residual low-pass filter network for image deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 12345–12354. [Google Scholar]
- Yan, Z.; Li, X.; Li, M.; Zuo, W.; Shan, S. Shift-Net: Image inpainting via deep feature rearrangement. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 1–17. [Google Scholar]
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100. [Google Scholar]
- Wang, N.; Li, J.; Zhang, L.; Du, B. MUSICAL: Multi-Scale Image Contextual Attention Learning for Inpainting. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 3748–3754. [Google Scholar]
- Zeng, Y.; Fu, J.; Chao, H.; Guo, B. Learning pyramid-context encoder network for high-quality image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1486–1494. [Google Scholar]
- Hong, X.; Xiong, P.; Ji, R.; Fan, H. Deep fusion network for image completion. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), Nice, France, 21–25 October 2019; pp. 2033–2042. [Google Scholar]
- Yi, Z.; Tang, Q.; Azizi, S.; Jang, D.; Xu, Z. Contextual residual aggregation for ultra high-resolution image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7508–7517. [Google Scholar]
- Shamsolmoali, P.; Zareapoor, M.; Granger, E. Transinpaint: Transformer-based image inpainting with context adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 849–858. [Google Scholar]
- Wang, Y.; Shin, J. ISFRNet: A Deep Three-stage Identity and Structure Feature Refinement Network for Facial Image Inpainting. KSII Trans. Internet Inf. Syst. 2023, 17, 881–899. [Google Scholar] [CrossRef]
- Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V. Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2149–2159. [Google Scholar]
- Zhang, Y.; Liu, Y.; Hu, R.; Wu, Q.; Zhang, J. Mutual dual-task generator with adaptive attention fusion for image inpainting. IEEE Trans. Multimed. 2023, 26, 1539–1550. [Google Scholar] [CrossRef]
- Hou, Y.; Ma, X.; Zhang, J.; Guo, C. Symmetric Connected U-Net with Multi-Head Self Attention (MHSA) and WGAN for Image Inpainting. Symmetry 2024, 16, 1423. [Google Scholar] [CrossRef]
- Zhao, H.; Wang, Y.; Gu, Z.; Zheng, B.; Zheng, H. Context-aware mutual learning for blind image inpainting and beyond. Expert Syst. Appl. 2025, 268, 126224. [Google Scholar] [CrossRef]
- Chen, S.; Zhang, H.; Atapour–Abarghouei, A.; Shum, H.P.H. SEM-Net: Efficient pixel modelling for image inpainting with spatially enhanced SSM. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; pp. 461–471. [Google Scholar]
- Jiao, W.; Lee, H.; Wang, P.; Zhu, P.; Hu, Q.; Ren, D. Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration. arXiv 2025, arXiv:2512.10581. [Google Scholar]
- van den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1747–1756. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- van den Oord, A.; Kalchbrenner, N.; Espeholt, L.; Kavukcuoglu, K.; Vinyals, O.; Graves, A. Conditional image generation with PixelCNN decoders. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Salimans, T.; Karpathy, A.; Chen, X.; Kingma, D.P. PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv 2017, arXiv:1701.05517. [Google Scholar]
- Chen, C.; Abbott, A.; Stilwell, D. Multi-level generative chaotic recurrent network for image inpainting. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3626–3635. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Zheng, C.; Cham, T.J.; Cai, J. Pluralistic image completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1438–1447. [Google Scholar]
- Han, X.; Wu, Z.; Huang, W.; Scott, M.R.; Davis, L.S. FiNet: Compatible and diverse fashion image inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4481–4491. [Google Scholar]
- Tu, C.T.; Chen, Y.F. Facial image inpainting with variational autoencoder. In Proceedings of the 2019 2nd International Conference of Intelligent Robotic and Control Engineering (IRCE), Singapore, 25–27 August 2019; pp. 119–122. [Google Scholar]
- Peng, J.; Liu, D.; Xu, S.; Li, H. Generating diverse structure for image inpainting with hierarchical VQ-VAE. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10775–10784. [Google Scholar]
- Razavi, A.; van den Oord, A.; Vinyals, O. Generating diverse high-fidelity images with VQ-VAE-2. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Li, J.; Song, G.; Zhang, M. Occluded offline handwritten Chinese character recognition using deep convolutional generative adversarial network and improved GoogLeNet. Neural Comput. Appl. 2020, 32, 4805–4819. [Google Scholar] [CrossRef]
- Shin, Y.G.; Sagong, M.C.; Yeo, Y.J.; Kim, S.W.; Ko, S.J. PEPSI++: Fast and lightweight network for image inpainting. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 252–265. [Google Scholar] [CrossRef]
- Zhao, L.; Mo, Q.; Lin, S.; Wang, Z.; Zuo, Z.; Chen, H.; Xing, W.; Lu, D. UCTGAN: Diverse image inpainting based on unsupervised cross-space translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5741–5750. [Google Scholar]
- Armanious, K.; Mecky, Y.; Gatidis, S.; Yang, B. Adversarial inpainting of medical image modalities. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 3267–3271. [Google Scholar]
- Chai, Y.; Xu, B.; Zhang, K.; Lepore, N.; Wood, J.C. MRI restoration using edge-guided adversarial learning. IEEE Access 2020, 8, 83858–83870. [Google Scholar] [CrossRef]
- Bao, J.; Chen, D.; Wen, F.; Li, H.; Hua, G. CVAE-GAN: Fine-grained image generation through asymmetric training. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2745–2754. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
- Liu, H.; Wan, Z.; Huang, W.; Song, Y.; Han, X.; Liao, J. PD-GAN: Probabilistic diverse GAN for image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9371–9381. [Google Scholar]
- Gao, X.; Nguyen, M.; Yan, W.Q. Face image inpainting based on generative adversarial network. In Proceedings of the 2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, New Zealand, 9–10 December 2021; pp. 1–6. [Google Scholar]
- Guo, X.; Yang, H.; Huang, D. Image inpainting via conditional texture and structure dual generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 14134–14143. [Google Scholar]
- Chen, D.; Liao, X.; Wu, X.; Chen, S. SafePaint: Anti-forensic Image Inpainting with Domain Adaptation. In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM), Melbourne, Australia, 28 October–1 November 2024; pp. 7774–7782. [Google Scholar]
- Xie, L.; Pakhomov, D.; Wang, Z.; Wu, Z.; Chen, Z.; Zhou, Y.; Zheng, H.; Zhang, Z.; Lin, Z.; Zhou, J.; et al. TurboFill: Adapting Few-Step Text-to-image Model for Fast Image Inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 7613–7622. [Google Scholar]
- Li, Z.; Zhang, Y.; Du, Y.; Wang, X.; Wen, C.; Zhang, Y.; Geng, G.; Jia, F. STNet: Structure and texture-guided network for image inpainting. Pattern Recognit. 2024, 156, 110786. [Google Scholar] [CrossRef]
- Zhang, L.; Yu, Y.; Yao, J.; Fan, H. High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion. Int. J. Comput. Vis. 2025, 133, 1–18. [Google Scholar] [CrossRef]
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; Volume 37, pp. 2256–2265. [Google Scholar]
- Song, Y.; Ermon, S. Generative modeling by estimating gradients of the data distribution. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–12 December 2020; Volume 33, pp. 6840–6851. [Google Scholar]
- Ju, X.; Liu, X.; Wang, X.; Zhang, Y.; Bian, Y.; Shan, Y.; Xu, Q. BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 150–168. [Google Scholar]
- Yue, Z.; Wang, J.; Loy, C.C. Efficient diffusion model for image restoration by residual shifting. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6116–6130. [Google Scholar] [CrossRef]
- Zhuang, J.; Zeng, Y.; Liu, W.; Yuan, C.; Chen, K. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 195–211. [Google Scholar]
- Xue, M.; He, J.; Palaiahnakote, S.; Zhou, M. Unified image restoration and enhancement: Degradation calibrated cycle reconstruction diffusion model. Pattern Recognit. 2025, 153, 112073. [Google Scholar] [CrossRef]
- Liu, H.; Wang, Y.; Qian, B.; Wang, M.; Rui, Y. Structure matters: Tackling the semantic discrepancy in diffusion models for image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 8038–8047. [Google Scholar]
- Lu, R.; Luo, T.; Jiang, Y.; Wang, L.; Yue, C.; Yang, P.; Liu, G.; Gu, C. Exploring Diffusion with Test-Time Training on Efficient Image Restoration. arXiv 2025, arXiv:2506.14541. [Google Scholar] [CrossRef]
- Zhou, Y.; Barnes, C.; Shechtman, E.; Amirghodsi, S. TransFill: Reference-guided image inpainting by merging multiple color and spatial transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2266–2276. [Google Scholar]
- Wan, Z.; Zhang, J.; Chen, D.; Liao, J. High-fidelity pluralistic image completion with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 4692–4701. [Google Scholar]
- Wang, J.; Chen, S.; Wu, Z.; Jiang, Y.G. FT-TDR: Frequency-guided transformer and top-down refinement network for blind face inpainting. IEEE Trans. Multimed. 2023, 25, 2382–2392. [Google Scholar] [CrossRef]
- Cao, C.; Dong, Q.; Fu, Y. Learning prior feature and attention enhanced image inpainting. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 306–322. [Google Scholar]
- Dong, Q.; Cao, C.; Fu, Y. Incremental transformer structure enhanced image inpainting with masking positional encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11358–11368. [Google Scholar]
- Yu, Y.; Zhan, F.; Wu, R.; Pan, J.; Cui, K.; Lu, S.; Ma, F.; Xie, X.; Miao, C. Diverse image inpainting with bidirectional and autoregressive transformers. In Proceedings of the 29th ACM International Conference on Multimedia (ACM MM), Chengdu, China, 20–24 October 2021; pp. 69–78. [Google Scholar]
- Li, W.; Lin, Z.; Zhou, K.; Qi, L.; Wang, Y.; Jia, J. MAT: Mask-aware transformer for large hole image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10758–10768. [Google Scholar]
- Chen, B.W.; Liu, T.J.; Liu, K.H. Lightweight image inpainting by stripe window transformer with joint attention to CNN. In Proceedings of the 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), Rome, Italy, 17–20 September 2023; pp. 1–6. [Google Scholar]
- Naderi, M.R.; Givkashi, M.H.; Karimi, N.; Shirani, S.; Samavi, S. SFI-Swin: Symmetric face inpainting with swin transformer by distinctly learning face components distributions. arXiv 2023, arXiv:2301.03130. [Google Scholar] [CrossRef]
- Ko, K.; Kim, C.S. Continuously masked transformer for image inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 13169–13178. [Google Scholar]
- Phutke, S.S.; Kulkarni, A.; Vipparthi, S.K.; Murala, S. Blind image inpainting via omni-dimensional gated attention and wavelet queries. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 1251–1260. [Google Scholar]
- Liu, Q.; Jiang, Y.; Tan, Z.; Chen, D.; Fu, Y.; Chu, Q. Transformer based pluralistic image completion with reduced information loss. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6652–6668. [Google Scholar] [CrossRef]
- Wan, Z.; Zhang, J.; Chen, D.; Liao, J. High-fidelity and efficient pluralistic image completion with transformers. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9612–9629. [Google Scholar] [CrossRef]
- He, S.; Lin, G.; Li, T.; Chen, Y. Frequency-domain fusion transformer for image inpainting. arXiv 2025, arXiv:2506.18437. [Google Scholar] [CrossRef]
- Ning, T.; Huang, G.; Li, J.; Huang, S. Complex image inpainting of cultural relics integrating multi-stage structural features and spatial textures. Pattern Anal. Appl. 2025, 28, 85. [Google Scholar] [CrossRef]
- Iskakov, K. Semi-parametric image inpainting. arXiv 2018, arXiv:1807.02855. [Google Scholar] [CrossRef]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Agustsson, E.; Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
- Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. LAION-5B: An open large-scale dataset for training next generation image-text models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 25278–25294. [Google Scholar]
- Doersch, C.; Singh, S.; Gupta, A.; Sivic, J.; Efros, A.A. What makes Paris look like Paris? ACM Trans. Graph. 2012, 31, 103–110. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1452–1464. [Google Scholar] [CrossRef]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Smith, B.M.; Zhang, L.; Brandt, J.; Lin, Z.; Yang, J. Exemplar-based face parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 3484–3491. [Google Scholar]
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 12–17 December 2011. [Google Scholar]
- Cimpoi, M.; Maji, S.; Kokkinos, I.; Mohamed, S.; Vedaldi, A. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 3606–3613. [Google Scholar]
- Krause, J.; Stark, M.; Deng, J.; Li, F.F. 3D object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, Sydney, NSW, Australia, 1–8 December 2013; pp. 554–561. [Google Scholar]
- Yu, T.; Lin, C.; Zhang, S.; You, S.; Ding, X.; Wu, J.; Zhang, J. End-to-end partial convolutions neural networks for Dunhuang grottoes wall-painting restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Wang, N.; Wang, W.; Hu, W.; Fenster, A.; Li, S. Thanka mural inpainting based on multi-scale adaptive partial convolution and stroke-like mask. IEEE Trans. Image Process. 2021, 30, 3720–3733. [Google Scholar] [CrossRef]
- Avcıbaş, I.; Sankur, B.; Sayood, K. Statistical evaluation of image quality measures. J. Electron. Imaging 2002, 11, 206–223. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar]
- Sheikh, H.R.; Bovik, A.C.; de Veciana, G. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 2005, 14, 2117–2128. [Google Scholar] [CrossRef]
- Chandler, D.M.; Hemami, S.S. VSNR: A wavelet-based visual signal-to-noise ratio for natural images. IEEE Trans. Image Process. 2007, 16, 2284–2298. [Google Scholar] [CrossRef]
- Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef]
- Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multiscale structural similarity for image quality assessment. In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 2, pp. 1398–1402. [Google Scholar]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a ‘completely blind’ image quality analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
- Venkatanath, N.; Praneeth, D.; Maruthi, C.B.; Sumohana, S.C.; Swarup, S.M. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference on Communications, Mumbai, India, 27 February–1 March 2015; pp. 1–6. [Google Scholar]
- Chen, C.; Mo, J.; Hou, J.; Wu, G.; Liao, L.; Sun, W. TOPIQ: A top-down approach from semantics to distortions for image quality assessment. IEEE Trans. Image Process. 2024, 33, 2404–2418. [Google Scholar] [CrossRef]









| Model | Params (M) | FLOPs (G) | PSNR↑ (0.01–20%) | SSIM↑ | FID↓ | LPIPS↓ | PSNR↑ (20–40%) | SSIM↑ | FID↓ | LPIPS↓ | PSNR↑ (40–60%) | SSIM↑ | FID↓ | LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CTSDG [81] | 52 | 53 | 30.81 | 0.904 | / | 0.042 | 25.97 | 0.759 | / | 0.095 | 22.23 | 0.561 | / | 0.227 |
| LaMa [55] | 23 | 43 | 32.47 | 0.958 | 14.73 | 0.035 | 25.09 | 0.864 | 22.94 | 0.108 | 20.68 | 0.725 | 25.94 | 0.212 |
| TransCNN-HAE [34] | 3 | 20 | 30.48 | 0.955 | 10.32 | 0.050 | 26.05 | 0.887 | 20.82 | 0.105 | 22.93 | 0.783 | 40.03 | 0.203 |
| MISF [33] | 23 | 143 | 31.34 | 0.951 | / | 0.043 | 24.24 | 0.844 | / | 0.130 | 20.04 | 0.693 | / | 0.250 |
| CMT [104] | 143 | / | 32.58 | 0.962 | 22.18 | 0.036 | 24.98 | 0.867 | 32.02 | 0.118 | 20.49 | 0.711 | 35.17 | 0.238 |
| CoordFill [35] | / | / | 25.86 | 0.882 | 37.80 | 0.211 | 24.00 | 0.825 | 42.38 | 0.226 | 21.39 | 0.707 | 54.55 | 0.273 |
| OmniWavNET [105] | 3 | 17 | 27.77 | 0.909 | 15.33 | 0.169 | 24.55 | 0.811 | 27.99 | 0.124 | 21.83 | 0.727 | 56.72 | 0.261 |
| STNet [84] | 37 | 223 | 30.71 | 0.898 | / | / | 25.93 | 0.741 | / | / | 22.22 | 0.574 | / | / |
| PUT [106] | 99 | 234 | / | / | / | / | 26.22 | 0.877 | 29.76 | 0.122 | 22.58 | 0.740 | 21.23 | 0.216 |
| CAML [58] | 24 | 133 | 29.31 | 0.954 | 9.29 | 0.043 | 26.44 | 0.899 | 18.06 | 0.093 | 23.54 | 0.813 | 30.20 | 0.180 |
| SEM-Net [59] | 163 | / | 33.01 | 0.963 | 14.52 | 0.033 | 25.42 | 0.874 | 22.78 | 0.105 | 20.83 | 0.728 | 25.70 | 0.212 |
| Model | Architecture | Key Innovation | Mask Required | Advantages | Limitations |
|---|---|---|---|---|---|
| CTSDG [81] | GAN | Bidirectional gated feature fusion (Bi-GFF) and contextual feature aggregation (CFA) | Yes | Structure–texture synergy; strong contextual modeling | High computational cost; dependent on structural priors |
| LaMa [55] | U-Net | Fast Fourier Convolution (FFC) and large-mask training | Yes | Strong high-resolution generalization; parameter-efficient; excels at periodic structures | Sensitive to perspective; strong hardware dependence for FFC |
| MISF [33] | E-D | Multi-level interactive dynamic filter prediction | Yes | High-fidelity details; strong generalization | Limited for large missing regions; high computational cost |
| TransCNN-HAE [34] | E-D | Transformer encoder and CNN decoder | Blind | Lightweight and efficient; single-stage blind inpainting pipeline | Not validated on high resolution; limited capability in complex scenes |
| CMT [104] | Transformer | Continuously masked tokens, overlapping tokens, and mask updating | Yes | Interpretable progressive inpainting; amenable to irregular masks | High computational overhead; not fully blind |
| OmniWavNet [105] | Transformer | Wavelet query attention and omni-dimensional gated attention | Blind | Enhanced detail recovery; mask-free; computationally efficient | May be limited for highly structured or periodic textures |
| CoordFill [35] | E-D | Restorative feature synthesis and per-pixel coordinate querying | Yes | Efficient high-resolution processing | Potential coordination and optimization challenges |
| PUT [106] | Transformer | P-VQVAE and UQ-Transformer | Yes | High detail preservation; diversified and controllable completion | Slow autoregressive inference; high training complexity and computational cost |
| PowerPaint [91] | Diffusion | Learnable task prompts (object/context/shape) and multi-task unification | Yes | Multi-functional unification; flexible task control; compatible with external controls | Reliant on base diffusion model capability; slow inference; weak shape guidance in extreme cases |
| StrDiffusion [93] | Diffusion | Structure-guided texture diffusion and adaptive re-sampling | Yes | Significantly mitigates semantic discrepancy; reasonable structure guidance; adaptive and robust | High training complexity; slow inference; reliant on predefined structure representation |
| STNet [84] | GAN | Three-stage decomposition (structure–texture–refinement) | Yes | Highly interpretable pipeline; precise instance alignment | Extremely high computational cost; requires external preprocessing |
| CAML [58] | U-Net | Context-aware mutual learning (mask estimation and inpainting) | Blind | Blind inpainting performance close to SOTA; easily extensible to other tasks | Error accumulation in two-stage process; limited improvement for small masks |
| SEM-Net [59] | U-Net | Snake-scan Mamba block and linear-complexity modeling | Yes | Linear-complexity global context; strong long-range consistency | Extremely large parameter count; high training difficulty; still limited for complex textures/occlusions |
| CIOCR [109] | Transformer | Structure repair (TSR) and texture repair (MFDAF); dual-domain feature fusion | Yes | Designed for cultural relic images; multi-scale structure guidance; dual-domain enhanced texture | Model complexity; slow inference; insufficient generalization |
| ClusIR [39] | E-D | Probabilistic cluster-guided routing (PCGRM) and degradation-aware frequency-domain modulation (DAFMM) | Yes | Decouples degradation identification and expert activation; optimizes frequency-domain synergy | High computational burden; sensitive to cluster initialization; weak single-task clustering |
| DiffRWKVIR [94] | Diffusion | Full-scale 2D state evolution, chunk-optimized flash memory, and prior-guided efficient diffusion | Yes | High computational efficiency; adaptable to unknown degradation; few-step diffusion (5–20 steps) | Module complexity; inference latency |
| Dabformer [108] | Transformer | Wavelet transform, Gabor filter fusion, and frequency-domain adaptive gating (FDAGN) | Yes | Strong high-frequency detail retention; frequency-domain adaptive filtering; relatively high computational efficiency | Sensitive to Gabor parameters; insufficiently lightweight |
| MMInvertFill [85] | GAN | Multimodal guidance encoder (MGE), F&W+ latent space, and soft-updated mean latent | Yes | Addresses the “gap” problem; unbiased multimodal guidance; strong for large-area inpainting and out-of-domain generalization | High training complexity; long-tail sample handling needs improvement |
| SymUNet [60] | U-Net | Symmetric encoder–decoder, skip connections, and bidirectional semantic guidance | Yes | Simple architecture; strong generalization; stable training; computationally efficient | Does not fully leverage external prior knowledge |
| Category | Metrics |
|---|---|
| Full-Reference (FR) | PSNR, SSIM, L1 Loss, IFC, VSNR, FSIM, MS-SSIM, LPIPS, TOPIQ-FR |
| No-Reference (NR) | NIQE, PIQUE, TOPIQ-NR |
| Generative Model Evaluation | FID, IS |
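The full-reference metrics above can be illustrated with a short sketch. The following is a minimal, self-contained example (not from the paper) that computes PSNR exactly from its definition, PSNR = 10·log10(MAX²/MSE), together with a simplified single-window SSIM; practical evaluation typically uses a sliding-window SSIM (e.g., scikit-image's implementation) and pretrained deep networks for LPIPS and FID.

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

def ssim_global(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Simplified SSIM computed over one global window (no sliding window)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # stabilizing constants from the SSIM paper
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    # Simulate a mildly degraded restoration result with small additive noise.
    noisy = np.clip(ref.astype(np.int16) + rng.integers(-5, 6, size=ref.shape),
                    0, 255).astype(np.uint8)
    print(f"PSNR: {psnr(ref, noisy):.2f} dB")
    print(f"SSIM (global): {ssim_global(ref, noisy):.4f}")
```

Note that the global-window SSIM here only sketches the metric's luminance/contrast/structure terms; reported SSIM values in the tables above come from the standard windowed formulation.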
| Inpainting Objective | Example Tasks | Core Challenge | Primary Recommended Metrics | Secondary Metrics | Rationale and Explanation |
|---|---|---|---|---|---|
| Pursuing High Pixel Fidelity | Scientific image (astronomy, remote sensing) restoration; medical image (CT/MRI) artifact removal; repair of physical damage in historical archive photos. | Must strictly adhere to the original physical signal; introducing non-authentic synthetic content is impermissible. | PSNR | SSIM, MS-SSIM | PSNR provides the most direct, unbiased pixel-level error measure with clear physical meaning, serving as the cornerstone for “mathematically lossless” standards. SSIM/MS-SSIM can supplementally assess structural preservation. |
| Pursuing Visual Naturalness and Realism | General natural scene inpainting; face restoration; outdoor image completion. | Generated content must be semantically plausible, texturally realistic, and visually coherent with the context. | LPIPS, FID | SSIM, MS-SSIM, FSIM | LPIPS, based on deep semantic features, aligns best with human perception. FID evaluates the realism of the overall distribution of generated results. SSIM family metrics serve as a baseline for fidelity. |
| Evaluating Overall Performance of Generative Models | Comparing generative inpainting algorithms (e.g., GANs, diffusion models). | Requires a comprehensive measure of the quality, diversity, and closeness to the real data distribution of the generated image set. | FID | IS | FID compares distributions in feature space and is currently the most reliable comprehensive metric. IS quickly reflects the “recognizability” and diversity of generated images but can be easily gamed. The two should be used in conjunction. |
| Rapid/Low-Overhead Quality Monitoring | Mobile image processing; real-time filters; large-scale image processing. | Computational resources are limited, requiring efficient real-time or near-real-time quality feedback. | SSIM | PSNR | SSIM offers the best compromise between simplicity and perceptual relevance. PSNR is simple to calculate but has poor perceptual relevance. |
| Fully No-Reference Blind Assessment | Online image quality screening; user-uploaded content moderation; assessment of images from unknown sources. | No reference image is available; quality must be judged independently from a single image. | NIQE, PIQUE, TOPIQ | / | NIQE requires no training data and offers good generality. PIQUE and TOPIQ, optimized via machine learning, are more accurate in specific domains. |
| Optimization for Communication and Compression Systems | Post-compression image/video inpainting; error concealment for transmission; evaluation of joint compression-inpainting algorithms. | Requires quantifying the fidelity of visual information under bandwidth constraints or information loss. | MS-SSIM, IFC | PSNR, SSIM | MS-SSIM assesses structural fidelity at multiple scales. IFC measures information loss from an information-theoretic perspective, offering solid theoretical grounding. |
| Texture and Detail Synthesis Assessment | Texture inpainting; large-area regular texture generation; artwork restoration. | Evaluates the naturalness, directional consistency, and richness of detail in generated textures. | FSIM, LPIPS | SSIM, MS-SSIM | FSIM, based on phase consistency and gradient magnitude, is highly sensitive to texture and local structure. LPIPS captures deep textural semantics. The SSIM family can assess overall structural consistency. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, Q.; He, S.; Su, M.; Zhao, F. Image Inpainting Methods: A Review of Deep Learning Approaches. Symmetry 2026, 18, 94. https://doi.org/10.3390/sym18010094

