Efficient Deep Image Prior with Spatial-Channel Attention Transformer
Abstract
1. Introduction
2. Proposed Method
2.1. Deep Image Prior (DIP)
2.2. Backbone of Deep Image Prior (DIP)
2.3. Overview of TM-DIP
2.4. Triple Multi-Head Transposed Attention
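$$
\begin{aligned}
\mathrm{Attention}(Q_s, K_s, V_s) &= \mathrm{Concat}(A_H, A_W, A_C), \\
A_H &= V_H \times \mathrm{Softmax}(K_H \times Q_H / a_H), \\
A_W &= V_W \times \mathrm{Softmax}(K_W \times Q_W / a_W), \text{ and} \\
A_C &= V_C \times \mathrm{Softmax}(K_C \times Q_C / a_C),
\end{aligned}
$$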
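The attention defined above can be illustrated with a small NumPy sketch. It is one possible reading, not the authors' implementation, and it rests on several assumptions that the equations alone do not settle: the input is a single `(H, W, C)` tensor, each branch flattens the two remaining axes so that the attention map is `n × n` for `n` in `{H, W, C}`, the softmax runs over the last axis, the scales `a_H`, `a_W`, `a_C` are plain floats, the three branches are concatenated along the channel axis, and the multi-head split and the projections that produce Q, K, and V are omitted. The function names `axis_attention` and `triple_transposed_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(q, k, v, axis, scale):
    """Compute V x Softmax(K x Q / scale) along one axis of (H, W, C) tensors.

    Assumption: the other two axes are flattened into M rows, so the
    attention map has shape (n, n) with n = q.shape[axis].
    """
    n = q.shape[axis]
    q_m = np.moveaxis(q, axis, -1).reshape(-1, n)      # (M, n)
    k_m = np.moveaxis(k, axis, -1).reshape(-1, n).T    # (n, M)
    v_m = np.moveaxis(v, axis, -1).reshape(-1, n)      # (M, n)
    attn = softmax(k_m @ q_m / scale, axis=-1)         # (n, n)
    out = v_m @ attn                                   # (M, n)
    # Restore the original (H, W, C) layout.
    rest = [s for i, s in enumerate(q.shape) if i != axis]
    return np.moveaxis(out.reshape(*rest, n), -1, axis)

def triple_transposed_attention(q, k, v, a_h=1.0, a_w=1.0, a_c=1.0):
    # One branch per axis: height (A_H), width (A_W), channel (A_C).
    branch_h = axis_attention(q, k, v, axis=0, scale=a_h)
    branch_w = axis_attention(q, k, v, axis=1, scale=a_w)
    branch_c = axis_attention(q, k, v, axis=2, scale=a_c)
    # Assumption: Concat stacks the branches along the channel axis.
    return np.concatenate([branch_h, branch_w, branch_c], axis=-1)

rng = np.random.default_rng(0)
H, W, C = 8, 6, 4
q, k, v = (rng.standard_normal((H, W, C)) for _ in range(3))
out = triple_transposed_attention(q, k, v)
print(out.shape)  # (8, 6, 12)
```

Under this reading, the three attention maps have sizes H × H, W × W, and C × C, so none of them grows with the product HW the way a standard spatial self-attention map of size HW × HW does.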
2.5. Efficiency of TMTA
3. Experiments
3.1. Experimental Setup
3.2. Comparison with DIP on Denoising and Generic Reconstruction
3.3. Comparison with DIP on Super-Resolution
3.4. Comparison with DIP on Inpainting
3.5. Comparison with DIP on Flash–No Flash Reconstruction
3.6. Comparison with DIP on Time Cost
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Burger, H.C.; Schuler, C.J.; Harmeling, S. Image denoising: Can plain neural networks compete with BM3D. In 2012 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2012; pp. 2392–2399.
2. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155.
3. Chen, Y.; Pock, T. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1256–1272.
4. Cheng, S.; Wang, Y.; Huang, H.; Liu, D.; Fan, H.; Liu, S. Nbnet: Noise basis learning for image denoising with subspace projection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 4896–4906.
5. Mao, X.; Shen, C.; Yang, Y.-B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in Neural Information Processing Systems 29; NeurIPS: San Diego, CA, USA, 2016.
6. Tai, Y.; Yang, J.; Liu, X.; Xu, C. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 4539–4547.
7. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 9446–9454.
8. Zhou, Y.; Jiao, J.; Huang, H.; Wang, Y.; Wang, J.; Shi, H.; Huang, T. When AWGN-based denoiser meets real noises. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13074–13081.
9. Anwar, S.; Khan, S.; Barnes, N. A deep journey into super-resolution: A survey. ACM Comput. Surv. (CSUR) 2020, 53, 60.
10. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
11. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25; NeurIPS: San Diego, CA, USA, 2012.
12. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 4681–4690.
13. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2017; pp. 136–144.
14. Xia, B.; Hang, Y.; Tian, Y.; Yang, W.; Liao, Q.; Zhou, J. Efficient Non-Local Contrastive Attention for Image Super-Resolution. Proc. AAAI Conf. Artif. Intell. 2022, 36, 2759–2767.
15. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV); IEEE: Piscataway, NJ, USA, 2018; pp. 286–301.
16. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 2472–2481.
17. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2480–2495.
18. Asperti, A.; Tonelli, V. Comparing the latent space of generative models. Neural Comput. Appl. 2023, 35, 3155–3172.
19. Dosovitskiy, A.; Brox, T. Inverting convolutional networks with convolutional networks. arXiv 2015, arXiv:1506.02753.
20. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
21. Khawar, F.; Poon, L.; Zhang, N.L. Learning the structure of auto-encoding recommenders. In Proceedings of The Web Conference 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 519–529.
22. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30; NeurIPS: San Diego, CA, USA, 2017.
23. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 5728–5739.
24. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Learning enriched features for real image restoration and enhancement. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020, Proceedings, Part XXV 16; Springer: Cham, Switzerland, 2020; pp. 492–511.
25. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 14821–14831.
26. Ding, X.; Fan, H.; Gong, J. Towards generating network of bikeways from Mapillary data. Comput. Environ. Urban Syst. 2021, 88, 101632.
27. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 13733–13742.
28. Guo, S.; Yan, Z.; Zhang, K.; Zuo, W.; Zhang, L. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 1712–1722.
29. Wen, S.; Liu, W.; Yang, Y.; Huang, T.; Zeng, Z. Generating realistic videos from keyframes with concatenated GANs. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2337–2348.
30. Wu, B.; Dai, X.; Zhang, P.; Wang, Y.; Sun, F.; Wu, Y.; Tian, Y.; Vajda, P.; Jia, Y.; Keutzer, K. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 10734–10742.
31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
32. Abdelhamed, A.; Lin, S.; Brown, M.S. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 1692–1700.
33. Liu, Y.; Qin, Z.; Anwar, S.; Ji, P.; Kim, D.; Caldwell, S.; Gedeon, T. Invertible denoising network: A light solution for real noise removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 13365–13374.
34. Mahendran, A.; Vedaldi, A. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2015; pp. 5188–5196.
35. Glasner, D.; Bagon, S.; Irani, M. Super-resolution from a single image. In 2009 IEEE 12th International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2009; pp. 349–356.
36. Lai, W.-S.; Huang, J.-B.; Ahuja, N.; Yang, M.-H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2017; pp. 624–632.
37. Shocher, A.; Cohen, N.; Irani, M. “zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 3118–3126.
38. Petschnigg, G.; Szeliski, R.; Agrawala, M.; Cohen, M.; Hoppe, H.; Toyama, K. Digital photography with flash and no-flash image pairs. ACM Trans. Graph. (TOG) 2004, 23, 664–672.









| Method | Baboon | Barbara | Bridge | Coastguard | Comic | Face | Flowers | Foreman | Lenna | Man | Monarch | Pepper | Ppt3 | Zebra | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No prior | 22.24 | 24.89 | 23.94 | 24.62 | 21.06 | 29.99 | 23.75 | 29.01 | 28.23 | 24.84 | 25.76 | 28.71 | 20.26 | 21.69 | 24.93 |
| Bicubic | 22.44 | 24.15 | 24.47 | 25.53 | 21.59 | 31.34 | 25.33 | 29.45 | 29.84 | 25.7 | 27.45 | 30.63 | 21.78 | 24.01 | 26.05 |
| TV prior [34] | 22.34 | 24.78 | 24.46 | 25.78 | 21.95 | 31.34 | 25.91 | 30.63 | 29.76 | 25.94 | 28.46 | 31.32 | 22.75 | 24.52 | 26.42 |
| Glasner et al. [35] | 22.44 | 25.38 | 24.73 | 25.38 | 21.98 | 31.09 | 25.54 | 30.4 | 30.48 | 26.33 | 28.22 | 32.02 | 22.16 | 24.34 | 26.46 |
| DIP | 22.29 | 25.53 | 24.38 | 25.81 | 22.18 | 31.02 | 26.14 | 31.66 | 30.83 | 26.09 | 29.98 | 32.08 | 24.38 | 25.71 | 27.00 |
| Ours | 22.31 | 25.63 | 24.45 | 25.94 | 22.29 | 31.17 | 26.28 | 31.73 | 30.99 | 26.14 | 30.12 | 32.21 | 24.43 | 25.85 | 27.11 |
| SRResNet-MSE [12] | 23.00 | 26.08 | 25.52 | 26.31 | 23.44 | 32.71 | 28.13 | 33.8 | 32.42 | 27.43 | 32.82 | 34.28 | 26.56 | 26.95 | 28.53 |
| LapSRN [36] | 22.83 | 25.69 | 25.36 | 26.21 | 22.9 | 32.62 | 27.54 | 33.59 | 31.98 | 27.27 | 31.62 | 33.88 | 25.36 | 26.98 | 28.13 |
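
As a quick arithmetic check on the table directly above, the following sketch recomputes the per-image difference between the "Ours" and "DIP" rows and the two row averages. The scores are copied from those two rows; the variable names are illustrative. The table itself does not name the metric, so treating the values as PSNR in dB is an assumption.

```python
# Scores copied from the "DIP" and "Ours" rows of the table above.
# Assumption: the values are PSNR in dB (the table does not state the metric).
images = ["Baboon", "Barbara", "Bridge", "Coastguard", "Comic", "Face",
          "Flowers", "Foreman", "Lenna", "Man", "Monarch", "Pepper",
          "Ppt3", "Zebra"]
dip = [22.29, 25.53, 24.38, 25.81, 22.18, 31.02, 26.14,
       31.66, 30.83, 26.09, 29.98, 32.08, 24.38, 25.71]
ours = [22.31, 25.63, 24.45, 25.94, 22.29, 31.17, 26.28,
        31.73, 30.99, 26.14, 30.12, 32.21, 24.43, 25.85]

# Per-image gain of "Ours" over "DIP", rounded to the table's precision.
gains = {name: round(o - d, 2) for name, d, o in zip(images, dip, ours)}

mean_dip = sum(dip) / len(dip)
mean_ours = sum(ours) / len(ours)
best = max(gains, key=gains.get)  # image with the largest gain

print(f"mean DIP  = {mean_dip:.2f}")
print(f"mean Ours = {mean_ours:.2f}")
print(f"mean gain = {mean_ours - mean_dip:.2f}")
print(f"largest gain: {best} (+{gains[best]:.2f})")
```

Averages recomputed from the two-decimal entries can differ from the printed Avg column in the last digit, presumably because the published averages were computed from unrounded scores.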

| Method | Baboon | Barbara | Bridge | Coastguard | Comic | Face | Flowers | Foreman | Lenna | Man | Monarch | Pepper | Ppt3 | Zebra | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No prior | 21.09 | 23.04 | 21.78 | 23.63 | 18.65 | 27.84 | 21.05 | 25.62 | 25.42 | 22.54 | 22.91 | 25.34 | 18.15 | 18.85 | 22.56 |
| Bicubic | 21.28 | 23.44 | 22.24 | 23.65 | 19.25 | 28.79 | 22.06 | 25.37 | 26.27 | 23.06 | 23.18 | 26.55 | 18.62 | 19.59 | 23.09 |
| TV prior [34] | 21.3 | 23.72 | 22.3 | 23.82 | 19.5 | 28.84 | 22.5 | 26.07 | 26.74 | 23.53 | 23.71 | 27.56 | 19.34 | 19.89 | 23.48 |
| SelfExSR [37] | 21.37 | 23.9 | 22.28 | 24.17 | 19.79 | 29.48 | 22.93 | 27.01 | 27.72 | 23.83 | 24.02 | 28.63 | 20.09 | 20.25 | 23.96 |
| DIP | 21.38 | 23.94 | 22.2 | 24.21 | 19.86 | 29.52 | 22.86 | 27.87 | 27.93 | 23.57 | 24.86 | 29.18 | 20.12 | 20.62 | 24.15 |
| Ours | 21.49 | 24.07 | 22.31 | 24.42 | 19.97 | 29.71 | 22.95 | 27.94 | 28.06 | 23.75 | 24.98 | 29.31 | 20.15 | 20.71 | 24.37 |
| LapSRN [36] | 21.51 | 24.21 | 22.77 | 24.10 | 20.06 | 29.85 | 23.31 | 28.13 | 28.22 | 24.20 | 24.97 | 29.22 | 20.13 | 20.28 | 24.35 |

| Method | Baby | Bird | Butterfly | Head | Woman | Avg |
|---|---|---|---|---|---|---|
| No prior | 30.16 | 27.67 | 19.82 | 29.98 | 25.18 | 26.56 |
| Bicubic | 31.78 | 30.2 | 22.13 | 31.34 | 26.75 | 28.44 |
| TV prior [34] | 31.21 | 30.43 | 24.38 | 31.34 | 26.93 | 28.85 |
| SelfExSR [37] | 32.24 | 31.1 | 22.36 | 31.69 | 26.85 | 28.84 |
| DIP | 31.49 | 31.8 | 26.23 | 31.04 | 28.93 | 29.89 |
| Ours | 32.25 | 31.95 | 26.45 | 31.17 | 29.21 | 30.21 |
| LapSRN [36] | 33.55 | 33.76 | 27.28 | 32.62 | 30.72 | 31.58 |
| SRResNet-MSE [12] | 33.66 | 35.1 | 28.41 | 32.73 | 30.6 | 32.1 |

| Method | Baby | Bird | Butterfly | Head | Woman | Avg |
|---|---|---|---|---|---|---|
| No prior | 26.28 | 24.03 | 17.64 | 27.94 | 21.37 | 23.45 |
| Bicubic | 27.28 | 25.28 | 17.74 | 28.82 | 22.74 | 24.37 |
| TV prior [34] | 27.93 | 25.82 | 18.4 | 28.87 | 23.36 | 24.87 |
| SelfExSR [37] | 28.45 | 26.48 | 18.8 | 29.36 | 24.05 | 25.42 |
| DIP | 28.28 | 27.09 | 20.02 | 29.55 | 24.5 | 25.88 |
| Ours | 28.41 | 27.22 | 20.13 | 29.71 | 24.67 | 26.05 |
| LapSRN [36] | 28.88 | 27.1 | 19.97 | 29.76 | 24.79 | 26.1 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lin, W.; Zhang, Z.; Lin, J.; You, Y. Efficient Deep Image Prior with Spatial-Channel Attention Transformer. Mathematics 2026, 14, 1185. https://doi.org/10.3390/math14071185

