Design of a Novel Conditional Noise Predictor for Image Super-Resolution Reconstruction Based on DDPM
Abstract
1. Introduction
2. Related Work
2.1. Recent Advances in Image Super-Resolution
2.2. Diffusion Model
3. Methodology
3.1. RapidDiff Design
- (1) Diffusion process
- (2) Reverse process
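
For orientation, the standard unconditional DDPM formulation of Ho, Jain and Abbeel (2020) defines the diffusion and reverse processes as sketched below. This is the textbook form, given only as an assumed starting point; the conditional variant used by RapidDiff may differ. Here $\beta_t$ is the noise schedule, $\alpha_t = 1 - \beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.

```latex
% Diffusion (forward) process: Gaussian noise is added step by step
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right)

% Reverse process: a learned Gaussian transition parameterized by \theta
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```
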
3.2. Conditional Noise Predictor
3.2.1. Encoder Module
3.2.2. Decoder Module
3.2.3. Feature Fusion Block
3.2.4. Training and Inference
4. Experiments
4.1. Dataset Introduction
4.2. Experimental Details
4.3. Performance
4.4. Visual Comparisons
4.5. Ablation Study
4.6. Noise Prediction Performance
4.7. Computational Complexity Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Dataset | Training Pairs | Testing Pairs | Scale Factor | Resolution |
---|---|---|---|---|
ImageNet | 2800 | 200 | 4 | 256 |
Alsat-2B | 2182 | 282 | 4 | 256 |
Method | PSNR | LPIPS | SSIM |
---|---|---|---|
ESRGAN | 20.631 | 0.4872 | 0.4424 |
RealSR-JPEG | 23.083 | 0.3227 | 0.6021 |
BSRGAN | 24.372 | 0.2503 | 0.6694 |
SwinIR | 24.021 | 0.2392 | 0.6724 |
Real-ESRGAN | 24.121 | 0.2525 | 0.6631 |
DASR | 24.685 | 0.2472 | 0.6821 |
LDM | 24.932 | 0.2712 | 0.6691 |
ResShift | 25.031 | 0.1843 | 0.6723 |
RapidDiff | 25.532 | 0.2614 | 0.7018 |
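
The PSNR column in the table above follows the usual definition, 10·log10(MAX² / MSE). The snippet below is a minimal sketch of that formula, assuming 8-bit pixel values (peak 255) and flattened pixel sequences. The function name `psnr` is illustrative, and the colour space and any cropping used for the reported numbers are not stated here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Two pixels, each off by 10 grey levels: MSE = 100.
    print(round(psnr([100, 100], [110, 90]), 2))
```

Higher PSNR means lower pixel-wise error. It does not track perceptual quality, which is why LPIPS (lower is better) and SSIM (higher is better) are reported alongside it.
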
Method | PSNR | LPIPS | SSIM |
---|---|---|---|
NLSN | 15.660 | 0.4206 | 0.2661 |
SRGAN | 15.675 | 0.3970 | 0.2654 |
Beby-GAN | 15.737 | 0.3945 | 0.2684 |
ESRGAN | 12.781 | 0.3482 | 0.1763 |
DIT | 14.275 | 0.3621 | 0.2524 |
EDiffSR | 13.519 | 0.1832 | 0.1726 |
SRDiff | 13.852 | 0.1698 | 0.2115 |
ResShift | 13.763 | 0.2742 | 0.4486 |
RapidDiff | 13.925 | 0.3324 | 0.4546 |
U-Net Decoder | Transformer Decoder | Feature Fusion Block | PSNR | LPIPS | SSIM |
---|---|---|---|---|---|
✓ | ✗ | ✗ | 25.103 | 0.2751 | 0.6314 |
✗ | ✓ | ✗ | 25.212 | 0.2874 | 0.6472 |
✓ | ✓ | ✗ | 25.457 | 0.2632 | 0.6989 |
✓ | ✓ | ✓ | 25.532 | 0.2614 | 0.7018 |
U-Net Decoder | Transformer Decoder | Feature Fusion Block | PSNR | LPIPS | SSIM |
---|---|---|---|---|---|
✓ | ✗ | ✗ | 13.728 | 0.3416 | 0.4227 |
✗ | ✓ | ✗ | 13.832 | 0.3461 | 0.4253 |
✓ | ✓ | ✗ | 13.879 | 0.3453 | 0.4354 |
✓ | ✓ | ✓ | 13.925 | 0.3324 | 0.4546 |
Method | Complexity (GFLOPs) | Memory (MB) | Parameters (M) | Speed (FPS) |
---|---|---|---|---|
NLSN | 733.69 | 6877 | 44.75 | 0.774 |
SRGAN | 14.69 | 1653 | 0.73 | 0.802 |
Beby-GAN | 399.71 | 10318 | 23.17 | 0.907 |
ESRGAN | 9.97 | 1537 | 0.62 | 0.826 |
DIT | 225.16 | 6356 | 33.13 | 0.005 |
EDiffSR | 174.61 | 1954 | 30.39 | 0.073 |
SRDiff | 186.08 | 2842 | 11.66 | 0.014 |
RapidDiff | 100.30 | 452.09 | 118.51 | 0.079 |
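
The Speed column is easier to compare as seconds per image and as a ratio against the other diffusion-based methods. The sketch below does that arithmetic using the FPS values copied from the table above. It assumes that FPS means images processed per second, and the helper name `seconds_per_image` is illustrative.

```python
# Speed (FPS) values for the diffusion-based methods, copied from the table above.
fps = {"DIT": 0.005, "EDiffSR": 0.073, "SRDiff": 0.014, "RapidDiff": 0.079}


def seconds_per_image(frames_per_second: float) -> float:
    """Convert throughput (images per second) into latency (seconds per image)."""
    return 1.0 / frames_per_second


# Latency of each method in seconds per image.
latency = {name: seconds_per_image(value) for name, value in fps.items()}

# How many times faster RapidDiff is than each other diffusion-based method.
speedup = {name: fps["RapidDiff"] / value
           for name, value in fps.items() if name != "RapidDiff"}

if __name__ == "__main__":
    for name in fps:
        print(f"{name}: {latency[name]:.1f} s/image")
    for name, ratio in speedup.items():
        print(f"RapidDiff vs {name}: {ratio:.2f}x")
```

On these figures RapidDiff takes about 12.7 s per image, compared with roughly 200 s for DIT and 71 s for SRDiff, while the GAN- and CNN-based methods in the table remain about ten times faster.
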
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, J.; Sun, H.; Fan, H.; Xiong, Y.; Zhang, J. Design of a Novel Conditional Noise Predictor for Image Super-Resolution Reconstruction Based on DDPM. J. Imaging 2025, 11, 138. https://doi.org/10.3390/jimaging11050138