PixelCraftSR: Efficient Super-Resolution with Multi-Agent Reinforcement for Edge Devices
Abstract
1. Introduction
- We propose PixelCraftSR, an RL-based efficient super-resolution (ESR) method that deploys pixel-wise agents trained with the asynchronous advantage actor–critic (A3C) policy. This approach significantly improves the reconstruction of super-resolved images while using considerably fewer parameters than existing ESR methods. To the best of our knowledge, this is the first approach to apply RL to the ESR task.
- To create PixelCraftSR, we propose a novel action set that can be deployed at the pixel level. This action set is composed of three deep-learning-based ESR methods and four traditional image-enhancement techniques, collectively forming an effective ensemble strategy for SISR.
- Within our proposed action set, we introduce modifications to SRCNN that further enhance its performance by increasing its depth and integrating channel-wise attention into the network.
- In addition, we deployed our model on the Jetson Orin Nano platform to evaluate its efficiency. Our approach achieves real-time inference with significantly faster output than comparable methods, highlighting its applicability to real-world scenarios.
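The pixel-wise decision step described in the contributions above can be sketched as follows: each pixel holds its own categorical policy over the shared action set, and one action is sampled per pixel. All names, shapes, and the uniform policy here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_action_map(logits, rng=None):
    """Sample one action per pixel from per-pixel policy logits.

    logits: array of shape (H, W, A) -- one logit vector per pixel agent.
    Returns an (H, W) integer action map.
    """
    rng = np.random.default_rng(rng)
    # Numerically stable softmax over the action axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Sample each pixel's action from its own categorical distribution
    # via inverse-CDF sampling on the flattened pixel grid.
    h, w, a = probs.shape
    flat = probs.reshape(-1, a)
    cum = flat.cumsum(axis=1)
    u = rng.random((flat.shape[0], 1))
    actions = (u > cum).sum(axis=1)
    return actions.reshape(h, w)

logits = np.zeros((4, 4, 7))   # 7 actions, uniform policy (illustrative)
amap = sample_action_map(logits, rng=0)
print(amap.shape)              # (4, 4)
```

In A3C training, these per-pixel logits would come from the shared fully convolutional policy head, so one forward pass yields every agent's distribution at once.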
2. Related Work
2.1. Deep-Learning-Based Efficient Super-Resolution
2.2. Reinforcement-Learning-Based Image Super-Resolution
3. Proposed Reinforcement-Learning-Based Super Resolution Method
3.1. Base Image Construction
3.2. Pixel-Wise A3C-Based Agent for PixelCraftSR
3.3. PixelCraftSR Action Set
3.4. Modified SRCNN as an Action
- Patch extraction and representation layer:
- Non-linear mapping layer:
- Reconstruction layer:
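The three stages listed above follow the original SRCNN pipeline. A minimal NumPy sketch is given below, assuming the classic 9-1-5 kernel sizes with 64 and 32 feature maps; the deeper, attention-augmented variant proposed in this paper is not reproduced here.

```python
import numpy as np

def conv2d(x, w):
    """'Same'-padded 2D cross-correlation (deep-learning 'convolution').

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in, k, k).
    """
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

rng = np.random.default_rng(0)
def layer(c_out, c_in, k):
    return rng.normal(0, 0.01, (c_out, c_in, k, k))

w1 = layer(64, 1, 9)   # patch extraction: 9x9 kernels -> 64 feature maps
w2 = layer(32, 64, 1)  # non-linear mapping: 1x1 kernels -> 32 maps
w3 = layer(1, 32, 5)   # reconstruction: 5x5 kernels -> 1-channel output

x = rng.random((1, 16, 16))        # bicubic-upscaled luminance patch
f1 = np.maximum(conv2d(x, w1), 0)  # ReLU after each of the first two stages
f2 = np.maximum(conv2d(f1, w2), 0)
y = conv2d(f2, w3)
print(y.shape)                     # (1, 16, 16)
```

Because every stage uses 'same' padding, the spatial size is preserved end to end, which is what lets this network act as a per-pixel refinement action inside PixelCraftSR.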
3.5. Reward Function
3.6. Loss Function
4. Experiments and Results
4.1. Dataset Preparation
4.2. Modified SRCNN Action
4.3. Comparison of PixelCraftSR with SOTA Methods
4.3.1. Quantitative Evaluation
4.3.2. Qualitative Evaluation
4.4. Inference Analysis
4.5. Ablation Study
4.5.1. Modifying Action Set
4.5.2. Analysing PixelCraftSR Performance for Each Timestep
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kong, F.; Li, M.; Liu, S.; Liu, D.; He, J.; Bai, Y.; Chen, F.; Fu, L. Residual local feature network for efficient super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 766–776. [Google Scholar]
- Sharif, S.; Naqvi, R.A.; Biswas, M. SAGAN: Adversarial spatial-asymmetric attention for noisy Nona-Bayer reconstruction. arXiv 2021, arXiv:2110.08619. [Google Scholar]
- Rasool, M.J.A.; Jeong, W.; Ahmed, S.; Whangbo, T.K. Stellar SR: A Convolutional Local Feature Network for Lightweight Image Super-Resolution. In Proceedings of the Korean Broadcasting Media Engineering Society Academic Conference, DBpia, Jeju, Republic of Korea, 25–28 June 2024; Volume 7, pp. 298–301. Available online: https://www.dbpia.co.kr/pdf/pdfView.do?nodeId=NODE11849087 (accessed on 1 January 2025).
- Rasool, M.A.; Ahmed, S.; Sabina, U.; Whangbo, T.J. MRESR: Multi-agent Reinforcement learning for Efficient Super-Resolution. In Proceedings of the Korean Broadcasting Media Engineering Society Academic Conference, DBpia, Jeju, Republic of Korea, 28–30 June 2023; pp. 460–463. [Google Scholar]
- Yue, L.; Shen, H.; Li, J.; Yuan, Q.; Zhang, H.; Zhang, L. Image super-resolution: The techniques, applications, and future. Signal Process. 2016, 128, 389–408. [Google Scholar]
- Uddin, S.N.; Jung, Y.J. SIFNet: Free-form image inpainting using color split-inpaint-fuse approach. Comput. Vis. Image Underst. 2022, 221, 103446. [Google Scholar]
- Li, L.; Zhang, Y.; Yuan, L.; Gao, X. SANet: Face super-resolution based on self-similarity prior and attention integration. Pattern Recognit. 2025, 157, 110854. [Google Scholar]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914. [Google Scholar]
- Pérez-Pellitero, E.; Catley-Chandar, S.; Leonardis, A.; Timofte, R. NTIRE 2021 challenge on high dynamic range imaging: Dataset, methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 19–25 June 2021; pp. 691–700. [Google Scholar]
- Mardieva, S.; Ahmad, S.; Umirzakova, S.; Rasool, M.A.; Whangbo, T.K. Lightweight image super-resolution for IoT devices using deep residual feature distillation network. Knowl.-Based Syst. 2024, 285, 111343. [Google Scholar]
- Rasool, M.; Ahmad, S.; Mardieva, S.; Akter, S.; Whangbo, T.K. A Comprehensive Survey on Real-Time Image Super-Resolution for IoT and Delay-Sensitive Applications. Appl. Sci. 2025, 15, 274. [Google Scholar] [CrossRef]
- Ahn, N.; Kang, B.; Sohn, K.A. Efficient deep neural network for photo-realistic image super-resolution. Pattern Recognit. 2022, 127, 108649. [Google Scholar]
- Zamfir, E.; Conde, M.V.; Timofte, R. Towards real-time 4k image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1522–1532. [Google Scholar]
- Conde, M.V.; Zamfir, E.; Timofte, R.; Motilla, D.; Liu, C.; Zhang, Z.; Peng, Y.; Lin, Y.; Guo, J.; Zou, X.; et al. Efficient deep models for real-time 4k image super-resolution. NTIRE 2023 benchmark and report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1495–1521. [Google Scholar]
- Gendy, G.; He, G.; Sabor, N. Lightweight image super-resolution based on deep learning: State-of-the-art and future directions. Inf. Fusion 2023, 94, 284–310. [Google Scholar]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1646–1654. [Google Scholar]
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883. [Google Scholar]
- Wang, Y. Edge-enhanced feature distillation network for efficient super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 777–785. [Google Scholar]
- Rasool, M.A.; Ahmed, S.; Sabina, U.; Whangbo, T.K. KONet: Towards a Weighted Ensemble Learning Model for Knee Osteoporosis Classification. IEEE Access 2024, 15, 274. [Google Scholar]
- Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 41–55. [Google Scholar]
- Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400. [Google Scholar] [CrossRef]
- Furuta, R.; Inoue, N.; Yamasaki, T. PixelRL: Fully convolutional network with reinforcement learning for image processing. IEEE Trans. Multimed. 2019, 22, 1704–1719. [Google Scholar] [CrossRef]
- Jarosik, P.; Lewandowski, M.; Klimonda, Z.; Byra, M. Pixel-wise deep reinforcement learning approach for ultrasound image denoising. In Proceedings of the 2021 IEEE International Ultrasonics Symposium (IUS), Virtual, 11–16 September 2021; pp. 1–4. [Google Scholar]
- Su, H.; Li, Y.; Xu, Y.; Fu, X.; Liu, S. A review of deep-learning-based super-resolution: From methods to applications. Pattern Recognit. 2024, 157, 110935. [Google Scholar]
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
- Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
- Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [PubMed]
- Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the Curves and Surfaces: 7th International Conference, Avignon, France, 24–30 June 2010; Revised Selected Papers 7. Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730. [Google Scholar]
- Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916. [Google Scholar] [CrossRef] [PubMed]
- Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
- Le, N.; Rathour, V.S.; Yamazaki, K.; Luu, K.; Savvides, M. Deep reinforcement learning in computer vision: A comprehensive survey. Artif. Intell. Rev. 2022, 55, 2733–2819. [Google Scholar] [CrossRef]
- Vassilo, K.; Heatwole, C.; Taha, T.; Mehmood, A. Multi-step reinforcement learning for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Virtual, 14–19 June 2020; pp. 512–513. [Google Scholar]
- Siu, W.C.; Hung, K.W. Review of image interpolation and super-resolution. In Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Hollywood, CA, USA, 3–6 December 2012; pp. 1–10. [Google Scholar]
- Zhou, L.; Cai, H.; Gu, J.; Li, Z.; Liu, Y.; Chen, X.; Qiao, Y.; Dong, C. Efficient image super-resolution using vast-receptive-field attention. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 256–272. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
- Ahmad, S.; Kim, J.S.; Park, D.K.; Whangbo, T. Automated detection of gastric lesions in endoscopic images by leveraging attention-based yolov7. IEEE Access 2023, 11, 87166–87177. [Google Scholar]
- Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. Enhancenet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4491–4500. [Google Scholar]
- Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
- Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 3–7 September 2012. [Google Scholar]
- Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
- Thawakar, O.; Patil, P.W.; Dudhane, A.; Murala, S.; Kulkarni, U. Image and video super resolution using recurrent generative adversarial network. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–8. [Google Scholar]
Approach | Published Venue | Training Dataset | Limitations |
---|---|---|---|
SRCNN [17] | TPAMI | T91 dataset [30] + ILSVRC 2013 ImageNet [31] | Struggles to generalize well to diverse image types and scales due to its reliance on fixed-size patches and lack of explicit understanding of image content. |
FSRCNN [19] | ECCV | General100 [19] + T91 dataset [30] | Suffers from reduced performance on very large upscaling factors due to its reliance on iterative upsampling and convolutional layers. |
ESPCN [20] | CVPR | T91 dataset [30] + BSD500 [32] | Exhibits artifacts and blurring in SR outputs due to the sub-pixel convolutional layer’s limited ability to reconstruct fine details. |
IMDN [29] | Proc. ACM Inter. Conf. on MM | DIV2K [33] | The iterative approach increases computational complexity and training times, potentially affecting the quality of the output due to challenges in preserving fine details and avoiding artifacts, especially for extreme upscaling factors. |
LapSRN [28] | CVPR | T91 dataset [30] + BSD500 [32] | The Laplacian pyramid structure potentially limits its scalability to high-quality output. |
RFDN [23] | ECCV | DIV2K [33] | Increased computational complexity and potential challenges in capturing diverse image features—affecting the overall quality and generalization capability of the model. |
RLFN [1] | CVPRW | DIV2K [33] | Local features lead to limitations in capturing global context and intricate details, potentially resulting in less accurate reconstruction of complex image structures and textures. |
Action No. | Action |
---|---|
1 | Pixel value −1 |
2 | Does nothing on that timestep |
3 | Pixel value +1 |
4 | ESPCN [20] |
5 | VDSR [18] |
6 | Modified SRCNN; refer to Figure 5 |
7 | Increase sharpness by 10% |
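The action set in the table above can be exercised with a small sketch: the pixel-level actions are applied directly, while the three network actions (ESPCN, VDSR, modified SRCNN) are represented here by precomputed candidate images, an assumption made purely for illustration; the actual models are not run.

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter with edge padding (used for unsharp masking)."""
    p = np.pad(img, 1, mode="edge")
    return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def apply_actions(img, action_map, candidates):
    """Compose the next state by executing one action per pixel.

    img: current estimate, float array (H, W) in [0, 255].
    action_map: (H, W) ints in 0..6 (actions 1..7 of the table, zero-based).
    candidates: dict mapping indices 3, 4, 5 to precomputed full-image
        outputs of ESPCN / VDSR / modified SRCNN (assumed given here).
    """
    out = img.copy()
    out[action_map == 0] -= 1.0        # action 1: pixel value -1
    # action 2 (index 1): do nothing on this timestep
    out[action_map == 2] += 1.0        # action 3: pixel value +1
    for a in (3, 4, 5):                # actions 4-6: take that network's pixel
        out[action_map == a] = candidates[a][action_map == a]
    # action 7: increase sharpness by 10% via unsharp masking (illustrative)
    sharp = img + 0.1 * (img - box_blur(img))
    out[action_map == 6] = sharp[action_map == 6]
    return np.clip(out, 0.0, 255.0)

rng = np.random.default_rng(0)
img = rng.random((8, 8)) * 255
cands = {a: rng.random((8, 8)) * 255 for a in (3, 4, 5)}
amap = rng.integers(0, 7, (8, 8))
nxt = apply_actions(img, amap, cands)
print(nxt.shape)   # (8, 8)
```

Each timestep of the episode repeats this composition, so different regions of the image can be refined by different actions over time.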
Method | Train PSNR/SSIM | Set14 (Val.) PSNR/SSIM | Set5 (Val.) PSNR/SSIM | FLOPs (G) | Param. (k)
---|---|---|---|---|---
SRCNN [17] | 31.05/0.8923 | 31.41/0.9063 | 36.66/0.9542 | 6.10 | 20
MSRCNN4 | 31.86/0.9001 | 31.75/0.9072 | 36.72/0.9552 | 18.30 | 316
MSRCNN5 | 32.16/0.9051 | 31.88/0.9098 | 37.58/0.9567 | 24.45 | 389
VDSR [18] | 32.00/0.9102 | 31.67/0.9127 | 37.53/0.9587 | 70.50 | 666
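The PSNR figures reported in these tables follow the standard peak signal-to-noise ratio definition, PSNR = 10·log10(MAX² / MSE). For reference, a minimal implementation:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (standard definition)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
noisy = ref + 10.0                   # uniform error of 10 -> MSE = 100
print(round(psnr(ref, noisy), 2))    # 28.13
```

In SR benchmarks, PSNR/SSIM are conventionally computed on the luminance (Y) channel after cropping a border equal to the scale factor; the sketch above omits those conventions.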
Scale | Model | Time (s) | Params (k) | FLOPs (G) | Set14 PSNR/SSIM | BSDS100 PSNR/SSIM | Urban100 PSNR/SSIM | Set5 PSNR/SSIM
---|---|---|---|---|---|---|---|---
x2 | SRCNN [17] | 0.01 | 20 | 6.10 | 31.41/0.9063 | 31.36/0.8879 | 29.50/0.8946 | 36.66/0.9542
x2 | FSRCNN [19] | 0.01 | 12 | 1.72 | 32.62/0.9087 | 31.50/0.8904 | 29.85/0.9009 | 36.98/0.9556
x2 | VDSR [18] | 0.23 | 666 | 70.50 | 31.67/0.9127 | 31.90/0.8960 | 30.77/0.9141 | 37.53/0.9587
x2 | LapSRN [28] | 0.71 | 251 | 8.57 | 32.99/0.9124 | 31.80/0.8952 | 30.41/0.9103 | 37.52/0.9591
x2 | IMDN [29] | 0.85 | 694 | 45.23 | 33.63/0.9177 | 32.19/0.8996 | 32.17/0.9283 | 38.00/0.9605
x2 | RFDN [23] | 0.05 | 534 | 37.67 | 33.68/0.9184 | 32.16/0.8994 | 32.12/0.9278 | 38.05/0.9606
x2 | RLFN [1] | 0.03 | 527 | 35.45 | 33.72/0.9187 | 32.22/0.9000 | 32.33/0.9278 | 38.07/0.9607
x2 | Our PixelCraftSR | 0.02 | 487 | 31.82 | 33.91/0.9648 | 34.87/0.9735 | 31.64/0.9590 | 38.08/0.9905
x4 | SRCNN [17] | 0.01 | 20 | 6.10 | 27.49/0.7503 | 26.90/0.7101 | 24.52/0.7221 | 30.48/0.8628
x4 | FSRCNN [19] | 0.01 | 12 | 1.72 | 27.61/0.7503 | 26.98/0.7150 | 24.62/0.7280 | 30.72/0.8660
x4 | VDSR [18] | 0.23 | 666 | 70.50 | 28.01/0.7550 | 27.29/0.7250 | 25.18/0.7524 | 31.35/0.8838
x4 | LapSRN [28] | 0.82 | 502 | 8.57 | 28.09/0.7670 | 27.32/0.7562 | 25.21/0.7562 | 31.54/0.8852
x4 | IMDN [29] | 0.91 | 715 | 45.23 | 28.61/0.7811 | 27.56/0.7353 | 26.04/0.7838 | 32.21/0.8948
x4 | RFDN [23] | 0.05 | 550 | 45.10 | 28.58/0.7819 | 27.57/0.7360 | 26.11/0.7858 | 32.24/0.8952
x4 | RLFN [1] | 0.04 | 543 | 37.67 | 28.61/0.7813 | 27.60/0.7364 | 26.17/0.7866 | 32.24/0.8952
x4 | Our PixelCraftSR | 0.03 | 487 | 31.82 | 29.10/0.9218 | 31.21/0.9412 | 27.77/0.9100 | 31.96/0.9673
Scale | Input Dimension | FLOPs (G) | CPU (s) | GPU (s) | MAX-Q (s) | MAX-N (s)
---|---|---|---|---|---|---
x2 | | 31.82 | 0.42 | 0.02 | 0.05 | 0.05
x2 | | 74.58 | 0.87 | 0.04 | 0.09 | 0.12
x2 | | 149.17 | 1.92 | 0.07 | 0.32 | 0.44
x4 | | 31.82 | 0.43 | 0.03 | 0.05 | 0.06
x4 | | 74.58 | 0.96 | 0.04 | 0.09 | 0.12
x4 | | 149.17 | 1.89 | 0.07 | 0.33 | 0.44
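Inference speeds like those in the table above are typically measured by averaging wall-clock time over repeated runs after a warm-up phase. A generic sketch follows; the `fake_sr` stand-in is purely illustrative, not the deployed model.

```python
import time
import numpy as np

def measure_latency(fn, *args, warmup=3, runs=10):
    """Average wall-clock latency of fn(*args) in seconds over several runs."""
    for _ in range(warmup):        # warm caches/allocators before timing
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - t0) / runs

def fake_sr(x):
    """Stand-in workload: x2 nearest-neighbour upscaling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.ones((180, 320))            # illustrative low-resolution input
t = measure_latency(fake_sr, x)
print(t > 0)                       # True
```

On embedded targets such as the Jetson Orin Nano, device-specific synchronization (e.g., waiting for GPU kernels to finish) must also be inserted before reading the clock, otherwise asynchronous execution makes the timings optimistic.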
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Rasool, M.J.A.; Ahmed, S.; Sharif, S.M.A.; Sevara, M.; Whangbo, T.K. PixelCraftSR: Efficient Super-Resolution with Multi-Agent Reinforcement for Edge Devices. Sensors 2025, 25, 2242. https://doi.org/10.3390/s25072242