Proceeding Paper

Lightweight Network for Single Image Super-Resolution with Arbitrary Scale Factor †

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70142, Taiwan
* Author to whom correspondence should be addressed.
† Presented at the IEEE 5th Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability, Tainan, Taiwan, 2–4 June 2023.
Eng. Proc. 2023, 55(1), 15; https://doi.org/10.3390/engproc2023055015
Published: 28 November 2023

Abstract

Existing single image super-resolution (SISR) methods for integer scale factors (X2, X3, X4, and X8) are well developed, but SISR methods for arbitrary scale factors (X1.3, X2.5, and X3.7) have gained attention only recently. We therefore propose an efficient, lightweight model. This study makes two contributions. (1) An efficient and lightweight SISR network is combined with an upscale module whose weights are determined by the size of the target high-resolution (HR) image. (2) A single model handles all scale factors simultaneously, saving storage and computational resources. Finally, we design various experiments to evaluate the proposed method on multiple general datasets. The experimental results show that the proposed model is lightweight while its performance remains competitive.

1. Introduction

In recent years, convolutional neural networks (CNNs) have become one of the most ubiquitous machine learning solutions for computer vision tasks. CNNs are used extensively in most fields of image processing, and single image super-resolution (SISR) is one of them. SISR, formerly known as up-scaling, generates a high-resolution (HR) image from a single low-resolution (LR) image. CNN-based SISR techniques [1,2,3,4,5,6] have been developed since SRCNN [1], but most of them consider only integer scale factors (X2, X3, X4, and X8), as shown in Figure 1a. Since real-world users often need to up-scale low-resolution (LR) images to a custom size rather than a fixed one, SISR methods with arbitrary scale factors (X1.3, X2.5, and X3.7) have become important. In addition, training a single model that covers every scale factor saves time and effort compared with training one model per factor, as shown in Figure 1b. For these reasons, researchers have sought a single-model solution to this problem.
Meta Super-Resolution (Meta-SR) [2] was proposed in 2019. Its Meta-Upscale Module up-scales the LR image according to an arbitrary scale factor. In contrast, integer-scale SISR methods use a deconvolution layer or a sub-pixel layer at the end of the network as the upscale module. In particular, the sub-pixel layer [3] is widely used in SR works, such as the residual dense network (RDN) [4] and the residual channel attention network (RCAN) [5]. Meta-SR adopts RDN [4] as its backbone and achieves high performance while handling arbitrary scale factors. However, Meta-SR [2] has high complexity, and implementing it poses many challenges in terms of hardware requirements, making it computationally expensive.
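To make the contrast concrete, the following minimal PyTorch sketch shows a sub-pixel upscale head of the kind used at the end of RDN- and RCAN-style networks. The channel count is an illustrative assumption; the key point is that nn.PixelShuffle accepts only integer factors, which is exactly the limitation that arbitrary-scale methods such as Meta-SR remove.

```python
import torch
import torch.nn as nn

scale = 2        # PixelShuffle accepts only integer factors
n_feats = 64     # feature channels (illustrative value)

sub_pixel_head = nn.Sequential(
    # Expand channels by scale**2 so they can be rearranged into space.
    nn.Conv2d(n_feats, n_feats * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),                  # (C*s^2, H, W) -> (C, s*H, s*W)
    nn.Conv2d(n_feats, 3, kernel_size=3, padding=1),   # project to RGB
)

lr_feats = torch.randn(1, n_feats, 48, 48)
print(sub_pixel_head(lr_feats).shape)        # torch.Size([1, 3, 96, 96])
```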
Therefore, the proposed method focuses on constructing a lightweight model, which is more appropriate for real-life scenarios. The proposed model, called Light Arbitrary-SR (LAS), is inspired by Meta-SR [2] but is much more lightweight than the original. Compared with a similar arbitrary-scale study [6] based on very deep super-resolution (VDSR), our results show better HR image quality with fewer weights and a lower computational cost.

2. Proposed Method

The proposed LAS is inspired by RCAN [5] and Meta-SR [2]. We built an efficient and lightweight backbone based on RCAN [5] and combined it with the Meta-Upscale Module [2], as shown in Figure 2. One novelty of RCAN [5] is its very deep network built on the residual-in-residual (RIR) structure: the network comprises several residual groups with long skip connections, and each group consists of multiple residual blocks with short skip connections.
Generally, RIR lets abundant low-frequency information bypass the main network via numerous skip connections, so the network can concentrate on learning high-frequency information. A channel attention mechanism is also introduced to further improve the representational ability of the network. The dominant component is the residual channel attention block (RCAB), which helps the network efficiently identify informative parts of the LR features. Inspired by the success of channel attention (CA) and residual blocks (RB), the RCAB helps the network learn and exploit more information to improve overall performance. RCAN is constructed from RCABs arranged in the RIR structure, as sketched below.
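The following is a minimal PyTorch sketch of an RCAB; the layer sizes (n_feats = 64, reduction = 16) follow common RCAN settings and are assumptions rather than values stated in this paper.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze global spatial information, then rescale each channel."""
    def __init__(self, n_feats: int, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # global pooling
            nn.Conv2d(n_feats, n_feats // reduction, 1),  # channel squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats // reduction, n_feats, 1),  # channel excite
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.body(x)    # per-channel rescaling of the features

class RCAB(nn.Module):
    """Residual block whose output is modulated by channel attention."""
    def __init__(self, n_feats: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            ChannelAttention(n_feats),
        )

    def forward(self, x):
        return x + self.body(x)    # short skip connection
```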
Although a very deep RCAN yields higher accuracy and superior SR results, it remains too complicated, and its high computational cost makes it challenging to implement. We therefore aimed to build a low-complexity network for SR with arbitrary scale factors. The original RCAN uses 20 RCABs in each of 10 residual groups, with about 16 M weights in total. To reduce the complexity and make the network more suitable for hardware implementation, we cut it down by roughly 90%, to only 3 or 6 RCABs in a single residual group, as sketched below.
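Continuing the sketch above, the trimmed backbone keeps a single residual group of 3 (LAS_A) or 6 (LAS_B/C) RCABs; the RCAB class from the previous sketch can be passed in as make_rcab. Class and argument names here are illustrative, not the authors' code.

```python
import torch.nn as nn

class ResidualGroup(nn.Module):
    """One residual group: a few RCABs plus a long skip connection (RIR)."""
    def __init__(self, make_rcab, n_feats: int = 64, n_rcab: int = 3):
        super().__init__()
        layers = [make_rcab(n_feats) for _ in range(n_rcab)]
        layers.append(nn.Conv2d(n_feats, n_feats, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)    # long skip connection

class LASBackbone(nn.Module):
    """Shallow feature extraction plus a single residual group
    (versus 10 groups of 20 RCABs in the original RCAN)."""
    def __init__(self, make_rcab, n_feats: int = 64, n_rcab: int = 3):
        super().__init__()
        self.head = nn.Conv2d(3, n_feats, 3, padding=1)
        self.group = ResidualGroup(make_rcab, n_feats, n_rcab)

    def forward(self, lr):
        x = self.head(lr)
        return x + self.group(x)   # LR features, fed to the upscale module
```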
Another highlight of the proposed LAS is the Meta-Upscale Module, which has three core functions: Location Projection, Weight Prediction, and Feature Mapping. Location Projection projects each pixel of the HR image onto the LR image according to the scale factor, and the Weight Prediction module predicts the kernel weights for each HR pixel. Finally, Feature Mapping applies the predicted kernel weights to the feature maps of the LR image to compute the value of each HR pixel. We also simplified the Meta-Upscale Module. Its Weight Prediction function predicts the kernel weights with a network of two fully connected layers, which consumes considerable computational resources. We experimented with reducing the number of neurons from 256 to 128 and then to 64 and observed the resulting performance. In the end, the Meta-Upscale Module was simplified by reducing the number of neurons in the fully connected layer from 256 to 64. The proposed method is thus confirmed to be a lightweight SR method with arbitrary scale factors; a sketch of the simplified weight-prediction step is given below.
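The sketch below illustrates the simplified Weight Prediction step: a small two-layer fully connected network predicts, for every HR pixel, the weights of a k x k convolution kernel from the pixel's fractional offset and the scale factor. The hidden width of 64 reflects the reduction from 256 described above; everything else (kernel size, input encoding) follows Meta-SR [2] only loosely and is an assumption.

```python
import torch
import torch.nn as nn

class WeightPrediction(nn.Module):
    """Predicts one k x k conv kernel per HR pixel from its relative
    position and the scale factor (two fully connected layers, with the
    hidden width reduced from 256 to 64)."""

    def __init__(self, n_feats: int = 64, kernel_size: int = 3, hidden: int = 64):
        super().__init__()
        self.k = kernel_size
        self.n_feats = n_feats
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden),    # input: (x offset, y offset, 1/scale)
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 3 * n_feats * kernel_size ** 2),  # RGB kernels
        )

    def forward(self, scale: float, out_h: int, out_w: int) -> torch.Tensor:
        # Location Projection: each HR pixel (i, j) maps to LR pixel
        # (floor(i / scale), floor(j / scale)) plus a fractional offset.
        i = torch.arange(out_h, dtype=torch.float32)
        j = torch.arange(out_w, dtype=torch.float32)
        off_i = (i / scale - (i / scale).floor()).view(-1, 1).expand(out_h, out_w)
        off_j = (j / scale - (j / scale).floor()).view(1, -1).expand(out_h, out_w)
        rel = torch.stack([off_i, off_j, torch.full_like(off_i, 1.0 / scale)], dim=-1)
        # Weight Prediction: one kernel per HR pixel.
        w = self.mlp(rel.view(-1, 3))                 # (H*W, 3*C*k*k)
        return w.view(out_h * out_w, 3, self.n_feats * self.k ** 2)

wp = WeightPrediction()
kernels = wp(scale=2.5, out_h=120, out_w=120)         # torch.Size([14400, 3, 576])
```

Feature Mapping would then gather, for each HR pixel, the k x k LR feature patch at its projected location and take the inner product with the predicted kernel; that step is omitted here for brevity.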

3. Experimental Results

To obtain a lightweight super-resolution model with non-integer scale factors, we combined the Meta-Upscale Module with an RCAN-based backbone and simplified both. The experiments cover three LAS versions, LAS_A, LAS_B, and LAS_C, plus Meta-RCAN, each with a different setting. LAS_A uses three RCABs in a single residual group with a simplified Meta-Upscale Module whose fully connected layers have 64 neurons. LAS_B uses six RCABs in a single residual group with the same simplified Meta-Upscale Module (64 neurons). LAS_C uses six RCABs in one residual group with 256 neurons in the fully connected layers of the Meta-Upscale Module. Finally, Meta-RCAN denotes a slightly simplified RCAN with 16 RCABs and 10 residual groups; its setting was adopted from the official source code of Ref. [2]. We re-trained that model and report its test results, but we do not consider Meta-RCAN one of the LAS versions. The three LAS settings are summarized below.
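For reference, the three variants can be written as configuration entries; the field names are illustrative, and the parameter counts are taken from Table 1.

```python
LAS_VARIANTS = {
    "LAS_A": dict(n_rcab=3, n_groups=1, fc_neurons=64),    # ~411 K params
    "LAS_B": dict(n_rcab=6, n_groups=1, fc_neurons=64),    # ~634 K params
    "LAS_C": dict(n_rcab=6, n_groups=1, fc_neurons=256),   # ~967 K params
}
```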
All the experiments were run in parallel on two GPUs (Nvidia GeForce GTX 1080 Ti) using the PyTorch framework and Python 3 with CUDA 11.2.142. Training and testing required PyTorch 0.5.0, Python 3.5 or higher, NumPy, skimage, imageio, and cv2. The training scale factors for the proposed methods varied from 1 to 4 with a stride of 0.1 (1, 1.1, 1.2, …, 4). The training set contained 800 images from the DIV2K dataset [7], and the test sets were Set5 [8], Set14 [9], and B100 [10]. The learning rate was initialized to 10^-4 for all layers and halved every 200 epochs, and the Adam optimizer was used. For better convergence, the network was trained with the L1 loss function instead of the L2 loss. A sketch of this training schedule is given below.
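The following sketch captures the stated schedule (Adam, L1 loss, learning rate 10^-4 halved every 200 epochs, scale factors 1.0 to 4.0 in steps of 0.1). The model and data loader are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Scale factors 1.0, 1.1, ..., 4.0 with a stride of 0.1 (31 values).
scales = [round(1.0 + 0.1 * i, 1) for i in range(31)]

model = nn.Conv2d(3, 3, 3, padding=1)        # stand-in for the LAS network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = nn.L1Loss()                      # L1 instead of L2, for convergence

# Per-epoch loop (DIV2K loader omitted):
# for epoch in range(num_epochs):
#     for lr_patch, hr_patch, scale in train_loader:
#         sr = model(lr_patch)               # backbone + Meta-Upscale in LAS
#         loss = criterion(sr, hr_patch)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
#     scheduler.step()                       # halves the LR every 200 epochs
```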
Since RCAN has better representational ability than RDN, Meta-RCAN achieves similar metric values with about 40% fewer parameters than Meta-RDN (Table 1). Moreover, LAS_C has more neurons in the fully connected layers of its Meta-Upscale Module, so it has approximately 30% more parameters than LAS_B; yet LAS_B's metric values are only slightly lower than LAS_C's. Comparing LAS_A and LAS_B likewise shows that more parameters yield better image quality.
Compared with the lightweight VDSR, LAS_B achieves slightly higher metric values while still requiring relatively few parameters. LAS_A uses about 33% fewer parameters than VDSR and obtains almost the same metric values. The VDSR results are taken from the original data in Ref. [6], while the Meta-RDN [2] results were obtained from a pre-trained model that we created.
Figure 3 presents HR images generated by LAS_B at several scale factors, and Figures 4–6 compare the proposed methods with others on generated X2.0, X3.0, and X4.0 HR images, respectively. Overall, there is a trade-off between performance, evaluated with the PSNR and SSIM metrics, and cost, assessed by the parameter count. With only about 400 K parameters, the proposed LAS_A is reasonable and realistic for implementation on hardware devices, particularly because it supports non-integer scale factors. A sketch of the metric computation is given below.
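For completeness, PSNR and SSIM (the metrics reported in Table 1) can be computed with scikit-image, which is listed among the paper's dependencies. Evaluating on the Y channel and cropping scale-sized borders are common SR conventions and are assumptions here, not details stated by the authors.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr: np.ndarray, hr: np.ndarray, scale: float):
    """sr, hr: uint8 Y-channel images of the same size."""
    border = int(np.ceil(scale))            # crop borders affected by padding
    sr_c = sr[border:-border, border:-border]
    hr_c = hr[border:-border, border:-border]
    psnr = peak_signal_noise_ratio(hr_c, sr_c, data_range=255)
    ssim = structural_similarity(hr_c, sr_c, data_range=255)
    return psnr, ssim
```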

4. Conclusions

Super-resolution with non-integer scale factors is a practical topic that has gradually gained attention in recent years. Meta-SR [2] tackles this problem with a novel upscale module that predicts the kernel weights according to the given scale factor. With this design, only a single model needs to be trained for all arbitrary scale factors, saving the time and effort of training a specific model for each one. However, Meta-SR is still computationally expensive. Inspired by Ref. [2], we built a lightweight network suitable for hardware applications. The main contribution of this work is a single low-cost model that covers all arbitrary scale factors; the network is trained from scratch and only needs to be prepared once for all of them.

Author Contributions

Conceptualization, Q.T.D.D. and K.-Y.H.; methodology, Q.T.D.D. and K.-Y.H.; software, Q.T.D.D.; validation, Q.T.D.D.; formal analysis, Q.T.D.D.; investigation, K.-Y.H.; resources, K.-Y.H.; data curation, K.-Y.H.; writing—original draft preparation, Q.T.D.D.; writing—review and editing, K.-Y.H.; visualization, K.-Y.H.; supervision, P.-Y.C.; project administration, P.-Y.C.; funding acquisition, P.-Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Science and Technology Council, R.O.C., under NSTC-110-2221-E-006-164-MY3, in part by National Academy of Marine Research, Taiwan, under NAMR-111001, and in part by Qualcomm through a Taiwan University Research Collaboration Project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank anonymous editors for their valuable comments and suggestions to improve the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  2. Hu, X.; Mu, H.; Zhang, X.; Wang, Z.; Tan, T.; Sun, J. Meta-SR: A magnification-arbitrary network for super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  3. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  4. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  5. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  6. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  7. Agustsson, E.; Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1122–1131. [Google Scholar]
  8. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.-L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference 2012, Surrey, UK, 3–7 September 2012; pp. 1–10. [Google Scholar] [CrossRef]
  9. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the 7th International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730. [Google Scholar]
  10. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001. [Google Scholar]
Figure 1. (a) Multiple SR models for different scale factors and (b) a single SR model for arbitrary scale factors.
Figure 2. Architecture of the Meta-RCAN network.
Figure 3. Generated HR image of “ppt3” from the Set14 dataset.
Figure 4. Visual comparison of the image “Monarch” from the Set14 dataset with a scale factor of 2.
Figure 5. Visual comparison of the image “zebra” from the Set14 dataset with a scale factor of 3.
Figure 6. Visual comparison of the image “Baboon” from the Set14 dataset with a scale factor of 4.
Table 1. Experimental results of the proposed method and comparison with other methods (PSNR/SSIM).

| Dataset | Scale | Bicubic | VDSR [6] | LAS_A | LAS_B | LAS_C | Meta-RCAN | Meta-RDN [2] |
|---|---|---|---|---|---|---|---|---|
| Params | | N/A | 665 K | 411 K | 634 K | 967 K | 12.7 M | 22 M |
| Set5 | 2 | 33.66/0.9299 | 37.53/0.9587 | 37.52/0.9583 | 37.67/0.9591 | 37.72/0.9593 | 38.22/0.9611 | 38.23/0.9610 |
| | 2.5 | – | – | 35.36/0.9395 | 35.60/0.9411 | 35.59/0.9410 | 36.20/0.9444 | 36.18/0.9441 |
| | 3 | 30.39/0.8682 | 33.66/0.9213 | 33.72/0.9209 | 34.03/0.9238 | 34.02/0.9240 | 34.73/0.9295 | 34.72/0.9296 |
| | 3.5 | – | – | 32.52/0.9019 | 32.81/0.9058 | 32.91/0.9071 | 33.56/0.9142 | 33.60/0.9146 |
| | 4 | 28.42/0.8104 | 31.35/0.8838 | 31.36/0.8807 | 31.72/0.8871 | 31.81/0.8886 | 32.52/0.8989 | 32.51/0.8986 |
| Set14 | 2 | 30.24/0.8688 | 33.03/0.9124 | 33.05/0.9123 | 33.26/0.9149 | 33.27/0.9149 | 34.02/0.9206 | 34.03/0.9204 |
| | 2.5 | – | – | 31.17/0.8704 | 31.37/0.8740 | 31.37/0.8735 | 31.97/0.8819 | 31.89/0.8814 |
| | 3 | 27.55/0.7742 | 29.77/0.8314 | 29.83/0.8319 | 30.06/0.8364 | 30.06/0.8366 | 30.58/0.8463 | 30.58/0.8465 |
| | 3.5 | – | – | 28.86/0.7970 | 29.08/0.8023 | 29.11/0.8035 | 29.59/0.8139 | 29.60/0.8140 |
| | 4 | 26.00/0.7027 | 28.01/0.7674 | 28.05/0.7672 | 28.28/0.7735 | 28.31/0.7749 | 28.84/0.7872 | 28.86/0.7878 |
| B100 | 2 | 29.56/0.8431 | 31.90/0.8960 | 31.81/0.8940 | 31.97/0.8966 | 31.99/0.8971 | 32.33/0.9008 | 32.36/0.9011 |
| | 2.5 | – | – | 29.95/0.8415 | 30.11/0.8450 | 30.12/0.8446 | 30.46/0.8509 | 30.48/0.8509 |
| | 3 | 27.21/0.7385 | 28.82/0.7976 | 28.75/0.7959 | 28.90/0.7999 | 28.92/0.8005 | 29.26/0.8079 | 29.28/0.8089 |
| | 3.5 | – | – | 27.89/0.7566 | 28.04/0.7617 | 28.06/0.7626 | 28.40/0.7718 | 28.42/0.7730 |
| | 4 | 25.96/0.6675 | 27.29/0.7251 | 27.22/0.7233 | 27.37/0.7289 | 27.40/0.7301 | 27.73/0.7409 | 27.76/0.7419 |