Lightweight Network for Single Image Super-Resolution with Arbitrary Scale Factor †

Abstract: The existing single image super-resolution (SISR) methods that consider integer scale factors (X2, X3, X4, and X8) have been developed well, but SISR methods with arbitrary scale factors (X1.3, X2.5, and X3.7) have gradually gained attention recently. Therefore, we propose an efficient, lightweight model. This study makes two contributions. (1) An efficient and lightweight network for SISR is combined with an up-scale module that determines its weights based on the size of the high-resolution (HR) image. (2) All scale factors are handled simultaneously by one model, which saves storage and computational resources. Finally, we design various experiments to evaluate the proposed method on multiple general datasets. The experimental results show that the proposed model is lightweight while its performance remains competitive.


Introduction
In recent years, convolutional neural networks (CNNs) have become one of the most ubiquitous machine learning solutions for computer vision tasks. CNNs are used intensively in most fields of image processing, and single image super-resolution (SISR) is one of them. SISR, known in the past as up-scaling, generates a high-resolution (HR) image from a single low-resolution (LR) image. CNN-based SISR techniques [1][2][3][4][5][6] have been developed since SRCNN [1], but most of them only consider integer scale factors (X2, X3, X4, and X8), as shown in Figure 1a. Since there are real-world scenarios where users need to up-scale low-resolution (LR) images to a custom size instead of a fixed one, SISR methods with arbitrary scale factors (X1.3, X2.5, and X3.7) become important. In addition, training a single model that covers every scale factor saves time and effort, as shown in Figure 1b. For these reasons, researchers have sought to solve this problem.
Meta Super-Resolution (Meta-SR) [2] was proposed in 2019. Its Meta-Upscale Module up-scales the LR image according to different scale factors. In contrast, the up-scaling module of SISR methods with integer scale factors is a deconvolution layer or sub-pixel layer at the end of the network. In particular, the sub-pixel layer [3] is widely used in SR works, such as the Residual Dense Network (RDN) [4] and the residual channel attention network (RCAN) [5]. Meta-SR adopts RDN [4] as the backbone and demonstrates high performance while handling arbitrary scale factors for SISR. However, Meta-SR [2] has high complexity, and implementing it involves many challenges in terms of hardware requirements, making it computationally expensive. Therefore, the proposed method focuses on constructing a lightweight model, which is more appropriate and more likely to work in real-life scenarios. The proposed model is inspired by Meta-SR [2] and is called Light Arbitrary-SR (LAS); it is much lighter than the original Meta-SR [2]. Compared to a similar study [6] of very deep super-resolution (VDSR) with arbitrary scale factors, our results show better HR image quality with fewer weights and lower computational cost.

Proposed Method
The proposed LAS is inspired by RCAN [5] and Meta-SR [2]. We designed an efficient and lightweight network as the backbone based on RCAN [5] and combined it with the Meta-Upscale Module [2], as shown in Figure 2. One novelty in RCAN [5] is the establishment of a very deep network based on the residual-in-residual (RIR) structure. This network comprises several residual groups and long skip connections. Each group consists of multiple residual blocks and short skip connections.
Generally, RIR lets the main network concentrate on learning high-frequency information by allowing plentiful low-frequency information to be bypassed via numerous skip connections. The channel attention mechanism is also introduced to further improve the representational ability of the network. The dominant part of the network is the residual channel attention block (RCAB), which helps the network efficiently recognize informative components of the LR features. The RCAB, inspired by the success of channel attention (CA) and residual blocks (RB), helps the network learn and explore more information to improve overall performance. RCAN is constructed on the foundation of the RCAB and the RIR structure.
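The channel-attention gating inside an RCAB can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights; the reduction ratio of 16 and the weight shapes are assumptions taken from the general CA design, not values stated in this paper.

```python
import numpy as np

def channel_attention_block(features, reduction=16):
    """Illustrative RCAB-style gating: squeeze, excite, rescale, residual.

    features: (C, H, W) feature map. Weights are random placeholders.
    """
    c = features.shape[0]
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = features.mean(axis=(1, 2))                       # (C,)
    # Excite: bottleneck of two 1x1 layers (plain matrices here).
    rng = np.random.default_rng(0)
    w_down = rng.standard_normal((c // reduction, c)) * 0.1
    w_up = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w_down @ desc, 0.0)                 # ReLU
    scale = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))          # sigmoid gate
    # Rescale each channel, then add the block's residual connection.
    return features + features * scale[:, None, None]

x = np.random.default_rng(1).standard_normal((64, 8, 8))
y = channel_attention_block(x)
print(y.shape)  # (64, 8, 8)
```

The sigmoid gate keeps each channel's scaling in (0, 1), so informative channels are emphasized while the residual path preserves the original features.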

A very deep RCAN brings higher accuracy and superior SR images. However, RCAN is still too complicated, with a high computational cost that makes it challenging to implement, so we aimed to build a low-complexity network for SR with arbitrary scale factors. The original RCAN is set up with 20 RCABs and 10 residual groups, and its total weight usage is about 16 M. To reduce the complexity and make the model more appropriate for hardware implementation, we cut around 90% of the implementation, leaving only 3 or 6 RCABs and a single residual group.
Moreover, another highlight of the proposed LAS is the use of the Meta-Upscale Module, which has three core functions: Location Projection, Weight Prediction, and Feature Mapping. Location Projection projects the pixels of the HR image onto the LR image based on the scale factor, and the kernel weights for each pixel of the HR image are predicted by the Weight Prediction module. Lastly, the feature maps of the LR image and the predicted kernel weights are mapped back to the HR image by the Feature Mapping function to compute the value of each HR pixel. We attempted to simplify the Meta-Upscale Module as well. Since the Weight Prediction function predicts the kernel weights with a network of two fully connected layers, it consumes a lot of computational resources. We experimented with reducing the neurons from 256 to 128 and then to 64 to observe the performance. Finally, the Meta-Upscale Module was simplified by reducing the number of neurons in the fully connected layer from 256 to 64. The proposed method is thus confirmed to be a lightweight SR method with arbitrary scale factors.
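The three steps above can be sketched as follows. This is a hedged NumPy approximation, not the paper's implementation: the fully connected weights are random, untrained stand-ins (with the reduced 64-neuron hidden layer), and the predicted kernel is simplified to one weight per channel rather than the full k x k x C kernel of the original Meta-SR.

```python
import numpy as np

def meta_upscale_sketch(lr_feat, scale, hidden=64):
    """Sketch of Location Projection, Weight Prediction, Feature Mapping.

    lr_feat: (C, h, w) LR feature map; scale: arbitrary factor, e.g. 2.5.
    `hidden` mirrors the simplified 64-neuron fully connected layer.
    """
    c, h, w = lr_feat.shape
    H, W = int(h * scale), int(w * scale)
    rng = np.random.default_rng(0)
    fc1 = rng.standard_normal((hidden, 3)) * 0.1  # input: offsets + 1/scale
    fc2 = rng.standard_normal((c, hidden)) * 0.1  # output: weight per channel
    hr = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # 1) Location Projection: HR pixel -> source LR pixel.
            si, sj = int(i / scale), int(j / scale)
            # 2) Weight Prediction from fractional offsets and the scale.
            v = np.array([i / scale - si, j / scale - sj, 1.0 / scale])
            wgt = fc2 @ np.maximum(fc1 @ v, 0.0)            # (C,)
            # 3) Feature Mapping: inner product of features and weights.
            hr[i, j] = lr_feat[:, si, sj] @ wgt
    return hr

out = meta_upscale_sketch(np.ones((8, 4, 4)), scale=2.5)
print(out.shape)  # (10, 10)
```

Because the weights are predicted from the scale factor at run time, the same network serves every scale, which is what allows a single trained model to cover X1.0 through X4.0.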

Experimental Result
To achieve a lightweight super-resolution model with non-integer scale factors, we combined the Meta-Upscale Module and RCAN and simplified them. In the experiment, three versions, LAS_A, LAS_B, and LAS_C, were presented alongside Meta-RCAN, each with a different setting. LAS_A uses three RCABs and a single residual group with a simplified Meta-Upscale Module whose fully connected layer is reduced to 64 neurons. LAS_B uses six RCABs and a single residual group with the same simplified 64-neuron Meta-Upscale Module. LAS_C contains six RCABs and one residual group with 256 neurons in the fully connected layer of the Meta-Upscale Module. Lastly, Meta-RCAN denotes a slightly simplified RCAN set up with 16 RCABs and 10 residual groups. The setting of Meta-RCAN was adopted from the official source code in Ref. [2]. We re-trained the model and present its test results, but we do not consider Meta-RCAN one of the versions of LAS.
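The four settings above can be summarized in a small configuration table (an illustrative sketch; the field names are ours, and the 256-neuron value for Meta-RCAN is assumed from the original, unsimplified module).

```python
# "fc_neurons" is the width of the fully connected layer in the
# (possibly simplified) Meta-Upscale Module.
configs = {
    "LAS_A":     {"rcabs": 3,  "residual_groups": 1,  "fc_neurons": 64},
    "LAS_B":     {"rcabs": 6,  "residual_groups": 1,  "fc_neurons": 64},
    "LAS_C":     {"rcabs": 6,  "residual_groups": 1,  "fc_neurons": 256},
    "Meta-RCAN": {"rcabs": 16, "residual_groups": 10, "fc_neurons": 256},
}
for name, cfg in configs.items():
    print(name, cfg)
```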
All the experiments were run in parallel on two GPUs (Nvidia GeForce GTX 1080 Ti). We used the PyTorch framework and Python 3 with CUDA (version 11.2.142). Training and testing required several libraries, including PyTorch 0.5.0, Python 3.5 or higher, NumPy, skimage, imageio, and cv2. The training scale factors for the proposed methods varied from 1 to 4 with a stride of 0.1 (1, 1.1, 1.2, 1.3, 1.4, . . ., 4). The training dataset contained 800 images from the DIV2K [7] dataset. The test data came from three datasets: Set5 [8], Set14 [9], and B100 [10]. The learning rate was initialized to 10^-4 for all layers and halved after every 200 epochs. The optimizer was Adam. For better convergence, the L1 loss function, instead of the L2, was used to train the network.
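The stepped learning-rate schedule described above can be written out directly (a minimal sketch of the stated policy: initial rate 10^-4, halved every 200 epochs).

```python
def learning_rate(epoch, base=1e-4, step=200):
    """Stepped schedule: halve the base rate every `step` epochs."""
    return base * 0.5 ** (epoch // step)

print(learning_rate(0))    # 0.0001
print(learning_rate(200))  # 5e-05
print(learning_rate(450))  # 2.5e-05
```

In PyTorch this corresponds to a step scheduler with a decay factor of 0.5 applied every 200 epochs.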
Since RCAN has a better representational ability than RDN, Meta-RCAN achieves similar values of the evaluated metrics with around 40% fewer parameters than Meta-RDN (Table 1). Moreover, LAS_C has more neurons in the fully connected layer of the Meta-Upscale Module, so it has approximately 30% more parameters than LAS_B; however, LAS_B's evaluated metrics are only slightly lower than LAS_C's. Comparing LAS_B and LAS_A shows that more parameters yield better image quality. Compared to the lightweight VDSR, LAS_B scores slightly higher on the evaluated metrics while still requiring relatively fewer parameters. LAS_A has 33% fewer parameters than VDSR and obtains almost the same evaluated metrics. The results of VDSR are taken from the original data in Ref. [6], and the results of Meta-RDN [2] are obtained from a pre-trained model created by us.
We present the HR images generated by LAS_B with several scale factors in Figure 3. Comparisons of the proposed methods with others for generated X2.0, X3.0, and X4.0 HR images are provided in Figures 4-6, respectively. Finally, there is a trade-off between the performance, evaluated using the PSNR and SSIM metrics, and the cost, assessed using the number of parameters. With only 400 K parameters, the proposed LAS_A is reasonable and realistic to implement in hardware devices. In particular, it handles non-integer scale factors.

Conclusions
Super-resolution with non-integer scale factors is a practical topic that has gradually gained attention in recent years. Meta-SR [2] tackles this problem: a novel up-scale module is proposed to predict the kernel weights based on the corresponding scale factor. With this design, only a single model needs to be trained for all arbitrary scale factors, which saves time and effort compared to traditionally training a specific model for each scale factor. However, Meta-SR is still computationally expensive. Inspired by Ref. [2], we built a lightweight network that is suitable for hardware applications. The main contribution of the proposed work is a single model for all arbitrary scale factors with a low computational cost. The network is trained from scratch and only needs to be prepared once for all the scale factors.

Figure 1. (a) Multiple SR models for different scale factors and (b) a single SR model for arbitrary scale factors.

Figure 2. Architecture of the Meta-RCAN network.


Eng. Proc. 2023, 55, 15

Figure 4. Visual comparison of the image "Monarch" from dataset Set14 with a scale factor of 2.

Figure 5. Visual comparison of the image "zebra" from dataset Set14 with a scale factor of 3.

Figure 6. Visual comparison of the image "Baboon" from dataset Set14 with a scale factor of 4.


Table 1. Experimental results of the proposed method and comparison with other methods for the evaluated metrics PSNR/SSIM.
