Internal Learning for Image Super-Resolution by Adaptive Feature Transform

Abstract: Recent years have witnessed the great success of image super-resolution based on deep learning. However, it is hard to adapt a well-trained deep model to a specific image for further improvement. Since the internal repetition of patterns is widely observed in visual entities, internal self-similarity is expected to help improve image super-resolution. In this paper, we focus on exploiting the complementary relation between external and internal example-based super-resolution methods. Specifically, we first develop a basic network that learns an external prior from large-scale training data and then learn the internal prior from the given low-resolution image for task adaptation. By simply embedding a few additional layers into a pre-trained deep neural network, the resulting image-adaptive super-resolution method exploits both the internal prior of the specific image and the external prior from the well-trained super-resolution model. We achieve a 0.18 dB PSNR improvement over the basic network's results on standard datasets. Extensive experiments on image super-resolution tasks demonstrate that the proposed method is flexible and can be integrated with lightweight networks. The proposed method boosts the performance for images with repetitive structures, and it improves the accuracy of the reconstructed images of the lightweight model.


Introduction
For surveillance video systems, 4K high-definition TV, object recognition, and medical image analysis, image super-resolution is a crucial step to improve image quality. Single image super-resolution (SISR), which aims to recover a high-resolution (HR) image from a low-resolution (LR) image, is challenging because one LR image corresponds to many HR versions. SISR methods enforce predetermined constraints on the reconstructed image to address this severely ill-posed problem, including data consistency [1,2], self-similarity [3,4], and structural recurrence [5,6]. Example-driven SISR methods further explore useful image priors from a collection of exemplar LR-HR pairs and learn nonlinear mapping functions to reconstruct the HR image [7][8][9][10].
The recently blooming deep convolutional neural network (CNN) based SR methods aim to exploit image priors from a large training dataset, expecting that enough training examples will provide a variety of LR-HR pairs. Benefiting from high-performance GPUs and large amounts of memory, deep neural networks have significantly improved SISR [11][12][13][14][15][16][17][18].
Although CNN-based SR methods achieve impressive results on data related to their training, they tend to produce distracting artifacts, such as over-smoothing or ringing, once the input image cannot be well represented by the training examples [6,19]. For example, when providing an LR image that is downsampled by a factor of two to a model trained for a downsampling factor of four, or giving an LR image with building textures to a model trained on natural outdoor images, the well-trained model will most probably introduce artifacts.
To address this issue, internal example-driven approaches assume that internal priors are more helpful for the recovery of a specific low-resolution image [4,6,[20][21][22][23][24]. A small training dataset built from a given LR image and its pyramid of scaled versions captures particular information, namely repetitive image patterns and self-similarity across image scales, for image-specific SR. Compared with external prior learning, internal training examples contain more relevant training patches than external data. Therefore, an effective way to obtain better SR results is to use both external and internal examples during the training phase [25].
There is a growing interest in introducing internal priors to CNN-based models for more accurate image restoration results. Unlike traditional example-driven methods, deep CNN-based models prefer large external training data; thus, adding a small internal dataset to the training data hardly improves SR performance for the given LR image.
Fine-tuning is introduced to exploit the internal prior by optimizing the parameters of a pre-trained model using internal examples, which allows the deep model to adapt to the given LR image. Usually, the pre-trained model contains several sub-models, each trained using specific training data with similar patterns. Given an LR image, the most relevant sub-model is selected and then fine-tuned on the self-example pairs [19,26].
On the other hand, zero-shot super-resolution (ZSSR) advocates training an image-specific super-resolver CNN from scratch at test time [11]. Obviously, the trained model relies heavily on the specific settings per image, which makes it hard to generalize to other conditions. Figure 1 demonstrates the SR results of different methods. The LR image is downsampled by a factor of two from the ground-truth image. The SRCNN trained for LR images downsampled by a factor of four tends to overly sharpen the image texture, while the unsupervised ZSSR [11] learns solely from the input LR image, which yields artificial effects and fails to reconstruct pleasant visual details. Image-adaptive super-resolution (IASR) learns the internal prior from the LR image based on the pre-trained SRCNN and creates better results.
It is observed that external examples promote visually pleasant results for relatively smooth regions, while internal examples from the given image help to recover its specific details [27]. Our work focuses on improving a pre-trained super-resolution model for a specific image based on the internal prior. The key is to exploit the complementary relation between external and internal example-based SISR methods. To this end, we develop a unified deep model that integrates external training and internal learning. Our method enjoys the impressive generalization capability of deep learning and further improves it through internal learning in the test phase. We make the following three contributions in this work.

1. We propose a novel framework to exploit the strengths of the external prior and the internal prior in the image super-resolution task. In contrast to full-training and fine-tuning methods, the proposed method modulates the intermediate output according to the testing low-resolution image via its internal examples to produce more accurate SR images.
2. We perform adaptive feature transformation to simulate the various image feature distributions extracted from the testing low-resolution image. We carefully investigate the properties of the adaptive feature transformation layers, providing detailed guidance on the usage of the proposed method. Furthermore, the framework of our network is flexible and can be integrated into CNN-based models.
3. The extensive experimental results demonstrate that the proposed method is effective in improving the performance of lightweight deep SR networks. This is promising for providing new ideas to the community for introducing internal priors into deep SR networks.
The remainder of this paper is organized as follows. We briefly review the most related works in Section 2. Section 3 presents how to exploit external priors and internal priors using one unified framework. The experimental results and analysis are shown in Section 4. In Section 5, we discuss the details of the proposed method. Section 6 gives the conclusion.

Related Work
Given the observed low-resolution image I l , SISR attempts to reconstruct a high-resolution version by recovering all the missing details. Assume that I l is blurred and downsampled from a high-resolution image I h . I l can be formulated as

I l = DHI h + n,

where D, H and n denote the downsampling operator, the blurring kernel and the noise, respectively. The example-driven method with the parameters Θ learns a nonlinear mapping function I s = f (I l , Θ) through the training data, where I s is the reconstructed SR image. The parameters Θ are optimized during training to guarantee the consistency between I s and the ground-truth image I h .
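The degradation model above can be sketched numerically. The following is a minimal illustration in which a 3 × 3 box blur stands in for the kernel H and strided sampling stands in for the operator D; both are illustrative stand-ins, not the operators used in the paper's experiments.

```python
import numpy as np

def degrade(hr, kernel, scale, noise_sigma=0.0, rng=None):
    """Synthesize an LR image: blur (H), downsample (D), add noise (n)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(hr, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(hr, dtype=np.float64)
    for i in range(kh):  # direct 2D correlation with the blur kernel
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i + hr.shape[0], j:j + hr.shape[1]]
    lr = blurred[::scale, ::scale]  # D: direct downsampling by `scale`
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_sigma, lr.shape)  # n: additive noise
    return lr

hr = np.arange(64.0).reshape(8, 8)
box = np.full((3, 3), 1.0 / 9.0)  # H: box-blur stand-in for the true kernel
lr = degrade(hr, box, scale=2)
```

The example-driven method then learns the inverse mapping f (I l , Θ) from many such (I l , I h ) pairs.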

Internal Learning for Image Super-Resolution
Learning the image internal prior is important to specific image super-resolution. There are two strategies of exploiting internal examples for CNN-based image super-resolution.
Fine-tuning a pre-trained CNN-based model. Because the number of internal examples that can be extracted from the given LR image or its scaled versions is limited, several methods prefer a fine-tuning strategy [19,26,28], which includes the following steps: (1) A CNN model with parameters Θ is learned from a collection of external examples. (2) For the test image I l , internal LR-HR pairs are extracted from I l and its scaled versions [19]. (3) Θ is optimized to adapt to these internal pairs. (4) The CNN with the new set of parameters Θ̂ is expected to produce a more accurate HR image I s = f (I l , Θ̂).
It is also possible to use a variant of fine-tuning in which part of Θ, i.e., some of the convolutional layers, is frozen to prevent overfitting.
Strength: Fine-tuning overcomes the small dataset size issue and speeds up the training. Weakness: Fine-tuning often requires a low learning rate to prevent large drift in the existing parameters. Another notorious drawback of the fine-tuning strategy is that fine-tuned networks suffer catastrophic forgetting and degrade in performance on the old task [29].
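The partial-freeze variant of fine-tuning can be sketched in a few lines of PyTorch. The network, layer sizes and learning rate below are illustrative stand-ins, not the settings of any cited method.

```python
import torch
import torch.nn as nn

# Illustrative stand-in SR network; not the architecture of any cited method.
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

# Freeze the first convolutional layer to limit drift of the external prior.
for p in net[0].parameters():
    p.requires_grad = False
w_frozen = net[0].weight.detach().clone()  # kept only to verify freezing

# Low learning rate, and only the unfrozen parameters are optimized.
opt = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=1e-5)

lr_son = torch.rand(1, 1, 12, 12)     # internal LR example ("LR son")
hr_parent = torch.rand(1, 1, 12, 12)  # its HR parent, pre-upscaled to the same size
for _ in range(3):
    opt.zero_grad()
    loss = nn.functional.l1_loss(net(lr_son), hr_parent)
    loss.backward()
    opt.step()
```

After the loop, the frozen layer's weights are unchanged while the remaining layers have adapted to the internal pairs.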
Image-specific CNN-based model. Some researchers argued that internal dictionaries are sufficient for image reconstruction [4,6,20,21]. These methods solved the SR problem using unsupervised learning by building a particular SR model for each testing LR image directly. As an unsupervised CNN-based SR method, ZSSR exploits the internal recurrence of information inside a single image [11], and trains a lightweight image-specific network f (I l , Θ) at test time on examples extracted solely from the input image I l itself.
Strength: Full training aims to build a specific deep neural network for each test image. It adapts the SR model to diverse kinds of images whose acquisition process is unknown.
Weakness: The network aims to reconstruct a particular LR image; thus, it has limited generalization, tending to yield poor results for other images. Moreover, fully training the parameters at test time is only practical for lightweight CNNs.

Feature-Wise Transformation
The idea of adapting a well-trained image super-resolution model to a specific image has certain connections to domain adaptation. For image-adaptive super-resolution, the source domain is a CNN-based SR model trained on a large external dataset, while the target task is to reconstruct an HR version of a specific image with insufficient internal examples. Feature-wise transformation is broadly used for capturing variations of the feature distributions under different domains [30,31]. In a deep neural network, feature-wise transformation is implemented using additional layers that are parametrized by some form of conditioning information [30]. The same idea is adopted in image style transfer to normalize the feature maps according to some priors [32][33][34]. For image restoration, He et al. performed adaptive feature modification to transfer a CNN-based model from one pre-defined restoration level to another [35].
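The channel-wise feature-wise transformation used in [30,31] reduces to scaling and shifting each feature map; a minimal sketch follows, in which γ and β are given constants rather than parameters predicted from conditioning information.

```python
import torch

def feature_wise_transform(z, gamma, beta):
    """Channel-wise modulation: z_hat[:, c] = gamma[c] * z[:, c] + beta[c]."""
    return gamma.view(1, -1, 1, 1) * z + beta.view(1, -1, 1, 1)

z = torch.randn(2, 4, 8, 8)                # a batch of C=4 feature maps
gamma = torch.full((4,), 2.0)              # per-channel scale
beta = torch.zeros(4)                      # per-channel shift
out = feature_wise_transform(z, gamma, beta)
```

With γ = 1 and β = 0 the transform is the identity, which is why such layers can be inserted into a pre-trained network without disturbing its output until they are trained.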
In this paper, we introduce the adaptive feature-wise transformation (AFT) layers to the pre-trained model. The internal priors are parameterized as a set of AFT layers. Integrated with the aid of AFT layers, the model formulates the external and internal priors together to efficiently reconstruct the high-resolution image.
Our method is different from Reference [35] in that (1) the proposed method unifies external learning and internal learning for image-adaptive super-resolution, and (2) the layer aims to adapt the pre-trained model to specific images.

Proposed Method
The overall scheme of IASR is demonstrated in Figure 2. As shown, IASR consists of three phases: external training, internal learning, and test. External training is conducted on large-scale HR-LR pairs. This step is similar to CNN-based SR [12,13]. Internal learning is conducted on the synthesized HR-LR pairs of the given LR image I l , which is used to learn the knowledge from I l . In contrast to fine-tuning, we introduce the adaptive feature-wise transformation (AFT) layers into the pre-trained model. The internal learning step enables our model to learn internal information within a single image. The test phase is the same as in CNN-based SR. Once internal learning is finished, I l is fed into IASR for super-resolution. During both internal learning and testing, only the test image itself is fed into IASR.
The framework of IASR is shown in Figure 3. IASR consists of two parts: the basic network N ex for external learning and the adaptive feature transformation (AFT) layers for internal learning. As shown in Figure 3, a residual block typically contains two convolutional layers and one ReLU layer. Compared with the traditional residual block, we pair each convolutional layer with an AFT layer for image-adaptive internal learning. The overall image-adaptive super-resolution (IASR) scheme works as follows. (a) External learning: N ex , a residual network consisting of residual blocks with parameters Θ ex , is first trained on a large external database. (b) Internal learning: we build an internal training dataset based on the test image I l , and then optimize the parameters Θ in of the AFT layers to learn the internal prior from internal examples while freezing the parameters Θ ex of N ex . Finally, the test image I l is fed into IASR to produce its HR output.

External Learning
The backbone of N ex is a residual network (ResNet) [36], which consists of residual blocks (ResBlocks). In our work, N ex performs external training on large-scale HR-LR pairs. To this end, the parameters Θ ex of N ex are optimized to reconstruct an accurate high-resolution image. Algorithm 1 summarizes the external learning phase.

Algorithm 1 External training.
1: Input: The training data (I l , I h ) synthesized from an external dataset by the pre-defined downsampling operator; the hyper-parameters of N ex , including the learning rate, batch size and number of epochs.
2: Output: N ex with optimized parameters Θ * ex .
3: Initialization phase.
The function of N ex is the same as that of a normal residual network: N ex produces the high-resolution image I s = f (I l , Θ * ex ) based on the external prior. Since natural images share similar properties, N ex is able to learn the representative image priors of high-resolution images, thus providing relatively reasonable SR results for test images.

Internal Learning via AFT Layers
As shown in Figure 1, due to the discrepancy between the feature distributions of seen and unseen images, N ex may fail to generalize to the test image. We aim to improve the SR performance of N ex for a particular unseen image.

Adaptive Feature-Wise Transform Layer
In [35], the authors proposed a modulating strategy for the continual modulation of different restoration levels. Specifically, they performed channel-wise feature modification to adapt a well-trained model to another restoration level with high accuracy. Here, we insert the adaptive feature transform (AFT) layer into the residual blocks of N ex to augment the intermediate feature activations with the feature-wise transform, and then fine-tune the AFT layers to adapt to the unseen LR image. Figure 3 shows the ResBlock with the adaptive feature-wise transformation.
The AFT layer consists of a modulation parameter pair (γ, β) that is expected to learn the internal prior. Given an intermediate feature map z with dimensions C × H × W, we modulate z to ẑ as

ẑ i = γ i * z i + β i ,

where z i is the ith input feature map and * denotes the convolution operator. γ i and β i are the corresponding filter and bias, respectively.
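Since each feature map z i is convolved with its own filter γ i , the modulation corresponds to a depthwise convolution in PyTorch (one filter and one bias per channel via `groups=channels`). The identity initialization below is our own assumption for illustration, chosen so that the untrained layer leaves the pre-trained features unchanged; the paper does not specify its initialization.

```python
import torch
import torch.nn as nn

class AFT(nn.Module):
    """Adaptive feature-wise transform sketch: z_hat_i = gamma_i * z_i + beta_i,
    with * a per-map (depthwise) convolution."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # groups=channels gives one filter gamma_i and one bias beta_i per map.
        self.mod = nn.Conv2d(channels, channels, kernel_size,
                             padding=kernel_size // 2, groups=channels)
        # Identity init (assumption): delta kernel, zero bias.
        nn.init.zeros_(self.mod.weight)
        nn.init.zeros_(self.mod.bias)
        with torch.no_grad():
            self.mod.weight[:, 0, kernel_size // 2, kernel_size // 2] = 1.0

    def forward(self, z):
        return self.mod(z)

aft = AFT(64)
z = torch.randn(2, 64, 16, 16)
out = aft(z)
```

Before internal learning, the layer reproduces its input exactly; internal training then moves (γ, β) away from the identity to fit the test image.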

Internal Learning
After the external training is finished, we freeze the pre-trained parameters Θ ex and insert AFT layers into the ResBlocks. The internal learning stage aims to model the internal prior using the AFT layers parameterized by Θ in .
Θ in denotes the parameters of all γ and β of the additional AFT layers. In this phase, we synthesize LR sons by downsampling I l with the corresponding blur kernel. Specifically, the test image I l becomes a ground truth I h in while its LR sons I l in become the corresponding LR images [11]. To augment the internal training examples, we also feed the test image into N ex to produce I s and collect the LR sons of I s as internal examples. Since I s is much larger than I l , many more internal examples can be extracted from it than from I l alone. Thus, the final internal training dataset includes the LR sons of both I l and I s . The learned parameter pair adaptively influences the final result by performing the adaptive feature-wise transformation of the intermediate feature maps z. Algorithm 2 summarizes the internal learning phase.

Algorithm 2 Internal learning.
1: Input: Training data extracted from the test image I l and the output I s of N ex by the downsampling operator with the blur kernel; Θ * ex of the pre-trained N ex ; the hyper-parameters, including the learning rate, batch size and number of epochs.
2: Output: IASR with parameters Θ * , which includes Θ * ex and Θ * in .
3: Initialization phase. Θ in ← randomly initialized.
4: Θ * in ← argmin Θ in Loss(I s , I h in ).
5: return AFT layers with parameters Θ * in .
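The freeze-and-train loop of the internal learning phase can be sketched as follows. The tiny network and the single modulation layer are illustrative stand-ins for N ex and the inserted AFT layers; sizes and the learning rate are arbitrary.

```python
import torch
import torch.nn as nn

# Stand-ins: a frozen "basic network" and one trainable modulation layer.
n_ex = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 1, 3, padding=1))
aft = nn.Conv2d(1, 1, 3, padding=1)  # its parameters play the role of Theta_in

for p in n_ex.parameters():          # freeze Theta_ex
    p.requires_grad = False
w_ex = n_ex[0].weight.detach().clone()  # kept only to verify freezing

opt = torch.optim.Adam(aft.parameters(), lr=1e-4)  # optimize Theta_in only

i_l = torch.rand(1, 1, 16, 16)       # test image: the internal ground truth
i_l_son = torch.rand(1, 1, 16, 16)   # its LR son, pre-upscaled to the same size
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.l1_loss(aft(n_ex(i_l_son)), i_l)
    loss.backward()
    opt.step()
```

Only the modulation parameters move during this loop; the external prior stored in the frozen backbone is untouched, which distinguishes this scheme from fine-tuning.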

Image-Adaptive Super-Resolution
IASR is ready to perform super-resolution for a specific image I l after external learning and internal learning. Given the LR test image I l , IASR with parameters Θ * ex and Θ * in yields the high-resolution image.
In the testing phase, only the testing image itself is fed into the network and all internal examples are extracted from the testing image.

1. External training. For external training, we use the images from DIV2K [37]. Image patches sized 24 × 24 are the inputs, and the ground truths are the corresponding HR patches sized 24r × 24r, where r is the upscaling factor. Training data augmentation is performed with random up-down and left-right flips and clockwise 90° rotations.
2. Internal learning. For internal learning, we generate internal LR-HR pairs from the test images I l and I s following the steps of [11]. I l and I s become the ground-truth images. After downsampling I l and I s with the blur kernel, their corresponding LR sons become the LR images. The training dataset is built by extracting patches from the "ground-truth" images and their LR sons. In our experiments, IASR and ZSSR extract internal examples with the same strategy, including the number of examples (3000), the sampling stride (4) and the scale augmentation (none). Finally, the internal dataset consists of HR patches sized 24r × 24r and LR patches sized 24 × 24, which are further enriched by augmentations such as rotations and flips.
3. Training settings. For both training phases, we use the L 1 loss and the ADAM optimizer [38] with β 1 = 0.9 and β 2 = 0.999. All models are built using the PyTorch framework [39]. Feature maps are zero-padded before convolutions. To minimize overhead and make maximum use of GPU memory, the batch size is set to 64, and training stops after 60 epochs. The initial learning rate is 10 −4 , which decreases by 10 percent after every 20 epochs. To synthesize the LR examples, images are first downsampled by the given upscaling factor and then upscaled by the same factor via bicubic interpolation to form the LR inputs. The upscaling block in Figure 3 is implemented via bicubic interpolation. We conduct the experiments on a machine with an NVIDIA TitanX GPU with 16 GB of memory.
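The internal-pair construction in step 2 can be sketched as follows. Nearest-neighbor resampling stands in for the bicubic/blur-kernel pipeline, and the patch size (24) and stride (4) mirror the settings above; everything else is illustrative.

```python
import numpy as np

def extract_pairs(hr, scale, patch=24, stride=4):
    """Extract aligned LR/HR patch pairs from an image and its LR son
    (nearest-neighbor down/upsampling stands in for the bicubic pipeline)."""
    son = hr[::scale, ::scale]                          # LR son of the image
    lr = np.repeat(np.repeat(son, scale, 0), scale, 1)  # upscale back to HR size
    pairs = []
    for y in range(0, hr.shape[0] - patch + 1, stride):
        for x in range(0, hr.shape[1] - patch + 1, stride):
            pairs.append((lr[y:y + patch, x:x + patch],
                          hr[y:y + patch, x:x + patch]))
    return pairs

img = np.random.rand(48, 48)   # the test image acts as its own "ground truth"
pairs = extract_pairs(img, scale=2)
```

Because the LR son is upscaled back to the original size before patch extraction, each LR patch is spatially aligned with its HR counterpart, as required by a pre-upsampling network.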
The structure of IASR. The basic network N ex consists of 3 residual blocks. The number of filters is 64 and the filter size is 3 × 3 for all convolution layers. To build the image-adaptive SR network, we integrate the AFT layer into each residual block of the network, and set the filter as 64 × 3 × 3.
To evaluate our proposed method, we build a ResNet with the same structure as N ex in the following experiments.

Improvement for the Lightweight CNN
IASR aims to improve SR by integrating a lightweight CNN with AFT layers. We validate our method by integrating AFT layers with two lightweight networks: the well-known SRCNN [12] and a ResNet with the same structure as N ex . Furthermore, we compare the proposed image-adaptive SR (A) with two other improvement techniques [25]: iterative back projection (B), which ensures that the HR reconstruction is consistent with the LR input, and enhanced prediction (E), which averages the predictions on a set of transformed images derived from the LR input. In the experiments, we rotate the LR input by 90° to produce the enhanced prediction. SRCNN includes three convolutional layers with kernel sizes of 9, 5 and 5, respectively. We add AFT layers with a kernel size of 3 × 3 to the first two convolutional layers to build SRCNN A . The structure of ResNet is the same as the basic part of IASR, N ex , which consists of 3 ResBlocks, and ResNet A integrates N ex with the AFT layers. Furthermore, we combine image-adaptive SR with back projection (AB) and with enhanced prediction (AE) for further evaluation. The objective criterion is the PSNR in the Y channel of the YCbCr color space. We report the average PSNR on Set5 [40], BSD100 [41], and Urban100 [42] in Table 1. Several conclusions can be drawn.

1.
Image-adaptive (A) SR is a more effective way to improve performance than back-projections (B) and enhancement (E). The gains of the image-adaptive technique for SRCNN and ResNet are both about +0.18 dB. The gain of back projection is only about +0.01 dB on average (note that back projection needs to presuppose a degradation operator, which makes it hard to give a precise estimation). It confirms that our image-adaptive approach is a generic way to improve the lightweight network for SR.

2.
Among the three benchmark datasets, the Urban100 images present strong self-similarities and redundant repetitive patterns; therefore, they provide a large number of internal examples for internal learning. By applying the image-adaptive internal learning technique, both SRCNN and ResNet are largely improved on Urban100 (+0.31 and +0.24 dB). The smallest gains are achieved on BSD100 (+0.06 dB and +0.13 dB on average). This is mainly because BSD100 consists of natural outdoor images, which are similar to the external training images.

3.
The combination of an image-adaptive internal learning technique and enhanced prediction brings larger gains. ResNet AE achieves better performance (+0.28 dB) than ResNet on average. It indicates some complementarity between the different methods.
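The enhanced prediction (E) technique can be sketched as follows. This sketch averages over all four 90° rotations (a common variant; the experiments above use a single 90° rotation), and an identity function stands in for the super-resolver.

```python
import numpy as np

def enhanced_prediction(sr_fn, lr):
    """Average predictions over 90-degree rotations of the input,
    rotating each output back before averaging."""
    outs = []
    for k in range(4):
        out = sr_fn(np.rot90(lr, k))      # predict on the rotated input
        outs.append(np.rot90(out, -k))    # rotate the prediction back
    return np.mean(outs, axis=0)

identity = lambda x: x.copy()             # stand-in for the SR network
lr = np.random.rand(8, 8)
sr = enhanced_prediction(identity, lr)
```

For a real super-resolver the four rotated predictions differ slightly, and averaging them suppresses orientation-dependent errors; with the identity stand-in the output simply equals the input.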

Evaluations on "Ideal" Case
In these benchmarks, the LR images are ideally downscaled from their HR versions using MATLAB's "imresize" function. We compare IASR with state-of-the-art supervised SISR methods and recently proposed unsupervised methods. All methods run on the same machine with an NVIDIA TitanX GPU with 16 GB of memory. In IASR, N ex consists of three ResBlocks with AFT layers and the upsampling block is "bicubic". The overall results are shown in Table 2. The external learning SISR methods include two deep CNNs, VDSR [43] and RCAN [44]. VDSR consists of 20 convolutional layers with 665 K parameters, and RCAN has 15,445 K parameters. Under the same scenario as the training phase, meaning the same blur kernel and the same downsampling operator, the supervised deep CNNs achieve overwhelming performance. Among the methods, ZSSR [11] is an internal learning method that tries to reconstruct a high-resolution image solely from the testing LR image (we used the official code but without the gradual configuration). Both MZSR and IASR adopt external and internal learning. MZSR [45] is first trained on a large-scale dataset and then adapts to the test image based on meta-transfer learning. MZSR(1) and MZSR(10) denote MZSR with a single gradient descent update and with 10 gradient descent updates, respectively (we used the official code but without the kernel estimation). As Table 2 reports, ZSSR, MZSR and IASR are inferior to VDSR and RCAN, while achieving better performance than bicubic interpolation. Note that IASR yields results comparable to VDSR while having only one-third of its parameters. Thus, we conclude that integrating the adaptive feature-wise transform layers can produce more diverse feature distributions, which provide more particular details for unseen images. As shown in Figures 4 and 5, IASR yields more accurate details than MZSR and ZSSR, such as straighter window frames and sharper floor gaps.

Evaluations on "Non-Ideal" Case
For the "non-ideal" case, the experiments are conducted using two downsampling methods with different blur kernels [45]. g b λ refers to an isotropic Gaussian blur kernel of width λ followed by bicubic downsampling, while g d λ refers to an isotropic Gaussian blur kernel of width λ followed by direct downsampling.
For the "direct" downsampling operator, IASR is retrained with correspondingly downsampled LR images, meaning that we trained two models for the two downsampling methods: "direct" and "bicubic". We report the results of g b 1.3 and g d 2.0 on three benchmarks in Table 3, where RCAN and IKC [46] are supervised methods based on external learning; IKC was recently proposed to estimate the blur kernel for blind SR. The performance of the external learning methods trained on the "ideal" case drops significantly when the test images do not satisfy the "ideal" assumption. Interestingly, although N ex has never seen any blurred images, IASR produces comparable results on g b 1.3 and g d 2.0 , and it outperforms both MZSR(1) and ZSSR on Set5 and Urban100. A visual comparison is presented in Figure 6. One can see that when the test condition does not match the training condition, both ZSSR and MZSR restore more details than IKC, and the result of IASR is more consistent with the ground truth. In the test phase, IASR learns the internal prior depending solely on the test image. The testing LR image is fed to IASR to obtain the super-resolved image. Figures 7 and 8 show that IASR achieves more visually pleasing results than ResNet and ZSSR, which indicates the robustness of IASR under unknown conditions. For no-reference image quality assessment, the Naturalness Image Quality Evaluator (NIQE) score [46] and BRISQUE [47] are used to measure the quality of the restored images. Smaller NIQE and BRISQUE scores indicate better perceptual quality. Table 4 reports the NIQE and BRISQUE scores of the real-image reconstruction results. IASR achieves comparable results on the old photo and Img_005_SRF images, while it fails to produce a better result than ZSSR on the eyechart image (Figure 9).

The Kernel Size and Depth of the AFT Layers
Kernel size and performance. Usually, a larger kernel size tends to improve SR performance due to better adaptation capacity. To select a reasonable kernel size, we conduct experiments on the 2× super-resolution task. From the experimental results shown in Figure 10, we observe that gradually increasing the kernel size from 1 × 1 to 5 × 5 improves the performance from 37.21 to 37.39 dB, while the number of parameters increases from 225 K to 249 K. Moreover, the performance improvement slows down as the kernel size keeps increasing: changing the kernel size from 5 × 5 to 7 × 7 makes little difference, resulting in 37.37 and 37.39 dB, respectively, when evaluated on Set5. To save computation, we set the kernel size to 3 × 3 for all AFT layers in all experiments.
Depth and performance. The network goes deeper as more residual blocks are stacked. Figure 11 demonstrates the relation between the number of ResBlocks and performance. IASR improves over the basic network N ex as the number of ResBlocks increases from one to three. The highest value is achieved with three ResBlocks. With four ResBlocks, IASR underperforms the basic network, and the performance drops steeply thereafter. We suspect that overfitting occurs when the limited internal training examples are used to train a more complicated model.

Adapting to the Different Scale Factor
Most well-trained CNN-based SR methods are restricted to a fixed scale factor, meaning that the network only works well for the same scale during testing. Given LR images with different scales, the performance of the CNN can be even worse than that of conventional bicubic interpolation. Figure 1 gives a failed example of CNN-based SR. Since the CNN 4× is trained on LR images downsampled by a factor of four, it fails to reconstruct a satisfactory HR image when fed an LR image downsampled by a factor of two. One can see that IASR creates visually more pleasing results than ZSSR, which depends entirely on internal learning. Table 5 lists the results of the basic network and IASR for differently downsampled LR images. 3 ↓→ 2× means that we train N ex on LR images downsampled by a factor of two while the testing LR image is downsampled by a factor of three. The performance drops by −7.01 and −10.73 dB when 3 ↓ and 4 ↓ LR images are fed to N ex , respectively. In contrast, IASR produces a more stable performance than N ex , which validates its adaptability to different upscaling factors.

Complexity Analysis
Memory and time complexities are two critical factors for deep networks. We evaluate several state-of-the-art models on the same PC. The results are shown in Table 6.
Memory consumption. Besides the shallow network SRCNN, all three supervised deep models require a large number of parameters. On the contrary, the unsupervised methods only require about one-third of the parameters of VDSR.
Time consumption. To reconstruct an SR image, a fully-supervised network only needs one forward pass. The SRCNN, VDSR and RCAN reconstruct an SR image within two seconds. Internal learning is time expensive because it extracts internal examples and tunes models in the test phase. The runtime depends on the number of internal examples and training stopping criteria. For internal learning, ZSSR stops when its learning rate (starting with 0.001) falls to 10 −6 while IASR fixes the number of epochs as 60. Among the unsupervised methods, MZSR with a single gradient update requires the shortest time among the comparison methods. Benefitting from the pre-trained basic model, the convergence speed of IASR is faster than ZSSR, and the average runtime of IASR per image is 34 s for a 256 × 256 image, which is only 1/4 of ZSSR. Table 6 reports the time consumption and memory consumption of the different methods.

Comparison with Other State-of-the-Art Methods
We compare IASR with other methods that adopt external and internal learning. The three relevant methods use different strategies for combining internal and external learning. Reference [28] synthesized the training data with additional SR inputs, which were produced by an internal example-driven SISR model; thus, the performance depends on the choice of the internal example-based SR inputs. To adapt the model to the testing image, Reference [19] performs fine-tuning on the pre-trained deep model, but its performance is lower than that of the other methods. In contrast, Liang et al. proposed to select the best model from a pool of pre-trained models according to the testing image and then fine-tune the selected model on internal examples [26]. To perform the model selection strategy effectively, a pool of models must be trained and stored offline, which leads to a heavy computation and storage burden. IASR achieves a trade-off between performance and parameter size. Table 7 reports the comparison results. Table 7. The average PSNR and memory consumption of different methods which adopt external and internal learning under the "bicubic" downsampling scenario on Set5 with 2×.

Methods   Parameters   PSNR
IASR      229 K        37.34
[28]      665 K        37.48
[26]      665 K        37.58
[19]      1045 K       36.78

Figure 11 shows a failed example of IASR. IASR fails to improve the visual quality and tends to blur the result of the pre-trained basic model when there are not enough repetitive pattern occurrences in the LR image. ZSSR recovers more subtle details than ResNet and IASR in this case. We conclude that our method is robust when the downsampling condition matches the training phase, but it fails to recover image details when the downsampling operator is inconsistent with the training phase.

Conclusions
In this paper, we proposed a unified framework to integrate external learning and internal learning for image SR. The proposed IASR benefits from a large training dataset via external training and implements internal learning during the test phase. We introduced adaptive feature-wise transform layers to learn the internal feature distribution using examples extracted from the testing LR image and to adapt the pre-trained network to the given image. IASR boosts the performance of a lightweight model, especially for images that have strong self-similarities and repetitive patterns. We experimentally determined appropriate hyper-parameters, such as the kernel size and the number of blocks, to overcome the overfitting issue, and also reported the limitations of IASR. In future work, we will focus on how to generalize IASR to different downsampling methods.