Single Image Super-Resolution Based on Global Dense Feature Fusion Convolutional Network

Deep neural networks (DNNs) have recently been widely adopted for single image super-resolution (SISR) with great success. As a network goes deeper, intermediate features become hierarchical. However, most DNN-based SISR methods do not make full use of these hierarchical features. Because the features cannot be read directly by subsequent layers, earlier hierarchical information has little influence on later layer outputs, and performance suffers. To address this issue, a novel global dense feature fusion convolutional network (DFFNet) is proposed, which can take full advantage of global intermediate features. In particular, a feature fusion block (FFblock) is introduced as the basic module. Each block can directly read raw global features from previous blocks and then learn the feature spatial correlation and channel correlation between features in a holistic way, leading to a continuous global information memory mechanism. Experiments on the benchmark tests show that the proposed DFFNet achieves favorable performance against state-of-the-art methods.

growth rate of 16. The same dense blocks are stacked to build a deep network, SRDenseNet [17], with dense skip connections. Tai et al. [18] proposed a deep network called MemNet, consisting of cascaded memory blocks that can densely fuse global features. Hu et al. [19] proposed a cascaded multi-scale cross network (CMSC) to fuse complementary multi-scale information. Hence, the state information of some layers can be influenced not only by adjacent information, but also by certain previous long-term information through direct connections.
As a network goes deeper, features become hierarchical, since the receptive fields of the convolution layers in the network differ. VDSR [13], DRCN [14], SRResNet [15], SRDenseNet [17], and MemNet [18] successfully improve performance by using the intermediate information of the network, which means this information can provide more clues for reconstructing the HR image. However, none of them pays enough attention to the full use of global features. Even though the gate unit in MemNet [18] was intended to control short-term and long-term memory through a 1 × 1 convolution layer, it could only learn the channel correlation between features, not the feature spatial correlation. Besides, MemNet [18] interpolates LR images to the same size as the HR images during preprocessing, so its features are not extracted directly from the original LR images.
To solve these problems, a global dense feature fusion convolutional network (DFFNet) is proposed. DFFNet can extract dense features from an original LR image to reconstruct a HR image directly, without any image scaling preprocessing. For an extremely deep network, it is not practical to extract every single layer's output feature. A feature fusion block (FFblock) is introduced as the basic module of DFFNet. FFblock consists of a global feature fusion (GFF) unit and a feature reduction and learning (FRL) unit, which can make full use of global features, learning the feature spatial correlation and channel correlation. GFF unit concatenates all the output features of preceding blocks. The global raw features of all the preceding blocks can be directly learnt by the current block at every stage in the network. Each FFblock has a direct connection to the previous ones. Hence, a structure called global dense feature fusion (GDFF) is established by the dense feature fusion blocks (DFFBs) composed of cascaded FFblocks, where GFF unit has been densely utilized. GDFF leads to a continuous global information memory mechanism, and improves the flow of global information in the network.
In summary, this work has three main contributions, including: • A deep end-to-end unified framework global dense feature fusion convolutional network (DFFNet) is proposed for single image super-resolution of different scale factors. The network can learn the dense features from the original LR image and intermediate blocks and directly reconstruct HR images without any image scaling preprocessing. • A feature fusion block (FFblock) is introduced in DFFNet, which builds a direct connection between any two blocks through global feature fusion (GFF) unit, FFblock learns the feature spatial correlation and channel correlation from the previous global features to extract higher order features. • Dense feature fusion blocks (DFFBs) consisting of cascaded FFblocks, build global dense feature fusion so that previous global raw features can be directly learnt by the current FFblock at any stage in the network, and each FFblock in the DFFBs would adaptively decide how many of these features to be reserved, leading to a continuous global information memory mechanism.

Related Work
SISR has become a hot research topic in the field of image processing due to its wide use and great application value. The key technology of SISR is how to estimate the mapping relationship between LR image and HR image. It is essential to extract image features and perform non-linear representation to achieve high resolution image restoration.
Recently, deep learning-based methods [12][13][14][15][16][17][18][19][20][21][22] have achieved superior performance over conventional methods in SISR. SRCNN [12] was the first to learn the mapping between LR and HR images end-to-end. However, problems remain, such as the lack of contextual connection and slow convergence. Making a network deeper and wider is the common way to improve on SRCNN [12]. VDSR [13] increased the depth of the network by cascading identical convolution layers, while introducing residual learning to ease the difficulty of training a deep network. DRCN [14] not only utilized skip connections, but also used recursive supervision to speed up training. Tai et al. [18] introduced a recursive unit based on the residual structure and a gate unit into the memory block to fuse intermediate information. SRDenseNet [17] enhanced the flow of global information via dense skip connections. CMSC [19] fuses complementary multi-scale information by cascading multiple multi-scale cross modules that learn features under different receptive fields.
However, most of these methods, such as SRCNN [12], VDSR [13], DRCN [14], and MemNet [18], need to interpolate the LR image to the target size, which increases the computational complexity quadratically [20]. As a result, it is hard to say those networks build an end-to-end mapping between an LR image and an HR image, since they do not extract features from the original LR image. To address this problem, a transposed convolution layer was proposed by Dong et al. [20] in the fast super-resolution convolutional neural network (FSRCNN), which is adopted in SRDenseNet [17] as well. Shi et al. [21] proposed an efficient subpixel convolutional neural network named ESPCN, directly upscaling the features into the HR image. This structure was also adopted in SRResNet [15]. ESPCN [21] and FSRCNN [20] make it possible to extract features from the original LR image and reconstruct the HR image directly.
Huang et al. [22] proposed DenseNet, which introduced a dense block that let any two layers in the block have a direct connection. The same structure is also introduced in MemNet [18] and SRDenseNet [17]. More differences between MemNet [18], DenseNet [22], SRDenseNet [17], and our DFFNet will be discussed in Section 4.
The methods mentioned above have achieved state-of-the-art performance. However, all of them ignore useful features in the middle of the network. Since global intermediate features are hierarchical in a very deep network, it would be helpful for SISR if these features could be fully used. To address this issue, a global dense feature fusion convolutional network is proposed to efficiently and adaptively learn the global intermediate features from the LR image. The network is detailed in the next section.

Basic Architecture
The architecture of our DFFNet consists of three parts: a coarse feature extraction block (CFblock), dense feature fusion blocks (DFFBs), and a reconstruction block (Recblock), as shown in Figure 1. Denote by x and y the input and output of the network. A convolution layer is utilized in CFblock to extract the coarse features from the LR image:

F_0 = f_extract(x) = W_0 ∗ x,

where f_extract denotes the coarse extraction function, W_0 is the weight of the convolution layer (bias omitted for simplicity), and F_0 is the output of CFblock. In the DFFBs, supposing there are N feature fusion blocks, the output of each FFblock can be represented as

F_n = f_FFblock,n(F_0, F_1, . . . , F_{n−2}, F_{n−1}),

where f_FFblock,n denotes the n-th FFblock function, and F_0, F_1, . . . , F_{n−2}, F_{n−1} and F_n are the inputs and output of the function, respectively. In particular, the first FFblock only receives the feed-forward features F_0 from CFblock, which is illustrated in Figure 1 as well. Before Recblock, another convolution layer (Mid_conv) is stacked after the DFFBs to further extract features F_{N+1}. F_{N+1} is then added with F_0 through a long-term skip connection, and the structure of [10] is utilized in Recblock, as shown in Figure 1. The output of DFFNet can be formulated as

y = f_rec(F_{N+1} + F_0) = f_DFFNet(x),

where f_rec denotes the Recblock function and f_DFFNet denotes the function of the basic DFFNet.
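The composition above can be sketched as follows (a structural sketch with placeholder layers, not the trained network; the spatial size, block count, and zero-valued stand-ins for the convolutions are hypothetical — the point is only the dense connectivity and the long-term skip connection):

```python
import numpy as np

def cf_block(x):
    # Stands in for F_0 = W_0 * x (a 3x3 convolution): maps the LR
    # input to G = 32 coarse feature-maps of the same spatial size.
    h, w, _ = x.shape
    return np.zeros((h, w, 32))

def ff_block(prev_features):
    # Each FFblock reads ALL preceding outputs [F_0, ..., F_{n-1}],
    # concatenates them (GFF unit), and emits G = 32 new maps (FRL unit).
    fusion = np.concatenate(prev_features, axis=-1)
    h, w, _ = fusion.shape
    return np.zeros((h, w, 32))

def dffnet_forward(x, n_blocks=4):
    f0 = cf_block(x)
    features = [f0]
    for _ in range(n_blocks):
        features.append(ff_block(features))  # dense connection to every prior block
    f_mid = features[-1]                     # stands in for the Mid_conv output F_{N+1}
    return f_mid + f0                        # long-term skip connection before Recblock

out = dffnet_forward(np.zeros((24, 24, 3)))
print(out.shape)  # (24, 24, 32)
```

Note how the list `features` is passed whole to each `ff_block`: this is what makes the feature fusion global rather than layer-local.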

Feature Fusion Block
This section presents details of the proposed feature fusion block, shown in Figure 2. FFblock contains two parts: a global feature fusion unit (GFF unit) and a feature reduction and learning unit (FRL unit). The global feature fusion unit is designed to further improve the flow of information by fusing global raw features from all the preceding FFblocks and CFblock. Global features from previous blocks are concatenated as the output of the GFF unit:

fusion_n = [F_0, F_1, F_2, . . . , F_{n−2}, F_{n−1}],

where fusion_n is the output of the GFF unit in the n-th FFblock, [F_0, F_1, F_2, . . . , F_{n−2}] denotes the global output features of CFblock and the 1st to (n − 2)-th FFblocks, and F_{n−1} is the output of the (n − 1)-th FFblock. In particular, when n = 1, fusion_1 = F_0, since the first FFblock only receives the feed-forward features F_0 from CFblock. The GFF unit output then has G_{F_n} feature-maps:

G_{F_n} = G_{F_0} + G_{F_1} + . . . + G_{F_{n−1}},

where G_{F_0} is the number of features of the CFblock output, and G_{F_{n−1}} is the number of features of the (n − 1)-th FFblock output. Each FFblock builds dense direct connections to all the subsequent ones. Therefore, by densely utilizing the GFF unit, the DFFBs build global dense feature fusion (GDFF), which leads to a continuous global information memory mechanism. The feature reduction and learning unit is introduced to make further use of the global features; unlike the gate unit in MemNet [18], two 3 × 3 convolution layers (C_1 and C_2) are utilized. The design of the FRL unit is based on the residual structure in SRResNet [15], with the batch normalization (BN) layers removed; as the experimental results in Figure 3 show, the BN layer does not help improve performance.
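To make the channel bookkeeping concrete, a small NumPy sketch of the GFF concatenation (G = 32 follows the paper; the spatial size is hypothetical):

```python
import numpy as np

G = 32       # feed-forward features per block (CFblock and every FFblock)
H = W = 24   # hypothetical spatial size

def gff_unit(prev_outputs):
    # GFF unit of the n-th FFblock: concatenate the outputs of CFblock
    # and the preceding FFblocks along the channel axis.
    return np.concatenate(prev_outputs, axis=-1)

for n in range(1, 6):
    prev = [np.zeros((H, W, G)) for _ in range(n)]  # F_0 ... F_{n-1}
    fusion = gff_unit(prev)
    # G_{F_n} = G_{F_0} + G_{F_1} + ... + G_{F_{n-1}} = n * G
    assert fusion.shape[-1] == n * G
print("channel counts check out")
```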
Compared to the 1 × 1 convolution layer in the gate unit, the 3 × 3 convolution layers learn both the feature spatial correlation and the channel correlation, then adaptively decide how much of the previous global features and feed-forward features should be reserved:

F_n = W_{n,2} ∗ σ(W_{n,1} ∗ fusion_n),

where F_n is the output of the n-th FFblock, W_{n,1} and W_{n,2} are the weight parameters of C_1 and C_2, respectively, and σ denotes the non-linear activation function ReLU [23]. In DFFNet, the number of feed-forward features G remains the same; thus,

G_{F_n} = G_{F_0} + (n − 1)G = nG.

However, as the network goes forward, the number of GFF unit output features grows linearly, so it is necessary to reduce the parameters of FFblock when the network goes extremely deep. We let the C_1 output have ⌊θ·G_{F_n}⌋ features, where θ is the compression factor. When θ = 1, C_1 has the same number of features as the GFF unit output. In our basic DFFNet, θ is set to 0.25.
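A quick sketch of the effect of the compression factor (values follow the paper's G = 32 and θ = 0.25; the brackets are read as a floor):

```python
from math import floor

G, theta = 32, 0.25

def c1_out_channels(n):
    # n-th FFblock: the GFF output has n*G channels; C_1 compresses by theta.
    return floor(theta * n * G)

for n in (1, 8, 16, 32):
    print(n, n * G, c1_out_channels(n))
# e.g. the 32nd FFblock fuses 32*32 = 1024 channels, but C_1 emits only
# floor(0.25 * 1024) = 256, after which C_2 reduces back to G = 32.
```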

Reconstruction Block
As shown in Figure 1, the first 3 × 3 convolution layer (Re_conv_1) in Recblock is utilized to extract dense features. If the scale factor is r (e.g., ×2 or ×3), the output of Re_conv_1 has 2G × r² feature-maps. A subpixel convolution layer [21] (Re_sub_pixel_1) is stacked after Re_conv_1. Re_sub_pixel_1 is a periodic shuffling operator that rearranges the elements of an H × W × C·r² tensor into a tensor of shape rH × rW × C, which is illustrated in Figure 1. The output of Re_sub_pixel_1 thus has 2G feature-maps of size rH_LR × rW_LR, where H_LR and W_LR are the height and width of the input. For the large scale factor ×4, another convolution layer (Re_conv_1_1) and subpixel convolution layer (Re_sub_pixel_2) are stacked after Re_sub_pixel_1. The last 3 × 3 convolution layer (Re_conv_2) outputs three feature-maps, forming the reconstructed RGB image.
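The periodic shuffling can be sketched in NumPy (a hypothetical re-implementation of the rearrangement described above, not the layer used in the paper):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, C*r^2) tensor into (r*H, r*W, C)."""
    h, w, cr2 = x.shape
    c = cr2 // (r * r)
    x = x.reshape(h, w, r, r, c)     # split the channel axis into an r x r grid
    x = x.transpose(0, 2, 1, 3, 4)   # interleave rows/cols: (h, r, w, r, c)
    return x.reshape(h * r, w * r, c)

lr_features = np.arange(2 * 2 * 4).reshape(2, 2, 4)  # H = W = 2, C = 1, r = 2
hr = pixel_shuffle(lr_features, r=2)
print(hr.shape)  # (4, 4, 1)
```

Each group of r² channels at one LR position is scattered into an r × r patch of the HR output, which is why Re_conv_1 must emit 2G × r² maps for Re_sub_pixel_1 to produce 2G maps.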

Implementation Details
In our basic DFFNet, we set the kernel size of all convolution layers to 3 × 3, keep the number of feed-forward features fixed at G = 32, and apply zero padding to the input of all layers to keep the feature sizes fixed. The number of FFblocks in the DFFBs is set to N = 32. Finally, DFFNet outputs three-channel color images and can process grayscale images as well. For a detailed presentation of DFFNet, please see Appendix A.
Given a training dataset {img_lr^(i), img_hr^(i)}, i = 1, . . . , M, where M denotes the number of image patches, img_hr^(i) denotes the HR image, and img_lr^(i) denotes the LR image, the L1 loss is used as the training loss function:

L(Θ) = (1/M) Σ_{i=1}^{M} || f_DFFNet(img_lr^(i)) − img_hr^(i) ||_1,

where Θ denotes the network parameters. Although most methods use the L2 loss, the L1 loss has been demonstrated to be more powerful for performance and convergence [24].
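For reference, the batch-mean form of the L1 loss, sketched in NumPy:

```python
import numpy as np

def l1_loss(sr_batch, hr_batch):
    # Mean absolute difference over all pixels of all patches in the batch.
    return np.mean(np.abs(sr_batch - hr_batch))

sr = np.array([[0.5, 0.0], [1.0, 1.0]])
hr = np.array([[1.0, 0.0], [1.0, 0.0]])
print(l1_loss(sr, hr))  # 0.375
```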

Discussions
Difference to DenseNet. DenseNet [22] builds its architecture on dense connections between any two layers within a dense block [22]. However, this densely connected structure is utilized only in a local way: since the size of the features differs between dense blocks, it is impossible for a dense block to read raw features from the preceding ones. Moreover, batch normalization (BN) layers, which increase computational complexity and do not help improve performance, are removed in our DFFNet. To keep the feature sizes fixed in the network, no pooling layer is used in DFFNet. Furthermore, the feature fusion block is utilized to read global features directly from all the preceding blocks and learn to extract higher-order features, leading to a continuous memory mechanism that DenseNet [22] cannot achieve.
Difference to SRDenseNet. The dense block in SRDenseNet [17] has the same architecture as the one in DenseNet [22]. SRDenseNet [17] introduces the dense block to SISR and enhances it with dense skip connections. Although the dense block can read features from the convolution layers within the block while building local residual learning with local skip connections, it is unable to directly read global raw features from the preceding blocks in a global way, like our FFblock does. Global dense feature fusion is introduced in DFFNet, and each FFblock can learn from all the global raw features of preceding blocks and then adaptively decide how much of the current and prior information should be reserved. By making full use of global raw features, DFFNet achieves better performance than SRDenseNet [17].
Difference to MemNet. The differences between MemNet [18] and DFFNet can be summarized in two points. First, the gate unit in the memory block fuses the global features with a 1 × 1 convolution layer; thus, the memory block can only learn the channel correlation between features. Two 3 × 3 convolution layers are utilized in the feature fusion block (FFblock) instead. FFblock can learn not only the channel correlation between features, but also the feature spatial correlation; as a result, our FFblock makes further use of the intermediate features in a more global way than the memory block. Second, MemNet [18] does not directly extract features from the LR image; it has to resize the LR image with interpolation preprocessing to the target HR size, while our DFFNet extracts features from the original LR image and utilizes Recblock to reconstruct the HR image directly with dense features.

Datasets and Metrics
The public high-quality dataset DIV2K [25], with 2K resolution, released by Timofte et al. [25], is used for model training. DIV2K [25] includes 800 training images, 100 validation images, and 100 test images, containing various types of images, such as landscapes, people, animals, insects, plants, buildings, and complex textures. The LR images used for training are obtained from the 800 training images by bicubic downsampling with different scale factors (×2, ×3, and ×4), using the MATLAB function imresize with the bicubic option. The standard benchmark datasets Set5 [26], Set14 [27], B100 [28], and Urban100 [29] are used for testing. For comparison, the SISR results with the three scale factors are evaluated with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) index [30] on the luminance channel (Y channel) of the transformed YCbCr color space, and the same number of pixels as the scale factor (×2, ×3, and ×4) is cropped from the border of the SISR results.
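A sketch of this evaluation protocol (PSNR on the Y channel with a border crop equal to the scale factor; the RGB-to-Y coefficients are the standard ITU-R BT.601 ones and are an assumption, since the paper only names the YCbCr space):

```python
import numpy as np

def rgb_to_y(img):
    # Luminance (Y) channel of the YCbCr transform, img in [0, 255].
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.738 * r + 129.057 * g + 25.064 * b) / 256.0

def psnr_y(sr, hr, scale):
    # Crop 'scale' pixels from each border, then compute PSNR on Y.
    y_sr = rgb_to_y(sr)[scale:-scale, scale:-scale]
    y_hr = rgb_to_y(hr)[scale:-scale, scale:-scale]
    mse = np.mean((y_sr - y_hr) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)

a = np.random.rand(32, 32, 3) * 255
print(psnr_y(a, a, scale=2))  # inf for identical images
```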

Training Details
For training, in each training batch we use 16 RGB image patches of size 48 × 48, randomly cropped from the LR images, together with the corresponding HR patches, for all models with different scale factors (×2, ×3, and ×4). Patches are augmented during training with random horizontal flips, vertical flips, and 90-degree rotations, each applied with probability 0.5. As preprocessing, we normalize the image patch values and subtract the mean RGB value of the DIV2K [25] dataset. We implement our DFFNet with the TensorFlow framework and train the model with the ADAM optimizer [31], setting β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸. The training loss function is the L1 loss. The learning rate is initialized to 0.0001 for all layers and halved every 200 epochs, where an epoch consists of 1000 updates. The models with different scale factors are trained individually. It takes about 1 day on a GTX 1080 Ti GPU to train a basic DFFNet for 300 epochs.
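The augmentation described above can be sketched as follows (a hypothetical helper; the key point is that the LR and HR patches must receive the identical transform):

```python
import numpy as np

def augment(lr_patch, hr_patch, rng):
    # Apply identical random flips / rotation to the LR-HR patch pair.
    if rng.random() < 0.5:                       # horizontal flip
        lr_patch, hr_patch = lr_patch[:, ::-1], hr_patch[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        lr_patch, hr_patch = lr_patch[::-1], hr_patch[::-1]
    if rng.random() < 0.5:                       # 90-degree rotation
        lr_patch = np.rot90(lr_patch)
        hr_patch = np.rot90(hr_patch)
    return lr_patch, hr_patch

rng = np.random.default_rng(0)
lr, hr = np.zeros((48, 48, 3)), np.zeros((96, 96, 3))
lr2, hr2 = augment(lr, hr, rng)
print(lr2.shape, hr2.shape)  # (48, 48, 3) (96, 96, 3)
```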
We first train our DFFNet model with scale factor ×2 (denoted as the ×2 model), as described in Section 3.4, from scratch. After the ×2 model converges, we use it as a pre-trained network for the model with scale factor ×3 (denoted as the ×3 model): we use the ×2 model parameters to initialize all parameters of the ×3 model except those in Recblock, and then fine-tune the ×3 model with a learning rate of 0.00005 for about 50 epochs. The converged ×3 model is later used as a pre-trained network for the model with scale factor ×4 (denoted as the ×4 model), with the same training settings as for the ×3 model.
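The step schedule used for from-scratch training ("halved at every 200 epochs") amounts to a simple step function:

```python
def learning_rate(epoch, base_lr=1e-4, step=200):
    # Halve the base learning rate after every 'step' epochs.
    return base_lr * (0.5 ** (epoch // step))

print(learning_rate(0))    # 0.0001
print(learning_rate(200))  # 5e-05
```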

Ablation Study

Table 1 presents the ablation study on the effects of global dense feature fusion (GDFF) and long-term skip connection (LTSC). The four networks in Table 1 have the same numbers of FFblocks and feed-forward features as the standard model. The baseline model (denoted as M_base) is obtained by removing GDFF and LTSC from the standard DFFNet, leaving a plain structure. The performance of M_base (PSNR = 28.87 dB) is poor, even worse than Bicubic (PSNR = 33.66 dB). This is caused by the difficulty of training [1], and demonstrates that simply stacking many basic convolution layers does not result in better performance. We then add LTSC and GDFF to M_base, resulting in M_LTSC and M_GDFF. The results show that each structure can efficiently improve the performance of M_base, mainly because each enhances the flow of information and gradient. A combination of the two structures performs better than either in isolation: when both are used simultaneously (denoted as M_GDFF_LTSC), DFFNet with LTSC and GDFF clearly performs the best.

Table 1. Ablation study on the effects of global dense feature fusion (GDFF) and long-term skip connection (LTSC). We report the best performance (average PSNR) on Set5 with scale factor ×2 within 200 epochs.

Model        GDFF  LTSC  PSNR (dB)
M_base       ✗     ✗     28.87
M_LTSC       ✗     ✓     35.00
M_GDFF       ✓     ✗     37.94
M_GDFF_LTSC  ✓     ✓     38.08

The visualization of the convergence process is presented in Figure 4. The curves verify the analyses above and show that LTSC and GDFF both stabilize the training process while accelerating model convergence. GDFF can further improve the performance. From the red curve of M_GDFF_LTSC, we can see that LTSC effectively reduces performance drops while improving performance when combined with GDFF. These visual and quantitative analyses demonstrate that DFFNet benefits greatly from LTSC and GDFF.

Benchmark Results
We compare our DFFNet with other methods on the benchmark test sets, including Bicubic, SRCNN [12], VDSR [13], DRCN [14], SRResNet [15], LapSRN [32], CMSC [19], SRDenseNet [17], and MemNet [18]. We present quantitative results for ×2, ×3, and ×4 in Table 2. Compared with persistent-memory models such as MemNet [18] and SRDenseNet [17], our DFFNet performs best on all benchmarks at all scale factors. When the scale factor becomes larger (e.g., ×3, ×4), it is harder for all models to reconstruct HR images from LR images with much lower resolution, because more details need to be reconstructed. Nonetheless, our DFFNet still outperforms the others. Specifically, most images in Urban100 contain self-similar textures; although the dense skip connections in SRDenseNet [17] and the gate unit in MemNet [18] can fuse global information to restore similar structures in images, our DFFNet gives a PSNR/SSIM of 26.20 dB/0.7893 on Urban100 with scale factor ×4, which is 0.7 dB/0.0263 and 0.15 dB/0.0074 better than MemNet [18] and SRDenseNet [17], respectively. This demonstrates that our feature fusion block (FFblock) is more effective than the memory block in MemNet [18] and the dense block in SRDenseNet [17], and further illustrates that fusing global intermediate features via global dense feature fusion (GDFF) provides more clues to reconstruct the HR image from the degraded one. Compared with the other methods, our DFFNet also achieves the best average results on all datasets.
We also compare model complexity with other methods in Table 3. DFFNet has many more parameters than the compared methods, which occurs because the network goes deep and, thus, many features need to be fused. Despite this drawback, our DFFNet is still three times faster than MemNet [18] with better performance, since DFFNet does not need any image scaling preprocessing. Visual comparisons at scale factor ×4 are shown in Figures 5-8. For images 86000.bmp and 102061.bmp, it can be observed that most compared methods, such as VDSR and DRCN, produce visible artifacts and blurred textures and edges, and even fail to recover some small textures. By contrast, our DFFNet can reconstruct clearer textures and sharper edges with fewer artifacts, closer to the original image. For the line in img044.bmp in Figure 7, pointed out by the red arrow, all the other methods fail to recover it, while our DFFNet recovers it with an obviously sharper edge. This is mainly because our DFFNet takes full advantage of global intermediate information with global dense feature fusion.

Conclusions
In this paper, we proposed a global dense feature fusion convolutional network (DFFNet) for SISR, where a feature fusion block (FFblock) is introduced as the basic module. Each FFblock can read raw features directly from all the preceding blocks in DFFNet, learning and adaptively controlling the reserve of previous global features. The global dense feature fusion (GDFF) in dense feature fusion blocks further builds the dense connections between FFblocks and coarse feature extraction block (CFblock) while stabilizing the training process and improving the flow of global information and gradient, leading to a continuous global information memory mechanism. Moreover, our DFFNet extracts features from the original LR images and reconstructs HR images with dense features directly, without any image scaling preprocessing. By fully utilizing the global features, our DFFNet leads to a deep and wide network. Quantitative and visual benchmark evaluation results demonstrate well that our DFFNet achieves superior performance over state-of-the-art methods.
Author Contributions: All authors have read and approved the manuscript. W.X. proposed the research idea of this paper, and was responsible for the experiment, data analysis and result interpretation. R.C. was responsible for the verification of the research scheme. B.H. collected experimental data and conducted literature search, and verified the experimental results. X.Z. and C.L. was responsible for the visualization of experimental data and chart making. The paper was mainly written by W.X., with the participation of X.Z. and C.L. Revision and review of the manuscript was completed by R.C.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this section, we present the architecture of DFFNet in more detail. The configuration of DFFNet is described in Section 3.4: the number of FFblocks N = 32, the number of feed-forward features G = 32, and the compression factor θ = 0.25. Given a preprocessed RGB image of shape 48 × 48 × 3 as input, the dimensions and parameters of each block in DFFNet are summarized in Table A1. For the n-th FFblock, the outputs of the GFF unit and C_1 have 32n and 8n feature-maps, respectively, as shown in Table A1. Before Recblock, all outputs have the same size as the input, since zero padding is applied to the input of all layers to keep the feature sizes fixed. We list the specifications of Recblock for different scale factors separately in Table 3, since some parameters of Recblock depend on the scale factor.
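As a cross-check of the feature-map counts above, a small helper (hypothetical; it counts only the 3 × 3 convolution weights of C_1 and C_2, ignoring biases):

```python
def ffblock_params(n, G=32, theta=0.25):
    # n-th FFblock: C_1 maps n*G -> floor(theta*n*G) channels, C_2 maps back to G.
    gff_out = n * G                # GFF unit output: 32n feature-maps
    c1_out = int(theta * gff_out)  # C_1 output: 8n feature-maps
    c1 = 3 * 3 * gff_out * c1_out  # 3x3 conv weights of C_1
    c2 = 3 * 3 * c1_out * G        # 3x3 conv weights of C_2
    return c1, c2

for n in (1, 32):
    print(n, ffblock_params(n))
# The C_1 weight count grows quadratically in n (both its input and
# output widths scale with n), which is why the compression factor matters.
```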