Article

Attention Network with Information Distillation for Super-Resolution

Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(9), 1226; https://doi.org/10.3390/e24091226
Submission received: 6 June 2022 / Revised: 19 August 2022 / Accepted: 26 August 2022 / Published: 1 September 2022

Abstract

Resolution is an intuitive assessment of the visual quality of images, and it is limited by physical devices. Recently, image super-resolution (SR) models based on deep convolutional neural networks (CNNs) have made significant progress. However, most existing SR models require high computational costs that grow with network depth, hindering practical application. In addition, these models treat intermediate features equally and rarely explore the discriminative capacity hidden in their abundant features. To tackle these issues, we propose an attention network with information distillation (AIDN) for efficient and accurate image super-resolution, which adaptively modulates the feature responses by modeling interactions along both the channel and spatial dimensions. Specifically, gated channel transformation (GCT) is introduced to gather global contextual information among different channels to modulate intermediate high-level features. Moreover, a recalibrated attention module (RAM) is proposed to rescale these feature responses, concentrating on the essential content around spatial locations. Benefiting from the gated channel transformation and spatial information masks working jointly, our proposed AIDN gains a more powerful ability to identify informative features, improving computational efficiency while enhancing reconstruction accuracy. Comprehensive quantitative and qualitative evaluations demonstrate that our AIDN outperforms state-of-the-art models in terms of reconstruction performance and visual quality.

1. Introduction

The resolution of an image is restricted by the imaging sensor, which limits its use in many applications. Single-image super-resolution (SISR) is a typical low-level problem in computer vision, which aims to restore an accurate high-resolution (HR) image from a degraded low-resolution (LR) observation. It has been widely used in various important fields involving the development of multimedia technology [1], such as remote-sensing imaging, live video [2], and monitoring devices. However, image super-resolution remains a challenging topic because any LR image may correspond to multiple plausible HR images. To tackle this difficulty, plenty of approaches based on deep convolutional neural networks (CNNs) have been proposed to establish LR–HR image mappings, which have achieved excellent performance [3,4].
SRCNN [5] was the pioneering work in deep learning for image super-resolution reconstruction, directly modeling an end-to-end mapping with only a three-layer convolutional network, which achieved better results than traditional algorithms. Subsequently, deep CNN-based SR models have become the mainstream. Kim et al. presented a very deep convolutional network (VDSR) [6] and DRCN [7], pushing the model depth to 20 layers by equipping the residual structure [8], which led to a remarkable performance gain (e.g., VDSR obtained a PSNR of 37.53 vs. SRCNN’s PSNR of 36.66 on Set5 ×2; PSNR is defined in Section 4.1). These methods take the interpolated LR image as the input to the network, undoubtedly increasing the computational burden and time overhead. FSRCNN [9], using transposed convolution, and ESPCN [10], adopting sub-pixel convolution, have been proposed to accelerate inference and reduce the computational burden by changing the position at which the low-resolution (LR) input is up-scaled. Thanks to effective sub-pixel convolution, Lim et al. [11] explored a wide and deep EDSR network without batch normalization modules, dramatically improving SR performance (e.g., EDSR PSNR = 38.11 vs. VDSR PSNR = 37.53 on Set5 ×2). Since then, researchers have attempted to design more complex networks to enhance network accuracy.
To obtain more abundant information, hierarchical features and multi-scale features can be used. Wang et al. [12] introduced an adaptive weighted multi-scale (AWMS) residual module to realize a lightweight network. SRDenseNet [13], based on DenseNet [14], used the concatenated features of all layers to enhance feature propagation and maintain continuous feature transmission. Furthermore, Song et al. [15] leveraged NAS [16] to find an efficient structure based on a residual dense module for accurate super-resolution. However, most SR models do not distinguish these intermediate features and lack flexibility in processing different information types, thus preventing better performance. RCAN [17] developed a channel attention module to model the channel interdependencies, in order to obtain discriminative information, and achieved a PSNR of 38.27 on Set5 ×2; however, it has more than 16 M parameters, which is not conducive to deployment on resource-limited devices. Later, Hui et al. [18] constructed an information multi-distillation structure with a splitting operation, greatly reducing the number of channels. Lan et al. [19] introduced channel attention into a residual multi-scale module to enhance the feature representation capability (MADNet), achieving a PSNR of 37.85 on Set5 ×2 with 878 K parameters.
Motivated by the above, we propose an attention network with an information distillation structure (AIDN) for efficient SISR, built from several stacked attention information distillation blocks (AIDBs). Inspired by IDN, we carefully develop the AIDB to progressively learn more intermediate feature representations, mainly employing multiple splitting operations combined with gated channel transformation (GCT). Specifically, the splitting strategy divides the previously extracted features into two parts, where one is retained while the other is further processed by GCT. The normalization method and attention mechanism are combined to gain precise contextual information. GCT can learn the importance of different channels adaptively and takes the weighted feature maps as the input to the next layer. Meanwhile, GCT encourages cooperation at shallow layers and competition at deeper layers. Moreover, all distilled features are aggregated through the recalibrated attention module (RAM), which further refines these high-frequency features and revises the importance of features in the channel dimension. In general, the main contributions of our work can be summarized as follows:
  • We propose an attention network with an information distillation structure (AIDN) for efficient and accurate image super-resolution, which extracts the valuable intermediate features step by step using the distillation structures;
  • We introduce gated channel transformation (GCT) into SISR and use it in one distillation branch;
  • We propose a recalibrated attention module (RAM) to re-highlight the contributions of features and strengthen the expressive ability of the network. Comprehensive experimental results demonstrate that the proposed method strikes a good balance between performance and model size.

2. Related Work

2.1. Deep CNN-Based Super-Resolution Methods

In recent years, methods based on deep convolutional neural networks (CNNs) have been successfully applied to various tasks, showing excellent performance.
Dong et al. [5] first explored the use of three convolutional layers for single-image super-resolution (SISR), and obtained better reconstruction results than traditional methods. Subsequently, with the successful application of the residual network architecture [8] in computer vision tasks, more and more residual-learning variants have been used to reconstruct SR images, including LapSRN [20], WMRN [21], CFSRCNN [22], and RFANet [23]. Dense connections have also been introduced for image super-resolution to exploit the information flow of hierarchical features. RDN [24] combined the residual structure with dense connections to form a residual dense network with a contiguous memory mechanism. Zhang et al. [25] developed GLADSR through the global–local adjustment of dense connections to increase the network capacity.
Although these methods have achieved good performance, their parameters increase dramatically with network depth, making them unsuitable for mobile platforms. DRCN [7] leveraged recursive learning to decrease the number of network parameters. CARN [26] developed a cascading architecture in the residual structure, forming a lightweight model suitable for practical applications. CBPN [27] struck a good balance between efficiency and performance by learning mixed residual features. Song et al. [28] devised AdderSR to address the limitations of adder neural networks for super-resolution, providing a good visual effect with lower energy consumption without changing the original structures. More recently, some NAS-based SR models have been proposed to automatically search for optimal architectures. Chu et al. [29] presented an automatic search algorithm, FALSR, based on NAS, to achieve a fast and lightweight SR model. DRSDN [30] explored diverse plug-and-play network architectures for efficient single-image super-resolution.

2.2. Attention Mechanism

The attention mechanism is a data processing method in machine learning that is used to improve the performance of convolutional neural networks (CNNs) in computer vision tasks. It enables a network to automatically focus on the most relevant areas by learning masks (new weights). SENet [31] can be regarded as a pioneering channel attention model, which improved the representational capability of the network by modeling the relationships between channels. Wang et al. [32] presented a non-local block to calculate the response at a location using the information of all positions. CBAM [33] connected channel attention and spatial attention in series to obtain a 3D attention map, forming a lightweight, universal module. GCT [34] combined a normalization module with an attention mechanism, using lightweight variables to learn the interrelationships between channel-wise information. ECA-Net [35] developed a local cross-channel interaction scheme without dimension reduction, which proved to be an efficient and lightweight channel attention structure.
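To make the mask idea above concrete, the following is a minimal PyTorch sketch of SENet-style [31] channel attention. It only illustrates the general mechanism; the class name and reduction ratio are our own choices, and it is not the attention module used in AIDN.

```python
import torch
import torch.nn as nn

class SEChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention in the spirit of SENet [31]:
    global average pooling, a bottleneck MLP, and a sigmoid mask that re-weights
    each channel. Illustration only; not the attention used in AIDN."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # per-channel mask in [0, 1]
        return x * weights
```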
In addition, attention-based works have been proposed to further improve super-resolution performance. Liu et al. [23] introduced enhanced spatial attention (ESA) into the residual-in-residual (RIR) structure to build a residual feature aggregation block, thus forming a lightweight and effective model. Dai et al. [36] designed a second-order attention network (SAN), which employed second-order feature statistics to learn more discriminative feature expressions. DRLN [37] developed a novel Laplacian attention with dense connections on a cascaded residual structure to study inter- and intra-layer dependencies, achieving deep supervision. Hu et al. [38] explored channel-wise and spatial attention residual (CSAR) blocks to modulate hierarchical features in both global and local manners, achieving prominent performance. CSNLN [39] proposed cross-scale non-local attention, which thoroughly explores priors by computing non-local feature-wise similarities between patches across scales.

3. Proposed Method

3.1. Network Architecture

In this section, we introduce the entire framework of our proposed attention network with information distillation (AIDN), as shown in Figure 1. Our AIDN architecture comprises three parts: a low-level feature extraction module (LFE), stacked attention information distillation blocks (AIDBs), and an image reconstruction module. Here, I_{LR} represents the original low-resolution (LR) input image, while I_{SR} denotes its super-resolved (SR) output image. Specifically, a convolutional layer is first leveraged to extract the shallow features from the given LR input. This procedure can be expressed as
X_0 = F_{LFE}(I_{LR})
where F_{LFE}(·) denotes a convolutional layer with a kernel size of 3 × 3, and X_0 is the extracted shallow features. Then, X_0 is sent to the next part, a chain of multiple attention information distillation blocks (AIDBs) that gradually refines the hierarchical features. This process can be denoted as
X_n = F_{AIDB}^{n}(X_{n-1}) = F_{AIDB}^{n}(F_{AIDB}^{n-1}(\cdots F_{AIDB}^{0}(X_0) \cdots))
where F_{AIDB}^{n} indicates the n-th AIDB function, and X_{n-1} and X_n denote the input and output feature maps of the n-th AIDB, respectively.
Then, the deep features generated by this sequence of AIDBs are concatenated together through global feature fusion. After fusing, the deep features are processed by two convolution layers before being passed to the reconstruction module, which can be formulated as
X_{aggregate} = F_{aggregate}(Concat(X_1, \ldots, X_n))
where Concat represents the concatenation operation, and F_{aggregate} denotes a composite function of a 1 × 1 convolution layer followed by a 3 × 3 convolution layer.
In addition, the deep-aggregated feature X_{aggregate} is added to the shallow feature X_0 through global residual learning. Finally, the super-resolved output image is produced through the reconstruction function, as follows
I_{SR} = F_{rec}(X_{aggregate} + X_0)
where F_{rec}(·) represents the reconstruction module function and I_{SR} is the output super-resolution image of the network. The reconstruction module consists of a 3 × 3 convolutional layer and a pixel-shuffle layer.
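As a concrete illustration of this data flow, the following is a minimal PyTorch sketch of the pipeline described above, assuming the block and channel counts of Section 4.2.2. Here, `aidb_factory` stands in for the attention information distillation block of Section 3.2, and the exact layer hyperparameters (padding, output channels of the reconstruction head) are our assumptions rather than the authors' released configuration.

```python
import torch
import torch.nn as nn

class AIDN(nn.Module):
    """Sketch of the overall AIDN pipeline: shallow feature extraction (F_LFE),
    a chain of AIDBs, global feature fusion (F_aggregate), global residual
    learning, and a pixel-shuffle reconstruction head (F_rec)."""
    def __init__(self, aidb_factory, n_blocks=6, n_feats=64, n_colors=3, scale=2):
        super().__init__()
        self.lfe = nn.Conv2d(n_colors, n_feats, 3, padding=1)              # F_LFE: 3x3 conv
        self.blocks = nn.ModuleList([aidb_factory(n_feats) for _ in range(n_blocks)])
        self.fuse = nn.Sequential(                                         # F_aggregate: 1x1 then 3x3 conv
            nn.Conv2d(n_feats * n_blocks, n_feats, 1),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )
        self.rec = nn.Sequential(                                          # F_rec: 3x3 conv + pixel shuffle
            nn.Conv2d(n_feats, n_colors * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, i_lr):
        x0 = self.lfe(i_lr)
        feats, x = [], x0
        for block in self.blocks:                    # X_n = F_AIDB^n(X_{n-1})
            x = block(x)
            feats.append(x)
        x_agg = self.fuse(torch.cat(feats, dim=1))   # global feature fusion
        return self.rec(x_agg + x0)                  # global residual + reconstruction
```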
Different loss functions have been introduced to optimize SR networks. For a fair comparison with the most advanced methods, our model is optimized using the L_1 loss function, as in previous works [18,21]. Given a training set {I_{LR}^{i}, I_{HR}^{i}}_{i=1}^{N}, where N denotes the number of LR–HR image patch pairs, the loss function of our AIDN can be represented as
L(\Theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{AIDN}(I_{LR}^{i}) - I_{HR}^{i} \right\|_1
where Θ indicates the learnable parameters of our AIDN model and H_{AIDN}(·) denotes the function of our model. Our goal is to minimize the L_1 loss between the reconstructed image I_{SR} and the corresponding ground-truth high-resolution (HR) image I_{HR}.
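A hypothetical training step for this objective might look as follows, assuming `model` is the AIDN sketched above and `optimizer` is an Adam optimizer configured as in Section 4.2.1.

```python
import torch.nn as nn

l1_loss = nn.L1Loss()  # mean absolute error, matching the L1 objective above

def train_step(model, optimizer, lr_patches, hr_patches):
    """One optimization step over a mini-batch of LR/HR patch pairs."""
    optimizer.zero_grad()
    loss = l1_loss(model(lr_patches), hr_patches)  # (1/N) * sum ||H_AIDN(I_LR) - I_HR||_1
    loss.backward()
    optimizer.step()
    return loss.item()
```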

3.2. Attention Information Distillation Block

This section mainly introduces the key parts of the proposed AIDB. As shown in Figure 2, the proposed attention information distillation block (AIDB) mainly contains the feature refinement module (FRM) and the recalibrated attention module (RAM). Specifically, the FRM gradually extracts multi-layer features by employing information diffluence to obtain a discriminative learning ability, and the retained features are then aggregated according to their contributions. Moreover, the RAM re-highlights the informativeness of the features and enhances the expression capability of the network.

3.3. Feature Refinement Module

The feature refinement module (FRM) exploits the distillation structure and attention mechanism to separate and process features by connection or convolution. Specifically, a 3 × 3 convolution layer is first exploited to extract input features for the multiple succeeding distillation steps in the FRM. In each step, a channel split operation is performed on the previous features, producing two parts: one part is retained, while the other is used as the input to the gated channel transformation (GCT) module [34] for further refinement. Assuming the input features are denoted by X_{in}, this procedure can be formulated as
X_{retain}^{1}, X_{coarse}^{1} = Split_1(F_{conv}^{1}(X_{in}))
X_{retain}^{2}, X_{coarse}^{2} = Split_2(F_{conv}^{2}(F_{GCT}^{1}(X_{coarse}^{1})))
X_{retain}^{3}, X_{coarse}^{3} = Split_3(F_{conv}^{3}(F_{GCT}^{2}(X_{coarse}^{2})))
X_{retain}^{4} = F_{conv}^{4}(X_{coarse}^{3})
where F_{conv}^{i} indicates the i-th 3 × 3 convolution operation followed by the Leaky ReLU (LReLU) activation function, F_{GCT}^{i} denotes the gated channel transformation operation (detailed in the following section), Split_j represents the j-th channel split operation, X_{retain}^{i} denotes the i-th retained features, and X_{coarse}^{j} represents the j-th coarse features, which are further fed to the subsequent layers. Afterward, all the features retained in each step are concatenated along the channel dimension, which can be denoted as
X_{FRM} = Concat(X_{retain}^{1}, X_{retain}^{2}, X_{retain}^{3}, X_{retain}^{4})
where Concat indicates the concatenation operation and X_{FRM} denotes the output of the feature refinement module (FRM).
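A minimal PyTorch sketch of the FRM is given below, assuming the 64/16 channel configuration of Section 4.2.2 and that each 3 × 3 convolution restores the full channel width before the next split. `GCT` refers to the gated channel transformation sketched in the next subsection, and the LReLU negative slope is our assumption.

```python
import torch
import torch.nn as nn

class FRM(nn.Module):
    """Sketch of the feature refinement module: three split-and-refine steps with
    GCT on the coarse branch, followed by channel-wise concatenation of the
    retained features (X_FRM)."""
    def __init__(self, n_feats=64, n_retain=16):
        super().__init__()
        self.n_retain = n_retain
        n_coarse = n_feats - n_retain
        act = lambda: nn.LeakyReLU(0.05, inplace=True)   # assumed negative slope
        self.conv1 = nn.Sequential(nn.Conv2d(n_feats, n_feats, 3, padding=1), act())
        self.gct1 = GCT(n_coarse)
        self.conv2 = nn.Sequential(nn.Conv2d(n_coarse, n_feats, 3, padding=1), act())
        self.gct2 = GCT(n_coarse)
        self.conv3 = nn.Sequential(nn.Conv2d(n_coarse, n_feats, 3, padding=1), act())
        self.conv4 = nn.Sequential(nn.Conv2d(n_coarse, n_retain, 3, padding=1), act())

    def split(self, x):
        # channel split into (retained, coarse) parts
        return torch.split(x, [self.n_retain, x.shape[1] - self.n_retain], dim=1)

    def forward(self, x_in):
        r1, c1 = self.split(self.conv1(x_in))
        r2, c2 = self.split(self.conv2(self.gct1(c1)))
        r3, c3 = self.split(self.conv3(self.gct2(c2)))
        r4 = self.conv4(c3)
        return torch.cat([r1, r2, r3, r4], dim=1)    # X_FRM: 4 * n_retain channels
```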

3.4. Gated Channel Transformation

Gated channel transformation (GCT) [34] is an attention mechanism. Moreover, GCT is a simple and effective channel-relationship-modeling architecture, combining a normalization module with a gating mechanism. As shown in Figure 3, the overall structure of the GCT module consists of three parts: global context embedding, channel normalization, and a gating mechanism. First, we employ the L_2-norm to capture global contextual information from the input feature. Given the input feature X = {x_1, x_2, …, x_C}, X ∈ ℝ^{C × H × W}, this can be written mathematically as [34]
s_c = \alpha_c \| x_c \|_2 = \alpha_c \left\{ \left[ \sum_{i=1}^{H} \sum_{j=1}^{W} (x_c^{i,j})^2 \right] + \epsilon \right\}^{\frac{1}{2}}
where s = {s_1, s_2, …, s_C}, s ∈ ℝ^{C × 1 × 1} is the gathered global-context-embedding information along the channel dimension, ϵ represents a very small constant that avoids the derivation problem at zero, and α_c denotes a trainable parameter, namely the embedding weight. Furthermore, α_c controls the weight of each channel. In particular, when α_c approaches 0, the channel does not participate in the subsequent normalization module. Accordingly, this enables the network to recognize when one channel is independent of the others. Then, we adopt a normalization operation to reduce the number of parameters and improve computational efficiency. Furthermore, normalization approaches [40] have been shown to establish competitive relations between different neurons (or channels) in neural networks, which stabilizes the training process. This amplifies channels with larger responses and restrains channels with weaker feedback. The channel normalization function can be expressed as
\hat{s}_c = \frac{\sqrt{C} s_c}{\| s \|_2} = \frac{\sqrt{C} s_c}{\left[ \left( \sum_{c=1}^{C} s_c^2 \right) + \epsilon \right]^{\frac{1}{2}}}
where C is the number of channels. Finally, the gating mechanism is introduced to control the activation of each channel. The gating function is defined as follows
\hat{x}_c = x_c \left[ 1 + \tanh(\gamma_c \hat{s}_c + \beta_c) \right]
where γ = [γ_1, …, γ_C] denotes the gating weights, β = [β_1, …, β_C] represents the gating biases, and x_c and x̂_c are the input and output features of the gating mechanism, respectively. The weights and biases determine the behavior of GCT in each channel. When the gating weight γ_c is activated positively, GCT encourages this channel to compete with the others; when it is activated negatively, GCT pushes the channel to cooperate with the others. In other words, low-level features are primarily learned in the shallow layers of the network, so cooperation between channels is required to extract features more widely. In the deeper layers, high-level features are mainly learned, and their differences are often large; therefore, competition between channels is needed to obtain more valuable feature information.
In addition, when the gating weight and bias are zero, the original features are passed unchanged to the next layer, which can be formulated as
\hat{x}_c = x_c
This establishes an identity mapping and alleviates the degradation problem of deep networks. Hence, during GCT module initialization, α is initialized to 1, and γ and β are initialized to 0. This initialization improves the robustness of the training process and makes the final GCT results more accurate.
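The following is a minimal PyTorch sketch of GCT that follows the three equations above and the initialization just described; the ϵ value is an assumption.

```python
import torch
import torch.nn as nn

class GCT(nn.Module):
    """Sketch of gated channel transformation [34]: L2-norm global context
    embedding, channel normalization, and tanh gating. alpha starts at 1,
    gamma and beta at 0, so the module is initially an identity mapping."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1, 1))   # embedding weights
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))  # gating weights
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))   # gating biases
        self.eps = eps

    def forward(self, x):
        # global context embedding: s_c = alpha_c * ||x_c||_2
        s = self.alpha * (x.pow(2).sum(dim=(2, 3), keepdim=True) + self.eps).sqrt()
        # channel normalization: s_hat_c = sqrt(C) * s_c / ||s||_2
        s_hat = s * (x.shape[1] ** 0.5) / (s.pow(2).sum(dim=1, keepdim=True) + self.eps).sqrt()
        # gating: x_hat_c = x_c * [1 + tanh(gamma_c * s_hat_c + beta_c)]
        return x * (1.0 + torch.tanh(self.gamma * s_hat + self.beta))
```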

3.5. Recalibrated Attention Module

To recalibrate informative features, the output features of the FRM are further fed into the recalibrated attention module (RAM), where the informative features are selectively emphasized and useless features are suppressed according to their importance. As shown in Figure 4, the overall structure of the RAM is a bottleneck architecture. Here, X_{FRM} and X_{RAM} are defined as the input and output of the RAM, respectively. Specifically, the concatenated features are first passed to a 1 × 1 convolution layer to decrease the channel dimension; they are then divided into two branches. One branch preserves the original information with a 1 × 1 convolution to produce X_1, while the other processes the spatial information to search for the areas with the highest contribution. This second branch is equipped with two 3 × 3 convolutions, a max-pooling layer, and a bilinear interpolation operator to generate X_2. The max-pooling operation not only enlarges the receptive field but also captures high-frequency details. The bilinear interpolation layer maps the intermediate features back to the original feature space so that the input and output have identical sizes. Finally, X_1 and X_2 are concatenated and fed into a 1 × 1 convolution followed by a sigmoid function; this 1 × 1 convolution restores the channel dimension. Hence, the recalibrated attention can be expressed as
X_{RAM} = F_{RAM}(X_{FRM}) \cdot X_{FRM}
where F_{RAM}(·) is the recalibrated attention module function.
Therefore, the final output of the attention information distillation block (AIDB) can be formulated as
X_B^{n} = X_B^{n-1} + F_{conv}(X_{RAM})
where F_{conv} is a 3 × 3 convolutional layer, and X_B^{n-1} and X_B^{n} denote the input and output of the n-th AIDB, respectively. Furthermore, the GCT module considers channel-wise statistics, while the recalibrated attention module (RAM) encodes multi-scale features, focusing on the context around spatial locations. Therefore, the AIDB can modulate more informative features to obtain a more powerful feature representation capability, which is conducive to improving SR performance.
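Under the stride and kernel settings of Section 4.2.2, a sketch of the RAM and of the complete AIDB could look as follows; the channel-reduction ratio inside the bottleneck and the exact ordering of the operators in the spatial branch are our assumptions, and `FRM` and `GCT` are the modules sketched in the previous subsections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RAM(nn.Module):
    """Sketch of the recalibrated attention module: a bottleneck that mixes a
    channel-preserving 1x1 branch (X1) with a down-sampled spatial branch (X2),
    restores the channel dimension, and produces a sigmoid mask."""
    def __init__(self, n_feats=64, reduction=4):
        super().__init__()
        mid = n_feats // reduction
        self.reduce = nn.Conv2d(n_feats, mid, 1)                    # decrease channel dimension
        self.branch1 = nn.Conv2d(mid, mid, 1)                       # X1: preserve information
        self.conv_s2 = nn.Conv2d(mid, mid, 3, stride=2, padding=1)  # X2: first 3x3, stride 2
        self.pool = nn.MaxPool2d(kernel_size=7, stride=3)           #     7x7 max-pooling, stride 3
        self.conv_s1 = nn.Conv2d(mid, mid, 3, stride=1, padding=1)  #     second 3x3, stride 1
        self.restore = nn.Conv2d(2 * mid, n_feats, 1)               # restore channel dimension

    def forward(self, x_frm):
        r = self.reduce(x_frm)
        x1 = self.branch1(r)
        x2 = self.conv_s1(self.pool(self.conv_s2(r)))
        x2 = F.interpolate(x2, size=r.shape[2:], mode='bilinear', align_corners=False)
        mask = torch.sigmoid(self.restore(torch.cat([x1, x2], dim=1)))
        return mask * x_frm                                          # X_RAM = F_RAM(X_FRM) · X_FRM

class AIDB(nn.Module):
    """Sketch of one attention information distillation block:
    FRM -> RAM -> 3x3 conv, with a local residual connection."""
    def __init__(self, n_feats=64):
        super().__init__()
        self.frm = FRM(n_feats)
        self.ram = RAM(n_feats)
        self.conv = nn.Conv2d(n_feats, n_feats, 3, padding=1)

    def forward(self, x):
        return x + self.conv(self.ram(self.frm(x)))   # X_B^n = X_B^{n-1} + conv(X_RAM)
```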

4. Experiments

In this section, we first describe our experimental conditions regarding the implementation details and training settings. Then, we study the validity of the proposed modules in our model. Finally, we systematically compare the proposed network with a range of state-of-the-art models.

4.1. Datasets and Metrics

In our experiments, following previous works [18,21], we employed the DIV2K dataset [41] to train our model; it includes 800 high-quality training images. In the testing phase, we adopted five public benchmark datasets—Set5 [42], Set14 [43], BSD100 [44], Urban100 [45], and Manga109 [46]—to comprehensively validate the effectiveness of our model. In addition, we leveraged the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [47] as quantitative evaluation metrics for the final reconstructed super-resolution images. We computed the PSNR and SSIM values on the luminance channel of the YCbCr color space. We also compared the number of parameters with that of other leading models. Given a ground-truth image I_{HR} and a super-resolved image I_{SR}, we defined the PSNR as:
PSNR(I_{HR}, I_{SR}) = 10 \log_{10} \left( \frac{Max_I^2}{MSE} \right)
where
MSE = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( I_{HR}(i,j) - I_{SR}(i,j) \right)^2
Max_I is the maximum pixel value of the image, and H and W are its height and width, respectively. We formulated the SSIM as:
SSIM(I_{HR}, I_{SR}) = l(I_{HR}, I_{SR}) \cdot c(I_{HR}, I_{SR}) \cdot s(I_{HR}, I_{SR})
where
l(I_{HR}, I_{SR}) = \frac{2 \mu_{I_{HR}} \mu_{I_{SR}} + C_1}{\mu_{I_{HR}}^2 + \mu_{I_{SR}}^2 + C_1}
c(I_{HR}, I_{SR}) = \frac{2 \sigma_{I_{HR}} \sigma_{I_{SR}} + C_2}{\sigma_{I_{HR}}^2 + \sigma_{I_{SR}}^2 + C_2}
s(I_{HR}, I_{SR}) = \frac{\sigma_{I_{HR} I_{SR}} + C_3}{\sigma_{I_{HR}} \sigma_{I_{SR}} + C_3}
where μ_{I_{HR}} and σ_{I_{HR}} denote the mean and standard deviation of an image, σ_{I_{HR} I_{SR}} is the covariance between I_{HR} and I_{SR}, and C_1, C_2, and C_3 are positive constants.
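For reference, a minimal NumPy sketch of the PSNR computation on the luminance channel is shown below (SSIM can be computed analogously with an existing implementation). The BT.601 conversion coefficients are standard, but the border cropping commonly applied in SR evaluation is omitted here.

```python
import numpy as np

def rgb_to_y(img):
    """Luminance (Y) channel of an RGB image with values in [0, 255] (ITU-R BT.601)."""
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr(i_hr, i_sr, max_val=255.0):
    """PSNR between two same-sized images, following the formula above."""
    mse = np.mean((i_hr.astype(np.float64) - i_sr.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: evaluate on the Y channel, as done in this paper.
# value = psnr(rgb_to_y(hr_image), rgb_to_y(sr_image))
```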

4.2. Implementation Details

4.2.1. Training Settings

We obtained the input low-resolution (LR) images from the corresponding HR images by bicubic down-sampling in the training stage. Each training mini-batch consisted of 16 LR patches of size 48 × 48 extracted from the LR images. Moreover, for data augmentation, we randomly rotated the training images by 90°, 180°, and 270° and flipped them horizontally. We utilized the Adam optimizer [48] with β_1 = 0.9 and β_2 = 0.999 to optimize our model. We set the initial learning rate to 2 × 10⁻⁴ and halved it every 200 epochs. We implemented the proposed model in the PyTorch framework on an NVIDIA GTX 1080Ti GPU. Further settings of our experiments are listed in Table 1.
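A sketch of the corresponding augmentation and optimization setup is given below; the use of `StepLR` is our interpretation of halving the learning rate every 200 epochs, and `model` is assumed to be the AIDN instance from Section 3.1.

```python
import random
import torch

def augment(lr_patch, hr_patch):
    """Apply the same random rotation (0/90/180/270 degrees) and horizontal flip
    to an LR/HR patch pair given as (C, H, W) tensors."""
    k = random.randint(0, 3)
    lr_patch = torch.rot90(lr_patch, k, dims=(1, 2))
    hr_patch = torch.rot90(hr_patch, k, dims=(1, 2))
    if random.random() < 0.5:
        lr_patch = torch.flip(lr_patch, dims=(2,))
        hr_patch = torch.flip(hr_patch, dims=(2,))
    return lr_patch, hr_patch

def make_optimizer_and_scheduler(model):
    """Adam optimizer and step schedule matching Section 4.2.1:
    initial learning rate 2e-4, halved every 200 epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
    return optimizer, scheduler
```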

4.2.2. Model Details

Our model includes six attention information distillation blocks (AIDBs), and we set the number of feature channels to 64. At each split, 16 channels were retained, and the remaining channels were further processed. We used LReLU as the activation function in the feature refinement module (FRM), while we applied ReLU to the other parts [49]. Additionally, in the recalibrated attention module (RAM), we deployed the first 3 × 3 convolution layer with a stride of 2 and the other 3 × 3 convolution layer with a stride of 1, and used a max-pooling operation with a 7 × 7 kernel and a stride of 3.

4.3. Study of GCT and RAM

To study the contributions of the different modules in the proposed model, we conducted ablation experiments. All models were trained from scratch for 1000 epochs under the same settings. Each time we removed one module, we directly tested the model performance without adding other operations. Table 2 shows the experimental results at a scale factor of 2 on multiple datasets. Without gated channel transformation (GCT) and the recalibrated attention module (RAM) in the attention information distillation block (AIDB), the PSNR values on all datasets are relatively low. The performance of the second row, with the GCT module, is better than that of the first row, with only 1 K more parameters. Similarly, the RAM in the third row also improves the performance, especially on the Urban100 and Manga109 datasets. Therefore, both the GCT module and the RAM can independently improve reconstruction accuracy. This can be attributed to the multi-layer features being treated discriminatively: different weights are allocated according to the characteristics of the features to screen out high-value informative features, improving the efficiency and accuracy of the network. Furthermore, the best reconstruction results are obtained when integrating GCT and RAM into the AIDB with few additional parameters, as shown in the last row of Table 2. Thus, the proposed AIDB can capture spatial and global contextual information in each channel, benefiting image restoration. These quantitative results demonstrate the effectiveness of the introduced GCT and RAM, both individually and in combination.

4.4. Comparison with State-of-the-Art Methods

To demonstrate the effectiveness of our proposed architecture, we compared the proposed network with recently proposed competitive works, including SRCNN [5], VDSR [6], DRCN [7], LapSRN [20], IDN [18], CARN [26], MoreMNAS-A [50], FALSR-A [29], ESRN-V [15], WMRN [21], MADNet-L1 [19], MSICF [51], and CFSRCNN [22]. These works are almost all lightweight networks with fewer than 2.0 M parameters. The quantitative results with scale factors of ×2, ×3, and ×4 on five benchmark datasets are provided in Table 3. It can be seen that our proposed model is superior to the other leading algorithms across different datasets and scaling factors. Specifically, compared with several NAS-based automatic search SR architectures (FALSR-A, MoreMNAS-A, and ESRN-V), our AIDN obtains higher PSNR values with fewer parameters on the five datasets for ×2 up-scaling.
Although WMRN has slightly fewer parameters than the proposed network, its reconstruction results are clearly worse; for example, with a scale factor of 3 on Set5, our network obtains a performance gain of 0.24 dB. Moreover, our AIDN performs well compared to MADNet-L1, which has a similar number of parameters and also applies an attention mechanism within a residual multi-scale module. For a scale factor of 4 on four datasets, CFSRCNN achieves the second-best performance with nearly twice as many parameters as our method. From Table 3, it can be seen that the numbers of parameters of SRCNN and VDSR do not change across scaling factors, as the input image is interpolated before being fed into the network; the other models have varying parameter counts due to their different up-sampling approaches. As our model has relatively few parameters, it can be considered a lightweight model. Consequently, our method achieves better reconstruction performance than the most advanced methods, with fewer parameters.
In addition, we also compared the visual quality with other methods at the ×4 scale, as shown in Figure 5. For “148026” from BSD100, most methods recovered only blurred edges, and CARN even produced wrong textures, whereas the image generated by our method was closer to the original. For “img_042” from Urban100, the other methods suffered from severe artifacts and produced curved lines; only our method reconstructed the horizontal lines correctly. For “img_037” from Urban100, LapSRN could not recover the grids, and VDSR produced several redundant white vertical lines at the upper right of the image, whereas our AIDN reconstructed accurate grids with better visual effects.

4.5. Heatmaps of the Proposed AIDN

This section presents the heatmaps of the proposed AIDN at different stages on the Urban100 dataset (×2). In Figure 6, the top row shows heatmaps of the shallow features before they pass into the AIDBs, while the next two rows show heatmaps of the refined high-level features. The rows represent the states of different channels at the same stage, and the columns represent the states of the same channel at different stages. Yellow indicates heavily weighted regions, blue indicates lightly weighted regions, and green indicates intermediate weights. It can be seen that our method assigns different weights at different stages; that is, it modulates the features, which is conducive to image reconstruction.

4.6. Model Size Analysis

In addition, to further illustrate the superiority of the proposed network, we compared its number of parameters and performance with those of other leading works. The number of parameters is especially important when building a lightweight network for resource-constrained mobile devices. The experimental results on Urban100 with a scale factor of ×2 are shown in Figure 7. Compared with other methods, our AIDN model obtained comparable or higher PSNR values with fewer parameters, while the other methods either had a larger number of parameters or lower performance. These analyses indicate that the proposed AIDN strikes a better balance between parameters and performance.

4.7. Visualization on Historical Images

To further illustrate the robustness and effectiveness of our model, we evaluated our attention network with information distillation (AIDN) on historical images. The degradation process of these low-resolution images is unknown, and no corresponding high-quality images are available. Figure 8 shows the visual results at a scale factor of ×4. For “img006”, the characters produced by our model were clearer and better separated. For “img007”, our AIDN reconstructed finer details, and the refined images showed less blurring. In short, the images generated by our method have better perceptual quality than those of the other methods.

5. Conclusions

In this paper, we proposed an attention network with information distillation (AIDN) for image super-resolution. Specifically, global contextual information embedding among different channels is employed to modulate multiple features in a step-by-step manner, forming the distillation structure. Moreover, a recalibrated attention module (RAM) is adopted to re-highlight these features, concentrating on the vital content around spatial locations. Benefiting from the gated channel transformation and spatial information masks working jointly, the proposed AIDN possesses a more powerful ability to identify informative features, effectively improving computational efficiency while enhancing reconstruction accuracy. Comprehensive quantitative and qualitative evaluations demonstrate that our AIDN outperforms state-of-the-art models in terms of both reconstruction performance and visual quality. In future work, we will extend our AIDN to more complex tasks (e.g., super-resolution of noisy or blurred images).

Author Contributions

Conceptualization, H.Z. (Huaijuan Zang) and S.Z.; methodology, H.Z. (Huaijuan Zang) and S.Z.; formal analysis, H.Z. (Huaijuan Zang) and Y.Z.; writing—original draft preparation, H.Z. (Huaijuan Zang); writing—review and editing, S.Z., C.N. and H.Z. (Haiyan Zhang); visualization, H.Z. (Huaijuan Zang); supervision, S.Z.; project administration, S.Z. and H.Z. (Huaijuan Zang). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Hefei Municipal Natural Science Foundation (Grant No. 2021008) and the Anhui Province Scientific and Technological Research Programs (Grant No. 1401B042019).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  2. Guo, T.; Dai, T.; Liu, L.; Zhu, Z.; Xia, S.-T. S2A:Scale Attention-Aware Networks for Video Super-Resolution. Entropy 2021, 23, 1398. [Google Scholar] [CrossRef] [PubMed]
  3. Tang, R.; Chen, L.; Zou, Y.; Lai, Z.; Albertini, M.K.; Yang, X. Lightweight network with one-shot aggregation for image super-resolution. J. Real-Time Image Process. 2021, 18, 1275–1284. [Google Scholar] [CrossRef]
  4. Jiang, X.; Wang, N.; Xin, J.; Xia, X.; Yang, X.; Gao, X. Learning lightweight super-resolution networks with weight pruning. Neural Netw. 2021, 144, 21–32. [Google Scholar] [CrossRef] [PubMed]
  5. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar]
  6. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1646–1654. [Google Scholar]
  7. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1637–1645. [Google Scholar]
  8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  9. Dong, C.; Chen, C.L.; Tang, X.O. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016. [Google Scholar]
  10. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883. [Google Scholar]
  11. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140. [Google Scholar]
  12. Li, Z.; Wang, C.; Wang, J.; Ying, S.; Shi, J. Lightweight adaptive weighted network for single image super-resolution. Comput. Vis. Image Underst. 2021, 211, 103254. [Google Scholar] [CrossRef]
  13. Tong, T.; Li, G.; Liu, X.J.; Gao, Q.Q. Image Super-Resolution Using Dense Skip Connections. In Proceedings of the 2017 IEEE/CVF International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4809–4817. [Google Scholar]
  14. Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  15. Song, D.; Xu, C.; Jia, X.; Chen, Y.; Xu, C.; Wang, Y. Efficient residual dense block search for image super-resolution. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020. [Google Scholar]
  16. Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  17. Zhang, Y.L.; Li, K.P.; Li, K.; Wang, L.C.; Zhong, B.E.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  18. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image superresolution via information distillation network. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 723–731. [Google Scholar]
  19. Lan, R.; Sun, L.; Liu, Z.; Lu, H.; Pang, C.; Luo, X. MADNet: A Fast and Lightweight Network for Single-Image Super Resolution. IEEE Trans. Cybern. 2021, 51, 1443–1453. [Google Scholar] [CrossRef] [PubMed]
  20. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  21. Sun, L.; Liu, Z.; Sun, X.; Liu, L.; Lan, R.; Luo, X. Lightweight Image Super-Resolution via Weighted Multi-Scale Residual Network. IEEE/CAA J. Autom. Sin. 2021, 8, 1271–1280. [Google Scholar] [CrossRef]
  22. Tian, C.; Xu, Y.; Zuo, W.; Zhang, B.; Fei, L.; Lin, C.W. Coarse-to-fine cnn for image super-resolution. IEEE Trans. Multimed. 2020, 23, 1489–1502. [Google Scholar] [CrossRef]
  23. Liu, J.; Zhang, W.J.; Tang, Y.T.; Tang, J.; Wu, G.S. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 2357–2365.
  24. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  25. Zhang, X.Y.; Gao, P.; Liu, S.X.; Zhao, K.Y.; Yin, L.G.; Chen, C.W. Accurate and Efficient Image Super-Resolution via Global-Local Adjusting Dense Network. IEEE Trans. Multimed. 2021, 23, 1924–1937. [Google Scholar] [CrossRef]
  26. Ahn, N.; Kang, B.; Sohn, K. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  27. Zhu, F.Y.; Zhao, Q.J. Efficient single image super-resolution via hybrid residual feature learning with compact back-projection network. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27 October 2019; pp. 2453–2460. [Google Scholar]
  28. Song, D.H.; Wang, Y.H.; Chen, H.T.; Xu, C.J.; Tao, D.C. AdderSR: Towards Energy Efficient Image Super-Resolution. In Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online, 19–25 June 2021; pp. 1–10. [Google Scholar]
  29. Chu, X.; Zhang, B.; Ma, H.; Xu, R.; Li, J.; Li, Q. Fast, accurate and lightweight super-resolution with neural architecture search. In Proceedings of the Conference: 2020 25th International Conference on Pattern Recognition (ICPR), Online, 10–15 January 2020. [Google Scholar]
  30. Chen, G.A.; Matsune, A.; Du, H.; Liu, X.Z.; Zhan, S. Exploring more diverse network architectures for single image super-resolution. Knowl.-Based Syst. 2022, 235, 1–14. [Google Scholar]
  31. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  32. Wang, X.L.; Girshick, R.; Gupta, A.; He, K.M. Non-local Neural Networks. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7794–7803. [Google Scholar]
  33. Woo, S.H.Y.; Park, J.C.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  34. Yang, Z.X.; Zhu, L.C.; Wu, Y.; Yang, Y. Gated Channel Transformation for Visual Recognition. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1791–1800. [Google Scholar]
  35. Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1531–1539. [Google Scholar]
  36. Dai, T.; Cai, J.R.; Zhang, Y.B.; Xia, S.T.; Zhang, L. Second-order Attention Network for Single Image Super-Resolution. In Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 1057–1066. [Google Scholar]
  37. Anwar, S.; Barnes, N. Densely Residual Laplacian Super-Resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1192–1204. [Google Scholar] [CrossRef] [PubMed]
  38. Hu, Y.T.; Li, J.; Huang, Y.F.; Gao, X.B. Channel-wise and spatial feature modulation network for single image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3911–3927. [Google Scholar] [CrossRef]
  39. Mei, Y.Q.; Fan, Y.C.; Zhou, Y.Q.; Huang, L.C. Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining. In Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 5689–5698. [Google Scholar]
  40. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90. [Google Scholar] [CrossRef]
  41. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA, 21–26 June 2017. [Google Scholar]
  42. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Morel, M.l. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012; pp. 135.1–135.10. [Google Scholar]
  43. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Curves and Surfaces; Boissonnat, J.-D., Chenin, P., Cohen, A., Christian, G., Lyche, T., Mazure, M.-L., Schumaker, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730. [Google Scholar]
  44. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
  45. Huang, J.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  46. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
  47. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  48. Kingma, D.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  49. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML Workshop on Deep Learning for Audio, Speech and Language Processing, Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
  50. Chu, X.; Zhang, B.; Xu, R.; Ma, H. Multi-objective reinforced evolution in mobile neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), Online, 23–28 August 2020. [Google Scholar]
  51. Hu, Y.T.; Gao, X.B.; Li, J.; Wang, H.Z. Single image super-resolution with multi-scale information cross-fusion network. Signal Process. 2021, 179, 107831. [Google Scholar] [CrossRef]
Figure 1. Overview of our AIDN architecture.
Figure 2. Attention information distillation block (AIDB).
Figure 3. Gated channel transformation (GCT) module.
Figure 4. Recalibrated attention module (RAM).
Figure 5. Visual comparison of our AIDN with SRCNN [5], VDSR [6], DRCN [7], LapSRN [20], and CARN-M [26] for ×4 SR on the BSD100 and Urban100 datasets. The best results are highlighted in red.
Figure 6. Heatmaps of our AIDN at different stages on the Urban100 dataset.
Figure 7. Our AIDN compared with other models in terms of parameters and performance. Results are evaluated on Urban100 with a scale factor of 2.
Figure 8. Visual comparisons for ×4 SR on historical images.
Table 1. Setting parameters for our AIDN.

Batch size | 16
Patch size | 48 × 48
Number of attention information distillation blocks | 6
Initial learning rate | 2 × 10⁻⁴
Channels | 64
Channels retained at each split | 16
Optimizer (Adam) | β_1 = 0.9, β_2 = 0.999
Table 2. Investigation of the GCT module and the RAM unit on five benchmark datasets at a scaling factor of ×2. Values are reported as PSNR/SSIM; parameters are counted as kernel × kernel × input channels × output channels. The best and second-best performances are highlighted in red and blue.

Scale | GCT | RAM | Params (K) | Set5 | Set14 | BSD100 | Urban100 | Manga109
×2 | × | × | 690 | 37.63/0.9584 | 31.30/0.9146 | 31.94/0.8966 | 31.31/0.9199 | 37.79/0.9744
×2 | ✓ | × | 691 | 37.81/0.9591 | 33.41/0.9157 | 32.07/0.8980 | 31.75/0.9241 | 38.27/0.9754
×2 | × | ✓ | 733 | 37.85/0.9592 | 33.51/0.9167 | 32.10/0.8982 | 31.94/0.9260 | 38.44/0.9758
×2 | ✓ | ✓ | 734 | 37.95/0.9596 | 33.57/0.9169 | 32.16/0.8989 | 32.16/0.9278 | 38.68/0.9763
Table 3. Quantitative results of several state-of-the-art SR models at scaling factors of ×2, ×3, and ×4 (average PSNR/SSIM). The best performance is highlighted in red, while the second-best performance is highlighted in blue.

Method | Scale | Params (K) | Set5 | Set14 | BSD100 | Urban100 | Manga109
SRCNN [5] | ×2 | 57 | 36.66/0.9542 | 32.42/0.9063 | 31.36/0.8879 | 29.50/0.8946 | 35.60/0.9663
VDSR [6] | ×2 | 665 | 37.53/0.9587 | 33.03/0.9124 | 31.90/0.8960 | 30.76/0.9140 | 37.22/0.9729
LapSRN [20] | ×2 | 813 | 37.52/0.9590 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9100 | 37.27/0.9740
IDN [18] | ×2 | 590 | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8950 | 31.27/0.9196 | -
CARN-M [26] | ×2 | 412 | 37.53/0.9583 | 33.26/0.9141 | 31.92/0.8960 | 30.83/0.9233 | -
AWSRN-S [12] | ×2 | 397 | 37.75/0.9596 | 33.31/0.9151 | 32.00/0.8974 | 31.39/0.9207 | 37.90/0.9755
ESRN-V [15] | ×2 | 324 | 37.85/0.9600 | 33.42/0.9161 | 32.10/0.8987 | 31.79/0.9248 | -
WMRN [21] | ×2 | 452 | 37.83/0.9599 | 33.41/0.9162 | 32.08/0.8984 | 31.68/0.9241 | 38.27/0.9763
MADNet-L1 [19] | ×2 | 878 | 37.85/0.9600 | 33.38/0.9161 | 32.04/0.8979 | 31.62/0.9233 | -
MoreMNAS-A [50] | ×2 | 1039 | 37.63/0.9584 | 33.23/0.9138 | 31.95/0.8961 | 31.24/0.9187 | -
MSICF [51] | ×2 | 1900 | 37.89/0.9605 | 33.41/0.9153 | 32.15/0.8992 | 31.47/0.9220 | -
FALSR-A [29] | ×2 | 1021 | 37.82/0.9595 | 33.55/0.9168 | 32.12/0.8987 | 31.93/0.9256 | -
CFSRCNN [22] | ×2 | 1310 | 37.79/0.9591 | 33.51/0.9165 | 32.11/0.8988 | 32.07/0.9273 | -
AIDN (ours) | ×2 | 734 | 37.95/0.9603 | 33.57/0.9169 | 32.16/0.8993 | 32.16/0.9278 | 38.68/0.9782
SRCNN [5] | ×3 | 57 | 32.75/0.9090 | 29.28/0.8209 | 28.41/0.7863 | 26.24/0.7989 | 30.59/0.9107
VDSR [6] | ×3 | 665 | 33.66/0.9213 | 29.77/0.8314 | 28.82/0.7976 | 27.14/0.8279 | 32.01/0.9310
IDN [18] | ×3 | 590 | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | -
CARN-M [26] | ×3 | 412 | 33.99/0.9236 | 30.08/0.8367 | 28.91/0.8000 | 26.86/0.8263 | -
AWSRN-S [12] | ×3 | 447 | 34.02/0.9240 | 30.09/0.8376 | 28.92/0.8009 | 27.57/0.8391 | 32.82/0.9393
ESRN-V [15] | ×3 | 324 | 34.23/0.9262 | 30.27/0.8400 | 29.03/0.8039 | 27.95/0.8481 | -
WMRN [21] | ×3 | 556 | 34.11/0.9251 | 30.17/0.8390 | 28.98/0.8021 | 27.80/0.8448 | 33.07/0.9413
MADNet-L1 [19] | ×3 | 930 | 34.16/0.9253 | 30.21/0.8398 | 28.98/0.8023 | 27.77/0.8439 | -
MSICF [51] | ×3 | 1900 | 34.24/0.9266 | 30.09/0.8371 | 29.01/0.8024 | 27.69/0.8411 | -
CFSRCNN [22] | ×3 | 1495 | 34.24/0.9256 | 30.27/0.8410 | 29.03/0.8035 | 28.04/0.8496 | -
AIDN (ours) | ×3 | 742 | 34.35/0.9259 | 30.35/0.8413 | 29.07/0.8039 | 28.13/0.8512 | 33.50/0.9433
SRCNN [5] | ×4 | 57 | 30.48/0.8628 | 27.49/0.7503 | 26.90/0.7101 | 24.52/0.7221 | 27.66/0.8505
VDSR [6] | ×4 | 665 | 31.35/0.8838 | 28.01/0.7674 | 27.29/0.7251 | 25.18/0.7524 | 28.83/0.8809
LapSRN [20] | ×4 | 813 | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7280 | 25.21/0.7560 | 29.09/0.8845
IDN [18] | ×4 | 590 | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | -
CARN-M [26] | ×4 | 412 | 31.92/0.8903 | 28.42/0.7762 | 27.44/0.7304 | 25.63/0.7688 | -
AWSRN-S [12] | ×4 | 588 | 31.77/0.8893 | 28.35/0.7761 | 27.41/0.7304 | 25.56/0.7678 | 29.74/0.8982
ESRN-V [15] | ×4 | 324 | 31.99/0.8919 | 28.49/0.7779 | 27.50/0.7331 | 25.87/0.7782 | -
WMRN [21] | ×4 | 536 | 32.00/0.8925 | 28.47/0.7786 | 27.49/0.7328 | 25.89/0.7789 | 30.11/0.9040
MADNet-L1 [19] | ×4 | 1002 | 31.95/0.8917 | 28.44/0.7780 | 27.47/0.7327 | 25.76/0.7746 | -
MSICF [51] | ×4 | 1900 | 31.91/0.8923 | 28.35/0.7751 | 27.46/0.7308 | 25.64/0.7692 | -
CFSRCNN [22] | ×4 | 1458 | 32.06/0.8920 | 28.57/0.7800 | 27.53/0.7333 | 26.03/0.7824 | -
AIDN (ours) | ×4 | 754 | 32.17/0.8939 | 28.61/0.7806 | 27.55/0.7344 | 26.04/0.7833 | 30.42/0.9065
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
