Abstract
Panchromatic (PAN) images contain abundant spatial information that is useful for earth observation, but they often suffer from low resolution (LR) due to sensor limitations and the large-scale field of view. Current super-resolution (SR) methods based on traditional attention mechanisms have shown remarkable advantages but remain imperfect at reconstructing the edge details of SR images. To address this problem, an improved SR model involving a self-attention augmented Wasserstein generative adversarial network (SAA-WGAN) is designed to dig out the reference information among multiple features for detail enhancement. We use an encoder-decoder network followed by a fully convolutional network (FCN) as the backbone to extract multi-scale features and reconstruct the high-resolution (HR) results. To exploit the relevance between multi-layer feature maps, we first integrate a convolutional block attention module (CBAM) into each skip-connection of the encoder-decoder subnet, generating weighted maps to automatically enhance both channel-wise and spatial-wise feature representation. Besides, considering that the HR results and LR inputs are highly similar in structure, yet this similarity cannot be fully reflected by traditional attention mechanisms, we designed a self augmented attention (SAA) module, where the attention weights are produced dynamically via a similarity function between hidden features; this design allows the network to flexibly adjust the relevance among multi-layer features and keep long-range inter-feature information, which is helpful for preserving details. In addition, the pixel-wise loss is combined with perceptual and gradient losses to achieve comprehensive supervision. Experiments on benchmark datasets demonstrate that the proposed method outperforms other SR methods in terms of both objective evaluation and visual effect.
1. Introduction
Panchromatic (PAN) images have been widely used in various applications, such as weather forecasting, environmental monitoring, and earth observation. However, since PAN images are usually taken from space satellites with a large field of view, their spatial resolution is quite limited, and details of ground objects therefore cannot be well distinguished. To resolve this problem, recent works have begun to focus on the super-resolution (SR) of PAN images. Due to sensor limitations, the PAN images captured by satellite sensors suffer from heavy image degradation, so SR is urgently needed to improve resolution and enrich image texture through image processing algorithms.
The performance of SR algorithms [,,,] has been greatly boosted by convolutional neural networks. The conventional supervised learning model tries to minimize the error between the ground truth and SR results, but this design cannot well utilize the semantic difference between the two samples, because the loss functions are usually basic error functions, such as the Mean Square Error (MSE), the Structural Similarity Index Measure (SSIM), or the L1 norm []. The generative adversarial network (GAN) is introduced to resolve this shortcoming. Unlike normal generative networks, GAN-based methods apply a discriminative network to minimize the semantic distance between generated images and ground-truth images through the discriminative error, producing high-resolution (HR) results with more details and naturalness. Although GAN-based SR models have made successful progress, there are still some limitations, such as training instability and the limited representative ability of spatial-wise and channel-wise features []. The training instability is usually caused by the nonlinearity of the discriminative supervision, which may cause mode collapse. In addition, traditional convolution cannot respond to the different contributions of the channels and spatial locations of the feature maps.
Some improved models have been proposed to address the issue of feature representation in SR networks. The residual channel attention network (RCAN) [] was introduced to learn features across channels and enhance long-term information. Channel attention is used to exploit the features across different channels, but this design cannot fully use the relevance among different locations of the feature. To further dig out the hidden relations within features, the convolutional block attention module (CBAM) [] was developed by combining channel-wise and spatial attention mechanisms. Y.T. Hu first introduced this idea into an SR network [], integrating channel-wise attention and spatial attention into channel-wise and spatial attention residual (CSAR) blocks to modulate the residual features. The CSAR blocks are stacked in a chain structure to dynamically modulate multi-level features in a global-and-local manner. In this way, the multi-level features can be adapted and fused into a hierarchical feature map through gated fusion. In fact, however, the relevant information between channel features and spatial features has not been excavated in CSAR blocks.
To effectively address the above problem, attention-augmented convolution [] is introduced in this paper to utilize the relevance among multiple features. Attention-augmented convolution improves classic convolution by augmenting the features and giving adaptive weights for feature combination, which can flexibly adjust the fraction of attentional channels to keep inter-feature information. This allows the network to capture long-range interactions without significantly increasing the number of parameters; however, this self-attention mechanism has not been fully explored in SR.
In this work, a self augmented attention Wasserstein generative adversarial network (SAA-WGAN) is proposed for PAN image SR. We first integrate a convolutional block attention module (CBAM) into each skip-connection of the encoder-decoder subnet, instead of stacking in a chain, for multi-scale CBAM feature extraction. To obtain relevant information for hierarchical features, a self augmented attention (SAA) block using attention-augmented convolution is presented for the extraction of hidden features and contextual information. In our SAA-WGAN, an encode-decode structure with CBAMs is used as one branch network, and the SAA block is utilized as another parallel branch, providing more helpful features across multiple scales and layers for the reconstruction of the HR result. In addition, the pixel-wise information and high-level semantic information can be exploited by combining pixel loss and perceptual loss. As a result, our method obtains better visual quality and recovers more image details compared with other state-of-the-art SR methods.
In summary, the main contributions of this paper are listed as follows:
- (1) We propose a WGAN-based network (SAA-WGAN) for PAN image SR, which integrates the encode-decode structure and CBAM.
- (2) We apply the self-attention module to the WGAN network, so that long-range features can be well preserved and transferred.
- (3) The generator loss combines pixel loss, perceptual loss, and gradient loss to provide supervision in terms of both image quality and visual effect.
- (4) Extensive evaluations have been conducted to verify the above contributions.
The remainder of this paper is organized as follows. We introduce the related Generative Adversarial Networks and Attention Features in Section 2. The proposed method of SAA-WGAN for PAN image super resolution is described in Section 3. The experimental results and analysis are reported in Section 4. The conclusion of this paper is stated in Section 5.
2. Related Work
2.1. Generative Adversarial Networks
Traditional GANs [,,] always narrow the gap between the generated sample and the real image by minimizing the Kullback–Leibler (KL) divergence between the discriminative results; the structure of a GAN is shown in Figure 1. The discriminator network in a GAN can distinguish real and fake samples and help produce very realistic SR results. However, since the KL divergence is not linear over the input distribution space, the supervision of the discriminator is non-uniform across input samples; thus, the performance of traditional GANs is quite limited.
Figure 1.
Structure of the standard GAN.
In SR, the generator network is trained to capture the real data distribution so that its generated samples can be as real as possible, which means minimizing the divergence between the generated distribution $p_G$ and the real data distribution $p_{data}$. The discriminative network estimates the probability of a given sample coming from the real dataset, i.e., it maximizes the probability of distinguishing the SR sample from real data. So, the contest between the discriminator and the generator is usually formulated as a zero-sum game with cross-entropy targets:

$$\min_{G}\max_{D}\; \mathbb{E}_{I\sim p_{data}}\big[\log D(I)\big] + \mathbb{E}_{x\sim p_{x}}\big[\log\big(1 - D(G(x))\big)\big],$$

where x is the input, $G(x)$ is the SR image, I is the ground truth, D is the discriminator in the network, and G is the generator in the network. Hence, the discriminator loss is

$$L_{D} = -\,\mathbb{E}_{I\sim p_{data}}\big[\log D(I)\big] - \mathbb{E}_{x\sim p_{x}}\big[\log\big(1 - D(G(x))\big)\big].$$

In practice, a modified generator loss is used:

$$L_{G} = -\,\mathbb{E}_{x\sim p_{x}}\big[\log D(G(x))\big].$$
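To make the adversarial objective above concrete, the following is a minimal TensorFlow sketch of the cross-entropy discriminator loss and the non-saturating generator loss; the function names and tensor arguments are illustrative assumptions, not the exact implementation used in this paper.

```python
import tensorflow as tf

# Binary cross-entropy on raw logits, matching the standard GAN formulation above.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # D is pushed toward 1 for ground-truth HR patches and 0 for SR outputs.
    real_loss = bce(tf.ones_like(real_logits), real_logits)
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

def generator_loss(fake_logits):
    # Non-saturating form: G maximizes log D(G(x)) instead of minimizing log(1 - D(G(x))).
    return bce(tf.ones_like(fake_logits), fake_logits)
```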
2.2. Attention Features
Attention has enjoyed widespread adoption in convolutional neural network (CNN) models, including SR networks, because of its ability to enhance feature representation. Y.L. Zhang proposed a channel attention (CA) mechanism to adaptively rescale channel-wise features by considering interdependencies among channels. To consider channel attention and spatial attention jointly, Y.T. Hu introduced the convolutional block attention module (CBAM) [] into a deep SR network [], where a set of channel-wise and spatial attention residual (CSAR) blocks was constructed and stacked in a chain structure to dynamically modulate multi-level features in a global-and-local manner. Lately, more attention mechanisms have been applied in super-resolution [,]. Tao Dai presented a second-order attention network (SAN []) that employs repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. In SAN, a novel trainable second-order channel attention (SOCA) module was developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Further, L.G. Wang created a parallax-attention mechanism (PASSRnet []) to integrate the information from a stereo image pair, handling stereo images with large disparity variations.
Although these existing attention-based approaches have made good efforts to improve SR performance, reconstructing rich details for single-image SR (SISR) is still a challenge. In deep networks, the low-resolution (LR) inputs and extracted features contain different types of information across channels, locations, and layers, which contribute differently to reconstruction. However, the common convolutional layer imposes locality and translation equivariance via a limited receptive field and weight sharing, respectively. The local nature of the convolutional kernel prevents it from capturing global contexts in an image, which is necessary for recovering the details of SR images. Consequently, contributions across different aspects are not equal, so multiple feature maps cannot be fully utilized.
Inspired by the above observations, we propose a method that captures global contexts via attention-augmented convolution and extracts multi-scale features via an encode-decode network. The features from the attention-augmented convolution and the encode-decode network are shown in Figure 2. Features of the attention-augmented convolution retain many details, such as corners and edges. This further assists HR image reconstruction in the spatial domain, and these features can be concatenated with the multi-scale features.
Figure 2.
Architecture of self augmented attention (SAA)-WGAN.
3. Method
Our system reconstructs high-resolution images via a Wasserstein generative adversarial network with channel and spatial attention to obtain more representative features. In particular, channel and spatial attention is a flexible mechanism that captures channel-feature and position-feature information in a self-adaptive manner, such that important accumulated information is weighted highly. Besides, a WGAN network with a comprehensive loss function is applied to achieve a realistic display of SR reconstruction results with more details.
3.1. Architecture
The architecture of the self augmented attention WGAN (SAA-WGAN) is illustrated in Figure 2. It consists of two parallel branches: an encode-decode network and a self-attention network. The encode-decode network is composed of two modules, i.e., the encode-decode module (EDM) and the fully convolutional network (FCN). The FCN involves five convolution blocks of eight kernels with a size of 3 × 3. The EDM is a four-scale encode-decode convolutional module, and a CBAM is embedded into each scale to enhance multi-scale feature representation. Meanwhile, self augmented attention (SAA) convolution is introduced to relate the spatial and channel feature subspaces for a more powerful convolution.
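A minimal sketch of how the two branches could be wired together is given below. It assumes the hypothetical helpers cbam_block and saa_conv2d (sketched later in this section), assumes the LR input has already been interpolated to the HR grid, and fuses the SAA branch with the decoder output by simple concatenation before the FCN; the channel counts, the single-channel output layer, and the fusion point are illustrative assumptions, not a reproduction of the original configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_generator(in_shape=(64, 64, 1), base_filters=32):
    inp = layers.Input(in_shape)

    # Encoder: four scales; a CBAM-weighted copy of each scale feeds the skip connection.
    skips, x = [], inp
    for i in range(4):
        x = conv_block(x, base_filters * 2 ** i)
        skips.append(cbam_block(x))              # CBAM on every skip connection
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 16)          # bottleneck

    # Decoder: upsample and fuse with the CBAM-enhanced skip features.
    for i in reversed(range(4)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skips[i]])
        x = conv_block(x, base_filters * 2 ** i)

    # Parallel SAA branch on the input, concatenated with the decoder output (assumption).
    saa = saa_conv2d(inp, filters=base_filters)
    x = layers.Concatenate()([x, saa])

    # FCN: five convolution blocks of eight 3x3 kernels, then an assumed 1-channel output.
    for _ in range(5):
        x = layers.Conv2D(8, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same")(x)
    return tf.keras.Model(inp, out, name="SAA_WGAN_generator")
```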
Wasserstein GAN. Wasserstein GAN was proposed by Martin Arjovsky and others; it optimizes a discriminator by maximizing the Earth Mover (EM) distance between the discriminative results of fake and real samples. Thus, the "discriminator" is no longer a direct critic telling fake samples apart from real ones. Instead, it is trained to learn a K-Lipschitz continuous function (satisfying $\|f\|_{L}\le K$) to help compute the Wasserstein distance [], which is linear over the entire sample space. The Wasserstein distance is informally defined as

$$W(p_{data}, p_{G}) = \frac{1}{K}\sup_{\|f\|_{L}\le K}\; \mathbb{E}_{x\sim p_{data}}\big[f(x)\big] - \mathbb{E}_{I\sim p_{I}}\big[f(G(I))\big],$$

where I is the LR image, x is the real image, and $G(I)$ is the SR image; $\{f\}_{\|f\|_{L}\le 1}$ is the set of 1-Lipschitz functions. The discriminator loss is

$$L_{D} = \mathbb{E}_{I\sim p_{I}}\big[D(G(I))\big] - \mathbb{E}_{x\sim p_{data}}\big[D(x)\big].$$

In practice, a modified generator loss is expressed as

$$L_{G} = -\,\mathbb{E}_{I\sim p_{I}}\big[D(G(I))\big].$$
Wasserstein GAN removes the logarithm to obtain continuous gradient updates and uses a gradient penalty to constrain the critic parameters. It alleviates problems of standard GANs such as unstable generator gradients and insufficient diversity of generated data, so it is used in our model to facilitate the reconstruction of more details in the SR image.
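The following is a minimal TensorFlow sketch of the Wasserstein critic loss with gradient penalty, in the spirit of the WGAN-GP formulation cited above; the penalty weight gp_weight = 10 and the function names are illustrative assumptions.

```python
import tensorflow as tf

def critic_loss(critic, real_hr, fake_sr, gp_weight=10.0):
    # Wasserstein estimate: the critic scores real samples high and generated samples low.
    w_loss = tf.reduce_mean(critic(fake_sr)) - tf.reduce_mean(critic(real_hr))

    # Gradient penalty on random interpolations between real and generated samples,
    # softly enforcing the 1-Lipschitz constraint.
    eps = tf.random.uniform([tf.shape(real_hr)[0], 1, 1, 1], 0.0, 1.0)
    interp = eps * real_hr + (1.0 - eps) * fake_sr
    with tf.GradientTape() as tape:
        tape.watch(interp)
        scores = critic(interp)
    grads = tape.gradient(scores, interp)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    penalty = tf.reduce_mean(tf.square(grad_norm - 1.0))
    return w_loss + gp_weight * penalty

def wgan_generator_loss(critic, fake_sr):
    # The generator tries to raise the critic score of its SR outputs.
    return -tf.reduce_mean(critic(fake_sr))
```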
Channel and spatial attention block (CBAM). Attention models have been used to help the network focus on the features that are more critical for performance. In our model, to fully exploit the feature information, we utilize a channel and spatial attention block for feature re-enhancement. The CBAM adopts average-pooling and max-pooling to squeeze the spatial dimension of the input feature map and achieve channel attention. It also applies average-pooling and max-pooling operations along the channel axis for spatial attention, then concatenates these features and generates a spatial attention map with a convolution layer. In this way, the weights of different channels and different positions can be flexibly adjusted according to the importance of the information. The input of the CBAM is the feature of each layer in the EDM, and the multi-scale features captured by the CBAM are displayed in the first row of Figure 3.
Figure 3.
Attention-augmented results.
The CBAM structure is shown in Figure 4. The CBAM feature is inferred from a 1D channel attention map $M_{c}$ and a 2D spatial attention map $M_{s}$, and the CBAM feature can be expressed as

$$F' = M_{c}(F) \otimes F, \qquad F'' = M_{s}(F') \otimes F',$$

with

$$M_{c}(F) = \sigma\big(W_{1}(W_{0}(\mathrm{AvgPool}(F))) + W_{1}(W_{0}(\mathrm{MaxPool}(F)))\big) = \sigma\big(W_{1}(W_{0}(F^{c}_{avg})) + W_{1}(W_{0}(F^{c}_{max}))\big),$$
$$M_{s}(F') = \sigma\big(f^{k\times k}([\mathrm{AvgPool}(F');\,\mathrm{MaxPool}(F')])\big) = \sigma\big(f^{k\times k}([F^{s}_{avg};\,F^{s}_{max}])\big),$$

where $\otimes$ denotes element-wise multiplication. $\mathrm{AvgPool}$ is average-pooling, and $\mathrm{MaxPool}$ is max-pooling. $F^{c}_{avg}$ and $F^{c}_{max}$ denote channel average-pooled features and max-pooled channel features, respectively. $F^{s}_{avg}$ and $F^{s}_{max}$ denote average-pooled spatial features and max-pooled spatial features, respectively. $\sigma$ denotes the sigmoid function, and $f^{k\times k}$ represents a convolution operation with a convolutional filter size of $k \times k$. $W_{0}$ and $W_{1}$ are the feature weights after pooling and after activation.
Figure 4.
Channel and spatial attention block (CBAM).
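A minimal Keras sketch of the CBAM block described above is given below, following the channel-then-spatial attention formulation; the reduction ratio and the 7 × 7 spatial convolution kernel follow the original CBAM paper and are assumptions here rather than settings reported in this work.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8, spatial_kernel=7):
    channels = x.shape[-1]

    # Channel attention: average- and max-pooled descriptors through a shared MLP (W0, W1).
    mlp = tf.keras.Sequential([
        layers.Dense(channels // reduction, activation="relu"),
        layers.Dense(channels),
    ])
    avg = mlp(layers.GlobalAveragePooling2D()(x))
    mx = mlp(layers.GlobalMaxPooling2D()(x))
    mc = layers.Reshape((1, 1, channels))(tf.sigmoid(avg + mx))   # 1D channel map M_c
    x = x * mc

    # Spatial attention: pool along the channel axis, then a single convolution.
    avg_s = tf.reduce_mean(x, axis=-1, keepdims=True)
    max_s = tf.reduce_max(x, axis=-1, keepdims=True)
    ms = layers.Conv2D(1, spatial_kernel, padding="same",
                       activation="sigmoid")(tf.concat([avg_s, max_s], axis=-1))
    return x * ms                                                 # 2D spatial map M_s
```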
Self augmented attention (SAA) convolution. Self augmented attention (SAA) convolution computes a weighted average of values from hidden units, where the weights are produced dynamically via a similarity function. It can also capture long-range interactions among input signals and applies the dynamic weights obtained from the hidden units to the input. The SAA takes local information and re-calibrated global information into the convolution. That is, two heads of the feature subspace participate in the attention mechanism to obtain both spatial and channel-wise weighted maps, which are used to re-weight the corresponding locations of the input; these maps are finally concatenated with a point-wise convolution to form the enhanced convolution operation in SAA. Therefore, augmented convolutions are combined with the self-attention mechanism to obtain representative and abstract features, as displayed in the second row of Figure 3.
Self attention-augmented convolution is achieved by concatenating convolutional feature maps with self-attentional feature maps, which makes it capable of modeling longer-range dependencies (see Figure 5). First, we flatten the input matrix X of shape $(H, W, F_{in})$ to $(HW, F_{in})$ and apply multi-head attention as in the transformer architecture []. The output of the self-attention module for a head $h$ can be formulated as:

$$O_{h} = \mathrm{Softmax}\!\left(\frac{(XW_{q})(XW_{k})^{T}}{\sqrt{d_{k}^{h}}}\right)(XW_{v}),$$

where $W_{q} \in \mathbb{R}^{F_{in}\times d_{k}^{h}}$, $W_{k} \in \mathbb{R}^{F_{in}\times d_{k}^{h}}$, and $W_{v} \in \mathbb{R}^{F_{in}\times d_{v}^{h}}$ are learned linear transformations used to map the input X to queries $Q = XW_{q}$, keys $K = XW_{k}$, and values $V = XW_{v}$. The attention map uses the query matrix Q and key matrix K as weights for the values V, and we obtain the matrix $O_{h}$ via the expression above. Then, the outputs of all heads $(1, 2, 3, \ldots, N_{h})$ are concatenated as follows:

$$\mathrm{MHA}(X) = \mathrm{Concat}\big[O_{1}, O_{2}, O_{3}, \ldots, O_{N_{h}}\big]\,W^{O},$$

where $W^{O}$ is a linear transformation. $\mathrm{MHA}(X)$ is then reshaped into a tensor of shape $(H, W, d_{v})$ to match the original spatial dimensions.
Figure 5.
Attention-augmented convolution results.
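A minimal sketch of the attention-augmented convolution used by the SAA block is shown below, in the spirit of Bello et al.; it matches the saa_conv2d helper assumed in the architecture sketch above. The head count and the split between convolutional and attentional channels (dk, dv, num_heads) are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def saa_conv2d(x, filters, dk=32, dv=16, num_heads=4):
    """Attention-augmented convolution: concatenate a standard convolution
    with multi-head self-attention feature maps."""
    h, w = x.shape[1], x.shape[2]

    # Standard convolutional branch keeps the local features.
    conv_out = layers.Conv2D(filters - dv, 3, padding="same")(x)

    # Project the input to queries, keys, and values with a 1x1 convolution.
    qkv = layers.Conv2D(2 * dk + dv, 1)(x)
    q, k, v = tf.split(qkv, [dk, dk, dv], axis=-1)

    def split_heads(t, depth):
        # (B, H, W, depth) -> (B, heads, H*W, depth/heads)
        t = tf.reshape(t, [-1, h * w, num_heads, depth // num_heads])
        return tf.transpose(t, [0, 2, 1, 3])

    q = split_heads(q, dk) * (dk // num_heads) ** -0.5
    k = split_heads(k, dk)
    v = split_heads(v, dv)

    # Global self-attention over all spatial positions (long-range interactions).
    attn = tf.nn.softmax(tf.matmul(q, k, transpose_b=True), axis=-1)
    out = tf.matmul(attn, v)                       # (B, heads, H*W, dv/heads)
    out = tf.transpose(out, [0, 2, 1, 3])
    out = tf.reshape(out, [-1, h, w, dv])
    out = layers.Conv2D(dv, 1)(out)                # mix the heads (W^O)

    # Augment: concatenate convolutional and attentional feature maps.
    return tf.concat([conv_out, out], axis=-1)
```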
The comprehensive feature is expressed by the SAA feature $F_{SAA}$ and the FCN feature $F_{FCN}$:

$$F^{i}_{fuse} = f_{add}\big(F^{i}_{CBAM},\, F_{SAA}\big), \qquad F_{FCN} = f_{FCN}\big(F_{fuse}\big),$$

where the function $f_{add}$ is the sum of $F^{i}_{CBAM}$ and $F_{SAA}$, and the function $f_{FCN}$ is the series of convolutions in the FCN module. $F^{i}_{fuse}$ is the fused feature ($i$ = 1, 2, 3, 4), $F^{i}_{CBAM}$ is the CBAM feature, $F_{SAA}$ is the SAA feature, and $F_{FCN}$ is the FCN feature.
3.2. Loss function
Efficient loss functions and deep CNN networks have been exploited in other SR methods []. To achieve better performance, we utilize the pixel-wise loss (e.g., L1 loss) to minimize the error between the real image and the SR result at the pixel level, which has been widely used in many image reconstruction problems []. The pixel-wise loss can achieve excellent performance in terms of Peak Signal-to-Noise Ratio (PSNR) but often introduces some artifacts. To avoid image artifacts, we also introduce the perceptual loss in our model. The perceptual loss reduces the feature gap between the real image and the reconstructed image at certain layers of VGG19 features, and it can be used to preserve semantic information and achieve better visual quality. In addition, the gradient loss is used to minimize the gradient difference between the real image and the reconstructed image in different directions. The combination of pixel-wise loss, gradient loss, and perceptual loss is applied to supervise the training process. The combined generator loss can be expressed as

$$L_{G} = a\,L_{pix} + b\,L_{per} + c\,L_{grad},$$
$$L_{pix} = \big\|I_{SR} - I_{HR}\big\|_{1}, \qquad L_{per} = \sum_{l}\big\|\phi_{l}(I_{SR}) - \phi_{l}(I_{HR})\big\|_{2}, \qquad L_{grad} = \big\|\nabla I_{SR} - \nabla I_{HR}\big\|_{1}.$$

In addition, the discriminator loss is

$$L_{D} = \mathbb{E}\big[D(I_{SR})\big] - \mathbb{E}\big[D(I_{HR})\big].$$

Here, $L_{pix}$, $L_{per}$, and $L_{grad}$ denote the pixel-wise loss, perceptual loss, and gradient loss, respectively; a, b, and c are weight values that are adjusted according to the training situation. $\nabla$ is the gradient computation, $\phi_{l}$ is the feature map of the $l$th layer of VGG19 ($l$ = 1, 2, 3, 4, 5), $\|\cdot\|_{1}$ indicates the L1 norm, and $\|\cdot\|_{2}$ indicates the L2 norm.
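A minimal sketch of the combined generator loss is given below. It assumes a pre-trained VGG19 from tf.keras.applications as the perceptual feature extractor, a single VGG19 layer (block5_conv4) instead of all five, single-channel PAN patches tiled to three channels, and omitted VGG preprocessing; these are implementation assumptions, not details reported in the paper. The adversarial critic term sketched earlier is added on top during training.

```python
import tensorflow as tf

# Perceptual features from one intermediate VGG19 layer (assumption: block5_conv4).
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
feat_model = tf.keras.Model(vgg.input, vgg.get_layer("block5_conv4").output)

def combined_generator_loss(sr, hr, a=0.22, b=0.43, c=0.35):
    # Pixel-wise L1 loss.
    l_pix = tf.reduce_mean(tf.abs(sr - hr))

    # Perceptual loss: distance between VGG19 feature maps (grayscale tiled to RGB).
    sr_f = feat_model(tf.image.grayscale_to_rgb(sr))
    hr_f = feat_model(tf.image.grayscale_to_rgb(hr))
    l_per = tf.reduce_mean(tf.square(sr_f - hr_f))

    # Gradient loss: match horizontal and vertical image gradients.
    sr_dy, sr_dx = tf.image.image_gradients(sr)
    hr_dy, hr_dx = tf.image.image_gradients(hr)
    l_grad = tf.reduce_mean(tf.abs(sr_dx - hr_dx)) + tf.reduce_mean(tf.abs(sr_dy - hr_dy))

    return a * l_pix + b * l_per + c * l_grad
```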
We conducted three experiments using different combinations of these losses to validate the effectiveness of the comprehensive loss. It can be seen from Table 1 that the combined loss acting on the generative network improves the performance as expected, from 32.23 dB to 33.29 dB. These comparisons firmly demonstrate the effectiveness of the combined loss.
Table 1.
Test of different loss on the Set5 dataset.
4. Experimental Evaluation
In this part, we conduct an experimental comparison with state-of-the-art deep learning methods, including SRCNN [], VDSR [], EDSR [], LapSRN [], RCAN [], ESPCN [], RDN [], SRGAN [], and CGAN []. The baselines are re-implemented based on the source code provided by the respective authors. We implement our models with the TensorFlow framework and train them on an NVIDIA Titan V GPU. In the following subsections, we provide reasonable settings for the implementation details and parameters of our SR model.
4.1. Implementation Details
We use the DIV2K dataset, a high-quality (2K resolution) dataset with 800 images, for our training. The training samples are randomly cropped from the original images with a fixed size of 64 × 64.
The generative model is trained using the loss function in Equation (11) with a = 0.22, b = 0.43, and c = 0.35. The learning rate is initialized to a fixed value and decayed at regular intervals of mini-batch updates. For optimization, we used Adam with $\beta_1$ = 0.9 and $\beta_2$ = 0.999. We alternately updated the generator and discriminator networks until the model converged.
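The alternating update scheme above translates roughly to the following TensorFlow training step, reusing the loss functions sketched in Section 3; the learning rate 1e-4 is a placeholder because the exact value and decay schedule were not recoverable from the text, while the Adam beta values follow the paper.

```python
import tensorflow as tf

# Adam optimizers for generator and critic; the learning rate is a placeholder assumption.
gen_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999)
disc_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999)

@tf.function
def train_step(lr_patch, hr_patch, generator, critic):
    # Critic update first, then the generator, alternating as described above.
    with tf.GradientTape() as d_tape:
        sr = generator(lr_patch, training=True)
        d_loss = critic_loss(critic, hr_patch, sr)           # WGAN-GP loss from Section 3.1
    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, critic.trainable_variables),
                                 critic.trainable_variables))

    with tf.GradientTape() as g_tape:
        sr = generator(lr_patch, training=True)
        g_loss = (combined_generator_loss(sr, hr_patch)       # pixel + perceptual + gradient
                  + wgan_generator_loss(critic, sr))          # adversarial term
    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    return d_loss, g_loss
```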
4.2. Comparisons and Results
To validate the effects of self-attention, we carried out a series of experiments involving the following three parts:
- Comparison test between the ablated model without SAA convolution and SAA-WGAN.
- Comparison test on the PAN datasets of DOTA and GEO, with PSNR and SSIM evaluation on GEO images.
- Comparison with benchmark networks on classic datasets, including Set5, Set14, BSD100, and Urban100.
First, we test the performance of SAA-WGAN and the ablated model without SAA on the DOTA dataset, and some results are shown in Figure 6. The details of the airplane, the shade of the tree, and the house in Figure 6 are significantly improved because of the SAA's ability to keep long-distance details, indicating that self-attention can improve the network performance.
Figure 6.
Panchromatic (PAN) image super-resolution (SR) from GEO.
Then, comparisons are conducted on DOTA and GEO images using state-of-the-art algorithms, demonstrating the performance superiority of SAA-WGAN. The results of the SAA-WGAN are displayed in Figure 7, Figure 8 and Figure 9. We show visual comparisons of different benchmark algorithms at scale ×4 in Figure 7. As can be seen, all the compared methods suffer from blurring artifacts to varying degrees and fail to recover finer details. In contrast, our SAA-WGAN can recover these details clearly and is more faithful to the ground truth. Because the resolution of the original images is very high, the PAN images are cropped into 64 × 64 patches.
Figure 7.
DOTA image for x4 scale SR.
Figure 8.
PAN image SR from GEO (scale = 4).
Figure 9.
SR results from DOTA datasets (scale = 4).
To further illustrate the universality of SAA-WGAN on other datasets, we compare our method with 8 state-of-the-art methods (Bicubic, SRCNN, SCN, VDSR, LapSRN, EDSR, RDN, RCAN) on the most widely used SR datasets, i.e., Set5, Set14, BSD100, and Urban100. More comparisons of PSNR/SSIM are provided in Table 2, which shows quantitative comparisons for ×2, ×4, and ×8 SR. The best results are annotated with blue text in Table 2. It demonstrates that our method achieves the best performance on almost all the datasets with all scaling factors.
Table 2.
SR results of benchmark.
We also find that, when the scaling factor becomes larger (e.g., 8), the PSNR gain of our method also becomes larger. When the scale factor is 2, the PSNR gain of our method tested on BSD100 and Urban100 exceeds RCAN by 1.5 dB and 1.2 dB, respectively. Similarly, on the same two datasets with a scale factor of 4, the proposed method gains 2.2 dB and 1.4 dB over RCAN, respectively. When the scale factor is 8, the PSNR gain of this method exceeds RCAN by 2.99 dB and 1.97 dB, respectively. This observation shows that a deeper network structure and a powerful attention mechanism can improve network performance.
Figure 10 is the objective evaluation on the image patches of Figure 8. Among the comparison algorithms, the performance curves of SRGAN and ESPCN are significantly higher than those of the other five algorithms. SRGAN uses a discriminative network that can extract semantic information to obtain more useful features, while ESPCN adopts a reconstruction strategy of concentrating multiple channel features to form a fused feature map that exploits the relationship across channels. However, the evaluation indicators of SRGAN and ESPCN still cannot exceed those of the proposed method. The PSNR of SAA-WGAN reaches 32 dB, and the SSIM curve fluctuates around 0.92; this is achieved by its ability to extract attention features from hidden units using SAA and CBAM, making it superior to the other comparison algorithms in both subjective vision and objective evaluation.
Figure 10.
The metrics of GEO image (The proposed SAA-WGAN is the purple curve.).
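The PSNR and SSIM values used in these comparisons can be computed with the standard TensorFlow image metrics; the following is a minimal sketch assuming 8-bit, single-channel image tensors (whether the original evaluation was performed on a luminance channel is not stated here).

```python
import tensorflow as tf

def evaluate_pair(sr, hr):
    """Compute mean PSNR (dB) and SSIM for a batch of SR/HR image pairs in [0, 255]."""
    psnr = tf.image.psnr(sr, hr, max_val=255)
    ssim = tf.image.ssim(sr, hr, max_val=255)
    return tf.reduce_mean(psnr), tf.reduce_mean(ssim)
```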
5. Conclusions
We propose a self attention-augmented network, SAA-WGAN, for PAN image SR. SAA-WGAN uses the EDM to extract multi-scale information and utilizes the FCN to reconstruct HR images. The CBAM and the SAA are embedded into the SAA-WGAN to enhance multi-scale feature representation and make use of the relationship in both spatial and channel subspaces. Further, the pixel loss, perceptual loss, and gradient loss are combined to supervise the training process. Extensive experiments on benchmark datasets and PAN images demonstrate the effectiveness of our proposed SAA-WGAN.
Author Contributions
J.D. made substantial contributions to the algorithm design. K.C. and J.D. made substantial contributions to the algorithm implementation and manuscript preparation. D.W. and J.D. made substantial contributions to the experimental data collection and the result analysis. H.Z. and Y.Y. revised the final manuscript. All authors have read and agreed to the published version of the manuscript.
Funding
The work was supported by the National Natural Science Foundation of China (61705173, 51801142), Equipment Development Research Project of China (61404140506, 61409230214, 61404130316), Aeronautical Science Foundation of China (201901081002), International exchange project (B17035), and the short-term study abroad program of doctoral students of Xidian University.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
DOTA dataset is available at https://captain-whu.github.io/DOTA.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Courtrai, L.; Pham, M.T.; Lefèvre, S. Small Object Detection in Remote Sensing Images Based on Super-Resolution with Auxiliary Generative Adversarial Networks. Remote Sens. 2020, 12, 3152. [Google Scholar] [CrossRef]
- Wang, P.; Zhang, G.; Hao, S.; Wang, L. Improving Remote Sensing Image Super-Resolution Mapping Based on the Spatial Attraction Model by Utilizing the Pansharpening Technique. Remote Sens. 2019, 11, 247. [Google Scholar] [CrossRef]
- Ma, W.; Pan, Z.; Yuan, F.; Lei, B. Super-Resolution of Remote Sensing Images via a Dense Residual Generative Adversarial Network. Remote Sens. 2019, 11, 2578. [Google Scholar] [CrossRef]
- Du, J.; Song, J.; Cheng, K.; Zhang, Z.; Zhou, H.; Qin, H. Efficient Spatial Pyramid of Dilated Convolution and Bottleneck Network for Zero-Shot Super Resolution. IEEE Access 2020, 8, 117961–117971. [Google Scholar] [CrossRef]
- Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.; Liao, Q. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Trans. Multimed. 2019, 21, 3106–3121. [Google Scholar] [CrossRef]
- Khetan, A.; Oh, S. Achieving budget-optimality with adaptive schemes in crowdsourcing. In Advances in Neural Information Processing Systems; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29, pp. 4844–4852. [Google Scholar]
- Zhang, Y.; Li, K.; Kai, L.; Wang, L.; Yun, F. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. CoRR 2017. Available online: http://xxx.lanl.gov/abs/1706.03762 (accessed on 6 December 2017).
- Hu, Y.; Li, J.; Huang, Y.; Gao, X. Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 3911–3927. [Google Scholar] [CrossRef]
- Bello, I.; Zoph, B.; Vaswani, A.; Shlens, J.; Le, Q.V. Attention Augmented Convolutional Networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems; 2017; Available online: https://arxiv.org/pdf/1704.00028.pdf (accessed on 6 December 2017).
- Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Dai, T.; Cai, J. Second-order Attention Network for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Wang, L.; Wang, Y.; Liang, Z.; Lin, Z.; Yang, J.; An, W.; Guo, Y. Learning Parallax Attention for Stereo Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Weng, L. From GAN to WGAN. Available online: https://arxiv.org/pdf/1904.08994.pdf (accessed on 18 April 2019).
- Shamsolmoali, P.; Zareapoor, M.; Wang, R.; Jain, D.K.; Yang, J. G-GANISR: Gradual generative adversarial network for image super resolution. Neurocomputing 2019, 366, 140–153. [Google Scholar] [CrossRef]
- Du, J.; Zhou, H.; Qian, K.; Tan, W.; Yu, Y. RGB-IR Cross Input and Sub-Pixel Upsampling Network for Infrared Image Super-Resolution. Sensors 2020, 20, 281. [Google Scholar] [CrossRef] [PubMed]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Fast and accurate image super-resolution with deep laplacian pyramid networks. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2599–2613. [Google Scholar] [CrossRef] [PubMed]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- Zhang, X.; Song, H.; Zhang, K.; Qiao, J.; Liu, Q. Single image super-resolution with enhanced Laplacian pyramid network via conditional generative adversarial learning. Neurocomputing 2020, 398, 531–538. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).