Article

Enhancing Remote Sensing Image Super-Resolution Guided by Bicubic-Downsampled Low-Resolution Image

1 Department of Civil and Environmental Engineering, Seoul National University, Seoul 08826, Republic of Korea
2 Lyles School of Civil Engineering, Purdue University, West Lafayette, IN 47907, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(13), 3309; https://doi.org/10.3390/rs15133309
Submission received: 4 May 2023 / Revised: 23 June 2023 / Accepted: 26 June 2023 / Published: 28 June 2023
(This article belongs to the Special Issue Advanced Super-resolution Methods in Remote Sensing)

Abstract

Image super-resolution (SR) is a significant technique in image processing as it enhances the spatial resolution of images, enabling various downstream applications. Based on recent achievements in SR studies in computer vision, deep-learning-based SR methods have been widely investigated for remote sensing images. In this study, we proposed a two-stage approach called bicubic-downsampled low-resolution (LR) image-guided generative adversarial network (BLG-GAN) for remote sensing image super-resolution. The proposed BLG-GAN method divides the image super-resolution procedure into two stages: LR image transfer and super-resolution. In the LR image transfer stage, real-world LR images are restored to less blurry and noisy bicubic-like LR images using guidance from synthetic LR images obtained through bicubic downsampling. Subsequently, the generated bicubic-like LR images are used as inputs to the SR network, which learns the mapping between the bicubic-like LR image and the corresponding high-resolution (HR) image. By approaching the SR problem as finding optimal solutions for subproblems, the BLG-GAN achieves superior results compared to state-of-the-art models, even with a smaller overall capacity of the SR network. As the BLG-GAN utilizes a synthetic LR image as a bridge between real-world LR and HR images, the proposed method shows improved image quality compared to the SR models trained to learn the direct mapping from a real-world LR image to an HR image. Experimental results on HR satellite image datasets demonstrate the effectiveness of the proposed method in improving perceptual quality and preserving image fidelity.

1. Introduction

Image super-resolution (SR) refers to the task of reconstructing high-resolution (HR) images from their low-resolution (LR) counterparts [1], and it has great significance in image processing, enabling various downstream applications [2]. However, image super-resolution is a well-known ill-posed problem because a single LR image can correspond to multiple HR images. Recent SR studies have addressed this problem by leveraging deep learning networks and achieved remarkable performance improvements compared to conventional example-based methods [3], even in the absence of prior information [4,5,6,7,8,9,10].
Since the advent of the deep-learning-based SR approach [4], several studies have devised deeper networks using various learning strategies such as residual [6,9,11,12,13,14], recursive [6,7], and adversarial [9,15] learning. Deep-learning-based SR models can be categorized into two groups: convolutional neural network (CNN)-based models and generative adversarial network (GAN)-based models [16]. The first deep-learning-based SR model, SRCNN, is a CNN-based model composed of three convolutional layers corresponding to patch extraction, nonlinear mapping, and reconstruction [4]. Following the success of the SRCNN, CNN-based models such as VDSR [6] and EDSR [11] have been widely developed to fully leverage the learning capability of deeper networks. Currently, a residual network with a stack of residual blocks is regarded as the basic structure of CNN-based SR models. Zhang et al. [14] proposed RDN, which enhances the residual block with residual dense blocks that use dense local connections. In [13], a channel attention mechanism was adopted within a residual structure, demonstrating a significant improvement in SR image quality. However, because most CNN-based models are trained to optimize pixel-wise losses, such as the mean squared error (MSE) loss or L1 loss, these models are prone to producing overly smoothed SR outputs and struggle to recover realistic textures [17].
Compared with CNN-based models, GAN-based SR models generate more realistic textures by introducing adversarial training into existing CNN-based models [18]. The basic principle of a GAN is to train two networks (generator and discriminator) simultaneously for opposite purposes. The discriminator is trained to distinguish real HR images from SR images, whereas the generator is trained to produce realistic SR images to fool the discriminator. SRGAN [9] and ESRGAN [15] integrated two additional losses (perceptual and adversarial losses) into a loss function and improved the perceptual quality of the SR results.
A GAN-based SR approach has also been employed for remote sensing image processing to improve the perceptual quality [19]. Jiang et al. [20] complemented a GAN-based model by incorporating a subnetwork for edge enhancement, which refines edge information from satellite image datasets. Furthermore, Rabbi et al. [21] proposed EESRGAN, which trained the SR and object detection networks end-to-end and attempted to enhance the SR performance by using the detector loss from the subsequent object detection network. Liu et al. [22] proposed SG-GAN to benefit from employing a downstream task network by applying a pre-trained saliency detection model to the outputs of the SR network.
In general, deep-learning-based SR methods require LR images and their corresponding HR images as the training dataset. However, owing to the difficulties in obtaining real-world LR-HR datasets, most SR studies have only used HR images and generated LR images by applying degradations to the HR images. The most commonly used method for generating LR images from HR images is downsampling by bicubic interpolation with a predefined scale factor [6,8,9,10,12,15,23]. However, SR models trained on such simple degradation do not reflect the properties of real-world degradation and often show deteriorated performance when applied to real-world LR images. Therefore, some researchers have attempted to alleviate the gap between simple downsampling and real-world image degradation by applying a blur kernel and noise [4,13,14,24,25,26,27]. Alternatively, several tailored datasets have been constructed, such as RealSR [28], DRealSR [29], and SR-RAW [30], which are more targeted at real-world image super-resolution. These datasets comprise real-world LR-HR image pairs obtained by adjusting the camera’s focal length. Similarly, deep-learning-based SR models for remote sensing images commonly use predefined degradation to generate synthetic LR-HR datasets for training and validation [21,22]. Some recent studies adopted a degrader [31] or a downsample generator [32] within the deep-learning architecture so that the model learns both image degradation and super-resolution.
For HR satellite images, image datasets are usually provided as pairs of panchromatic (PAN) and multispectral (MS) images. Thus, these paired images provide a favorable opportunity for constructing real-world LR-HR image datasets. In this study, to train and validate the proposed model on real-world LR-HR image datasets, we performed pansharpening [33] using paired PAN and MS images from WorldView-3 (WV3) to generate real-world LR-HR remote sensing image datasets. The pansharpened and original MS images were then used as the HR and LR images, respectively. The scale factor was set to 4, based on the scale ratio of the PAN and MS images. The experimental results from the overall study were obtained from SR models trained on real-world LR-HR image datasets. A detailed description of the datasets used in this study is provided in Section 3.1.
Figure 1 demonstrates the difference between real-world and synthetic LR images. The ground objects are discernible in the bicubic-downsampled LR images (Figure 1a), whereas the clarity of the object boundaries is diminished in the real-world LR images (Figure 1b) because of blurring. Therefore, SR models trained on synthetic LR images from bicubic downsampling often fail to achieve satisfactory SR performance on real-world LR images. Furthermore, we observed that the SR models demonstrated better SR performance when trained on synthetic LR-HR image datasets than when trained on real-world LR-HR image datasets (see Appendix A).
Based on these observations, we inferred that refining the input LR image is as crucial for enhancing SR performance as designing a complex SR network architecture. Thus, this study proposed a bicubic-downsampled LR image-guided generative adversarial network (BLG-GAN) for the super-resolution of remote sensing images. The BLG-GAN performs super-resolution on real-world LR images under the guidance of clean synthetic LR images obtained through a simple bicubic operation. By dividing the SR problem into subproblems with separate networks, the learning objective of each network becomes clearer. As a result, the training process of the BLG-GAN is more stable than that of deep networks trained to learn a direct mapping between real-world LR and HR images.
To the best of our knowledge, this is the first study to introduce a training strategy that uses a synthetic LR image from bicubic downsampling to guide the supervised image super-resolution of remote sensing images. Moreover, we investigated the effectiveness of our method by comparing it with state-of-the-art methods and thoroughly analyzed the influence of its components on SR performance.
The remainder of this study is organized as follows. Section 2 presents the architecture of the proposed BLG-GAN model. Section 3 presents the experimental results on the WV3 datasets. Section 4 validates the effectiveness of the proposed method through ablation studies on the network architecture and the type of loss. Finally, Section 5 presents the conclusions of this study.

2. Methodology

The proposed model aims to learn the mapping from the real-world LR image domain $X$ to the HR image domain $Y$ from the given training samples $x \in X$ and $y \in Y$, with the guidance of bicubic-downsampled LR images. While a real-world LR image $x$ is obtained from the MS bands of WV3, a synthetic LR image is generated from an HR image $y$ by bicubic downsampling and denoted as $y_b \in Y_b$. Inspired by [34,35,36], we regard $y_b$ as a “clean LR image” that contains less corruption, such as blur and noise. Thus, we use these bicubic-downsampled LR images as a bridge between the real-world LR images and the corresponding HR images to restore clear details from the clean LR images. Applying image transfer to the input LR image beforehand reduces corruption within the real-world LR image and thereby affects the quality of the outputs of the subsequent SR process.
As shown in Figure 2, the proposed BLG-GAN model consists of two stages: LR image transfer and super-resolution. In the LR image transfer stage, the LR images are processed through $G_{X \to Y_b}$ to generate LR images whose characteristics and distribution resemble those of the synthetic LR images, referred to as “bicubic-like LR images”. The output of the LR image transfer stage is then fed into the generator with upsampling blocks ($G_{Y_b \to Y}$) for super-resolution. Both stages include a generator and a discriminator to adopt adversarial training for generating bicubic-like LR and SR images. Each generator is trained to fool its corresponding discriminator and produce bicubic-like LR or SR images, whereas each discriminator is trained to distinguish whether the generated image is real or fake. The following subsections provide detailed explanations of each stage.
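To make the two-stage pipeline concrete, the following PyTorch sketch traces a real-world LR patch through both stages. The `LRTransferNet` and `SRNet` modules are simplified placeholders for the RCAN-based generators described in Section 2.3, and all layer sizes are illustrative assumptions rather than the configuration actually used.

```python
import torch
import torch.nn as nn

class LRTransferNet(nn.Module):
    """Stage 1: real-world LR -> bicubic-like LR (same spatial size)."""
    def __init__(self, channels=3, feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # residual refinement, no change in scale

class SRNet(nn.Module):
    """Stage 2: bicubic-like LR -> HR (x4 upsampling via pixel shuffle)."""
    def __init__(self, channels=3, feats=64, scale=4):
        super().__init__()
        self.head = nn.Conv2d(channels, feats, 3, padding=1)
        self.up = nn.Sequential(
            nn.Conv2d(feats, feats * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(feats, channels, 3, padding=1),
        )

    def forward(self, y_b):
        return self.up(self.head(y_b))

g_x2yb, g_yb2y = LRTransferNet(), SRNet()
x = torch.rand(1, 3, 32, 32)      # real-world LR patch (MS image crop)
y_b_hat = g_x2yb(x)               # Stage 1: bicubic-like LR, 32 x 32
y_hat = g_yb2y(y_b_hat)           # Stage 2: SR output, 128 x 128
print(y_b_hat.shape, y_hat.shape)
```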

2.1. LR Image Transfer

In LR image transfer, the generator $G_{X \to Y_b}$ learns the mapping from the LR domain $X$ to the bicubic-like LR domain $Y_b$, as illustrated in Stage 1 of Figure 2. For a given input LR image $x$, $G_{X \to Y_b}$ generates a bicubic-like LR image $\hat{y}_b$, which looks similar to a synthetic LR image $y_b$. This LR image transfer process can be formulated as:

$$\hat{y}_b = G_{X \to Y_b}(x).$$

Through adversarial training, $G_{X \to Y_b}$ is trained to fool the corresponding discriminator $D_{Y_b}$ with the generated bicubic-like LR image $\hat{y}_b$. Meanwhile, $D_{Y_b}$ is trained to classify the generated LR image $\hat{y}_b$ as fake and the synthetic LR image $y_b$ as real.
The generator loss for LR image transfer consists of two terms: the pixel-wise loss $\mathcal{L}_{pix}^{LR}$ and the adversarial loss $\mathcal{L}_{adv}^{LR}$. The pixel-wise loss measures the $\ell_1$-distance between $\hat{y}_b$ and $y_b$. For the adversarial loss, we chose LSGAN [37], which uses a least-squares loss instead of the negative log-likelihood loss; LSGAN is known to stabilize the learning process while achieving higher SR performance than the standard GAN [38]. The two losses are formulated as:

$$\mathcal{L}_{pix}^{LR} = \frac{1}{N}\sum_{i=1}^{N} \left\| G_{X \to Y_b}(x_i) - y_{b,i} \right\|_1,$$

$$\mathcal{L}_{adv}^{LR} = \frac{1}{N}\sum_{i=1}^{N} \left[ D_{Y_b}\!\left( G_{X \to Y_b}(x_i) \right) - 1 \right]^2,$$

where $N$ denotes the number of training samples. The discriminator loss for $D_{Y_b}$ can be formulated as:

$$\mathcal{L}_{D}^{LR} = \frac{1}{N}\sum_{i=1}^{N} \left[ \left( D_{Y_b}(y_{b,i}) - 1 \right)^2 + D_{Y_b}\!\left( G_{X \to Y_b}(x_i) \right)^2 \right].$$

Finally, the total loss for the generator $G_{X \to Y_b}$ can be expressed as the weighted sum of the pixel-wise loss $\mathcal{L}_{pix}^{LR}$ and the adversarial loss $\mathcal{L}_{adv}^{LR}$:

$$\mathcal{L}_{G}^{LR} = \mathcal{L}_{pix}^{LR} + \omega_1 \mathcal{L}_{adv}^{LR},$$

where $\omega_1$ is the weight of the adversarial loss for LR images.
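A minimal PyTorch sketch of these Stage-1 objectives is given below. The generator and discriminator modules (here called `g_x2yb` and `d_yb`) are assumed to be defined elsewhere, and the weight $\omega_1 = 0.001$ follows the setting in Section 3.3.

```python
import torch
import torch.nn.functional as F

def stage1_generator_loss(g_x2yb, d_yb, x, y_b, w1=1e-3):
    """L_G^LR = L_pix^LR + w1 * L_adv^LR (L1 pixel term + LSGAN term)."""
    y_b_hat = g_x2yb(x)
    l_pix = F.l1_loss(y_b_hat, y_b)                  # L_pix^LR
    l_adv = torch.mean((d_yb(y_b_hat) - 1.0) ** 2)   # LSGAN generator term
    return l_pix + w1 * l_adv, y_b_hat

def stage1_discriminator_loss(d_yb, y_b, y_b_hat):
    """L_D^LR: push bicubic LR toward 1 and generated LR toward 0."""
    real = torch.mean((d_yb(y_b) - 1.0) ** 2)
    fake = torch.mean(d_yb(y_b_hat.detach()) ** 2)
    return real + fake
```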

2.2. Super-Resolution

Using the LR image generated by the preceding LR image transfer as input, the generator for super-resolution ($G_{Y_b \to Y}$) learns the mapping from the bicubic-like LR domain $Y_b$ to the HR domain $Y$. As shown in Stage 2 of Figure 2, the output of $G_{X \to Y_b}$, which is a bicubic-like LR image $\hat{y}_b$, is fed into the SR network $G_{Y_b \to Y}$ to produce an SR image $\hat{y}$. In the training phase, the discriminator $D_Y$ interacts with $G_{Y_b \to Y}$ and helps the network generate SR images similar to the corresponding HR images $y$. The super-resolution process can be formulated as follows:

$$\hat{y} = G_{Y_b \to Y}(\hat{y}_b) = G_{Y_b \to Y}\!\left( G_{X \to Y_b}(x) \right) = \left( G_{Y_b \to Y} \circ G_{X \to Y_b} \right)(x).$$

We denote the consecutive application of $G_{X \to Y_b}$ and $G_{Y_b \to Y}$ as $G_{Y_b \to Y} \circ G_{X \to Y_b}$. As in the LR image transfer, $G_{Y_b \to Y}$ is trained to fool the corresponding discriminator $D_Y$ with the generated SR image $\hat{y}$, whereas $D_Y$ is trained to classify the generated SR image $\hat{y}$ as fake and the ground-truth HR image $y$ as real.
The generator loss for super-resolution consists of three terms: the pixel-wise loss $\mathcal{L}_{pix}^{HR}$, the perceptual loss $\mathcal{L}_{per}^{HR}$, and the adversarial loss $\mathcal{L}_{adv}^{HR}$. As in the LR image transfer, we chose the L1 norm for the pixel-wise loss and LSGAN for the adversarial loss. The pixel-wise and adversarial losses for the HR image are formulated as:

$$\mathcal{L}_{pix}^{HR} = \frac{1}{N}\sum_{i=1}^{N} \left\| G_{Y_b \to Y}(\hat{y}_{b,i}) - y_i \right\|_1,$$

$$\mathcal{L}_{adv}^{HR} = \frac{1}{N}\sum_{i=1}^{N} \left[ D_Y\!\left( G_{Y_b \to Y}(\hat{y}_{b,i}) \right) - 1 \right]^2.$$

The discriminator loss for $D_Y$ can then be formulated as:

$$\mathcal{L}_{D}^{HR} = \frac{1}{N}\sum_{i=1}^{N} \left[ \left( D_Y(y_i) - 1 \right)^2 + D_Y\!\left( G_{Y_b \to Y}(\hat{y}_{b,i}) \right)^2 \right].$$

In addition, we include the perceptual loss $\mathcal{L}_{per}^{HR}$ between $\hat{y}$ and $y$. For the perceptual loss, we adopted the learned perceptual image patch similarity (LPIPS) [39], which measures the perceptual similarity of images using multi-layer features. Recent SR studies have verified the usefulness of LPIPS as a perceptual loss by achieving high ranks in SR challenges [40,41]. In Section 4.3, we also compare the LPIPS-based perceptual loss with the commonly used VGG-based perceptual loss.
The total loss for the generator $G_{Y_b \to Y}$ can be expressed as the weighted sum of $\mathcal{L}_{pix}^{HR}$, $\mathcal{L}_{adv}^{HR}$, and $\mathcal{L}_{per}^{HR}$:

$$\mathcal{L}_{G}^{HR} = \mathcal{L}_{pix}^{HR} + \lambda_1 \mathcal{L}_{adv}^{HR} + \lambda_2 \mathcal{L}_{per}^{HR},$$

where $\lambda_1$ and $\lambda_2$ are the weights of the adversarial and perceptual losses for HR images, respectively.
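The Stage-2 generator objective can be sketched as follows, using the publicly available `lpips` package for the perceptual term. The function and variable names are illustrative, the images are assumed to be scaled to [-1, 1] as the package expects, and the weights follow Section 3.3.

```python
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_vgg = lpips.LPIPS(net='vgg')  # VGG-based LPIPS, as adopted in this work

def stage2_generator_loss(g_yb2y, d_y, y_b_hat, y, lam1=1e-3, lam2=1e-2):
    """L_G^HR = L_pix^HR + lam1 * L_adv^HR + lam2 * L_per^HR."""
    y_hat = g_yb2y(y_b_hat)
    l_pix = F.l1_loss(y_hat, y)                  # L1 pixel-wise loss
    l_adv = torch.mean((d_y(y_hat) - 1.0) ** 2)  # LSGAN generator term
    l_per = lpips_vgg(y_hat, y).mean()           # LPIPS perceptual loss
    return l_pix + lam1 * l_adv + lam2 * l_per, y_hat
```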

2.3. Network Architecture

The proposed SR network consists of two generators and two discriminators: one generator-discriminator pair for LR image transfer and one for super-resolution. In this section, the architecture of each network component is described.

2.3.1. Generator

For the two generators ($G_{X \to Y_b}$ and $G_{Y_b \to Y}$) in the proposed model, we adopted the network architecture of residual channel attention networks (RCAN) [13] (Figure 3), considering its superior SR performance even without a discriminator. RCAN is based on the residual-in-residual (RIR) architecture with several residual groups and long skip connections. Each residual group comprises multiple residual channel attention blocks (RCABs). As shown in Figure 3b, the RCAB integrates channel attention into the residual block to extract channel-wise features, which considerably enhances the image quality of the SR outputs. The effectiveness of the RCAN-based generator is further investigated in Section 4.1 through a comparison with other generator architectures.
Although the basic architecture of the two generators is almost identical, we adjusted the network capacity by setting the number of residual groups and the number of RCABs per residual group to (5, 10) for $G_{X \to Y_b}$ and (5, 20) for $G_{Y_b \to Y}$. Even though the total generative network ($G_{Y_b \to Y} \circ G_{X \to Y_b}$) is smaller than the original RCAN model, which has 10 residual groups with 20 RCABs each, BLG-GAN achieves superior SR performance by dividing the SR problem into subproblems. In addition, $G_{X \to Y_b}$ does not include upsampling blocks because the input and output images have the same scale in the LR image transfer.
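A minimal sketch of a residual channel attention block in the spirit of RCAN [13] is shown below; the reduction ratio and feature width are illustrative assumptions, not the exact settings used in this study.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention used inside an RCAB."""
    def __init__(self, feats, reduction=16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global average pooling
            nn.Conv2d(feats, feats // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feats // reduction, feats, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)                      # channel-wise rescaling

class RCAB(nn.Module):
    """Residual channel attention block: conv-ReLU-conv, channel attention, short skip."""
    def __init__(self, feats=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1),
            ChannelAttention(feats),
        )

    def forward(self, x):
        return x + self.body(x)

residual_group = nn.Sequential(*[RCAB() for _ in range(20)])  # 20 RCABs per group
print(residual_group(torch.rand(1, 64, 32, 32)).shape)
```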

2.3.2. Discriminator

The discriminators $D_{Y_b}$ and $D_Y$ share the same structure, based on the patchGAN architecture [42] (Figure 4). The patchGAN consists of four convolutional layers in which the number of features increases from 64 to 512 by a factor of two, followed by a final convolutional layer. The output features represent patch-based decisions on whether each image region is real or fake. To discriminate between the generated SR and HR images, we used a 70 × 70 patchGAN discriminator for $D_Y$. For the discriminator of the generated bicubic-like LR images ($D_{Y_b}$), we changed the stride of the first three convolutional layers of $D_Y$ from two to one [34], because the LR images are smaller than 70 × 70 pixels. As a result, the receptive field of $D_{Y_b}$ is reduced to 16 × 16 pixels for LR images.
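The following sketch illustrates the patchGAN discriminator and the stride-1 variant used for LR images; the normalization layers of the original patchGAN are omitted for brevity, and the exact layer settings are assumptions.

```python
import torch
import torch.nn as nn

def patchgan(in_ch=3, lr_variant=False):
    # The LR variant uses stride 1 in the first three layers (instead of 2),
    # shrinking the receptive field from roughly 70 x 70 to about 16 x 16.
    strides = [1, 1, 1, 1] if lr_variant else [2, 2, 2, 1]
    feats = [64, 128, 256, 512]
    layers, prev = [], in_ch
    for f, s in zip(feats, strides):
        layers += [nn.Conv2d(prev, f, 4, stride=s, padding=1),
                   nn.LeakyReLU(0.2, inplace=True)]
        prev = f
    layers += [nn.Conv2d(prev, 1, 4, stride=1, padding=1)]  # patch-wise real/fake map
    return nn.Sequential(*layers)

d_y = patchgan()                   # discriminator for SR/HR images
d_yb = patchgan(lr_variant=True)   # discriminator for bicubic-like LR images
print(d_y(torch.rand(1, 3, 128, 128)).shape, d_yb(torch.rand(1, 3, 32, 32)).shape)
```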

3. Experimental Results

Here, we describe the datasets used in this study and the quantitative assessment metrics used for the evaluation. Based on those metrics, the proposed method was compared with state-of-the-art techniques to verify the effectiveness of BLG-GAN.

3.1. Datasets

This study used HR satellite images from the WV3 sensor, which provides PAN and MS bands with spatial resolutions of 0.31 m and 1.24 m, respectively. Two WV3 datasets, WV3-1 and WV3-2 (Figure 5), were generated from two WV3 images captured over the Pyeongdong Industrial Complex in Gwangju, Republic of Korea, with a temporal interval of approximately one year (26 May 2017 and 4 May 2018). The scenes contain various land-cover types and objects, including urban areas, paddy fields, grasslands, forests, and rivers. To construct the LR-HR datasets, we adopted the Gram–Schmidt adaptive (GSA) algorithm [33] to pansharpen the paired PAN and MS images and generate HR images. The image quality assessment results for the pansharpened images are provided in Appendix B. For training and testing the SR models, the HR images were divided into sub-images of 512 × 512 pixels, corresponding to sub-images of 128 × 128 pixels for the LR images. The resulting datasets comprise 1208 images for WV3-1 and 1136 images for WV3-2, which were split into training, validation, and test sets containing 60%, 20%, and 20% of the total dataset, respectively.
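As an illustration of the dataset construction described above, the following sketch tiles an HR array and its LR counterpart into aligned 512 × 512 / 128 × 128 sub-images and performs the 60/20/20 split; the in-memory array inputs and the random seed are assumptions, not the authors' exact procedure.

```python
import numpy as np

def tile_pair(hr, lr, hr_size=512, scale=4):
    """Cut an HR array (H, W, C) and its aligned LR counterpart into sub-image pairs."""
    lr_size = hr_size // scale                 # 128 px for the MS (LR) image
    pairs = []
    for i in range(0, hr.shape[0] - hr_size + 1, hr_size):
        for j in range(0, hr.shape[1] - hr_size + 1, hr_size):
            hr_tile = hr[i:i + hr_size, j:j + hr_size]
            lr_tile = lr[i // scale:i // scale + lr_size,
                         j // scale:j // scale + lr_size]
            pairs.append((lr_tile, hr_tile))
    return pairs

def split_dataset(pairs, seed=0):
    """Shuffle and split into 60% training, 20% validation, 20% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))
    n_train, n_val = int(0.6 * len(pairs)), int(0.2 * len(pairs))
    train = [pairs[i] for i in idx[:n_train]]
    val = [pairs[i] for i in idx[n_train:n_train + n_val]]
    test = [pairs[i] for i in idx[n_train + n_val:]]
    return train, val, test
```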

3.2. Quantitative Assessment Metrics

The SR results were evaluated using several image quality metrics, including the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM) [43], LPIPS [39], spectral angle mapper (SAM) [44], erreur relative globale adimensionnelle de synthèse (ERGAS) [45], universal image quality index (UIQI) [46], and natural image quality evaluator (NIQE) [47].
The PSNR is calculated as:

$$\mathrm{PSNR}(\hat{y}, y) = 10 \log_{10} \frac{MAX^2}{\mathrm{MSE}(\hat{y}, y)},$$

where $MAX$ is the maximum pixel value of the image and $\mathrm{MSE}$ is the mean squared error between the SR image $\hat{y}$ and the ground-truth HR image $y$.
SSIM [43] measures three properties of an image: luminance, contrast, and structural characteristics. SSIM is defined as:

$$\mathrm{SSIM}(\hat{y}, y) = \frac{\left( 2\mu_{\hat{y}}\mu_{y} + c_1 \right)\left( 2\sigma_{\hat{y}y} + c_2 \right)}{\left( \mu_{\hat{y}}^2 + \mu_{y}^2 + c_1 \right)\left( \sigma_{\hat{y}}^2 + \sigma_{y}^2 + c_2 \right)},$$

where $\mu_{\hat{y}}$ and $\mu_{y}$ are the means of $\hat{y}$ and $y$, respectively; $\sigma_{\hat{y}}$ and $\sigma_{y}$ are their standard deviations; $\sigma_{\hat{y}y}$ is the cross-covariance of $\hat{y}$ and $y$; and $c_1$ and $c_2$ are constants that prevent division by zero.
Although the PSNR and SSIM are the most widely used indices for evaluating the image quality of SR products, these conventional metrics focus on image fidelity rather than human perception. Therefore, we also used LPIPS, which was devised to reflect human perception and quantify the perceptual similarity of images [39]. LPIPS measures the perceptual similarity of image patches using pre-trained networks such as VGGNet and AlexNet. We used the pre-trained VGG-16 model to compute the $\ell_2$-distance of the features from multiple layers. LPIPS is formulated as:

$$\mathrm{LPIPS}(\hat{y}, y) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left( f_{h,w}^{l} - f_{0,h,w}^{l} \right) \right\|_2^2,$$

where $f_{h,w}^{l}$ and $f_{0,h,w}^{l}$ represent the features extracted from the $l$th layer at location $(h, w)$ of images $\hat{y}$ and $y$, respectively; $H_l$ and $W_l$ are the height and width of the features from the $l$th layer; $w_l$ is a learned weight vector; and $\odot$ denotes the element-wise product. Whereas higher PSNR and SSIM values indicate better image quality, a low LPIPS value is desirable because LPIPS measures the distance between the features of the input images.
In addition, SAM, ERGAS, and UIQI are conventional image quality assessment metrics that also focus on image fidelity rather than perceptual quality. SAM [44] measures the spectral angle between two images by dividing their dot product by the product of their $\ell_2$-norms; as the similarity of the input images increases, the SAM value approaches zero. The SAM is defined as:

$$\mathrm{SAM}(\hat{y}, y) = \arccos\!\left( \frac{\hat{y} \cdot y}{\| \hat{y} \|_2 \, \| y \|_2} \right).$$
ERGAS [45] measures image quality in terms of the band-wise normalized mean error, where a lower ERGAS value indicates higher image quality. ERGAS is formulated as follows:

$$\mathrm{ERGAS}(\hat{y}, y) = \frac{100}{s} \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left( \frac{\mathrm{RMSE}(\hat{y}_k, y_k)}{\bar{y}_k} \right)^2 },$$

where $s$ and $N$ represent the scale factor and the number of spectral bands of the images being evaluated, respectively, and $\bar{y}_k$ is the mean of the $k$th band of the reference image.
Wang and Bovik [46] proposed UIQI, which models image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion. UIQI values closer to one indicate better image quality. The UIQI is calculated as:

$$\mathrm{UIQI}(\hat{y}, y) = \frac{4 \, \sigma_{\hat{y}y} \, \mu_{\hat{y}} \mu_{y}}{\left( \mu_{\hat{y}}^2 + \mu_{y}^2 \right)\left( \sigma_{\hat{y}}^2 + \sigma_{y}^2 \right)}.$$
To evaluate the quality of the SR results, we also employed a no-reference quality metric. Unlike the previously mentioned metrics, a no-reference quality metric does not require a reference image for image quality assessment. NIQE [47] operates by extracting the natural scene statistics (NSS) features from the image patches and fitting them with a multivariate Gaussian (MVG) model. The derived MVG model is then compared with the MVG model obtained from a natural image database. A lower NIQE value indicates better image quality.
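For reference, minimal NumPy sketches of three of the full-reference metrics defined above (PSNR, SAM, and ERGAS) are given below; inputs are assumed to be float arrays of shape (H, W, C) with the same dynamic range.

```python
import numpy as np

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((sr - hr) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def sam(sr, hr, eps=1e-12):
    """Mean spectral angle (in radians) over all pixels."""
    dot = np.sum(sr * hr, axis=-1)
    norms = np.linalg.norm(sr, axis=-1) * np.linalg.norm(hr, axis=-1) + eps
    return float(np.mean(np.arccos(np.clip(dot / norms, -1.0, 1.0))))

def ergas(sr, hr, scale=4):
    """Band-wise RMSE normalized by the band means of the reference image."""
    rmse = np.sqrt(np.mean((sr - hr) ** 2, axis=(0, 1)))
    means = np.mean(hr, axis=(0, 1))
    return (100.0 / scale) * np.sqrt(np.mean((rmse / means) ** 2))
```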

3.3. Implementation Details

To stabilize the training process, the two generators in the proposed model ($G_{X \to Y_b}$ and $G_{Y_b \to Y}$) were pre-trained using only the pixel-wise loss (L1 loss). Subsequently, based on the pre-trained generators, the discriminators were added for adversarial training, and the networks were jointly trained to generate SR images from real-world LR images (MS images).
In the training phase, we randomly cropped eight LR image patches of 32 × 32 pixels for every iteration and augmented the data with random flips (horizontal and vertical) and rotations (90°, 180°, or 270°) to compensate for the limited number of training samples. The generators and discriminators were trained using the Adam optimizer with $\beta_1 = 0.5$, $\beta_2 = 0.999$, and $\varepsilon = 10^{-8}$, except for $G_{Y_b \to Y}$, which uses $\beta_1 = 0.9$ instead. The learning rate was initialized to $10^{-4}$ and halved every 100 epochs. To ensure a fair comparison among different SR models, we trained all models from scratch on our training datasets rather than using pre-trained models. We set the loss weights to $\omega_1 = 0.001$, $\lambda_1 = 0.001$, and $\lambda_2 = 0.01$.
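The optimizer and schedule described above can be sketched as follows; the placeholder modules and the grouping of both discriminators under one optimizer are simplifying assumptions made for brevity.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the two generators and two discriminators.
g_x2yb = nn.Conv2d(3, 3, 3, padding=1)
g_yb2y = nn.Conv2d(3, 3, 3, padding=1)
d_yb = nn.Conv2d(3, 1, 4)
d_y = nn.Conv2d(3, 1, 4)

opt_g1 = torch.optim.Adam(g_x2yb.parameters(), lr=1e-4, betas=(0.5, 0.999), eps=1e-8)
opt_g2 = torch.optim.Adam(g_yb2y.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
opt_d = torch.optim.Adam(list(d_yb.parameters()) + list(d_y.parameters()),
                         lr=1e-4, betas=(0.5, 0.999), eps=1e-8)

# Halve the learning rate every 100 epochs (scheduler.step() is called once per epoch).
schedulers = [torch.optim.lr_scheduler.StepLR(opt, step_size=100, gamma=0.5)
              for opt in (opt_g1, opt_g2, opt_d)]
```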

3.4. Comparison with State-of-the-Art Methods

To validate the effectiveness of the proposed BLG-GAN, we implemented several state-of-the-art methods, including seven CNN-based methods and five GAN-based methods. The CNN-based methods implemented were EDSR [11], D-DBPN [12], RRDB [15], RDN [14], RCAN [13], HAN [27], and DRN-L [48]. Additionally, we implemented the following GAN-based methods: SRGAN [9], ESRGAN [15], ESRGAN-FS [40], EESRGAN [21], and SG-GAN [22]. For all these models, real-world LR images (MS images) were used as inputs, and the corresponding SR images were obtained directly from a single SR network or generator.
SRGAN [9] and ESRGAN [15] are widely recognized studies that introduced GANs to solve the SR problem, and their basic structures have been adopted by many researchers. ESRGAN-FS [40] builds upon the structure of ESRGAN and incorporates a frequency separation training strategy. As these GAN-based models were originally developed for natural images in computer vision, we also compared our method with two GAN-based SR methods designed for remote sensing images: EESRGAN [21] and SG-GAN [22]. EESRGAN is an improved model based on EEGAN [20] that enhances the edges extracted from an image by appending an edge-enhancement network after the generator. SG-GAN, in contrast, places a salient object detection network [49] after its SRGAN-based generator, leveraging saliency information to generate more detailed SR outputs.
The quantitative evaluation results for the WV3-1 and WV3-2 datasets are presented in Table 1 and Table 2. Consistent with previous research [9,40], CNN-based methods tend to achieve high PSNR and SSIM values but they also exhibit high LPIPS values. This is because using only pixel-wise loss (e.g., MSE loss or L1 loss) for the SR model training often results in blurry and overly smoothed SR outputs with low perceptual quality. In contrast, GAN-based methods generated visually pleasing SR results with low LPIPS and NIQE values. However, this comes at the expense of decreased PSNR, SSIM, and UIQI values, as well as increased SAM and ERGAS values, which can be attributed to the introduction of pseudo-texture through adversarial training. Therefore, it is crucial to effectively suppress the pseudo-texture while preserving high image fidelity to construct a successful GAN-based SR model. Furthermore, it is worth noting that, although GAN-based methods yield better NIQE values than CNN-based methods, it can be difficult to distinguish subtle performance differences among the CNN-based or GAN-based methods using NIQE alone. This limitation arises from the inherent nature of NIQE as a no-reference image quality index derived from a natural image database [47]. As remote sensing images have distinct image characteristics compared to natural images, the evaluation results using NIQE often deviate from human perceptions [31,50]. Hence, it is preferable to consider the limitations associated with using a no-reference index when evaluating the performance of SR methods for remote sensing images.
As shown in Table 1 and Table 2, HAN [27] and RDN [14] exhibited superior SR performance among the CNN-based methods. Among the GAN-based methods, the proposed BLG-GAN model achieved superior SR performance for both the WV3-1 and WV3-2 datasets. Although HAN shows better image quality in terms of image fidelity than BLG-GAN, the LPIPS and NIQE values of HAN are significantly higher than those of BLG-GAN, which indicates a limitation of CNN-based methods with respect to perceptual quality. To further investigate the performance, we also implemented a one-stage version of the BLG-GAN model, denoted as “BLG-GAN (1-stage)” in Table 1 and Table 2. This one-stage model consists of $G_{Y_b \to Y}$ and $D_Y$, which employ the same network structures as in the proposed two-stage BLG-GAN model, and is intended to learn a direct relationship between real-world LR and HR images. While the one-stage BLG-GAN model outperforms the other GAN-based models, the two-stage BLG-GAN model achieves superior SR performance in terms of both image fidelity and perceptual quality. These experimental results verify that the proposed BLG-GAN can generate clearer SR images than other methods by utilizing bicubic-like LR images obtained through LR image transfer as input to the SR model. In particular, our method significantly reduces the LPIPS values while maintaining high values for the image fidelity metrics, outperforming all other GAN-based methods. The enhancement of the perceptual quality of the SR outputs can also be observed in Figure 6.
Figure 7 illustrates the SR process of the BLG-GAN model for real-world remote sensing images using LR image transfer to bicubic-like LR images. Once the real-world LR image (Figure 7a) is fed into the image transfer network, the input image is restored to a less blurry bicubic-like LR image (Figure 7b) with sharper edges. The subsequent SR network can use these edges to generate SR images with clear details. As shown in Figure 7c, the BLG-GAN successfully recovers various ground features, including road lanes, parking lines in the parking lot, and the rectangular shape of building roofs.
Furthermore, we evaluated the computational efficiency of BLG-GAN by considering the number of network parameters (M) and SR performance. As illustrated in Figure 8, the CNN-based models showed higher PSNR and LPIPS values with fewer network parameters than the GAN-based models. This is because GAN-based models incorporate additional parameters from the discriminator. Most GAN-based models showed slightly lower PSNR values than the CNN-based models while improving the perceptual quality of the SR outputs, as indicated by low LPIPS values. Remarkably, BLG-GAN achieved superior results in terms of both PSNR and LPIPS compared to the state-of-the-art models, even with a smaller overall capacity of the SR network. This indicates that our approach of dividing the SR problem into subproblems is valid for real-world remote sensing images.

4. Discussion

To verify the effectiveness of each component of the proposed BLG-GAN, we analyzed the influence of the generator and discriminator architectures, the type of GAN loss, and the type of perceptual loss on the SR performance. The final architecture and loss function of the BLG-GAN were determined based on the results of the following analyses.

4.1. Generator Architecture

We compared the SR performance of the generator $G_{Y_b \to Y}$ using four different basic blocks: the residual block from SRResNet [9], the residual block based on ResNet-18/34, the residual-in-residual dense block (RRDB) from ESRGAN [15] (Figure 9), and the RCAB from RCAN [13] (Figure 3b). The residual block from SRResNet consists of two convolutional layers followed by batch normalization (BN) and uses the parametric ReLU (PReLU) activation function [9] (Figure 9a). The main difference between the residual blocks from SRResNet and ResNet-18/34 is whether the block contains BN layers. Previous studies [11,15] have shown that BN layers are preferentially removed from SR networks because normalizing the features can restrain the generalization ability and deteriorate SR performance. The RRDB also eliminates BN layers from the block architecture and integrates dense connections into a multilevel residual network to increase the network capacity. As mentioned in Section 2.3, the RCAB was proposed as the basic block of RCAN [13]. The RCAN model is based on the RIR structure with multiple residual groups, and each residual group comprises RCABs, which utilize a channel attention mechanism to extract more informative features from the input.
The number of blocks was set to 16 for the generators using the residual blocks from SRResNet and ResNet-18/34, following the configuration in [9]. For the RRDB, we used the same number of blocks as in the original study [15], namely 23. While the original RCAN model has 10 residual groups with 20 RCABs each [13], we reduced this to five residual groups with 20 RCABs in our implementation; despite the reduced network capacity, the generator still achieved satisfactory SR results. To ensure a fair comparison among the generators with different basic blocks, all input LR images were obtained from the LR image transfer generator $G_{X \to Y_b}$ described in Section 2.1. In the training phase, we employed patchGAN [42] as the discriminator for the SR outputs and trained the generators using pixel-wise and adversarial losses in all cases.
The evaluation results in Table 3 and Table 4 confirm that using residual blocks without BN instead of residual blocks with BN improves SR performance, which is consistent with previous observations [11,15]. The generator employing the RCAB exhibited superior performance, achieving the highest PSNR, SSIM, and UIQI values and the lowest SAM, ERGAS, and LPIPS values for both datasets. As a result, the RCAB was chosen as the basic block for $G_{Y_b \to Y}$, and further analysis was conducted on the discriminators and losses.

4.2. Discriminator Architecture and GAN Loss

To compare discriminator architectures, we selected two widely used discriminators, those of SRGAN [9] and patchGAN [42], for adversarial training of the SR network. The SRGAN discriminator (SRGAN-D) originates from [9] and is used in two state-of-the-art GAN-based SR methods, SRGAN and ESRGAN. SRGAN-D consists of eight convolutional layers in which the number of features increases from 64 to 512 by a factor of two; the output features are passed through two fully connected layers, followed by a sigmoid activation function. The patchGAN discriminator [42], on the other hand, provides a patch-based decision on whether the input image patch is real or fake. The detailed architecture of the patchGAN discriminator is presented in Section 2.3.2 (Figure 4b). We tested two types of GAN loss for both discriminators: the standard GAN loss [51] and the LSGAN loss [37]. Additionally, we applied the relativistic average GAN (RaGAN) [52] to SRGAN-D, as proposed in [15]. Besides the LSGAN loss, which was already explained in Section 2.1, the formulations of the standard GAN and RaGAN losses are provided below for comparison. The standard GAN losses for the generator ($\mathcal{L}_G$) and discriminator ($\mathcal{L}_D$) can be formulated as:

$$\mathcal{L}_G = -\frac{1}{N}\sum_{i=1}^{N} \log D_Y(x_{f,i}),$$

$$\mathcal{L}_D = -\frac{1}{N}\sum_{i=1}^{N} \left[ \log D_Y(x_{r,i}) + \log\left( 1 - D_Y(x_{f,i}) \right) \right],$$

where $N$ denotes the number of training samples, and $x_r$ and $x_f$ represent the real data (HR image) and fake data (SR image), respectively. While the standard GAN loss for the generator uses only the fake-data (one-sided) term, RaGAN utilizes both real and fake data in adversarial training. The RaGAN losses for the generator ($\mathcal{L}_G^{Ra}$) and discriminator ($\mathcal{L}_D^{Ra}$) are formulated as:

$$\mathcal{L}_G^{Ra} = -\mathbb{E}_{x_r}\!\left[ \log\left( 1 - D_Y^{Ra}(x_r, x_f) \right) \right] - \mathbb{E}_{x_f}\!\left[ \log D_Y^{Ra}(x_f, x_r) \right],$$

$$\mathcal{L}_D^{Ra} = -\mathbb{E}_{x_r}\!\left[ \log D_Y^{Ra}(x_r, x_f) \right] - \mathbb{E}_{x_f}\!\left[ \log\left( 1 - D_Y^{Ra}(x_f, x_r) \right) \right],$$

where $D_Y^{Ra}(x_r, x_f) = \sigma\!\left( C(x_r) - \mathbb{E}_{x_f}[C(x_f)] \right)$, $\sigma$ is the sigmoid function, $C(x)$ is the output of the discriminator before the final sigmoid, and $\mathbb{E}_{x_f}[\cdot]$ denotes averaging over the fake data $x_f$ in the batch.
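A sketch of the RaGAN losses defined above, written with binary cross-entropy on the relativistic logits, is given below; `c_real` and `c_fake` stand for the raw discriminator outputs $C(x)$ before the sigmoid, and averaging over the whole output tensor is a simplifying assumption.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(c_real, c_fake):
    """L_D^Ra: relativistic average discriminator loss."""
    real_rel = c_real - c_fake.mean()   # C(x_r) - E_{x_f}[C(x_f)]
    fake_rel = c_fake - c_real.mean()   # C(x_f) - E_{x_r}[C(x_r)]
    return (F.binary_cross_entropy_with_logits(real_rel, torch.ones_like(real_rel)) +
            F.binary_cross_entropy_with_logits(fake_rel, torch.zeros_like(fake_rel)))

def ragan_g_loss(c_real, c_fake):
    """L_G^Ra: the generator uses the opposite targets."""
    real_rel = c_real - c_fake.mean()
    fake_rel = c_fake - c_real.mean()
    return (F.binary_cross_entropy_with_logits(real_rel, torch.zeros_like(real_rel)) +
            F.binary_cross_entropy_with_logits(fake_rel, torch.ones_like(fake_rel)))
```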
Based on the evaluation results presented in Table 5 and Table 6, we verified the effectiveness of the PatchGAN discriminator for SR model training. The SR results obtained using the PatchGAN discriminator showed better LPIPS values than SRGAN-D for all types of GAN loss, while maintaining high values for PSNR, SSIM, and UIQI, and low values for SAM and ERGAS. Among the different types of GAN loss, LSGAN was found to improve image fidelity metrics more than the standard GAN. On the other hand, RaGAN showed inconsistent performance across the two test datasets, which can be attributed to the distortions introduced in the SR outputs by excessive pseudo-textures. Therefore, for our proposed BLG-GAN model, we selected PatchGAN as the discriminator and LSGAN loss as the GAN loss, considering the superior perceptual quality of the SR outputs along with reasonably good image fidelity.

4.3. Perceptual Loss

In most CNN-based methods, the perceptual quality of SR images is limited because optimization uses only a pixel-wise MSE or L1 loss. To address this limitation and improve perceptual quality, Ledig et al. [9] introduced a perceptual loss into their GAN-based SR model. However, some studies [53] reported that the use of a perceptual loss can introduce color variations and alter the original spectral information of the images. Therefore, we investigated the effect of different perceptual losses on SR performance, using the model architecture determined in Section 4.1 and Section 4.2. We employed the same generator and discriminator architectures and tested three different perceptual losses: (1) the VGG loss based on the pre-trained VGG-19 model, computed with the L2 norm ($\mathcal{L}_{vgg19}^{L2}$), as proposed in [9]; (2) the VGG loss computed with the L1 norm ($\mathcal{L}_{vgg19}^{L1}$); and (3) the perceptual loss based on LPIPS, which utilizes features extracted from the pre-trained VGG-16 model. The two VGG-19-based perceptual losses are defined as:

$$\mathcal{L}_{vgg19}^{L1}(\hat{y}, y) = \left\| \phi(\hat{y}) - \phi(y) \right\|_1,$$

$$\mathcal{L}_{vgg19}^{L2}(\hat{y}, y) = \left\| \phi(\hat{y}) - \phi(y) \right\|_2,$$

where $\phi(\cdot)$ represents the output features of the pre-trained VGG-19 model.
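A sketch of a VGG-19 feature-based perceptual loss is shown below; the choice of feature layer and the reliance on ImageNet-normalized inputs are assumptions, not the exact configuration evaluated here.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VGG19PerceptualLoss(nn.Module):
    """Feature-space distance computed on frozen VGG-19 features."""
    def __init__(self, layer_index=35, use_l1=True):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        self.features = nn.Sequential(*list(vgg.children())[:layer_index]).eval()
        for p in self.features.parameters():
            p.requires_grad = False          # frozen feature extractor
        self.criterion = nn.L1Loss() if use_l1 else nn.MSELoss()

    def forward(self, sr, hr):
        return self.criterion(self.features(sr), self.features(hr))
```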
As shown in Table 7 and Table 8, the utilization of any perceptual loss resulted in a slight decrease in the values of PSNR, SSIM, and UIQI, and an increase in the values of SAM and ERGAS. However, it significantly enhanced the perceptual quality of the SR images, as evidenced by the decreased values of LPIPS. Among the three perceptual losses tested, the LPIPS loss exhibited the highest values for image fidelity metrics and achieved a remarkable enhancement in perceptual image quality. These trends remained consistent across both test datasets, validating the effectiveness of employing LPIPS as a perceptual loss for improving the perceptual quality of SR images. The pre-trained models used for calculating perceptual loss were originally trained for high-level tasks, such as VGG models trained for classification. Leveraging these pre-trained models in training SR models proves highly beneficial because it enables the integration of high-level features into low-level tasks, such as image super-resolution.

5. Conclusions

In this study, we proposed a novel two-stage SR model for real-world remote sensing images. The proposed BLG-GAN method divides the image super-resolution procedure into two stages: LR image transfer and super-resolution. In the LR transfer stage, our proposed method refines the input LR images by transforming them into less blurry and noisy bicubic-like LR images using the guidance from synthetic LR images obtained through bicubic downsampling. The refined LR images are then fed into the SR network, which learns the relationship between the bicubic-like LR images and their corresponding HR images. By utilizing bicubic-downsampled LR images as a bridge between the real-world LR and HR images, our BLG-GAN method achieves a superior SR performance in terms of both image fidelity and perceptual quality. Moreover, since synthetic LR images can be easily obtained through bicubic downsampling, BLG-GAN can be easily implemented with a lower computational burden. In future studies, our method can be further validated using remote sensing images from other sources. Incorporating multi-source remote sensing images would enable the construction of large-scale datasets and facilitate comparisons with data-intensive models, which was not feasible in this study due to limited dataset size. Furthermore, the proposed method can be enhanced by integrating transfer learning techniques within the framework to address real-world remote sensing images without reference.

Author Contributions

Conceptualization, M.C.; methodology, M.C. and M.J.; software, M.C.; validation, M.C.; formal analysis, M.C.; investigation, M.C. and M.J.; resources, Y.K.; data curation, M.C.; writing—original draft preparation, M.C.; writing—review and editing, M.J. and Y.K.; visualization, M.C.; supervision, Y.K.; project administration, Y.K.; funding acquisition, Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant NRF-2023R1A2C2005548) and by the Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant RS-2022-00155763).

Data Availability Statement

The remote sensing imagery used in this study (WorldView-3) is subject to a restrictive commercial license and cannot be shared publicly.

Acknowledgments

The Institute of Engineering Research at Seoul National University provided research facilities for this work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1 and Table A2 provide the quantitative assessment results of SR models trained on synthetic datasets. The results demonstrate that SR models trained on synthetic LR-HR image datasets achieve better SR performance than those trained on real-world LR-HR image datasets. To ensure a fair comparison, all models were trained from scratch using the same hyperparameters for both the synthetic and real-world datasets. Comparing Table 1 and Table 2 in Section 3.4 with Table A1 and Table A2 reveals that generating HR images from real-world LR images is more challenging than generating them from synthetic LR images.
Table A1. Quantitative assessment results of state-of-the-art SR models trained on the synthetic LR-HR image dataset (WV3-1 dataset).

Method | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE
Bicubic | 32.8364 | 0.8846 | 0.0206 | 47.5450 | 0.6291 | 0.2757 | 6.6302
CNN-based
EDSR [11] | 34.3155 | 0.9103 | 0.0180 | 40.7123 | 0.6698 | 0.2300 | 7.8177
D-DBPN [12] | 33.8119 | 0.9006 | 0.0204 | 42.8439 | 0.6423 | 0.2534 | 7.4990
RRDBNet [15] | 33.9596 | 0.9045 | 0.0200 | 42.2957 | 0.6514 | 0.2488 | 7.5386
RDN [14] | 34.5049 | 0.9134 | 0.0185 | 39.8893 | 0.6755 | 0.2197 | 7.6263
RCAN [13] | 34.2489 | 0.9094 | 0.0191 | 41.0236 | 0.6650 | 0.2294 | 7.4470
HAN [27] | 34.6353 | 0.9155 | 0.0182 | 39.3325 | 0.6814 | 0.2142 | 7.8436
DRN-L [48] | 34.4271 | 0.9123 | 0.0186 | 40.2755 | 0.6740 | 0.2236 | 7.6602
GAN-based
SRGAN [9] | 31.5388 | 0.8355 | 0.0525 | 56.1957 | 0.4871 | 0.2546 | 5.3869
ESRGAN [15] | 31.4596 | 0.8467 | 0.0452 | 56.5095 | 0.5274 | 0.2290 | 5.2891
ESRGAN-FS [40] | 31.4043 | 0.8491 | 0.0426 | 57.9827 | 0.5370 | 0.2188 | 5.3988
EESRGAN [21] | 32.6946 | 0.8755 | 0.0305 | 48.9042 | 0.5939 | 0.2046 | 5.8502
SG-GAN [22] | 32.3234 | 0.8629 | 0.0356 | 51.0380 | 0.5435 | 0.2655 | 5.5719
Table A2. Quantitative assessment results of state-of-the-art SR models trained on the synthetic LR-HR image dataset (WV3-2 dataset).

Method | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE
Bicubic | 33.1650 | 0.8921 | 0.0185 | 38.7233 | 0.6612 | 0.2636 | 6.6402
CNN-based
EDSR [11] | 34.4422 | 0.9127 | 0.0168 | 33.6200 | 0.6912 | 0.2325 | 7.6672
D-DBPN [12] | 34.0034 | 0.9046 | 0.0190 | 35.2938 | 0.6694 | 0.2509 | 7.2414
RRDBNet [15] | 34.1724 | 0.9090 | 0.0195 | 34.6322 | 0.6786 | 0.2435 | 7.1857
RDN [14] | 34.6515 | 0.9164 | 0.0172 | 32.8534 | 0.6982 | 0.2235 | 7.4615
RCAN [13] | 34.4469 | 0.9130 | 0.0176 | 33.6167 | 0.6911 | 0.2275 | 7.4895
HAN [27] | 34.8012 | 0.9188 | 0.0169 | 32.3318 | 0.7048 | 0.2174 | 7.4457
DRN-L [48] | 34.5861 | 0.9151 | 0.0173 | 33.0998 | 0.6972 | 0.2262 | 7.4913
GAN-based
SRGAN [9] | 31.3895 | 0.8367 | 0.0484 | 47.7144 | 0.5120 | 0.2661 | 5.4253
ESRGAN [15] | 31.3861 | 0.8535 | 0.0418 | 47.9343 | 0.5639 | 0.2260 | 5.0752
ESRGAN-FS [40] | 31.6624 | 0.8567 | 0.0383 | 46.2197 | 0.5689 | 0.2183 | 5.4178
EESRGAN [21] | 33.0359 | 0.8793 | 0.0270 | 39.5130 | 0.6225 | 0.1987 | 5.6776
SG-GAN [22] | 33.8097 | 0.9038 | 0.0247 | 36.0339 | 0.6650 | 0.2388 | 7.1215

Appendix B

To construct the real-world LR-HR datasets, we evaluated several pansharpening methods, including component substitution (CS)-based, multiresolution analysis (MRA)-based, and hybrid methods. The CS-based methods employed were GSA [33], partial replacement adaptive component substitution (PRACS) [54], and hybrid pansharpening algorithm using NDVI in spectral mode (HP-NDVIspectral) [55]. In addition, we implemented the following MRA-based methods: high pass filtering (HPF) algorithm [56], additive wavelet luminance proportional (AWLP) [57], and generalized Laplacian pyramid with modulation transfer function and high-pass modulation (MTF-GLP-HPM) algorithm [58]. Furthermore, the hybrid pansharpening algorithm using NDVI in spatial mode (HP-NDVIspatial) [55] was also implemented as a hybrid method. The detailed explanation of each pansharpening method is beyond the scope of this study. For detailed methodological information, please refer to the original papers.
Due to the unavailability of reference HR images, the image quality of pansharpened images from the WV3-1 and WV3-2 datasets was evaluated using no-reference metrics, including perception-based image quality evaluator (PIQE) [59], NIQE [47], and blind/referenceless image spatial quality evaluator (BRISQUE) [60]. Lower values of PIQE, NIQE, and BRISQUE indicate better image quality. Figure A1 presents the box plots of the image quality assessment results obtained from different pansharpening methods. The MRA-based methods exhibited slightly lower PIQE and BRISQUE values than the CS-based methods. However, the MRA-based methods tended to generate blurry images in comparison to the CS-based methods, as depicted in Figure A2. This suggests that the blurriness in remote sensing images may be perceived as smoothness in natural images. Among the CS-based methods, GSA showed stable performance with low PIQE values and concentrated distributions for BRISQUE values. The distributions of NIQE values were similar across the different pansharpening methods, indicating no significant difference. In addition, the HP-NDVIspatial method exhibited superior performance in terms of PIQE and BRISQUE values, but the introduction of excessive spatial information often led to undesired artifacts such as pseudo-textures. Therefore, the GSA algorithm was chosen to generate HR images for the LR-HR datasets due to its stable performance and visually clear pansharpened images. Nevertheless, further investigation on large-scale datasets is necessary to generalize these observations.
Figure A1. Comparison of image quality assessment results of pansharpened images obtained from the WV3-1 dataset using (a) PIQE, (c) NIQE, and (e) BRISQUE, and from the WV3-2 dataset using (b) PIQE, (d) NIQE, and (f) BRISQUE. On each box, the red line indicates the median and the outliers are plotted as blue ‘+’ markers.
Figure A2. Visual comparison on two remote sensing datasets: Examples of pansharpened images obtained from the (a) WV3-1 and (b) WV3-2 datasets.

References

  1. Freeman, W.T.; Pasztor, E.C.; Carmichael, O.T. Learning Low-Level Vision. Int. J. Comput. Vis. 2000, 40, 25–47. [Google Scholar] [CrossRef]
  2. Yue, L.; Shen, H.; Li, J.; Yuan, Q.; Zhang, H.; Zhang, L. Image Super-Resolution: The Techniques, Applications, and Future. Signal Process. 2016, 128, 389–408. [Google Scholar] [CrossRef]
  3. Freeman, W.T.; Jones, T.R.; Pasztor, E.C. Example-Based Super-Resolution. IEEE Comput. Graph. Appl. 2002, 22, 56–65. [Google Scholar] [CrossRef]
  4. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef]
  5. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407. [Google Scholar]
  6. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  7. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  8. Tai, Y.; Yang, J.; Liu, X. Image Super-Resolution via Deep Recursive Residual Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3148–3155. [Google Scholar]
  9. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar]
  10. Sajjadi, M.S.M.; Schölkopf, B.; Hirsch, M. EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4491–4500. [Google Scholar]
  11. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140. [Google Scholar]
  12. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1664–1673. [Google Scholar]
  13. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 294–310. [Google Scholar]
  14. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  15. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the European Conference on Computer Vision Workshops (ECCVW), Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar]
  16. Ha, V.K.; Ren, J.-C.; Xu, X.-Y.; Zhao, S.; Xie, G.; Masero, V.; Hussain, A. Deep Learning Based Single Image Super-resolution: A Survey. Int. J. Autom. Comput. 2019, 16, 413–426. [Google Scholar] [CrossRef]
  17. Chen, H.; He, X.; Qing, L.; Wu, Y.; Ren, C.; Sheriff, R.E.; Zhu, C. Real-World Single Image Super-Resolution: A Brief Review. Inf. Fusion 2022, 79, 124–145. [Google Scholar] [CrossRef]
  18. Singla, K.; Pandey, R.; Ghanekar, U. A Review on Single Image Super Resolution Techniques Using Generative Adversarial Network. Optik 2022, 266, 169607. [Google Scholar] [CrossRef]
  19. Jozdani, S.; Chen, D.; Pouliot, D.; Johnson, B.A. A Review and Meta-Analysis of Generative Adversarial Networks and Their Applications in Remote Sensing. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102734. [Google Scholar] [CrossRef]
  20. Jiang, K.; Wang, Z.; Yi, P.; Wang, G.; Lu, T.; Jiang, J. Edge-Enhanced GAN for Remote Sensing Image Superresolution. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5799–5812. [Google Scholar] [CrossRef]
  21. Rabbi, J.; Ray, N.; Schubert, M.; Chowdhury, S.; Chao, D. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Remote Sens. 2020, 12, 1432. [Google Scholar] [CrossRef]
  22. Liu, B.; Zhao, L.; Li, J.; Zhao, H.; Liu, W.; Li, Y.; Wang, Y.; Chen, H.; Cao, W. Saliency-Guided Remote Sensing Image Super-Resolution. Remote Sens. 2021, 13, 5144. [Google Scholar] [CrossRef]
  23. Lai, W.-S.; Huang, J.-B.; Ahuja, N.; Yang, M.-H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar]
  24. Zhang, K.; Zuo, W.; Gu, S.; Zhang, L. Learning Deep CNN Denoiser Prior for Image Restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2808–2817. [Google Scholar]
  25. Zhang, K.; Zuo, W.; Zhang, L. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3262–3271. [Google Scholar]
  26. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.-T.; Zhang, L. Second-Order Attention Network for Single Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11057–11066. [Google Scholar]
  27. Niu, B.; Wen, W.; Ren, W.; Zhang, X.; Yang, L.; Wang, S.; Zhang, K.; Cao, X.; Shen, H. Single Image Super-Resolution via a Holistic Attention Network. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 191–207. [Google Scholar]
  28. Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; Zhang, L. Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3086–3095. [Google Scholar]
  29. Wei, P.; Xie, Z.; Lu, H.; Zhan, Z.; Ye, Q.; Zuo, W.; Lin, L. Component Divide-and-Conquer for Real-World Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 101–117. [Google Scholar]
  30. Zhang, X.; Chen, Q.; Ng, R.; Koltun, V. Zoom to Learn, Learn to Zoom. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3757–3765. [Google Scholar]
  31. Zhang, N.; Wang, Y.; Zhang, X.; Xu, D.; Wang, X.; Ben, G.; Zhao, Z.; Li, Z. A Multi-Degradation Aided Method for Unsupervised Remote Sensing Image Super Resolution With Convolution Neural Networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  32. Zhang, J.; Xu, T.; Li, J.; Jiang, S.; Zhang, Y. Single-Image Super Resolution of Remote Sensing Images with Real-World Degradation Modeling. Remote Sens. 2022, 14, 2895. [Google Scholar] [CrossRef]
  33. Aiazzi, B.; Baronti, S.; Selva, M. Improving Component Substitution Pansharpening Through Multivariate Regression of MS+Pan Data. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3230–3239. [Google Scholar] [CrossRef]
  34. Yuan, Y.; Liu, S.; Zhang, J.; Zhang, Y.; Dong, C.; Lin, L. Unsupervised Image Super-Resolution Using Cycle-in-Cycle Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 814–823. [Google Scholar]
  35. Lugmayr, A.; Danelljan, M.; Timofte, R. Unsupervised Learning for Real-World Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 3408–3416. [Google Scholar]
  36. Maeda, S. Unpaired Image Super-Resolution Using Pseudo-Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 291–300. [Google Scholar]
  37. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z.; Smolley, S.P. Least Squares Generative Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2813–2821. [Google Scholar]
  38. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z.; Smolley, S.P. On the Effectiveness of Least Squares Generative Adversarial Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2947–2960. [Google Scholar] [CrossRef]
  39. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  40. Fritsche, M.; Gu, S.; Timofte, R. Frequency Separation for Real-World Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 3599–3608. [Google Scholar]
  41. Jo, Y.; Yang, S.; Kim, S.J. Investigating Loss Functions for Extreme Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1705–1712. [Google Scholar]
  42. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  43. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  44. Yuhas, R.H.; Goetz, A.F.H.; Boardman, J.W. Discrimination among Semi-Arid Landscape Endmembers Using the Spectral Angle Mapper (SAM) Algorithm. In Proceedings of the Summaries of the Third Annual JPL Airborne Geoscience Workshop, Pasadena, CA, USA, 1–5 June 1992; pp. 147–149. [Google Scholar]
  45. Liu, J.G. Smoothing Filter-based Intensity Modulation: A Spectral Preserve Image Fusion Technique for Improving Spatial Details. Int. J. Remote Sens. 2000, 21, 3461–3472. [Google Scholar] [CrossRef]
  46. Wang, Z.; Bovik, A.C. A Universal Image Quality Index. IEEE Signal Process. Lett. 2002, 9, 81–84. [Google Scholar] [CrossRef]
  47. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “Completely Blind” Image Quality Analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
  48. Guo, Y.; Chen, J.; Wang, J.; Chen, Q.; Cao, J.; Deng, Z.; Xu, Y.; Tan, M. Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5406–5415. [Google Scholar]
  49. Qin, X.; Zhang, Z.; Huang, C.; Gao, C.; Dehghan, M.; Jagersand, M. BASNet: Boundary-Aware Salient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7471–7481. [Google Scholar]
  50. Lei, S.; Shi, Z.; Zou, Z. Coupled Adversarial Training for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3633–3643. [Google Scholar] [CrossRef]
  51. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  52. Jolicoeur-Martineau, A. The Relativistic Discriminator: A Key Element Missing from Standard GAN. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; pp. 1–26. [Google Scholar]
  53. Zhou, Y.; Deng, W.; Tong, T.; Gao, Q. Guided Frequency Separation Network for Real-World Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1722–1731. [Google Scholar]
  54. Choi, J.; Yu, K.; Kim, Y. A New Adaptive Component-Substitution-Based Satellite Image Fusion by Using Partial Replacement. IEEE Trans. Geosci. Remote Sens. 2011, 49, 295–309. [Google Scholar] [CrossRef]
  55. Choi, J.; Kim, G.; Park, N.; Park, H.; Choi, S. A Hybrid Pansharpening Algorithm of VHR Satellite Images that Employs Injection Gains Based on NDVI to Reduce Computational Costs. Remote Sens. 2017, 9, 976. [Google Scholar] [CrossRef]
  56. Chavez, P.; Sides, S.C.; Anderson, J.A. Comparison of Three Different Methods to Merge Multiresolution and Multispectral Data: Landsat TM and SPOT Panchromatic. Photogramm. Eng. Remote Sens. 1991, 57, 295–303. [Google Scholar]
  57. Otazu, X.; González-Audícana, M.; Fors, O.; Núñez, J. Introduction of Sensor Spectral Response into Image Fusion Methods. Application to Wavelet-Based Methods. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2376–2385. [Google Scholar] [CrossRef]
  58. Vivone, G.; Restaino, R.; Mura, M.D.; Licciardi, G.; Chanussot, J. Contrast and Error-Based Fusion Schemes for Multispectral Image Pansharpening. IEEE Geosci. Remote Sens. Lett. 2014, 11, 930–934. [Google Scholar] [CrossRef]
  59. Venkatanath, N.; Praneeth, D.; Chandrasekhar, B.M.; Channappayya, S.S.; Medasani, S.S. Blind Image Quality Evaluation Using Perception Based Features. In Proceedings of the 21st National Conference on Communications (NCC), Mumbai, India, 27 February–1 March 2015; pp. 1–6. [Google Scholar]
  60. Mittal, A.; Moorthy, A.K.; Bovik, A.C. Blind/Referenceless Image Spatial Quality Evaluator. In Proceedings of the 45th Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 6–9 November 2011; pp. 723–727. [Google Scholar]
Figure 1. Comparison of the synthetic LR image with the real-world LR image and the HR image: (a) bicubic-downsampled LR image; (b) real-world LR image (MS image); (c) HR image (pansharpened MS image). For ease of comparison, the LR images are enlarged to the size of the HR image.
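As a concrete illustration of how a synthetic LR image such as the one in Figure 1a can be produced, the following is a minimal sketch of bicubic downsampling in PyTorch; the function and variable names, as well as the patch size, are illustrative assumptions rather than the paper's actual code.

```python
# Minimal sketch of generating a synthetic (bicubic-downsampled) LR image from an
# HR patch, as illustrated in Figure 1a. Names and sizes are illustrative only.
import torch
import torch.nn.functional as F

def bicubic_downsample(hr: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """Downsample an HR tensor of shape (N, C, H, W) by `scale` using bicubic interpolation."""
    return F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)

hr_patch = torch.rand(1, 3, 256, 256)        # stand-in for a pansharpened HR patch
lr_synthetic = bicubic_downsample(hr_patch)  # (1, 3, 64, 64) synthetic bicubic LR image
```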
Figure 2. Overall network architecture of the proposed BLG-GAN. The proposed network consists of two stages: LR image transfer and super-resolution. In the LR image transfer stage, the generator (G_XY) transfers real-world LR images to bicubic-like LR images. In the super-resolution stage, the subsequent generator (G_YY) learns the relationship between bicubic-like LR images and HR images.
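At inference time, the two stages in Figure 2 chain directly: the real-world LR image is first transferred to a bicubic-like LR image and then super-resolved. A minimal sketch of this flow is shown below, assuming trained generators exposed as standard PyTorch modules; the names g_xy and g_yy are illustrative.

```python
# Minimal sketch of the two-stage BLG-GAN inference flow in Figure 2.
# `g_xy` and `g_yy` are assumed to be trained nn.Module generators.
import torch
import torch.nn as nn

@torch.no_grad()
def blg_gan_inference(real_lr: torch.Tensor, g_xy: nn.Module, g_yy: nn.Module) -> torch.Tensor:
    bicubic_like_lr = g_xy(real_lr)   # stage 1: LR image transfer (same spatial size)
    sr_image = g_yy(bicubic_like_lr)  # stage 2: super-resolution (4x upscaling)
    return sr_image
```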
Figure 3. Architecture of the generators used in the proposed BLG-GAN: (a) generator with residual channel attention blocks (RCABs) [13]; (b) RCAB from the generator. G_XY and G_YY share the same framework composed of residual groups with RCABs. G_YY includes upsampling blocks to increase the size of the LR image by a factor of four, whereas G_XY does not require upsampling blocks because the scale of the input and output images does not change in LR image transfer.
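For reference, a minimal PyTorch sketch of a residual channel attention block in the spirit of RCAN [13], the basic block shown in Figure 3b, is given below; the channel width and reduction ratio are illustrative assumptions.

```python
# Sketch of a residual channel attention block (RCAB); hyperparameters are illustrative.
import torch
import torch.nn as nn

class RCAB(nn.Module):
    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Channel attention: global average pooling followed by a 1x1 bottleneck.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        res = self.body(x)
        res = res * self.attention(res)  # channel-wise rescaling of the residual features
        return x + res                   # skip connection around the block
```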
Figure 4. Architecture of the discriminators used in the proposed BLG-GAN: (a) discriminator for LR image transfer; (b) discriminator for super-resolution. Both discriminators share the same structure, based on PatchGAN [42]. Considering the size of the input images, the stride of the first three convolution layers is set to one for the LR image transfer discriminator and two for the super-resolution discriminator.
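A minimal sketch of a PatchGAN-style discriminator [42] consistent with the description in Figure 4 is shown below; the number of layers and channel widths are illustrative assumptions, with only the stride of the first three convolutions following the caption.

```python
# Sketch of a PatchGAN-style discriminator; `stride` follows Figure 4
# (1 for the LR image transfer discriminator, 2 for the SR discriminator).
import torch
import torch.nn as nn

def patchgan_discriminator(in_channels: int = 3, base_channels: int = 64, stride: int = 2) -> nn.Sequential:
    layers, channels = [], in_channels
    for out_channels in (base_channels, base_channels * 2, base_channels * 4):
        layers += [
            nn.Conv2d(channels, out_channels, 4, stride=stride, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        ]
        channels = out_channels
    layers.append(nn.Conv2d(channels, 1, 4, stride=1, padding=1))  # per-patch real/fake score map
    return nn.Sequential(*layers)

d_lr_transfer = patchgan_discriminator(stride=1)  # discriminator for LR image transfer
d_sr = patchgan_discriminator(stride=2)           # discriminator for super-resolution
```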
Figure 5. Remote sensing images used in this study: WorldView-3 (WV-3) images acquired over the Pyeongdong Industrial Complex in Gwangju, Republic of Korea, on (a) 26 May 2017 and (b) 4 May 2018.
Figure 6. Visual comparison on two remote sensing datasets with a scale factor of four. Examples of SR results from the (a) WV3-1 and (b) WV3-2 datasets.
Figure 7. Examples of SR results from BLG-GAN: (a) input real-world LR images; (b) the generated bicubic-like LR images; (c) output SR images; (d) reference HR images.
Figure 8. Comparison of the number of network parameters (M) and SR performance using the (a) PSNR and (b) LPIPS metrics. The proposed BLG-GAN model is indicated by a red point, and the CNN- and GAN-based models are shown as blue and orange points, respectively.
Figure 9. Basic blocks for generator G_YY used in this study to analyze the influence of generator architecture on SR performance: (a) residual block from SRResNet [9] (residual block with BN); (b) residual block based on ResNet-18/34 (residual block without BN); (c) residual in residual dense block (RRDB) [15].
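The two plain residual-block variants in Figure 9a,b differ only in whether batch normalization follows each convolution. A minimal sketch of both variants is given below; the channel width and activation choice are illustrative assumptions (SRResNet [9] originally uses PReLU).

```python
# Sketch of a plain residual block with an optional BN layer after each convolution.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64, use_bn: bool = True):
        super().__init__()
        def conv():
            layers = [nn.Conv2d(channels, channels, 3, padding=1)]
            if use_bn:
                layers.append(nn.BatchNorm2d(channels))
            return layers
        self.body = nn.Sequential(*conv(), nn.ReLU(inplace=True), *conv())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # identity skip connection

block_with_bn = ResidualBlock(use_bn=True)      # Figure 9a
block_without_bn = ResidualBlock(use_bn=False)  # Figure 9b
```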
Table 1. Quantitative comparison with state-of-the-art methods on the WV3-1 dataset. The best and second-best performances for each metric are indicated in bold and underlined, respectively.
| Category | Method | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|---|
| – | Bicubic | 30.2986 | 0.8173 | 0.0242 | 63.0769 | 0.4173 | 0.3545 | 7.2993 |
| CNN-based | EDSR [11] | 31.9558 | 0.8586 | 0.0217 | 52.4538 | 0.4952 | 0.3247 | 7.3323 |
| CNN-based | D-DBPN [12] | 31.1050 | 0.8397 | 0.0248 | 57.6184 | 0.4528 | 0.3390 | 7.2362 |
| CNN-based | RRDBNet [15] | 31.7101 | 0.8540 | 0.0238 | 53.9245 | 0.4855 | 0.3288 | 7.7258 |
| CNN-based | RDN [14] | 32.5940 | 0.8704 | 0.0223 | 49.0862 | 0.5219 | 0.3092 | 7.3097 |
| CNN-based | RCAN [13] | 32.1932 | 0.8626 | 0.0234 | 51.1209 | 0.5066 | 0.3107 | 7.2772 |
| CNN-based | HAN [27] | 32.8207 | 0.8752 | 0.0215 | 47.8851 | 0.5359 | 0.2980 | 7.5154 |
| CNN-based | DRN-L [48] | 32.0414 | 0.8615 | 0.0221 | 51.9685 | 0.5014 | 0.3222 | 7.7383 |
| GAN-based | SRGAN [9] | 29.1961 | 0.7702 | 0.0560 | 72.4688 | 0.3420 | 0.3231 | 4.8997 |
| GAN-based | ESRGAN [15] | 29.2197 | 0.7892 | 0.0449 | 72.1651 | 0.3904 | 0.2870 | 5.0202 |
| GAN-based | ESRGAN-FS [40] | 28.9710 | 0.7827 | 0.0504 | 74.3983 | 0.3881 | 0.2852 | 4.9360 |
| GAN-based | EESRGAN [21] | 30.4883 | 0.8138 | 0.0350 | 62.1157 | 0.4329 | 0.2669 | 5.5291 |
| GAN-based | SG-GAN [22] | 30.9505 | 0.8310 | 0.0293 | 58.6363 | 0.4378 | 0.3073 | 5.5822 |
| GAN-based | BLG-GAN (1-stage) | 31.4131 | 0.8373 | 0.0272 | 55.8005 | 0.4557 | 0.2740 | 5.7224 |
| GAN-based | BLG-GAN | 32.1416 | 0.8518 | 0.0247 | 51.8453 | 0.4883 | 0.2349 | 5.7999 |
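For reference, the following sketch shows how two of the full-reference metrics in Tables 1 and 2, PSNR and the spectral angle mapper (SAM) [44], can be computed; it assumes float images in [0, 1] with shape (H, W, C) and is a simplified illustration rather than the exact evaluation code behind the tables.

```python
# Simplified PSNR and SAM computations; not the exact evaluation code used for the tables.
import numpy as np

def psnr(sr: np.ndarray, hr: np.ndarray, max_val: float = 1.0) -> float:
    mse = float(np.mean((sr - hr) ** 2))
    return 10.0 * np.log10(max_val ** 2 / mse)

def sam(sr: np.ndarray, hr: np.ndarray, eps: float = 1e-8) -> float:
    # Mean spectral angle (in radians) between corresponding pixel vectors.
    dot = np.sum(sr * hr, axis=-1)
    norms = np.linalg.norm(sr, axis=-1) * np.linalg.norm(hr, axis=-1) + eps
    return float(np.mean(np.arccos(np.clip(dot / norms, -1.0, 1.0))))
```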
Table 2. Quantitative comparison with state-of-the-art methods on the WV3-2 dataset. The best and second-best performances for each metric are indicated in bold and underlined, respectively.
| Category | Method | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|---|
| – | Bicubic | 30.1314 | 0.8158 | 0.0246 | 54.9347 | 0.4392 | 0.3601 | 7.2807 |
| CNN-based | EDSR [11] | 31.5149 | 0.8509 | 0.0244 | 47.0863 | 0.5032 | 0.3342 | 7.9368 |
| CNN-based | D-DBPN [12] | 31.1819 | 0.8423 | 0.0249 | 48.8536 | 0.4822 | 0.3444 | 7.9284 |
| CNN-based | RRDBNet [15] | 31.2424 | 0.8463 | 0.0245 | 48.4882 | 0.4920 | 0.3424 | 8.2237 |
| CNN-based | RDN [14] | 31.7136 | 0.8566 | 0.0248 | 46.0044 | 0.5136 | 0.3246 | 7.8459 |
| CNN-based | RCAN [13] | 31.5306 | 0.8525 | 0.0235 | 46.8795 | 0.5048 | 0.3270 | 7.6668 |
| CNN-based | HAN [27] | 31.8671 | 0.8612 | 0.0239 | 45.2343 | 0.5245 | 0.3151 | 7.9126 |
| CNN-based | DRN-L [48] | 31.5219 | 0.8537 | 0.0231 | 46.9640 | 0.5080 | 0.3310 | 8.0392 |
| GAN-based | SRGAN [9] | 28.9943 | 0.7653 | 0.0544 | 62.8320 | 0.3508 | 0.3301 | 5.1271 |
| GAN-based | ESRGAN [15] | 29.0857 | 0.7798 | 0.0582 | 62.1098 | 0.4006 | 0.3036 | 4.9066 |
| GAN-based | ESRGAN-FS [40] | 29.2271 | 0.7899 | 0.0435 | 61.7939 | 0.4184 | 0.2894 | 5.1028 |
| GAN-based | EESRGAN [21] | 30.3289 | 0.8076 | 0.0339 | 53.6708 | 0.4381 | 0.2781 | 5.3389 |
| GAN-based | SG-GAN [22] | 30.4923 | 0.8200 | 0.0312 | 52.7853 | 0.4532 | 0.3181 | 5.4784 |
| GAN-based | BLG-GAN (1-stage) | 30.8558 | 0.8267 | 0.0306 | 50.4641 | 0.4580 | 0.2858 | 5.5300 |
| GAN-based | BLG-GAN | 31.1871 | 0.8331 | 0.0272 | 48.8193 | 0.4769 | 0.2493 | 5.6032 |
Table 3. Analysis of the effect of the basic block type for generator G_YY on SR performance on the WV3-1 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Basic Block for Generator G_YY | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|
| Residual block with BN [9] | 31.5495 | 0.8392 | 0.0361 | 60.1968 | 0.4645 | 0.2966 | 5.5730 |
| Residual block without BN | 32.0368 | 0.8486 | 0.0257 | 52.3808 | 0.4740 | 0.2845 | 5.6019 |
| RRDB [15] | 32.1078 | 0.8516 | 0.0255 | 51.9306 | 0.4831 | 0.2775 | 5.2930 |
| RCAB [13] | 32.2062 | 0.8552 | 0.0242 | 51.4806 | 0.4927 | 0.2636 | 5.8955 |
Table 4. Analysis of the effect of the basic block type for generator G_YY on SR performance on the WV3-2 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Basic Block for Generator G_YY | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|
| Residual block with BN [9] | 30.8659 | 0.8255 | 0.0289 | 51.2175 | 0.4572 | 0.3057 | 5.4436 |
| Residual block without BN | 30.9909 | 0.8275 | 0.0269 | 49.7892 | 0.4620 | 0.3025 | 5.4305 |
| RRDB [15] | 31.1359 | 0.8308 | 0.0264 | 49.0844 | 0.4664 | 0.2953 | 5.1734 |
| RCAB [13] | 31.2822 | 0.8362 | 0.0255 | 48.2146 | 0.4806 | 0.2815 | 5.6038 |
Table 5. Analysis of the effect of the discriminator architecture and GAN loss type on SR performance on the WV3-1 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Discriminator | Type of GAN Loss | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|---|
| SRGAN-D [9] | Standard [51] | 31.1322 | 0.8182 | 0.0345 | 58.0834 | 0.4363 | 0.2905 | 5.1780 |
| SRGAN-D [9] | LSGAN [37] | 32.6198 | 0.8726 | 0.0219 | 48.9879 | 0.5287 | 0.2977 | 7.9888 |
| SRGAN-D [9] | RaGAN [52] | 30.7263 | 0.8099 | 0.0380 | 61.2168 | 0.4356 | 0.2923 | 5.1638 |
| PatchGAN [42] | Standard [51] | 31.9389 | 0.8455 | 0.0255 | 53.1523 | 0.4805 | 0.2651 | 5.6226 |
| PatchGAN [42] | LSGAN [37] | 32.2062 | 0.8552 | 0.0242 | 51.4806 | 0.4927 | 0.2636 | 5.8955 |
Table 6. Analysis of the effect of the discriminator architecture and GAN loss type on SR performance on the WV3-2 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Discriminator | Type of GAN Loss | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|---|
| SRGAN-D [9] | Standard [51] | 30.4062 | 0.8084 | 0.0341 | 53.2263 | 0.4388 | 0.3042 | 5.3235 |
| SRGAN-D [9] | LSGAN [37] | 31.6627 | 0.8575 | 0.0233 | 46.2162 | 0.5163 | 0.3155 | 8.0493 |
| SRGAN-D [9] | RaGAN [52] | 30.5378 | 0.8237 | 0.0344 | 52.4700 | 0.4546 | 0.3233 | 5.6434 |
| PatchGAN [42] | Standard [51] | 30.9809 | 0.8253 | 0.0271 | 49.9844 | 0.4666 | 0.2852 | 5.3255 |
| PatchGAN [42] | LSGAN [37] | 31.2822 | 0.8362 | 0.0255 | 48.2146 | 0.4806 | 0.2815 | 5.6038 |
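As a reference for the loss variants compared in Tables 5 and 6, the generator-side objectives of the standard GAN loss [51], the least-squares GAN loss [37], and the relativistic average GAN loss [52] can be sketched as follows; d_fake and d_real denote raw discriminator logits, and this is an illustration rather than the exact training code used in this work.

```python
# Generator-side GAN losses compared in Tables 5 and 6 (illustrative sketch).
import torch
import torch.nn.functional as F

def g_loss_standard(d_fake: torch.Tensor) -> torch.Tensor:
    # Non-saturating standard GAN loss: label generated samples as "real" (1).
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

def g_loss_lsgan(d_fake: torch.Tensor) -> torch.Tensor:
    # Least-squares GAN loss: push discriminator outputs for fakes toward 1.
    return torch.mean((d_fake - 1.0) ** 2)

def g_loss_ragan(d_fake: torch.Tensor, d_real: torch.Tensor) -> torch.Tensor:
    # Relativistic average GAN loss: fakes should look more realistic than the average real.
    loss_real = F.binary_cross_entropy_with_logits(d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake - d_real.mean(), torch.ones_like(d_fake))
    return 0.5 * (loss_real + loss_fake)
```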
Table 7. Analysis of the effect of the perceptual loss type on SR performance on the WV3-1 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Perceptual Loss | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|
| No perceptual loss | 32.2062 | 0.8552 | 0.0242 | 51.4806 | 0.4927 | 0.2636 | 5.8955 |
| LPIPS [39] | 32.1416 | 0.8518 | 0.0247 | 51.8453 | 0.4883 | 0.2349 | 5.7999 |
| VGG19-L1 | 32.0325 | 0.8474 | 0.0264 | 52.4761 | 0.4802 | 0.2451 | 6.0107 |
| VGG19-L2 | 31.9628 | 0.8440 | 0.0278 | 52.8709 | 0.4734 | 0.2459 | 5.7429 |
Table 8. Analysis of the effect of the perceptual loss type on SR performance on the WV3-2 dataset. The best and second-best performances are indicated in bold and underlined, respectively.
| Type of Perceptual Loss | PSNR | SSIM | SAM | ERGAS | UIQI | LPIPS | NIQE |
|---|---|---|---|---|---|---|---|
| No perceptual loss | 31.2822 | 0.8362 | 0.0255 | 48.2146 | 0.4806 | 0.2815 | 5.6038 |
| LPIPS [39] | 31.1871 | 0.8331 | 0.0272 | 48.8193 | 0.4769 | 0.2493 | 5.6032 |
| VGG19-L1 | 31.0858 | 0.8319 | 0.0272 | 49.2414 | 0.4714 | 0.2629 | 5.5312 |
| VGG19-L2 | 30.9904 | 0.8263 | 0.0293 | 49.8302 | 0.4617 | 0.2653 | 5.5133 |
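The perceptual-loss variants in Tables 7 and 8 can be sketched as follows: an L1 or L2 distance between VGG19 feature maps, and the LPIPS distance [39] computed with the publicly available lpips package (assumed installed, together with a recent torchvision); the chosen VGG layer and LPIPS backbone are illustrative assumptions rather than the exact configuration used in this work.

```python
# Illustrative sketch of the perceptual-loss variants in Tables 7 and 8.
import torch
import torch.nn.functional as F
import torchvision.models as models
import lpips  # assumed installed: pip install lpips

vgg_features = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:35].eval()
lpips_fn = lpips.LPIPS(net="alex")  # LPIPS [39] with an AlexNet backbone

def vgg_perceptual_loss(sr: torch.Tensor, hr: torch.Tensor, use_l1: bool = True) -> torch.Tensor:
    # L1 (VGG19-L1) or L2 (VGG19-L2) distance between deep VGG19 feature maps.
    feat_sr = vgg_features(sr)
    with torch.no_grad():
        feat_hr = vgg_features(hr)
    return F.l1_loss(feat_sr, feat_hr) if use_l1 else F.mse_loss(feat_sr, feat_hr)

def lpips_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # LPIPS expects inputs scaled to [-1, 1].
    return lpips_fn(sr, hr).mean()
```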
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
