Article

Remote Sensing Single-Image Resolution Improvement Using A Deep Gradient-Aware Network with Image-Specific Enhancement

1 School of Earth Sciences, Zhejiang University, Hangzhou 310027, China
2 Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille 13001, France
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(5), 758; https://doi.org/10.3390/rs12050758
Submission received: 29 December 2019 / Revised: 21 February 2020 / Accepted: 21 February 2020 / Published: 26 February 2020

Abstract

Super-resolution (SR) can improve the spatial resolution of remote sensing images, which is critical for many practical applications such as fine-scale urban monitoring. In this paper, a new single-image SR method, the deep gradient-aware network with image-specific enhancement (DGANet-ISE), was proposed to improve the spatial resolution of remote sensing images. First, DGANet was proposed to model the complex relationship between low- and high-resolution images. A new gradient-aware loss was designed for the training phase to preserve more gradient details in the super-resolved remote sensing images. Then, the ISE approach was applied in the testing phase to further improve the SR performance. By using the specific features of each test image, ISE can further boost the generalization capability and adaptability of our method on inexperienced datasets. Finally, three datasets were used to verify the effectiveness of our method. The results indicate that DGANet-ISE outperforms the 14 other methods in remote sensing image SR, and the cross-database test results demonstrate that our method exhibits satisfactory generalization performance in adapting to new data.

Graphical Abstract

1. Introduction

Remote sensing (RS) has become an indispensable technology for various applications, including agricultural survey, global surface monitoring, and climate change detection. However, owing to the limitations of RS devices, atmospheric disturbances, and other uncertain factors, it is hard to obtain images at the desired resolution [1]. Low-resolution RS images are gradually becoming an obstacle to many advanced tasks, such as finer-scale land cover classification [2], object recognition [3], and precise road extraction [4].
Super-resolution (SR) methods are devoted to improving image resolution beyond the acquisition equipment limits [5]. SR has the advantages of low cost, easy implementation, and high efficiency compared to upgrading image acquisition devices. Remote sensing super-resolution (RS-SR) approaches can be roughly divided into two categories: single-image SR (SISR) and multi-image SR (MISR). The former requires only one image of the target scene to generate the high-resolution output [5], while the latter requires multiple images that differ in terms of acquisition satellites, viewing angles, and sensors. For example, the authors of [6] improved the spatial resolution of Landsat images by fusion with SPOT5 images; the authors of [7] fused the information of multi-angle images for super-resolution; and the authors of [8,9] merged multispectral images with panchromatic images to generate images with high spatial and spectral resolutions. MISR usually requires multiple low-resolution (LR) images of the same region as input, which are difficult to collect in practical applications. Additionally, the feature extraction and fusion processes across various resolutions and sensors are time-consuming, which restricts the application of these techniques in real scenarios. Therefore, SISR is typically used.
According to the evolutionary trend and the complexity of the methods, we roughly divided the SISR approaches into interpolation-based approaches and machine learning-based approaches. Additionally, as deep learning methods (which are part of machine learning methods) have boomed in recent years and have achieved great success in SISR, we separated deep learning-based methods from machine learning methods into a third category.
The main idea of interpolation-based SISR is to locate each pixel of an HR image to be restored in the corresponding LR images and to interpolate the pixel’s value accordingly [10]. Bicubic, bilinear, and nearest-neighbor interpolation approaches are commonly used, and some novel interpolation methods are available [11,12,13]. Interpolation-based approaches offer a simple and fast way to improve image resolution [14]. However, they restore the missing values from a local perspective; thus, the generated images usually lack detailed information.
Machine-learning-based approaches attempt to overcome the shortcomings of interpolation-based approaches by a data-driven mechanism. The neighbor embedding method [15], sparsity-based methods [16,17,18,19], local regression [20], self-similarity algorithm [21], anchored neighborhood regression [22,23], and naive Bayes [24] are effective machine learning SISR approaches. Accurate representation of image features is key to the success of machine learning methods [25]. However, the expression ability of handcrafted features in machine learning is limited; therefore, it is difficult to handle complex data with high quality and high resolution.
In recent years, deep learning-based methods have demonstrated powerful nonlinear expression and deep feature extraction capabilities. The authors of [26] first introduced the convolutional neural network (CNN) to SISR tasks and showed excellent performance. Subsequently, many CNN-based methods emerged, such as those in [27,28,29]. Residual learning [25,28,30,31,32,33,34,35] has been proposed to ease the training of deeper networks and to improve SR performance. Some models use prior information, such as edges [36] and segmentation probability maps [37], to improve the details and fidelity of the super-resolved images. Deep Laplacian pyramid networks (LapSRN) [38], EnhanceNet [39], and the enhanced deep residual network (EDSR) [40] have all achieved great success in SISR.
However, some challenges remain. First, precisely preserving geographic information such as terrain, structure, and edge details is of great significance for RS-SR, as this information can strongly affect the accuracy of subsequent analysis. The image gradient, which sensitively reflects changes in the small details of an image [41], is highly important for RS images, and many RS applications have utilized this gradient information. For instance, the authors of [42] used the gradient map to represent the topographic surface, and the authors of [43] used the gradient map to extract object boundaries for satellite image classification.
Moreover, the available supervised learning methods perform much better when the test images and the training set are highly similar; however, if the test images differ substantially from the training set, the results may be strongly affected. Because RS images are acquired from different sensors, their spectral, temporal, and spatial resolutions are different. It is difficult to include all scenarios in the training set, which limits the practicality of the existing supervised models.
In view of the above problems, the deep gradient-aware network with image-specific enhancement (DGANet-ISE) method is proposed to obtain higher-resolution RS images. Specifically, an enhanced deep residual network is constructed to learn the relationship between low-resolution (LR) and high-resolution (HR) images. During the training phase, a new gradient-aware loss is proposed for this network to promote the extraction of more high-frequency information and to generate HR images with more detail. Additionally, to cope with inexperienced images, the image-specific enhancement approach is designed for the test phase to further improve SR performance. Each test LR image is input into DGANet to obtain a super-resolved HR image, and this HR image is then further improved via the ISE method. No additional information is needed in this module, which focuses on the specific characteristics of each single image to realize adaptive enhancement. In summary, our contributions are fourfold:
(1)
A new SISR method, DGANet-ISE, which includes a deep gradient-aware network and an image-specific enhancement approach, is proposed to improve the spatial resolution of RS images.
(2)
A deep gradient-aware network is proposed to model the complex relationship between LR and HR images. A new gradient-aware loss is designed in the training process to preserve more image gradient information in the super-resolved RS images.
(3)
This paper proposes an image-specific enhancement approach to further improve the SISR performance of RS images and to enhance the generalization capability and the adaptability of our method when facing inexperienced images.
(4)
Three data sets are used for evaluating the performance of DGANet-ISE. Compared with 14 methods, the experimental results indicate the superiority of DGANet-ISE.

2. Materials and Methods

The objective of SISR is to construct an HR image from an LR image $I_{LR}$. Denoting the target HR image and the estimated image as $I_{HR}$ and $\hat{I}_{HR}$, respectively, the more similar $I_{HR}$ and $\hat{I}_{HR}$ are, the better the SR effect is. The images have $C$ color channels, $W$ and $H$ are the width and height of the LR image, and $t$ refers to the upscaling factor. In this section, a detailed description of the proposed method is presented. In addition, the evaluation criteria and comparison methods are introduced.

2.1. Overview of the Proposed Method

In this work, a new RS-SR method, DGANet-ISE, is proposed, as illustrated in Figure 1. The method involves a training phase and a testing phase. The training phase aims at learning the complex relationship between LR images and the corresponding HR images. DGANet is the core element of this phase, and it is based on the enhanced residual network. This model employs residual and skip connections to devise a deep architecture, and it exhibits strong feature representation and nonlinear fitting abilities. As geographic information such as terrain and edges is highly important for RS image interpretation, the precise preservation of more geographic details in the super-resolved images should be a focus of research. However, $L_1$ and $L_2$ losses generate blurry images in generation problems [44]. Therefore, we propose a new gradient-aware loss to alleviate this problem. The gradient-aware loss imposes gradient constraints to focus on the high-frequency signals in RS images, such as boundaries, edges, and terrain. As such, the proposed model can generate HR images with more detailed geographic information.
In the testing phase, the trained DGANet model is applied to generate primary HR results. However, if the testing images differ substantially from the training set (e.g., the images collected from different satellites), the performances of existing supervised learning methods will be strongly affected. To address this problem, an unsupervised learning-based enhancement approach, ISE, is introduced to further improve the performance and generate final HR images. ISE uses each input LR image to supplement more global information, and it has several advantages: (a) no additional supplementary data are required; (b) it focuses on the specific information and features of each test image, i.e., image-specific enhancement; (c) it boosts the generalization performance on inexperienced data sets.

2.2. Structure of DGANet

DGANet is proposed to extract the deep features of an LR image and to upscale the LR feature maps into the HR output, as shown in Figure 2. The network contains different types of layers, including convolutional (Conv) layers, rectified linear unit (ReLU) layers, and pixel-shuffle layers. Based on these layers, residual blocks (ResBlocks) and upsample blocks (UpBlocks) are constructed. The ResBlocks are employed to build a deep network, and the UpBlocks efficiently transform the low-resolution feature maps to the high-resolution size.
ResBlock: In each ResBlock, there are two Conv layers and one ReLU layer. The output is obtained by summing the input and the result obtained through the Conv and ReLU layers, as shown in Figure 2.
UpBlock: To transform the input feature maps to the target HR image, the UpBlocks are constructed from one Conv layer and one pixel-shuffle layer, where $t$ represents the upscale factor. The Conv layer expands the feature maps to $t^2$ times as many channels, and the pixel-shuffle layer is a periodic shuffling operator that rearranges the elements of a tensor of shape $t^2 C \times W \times H$ into a tensor of shape $C \times tW \times tH$ (Figure 3). Each UpBlock upscales by a factor of 2; therefore, if the upscale factor is 2, one UpBlock is used, and if the upscale factor is larger, multiple UpBlocks are stacked to increase the size gradually.
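The two block types can be sketched in PyTorch as follows. This is a minimal illustration rather than the authors' released code: the 64-channel width and 3 × 3 kernels follow Table 1, while the padding choices and class names are assumptions.

```python
# Minimal PyTorch sketch of the two building blocks described above (assumed details noted).
import torch.nn as nn

class ResBlock(nn.Module):
    """Conv -> ReLU -> Conv, summed with the block input (local residual connection)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class UpBlock(nn.Module):
    """Conv expands to 2 x 2 times the channels, then pixel-shuffle doubles the spatial size."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 2 * 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale_factor=2)  # (4C, W, H) -> (C, 2W, 2H)

    def forward(self, x):
        return self.shuffle(self.conv(x))
```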
According to Figure 2, the whole network mainly consists of four modules: low-level features are extracted from the input image with Conv layers at the first module; the ResBlocks in the second module are used to learn more complex and deeper features; the third module transforms the feature maps from the LR domain to the HR domain; the last module is the global residual block, in which the input image is interpolated to high resolution directly to provide a large number of global signals of the input image.
Taking an upscale factor of 8 as an example (with an input LR image of 32 × 32 pixels), Table 1 gives an exact representation of the components of DGANet. The kernel size of each Conv layer is 3 × 3, and five ResBlocks are used to extract sufficiently deep features. For the global residual block, the bilinear interpolation algorithm is used to transform the input image directly to an image at the target resolution. In addition, the Adam optimizer was used to train the model (the loss function is introduced in the next section), the learning rate was 0.0001 and was halved every 40 epochs, and the batch size in the training phase was 32.
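Under these settings, the four modules in Figure 2 can be assembled roughly as below, reusing the ResBlock and UpBlock sketches from above. This is a sketch under stated assumptions (five ResBlocks, one UpBlock per factor of 2, a bilinear global residual, and an extra skip around the ResBlock stack); it is not the authors' implementation.

```python
# Sketch of the overall DGANet composition and training setup (assumptions noted above).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DGANet(nn.Module):
    def __init__(self, scale: int = 8, channels: int = 64, n_resblocks: int = 5):
        super().__init__()
        self.scale = scale
        self.head = nn.Conv2d(3, channels, 3, padding=1)          # module 1: low-level features
        self.body = nn.Sequential(                                # module 2: deeper features
            *[ResBlock(channels) for _ in range(n_resblocks)],
            nn.Conv2d(channels, channels, 3, padding=1))
        n_up = int(math.log2(scale))                              # module 3: one UpBlock per x2
        self.upsample = nn.Sequential(*[UpBlock(channels) for _ in range(n_up)])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, lr):
        feat = self.head(lr)
        feat = feat + self.body(feat)                             # skip around the ResBlock stack (assumed)
        sr = self.tail(self.upsample(feat))
        # Module 4: global residual; bilinear upsampling of the input supplies global signals.
        up = F.interpolate(lr, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return sr + up

# Training configuration as stated in the text: Adam, lr = 1e-4 halved every 40 epochs.
model = DGANet(scale=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)
```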

2.3. Gradient-Aware Loss

As images usually change quickly at the boundaries between objects, the image gradient is significant in boundary detection [41]. Therefore, gradient information is commonly used to extract object edges [43] and to reflect changes in surface topography [42] from RS images. The $L_1$ and $L_2$ losses are widely used in deep learning applications and are expressed below, where $M$ is the number of samples in one batch (the batch size was 32 in this work):

$$L_1 = \frac{1}{M} \sum_{m=1}^{M} \left| I_{HR}^{m} - \hat{I}_{HR}^{m} \right|$$

$$L_2 = \frac{1}{M} \sum_{m=1}^{M} \left( I_{HR}^{m} - \hat{I}_{HR}^{m} \right)^2$$
Although these losses can accurately represent the global pixel difference between images, they usually lead to smooth and blurred results [45], and little attention is paid to the important structure and edge information of RS images. Therefore, we propose a new gradient-aware loss ($L_{Ga}$) that facilitates the preservation of more edge and structure information and generates sharper HR images. The Sobel operator ($Sob$) [46] can effectively detect edges and enhance high-spatial-frequency details. Therefore, this operator is applied to generate gradient maps of the target image $I_{HR}$ and the predicted image $\hat{I}_{HR}$:

$$GM(I_{HR}) = Sob(I_{HR})$$

$$GM(\hat{I}_{HR}) = Sob(\hat{I}_{HR})$$

The gradient-aware loss is defined as the mean over the $M$ samples of a batch of the absolute error between the gradient maps $GM(I_{HR})$ and $GM(\hat{I}_{HR})$, as shown below:

$$L_{Ga} = \frac{1}{M} \sum_{m=1}^{M} \left| GM(I_{HR}^{m}) - GM(\hat{I}_{HR}^{m}) \right|$$
To maintain the balance between the global pixel error and the gradient error, a hyperparameter $k$ is introduced. From our experiments, we found that $k$ works best at 0.1 (as shown in Appendix A). Therefore, $k = 0.1$ was used in this paper, and the overall loss function can be formulated as

$$L_{all} = k \times L_{Ga} + L_1$$
By using gradient-aware loss, more high-frequency information, such as edges and terrain, will be preserved in the super-resolution process, and sharper HR images with higher accuracy will be generated.
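A minimal PyTorch sketch of this loss is given below. The combination $L_{all} = k \times L_{Ga} + L_1$ with $k = 0.1$ follows the equations above, while the exact Sobel implementation (a depthwise convolution per channel, combining the horizontal and vertical responses by their absolute sum) is an assumption.

```python
# Sketch of the gradient-aware loss; the Sobel details are assumptions, the formula follows the text.
import torch
import torch.nn.functional as F

# Fixed 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def gradient_map(img):
    """Apply the Sobel operator to each channel and sum the absolute responses."""
    c = img.shape[1]
    gx = F.conv2d(img, SOBEL_X.repeat(c, 1, 1, 1).to(img), padding=1, groups=c)
    gy = F.conv2d(img, SOBEL_Y.repeat(c, 1, 1, 1).to(img), padding=1, groups=c)
    return gx.abs() + gy.abs()

def overall_loss(pred_hr, target_hr, k=0.1):
    """L_all = k * L_Ga + L_1, where L_Ga is the MAE between Sobel gradient maps."""
    l1 = F.l1_loss(pred_hr, target_hr)
    l_ga = F.l1_loss(gradient_map(pred_hr), gradient_map(target_hr))
    return k * l_ga + l1
```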

2.4. Image-Specific Enhancement

The performance of supervised methods largely depends on the training data set. If the test images differ substantially from the training set, the performance on inexperienced data will be greatly affected. RS images differ in terms of sensors, acquisition times, locations, colors, types, and resolutions, and it is hard to collect training samples that cover all scenarios, which results in the insufficient generalization ability of many supervised SR methods. In this paper, the ISE algorithm is proposed as an effective way to improve the generalization and adaptability of our method.

The core strategy of ISE is to back-project the error between the emulated and actual LR images onto the SR image and to update the SR image iteratively. This approach is inspired by the iterative back-projection (IBP) algorithm [47]. The specific procedure of ISE is presented in Algorithm 1, where $n$ is the iteration number and $N$ is the maximum number of iterations. The inputs of ISE are the original LR input $I_{LR}$ and the predicted HR image $\hat{I}_{HR}$ obtained from DGANet. First, $\hat{I}_{HR}$ is downsampled to the LR domain to obtain $\hat{I}_{LR}$, and the difference image $Diff_L$ between $I_{LR}$ and $\hat{I}_{LR}$ is calculated. Then, the difference image is upsampled to the HR size to obtain $Diff_H$, and $\hat{I}_{HR}$ is updated by adding $Diff_H$ to the predicted image. This process is repeated until the difference is sufficiently small or $N$ has been reached. The bilinear interpolation algorithm was used for the upsampling and downsampling operations in this paper.

ISE is based on the assumption that the closer the estimated HR image is to the target image, the more similar the LR image $\hat{I}_{LR}$ derived from the estimated HR image should be to the input LR image $I_{LR}$. By back-projecting the difference between $I_{LR}$ and $\hat{I}_{LR}$ onto the super-resolved HR image, more of this difference is accounted for in the estimated HR image, and better SR results can thereby be obtained.
Algorithm 1. ISE
Input: LR input ($I_{LR}$), predicted HR image ($\hat{I}_{HR}$)
Output: Enhanced HR image ($I_{EHR}$)
While $n < N$ and $Diff_H$ is not sufficiently small:
1. $\hat{I}_{LR}$ = downsample($\hat{I}_{HR}$)
2. $Diff_L = I_{LR} - \hat{I}_{LR}$
3. $Diff_H$ = upsample($Diff_L$)
4. $\hat{I}_{HR} = \hat{I}_{HR} + Diff_H$
5. $n = n + 1$
$I_{EHR} = \hat{I}_{HR}$
Return $I_{EHR}$
A main advantage of ISE is that it requires no additional images for training. Furthermore, as ISE focuses on the characteristics of each single image, image-specific information is used to further improve the quality of the super-resolved HR image. In this way, the limitations imposed by the training data set can be effectively alleviated.
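A sketch of Algorithm 1 is given below, using bilinear resampling for the downsampling and upsampling steps as stated in the text. The iteration limit, stopping tolerance, and tensor layout are assumptions introduced for illustration.

```python
# Sketch of ISE (iterative back-projection of the LR-domain error); defaults are assumptions.
import torch.nn.functional as F

def image_specific_enhancement(lr, hr_pred, scale, n_iters=10, tol=1e-4):
    """lr: original LR input (N, C, H, W); hr_pred: DGANet output (N, C, scale*H, scale*W)."""
    hr = hr_pred.clone()
    for _ in range(n_iters):
        # Emulate the LR image from the current HR estimate (bilinear downsampling).
        lr_emulated = F.interpolate(hr, scale_factor=1.0 / scale, mode="bilinear",
                                    align_corners=False)
        diff_l = lr - lr_emulated
        # Back-project the LR-domain error to the HR domain and update the estimate.
        diff_h = F.interpolate(diff_l, scale_factor=float(scale), mode="bilinear",
                               align_corners=False)
        hr = hr + diff_h
        if diff_l.abs().mean() < tol:  # stop once the residual is sufficiently small
            break
    return hr
```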

2.5. Evaluation Criteria and Baselines

2.5.1. Evaluation Criteria

The mean squared error (MSE), peak signal-to-noise ratio (PSNR) [48], and structural similarity index (SSIM) [49] are used to evaluate the performance of the models; they are expressed as follows:

$$MSE = \frac{1}{t^2 W H} \sum_{i=1}^{tW} \sum_{j=1}^{tH} \left( X_{i,j} - \hat{X}_{i,j} \right)^2$$

$$PSNR = 20 \times \log_{10} \frac{MAX}{RMSE(X, \hat{X})}$$

$$SSIM = \frac{\left( 2 \mu_X \mu_{\hat{X}} + c_1 \right) \left( 2 \sigma_{X\hat{X}} + c_2 \right)}{\left( \mu_X^2 + \mu_{\hat{X}}^2 + c_1 \right) \left( \sigma_X^2 + \sigma_{\hat{X}}^2 + c_2 \right)}$$

where $X$ is the target high-resolution image; $\hat{X}$ is the super-resolved image generated from the low-resolution image; $tW$ and $tH$ are the width and height of the HR image, respectively; $MAX$ is the maximum pixel value of the original image $X$; $RMSE$ is the root mean squared error; $\mu_X$ and $\mu_{\hat{X}}$ are the average pixel values of $X$ and $\hat{X}$, respectively; $\sigma_X^2$ and $\sigma_{\hat{X}}^2$ are the variances of $X$ and $\hat{X}$, respectively; and $\sigma_{X\hat{X}}$ is the covariance of $X$ and $\hat{X}$. Moreover, $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are used to stabilize the division with a weak denominator, where $L$ is the dynamic range of the pixel values and the default values are $k_1 = 0.01$ and $k_2 = 0.03$. MSE is commonly used to measure the error of super-resolved images; an MSE closer to 0 implies higher model accuracy. PSNR is measured in decibels (dB). SSIM is used to measure the similarity between two images [5]; the larger the values of PSNR and SSIM, the better the SR performance.
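For reference, these criteria can be computed as in the sketch below, which assumes 8-bit images (MAX = 255) and relies on scikit-image's structural_similarity for SSIM; the function names and the choice of library are illustrative rather than part of the original evaluation code.

```python
# Sketch of the evaluation criteria; 8-bit images and the skimage SSIM routine are assumptions.
import numpy as np
from skimage.metrics import structural_similarity

def mse(x, x_hat):
    """Mean squared error over all pixels of the HR image."""
    return np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)

def psnr(x, x_hat, max_value=255.0):
    """PSNR in dB: 20 * log10(MAX / RMSE)."""
    return 20.0 * np.log10(max_value / np.sqrt(mse(x, x_hat)))

def ssim(x, x_hat):
    """SSIM with the default constants k1 = 0.01 and k2 = 0.03."""
    return structural_similarity(x, x_hat, channel_axis=-1, data_range=255)
```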

2.5.2. Methods to be Compared

Fourteen widely used SISR methods, listed in Table 2, were compared with DGANet-ISE. These SR methods take three-channel LR images as input to generate super-resolved images; the input and output schemes are the same as those of the proposed DGANet-ISE. The theory and characteristics of these methods are described in the corresponding references given in Table 2.

3. Results

In this section, the data sets and the implementation details are introduced firstly. Subsequently, the performance of DGANet-ISE is verified by comparison with 14 different SR methods, and cross-database tests are performed to evaluate the generalization ability of our method.

3.1. Data Sets and Implementation Details

3.1.1. Data Sets

Three different data sets were employed to verify the effectiveness and superiority of DGANet-ISE.
RSI-CB256 [55] contains 35 categories and about 24,000 images, which were collected for scene classification. This data set is rather challenging as the scenes are widely different. The pixel size of the images is 256 × 256 with 0.3–3 m spatial resolutions. Figure 4 shows the samples for 10 categories in the RSI-CB256 data set.
UCMerced [56] consists of 21 classes of land use images, including agricultural, airplane, beach, buildings, etc. Each class has 100 images with 256 × 256 pixels, and the spatial resolution is 1 foot ( 0.3 m).
Landsat-test consists of Landsat imagery, which is widely used, and the super-resolution of such images is of great value for many applications, such as finer land cover monitoring. The test images used in this paper were Landsat 5 TM data of band 7 (2.08–2.35 µm), band 4 (0.76–0.90 µm), and band 2 (0.52–0.60 µm). The data set was downloaded from the Google Earth Engine (https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C01_T1), and the spatial resolution is 30 m. During the experiment, we cut the imagery into approximately 400 small images of 256 × 256 pixels. This data set differs from the RSI-CB256 and UCMerced data sets in two aspects: (1) the spatial resolution of the Landsat data set is much coarser than that of the other two data sets, resulting in very different geographic information and scene content within the same image size; (2) the first two data sets were artificially curated into categories, whereas the Landsat-test data set consists of real, unselected scenes.

3.1.2. Implementation Details

As there are not enough corresponding LR–HR image pairs in reality, we downsampled the images using the bicubic interpolation (BCI) algorithm with factors of $t \in \{2, 4, 8\}$ to obtain LR images at different scales. Many studies, such as [29,35], have used BCI to generate LR–HR image pairs. The RSI-CB256 data set was used to compare our model with commonly used SISR methods, and all images were randomly divided into training samples (80%) and test samples (20%). Furthermore, in order to evaluate the generalization ability of our model, the Landsat-test data set was used for a cross-database test; that is, the models trained on RSI-CB256 were directly applied to the SR experiments on Landsat-test without any tuning. This is challenging because the data were collected from different sensors and have different spatial resolutions.
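The LR–HR pairs described above can be generated with a few lines of code, for example with Pillow as sketched below; the library choice and the hypothetical file name are assumptions, while the bicubic downsampling itself follows the text.

```python
# Sketch of LR-HR pair generation by bicubic downsampling; file name and library are illustrative.
from PIL import Image

def make_lr_hr_pair(path, scale):
    """Downsample an HR image with bicubic interpolation to obtain its LR counterpart."""
    hr = Image.open(path).convert("RGB")               # e.g., a 256 x 256 RSI-CB256 image
    w, h = hr.size
    lr = hr.resize((w // scale, h // scale), resample=Image.BICUBIC)
    return lr, hr

# Example: a factor-4 pair ('example.png' is a hypothetical file name).
lr, hr = make_lr_hr_pair("example.png", scale=4)
```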
In addition, the UCMerced data set was randomly split into two balanced halves for training and testing, according to [35], which is consistent with other RS-SR methods, including CNN-7 [25], LGCNet [25], DCM [29], etc. Therefore, we compared our method with these RS-SR methods using the UCMerced data set.
The interpolation-based and deep learning-based methods were implemented in Python, and the machine learning-based methods were implemented in MATLAB. In addition, the models were used with the default settings suggested by the authors. The generation of LR images and the calculation of the evaluation criteria were implemented in the same Python environment to ensure the consistency and accuracy of the results.

3.2. Comparison with Baselines

In this section, experiments were conducted on the UCMerced and RSI-CB256 data sets. UCMerced is widely used in RS-SR; therefore, we compared our method with the RS-SR methods reported in [29], which were also evaluated on the UCMerced data set. The upscale factors were 2 and 4, and the PSNR and SSIM results are shown in Table 3. In addition, Figure 5 provides a more vivid presentation of the PSNR and SSIM comparison.
According to the table, compared with recent RS-SR methods, DGANet-ISE yields the best results. Although the PSNR values of our method and DCM are close, our SSIM results are larger than those of all of the compared approaches. The experimental results indicate that our method is good at structure reconstruction of RS images.
Additionally, another experiment was conducted on the RSI-CB256 data set, and DGANet-ISE was compared with three kinds of SISR methods: interpolation-based methods (nearest-neighbor interpolation (NNI), bilinear interpolation (BLI), and BCI), machine learning-based methods (simple functions (SF), neighbor embedding (NE), and classical sparsity-based super-resolution (SCSR)), and deep learning-based methods (super-resolution convolutional neural network (SRCNN), efficient sub-pixel convolutional network (ESPCN), and enhanced deep super-resolution network (EDSR)). The experiments were conducted with three different upscale factors, i.e., t { 2 ,   4 ,   8 } . The detailed results are presented in Table 4. In addition to the quantitative assessments, visual results are provided for a qualitative and intuitive evaluation. Three test images of a parking lot, a residence, and a dam were chosen as examples, and Figure 6, Figure 7 and Figure 8 show the super-resolved HR images for t { 2 ,   4 ,   8 } , respectively.
From the global perspective, it is obvious that as the upscale factor increases, the accuracy gradually decreases, because achieving super-resolution from a lower resolution image is much more difficult. Furthermore, according to Table 4, DGANet-ISE significantly outperforms the baselines. With the exception of DGANet-ISE, the results of EDSR are the best among the other methods. The PSNR values of DGANet-ISE are 0.33 dB, 0.26 dB, and 0.18 dB larger than those of EDSR when t is 2, 4, and 8, respectively.
BCI, SCSR, and DGANet-ISE are the most prominent within the interpolation, machine-learning, and deep learning categories, respectively. For example, when t = 2, the MSE values of BCI and SCSR are 69.74 and 48.01, respectively, much smaller than the other methods of the same category. With regard to deep learning methods, our proposed model exhibits superior performance on the SISR task. The MSE and PSNR results of our method are the best among all of the methods. For instance, when t = 2, the error of our proposed method is the smallest (MSE = 31.26), while PSNR = 37.92 dB and SSIM = 0.9477 are the largest.
Focusing on the visual results (Figure 6, Figure 7 and Figure 8), the HR images super-resolved by different methods vary in terms of their features. For example, the images obtained from NNI and NE contain regular dense squares, similar to mosaics, so the edges in these super-resolved images are very blurred. The reason is that both algorithms rely heavily on the values of neighboring pixels or patches and ignore other significant structural details. NE performs better because it assumes that the LR patches and the corresponding HR patches have similar local geometries.
In addition, images super-resolved via BLI and SCSR have very blurred boundaries (Figure 8d,h). We conducted a thorough analysis and found that both the BLI and SCSR algorithms use linear features for SISR, which results in the emergence of stripes when t increases. The basic idea of SCSR supposes that a signal can be represented as a sparse linear combination with respect to an overcomplete dictionary. A linear feature extraction operator behaves like a high-pass filter for feature representation of LR image patches [52]. In this way, the SCSR images have sharper edges than those of NE and BLI.
The HR images (Figure 7i–l) obtained by the deep learning methods exhibit smoother textures than those of the other two kinds of methods. DGANet-ISE yields the most competitive visual results, as its outputs are the most similar to their ground-truth HR counterparts. The proposed DGANet-ISE recovers more texture and structure details, such as the edges of the dam in Figure 8. The edges of the dam generated by our method are very clear, whereas the edges are blurred or even missing in the other images. This is because the gradient-aware loss proposed in our method captures more gradient information of the RS images, which facilitates the generation of sharper edges and textures. In summary, the proposed method performs the best compared to the other methods.

3.3. Cross-Database Test

To further evaluate the robustness and generalization capability of DGANet-ISE, the Landsat-test data set was applied for a cross-database test, i.e., the models trained on the RSI-CB256 data set were directly used to super-resolve the Landsat-test data set without any fine-tuning or modification. The results of the comparison between the different approaches under the three upscale factors are presented in Table 5, and Figure 9 shows the visual results of the different methods on an example image.
The Landsat-test data set differs substantially from the training samples: the spatial resolution of the training samples is 0.3–3 m, while that of the Landsat-test images is 30 m, so the scene contents within the same image size are very different. Therefore, it poses a great challenge to apply these supervised learning models directly to such test images.
According to the results, our proposed method shows a reasonably satisfactory performance compared to the other methods. For instance, when t = 2, the PSNR is 40.70 dB, which is 0.64 dB larger than that of EDSR and 1.57 dB larger than that of SCSR. This is because the ISE algorithm focuses on the personalized features of each test image, and provides a fast additional improvement of the super-resolved HR image obtained from the deep gradient-aware network. In summary, DGANet-ISE has strong robustness and generalization capability on new data sets.

4. Discussion

In this section, ablation studies were conducted to analyze the performance of the proposed gradient-aware loss and the impact of the ISE algorithm.

4.1. Dependency on the Type of Loss Functions

To evaluate the effectiveness of the gradient-aware loss, we trained the same network with the $L_1$, $L_2$, and gradient-aware losses on the training samples of the RSI-CB256 data set. The detailed results on the test samples of the RSI-CB256 data set are presented in Table 6. As can be seen, the PSNR results of the $L_1$ loss are slightly better than those of the $L_2$ loss, and our proposed loss function yields the best results. When the upscale factor is 2, the PSNR obtained with the gradient-aware loss (37.92 dB) is 0.2 dB higher than that of the $L_1$ loss (37.72 dB) and 0.26 dB higher than that of the $L_2$ loss (37.66 dB).
To assess the detail-preservation capability of the gradient-aware loss, an image belonging to the "Residence" class was selected for further analysis. Figure 10 presents the reconstructed HR images obtained using the three loss functions with an upscale factor of 8. The PSNR of the proposed loss is 25.54 dB, which is 0.15 dB higher than that of the $L_1$ loss. The shape and direction of the swimming pool obtained with the gradient-aware loss are much closer to reality. In addition, the PSNR and SSIM results obtained with our gradient-aware loss are much better than those of the $L_1$ and $L_2$ losses. In summary, our proposed gradient-aware loss is able to generate HR images more accurately.

4.2. Impact of ISE

Since ISE aims to improve the SR results on inexperienced data, in this part, the cross-data set test performances of the method with and without ISE were compared. The DGANet and EDSR models trained on the RSI-CB256 data set were directly applied to the Landsat-test data. The results are presented in Table 7, and Figure 11 gives a visual comparison. The results in Table 7 illustrate that the ISE approach can further improve the SR performance. For instance, when the upscale factor is 2, the PSNR of EDSR and DGANet is improved by 0.46 dB and 0.23 dB, respectively. By using the specific information of the test image, ISE can further enhance the HR results of DGANet. As ISE is an unsupervised approach, it can greatly improve the generalization and adaptability of the models and reduce their dependence on the training set.

5. Conclusions

A new SISR method, DGANet-ISE, was proposed in this work to increase the spatial resolution of RS images. Specifically, the deep gradient-aware network and image-specific enhancement are two important components of DGANet-ISE. The first part extracts features and learns the precise representation between the LR and HR domains. The gradient-aware loss was first proposed in this part to preserve the important gradient information of RS images. Image-specific enhancement was used to improve the robustness and adaptability of our method, thus further improving the performance. Based on the above, three different data sets were used as experimental data sets. The proposed DGANet-ISE was applied to super-resolve the RS images with three upscale factors, which yielded remarkable results. Compared to other SISR methods, DGANet-ISE is better from both quantitative and visual perspectives. Moreover, the results of the cross-database test demonstrate that the proposed approach exhibits strong generalization and robustness, and provides an excellent opportunity for practical applications.

Author Contributions

Conceptualization, M.Q. and Z.D.; methodology, M.Q.; validation, M.Q., L.H., and S.M.; supervision, Z.D., S.M., and J.S.; investigation, M.Q., L.H., S.M., and J.S.; writing—original draft preparation, M.Q.; writing—review and editing, M.Q.; project administration, Z.D., F.Z., and R.L.; resources, R.L.; funding acquisition, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China under Grants 2016YFC1400903 and 2018YFB0505000, and the National Natural Science Foundation of China (41871287).

Acknowledgments

Thanks to all the anonymous reviewers for their constructive and valuable suggestions on the earlier drafts of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

To analyze the influence of the hyperparameter $k$ on the model, we plotted the trends of the average PSNR and SSIM values for $k \in \{1, 0.1, 0.01, 0.001, 0.0001, 0.00001\}$ versus the number of iterations (up to 200 epochs) in Figure A1. The experiment was conducted with an upscale factor of 2. When $k = 1$, the PSNR is lower than in the other cases. In terms of the SSIM results, $k = 1$ and $k = 0.1$ performed better than the others. Therefore, in this paper, we used $k = 0.1$ as the final weight of the gradient-aware loss.
Figure A1. Comparison of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) between the five different k values.

References

  1. Yang, D.; Li, Z.; Xia, Y.; Chen, Z. Remote sensing image super-resolution: Challenges and approaches. In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; pp. 196–200. [Google Scholar]
  2. Tatem, A.J.; Lewis, H.G.; Atkinson, P.M.; Nixon, M.S. Super-resolution land cover pattern prediction using a hopfield neural network. Remote Sens. Environ. 2002, 79, 1–14. [Google Scholar] [CrossRef] [Green Version]
  3. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef] [Green Version]
  4. Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149. [Google Scholar] [CrossRef]
  5. Haut, J.M.; Fernandez-Beltran, R.; Paoletti, M.E.; Plaza, J.; Plaza, A.; Pla, F. A new deep generative network for unsupervised remote sensing single-image super-resolution. IEEE Trans. Geosci. Remote Sens. 2018, 11, 6792–6810. [Google Scholar] [CrossRef]
  6. Song, H.; Huang, B.; Liu, Q.; Zhang, K. Improving the spatial resolution of Landsat TM/ETM+ through fusion with SPOT5 images via learning-based super-resolution. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1195–1204. [Google Scholar] [CrossRef]
  7. Zhang, H.; Yang, Z.; Zhang, L.; Shen, H. Super-resolution reconstruction for multi-angle remote sensing images considering resolution differences. Remote Sens. 2014, 6, 637–657. [Google Scholar] [CrossRef] [Green Version]
  8. Lanaras, C.; Baltsavias, E.; Schindler, K. Hyperspectral super-resolution by coupled spectral unmixing. In Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile, 11–18 December 2015; pp. 3586–3594. [Google Scholar]
  9. Yi, C.; Zhao, Y.; Chan, J.C. Hyperspectral image super-resolution based on spatial and spectral correlation fusion. IEEE Trans. Geosci. Remote Sens. 2018, 56, 4165–4177. [Google Scholar] [CrossRef]
  10. Xu, Z.; Wang, X.; Chen, Z.; Xiong, D.; Ding, M.; Hou, W. Nonlocal similarity based DEM super resolution. ISPRS J. Photogramm. Remote Sens. 2015, 110, 48–54. [Google Scholar] [CrossRef]
  11. Gunturk, B.K.; Glotzbach, J.; Altunbasak, Y.; Schafer, R.W.; Mersereau, R.M. Demosaicking: Color filter array interpolation. IEEE Signal Process. Mag. 2005, 22, 44–54. [Google Scholar] [CrossRef]
  12. Li, X.; Orchard, M.T. New edge-directed interpolation. IEEE Trans. Image Process. 2001, 10, 1521–1527. [Google Scholar]
  13. Zhang, L.; Wu, X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Trans. Image Process. 2006, 15, 2226–2238. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Wu, W.; Yang, X.; Liu, K.; Liu, Y.; Yan, B.; Hua, H. A new framework for remote sensing image super-resolution: Sparse representation-based method by processing dictionaries with multi-type features. J. Syst. Archit. 2016, 64, 63–75. [Google Scholar] [CrossRef]
  15. Chang, H.; Yeung, D.; Xiong, Y.; Bay, C.W.; Kong, H. Super-resolution through neighbor embedding. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004; Volume 1, p. I-I. [Google Scholar]
  16. Peleg, T.; Elad, M. A statistical prediction model based on sparse representations for single image super-resolution. IEEE Trans. Image Process. 2014, 23, 2569–2582. [Google Scholar] [CrossRef] [PubMed]
  17. Dong, W.; Zhang, L.; Shi, G.; Wu, X. Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process. 2011, 20, 1838–1857. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Xinlei, W.; Naifeng, L. Super-resolution of remote sensing images via sparse structural manifold embedding. Neurocomputing 2016, 173, 1402–1411. [Google Scholar] [CrossRef]
  19. Tang, S.; Xu, Y.; Huang, L.; Sun, L. Hyperspectral Image Super-Resolution via Adaptive Dictionary Learning and Double l1 Constraint. Remote Sens. 2019, 11, 2809. [Google Scholar] [CrossRef] [Green Version]
  20. Gu, S.; Sang, N.; Ma, F. Fast image super resolution via local regression. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; pp. 3128–3131. [Google Scholar]
  21. Pan, Z.; Yu, J.; Huang, H.; Hu, S.; Zhang, A.; Ma, H.; Sun, W. Super-resolution based on compressive sensing and structural self-similarity for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2013, 51, 4864–4876. [Google Scholar] [CrossRef]
  22. Timofte, R.; De Smet, V.; Van Gool, L. Anchored neighborhood regression for fast example-based super-resolution. In Proceedings of the IEEE international conference on computer vision, Sydney, Australia, 1–8 December 2013; pp. 1920–1927. [Google Scholar]
  23. Timofte, R.; De Smet, V.; Van Gool, L. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision; Springer: New York, NY, USA, 2014; pp. 111–126. [Google Scholar]
  24. Salvador, J.; Perez-Pellitero, E. Naive bayes super-resolution forest. In Proceedings of the IEEE International conference on computer vision, Santiago, Chile, 7–13 December 2015; pp. 325–333. [Google Scholar]
  25. Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local-global combined network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247. [Google Scholar] [CrossRef]
  26. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In European Conference on Computer vision; Springer: New York, NY, USA, 2014; pp. 184–199. [Google Scholar]
  27. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2017, 2, 4681–4690. [Google Scholar]
  28. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1646–1654. [Google Scholar]
  29. Haut, J.M.; Paoletti, M.E.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.; Li, J. Remote sensing single-image superresolution based on a deep compendium model. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1432–1436. [Google Scholar] [CrossRef]
  30. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  31. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155. [Google Scholar]
  32. Ma, W.; Pan, Z.; Guo, J.; Lei, B. Achieving super-resolution remote sensing images via the wavelet transform combined with the recursive Res-Net. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3512–3527. [Google Scholar] [CrossRef]
  33. Gu, J.; Sun, X.; Zhang, Y.; Fu, K.; Wang, L. Deep residual squeeze and excitation network for remote sensing image super-resolution. Remote Sens. 2019, 11, 1817. [Google Scholar] [CrossRef] [Green Version]
  34. Lu, T.; Wang, J.; Zhang, Y.; Wang, Z.; Jiang, J. Satellite image super-resolution via multi-scale residual deep neural network. Remote Sens. 2019, 11, 1588. [Google Scholar] [CrossRef] [Green Version]
  35. Haut, J.M.; Fernandez-Beltran, R.; Paoletti, M.E.; Plaza, J.; Plaza, A. Remote sensing image superresolution using deep residual channel attention. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9277–9289. [Google Scholar] [CrossRef]
  36. Yang, W.; Feng, J.; Yang, J.; Zhao, F.; Liu, J. Deep edge guided recurrent residual learning for image super-resolution. IEEE Trans. Image Process. 2017, 26, 5895–5907. [Google Scholar] [CrossRef]
  37. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 606–615. [Google Scholar]
  38. Lai, W.; Ahuja, N.; Yang, M.; Tech, V. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  39. Sajjadi, M.S.M.; Schölkopf, B.; Hirsch, M. EnhanceNet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4491–4500. [Google Scholar]
  40. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Work. 2017, 1, 136–144. [Google Scholar]
  41. Jacobs, D. Image Gradients. Available online: https://www.cs.umd.edu/~djacobs/CMSC426/ImageGradients.pdf (accessed on 3 September 2019).
  42. Zhang, G.; Jia, X.; Hu, J. Superpixel-based graphical model for remote sensing image mapping. IEEE Trans. Geosci. Remote Sens. 2015, 53, 5861–5871. [Google Scholar] [CrossRef]
  43. Borra, S.; Thanki, R.; Dey, N. Recurrent neural network to correct satellite image classification maps. SpringerBriefs Appl. Sci. Technol. 2019, 55, 53–81. [Google Scholar]
  44. Li, Z.; Hu, Y.; Zhang, M.; Xu, M.; He, R. Protecting your faces: Meshfaces generation and removal via high-order relation-preserving CycleGAN. 2018 International Conference on Biometrics (ICB); IEEE: Piscataway, NJ, USA, 2018; pp. 61–68. [Google Scholar]
  45. Huang, H.; He, R.; Sun, Z.; Tan, T. Wavelet-SRNet: A wavelet-based CNN for multi-scale face super resolution. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1689–1697. [Google Scholar]
  46. Sobel, I. An Isotropic 3 × 3 Image Gradient Operator. Machine vision for three-dimensional scenes; Academic Press: Orlando, FL, USA, 1990; pp. 376–379. [Google Scholar]
  47. Irani, M.; Peleg, S. Improving resolution by image registration. CVGIP Graph. Model. image Process. 1991, 53, 231–239. [Google Scholar] [CrossRef]
  48. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  49. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [Green Version]
  50. Olivier, R.; Hanqiang, C. Nearest neighbor value interpolation. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 1–6. [Google Scholar] [CrossRef] [Green Version]
  51. Gao, S.; Gruev, V. Bilinear and bicubic interpolation methods for division of focal plane polarimeters. Opt. Express 2011, 19, 26161. [Google Scholar] [CrossRef] [PubMed]
  52. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef] [PubMed]
  53. Yang, C.; Yang, M. Fast direct super-resolution by simple functions. In Proceedings of the IEEE international conference on computer vision, Sydney, Australia, 1–8 December 2013; pp. 561–568. [Google Scholar]
  54. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883. [Google Scholar]
  55. Li, H.; Tao, C.; Wu, Z.; Chen, J.; Gong, J.; Deng, M. RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data. arXiv 2017, arXiv:1705.10450. [Google Scholar]
  56. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems; San Jose, CA, USA, 2–5 November 2010, ACM: New York, NY, USA, 2010; pp. 270–279. [Google Scholar]
Figure 1. The architecture of deep gradient-aware network with image-specific enhancement (DGANet-ISE). The training phase aims at learning a complex relationship between low-resolution (LR) and corresponding high-resolution (HR) images. DGANet is the core element of the training phase, and the gradient-aware loss is proposed for facilitating the preservation of more high-frequency information in the estimated HR images. In addition, the ISE approach is first proposed in the testing phase. By using the specific information of each test LR image, image-specific enhancement (ISE) can further boost the generalization of DGANet-ISE on inexperienced data sets.
Figure 2. Structure of DGANet. The network mainly consists of four modules: the first module extracts low-level features from the input images; the second module utilizes ResBlocks to learn more complex and deeper features; the third module transforms the feature maps from the LR domain to HR domain; the last module provides more global signals to the HR image through global residual learning.
Figure 3. The pixel-shuffle layer transforms feature maps from the LR domain to the HR image.
Figure 4. Ten example categories of the data set, including an airplane, bridge, city building, coastline, container, dam, forest, highway, parking lot, and residents.
Figure 5. Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) comparison results using the UCMerced data set. (a) PSNR results of different methods; (b) SSIM results of different methods.
Figure 6. Super-resolution (SR) results obtained by different methods over the test image of a Parking lot with an upscale factor of 2. (a) LR image, (b) ground-truth HR image, (c) NNI, (d) BLI, (e) BCI, (f) SF, (g) NE, (h) SCSR, (i) SRCNN, (j) ESPCN, (k) EDSR, and (l) DGANet-ISE.
Figure 7. SR results obtained by different methods over the test image of a Residence with an upscale factor of 4. (a) LR image, (b) ground-truth HR image, (c) NNI, (d) BLI, (e) BCI, (f) SF, (g) NE, (h) SCSR, (i) SRCNN, (j) ESPCN, (k) EDSR, and (l) DGANet-ISE.
Figure 8. SR results obtained by different methods over the test image of a Dam with an upscale factor of 8. (a) LR image, (b) ground-truth HR image, (c) NNI, (d) BLI, (e) BCI, (f) SF, (g) NE, (h) SCSR, (i) SRCNN, (j) ESPCN, (k) EDSR, and (l) DGANet-ISE.
Figure 9. SR results obtained by different methods over the Landsat-test image with an upscale factor of 4. (a) LR image, (b) ground-truth HR image, (c) NNI, (d) BLI, (e) BCI, (f) SF, (g) NE, (h) SCSR, (i) SRCNN, (j) ESPCN, (k) EDSR, and (l) DGANet-ISE.
Figure 10. Images obtained using different loss functions with an upscale factor of 8: (a) ground-truth HR image, (b)   L 2 loss, (c) L 1 loss, and (d) proposed loss.
Figure 11. Images obtained with and without ISE with an upscale factor of 4: (a) ground-truth HR image, (b) EDSR, (c) EDSR-ISE, (d) DGANet, (e) DGANet-ISE.
Table 1. The specific architecture of DGANet when the upscale factor is 8.
| Block | Layer | Kernel Size | Number of Kernels | Output Size | Stride |
|---|---|---|---|---|---|
| – | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| ResBlock1 | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| ResBlock2 | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| ResBlock3 | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| ResBlock4 | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| ResBlock5 | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| – | Conv | 3 × 3 | 64 | 32 × 32 | 1 |
| UpBlock1 | Conv | 3 × 3 | 64 × 2 × 2 | 32 × 32 | 1 |
| | pixel-shuffle | – | – | 64 × 64 | – |
| UpBlock2 | Conv | 3 × 3 | 64 × 2 × 2 | 64 × 64 | 1 |
| | pixel-shuffle | – | – | 128 × 128 | – |
| UpBlock3 | Conv | 3 × 3 | 64 × 2 × 2 | 128 × 128 | 1 |
| | pixel-shuffle | – | – | 256 × 256 | – |
| – | Conv | 3 × 3 | 3 | 256 × 256 | 1 |
Table 2. Methods that are considered in the comparison experiments.
| Methods | Abbreviation | Category | Reference |
|---|---|---|---|
| Nearest neighbor interpolation | NNI | Interpolation | [50] |
| Bilinear interpolation | BLI | Interpolation | [51] |
| Bicubic interpolation | BCI | Interpolation | [51] |
| Classical sparsity-based super resolution | SCSR | Machine learning | [52] |
| Neighbor embedding | NE | Machine learning | [15] |
| Simple functions | SF | Machine learning | [53] |
| Sparse coding | SC | Machine learning | [52] |
| Super resolution convolutional neural network | SRCNN | Deep learning | [26] |
| Efficient sub-pixel convolutional network | ESPCN | Deep learning | [54] |
| Enhanced deep super resolution network | EDSR | Deep learning | [40] |
| Fast SR convolutional neural network | FSRCNN | Deep learning | [53] |
| 7 convolutional layers network | CNN-7 | Deep learning | [25] |
| Local–global combined network | LGCNet | Deep learning | [25] |
| Deep compendium model | DCM | Deep learning | [29] |
Table 3. Comparison results on the UCMerced data set.
| Upscale Factor | Metric | BCI | SC | SRCNN | FSRCNN | CNN-7 | LGCNet | DCM | DGANet-ISE |
|---|---|---|---|---|---|---|---|---|---|
| 2 | PSNR | 30.76 | 32.77 | 32.84 | 33.18 | 33.15 | 33.48 | 33.65 | 33.68 |
| 2 | SSIM | 0.8789 | 0.9166 | 0.9152 | 0.9196 | 0.9191 | 0.9235 | 0.9274 | 0.9344 |
| 4 | PSNR | 25.65 | 26.51 | 26.78 | 26.93 | 26.86 | 27.02 | 27.22 | 27.31 |
| 4 | SSIM | 0.6725 | 0.7152 | 0.7219 | 0.7267 | 0.7264 | 0.7333 | 0.7528 | 0.7665 |
Table 4. Comparison between the different methods under different upscale factors on the RSI-CB256 data set (The bold values are the best among all the methods).
| Methods | Category | MSE (t = 2) | PSNR (t = 2) | SSIM (t = 2) | MSE (t = 4) | PSNR (t = 4) | SSIM (t = 4) | MSE (t = 8) | PSNR (t = 8) | SSIM (t = 8) |
|---|---|---|---|---|---|---|---|---|---|---|
| NNI | Interpolation | 88.25 | 31.90 | 0.8701 | 211.43 | 27.80 | 0.6886 | 357.27 | 25.48 | 0.5390 |
| BLI | Interpolation | 90.03 | 31.90 | 0.8531 | 205.46 | 27.99 | 0.6822 | 345.12 | 25.65 | 0.5544 |
| BCI | Interpolation | 69.74 | 33.29 | 0.8858 | 182.56 | 28.55 | 0.7087 | 321.93 | 25.96 | 0.5652 |
| SF | Machine learning | 53.16 | 34.65 | 0.9094 | 190.15 | 28.30 | 0.6956 | 365.47 | 25.40 | 0.5440 |
| NE | Machine learning | 81.48 | 32.33 | 0.8664 | 224.26 | 27.44 | 0.6497 | 404.56 | 24.69 | 0.5032 |
| SCSR | Machine learning | 48.01 | 34.94 | 0.9195 | 153.26 | 29.25 | 0.7505 | 293.43 | 26.31 | 0.5819 |
| SRCNN | Deep learning | 35.30 | 36.82 | 0.9414 | 131.71 | 29.93 | 0.7782 | 279.40 | 26.43 | 0.5921 |
| ESPCN | Deep learning | 32.94 | 37.35 | 0.9457 | 117.33 | 30.52 | 0.7961 | 254.08 | 26.82 | 0.6137 |
| EDSR | Deep learning | 32.34 | 37.59 | 0.9475 | 115.85 | 30.64 | 0.8007 | 249.25 | 27.04 | 0.6228 |
| DGANet-ISE | Deep learning | **31.26** | **37.92** | **0.9477** | **112.08** | **30.90** | **0.8046** | **240.17** | **27.22** | **0.6312** |
Table 5. SR results on the Landsat-test data set.
| Methods | MSE (t = 2) | PSNR (t = 2) | SSIM (t = 2) | MSE (t = 4) | PSNR (t = 4) | SSIM (t = 4) | MSE (t = 8) | PSNR (t = 8) | SSIM (t = 8) |
|---|---|---|---|---|---|---|---|---|---|
| NNI | 18.05 | 36.91 | 0.9344 | 47.21 | 32.83 | 0.8387 | 92.84 | 30.00 | 0.7373 |
| BLI | 17.32 | 37.03 | 0.9320 | 42.61 | 33.21 | 0.8465 | 85.07 | 30.32 | 0.7601 |
| BCI | 13.08 | 38.23 | 0.9470 | 36.34 | 33.88 | 0.8627 | 75.97 | 30.77 | 0.7695 |
| SF | 11.00 | 38.99 | 0.9537 | 39.45 | 33.44 | 0.8495 | 90.52 | 29.90 | 0.7476 |
| NE | 13.74 | 37.72 | 0.9350 | 35.24 | 33.49 | 0.8373 | 78.57 | 30.07 | 0.7403 |
| SCSR | 10.50 | 39.13 | 0.9534 | 32.38 | 34.37 | 0.8741 | 69.69 | 31.15 | 0.7760 |
| SRCNN | 9.87 | 39.72 | 0.9612 | 34.79 | 34.53 | 0.8773 | 82.01 | 30.89 | 0.7688 |
| ESPCN | 10.66 | 39.63 | 0.9586 | 32.75 | 34.81 | 0.8835 | 84.25 | 31.12 | 0.7654 |
| EDSR | 8.98 | 40.06 | 0.9635 | 28.09 | 35.17 | 0.8907 | 71.30 | 31.41 | 0.7833 |
| DGANet-ISE | 7.23 | 40.70 | 0.9669 | 25.49 | 35.42 | 0.8951 | 61.37 | 31.76 | 0.7873 |
Table 6. Comparison of the results of different loss functions.
| Upscale Factor | MSE ($L_2$ loss) | PSNR ($L_2$ loss) | SSIM ($L_2$ loss) | MSE ($L_1$ loss) | PSNR ($L_1$ loss) | SSIM ($L_1$ loss) | MSE (Proposed Loss) | PSNR (Proposed Loss) | SSIM (Proposed Loss) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 31.31 | 37.66 | 0.9483 | 31.85 | 37.72 | 0.9469 | 31.26 | 37.92 | 0.9477 |
| 4 | 114.39 | 30.67 | 0.7997 | 114.53 | 30.76 | 0.8015 | 112.08 | 30.90 | 0.8046 |
| 8 | 245.83 | 27.04 | 0.6197 | 245.77 | 27.10 | 0.6260 | 240.17 | 27.22 | 0.6312 |
Table 7. Comparison results of with or without image-specific enhancement (ISE).
| Upscale Factor | MSE (EDSR) | PSNR (EDSR) | SSIM (EDSR) | MSE (EDSR-ISE) | PSNR (EDSR-ISE) | SSIM (EDSR-ISE) | MSE (DGANet) | PSNR (DGANet) | SSIM (DGANet) | MSE (DGANet-ISE) | PSNR (DGANet-ISE) | SSIM (DGANet-ISE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 8.98 | 40.06 | 0.9635 | 7.57 | 40.52 | 0.9665 | 8.01 | 40.47 | 0.9647 | 7.23 | 40.70 | 0.9669 |
| 4 | 28.09 | 35.17 | 0.8907 | 26.15 | 35.34 | 0.8933 | 26.10 | 35.37 | 0.8939 | 25.49 | 35.42 | 0.8951 |
| 8 | 71.30 | 31.41 | 0.7833 | 65.29 | 31.60 | 0.7840 | 62.28 | 31.72 | 0.7868 | 61.37 | 31.76 | 0.7873 |
