Article

Multi-Degradation Super-Resolution Reconstruction for Remote Sensing Images with Reconstruction Features-Guided Kernel Correction

1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Satellite Information Intelligent Processing and Application Research Laboratory, Beijing 100192, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 2915; https://doi.org/10.3390/rs16162915
Submission received: 27 June 2024 / Revised: 5 August 2024 / Accepted: 8 August 2024 / Published: 9 August 2024

Abstract:
A variety of factors cause a reduction in remote sensing image resolution. Unlike super-resolution (SR) reconstruction methods based on a single degradation assumption, multi-degradation SR methods aim to learn the degradation kernel from low-resolution (LR) images and reconstruct high-resolution (HR) images, making them more suitable for restoring the resolution of remote sensing images. However, existing multi-degradation SR methods only utilize the given LR images to learn the representation of the degradation kernel. Mismatches between the estimated degradation kernel and the real-world degradation kernel lead to a significant deterioration in the performance of these methods. To address this issue, we design a reconstruction features-guided kernel correction SR network (RFKCNext) for multi-degradation SR reconstruction of remote sensing images. Specifically, the proposed network not only utilizes LR images to extract degradation kernel information but also employs features from SR images to correct the estimated degradation kernel, thereby enhancing its accuracy. RFKCNext utilizes the ConvNext Block (CNB) for global feature modeling and employs CNBs as fundamental units to construct the SR reconstruction subnetwork module (SRConvNext) and the reconstruction features-guided kernel correction network (RFGKCorrector). The SRConvNext reconstructs SR images based on the estimated degradation kernel. The RFGKCorrector corrects the estimated degradation kernel using reconstruction features from the generated SR images. The two networks iterate alternately, forming an end-to-end trainable network. More importantly, the SRConvNext utilizes the degradation kernel estimated by the RFGKCorrector for reconstruction, allowing the SRConvNext to perform well even if the degradation kernel deviates from the real-world scenario. In experimental terms, three noise levels and five Gaussian blur kernels are considered on the NWPU-RESISC45 remote sensing image dataset to synthesize degraded remote sensing images for training and testing. Compared to existing super-resolution methods, the experimental results demonstrate that our proposed approach achieves significant reconstruction advantages in both quantitative and qualitative evaluations. Additionally, the UCMERCED remote sensing dataset and the real-world remote sensing image dataset provided by the “Tianzhi Cup” Artificial Intelligence Challenge are utilized for further testing. Extensive experiments show that our method delivers more visually plausible results, demonstrating its potential for real-world application.


1. Introduction

As an important means of long-distance Earth observation, remote sensing technology is widely applied in meteorology, agriculture, geography, the military, and other fields [1]. Remote sensing images, especially high-resolution (HR) images, serve as the data carriers of remote sensing technology, accurately representing targets and their background information. They therefore play a crucial role in various remote sensing tasks such as change detection [2,3], semantic segmentation [4,5], object detection [6,7], and scene classification [8,9]. However, due to the influence of imaging environments and devices, remote sensing images are often acquired as low-resolution (LR) images, limiting their practical applications [10]. Image super-resolution (SR) reconstruction, as an image processing technique, effectively addresses this problem without changing the original imaging equipment. Therefore, research on the SR reconstruction of remote sensing images has received extensive attention.
Image SR aims to reconstruct HR images using the limited information available in LR images. As a low-level task in computer vision, it has received increasing attention in recent years. Currently, SR methods are broadly divided into three categories: interpolation-based, reconstruction-based, and learning-based [11]. With the rapid development of deep learning (DL), Dong et al. [12] pioneered the use of a convolutional neural network (CNN) for SR, significantly outperforming interpolation-based and reconstruction-based methods. This led to DL-based methods becoming the mainstream approach.
Network structures that excel in high-level computer vision tasks often inspire network design for SR, such as residual learning [13,14] and dense connections [15,16]. Fractal connection architectures based on fractal theory [17,18,19,20,21,22] have also been applied to SR [23,24,25]. In particular, the Vision Transformer (ViT) has surpassed and, in many cases, replaced traditional CNN structures and is frequently utilized in SR network design [26,27]. Although ViT’s capability for long-range modeling helps to extract global image features, the fully connected layers and matrix multiplication operations used in computing the self-attention mechanism result in high computational complexity [28]. Additionally, the vast number of model parameters contributes to increased computational costs [29,30]. Furthermore, ViT lacks the inherent inductive biases of a CNN, which hampers its generalization performance, especially when training data are insufficient [31,32,33,34]. Consequently, a large amount of data and computational resources are required to compensate for these limitations. To address these issues, ConvNext [35] adopts a more “modernized” structure that retains the simplicity and efficiency of a CNN while competing with Transformers in accuracy and scalability. Despite ConvNext demonstrating impressive performance in high-level vision tasks, few researchers have attempted to use it in SR tasks.
Due to factors such as lighting, atmospheric propagation, and sensor quantization, remote sensing images are affected by multiple degradations, including blur and noise [10]. Most current DL-based SR methods follow the simplistic assumption of bicubic down-sampling, which significantly degrades SR network performance when dealing with multiple degradations [36]. Therefore, it is crucial to consider multi-degradation when constructing SR networks.
Existing multi-degradation SR methods rely on the given LR image to learn degradation kernel features, which are then utilized to augment the training dataset or to directly perform SR reconstruction. When the estimated degradation kernel mismatches the real-world degradation, the reconstructed SR image tends to be overly smooth or overly sharp [37]. This indicates that the features of the reconstructed remote sensing SR image contain information about the deviation of the degradation kernel. Therefore, merely utilizing the LR image to learn degradation kernel features cannot provide effective feedback on this deviation, which makes it difficult to reasonably correct the estimated degradation kernel during training.
To address the aforementioned issues, this paper designs a ConvNext-based multi-degradation SR network for remote sensing images. This network utilizes the features of SR reconstructed images to correct the degradation kernel. The architecture comprises two main components: the SR reconstruction subnetwork module (SRConvNext) and the reconstruction features-guided kernel correction network (RFGKCorrector), both constructed with ConvNext Blocks (CNBs) as fundamental units. CNBs achieve global modeling of features, adapting to the abundant spatial information and cross-scale targets characteristic of remote sensing images. First, we initialize a dimension-reduced degradation kernel vector as an additional feature, which is input into the SR network along with the LR image. Second, the SR network transfers the reconstructed image features to the correction network, generating feedback to adjust the estimated degradation kernel. Then, we iteratively alternate between these two subnetworks to form an end-to-end trainable network. Notably, the degradation kernel vector we utilize is not the real-world degradation kernel. Throughout the iterative process, this vector is continuously refined by the correction network and fed back to the SR network to reconstruct images closer to HR. This approach allows greater tolerance when the estimated degradation kernel deviates from the real-world degradation, thereby improving robustness in real-world scenarios.
The motivation behind this work is twofold. Firstly, we jointly optimize the image reconstruction and kernel correction networks. This end-to-end training is desirable for mitigating the accumulation of errors from preceding stages in sequential cascading methods. Secondly, we control the uncertainty in the degradation kernel estimation process through SR image reconstruction features, thereby enhancing the authenticity of the kernel correction network.
The main contributions of this article are summarized as follows:
(1) We design a deep learning SR network (RFKCNext) with degradation kernel correction for multi-degradation SR of remote sensing images. Unlike methods that learn a latent representation of the degradation kernel directly from LR images, RFKCNext utilizes features from the SR images to correct the estimated degradation kernels. This enables the kernel correction network to better capture the deviation between the estimated and real-world degradation kernels, thereby improving the accuracy of degradation kernel estimation and the quality of the final reconstructed images.
(2) We design a CNB-based SR reconstruction subnetwork module (SRConvNext) and a reconstruction features-guided kernel correction subnetwork module (RFGKCorrector) to form RFKCNext. The introduction of CNBs addresses the limitation of traditional CNN structures in globally modeling the features of remote sensing images. To our knowledge, we are the first to utilize CNBs to construct a network for multi-degradation SR reconstruction in remote sensing images.
(3) Extensive experiments are conducted on the NWPU-RESISC45 dataset, the UCMERCED dataset, and the real-world remote sensing dataset provided by the “Tianzhi Cup” Artificial Intelligence Challenge. The qualitative and quantitative experimental results indicate that our method outperforms other methods, demonstrating the effectiveness of the designed method.
The remaining sections of this article are organized as follows: Section 2 provides a concise overview of related work. Section 3 offers a detailed explanation of the proposed methodology. Section 4 presents experimental results. A discussion of the experimental results is presented in Section 5. The conclusions are provided in Section 6.

2. Related Work

2.1. CNN-Based Multi-Degradation SR Methods

Over the past decade, with the rapid development of DL, numerous SR networks designed under the bicubic degradation assumption [13,14,16,26,27,38,39,40,41] have emerged, achieving impressive results and laying a solid foundation for the advancement of multi-degradation SR methods. DL-based multi-degradation SR approaches have since made notable progress.
Zhang et al. [42] developed a Super-Resolution for Multiple Degradations (SRMD) network. They utilized principal component analysis (PCA) to reduce the dimensionality of the blur kernel features into a vector, which was concatenated with a noise vector. This concatenated vector was stretched into a degradation map matching the height and width of the LR image for training. Xu et al. [43] introduced the Unified Dynamic Convolutional Network for Variational Degradations (UDVD). This framework comprised a feature extraction network and a refinement network, the latter enhancing performance through dynamic convolutions. Zhang et al. [44] decomposed the degradation model via the half-quadratic splitting (HQS) algorithm into two decoupled terms: a data term and a prior term. They designed a deep Unfolding Super-Resolution Network (USRNet) consisting of a data module, a prior module, and a hyperparameter module to handle image SR reconstruction under various degradations. Zhang et al. [45], within the maximum a posteriori (MAP) framework, expanded the energy function to decouple the degradation model into deblurring and down-sampling image denoising. They designed the Deep Plug-and-play SR (DPSR) network, which employs CNNs during MAP optimization iterations to address these two issues, thereby achieving multi-degradation SR reconstruction. Liu et al. [46] proposed a degradation-aware self-attention-based Transformer model (DSAT). This network employed a CNN-mixed Transformer module for feature extraction and incorporated an attention mechanism to integrate latent degradation information, enabling the Transformer to adapt to unknown degradations during the learning process. Zhang et al. [47] designed a network combining kernel estimation and structural prior knowledge (KESPKNet). This network integrated kernel estimation with structural prior knowledge to reconstruct textures with high self-similarity. By designing a Global Texture Fusion Block (GTFB) to merge local and global textures, the network provided supplementary information for SR images.
Traditional CNN-based multi-degradation SR methods achieve good results in natural image reconstruction. However, remote sensing images differ significantly due to their complex spatial information distribution, multi-scale targets, and abundant scene details. These characteristics necessitate global feature modeling by the network. Traditional CNN approaches are limited by the local feature extraction of convolutional operations, which makes long-range feature modeling difficult. Although ViT utilizes self-attention mechanisms to capture more extensive and global relationships, its complex self-attention computations increase computational costs, and its huge number of model parameters leads to training difficulties [28,29,30]. Furthermore, ViT relies heavily on large-scale datasets during training to compensate for the absence of the inherent inductive biases found in CNNs [31,32,33,34]. Therefore, it is imperative to build a network from advanced CNN structures suited to the characteristics of remote sensing images, achieving global feature modeling while keeping computational costs manageable.

2.2. Multi-Degradation SR Methods for Remote Sensing Images

Thanks to the outstanding performance of SR networks on natural images, many researchers have applied these networks to enhance the resolution of remote sensing images under the bicubic degradation assumption [11,48,49,50,51,52]. While these methods have improved the quality of reconstructed images, complex interference from imaging devices and environmental conditions exposes remote sensing images to multiple degradations, so the above methods are inevitably limited in practical applications. Therefore, addressing multi-degradation SR remains crucial for remote sensing images, and several existing studies aim to tackle this issue.
Zhang et al. [10] estimated blur kernels and noise from real-world remote sensing images to synthesize a real-world training dataset. They designed a residual balanced attention network with a modified UNet discriminator (RBAN-UNet) to achieve SR reconstruction under real-world degradation conditions. Zhang et al. [36] designed an unsupervised network for handling multi-degradation. This network comprised a Degradation Network (D) and a Generative Network (G). The SR images reconstructed by G are fed into D to generate “fake” LR images, compared against the input LR images to assess the authenticity of the generated SR images. Kang et al. [53] proposed a novel Multilayer Degradation Representation-Guided Blind SR method. This approach utilized a contrastive learning framework to obtain degraded representations with different blur kernels from LR images. The degraded representations are employed to guide the extraction of high-order features at different scales, thereby enhancing the quality of the SR images. Dong et al. [54] proposed a degradation model incorporating estimated blur kernels from real-world images and kernels generated from predefined distributions to synthesize a real-world training dataset. They employed a kernel-aware network (KANet) to achieve multi-degradation SR reconstruction. Zhao et al. [55] utilized a generative adversarial network to develop a blur kernel extraction network, which employs internal information from real-world LR remote sensing images to estimate blur kernels. Xiao et al. [56] proposed a self-supervised degradation-guided adaptive network (DRSR), which utilizes contrastive learning to achieve adaptive representations of degradation. Additionally, they designed a dual-wise feature modulation network to convert features and channel dimensions, thereby mapping LR features to the desired domain for reconstruction. The designed multi-degradation remote sensing SR network enhanced feature extraction capabilities by incorporating densely connected mechanisms and multi-scale feature extraction blocks.
While significant progress has been made in multi-degradation SR for remote sensing, current methods typically involve two consecutive steps. First, networks are designed to learn directly from LR remote sensing images, aiming to approximate real-world degradation kernels or to use the learned kernels to synthesize “real-world” training datasets. Second, SR networks are designed to reconstruct HR images from the learned degradation kernel features or the expanded training datasets. However, these approaches do not consider utilizing information from SR images to correct the generated degradation kernels, which would enhance the robustness of the network against deviations from real-world conditions. Furthermore, these two-step solutions typically train the two networks independently. When errors occur in the degradation kernel estimation, SR reconstruction performance degrades significantly [57].
Therefore, inspired by the experimental results in [37], to address this issue, we utilize features from SR reconstructed images to correct the estimated degradation kernel. We propose an end-to-end SR reconstruction network coupled with a kernel correction network. Through iterative optimization, we gradually reduce errors in the estimated degradation kernels, thereby enhancing the reconstruction capability of the SR network. Our approach achieves kernel learning solely through designated multi-degradation operations during training without using learned degradation kernels to generate additional training datasets.

2.3. The Method of Kernel Correction

The degradation processes in the real world are diverse and complex. Most existing multi-degradation SR methods address these challenges by learning latent representations of the degradation kernels. The obtained degradation features mitigate the interference caused by degradation factors and enhance the performance of the network in reconstructing real-world LR images. These methods handle various degradation factors effectively but cannot cover all possible degradation scenarios. Consequently, when the estimated degradation kernel deviates significantly from the actual kernel, their performance can deteriorate substantially. Thus, it is crucial to develop effective techniques for kernel correction based on degradation features to better align with real-world conditions. Several kernel correction approaches have been investigated.
Gu et al. [37] observed that SR results exhibit over-smoothing or over-sharpening when the estimated kernel utilized for SR mismatches the real kernel. Based on this observation, they proposed the Iterative Kernel Correction (IKC) network to achieve more ideal SR results by correcting the estimated kernel. Inspired by [37], Luo et al. [57] proposed a Deep Alternating Network (DAN). This network consisted of two CNN modules, Restorer and Estimator, connected end-to-end for SR reconstruction and degradation kernel prediction. Yan et al. [58] proposed a kernel-guided network for real-world blind super-resolution (KGSR), which consists of a downscaling generator and an upscaling generator. By incorporating an orientation prior mechanism within the discriminator of the downscaling generator, the network ensured that the learned kernels adhere to the degradation process of real scenarios. This approach provided the upscaling generator with accurate blur kernels, thereby facilitating the generation of high-quality images. Ates et al. [59] proposed an end-to-end trainable iterative kernel reconstruction network (IKR-Net) for blind super-resolution, based on Zhang’s work. The network comprised a kernel initialization module, a kernel reconstruction module, a noise estimator module, and an SR reconstruction module. The kernel reconstruction module employed HQS to decompose the kernel estimation into two sub-modules: the non-trainable module Dk and the kernel denoising module Pk. Dk is used for reconstructing the updated kernel, while Pk applies regularization to the reconstructed kernel, enabling iterative refinement of the estimated kernels. Zhou et al. [60] proposed an unsupervised method to learn correction filtering (kernel) for blind single-image super-resolution in a spatially variant way. The network utilized a linearly assembled pixel degradation-adaptive regression module (DARM) to adjust the degradation of LR images to match known degradations. DARM was optimized by using a dictionary of multiple predefined kernel bases, enabling accurate learning of correction kernel and enhancing the network’s adaptability to complex unknown degradations.

3. Methodology

In this section, we provide a detailed description of the proposed RFKCNext. We begin with an introduction to the degradation formulation, followed by an overview of the RFKCNext framework. Next, we introduce the SR network module SRConvNext and the kernel correction network module RFGKCorrector. Finally, we present the loss functions utilized during training.

3.1. Degradation Formulation

Before introducing RFKCNext, we first present the degradation formulation. The process of obtaining LR images from HR images through multi-degradation is expressed in Equation (1):

$I_{LR} = (I_{HR} \otimes k) \downarrow_{\gamma} + n$   (1)

where $I_{HR}$ represents the HR image, $I_{LR}$ represents the LR image, $k$ denotes the blur kernel, $\otimes$ denotes the convolution operation, $\downarrow_{\gamma}$ signifies the down-sampling operation with a scale factor of $\gamma$, and $n$ represents additive white Gaussian noise (AWGN) with a standard deviation of $\sigma$ (noise level).
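For illustration, the following is a minimal PyTorch sketch of the degradation model in Equation (1); the function name, the reflect padding, and the assumption that images lie in [0, 1] (so the noise level σ is divided by 255) are ours, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, k: torch.Tensor, scale: int, sigma: float) -> torch.Tensor:
    """I_LR = (I_HR ⊗ k) ↓γ + n for an HR tensor of shape (B, 3, H, W) and a normalized m×m kernel k."""
    c = hr.shape[1]
    kernel = k.expand(c, 1, *k.shape)                        # one copy of k per channel (depthwise blur)
    pad = k.shape[-1] // 2
    blurred = F.conv2d(F.pad(hr, [pad] * 4, mode="reflect"), kernel, groups=c)               # I_HR ⊗ k
    lr = F.interpolate(blurred, scale_factor=1 / scale, mode="bicubic", align_corners=False)  # ↓γ
    return lr + torch.randn_like(lr) * (sigma / 255.0)       # + AWGN with standard deviation σ
```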

3.2. Network Architecture

The overall framework of RFKCNext is illustrated in Figure 1. RFKCNext reconstructs high-quality images by the SRConvNext module and corrects the degradation kernel with the RFGKCorrector module. These two subnetwork modules are executed alternately in each loop iterative optimization, forming an end-to-end trainable network.
We employ the initialization method proposed in [25] to generate the initial blur kernel $k_0 \in \mathbb{R}^{m \times m}$ (where $m$ denotes the size of the blur kernel) and vectorize it into $\alpha_0 \in \mathbb{R}^{m^2 \times 1}$. Subsequently, PCA is applied to obtain the $q$-dimensional blur kernel vector $\bar{\alpha}_0 \in \mathbb{R}^{q \times 1}$. Finally, this vector is concatenated with the noise level to produce the initial dimension-reduced degradation kernel vector $\beta \in \mathbb{R}^{(q+1) \times 1}$. The LR image is derived from the HR image according to Equation (1).
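As a sketch of this initialization, the snippet below builds β from a blur kernel and a noise level, assuming a PCA projection matrix learned from a pool of vectorized training kernels; the helper names are illustrative, not taken from the paper.

```python
import numpy as np

def fit_pca_projection(kernel_pool: np.ndarray, q: int) -> np.ndarray:
    """kernel_pool: (N, m*m) matrix of vectorized training kernels; returns the (q, m*m) projection."""
    centered = kernel_pool - kernel_pool.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:q]                                    # top-q principal directions

def kernel_to_beta(k: np.ndarray, sigma: float, P: np.ndarray) -> np.ndarray:
    alpha = k.reshape(-1)                            # vectorize k0 ∈ R^{m×m} into α0 ∈ R^{m²}
    alpha_bar = P @ alpha                            # PCA projection to q dimensions
    return np.concatenate([alpha_bar, [sigma]])      # β ∈ R^{q+1}: kernel code plus noise level
```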
When the generated degradation kernel deviates from the real-world kernel, the resulting SR image tends to be overly smooth or overly sharp [23], increasing its difference from the HR image. Consequently, the reconstruction features of the SR image inherently contain information about the degradation kernel deviation. As the estimated degradation kernel approaches the real-world kernel, the SR image becomes closer to the HR image. Based on this phenomenon, we input the reconstruction features and the LR image into the RFGKCorrector, which corrects the estimated degradation kernel vector for use in SRConvNext during the subsequent loop iteration.
By employing a loop iterative approach to alternately optimize the two modules, the submodules can better exploit each other’s information to refine their respective network parameters. Compared to independently training the two networks and simply sequentially connecting them, RFKCNext mitigates the cumulative errors at each stage of the pipeline, thereby enhancing the quality of the reconstructed image.
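A minimal sketch of this alternating loop is given below, assuming `sr_net` (SRConvNext) maps an LR image and a kernel vector to an SR image and `corrector` (RFGKCorrector) maps the SR and LR images to a corrected kernel vector; the function and argument names are placeholders.

```python
def rfkcnext_forward(lr, beta0, sr_net, corrector, t: int = 4):
    """Alternate SR reconstruction and kernel correction for t loop iterations."""
    beta, sr = beta0, None
    for _ in range(t):
        sr = sr_net(lr, beta)        # reconstruct SR with the current degradation kernel vector
        beta = corrector(sr, lr)     # correct the kernel vector from the reconstruction features
    return sr, beta
```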

3.3. Super-Resolution Network (SRConvNext)

SRConvNext consists of three stages: shallow feature extraction, deep feature extraction, and high-quality image reconstruction, as illustrated in Figure 2. Initially, let $I_{LR} \in \mathbb{R}^{H \times W \times 3}$ (with $H$, $W$, and 3 representing the height, width, and number of channels of the LR image, respectively) be the input LR image. The dimension-reduced degradation kernel vector $\beta \in \mathbb{R}^{(q+1) \times 1}$ is stretched to form a degradation feature map $M \in \mathbb{R}^{H \times W \times (q+1)}$. $M$ is concatenated with $I_{LR}$ along the channel dimension to form $I_0 \in \mathbb{R}^{H \times W \times (q+4)}$, which is fed into the shallow feature extraction stage to obtain the shallow features $F_0 \in \mathbb{R}^{H \times W \times C}$. This process can be expressed as:

$F_0 = H_{Conv_{k3s1p1}}(I_0)$   (2)

where $H_{Conv_{k3s1p1}}(\cdot)$ denotes a $3 \times 3$ convolution with a stride of 1 and a padding of 1, and $C$ represents the number of feature channels.
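A minimal PyTorch sketch of this stage is shown below, assuming an LR batch of shape (B, 3, H, W) and a kernel vector of shape (B, q + 1); the module name is illustrative.

```python
import torch
import torch.nn as nn

class ShallowFeature(nn.Module):
    def __init__(self, q: int, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(3 + q + 1, channels, kernel_size=3, stride=1, padding=1)

    def forward(self, lr: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        b, _, h, w = lr.shape
        m = beta.view(b, -1, 1, 1).expand(-1, -1, h, w)   # stretch β into the degradation map M
        i0 = torch.cat([lr, m], dim=1)                    # concatenate M with I_LR along channels
        return self.conv(i0)                              # F_0 = H_Conv_k3s1p1(I_0), Equation (2)
```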
Subsequently, $F_0$ is fed into the CNB, the structure of which is illustrated in Figure 3. The feature extraction process can be described as follows:

$F_{out} = H_{Linear}(H_{GELU}(H_{Linear}(H_{LN}(H_{DConv_{k7s1p3}}(F_{in}))))) + F_{in}$   (3)

where $H_{DConv_{k7s1p3}}(\cdot)$ denotes the $7 \times 7$ depthwise convolution with a stride of 1 and padding of 3, $H_{LN}(\cdot)$ represents LayerNorm, $H_{Linear}(\cdot)$ is the linear layer, $H_{GELU}(\cdot)$ refers to the GELU activation function, and $F_{in}$ and $F_{out}$ are the intermediate input and output features, respectively.
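The sketch below implements a CNB following Equation (3) and the standard ConvNext layout (7 × 7 depthwise convolution, channels-last LayerNorm, two linear layers with GELU, and a residual connection); the 4× expansion ratio of the hidden linear layer is an assumption rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

class CNB(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, stride=1, padding=3, groups=dim)  # H_DConv_k7s1p3
        self.norm = nn.LayerNorm(dim)                    # H_LN, applied over the channel dimension
        self.fc1 = nn.Linear(dim, expansion * dim)       # H_Linear
        self.act = nn.GELU()                             # H_GELU
        self.fc2 = nn.Linear(expansion * dim, dim)       # H_Linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.dwconv(x)
        y = y.permute(0, 2, 3, 1)                        # (B, C, H, W) -> (B, H, W, C) for LayerNorm/Linear
        y = self.fc2(self.act(self.fc1(self.norm(y))))
        return y.permute(0, 3, 1, 2) + x                 # residual connection: ... + F_in
```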
For deep feature extraction, multiple CNBs are cascaded to obtain $F_i \in \mathbb{R}^{H \times W \times 2C}$. This process is expressed as:

$F_i = H_{CNB_i}(F_{i-1}), \quad i = 1, 2, \ldots, b$   (4)

where $H_{CNB_i}(\cdot)$ represents the $i$-th CNB, $F_i$ denotes the features output by the $i$-th CNB, $F_{i-1}$ indicates the features output by the preceding CNB, and $b$ is the total number of CNBs.
Finally, after the high-quality image reconstruction stage, the SR image $I_{SR} \in \mathbb{R}^{\gamma H \times \gamma W \times 3}$ is generated, where $\gamma$ represents the scale factor. This process is represented as:

$I_{SR} = H_{Conv_{k3s1p1}}(H_{Up}(H_{LN}(H_{Conv_{k3s1p1}}(F_i))))$   (5)

where $H_{Up}(\cdot)$ denotes PixelShuffle up-sampling.
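A sketch of this stage is given below; the first 3 × 3 convolution is assumed to expand the features to C·γ² channels so that PixelShuffle can upsample by the scale factor, which is a common design choice rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

class Reconstructor(nn.Module):
    def __init__(self, channels: int, scale: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels * scale ** 2, 3, 1, 1)
        self.norm = nn.LayerNorm(channels * scale ** 2)          # H_LN over the channel dimension
        self.up = nn.PixelShuffle(scale)                         # H_Up
        self.conv2 = nn.Conv2d(channels, 3, 3, 1, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        y = self.conv1(f)
        y = self.norm(y.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return self.conv2(self.up(y))                            # I_SR, Equation (5)
```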

3.4. Reconstruction Features-Guided Kernel Corrector (RFGKCorrector)

The overall structure of RFGKCorrector is shown in Figure 2. The input $I_{SR}$ is first processed through a $3 \times 3$ convolution with a stride of $\gamma$ and padding of 1 to extract the reconstruction features $F_{rf} \in \mathbb{R}^{H \times W \times C}$. This process is expressed as:

$F_{rf} = H_{Conv_{k3s\gamma p1}}(I_{SR})$   (6)

$I_{LR}$ is passed through $H_{Conv_{k3s1p1}}(\cdot)$ to obtain $F_{LR} \in \mathbb{R}^{H \times W \times C}$. The structure of the CNB in RFGKCorrector is the same as in SRConvNext, but the input features are the concatenation of the previous-level features with $F_{rf}$ along the channel dimension. Finally, a global average pooling operation is performed to obtain the corrected dimension-reduced degradation kernel vector.
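The sketch below illustrates this corrector, reusing the CNB module sketched in Section 3.3; the 1 × 1 fusion convolutions and the final linear projection from the pooled features to the (q + 1)-dimensional vector are our assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn

class RFGKCorrector(nn.Module):
    def __init__(self, channels: int, scale: int, q: int, num_blocks: int = 6):
        super().__init__()
        self.sr_conv = nn.Conv2d(3, channels, 3, stride=scale, padding=1)   # F_rf from I_SR, Equation (6)
        self.lr_conv = nn.Conv2d(3, channels, 3, stride=1, padding=1)       # F_LR from I_LR
        self.fuse = nn.ModuleList([nn.Conv2d(2 * channels, channels, 1) for _ in range(num_blocks)])
        self.blocks = nn.ModuleList([CNB(channels) for _ in range(num_blocks)])  # CNB from the Section 3.3 sketch
        self.head = nn.Linear(channels, q + 1)                              # assumed projection to β

    def forward(self, sr: torch.Tensor, lr: torch.Tensor) -> torch.Tensor:
        f_rf = self.sr_conv(sr)
        f = self.lr_conv(lr)
        for fuse, block in zip(self.fuse, self.blocks):
            f = block(fuse(torch.cat([f, f_rf], dim=1)))   # concatenate F_rf with the previous-level features
        return self.head(f.mean(dim=(2, 3)))               # global average pooling -> corrected β
```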

3.5. Loss Function

For SRConvNext, the proposed deep network is trained with the $L_1$ loss function, which is expressed as:

$L_{SR} = \frac{1}{n} \sum_{i=1}^{n} \| F(y_i) - x_i \|_1$   (7)

where $n$ represents the total number of training samples, $x_i$ and $y_i$ denote the $i$-th pair of HR and LR image patches, respectively, and $F(y_i)$ represents the SR image generated by the network from $y_i$.
For RFGKCorrector, we employ the $L_1$ loss between the estimated degradation kernel $\hat{K}$ and the ground-truth (GT) kernel $K$:

$L_{K} = \frac{1}{n} \sum_{i=1}^{n} \| \hat{K}_i - K_i \|_1$   (8)

where $\hat{K}_i$ represents the $i$-th estimated degradation kernel, and $K_i$ is the corresponding GT kernel.
The total loss is described as:

$L_{total} = L_{K} + L_{SR}$   (9)
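A minimal sketch of this objective is shown below, with both terms weighted equally as in Equation (9); the function name is illustrative.

```python
import torch.nn.functional as F

def total_loss(sr, hr, k_est, k_gt):
    l_sr = F.l1_loss(sr, hr)        # L_SR, Equation (7)
    l_k = F.l1_loss(k_est, k_gt)    # L_K, Equation (8)
    return l_k + l_sr               # L_total, Equation (9)
```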

4. Experiment

4.1. Datasets and Metrics

Our experiments utilized two widely used remote sensing datasets, NWPU-RESISC45 [61] and UCMERCED [62], and a dataset from real-world remote sensing scenarios provided by the “Tianzhi Cup” Artificial Intelligence Challenge.
The NWPU-RESISC45 remote sensing dataset consists of 45 classes of remote sensing scene data, with each class containing 700 images, totaling 31,500 images of size 256 × 256 RGB and spatial resolutions ranging from 0.2 to 30 m. These images from Google Earth are selected from more than 100 countries and regions. The 45 scenario categories are as follows: airplane, airport, baseball diamond, basketball court, beach, bridge, chaparral, church, circular farmland, cloud, commercial area, dense residential, desert, forest, freeway, golf course, ground track field, harbor, industrial area, intersection, island, lake, meadow, medium residential, mobile home park, mountain, overpass, palace, parking lot, railway, railway station, rectangular farmland, river, roundabout, runway, sea ice, ship, snowberg, sparse residential, stadium, storage tank, tennis court, terrace, thermal power station, and wetland.
The UCMERCED remote sensing dataset comprises 21 classes, each comprising 100 images, resulting in 2100 images of size 256 × 256 RGB and a spatial resolution of approximately 0.3 m. These are USGS aerial images from 21 U.S. regions. The 21 classes are as follows: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis courts.
The real-world remote sensing scenarios dataset is derived from the “Tianzhi Cup” Artificial Intelligence Challenge organized jointly by the Beijing Remote Sensing Information Institute and the Artificial Intelligence Innovation Research Institute of the Chinese Academy of Sciences, specifically from the visible light aircraft intelligent detection and recognition track. This dataset comprises 611 images, with 308 images in the training dataset, 122 in the validation dataset, and 181 in the testing dataset (not available for download). It includes 11 classes of objects, with approximately 13,000 aircraft samples. Each image has a size of 4096 × 4096, with a spatial resolution of approximately 0.5 m. This dataset can be obtained from https://rsaicp.com/portal/dataList (accessed on 27 May 2021).
In summary, the NWPU-RESISC45 dataset contains the broadest range of categories, covering all kinds of real-life scenes. The 21 classes in the UCMERCED dataset are also basically included in the NWPU-RESISC45 categories. Although the “Tianzhi Cup” dataset is intended for the detection and recognition of aircraft types and has a large image size of 4096 × 4096 pixels, its images contain not only aircraft targets but also other scenes, such as airports, dense residential areas, industrial areas, and runways, which are also included in the NWPU-RESISC45 categories. The specific differences are shown in Table 1.
However, the image types and coverage areas of the three datasets differ, so images of the same scene category from different datasets are also different: there is only scene-level similarity between images and no overlap between the image data. In addition, all three datasets are captured over real areas, so the remote sensing images inherently contain the original degradation factors of real scenes.
Therefore, we selected the NWPU-RESISC45 dataset as the training data, and the UCMERCED and “Tianzhi Cup” datasets as the testing data. At the same time, we applied Gaussian blur kernels, noise, and down-sampling operations to the NWPU-RESISC45 dataset to synthesize degraded remote sensing images and further simulate the degradation process. We randomly selected 100 images from each class for the training dataset and 10 for the testing dataset. Thus, the training dataset comprises 4500 images, while the testing dataset includes 450 images. To ensure the authenticity of the experimental results, there is no overlap between the training and testing datasets.
For evaluating the results on the NWPU-RESISC45 synthetic dataset, we utilized peak signal-to-noise ratio (PSNR) [63] and structural similarity index (SSIM) [64] as the experimental evaluation metrics. PSNR and SSIM were computed on the Y channel in the YCbCr color space [53]. As for evaluating the results on the UCMERCED remote sensing and real-world remote sensing datasets, due to the unavailability of HR images, we employed the natural image quality evaluator (NIQE) [65] from the no-reference image quality assessment metrics for evaluation purposes.
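As a sketch of this evaluation protocol, the snippet below computes PSNR on the Y channel for 8-bit RGB images; the BT.601 luma weights are a common convention and an assumption here, and SSIM would be computed analogously on the same channel.

```python
import numpy as np

def psnr_y(img1: np.ndarray, img2: np.ndarray) -> float:
    """PSNR on the luminance (Y) channel of two 8-bit RGB images of equal size."""
    def to_y(img):
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        return 0.299 * r + 0.587 * g + 0.114 * b           # BT.601 luma (assumed convention)
    mse = np.mean((to_y(img1.astype(np.float64)) - to_y(img2.astype(np.float64))) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```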

4.2. Experimental Settings

In this experiment, we focused only on the ×2 and ×4 scale factors. During the training phase, we degraded the NWPU-RESISC45 image data by applying anisotropic Gaussian blur, bicubic interpolation down-sampling, and AWGN to generate LR images. This degradation process follows Equation (1). The size of the anisotropic Gaussian blur kernel was fixed at $15 \times 15$, with the lengths $\lambda_1$ and $\lambda_2$ of the two axes uniformly distributed in [0.1, 5] and random rotation angles $\theta$ uniformly distributed in [0, $\pi$]. When $\lambda_1 = \lambda_2$, the kernel becomes an isotropic Gaussian kernel. AWGN is represented by a standard deviation $\sigma$ (noise level) set within [0, 25]. Following the setting in [20], we utilized PCA to reduce the blur kernel to a 15-dimensional vector, which is concatenated with the noise level to form a 16-dimensional vector. During the testing phase, we conducted experiments using one isotropic Gaussian kernel and four anisotropic Gaussian kernels, as depicted in Figure 4, where $k_1: [\lambda_1 = 2.5, \lambda_2 = 2.5, \theta = 0]$, $k_2: [\lambda_1 = 3.8, \lambda_2 = 1.8, \theta = \frac{7}{10}\pi]$, $k_3: [\lambda_1 = 1.4, \lambda_2 = 2.7, \theta = \frac{3}{10}\pi]$, $k_4: [\lambda_1 = 0.5, \lambda_2 = 2.3, \theta = \frac{3}{5}\pi]$, and $k_5: [\lambda_1 = 2.2, \lambda_2 = 3.2, \theta = \frac{8}{9}\pi]$. The noise level $\sigma$ was set to [0, 5, 10], so that there are $5 \times 3 = 15$ degradation combinations of Gaussian blur kernels and noise levels.
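For reference, a minimal sketch of generating such an anisotropic Gaussian kernel from (λ1, λ2, θ) on a 15 × 15 grid is shown below; treating λ1 and λ2 as the standard deviations along the two principal axes is our interpretation of the axis lengths.

```python
import numpy as np

def anisotropic_gaussian_kernel(lam1: float, lam2: float, theta: float, size: int = 15) -> np.ndarray:
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    cov = rot @ np.diag([lam1 ** 2, lam2 ** 2]) @ rot.T       # covariance from axis lengths and rotation
    inv = np.linalg.inv(cov)
    coords = np.arange(size) - size // 2
    xx, yy = np.meshgrid(coords, coords)
    xy = np.stack([xx, yy], axis=-1)                          # (size, size, 2) grid of offsets
    k = np.exp(-0.5 * np.einsum("...i,ij,...j->...", xy, inv, xy))
    return k / k.sum()                                        # normalize so the kernel sums to 1
```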
During training, we randomly cropped HR images into $64 \times 64$ patches. The corresponding LR image patches at the ×2 and ×4 scale factors were $32 \times 32$ and $16 \times 16$, respectively. We also applied random rotations of 90°, 180°, and 270° and horizontal flips for data augmentation.
We employed the Adam optimizer [66] to train our model, with parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$. The batch size was 32 for the ×2 scale factor and 128 for the ×4 scale factor. The total number of training epochs was 2000, with an initial learning rate of $10^{-4}$ that was halved every $2 \times 10^5$ iterations. SRConvNext utilized 30 CNBs, while RFGKCorrector utilized 6 CNBs; the number of loops for alternating optimization between the two networks was fixed at $t = 4$. All experiments were conducted using the PyTorch framework version 1.11 on a workstation with an i9-10900X CPU, 64 GB of RAM, and an NVIDIA RTX 3090 GPU.
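A small sketch of this optimizer configuration is given below; stepping the scheduler once per iteration is assumed, and `model` stands in for the RFKCNext network.

```python
import torch

def build_optimizer(model: torch.nn.Module):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    # Halve the learning rate every 2 x 10^5 iterations (scheduler stepped per iteration).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200_000, gamma=0.5)
    return optimizer, scheduler
```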

4.3. Experiments on NWPU-RESISC45 Synthetic Images

We compared our proposed method with IKC [37], SRMD [42], USRNet [44], DAN [57], DRSR [56], DSAT [46], and KESPKNet [47]. Table 2 and Table 3 present the objective metrics of model parameters, running time, PSNR, and SSIM for the above networks under ×2 and ×4 down-sampling scale factors. It is worth noting that the running time in the tables refers to the average running time on the testing dataset generated under combinations of the five Gaussian blur kernels k1–k5 and the corresponding noise levels. The best results are highlighted in red, while the second-best results are highlighted in blue. It is evident that our proposed method achieves the highest PSNR in all 15 degradation combinations.

4.3.1. Quantitative Results

For the ×2 scale factor, as shown in Table 2, our proposed method improves the average PSNR over the second-ranked KESPKNet by 0.225 dB, 0.118 dB, and 0.106 dB at the three noise levels, respectively. Compared to USRNet, which has the minimum model parameters, our method increases average PSNR by 1.966 dB, 0.444 dB, and 0.266 dB. Compared to SRMD, which has the minimum runtime, our method achieves an average increase of 0.715 dB, 0.286 dB, and 0.218 dB in PSNR, while the average runtime only increases by approximately 30 ms.
For the ×4 scale factor, as shown in Table 3, our method achieves average PSNR improvements of 0.111 dB, 0.100 dB, and 0.111 dB over the second-ranked KESPKNet. Compared to USRNet, our method increases average PSNR by 0.282 dB, 0.143 dB, and 0.136 dB. Compared to SRMD, which has the minimum runtime, our method achieves an average increase of 0.257 dB, 0.175 dB, and 0.133 dB in PSNR.
Although our method is not optimal in terms of model parameters and running time, it is noteworthy that, compared to SRMD and USRNet, our approach achieves superior SR performance. This demonstrates that the increase in model complexity and running time is acceptable.

4.3.2. Qualitative Results

Figure 5 and Figure 6 display the qualitative results of different methods under ×2 and ×4 scale factors, respectively. For visual convenience, we mark the area to be enlarged on the left HR image with a red box, and the enlarged image of the cropped region is displayed in the top-right or bottom-right corner of the HR image. We crop the same region from the LR image to illustrate the impact of different degradations on the HR image. The magnified patches of the LR image and the SR images reconstructed by various methods are displayed on the right.
For the ×2 scale factor, visual quality decreases with increasing noise levels across all reconstruction methods. In Figure 5, “railway_009”, “parking_lot_554”, and “parking_lot_665”, our approach yields sharper car reconstructions compared to others, maintaining car shapes well even under high noise conditions. In “ship_081”, our method consistently preserves and restores the boundaries of circular areas across different noise levels. “harbor_368” demonstrates our method’s capability to reconstruct more authentic and detailed texture features.
For the ×4 scale factor, it is evident that smaller scaling factors pose greater challenges. The loss of image information on LR images is more severe, as evidenced by the enlarged images of the corresponding LR regions. To mitigate noise interference, all reconstructed images from various methods tend to be smoother, and this smoothing becomes more pronounced as the noise level increases. However, our proposed method still performs well compared to others. In the “harbor_409” image shown in Figure 6, our method can recover more details on the boats. In “ground_track_field_131” and “industrial_area_694” images, our method maintains the shape of the architectural areas even under high noise conditions and reconstructs sharp edges. As for “stadium_152” and “church_305”, our method can restore the building textures more clearly.
Therefore, quantitative and qualitative experiments on the NWPU-RESISC45 synthetic dataset demonstrate that our method effectively mitigates noise and yields visually satisfactory results.

4.4. Experiments on UCMERCED Remote Sensing Images

We also evaluated all approaches on the UCMERCED dataset. We randomly selected 10 images from each category, totaling 210 images for testing. The training dataset contains no information from the UCMERCED dataset. Experiments were conducted directly on the UCMERCED images rather than on down-sampled images. Therefore, for the ×2 and ×4 scale factors, the corresponding reconstructed image resolutions are 512 × 512 and 1024 × 1024 pixels, respectively. Due to the absence of HR reference images, we employed the no-reference metric NIQE for quantitative evaluation, where lower scores indicate better perceptual quality.

Qualitative Results

Table 4 presents NIQE results for different methods at ×2 and ×4 scale factors. Quantitative analysis shows that our proposed method achieves the lowest NIQE scores for both scaling factors, demonstrating superior performance.
Figure 7 and Figure 8 depict the qualitative results of different methods at ×2 and ×4 scale factors, respectively. Figure 7 shows “freeway81” at the ×2 scale factor; our approach generates the sharpest car edges and authentically restored textures on the front and rear windows. Figure 8 shows “buildings22” at the ×4 scale factor; our method successfully reconstructs clear signage details even after enlarging the rooftop area to 1024 × 1024 pixels. These results indicate that our method has good effectiveness and generalization compared to other methods on other public datasets.

4.5. Experiments on Real-World Remote Sensing Images

To further evaluate the SR performance of the proposed method on real-world degraded remote sensing images, we conducted experiments on the dataset from the “Tianzhi Cup” visible light image aircraft intelligent detection and recognition track. We randomly selected two images from its given training and validation datasets for testing. Given the original image resolution of 4096 × 4096 pixels, we cropped regions of interest into small 256 × 256 pixel patches. Subsequently, we reconstructed patches into 512 × 512 pixels and 1024 × 1024 pixels at ×2 and ×4 scale factors, respectively. None of the training data for any methods included information from this dataset, and NIQE was employed as a quantitative evaluation metric.

Qualitative Results

Table 5 presents the evaluation results of all methods at both scale factors. It is evident that our approach achieved the lowest NIQE score compared to the other methods. Figure 9 and Figure 10 depict the visual outcomes of all methods at these scale factors. The regions of interest in the original image on the left are marked with a red box, and the corresponding reconstructed image is on the right. Figure 9 shows image “105” at the ×2 scale factor; our method produced clearer aircraft edges compared to others. Figure 10 shows image “174” at the ×4 scale factor; our method recovered more texture details in the lettering. This experiment demonstrates that our approach achieves satisfactory SR performance on real-world remote sensing images.

4.6. Ablation Studies

To validate the effectiveness of RFGKCorrector, we conducted ablation experiments. The training dataset, testing dataset, and degradation methods utilized in this experiment were consistent with Section 4.3. First, we performed SR reconstruction using SRConvNext alone under the 15 degradation combinations. Second, RFGKCorrector was introduced to form RFKCNext for SR reconstruction.
Table 6 and Table 7 display the PSNR and SSIM values for scale factors ×2 and ×4, with the most effective results highlighted in red. Table 6 shows that at the ×2 scale factor, RFKCNext improved average PSNR by 0.184 dB, 0.165 dB, and 0.125 dB compared to SRConvNext. At the ×4 scale factor, as shown in Table 7, RFKCNext improved average PSNR by 0.152 dB, 0.100 dB, and 0.088 dB. Thus, considering both scale factors, RFKCNext demonstrated superior SR performance across different noise levels and blur kernels. This indicates that, compared to employing only SRConvNext for reconstruction, introducing RFGKCorrector effectively utilizes the reconstructed features for kernel correction, bringing the estimated kernel closer to the real-world scenario and yielding better reconstruction results.

5. Discussion

In this section, we will further discuss the impact of the proposed RFKCNext.
(1) Comparison with other methods: The experimental results in Section 4.3 show that the proposed RFKCNext restores finer details compared to SRMD, USRNet, IKC, and DAN. As the noise level gradually increases, all methods tend to smooth the reconstructed images to reduce the impact of noise. In contrast, our method reconstructs results that effectively preserve the shapes of objects while performing well in restoring image details and edges. The quantitative and qualitative analyses in Section 4.4 and Section 4.5 indicate that, when reconstructing unknown degraded real-world remote sensing data, RFKCNext does not introduce invalid textures, appearing more natural compared to the above methods.
(2) The impact of model parameter quantity and running time: Table 2 and Table 3 show quantitative results, indicating that our method achieves the best SR performance at scale factors ×2 and ×4. Compared with the second-ranked KESPKNet, our method improves average PSNR by 0.225 dB, 0.118 dB, and 0.106 dB for the ×2 scale factor. For the ×4 scale factor, the improvements are 0.111 dB, 0.100 dB, and 0.111 dB, respectively. Compared to USRNet, which has the minimum model parameters, our method increases average PSNR by 1.966 dB, 0.444 dB, and 0.266 dB for the ×2 scale factor. When the scale factor is ×4, the average PSNR increases by 0.282 dB, 0.143 dB, and 0.136 dB, respectively. Furthermore, RFKCNext reduces model parameters by 19.05 M and 19.06 M compared to KESPKNet at the ×2 and ×4 scale factors, respectively. Therefore, considering the performance improvements, the increase in model parameters and runtime of our method is acceptable.
(3) The impact of the kernel correction network: In RFKCNext, the subnetwork SRConvNext reconstructs the SR image based on the LR image information and the estimated degradation kernels, while the kernel correction subnetwork RFGKCorrector corrects the estimated kernels utilizing the LR image information and the SR reconstruction features, so the two complement each other. As indicated by the results in Table 6 and Table 7, the SR performance using SRConvNext alone is inferior to that achieved by jointly employing RFGKCorrector. This demonstrates that through loop iterative optimization, the kernels corrected by RFGKCorrector gradually approximate the real-world scenario, thereby enhancing the quality of the images reconstructed by SRConvNext.
(4) Limitations of the method: First, as the noise level increases, our method effectively reconstructs sharp edges, but the images still tend to become smoother. Second, under down-sampling at lower scale factors, the restored images maintain the original shapes of objects but may incur texture losses. Lastly, the robustness of the network to more complex degradation factors, such as motion blur and other types of noise, should also be considered. Addressing these limitations and challenges will refine the proposed method and is left as our future work.

6. Conclusions

In this paper, we propose RFKCNext, a multi-degradation SR reconstruction network for remote sensing images. Unlike methods that learn degradation kernel representations directly from LR images, RFKCNext also employs features from SR reconstructed images to correct the degradation kernel. The network comprises two sub-networks, SRConvNext and RFGKCorrector, each using ConvNext Blocks (CNBs) to extract global features. These sub-networks are optimized in a loop alternation manner to form an end-to-end trainable network. SRConvNext reconstructs SR images utilizing LR images and estimated kernels. The reconstruction features guide RFGKCorrector to correct the estimated degradation kernel so that it approximates real-world scenarios. Both subnetworks achieve significant improvement during the loop process. Therefore, even though the degradation kernels are unknown, RFGKCorrector provides increasingly accurate kernel estimations to SRConvNext, enhancing its capability to reconstruct image details. Extensive experiments on synthetic remote sensing datasets and ablation studies indicate that the proposed method achieves the best SR results, particularly in high-noise conditions. Furthermore, experiments on other public remote sensing datasets and a real-world dataset demonstrate superior reconstruction performance, proving the robustness and practicality of the method.

Author Contributions

Conceptualization, Y.Q.; formal analysis, H.N.; methodology, J.W.; investigation, J.L.; supervision, M.Z.; visualization, H.L.; data curation, J.S.; funding acquisition, J.W.; software, Y.Q.; validation, Y.Q.; writing—original draft, Y.Q.; writing—review and editing, Q.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Science and Technology Department of Jilin Province of China under Grant number 20210201137GX.

Data Availability Statement

The data of experimental images used to support the findings of this research are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, X.; Yi, J.; Guo, J.; Song, Y.; Lyu, J.; Xu, J.; Yan, W.; Zhao, J.; Cai, Q.; Min, H. A Review of Image Super-Resolution Approaches Based on Deep Learning and Applications in Remote Sensing. Remote Sens. 2022, 14, 5423. [Google Scholar] [CrossRef]
  2. Huang, L.; An, R.; Zhao, S.; Jiang, T.; Hu, H. A Deep Learning-Based Robust Change Detection Approach for Very High Resolution Remotely Sensed Images with Multiple Features. Remote Sens. 2020, 12, 1441. [Google Scholar] [CrossRef]
  3. Tang, X.; Zhang, H.; Mou, L.; Liu, F.; Zhang, X.; Xiang, X.; Zhu, X.; Jiao, L. An Unsupervised Remote Sensing Change Detection Method Based on Multiscale Graph Convolutional Network and Metric Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5609715. [Google Scholar] [CrossRef]
  4. Li, X.; Yong, X.; Li, T.; Tong, Y.; Gao, H.; Wang, X.; Xu, Z.; Fang, Y.; You, Q.; Lyu, X. A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images. Remote Sens. 2024, 16, 1214. [Google Scholar] [CrossRef]
  5. Chen, X.; Li, D.; Liu, M.; Jia, J. CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation. Remote Sens. 2023, 15, 4455. [Google Scholar] [CrossRef]
  6. Rabbi, J.; Ray, N.; Schubert, M.; Chowdhury, S.; Chao, D. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Remote Sens. 2020, 12, 1432. [Google Scholar] [CrossRef]
  7. Liu, C.; Zhang, S.; Hu, M.; Song, Q. Object Detection in Remote Sensing Images Based on Adaptive Multi-Scale Feature Fusion Method. Remote Sens. 2024, 16, 907. [Google Scholar] [CrossRef]
  8. Shi, J.; Liu, W.; Shan, H.; Li, E.; Li, X.; Zhang, L. Remote Sensing Scene Classification Based on Multibranch Fusion Attention Network. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3001505. [Google Scholar] [CrossRef]
  9. Wang, G.; Zhang, N.; Liu, W.; Chen, H.; Xie, Y. MFST: A Multi-Level Fusion Network for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6516005. [Google Scholar] [CrossRef]
  10. Zhang, J.; Xu, T.; Li, J.; Jiang, S.; Zhang, Y. Single-Image Super Resolution of Remote Sensing Images with Real-world Degradation Modeling. Remote Sens. 2022, 14, 2895. [Google Scholar] [CrossRef]
  11. Huang, B.; Guo, Z.; Wu, L.; He, B.; Li, X.; Lin, Y. Pyramid Information Distillation Attention Network for Super-Resolution Reconstruction of Remote Sensing Images. Remote Sens. 2021, 13, 5143. [Google Scholar] [CrossRef]
  12. Dong, C.; Loy, C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  13. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  14. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  15. Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4799–4807. [Google Scholar]
  16. Li, J.; Du, S.; Wu, C.; Leng, Y.; Song, R.; Li, Y. Drcr net: Dense residual channel re-calibration network with non-local purification for spectral super resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1259–1268. [Google Scholar]
  17. Larsson, G.; Maire, M.; Shakhnarovich, G. FractalNet: Ultra-Deep Neural Networks without Residuals. arXiv 2016, arXiv:1605.07648. [Google Scholar]
  18. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.S.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292. [Google Scholar] [CrossRef]
  19. Cheng, K.; Shen, Y.; Dinov, I.D. Applications of Deep Neural Networks with Fractal Structure and Attention Blocks for 2D and 3D Brain Tumor Segmentation. J. Stat. Theory Pract. 2024, 18, 31. [Google Scholar] [CrossRef]
  20. Ding, C.; Chen, Y.; Algarni, A.M.; Zhang, G.; Peng, H. Application of fractal neural network in network security situation awareness. Fractals. 2022, 30, 2240090. [Google Scholar] [CrossRef]
  21. Anil, B.C.; Dayananda, P. Automatic liver tumor segmentation based on multi-level deep convolutional networks and fractal residual network. IETE J. Res. 2023, 69, 1925–1933. [Google Scholar] [CrossRef]
  22. Ding, S.; Gao, Z.; Wang, J.; Lu, M.; Shi, J. Fractal graph convolutional network with MLP-mixer based multi-path feature fusion for classification of histopathological images. Expert Syst. Appl. 2023, 212, 118793. [Google Scholar] [CrossRef]
  23. Song, X.; Liu, W.; Liang, L.; Shi, W.; Xie, G.; Lu, X.; Hei, X. Image super-resolution with multi-scale fractal residual attention network. Comput. Graph. 2023, 113, 21–31. [Google Scholar] [CrossRef]
  24. Feng, X.; Li, X.; Li, J. Multi-scale fractal residual network for image super-resolution. Appl. Intell. 2021, 51, 1845–1856. [Google Scholar] [CrossRef]
  25. Zhou, Y.; Dong, J.; Yang, Y. Deep fractal residual network for fast and accurate single image super resolution. Neurocomputing 2020, 398, 389–398. [Google Scholar] [CrossRef]
  26. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  27. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
  28. Wu, D.; Li, H.; Hou, Y.; Xu, C.; Cheng, G.; Guo, L.; Liu, H. Spatial–Channel Attention Transformer with Pseudo Regions for Remote Sensing Image-Text Retrieval. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4704115. [Google Scholar] [CrossRef]
  29. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
  30. Wang, T.; Yuan, L.; Feng, J.; Yan, S. PnP-DETR: Towards Efficient Visual Analysis with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4641–4650. [Google Scholar]
  31. Dai, Z.; Liu, H.; Le, Q.; Tan, M. Coatnet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977. [Google Scholar]
  32. Liu, Y.; Zhang, Y.; Wang, Y.; Hou, F.; Yuan, J.; Tian, J.; Zhang, Y.; Shi, Z.; Fan, J.; He, Z. A Survey of Visual Transformers. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 7478–7498. [Google Scholar] [CrossRef] [PubMed]
  33. Jamil, S.; Piran, M.J.; Kwon, O.-J. A Comprehensive Survey of Transformers for Computer Vision. Drones 2023, 7, 287. [Google Scholar] [CrossRef]
  34. Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do vision transformers see like convolutional neural networks? Adv. Neural Inf. Process. Syst. 2021, 34, 12116–12128. [Google Scholar]
  35. Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976. [Google Scholar]
  36. Zhang, N.; Wang, Y.; Zhang, X.; Xu, D.; Wang, X.; Ben, G.; Zhao, Z.; Li, Z. A Multi-Degradation Aided Method for Unsupervised Remote Sensing Image Super Resolution with Convolution Neural Networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5600814. [Google Scholar] [CrossRef]
  37. Gu, J.; Lu, H.; Zuo, W.; Dong, C. Blind Super-Resolution with Iterative Kernel Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1604–1613. [Google Scholar]
  38. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar]
  39. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1664–1673. [Google Scholar]
  40. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 294–310. [Google Scholar]
  41. Zhou, Y.; Li, Z.; Guo, C.-L.; Bai, S.; Cheng, M.-M.; Hou, Q. SRFormer: Permuted Self-Attention for Single Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12734–12745. [Google Scholar]
  42. Zhang, K.; Zuo, W.; Zhang, L. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3262–3271. [Google Scholar]
  43. Xu, Y.; Tseng, S.; Tseng, Y.; Kuo, H.; Tsai, Y. Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12493–12502. [Google Scholar]
  44. Zhang, K.; Van Gool, L.; Timofte, R. Deep Unfolding Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3214–3223. [Google Scholar]
  45. Zhang, K.; Zuo, W.; Zhang, L. Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1671–1681. [Google Scholar]
  46. Liu, Q.; Gao, P.; Han, K.; Liu, N.; Xiang, W. Degradation-aware self-attention based transformer for blind image super-resolution. IEEE Trans. Multimed. 2024, 26, 7516–7528. [Google Scholar] [CrossRef]
  47. Zhang, J.; Zhou, Y.; Bi, J.; Xue, Y.; Deng, W.; He, W.; Zhao, T.; Sun, K.; Tong, T.; Gao, Q.; et al. A blind image super-resolution network guided by kernel estimation and structural prior knowledge. Sci. Rep. 2024, 14, 9525. [Google Scholar] [CrossRef]
  48. Zhang, W.; Tan, Z.; Lv, Q.; Li, J.; Zhu, B.; Liu, Y. An Efficient Hybrid CNN-Transformer Approach for Remote Sensing Super-Resolution. Remote Sens. 2024, 16, 880. [Google Scholar] [CrossRef]
  49. Wang, Y.; Shao, Z.; Lu, T.; Huang, X.; Wang, J.; Chen, X.; Huang, H.; Zuo, X. Remote Sensing Image Super-Resolution via Multi-Scale Texture Transfer Network. Remote Sens. 2023, 15, 5503. [Google Scholar] [CrossRef]
  50. Yue, X.; Chen, X.; Zhang, W.; Ma, H.; Wang, L.; Zhang, J.; Wang, M.; Jiang, B. Super-Resolution Network for Remote Sensing Images via Preclassification and Deep–Shallow Features Fusion. Remote Sens. 2022, 14, 925. [Google Scholar] [CrossRef]
  51. Wang, Y.; Zhao, L.; Liu, L.; Hu, H.; Tao, W. URNet: A U-Shaped Residual Network for Lightweight Image Super-Resolution. Remote Sens. 2021, 13, 3848. [Google Scholar] [CrossRef]
  52. Xiong, Y.; Guo, S.; Chen, J.; Deng, X.; Sun, L.; Zheng, X.; Xu, W. Improved SRGAN for Remote Sensing Image Super-Resolution Across Locations and Sensors. Remote Sens. 2020, 12, 1263. [Google Scholar] [CrossRef]
  53. Kang, X.; Li, J.; Duan, P.; Ma, F.; Li, S. Multilayer Degradation Representation-Guided Blind Super-Resolution for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5534612. [Google Scholar] [CrossRef]
  54. Dong, R.; Mou, L.; Zhang, L.; Fu, H.; Zhu, X. Real-world remote sensing image super-resolution via a practical degradation model and a kernel-aware network. ISPRS J. Photogramm. Remote Sens. 2022, 191, 155–170. [Google Scholar] [CrossRef]
  55. Zhao, Z.; Ren, C.; Teng, Q.; He, X. A practical super-resolution method for multi-degradation remote sensing images with deep convolutional neural networks. J. Real-Time Image Process. 2022, 19, 1139–1154. [Google Scholar] [CrossRef]
  56. Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Wang, Y.; Zhang, L. From degrade to upgrade: Learning a self-supervised degradation guided adaptive network for blind remote sensing image super-resolution. Inf. Fusion 2023, 96, 297–311. [Google Scholar] [CrossRef]
  57. Luo, Z.; Huang, Y.; Li, S.; Wang, L.; Tan, T. Unfolding the alternating optimization for blind super resolution. Adv. Neural Inf. Process. Syst. 2020, 33, 5632–5643. [Google Scholar]
  58. Yan, Q.; Niu, A.; Wang, C.; Dong, W.; Woźniak, M.; Zhang, Y. KGSR: A kernel guided network for real-world blind super-resolution. Pattern Recognit. 2024, 147, 110095. [Google Scholar] [CrossRef]
  59. Ates, H.F.; Yildirim, S.; Gunturk, B.K. Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation. Comput. Vis. Image Underst. 2023, 233, 103718. [Google Scholar] [CrossRef]
  60. Zhou, H.; Zhu, X.; Zhu, J.; Han, Z.; Zhang, S.; Qin, J.; Yin, X. Learning Correction Filter via Degradation-Adaptive Regression for Blind Single Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12331–12341. [Google Scholar]
  61. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
  62. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279. [Google Scholar]
  63. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  64. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2366–2369. [Google Scholar]
  65. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  66. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Figure 1. The overall framework of RFKCNext. SRConvNext and RFGKCorrector are connected end-to-end. SRConvNext is utilized to reconstruct SR images, while RFGKCorrector estimates corrected degradation kernel vectors using the LR images and the reconstruction features. The two subnetworks are alternately optimized for t loops.
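To make the alternation in Figure 1 concrete, the following minimal PyTorch-style sketch outlines the loop. The module interfaces, the initial kernel estimate, and the loop count are assumptions for illustration only and are not the authors' released implementation.

```python
import torch

def rfkcnext_forward(lr, sr_net, corrector, kernel_init, t_loops=4):
    """Sketch of the alternating SR / kernel-correction loop in Figure 1.

    lr:          low-resolution input batch, shape (B, 3, h, w)
    sr_net:      SRConvNext-like module: (lr, kernel_vec) -> SR image
    corrector:   RFGKCorrector-like module: (lr, sr) -> corrected kernel vector
                 (assumed to extract reconstruction features from the SR image internally)
    kernel_init: initial degradation-kernel vector estimate, shape (B, d)
    t_loops:     number of alternations (the "t loops" in the caption); value assumed
    """
    kernel_vec = kernel_init
    sr = None
    for _ in range(t_loops):
        # 1) reconstruct an SR image under the current kernel estimate
        sr = sr_net(lr, kernel_vec)
        # 2) correct the kernel estimate using the freshly reconstructed SR image
        kernel_vec = corrector(lr, sr)
    return sr, kernel_vec
```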
Figure 2. The overall network structure of our network (RFKCNext). 1. SRConvNext consists of three stages: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. It takes as input the concatenated features of the LR image and degradation kernel maps along the channel dimension and outputs the SR image. 2. Reconstructed features are extracted from the SR image using convolutional operations. 3. The input of the RFGKCorrector is the LR image and the reconstruction features of the SR image, which are utilized to generate the estimated degradation kernel vector. SRConvNext and RFGKCorrector are alternately connected to form an end-to-end network.
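As an illustration of the channel-wise concatenation described in the caption, the sketch below stretches a per-image degradation-kernel vector into spatial maps and concatenates it with the LR image. The tensor shapes and the kernel dimensionality d are assumptions for the sketch.

```python
import torch

def concat_lr_and_kernel_maps(lr, kernel_vec):
    """Stretch a per-image kernel vector into spatial maps and concatenate with the LR image.

    lr:          (B, 3, h, w) low-resolution images
    kernel_vec:  (B, d) degradation-kernel representation (e.g., a reduced blur-kernel vector)
    returns:     (B, 3 + d, h, w) tensor fed to the SR reconstruction subnetwork
    """
    b, d = kernel_vec.shape
    _, _, h, w = lr.shape
    # replicate each kernel coefficient over the full spatial grid
    kernel_maps = kernel_vec.view(b, d, 1, 1).expand(b, d, h, w)
    return torch.cat([lr, kernel_maps], dim=1)
```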
Figure 3. The network architecture of the ConvNext Block (CNB).
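Figure 3 adopts the ConvNeXt block design of Liu et al. [35]: a 7 × 7 depthwise convolution, LayerNorm, and a pointwise inverted-bottleneck MLP with GELU, wrapped in a residual connection. A minimal PyTorch sketch of such a block is shown below; the expansion ratio of 4 follows the original ConvNeXt paper, and details such as layer scale or stochastic depth are omitted and may differ from the authors' CNB.

```python
import torch
import torch.nn as nn

class ConvNextBlock(nn.Module):
    """Minimal ConvNeXt-style block: 7x7 depthwise conv -> LayerNorm -> 1x1 expand -> GELU -> 1x1 project, plus residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise spatial mixing
        self.norm = nn.LayerNorm(dim)                    # normalizes the channel dim in channels-last layout
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # pointwise (1x1) expansion as a Linear layer
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # pointwise (1x1) projection back to dim

    def forward(self, x):                                # x: (B, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                        # to channels-last (B, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                        # back to channels-first
        return shortcut + x
```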
Figure 4. The five Gaussian blur kernels for the testing phase, where k1: [λ1 = 2.5, λ2 = 2.5, θ = 0], k2: [λ1 = 3.8, λ2 = 1.8, θ = 7π/10], k3: [λ1 = 1.4, λ2 = 2.7, θ = 3π/10], k4: [λ1 = 0.5, λ2 = 2.3, θ = 3π/5], and k5: [λ1 = 2.2, λ2 = 3.2, θ = 8π/9].
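The kernels in Figure 4 are anisotropic Gaussians parameterized by two eigenvalues (λ1, λ2) and a rotation angle θ. The NumPy sketch below generates such a kernel by building a rotated covariance matrix; treating λ1 and λ2 as the variances along the principal axes and using a 21 × 21 support are both assumptions for illustration, not settings confirmed by the paper.

```python
import numpy as np

def anisotropic_gaussian_kernel(lam1, lam2, theta, size=21):
    """Build a normalized anisotropic Gaussian blur kernel from eigenvalues and a rotation angle."""
    # Covariance matrix: Sigma = R(theta) diag(lam1, lam2) R(theta)^T
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    sigma = rot @ np.diag([lam1, lam2]) @ rot.T
    inv_sigma = np.linalg.inv(sigma)

    # Evaluate exp(-0.5 * x^T Sigma^{-1} x) on a centered pixel grid
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([xs, ys], axis=-1)                  # (size, size, 2)
    quad = np.einsum('...i,ij,...j->...', coords, inv_sigma, coords)
    kernel = np.exp(-0.5 * quad)
    return kernel / kernel.sum()

# e.g., kernel k2 from the caption: lam1 = 3.8, lam2 = 1.8, theta = 7*pi/10
k2 = anisotropic_gaussian_kernel(3.8, 1.8, 7 * np.pi / 10)
```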
Figure 5. The visual comparisons of different methods under the ×2 scale factor, Gaussian blur kernels k1–k5, and various noise levels on the NWPU-RESISC45 dataset. The HR image is depicted on the left. To enhance detail clarity, enlarged regions are displayed in the top-right or bottom-right corners of the HR image, highlighted by a red box. The corresponding LR images and the reconstructions of the enlarged regions by different methods are displayed on the right.
Figure 6. The visual comparisons of different methods under the ×4 scale factor, Gaussian blur kernels k1–k5, and various noise levels on the NWPU-RESISC45 dataset. The HR image is depicted on the left. To enhance detail clarity, enlarged regions are displayed in the top-right or bottom-right corners of the HR image, highlighted by a red box. The corresponding LR images and the reconstructions of the enlarged regions by different methods are displayed on the right.
Figure 7. The visual comparisons of experimental results from different methods at the ×2 scale factor. “freeway81” is selected from the UCMERCED dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the bottom-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 8. The visual comparisons of experimental results from different methods at the ×4 scale factor. “buildings22” is selected from the UCMERCED dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the bottom-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 9. The visual comparisons of experimental results from different methods at the ×2 scale factor. “105” is selected from the real-world remote sensing dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the top-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 10. The visual comparisons of experimental results from different methods at the ×4 scale factor. “174” is selected from the real-world remote sensing dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the top-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Table 1. The comparison of differences among three remote sensing datasets: NWPU-RESISC45, UCMERCED, and “Tianzhi Cup”.

| Dataset | Number of Scene Classes | Number of Images | Image Size (Pixels) | Spatial Resolution (m) | Image Type | Coverage Area |
| --- | --- | --- | --- | --- | --- | --- |
| NWPU-RESISC45 | 45 | 31,500 | 256 × 256 | 0.2–30 | satellite and aerial images | more than 100 countries and regions |
| UCMERCED | 21 | 2100 | 256 × 256 | 0.3 | aerial images | 21 regions in the United States |
| Tianzhi Cup | aircraft, dense residential, runway, and other scenes | 430 | 4096 × 4096 | 0.5–1 | satellite images | not mentioned in the dataset description |
Table 2. The quantitative results on the NWPU-RESISC45 dataset for model parameters, running time, PSNR (dB), and SSIM under the ×2 down-sampling scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10]. The running time is the average running time on the testing dataset generated by all degradation combinations. The best results are highlighted in red, the second-best in blue, and ↑ indicates that higher values are preferable.

| Model | Noise | Param | Running Time (ms/image) | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRMD | 0 | 1.51 M | 1.284 | 32.958/0.899 | 32.265/0.888 | 32.751/0.896 | 32.431/0.889 | 32.641/0.893 | 32.809/0.893 |
| USRNet | 0 | 0.81 M | 23.177 | 31.539/0.860 | 31.182/0.855 | 31.660/0.868 | 32.013/0.877 | 31.395/0.858 | 31.558/0.858 |
| IKC | 0 | 9.02 M | 97.907 | 32.900/0.890 | 31.310/0.873 | 32.422/0.888 | 32.046/0.878 | 32.608/0.886 | 32.257/0.883 |
| DAN | 0 | 4.65 M | 113.937 | 33.329/0.902 | 33.011/0.894 | 33.217/0.910 | 33.287/0.902 | 33.167/0.903 | 33.202/0.902 |
| DRSR | 0 | 5.59 M | 26.347 | 33.310/0.910 | 33.026/0.899 | 33.329/0.907 | 33.305/0.904 | 33.177/0.901 | 33.229/0.904 |
| DSAT | 0 | 15.50 M | 140.251 | 33.400/0.906 | 33.078/0.904 | 33.360/0.903 | 33.329/0.910 | 33.171/0.899 | 33.268/0.904 |
| KESPKNet | 0 | 21.83 M | 119.631 | 33.413/0.908 | 33.112/0.904 | 33.390/0.908 | 33.370/0.901 | 33.212/0.903 | 33.299/0.905 |
| Ours | 0 | 2.78 M | 7.342 | 33.752/0.913 | 33.261/0.906 | 33.723/0.913 | 33.556/0.910 | 33.326/0.908 | 33.524/0.910 |
| SRMD | 5 | 1.51 M | 1.282 | 29.810/0.790 | 29.609/0.788 | 30.135/0.802 | 30.618/0.823 | 29.658/0.791 | 29.966/0.799 |
| USRNet | 5 | 0.81 M | 23.232 | 29.621/0.784 | 29.425/0.777 | 29.974/0.799 | 30.551/0.822 | 29.469/0.778 | 29.808/0.792 |
| IKC | 5 | 9.02 M | 96.969 | 29.822/0.795 | 29.272/0.787 | 29.944/0.807 | 30.433/0.823 | 29.645/0.790 | 29.823/0.800 |
| DAN | 5 | 4.65 M | 115.089 | 29.863/0.792 | 29.722/0.785 | 30.319/0.808 | 30.894/0.837 | 29.739/0.786 | 30.107/0.802 |
| DRSR | 5 | 5.59 M | 25.752 | 29.870/0.800 | 29.755/0.788 | 30.302/0.811 | 30.794/0.832 | 29.767/0.788 | 30.098/0.804 |
| DSAT | 5 | 15.50 M | 136.542 | 29.879/0.798 | 29.760/0.789 | 30.310/0.814 | 30.861/0.828 | 29.775/0.789 | 30.117/0.804 |
| KESPKNet | 5 | 21.83 M | 114.694 | 29.900/0.794 | 29.747/0.792 | 30.326/0.809 | 30.908/0.833 | 29.791/0.791 | 30.134/0.804 |
| Ours | 5 | 2.78 M | 7.612 | 30.090/0.801 | 29.833/0.793 | 30.455/0.815 | 31.010/0.837 | 29.870/0.794 | 30.252/0.808 |
| SRMD | 10 | 1.51 M | 1.283 | 28.804/0.747 | 28.640/0.746 | 29.065/0.758 | 29.538/0.780 | 28.676/0.743 | 28.945/0.755 |
| USRNet | 10 | 0.81 M | 22.964 | 28.738/0.746 | 28.576/0.739 | 29.032/0.760 | 29.523/0.782 | 28.615/0.740 | 28.897/0.753 |
| IKC | 10 | 9.02 M | 97.491 | 28.798/0.752 | 28.512/0.746 | 28.977/0.764 | 29.418/0.781 | 28.673/0.747 | 28.876/0.758 |
| DAN | 10 | 4.65 M | 114.425 | 28.848/0.750 | 28.700/0.745 | 29.159/0.765 | 29.683/0.787 | 28.770/0.748 | 29.032/0.759 |
| DRSR | 10 | 5.59 M | 26.058 | 28.867/0.750 | 28.709/0.744 | 29.165/0.764 | 29.687/0.791 | 28.753/0.748 | 29.036/0.759 |
| DSAT | 10 | 15.50 M | 139.972 | 28.869/0.753 | 28.721/0.746 | 29.172/0.764 | 29.690/0.789 | 28.773/0.744 | 29.045/0.759 |
| KESPKNet | 10 | 21.83 M | 117.972 | 28.883/0.752 | 28.731/0.745 | 29.181/0.767 | 29.709/0.791 | 28.782/0.749 | 29.057/0.761 |
| Ours | 10 | 2.78 M | 7.542 | 29.015/0.757 | 28.820/0.750 | 29.318/0.771 | 29.808/0.794 | 28.856/0.752 | 29.163/0.765 |
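As a hedged illustration of the degradation pipeline behind Tables 2 and 3 (Gaussian blur with one of k1–k5, bicubic down-sampling by the scale factor, and additive Gaussian noise with σ ∈ {0, 5, 10}), the NumPy/OpenCV sketch below shows one plausible way such test LR images could be synthesized. The exact resampling, border handling, and noise conventions used by the authors may differ, and the example file name is hypothetical.

```python
import cv2
import numpy as np

def synthesize_lr(hr, kernel, scale=2, sigma=5, seed=0):
    """Blur -> bicubic down-sample -> add Gaussian noise (sigma on the 0-255 intensity scale)."""
    # blur each channel with the given 2D kernel
    blurred = cv2.filter2D(hr.astype(np.float32), -1, kernel, borderType=cv2.BORDER_REFLECT)
    h, w = blurred.shape[:2]
    # bicubic down-sampling by the scale factor
    lr = cv2.resize(blurred, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    # additive white Gaussian noise
    rng = np.random.default_rng(seed)
    lr = lr + rng.normal(0.0, sigma, size=lr.shape)
    return np.clip(lr, 0, 255).astype(np.uint8)

# e.g., hr = cv2.imread('nwpu_example.png'); lr = synthesize_lr(hr, k2, scale=4, sigma=10)
```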
Table 3. The quantitative results on the NWPU-RESISC45 dataset for model parameters, running time, PSNR (dB), and SSIM under the ×4 down-sampling scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10]. The running time is the average running time on the testing dataset generated by all degradation combinations. The best results are highlighted in red, the second-best in blue, and ↑ indicates that higher values are preferable.

| Model | Noise | Param | Running Time (ms/image) | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRMD | 0 | 1.55 M | 1.162 | 28.663/0.746 | 28.553/0.743 | 28.625/0.745 | 28.610/0.745 | 28.644/0.746 | 28.619/0.745 |
| USRNet | 0 | 0.81 M | 22.982 | 28.612/0.743 | 28.574/0.742 | 28.610/0.743 | 28.605/0.743 | 28.569/0.743 | 28.594/0.743 |
| IKC | 0 | 9.17 M | 96.656 | 28.625/0.741 | 28.546/0.734 | 28.633/0.738 | 28.711/0.745 | 28.594/0.736 | 28.622/0.741 |
| DAN | 0 | 4.80 M | 112.472 | 28.777/0.747 | 28.707/0.744 | 28.796/0.748 | 28.592/0.745 | 28.788/0.750 | 28.732/0.747 |
| DRSR | 0 | 5.74 M | 25.841 | 28.743/0.746 | 28.673/0.747 | 28.764/0.749 | 28.678/0.751 | 28.763/0.752 | 28.724/0.749 |
| DSAT | 0 | 15.64 M | 132.822 | 28.786/0.752 | 28.708/0.745 | 28.790/0.751 | 28.682/0.749 | 28.782/0.753 | 28.750/0.750 |
| KESPKNet | 0 | 21.98 M | 110.469 | 28.809/0.751 | 28.718/0.749 | 28.798/0.753 | 28.702/0.749 | 28.796/0.751 | 28.765/0.751 |
| Ours | 0 | 2.92 M | 7.425 | 28.916/0.756 | 28.859/0.755 | 28.888/0.757 | 28.829/0.754 | 28.887/0.756 | 28.876/0.756 |
| SRMD | 5 | 1.55 M | 1.181 | 27.750/0.699 | 27.664/0.696 | 27.826/0.703 | 27.936/0.709 | 27.697/0.696 | 27.775/0.701 |
| USRNet | 5 | 0.81 M | 22.679 | 27.774/0.701 | 27.707/0.698 | 27.859/0.705 | 27.967/0.711 | 27.729/0.698 | 27.807/0.703 |
| IKC | 5 | 9.17 M | 96.053 | 27.715/0.696 | 27.613/0.689 | 27.831/0.701 | 28.010/0.712 | 27.627/0.689 | 27.759/0.697 |
| DAN | 5 | 4.80 M | 113.919 | 27.792/0.705 | 27.733/0.698 | 27.877/0.706 | 27.972/0.717 | 27.756/0.700 | 27.826/0.705 |
| DRSR | 5 | 5.74 M | 25.649 | 27.795/0.706 | 27.723/0.701 | 27.814/0.707 | 28.001/0.713 | 27.740/0.701 | 27.815/0.706 |
| DSAT | 5 | 15.64 M | 133.457 | 27.813/0.702 | 27.745/0.699 | 27.873/0.711 | 28.013/0.715 | 27.755/0.703 | 27.840/0.706 |
| KESPKNet | 5 | 21.98 M | 111.249 | 27.824/0.707 | 27.762/0.700 | 27.882/0.711 | 28.020/0.716 | 27.763/0.702 | 27.850/0.707 |
| Ours | 5 | 2.92 M | 7.427 | 27.925/0.709 | 27.844/0.705 | 28.007/0.713 | 28.105/0.719 | 27.867/0.706 | 27.950/0.710 |
| SRMD | 10 | 1.55 M | 1.157 | 26.988/0.663 | 26.910/0.660 | 27.079/0.669 | 27.219/0.676 | 26.937/0.661 | 27.027/0.666 |
| USRNet | 10 | 0.81 M | 22.730 | 27.031/0.666 | 26.906/0.661 | 27.061/0.671 | 27.197/0.679 | 26.925/0.658 | 27.024/0.667 |
| IKC | 10 | 9.17 M | 95.913 | 26.900/0.660 | 26.842/0.655 | 27.041/0.666 | 27.191/0.675 | 26.831/0.654 | 26.961/0.662 |
| DAN | 10 | 4.80 M | 112.247 | 27.014/0.670 | 26.906/0.664 | 27.086/0.673 | 27.256/0.683 | 26.905/0.666 | 27.033/0.671 |
| DRSR | 10 | 5.74 M | 25.725 | 26.981/0.669 | 26.882/0.665 | 27.083/0.675 | 27.157/0.679 | 26.917/0.663 | 27.004/0.670 |
| DSAT | 10 | 15.64 M | 134.343 | 27.009/0.671 | 26.903/0.663 | 27.098/0.671 | 27.252/0.682 | 26.926/0.665 | 27.038/0.670 |
| KESPKNet | 10 | 21.98 M | 112.032 | 27.033/0.670 | 26.914/0.666 | 27.107/0.674 | 27.260/0.685 | 26.933/0.667 | 27.049/0.672 |
| Ours | 10 | 2.92 M | 7.487 | 27.120/0.672 | 27.046/0.668 | 27.219/0.677 | 27.352/0.685 | 27.064/0.670 | 27.160/0.674 |
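The PSNR and SSIM values in Tables 2 and 3 are standard full-reference quality metrics [63,64]. The scikit-image sketch below shows how such scores might be computed for a reconstructed image; evaluating on RGB with data_range = 255 is an assumption, since the color space and crop conventions are not restated here, and channel_axis requires scikit-image 0.19 or later.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(hr, sr):
    """Compute PSNR (dB) and SSIM between a ground-truth HR image and an SR result (uint8 RGB arrays)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255, channel_axis=-1)
    return psnr, ssim
```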
Table 4. The NIQE quantitative results at the ×2 and ×4 scale factors on the UCMERCED dataset. The best results are highlighted in red, while the second-best are highlighted in blue. ↓ indicates that lower values are preferable.

| Model | Scale Factor | NIQE↓ |
| --- | --- | --- |
| SRMD | 2 | 22.184 |
| USRNet | 2 | 22.311 |
| IKC | 2 | 22.699 |
| DAN | 2 | 22.925 |
| DRSR | 2 | 22.794 |
| DSAT | 2 | 22.561 |
| KESPKNet | 2 | 22.762 |
| Ours | 2 | 20.787 |
| SRMD | 4 | 20.176 |
| USRNet | 4 | 19.388 |
| IKC | 4 | 18.913 |
| DAN | 4 | 19.348 |
| DRSR | 4 | 19.091 |
| DSAT | 4 | 19.511 |
| KESPKNet | 4 | 19.505 |
| Ours | 4 | 18.415 |
Table 5. The NIQE quantitative results at the ×2 and ×4 scale factors on the real-world remote sensing dataset. The best results are highlighted in red, while the second-best are highlighted in blue. ↓ indicates that lower values are preferable.

| Model | Scale Factor | NIQE↓ |
| --- | --- | --- |
| SRMD | 2 | 16.959 |
| USRNet | 2 | 17.831 |
| IKC | 2 | 21.541 |
| DAN | 2 | 23.032 |
| DRSR | 2 | 22.731 |
| DSAT | 2 | 21.736 |
| KESPKNet | 2 | 22.004 |
| Ours | 2 | 16.392 |
| SRMD | 4 | 17.715 |
| USRNet | 4 | 17.925 |
| IKC | 4 | 20.848 |
| DAN | 4 | 17.188 |
| DRSR | 4 | 18.321 |
| DSAT | 4 | 18.750 |
| KESPKNet | 4 | 18.034 |
| Ours | 4 | 16.691 |
Table 6. The quantitative results of PSNR (dB) and SSIM for ablation experiments using the ×2 scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10] on the NWPU-RESISC45 dataset to validate the effectiveness of the RFGKCorrector. SRConvNext combined with the RFGKCorrector forms RFKCNext. The best results are highlighted in red, and ↑ indicates that higher values are preferable.

| Model | Noise | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SRConvNext | 0 | 33.649/0.906 | 33.115/0.892 | 33.551/0.904 | 33.325/0.895 | 33.060/0.891 | 33.340/0.898 |
| RFKCNext | 0 | 33.752/0.913 | 33.261/0.906 | 33.723/0.913 | 33.556/0.910 | 33.326/0.908 | 33.524/0.910 |
| SRConvNext | 5 | 29.893/0.793 | 29.755/0.788 | 30.231/0.812 | 30.860/0.830 | 29.697/0.787 | 30.087/0.802 |
| RFKCNext | 5 | 30.090/0.801 | 29.833/0.793 | 30.455/0.815 | 31.010/0.837 | 29.870/0.794 | 30.252/0.808 |
| SRConvNext | 10 | 28.897/0.751 | 28.693/0.747 | 29.163/0.769 | 29.701/0.789 | 28.735/0.746 | 29.038/0.761 |
| RFKCNext | 10 | 29.015/0.757 | 28.820/0.750 | 29.318/0.771 | 29.808/0.794 | 28.856/0.752 | 29.163/0.765 |
Table 7. The quantitative results of PSNR (dB) and SSIM for ablation experiments using the ×4 scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10] on the NWPU-RESISC45 dataset to validate the effectiveness of the RFGKCorrector. SRConvNext combined with the RFGKCorrector forms RFKCNext. The best results are highlighted in red, and ↑ indicates that higher values are preferable.

| Model | Noise | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SRConvNext | 0 | 28.773/0.754 | 28.674/0.751 | 28.717/0.753 | 28.707/0.753 | 28.747/0.753 | 28.724/0.753 |
| RFKCNext | 0 | 28.916/0.756 | 28.859/0.755 | 28.888/0.757 | 28.829/0.754 | 28.887/0.756 | 28.876/0.756 |
| SRConvNext | 5 | 27.835/0.706 | 27.732/0.702 | 27.889/0.711 | 28.002/0.717 | 27.790/0.704 | 27.850/0.708 |
| RFKCNext | 5 | 27.925/0.709 | 27.844/0.705 | 28.007/0.713 | 28.105/0.719 | 27.867/0.706 | 27.950/0.710 |
| SRConvNext | 10 | 27.045/0.670 | 26.953/0.666 | 27.116/0.675 | 27.261/0.683 | 26.986/0.667 | 27.072/0.672 |
| RFKCNext | 10 | 27.120/0.672 | 27.046/0.668 | 27.219/0.677 | 27.352/0.685 | 27.064/0.670 | 27.160/0.674 |