Article

Multi-Degradation Super-Resolution Reconstruction for Remote Sensing Images with Reconstruction Features-Guided Kernel Correction

1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Satellite Information Intelligent Processing and Application Research Laboratory, Beijing 100192, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 2915; https://doi.org/10.3390/rs16162915
Submission received: 27 June 2024 / Revised: 5 August 2024 / Accepted: 8 August 2024 / Published: 9 August 2024

Abstract:
A variety of factors cause a reduction in remote sensing image resolution. Unlike super-resolution (SR) reconstruction methods based on a single degradation assumption, multi-degradation SR methods aim to learn the degradation kernel from low-resolution (LR) images and reconstruct high-resolution (HR) images, making them more suitable for restoring the resolution of remote sensing images. However, existing multi-degradation SR methods only utilize the given LR images to learn the representation of the degradation kernel. Mismatches between the estimated degradation kernel and the real-world degradation kernel lead to a significant deterioration in the performance of these methods. To address this issue, we design a reconstruction features-guided kernel correction SR network (RFKCNext) for multi-degradation SR reconstruction of remote sensing images. Specifically, the proposed network not only utilizes LR images to extract degradation kernel information but also employs features from SR images to correct the estimated degradation kernel, thereby enhancing its accuracy. RFKCNext utilizes the ConvNext Block (CNB) for global feature modeling and employs CNBs as fundamental units to construct the SR reconstruction subnetwork module (SRConvNext) and the reconstruction features-guided kernel correction network (RFGKCorrector). The SRConvNext reconstructs SR images based on the estimated degradation kernel. The RFGKCorrector corrects the estimated degradation kernel using reconstruction features from the generated SR images. The two networks iterate alternately, forming an end-to-end trainable network. More importantly, the SRConvNext utilizes the degradation kernel estimated by the RFGKCorrector for reconstruction, allowing the SRConvNext to perform well even if the degradation kernel deviates from the real-world scenario. In experimental terms, three noise levels and five Gaussian blur kernels are considered on the NWPU-RESISC45 remote sensing image dataset to synthesize degraded remote sensing images for training and testing. Compared to existing super-resolution methods, the experimental results demonstrate that our proposed approach achieves significant reconstruction advantages in both quantitative and qualitative evaluations. Additionally, the UCMERCED remote sensing dataset and the real-world remote sensing image dataset provided by the “Tianzhi Cup” Artificial Intelligence Challenge are utilized for further testing. Extensive experiments show that our method delivers more visually plausible results, demonstrating its potential for real-world application.


1. Introduction

As an important means of long-distance Earth observation, remote sensing technology is widely applied in meteorology, agriculture, geography, the military, and other fields [1]. Remote sensing images, especially high-resolution (HR) images, serve as the data carriers of remote sensing technology, accurately representing targets and their background information. They therefore play a crucial role in various remote sensing tasks such as change detection [2,3], semantic segmentation [4,5], object detection [6,7], and scene classification [8,9]. However, due to the influence of imaging environments and devices, remote sensing images are often acquired as low-resolution (LR) images, limiting their practical applications [10]. Image super-resolution (SR) reconstruction, as an image processing technique, effectively addresses this problem without changing the original imaging equipment. Therefore, research on the SR reconstruction of remote sensing images has received extensive attention.
Image SR aims to reconstruct HR images using the limited information available in LR images. As a low-level task in computer vision, it has received increasing attention in recent years. Currently, SR methods are broadly divided into three categories: interpolation-based, reconstruction-based, and learning-based [11]. With the rapid development of deep learning (DL), Dong et al. [12] pioneered the use of a convolutional neural network (CNN) for SR, significantly outperforming interpolation-based and reconstruction-based methods. This led to DL-based methods becoming the mainstream approach.
Network structures that excel in high-level computer vision tasks often inspire network design for SR, such as residual learning [13,14] and dense connections [15,16]. Fractal connection architectures based on fractal theory [17,18,19,20,21,22] have also been applied to SR [23,24,25]. In particular, the Vision Transformer (ViT) has surpassed and, in many cases, replaced traditional CNN structures and is frequently utilized in SR network design [26,27]. Although ViT’s capability for long-range modeling helps to extract global image features, the fully connected layers and matrix multiplication operations used in computing the self-attention mechanism result in high computational complexity [28]. Additionally, the vast number of model parameters contributes to increased computational costs [29,30]. Furthermore, ViT lacks the inherent inductive biases of a CNN, which hampers its generalization performance, especially when training data are insufficient [31,32,33,34]. Consequently, a large amount of data and computational resources are required to compensate for these limitations. To address these issues, ConvNext [35] adopts a more “modernized” structure that retains the simplicity and efficiency of a CNN while competing with Transformers in accuracy and scalability. Despite ConvNext demonstrating impressive performance in high-level vision tasks, few researchers have attempted to use it in SR tasks.
Due to factors such as lighting, atmospheric propagation, and sensor quantization, remote sensing images are affected by multiple degradations, including blur and noise [10]. Most current DL-based SR methods follow the simplistic assumption of bicubic down-sampling, which significantly degrades SR network performance when dealing with multiple degradations [36]. Therefore, it is crucial to consider multi-degradation when constructing SR networks.
Existing multi-degradation SR methods rely on the given LR image to learn degradation kernel features, which are then utilized to augment the training dataset or to directly perform SR reconstruction. When the estimated degradation kernel mismatches the real-world degradation, the reconstructed SR image tends to be overly smooth or overly sharp [37]. This indicates that the features of the reconstructed remote sensing SR image contain information about the deviation of the degradation kernel. Therefore, merely utilizing the LR image to learn degradation kernel features cannot provide effective feedback on this deviation, which makes it difficult to reasonably correct the estimated degradation kernel during training.
To address the aforementioned issues, this paper designs a ConvNext-based multi-degradation SR network for remote sensing images. This network utilizes the features of SR reconstructed images to correct the degradation kernel. The architecture comprises two main components: the SR reconstruction subnetwork module (SRConvNext) and the reconstruction features-guided kernel correction network (RFGKCorrector), both constructed with ConvNext Blocks (CNBs) as fundamental units. CNBs achieve global modeling of features, adapting to the abundant spatial information and cross-scale targets characteristic of remote sensing images. First, we initialize a dimension-reduced degradation kernel vector as an additional feature, which is input into the SR network along with the LR image. Second, the SR network transfers the reconstructed image features to the correction network, generating feedback to adjust the estimated degradation kernel. Then, we iteratively alternate between these two subnetworks to form an end-to-end trainable network. Notably, the degradation kernel vector we utilize is not the real-world degradation kernel. Throughout the iterative process, this vector is continuously refined by the correction network and fed back to the SR network to reconstruct images closer to HR. This approach allows greater tolerance when the estimated degradation kernel deviates from the real-world degradation, thereby improving robustness in real-world scenarios.
The motivation behind this work is twofold. Firstly, we jointly optimize the image reconstruction and kernel correction networks. This end-to-end training is desirable for mitigating the accumulation of errors from preceding stages in sequential cascading methods. Secondly, we control the uncertainty in the degradation kernel estimation process through SR image reconstruction features, thereby enhancing the authenticity of the kernel correction network.
The main contributions of this article are summarized as follows:
(1) We design a deep learning SR network (RFKCNext) with degradation kernel correction for multi-degradation SR of remote sensing images. Unlike methods that learn a latent representation of the degradation kernel directly from LR images, RFKCNext utilizes features from the SR images to correct the estimated degradation kernels. This enables the kernel correction network to better capture the deviation between the estimated and real-world degradation kernels, thereby improving the accuracy of degradation kernel estimation and the quality of the final reconstructed images.
(2) We design a CNB-based SR reconstruction subnetwork module (SRConvNext) and a reconstruction features-guided kernel correction subnetwork module (RFGKCorrector) to form RFKCNext. The introduction of CNBs addresses the limitation of traditional CNN structures in globally modeling the features of remote sensing images. To our knowledge, we are the first to utilize CNBs to construct a network for multi-degradation SR reconstruction in remote sensing images.
(3) Extensive experiments are conducted on the NWPU-RESISC45 dataset, the UCMERCED dataset, and the real-world remote sensing dataset provided by the “Tianzhi Cup” Artificial Intelligence Challenge. The qualitative and quantitative experimental results indicate that our method outperforms other methods, demonstrating the effectiveness of the designed method.
The remaining sections of this article are organized as follows: Section 2 provides a concise overview of related work. Section 3 offers a detailed explanation of the proposed methodology. Section 4 presents experimental results. A discussion of the experimental results is presented in Section 5. The conclusions are provided in Section 6.

2. Related Work

2.1. CNN-Based Multi-Degradation SR Methods

Over the past decade, with the rapid development of DL, numerous SR networks designed under the bicubic degradation assumption [13,14,16,26,27,38,39,40,41] have emerged, achieving impressive results and laying a solid foundation for the advancement of multi-degradation SR methods. DL-based multi-degradation SR approaches have since made notable progress.
Zhang et al. [42] developed a Super-Resolution for Multiple Degradations (SRMD) network. They utilized principal component analysis (PCA) to reduce the dimensionality of the blur kernel features into a vector, which was concatenated with a noise vector. This concatenated vector was stretched into a degradation map matching the height and width of the LR image for training. Xu et al. [43] introduced the Unified Dynamic Convolutional Network for Variational Degradations (UDVD). This framework comprised a feature extraction network and a refinement network, the latter enhancing performance through dynamic convolutions. Zhang et al. [44] decomposed the degradation model via the half-quadratic splitting (HQS) algorithm into two decoupled terms: a data term and a prior term. They designed a deep Unfolding Super-Resolution Network (USRNet) consisting of a data module, a prior module, and a hyperparameter module to handle image SR reconstruction under various degradations. Zhang et al. [45], within the maximum a posteriori (MAP) framework, expanded the energy function to decouple the degradation model into deblurring and down-sampling image denoising. They designed the Deep Plug-and-play SR (DPSR) network, which employs CNNs during MAP optimization iterations to address these two issues, thereby achieving multi-degradation SR reconstruction. Liu et al. [46] proposed a degradation-aware self-attention-based Transformer model (DSAT). This network employed a CNN-mixed Transformer module for feature extraction and incorporated an attention mechanism to integrate latent degradation information, enabling the Transformer to adapt to unknown degradations during the learning process. Zhang et al. [47] designed a network combining kernel estimation and structural prior knowledge (KESPKNet). This network integrated kernel estimation with structural prior knowledge to reconstruct textures with high self-similarity. By designing a Global Texture Fusion Block (GTFB) to merge local and global textures, the network provided supplementary information for SR images.
Traditional CNN-based multi-degradation SR methods achieve good results in natural image reconstruction. However, remote sensing images differ significantly due to their complex spatial information distribution, multi-scale targets, and abundant scene details. These characteristics necessitate global feature modeling by the network. Traditional CNN approaches are limited by the local feature extraction of convolutional operations, which makes long-range feature modeling difficult. Although ViT utilizes self-attention mechanisms to capture more extensive and global relationships, its complex self-attention computations increase computational costs, and its huge number of model parameters leads to training difficulties [28,29,30]. Furthermore, ViT relies heavily on large-scale datasets during training to compensate for the absence of the inherent inductive biases found in CNNs [31,32,33,34]. Therefore, it is imperative to build a network from advanced CNN structures suited to the characteristics of remote sensing images, achieving global feature modeling while keeping computational costs manageable.

2.2. Multi-Degradation SR Methods for Remote Sensing Images

Thanks to the outstanding performance of SR networks on natural images, many researchers have applied these networks to enhance the resolution of remote sensing images under the bicubic degradation assumption [11,48,49,50,51,52]. While these methods have improved the quality of reconstructed images, complex interference from imaging devices and environmental conditions exposes remote sensing images to multiple degradations, so the above methods are inevitably limited in practical applications. Therefore, addressing multi-degradation SR remains crucial for remote sensing images, and several existing studies aim to tackle this issue.
Zhang et al. [10] estimated blur kernels and noise from real-world remote sensing images to synthesize a real-world training dataset. They designed a residual balanced attention network with a modified UNet discriminator (RBAN-UNet) to achieve SR reconstruction under real-world degradation conditions. Zhang et al. [36] designed an unsupervised network for handling multi-degradation. This network comprised a Degradation Network (D) and a Generative Network (G). The SR images reconstructed by G are fed into D to generate “fake” LR images, compared against the input LR images to assess the authenticity of the generated SR images. Kang et al. [53] proposed a novel Multilayer Degradation Representation-Guided Blind SR method. This approach utilized a contrastive learning framework to obtain degraded representations with different blur kernels from LR images. The degraded representations are employed to guide the extraction of high-order features at different scales, thereby enhancing the quality of the SR images. Dong et al. [54] proposed a degradation model incorporating estimated blur kernels from real-world images and kernels generated from predefined distributions to synthesize a real-world training dataset. They employed a kernel-aware network (KANet) to achieve multi-degradation SR reconstruction. Zhao et al. [55] utilized a generative adversarial network to develop a blur kernel extraction network, which employs internal information from real-world LR remote sensing images to estimate blur kernels. Xiao et al. [56] proposed a self-supervised degradation-guided adaptive network (DRSR), which utilizes contrastive learning to achieve adaptive representations of degradation. Additionally, they designed a dual-wise feature modulation network to convert features and channel dimensions, thereby mapping LR features to the desired domain for reconstruction. The designed multi-degradation remote sensing SR network enhanced feature extraction capabilities by incorporating densely connected mechanisms and multi-scale feature extraction blocks.
While significant progress has been made in multi-degradation SR for remote sensing, current methods typically involve two consecutive steps. First, networks are designed to learn directly from LR remote sensing images, aiming to approximate real-world degradation kernels or to use the learned kernels to synthesize “real-world” training datasets. Second, SR networks are designed to reconstruct HR images from the learned degradation kernel features or the expanded training datasets. However, these approaches do not consider utilizing information from SR images to correct the generated degradation kernels, which would enhance the robustness of the network against deviations from real-world conditions. Furthermore, these two-step solutions typically train the two networks independently. When errors occur in the degradation kernel estimation, SR reconstruction performance degrades significantly [57].
Therefore, inspired by the experimental results in [37], to address this issue, we utilize features from SR reconstructed images to correct the estimated degradation kernel. We propose an end-to-end SR reconstruction network coupled with a kernel correction network. Through iterative optimization, we gradually reduce errors in the estimated degradation kernels, thereby enhancing the reconstruction capability of the SR network. Our approach achieves kernel learning solely through designated multi-degradation operations during training without using learned degradation kernels to generate additional training datasets.

2.3. The Method of Kernel Correction

The degradation processes in the real world are diverse and complex. Most existing multi-degradation SR methods address these challenges by learning latent representations of the degradation kernels. The obtained degradation features mitigate the interference caused by degradation factors and enhance the performance of the network in reconstructing real-world LR images. These methods handle various degradation factors effectively but cannot cover all possible degradation scenarios. Consequently, when the estimated degradation kernel deviates significantly from the actual kernel, their performance can deteriorate substantially. Thus, it is crucial to develop effective techniques for kernel correction based on degradation features to better align with real-world conditions. Several kernel correction approaches have been investigated.
Gu et al. [37] observed that SR results exhibit over-smoothing or over-sharpening when the estimated kernel utilized for SR mismatches the real kernel. Based on this observation, they proposed the Iterative Kernel Correction (IKC) network to achieve more ideal SR results by correcting the estimated kernel. Inspired by [37], Luo et al. [57] proposed a Deep Alternating Network (DAN). This network consisted of two CNN modules, Restorer and Estimator, connected end-to-end for SR reconstruction and degradation kernel prediction. Yan et al. [58] proposed a kernel-guided network for real-world blind super-resolution (KGSR), which consists of a downscaling generator and an upscaling generator. By incorporating an orientation prior mechanism within the discriminator of the downscaling generator, the network ensured that the learned kernels adhere to the degradation process of real scenarios. This approach provided the upscaling generator with accurate blur kernels, thereby facilitating the generation of high-quality images. Ates et al. [59] proposed an end-to-end trainable iterative kernel reconstruction network (IKR-Net) for blind super-resolution, based on Zhang’s work. The network comprised a kernel initialization module, a kernel reconstruction module, a noise estimator module, and an SR reconstruction module. The kernel reconstruction module employed HQS to decompose the kernel estimation into two sub-modules: the non-trainable module Dk and the kernel denoising module Pk. Dk is used for reconstructing the updated kernel, while Pk applies regularization to the reconstructed kernel, enabling iterative refinement of the estimated kernels. Zhou et al. [60] proposed an unsupervised method to learn correction filtering (kernel) for blind single-image super-resolution in a spatially variant way. The network utilized a linearly assembled pixel degradation-adaptive regression module (DARM) to adjust the degradation of LR images to match known degradations. DARM was optimized by using a dictionary of multiple predefined kernel bases, enabling accurate learning of correction kernel and enhancing the network’s adaptability to complex unknown degradations.

3. Methodology

In this section, we provide a detailed description of the proposed RFKCNext. We begin with an introduction to the degradation formulation, followed by an overview of the RFKCNext framework. Next, we introduce the SR network module SRConvNext and the kernel correction network module RFGKCorrector. Finally, we present the loss functions utilized during training.

3.1. Degradation Formulation

Before introducing RFKCNext, we first present the degradation formulation. The process of obtaining LR images from HR images through multi-degradation is expressed in Equation (1):

$I_{LR} = (I_{HR} \otimes k) \downarrow_{\gamma} + n$   (1)

where $I_{HR}$ represents the HR image, $I_{LR}$ represents the LR image, $k$ denotes the blur kernel, $\otimes$ denotes the convolution operation, $\downarrow_{\gamma}$ signifies the down-sampling operation with a scale factor of $\gamma$, and $n$ represents additive white Gaussian noise (AWGN) with a standard deviation of $\sigma$ (noise level).
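For illustration, the following is a minimal PyTorch sketch of the degradation model in Equation (1); the function name, the reflect padding, and the assumption that images lie in [0, 1] (so the noise level σ is divided by 255) are ours, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def degrade(hr: torch.Tensor, k: torch.Tensor, scale: int, sigma: float) -> torch.Tensor:
    """I_LR = (I_HR ⊗ k) ↓γ + n for an HR tensor of shape (B, 3, H, W) and a normalized m×m kernel k."""
    c = hr.shape[1]
    kernel = k.expand(c, 1, *k.shape)                        # one copy of k per channel (depthwise blur)
    pad = k.shape[-1] // 2
    blurred = F.conv2d(F.pad(hr, [pad] * 4, mode="reflect"), kernel, groups=c)               # I_HR ⊗ k
    lr = F.interpolate(blurred, scale_factor=1 / scale, mode="bicubic", align_corners=False)  # ↓γ
    return lr + torch.randn_like(lr) * (sigma / 255.0)       # + AWGN with standard deviation σ
```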

3.2. Network Architecture

The overall framework of RFKCNext is illustrated in Figure 1. RFKCNext reconstructs high-quality images by the SRConvNext module and corrects the degradation kernel with the RFGKCorrector module. These two subnetwork modules are executed alternately in each loop iterative optimization, forming an end-to-end trainable network.
We employ the initialization method proposed in [25] to generate the initial blur kernel $k_0 \in \mathbb{R}^{m \times m}$ (where $m$ denotes the size of the blur kernel) and vectorize it into $\alpha_0 \in \mathbb{R}^{m^2 \times 1}$. Subsequently, PCA is applied to obtain the $q$-dimensional blur kernel vector $\bar{\alpha}_0 \in \mathbb{R}^{q \times 1}$. Finally, this vector is concatenated with the noise level to produce the initial dimension-reduced degradation kernel vector $\beta \in \mathbb{R}^{(q+1) \times 1}$. The LR image is derived from the HR image according to Equation (1).
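As a sketch of this initialization, the snippet below builds β from a blur kernel and a noise level, assuming a PCA projection matrix learned from a pool of vectorized training kernels; the helper names are illustrative, not taken from the paper.

```python
import numpy as np

def fit_pca_projection(kernel_pool: np.ndarray, q: int) -> np.ndarray:
    """kernel_pool: (N, m*m) matrix of vectorized training kernels; returns the (q, m*m) projection."""
    centered = kernel_pool - kernel_pool.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:q]                                    # top-q principal directions

def kernel_to_beta(k: np.ndarray, sigma: float, P: np.ndarray) -> np.ndarray:
    alpha = k.reshape(-1)                            # vectorize k0 ∈ R^{m×m} into α0 ∈ R^{m²}
    alpha_bar = P @ alpha                            # PCA projection to q dimensions
    return np.concatenate([alpha_bar, [sigma]])      # β ∈ R^{q+1}: kernel code plus noise level
```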
When the generated degradation kernel deviates from the real-world kernel, the resulting SR image tends to be overly smooth or overly sharp [23], increasing its difference from the HR image. Consequently, the reconstruction features of the SR image inherently contain information about the degradation kernel deviation. As the estimated degradation kernel approaches the real-world kernel, the SR image becomes closer to the HR image. Based on this phenomenon, we input the reconstruction features and the LR image into the RFGKCorrector, which corrects the estimated degradation kernel vector for use in SRConvNext during the subsequent loop iteration.
By employing a loop iterative approach to alternately optimize the two modules, the submodules can better exploit each other’s information to refine their respective network parameters. Compared to independently training the two networks and simply sequentially connecting them, RFKCNext mitigates the cumulative errors at each stage of the pipeline, thereby enhancing the quality of the reconstructed image.
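A minimal sketch of this alternating loop is given below, assuming `sr_net` (SRConvNext) maps an LR image and a kernel vector to an SR image and `corrector` (RFGKCorrector) maps the SR and LR images to a corrected kernel vector; the function and argument names are placeholders.

```python
def rfkcnext_forward(lr, beta0, sr_net, corrector, t: int = 4):
    """Alternate SR reconstruction and kernel correction for t loop iterations."""
    beta, sr = beta0, None
    for _ in range(t):
        sr = sr_net(lr, beta)        # reconstruct SR with the current degradation kernel vector
        beta = corrector(sr, lr)     # correct the kernel vector from the reconstruction features
    return sr, beta
```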

3.3. Super-Resolution Network (SRConvNext)

SRConvNext consists of three stages: shallow feature extraction, deep feature extraction, and high-quality image reconstruction, as illustrated in Figure 2. Initially, let $I_{LR} \in \mathbb{R}^{H \times W \times 3}$ (with $H$, $W$, and 3 representing the height, width, and number of channels of the LR image, respectively) be the input LR image. The dimension-reduced degradation kernel vector $\beta \in \mathbb{R}^{(q+1) \times 1}$ is stretched to form a degradation feature map $M \in \mathbb{R}^{H \times W \times (q+1)}$. $M$ is concatenated with $I_{LR}$ along the channel dimension to form $I_0 \in \mathbb{R}^{H \times W \times (q+4)}$, which is fed into the shallow feature extraction stage to obtain the shallow features $F_0 \in \mathbb{R}^{H \times W \times C}$. This process can be expressed as:

$F_0 = H_{Conv_{k3s1p1}}(I_0)$   (2)

where $H_{Conv_{k3s1p1}}(\cdot)$ denotes a $3 \times 3$ convolution with a stride of 1 and a padding of 1, and $C$ represents the number of feature channels.
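A minimal PyTorch sketch of this stage is shown below, assuming an LR batch of shape (B, 3, H, W) and a kernel vector of shape (B, q + 1); the module name is illustrative.

```python
import torch
import torch.nn as nn

class ShallowFeature(nn.Module):
    def __init__(self, q: int, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(3 + q + 1, channels, kernel_size=3, stride=1, padding=1)

    def forward(self, lr: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
        b, _, h, w = lr.shape
        m = beta.view(b, -1, 1, 1).expand(-1, -1, h, w)   # stretch β into the degradation map M
        i0 = torch.cat([lr, m], dim=1)                    # concatenate M with I_LR along channels
        return self.conv(i0)                              # F_0 = H_Conv_k3s1p1(I_0), Equation (2)
```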
Subsequently, $F_0$ is fed into the CNB, the structure of which is illustrated in Figure 3. The feature extraction process can be described as follows:

$F_{out} = H_{Linear}(H_{GELU}(H_{Linear}(H_{LN}(H_{DConv_{k7s1p3}}(F_{in}))))) + F_{in}$   (3)

where $H_{DConv_{k7s1p3}}(\cdot)$ denotes the $7 \times 7$ depthwise convolution with a stride of 1 and padding of 3, $H_{LN}(\cdot)$ represents LayerNorm, $H_{Linear}(\cdot)$ is the linear layer, $H_{GELU}(\cdot)$ refers to the GELU activation function, and $F_{in}$ and $F_{out}$ are the intermediate input and output features, respectively.
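The sketch below implements a CNB following Equation (3) and the standard ConvNext layout (7 × 7 depthwise convolution, channels-last LayerNorm, two linear layers with GELU, and a residual connection); the 4× expansion ratio of the hidden linear layer is an assumption rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

class CNB(nn.Module):
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, stride=1, padding=3, groups=dim)  # H_DConv_k7s1p3
        self.norm = nn.LayerNorm(dim)                    # H_LN, applied over the channel dimension
        self.fc1 = nn.Linear(dim, expansion * dim)       # H_Linear
        self.act = nn.GELU()                             # H_GELU
        self.fc2 = nn.Linear(expansion * dim, dim)       # H_Linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.dwconv(x)
        y = y.permute(0, 2, 3, 1)                        # (B, C, H, W) -> (B, H, W, C) for LayerNorm/Linear
        y = self.fc2(self.act(self.fc1(self.norm(y))))
        return y.permute(0, 3, 1, 2) + x                 # residual connection: ... + F_in
```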
For deep feature extraction, multiple CNBs are cascaded to obtain $F_i \in \mathbb{R}^{H \times W \times 2C}$. This process is expressed as:

$F_i = H_{CNB_i}(F_{i-1}), \quad i = 1, 2, \ldots, b$   (4)

where $H_{CNB_i}(\cdot)$ represents the $i$-th CNB, $F_i$ denotes the features output by the $i$-th CNB, $F_{i-1}$ indicates the features output by the preceding CNB, and $b$ is the total number of CNBs.
Finally, after the high-quality image reconstruction stage, the SR image $I_{SR} \in \mathbb{R}^{\gamma H \times \gamma W \times 3}$ is generated, where $\gamma$ represents the scale factor. This process is represented as:

$I_{SR} = H_{Conv_{k3s1p1}}(H_{Up}(H_{LN}(H_{Conv_{k3s1p1}}(F_i))))$   (5)

where $H_{Up}(\cdot)$ denotes PixelShuffle up-sampling.
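A sketch of this stage is given below; the first 3 × 3 convolution is assumed to expand the features to C·γ² channels so that PixelShuffle can upsample by the scale factor, which is a common design choice rather than a detail stated in the paper.

```python
import torch
import torch.nn as nn

class Reconstructor(nn.Module):
    def __init__(self, channels: int, scale: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels * scale ** 2, 3, 1, 1)
        self.norm = nn.LayerNorm(channels * scale ** 2)          # H_LN over the channel dimension
        self.up = nn.PixelShuffle(scale)                         # H_Up
        self.conv2 = nn.Conv2d(channels, 3, 3, 1, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        y = self.conv1(f)
        y = self.norm(y.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return self.conv2(self.up(y))                            # I_SR, Equation (5)
```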

3.4. Reconstruction Features-Guided Kernel Corrector (RFGKCorrector)

The overall structure of RFGKCorrector is shown in Figure 2. The input $I_{SR}$ is first processed through a $3 \times 3$ convolution with a stride of $\gamma$ and padding of 1 to extract the reconstruction features $F_{rf} \in \mathbb{R}^{H \times W \times C}$. This process is expressed as:

$F_{rf} = H_{Conv_{k3s\gamma p1}}(I_{SR})$   (6)

$I_{LR}$ is passed through $H_{Conv_{k3s1p1}}(\cdot)$ to obtain $F_{LR} \in \mathbb{R}^{H \times W \times C}$. The structure of the CNB in RFGKCorrector is the same as in SRConvNext, but the input features are the concatenation of the previous-level features with $F_{rf}$ along the channel dimension. Finally, a global average pooling operation is performed to obtain the corrected dimension-reduced degradation kernel vector.
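The sketch below illustrates this corrector, reusing the CNB module sketched in Section 3.3; the 1 × 1 fusion convolutions and the final linear projection from the pooled features to the (q + 1)-dimensional vector are our assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn

class RFGKCorrector(nn.Module):
    def __init__(self, channels: int, scale: int, q: int, num_blocks: int = 6):
        super().__init__()
        self.sr_conv = nn.Conv2d(3, channels, 3, stride=scale, padding=1)   # F_rf from I_SR, Equation (6)
        self.lr_conv = nn.Conv2d(3, channels, 3, stride=1, padding=1)       # F_LR from I_LR
        self.fuse = nn.ModuleList([nn.Conv2d(2 * channels, channels, 1) for _ in range(num_blocks)])
        self.blocks = nn.ModuleList([CNB(channels) for _ in range(num_blocks)])  # CNB from the Section 3.3 sketch
        self.head = nn.Linear(channels, q + 1)                              # assumed projection to β

    def forward(self, sr: torch.Tensor, lr: torch.Tensor) -> torch.Tensor:
        f_rf = self.sr_conv(sr)
        f = self.lr_conv(lr)
        for fuse, block in zip(self.fuse, self.blocks):
            f = block(fuse(torch.cat([f, f_rf], dim=1)))   # concatenate F_rf with the previous-level features
        return self.head(f.mean(dim=(2, 3)))               # global average pooling -> corrected β
```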

3.5. Loss Function

For SRConvNext, the proposed deep network is trained with the $L_1$ loss function, which is expressed as:

$L_{SR} = \frac{1}{n} \sum_{i=1}^{n} \| F(y_i) - x_i \|_1$   (7)

where $n$ represents the total number of training samples, $x_i$ and $y_i$ denote the $i$-th pair of HR and LR image patches, respectively, and $F(y_i)$ represents the SR image generated by the network from $y_i$.
For RFGKCorrector, we employ the $L_1$ loss between the estimated degradation kernel $\hat{K}$ and the ground-truth (GT) kernel $K$:

$L_{K} = \frac{1}{n} \sum_{i=1}^{n} \| \hat{K}_i - K_i \|_1$   (8)

where $\hat{K}_i$ represents the $i$-th estimated degradation kernel, and $K_i$ is the corresponding GT kernel.
The total loss is described as:

$L_{total} = L_{K} + L_{SR}$   (9)
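A minimal sketch of this objective is shown below, with both terms weighted equally as in Equation (9); the function name is illustrative.

```python
import torch.nn.functional as F

def total_loss(sr, hr, k_est, k_gt):
    l_sr = F.l1_loss(sr, hr)        # L_SR, Equation (7)
    l_k = F.l1_loss(k_est, k_gt)    # L_K, Equation (8)
    return l_k + l_sr               # L_total, Equation (9)
```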

4. Experiment

4.1. Datasets and Metrics

Our experiments utilized two widely used remote sensing datasets, NWPU-RESISC45 [61] and UCMERCED [62], and a dataset from real-world remote sensing scenarios provided by the “Tianzhi Cup” Artificial Intelligence Challenge.
The NWPU-RESISC45 remote sensing dataset consists of 45 classes of remote sensing scene data, with each class containing 700 images, totaling 31,500 images of size 256 × 256 RGB and spatial resolutions ranging from 0.2 to 30 m. These images from Google Earth are selected from more than 100 countries and regions. The 45 scenario categories are as follows: airplane, airport, baseball diamond, basketball court, beach, bridge, chaparral, church, circular farmland, cloud, commercial area, dense residential, desert, forest, freeway, golf course, ground track field, harbor, industrial area, intersection, island, lake, meadow, medium residential, mobile home park, mountain, overpass, palace, parking lot, railway, railway station, rectangular farmland, river, roundabout, runway, sea ice, ship, snowberg, sparse residential, stadium, storage tank, tennis court, terrace, thermal power station, and wetland.
The UCMERCED remote sensing dataset comprises 21 classes, each comprising 100 images, resulting in 2100 images of size 256 × 256 RGB and a spatial resolution of approximately 0.3 m. These are USGS aerial images from 21 U.S. regions. The 21 classes are as follows: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium density residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis courts.
The real-world remote sensing scenarios dataset is derived from the “Tianzhi Cup” Artificial Intelligence Challenge organized jointly by the Beijing Remote Sensing Information Institute and the Artificial Intelligence Innovation Research Institute of the Chinese Academy of Sciences, specifically from the visible light aircraft intelligent detection and recognition track. This dataset comprises 611 images, with 308 images in the training dataset, 122 in the validation dataset, and 181 in the testing dataset (not available for download). It includes 11 classes of objects, with approximately 13,000 aircraft samples. Each image has a size of 4096 × 4096, with a spatial resolution of approximately 0.5 m. This dataset can be obtained from https://rsaicp.com/portal/dataList (accessed on 27 May 2021).
In summary, the NWPU-RESISC45 dataset contains the broadest range of categories, covering all kinds of real-life scenes. The 21 classes in the UCMERCED dataset are also basically included in the NWPU-RESISC45 categories. Although the “Tianzhi Cup” dataset is intended for the detection and recognition of aircraft types and has a large image size of 4096 × 4096 pixels, its images contain not only aircraft targets but also other scenes, such as airports, dense residential areas, industrial areas, and runways, which are also included in the NWPU-RESISC45 categories. The specific differences are shown in Table 1.
However, the image types and coverage areas of the three datasets differ, so images of the same scene category from different datasets are also different: there is only scene-level similarity between images and no overlap between the image data. In addition, all three datasets are captured over real areas, so the remote sensing images inherently contain the original degradation factors of real scenes.
Therefore, we selected the NWPU-RESISC45 dataset as the training data, and the UCMERCED and “Tianzhi Cup” datasets as the testing data. At the same time, we applied Gaussian blur kernels, noise, and down-sampling operations to the NWPU-RESISC45 dataset to synthesize degraded remote sensing images and further simulate the degradation process. We randomly selected 100 images from each class for the training dataset and 10 for the testing dataset. Thus, the training dataset comprises 4500 images, while the testing dataset includes 450 images. To ensure the authenticity of the experimental results, there is no overlap between the training and testing datasets.
For evaluating the results on the NWPU-RESISC45 synthetic dataset, we utilized peak signal-to-noise ratio (PSNR) [63] and structural similarity index (SSIM) [64] as the experimental evaluation metrics. PSNR and SSIM were computed on the Y channel in the YCbCr color space [53]. As for evaluating the results on the UCMERCED remote sensing and real-world remote sensing datasets, due to the unavailability of HR images, we employed the natural image quality evaluator (NIQE) [65] from the no-reference image quality assessment metrics for evaluation purposes.
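As a sketch of this evaluation protocol, the snippet below computes PSNR on the Y channel for 8-bit RGB images; the BT.601 luma weights are a common convention and an assumption here, and SSIM would be computed analogously on the same channel.

```python
import numpy as np

def psnr_y(img1: np.ndarray, img2: np.ndarray) -> float:
    """PSNR on the luminance (Y) channel of two 8-bit RGB images of equal size."""
    def to_y(img):
        r, g, b = img[..., 0], img[..., 1], img[..., 2]
        return 0.299 * r + 0.587 * g + 0.114 * b           # BT.601 luma (assumed convention)
    mse = np.mean((to_y(img1.astype(np.float64)) - to_y(img2.astype(np.float64))) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```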

4.2. Experimental Settings

In this experiment, we focused only on the ×2 and ×4 scale factors. During the training phase, we degraded the NWPU-RESISC45 image data by applying anisotropic Gaussian blur, bicubic interpolation down-sampling, and AWGN to generate LR images. This degradation process follows Equation (1). The size of the anisotropic Gaussian blur kernel was fixed at $15 \times 15$, with the lengths $\lambda_1$ and $\lambda_2$ of the two axes uniformly distributed in [0.1, 5] and random rotation angles $\theta$ uniformly distributed in [0, $\pi$]. When $\lambda_1 = \lambda_2$, the kernel becomes an isotropic Gaussian kernel. AWGN is represented by a standard deviation $\sigma$ (noise level) set within [0, 25]. Following the setting in [20], we utilized PCA to reduce the blur kernel to a 15-dimensional vector, which is concatenated with the noise level to form a 16-dimensional vector. During the testing phase, we conducted experiments using one isotropic Gaussian kernel and four anisotropic Gaussian kernels, as depicted in Figure 4, where $k_1: [\lambda_1 = 2.5, \lambda_2 = 2.5, \theta = 0]$, $k_2: [\lambda_1 = 3.8, \lambda_2 = 1.8, \theta = \frac{7}{10}\pi]$, $k_3: [\lambda_1 = 1.4, \lambda_2 = 2.7, \theta = \frac{3}{10}\pi]$, $k_4: [\lambda_1 = 0.5, \lambda_2 = 2.3, \theta = \frac{3}{5}\pi]$, and $k_5: [\lambda_1 = 2.2, \lambda_2 = 3.2, \theta = \frac{8}{9}\pi]$. The noise level $\sigma$ was set to [0, 5, 10], so that there are $5 \times 3 = 15$ degradation combinations of Gaussian blur kernels and noise levels.
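For reference, a minimal sketch of generating such an anisotropic Gaussian kernel from (λ1, λ2, θ) on a 15 × 15 grid is shown below; treating λ1 and λ2 as the standard deviations along the two principal axes is our interpretation of the axis lengths.

```python
import numpy as np

def anisotropic_gaussian_kernel(lam1: float, lam2: float, theta: float, size: int = 15) -> np.ndarray:
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    cov = rot @ np.diag([lam1 ** 2, lam2 ** 2]) @ rot.T       # covariance from axis lengths and rotation
    inv = np.linalg.inv(cov)
    coords = np.arange(size) - size // 2
    xx, yy = np.meshgrid(coords, coords)
    xy = np.stack([xx, yy], axis=-1)                          # (size, size, 2) grid of offsets
    k = np.exp(-0.5 * np.einsum("...i,ij,...j->...", xy, inv, xy))
    return k / k.sum()                                        # normalize so the kernel sums to 1
```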
During training, we randomly cropped HR images into $64 \times 64$ patches. The corresponding LR image patches at the ×2 and ×4 scale factors were $32 \times 32$ and $16 \times 16$, respectively. We also applied random rotations of 90°, 180°, and 270° and horizontal flips for data augmentation.
We employed the Adam optimizer [66] to train our model, with parameters $\beta_1 = 0.9$ and $\beta_2 = 0.999$. The batch size was 32 for the ×2 scale factor and 128 for the ×4 scale factor. The total number of training epochs was 2000, with an initial learning rate of $10^{-4}$ that was halved every $2 \times 10^5$ iterations. SRConvNext utilized 30 CNBs, while RFGKCorrector utilized 6 CNBs; the number of loops for alternating optimization between the two networks was fixed at $t = 4$. All experiments were conducted using the PyTorch framework version 1.11 on a workstation with an i9-10900X CPU, 64 GB of RAM, and an NVIDIA RTX 3090 GPU.
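A small sketch of this optimizer configuration is given below; stepping the scheduler once per iteration is assumed, and `model` stands in for the RFKCNext network.

```python
import torch

def build_optimizer(model: torch.nn.Module):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    # Halve the learning rate every 2 x 10^5 iterations (scheduler stepped per iteration).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200_000, gamma=0.5)
    return optimizer, scheduler
```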

4.3. Experiments on NWPU-RESISC45 Synthetic Images

We compared our proposed method with IKC [37], SRMD [42], USRNet [44], DAN [57], DRSR [56], DSAT [46], and KESPKNet [47]. Table 2 and Table 3 present the objective metrics of model parameters, running time, PSNR, and SSIM for the above networks under ×2 and ×4 down-sampling scale factors. It is worth noting that the running time in the tables refers to the average running time on the testing dataset generated under combinations of the five Gaussian blur kernels k1–k5 and the corresponding noise levels. The best results are highlighted in red, while the second-best results are highlighted in blue. It is evident that our proposed method achieves the highest PSNR in all 15 degradation combinations.

4.3.1. Quantitative Results

For the ×2 scale factor, as shown in Table 2, our proposed method improves the average PSNR over the second-ranked KESPKNet by 0.225 dB, 0.118 dB, and 0.106 dB at the three noise levels, respectively. Compared to USRNet, which has the minimum model parameters, our method increases average PSNR by 1.966 dB, 0.444 dB, and 0.266 dB. Compared to SRMD, which has the minimum runtime, our method achieves an average increase of 0.715 dB, 0.286 dB, and 0.218 dB in PSNR, while the average runtime only increases by approximately 30 ms.
For the ×4 scale factor, as shown in Table 3, our method achieves average PSNR improvements of 0.111 dB, 0.100 dB, and 0.111 dB over the second-ranked KESPKNet. Compared to USRNet, our method increases average PSNR by 0.282 dB, 0.143 dB, and 0.136 dB. Compared to SRMD, which has the minimum runtime, our method achieves an average increase of 0.257 dB, 0.175 dB, and 0.133 dB in PSNR.
Although our method is not optimal in terms of model parameters and running time, it is noteworthy that, compared to SRMD and USRNet, our approach achieves superior SR performance. This demonstrates that the increase in model complexity and running time is acceptable.

4.3.2. Qualitative Results

Figure 5 and Figure 6 display the qualitative results of different methods under ×2 and ×4 scale factors, respectively. For visual convenience, we mark the area to be enlarged on the left HR image with a red box, and the enlarged image of the cropped region is displayed in the top-right or bottom-right corner of the HR image. We crop the same region from the LR image to illustrate the impact of different degradations on the HR image. The magnified patches of the LR image and the SR images reconstructed by various methods are displayed on the right.
For the ×2 scale factor, visual quality decreases with increasing noise levels across all reconstruction methods. In Figure 5, “railway_009”, “parking_lot_554”, and “parking_lot_665”, our approach yields sharper car reconstructions compared to others, maintaining car shapes well even under high noise conditions. In “ship_081”, our method consistently preserves and restores the boundaries of circular areas across different noise levels. “harbor_368” demonstrates our method’s capability to reconstruct more authentic and detailed texture features.
For the ×4 scale factor, it is evident that smaller scaling factors pose greater challenges. The loss of image information on LR images is more severe, as evidenced by the enlarged images of the corresponding LR regions. To mitigate noise interference, all reconstructed images from various methods tend to be smoother, and this smoothing becomes more pronounced as the noise level increases. However, our proposed method still performs well compared to others. In the “harbor_409” image shown in Figure 6, our method can recover more details on the boats. In “ground_track_field_131” and “industrial_area_694” images, our method maintains the shape of the architectural areas even under high noise conditions and reconstructs sharp edges. As for “stadium_152” and “church_305”, our method can restore the building textures more clearly.
Therefore, quantitative and qualitative experiments on the NWPU-RESISC45 synthetic dataset demonstrate that our method effectively mitigates noise and yields visually satisfactory results.

4.4. Experiments on UCMERCED Remote Sensing Images

We also evaluated all approaches on the UCMERCED dataset. We randomly selected 10 images from each category, totaling 210 images for testing. The training dataset contains no information from the UCMERCED dataset. Experiments were conducted directly on the UCMERCED images rather than on down-sampled images. Therefore, for the ×2 and ×4 scale factors, the corresponding reconstructed image resolutions are 512 × 512 and 1024 × 1024 pixels, respectively. Due to the absence of HR reference images, we employed the no-reference metric NIQE for quantitative evaluation, where lower scores indicate better perceptual quality.

Qualitative Results

Table 4 presents NIQE results for different methods at ×2 and ×4 scale factors. Quantitative analysis shows that our proposed method achieves the lowest NIQE scores for both scaling factors, demonstrating superior performance.
Figure 7 and Figure 8 depict the qualitative results of different methods at ×2 and ×4 scale factors, respectively. Figure 7 shows “freeway81” at the ×2 scale factor; our approach generates the sharpest car edges and authentically restored textures on the front and rear windows. Figure 8 shows “buildings22” at the ×4 scale factor; our method successfully reconstructs clear signage details even after enlarging the rooftop area to 1024 × 1024 pixels. These results indicate that our method has good effectiveness and generalization compared to other methods on other public datasets.

4.5. Experiments on Real-World Remote Sensing Images

To further evaluate the SR performance of the proposed method on real-world degraded remote sensing images, we conducted experiments on the dataset from the “Tianzhi Cup” visible light image aircraft intelligent detection and recognition track. We randomly selected two images from its given training and validation datasets for testing. Given the original image resolution of 4096 × 4096 pixels, we cropped regions of interest into small 256 × 256 pixel patches. Subsequently, we reconstructed patches into 512 × 512 pixels and 1024 × 1024 pixels at ×2 and ×4 scale factors, respectively. None of the training data for any methods included information from this dataset, and NIQE was employed as a quantitative evaluation metric.

Qualitative Results

Table 5 presents the evaluation results of all methods at both scale factors. It is evident that our approach achieved the lowest NIQE score compared to the other methods. Figure 9 and Figure 10 depict the visual outcomes of all methods at these scale factors. The regions of interest in the original image on the left are marked with a red box, and the corresponding reconstructed image is on the right. Figure 9 shows image “105” at the ×2 scale factor; our method produced clearer aircraft edges compared to others. Figure 10 shows image “174” at the ×4 scale factor; our method recovered more texture details in the lettering. This experiment demonstrates that our approach achieves satisfactory SR performance on real-world remote sensing images.

4.6. Ablation Studies

To validate the effectiveness of RFGKCorrector, we conducted ablation experiments. The training dataset, testing dataset, and degradation methods utilized in this experiment were consistent with Section 4.3. First, we performed SR reconstruction using SRConvNext alone under the 15 degradation combinations. Second, RFGKCorrector was introduced to form RFKCNext for SR reconstruction.
Table 6 and Table 7 display the PSNR and SSIM values for scale factors ×2 and ×4, with the most effective results highlighted in red. Table 6 shows that at the ×2 scale factor, RFKCNext improved average PSNR by 0.184 dB, 0.165 dB, and 0.125 dB compared to SRConvNext. At the ×4 scale factor, as shown in Table 7, RFKCNext improved average PSNR by 0.152 dB, 0.100 dB, and 0.088 dB. Thus, considering both scale factors, RFKCNext demonstrated superior SR performance across different noise levels and blur kernels. This indicates that, compared to employing only SRConvNext for reconstruction, introducing RFGKCorrector effectively utilizes the reconstructed features for kernel correction, bringing the estimated kernel closer to the real-world scenario and yielding better reconstruction results.

5. Discussion

In this section, we will further discuss the impact of the proposed RFKCNext.
(1) Comparison with other methods: The experimental results in Section 4.3 show that the proposed RFKCNext restores finer details compared to SRMD, USRNet, IKC, and DAN. As the noise level gradually increases, all methods tend to smooth the reconstructed images to reduce the impact of noise. In contrast, our method reconstructs results that effectively preserve the shapes of objects while performing well in restoring image details and edges. The quantitative and qualitative analyses in Section 4.4 and Section 4.5 indicate that, when reconstructing unknown degraded real-world remote sensing data, RFKCNext does not introduce invalid textures, appearing more natural compared to the above methods.
(2) The impact of model parameter quantity and running time: Table 2 and Table 3 show quantitative results, indicating that our method achieves the best SR performance at scale factors ×2 and ×4. Compared with the second-ranked KESPKNet, our method improves average PSNR by 0.225 dB, 0.118 dB, and 0.106 dB for the ×2 scale factor. For the ×4 scale factor, the improvements are 0.111 dB, 0.100 dB, and 0.111 dB, respectively. Compared to USRNet, which has the minimum model parameters, our method increases average PSNR by 1.966 dB, 0.444 dB, and 0.266 dB for the ×2 scale factor. When the scale factor is ×4, the average PSNR increases by 0.282 dB, 0.143 dB, and 0.136 dB, respectively. Furthermore, RFKCNext reduces model parameters by 19.05 M and 19.06 M compared to KESPKNet at the ×2 and ×4 scale factors, respectively. Therefore, considering the performance improvements, the increase in model parameters and runtime of our method is acceptable.
(3) The impact of the kernel correction network: In RFKCNext, the subnetwork SRConvNext reconstructs the SR image based on the LR image information and the estimated degradation kernels, while the kernel correction subnetwork RFGKCorrector corrects the estimated kernels utilizing the LR image information and the SR reconstruction features, so the two complement each other. As indicated by the results in Table 6 and Table 7, the SR performance using SRConvNext alone is inferior to that achieved by jointly employing RFGKCorrector. This demonstrates that through loop iterative optimization, the kernels corrected by RFGKCorrector gradually approximate the real-world scenario, thereby enhancing the quality of the images reconstructed by SRConvNext.
(4) Limitations of the method: First, as the noise level increases, our method effectively reconstructs sharp edges, but the images still tend to become smoother. Second, under down-sampling at lower scale factors, the restored images maintain the original shapes of objects but may incur texture losses. Lastly, the robustness of the network to more complex degradation factors, such as motion blur and other types of noise, should also be considered. Addressing these limitations and challenges will refine the proposed method and is left as our future work.

6. Conclusions

In this paper, we propose RFKCNext, a multi-degradation SR reconstruction network for remote sensing images. Unlike methods that learn degradation kernel representations directly from LR images, RFKCNext also employs features from SR reconstructed images to correct the degradation kernel. The network comprises two sub-networks, SRConvNext and RFGKCorrector, each using ConvNext Blocks (CNBs) to extract global features. These sub-networks are optimized in a loop alternation manner to form an end-to-end trainable network. SRConvNext reconstructs SR images utilizing LR images and estimated kernels. The reconstruction features guide RFGKCorrector to correct the estimated degradation kernel so that it approximates real-world scenarios. Both subnetworks achieve significant improvement during the loop process. Therefore, even though the degradation kernels are unknown, RFGKCorrector provides increasingly accurate kernel estimations to SRConvNext, enhancing its capability to reconstruct image details. Extensive experiments on synthetic remote sensing datasets and ablation studies indicate that the proposed method achieves the best SR results, particularly in high-noise conditions. Furthermore, experiments on other public remote sensing datasets and a real-world dataset demonstrate superior reconstruction performance, proving the robustness and practicality of the method.

Author Contributions

Conceptualization, Y.Q.; formal analysis, H.N.; methodology, J.W.; investigation, J.L.; supervision, M.Z.; visualization, H.L.; data curation, J.S.; funding acquisition, J.W.; software, Y.Q.; validation, Y.Q.; writing—original draft, Y.Q.; writing—review and editing, Q.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Science and Technology Department of Jilin Province of China under Grant number 20210201137GX.

Data Availability Statement

The data of experimental images used to support the findings of this research are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, X.; Yi, J.; Guo, J.; Song, Y.; Lyu, J.; Xu, J.; Yan, W.; Zhao, J.; Cai, Q.; Min, H. A Review of Image Super-Resolution Approaches Based on Deep Learning and Applications in Remote Sensing. Remote Sens. 2022, 14, 5423. [Google Scholar] [CrossRef]
  2. Huang, L.; An, R.; Zhao, S.; Jiang, T.; Hu, H. A Deep Learning-Based Robust Change Detection Approach for Very High Resolution Remotely Sensed Images with Multiple Features. Remote Sens. 2020, 12, 1441. [Google Scholar] [CrossRef]
  3. Tang, X.; Zhang, H.; Mou, L.; Liu, F.; Zhang, X.; Xiang, X.; Zhu, X.; Jiao, L. An Unsupervised Remote Sensing Change Detection Method Based on Multiscale Graph Convolutional Network and Metric Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5609715. [Google Scholar] [CrossRef]
  4. Li, X.; Yong, X.; Li, T.; Tong, Y.; Gao, H.; Wang, X.; Xu, Z.; Fang, Y.; You, Q.; Lyu, X. A Spectral–Spatial Context-Boosted Network for Semantic Segmentation of Remote Sensing Images. Remote Sens. 2024, 16, 1214. [Google Scholar] [CrossRef]
  5. Chen, X.; Li, D.; Liu, M.; Jia, J. CNN and Transformer Fusion for Remote Sensing Image Semantic Segmentation. Remote Sens. 2023, 15, 4455. [Google Scholar] [CrossRef]
  6. Rabbi, J.; Ray, N.; Schubert, M.; Chowdhury, S.; Chao, D. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Remote Sens. 2020, 12, 1432. [Google Scholar] [CrossRef]
  7. Liu, C.; Zhang, S.; Hu, M.; Song, Q. Object Detection in Remote Sensing Images Based on Adaptive Multi-Scale Feature Fusion Method. Remote Sens. 2024, 16, 907. [Google Scholar] [CrossRef]
  8. Shi, J.; Liu, W.; Shan, H.; Li, E.; Li, X.; Zhang, L. Remote Sensing Scene Classification Based on Multibranch Fusion Attention Network. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3001505. [Google Scholar] [CrossRef]
  9. Wang, G.; Zhang, N.; Liu, W.; Chen, H.; Xie, Y. MFST: A Multi-Level Fusion Network for Remote Sensing Scene Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6516005. [Google Scholar] [CrossRef]
  10. Zhang, J.; Xu, T.; Li, J.; Jiang, S.; Zhang, Y. Single-Image Super Resolution of Remote Sensing Images with Real-world Degradation Modeling. Remote Sens. 2022, 14, 2895. [Google Scholar] [CrossRef]
  11. Huang, B.; Guo, Z.; Wu, L.; He, B.; Li, X.; Lin, Y. Pyramid Information Distillation Attention Network for Super-Resolution Reconstruction of Remote Sensing Images. Remote Sens. 2021, 13, 5143. [Google Scholar] [CrossRef]
  12. Dong, C.; Loy, C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  13. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  14. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  15. Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4799–4807. [Google Scholar]
  16. Li, J.; Du, S.; Wu, C.; Leng, Y.; Song, R.; Li, Y. Drcr net: Dense residual channel re-calibration network with non-local purification for spectral super resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1259–1268. [Google Scholar]
  17. Larsson, G.; Maire, M.; Shakhnarovich, G. FractalNet: Ultra-Deep Neural Networks without Residuals. arXiv 2016, arXiv:1605.07648. [Google Scholar]
  18. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.S.; Asari, V.K. A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics 2019, 8, 292. [Google Scholar] [CrossRef]
  19. Cheng, K.; Shen, Y.; Dinov, I.D. Applications of Deep Neural Networks with Fractal Structure and Attention Blocks for 2D and 3D Brain Tumor Segmentation. J. Stat. Theory Pract. 2024, 18, 31. [Google Scholar] [CrossRef]
  20. Ding, C.; Chen, Y.; Algarni, A.M.; Zhang, G.; Peng, H. Application of fractal neural network in network security situation awareness. Fractals. 2022, 30, 2240090. [Google Scholar] [CrossRef]
  21. Anil, B.C.; Dayananda, P. Automatic liver tumor segmentation based on multi-level deep convolutional networks and fractal residual network. IETE J. Res. 2023, 69, 1925–1933. [Google Scholar] [CrossRef]
  22. Ding, S.; Gao, Z.; Wang, J.; Lu, M.; Shi, J. Fractal graph convolutional network with MLP-mixer based multi-path feature fusion for classification of histopathological images. Expert Syst. Appl. 2023, 212, 118793. [Google Scholar] [CrossRef]
  23. Song, X.; Liu, W.; Liang, L.; Shi, W.; Xie, G.; Lu, X.; Hei, X. Image super-resolution with multi-scale fractal residual attention network. Comput. Graph. 2023, 113, 21–31. [Google Scholar] [CrossRef]
  24. Feng, X.; Li, X.; Li, J. Multi-scale fractal residual network for image super-resolution. Appl. Intell. 2021, 51, 1845–1856. [Google Scholar] [CrossRef]
  25. Zhou, Y.; Dong, J.; Yang, Y. Deep fractal residual network for fast and accurate single image super resolution. Neurocomputing 2020, 398, 389–398. [Google Scholar] [CrossRef]
  26. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  27. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating More Pixels in Image Super-Resolution Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
  28. Wu, D.; Li, H.; Hou, Y.; Xu, C.; Cheng, G.; Guo, L.; Liu, H. Spatial–Channel Attention Transformer with Pseudo Regions for Remote Sensing Image-Text Retrieval. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4704115. [Google Scholar] [CrossRef]
  29. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110. [Google Scholar] [CrossRef] [PubMed]
  30. Wang, T.; Yuan, L.; Feng, J.; Yan, S. PnP-DETR: Towards Efficient Visual Analysis with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4641–4650. [Google Scholar]
  31. Dai, Z.; Liu, H.; Le, Q.; Tan, M. Coatnet: Marrying convolution and attention for all data sizes. Adv. Neural Inf. Process. Syst. 2021, 34, 3965–3977. [Google Scholar]
  32. Liu, Y.; Zhang, Y.; Wang, Y.; Hou, F.; Yuan, J.; Tian, J.; Zhang, Y.; Shi, Z.; Fan, J.; He, Z. A Survey of Visual Transformers. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 7478–7498. [Google Scholar] [CrossRef] [PubMed]
  33. Jamil, S.; Piran, M.J.; Kwon, O.-J. A Comprehensive Survey of Transformers for Computer Vision. Drones 2023, 7, 287. [Google Scholar] [CrossRef]
  34. Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do vision transformers see like convolutional neural networks? Adv. Neural Inf. Process. Syst. 2021, 34, 12116–12128. [Google Scholar]
  35. Liu, Z.; Mao, H.; Wu, C.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976. [Google Scholar]
  36. Zhang, N.; Wang, Y.; Zhang, X.; Xu, D.; Wang, X.; Ben, G.; Zhao, Z.; Li, Z. A Multi-Degradation Aided Method for Unsupervised Remote Sensing Image Super Resolution with Convolution Neural Networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5600814. [Google Scholar] [CrossRef]
  37. Gu, J.; Lu, H.; Zuo, W.; Dong, C. Blind Super-Resolution with Iterative Kernel Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1604–1613. [Google Scholar]
  38. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar]
  39. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep Back-Projection Networks for Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1664–1673. [Google Scholar]
  40. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 294–310. [Google Scholar]
  41. Zhou, Y.; Li, Z.; Guo, C.-L.; Bai, S.; Cheng, M.-M.; Hou, Q. SRFormer: Permuted Self-Attention for Single Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12734–12745. [Google Scholar]
  42. Zhang, K.; Zuo, W.; Zhang, L. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3262–3271. [Google Scholar]
  43. Xu, Y.; Tseng, S.; Tseng, Y.; Kuo, H.; Tsai, Y. Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12493–12502. [Google Scholar]
  44. Zhang, K.; Van Gool, L.; Timofte, R. Deep Unfolding Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3214–3223. [Google Scholar]
  45. Zhang, K.; Zuo, W.; Zhang, L. Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1671–1681. [Google Scholar]
  46. Liu, Q.; Gao, P.; Han, K.; Liu, N.; Xiang, W. Degradation-aware self-attention based transformer for blind image super-resolution. IEEE Trans. Multimed. 2024, 26, 7516–7528. [Google Scholar] [CrossRef]
  47. Zhang, J.; Zhou, Y.; Bi, J.; Xue, Y.; Deng, W.; He, W.; Zhao, T.; Sun, K.; Tong, T.; Gao, Q.; et al. A blind image super-resolution network guided by kernel estimation and structural prior knowledge. Sci. Rep. 2024, 14, 9525. [Google Scholar] [CrossRef]
  48. Zhang, W.; Tan, Z.; Lv, Q.; Li, J.; Zhu, B.; Liu, Y. An Efficient Hybrid CNN-Transformer Approach for Remote Sensing Super-Resolution. Remote Sens. 2024, 16, 880. [Google Scholar] [CrossRef]
  49. Wang, Y.; Shao, Z.; Lu, T.; Huang, X.; Wang, J.; Chen, X.; Huang, H.; Zuo, X. Remote Sensing Image Super-Resolution via Multi-Scale Texture Transfer Network. Remote Sens. 2023, 15, 5503. [Google Scholar] [CrossRef]
  50. Yue, X.; Chen, X.; Zhang, W.; Ma, H.; Wang, L.; Zhang, J.; Wang, M.; Jiang, B. Super-Resolution Network for Remote Sensing Images via Preclassification and Deep–Shallow Features Fusion. Remote Sens. 2022, 14, 925. [Google Scholar] [CrossRef]
  51. Wang, Y.; Zhao, L.; Liu, L.; Hu, H.; Tao, W. URNet: A U-Shaped Residual Network for Lightweight Image Super-Resolution. Remote Sens. 2021, 13, 3848. [Google Scholar] [CrossRef]
  52. Xiong, Y.; Guo, S.; Chen, J.; Deng, X.; Sun, L.; Zheng, X.; Xu, W. Improved SRGAN for Remote Sensing Image Super-Resolution Across Locations and Sensors. Remote Sens. 2020, 12, 1263. [Google Scholar] [CrossRef]
  53. Kang, X.; Li, J.; Duan, P.; Ma, F.; Li, S. Multilayer Degradation Representation-Guided Blind Super-Resolution for Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5534612. [Google Scholar] [CrossRef]
  54. Dong, R.; Mou, L.; Zhang, L.; Fu, H.; Zhu, X. Real-world remote sensing image super-resolution via a practical degradation model and a kernel-aware network. ISPRS J. Photogramm. Remote Sens. 2022, 191, 155–170. [Google Scholar] [CrossRef]
  55. Zhao, Z.; Ren, C.; Teng, Q.; He, X. A practical super-resolution method for multi-degradation remote sensing images with deep convolutional neural networks. J. Real-Time Image Process. 2022, 19, 1139–1154. [Google Scholar] [CrossRef]
  56. Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Wang, Y.; Zhang, L. From degrade to upgrade: Learning a self-supervised degradation guided adaptive network for blind remote sensing image super-resolution. Inf. Fusion 2023, 96, 297–311. [Google Scholar] [CrossRef]
  57. Luo, Z.; Huang, Y.; Li, S.; Wang, L.; Tan, T. Unfolding the alternating optimization for blind super resolution. Adv. Neural Inf. Process. Syst. 2020, 33, 5632–5643. [Google Scholar]
  58. Yan, Q.; Niu, A.; Wang, C.; Dong, W.; Woźniak, M.; Zhang, Y. KGSR: A kernel guided network for real-world blind super-resolution. Pattern Recognit. 2024, 147, 110095. [Google Scholar] [CrossRef]
  59. Ates, H.F.; Yildirim, S.; Gunturk, B.K. Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation. Comput. Vis. Image Underst. 2023, 233, 103718. [Google Scholar] [CrossRef]
  60. Zhou, H.; Zhu, X.; Zhu, J.; Han, Z.; Zhang, S.; Qin, J.; Yin, X. Learning Correction Filter via Degradation-Adaptive Regression for Blind Single Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 12331–12341. [Google Scholar]
  61. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
  62. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010; pp. 270–279. [Google Scholar]
  63. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  64. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2366–2369. [Google Scholar]
  65. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  66. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
Figure 1. The overall framework of RFKCNext. SRConvNext and RFGKCorrector are connected end-to-end. SRConvNext is utilized to reconstruct SR images, while RFGKCorrector estimates corrected degradation kernel vectors using the LR images and the reconstruction features. The two subnetworks are alternately optimized for t loops.
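To make the alternation in Figure 1 concrete, the following minimal PyTorch-style sketch outlines the loop. The module interfaces, the initial kernel estimate, and the loop count are assumptions for illustration only and are not the authors' released implementation.

```python
import torch

def rfkcnext_forward(lr, sr_net, corrector, kernel_init, t_loops=4):
    """Sketch of the alternating SR / kernel-correction loop in Figure 1.

    lr:          low-resolution input batch, shape (B, 3, h, w)
    sr_net:      SRConvNext-like module: (lr, kernel_vec) -> SR image
    corrector:   RFGKCorrector-like module: (lr, sr) -> corrected kernel vector
                 (assumed to extract reconstruction features from the SR image internally)
    kernel_init: initial degradation-kernel vector estimate, shape (B, d)
    t_loops:     number of alternations (the "t loops" in the caption); value assumed
    """
    kernel_vec = kernel_init
    sr = None
    for _ in range(t_loops):
        # 1) reconstruct an SR image under the current kernel estimate
        sr = sr_net(lr, kernel_vec)
        # 2) correct the kernel estimate using the freshly reconstructed SR image
        kernel_vec = corrector(lr, sr)
    return sr, kernel_vec
```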
Figure 2. The overall network structure of our network (RFKCNext). 1. SRConvNext consists of three stages: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. It takes as input the concatenated features of the LR image and degradation kernel maps along the channel dimension and outputs the SR image. 2. Reconstructed features are extracted from the SR image using convolutional operations. 3. The input of the RFGKCorrector is the LR image and the reconstruction features of the SR image, which are utilized to generate the estimated degradation kernel vector. SRConvNext and RFGKCorrector are alternately connected to form an end-to-end network.
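As an illustration of the channel-wise concatenation described in the caption, the sketch below stretches a per-image degradation-kernel vector into spatial maps and concatenates it with the LR image. The tensor shapes and the kernel dimensionality d are assumptions for the sketch.

```python
import torch

def concat_lr_and_kernel_maps(lr, kernel_vec):
    """Stretch a per-image kernel vector into spatial maps and concatenate with the LR image.

    lr:          (B, 3, h, w) low-resolution images
    kernel_vec:  (B, d) degradation-kernel representation (e.g., a reduced blur-kernel vector)
    returns:     (B, 3 + d, h, w) tensor fed to the SR reconstruction subnetwork
    """
    b, d = kernel_vec.shape
    _, _, h, w = lr.shape
    # replicate each kernel coefficient over the full spatial grid
    kernel_maps = kernel_vec.view(b, d, 1, 1).expand(b, d, h, w)
    return torch.cat([lr, kernel_maps], dim=1)
```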
Figure 3. The network architecture of the ConvNext Block (CNB).
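Figure 3 adopts the ConvNeXt block design of Liu et al. [35]: a 7 × 7 depthwise convolution, LayerNorm, and a pointwise inverted-bottleneck MLP with GELU, wrapped in a residual connection. A minimal PyTorch sketch of such a block is shown below; the expansion ratio of 4 follows the original ConvNeXt paper, and details such as layer scale or stochastic depth are omitted and may differ from the authors' CNB.

```python
import torch
import torch.nn as nn

class ConvNextBlock(nn.Module):
    """Minimal ConvNeXt-style block: 7x7 depthwise conv -> LayerNorm -> 1x1 expand -> GELU -> 1x1 project, plus residual."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise spatial mixing
        self.norm = nn.LayerNorm(dim)                    # normalizes the channel dim in channels-last layout
        self.pwconv1 = nn.Linear(dim, expansion * dim)   # pointwise (1x1) expansion as a Linear layer
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(expansion * dim, dim)   # pointwise (1x1) projection back to dim

    def forward(self, x):                                # x: (B, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                        # to channels-last (B, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                        # back to channels-first
        return shortcut + x
```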
Figure 4. The five Gaussian blur kernels for the testing phase, where k1: [λ1 = 2.5, λ2 = 2.5, θ = 0], k2: [λ1 = 3.8, λ2 = 1.8, θ = 7π/10], k3: [λ1 = 1.4, λ2 = 2.7, θ = 3π/10], k4: [λ1 = 0.5, λ2 = 2.3, θ = 3π/5], and k5: [λ1 = 2.2, λ2 = 3.2, θ = 8π/9].
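The kernels in Figure 4 are anisotropic Gaussians parameterized by two eigenvalues (λ1, λ2) and a rotation angle θ. The NumPy sketch below generates such a kernel by building a rotated covariance matrix; treating λ1 and λ2 as the variances along the principal axes and using a 21 × 21 support are both assumptions for illustration, not settings confirmed by the paper.

```python
import numpy as np

def anisotropic_gaussian_kernel(lam1, lam2, theta, size=21):
    """Build a normalized anisotropic Gaussian blur kernel from eigenvalues and a rotation angle."""
    # Covariance matrix: Sigma = R(theta) diag(lam1, lam2) R(theta)^T
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    sigma = rot @ np.diag([lam1, lam2]) @ rot.T
    inv_sigma = np.linalg.inv(sigma)

    # Evaluate exp(-0.5 * x^T Sigma^{-1} x) on a centered pixel grid
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([xs, ys], axis=-1)                  # (size, size, 2)
    quad = np.einsum('...i,ij,...j->...', coords, inv_sigma, coords)
    kernel = np.exp(-0.5 * quad)
    return kernel / kernel.sum()

# e.g., kernel k2 from the caption: lam1 = 3.8, lam2 = 1.8, theta = 7*pi/10
k2 = anisotropic_gaussian_kernel(3.8, 1.8, 7 * np.pi / 10)
```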
Figure 5. The visual comparisons of different methods under the ×2 scale factor, Gaussian blur kernels k1–k5, and various noise levels on the NWPU-RESISC45 dataset. The HR image is depicted on the left. To enhance detail clarity, enlarged regions are displayed in the top-right or bottom-right corners of the HR image, highlighted by a red box. The corresponding LR images and the reconstructions of the enlarged regions by different methods are displayed on the right.
Figure 6. The visual comparisons of different methods under the ×4 scale factor, Gaussian blur kernels k1–k5, and various noise levels on the NWPU-RESISC45 dataset. The HR image is depicted on the left. To enhance detail clarity, enlarged regions are displayed in the top-right or bottom-right corners of the HR image, highlighted by a red box. The corresponding LR images and the reconstructions of the enlarged regions by different methods are displayed on the right.
Figure 7. The visual comparisons of experimental results from different methods at the ×2 scale factor. “freeway81” is selected from the UCMERCED dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the bottom-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 8. The visual comparisons of experimental results from different methods at the ×4 scale factor. “buildings22” is selected from the UCMERCED dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the bottom-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 9. The visual comparisons of experimental results from different methods at the ×2 scale factor. “105” is selected from the real-world remote sensing dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the top-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Figure 10. The visual comparisons of experimental results from different methods at the ×4 scale factor. “174” is selected from the real-world remote sensing dataset. The original image is displayed on the left. For ease of observing details, the area to be enlarged is marked on the original image with a red box, and the enlarged view of this local area is displayed in the top-right corner. The enlarged regions of the SR images reconstructed by the different methods are shown on the right.
Table 1. The comparison of differences among three remote sensing datasets: NWPU-RESISC45, UCMERCED, and “Tianzhi Cup”.

| Dataset | Number of Scene Classes | Number of Images | Image Size (Pixels) | Spatial Resolution (m) | Image Type | Coverage Area |
| --- | --- | --- | --- | --- | --- | --- |
| NWPU-RESISC45 | 45 | 31,500 | 256 × 256 | 0.2–30 | satellite and aerial images | more than 100 countries and regions |
| UCMERCED | 21 | 2100 | 256 × 256 | 0.3 | aerial images | 21 regions in the United States |
| Tianzhi Cup | aircraft, dense residential, runway, and other scenes | 430 | 4096 × 4096 | 0.5–1 | satellite images | not mentioned in the dataset description |
Table 2. The quantitative results on the NWPU-RESISC45 dataset for model parameters, running time, PSNR (dB), and SSIM under the ×2 down-sampling scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10]. The running time is the average running time on the testing dataset generated by all degradation combinations. The best results are highlighted in red, the second-best in blue, and ↑ indicates that higher values are preferable.

| Model | Noise | Param | Running Time (ms/image) | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRMD | 0 | 1.51 M | 1.284 | 32.958/0.899 | 32.265/0.888 | 32.751/0.896 | 32.431/0.889 | 32.641/0.893 | 32.809/0.893 |
| USRNet | 0 | 0.81 M | 23.177 | 31.539/0.860 | 31.182/0.855 | 31.660/0.868 | 32.013/0.877 | 31.395/0.858 | 31.558/0.858 |
| IKC | 0 | 9.02 M | 97.907 | 32.900/0.890 | 31.310/0.873 | 32.422/0.888 | 32.046/0.878 | 32.608/0.886 | 32.257/0.883 |
| DAN | 0 | 4.65 M | 113.937 | 33.329/0.902 | 33.011/0.894 | 33.217/0.910 | 33.287/0.902 | 33.167/0.903 | 33.202/0.902 |
| DRSR | 0 | 5.59 M | 26.347 | 33.310/0.910 | 33.026/0.899 | 33.329/0.907 | 33.305/0.904 | 33.177/0.901 | 33.229/0.904 |
| DSAT | 0 | 15.50 M | 140.251 | 33.400/0.906 | 33.078/0.904 | 33.360/0.903 | 33.329/0.910 | 33.171/0.899 | 33.268/0.904 |
| KESPKNet | 0 | 21.83 M | 119.631 | 33.413/0.908 | 33.112/0.904 | 33.390/0.908 | 33.370/0.901 | 33.212/0.903 | 33.299/0.905 |
| Ours | 0 | 2.78 M | 7.342 | 33.752/0.913 | 33.261/0.906 | 33.723/0.913 | 33.556/0.910 | 33.326/0.908 | 33.524/0.910 |
| SRMD | 5 | 1.51 M | 1.282 | 29.810/0.790 | 29.609/0.788 | 30.135/0.802 | 30.618/0.823 | 29.658/0.791 | 29.966/0.799 |
| USRNet | 5 | 0.81 M | 23.232 | 29.621/0.784 | 29.425/0.777 | 29.974/0.799 | 30.551/0.822 | 29.469/0.778 | 29.808/0.792 |
| IKC | 5 | 9.02 M | 96.969 | 29.822/0.795 | 29.272/0.787 | 29.944/0.807 | 30.433/0.823 | 29.645/0.790 | 29.823/0.800 |
| DAN | 5 | 4.65 M | 115.089 | 29.863/0.792 | 29.722/0.785 | 30.319/0.808 | 30.894/0.837 | 29.739/0.786 | 30.107/0.802 |
| DRSR | 5 | 5.59 M | 25.752 | 29.870/0.800 | 29.755/0.788 | 30.302/0.811 | 30.794/0.832 | 29.767/0.788 | 30.098/0.804 |
| DSAT | 5 | 15.50 M | 136.542 | 29.879/0.798 | 29.760/0.789 | 30.310/0.814 | 30.861/0.828 | 29.775/0.789 | 30.117/0.804 |
| KESPKNet | 5 | 21.83 M | 114.694 | 29.900/0.794 | 29.747/0.792 | 30.326/0.809 | 30.908/0.833 | 29.791/0.791 | 30.134/0.804 |
| Ours | 5 | 2.78 M | 7.612 | 30.090/0.801 | 29.833/0.793 | 30.455/0.815 | 31.010/0.837 | 29.870/0.794 | 30.252/0.808 |
| SRMD | 10 | 1.51 M | 1.283 | 28.804/0.747 | 28.640/0.746 | 29.065/0.758 | 29.538/0.780 | 28.676/0.743 | 28.945/0.755 |
| USRNet | 10 | 0.81 M | 22.964 | 28.738/0.746 | 28.576/0.739 | 29.032/0.760 | 29.523/0.782 | 28.615/0.740 | 28.897/0.753 |
| IKC | 10 | 9.02 M | 97.491 | 28.798/0.752 | 28.512/0.746 | 28.977/0.764 | 29.418/0.781 | 28.673/0.747 | 28.876/0.758 |
| DAN | 10 | 4.65 M | 114.425 | 28.848/0.750 | 28.700/0.745 | 29.159/0.765 | 29.683/0.787 | 28.770/0.748 | 29.032/0.759 |
| DRSR | 10 | 5.59 M | 26.058 | 28.867/0.750 | 28.709/0.744 | 29.165/0.764 | 29.687/0.791 | 28.753/0.748 | 29.036/0.759 |
| DSAT | 10 | 15.50 M | 139.972 | 28.869/0.753 | 28.721/0.746 | 29.172/0.764 | 29.690/0.789 | 28.773/0.744 | 29.045/0.759 |
| KESPKNet | 10 | 21.83 M | 117.972 | 28.883/0.752 | 28.731/0.745 | 29.181/0.767 | 29.709/0.791 | 28.782/0.749 | 29.057/0.761 |
| Ours | 10 | 2.78 M | 7.542 | 29.015/0.757 | 28.820/0.750 | 29.318/0.771 | 29.808/0.794 | 28.856/0.752 | 29.163/0.765 |
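As a hedged illustration of the degradation pipeline behind Tables 2 and 3 (Gaussian blur with one of k1–k5, bicubic down-sampling by the scale factor, and additive Gaussian noise with σ ∈ {0, 5, 10}), the NumPy/OpenCV sketch below shows one plausible way such test LR images could be synthesized. The exact resampling, border handling, and noise conventions used by the authors may differ, and the example file name is hypothetical.

```python
import cv2
import numpy as np

def synthesize_lr(hr, kernel, scale=2, sigma=5, seed=0):
    """Blur -> bicubic down-sample -> add Gaussian noise (sigma on the 0-255 intensity scale)."""
    # blur each channel with the given 2D kernel
    blurred = cv2.filter2D(hr.astype(np.float32), -1, kernel, borderType=cv2.BORDER_REFLECT)
    h, w = blurred.shape[:2]
    # bicubic down-sampling by the scale factor
    lr = cv2.resize(blurred, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    # additive white Gaussian noise
    rng = np.random.default_rng(seed)
    lr = lr + rng.normal(0.0, sigma, size=lr.shape)
    return np.clip(lr, 0, 255).astype(np.uint8)

# e.g., hr = cv2.imread('nwpu_example.png'); lr = synthesize_lr(hr, k2, scale=4, sigma=10)
```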
Table 3. The quantitative results on the NWPU-RESISC45 dataset for model parameters, running time, PSNR (dB), and SSIM under the ×4 down-sampling scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10]. The running time is the average running time on the testing dataset generated by all degradation combinations. The best results are highlighted in red, the second-best in blue, and ↑ indicates that higher values are preferable.

| Model | Noise | Param | Running Time (ms/image) | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRMD | 0 | 1.55 M | 1.162 | 28.663/0.746 | 28.553/0.743 | 28.625/0.745 | 28.610/0.745 | 28.644/0.746 | 28.619/0.745 |
| USRNet | 0 | 0.81 M | 22.982 | 28.612/0.743 | 28.574/0.742 | 28.610/0.743 | 28.605/0.743 | 28.569/0.743 | 28.594/0.743 |
| IKC | 0 | 9.17 M | 96.656 | 28.625/0.741 | 28.546/0.734 | 28.633/0.738 | 28.711/0.745 | 28.594/0.736 | 28.622/0.741 |
| DAN | 0 | 4.80 M | 112.472 | 28.777/0.747 | 28.707/0.744 | 28.796/0.748 | 28.592/0.745 | 28.788/0.750 | 28.732/0.747 |
| DRSR | 0 | 5.74 M | 25.841 | 28.743/0.746 | 28.673/0.747 | 28.764/0.749 | 28.678/0.751 | 28.763/0.752 | 28.724/0.749 |
| DSAT | 0 | 15.64 M | 132.822 | 28.786/0.752 | 28.708/0.745 | 28.790/0.751 | 28.682/0.749 | 28.782/0.753 | 28.750/0.750 |
| KESPKNet | 0 | 21.98 M | 110.469 | 28.809/0.751 | 28.718/0.749 | 28.798/0.753 | 28.702/0.749 | 28.796/0.751 | 28.765/0.751 |
| Ours | 0 | 2.92 M | 7.425 | 28.916/0.756 | 28.859/0.755 | 28.888/0.757 | 28.829/0.754 | 28.887/0.756 | 28.876/0.756 |
| SRMD | 5 | 1.55 M | 1.181 | 27.750/0.699 | 27.664/0.696 | 27.826/0.703 | 27.936/0.709 | 27.697/0.696 | 27.775/0.701 |
| USRNet | 5 | 0.81 M | 22.679 | 27.774/0.701 | 27.707/0.698 | 27.859/0.705 | 27.967/0.711 | 27.729/0.698 | 27.807/0.703 |
| IKC | 5 | 9.17 M | 96.053 | 27.715/0.696 | 27.613/0.689 | 27.831/0.701 | 28.010/0.712 | 27.627/0.689 | 27.759/0.697 |
| DAN | 5 | 4.80 M | 113.919 | 27.792/0.705 | 27.733/0.698 | 27.877/0.706 | 27.972/0.717 | 27.756/0.700 | 27.826/0.705 |
| DRSR | 5 | 5.74 M | 25.649 | 27.795/0.706 | 27.723/0.701 | 27.814/0.707 | 28.001/0.713 | 27.740/0.701 | 27.815/0.706 |
| DSAT | 5 | 15.64 M | 133.457 | 27.813/0.702 | 27.745/0.699 | 27.873/0.711 | 28.013/0.715 | 27.755/0.703 | 27.840/0.706 |
| KESPKNet | 5 | 21.98 M | 111.249 | 27.824/0.707 | 27.762/0.700 | 27.882/0.711 | 28.020/0.716 | 27.763/0.702 | 27.850/0.707 |
| Ours | 5 | 2.92 M | 7.427 | 27.925/0.709 | 27.844/0.705 | 28.007/0.713 | 28.105/0.719 | 27.867/0.706 | 27.950/0.710 |
| SRMD | 10 | 1.55 M | 1.157 | 26.988/0.663 | 26.910/0.660 | 27.079/0.669 | 27.219/0.676 | 26.937/0.661 | 27.027/0.666 |
| USRNet | 10 | 0.81 M | 22.730 | 27.031/0.666 | 26.906/0.661 | 27.061/0.671 | 27.197/0.679 | 26.925/0.658 | 27.024/0.667 |
| IKC | 10 | 9.17 M | 95.913 | 26.900/0.660 | 26.842/0.655 | 27.041/0.666 | 27.191/0.675 | 26.831/0.654 | 26.961/0.662 |
| DAN | 10 | 4.80 M | 112.247 | 27.014/0.670 | 26.906/0.664 | 27.086/0.673 | 27.256/0.683 | 26.905/0.666 | 27.033/0.671 |
| DRSR | 10 | 5.74 M | 25.725 | 26.981/0.669 | 26.882/0.665 | 27.083/0.675 | 27.157/0.679 | 26.917/0.663 | 27.004/0.670 |
| DSAT | 10 | 15.64 M | 134.343 | 27.009/0.671 | 26.903/0.663 | 27.098/0.671 | 27.252/0.682 | 26.926/0.665 | 27.038/0.670 |
| KESPKNet | 10 | 21.98 M | 112.032 | 27.033/0.670 | 26.914/0.666 | 27.107/0.674 | 27.260/0.685 | 26.933/0.667 | 27.049/0.672 |
| Ours | 10 | 2.92 M | 7.487 | 27.120/0.672 | 27.046/0.668 | 27.219/0.677 | 27.352/0.685 | 27.064/0.670 | 27.160/0.674 |
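The PSNR and SSIM values in Tables 2 and 3 are standard full-reference quality metrics [63,64]. The scikit-image sketch below shows how such scores might be computed for a reconstructed image; evaluating on RGB with data_range = 255 is an assumption, since the color space and crop conventions are not restated here, and channel_axis requires scikit-image 0.19 or later.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(hr, sr):
    """Compute PSNR (dB) and SSIM between a ground-truth HR image and an SR result (uint8 RGB arrays)."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255, channel_axis=-1)
    return psnr, ssim
```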
Table 4. The NIQE quantitative results at the ×2 and ×4 scale factors on the UCMERCED dataset. The best results are highlighted in red, while the second-best are highlighted in blue. ↓ indicates that lower values are preferable.

| Model | Scale Factor | NIQE↓ |
| --- | --- | --- |
| SRMD | 2 | 22.184 |
| USRNet | 2 | 22.311 |
| IKC | 2 | 22.699 |
| DAN | 2 | 22.925 |
| DRSR | 2 | 22.794 |
| DSAT | 2 | 22.561 |
| KESPKNet | 2 | 22.762 |
| Ours | 2 | 20.787 |
| SRMD | 4 | 20.176 |
| USRNet | 4 | 19.388 |
| IKC | 4 | 18.913 |
| DAN | 4 | 19.348 |
| DRSR | 4 | 19.091 |
| DSAT | 4 | 19.511 |
| KESPKNet | 4 | 19.505 |
| Ours | 4 | 18.415 |
Table 5. The NIQE quantitative results at the ×2 and ×4 scale factors on the real-world remote sensing dataset. The best results are highlighted in red, while the second-best are highlighted in blue. ↓ indicates that lower values are preferable.

| Model | Scale Factor | NIQE↓ |
| --- | --- | --- |
| SRMD | 2 | 16.959 |
| USRNet | 2 | 17.831 |
| IKC | 2 | 21.541 |
| DAN | 2 | 23.032 |
| DRSR | 2 | 22.731 |
| DSAT | 2 | 21.736 |
| KESPKNet | 2 | 22.004 |
| Ours | 2 | 16.392 |
| SRMD | 4 | 17.715 |
| USRNet | 4 | 17.925 |
| IKC | 4 | 20.848 |
| DAN | 4 | 17.188 |
| DRSR | 4 | 18.321 |
| DSAT | 4 | 18.750 |
| KESPKNet | 4 | 18.034 |
| Ours | 4 | 16.691 |
Table 6. The quantitative results of PSNR (dB) and SSIM for ablation experiments using the ×2 scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10] on the NWPU-RESISC45 dataset to validate the effectiveness of the RFGKCorrector. SRConvNext combined with the RFGKCorrector forms RFKCNext. The best results are highlighted in red, and ↑ indicates that higher values are preferable.

| Model | Noise | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SRConvNext | 0 | 33.649/0.906 | 33.115/0.892 | 33.551/0.904 | 33.325/0.895 | 33.060/0.891 | 33.340/0.898 |
| RFKCNext | 0 | 33.752/0.913 | 33.261/0.906 | 33.723/0.913 | 33.556/0.910 | 33.326/0.908 | 33.524/0.910 |
| SRConvNext | 5 | 29.893/0.793 | 29.755/0.788 | 30.231/0.812 | 30.860/0.830 | 29.697/0.787 | 30.087/0.802 |
| RFKCNext | 5 | 30.090/0.801 | 29.833/0.793 | 30.455/0.815 | 31.010/0.837 | 29.870/0.794 | 30.252/0.808 |
| SRConvNext | 10 | 28.897/0.751 | 28.693/0.747 | 29.163/0.769 | 29.701/0.789 | 28.735/0.746 | 29.038/0.761 |
| RFKCNext | 10 | 29.015/0.757 | 28.820/0.750 | 29.318/0.771 | 29.808/0.794 | 28.856/0.752 | 29.163/0.765 |
Table 7. The quantitative results of PSNR (dB) and SSIM for ablation experiments using the ×4 scale factor, Gaussian blur kernels k1–k5, and noise levels σ = [0, 5, 10] on the NWPU-RESISC45 dataset to validate the effectiveness of the RFGKCorrector. SRConvNext combined with the RFGKCorrector forms RFKCNext. The best results are highlighted in red, and ↑ indicates that higher values are preferable.

| Model | Noise | k1 PSNR↑/SSIM↑ | k2 PSNR↑/SSIM↑ | k3 PSNR↑/SSIM↑ | k4 PSNR↑/SSIM↑ | k5 PSNR↑/SSIM↑ | Average PSNR↑/SSIM↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SRConvNext | 0 | 28.773/0.754 | 28.674/0.751 | 28.717/0.753 | 28.707/0.753 | 28.747/0.753 | 28.724/0.753 |
| RFKCNext | 0 | 28.916/0.756 | 28.859/0.755 | 28.888/0.757 | 28.829/0.754 | 28.887/0.756 | 28.876/0.756 |
| SRConvNext | 5 | 27.835/0.706 | 27.732/0.702 | 27.889/0.711 | 28.002/0.717 | 27.790/0.704 | 27.850/0.708 |
| RFKCNext | 5 | 27.925/0.709 | 27.844/0.705 | 28.007/0.713 | 28.105/0.719 | 27.867/0.706 | 27.950/0.710 |
| SRConvNext | 10 | 27.045/0.670 | 26.953/0.666 | 27.116/0.675 | 27.261/0.683 | 26.986/0.667 | 27.072/0.672 |
| RFKCNext | 10 | 27.120/0.672 | 27.046/0.668 | 27.219/0.677 | 27.352/0.685 | 27.064/0.670 | 27.160/0.674 |