Article

DFCFNet: A Local–Nonlocal Dual-Branch Feature Complementary Fusion Network for Remote Sensing Image Super-Resolution

1 Xi’an Institute of Optics and Precision Mechanics (XIOPM), Chinese Academy of Sciences, Xi’an 710119, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Key Laboratory of Biomedical Spectroscopy of Xi’an, Key Laboratory of Spectral Imaging Technology, Xi’an Institute of Optics and Precision Mechanics (XIOPM), Chinese Academy of Sciences, Xi’an 710119, China
4 Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(10), 1626; https://doi.org/10.3390/rs18101626
Submission received: 1 April 2026 / Revised: 3 May 2026 / Accepted: 14 May 2026 / Published: 19 May 2026
(This article belongs to the Special Issue Super-Resolution and Reconstruction of Remote Sensing Images)

Highlights

What are the main findings?
  • We propose a local–nonlocal dual-branch feature complementary fusion network (DFCFNet), which combines a Partial Convolution Channel Mixer (PCCM) with a global variance-based strategy to jointly model local and global representations. An Efficient Feed-Forward Network (EFFN) is further introduced to refine features, leading to enhanced detail preservation and improved reconstruction quality in remote sensing images.
  • Extensive experimental results demonstrate that DFCFNet achieves superior performance on remote sensing datasets, effectively balancing reconstruction quality and inference efficiency. Furthermore, cross-domain evaluation on natural images confirms the model’s strong generalization capability.
What are the implications of the main findings?
  • DFCFNet adopts a lightweight design, enabling high-quality remote sensing image super-resolution on resource-constrained edge devices and demonstrating strong potential for real-time applications.
  • DFCFNet exhibits strong generalization capability, providing valuable guidance for future research in remote sensing image super-resolution as well as broader geospatial image processing applications.

Abstract

Remote sensing image super-resolution (RSISR) has gained significant attention in recent years because of its critical role in enhancing image analysis capabilities. Existing methods often focus on nonlocal feature extraction but frequently overlook the integration of local information. Moreover, many methods improve reconstruction by introducing more complex structures, which poses a challenge for resource-limited devices. To address these issues, we present a local–nonlocal dual-branch feature complementary fusion network (DFCFNet) featuring two key components: a lightweight dual-branch feature aggregation (DBFA) module and an Efficient Feed-Forward Network (EFFN). The DBFA comprises a Focused Local Feature Branch (FLFB), which uses novel Partial Convolution Channel Mixers to model localized patterns, and a Non-Focal Exploration Branch (NFEB), which uses global variance analysis for comprehensive feature extraction. This dual-branch design captures local and global contextual information simultaneously. The EFFN further refines the features output by the DBFA to make full use of the image's detailed information. Extensive experiments show that DFCFNet achieves the best reconstruction results on remote sensing datasets while also leading in computational efficiency and network complexity. The framework’s versatility is further confirmed by its successful adaptation to natural image SR tasks, with consistent performance improvements across five standard datasets.

1. Introduction

Remote sensing image super-resolution (RSISR) is an image processing technique in the field of computer vision. It reconstructs one or more low-resolution (LR) images with coarse details into high-resolution (HR) images with better visual quality and finer details. It can benefit many remote sensing tasks, such as target detection, semantic segmentation, and scene recognition. Consequently, a growing number of researchers are working to improve RSISR performance [1,2].
Traditional approaches rely on predefined rules and prior assumptions. Despite their minimal hardware requirements and ease of implementation, they struggle to handle diverse feature expressions and detail loss in complex scenes.
Deep learning methods, with their powerful nonlinear modeling capabilities and their ability to learn features from large-scale data, not only greatly improve the visual quality and accuracy of reconstructed images but also adapt to the complex requirements of diverse scenarios. However, because the input provides insufficient information, RSISR is generally regarded as an ill-posed problem with multiple possible solutions, which makes it challenging to construct an efficient SR model [3]. To address this challenge, researchers have developed a variety of CNN-based methods [4,5,6,7] that significantly improve representational capability by learning features layer by layer. However, the high complexity, large number of parameters, and high computational cost of these models limit their application on resource-constrained platforms such as mobile devices. To reduce parameter counts, many researchers have designed lightweight models. For example, information distillation networks are used to refine and fine-tune features [8,9,10]. Liu et al. [11] proposed the RFDN architecture, which adds residual blocks to achieve finer feature extraction and exploitation through feature distillation while lowering both the number of parameters and the model complexity. Although these models have compact architectures and achieve good visual results, it remains challenging to balance model size and performance while fully utilizing both local and nonlocal information [12]. To overcome this problem, researchers have further investigated collaborative modeling of local and nonlocal image features. Gao et al. [13] designed a dual-branch block (DBB) that extracts information from both local and larger regions using standard and dilated convolutions.
Despite recent advancements in RSISR, current approaches struggle to meet the requirements for lightweight deployment and are still computationally costly and complex [14,15].
To address this, we propose a novel lightweight dual-branch feature aggregation (DBFA) module that combines local and nonlocal feature processing through two parallel pathways. The nonlocal branch first downsamples the features to capture low-frequency components and processes them with an approximate depthwise separable convolution (ADSC). These components are then modulated by global variance statistics before being adaptively fused with the original features. Complementing this, the local branch incorporates a partial convolution channel mixer (PCCM) that enhances local modeling through spatial–channel interactions while reducing computation via optimized memory access patterns. This dual-path architecture enables efficient extraction of both detailed textures and global structures.
To further refine features, we introduce an EFFN that performs channel–spatial co-enhancement after DBFA processing. By integrating DBFA and EFFN into an end-to-end framework called DFCFNet, we achieve effective RSISR with minimal computational overhead. Extensive experiments on remote sensing benchmarks demonstrate DFCFNet’s superior balance between model efficiency and reconstruction quality, as illustrated in Figure 1. Additional evaluations on natural image datasets confirm its strong generalization capability.
The following is a summary of the primary contributions:
  • We design a lightweight and effective DBFA module with a dual-branch structure that extracts both local and nonlocal information, yielding more comprehensive feature representations.
  • We use the global variance to enhance the nonlocal feature representation and adopt the partial convolution design of the partial convolution channel mixer (PCCM) to strengthen local modeling while reducing computational redundancy.
  • We propose a lightweight EFFN that further enhances the extraction of image details while improving the stability and generalization of the model, achieving better reconstruction while keeping computation efficient.
  • We conduct quantitative and qualitative evaluations across multiple remote sensing datasets to test the balance between model complexity and performance, supplemented by cross-domain tests on natural images to verify generalizability.

2. Related Work

2.1. Deep Learning Developments for RSISR

In the domain of natural images, deep learning-based SR has demonstrated outstanding performance [16,17,18]. Therefore, many researchers have designed SR networks tailored to RSIs. Liu et al. [19] combined paired learning with a graph neural network to model both the degradation from HR to LR and the reconstruction from LR to HR, recovering finer texture information from a large number of remote sensing images. Although the GAN architecture can make images more realistic and GAN-based RSISR has achieved more satisfactory results [20], applying GANs to RSISR [21,22] causes the generator to add pseudo-textures in smooth regions, leading to spatial distortion. To address this issue, Ma et al. [23] designed SD-GAN, which applies different reconstruction criteria to different texture regions to improve the reconstruction quality of RSIs. Because the SD-GAN generator reconstructs every region of the image with exactly the same parameters, Wu et al. proposed SG-FBGAN, whose PFBB [24] contains separate branches for salient and non-salient regions, so that both kinds of regions are processed more efficiently. Diffusion models, owing to their powerful generative capabilities, have demonstrated significant potential in SR tasks. Zhu et al. [25] proposed RSDiffSR based on a conditional diffusion framework, which leverages large-scale diffusion models to generate prior information, thereby effectively enhancing the visual quality of reconstructed images. Wang et al. [26] developed a semantic-guided diffusion model that uses pretrained generative models as priors to alleviate texture blurring in large-scale remote sensing image reconstruction. Furthermore, LDM [27] reduces computational cost by performing the diffusion process in a latent space rather than in the original image space, with the aid of a pretrained autoencoder.
Building upon this, StableSR [28] addresses the limitations in reconstruction fidelity and resolution flexibility by introducing a time-aware encoder, a controllable feature fusion module, and a progressive aggregation sampling strategy, thereby further improving reconstruction performance.

2.2. Feature Extraction in SR

A model's feature extraction capability is crucial to its performance in SR tasks. Since SRCNN [5] broke through the limitations of traditional methods in SR, an increasing number of CNN-based models have been proposed. Kim et al. [20] adopted a VGG-style CNN architecture that learns high-frequency information, which speeds up training. Hu et al. [29] proposed CMSC, which extracts image features from coarse to fine. Wang et al. [30] developed MSRDN to lower computational cost while maintaining high performance.
Because the self-attention (SA) mechanism can capture long-range relationships, the Transformer architecture was introduced into computer vision [31]. Models such as ViT address vision tasks by using self-attention to capture correlations among the elements of a sequence over a nonlocal range, thereby extracting nonlocal information and establishing long-range dependencies. Liu et al. [32] proposed the Swin Transformer, a hierarchical vision Transformer built on a shifted window mechanism. Chen et al. [33] proposed the Hybrid Attention Transformer (HAT), which strengthens cross-window information interaction to mitigate blocking artifacts in intermediate features. ViT-based approaches have achieved impressive success in vision tasks. However, recent studies have shown that SA suffers from high memory consumption and computational cost, and that ViT tends to focus on low-frequency information, leading to overly smooth reconstruction results [34]. Motivated by these findings and by the relationship between variance and nonlocal features explored in Ref. [35], we incorporate variance into our network design as a simple and effective way to exploit nonlocal information, improving both model performance and optimization.
To fully exploit both local and nonlocal characteristics, many researchers have developed networks that combine local and nonlocal feature extraction and have proposed different modules to optimize this process. Meng et al. [36] proposed a two-branch extended network structure optimized for detail and contour reconstruction. To reduce the overall number of parameters while ensuring accurate extraction of both local and nonlocal features, Hou et al. [37] stacked local and nonlocal residual blocks. To enhance the edge details of an image, Li et al. [38] designed LGC-GDAN, which uses a dual-region discriminator and generator. However, these models usually require large amounts of memory and computing power, which reduces processing efficiency and wastes resources. Our approach fully explores nonlocal information while extracting local information efficiently, preventing the loss of low- and high-frequency information, and employs a lighter-weight feature fusion structure that preserves the model's expressiveness and efficiency while achieving a more comprehensive feature representation.

2.3. Lightweight Image SR

SR approaches have advanced significantly in image reconstruction. However, in pursuit of higher performance, current SR models are typically sophisticated and computationally inefficient, which increases hardware demands and hinders practical deployment. Therefore, researchers have developed many lightweight SR models. To handle redundant model parameters, Hui et al. [39] introduced IDN, which uses multiple information distillation blocks to progressively extract residual information. It was later extended to IMDN [40], which proposes an adaptive cropping strategy to process input images of arbitrary size. Ahn et al. [4] adopted a recursive network architecture to improve model performance while minimizing the parameter count, yielding a lightweight model. Building on recursive architectures, Tai et al. [41] proposed DRRN, which reduces the number of model parameters by sharing the same network structure and parameters across recursions, thus improving the model's generalization capacity. However, the repeated operations of recursive modules introduce redundant processing, which slows down inference. To reduce unnecessary computation and achieve finer feature extraction and exploitation through feature distillation, Liu et al. [11] proposed the RFDN architecture with added residual blocks. Kong et al. [42] modified RFDN by removing its multi-branch structure and introducing a contrastive loss, which speeds up inference and improves accuracy. To lower complexity and enhance reconstruction quality, Gao et al. [43] designed FDEB using distillation procedures and feature enhancement.

3. Method

In this section, we first provide an overview of the proposed model architecture in Section 3.1. Section 3.2 then describes the dual-branch DBFA module in detail, Section 3.3 presents the EFFN module, and Section 3.4 describes the structure of the FAM.

3.1. Overall Architecture

Figure 2 depicts the overall structure of the proposed DFCFNet. The network takes the LR image as input and applies a 3 × 3 convolution layer for early feature extraction. The features are then routed through a sequence of feature aggregation modules (FAMs), each consisting of a DBFA and an EFFN, for deep feature extraction; the richer features output by each FAM are passed to the next FAM for further extraction. To keep the model lightweight, a single 3 × 3 convolution converts the features to the size required by the upsampling operation. A residual connection around the deep feature extraction module retains the nonlocal structural information of the input image.
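A minimal NumPy sketch of this data flow follows. It assumes channel-first (C, H, W) arrays, no biases, and a pixel-shuffle upsampler after the final 3 × 3 convolution; the text says only that this convolution converts the features to the size the upsampling operation requires, so pixel shuffle is an assumption, as is the exact placement of the residual connection. The names `dfcfnet_forward` and `make_fam` are hypothetical, and each FAM is passed in as a callable built from DBFA and EFFN implementations.

```python
import numpy as np


def conv3x3(x, w):
    """3x3 convolution (cross-correlation, as in DL frameworks), stride 1, zero padding 1.
    x: (C_in, H, W); w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def pixel_shuffle(x, r):
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r), same ordering as torch.nn.PixelShuffle."""
    crr, h, w = x.shape
    c = crr // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


def make_fam(dbfa, effn):
    """FAM (Section 3.4): F_f = EFFN(DBFA(F_in) + F_in)."""
    return lambda f: effn(dbfa(f) + f)


def dfcfnet_forward(lr, w_head, fams, w_tail, scale):
    shallow = conv3x3(lr, w_head)   # 3x3 conv: early feature extraction
    deep = shallow
    for fam in fams:                # stacked FAMs: deep feature extraction
        deep = fam(deep)
    deep = deep + shallow           # residual connection (placement assumed)
    # 3x3 conv to 3*scale^2 channels, then depth-to-space upsampling (assumed)
    return pixel_shuffle(conv3x3(deep, w_tail), scale)
```

For example, with 36 head channels, eight FAMs, and `scale=4`, a (3, 8, 8) input yields a (3, 32, 32) output.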

3.2. DBFA Module

Recently, a large number of Transformer-based super-resolution methods have emerged that leverage self-attention mechanisms to achieve superior visual performance. However, these methods typically incur high computational costs and tend to emphasize nonlocal modeling while neglecting the effective representation of local details. To address this issue, we propose the dual-branch DBFA module, which integrates a non-focal exploration branch (NFEB) and a focused local feature branch (FLFB) to jointly model nonlocal dependencies and local details, thereby improving both reconstruction quality and computational efficiency. Figure 3a presents the overall architecture of DBFA. The input features are first expanded along the channel dimension by a factor of two so that they can be split evenly between the two branches. This can be formulated as follows:
$$\{F_g, F_l\} = \mathrm{Split}\left(\mathrm{Conv}_{1\times1}(F_{in})\right)$$
where $F_{in}$ denotes the input of DBFA, and $F_g$ and $F_l$ represent the two outputs of the channel splitting operation. $\mathrm{Conv}_{1\times1}(\cdot)$ denotes a 1 × 1 convolution, and $\mathrm{Split}(\cdot)$ denotes the channel splitting operation.

3.2.1. NFEB

As illustrated in Figure 3a, the NFEB within the DBFA module models nonlocal information in images using a global variance strategy. This strategy measures the dispersion of channel features by computing the global statistical variance of the feature maps, thereby capturing their overall distribution characteristics. By incorporating such global statistics, the model can capture long-range nonlocal dependencies at low computational cost. In the NFEB, the input features are first downsampled to extract low-frequency components, which are then fed into a 3 × 3 depthwise convolution to obtain initial nonlocal structural representations. These features are then passed to the approximate depthwise separable convolution (ADSC) module to further enhance the nonlocal feature representations. As shown in Figure 3c, the ADSC consists of a 1 × 1 convolution, an activation function, and a depthwise convolution. Unlike a conventional depthwise separable convolution, which applies the depthwise convolution first and the pointwise 1 × 1 convolution second, ADSC first applies a standard 1 × 1 convolution to promote inter-channel information interaction, then a nonlinear activation to enhance feature expressiveness, and finally a depthwise convolution to further refine the features. This process can be formulated as follows:
$$F_{ga} = \mathrm{ADSC}\left(\mathrm{DWConv}_{3\times3}\left(M(F_g)\right)\right)$$
where $F_g$ denotes the input of NFEB, $M(\cdot)$ denotes max pooling, $F_{ga}$ denotes the output of ADSC, $\mathrm{DWConv}_{3\times3}(\cdot)$ denotes a 3 × 3 depthwise convolutional layer, and $\mathrm{ADSC}(\cdot)$ denotes the approximate depthwise separable convolution.
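The low-frequency path in this equation can be sketched in NumPy as follows, with channel-first arrays and no biases. The pooling factor `s` (4 here), the use of GELU as the activation inside ADSC, and all function names are assumptions for illustration; the text does not specify them.

```python
import numpy as np


def max_pool(x, s):
    """M(.): non-overlapping s x s max pooling on (C, H, W); incomplete edge windows are dropped."""
    c, h, w = x.shape
    x = x[:, :h - h % s, :w - w % s]
    return x.reshape(c, h // s, s, w // s, s).max(axis=(2, 4))


def dwconv3x3(x, k):
    """Depthwise 3x3 convolution (one filter per channel), zero padding 1. k: (C, 3, 3)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros(x.shape)
    for dy in range(3):
        for dx in range(3):
            out += k[:, dy, dx, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out


def gelu(x):
    """Tanh approximation of GELU (assumed activation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def adsc(x, w_pw, k_dw):
    """Approximate depthwise separable conv: standard 1x1 conv -> activation -> depthwise 3x3."""
    y = np.einsum("oc,chw->ohw", w_pw, x)   # 1x1 conv: inter-channel interaction
    return dwconv3x3(gelu(y), k_dw)         # depthwise conv: per-channel spatial refinement


def nfeb_low_frequency(f_g, k0, w_pw, k_dw, s=4):
    """F_ga = ADSC(DWConv3x3(M(F_g)))."""
    return adsc(dwconv3x3(max_pool(f_g, s), k0), w_pw, k_dw)
```

Inside `adsc` the 1 × 1 convolution runs before the depthwise one, the reverse of the conventional ordering.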
To fully exploit the original information of the image, we do not adopt a simple local residual structure. Instead, we incorporate a global variance strategy into the local residual path to explore nonlocal information more comprehensively. The resulting features are then fused with the nonlocal representations generated by ADSC, further enhancing feature expressiveness. The overall process can be formulated as follows:
$$F_{\rho} = \mathrm{Conv}_{1\times1}\left(F_{ga} + \sigma^2(F_g)\right)$$
where $F_g$ denotes the input of NFEB, $F_{ga}$ denotes the output of ADSC, $\mathrm{Conv}_{1\times1}(\cdot)$ denotes a 1 × 1 convolutional layer, and $F_{\rho} \in \mathbb{R}^{H \times W \times C}$. The global variance is formulated as:
$$\sigma^2(F_g) = \frac{1}{N}\sum_{i=0}^{N-1}\left(f_i - \mu\right)^2$$
where $\sigma^2(F_g)$ represents the global variance of $F_g$, $N$ is the total number of pixels, $f_i$ represents the value of the $i$-th pixel, and $\mu$ is the mean of all pixels.
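Reading $N$ as the number of spatial positions in each channel (so that every channel receives its own variance, which broadcasts over the downsampled $F_{ga}$ in the equation for $F_{\rho}$), the statistic can be computed as below. The per-channel reading is an assumption. The formula divides by $N$, which matches NumPy's default `ddof=0`; PyTorch's `torch.var` applies Bessel's correction by default and divides by $N-1$.

```python
import numpy as np


def global_variance(f):
    """sigma^2(F_g): per-channel variance over all N = H*W positions, dividing by N.

    f: (C, H, W) -> (C, 1, 1), which broadcasts against feature maps of any spatial size.
    """
    mu = f.mean(axis=(1, 2), keepdims=True)             # mu: mean of all pixels in each channel
    return ((f - mu) ** 2).mean(axis=(1, 2), keepdims=True)
```

For a single 2 × 2 channel with values 1, 2, 3, 4, $\mu = 2.5$ and $\sigma^2 = (2.25 + 0.25 + 0.25 + 2.25)/4 = 1.25$.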
In the NFEB branch, we aggregate the input and output features to further enhance the nonlocal information of the image. The process can be formulated as follows:
$$F_g^{out} = F_g \odot U\left(A(F_{\rho})\right)$$
where $F_g^{out}$ represents the output feature of this branch, $A(\cdot)$ denotes the activation function, $U(\cdot)$ denotes upsampling, and $\odot$ denotes element-wise multiplication.
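Putting the last three equations together, the gating step of the NFEB can be sketched as below. The upsampling mode (nearest neighbour here), the activation (GELU), and the function names are assumptions; `f_ga` stands for the low-resolution ADSC output.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of GELU (assumed activation A)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def upsample_nearest(x, out_hw):
    """U(.): nearest-neighbour upsampling of (C, h, w) to (C, H, W)."""
    _, h, w = x.shape
    H, W = out_hw
    rows = (np.arange(H) * h) // H
    cols = (np.arange(W) * w) // W
    return x[:, rows[:, None], cols[None, :]]


def nfeb_output(f_g, f_ga, w_fuse):
    """F_g^out = F_g (element-wise *) U(A(Conv1x1(F_ga + sigma^2(F_g))))."""
    var = f_g.var(axis=(1, 2), keepdims=True)                  # sigma^2(F_g), shape (C, 1, 1)
    f_rho = np.einsum("oc,chw->ohw", w_fuse, f_ga + var)       # F_rho at low resolution
    return f_g * upsample_nearest(gelu(f_rho), f_g.shape[1:])  # gate the full-resolution input
```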

3.2.2. FLFB

Local detail information is crucial for high-frequency reconstruction. Methods that rely solely on nonlocal modeling often fail to fully exploit fine-grained structures in images. To address this issue, we construct the FLFB to effectively extract local features from images. As shown in Figure 3a, within the FLFB we design a partial convolution channel mixer (PCCM), inspired by CCM [7], to enhance local information modeling. Specifically, we improve the CCM by incorporating the advantages of partial convolution (PConv) to model local features more effectively. On the one hand, the CCM improves parameter utilization efficiency and enhances local feature representation. On the other hand, PConv reduces redundant computation during local feature extraction. By combining these two components, PCCM not only reduces computational cost but also better preserves image details and edge information, thereby improving overall reconstruction quality and visual performance. The structure of PCCM is shown in Figure 3d, and the architecture of PConv is illustrated in Figure 4.
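For reference, partial convolution as introduced in FasterNet applies a regular convolution to only a subset of the channels and passes the rest through unchanged, which is where its savings come from. The sketch below assumes a partial ratio of 1/4 (an assumption; the ratio is not given here) and shows the resulting reduction in multiply-accumulate (MAC) operations. It does not reproduce the internal arrangement of PCCM in Figure 3d, and the function names are hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """Regular 3x3 convolution, stride 1, zero padding 1. x: (C_in, H, W); w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def pconv(x, w, ratio=0.25):
    """Partial convolution: 3x3 conv on the first C_p = ratio * C channels, identity on the rest."""
    cp = int(x.shape[0] * ratio)
    assert w.shape == (cp, cp, 3, 3), "PConv keeps the channel count of the convolved slice"
    return np.concatenate([conv3x3(x[:cp], w), x[cp:]], axis=0)


def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulate count of a k x k convolution with 'same' output size."""
    return h * w * k * k * c_in * c_out
```

With C = 36 (the standard DFCFNet width) and a ratio of 1/4, only 9 channels are convolved, so this 3 × 3 stage costs (9/36)² = 1/16 of a full convolution.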
In the FLFB branch, the input features are first normalized to ensure more stable training. After passing through an activation function, the features are then fed into the PCCM module to extract fine-grained local details. The process can be formulated as follows:
$$F_{lp} = \mathrm{PCCM}\left(A\left(\mathrm{LN}(F_l)\right)\right)$$
where $F_{lp}$ represents the output feature of PCCM, $F_l$ represents the input of FLFB, $\mathrm{LN}(\cdot)$ represents layer normalization, $A(\cdot)$ represents the activation function, and $\mathrm{PCCM}(\cdot)$ represents the partial convolution channel mixer.
To fully exploit image information, a residual connection is introduced after the output of PCCM to enhance information propagation, accelerate convergence, and strengthen local detail representation. A 1 × 1 convolution then refines the local features to produce the output of this branch, $F_l^{out}$:
$$F_l^{out} = \mathrm{Conv}_{1\times1}\left(F_l + F_{lp}\right)$$
where $F_{lp}$ represents the output feature of PCCM, $F_l$ represents the input of FLFB, and $\mathrm{Conv}_{1\times1}(\cdot)$ denotes a 1 × 1 convolutional layer.
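The two FLFB equations can be sketched as follows. `LN` is taken to be a layer normalization over the channel dimension at each pixel (without learnable scale and shift), the activation is assumed to be GELU, and PCCM is passed in as a callable because its internal layout appears only in Figure 3d; the function names are hypothetical.

```python
import numpy as np


def layer_norm_channels(x, eps=1e-6):
    """LN(.): normalize across channels at every pixel of a (C, H, W) array."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    """Tanh approximation of GELU (assumed activation A)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def flfb(f_l, pccm, w_out):
    """F_lp = PCCM(A(LN(F_l))); F_l^out = Conv1x1(F_l + F_lp)."""
    f_lp = pccm(gelu(layer_norm_channels(f_l)))
    return np.einsum("oc,chw->ohw", w_out, f_l + f_lp)   # residual, then 1x1 refinement
```

If the mixer returns zeros and `w_out` is the identity, the branch reduces to its residual path and returns $F_l$ unchanged.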
Finally, the outputs of NFEB and FLFB are combined to form the output $F_a$ of DBFA:
$$F_a = \mathrm{Conv}_{1\times1}\left(F_g^{out} + F_l^{out}\right)$$
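Combining the split equation at the start of this section with the fusion step above, the DBFA wiring reduces to a few lines. The two branches are passed in as callables so that any NFEB and FLFB implementation can be plugged in; the function name and the (C, H, W) layout are assumptions.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution on (C_in, H, W) with weights (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def dbfa(f_in, w_expand, nfeb, flfb, w_fuse):
    """DBFA data flow. w_expand: (2C, C) doubles the channels; w_fuse: (C, C)."""
    f_g, f_l = np.split(conv1x1(f_in, w_expand), 2, axis=0)  # {F_g, F_l} = Split(Conv1x1(F_in))
    return conv1x1(nfeb(f_g) + flfb(f_l), w_fuse)            # F_a = Conv1x1(F_g^out + F_l^out)
```

With identity branches, `w_expand` stacking two identity matrices, and an identity `w_fuse`, the module returns $2F_{in}$, a quick check of the split-and-sum plumbing.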

3.3. Efficient Feed-Forward Network Model

When performing fully connected mapping, traditional feed-forward networks (FFNs) apply the same transformation to all channels, ignoring inter-channel correlations and differences in feature importance, which can easily lead to feature redundancy and information loss. In addition, in high-dimensional feature spaces, FFNs rely on point-wise fully connected operations, which are computationally intensive and lack local perception; this limits feature expressiveness and may lead to overfitting. To address these issues, we draw on the idea of MBConv [44] and use its depthwise separable convolution and channel attention mechanism to reduce computational overhead while enhancing the flexibility and expressiveness of feature extraction.
We therefore propose the EFFN module, shown in Figure 3b, to fully mine and utilize valuable feature information, thereby improving the generalization ability and computational efficiency of the model. The module first uses a 1 × 1 convolution to expand the channels and then uses channel splitting to separate a part of C/2 channels for enhancing local information; a 3 × 3 convolution extracts local information from this part and captures the relationships among adjacent pixels, while the other part retains the initial image information. Finally, a 1 × 1 convolution integrates the channels and captures their feature relationships. Throughout this process, activation functions apply nonlinear transformations that help the model learn complex feature patterns, improving the network's expressiveness and generalization ability. The first step can be expressed as:
$$F_a^{in} = A\left(\mathrm{Conv}_{1\times1}(F_a)\right)$$
As shown in Figure 3b, $F_a$ denotes the input of EFFN, and $F_a^{in}$ represents the output feature after the 1 × 1 convolution and activation function. $\mathrm{Conv}_{1\times1}(\cdot)$ denotes a 1 × 1 convolutional layer, and $A(\cdot)$ represents the activation function.
$$F_{al}^{out} = A\left(\mathrm{Conv}_{1\times1}\left(\mathrm{Conv}_{3\times3}\left(\mathrm{Split}(F_a^{in})\right)\right)\right)$$
where $F_{al}^{out}$ denotes the output after enhanced local information processing, $\mathrm{Conv}_{3\times3}(\cdot)$ denotes a 3 × 3 convolution, $\mathrm{Split}(\cdot)$ denotes the channel splitting operation, $F_a \in \mathbb{R}^{H \times W \times C}$, and $F_a^{in} \in \mathbb{R}^{H \times W \times C}$. While refining local features, to ensure that nonlocal information is not lost, we concatenate $F_{al}^{out}$ with $F_{a2}$ and feed the result into a 1 × 1 convolution for further feature blending, which also reduces the channels back to the original dimension. A residual connection is included in this step to preserve the detailed information of the output features and to facilitate information flow:
$$\{F_{a1}, F_{a2}\} = \mathrm{Split}(F_a^{in})$$
$$F_{out} = F_a + \mathrm{Conv}_{1\times1}\left(C(F_{a2}, F_{al}^{out})\right)$$
where $F_{a1}$ and $F_{a2}$ denote the two outputs obtained by channel splitting of $F_a^{in}$, $F_{out}$ denotes the output of the EFFN module, and $C(\cdot)$ denotes channel-wise concatenation.
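A NumPy sketch of the EFFN equations follows. The hidden width after the first 1 × 1 convolution (2C in the check), the GELU activation, and the function names are assumptions. The split halves the expanded features: the first half goes through the local 3 × 3 path, and the second half is carried through unchanged.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution on (C_in, H, W) with weights (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding 1. w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def gelu(x):
    """Tanh approximation of GELU (assumed activation A)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def effn(f_a, w_in, w_3x3, w_mid, w_out):
    """w_in: (Ch, C); w_3x3: (Ch/2, Ch/2, 3, 3); w_mid: (Ch/2, Ch/2); w_out: (C, Ch)."""
    f_ain = gelu(conv1x1(f_a, w_in))                       # F_a^in = A(Conv1x1(F_a))
    f_a1, f_a2 = np.split(f_ain, 2, axis=0)                # {F_a1, F_a2} = Split(F_a^in)
    f_al_out = gelu(conv1x1(conv3x3(f_a1, w_3x3), w_mid))  # local path on F_a1
    fused = np.concatenate([f_a2, f_al_out], axis=0)       # C(F_a2, F_al^out)
    return f_a + conv1x1(fused, w_out)                     # residual + 1x1 back to C channels
```

Setting `w_out` to zeros leaves only the residual path, so the output equals $F_a$.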

3.4. Feature Aggregation Module

In summary, the proposed DBFA and EFFN are combined into a feature aggregation module (FAM), which can be expressed as:
$$F_f = \mathrm{EFFN}\left(\mathrm{DBFA}(F_{in}) + F_{in}\right)$$
where $F_{in}$ denotes the input of the FAM, and $F_f$ denotes the output of the FAM. DFCFNet is trained with the same loss function as SAFMN [7]. To intuitively analyze the network’s feature extraction capability at different stages, we visualize the feature maps of key layers in Figure 5. Starting from the LR input, the features are first processed by a convolution layer and then progressively refined through DBFA, EFFN, and multiple FAMs, ultimately producing the SR output. The feature maps at different stages illustrate how the network captures both local and nonlocal information, and the color variations reflect feature responses at different levels, further validating the network's effectiveness in extracting textures, edges, and global details.

4. Experimental Results and Analyses

4.1. Primary Task: RSISR

4.1.1. Datasets

We used three remote sensing datasets: UCMerced [45], AID [46], and RSSCN7 [47]. The UCMerced dataset contains images from 21 remote sensing categories, with 100 images of 256 × 256 pixels per category. To maintain consistency with earlier studies [48,49], we used the same dataset subset, and 10% of the data was held out during training to validate the model's performance. The AID dataset contains 30 categories of remote sensing images representing various scenes, all with a resolution of 600 × 600 pixels. We randomly split this dataset into training and test sets; five images were extracted from each category to validate the model, and the remaining images served as the test set. The RSSCN7 dataset comprises 2800 images distributed among seven classes, each with a resolution of 400 × 400 pixels; we also used it for testing to evaluate the generalization ability of the model.

4.1.2. Metrics

To be consistent with prior studies, we calculated the PSNR and SSIM of each image to evaluate reconstruction quality; higher PSNR and SSIM values indicate better reconstruction. To compare model performance in more detail, we also report FLOPs, the number of parameters, and inference time; smaller FLOPs and parameter counts indicate lower computational complexity and a more compact model. In addition, we computed the SAM [50] and SCC [51] of the remote sensing images from a spectral perspective. SAM measures the similarity between two spectra by computing the angle between their spectral vectors; a smaller angle indicates that spectral information is better preserved during reconstruction, which corresponds to better visual quality. SCC assesses the spatial correlation between image pixels and quantifies how closely the spatial structure of the reconstructed image resembles that of the original image. A higher SCC value indicates that the reconstructed image better matches the original in spatial organization, whereas a lower SCC value indicates that more details are lost or significant distortions are introduced during reconstruction.
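The two per-image metrics with simple closed forms, PSNR and SAM, can be computed as below. This is a sketch under stated assumptions: images are (H, W, bands) arrays, PSNR uses a peak value of 255 for 8-bit data, and SAM is averaged over pixels and reported in radians. Whether PSNR is evaluated on RGB or on the Y channel, and any border cropping, are not specified in this section and would change the numbers; SSIM and SCC are omitted.

```python
import numpy as np


def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))


def sam(sr, hr, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel spectral vectors; lower is better."""
    a = sr.reshape(-1, sr.shape[-1]).astype(np.float64)
    b = hr.reshape(-1, hr.shape[-1]).astype(np.float64)
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding outside [-1, 1]
```

SAM is invariant to a per-pixel brightness scaling: doubling every pixel of the reference gives an angle of (numerically) zero even though the PSNR drops.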

4.1.3. Implementation Details

We used scaling factors of ×2, ×3, and ×4 in training, in line with several of the RSISR techniques examined. For data augmentation, we rotated and horizontally flipped the images, as in previous experiments. The model uses a mean squared error (MSE) loss to minimize pixel-level discrepancies and keep the overall intensity distribution of the reconstructed image consistent with that of the original. An FFT-based frequency loss simultaneously constrains the spectral information, which improves the texture and structure of the image and helps retain high-frequency details. The model was optimized with Adam, with an initial learning rate of $1 \times 10^{-3}$ and 1000 k iterations. All experiments were conducted using the PyTorch 2.1.0 framework on an NVIDIA GeForce RTX 4090 GPU. Two DFCFNet versions with different levels of complexity were trained: the large version, DFCFNet-S, has 10 FAMs and 48 channels, whereas the standard version, DFCFNet, has 8 FAMs and 36 channels. Both versions are included in the comparison experiments.
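The training objective described here can be sketched as an MSE term plus a weighted FFT-based frequency term. The exact form of the frequency loss and its weight are not given in this section, so this sketch assumes a mean absolute difference between the real and imaginary parts of the 2-D FFTs and a weight `lam = 0.05`; both are hypothetical choices.

```python
import numpy as np


def mse_loss(sr, hr):
    """Pixel-level mean squared error."""
    return float(np.mean((sr - hr) ** 2))


def fft_loss(sr, hr):
    """Mean absolute difference of the real and imaginary parts of the 2-D FFTs (last two axes)."""
    d = np.fft.fft2(sr, axes=(-2, -1)) - np.fft.fft2(hr, axes=(-2, -1))
    return float(np.mean(np.abs(np.stack([d.real, d.imag]))))


def total_loss(sr, hr, lam=0.05):
    """MSE + lam * frequency loss; lam is a hypothetical weight."""
    return mse_loss(sr, hr) + lam * fft_loss(sr, hr)
```

For a constant error of 1 on a 4 × 4 image, the MSE term is 1 and the whole frequency difference sits in the DC coefficient (value 16), giving a frequency term of 16/32 = 0.5 and a total of 1.025.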

4.1.4. Quantitative Results

We compared DFCFNet with other leading RSISR methods on the UCMerced and AID datasets, as shown in Table 1. DFCFNet obtained the best results compared with CNN-based methods (i.e., SRCNN [5], DCM [52], LGCNet [53], HSENet [54], SRDD [55], FENet [56], and VDSR [20]) and with Transformer-based methods (i.e., TransENet [57] and OmniSR [58]). Table 1 compares the proposed model with these state-of-the-art methods in terms of PSNR, SSIM, SCC, and SAM. Our model outperformed the other methods on all evaluation metrics, demonstrating its superior reconstruction capability. We took the strong-performing OmniSR as a representative baseline. On the UCMerced dataset, DFCFNet achieved average improvements of 0.56 dB in PSNR and 0.012 in SSIM over OmniSR. For the spectral metrics, SCC and SAM improved by 0.0305 and 0.004 on average. On the AID dataset, the gains were 0.88 dB in PSNR and 0.021 in SSIM, while SCC and SAM improved by 0.054 and 0.007, respectively. These results show that the proposed model improves both reconstruction accuracy and structural preservation. The gains in SCC and SAM further indicate better spectral consistency.
To examine the model's performance more closely, we also evaluated it separately on each of the 30 scene categories. The results are listed in Table 2. Compared with HSENet, the second-best method, DFCFNet improved the PSNR by 0.78 dB for the farmland category and by 0.39 dB for the square category.

4.1.5. Qualitative Results

Figure 6 shows the visual results of different methods. Most methods produced obvious artifacts and blur. The images reconstructed by DFCFNet are sharper than those of the other methods and contain more realistic and complete contours. We attribute this mainly to the FLFB and NFEB in the DBFA. Working together, these branches extract sufficient features from the image, which yields reconstructions with better visual quality.

4.1.6. Inference Speed and Network Complexity

As shown in Table 3, OmniSR already has fewer parameters than previous approaches. Even so, DFCFNet delivered a 0.39 dB higher PSNR than OmniSR while using almost four times fewer parameters. We attribute this to the proposed PCCM, which combines PConv and a CCM. PConv reduces redundant computation while still extracting feature information effectively. The CCM further enhances feature representation and stabilizes training. These results indicate that the PCCM effectively reduces network complexity. DFCFNet surpasses the other networks in inference speed and network complexity and also achieves better reconstruction quality. We further carried out several ablation experiments on the PCCM, and Figure 7 provides additional evidence of the module's contribution to the network.
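The saving offered by PConv can be checked with simple arithmetic. PConv, as popularized by FasterNet, applies a regular convolution to only a subset c_p of the channels and passes the remaining channels through unchanged. Its cost therefore scales with c_p² rather than c². The sketch below assumes a 3 × 3 kernel, a 64 × 64 feature map, and the 36-channel width of DFCFNet. It also assumes a partial ratio of 1/4, which is FasterNet's default. The ratio actually used inside PCCM is not stated in this section, so treat it as hypothetical.

```python
def conv_cost(h, w, k, c_in, c_out):
    """Weights and multiply-accumulate operations (MACs) of a k x k conv, bias ignored."""
    params = k * k * c_in * c_out
    macs = h * w * params  # one MAC per weight per output position (stride 1, 'same' padding)
    return params, macs

def pconv_cost(h, w, k, channels, ratio):
    """Partial convolution: convolve only c_p = channels * ratio channels, pass the rest through."""
    c_p = int(channels * ratio)
    return conv_cost(h, w, k, c_p, c_p)  # untouched channels cost no weights and no MACs

full = conv_cost(64, 64, 3, 36, 36)
partial = pconv_cost(64, 64, 3, 36, 0.25)
print(full, partial, partial[0] / full[0])  # (11664, 47775744) (729, 2985984) 0.0625
```

When a quarter of the channels are convolved, both weights and MACs drop to 1/16 of a full 3 × 3 convolution. Because the untouched channels receive no spatial mixing, a channel-mixing stage such as the CCM is presumably needed afterwards.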

4.2. Extended Task: Natural Image SR

4.2.1. Datasets

We used popular benchmark datasets (Set14 [59], B100 [60], Urban100 [61], and Manga109 [62]) for testing. For training, we used DIV2K [63], the most widely used natural image dataset, which contains 800 HR images covering a wide variety of scenes. The LR images were generated by bicubic downsampling in MATLAB R2022b, and the methods were compared at ×2, ×3, and ×4.
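The interpolation-based downsampling used to create LR images is conventionally bicubic, and it can be approximated as below. This is a simplified sketch modeled on the behavior of MATLAB's `imresize` with the bicubic kernel and antialiasing. It uses the Keys kernel with a = −0.5, widens the kernel by the scale factor, and pads borders symmetrically. The filter is applied separably, first along rows and then along columns. The final rounding to 8-bit integers is omitted. The output should therefore be close to MATLAB's, but it is not guaranteed to be bit-identical.

```python
import math

def cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is the value MATLAB uses for 'bicubic'."""
    ax = abs(x)
    if ax <= 1:
        return (a + 2) * ax ** 3 - (a + 3) * ax ** 2 + 1
    if ax < 2:
        return a * ax ** 3 - 5 * a * ax ** 2 + 8 * a * ax - 4 * a
    return 0.0

def bicubic_downsample_1d(signal, factor):
    """Antialiased bicubic downsampling of a 1-D sequence by an integer factor."""
    n_in = len(signal)
    scale = 1.0 / factor
    width = 4.0 / scale  # kernel support widened for antialiasing
    mirror = list(range(1, n_in + 1)) + list(range(n_in, 0, -1))  # symmetric padding
    out = []
    for i in range(1, n_in // factor + 1):  # 1-based output index
        u = i / scale + 0.5 * (1 - 1 / scale)  # output sample center in input coordinates
        left = math.floor(u - width / 2)
        weights, values = [], []
        for j in range(left, left + math.ceil(width) + 2):
            weights.append(scale * cubic(scale * (u - j)))
            values.append(signal[mirror[(j - 1) % len(mirror)] - 1])
        total = sum(weights)  # normalize so the weights sum to one
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out

def bicubic_downsample_2d(img, factor):
    """Apply the 1-D filter along rows, then along columns."""
    rows = [bicubic_downsample_1d(r, factor) for r in img]
    cols = [bicubic_downsample_1d(list(c), factor) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Because the kernel is symmetric and the weights are normalized, constant regions are preserved exactly. Linear ramps away from the borders map to their values at the output sample centers.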

4.2.2. Metrics

For consistency with other studies, we evaluate SR quality with PSNR and SSIM. Evaluation on natural images is typically performed on the Y channel, so we converted all SR results to the YCbCr color space and computed the metrics on the Y channel only. In addition, we compared the network complexity of each method.
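Y-channel evaluation converts each RGB pixel to luma before computing PSNR. The sketch assumes the ITU-R BT.601 coefficients used by MATLAB's `rgb2ycbcr` for 8-bit images, which give Y in [16, 235], and a peak value of 255. Many SR toolkits also crop a border of `scale` pixels before measuring. The text does not say whether this was done, so the sketch omits it.

```python
import math

def rgb_to_y(rgb):
    """BT.601 luma for 8-bit R, G, B values in [0, 255]; result lies in [16, 235]."""
    r, g, b = (c / 255.0 for c in rgb)
    return 16.0 + 65.481 * r + 128.553 * g + 24.966 * b

def psnr_y(sr, hr):
    """PSNR in dB on the Y channel; sr and hr are H x W lists of (R, G, B) tuples."""
    se = [(rgb_to_y(p) - rgb_to_y(q)) ** 2
          for rs, rh in zip(sr, hr) for p, q in zip(rs, rh)]
    mse = sum(se) / len(se)
    return float("inf") if mse == 0 else 10 * math.log10(255.0 ** 2 / mse)
```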

4.2.3. Implementation Details

Following previous methods, we used rotation and horizontal flipping for data augmentation. We used Adam for optimization, with an initial learning rate of 1 × 10⁻³, a batch size of 16, and 1000k training iterations in total. All experiments were conducted in PyTorch on an NVIDIA GeForce RTX 3090 Ti GPU.
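Paired augmentation must apply the identical transform to the LR input and its HR target. The sketch below is minimal and assumes rotations by multiples of 90° and a 50% horizontal-flip probability, since the text gives neither the angles nor the probabilities. Images are plain lists of rows, so the example runs without dependencies.

```python
import random

def hflip(img):
    """Mirror an H x W image (list of rows) left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate an H x W image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment_pair(lr, hr, rng=random):
    """Apply the same random flip and rotation to an LR/HR training pair."""
    flip = rng.random() < 0.5  # assumed 50% flip probability
    turns = rng.randrange(4)   # assumed rotation by 0, 90, 180, or 270 degrees

    def transform(img):
        if flip:
            img = hflip(img)
        for _ in range(turns):
            img = rot90(img)
        return img

    return transform(lr), transform(hr)
```

Drawing the random choices once and reusing them in `transform` keeps each LR pixel aligned with its HR counterpart, which is what the pixel-wise loss relies on.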

4.2.4. Quantitative Results

We compared DFCFNet with other leading lightweight SISR methods (i.e., CARN [4], LMAN-S [48], IDN [39], IMDN [40], SMSR [64], FDIWN-M [20], RFDN [11], VLESR [65], GASSL-S [66], AMFFN [67], and FDSCSR-S [68]). Table 4 lists the comparative results. For the ×4 SR task, DFCFNet outperformed the second-best method, AMFFN, by an average of 0.11 dB in PSNR and 0.0016 in SSIM. This experiment confirms that our approach achieves a favorable balance between reconstruction accuracy and parameter count.

4.2.5. Qualitative Results

Figure 8 compares the visual results of several SR networks on natural image datasets at ×4. DFCFNet produces sharper contours and textures. This further supports the effectiveness of our approach and shows that DFCFNet performs well in both natural image SR and RSISR.

5. Ablation Study

We conducted extensive ablation experiments to observe more directly the influence and effectiveness of each module in the network. All ablation experiments used the ×4 DFCFNet model, trained and evaluated on the AID and UCMerced datasets.

5.1. Effectiveness of DBFA

The proposed DBFA module contains two parallel branches. The NFEB explores nonlocal information and the FLFB extracts local information, and together they significantly improve accuracy. To show the contribution of DBFA more clearly, we removed both the NFEB and the FLFB and compared the resulting model with DFCFNet. As Table 5 shows, PSNR and SSIM decreased by 0.3 dB and 0.0099 on the AID dataset and by 0.43 dB and 0.0147 on the UCMerced dataset. These results illustrate the importance of DBFA to DFCFNet. To visualize the effect of DBFA, we further present Local Attribution Maps (LAMs) and the corresponding diffusion index (DI) in Figure 9. When only the NFEB was introduced, the distribution range of the red points expanded slightly, but the response intensity remained relatively weak. This indicates that the model still mainly focuses on local information and that its feature representation capability has not been sufficiently enhanced. When only the FLFB was incorporated, the response intensity of the red points increased noticeably in local regions, but their spatial coverage remained limited. This suggests that the model improves local feature representation but still captures global information insufficiently. In contrast, with the full DBFA, the red points exhibited stronger responses in local regions and also extended over a significantly wider spatial range. This demonstrates that DBFA enhances feature interaction and information propagation. The model can then capture long-range dependencies while preserving local detail, which improves the completeness and consistency of the overall feature representation.
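The diffusion index used with LAM summarizes how widely the attribution spreads over the input. In the LAM formulation, DI = (1 − G) × 100, where G is the Gini index of the attribution magnitudes. An evenly spread map therefore gives a high DI, and a map concentrated on a few pixels gives a low one. The sketch below computes DI from a given attribution map, and the function names are illustrative. Producing the map itself, through path-integrated gradients of the SR network, is out of scope.

```python
def gini(values):
    """Gini index of attribution magnitudes: 0 = perfectly even, near 1 = one dominant pixel."""
    v = sorted(abs(x) for x in values)
    n, total = len(v), sum(v)
    if total == 0:
        raise ValueError("attribution map is all zeros")
    # standard closed form over ascending-sorted values with 1-based ranks
    return sum((2 * i - n - 1) * x for i, x in enumerate(v, start=1)) / (n * total)

def diffusion_index(attribution_map):
    """DI = (1 - Gini) * 100; higher means more input pixels contribute."""
    flat = [x for row in attribution_map for x in row]
    return (1 - gini(flat)) * 100
```

A map in which every pixel contributes equally gives DI = 100. A map in which a single pixel out of n carries all attribution gives DI = 100 / n.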
Because DBFA consists of a branch focused on local features (FLFB) and a nonlocal feature exploration branch (NFEB), we also ablated each branch individually to demonstrate its contribution. Without the NFEB, PSNR and SSIM decreased by 0.09 dB and 0.0028 on the AID dataset and by 0.13 dB and 0.0045 on the UCMerced dataset. Without the FLFB, the PSNR likewise fell on both the AID and UCMerced datasets. To examine the role of the FLFB in DBFA further, we also replaced the PCCM in the FLFB with other modules, as detailed in Section 5.2. To visualize the complementary nature of the FLFB and NFEB, we used the power spectral density (PSD) of the features, as shown in Figure 10. Compared with that of $F_{in}$, the PSD of $F_{ga}$, the output of the FLFB, is distributed in the peripheral area. This indicates that high-frequency features are emphasized, which strengthens the representation of image details such as textures and edges. The PSD of $F_{lp}$, the output of the NFEB, is concentrated in the central area. This indicates that low-frequency features dominate and represent global aspects such as background information.
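The center-versus-periphery reading of a PSD plot can be quantified as the share of spectral power near zero frequency. This sketch is an illustration, not the visualization code behind Figure 10. It measures the distance of each frequency bin from DC with wraparound, which is equivalent to inspecting a centered (fftshift-ed) spectrum. The cutoff `radius` is hypothetical. A naive DFT keeps the example dependency-free for small maps.

```python
import cmath
import math

def dft2(img):
    """Naive 2-D DFT of a small single-channel feature map (list of rows)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]

def psd(img):
    """Power spectral density: squared DFT magnitude normalized by the number of pixels."""
    n = len(img) * len(img[0])
    return [[abs(c) ** 2 / n for c in row] for row in dft2(img)]

def low_freq_ratio(img, radius):
    """Fraction of total power within `radius` bins of DC (the center after fftshift)."""
    p = psd(img)
    h, w = len(p), len(p[0])
    low = total = 0.0
    for u in range(h):
        for v in range(w):
            r = math.hypot(min(u, h - u), min(v, w - v))  # wrap-around distance from DC
            total += p[u][v]
            if r <= radius:
                low += p[u][v]
    return low / total if total else 0.0
```

A smooth map, such as a background region, yields a ratio near 1, with its power in the center. A map dominated by edges or fine texture pushes power to the periphery, and the ratio drops toward 0.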

5.2. Impact of PCCM

To further verify the role of the FLFB in local information exploration, we removed the PCCM from the FLFB and compared the resulting model with DFCFNet, as shown in Figure 7. The metric curves show that DFCFNet achieves clearly higher PSNR and SSIM than the ‘Not-PCCM’ variant and converges better. To explore the effectiveness of the PCCM further, we replaced it with PConv alone and with the CCM alone and tested each variant on UCMerced. In both cases, the two metrics ended up lower than those of DFCFNet. To evaluate the internal components of the PCCM more directly, we also compared “Pure PConv”, “Pure CCM”, and “CCM + PConv” on the AID and UCMerced datasets. The quantitative results are presented in Table 6. With PConv alone, the parameter count and FLOPs were lower, but PSNR and SSIM on both datasets were significantly lower than those of the combined configuration. With the CCM alone, the parameter count and PSNR were comparable to those of the combined configuration, but SSIM decreased by 0.0006 on AID and by 0.0009 on UCMerced.
After PConv and the CCM were integrated, both PSNR and SSIM improved on the two datasets. The increase in parameter count and FLOPs relative to the CCM alone was marginal. These results further demonstrate that incorporating PConv enhances reconstruction performance while keeping the computational cost low, achieving a favorable trade-off between performance and complexity.

5.3. Effectiveness of the EFFN

The EFFN was introduced to further refine features. To confirm its impact on the model, we performed experiments in which the EFFN was either removed or replaced. As Table 5 shows, all metrics decreased when the EFFN was removed. As shown in Figure 7, when the EFFN was replaced with a regular FFN, all metrics on the UCMerced dataset were also lower. To further illustrate the roles of DBFA and the EFFN in image restoration, we compared their LAMs, as shown in Figure 9. In a LAM, the red regions mark the input pixels that contribute most to reconstructing the patch inside the rectangular box. We also computed the DI values, where a higher DI indicates that a broader range of pixels contributes. The results show that DFCFNet exploits information from a more extensive region, which enhances the quality of image reconstruction.

5.4. Validity of Global Variance

The NFEB uses a global variance operation to explore nonlocal information more fully, and Figure 7 shows the effect of this operation directly. We also conducted a validation study of the proposed global variance strategy by comparing it with self-attention (SA). Keeping all other components unchanged, we replaced the global variance strategy with SA. The results are reported in Table 7. Although SA can model long-range dependencies, the global variance strategy achieved better performance, with average improvements of 0.13 dB in PSNR and 0.0046 in SSIM across the two public datasets.
Table 7 also compares model complexity. SA has a parameter count similar to that of the global variance strategy, but it incurs significantly higher computational cost in terms of FLOPs and average inference time. This further demonstrates that the global variance strategy achieves a better trade-off between performance and computational efficiency.
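The complexity gap between the two options follows from how each scales with the number of spatial positions N = H × W. Global variance needs one pass over each channel, which is O(C·N). Plain self-attention forms an N × N affinity matrix, which is O(N²·C). The sketch below computes per-channel spatial variance and rough operation counts. How the NFEB uses the variance downstream is not detailed in this section, and the SA projections are ignored. The counts are order-of-magnitude illustrations, not the measured values in Table 7.

```python
def channel_variance(feat):
    """Spatial variance of each channel of a C x H x W feature map given as nested lists."""
    out = []
    for channel in feat:
        vals = [v for row in channel for v in row]
        mean = sum(vals) / len(vals)
        out.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out

def variance_ops(c, h, w):
    """Rough count: a few operations per element (3 assumed here), i.e. O(C*H*W)."""
    return 3 * c * h * w

def self_attention_ops(c, h, w):
    """Rough MACs of plain self-attention: Q K^T plus attention-weighted V, O(N^2 * C)."""
    n = h * w
    return 2 * n * n * c
```

For a 36-channel 64 × 64 map, the attention estimate is already about 2700 times the variance estimate. The ratio grows linearly with N.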

5.5. The Effects of FAM and Channel Number on the Network

The number of channels and the number of FAMs can affect network performance to different degrees, so we experimented with several settings of each. We trained on the DIV2K dataset and evaluated on the Urban100 and Manga109 datasets. As shown in Table 8, the configuration with 48 channels and 14 FAMs achieved better results. However, it has about three times as many parameters as the configuration with 36 channels and 8 FAMs and consumes more memory.
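The roughly threefold parameter increase agrees with a back-of-the-envelope estimate. Assume the parameters are dominated by convolution weights inside the FAMs. Each such layer then scales with the square of the channel width, and the total scales linearly with the number of FAMs. Head, tail, and upsampling layers are ignored, so this is an approximation rather than the exact count in Table 8.

```python
def relative_params(channels, fams, base_channels=36, base_fams=8):
    """Parameter ratio vs. the 36-channel, 8-FAM model, assuming params ~ channels^2 * FAMs."""
    return (channels / base_channels) ** 2 * (fams / base_fams)

print(round(relative_params(48, 14), 2))  # 3.11
```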

6. Discussion

The proposed method is capable of preserving richer detail information and achieves satisfactory results. Furthermore, to evaluate the generalization ability of the model, experiments were conducted on both remote sensing datasets and natural image datasets. The experimental results demonstrate that the proposed method not only achieves competitive performance but also exhibits strong generalization capability. However, this study still has several limitations. First, the current model has been primarily validated under relatively limited spectral settings, and its generalization ability to more complex spectral data, such as multispectral or hyperspectral scenarios, remains to be further explored. Second, real-world remote sensing imaging often involves multiple degradation factors, including complex noise, blur, and compression artifacts. However, the proposed method has been mainly trained under relatively idealized or fixed degradation assumptions, and thus its robustness under more challenging degradation conditions still has room for improvement. In addition, during practical deployment, the model performance may be constrained by hardware platforms and runtime environments. For instance, when deployed on platforms such as HiSilicon chips, if the input images contain severe compression noise, the super-resolution process may further amplify such noise, thereby increasing the difficulty of reconstruction.
Future work will focus on the following directions. First, we aim to extend the model to handle multispectral or higher-dimensional spectral data to improve its generalization capability. Second, incorporating unsupervised or self-supervised learning strategies may enhance the model’s robustness and adaptability to complex degradations. Finally, we plan to explore more efficient lightweight designs to improve deployment efficiency in real-world applications.

7. Conclusions

We propose DFCFNet, a lightweight and effective model for image SR. The model consists of a DBFA module and an EFFN module. The DBFA module contains two branches, the FLFB and the NFEB, which process features in parallel. In the FLFB, we develop a PCCM module to capture local detail information. The NFEB introduces global variance to explore nonlocal information. To keep the network lightweight, the two branches are combined with a simple and efficient fusion at the end. In addition, the EFFN makes better use of the local and nonlocal features from the DBFA module along the channel and spatial dimensions. To verify the generalization ability of the model, we also trained and evaluated it on natural images. Extensive experiments show that DFCFNet strikes a favorable balance among reconstruction performance, computational efficiency, and model size.

Author Contributions

Conceptualization, M.Z.; methodology, M.Z.; software, M.Z.; validation, M.Z.; formal analysis, M.Z.; investigation, X.C.; resources, M.Z., H.G. and W.Z.; data curation, J.P. and X.C.; writing—original draft preparation, M.Z.; writing—review and editing, W.Z., Q.W., M.Z., J.P. and X.C.; visualization, M.Z.; supervision, H.G. and W.Z.; project administration, H.G. and Q.W.; funding acquisition, H.G. and Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China General Program under Grant 62471389, and in part by Shaanxi Province Technological Innovation Guidance Special Project: Regional Science and Technology Innovation Center, Strategic Scientific and Technological Strength Category: 2024QY-SZX-26.

Data Availability Statement

The data for this article are presented in the article. The data and materials supporting the findings are available from the corresponding author upon reasonable request.

Acknowledgments

The authors sincerely thank all colleagues in the laboratory for their support and assistance during this study. The authors especially appreciate the editor for their meticulous work and professional guidance and extend their sincere gratitude to the anonymous reviewers for their constructive comments and valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HR: High-Resolution
LR: Low-Resolution
SR: Super-Resolution
CNN: Convolutional Neural Network
GAN: Generative Adversarial Network
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index
MSE: Mean Square Error
FLOPs: Floating Point Operations

References

  1. Zhao, Q.; Lyu, S.; Chen, L.; Liu, B.; Xu, T.B.; Cheng, G.; Feng, W. Learn by oneself: Exploiting weight-sharing potential in knowledge distillation guided ensemble network. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6661–6678. [Google Scholar] [CrossRef]
  2. López-Cifuentes, A.; Escudero-Viñolo, M.; Bescós, J.; San Miguel, J.C. Attention-based knowledge distillation in scene recognition: The impact of a DCT-driven loss. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4769–4783. [Google Scholar] [CrossRef]
  3. Zhang, L.; Lu, W.; Huang, Y.; Sun, X.; Zhang, H. Unpaired Remote Sensing Image Super-Resolution with Multi-Stage Aggregation Networks. Remote. Sens. 2021, 13, 3167. [Google Scholar] [CrossRef]
  4. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 252–268. [Google Scholar]
  5. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef]
  6. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar] [CrossRef]
  7. Sun, L.; Dong, J.; Tang, J.; Pan, J. Spatially-adaptive feature modulation for efficient image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 13190–13199. [Google Scholar] [CrossRef]
  8. Niu, B.; Wen, W.; Ren, W.; Zhang, X.; Yang, L.; Wang, S.; Zhang, K.; Cao, X.; Shen, H. Single image super-resolution via a holistic attention network. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 191–207. [Google Scholar] [CrossRef]
  9. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 649–667. [Google Scholar] [CrossRef]
  10. Li, F.; Bai, H.; Zhao, Y. FilterNet: Adaptive information filtering network for accurate and fast image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1511–1523. [Google Scholar] [CrossRef]
  11. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 41–55. [Google Scholar]
  12. Zheng, P.; Jiang, J.; Zhang, Y.; Zeng, C.; Qin, C.; Li, Z. CGC-net: A context-guided constrained network for remote-sensing image super resolution. Remote. Sens. 2023, 15, 3171. [Google Scholar] [CrossRef]
  13. Gao, X.; Zhang, L.; Mou, X. Single image super-resolution using dual-branch convolutional neural network. IEEE Access 2018, 7, 15767–15778. [Google Scholar] [CrossRef]
  14. Wang, Y.; Zhang, H.; Zeng, X.; Wang, B.; Li, W.; Ding, W. Binary Lightweight Neural Networks for Arbitrary Scale Super-Resolution of Remote Sensing Images. IEEE Trans. Geosci. Remote. Sens. 2025, 63, 1–16. [Google Scholar] [CrossRef]
  15. Wang, Y.; Shao, Z.; Lu, T.; Huang, X.; Wang, J.; Zhang, Z.; Zuo, X. Lightweight remote sensing super-resolution with multi-scale graph attention network. Pattern Recognit. 2025, 160, 111178. [Google Scholar] [CrossRef]
  16. Chen, R.; Zhang, Y. Learning dynamic generative attention for single image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8368–8382. [Google Scholar] [CrossRef]
  17. Liu, Y.; Jia, Q.; Fan, X.; Wang, S.; Ma, S.; Gao, W. Cross-SRN: Structure-preserving super-resolution network with cross convolution. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4927–4939. [Google Scholar] [CrossRef]
  18. Zuo, Y.; Xie, J.; Wang, H.; Fang, Y.; Liu, D.; Wen, W. Gradient-guided single image super-resolution based on joint trilateral feature filtering. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 505–520. [Google Scholar] [CrossRef]
  19. Liu, Z.; Feng, R.; Wang, L.; Han, W.; Zeng, T. Dual learning-based graph neural network for remote sensing image super-resolution. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  20. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1646–1654. [Google Scholar]
  21. Wang, C.; Zhang, X.; Yang, W.; Wang, G.; Li, X.; Wang, J.; Lu, B. MSWAGAN: Multi-spectral remote sensing image super resolution based on multi-scale window attention transformer. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  22. Yang, Y.; Zhao, H.; Huangfu, X.; Li, Z.; Wang, P. ViT-ISRGAN: A High-Quality Super-Resolution Reconstruction Method for Multi-Spectral Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2025, 18, 3973–3988. [Google Scholar] [CrossRef]
  23. Ma, J.; Zhang, L.; Zhang, J. SD-GAN: Saliency-discriminated GAN for remote sensing image superresolution. IEEE Geosci. Remote. Sens. Lett. 2019, 17, 1973–1977. [Google Scholar] [CrossRef]
  24. Wu, H.; Zhang, L.; Ma, J. Remote sensing image super-resolution via saliency-guided feedback GANs. IEEE Trans. Geosci. Remote. Sens. 2020, 60, 1–16. [Google Scholar] [CrossRef]
  25. Zhu, C.; Liu, Y.; Huang, S.; Wang, F. Taming a diffusion model to revitalize remote sensing image super-resolution. Remote. Sens. 2025, 17, 1348. [Google Scholar] [CrossRef]
  26. Wang, C.; Sun, W. Semantic guided large scale factor remote sensing image super-resolution with generative diffusion prior. ISPRS J. Photogramm. Remote. Sens. 2025, 220, 125–138. [Google Scholar] [CrossRef]
  27. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 10684–10695. [Google Scholar]
  28. Wang, J.; Yue, Z.; Zhou, S.; Chan, K.C.; Loy, C.C. Exploiting diffusion prior for real-world image super-resolution. Int. J. Comput. Vis. 2024, 132, 5929–5949. [Google Scholar] [CrossRef]
  29. Hu, Y.; Gao, X.; Li, J.; Huang, Y.; Wang, H. Single image super-resolution via cascaded multi-scale cross network. arXiv 2018, arXiv:1802.08808. [Google Scholar]
  30. Wang, H. MSRDN: A Super-Resolution Network for Human Body. In Proceedings of the 2024 3rd International Conference on Innovations and Development of Information Technologies and Robotics (IDITR), Hong Kong, China, 23–25 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 178–182. [Google Scholar] [CrossRef]
  31. Ou, B.; Shao, G.; Yang, B.; Fei, S. FocalSR: Revisiting image super-resolution transformers with fourier-transform cross attention layers for remote sensing image enhancement. Geomatica 2025, 77, 100042. [Google Scholar] [CrossRef]
  32. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 10012–10022. [Google Scholar]
  33. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 22367–22377. [Google Scholar] [CrossRef]
  34. Park, N.; Kim, S. How do vision transformers work? arXiv 2022, arXiv:2202.06709. [Google Scholar] [CrossRef]
  35. Vanyan, A.; Barseghyan, A.; Tamazyan, H.; Huroyan, V.; Khachatrian, H.; Danelljan, M. Analyzing local representations of self-supervised vision transformers. arXiv 2023, arXiv:2401.00463. [Google Scholar] [CrossRef]
  36. Shi, M.; Gao, Y.; Chen, L.; Liu, X. Dual-branch multiscale channel fusion unfolding network for optical remote sensing image super-resolution. IEEE Geosci. Remote. Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  37. Hou, J.; Si, Y.; Li, L. Image super-resolution reconstruction method based on global and local residual learning. In Proceedings of the 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC), Xiamen, China, 5–7 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 341–348. [Google Scholar]
  38. Li, H.; Deng, W.; Zhu, Q.; Guan, Q.; Luo, J. Local-global context-aware generative dual-region adversarial networks for remote sensing scene image super-resolution. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 5402114. [Google Scholar] [CrossRef]
  39. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 723–731. [Google Scholar]
  40. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; ACM: New York, NY, USA, 2019; pp. 2024–2032. [Google Scholar]
  41. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3147–3155. [Google Scholar]
  42. Kong, F.; Li, M.; Liu, S.; Liu, D.; He, J.; Bai, Y.; Chen, F.; Fu, L. Residual local feature network for efficient super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 766–776. [Google Scholar] [CrossRef]
  43. Gao, F.; Li, L.; Wang, J.; Sun, K.; Lv, M.; Jia, Z.; Ma, H. A lightweight feature distillation and enhancement network for super-resolution remote sensing images. Sensors 2023, 23, 3906. [Google Scholar] [CrossRef]
  44. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  45. Yang, Y.; Newsam, S. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification; ACM: New York, NY, USA, 2010; pp. 270–279. [Google Scholar]
  46. Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Zhang, L. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote. Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
  47. Zou, Q.; Ni, L.; Zhang, T.; Wang, Q. Deep learning based feature selection for remote sensing scene classification. IEEE Geosci. Remote. Sens. Lett. 2015, 12, 2321–2325. [Google Scholar] [CrossRef]
  48. Wan, J.; Yin, H.; Liu, Z.; Chong, A.; Liu, Y. Lightweight Image Super-Resolution by Multi-Scale Aggregation. IEEE Trans. Broadcast. 2021, 67, 372–382. [Google Scholar] [CrossRef]
  49. Hajian, A.; Aramvith, S. AERU-Net: Adaptive Edge Recovery and Attention U-Shaped Network for Remote Sensing Image Super-Resolution. IEEE Access 2025, 13, 59177–59197. [Google Scholar] [CrossRef]
  50. Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Proceedings of the JPL, Summaries of the Third Annual JPL Airborne Geoscience Workshop, 1 June 1992; AVIRIS Workshop; NASA: Washington, DC, USA, 1992; Volume 1.
  51. Zhou, J.; Civco, D.L.; Silander, J.A. A wavelet transform method to merge Landsat TM and SPOT panchromatic data. Int. J. Remote. Sens. 1998, 19, 743–757.
  52. Haut, J.M.; Paoletti, M.E.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.; Li, J. Remote Sensing Single-Image Superresolution Based on a Deep Compendium Model. IEEE Geosci. Remote. Sens. Lett. 2019, 16, 1432–1436.
  53. Lei, S.; Shi, Z.; Zou, Z. Super-Resolution for Remote Sensing Images via Local–Global Combined Network. IEEE Geosci. Remote. Sens. Lett. 2017, 14, 1243–1247.
  54. Lei, S.; Shi, Z. Hybrid-Scale Self-Similarity Exploitation for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5401410.
  55. Maeda, S. Image Super-Resolution with Deep Dictionary. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022.
  56. Wang, Z.; Li, L.; Xue, Y.; Jiang, C.; Wang, J.; Sun, K.; Ma, H. FeNet: Feature Enhancement Network for Lightweight Remote-Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5622112.
  57. Lei, S.; Shi, Z.; Mo, W. Transformer-Based Multistage Enhancement for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 11.
  58. Wang, H.; Chen, X.; Ni, B.; Liu, Y.; Liu, J. Omni aggregation networks for lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 22378–22387.
  59. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces; Springer: Berlin/Heidelberg, Germany, 2010; pp. 711–730.
  60. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916.
  61. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 5197–5206.
  62. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838.
  63. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. NTIRE 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 114–125.
  64. Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Lin, Z.; An, W.; Guo, Y. Exploring Sparsity in Image Super-Resolution for Efficient Inference. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4915–4924.
  65. Gao, D.; Zhou, D. A very lightweight and efficient image super-resolution network. Expert Syst. Appl. 2023, 213, 118898.
  66. Wang, H.; Zhang, Y.; Qin, C.; Van Gool, L.; Fu, Y. Global aligned structured sparsity learning for efficient image super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10974–10989.
  67. Wang, H.; Cheng, S.; Li, Y.; Du, A. Lightweight remote-sensing image super-resolution via attention-based multilevel feature fusion network. IEEE Trans. Geosci. Remote. Sens. 2023, 61, 2005715.
  68. Wang, Z.; Gao, G.; Li, J.; Yan, H.; Zheng, H.; Lu, H. Lightweight feature de-redundancy and self-calibration network for efficient image super-resolution. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 110.
  69. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution; IEEE: Piscataway, NJ, USA, 2016.
  70. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution; IEEE: Piscataway, NJ, USA, 2017.
  71. Haris, M.; Shakhnarovich, G.; Ukita, N. Deep back-projection networks for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1664–1673.
  72. Gu, J.; Dong, C. Interpreting super-resolution networks with local attribution maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 9199–9208.
Figure 1. Overall performance of the proposed method compared with other state-of-the-art methods on the UCMerced dataset for ×4 SR; the circle size represents the FLOPs of each model. The proposed DFCFNet achieves a better balance between reconstruction performance and computational efficiency.
Figure 2. Overall architecture of the proposed DFCFNet.
Figure 3. Structures of the key modules: (a) DBFA; (b) EFFN; (c) ADSC; (d) PCCM.
Figure 4. Structure of PConv (partial convolution).
Figure 5. Visualization of network feature maps. NFEB denotes the Non-Focal Exploration Branch, FLFB the Focused Local Feature Branch, DBFA the dual-branch feature aggregation, and FAM the feature aggregation module. Pseudo-color is used to highlight features in the feature maps.
Figure 6. Visual comparisons for ×4 SR on the RSSCN7 and UCMerced datasets.
Figure 7. Impact of different modules on the evaluation metrics: (a) effect of the different models on PSNR; (b) effect of the different models on SSIM. In each label, the module before '–' replaces the module after it. Evaluated for ×4 SR on the UCMerced dataset.
Figure 8. Visual comparisons for ×4 SR on the BSD100 and Urban100 datasets.
Figure 9. Comparative analysis of local attribution maps (LAMs) and diffusion indices (DIs) [72]. DFCFNet exploits richer feature information to reconstruct more accurate and better-structured images.
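Figure 9 relies on the diffusion index (DI) of [72]. As we understand [72], DI is derived from the Gini coefficient G of the absolute local attribution map as DI = (1 − G) × 100, so a larger DI means the reconstruction of the target patch draws on a wider region of the input. The NumPy sketch below implements only that formula on synthetic maps; computing the LAM itself (path-integrated gradients) is not shown, and the function names are our own.

```python
import numpy as np


def gini(values: np.ndarray) -> float:
    """Gini coefficient of |values|: 0 for a uniform map, (n-1)/n if one entry holds everything."""
    g = np.sort(np.abs(np.asarray(values, dtype=np.float64)).ravel())
    n = g.size
    total = g.sum()
    if total == 0:
        return 0.0
    # For ascending g (k = 1..n): sum_i sum_j |g_i - g_j| == 2 * sum_k (2k - n - 1) * g_k.
    k = np.arange(1, n + 1)
    pairwise_sum = 2.0 * np.sum((2 * k - n - 1) * g)
    # Gini denominator 2 * n^2 * mean(g) equals 2 * n * sum(g).
    return float(pairwise_sum / (2.0 * n * total))


def diffusion_index(lam_map: np.ndarray) -> float:
    """DI = (1 - Gini) * 100; larger values mean attribution spread over more input pixels."""
    return 100.0 * (1.0 - gini(lam_map))


# Usage on synthetic attribution maps: one concentrated, one spread evenly.
focused = np.zeros((16, 16))
focused[8, 8] = 1.0
spread = np.ones((16, 16))
print(diffusion_index(focused), diffusion_index(spread))
```

The sorted-sum form avoids building the n × n matrix of pairwise differences, which matters for full-resolution attribution maps.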
Figure 10. Power spectral density (PSD) analysis. FLFB activates more high-frequency information, whereas NFEB activates more low-frequency information.
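Figure 10 compares the branches in the frequency domain. A minimal NumPy sketch of one way to make such a comparison: take the 2-D FFT of a mean-removed feature map, average the power over rings of equal radius, and measure the share of power beyond a cutoff radius. The ring binning, the 0.5 cutoff, and the synthetic test maps are our assumptions, not settings reported in the paper.

```python
import numpy as np


def _radius_grid(h: int, w: int) -> np.ndarray:
    """Normalized (0..1) distance of each fftshift-ed bin from the zero-frequency center."""
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)
    return r / r.max()


def _power(feat: np.ndarray) -> np.ndarray:
    """Centered power spectrum of a mean-removed 2-D map."""
    return np.abs(np.fft.fftshift(np.fft.fft2(feat - feat.mean()))) ** 2


def radial_psd(feat: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Radially averaged power spectrum, ordered from low to high frequency."""
    power = _power(feat)
    bins = np.minimum((_radius_grid(*feat.shape) * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)


def high_freq_ratio(feat: np.ndarray, cutoff: float = 0.5) -> float:
    """Fraction of spectral power lying beyond the normalized radius `cutoff`."""
    power = _power(feat)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[_radius_grid(*feat.shape) > cutoff].sum() / total)


# Usage on synthetic maps: a slowly varying pattern versus a pixel-level checkerboard.
x = np.arange(64)
smooth = np.tile(np.cos(2 * np.pi * x / 64), (64, 1))
checker = np.indices((64, 64)).sum(axis=0) % 2 * 2.0 - 1.0
print(high_freq_ratio(smooth), high_freq_ratio(checker))
```

Applied to FLFB and NFEB outputs, a higher ratio would indicate a branch that carries more high-frequency content, which is the contrast Figure 10 describes.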
Table 1. Average evaluation metrics on the UCMerced and AID datasets. The best and second-best results are highlighted in red and blue.
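| Dataset | Scale | Metric | SRCNN [5] | VDSR [20] | DCM [52] | LGCNet [53] | HSENet [54] | TransENet [57] | SRDD [55] | FENet [56] | OmniSR [58] | DFCFNet-S | DFCFNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UCMerced | ×2 | PSNR | 33.04 | 33.95 | 34.14 | 33.54 | 34.32 | 34.05 | 34.25 | 34.14 | 34.33 | 34.58 | 34.69 |
| UCMerced | ×2 | SSIM | 0.9181 | 0.9281 | 0.9306 | 0.9242 | 0.9320 | 0.9294 | 0.9319 | 0.9304 | 0.9324 | 0.9355 | 0.9362 |
| UCMerced | ×2 | SCC | 0.5823 | 0.6228 | 0.6319 | 0.6051 | 0.6392 | 0.6275 | 0.6374 | 0.6332 | 0.6393 | 0.6519 | 0.6567 |
| UCMerced | ×2 | SAM | 0.0575 | 0.0515 | 0.0505 | 0.0542 | 0.0492 | 0.0511 | 0.0498 | 0.0507 | 0.0494 | 0.0472 | 0.0465 |
| UCMerced | ×3 | PSNR | 29.00 | 29.78 | 29.86 | 29.36 | 30.04 | 29.90 | 29.92 | 29.93 | 29.56 | 30.41 | 30.51 |
| UCMerced | ×3 | SSIM | 0.8142 | 0.8354 | 0.8393 | 0.8247 | 0.8433 | 0.8397 | 0.8411 | 0.8407 | 0.8445 | 0.8519 | 0.8542 |
| UCMerced | ×3 | SCC | 0.3444 | 0.3965 | 0.4025 | 0.3666 | 0.4131 | 0.3988 | 0.4085 | 0.4016 | 0.4013 | 0.4368 | 0.4439 |
| UCMerced | ×3 | SAM | 0.0901 | 0.0826 | 0.0820 | 0.0866 | 0.0805 | 0.0816 | 0.0815 | 0.0814 | 0.0827 | 0.0761 | 0.0753 |
| UCMerced | ×4 | PSNR | 26.92 | 27.56 | 27.60 | 27.18 | 27.75 | 27.78 | 27.67 | 27.70 | 27.78 | 28.09 | 28.17 |
| UCMerced | ×4 | SSIM | 0.7286 | 0.7522 | 0.7556 | 0.7394 | 0.7611 | 0.7635 | 0.7609 | 0.7589 | 0.7657 | 0.7730 | 0.7769 |
| UCMerced | ×4 | SCC | 0.2156 | 0.2590 | 0.2610 | 0.2286 | 0.2692 | 0.2701 | 0.2718 | 0.2639 | 0.2780 | 0.3011 | 0.3095 |
| UCMerced | ×4 | SAM | 0.1128 | 0.1054 | 0.1051 | 0.1098 | 0.1034 | 0.1029 | 0.1043 | 0.1041 | 0.1030 | 0.0983 | 0.0970 |
| AID | ×2 | PSNR | 34.74 | 35.20 | 35.35 | 35.00 | 35.50 | 35.40 | 35.33 | 35.31 | 34.85 | 35.51 | 35.64 |
| AID | ×2 | SSIM | 0.9299 | 0.9349 | 0.9366 | 0.9327 | 0.9383 | 0.9372 | 0.9367 | 0.9361 | 0.9381 | 0.9383 | 0.9396 |
| AID | ×2 | SCC | 0.6096 | 0.6221 | 0.6407 | 0.6173 | 0.6626 | 0.6538 | 0.6373 | 0.6371 | 0.6395 | 0.6486 | 0.6674 |
| AID | ×2 | SAM | 0.0571 | 0.0539 | 0.0531 | 0.0554 | 0.0524 | 0.0530 | 0.0531 | 0.0535 | 0.0539 | 0.0520 | 0.0493 |
| AID | ×3 | PSNR | 30.63 | 31.25 | 31.36 | 30.87 | 31.49 | 31.50 | 31.38 | 31.33 | 31.53 | 31.55 | 31.62 |
| AID | ×3 | SSIM | 0.8380 | 0.8526 | 0.8557 | 0.8441 | 0.8588 | 0.8588 | 0.8564 | 0.8648 | 0.8594 | 0.8596 | 0.8617 |
| AID | ×3 | SCC | 0.3538 | 0.3848 | 0.3971 | 0.3647 | 0.4053 | 0.4067 | 0.3984 | 0.3961 | 0.4082 | 0.4111 | 0.4153 |
| AID | ×3 | SAM | 0.0891 | 0.0892 | 0.0820 | 0.0866 | 0.0806 | 0.0806 | 0.0817 | 0.0823 | 0.0804 | 0.0801 | 0.0793 |
| AID | ×4 | PSNR | 28.51 | 29.01 | 29.20 | 28.67 | 29.32 | 29.44 | 29.21 | 29.15 | 27.85 | 29.36 | 29.43 |
| AID | ×4 | SSIM | 0.7577 | 0.7746 | 0.7826 | 0.7646 | 0.7867 | 0.7912 | 0.7835 | 0.7803 | 0.7319 | 0.7875 | 0.7898 |
| AID | ×4 | SCC | 0.2153 | 0.2428 | 0.2679 | 0.2198 | 0.2765 | 0.2884 | 0.2695 | 0.2603 | 0.1618 | 0.2831 | 0.2892 |
| AID | ×4 | SAM | 0.1116 | 0.1055 | 0.1032 | 0.1094 | 0.1016 | 0.1002 | 0.1030 | 0.1039 | 0.1203 | 0.1011 | 0.1002 |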
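Table 1 reports PSNR, SSIM, SCC and SAM. The NumPy sketch below shows one way to compute three of them. PSNR and the per-pixel spectral angle (SAM, [50]) follow their standard definitions; for SCC we assume a common reading of [51], namely the correlation coefficient between Laplacian high-pass-filtered images, averaged over channels. The helper names, the 255 peak value, and the per-channel averaging are assumptions; the paper's exact evaluation settings (color space, border cropping) are not given in this section, and SSIM is omitted for brevity.

```python
import numpy as np


def psnr(ref: np.ndarray, est: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (8-bit peak value assumed by default)."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def sam(ref: np.ndarray, est: np.ndarray, eps: float = 1e-12) -> float:
    """Mean spectral angle (radians) between per-pixel vectors of H x W x C images."""
    r = ref.reshape(-1, ref.shape[-1]).astype(np.float64)
    e = est.reshape(-1, est.shape[-1]).astype(np.float64)
    cos = np.sum(r * e, axis=1) / (np.linalg.norm(r, axis=1) * np.linalg.norm(e, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))


def laplacian_highpass(img: np.ndarray) -> np.ndarray:
    """Filter a 2-D array with a 3 x 3 Laplacian kernel (valid region only)."""
    k = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float64)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def scc(ref: np.ndarray, est: np.ndarray) -> float:
    """Correlation of high-pass-filtered images, averaged over channels (assumed SCC variant)."""
    vals = []
    for c in range(ref.shape[-1]):
        a = laplacian_highpass(ref[..., c].astype(np.float64)).ravel()
        b = laplacian_highpass(est[..., c].astype(np.float64)).ravel()
        vals.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(vals))


# Usage on synthetic data: a random "HR" image and a noisy "SR" estimate.
rng = np.random.default_rng(0)
hr = rng.integers(1, 256, size=(64, 64, 3)).astype(np.float64)
sr = np.clip(hr + rng.normal(0.0, 5.0, size=hr.shape), 0, 255)
print(f"PSNR={psnr(hr, sr):.2f} dB  SAM={sam(hr, sr):.4f} rad  SCC={scc(hr, sr):.4f}")
```

Higher PSNR and SCC and lower SAM indicate better reconstructions, matching how the columns of Table 1 are read.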
Table 2. Average PSNR (dB) of each class on the AID dataset at × 4 scale. The best and second-best results are highlighted in red and blue.
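| Class | Bicubic | SRCNN [5] | LGCNet [53] | VDSR [20] | DCM [52] | HSENet [54] | DFCFNet-S (Ours) | DFCFNet (Ours) |
|---|---|---|---|---|---|---|---|---|
| airport | 27.03 | 28.17 | 28.39 | 28.82 | 28.99 | 29.03 | 29.17 | 29.25 |
| bareland | 34.88 | 35.63 | 35.78 | 36.17 | 36.21 | 36.21 | 36.38 | 36.40 |
| baseballfield | 29.06 | 30.51 | 30.75 | 31.18 | 31.36 | 31.23 | 31.56 | 31.62 |
| beach | 31.07 | 31.92 | 32.08 | 32.29 | 32.45 | 32.76 | 32.66 | 32.68 |
| bridge | 28.98 | 30.41 | 30.67 | 31.19 | 31.39 | 31.30 | 31.64 | 31.68 |
| center | 25.26 | 26.59 | 26.92 | 27.48 | 27.72 | 27.84 | 27.98 | 28.06 |
| church | 22.15 | 23.41 | 23.68 | 24.12 | 24.29 | 24.39 | 24.49 | 24.54 |
| commercial | 25.83 | 27.05 | 27.24 | 27.62 | 27.78 | 27.99 | 27.96 | 28.01 |
| denseresidential | 23.05 | 24.13 | 24.33 | 24.70 | 24.87 | 25.13 | 25.08 | 25.14 |
| desert | 38.49 | 38.84 | 39.06 | 39.13 | 39.27 | 39.37 | 39.51 | 39.52 |
| farmland | 32.30 | 33.48 | 33.77 | 34.20 | 34.42 | 33.90 | 34.61 | 34.68 |
| forest | 27.39 | 28.15 | 28.20 | 28.36 | 28.47 | 28.31 | 28.57 | 28.59 |
| industrial | 24.75 | 26.00 | 26.24 | 26.72 | 26.92 | 26.99 | 27.14 | 27.22 |
| meadow | 32.06 | 32.57 | 32.65 | 32.77 | 32.88 | 32.74 | 32.97 | 32.99 |
| mediumresidential | 26.09 | 27.37 | 27.63 | 28.06 | 28.25 | 28.45 | 28.49 | 28.54 |
| mountain | 28.04 | 28.90 | 28.97 | 29.11 | 29.18 | 29.26 | 29.26 | 29.28 |
| park | 26.23 | 27.25 | 27.37 | 27.69 | 27.82 | 28.01 | 27.97 | 28.02 |
| parking | 22.33 | 24.01 | 24.40 | 25.21 | 25.74 | 26.17 | 26.25 | 26.41 |
| playground | 27.27 | 28.72 | 29.04 | 29.62 | 29.92 | 31.18 | 30.30 | 30.31 |
| pond | 28.94 | 29.85 | 30.00 | 30.26 | 30.29 | 30.40 | 30.50 | 30.55 |
| port | 24.69 | 25.82 | 26.02 | 26.43 | 26.62 | 26.92 | 26.85 | 26.94 |
| railwaystation | 26.31 | 27.55 | 27.76 | 28.19 | 28.38 | 28.47 | 28.56 | 28.62 |
| resort | 25.98 | 27.12 | 27.32 | 27.71 | 27.88 | 27.99 | 28.08 | 28.13 |
| river | 29.61 | 30.48 | 30.60 | 30.82 | 30.91 | 30.88 | 31.01 | 31.03 |
| school | 24.91 | 26.13 | 26.34 | 26.78 | 26.94 | 27.51 | 27.17 | 27.25 |
| sparseresidential | 25.41 | 26.16 | 26.27 | 26.46 | 26.53 | 26.43 | 26.64 | 26.67 |
| square | 26.75 | 28.13 | 28.39 | 28.91 | 29.13 | 29.05 | 29.38 | 29.44 |
| stadium | 24.81 | 26.10 | 26.37 | 26.88 | 27.10 | 27.28 | 27.32 | 27.41 |
| storagetanks | 24.18 | 25.27 | 25.48 | 25.86 | 26.00 | 26.07 | 26.18 | 26.23 |
| viaduct | 25.86 | 27.03 | 27.26 | 27.74 | 27.93 | 28.12 | 28.13 | 28.21 |
| AVG | 27.30 | 28.40 | 28.61 | 28.99 | 29.17 | 29.21 | 29.39 | 29.45 |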
Table 3. Comparison of different methods in terms of parameters, FLOPs, inference time, and PSNR on the UCMerced dataset at ×4 scale.
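| Method | Params (M) | FLOPs (G) | Time (ms) | PSNR (dB) |
|---|---|---|---|---|
| SRCNN [5] | 0.07 | 4.53 | 0.409 | 26.92 |
| VDSR [20] | 0.67 | 44.03 | 5.138 | 27.56 |
| LGCNet [53] | 0.19 | 12.65 | 1.788 | 27.18 |
| DCM [52] | 2.17 | 13.00 | 3.391 | 27.60 |
| HSENet [54] | 5.43 | 19.20 | 41.105 | 27.75 |
| TransENet [57] | 37.46 | 21.44 | 26.873 | 27.78 |
| FENet [56] | 3.32 | 12.91 | 146.018 | 27.70 |
| SRDD [55] | 9.34 | 6.46 | 27.969 | 27.67 |
| OmniSR [58] | 3.26 | 12.94 | 102.388 | 27.78 |
| DFCFNet-S (Ours) | 0.36 | 20.00 | 12.25 | 28.09 |
| DFCFNet (Ours) | 0.79 | 43.48 | 20.79 | 28.17 |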
Table 4. Evaluation results of SR on five benchmark datasets. The best result is shown in red, and the second-best result is shown in blue.
| Method | Scale | Params | Set5 (PSNR/SSIM) | Set14 (PSNR/SSIM) | BSD100 (PSNR/SSIM) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| DRCN [69] | ×2 | 1774K | 37.63/0.9588 | 33.04/0.9118 | 31.85/0.8942 | 30.75/0.9133 | -/- |
| EDSR [70] | ×2 | 1370K | 37.99/0.9604 | 33.57/0.9175 | 32.16/0.8994 | 31.98/0.9272 | 38.54/0.9769 |
| DBPN [71] | ×2 | - | 38.09/0.9600 | 33.85/0.9190 | 32.27/0.9000 | 33.02/0.9310 | 39.32/0.9780 |
| IDN [39] | ×2 | 553K | 37.83/0.9600 | 33.30/0.9148 | 32.08/0.8985 | 31.27/0.9196 | 38.01/0.9749 |
| CARN [41] | ×2 | 1592K | 37.76/0.9590 | 33.52/0.9166 | 32.09/0.8978 | 31.92/0.9256 | 38.36/0.9765 |
| IMDN [40] | ×2 | 694K | 38.00/0.9605 | 33.63/0.9177 | 32.19/0.8996 | 32.17/0.9283 | 38.88/0.9774 |
| RFDN [11] | ×2 | 534K | 38.05/0.9606 | 33.68/0.9184 | 32.16/0.8994 | 32.12/0.9278 | 38.88/0.9773 |
| LMAN-S [48] | ×2 | 525K | 37.94/0.9603 | 33.49/0.9167 | 32.08/0.8984 | 31.85/0.9251 | 38.43/0.9765 |
| SMSR [64] | ×2 | 985K | 38.00/0.9601 | 33.64/0.9179 | 32.17/0.8990 | 32.19/0.9284 | 38.76/0.9771 |
| FDIWN-M [20] | ×2 | 433K | 38.03/0.9606 | 33.60/0.9179 | 32.17/0.8995 | 32.19/0.9284 | -/- |
| VLESR [65] | ×2 | 311K | 38.01/0.9605 | 33.58/0.9177 | 32.16/0.8993 | 32.14/0.9280 | 38.75/0.9770 |
| GASSL-S [66] | ×2 | 280K | 37.91/0.9602 | 33.53/0.9172 | 32.14/0.8992 | 31.81/0.9253 | 38.57/0.9769 |
| FDSCSR-S [68] | ×2 | 466K | 38.02/0.9606 | 33.51/0.9174 | 32.18/0.8996 | 32.24/0.9288 | 38.67/0.9771 |
| AMFFN [67] | ×2 | 298K | 38.07/0.9607 | 33.59/0.9178 | 32.21/0.9001 | 32.37/0.9299 | 38.89/0.9774 |
| DFCFNet-S (Ours) | ×2 | 360K | 38.09/0.9608 | 33.70/0.9186 | 32.22/0.9002 | 32.29/0.9292 | 39.01/0.9777 |
| DFCFNet (Ours) | ×2 | 792K | 38.18/0.9611 | 33.88/0.9204 | 32.29/0.9011 | 32.69/0.9331 | 39.26/0.9780 |
| DRCN [69] | ×3 | 1774K | 33.82/0.9226 | 29.76/0.8311 | 28.80/0.7963 | 27.15/0.8276 | -/- |
| EDSR [70] | ×3 | 1555K | 34.37/0.9270 | 30.28/0.8417 | 29.09/0.8052 | 28.15/0.8527 | 33.45/0.9439 |
| DBPN [71] | ×3 | - | 32.47/0.8980 | 28.82/0.7860 | 27.72/0.7400 | 28.08/0.7950 | 31.50/0.9140 |
| IDN [39] | ×3 | 553K | 34.11/0.9253 | 29.99/0.8354 | 28.95/0.8013 | 27.42/0.8359 | 32.71/0.9381 |
| CARN [41] | ×3 | 118K | 34.29/0.9255 | 30.29/0.8407 | 29.06/0.8034 | 28.06/0.8493 | 33.50/0.9440 |
| IMDN [40] | ×3 | 703K | 34.36/0.9270 | 30.32/0.8417 | 29.09/0.8046 | 28.17/0.8519 | 33.61/0.9445 |
| RFDN [11] | ×3 | 541K | 34.41/0.9273 | 30.34/0.8420 | 29.09/0.8050 | 28.21/0.8525 | 33.67/0.9449 |
| LMAN-S [48] | ×3 | 709K | 34.31/0.9265 | 30.24/0.8397 | 29.02/0.8030 | 28.02/0.8487 | 33.42/0.9433 |
| SMSR [64] | ×3 | 993K | 34.40/0.9270 | 30.33/0.8412 | 29.10/0.8050 | 28.25/0.8536 | 33.68/0.9445 |
| FDIWN-M [20] | ×3 | 446K | 34.46/0.9274 | 30.35/0.8423 | 29.10/0.8051 | 28.16/0.8528 | -/- |
| VLESR [65] | ×3 | 319K | 34.40/0.9272 | 30.34/0.8415 | 29.08/0.8043 | 28.16/0.8519 | 33.61/0.9445 |
| GASSL-S [66] | ×3 | 373K | 34.24/0.9260 | 30.28/0.8407 | 29.06/0.8038 | 27.95/0.8474 | 33.42/0.9434 |
| FDSCSR-S [68] | ×3 | 471K | 34.24/0.9274 | 30.37/0.8429 | 29.10/0.8052 | 28.20/0.8532 | 33.55/0.9443 |
| AMFFN [67] | ×3 | 305K | 34.48/0.9275 | 30.34/0.8420 | 29.11/0.8051 | 28.29/0.8544 | 33.72/0.9451 |
| DFCFNet-S (Ours) | ×3 | 366K | 34.52/0.9281 | 30.46/0.8444 | 29.15/0.8065 | 28.39/0.8554 | 33.97/0.9463 |
| DFCFNet (Ours) | ×3 | 798K | 34.62/0.9291 | 30.51/0.8457 | 29.22/0.8083 | 28.65/0.8607 | 34.24/0.9480 |
| DRCN [69] | ×4 | 1774K | 31.53/0.8854 | 28.02/0.7670 | 27.23/0.7233 | 25.14/0.7510 | -/- |
| EDSR [70] | ×4 | 1518K | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7357 | 26.04/0.7849 | 30.35/0.9067 |
| DBPN [71] | ×4 | - | 27.21/0.7840 | 25.13/0.6480 | 24.88/0.6010 | 23.25/0.6220 | 25.50/0.7990 |
| IDN [39] | ×4 | 553K | 31.82/0.8903 | 28.25/0.7730 | 27.41/0.7297 | 25.41/0.7632 | 29.41/0.8942 |
| CARN [41] | ×4 | 1592K | 32.13/0.8937 | 28.60/0.7806 | 27.58/0.7349 | 26.07/0.7837 | 30.47/0.9084 |
| IMDN [40] | ×4 | 715K | 32.21/0.8948 | 28.58/0.7811 | 27.56/0.7353 | 26.04/0.7838 | 30.45/0.9075 |
| RFDN [11] | ×4 | 541K | 32.24/0.8952 | 28.61/0.7819 | 27.57/0.7360 | 26.11/0.7858 | 30.58/0.9089 |
| LMAN-S [48] | ×4 | 672K | 32.12/0.8939 | 28.53/0.7798 | 27.51/0.7340 | 25.96/0.7813 | 30.30/0.9062 |
| SMSR [64] | ×4 | 1060K | 32.12/0.8932 | 28.55/0.7808 | 27.55/0.7351 | 26.11/0.7868 | 30.54/0.9085 |
| FDIWN-M [20] | ×4 | 454K | 32.17/0.8941 | 28.55/0.7806 | 27.58/0.7364 | 26.02/0.7844 | -/- |
| VLESR [65] | ×4 | 331K | 32.17/0.8945 | 28.55/0.7802 | 27.55/0.7345 | 26.03/0.7830 | 30.48/0.9073 |
| GASSL-S [66] | ×4 | 428K | 32.01/0.8931 | 28.56/0.7808 | 27.56/0.7351 | 25.98/0.7818 | 30.35/0.9070 |
| FDSCSR-S [68] | ×4 | 478K | 32.25/0.8959 | 28.61/0.7821 | 27.58/0.7367 | 26.12/0.7866 | 30.51/0.9087 |
| AMFFN [67] | ×4 | 314K | 32.29/0.8958 | 28.62/0.7821 | 27.59/0.7365 | 26.22/0.7889 | 30.50/0.9083 |
| DFCFNet-S (Ours) | ×4 | 371K | 32.30/0.8963 | 28.68/0.7838 | 27.65/0.7381 | 26.30/0.7897 | 30.86/0.9114 |
| DFCFNet (Ours) | ×4 | 807K | 32.47/0.8981 | 28.74/0.7857 | 27.71/0.7401 | 26.53/0.7966 | 31.08/0.9140 |
Table 5. Metrics of the different model variants evaluated on the UCMerced and AID test sets (PSNR/SSIM at ×4 scale).
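| FLFB | NFEB | EFFN | Params (M) | FLOPs (G) | GPU Mem (MB) | Avg. Time (ms) | AID (PSNR/SSIM) | UCMerced (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|---|
| | | | 0.29 | 14.80 | 65.29 | 3.95 | 29.23/0.7833 | 27.79/0.7632 |
| | | | 0.06 | 1.59 | 40.60 | 2.85 | 28.64/0.7622 | 27.09/0.7377 |
| | | | 0.08 | 4.83 | 68.83 | 2.21 | 29.06/0.7776 | 27.66/0.7583 |
| | | | 0.28 | 16.10 | 89.32 | 6.37 | 29.32/0.7861 | 28.01/0.7701 |
| | | | 0.13 | 15.47 | 77.12 | 5.01 | 29.18/0.7815 | 27.82/0.7640 |
| | | | 0.30 | 18.70 | 77.66 | 6.20 | 29.27/0.7847 | 27.96/0.7685 |
| | | | 0.36 | 20.00 | 89.60 | 12.25 | 29.36/0.7875 | 28.09/0.7730 |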
Table 6. Influence of each component in PCCM (PSNR/SSIM at ×4 scale).
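| PConv | CCM | Params (M) | FLOPs (G) | GPU Mem (MB) | Avg. Time (ms) | AID (PSNR/SSIM) | UCMerced (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| | | 0.16 | 8.01 | 88.62 | 7.51 | 28.64/0.7846 | 27.96/0.7683 |
| | | 0.34 | 18.61 | 89.51 | 12.53 | 29.34/0.7869 | 28.06/0.7721 |
| | | 0.36 | 20.00 | 89.60 | 12.25 | 29.36/0.7875 | 28.09/0.7730 |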
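Figure 4 and Table 6 concern PConv inside PCCM. This section does not spell out PConv's internals, so the sketch below assumes the FasterNet-style partial convolution: a regular k×k convolution applied to only a fraction of the channels (here 1/4, an assumed ratio), with the remaining channels passed through untouched. The naive NumPy convolution and the helper names (`conv2d_same`, `partial_conv`, `conv_macs`) are illustrative, not the authors' code.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive zero-padded 'same' convolution (cross-correlation).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            # Mix input channels for one kernel tap and accumulate over taps.
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def partial_conv(x: np.ndarray, w: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    """Convolve only the first C * ratio channels; copy the others through unchanged."""
    cp = int(x.shape[0] * ratio)
    assert w.shape[:2] == (cp, cp), "kernel must map cp -> cp channels"
    y = x.copy()
    y[:cp] = conv2d_same(x[:cp], w)
    return y


def conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulate count of a k x k convolution on an h x w map (bias ignored)."""
    return h * w * k * k * c_in * c_out


# Usage: 36 channels (the Dim of DFCFNet-S in Table 8) with the assumed 1/4 ratio.
rng = np.random.default_rng(0)
feat = rng.standard_normal((36, 16, 16))
kernel = rng.standard_normal((9, 9, 3, 3)) * 0.1
out = partial_conv(feat, kernel)
full, part = conv_macs(64, 64, 3, 36, 36), conv_macs(64, 64, 3, 9, 9)
print(out.shape, full / part)  # a 1/4 channel ratio cuts the conv cost by (1/4)**2
```

Because the cost of a convolution grows with the product of input and output channels, convolving a quarter of the channels costs about one sixteenth of a full convolution, which is the kind of saving a partial convolution is meant to buy.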
Table 7. Comparison between the global variance (Var) strategy and self-attention (SA) (PSNR/SSIM at ×4 scale).
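| Var | SA | Params (M) | FLOPs (G) | GPU Mem (MB) | Avg. Time (ms) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| | ✓ | 0.38 | 1220.00 | 25,366.62 | 16,982.40 | 26.08/0.7812 | 30.82/0.9107 |
| ✓ | | 0.36 | 20.00 | 89.60 | 12.25 | 26.30/0.7897 | 30.86/0.9114 |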
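Table 7 shows the global variance (Var) strategy using a small fraction of the memory and time of self-attention (SA). The sketch below illustrates the underlying difference in cost under our own assumptions: a global variance statistic yields one value per channel at O(HWC) cost, whereas full self-attention over N = H×W positions materializes an N×N matrix. How DFCFNet actually uses the variance is not described in this section; `variance_gate` is a hypothetical example, and the 64×64 map size is chosen only for illustration, not to reproduce the measured numbers.

```python
import numpy as np


def global_variance(feat: np.ndarray) -> np.ndarray:
    """Per-channel variance over all spatial positions: (C, H, W) -> (C,)."""
    return feat.reshape(feat.shape[0], -1).var(axis=1)


def variance_gate(feat: np.ndarray) -> np.ndarray:
    """Hypothetical use of the statistic: scale each channel by sigmoid(standardized variance)."""
    v = global_variance(feat)
    z = (v - v.mean()) / (v.std() + 1e-6)
    return feat * (1.0 / (1.0 + np.exp(-z)))[:, None, None]


def attention_matrix_bytes(h: int, w: int, heads: int = 1, bytes_per_el: int = 4) -> int:
    """Memory for the N x N attention matrix of full self-attention over N = h * w tokens."""
    n = h * w
    return heads * n * n * bytes_per_el


# Usage: storage for the two global operators on a 36-channel 64 x 64 float32 map.
feat = np.random.default_rng(0).standard_normal((36, 64, 64)).astype(np.float32)
print(global_variance(feat).nbytes)              # one float32 per channel
print(attention_matrix_bytes(64, 64) / 2 ** 20)  # MiB for a single attention head
```

The variance statistic grows linearly with the number of pixels, while the attention matrix grows quadratically, so doubling the side length of the map multiplies the attention matrix by 16.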
Table 8. Influence of the number of channels (Dim) and the number of FAMs on network performance (PSNR/SSIM at ×4 scale).
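| Dim | FAM | Params (M) | FLOPs (G) | GPU Mem (MB) | Avg. Time (ms) | Urban100 (PSNR/SSIM) | Manga109 (PSNR/SSIM) |
|---|---|---|---|---|---|---|---|
| 36 | 8 | 0.36 | 20.00 | 89.60 | 12.25 | 26.30/0.7897 | 30.86/0.9114 |
| 48 | 10 | 0.79 | 43.48 | 123.52 | 6.84 | 26.53/0.7966 | 31.08/0.9140 |
| 48 | 12 | 0.94 | 51.92 | 124.12 | 22.04 | 26.53/0.7973 | 31.14/0.9150 |
| 48 | 14 | 1.10 | 60.37 | 124.16 | 27.65 | 26.63/0.7996 | 31.21/0.9155 |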
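As a rough consistency check on Table 8: if most parameters and FLOPs sit in convolutions whose cost grows with the square of the channel width and linearly with the number of FAMs (an approximation we assume, not a claim from the paper), the first row (Dim = 36, 8 FAMs) can be scaled to predict the others. `scaled_cost` is a hypothetical helper for this arithmetic.

```python
def scaled_cost(base: float, base_dim: int, base_fam: int, dim: int, fam: int) -> float:
    """Rough estimate assuming cost ~ Dim**2 * (number of FAMs)."""
    return base * (dim / base_dim) ** 2 * (fam / base_fam)


# Base point: Dim = 36 with 8 FAMs -> 0.36 M params and 20.00 G FLOPs (first row of Table 8).
reported = {10: (0.79, 43.48), 12: (0.94, 51.92), 14: (1.10, 60.37)}
for fam, (params, flops) in reported.items():
    est_p = scaled_cost(0.36, 36, 8, 48, fam)
    est_f = scaled_cost(20.00, 36, 8, 48, fam)
    print(f"FAM={fam}: params {est_p:.2f} M (reported {params:.2f}), "
          f"FLOPs {est_f:.2f} G (reported {flops:.2f})")
```

The estimates (0.80, 0.96 and 1.12 M parameters; 44.44, 53.33 and 62.22 G FLOPs) sit within about 3% above the reported values. A plausible reason for the small overshoot is that some layers, such as those whose output width is fixed by the RGB output, grow only linearly with Dim.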
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Zhang, M.; Wang, Q.; Zhang, W.; Chen, X.; Pan, J.; Guo, H. DFCFNet: A Local–Nonlocal Dual-Branch Feature Complementary Fusion Network for Remote Sensing Image Super-Resolution. Remote Sens. 2026, 18, 1626. https://doi.org/10.3390/rs18101626

AMA Style

Zhang M, Wang Q, Zhang W, Chen X, Pan J, Guo H. DFCFNet: A Local–Nonlocal Dual-Branch Feature Complementary Fusion Network for Remote Sensing Image Super-Resolution. Remote Sensing. 2026; 18(10):1626. https://doi.org/10.3390/rs18101626

Chicago/Turabian Style

Zhang, Miaomiao, Quan Wang, Wuxia Zhang, Xiangpeng Chen, Jiaxin Pan, and Huinan Guo. 2026. "DFCFNet: A Local–Nonlocal Dual-Branch Feature Complementary Fusion Network for Remote Sensing Image Super-Resolution" Remote Sensing 18, no. 10: 1626. https://doi.org/10.3390/rs18101626

APA Style

Zhang, M., Wang, Q., Zhang, W., Chen, X., Pan, J., & Guo, H. (2026). DFCFNet: A Local–Nonlocal Dual-Branch Feature Complementary Fusion Network for Remote Sensing Image Super-Resolution. Remote Sensing, 18(10), 1626. https://doi.org/10.3390/rs18101626

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
