Article

DyLKANet: A Lightweight Dynamic Distillation Network for Remote Sensing Image Super-Resolution Based on Large-Kernel Attention

1 School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
2 Shenzhen Institutes of Advanced Technology, Institute of Technology for Carbon Neutrality, Chinese Academy of Sciences (CAS), Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(6), 1112; https://doi.org/10.3390/electronics14061112
Submission received: 3 February 2025 / Revised: 3 March 2025 / Accepted: 6 March 2025 / Published: 12 March 2025

Abstract

Lightweight remote sensing image super-resolution methods aim to enhance image resolution and recover fine details with compact neural networks. However, current lightweight methods still suffer from limited reconstruction quality and unconvincing detail recovery. DyLKANet introduces a novel lightweight architecture that employs a multi-level feature integration strategy to enhance information exchange between context-aware and large-kernel attention mechanisms. The network comprises two core modules: a feature distillation and enhancement block for efficient feature extraction, and a context-aware attention-based feature fusion module for capturing global interdependencies. Experiments on the UCMerced, AID, and DIV2K datasets show that DyLKANet achieves performance comparable to state-of-the-art methods while maintaining a low parameter count and computational complexity. Taking the ×2 upscaling results on the UCMerced dataset as an example, DyLKANet improves PSNR by 0.212–3.544 dB and SSIM by 0.005–0.038, while reducing parameters by 18.79–95.46% and FLOPs by 7.25–82.63%, making it a promising solution for remote sensing image super-resolution tasks in resource-constrained environments.

1. Introduction

High-resolution remote sensing images are extensively utilized in ecological monitoring, crop assessment, urban planning, military reconnaissance, and disaster response due to their wide coverage, rich information, and enduring records [1]. However, the undersampling effect of imaging sensors and various degradation factors in the imaging process significantly limit the acquisition of high-resolution remote sensing images. Enhancing resolution through improved hardware performance involves significant costs and risks. Therefore, achieving high-resolution images without hardware modifications is crucial for remote sensor design, improving human visual perception, and advancing subsequent applications.
Single-image super-resolution (SISR) is a fundamental and challenging task in computer vision, focusing on reconstructing low-resolution images into high-resolution ones by recovering high-frequency details lost due to imaging system limitations [2]. SISR techniques have gradually evolved from simple interpolation methods, such as nearest-neighbor, bilinear, and bicubic interpolation, to approaches that treat SISR as an ill-posed inverse problem and solve it by mathematically modeling the image sampling and degradation process [3]. This technology has demonstrated wide application potential in various fields such as remote sensing and medical imaging [4], and can effectively improve the processing quality and effectiveness of subsequent related tasks [5]. Numerous variants of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been developed to model the nonlinear relationship between low-resolution (LR) and high-resolution (HR) image pairs. Early CNN-based models, including SRCNN [6], SCN [7], and FSRCNN [8], prioritized simplicity but struggled to preserve high-frequency details. The development of deep residual networks (e.g., VDSR [9], LapSRN [10], EDSR [11]) and recurrent neural networks (e.g., DRCN [12], DRRN [13]) enabled multi-scale super-resolution reconstruction. Additionally, advanced architectures such as densely connected networks (e.g., SRDenseNet [14], RDN [15]), generative adversarial networks (e.g., SRGAN [16], RFB-ESRGAN [17]), and attention-based networks (e.g., SAN [18]) have significantly improved super-resolution performance at high magnifications. These innovations enhance reconstruction quality and introduce novel neural network structures, including SRFormer [19], CROSS-MPI [20], SRWARP [21], IMDN [22], ECBSR [23], and FMEN [24]. Recently, Transformer architectures have gained prominence in SISR due to their self-attention mechanisms, which capture global information and leverage image self-similarity. Models like TTSR [25], ESRT [26], SwinIR [27], and TransENet [28] improve global dependency handling by employing techniques such as feature dimension fusion, cross-token attention, and sliding windows. Despite the progress of CNN- and Transformer-based methods in SISR, their high parameter counts and computational complexity pose significant challenges, particularly for resource-constrained devices. Thus, further research is needed to optimize the balance among performance, parameter count, and computational complexity.
Remote sensing images exhibit diverse feature types and are affected by multiple degradation factors, including sampling errors, shape distortions, sharpness loss, and noise interference during imaging. Additionally, ground artifacts caused by lighting changes, such as cloud cover, terrain variations, and haze, further increase their semantic complexity compared to natural images, complicating remote sensing image super-resolution (RSISR). Researchers have introduced methods such as the characteristic resonance loss function [29], a two-stage design strategy incorporating spatial and spectral knowledge [30], and a hybrid higher-order attention mechanism [31] to address these challenges. However, these approaches still inherit the high complexity of natural image super-resolution networks. Thus, there is an urgent need for lightweight RSISR techniques capable of efficiently processing massive remote-sensing images.
Recently, many lightweight super-resolution models have been developed, employing techniques such as cascading mechanisms [32], special convolutional layers [33], context-switching layers [34], channel-separation strategies [35], multi-feature distillation [22], neighborhood filtering [36], hierarchical feature fusion [37,38,39], novel self-attention mechanisms [40,41,42], large kernel distillation [43,44], and N-Gram contextual information [45]. CNN-based methods reduce model parameters by decreasing convolutional layers or residual blocks [46] and using dilated or group convolutions instead of traditional operations, often employing recursive structures or parameter-sharing strategies [12,13,38]. Transformer-based methods reduce computational complexity using small windows and sliding mechanisms. However, most of these models are limited to capturing local or regional interdependencies and struggle to explicitly capture long-range global dependencies across the entire image. This limitation is primarily due to the high computational cost of capturing global dependencies. Therefore, existing lightweight RSISR methods require further optimization and breakthroughs to enhance performance.
This paper introduces a super-resolution reconstruction method for remote sensing images, employing a dynamic large kernel attention-based feature distillation and fusion network (DyLKANet) to overcome the limitations of existing RSISR methods. The DyLKANet employs a multi-level feature integration strategy to facilitate information exchange between the context-aware attention mechanism and the large kernel attention mechanism. The DyLKANet framework comprises two key modules: the feature distillation and enhancement block (FDEB) for efficient feature extraction, and the context-aware attention-based feature fusion module (CFFM) for capturing global interdependencies. The primary contributions of this study are outlined as follows:
(1)
DyLKANet introduces a novel lightweight network architecture that employs a multi-level feature integration strategy. This strategy facilitates information exchange between the context-aware attention mechanism and the large kernel attention mechanism, enhancing the overall performance of the network;
(2)
We designed a dynamic convolutional residual block, which utilizes dynamic convolution to adaptively generate convolution kernels for each input sample. This approach not only reduces computational overhead but also preserves the depth of feature extraction;
(3)
The proposed feature distillation and enhancement block incorporates feature distillation, compression, and enhancement stages. This block is capable of efficiently extracting key features while significantly reducing the number of parameters, contributing to the lightweight nature of DyLKANet.
The rest of the paper is organized as follows. Section 2 is the description of the methodology proposed in this paper. Section 3 analyzes the experiments and results. Section 4 analyzes and discusses the validity of each component of the proposed method and Section 5 gives a conclusion.

2. Methods

This section describes the architecture of the proposed dynamic distillation and fusion lightweight network, termed DyLKANet, which utilizes large-kernel attention and dynamic convolution. The feature distillation and enhancement block (FDEB) and the dynamic convolutional residual block (DCRB) are then described in detail. Finally, the context-aware-attention-based feature fusion block (CFFB) is introduced.

2.1. Network Architecture

As shown in Figure 1, the proposed DyLKANet architecture includes four stages: shallow feature extraction, deep feature refinement, multi-layer feature fusion, and super-resolution image reconstruction from the LR input. $I_{LR}$ and $I_{SR}$ represent the input and output images, respectively. Before shallow feature extraction, the input image is duplicated $m$ times and merged along the channel dimension, increasing the available image information. Subsequently, shallow feature extraction is performed using a standard 3 × 3 convolution. This process is represented as follows.
$F_0 = H_{Conv_{3\times3}}\left(\left[I_{LR}^{1}, \ldots, I_{LR}^{m}\right]\right)$
where $H_{Conv_{3\times3}}$ is the convolution operation and $F_0$ is the extracted shallow features. $F_0$ is then used as input to the deep feature refinement layer. The deep feature refinement layer is composed of $n$ FDEBs.
$F_1 = H_{FDEB}^{1}(F_0), \ \ldots, \ F_n = H_{FDEB}^{n}(F_{n-1})$
where $H_{FDEB}^{i}$ is the $i$-th FDEB and $F_n$ is the extracted feature information. $F_1, \ldots, F_n$ are then used as the input to the feature fusion layer CFFM. The context-aware attention-based feature fusion module (CFFM) is composed of FDEBs and CFFBs, with the structure being called recursively.
$F_{in}^{0} = C_s(F_1, F_n)$
$F_j^{1} = H_{FDEB}(F_{in}^{j-1}), \quad F_j^{2} = H_{FDEB}(F_j^{1})$
$O_j = H_{CFFB}\left(C_s\left(F_j^{2i-2}, F_j^{2i}\right)\right)$
$C_f = H_{CFFB}(O_1, \ldots, O_6)$
where $C_s$ denotes the stacking operation, $F_j^{1}$ and $F_j^{2}$ represent the FDEB results in the $j$-th recursively called structure, $C_f$ is the condensed feature, and $O_j$ denotes the CFFB result in the same structure. $j$ denotes the number of recursive calls to the structure. Except for the first call, where $F_{in}^{j-1} = F_{in}^{0}$, every subsequent call sets $F_{in}^{j-1} = C_s(F_j^{1}, O_j)$. The final recursive call feeds the outputs of the preceding CFFM iterations into the last CFFM. The final $O_j$ from the recursive calls is processed by dynamic convolution (DyConv) to perform feature fusion. Finally, the output $F_{out}$ from the fusion layer, along with $F_0$, is input into the SR image reconstruction layer.
$F_{out} = DyConv(C_f)$
The reconstruction layer applies a 3 × 3 convolution and a sub-pixel convolution to the $F_0$ and $F_{out}$ features. The up-sampling process can be formulated as:
$I_{SR} = Rec\left(F_{out} + F_0\right)$
where $Rec$ is the image reconstruction operation and $I_{SR}$ is the generated SR image.
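To make the data flow described above concrete, the following PyTorch sketch wires the four stages together. It is a minimal illustration rather than the released implementation: the FDEB blocks and the recursive CFFM fusion are collapsed into plain convolutions, and all names and hyperparameters (e.g., n_feats, n_copies) are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class DyLKANetSketch(nn.Module):
    """Minimal sketch of the four-stage DyLKANet pipeline described in Section 2.1.
    FDEB and CFFM bodies are placeholders; names and sizes are illustrative."""
    def __init__(self, n_feats=48, n_blocks=6, n_copies=2, scale=4):
        super().__init__()
        self.n_copies = n_copies
        # Shallow feature extraction: the LR input is replicated, stacked along
        # the channel axis, and passed through a standard 3x3 convolution.
        self.shallow = nn.Conv2d(3 * n_copies, n_feats, 3, padding=1)
        # Deep feature refinement: a chain of n FDEBs (stand-ins here).
        self.fdebs = nn.ModuleList(
            [nn.Conv2d(n_feats, n_feats, 3, padding=1) for _ in range(n_blocks)]
        )
        # Multi-layer feature fusion: stand-in for the recursive CFFM + DyConv fusion.
        self.fuse = nn.Conv2d(n_feats * n_blocks, n_feats, 1)
        # Reconstruction: 3x3 convolution followed by sub-pixel (PixelShuffle) upsampling.
        self.rec = nn.Sequential(
            nn.Conv2d(n_feats, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        x = torch.cat([lr] * self.n_copies, dim=1)   # duplicate the LR input m times
        f0 = self.shallow(x)                         # F_0
        feats, f = [], f0
        for block in self.fdebs:                     # F_1 ... F_n
            f = block(f)
            feats.append(f)
        f_out = self.fuse(torch.cat(feats, dim=1))   # condensed feature after fusion
        return self.rec(f_out + f0)                  # I_SR = Rec(F_out + F_0)
```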

2.2. Feature Distillation and Enhancement Block

Receptive field diversity is essential for capturing contextual information, particularly in reconstructing remotely sensed images. Based on the principle of feature distillation, we introduce the FDEB, as illustrated in Figure 2.
The FDEB comprises three phases: feature distillation, feature compression, and feature enhancement. The distillation phase employs the DCRB module to reduce parameters and uses additional convolutional layers to refine the input features and progressively extract key features, halving the number of feature channels so that only the most informative features are retained. This can be expressed as follows:
$F_1^{rough} = H_{DCRB}(F_{in}), \quad F_2^{rough} = H_{DCRB}(F_1^{rough}), \quad F_3^{rough} = H_{DCRB}(F_2^{rough}), \quad F_4 = Conv_3(F_3^{rough})$
$F_1^{distill} = Conv_1(F_{in}), \quad F_2^{distill} = Conv_1(F_1^{rough}), \quad F_3^{distill} = Conv_1(F_2^{rough})$
where $F_n^{rough}$ and $F_n^{distill}$ represent the $n$-th coarse and refined features, respectively. $H_{DCRB}$, defined as the dynamic convolution operation applied to residual blocks, handles the refinement process to enhance feature representation. Additionally, $Conv_1$ and $Conv_3$ manage the distillation process, emphasizing the extraction of key features.
In the feature compression stage, the features extracted from different layers are concatenated and then processed through a 1 × 1 convolutional layer to synthesize the fused features, which can be expressed as follows:
$F_{fusion} = Conv_1\left(Concat\left(F_1^{distill}, F_2^{distill}, F_3^{distill}, F_4\right)\right)$
where $F_{fusion}$ denotes the fused feature.
In the feature enhancement stage, large kernel attention (LKA) is cascaded with Channel and Spatial Attention (CSA) to enhance feature representation. Applying CSA following LKA enables further feature fusion and refinement. While LKA captures macroscopic features, CSA refines these features, resulting in more precise and targeted fusion. This process is expressed as follows:
$H_{LKA} = Conv_{1\times1}\left(Conv_{DWD}\left(Conv_{DW}\left(F_{fusion}\right)\right)\right)$
$H_{CSA} = B_{CA} \cdot B_{SA}$
$B_{CA} = Conv_{1\times1}\left(\mathrm{Softmax}\left(Conv_{1\times1}\left(F_{fusion}\right)\right) \cdot Conv_{1\times1}\left(F_{fusion}\right)\right)$
$B_{SA} = Conv_{1\times1}\left(\mathrm{Softmax}\left(GP\left(Conv_{1\times1}\left(F_{fusion}\right)\right)\right) \cdot Conv_{1\times1}\left(F_{fusion}\right)\right)$
$F_e = H_{CSA}\left(H_{LKA}\left(F_{fusion}\right)\right)$
where $H_{LKA}$ is the operation employing LKA, $H_{CSA}$ is the operation utilizing CSA, $B_{CA}$ is the channel attention branch, $B_{SA}$ is the spatial attention branch, and $F_e$ is the enhanced feature. To enhance model performance, a pixel normalization module is incorporated during the feature transformation stage to stabilize model training.
$F_t = N_{pix}\left(H_{trans}\left(F_e\right)\right)$
where $H_{trans}$ is the 1 × 1 convolutional operation, $N_{pix}$ is the pixel normalization operation, and $F_t$ is the transformed feature. Finally, a long skip connection is used to enhance the residual learning capability of the model:
$C_{out} = F_t + F_{in}$
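A minimal PyTorch sketch of the FDEB along the lines of the equations above is given below. The DCRB is replaced by a plain convolution, and the CSA branch and pixel normalization are omitted, so only the distillation, compression, and LKA-based enhancement structure is shown; the channel split and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class FDEBSketch(nn.Module):
    """Sketch of the feature distillation and enhancement block (Section 2.2).
    The DCRB and the CSA branch are simplified; sizes are illustrative."""
    def __init__(self, channels=48):
        super().__init__()
        half = channels // 2
        # Coarse (rough) path: three DCRB stand-ins followed by a 3x3 convolution.
        self.dcrb1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.dcrb2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.dcrb3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv3 = nn.Conv2d(channels, half, 3, padding=1)         # F_4
        # Distillation path: 1x1 convolutions halve the channels at each stage.
        self.dist1 = nn.Conv2d(channels, half, 1)
        self.dist2 = nn.Conv2d(channels, half, 1)
        self.dist3 = nn.Conv2d(channels, half, 1)
        # Compression: fuse the concatenated distilled features back to C channels.
        self.compress = nn.Conv2d(half * 4, channels, 1)
        # Enhancement: LKA-style attention (depth-wise 5x5, dilated depth-wise 7x7, 1x1).
        self.lka = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.Conv2d(channels, channels, 7, padding=9, dilation=3, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        self.transform = nn.Conv2d(channels, channels, 1)             # H_trans

    def forward(self, f_in):
        r1 = self.dcrb1(f_in)
        r2 = self.dcrb2(r1)
        r3 = self.dcrb3(r2)
        distilled = torch.cat(
            [self.dist1(f_in), self.dist2(r1), self.dist3(r2), self.conv3(r3)], dim=1
        )
        fused = self.compress(distilled)        # F_fusion
        enhanced = fused * self.lka(fused)      # attention used as a multiplicative gate
        return self.transform(enhanced) + f_in  # long skip connection
```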

2.3. Dynamic Convolutional Residual Block

Dynamically generating a specific convolution kernel for each input sample, so that the kernel is adaptively adjusted to the input features, offers an efficient strategy to reduce computational overhead while preserving feature extraction depth. We therefore designed DyConv as a lightweight alternative to standard convolutional operations and constructed the DCRB (as shown in Figure 3) by combining DyConv with depth-wise convolution (DWConv) and the GELU activation function. In the DCRB, DyConv dynamically generates convolution kernels, unlike traditional convolution, whose kernels are fixed and shared across all inputs. The dynamic convolution kernel is denoted as:
$K_{Dynamic} = \sum_{i=1}^{n} w_i \cdot k_i^{dy}, \quad D_0 = H_{DyConv}\left(C_{in}, K_{Dynamic}\right)$
where $D_0$ represents the output of DyConv, $C_{in}$ represents the input feature, $K_{Dynamic}$ is the aggregated convolution kernel, $w_i$ are the weight coefficients, and $k_i^{dy}$ is a candidate convolution kernel of the same dimension. Compared with traditional convolution, DyConv significantly reduces computational requirements. Next, the DCRB applies depth-wise convolution (DWConv) to the output $D_0$, generating the intermediate result $D_{fuse}$:
$D_{fuse} = H_{BN}\left(H_{DWConv}\left(D_0\right)\right)$
where $D_{fuse}$ represents the output of the DWConv, $H_{DWConv}$ represents the DWConv, and $H_{BN}$ represents BatchNorm; this step ensures that the depth of the network is not affected. Finally, by summing the input $C_{in}$ and the fused output $D_{fuse}$ and applying the GELU activation function, the final output $C_{out}$ of the DCRB is obtained:
$C_{out} = \mathrm{GELU}\left(C_{in} + D_{fuse}\right)$
where $C_{out}$ represents the output of the DCRB.
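The core idea of DyConv (a per-sample weighted sum of candidate kernels, applied through a grouped convolution) and the DCRB wiring can be sketched as follows. The number of candidate kernels, the gating network, and the kernel size are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DyConvSketch(nn.Module):
    """Dynamic convolution: K = sum_i w_i * k_i with sample-dependent weights w_i."""
    def __init__(self, channels, kernel_size=3, n_kernels=4, reduction=4):
        super().__init__()
        self.channels, self.k, self.n_kernels = channels, kernel_size, n_kernels
        # Candidate kernels k_i^{dy}, all of the same shape.
        self.weight = nn.Parameter(
            torch.randn(n_kernels, channels, channels, kernel_size, kernel_size) * 0.02
        )
        # Small gating network producing the per-sample weights w_i.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, n_kernels, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        attn = self.gate(x).view(b, self.n_kernels)                  # (B, n)
        # Aggregate the candidate kernels per sample: (B, C_out, C_in, k, k).
        kernel = torch.einsum("bn,noikl->boikl", attn, self.weight)
        kernel = kernel.reshape(b * c, c, self.k, self.k)
        # A grouped convolution applies each sample's own kernel in a single call.
        out = F.conv2d(x.reshape(1, b * c, h, w), kernel,
                       padding=self.k // 2, groups=b)
        return out.reshape(b, c, h, w)

class DCRBSketch(nn.Module):
    """DCRB sketch: DyConv -> depth-wise conv + BatchNorm -> residual add -> GELU."""
    def __init__(self, channels):
        super().__init__()
        self.dyconv = DyConvSketch(channels)
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        d0 = self.dyconv(x)                  # D_0
        d_fuse = self.bn(self.dwconv(d0))    # D_fuse
        return F.gelu(x + d_fuse)            # C_out
```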

2.4. Context-Aware-Attention-Based Feature Fusion Block

In remote sensing image SR, utilizing hierarchical information is essential. To address this, we developed CFFM (as shown in Figure 1), which integrates features at different levels, capturing both low-level details and high-level semantics for improved representation. The core of the feature fusion architecture is the context-aware, attention-based feature fusion block (CFFB), shown in Figure 4.
The process begins with a series of n input features, each containing C channels. This can be expressed as follows:
$F_{concat} = Concat(F_1, F_2, F_3), \quad F_{refined} = Conv_1\left(F_{concat}\right), \quad F_i = H_{CAA}\left(F_{refined}\right)$
where $F_i$ is the feature of the $i$-th input. Then, dimensionality reduction is performed by a 1 × 1 convolutional operation. $F_{refined}$ is the refined feature that reduces the channel dimension back to $C$, retaining the most salient features. $F_{refined}$ is augmented by a context-aware attention mechanism (CAA).
Figure 4 illustrates the structure of the CAA. A set of feature maps is first passed through an average pooling layer, and the pooled result is then processed by a convolutional layer (denoted as $Conv_1$) to augment the features. These features are subsequently approximated using two depth-wise convolutions. A sigmoid function is applied to generate combination coefficients ranging from 0 to 1. Finally, the input feature maps are multiplied by these coefficients to produce the output feature maps.
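A sketch of the CAA gate described above: average pooling, a 1 × 1 convolution, two depth-wise convolutions approximating a large receptive field, and a sigmoid producing the combination coefficients. The pooling window and depth-wise kernel sizes are assumptions.

```python
import torch.nn as nn

class CAASketch(nn.Module):
    """Context-aware attention gate: outputs coefficients in (0, 1) that re-weight the input."""
    def __init__(self, channels, pool_size=7, dw_kernel=11):
        super().__init__()
        pad = dw_kernel // 2
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        # Two depth-wise convolutions (1xk and kx1) cheaply approximate a large kernel.
        self.dw_h = nn.Conv2d(channels, channels, (1, dw_kernel), padding=(0, pad), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (dw_kernel, 1), padding=(pad, 0), groups=channels)
        self.gate = nn.Sigmoid()

    def forward(self, x):
        attn = self.conv1(self.pool(x))      # pooled features refined by a 1x1 convolution
        attn = self.dw_v(self.dw_h(attn))    # large-receptive-field approximation
        coeff = self.gate(attn)              # combination coefficients in (0, 1)
        return x * coeff                     # re-weighted output feature maps
```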

2.5. Training Losses

The objectives of network training are as follows: (1) to generate high-quality $I_{SR}$ through training; (2) to preserve the spatial structure and semantic information of $I_{LR}$; (3) to uncover more detailed texture information in remote sensing images. In this paper, reconstruction loss, content loss, and total variation loss are used to train the network. The overall loss can be described as follows:
$L_{total} = \lambda_{rec} L_{rec} + \lambda_{content} L_{content} + \lambda_{TV} L_{TV}$
The reconstruction loss ensures the similarity between the generated image and the original image; thus, $\lambda_{rec}$ is usually set to 1 or greater to avoid generating blurry images. Since the L1 loss converges more easily than the L2 loss, the L1 loss is chosen as the reconstruction loss.
$L_{rec} = \frac{1}{CHW}\left\| I_{HR} - I_{SR} \right\|_1$
To handle the complex textures and structural variations in remote sensing images, this paper adopts the Huber loss as the content loss, since it reduces generated artifacts when dealing with noisy, low-resolution remote sensing images.
$L_{content} = \begin{cases} \frac{1}{2}\left(I_{HR} - I_{SR}\right)^2, & \text{if } \left|I_{HR} - I_{SR}\right| \le \delta \\ \delta\left(\left|I_{HR} - I_{SR}\right| - \frac{1}{2}\delta\right), & \text{if } \left|I_{HR} - I_{SR}\right| > \delta \end{cases}$
where $I_{HR}$ denotes the true value, $I_{SR}$ denotes the predicted value, and $\delta$ is the threshold parameter.
The total variation loss $L_{TV}$ drives the resulting image to be smoother in smooth regions while preserving the sharpness of edges.
$diff = I_{HR} - I_{SR}, \quad L_{TV} = \sum_{i,j} \sqrt{\left(diff_{i,j+1} - diff_{i,j}\right)^2 + \left(diff_{i+1,j} - diff_{i,j}\right)^2}$
where $diff_{i,j}$ is the pixel value of the $diff$ image at position $(i, j)$.
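A direct reading of the three loss terms and their weights can be sketched as follows; the mean (rather than sum) reduction for the TV term and the small epsilon inside the square root are implementation conveniences, not details specified by the paper.

```python
import torch
import torch.nn.functional as F

def dylkanet_loss(sr, hr, lam_rec=1.0, lam_content=0.1, lam_tv=0.01, delta=1.0):
    """Total loss = lam_rec * L1 + lam_content * Huber + lam_tv * TV(residual)."""
    l_rec = F.l1_loss(sr, hr)                      # reconstruction loss (L1)
    l_content = F.huber_loss(sr, hr, delta=delta)  # content loss (Huber, threshold delta)
    diff = hr - sr                                 # residual image diff = I_HR - I_SR
    dh = diff[:, :, :, 1:] - diff[:, :, :, :-1]    # horizontal neighbour differences
    dv = diff[:, :, 1:, :] - diff[:, :, :-1, :]    # vertical neighbour differences
    l_tv = torch.sqrt(dh[:, :, :-1, :] ** 2 + dv[:, :, :, :-1] ** 2 + 1e-8).mean()
    return lam_rec * l_rec + lam_content * l_content + lam_tv * l_tv
```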

3. Experiments and Results

This section outlines the implementation details, including datasets, evaluation metrics, experimental configurations, and training strategies. Comparative experiments are conducted on the UCMerced [47] and AID [48] datasets to analyze quantitative results and visualize them in comparison with state-of-the-art methods. Finally, the efficiency of the proposed model is compared against the baseline models on the UCMerced dataset.

3.1. Datasets

Two publicly available remote sensing datasets, UCMerced and AID, are used to validate the proposed model’s effectiveness. The UCMerced dataset contains 2100 images across 21 remote sensing scenes, with each category comprising 100 images of 256 × 256 pixels. The AID dataset consists of 10,000 images from 30 remote sensing scenes, including airports, farmland, beaches, and deserts, with each image of 600 × 600 pixels. Each dataset is divided into training, testing, and validation sets with an 8:1:1 ratio. During experiments, the original HR images from each dataset were used, while LR images were generated using bicubic interpolation to create HR-LR image pairs.
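The HR-LR pairs described above could be produced with a bicubic downsampling step such as the following sketch; the antialiasing choice and value range are assumptions, and in practice the exact bicubic kernel (e.g., a MATLAB-style imresize) matters for a fair comparison.

```python
import torch
import torch.nn.functional as F

def make_pair(hr, scale=4):
    """Create an (LR, HR) training pair from an HR tensor of shape (B, C, H, W)."""
    h, w = hr.shape[-2:]
    # Crop so the spatial size is divisible by the scale factor.
    hr = hr[..., : h - h % scale, : w - w % scale]
    # Bicubic downsampling to synthesize the degraded LR counterpart.
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic",
                       align_corners=False, antialias=True)
    return lr.clamp(0.0, 1.0), hr
```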

3.2. Metrics

The results were evaluated using the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM).
$PSNR\left(I_{HR}, I_{SR}\right) = 10 \log_{10} \frac{255^2}{MSE\left(I_{HR}, I_{SR}\right)}$
where
$MSE\left(I_{HR}, I_{SR}\right) = \frac{1}{H \times W} \sum_{w=1}^{W} \sum_{h=1}^{H} \left(I_{HR}(h, w) - I_{SR}(h, w)\right)^2$
where W and H are the width and height of the image, respectively; the larger the PSNR value, the better the reconstruction effect.
$SSIM\left(I_{HR}, I_{SR}\right) = \frac{\left(2\mu_{I_{HR}}\mu_{I_{SR}} + C_1\right)\left(2\sigma_{I_{HR}I_{SR}} + C_2\right)}{\left(\mu_{I_{HR}}^2 + \mu_{I_{SR}}^2 + C_1\right)\left(\sigma_{I_{HR}}^2 + \sigma_{I_{SR}}^2 + C_2\right)}$
$SSIM(\cdot)$ indicates the similarity between the HR image and the reconstructed SR image in terms of brightness, contrast, and structure. Larger SSIM values indicate higher image quality.
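The two metrics can be computed directly from the formulas above; note that the SSIM here uses global image statistics as written, whereas reported results would typically rely on the standard windowed SSIM from an evaluation toolbox.

```python
import torch

def psnr(hr, sr, max_val=255.0):
    """PSNR over the whole image, assuming pixel values in [0, max_val]."""
    mse = torch.mean((hr - sr) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def ssim_global(hr, sr, max_val=255.0):
    """Global-statistics SSIM following the formula above (no sliding window)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = hr.mean(), sr.mean()
    var_x, var_y = hr.var(unbiased=False), sr.var(unbiased=False)
    cov = ((hr - mu_x) * (sr - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```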

3.3. Experimental Settings

To ensure a fair comparison, a unified degradation model was used to generate the experimental dataset, and consistent data enhancement techniques were applied during model training. Six SISR methods (SRCNN, FSRCNN, IMDN, ECBSR, FMEN, and OMNISR) and six RSISR methods (CTNet, FENet, HSENET, AMFFN, TransENet, and TTST) were selected for comparison. The proposed model was implemented with the PyTorch 1.12.0 framework on the Ubuntu 22.04 operating system, and an NVIDIA RTX 3090 GPU, manufactured by ASUS, was used to accelerate model training. During model training, the Adam optimizer was employed with momentum parameters $\beta_1 = 0.9$ and $\beta_2 = 0.99$. The batch size was set to 16. The initial learning rate was 0.0005, and a linear decay strategy was applied to adjust it progressively during the training cycle. Additionally, the performance of all models was evaluated at scales of ×2, ×3, and ×4. To emphasize details more strongly, we set $\lambda_{rec}$, $\lambda_{content}$, and $\lambda_{TV}$ to 1, 1 × 10−1, and 1 × 10−2, respectively. The reconstruction loss is the only loss used in the first three stages of the training process, while the content loss and total variation loss are added in the subsequent stages, up to the final stage.
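The optimizer and learning-rate schedule described above could be configured as in the sketch below; the model stand-in and the total number of epochs are hypothetical placeholders, and the staged addition of the content and TV losses is omitted.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)    # placeholder for DyLKANet
total_epochs = 1000                      # hypothetical training length

# Adam with beta1 = 0.9, beta2 = 0.99 and an initial learning rate of 5e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.99))
# Linear decay of the learning rate over the training cycle.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=total_epochs
)

for epoch in range(total_epochs):
    # ... one epoch over HR-LR pairs with batch size 16 goes here ...
    scheduler.step()
```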

3.4. Results of Experiments on UCMerced Dataset

Table 1 summarizes the performance of SR models, including SRCNN, FSRCNN, IMDN, ECBSR, OMNISR, FMEN, CTNet, FENet, HSENET, AMFFN, TransENet, TTST, and DyLKANet (denoted as "Ours"). The evaluation metrics include the number of parameters (Params [K]), computational complexity (FLOPs [G]), memory consumption (Memory [MB]), inference time (Time [ms]), PSNR, and SSIM at scales of ×2, ×3, and ×4. Compared to natural image super-resolution models at the scale of ×2, such as IMDN, ECBSR, OMNISR, and FMEN, DyLKANet significantly reduces parameters and computational complexity while enhancing PSNR by 0.44–0.59 dB and SSIM by 0.006–0.010. Compared to remote sensing super-resolution models at the scale of ×2, such as CTNet, FENet, HSENET, and AMFFN, DyLKANet demonstrates superior performance, increasing PSNR by 0.40–0.57 dB, SSIM by 0.005–0.009, and reducing parameters by 18.79–95.42%. At the scale of ×3, DyLKANet improves PSNR by 0.28–3.41 dB and SSIM by 0.008–0.012 compared to IMDN, ECBSR, OMNISR, and FMEN. Compared to remote sensing models such as CTNet, FENet, HSENET, and AMFFN, DyLKANet improves PSNR by 0.21–0.57 dB, SSIM by 0.004–0.012, and reduces parameters by 18.68–95.46%. At the ×4 scale, DyLKANet enhances PSNR by 0.15–0.65 dB, SSIM by 0.006–0.021, and reduces parameters by 24.63–83.24%. It also reduces computational complexity by a factor of 1.81–6.81 compared to IMDN and ECBSR. Compared to remote sensing models such as CTNet and FENet, DyLKANet achieves PSNR improvements of 0.12–0.60 dB, SSIM improvements of 0.004–0.022, and parameter reductions of 18.15–95.26%. At each super-resolution scale, the FLOPs of DyLKANet are only 12.79–17.36% of those required by ECBSR.
Table 2 presents the performance results of each model across various categories and magnification scales of the UCMerced dataset. At ×2 magnification, the DyLKANet model achieves the highest PSNR values in the chaparral, denseresidential, freeway, parkinglot, and tenniscourt categories, with improvements of 0.031 dB, 0.109 dB, 0.281 dB, 0.143 dB, and 0.018 dB in comparison to TTST. For SSIM values, DyLKANet closely matches the performance of the second-best model, indicating its strong ability to preserve image structure. At ×3 magnification, the DyLKANet model outperforms the second-best model in PSNR across all categories, with improvements of 0.093 dB in chaparral, 0.065 dB in freeway, 0.062 dB in parkinglot, 0.045 dB in tenniscourt, and 0.064 dB in denseresidential. Notably, at the ×3 scale, the DyLKANet model surpasses the AMFFN model in the denseresidential category by 0.177 dB, demonstrating its robust super-resolution reconstruction and generalization capabilities.
Figure 5 compares the reconstruction results of each model on the UCMerced dataset at scale ×4. The visual comparison clearly shows that DyLKANet excels at accurately capturing crucial high-frequency details in remote sensing images. Compared to other models, the image reconstructed by DyLKANet is closer to the real HR, with sharper edge contours and more detailed and accurate texture information. Notably, in the ×4 reconstruction of the parkinglot, only DyLKANet clearly reproduces the front and rear window details of the car.
Figure 6 presents the residual visualization results of different models in the RSISR task. In the figure, yellow and blue regions, indicated by color bars, represent the mean square error (MSE) distribution between the super-resolution and ground truth images. Yellow indicates high MSE values (large error), while blue indicates low MSE values (small error). A comparison of the DyLKANet and AMFFN in Figure 6 reveals that DyLKANet has a larger blue area in the residual map, indicating superior performance in recovering details with lower MSE values. This comparison highlights the advantages of DyLKANet in the RSISR task, demonstrating that the proposed network structure more effectively captures and recovers high-frequency details.

3.5. Results of Experiments on AID Dataset

Table 3 presents the PSNR and SSIM results for scales ×2, ×3, and ×4 on the AID dataset. DyLKANet demonstrates PSNR gains over the second-best models across all scale combinations. Specifically, at the scale of ×2, compared to IMDN, ECBSR, OMNISR, and FMEN models, DyLKANet reduces parameters and computational complexity while improving PSNR by 0.16–0.50 dB and SSIM by 0.002–0.024. Compared to remote sensing super-resolution models, such as CTNet, FENet, HSENET, and AMFFN, DyLKANet improves PSNR by 0.19–0.36 dB and SSIM by 0.003–0.004. At the scale of ×3, DyLKANet increases PSNR by 0.168–0.662 dB and SSIM by 0.004–0.074. Compared to advanced remote sensing models like CTNet, it improves PSNR by 0.16–1.50 dB and SSIM by 0.004–0.025. At the ×4 scale, DyLKANet improves PSNR by 0.10–2.83 dB and SSIM by 0.003–0.089. Compared to CTNet and similar models, it achieves PSNR gains of 0.1–0.658 dB and SSIM gains of 0.003–0.013. While DyLKANet's SSIM slightly decreases at the scale of ×4, its overall SSIM performance remains stable and comparable to or better than state-of-the-art models. These results indicate that DyLKANet performs well across various scale combinations, balancing parameter count and computational complexity while achieving comparable or superior performance to state-of-the-art models in PSNR and SSIM metrics.
Table 4 presents five typical categories from the AID dataset (bareland, desert, farmland, playground, and bridge) to compare the performance of different SR models. As shown in Table 4, DyLKANet achieves the best average performance across all evaluation metrics, demonstrating its superior SR performance in remote sensing benchmarks for all categories. At scale ×2, DyLKANet achieves leading PSNR values in most categories, with gains ranging from 0.033 dB to 0.23 dB; similarly, its SSIM gains range from 0.001 to 0.004. At scale ×3, DyLKANet achieves leading average performance across all categories, with PSNR gains ranging from 0.099 dB to 0.243 dB and SSIM gains ranging from 0.001 to 0.005. At scale ×4, DyLKANet achieves the highest PSNR values across all categories, highlighting its superior ability to recover image details. In the desert category, it achieves the largest SSIM increase of 0.004 compared to TTST.
Figure 7 presents visual comparison results for a ×4 magnification factor. The figure illustrates qualitative results from test set samples of remote sensing images to compare different models in the SR task. The figure demonstrates that DyLKANet excels in recovering image edge textures. For instance, in the ×4 reconstruction of the playground, DyLKANet accurately recovers the regular lines, rendering them clear, coherent, and consistent with the real scene. In contrast, other SR models struggle with reconstruction, producing relatively blurred image details. This comparison highlights DyLKANet’s advantage in reconstructing fine details of remote sensing images. The comparison indicates that DyLKANet-reconstructed images exhibit rich edge textures and accurately capture key high-frequency details.
Figure 8 compares residual visualizations generated by various models for the remote sensing image super-resolution task on the AID dataset. The figure shows that compared to the AMFFN model, DyLKANet exhibits a larger blue area in the residual visualization, indicating superior detail reproduction in remote sensing images and a lower MSE value. In contrast, the residual visualization of AMFFN contains more yellow regions and higher MSE values. This suggests that DyLKANet more effectively captures high-frequency details in remote-sensing images.

3.6. Results of Experiments on DIV2K Dataset

To further validate the effectiveness and generalization ability of DyLKANet, experiments were conducted on the widely used DIV2K dataset at a scale of ×4. As illustrated in Table 5, compared to HSENET, DyLKANet achieves a 95.26% reduction in parameter count. It also reduces the computational complexity by 66.27% compared to TTST. Although the inference time of DyLKANet is slightly higher than that of the RSISR model HSENET, it remains lower than that of the other five RSISR models. Specifically, DyLKANet achieves a higher PSNR (29.148 dB) and SSIM (0.819). This demonstrates the efficiency and effectiveness of DyLKANet in achieving high-quality super-resolution with a much lighter model architecture.
Figure 9 presents a visual comparison of ×4 super-resolution results on the DIV2K dataset. The image compares various image restoration models, highlighting the performance of the proposed model in restoring image details and clarity. Subjectively, the proposed model recovers more precise details of the columns and walls compared to other models, and objectively, it outperforms them in metrics such as PSNR and SSIM. This indicates the superior performance of the proposed model in image restoration tasks.

4. Ablation Experiments and Discussions

This section analyzes the impact of the key modules of DyLKANet and its variants. For this purpose, we selected the UCMerced dataset and trained all models with a ×4 scale factor to ensure the reliability of the analysis and conclusions.

4.1. Effectiveness of the Feature Distillation and Enhancement Blocks

To evaluate the effectiveness of the FDEB, a comparison experiment was conducted. In the experiment, the feature distillation stage of FDEB was replaced with three standard convolutional layers, and this modified block was used as the core component of the network. The experimental results are presented in Table 6. Analysis of the first two rows in Table 6 indicates that integrating FDEB not only enables a lighter network architecture but also significantly enhances reconstruction accuracy. Specifically, the network parameters are reduced to 257 K, while the PSNR increases by 0.08 dB and the SSIM improves by 0.023. These metric improvements demonstrate the effectiveness of FDEB in preserving image details and enhancing image quality.
To comprehensively demonstrate the impact of FDEB, a visual analysis was conducted using the local attribution map (LAM) [49]. Figure 10 illustrates that integrating FDEB increases the diffusion index (DI) value from 1.73 to 3.07. The DI metric quantifies pixel engagement, where a higher DI indicates a broader scope of attention. Furthermore, the improved SR results reinforce the effectiveness of FDEB. Collectively, these advancements underscore the superiority of FDEB.

4.2. Analysis of the Number of FDEB Modules

To fine-tune network parameters and performance, we investigate how the number of FDEBs affects overall network performance, as these blocks are critical to the architecture and directly influence the final outcome. During the evaluation, we consider both performance improvement and the constraints of parameter size (Params) and computational complexity (measured by FLOPs). As shown in Table 7, we tested varying numbers of FDEBs (from 2 to 10) and recorded the corresponding performance metrics. The results indicate that at lower values of n, the network's performance (measured by PSNR and SSIM) improves significantly, demonstrating a clear positive correlation. Performance peaks at n = 6, with the PSNR reaching its maximum value of 28.16 dB and SSIM stabilizing at a high level. Beyond n = 6, performance indicators stabilize with no significant improvement, suggesting saturation in network performance. Additionally, as the number of FDEBs increases, both parameter size and computational complexity increase proportionally. Specifically, as n increases from 2 to 10, network parameters rise from 153 K to 392 K, while computational complexity grows from 11.6 G to 27.7 G FLOPs. Given the goal of developing a lightweight RSISR technique, we prioritize configurations with fewer parameters and reduced computation when performance is comparable. After balancing performance and lightweight design, we determine n = 6 as the optimal network configuration. At n = 6, the network achieves high performance while effectively controlling parameter size and computational complexity.

4.3. Effectiveness of CFFM

To quantitatively assess the effectiveness of the CFFM, we performed a comprehensive comparative analysis with several other feature fusion structures (as shown in Figure 11), each regarded as a core network component. Table 8 presents the performance metrics of various structures, including parameters (Params), computational complexity (FLOPs), PSNR, and SSIM.
The multi-level feature fusion strategy in DyLKANet, as illustrated in Figure 11d, integrates features from different levels to enhance the quality of the reconstructed image. Compared to MSRN (Figure 11a), which employs a standard feature fusion structure, DyLKANet reduces the number of parameters by 57.67%. When compared to AMFFN (Figure 11b), which uses a multi-level feature fusion module, DyLKANet's CFFM demonstrates superior parameter efficiency and computational complexity, achieving a PSNR of 28.16 dB and an SSIM of 0.7743 with only 257 K parameters and 25.9 G FLOPs. Compared to FENet (Figure 11c), which utilizes a reverse fusion module, DyLKANet reduces the number of parameters by 21%.

4.4. Effectiveness of LK-CSA

To evaluate the LK-CSA module's contribution to model performance, we constructed two model variants: one including the LK-CSA module ("w") and one without it ("w/o"), keeping all other components and configurations identical. Table 9 indicates that incorporating the LK-CSA module reduces model parameters by approximately 17.6% (from 312 K to 257 K) while significantly improving performance. Specifically, the LK-CSA-equipped model improves PSNR by 1.54 dB (from 26.62 to 28.16) and SSIM by 0.061 (from 0.713 to 0.774). These results demonstrate the LK-CSA module's effectiveness in enhancing performance and its importance in resource-constrained environments. In particular, combining LKA and CSA may improve efficiency in feature extraction and information fusion compared to using either module alone, enhancing the model's capacity to process complex data structures.
To evaluate the effectiveness of different attention mechanisms, we conducted experiments with various combinations of attention modules. The results are presented in Table 10, which includes the number of parameters, computational complexity (FLOPs), PSNR, and SSIM for each configuration. Compared to the baseline model, the combination of ESA and CA results in a moderate reduction in parameters and computational complexity. However, its PSNR and SSIM values are slightly lower than those achieved with LKA and CSA, indicating that this combination is less effective in capturing long-range dependencies and refining feature representations. The LKA + CSA combination achieves the highest PSNR and SSIM values in the test configuration. The LKA module effectively captures long-term dependencies, while the CSA module further refines feature representations to achieve optimal overall performance.

4.5. Effectiveness of DyConv

In this section, we further analyze the advantages of the dynamic convolutional residual block (DCRB) with dynamic convolution (DyConv) compared to static convolution. The experimental results in Table 11 provide a clear comparison between models with DyConv and with static convolution.
As shown in Table 11, the model with DyConv (“w DyConv”) has 257 K parameters and 25.9 G FLOPs, while the model with static convolution (“w Conv”) has 276 K parameters and 28.1 G FLOPs. This indicates that the use of DyConv reduces the number of parameters by approximately 7.6% (from 276 K to 257 K) and the computational complexity by approximately 7.8% (from 28.1 G to 25.9 G). Despite the reduction in parameters and computational complexity, the model with DyConv demonstrates improved performance in terms of PSNR and SSIM. Specifically, the model with DyConv achieves a PSNR of 28.16 dB and an SSIM of 0.774, while the model with static convolution achieves a PSNR of 28.02 dB and an SSIM of 0.772. This indicates that DyConv not only reduces the computational overhead but also enhances the model’s ability to reconstruct high-quality images.
The advantages of DyConv can be attributed to its ability to dynamically generate convolution kernels based on the input features. This dynamic adjustment allows the model to adaptively capture the most relevant features for each input sample, leading to more efficient and effective feature extraction. In contrast, static convolution uses fixed convolution kernels that may not be optimal for all input samples, leading to redundant computations and less effective feature extraction.

4.6. Analysis of the Number of Layers of DyConv

To further analyze the impact of the number of layers in the dynamic convolutional residual block (DCRB) on the performance of DyLKANet, we conducted an ablation study with different numbers of layers (1, 5, 7, 13, 15). The results are presented in Table 12. The results indicate that the number of layers in the DyConv has a significant impact on the performance of DyLKANet. While increasing the number of layers generally improves the model’s ability to capture complex features, there is an optimal point beyond which adding more layers leads to diminishing returns or even performance degradation. The optimal number of layers appears to be around 7, where the model achieves the highest PSNR and SSIM values while maintaining a reasonable computational complexity. Further increasing the number of layers beyond this point results in decreased performance, likely due to overfitting or the increased computational burden.

4.7. Models Efficiency

To comprehensively evaluate model efficiency, we introduce inference time, memory consumption, parameters, and FLOPs as intuitive metrics and present a comparative analysis in Figure 12a–d. Figure 12a illustrates the inference times of different models, showing that DyLKANet achieves a relatively low inference time compared to other state-of-the-art models. Specifically, DyLKANet outperforms the heavily parameterized HSENET, achieving a 0.557 dB improvement in PSNR with a far smaller model. This performance enhancement is accompanied by a slight increase in inference time (approximately 21.2 ms for 100 images), which remains within an acceptable range, demonstrating DyLKANet's efficiency. Figure 12b visualizes the memory consumption of different models, indicating that DyLKANet has a moderate memory footprint, making it suitable for deployment on devices with limited memory resources. Figure 12c shows the trade-off between PSNR and the number of parameters. DyLKANet achieves a high PSNR with a relatively low number of parameters, indicating its effectiveness in balancing performance and model size. This balance is crucial for practical deployment, as it allows for high-quality super-resolution without the need for excessive computational resources. Figure 12d presents the FLOPs of different models, highlighting DyLKANet's low computational complexity. DyLKANet has a significantly lower FLOPs count compared to models like HSENET, making it more efficient in terms of computational requirements. This is particularly important for real-world applications where computational resources are limited, such as edge devices.

5. Limitations and Future Work

DyLKANet, while reducing parameters to meet the demands of lightweight applications, is able to achieve super-resolution results comparable to state-of-the-art models. Although DyLKANet has significantly reduced the number of parameters and computational complexity, there is still room for further optimization. In the future, we will leverage techniques such as pre-training and LoRA to reduce model size, while also decreasing GPU memory usage and maintaining inference efficiency during deployment. Currently, DyLKANet has been tested on visible light remote sensing images, but further optimization and improvements are needed to enhance the network’s ability to handle extremely low-resolution images and remote sensing images of other modalities. Additionally, the degradation model is one of the key issues that need to be addressed in remote sensing super-resolution. In this paper, a fixed degradation model similar to those used in most studies is still adopted, which may not fully capture the diverse degradation patterns encountered in real-world remote sensing images. Future research needs to explore more sophisticated training methods, such as self-supervised or unsupervised learning, to improve the model’s robustness and generalization ability across different degradation scenarios. Lastly, the evaluation of DyLKANet has primarily been conducted on publicly available datasets, which may not fully represent the complexity and diversity of real-world remote sensing data. Future work should include more comprehensive testing on a wider range of datasets, including those with varying levels of noise, blur, and other degradation factors, to better assess the model’s performance in practical applications.

6. Conclusions

This paper proposes a dynamic distillation network, DyLKANet, to address the challenge of remote sensing image super-resolution. The network leverages a large-kernel attention mechanism to achieve efficient feature extraction and captures global dependencies through a multi-level feature fusion strategy. Experimental results show that DyLKANet performs comparably to state-of-the-art methods on the publicly available UCMerced, AID, and DIV2K datasets, while maintaining a low parameter count and computational complexity. Specifically, on the UCMerced dataset, DyLKANet improves the PSNR by 0.212 dB and 0.151 dB over the second-best model, TTST, at the ×2 and ×4 scaling factors, respectively. At the ×2 scale, DyLKANet improves PSNR by 0.439–0.589 dB, SSIM by 0.006–0.010, and reduces parameters by 25.54–82.57% compared to natural image super-resolution models. Compared to remote sensing super-resolution models, DyLKANet improves PSNR by 0.212–0.576 dB, SSIM by 0.005–0.009, and reduces parameters by 18.79–95.46%. DyLKANet also reduces FLOPs by 7.25–67.30%. Furthermore, ablation experiments validate the effectiveness of key modules, including the FDEB, the CFFM, and dynamic convolution. Results show that these modules significantly enhance model performance while reducing parameter count and computational complexity. In conclusion, DyLKANet, a lightweight super-resolution network for remote sensing images, demonstrates significant potential in resource-constrained environments.

Author Contributions

Conceptualization, methodology, B.H. and B.W.; software, B.W.; validation, B.H., L.S. and B.W.; formal analysis, B.H., B.W. and X.M.; investigation, B.H. and Y.F.; data curation, resources, B.W.; writing—original draft preparation, B.H., B.W. and X.M.; writing—review and editing, B.H., L.S. and Y.F.; visualization, B.W.; supervision, B.H.; project administration, B.H. and Y.F.; funding acquisition, B.H. and Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Sichuan Province (No. 2023NSFSC0243), the Talent Introduction Program of Chengdu University of Information Technology (No. KYTZ202261).

Data Availability Statement

The data utilized in this study are sourced from publicly available datasets, as specified in the reference section of this paper. All datasets are open access, and detailed descriptions, including access links, are provided in the cited references.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

CAA    Context-Aware Attention Mechanism
CFFB    Context-Aware Attention-Based Feature Fusion Block
CFFM    Context-Aware Attention-Based Feature Fusion Module
CNN    Convolutional Neural Network
CSA    Channel and Spatial Attention
DCRB    Dynamic Convolutional Residual Block
DyConv    Dynamic Convolution
FDEB    Feature Distillation and Enhancement Block
HR    High-Resolution
LAM    Local Attribution Map
LKA    Large Kernel Attention
LR    Low-Resolution
PSNR    Peak Signal-to-Noise Ratio
RSISR    Remote Sensing Image Super-Resolution
SISR    Single-Image Super-Resolution
SSIM    Structural Similarity Index
ViT    Vision Transformer

References

  1. Liu, H.; Qian, Y.; Zhong, X.; Chen, L.; Yang, G. Research on super-resolution reconstruction of remote sensing images: A comprehensive review. Opt. Eng. 2021, 60, 100901. [Google Scholar] [CrossRef]
  2. Wang, Z.; Chen, J.; Hoi, S.C. Deep learning for image super-resolution: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3365–3387. [Google Scholar] [CrossRef] [PubMed]
  3. Zhu, F. A review of deep learning based image super-resolution techniques. arXiv 2022, arXiv:2201.10521. [Google Scholar] [CrossRef]
  4. Hayat, M.; Aramvith, S.; Achakulvisut, T. SEGSRNet for Stereo-Endoscopic Image Super-Resolution and Surgical Instrument Segmentation. arXiv 2024, arXiv:2404.13330v2. [Google Scholar]
  5. Ahmad, N.; Strand, R.; Sparresäter, B.; Tarai, S.; Lundström, E.; Bergström, G.; Ahlström, H.; Kullberg, J. Automatic segmentation of large-scale CT image datasets for detailed body composition analysis. BMC Bioinform. 2023, 24, 346. [Google Scholar] [CrossRef] [PubMed]
  6. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar] [CrossRef]
  7. Yang, H.; Yang, X.; Liu, K.; Jeon, G.; Zhu, C. SCN: Self-Calibration Network for fast and accurate image super-resolution. Expert Syst. Appl. 2023, 226, 120159. [Google Scholar] [CrossRef]
  8. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. pp. 391–407. [Google Scholar] [CrossRef]
  9. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar] [CrossRef]
  10. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar] [CrossRef]
  11. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar] [CrossRef]
  12. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar] [CrossRef]
  13. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2790–2798. [Google Scholar] [CrossRef]
  14. Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4809–4817. [Google Scholar] [CrossRef]
  15. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar] [CrossRef]
  16. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar] [CrossRef]
  17. Shang, T.; Dai, Q.; Zhu, S.; Yang, T.; Guo, Y. Perceptual extreme super-resolution network with receptive field block. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 440–441. [Google Scholar] [CrossRef]
  18. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.T.; Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11065–11074. [Google Scholar] [CrossRef]
  19. Zhou, Y.; Li, Z.; Guo, C.L.; Bai, S.; Cheng, M.M.; Hou, Q. SRformer: Permuted self-attention for single image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–16 October 2023; pp. 12780–12791. [Google Scholar] [CrossRef]
  20. Zhou, Y.; Wu, G.; Fu, Y.; Li, K.; Liu, Y. Cross-MPI: Cross-scale stereo for image super-resolution using multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14842–14851. [Google Scholar] [CrossRef]
  21. Son, S.; Lee, K.M. SRWARP: Generalized image super-resolution under arbitrary transformation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7782–7791. [Google Scholar] [CrossRef]
  22. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th Acm International Conference on Multimedia, New York, NY, USA, 21–25 October 2019; pp. 2024–2032. [Google Scholar] [CrossRef]
  23. Zhang, X.; Zeng, H.; Zhang, L. Edge-oriented convolution block for real-time super resolution on mobile devices. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 4034–4043. [Google Scholar] [CrossRef]
  24. Du, Z.; Liu, D.; Liu, J.; Tang, J.; Wu, G.; Fu, L. Fast and memory-efficient network towards efficient image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 853–862. [Google Scholar] [CrossRef]
  25. Wang, Y.; Jin, S.; Yang, Z.; Guan, H.; Ren, Y.; Cheng, K.; Zhao, X.; Liu, X.; Chen, M.; Liu, Y.; et al. TTSR: A transformer-based topography neural network for digital elevation model super-resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4403719. [Google Scholar] [CrossRef]
  26. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 456–465. [Google Scholar] [CrossRef]
  27. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
  28. Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Wang, Y.; Zhang, L. From degrade to upgrade: Learning a self-supervised degradation guided adaptive network for blind remote sensing image super-resolution. Inf. Fusion 2023, 96, 297–311. [Google Scholar] [CrossRef]
  29. Qin, M.; Mavromatis, S.; Hu, L.; Zhang, F.; Liu, R.; Sequeira, J.; Du, Z. Remote sensing single-image resolution improvement using a deep gradient-aware network with image-specific enhancement. Remote Sens. 2020, 12, 758. [Google Scholar] [CrossRef]
  30. Li, Q.; Yuan, Y.; Jia, X.; Wang, Q. Dual-stage approach toward hyperspectral image super-resolution. IEEE Trans. Image Process. 2022, 31, 7252–7263. [Google Scholar] [CrossRef]
  31. Dong, R.; Mou, L.; Zhang, L.; Fu, H.; Zhu, X.X. Real-world remote sensing image super-resolution via a practical degradation model and a kernel-aware network. ISPRS J. Photogramm. Remote Sens. 2022, 191, 155–170. [Google Scholar] [CrossRef]
  32. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268. [Google Scholar] [CrossRef]
  33. Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. Condconv: Conditionally parameterized convolutions for efficient inference. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2019; p. 32. [Google Scholar] [CrossRef]
  34. Wang, S.; Zhou, T.; Lu, Y.; Di, H. Contextual transformation network for lightweight remote-sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5615313. [Google Scholar] [CrossRef]
  35. Wang, Z.; Li, L.; Xue, Y.; Jiang, C.; Wang, J.; Sun, K.; Ma, H. FeNet: Feature enhancement network for lightweight remote-sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5615313. [Google Scholar] [CrossRef]
  36. Li, W.; Zhou, K.; Qi, L.; Jiang, N.; Lu, J.; Jia, J. LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Adv. Neural Inf. Process. Syst. 2020, 33, 20343–20355. [Google Scholar] [CrossRef]
  37. Qin, J.; Liu, F.; Liu, K.; Jeon, G.; Yang, X. Lightweight hierarchical residual feature fusion network for single-image super-resolution. Neurocomputing 2022, 478, 104–123. [Google Scholar] [CrossRef]
  38. Liu, F.; Yang, X.; De Baets, B. A deep recursive multi-scale feature fusion network for image super-resolution. J. Vis. Commun. Image Represent. 2023, 90, 103730. [Google Scholar] [CrossRef]
  39. Wang, H.; Cheng, S.; Li, Y.; Du, A. Lightweight Remote-Sensing Image Super-Resolution via Attention-Based Multilevel Feature Fusion Network. IEEE Trans. Geosci. Remote Sens. 2023, 61, 2005715. [Google Scholar] [CrossRef]
  40. Park, K.; Soh, J.W.; Cho, N.I. A dynamic residual self-attention network for lightweight single image super-resolution. IEEE Trans. Multimed. 2023, 25, 907–918. [Google Scholar] [CrossRef]
  41. Wang, H.; Chen, X.; Ni, B.; Liu, Y.; Liu, J. Omni aggregation networks for lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22378–22387. [Google Scholar] [CrossRef]
  42. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 649–667. [Google Scholar] [CrossRef]
  43. Xie, C.; Zhang, X.; Li, L.; Meng, H.; Zhang, T.; Li, T.; Zhao, X. Large kernel distillation network for efficient single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1283–1292. [Google Scholar] [CrossRef]
  44. Luo, P.; Xiao, G.; Gao, X.; Wu, S. LKD-Net: Large kernel convolution network for single image dehazing. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 1601–1606. [Google Scholar] [CrossRef]
  45. Choi, H.; Lee, J.; Yang, J. N-gram in swin transformers for efficient lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2071–2081. [Google Scholar] [CrossRef]
  46. Li, J.; Fang, F.; Mei, K.; Zhang, G. Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar] [CrossRef]
  47. Yang, Y.; Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 3–5 November 2010; pp. 270–279. [Google Scholar] [CrossRef]
  48. Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
  49. Gu, J.; Dong, C. Interpreting super-resolution networks with local attribution maps. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9195–9204. [Google Scholar] [CrossRef]
Figure 1. The architecture of the dynamic feature distillation and fusion network.
Figure 2. The architecture of the feature distillation and enhancement block.
Figure 3. The architecture of the dynamic convolutional residual block.
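The dynamic convolutional residual block in Figure 3 is built around dynamic convolution (DyConv), for which CondConv [33] is the canonical conditionally parameterized formulation. As a point of reference only, the PyTorch sketch below shows a minimal dynamic convolution layer that mixes K expert kernels with input-dependent routing weights; the number of experts, the routing design, and all layer names are illustrative assumptions rather than DyLKANet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Minimal CondConv-style dynamic convolution: each sample gets its own kernel,
    formed as a routed mixture of K expert kernels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_experts=4):
        super().__init__()
        self.out_ch, self.k = out_ch, kernel_size
        # K expert kernels stored together so they can be mixed by a weighted sum.
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        # Routing: global average pooling -> linear -> softmax over the K experts.
        self.route = nn.Linear(in_ch, num_experts)

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = torch.softmax(self.route(x.mean(dim=(2, 3))), dim=1)   # (B, K) routing weights
        weight = torch.einsum('bk,koihw->boihw', alpha, self.experts)  # per-sample kernels
        # Grouped-convolution trick: apply a different kernel to every sample in the batch.
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       weight.reshape(b * self.out_ch, c, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, self.out_ch, h, w)

# Quick shape check: spatial size is preserved by the padding choice.
y = DynamicConv2d(48, 48)(torch.randn(2, 48, 64, 64))
print(y.shape)  # torch.Size([2, 48, 64, 64])
```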
Figure 4. The architecture of the context-aware-attention-based feature fusion block.
Figure 5. Comparison of ×4 SR results with different models.
Figure 6. Comparison of SR results and residual maps with different models on the UCMerced dataset: (a) residual maps of agricultural; (b) residual maps of denseresidential; (c) residual maps of mobilehomepark.
Figure 7. Comparison of ×2 SR results for the playground category in the AID test set.
Figure 8. Comparison of SR results and residual maps with different models on the AID dataset: (a) residual maps of pond; (b) residual maps of farmland.
Figure 9. Comparison of ×4 SR results on the DIV2K test set.
Figure 10. Comparison of the LAM results with different settings on the UCMerced dataset at scale ×4.
Figure 11. Comparison of different feature fusion structures: (a) MSRN standard feature fusion structure; (b) AMFFN multi-level feature fusion module structure; (c) FENet reverse fusion module structure; (d) DyLKANet core fusion structure.
Figure 12. Ablation studies of memory consumption, inference time, parameters, FLops, and PSNR of DyLKANet on AID: (a) inference times of different models; (b) memory consumption of different models; (c) PSNR and parameters of different models; (d) FLops of different models.
Table 1. PSNR and SSIM results on the UCMerced test dataset for ×2, ×3, and ×4.
Model | Year | Scale | Params [K] | FLops [G] | Memory [MB] | Time [ms] | PSNR | SSIM
SRCNN | 2014 | ×2 | 69 | 7.1 | 22 | 0.187 | 31.096 | 0.893
FSRCNN | 2016 | ×2 | 25 | 6.7 | 134 | 0.407 | 33.148 | 0.914
IMDN | 2019 | ×2 | 694 | 70.7 | 70 | 5.536 | 34.051 | 0.921
ECBSR | 2021 | ×2 | 1387 | 139.9 | 252 | 10.053 | 34.061 | 0.922
OMNISR | 2023 | ×2 | 772 | 77.6 | 686 | 16.031 | 34.201 | 0.925
FMEN | 2022 | ×2 | 325 | 33.2 | 32 | 2.413 | 34.063 | 0.922
CTNet | 2021 | ×2 | 349 | 26.2 | 252 | 8.509 | 34.066 | 0.922
FENet | 2022 | ×2 | 351 | 34.3 | 4689 | 4.258 | 34.125 | 0.922
HSENET | 2022 | ×2 | 5290 | 16.7 | 1816 | 7.352 | 34.221 | 0.926
AMFFN | 2023 | ×2 | 298 | 26.8 | 362 | 12.938 | 34.232 | 0.926
TransENet | 2022 | ×2 | 37,311 | 9.7 | 8520 | 37.311 | 34.064 | 0.924
TTST | 2024 | ×2 | 18,156 | 74.3 | 3480 | 113.442 | 34.428 | 0.925
Ours |  | ×2 | 242 | 24.3 | 586 | 14.804 | 34.640 | 0.931
SRCNN | 2014 | ×3 | 69 | 7.1 | 22 | 0.157 | 27.082 | 0.773
FSRCNN | 2016 | ×3 | 25 | 13.7 | 136 | 0.391 | 29.035 | 0.814
IMDN | 2019 | ×3 | 703 | 71.5 | 70 | 5.798 | 30.027 | 0.838
ECBSR | 2021 | ×3 | 1571 | 159.6 | 252 | 10.975 | 30.020 | 0.836
OMNISR | 2023 | ×3 | 780 | 78.5 | 686 | 15.240 | 30.212 | 0.840
FMEN | 2022 | ×3 | 332 | 33.9 | 32 | 2.227 | 29.923 | 0.836
CTNet | 2021 | ×3 | 349 | 36.6 | 252 | 9.818 | 30.003 | 0.837
FENet | 2022 | ×3 | 357 | 35.0 | 4689 | 4.269 | 29.920 | 0.836
HSENET | 2022 | ×3 | 5470 | 17.4 | 1814 | 6.403 | 30.281 | 0.842
AMFFN | 2023 | ×3 | 305 | 27.4 | 362 | 14.384 | 30.281 | 0.844
TransENet | 2022 | ×3 | 37,495 | 14.3 | 432 | 37.495 | 29.984 | 0.837
TTST | 2024 | ×3 | 18,341 | 75.1 | 3476 | 113.599 | 30.291 | 0.843
Ours |  | ×3 | 248 | 25.0 | 588 | 15.052 | 30.498 | 0.848
SRCNN | 2014 | ×4 | 69 | 7.1 | 22 | 0.149 | 26.302 | 0.697
FSRCNN | 2016 | ×4 | 25 | 23.4 | 136 | 0.430 | 28.102 | 0.721
IMDN | 2019 | ×4 | 715 | 72.8 | 70 | 5.773 | 27.727 | 0.758
ECBSR | 2021 | ×4 | 1534 | 202.4 | 254 | 12.490 | 27.510 | 0.755
OMNISR | 2023 | ×4 | 792 | 79.7 | 688 | 15.991 | 28.010 | 0.768
FMEN | 2022 | ×4 | 341 | 34.8 | 34 | 2.642 | 27.572 | 0.753
CTNet | 2021 | ×4 | 360 | 44.9 | 254 | 10.103 | 27.654 | 0.756
FENet | 2022 | ×4 | 366 | 35.9 | 4709 | 6.238 | 27.559 | 0.752
HSENET | 2022 | ×4 | 5430 | 19.2 | 1816 | 6.276 | 27.607 | 0.754
AMFFN | 2023 | ×4 | 314 | 28.4 | 364 | 26.455 | 28.037 | 0.770
TransENet | 2022 | ×4 | 37,458 | 21.4 | 3722 | 37.458 | 27.882 | 0.764
TTST | 2024 | ×4 | 18,304 | 76.8 | 3480 | 112.150 | 28.013 | 0.771
Ours |  | ×4 | 257 | 25.9 | 588 | 16.526 | 28.164 | 0.774

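For reference, the PSNR figures reported in Tables 1–5 follow the standard peak signal-to-noise ratio definition, PSNR = 10·log10(MAX²/MSE). The minimal NumPy sketch below computes it for 8-bit images; it assumes full-frame RGB evaluation, whereas the paper's exact protocol (e.g., Y-channel conversion or border cropping) may differ.

```python
import numpy as np

def psnr(sr: np.ndarray, hr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a super-resolved image and its ground truth."""
    sr = sr.astype(np.float64)
    hr = hr.astype(np.float64)
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Example with random 8-bit images (real use would load an SR output and its HR reference).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
print(psnr(a, a), psnr(a, 255 - a))
```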
Table 2. PSNR and SSIM results of different methods on 5 categories on the UCMerced dataset at scale ×2, ×3, and ×4.
Model | Scale | Chaparral (PSNR/SSIM) | Denseresidential (PSNR/SSIM) | Freeway (PSNR/SSIM) | Parkinglot (PSNR/SSIM) | Tenniscourt (PSNR/SSIM)
SRCNN | ×2 | 31.215/0.907 | 28.814/0.913 | 32.974/0.922 | 24.677/0.873 | 29.468/0.871
FSRCNN | ×2 | 31.986/0.916 | 30.498/0.919 | 33.105/0.913 | 25.686/0.884 | 30.036/0.875
IMDN | ×2 | 33.213/0.933 | 33.412/0.949 | 37.487/0.956 | 28.213/0.923 | 32.891/0.929
ECBSR | ×2 | 32.851/0.930 | 32.523/0.942 | 36.683/0.950 | 27.559/0.913 | 31.693/0.918
OMNISR | ×2 | 33.212/0.934 | 33.324/0.948 | 37.343/0.954 | 28.019/0.922 | 32.861/0.928
FMEN | ×2 | 33.171/0.933 | 33.183/0.948 | 36.875/0.949 | 27.960/0.921 | 32.678/0.926
CTNet | ×2 | 33.225/0.934 | 33.470/0.950 | 37.419/0.955 | 27.993/0.921 | 32.830/0.928
FENet | ×2 | 33.215/0.934 | 33.364/0.948 | 37.445/0.957 | 27.918/0.920 | 32.895/0.929
HSENET | ×2 | 33.231/0.935 | 33.554/0.950 | 37.444/0.957 | 28.283/0.926 | 33.000/0.930
AMFFN | ×2 | 33.236/0.934 | 33.505/0.949 | 37.539/0.957 | 28.168/0.924 | 33.047/0.930
TransENet | ×2 | 33.113/0.932 | 33.413/0.953 | 37.416/0.959 | 27.973/0.922 | 33.205/0.932
TTST | ×2 | 33.316/0.933 | 33.614/0.955 | 37.502/0.957 | 28.294/0.924 | 33.394/0.934
Ours | ×2 | 33.347/0.935 | 33.723/0.954 | 37.783/0.961 | 28.437/0.929 | 33.412/0.935
SRCNN | ×3 | 27.783/0.812 | 24.126/0.788 | 28.216/0.819 | 20.916/0.729 | 25.955/0.730
FSRCNN | ×3 | 29.346/0.848 | 26.339/0.821 | 29.501/0.840 | 22.034/0.760 | 26.742/0.749
IMDN | ×3 | 30.471/0.876 | 28.382/0.871 | 32.112/0.879 | 23.855/0.825 | 28.946/0.837
ECBSR | ×3 | 30.256/0.871 | 27.992/0.865 | 31.444/0.867 | 21.939/0.788 | 28.318/0.819
OMNISR | ×3 | 30.434/0.876 | 28.256/0.869 | 31.833/0.876 | 23.778/0.823 | 28.804/0.835
FMEN | ×3 | 30.439/0.876 | 28.135/0.868 | 31.565/0.870 | 23.667/0.820 | 28.817/0.834
CTNet | ×3 | 30.515/0.878 | 28.292/0.872 | 31.956/0.877 | 23.632/0.820 | 28.846/0.835
FENet | ×3 | 30.487/0.876 | 28.154/0.867 | 31.963/0.880 | 23.640/0.818 | 28.872/0.835
HSENET | ×3 | 30.489/0.876 | 28.415/0.872 | 32.161/0.881 | 24.022/0.831 | 28.960/0.841
AMFFN | ×3 | 30.548/0.879 | 28.516/0.874 | 32.198/0.883 | 24.071/0.832 | 29.086/0.843
TransENet | ×3 | 30.337/0.873 | 28.507/0.871 | 32.198/0.884 | 23.752/0.825 | 29.170/0.842
TTST | ×3 | 30.630/0.880 | 28.629/0.874 | 32.311/0.885 | 24.184/0.830 | 29.242/0.844
Ours | ×3 | 30.723/0.883 | 28.693/0.878 | 32.376/0.887 | 24.246/0.836 | 29.287/0.847
SRCNN | ×4 | 25.162/0.695 | 21.376/0.657 | 25.531/0.707 | 18.788/0.588 | 24.021/0.618
FSRCNN | ×4 | 27.396/0.776 | 23.449/0.705 | 26.713/0.743 | 20.022/0.640 | 25.136/0.650
IMDN | ×4 | 28.327/0.809 | 25.223/0.781 | 28.771/0.790 | 21.368/0.728 | 26.786/0.751
ECBSR | ×4 | 28.256/0.809 | 24.879/0.767 | 28.486/0.787 | 21.003/0.708 | 26.342/0.722
OMNISR | ×4 | 28.412/0.811 | 25.297/0.783 | 28.501/0.788 | 21.239/0.722 | 26.670/0.746
FMEN | ×4 | 28.338/0.807 | 25.090/0.777 | 28.388/0.783 | 20.990/0.708 | 26.555/0.735
CTNet | ×4 | 28.439/0.814 | 25.343/0.788 | 28.570/0.791 | 21.104/0.718 | 26.732/0.748
FENet | ×4 | 28.391/0.809 | 25.031/0.773 | 28.680/0.792 | 21.044/0.712 | 26.711/0.745
HSENET | ×4 | 28.370/0.809 | 25.184/0.779 | 28.769/0.793 | 21.362/0.729 | 26.818/0.755
AMFFN | ×4 | 28.492/0.816 | 25.451/0.789 | 28.867/0.799 | 21.625/0.740 | 26.904/0.755
TransENet | ×4 | 28.384/0.811 | 25.426/0.789 | 28.839/0.792 | 21.701/0.743 | 26.910/0.757
TTST | ×4 | 28.564/0.814 | 25.489/0.793 | 28.983/0.801 | 21.737/0.744 | 27.028/0.755
Ours | ×4 | 28.626/0.819 | 25.585/0.793 | 28.992/0.804 | 21.751/0.745 | 27.038/0.759

Table 3. PSNR and SSIM results of different methods on the AID test dataset at scale ×2, ×3, and ×4.
Model | Year | Scale | Params [K] | FLops [G] | Memory [MB] | Time [ms] | PSNR | SSIM
SRCNN | 2014 | ×2 | 69 | 7.1 | 22 | 0.187 | 32.326 | 0.906
FSRCNN | 2016 | ×2 | 25 | 6.7 | 134 | 0.407 | 34.145 | 0.924
IMDN | 2019 | ×2 | 694 | 70.7 | 70 | 5.536 | 34.921 | 0.928
ECBSR | 2021 | ×2 | 1387 | 139.9 | 252 | 10.053 | 35.001 | 0.933
OMNISR | 2023 | ×2 | 772 | 77.6 | 686 | 16.031 | 35.247 | 0.936
FMEN | 2022 | ×2 | 325 | 33.2 | 32 | 2.413 | 34.912 | 0.914
CTNet | 2021 | ×2 | 349 | 26.2 | 252 | 8.509 | 35.110 | 0.934
FENet | 2022 | ×2 | 351 | 34.3 | 4689 | 4.258 | 35.082 | 0.934
HSENET | 2022 | ×2 | 5290 | 16.7 | 1816 | 7.352 | 35.050 | 0.934
AMFFN | 2023 | ×2 | 298 | 26.8 | 362 | 12.938 | 35.214 | 0.935
TransENet | 2022 | ×2 | 37,311 | 9.7 | 8520 | 37.311 | 35.311 | 0.935
TTST | 2024 | ×2 | 18,156 | 74.3 | 3480 | 113.442 | 35.282 | 0.936
Ours |  | ×2 | 242 | 24.3 | 586 | 14.804 | 35.412 | 0.938
SRCNN | 2014 | ×3 | 69 | 7.1 | 22 | 0.157 | 28.214 | 0.785
FSRCNN | 2016 | ×3 | 25 | 13.7 | 136 | 0.391 | 30.183 | 0.832
IMDN | 2019 | ×3 | 703 | 71.5 | 70 | 5.798 | 30.969 | 0.835
ECBSR | 2021 | ×3 | 1571 | 159.6 | 252 | 10.975 | 30.726 | 0.843
OMNISR | 2023 | ×3 | 780 | 78.5 | 686 | 15.240 | 31.190 | 0.855
FMEN | 2022 | ×3 | 332 | 33.9 | 32 | 2.227 | 30.747 | 0.845
CTNet | 2021 | ×3 | 349 | 36.6 | 252 | 9.818 | 30.885 | 0.849
FENet | 2022 | ×3 | 357 | 35.0 | 4689 | 4.269 | 29.857 | 0.834
HSENET | 2022 | ×3 | 5470 | 17.4 | 1814 | 6.403 | 30.959 | 0.851
AMFFN | 2023 | ×3 | 305 | 27.4 | 362 | 14.384 | 31.194 | 0.855
TransENet | 2022 | ×3 | 37,495 | 14.3 | 432 | 37.495 | 30.700 | 0.844
TTST | 2024 | ×3 | 18,341 | 75.1 | 3476 | 113.599 | 31.171 | 0.850
Ours |  | ×3 | 248 | 25.0 | 588 | 15.052 | 31.362 | 0.859
SRCNN | 2014 | ×4 | 69 | 7.1 | 22 | 0.149 | 26.302 | 0.697
FSRCNN | 2016 | ×4 | 25 | 23.4 | 136 | 0.430 | 28.102 | 0.749
IMDN | 2019 | ×4 | 715 | 72.8 | 70 | 5.773 | 28.730 | 0.762
ECBSR | 2021 | ×4 | 1534 | 202.4 | 254 | 12.490 | 28.491 | 0.766
OMNISR | 2023 | ×4 | 792 | 79.7 | 688 | 15.991 | 29.006 | 0.782
FMEN | 2022 | ×4 | 341 | 34.8 | 34 | 2.642 | 28.515 | 0.765
CTNet | 2021 | ×4 | 360 | 44.9 | 254 | 10.103 | 28.814 | 0.776
FENet | 2022 | ×4 | 366 | 35.9 | 4709 | 6.238 | 28.761 | 0.773
HSENET | 2022 | ×4 | 5430 | 19.2 | 1816 | 6.276 | 28.743 | 0.773
AMFFN | 2023 | ×4 | 314 | 28.4 | 364 | 26.455 | 29.034 | 0.783
TransENet | 2022 | ×4 | 37,458 | 21.4 | 3722 | 37.458 | 28.756 | 0.774
TTST | 2024 | ×4 | 18,304 | 76.8 | 3480 | 112.150 | 28.476 | 0.778
Ours |  | ×4 | 257 | 25.9 | 588 | 16.526 | 29.134 | 0.786

Table 4. PSNR and SSIM results for 5 categories in the AID dataset using different methods, including ×2, ×3, and ×4.
Model | Scale | Bareland (PSNR/SSIM) | Desert (PSNR/SSIM) | Farmland (PSNR/SSIM) | Playground (PSNR/SSIM) | Bridge (PSNR/SSIM)
SRCNN | ×2 | 38.851/0.894 | 43.347/0.967 | 35.824/0.911 | 30.496/0.880 | 32.831/0.910
FSRCNN | ×2 | 39.421/0.899 | 44.032/0.971 | 37.167/0.926 | 32.310/0.900 | 36.417/0.945
IMDN | ×2 | 40.645/0.902 | 44.901/0.976 | 38.121/0.939 | 33.765/0.923 | 37.559/0.954
ECBSR | ×2 | 40.788/0.905 | 45.128/0.977 | 38.428/0.946 | 34.247/0.929 | 37.833/0.955
OMNISR | ×2 | 40.698/0.903 | 44.895/0.976 | 38.115/0.939 | 33.797/0.923 | 37.567/0.954
FMEN | ×2 | 40.696/0.903 | 44.866/0.975 | 38.081/0.937 | 33.843/0.923 | 37.588/0.954
CTNet | ×2 | 40.782/0.905 | 45.153/0.977 | 38.270/0.941 | 33.922/0.925 | 37.674/0.955
FENet | ×2 | 40.725/0.903 | 45.108/0.977 | 38.333/0.942 | 33.855/0.924 | 37.594/0.954
HSENET | ×2 | 40.788/0.906 | 45.058/0.975 | 38.263/0.935 | 33.996/0.919 | 37.558/0.948
AMFFN | ×2 | 40.789/0.909 | 45.147/0.975 | 38.392/0.939 | 34.012/0.926 | 37.559/0.949
TransENet | ×2 | 39.426/0.905 | 39.801/0.974 | 38.104/0.936 | 33.733/0.916 | 37.341/0.947
TTST | ×2 | 40.997/0.913 | 45.142/0.976 | 38.449/0.940 | 34.177/0.925 | 37.640/0.950
Ours | ×2 | 40.947/0.912 | 45.382/0.978 | 38.482/0.944 | 34.352/0.923 | 37.943/0.951
SRCNN | ×3 | 31.885/0.810 | 39.970/0.927 | 32.610/0.822 | 27.073/0.761 | 29.024/0.810
FSRCNN | ×3 | 32.759/0.824 | 40.738/0.937 | 33.691/0.845 | 28.703/0.793 | 32.270/0.881
IMDN | ×3 | 32.976/0.828 | 41.290/0.942 | 34.635/0.869 | 29.932/0.833 | 33.265/0.895
ECBSR | ×3 | 33.045/0.830 | 41.547/0.945 | 34.104/0.881 | 30.520/0.847 | 33.488/0.899
OMNISR | ×3 | 32.925/0.829 | 41.265/0.942 | 34.631/0.869 | 29.973/0.834 | 33.251/0.895
FMEN | ×3 | 32.904/0.826 | 41.249/0.942 | 34.642/0.868 | 30.040/0.836 | 33.292/0.895
CTNet | ×3 | 32.616/0.822 | 41.095/0.941 | 34.143/0.858 | 29.185/0.812 | 32.837/0.888
FENet | ×3 | 33.011/0.830 | 41.524/0.945 | 34.771/0.873 | 29.975/0.835 | 33.297/0.895
HSENET | ×3 | 38.474/0.931 | 41.539/0.943 | 34.903/0.867 | 30.377/0.830 | 33.421/0.885
AMFFN | ×3 | 38.465/0.931 | 41.579/0.943 | 34.889/0.870 | 30.463/0.833 | 33.451/0.897
TransENet | ×3 | 38.272/0.929 | 41.048/0.941 | 34.753/0.864 | 30.317/0.827 | 33.382/0.884
TTST | ×3 | 38.334/0.932 | 41.446/0.939 | 34.852/0.871 | 30.418/0.841 | 33.441/0.896
Ours | ×3 | 38.593/0.932 | 41.678/0.944 | 35.144/0.876 | 30.766/0.839 | 33.731/0.898
SRCNN | ×4 | 32.038/0.833 | 38.415/0.896 | 31.321/0.764 | 25.499/0.675 | 27.198/0.734
FSRCNN | ×4 | 34.723/0.874 | 39.157/0.908 | 32.152/0.786 | 26.900/0.708 | 29.916/0.815
IMDN | ×4 | 35.390/0.882 | 39.588/0.913 | 33.028/0.814 | 27.879/0.753 | 30.846/0.835
ECBSR | ×4 | 35.520/0.885 | 39.825/0.917 | 33.381/0.827 | 28.423/0.774 | 31.112/0.841
OMNISR | ×4 | 35.360/0.882 | 39.081/0.911 | 32.874/0.808 | 27.739/0.746 | 30.614/0.827
FMEN | ×4 | 35.381/0.882 | 39.550/0.912 | 32.963/0.810 | 27.911/0.755 | 30.827/0.835
CTNet | ×4 | 35.156/0.879 | 38.917/0.909 | 32.352/0.792 | 27.193/0.723 | 30.182/0.817
FENet | ×4 | 35.619/0.885 | 39.813/0.917 | 33.111/0.818 | 27.900/0.755 | 30.841/0.836
HSENET | ×4 | 35.651/0.879 | 39.839/0.917 | 33.334/0.819 | 31.027/0.826 | 31.038/0.826
AMFFN | ×4 | 35.639/0.885 | 39.907/0.918 | 33.383/0.821 | 28.530/0.764 | 31.054/0.844
TransENet | ×4 | 35.591/0.879 | 39.363/0.916 | 33.354/0.822 | 28.613/0.766 | 31.274/0.831
TTST | ×4 | 35.652/0.885 | 39.852/0.915 | 33.254/0.816 | 28.589/0.763 | 30.967/0.833
Ours | ×4 | 35.724/0.888 | 39.988/0.922 | 33.497/0.824 | 28.621/0.767 | 31.383/0.847

Table 5. PSNR and SSIM results of different methods on the DIV2K test dataset at scale ×4.
Model | Year | Scale | Params [K] | FLops [G] | Memory [MB] | Time [ms] | PSNR | SSIM
SRCNN | 2014 | ×4 | 69 | 7.1 | 22 | 0.149 | 26.829 | 0.762
FSRCNN | 2016 | ×4 | 25 | 23.4 | 136 | 0.430 | 27.389 | 0.776
IMDN | 2019 | ×4 | 715 | 72.8 | 70 | 5.773 | 28.580 | 0.808
ECBSR | 2021 | ×4 | 1534 | 202.4 | 254 | 12.490 | 29.007 | 0.816
OMNISR | 2023 | ×4 | 792 | 79.7 | 688 | 15.991 | 28.169 | 0.804
FMEN | 2022 | ×4 | 341 | 34.8 | 34 | 2.642 | 28.410 | 0.804
CTNet | 2021 | ×4 | 360 | 44.9 | 254 | 10.103 | 28.078 | 0.792
FENet | 2022 | ×4 | 366 | 35.9 | 4709 | 6.238 | 28.646 | 0.809
HSENET | 2022 | ×4 | 5430 | 19.2 | 1816 | 6.276 | 28.754 | 0.815
AMFFN | 2023 | ×4 | 314 | 28.4 | 364 | 26.455 | 28.504 | 0.806
TransENet | 2022 | ×4 | 37,458 | 21.4 | 3722 | 37.458 | 28.379 | 0.804
TTST | 2024 | ×4 | 18,304 | 76.8 | 3480 | 112.150 | 28.732 | 0.808
Ours |  | ×4 | 257 | 25.9 | 588 | 16.526 | 29.148 | 0.819

Table 6. Effectiveness of FDEB on the UCMerced test dataset for ×4.
Settings | FDEB | CFFM | Params [K] | FLops [G] | PSNR | SSIM
1 | × | × | 412 | 42.3 | 27.04 | 0.733
2 | ✓ | × | 317 | 28.5 | 28.05 | 0.748
3 | × | ✓ | 431 | 41.4 | 28.08 | 0.751
4 | ✓ | ✓ | 257 | 25.9 | 28.16 | 0.774

Table 7. Performance results with different settings of n on the UCMerced dataset at scale ×4.
Settings | Params [K] | FLops [G] | PSNR | SSIM
2 | 153 | 11.6 | 27.96 | 0.770
4 | 187 | 19.7 | 28.01 | 0.771
6 | 257 | 25.9 | 28.16 | 0.774
8 | 289 | 27.1 | 28.14 | 0.772
10 | 392 | 27.7 | 28.10 | 0.771

Table 8. Performance results with different fusion structures on the UCMerced test dataset for ×4 SR.
Variant | Params [K] | FLops [G] | PSNR | SSIM
a | 607 | 26.8 | 27.44 | 0.7645
b | 314 | 28.4 | 28.03 | 0.7701
c | 366 | 35.9 | 27.56 | 0.7526
d | 257 | 25.9 | 28.16 | 0.7743

Table 9. Performance results with different settings on the UCMerced test dataset for ×4 SR.
Settings | Params [K] | FLops [G] | PSNR | SSIM
w LKA + CSA | 257 | 25.9 | 28.16 | 0.774
w LKA | 239 | 26.1 | 27.57 | 0.770
w CSA | 232 | 26.7 | 27.84 | 0.770
w/o LKA + CSA | 230 | 24.6 | 26.62 | 0.713

Table 10. Performance results with different attentions on the UCMerced test dataset for ×4 SR.
Attentions | Params [K] | FLops [G] | PSNR | SSIM
w ESA + CA | 265 | 27.5 | 27.89 | 0.770
w ESA + LKA | 255 | 26.5 | 28.02 | 0.770
w ESA + CSA | 257 | 25.9 | 28.14 | 0.774
w CA + LKA | 253 | 26.0 | 28.05 | 0.771
w LKA + CSA | 257 | 25.9 | 28.16 | 0.774

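Tables 9 and 10 indicate that the combination of large-kernel attention (LKA) and CSA gives the best accuracy-complexity trade-off. For readers unfamiliar with LKA, the PyTorch sketch below shows a generic decomposed large-kernel attention gate in the spirit of the large-kernel designs in [43,44] (a depthwise convolution, a dilated depthwise convolution, and a 1×1 convolution used to modulate the input); the kernel sizes, dilation, and channel count are illustrative assumptions rather than DyLKANet's exact settings.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Generic decomposed large-kernel attention: DW conv + dilated DW conv + 1x1 conv,
    applied as a multiplicative gate over the input features."""
    def __init__(self, channels, dw_kernel=5, dilated_kernel=7, dilation=3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, dw_kernel,
                            padding=dw_kernel // 2, groups=channels)
        self.dw_dilated = nn.Conv2d(channels, channels, dilated_kernel,
                                    padding=dilation * (dilated_kernel // 2),
                                    dilation=dilation, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))  # large effective receptive field
        return x * attn                              # gate the input features

# Quick shape check: the gate preserves the feature-map size.
feats = torch.randn(1, 48, 64, 64)
print(LargeKernelAttention(48)(feats).shape)  # torch.Size([1, 48, 64, 64])
```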
Table 11. Quantitative comparison of different convolution decomposition approaches.
Settings | Params [K] | FLops [G] | PSNR | SSIM
w DyConv | 257 | 25.9 | 28.16 | 0.774
w Conv | 276 | 28.1 | 28.02 | 0.772

Table 12. Performance results with different numbers of layers in DyConv on the UCMerced test dataset for ×4 SR.
Number of Layers | Params [K] | FLops [G] | PSNR | SSIM
1 | 230 | 23.8 | 27.66 | 0.763
5 | 276 | 24.6 | 27.97 | 0.769
7 | 257 | 25.9 | 28.16 | 0.774
13 | 320 | 28.1 | 28.12 | 0.772
15 | 345 | 32.5 | 28.07 | 0.770

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
