Article

LSANet: Lightweight Super Resolution via Large Separable Kernel Attention for Edge Remote Sensing

School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin 644000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(13), 7497; https://doi.org/10.3390/app15137497
Submission received: 8 May 2025 / Revised: 25 June 2025 / Accepted: 28 June 2025 / Published: 3 July 2025

Abstract

In recent years, remote sensing imagery has become indispensable for applications such as environmental monitoring, land use classification, and urban planning. However, the physical constraints of satellite imaging systems frequently limit the spatial resolution of these images, impeding the extraction of fine-grained information critical to downstream tasks. Super-resolution (SR) techniques thus emerge as a pivotal solution to enhance the spatial fidelity of remote sensing images via computational approaches. While deep learning-based SR methods have advanced reconstruction accuracy, their high computational complexity and large parameter counts restrict practical deployment in real-world remote sensing scenarios, particularly on edge or low-power devices. To address this gap, we propose LSANet, a lightweight SR network customized for remote sensing imagery. The core of LSANet is the large separable kernel attention mechanism, which efficiently expands the receptive field while retaining low computational overhead. By integrating this mechanism into an enhanced residual feature distillation module, the network captures long-range dependencies more effectively than traditional shallow residual blocks. Additionally, a residual feature enhancement module, leveraging contrast-aware channel attention and hierarchical skip connections, strengthens the extraction and integration of multi-level discriminative features. This design preserves fine textures and ensures smooth information propagation across the network. Extensive experiments on public datasets such as UC Merced Land Use and NWPU-RESISC45 demonstrate LSANet's competitive or superior performance compared to state-of-the-art methods. On the UC Merced Land Use dataset, LSANet achieves a PSNR of 34.33 dB, outperforming the strongest baseline, HSENet (34.23 dB), by 0.10 dB. For SSIM, LSANet reaches 0.9328, closely matching HSENet's 0.9332 while maintaining a favorable balance between the two metrics. On the NWPU-RESISC45 dataset, LSANet attains a PSNR of 35.02 dB, a significant improvement over prior methods, and an SSIM of 0.9305, maintaining strong competitiveness. These results, combined with the notable reduction in parameters and floating-point operations, highlight the superiority of LSANet in remote sensing image super-resolution tasks.

1. Introduction

Remote sensing imagery constitutes the fundamental data source for geospatial analysis. High-resolution observations, in particular, have proven highly valuable across diverse domains, including ecosystem assessment, land use mapping, and smart city development [1]. In the realm of remote sensing, satellite-captured imagery often suffers from low resolution. This is attributable to influences like orbital distance, atmospheric interference, and satellite motion during data acquisition [2], posing substantial hurdles for subsequent interpretation tasks. While upgrading satellite imaging hardware could enhance resolution, such an approach incurs exorbitant costs. In contrast to hardware overhauls requiring massive capital outlays, super resolution offers a more economical means of obtaining high-spatial-resolution remote sensing data.
In recent years, deep learning has achieved remarkable advancements in super-resolution tasks. Notably, convolutional neural networks, leveraging their powerful feature-learning capabilities, have seen widespread adoption across diverse fields such as remote sensing, medical imaging, and computer vision. Dong et al. pioneered the super-resolution convolutional neural network (SRCNN) [3], enabling end-to-end super-resolution reconstruction within a three-layer shallow CNN architecture. Relative to traditional methods rooted in interpolation or dictionary learning, its reconstruction quality improved markedly. Subsequently, Kim et al. introduced very deep super resolution (VDSR) [4], which further bolstered SR performance by deepening the network to 20 layers and incorporating residual learning. GoogLeNet [5] devised a multi-scale Inception structure, employing convolution kernels of varying sizes to extract features at different scales and thereby making efficient use of computational resources; however, its generalization ability is limited, so task-specific adjustments are required. Nevertheless, the rapid growth in computational complexity and parameter count inherent to deep networks has emerged as a pivotal factor impeding their broader application.
In order to improve the expressiveness of SR models and reduce computational costs, researchers have explored network structure optimization: for example, enhanced deep residual networks (EDSR) [6] achieved notable performance gains on benchmark datasets by removing batch normalization layers and increasing network depth, while residual dense network (RDN) [7] adopted a dense connection strategy to fully reuse features across different layers and facilitate information flow transmission. Very deep residual channel attention networks (RCAN) [8] further introduced channel attention based on the RDN structure, enabling adaptive adjustment of channel weights to enhance the model’s focus on high-frequency detail information. However, the architectural design of these models involves a large number of parameters, leading to excessively high computational overhead in practical deployment scenarios. To address this, lightweight SR networks have emerged as a key research direction in recent years: CARN [9] employed a recursive architecture and cascade feature refinement to reduce parameter counts without compromising reconstruction quality, while the information multi-distillation network (IMDN) [10] significantly cut down on computation by grouping features and fusing information through multiple approaches—still delivering performance competitive with RDN. The residual feature distillation network (RFDN) [11] further optimized IMDN’s structure to boost inference speed and parameter utilization, and MAFFSRN [12] reduced computational redundancy via feature filtering to lower complexity while retaining high accuracy. In computer vision, large-kernel convolution has attracted extensive attention, with designs like RepLKNet [13] and ConvNeXt [14] effectively reducing computational complexity while preserving model expressiveness. Inspired by this, some researchers have attempted to introduce large-kernel convolution into SR tasks. For example, large kernel attention (LKA) [15] boosts the capability to model long-range dependencies via the large kernel attention mechanism. For remote sensing scenarios, existing lightweight SR networks face critical limitations: First, they lack sufficient capability to model intricate details like complex textures and edges in remote sensing images, failing to meet the demand for fine-grained information in remote sensing interpretation. Second, their network architectures are inefficient in fusing multi-scale and heterogeneous features of remote sensing data, which hinders feature representation and utilization. Third, the balance between lightness and performance improvement is suboptimal; in practical remote sensing deployment scenarios such as edge computing, the computational and storage costs still need optimization. Thus, designing an efficient, lightweight remote sensing image super-resolution network with long-range information modeling capability has become a crucial problem that demands immediate solution.
Building upon existing lightweight module architectures, this paper proposes a large separable kernel attention network (LSANet). The model enhances the RFDN framework by replacing the original SRB with the large separable kernel attention (LSKA) module, which significantly expands the network’s receptive field and strengthens its capability to capture both global and local contextual information. Additionally, a novel residual large kernel separable feature distillation module is designed, integrating contrast-aware channel attention, LSKA, and multi-layer skip connections to further optimize feature extraction and fusion efficiency. The primary contributions of this model are as follows:
  • Breaking through module design bottlenecks: The large separable kernel attention mechanism (LSKA) is innovatively introduced into the feature distillation network, and the residual feature distillation module (RLSKFDM) is proposed. By constructing a large-receptive-field attention mechanism, the modeling of complex textures and edge information in remote sensing images is strengthened, breaking through the limitation of insufficient representation ability of shallow residual blocks and providing a more effective feature processing path for the detail restoration of remote sensing images;
  • Innovation in multi-dimensional feature enhancement: A residual feature enhancement module (RFEM) that fuses multiple attention mechanisms is designed. It integrates the contrast-aware channel attention (CCA) and the large separable kernel attention (LSKA) and combines with a multi-level skip connection structure. It accurately strengthens local feature expression and multi-scale information fusion, realizes the efficient coupling of shallow details and deep semantics, and meets the interpretation requirements of multi-dimensional features of remote sensing images;
  • Synergistic optimization of lightness and performance: While improving the reconstruction performance, through the design of lightweight modules and the optimization of the feature flow mechanism, LSANet can greatly reduce the number of parameters and computational overhead while maintaining a competitive restoration effect. An efficient model suitable for edge computing and low-resource remote sensing scenarios is created, solving the “performance - cost” imbalance problem of lightweight networks in the actual application of remote sensing;
  • Verification for remote sensing scenarios: Systematic experiments are carried out on two representative remote sensing image datasets, NWPU-RESISC45 and UC Merced Land Use. The advantages of LSANet in image reconstruction quality, model efficiency, and visual performance are fully verified. Covering typical application data scenarios of remote sensing, it fully demonstrates the generality and practical value of the model and provides strong technical support for the implementation of remote sensing super resolution.

2. Related Works

2.1. Attention Mechanism

The attention mechanism has gained widespread adoption in deep learning tasks owing to its exceptional capability to capture long-range contextual information, emerging as a pivotal component in computer vision [16,17]. The squeeze-and-excitation (SE) network [18] pioneered channel attention by modeling inter-channel dependencies to recalibrate feature maps. Building on this, CBAM [19] integrated spatial and channel attention modules to further optimize feature representation. Non-local neural networks [20] introduced global feature learning, enabling more effective modeling of long-range dependencies. In super resolution (SR), attention mechanisms have significantly enhanced reconstruction quality: RCAN [8] used residual channel attention blocks (RCAB) to refine high-frequency details, while SAN [21] leveraged second-order channel attention (SOCA) to exploit feature statistical information. SwinIR [22] adopted the Swin Transformer architecture with W-MSA to model local–global feature dependencies, and CAS-ViT [23] combined CNN and Transformer structures with cross-scale channel attention (CCSA) for efficiency–performance balance. Concurrently, efforts to reduce attention’s computational cost have emerged: IMDN [10] proposed contrast-aware channel attention (CCA) to minimize complexity while focusing on key information, and RFANet [24] introduced the efficient spatial attention (ESA) module to enhance region-of-interest focus for improved SR quality. These studies highlight that designing efficient lightweight attention mechanisms remains a critical research direction in SR, with broad practical implications. While attention modules like RCAB, CCA, and ESA differ in architecture, their core mechanisms all focus on constructing global channel dependencies. Xu et al. [25] proposed a remote sensing image super-resolution reconstruction model integrating multiple attention mechanisms, attempting to enhance feature interaction. However, it fails to adequately address the balance between long-range dependency modeling and lightweight deployment for remote sensing images.
However, the existing methods often fail to balance long-range dependency modeling with lightweight deployment, which is critical for remote sensing applications constrained by limited onboard resources. To address this, we incorporate a novel attention mechanism, large separable kernel attention, which expands the receptive field by combining depthwise convolution and dilated convolution in a separable form. This design allows the network to capture fine-grained spatial details—such as road edges and building structures—while maintaining low computational cost. Integrated into an improved feature distillation framework, LSKA, together with a contrast-aware channel attention module, enables our LSANet to achieve a better trade-off between reconstruction quality and efficiency, addressing key limitations of prior attention-based SR models.

2.2. Single Image Super Resolution

The single image super-resolution task aims to reconstruct a high-resolution image from its low-resolution counterpart. In recent years, deep-learning-based methods have made remarkable progress in this field, evolving from early shallow convolutional networks to deeper architectures and gradually integrating attention mechanisms and Transformer frameworks to enhance both reconstruction quality and computational efficiency. Initially, SRCNN [3] pioneered the use of a three-layer convolutional neural network to learn the LR-to-HR mapping, setting a precedent for deep learning in super resolution. Subsequently, VDSR [4] mitigated the gradient vanishing issue through deep residual learning and leveraged a large receptive field to improve reconstruction performance. EDSR [6] further enhanced model representational capability by removing batch normalization layers. Many studies [26,27,28,29] incorporated attention mechanisms into the EDSR framework to boost performance. Considerable research has focused on optimizing network block designs and feature propagation strategies; for instance, dense connections have been proven effective in enhancing feature expressiveness for SISR tasks [30,31,32]. SRDenseNet [33] introduces a dense connection structure that realizes efficient feature reuse and alleviates the problem of gradient disappearance; however, the dense connections increase computational overhead, making the approach unsuitable for lightweight requirements. To further reduce the computational overhead, some studies have introduced depthwise separable convolution and grouped convolution structures [34,35,36]. DRCN [37] and DRRN [38] adopt a recursive learning strategy that reduces the number of parameters by sharing weights. However, these methods overlook the demand for high-resolution details of complex ground objects in remote sensing images, such as urban building clusters, resulting in semantic ambiguity in the reconstruction results. LapSRN [39] uses a progressive upsampling mechanism to improve inference efficiency but overlooks the multi-scale feature hierarchy critical for remote sensing images, a limitation that leads to blurred boundaries in high-magnification SR tasks. Bian et al. [40] proposed Vision Transformers based on structural re-parameterization, demonstrating innovations in model lightweighting but inadequately modeling remote-sensing-specific texture and edge details. Peng et al. [41] presented a context-aware lightweight remote sensing image super-resolution network, yet it still has room for improvement in multi-scale feature fusion and performance balance. Moreover, the Transformer architecture has attracted significant attention in the super-resolution field due to its outstanding ability to model long-range dependencies. Although the above-mentioned methods have made remarkable progress in performance, they still encounter significant challenges in scenarios with limited computing resources. Therefore, designing a more efficient network architecture that further reduces computational complexity while maintaining high reconstruction quality remains a key research topic in super resolution. Our LSANet tackles this challenge by incorporating hierarchical skip connections into the RFEM, facilitating the seamless integration of shallow-scale texture features and deep-scale semantic features. This design showcases enhanced multi-scale feature integration capabilities, specifically adapted to the complex multi-resolution characteristics of remote sensing scenes.

3. Method

3.1. Network Architecture

The LSANet architecture proposed in this study is designed to strike an optimal balance between reconstruction quality and computational efficiency for super-resolution tasks, as illustrated in Figure 1. Building upon the RFDN framework, the network incorporates large separable kernel attention mechanisms to enhance performance. The architecture consists of three main components: shallow feature extraction, deep feature extraction, and image reconstruction. The overall framework shares similarities with both RFDN and IMDN in its structural design.
The shallow feature extraction module captures fundamental texture and structural information, establishing a foundational representation for subsequent deep processing. The deep feature extraction stage, integrated with residual connections to shallow features, further enhances the representational depth by refining multi-scale contextual details. The image reconstruction module ensures accurate restoration of spatial details with minimal artifacts, leveraging pixel-wise refinement to preserve edge integrity in remote sensing scenes.
In the shallow feature extraction stage, a 3 × 3 convolution is used to extract preliminary shallow features from low-resolution remote sensing images; the deep feature extraction stage consists of multiple RLSKFDMs, a feature fusion component, and a long skip connection; the image reconstruction stage consists of a 3 × 3 convolution and a PixelShuffle layer.
Assuming that the input low-resolution image is $I_{LR}$ and the output super-resolved image is $I_{SR}$, the shallow feature extraction can be expressed as

$F_{SF} = H_{3\times 3}(I_{LR}),$

where $F_{SF}$ represents the extracted shallow features, and $H_{3\times 3}$ represents a single 3 × 3 convolution layer. Subsequently, deeper features are extracted from the initial features $F_{SF}$ through multiple RLSKFDM blocks:

$F_1 = H_{R_1}(F_{SF}), \quad F_2 = H_{R_2}(F_1), \quad \ldots, \quad F_n = H_{R_n}(F_{n-1}),$

where $F_1, F_2, \ldots, F_n$ represent the outputs of the individual RLSKFDMs, and $H_{R_1}, H_{R_2}, \ldots, H_{R_n}$ represent the RLSKFDMs at each stage. The output features from all RLSKFDM blocks are concatenated and fused to form a richer feature representation:

$F_C = H_C(F_1, F_2, \ldots, F_n),$

$F_{DF} = H_{3\times 3}\big(H_{HS}(H_{1\times 1}(F_C))\big) + F_{SF},$

where $F_C$ represents the concatenation of the module outputs, $H_C$ represents the concatenation operation, $F_{DF}$ represents the fused deep features, $H_{3\times 3}$ represents a single 3 × 3 convolution layer, $H_{1\times 1}$ represents a single 1 × 1 convolution layer, and $H_{HS}$ represents the Hard Swish activation function. In the image reconstruction stage, a 3 × 3 convolution layer followed by a PixelShuffle operation upsamples the fused features to obtain the super-resolved image $I_{SR}$:

$I_{SR} = H_{PS}(H_{3\times 3}(F_{DF})),$

where $H_{PS}$ represents the PixelShuffle operation.
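To make this data flow concrete, the following PyTorch sketch mirrors the shallow feature extraction, the stacked RLSKFDM blocks, the concatenation-based fusion, and the PixelShuffle reconstruction described above. It is a minimal illustration rather than the authors' released code: the RLSKFDM block is passed in as a placeholder (`block_cls`), and the default channel count (52) and block number (6) are assumptions carried over from the configuration reported in Section 4.1.

```python
import torch
import torch.nn as nn


class LSANetSkeleton(nn.Module):
    """Minimal sketch of the overall LSANet data flow (not a reference implementation)."""

    def __init__(self, block_cls, in_ch=3, channels=52, n_blocks=6, scale=4):
        super().__init__()
        self.head = nn.Conv2d(in_ch, channels, 3, padding=1)            # shallow feature H_3x3
        self.blocks = nn.ModuleList([block_cls(channels) for _ in range(n_blocks)])
        self.fuse = nn.Conv2d(channels * n_blocks, channels, 1)         # H_1x1 after concatenation
        self.act = nn.Hardswish()                                       # H_HS
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)       # H_3x3 before the long skip
        self.tail = nn.Conv2d(channels, in_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)                           # H_PS

    def forward(self, x):
        f_sf = self.head(x)                        # F_SF
        feats, f = [], f_sf
        for blk in self.blocks:                    # F_1 ... F_n
            f = blk(f)
            feats.append(f)
        f_c = torch.cat(feats, dim=1)              # F_C
        f_df = self.refine(self.act(self.fuse(f_c))) + f_sf   # F_DF with long skip connection
        return self.shuffle(self.tail(f_df))      # I_SR


# usage with a stand-in block (placeholder only, not the real RLSKFDM):
# net = LSANetSkeleton(lambda c: nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU()))
```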

3.2. Residual Large Kernel Separable Feature Distillation Module (RLSKFDM)

This paper proposes an enhanced feature distillation module termed the residual large kernel separable feature distillation module (RLSKFDM), with its architecture illustrated in Figure 2.
This module builds upon the RFDB framework with key architectural improvements: replacing the original SRB module in RFDB with RLSKAB to introduce large receptive field feature learning, enabling the model to more effectively capture global contextual information and local details; substituting the CCA module in RFDB with RFEM to enhance feature extraction efficiency and channel correlation utilization. RFEM leverages an efficient channel attention mechanism to enhance focus on key features and optimizes feature representation during the distillation process. Through these designs, RLSKFDM significantly improves feature distillation quality while maintaining module lightness.
Assuming the input feature is $F_{in}^{i}$, the calculation process of the i-th RLSKFDM can be expressed as

$F_{RLSKAB}^{i} = \mathrm{RLSKAB}(F_{in}^{i}).$

The intermediate features of multiple branches are then distilled and fused using 1 × 1 convolutions and a feature concatenation operation:

$F_{fuse}^{i} = \mathrm{Concat}\big(\phi_1(F_{RLSKAB}^{i}), \phi_2(F_{RLSKAB}^{i}), \phi_3(F_{RLSKAB}^{i})\big),$

where $F_{fuse}^{i}$ represents the fused features generated by distilling and concatenating $F_{RLSKAB}^{i}$, and $\phi$ represents the 1 × 1 convolution feature transformation in each branch. The fused features are then further processed by RFEM:

$F_{enhanced}^{i} = \mathrm{RFEM}(F_{fuse}^{i}),$

where $F_{enhanced}^{i}$ represents the enhanced features after processing by RFEM, and $\mathrm{RFEM}(\cdot)$ represents the enhancement mechanism of RFEM. Finally, the module obtains the output features via a residual connection:

$F_{out}^{i} = F_{enhanced}^{i} + F_{in}^{i}.$
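A minimal PyTorch sketch of this computation is given below, following the four equations above. The three 1 × 1 distillation branches with half-width outputs and the 1 × 1 reduction before RFEM are assumptions made only so that the residual addition is dimensionally valid; `rlskab` and `rfem` stand in for the modules of Sections 3.3 and 3.4.

```python
import torch
import torch.nn as nn


class RLSKFDM(nn.Module):
    """Sketch of the residual large kernel separable feature distillation module.

    Branch widths and the channel-restoring 1x1 convolution are assumptions,
    not values taken from the paper.
    """

    def __init__(self, channels, rlskab, rfem):
        super().__init__()
        self.rlskab = rlskab
        # three 1x1 "distillation" transforms phi_1..phi_3 (assumed half-width branches)
        self.phi = nn.ModuleList([nn.Conv2d(channels, channels // 2, 1) for _ in range(3)])
        self.reduce = nn.Conv2d(3 * (channels // 2), channels, 1)  # restore width before RFEM (assumed)
        self.rfem = rfem

    def forward(self, x):
        f = self.rlskab(x)                                    # F_RLSKAB
        f_fuse = torch.cat([phi(f) for phi in self.phi], 1)   # distilled branches, concatenated
        f_fuse = self.reduce(f_fuse)
        return self.rfem(f_fuse) + x                          # RFEM enhancement plus outer residual
```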

3.3. Residual Large Kernel Separable Attention Block (RLSKAB)

The LSKA proposed by Lau et al. [42] represents an innovative integration of traditional convolution and self-attention mechanisms embedded within the visual attention network (VAN) [43]. Its architecture is detailed in Figure 3a. While self-attention excels at capturing long-range dependencies, its high computational complexity and lack of channel adaptability hinder practical applications in image processing. Similarly, large-kernel convolution expands the receptive field but introduces significant computational overhead and parameter growth. To address these challenges, LSKA decomposes large-kernel convolution into multiple simple convolutional layers, incorporating self-attention principles to reduce both computational complexity and parameter count. This approach enhances network expressiveness while maintaining effective modeling of long-range dependencies.
The RLSKAB structure proposed in this study is illustrated in Figure 3b. This module builds upon the attention block of the VAN network, integrating the architectural logic of shallow residual blocks from residual feature distillation networks. By leveraging large separable kernel attention mechanisms, RLSKAB enhances feature extraction capabilities and strengthens the modeling of global–local contextual information. Additionally, the module incorporates the Hard Swish activation function after residual skip connections, which boosts the network’s non-linear representation capacity while incurring minimal computational overhead. This design ensures superior performance with reduced resource consumption, making it particularly well-suited for remote sensing image super-resolution reconstruction tasks.
For a single RLSKAB, assuming the input feature is $F_{RLSKAB}^{in}$, its output feature $F_{RLSKAB}^{out}$ can be expressed as

$F_{Conv} = \mathrm{Conv}_{1\times 1}\big(\mathrm{BN}(F_{RLSKAB}^{in})\big),$

$F_{LSKA} = \mathrm{LSKA}\big(\mathrm{GeLU}(F_{Conv})\big),$

$F_{RLSKAB}^{out} = \mathrm{HS}\big(\mathrm{Conv}_{1\times 1}(F_{LSKA}) + F_{RLSKAB}^{in}\big),$

where $F_{Conv}$ and $F_{LSKA}$ represent the output of the first convolution layer and the output of the large separable kernel attention, respectively, $\mathrm{Conv}_{1\times 1}$ represents the 1 × 1 convolution operation, $\mathrm{BN}$ represents batch normalization, and $\mathrm{GeLU}$ and $\mathrm{HS}$ represent the GeLU and Hard Swish activation functions, respectively. Based on the integration of VAN's attention mechanism and the RFDN design logic, RLSKAB effectively expands the receptive field through LSKA and improves the perception of global information in remote sensing images.
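The block can be summarized in a few lines of PyTorch. This is a sketch of the equations above (with Hard Swish applied after the skip addition, as described in the text), not a reference implementation; the `lska` argument stands for the attention module detailed in the next subsection.

```python
import torch.nn as nn


class RLSKAB(nn.Module):
    """Sketch of the residual large separable kernel attention block."""

    def __init__(self, channels, lska):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.conv_in = nn.Conv2d(channels, channels, 1)
        self.gelu = nn.GELU()
        self.lska = lska
        self.conv_out = nn.Conv2d(channels, channels, 1)
        self.hs = nn.Hardswish()

    def forward(self, x):
        f = self.conv_in(self.bn(x))           # F_Conv = Conv_1x1(BN(x))
        f = self.lska(self.gelu(f))            # F_LSKA = LSKA(GeLU(F_Conv))
        return self.hs(self.conv_out(f) + x)   # Hard Swish after the residual skip connection
```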

3.4. Residual Feature Enhancement Module (RFEM)

In IMDN and RFDN, a single-layer CCA module is commonly used to evaluate the channel-wise importance of distilled features and enhance them through weighted adjustment. However, this approach only optimizes in the channel dimension, failing to deeply exploit spatial detail information. As lightweight models often extract coarse-grained features, direct reliance on channel-wise adjustment tends to overlook latent detailed information, thus limiting the enhancement capability of fused features. To address these challenges, this paper proposes a residual feature enhancement module (RFEM), whose architecture is depicted in Figure 2.
This module enhances the CCA framework by integrating a “Batch Norm + LSKA” structure to further exploit spatial correlations in fused features and improve the model’s capability to model long-range feature dependencies. Batch normalization is employed to standardize feature distributions, mitigating instability caused by excessive activation values. To enhance feature fluidity and preserve multi-level information, the module incorporates multiple skip connections, enabling seamless transmission of features from each submodule to subsequent network layers. Additionally, by inserting convolutional layers before and after skip connections, the module allows flexible adjustment of feature channels, enhancing its applicability and efficiency. This design fully integrates spatial enhancement with feature detail extraction, providing robust support for improving overall network performance.
Assuming that the input feature of the RFEM in the i-th RLSKFDM is $F_{i,RFEM}^{in}$ and the output feature is $F_{i,RFEM}$, the feature optimization process can be expressed as

$F_{residual1} = \mathrm{CCA}\big(\mathrm{Conv}(F_{i,RFEM}^{in})\big) + F_{Conv},$

$F_{residual2} = \mathrm{LSKA}\big(\mathrm{BN}(F_{residual1})\big) + F_{residual1},$

$F_{i,RFEM} = \mathrm{Conv}(F_{residual2}) + F_{i,RFEM}^{in},$

where $F_{Conv} = \mathrm{Conv}(F_{i,RFEM}^{in})$ denotes the output of the input convolution, $F_{residual1}$ and $F_{residual2}$ represent the features after the two internal skip connections, and $\mathrm{CCA}$, $\mathrm{LSKA}$, and $\mathrm{BN}$ represent the CCA, LSKA, and Batch Norm layers, respectively.
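As an illustration, a minimal PyTorch sketch of this structure is shown below. The 3 × 3 kernel size of the input and output convolutions is an assumption; `cca` and `lska` denote the contrast-aware channel attention and large separable kernel attention modules referenced above.

```python
import torch.nn as nn


class RFEM(nn.Module):
    """Sketch of the residual feature enhancement module: convolution, CCA with an
    inner skip, BatchNorm + LSKA with a second inner skip, and an output
    convolution with an outer skip."""

    def __init__(self, channels, cca, lska):
        super().__init__()
        self.conv_in = nn.Conv2d(channels, channels, 3, padding=1)
        self.cca = cca
        self.bn = nn.BatchNorm2d(channels)
        self.lska = lska
        self.conv_out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        f_conv = self.conv_in(x)
        r1 = self.cca(f_conv) + f_conv        # F_residual1
        r2 = self.lska(self.bn(r1)) + r1      # F_residual2
        return self.conv_out(r2) + x          # F_RFEM with the outer residual
```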
To effectively capture long-range dependencies in remote sensing images while maintaining a lightweight structure, the LSKA module employs a decomposition strategy of large-kernel convolutions into consecutive K × 1 and 1 × K depthwise convolutions. Based on empirical practices in similar large-kernel designs, we set the kernel size K = 21 and the dilation rate d = 3 to expand the receptive field without significantly increasing the parameter count. This design choice follows the general principle used in lightweight super-resolution networks like RFDN, which balances performance and efficiency.
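Under these settings, a sketch of the LSKA module is shown below, following the decomposition of Lau et al. [42]: a pair of 1-D depthwise convolutions for local context, a pair of dilated 1-D depthwise convolutions for long-range context, a 1 × 1 pointwise convolution, and an element-wise product with the input. The derived kernel sizes (5 and 7 for K = 21, d = 3) match the values reported in Section 4.1; the padding arithmetic is our own and only ensures that spatial dimensions are preserved.

```python
import torch.nn as nn


class LSKA(nn.Module):
    """Sketch of large separable kernel attention for K=21, d=3 (kernel sizes 5 and 7)."""

    def __init__(self, channels, k=21, d=3):
        super().__init__()
        k_dw = 2 * d - 1           # depthwise kernel size: 5 for d=3
        k_dwd = -(-k // d)         # dilated depthwise kernel size: ceil(21/3) = 7
        pad_dwd = (k_dwd // 2) * d # padding that keeps the spatial size with dilation d
        self.dw_h = nn.Conv2d(channels, channels, (1, k_dw), padding=(0, k_dw // 2), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k_dw, 1), padding=(k_dw // 2, 0), groups=channels)
        self.dwd_h = nn.Conv2d(channels, channels, (1, k_dwd), padding=(0, pad_dwd),
                               dilation=d, groups=channels)
        self.dwd_v = nn.Conv2d(channels, channels, (k_dwd, 1), padding=(pad_dwd, 0),
                               dilation=d, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.dw_v(self.dw_h(x))        # local context from small 1-D depthwise kernels
        attn = self.dwd_v(self.dwd_h(attn))   # long-range context from dilated 1-D kernels
        attn = self.pw(attn)                  # channel mixing
        return x * attn                       # attention re-weighting of the input
```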

4. Implementation Details

4.1. Experimental Configuration

In the experimental setup, the batch size is set to 16 and the number of training epochs to 400. To improve data diversity and model generalization for remote sensing image super resolution, the following data augmentation strategies are employed: random horizontal/vertical flipping, 90°/180°/270° rotation, and pixel-wise Gaussian noise injection. For multi-scale training compatibility, batches are composed by randomly sampling images from the ×2, ×3, and ×4 super-resolution tasks. The model is trained using the Adam optimizer ($\beta_1 = 0.9$, $\beta_2 = 0.99$) with an L1 loss function. The initial learning rate is set to $2 \times 10^{-5}$, and dynamic regulation is implemented via MultiStepLR: the rate is decayed by a factor of 0.5 at epochs 200 and 300. The experiment is conducted on an Ubuntu system with an RTX A5000 24 GB GPU. The operating environment includes Python 3.9, PyTorch 2.1.0, and CUDA 11.9.
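For reference, the optimization recipe above corresponds to roughly the following PyTorch setup. `model` and `train_loader` are placeholders, and the epoch-level milestone scheduling is assumed from the description above rather than taken from released training code.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR


def train(model, train_loader, device="cuda", epochs=400):
    """Sketch of the training loop: Adam (0.9, 0.99), L1 loss, LR halved at epochs 200 and 300."""
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, betas=(0.9, 0.99))
    scheduler = MultiStepLR(optimizer, milestones=[200, 300], gamma=0.5)
    model.to(device).train()
    for epoch in range(epochs):
        for lr_img, hr_img in train_loader:
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)
            optimizer.zero_grad()
            loss = criterion(model(lr_img), hr_img)   # L1 reconstruction loss
            loss.backward()
            optimizer.step()
        scheduler.step()
```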
In the model parameter settings, the kernel sizes of the depthwise convolution and depthwise dilated convolution in the large separable kernel attention are set to 5 and 7, respectively. Following the configuration of the RFDN architecture, the number of residual large separable kernel feature distillation modules m and the intermediate feature channel count are set to 6 and 52, respectively. This configuration is chosen to maintain comparability in model depth and complexity, enabling a fair evaluation of the proposed enhancements.
This paper uses PSNR [44] and SSIM [45] as the main evaluation metrics, where PSNR primarily measures the noise suppression capability of the remote sensing image reconstruction model, and SSIM evaluates its performance in preserving texture structures. The higher the values of these two metrics, the better the model’s performance. It is noted that the reported PSNR and SSIM values are average results across multiple test datasets. Additionally, this study assesses the model’s parameter count and computational complexity to further demonstrate its efficiency in terms of model scale and inference performance.
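PSNR, for example, can be computed directly from the mean squared error between the reconstructed and reference images. The short helper below is a generic illustration (images assumed to be scaled to [0, 1]), not the exact evaluation script used here; SSIM is typically obtained from a standard library implementation such as skimage.metrics.structural_similarity.

```python
import torch


def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between super-resolved and reference tensors; higher is better."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```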

4.2. Dataset

In this study, low-resolution images were generated by performing bicubic downsampling on real-world images using MATLAB R2023b (https://ww2.mathworks.cn/products/matlab.html, accessed on 10 January 2025). The experimental datasets are derived from authentic remote sensing data: the NWPU-RESISC45 and UC Merced Land Use datasets. The UC Merced Land Use dataset is a publicly available resource widely adopted in land classification and remote sensing research. Comprising 2100 images across 21 categories (100 images per category), it covers diverse environments such as farmland, forests, beaches, and urban areas. Each image measures 256 × 256 pixels with a spatial resolution of 0.3 m/pixel. The NWPU-RESISC45 dataset encompasses 45 categories of remote sensing images, including typical ground objects and targets such as airports, bridges, and football fields. With 700 images per category (31,500 images in total), each image is 256 × 256 pixels at a spatial resolution of 30 m/pixel. Both datasets were divided into training and test sets at a 7:3 ratio. For LR image construction, bicubic interpolation was used to downsample HR images by factors of ×2, ×3, and ×4, with the HR images retained as ground truth. For super-resolution tasks at different magnification factors, HR images were randomly cropped to generate 96 × 96, 144 × 144, and 196 × 196 pixel sample blocks for model training.
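The LR/HR pair construction can be sketched as follows. Note that the paper uses MATLAB's bicubic kernel for downsampling; the PIL bicubic filter used here is only an approximation, and the per-scale HR crop sizes follow the values quoted above.

```python
import random
from PIL import Image

# HR crop size per upscaling factor, as stated in the text
HR_CROP = {2: 96, 3: 144, 4: 196}


def make_pair(hr_path, scale):
    """Return an (LR patch, HR patch) pair by random cropping and bicubic downsampling."""
    hr = Image.open(hr_path).convert("RGB")
    crop = HR_CROP[scale]
    x = random.randint(0, hr.width - crop)      # random crop position in the HR image
    y = random.randint(0, hr.height - crop)
    hr_patch = hr.crop((x, y, x + crop, y + crop))
    lr_patch = hr_patch.resize((crop // scale, crop // scale), Image.BICUBIC)  # bicubic downsampling
    return lr_patch, hr_patch
```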

4.3. Quantitative Experimental Results

To verify the performance of our model in remote sensing image super-resolution reconstruction, we conducted experiments on the test dataset and compared our model with several state-of-the-art models, including SRCNN [3], IMDN [10], RFDN [11], CARN [9], CTN [40], FeNet [46], LGCNet [47], DCM [48], and HSENet [49]. All compared models were retrained on the same dataset to ensure fairness in the evaluation.

4.3.1. Quantitative Results Obtained on the UC Merced Land Use Dataset

As demonstrated in Table 1, this study evaluates the super-resolution performance and computational efficiency of each model across different scaling factors on the UC Merced Land Use dataset, with the best results highlighted in bold. LSANet consistently outperforms state-of-the-art methods in objective metrics, while maintaining low computational cost and a compact parameter footprint. Specifically, for the ×2 task, LSANet achieves the highest accuracy with only moderate computation, demonstrating its suitability for real-time or resource-constrained applications. As the upscaling factor increases to ×3 and ×4, LSANet exhibits strong scalability, with performance degradation well-controlled compared to competing models. This robustness stems from the network’s ability to effectively model both local textures and long-range dependencies through the proposed LSKA module, whose large receptive field and separable structure facilitate efficient context aggregation without substantial computational overhead. Additionally, the residual learning design—based on feature distillation and contrast-aware enhancement—supports fine-grained texture recovery even under high magnification, where spatial details are scarce. As shown in Figure 4, LSANet preserves clearer boundaries and structural consistency across all scaling levels, outperforming prior methods that tend to produce blurred contours or artifacts. These results suggest that LSANet achieves a favorable balance between reconstruction quality and model efficiency, making it a promising solution for high-resolution remote sensing image enhancement in practical applications.

4.3.2. Quantitative Results Obtained on the NWPU-RESISC45 Dataset

To further demonstrate the efficiency and adaptability of the proposed LSANet in remote sensing image super resolution, extensive experiments were conducted on the NWPU-RESISC45 dataset. As shown in Table 2, LSANet consistently achieved the highest PSNR values across all upscaling factors while maintaining a relatively lightweight model size and low computational complexity. Notably, under the ×2 upscaling factor, LSANet only requires 22.1G Multi-Adds and 602K parameters, yet outperforms all baseline methods, including IMDN, RFDN, and CTN, in terms of both PSNR and SSIM. Figure 5 presents visual comparisons for three representative scenes. Compared with other networks, LSANet generates images with finer textures, sharper edge structures, and clearer boundaries of roads and objects. This advantage is particularly evident in challenging regions such as basketball court lines and bridge markings. These results highlight LSANet’s ability to recover detailed spatial structures with minimal artifacts.

4.4. Ablation Study

This paper devises several ablation experiments to validate the efficacy of the proposed modules. RFDN is used as the benchmark model, and the results of ×4 super-resolution reconstruction for the different experiments on the UC Merced Land Use validation set are reported. The validation set consists of 105 images randomly selected from the test set. The feature extraction block of the benchmark model is the SRB, and the number of residual modules and channels is set to 6 and 48, respectively. The ablation experiments in this section cover the feature extraction module, the feature processing part within the module, and the model dimension and activation function.
Although the ablation study primarily evaluates each component independently, we observe that combining RLSKFDM with the proposed RFEM module consistently yields better results than using either module alone. This suggests a synergistic interaction, where the LSKA-based RLSKFDM provides enhanced global context modeling, while the RFE module focuses on refining locally distilled features. The integration of these modules balances long-range dependency modeling and fine-grained reconstruction, which underpins the improved performance of LSANet.

4.4.1. Effectiveness of Feature Extraction Module

To validate the effectiveness of RLSKAB and determine the optimal large separable kernel attention configuration, this ablation study systematically evaluates several LSKA settings. Table 3 presents the model performance when using different feature extraction modules. The two values in the “LSKA Size” column denote the kernel sizes of the depthwise convolution and depthwise dilated convolution (DW-Conv and DW-D-Conv) in the large kernel attention mechanism.
According to the data in the table, compared with Experiment (a) using only SRB as the feature extraction module, Experiment (b) incorporating RLSKAB achieves significant improvements: PSNR increases by 0.20 dB and SSIM by 0.0023, accompanied by a 39.3% reduction in parameters and 41.4% decrease in computational complexity. This demonstrates that RLSKAB enables more efficient feature extraction. Further analysis of Experiments (b)–(f) reveals that as the LSKA scale increases, the model’s computational cost slightly rises while performance continues to improve. For instance, Experiment (d) outperforms Experiment (b) with a 0.06 dB PSNR gain and 0.0018 SSIM improvement, while the computational overhead only increases by 11.0%.
Considering that the improvements of experiments (e) and (f) over (d) are limited (PSNR decreases by 0.04 dB and 0.02 dB, respectively, while SSIM decreases by 0.0013 and increases by 0.0006, respectively), experiment (d) is selected as the optimal configuration and serves as the basis for subsequent experiments; the corresponding LSKA size is (5, 7).

4.4.2. Effectiveness of the Feature Processing Part of the Module

The experimental results in Table 4 demonstrate that, in contrast to the CCA-only experiment (d), the experiment incorporating LSKA (g) increases PSNR by 0.03 dB, with a 13.4% rise in parameters and a 12.3% increase in computational complexity. This indicates that LSKA can boost the feature extraction ability. On this foundation, when the RFEM is introduced in experiment (h), PSNR improves by a further 0.05 dB compared to experiment (g), while the number of parameters and the computational cost stay the same, verifying the effectiveness of RFEM. As the input dimension (Inp. Dim.) increases from 48 (experiment (h)) to 60 (experiment (k)), the PSNR climbs by 0.14 dB and the computation (FLOPS/G) increases by 54.4%. Considering the overall performance and computational cost, the experiment with an input dimension of 52 (i) is chosen as the final scheme.

4.4.3. Effectiveness of Hard Swish Activation Function

To validate the impact of the Hard Swish activation function on super-resolution reconstruction performance, a set of ablation experiments was designed to compare the effects of different activation functions on model parameters, computational complexity, and reconstruction quality. The results are presented in Table 5. The experimental findings show that compared with configurations without an activation function (Experiment m) or using common functions (ReLU, Sigmoid, ELU, Tanh), Hard Swish achieves superior reconstruction performance in terms of PSNR and SSIM. Although integrating Hard Swish introduces a slight increase in both parameter count and computational complexity, the magnitude of this increase is minimal. In contrast, the performance enhancement is significant, indicating that this activation function not only maintains high computational efficiency but also effectively improves the super-resolution reconstruction quality of remote sensing images. Thus, as an efficient activation function, Hard Swish demonstrates good applicability and effectiveness in the super-resolution tasks of this study.

5. Conclusions

In this paper, we propose LSANet, a lightweight and efficient network for single-image super resolution in remote sensing scenarios. By introducing a large separable kernel attention mechanism and a compact residual learning structure, LSANet enhances feature representation while maintaining low computational complexity. Extensive experiments on two benchmark remote sensing datasets, UC Merced Land Use and NWPU-RESISC45, demonstrate that LSANet achieves competitive or superior performance compared to several state-of-the-art methods, especially in terms of PSNR, SSIM, and parameter efficiency. The visual results further validate its capability in reconstructing detailed structures such as roads and buildings with fewer artifacts and clearer edges.
Despite its effectiveness, several limitations remain. First, the current model is optimized for optical RGB imagery; its generalizability to other modalities such as SAR or hyperspectral images remains to be explored. Second, while LSANet supports scale factors up to ×4, performance under ultra-high scaling or in extremely low-quality inputs may degrade. Additionally, although the model is designed for low computational cost, experiments were conducted on high-end hardware; future work will include validation on edge or embedded devices to further demonstrate its hardware adaptability.
Looking ahead, we plan to extend LSANet to video super resolution via spatiotemporal modeling and explore hybrid architectures incorporating lightweight Transformer-style modules to enhance global dependency modeling. Furthermore, we aim to evaluate LSANet’s effectiveness in operational remote sensing tasks such as land cover classification or disaster monitoring, where improved spatial resolution can yield tangible societal and environmental benefits.

Author Contributions

Conceptualization, T.Y. and X.L.; methodology, T.Y.; data curation, T.Y.; writing—original draft, T.Y.; writing—review and editing, T.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the Special Fund for Training High Level Innovative Talents of Sichuan University of Science and Engineering under Grant B12402005, and in part by the National Natural Science Foundation of China under Grant 42471437.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets in this study are publicly available, and the websites providing access to these datasets are included in the article.

Acknowledgments

We are grateful to the anonymous reviewers for their precious opinions and suggestions, whose professional reviews conspicuously enhanced the quality of this paper. Furthermore, our appreciation goes to the editorial team for their painstaking efforts and guidance throughout the publication process.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
SRCNN: Super-resolution convolutional neural network
VDSR: Very deep super resolution
GoogLeNet: Google Inception net
EDSR: Enhanced deep residual networks
RDN: Residual dense network
RCAN: Very deep residual channel attention network
CARN: Cascading residual network
IMDN: Information multi-distillation network
RFDN: Residual feature distillation network
MAFFSRN: Multi-attentive feature fusion super-resolution network
RepLKNet: Reparameterized large kernel network
LKA: Large kernel attention
SE: Squeeze-and-excitation network
CBAM: Convolutional block attention module
SAN: Second-order attention network
SwinIR: Swin Transformer for image restoration
CAS-ViT: Convolutional additive self-attention vision Transformers
RFANet: Residual feature aggregation network
SRDenseNet: Image super-resolution using dense skip connections network
DRCN: Deep recursive convolutional network
DRRN: Deep recursive residual network
LapSRN: Laplacian pyramid super-resolution network
LSANet: Large separable kernel attention network
CCA: Contrast-aware channel attention
LSKA: Large separable kernel attention
VAN: Visual attention network
RLSKFDM: Residual large separable kernel feature distillation module
RLSKAB: Residual large kernel separable attention block
RFEM: Residual feature enhancement module
PSNR: Peak signal-to-noise ratio
SSIM: Structural similarity index
LGCNet: Local–global combined network
DCM: Deep compendium model
HSENet: Hybrid-scale self-similarity exploitation network

References

  1. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision And pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5791–5800. [Google Scholar]
  2. Chen, G.; Wang, H.; Chen, K.; Li, Z.; Song, Z.; Liu, Y. A survey of the four pillars for small object detection: Multiscale representation, contextual information, super-resolution, and region proposal. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 1–18. [Google Scholar] [CrossRef]
  3. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Part IV 13. pp. 184–199. [Google Scholar]
  4. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  6. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  7. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  8. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  9. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268. [Google Scholar]
  10. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
  11. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the Computer Vision—ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Part III 16. pp. 41–55. [Google Scholar]
  12. Hu, Z.; Sun, W.; Chen, Z. Lightweight image super-resolution with sliding Proxy Attention Network. Signal Process. 2025, 227, 109704. [Google Scholar] [CrossRef]
  13. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31×31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
  14. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  15. Huang, W.; Ju, M.; Yin, J.; Sun, L.; Wu, Z.; Xu, Y. Multiscale Feature Enhancement Network Based on Large Kernel Attention Mechanism for Pansharpening; SSRN: Rochester, NY, USA, 2022. [Google Scholar]
  16. Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly Kernel Inception Network for Remote Sensing Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024. [Google Scholar]
  17. Lai, S.J.; Cheung, T.H.; Fung, K.C.; Xue, K.W.; Lam, K.M. HAAT: Hybrid attention aggregation transformer for image super-resolution. arXiv 2025, arXiv:2411.18003. [Google Scholar]
  18. Jie, H.; Li, S.; Gang, S.; Albanie, S.; Wu, E. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  19. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  20. Xing, N.; Wang, J.; Wang, Y.; Ning, K.; Chen, F. Point Cloud Completion Based on Nonlocal Neural Networks with Adaptive Sampling. Inf. Technol. Control 2024, 53, 160–170. [Google Scholar] [CrossRef]
  21. Lyn, J.; Yan, S. Non-local second-order attention network for single image super resolution. In Proceedings of the Machine Learning and Knowledge Extraction: 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020, Dublin, Ireland, 25–28 August 2020; Proceedings 4. pp. 267–279. [Google Scholar]
  22. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  23. Zhang, T.; Li, L.; Zhou, Y.; Liu, W.; Qian, C.; Hwang, J.N.; Ji, X. Cas-vit: Convolutional additive self-attention vision transformers for efficient mobile applications. arXiv 2024, arXiv:2408.03703. [Google Scholar]
  24. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual feature aggregation network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2359–2368. [Google Scholar]
  25. Xu, Y.; Guo, T.; Wang, C. A remote sensing image super-resolution reconstruction model combining multiple attention mechanisms. Sensors 2024, 24, 4492. [Google Scholar] [CrossRef]
  26. Zhu, Y.; Gei, C.; So, E. Image super-resolution with dense-sampling residual channel-spatial attention networks for multi-temporal remote sensing image classification. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102543. [Google Scholar] [CrossRef]
  27. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 649–667. [Google Scholar]
  28. Zeng, H.; Wu, Q.; Zhang, J.; Xia, H. Lightweight subpixel sampling network for image super-resolution. Vis. Comput. 2024, 40, 3781–3793. [Google Scholar] [CrossRef]
  29. Chen, Q.; Wang, L.; Zhang, Z.; Wang, X.; Liu, W.; Xia, B.; Ding, H.; Zhang, J.; Xu, S.; Wang, X. Dual-path aggregation transformer network for super-resolution with images occlusions and variability. Eng. Appl. Artif. Intell. 2025, 139, 109535. [Google Scholar] [CrossRef]
  30. Liu, B.; Ning, X.; Ma, S.; Yang, Y. Multi-scale dense spatially-adaptive residual distillation network for lightweight underwater image super-resolution. Front. Mar. Sci. 2024, 10, 1328436. [Google Scholar] [CrossRef]
  31. Fan, S.; Song, T.; Li, P.; Jin, J.; Jin, G.; Zhu, Z. Dense-gated network for image super-resolution. Neural Process. Lett. 2023, 55, 11845–11861. [Google Scholar] [CrossRef]
  32. Fang, J.; Chen, X.; Zhao, J.; Zeng, K. A scalable attention network for lightweight image super-resolution. J. King Saud. Univ. Comput. Inf. Sci. 2024, 36, 102185. [Google Scholar] [CrossRef]
  33. Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4799–4807. [Google Scholar]
  34. Zhang, Y.; Gao, F.; Zhou, K. Efficient Energy Disaggregation via Residual Learning-Based Depthwise Separable Convolutions and Segmented Inference. IEEE Trans. Ind. Inform. 2025, 21, 2224–2233. [Google Scholar] [CrossRef]
  35. Yasir, M.; Ullah, I.; Choi, C. Depthwise channel attention network (DWCAN): An efficient and lightweight model for single image super-resolution and metaverse gaming. Expert Syst. 2024, 41, e13516. [Google Scholar] [CrossRef]
  36. Chen, Y.; Wan, L.; Li, S.; Liao, L. AMF-SparseInst: Attention-guided Multi-Scale Feature Fusion Network Based on SparseInst. Inf. Technol. Control 2024, 53, 675–694. [Google Scholar]
  37. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  38. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155. [Google Scholar]
  39. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  40. Bian, J.; Liu, Y.; Chen, J. Lightweight Super-Resolution Reconstruction Vision Transformers of Remote Sensing Image Based on Structural Re-Parameterization. Appl. Sci. 2024, 14, 917. [Google Scholar] [CrossRef]
  41. Peng, G.; Xie, M.; Fang, L. Context-Aware Lightweight Remote-Sensing Image Super-Resolution Network. Front. Neurorobot. 2023, 17, 1220166. [Google Scholar] [CrossRef]
  42. Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in cnn. Expert Syst. Appl. 2023, 236, 121352. [Google Scholar] [CrossRef]
  43. Guo, M.H.; Lu, C.Z.; Liu, Z.N.; Cheng, M.M.; Hu, S.M. Visual attention network. Comput. Vis. Media 2023, 9, 733–752. [Google Scholar] [CrossRef]
  44. Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Carli, M. Modified image visual quality metrics for contrast change and mean shift accounting. In Proceedings of the 2011 11th International Conference: The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Polyana, Ukraine, 23–25 February 2011; pp. 305–311. [Google Scholar]
  45. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  46. Wang, Z.; Li, L.; Xue, Y.; Jiang, C.; Wang, J.; Sun, K.; Ma, H. FeNet: Feature enhancement network for lightweight remote-sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
  47. Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local–global combined network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247. [Google Scholar] [CrossRef]
  48. Haut, J.M.; Paoletti, M.E.; Fernández-Beltran, R.; Plaza, J.; Plaza, A.; Li, J. Remote sensing single-image superresolution based on a deep compendium model. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1432–1436. [Google Scholar] [CrossRef]
  49. Lei, S.; Shi, Z. Hybrid-scale self-similarity exploitation for remote sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–10. [Google Scholar] [CrossRef]
Figure 1. Overall architecture of the LSANet network.
Figure 2. Structure of the Residual Large Kernel Separable Feature Distillation Module (RLSKFDM).
Figure 3. Structure of different blocks. (a) LSKA in VAN. (b) RLSKAB.
Figure 4. SR reconstruction validation results produced using different methods on the UC Merced Land Use dataset.
Figure 5. SR reconstruction validation results produced using different methods on the NWPU-RESISC45 dataset.
Table 1. The comparison results were derived from the UC Merced Land Use remote sensing dataset. The top performing result is highlighted in bold.
Method | Scale | Params/K | M-Adds/G | PSNR | SSIM
SRCNN | ×2 | – | 84.5 | 32.89 | 0.8975
CARN | ×2 | – | 160.3 | 33.18 | 0.9196
IMDN | ×2 | 694 | 21.3 | 32.19 | 0.8996
RFDN | ×2 | 534 | 26.4 | 33.55 | 0.9032
DCM | ×2 | 1842 | 30.2 | 33.52 | 0.9166
LGCNet | ×2 | 193 | 12.7 | 33.65 | 0.9285
CTN | ×2 | 349 | 4.2 | 33.50 | 0.9242
FeNet | ×2 | 423 | 67.8 | 33.65 | 0.9278
HSENet | ×2 | 5286 | 66.8 | 34.23 | 0.9332
LSANet | ×2 | 527 | 19.5 | 34.33 | 0.9328
SRCNN | ×3 | – | 84.5 | 28.66 | 0.8038
CARN | ×3 | – | 170.8 | 29.09 | 0.8167
IMDN | ×3 | 706 | 22.1 | 29.19 | 0.8367
RFDN | ×3 | 584 | 28.7 | 29.47 | 0.8392
DCM | ×3 | 2258 | 16.3 | 29.52 | 0.8396
LGCNet | ×3 | 193 | 12.7 | 29.65 | 0.8239
CTN | ×3 | 349 | 2.6 | 29.44 | 0.8319
FeNet | ×3 | 430 | 69.2 | 29.85 | 0.8405
HSENet | ×3 | 5370 | 70.8 | 29.98 | 0.8416
LSANet | ×3 | 539 | 15.8 | 30.08 | 0.8446
SRCNN | ×4 | – | 84.5 | 26.78 | 0.7219
CARN | ×4 | – | 201.1 | 26.93 | 0.7267
IMDN | ×4 | 721 | 22.5 | 27.10 | 0.7413
RFDN | ×4 | 598 | 30.8 | 27.28 | 0.7505
DCM | ×4 | 2175 | 13.0 | 27.22 | 0.7528
LGCNet | ×4 | 193 | 12.7 | 27.02 | 0.7333
CTN | ×4 | 360 | 2.6 | 27.41 | 0.7512
FeNet | ×4 | 438 | 71.5 | 27.65 | 0.7596
HSENet | ×4 | 5433 | 81.2 | 27.73 | 0.7623
LSANet | ×4 | 558 | 13.6 | 27.79 | 0.7635
Table 2. The comparison results were derived from the NWPU-RESISC45 remote sensing dataset. The top performing result is highlighted in bold.
Method | Scale | Params/K | M-Adds/G | PSNR | SSIM
SRCNN | ×2 | 57 | 0.52 | 30.21 | 0.8722
CARN | ×2 | 1592 | 222.6 | 33.93 | 0.9203
IMDN | ×2 | 694 | 141.2 | 33.72 | 0.9181
RFDN | ×2 | 534 | 94.1 | 34.15 | 0.9215
DCM | ×2 | 2341 | 387.2 | 33.54 | 0.9176
LGCNet | ×2 | 193 | 12.7 | 33.82 | 0.9197
CTN | ×2 | 1128 | 178.3 | 34.06 | 0.9224
FeNet | ×2 | 423 | 67.8 | 33.32 | 0.9156
HSENet | ×2 | 312 | 48.6 | 33.63 | 0.9187
LSANet | ×2 | 602 | 22.1 | 35.02 | 0.9305
SRCNN | ×3 | 57.2 | 0.53 | 27.84 | 0.8123
CARN | ×3 | 1602 | 225.3 | 30.74 | 0.8681
IMDN | ×3 | 702 | 143.7 | 30.57 | 0.8656
RFDN | ×3 | 541 | 94.1 | 30.99 | 0.8714
DCM | ×3 | 2355 | 390.6 | 30.49 | 0.8635
LGCNet | ×3 | 193 | 12.7 | 30.62 | 0.8698
CTN | ×3 | 1135 | 180.1 | 30.84 | 0.8721
FeNet | ×3 | 430 | 69.2 | 30.22 | 0.8608
HSENet | ×3 | 318 | 49.8 | 30.57 | 0.8668
LSANet | ×3 | 605 | 22.3 | 31.24 | 0.8799
SRCNN | ×4 | 57.3 | 0.54 | 26.13 | 0.7633
CARN | ×4 | 1618 | 228.9 | 28.87 | 0.8275
IMDN | ×4 | 715 | 28.65 | 28.74 | 0.7413
RFDN | ×4 | 550 | 96.8 | 29.05 | 0.8317
DCM | ×4 | 2378 | 395.1 | 28.54 | 0.8223
LGCNet | ×4 | 193 | 12.7 | 28.74 | 0.8286
CTN | ×4 | 1147 | 1.8 | 28.93 | 0.8332
FeNet | ×4 | 438 | 71.5 | 28.35 | 0.8189
HSENet | ×4 | 325 | 51.3 | 28.69 | 0.8242
LSANet | ×4 | 609 | 22.7 | 29.29 | 0.8405
Table 3. Comparison of different feature extraction modules and different LSKA sizes.
Number | SRB | RLSKAB | LSKA Size | Params/K | FLOPS/G | PSNR | SSIM | Times/s
(a) | ✓ | | – | 550 | 26.3 | 25.99 | 0.7843 | 1.29
(b) | | ✓ | (3, 5) | 334 | 15.4 | 26.19 | 0.7866 | 0.80
(c) | | ✓ | (5, 5) | 346 | 16.1 | 26.22 | 0.7875 | 0.86
(d) | | ✓ | (5, 7) | 365 | 17.1 | 26.25 | 0.7884 | 0.91
(e) | | ✓ | (7, 7) | 387 | 18.2 | 26.21 | 0.7871 | 0.94
(f) | | ✓ | (7, 9) | 396 | 19.3 | 26.23 | 0.7890 | 0.99
Table 4. Comparison of different feature processing modules and different input dimensions.
Number | CCA | CCA+LSKA | RFEM | Inp. Dim. | Params/K | FLOPS/G | PSNR | Times/s
(d) | ✓ | | | 48 | 365 | 17.1 | 26.25 | 0.74
(g) | | ✓ | | 48 | 414 | 19.2 | 26.28 | 0.82
(h) | | | ✓ | 48 | 414 | 19.2 | 26.33 | 0.85
(i) | | | ✓ | 52 | 463 | 21.4 | 26.39 | 0.90
(j) | | | ✓ | 56 | 506 | 24.8 | 26.42 | 0.96
(k) | | | ✓ | 60 | 594 | 29.6 | 26.47 | 1.02
Table 5. Comparison of different activation functions.
Number | Activation Function | Params/K | FLOPS/G | PSNR | SSIM
(l) | Hard Swish | 558 | 13.6 | 27.79 | 0.7635
(m) | None | 556 | 13.4 | 26.25 | 0.7584
(n) | ReLU | 557 | 13.5 | 26.38 | 0.7599
(o) | Sigmoid | 560 | 13.7 | 26.40 | 0.7611
(p) | ELU | 557 | 13.5 | 26.47 | 0.7602
(q) | Tanh | 561 | 13.9 | 27.40 | 0.7622
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
