MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution

Yu, Jie; Lin, Chengcheng; Peng, Luyao; Zhong, Cheng; Li, Hui

doi:10.3390/s25216729

This is an early access version, the complete PDF, HTML, and XML versions will be available soon.

Open AccessArticle

MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution

by

Jie Yu

¹

,

Chengcheng Lin

¹,

Luyao Peng

¹,

Cheng Zhong

¹ and

Hui Li

^2,*

¹

Badong National Observation and Research Station of Geohazards, China University of Geosciences, Wuhan 430074, China

²

The School of Earth Sciences, China University of Geosciences, Wuhan 430074, China

^*

Author to whom correspondence should be addressed.

Sensors 2025, 25(21), 6729; https://doi.org/10.3390/s25216729

Submission received: 21 September 2025 / Revised: 20 October 2025 / Accepted: 1 November 2025 / Published: 3 November 2025

(This article belongs to the Section Remote Sensors)

Download Versions Notes

Abstract

To address the issue of insufficient resolution in remote sensing images due to limitations in sensors and transmission, this paper proposes a multi-scale feature fusion model, MSFANet, based on the Swin Transformer architecture for remote sensing image super-resolution reconstruction. The model comprises three main modules: shallow feature extraction, deep feature extraction, and high-quality image reconstruction. The deep feature extraction module innovatively introduces three core components: Feature Refinement Augmentation (FRA), Local Structure Optimization (LSO), and Residual Fusion Network (RFN), which effectively extract and adaptively aggregate multi-scale information from local to global levels. Experiments conducted on three public remote sensing datasets (RSSCN7, AID, and WHU-RS19) demonstrate that MSFANet outperforms state-of-the-art models (including HSENet and TransENet) across five evaluation metrics in ×2, ×3, and ×4 super-resolution tasks. Furthermore, MSFANet achieves superior reconstruction quality with reduced computational overhead, striking an optimal balance between efficiency and performance. This positions MSFANet as an effective solution for remote sensing image super-resolution applications.

Keywords: remote sensing; super-resolution reconstruction; Swin Transformer; deep learning; attention mechanism

Share and Cite

MDPI and ACS Style

Yu, J.; Lin, C.; Peng, L.; Zhong, C.; Li, H. MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution. Sensors 2025, 25, 6729. https://doi.org/10.3390/s25216729

AMA Style

Yu J, Lin C, Peng L, Zhong C, Li H. MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution. Sensors. 2025; 25(21):6729. https://doi.org/10.3390/s25216729

Chicago/Turabian Style

Yu, Jie, Chengcheng Lin, Luyao Peng, Cheng Zhong, and Hui Li. 2025. "MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution" Sensors 25, no. 21: 6729. https://doi.org/10.3390/s25216729

APA Style

Yu, J., Lin, C., Peng, L., Zhong, C., & Li, H. (2025). MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution. Sensors, 25(21), 6729. https://doi.org/10.3390/s25216729

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

MSFANet: A Multi-Scale Feature Fusion Transformer with Hybrid Attention for Remote Sensing Image Super-Resolution

Abstract

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI