SRCT: Structure-Preserving Method for Sub-Meter Remote Sensing Image Super-Resolution
Abstract
1. Introduction
- (1) Structure Encoder (SE): A multi-level cascade of convolution and spatial attention modules extracts and preserves key structural information from the original image, capturing both macro-scale structural features and local texture patterns.
- (2) Structure Guidance Module (SGM): A two-stage injection strategy combines feature initialization guidance with deep feature fusion. Structural features are transmitted hierarchically with attention enhancement and injected into all levels of the super-resolution network, which alleviates the information bottleneck in Transformer models and helps preserve the integrity of complex ground object structures.
- (3) Dual-Branch Residual Dense Group (DBRDG) for remote sensing characteristics: The main path uses window-based multi-head self-attention and the residual path uses lightweight convolution. This balances the reconstruction of regular geometric structures against irregular textures in remote sensing images, jointly optimizing global structure modeling and local texture preservation. It is well suited to the complex and diverse ground object types found in such images.
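As a rough illustration of the dual-branch idea in contribution (3), the sketch below combines windowed self-attention with a small convolution on a 1-D toy signal. It is not the authors' DBRDG: the tokens are scalars, attention is single-head with no learned projections, the kernel weights are fixed, and summing the two branches is an assumption based on the term "residual path". All function names are hypothetical.

```python
import math
from typing import List, Sequence


def window_self_attention(tokens: Sequence[float], window: int) -> List[float]:
    """Softmax self-attention restricted to non-overlapping windows.

    Toy assumption: query = key = value = the scalar token itself.
    """
    if window <= 0 or len(tokens) % window != 0:
        raise ValueError("sequence length must be a positive multiple of window")
    out: List[float] = []
    for start in range(0, len(tokens), window):
        win = tokens[start:start + window]
        for q in win:
            scores = [q * k for k in win]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            norm = sum(weights)
            out.append(sum(w * v for w, v in zip(weights, win)) / norm)
    return out


def light_conv(tokens: Sequence[float],
               kernel: Sequence[float] = (0.25, 0.5, 0.25)) -> List[float]:
    """3-tap 1-D convolution with edge replication (fixed, illustrative weights)."""
    padded = [tokens[0]] + list(tokens) + [tokens[-1]]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(tokens))
    ]


def dual_branch(tokens: Sequence[float], window: int) -> List[float]:
    """Sum of the attention (main) branch and the convolution (residual) branch."""
    attn = window_self_attention(tokens, window)
    conv = light_conv(tokens)
    return [a + c for a, c in zip(attn, conv)]
```

In this toy, the attention branch only mixes tokens inside a window, while the convolution branch also mixes neighbours across window borders, which is one way the two paths can complement each other.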
2. Methods
2.1. Overview
2.2. Structure Encoder (SE)
2.3. Structure Guidance Module (SGM)
2.4. Super-Resolution Reconstruction Network SGCT
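
| Algorithm 1 Training and inference pipeline of SRCT |
|---|
| Input: HR image set $X_{HR}$, LR image set $X_{LR}$, structure encoder SE, structure guidance mechanism SGM, backbone SGCT, loss weights $\lambda_{per}$, $\lambda_{str}$ |
| Output: Trained SRCT model; SR prediction for a given LR input |
| 1: Initialize SE, SGM and SGCT parameters |
| 2: for epoch = 1 … total_epochs do |
| 3: for batch $(x_{HR}, x_{LR})$ in $(X_{HR}, X_{LR})$ do |
| 4: // Phase 1: Structure encoding |
| 5: $f_{SE} = \mathrm{SE}(x_{HR})$ |
| 6: $F_{arch} = g_{SA}(f_{SE})$ |
| 7: // Phase 2: Structure guidance |
| 8: $(E_{1x}, E_{2x}, E_{3x}) = \mathrm{SGM}(F_{arch})$ |
| 9: Inject $E_{1x}, E_{2x}, E_{3x} \Rightarrow F_{SRCT1}, F_{SRCT3}, F_{SRCT5}$ |
| 10: // Phase 3: Structure-guided SR reconstruction |
| 11: $F_0 = \mathrm{Conv}_{3 \times 3}(x_{LR})$ |
| 12: $F_{deep} = \mathrm{SGRCT}(F_0, E_{1x}, E_{2x}, E_{3x})$ |
| 13: $x_{SR} = \mathrm{Upsample}(F_{deep})$ |
| 14: // Loss computation |
| 15: $L_{rec} = \lVert x_{SR} - x_{HR} \rVert_1$ |
| 16: $L_{per} = \mathrm{VGG}(x_{SR}, x_{HR})$ |
| 17: $L_{str} = \lVert \mathrm{SE}(x_{SR}) - \mathrm{SE}(x_{HR}) \rVert_1$ |
| 18: $L_{total} = L_{rec} + \lambda_{per} \cdot L_{per} + \lambda_{str} \cdot L_{str}$ |
| 19: Backpropagate $L_{total}$, update parameters of SE, SGM, SGCT |
| 20: end for |
| 21: end for |
| 22: // Inference |
| 23: Given unseen LR image $x_{LR}^{*}$ |
| 24: $f_{SE}^{*} = \mathrm{SE}(\mathrm{HR\_ref})$, or skip if unavailable |
| 25: $(E_{1}^{*}, E_{2}^{*}, E_{3}^{*}) = \mathrm{SGM}(f_{SE}^{*})$, or cached guidance |
| 26: $x_{SR}^{*} = \mathrm{SGRCT}(x_{LR}^{*}, E_{1}^{*}, E_{2}^{*}, E_{3}^{*})$ |
| 27: return $x_{SR}^{*}$ |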
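The loss computation in Algorithm 1 (steps 15 to 18) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: images are flat lists of floats, `toy_structure_encoder` and `toy_perceptual_distance` are hypothetical stand-ins for SE and the VGG-based term, and the weight values in the usage lines are purely illustrative because the source does not state them.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def l1_norm_of_difference(a: Vector, b: Vector) -> float:
    """Sum of absolute differences, i.e. ||a - b||_1 (frameworks often average instead)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def total_loss(
    x_sr: Vector,
    x_hr: Vector,
    structure_encoder: Callable[[Vector], Vector],
    perceptual_distance: Callable[[Vector, Vector], float],
    lambda_per: float,
    lambda_str: float,
) -> float:
    """L_total = L_rec + lambda_per * L_per + lambda_str * L_str."""
    l_rec = l1_norm_of_difference(x_sr, x_hr)
    l_per = perceptual_distance(x_sr, x_hr)
    l_str = l1_norm_of_difference(structure_encoder(x_sr), structure_encoder(x_hr))
    return l_rec + lambda_per * l_per + lambda_str * l_str


def toy_structure_encoder(x: Vector) -> List[float]:
    """Hypothetical stand-in for SE: first differences as a crude edge signal."""
    return [b - a for a, b in zip(x, x[1:])]


def toy_perceptual_distance(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for the VGG term: mean squared difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Illustrative usage; the weights 0.1 and 0.5 are assumptions, not values from the paper.
x_sr = [1.0, 2.0, 3.0]
x_hr = [1.0, 2.0, 5.0]
loss = total_loss(x_sr, x_hr, toy_structure_encoder, toy_perceptual_distance, 0.1, 0.5)
```

Passing the encoder and the perceptual term in as callables mirrors how the structure loss reuses SE on both the prediction and the target.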
3. Experiment and Results
3.1. Dataset
3.2. Experimental Details
3.3. Evaluation Metrics
3.4. Comparison with Other Models
3.5. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Satellite | PAN Spatial Resolution/m | Multispectral Spatial Resolution/m |
|---|---|---|
| GF Multi-Mode | 0.5 | 2 |
| GF-1 | 2 | 8 |
| GF-6 | 2 | 8 |

| Parameter | Value |
|---|---|
| Batch size | 4 |
| Training patch size | |
| Iterations | 300,000 |
| Learning rate | |
| Optimizer | Adam, |
| Data augmentation | Random rotation, horizontal flip |
| GPU | NVIDIA RTX 3090 |

| Metrics | Formula | Description |
|---|---|---|
| PSNR | | A higher PSNR value indicates a smaller pixel-wise difference between the reconstructed image and the original image. |
| SSIM | | A higher SSIM value shows that the two images are more similar in terms of structural information. |
| LPIPS | | A lower LPIPS value means that the two images are more similar in terms of human-perceived visual quality. |
| DISTS | | A lower DISTS value indicates higher similarity between the images in structure and texture. |
| NIQE | | A lower NIQE value indicates that the statistical properties of the image are closer to those of typical natural images. |
| FID | | A lower FID means the features of the generated/processed images are more similar to those of the real/reference images. |
| MUSIQ | | A higher MUSIQ score shows that the image has fewer visual flaws such as artifacts, and clearer details. |
| MANIQA | | A higher MANIQA score means the image has a more coherent structure, richer details, and fewer distortions. |
| CLIPIQA | | A higher CLIPIQA score shows that the image has more of the qualities that humans associate with a high-quality image. |
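To make the PSNR row concrete, the sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE), on small nested lists. The function name and the default peak value of 255 (8-bit imagery) are assumptions for illustration; the source does not state the dynamic range used in the experiments.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[Sequence[float]],
         reconstructed: Sequence[Sequence[float]],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    flat_ref = [p for row in reference for p in row]
    flat_rec = [p for row in reconstructed for p in row]
    if not flat_ref or len(flat_ref) != len(flat_rec):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr = [[0, 255], [255, 0]]
sr = [[0, 255], [255, 10]]
value = psnr(hr, sr)  # MSE = 25, so roughly 34.15 dB
```

A single pixel off by 10 grey levels in a four-pixel image already gives about 34 dB, which helps put the roughly 21 to 25 dB range of the comparison results in perspective.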

| Type | Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | DISTS ↓ | FID ↓ | NIQE ↓ | CLIPIQA ↑ | MUSIQ ↑ | MANIQA ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| CNN-based | EDSR | 20.809 | 0.468 | 0.2721 | 0.503 | 96.402 | 7.208 | 0.1857 | 29.492 | 0.145 |
| CNN-based | HAUNet | 22.746 | 0.5534 | 0.2598 | 0.469 | 86.810 | 6.963 | 0.2629 | 36.915 | 0.1682 |
| GAN-based | SPSR | 21.0137 | 0.4926 | 0.267 | 0.4924 | 102.810 | 6.743 | 0.2915 | 35.665 | 0.171 |
| GAN-based | WGSR | 23.974 | 0.6402 | 0.3097 | 0.1959 | 84.097 | 6.432 | 0.322 | 42.891 | 0.1873 |
| Mamba-based | FreMamba | 22.994 | 0.5987 | 0.3074 | 0.198 | 98.523 | 5.9749 | 0.2681 | 40.618 | 0.1799 |
| Transformer-based | ACT-SR | 24.835 | 0.6651 | 0.2768 | 0.4094 | 96.471 | 8.454 | 0.2349 | 36.196 | 0.1665 |
| Transformer-based | CATANet | 24.734 | 0.6622 | 0.5313 | 0.3021 | 94.945 | 8.285 | 0.159 | 25.092 | 0.155 |
| Ours | SRCT | 24.821 | 0.6516 | 0.2594 | 0.1625 | 77.301 | 5.833 | 0.3776 | 42.725 | 0.2027 |

| Base | DBRDG | SE | SGM | PSNR ↑ | SSIM ↑ | LPIPS ↓ | DISTS ↓ | MANIQA ↑ |
|---|---|---|---|---|---|---|---|---|
| 🗸 | × | × | × | 23.651 | 0.6143 | 0.2748 | 0.206 | 0.1524 |
| 🗸 | 🗸 | × | × | 24.2235 | 0.6312 | 0.2646 | 0.2123 | 0.1654 |
| 🗸 | 🗸 | 🗸 | × | 24.8915 | 0.6423 | 0.2745 | 0.1732 | 0.1956 |
| 🗸 | 🗸 | 🗸 | 🗸 | 24.821 | 0.6516 | 0.2594 | 0.1625 | 0.2027 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Gao, T.; Zhang, S.; Yao, W.; Shang, E.; Yang, J.; Ma, Y.; Ma, Y. SRCT: Structure-Preserving Method for Sub-Meter Remote Sensing Image Super-Resolution. Sensors 2026, 26, 733. https://doi.org/10.3390/s26020733
