LACE-Net: A Swin Transformer with Local Frequency-Domain Energy and Adaptive Contrast Enhancement for Fine-Grained Land Cover Classification
Abstract
1. Introduction
- (1) A novel network architecture, LACE-Net, is proposed that integrates local frequency-domain energy with adaptive contrast enhancement. With the Swin Transformer as the backbone, the model embeds the Local Frequency-Domain Energy-Adaptive Contrast Enhancement Multi-Scale Attention (LACE) block, which adds only about 0.06 M parameters and 0.03 GFLOPs to Swin-B (87.34 M vs. 87.28 M parameters; 15.17 vs. 15.14 GFLOPs). The design targets three challenges: confusion between similar land cover categories, insufficient use of high-frequency texture information, and weak feature representation in low-contrast regions.
- (2) A texture-adaptive momentum adjustment mechanism is designed to enhance feature discriminability. Using the physical priors extracted by the local frequency-domain energy and contrast enhancement branches, it dynamically adjusts momentum coefficients to adaptively optimize the weight distribution of spatial attention. This improves the model's ability to distinguish complex textures: in the ablation study, dynamic momentum in the range [0.6, 0.9] outperforms every static setting tested (0.6, 0.75, and 0.9) on all four metrics.
- (3) Systematic experiments are conducted on both a self-constructed regional dataset and a public benchmark. Through field data collection and rigorous screening, this study builds the Guangxi Regional Dataset (GLC-30), which contains 30 categories of fine-grained land cover. On GLC-30 and NWPU-RESISC45, LACE-Net achieves higher classification accuracy than mainstream vision models, including ResNet, ConvNeXt, EfficientNet, ViT, and the original Swin Transformer; for example, its Top-1 accuracy is 96.48% vs. 96.23% for Swin-B on GLC-30 and 97.32% vs. 97.16% on NWPU-RESISC45. These results confirm the effectiveness of the proposed model for fine-grained land cover classification.
2. Related Work
2.1. Traditional LULC Classification Research
2.2. Research on LULC Classification Based on Deep Learning
3. Methodology
3.1. System Overview
- (1) The Swin Transformer serves as the backbone. Its shifted-window mechanism builds a hierarchical feature-extraction path that captures multi-scale spatial semantics, from local details to global context.
- (2) To compensate for the relatively weak sensitivity of Transformer architectures to high-frequency information, the LACE block is embedded between Stage 2 and Stage 3 of the backbone. Acting as a physics-aware corrector, it guides feature reconstruction by decoupling frequency-domain texture energy from spatial-domain contrast features, thereby injecting physical priors into the learning process.
- (3) The feature maps, now calibrated in both the frequency and spatial domains, are fed into the subsequent Transformer stages for high-level semantic abstraction. Finally, Global Average Pooling (GAP) and a linear classification head produce the land cover prediction.
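Read as a data flow, the three steps above can be traced through tensor shapes. The following is a hypothetical sketch, not the authors' code: it assumes Swin-B defaults (patch size 4, embedding dimension 128, channel width doubling at each patch-merging step), a 224 × 224 input, and a LACE block that preserves the shape of its input. `trace_lace_net` is an illustrative name.

```python
def trace_lace_net(img_size=224, patch_size=4, embed_dim=128, num_classes=30):
    """Return (stage name, output shape) pairs for the assumed LACE-Net layout.

    Assumptions (not stated in the text): Swin-B defaults and a
    shape-preserving LACE block placed between Stage 2 and Stage 3.
    """
    shapes = []
    h = w = img_size // patch_size        # patch partition: H/4 x W/4
    c = embed_dim
    for stage in range(1, 5):
        if stage > 1:                     # patch merging halves H, W and doubles C
            h, w, c = h // 2, w // 2, c * 2
        shapes.append((f"Stage {stage}", (h, w, c)))
        if stage == 2:                    # LACE recalibrates Stage 2 output in place
            shapes.append(("LACE block", (h, w, c)))
    shapes.append(("GAP", (c,)))          # global average pooling over H x W
    shapes.append(("Linear head", (num_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_lace_net():
        print(f"{name:12s} {shape}")
```

Under these assumptions the LACE block operates on a 28 × 28 × 256 map; the embedding-location ablation reports its best accuracy at this Stage 2–3 position.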
3.2. LACE Block
3.2.1. SEA Branch
3.2.2. Physics-Based Frequency-Domain Energy Sensing
3.2.3. Texture-Adaptive Momentum Adjustment Mechanism
4. Experiments
4.1. Dataset
4.2. Experimental Environment and Setup
4.3. Evaluation Metrics
4.4. Quantitative Comparison Experiment
4.5. Ablation Experiments and Analysis
4.5.1. Analysis of the Contributions of Each Component in LACE
4.5.2. Analysis of Hierarchical Synergistic Effects at the LACE Embedding Location
4.5.3. Comparison of Local Frequency Operators in Different Directions
4.5.4. Performance Comparison of Frequency Kernels of Different Sizes
4.5.5. Validation of the Dynamic Momentum Strategy Based on Texture Complexity
4.5.6. Confusion Matrix Analysis
4.6. Visual Analytics
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest








| Methods | GLC-30 Top-1 Acc (%) | GLC-30 mPre (%) | GLC-30 mRec (%) | GLC-30 mF1 (%) | NWPU-RESISC45 Top-1 Acc (%) | NWPU-RESISC45 mPre (%) | NWPU-RESISC45 mRec (%) | NWPU-RESISC45 mF1 (%) |
|---|---|---|---|---|---|---|---|---|
| ResNet-152 | 94.10 | 89.15 | 89.29 | 89.03 | 95.63 | 95.74 | 95.63 | 95.64 |
| ConvNeXt-Base | 95.47 | 90.91 | 90.01 | 90.31 | 96.84 | 96.87 | 96.84 | 96.84 |
| EfficientNet-B4 (380 × 380) | 95.84 | 92.19 | 92.08 | 92.10 | 96.98 | 97.02 | 96.98 | 96.98 |
| ViT-B/16 | 94.17 | 90.21 | 89.26 | 89.37 | 96.37 | 96.40 | 96.37 | 96.37 |
| Swin-B | 96.23 | 92.02 | 92.62 | 92.26 | 97.16 | 97.17 | 97.16 | 97.15 |
| LACE-Net | 96.48 | 93.81 | 92.72 | 93.13 | 97.32 | 97.34 | 97.32 | 97.31 |

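
In the NWPU-RESISC45 columns of the comparison table, Top-1 Acc equals mRec for every model. That is what a class-balanced test set produces: NWPU-RESISC45 has 700 images in each of its 45 classes, and if the split keeps classes balanced (an assumption here), macro recall reduces to overall accuracy. The sketch below computes the four metrics from a confusion matrix, assuming mPre, mRec, and mF1 are unweighted means of per-class scores. The reported mF1 values are slightly below the harmonic mean of mPre and mRec (e.g., 92.26 vs. about 92.32 for Swin-B on GLC-30), which is consistent with averaging per-class F1. `classification_metrics` is an illustrative name.

```python
def classification_metrics(cm):
    """Top-1 accuracy and macro precision/recall/F1, all in percent.

    cm[i][j] counts samples of true class i predicted as class j.
    Assumption: mPre/mRec/mF1 are unweighted means over classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[c][c] for c in range(n)]
    predicted = [sum(cm[i][c] for i in range(n)) for c in range(n)]  # column sums
    actual = [sum(cm[c]) for c in range(n)]                          # row sums
    prec = [tp[c] / predicted[c] if predicted[c] else 0.0 for c in range(n)]
    rec = [tp[c] / actual[c] if actual[c] else 0.0 for c in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(prec, rec)]
    return {
        "top1": 100 * sum(tp) / total,
        "mPre": 100 * sum(prec) / n,
        "mRec": 100 * sum(rec) / n,
        "mF1": 100 * sum(f1) / n,   # mean of per-class F1, not F1 of the means
    }


# Balanced toy example: 10 samples per class, so top1 and mRec coincide.
balanced = classification_metrics([[9, 1, 0], [2, 8, 0], [0, 1, 9]])
```

With imbalanced classes the two diverge, which is why both are reported for GLC-30.
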

| Methods | FLOPs (G) | Parameters (M) |
|---|---|---|
| ResNet-152 | 11.58 | 58.21 |
| ConvNeXt-Base | 15.36 | 87.6 |
| EfficientNet-B4 | 4.51 | 17.6 |
| ViT-B/16 | 16.86 | 88.25 |
| Swin-B | 15.14 | 87.28 |
| LACE-Net | 15.17 | 87.34 |

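
Measured against Swin-B, the cost of the LACE block follows directly from the complexity table (the table values are rounded to two decimals, so the percentages are approximate):

```latex
\begin{aligned}
\Delta_{\text{Params}} &= 87.34\ \text{M} - 87.28\ \text{M} = 0.06\ \text{M}, & \frac{0.06}{87.28} &\approx 0.07\%,\\
\Delta_{\text{FLOPs}} &= 15.17\ \text{G} - 15.14\ \text{G} = 0.03\ \text{G}, & \frac{0.03}{15.14} &\approx 0.20\%.
\end{aligned}
```
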
| Methods | Top-1 Acc (%) | mPre (%) | mRec (%) | mF1 (%) |
|---|---|---|---|---|
| Swin Transformer | 96.23 | 92.02 | 92.62 | 92.26 |
| Swin Transformer + EMA | 95.89 | 92.25 | 92.32 | 92.26 |
| Swin Transformer + SEA | 96.14 | 93.44 | 92.45 | 92.81 |
| Swin Transformer + CEB | 96.30 | 93.71 | 92.56 | 92.99 |
| Swin Transformer + LFE | 96.18 | 92.94 | 92.54 | 92.58 |
| LACE-Net | 96.48 | 93.81 | 92.72 | 93.13 |
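
In the component ablation, adding CEB alone raises mF1 from 92.26% to 92.99%. Assuming CEB denotes the contrast-enhancement branch, the sketch below illustrates the general idea with classic adaptive contrast enhancement, y = μ + G(x − μ), where μ and σ are local statistics and the gain G = target_std / σ is largest in low-contrast regions. This is a swapped-in textbook technique, not the paper's CEB; `local_mean_std` and `adaptive_contrast` are hypothetical names, and the inputs are single-channel 2-D maps.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_std(x, k=3):
    """Mean and std over each k x k neighbourhood (reflect padding keeps H x W)."""
    windows = sliding_window_view(np.pad(x, k // 2, mode="reflect"), (k, k))
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))


def adaptive_contrast(x, k=3, target_std=1.0, max_gain=4.0, eps=1e-6):
    """Classic adaptive contrast enhancement: y = mu + G * (x - mu).

    G = target_std / sigma boosts low-contrast neighbourhoods and damps
    high-contrast ones; clipping at max_gain avoids blowing up flat regions.
    """
    mu, sigma = local_mean_std(x, k)
    gain = np.minimum(max_gain, target_std / (sigma + eps))
    return mu + gain * (x - mu)


checker = np.where(np.indices((8, 8)).sum(axis=0) % 2 == 0, 1.0, -1.0)
low = adaptive_contrast(0.1 * checker)    # faint texture is amplified
high = adaptive_contrast(2.0 * checker)   # strong texture is compressed
```

Because the gain depends on local statistics, the same operation sharpens faint textures and tempers strong ones, which is the behaviour one would want in low-contrast land cover regions.
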

| Insertion Point | Top-1 Acc (%) | mPre (%) | mRec (%) | mF1 (%) |
|---|---|---|---|---|
| Patch Partition–Stage 1 | 96.00 | 91.35 | 92.47 | 91.85 |
| Stage 1–2 | 96.32 | 92.59 | 92.61 | 92.58 |
| Stage 2–3 | 96.48 | 93.81 | 92.72 | 93.13 |
| Stage 3–4 | 96.27 | 92.01 | 92.60 | 92.25 |
| Stage 4–GAP | 96.07 | 92.33 | 92.33 | 92.30 |


| Operator Directions | Top-1 Acc (%) | mPre (%) | mRec (%) | mF1 (%) |
|---|---|---|---|---|
| 45° + 135° | 96.64 | 92.50 | 92.87 | 92.66 |
| 0° + 90° + 45° + 135° | 96.05 | 93.43 | 92.43 | 92.80 |
| 0° + 90° | 96.48 | 93.81 | 92.72 | 93.13 |

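
The direction comparison pairs 1-D local operators along fixed orientations. One concrete reading of "local frequency-domain energy" is the non-DC spectral energy of a short window taken along each direction. The sketch below implements that reading for 0° (1 × k window) and 90° (k × 1 window) with k = 5, the best setting in the kernel-size comparison; the diagonal directions are not covered. The paper's exact operator is not given in this section, so `directional_energy` and `local_frequency_energy` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def directional_energy(x, k=5, axis=1):
    """Local high-frequency energy of a 2-D map along one direction.

    axis=1 -> 1 x k window (0 deg, horizontal); axis=0 -> k x 1 window (90 deg).
    For every pixel, take the k samples along the direction, apply a DFT,
    and sum |X_f|^2 over the non-DC bins f = 1..k-1.
    """
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k // 2)                       # keep output size H x W
    xp = np.pad(x, pad, mode="reflect")
    window = (1, k) if axis == 1 else (k, 1)
    win = sliding_window_view(xp, window).reshape(*x.shape, k)
    spec = np.fft.fft(win, axis=-1)
    return (np.abs(spec[..., 1:]) ** 2).sum(axis=-1)   # drop the DC bin


def local_frequency_energy(x, k=5):
    """Combined 0 deg + 90 deg energy map (the best direction pair in the table)."""
    return directional_energy(x, k, axis=1) + directional_energy(x, k, axis=0)


# Vertical stripes vary only horizontally, so only the 0 deg operator responds.
stripes = np.tile((-1.0) ** np.arange(8), (8, 1))
e_horizontal = directional_energy(stripes, 5, axis=1)  # > 0 everywhere
e_vertical = directional_energy(stripes, 5, axis=0)    # ~ 0 everywhere
```

By Parseval's theorem the non-DC energy equals k times the windowed sum of squared deviations from the window mean, so the same map could be computed with box filters instead of an FFT.
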
| Kernel Size | Top-1 Acc (%) | mPre (%) | mRec (%) | mF1 (%) |
|---|---|---|---|---|
| (1 × 3) & (3 × 1) | 96.21 | 92.56 | 92.45 | 92.48 |
| (1 × 5) & (5 × 1) | 96.48 | 93.81 | 92.72 | 93.13 |
| (1 × 7) & (7 × 1) | 96.07 | 93.42 | 92.42 | 92.78 |

| Momentum m | Top-1 Acc (%) | mPre (%) | mRec (%) | mF1 (%) |
|---|---|---|---|---|
| Static 0.9 | 96.00 | 93.07 | 92.37 | 92.58 |
| Static 0.75 | 96.41 | 93.79 | 92.63 | 93.08 |
| Static 0.6 | 96.16 | 92.45 | 92.39 | 92.39 |
| Dynamic [0.6, 0.9] | 96.48 | 93.81 | 92.72 | 93.13 |

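
The momentum table compares static coefficients with a dynamic range of [0.6, 0.9]. How texture complexity is mapped to m is not specified in this section, so the following is a hypothetical sketch: a scalar complexity in [0, 1) is derived from any non-negative texture-energy map, mapped linearly onto [0.6, 0.9] (assuming richer texture gives a smaller m, i.e. more weight on the physical prior), and m then blends a base spatial attention map with a prior-derived one in exponential-moving-average style. `texture_complexity`, `dynamic_momentum`, and `blend_attention` are illustrative names.

```python
import numpy as np


def texture_complexity(energy):
    """Squash the mean of a non-negative energy map into [0, 1).

    Hypothetical choice: e / (1 + e) on the mean energy e.
    """
    e = float(np.mean(energy))
    return e / (1.0 + e)


def dynamic_momentum(complexity, m_min=0.6, m_max=0.9):
    """Map complexity in [0, 1] linearly onto [m_min, m_max].

    Assumption: more complex texture -> smaller m. Inputs are clamped so m
    never leaves the range evaluated in the ablation table.
    """
    c = min(max(complexity, 0.0), 1.0)
    return m_max - (m_max - m_min) * c


def blend_attention(base_attn, prior_attn, m):
    """Momentum-style blend of two spatial attention maps of equal shape."""
    return m * base_attn + (1.0 - m) * prior_attn


m = dynamic_momentum(texture_complexity(np.full((4, 4), 3.0)))  # complexity 0.75
attn = blend_attention(np.ones((4, 4)), np.zeros((4, 4)), m)
```

The clamp is the design choice that matters here: the blend can never lean further toward either map than the static extremes 0.6 and 0.9 that the table already tests.
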
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tan, Y.; Chen, G.; Huang, Y.; Ye, H.; Tang, J. LACE-Net: A Swin Transformer with Local Frequency-Domain Energy and Adaptive Contrast Enhancement for Fine-Grained Land Cover Classification. Computers 2026, 15, 281. https://doi.org/10.3390/computers15050281
Tan Y, Chen G, Huang Y, Ye H, Tang J. LACE-Net: A Swin Transformer with Local Frequency-Domain Energy and Adaptive Contrast Enhancement for Fine-Grained Land Cover Classification. Computers. 2026; 15(5):281. https://doi.org/10.3390/computers15050281
Chicago/Turabian Style: Tan, Yongmei, Gong Chen, Yan Huang, Hengzhou Ye, and Jincheng Tang. 2026. "LACE-Net: A Swin Transformer with Local Frequency-Domain Energy and Adaptive Contrast Enhancement for Fine-Grained Land Cover Classification" Computers 15, no. 5: 281. https://doi.org/10.3390/computers15050281
APA Style: Tan, Y., Chen, G., Huang, Y., Ye, H., & Tang, J. (2026). LACE-Net: A Swin Transformer with Local Frequency-Domain Energy and Adaptive Contrast Enhancement for Fine-Grained Land Cover Classification. Computers, 15(5), 281. https://doi.org/10.3390/computers15050281

