Language-Guided Contrastive Learning and Difference Enhancement for Semantic Change Detection in Remote Sensing Images
Highlights
- We propose LGDENet, a lightweight framework that unifies Language-Guided Contrastive Learning with a difference enhancement mechanism.
- The method achieves state-of-the-art accuracy on the SECOND and Landsat-SCD datasets while maintaining high computational efficiency compared to foundation model-based approaches.
- The language-guided strategy aligns visual features with text prompts, effectively resolving directional semantic ambiguities in “from–to” transitions.
- The Difference Enhancement Module (DEM) and the hybrid encoder decouple spatial and channel information to adaptively suppress pseudo-change noise, such as registration errors.
Abstract
1. Introduction
- We propose a Language-Guided Contrastive Learning framework specifically for SCD. Unlike static scene classification models, our method aligns visual difference vectors with text prompts representing temporal transitions, effectively resolving semantic ambiguity in complex change scenarios.
- We design a Difference Enhancement Module (DEM) that utilizes the channel–spatial decoupling property of DSConv to suppress irrelevant variations (e.g., registration noise) while enhancing genuine semantic changes.
- A hybrid encoder architecture is proposed to effectively balance local feature refinement and global semantic reasoning by combining the asymmetric convolution, the Squeeze-and-Excitation (SE) module, and the hierarchical Swin Transformer.
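To make the channel–spatial decoupling of DSConv concrete, the sketch below compares the weight count of a standard convolution with that of a depthwise separable one, which applies one k × k spatial filter per channel and then mixes channels with a 1 × 1 pointwise convolution. The function names and the 64-to-128-channel, 3 × 3 configuration are illustrative assumptions, not the layer sizes used in LGDENet; biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsconv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # spatial filtering: one k x k kernel per input channel
    pointwise = c_in * c_out   # channel mixing: a 1 x 1 convolution
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    std = standard_conv_params(64, 128, 3)
    dsc = dsconv_params(64, 128, 3)
    print(f"standard: {std}, depthwise separable: {dsc}, ratio: {dsc / std:.3f}")
```

Because the spatial and channel steps are separate operators, each can be weighted or gated independently, which is the property the DEM description relies on.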
2. Materials and Methods
2.1. Proposed LGDENet Framework
2.2. Image Encoder
2.2.1. Asymmetric Convolution Stem
2.2.2. Hierarchical Swin Transformer Backbone
2.3. Difference Enhancement Module
2.4. Language-Guided Contrastive Learning
2.4.1. Text Prompts and Encoder
“A remote sensing image shows a change from [class_before] to [class_after].”
This formulation ensures that each text query encodes not only the categories involved but also their transition relationship, providing richer semantic priors for alignment with visual difference features. For example, “a change from farmland to residential area” explicitly embeds the direction and type of change, which assists the model in discriminating visually similar but semantically distinct transitions.
“A remote sensing image remains [class_unchanged].”
This dual prompt strategy allows the model to cover both changed and unchanged cases within a unified contrastive framework. The combination of change-oriented and unchanged textual descriptions provides complementary semantic information, which guides the subsequent contrastive learning to simultaneously reduce intra-class distance and enlarge inter-class separation across modalities.
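The alignment described above can be illustrated with a CLIP-style objective: cosine similarities between each visual difference feature and every prompt embedding are turned into logits, and a cross-entropy term pulls a feature toward its matching prompt and away from the others. This is a minimal sketch, not the loss used in LGDENet, whose exact form is not given here; the function names, the toy 2-D embeddings, and the temperature of 0.07 are assumptions for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(
    visual: Sequence[Sequence[float]],
    text: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> float:
    """Mean cross-entropy over temperature-scaled cosine-similarity logits.

    visual: N difference-feature vectors; text: K prompt embeddings;
    labels[i]: index of the prompt that matches visual[i].
    """
    text_n = [l2_normalize(t) for t in text]
    total = 0.0
    for feat, label in zip(visual, labels):
        f = l2_normalize(feat)
        logits = [sum(a * b for a, b in zip(f, t)) / temperature for t in text_n]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[label]
    return total / len(labels)


if __name__ == "__main__":
    prompts = [[1.0, 0.0], [0.0, 1.0]]    # toy embeddings for two prompts
    features = [[0.9, 0.1], [0.2, 0.8]]   # toy visual difference features
    print(contrastive_loss(features, prompts, [0, 1]))  # matching pairs
    print(contrastive_loss(features, prompts, [1, 0]))  # mismatched pairs
```

The loss is small when each feature lies closest to its own prompt and grows when the pairing is swapped, which is the behavior a contrastive objective of this kind is meant to enforce.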
2.4.2. Language-Guided Multi-Modal Contrastive Learning
2.5. Datasets
- SECOND: This dataset contains 4662 pairs of multi-temporal, high-resolution RS images collected from major urban areas, principally Shanghai, Chengdu, and Hangzhou. Image spatial resolution spans 0.5–3 m, and all tiles are cropped to a uniform pixel size for consistent analysis. Land cover is annotated using one “no change” label together with six semantic classes: water bodies, non-vegetated surfaces, low vegetation, trees, buildings, and sports fields. This labeling scheme supplies diverse context for semantic change detection. In the publicly available release, the split comprises 2968 pairs for training and 1694 pairs for testing.
- Landsat-SCD: This dataset is constructed from Landsat satellite imagery acquired between 1990 and 2020, with a geographic focus on Tumushuke in Xinjiang, China, located along the margin of the Taklamakan Desert. Each bitemporal pair has a spatial resolution of 30 m and is annotated with one “no change” label together with four land cover classes (farmland, desert, buildings, and water bodies), where only regions exhibiting change are labeled. The original release contains 8468 pairs of uniform pixel size; after removing redundant augmentation duplicates in the public version, 2425 unique pairs remain. In our experiments, we adopt 1455 pairs for training, 485 for validation, and 485 for testing.
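The split sizes quoted above can be checked with a few lines of arithmetic. The sketch below only restates the pair counts given in the dataset descriptions; the dictionary layout and the function name `split_fractions` are illustrative choices.

```python
# Pair counts as stated in the dataset descriptions above.
SPLITS = {
    "SECOND": {"total": 4662, "train": 2968, "test": 1694},
    "Landsat-SCD": {"total": 2425, "train": 1455, "val": 485, "test": 485},
}


def split_fractions(name: str) -> dict:
    """Return each subset's share of the dataset, rounded to three decimals."""
    info = SPLITS[name]
    parts = {k: v for k, v in info.items() if k != "total"}
    # The subsets must add up to the reported total number of pairs.
    if sum(parts.values()) != info["total"]:
        raise ValueError(f"split sizes for {name} do not sum to the total")
    return {k: round(v / info["total"], 3) for k, v in parts.items()}


if __name__ == "__main__":
    for dataset in SPLITS:
        print(dataset, split_fractions(dataset))
```

Both splits are internally consistent: the Landsat-SCD partition corresponds to a 60/20/20 split, and the SECOND release to roughly 64/36.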
2.6. Experimental Setup
2.6.1. Text Prompt
“A remote sensing image where [class_before] changes to [class_after].”
Here, [class_before] and [class_after] are dataset-specific category phrases defined below. In this way, all the semantic change types present in the datasets are systematically mapped to language descriptions. For unchanged regions, we use a special prompt:
“There is no change.”
- Low vegetation;
- Non-vegetated surface;
- Tree;
- Water;
- Building;
- Playground.
- “There is no change.”
- “A remote sensing image where low vegetation changes to buildings.”
- “A remote sensing image where non-vegetated surface changes to playground.”
- Farmland;
- Desert;
- Building;
- Water.
- “There is no change.”
- “A remote sensing image where farmland changes to buildings.”
- “A remote sensing image where desert changes to water.”
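The prompt sets listed above can be generated programmatically. The following is a minimal sketch that assumes one prompt per ordered pair of distinct classes plus the single unchanged prompt; whether every ordered pair is actually used, and the exact wording of each category phrase (the examples above pluralize "buildings", for instance), are assumptions. The constant and function names are hypothetical.

```python
from itertools import permutations
from typing import List

NO_CHANGE_PROMPT = "There is no change."

# Category phrases, lower-cased from the lists above.
SECOND_CLASSES = [
    "low vegetation", "non-vegetated surface", "tree",
    "water", "building", "playground",
]
LANDSAT_SCD_CLASSES = ["farmland", "desert", "building", "water"]


def build_prompts(classes: List[str]) -> List[str]:
    """Return the unchanged prompt followed by one prompt per ordered class pair."""
    prompts = [NO_CHANGE_PROMPT]
    # permutations(..., 2) yields ordered pairs, so A -> B and B -> A
    # become two different prompts, preserving the direction of change.
    for before, after in permutations(classes, 2):
        prompts.append(f"A remote sensing image where {before} changes to {after}.")
    return prompts


if __name__ == "__main__":
    for name, classes in [("SECOND", SECOND_CLASSES), ("Landsat-SCD", LANDSAT_SCD_CLASSES)]:
        prompts = build_prompts(classes)
        print(name, len(prompts), prompts[1])
```

Under this assumption, K classes give K(K − 1) change prompts plus one unchanged prompt.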
2.6.2. Comparison Baselines
- SSESN [33]: A method that integrates multi-scale features through a pyramid structure and assigns spatial priorities to bi-temporal branches for change interpretation.
- SCDNet [29]: A dual-branch encoder–decoder model in which a multi-scale convolutional unit expands the receptive field, and the resulting features are fused with encoder representations before being decoded.
- Bi-SRNet [30]: A method that incorporates self-attention for richer semantic interaction and a cross-temporal module for improved correspondence, surpassing its baseline counterparts.
- SCanNet [8]: A method that explicitly models temporal-to-temporal semantic transformations via a semantic change Transformer and applies spatio-temporal constraints to align with the task objective.
- GAPL-SCD [34]: A method that leverages graph aggregation-based prototype learning under a multi-task optimization regime. By introducing adaptive weight allocation and gradient modulation strategies, it mitigates conflicts among different training objectives, thereby enhancing the stability and efficiency of multi-task learning.
2.6.3. Assessment Criteria
2.6.4. Implementation Details
3. Results
3.1. Quantitative Results and Analysis
3.2. Visual Evaluation
3.3. Ablation Study
3.3.1. Impact of Core Architectural Components
3.3.2. Impact of Textual Alignment Validity
3.4. Model Efficiency Analysis
3.5. Interpretability of the Enhancements via Grad-CAM
- Suppression of Irrelevant Variations by DEM: As observed in column (a) of Figure 5, the pure visual baseline frequently exhibits diffuse and scattered attention. It is easily distracted by pseudo-changes such as seasonal shifts, illumination differences, and registration errors (especially evident in the vast backgrounds of the Landsat-SCD examples in the bottom two rows). Upon incorporating the DEM (column b), the scattered background noise is significantly attenuated. The attention maps become much cleaner and begin to group around the actual changed regions. This is consistent with the channel–spatial decoupling mechanism within the DEM acting as a noise filter that isolates and suppresses non-semantic, high-frequency spatial disturbances before feature fusion.
- Semantic Disambiguation via Language Guidance: While the DEM reduces background noise, the feature responses in column (b) still show somewhat blurry boundaries and occasionally weak activations on complex transitions. Integrating the text branch (column c) changes the responses markedly. Guided by explicit transition prompts (e.g., “bare land changes to built-up”), the heatmaps become highly concentrated, with intense peak activations (deep red regions) that align closely with the specific “from–to” semantic boundaries defined by the ground truth. This visual evidence indicates that language guidance steers the visual encoder toward precise semantic transitions, helping to resolve directional semantic ambiguities that visual features alone struggle to discern.
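For readers unfamiliar with how such heatmaps are produced, the sketch below implements the standard Grad-CAM computation on plain nested lists: each channel is weighted by the spatial mean of its gradient, the weighted activations are summed, negative values are clipped, and the map is scaled to [0, 1]. Which layer and target score the authors used is not stated here, so the inputs are toy values and the names are assumptions for illustration.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM heatmap from per-channel activations and gradients.

    Both inputs are indexed as [channel][row][column] and share one shape.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of the gradient for that channel.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, act in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * act[i][j]
    # ReLU keeps only evidence that increases the target score.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


if __name__ == "__main__":
    # Toy example: two channels on a 2 x 2 grid.
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```

In the toy example, the second channel has a negative mean gradient, so its strong activation is removed by the ReLU and only the location supported by the first channel remains highlighted.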
4. Discussion
4.1. The Role of Language Guidance in Semantic Disambiguation
4.2. Robustness Against Irrelevant Variations
4.3. Efficiency vs. Performance Trade-Off
4.4. Limitations and Future Perspectives
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Wang, D.; Zhang, J.; Du, B.; Xia, G.S.; Tao, D. An empirical study of remote sensing pretraining. IEEE Trans. Geosci. Remote Sens. 2022, 61, 5608020.
2. Miao, W.; Geng, J.; Jiang, W. Multigranularity decoupling network with pseudolabel selection for remote sensing image scene classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5603813.
3. Geng, J.; Deng, X.; Ma, X.; Jiang, W. Transfer learning for SAR image classification via deep joint distribution adaptation networks. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5377–5392.
4. Hu, J.; Zhang, Y. Seasonal change of land-use/land cover (LULC) detection using MODIS data in rapid urbanization regions: A case study of the pearl river delta region (China). IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 1913–1920.
5. Bai, T.; Wang, L.; Yin, D.; Sun, K.; Chen, Y.; Li, W.; Li, D. Deep learning for change detection in remote sensing: A review. Geo-Spat. Inf. Sci. 2023, 26, 262–288.
6. Saleh, T.; Weng, X.; Holail, S.; Hao, C.; Xia, G.S. DAM-Net: Flood detection from SAR imagery using differential attention metric-based vision transformers. ISPRS J. Photogramm. Remote Sens. 2024, 212, 440–453.
7. Wang, D.; Ma, G.; Zhang, H.; Wang, X.; Zhang, Y. Refined change detection in heterogeneous low-resolution remote sensing images for disaster emergency response. ISPRS J. Photogramm. Remote Sens. 2025, 220, 139–155.
8. Ding, L.; Zhang, J.; Guo, H.; Zhang, K.; Liu, B.; Bruzzone, L. Joint spatio-temporal modeling for semantic change detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5610814.
9. Varghese, A.; Gubbi, J.; Ramaswamy, A.; Balamuralidhar, P. ChangeNet: A deep learning architecture for visual change detection. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops; Springer: Berlin/Heidelberg, Germany, 2018; pp. 129–145.
10. Xu, Z.; Jiang, W.; Geng, J. Dual-branch dynamic modulation network for hyperspectral and LiDAR data classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5514813.
11. Xu, Z.; Jiang, W.; Geng, J. Texture-aware causal feature extraction network for multimodal remote sensing data classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5103512.
12. Wu, H.; Geng, J.; Jiang, W. Multidomain constrained translation network for change detection in heterogeneous remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5616916.
13. Wang, G.; Cheng, G.; Zhou, P.; Han, J. Cross-level attentive feature aggregation for change detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 6051–6062.
14. Zhou, Y.; Wang, F.; Zhao, J.; Yao, R.; Chen, S.; Ma, H. Spatial-temporal based multihead self-attention for remote sensing image change detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6615–6626.
15. Zhang, C.; Yue, P.; Tapete, D.; Jiang, L.; Shangguan, B.; Huang, L.; Liu, G. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS J. Photogramm. Remote Sens. 2020, 166, 183–200.
16. Zhang, C.; Wang, L.; Cheng, S.; Li, Y. SwinSUNet: Pure transformer network for remote sensing image change detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5224713.
17. Li, Q.; Zhong, R.; Du, X.; Du, Y. TransUNetCD: A hybrid transformer network for change detection in optical remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5622519.
18. Bandara, W.G.C.; Patel, V.M. A transformer-based siamese network for change detection. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2022; pp. 207–210.
19. Saha, S.; Bovolo, F.; Bruzzone, L. Building change detection in VHR SAR images via unsupervised deep transcoding. IEEE Trans. Geosci. Remote Sens. 2020, 59, 1917–1929.
20. Hou, X.; Bai, Y.; Xie, Y.; Zhang, Y.; Fu, L.; Li, Y.; Shang, C.; Shen, Q. Self-supervised multimodal change detection based on difference contrast learning for remote sensing imagery. Pattern Recognit. 2025, 159, 111148.
21. Wang, Q.; Jing, W.; Chi, K.; Yuan, Y. Cross-difference semantic consistency network for semantic change detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4406312.
22. Wang, L.; Zhang, M.; Gao, X.; Shi, W. Advances and challenges in deep learning-based change detection for remote sensing images: A review through various learning paradigms. Remote Sens. 2024, 16, 804.
23. Xie, W.; Shao, W.; Li, D.; Li, Y.; Fang, L. MIFNet: Multi-scale interaction fusion network for remote sensing image change detection. IEEE Trans. Circuits Syst. Video Technol. 2024, 35, 2725–2739.
24. Cui, B.; Peng, Y.; Zhang, Y.; Yin, H.; Fang, H.; Guo, S.; Du, P. Enhanced edge information and prototype constrained clustering for SAR change detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5206116.
25. Long, J.; Li, M.; Wang, X.; Stein, A. Semantic change detection using a hierarchical semantic graph interaction network from high-resolution remote sensing images. ISPRS J. Photogramm. Remote Sens. 2024, 211, 318–335.
26. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sens. 2020, 12, 1662.
27. Shi, Q.; Liu, M.; Li, S.; Liu, X.; Wang, F.; Zhang, L. A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5604816.
28. Li, Z.; Tang, C.; Liu, X.; Zhang, W.; Dou, J.; Wang, L.; Zomaya, A.Y. Lightweight remote sensing change detection with progressive feature aggregation and supervised attention. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5602812.
29. Peng, D.; Bruzzone, L.; Zhang, Y.; Guan, H.; He, P. SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery. Int. J. Appl. Earth Obs. Geoinf. 2021, 103, 102465.
30. Ding, L.; Guo, H.; Liu, S.; Mou, L.; Zhang, J.; Bruzzone, L. Bi-temporal semantic reasoning for the semantic change detection in HR remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5620014.
31. Chang, H.; Wang, P.; Diao, W.; Xu, G.; Sun, X. A triple-branch hybrid attention network with bitemporal feature joint refinement for remote-sensing image semantic change detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5613816.
32. Zheng, Z.; Zhong, Y.; Tian, S.; Ma, A.; Zhang, L. ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection. ISPRS J. Photogramm. Remote Sens. 2022, 183, 228–239.
33. Zhao, M.; Zhao, Z.; Gong, S.; Liu, Y.; Yang, J.; Xiong, X.; Li, S. Spatially and semantically enhanced Siamese network for semantic change detection in high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2563–2573.
34. Xu, Z.; Wu, H.; Jiang, W.; Geng, J. Graph Aggregation Prototype Learning for Semantic Change Detection in Remote Sensing. arXiv 2025, arXiv:2507.10938.
35. Li, Z.; Wang, X.; Fang, S.; Zhao, J.; Yang, S.; Li, W. A decoder-focused multitask network for semantic change detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5609115.
36. Yuan, P.; Zhao, Q.; Zhao, X.; Wang, X.; Long, X.; Zheng, Y. A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images. Int. J. Digit. Earth 2022, 15, 1506–1525.
37. Ou, X.; Liu, L.; Tan, S.; Zhang, G.; Li, W.; Tu, B. A hyperspectral image change detection framework with self-supervised contrastive learning pretrained model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7724–7739.
38. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2021; pp. 8748–8763.
39. Xu, C. LDGNet: A Lightweight Difference Guiding Network for Remote Sensing Change Detection. arXiv 2025, arXiv:2504.05062.
40. Wang, X.; Dong, S.; Zheng, X.; Lu, R.; Jia, J. Explicit High-Level Semantic Network for Domain Generalization in Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5538314.
41. Liu, F.; Chen, D.; Guan, Z.; Zhou, X.; Zhu, J.; Ye, Q.; Fu, L.; Zhou, J. RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. arXiv 2024, arXiv:2306.11029.
42. Li, X.; Wen, C.; Hu, Y.; Zhou, N. RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103497.
43. Jiang, W.; Sun, Y.; Lei, L.; Kuang, G.; Ji, K. AdaptVFMs-RSCD: Advancing Remote Sensing Change Detection from binary to semantic with SAM and CLIP. ISPRS J. Photogramm. Remote Sens. 2025, 230, 304–317.
44. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643.
45. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 261–272.
46. Yang, K.; Xia, G.S.; Liu, Z.; Du, B.; Yang, W.; Pelillo, M.; Zhang, L. Asymmetric siamese networks for semantic change detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5609818.
47. Zhang, Y.; Zhang, M.; Li, W.; Wang, S.; Tao, R. Language-aware domain generalization network for cross-scene hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5501312.
48. Dong, L.; Geng, J.; Jiang, W. Spectral–spatial enhancement and causal constraint for hyperspectral image cross-scene classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5507013.
49. Zhao, H.; Zhang, J.; Lin, L.; Wang, J.; Gao, S.; Zhang, Z. Locally linear unbiased randomization network for cross-scene hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5526512.
50. Zhang, Y.; Li, W.; Sun, W.; Tao, R.; Du, Q. Single-source domain expansion network for cross-scene hyperspectral image classification. IEEE Trans. Image Process. 2023, 32, 1498–1512.





| Category | SSESN [33] | SCDNet [29] | Bi-SRNet [30] | SCanNet [8] | GAPL-SCD [34] | LGDENet |
|---|---|---|---|---|---|---|
| OA (%) | 93.45 | 93.78 | 93.62 | 93.93 | 94.02 | 94.14 |
| mIoU (%) | 73.56 | 74.45 | 74.83 | 75.67 | 76.02 | 76.46 |
| SeK (%) | 82.17 | 82.88 | 83.20 | 83.88 | 84.23 | 84.71 |
| Fscd (%) | 85.44 | 86.12 | 86.60 | 87.10 | 87.49 | 87.90 |
| Non-change | 93.13 | 92.22 | 92.10 | 93.25 | 93.56 | 93.68 |
| Low vegetation | 89.23 | 91.11 | 90.67 | 91.67 | 92.06 | 92.31 |
| N.v.g. surface | 92.52 | 92.25 | 92.35 | 93.02 | 93.22 | 93.44 |
| Tree | 85.67 | 86.22 | 86.45 | 86.90 | 87.15 | 87.24 |
| Water | 91.23 | 91.73 | 91.58 | 92.32 | 92.78 | 93.20 |
| Building | 74.56 | 73.45 | 74.33 | 75.10 | 75.63 | 75.82 |
| Playground | 83.33 | 82.89 | 83.22 | 84.05 | 84.47 | 84.70 |

| Category | SSESN [33] | SCDNet [29] | Bi-SRNet [30] | SCanNet [8] | GAPL-SCD [34] | LGDENet |
|---|---|---|---|---|---|---|
| OA (%) | 89.15 | 91.44 | 93.80 | 95.04 | 95.30 | 95.46 |
| mIoU (%) | 74.17 | 77.95 | 82.94 | 86.37 | 87.02 | 87.48 |
| SeK (%) | 24.28 | 32.46 | 44.27 | 52.63 | 53.88 | 54.23 |
| Fscd (%) | 68.27 | 74.82 | 82.01 | 85.62 | 85.99 | 88.71 |
| Non-change | 95.10 | 95.61 | 96.75 | 97.84 | 97.63 | 97.78 |
| Farmland | 63.07 | 71.24 | 79.26 | 81.91 | 83.65 | 84.02 |
| Desert | 66.74 | 74.76 | 81.31 | 83.91 | 85.75 | 86.16 |
| Building | 40.18 | 59.45 | 77.22 | 77.62 | 81.21 | 81.34 |
| Water | 85.49 | 84.96 | 88.23 | 89.13 | 89.99 | 90.54 |

| Exp. | Hybrid Encoder | DEM | Language-Guided CL | OA (%) | mIoU (%) | Fscd (%) |
|---|---|---|---|---|---|---|
| 1 | × (ResNet) | × (Simple Subtraction) | × | 92.50 | 73.15 | 84.00 |
| 2 | ✓ | × (Simple Subtraction) | × | 93.10 | 74.82 | 85.20 |
| 3 | ✓ | ✓ | × | 93.80 | 75.90 | 86.80 |
| 4 | ✓ | ✓ | ✓ (LGDENet) | 94.14 | 76.46 | 87.90 |
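
The contribution of each component in the ablation table can be read off as the difference between consecutive rows. The sketch below recomputes those increments from the tabulated values; the row labels and the function name `incremental_gains` are illustrative, and the third metric is simply labeled "third" because its values match the Fscd reported for LGDENet on SECOND.

```python
# (label, OA %, mIoU %, third metric %) copied from the ablation table rows 1-4.
ABLATION = [
    ("baseline (ResNet, simple subtraction)", 92.50, 73.15, 84.00),
    ("+ hybrid encoder", 93.10, 74.82, 85.20),
    ("+ DEM", 93.80, 75.90, 86.80),
    ("+ language-guided CL", 94.14, 76.46, 87.90),
]


def incremental_gains(rows):
    """Return (label, (dOA, dmIoU, dThird)) for each row relative to the previous one."""
    gains = []
    for prev, cur in zip(rows, rows[1:]):
        deltas = tuple(round(c - p, 2) for p, c in zip(prev[1:], cur[1:]))
        gains.append((cur[0], deltas))
    return gains


if __name__ == "__main__":
    for label, (d_oa, d_miou, d_third) in incremental_gains(ABLATION):
        print(f"{label}: OA +{d_oa:.2f}, mIoU +{d_miou:.2f}, third +{d_third:.2f}")
```

Every step improves all three metrics. The hybrid encoder gives the largest mIoU increase and the DEM the largest increase in the third metric, while language guidance adds a smaller but consistent gain on top of both.
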

| Text Strategy | OA (%) | mIoU (%) | Fscd (%) |
|---|---|---|---|
| No Text (Visual Only) | 93.80 | 75.90 | 86.80 |
| CLIP Text Embedding (Ours) | 94.14 | 76.46 | 87.90 |

| Method | Backbone | Params (M) | FLOPs (G) | Fscd (%) |
|---|---|---|---|---|
| SCDNet | ResNet-50 | 39.62 | 116.98 | 86.12 |
| Bi-SRNet | ResNet-18 | 22.24 | 189.91 | 86.60 |
| SCanNet | ResNet-18 | 27.90 | - | 87.10 |
| AdaptVFMs-RSCD | SAM-B + CLIP | >150.00 | >500.00 | - |
| LGDENet (Ours) | Swin-T + CNN | 33.45 | 65.30 | 87.90 |
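
The efficiency comparison in the table above can be expressed as relative FLOPs savings. The sketch below uses only the rows that report exact values for both parameters and FLOPs; the dictionary layout and the function name `flops_reduction` are illustrative choices.

```python
# Params (M) and FLOPs (G) copied from the efficiency table; rows with
# missing or lower-bound entries are left out.
MODELS = {
    "SCDNet": {"params_m": 39.62, "flops_g": 116.98},
    "Bi-SRNet": {"params_m": 22.24, "flops_g": 189.91},
    "LGDENet": {"params_m": 33.45, "flops_g": 65.30},
}


def flops_reduction(reference: str, candidate: str = "LGDENet") -> float:
    """Percentage of FLOPs saved by `candidate` relative to `reference`."""
    ref = MODELS[reference]["flops_g"]
    cand = MODELS[candidate]["flops_g"]
    return round(100.0 * (ref - cand) / ref, 1)


if __name__ == "__main__":
    for name in ("SCDNet", "Bi-SRNet"):
        print(f"LGDENet uses {flops_reduction(name)}% fewer FLOPs than {name}")
```

Note that LGDENet has more parameters than Bi-SRNet (33.45 M versus 22.24 M), so the efficiency advantage in the table lies in computation rather than in model size.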
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hu, Y.; Ren, L.; Jiang, H.; Guo, K.; Liu, T.; Gao, J.; Sun, Y.; Yin, B. Language-Guided Contrastive Learning and Difference Enhancement for Semantic Change Detection in Remote Sensing Images. Remote Sens. 2026, 18, 964. https://doi.org/10.3390/rs18060964

