Glacier Extraction from Cloudy Satellite Images Using a Multi-Task Generative Adversarial Network Leveraging Transformer-Based Backbones
Highlights
- The proposed SCGEM accurately delineates glacier boundaries under clouds, achieving the highest IoU (0.7700) among the compared models.
- Both the generative adversarial mechanism and multi-task architecture notably improved the glacier boundary delineation accuracy under cloud cover.
- The Topo., SAR, and Tempo. features all contribute to glacier extraction in cloudy areas, with the Tempo. features contributing the most.
- The proposed architecture serves both to clean the data and to enhance the extraction of glacier texture features.
Abstract
1. Introduction
2. Study Area and Material
2.1. Study Area
2.2. Remote Sensing Data
2.3. Training Samples
2.4. Data Preprocessing
3. Methodology
3.1. Model Structure
3.1.1. Encoder
- Multi-Head Self-Attention
- Convolutional Block Attention Module
- Non-Local block
- Convolutional Position Encoding block
3.1.2. Discriminator (Dis.)
3.1.3. Decoder
- Reconstruction (Recon.) Decoder
- Segmentation (Seg.) Decoder
3.1.4. Loss Function
- Discriminator Loss
- Generator Loss
3.2. Implementation Details and Evaluation
4. Results
4.1. Feature Contribution
4.2. Model Efficiency
4.3. Ablation Study
4.4. Comparison Analysis
5. Discussion
5.1. Properties of the Cloud-Insensitive Features
5.2. Contribution of the Model Architecture
5.3. Comparison with Current Glacier Inventory
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| SCGEM | Sub-cloudy glacier extraction model |
| Topo. | Topographic |
| SAR | Synthetic aperture radar |
| Tempo. | Temporal |
| QTP | Qinghai–Tibet Plateau |
| GAMDAM | Glacier Area Mapping for Discharge from the Asian Mountains |
| CGI | Chinese Glacier Inventory |
| RGI | Randolph Glacier Inventory |
| CNNs | Convolutional neural networks |
| ViT | Vision transformers |
| GAN | Generative adversarial network |
| Spec. | Spectral |
| CS | Cloudiness score |
| NDSI | Normalized difference snow index |
| DEM | Digital elevation model |
| MHSA | Multi-head self-attention |
| CBAM | Convolutional block attention module |
| NL | Non-local |
| CPE | Convolutional position encoding |
| Dis. | Discriminator |
| Recon. | Reconstruction |
| Seg. | Segmentation |
| WGAN-GP | Wasserstein GAN with Gradient Penalty |
| OA | Overall accuracy |
| IoU | Intersection over Union |








| Dataset | Feature | Temporal Coverage | Channels | Resolution | Source |
|---|---|---|---|---|---|
| Spec. | B2, B3, B4, B5, B8, B11, NDSI | 2020.06.01–2020.09.30 | 7 | 10 m/20 m | Sentinel-2 |
| CS | Cloudiness score | Same as Spec. | 1 | 10 m | Sentinel-2 |
| SAR | VV, VH | 2020.06.01–2020.09.30 | 2 | 10 m | Sentinel-1 |
| Topo. | Elevation, slope, aspect | 2020s | 3 | 30 m | SRTM |
| Tempo. | Phase, amplitude, baseline (spectral) | 2019.01.01–2021.12.31 | 21 | 10 m | Sentinel-2 |
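The Tempo. row lists phase, amplitude, and baseline, which are the parameters of a harmonic (Fourier) model fitted to a time series. The following is a minimal sketch of a first-order harmonic fit for one pixel and one band, assuming an annual period and NaN for cloud-masked observations. The function name `harmonic_features` and the period value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def harmonic_features(t_days, values, period=365.25):
    """Least-squares fit of v(t) = baseline + a*cos(w*t) + b*sin(w*t).

    Returns (phase, amplitude, baseline) for a single pixel and band.
    The annual period is an assumption made for this sketch.
    """
    t = np.asarray(t_days, dtype=float)
    v = np.asarray(values, dtype=float)
    ok = np.isfinite(v)                      # ignore cloud-masked (NaN) samples
    w = 2.0 * np.pi / period
    # Design matrix: constant term plus one cosine/sine pair.
    A = np.column_stack([np.ones(ok.sum()), np.cos(w * t[ok]), np.sin(w * t[ok])])
    (baseline, a, b), *_ = np.linalg.lstsq(A, v[ok], rcond=None)
    amplitude = np.hypot(a, b)
    phase = np.arctan2(b, a)                 # v = baseline + amplitude*cos(w*t - phase)
    return phase, amplitude, baseline
```

Applying such a fit to each of the seven Spec. features would yield 7 × 3 = 21 values per pixel, which matches the channel count in the table, although the table itself does not state how the 21 channels are composed.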

| Experiment Area | Latitude (°N) | Longitude (°E) | Area (km²) | Training Patches | Validation Patches |
|---|---|---|---|---|---|
| Region 1 | 29.884 | 90.014 | 262.14 | 671 | 168 |
| Region 2 | 29.308 | 96.627 | 2202.01 | 4510 | 1128 |
| Region 3 | 29.617 | 95.110 | 943.72 | 1874 | 468 |
| Region 4 | 29.854 | 94.992 | 786.43 | 1600 | 400 |
| Region 5 | 29.861 | 84.576 | 655.36 | 1344 | 336 |
| Total | / | / | 4849.66 | 10,000 | 2500 |

| Combination | Feature | IoU | F1 | OA | Recall |
|---|---|---|---|---|---|
| 1 | Topo.; SAR; Tempo. | 0.7700 | 0.8700 | 0.9716 | 0.9394 |
| 2 | SAR; Tempo. | 0.7523 | 0.8586 | 0.9694 | 0.9193 |
| 3 | Topo.; Tempo. | 0.7225 | 0.8387 | 0.9641 | 0.9208 |
| 4 | Topo.; SAR | 0.3832 | 0.5536 | 0.8817 | 0.7182 |
| 5 | Tempo. | 0.6976 | 0.8216 | 0.9592 | 0.9248 |
| 6 | SAR | 0.2912 | 0.4494 | 0.8948 | 0.4385 |
| 7 | Topo. | 0.2877 | 0.4461 | 0.8825 | 0.4771 |
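The four scores reported in the table (IoU, F1, OA, Recall) can all be derived from the confusion counts of a binary glacier mask. The sketch below assumes pixel-wise evaluation with glacier as the positive class; the function names are hypothetical and the evaluation protocol actually used for the table is not described in this extract.

```python
import numpy as np

def binary_metrics(pred, truth):
    """IoU, F1, OA and Recall for binary masks (glacier = True).

    Assumes at least one glacier pixel in pred or truth; otherwise
    the ratios below would divide by zero.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # glacier predicted, glacier in reference
    fp = np.sum(pred & ~truth)     # glacier predicted, background in reference
    fn = np.sum(~pred & truth)     # glacier missed
    tn = np.sum(~pred & ~truth)    # background correctly rejected
    return {
        "IoU": tp / (tp + fp + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "OA": (tp + tn) / pred.size,
        "Recall": tp / (tp + fn),
    }

def f1_from_iou(iou):
    """For a single confusion matrix, F1 = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)
```

For a single confusion matrix the identity in `f1_from_iou` is exact; combination 1 (IoU 0.7700, F1 0.8700) agrees with it to about three decimals. Scores averaged over patches need not satisfy it exactly.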

| Noise Level | IoU | F1 | OA | Recall |
|---|---|---|---|---|
| Non-cloudy | 0.7828 | 0.8782 | 0.9739 | 0.9301 |
| 20% noise | 0.7820 | 0.8777 | 0.9738 | 0.9306 |
| 40% noise | 0.7806 | 0.8768 | 0.9735 | 0.9315 |
| 60% noise | 0.7789 | 0.8757 | 0.9732 | 0.9328 |
| 80% noise | 0.7782 | 0.8753 | 0.9731 | 0.9349 |
| 100% noise | 0.7772 | 0.8746 | 0.9728 | 0.9363 |

| Model | Method | IoU | F1 | OA | Recall |
|---|---|---|---|---|---|
| Encoder | CNN, CBAM, NL | 0.7128 | 0.8321 | 0.9632 | 0.8995 |
| Encoder | CNN, NL, CPE | 0.7367 | 0.8484 | 0.9679 | 0.8880 |
| Encoder | CNN, CBAM, CPE | 0.7303 | 0.8438 | 0.9686 | 0.8431 |
| Encoder | NLB, CPE | 0.7189 | 0.8364 | 0.9674 | 0.8242 |
| Encoder | CNN, CBAM, NL, CPE | 0.7537 | 0.8596 | 0.9715 | 0.8634 |
| Recon. loss | MSE | 0.7243 | 0.8401 | 0.9658 | 0.8862 |
| Recon. loss | Perc. | 0.7245 | 0.8402 | 0.9664 | 0.8741 |
| Recon. loss | SSIM, MSE | 0.7383 | 0.8494 | 0.9675 | 0.9060 |
| Recon. loss | SSIM, Perc. | 0.7700 | 0.8700 | 0.9716 | 0.9394 |
| Recon. loss | MSE, Perc. | 0.7537 | 0.8596 | 0.9715 | 0.8634 |
| Architecture | Seg., Dis. | 0.6715 | 0.8031 | 0.9604 | 0.8015 |
| Architecture | Recon., Seg. | 0.7102 | 0.8305 | 0.9634 | 0.8867 |
| Architecture | Recon., Seg., Dis. | 0.7700 | 0.8700 | 0.9716 | 0.9394 |
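The Architecture rows compare variants with and without the discriminator (Dis.), and the abbreviation list names WGAN-GP as the adversarial scheme. For orientation, the standard WGAN-GP objectives are given below. This is the textbook formulation, not necessarily the exact loss used in the paper; the symbols $P_r$ (real samples), $P_g$ (generator outputs), and the penalty weight $\lambda$ are notation assumed here.

```latex
% Discriminator (critic) loss with gradient penalty
L_{D} = \mathbb{E}_{\tilde{x} \sim P_g}\!\left[ D(\tilde{x}) \right]
      - \mathbb{E}_{x \sim P_r}\!\left[ D(x) \right]
      + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\left[
          \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2
        \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x}, \quad \epsilon \sim U[0, 1]

% Adversarial term of the generator loss
L_{G}^{\mathrm{adv}} = - \mathbb{E}_{\tilde{x} \sim P_g}\!\left[ D(\tilde{x}) \right]
```

In a multi-task setup such as the one ablated above, the adversarial term would typically be combined with the reconstruction and segmentation losses through weighting factors, which this extract does not specify.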

| Model | IoU | F1 | OA | Recall | GFLOPs | Parameters (M) |
|---|---|---|---|---|---|---|
| NDSI threshold | 0.6725 | 0.8042 | 0.9533 | 0.9471 | / | / |
| RepVGG-B3 | 0.6247 | 0.7687 | 0.9415 | 0.9577 | 1.18 | 23.39 |
| YOLOv8-M | 0.7034 | 0.8258 | 0.9590 | 0.9465 | 1.24 | 12.56 |
| Mask2Former | 0.6353 | 0.7769 | 0.9483 | 0.8881 | 2.04 | 37.64 |
| ViT-B | 0.7098 | 0.8303 | 0.9632 | 0.8898 | 1.55 | 91.74 |
| Swin-S | 0.7447 | 0.8536 | 0.9670 | 0.9481 | 0.80 | 49.61 |
| This paper | 0.7700 | 0.8700 | 0.9716 | 0.9394 | 1.35 | 65.64 |
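The first row of the comparison is an NDSI threshold baseline. A minimal sketch of such a baseline is shown below, using the usual definition NDSI = (Green − SWIR) / (Green + SWIR), which for Sentinel-2 corresponds to bands B3 and B11. The threshold of 0.4 is a commonly used default and an assumption here; the value used for the table is not given in this extract, and the function names are hypothetical.

```python
import numpy as np

def ndsi(green, swir):
    """Normalized difference snow index; NaN where green + swir == 0."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = green + swir
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(green - swir, denom, out=out, where=denom != 0)
    return out

def ndsi_mask(green, swir, threshold=0.4):
    """Binary snow/ice mask; the 0.4 threshold is an assumed default."""
    # NaN compares as False, so undefined pixels fall into the background.
    return ndsi(green, swir) > threshold
```

A pure index threshold cannot see through clouds, which is consistent with its lower IoU relative to the learned models in the table.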
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cui, Y.; Jia, K.; Wei, H.; Tao, G.; Ji, F.; Li, J.; Qiao, S.; Zhao, L.; Jiang, Z.; Gao, X.; Gan, L.; Wang, Q. Glacier Extraction from Cloudy Satellite Images Using a Multi-Task Generative Adversarial Network Leveraging Transformer-Based Backbones. Remote Sens. 2025, 17, 3570. https://doi.org/10.3390/rs17213570

