Label-Efficient Fine-Tuning for Remote Sensing Imagery Segmentation with Diffusion Models
Abstract
1. Introduction
- Label-efficient fine-tuning: We propose a pre-training strategy on homogeneous unlabeled datasets to enhance the DDPM encoder and enable label-efficient fine-tuning. To support this strategy, we built two self-supervised learning datasets from homogeneous images in MiniFrance and the Gaofen Image Dataset (GID).
- Multi-scale feature analysis from the unsupervised decoder: We activate and visualize intermediate-layer features, comparing those extracted by the diffusion model’s encoder and decoder. The curated features from the DDPM decoder are then used for remote sensing imagery segmentation.
- Scheduled noisy imagery input for dense prediction: We implement a noise-based augmentation strategy to enhance the generalization of diffusion models. Our experiments show that injecting a controlled proportion of Gaussian noise can effectively improve segmentation accuracy.
- Multi-layer perceptron (MLP) segmentor for frozen backbones: We design a simple MLP segmentor that effectively fuses and optimizes multi-scale features obtained from the frozen diffusion decoder, facilitating an easy and effective transfer to downstream segmentation tasks.
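The scheduled noisy-input idea above follows the standard DDPM forward process q(x_t | x_0): at a chosen timestep t, the image is blended with Gaussian noise according to the cumulative schedule ᾱ_t. The snippet below is a minimal NumPy sketch of that noising step; the linear beta schedule constants and image shapes are illustrative defaults, not the paper's exact configuration.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product alpha_bar_t for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def noisy_input(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((3, 64, 64))          # one normalized image, CHW
x_small = noisy_input(x0, 50, alpha_bar, rng)  # mild corruption (abar ~ 0.97)
x_large = noisy_input(x0, 500, alpha_bar, rng) # heavy corruption (abar ~ 0.08)
```

Small t values (e.g., t = 50 or 100, as in the ablation below) keep most of the image signal while still acting as a noise augmentation; larger t values destroy semantic content, which is consistent with the reported drop in accuracy at longer schedules.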
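The MLP segmentor in the last bullet can likewise be sketched: multi-scale feature maps from the frozen decoder are upsampled to a common resolution, concatenated channel-wise, and classified per pixel by a small MLP. Everything below (channel counts, hidden width, nearest-neighbor upsampling, the 2-layer shape) is a hypothetical minimal NumPy sketch, not the paper's exact implementation.

```python
import numpy as np

def upsample(feat, size):
    """Nearest-neighbor upsampling of a (C, h, w) feature map to (C, size, size)."""
    c, h, w = feat.shape
    ri = np.arange(size) * h // size
    ci = np.arange(size) * w // size
    return feat[:, ri[:, None], ci[None, :]]

def mlp_segment(feats, W1, b1, W2, b2, size=64):
    """Fuse multi-scale features, then classify each pixel with a 2-layer MLP."""
    fused = np.concatenate([upsample(f, size) for f in feats], axis=0)  # (C_total, H, W)
    x = fused.reshape(fused.shape[0], -1).T        # (H*W, C_total): one row per pixel
    h = np.maximum(x @ W1 + b1, 0.0)               # ReLU hidden layer
    logits = h @ W2 + b2                           # (H*W, num_classes)
    return logits.argmax(axis=1).reshape(size, size)

rng = np.random.default_rng(0)
# Hypothetical frozen-decoder feature maps at 1/16, 1/8, and 1/4 resolution.
feats = [rng.standard_normal((c, s, s)) for c, s in [(8, 4), (8, 8), (4, 16)]]
C, hidden, classes = 20, 32, 12                    # C = 8 + 8 + 4 fused channels
W1, b1 = rng.standard_normal((C, hidden)), np.zeros(hidden)
W2, b2 = rng.standard_normal((hidden, classes)), np.zeros(classes)
mask = mlp_segment(feats, W1, b1, W2, b2)          # (64, 64) class-index map
```

Because only the MLP weights are trained while the diffusion backbone stays frozen, the trainable parameter count stays small, which is the point of the segmentor design.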
2. Related Works
2.1. Pre-Training with Diffusion Models
2.1.1. Generative Pre-Training
2.1.2. Diffusion Models
2.2. Intermediate Features for Semantic Segmentation
2.3. Diffusion Models in Remote Sensing
3. Methodology
3.1. A Generative Pre-Training Paradigm for Semantic Segmentation
3.2. Fine-Tuning Strategy
3.3. Feature Selection
3.4. MLP-Based Segmentor Architecture
4. Experimental Results
4.1. Datasets
4.2. Experimental Implementation
4.2.1. Pre-Training Setup
4.2.2. Hyperparameters
4.2.3. Evaluation
4.3. Ablation Study
4.3.1. Two-Step Pre-Training
4.3.2. Intermediate Features in Segmentor
4.3.3. Effect of Noise Scales
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhang, X.; Zhou, Y.; Luo, J. Deep learning for processing and analysis of remote sensing big data: A technical review. Big Earth Data 2022, 6, 527–560.
- Sun, X.; Tian, Y.; Lu, W.; Wang, P.; Niu, R.; Yu, H.; Fu, K. From single- to multi-modal remote sensing imagery interpretation: A survey and taxonomy. Sci. China Inf. Sci. 2023, 66, 140301.
- Sobrino, J.A.; Raissouni, N. Toward remote sensing methods for land cover dynamic monitoring: Application to Morocco. Int. J. Remote Sens. 2000, 21, 353–366.
- Mo, Y.; Wu, Y.; Yang, X.; Liu, F.; Liao, Y. Review the state-of-the-art technologies of semantic segmentation based on deep learning. Neurocomputing 2022, 493, 626–646.
- Alshari, E.A.; Gawali, B.W. Development of classification system for LULC using remote sensing and GIS. Glob. Transit. Proc. 2021, 2, 8–17.
- Liu, P. A survey of remote-sensing big data. Front. Environ. Sci. 2015, 3, 45.
- Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
- Amit, T.; Shaharbany, T.; Nachmani, E.; Wolf, L. SegDiff: Image Segmentation with Diffusion Probabilistic Models. arXiv 2022, arXiv:2112.00390.
- Talukdar, S.; Singha, P.; Mahato, S.; Shahfahad; Pal, S.; Liou, Y.-A.; Rahman, A. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sens. 2020, 12, 1135.
- Luo, Y.; Wang, J.; Yang, X.; Yu, Z.; Tan, Z. Pixel Representation Augmented through Cross-Attention for High-Resolution Remote Sensing Imagery Segmentation. Remote Sens. 2022, 14, 5415.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Sumbul, G.; de Wall, A.; Kreuziger, T.; Marcelino, F.; Costa, H.; Benevides, P.; Caetano, M.; Demir, B.; Markl, V. BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval. IEEE Geosci. Remote Sens. Mag. 2021, 9, 174–180.
- Long, Y.; Xia, G.-S.; Li, S.; Yang, W.; Yang, M.Y.; Zhu, X.X.; Zhang, L.; Li, D. On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4205–4230.
- Manas, O.; Lacoste, A.; Giro-i-Nieto, X.; Vazquez, D.; Rodriguez, P. Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 9394–9403.
- Cha, K.; Seo, J.; Lee, T. A Billion-scale Foundation Model for Remote Sensing Images. arXiv 2023, arXiv:2304.05215.
- Ayush, K.; Uzkent, B.; Meng, C.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Geography-Aware Self-Supervised Learning. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 10161–10170.
- Fuller, A.; Millard, K.; Green, J.R. SatViT: Pretraining Transformers for Earth Observation. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 16000–16009.
- Reed, C.J.; Gupta, R.; Li, S.; Brockman, S.; Funk, C.; Clipp, B.; Keutzer, K.; Candido, S.; Uyttendaele, M.; Darrell, T. Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 4088–4099.
- Li, Z.; Hou, B.; Ma, S.; Wu, Z.; Guo, X.; Ren, B.; Jiao, L. Masked Angle-Aware Autoencoder for Remote Sensing Images. arXiv 2024, arXiv:2408.01946.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems, Proceedings of the Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 6840–6851.
- Doersch, C. Tutorial on Variational Autoencoders. arXiv 2021, arXiv:1606.05908.
- An, T.; Xue, B.; Huo, C.; Xiang, S.; Pan, C. Efficient Remote Sensing Image Super-Resolution via Lightweight Diffusion Models. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
- Ayala, C.; Sesma, R.; Aranda, C.; Galar, M. Diffusion Models for Remote Sensing Imagery Semantic Segmentation. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Pasadena, CA, USA, 16–21 July 2023; pp. 5654–5657.
- Bandara, W.G.C.; Nair, N.G.; Patel, V.M. DDPM-CD: Remote Sensing Change Detection using Denoising Diffusion Probabilistic Models. arXiv 2022, arXiv:2206.11892.
- Thrun, S.; Pratt, L. Learning to Learn: Introduction and Overview. In Learning to Learn; Springer: Boston, MA, USA, 1998; pp. 3–17.
- Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised Visual Representation Learning by Context Prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1422–1430.
- Han, X.; Zhang, Z.; Ding, N.; Gu, Y.; Liu, X.; Huo, Y.; Qiu, J.; Yao, Y.; Zhang, A.; Zhang, L.; et al. Pre-trained models: Past, present and future. AI Open 2021, 2, 225–250.
- Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.-H. Diffusion Models: A Comprehensive Survey of Methods and Applications. ACM Comput. Surv. 2023, 56, 1–39.
- Reed, C.J.; Gupta, R.; Li, S.; Brockman, S.; Funk, C.; Clipp, B.; Keutzer, K.; Candido, S.; Uyttendaele, M.; Darrell, T. Self-Supervised Pretraining Improves Self-Supervised Pretraining. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2022; pp. 2584–2594.
- Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative Adversarial Networks: An Overview. IEEE Signal Process. Mag. 2018, 35, 53–65.
- Alqahtani, H.; Kavakli-Thorne, M.; Kumar, G. Applications of Generative Adversarial Networks (GANs): An Updated Review. Arch. Comput. Methods Eng. 2021, 28, 525–552.
- Sainath, T.N.; Kingsbury, B.; Ramabhadran, B. Auto-encoder bottleneck features using deep belief networks. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 4153–4156.
- Jiang, X.; Zhang, Y.; Zhang, W.; Xiao, X. A novel sparse auto-encoder for deep unsupervised learning. In Proceedings of the 2013 Sixth International Conference on Advanced Computational Intelligence (ICACI), Beijing, China, 19–21 October 2013; pp. 256–261.
- Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 5–9 July 2008; pp. 1096–1103.
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.-A.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
- Vahdat, A.; Kautz, J. NVAE: A Deep Hierarchical Variational Autoencoder. In Advances in Neural Information Processing Systems, Proceedings of the Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 19667–19679.
- Gupta, A.; Wu, J.; Deng, J.; Li, F.-F. Siamese Masked Autoencoders. In Advances in Neural Information Processing Systems (NeurIPS), Proceedings of the Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023; pp. 40676–40693.
- Singh, M.; Duval, Q.; Alwala, K.V.; Fan, H.; Aggarwal, V.; Adcock, A.; Joulin, A.; Dollár, P.; Feichtenhofer, C.; Girshick, R.; et al. The effectiveness of MAE pre-pretraining for billion-scale pretraining. arXiv 2023, arXiv:2303.13496.
- Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–20.
- Baranchuk, D.; Voynov, A.; Rubachev, I.; Khrulkov, V.; Babenko, A. Label-Efficient Semantic Segmentation with Diffusion Models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–15.
- Lei, J.; Wang, Q.; Cheng, P.; Ba, Z.; Qin, Z.; Wang, Z.; Liu, Z.; Ren, K. Masked Diffusion Models Are Fast Distribution Learners. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 1 May 2023; pp. 1–23.
- Huang, S.; Lu, Z.; Cheng, R.; He, C. FaPN: Feature-aligned Pyramid Network for Dense Image Prediction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 844–853.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified Perceptual Parsing for Scene Understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434.
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944.
- Chen, J.; Qin, D.; Hou, D.; Zhang, J.; Deng, M.; Sun, G. Multiscale Object Contrastive Learning–Derived Few-Shot Object Detection in VHR Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Wang, D.; Zhang, J.; Du, B.; Xia, G.-S.; Tao, D. An Empirical Study of Remote Sensing Pretraining. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–20.
- Gidaris, S.; Singh, P.; Komodakis, N. Unsupervised Representation Learning by Predicting Image Rotations. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–16.
- Bao, H.; Dong, L.; Piao, S.; Wei, F. BEiT: BERT Pre-Training of Image Transformers. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–18.
- Wei, C.; Mangalam, K.; Huang, P.-Y.; Li, Y.; Fan, H.; Xu, H.; Wang, H.; Xie, C.; Yuille, A.; Feichtenhofer, C. Diffusion Models as Masked Autoencoders. arXiv 2023, arXiv:2304.03283.
- Pan, Z.; Chen, J.; Shi, Y. Masked Diffusion as Self-supervised Representation Learner. arXiv 2023, arXiv:2308.05695.
- Zhao, Z.; Bai, H.; Zhu, Y.; Zhang, J.; Xu, S.; Zhang, Y.; Zhang, K.; Meng, D.; Timofte, R.; Van Gool, L. DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 8082–8093.
- Xu, Y.; Yu, W.; Ghamisi, P.; Kopp, M.; Hochreiter, S. Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks. IEEE Trans. Image Process. 2023, 32, 5737–5750.
- Czerkawski, M.; Tachtatzis, C. Exploring the Capability of Text-to-Image Diffusion Models with Structural Edge Guidance for Multispectral Satellite Image Inpainting. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
- Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Jin, X.; Zhang, L. EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14.
- Jia, J.; Lee, G.; Wang, Z.; Lyu, Z.; He, Y. Siamese Meets Diffusion Network: SMDNet for Enhanced Change Detection in High-Resolution RS Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 8189–8202.
- Kolbeinsson, B.; Mikolajczyk, K. Multi-Class Segmentation from Aerial Views Using Recursive Noise Diffusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 8439–8449.
- Chen, N.; Yue, J.; Fang, L.; Xia, S. SpectralDiff: A Generative Framework for Hyperspectral Image Classification with Diffusion Models. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16.
- Ma, J.; Xie, W.; Li, Y.; Fang, L. BSDM: Background Suppression Diffusion Model for Hyperspectral Anomaly Detection. arXiv 2023, arXiv:2307.09861.
- Li, T.; Katabi, D.; He, K. Return of Unconditional Generation: A Self-supervised Representation Generation Method. arXiv 2023, arXiv:2312.03701.
- Misra, I.; van der Maaten, L. Self-Supervised Learning of Pretext-Invariant Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 6707–6717.
- Zhang, D.; Li, C.; Li, H.; Huang, W.; Huang, L.; Zhang, J. Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11709–11717.
- Kornblith, S.; Shlens, J.; Le, Q.V. Do Better ImageNet Models Transfer Better? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2661–2671.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Mukhopadhyay, S.; Gwilliam, M.; Agarwal, V.; Padmanabhan, N.; Swaminathan, A.; Hegde, S.; Zhou, T.; Shrivastava, A. Diffusion Models Beat GANs on Image Classification. arXiv 2023, arXiv:2307.08702.
- Melas-Kyriazi, L.; Rupprecht, C.; Laina, I.; Vedaldi, A. Finding an Unsupervised Image Segmenter in each of your Deep Generative Models. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–18.
- Castillo-Navarro, J.; Le Saux, B.; Boulch, A.; Audebert, N.; Lefèvre, S. Semi-supervised semantic segmentation in Earth Observation: The MiniFrance suite, dataset analysis and multi-task network study. Mach. Learn. 2022, 111, 3125–3160.
- Tong, X.-Y.; Xia, G.-S.; Lu, Q.; Shen, H.; Li, S.; You, S.; Zhang, L. Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sens. Environ. 2020, 237, 111322.
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11976–11986.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 19–25 June 2021; pp. 6881–6890.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual Event, 6–14 December 2021; pp. 12077–12090.
- Li, X.; He, H.; Li, X.; Li, D.; Cheng, G.; Shi, J.; Weng, L.; Tong, Y.; Lin, Z. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 19–25 June 2021; pp. 4217–4226.
- Li, Y.; Li, X.; Dai, Y.; Hou, Q.; Liu, L.; Liu, Y.; Cheng, M.-M.; Yang, J. LSKNet: A Foundation Lightweight Backbone for Remote Sensing. Int. J. Comput. Vis. 2025, 133, 1410–1431.
Segmentor | DDPM Encoder 1/16 | DDPM Encoder Multi-Res. | DDPM Decoder 1/16 | DDPM Decoder 1/8 | DDPM Decoder 1/4 | DDPM Decoder 1/2 | DDPM Decoder Multi-Res. | Ft Params (M) |
---|---|---|---|---|---|---|---|---|
MLP segmentor | 35.4 | 40.6 | 38.5 | 39.7 | 41.6 | 36.3 | 42.7 | 16.3 |
UPerNet [46] | – | 42.4 | – | – | – | – | 42.5 | 57.3 |
Noise scale t | UB | Ind. | PU | Mine | AA | AL | PC | Pas. | For. | HV | WL | Water | mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
t = [0] | 59.4 | 39.3 | 21.2 | 13.5 | 17.9 | 34.1 | 43.3 | 46.4 | 40.4 | 25.0 | 56.5 | 57.3 | 37.9 |
t = [50] | 62.7 | 42.1 | 21.3 | 13.9 | 19.2 | 37.1 | 44.4 | 47.7 | 42.2 | 28.0 | 60.2 | 61.5 | 40.0 |
t = [100] | 62.4 | 43.0 | 21.8 | 14.2 | 18.8 | 36.3 | 45.1 | 48.4 | 42.9 | 31.6 | 61.0 | 60.5 | 40.5 |
t = [100, 50] | 63.1 | 43.4 | 24.1 | 15.7 | 19.7 | 36.3 | 44.6 | 48.8 | 42.8 | 29.2 | 60.9 | 64.4 | 41.1 |
t = [200, 100, 50] | 62.6 | 43.0 | 22.1 | 13.5 | 19.1 | 36.0 | 44.4 | 47.4 | 40.9 | 31.5 | 60.8 | 62.5 | 40.3 |
t = [150, 100, 50] | 58.1 | 44.2 | 57.0 | 10.7 | 18.0 | 42.3 | 35.8 | 46.6 | 36.6 | 45.1 | 61.0 | 56.6 | 42.7 |
Method | Setting | MiniFrance-S | MiniFrance-M | MiniFrance-L | Ft Params (M) |
---|---|---|---|---|---|
Random-init (ConvNeXt-B [71]) | Fine-tuning | 12.9 | 26.2 | 30.5 | 122.0 |
IN-init (ConvNeXt-B) | Frozen | 16.6 | 28.3 | 32.6 | 21.6 |
IN-init (ConvNeXt-B) | Fine-tuning | 34.2 | 37.7 | 44.5 | 122.0 |
MAE [19] | Frozen | 34.1 | 38.8 | 42.7 | 23.2 |
MAE [19] | Fine-tuning | 42.0 | 41.4 | 43.5 | 96.4 |
ScaleMAE [20] | Frozen | 32.9 | 40.1 | 43.9 | 23.2 |
ScaleMAE [20] | Fine-tuning | 42.6 | 44.2 | 45.0 | 96.1 |
MA3E [21] | Frozen | 36.3 | 40.5 | 43.3 | 23.2 |
MA3E [21] | Fine-tuning | 42.2 | 43.5 | 46.8 | 96.4 |
Ours | Frozen | 42.7 | 46.3 | 48.1 | 16.3 |
Ours | Fine-tuning | 41.4 | 47.1 | 48.6 | 107.5 |
Method | UB | Ind. | PU | Mine | AA | AL | PC | Pas. | For. | HV | WL | Water | mIoU | Ft Params
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SETR [72] | 58.4 | 36.7 | 11.4 | 11.3 | 16.1 | 34.5 | 45.2 | 45.9 | 38.6 | 18.6 | 51.3 | 43.7 | 34.3 | 73.2 M |
SegFormer-B4 [73] | 64.7 | 45.6 | 21.3 | 28.0 | 13.7 | 42.1 | 55.6 | 52.9 | 46.6 | 0.0 | 67.0 | 69.1 | 42.2 | 64.1 M |
PFNet [74] | 67.9 | 49.8 | 24.5 | 31.7 | 19.1 | 40.5 | 54.8 | 54.6 | 47.1 | 0.0 | 69.4 | 74.7 | 44.5 | 33.0 M |
MCS [59] | 59.9 | 79.7 | 24.9 | 20.1 | 13.6 | 23.3 | 44.2 | 54.2 | 77.9 | 29.2 | 68.0 | 52.0 | 45.6 | 11.8 M |
ConvNeXt-B [71] | 62.9 | 52.2 | 33.2 | 45.2 | 25.0 | 39.1 | 57.1 | 45.7 | 48.8 | 8.2 | 75.1 | 72.3 | 47.1 | 122.0 M |
LSKNet-T-FPN [75] | 69.9 | 53.2 | 28.8 | 39.9 | 19.3 | 43.6 | 58.6 | 52.9 | 53.2 | 5.4 | 74.6 | 75.5 | 47.9 | 15.0 M |
Ours (MiniFrance-S) | 58.1 | 44.2 | 57.0 | 10.7 | 18.0 | 42.3 | 35.8 | 46.6 | 36.6 | 45.1 | 61.0 | 56.6 | 42.7 | 16.3 M |
Ours (MiniFrance-L) | 67.3 | 49.7 | 31.7 | 34.9 | 20.5 | 40.8 | 54.8 | 56.4 | 50.4 | 22.6 | 75.5 | 77.2 | 48.6 | 16.3 M |
Method | IDL | UR | RR | TL | PF | IGL | DC | GP | AW | SL | NG | AG | River | Lake | Pond | mIoU | Ft Params
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SETR-M | 68.7 | 79.3 | 60.7 | 69.6 | 55.7 | 91.5 | 55.1 | 38.8 | 59.5 | 35.3 | 79.5 | 24.5 | 56.6 | 73.6 | 14.1 | 58.0 | 73.2 M |
SegFormer-B4 | 69.2 | 79.7 | 69.2 | 71.4 | 57.8 | 90.5 | 59.2 | 40.2 | 59.0 | 37.2 | 82.0 | 26.4 | 60.6 | 82.5 | 13.4 | 60.2 | 64.1 M |
PFNet | 69.6 | 79.3 | 65.3 | 71.2 | 55.8 | 89.4 | 62.2 | 46.8 | 62.5 | 30.4 | 82.1 | 22.5 | 61.1 | 82.2 | 22.3 | 60.2 | 33.0 M |
MCS | 75.2 | 76.5 | 66.7 | 48.7 | 59.8 | 56.2 | 75.3 | 43.9 | 75.6 | 40.1 | 63.3 | 81.8 | 75.2 | 71.2 | 21.5 | 62.1 | 11.8 M |
MA3E | 59.6 | 66.1 | 53.7 | 54.5 | 70.9 | 80.0 | 66.0 | 34.0 | 76.5 | 21.2 | 59.2 | 44.2 | 90.4 | 82.5 | 73.8 | 62.6 | 96.4 M |
ConvNeXt-B | 70.3 | 91.4 | 67.1 | 57.9 | 70.8 | 64.5 | 76.8 | 77.8 | 58.8 | 81.9 | 81.1 | 28.3 | 63.3 | 74.7 | 40.5 | 66.2 | 122.0 M |
LSKNet-T-FPN | 72.7 | 80.5 | 69.6 | 59.4 | 70.3 | 72.5 | 80.7 | 34.7 | 73.6 | 28.1 | 92.1 | 73.4 | 71.7 | 66.8 | 51.7 | 66.5 | 14.4 M |
Ours | 71.8 | 92.9 | 69.9 | 59.7 | 70.6 | 63.8 | 81.0 | 40.3 | 81.9 | 21.0 | 92.4 | 78.2 | 74.0 | 59.3 | 49.2 | 67.1 | 16.3 M |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Luo, Y.; Wang, J.; Sequeira, J.; Yang, X.; Wang, D.; Liu, J.; Yao, G.; Mavromatis, S. Label-Efficient Fine-Tuning for Remote Sensing Imagery Segmentation with Diffusion Models. Remote Sens. 2025, 17, 2579. https://doi.org/10.3390/rs17152579