Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images
Abstract
1. Introduction
- (1) The Multi-Scale Pixel-Guided CNN Encoder employs a parallel architecture in which deformable convolutions learn offsets and adaptively adjust the convolutional receptive fields. These adjustments improve the extraction of nonlinear and deformed features in the absence of label guidance and yield more accurate multi-scale local features when handling the complex spatial structures in remote sensing images, improving both the flexibility and the accuracy of the model. In addition, the pixel-guided fusion module assesses the confidence levels of the extracted multi-scale local features within the same class and uses them to guide the precise fusion of local pixel features, which strengthens the model’s ability to capture fine-grained features in unsupervised semantic segmentation tasks.
- (2) The Multi-Scale Aggregation Transformer Encoder efficiently aggregates deeper multi-scale global features for improved feature representation. A Parallel Aggregation Pyramid Pooling Module (PAPPM) is integrated after the Feed-Forward Neural Network (FFN) layer of the conventional Transformer encoder, so that the extracted contextual information is extended to multiple scales in parallel and then fused. This design extracts deeper multi-scale features during unsupervised semantic segmentation training while limiting computational overhead, which makes the aggregation of global contextual information both more efficient and more comprehensive and improves the model’s ability to capture global context in complex remote sensing scenes.
- (3) The Parallel Attention Fusion Module combines a channel attention module and a spatial attention module in parallel, so that features are processed for fusion in both the channel and spatial dimensions at each stage. At every stage, the module fuses the multi-scale global and local features with the features fused at the previous stage. This gradually enhances the expressiveness of the fused features throughout the training process and ultimately provides more accurate features for unsupervised clustering.
2. Related Work
3. Materials and Methods
3.1. Multi-Scale Pixel-Guided CNN Encoder
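The introduction attributes this encoder's adaptive receptive field to deformable convolutions that learn sampling offsets. The following is a minimal sketch of that sampling step only, not the authors' implementation: it evaluates a single 3x3 deformable response on a toy feature map using bilinear interpolation. The function names (`bilinear_sample`, `deformable_conv_at`), the averaging kernel, and the hand-set offsets are illustrative assumptions; in a trained network the offsets would be predicted by an additional convolution.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2D grid at a fractional (y, x); out-of-range points read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight falls off linearly with distance to the neighbour.
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * img[yi][xi]
    return value

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at one output location (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th regular sampling point. Here the
    offsets are set by hand (assumption); a network would learn them.
    """
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[ky + 1][kx + 1] * bilinear_sample(
                img, cy + ky + dy, cx + kx + dx)
            k += 1
    return out

img = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 map
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]                    # averaging kernel
zero_offsets = [(0.0, 0.0)] * 9       # behaves like an ordinary convolution
shifted_offsets = [(0.0, 0.5)] * 9    # every sampling point moves 0.5 px right

regular = deformable_conv_at(img, kernel, zero_offsets, 1, 1)
shifted = deformable_conv_at(img, kernel, shifted_offsets, 1, 1)
print(regular, shifted)
```

On this linear toy map, shifting every sampling point half a pixel to the right moves the response from about 5.0 to about 5.5, which illustrates how offsets displace the effective receptive field without changing the kernel weights.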
3.2. Multi-Scale Aggregation Transformer Encoder
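The introduction describes a Parallel Aggregation Pyramid Pooling Module (PAPPM) placed after the FFN layer so that context is extended to several scales in parallel and then fused. The sketch below illustrates only that pool, upsample, and fuse pattern on a single-channel map; it is a simplified stand-in rather than the module used in the paper. The pooling scales, the nearest-neighbour upsampling, the residual addition, and the plain averaging used for fusion are all assumptions (a real module would normally use learned convolutions), and the map height and width are assumed to be divisible by every scale.

```python
def avg_pool(feat, size):
    """Non-overlapping average pooling with a size x size window."""
    h, w = len(feat), len(feat[0])
    return [[sum(feat[y + dy][x + dx]
                 for dy in range(size) for dx in range(size)) / (size * size)
             for x in range(0, w, size)]
            for y in range(0, h, size)]

def upsample_nearest(feat, size):
    """Nearest-neighbour upsampling by an integer factor."""
    return [[feat[y // size][x // size]
             for x in range(len(feat[0]) * size)]
            for y in range(len(feat) * size)]

def parallel_pyramid_pool(feat, scales=(2, 4)):
    """Pool at every scale independently, upsample, and average with the input.

    Assumes the height and width of `feat` are divisible by each scale.
    """
    h, w = len(feat), len(feat[0])
    branches = [feat]  # identity branch keeps the full-resolution detail
    for s in scales:
        context = upsample_nearest(avg_pool(feat, s), s)
        # Each branch adds its pooled context back onto the input.
        branches.append([[feat[y][x] + context[y][x] for x in range(w)]
                         for y in range(h)])
    n = len(branches)
    # Fusion by simple averaging (assumption; stands in for a learned fusion).
    return [[sum(b[y][x] for b in branches) / n for x in range(w)]
            for y in range(h)]

feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 map
fused = parallel_pyramid_pool(feat)
print(fused[0][0], fused[3][3])
```

Because the branches do not depend on each other, they can be computed concurrently, which is the property the parallel design relies on to keep the added cost low.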
3.3. Parallel Attention Fusion Module
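The introduction states that this module runs channel attention and spatial attention in parallel and, at each stage, fuses the local and global features with the output of the previous stage. The sketch below shows one way such a stage could be wired, under strong simplifications: the gates are parameter-free sigmoids of average-pooled statistics, the inputs are combined by element-wise addition, and the two branch outputs are summed. These choices and all function names are assumptions for illustration, not the authors' design.

```python
import math

def sigmoid(v):
    """Logistic function used as the attention gate."""
    return 1.0 / (1.0 + math.exp(-v))

def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average (no learned weights)."""
    out = []
    for channel in feat:
        mean = sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        gate = sigmoid(mean)
        out.append([[gate * v for v in row] for row in channel])
    return out

def spatial_attention(feat):
    """Scale each pixel by the sigmoid of its mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [[sigmoid(sum(feat[k][y][x] for k in range(c)) / c)
             for x in range(w)] for y in range(h)]
    return [[[gate[y][x] * feat[k][y][x] for x in range(w)]
             for y in range(h)] for k in range(c)]

def add_maps(a, b):
    """Element-wise sum of two C x H x W nested lists."""
    return [[[va + vb for va, vb in zip(ra, rb)]
             for ra, rb in zip(ca, cb)] for ca, cb in zip(a, b)]

def parallel_attention_fusion(local_feat, global_feat, previous=None):
    """Fuse local, global and (optionally) previous-stage features."""
    fused = add_maps(local_feat, global_feat)
    if previous is not None:
        fused = add_maps(fused, previous)
    # Both attention branches receive the same input and run side by side.
    return add_maps(channel_attention(fused), spatial_attention(fused))

ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]  # toy 2 x 2 x 2 feature map
stage1 = parallel_attention_fusion(ones, ones)
stage2 = parallel_attention_fusion(ones, ones, previous=stage1)
print(stage1[0][0][0], stage2[0][0][0])
```

Passing `stage1` as `previous` when computing `stage2` mirrors the stage-by-stage fusion described in the introduction, where each stage refines the result of the one before it.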
4. Experimental Results and Analysis
4.1. Datasets
4.2. Parameter Setting and Evaluation Index
4.3. Semantic Segmentation Results and Analysis
4.4. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Model | IoU (Building) | IoU (Low-Veg) | IoU (Surface) | IoU (Tree) | IoU (Car) | mIoU | OA | F1 |
---|---|---|---|---|---|---|---|---|
DeeplabV3+ | 52.79 | 22.57 | 47.91 | 50.70 | 22.64 | 40.86 | 61.74 | 53.9 |
MSANet | 70.34 | 34.78 | 50.07 | 43.61 | 37.84 | 47.65 | 65.92 | 60.1 |
FLDA-NET | 74.56 | 31.31 | 56.28 | 40.49 | 46.57 | 49.81 | 68.43 | 64.7 |
JDAF | 77.19 | 47.39 | 68.76 | 58.38 | 42.76 | 55.52 | 72.38 | 70.4 |
SAM | 75.49 | 48.06 | 64.80 | 51.84 | 46.88 | 49.69 | 75.62 | 72.5 |
OURS | 77.61 | 48.90 | 69.63 | 59.21 | 47.04 | 58.79 | 79.19 | 77.5 |
Model | IoU (Building) | IoU (Low-Veg) | IoU (Surface) | IoU (Tree) | IoU (Car) | mIoU | OA | F1 |
---|---|---|---|---|---|---|---|---|
DeeplabV3+ | 71.61 | 34.49 | 62.51 | 56.26 | 39.20 | 44.68 | 63.08 | 56.7 |
MSANet | 78.13 | 38.89 | 64.17 | 58.37 | 44.86 | 50.75 | 68.85 | 65.1 |
FLDA-NET | 82.07 | 38.46 | 61.06 | 61.91 | 51.18 | 58.94 | 75.13 | 71.6 |
JDAF | 81.15 | 49.84 | 68.10 | 66.57 | 53.23 | 60.07 | 78.83 | 76.9 |
SAM | 74.33 | 47.76 | 64.81 | 62.65 | 50.37 | 58.39 | 79.48 | 77.3 |
OURS | 83.54 | 54.77 | 71.09 | 67.61 | 56.73 | 62.81 | 83.14 | 80.4 |
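The tables above report per-class IoU, mIoU, OA, and F1. As a reference for how such scores are commonly derived, the sketch below computes them from a confusion matrix using the standard definitions. It is not taken from the paper: the function name `segmentation_metrics`, the toy matrix, and the choice to report F1 as the mean of per-class F1 scores are assumptions, and for an unsupervised model the predicted cluster labels are assumed to have already been mapped to ground-truth classes.

```python
def segmentation_metrics(conf):
    """Compute IoU, mIoU, OA and mean F1 from a confusion matrix.

    conf[i][j] = number of pixels whose true class is i and predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, f1s = [], []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                        # class-k pixels that were missed
        fp = sum(conf[i][k] for i in range(n)) - tp   # pixels wrongly labelled k
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if union else 0.0)
    return {
        "IoU": ious,                                  # one value per class
        "mIoU": sum(ious) / n,                        # mean over all classes
        "OA": sum(conf[k][k] for k in range(n)) / total,
        "F1": sum(f1s) / n,                           # mean per-class F1 (assumption)
    }

# Toy two-class example (hypothetical counts, not data from the paper).
toy_conf = [[8, 2],
            [1, 9]]
scores = segmentation_metrics(toy_conf)
print(scores)
```

The function returns fractions in [0, 1], whereas the values in the tables appear to be on a 0 to 100 scale. The mIoU here averages over every class in the matrix, which may include classes that a results table does not list individually.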
Model | FLOPs (G) | Inference Time (ms) |
---|---|---|
DeeplabV3+ | 56.3 | 77 |
MSANet | 44.7 | 82 |
FLDA-NET | 83.4 | 118 |
JDAF | 126.1 | 149 |
SAM | 185.8 | 174 |
OURS | 159.6 | 167 |
| Baseline | Multi-Scale Pixel-Guided CNN Encoder | Multi-Scale Aggregation Transformer Encoder | Parallel Attention Fusion Module | mIoU | OA | F1 |
|---|---|---|---|---|---|---|
| ✓ |  |  |  | 17.64 | 33.94 | 28.7 |
|  | ✓ |  |  | 20.09 | 36.77 | 34.1 |
|  |  | ✓ |  | 34.89 | 54.95 | 52.0 |
|  | ✓ | ✓ | ✓ | 58.79 | 79.19 | 77.5 |
| Baseline | Multi-Scale Pixel-Guided CNN Encoder | Multi-Scale Aggregation Transformer Encoder | Parallel Attention Fusion Module | mIoU | OA | F1 |
|---|---|---|---|---|---|---|
| ✓ |  |  |  | 21.47 | 38.87 | 30.6 |
|  | ✓ |  |  | 28.31 | 47.36 | 45.3 |
|  |  | ✓ |  | 37.64 | 58.86 | 56.9 |
|  | ✓ | ✓ | ✓ | 62.81 | 83.14 | 80.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Song, W.; Nie, F.; Wang, C.; Jiang, Y.; Wu, Y. Unsupervised Multi-Scale Hybrid Feature Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2024, 16, 3774. https://doi.org/10.3390/rs16203774