Remote Sensing Image Classification and Semantic Segmentation

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (20 February 2024) | Viewed by 31880

Special Issue Editors


Guest Editor
The State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Interests: remote sensing image processing; spectral super-resolution; 3D computer vision; deep learning

Guest Editor
Gipsa-Lab, Grenoble Institute of Technology, 38031 Grenoble, France
Interests: image analysis; hyperspectral remote sensing; data fusion; machine learning; artificial intelligence

Guest Editor
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Interests: deep learning; artificial intelligence; feature extraction; geophysical image processing; image segmentation; remote sensing

Guest Editor
The State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Interests: hyperspectral image processing; deep learning

Guest Editor
The State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Interests: image/video codec; computer vision; 3D computer vision; remote sensing image processing

Guest Editor
The State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Interests: image/video processing; coding and transmission; chip design; high-performance computing

Special Issue Information

Dear Colleagues,

With the rapid growth of remote sensing imaging technology, vast amounts of remote sensing data are being generated for Earth, Mars, and other planetary bodies, supporting land-monitoring systems, national security, agriculture, medicine, atmospheric science, and more. In recent decades, deep learning techniques have had a significant effect on remote sensing data processing and analysis, especially in image classification and semantic segmentation. However, several challenges remain: the limited number of annotated datasets, restricted computing resources, the special characteristics of different sensors or data sources, the complexity and diversity of large-scale areas, and other task-specific problems all make deep-learning-based algorithms harder to apply in real-world settings. Therefore, novel deep neural networks that combine few-shot learning, meta-learning, attention mechanisms, or other transformer technologies deserve more attention, as they are of vital importance for remote sensing image classification and semantic segmentation. It is also necessary to develop lightweight, explainable, and robust networks for remote sensing image applications, especially image classification and semantic segmentation.

This Special Issue aims to develop state-of-the-art deep networks for more accurate remote sensing image classification and semantic segmentation. Furthermore, it aims to achieve cross-domain performance with high efficiency through lightweight network design.

This Special Issue encourages authors to submit research articles, review articles, or application-oriented articles on topics regarding remote sensing image classification, semantic segmentation, detection, spectral super-resolution and understanding-related works; these include, but are not limited to, the following topics:

  • Machine/deep-learning-based algorithms;
  • Remote sensing image processing and pattern recognition;
  • Image classification;
  • Semantic segmentation;
  • Target detection/change detection;
  • Image or data fusion/fusion classification;
  • Lightweight deep neural networks;
  • Domain-adaptation/few-shot-learning/meta-learning-based algorithms;
  • Onboard real-time applications.

Dr. Jiaojiao Li
Prof. Dr. Qian Du
Prof. Dr. Jocelyn Chanussot
Prof. Dr. Wei Li
Dr. Bobo Xi
Prof. Dr. Rui Song
Prof. Dr. Yunsong Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • remote sensing
  • deep learning
  • semantic segmentation
  • classification
  • cross-domain
  • earth observation

Published Papers (24 papers)


Research

20 pages, 4443 KiB  
Article
PointMM: Point Cloud Semantic Segmentation CNN under Multi-Spatial Feature Encoding and Multi-Head Attention Pooling
by Ruixing Chen, Jun Wu, Ying Luo and Gang Xu
Remote Sens. 2024, 16(7), 1246; https://doi.org/10.3390/rs16071246 - 31 Mar 2024
Viewed by 496
Abstract
Point cloud data collected in practice present widespread challenges such as semantic inconsistency, density variations, and sparse spatial distribution. A network called PointMM is developed in this study to enhance the accuracy of point cloud semantic segmentation in complex scenes. The main contribution of PointMM involves two aspects: (1) Multi-spatial feature encoding. We leverage a novel feature encoding module to learn multi-spatial features from the neighborhood point set obtained by k-nearest neighbors (KNN) in the feature space. This enhances the network’s ability to learn the spatial structures of various samples more finely and completely. (2) Multi-head attention pooling. We leverage a multi-head attention pooling module to address the limitations of symmetric function-based pooling, such as maximum and average pooling, in terms of losing detailed feature information. This is achieved by aggregating multi-spatial and attribute features of point clouds, thereby enhancing the network’s ability to transmit information more comprehensively and accurately. Experiments on the publicly available point cloud datasets S3DIS and ISPRS 3D Vaihingen demonstrate that PointMM effectively learns features at different levels, while improving the semantic segmentation accuracy of various objects. Compared to 12 state-of-the-art methods reported in the literature, PointMM outperforms the runner-up by 2.3% in OA on the ISPRS 3D Vaihingen dataset, and achieves the third-best performance in both OA and mIoU on the S3DIS dataset. On both datasets, it achieves a satisfactory balance among OA, F1, and mIoU. Full article
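
To illustrate the pooling idea, here is a minimal PyTorch sketch of multi-head attention pooling over a KNN neighborhood; the module, the learned query, and all shapes are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool K neighbor features into one point feature with multi-head
    attention instead of max/average pooling. A generic sketch; PointMM's
    actual module may differ."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, channels))  # learned pooling query

    def forward(self, neighbor_feats: torch.Tensor) -> torch.Tensor:
        # neighbor_feats: (B*N, K, C) -- K features from a KNN neighborhood
        q = self.query.expand(neighbor_feats.size(0), -1, -1)   # (B*N, 1, C)
        pooled, _ = self.attn(q, neighbor_feats, neighbor_feats)
        return pooled.squeeze(1)                                # (B*N, C)

feats = torch.randn(2048, 16, 64)      # 2048 points, 16 neighbors, 64 channels
print(AttentionPool(64)(feats).shape)  # torch.Size([2048, 64])
```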

22 pages, 7238 KiB  
Article
ASPP+-LANet: A Multi-Scale Context Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Lei Hu, Xun Zhou, Jiachen Ruan and Supeng Li
Remote Sens. 2024, 16(6), 1036; https://doi.org/10.3390/rs16061036 - 14 Mar 2024
Viewed by 610
Abstract
Semantic segmentation of remote sensing (RS) images is a pivotal branch in the realm of RS image processing, which plays a significant role in urban planning, building extraction, vegetation extraction, etc. With the continuous advancement of remote sensing technology, the spatial resolution of remote sensing images is progressively improving. This escalation in resolution gives rise to challenges like imbalanced class distributions among ground objects in RS images, the significant variations of ground object scales, as well as the presence of redundant information and noise interference. In this paper, we propose a multi-scale context extraction network, ASPP+-LANet, based on the LANet for semantic segmentation of high-resolution RS images. Firstly, we design an ASPP+ module, expanding upon the ASPP module by incorporating an additional feature extraction channel, redesigning the dilation rates, and introducing the Coordinate Attention (CA) mechanism so that it can effectively improve the segmentation performance of ground object targets at different scales. Secondly, we introduce the Funnel ReLU (FReLU) activation function for enhancing the segmentation effect of slender ground object targets and refining the segmentation edges. The experimental results show that our network model demonstrates superior segmentation performance on both Potsdam and Vaihingen datasets, outperforming other state-of-the-art (SOTA) methods. Full article
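
For context, the baseline ASPP mechanism that ASPP+ extends looks roughly like the following PyTorch sketch (the extra branch, retuned dilation rates, Coordinate Attention, and FReLU of ASPP+ are omitted; the rates shown are the common DeepLab defaults, assumed here):

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions
    capture context at several scales, then a 1x1 conv fuses them."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3 if r > 1 else 1,
                          padding=r if r > 1 else 0, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 256, 32, 32)
print(ASPP(256, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```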

20 pages, 65421 KiB  
Article
Multi-Scale Feature Fusion Network with Symmetric Attention for Land Cover Classification Using SAR and Optical Images
by Dongdong Xu, Zheng Li, Hao Feng, Fanlu Wu and Yongcheng Wang
Remote Sens. 2024, 16(6), 957; https://doi.org/10.3390/rs16060957 - 08 Mar 2024
Viewed by 713
Abstract
The complementary characteristics of SAR and optical images are beneficial in improving the accuracy of land cover classification. Deep learning-based models have achieved some notable results. However, how to effectively extract and fuse the unique features of multi-modal images for pixel-level classification remains challenging. In this article, a two-branch supervised semantic segmentation framework without any pretrained backbone is proposed. Specifically, a novel symmetric attention module is designed with improved strip pooling. The multiple long receptive fields can better perceive irregular objects and obtain more anisotropic contextual information. Meanwhile, to solve the semantic absence and inconsistency of different modalities, we construct a multi-scale fusion module, which is composed of atrous spatial pyramid pooling, varisized convolutions and skip connections. A joint loss function is introduced to constrain the backpropagation and reduce the impact of class imbalance. Validation experiments were conducted on the DFC2020 and WHU-OPT-SAR datasets. The proposed model achieved the best quantitative values on the metrics of OA, Kappa and mIoU, and its class accuracy was also excellent. It is worth mentioning that the number of parameters and the computational complexity of the method are relatively low. The adaptability of the model was verified on an RGB–thermal segmentation task. Full article
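
The strip pooling idea underlying the symmetric attention module can be sketched as follows (a generic formulation with an assumed channel-wise gate; the paper's improved version differs in detail):

```python
import torch
import torch.nn as nn

class StripPooling(nn.Module):
    """Strip pooling: Hx1 and 1xW average pooling produce long, narrow
    receptive fields suited to elongated objects (roads, rivers)."""
    def __init__(self, ch: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # -> (B, C, 1, W)
        self.conv_h = nn.Conv2d(ch, ch, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(ch, ch, (1, 3), padding=(0, 1))

    def forward(self, x):
        h = self.conv_h(self.pool_h(x))  # column-wise context
        w = self.conv_w(self.pool_w(x))  # row-wise context
        attn = torch.sigmoid(h + w)      # broadcasts to (B, C, H, W)
        return x * attn

x = torch.randn(1, 64, 128, 128)
print(StripPooling(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```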

26 pages, 4583 KiB  
Article
An Overlay Accelerator of DeepLab CNN for Spacecraft Image Segmentation on FPGA
by Zibo Guo, Kai Liu, Wei Liu, Xiaoyao Sun, Chongyang Ding and Shangrong Li
Remote Sens. 2024, 16(5), 894; https://doi.org/10.3390/rs16050894 - 02 Mar 2024
Viewed by 874
Abstract
Due to the absence of communication and coordination with external spacecraft, non-cooperative spacecraft present challenges for the servicing spacecraft in acquiring information about their pose and location. The accurate segmentation of non-cooperative spacecraft components in images is a crucial step in autonomously sensing the pose of non-cooperative spacecraft. This paper presents a novel overlay accelerator of DeepLab Convolutional Neural Networks (CNNs) for spacecraft image segmentation on an FPGA. First, several software–hardware co-design aspects are investigated: (1) A CNN-domain COD instruction set (Control, Operation, Data Transfer) is presented based on a Load–Store architecture to enable the implementation of accelerator overlays. (2) An RTL-based prototype accelerator is developed for the COD instruction set. The accelerator incorporates dedicated units for instruction decoding and dispatch, scheduling, memory management, and operation execution. (3) A compiler is designed that leverages tiling and operation fusion techniques to optimize the execution of CNNs, generating binary instructions for the optimized operations. Our accelerator is implemented on a Xilinx Virtex-7 XC7VX690T FPGA at 200 MHz. Experiments demonstrate that with INT16 quantization our accelerator achieves an accuracy (mIoU) of 77.84%, experiencing only a 0.2% degradation compared to that of the original full-precision model, in accelerating the segmentation model of DeepLabv3+ ResNet18 on the spacecraft component images (SCIs) dataset. The accelerator achieves a throughput of 184.19 GOPS and a computational efficiency (runtime throughput/theoretical roof throughput) of 88.72%. Compared to previous work, our accelerator improves performance by 1.5× and computational efficiency by 43.93%, all while consuming similar hardware resources. Additionally, in terms of instruction encoding, our instructions reduce the size by 1.5× to 49× when compiling the same model compared to previous work. Full article
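
As background on the INT16 quantization step, a symmetric linear quantizer of the kind commonly used for accelerator deployment can be sketched as follows (the paper's exact quantization scheme is not specified in the abstract; this is an assumed, generic variant):

```python
import numpy as np

def quantize_int16(x: np.ndarray, scale: float | None = None):
    """Symmetric linear quantization to INT16: map the float range
    [-max|x|, max|x|] onto [-32767, 32767] with a single scale factor."""
    if scale is None:
        scale = np.abs(x).max() / 32767.0
    q = np.clip(np.round(x / scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # e.g., conv weights
q, s = quantize_int16(w)
print(np.abs(dequantize(q, s) - w).max())  # small round-off error
```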

20 pages, 8569 KiB  
Article
A Joint Convolutional Cross ViT Network for Hyperspectral and Light Detection and Ranging Fusion Classification
by Haitao Xu, Tie Zheng, Yuzhe Liu, Zhiyuan Zhang, Changbin Xue and Jiaojiao Li
Remote Sens. 2024, 16(3), 489; https://doi.org/10.3390/rs16030489 - 26 Jan 2024
Cited by 1 | Viewed by 1160
Abstract
The fusion of hyperspectral imagery (HSI) and light detection and ranging (LiDAR) data for classification has received widespread attention and has led to significant progress in research and remote sensing applications. However, existing common CNN architectures suffer from the significant drawback of not being able to model remote sensing images globally, while transformer architectures are not able to capture local features effectively. To address these bottlenecks, this paper proposes a classification framework for multisource remote sensing image fusion. First, a spatial and spectral feature projection network is constructed based on parallel feature extraction by combining HSI and LiDAR data, which is conducive to extracting joint spatial, spectral, and elevation features from different source data. Furthermore, in order to construct local–global nonlinear feature mapping more flexibly, a network architecture coupling together multiscale convolution and a multiscale vision transformer is proposed. Moreover, a plug-and-play nonlocal feature token aggregation module is designed to adaptively adjust the domain offsets between different features, while a class token is employed to reduce the complexity of high-dimensional feature fusion. On three open-source remote sensing datasets, the performance of the proposed multisource fusion classification framework improves by about 1% to 3% over other state-of-the-art algorithms. Full article

25 pages, 8436 KiB  
Article
Self-Distillation-Based Polarimetric Image Classification with Noisy and Sparse Labels
by Ningwei Wang, Haixia Bi, Fan Li, Chen Xu and Jinghuai Gao
Remote Sens. 2023, 15(24), 5751; https://doi.org/10.3390/rs15245751 - 15 Dec 2023
Viewed by 631
Abstract
Polarimetric synthetic aperture radar (PolSAR) image classification, a field crucial in remote sensing, faces significant challenges due to the intricate expertise required for accurate annotation, leading to susceptibility to labeling inaccuracies. Compounding this challenge are the constraints posed by limited labeled samples and the perennial issue of class imbalance inherent in PolSAR image classification. Our research objectives are to address these challenges by developing a novel label correction mechanism, implementing self-distillation-based contrastive learning, and introducing a sample rebalancing loss function. To address the problem of noisy labels, we propose a novel label correction mechanism that capitalizes on inherent sample similarities to rectify erroneously labeled instances. In parallel, to mitigate the limitation of sparsely labeled data, this study delves into self-distillation-based contrastive learning, harnessing sample affinities for nuanced feature extraction. Moreover, we introduce a sample rebalancing loss function that adjusts class weights and augments data for small classes. Through extensive experiments on four benchmark PolSAR images, our approach demonstrates its effectiveness and robustness in addressing label inaccuracies, limited samples, and class imbalance, particularly in rectifying label discrepancies in contexts marked by sample paucity and imbalance. The empirical findings illuminate the superior efficacy of our approach, positioning it at the forefront of state-of-the-art PolSAR classification techniques. Full article
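
A generic form of such a sample rebalancing loss, with class weights inversely proportional to batch frequency, might look like this (an illustrative sketch; the authors' exact weighting and augmentation scheme may differ):

```python
import torch
import torch.nn.functional as F

def rebalanced_ce(logits: torch.Tensor, labels: torch.Tensor, eps: float = 1.0):
    """Class-weighted cross-entropy: rare classes in the batch receive
    larger weights, counteracting class imbalance."""
    num_classes = logits.size(1)
    counts = torch.bincount(labels.flatten(), minlength=num_classes).float()
    weights = 1.0 / (counts + eps)                   # rare classes -> large weights
    weights = weights / weights.sum() * num_classes  # normalize around 1
    return F.cross_entropy(logits, labels, weight=weights)

logits = torch.randn(8, 5)                      # 8 samples, 5 classes
labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])  # heavily imbalanced batch
print(rebalanced_ce(logits, labels))
```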

19 pages, 16515 KiB  
Article
SDAT-Former++: A Foggy Scene Semantic Segmentation Method with Stronger Domain Adaption Teacher for Remote Sensing Images
by Ziquan Wang, Yongsheng Zhang, Zhenchao Zhang, Zhipeng Jiang, Ying Yu, Li Li and Lei Zhang
Remote Sens. 2023, 15(24), 5704; https://doi.org/10.3390/rs15245704 - 12 Dec 2023
Viewed by 814
Abstract
Semantic segmentation based on optical images can provide comprehensive scene information for intelligent vehicle systems, thus aiding in scene perception and decision making. However, under adverse weather conditions (such as fog), the performance of methods can be compromised due to incomplete observations. Considering the success of domain adaptation in recent years, we believe it is reasonable to transfer knowledge from clear and existing annotated datasets to images with fog. Technically, we follow the main workflow of the previous SDAT-Former method, which incorporates fog and style-factor knowledge into the teacher segmentor to generate better pseudo-labels for guiding the student segmentor, but we identify and address some issues, achieving significant improvements. Firstly, we introduce a consistency loss for learning from multiple source data to better converge the performance of each component. Secondly, we apply positional encoding to the features of fog-invariant adversarial learning, strengthening the model’s ability to handle the details of foggy entities. Furthermore, to address the complexity and noise in the original version, we integrate a simple but effective masked learning technique into a unified, end-to-end training process. Finally, we regularize the knowledge transfer in the original method through re-weighting. We tested our SDAT-Former++ on mainstream benchmarks for semantic segmentation in foggy scenes, demonstrating improvements of 3.3%, 4.8%, and 1.1% (as measured by the mIoU) on the ACDC, Foggy Zurich, and Foggy Driving datasets, respectively, compared to the original version. Full article

21 pages, 11351 KiB  
Article
Learn to Few-Shot Segment Remote Sensing Images from Irrelevant Data
by Qingwei Sun, Jiangang Chao, Wanhong Lin, Zhenying Xu, Wei Chen and Ning He
Remote Sens. 2023, 15(20), 4937; https://doi.org/10.3390/rs15204937 - 12 Oct 2023
Cited by 2 | Viewed by 734
Abstract
Few-shot semantic segmentation (FSS) is committed to segmenting new classes with only a few labels. Generally, FSS assumes that base classes and novel classes belong to the same domain, which limits FSS’s application in a wide range of areas. In particular, since annotation is time-consuming, it is not cost-effective to process remote sensing images using FSS. To address this issue, we designed a feature transformation network (FTNet) for learning to few-shot segment remote sensing images from irrelevant data (FSS-RSI). The main idea is to train networks on irrelevant, already labeled data but to run inference on remote sensing images. In other words, the training and testing data belong to neither the same domain nor the same category. The FTNet contains two main modules: a feature transformation module (FTM) and a hierarchical transformer module (HTM). Among them, the FTM transforms features into a domain-agnostic high-level anchor, and the HTM hierarchically enhances matching between support and query features. Moreover, to promote the development of FSS-RSI, we established a new benchmark, which other researchers may use. Our experiments demonstrate that our model outperforms the cutting-edge few-shot semantic segmentation method by 25.39% and 21.31% in the one-shot and five-shot settings, respectively. Full article
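
For reference, the masked-average-pooling prototype baseline that few-shot segmentation methods such as FTNet build on can be sketched as follows (a generic baseline, not FTNet's FTM/HTM modules):

```python
import torch
import torch.nn.functional as F

def prototype_predict(support_feat, support_mask, query_feat):
    """Masked average pooling of support features yields a class prototype;
    cosine similarity to query features gives a dense score map."""
    # support_feat/query_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0,1}
    proto = (support_feat * support_mask).sum(dim=(2, 3)) \
            / (support_mask.sum(dim=(2, 3)) + 1e-6)        # (B, C)
    proto = proto[:, :, None, None]                        # (B, C, 1, 1)
    return F.cosine_similarity(query_feat, proto, dim=1)   # (B, H, W)

s = torch.randn(1, 256, 32, 32)
m = (torch.rand(1, 1, 32, 32) > 0.5).float()
q = torch.randn(1, 256, 32, 32)
print(prototype_predict(s, m, q).shape)  # torch.Size([1, 32, 32])
```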

27 pages, 6209 KiB  
Article
A New Architecture of a Complex-Valued Convolutional Neural Network for PolSAR Image Classification
by Yihui Ren, Wen Jiang and Ying Liu
Remote Sens. 2023, 15(19), 4801; https://doi.org/10.3390/rs15194801 - 01 Oct 2023
Cited by 2 | Viewed by 1229
Abstract
Polarimetric synthetic aperture radar (PolSAR) image classification has been an important area of research due to its wide range of applications. Traditional machine learning methods were insufficient in achieving satisfactory results before the advent of deep learning. Results have significantly improved with the widespread use of deep learning in PolSAR image classification. However, the challenge of reconciling the complex-valued inputs of PolSAR images with the real-valued models of deep learning remains unsolved. Current complex-valued deep learning models treat complex numbers as two distinct real numbers, providing limited benefit for PolSAR image classification. This paper proposes a novel, complex-valued deep learning approach for PolSAR image classification to address this issue. The approach includes amplitude-based max pooling, complex-valued nonlinear activation, and a cross-entropy loss function based on complex-valued probability. Amplitude-based max pooling reduces computational effort while preserving the most valuable complex-valued features. Complex-valued nonlinear activation maps features into a high-dimensional complex-domain space, producing the most discriminative features. The complex-valued cross-entropy loss function computes the classification loss using the complex-valued model output and dataset labels, resulting in more accurate and robust classification results. The proposed method was applied to a shallow CNN, a deep CNN, an FCN, and SegNet, and its effectiveness was verified on three public datasets. The results showed that the method achieved the best classification results with every model and dataset. Full article
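
The amplitude-based max pooling operation can be sketched directly: take the pooling indices from the magnitudes, then gather the corresponding complex values (a minimal sketch assuming PyTorch complex tensors):

```python
import torch
import torch.nn.functional as F

def amplitude_max_pool2d(x: torch.Tensor, kernel: int = 2) -> torch.Tensor:
    # x: complex feature maps of shape (B, C, H, W).
    # In each window, select the element with the largest magnitude
    # and keep its full complex value.
    _, idx = F.max_pool2d(x.abs(), kernel, return_indices=True)
    flat, idx_flat = x.flatten(2), idx.flatten(2)  # (B, C, H*W), (B, C, h*w)
    # Gather real and imaginary parts separately for broad version support.
    real = flat.real.gather(2, idx_flat)
    imag = flat.imag.gather(2, idx_flat)
    return torch.complex(real, imag).view(idx.shape)

x = torch.randn(1, 4, 8, 8, dtype=torch.cfloat)
print(amplitude_max_pool2d(x).shape)  # torch.Size([1, 4, 4, 4])
```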

19 pages, 6321 KiB  
Article
GLF-Net: A Semantic Segmentation Model Fusing Global and Local Features for High-Resolution Remote Sensing Images
by Wanying Song, Xinwei Zhou, Shiru Zhang, Yan Wu and Peng Zhang
Remote Sens. 2023, 15(19), 4649; https://doi.org/10.3390/rs15194649 - 22 Sep 2023
Viewed by 870
Abstract
Semantic segmentation of high-resolution remote sensing images holds paramount importance in the field of remote sensing. To better excavate and fully fuse the features in high-resolution remote sensing images, this paper introduces a novel Global and Local Feature Fusion Network, abbreviated as GLF-Net, by incorporating the extensive contextual information and refined fine-grained features. The proposed GLF-Net, devised as an encoder–decoder network, employs the powerful ResNet50 as its baseline model. It incorporates two pivotal components within the encoder phase: a Covariance Attention Module (CAM) and a Local Fine-Grained Extraction Module (LFM). And an additional wavelet self-attention module (WST) is integrated into the decoder stage. The CAM effectively extracts the features of different scales from various stages of the ResNet and then encodes them with graph convolutions. In this way, the proposed GLF-Net model can well capture the global contextual information with both universality and consistency. Additionally, the local feature extraction module refines the feature map by encoding the semantic and spatial information, thereby capturing the local fine-grained features in images. Furthermore, the WST maximizes the synergy between the high-frequency and the low-frequency information, facilitating the fusion of global and local features for better performance in semantic segmentation. The effectiveness of the proposed GLF-Net model is validated through experiments conducted on the ISPRS Potsdam and Vaihingen datasets. The results verify that it can greatly improve segmentation accuracy. Full article

26 pages, 6250 KiB  
Article
TCUNet: A Lightweight Dual-Branch Parallel Network for Sea–Land Segmentation in Remote Sensing Images
by Xuan Xiong, Xiaopeng Wang, Jiahua Zhang, Baoxiang Huang and Runfeng Du
Remote Sens. 2023, 15(18), 4413; https://doi.org/10.3390/rs15184413 - 07 Sep 2023
Cited by 1 | Viewed by 1016
Abstract
Remote sensing techniques for shoreline extraction are crucial for monitoring changes in erosion rates, surface hydrology, and ecosystem structure. In recent years, Convolutional neural networks (CNNs) have developed as a cutting-edge deep learning technique that has been extensively used in shoreline extraction from remote sensing images, owing to their exceptional feature extraction capabilities. They are progressively replacing traditional methods in this field. However, most CNN models only focus on the features in local receptive fields, and overlook the consideration of global contextual information, which will hamper the model’s ability to perform a precise segmentation of boundaries and small objects, consequently leading to unsatisfactory segmentation results. To solve this problem, we propose a parallel semantic segmentation network (TCU-Net) combining CNN and Transformer, to extract shorelines from multispectral remote sensing images, and improve the extraction accuracy. Firstly, TCU-Net imports the Pyramid Vision Transformer V2 (PVT V2) network and ResNet, which serve as backbones for the Transformer branch and CNN branch, respectively, forming a parallel dual-encoder structure for the extraction of both global and local features. Furthermore, a feature interaction module is designed to achieve information exchange, and complementary advantages of features, between the two branches. Secondly, for the decoder part, we propose a cross-scale multi-source feature fusion module to replace the original UNet decoder block, to aggregate multi-scale semantic features more effectively. In addition, a sea–land segmentation dataset covering the Yellow Sea region (GF Dataset) is constructed through the processing of three scenes from Gaofen-6 remote sensing images. We perform a comprehensive experiment with the GF dataset to compare the proposed method with mainstream semantic segmentation models, and the results demonstrate that TCU-Net outperforms the competing models in all three evaluation indices: the PA (pixel accuracy), F1-score, and MIoU (mean intersection over union), while requiring significantly fewer parameters and computational resources compared to other models. These results indicate that the TCU-Net model proposed in this article can extract the shoreline from remote sensing images more effectively, with a shorter time, and lower computational overhead. Full article

23 pages, 4666 KiB  
Article
RockSeg: A Novel Semantic Segmentation Network Based on a Hybrid Framework Combining a Convolutional Neural Network and Transformer for Deep Space Rock Images
by Lili Fan, Jiabin Yuan, Xuewei Niu, Keke Zha and Weiqi Ma
Remote Sens. 2023, 15(16), 3935; https://doi.org/10.3390/rs15163935 - 09 Aug 2023
Cited by 1 | Viewed by 1166
Abstract
Rock detection on the surface of celestial bodies is critical in the deep space environment for obstacle avoidance and path planning of space probes. However, in the remote and complex deep space environment, rocks have irregular shapes, resemble the background, have sparse pixel characteristics, and are easily affected by light and dust. Most existing methods face significant challenges in attaining high accuracy and low computational complexity in rock detection. In this paper, we propose a novel semantic segmentation network based on a hybrid framework combining CNN and transformer for deep space rock images, namely RockSeg. The network includes a multiscale low-level feature fusion (MSF) module and an efficient backbone network for feature extraction to achieve the effective segmentation of the rocks. Firstly, in the network encoder, we propose a new backbone network (Resnet-T) that combines part of the Resnet backbone and a transformer block with a multi-headed attention mechanism to capture global context information. Additionally, a simple and efficient multiscale feature fusion module is designed to fuse low-level features at different scales to generate richer and more detailed feature maps. In the network decoder, these feature maps are integrated with the output feature maps to obtain more precise semantic segmentation results. Finally, we conduct experiments on two deep space rock datasets: the MoonData and MarsData datasets. The experimental results demonstrate that the proposed model outperforms state-of-the-art rock detection algorithms under the conditions of low computational complexity and fast inference speed. Full article

20 pages, 11396 KiB  
Article
Neural Network Compression via Low Frequency Preference
by Chaoyan Zhang, Cheng Li, Baolong Guo and Nannan Liao
Remote Sens. 2023, 15(12), 3144; https://doi.org/10.3390/rs15123144 - 16 Jun 2023
Cited by 1 | Viewed by 1049
Abstract
Network pruning has been widely used in model compression techniques, and offers a promising prospect for deploying models on devices with limited resources. Nevertheless, existing pruning methods merely consider the importance of feature maps and filters in the spatial domain. In this paper, we reconsider the model characteristics and propose a novel filter pruning method that corresponds to the human visual system, termed Low Frequency Preference (LFP), in the frequency domain. It is essentially an indicator that determines the importance of a filter based on the relative low-frequency components of its feature map across channels. When the feature map of a filter has more low-frequency components than the other feature maps, it is considered more crucial and should be preserved during the pruning process. We evaluate the proposed LFP on datasets of three different scales with several models and achieve superior performance. The experimental results obtained on the CIFAR datasets and ImageNet dataset demonstrate that our method significantly reduces the model size and FLOPs. The results on the UC Merced dataset show that our approach is also significant for remote sensing image classification. Full article
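
An indicator in this spirit can be computed with a 2-D FFT: the fraction of a feature map's spectral energy that falls near the DC component (an illustrative metric in the spirit of LFP; the paper's exact formulation may differ):

```python
import torch

def low_frequency_score(feature_map: torch.Tensor, radius: int = 4):
    """Relative low-frequency energy of a (H, W) feature map: 2-D FFT,
    shift DC to the center, and sum the energy in a small square around
    it. Filters whose maps score lowest are pruning candidates."""
    f = torch.fft.fftshift(torch.fft.fft2(feature_map))
    h, w = f.shape[-2] // 2, f.shape[-1] // 2
    low = f[..., h - radius:h + radius, w - radius:w + radius]
    return (low.abs() ** 2).sum() / (f.abs() ** 2).sum()

fmap_smooth = torch.ones(32, 32) + 0.01 * torch.randn(32, 32)
fmap_noisy = torch.randn(32, 32)
# A smooth map concentrates its energy at low frequencies.
print(low_frequency_score(fmap_smooth) > low_frequency_score(fmap_noisy))  # True
```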

15 pages, 16100 KiB  
Article
Multi-Pooling Context Network for Image Semantic Segmentation
by Qing Liu, Yongsheng Dong, Zhiqiang Jiang, Yuanhua Pei, Boshi Zheng, Lintao Zheng and Zhumu Fu
Remote Sens. 2023, 15(11), 2800; https://doi.org/10.3390/rs15112800 - 28 May 2023
Cited by 4 | Viewed by 1480
Abstract
With the development of image segmentation technology, image context information plays an increasingly important role in semantic segmentation. However, due to the complexity of context information in different feature maps, simple context capture operations can easily cause context information omission. Rich context information can better classify categories and improve the quality of image segmentation. On the contrary, poor context information will lead to blurred image category segmentation and an incomplete target edge. In order to capture rich context information as completely as possible, we constructed a Multi-Pooling Context Network (MPCNet), which is a multi-pool contextual network for the semantic segmentation of images. Specifically, we first proposed the Pooling Context Aggregation Module to capture the deep context information of the image by processing the information between the space, channel, and pixel of the image. At the same time, the Spatial Context Module was constructed to capture the detailed spatial context of images at different stages of the network. The whole network structure adopted the form of codec to better extract image context. Finally, we performed extensive experiments on three semantic segmentation datasets (Cityscapes, ADE20K, and PASCAL VOC2012 datasets), which fully proved that our proposed network effectively alleviated the lack of context extraction and verified the effectiveness of the network. Full article

20 pages, 10575 KiB  
Article
Unmixing-Guided Convolutional Transformer for Spectral Reconstruction
by Shiyao Duan, Jiaojiao Li, Rui Song, Yunsong Li and Qian Du
Remote Sens. 2023, 15(10), 2619; https://doi.org/10.3390/rs15102619 - 18 May 2023
Cited by 1 | Viewed by 1301
Abstract
Deep learning networks based on CNNs or transformers have made progress in spectral reconstruction (SR). However, many methods focus solely on feature extraction, overlooking the interpretability of network design. Additionally, models exclusively based on CNNs or transformers may lose other prior information, sacrificing reconstruction accuracy and robustness. In this paper, we propose a novel Unmixing-Guided Convolutional Transformer Network (UGCT) for interpretable SR. Specifically, transformer and ResBlock components are embedded in Paralleled-Residual Multi-Head Self-Attention (PMSA) to facilitate fine feature extraction guided by the excellent priors of local and non-local information from CNNs and transformers. Furthermore, the Spectral–Spatial Aggregation Module (S2AM) combines the advantages of geometric invariance and global receptive fields to enhance the reconstruction performance. Finally, we exploit a hyperspectral unmixing (HU) mechanism-driven framework at the end of the model, incorporating detailed features from the spectral library using the linear mixing model (LMM) and employing precise endmember features to achieve a more refined interpretation of mixed pixels in HSI at sub-pixel scales. Experimental results demonstrate the superiority of our proposed UGCT, especially on the grss_dfc_2018 dataset, on which UGCT attains an RMSE of 0.0866, outperforming other comparative methods. Full article
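
The linear mixing model used in the unmixing-driven head treats each pixel spectrum as a nonnegative combination of endmember spectra. A minimal abundance-estimation sketch (our simplification, using nonnegative least squares with post-hoc normalization rather than fully constrained unmixing):

```python
import numpy as np
from scipy.optimize import nnls

def lmm_abundances(pixel: np.ndarray, endmembers: np.ndarray) -> np.ndarray:
    """LMM: a pixel spectrum y is modeled as y ~ E @ a with nonnegative
    abundances a; sum-to-one is enforced here by simple normalization."""
    a, _ = nnls(endmembers, pixel)  # endmembers: (bands, num_endmembers)
    return a / max(a.sum(), 1e-12)

E = np.abs(np.random.rand(100, 4))   # 4 endmember spectra over 100 bands
truth = np.array([0.6, 0.3, 0.1, 0.0])
y = E @ truth                        # noiseless mixed pixel
print(np.round(lmm_abundances(y, E), 2))  # recovers ~[0.6, 0.3, 0.1, 0.0]
```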

20 pages, 3897 KiB  
Article
Application of Machine Learning to Tree Species Classification Using Active and Passive Remote Sensing: A Case Study of the Duraer Forestry Zone
by Su Rina, Hong Ying, Yu Shan, Wala Du, Yang Liu, Rong Li and Dingzhu Deng
Remote Sens. 2023, 15(10), 2596; https://doi.org/10.3390/rs15102596 - 16 May 2023
Cited by 1 | Viewed by 1907
Abstract
The technology of remote sensing-assisted tree species classification is increasingly developing, but the rapid refinement of tree species classification on a large scale is still challenging. As one of the treasures of ecological resources in China, Arxan has 80% forest cover, and tree species classification surveys guarantee ecological environment management and sustainable development. In this study, we identified tree species in three samples within the Arxan Duraer Forestry Zone based on the spectral, textural, and topographic features of unmanned aerial vehicle (UAV) multispectral remote sensing imagery and light detection and ranging (LiDAR) point cloud data as classification variables to distinguish among birch, larch, and nonforest areas. The best extracted classification variables were combined to compare the accuracy of the random forest (RF), support vector machine (SVM), and classification and regression tree (CART) methodologies for classifying species into three sample strips in the Arxan Duraer Forestry Zone. Furthermore, the effect on the overall classification results of adding a canopy height model (CHM) was investigated based on spectral and texture feature classification combined with field measurement data to improve the accuracy. The results showed that the overall accuracy of the RF was 79%, and the kappa coefficient was 0.63. After adding the CHM extracted from the point cloud data, the overall accuracy was improved by 7%, and the kappa coefficient increased to 0.75. The overall accuracy of the CART model was 78%, and the kappa coefficient was 0.63; the overall accuracy of the SVM was 81%, and the kappa coefficient was 0.67; and the overall accuracy of the RF was 86%, and the kappa coefficient was 0.75. To verify whether the above results can be applied to a large area, Google Earth Engine was used to write code to extract the features required for classification from Sentinel-2 multispectral and radar topographic data (create equivalent conditions), and six tree species and one nonforest in the study area were classified using RF, with an overall accuracy of 0.98, and a kappa coefficient of 0.97. In this paper, we mainly integrate active and passive remote sensing data for forest surveying and add vertical data to a two-dimensional image to form a three-dimensional scene. The main goal of the research is not only to find schemes to improve the accuracy of tree species classification, but also to apply the results to large-scale areas. This is necessary to improve the time-consuming and labor-intensive traditional forest survey methods and to ensure the accuracy and reliability of survey data. Full article
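
The classification workflow, stacking spectral, texture, and CHM variables per pixel and feeding them to a random forest, can be sketched as follows (synthetic placeholder features; the band counts and feature choices are assumptions, not the study's exact variables):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score

n_pixels = 5000
spectral = np.random.rand(n_pixels, 5)  # e.g., multispectral band values
texture = np.random.rand(n_pixels, 4)   # e.g., GLCM statistics
chm = np.random.rand(n_pixels, 1)       # canopy height model from LiDAR
X = np.hstack([spectral, texture, chm])
y = np.random.randint(0, 3, n_pixels)   # birch / larch / nonforest labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print(accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
```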

17 pages, 4551 KiB  
Article
ACTNet: A Dual-Attention Adapter with a CNN-Transformer Network for the Semantic Segmentation of Remote Sensing Imagery
by Zheng Zhang, Fanchen Liu, Changan Liu, Qing Tian and Hongquan Qu
Remote Sens. 2023, 15(9), 2363; https://doi.org/10.3390/rs15092363 - 29 Apr 2023
Cited by 3 | Viewed by 2087
Abstract
In recent years, the application of semantic segmentation methods to remote sensing images has become increasingly prevalent across a diverse range of domains, including but not limited to forest detection, water body detection, urban rail transportation planning, and building extraction. With the incorporation of the Transformer model into computer vision, the efficacy and accuracy of these algorithms have been significantly enhanced. Nevertheless, the Transformer model’s high computational complexity and dependence on pre-trained weights from large datasets lead to slow convergence during training for remote sensing segmentation tasks. Motivated by the success of the adapter module in the field of natural language processing, this paper presents a novel adapter module (ResAttn) for improving model training speed for remote sensing segmentation. The ResAttn adopts a dual-attention structure in order to capture the interdependencies between sets of features, thereby improving its global modeling capabilities, and introduces a Swin Transformer-like down-sampling method to reduce information loss and retain the original architecture while reducing the resolution. In addition, the existing Transformer model is limited in its ability to capture local high-frequency information, which can lead to an inadequate extraction of edge and texture features. To address these issues, this paper proposes a Local Feature Extractor (LFE) module, which is based on a convolutional neural network (CNN), and incorporates multi-scale feature extraction and a residual structure to effectively overcome this limitation. Further, a mask-based segmentation method is employed and a residual-enhanced deformable attention block (Deformer Block) is incorporated to improve the small target segmentation accuracy. Finally, a sufficient number of experiments were performed on the ISPRS Potsdam datasets. The experimental results demonstrate the superior performance of the model described in this paper. Full article

28 pages, 7019 KiB  
Article
A Semantic Segmentation Framework for Hyperspectral Imagery Based on Tucker Decomposition and 3DCNN Tested with Simulated Noisy Scenarios
by Efrain Padilla-Zepeda, Deni Torres-Roman and Andres Mendez-Vazquez
Remote Sens. 2023, 15(5), 1399; https://doi.org/10.3390/rs15051399 - 01 Mar 2023
Cited by 2 | Viewed by 1998
Abstract
The present work, unlike others, does not try to reduce the noise in hyperspectral images to increase the semantic segmentation performance metrics; rather, we present a classification framework for noisy Hyperspectral Images (HSI), studying the classification performance metrics for different SNR levels and where the inputs are compressed. This framework consists of a 3D Convolutional Neural Network (3DCNN) that uses as input data a spectrally compressed version of the HSI, obtained from the Tucker Decomposition (TKD). The advantage of this classifier is the ability to handle spatial and spectral features from the core tensor, exploiting the spatial correlation of remotely sensed images of the earth surface. To test the performance of this framework, signal-independent thermal noise and signal-dependent photonic noise generators are implemented to simulate an extensive collection of tests, from 60 dB to −20 dB of Signal-to-Noise Ratio (SNR) over three datasets: Indian Pines (IP), University of Pavia (UP), and Salinas (SAL). For comparison purposes, we have included tests with Support Vector Machine (SVM), Random Forest (RF), 1DCNN, and 2DCNN. For the test cases, the datasets were compressed to only 40 tensor bands for a relative reconstruction error less than 1%. This framework allows us to classify the noisy data with better accuracy and significantly reduces the computational complexity of the Deep Learning (DL) model. The framework exhibits an excellent performance from 60 dB to 0 dB of SNR for 2DCNN and 3DCNN, achieving a Kappa coefficient from 0.90 to 1.0 in all the noisy data scenarios for a representative set of labeled samples of each class for training, from 5% to 10% for the datasets used in this work. The source code and log files of the experiments used for this paper are publicly available for research purposes. Full article
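
The spectral compression step can be reproduced with an off-the-shelf Tucker decomposition, truncating only the spectral mode (a sketch using TensorLy on a synthetic cube; the 40-band rank follows the paper, while the cube size is arbitrary):

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Synthetic HSI cube (H, W, bands); a real cube would replace this.
hsi = np.random.rand(64, 64, 200).astype(np.float32)

# Keep full spatial ranks, truncate the spectral mode to 40 components.
core, factors = tucker(tl.tensor(hsi), rank=[64, 64, 40])

# Re-expand only the spatial modes: the result is a spectrally
# compressed cube (64, 64, 40) that would feed the 3DCNN classifier.
compressed = tl.tenalg.multi_mode_dot(core, factors[:2], modes=[0, 1])
print(compressed.shape)
```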

21 pages, 6516 KiB  
Article
MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images
by Min Yuan, Dingbang Ren, Qisheng Feng, Zhaobin Wang, Yongkang Dong, Fuxiang Lu and Xiaolin Wu
Remote Sens. 2023, 15(2), 361; https://doi.org/10.3390/rs15020361 - 06 Jan 2023
Cited by 14 | Viewed by 2534
Abstract
Semantic segmentation for urban remote sensing images is one of the most crucial tasks in the field of remote sensing. Remote sensing images contain rich information on ground objects, such as shape, location, and boundary, which can be found in high-resolution remote sensing images. It is exceedingly challenging to identify remote sensing images because of the large intraclass variance and low interclass variance caused by these objects. In this article, we propose a multiscale hierarchical channel attention fusion network model based on a transformer and CNN, which we name the multiscale channel attention fusion network (MCAFNet). MCAFNet uses ResNet-50 and ViT-B/16 to learn the global–local context, and this strengthens the semantic feature representation. Specifically, a global–local transformer block (GLTB) is deployed in the encoder stage. This design handles image details at low resolution and extracts global image features better than previous methods. In the decoder module, a channel attention optimization module and a fusion module are added to better integrate high- and low-dimensional feature maps, which enhances the network’s ability to obtain small-scale semantic information. The proposed method is evaluated on the ISPRS Vaihingen and Potsdam datasets. Both quantitative and qualitative evaluations show the competitive performance of MCAFNet in comparison to mainstream methods. In addition, we performed extensive ablation experiments on the Vaihingen dataset in order to test the effectiveness of multiple network components. Full article

19 pages, 4967 KiB  
Article
SegMarsViT: Lightweight Mars Terrain Segmentation Network for Autonomous Driving in Planetary Exploration
by Yuqi Dai, Tie Zheng, Changbin Xue and Li Zhou
Remote Sens. 2022, 14(24), 6297; https://doi.org/10.3390/rs14246297 - 12 Dec 2022
Cited by 5 | Viewed by 1853
Abstract
Planetary rover systems need to perform terrain segmentation to identify feasible driving areas and surrounding obstacles, which falls into the research area of semantic segmentation. Recently, deep learning (DL)-based methods were proposed and achieved great performance for semantic segmentation. However, due to the on-board processor platform’s strict constraints on computational complexity and power consumption, existing DL approaches are almost impossible to deploy on satellites under the burden of extensive computation and large model size. To fill this gap, this paper targets effective and efficient Martian terrain segmentation solutions that are suitable for on-board deployment. In this article, we propose a lightweight ViT-based terrain segmentation method, namely, SegMarsViT. In the encoder part, the mobile vision transformer (MViT) block in the backbone extracts local–global spatial features and captures multiscale contextual information concurrently. In the decoder part, the cross-scale feature fusion modules (CFF) further integrate hierarchical context information and the compact feature aggregation module (CFA) combines multi-level feature representation. Moreover, we evaluate the proposed method on three public datasets: AI4Mars, MSL-Seg, and S5Mars. Extensive experiments demonstrate that the proposed SegMarsViT is able to achieve 68.4%, 78.22%, and 67.28% mIoU on AI4Mars-MSL, MSL-Seg, and S5Mars, respectively, at a speed of 69.52 FPS. Full article

Other

12 pages, 3883 KiB  
Technical Note
Exploring Semantic Prompts in the Segment Anything Model for Domain Adaptation
by Ziquan Wang, Yongsheng Zhang, Zhenchao Zhang, Zhipeng Jiang, Ying Yu, Li Li and Lei Li
Remote Sens. 2024, 16(5), 758; https://doi.org/10.3390/rs16050758 - 21 Feb 2024
Viewed by 948
Abstract
Robust segmentation in adverse weather conditions is crucial for autonomous driving. However, such scenes are difficult to recognize and expensive to annotate, resulting in poor performance. The recently proposed Segment Anything Model (SAM) can finely segment the spatial structure of scenes and provide powerful prior spatial information, thus showing great promise for resolving these problems. However, SAM cannot be applied directly across different geographic scales, and its outputs are not semantic. To address these issues, we propose SAM-EDA, which integrates SAM into an unsupervised domain adaptation mean-teacher segmentation framework. In this method, we use a “teacher-assistant” model to provide semantic pseudo-labels, which fill in the holes in the fine spatial structure given by SAM and generate pseudo-labels close to the ground truth, which then guide the student model’s learning. Here, the “teacher-assistant” model helps to distill knowledge. During testing, only the student model is used, thus greatly improving efficiency. We tested SAM-EDA on mainstream segmentation benchmarks in adverse weather conditions and obtained a more robust segmentation model. Full article
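
At the core of the mean-teacher framework that SAM-EDA plugs into is an exponential-moving-average teacher update, sketched below (a generic formulation; the momentum value and surrounding training loop are assumptions):

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.999) -> None:
    """Mean-teacher update: the teacher's weights are an exponential
    moving average of the student's, yielding more stable pseudo-labels."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

student = torch.nn.Conv2d(3, 8, 3)      # stand-in for a segmentation network
teacher = copy.deepcopy(student)
# ... one student optimizer step would happen here ...
ema_update(teacher, student)            # then the teacher drifts toward it
```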
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
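The core fusion step can be illustrated with a minimal sketch, not the released SAM-EDA code: SAM supplies class-agnostic region masks with sharp spatial structure, a teacher segmenter supplies per-pixel class scores, and each SAM region is filled with the teacher's majority class. The function name, mask format, and majority-vote rule are assumptions about one plausible realization.

```python
# Sketch of pseudo-label fusion; names and mask format are assumptions.
import torch

def fuse_sam_with_teacher(sam_masks: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          ignore_index: int = 255) -> torch.Tensor:
    """sam_masks: (N, H, W) boolean region masks from SAM (assumed non-empty).
    teacher_logits: (C, H, W) class scores from the teacher model.
    Returns an (H, W) pseudo-label map."""
    teacher_pred = teacher_logits.argmax(dim=0)        # (H, W) hard labels
    pseudo = torch.full_like(teacher_pred, ignore_index)
    for mask in sam_masks:
        classes, counts = teacher_pred[mask].unique(return_counts=True)
        pseudo[mask] = classes[counts.argmax()]        # majority vote per region
    return pseudo

# Dummy usage: 3 SAM regions tiling a 64x64 image, 19-class label space
sam_masks = torch.zeros(3, 64, 64, dtype=torch.bool)
sam_masks[0, :32] = True
sam_masks[1, 32:, :32] = True
sam_masks[2, 32:, 32:] = True
teacher_logits = torch.randn(19, 64, 64)
pseudo_labels = fuse_sam_with_teacher(sam_masks, teacher_logits)
```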

17 pages, 3693 KiB  
Technical Note
Optimizing Few-Shot Remote Sensing Scene Classification Based on an Improved Data Augmentation Approach
by Zhong Dong, Baojun Lin and Fang Xie
Remote Sens. 2024, 16(3), 525; https://doi.org/10.3390/rs16030525 - 30 Jan 2024
Viewed by 760
Abstract
In few-shot classification learning, the judicious application of data augmentation has a significant positive impact on classification performance. In few-shot classification tasks for remote sensing images, augmenting features and efficiently exploiting the limited available features are of paramount importance. To address the performance degradation caused by challenges such as high interclass overlap and large intraclass variance in remote sensing image features, we present a data-augmentation-based classification optimization method for few-shot remote sensing scene classification. First, we construct a distortion magnitude space using different types of features and perform distortion adjustments on the support set samples, introducing an optimal-search method for the distortion magnitude (ODS). The augmented support set then offers a wide array of feature distortions in both type and degree, significantly enhancing the generalization of intrasample features. Subsequently, we devise a dual-path classification (DC) decision strategy that effectively leverages the discriminative information provided by the post-distortion features to further reduce the likelihood of classification errors. Finally, we evaluate the proposed method on a widely used remote sensing dataset. Our experimental results demonstrate that our approach outperforms benchmark methods, achieving improved classification accuracy. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
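The magnitude-search idea can be sketched loosely as follows: try several distortion magnitudes on the support set and keep the one whose augmented features best preserve class separability under a nearest-prototype classifier. The distortion used here (additive Gaussian noise in feature space), the selection criterion, and all names are illustrative assumptions, not the paper's ODS procedure.

```python
# Loose sketch of a distortion-magnitude search; all details are assumptions.
import torch

def prototype_accuracy(feats, labels):
    """Nearest-class-prototype accuracy; assumes labels are 0..C-1."""
    protos = torch.stack([feats[labels == c].mean(0) for c in labels.unique()])
    pred = torch.cdist(feats, protos).argmin(dim=1)
    return (pred == labels).float().mean().item()

def search_distortion_magnitude(support_feats, support_labels,
                                magnitudes=(0.05, 0.1, 0.2, 0.4)):
    best_m, best_acc = None, -1.0
    for m in magnitudes:
        # distort the support features, then score the augmented set
        distorted = support_feats + m * torch.randn_like(support_feats)
        acc = prototype_accuracy(torch.cat([support_feats, distorted]),
                                 torch.cat([support_labels, support_labels]))
        if acc > best_acc:
            best_m, best_acc = m, acc
    return best_m

# 5-way 5-shot dummy support set with 64-d features
feats = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
m_star = search_distortion_magnitude(feats, labels)
```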

16 pages, 54621 KiB  
Technical Note
Automated Detection and Analysis of Massive Mining Waste Deposits Using Sentinel-2 Satellite Imagery and Artificial Intelligence
by Manuel Silva, Gabriel Hermosilla, Gabriel Villavicencio and Pierre Breul
Remote Sens. 2023, 15(20), 4949; https://doi.org/10.3390/rs15204949 - 13 Oct 2023
Viewed by 1126
Abstract
This article presents a method for detecting and segmenting mine waste deposits, specifically waste rock dumps and leaching waste dumps, in Sentinel-2 satellite imagery using artificial intelligence. This challenging task has important implications for mining companies and for regulators such as Chile's National Geology and Mining Service. Challenges include limited knowledge of the number of mine waste deposits, as well as logistical and technical difficulties in conducting inspections and surveying physical stability parameters. The proposed method combines YOLOv7 object detection with a vision transformer (ViT) classifier to locate mine waste deposits, together with a deep generative model for data augmentation that enhances detection and segmentation accuracy. The ViT classifier achieved 98% accuracy in differentiating five satellite imagery scene types, while the YOLOv7 model achieved an average precision of 81% for detection and 79% for segmentation of mine waste deposits. Finally, the model was used to calculate mine waste deposit areas, with an absolute error of 6.6% relative to Google Earth API results. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
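The final area-calculation step follows directly from pixel counting once a deposit mask is available. Below is a minimal sketch under the assumption of a 10 m/pixel ground sampling distance (the resolution of Sentinel-2's visible and NIR bands); the function name and mask format are illustrative.

```python
# Sketch of mask-based area estimation; the 10 m GSD is an assumption
# about which Sentinel-2 bands were used.
import numpy as np

def deposit_area_hectares(mask: np.ndarray, gsd_m: float = 10.0) -> float:
    """mask: 2-D boolean segmentation mask of one mine waste deposit."""
    pixel_area_m2 = gsd_m * gsd_m                   # 100 m^2 per pixel at 10 m GSD
    return mask.sum() * pixel_area_m2 / 10_000.0    # m^2 -> hectares

mask = np.zeros((512, 512), dtype=bool)
mask[100:300, 150:350] = True                       # dummy 200x200-pixel deposit
print(f"estimated area: {deposit_area_hectares(mask):.1f} ha")  # 400.0 ha
```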

17 pages, 955 KiB  
Technical Note
Semantic Segmentation of High-Resolution Remote Sensing Images Based on Sparse Self-Attention and Feature Alignment
by Li Sun, Huanxin Zou, Juan Wei, Xu Cao, Shitian He, Meilin Li and Shuo Liu
Remote Sens. 2023, 15(6), 1598; https://doi.org/10.3390/rs15061598 - 15 Mar 2023
Cited by 2 | Viewed by 1914
Abstract
Semantic segmentation of high-resolution remote sensing images (HRSI) is significant, yet challenging. Recently, several research works have utilized the self-attention operation to capture global dependencies. However, HRSI contain complex scenes and rich details, and applying self-attention over a whole image introduces redundant information that interferes with semantic segmentation. Detail recovery is another challenging aspect of HRSI semantic segmentation. Several networks use up-sampling, skip connections, parallel structures, and enhanced edge features to obtain more precise results, but these methods ignore the misalignment between features of different resolutions, which degrades segmentation accuracy. To resolve these problems, this paper proposes a semantic segmentation network based on sparse self-attention and feature alignment (SAANet). Specifically, the sparse position self-attention module (SPAM) divides, rearranges, and restores the feature maps along the position dimension and performs position attention (PAM) operations within the rearranged and restored sub-regions, respectively. Likewise, the proposed sparse channel self-attention module (SCAM) groups, rearranges, and restores the feature maps along the channel dimension and performs channel attention (CAM) operations within the rearranged and restored sub-channels, respectively. SPAM and SCAM effectively model long-range contextual information and the interdependencies between channels while limiting the introduction of redundant information. Finally, the feature alignment module (FAM) uses convolutions to obtain a learnable offset map and aligns feature maps of different resolutions, helping to recover details and refine feature representations. Extensive experiments conducted on the ISPRS Vaihingen, Potsdam, and LoveDA datasets demonstrate that the proposed method outperforms general semantic segmentation networks and self-attention-based networks. Full article
(This article belongs to the Special Issue Remote Sensing Image Classification and Semantic Segmentation)
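The sparse position-attention idea can be illustrated with a minimal window-attention sketch: instead of attending over all H×W positions, the feature map is split into non-overlapping sub-regions and self-attention runs inside each one, cutting cost from O((HW)²) to O(HW·w²). The window size and the use of nn.MultiheadAttention are assumptions, not the authors' exact SPAM design.

```python
# Sketch of windowed (sparse) position attention; design details are assumptions.
import torch
import torch.nn as nn

class SparsePositionAttention(nn.Module):
    def __init__(self, channels: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                            # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.window                              # assumes H % w == W % w == 0
        # split into (B * num_windows, w*w, C) token groups
        t = x.view(B, C, H // w, w, W // w, w).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, w * w, C)
        out, _ = self.attn(t, t, t)                  # attention within each window
        # merge the windows back into a (B, C, H, W) feature map
        out = out.reshape(B, H // w, W // w, w, w, C)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return out + x                               # residual connection

x = torch.randn(2, 64, 32, 32)
y = SparsePositionAttention(64)(x)                   # -> (2, 64, 32, 32)
```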
