

Semantic Segmentation of High-Resolution Remote Sensing Images with Advanced Deep Learning Techniques

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: 31 May 2025 | Viewed by 10549

Special Issue Editors

Dr. Le Sun
Guest Editor
School of Computer and Software, Nanjing University of Information Science and Technology, No. 219 Ningliu Road, Nanjing 210044, Jiangsu Province, China
Interests: hyperspectral remote sensing image processing (including unmixing, classification, and fusion); deep learning

Dr. Qian Sun
Guest Editor
School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
Interests: intelligent image; graphics processing; deep learning

Dr. Kan Chen
Guest Editor
Infocomm Technology Cluster, Singapore Institute of Technology, Singapore 138683, Singapore
Interests: computer graphics; virtual reality; human-computer interaction; computer vision; visualization

Special Issue Information

Dear Colleagues,

In recent years, semantic segmentation has emerged as a prominent research area within image processing and computer vision. The rapid advancement of deep learning (DL) has significantly fueled interest in this domain. A wide range of influential DL models, including convolutional neural networks (CNNs), generative adversarial networks (GANs), graph convolutional networks (GCNs), multimodal data fusion networks, and Transformers, have been developed for semantic segmentation tasks. These models exhibit remarkable performance across diverse applications, ranging from scene comprehension in autonomous driving to precise segmentation of skin lesions for medical diagnosis and hyperspectral/multispectral image segmentation for remote sensing applications.

Advances in spectral imaging and aerial photography have greatly eased the acquisition of extensive repositories of aerial multispectral and hyperspectral images. These images are invaluable resources across remote sensing applications, from quantifying forest cover to conducting land-use assessments and projecting urban-planning scenarios. However, despite the considerable progress of deep-learning-based semantic segmentation on natural images, transferring these methodologies to pixel-level or superpixel-level classification/segmentation of remote sensing images (RSIs), including multispectral and hyperspectral imagery, poses a host of formidable challenges.

Unlike natural images, high-resolution RSIs present a myriad of object categories alongside redundant detail. Semantic segmentation methods for RSIs must accommodate these characteristics, sharpen interclass distinctions, and maintain intraclass consistency. However, feeding full high-resolution images into DL models is computationally impractical, so many current approaches decompose images spatially into tiles and sacrifice segmentation accuracy for processing speed (a minimal tiled-inference sketch follows this letter). This Special Issue seeks original contributions from researchers pioneering high-performance semantic segmentation of high-resolution RSIs, leveraging deep learning to address these challenges.

Dr. Le Sun
Dr. Qian Sun
Dr. Kan Chen
Guest Editors
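
As the call above notes, feeding full high-resolution scenes into a DL model is impractical, so most pipelines decompose the image spatially and blend per-tile predictions. The following is a minimal sketch of overlapped sliding-window inference in Python/NumPy; the tile size, stride, and the `predict_tile` callback are illustrative assumptions rather than a method prescribed by this Special Issue.

```python
import numpy as np

def tiled_segmentation(image, predict_tile, num_classes, tile=512, stride=384):
    """Overlapped sliding-window inference over a large scene.

    image:        (H, W, C) array with H, W >= tile (a high-resolution RSI).
    predict_tile: callable mapping a (tile, tile, C) patch to (num_classes, tile, tile) logits.
    Returns an (H, W) label map obtained by averaging logits where tiles overlap.
    """
    H, W, _ = image.shape
    logits = np.zeros((num_classes, H, W), dtype=np.float32)
    counts = np.zeros((H, W), dtype=np.float32)

    ys = list(range(0, H - tile + 1, stride))
    xs = list(range(0, W - tile + 1, stride))
    if ys[-1] != H - tile:                    # make the last row of tiles touch the border
        ys.append(H - tile)
    if xs[-1] != W - tile:
        xs.append(W - tile)

    for y in ys:
        for x in xs:
            patch = image[y:y + tile, x:x + tile]
            logits[:, y:y + tile, x:x + tile] += predict_tile(patch)
            counts[y:y + tile, x:x + tile] += 1.0

    return (logits / counts).argmax(axis=0)   # per-pixel class labels
```

Overlapping strides with logit averaging reduce the seam artifacts that non-overlapping tiling introduces, at the cost of roughly (tile/stride)^2 more forward passes.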

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • high-resolution/super-pixel remote sensing image segmentation
  • semantic segmentation/change detection
  • data augmentation
  • pixel-wise classification
  • zero-shot learning/ensemble learning
  • attention mechanisms
  • convolutional neural networks/generative adversarial networks/graph convolutional networks/transformer
  • multi-scale feature fusion
  • multimodal data fusion
  • time series image analysis
  • computational complexity
  • domain adaptation
  • explainable AI (XAI)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (6 papers)


Research

28 pages, 60546 KiB  
Article
Adapting Cross-Sensor High-Resolution Remote Sensing Imagery for Land Use Classification
by Wangbin Li, Kaimin Sun and Jinjiang Wei
Remote Sens. 2025, 17(5), 927; https://doi.org/10.3390/rs17050927 - 5 Mar 2025
Viewed by 849
Abstract
High-resolution visible remote sensing imagery, as a fundamental contributor to Earth observation, has found extensive application in land use classification. However, the heterogeneous array of optical sensors, distinguished by their unique design architectures, exhibit disparate spectral responses and spatial distributions when observing ground objects. These discrepancies between multi-sensor data present a significant obstacle to the widespread application of intelligent methods. In this paper, we propose a method tailored to accommodate these disparities, with the aim of achieving a smooth transfer for the model across diverse sets of images captured by different sensors. Specifically, to address the discrepancies in spatial resolution, a novel positional encoding has been incorporated to capture the correlation between the spatial resolution details and the characteristics of ground objects. To tackle spectral disparities, random amplitude mixup augmentation is introduced to mitigate the impact of feature anisotropy resulting from discrepancies in low-level features between multi-sensor images. Additionally, we integrate convolutional neural networks and Transformers to enhance the model’s feature extraction capabilities, and employ a fine-tuning strategy with dynamic pseudo-labels to reduce the reliance on annotated data from the target domain. In the experimental section, the Gaofen-2 images (4 m) and the Sentinel-2 images (10 m) were selected as training and test datasets to simulate cross-sensor model transfer scenarios. Also, Google Earth images of Suzhou City, Jiangsu Province, were utilized for further validation. The results indicate that our approach effectively mitigates the degradation in model performance attributed to image source inconsistencies. Full article
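
The abstract above mentions random amplitude mixup for bridging spectral discrepancies between sensors, but the listing gives no implementation details. A common way to realize amplitude-level mixing is in the Fourier domain, where low-frequency amplitude carries sensor-specific appearance while phase carries structure. The Python/NumPy sketch below illustrates that general idea under those assumptions; it is not the authors' exact formulation, and the `beta` window size and mixing coefficient are hypothetical defaults.

```python
import numpy as np

def amplitude_mixup(src, ref, beta=0.1, lam=None):
    """Mix the low-frequency Fourier amplitude of `src` with that of `ref`.

    src, ref: (H, W, C) float arrays from two different sensors.
    beta:     fraction of the (centred) spectrum to mix.
    lam:      mixing coefficient; drawn uniformly at random if None.
    """
    lam = np.random.uniform(0.0, 1.0) if lam is None else lam
    out = np.empty_like(src, dtype=np.float32)
    H, W, C = src.shape
    h, w = int(H * beta), int(W * beta)
    cy, cx = H // 2, W // 2

    for c in range(C):
        fft_src = np.fft.fftshift(np.fft.fft2(src[..., c]))
        fft_ref = np.fft.fftshift(np.fft.fft2(ref[..., c]))
        amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
        amp_ref = np.abs(fft_ref)

        # Blend amplitudes only in a low-frequency window; keep phase (structure) intact.
        amp_mix = amp_src.copy()
        amp_mix[cy - h:cy + h, cx - w:cx + w] = (
            lam * amp_ref[cy - h:cy + h, cx - w:cx + w]
            + (1.0 - lam) * amp_src[cy - h:cy + h, cx - w:cx + w]
        )
        fft_mix = amp_mix * np.exp(1j * pha_src)
        out[..., c] = np.real(np.fft.ifft2(np.fft.ifftshift(fft_mix)))
    return out
```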

21 pages, 22124 KiB  
Article
ACDF-YOLO: Attentive and Cross-Differential Fusion Network for Multimodal Remote Sensing Object Detection
by Xuan Fei, Mengyao Guo, Yan Li, Renping Yu and Le Sun
Remote Sens. 2024, 16(18), 3532; https://doi.org/10.3390/rs16183532 - 23 Sep 2024
Cited by 5 | Viewed by 2279
Abstract
Object detection in remote sensing images has received significant attention for a wide range of applications. However, traditional unimodal remote sensing images, whether based on visible light or infrared, have limitations that cannot be ignored. Visible light images are susceptible to ambient lighting conditions, and their detection accuracy can be greatly reduced. Infrared images often lack rich texture information, resulting in a high false-detection rate during target identification and classification. To address these challenges, we propose a novel multimodal fusion network detection model, named ACDF-YOLO, based on the lightweight and efficient YOLOv5 structure, which aims to amalgamate synergistic data from both visible and infrared imagery, thereby enhancing the efficiency of target identification in remote sensing imagery. Firstly, a novel efficient shuffle attention module is designed to assist in extracting the features of various modalities. Secondly, deeper multimodal information fusion is achieved by introducing a new cross-modal difference module to fuse the features that have been acquired. Finally, we combine the two modules mentioned above in an effective manner to achieve ACDF. The ACDF not only enhances the characterization ability for the fused features but also further refines the capture and reinforcement of important channel features. Experimental validation was performed using several publicly available multimodal real-world and remote sensing datasets. Compared with other advanced unimodal and multimodal methods, ACDF-YOLO achieved 95.87% and 78.10% mAP0.5 on the LLVIP and VEDAI datasets, respectively, demonstrating that the deep fusion of different modal information can effectively improve the accuracy of object detection. Full article
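
As a rough illustration of the cross-modal difference idea described above (fusing visible and infrared features by attending to where the modalities disagree), the following PyTorch sketch shows one plausible fusion block. The layer choices and gating scheme are assumptions for illustration, not the actual ACDF module.

```python
import torch
import torch.nn as nn

class CrossModalDifferenceFusion(nn.Module):
    """Hedged sketch of a cross-modal, difference-driven fusion block.

    Fuses visible (RGB) and infrared feature maps of shape (B, C, H, W) by
    gating each stream with attention derived from their difference.
    """

    def __init__(self, channels):
        super().__init__()
        self.diff_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),                       # per-pixel, per-channel gate in [0, 1]
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb, feat_ir):
        gate = self.diff_conv(feat_rgb - feat_ir)   # emphasise where the modalities disagree
        feat_rgb = feat_rgb + gate * feat_ir        # inject complementary IR cues
        feat_ir = feat_ir + gate * feat_rgb         # and vice versa
        return self.fuse(torch.cat([feat_rgb, feat_ir], dim=1))

# Example: fuse two 256-channel backbone feature maps.
# fused = CrossModalDifferenceFusion(256)(torch.randn(1, 256, 64, 64),
#                                         torch.randn(1, 256, 64, 64))
```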

26 pages, 2861 KiB  
Article
Attention Guide Axial Sharing Mixed Attention (AGASMA) Network for Cloud Segmentation and Cloud Shadow Segmentation
by Guowei Gu, Zhongchen Wang, Liguo Weng, Haifeng Lin, Zikai Zhao and Liling Zhao
Remote Sens. 2024, 16(13), 2435; https://doi.org/10.3390/rs16132435 - 2 Jul 2024
Viewed by 1202
Abstract
Segmenting clouds and their shadows is a critical challenge in remote sensing image processing. The shape, texture, lighting conditions, and background of clouds and their shadows impact the effectiveness of cloud detection. Currently, architectures that maintain high resolution throughout the entire information-extraction process are rapidly emerging. This parallel architecture, combining high and low resolutions, produces detailed high-resolution representations, enhancing segmentation prediction accuracy. This paper continues the parallel architecture of high and low resolution. When handling high- and low-resolution images, this paper employs a hybrid approach combining the Transformer and CNN models. This method facilitates interaction between the two models, enabling the extraction of both semantic and spatial details from the images. To address the challenge of inadequate fusion and significant information loss between high- and low-resolution images, this paper introduces a method based on ASMA (Axial Sharing Mixed Attention). This approach establishes pixel-level dependencies between high-resolution and low-resolution images, aiming to enhance the efficiency of image fusion. In addition, to strengthen the focus on critical information in remote sensing images, the Attention Guide Module (AGM) is introduced to integrate attention cues from the original features into ASMA, alleviating the self-attention mechanism's insufficient channel modeling. Our experimental results on the Cloud and Cloud Shadow dataset, the SPARCS dataset, and the CSWV dataset demonstrate the effectiveness of our method, surpassing the state-of-the-art techniques for cloud and cloud shadow segmentation. Full article
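
ASMA builds on axial attention, which the listing does not spell out. The PyTorch sketch below shows generic axial self-attention: attending along the height axis and then the width axis, which reduces the cost of pixel-level dependencies from O((HW)^2) to roughly O(HW·(H + W)). It is a textbook illustration of the mechanism, not the ASMA module itself; the sharing between high- and low-resolution streams is omitted.

```python
import torch
import torch.nn as nn

class AxialSelfAttention(nn.Module):
    """Generic axial self-attention: attend along H, then along W."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        # channels must be divisible by num_heads.
        self.attn_h = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_w = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Attend along the height axis: one sequence per column.
        cols = x.permute(0, 3, 2, 1).reshape(B * W, H, C)
        cols, _ = self.attn_h(cols, cols, cols)
        x = cols.reshape(B, W, H, C).permute(0, 3, 2, 1)
        # Attend along the width axis: one sequence per row.
        rows = x.permute(0, 2, 3, 1).reshape(B * H, W, C)
        rows, _ = self.attn_w(rows, rows, rows)
        return rows.reshape(B, H, W, C).permute(0, 3, 1, 2)

# Example: y = AxialSelfAttention(64)(torch.randn(1, 64, 128, 128))
```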

29 pages, 3844 KiB  
Article
VALNet: Vision-Based Autonomous Landing with Airport Runway Instance Segmentation
by Qiang Wang, Wenquan Feng, Hongbo Zhao, Binghao Liu and Shuchang Lyu
Remote Sens. 2024, 16(12), 2161; https://doi.org/10.3390/rs16122161 - 14 Jun 2024
Cited by 8 | Viewed by 2411
Abstract
Visual navigation, characterized by its autonomous capabilities, cost effectiveness, and robust resistance to interference, serves as the foundation for vision-based autonomous landing systems. These systems rely heavily on runway instance segmentation, which accurately divides runway areas and provides precise information for unmanned aerial vehicle (UAV) navigation. However, current research primarily focuses on runway detection but lacks relevant runway instance segmentation datasets. To address this research gap, we created the Runway Landing Dataset (RLD), a benchmark dataset that focuses on runway instance segmentation mainly based on X-Plane. To overcome the challenges of large-scale changes and input image angle differences in runway instance segmentation tasks, we propose a vision-based autonomous landing segmentation network (VALNet) that uses band-pass filters, where a Context Enhancement Module (CEM) guides the model to learn adaptive “band” information through heatmaps, while an Orientation Adaptation Module (OAM) with a triple-channel architecture fully exploits rotation information, enhancing the model’s ability to capture input image rotation transformations. Extensive experiments on RLD demonstrate that the new method has significantly improved performance. The visualization results further confirm the effectiveness and interpretability of VALNet in the face of large-scale changes and angle differences. This research not only advances the development of runway instance segmentation but also highlights the potential application value of VALNet in vision-based autonomous landing systems. Additionally, RLD is publicly available. Full article
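
The role of the band-pass filters in VALNet is only hinted at in the abstract. For orientation, the Python/NumPy sketch below shows a plain frequency-domain band-pass filter that keeps mid-frequency structure while suppressing smooth background and fine noise; the cut-off values are arbitrary assumptions, and VALNet instead learns its "band" adaptively through heatmaps.

```python
import numpy as np

def band_pass(image, low=0.05, high=0.35):
    """Frequency-domain band-pass filter for a single-channel image.

    Keeps frequencies whose normalised radius lies in [low, high]; the
    cut-offs here are illustrative only.
    """
    H, W = image.shape
    fy = np.fft.fftfreq(H)[:, None]            # vertical frequencies
    fx = np.fft.fftfreq(W)[None, :]            # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)        # normalised frequency radius
    mask = (radius >= low) & (radius <= high)

    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * mask))
```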

25 pages, 5654 KiB  
Article
Deep-Learning-Based Daytime COT Retrieval and Prediction Method Using FY4A AGRI Data
by Fanming Xu, Biao Song, Jianhua Chen, Runda Guan, Rongjie Zhu, Jiayu Liu and Zhongfeng Qiu
Remote Sens. 2024, 16(12), 2136; https://doi.org/10.3390/rs16122136 - 13 Jun 2024
Cited by 1 | Viewed by 1245
Abstract
The traditional method for retrieving cloud optical thickness (COT) is carried out through a Look-Up Table (LUT). Researchers must make a series of idealized assumptions and conduct extensive observations and record features in this scenario, consuming considerable resources. The emergence of deep learning effectively addresses the shortcomings of the traditional approach. In this paper, we first propose a daytime (SOZA < 70°) COT retrieval algorithm based on FY-4A AGRI. We establish and train a Convolutional Neural Network (CNN) model for COT retrieval, CM4CR, with the CALIPSO’s COT product spatially and temporally synchronized as the ground truth. Then, a deep learning method extended from video prediction models is adopted to predict COT values based on the retrieval results obtained from CM4CR. The COT prediction model (CPM) consists of an encoder, a predictor, and a decoder. On this basis, we further incorporated a time embedding module to enhance the model’s ability to learn from irregular time intervals in the input COT sequence. During the training phase, we employed Charbonnier Loss and Edge Loss to enhance the model’s capability to represent COT details. Experiments indicate that our CM4CR outperforms existing COT retrieval methods, with predictions showing better performance across several metrics than other benchmark prediction models. Additionally, this paper also investigates the impact of different lengths of COT input sequences and the time intervals between adjacent frames of COT on prediction performance. Full article
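
Charbonnier Loss and Edge Loss are named above without formulas. A common formulation applies the Charbonnier penalty sqrt((x − y)² + ε²) to the prediction and to its Sobel gradients; the PyTorch sketch below follows that convention, with the loss weight being an illustrative assumption rather than the paper's setting.

```python
import torch
import torch.nn.functional as F

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth L1-like penalty: sqrt(diff^2 + eps^2), robust to outliers."""
    return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))

def edge_loss(pred, target, eps=1e-3):
    """Charbonnier penalty on Sobel gradient magnitudes, encouraging sharp COT edges."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)

    def grad(x):                                # x: (B, 1, H, W)
        gx = F.conv2d(x, kx.to(x.device), padding=1)
        gy = F.conv2d(x, ky.to(x.device), padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + eps ** 2)

    return charbonnier_loss(grad(pred), grad(target), eps)

# Total objective on (B, 1, H, W) COT maps; the 0.05 weight is an assumption.
# loss = charbonnier_loss(pred, target) + 0.05 * edge_loss(pred, target)
```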

22 pages, 28598 KiB  
Article
FFEDet: Fine-Grained Feature Enhancement for Small Object Detection
by Feiyue Zhao, Jianwei Zhang and Guoqing Zhang
Remote Sens. 2024, 16(11), 2003; https://doi.org/10.3390/rs16112003 - 2 Jun 2024
Cited by 2 | Viewed by 1584
Abstract
Small object detection poses significant challenges in the realm of general object detection, primarily due to complex backgrounds and other instances interfering with the expression of features. This research introduces an uncomplicated and efficient algorithm that addresses the limitations of small object detection. Firstly, we propose an efficient cross-scale feature fusion attention module called ECFA, which effectively utilizes attention mechanisms to emphasize relevant features across adjacent scales and suppress irrelevant noise, tackling issues of feature redundancy and insufficient representation of small objects. Secondly, we design a highly efficient convolutional module named SEConv, which reduces computational redundancy while providing a multi-scale receptive field to improve feature learning. Additionally, we develop a novel dynamic focus sample weighting function called DFSLoss, which allows the model to focus on learning from both normal and challenging samples, effectively addressing the problem of imbalanced difficulty levels among samples. Moreover, we introduce Wise-IoU to address the impact of poor-quality examples on model convergence. We conduct extensive experiments on four publicly available datasets to showcase the exceptional performance of our method in comparison to state-of-the-art object detectors. Full article
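
DFSLoss is described only at a high level, so the PyTorch sketch below illustrates the general idea of dynamic, difficulty-aware sample weighting: per-sample losses far above the batch mean are up-weighted smoothly. The weighting function, bounds, and exponent are assumptions for illustration and do not reproduce DFSLoss or Wise-IoU.

```python
import torch

def dynamic_focus_weighting(per_sample_loss, gamma=1.5):
    """Difficulty-aware weighting of an unreduced loss vector.

    per_sample_loss: (N,) losses, one per sample/anchor (reduction='none').
    Samples whose loss sits well above the batch mean are treated as "hard"
    and up-weighted; easy samples keep a weight near 1.
    """
    with torch.no_grad():                              # weights are not back-propagated
        ratio = per_sample_loss / (per_sample_loss.mean() + 1e-8)
        weights = ratio.clamp(0.5, 2.0) ** gamma       # bounded, monotone in difficulty
    return (weights * per_sample_loss).mean()

# Usage with any detector loss that supports reduction='none':
# loss = dynamic_focus_weighting(criterion(pred, target))
```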
