Search Results (16)

Search Parameters:
Keywords = HRRSIs

24 pages, 8074 KiB  
Article
MMRAD-Net: A Multi-Scale Model for Precise Building Extraction from High-Resolution Remote Sensing Imagery with DSM Integration
by Yu Gao, Huiming Chai and Xiaolei Lv
Remote Sens. 2025, 17(6), 952; https://doi.org/10.3390/rs17060952 - 7 Mar 2025
Viewed by 725
Abstract
High-resolution remote sensing imagery (HRRSI) presents significant challenges for building extraction tasks due to its complex terrain structures, multi-scale features, and rich spectral and geometric information. Traditional methods often face limitations in effectively integrating multi-scale features while maintaining a balance between detailed and global semantic information. To address these challenges, this paper proposes an innovative deep learning network, the Multi-Source Multi-Scale Residual Attention Network (MMRAD-Net). The model is built upon the classical encoder–decoder framework and introduces two key components: the GCN OA-SWinT Dense Module (GSTDM) and the Res DualAttention Dense Fusion Block (R-DDFB). Additionally, it incorporates Digital Surface Model (DSM) data through a novel feature extraction and fusion strategy. Specifically, the model enhances building extraction accuracy and robustness through hierarchical feature modeling and a refined cross-scale fusion mechanism, while effectively preserving both detail information and global semantic relationships. Furthermore, we propose a Hybrid Loss, which combines Binary Cross-Entropy Loss (BCE Loss), Dice Loss, and an edge-sensitive term to further improve the precision of building edges and foreground reconstruction. Experiments conducted on the GF-7 and WHU datasets validate the performance of MMRAD-Net, demonstrating its superiority over traditional methods in boundary handling, detail recovery, and adaptability to complex scenes. On the GF-7 dataset, MMRAD-Net achieved an F1-score of 91.12% and an IoU of 83.01%; on the WHU Building Dataset, the F1-score and IoU were 94.04% and 88.99%, respectively. Ablation studies and transfer learning experiments further confirm the rationality of the model design and its strong generalization ability. These results highlight that innovations in multi-source data fusion, multi-scale feature modeling, and detailed feature fusion mechanisms enhance the accuracy and robustness of building extraction.
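
As a concrete illustration of the Hybrid Loss described above, the following is a minimal PyTorch sketch under stated assumptions: single-channel binary masks, logits as input, a Sobel-based edge term, and hand-picked weights (the abstract specifies none of these details).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridLoss(nn.Module):
    """BCE + Dice + an edge-sensitive term, per the abstract's description.
    Expects logits and targets of shape (N, 1, H, W); weights are guesses."""
    def __init__(self, w_bce=1.0, w_dice=1.0, w_edge=0.5):
        super().__init__()
        self.w_bce, self.w_dice, self.w_edge = w_bce, w_dice, w_edge
        # Sobel kernels for a simple edge map (one hypothetical choice).
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", kx.view(1, 1, 3, 3))
        self.register_buffer("ky", kx.t().contiguous().view(1, 1, 3, 3))

    def edges(self, x):
        gx = F.conv2d(x, self.kx, padding=1)
        gy = F.conv2d(x, self.ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)
        bce = F.binary_cross_entropy_with_logits(logits, target)
        inter = (prob * target).sum()
        dice = 1 - (2 * inter + 1) / (prob.sum() + target.sum() + 1)
        edge = F.l1_loss(self.edges(prob), self.edges(target))
        return self.w_bce * bce + self.w_dice * dice + self.w_edge * edge
```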

24 pages, 2879 KiB  
Article
A Frequency Attention-Enhanced Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Jianyi Zhong, Tao Zeng, Zhennan Xu, Caifeng Wu, Shangtuo Qian, Nan Xu, Ziqi Chen, Xin Lyu and Xin Li
Remote Sens. 2025, 17(3), 402; https://doi.org/10.3390/rs17030402 - 24 Jan 2025
Cited by 2 | Viewed by 1769
Abstract
Semantic segmentation of high-resolution remote sensing images (HRRSIs) presents unique challenges due to the intricate spatial and spectral characteristics of these images. Traditional methods often prioritize spatial information while underutilizing the rich spectral context, leading to limited feature discrimination capabilities. To address these issues, we propose a novel frequency attention-enhanced network (FAENet), which incorporates a frequency attention model (FreqA) to jointly model spectral and spatial contexts. FreqA leverages discrete wavelet transformation (DWT) to decompose input images into distinct frequency components, followed by a two-stage attention mechanism comprising inner-component channel attention (ICCA) and cross-component channel attention (CCCA). These mechanisms enhance the spectral representation, which is further refined through a self-attention (SA) module to capture long-range dependencies before the features are transformed back into the spatial domain. FAENet’s encoder–decoder architecture facilitates multiscale feature refinement, enabling effective segmentation. Extensive experiments on the ISPRS Potsdam and LoveDA benchmarks demonstrate that FAENet outperforms state-of-the-art models, achieving superior segmentation accuracy. Ablation studies further validate the contributions of ICCA and CCCA, and efficiency comparisons confirm the advantage of FAENet over other models.
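
A rough sketch of the FreqA idea, assuming a single-level Haar DWT and SE-style channel attention for both ICCA and CCCA; the self-attention refinement and the inverse transform back to the spatial domain are omitted here.

```python
import torch
import torch.nn as nn

def haar_dwt(x):
    """Single-level Haar DWT: returns the LL, LH, HL, HH sub-bands."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll, lh = (a + b + c + d) / 2, (a + b - c - d) / 2
    hl, hh = (a - b + c - d) / 2, (a - b - c + d) / 2
    return ll, lh, hl, hh

class ChannelAttention(nn.Module):
    def __init__(self, ch, r=8):
        super().__init__()
        hidden = max(ch // r, 1)
        self.fc = nn.Sequential(nn.Linear(ch, hidden), nn.ReLU(inplace=True),
                                nn.Linear(hidden, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))    # squeeze: global average pool
        return x * w[:, :, None, None]     # excite: per-channel rescale

class FreqA(nn.Module):
    """Inner-component attention per sub-band (ICCA), then cross-component
    attention over the concatenated sub-bands (CCCA)."""
    def __init__(self, ch):
        super().__init__()
        self.icca = nn.ModuleList(ChannelAttention(ch) for _ in range(4))
        self.ccca = ChannelAttention(4 * ch)

    def forward(self, x):
        bands = [att(b) for att, b in zip(self.icca, haar_dwt(x))]
        return self.ccca(torch.cat(bands, dim=1))  # (N, 4C, H/2, W/2)
```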

22 pages, 9723 KiB  
Article
AFENet: An Attention-Focused Feature Enhancement Network for the Efficient Semantic Segmentation of Remote Sensing Images
by Jiarui Li and Shuli Cheng
Remote Sens. 2024, 16(23), 4392; https://doi.org/10.3390/rs16234392 - 24 Nov 2024
Cited by 6 | Viewed by 1175
Abstract
The semantic segmentation of high-resolution remote sensing images (HRRSIs) faces persistent challenges in handling complex architectural structures and shadow occlusions, limiting the effectiveness of existing deep learning approaches. To address these limitations, we propose an attention-focused feature enhancement network (AFENet) with a novel encoder–decoder architecture. The encoder architecture combines ResNet50 with a parallel multistage feature enhancement group (PMFEG), enabling robust feature extraction through optimized channel reduction, scale expansion, and channel reassignment operations. Building upon this foundation, we develop a global multi-scale attention mechanism (GMAM) in the decoder that effectively synthesizes spatial information across multiple scales by learning comprehensive global–local relationships. The architecture is further enhanced by an efficient feature-weighted fusion module (FWFM) that systematically integrates remote spatial features with local semantic information to improve segmentation accuracy. Experimental results across diverse scenarios demonstrate that AFENet achieves superior performance in building structure detection, exhibiting enhanced segmentation connectivity and completeness compared to state-of-the-art methods.
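
The abstract does not detail the FWFM, but a gated fusion of low-level spatial and high-level semantic features in its spirit might look like the sketch below; the gating design is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureWeightedFusion(nn.Module):
    """Blend low-level spatial features with high-level semantic features
    using a learned gate in [0, 1] at each channel and position."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, low, high):
        # Upsample semantic features to the spatial resolution of `low`.
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([low, high], dim=1))
        return g * low + (1 - g) * high
```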

18 pages, 3490 KiB  
Article
MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
by Yan Wang, Li Cao and He Deng
Sensors 2024, 24(22), 7266; https://doi.org/10.3390/s24227266 - 13 Nov 2024
Cited by 3 | Viewed by 2935
Abstract
Semantic segmentation of remote sensing images is a fundamental task in computer vision, holding substantial relevance in applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have garnered considerable attention, exhibiting superior segmentation performance when compared with traditional single-modal techniques. Nonetheless, the majority of these multi-modal models, which rely on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for feature fusion, face limitations in terms of long-range modeling capability or computational complexity. This paper presents a novel Mamba-based multi-modal fusion network called MFMamba for semantic segmentation of remote sensing images. Specifically, the network employs a dual-branch encoding structure, consisting of a CNN-based main encoder for extracting local features from high-resolution remote sensing images (HRRSIs) and a Mamba-based auxiliary encoder for capturing global features from the corresponding digital surface model (DSM). To capitalize on the distinct attributes of the multi-modal remote sensing data from both branches, a feature fusion block (FFB) is designed to synergistically enhance and integrate the features extracted from the dual-branch structure at each stage. Extensive experiments on the Vaihingen and Potsdam datasets have verified the effectiveness and superiority of MFMamba in semantic segmentation of remote sensing images. Compared with state-of-the-art methods, MFMamba achieves a higher overall accuracy (OA), mean F1 score (mF1), and mean intersection over union (mIoU), while maintaining low computational complexity.
(This article belongs to the Section Remote Sensors)
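
A structural sketch of the dual-branch pattern described above. The per-stage fusion block is reduced to concatenation plus a 1×1 convolution, and the stage modules are supplied by the caller; in the paper the auxiliary stages are Mamba blocks, which are beyond a short example.

```python
import torch
import torch.nn as nn

class FFB(nn.Module):
    """Feature fusion block, reduced here to concat + 1x1 projection."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, img_feat, dsm_feat):
        return self.proj(torch.cat([img_feat, dsm_feat], dim=1))

class DualBranchEncoder(nn.Module):
    """stages_img / stages_dsm are per-stage modules supplied by the caller;
    in the paper the auxiliary stages would be Mamba blocks."""
    def __init__(self, stages_img, stages_dsm, chs):
        super().__init__()
        self.stages_img = nn.ModuleList(stages_img)   # CNN main branch (image)
        self.stages_dsm = nn.ModuleList(stages_dsm)   # auxiliary branch (DSM)
        self.fuse = nn.ModuleList(FFB(c) for c in chs)

    def forward(self, img, dsm):
        feats = []
        for s_i, s_d, f in zip(self.stages_img, self.stages_dsm, self.fuse):
            img, dsm = s_i(img), s_d(dsm)
            img = f(img, dsm)        # fused features feed the next main stage
            feats.append(img)
        return feats                 # multi-scale features for the decoder
```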

22 pages, 1329 KiB  
Article
A Scene Classification Model Based on Global-Local Features and Attention in Lie Group Space
by Chengjun Xu, Jingqian Shu, Zhenghan Wang and Jialin Wang
Remote Sens. 2024, 16(13), 2323; https://doi.org/10.3390/rs16132323 - 25 Jun 2024
Cited by 7 | Viewed by 2001
Abstract
The efficient fusion of global and local multi-scale features is quite important for remote sensing scene classification (RSSC). Scenes in high-resolution remote sensing images (HRRSI) contain complex backgrounds, high intra-class diversity, and inter-class similarities. Many studies have shown that both global and local features are helpful for RSSC. The receptive field of a traditional convolution kernel is small and fixed, making it difficult to capture global features in the scene. The self-attention mechanism proposed in the transformer effectively alleviates this shortcoming. However, such models lack local inductive bias, and their computation is expensive due to the large number of parameters. To address these problems, in this study we propose a classification model based on global-local features and attention in Lie group space. The model is mainly composed of three independent branches, which can effectively extract multi-scale features of the scene and fuse them through a fusion module. Channel attention and spatial attention are designed in the fusion module, which can effectively enhance the crucial features in crucial regions and thus improve the accuracy of scene classification. The advantage of our model is that it extracts richer features, and the global-local features of the scene can be effectively extracted at different scales. Our proposed model has been verified on publicly available and challenging datasets; taking AID as an example, it reaches a classification accuracy of 97.31% with 12.216 M parameters. Compared with other state-of-the-art models, it offers advantages in both classification accuracy and parameter count.
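
A minimal sketch of the fusion module’s two attentions, assuming a CBAM-like composition (channel attention followed by spatial attention); the Lie group feature extraction itself is not reproduced here.

```python
import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    """Concatenate branch features, then apply channel and spatial attention
    to emphasize crucial features in crucial regions. `ch` must equal the
    total channel count of the concatenated branches."""
    def __init__(self, ch):
        super().__init__()
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, *branches):
        x = torch.cat(branches, dim=1)
        x = x * self.ca(x)                                  # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.sa(pooled)                          # spatial attention
```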

19 pages, 2039 KiB  
Article
EAD-Net: Efficiently Asymmetric Network for Semantic Labeling of High-Resolution Remote Sensing Images with Dynamic Routing Mechanism
by Qiongqiong Hu, Feiting Wang and Ying Li
Remote Sens. 2024, 16(9), 1478; https://doi.org/10.3390/rs16091478 - 23 Apr 2024
Cited by 1 | Viewed by 1349
Abstract
Semantic labeling of high-resolution remote sensing images (HRRSIs) holds a significant position in the remote sensing domain. Although numerous deep-learning-based segmentation models have enhanced segmentation precision, their complexity leads to a significant increase in parameters and computational requirements. While ensuring segmentation accuracy, it is also crucial to improve segmentation speed. To address this issue, we propose an efficient asymmetric deep learning network for HRRSIs, referred to as EAD-Net. First, EAD-Net employs ResNet50 without pooling as the backbone, rather than the RepVGG block, to extract rich semantic features while reducing model complexity. Second, a dynamic routing module is proposed in EAD-Net to adjust routing based on the pixel occupancy of small-scale objects. Concurrently, a channel attention mechanism is used to preserve their features even with minimal occupancy. Third, a novel asymmetric decoder is introduced, which uses convolutional operations while discarding skip connections. This not only effectively reduces redundant features but also allows low-level image features to enhance EAD-Net’s performance. Extensive experimental results on the ISPRS 2D semantic labeling challenge benchmark demonstrate that EAD-Net achieves state-of-the-art (SOTA) accuracy while reducing model complexity and inference time, with mean Intersection over Union (mIoU) scores reaching 87.38% and 93.10% on the Vaihingen and Potsdam datasets, respectively.
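
A loose sketch of routing on pixel occupancy: a soft mask estimates how much of the image small-scale objects occupy, and that score gates a detail path against a wider-context path. The gating form is an assumption based on the abstract.

```python
import torch
import torch.nn as nn

class DynamicRouter(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.occupancy = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.detail_path = nn.Conv2d(ch, ch, 3, padding=1)   # stand-in branches
        self.context_path = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)

    def forward(self, x):
        occ = self.occupancy(x)                    # per-pixel small-object score
        gate = occ.mean(dim=(2, 3), keepdim=True)  # image-level occupancy
        return gate * self.detail_path(x) + (1 - gate) * self.context_path(x)
```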

20 pages, 18978 KiB  
Article
Dual Hybrid Attention Mechanism-Based U-Net for Building Segmentation in Remote Sensing Images
by Jingxiong Lei, Xuzhi Liu, Haolang Yang, Zeyu Zeng and Jun Feng
Appl. Sci. 2024, 14(3), 1293; https://doi.org/10.3390/app14031293 - 4 Feb 2024
Cited by 9 | Viewed by 2768
Abstract
High-resolution remote sensing images (HRRSI) have important theoretical and practical value in urban planning. However, current segmentation methods often struggle with issues like blurred edges and loss of detailed information due to the intricate backgrounds and rich semantics of high-resolution remote sensing images. To tackle these challenges, this paper proposes an end-to-end attention-based Convolutional Neural Network (CNN) called Double Hybrid Attention U-Net (DHAU-Net). We designed a new double hybrid attention structure consisting of dual-parallel hybrid attention modules to replace the skip connections in U-Net, which eliminates redundant information interference and enhances the collection and utilization of important shallow features. Comprehensive experiments on the Massachusetts remote sensing building dataset and the Inria aerial image labeling dataset demonstrate that the proposed method achieves effective pixel-level building segmentation in urban remote sensing images and improves segmentation performance without significant time costs (approximately 15%). The evaluation metrics reveal significant results, with an accuracy of 0.9808, a precision of 0.9300, an F1 score of 0.9112, a mean intersection over union (mIoU) of 0.9088, and a recall of 0.8932.
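
A compact sketch of replacing a U-Net skip connection with dual-parallel attention branches; the exact DHAU-Net module layout is not given in the abstract, so this is indicative only.

```python
import torch.nn as nn

class HybridAttentionSkip(nn.Module):
    """Filter encoder features with parallel channel and spatial attention
    before they are passed to the decoder in place of a raw skip."""
    def __init__(self, ch):
        super().__init__()
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, enc_feat):
        # Two attention branches in parallel; their outputs are averaged.
        return 0.5 * (enc_feat * self.ca(enc_feat) + enc_feat * self.sa(enc_feat))
```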

15 pages, 4440 KiB  
Technical Note
Multi-Feature Dynamic Fusion Cross-Domain Scene Classification Model Based on Lie Group Space
by Chengjun Xu, Jingqian Shu and Guobin Zhu
Remote Sens. 2023, 15(19), 4790; https://doi.org/10.3390/rs15194790 - 30 Sep 2023
Cited by 6 | Viewed by 1577
Abstract
To address the problem of the expensive and time-consuming annotation of high-resolution remote sensing images (HRRSIs), scholars have proposed cross-domain scene classification models, which can utilize learned knowledge to classify unlabeled data samples. Due to the significant distribution difference between a source domain (training sample set) and a target domain (test sample set), scholars have proposed domain adaptation models based on deep learning to reduce these differences. However, existing models have the following shortcomings: (1) insufficient learning of feature information, resulting in feature loss and restricting the spatial extent of domain-invariant features; (2) models easily focus on background feature information, resulting in negative transfer; (3) the relationship between the marginal distribution and the conditional distribution is not fully considered, and the weight parameters between them are set manually, which is time-consuming and may fall into a local optimum. To address these problems, this study proposes a novel remote sensing cross-domain scene classification model based on Lie group spatial attention and adaptive multi-feature distribution. Concretely, the model first introduces Lie group feature learning and maps the samples to the Lie group manifold space. By learning features of different levels and scales and fusing them, richer features are obtained and the spatial scope of domain-invariant features is expanded. In addition, we design an attention mechanism based on dynamic feature fusion alignment, which effectively enhances the weight of key regions and dynamically balances the importance of the marginal and conditional distributions. Extensive experiments are conducted on three publicly available and challenging datasets, and the experimental results show the advantages of our proposed method over other state-of-the-art deep domain adaptation methods.
(This article belongs to the Special Issue Deep Learning Techniques Applied in Remote Sensing)
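
A sketch of dynamically balancing the marginal and conditional alignment terms, using a simple linear-kernel MMD for both and re-estimating the balance factor from the two losses at each step; this is one plausible reading of the abstract, not the paper’s exact rule.

```python
import torch

def linear_mmd(f_src, f_tgt):
    """Linear-kernel maximum mean discrepancy between two feature batches."""
    return (f_src.mean(0) - f_tgt.mean(0)).pow(2).sum()

def adaptation_loss(f_src, f_tgt, y_src, y_tgt_pseudo, num_classes):
    marginal = linear_mmd(f_src, f_tgt)
    conditional = f_src.new_zeros(())
    for c in range(num_classes):
        s, t = f_src[y_src == c], f_tgt[y_tgt_pseudo == c]
        if len(s) and len(t):
            conditional = conditional + linear_mmd(s, t)
    # Re-estimate the balance each step: the larger gap gets more weight,
    # replacing a hand-tuned trade-off parameter.
    mu = marginal.detach() / (marginal.detach() + conditional.detach() + 1e-8)
    return mu * marginal + (1 - mu) * conditional
```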

16 pages, 14094 KiB  
Article
Remote Sensing Image Compression Based on the Multiple Prior Information
by Chuan Fu and Bo Du
Remote Sens. 2023, 15(8), 2211; https://doi.org/10.3390/rs15082211 - 21 Apr 2023
Cited by 18 | Viewed by 3505
Abstract
Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both forms of redundancy and thereby improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network: the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than traditional and learned image compression algorithms such as the Joint Photographic Experts Group (JPEG) standard and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than that of JPEG2000.
(This article belongs to the Special Issue AI-Based Obstacle Detection and Avoidance in Remote Sensing Images)
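
The three-stage strategy can be expressed as staged parameter freezing. In the sketch below, the sub-module names (encoder, decoder, hyperprior) and the train_fn callback are placeholders, not the paper’s API.

```python
def three_stage_refinement(model, train_fn, epochs=(100, 20, 20)):
    """`model` is assumed to expose encoder/decoder/hyperprior sub-modules,
    and `train_fn(model, n)` to run n epochs of rate-distortion training;
    both are hypothetical names for illustration."""
    def set_trainable(module, flag):
        for p in module.parameters():
            p.requires_grad = flag

    # Stage 1: train the entire network end to end.
    set_trainable(model, True)
    train_fn(model, epochs[0])

    # Stage 2: fix the transforms, refine the mixed hyperprior entropy model.
    set_trainable(model.encoder, False)
    set_trainable(model.decoder, False)
    train_fn(model, epochs[1])

    # Stage 3: fix the entropy model, fine-tune the transforms to shrink
    # the train/test mismatch.
    set_trainable(model.encoder, True)
    set_trainable(model.decoder, True)
    set_trainable(model.hyperprior, False)
    train_fn(model, epochs[2])
```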

18 pages, 13436 KiB  
Article
Context-Driven Feature-Focusing Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Xiaowei Tan, Zhifeng Xiao, Yanru Zhang, Zhenjiang Wang, Xiaole Qi and Deren Li
Remote Sens. 2023, 15(5), 1348; https://doi.org/10.3390/rs15051348 - 28 Feb 2023
Cited by 4 | Viewed by 2279
Abstract
High-resolution remote sensing images (HRRSIs) cover a broad range of geographic regions and contain a wide variety of artificial objects and natural elements at various scales that comprise different image contexts. In semantic segmentation tasks based on deep convolutional neural networks (DCNNs), features of different resolutions are not equally effective for extracting ground objects of different scales. In this article, we propose a novel context-driven feature-focusing network (CFFNet) aimed at focusing on multi-scale ground objects in fused features of different resolutions. The CFFNet consists of three components: a depth-residual encoder, a context-driven feature-focusing (CFF) decoder, and a classifier. First, features with different resolutions are extracted using the depth-residual encoder to construct a feature pyramid. The multi-scale information in the fused features is then extracted using the feature-focusing (FF) module in the CFF decoder, and the focus weights of different scale features are computed adaptively using the context-focusing (CF) module to obtain the weighted multi-scale fused feature representation. Finally, the segmentation results are obtained using the classifier. The experiments are conducted on the public LoveDA and GID datasets. Quantitative and qualitative analyses against state-of-the-art (SOTA) segmentation baselines demonstrate the rationality and effectiveness of the proposed approach.
(This article belongs to the Section Remote Sensing Image Processing)
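
One way to realize adaptive focus weights over pyramid scales, following the abstract loosely: a global context vector produces softmax weights, one per scale, which fuse the resized features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFocus(nn.Module):
    """Adaptively weight pyramid levels; all levels share channel count C."""
    def __init__(self, ch, num_scales):
        super().__init__()
        self.weights = nn.Linear(ch, num_scales)

    def forward(self, feats):
        # feats: list of (N, C, Hi, Wi) maps from the pyramid.
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear",
                               align_corners=False) for f in feats]
        stack = torch.stack(feats, dim=1)        # (N, S, C, H, W)
        ctx = stack.mean(dim=(1, 3, 4))          # global context vector (N, C)
        w = F.softmax(self.weights(ctx), dim=1)  # one focus weight per scale
        return (stack * w[:, :, None, None, None]).sum(dim=1)
```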

22 pages, 8434 KiB  
Article
Multi-Branch Adaptive Hard Region Mining Network for Urban Scene Parsing of High-Resolution Remote-Sensing Images
by Haiwei Bai, Jian Cheng, Yanzhou Su, Qi Wang, Haoran Han and Yijie Zhang
Remote Sens. 2022, 14(21), 5527; https://doi.org/10.3390/rs14215527 - 2 Nov 2022
Cited by 6 | Viewed by 1880
Abstract
Scene parsing of high-resolution remote-sensing images (HRRSIs) refers to parsing different semantic regions from the images, which is an important fundamental task in image understanding. However, due to the inherent complexity of urban scenes, HRRSIs contain numerous object classes. These objects exhibit large-scale variation and irregular morphological structures, and their spatial distribution is uneven and rich in spatial detail. All these features make it difficult to parse urban scenes accurately. To deal with these challenges, in this paper we propose a multi-branch adaptive hard region mining network (MBANet) for urban scene parsing of HRRSIs. MBANet consists of three branches, namely, a multi-scale semantic branch, an adaptive hard region mining (AHRM) branch, and an edge branch. First, the multi-scale semantic branch is constructed based on a feature pyramid network (FPN). To reduce the memory footprint, ResNet50 is chosen as the backbone, which, combined with the atrous spatial pyramid pooling module, can extract rich multi-scale contextual information effectively, thereby enhancing object representation at various scales. Second, the AHRM branch is proposed to enhance the feature representation of hard regions with complex distributions, which would otherwise be difficult to parse. Third, the edge-extraction branch is introduced to supervise boundary-perception training so that object contours can be better captured. In our experiments, the three branches complemented each other in feature extraction and demonstrated state-of-the-art performance for urban scene parsing of HRRSIs. We also performed ablation studies on two HRRSI datasets from ISPRS and compared MBANet with other methods.
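
The abstract names ResNet50 combined with atrous spatial pyramid pooling; for reference, a standard DeepLab-style ASPP block is sketched below (the generic component, not necessarily MBANet’s exact configuration).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel dilated convolutions plus image-level pooling, then a 1x1
    projection, in the usual DeepLab arrangement."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        ys = [b(x) for b in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(ys + [pooled], dim=1))
```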

19 pages, 1347 KiB  
Article
SERNet: Squeeze and Excitation Residual Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Xiaoyan Zhang, Linhui Li, Donglin Di, Jian Wang, Guangsheng Chen, Weipeng Jing and Mahmoud Emam
Remote Sens. 2022, 14(19), 4770; https://doi.org/10.3390/rs14194770 - 23 Sep 2022
Cited by 33 | Viewed by 5223
Abstract
The semantic segmentation of high-resolution remote sensing images (HRRSIs) is a basic task for remote sensing image processing and has a wide range of applications. However, the abundant texture information and wide imaging range of HRRSIs lead to complex distributions of ground objects and unclear boundaries, which bring huge challenges to the segmentation of HRRSIs. To solve this problem, in this paper we propose an improved squeeze and excitation residual network (SERNet), which integrates several squeeze and excitation residual modules (SERMs) and a refine attention module (RAM). The SERM can recalibrate feature responses adaptively by modeling the long-range dependencies in the channel and spatial dimensions, which enables effective information to be transmitted between the shallow and deep layers. The RAM attends to global features that are beneficial to segmentation results. Furthermore, the ISPRS datasets were processed to focus on the segmentation of vegetation categories, and Digital Surface Model (DSM) images were introduced to learn and integrate features and improve the segmentation accuracy of surface vegetation, which holds promise for forestry applications. We conducted a set of comparative experiments on the ISPRS Vaihingen and Potsdam datasets. The results verify the superior performance of the proposed SERNet.
(This article belongs to the Special Issue Remote Sensing and Smart Forestry)
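
The SERM’s channel recalibration follows the squeeze-and-excitation recipe; a minimal SE residual block captures that core mechanism (the spatial-dimension modeling mentioned in the abstract is omitted).

```python
import torch.nn as nn

class SEResidual(nn.Module):
    """Residual block whose output is recalibrated channel-wise by a
    squeeze-and-excitation gate; assumes ch >= r."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        return x + y * self.se(y)   # recalibrated residual
```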

18 pages, 4094 KiB  
Article
Region-Enhancing Network for Semantic Segmentation of Remote-Sensing Imagery
by Bo Zhong, Jiang Du, Minghao Liu, Aixia Yang and Junjun Wu
Sensors 2021, 21(21), 7316; https://doi.org/10.3390/s21217316 - 3 Nov 2021
Cited by 1 | Viewed by 2155
Abstract
Semantic segmentation for high-resolution remote-sensing imagery (HRRSI) has become increasingly popular in machine vision in recent years. Most state-of-the-art methods for semantic segmentation of HRRSI emphasize the strong learning ability of deep convolutional neural networks to model the contextual relationships in the image; however, they place excessive emphasis on every pixel, which can cause over-learning. Annotation errors and easily confused features can also lead to class confusion with pixel-based methods. Therefore, we propose a new semantic segmentation network, the region-enhancing network (RE-Net), which emphasizes regional information instead of pixels to address these problems. RE-Net introduces regional information into the base network to enhance the regional integrity of images and thus reduce misclassification. Specifically, the regional context learning procedure (RCLP) learns context relationships from the perspective of regions. The region correcting procedure (RCP) uses the pixel aggregation feature to recalibrate the pixel features in each region. In addition, a simple intra-network multi-scale attention module is introduced to select features at different scales according to region size. A large number of comparative experiments on four public datasets demonstrate that the proposed RE-Net performs better than most state-of-the-art methods.
(This article belongs to the Section Remote Sensors)
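
A sketch of the region-correcting idea: average the pixel features within each region, then blend the region mean back into the pixels it covers. The region labels would come from a prior partition (e.g., superpixels), and the blending scheme here is illustrative only.

```python
import torch

def region_recalibrate(feat, regions, num_regions, alpha=0.5):
    """feat: (C, H, W) features; regions: (H, W) integer region labels.
    Returns features blended with their per-region mean."""
    c, h, w = feat.shape
    flat = feat.reshape(c, -1)                         # (C, H*W)
    idx = regions.reshape(-1)                          # (H*W,)
    sums = torch.zeros(c, num_regions, device=feat.device)
    sums.index_add_(1, idx, flat)                      # per-region feature sums
    counts = torch.bincount(idx, minlength=num_regions).clamp(min=1)
    means = sums / counts                              # per-region mean feature
    blended = (1 - alpha) * flat + alpha * means[:, idx]
    return blended.reshape(c, h, w)
```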

0 pages, 11541 KiB  
Article
RETRACTED: Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images
by Ruixi Zhu, Li Yan, Nan Mo and Yi Liu
Remote Sens. 2019, 11(17), 1996; https://doi.org/10.3390/rs11171996 - 23 Aug 2019
Cited by 48 | Viewed by 6284 | Retraction
Abstract
Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI thanks to advances in graphics processing units (GPUs). However, they tend to extract features from whole images rather than from discriminative regions. The visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts: attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and a center-based cross-entropy loss function. First, we use the attention maps generated by Grad-CAM as an explicit input in order to force the network to concentrate on discriminative regions. Then, the deep features derived from the original images and the attention maps are fused multiplicatively, improving the ability to distinguish scenes with repeated texture while preserving salient regions. Finally, a center-based cross-entropy loss function that combines the cross-entropy and center losses is used to backpropagate the fused features and reduce the effect of intra-class diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets for scene classification. The experiments confirm that the proposed method outperforms most competitive scene classification methods with an average overall accuracy of 94% under different training ratios.
(This article belongs to the Section Remote Sensing Image Processing)
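
The center-based cross-entropy loss can be sketched as cross-entropy plus a center loss over the fused features. Here the class centers are learned by autograd, a common simplification of the original center-loss update rule; the weighting lam is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterCrossEntropy(nn.Module):
    """Cross-entropy plus a center loss that pulls each sample's fused
    feature toward its class center."""
    def __init__(self, num_classes, feat_dim, lam=0.01):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.lam = lam

    def forward(self, logits, features, labels):
        ce = F.cross_entropy(logits, labels)
        center = (features - self.centers[labels]).pow(2).sum(1).mean()
        return ce + self.lam * center
```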

23 pages, 10782 KiB  
Article
Class-Specific Anchor Based and Context-Guided Multi-Class Object Detection in High Resolution Remote Sensing Imagery with a Convolutional Neural Network
by Nan Mo, Li Yan, Ruixi Zhu and Hong Xie
Remote Sens. 2019, 11(3), 272; https://doi.org/10.3390/rs11030272 - 30 Jan 2019
Cited by 23 | Viewed by 5059
Abstract
In this paper, the problem of multi-scale geospatial object detection in High Resolution Remote Sensing Images (HRRSI) is tackled. The different flight heights, shooting angles, and sizes of geographic objects in HRRSI lead to large scale variance among geographic objects. Inappropriate anchor sizes for proposing objects and insufficiently discriminative features for describing them are the main causes of missed and false detections in multi-scale geographic object detection. To address these challenges, we propose a class-specific anchor-based and context-guided multi-class object detection method with a convolutional neural network (CNN), which can be divided into two parts: a class-specific anchor-based region proposal network (RPN) and a classification network using discriminative features with context information. A class-specific anchor block, providing better initial values for the RPN, is proposed to generate anchors of the most suitable scale for each category in order to increase the recall ratio. Meanwhile, we propose to incorporate context information into the original convolutional features to improve their discriminative ability and increase classification accuracy. Considering the quality of samples for classification, a soft filter is proposed to select effective boxes, improving the diversity of samples for the classifier and avoiding missed or false detections to some extent. We also introduce focal loss to improve the classifier’s handling of hard samples. The proposed method is tested on a benchmark dataset of ten classes to demonstrate its superiority. It outperforms several state-of-the-art methods with a mean average precision (mAP) of 90.4% and better detects multi-scale objects, especially when objects show minor shape changes.
(This article belongs to the Special Issue Recent Advances in Neural Networks for Remote Sensing)
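
Class-specific anchors can be seeded from training-set statistics; the sketch below uses the mean geometric scale of each category’s boxes. Whether the paper derives scales from means, clustering, or another statistic is not stated in the abstract.

```python
from collections import defaultdict

def class_anchor_scales(boxes):
    """boxes: iterable of (class_id, w, h) tuples from the training set.
    Returns a per-class anchor scale to seed the RPN for that category."""
    sizes = defaultdict(list)
    for cls, w, h in boxes:
        sizes[cls].append((w * h) ** 0.5)   # geometric scale of the box
    return {cls: sum(s) / len(s) for cls, s in sizes.items()}

# Example: anchors for class 3 would be generated around
# class_anchor_scales(training_boxes)[3], where training_boxes is hypothetical.
```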
