Search Results (41)

Search Parameters:
Keywords = Massachusetts buildings dataset

21 pages, 3538 KiB  
Article
MFFP-Net: Building Segmentation in Remote Sensing Images via Multi-Scale Feature Fusion and Foreground Perception Enhancement
by Huajie Xu, Qiukai Huang, Haikun Liao, Ganxiao Nong and Wei Wei
Remote Sens. 2025, 17(11), 1875; https://doi.org/10.3390/rs17111875 - 28 May 2025
Viewed by 541
Abstract
The accurate segmentation of small target buildings in high-resolution remote sensing images remains challenging due to two critical issues: (1) small target buildings often occupy few pixels in complex backgrounds, leading to frequent background confusion, and (2) significant intra-class variance complicates feature representation compared to conventional semantic segmentation tasks. To address these challenges, we propose a novel Multi-Scale Feature Fusion and Foreground Perception Enhancement Network (MFFP-Net). This framework introduces three key innovations: (1) a Multi-Scale Feature Fusion (MFF) module that hierarchically aggregates shallow features through cross-level connections to enhance fine-grained detail preservation, (2) a Foreground Perception Enhancement (FPE) module that establishes pixel-wise affinity relationships within foreground regions to mitigate intra-class variance effects, and (3) a Dual-Path Attention (DPA) mechanism combining parallel global and local attention pathways to jointly capture structural details and long-range contextual dependencies. Experimental results demonstrate that the proposed method improves IoU by 0.44%, 0.98%, and 0.61% over mainstream state-of-the-art methods on the WHU Building, Massachusetts Building, and Inria Aerial Image Labeling datasets, respectively, validating its effectiveness in handling small targets and intra-class variance while maintaining robustness in complex scenarios.
(This article belongs to the Section AI Remote Sensing)
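
The dual-path idea in the abstract above can be illustrated with a minimal PyTorch sketch: a global self-attention path and a local depthwise-convolution path run in parallel and are fused by residual addition. The class name, head count, and fusion rule are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of a dual-path attention block: global self-attention
# plus a local depthwise-conv path, fused by residual addition.
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.global_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.local_path = nn.Sequential(  # local detail path
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        g, _ = self.global_attn(tokens, tokens, tokens)    # long-range context
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return x + g + self.local_path(x)                  # fuse both paths

print(DualPathAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # (1, 64, 32, 32)
```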

22 pages, 9294 KiB  
Article
Deep Layered Network Based on Rotation Operation and Residual Transform for Building Segmentation from Remote Sensing Images
by Shuzhe Zhang, Taoyi Chen, Fei Su, Hao Xu, Yan Li and Yaohui Liu
Sensors 2025, 25(8), 2608; https://doi.org/10.3390/s25082608 - 20 Apr 2025
Cited by 1 | Viewed by 446
Abstract
Deep learning has been widely applied to building segmentation from high-resolution remote sensing (HRS) images. However, existing methods struggle to capture complementary detail-level and global information when representing targets in HRS images. To this end, we propose a novel building segmentation model for HRS images, termed C_ASegformer. Specifically, we design a Deep Layered Enhanced Fusion (DLEF) module to integrate hierarchical information from different receptive fields, thereby enhancing the feature representation capability of HRS information from global to detailed levels. Additionally, we introduce a Triplet Attention (TA) module, which establishes dependency relationships between buildings and the environment through multi-directional rotation operations and residual transformations. Furthermore, we propose a Multi-Level Dilated Connection (MDC) module to efficiently capture contextual relationships across different scales at a low computational cost. We conduct comparative experiments with several state-of-the-art models on three datasets: the Massachusetts dataset, the INRIA dataset, and the WHU dataset. On the Massachusetts dataset, C_ASegformer achieves 95.42%, 85.69%, and 75.46% in OA, F1-score, and mIoU, respectively, demonstrating the validity and effectiveness of the model.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Remote Sensing)
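
The rotation-based triplet attention described above can be sketched in the style of the original triplet attention design: three branches permute ("rotate") the tensor so a cheap two-channel gate can attend across the (H,W), (C,W), and (C,H) planes, and the outputs are averaged. Kernel size and averaging are assumptions.

```python
# A sketch of triplet attention via permutation and a 2-channel gate.
import torch
import torch.nn as nn

class AttnGate(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        # "Z-pool": stack max- and mean-pooled maps along dim 1, then gate
        z = torch.cat([x.max(1, keepdim=True)[0], x.mean(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(z))

class TripletAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.g_hw, self.g_cw, self.g_ch = AttnGate(), AttnGate(), AttnGate()

    def forward(self, x):  # (B, C, H, W)
        y1 = self.g_hw(x)                                          # over (H, W)
        y2 = self.g_cw(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)  # over (C, W)
        y3 = self.g_ch(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)  # over (C, H)
        return (y1 + y2 + y3) / 3.0

print(TripletAttention()(torch.randn(1, 16, 32, 32)).shape)  # (1, 16, 32, 32)
```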

17 pages, 6005 KiB  
Article
FEPA-Net: A Building Extraction Network Based on Fusing the Feature Extraction and Position Attention Module
by Yuexin Liu, Yulin Duan, Xuya Zhang, Wen Zhang and Chang Wang
Appl. Sci. 2025, 15(8), 4432; https://doi.org/10.3390/app15084432 - 17 Apr 2025
Cited by 1 | Viewed by 329
Abstract
The extraction of buildings from remote sensing images is of crucial significance in urban management and planning, but automatically extracting buildings with precise boundaries remains difficult. In this paper, we propose FEPA-Net, a network model that integrates a feature extraction module and a position attention module for building extraction in remote sensing images, using U-Net as the base model. First, the number of convolutional operations is increased to extract more abstract features of ground objects; second, ordinary convolutions are replaced with dilated convolutions to broaden the receptive field, so that the output of each convolutional layer incorporates a wider range of feature information. Additionally, a feature extraction module is added to mitigate the loss of detailed features, and a position attention module is introduced to capture more contextual information. The model is validated and analyzed on the Massachusetts dataset and the WHU dataset. The experimental results demonstrate that FEPA-Net outperforms the comparison methods in quantitative evaluation: relative to U-Net, the mean intersection over union on the two datasets improves by 1.41% and 1.43%, respectively. The results show that FEPA-Net effectively improves the accuracy of building extraction, reduces false detections and omissions, and identifies building outlines more clearly.
(This article belongs to the Section Computing and Artificial Intelligence)
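
The dilated-convolution substitution mentioned in the abstract can be shown in a few lines of PyTorch: dilation d lets a 3x3 kernel cover a (2d+1)x(2d+1) window at the same parameter count, and padding=d keeps the spatial size. Channel counts here are arbitrary.

```python
# Ordinary vs. dilated 3x3 convolution: same parameters, wider window.
import torch
import torch.nn as nn

ordinary = nn.Conv2d(64, 64, kernel_size=3, padding=1)             # 3x3 window
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 window

x = torch.randn(1, 64, 128, 128)
assert ordinary(x).shape == dilated(x).shape  # same resolution, wider context
```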

25 pages, 2849 KiB  
Article
Enhanced Hybrid U-Net Framework for Sophisticated Building Automation Extraction Utilizing Decay Matrix
by Ting Wang, Zhuyi Gong, Anqi Tang, Qian Zhang and Yun Ge
Buildings 2024, 14(11), 3353; https://doi.org/10.3390/buildings14113353 - 23 Oct 2024
Viewed by 1246
Abstract
Automatically extracting buildings from remote sensing imagery using deep learning techniques has become essential for various real-world applications. However, mainstream methods often encounter difficulties in accurately extracting and reconstructing fine-grained features due to the heterogeneity and scale variations in building appearances. To address these challenges, we propose LDFormer, an advanced building segmentation model based on linear decay. LDFormer introduces a multi-scale detail fusion bridge (MDFB), which dynamically integrates shallow features to enhance the representation of local details and capture fine-grained local features effectively. To improve global feature extraction, the model incorporates linear decay self-attention (LDSA) and depthwise large separable kernel multi-layer perceptron (DWLSK-MLP) optimizations in the decoder. Specifically, LDSA employs a linear decay matrix within the self-attention mechanism to address long-distance dependency issues, while DWLSK-MLP utilizes step-wise convolutions to achieve a large receptive field. The proposed method has been evaluated on the Massachusetts, Inria, and WHU building datasets, achieving IoU scores of 76.10%, 82.87%, and 91.86%, respectively. LDFormer demonstrates superior performance compared to existing state-of-the-art methods in building segmentation tasks, showcasing its significant potential for automated building extraction.
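
One plausible reading of the "linear decay matrix" in LDSA is an attention bias that grows with token distance, so far-apart tokens are progressively down-weighted before the softmax. The sketch below, including the slope value, is an assumption for illustration, not the paper's formulation.

```python
# Self-attention with a distance-proportional penalty on the logits.
import torch

def linear_decay_attention(q, k, v, slope: float = 0.05):
    # q, k, v: (B, N, D)
    n, d = q.shape[1], q.shape[2]
    logits = q @ k.transpose(1, 2) / d ** 0.5                    # (B, N, N)
    idx = torch.arange(n)
    decay = slope * (idx[None, :] - idx[:, None]).abs().float()  # |i - j|
    return torch.softmax(logits - decay, dim=-1) @ v

q = k = v = torch.randn(2, 16, 32)
print(linear_decay_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```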

23 pages, 7912 KiB  
Article
Asymmetric Network Combining CNN and Transformer for Building Extraction from Remote Sensing Images
by Junhao Chang, Yuefeng Cen and Gang Cen
Sensors 2024, 24(19), 6198; https://doi.org/10.3390/s24196198 - 25 Sep 2024
Cited by 7 | Viewed by 2081
Abstract
The accurate extraction of buildings from remote sensing images is crucial in fields such as 3D urban planning, disaster detection, and military reconnaissance. In recent years, models based on the Transformer have performed well in global information processing and contextual relationship modeling, but they suffer from high computational costs and an insufficient ability to capture local information. In contrast, convolutional neural networks (CNNs) are very effective at extracting local features but have a limited ability to process global information. In this paper, an asymmetric network (CTANet), which combines the advantages of CNN and Transformer, is proposed to achieve efficient extraction of buildings. Specifically, CTANet employs ConvNeXt as an encoder to extract features and pairs it with an efficient bilateral hybrid attention transformer (BHAFormer) designed as the decoder. The BHAFormer establishes global dependencies from both texture edge features and background information perspectives to extract buildings more accurately while maintaining a low computational cost. Additionally, a multiscale mixed attention mechanism module (MSM-AMM) is introduced to learn the multiscale semantic information and channel representations of the encoder features, reducing noise interference and compensating for the information lost in the downsampling process. Experimental results show that the proposed model achieves the best F1-score (86.7%, 95.74%, and 90.52%) and IoU (76.52%, 91.84%, and 82.68%) compared to other state-of-the-art methods on the Massachusetts building dataset, the WHU building dataset, and the Inria aerial image labeling dataset.
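
A toy version of the asymmetric pattern the abstract describes is sketched below: a small convolutional encoder (standing in for ConvNeXt) extracts local features, and a transformer layer mixes them globally before a one-channel building-mask head. All layer choices are illustrative assumptions.

```python
# Asymmetric CNN-encoder / transformer-decoder sketch for segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricSegNet(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(  # local features, 4x downsampling
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.GELU())
        self.mixer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=4, dim_feedforward=2 * channels,
            batch_first=True)
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        f = self.encoder(x)                           # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        t = self.mixer(f.flatten(2).transpose(1, 2))  # global mixing
        f = t.transpose(1, 2).reshape(b, c, h, w)
        return F.interpolate(self.head(f), scale_factor=4, mode="bilinear")

print(AsymmetricSegNet()(torch.randn(1, 3, 128, 128)).shape)  # (1, 1, 128, 128)
```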

24 pages, 9004 KiB  
Article
NPSFF-Net: Enhanced Building Segmentation in Remote Sensing Images via Novel Pseudo-Siamese Feature Fusion
by Ningbo Guo, Mingyong Jiang, Xiaoyu Hu, Zhijuan Su, Weibin Zhang, Ruibo Li and Jiancheng Luo
Remote Sens. 2024, 16(17), 3266; https://doi.org/10.3390/rs16173266 - 3 Sep 2024
Cited by 5 | Viewed by 1529
Abstract
Building segmentation has extensive research value and application prospects in high-resolution remote sensing image (HRSI) processing. However, complex architectural contexts, varied building morphologies, and non-building occlusions make building segmentation challenging. Compared with traditional methods, deep learning-based methods present certain advantages in terms of accuracy and intelligence. At present, the most popular option is to first apply a single neural network to encode an HRSI, then perform decoding through up-sampling or transposed convolution, and finally obtain the segmented building image with the help of a loss function. Although effective, this approach tends to lose detail information and fails to fully utilize contextual features. As an alternative, we propose a novel network called NPSFF-Net. First, using an improved pseudo-Siamese network composed of ResNet-34 and ResNet-50, two sets of deep semantic features of buildings are extracted with the support of transfer learning, and four encoded features at different scales are obtained after fusion. Then, information from the deepest encoded feature is enriched using a feature enhancement module, and the resolutions are recovered via skip connections and transposed convolutions. Finally, the discriminative features of buildings are obtained using the designed feature fusion algorithm, and the optimal segmentation model is obtained by fitting a cross-entropy loss function. Our method obtained intersection-over-union values of 89.45% on the Aerial Imagery Dataset, 71.88% on the Massachusetts Buildings Dataset, and 68.72% on Satellite Dataset I.
(This article belongs to the Section Remote Sensing Image Processing)
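
The pseudo-Siamese idea can be sketched by feeding the same image to two different backbones and projecting their deepest features to a common width before fusion. Additive fusion and the width are assumptions; the paper fuses four scales, which this sketch omits.

```python
# Pseudo-Siamese encoder: two different backbones, one fused feature.
import torch
import torch.nn as nn
from torchvision.models import resnet34, resnet50

class PseudoSiameseEncoder(nn.Module):
    def __init__(self, width: int = 256):
        super().__init__()
        self.a = nn.Sequential(*list(resnet34(weights=None).children())[:-2])
        self.b = nn.Sequential(*list(resnet50(weights=None).children())[:-2])
        self.proj_a = nn.Conv2d(512, width, 1)   # ResNet-34 final width
        self.proj_b = nn.Conv2d(2048, width, 1)  # ResNet-50 final width

    def forward(self, x):
        return self.proj_a(self.a(x)) + self.proj_b(self.b(x))

print(PseudoSiameseEncoder()(torch.randn(1, 3, 224, 224)).shape)  # (1, 256, 7, 7)
```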

22 pages, 3918 KiB  
Article
A Prior-Guided Dual Branch Multi-Feature Fusion Network for Building Segmentation in Remote Sensing Images
by Yingbin Wu, Peng Zhao, Fubo Wang, Mingquan Zhou, Shengling Geng and Dan Zhang
Buildings 2024, 14(7), 2006; https://doi.org/10.3390/buildings14072006 - 2 Jul 2024
Cited by 1 | Viewed by 1410
Abstract
The domain of remote sensing image processing has witnessed remarkable advancements in recent years, with deep convolutional neural networks (CNNs) establishing themselves as a prominent approach for building segmentation. Despite the progress, traditional CNNs, which rely on convolution and pooling for feature extraction during the encoding phase, often fail to precisely delineate global pixel interactions, potentially leading to the loss of vital semantic details. Moreover, conventional CNN-based segmentation models frequently neglect the nuanced semantic differences between shallow and deep features during the decoding phase, which can result in subpar feature integration through rudimentary addition or concatenation techniques. Additionally, the unique boundary characteristics of buildings in remote sensing images, which offer a rich vein of prior information, have not been fully harnessed by traditional CNNs. This paper introduces an innovative approach to building segmentation in remote sensing images through a prior-guided dual branch multi-feature fusion network (PDBMFN). The network is composed of a prior-guided branch network (PBN) in the encoding process, a parallel dilated convolution module (PDCM) designed to incorporate prior information, and a multi-feature aggregation module (MAM) in the decoding process. The PBN leverages prior region and edge information derived from superpixels and edge maps to enhance edge detection accuracy during the encoding phase. The PDCM integrates features from both branches and applies dilated convolution across various scales to expand the receptive field and capture a more comprehensive semantic context. During the decoding phase, the MAM utilizes deep semantic information to direct the fusion of features, thereby optimizing segmentation efficacy. Through a sequence of aggregations, the MAM gradually merges deep and shallow semantic information, culminating in a more enriched and holistic feature representation. Extensive experiments are conducted across diverse datasets, such as WHU, Inria Aerial, and Massachusetts, revealing that PDBMFN outperforms other sophisticated methods in terms of segmentation accuracy. In the key segmentation metrics, including mIoU, precision, recall, and F1 score, PDBMFN shows a marked superiority over contemporary techniques. The ablation studies further substantiate the performance improvements conferred by the PBN’s prior information guidance and the efficacy of the PDCM and MAM modules.
(This article belongs to the Section Construction Management, and Computers & Digitization)
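
The PDCM's core idea, parallel dilated convolutions at several rates whose outputs are merged, resembles an ASPP-style block. Below is a minimal sketch with assumed dilation rates and a 1x1 projection.

```python
# Parallel dilated convolutions, concatenated and projected back.
import torch
import torch.nn as nn

class ParallelDilatedConv(nn.Module):
    def __init__(self, channels: int, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
             for r in rates])
        self.project = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

print(ParallelDilatedConv(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```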

21 pages, 9059 KiB  
Article
EUNet: Edge-UNet for Accurate Building Extraction and Edge Emphasis in Gaofen-7 Images
by Ruijie Han, Xiangtao Fan and Jian Liu
Remote Sens. 2024, 16(13), 2397; https://doi.org/10.3390/rs16132397 - 29 Jun 2024
Cited by 2 | Viewed by 2584
Abstract
Deep learning is currently the mainstream approach for building extraction from remote-sensing imagery, capable of automatically learning building features and yielding satisfactory extraction results. However, due to the diverse sizes, irregular layouts, and complex spatial relationships of buildings, extracted buildings often suffer from incompleteness and boundary issues. Gaofen-7 (GF-7), as a high-resolution stereo mapping satellite, provides well-rectified images from its rear-view imagery, which helps mitigate occlusions in highly varied terrain and thereby offers rich information for building extraction. To improve the integrity of building edges in the extraction results, this paper proposes a dual-task network (Edge-UNet, EUNet) based on UNet, incorporating an edge extraction branch to emphasize edge information while predicting building targets. We evaluate this method using a self-built GF-7 building dataset, the Wuhan University (WHU) Building Dataset, and the Massachusetts Buildings Dataset. Comparative analysis with other mainstream semantic segmentation networks reveals significantly higher F1 scores for our method's extraction results. Our method exhibits superior completeness and accuracy in building edge extraction compared to unmodified algorithms, demonstrating robust performance.
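
The dual-task arrangement can be sketched as two lightweight heads on one shared decoder feature, with the edge branch supervised alongside the mask branch. The 0.5 edge-loss weight and head shapes are assumptions.

```python
# Dual-task heads: building mask and building edge, jointly supervised.
import torch
import torch.nn as nn

class DualTaskHead(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, 1)
        self.edge_head = nn.Conv2d(channels, 1, 1)

    def forward(self, feat):
        return self.mask_head(feat), self.edge_head(feat)

bce = nn.BCEWithLogitsLoss()
feat = torch.randn(2, 64, 128, 128)
mask_gt = torch.rand(2, 1, 128, 128).round()  # toy binary targets
edge_gt = torch.rand(2, 1, 128, 128).round()
mask_pred, edge_pred = DualTaskHead()(feat)
loss = bce(mask_pred, mask_gt) + 0.5 * bce(edge_pred, edge_gt)
```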

19 pages, 59071 KiB  
Article
Road Extraction from Remote Sensing Imagery with Spatial Attention Based on Swin Transformer
by Xianhong Zhu, Xiaohui Huang, Weijia Cao, Xiaofei Yang, Yunfei Zhou and Shaokai Wang
Remote Sens. 2024, 16(7), 1183; https://doi.org/10.3390/rs16071183 - 28 Mar 2024
Cited by 9 | Viewed by 3922
Abstract
Road extraction is a crucial aspect of remote sensing imagery processing that plays a significant role in various remote sensing applications, including automatic driving, urban planning, and path navigation. However, accurate road extraction is challenging due to factors such as high road density, building occlusion, and complex traffic environments. In this study, a Spatial Attention Swin Transformer (SASwin Transformer) architecture is proposed to create a robust encoder capable of extracting roads from remote sensing imagery. In this architecture, we develop a spatial self-attention (SSA) module that captures rich spatial information to reconstruct the feature map; the module then applies a residual connection with the input, which helps reduce interference from unrelated regions. Additionally, we design a Spatial MLP (SMLP) module to aggregate spatial feature information from multiple branches while simultaneously reducing computational complexity. Two public road datasets, the Massachusetts dataset and the DeepGlobe dataset, were used for extensive experiments. The results show that our proposed model has improved overall performance compared to several state-of-the-art algorithms. In particular, on the two datasets, our model outperforms D-LinkNet with increases in Intersection over Union (IoU) of 1.88% and 1.84%, respectively.
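
A cheap stand-in for the SMLP's multi-branch spatial aggregation is to mix features along the height axis and the width axis with small linear layers and fuse them additively. The fixed input size and fusion rule are assumptions, not the paper's module.

```python
# Axis-wise spatial mixing as a lightweight alternative to attention.
import torch
import torch.nn as nn

class SpatialMLP(nn.Module):
    def __init__(self, h: int, w: int):
        super().__init__()
        self.mix_h = nn.Linear(h, h)  # mixes along the H axis
        self.mix_w = nn.Linear(w, w)  # mixes along the W axis

    def forward(self, x):  # (B, C, H, W)
        yh = self.mix_h(x.transpose(2, 3)).transpose(2, 3)
        yw = self.mix_w(x)
        return x + yh + yw  # residual fusion of both branches

print(SpatialMLP(32, 32)(torch.randn(1, 8, 32, 32)).shape)  # (1, 8, 32, 32)
```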

17 pages, 6520 KiB  
Article
Multi-Scale Feature Fusion Attention Network for Building Extraction in Remote Sensing Images
by Jia Liu, Hang Gu, Zuhe Li, Hongyang Chen and Hao Chen
Electronics 2024, 13(5), 923; https://doi.org/10.3390/electronics13050923 - 28 Feb 2024
Cited by 10 | Viewed by 2693
Abstract
The efficient semantic segmentation of buildings in high spatial resolution remote sensing images is a technical prerequisite for land resource management, high-precision mapping, construction planning, and other applications. Current building extraction methods based on deep learning can obtain high-level abstract features of images. However, the extraction of some occluded buildings is inaccurate, and as the network deepens, small-volume buildings are lost and edges are blurred. Therefore, we introduce a multi-resolution attention combination network, which employs a multiscale channel and spatial attention module (MCAM) to adaptively capture key features and eliminate irrelevant information, improving the accuracy of building extraction. In addition, we present a layered residual connectivity module (LRCM) to enhance the expression of information at different scales through multi-level feature fusion, significantly improving the understanding of context and the capture of fine edge details. Extensive experiments were conducted on the WHU aerial image dataset and the Massachusetts building dataset. Compared with state-of-the-art semantic segmentation methods, this network achieves better building extraction results in remote sensing images, proving the effectiveness of the method.
(This article belongs to the Section Artificial Intelligence)
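
A channel gate followed by a spatial gate, in the spirit of MCAM's "capture key features, suppress irrelevant ones", can be sketched as follows; the reduction ratio and the 7x7 spatial kernel are assumptions.

```python
# Sequential channel and spatial gating of a feature map.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_gate(x)  # reweight channels
        s = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
        return x * self.spatial_gate(s)  # then reweight locations

print(ChannelSpatialAttention(64)(torch.randn(1, 64, 32, 32)).shape)
```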

20 pages, 18978 KiB  
Article
Dual Hybrid Attention Mechanism-Based U-Net for Building Segmentation in Remote Sensing Images
by Jingxiong Lei, Xuzhi Liu, Haolang Yang, Zeyu Zeng and Jun Feng
Appl. Sci. 2024, 14(3), 1293; https://doi.org/10.3390/app14031293 - 4 Feb 2024
Cited by 9 | Viewed by 2786
Abstract
High-resolution remote sensing images (HRRSI) have important theoretical and practical value in urban planning. However, current segmentation methods often struggle with blurred edges and loss of detailed information due to the intricate backgrounds and rich semantics of high-resolution remote sensing images. To tackle these challenges, this paper proposes an end-to-end attention-based convolutional neural network (CNN) called Double Hybrid Attention U-Net (DHAU-Net). We design a new double hybrid attention structure consisting of dual-parallel hybrid attention modules that replace the skip connections in U-Net, which eliminates redundant information interference and enhances the collection and utilization of important shallow features. Comprehensive experiments on the Massachusetts remote sensing building dataset and the Inria aerial image labeling dataset demonstrate that the proposed method achieves effective pixel-level building segmentation in urban remote sensing images by eliminating redundant information interference and making full use of shallow features, and it improves segmentation performance without significant time costs (approximately 15%). The evaluation metrics reveal significant results, with an accuracy of 0.9808, a precision of 0.9300, an F1 score of 0.9112, a mean intersection over union (mIoU) of 0.9088, and a recall of 0.8932.
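
The skip-connection replacement can be sketched as a gate, derived from the decoder feature, that filters the encoder feature before concatenation; equal channel counts and matched resolutions are assumed, and the gate design is ours, not the paper's module.

```python
# Attention-filtered skip connection replacing a plain U-Net skip.
import torch
import torch.nn as nn

class AttentiveSkip(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, enc_feat, dec_feat):  # both (B, C, H, W)
        # suppress redundant shallow activations before concatenation
        return torch.cat([enc_feat * self.gate(dec_feat), dec_feat], dim=1)

enc, dec = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
print(AttentiveSkip(32)(enc, dec).shape)  # (1, 64, 64, 64)
```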

23 pages, 14400 KiB  
Article
A Dual-Branch Fusion Network Based on Reconstructed Transformer for Building Extraction in Remote Sensing Imagery
by Yitong Wang, Shumin Wang and Aixia Dou
Sensors 2024, 24(2), 365; https://doi.org/10.3390/s24020365 - 7 Jan 2024
Cited by 3 | Viewed by 2204
Abstract
Automatic extraction of building contours from high-resolution images is of great significance in the fields of urban planning, demographics, and disaster assessment. Network models based on convolutional neural network (CNN) and transformer technology have been widely used for semantic segmentation of buildings from high-resolution remote sensing images (HRSI). However, the fixed geometric structure and local receptive field of the convolutional kernel limit global feature extraction, while the transformer technique, with its self-attention mechanism, introduces computational redundancy and captures local feature details poorly when modeling global contextual information. In this paper, a dual-branch fused reconstructive transformer network, DFRTNet, is proposed for efficient and accurate building extraction. In the encoder, the traditional transformer is reconfigured by designing a local and global feature extraction module (LGFE); the global feature extraction (GFE) branch performs dynamic range attention (DRA) based on the idea of top-k attention to extract global features, while the local feature extraction (LFE) branch obtains fine-grained features. A multilayer perceptron (MLP) is employed to efficiently fuse the local and global features. In the decoder, a simple channel attention module (CAM) is used in the up-sampling part to enhance channel-dimension features. Our network achieved the best segmentation accuracy on both the WHU and Massachusetts building datasets when compared to other mainstream and state-of-the-art methods.
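
Top-k attention, the idea behind DRA, keeps only the k largest logits per query and masks the rest before the softmax, so each token attends to a dynamic, sparse set of positions. A minimal functional sketch, with k and the shapes assumed:

```python
# Top-k sparse attention: mask all but the k largest logits per query.
import torch

def topk_attention(q, k, v, top_k: int = 8):
    # q, k, v: (B, N, D); requires top_k <= N
    d = q.shape[-1]
    logits = q @ k.transpose(1, 2) / d ** 0.5          # (B, N, N)
    kth = logits.topk(top_k, dim=-1).values[..., -1:]  # k-th largest per row
    logits = logits.masked_fill(logits < kth, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v

q = k = v = torch.randn(2, 16, 32)
print(topk_attention(q, k, v).shape)  # torch.Size([2, 16, 32])
```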

22 pages, 41001 KiB  
Article
SDSNet: Building Extraction in High-Resolution Remote Sensing Images Using a Deep Convolutional Network with Cross-Layer Feature Information Interaction Filtering
by Xudong Wang, Mingliang Tian, Zhijun Zhang, Kang He, Sheng Wang, Yan Liu and Yusen Dong
Remote Sens. 2024, 16(1), 169; https://doi.org/10.3390/rs16010169 - 31 Dec 2023
Cited by 6 | Viewed by 2109
Abstract
Building extraction refers to the automatic identification and separation of buildings from the background in remote sensing images. It plays a significant role in urban planning, land management, and disaster monitoring. Deep-learning methods have shown advantages in building extraction, but they still face challenges such as variations in building types, object occlusions, and complex backgrounds. To address these issues, SDSNet, a deep convolutional network that incorporates global multi-scale feature extraction and cross-level feature fusion, is proposed. SDSNet consists of three modules: semantic information extraction (SIE), multi-level merge (MLM), and semantic information fusion (SIF). The SIE module extracts contextual information and improves recognition of multi-scale buildings. The MLM module filters irrelevant details guided by high-level semantic information, aiding in the restoration of edge details for buildings. The SIF module combines filtered detail information with extracted semantic information for refined building extraction. A series of experiments conducted on two distinct public datasets for building extraction consistently demonstrate that SDSNet outperforms the state-of-the-art deep-learning models for building extraction tasks. On the WHU building dataset, the overall accuracy (OA) and intersection over union (IoU) achieved impressive scores of 98.86% and 90.17%, respectively. Meanwhile, on the Massachusetts dataset, SDSNet achieved OA and IoU scores of 94.05% and 71.6%, respectively. SDSNet exhibits a unique advantage in recovering fine details along building edges, enabling automated and intelligent building extraction. This capability effectively supports urban planning, resource management, and disaster monitoring.
(This article belongs to the Section Remote Sensing Image Processing)
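
The MLM module's filtering of low-level detail under high-level guidance can be sketched as a sigmoid gate computed from the upsampled deep feature and applied to the shallow feature; the 1x1 gate and bilinear upsampling are assumptions.

```python
# Semantic-guided filtering of a shallow feature by a deep feature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedFilter(nn.Module):
    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(deep_ch, shallow_ch, 1), nn.Sigmoid())

    def forward(self, shallow, deep):
        # upsample deep semantics to the shallow resolution, then gate
        g = self.gate(F.interpolate(deep, size=shallow.shape[2:],
                                    mode="bilinear", align_corners=False))
        return shallow * g

shallow, deep = torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32)
print(SemanticGuidedFilter(256, 64)(shallow, deep).shape)  # (1, 64, 128, 128)
```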

20 pages, 7540 KiB  
Article
MFFNet: A Building Extraction Network for Multi-Source High-Resolution Remote Sensing Data
by Keliang Liu, Yantao Xi, Junrong Liu, Wangyan Zhou and Yidan Zhang
Appl. Sci. 2023, 13(24), 13067; https://doi.org/10.3390/app132413067 - 7 Dec 2023
Cited by 3 | Viewed by 1832
Abstract
The use of deep learning methods to extract buildings from remote sensing images is a key contemporary research focus, and traditional deep convolutional networks continue to exhibit limitations in this regard. This study introduces a novel multi-feature fusion network (MFFNet), with the aim of enhancing the accuracy of building extraction from high-resolution remote sensing images of various sources. MFFNet improves feature capture for building targets by integrating deep semantic information from various attention mechanisms with multi-scale spatial information from a spatial pyramid module, significantly enhancing the results of building extraction. The performance of MFFNet was tested on three datasets: the self-constructed Jilin-1 building dataset, the Massachusetts building dataset, and the WHU building dataset. Notably, experimental results on the Jilin-1 building dataset demonstrated that MFFNet achieved a mean intersection over union (mIoU) of 89.69%, an accuracy of 97.05%, a recall of 94.25%, a precision of 94.66%, and an F1 score of 94.82%. Comparisons on the other two public datasets also showed MFFNet's significant advantages over traditional deep convolutional networks. These results confirm the superiority of MFFNet in extracting buildings from different high-resolution remote sensing data compared to other network models.
(This article belongs to the Special Issue Deep Learning in Satellite Remote Sensing Applications)
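
The spatial pyramid module can be sketched in the PSP style: pool to several grid sizes, project, upsample back, and concatenate with the input, giving multi-scale context in one block. Bin sizes and the 4x channel reduction are assumptions.

```python
# PSP-style spatial pyramid pooling over a feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramid(nn.Module):
    def __init__(self, channels: int, bins=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(channels, channels // 4, 1))
            for b in bins])

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [F.interpolate(s(x), size=(h, w), mode="bilinear",
                                     align_corners=False) for s in self.stages]
        return torch.cat(feats, dim=1)  # 2x channels when len(bins) == 4

print(SpatialPyramid(64)(torch.randn(1, 64, 32, 32)).shape)  # (1, 128, 32, 32)
```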

28 pages, 6569 KiB  
Article
A Novel Building Extraction Network via Multi-Scale Foreground Modeling and Gated Boundary Refinement
by Junlin Liu, Ying Xia, Jiangfan Feng and Peng Bai
Remote Sens. 2023, 15(24), 5638; https://doi.org/10.3390/rs15245638 - 5 Dec 2023
Cited by 2 | Viewed by 1849
Abstract
Deep learning-based methods for building extraction from remote sensing images have been widely applied in fields such as land management and urban planning. However, extracting buildings from remote sensing images commonly faces challenges due to specific shooting angles. First, there exists a foreground–background imbalance issue, and the model excessively learns features unrelated to buildings, resulting in performance degradation and propagative interference. Second, buildings have complex boundary information, while conventional network architectures fail to capture fine boundaries. In this paper, we designed a multi-task U-shaped network (BFL-Net) to solve these problems. This network enhances the expression of the foreground and boundary features in the prediction results through foreground learning and boundary refinement, respectively. Specifically, the Foreground Mining Module (FMM) utilizes the relationship between buildings and multi-scale scene spaces to explicitly model, extract, and learn foreground features, which can enhance foreground and related contextual features. The Dense Dilated Convolutional Residual Block (DDCResBlock) and the Dual Gate Boundary Refinement Module (DGBRM) individually process the diverted regular stream and boundary stream. The former can effectively expand the receptive field, and the latter utilizes spatial and channel gates to activate boundary features in low-level feature maps, helping the network refine boundaries. The predictions of the network for the building, foreground, and boundary are respectively supervised by ground truth. The experimental results on the WHU Building Aerial Imagery and Massachusetts Buildings Datasets show that the IoU scores of BFL-Net are 91.37% and 74.50%, respectively, surpassing state-of-the-art models.
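
The dual-gate idea, a channel gate and a spatial gate activating boundary responses in a low-level map, can be sketched as follows; both gate designs are assumptions, not the paper's DGBRM.

```python
# Sequential channel and spatial gating of a low-level feature map.
import torch
import torch.nn as nn

class DualGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, low):  # low-level feature (B, C, H, W)
        low = low * self.channel_gate(low)   # pick informative channels
        return low * self.spatial_gate(low)  # then activate boundary locations

print(DualGate(32)(torch.randn(1, 32, 64, 64)).shape)  # (1, 32, 64, 64)
```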
