Article

DeepSwinLite: A Swin Transformer-Based Light Deep Learning Model for Building Extraction Using VHR Aerial Imagery

Department of Geomatics Engineering, Gebze Technical University, Gebze-Kocaeli 41400, Turkey
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(18), 3146; https://doi.org/10.3390/rs17183146
Submission received: 5 August 2025 / Revised: 9 September 2025 / Accepted: 10 September 2025 / Published: 10 September 2025
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)

Abstract

Accurate extraction of building features from remotely sensed data is essential for research and applications in urban planning, land management, transportation infrastructure development, and disaster monitoring. Although deep learning is the state-of-the-art (SOTA) methodology for building extraction, substantial challenges remain, largely stemming from the diversity of building structures and the complexity of background features. To mitigate these issues, this study introduces DeepSwinLite, a lightweight architecture based on the Swin Transformer, designed to extract building footprints from very high-resolution (VHR) imagery. The model integrates a novel local–global attention module to enhance the interpretation of objects across varying spatial resolutions and to facilitate effective information exchange between different feature abstraction levels. It comprises three modules: multi-scale feature aggregation (MSFA), which improves recognition across varying object sizes; multi-level feature pyramid (MLFP), which fuses detailed and semantic features; and AuxHead, which provides auxiliary supervision to stabilize and enhance learning. Experimental evaluations on the Massachusetts and WHU Building Datasets show that the DeepSwinLite architecture outperforms existing SOTA models. On the Massachusetts dataset, the model attained an overall accuracy (OA) of 92.54% and an intersection over union (IoU) of 77.94%, while on the WHU dataset, it achieved an OA of 98.32% and an IoU of 92.02%. After correction of errors identified in the Massachusetts ground truth and iterative enhancement, the model’s performance improved further, reaching 94.63% OA and 79.86% IoU. A key advantage of the DeepSwinLite model is its computational efficiency: it requires fewer floating-point operations (FLOPs) and parameters than other SOTA models. This efficiency makes the model particularly suitable for deployment in mobile and resource-constrained systems.
Keywords: building footprint extraction; deep learning; semantic segmentation; Swin Transformer; VHR imagery; Massachusetts building dataset; WHU building dataset
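
The abstract reports results as overall accuracy (OA) and intersection over union (IoU). As a minimal sketch of how these two metrics are commonly computed for a binary building mask, the following may help. The function name, the list-of-lists mask layout, and the toy masks are assumptions made for illustration. The abstract does not state whether IoU is reported for the building class alone or averaged over classes; this sketch uses the building class only.

```python
def binary_segmentation_metrics(pred, truth):
    """Return (overall accuracy, building-class IoU) for two binary masks.

    pred and truth are equally sized 2D lists of 0/1 values, where 1 marks
    a building pixel. This layout is a hypothetical choice for illustration.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # building predicted, building in ground truth
            elif p == 1 and t == 0:
                fp += 1  # building predicted, background in ground truth
            elif p == 0 and t == 1:
                fn += 1  # background predicted, building in ground truth
            else:
                tn += 1  # background in both
    total = tp + fp + fn + tn
    # OA: fraction of all pixels labelled correctly.
    oa = (tp + tn) / total if total else 0.0
    # IoU: overlap of predicted and true building pixels over their union.
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return oa, iou


# Toy 3 x 4 masks (not data from the paper).
pred = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

oa, iou = binary_segmentation_metrics(pred, truth)
print(f"OA = {oa:.4f}, IoU = {iou:.4f}")
```

Because background pixels usually dominate aerial scenes, OA tends to be much higher than IoU for the same prediction, which is consistent with the gap between the two figures quoted in the abstract.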

Share and Cite

MDPI and ACS Style

Yilmaz, E.O.; Kavzoglu, T. DeepSwinLite: A Swin Transformer-Based Light Deep Learning Model for Building Extraction Using VHR Aerial Imagery. Remote Sens. 2025, 17, 3146. https://doi.org/10.3390/rs17183146

