A Boundary-Assisted Multi-Scale Transformer for Object-Level Building Extraction from Satellite Remote Sensing Imagery
Abstract
1. Introduction
2. Data and Method
2.1. Data
2.2. Method
2.2.1. Overview of MRLNet Architecture
2.2.2. Transformer as Region Encoder
2.2.3. Multiscale Pixel Region Geometrics
2.2.4. Boundary Assist Loss
2.2.5. Verification Metrics
3. Experimental Results
3.1. Comparative Experiment
3.2. Cross-Dataset Validation on Massachusetts Buildings Dataset
3.3. Ablation Experiment
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| MRLNet | Multi-Scale Region Learning Network |
| CMSFM | Cascade Multi-Scale Fusion Module |
| CNN | Convolutional Neural Network |
| MIoU | Mean Intersection over Union |
| WHU | Wuhan University |







| Model | MIoU | F1 |
|---|---|---|
| FCN | 76.18 ± 0.35% | 79.32 ± 0.31% |
| UNet | 83.27 ± 0.28% | 86.52 ± 0.24% |
| DeepLab V3 | 86.41 ± 0.22% | 88.39 ± 0.19% |
| SETR | 88.33 ± 0.26% | 90.26 ± 0.21% |
| MRLNet (Ours) | 90.14 ± 0.18% | 92.47 ± 0.15% |

| Model | Params (M) | Time (ms) | FPS |
|---|---|---|---|
| FCN | 53.27 | 10.1 | 98.6 |
| UNet | 62.03 | 9.9 | 100.9 |
| DeepLab V3 | 63.00 | 13.1 | 76.3 |
| SETR | 98.00 | 27.1 | 36.8 |
| MRLNet (Ours) | 86.60 | 12.7 | 78.6 |

| Model | MIoU | F1 |
|---|---|---|
| FCN | 81.20 ± 0.32% | 89.20 ± 0.27% |
| UNet | 82.15 ± 0.27% | 89.81 ± 0.23% |
| DeepLab V3 | 81.26 ± 0.30% | 89.24 ± 0.26% |
| SETR | 80.93 ± 0.34% | 89.01 ± 0.29% |
| MRLNet (Ours) | 83.14 ± 0.21% | 90.46 ± 0.17% |

| Model | MIoU | F1 |
|---|---|---|
| Baseline | 88.56% | 90.36% |
| Baseline + CMSFM | 89.75% | 91.79% |
| Baseline + Boundary Assist Loss | 89.28% | 91.54% |
| Baseline + CMSFM + Boundary Assist Loss | 90.14% | 92.47% |

| Model | MIoU | F1 |
|---|---|---|
| Baseline | 81.47% | 88.96% |
| Baseline + CMSFM | 82.38% | 89.72% |
| Baseline + Boundary Assist Loss | 82.05% | 89.48% |
| Baseline + CMSFM + Boundary Assist Loss | 83.14% | 90.46% |
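The tables above report MIoU and F1. As a rough illustration of how such scores are commonly obtained, the following Python sketch computes both from a predicted and a reference binary mask. It assumes the standard definitions (IoU averaged over the building and background classes, F1 as the harmonic mean of building precision and recall); the exact averaging used for the reported numbers is not stated in this section. The function names `confusion_counts` and `miou_and_f1` are hypothetical, and returning 0.0 for an empty denominator is an assumed convention.

```python
def confusion_counts(pred, target):
    """Count (tp, fp, fn, tn) for two binary masks given as nested lists.

    A truthy value marks a building pixel; both masks must have the same shape.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def miou_and_f1(pred, target):
    """Return (MIoU, F1) for a binary building/background segmentation."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    # Per-class IoU: intersection over union of predicted and reference pixels.
    iou_building = safe_div(tp, tp + fp + fn)
    iou_background = safe_div(tn, tn + fp + fn)
    miou = (iou_building + iou_background) / 2
    # F1 on the building class: harmonic mean of precision and recall.
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return miou, f1


if __name__ == "__main__":
    # Toy 2x3 masks, made up for illustration only.
    pred = [[1, 1, 0], [0, 1, 0]]
    target = [[1, 0, 0], [0, 1, 1]]
    miou, f1 = miou_and_f1(pred, target)
    print(f"MIoU = {miou:.4f}, F1 = {f1:.4f}")
```

In practice the counts would be accumulated over every tile of a test set before the ratios are taken, rather than averaged per image; which of the two the reported scores follow is not specified here.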
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, S.; Wang, H.; Yao, J.; Wu, Z.; Chen, Z. A Boundary-Assisted Multi-Scale Transformer for Object-Level Building Extraction from Satellite Remote Sensing Imagery. Electronics 2026, 15, 1301. https://doi.org/10.3390/electronics15061301

