This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data
by Zhanwu Zhuang 1,2, Ning Li 1, Weiye Xiao 2,*, Jiawei Wu 2 and Lei Zhou 1
1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2 State Key Laboratory of Lake and Watershed Science for Water Security, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 211135, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(4), 146; https://doi.org/10.3390/ijgi15040146
Submission received: 13 January 2026 / Revised: 11 March 2026 / Accepted: 20 March 2026 / Published: 26 March 2026
Abstract
Building height is a key indicator of vertical urbanization and urban morphological complexity, yet accurately mapping building height at fine spatial resolution and large spatial scales remains challenging. This study proposes an attention-based deep learning framework (ABHNet) for building height estimation at a 10 m spatial resolution by integrating multi-source remote sensing data and socioeconomic information. The model jointly exploits Sentinel-1 synthetic aperture radar data, Sentinel-2 multispectral imagery, and point of interest (POI) data. The proposed framework is evaluated in Shanghai, a megacity with dense and vertically complex urban structures, using Baidu Maps-derived building height data as reference information. The results demonstrate that the proposed method achieves accurate building height estimation, with a root mean squared error (RMSE) of 3.81 m and a mean absolute error (MAE) of 0.96 m for 2023, and an RMSE of 3.30 m and an MAE of 0.78 m for 2019, indicating robust performance across different time periods. The model is also applied in two other cities, Changzhou and Guiyang, where the results likewise indicate good performance. In addition, the expandability of the framework is examined by incorporating higher-resolution ZY-3 imagery, which raises the spatial resolution of the estimates to 2.5 m and highlights the potential extension of the model to heterogeneous data sources. Overall, this study demonstrates the effectiveness of attention-based deep learning and multimodal data fusion for large-scale and fine-resolution building height estimation using open-source data.
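The RMSE and MAE figures reported in the abstract are standard accuracy metrics for paired predicted and reference height maps. A minimal sketch of how they are computed (the height values below are illustrative, not from the paper):

```python
import numpy as np

def rmse(pred, ref):
    """Root mean squared error between predicted and reference heights."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def mae(pred, ref):
    """Mean absolute error between predicted and reference heights."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

# Hypothetical per-pixel building heights in metres
predicted = [12.0, 30.5, 8.0, 45.0]
reference = [10.0, 33.0, 8.5, 44.0]

print(rmse(predicted, reference))  # errors 2.0, -2.5, -0.5, 1.0 -> ~1.70 m
print(mae(predicted, reference))   # -> 1.5 m
```

Because squared errors dominate RMSE, an RMSE well above the MAE (3.81 m vs. 0.96 m in the 2023 results) suggests that most pixels are estimated closely while a minority of tall or complex buildings contribute large errors.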
Share and Cite
MDPI and ACS Style
Zhuang, Z.; Li, N.; Xiao, W.; Wu, J.; Zhou, L.
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data. ISPRS Int. J. Geo-Inf. 2026, 15, 146.
https://doi.org/10.3390/ijgi15040146
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.