This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data
by Zhanwu Zhuang 1,2, Ning Li 1, Weiye Xiao 2,*, Jiawei Wu 2 and Lei Zhou 1
1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
2 State Key Laboratory of Lake and Watershed Science for Water Security, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 211135, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(4), 146; https://doi.org/10.3390/ijgi15040146
Submission received: 13 January 2026 / Revised: 11 March 2026 / Accepted: 20 March 2026 / Published: 26 March 2026
Abstract
Building height is a key indicator of vertical urbanization and urban morphological complexity, yet accurately mapping building height at fine spatial resolution and large spatial scales remains challenging. This study proposes an attention-based deep learning framework (ABHNet) for building height estimation at a 10 m spatial resolution by integrating multi-source remote sensing data and socioeconomic information. The model jointly exploits Sentinel-1 synthetic aperture radar data, Sentinel-2 multispectral imagery, and point of interest (POI) data. The proposed framework is evaluated in Shanghai, a megacity with dense and vertically complex urban structures, using Baidu Maps-derived building height data as reference information. The results demonstrate that the proposed method achieves accurate building height estimation, with a root mean squared error (RMSE) of 3.81 m and a mean absolute error (MAE) of 0.96 m for 2023, and an RMSE of 3.30 m and an MAE of 0.78 m for 2019, indicating robust performance across different time periods. The model is also applied in two other cities, Changzhou and Guiyang, where the results likewise indicate good performance. In addition, the expandability of the framework is examined by incorporating higher-resolution ZY-3 imagery, which raises the spatial resolution of the estimates to 2.5 m and highlights the potential extension of the model to heterogeneous data sources. Overall, this study demonstrates the effectiveness of attention-based deep learning and multimodal data fusion for large-scale and fine-resolution building height estimation using open-source data.
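The RMSE and MAE figures reported in the abstract are standard accuracy metrics for paired predicted and reference height maps. A minimal sketch of how they are computed (the height values below are illustrative, not from the paper):

```python
import numpy as np

def rmse(pred, ref):
    """Root mean squared error between predicted and reference heights."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def mae(pred, ref):
    """Mean absolute error between predicted and reference heights."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(pred - ref)))

# Hypothetical per-pixel building heights in metres
predicted = [12.0, 30.5, 8.0, 45.0]
reference = [10.0, 33.0, 8.5, 44.0]

print(rmse(predicted, reference))  # errors 2.0, -2.5, -0.5, 1.0 -> ~1.70 m
print(mae(predicted, reference))   # -> 1.5 m
```

Because squared errors dominate RMSE, an RMSE well above the MAE (3.81 m vs. 0.96 m in the 2023 results) suggests that most pixels are estimated closely while a minority of tall or complex buildings contribute large errors.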
Share and Cite
MDPI and ACS Style
Zhuang, Z.; Li, N.; Xiao, W.; Wu, J.; Zhou, L.
ABHNet: An Attention-Based Deep Learning Framework for Building Height Estimation Fusing Multimodal Data. ISPRS Int. J. Geo-Inf. 2026, 15, 146.
https://doi.org/10.3390/ijgi15040146
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.